Black coins & swan flipping

My previous post sparked a number of responses from various corners, including some exchanges I had on Twitter as well as two blog posts, one by Simine Vazire and another following on from it. In addition, there has also been another post which discussed (and, in my opinion, misrepresented) things I said recently at the UCL “Is Science Broken” panel discussion.

I don’t blame people for misunderstanding what I’m saying. The blame must lie largely with my own inability to communicate my thoughts clearly. Maybe I am just crazy. As they say, you can’t really judge your own sanity. However, I am a bit worried about our field. To me the issues I am trying to raise are self-evident and fundamental. The fact that they apparently aren’t to others makes me wonder if Science isn’t in fact broken after all…

Either way, I want to post a brief (even by Alexetz’ standards?) rejoinder: brief answers to frequently asked questions (or rather, to the strawmen that are often constructed):

Why do you hate replications?

I don’t. I am saying replications are central to good science. This means all (or close to it) studies should contain replications as part of their design. It should be a daisy chain. Each experiment should contain some replications, some sanity checks and control conditions. This serves two purposes: it shows that your experiment was done properly and it helps to accumulate evidence on whether or not the previous findings are reliable. Thus we must stop distinguishing between “replicators” and “original authors”. All scientists should be replicators all the bloody time!

Why should replicators have to show why they failed to replicate?

They shouldn’t. But, as I said in the previous point, they should be expected to provide evidence that they did a proper experiment. And of course the original study should be held to the same standard. This could in fact serve as a sanity check: if you can show that the method used couldn’t possibly reveal reliable data, this speaks volumes about the original effect.

It’s not the replicator’s fault if the original study didn’t contain a sanity check!

That is true. It isn’t your fault if the previous study was badly designed. But it is your fault if you are aware of that defect and nonetheless don’t try to do better. And it’s not really that black and white. What was good design yesterday can be bad design today and indefensibly terrible tomorrow. We can always do better. That’s called progress.

But… but… fluke… *gasp* type 1 error… Randomness!?!

Almost every time I discuss this topic someone will righteously point out that I am ignoring the null hypothesis. I am not. Of course the original finding may be a total fluke but you simply can’t know for sure. Under certain conditions you can test predictions that the null hypothesis makes (with Bayesian inference anyway). But that isn’t the same. There are a billion reasons between heaven and earth why a replication may fail. You don’t even need to do it poorly. It may just be bad luck. Brain-behaviour correlations observed in London will not necessarily be detectable in Amsterdam* because the heterogeneity, and thus the inter-individual variance, in the latter sample is likely to be smaller. This means that for the very same underlying biological process the observed effect size may be smaller in the more homogeneous sample, so you may need more statistical power to detect it. Of course, it could also be some methodological error. Or perhaps the original finding was just a false positive. You can never know.
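To make the heterogeneity point concrete, here is a minimal simulation sketch (my own illustration, not from the original studies): the very same linear relationship between two measures produces a smaller observed correlation when the predictor varies less in the sample, which is exactly the range-restriction situation described above. All variable names and parameter values are arbitrary assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large n so sampling noise is negligible for the demo

def observed_r(sd_x, slope=0.5, noise_sd=1.0):
    """Observed correlation for one sample.

    The underlying process (slope and noise) is held fixed;
    only the spread of the predictor differs between samples.
    """
    x = rng.normal(0.0, sd_x, n)                 # predictor (e.g. a brain measure)
    y = slope * x + rng.normal(0.0, noise_sd, n)  # outcome (e.g. behaviour)
    return np.corrcoef(x, y)[0, 1]

# Hypothetical "heterogeneous" vs "homogeneous" samples:
r_heterogeneous = observed_r(sd_x=2.0)  # more varied sample
r_homogeneous = observed_r(sd_x=1.0)    # less varied sample

print(r_heterogeneous, r_homogeneous)
```

The same slope yields a visibly larger correlation in the more variable sample, so a replication in a narrower population would need a larger sample to detect the (smaller) observed effect at the same power.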

Confirming the original result was a fluke is new information!

That view is problematic for two reasons. First of all, it is impossible to prove the null (yes, even for Bayesians). Science isn’t math, it doesn’t prove anything. You just collect data that may or may not be consistent with theoretical predictions. Secondly, you should never put too much confidence in any new glorious finding – even if it was high-powered (because you don’t really know that) and pre-registered (because that doesn’t prevent people from making mistakes). So your prior that the result is a fluke should be strong anyway. You don’t learn very much new from that.

What then would tell me new information?

A new experiment that tests the same theory – or perhaps even a better theory. It can be a close replication but it can also be a largely conceptual one. I think this dichotomy is false. There are no true direct replications and even if there were they would be pointless. The directness of replication exists on a spectrum (I’ve said this already in a previous post). I admit that the definition of “conceptual” replications in the social priming literature is sometimes a fairly large stretch. You are free to disagree with them. The point, though, is that if a theory is so flaky that modest changes completely obliterate it, then the onus is on the theory’s proponent to show why. In fact, this could be the new, better experiment you’re doing. This is how a replication effort can generate new hypotheses.

Please leave the poor replicators alone!

If you honestly think replicators are the ones getting a hard time I don’t know what world you live in. But that’s a story for another day, perhaps one that will be told by one of the Neuroscience Devils? The invitation to post there remains open…

Science clearly isn’t self-correcting or it wouldn’t be broken!

Apart from being a circular argument, this is also demonstrably wrong. Science isn’t a perpetual motion machine. Science is what scientists do. The fact that we are having these debates is conclusive proof that science self-corrects. I don’t see any tyrannical overlord forcing us to do any of this.

So what do you think should be done?

As I have said many times before, I think we need to train our students (and ourselves) in scientific skepticism and strong inference. We should stop being wedded to our pet theories. We need to make it easier to seek the truth rather than fame. For all I care, pre-registration can probably help with that but it won’t be enough. We have to dispel the pervasive idea that an experiment “worked” when it confirmed your hypothesis and failed when it didn’t. We should read Feynman’s Cargo Cult Science. And after thinking about all this negativity, wash it all down by reading (or watching) Carl Sagan to remember how many mysteries still await solving in this amazing universe we inhabit.


*) I promise I will stop talking about this study now. I really don’t want to offend anyone…

How replication should work