In recent months I have written a lot (and thought a lot more) about the replication crisis and the proliferation of direct replication attempts. I admit I haven’t bothered to quantify this but I have an impression that most of these attempts fail to reproduce the findings they try to replicate. I can understand why this is unsettling to many people. However, as I have argued before, I find the current replication movement somewhat misguided.
A big gaping hole where your theory should be
Over the past year I have also written too much about Psi research. Most recently, I summarised my views on this in an uncharacteristically short post (by my standards) in reply to Jacob Jolij. But only very recently did I realise that my views on all of this actually converge on the same fundamental issue. On that note I would like to thank Malte Elson, with whom I discussed some of these issues at the recent Open Science event at UCL. Our conversation played a significant role in clarifying my thoughts on this.
My main problem with Psi research is that it has no firm theoretical basis and that the use of labels like “Psi” or “anomalous” or whatnot reveals that this line of research is simply about stating the obvious. There will always be unexplained data but that doesn’t prove any theory. It has now dawned on me that my discomfort with the current replication movement stems from the same problem: failed direct replications do not explain anything. They don’t provide any theoretical advance to our knowledge about the world.
I am certainly not the first person to say this. Jason Mitchell’s treatise about failed replications covered many of the same points. In my opinion it is unfortunate that these issues have been largely ignored by commenters. Instead his post has been widely maligned and ridiculed. In my mind, this reaction was not only uncivil but really quite counter-productive to the whole debate.
Why most published research findings are probably not waterfowl
A major problem with his argument was pointed out by Neuroskeptic: Mitchell seems to hold replication attempts to a different standard than original research. While I often wonder if it is easier to incompetently fail to replicate a result than to incompetently p-hack it into existence, I agree that it is not really feasible to take that into account. I believe science should err on the side of open-minded skepticism. Thus even though it is very easy to fail to replicate a finding, the only truly balanced view is to use the same standards for original and replication evidence alike.
Mitchell describes the problems with direct replications with a famous analogy: if you want to prove the existence of black swans, all it takes is to show one example. No matter how many white swans you may produce afterwards, they can never refute the original reports. However, in my mind this analogy is flawed. Most of the effects we study in psychology or neuroscience research are not black swans. A statistically significant social priming effect or structural brain-behaviour correlation is not irrefutable evidence that the effect is real.
Imagine that there really were no black swans. It is conceivable that someone might parade around a black swan but maybe it’s all an elaborate hoax. Perhaps somebody just painted a white swan? Frauds of such a sensational nature are not unheard of in science, but most of us trust that they are nonetheless rare. More likely, it could be that the evidence is somehow faulty. Perhaps the swan was spotted in poor lighting conditions making it appear black. Considering how many people can disagree about whether a photo depicts a black or a white dress this possibility seems entirely conceivable. Thus simply showing a black swan is insufficient evidence.
On the other hand, Mitchell is entirely correct that parading a whole swarm of white swans is also insufficient evidence against the existence of black swans. The same principle applies here. The evidence could also be faulty. If we only looked at swans native to Europe we would have a severe sampling bias. In the worst case, people might be photographing black swans under conditions that make them appear white.
On the wizardry of cooking social psychologists
This brings us to another oft repeated argument about direct replications. Perhaps the “replicators” are just incompetent or lacking in skill. Mitchell also has an analogy for this (which I unintentionally also used in my previous post). Replicators may just be bad cooks who follow the recipes but nonetheless fail to produce meals that match the beautiful photographs in the cookbooks. In contrast, Neuroskeptic referred to this tongue-in-cheek as the Harry Potter Theory: only those blessed with magical powers are able to replicate. Inept “muggles” failing to replicate a social priming effect should just be ignored.
In my opinion both of these analogies are partly right. The cooking analogy correctly points out that simply following the recipe in a cookbook does not make you a master chef. However, it also ignores the fact that the beautiful photographs in a cookbook are frequently not entirely genuine. To my knowledge, many cookbook photos are actually of cold food to circumvent problems like steam on the camera etc. Most likely the photos will have been doctored in some way and they will almost certainly be the best pick out of several cooking attempts and numerous photos. So while it is true that the cook was an expert while you probably aren’t, the photo does not necessarily depict a representative meal.
The jocular wizardry argument implies that anyone with a modicum of expertise in a research area should be able to replicate a research finding. As students we are taught that the methods sections of our research publications should allow anyone to replicate our experiments. But this is certainly not feasible: some level of expertise and background knowledge should be expected for a successful replication. I don’t think I could replicate any findings in radio astronomy regardless of how well established they may be.
One frustration many authors of results that have failed to replicate have expressed to me (and elsewhere) is the implicit assumption by many “replicators” that social psychology research is easy. I am not a social psychologist. I have no idea how easy these experiments are but I am willing to give people the benefit of the doubt here. It is possible that some replication attempts overlook critical aspects of the original experiments.
However, I think one of the key points of Neuroskeptic’s Harry Potter argument applies here: the validity of a “replicator’s” expertise, that is their ability to cast spells, cannot be contingent on their ability to produce these effects in the first place. This sort of reasoning seems circular and, appropriately enough, sounds like magical thinking.
How to fix our replicator malfunction
The way I see it, both arguments carry some weight here. I believe that replicators should have to demonstrate their ability to do this kind of research properly in order for us to have any confidence in their failed wizardry. When it comes to the recent failure to replicate nearly half a dozen studies reporting structural brain-behaviour correlations, Ryota Kanai suggested that the replicators should have analysed the age dependence of grey matter density to confirm that their methods were sensitive enough to detect such well-established effects. Similarly, all the large-scale replication attempts in social psychology should contain such sanity checks. On a positive note, the Many Labs 3 project included a replication of the Stroop effect and similar objective tests that fulfill such a role.
However, while such clear-cut baselines are great they are probably insufficient, in particular if the effect size of the “sanity check” is substantially greater than the effect of interest. Ideally, any replication attempt should contain a theoretical basis, an alternative hypothesis to be tested that could explain the original findings. As I said previously, it is the absence of such theoretical considerations that makes most failed replications so unsatisfying to me.
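To make the point about effect sizes concrete, here is a rough sketch of the arithmetic. Everything in it is an assumption for illustration: the sample size, the two standardised effect sizes, and the helper name `approx_power` are all hypothetical, and the calculation uses a simple normal approximation to a two-group comparison rather than the analysis of any particular study.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is a standardised mean difference (Cohen's d) and n_per_group is
    the number of participants in each of the two groups. This is a
    normal approximation, not an exact t-test power calculation.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical numbers, chosen only for illustration
n_per_group = 30
sanity_check_d = 1.0   # a large, well-established effect
target_d = 0.3         # a much smaller effect of interest

print("sanity check power:", round(approx_power(sanity_check_d, n_per_group), 2))
print("target effect power:", round(approx_power(target_d, n_per_group), 2))
```

Under these assumed numbers the sanity check would be detected almost every time, while the smaller effect would be missed in most attempts. Passing the sanity check therefore says little about whether the replication was sensitive enough for the effect of interest.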
The problem is that for a lot of the replication attempts, whether they are of brain-behaviour correlations, social priming, or Bem’s precognition effects, the only underlying theory replicators put forth is that the original findings were spurious and potentially due to publication bias, p-hacking and/or questionable research practices. This seems mostly unfalsifiable. Perhaps these replication studies could incorporate control conditions/analyses to quantify the severity of p-hacking required to produce the original effects. But this is presumably unfeasible in practice because the parameter space of questionable research practices is so vast that it is impossible to derive a sufficiently accurate measure of them. In a sense, methods for detecting publication bias in meta-analysis are a way to estimate this but the evidence they provide is only probabilistic, not experimental.
Of course this doesn’t mean that we cannot have replication attempts in the absence of a good alternative hypothesis. My mentors instilled in me the view that any properly conducted experiment should be published. It shouldn’t matter whether the results are positive, negative, or inconclusive. Publication bias is perhaps the most pervasive problem scientific research faces and we should seek to reduce it, not amplify it by restricting what should and shouldn’t be published.
Rather, I believe we must change the philosophy underlying our attempts to improve science. If you disbelieve the claims of many social priming studies (and honestly, I don’t blame you!) it would be far more convincing to test a hypothesis on why the entire theory is false than to show that some specific findings fail to replicate. It would also free up a lot of resources, currently spent on dismantling implausible ideas, to actually advance scientific knowledge.
There is a reason why I haven’t tried to replicate “presentiment” experiments even though I have written about them. Well, to be honest the biggest reason is that my grant is actually quite specific as to what research I should be doing. However, if I were to replicate these findings I would want to test a reasonable hypothesis as to how they come about. I actually have some ideas how to do that but in all honesty I simply find these effects so implausible that I don’t really feel like investing a lot of my time into testing them. Still, if I were to try a replication it would have to be to test an alternative theory because a direct replication is simply insufficient. If my replication failed, it would confirm my prior beliefs but not explain anything. However, if it succeeded, I probably still wouldn’t believe the claims. In other words, we wouldn’t have learned very much either way.