Today I received an email from somebody who had read some of my discussions of Psi research. They made an interesting point that has so far been neglected in most of the debates I participated in. With their permission I post their email (without identifying information) and my response to it. I hope this clarifies my views:
I also have experienced real and significant episodes of precognition. After many experiences I researched my ancestry and found relatives who had histories of episodes of precognition. The studies I have read that claim precognition is not real all have the same error. I can’t pick a card, I can’t tell you what the next sound will be. Precognition does not work like that. I will demonstrate with this example.
I was standing at the front desk at work when I got a terrible feeling something was wrong. I didn’t know what. I called a friend and told him something is wrong. I began a one hour drive home and continued talking to my friend. The feeling that something was wrong grew to an increasing level as I approached the river. I saw a city bus parked on the side of the road. Many vehicles were down by the river. I passed that scene and then told my friend that a child was now drowned and she was close to the bridge about 1/2 a mile down river. The next day the TV news confirmed that she was found at the bridge down river.
No one told me there was a drowning, no one told me it was a girl, no one knew she was floating by the bridge.
This type of thing happens to me regularly. I believe it results from the same thing that will stampede cattle. I think humans communicate through speech and other forms of non verbal communication. I think somehow I am able to know what the herd is thinking or saying without being there. I think the reason I got the feeling something was wrong had to do with the escalating fear and crying out of the people who were madly searching for the child who fell in the river.
So trying to study precognition by getting a person to predict the next card will never work. Look at the reality of how it happens and see if you can study it a different way.
My response to this:
Thank you for your email. I’d say we are in greater agreement than you may think. What I have written on my blog and in the scientific literature about precognition/telepathy/presentiment pertains strictly to the scientific experiments that have been done on these paranormal abilities, usually with the sole aim to prove their existence. You say you “can’t pick a card” etc – tell that to the researchers who believe that showing a very subtle difference from chance performance on such simple experiments is evidence for precognition.
Now, do I believe you have precognition? No, I don’t. The experiences you describe are not uncommon but they may be uncommonly frequent for you. Nevertheless they are anecdotal evidence and my first hunch would be to suspect cognitive biases that we know can masquerade as paranormal abilities. There may also be cognitive processes we currently simply have no understanding of. How we remember our own thoughts is still very poorly understood. The perception of causality is a fascinating topic. We know we can induce causality illusions but this line of inquiry is still in its infancy.
But I cannot be certain of this. Perhaps you do have precognition. I don’t have any intention to convince you that you don’t; I only want to clarify why I don’t believe it, certainly not based on the limited information I have. The main issue here is that your precognition is unfalsifiable. You say yourself that “Precognition does not work like that.” If it does not occur with the same regularity as other natural phenomena, it isn’t amenable to scientific study. Psi researchers believe that precognition etc have that regularity and so they think you can demonstrate it with card-picking experiments. My primary argument is about that line of thinking.
I am not one of those scientists who feel the need to tell everyone what to believe in. These people are just as irritating as religious fundamentalists who seek to convert everybody. If some belief is unfalsifiable, like the existence of God or your belief in your precognition, then it falls outside the realm of science. I have no problem with you believing that you have precognition, certainly not as long as it doesn’t cause any harm to anyone. But unless we can construct a falsifiable hypothesis, science has no place in it.
I mentioned the issue of data quality before but reading Richard Morey’s interesting post about standardised effect sizes the other day made me think about this again. Yesterday I gave a lecture discussing Bem’s infamous precognition study and the meta-analysis he recently published of the replication attempts. I hadn’t looked very closely at the meta-analysis data before but for my lecture I produced the following figure:
This shows the standardised effect size for each of the 90 results in that meta-analysis, split into four categories. On the left, in red, we have the ten results by Bem himself (nine from his original study plus his own replication of one of them). Next, in orange, we have what the meta-analysis calls ‘exact replications’, that is, replications that used his program/materials. In blue we have ‘non-exact replications’ – those that sought to replicate the paradigms but didn’t use his materials. Finally, on the right in black we have what I called ‘different’ experiments. These are at best conceptual replications: they also test whether precognition exists but use different experimental protocols. The hexagrams denote the means across all the experiments in each category (these are non-weighted means, but that’s not important for this post).
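To make explicit what those category means are, here is a minimal sketch. Note that the effect sizes below are invented placeholders chosen only to mirror the qualitative pattern in the figure; the actual values are in the meta-analysis data.

```python
from statistics import fmean

# Hypothetical standardised effect sizes (Cohen's d) per category.
# These are NOT the real meta-analysis values, just illustrative numbers.
effects = {
    "Bem":       [0.25, 0.19, 0.22, 0.15, 0.31, 0.20, 0.14, 0.26, 0.18, 0.09],
    "exact":     [0.11, -0.05, 0.20, 0.02, -0.10, 0.08],
    "non-exact": [0.04, 0.12, -0.08, 0.00, 0.06],
    "different": [0.02, -0.03, 0.10, 0.05],
}

# Non-weighted means: every experiment counts equally,
# regardless of its sample size.
means = {category: fmean(es) for category, es in effects.items()}
print(means)
```

Even with these made-up numbers the qualitative point survives: the unweighted mean for the "Bem" category sits well above the others, which hover near zero.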
While the means for all categories are evidently greater than zero, the most notable thing should be that Bem’s findings are dramatically different from the rest. While the mean effect sizes in the other categories are below or barely at 0.1, and there is considerable spread below zero in all of them, all ten of Bem’s results are above zero and, with one exception, above 0.1. This is certainly very unusual, and there are all sorts of reasons we could discuss for why this might be…
But let’s not. Instead let’s assume for the sake of this post that there is indeed such a thing as precognition and that Daryl Bem simply knows how to get people to experience it. I doubt that this is a plausible explanation in this particular case – but I would argue that for many kinds of experiments such “experimenter effects” are probably notable. In an fMRI experiment different labs may differ considerably in how well they control participants’ head motion or even simply in terms of the image quality of the MRI scans. In psychophysical experiments different experimenters may differ in how well they explain the task to participants or how meticulous they are in ensuring that they really understood the instructions, etc. In fact, the quality of the methods surely must matter in all experiments, whether they are in astronomy, microbiology, or social priming. Now this argument has been made in many forms, most infamously perhaps in Jason Mitchell’s essay “On the emptiness of failed replications” that drew much ire from many corners. You may disagree with Mitchell on many things but not on the fact that good methods are crucial. What he gets wrong is laying the blame for failed replications solely at the feet of “replicators”. Who is to say that the original authors didn’t bungle something up?
However, it is true that all good science should seek to reduce noise from irrelevant factors to obtain as clean observations as possible of the effect of interest. Using Bem’s precognition experiments as an example again, we could hypothesise that he indeed had a way to relax participants and unlock their true precognitive potential that others seeking to replicate his findings did not. If that were true (I’m willing to bet a fair amount of money that it isn’t, but that’s not the point), it would indeed mean that most of the replications – failed or successful – in his meta-analysis are of low scientific value. All of these experiments are more contaminated by noise confounds than his; only he provides clean measurements. Standardised effect sizes like Cohen’s d divide the absolute raw effect by a measure of uncertainty or dispersion in the data. The dispersion is a direct consequence of the noise factors involved. So it should be unsurprising that the effect size is greater for experimenters who are better at eliminating unnecessary noise.
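For intuition, here is a minimal simulation (with made-up numbers, nothing to do with any actual study) of how the very same raw effect yields a larger Cohen’s d for a low-noise experimenter, simply because the extra measurement noise ends up in the denominator:

```python
import random
import statistics

def cohens_d(scores, mu0=0.0):
    """One-sample Cohen's d: raw mean effect divided by the sample SD."""
    return (statistics.fmean(scores) - mu0) / statistics.stdev(scores)

random.seed(1)
true_effect = 0.5  # the same underlying raw effect for both experimenters

# A careful experimenter measures with little noise (sd = 1)...
clean = [random.gauss(true_effect, 1.0) for _ in range(10_000)]
# ...a sloppier one adds extra measurement noise (sd = 2).
noisy = [random.gauss(true_effect, 2.0) for _ in range(10_000)]

print(cohens_d(clean))  # ≈ 0.5
print(cohens_d(noisy))  # ≈ 0.25
```

Doubling the noise roughly halves the standardised effect size even though the raw effect is identical, which is the whole point of the experimenter-quality argument.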
Statistical inference seeks to estimate the population effect size from a limited sample. Thus, a meta-analytic effect size is an estimate of the “true” effect size from a set of replications. But since this population effect includes the noise from all the different experimenters, does it actually reflect the true effect? Arguably not: the true effect is people’s inherent precognitive ability, and the meta-analytic estimate spoils it with all the rubbish others pile on with their sloppy Psi experimentation skills. Surely we want to know the former, not the latter? Again, for precognition most of us will probably agree that this is unlikely – it seems more trivially explained by some Bem-related artifact – but in many situations this is a very valid point: imagine one researcher manages to produce a cure for some debilitating disease but others fail to replicate it. I’d bet that most people wouldn’t run around shouting “Failed replication!”, “Publication bias!”, “P-hacking!” but would want to know what makes the original experiment – the one with the working drug – different from the rest.
The way I see it, meta-analysis of large-scale replications is not the right way to deal with this problem. Meta-analyses of one lab’s replications are worthwhile, especially as a way to summarise a set of conceptually related experiments – but then you need to take them with a grain of salt because they aren’t independent replications. Large-scale meta-analyses across different labs, however, don’t really tell us all that much. They simply don’t estimate the effect size that really matters. The same applies to replication efforts (and I know I’ve said this before). This is the point on which I have always sympathised with Jason Mitchell: you cannot conclude a lot from a failed replication. A successful replication that nonetheless demonstrates that the original claim is false is another story, but simply failing to replicate some effect only tells you that something is (probably) different between the original and the replication. It does not tell you what the difference is.
Sure, it’s hard to make that point when you have a large-scale project like Brian Nosek’s “Estimating the reproducibility of psychological science” (I believe this is a misnomer because they mean replicability not reproducibility – but that’s another debate). Our methods sections are supposed to allow independent replication. The fact that so few of their attempts produced significant replications is a great cause for concern. It seems doubtful that all of the original authors knew what they were doing and so few of the “replicators” did. But in my view, there are many situations where this is not the case.
I’m not necessarily saying that large-scale meta-analysis is entirely worthless but I am skeptical that we can draw many firm conclusions from it. In cases where there is reasonable doubt about differences in data quality or experimenter effects, you need to test these differences. I’ve repeatedly said that I have little patience for claims about “hidden moderators”. You can posit moderating effects all you want but they are not helpful unless you test them. The same principle applies here. Rather than publishing one big meta-analysis after another showing that some effect is probably untrue or, as Psi researchers are wont to do, in an effort to prove that precognition, presentiment, clairvoyance or whatever are real, I’d like to see more attempts to rule out these confounds.
In my opinion the only way to do this is through adversarial collaboration. If an honest skeptic can observe Bem conduct his experiments, inspect his materials, and analyse the data for themselves, and he still manages to produce these findings, that would go a much longer way towards convincing me that these effects are real than any meta-analysis ever could.
In recent months I have written a lot (and thought a lot more) about the replication crisis and the proliferation of direct replication attempts. I admit I haven’t bothered to quantify this but I have an impression that most of these attempts fail to reproduce the findings they try to replicate. I can understand why this is unsettling to many people. However, as I have argued before, I find the current replication movement somewhat misguided.
A big gaping hole where your theory should be
Over the past year I have also written a lot (arguably too much) about Psi research. Most recently, I summarised my views on this in an uncharacteristically short post (by my standards) in reply to Jacob Jolij. But only very recently I realised that my views on all of this actually converge on the same fundamental issue. On that note I would like to thank Malte Elson, with whom I discussed some of these issues at the recent Open Science event at UCL. Our conversation played a significant role in clarifying my thoughts on this.
My main problem with Psi research is that it has no firm theoretical basis and that the use of labels like “Psi” or “anomalous” or whatnot reveals that this line of research is simply about stating the obvious. There will always be unexplained data but that doesn’t prove any theory. It has now dawned on me that my discomfort with the current replication movement stems from the same problem: failed direct replications do not explain anything. They don’t provide any theoretical advance to our knowledge about the world.
I am certainly not the first person to say this. Jason Mitchell’s treatise about failed replications covered many of the same points. In my opinion it is unfortunate that these issues have been largely ignored by commenters. Instead his post has been widely maligned and ridiculed. In my mind, this reaction was not only uncivil but really quite counter-productive to the whole debate.
Why most published research findings are probably not waterfowl
A major problem with his argument was pointed out by Neuroskeptic: Mitchell seems to hold replication attempts to a different standard than original research. While I often wonder if it is easier to incompetently fail to replicate a result than to incompetently p-hack it into existence, I agree that it is not really feasible to take that into account. I believe science should err on the side of open-minded skepticism. Thus even though it is very easy to fail to replicate a finding, the only truly balanced view is to use the same standards for original and replication evidence alike.
Mitchell describes the problems with direct replications with a famous analogy: if you want to prove the existence of black swans, all it takes is to show one example. No matter how many white swans you may produce afterwards, they can never refute the original reports. However, in my mind this analogy is flawed. Most of the effects we study in psychology or neuroscience research are not black swans. A significant social priming effect or a structural brain-behaviour correlation is not a black swan: a single significant result is not irrefutable evidence that the effect is real.
Imagine that there really were no black swans. It is conceivable that someone might parade around a black swan but maybe it’s all an elaborate hoax. Perhaps somebody just painted a white swan? Frauds of such a sensational nature are not unheard of in science, but most of us trust that they are nonetheless rare. More likely, it could be that the evidence is somehow faulty. Perhaps the swan was spotted in poor lighting conditions making it appear black. Considering how many people can disagree about whether a photo depicts a black or a white dress this possibility seems entirely conceivable. Thus simply showing a black swan is insufficient evidence.
On the other hand, Mitchell is entirely correct that parading a whole flock of white swans is also insufficient evidence against the existence of black swans. The same principle applies here: the evidence could also be faulty. If we only looked at swans native to Europe, we would have a severe sampling bias. In the worst case, people might be photographing black swans under conditions that make them appear white.
On the wizardry of cooking social psychologists
This brings us to another oft-repeated argument about direct replications: perhaps the “replicators” are just incompetent or lacking in skill. Mitchell also has an analogy for this (which I unintentionally also used in my previous post): replicators may just be bad cooks who follow the recipes but nonetheless fail to produce meals that match the beautiful photographs in the cookbooks. In contrast, Neuroskeptic referred to this tongue-in-cheek as the Harry Potter Theory: only those blessed with magical powers are able to replicate, and inept “muggles” failing to replicate a social priming effect should just be ignored.
In my opinion both of these analogies are partly right. The cooking analogy correctly points out that simply following the recipe in a cookbook does not make you a master chef. However, it also ignores the fact that the beautiful photographs in a cookbook are frequently not entirely genuine. To my knowledge, many cookbook photos are actually of cold food, to circumvent problems like steam fogging the lens. Most likely the photos will have been doctored in some way, and they will almost certainly be the best pick out of several cooking attempts and numerous photos. So while it is true that the cook was an expert and you probably aren’t, the photo does not necessarily depict a representative meal.
The jocular wizardry argument implies that anyone with a modicum of expertise in a research area should be able to replicate a research finding. As students we are taught that the methods sections of our research publications should allow anyone to replicate our experiments. But this is certainly not feasible: some level of expertise and background knowledge should be expected for a successful replication. I don’t think I could replicate any findings in radio astronomy, regardless of how well established they may be.
One frustration that many authors of non-replicated results have expressed to me (and elsewhere) is the implicit assumption by many “replicators” that social psychology research is easy. I am not a social psychologist. I have no idea how easy these experiments are, but I am willing to give people the benefit of the doubt here. It is possible that some replication attempts overlook critical aspects of the original experiments.
However, I think one of the key points of Neuroskeptic’s Harry Potter argument applies here: the validity of a “replicator’s” expertise, that is their ability to cast spells, cannot be contingent on their ability to produce these effects in the first place. This sort of reasoning seems circular and, appropriately enough, sounds like magical thinking.
How to fix our replicator malfunction
The way I see it, both arguments carry some weight here. I believe that replicators (the “muggles”) should have to demonstrate their ability to do this kind of research properly in order for us to have any confidence in their failed wizardry. When it comes to the recent failure to replicate nearly half a dozen studies reporting structural brain-behaviour correlations, Ryota Kanai suggested that the replicators should have analysed the age dependence of grey matter density to confirm that their methods were sensitive enough to detect such well-established effects. Similarly, all the large-scale replication attempts in social psychology should contain such sanity checks. On a positive note, the Many Labs 3 project included a replication of the Stroop effect and similar objective tests that fulfil such a role.
However, while such clear-cut baselines are great they are probably insufficient, in particular if the effect size of the “sanity check” is substantially greater than the effect of interest. Ideally, any replication attempt should contain a theoretical basis, an alternative hypothesis to be tested that could explain the original findings. As I said previously, it is the absence of such theoretical considerations that makes most failed replications so unsatisfying to me.
The problem is that for a lot of the replication attempts, whether they are of brain-behaviour correlations, social priming, or Bem’s precognition effects, the only underlying theory replicators put forth is that the original findings were spurious and potentially due to publication bias, p-hacking and/or questionable research practices. This seems mostly unfalsifiable. Perhaps these replication studies could incorporate control conditions/analyses to quantify the severity of p-hacking required to produce the original effects. But this is presumably unfeasible in practice because the parameter space of questionable research practices is so vast that it is impossible to derive a sufficiently accurate measure of them. In a sense, methods for detecting publication bias in meta-analysis are a way to estimate this but the evidence they provide is only probabilistic, not experimental.
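To see why even a single questionable research practice resists easy quantification, consider this toy sketch (purely illustrative, not modelled on any actual study) of just one such practice, optional stopping: testing repeatedly as data come in and stopping at the first significant result inflates the false-positive rate far beyond the nominal 5%, by an amount that depends on how often one peeks.

```python
import math
import random

def optional_stopping_fpr(n_sims=2000, max_n=100, peek_every=10, seed=0):
    """Fraction of null simulations reaching |z| > 1.96 at any interim peek."""
    rng = random.Random(seed)
    false_pos = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        for _ in range(max_n // peek_every):
            for _ in range(peek_every):
                total += rng.gauss(0.0, 1.0)  # true effect is exactly zero
                n += 1
            z = (total / n) * math.sqrt(n)  # z-test, known sd = 1
            if abs(z) > 1.96:
                false_pos += 1  # "significant" despite a null effect
                break
    return false_pos / n_sims

# A single fixed-n test would have a 5% false-positive rate;
# peeking every 10 subjects inflates it substantially.
print(optional_stopping_fpr())
```

And this is only one practice with one free parameter (how often to peek); combine it with flexible outlier exclusion, multiple outcome measures, and covariate selection, and the parameter space quickly becomes the intractable mess described above.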
Of course this doesn’t mean that we cannot have replication attempts in the absence of a good alternative hypothesis. My mentors instilled in me the view that any properly conducted experiment should be published. It shouldn’t matter whether the results are positive, negative, or inconclusive. Publication bias is perhaps the most pervasive problem scientific research faces and we should seek to reduce it, not amplify it by restricting what should and shouldn’t be published.
Rather, I believe we must change the philosophy underlying our attempts to improve science. If you disbelieve the claims of many social priming studies (and honestly, I don’t blame you!), it would be far more convincing to test a hypothesis about why the entire theory is false than to show that some specific findings fail to replicate. It would also free up a lot of resources currently spent on dismantling implausible ideas, which could instead be used to actually advance scientific knowledge.
There is a reason why I haven’t tried to replicate “presentiment” experiments even though I have written about them. Well, to be honest, the biggest reason is that my grant is actually quite specific as to what research I should be doing. However, if I were to replicate these findings I would want to test a reasonable hypothesis as to how they come about. I actually have some ideas how to do that, but in all honesty I simply find these effects so implausible that I don’t really feel like investing a lot of my time into testing them. Still, if I were to try a replication, it would have to test an alternative theory, because a direct replication is simply insufficient: if it failed, it would confirm my prior beliefs but not explain anything, and if it succeeded, I probably still wouldn’t believe the claims. In other words, we wouldn’t have learned very much either way.
Please note that this post is a re-post from my lab webpage. I removed it from there because the opinions expressed here are my own and shouldn’t be taken to reflect those of my lab members.
In 2014 I was drawn into debates with various parapsychologists about purported extrasensory perception, such as precognition, telepathy, or clairvoyance (also frequently referred to as “Psi”). It is important to note that there is nothing wrong per se with studying such phenomena. For some “mainstream” researchers, even talking about these topics seems to carry a stigma, and such studies are sometimes ignored. Even though I think many of the claims from parapsychology research are preposterous, ignoring or shunning hypotheses should not be part of the scientific method. Here is a quote by Carl Sagan about a person who had put forth an implausible theory about the solar system:
“Science is a self-correcting process. To be accepted, new ideas must survive the most rigorous standards of evidence and scrutiny. The worst aspect of the Velikovsky affair is not that many of his ideas were wrong or silly or in gross contradiction to the facts.
Rather, the worst aspect is that some scientists attempted to suppress Velikovsky’s ideas. The suppression of uncomfortable ideas may be common in religion or in politics, but it is not the path to knowledge. And there’s no place for it in the endeavor of science.”
Carl Sagan’s Cosmos, Episode 4, Heaven and Hell
So-called Psi phenomena are all fairly common human experiences and therefore gaining a better understanding of them will doubtless advance our general understanding of how the mind works. Critically though, such study calls for an open-minded approach that allows us to see past our preconceptions (I am fully aware of the irony of this statement: failing to keep an open mind is a criticism parapsychologists frequently level against “skeptics”). It requires taking seriously all the possible explanations and working gradually from the bottom up until we have a theory with adequate explanatory power.
Most Psi experiences probably have a very simple explanation. Some observations may indeed be evidence of some process we don’t currently understand; however, the vast majority most likely aren’t. It is far more plausible that the mechanisms by which our brain tries to make sense of the world around us can go wrong occasionally and thus give rise to experiences that seem to contradict physical reality. We know the brain allows a form of precognition, which is called making educated guesses. It also has a kind of telepathic ability to infer what another person is thinking or feeling – this is known as theory of mind. And it even allows clairvoyance of a sort by tapping the endless power of the imagination. Moreover, we know that the human mind is very poor at detecting randomness, precisely because it has evolved to be excellent at detecting patterns, a crucial skill for ensuring survival in a cluttered, chaotic environment. Our intuitions also frequently make us fall for simple logical fallacies, and even people with statistical training are not immune to this.

By investigating and scrutinising Psi experiences in these terms we can learn a lot about the mind and the brain. However, it is when this cautious approach is replaced by the aim to support the existence of a “statistical anomaly that has no mundane explanation” that things go haywire. This is when psychology turns into parapsychology*. It is my estimation that most research on Psi does not aim for a better understanding of the cosmos. Rather, it strives to perpetually maintain the status quo of not-understanding.
As for many “mainstream” scientists, my interest in this line of research was originally sparked by the publication of a study by Daryl Bem in a major psychology journal about apparent precognition effects. I used some of his original data for an inferential method I have been developing because I felt that the implausibility of his findings made for a very good demonstration of how statistical procedures can fail. However, as I outlined above, there is also a wider philosophical aspect to this entire debate in that much of the parapsychology literature seems to violate fundamental principles of the scientific method: Occam’s Razor and informed skepticism. I was thus drawn into debating these issues with some of these researchers. Here I will list the various publications and posts I have written as part of this discussion.
Finally, I published an external blog post arguing why I feel Psi is not a legitimate hypothesis. This was also in response to Jacob Jolij as well as a general response to Mossbridge et al and Bem.
I was also asked to review an EEG study investigating telepathic links between individuals. This journal (F1000 Research) has a unique model of transparency. All of the reviews are post-publication and thus visible to all. Critically, all the raw data of the study are also publicly available allowing the reviewers (or anyone else) to inspect it. You can read the various versions of that manuscript and the review discussion here.
*) Some people use parapsychology to simply mean the scientific investigation of purported paranormal or psychic phenomena and perhaps this is the traditional meaning of the term. This seems odd to me however. Such investigation falls squarely within the area of “mainstream” science. The addition of the “para” prefix separates such investigations unnecessarily from the broader scientific community. It is my impression that many para-psychologists do base their research on the Psi assumption, despite protestations to the contrary, and that they are mainly concerned with convincing others that Psi exists.