
On the magic of independent piloting

TL;DR: Never decide to run a full experiment based on whether one of the small pilots in which you tweaked your paradigm supported the hypothesis. Use small pilots only to ensure the experiment produces high-quality data, judged by criteria that are unrelated to your hypothesis.

Sorry for the bombardment with posts on data peeking and piloting. I felt this would have cluttered up the previous post so I wrote a separate one. After this one I will go back to doing actual work though, I promise! That grant proposal I should be writing has been neglected for too long…

In my previous post, I simulated what happens when you conduct inappropriate pilot experiments by running a small experiment and then continuing data collection if the pilot produces significant results. This is really data peeking and it shouldn’t come as much of a surprise that this inflates false positives and massively skews effect size estimates. I hope most people realize that this is a terrible thing to do because it makes your final results dependent on the pilot’s outcome. Quite possibly, some people will have learned about this in their undergrad stats classes. As one of my colleagues put it, “if it ends up in the final analysis it is not a pilot.” Sadly, I don’t think this is as widely known as it should be. I was not kidding when I said that I have seen it happen or overheard people discussing having done this type of inappropriate piloting.

But anyway, what is an appropriate pilot then? In my previous post, I suggested you should redo the same experiment but restart data collection. You now stick to the methods that gave you a significant pilot result. Now the data set used to test your hypothesis is completely independent, so it won’t be skewed by the pre-selected pilot data. Put another way, your exploratory pilot allows you to estimate a prior, and your full experiment seeks to confirm it. Surely there is nothing wrong with that, right?

I’m afraid there is, and it is actually obvious why: your small pilot experiment is underpowered to detect real effects, especially small ones. So if you use inferential statistics to determine whether a pilot experiment “worked,” the pilot is biased towards detecting larger effect sizes. Importantly, this does not mean you bias your experiment towards larger effect sizes. If you only continue the experiment when the pilot was significant, you are ignoring all of the pilots that would have shown true effects but which – due to the large uncertainty (low power) of the pilot – failed to do so purely by chance. Naturally, the proportion of these false negatives becomes smaller the larger you make your pilot sample – but since pilots are by definition small, the error rate is pretty high in any case. For example, for a true effect size of δ = 0.3, the false negative rate with a pilot sample of 2 per group is 95%. With a pilot sample of 15, it is still as high as 88%. Just for illustration I show below the false negative rates (1 − power) for three different true effect sizes. Even for quite decent effect sizes the sensitivity of a small pilot is abysmal:

False Negatives
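For anyone who wants to check these numbers, they can be reproduced with a quick Monte Carlo simulation. This is a minimal Python sketch (my own simulations were in MATLAB); it assumes a two-sample t-test with the stated number of subjects per group, and the critical t values for α = 0.05 (two-tailed) are taken from standard tables:

```python
import random
import math

def false_negative_rate(n, delta, t_crit, sims=20000, seed=1):
    """Fraction of simulated two-sample t-tests (n per group) that
    miss a true effect of size delta at the given critical t."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(delta, 1) for _ in range(n)]
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((v - ma) ** 2 for v in a) / (n - 1)
        vb = sum((v - mb) ** 2 for v in b) / (n - 1)
        se = math.sqrt((va + vb) / n)  # pooled standard error, equal group sizes
        if se == 0 or abs((mb - ma) / se) < t_crit:
            misses += 1
    return misses / sims

# two-tailed alpha = .05 critical t values: df = 2 -> 4.303, df = 28 -> 2.048
print(false_negative_rate(2, 0.3, 4.303))   # roughly 0.95
print(false_negative_rate(15, 0.3, 2.048))  # roughly 0.88
```

Both values come out close to the 95% and 88% quoted above.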

Thus, if you only pick pilot experiments with significant results to do real experiments you are deluding yourself into thinking that the methods you piloted are somehow better (or “precisely calibrated”). Remember this is based on a theoretical scenario that the effect is real and of fixed strength. Every single pilot experiment you ran investigated the same underlying phenomenon and any difference in outcome is purely due to chance – the tweaking of your methods had no effect whatsoever. You waste all manner of resources piloting some methods you then want to test.

So frequentist inferential statistics on pilot experiments are generally nonsense. Pilots are by nature exploratory. You should only determine significance for confirmatory results. But what are these pilots good for? Perhaps we just want to have an idea of what effect size they can produce and then do our confirmatory experiments for those methods that produce a reasonably strong effect?

I’m afraid that won’t do either. I simulated this scenario in a similar manner as in my previous post. 100,000 times I generated two groups (with a full sample size of n = 80, although the full sample size isn’t critical for this). Both groups are drawn from a population with standard deviation 1, but one group has a mean of zero while the other’s mean is shifted by 0.3 – so we have a true effect here (the actual magnitude of this true effect size is irrelevant for the conclusions). In each of the 100,000 simulations, the researcher runs a number of pilot subjects per group (plotted on the x-axis). Only if the effect size estimate for this pilot exceeds a certain criterion level does the researcher run an independent, full experiment. The criterion is either 50%, 100%, or 200% of the true effect size. Obviously, the researcher cannot know the true effect size; I simply use these criteria as something a researcher might plausibly do in a real-world situation. (For the true effect size I used here, these criteria correspond to d = 0.15, d = 0.3, and d = 0.6, respectively.)

The results are below. The graph on the left once again plots the false negative rates against the pilot sample size. A false negative here is not based on significance but on effect size – that is, any simulation in which the pilot’s d fell below the criterion. When the criterion is equal to the true effect size, the false negative rate is constant at 50%. The reason for this is obvious: each simulation is drawn from a population centered on the true effect of 0.3, so half of these simulations will exceed that value. However, when the criterion is not equal to the true effect, the false negative rates depend on the pilot sample size. If the criterion is more lenient than the true effect, false negatives decrease with pilot size. If it is stricter, they increase. Either way, the false negative rates are substantially greater than the 20% mark you would have with an adequately powered experiment. So you will still delude yourself a considerable number of times if you only conduct the full experiment when your pilot shows a particular effect size. Even if your criterion is lax (and d = 0.15 for a pilot sounds pretty lax to me), you are missing a lot of true results. Again, remember that all of the pilot experiments here investigated a real effect of fixed size. Tweaking the method makes no difference. The difference between simulations is simply due to chance.
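As a sanity check on the 50% figure, here is a minimal Python sketch of just the pilot-screening step (not my full simulation); it assumes the criterion is an observed Cohen's d in the pilot exceeding the stated cut-off:

```python
import random
import math

def pilot_false_negative_rate(n_pilot, true_d, criterion, sims=20000, seed=1):
    """Fraction of pilots (n_pilot per group) whose observed Cohen's d
    falls below the criterion, given a true effect of size true_d."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(sims):
        a = [rng.gauss(0, 1) for _ in range(n_pilot)]
        b = [rng.gauss(true_d, 1) for _ in range(n_pilot)]
        ma, mb = sum(a) / n_pilot, sum(b) / n_pilot
        va = sum((v - ma) ** 2 for v in a) / (n_pilot - 1)
        vb = sum((v - mb) ** 2 for v in b) / (n_pilot - 1)
        sd = math.sqrt((va + vb) / 2)  # pooled SD, equal group sizes
        if sd == 0 or (mb - ma) / sd < criterion:
            misses += 1
    return misses / sims

# criterion equal to the true effect: about half of all pilots fail,
# more or less regardless of the pilot sample size
for n in (5, 10, 15):
    print(n, pilot_false_negative_rate(n, 0.3, 0.3))
```
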

Finally, the graph on the right shows the mean effect sizes estimated by your completed experiments (but not the absolute values this time!). The criterion you used in the pilot makes no difference here (all colors are at the same level), which is reassuring. However, all is not necessarily rosy. The open circles plot the effect size you get under publication bias, that is, if you only publish the significant experiments with p < 0.05. This effect is clearly inflated compared to the true effect size of 0.3. The asterisks plot the effect size estimate if you take all of the experiments. This is the situation you would have (Chris Chambers will like this) if you did a Registered Report for your full experiment and publication of the results is guaranteed irrespective of whether or not they are significant. On average, this effect size is an accurate estimate of the true effect.

Simulation Results

Again, these are only the experiments that were lucky enough to go beyond the piloting stage. You already wasted a lot of time, effort, and money to get here. While the final outcome is solid if publication bias is minimized, you have thrown a considerable number of good experiments into the trash. You’ve also misled yourself into believing that you conducted a valid pilot experiment that honed the sensitivity of your methods when in truth all your pilot experiments were equally mediocre.

I have had a few comments from people saying that they are only interested in large effect sizes and surely that means they are fine? I’m afraid not. As I said earlier, the principle here does not depend on the true effect size. It is solely a consequence of the low sensitivity of the pilot experiment. Even with a large true effect, your outcome-dependent pilot is a blind chicken that errs around in the dark until it is lucky enough to hit a true effect more or less by chance. For this to happen you must use a very low criterion to turn your pilot into a real experiment. This, however, also means that if the null hypothesis is true an unacceptable proportion of your pilots produce false positives. Again, remember that your piloting is completely meaningless – you’re simply chasing noise. It means that your decision whether to go from pilot to full experiment is (almost) completely arbitrary, even when the true effect is large.

So for instance, when the true effect is a whopping δ = 1, and you are using d > 0.15 as a criterion in your pilot of 10 subjects per group (which is already large for the pilots I typically hear about), your false negative rate is nice and low at ~3%. But critically, if the null hypothesis of δ = 0 is true, your false positive rate is ~37%. How often you will fool yourself by turning a pilot into a full experiment depends on the base rate. If you give this hypothesis a 50:50 chance of being true, almost one in three of your green-lit pilot experiments will lead you to chase a false positive. If these odds are lower (which they very well may be), the situation becomes increasingly worse.
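These two rates, and the base-rate arithmetic, can be verified with the same kind of simulation (again a rough Python sketch rather than my actual code; n = 10 per pilot group is the assumption stated above):

```python
import random
import math

def d_exceeds_criterion(n, true_d, criterion, sims=20000, seed=2):
    """Fraction of pilots (n per group) whose observed Cohen's d
    exceeds the criterion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(true_d, 1) for _ in range(n)]
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((v - ma) ** 2 for v in a) / (n - 1)
        vb = sum((v - mb) ** 2 for v in b) / (n - 1)
        sd = math.sqrt((va + vb) / 2)  # pooled SD, equal group sizes
        if sd > 0 and (mb - ma) / sd > criterion:
            hits += 1
    return hits / sims

tp = d_exceeds_criterion(10, 1.0, 0.15)  # true effect passes: roughly 0.96
fp = d_exceeds_criterion(10, 0.0, 0.15)  # null passes anyway: roughly 0.37
# with a 50:50 base rate, the fraction of green-lit pilots chasing noise:
print(fp / (fp + tp))  # almost one in three
```
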

What should we do then? In my view, there are two options. Either run a well-powered confirmatory experiment that tests your hypothesis based on an effect size you consider meaningful. This is the option I would choose if resources are a critical factor. Alternatively, if you can afford the investment of time, money, and effort, you could run an exploratory experiment with a reasonably large sample size (that is, larger than a pilot). If you must, tweak the analysis at the end to figure out what hides in the data. Then run a well-powered replication experiment to confirm the result. The power for this should be high enough to detect effects that are considerably weaker than the exploratory effect size. This exploratory experiment may sound like a pilot but it isn’t, because it has decent sensitivity and the only resource you might be wasting is your time* during the exploratory analysis stage.

The take-home message here is: don’t make your experiments dependent on whether your pilot supported your hypothesis, even if you use independent data. It may seem like a good idea but it’s tantamount to magical thinking. Chances are that you did not refine your method at all. Again (and I apologize for the repetition but it deserves repeating): this does not mean all small piloting is bad. If your pilot is about assuring that the task isn’t too difficult for subjects, that your analysis pipeline works, that the stimuli appear as you intended, that the subjects aren’t using a different strategy to perform the task, or quite simply to reduce the measurement noise, then it is perfectly valid to run a few people first and it can even be justified to include them in your final data set (although that last point depends on what you’re studying). The critical difference is that the criteria for green-lighting a pilot experiment are completely unrelated to the hypothesis you are testing.

(* Well, your time and the carbon footprint produced by your various analysis attempts. But if you cared about that, you probably wouldn’t waste resources on meaningless pilots in the first place, so this post is not for you…)

MATLAB code for this simulation.

No Tea in Schwarzkopf

Yesterday I came across* this essay by Etzel Cardeña entitled “The unbearable fear of psi: On scientific censorship in the 21st century” in the Journal of Scientific Exploration, an outlet that frequently publishes parapsychology studies. In this essay he bemoans the “suppression” of Psi research by the scientific establishment. I have noticed (personal opinion) that Psi researchers frequently tend to have a bit of a persecution complex although some of the concerns may very well be justified. Seriously entertaining the hypothesis that there is precognition or telepathy is often met with ridicule and I can imagine that it could make life with “mainstream” scientists harder. At the same time I am almost certain that the claims of this suppression are vastly overstated and they don’t make Psi researchers the 21st century Galileos or Giordano Brunos.


In fact, in a commentary on a Psi study I published two years ago, I tried to outline specifically what differs between Galileo’s theories and the Psi hypothesis: the principle of parsimony. Heliocentrism may have faced dogmatic opposition from the religious establishment because it threatened their power and worldview. However, it is nonetheless the better model to explain observations of nature whilst being consistent with the current state of our knowledge. This is why it eventually succeeded against all opposition. The truth will out. Science is self-correcting, even if it can take a long time and painful revolutions to get there. The same does not apply to the Psi hypothesis because Psi doesn’t explain anything. Rather, Psi is the absence of an explanation because it posits that there are unexplained observations – something I would call stating the obvious. Anyway, I’ve repeatedly said all this before and this isn’t actually the point of this blog post…

In his essay, Cardeña briefly mentions my commentary and discusses in the Appendix some of the strawman arguments that have been levelled against my points. That’s all well and good. I disagree with him but I have neither the time nor the desire to get back into this discussion right now. However, it brings me to another puzzling thing that I have long wondered about – mainly because it has followed me around for most of my life (ever since moving to an English-speaking country, at least): the unbearable inability of people to spell my name correctly.

It used to frustrate me but after now decades of experiencing it regularly I have become accustomed to it. But this doesn’t stop me from being mystified by this error. Let me repeat it again:

There is no T in Schwarzkopf

By far the most common misspelling of my surname is Schwartzkopf. There have also been other mistakes, such as missing the second letter C or replacing the F with an H (that one is particularly common when people try to write it phonetically). I assume the reason for the prevalence of the TZ spelling is that in the English language Z is a soft S sound and you need a T to produce the sharp German Z sound. That certainly makes sense. I know I’m not alone; a lot of people with foreign-sounding names suffer from frequent misspellings. I have become quite sensitive to this issue and I usually try very hard to ensure I spell other people’s names properly, but of course I occasionally fail, too.

But this does not explain the incredible robustness of this TZ error. Cardeña is far from the only person who has made it, and under normal circumstances it would’ve barely registered on my radar. Yet what makes his essay so fascinating is that he manages to spell my name correctly at the bottom of page 9 but then repeatedly misspells it in all subsequent instances. This is in spite of the fact that he spelled it correctly in his own paper (the one this essay discusses), that it is correct in his bibliography, and that he could easily access my article. This reminds me of a dyslexic student in my high school class who baffled us all (especially the teacher) by managing to change the spelling of the same word from one line to the next in his school papers (this was before dyslexia was well known or widely accepted as a condition – it would probably not be as dumbfounding to teachers these days). Cardeña is not a bad speller in general. His dyslexia seems to be Schwarzkopf-specific.

And he’s not alone in that. I singled him out here because it’s the latest example I came across but it would be harsh to lay this at his door. In fact, it is possible that his misspelling started because he quoted the TZ mistake from an email by Hauke Heekeren. This does not excuse his misspelling of my name after that in my book given he had access to the correct spelling – but it certainly proves he isn’t alone. Heekeren is German (I think) so he doesn’t have the language excuse either. How did he manage to misspell what is essentially two common German words? But it doesn’t stop there. I’ve also had my name misspelled in this way by a former employer who decided to acknowledge me in a paper they published. I worked with that person for over a year and published papers with them. You’d think they would know how to spell my name but at the very least you’d think they’d look it up before putting it in writing.

The general language excuse is also not that valid, for statistical reasons. I am sure there are people with the same name spelled with a T but I don’t know any. I don’t know which spelling is more frequent but the T-spelling certainly has far less exposure. Schwarzkopf (with the correct spelling :P) is the name of an international brand of hair care products (no relation, and none of the proceeds go my way, unfortunately). People should see that all the time. Schwarzkopf was also the surname of “Stormin’ Norman”, the coalition commander in the first Iraq War. At least people in the United States were relatively frequently exposed to his name for some time.

So what is it about people consistently misspelling my name despite better knowledge? Is there some sort of cognitive or even perceptual bias happening here? Can we test this experimentally? If you have an idea how to do that, let me know.

Whilst mainly a coffee drinker, occasionally there is also tea in Schwarzkopf

(* Thanks to Leonid Schneider and UK Skeptics for drawing my attention to this article)

Does correlation imply prediction?

TL;DR: Leave-one-out cross-validation is a bad way for testing the predictive power of linear correlation/regression.

Correlation and regression analyses are popular tools in neuroscience and psychology research for analysing individual differences. They fit a model (most typically a linear relationship between two measures) to infer whether the variability in one measure is related to the variability in another. Revealing such relationships can help us understand the underlying mechanisms. We and others have used this approach in previous studies to test specific mechanistic hypotheses linking brain structure/function and behaviour. It also forms the backbone of twin studies of heritability that in turn can implicate genetic and experiential factors in some trait. Most importantly, in my personal view individual differences are interesting because they acknowledge the fact that every human being is unique, rather than simply treating variability as noise and averaging across large groups of people.

But typically every report of a correlational finding will be followed by someone zealously pointing out that “Correlation does not imply causation”. And doubtless it is very important to keep that in mind. A statistical association between two variables may simply reflect the fact that they are both related to a third, unknown factor or a correlation may just be a fluke.

Another problem is that the titles of studies using correlation analysis sometimes use what I like to call “smooth narrative” style. Saying that some behaviour is “predicted by” or “depends on” some brain measure makes for far more accessible and interesting reading than dryly talking about statistical correlations. However, it doesn’t sit well with a lot of people, in part because such language may imply a causal link that the results don’t actually support. Jack Gallant seems to regularly point out on Twitter that the term “prediction” should only ever be used when a predictive model is built on one data set but its validity is tested on an independent data set.

Recently I came across an interesting PubPeer thread debating this question. In it, one commenter pointed out that the title of the study under discussion, “V1 surface size predicts GABA concentration”, was unjustified because this relationship explains only about 7% of the variance when using a leave-one-out cross-validation procedure. In this procedure, all data points except one are used to fit the regression and the final point is then used to evaluate the fit of the model. This is repeated n-fold, using every data point as evaluation data once.
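For concreteness, here is what that procedure looks like for simple linear regression. This is a generic Python sketch, not the code from the study under discussion; variance explained is computed as the squared correlation between the left-out predictions and the observed values:

```python
import random
import math

def loo_r2(x, y):
    """Leave-one-out cross-validation for simple linear regression:
    fit on n-1 points, predict the held-out point, and return the
    squared correlation between LOO predictions and observations."""
    n = len(x)
    preds = []
    for i in range(n):
        xs = [x[j] for j in range(n) if j != i]
        ys = [y[j] for j in range(n) if j != i]
        mx, my = sum(xs) / (n - 1), sum(ys) / (n - 1)
        sxx = sum((v - mx) ** 2 for v in xs)
        sxy = sum((xs[k] - mx) * (ys[k] - my) for k in range(n - 1))
        b = sxy / sxx
        preds.append(my + b * (x[i] - mx))  # prediction for the held-out point
    mp, mo = sum(preds) / n, sum(y) / n
    spp = sum((p - mp) ** 2 for p in preds)
    soo = sum((v - mo) ** 2 for v in y)
    spo = sum((preds[k] - mp) * (y[k] - mo) for k in range(n))
    return spo ** 2 / (spp * soo)

# synthetic example: two measures correlated at rho = 0.9, n = 50
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(50)]
y = [0.9 * xi + math.sqrt(1 - 0.9 ** 2) * rng.gauss(0, 1) for xi in x]
print(loo_r2(x, y))
```
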

Taken at face value this approach sounds very appealing because it uses independent data for making predictions and for testing them. Replication is a cornerstone of science, and in some respects cross-validation is an internal replication. So surely this is a great idea? Naive as I am, I have long had a strong affinity for it.

Cross-validation underestimates predictive power

But not so fast. These notions fail to address two important issues (both of which some commenters on that thread already pointed out): first, it is unclear what amount of variance a model should explain to be important. 7% is not very much but it can nevertheless be of substantial theoretical value. The amount of variance that can realistically be explained by any model is limited by the noise in the data that arises from measurement error or other distortions. So in fact many studies using cross-validation to estimate the variance explained by some models (often in the context of model comparison) instead report the amount of explainable variance accounted for by the model. To derive this one must first estimate the noise ceiling, that is, the realistic maximum of variance that can possibly be explained. This depends on the univariate variability of the measures themselves.

Second, the cross-validation approach is based on the assumption that the observed sample, which is then subdivided into model-fitting and evaluation sets, is a good representation of the population parameters the analysis is attempting to infer. As such, the cross-validation estimate also comes with an error (this issue is also discussed by another blog post mentioned in that discussion thread). What we are usually interested in when we conduct scientific studies is to make an inference about the whole population, say a conclusion that can be broadly generalised to any human brain, not just the handful of undergraduate students included in our experiments. This does not really fit the logic of cross-validation because the evaluation is by definition only based on the same sample we collected.

Because I am a filthy, theory-challenged experimentalist, I decided to simulate this (and I apologise to all my Bayesian friends for yet again conditioning on the truth here…). For a range of sample sizes between n=3 and n=300, I drew a sample from a population with a fixed correlation of rho=0.7 and performed leave-one-out cross-validation to quantify the variance explained by it (using the squared correlation between predicted and observed values). I also performed a standard regression analysis and quantified the variance explained by that. At each sample size I did this 1000 times and then calculated the mean variance explained for each approach. Here are the results:


What is immediately clear is that the results strongly depend on the sample size. Let’s begin with the blue line. This represents the variance explained by the standard regression analysis on the whole observed sample. The dotted, black, horizontal line denotes the true effect size, that is, the variance explained by the population correlation (so R^2=49%). The blue line starts off well above the true effect but then converges on it. This means that at small sample sizes, especially below n=10, the observed sample inflates how much variance is explained by the fitted model.

Next look at the red line. This denotes the variance explained by the leave-one-out cross-validation procedure. It also starts off above the true population effect and follows the decline of the observed correlation. But then it actually undershoots and goes well below the true effect size. Only then does it gradually increase again and converge on the true effect. So at the sample sizes most realistic in individual differences research, n=20-100ish, this cross-validation approach underestimates how much variance a regression model can explain and thus in fact undervalues the predictive power of the model.
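The undershoot relative to the whole-sample fit is easy to reproduce with a stripped-down Python version of a single cell of this simulation (a sketch, not my original code; n = 20 and 500 repeats instead of 1000, to keep it quick):

```python
import random
import math

RHO, N, SIMS = 0.7, 20, 500
rng = random.Random(3)

def corr(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((a - mv) ** 2 for a in v)
    suv = sum((u[i] - mu) * (v[i] - mv) for i in range(n))
    return suv / math.sqrt(suu * svv)

in_r2, cv_r2 = [], []
for _ in range(SIMS):
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [RHO * xi + math.sqrt(1 - RHO ** 2) * rng.gauss(0, 1) for xi in x]
    in_r2.append(corr(x, y) ** 2)  # standard regression on the whole sample
    preds = []
    for i in range(N):  # leave-one-out predictions
        xs = [x[j] for j in range(N) if j != i]
        ys = [y[j] for j in range(N) if j != i]
        mx, my = sum(xs) / (N - 1), sum(ys) / (N - 1)
        b = sum((xs[k] - mx) * (ys[k] - my) for k in range(N - 1)) \
            / sum((v - mx) ** 2 for v in xs)
        preds.append(my + b * (x[i] - mx))
    cv_r2.append(corr(preds, y) ** 2)

# mean variance explained: whole-sample fit vs cross-validation
print(sum(in_r2) / SIMS, sum(cv_r2) / SIMS)
```

At n = 20 the cross-validated mean comes out below the whole-sample mean, which is the gap between the red and blue curves described above.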

The error bars in this plot denote +/- 1 standard deviation across the simulations at each sample size. So as one would expect, the variability across simulations is considerable when sample size is small, especially when n <= 10. These sample sizes are maybe unusually small but certainly not unrealistically small. I have seen publications calculating correlations on such small samples. The good news here is that even with such small samples on average the effect may not be inflated massively (let’s assume for the moment that publication bias or p-hacking etc are not an issue). However, cross-validation is not reliable under these conditions.

A correlation of rho=0.7 is unusually strong for most research. So I repeated this simulation analysis using a perhaps more realistic effect size of rho=0.3. Here is the plot:

Now we see a hint of something fascinating: the variance explained by the cross-validation approach actually subtly exceeds that of the observed sample correlation. They again converge on the true population level of 9% when the sample size reaches n=50. Actually there is again an undershoot but it is negligible. But at least for small samples with n <= 10 the cross-validation certainly doesn’t perform any better than the observed correlation. Both massively overestimate the effect size.

When the null hypothesis is true…

So if this is what happens at a reasonably realistic rho=0.3, what about when the null hypothesis is true? This is what is shown here (I apologise for the error bars extending into the impossible negative range, but I’m too lazy to add that contingency to the code…):


The problem we saw hinted at above for rho=0.3 is exacerbated here. As before, the variance explained for the observed sample correlation is considerably inflated when sample size is small. However, for the cross-validated result the situation is much worse. Even at a sample size of n=300 the variance explained by the cross-validation is greater than 10%. If you read the PubPeer discussion I mentioned, you’ll see that I discussed this issue. This result occurs because when the null hypothesis is true – or the true effect is very weak – the cross-validation will produce significant correlations between the inadequately fitted model predictions and the actual observed values. These correlations can be positive or negative (that is, the predictions systematically go in the wrong direction), but because the variance explained is calculated by squaring the correlation coefficient they turn into numbers substantially greater than 0%.
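The sign-flipping issue can be demonstrated directly with a small Python sketch under the null hypothesis (n = 20 per simulated sample is an arbitrary choice):

```python
import random
import math

N, SIMS = 20, 500
rng = random.Random(4)

def corr(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((a - mv) ** 2 for a in v)
    suv = sum((u[i] - mu) * (v[i] - mv) for i in range(n))
    return suv / math.sqrt(suu * svv)

negative, r2s = 0, []
for _ in range(SIMS):
    x = [rng.gauss(0, 1) for _ in range(N)]
    y = [rng.gauss(0, 1) for _ in range(N)]  # null: no true relationship
    preds = []
    for i in range(N):  # leave-one-out predictions
        xs = [x[j] for j in range(N) if j != i]
        ys = [y[j] for j in range(N) if j != i]
        mx, my = sum(xs) / (N - 1), sum(ys) / (N - 1)
        b = sum((xs[k] - mx) * (ys[k] - my) for k in range(N - 1)) \
            / sum((v - mx) ** 2 for v in xs)
        preds.append(my + b * (x[i] - mx))
    r = corr(preds, y)
    if r < 0:
        negative += 1
    r2s.append(r * r)

# many prediction-observation correlations are negative, but squaring
# them all converts this into ostensibly "explained" variance
print(negative / SIMS, sum(r2s) / SIMS)
```

A large share of the correlations come out negative (the predictions point the wrong way), yet squaring them yields a mean "variance explained" well above 0%.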

As I discussed in that thread, there is another way to calculate the variance explained by the cross-validation. I won’t go into detail on this but unlike the simpler approach I employed here this does not limit the variance explained to fall between 0-100%. While the estimates are numerically different, the pattern of results is qualitatively the same. At smaller sample sizes the variance explained by cross-validation systematically underestimates the true variance explained.

When the interocular traumatic test is significant…

My last example is the opposite scenario. While we already looked at an unusually strong correlation, I decided to also simulate a case where the effect should be blatantly obvious. Here rho=0.9:


Unsurprisingly, the results are similar to those seen for rho=0.7, but now the observed correlation is already doing a pretty decent job of reaching the nominal level of 81% variance explained. Still, the cross-validation underperforms at small sample sizes. In this situation, that actually seems to be a problem. It is rare that one would observe a correlation of this magnitude in the psychological or biological sciences, but if one did, chances are good that the sample size would be small. Often the reason may simply be that correlation estimates are inflated at small sample sizes, but that’s not the point here. The point is that leave-one-out cross-validation won’t tell you. It underestimates the association even if it is real.

Where does all this leave us?

It is not my intention to rule out cross-validation. It can be a good approach for testing models and is often used successfully in the context of model comparison or classification analysis. In fact, as the debate about circular inference in neuroscience a few years ago illustrated, there are situations where it is essential that independent data are used. Cross-validation is a great way to deal with overfitting. Just don’t let yourself be misled into believing it can tell you something it doesn’t. I know it is superficially appealing and I had played with it previously for just that reason – but this exercise has convinced me that it’s not as bullet-proof as one might think.

Obviously, validation of a model with independent data is a great idea. A good approach is to collect a whole independent replication sample but this is expensive and may not always be feasible. Also, if a direct replication is performed it seems better that this is acquired independently by different researchers. A collaborative project could do this in which each group uses the data acquired by the other group to test their predictive model. But that again is not something that is likely to become regular practice anytime soon.

In the meantime, we can also remember that performing typical statistical inference is a good approach after all. Its whole point is to infer the properties of the whole population from a sample. When used properly, it tends to do a good job at that. Obviously, we should take measures to improve its validity, such as increasing power by using larger samples and/or better measurements. I know I am baysed, but Bayesian hypothesis tests seem to be better at ensuring validity than traditional significance testing. Registered Reports can probably also help and certainly should reduce the skew caused by publication bias and flexible analyses.

Wrapping up

So, does correlation imply prediction? I think so. Statistically this is precisely what it does. It uses one measure (or multiple measures) to make predictions of another measure. The key point is not whether calling it a prediction is valid but whether the prediction is sufficiently accurate to be important. The answer to this question actually depends considerably on what we are trying to do. A correlation explaining 10-20% of the variance in a small sample is not going to be a clear biomarker for anything. I sure as hell wouldn’t want any medical or judicial decisions to be based solely on such an association. But it may very well be very informative about mechanisms. It is a clearly detectable effect even with the naked eye.

In the context of these analyses, a better way than quantifying the variance explained is to calculate the root mean squared deviation (essentially the error bar) of the prediction. This provides a much more direct index of how accurately one variable predicts another. The next step – and I know I sound like a broken record – should be to confirm that these effects are actually scientifically plausible. This mantra is as true for individual differences research as it is for Bem’s precognition and social priming experiments, where I have mentioned it before. Are the differences in neural transmission speed or neurotransmitter concentration implied by these correlation results realistic, based on what we know about the brain? These are the kinds of predictions we should actually care about in these discussions.
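In code, the root mean squared deviation index mentioned above is trivial to compute; here is a minimal sketch with made-up numbers (the units are whatever units the predicted measure itself is in):

```python
import math

def rmse(preds, obs):
    """Root mean squared deviation of predictions from observations,
    expressed in the units of the measure itself."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, obs)) / len(obs))

# hypothetical predicted vs measured values (arbitrary units)
print(rmse([1.2, 1.5, 1.1], [1.0, 1.6, 1.3]))  # about 0.173
```

Unlike a squared correlation, this tells you directly how far off the predictions are, on the scale of the measure you care about.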


The Objectivity Illusion

The other day I posted an angry rant opinion about this whole sorry “trouble with girls” debacle. Don’t worry, I won’t write about this bullshitstorm “debate” in any further detail. There just wouldn’t be any point. I will, however, write a general note about perception and debates.

As a psychic (debating with psi people is contagious), I knew from the start that the reaction to my post would serve as a perfect example of what I was talking about. Or perhaps I planned this all along? It was an online experiment to show how any given piece of speech can be understood in at least as many different ways as there are people listening. (Actually, if that were true, Chris Chambers and Dorothy Bishop would call this HARKing and tell me I should have preregistered my hypothesis – so I can’t claim to have predicted this really :/)

In all seriousness though, the reactions – including the single commenter on my post – illustrate how people can take just about anything from what you say. People hear what they want to hear – even if they really don’t like what they’re hearing. While my post expressed no endorsement or defense of any one side in that debate, certain readers immediately jumped to conclusions based on their entrenched philosophical/political stance. I obviously have an opinion on this affair and repeatedly explained that I wouldn’t state it. Unlike the brainless jokers and paranoid nutcases who populate both sides of this Twitter fight (some notable exceptions notwithstanding), my opinion on this is a bit more complex, so I’d be here all day – and I seriously have no appetite for this.

My post wasn’t about that though. It was about the idiocy and total irrelevance of whether Tim Hunt did or did not utter certain words in his speech. It stressed the pointlessness of arguing over who “lied” about their account of things when there is no factual record to compare it to and no tangible evidence to prove that somebody was deliberately distorting the truth. These things are pointless because they really don’t matter one iota and don’t address the actual issues.

As I discussed previously, our view of the world and our reactions to it are inherently biased. This is completely normal and defines the human condition. I don’t think we can entirely overcome these biases – and it isn’t even a given that this would be a good thing. The kinds of perceptual biases my colleagues and I study in the lab (things like this) exist for good reasons – even if the reasons remain in many cases highly controversial. They could reflect our statistical experience of the environment. Alternatively (and this may in fact not be mutually exclusive) they could be determined by very fundamental processes in the brain that backfire when they encounter these particular situations. In this way, perceptual illusions reveal the hidden workings by which your brain makes sense of the world.

Discussions and catfights – about climate change, gun control, religious liberty, psi research, Bayes vs frequentism, or the comments made by certain retired professors – are no different. Social media makes them particularly vitriolic and incendiary. I don’t know whether social media actually makes them worse or just makes the worst more visible. Either way, fights like this are characterised by the same kinds of biases that distort all other perception and behaviour. People are amazingly resistant to factual evidence. You can show somebody very clear data refuting their preconceived notions and yet they won’t budge. It may even drive them deeper into their prior beliefs. Perhaps there is some sort of Bayesian explanation for this phenomenon – if so, I’d like to hear it. Anyway, if there is one thing you can trust, it is that objectivity is an illusion.

Now as I’ve said, such cognitive and perceptual biases are normal and can’t be prevented. But I think all is not lost. I do believe they can be counteracted – to some extent at least – if we remain vigilant of them. We may even make them work for us instead of being swayed by them. I am wondering about ways to achieve that. Any ideas are welcome, I’d be happy to chat about this. Here is the first principle though according to the (biased) worldview of Sam:

If anyone tells you that they are objective, that their account is “investigative” or “forensic” or “factual,” or if they tell you outright that the other side is lying, then it doesn’t matter who they are or what credentials they may have. They are blinkered fools, they are wrong by definition, and they don’t deserve a second of your time.