
Of unspeakable unrepeatables…

(7th Aug 2015: I edited the section ‘Unsupported assumptions’ because I realised my earlier comments didn’t make sense)

Fate (or, more likely, coincidence) would have it that soon after my previous post, the first issue of my “Journal of Disbelief”, someone wrote a long post about social priming. This very long and detailed post by Michael Ramscar is well worth a read. In it he discusses the question of failed replications and goes into particular depth on an alternative explanation for why Bargh’s elderly priming experiments failed to replicate. He also criticises the replication attempt. Since I discuss both these studies in detail (in my last post and several others) and because it pertains to my general skepticism of social priming, I decided to respond. I’d have done it on his blog but that doesn’t seem to be open for comments – so here I go instead:

With regard to replication, he argues that a lot of effort is essentially wasted on repeating these priming studies, effort that could be put to better use advancing scientific knowledge. I certainly agree with this notion to some extent and I have argued similar points in the past. He then carries out a seemingly (I say this because I have neither the time nor the expertise to verify it) impressive linguistic analysis suggesting that the elderly priming study was unlikely to be replicated so many years after the original study because the use of language has undoubtedly evolved since the late 80s/early 90s when the original study was conducted (in fact, as he points out, many of the participants in the replication study were not even born when the original one was done). His argument is essentially that the words used to prime the “schema of old age” in Bargh’s original study could no longer be effective as primes.

Ramscar further points out that the replication attempt was conducted on French-speaking participants and goes to some lengths to show that French words are unlikely to exert the same effect. This difference could very well be of great importance, and in fact there may be very many reasons why two populations from which we sample our participants are not comparable (problems that may be particularly serious when trying to generalise results from Western populations to people with very different lifestyles and environments, like native tribes in Sub-Saharan Africa or the Amazon etc.). This, however, ignores the fact that there have been other replication attempts of this experiment that also failed to show this elderly priming effect. I am aware of at least one that was done on English-speaking participants. However, as this was also done only a few years ago, the linguistic point presumably still stands.

How about No?

The first thought I had when reading Ramscar’s hypothesis about how elderly priming works and why we shouldn’t expect it to replicate in modern samples was that this sounds like the complicated explanatory handwaving that people all too often engage in when their “experiment didn’t work” (meaning that their hypothesis wasn’t confirmed). I often encounter this when marking student project reports, but it would be grossly unfair to make this out to be a problem specific to students. Rather, you can often see it even in very successful, tenured researchers. While some of this behaviour is probably natural (which explains why so many students write things like this), I think the main blame lies with how we train students to approach their results, both in word and in action. The problem is nicely summarised by the title of James Alcock’s paper “Give the Null Hypothesis a Chance”. While he wrote this as a critique of Psi research (which I may or may not cover in a future issue of the Journal of Disbelief – I kind of feel I’ve written enough about Psi…), I think it would serve us all well to remember that sometimes our beautifully crafted hypotheses may simply not be correct. To me this is also the issue with social priming.

Now, I get the impression Michael Ramscar does not necessarily believe that this linguistic account is the only explanation for the failure to replicate Bargh’s finding. He may very well accept that the result may have been a fluke. I am also being vastly unjust in likening his detailed post to “handwaving”. Considering the lengths he goes to in producing data about word frequencies, his post is anything but handwavy. But whether or not it is entirely serious, it is out there and deserves some additional thought.

The way I see it, the linguistic explanation is based on a whole host of assumptions that probably do not hold. As I said in my previous post, we need more Occam’s Razor. It is often mischaracterised as “the simplest explanation is usually correct”, but what it really states (at least in my interpretation) is that the explanation that requires the smallest number of assumptions whilst producing the maximal explanatory power is probably closest to the truth. The null hypothesis that there is no such thing as social priming (or that, if it exists, it is much, much weaker than these underpowered experiments could possibly hope to detect) seems to me far more likely than the complex explanation posited in Ramscar’s post.

Unsupported assumptions

Why should we expect the words most frequently associated with old age (which he argues is – or rather was – the word ‘old’) to produce the strongest age priming effect? Couldn’t frequent exposure just as well lead to habituation? The most effective words for priming the old age schema may be more obscure ones that nonetheless strongly evoke thoughts about the elderly. I agree that even in the US ‘Florida’ and ‘bingo’ don’t necessarily cut it in that respect (and my guess is outside the US ‘Florida’ mainly evokes images of beaches and palm trees and possibly cheesy 80s cop dramas). Others, like ‘retired’ and ‘grey’, very well might, though. And words like ‘rigid’ could very well evoke the concept of slowness. The idea that word frequency determines the strength of such priming effects is a largely unsupported assumption.

Another possibility is that the aggregate effect of the 30 primes is highly non-linear. By this I mean that the combined effect may be a lot more than the average (or even the sum) of the individual priming effects. To me it actually seems quite likely that any activation of the concept of old age would only gradually build up over the course of the experiment. So essentially, the word ‘old’ may have no effect on its own, but in combination with all the other words it might clearly evoke the schema. On the other hand, I find it quite hard to fathom that one little word in each of thirty sentences – sentences that by themselves may be completely unrelated to old age – will produce a very noticeable effect on an irrelevant behaviour after the experiment is over.

Implausible hypothesis

The discussion of semantic priming effects, such as the finding that reading or hearing the word ‘doctor’ might make you more likely to think of ‘nurse’ than of ‘commissar’, perfectly illustrates the reasons I described in my last post for why I think social priming hypotheses (at least the more fanciful ones based on abstract ideas and folk sayings) are highly implausible. How strong are semantic priming effects? How likely do you think it is that a social priming effect like that shown by Bargh could be even half that strong? Surely there must be numerous additional factors that exert their tugs and pulls on your unsuspecting mind, many of which must be stronger than the effect of some words you form into a sentence. I realise that these noise factors should cancel out with a sufficiently large data set – but they add variability that dilutes the effect size we can realistically expect from such manipulations.
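The dilution point can be put in numbers. The sketch below is a simplified illustration, assuming that the priming effect is a fixed shift in some outcome and that every other influence (mood, fitness, time of day…) acts as an independent, additive source of noise. Under those assumptions the standardised effect size (Cohen’s d) is the raw effect divided by the combined standard deviation. The function name `standardised_effect` and all numbers are hypothetical.

```python
import math


def standardised_effect(raw_effect, noise_sds):
    """Cohen's d for a fixed raw effect when several independent noise
    sources each add their own variance to the outcome.

    raw_effect -- true difference between conditions, in outcome units
    noise_sds  -- standard deviations of the independent noise sources
    """
    # Independent variances add, so the total SD is the root of their sum.
    total_sd = math.sqrt(sum(sd ** 2 for sd in noise_sds))
    return raw_effect / total_sd


# Hypothetical numbers: the same raw effect of 0.5 units...
print(standardised_effect(0.5, [1.0]))                 # one noise source: d = 0.5
print(standardised_effect(0.5, [1.0, 1.0, 1.0, 1.0]))  # four sources: d = 0.25
```

Quadrupling the total noise variance halves d even though the raw effect is unchanged. Testing more subjects narrows the uncertainty around that smaller d, but it does not bring back the larger one.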

In my opinion the major problem with the whole theoretical idea behind social priming is that it just seems rather unfathomable to me that human beings and human society could function at all if the effects were as strong, as long-lasting and as simple as much of this research claims. I don’t buy the idea that these effects can only be produced reliably under laboratory conditions and only by skilled researchers. I know I can’t hope to replicate a particle physics experiment, both for lack of lab equipment and lack of expertise. I can believe that conducting social priming (or, while I’m at it again, Psi) experiments may require some experience. However, if these effects are as real and as obvious as these researchers claim, they should be easier to replicate than something that requires years of practical training, thorough knowledge of maths and theoretical physics, and a multi-billion-dollar hadron collider. Psychology labs may reduce some noise in human behaviour compared to, say, doing your experiments on street corners, but they are not dust-free rooms or telescopes outside the Earth’s atmosphere. The subjects who come into your lab remain heavily contaminated with all the baggage and noisiness that only the human mind is capable of. If effects as impressive as those in the social priming literature were real, human beings should bounce around the world as if they were inside a pinball machine.

So … what about replications again?

Finally, as I said, I kind of agree with Ramscar about replication attempts. I think many direct replications are valuable for establishing the robustness of some effects. I am not sure that it really makes sense to repeat the elderly priming experiments, though. Not that I don’t appreciate Doyen et al.’s and Hal Pashler’s attempts to replicate this experiment. However, as I have tried to argue, the concepts in many social priming studies are simply so vague and complex that one probably can’t learn all that much from them. I entirely accept Ramscar’s point that different times, different populations, and (most critically) different languages might make an enormous difference here. The possible familiarity of research subjects with the original experiment may further complicate matters. And unless the experimenters are suitably blinded to the experimental condition of each participant (which I don’t think is always the case), there may be further problems with demand effects etc.

Many of the rebuttals by the original authors of social priming studies that failed to replicate seem to revolve around the point that, while specific experiments don’t replicate, this does not mean the whole theory is invalid. There have been numerous findings of social priming in the literature. However, even I, having written extensively about what I think is misguided about the current wave of replication attempts, would say that the sheer number of failed social priming replications should pose a serious problem for advocates of that theory.

But this is where I think social psychologists need to do better. Rather than more direct replications of social priming, we need more conceptual replication attempts that try to directly address this question:

Can social priming effects of this magnitude be real?

I don’t believe that they can, but perhaps I am too skeptical. I can only tell you that I won’t be convinced of the existence of social priming (or Psi) by yet more underpowered, possibly p-hacked studies by researchers who just know how to get these effects. Especially not if the effects are so large that they seem vastly incompatible with the way our behaviour appears to work. Maybe I am relying more on my gut than on my brain here, but when faced with the choice between a complex theory based on numerous (typically post hoc) assumptions and the notion that these effects just don’t exist, I know which I’d choose…

This is your brain on social priming (image shamelessly harvested from elsewhere)

The Journal of Disbelief: I. Social Priming

Unless you have been living on the Moon in recent years, you will have heard about the replication crisis in science. Some would have you believe that this problem is specific to psychology and neuroscience, but similar discussions also plague other research areas, from drug development and stem cell research to high energy physics. However, since psychology deals with flaky, hard-to-define things like the human mind and the myriad behaviours it can produce, it is unsurprising that reproducibility is perhaps a greater challenge in this field. Unlike in, say, an optics experiment (I assume; I may be wrong about that), there are just too many factors at play to conduct a controlled experiment that produces clear-cut results.

Science is based on the notion that the world, and the basic laws that govern it, are for the most part deterministic. If you have situation A under condition X, you should get a systematic change to situation B when you change the condition to Y. The difference may be small and there may be a lot of noise meaning that the distinction between situations A and B (or between conditions X and Y for that matter) isn’t always very clear-cut. Nevertheless, our underlying premise remains that there is a regularity that we can reveal provided we have sufficiently exact measurement tools and are able and willing to repeat the experiment a sufficiently large number of times. Without causality as we know it, that is the assumption that a manipulation will produce a certain effect, scientific inquiry just wouldn’t work.

There is absolutely nothing wrong with wanting to study complicated phenomena like human thought or behaviour. Some people seem to think that if you can’t study every minute detail of a complex system like the brain and don’t understand what every cell and ion channel is doing at any given time, you have no hope of revealing anything meaningful about the larger system. This is nonsense. Phenomena exist at multiple scales and different levels, and a good understanding of all the details is not a prerequisite for understanding some of the broader aspects, which may in fact be more than the sum of their parts. So by all means we should be free to investigate lofty hypotheses about consciousness, about how cognition influences perception (and vice versa), or about whether seemingly complex attributes like political alignment or choice of clothing relate to simple behaviours and traits. But when we do, we should always keep in mind the complexity of the system we study and whether the result is even plausible under our hypothesis.

Whatever you may feel about Bayesian statistics, the way I see it there should be a Bayesian approach to scientific reasoning. Start with an educated guess about what effects one might realistically expect under a range of different hypotheses. Then see under which hypothesis (or hypotheses) the observed results are most likely. In my view, a lot of scientific claims fall flat in that respect. Note that this doesn’t necessarily mean that these claims aren’t true – it’s just that the evidence for them so far has been far from convincing, and I will try to explain why I feel that way. Nor do I claim to be immune to this delusion. Some of my own hypotheses are probably also scientifically implausible. The whole point of research is to work this out and arrive at the most parsimonious explanation for any effect. We need more Occam’s Razor and less of the Sherlock Holmes Principle (“when you have eliminated the impossible, whatever remains, however improbable, must be the truth”).
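A toy version of that reasoning might look like the sketch below. It assumes a handful of point hypotheses about the true effect size, each with a prior probability you would have to choose yourself, and it uses the large-sample normal approximation to the standard error of Cohen’s d for two equal groups, √(2/n). The function name, the hypotheses and the priors are all hypothetical illustrations, not a recommended analysis.

```python
import math
from statistics import NormalDist


def posterior_over_hypotheses(observed_d, n_per_group, hypotheses, priors):
    """Posterior probability of each point hypothesis about the true d.

    observed_d  -- observed standardised effect (Cohen's d)
    n_per_group -- participants per group (two equal groups assumed)
    hypotheses  -- dict: name -> hypothesised true d
    priors      -- dict: name -> prior probability (same keys)
    """
    # Large-sample approximation to the standard error of d; it ignores the
    # small extra term that grows with d itself.
    se = math.sqrt(2 / n_per_group)
    # Prior times likelihood of the observed d under each hypothesis.
    weighted = {
        name: priors[name] * NormalDist(mu=true_d, sigma=se).pdf(observed_d)
        for name, true_d in hypotheses.items()
    }
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


# Hypothetical example: a sceptical prior that puts little weight on a big effect.
hypotheses = {"no effect": 0.0, "tiny effect": 0.1, "large effect": 0.8}
priors = {"no effect": 0.45, "tiny effect": 0.45, "large effect": 0.10}
print(posterior_over_hypotheses(0.8, 15, hypotheses, priors))
```

With only 15 participants per group the standard error is large (about 0.37), so even an observed d of 0.8 moves a sceptical prior only part of the way; with a much larger sample the data would dominate the priors.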

The Sherlock Holmes Principle should not be confused with the Benedict Cumberbatch Principle: The more socially awkward, ingenious, criminally insane or dragon-like a character in a movie/television show, the closer to one is the probability that he/she/it will be portrayed by Benedict Cumberbatch.

So my next few posts will be about scientific claims and ideas I just can’t believe. I was going to post them as a list, but it turns out there is so much material that I think it’s better to spread it across several posts…

Part I: Social Priming

If you have read any of my blog posts before, you will know that I am somewhat critical of the current mainstream (or is it only a vocal minority?) of direct replication advocates. Note that this does not make me an opponent of replication attempts. Replication is a cornerstone of science and it should be encouraged. I am not sure there are many people who actually disagree with that point and who truly regard the currently fashionable efforts as the work of a “replication police”. All I have been saying in the past is that the way we approach replication could be better. In my opinion a good replication attempt should come with a sanity check or some control condition/analysis that can provide evidence that the experiments were done properly and that the data are sound. Ideally, I want the distinction between replications and original research to be much blurrier than it currently is. The way I see it, both direct and indirect replications should be done regularly as part of most original research. I know that this is not always feasible, and when you simply fail to replicate some previous finding and also don’t support your new hypothesis, you should obviously still be permitted to publish (certainly if you can demonstrate that the experiment worked and you didn’t just produce rubbish data).

Some reasonably credible claims

But this post isn’t about replication. Whether or not the replication attempts have been optimal, the sheer number of failed social priming (I may be using this term loosely) replications is casting serious doubt on that field. The reason for this, in my mind, is that it contains so many perfect examples of implausible results. This presumably does not apply to all such claims. I am quite happy to believe that some seemingly innocuous social cues can influence human behaviour, even though I don’t know if they have actually been tested with scientific rigour. For example, there have been claims that putting little targets into urinals, such as a picture of a fly or a tiny football (or for my American readers “soccer”) goal with a swinging ball, reduces the amount of urine splattered all over the bathroom floor. I can see how this might work, if only from anecdotal self-observation (not that I pee all over the bathroom floor without that, mind you). I can also believe that when an online sales website tells you “Only two left in stock!” it makes you more likely to buy then and there. Perhaps somewhat less plausible but still credible is the notion that playing classical music in London Underground stations reduces anti-social behaviour because some unsavoury characters don’t like that ambience.

An untrue (because fake) but potentially credible claim

While on the topic of train stations, the idea that people are more prone to racist stereotyping when they are in messy environments does not seem entirely far-fetched to me. This is perhaps somewhat ironic because this is one of Diederik Stapel’s infamous fraudulent research claims. I don’t know if anyone has tried to carry out that research for real. It might very well not be a real effect at all, but I could easily see how messy, graffiti-covered environments could trigger a cascade of all sorts of reactions and feelings that in turn influence your perception and behaviour to some degree. If it is real, the effect will probably be very small at the level of the general population because what one regards as an unpleasantly messy environment (and how prone one is to stereotyping) will probably differ substantially between people and between different contexts, such as the general location or the time. For instance, a train station in London is probably not perceived the same way as one in the Netherlands (trust me on that one…), and a messy environment after carnival or another street party is probably not viewed in the same light as on other days. All these factors must contribute “noise” to the estimate of the population effect size.

The problem of asymmetric noise

However, this example already hints at the larger problem with determining whether or not the results from most social priming research are credible. It is possible that some effect size estimates will be stronger in the original test conditions than they will be in large-scale replication attempts. I would suspect this for most reported effects in the literature even under the best conditions and with pre-registered protocols. Usually your subject sample does not generalise perfectly to the world population or even the wider population of your country. It is perhaps impossible to completely abolish the self-selection bias induced by those people who choose to participate in research.

Sometimes the experimenter’s selection bias even makes sense, as it helps to reduce noise and thus enhances the sensitivity of the experimental paradigm. For instance, in our own research using fMRI retinotopic mapping we are keen to scan people whom we know to be “good fMRI subjects”: people who can lie still inside the scanner and fixate perfectly for prolonged periods of time without falling asleep too much (actually even the best subjects suffer from that flaw…). If you scan someone who jitters and twitches all the time and who can’t keep their eyes open during the (admittedly dull) experiments, you can’t be surprised if your data turn out to be crap. This doesn’t tell you anything about the true effect size in the typical brain, only that it is much harder to obtain these measurements from the broader population. The same applies to psychophysics. A trained observer will produce beautifully smooth sigmoidal curves in whatever experiment you have them do. In contrast, randomers off the street will often give you a zigzag from which you can estimate thresholds only with the most generous imagination. It would be idiotic to regard the latter as a more precise measurement of human behaviour. The only thing we can say is that testing a broader sample can give you greater confidence that the result truly generalises.

Turning the hands of time

Could it perhaps be possible that some of the more bizarre social priming effects are “real” in the same sense? They may just be much weaker than originally reported because in the wider population these effects are increasingly diluted by asymmetric noise factors. However, I find it hard to see how this argument could apply to many social priming claims. What are the clean, controlled laboratory conditions that assure the most accurate measurement and strongest signal-to-noise ratio in such experiments? Take for instance the finding that people become more or less “open to new experiences” (but see this) depending on the direction they turn a crank/cylinder/turn-table. How likely is it that effects like those reported (Cohen’s d mostly between roughly 0.3 and 0.7) will be observed even if they are real? It seems to me that there are countless factors that will affect a person’s preference for familiar or novel experiences. Presumably, if the effect of clockwise or anticlockwise (or for Americans: counterclockwise) rotation exists, it should manifest with a lot of individual differences because not everyone will be equally familiar with analogue clocks. I cannot rule out that engaging in or seeing clockwise rotation activates some representation of the future. This could influence people to think about novel things. It might just as well make them anxious: as someone who is perpetually late for appointments, the sight of a ticking clock mainly primes in me the thought that I am running late again. I’d not be surprised if it increased my heart rate, but I’d be pretty surprised if it made me desire unfamiliar items.
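To get a feel for what effect sizes in that range demand, here is a rough sample-size calculation for a two-group comparison. It is a sketch using the normal approximation n ≈ 2((z₁₋α/₂ + z₁₋β)/d)² per group, which slightly underestimates what an exact t-test calculation gives; the conventional α = 0.05 and 80% power are my assumptions, not values taken from the studies.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a standardised
    effect d with a two-sided, two-sample test (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # about 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.3, 0.5, 0.7):
    print(f"d = {d}: {n_per_group(d)} per group")
# d = 0.3: 175 per group
# d = 0.5: 63 per group
# d = 0.7: 33 per group
```

Even at the upper end of the reported range, a properly powered study needs several dozen participants per group, and considerably more if the true effect is smaller than reported.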

Walking slowly because you read the word “Florida”

The same goes for many other social priming claims, many of which have spectacularly failed to be replicated by other researchers. Take Bargh’s finding that priming people with words related to the elderly makes them walk more slowly. I can see how thinking about old age could make you behave more like your stereotype of old people, although at the same time I don’t know why it should. It might just as well have the opposite effect. More importantly, there should be countless other factors that probably have a much stronger influence on your walking speed, such as your general fitness level, the time of day and the activities you were doing before. Another factor will be your excitement about what to do next, for instance whether you are going to work or are about to head to a party. Or, most relevant to my life probably, whether or not you are running late for your next appointment.

Under statistical assumptions we regard all of these factors as noise, that is, random variation in the subject sample. If we test enough subjects the noise should presumably cancel out and the true effect of elderly priming, tiny as it may be, should crystallise. Fair enough. But that does not answer the question of how strong an effect of elderly priming we can realistically expect. I may very well be wrong, but it seems highly improbable to me that such a basic manipulation could produce a difference in average walking speed of one whole second (at least around an eighth of the time it took people on average to walk down the corridor). Even if the effect were really that large, it should be swamped by noise, making it unlikely that it would be statistically significant with a sample size of 15 per group. Rather, the explanation offered by one replication attempt (I’ve written about this several times before) seems more parsimonious: an experimenter effect, whereby whoever was timing the participants consistently misestimated their walking speed depending on which priming condition they believed the participant had been exposed to.
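The worry about 15 participants per group can be checked with a quick Monte Carlo simulation. This is a sketch under assumptions that are mine, not data from Bargh’s study: walking times are normally distributed, the priming effect is exactly 1 second, and the person-to-person standard deviation is a hypothetical 2 seconds (the baseline of 7.5 seconds is a placeholder and does not affect the result). Each simulated experiment is analysed with a two-sided Student’s t-test at α = 0.05; the critical value is hard-coded for 28 degrees of freedom, so the group size is fixed at 15.

```python
import math
import random

N_PER_GROUP = 15
T_CRIT = 2.0484  # two-sided 5% critical value of t with 28 df (15 + 15 - 2)


def mean_and_var(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def simulated_power(effect_s=1.0, sd_s=2.0, n_sims=5000, seed=1):
    """Fraction of simulated experiments reaching p < .05 (two-sided)."""
    rng = random.Random(seed)
    baseline_s = 7.5  # hypothetical average walking time; irrelevant to power
    significant = 0
    for _ in range(n_sims):
        control = [rng.gauss(baseline_s, sd_s) for _ in range(N_PER_GROUP)]
        primed = [rng.gauss(baseline_s + effect_s, sd_s) for _ in range(N_PER_GROUP)]
        m_c, v_c = mean_and_var(control)
        m_p, v_p = mean_and_var(primed)
        pooled_var = (v_c + v_p) / 2  # equal group sizes
        t = (m_p - m_c) / math.sqrt(pooled_var * 2 / N_PER_GROUP)
        if abs(t) > T_CRIT:
            significant += 1
    return significant / n_sims


print(simulated_power())            # 1 s effect, 2 s SD (d = 0.5)
print(simulated_power(effect_s=0))  # no effect: should be close to 0.05
```

Under these assumptions only roughly a quarter of such experiments would reach significance; if the true effect were smaller, or the spread between people larger, the odds would get worse still.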

I should have professor-primed before all my exams

Even more incredible to me is the idea of “professor priming”, in which exposing participants to things that remind them of professors makes them better at answering trivia questions than when they are primed with the concept of “soccer hooligans”, another finding that recently failed to be replicated. What mechanism could possibly explain such a cognitive and behavioural change? I can imagine how being primed to think about hooligans could generate all sorts of thoughts and feelings. They could provoke anxiety and stress responses. Perhaps that could make you perform worse on common knowledge questions. It’s the same mechanism that perhaps underlies stereotype threat effects (incidentally, how do those fare with regard to replicability?). But this wouldn’t explain improvements in people’s performance when primed with professors.

I could see how being primed by hooligans or professors might produce some effects on your perception – perhaps when judging somebody’s character from their face etc. Perhaps you are more likely to think an average-looking person has above-average intelligence when you’re already thinking about professors than when you think about hooligans (although there might just as well be contrast effects and I can’t really predict what should happen). But I find it very hard to fathom how thinking about professors should produce a measurable boost in trivia performance. Again, even if it were real, this effect should be swamped by all sorts of other factors, all of which are likely to exert much greater influence on your ability to answer common knowledge questions. Presumably, performance on common knowledge questions depends first and foremost on how much one actually knows. Retrieving facts you do not have immediate access to may be helped by wakefulness and arousal. It may also help if you’re already “thinking outside the box” (I mean this figuratively – I have this vague suspicion that there is also a social priming study that claims being inside vs outside a physical box has some effect on creative thinking… (I was right – there is such a study)). You may be quicker/better at coming up with unconventional, rarely accessed information when you are already on a broad search than when you are monotonously carrying out a simple, repetitive task. But I don’t see how being primed by professors could activate such a process and produce anything but the tiniest of effects.

Flag-waving sheep

There was also a study that claimed that exposing subjects to a tiny American flag in the corner of the screen while they answered a survey affected their voting behaviour in the presidential election many months later. After all that I have written already, it should strike you as highly unlikely that such an effect could be established reliably. There are multitudes of factors that may influence a person’s voting behaviour, especially within the months between the critical flag experiment and election day. Surely the thousands of stars and stripes that any person residing in the United States would be exposed to on a daily basis should have some effect? I can believe that there are a great many hidden variables that govern where you make your cross on the ballot (or however they may be voting there), but I don’t think participation in a psychology experiment can produce a long-term shift in that behaviour of over 10%. If that were true, I think we would be well-advised to bring back absolute monarchy. At least then you know who’s in charge.

Will seeing this make my British readers more likely to vote Yes in the referendum on EU membership next year?

Of fishy smells and ticking time bombs

One thing that many of these social priming studies have in common is that they take common folk sayings and turn them into psychology experiments. Similar claims that I just learned about this week (thanks to Alex Etz and Stuart Ritchie) are that being exposed to the smell of fish oil makes you more suspicious (because “it smells fishy”) and that the sound of a ticking clock pressures women from poor backgrounds into getting married and having children. I don’t know about you, but if all of these findings are true, I feel seriously sorry for my brain. It’s constantly bombarded by conflicting cues telling it to change its perceptions and decisions on all sorts of things. It is a miracle we get anything done. Maybe it is because I am not a native English speaker, but when I smell fish I think I may develop an appetite for fish. I don’t think it makes me more skeptical. Reading even more titles of studies turning proverbs into psychology experiments just might, though…

So what next?

I could go on, as I am sure there are many more such studies, but that’s beside the point. My main problem is that few research studies seem to ask whether the results they obtained are realistic. Ideally, we should start with some form of prediction of what kind of result we can even expect. To be honest, when it comes to social priming I don’t know how to go about doing this. I think it’s fine to start badly as long as someone is actually trying at all. Some thorough characterisation of the evidence to produce norm data may be useful. For instance, it would be useful to have data on general walking speeds of people leaving the lab from a larger sample so that you have a better estimate of the variability in walking speeds you could expect. If that variability is substantially larger than 1 second, you should probably test a pretty large sample. Or characterise the various factors that can impact “openness to new experiences” more strongly than innocuous actions like turning a lever, and then estimate how probable it is that your small social priming manipulation could exert a measurable effect with a typical sample size. Simulations could help with this. Last but definitely not least, think of more parsimonious hypotheses and either test them as part of your study or make sure that they are controlled for – for instance by replacing the stopwatch-wielding experimenter with laser sensors at both ends of the corridor.
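The stopwatch point deserves a sketch of its own, because bias behaves very differently from noise: noise averages out as the sample grows, a systematic bias does not. The toy model below assumes there is no priming effect at all, that a human timer adds random jitter plus a small bias in the expected direction whenever they believe the participant was primed, and that laser sensors measure the true time exactly. The 0.3-second bias, the jitter and the walking-time distribution are all hypothetical values chosen for illustration, and `measured_difference` is a made-up helper name.

```python
import random


def measured_difference(n_per_group, expectancy_bias_s=0.3, jitter_s=0.2, seed=0):
    """Mean primed-minus-control walking time recorded by each timing method
    when the true priming effect is zero."""
    rng = random.Random(seed)

    def true_time():
        # Hypothetical walking-time distribution; identical in both conditions.
        return rng.gauss(7.5, 1.5)

    def stopwatch(t, primed):
        # Human timing: random jitter plus a bias towards the expected result.
        return t + rng.gauss(0.0, jitter_s) + (expectancy_bias_s if primed else 0.0)

    def laser(t, primed):
        # Assumed exact and blind to condition, so `primed` is ignored.
        return t

    results = {}
    for name, timer in (("stopwatch", stopwatch), ("laser", laser)):
        control = [timer(true_time(), False) for _ in range(n_per_group)]
        primed = [timer(true_time(), True) for _ in range(n_per_group)]
        results[name] = sum(primed) / n_per_group - sum(control) / n_per_group
    return results


for n in (15, 1_000, 100_000):
    print(n, measured_difference(n))
```

As n grows, the laser estimate converges on zero while the stopwatch estimate converges on the bias itself, so a bigger sample only makes the spurious “effect” look more convincing. Blinding the timer or automating the measurement removes the bias rather than averaging it away.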

Of course, the issues I discussed here don’t apply only to social priming, and future posts will deal with those topics. However, social priming has a particularly big problem: it is mechanistically very underdetermined. Sure, the general idea is that activating some representation or some idea can have an influence on behaviour. This essentially treats a human mind like a clean test tube just waiting for you to pour in your chemicals so you can watch the reaction. The problem is that in truth a human mind is more like a really, really filthy test tube, contaminated with blood and bacteria and the dirty paw prints of all the people who handled it before…

In my next experiment I will see if test tubes also fall asleep when you do retinotopic mapping on them