(7th Aug 2015: I edited the section ‘Unsupported assumptions’ because I realised my earlier comments didn’t make sense)
Fate (or most likely coincidence) just had it that soon after my previous post, the first issue in my “Journal of Disbelief”, someone wrote a long post about social priming. This very long and detailed post by Michael Ramscar is well worth a read. In it he discusses the question of failed replications and goes into particular depth on an alternative explanation for why Bargh’s elderly priming experiments failed to replicate. He also criticises the replication attempt. Since I discuss both these studies in detail (in my last post and several others) and because it pertains to my general skepticism of social priming, I decided to respond. I’d have done it on his blog but that doesn’t seem to be open for comments – so here I go instead:
With regard to replication, he argues that a lot of effort is essentially wasted on repeating these priming studies that could be put to better use to advance scientific knowledge. I certainly agree with this notion to some extent and I have argued similar points in the past. He then carries out a seemingly (I say this because I have neither the time nor the expertise to verify it) impressive linguistic analysis suggesting that the elderly priming study was unlikely to be replicated so many years after the original study because the use of language has undoubtedly evolved since the late 80s/early 90s when the original study was conducted (in fact, as he points out, many of the participants in the replication study were not even born when the original one was done). His argument is essentially that the words to prime the “schema of old age” in Bargh’s original study could no longer be effective as primes.
Ramscar further points out that the replication attempt was conducted on French-speaking participants and goes to some lengths to show that French words are unlikely to exert the same effect. This difference could very well be of great importance and in fact there may be very many reasons why two populations from which we sample our participants are not comparable (problems that may be particularly serious when trying to generalise results from Western populations to people with very different lifestyles and environments, like native tribes in Sub-Saharan Africa or the Amazon etc.). This however ignores that there have been other replication attempts of this experiment that also failed to show this elderly priming effect. I am aware of at least one that was done on English-speaking participants, although, as this was also done only a few years ago, the linguistic point about changes in language use presumably still applies to it.
How about No?
The first thought I had when reading Ramscar’s hypothesis about how elderly priming works and why we shouldn’t expect it to replicate in modern samples was that this sounds like the complicated explanatory handwaving that people all too often engage in when their “experiment didn’t work” (meaning that their hypothesis wasn’t confirmed). I often encounter this when marking student project reports but it would be grossly unfair to make this out to be a problem specific to students. Rather, you can often see this even from very successful, tenured researchers. While some of this behaviour is probably natural (and thus it makes sense that many students write things like this), I think the main blame for this lies with how we train students to approach their results, both in word and in action. The problem is nicely summarised by the title of James Alcock’s paper “Give the Null Hypothesis a Chance”. While he wrote this as a critique of Psi research (which I may or may not cover in a future issue of the Journal of Disbelief – I kind of feel I’ve written enough about Psi…), I think it would serve us all well to remember that sometimes our beautifully crafted hypotheses may simply not be correct. To me this is also the issue with social priming.
Now, I get the impression Michael Ramscar does not necessarily believe that this linguistic account is the only explanation for the failure to replicate Bargh’s finding. He may very well accept that the result may have been a fluke. I am also being vastly unjust to liken his detailed post to “handwaving”. Considering the lengths to which he goes to produce data about word frequencies, his post is anything but handwavy. But whether or not it is entirely serious, it is out there and deserves some additional thoughts.
The way I see it, the linguistic explanation is based on a whole host of assumptions that probably do not hold. As I said in my previous post, we need more Occam’s Razor. Often mischaracterised as “the simplest explanation is usually correct”, what it really states (at least in my interpretation) is that the explanation that requires the smallest number of assumptions whilst producing the maximal explanatory power is probably closest to the truth. The null hypothesis that there is no such thing as social priming (or if it exists, it is much, much weaker than these underpowered experiments could possibly hope to detect) seems to me far more likely than the complex explanation posited by Ramscar’s post.
Why should we expect the words most frequently associated with old age (which he argues is – or rather was – the word ‘old’) to produce the strongest age priming effect? Couldn’t that just as well lead to habituation? The most effective words for priming the old age schema may be more obscure ones that nevertheless strongly evoke thoughts about the elderly. I agree that even in the US ‘Florida’ and ‘bingo’ don’t necessarily cut it in that respect (and my guess is outside the US ‘Florida’ mainly evokes images of beaches and palm trees and possibly cheesy 80s cop dramas). Others, like ‘retired’ and ‘grey’, very well might though. And words like ‘rigid’ could very well evoke the concept of slowness. The idea that word frequency determines the strength of such priming effects is a mostly unsupported assumption.
Another possibility is that the aggregate of the 30 primes is highly non-linear. By this I mean that the combined effect may be a lot more than the average (or even the sum) of individual priming effects. To me it actually seems quite likely that any activation of the concept of old age would only gradually build up over the course of the experiment. So essentially, the word ‘old’ may have no effect on its own but in combination with all the other words it might clearly evoke the schema. Of course, on the other hand I find it quite hard to fathom that one little word in each of thirty sentences – sentences that by themselves may be completely unrelated to old age – will produce a very noticeable effect on an irrelevant behaviour after the experiment is over.
The discussion of semantic priming effects, such as that reading/hearing the word ‘doctor’ might make you more likely to think of ‘nurse’ than of ‘commissar’, is a perfect example of the reasons I was describing in my last post why I think social priming hypotheses (at least the more fanciful ones based on abstract ideas and folk sayings) are highly implausible. How strong are semantic priming effects? How likely do you think it is that a social priming effect like that shown by Bargh could be even half that strong? Surely there must be numerous additional factors that exert their tugs and pulls on your unsuspecting mind, many of which must be stronger than the effect of some words you form into a sentence. I realise that these noise factors must cancel out with a sufficiently large data set – but they add variability that will dilute the effect size we can possibly expect from such manipulations.
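To make the dilution point a bit more concrete, here is a minimal sketch. It assumes a simple two-group comparison and that the extra “tugs and pulls” act as independent noise added to the outcome measure. The numbers (a raw effect of 0.5 on a measure with a standard deviation of 1) are made up purely for illustration and are not taken from Bargh’s study or from any replication; the function names are likewise my own. The sample size formula is the usual normal approximation for a two-sided two-sample test.

```python
from math import ceil, sqrt
from statistics import NormalDist


def diluted_d(raw_effect, base_sd, extra_noise_sd):
    """Standardised effect size (Cohen's d) after adding independent noise.

    Independent noise sources add in variance, so the total SD grows
    while the raw mean difference stays the same.
    """
    total_sd = sqrt(base_sd ** 2 + extra_noise_sd ** 2)
    return raw_effect / total_sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical values: the same raw effect under increasing extra noise.
for extra in (0.0, 1.0, 2.0, 3.0):
    d = diluted_d(raw_effect=0.5, base_sd=1.0, extra_noise_sd=extra)
    print(f"extra noise SD = {extra:.1f}: d = {d:.2f}, "
          f"n per group approx. {n_per_group(d)}")
```

Under these assumptions the required sample size grows with the inverse square of the standardised effect, so even moderate amounts of additional noise quickly push the necessary sample well beyond what small studies can offer.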
In my opinion the major problem with the whole theoretical idea behind social priming is that it just seems rather unfathomable to me that human beings and human society could function at all if the effects were as strong, as long-lasting and as simple as claimed by much of this research. I don’t buy into the idea that these effects can only be produced reliably under laboratory conditions and only by skilled researchers. I know I can’t hope to replicate a particle physics experiment both for lack of lab equipment and lack of expertise. I can believe that conducting social priming (or, while I’m at it again, Psi) experiments may require some training and experience. However, at the same time, if these effects are as real and as obvious as these researchers claim, they should be easier to replicate than something that requires years of practical training, thorough knowledge of maths and theoretical physics, and a million-dollar hadron collider. Psychology labs may reduce some noise in human behaviour compared to, say, doing your experiments on street corners but they are not dust-free rooms or telescopes outside of the Earth’s atmosphere. The subjects that come in to your lab remain heavily contaminated with all the baggage and noisiness that only the human mind is capable of. If effects as impressive as those in the social priming literature were real, human beings should bounce around the world as if they were inside a pinball machine.
So … what about replications again?
Finally, as I said, I kind of agree with Ramscar about replication attempts. I think a lot of direct replications are valid to establish the robustness of some effects. I am not sure that it really makes sense to repeat the elderly priming experiments though. Not that I don’t appreciate Doyen et al.’s and Hal Pashler’s attempts to replicate this experiment. However, as I have tried to argue, the concepts in many social priming studies are simply so vague and complex that one probably can’t learn all that much from repeating them. I entirely accept Ramscar’s point that different times, different populations, and (most critically) different languages might make an enormous difference here. The possible familiarity of research subjects with the original experiment may further complicate matters. And unless the experimenters are suitably blinded to the experimental condition of each participant (which I don’t think is always the case), there may be further problems with demand effects etc.
A lot of the rebuttals by original authors of failed social priming replications seem to revolve around the point that while specific experiments don’t replicate this does not mean the whole theory is invalid. There have been numerous findings of social priming in the literature. However, even I, having written extensively about what I think is misguided about the current wave of replication attempts, would say that the sheer number of failed social priming replications should pose a serious problem for advocates of that theory.
But this is where I think social psychologists need to do better. I think rather than more direct replications of social priming we need more conceptual replication attempts that try to directly address this question:
Can social priming effects of this magnitude be real?
I don’t believe that they can but perhaps I am too skeptical. I can only tell you that I won’t be convinced of the existence of social priming (or Psi) by yet more underpowered, possibly p-hacked studies by researchers who just know how to get these effects. Especially not if the effects are so large that they seem vastly incompatible with the way it appears our behaviour works. Maybe I am relying more on my gut here than on my brain but when faced with the choice between a complex theory based on numerous (typically post hoc) assumptions and the notion that these effects just don’t exist, I know which I’d choose…