
When the hole changes the pigeon

or How innocent assumptions can lead to wrong conclusions

I promised you a (neuro)science post. Don’t let the title mislead you into thinking we’re talking about world affairs and societal ills again. While pigeonholing is directly related to polarised politics or social media, for once this is not what this post is about. Rather, it is about a common error in data analysis. Although there have been numerous expositions of similar issues over the decades, it is, as we’ve learned the hard way, a surprisingly easy mistake to make. A scientific article by Susanne Stoll laying out this problem in more detail is currently available as a preprint.

Pigeonholing (Source:

Data binning

In science you often end up with large data sets, with hundreds or thousands of individual observations subject to considerable variance. For instance, in my own field of retinotopic population receptive field (pRF) mapping, a given visual brain area may have a few thousand recording sites, and each has a receptive field position. There are many other scenarios of course. It could be neural firing, or galvanic skin responses, or eye positions recorded at different time points. Or it could be hundreds or thousands of trials in a psychophysics experiment etc. I will talk about pRF mapping because this is where we recently encountered the problem and I am going to describe how it has affected our own findings – however, you may come across the same issue in many guises.

Imagine that we want to test how pRFs move around when you attend to a particular visual field location. I deliberately use this example because it is precisely what a bunch of published pRF studies did, including one of ours. There is some evidence that selective attention shifts the position of neuronal receptive fields, so it is not far-fetched that it might shift pRFs in fMRI experiments also. Our study for instance investigated whether pRFs shift when participants are engaged in a demanding (“high load”) task at fixation, compared to a baseline condition where they only need to detect a simple colour change of the fixation target (“low load”). Indeed, we found that across many visual areas pRFs shifted outwards (i.e. away from fixation). This suggested to us that the retinotopic map reorganises to reflect a kind of tunnel vision when participants are focussed on the central task.

What would be a good way to quantify such map reorganisation? One simple way might be to plot each pRF in the visual field with a vector showing how it is shifted under the attentional manipulation. In the graph below, each dot shows a pRF location under the attentional condition, and the line shows how it has moved away from baseline. Since there is a large number of pRFs, many of which are affected by measurement noise or other errors, these plots can be cluttered and confusing:

Plotting shift of each pRF in the attention condition relative to baseline. Each dot shows where a pRF landed under the attentional manipulation, and the line shows how it has shifted away from baseline. This plot is a hellishly confusing mess.

Clearly, we need to do something to tidy up this mess. So we take the data from the baseline condition (in pRF studies, this would normally be attending to a simple colour change at fixation) and divide the visual field up into a number of smaller segments, each of which contains some pRFs. We then calculate the mean position of the pRFs from each segment under the attentional manipulation. Effectively, we summarise the shift from baseline for each segment:

We divide the visual field into segments based on the pRF data from the baseline condition and then plot the mean shift in the experimental condition for each segment. A much clearer graph that suggests some very substantial shifts…

This produces a much clearer plot that suggests some interesting, systematic changes in the visual field representation under attention. Surely, this is compelling evidence that pRFs are affected by this manipulation?

False assumptions

Unfortunately it is not [1]. The mistake here is to assume that there is no noise in the baseline measure that was used to divide up the data in the first place. If our baseline pRF map were a perfect measure of the visual field representation, then this would have been fine. However, like most data, pRF estimates are variable and subject to many sources of error. The misestimation is also unlikely to be perfectly symmetric – for example, there are several reasons why it is more likely that a pRF will be estimated closer to central vision than in the periphery. This means there could be complex and non-linear error patterns that are very difficult to predict.

The data I showed in these figures are in fact not from an attentional manipulation at all. Rather, they come from a replication experiment where we simply measured a person’s pRF maps twice over the course of several months. One thing we do know is that pRF measurements are quite robust, stable over time, and even similar between scanners with different magnetic field strengths. What this means is that any shifts we found are most likely due to noise. They are completely artifactual.

When you think about it, this error is really quite obvious: sorting observations into clear categories can only be valid if you can be confident in the continuous measure on which you base these categories. Pigeonholing can only work if you can be sure into which hole each pigeon belongs. This error is also hardly new. It has been described in numerous forms as regression to the mean and it rears its ugly head every few years in different fields. It is also related to circular inference, which already caused a stir in cognitive and social neuroscience a few years ago. Perhaps the reason for this is that it is a damn easy mistake to make – but that doesn’t make the face-palming moment any less frustrating.
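To make the regression-to-the-mean problem concrete, here is a minimal simulation sketch. It is not the analysis from the preprint: it is a one-dimensional toy in which every "pRF" has a fixed true eccentricity, measured twice with independent, symmetric Gaussian noise. The function names, the 0–10 range, the five bins and the noise level are all assumptions chosen for illustration.

```python
import random
import statistics

def measure(true_positions, noise_sd, rng):
    """One noisy mapping session: true position plus Gaussian error."""
    return [p + rng.gauss(0.0, noise_sd) for p in true_positions]

def bin_index(value, n_bins=5, lo=0.0, hi=10.0):
    """Eccentricity bin of a value; out-of-range values go to the edge bins."""
    width = (hi - lo) / n_bins
    return min(max(int((value - lo) // width), 0), n_bins - 1)

rng = random.Random(1)
true_ecc = [rng.uniform(0.0, 10.0) for _ in range(5000)]  # never changes
baseline = measure(true_ecc, 1.0, rng)
repeat = measure(true_ecc, 1.0, rng)

# Circular step: the same noisy baseline defines the bins AND the start point.
shifts = [[] for _ in range(5)]
for b, r in zip(baseline, repeat):
    shifts[bin_index(b)].append(r - b)

mean_shift = [statistics.fmean(s) for s in shifts]
for i, m in enumerate(mean_shift):
    print(f"bin {i}: mean shift {m:+.2f}")
```

Although nothing moved between the two simulated sessions, the innermost bin appears to shift outwards and the outermost bin appears to shift inwards. A site only lands in an extreme bin if its baseline error pushed it there, and the second measurement does not share that error.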

It is not difficult to correct this error. In the plot below, I used an independent map from yet another, third pRF mapping session to divide up the visual field. Then I calculated how the pRFs in each visual field segment shifted on average between the two experimental sessions. While some shift vectors remain, they are considerably smaller than in the earlier graph. Again, keep in mind that these are simple replication data and we would not really expect any systematic shifts. There certainly does not seem to be a very obvious pattern here – perhaps there is a bit of a clockwise shift in the right visual hemifield but that breaks down in the left. Either way, this analysis gives us an estimate of how much variability there may be in this measurement.

We use an independent map to divide the visual field into segments. Then we calculate the mean position for each segment in the baseline and the experimental condition, and work out the shift vector between them. For each segment, this plot shows that vector. This plot loses some information, but it shows how much and in which direction pRFs in each segment shifted on average.

This approach of using a third, independent map loses some information because the vectors only tell you the direction and magnitude of the shifts, not exactly where the pRFs started from and where they end up. Often the magnitude and direction of the shift is all we really need to know. However, when the exact position is crucial we could use other approaches. We will explore this in greater depth in upcoming publications.
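The independent-map idea can be sketched in two dimensions as follows. This is again a toy under stated assumptions, not the pipeline used for the figures: pRF centres are fixed, three sessions add independent Gaussian noise, and the visual field is cut into a hypothetical 4 × 4 grid of square segments rather than the segments used in the plots above. All function names are made up for illustration.

```python
import math
import random

def simulate_map(n_sites=4000, noise_sd=1.0, n_sessions=3, seed=7):
    """Fixed 2-D pRF centres measured n_sessions times with independent noise."""
    rng = random.Random(seed)
    centres = [(rng.uniform(-8, 8), rng.uniform(-8, 8)) for _ in range(n_sites)]
    return [
        [(x + rng.gauss(0, noise_sd), y + rng.gauss(0, noise_sd)) for x, y in centres]
        for _ in range(n_sessions)
    ]

def segment_of(point, size=4.0, n_side=4):
    """Grid segment containing a point; points beyond the grid join the edge segment."""
    half = n_side // 2
    def clamp(v):
        return min(max(math.floor(v / size), -half), half - 1)
    return (clamp(point[0]), clamp(point[1]))

def mean_shift_vectors(segment_map, session_a, session_b):
    """Mean (dx, dy) from session_a to session_b, grouped by segment_map."""
    sums = {}
    for key, (ax, ay), (bx, by) in zip(segment_map, session_a, session_b):
        s = sums.setdefault(segment_of(key), [0.0, 0.0, 0])
        s[0] += bx - ax
        s[1] += by - ay
        s[2] += 1
    return {seg: (sx / n, sy / n) for seg, (sx, sy, n) in sums.items()}

def mean_length(vectors):
    """Average length of the per-segment shift vectors."""
    return sum(math.hypot(dx, dy) for dx, dy in vectors.values()) / len(vectors)

first, second, third = simulate_map()
circular = mean_shift_vectors(first, first, second)      # bins from session 1
independent = mean_shift_vectors(third, first, second)   # bins from session 3
print(f"circular binning:    {mean_length(circular):.3f}")
print(f"independent binning: {mean_length(independent):.3f}")
```

Because the third session's errors are unrelated to those in the two sessions being compared, the remaining vectors reflect only measurement variability, which is the quantity the corrected plot is meant to estimate.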

On the bright side, the example I picked here is probably extreme because I didn’t restrict these plots to a particular region of interest but used all supra-threshold voxels in the occipital cortex. A more restricted analysis would remove some of that noise – but the problem nevertheless remains. How much it skews the findings depends very much on how noisy the data are. Data tend to be less noisy in early visual cortex than in higher-level brain regions, which is where people usually find the most dramatic pRF shifts…

Correcting the literature

It is so easy to make this mistake that you can find it all over the pRF literature. Clearly, neither authors nor reviewers have given it much thought. It is definitely not confined to studies of visual attention, although this is how we stumbled across it. It could be a comparison between different analysis methods or stimulus protocols. It could be studies measuring the plasticity of retinotopic maps after visual field loss. Ironically, it could even be studies that investigate the potential artifacts when mapping such plasticity incorrectly. It is not restricted to the kinds of plots I showed here but should affect any form of binning, including the binning into eccentricity bins that is most common in the literature. We suspect the problem is also pervasive in many other fields or in studies using other techniques. Only a few years ago a similar issue was described by David Shanks in the context of studying unconscious processing. It is also related to warnings you may occasionally hear about using median splits – really just a simpler version of the same approach.

I cannot tell you if the findings from other studies that made this error are spurious. To know that we would need access to the data and reanalyse these studies. Many of them were published before data and code sharing was relatively common [2]. Moreover, you really need to have a validation dataset, like the replication data in my example figures here. The diversity of analysis pipelines and experimental designs makes this very complex – no two of these studies are alike. The error distributions may also vary between different studies, so ideally we need replication datasets for each study.

In any case, as far as our attentional load study is concerned, after reanalysing these data with unbiased methods, we found little evidence of the effects we published originally. While there is still a hint of pRF shifts, these are no longer statistically significant. As painful as this is, we therefore retracted that finding from the scientific record. There is a great stigma associated with retraction, because of the shady circumstances under which it often happens. But to err is human – and this is part of the scientific method. As I said many times before, science is self-correcting but that is not some magical process. Science doesn’t just happen, it requires actual scientists to do the work. While it can be painful to realise that your interpretation of your data was wrong, this does not diminish the value of this original work [3] – if anything this work served an important purpose by revealing the problem to us.

We mostly stumbled across this problem by accident. Susanne Stoll and Elisa Infanti conducted a more complex pRF experiment on attention and found that the purported pRF shifts in all experimental conditions were suspiciously similar (you can see this in an early conference poster here). It took us many months of digging, running endless simulations, complex reanalyses, and sometimes heated arguments before we cracked that particular nut. The problem may seem really obvious now – it sure as hell wasn’t before all that.

This is why this erroneous practice appears to be widespread in this literature and may have skewed the findings of many other published studies. This does not mean that all these findings are false but it should serve as a warning. Ideally, other researchers will also revisit their own findings but whether or not they do so is frankly up to them. Reviewers will hopefully be more aware of the issue in future. People might question the validity of some of these findings in the absence of any reanalysis. But in the end, it doesn’t matter all that much which individual findings hold up and which don’t [4].

Check your assumptions

I am personally more interested in taking this whole field forward. This issue is not confined to the scenario I described here. pRF analysis is often quite complex. So are many other studies in cognitive neuroscience and, of course, in many other fields as well. Flexibility in study designs and analysis approaches is not a bad thing – it is in fact essential that we can adapt our experimental designs to the scientific questions we want to address.

But what this story shows very clearly is the importance of checking our assumptions. This is all the more important when using the complex methods that are ubiquitous in our field. As cognitive neuroscience matures, it is critical that we adopt good practices in ensuring the validity of our methods. In the computational and software development sectors, it is to my knowledge commonplace to test algorithms on conditions where the ground truth is known, such as random and/or simulated data.

This idea is probably not even new to most people and it certainly isn’t to me. During my PhD there was a researcher in the lab who had concocted a pretty complicated analysis of single-cell electrophysiology recordings. It involved lots of summarising and recentering of neuronal tuning functions to produce the final outputs. Neither I nor our supervisor really followed every step of this procedure based only on our colleague’s description – it was just too complex. But eventually we suspected that something might be off and so we fed random numbers to the algorithm – lo and behold the results were a picture perfect reproduction of the purported “experimental” results. Since then, I have simulated the results of my analyses a few other times – for example, when I first started with pRF modelling or when I developed new techniques for measuring psychophysical quantities.

This latest episode taught me that we must do this much more systematically. For any new design, we should conduct control analyses to check how it behaves with data for which the ground truth is known. It can reveal statistical artifacts that might hide inside the algorithm but also help you determine the method’s sensitivity and thus allow you to conduct power calculations. Ideally, we would do that for every new experiment even if it uses a standard design. I realise that this may not always be feasible – but in that case there should be a justification why it is unnecessary.
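A ground-truth check of this kind can be sketched as a small harness: simulate data with a known effect (including zero), run the analysis, and compare what comes out with what went in. The example below is hypothetical throughout. It uses a median split as the "analysis", unit-variance Gaussian noise, and made-up function names; none of it is taken from the study discussed here.

```python
import random
import statistics

def simulate(n_sites, true_effect, rng, noise_sd=1.0):
    """Return (baseline, test, independent) sessions with a known true_effect."""
    truth = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
    def noisy(offset):
        return [t + offset + rng.gauss(0.0, noise_sd) for t in truth]
    return noisy(0.0), noisy(true_effect), noisy(0.0)

def upper_half_change(selector, baseline, test):
    """Mean change from baseline to test for sites above the selector's median."""
    cut = statistics.median(selector)
    return statistics.fmean(
        t - b for s, b, t in zip(selector, baseline, test) if s > cut
    )

def split_on_baseline(baseline, test, independent):
    # Circular: the baseline picks the sites and is also the reference.
    return upper_half_change(baseline, baseline, test)

def split_on_independent(baseline, test, independent):
    # Selection uses a session that shares no noise with the comparison.
    return upper_half_change(independent, baseline, test)

def average_estimate(analysis, true_effect, n_sims=200, n_sites=400, seed=0):
    """Average effect the analysis reports when the real effect is true_effect."""
    rng = random.Random(seed)
    return statistics.fmean(
        analysis(*simulate(n_sites, true_effect, rng)) for _ in range(n_sims)
    )

for name, analysis in [("baseline split", split_on_baseline),
                       ("independent split", split_on_independent)]:
    for effect in (0.0, 0.5):
        est = average_estimate(analysis, effect)
        print(f"{name:17s} true effect {effect:.1f} -> estimated {est:+.2f}")
```

In this toy, the baseline split reports a clearly negative change even when the true effect is zero, while the independent split recovers the injected value on average. Counting how often an analysis detects an injected effect of a given size would turn the same loop into a rough power calculation.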

Because what this really boils down to is simply good science. When you use a method without checking that it works as intended, you are effectively doing a study without a control condition – quite possibly the original sin of science.


In conclusion, I quickly want to thank several people: First of all, Susanne Stoll deserves major credit for tirelessly pursuing this issue in great detail over the past two years with countless reanalyses and simulations. Many of these won’t ever see the light of day but helped us wrap our heads around what is going on here. I want to thank Elisa Infanti for her input and in particular the suggestion of running the analysis on random data – without this we might never have realised how deep this rabbit hole goes. I also want to acknowledge the patience and understanding of our co-authors on the attentional load study, Geraint Rees and Elaine Anderson, for helping us deal with all the stages of grief associated with this. Lastly, I want to thank Benjamin de Haas, the first author of that study for honourably doing the right thing. A lesser man would have simply booked a press conference at Current Biology Total Landscaping instead to say it’s all fake news and announce a legal challenge [5].


  1. The sheer magnitude of some of these shifts may also be scientifically implausible, an issue I’ve repeatedly discussed on this blog already. Similar shifts have however been reported in the literature – another clue that perhaps something is awry in these studies…
  2. Not that data sharing is enormously common even now.
  3. It is also a solid data set with a fairly large number of participants. We’ve based our canonical hemodynamic response function on the data collected for this study – there is no reason to stop using this irrespective of whether the main claims are correct or not.
  4. Although it sure would be nice to know, wouldn’t it?
  5. Did you really think I’d make it through a blog post without making any comment like this?

A brave new world of research parasites

What a week! I have rarely seen the definition of irony being demonstrated more clearly in front of my eyes than during the days following the publication of this comment by Lewandowsky and Bishop in Nature. I mentioned this at the end of my previous post. The comment discusses the question of how to deal with data requests and criticisms of scientific claims in the new world of open science. A lot of digital ink has already been spilled elsewhere debating what they did or didn’t say and what they meant to say with their article. I have no intention of rehashing that debate here. So while I typically welcome any meaningful and respectful comments under my posts, I’ll regard any comments on the specifics of the L&B article as off-topic and will not publish them. There are plenty of other channels for this.

I think the critics attack a strawman and the L&B discussion is a red herring. Irrespective of what they actually said, I want to get back to the discussion we should be having, which I already alluded to last time.  In order to do so, let’s get the premise crystal clear. I have said all this before in my various posts about data sharing but let me summarise the fundamental points:

  1. Data sharing: All data for scientific studies needed to reproduce the results should be made public in some independent repository at the point of publication. This must exclude data which would be unethical to share, e.g. unprocessed brain images from human participants. Such data fall in a grey area as to how much anonymisation is necessary and it is my policy to err on the side of caution there. We have no permission from our participants (except for some individual cases) to share their data with anyone outside the team if there is a chance that they could be identified from it so we don’t. For the overwhelming majority of purposes such data are not required and the pre-processed, anonymised data will suffice.
  2. Material sharing: When I talk about sharing data I implicitly also mean material so any custom analysis code, stimulus protocols, or other materials used for the study  should also be shared. This is not only good for reproducibility, i.e. getting the same results using the same data. It is also useful for replication efforts aiming to repeat the same experiment to collect new data.
  3. Useful documentation: Shared data are unlikely to be much use to anyone if there isn’t a minimum of documentation explaining what they contain. I don’t think this needs to be excessive, especially given the fact that most data will probably never be accessed by anyone. But there should at least be some basic guide on how to use the data to return a result. It should be reasonably clear what data can be found where or how to run the experiment. Provided the uncompiled code is included and the methods section of the publication contains sufficient detail of what is being done, anyone looking at it should be able to work it out by themselves. More extensive documentation is certainly helpful and may also help the researchers themselves in organising their work – but I don’t think we should expect more than the basics.

Now with this out of the way I don’t want to hear no lamentations about how I am “defending” the restriction of data access to anyone or any such rubbish. Let’s simply work on the assumption that the world is how it should be and that the necessary data are available to anyone with an internet connection. So let’s talk about the worries and potential problems this may bring. Note that, as I already said, most data sets will probably not generate much interest. That is fine – they should be available for potential future use in any case. More importantly this doesn’t mean the following concerns aren’t valid:

Volume of criticism

In some cases the number of people reusing the shared data will be very large. This is particularly likely for research on controversial topics. This could be because the topic is a political battleground or that the research is being used to promote policy changes people are not happy with. Perhaps the research receives undeserved accolades from the mainstream media or maybe it’s just a very sensational claim (Psi research springs to mind again…). The criticisms of this research may or may not be justified. None of this matters and I don’t care to hear about the specifics about your particular pet peeve whether it’s climate change or some medical trial. All that matters in this context is that the topic is controversial.

As I said last time, it should be natural that sensational or controversial research attracts more attention and more scepticism. This is how it should be. Scientists should be sceptical. But individual scientists or small research teams are composed of normal human beings and they have a limit to how much criticism they can keep up with. This is a simple fact. Of course this statement will no doubt draw out the usual suspects who feel the need to explain to me that criticism and scepticism are necessary in science and that this is simply what one should expect.


So let me cut the heads off this inevitable hydra right away. First of all, this is exactly what I just said: Yes, science depends on scepticism. But it is also true that humans have limited capacity for answering questions and criticisms and limited ability to handle stress. Simply saying that they should be prepared for that and have no right to complain is unrealistic. If anything it will drive people away from doing research on controversial questions which cannot be a good thing.

Similarly, it is unrealistic to say that they could just ignore criticisms if they get too much for them. It is completely natural that a given scientist will want to respond to criticisms, especially if those criticisms are public. They will want to defend the conclusions they’ve drawn and they will also feel that they have a reputation to defend. I believe science would generally be better off if we all learned to become less invested in our pet theories and conducted our inferences in a less dogmatic way. I hope there are ways we can encourage such a change – but I don’t think you can take ego out of the question completely. Especially if a critic accuses a researcher of incompetence or worse, it shouldn’t surprise anyone if they react emotionally and have personal stakes in the debate.

So what can we expect? To me it seems entirely justified in this situation that a researcher would write a summary response that addresses the criticism collectively. In that they would most likely have to be selective and only address the more serious points and ignore the minutiae. This may require some training. Even then it may be difficult because critics might insist that their subtle points are of fundamental importance. In that situation an adjudicating article by an independent party may be helpful (albeit probably not always feasible).

On a related note, it also seems justified to me that a researcher will require time to make a response. This pertains more to how we should assess a scientific disagreement as outside observers. Just because a researcher hasn’t responded to every little criticism within days of somebody criticising their work doesn’t mean that the criticism is valid. Scientists have lives too. They have other professional duties, mortgages to pay with their too-low salaries, children to feed, and – hard as it is to believe – they deserve some time off occasionally. As long as they declare their intention to respond in depth at some stage we should respect that. Of course if they never respond that may be a sign that they simply don’t have a good response to the criticism. But you need some patience, something we seem to have lost in the age of instant access social media.

Excessive criticism or harassment

This brings us to the next issue. Harassment of researchers is never okay. Which is really because harassment of anyone is never okay. So pelting a researcher with repeated criticisms, making the same points or asking the same questions over and over, is not acceptable. This certainly borders on harassment and may cross the line. This constant background noise can wear people out. It is also counterproductive because it slows them down in making their response. It may also paralyse their other research efforts which in turn will stress them out because they have grant obligations to fulfill etc. Above all, stress can make you sick. If you harassed somebody out of the ability to work, you’ll never get a response – this doesn’t make your criticism valid.

If the researchers declared their intention to respond to criticism we should leave it at that. If they don’t respond after a significant time it might be worth a reminder if they are still working on it. As I said above, if they never respond this may be a sign that they have no response. In that case, leave it at that.

It should require no explanation why any blatant harassment, abusive contact, or any form of interference in the researchers’ personal lives, is completely unacceptable. Depending on the severity of such cases they should be prosecuted to the full extent of the law. And if someone reports harassment, in the first instance you should believe them. It is a common tactic of harassers to downplay claims of abuse. Sure, it is also unethical to make false accusations but you should leave that for the authorities to judge, in particular if you don’t have any evidence one way or the other. Harassment is also subjective. What might not bother you may very well affect another person badly. Brushing this off as them being too sensitive demonstrates a serious lack of compassion, is disrespectful, and I think it also makes you seem untrustworthy.

Motive and bias

Speaking of untrustworthiness brings me to the next point. There has been much discussion about the motives of critics and how far a criticism is to be taken in “good faith”. This is a complex and highly subjective judgement. In my view, your motive for reanalysing or critiquing a particular piece of research is not automatically a problem. All the data should be available, remember? Anyone can reanalyse it.

However, as all researchers should be honest so should all critics. Obviously this isn’t mandatory and it couldn’t be enforced even if it were. But this is how it should be and how good scientists should work. I have myself criticised and reanalysed research by others and I was not beating around the bush in either case – I believe I was pretty clear that I didn’t believe their hypothesis was valid. Hiding your prior notions is disrespectful to the authors and also misleads the neutral observers of the discussion. Even if you think that your public image already makes your views clear – say, because you ranted at great length on social media about how terribly flawed you think that study was – this isn’t enough. Even the Science Kardashians don’t have that large a social media following and probably only a fraction of that following will have read all your in-depth rants.

In addition to declaring your potential bias you should also state your intention. It is perfectly justified to dig into the data because you suspect it isn’t kosher. But this is an exploratory analysis and it comes with many of the same biases that uncontrolled, undeclared exploration always has. Of course you may find some big smoking gun that invalidates or undermines the original authors’ conclusions. But you are just as likely to find some spurious glitch or artifact in the data that doesn’t actually mean anything. In the latter case it would make more sense to conduct a follow-up experiment that tests your new alternative hypothesis to see if your suspicion holds up. If on the other hand you have a clear suspicion to start with you should declare it and then test it and report the findings no matter what. Preregistration may help to discriminate the exploratory fishing trips from the pointed critical reanalyses – however, it is logistically not very feasible to check whether this wasn’t just a preregistration after the fact because the data were already available.

So I think this judgement will always rely heavily on trust but that’s not a bad thing. I’m happy to trust a critic if they declare their prior opinion. I will simply take their views with some scepticism that their bias may have influenced them. A critic who didn’t declare their bias but is then shown to have a bias appears far less trustworthy. So it is actually in your interest to declare your bias.

Now before anyone inevitably reminds us that we should also worry about the motives and biases of the original authors – yes, of course. But this is a discussion we’ve already had for years and this is why data sharing and novel publication models like preregistration and registered reports are becoming more commonplace.

Lack of expertise

On to the final point. Reanalyses or criticism may come from people with too little expertise and knowledge of a research area to provide useful contributions. Such criticisms may obfuscate the discussion and that is never a good thing. Again preempting the inevitable comments: No, this does not mean that you have to prove your expertise to reanalyse the data. (Seriously guys, which part of “all data should be available to anyone” don’t you get?!). What it does mean is that I might not want to weight the criticism by someone who once took a biology class in high school the same way as that of a world expert. It also means that I will be more sceptical when someone is criticising something outside their own field.

There are many situations where this caveat doesn’t matter. Any scientist with some statistical training may be able to comment on some statistical issue. In fact, a statistician is presumably more qualified to comment on some statistical point than a non-statistician of whatever field. And even if you may not be an expert on some particular research topic you may still be an expert on the methods used by the researchers. Importantly, even a non-expert can reveal a fundamental flaw. The lack of a critic’s expertise shouldn’t be misused to discredit them. In the end, what really matters is that your argument is coherent and convincing. For that it doesn’t actually matter if you are an expert or not (an expert may however find it easier to communicate their criticism convincingly).

However, let’s assume that a large number of non-experts are descending on a data set picking little things they perceive as flaws that aren’t actually consequential or making glaring errors (to an expert) in their analysis. What should the researchers do in this situation? Not responding at all is not in their interest. This can easily be misinterpreted as a tacit acknowledgement that their research is flawed. On the other hand, responding to every single case is not in their interest either if they want to get on with their work (and their lives for that matter). As above, perhaps the best thing to do would be to write a summary response collectively rebutting the most pertinent points, make a clear argument about the inconsequentialness of these criticisms, and then leave it at that.


In general, scientific criticisms are publications that should work like any other scientific publications. They should be subject to peer review (which, as readers of this blog will know, I believe should be post-publication and public). This doesn’t mean that criticism cannot start on social media, blogs, journal comment sections, or on PubPeer, and the boundaries may also blur at times. For some kinds of criticism, such as pointing out basic errors or misinterpretations, some public comments may suffice and there have been cases where a publication was retracted simply because of the social media response. But for a criticism to be taken seriously by anyone, especially non-experts, it helps if it is properly vetted by independent experts – just as any study should be vetted. This may also help particularly with cases where the validity of the criticism is uncertain.

I think this is a very important discussion to have. We need to have this to bring about the research culture most of us seem to want. A brave new world of happy research parasites.


(Note: I changed the final section somewhat after Neuroskeptic rightly pointed out that the conclusions were a bit too general. Tal Yarkoni independently replicated this sentiment. But he was only giving me a hard time.)


Parasitical science?

This weekend marked another great moment in the saga surrounding the discussion about open science – a worthy sequel to “angry birds” and “shameless little bullies”. This time it was an editorial about data sharing in the New England Journal of Medicine which contains the statement that:

There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites.”

Remarks like this from journal editors are just all kinds of stupid. Even though this was presented in the context of quotes by unnamed “front-line researchers” (whatever that means) they implicitly endorse the interpretation that re-using other people’s published data is parasitical. In fact, their endorsement is made clear later on in the editorial when the editors express the hope that data sharing “should happen symbiotically, not parasitically.”

Contact Richard Morey to add this badge to your publications!

It shouldn’t come as a surprise that this editorial was immediately greeted by wide-spread ridicule and the creation of all sorts of internet memes poking fun at the notion of “research parasites.” Even if some people believe this, hell, even if the claim were true (spoiler: it’s not), this is just a very idiotic thing to do. Like it or not, open access, transparency, and post-publication scrutiny of published scientific findings are becoming increasingly common and are already required in many places. We’re now less than a year away from the date when the Peer Reviewers’ Openness Initiative, whose function it is to encourage data sharing, will come into effect. Not only is the clock not turning back on this stuff – it is deeply counterproductive to liken the supporters of this movement to parasites. This is no way to start (or have) a reasonable conversation.

And there should be a conversation. If there is one thing I have learned from talking with colleagues, worries about data sharing and open science as a whole are far from rare. Misguided as it may be, the concern about others scooping your ideas and sifting through the data you spent blood, sweat, and tears collecting resonates with many people. This editorial didn’t just pop into existence from the quantum foam – it comes from a real place. The mocking and snide remarks about this editorial are fully deserved. This editorial is moronic and ass-backwards. But speaking more generally, snide and mocking are never a good way to convince people of the strength of your argument. All too often worries like this are met with disrespect and ridicule. Is it any surprise that a lot of people don’t dare to speak up against open science? Similarly, when someone discovers errors or problems in somebody else’s data, some are quick to make jokes or serious accusations about these researchers. Is this encouraging them to open their lab books and file drawers? I think not.

Scientists are human beings and they tend to have normal human reactions when being accused of wrong-doing, incompetence, or sloppiness. Whether or not the accusations are correct is irrelevant. Even mentioning the dreaded “questionable research practices” sounds like a fierce accusation to the accused even though questionable research practices can occur quite naturally without conscious ill intent when people are wandering in the garden of forking paths. In my opinion we need to be mindful of that and try to be more considerate in the way we discuss these issues. Social media like Facebook and Twitter do not exactly seem to encourage respectful dialogue. I know this firsthand as I have myself said things about (in my view) questionable research that I subsequently regretted. Scepticism is good and essential to scientific progress – disrespect is not.

It seems to have been the intention of this misguided editorial to communicate a similar message. It encourages researchers using other people’s data to work with the original authors. So far so good. I am sure no sensible person would actually disagree with that notion. But where the editorial misses the point is that there is no plan for what happens if this “symbiotic” relationship isn’t forming, either because the original authors are not cooperating or because there is a conflict of interests between skeptics and proponents of a scientific claim. In fact, the editorial lays bare what I think is the heart of the problem in a statement that to me seems much worse than the “research parasites” label. They say that people…

…even use the data to try to disprove what the original investigators had posited.

It baffles me that anyone can write something like this whilst keeping a straight face. Isn’t this how science is supposed to work? Trying to disprove a hypothesis is just basic Popperian falsification. Not only should others do that, you should do that yourself with your own research claims. To be fair, the best way to do science in my opinion is to generate competing hypotheses and test them with as little emotional attachment to any of them as possible but this is more easily said than done… So ideally we should try to find the hypothesis that best explains the data rather than just seeking to disprove. Either way however, this sentence is clearly symptomatic of a much greater problem: Science should be about “finding better ways of being wrong.” The first step towards this is to acknowledge that anything we posited is never really going to be “true” and that it can always use a healthy dose of scientific scepticism and disproving.

I want to have this dialogue. I want to debate the ways to make science healthier, more efficient, and more flexible in overturning false ideas. As I outlined in a previous post, data sharing is the single most important improvement we can make to our research culture. I think even if there are downsides to it, the benefits outweigh them by far. But not everyone shares my enthusiasm for data sharing and many people seem worried but afraid to speak up. This is wrong and it must change. I strongly believe that most of the worries can be alleviated:

  • I think it’s delusional that data sharing will produce a “class” of “research parasites.” People will still need to generate their own science to be successful. Simply sitting around waiting for other people to generate data is not going to be a viable career strategy. If anything, large consortia like the Human Genome or Human Connectome Project will produce large data sets that a broad base of researchers can use. But this won’t allow them to test every possible hypothesis under the sun. In fact, most data sets are far too specific to be much use to many other people.
  • I’m willing to bet that the vast majority of publicly shared data sets won’t be downloaded, let alone analysed by anyone other than the original authors. This is irrelevant. The point is that the data are available because they could be potentially useful to future science.
  • Scooping other people’s research ideas by doing the experiment they wanted to do by using their published data is a pretty ineffective and risky strategy. In most cases, there is just no way that someone else would be faster than you publishing an experiment you wanted to do using your data. This doesn’t mean that it never happens but I’m still waiting for anyone to tell me of a case where this actually did happen… But if you are worried about it, preregister your intention so at least anyone can see that you planned it. Or even better, submit it as a Registered Report so you can guarantee that this work will be published in a journal regardless of what other people did with your data.
  • While we’re at it, upload the preprints of your manuscripts when you submit them to journals. I still dream of a publication system where we don’t submit to journals at all, or at least not until peer review has taken place and the robustness of the finding has been confirmed. But until we get there, preprints are the next best thing. With a public preprint the chronological precedence is clear for all to see.

Now that covers the “parasites” feeding on your research productivity. But what to do if someone else subjects your data to sceptical scrutiny in the attempt to disprove what you posited? Again, first of all I don’t think this is going to be that frequent. It is probably more frequent for controversial or surprising claims and it bloody well should be. This is how science progresses and shouldn’t be a concern. And if it actually turns out that the result or your interpretation of it is wrong, wouldn’t you want to know about it? If your answer to this question is No, then I honestly wonder why you do research.

I can however empathise with the fear that people, some of whom may lack the necessary expertise or who cherry pick the results, will actively seek to dismantle your findings. I am sure that this does happen and with more general data sharing this may certainly become more common. If the volume of such efforts becomes so large that it overwhelms an individual researcher and thus hinders their own progress unnecessarily, this would indeed be a concern. Perhaps we need to have a discussion on what safeguards could ensure that this doesn’t get out of hand or how one should deal with that situation. I think it’s a valid concern and worth some serious thought. (Update on 25 Jan 2016: In this context Stephan Lewandowsky and Dorothy Bishop wrote an interesting comment about this).

But I guarantee you, throwing the blame at data sharing is not the solution to this potential problem. The answer to scepticism and scrutiny cannot ever be to keep your data under lock and key. You may never convince a staunch sceptic but you also will not win the hearts and minds of the undecidedly doubtful by hiding in your ivory tower. In science, the only convincing argument is data, more data, better tests – and the willingness to change your mind if the evidence demands it.

Here at CoCoNiT (Cook-Islands Centre Of NeuroImaging Tests) we understand that once you crack the hard shell of your data, the sweet, white knowledge will just come pouring out…

Why wouldn’t you share data?

Data sharing has been in the news a lot lately, from the refusal of the authors of the PACE trial to share their data even though the journal expects it, to the eventful story of the “Sadness impairs color perception” study. A blog post by Dorothy Bishop called “Who’s afraid of Open Data?” made the rounds. The post itself is actually a month old already but it was republished by the LSE blog which gave it some additional publicity. In it she makes an impassioned argument for open data sharing and discusses the fears and criticisms many researchers have voiced against data sharing.

I have long believed in making all data available (and please note that in the following I will always mean data and materials, so not just the results but also the methods). The way I see it this transparency is the first and most important remedy to the ills of scientific research. I have regular discussions with one of my close colleagues* about how to improve science – we don’t always agree on various points like preregistration, but if there is one thing where we are on the same page, it is open data sharing. When data are available, anyone can reanalyse them, check whether the results reproduce, and test the robustness of a finding for themselves, if they feel that they should. Moreover, by documenting and organising your data you not only make it easier for other researchers to use, but also for yourself and your lab colleagues. It also helps you with spotting errors. It is also a good argument that stops reviewer 2 from requesting a gazillion additional analyses – if they really think these analyses are necessary they can do them themselves and publish them. This aspect in fact overlaps greatly with the debate on Registered Reports (RR) and it is one of the reasons I like the RR concept. But the benefits of data sharing go well beyond this. Access to the data will allow others to reuse the data to answer scientific questions you may not even have thought of. They can also be used in meta-analyses. With the increasing popularity and feasibility of large-scale permutation/bootstrapping methods, access to the raw values will also become particularly important. Access to the data allows you to take into account distributional anomalies, outliers, or perhaps estimate the uncertainty on individual data points.

But as Dorothy describes, many scientists nevertheless remain afraid of publishing their actual data alongside their studies. For several years many journals and funding agencies have had a policy that data should always be shared upon request – but a laughably small proportion of such requests are successful. This is why some have now adopted the policy that all data must be shared in repositories upon publication or even upon submission. And to encourage this process, the Peer Reviewers’ Openness Initiative was recently launched, under which signatories would refuse to conduct in-depth reviews of manuscripts unless the authors can give a reason why data and materials aren’t public.

My most memorable experience with fears about open data involves a case where the lab head refused to share data and materials with the graduate student* who actually created the methods and collected the data. The exact details aren’t important. Maybe one day I will talk more about this little horror story… For me this demonstrates how far we have come already. Nowadays that story would be baffling to most researchers but back then (and that’s only a few years ago – I’m not that old!) more than one person actually told me that the PI and university were perfectly justified in keeping the student’s results and the fruits of their intellectual labour under lock and key.

Clearly, people are still afraid of open data. Dorothy lists the following reasons:

  1. Lack of time to curate data; Data are only useful if they are understandable, and documenting a dataset adequately is a non-trivial task;
  2. Personal investment – sense of not wanting to give away data that had taken time and trouble to collect to other researchers who are perceived as freeloaders;
  3. Concerns about being scooped before the analysis is complete;
  4. Fear of errors being found in the data;
  5. Ethical concerns about confidentiality of personal data, especially in the context of clinical research;
  6. Possibility that others with a different agenda may misuse the data, e.g. perform selective analysis that misrepresented the findings;

In my view, points 1-4 are invalid arguments even if they seem understandable. I have a few comments about some of these:

The fear of being scooped 

I honestly am puzzled by this one. How often does this actually happen? The fear of being scooped is widespread and it may occasionally be justified. Say, if you discuss some great idea you have or post a pilot result on social media perhaps you shouldn’t be surprised if someone else agrees that the idea is great and also does it. Some people wouldn’t be bothered by that but many would and that’s understandable. Less understandable to me is if you present research at a conference and then complain about others publishing similar work because they were inspired by you. That’s what conferences are for. If you don’t want that to happen, don’t go to conferences. Personally, I think science would be a lot better if we cared a lot less about who did what first and instead cared more about what is true and how we can work together…

But anyway, as far as I can see none of that applies to data sharing. By definition data you share is either already published or at least submitted for peer review. If someone reuses your data for something else they have to cite you and give you credit. In many situations they may even do it in collaboration with you which could lead to coauthorship. More importantly, if the scooped result is so easily obtained that somebody beats you to it despite your head start (it’s your data, regardless of how well documented it is you will always know it better than some stranger) then perhaps you should have thought about that sooner. You could have held back on your first publication and combined the analyses. Or, if it really makes more sense to publish the data in separate papers, then you could perhaps declare that the full data set will be shared after the second one is published. I don’t really think this is necessary but I would accept that argument.

Either way, I don’t believe being scooped by data sharing is very realistic and any cases of that happening must be extremely rare. But please share these stories if you have them to prove me wrong! If you prefer, you can post it anonymously on the Neuroscience Devils. That’s what I created that website for.

Fear of errors being discovered

I’m sure everyone can understand that fear. It can be embarrassing to have your errors (and we all make mistakes) being discovered – at least if they are errors with big consequences. Part of the problem is also that all too often the discovery of errors is associated with some malice. To err is human, to forgive divine. We really need to stop treating every time somebody’s mistakes are being revealed (or, for that matter, when somebody’s findings fail to replicate) as an implication of sloppy science or malpractice. Sometimes (usually?) mistakes are just mistakes.

Probably nobody wants to have all of their data combed by vengeful sleuths nitpicking every tiny detail. If that becomes excessive and the same person is targeted, it could border on harassment and that should be counteracted. In-depth scrutiny of all the data by a particular researcher should be a special case that only happens when there is a substantial reason, say, in a fraud investigation. I would hope though that these cases are also rare.

And surely nobody can seriously want the scientific record to be littered with false findings, artifacts, and coding errors. I am not happy if someone tells me I made a serious error but I would nonetheless be grateful to them for telling me! It has happened before when lab members or collaborators spotted mistakes I made. In turn I have spotted mistakes colleagues made. None of this would have been possible if we didn’t share our data and methods with one another. I am always surprised when I hear how uncommon this seems to be in some labs. Labs should be collaborative, and so should science as a whole. And as I already said, organising and documenting your data actually helps you to spot errors before the work is published. If anything, data sharing reduces mistakes.

Ethical issues with patient confidentiality

This is a big concern – and the only one that I have full sympathy with. But all of our ethics and data protection applications actually discuss this. The only data that is shared should be anonymised. Participants should only be identified by unique codes that only the researchers who collected the data have access to. For a lot of psychology or other behavioural experiments this shouldn’t be hard to achieve.

Neuroimaging or biological data are a different story. I have a strict rule for my own results. We do not upload the actual brain images of our fMRI experiments to public repositories. While under certain conditions I am willing to share such data upon request as long as the participant’s name has been removed, I don’t think it is safe to make those data permanently available to the entire internet. Participant confidentiality must trump the need for transparency. It simply is not possible to remove all identifying information from these files. Skull-stripping, which removes the head tissues from an MRI scan except for the brain, does not remove all identifying information. Brains are like fingerprints and they can easily be matched up, if you have the required data. As someone* recently said in a discussion of this issue, the undergrad you are scanning in your experiment now may be Prime Minister in 20 to 30 years’ time. They definitely didn’t consent to their brain scans being available to anyone. It may not take much to identify a person’s data using only their age, gender, handedness, and a basic model of their head shape derived from their brain scan. We must also keep in mind what additional data mining may be possible in the coming decades that we simply have no idea about yet. Nobody can know what information could be gleaned from these data, say, about health risks or personality factors. Sharing this without very clear informed consent (that many people probably wouldn’t give) is in my view irresponsible.

I also don’t believe that for most purposes this is even necessary. Most neuroimaging studies involve group analyses. In those you first spatially normalise the images of each participant and then perform statistical analysis across participants. It is perfectly reasonable to make those group results available. For the purpose of non-parametric permutation analyses (also in the news recently) you would want to share individual data points but even there you can probably share images after sufficient processing that not much incidental information is left (e.g. condition contrast images). In our own work, these considerations don’t apply. We conduct almost all our analyses in the participant’s native brain space. As such we decided to only share the participants’ data projected on a cortical reconstruction. These data contain the functional results for every relevant voxel after motion correction and signal filtering. No, this isn’t raw data but it is sufficient to reproduce the results and it is also sufficient for applying different analyses. I’d wager that for almost all purposes this is more than enough. And again, if someone were to be interested in applying different motion correction or filtering methods, this would be a negotiable situation. But I don’t think we need to allow unrestricted permanent access for such highly unlikely purposes.

Basically, rather than sharing all raw data I think we need to treat each data set on a case-by-case basis and weigh the risks against benefits. What should be mandatory in my view is sharing all data after default processing that is needed to reproduce the published results.

People with agendas and freeloaders

Finally, a few words about a combination of points 2 and 6 in Dorothy Bishop’s list. When it comes to controversial topics (e.g. climate change, chronic fatigue syndrome, to name a few examples where this apparently happened) there could perhaps be the danger that people with shady motivations will reanalyse and nitpick the data to find fault with them and discredit the researcher. More generally, people with limited expertise may conduct poor reanalyses. Since failed reanalyses (and again, the same applies to failed replications) often cause quite a stir and are frequently discussed as evidence that the original claims were false, this could indeed be a problem. Also some will perceive these cases as “data tourism”, using somebody else’s hard-won results for quick personal gain – say by making a name for themselves as a cunning data detective.

There can be some truth in that and for that reason I feel we really have to work harder to change the culture of scientific discourse. We must resist the bias to agree with the “accuser” in these situations. (Don’t pretend you don’t have this bias because we all do. Maybe not in all cases but in many cases…)

Of course skepticism is good. Scientists should be skeptical but the skepticism should apply to all claims (see also this post by Neuroskeptic on this issue). If somebody reanalyses somebody else’s data using a different method that does not automatically make them right and the original author wrong. If somebody fails to replicate a finding, that doesn’t mean that finding was false.

Science thrives on discussion and disagreement. The critical thing is that the discussion is transparent and public. Anyone who has an interest should have the opportunity to follow it. Anyone who is skeptical of the authors’ or the reanalysers’/replicators’ claims should be able to check for themselves.

And the only way to achieve this level of openness is Open Data.


* They will remain anonymous unless they want to join this debate.

On the value of unrecorded piloting

In my previous post, I talked about why I think all properly conducted research should be published. Null results are important. The larger scientific community needs to know whether or not a particular hypothesis has been tested before. Otherwise you may end up wasting somebody’s time because they repeatedly try in vain to answer the same question. What is worse, we may also propagate false positives through the scientific record because failed replications are often still not published. All of this contributes to poor replicability of scientific findings.

However, the emphasis here is on ‘properly conducted research’. I already discussed this briefly in my post but it also became the topic of an exchange between (for the most part) Brad Wyble, Daniël Lakens, and myself. In some fields, for example psychophysics, extensive piloting and “fine-tuning” of experiments is not only very common but probably also necessary. To me it doesn’t seem sensible to make the results of all of these attempts publicly available. This would inevitably flood the scientific record with garbage. Most likely nobody will look at it. Even if you are a master at documenting your work, nobody but you (and after a few months maybe not even you) will understand what is in your archive.

Most importantly, it can actually be extremely misleading for others who are less familiar with the experiment to see all of the tests you did ensuring the task was actually doable, that monitors were at the correct distance from the participant, your stereoscope was properly aligned, the luminance of the stimuli was correct, that the masking procedure was effective, etc. Often you may only realise during your piloting that the beautiful stimulus you designed after much theoretical deliberation doesn’t really work in practice. For example, you may inadvertently induce an illusory percept that alters how participants respond in the task. This in fact happened recently with an experiment a collaborator of mine piloted. And more often than not, after having tested a particular task on myself at great length I then discover that it is far too difficult for anyone else (let’s talk about overtrained psychophysicists another time…).

Such pilot results are not very meaningful

It most certainly would not be justified to include them in a meta-analysis to quantify the effect – because they presumably don’t even measure the same effect (or at least not very reliably). A standardised effect size, like Cohen’s d, is a signal-to-noise ratio as it compares an effect (e.g. difference in group means) to the variability of the sample. The variability is inevitably larger if a lot of noisy, artifactual, and quite likely erroneous data are included. While some degree of this can be accounted for in meta-analysis by using a random-effects model, it simply doesn’t make sense to include bad data. We are not interested in the meta-effect, that is, the average result over all possible experimental designs we can dream up, no matter how inadequate.
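As a rough illustration of this point, here is a minimal Python sketch using only the standard library. All numbers are made up for the example: a “proper” experiment with a true difference of 1 unit and a standard deviation of 1, and pilot runs that measure no effect at all with a standard deviation of 4. The function name `cohens_d` and the pooled-standard-deviation formula are one common convention and are assumptions here, not taken from any particular study.

```python
import random
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical "proper" experiment: true difference of 1, SD of 1
clean_a = [rng.gauss(1.0, 1.0) for _ in range(200)]
clean_b = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Hypothetical pilot runs: the task did not work, so no effect and lots of noise
pilot_a = [rng.gauss(0.0, 4.0) for _ in range(200)]
pilot_b = [rng.gauss(0.0, 4.0) for _ in range(200)]

d_clean = cohens_d(clean_a, clean_b)
d_mixed = cohens_d(clean_a + pilot_a, clean_b + pilot_b)

print(f"d from the proper experiment only: {d_clean:.2f}")
print(f"d after pooling in the pilot data: {d_mixed:.2f}")
```

Under these assumptions the pooled estimate should come out far smaller than the one from the proper experiment alone, because the pilot data both dilute the mean difference and inflate the variability in the denominator.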

What we are actually interested in is some biological effect and we should ensure that we take the most precise measurement possible. Once you have a procedure that you are confident will yield precise measurements, by all means, carry out a confirmatory experiment. Replicate it several times, especially if it’s not an obvious effect. Pre-register your design if you feel you should. Maximise statistical power by testing many subjects if necessary (although often significance is tested on a subject-by-subject basis, so massive sample sizes are really overkill as you can treat each participant as a replication – I’ll talk about replication in a future post so I’ll leave it at this for now). But before you do all this you usually have to fine-tune an experiment, at least if it is a novel problem.

Isn’t this contributing to the problem?

Colleagues in social/personality psychology often seem to be puzzled and even concerned by this. The opacity of what has or hasn’t been tried is part of the problems that plague the field and lead to publication bias. There is now a whole industry meta-analysing results in the literature to quantify ‘excess significance’ or a ‘replication index’. This aims to reveal whether some additional results, especially null results, may have been suppressed or if p-hacking was employed. Don’t these pilot experiments count as suppressed studies or p-hacking?

No, at least not if this is done properly. The criteria you use to design your study must of course be orthogonal to and independent from your hypothesis. Publication bias, p-hacking, and other questionable practices are all actually sub-forms of circular reasoning: You must never use the results of your experiment to inform the design as you may end up chasing (overfitting) ghosts in your data. Of course, you must not run 2-3 subjects on an experiment, look at the results and say ‘The hypothesis wasn’t confirmed. Let’s tweak a parameter and start over.’ This would indeed be p-hacking (or rather ‘result hacking’ – there are usually no p-values at this stage).
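To see why the “tweak and start over” approach is a problem, here is a small simulation sketch in Python (standard library only). It makes several simplifying assumptions that are mine and not part of the argument above: there is no true effect at all, each attempt tests 3 subjects, the noise has a known standard deviation of 1 so that a simple z-test can be used, and the tweaks do not change anything about the underlying (null) effect. The function names are hypothetical.

```python
import random
from statistics import NormalDist, mean

STANDARD_NORMAL = NormalDist()


def attempt_is_significant(rng, n_subjects, alpha=0.05):
    """One attempt under the null: true effect is zero, SD assumed known (= 1)."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    z = mean(sample) * n_subjects ** 0.5
    p = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))  # two-sided p-value
    return p < alpha


def false_positive_rate(max_attempts, n_sims=20000, n_subjects=3, seed=0):
    """Proportion of simulated 'projects' that end with a significant result
    when the experimenter restarts up to max_attempts times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # any() stops at the first significant attempt, like an experimenter
        # who stops tweaking as soon as the result "works"
        if any(attempt_is_significant(rng, n_subjects) for _ in range(max_attempts)):
            hits += 1
    return hits / n_sims


rate_single = false_positive_rate(max_attempts=1)
rate_restarts = false_positive_rate(max_attempts=5)

print(f"false positive rate with one attempt:       {rate_single:.3f}")
print(f"false positive rate with up to 5 attempts:  {rate_restarts:.3f}")
```

With independent attempts the chance of at least one false positive is 1 − (1 − α)^k, so with α = 0.05 and k = 5 the simulated rate should land near 0.23 rather than 0.05. Design choices based on sanity checks that are independent of the hypothesis do not have this property.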

A real example

I can mainly speak from my own experience but typically the criteria used to set up psychophysics experiments are sanity/quality checks. Look for example at the figure below, which shows a psychometric curve of one participant. The experiment was a 2AFC task using the method of constant stimuli: In each trial the participant made a perceptual judgement on two stimuli, one of which (the ‘test’) could vary physically while the other remained constant (the ‘reference’). The x-axis plots how different the two stimuli were, so 0 (the dashed grey vertical line) means they were identical. To the left or right of this line the correct choice would be the reference or test stimulus, respectively. The y-axis plots the percentage of trials the participant chose the test stimulus. By fitting a curve to these data we can extrapolate the ability of the participant to tell apart the stimuli – quantified by how steep the curve is – and also their bias, that is at what level of x the two stimuli appeared identical to them (dotted red vertical line):

[Figure: “Good” psychometric curve]

As you can tell, this subject was quite proficient at discriminating the stimuli because the curve is rather steep. At many stimulus levels the performance is close to perfect (that is, either near 0 or 100%). There is a point where performance is at chance (dashed grey horizontal line). But once you move to the left or the right of this point performance becomes good very fast. The curve is however also shifted considerably to the right of zero, indicating that the participant indeed had a perceptual bias. We quantify this horizontal shift to infer the bias. This does not necessarily tell us the source of this bias (there is a lot of literature dealing with that question) but that’s beside the point – it clearly measures something reliably. Now look at this psychometric curve instead:


The general conventions here are the same but these results are from a completely different experiment that clearly had problems. This participant did not make correct choices very often as the curve only barely goes below the chance line – they chose the test stimulus far too often. There could be numerous reasons for this. Maybe they didn’t pay attention and simply made the same choice most of the time. For that the trend is a bit too clean though. Perhaps the task was too hard for them, maybe because the stimulus presentation was too brief. This is possible although it is very unlikely that a healthy, young adult with normal vision would not be able to tell apart the more extreme stimulus levels with high accuracy. Most likely, the participant did not really understand the task instructions or perhaps the stimuli created some unforeseen effect (like the illusion I mentioned before) that actually altered what percept they were judging. Whatever the reason, there is no correct way to extrapolate the psychometric parameters here. The horizontal shift and the slope are completely unusable. We see an implausibly poor discrimination performance and extremely large perceptual bias. If their vision really worked this way, they should be severely impaired…

So these data are garbage. It makes no sense to meta-analyse biologically implausible parameter estimates. We have no idea what the participant was doing here and thus we can also have no idea what effect we are measuring. This particular example actually comes from a participant whom a student tested as part of their project. Had you run this pilot experiment on yourself (or a colleague), you might have worked out the reason for the poor performance.

What can we do about it?

In my view, it is entirely justified to exclude such data from our publicly shared data repositories. It would be a major hassle to document all these iterations. And what is worse, it would obfuscate the results for anyone looking at the archive. If I look at a data set and see a whole string of brief attempts from a handful of subjects (usually just the main author), I could be forgiven for thinking that something dubious is going on here. However, in most cases this would be unjustified and a complete waste of everybody’s time.

At the same time, however, I also believe in transparency. Unfortunately, some people do engage in result-hacking and iteratively enhance their findings by making the experimental design contingent on the results. In most such cases this is probably not done deliberately and with malicious intent – but that doesn’t make it any less questionable. All too often people like to fiddle with their experimental design while the actual data collection is already underway. In my experience this tendency is particularly severe among psychophysicists who moved into neuroimaging where this is a really terrible (and costly) idea.

How can we reconcile these issues? In my mind, the best way is perhaps to document briefly what you did to refine the experimental design. We honestly don’t need or want to see all the failed attempts at setting up an experiment but it could certainly be useful to have an account of how the design was chosen. What experimental parameters were varied? How and why were they chosen? How many pilot participants were there? This last point is particularly telling. When I pilot something, there usually is one subject: Sam. Possibly I will have also tested one or two others, usually lab members, to see if my familiarity with the design influences my results. Only if the design passes quality assurance, say by producing clear psychometric curves or by showing to-be-expected results in a sanity check (e.g., the expected response on catch trials), would I dare to actually subject “real” people to a novel design. Having some record, even if only as part of the documentation of your data set, is certainly a good idea though.

The number of participants and pilot experiments can also help you judge the quality of the design. Such “fine-tuning” and tweaking of parameters isn’t always necessary – in fact most designs we use are actually straight-up replications of previous ones (perhaps with an added condition). I would say though that in my field this is a very normal thing to do, at least when setting up a new design. However, I have also heard of extreme cases that I find fishy. (I will spare you the details and will refrain from naming anyone.) For example, in one study the experimenters ran over 100 pilot participants – tweaking the design all along the way – to identify those that showed a particular perceptual effect and then used literally a handful of these for an fMRI study that claims to have been about “normal” human brain function. Clearly, this isn’t alright. But this also cannot possibly count as piloting anymore. The way I see it, a pilot experiment can’t have an order of magnitude more data than the actual experiment…

How does this relate to the wider debate?

I don’t know how applicable these points are to social psychology research. I am not a social psychologist and my main knowledge about their experiments comes from reading particularly controversial studies or the discussions about them on social media. I guess that some of these issues do apply but that it is far less common. An equivalent situation to what I describe here would be that you redesign your questionnaire because people always score at maximum – and by ‘people’ I mean the lead author :P. I don’t think this is a realistic situation in social psychology, but it is exactly how psychophysical experiments work. Basically, what we do in piloting is what a chemist would do when they are calibrating their scales or cleaning their test tubes.

Or here’s another analogy using a famous controversial social psychology finding we discussed previously: Assume you want to test whether some stimulus makes people walk more slowly as they leave the lab. What I do in my pilot experiments is to ensure that the measurement I take of their walking speed is robust. This could involve measuring the walking time for a number of people before actually doing any experiment. It could also involve setting up sensors to automate this measurement (more automation is always good to remove human bias but of course this procedure needs to be tested too!). I assume – or I certainly hope so at least – that the authors of these social psychology studies did such pre-experiment testing that was not reported in their publications.

As I said before, humans are dirty test tubes. But you should ensure that you get them as clean as you can before you pour in your hypothesis. Perhaps a lot of this falls under methods we don’t report. I’m all for reducing this. Methods sections frequently lack necessary detail. But to some extent, I think some unreported methods and tests are unavoidable.

Humans apparently also glow with unnatural light