Some questions about Registered Reports

Recently I participated in an event organised by PhD students in Experimental Psychology at UCL called “Is Science broken?”. It involved a lively panel discussion in which the panel members answered many questions from the audience about how we feel science can be improved. The opening of the event was a talk by Chris Chambers of Cardiff University about Registered Reports (RR), a new publishing format in which researchers preregister their introduction and methods sections before any data collection takes place. Peer review takes place in two stages: first, reviewers evaluate the appropriateness of the question and the proposed experimental design and analysis procedures, and then, after data collection and analysis have been completed and the results are known, peer review continues to finalise the study for publication. This approach aims to free scientific publishing from the pressure to get perfect results or to change one’s apparent hypothesis depending on the outcome of the experiments.

Chris’ talk was in large part a question and answer session for specific concerns with the RR approach that had been raised at other talks or in writing. Most of these questions he (and his coauthors) had already answered in a similar way in a published FAQ paper. However, it was nice to see him talk so passionately about this topic. Also, speaking for myself at least, seeing a person argue their case is usually far more compelling than reading an article on it – even though the latter will in the end probably have a wider reach.

Here I want to raise some additional questions about the answers Chris (and others) have given to some of these specific concerns. The purpose in doing so is not to “bring about death by a thousand cuts” to the RR concept as Aidan Horner calls it. I completely agree with Aidan that many concerns people have with RR (and lighter forms of preregistration) are probably logistical. It may well be that some people just really want to oppose this idea and are looking for any little reason to use as an excuse. However, I think both sides of the debate about this issue have suffered from a focus on potentials rather than fact. We simply won’t know how much good or bad preregistration will do for science unless we try it. This seems to be a concept that everyone at the discussion was very much in agreement on and we all discussed ways in which we could actually assess the evidence for whether RRs improved science over the next few decades.

Therefore I want it to be clear that the points I raise are not an ideological opposition to preregistration. Rather they are some points where I found the answers Chris describes to be not entirely satisfying. I very much believe that preregistration must be tried but I want to provoke some thought about possible problems with it. The sooner we are aware of these issues, the sooner they can be fixed.

Wouldn’t reviewers rely even more on the authors’ reputation?

In Stage 1 of an RR, when only the scientific question and experimental design are reviewed, reviewers have little to go on to evaluate the protocol. Provided that the logic of the question and the quality of the design are evident, they would hopefully be able to make some informed decisions about it. However, I think it is a bit naive to assume that the reputation of the authors isn’t going to influence the reviewers’ judgements. I have heard of many grant reviews asking questions as to whether the authors would be capable of pulling off some proposed research. There is an extensive research literature on how the evaluation of identical exam papers or job applications or whatnot can be influenced by factors like the subject’s gender or name. I don’t think simply saying “Author reputation is not among” the review criteria is enough of a safeguard.

I also don’t think that having a double-blind review system is necessarily a good way to protect against this. There have been wider discussions about the shortcomings of double-blind reviews and this situation is no different. In many situations you could easily guess the authors’ identity by the experimental protocol alone. And double-blind review suffers even more from one of the main problems with anonymous reviewers (which I generally support): when the reviewers guess the authors’ identities incorrectly that could have even worse consequences because their decision will be based on an incorrect assessment of the authors.

Can’t people preregister experiments they have already completed?

The general answer here is that this would constitute fraud. The RR format would also require time stamped data files and lab logs to guarantee that data were produced only after the protocol has been registered. Both of these points are undeniably true. However, while there may be such a thing as an absolute ethical ideal, in the end a lot of our ethics are probably governed by majority consensus. That many questionable research practices are apparently so widespread presumably reflects just that: while most people deep down understand that these things are “not ideal”, they may nonetheless engage in them because they feel that “everybody else does it.”

For instance, I often hear that people submit grant proposals for experiments they have already completed although I have personally never seen this with any grant proposals. I have also heard that it is more common in the US perhaps but at least based on all the anecdotes it may generally be widespread. Surely this is also fraudulent but nevertheless people apparently do it?

Regarding time stamped data, I also don’t know if this is necessarily a sufficiently strong safeguard. For the most part, time stamps are pretty easy to “adjust”. Crucially, I don’t think many reviewers or post-publication commenters will go through the trouble of checking them. Faking time stamps is certainly deep into fraud territory but people’s ethical views are probably not all black and white. I could easily see some people bending the rules just that little, especially if preregistered studies become a new gold standard in the scientific community.

Now perhaps this is a bit too pessimistic a view of our colleagues. I agree that we probably should not exaggerate this concern. But given the concerns people have with questionable research practices now I am not entirely sure we can really just dismiss this possibility by saying that this would be fraud.

Finally, another answer to this concern is that preregistering your experiment after the fact would backfire because the authors could then not implement any changes suggested by reviewers in Stage 1. However, this only applies to changes in the experimental design, the stimuli or apparatus etc. The most confusing corners in the “garden of forking paths” are usually in the analysis procedure, not the design. There are only so many ways to run a simple experiment and most minor changes suggested by a reviewer could easily be dismissed by authors. However, changes to the analysis approach could quite easily be implemented after the results are known.

Reviewers could steal my preregistered protocol and scoop me

I agree that this concern is not overly realistic. In fact, I don’t even believe the fear of being scooped is overly realistic. I’m sure it happens (and there are some historical examples) but it is far rarer than most people believe. Certainly it is rather unlikely for a reviewer to do this. For one thing, time is usually simply not on their side. There is a lot that could be written about the fear of getting scooped and I might do that at some future point. But this is outside the scope of this post.

Whatever its actual prevalence, the paranoia (or boogieman) of scooping is clearly widespread. Until we find a way to allay this fear I am not sure that it will be enough to tell people that the Manuscript Received date of a preregistered protocol would clearly show who had the idea sooner. First of all, the Received date doesn’t really tell you when somebody had an idea. The “scooper” could always argue that they had the idea before that date but only ended up submitting the study afterwards (and I am sure that actually happens fairly often without scooping).

More importantly though, one of the main reasons people are afraid of being scooped is the pressure to publish in high impact journals. Having a high impact publication has greater currency than the Received date of an RR in what is most likely a lower impact journal. I doubt many people would actually check the date unless you specifically point it out to them. We already now have a problem with people not reading the publications of job and grant applicants but relying on metrics like impact factors and h-indices. I don’t easily see them looking through that information.

As a junior researcher I must publish in high impact journals

I think this is an interesting issue. I would love nothing more than if we could stop caring about who published what where. At the same time I think that there is a role for high impact journals like Nature, Science or Neuron (seriously folks, PNAS doesn’t belong in that list – even if you didn’t boycott it like me…). I would like the judgement of scientific quality and merit to be completely divorced from issues of sensationalism, novelty, and news that quite likely isn’t the whole story. I don’t know how to encourage that change though. Perhaps RRs can help with that but I am not sure they’re enough. Either way, it may be a foolish and irrational fear but I know that as a junior scientist I (and my postdocs and students) currently do seek to publish at least some (but not exclusively) “high impact” research to be successful. But I digress.

Chris et al. write that sooner or later high impact outlets will probably come on board with offering RRs. I don’t honestly see that happening, at least not without a much more wide-ranging change in culture. I think RRs are a great format for specialised journals to have. However, the top impact journals primarily exist for publishing exciting results (whatever that means). I don’t think they will be keen to open the floodgates for lots of submissions that aim to test exciting ideas but fail to deliver the results to match them. What I could see perhaps is a system in which a journal like Nature would review a protocol and agree to publish it in its flagship journal if the results are positive but in its lower-impact outlet (e.g. Nature Communications) if the results are negative. The problem with this idea is that it somehow goes against the egalitarian philosophy of the current RR proposals. The publication again would be dependent on the outcome of the experiments.

Registered Reports are incompatible with short student projects

After all the previous fairly negative points I think this one is actually about a positive aspect of science. For me this is one of the greatest concerns. In my mind this is a very valid worry and Chris and co also acknowledge this in their article. I think RRs would be a viable solution for experiments by a PhD student but for master’s students, who are typically around for a few months only, it is simply not very realistic to first submit a protocol and revise it over weeks and months of reviews before even collecting the first data.

A possible solution suggested for this problem is that you could design the experiments and have them approved by peer reviewers before the students commence. I think this is a terrible idea. For me perhaps the best part of supervising student projects in my lab is when we discuss the experimental design. The best students typically come with their own ideas and make critical suggestions and improvements to the procedure. Not only is this phase very enjoyable but I think designing good experiments is also one of the most critical skills for junior scientists to learn. Having the designs finalised before the students even step through the door of the lab would undermine that.

Perhaps for those cases it would make more sense to just use light preregistration, that is, uploading your protocol to a timestamped archive but without external review. But if in the future RRs do become the new gold standard, I would worry that this denigrates the excellent research projects of many students.

Wrapping up…

As I said, these points are not meant to shoot down the concept of Registered Reports. Some of the points may not even be such enormous concerns at all. However, I hope that my questions provoke thought and that we can discuss ways to improve the concept further and find safeguards against these possible problems.

Sorry this post was very long as usual but there seems to be a lot to say. My next post though will be short, I promise! 😉

8 thoughts on “Some questions about Registered Reports”

  1. As someone who has been through the RR process, I’d like to give my views on some of these points.
    NN: Wouldn’t reviewers rely even more on the authors’ reputation? In the Stage 1 of an RR, when only the scientific question and experimental design are reviewed, reviewers have little to go on to evaluate the protocol.
    DB: Nope. You have to give a comprehensive account of what you plan, thinking through every aspect of rationale, methods and analysis: Cortex really doesn’t want to publish anything flawed and so they screw you down on the details.
    NN: I think it is a bit naive to assume that the reputation of the authors isn’t going to influence the reviewers’ judgements.
    DB: Why any more so than for other publication methods? I really find this concern quite an odd one.
    NN: Can’t people preregister experiments they have already completed? For the most part, time stamps are pretty easy to “adjust”.
    DB: Well, for a start registered reports require you to have very high statistical power and so humungous great N. Most people who just happened to do a study won’t meet that criterion. Second, as Chris pointed out in his talk, if you submit a registered report, then it goes out to review, and the reviewers do what reviewers do, i.e. make suggestions for changes, new conditions, etc etc. They do also expect you to specify your analysis in advance: that is one of the important features of RR. So you would be unlikely to get through the RR process even if you did decide to fake your time stamps (and let’s face it, if you’re going to do that, you are beyond redemption).
    NN: Reviewers could steal my preregistered protocol and scoop me
    DB: Since you yourself don’t find this all that plausible, I won’t rehearse the reasons why it isn’t.
    NN: As a junior researcher I must publish in high impact journals
    DB: Your view on this may be reinforced by PIs in your institution. However, be aware that there are some senior people who are more interested in whether your research is replicable than whether it is sexy. And who find the soundbite form of reporting required by Nature etc quite inadequate.
    NN: Registered Reports are incompatible with short student projects
    DB: Yup, but nobody is saying that short student projects – or indeed other types of research – have to be registered reports. As you point out, they could still benefit from having a registered protocol. It is easy to do that in Open Science Framework. My own view is that I would go for a registered report in cases where it is feasible, as it has three big benefits – 1) you get good peer review before doing the study, 2) it can be nice to have a guarantee of publication and 3) you don’t have to convince people that you didn’t make up the hypothesis after seeing the data. But where it’s not feasible, I’d go for a registered protocol on OSF which at least gives me (3).
    At the end of the day, I’d suggest you give it a try. If nothing else, I think you’d find the process of submitting an RR a useful exercise to go through that has potential to improve your research.


  2. Thanks for your very quick yet extensive reply! I will try to respond to your detailed points later, hopefully next week. But I certainly agree with your last paragraph. There is no real way of knowing how it will work unless we try it.

    And I’m coming to the conclusion that getting some expert feedback on a design before you begin is certainly a good idea. We are of course already doing this for imaging projects at least by having internal project presentations. At least in my department they are supposed to be uploaded to the web (but we’re admittedly way behind doing that). Still some wider feedback is probably a good idea especially because you can get reviewers who are more familiar with the topic than your departmental colleagues.


  3. Very thoughtful post. I also attended that meeting and like you, came away feeling there was a lot to like about the RR scheme (for *some* kinds of research), but also quite a few issues that still need resolving.

    RR seems like a good idea for things like clinical trials, or straightforward yes-no hypothesis-testing, but many of us do work that is far more exploratory and unpredictable. I think of it as like hacking through the jungle on a newly-discovered continent – you think you are hunting tigers but end up finding gold. This unpredictability generates hypotheses never considered at the beginning, which you have to test with further experiments, and that turns up more unexpected findings, and so on… all on far too fast a timescale to be practical in terms of going back and forth with editors and reviewers. So, I think RR is great for areas that are fairly well understood where gaps are being filled in (in fact I think it’s an excellent idea) but I can’t see it becoming universal. I don’t think that is the intention in fact, although some people have worried about that (http://bit.ly/1EzEl2H).

    It will be interesting to see how things develop in the coming few years.


    1. Hi Kate. RR does not preclude exploratory analysis – it just makes it clear what is exploratory and what is not.
      Perhaps the most informative experience I had was when I was playing with simulated random data. I was tired and forgot what I was doing: when I do these simulations I always label things as if they were real data, so I can have a context to relate them to. Well, I actually had the experience of finding a massive and exciting interaction – thought it was gold and tigers all rolled into one – until I remembered that I was analysing totally random data. So I’m not saying don’t explore, but you absolutely have to be clear when a result was obtained by exploration and when it was driven by an a priori test or you run the risk of fooling yourself.
      And I find that people who emphasise the importance of exploration always reply to this kind of argument by saying they will of course replicate the finding. In which case you are clearly in RR territory.


  4. Dorothy Bishop, can you expand on why you think it is crucial to distinguish exploratory findings from a priori predictions? The data don’t know which is which. So I’m assuming the argument is that it is crucial not to present an accidental finding as though you designed the study to look for it — for the sake of intellectual honesty and not ‘fooling yourself’. But in terms of what is true, if it is a real effect that stands up to replication, what does it matter whether the person/team that first found it predicted it or not?


    1. Jim: If you attach a p-value to a finding, that implies it was obtained by hypothesis-testing, not exploration. If you stumble across something, it makes no sense to do statistical tests that give you a p-value. Please see http://deevybee.blogspot.co.uk/2013/06/interpreting-unexpected-significant.html for further clarification.
      If, however, you stumble across something, formulate a hypothesis on that basis and replicate it, then I agree it doesn’t matter how you first arrived at the hypothesis.
      Clearest way I can show this is with simulated random numbers. Each row of this table is from a new run of random data submitted to 4 way ANOVA. http://www.slideshare.net/deevybishop/to-accompany-blogpost. What you can see is that, reading across rows, many of them find something significant (in white).
      So if I was just exploring the data and got excited about anything with low p-value, I’d kid myself.
      However, if I had predicted a specific effect (main effect or interaction) in advance, then the p-values make sense; on only around 1 in 20 runs do I find a ‘significant’ result in random data.
      This example also illustrates how if I found something by exploring, if it was really just a type I error, then I’d not replicate it on a new run.
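
The contrast between exploring and testing a prediction can be sketched with a small simulation. This is a rough illustration, not the analysis behind the linked table. It assumes a full factorial 4-way ANOVA, which has 15 effects (4 main effects and 11 interactions), and it treats the 15 p-values from one run of random data as independent draws from a uniform distribution. That independence is a simplifying assumption, since the F-tests in a real ANOVA share an error term. The names `simulate`, `N_EFFECTS` and `ALPHA` are made up for this sketch.

```python
import random

N_EFFECTS = 15  # 4 main effects + 6 two-way + 4 three-way + 1 four-way interaction
ALPHA = 0.05    # conventional significance threshold


def simulate(n_runs=20000, seed=1):
    """Return (rate of runs with any 'significant' effect,
    rate of runs where one pre-specified effect is 'significant')."""
    rng = random.Random(seed)
    any_hit = 0
    planned_hit = 0
    for _ in range(n_runs):
        # Under the null hypothesis each p-value is uniform on [0, 1).
        # Assumption: the 15 tests are treated as independent.
        p_values = [rng.random() for _ in range(N_EFFECTS)]
        if min(p_values) < ALPHA:   # exploring: get excited about anything
            any_hit += 1
        if p_values[0] < ALPHA:     # a priori: only the predicted effect counts
            planned_hit += 1
    return any_hit / n_runs, planned_hit / n_runs


exploratory_rate, planned_rate = simulate()
print(f"any effect 'significant':           {exploratory_rate:.3f}")
print(f"pre-specified effect 'significant': {planned_rate:.3f}")
print(f"analytic rate for any effect:       {1 - (1 - ALPHA) ** N_EFFECTS:.3f}")
```

Under these assumptions roughly half of the runs contain at least one "significant" effect, while the single predicted effect comes up in only about 1 in 20 runs.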


  5. Hi Sam – thanks for such a detailed and considered post and apologies for my Johnny-come-lately response! You raise many good points, which I’ll try to address in turn. I can’t promise these will be very satisfying answers because, like you, I think we’re too early in the life of RRs to know how serious some might be. But here goes anyway!

    1. On the issue of reviewers being swayed by an author’s reputation or track record in assessing a Stage 1 protocol, I agree that the RR mechanism isn’t completely immune to this problem. My response is twofold: first, conventional review is also vulnerable to this problem and I don’t think that seeing the results of an experiment necessarily neutralises or even reduces reputational bias. The exception to this may be that seeing the results can assure reviewers that the data passed sufficient quality checks (independent of the outcome of the hypothesis tests) but this is why the specification of outcome-neutral checks is such a core component of the Stage 1 and Stage 2 review criteria of an RR – the prospective specification of such checks is a necessary component at Stage 1 and failure to meet them is a formal reason for rejection at Stage 2.

    Of course this may not prevent a reviewer from, say, recommending the rejection of a protocol at Stage 1 because the authors have a limited track record in the field. Which brings me to my second point, which is the role of the editor: In such cases it is the job of the editor(s) to ensure that editorial decisions are not based on such assessments (if made explicitly) and to detect them and filter them out when insinuated. Where the reputation of the authors may be questioned directly or indirectly, it is incumbent on the reviewers (and editors) to express such concerns in terms of the *evidence* the authors would need to show at Stage 2 to confirm that their study was performed at a sufficiently high quality to justify publication. These can and should be identified a priori to prevent the bias that reviewers might “like” the outcomes of the hypothesis tests and so overlook quality concerns (or vice versa – shifting the goal posts for outcomes they dislike). It might be argued that by formally entering these criteria into the review process and requiring them to be stated prospectively, the RR mechanism may actually be more robustly protected against reputational bias than conventional review.

    2. On the issue of authors pre-registering protocols for studies already completed, I very much doubt that a majority consensus in our field would consider such behaviour in any way legal or even a “grey” research practice. Doing so is unequivocally fraudulent. Your point about the fact that researchers often submit grant applications for work that they have (at least partially) done already is well made, but the key difference in my mind is that “post-registration” requires an explicitly dishonest act, whereas submitting a grant based on already completed work is a very grey area which, to my knowledge, many funders don’t explicitly forbid. If the authors are genuinely planning to do the experiments again then the prior work can be entered as pilot data (I have done this occasionally in grant apps, transparently of course). If the authors plan to do completely *different* experiments once they get the money then I agree this is fraudulent (I have never done that!). But I’m not sure how common this latter kind of behaviour is; like you I have heard of occasional cases but I don’t know of any evidence to suggest it is the norm; therefore I don’t see why an even more egregious violation of such principles would present a serious threat to the integrity of RRs.

    Of course, the big reason why RR “post-registration” is futile is that reviewers *always* ask for changes to the experimental procedures. Examples at Cortex so far include everything from subtle changes to the experimental stimuli to the addition of new experimental conditions, more participants, and even different samples. Every RR has had to make at least some change to their proposed experimental procedure, and it is of course impossible (without a time machine) to change the procedural methodology of a “proposed” experiment after it has been completed. In such cases, dodgy authors would either have to withdraw their paper (bluff called) or they would have to deepen their fraud by falsely stating that they undertook a particular method at Stage 2 when they didn’t. This, in addition to faking the laboratory logs, is unambiguously fraudulent, and I can’t imagine the ORI, the authors’ employer, or the authors’ peers taking anything other than a very dim view of such behaviour.

    3. On the issue of scooping, my view is that this is a virtual non issue in our field, and certainly no more a concern for RRs than that faced by researchers when they submit a grant proposal, an ethics application, give a talk at a conference that includes mention of any prospective plans, or even submit a standard (non-RR) manuscript with results that is rejected after review. For RRs, the ‘protocol received’ date at least allows an author to prove that where something very similar was published, they had the idea first (if “being first” even matters – in most cases it is irrelevant). I agree with you that this kind of detail might be lost in the flurry of an unscrupulous RR reviewer publishing a Nature paper, but the scooper in this case might find that such behaviour backfires in terms of their own reputation. Scientists love to gossip and mud sticks! In any case I think such circumstances are so rare that they present no significant threat to RRs.

    4. On the issue of junior researchers needing to publish in high impact journals, I agree with what you say here. As I said in my talk at the UCL event, I don’t believe it should ever fall on junior researchers to drive reforms, though they should be encouraged to do so where it is in their interests, and it is our job as the more senior members of the field to create positive incentives that make it in their best interests. For example, when I advertise for my two 4-year postdoctoral positions next year (attached to my ERC consolidator grant), the essential criteria will include that applicants have made a demonstrable contribution to research transparency, either by publicly archiving data at the point of publication (for at least one prior publication), or by pre-registering a protocol. You are right that it is harder to convince high-impact journals to adopt RRs because the publication bias that RRs prevent is their modus operandi, but without saying too much (yet) I can tell you that talking to them isn’t futile (watch this space…)

    5. On the issue of RRs being incompatible with short student projects (like those of undergraduates or MSc students), I agree that this might be true depending on the time scale and the extent to which the student is responsible for the design of the study. It will be interesting to see how this pans out. Perhaps student projects are better aligned with more exploratory research rather than confirmatory hypothesis testing, or perhaps we can adopt different ways of doing student research at junior levels. For instance, we could adopt a daisy-chain model where students propose an experiment with their supervisor, which is submitted as an RR but conducted by the *next* student, who in turn undertakes that study while pre-registering a future one. In this way, the students gain experience designing and conducting/analysing research within a time scale that is also compatible with RRs. In some ways this could actually be a superior training method because preparing a Stage 1 RR is an immensely valuable pedagogical tool for learning how to do deductive science.

    I’m not sure how satisfactory these responses are – as you say, we don’t have the evidence yet to really know what potential unintended consequences RRs might have. The best thing we can do, in my view, is widen their availability as much as possible and collect that evidence, adjusting the mechanism as necessary to make it as robust and fair as possible. Thanks for the interesting discussion!


    1. Thanks for your detailed response, Chris. I think a lot of what you say makes perfect sense. And as we both already said, in the end we need to wait and see. I hope we can look out for answers to these particular points. I’ll try to be very brief (really!) in my reply:

      1. I agree that reputation can be an issue with current review practices as well. I just wonder if that might get worse when you don’t see the results. My guess is such biases most likely won’t be explicitly stated but simply influence the depth and quality of the review. But with explicit review guidelines that may perhaps be avoided. Another thing that may help is to encourage pilot/preliminary data to be included with the Stage 1 submission to show feasibility (and I’m with you on grants – preliminary data makes total sense there as well). Considering you expect strict power calculations this could be useful anyway.

      2. Yes the more I keep thinking about it, the more I think it is probably unrealistic that many people will engage in misconduct of that scale. Then again I’ve heard and seen some bad things – I think it depends on how strict the journal is about the log and data availability. I think what might help there is if there is a fixed data log entry in your submission. As much as I hate form-filling, I think it would make sense to have the same system for everyone as that makes it easier for reviewers to check it. In the end, the greatest protection from that sort of thing is if it is substantially more difficult to fake a pre-registered study than it is to do one for real. It sounds like it might be.

      3. You are perhaps also right that the gossip surrounding even a purportedly scooped study would be tremendous. In general, we agree that scooping is a non-issue. It’s not like I don’t sometimes keep my ideas close to my chest before we start them but once a study is already in progress (which it is when you are in Stage 1 review) the fear of being scooped seems pretty paranoid. One of my collaborators actually had an issue that looked like someone was trying to scoop their ideas (I don’t believe they are. It’s probably a misremembering or a genuine duplicate idea. Great minds think alike…). I told my collaborator they should pre-register the experiment to avoid being scooped but they didn’t want to 😛

      4. Interesting about high impact journals. I’d be quite curious to see how they would implement it. As I said it might not be impossible. I am not sure I’d like the two-tier system I discussed but I suppose it may sometimes be possible to preregister very groundbreaking ideas and then publish the findings in Nature regardless of the outcome…

      5. I think it is probably right that short term student projects may be more amenable to exploratory work, pilot experiments etc. I suppose you still have the option of basic preregistration without reviews. I’m not so sure of the daisy chain idea. I agree it could be nice for teaching but I think it deprives the student of the ownership of their ideas. While I love designing experiments, I think it is also very rewarding to see the results coming in. But either way, this is not an insurmountable problem. It just means that short student projects may not be very suitable for the kind of research for which you should do RR.

