Of hacked peas and crooked teas

The other day, my twitter feed got embroiled in another discussion about whether or not p-hacking is deliberate and if it constitutes fraud. Fortunately, I then immediately left for a trip abroad and away from my computer, so there was no danger of me being drawn into this debate too deeply and running the risk of owing Richard Morey another drink. However, now that I am back I wanted to elaborate a bit more on why I think the way our field has often approached p-hacking is both wrong and harmful.

What the hell is p-hacking anyway? When I google it I get this Wikipedia article, which uses it as a synonym for “data dredging”. There we already have a term that seems more appropriate to me. P-hacking refers to massaging your data and analysis methods until your result reaches a statistically significant p-value. I put it to you that in practice most p-hacking is not really about hacking p’s but about dredging your data until your results fit a particular pattern. That may be something you predicted but didn’t find, or it could just be some chance finding that looked interesting and is amplified this way. Either way, the p-value itself is probably secondary to the act. The end result may well be the same, in that you keep abusing the data until a finding becomes significant, but I would bet that in most cases what matters to people is not the p-value but the result. Moreover, while null-hypothesis significance testing with p-values is still by far the most widespread way to make inferences about results, it is not the only way. All this fussing about p-hacking glosses over the fact that the same analytic flexibility or data dredging can be applied to any inference, whether it is based on p-values, confidence intervals, Bayes factors, posterior probabilities, or simple summary statistics. By talking of p-hacking we create the caricature that this is somehow a problem specific to p-values. Whether or not NHST is the best approach for making statistical inferences is a (much bigger) debate for another day – but it has little to do with p-hacking.
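To see how easily this happens, here is a minimal simulation – my own toy example, not from the original post, assuming numpy and scipy are available – of one common form of this flexibility: collecting more data whenever the result isn’t significant yet. Every individual test is computed correctly, and yet the false positive rate balloons far beyond the nominal 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_start, n_max, batch = 5000, 10, 100, 10
false_positives = 0

for _ in range(n_studies):
    data = rng.normal(0, 1, n_start)  # the null is true: population mean is 0
    while True:
        p = stats.ttest_1samp(data, 0).pvalue
        if p < .05:                   # "significant" -- stop and write it up
            false_positives += 1
            break
        if len(data) >= n_max:        # give up eventually
            break
        data = np.append(data, rng.normal(0, 1, batch))  # just test a few more

print(f"False positive rate: {false_positives / n_studies:.2f}")  # well above .05
```

Swap the p-value threshold for a Bayes factor cut-off or a confidence interval excluding zero and you get the same inflation – the dredging, not the statistic, is the problem.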

What is more, not only is p-hacking not really about p’s but it is also not really about hacking. Here is the dictionary entry for the term ‘hacking’. I think we can safely assume that when people say p-hacking they don’t mean that peas are physically being chopped or cut or damaged in any way. I’d also hazard a guess that it’s not meant in the sense of “to deal or cope with” p-values. In fact, the only meaning of the term that seems to come even remotely close is this:

“to modify a computer program or electronic device in a skillful or clever way”

Obviously, what is being modified in p-hacking is the significance or impressiveness of a result, rather than a computer program or electronic device, but we can let this slide. I’d also suggest that it isn’t always done in a skillful or clever way either, but perhaps we can ignore this too. However, the verb ‘hacking’ to me implies that this is done in a very deliberate way. It may even, as with computer hacking, carry the connotation of fraud, of criminal intent. I believe neither of these things is true of p-hacking.

That is not to say that p-hacking isn’t deliberate. I believe in many situations it likely is. People no doubt make conscious decisions when they dig through their data. But the overwhelming majority of p-hacking is not deliberately done to create spurious results that the researcher knows to be false. Anyone who does so would be committing actual fraud. Rather, most p-hacking is the result of confirmation bias combined with analytical flexibility. This leads people to sleepwalk into creating false positives or – as Richard Feynman would have called it – fooling themselves. Simine Vazire already wrote an excellent post about this a few years ago (and you may see a former incarnation of yours truly in the comment section arguing against the point I’m making here… I’d like to claim that it’s because I have grown as a person but in truth I only exorcised this personality :P). I’d also guess that a lot of p-hacking happens out of ignorance, although that excuse really shouldn’t fly as easily in 2017 as it may have done in 2007. Nevertheless, I am pretty sure people do not normally p-hack because they want to publish false results.

Some may say that it doesn’t matter whether or not p-hacking is fraud – the outcome is the same: many published results are false. But in my view it’s not so simple. First, the solution to these two problems surely isn’t the same. Preregistration and transparency may very well solve the problem of analytical flexibility and data dredging – but it is not going to stop deliberate fraud, nor is it meant to. Second, actively conflating fraud and data dredging implicitly accuses researchers of being deliberately misleading and thus automatically puts them on the defensive. This is hardly a way to have a productive discussion and convince people to do something about p-hacking. You don’t have to look very far for examples of that playing out. Several protracted discussions on a certain Psychology Methods Facebook group come to mind…

Methodological flexibility is a real problem. We definitely should do something about it and new moves towards preregistration and data transparency are at least theoretically effective solutions to improve things. The really pernicious thing about p-hacking is that people are usually entirely unaware of the fact that they are doing it. Until you have tried to do a preregistered study, you don’t appreciate just how many forks in the road you passed along the way (I may blog about my own experiences with that at some point). So implying, however unintentionally, that people are fraudsters is not helping matters.

Preregistration and data sharing have gathered a lot of momentum over the past few years. Perhaps the opinions of some old tenured folks opposed to such approaches no longer carry so much weight now, regardless of how powerful they may be. But I’m not convinced that this is true. Just because there has been momentum now does not mean that these ideas will prevail. It is just as likely that they fizzle out for lack of enthusiasm or because people begin to feel that the effort isn’t worth it. It seems to me that “open science” very much exists in a bubble and I have bemoaned that before. To change scientific practices we need to open the hearts and minds of sceptics to why p-hacking is so pervasive. I don’t believe we will achieve that by preaching to them. Everybody p-hacks if left to their own devices. Preregistration and open data can help protect you against your mind’s natural tendency to perceive patterns in noise. A scientist’s training is all about developing techniques to counteract this tendency, and open practices are just another tool for that purpose.

[Image: fish, chips and mushy peas]
There is something fishy about those pea values…

 

A few pints in Brussels

Good day to you, my name is David but my frenemies call me Tory Dave. Last night, I went to the pub with some mates of mine for some pints. I say “mates” but to be honest I don’t really like most of these people very much. They are all foreigners and – while I know it’s not okay to say this out loud – I must admit I don’t really like foreigners, except perhaps when they are white and come from one of the former colonies.

The Italian guy is just so lazy, always on siesta, and the Spaniard is from some filthy place like Venice or some such and always complains when I put chorizo in my risotto. That German lass, Angie, has no sense of humour and they always beat us at the football. But the worst is of course the French guy, Michel. I’ve never liked the French. He always smells of garlic and looks so sour. This is why I always pre-drink before I even meet those guys in the pub and why I look so cheerful in the photos.

So anyway, the others bought a few rounds. But when it came to my turn I just thought “Nah, you know what, I’ll just go home.” For some reason all the others got really pissed off that I wasn’t going to buy a round for them. At first I wanted to tell them to go whistle but then I took a deep breath lest they kick me out on the kerb without my umbrella and bowler hat. I told them I would carefully check every item on the bill and only pay for what I drank, but that they should keep in mind that I already paid loads many years ago even though nobody can really remember. They claim that the German, French, and Italian all put in more than me but of course that’s just because of that discount I’ve had in the pub for decades because I used to be a lot less well-off than I am now. I felt it was best not to bring that up though because that’s always been a sore spot with them. Instead I just shrugged and stumbled home. They are still asking me to pay them back now because apparently “we had all agreed on that” but I have no recollection of that at all. Perhaps I shouldn’t have had all that gin before joining them.

Anyway, I still haven’t paid and won’t give them a penny. All the same, I’m sure they will all be happy to see me again soon because they want to sell me their cars and prosecco. After all, they’ll know I’ve saved up for that because I didn’t waste my hard-earned cash on buying them rounds in the pub. Besides I have such excellent jams made from strawberries in my garden. Jam production costs me next to nothing because I just invite the neighbours around to pick them for a pittance. Shame is, those bastards always overstay their welcome so I have now told them they can come for two hours but then they have to go home. I don’t mean to sound ungrateful, some of the Poles actually came to fix my radiator once and the Dutch put a plaster on my scraped knee when I fell over drunk after the pub. And they’ve always put a couple of coins in the piggy bank (I’m saving for a cool model train for my kids). So I totally think these guys are valued members of our community here. I just don’t want them in my house.

Obviously, the ones I already let come in can stay, they just need to carry a card at all times so we know who they are and that they can get a cup of tea when I put on the kettle while the newcomers don’t. This is only fair. By the way, the card is totally not an “ID card” – I hate those obviously. It’s simply a card you use to identify yourself with. Surely, the other people coming in later won’t mind. They’re all happy they can come to pick my strawberries and then piss off again when I’m tired of hearing them speak that foreign gibberish. They steadfastly refuse to spend the fortune they make working for me on proper English lessons. Seriously, sometimes I don’t hear a word of English before I make it upstairs to the bedroom – and even then it’s mostly because I’m mumbling to myself about sovereignty and all the Islamic extremists from devout Muslim countries like Austria and Poland.

Naturally, the ID ca… sorry not-an-ID-card will only be required for people who don’t already live at this address, by which I mean those who were born here or who have lived at this address for over six years, took an exam on the history of my house that I couldn’t pass myself in a million years, and spent between £1200 and £2000 for the honour of pledging allegiance to the Queen. Because you can totally tell the difference between these people and the ones that didn’t. They just immediately become part of the family and show it by wearing tweed jackets, going on illegal fox hunts, and losing the ability to speak any other language but English.

Obviously, if anyone living at this address wants to marry someone who doesn’t they can just get the hell out. Why should I let some random slit-eyed or brown person live in my house just because my son or daughter wants to be with their spouse? Wait, did I say that out loud? I meant some person from one of those wonderful countries that I would like to sell lots of stuff to. You know, things like my strawberry jam.

Anyway, I digress. If those bullies in the pub keep insisting that I pay for a round then I’ll just walk away. No round is better than a bad round, by which I mean a round that isn’t free. If they don’t want to spend time with me, then I am sure I’ll find someone else to have drinks with. Like Theresa, the vicar’s daughter from down the road, or Moggy, who looks and talks more and more like a vicar every day himself. Or perhaps Boris the Blond although he’s really a bit of a clown. And of course I can always go over to the golf course for some well-done steak with ketchup and chlorine chicken with Donny. Provided he doesn’t go to jail first.


Angels in our midst?

A little more on “tone” – but also some science

This post is somewhat related to the last one and will be my last words on the tone debate*. I am sorry if calling it the “tone debate” makes some people feel excluded from participating in scientific discourse. I thought my last post was crystal clear that science should be maximally inclusive, that everyone has the right to complain about things they believe to be wrong, and that unacceptable behaviour should be called out. And certainly, I believe that those with the most influence have a moral obligation to defend those who are in a weaker position (with great power comes great responsibility, etc…). It is how I have always tried to act. In fact, not so long ago I called out a particularly bullish but powerful individual because, in my (and, for that matter, many other people’s) estimation, he repeatedly acts grossly inappropriately in post-publication peer review. In response, I and others have taken a fair bit of abuse from said person. Speaking more generally, I also feel that as a PI I have a responsibility to support those junior to me. I think my students and postdocs can all stand up for themselves, and I would support them in doing so, but in any direct confrontation I’ll be their first line of defense. I don’t think many who have criticised the “tone debate” would disagree with this.

The problem with arguments about tone is that they are often very subjective. The case I mentioned above is a pretty clear cut case. Many other situations are much greyer. More importantly, all too often “tone” is put forth as a means to silence criticism. Quite to the contrary of the argument that this “excludes” underrepresented groups from participating in the debate, it is used to categorically dismiss any dissenting views. In my experience, the people making these arguments are almost always people in positions of power.

A recent example of the tone debate

One of the many events that recently brought the question of tone to my mind was this tweet by Tom Wallis. On PubPeer** a Lydia Maniatis has been posting comments on what seems to be just about every paper published on psychophysical vision science.

I find a lot of things to be wrong with Dr Maniatis’ comments. First and foremost, it remains a mystery to me what actual point she is trying to make. I confess I must first read some of the literature she cites to comprehend the fundamental problem with vision science she clearly believes she has identified. Who knows, she might have an important theoretical point but it eludes me. This may very well be due to my own deficiency but it would help if she spelled it out more clearly for unenlightened readers.

The second problem with her comments is that they are in many places clearly uninformed with regard to the subject matter. It is difficult to argue with someone about the choices and underlying assumptions for a particular model of the data when they seemingly misapprehend what these parameters are. This is not an insurmountable problem and it may also partly originate in the lack of clarity with which they are described in publications. Try as you might***, to some degree your method sections will always make tacit assumptions about the methodological knowledge of the reader. A related issue is that she picks seemingly random statements from papers and counters them with quotes from other papers that often do not really support her point.

The third problem is that there is just so much of Maniatis’ comments! I probably can’t talk as I am known to write verbose blogs myself – but conciseness is a virtue in communication. In my scientific writing in manuscripts or reviews I certainly aim for it. Yet her comments on this paper by my colleague John Greenwood are a perfect example: by my count she expends 5262 words on this before giving John a chance to respond! Now perhaps the problems with that paper are so gigantic that this is justified but somehow I doubt it. Maniatis’ concern seems to be with the general theoretical background of the field. It seems to me that a paper or even a continuous blog would be a far better way to communicate her concerns than targeting one particular paper with this deluge. Even if the paper were a perfect example of the fundamental problem, it is hard to see the forest for the trees here. Furthermore, it also considerably worsens the signal-to-noise ratio of the PubPeer thread. If someone had an actual specific concern, say because they identified a major statistical flaw, it would be very hard to see it in this sea of Maniatis. Fortunately most of her other comments on PubPeer aren’t as extensive but they are still long and the same issue applies.

Why am I talking about this? Well, a fourth problem that people have raised is that her “tone” is unacceptable (see for example here). I disagree. If there is one thing I don’t take issue with it is her tone. Don’t get me wrong: I do not like her tone. I also think that her criticisms are aggressive, hostile, and unnecessarily inflammatory. Does this mean we can just brush aside her comments and ignore her immediately? It most certainly doesn’t. Even if her comments were the kind of crude bullying some other unpleasant characters in the post-publication peer review sphere are guilty of (like that bullish person I mentioned above), we should at least try to extract the meaning. If someone continues to be nasty after being called out on it, I think it is best to ignore them. In particularly bad cases they should be banned from participating in the debate. No fruitful discussion will happen with someone who just showers you in ad hominems. However, none of that categorically invalidates the arguments they make underneath all that rubbish.

Maniatis’ comments are aggressive and uncalled for. However, I do not think they are nasty. I would prefer it if she “toned it down” as they say but I can live with how she says what she says (but of course YMMV). The point is, the other three issues I described above are what concerns me, not her tone. To address them I see these solutions: first, I need to read some of the literature her criticisms are based on to try to understand where she is coming from. Second, people in the field need to explain to her points of apparent misunderstanding. If she refuses to engage or acknowledge that, then it is best to ignore her. Third, the signal-to-noise ratio of PubPeer comments could be improved by better filtering, for instance by muting a commenter as you can on Twitter. If PubPeer doesn’t implement that, then perhaps it can be achieved with a browser plug-in.

You promised there would be some science!

Yes I did. I am sorry it took so long to get here but I will briefly discuss a quote from Maniatis’ latest comment on John’s paper:

Let’s suppose that the movement of heavenly bodies is due to pushing by angels, and that some of these angels are lazier than others. We may then measure the relative motions of these bodies, fit them to functions, infer the energy with which each angel is pushing his or her planet, and report our “angel energy” findings. We may ignore logical arguments against the angel hypothesis. When, in future measurements, changes in motion are observed that makes the fit to our functions less good, we can add assumptions, such as that angels sometimes take a break, causing a lapse in their performance. And we can report these inferences as well. If discrepancies can’t be managed with quantitative fixes, we can just “hush them up.”

I may disagree (and fail to understand) most of her criticisms, but I really like this analogy. It actually reminds me of an example I used when commenting on Psi research and which I also use in my teaching about the scientific method. I used the difference between the heliocentric and geocentric models of planetary movements to illustrate Occam’s Razor, explanatory power, and the trade-off with model complexity. Maniatis’ angels are a perfect example for how we can update our models to account for new observations by increasing their complexity and overfitting the noise. The best possible model however should maximise explanatory power while minimising our assumptions. If we can account for planetary motion without assuming the existence of angels, we may be on the right track (as disappointing as that is).

It won’t surprise you when I say I don’t believe Maniatis’ criticism applies to vision science. Our angels are supported by a long list of converging scientific observations, and I think that if we remove them from our models their explanatory power goes down and their complexity increases. Or at least Maniatis hasn’t made it clear why that isn’t the case. However, leaving this specific case aside, I do like the analogy a lot. There you go, I actually discussed science for a change.

* I expect someone to hold me to this!
** She also commented on PubMed Central but apparently her account there has been blocked.
*** But this is no reason not to try harder.


Is open science tone deaf?

The past week saw the latest installment of what Chris Chambers called the “arse-clenchingly awful ‘tone debate’ in psychology”. If you have no idea what he might be referring to, consider yourself lucky, leave this blog immediately, and move on with your life with the happy thought that sometimes ignorance is indeed bliss. If you think you know what it refers to, you may or may not be right because there seem to have been lots of different things going on and “tone” seems to mean very different things to different people. It apparently involves questions such as these:

  1. What language is acceptable when engaging in critical post-publication peer review?
  2. Is it ever okay to call reanalysis and replication attempts “terrorism”?
  3. While on this topic, what should we do when somebody’s brain fart produces a terrible and tenuous analogy about something?
  4. Should you tag someone in a twitter discussion about a conference when they didn’t attend it?
  5. How should a new and unconventional conference be covered on social media?
  6. What is sarcasm and satire and are they ever okay?
  7. Also, if I don’t find your (bad?) joke or meme funny, does this mean you’re “excluding” me from the discussion?
  8. When should somebody be called a troll?
  9. Is open science tone deaf?

If you were hoping to find a concrete answer to any of these questions, I am sorry to disappoint you. We could write several volumes on each of these issues. But here I only want to address the final question, which is also the title of this post. In clear adherence to Betteridge’s Law the answer is No.

What has bothered me about this “tone debate” for quite some time, but which I only now managed to finally put my finger on, is that tone and science are completely orthogonal and independent of one another. I apologise to Chris as I’m probably rehashing this point from his arse-unclenching post. The point is also illustrated in this satirical post, which you may or may not find funny/clever/appropriate/gluten-free.

In fact, what also bothers me is this focus on open science as, to use Chris’ turn of phrase, an “evangelical movement”. If open science is an evangelical movement, is Brian Nosek its Pope? And does this make Daniël Lakens and Chris Chambers rabid crusaders, EJ Wagenmakers a p-value-bashing Lutheran, and Susan Fiske the Antichrist? I guess there is no doubt that Elsevier is the cult of Cthulhu.

Seriously, what the £$%@ is “open” science anyway? I have come to the conclusion that all this talk about open science is actually detrimental to the cause this “movement” seeks to advance. I hereby vow not to use the term “open science” ever again except to ridicule the concept. I think the use of this term undermines its goals and ironically produces all this garbage about exclusivity and tone that actually prevents more openness in science.

I have no illusions that I can effect a change in people’s use of the term. It is far too widespread and ingrained at this point. Perhaps you could change it if you could get Donald Trump to repeatedly tweet about it abusively and thus tarnish the term for good – just as he did with the Fake News moniker (I think “Sad” might be another victim). But at least I can stop using this exclusive and discriminatory term in my own life and thus help bring about a small but significant (p=0.0049) change in the way we do research.

There is no such thing as “open science”. There is good science and there is bad science (and lots of it). There are ways to conduct research that are open and transparent. I believe greater openness makes science better. As things stand right now, the larger part of the scientific community, at least in the biological, social, and behavioural sciences, remains with the status quo and has not (yet) widely embraced many open practices. Slowly but surely, the field is however moving in the direction of more openness. And we have already made great strides, certainly within the decade or so that I have been a practicing scientist. Having recently had the displeasure of experiencing firsthand in my own life how the news media operate, I can tell you that we have made leaps in terms of transparency and accountability. In my view, the news media and politics would be well served to adopt more scientific practice by providing easier access to source data, fighting plagiarism, and minimising unsubstantiated interpretation of data.

None of this makes “open science” special – it is really just science. Treating proponents of open practices as some sort of homogeneous army (“The Methodological Liberation Front”?) is doing all scientists a disservice. Yes, there are vocal proponents (who often vehemently disagree on smaller points, such as the best use of p-values) but in the end all scientists should have an interest in improving scientific practice. This artificial division into open science and the status quo (“closed science”?) is not helpful in convincing sceptics to adopt open practices. It is bad enough when some sceptics use their professional position to paint a large number of people with the same brush (e.g. “replicators”, “terrorists”, “parasites”, etc). The last thing people whose goal is to improve science should do is to encapsulate and separate themselves from the larger scientific community by calling themselves things like “open science”.

So what does any of this have to do with “tone”? Nothing whatsoever – that’s my point. Are there people whose language could be more refined when criticising published scientific studies? Yes, no doubt there are. One of my first experiences with data sharing was when somebody sent me a rude one-line email asking for our data and spiced it up with a link to the journal’s data sharing policy which added a level of threat to their lack of tact. It was annoying and certainly didn’t endear them to me but I shared the data anyway, neither because of the tone of the email nor the journal’s policy but because it is the right thing to do. We can avoid that entire problem in the future by regularly publishing data (as far as ethically and practically feasible) with the publication or (even better) when submitting the manuscript for review.

Wouldn’t it be better if everyone were just kind and polite to one another and left their emotions out of it? Yes, no doubt it would be but we aren’t machines. You can’t remove the emotion from the human beings who do the science. All of human communication is plagued by emotions, misunderstandings, and failures of diplomacy. I have a friend and colleague who regularly asks questions at conference talks that come across as rather hostile and accusatory. Knowing the man asking the question I’m confident this is due to adrenaline rather than spite. This does not mean you can’t call out people for offending you – but at least initially they also deserve to be given the benefit of the doubt (see Hanlon’s Razor and, for that matter, the Presumption of Innocence).

Bad “tone” is also not exactly a new thing. If memory serves, a few years before many of us were even involved in science social media, a journal deemed it acceptable to publish a paper by one of my colleagues calling his esteemed colleagues’ arguments “gobbledygook”. Go back a few decades or centuries and you’ll find scientists arguing in the most colourful words and making all manner of snide remarks about one another. And of course, the same is true outside the world of science. Questions about the appropriate tone are as old as our species.

By all means, complain about the tone people use if you feel it is inappropriate but be warned that this frequently backfires. The same emotions that lead you to take offense to somebody’s tone (which may or may not be justified) may also cause them to take offense to you using bad “tone” as a defense. In many situations it seems wiser to simply ignore that individual by filtering them out. If they somehow continue to break into your bubble and pester you, you may have a case of abuse and harassment and that’s a whole different beast, one that deserves to be slain. But honestly, it’s a free world so nobody can or should stop you from complaining about it. Sometimes a complaint is fully justified.

It is also true that those of us on social media or post-publication peer review platforms can probably take a good hard look in the mirror and consider our behaviour. I have several colleagues who told me they avoid science twitter “because of all the assholes”. Nobody can force anyone to stop being an asshole but it is true that you may get further with other people when you don’t act like a dick around them. I also think that post-publication review and science in general could be a bit more forgiving. Mistakes and lack of knowledge are human and common and we can do a lot better at appreciating this. Someone once described the posts on RetractionWatch as “gleeful” and I think there is some truth to that. If we want to improve science we need to make it easier and socially acceptable to admit when you’re wrong. There have been some laudable efforts in that direction but we’re far from where we should be.

Last but not least, you don’t have to like snarky remarks. Nobody can force you to find Dr Primestein funny or to be thrilled when he generalises all research in a particular field or even implies that it’s fraudulent. But again, satire and snark are as old as humanity. They should be taken with a grain of salt. I don’t find every joke funny. For instance, I find it incredibly tedious when people link every mention of Germans back to the Nazis. It’s a tired old trope but to be honest I don’t even find it particularly offensive – I certainly don’t feel the need to complain about it every bloody time. But the question of hilarity aside, satire can reveal some underlying truths and in my view there is something in Primestein’s message that people should take to heart. However, if he pisses you off and you’d rather leave him, that’s your inalienable right.

Whatever you do, just for the love of god don’t pretend that this has anything to do with “open science”! Primestein isn’t the open science spokesperson. Neither does a racist who uses open data reflect badly on the “movement”. The price of liberty is eternal vigilance. Freedom of speech isn’t wrong because it enables some people to say unacceptable things. Neither is open data bad because somebody might abuse it for their nasty agenda. And the truth is, they could have easily done the same with closed science. If somebody does bad science, you should criticise them and prove them wrong, even more so when they do it with some bad ulterior motive. If somebody is abusive or exploitative or behaving unethically, call them out, report them, sue them, get them arrested, depending on the severity of the case. Open science doesn’t have a problem with inclusivity because open science doesn’t exist. However, science definitely does have a problem with inclusivity and I think we should all work hard to improve that. Making science more open, both in terms of access to results and methods as well as who can join its community, is making science better. But by treating “open science” as some exclusive club inside science you are inadvertently creating barriers that did not need to exist in the first place.

And honestly, why and how should the “tone” of some people turn you off from using open practices? Is data sharing only a good cause when people are nice? Does a pre-registration become useless when someone snarkily dismisses your field? Is post-publication review worthless simply because some people are assholes? I don’t think so. If anything, more people adopting such practices would further normalise them and thus help equilibrate the entire field. Openness is not the problem but the solution.

[Image: Cthulhu and R’lyeh]
At the nightly editorial board meeting

 

Would the TEF have passed peer review?

Today we have the first NeuroNeurotic guest post! The following short rant was written by my colleague Lee de Wit. In it he talks about the recently published results of the “Teaching Excellence Framework” (TEF), in which UK universities are ranked based on the quality of their undergraduate teaching… If you are also a psychologist and would like to write and sign a more formal letter/editorial to the Times Higher Education outlining these points, email l.de-wit@ucl.ac.uk

As psychologists, when we create a new way of measuring something complex (like a personality trait) we have to go to rigorous lengths to demonstrate that the measures we use are valid and reliable and that we classify people meaningfully.

When it comes to measuring teaching in higher education however, it seems we can just lower the standards. Apparently the TEF is meant to help students make meaningful choices, yet I can see no evidence that it is a valid measure, no evidence it is reliable, and no evidence that it meaningfully clusters Universities.

Validity – One of the key measures used in the TEF is student satisfaction scores – yet we already know that they are not a valid measure of teaching quality. In fact there are meta-analyses demonstrating that high student satisfaction scores don’t even correlate with learning outcomes.

Reliability – Apparently it is fine to just have a panel of 27 people make some subjective judgements about the quantitative data, to classify Universities. No need to have two panels rate them and then check they come to similar judgements.

Clustering – In terms of the underlying distribution of the data, no need to seriously think about whether there are meaningful clusters or more continuous variability. Gold, Silver and Bronze – job done.

If you are an academic tweeting today about your University’s strong result, I would seriously call into question the excellence with which you teach critical thinking to your students.

The one lesson I would take from this for UK Universities is that we are clearly failing to educate politicians and policy makers to think carefully about evidence-based policy. Presumably most of the key players in designing and implementing the TEF went to UK Universities. So I’m worried about what they learnt that made them think this was a good idea.

Marking myself annoyed

First of all, let me apologise for the very long delay since my last blog post. As you all know, the world is going through a lot of turmoil right now. I was also busy and travelling a lot, so I’ve had neither the time nor the energy to blog. But anyway, I’m back and have a number of posts in mind for the next few weeks.

Before I begin, let me say this: My heart goes out to the victims of the horrific terrorist attack at Westminster Bridge and the Houses of Parliament the other day. All whose loved ones were injured or killed in this senseless act of violence are in my thoughts. I admire the efficiency and bravery of the emergency services and the bystanders who rushed to help. There is never an excuse to commit such vile crimes in the pursuit of some political goal. In the case of this brand of Islamic terrorism (if this is indeed confirmed to be the case), the actual political goal is also pretty obscure. Either way, it is a meaningless and evil act. We should stand united in the face of such evil. Don’t be cowed into giving up liberty and justice and never give in to hate and fear.

Having said this, let me get to the point. For several years now Facebook has had this feature where people “mark themselves safe” when a terror attack strikes. I presume it may also be used for natural disasters but if so I haven’t seen that yet. From the first time I saw it, during the terror attacks in Paris, I found it rather tasteless and also far from helpful.

Back then, many people criticised Facebook as the feature was heavily biased towards white, western countries. Around the same time as the Paris attacks there were several other attacks in Turkey and the Middle East. Nobody got to “mark themselves safe” during those attacks. And in certain parts of the world terror attacks are a weekly occurrence. So the outrage over Facebook starting this feature for attacks in Europe is understandable. But I think it is misplaced: Facebook has always rolled out its new features in a geographically limited way and typically starts in the western world where it is based. There is also a related discussion to be had about in-groups and out-groups. And about our habituation to bad news: sad as it may be, even after this string of terror attacks in European cities they remain more newsworthy than those in Baghdad or Kabul where this seems to happen all the time. Since then, Facebook has expanded the use of this feature to non-western countries. Whether this was because of people’s complaints or they always planned this I do not know. But either way, it is no longer limited to the West.

What annoys me about this Facebook feature is something else however. To me it seems demeaning and callous. I don’t think the emotional engagement we should have with such events and the concern we should feel for our fellow human beings should be condensed down to a button press and notification on social media. Perhaps I’m just an old fart who doesn’t comprehend the way the modern world works. I certainly don’t really understand dating via Tinder and a lot of the social media communication millennials get up to these days (snapstagram or chat roulette or whatever they’re called). And don’t get me started on the excessive hash tagging.

But there is a big difference: most of those other things are trivial or affectations. I have no problem with people looking for casual sex or even seeking a life partner via modern social media if this is what works for them. I may not understand the excessive selfie craze and glammed up pictures some people post of themselves emulating the growing ranks of celebrities who are only famous for being famous. But I don’t have a problem with that. It’s up to each and everyone how they want to spend their spare time and what they do in the pursuit of happiness. And of course I use social media too. I like using Facebook actually and use it often (some of my friends probably think too often, although they vastly overestimate how much time it actually takes from my life). Facebook is a great way to stay in touch with friends and family. I even got back in touch with some really old friends who I would not otherwise have any contact with now. So I don’t even feel that all of our social media contact is trivial. I have some very meaningful human contact that way and rekindled old friendships.

In contrast, this marking safe business seems deeply inappropriate to me. It trivialises the gravity of these situations. In my view, our emotional reaction to an event like this should go beyond an emoji or clicking a “sad” button. You might say, to each their own. You don’t have to use this thing and can turn off notifications about it. But it’s not that simple. That’s not how social media work. The whole feature is designed around the idea that people mark themselves safe, thus spreading the word, and also ask their friends if they are safe. It creates a kind of peer pressure that coerces people into marking themselves “safe”, causing a chain reaction that makes the whole thing spiral out of control.

You might also say that it is a good and social thing to get in touch with your friends and loved ones. As I said, I use social media too. I am not Susan Greenfield, or any one of those people who think that staring into your phone or having social contact via the internet withers away our interhuman contact. Quite to the contrary in fact. I remember seeing this excellent cartoon about how smart phones are all about interhuman contact but sadly my google skills are too poor to find it. I most certainly disagree with this article – it is nonsense in so many ways.

But again, there is a difference: getting in touch with your loved ones is not the same as seeing a notification (or even requesting) that they “mark themselves safe”. It seems so cold, so removed from humanity. Of course, you worry about your loved ones. The clue why is in the word. You see on the news that some tragedy occurred and you want to know your friends and family are all right. Well then, pick up that smart phone of yours and send them a message or give them a call! The best way to find out if they are okay and to let them know you care is to speak to them. Several friends and family got in touch with me via phone or email or instant message asking if we were okay. And I certainly did the same. I have friends and family in Paris and in Berlin and I contacted them when the terror attacks there happened. On the day of the 7/7 bombings I contacted all of my London friends at the time. Even though I realise that the odds of any of them being caught up in these events are low, you also want them to know you think of them, find out how they feel, and give them some consolation and support. By all means, use social media for that purpose – it’s very good for that. But to me, reducing this down to one tap of your finger on the phone is sorely insufficient. It hardly says “I care” and in some ways it even seems to disrespect the victims and the plight of those who actually grieve for their loved ones.

And then there is the practical side of this. The blunt nature of the algorithms behind this feature and the fact that people (quite rightly) don’t actually share all the details of their lives on Facebook cause some really stupid artifacts. Not only is Egham (home of Royal Holloway “University of London”) really, really safe, my department in actual London was also pretty safe from this terror attack (ironically enough, my department is right next to several of the sites of the 7/7 bombings, in particular the bus bombing at Tavistock Square). While I have walked across Westminster Bridge and past Parliament many times, believe me, it’s not where I spend most of my work days. And while of course it was possible that the terrorist didn’t act alone and other attacks might be happening (a common feature of IS and Al-Qaeda attacks), there were no reports of anything else happening at the time. But what if there had been other attacks? What if your friend marks themselves “safe” after the first one and then gets caught up in the second? Is there a way to “unmark” yourself again? And would that really be your first priority in that situation?

An even more bizarre artifact of Facebook’s indiscriminate scatter approach is of course that it not only wants us to make sure people in Egham are okay but also those in galaxies far, far away. On the mark-yourself-safe page I saw several people who haven’t lived in London for years but are in the United States and other places thousands of miles away. Not everyone changes their personal details every time they move because that really isn’t always the most important thing in their lives. And of course, some people may have been in London at the time even though according to their “official Facebook records” they live somewhere else. They will fall through the cracks completely.

A much more severe side effect, however, is the distorted picture of reality this sort of thing produces. A tweet by Hanna Isotalus starts a thread elaborating on this problem. This whole business of marking yourself safe actually has the consequence of making everyone feel less safe than they are. While of course horrible and tragic for everyone who was involved, as I already said this attack was a pretty isolated event. By drawing this much attention to it, frantically requesting that everyone who has anything to do with London mark themselves “safe”, we actually vastly exaggerate its effects. The same can probably also be said about the intense news coverage of such events.

The casualties of terrorism in the western world have clearly declined considerably over the past decades. Admittedly, there are some spikes in recent years and most of those are related to jihadist terrorism. However, the actual reach of these attacks in Europe or the US is very small compared to the extent of fear-mongering and political agonising it causes. Also, not that it should matter but a very large proportion of Islamist terror happens in predominantly Muslim countries and most certainly a large proportion of the victims are Muslims.

This stands in stark contrast to the number of people injured and killed all the time in car accidents or – in the US anyway – by guns. It stands in contrast to the risks we are subjected to every day. Nobody seems to think to mark themselves safe every time they take a car or cross a road as if they’d unlocked some achievement in a computer game. I have yet to see a notification on Facebook from one of my many daredevil colleagues telling me “I rode my bike to work and managed to survive for yet another day”.

So as Hanna points out, you are safe. Marking yourself safe doesn’t make you safe. Take a step back (but omit the deep breaths – in London that is actually dangerous). Think about what this really achieves. By all means, contact your loved ones to let them know you care. While statistically they are not at risk, there is one distinct difference between accidents and terrorism. An accident happens by misfortune or neglect. Crime and terrorism are deliberate acts of evil. Talking to your friends and family who happen to be close to such things shows your support. And of course, please pay your respects to the victims, console the ones close to them, and honour the heroes who saved people’s lives and bring the perpetrators to justice.

But don’t buy into this callous scheme of “marking yourself safe”. You’re just playing into the terrorists’ hands. You just spread the fear they want to cause, the hatred and divisions they want to incite, and it contributes to the continued erosion of our liberties and way of life. It strengthens the forces who want to undermine our freedom and respect for one another. All those far-right politicians may not know it but they are bedfellows of these Islamist murderers. Sorry for the cliche but it’s true: If we buy into this crap, the terrorists win.

Chris Chambers is a space alien

Imagine you are a radio astronomer and you suddenly stumble across a signal from outer space that appears to be evidence of an extra-terrestrial intelligence. Let’s also assume you already confidently ruled out any trivial artifactual explanation to do with naturally occurring phenomena or defective measurements. How could you confirm that this signal isn’t simply a random fluke?

This is actually the premise of the novel Contact by Carl Sagan, which happens to be one of my favorite books (I never watched the movie but only caught the end which is nothing like the book so I wouldn’t recommend it…). The solution to this problem proposed in the book is that one should quantify how likely the observed putative extraterrestrial signal would be under the assumption that it is the product of random background radiation.

This is basically what a p-value in frequentist null hypothesis significance testing represents. Using frequentist inference requires that you have a pre-specified hypothesis and a pre-specified design. You should have an effect size in mind, determine how many measurements you need to achieve a particular statistical power, and then you must carry out this experiment precisely as planned. This is rarely how real science works and it is often put forth as one of the main arguments why we should preregister our experimental designs. Any analysis that wasn’t planned a priori is by definition exploratory. The most extreme form of this argument posits that any experiment that hasn’t been preregistered is exploratory. While I still find it hard to agree with this extremist position, it is certainly true that analytical flexibility distorts the inferences we can make about an observation.

This proposed frequentist solution is therefore inappropriate for confirming our extraterrestrial signal. Because the researcher stumbled across the signal, the analysis is by definition exploratory. Moreover, you must also beware of the base-rate fallacy: even an event extremely unlikely under the null hypothesis is not necessarily evidence against the null hypothesis. Even if p=0.00001, a true extraterrestrial signal may be even less likely, say, p=10⁻¹⁰⁰. Even if extra-terrestrial signals are quite common, given the small amount of space, time, and EM bands we have studied thus far, how probable is it that we would just stumble across a meaningful signal?
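A toy Bayes calculation makes this concrete (the numbers below are entirely my own and purely illustrative):

```python
p_data_given_null = 1e-5    # the "p = 0.00001" from the text
p_data_given_alien = 0.5    # generous: assume aliens produce such data half the time
prior_alien = 1e-12         # assumed prior that any given signal is alien

posterior = (p_data_given_alien * prior_alien) / (
    p_data_given_alien * prior_alien + p_data_given_null * (1 - prior_alien))
print(f"P(alien | data) = {posterior:.1e}")  # ~5e-8: the null remains the safe bet
```

Despite a p-value that would thrill any reviewer, the posterior probability of an alien transmitter stays vanishingly small because the prior was so low to begin with.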

None of that means that exploratory results aren’t important. I think you’d agree that finding credible evidence of an extra-terrestrial intelligence capable of sending radio transmissions would be a major discovery. The other day I met up with Rob McIntosh, one of the editors for Registered Reports at Cortex, to discuss the distinction between exploratory and confirmatory research. A lot of the criticism of preregistration focuses on whether it puts too much emphasis on hypothesis-driven research and whether it in turn devalues or marginalizes exploratory studies. I have spent a lot of time thinking about this issue and (encouraged by discussions with many proponents of preregistration) I have come to the conclusion that the opposite is true: by emphasizing which parts of your research are confirmatory I believe exploration is actually valued more. The way scientific publishing conventionally works, many studies are written up as if they were hypothesis-driven when in truth they weren’t. Probably for a lot of published research the truth lies somewhere in the middle.

So preregistration just keeps you honest with yourself and if anything it allows you to be more honest about how you explored the data. Nobody is saying that you can’t explore, and in fact I would argue you should always include some exploration. Whether it is an initial exploratory experiment that you did that you then replicate or test further in a registered experiment, or whether it is a posthoc robustness test you do to ensure that your registered result isn’t just an unforeseen artifact, some exploration is almost always necessary. “If we knew what we were doing, it would not be called research, would it?” (a quote by Albert Einstein, apparently).

One idea I discussed with Rob is whether there should be a publication format that specifically caters to exploration (Chris Chambers has also mentioned this idea previously). Such Exploratory Reports would allow researchers to publish interesting and surprising findings without first registering a hypothesis. You may think this sounds a lot like what many present-day high impact papers are like already. The key difference is that these Exploratory Reports would contain no inferential statistics and, critically, would be explicit about the fact that the research is exploratory – something that is rarely the case in conventional studies. However, this idea poses a critical challenge: on the one hand you want to ensure that the results presented in such a format are trustworthy. But how do you ensure this without inferential statistics?

Proponents of the New Statistics (which aren’t actually “new” and it is also questionable whether you should call them “statistics”) will tell you that you could just report the means/medians and confidence intervals, or perhaps the whole distributions of data. But that isn’t really helping. Inspecting confidence intervals and how far they are from zero (or another value of no interest) is effectively the same thing as a significance test. Even merely showing the distribution of observations isn’t really helping. If a result is so blatantly obvious that it convinces you by visual inspection (the “inter-ocular trauma test”), then formal statistical testing would be unnecessary anyway. If the results are even just a little subtler, it can be very difficult to decide whether the finding is interesting. So the way I see it, we either need a way to estimate statistical evidence, or you need to follow up the finding with a registered, confirmatory experiment that specifically seeks to replicate and/or further test the original exploratory finding.
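Incidentally, the equivalence between confidence intervals and significance tests is easy to demonstrate. A quick sketch (my own, assuming numpy and scipy): for a one-sample t-test, a 95% confidence interval excluding zero and p < .05 are one and the same decision:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.3, 1, 30)  # an arbitrary made-up sample

p = stats.ttest_1samp(x, 0).pvalue
lo, hi = stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))
# These two verdicts are mathematically equivalent and always agree:
print(p < .05, not (lo <= 0 <= hi))
```

So simply swapping p-values for interval estimates does not remove the inference, it only hides it.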

In the case of our extra-terrestrial signal you may plan a new measurement. You know the location in the sky where the signal came from, so part of your preregistered methods is to point your radio telescope at the same point. You also have an idea of the signal strength, which allows you to determine the number of measurements needed to have adequate statistical power. Then you carry out this experiment, sticking meticulously to your planned recipe. Finally, you report your result and the associated p-value.
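The power calculation in that preregistered plan might look something like this minimal sketch (assuming statsmodels is available; the effect size is of course made up):

```python
import math
from statsmodels.stats.power import TTestPower

d = 0.5  # assumed standardised signal strength (Cohen's d) -- purely illustrative
n = TTestPower().solve_power(effect_size=d, alpha=0.05, power=0.9)
print(f"Planned number of measurements: {math.ceil(n)}")  # about 44 in this case
```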

Sounds good in theory. In practice, however, this is not how science typically works. Maybe the signal isn’t continuous. There could be all sorts of reasons why the signal may only be intermittent, be it some interstellar dust clouds blocking the line of transmission, the transmitter pointing away from Earth due to the rotation of the aliens’ home planet, or even simply the fact that the aliens are operating their transmitter on a random schedule. We know nothing about what an alien species, let alone their civilization, may be like. Who is to say that they don’t just fall into random sleeping periods in irregular intervals?

So some exploratory, flexible analysis is almost always necessary. If you are too rigid in your approach, you are very likely to miss important discoveries. At the same time, you must be careful not to fool yourself. If we are really going down the route of Exploratory Reports without any statistical inference we need to come up with a good way to ensure that such exploratory findings aren’t mostly garbage. I think in the long run the only way to do so is to replicate and test results in confirmatory studies. But this could already be done as part of a Registered Report in which your design is preregistered. Experiment 1 would be exploratory without any statistical inference but simply reporting the basic pattern of results. Experiment 2 would then be preregistered and replicate or test the finding further.

However, Registered Reports can take a long time to publish. This may in fact be one of the weak points of this format that may stop the scientific community from becoming more enthusiastic about it. As long as there is no real incentive to do slow science, the idea that you may take two or three years to publish one study is not going to appeal to many people. It will stop early career researchers from getting jobs and research funding. It also puts small labs in poorer universities at a considerable disadvantage compared to researchers with big grants, big data, and legions of research assistants.

The whole point of Exploratory Reports would be to quickly push out interesting observations. In some ways, this is then exactly what brief communications in high impact journals are currently for. I don’t think it will serve us well to replace the notion of snappy (and likely untrue) high impact findings with inappropriate statistical inferences with snappy (and likely untrue) exploratory findings without statistical inference. If the purpose of Exploratory Reports is solely to provide an outlet for quick publication of interesting results, we still have the same kind of skewed incentive structure as now. Also, while removing statistical inference from our exploratory findings may be better statistical practice I am not convinced that it is better scientific practice unless we have other ways of ensuring that these exploratory results are kosher.

The way I see it, the only way around this dilemma is to finally stop treating publications as individual units. Science is by nature a lengthy, incremental process. Yes, we need exciting discoveries to drive science forward. At the same time, the replicability and robustness of our discoveries is critical. In order to combine these two needs I believe research findings should not be seen as separate morsels but as a web of interconnected results. A single Exploratory Report (or even a bunch of them) could serve as the starting point. But unless they are followed up by Registered Reports replicating or scrutinizing these findings further, they are not all that meaningful. Only once replications and follow-up experiments have been performed does the whole body of a finding take shape. A search on PubMed or Google Scholar would not merely spit out the original paper but a whole tree of linked experiments.
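As a purely hypothetical sketch of what such a tree could look like as a data structure (all names here are my own invention for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    title: str
    kind: str  # "exploratory" or "registered"
    follow_ups: list["Finding"] = field(default_factory=list)

    def evidence_size(self) -> int:
        """Crude 'body of evidence' score: this study plus everything linked to it."""
        return 1 + sum(f.evidence_size() for f in self.follow_ups)

signal = Finding("Putative ET signal", "exploratory")
signal.follow_ups.append(Finding("Registered replication", "registered"))
signal.follow_ups.append(Finding("Registered robustness test", "registered"))
print(signal.evidence_size())  # 3 -- the tree, not the single paper, is the unit
```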

The perceived impact and value of a finding would thus be related to how much of an interconnected body of evidence it has generated rather than whether it was published in Nature or Science. Critically, this would allow people to quickly publish their exciting findings and thus avoid being deadlocked by endless review processes and disadvantaged compared to other people who can afford to do more open science. At the same time, they would be incentivized to conduct follow-up studies. Because a whole body of related literature is linked, it would also be an incentive for others to conduct replications or follow-up experiments on your exploratory finding.

There are obviously logistic and technical challenges with this idea. The current publication infrastructure still does not really allow for this to work. This is not a major problem however. It seems entirely feasible to implement such a system. The bigger challenge is how to convince the broader community and publishers and funders to take this on board.

[Image: the Arecibo message]