I can hear you singing in the distance
I can see you when I close my eyes
Once you were somewhere and now you’re everywhere
Superblood Wolfmoon – Pearl Jam
If you read my previous blog post you’ll know I have a particular relationship these days with regression to the mean – and binning artifacts in general. Our recent retraction of a study reminded me of this issue. Of course, I was generally aware of the concept, as I am sure are most quantitative scientists. But often the underlying issues are somewhat obscure, which is why I certainly didn’t immediately clock on to them in our past work. It took a collaborative group effort with serendipitous suggestions, much thinking and simulating and digging, and not least of all the tireless efforts of my PhD student Susanne Stoll to uncover the full extent of this issue in our published research. We also still maintain that this rabbit hole goes a lot deeper because there are numerous other studies that used similar analyses. They must by necessity contain the same error – hopefully the magnitude of the problem is less severe in most other studies so that their conclusions aren’t all completely spurious. However, we simply cannot know that until somebody investigates this empirically. There are several candidates out there where I think the problem is almost certainly big enough to invalidate the conclusions. But I am not the data police and I am not going to run around arguing people’s conclusions are invalid without A) having concrete evidence and B) having talked to the authors personally first.
What I can do, however, is explain how to spot likely candidates of this problem. And you really don’t have to look far. We believe that this issue is ubiquitous in almost all pRF studies; specifically, it affects all pRF studies that use any kind of binning. There are cases where this is probably of no consequence – but people must at least be aware of the issue before it leads to false assumptions and thus erroneous conclusions. We hope to publish another article in the future that lays out this issue in some depth.
But it goes well beyond that. This isn’t a specific problem with pRF studies. Many years before that I had discussions with David Shanks about this subject when he was writing an article (also long since published) on how this artifact confounds many studies in the field of unconscious processing, something that certainly overlaps with my own research. Only last year there was an article arguing that the same artifact explains the Dunning-Kruger effect. And I am starting to see this issue literally everywhere[1] now… Just the other day I saw this figure on one of my social media feeds:
This data visualisation makes a striking claim with very clear political implications: High income earners (and presumably very rich people in general) underestimate their wealth relative to society as a whole, while low income earners overestimate theirs. A great number of narratives can be spun about this depending on your own political inclinations. It doesn’t take much imagination to conjure up the ways this could be used to further a political agenda, be it a fierce progressive tax policy or a rabid pulling-yourself-up-by-your-own-bootstraps type of conservatism. I have no interest in getting into this discussion here. What interests me here is whether the claim is actually supported by the evidence.
There are a number of open questions here. I don’t know how “perceived income” is measured exactly[2]. It is theoretically possible that some adjustments were made here to control for artifacts. However, taken at face value this looks almost like a textbook example of regression to the mean. Effectively, you have an independent variable, the individuals’ actual income levels. We can presumably regard this as a ground truth – an individual’s income is what it is. We then take a dependent variable, perceived income. It is probably safe to assume that this will correlate with actual income. However, this is not a perfect correlation because perfect correlations are generally meaningless (say correlating body height in inches and centimeters). Obviously, perceived income is a psychological measure that must depend on a whole number of extraneous factors. For one thing, people’s social networks aren’t completely random but we all live embedded in a social context. You will doubtless judge your wealth relative to the people you mostly interact with. Another source of misestimation could be how this perception is measured. I don’t know how that was done here in detail but people were apparently asked to self-rate their assumed income decile. We can expect psychological factors at play that make people unlikely to put themselves in the lowest or highest scores on such a scale. There are many other factors at play but that’s not really important. The point is that we can safely assume that people are relatively bad at judging their true income relative to the whole of society.
But to hell with it, let’s just disregard all that. Instead, let us assume that people are actually perfectly accurate at judging their own income relative to society. Let’s simulate this scenario[3]. First we draw 10,000 people from a Gaussian distribution of actual incomes. This distribution has a mean of $60,000 and a standard deviation of $20,000 – all in fictitious dollars which we assume our fictitious country uses. We assume these are based on people’s paychecks so there is no error[4] on this independent variable at all. I use the absolute values to ensure that there is no negative income. The figure below shows the actual objective income for each (simulated) person on the x-axis. The y-axis is just random scatter for visualisation – it has no other significance. The colour code denotes the income bracket (decile) each person belongs to.
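For readers who would rather not open Matlab, here is a minimal Python sketch of this first step. It is not the code behind the figures in this post, just a re-creation under stated assumptions: the seed is arbitrary, and `assign_deciles` is a hypothetical helper that labels people by rank so that each bracket holds exactly 1,000 of them.

```python
import random

random.seed(1)  # arbitrary seed, only so the sketch is reproducible

N_PEOPLE = 10_000
MEAN_INCOME = 60_000  # fictitious dollars
SD_INCOME = 20_000

# Draw actual incomes; the absolute value rules out negative incomes
incomes = [abs(random.gauss(MEAN_INCOME, SD_INCOME)) for _ in range(N_PEOPLE)]


def assign_deciles(values):
    """Return a decile label (1-10) for each value, based on its rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 10 // len(values) + 1
    return labels


deciles = assign_deciles(incomes)

# Income range covered by each bracket
for d in range(1, 11):
    in_bracket = [v for v, lab in zip(incomes, deciles) if lab == d]
    print(f"decile {d:2d}: ${min(in_bracket):>9,.0f} to ${max(in_bracket):>9,.0f}")
```

The printed ranges should already hint at the problem: the outermost brackets span a far wider range of incomes than the central ones.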
Next I simulate perceived income deciles for these fictitious people. To do this we need to do some rescaling to get everyone on the scale 1-10, with 10 being the top earners. However – and this is important – as per our (certainly false) assumption above, perceived income is perfectly correlated with actual income. It is a simple transformation to rescale it. Now, what happens when you average the perceived income in each of these decile brackets like that graph above did? I do that below, using the same formatting as the original graph:
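The rescaling and averaging can be sketched in a few lines of Python as well. One assumption to flag loudly: I take the "simple transformation" to be a linear min-max rescaling, with the lowest income mapped to 1 and the highest to 10. The exact transformation may differ in detail from the one used for the figure, and the seed is again arbitrary.

```python
import random
from statistics import mean

random.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Actual incomes as described above, sorted so that consecutive
# blocks of 1,000 people form the actual income deciles
incomes = sorted(abs(random.gauss(60_000, 20_000)) for _ in range(10_000))

# Assumed rescaling: linear map, lowest income -> 1, highest income -> 10.
# Perceived income is therefore perfectly correlated with actual income.
lowest, highest = incomes[0], incomes[-1]
perceived = [1 + 9 * (x - lowest) / (highest - lowest) for x in incomes]

# Average the perceived score within each actual income decile
bin_size = len(incomes) // 10
decile_means = [
    mean(perceived[k * bin_size:(k + 1) * bin_size]) for k in range(10)
]

for k, m in enumerate(decile_means, start=1):
    print(f"actual decile {k:2d}: mean perceived score {m:.2f}")
```

Even though nobody in this simulation misjudges anything, the bottom bracket averages well above 1 and the top bracket well below 10.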
I will leave it to you, gentle reader, to determine how this compares to the original figure. Why is this happening? It’s simple really when you think about it: Take the highest income bracket. This ranges widely from high-but-reasonable to filthy-more-money-than-you-could-ever-spend-in-a-lifetime rich. This is not a symmetric distribution. The summary statistics of these binned data will be heavily skewed. Its mean/median will be biased downward for the top income brackets and upward for the low income brackets. Only the income deciles near the centre will be approximately symmetric and thus produce an unbiased estimate. Or to put it in simpler terms: the left column simply labels the decile brackets. The only data here is in the right column and all this plot really shows is that the incomes have a Gaussian-like distribution. This has nothing to do with perceptions of income whatsoever.
In discussions I’ve had this all still confuses some people. So I added another illustration. In the graph below I plot a normal distribution. The coloured bands denote the approximated deciles. The white dots on the X-axis show the mean for each decile. The distance between these dots is obviously not equal. They all tend to be closer to the population mean (zero) than to the middle of their respective bands. This bias is present for all deciles except perhaps the most central ones. However, it is most extreme for the outermost deciles because these have the most asymmetric distributions. This is exactly what the income plots above are showing. It doesn’t matter whether we are looking at actual or perceived income. It doesn’t matter at all if there is error on those measures or not. All that matters is the distribution of the data.
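The positions of those decile means don't even need a simulation. For a standard normal distribution, the mean of the part lying between two cut points a and b is (pdf(a) - pdf(b)) / P(a < X < b), and each decile band holds a probability of 0.1. A small sketch using Python's `statistics.NormalDist` (the helper `density` is my own, only there to handle the infinite edges of the two outermost bands):

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Decile boundaries; the two outermost bands extend to infinity
edges = (
    [-math.inf]
    + [std_normal.inv_cdf(k / 10) for k in range(1, 10)]
    + [math.inf]
)


def density(x):
    """Standard normal density, taken as 0 at +/- infinity."""
    return 0.0 if math.isinf(x) else std_normal.pdf(x)


# Mean of a standard normal truncated to (a, b):
# (pdf(a) - pdf(b)) / P(a < X < b), with P = 0.1 for every decile
decile_means = [
    (density(a) - density(b)) / 0.1 for a, b in zip(edges[:-1], edges[1:])
]

for k, (a, b, m) in enumerate(zip(edges[:-1], edges[1:], decile_means), start=1):
    print(f"decile {k:2d}: band ({a:+.2f}, {b:+.2f})  mean {m:+.3f}")
```

The gaps between neighbouring means shrink towards the centre, and for every band with finite edges the mean sits closer to zero than the midpoint of the band does.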
Now, as I already said, I haven’t seen the detailed methodology of that original survey. If the analysis made any attempt to mathematically correct for this problem then I’ll stand corrected[5]. However, even in that case, the general statistical issue is extremely widespread and this serves as a perfect example of how binning can result in wildly erroneous conclusions. It also illustrates the importance of this issue. The same problem relates to pRF tuning widths and stimulus preferences and whatnot – but that is frankly of limited importance. But things like these income statistics could have considerable social implications. What this shows to me is two-fold: First, please be careful when you do data analysis. Whenever possible, feed some simulated data to your analysis to see if it behaves as you think it should. Second, binning sucks. I see it effing everywhere now and I feel like I haven’t slept in months[6]…
[1] A very similar thing happened when I first learned about heteroscedasticity. I kept seeing it in all plots then as well – and I still do…
[2] Many thanks to Susanne Stoll for digging up the source for these data. I didn’t see much in terms of actual methods details here but I also didn’t really look too hard. Via Twitter I also discovered the corresponding Guardian piece which contains the original graph.
[3] Matlab code for this example is available here. I still don’t really do R. Can’t teach an old dog new tricks or whatever…
[4] There may be some error with a self-report measure of people’s actual income although this error is perhaps low – either way we do not need to assume any error here at all.
[5] Somehow I doubt it but I’d be very happy to be wrong.
[6] There could however be other reasons for that…
If this post confused you, there is now a follow-up post to confuse you even more… 🙂