P-hacking the System

Science is hard. Scientists have to stare at mountains of data and try to figure out what secrets nature is whispering to them. There are innumerable blind alleys, dead ends, and false starts in academic research. That's life, and that's why over the centuries we've developed sophisticated statistical techniques to help lead us to understanding. But if you're not careful, you can fool yourself into thinking there's a signal when really you've found nothing but noise.

The problem lies with correlations: cases where two variables in your experiment or observation seem to be related to each other. Uncovering a correlation is usually the first step in "hey, I think I found something," and so many researchers report a connection as soon as their experiment reveals one.

But experiments are often exceedingly complex, with many variables constantly changing - sometimes under your control and sometimes not. If you have, say, twenty variables that are all totally random, that makes 190 possible pairings, and by pure chance some of those pairs will almost certainly look correlated. At the conventional 5% significance threshold, you'd expect nine or ten of them to pass.
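
To see how easily noise can masquerade as signal, here is a minimal Python sketch of the twenty-random-variables scenario. Nothing in it comes from a real study: the function name, the 100 samples per variable, and the 0.05 cutoff are all assumptions chosen for illustration, and the p-value uses the Fisher z-transform, which is only an approximation.

```python
import math
from itertools import combinations

import numpy as np


def count_spurious_correlations(n_vars=20, n_samples=100, alpha=0.05, seed=0):
    """Count variable pairs that look 'significantly' correlated in pure noise.

    Hypothetical helper for illustration only; sample size and alpha are assumed.
    """
    rng = np.random.default_rng(seed)
    # Every column is independent random noise, so no real relationship exists.
    data = rng.normal(size=(n_samples, n_vars))
    corr = np.corrcoef(data, rowvar=False)

    pairs = list(combinations(range(n_vars), 2))
    hits = 0
    for i, j in pairs:
        r = corr[i, j]
        # Fisher z-transform: approximate two-sided p-value for r
        # under the assumption that the true correlation is zero.
        z = math.atanh(r) * math.sqrt(n_samples - 3)
        p = math.erfc(abs(z) / math.sqrt(2))
        if p < alpha:
            hits += 1
    return hits, len(pairs)


hits, n_pairs = count_spurious_correlations()
print(f"{hits} of {n_pairs} pairs pass p < {0.05} despite being pure noise")
```

The exact count changes with the seed, but averaged over many runs it should hover around 5% of the 190 pairs, even though every "finding" is a fluke by construction.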

So when scientists fail to spot the correlation they were looking for, sometimes they start digging through the data until something pops up. And when it inevitably does - publish! But it was just a statistical fluke all along.

This practice is called "p-hacking", for reasons I'll get into another time, and it's a prime source of juicy headlines but faulty results.