It goes like this. Null hypothesis significance testing (NHST) only works when you have enough accuracy that you can confidently reject the null hypothesis. You get this accuracy from a large sample of measurements with low bias and low variance. But you also need a large effect size. Or, at least, an effect size that is large compared to the accuracy of your experiment.
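To put rough numbers on "large compared to the accuracy of your experiment": for a simple two-sided z-test, power depends only on the ratio of the true effect to the standard error of the estimate. Here is a minimal sketch, assuming a normally distributed estimate with known standard error. The function name `power_two_sided` and the ratios tried are made up for illustration.

```python
from statistics import NormalDist


def power_two_sided(effect, se, alpha=0.05):
    """Power of a two-sided z-test, assuming estimate ~ Normal(effect, se)."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    shift = effect / se               # true effect in standard-error units
    # Probability the z-score lands above +crit or below -crit.
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


# Illustrative (hypothetical) ratios of true effect to standard error.
for ratio in (0.0, 0.5, 1.0, 2.0, 2.8):
    print(f"effect/se = {ratio:.1f}  power = {power_two_sided(ratio, 1.0):.3f}")
```

With the effect at 2.8 standard errors, power is roughly 80%. With the effect at half a standard error it is about 8%, barely above the 5% you would get if there were no effect at all.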

But . . . we’ve grabbed all the low-hanging fruit. In medicine, public health, social science, etc etc etc, we’re studying smaller and smaller effects. These effects can still be important in aggregate, but each individual effect is small.

To study smaller and smaller effects using NHST, you need better measurements and larger sample sizes. The strategy of run-a-crappy-study, get p less than 0.05, come up with a cute story based on evolutionary psychology, and PROFIT . . . well, I wanna say it doesn’t work anymore. OK, maybe it still can work if your goal is to get published in PPNAS, get tenure, give TED talks, and make boatloads of money in speaking fees. But it won’t work in the real sense, the important sense of learning about the world.
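A quick simulation shows why the noisy-study-plus-p-less-than-0.05 recipe fails at learning about the world. The sketch below replicates the same hypothetical study many times and keeps only the statistically significant results. The true effect, standard error, and number of replications are invented for illustration, and the estimate is assumed to be normally distributed.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)

TRUE_EFFECT = 0.1   # hypothetical small true effect
SE = 0.5            # hypothetical standard error of one noisy study
N_SIMS = 20000      # number of simulated replications
CRIT = NormalDist().inv_cdf(0.975)   # two-sided 5% threshold, about 1.96

# Each replication yields one noisy estimate of the same true effect.
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_SIMS)]

# Keep only the replications that would be reported as p < 0.05.
significant = [est for est in estimates if abs(est) / SE > CRIT]

share_significant = len(significant) / N_SIMS
# How much the significant estimates overstate the true effect, on average.
exaggeration = fmean([abs(est) for est in significant]) / TRUE_EFFECT
# Fraction of significant estimates pointing in the wrong direction.
wrong_sign = sum(est < 0 for est in significant) / len(significant)

print(f"share significant:    {share_significant:.3f}")
print(f"average exaggeration: {exaggeration:.1f}x")
print(f"wrong sign:           {wrong_sign:.3f}")
```

Under these made-up settings only a small share of replications reach significance, and every one that does overstates the true effect by a factor of nearly ten or more, because an estimate has to exceed about 1.96 standard errors in absolute value just to pass the threshold. A substantial fraction of the significant estimates also have the wrong sign.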

What, then, should you be doing? You should be fitting multilevel models. Take advantage of the fact that lots and lots of these studies are being done. Forget about getting definitive results from a single experiment; instead embrace variation, accept uncertainty, and learn what you can.
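As a rough illustration of what borrowing strength across many studies buys you, here is a sketch of partial pooling. It is not a full multilevel fit: it uses a simple moment-based (empirical Bayes) shortcut for a normal model in which each study's true effect is drawn from a common distribution. The function `partial_pool`, the simulated studies, and every number in the example are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import fmean, variance


def partial_pool(estimates, ses):
    """Shrink each study's estimate toward a common mean.

    Assumes estimate_j ~ Normal(theta_j, se_j^2) and theta_j ~ Normal(mu, tau^2),
    with tau^2 estimated by a crude method of moments (floored at zero).
    """
    se2 = [se ** 2 for se in ses]
    # Observed spread = between-study variance + average sampling variance.
    tau2 = max(0.0, variance(estimates) - fmean(se2))
    weights = [1.0 / (s + tau2) for s in se2]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Noisier studies (larger se) get pulled harder toward mu.
    shrunk = [mu + tau2 / (tau2 + s) * (y - mu) for y, s in zip(estimates, se2)]
    return mu, tau2, shrunk


def rmse(guesses, truths):
    return sqrt(fmean([(g - t) ** 2 for g, t in zip(guesses, truths)]))


# Simulate 50 hypothetical studies of small, varying true effects.
random.seed(2)
N_STUDIES = 50
true_effects = [random.gauss(0.1, 0.05) for _ in range(N_STUDIES)]
ses = [0.3] * N_STUDIES
raw = [random.gauss(theta, se) for theta, se in zip(true_effects, ses)]

mu, tau2, pooled = partial_pool(raw, ses)
rmse_raw = rmse(raw, true_effects)
rmse_pooled = rmse(pooled, true_effects)

print(f"estimated mean effect: {mu:.3f}")
print(f"RMSE, each study alone:  {rmse_raw:.3f}")
print(f"RMSE, partially pooled:  {rmse_pooled:.3f}")
```

In this toy setup no single study can pin down its own effect, but the pooled estimates sit much closer to the truth than the raw ones. A real analysis would fit the multilevel model properly and carry forward the uncertainty in the group-level mean and variance rather than plugging in point estimates.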

I think there’s room for a more formal article on this topic, but the above is a start.