When confronted with these recommendations, many researchers seem to balk. This surprises me, because most of the recommendations are mundane and easy to implement. Why would researchers choose not to embrace them as a means of improving the quality of their work?

I believe the reason for the passion behind the protests is that the proposed changes undermine the incentive structure on which the field is built. What is that incentive structure?

p < .05.

The first and seemingly most valued component of psychological science is that your findings must be “statistically significant,” meaning, concretely, that the probability of obtaining data at least as extreme as yours, assuming the null hypothesis is true, is less than 5%. Researchers must attain a p-value below .05 to be a success in psychological science. If your p-value is greater than .05, you have no finding and nothing to say, because your work will not be published, will not be discussed in the press, and will not net you a TED talk.

Because the p-value is the primary key to the domain of scientific success, we do almost anything we can to find the desired small p-value. We root around in our data, digging up p-values by cherry-picking studies, selectively reporting outcomes, or applying some arcane statistical model. It is clear from reviews of psychological science that we not only value p-values below .05 but have also been remarkably successful in limiting the publication of alternative p-values: in its published literature, psychology confirms 95% of its hypotheses (Fanelli, 2012).
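The arithmetic behind this rooting around is worth seeing concretely. The sketch below is hypothetical (standard library only, with a z-test standing in for the usual t-test, and the study counts chosen for illustration): it simulates studies of a *true null* effect and compares honest reporting, where a single study clears p < .05 about 5% of the time, with cherry-picking, where a researcher runs 20 such studies and reports if any one of them "works."

```python
import math
import random

random.seed(1)

def null_study_p(n_per_group=30):
    """Two-tailed z-test p-value for two groups drawn from the same
    normal distribution -- i.e., a study of a true null effect."""
    m1 = sum(random.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
    m2 = sum(random.gauss(0, 1) for _ in range(n_per_group)) / n_per_group
    z = (m1 - m2) / math.sqrt(2 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2))  # two-tailed p under the null

# Honest reporting: a single true-null study is "significant" ~5% of the time.
single_rate = sum(null_study_p() < .05 for _ in range(10_000)) / 10_000

# Cherry-picking: run 20 true-null studies, report if ANY hits p < .05.
trials = 2_000
picked_rate = sum(
    any(null_study_p() < .05 for _ in range(20)) for _ in range(trials)
) / trials  # approaches 1 - .95**20, roughly 64%
```

Nothing about the data changes between the two strategies; only the reporting rule does, and that alone turns a 5% false-positive rate into one near 64%.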

Even worse, we punish those who try to publish null effects by considering them “second stringers” or incompetent—especially if they fail to replicate already published, and by default statistically significant, effects. Of course, if you emerge from your graduate training still holding the view that the ideal of psychological science is the pursuit of truth, maybe you deserve to be punished. The successful, eminent scientists in our field know better. They know that “the game” is not to pursue truth but to produce something with p < .05. If you don’t figure that out early, you are destined to be unsuccessful, because the people in control of resources are the ones who have succeeded at the game as it is played now.

Small N, Conceptual Replications

Under the right circumstances, conceptual replications are an excellent device in the researcher’s tool kit. The problem, of course, is that the “right circumstances” are those in which an effect is reproducible—as in directly replicable. In the absence of evidence that an effect can be directly replicated, conceptual replications might as well be a billboard screaming that the effect cannot be directly reproduced and the author was left sifting through either multiple studies or multiple outcomes across studies to find a statistically significant effect.

And for seemingly good reasons, the ideal conceptual replication appears to be a small-N replication. Despite decades of smart methodologists pointing out that our research is poorly designed to detect such amazingly subtle things as between-subjects 2x2 interaction effects, researchers continue to plug away at sample sizes well south of 100 when they should be using samples in excess of 400 (Cohen, 1990; Simonsohn, 2014).
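The gap between typical and required samples is easy to check with the textbook power formula. This is a rough sketch using the normal approximation for a simple two-group comparison (the exact t-test requirement is marginally larger, and the effect sizes here are Cohen's conventional benchmarks, chosen for illustration); detecting an attenuated interaction in a 2x2 design demands more still.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=.05, power=.80):
    """Approximate per-group N needed to detect a standardized effect d
    in a two-group comparison (normal approximation to the power formula)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# A "medium" effect (d = .5) already needs ~63 participants per group,
# and a "small" effect (d = .2) needs ~393 per group -- far beyond the
# sub-100 cells that remain common in the literature.
medium_n = n_per_group(0.5)  # 63
small_n = n_per_group(0.2)   # 393
```

At 80% power and a two-tailed alpha of .05, even the more forgiving of these targets sits near the upper edge of typical sample sizes, and the small-effect target is roughly four times the total N that many published studies use.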