The replication movement in psychology has had many positive effects, such as the discussion of how to avoid p-hacking and the emphasis on increased transparency, including posting data, detailed methods sections, and the results of unpublished studies on publicly available web sites. These practices will undoubtedly improve our science.

But something is seriously out of whack. Despite its benefits, the replication movement has had a polarizing effect. Although most of the researchers involved in the replication movement are well-intentioned and have the best interests of the field at heart, some seem bent on disproving other researchers’ results by failing to replicate. And whereas some researchers have embraced the movement and taken part in it, others are deeply suspicious and fear that ill-intentioned replicators will target them, fail to replicate their findings, and damage their reputations.

Why are many people afraid? One reason, I believe, is that there has been more emphasis on false positives than false negatives. When an effect fails to replicate, the spotlight of suspicion shines on the original study and the authors who conducted it. “False Positive Alert” flashes like a neon sign as the buzz spreads in the Tweetosphere and Blogworld. But why should we assume that a failure to replicate is “truer” than the original study? Shouldn’t the spotlight shine as brightly on the replicators, with a close examination of their research practices, in case they have obtained a false negative?

There are many reasons why a false negative could occur, including these:

Replications might be conducted by researchers who are inexperienced or lack expertise, either in general or in the particular area they are trying to replicate.

As has been well documented, researchers are human and can act in ways that make them more likely to confirm a hypothesis, resulting in p-hacking. But replicators are human too, and if their hypothesis is that an effect will not replicate, they too can act in ways that increase the likelihood of obtaining that outcome—a practice we might call p-squashing. For example, it would be relatively easy to take an independent variable that had a significant effect in the laboratory, translate it into an on-line study that delivers the manipulation in a much weaker fashion, and then run hundreds of participants, resulting in a null effect. Adding such a study to a meta-analysis could cancel out positive findings from several smaller studies because of its very large sample size, resulting in meta p-squashing.

As others have noted (e.g., Stroebe & Strack, 2013), a direct replication could fail because it was conducted in a different context or with a different population, and as a result did not manipulate the psychological construct in the same manner as did the original study.
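The "meta p-squashing" scenario above can be sketched numerically. The following is a hypothetical illustration, not a reanalysis of any real data: a fixed-effect, inverse-variance meta-analysis in which one very large online study with a weakened manipulation (and thus a near-zero effect) swamps several smaller positive lab studies. All effect sizes and sample sizes are invented.

```python
import math

# Five hypothetical small lab studies: (Cohen's d, n per group)
small_studies = [(0.50, 30), (0.45, 25), (0.55, 40), (0.40, 35), (0.60, 20)]
# One hypothetical large online study with a weakened manipulation
large_study = (0.02, 1500)

def variance(d, n):
    # Approximate sampling variance of Cohen's d for two groups of size n
    return (2 / n) + d ** 2 / (4 * n)

def pooled(studies):
    # Fixed-effect (inverse-variance weighted) pooled effect size
    weights = [1 / variance(d, n) for d, n in studies]
    return sum(w * d for w, (d, n) in zip(weights, studies)) / sum(weights)

print(round(pooled(small_studies), 2))                  # small studies alone
print(round(pooled(small_studies + [large_study]), 2))  # with the big null study
```

Because inverse-variance weights scale roughly with sample size, the single large null study receives about ten times the combined weight of the five small studies, dragging the pooled estimate from around 0.5 to near zero.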

Do I have evidence that many of the studies that have been done as part of the current replication movement have been plagued by the above problems? Well, not much, though I suggest that the evidence is equally weak that false positives are rampant. One might even argue that there is just as much evidence that we have a crisis of false negatives as we do a crisis of false positives.

This is important because both kinds of errors can have serious consequences. As many in the replication movement have argued, false positives can be costly to a field’s credibility and to subsequent researchers who spend valuable research time going down a blind alley. But false negatives can also be damaging, both to the reputation of the original researcher and the progression of science (see Fiedler, Kutzner, & Krueger, 2012, for an excellent discussion of this issue). Consequently, neither those who attempt replications nor the authors of original studies should stake out the moral high ground in this debate. We should all scrutinize replications with the same critical eye as we do original studies and not assume that a failure to duplicate a result means that the original finding was false. For example, if replications are submitted to a journal, they should undergo the same rigorous review process as any other submission.

There is another unintended effect of the replication movement, namely that it places too much emphasis on duplication and not enough on discovering new and interesting things about human behavior, which is, after all, why most of us got into the field in the first place. As noted by Jim Coan, the field has become preoccupied with prevention and error detection—negative psychology—at the expense of exploration and discovery. The biggest scientific advances are usually made by researchers who pursue unorthodox ideas, invent new methods, and take chances. Almost by definition, researchers who adopt this approach will produce findings that are less replicable than ones by researchers who conduct small extensions of established methodologies, at least at first, because the moderator variables and causal mechanisms of novel phenomena are not as well understood. I fear that in the current atmosphere, many researchers will gravitate to safe, easily replicable projects and away from novel, creative ones that may not be easily replicable at first but could lead to revolutionary advances.

For those interested in conducting replications, there might be a happy medium. For example, researchers all over the world have conducted replications of the same phenomenon as part of the “Many Labs” project. I suggest that we would learn more from this endeavor with a small twist: Ask all participating labs to add an interesting moderator variable of their choice to the design, with random assignment, in addition to performing a direct replication. This would nudge replicators into thinking deeply about the phenomenon they are trying to replicate and to make predictions about the underlying psychological processes, possibly leading to substantial advances in our understanding of the phenomenon under study—that is, to discovery as well as duplication.

In any polarized debate, common ground becomes obscured. It is thus worth remembering that all scientists agree on two things: We want our methods to be as sound as possible and we value novel, creative, groundbreaking findings. It would be unfortunate if the emphasis on one came at the expense of the other.

(Note: This post benefited greatly from comments by Jerry Clore, Dan Gilbert, and Brian Nosek—but by thanking them I do not mean to imply in the least that they agree with anything I have said.)

I definitely agree that replications should be reviewed by the same standards as original studies. A very important criterion for both replications and original studies that I missed in this blog is power. Neither a replication nor an original study with low power is very informative, as low power increases the probability of chance (non-)findings.
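The power point can be made concrete with a back-of-the-envelope calculation. This is a rough normal-approximation sketch of power for a two-sided, two-sample test at alpha = .05, assuming a true effect of Cohen's d with n participants per group; the specific numbers are illustrative, not drawn from any study discussed here.

```python
import math

def power(d, n, crit_z=1.96):
    # Normal approximation: under the alternative, the test statistic is
    # roughly N(d * sqrt(n/2), 1); power is P(Z > crit_z - noncentrality),
    # ignoring the negligible lower rejection tail.
    noncentrality = d * math.sqrt(n / 2)
    return 0.5 * (1 - math.erf((crit_z - noncentrality) / math.sqrt(2)))

print(round(power(0.5, 20), 2))   # typical small lab study: ~0.35
print(round(power(0.5, 100), 2))  # larger replication: ~0.94
```

With 20 participants per group and a true medium effect, the study detects the effect only about a third of the time, so both false negatives (in replications) and an excess of published positives (via selective reporting) become more likely.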

However, I disagree that “the evidence is equally weak that false positives are rampant”. Many studies have shown that there are way too many positive findings published, given the typical power that is used (see e.g., Fanelli, 2010; Button et al., 2013). This indicates that false positives are indeed a very serious problem.

It might be true that the biggest scientific discoveries are initially made by researchers who pursue unorthodox ideas, but new discoveries have to be thoroughly replicated to become scientific advances.

There is no (or should be no) such thing as a ‘replication movement’. Framing it this way makes it sound like replication is something tricky and suspicious and revolutionary, when replication should just be part of our standard operating procedure because it’s part and parcel of science.

Of course single failed replications have about as much weight as the original research. Anyone who thinks otherwise doesn’t understand how science progresses.

‘Too much emphasis on duplication, not enough on novel things’; this attitude is the problem. There is nothing ‘mere’ about duplication, it is in fact the best tool we have for figuring out whether that exciting new and interesting thing about human behaviour is real or not. Without this, we have a field of unproven results that no one can reliably build and develop new ideas from.

This idea that the best science is done by lone geniuses thinking outside the box is also nonsense. First, it’s not how most good science is really done. Second, even the occasional lone genius who is doing good things is only able to do so because of all the hard work that went into shoring up the foundations of their field.

Science is hard. It takes time, effort, and careful attention to detail, and results don’t get to be trusted until they’ve been poked, prodded and thwacked from every direction. Psychology needs to front up to this fact and make replication and extension a part of our everyday business.

Perhaps, though one of my points is that this can be a two-edged sword. If the replication was done badly, or if the researchers engaged in p-squashing, then the increased power could further obscure a true effect.

“There is another unintended effect of the replication movement, namely that it places too much emphasis on duplication . . . the field has become preoccupied with prevention and error detection—negative psychology—at the expense of exploration and discovery.”

I wonder if you could elaborate on “too much emphasis” and “preoccupied.” From where I sit (as someone who has funded recent replication initiatives), it seems that a field that, like many others, has ignored replication for far too long now has a tiny smidgen of concern about it. I mean, out of many dozens of psychology journals, one low-ranked journal devoted one issue to replication. And out of zillions of psychology experiments this past year, one was the “Many Labs” project.

If this counts as “too much” replication, what would be the right amount?

Also, what has been the reaction to the O’Keefe/Reis failed replication of a study that you and Gilbert authored? You mention the various ways that a replication could fail — were any of those faults present?

Stuart,
First, thanks for your support of the field. You raise good questions and I would welcome a conversation about them, though I’m not sure that this is the best forum for that conversation. If you would like to chat offline, my email is tdw@virginia.edu.

Briefly, I hope it was clear from my post that the recent emphasis on methodology and replication has had many positive effects. The reaction in the field has been quite polarized, though, and I think it is important for us all to have a conversation about why that is and what the potential downsides are. I tried to be clear in my post about possible costs.

As to your question about the replication of Whitchurch, Wilson, & Gilbert (2011), I would first point out that the O’Keefe and Reis study was not done as part of the replication project, but as part of the way science has always proceeded: Other researchers were interested in our findings, wanted to extend them by testing a moderator variable, and so conducted a replication with a new manipulation. I think that’s great.

For me, the biggest problem in psychology, and one that has been noted by various psychologists through the years (e.g. Tulving) is the devaluation of conceptual thinking/conceptual integration in favor of experiments that, in the absence of serious attempts at developing a coherent “infrastructure,” are just feathers in the wind. The more solid the conceptual basis for an experiment is, the more credible it will be and the less vulnerable in the face of chance variations or errors in procedure. In the case of “he said she said” the person with the better argument should, provisionally, be declared the winner. If neither has a good one, then it really doesn’t matter much either way.

I have evidence that most of the recent “replications” related to priming are seriously deficient:

Most of those published recently were not precise replications and deviate from the original protocols. None that I know of discussed the study beforehand with the original researchers (one experiment did, but does not seem to have gotten full agreement; I will not elaborate here…).

Most have had a preconceived belief that the phenomenon does not exist.

Most were done by researchers at the beginner level.

I will be more than happy to share my analysis if someone is interested in analyzing it.