About 40% of economics experiments fail replication survey

When a massive replicability study in psychology was published last year, the results were, to some, shocking: 60% of the 100 experimental results failed to replicate. Now, the latest attempt to verify findings in the social sciences—this time with a small batch from experimental economics—also finds a substantial number of failed replications. Following the exact same protocols of the original studies, the researchers failed to reproduce the results in about 40% of cases.

"I find it reassuring that the replication rate was fairly high," says Michael L. Anderson, an economist at the University of California, Berkeley, not involved with the study. But he notes that most of the failures came from studies using a 5% "p value" cut-off for statistical significance, suggesting "what some realize but fewer are willing to discuss: The accepted standard of a 5% significance level is not sufficient to generate results that are likely to replicate."
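One way to see why a result that only just clears the 5% bar is hard to reproduce is a back-of-the-envelope calculation. The sketch below is illustrative only and is not the method used in the study. It assumes a hypothetical replication with the same sample size as the original, a true effect exactly equal to the originally observed one, and a normally distributed test statistic. The function name `replication_power` is made up for this example.

```python
from statistics import NormalDist

def replication_power(original_p, alpha=0.05):
    """Chance that a same-sized replication reaches two-sided significance
    at `alpha` in the original direction, ASSUMING the true effect equals
    the effect observed in the original study (a strong assumption)."""
    std = NormalDist()  # standard normal distribution
    z_orig = std.inv_cdf(1 - original_p / 2)  # z-score implied by the original p value
    z_crit = std.inv_cdf(1 - alpha / 2)       # threshold the replication must exceed
    # The replication z-score is modeled as Normal(mean=z_orig, sd=1).
    return 1 - std.cdf(z_crit - z_orig)

for p in (0.05, 0.01, 0.001):
    print(f"original p = {p}: replication power = {replication_power(p):.2f}")
```

Under these assumptions, an original result sitting exactly at p = 0.05 has only a coin-flip chance of reaching significance again, while results with smaller original p values fare better.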

Psychology's high-profile replication efforts, which were cautiously welcomed by the research community, have triggered policy changes at some scientific journals and modified priorities at many funding agencies. But the overall failure rate has also been called into question, because most of the original studies were reproduced only once, often without strictly following the initial protocol. And most of the replication studies allowed the replicators to choose their targets.

The latest attempt at social science do-overs—a replication of 18 studies in experimental economics—went to great lengths to avoid such criticisms. "We did not want to pick out studies on any subjective basis," says lead author Colin Camerer, an economist at the California Institute of Technology in Pasadena. Instead, the team set its criteria based on the experimental setup and whether a study produced one central result. They combed through papers published from 2011 through 2014 in two of the field's top journals, American Economic Review and the Quarterly Journal of Economics, and came up with a set of 18 that met their criteria.

"Our approach was very lawyerly," Camerer says. Before starting, the researchers drew up a three-page "replication report" for each study, spelling out how it would be executed and interpreted. The report was sent out to the original authors for feedback. "The idea was that in retrospect nobody could say we were not clear about the replications [or that we] were being unfair." And it all went smoothly, he says. "To our pleasant surprise, basically all of them were a combination of flattered and happy we were going to replicate their study."

Eleven of the 18 economic replications succeeded, they report today in Science. "Our takeaway is that the replication rate is rather good," says Camerer, noting that the study topics from the successful replications reflect "most of the things we study in experimental economics [that] are replicated over and over: Do prices move toward where supply meets demand? Are there ‘price bubbles’ in artificial markets? Do people contribute in ‘public goods’ where spending some of your own money helps the group?"
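The "about 40%" figure in the headline follows directly from these counts. A minimal check, using only the numbers reported above (the variable names are illustrative):

```python
replicated = 11  # successful replications reported in the study
total = 18       # studies selected for replication

success_rate = replicated / total
failure_rate = 1 - success_rate

print(f"{success_rate:.1%} replicated, {failure_rate:.1%} failed")
# prints "61.1% replicated, 38.9% failed"
```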

"The authors were fair and collegial," says Homa Zarghamee, an economist at Barnard College in New York City whose 2011 study failed to replicate. "They took great care to exactly replicate our study methodologically," she says. But she adds that the failure doesn't mean the results from the original study were a false positive.

Zarghamee's study, conducted with John Ifcher, an economist at Santa Clara University in California, focused on the effect of happiness on economic decisions. To induce positive emotions in subjects, they used a clip of stand-up comedian Robin Williams. Since that original study was conducted, Williams has died by suicide, so the emotional effect of the video may now be mixed, or even the opposite, Zarghamee says. Another confounding factor is the audience: The subjects in the original study were American, whereas those in the replication were British. "We think it is more accurate to interpret the failure to replicate our result as a 'treatment failure,'" Zarghamee says.

Outside observers see these different outcomes as an inevitable part of social science. "It should not be surprising or discouraging that a substantial number of scientific findings across fields prove difficult to replicate," says Eric Luis Uhlmann, an economist at the INSEAD business school in Singapore who was not involved in the study. "Small samples are noisy and human populations are diverse." The solution is to base conclusions on multiple attempts, he says. "Failures to replicate and reproduce findings should be considered a normal part of science."