Second, on the same day, an author writing in Slate magazine made the astonishing claim that 20 years of studies of ego depletion, an influential and seemingly robust set of findings, had recently dissolved into thin air.

Third, a group of 18 researchers in economics published the long-awaited findings of another replication project and, after the dismal results of a similar attempt reported by two researchers from the St Louis Fed last year, had mostly good news for economists. A colleague of mine from psychology sent me the write-up from The Economist with these words:

Is there a crisis, if not in economics, then in psychology? Or in the social sciences as such?

In the following I will briefly comment on each of these three events from the last couple of weeks and the heat they have produced. I will then cast a wider net before coming to an assessment of the current state of affairs in the social sciences.

So, is all fine in the house of experimental economics?

As I pointed out to my colleague, a successful replication of 11 of 18 experiments published in a couple of top journals (which have acceptance rates well below 5 percent and huge selection biases) says little about the state of replicability and reproducibility in economics. I like to believe that evidence production in economics is more stable than in psychology because economists’ experimentation practices are less laissez-faire, but I fear that we also have a lot of false positives. In work that I have done with Le Zhang (currently under second-round review), we have shown that dictator game experiments published in the top experimental economics journal were typically severely underpowered, inviting those pesky false positives. While Camerer et al. ran their replications under an exacting standard of a required power of 0.9, until recently most (well, at least most dictator game) experiments in economics were not properly powered up.
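To see what "properly powered up" means in practice, here is a back-of-the-envelope sketch (my own illustration, not a calculation from Camerer et al. or from the dictator game work mentioned above) using the standard normal approximation for a two-sample comparison; the effect sizes and cell sizes are illustrative assumptions.

```python
# Normal-approximation power analysis for a two-sided, two-sample test
# (think treatment vs. control in a dictator game). Effect sizes (d) are
# standardized mean differences; the numbers below are illustrative.
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def required_n_per_group(d, alpha=0.05, power=0.90):
    """Subjects needed per group to detect effect size d
    at significance level alpha with the given power."""
    z_a = Z.inv_cdf(1 - alpha / 2)   # critical value, two-sided test
    z_b = Z.inv_cdf(power)           # quantile for desired power
    return ceil(2 * ((z_a + z_b) / d) ** 2)

def achieved_power(n, d, alpha=0.05):
    """Approximate power of the same test with n subjects per group."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(d * sqrt(n / 2) - z_a)

# At the 0.9 power standard used by Camerer et al., a 'medium' effect
# (d = 0.5) needs 85 subjects per group; a small effect (d = 0.3) needs 234.
print(required_n_per_group(0.5))              # 85
print(required_n_per_group(0.3))              # 234
# With only ~30 subjects per cell, power to detect d = 0.3 is about 0.21:
# a genuinely underpowered design, where significant hits are unreliable.
print(round(achieved_power(30, 0.3), 2))      # 0.21
```

The point of the arithmetic is the asymmetry: halving the effect size you hope to detect roughly quadruples the sample you need, which is why small-cell designs chasing modest effects mostly produce noise.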

It is also worth recalling briefly that a few months earlier two Federal Reserve economists came to the alarming conclusion that economics research is usually not replicable. Their conclusion was based on an attempt to replicate 67 empirical papers from 13 reputable academic journals; without assistance from the original researchers, they could replicate only a third of the results, and with the original researchers’ assistance that share rose to about half. A good summary of their study can be found here. This is arguably an even more troubling result: empirical findings, for which the data sets are, after all, pre-existing, should if anything be easier to replicate than experimental ones, which carry the additional variance introduced by experimenters’ choices of design and implementation details. This replication attempt indicates the magnitude of the problems that economics might have.

I could parade many examples (endowment effects, anyone? Loss aversion? The conjunction fallacy?) where serious questions have been raised about the replicability and reproducibility of effects claimed in the heuristics-and-biases literature.

On balance, then, there is reason to believe that economists have some way to go: they ought to continue to improve their data collection and sharing efforts, to reflect on the design and implementation of their experiments and, very importantly, on the appropriate econometric assessment of the evidence produced. The house of experimental economics, I fear, is not yet in good order.

The latter three have, at least to my mind, pretty thoroughly demolished the case that Gilbert and his colleagues presented, as did Funder and Gelman (and some of the commentators on Gelman’s piece).

Funder and Gelman also step back from the battle to look at the war that is really being waged here, and by doing so provide some much-needed light where there is currently way too much heat.

Funder, for example, points out that the OSC study “is not the only, and was far from the first, sign that we have a problem”; he is too modest to add that he himself provided a lengthy contribution and problem description more than a decade ago.

Funder, seemingly unaware of the replication crisis that economists are dealing with, also points out that “other fields have replicability problems too”; he mentions specifically biochemistry, molecular biology, and medical research, including cancer biology studies.

He then asks, “if Gilbert & Co. are right, are we to take it that the concerns in our sister sciences are also overblown?” It is a rhetorical question to which his answer is pretty clear. He concludes with a useful discussion of “the ultimate source of unreliable scientific research”, locating it in a tightening market for academic jobs and opportunities, the emerging “academic star” system, and other perverse incentives for academics.

You have to live in an ivory tower to believe that there is not. It seems obvious to me that there is, and that before it gets better it will get worse. That is because suddenly everyone is talking about it and has become interested in it, and because a general sentiment has developed, one that has even found its way into editorial practices, that flashy results which barely clear conventional hurdles ought not to be trusted.

How deep the crisis is, is a harder question to answer, because any answer depends on what our measuring rod is, and ought to be. Are we looking for what some people call direct replication, or are we really interested in what some people have called conceptual replication and yet others have called reproduction? Ben Strickland makes an excellent case for conceptual replication here, arguing that what we really ought to be after is the reproducibility of robust effects. Rolf Zwaan makes a related argument here.

In sum, there can be little doubt that there is a crisis; as far as I can see, there is no crisis of the crisis. And it seems fair to say, judging by the evidence that has been forthcoming, that the sense that there is a crisis has both widened and deepened.

That there is a crisis, and that it is widening and deepening, is, at least for now, the bad news. The good news is that the overdue discussions about replicability and reproducibility, and everything connected to them, do take place, and do take place in a serious manner. Mostly.

The widening sense of a widening and deepening crisis is upping the level of the game: it is encouraging, for example, to see the growing number of outlets for pre-registered studies, the increased opportunities to publish replications and reproductions, and the fact that many journals now require submission of data files before publication. Similarly, it is encouraging to see platforms such as Retraction Watch emerge and, clearly, stay for good.

– The crisis has ethical, epistemological, methodological and even metaphysical dimensions;

– It has root causes that were predicted by history and philosophy of science scholarship and are described by present-day historical critiques of commodified science;

– The crisis of science qua science impacts science as used for policy.

The crisis also calls for a discussion of the paradigm of evidence-based policy: the use of science to produce implausibly precise numbers and reassuring techno-scientific imaginaries, and the use of science to ‘compel’ decisions by the sheer strength of ‘facts’.

Andrea Saltelli
European Centre for Governance in Complexity
Centre for the Study of the Sciences and the Humanities (SVT), University of Bergen (UIB); Institut de Ciència i Tecnologia Ambientals (ICTA), Universitat Autònoma de Barcelona (UAB).