
Abstract

We investigate two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 159 empirical economics literatures that draw upon 64,076 estimates of economic parameters reported in more than 6,700 empirical studies. Half of the research areas have nearly 90% of their results under-powered. Median statistical power is 18% or less. A simple weighted average of those reported results that are adequately powered (power ≥ 80%) reveals that nearly 80% of the reported effects in these empirical economics literatures are exaggerated, typically by a factor of two, with one-third inflated by a factor of four or more.

“Statisticians routinely advise examining the power function, but economists do not follow the advice.” McCloskey (1985, p. 204)

Good policy and practice are built on the foundations of reliable scientific knowledge. Unfortunately, there are long-held suspicions that much of what passes as evidence in economics, medicine or psychology (and possibly other fields) lacks sufficient credibility (De Long and Lang, 1992; Ioannidis, 2005b; Leamer, 1983; Ioannidis and Doucouliagos, 2013; Maniadis et al., 2017). For example, it has been difficult to reproduce and verify significant bodies of observational and experimental research independently (Ioannidis, 2005a; Begley and Ellis, 2012; Begley and Ioannidis, 2015; Duvendack et al., 2015; Nosek et al., 2015). Moreover, empirical research is plagued by a range of questionable practices and even the fabrication of results. Consequently, some argue that science is experiencing a credibility crisis. This crisis of confidence in research permeates multiple scientific disciplines. While there are discipline-specific nuances, there are also many shared experiences and distorted incentives. Just as declining credibility may spill over from one discipline to another, successful strategies and practices can benefit other disciplines. Hence, a multidisciplinary approach may advance all sciences.

Statistical power is a critical parameter in assessing the scientific value of an empirical study. Power’s prominence increases with policy importance. The more pressing it is to have evidence-based policy, the more critical it is to have the evidence base adequately powered and thereby credible. By definition, adequate power means that the empirical methods and data should be able to detect an effect, should it be there. Low power means high rates of false negatives. However, as Ioannidis (2005b) has argued, low power also causes high rates of false positives, where non-existent effects are seemingly detected. Aside from the prior probability that a given economic proposition is true (a magnitude that would likely cause endless debate among economists), the key parameters for assessing the validity of any given reported research result are: statistical power and the proportion of reported non-null results that are the artefact of some bias (e.g. misspecification bias and publication selection bias).
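The mechanics behind these claims are simple to state: for a two-sided z-test at the 5% level, power is the probability that an estimate falls outside the critical bounds when the true effect is non-zero, and it depends only on the ratio of the true effect to the estimate's standard error. A minimal illustrative sketch (our own, not from the paper; the function name and arguments are hypothetical):

```python
import math

def power(true_effect, se, z_crit=1.959964):
    """Power of a two-sided z-test at the 5% level to detect a true
    effect of size `true_effect` when the estimate's standard error
    is `se`. Uses the standard normal CDF computed via math.erf."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    z = abs(true_effect) / se  # expected value of the test statistic
    return cdf(z - z_crit) + cdf(-z - z_crit)

# A study whose standard error equals the true effect it seeks to
# detect has power of roughly 17% -- close to the 18% median power
# reported in the abstract.
print(round(power(0.5, 0.5), 2))  # -> 0.17
```

Note that power reaches 80% only when |true effect|/SE ≥ 2.80 (the sum of the 1.96 critical value and the 0.84 quantile for 80% power), so low-precision estimates are structurally incapable of adequate power.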

How credible is empirical economics? Is empirical economics adequately powered? Many suspect that statistical power is routinely low in empirical economics. However, to date, there has been no large-scale survey of statistical power across empirical economics. The main objectives of this article are to fill this gap, to investigate the implications of low power for the magnitude of likely bias and to recommend changes in practice that are likely to increase power, reduce bias and thereby increase the credibility of empirical economics.

For many researchers, a key consideration is whether a particular research project is publishable. In contrast, from a social welfare perspective, the more important consideration is the contribution that the research inquiry makes to science.1 The validity and credibility of empirical economics have long been questioned. For example, Leamer (1983) famously pointed out that empirical economics is vulnerable to a number of biases and, as a result, produces rather fragile results that few economists take seriously. De Long and Lang (1992) found evidence of publication selection bias among the top economic journals. Ziliak and McCloskey (2004) searched papers in the American Economic Review (AER) and found that only 8% of the empirical studies published in the 1990s actually consider statistical power.2 Doucouliagos and Stanley (2013) quantitatively surveyed 87 empirical economics areas and found evidence of widespread publication selection bias. Ioannidis and Doucouliagos (2013, p. 997) recently reviewed and summarised available evidence of prevalent research practices and biases in the field and called into question the credibility of empirical economics, arguing that overall ‘the credibility of the economics literature is likely to be modest or even low’.

We are not suggesting that power analysis is never done in economics. Indeed, the importance of power is recognised in several areas of empirical economics research. For example, in assessing randomisation of microcredit programmes, Banerjee et al. (2015, p. 3) conclude that ‘statistical power still poses a major challenge to microcredit impact studies’. Power analysis is frequently used in the revealed preference approach to consumption or production decisions (Andreoni et al., 2013). Nevertheless, in spite of its widely recognised importance, there are currently no large-scale surveys of statistical power in empirical economics nor a careful quantification of the consequences of ignoring power.3

Prior studies discuss power or bias only for leading economics journals (De Long and Lang, 1992; McCloskey and Ziliak, 1996) or, where a wider range of journals is surveyed, consider only bias (Doucouliagos and Stanley, 2013). To validate the claims of a lack of credibility in economics and to quantify the likely magnitude of bias, it is necessary to investigate the broader evidence base more rigorously. Accordingly, we survey two dimensions of the credibility of empirical economics research: statistical power and bias. Our survey is based on a statistical examination of 159 meta-analyses that provide over 64,000 estimates of key parameters (the estimated effect size and its estimated standard error) drawn from approximately 6,700 empirical studies. Using these data, we calculate: the proportion of reported findings that are adequately powered for a given area of economics research; the median power of that area of research; the estimate of effect that emerges when only adequately powered estimates are considered; and the proportion of the typical reported effect that is likely to be the result of some type of bias or artefact. We find that economics research is generally underpowered and that most of it is afflicted with substantial residual bias. Half of the areas of economics research have 10.5% or fewer of their reported results adequately powered.4 In 20% or more of research literatures, not a single study is adequately powered. In spite of this low power, most studies still report statistically significant effects. While these results cast a shadow on the credibility of economics research, not all is lost. At least one adequately powered study is available in most of the economics literatures that we examined. Moreover, meta-analysis can synthesise the results from numerous underpowered studies, filter out various biases and thereby suggest better estimates of the underlying empirical economic parameters necessary for valid inference. Hence, even if the credibility of economics research is much lower than desirable, careful systematic review and meta-analysis can improve statistical inference and offer some policy guidance.
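To make the survey calculations concrete, the two per-literature statistics (median power and the share of adequately powered estimates) can be computed from the reported standard errors once a proxy for the literature's true effect is chosen. The sketch below is our own illustration under that assumption; `power_survey` is a hypothetical helper, not the paper's code:

```python
import math
import statistics

def _power(mu, se, z_crit=1.959964):
    """Two-sided z-test power at the 5% level."""
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    z = abs(mu) / se
    return cdf(z - z_crit) + cdf(-z - z_crit)

def power_survey(ses, true_effect):
    """For one literature: (median power, share of estimates with
    power >= 80%), given each estimate's standard error and a proxy
    for the literature's true effect."""
    powers = [_power(true_effect, s) for s in ses]
    adequate = sum(p >= 0.80 for p in powers) / len(powers)
    return statistics.median(powers), adequate

# Three estimates of a true effect of 0.5: two imprecise (SE = 0.5),
# one precise (SE = 0.1). Only the precise one is adequately powered.
med, share = power_survey([0.5, 0.5, 0.1], 0.5)
```

Repeating this calculation across all 159 literatures, with the true effect proxied by a bias-robust summary of each literature, yields survey statistics of the kind reported above.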

Our second contribution is to present a new approach to correcting bias in empirical economics research: a weighted average of the adequately powered (WAAP). This estimator employs an unrestricted weighted least squares (WLS) weighted average calculated only on the adequately powered estimates – in contrast to conventional meta-analysis, which uses all available estimates. We show that, using only the adequately powered studies, WAAP may give a credible and defensible estimate of the empirical effect in question. Should some type of publication selection, reporting or small-sample bias be present in the research record, WAAP is quite likely to reduce it. At the very least, the weighted average of the adequately powered offers a validation of the corrected empirical effects estimated by other meta-regression analysis methods. An advantage of WAAP is that it makes no assumption about the shape, cause or model of publication or selective reporting bias.
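A minimal sketch of how such an estimator can be implemented, under two assumptions of our own: the true effect is proxied by the unrestricted WLS average of all estimates, and "adequately powered" means SE ≤ |proxy|/2.8 (the largest standard error at which a two-sided 5% test retains 80% power). The function name is illustrative, not the authors' code:

```python
def waap(estimates, ses):
    """Weighted average of the adequately powered (WAAP), sketched as:
    1. take the inverse-variance (unrestricted WLS) average of ALL
       estimates as a proxy for the true effect;
    2. keep only estimates with power >= 80% against that proxy,
       i.e. those with SE <= |proxy| / 2.8;
    3. return the inverse-variance average of the survivors."""
    w = [1.0 / s ** 2 for s in ses]
    proxy = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    cutoff = abs(proxy) / 2.8
    kept = [(e, s) for e, s in zip(estimates, ses) if s <= cutoff]
    if not kept:
        return None  # no adequately powered estimate in this literature
    wk = [1.0 / s ** 2 for _, s in kept]
    return sum(wi * e for wi, (e, _) in zip(wk, kept)) / sum(wk)

# Two precise estimates near 0.5 dominate; the imprecise outlier at
# 2.0 is excluded as underpowered.
print(round(waap([0.5, 0.55, 2.0], [0.05, 0.05, 1.0]), 3))  # -> 0.525
```

Because the filter acts only on standard errors relative to the proxy, this construction needs no model of how selective reporting arose, consistent with the assumption-free property claimed above.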