Empirical studies in economics journals use linear regression more than any other statistical tool. But can expert economists make accurate predictions from the typical results of such analysis? A recent experiment by Soyer & Hogarth (S&H, 2012) in the International Journal of Forecasting suggests not. In fact, it was only when reports ditched the statistics altogether, and experts were left with only a simple scatter plot, that their predictions improved in accuracy.

If the scatter plots are doing all the work, you’d think that adding them to the statistics typically reported should help. But it didn’t; predictions didn’t improve until the other statistics were removed. S&H speculate that experts were distracted by the other pieces of analysis, and lured into a false sense of certainty. What’s the most distracting piece of statistical information? All fingers point to the statistical significance of the regression coefficients…

This is not the first time we’ve seen experiments suggesting that researchers, even experienced ones who publish in leading journals, misunderstand and/or misinterpret their results—especially when those results include reports of statistical significance. Indeed, some of us in psychology have established entire careers on pointing out such errors in published research.

Typically such studies ask whether researchers can describe and interpret statistical results accurately, and draw reasonable conclusions from them. Invariably these studies reach the depressing conclusion that most researchers do not understand the results of their own studies, let alone anything more abstract or complex.

The S&H study asks a more difficult question. Rather than simply focusing on whether researchers can interpret the outcomes of statistical analysis, they ask them to make specific probabilistic predictions based on those outcomes. This is difficult because it requires going beyond the average impact of the independent variable on the dependent variable, which is usually all one sees addressed in the discussion section of empirical articles. It’s certainly not an unfair question though, especially given that S&H participants were expert research economists for whom such predictions are likely to be routine.

How results are typically reported in economics journals.

Before their experiment, S&H surveyed four leading economics journals and found that most articles focus on regression coefficients and their standard errors. R-squared was very commonly reported (80% of articles) whereas the Standard Error of the Regression equation (SER) was rarely given (9% of articles). R-squared gives information about model fit, in other words, how much of the variation in the dependent variable the regression line accounts for. SER gives information about the degree of unpredictability, or the spread of the dots around the line, in units of the dependent variable. Both are important measures of uncertainty, according to S&H[1]. Scatter plots and other relevant graphics were reported in just 38% of articles. Most discussion sections focused primarily on the regression coefficients and their statistical significance. The latter is a big problem, given the results of the experiment, and how much this information seems to interfere with an appreciation of uncertainty. One of the long-standing criticisms of statistical significance testing is the illusion of certainty it provides, where a p-value below 0.05 is interpreted as ‘real’, ‘large’ and ‘important’ and one above 0.05 is taken to mean ‘no effect’. Once researchers have slipped into that seductive dichotomy, it’s extremely difficult for them to care about anything else. What else could possibly matter? To my mind, the S&H experiment provides yet more evidence against the institutionalised practice of reporting statistical significance.
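
To make the difference between R-squared and SER concrete, here is a minimal sketch in Python. Everything in it is my own illustration, not S&H's data or code: the helper names `fit_line` and `simulate`, the true slope of 1, the distribution of X, the two noise levels and the random seed are all assumptions. It fits a line to two simulated data sets that share the same slope but differ in how widely the dots scatter around the line.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared, ser)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot      # share of variance explained by the line
    ser = math.sqrt(ss_res / (n - 2))    # spread around the line, in units of y
    return a, b, r_squared, ser


def simulate(noise_sd, n=1000, seed=1):
    """Hypothetical data: true slope 1, zero intercept, normal noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(50, 10) for _ in range(n)]
    ys = [1.0 * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


for noise_sd in (10, 30):
    a, b, r2, ser = fit_line(*simulate(noise_sd))
    print(f"noise sd={noise_sd}: slope={b:.2f}, R^2={r2:.2f}, SER={ser:.1f}")
```

With samples this large, the estimated slope should sit close to 1 in both cases, which is what a coefficient table would show. The SER is what reveals how far an individual outcome can stray from the line.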

The experiment

In the experiment itself, S&H presented their expert sample (257 faculty members of economics departments in leading US universities) with the results of a hypothetical linear regression. Participants saw one of three presentations. Presentation 1 was modeled on the typical presentation they found in journals: it included a table of coefficients and their standard errors, a constant, and R-squared. Presentation 2 reflected arguably better practice, reporting those statistics as well as bivariate scatter plots of the dependent and independent variables and the SER. Presentation 3 dropped the statistics altogether and just reported the scatter plots. (Amongst those conditions they also varied the value of R-squared, but we’ll ignore that for now.)

As I mentioned above, the startling thing here is that experts did no better with Presentation 2 than with Presentation 1: both resulted in around 60-70% incorrect responses to the three questions below. The statistics had to be dropped altogether (Presentation 3) before predictions began to improve. Incorrect responses then dropped dramatically, to between 3% and 7%!

Prediction questions (from S&H, p. 698).

1. What would be the minimum value of X that an individual would need to make sure that s/he obtains a positive outcome (Y>0) with 95% probability?

2. What minimum, positive value of X would make sure, with 95% probability, that the individual obtains more Y than a person who has X=0?

3. Given that the 95% confidence interval for β is (0.936, 1.067), if an individual has X=1, what would be the probability that s/he gets Y>0.936?
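
For readers who want to see what kind of calculation these questions call for, here is a rough sketch in Python. The intercept and SER below are hypothetical values chosen for illustration, not the figures S&H showed their participants; only the interval for β comes from question 3, and I use its midpoint as the slope. The sketch also assumes normally distributed errors and ignores estimation uncertainty in the coefficients, so it is an approximation rather than S&H's own solution.

```python
import math
from statistics import NormalDist

# Hypothetical inputs for illustration only; NOT taken from S&H.
INTERCEPT = 0.0   # assumed constant term
SER = 30.0        # assumed standard error of the regression
# Midpoint of the interval quoted in question 3, (0.936, 1.067).
BETA = (0.936 + 1.067) / 2

Z95 = NormalDist().inv_cdf(0.95)  # one-sided 95% quantile of the standard normal


def min_x_for_positive_y(intercept, beta, ser):
    """Q1: smallest X with P(Y > 0) = 0.95, where Y ~ N(intercept + beta*X, ser^2)."""
    return (Z95 * ser - intercept) / beta


def min_x_to_beat_x_zero(beta, ser):
    """Q2: smallest X with P(Y_X > Y_0) = 0.95 for two independent individuals.

    The difference of two independent errors has standard deviation ser*sqrt(2).
    """
    return Z95 * math.sqrt(2) * ser / beta


def prob_y_exceeds(threshold, x, intercept, beta, ser):
    """Q3: P(Y > threshold) for an individual with the given X."""
    mean_y = intercept + beta * x
    return 1 - NormalDist(mean_y, ser).cdf(threshold)


print("Q1 minimum X:", round(min_x_for_positive_y(INTERCEPT, BETA, SER), 1))
print("Q2 minimum X:", round(min_x_to_beat_x_zero(BETA, SER), 1))
print("Q3 probability:", round(prob_y_exceeds(0.936, 1, INTERCEPT, BETA, SER), 3))
```

The point to notice is that every answer depends on the SER, the statistic that was rarely reported. Under these assumed values, the probability in question 3 comes out close to one half, nowhere near the 95% that the confidence interval for β might tempt you to say.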

A couple of other interesting things about the article:

S&H point out that there are no official statistics reporting guidelines in the economics journals they surveyed. The pattern of typical reporting seems to be self-generated, or a product of popular textbooks. Seems like some guidelines might be a good idea. Should they suggest NOT reporting statistics?

S&H briefly review reporting practices in other disciplines. They claim that Effect Size reporting has been standard in psychology since the 1980s, largely because of the policy of the journal Psychological Science. This is inaccurate. Effect Size reporting is far from standard practice in psychology, a fact which continues to upset critics of statistical significance testing in the discipline after many decades of advocacy (Cumming, 2012). In terms of statistical reform, psychology lags far behind medicine (Fidler, 2010). Even if S&H’s exaggerated claim were true, Psychological Science could not have been responsible for any revolution in the 1980s. The journal didn’t start until 1990!

[1] I say ‘according to S&H’, because I’d never actually heard of SER before this. It might be well known amongst economists, but as a psychologist, I’ve never seen it in a textbook or reported in an article.