We study the performance of a comprehensive set of equity premium forecasting strategies that have been shown to outperform the historical mean out-of-sample when tested in isolation. Using a multiple testing framework, we find that previous evidence on out-of-sample predictability is primarily due to data snooping. We are not able to identify any forecasting strategy that produces robust and statistically significant economic gains after controlling for data snooping biases and transaction costs. By focusing on the application of equity premium prediction, our findings support Harvey’s (2017) more general concern that many of the published results in financial economics will fail to hold up.

One challenge in answering the question of out-of-sample predictability is that almost all forecasting strategies are tested on a single data set. When many models are evaluated individually, some are bound to show superior performance by chance alone, even though they possess no genuine predictive ability. This bias in statistical inference is usually referred to as ‘data snooping’. Without properly adjusting for this bias in a multiple testing set-up, we might commit a type I error, i.e., falsely assess a forecasting strategy as superior when it is not. In fact, Harvey, Liu, and Zhu (2016) note that equity premium prediction offers an ideal setting to employ multiple testing methods.

To the best of our knowledge, our study is the first to jointly examine the out-of-sample performance of a comprehensive set of equity premium forecasting strategies relative to the historical mean, while accounting for the data snooping bias. We construct a comprehensive set of 100 forecasting strategies that are based on both univariate predictive regressions and advanced forecasting models, including strategies that adopt diffusion indices or combination forecast approaches, apply economic restrictions on the forecasts, predict disaggregated stock market returns, or model economic regime shifts.
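To fix ideas, one of the simplest strategies in this set is a combination forecast: fit a univariate predictive regression per predictor and average the resulting forecasts. The sketch below illustrates the mechanics on simulated data; the sample size, number of predictors, and return process are illustrative assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: K candidate predictors X observed for T months, and
# next-month excess returns y (each return paired with the prior month's X)
T, K = 240, 5
X = rng.standard_normal((T, K))
y = 0.005 + 0.02 * rng.standard_normal(T - 1)

# Fit one univariate predictive regression per predictor, r_{t+1} = a + b*x_t,
# then forecast month T+1 from the last observed predictor value
forecasts = []
for k in range(K):
    b, a = np.polyfit(X[:-1, k], y, 1)  # OLS slope and intercept
    forecasts.append(a + b * X[-1, k])

# Equal-weighted combination forecast: the mean of the individual forecasts
combo = float(np.mean(forecasts))
```

Averaging shrinks the noise of the individual regressions, which is why combination forecasts are a standard benchmark in this literature.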

We use these forecasting strategies to predict the monthly U.S. equity premium out-of-sample based on the most recent 180 months and track their out-of-sample performance for the subsequent month over the evaluation period from January 1966 to December 2015. We aim to answer Spiegel’s (2008) question, i.e., whether there are forecasting strategies that deliver significantly better performance than the prevailing mean model. As performance measures, we use the mean squared forecast error and absolute as well as risk-adjusted excess returns.
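The evaluation scheme described above can be sketched as a rolling out-of-sample exercise: re-estimate each model on the most recent 180 months, forecast the next month, and compare squared forecast errors against the prevailing historical mean. The data below are simulated and the single-predictor regression is a stand-in for any of the strategies; none of the numbers come from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: predictor x observed in month t carries a small
# signal about the equity premium r in month t+1
T = 600
x = rng.standard_normal(T)
r = 0.004 + 0.04 * rng.standard_normal(T)
r[1:] += 0.01 * x[:-1]

window = 180  # estimate on the most recent 180 months only
model_sqerr, mean_sqerr = [], []
for t in range(window, T - 1):
    xs = x[t - window:t]          # predictor values, months t-180 .. t-1
    ys = r[t - window + 1:t + 1]  # the next-month returns paired with them
    b, a = np.polyfit(xs, ys, 1)  # OLS slope and intercept
    f_model = a + b * x[t]        # regression forecast for month t+1
    f_mean = r[:t + 1].mean()     # prevailing historical-mean forecast
    model_sqerr.append((r[t + 1] - f_model) ** 2)
    mean_sqerr.append((r[t + 1] - f_mean) ** 2)

msfe_model = float(np.mean(model_sqerr))
msfe_mean = float(np.mean(mean_sqerr))
# Out-of-sample R^2 relative to the historical mean; positive values
# mean the model produced smaller squared errors than the mean benchmark
oos_r2 = 1.0 - msfe_model / msfe_mean
```

A single model's out-of-sample R^2 computed this way is exactly the kind of statistic that looks impressive in isolation but must survive a multiple-testing adjustment when 100 such models are screened.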

Why is data snooping a concern in our analysis? Suppose these 100 models are mutually independent, and we apply a t-test to each model at the 5% significance level. The probability of falsely rejecting at least one correct null hypothesis is 1 − (1 − 0.05)^100 ≈ 0.994. Therefore, it is very likely that an individual test will incorrectly identify an inferior model as significant. This simple example emphasizes the importance of an appropriate method that can control such data-snooping bias and avoid spurious inference when many models are examined together.
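The arithmetic in this example can be checked directly, along with the simplest multiple-testing fix, a Bonferroni correction (shown here for illustration; the study itself uses a formal multiple testing framework):

```python
# Family-wise error rate: probability of at least one false rejection when
# testing m independent true null hypotheses, each at significance level alpha
alpha, m = 0.05, 100
fwer = 1 - (1 - alpha) ** m
print(round(fwer, 3))  # 0.994

# The Bonferroni correction tests each model at alpha/m instead,
# which caps the family-wise error rate at (approximately) alpha
fwer_bonf = 1 - (1 - alpha / m) ** m
print(fwer_bonf <= alpha)  # True
```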

Our results show that many forecasting strategies outperform the historical mean when tested individually. However, once we control for data snooping, we find that no forecasting strategy can outperform the historical mean in terms of mean squared forecast errors. With respect to return-based performance measures, we find only marginal evidence of statistically significant economic gains, at least on a risk-adjusted excess-return basis, when using the equity premium forecasts in a traditional mean-variance asset allocation, even after controlling for data snooping bias. In contrast, the benefits for a pure market timing investor are limited. Taken together, our findings strengthen the conclusion of Goyal and Welch (2008) that the out-of-sample predictability of the equity premium is questionable.
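The mean-variance asset allocation used in this return-based evaluation follows a standard textbook rule: each month the investor puts weight w = (1/γ) · (forecast / variance estimate) in equities. The sketch below uses purely illustrative numbers for the risk-aversion coefficient, the forecast, and the variance estimate; none are values from the study.

```python
# Mean-variance weight on equities for an investor with risk aversion gamma:
# w_t = (1 / gamma) * (equity premium forecast / variance estimate)
gamma = 3.0            # illustrative risk-aversion coefficient
forecast = 0.005       # hypothetical equity premium forecast for next month
var_est = 0.04 ** 2    # hypothetical variance estimate from recent returns
w = (1.0 / gamma) * forecast / var_est

# Weights are commonly truncated, e.g., to [0, 1.5], to rule out
# extreme short or leveraged positions
w = min(max(w, 0.0), 1.5)
```

The resulting portfolio return each month is w times the realized equity premium plus the risk-free rate, and it is this return series (net of transaction costs) that the risk-adjusted performance measures are computed on.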