We study the performance of a comprehensive set of equity premium forecasting strategies that have been shown to outperform the historical mean out-of-sample when tested in isolation. Using a multiple testing framework, we find that previous evidence on out-of-sample predictability is primarily due to data snooping. We are not able to identify any forecasting strategy that produces robust and statistically significant economic gains after controlling for data snooping biases and transaction costs. By focusing on the application of equity premium prediction, our findings support Harvey’s (2017) more general concern that many of the published results in financial economics will fail to hold up.

One challenge in answering the question of out-of-sample predictability is that almost all forecasting strategies are tested on a single data set. When many models are evaluated individually, some are bound to show superior performance by chance alone, even though they are not genuinely superior. This bias in statistical inference is usually referred to as ‘data snooping’. Without properly adjusting for this bias in a multiple testing set-up, we might commit a type I error, i.e., falsely assess a forecasting strategy as being superior when it is not. In fact, Harvey, Liu, and Zhu (2016) note that equity premium prediction offers an ideal setting to employ multiple testing methods.

To the best of our knowledge, our study is the first to jointly examine the out-of-sample performance of a comprehensive set of equity premium forecasting strategies relative to the historical mean, while accounting for the data snooping bias. We construct a comprehensive set of 100 forecasting strategies that are based on both univariate predictive regressions and advanced forecasting models, including strategies that adopt diffusion indices or combination forecast approaches, apply economic restrictions on the forecasts, predict disaggregated stock market returns, or model economic regime shifts.

We use these forecasting strategies to predict the monthly U.S. equity premium out-of-sample based on the most recent 180 months and track their out-of-sample performance for the subsequent month over the evaluation period from January 1966 to December 2015. We aim to answer Spiegel’s (2008) question, i.e., whether there are forecasting strategies that provide a significantly higher performance than the prevailing mean model. As performance measures, we use the mean squared forecast error and absolute as well as risk-adjusted excess returns.
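The rolling out-of-sample design described above can be sketched as follows. This is a minimal illustration using synthetic monthly returns in place of the actual U.S. equity premium data; the window length of 180 months is taken from the text, everything else is illustrative.

```python
import numpy as np

# Synthetic monthly "equity premium" series standing in for real data.
rng = np.random.default_rng(0)
returns = rng.normal(0.005, 0.04, size=600)  # 50 years of synthetic months

window = 180  # estimation window in months, as in the text
# Prevailing-mean benchmark: forecast next month with the trailing average.
forecasts = np.array([returns[t - window:t].mean()
                      for t in range(window, len(returns))])
actuals = returns[window:]

# Mean squared forecast error of the historical-mean benchmark.
msfe = np.mean((actuals - forecasts) ** 2)
print(len(forecasts), msfe)
```

A competing forecasting strategy would be evaluated the same way, and its MSFE compared against this benchmark.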

Why is data snooping a concern in our analysis? Suppose these 100 models are mutually independent, and we apply a t-test to each model at the 5% significance level. The probability of falsely rejecting at least one correct null hypothesis is 1 − (1 − 0.05)^100 ≈ 0.994. Therefore, it is very likely that an individual test will incorrectly suggest an inferior model to be a significant one. This simple example emphasizes the importance of an appropriate method that controls such data-snooping bias and avoids spurious inference when many models are examined together.
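The family-wise error rate in this example is simple arithmetic and can be verified directly (the function name is illustrative):

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false rejection among n_tests
    mutually independent tests, each run at significance level alpha."""
    return 1 - (1 - alpha) ** n_tests

print(familywise_error_rate(1))    # 0.05 for a single test
print(familywise_error_rate(100))  # ~0.994 for 100 independent tests
```

This is exactly why a multiple-testing correction is needed before declaring any single strategy superior.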

Our results show that many forecasting strategies outperform the historical mean when tested individually. However, once we control for data snooping, we find that no forecasting strategy can outperform the historical mean in terms of mean squared forecast errors. With respect to return-based performance measures, we find marginal evidence for statistically significant economic gains, at least on a risk-adjusted excess-return basis, when using the equity premium forecasts in a traditional mean-variance asset allocation, even after controlling for the data snooping bias. In contrast, the benefits for a pure market timing investor are limited. Taken together, our findings strengthen the results of Goyal and Welch (2008) that the out-of-sample predictability of the equity premium is questionable.

We at Quantpedia are pleased to invite you to our new webinar, Classification of Quantitative Trading Strategies, prepared in cooperation with our friends from QuantInsti. The webinar is scheduled for Tuesday, 11th July, 9:30 AM EST | 7:00 PM IST | 9:30 PM SGT, and will cover a range of topics related to the applicability of academic financial research in real trading.

Session Outline

- Introduction to ‘Quantpedia & QuantInsti™’
- Overview of research in the field of quantitative trading
- Taxonomy of quantitative trading strategies
- Where to look for unique alpha
- Examples of lesser-known trading strategies
- Common issues in quant research
- Questions and Answers
Register now!

Real estate investment trusts (REITs) are often considered to be a distinct asset class. But, do REITs deserve this designation? While exact definitions for asset class may vary, a number of statistical methods can provide strong evidence either for or against the suitability of the designation. The authors step back from the established real estate and REITs literature and answer this broader question. Beginning with a set of asset class criteria, the authors then utilize a variety of statistical methods from the literature and factor-based asset pricing to evaluate REITs for their candidacy as a distinct asset class. REITs fail to satisfy almost all of the relevant criteria which leads the authors to conclude that REITs, in fact, are not a distinct asset class but do deserve a market capitalization weighted allocation in a diversified investment portfolio.

Notable quotations from the academic research paper:

"Many investors think of real estate investment trusts (REITs) as a distinct asset class because, in aggregate, they have historically had relatively low correlation with both stock and bond markets. However, this is a far too simplistic definition for what defines a distinct asset class. Many individual stocks have low correlation with the overall stock and bond markets, yet no one would (hopefully) consider a single stock, or a small handful of stocks, to be an asset class. For individual equities, a better definition would be a well diversified portfolio of securities which has historically demonstrated statistically significant excess return relative to what is explained by a generally accepted factor model like the Carhart [1997] four-factor model. For example, early research on the size and value premiums argued that these two types of equity securities are distinct equity asset classes because their excess returns are not fully accounted for by CAPM.

On a relative basis, public REIT equities are a young investment vehicle. The REIT Act of 1960 allowed the creation of REITs and, accordingly, the ability for investors to gain access to diversified real estate portfolios. The first REIT was formed shortly thereafter and the first public REIT debuted in 1965. Early research into public real estate investment, such as Webb and Rubens [1987], tends to use appraisal-based individual property data and suggests that real estate provides diversification benefits for traditional stock and bond portfolios. Following the growth of the industry and accumulation of sufficient returns histories, REIT indexes debuted. Subsequent studies often used REIT indexes, tending to confirm earlier findings concerning diversification benefits and suggesting sizable portfolio allocations.

We establish a pragmatic list of criteria for consideration as an asset class and then use an array of techniques to evaluate REITs as such. While REITs do indeed exhibit relatively low correlation with traditional equity and fixed income, a deeper dive into their returns reveals shortfalls in their qualifications for asset class distinction. Four- and six-factor regression analyses reveal no statistically reliable alpha generation in REIT returns, and coefficient estimates point to REITs being well explained by traditional risk factors. Taking direction from the regression results and attempting a long-only replication of REIT returns with small-value equities and long-term corporate bonds produces a portfolio that comoves well with REIT returns and exhibits historically superior return and risk characteristics. Utilizing tests of mean-variance spanning, we also examine the diversification properties of REITs on a statistically inferred basis. These tests suggest that REITs do not reliably improve the mean-variance frontier when added to a benchmark portfolio of traditional stocks and bonds. These results, and the associated failure to satisfy our asset class criteria, lead us to conclude that REITs are not a distinct asset class.
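The four-factor alpha test mentioned above amounts to an OLS regression of excess returns on factor returns, with the intercept interpreted as alpha. A minimal sketch on synthetic data (factor names and all numbers are illustrative; the actual study uses published factor series):

```python
import numpy as np

# Synthetic factor returns standing in for MKT, SMB, HML, and MOM.
rng = np.random.default_rng(1)
T = 240
factors = rng.normal(0.0, 0.03, size=(T, 4))
true_betas = np.array([0.9, 0.4, 0.3, -0.1])
true_alpha = 0.0  # returns fully explained by the factors, no alpha
excess_ret = true_alpha + factors @ true_betas + rng.normal(0, 0.01, T)

# OLS with an intercept column: the intercept estimate is the alpha.
X = np.column_stack([np.ones(T), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]
print(alpha_hat, betas_hat)
```

When, as the authors report for REITs, the estimated alpha is statistically indistinguishable from zero, the asset's returns are considered well explained by the existing factors.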

In conclusion, we want to make clear that we are not suggesting that REITs deserve no allocation in an investment portfolio. Nor are we suggesting that any results previously brought forth in the literature are spurious or incorrect. The results of this study lead us only to suggest that REITs, as an equity security with only marginal diversification benefits, should not receive a weighting in investor portfolios that significantly deviates from market capitalization based weights. The Dow Jones U.S. Select REIT Index represents a non-trivial approximately 2.5 percent of the Russell 3000 Index, as of early 2017, on a market capitalization basis, which we would argue is a valid starting point for a REITs allocation in a diversified portfolio."

Factor investing has experienced a resurgence in popularity under the moniker “smart beta.” Several traditional factors, such as value, size, momentum, and low beta, are well defined and have been heavily researched in academia as return anomalies for many decades. These factors have also been exploited by practitioners as quantitative strategies for enhancing returns. Today, these factors each define a distinct smart beta category (think of style boxes for smart beta strategies) and are the foundational building blocks for the now-ubiquitous multi-factor products.

Notable quotations from the academic research paper:

"The recently popularized quality factor, however, appears to stand alone in many regards. Like the four previously named factors (value, size, momentum, and low beta), quality investing has been widely practiced as an investment strategy by portfolio managers. MSCI, FTSE Russell, S&P, EDHEC, and Deutsche Bank, among others, have created quality factor indices for licensing and have generally included quality as a part of their multi-factor offerings. But, unlike the conventional factors, quality as a source of return has attracted limited academic attention, focused on only some facets of what practitioners categorize as quality. In a way, quality is a product waiting for academic validation, and the early results appear to be more inconclusive than its massive popularity might warrant.

In routine product conversations with investors, the quality factor is pitched by providers as an independent source of return and as providing diversification due to its supposedly low correlation with the value factor. What remains uncomfortable for researchers, however, is that the quality factor is constructed very differently from other factors. Factors, such as value or low beta, are created from a particular stock characteristic (or a set of highly related stock characteristics) to capture a risk premium associated with an undiversifiable economic risk or to capture an anomalous return associated with a persistent investor behavioral bias. For example, the value factor is generally constructed from stocks that have high book-to-price, high earnings-to-price, high dividend-to-price, or some combination of these valuation measures. Regardless of the chosen definition for factor construction, the resulting portfolio looks and feels like a value portfolio in that it owns low valuation stocks.

In contrast, quality factor portfolios, as constructed by the different providers, have been entirely multi-signal in nature. Providers tag a stock as high quality if it scores high on some combination of the following attributes: earnings growth, earnings-growth stability, low return-volatility, high profitability, high return on assets (ROA), low debt ratio, and low accounting accruals. We begin our study by examining definitions of quality implemented in different product offerings. We show that quality, as executed by practitioners, is a collection of heterogeneous signals having little correlation with each other. Quality, as currently defined, would seem to be a catch-all bucket for those portfolios that blend many otherwise independent return factors.
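The multi-signal construction described above can be sketched as a composite of standardized signals. This is a hypothetical illustration with random, mutually independent signals (the signal names are illustrative, not taken from any specific index methodology); it also shows the "little correlation with each other" point numerically.

```python
import numpy as np

# Four illustrative quality signals for a cross-section of 500 stocks.
rng = np.random.default_rng(3)
n_stocks = 500
signals = {
    "profitability": rng.normal(size=n_stocks),
    "earnings_stability": rng.normal(size=n_stocks),
    "low_leverage": rng.normal(size=n_stocks),
    "low_accruals": rng.normal(size=n_stocks),
}

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

# Composite quality score: equal-weighted average of z-scored signals.
quality = np.mean([zscore(v) for v in signals.values()], axis=0)
top_decile = np.argsort(quality)[-n_stocks // 10:]  # "high quality" bucket

# Pairwise correlations of the underlying signals are near zero here,
# mirroring the heterogeneity the authors document across real signals.
corr = np.corrcoef(list(signals.values()))
```

With uncorrelated signals, the composite blends several effectively independent sorts, which is precisely why the authors view quality as a multi-signal strategy rather than a single factor.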

These stock screens appear to favor heterogeneous groups of stocks and produce portfolios with low correlation to one another. The stocks appear to be selected for their diversity, and the multiple signals used in constructing the quality portfolios do not appear to be proxies for a single specific risk exposure or behavioral anomaly. Thus, a quality portfolio can seem more like a quantitative strategy based on multiple signals than a factor in the heritage of the arbitrage pricing theory (APT) framework. This unique feature plays an important role in how we analyze quality products versus how we examine other more conventional factors.

Because quality is being defined as it is—a collection of heterogeneous signals—a spectrum of possible portfolio outcomes exists. In the most positive case, the resulting quality portfolio has impressive out-of-sample performance if each of the signals or characteristics included in the construction of the portfolio represents a unique source of premium, whether risk based or behavioral based. In this case the resulting quality portfolio would be a multi-factor portfolio offering a diversified basket of excess returns.

In the worst case, the multi-signal portfolio will have out-of-sample performance indistinguishable from zero despite its impressive back-tested t-stats. Why could t-stats be misleading in this case? The large pool of multiple signals to select from creates opportunities for intentional or unintentional data mining that overstates the t-stats; this issue has been emphasized by Harvey, Liu, and Zhu (2015). Further, from Novy-Marx (2016) we know that combining several uncorrelated factors selected because of their spurious in-sample performance further overstates the t-stat. For example, if an ex ante random strategy has an in-sample t-stat larger than 2 with only a meager 5% probability, then a mix of three such strategies selected ex post for best performance out of 20 strategy realizations will register a t-stat above 2 with a probability of nearly 98%. In reality, this portfolio will offer investors nothing more than noise and unwarranted fees and expenses, suggesting that the process of analyzing and contrasting different quality portfolio methods is both more difficult and more important.
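The selection-bias mechanism in this example is easy to reproduce in a small Monte Carlo. The exact probabilities depend on the specific simulation design used in the cited work; this sketch (all parameters illustrative) only shows the direction of the effect: mixing the ex post best strategies inflates the t-stat far above what any single pre-specified strategy achieves.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n_strategies, n_trials = 120, 20, 2000

def tstat(x):
    return x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))

hits_single = hits_mix = 0
for _ in range(n_trials):
    # 20 pure-noise "strategies": monthly returns with zero true mean.
    returns = rng.normal(0.0, 1.0, size=(n_strategies, T))
    ts = np.array([tstat(r) for r in returns])
    hits_single += ts[0] > 2                    # one pre-specified strategy
    best3 = returns[np.argsort(ts)[-3:]]        # ex post best three
    hits_mix += tstat(best3.mean(axis=0)) > 2   # equal-weight mix of them

print(hits_single / n_trials)  # small, near the nominal test size
print(hits_mix / n_trials)     # far larger, from selection bias alone
```

Every strategy here is pure noise, so any "significance" of the mix is entirely an artifact of picking winners in-sample.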

Given the observation that quality is a collection of heterogeneous signals, we examine where the current quality portfolios are on the spectrum of robustness: from being a collection of robust anomalies with a high chance of outperformance, at one end, to being a collection of signals selected due to their spurious in-sample performance with little chance of outperformance out of sample, at the other end. We use the Hsu, Kalesnik, and Viswanathan (2015) method to identify robustness of variables used in quality definitions."

We examine whether and to what extent successful equities investment strategies are transferrable to the commodities futures market. We investigate a total of seven investment strategies that involve optimization and mean-variance timing techniques. To account for the unique characteristics of the commodity futures market, we propose a novel method of classification based on momentum or term structure properties in the formation of long-short portfolios in conjunction with the quantitative strategies from the equities literature. Our strategies generate significant excess returns and risk-adjusted performance as measured by the Sharpe and Sortino ratios and the maximum drawdown. We find no significant correlation between the strategies’ excess returns and common risk factors. There is no evidence that excess returns are a compensation for liquidity risk. The strategies are robust to transaction costs and choice of model parameters and exhibit stable performance across various market environments including times of financial crises.

Notable quotations from the academic research paper:

"There are theoretical and empirical reasons to believe that commodity futures investments command positive risk premia. The theoretical considerations relate either to the theory of storage where the risk premium depends on inventory levels, and thus on the slope of the forward curve, or to the hedging pressure where the risk premium is a function of hedgers’ and speculators’ net positions. These theories have been empirically validated in numerous empirical studies, all of which highlight that futures returns depend on the fundamentals of backwardation and contango.

In all of the empirical studies, the scheme employed to weight commodities within a portfolio is equal-weighting. The rationale for this choice comes from the fact that unlike in equity markets, there is no natural value-weighting that can easily be applied; its equivalent (production and consumption weighting) is hard to implement given how difficult it is to collect reliable inventory data. This paper proposes to relax the assumption of equal weights and to test whether weighting schemes emanating from the equity literature could be more profitable. These weighting schemes pertain to mean-variance optimization and volatility timing.

Our first contribution is to apply the weighting schemes of the equity literature to the commodity markets. In total, eight weighting schemes are considered; these can be split into an equal-weighting scheme, two optimization schemes, and five timing schemes. The equally weighted scheme is standard in the commodity pricing literature. The optimization strategies follow Markowitz (1952) and are based on mean-variance (MV) and minimum variance (MIN). When it comes to the timing strategies, three approaches follow Kirby and Ostdiek (2012) and define portfolio weights based on volatility timing (VT), beta timing (BT), and reward-to-risk timing (RRT); the other two are novel and based on tail risk as modeled via Value-at-Risk (VaR) and conditional Value-at-Risk (CVaR).

Our second contribution is to amend the aforementioned weighting algorithms so as to consider the specificities of backwardation and contango that prevail in commodity futures markets. We do that by adding a novel step to the weighting procedure that considers either past performance or the slope of the term structure of commodity futures prices as a buy or sell signal for each of the commodities present in the cross section at the time of portfolio formation. This method allows for the possibility of being either long or short while mitigating common issues such as extreme weights or artificial inflation of returns due to the self-financing nature of long-short positions.
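The extra long/short signal step described above can be sketched by signing each asset's weight with its past performance. This is a hypothetical illustration (lookback length, data, and the momentum proxy are assumptions; the paper also allows a term-structure slope as the signal):

```python
import numpy as np

rng = np.random.default_rng(4)
rets = rng.normal(0.0, 0.05, size=(60, 4))  # 4 synthetic commodities, 60 months

momentum = rets[-12:].sum(axis=0)           # past 12-month cumulative return
inv_var = 1.0 / np.var(rets, axis=0, ddof=1)
weights = inv_var / inv_var.sum()           # volatility-timing base weights
signed = np.sign(momentum) * weights        # long past winners, short losers

print(np.abs(signed).sum())                 # gross exposure stays at one
```

Keeping the magnitudes from the base weighting scheme and only flipping signs avoids the extreme weights and the artificial return inflation from self-financing long-short positions that the authors mention.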

Our findings indicate that the optimization-based and timing strategies perform well, as measured by a range of risk-adjusted return metrics (Sharpe and Sortino ratios, maximum drawdown, and VaR and CVaR). We find no evidence that returns from the investment strategies are compensation for liquidity risk. Our investment strategies are robust to transaction costs and the choice of model parameters. We show that two alternative classification methods, which nominate commodity futures for long or short positions based on momentum and the term structure, exhibit similarly strong performance. We also show that our strategies are stable in various market conditions, from crisis to high-growth periods. We document that these strategies are among the most profitable in the literature and show that common risk factors in the commodity futures market are unable to account for their positive and significant excess returns."