Ugly vs. Pretty Value Stocks

As Warren Buffett rightly says, value and growth are joined at the hip.[1] It seems like a perfectly sensible strategy to pay more for high-quality businesses than for low-quality deep value stocks. And it is… in theory. In practice, it is extraordinarily difficult to find the right trade-off. There has been some systematic research into improving value strategies by adding a quality component. Quality, here, means anything one should be willing to pay for (e.g., ROE, ROIC, growth, profitability, etc.). And some of these studies show very counter-intuitive results.

A prominent example of a strategy that supplements a pure value ranking with a quality measure is Joel Greenblatt’s Magic Formula. In their outstanding book “Quantitative Value”, Wesley Gray and Tobias Carlisle show that the quality component actually decreases performance relative to a portfolio based on the value ranking alone.[2] The likely reason is the mean-reverting nature of return on capital, the quality measure used. Economic theory predicts increased competition when companies demonstrate high returns on capital, and exits of competitors when returns are poor. The new competition decreases returns for all suppliers; exits of competitors increase returns for the remaining businesses. Betting on businesses with historically high returns therefore seems like a bad idea on average.

As ambitious bargain hunters, we try to find high-quality businesses at low prices. Studies, however, show that (in competitive markets) valuation is far more important than quality. And in some cases, due to naïve extrapolation by noise traders, quality is actually associated with lower returns. Lakonishok, Shleifer and Vishny (1994) study exactly that. Their results fly in the face of the many investors working hard to find the ‘best’ value stocks. Lakonishok et al. construct value portfolios based not only on current valuation ratios but also on past growth. They define the contrarian value portfolio as having a high book-to-market ratio (B/M), the inverse of P/B, and low past sales growth (GS). The reasoning is that, by Lakonishok et al.’s definition, value strategies exploit other investors’ failure to factor reversion to the mean into their forecasts. This is a form of base rate neglect, a tendency in intuitive decision-making found by Kahneman and Tversky (1982).[3] Lakonishok et al. thus identify stocks whose low expected future growth (valuation ratio) and low past growth (GS) indicate naïve extrapolation of poor performance. They show that this definition of value performs better than a simple definition based only on a valuation ratio (e.g., B/M). Another way of looking at this is to subdivide the high-B/M portfolio further into high and low past growth. The low past growth stocks outperform the high past growth stocks by about 4% p.a. (21.2% vs. 16.8% p.a.) while the B/M ratios of these sub-portfolios “are not very different.” [4]
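As an illustration, the double sort could be sketched as follows. This is a minimal toy sketch with made-up tickers and numbers; Lakonishok et al. actually sort the full universe into groups, not a six-stock universe:

```python
# Illustrative two-way sort in the spirit of Lakonishok, Shleifer and Vishny (1994).
# All data is hypothetical: ticker -> (book_to_market, past_sales_growth).
stocks = {
    "A": (1.8, -0.05),
    "B": (1.6, 0.30),
    "C": (0.3, 0.25),
    "D": (1.5, -0.10),
    "E": (0.4, -0.02),
    "F": (1.7, 0.02),
}

# Step 1: keep the cheap half (high B/M, i.e., low expected future growth).
by_bm = sorted(stocks, key=lambda t: stocks[t][0], reverse=True)
cheap = by_bm[: len(by_bm) // 2]

# Step 2: within the cheap half, keep low past sales growth (GS),
# i.e., stocks whose poor past performance invites naive extrapolation.
by_gs = sorted(cheap, key=lambda t: stocks[t][1])
contrarian = by_gs[: len(by_gs) // 2]  # high B/M AND low GS

print(contrarian)  # ['A']
```

The point of the second sort is that it selects, among equally cheap stocks, the ones other investors are most likely to have written off.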

In his excellent book “Deep Value”, Tobias Carlisle shows insightful statistics for these portfolios. The incredible insight is that even if valuation ratios are practically the same, stocks that rank low on quality (past sales growth) perform better than high-quality stocks. One likely reason is mean reversion in fundamentals.

Similar results also show up in the deepest of value strategies: net-nets. Oppenheimer (1986) shows that loss-making net-nets outperformed profitable net-nets (36.2% p.a. vs. 33.1%), and non-dividend-paying net-nets outperformed dividend-paying net-nets (40.6% vs. 27.0%) from 1970 to 1983. Carlisle confirms these results out of sample from 1983 to 2010.[5] My own backtests confirm them from 1999 to 2015. My results, at least, are mainly driven by the larger discount, however: profitable businesses don’t usually trade at large discounts to NCAV.

Whether you are a full quant or not, if you are trying to pick the ‘best’ stocks from a value screen you are likely making a systematic mistake — unless you are searching for businesses with moats (i.e., a sustainable competitive advantage that prevents a high return on capital from reverting to the mean). But good luck finding such a business in deep value territory consistently.

Regression to the mean is such a strong tendency, and is so systematically underestimated by market participants, that just betting on historically poorly performing businesses outperforms the market.[6] Bannister (2013) finds that betting on “unexcellent” companies (ranking low on growth, return on capital, and profitability) outperformed the market from 1972 to 2013 (13.74% p.a. vs. 10.59%). A portfolio constructed of stocks of “excellent” businesses, in turn, underperformed the market (9.77% p.a.).[7]

I still think good quality measures (i.e., measures that do not implicitly bet against regression to the mean in fundamentals) are a potent tool for improving a value ranking. It is, however, not as easy as blindly layering a quality screen over a value screen, thereby implying equal weights. A category of quality measures of special interest to me is distress/bankruptcy prediction. But even if such a measure is very good at identifying value traps, there is still the very serious issue of false positives, that is, excluding stocks that actually perform well on average. An overly sensitive measure will likely exclude exactly the ugliest stocks, which perform best. More research is needed to determine a sensible weighting mechanism. The merit of such a measure depends on the false negative rate, the false positive rate, the cost of false negatives, and, importantly, the cost of false positives. The cost of false positives may be very high for concentrated portfolios. Even if in studies the quality measure can improve performance, that doesn’t mean that it will improve a concentrated value portfolio (20-30 stocks). The reason is that these studies often hold a very diversified portfolio (e.g., a decile). This is quite a number of stocks. If the quality factor excludes 20 extremely cheap stocks, it’s not a big deal. If you were to hold the 30 cheapest stocks in the universe, however, and the quality factor excludes 20 of them and the next cheapest stocks have 2 times the valuation ratio, it is very likely that the performance will suffer. It will dilute the value factor too much. The important thing is to actually backtest your portfolio and not just rely on studies.
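To make the dilution concrete, here is a toy calculation using the numbers from above: the 30 cheapest stocks, a quality screen that excludes 20 of them, and replacements trading at twice the valuation ratio. All ratios are hypothetical:

```python
# Toy illustration of how a quality screen can dilute a concentrated value portfolio.
# Valuation ratios (e.g., EV/EBIT) are hypothetical.

cheapest_30 = [5.0] * 30    # the 30 cheapest stocks all trade at 5x
replacements = [10.0] * 20  # the next-cheapest candidates trade at 2x that ratio

# The quality screen excludes 20 of the 30 cheapest; we refill with replacements.
screened = cheapest_30[:10] + replacements

avg_before = sum(cheapest_30) / len(cheapest_30)
avg_after = sum(screened) / len(screened)

print(avg_before)  # 5.0
print(avg_after)   # ~8.33 -> the portfolio is now two-thirds more expensive
```

A decile portfolio of hundreds of stocks barely notices losing 20 names; a 30-stock portfolio sees its average cheapness jump materially, which is exactly the dilution of the value factor described above.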

Another interesting area of research lies in identifying moats that prevent mean reversion in high-return businesses. That, however, still leaves open the question of whether these businesses are systematically undervalued.

I thought one of the topics was more relevant to your excellent article above; specifically, I wrote and you kindly responded as follows:

[Me] Agree; loss-making net nets per Carlisle’s extension of Oppenheimer’s original research show this. However, it also shows that the quintile of net nets with the largest discount to NCAV produced the lowest returns – my inference is that the discount to NCAV needs to be married with some ‘quality’ attributes.

[You] I don’t know for sure, but the underperformance of the cheapest quintile might be due to data issues. Carlisle, I believe, used Compustat data, which is the highest quality fundamental data I’m aware of. But even this database is not perfect. If, for example, the company liquidated major assets and already returned the capital via special dividends to the shareholders, but the current financial statements do not reflect this return of capital yet, stocks can look extremely cheap on the surface. Quite often my screen picks up a stock trading at 5% of NCAV or something. That tells me immediately that there is probably a data issue. I then have to investigate a little further. Hence, if you want to trade 100% automated, which I can’t recommend because the data just isn’t good enough for these micro caps, then it might be a good idea to screen out everything below 20% of NCAV, for example. This will probably screen out most of the junk. Another cause might be just chance. There aren’t that many net-nets available. Thus, a quintile of such a universe is a very small sample. Excluding the cheapest quintile just based on this study might be overfitting.

I guess the sample size is small, but the other 4 quintiles were consistent with the results of the original study. The fact that the cheapest quintile produced the worst results, and that this could be due to bad data, is alarming to say the least!

“If, for example, the company liquidated major assets and already returned the capital via special dividends to the shareholders, but the current financial statements do not reflect this return of capital yet, stocks can look extremely cheap on the surface. Quite often my screen picks up a stock trading at 5% of NCAV or something. That tells me immediately that there is probably a data issue.”

Awesome insight – thank you.

If it is due to error, I would think that error may also have occurred in the backtests of the Acquirer’s Multiple in Deep Value. After all, the results in Deep Value, i.e. that the cheapest of the cheap produce greater returns than cheap + quality (to my understanding, having read both books), would then be empirically false. How do you understand it?

“Even if in studies the quality measure can improve performance, that doesn’t mean that it will improve a concentrated value portfolio (20-30 stocks). The reason is that these studies often hold a very diversified portfolio (e.g., a decile). This is quite a number of stocks. If the quality factor excludes 20 extremely cheap stocks, it’s not a big deal. If you were to hold the 30 cheapest stocks in the universe, however, and the quality factor excludes 20 of them and the next cheapest stocks have 2 times the valuation ratio, it is very likely that the performance will suffer. It will dilute the value factor too much. The important thing is to actually backtest your portfolio and not just rely on studies.”

This is such an interesting analysis, thank you.

Given all the great content in QV, do you think applying that process to net nets, as an individual, would work? One would need to ‘dumb it down’, as a retail investor would not have access to the resources used in QV. But if one only looked at firms trading at a discount of at least 50% to NCAV, then filtered out Chinese firms etc., then used the M-Score and Z-Score, and then applied some simpler quality measures related to those shown to work in QV, do you think the results would be close to optimal given the limited resources available to an individual investor?

Thanks for shifting the conversation to the appropriate comment section! This helps enormously in keeping the blog well organized. I appreciate that very much indeed!

Let me first expand on my previous comment that the underperformance of the cheapest quintile might be due to data issues. I said, “If, for example, the company liquidated major assets and already returned the capital via special dividends to the shareholders, but the current financial statements do not reflect this return of capital yet, stocks can look extremely cheap on the surface.” What I mean is the following: Imagine a company with $10 mil. in assets (all current) and $5 mil. in liabilities as of December 31, 2015. Let’s say the management decided in January of 2016 to liquidate a major asset for $9 mil. The company pays back all liabilities and distributes the remaining $4 mil. to the shareholders via a special dividend. Let’s assume the company traded at $5 mil. before the distribution. Ex-dividend, the company now trades at $1 mil. To a simple quantitative screen that doesn’t account for the distribution, it looks like the company is trading at 20% of NCAV. But this is, of course, false. The asset is gone. Only the subsequent financial statements reflect this fact, though. This isn’t really bad data. The financial statements are accurate – they just don’t reflect events after the filing date. It can be thought of as a misspecification of the model: there is an omitted variable, which in this particular example would be distributions after December 31, 2015. But like I said, I don’t know if this is the actual reason for the underperformance. This is just an example. These events are quite rare, but they accumulate at the bottom.
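In code, the example works out like this (a minimal sketch of the figures above):

```python
# The stale-statement example from above, in numbers (all figures in $ millions).

# As of the December 31, 2015 statements:
current_assets = 10.0
liabilities = 5.0
ncav = current_assets - liabilities  # net current asset value = 5.0

# January 2016: a major asset is sold for 9, liabilities are repaid,
# and the remaining 4 is distributed as a special dividend.
market_cap_before = 5.0
special_dividend = 9.0 - liabilities                      # 4.0
market_cap_ex_div = market_cap_before - special_dividend  # 1.0

# A screen still working off the year-end statements sees:
apparent_price_to_ncav = market_cap_ex_div / ncav
print(apparent_price_to_ncav)  # 0.2 -> "20% of NCAV", but the assets are gone
```

The screen's denominator (NCAV from the last filing) is stale while the numerator (market cap) is current, which is exactly the mismatch that makes the stock look spuriously cheap.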

I don’t think the conclusion that contrarian value stocks outperform high-growth stocks is false. The problem described above would impact the contrarian portfolio negatively. Hence, the performance of the contrarian portfolio would actually be even better if one accounted for this error. However, these situations are too rare to have a meaningful impact on the returns of a portfolio as diversified as the one constructed by Lakonishok, Shleifer, and Vishny (1994).

Quality measures that don’t let you pay up for qualities known to be mean reverting (e.g., growth, ROE) are a great tool for improving a value ranking. Ranking first on cheapness and then on financial stability, fraud risk, etc. seems like a good idea. It’s going to be hard to construct a portfolio of such high-quality net-nets, however, because there just aren’t that many left (at least in the US). You really don’t have the luxury of excluding distressed, unprofitable and scary net-nets, because almost all of them are. And if they look like perfectly fine businesses to you, you are very likely overlooking something important. You are then left with stocks that look superficially healthy but have an intrinsic value of zero due to fraud, lawsuits, etc. Always remember that net-nets work because they look like such a bad idea to put your money in.

In “Graham’s Net-Nets: Outdated or Outstanding?”, James Montier states that net-nets experience a total loss (a 90%+ decline) roughly 5% of the time. (I think you referred to this statistic in a comment of yours.) This makes for some interesting (i.e., counter-intuitive) statistics using Bayes’ theorem. Let’s assume there is a score that can predict a 90%+ decline with 80% accuracy (the true positive rate). Let’s also assume that the test has a false positive rate of 10%. What is the probability that a flagged stock will decline 90%+? And what is the probability that a flagged stock will not decline 90%+?

The probability that a stock will be flagged by that measure is

P(positive) = 0.8 × 0.05 + 0.1 × 0.95 = 0.135

The probability that the stock will decline by 90%+ conditional on the fact that it is flagged is

P(loser|positive) = (0.8 × 0.05) / 0.135 ≈ 0.30

The probability that the stock will not decline 90%+ conditional on the fact that it is flagged is

P(not loser|positive) = (0.1 × 0.95) / 0.135 ≈ 0.70

That means a stock that this score identifies as a loser is more than twice as likely not to be a loser as it is to actually be one. As there are very few net-nets available, acting on such a flag will very likely worsen your returns. (Here, in contrast to Warren Buffett’s famous saying, there are called strikes in investing.)

Good quality measures are much more useful in EBIT/EV-based deep value algos, I think. There you have many more opportunities, and false positives aren’t as costly.
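The arithmetic above is easy to verify in a few lines (the 5% base rate is Montier’s statistic; the 80% true positive and 10% false positive rates are the assumed ones):

```python
# Bayes' theorem check for the loser-prediction score discussed above.
p_loser = 0.05                 # base rate: ~5% of net-nets suffer a 90%+ decline
p_flag_given_loser = 0.80      # assumed true positive rate of the score
p_flag_given_not_loser = 0.10  # assumed false positive rate of the score

# Total probability that a stock gets flagged.
p_flag = p_flag_given_loser * p_loser + p_flag_given_not_loser * (1 - p_loser)

# Posterior probabilities, conditional on being flagged.
p_loser_given_flag = p_flag_given_loser * p_loser / p_flag
p_not_loser_given_flag = p_flag_given_not_loser * (1 - p_loser) / p_flag

print(round(p_flag, 3))                  # 0.135
print(round(p_loser_given_flag, 2))      # 0.3
print(round(p_not_loser_given_flag, 2))  # 0.7
```

The low base rate does the damage: even a fairly accurate score produces far more false alarms than true hits when total losses are rare.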

To improve the results of value stocks you could select these stocks based on the following additional criteria:

Asset growth:
Companies with high asset growth tend to perform worse than companies with shrinking balance sheets.
This effect is not often documented in the value literature, although it is a strong one.
See here for the U.S. stock market: http://www.krannert.purdue.edu/faculty/hgulen/asset_growth.pdf

“We study the return predictive power of asset growth related measures in the MSCI World Universe, which includes all the developed markets. We find strong return predictive power of asset growth related measures in these markets. This power is particularly stronger for two-year total asset growth rates and is robust to adjustments of size and book-to-market. It is also robust across different subsample periods, in different geographic regions, and among both large and small stocks. We further find that two-year total asset growth rates have the ability to generate abnormal returns for up to four years after its initial measurement period.”

I just came across an additional study about the asset growth effect, “What is Behind the Asset Growth and Investment Growth Anomalies”, which concludes that there are some data errors in the asset growth studies!

See the following summary:
“Existing studies show that firm asset and investment growth predict cross-sectional stock returns. Firms that shrink their assets or investments subsequently earn higher returns than firms that expand their assets or investments. I show that the superior returns of the low asset and investment growth portfolios are due to the omission of delisting returns in CRSP monthly stock return file and that the poor returns of the high asset and investment growth portfolios are largely driven by the subsample of firms that have issued large amounts of debt or equity in the previous year. Controlling for the effects of the delisting bias and external financing, I do not find an independent effect of asset or investment growth on stock returns.”

It is plausible that issuing large amounts of debt or equity is bad for subsequent stock performance.

It is a bit frustrating to notice that an effect (asset growth) is probably, in part, a data fluke.

I have spent the last few days digging into this problem. As survivorship bias is of vital importance, especially to contrarian strategies, this merits a much deeper investigation. I found another study on this topic that summarizes various issues: Boynton and Oppenheimer (2006), “Anomalies in Stock Market Pricing: Problems in Return Measurements”: http://www.jstor.org/stable/10.1086/505246

While it is natural to be frustrated by false results, remember that if everything were totally clear, there would be no alpha left. If you think about it, the whole reason why we might be able to achieve superior returns is that others make some form of mistake. You can’t simultaneously bet against their mistakes and expect them to give you the right answers.

Hi Michael, thanks so much for the follow up. Frustrating yes, but thanks to you we now know that studies can’t be relied on entirely. That is frustrating. Will someone just give us an answer that’s correct?! 😉 To me the quality metrics tested in Quantitative Value would be the place to look! 🙂

“We show that tests of market efficiency are sensitive to the inclusion of delisting firm-years. When included, trading strategy returns based on anomaly variables can increase (for strategies based on earnings, cash flows and the book-to-market ratio) or decrease (for a strategy based on accruals). This is due to the disproportionate number of delisting firm-years in the lowest decile of these variables. Delisting firm-years are most often excluded because the researcher does not correctly incorporate delisting returns, because delisting return data are missing or because other research design choices implicitly exclude them.”

“Return Reversal in UK Shares
We investigate whether shares that have experienced extreme stock market performance over a five year period become mis-priced leading to opportunities for exploiting that mis-pricing. The results cast serious doubt on the efficiency of the UK stock market to price these shares. This means that managers of these companies may be receiving distorted signals from the market, which will have a knock-on effect on a range of managerial decisions, from the estimation of the equity cost of capital to the timing of share issues. More specifically we find that the 10 per cent of shares with the worst total share return record over five years go on to produce the highest returns in the subsequent five years with an average out-performance of 8.9% per year – defying those who would write these companies off as the ‘dogs’ of the market. Those that are currently flying high on the market (the 10% best performers over five years) go on to under-perform by 47% over the next five years – suggesting that the ‘darlings’ of the market tend to become over-priced leading to the possibility of excessive resource allocation in their favour.”

Yes, this contrarian premium has also been found in the US market. Originally reported by De Bondt and Thaler (1985), it was re-examined by Boynton and Oppenheimer (2006), who correct for delisting and rebalancing biases:

“For the rebalanced return, C1 (past losers) earns 1.94% per month (26.9% per year), and C20 (past winners) earns 1.16% per month (14.8% per year). There are strong contrarian premia, especially during the January 1930–December 1947 and January 1984–December 1992 periods. For the corrected returns, the C1 monthly return falls to 1.68% (22.1% per year), and the C20 monthly return falls to 1.08% (13.8% per year). The majority of the corrections occur during the January 1930–December 1947 and January 1984–December 1992 periods. The log-corrected C1 return is 1.21% (15.5% per year), and the C20 return is .74% (9.3% per year). For C1–C20, the 72-year rebalanced return is .78% per month (t-value: 3.79), the corrected return is .60% per month (t-value: 3.11), and the log-corrected return is .47% per month (t-value: 2.75). For the corrected and log-corrected returns, the premium is earned in earlier portions of the sample. Contrarian strategies have neutral performance in more recent years.” (p. 2618)