Bad Data Mining

I keep promising to stop writing about lessons from the election that are applicable to markets, and then I keep finding more examples. So rather than make any promises I cannot keep, let’s just jump right into this.

Since Donald Trump’s surprise victory — though it wasn’t a surprise to those of you with the power of hindsight — there have been numerous after-the-fact explanations for why Trump beat Hillary Clinton. Many appear to be delightful exercises in data mining, the finding of “historical patterns that are driven by random, not real, relationships.” Add to this the assumption that these explanations are durable and will repeat in the future, and you have the makings of a terrible investment process.

None of these elements “predicted” anything. Each was the result of an analysis of what had already occurred. After the election, the data was sifted, a dividing point was located in each data set beyond which Trump voters outnumbered Clinton voters, and a conclusion was reached.

This is classic data mining, and it should never be relied upon to make future forecasts.
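A minimal Python sketch shows how easily sifting produces a convincing-looking pattern. It assumes nothing about the election: the outcome and every candidate "explanatory" variable are pure random noise, and the sample sizes and the `mine_best_feature` helper are hypothetical, chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def mine_best_feature(n_obs=50, n_features=500, seed=0):
    """Generate a pure-noise outcome and many pure-noise candidate features,
    then return the feature with the largest absolute in-sample correlation.
    No feature has any real relationship to the outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    features = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_features)]
    corrs = [pearson(f, outcome) for f in features]
    # "Data mining": keep whichever candidate happens to fit best.
    best = max(range(n_features), key=lambda i: abs(corrs[i]))
    return best, corrs[best]


best, r = mine_best_feature()
print(f"feature {best} 'explains' the outcome with r = {r:+.2f}")
```

The exact number depends on the seed, but with this many candidates the winning variable typically shows a correlation well above 0.3, even though every relationship in the data is random, not real.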

There is an increased craving to slice and dice the recent election data, particularly given that the major pollsters have been shamed after they all badly erred in projecting this year’s winner. All gave President-elect Trump a faux probability of winning of less than 15%. The risk of now responding by data-mining this single election result is that such analyses often miss the predictive errors specific to this unique matchup (e.g., record-high undecideds on Election Eve), don’t take into account budding geospatial patterns to validate evidence, and in most cases none of this should be deceptively promoted as an election forecasting model.

Correlations are very different from what is required to create a reliable model that correctly forecasts a future election or investing outcome. Rather than mine data, Mehta suggests we instead engage in hypothesis testing.

The obvious parallel to investing is the myriad of back-tested strategies, many of which engage in the same sort of data mining as the recent election post-mortems. They appear to have worked perfectly in the past, but they prove less robust going forward. Models that tell us what has already happened but not what might occur in the future are of limited value.

Cliff Asness of AQR warns us not to confuse factor investing with data mining. He notes that the Fama-French factors value and size, along with momentum, have all been tested out of sample and proven to be robust. Out-of-sample testing could verify whether an election model’s backtest is valid: Take the five data claims above, then apply them to Obama versus McCain or Bush versus Gore to see whether they have any predictive power. The same is true for investing models. To avoid poorly constructed models that are form-fitted to past experience, test them on data sets other than the ones used to build them.
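The out-of-sample check can be sketched the same way, again on purely synthetic noise. This is a minimal illustration, not AQR's or anyone else's actual methodology; the half-and-half split and the `in_vs_out_of_sample` helper are hypothetical. The best-looking variable is chosen using only the first half of the data, then re-measured on the half it never saw.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def in_vs_out_of_sample(n_obs=120, n_features=200, seed=0):
    """Mine the best of many pure-noise features on the first half of the
    data, then measure that same feature on the untouched second half.
    Returns (in_sample_r, out_of_sample_r)."""
    rng = random.Random(seed)
    half = n_obs // 2
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    features = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_features)]
    # In sample: select the feature with the largest absolute correlation.
    in_corrs = [pearson(f[:half], outcome[:half]) for f in features]
    best = max(range(n_features), key=lambda i: abs(in_corrs[i]))
    # Out of sample: same feature, data it was never fitted to.
    out_r = pearson(features[best][half:], outcome[half:])
    return in_corrs[best], out_r


for seed in range(3):
    r_in, r_out = in_vs_out_of_sample(seed=seed)
    print(f"seed {seed}: in-sample r = {r_in:+.2f}, out-of-sample r = {r_out:+.2f}")
```

The mined variable's in-sample correlation is inflated by the act of selecting it; on fresh data it typically falls back toward zero, which is exactly what a backtest built on random relationships should be expected to do.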

If a gold mine is a hole in the ground with a liar standing on top of it, a successful data miner is a quant with a data set lying to himself. You have probably never seen a sales pitch that didn’t have a backtest “proving” market-beating returns. If only you had a time machine to go back to the period covered by the data set.

Investing after the fact is easy. Investors should be cautious when presented with results that tell them only what just happened, not what is about to occur.

_________

Lots of “multicollinearities” (economic inequality, poor health, low educational attainment) may be associated with Trump voters, but they are not likely to forecast the next election. For example, higher education (and therefore better health and possibly higher income) might present a proclivity toward voting red or blue, but as Mehta points out, not all college degrees are created equal. Some generate much greater potential future incomes than others (“heterogeneous”).