Data-themed articles, essays, and studies

Surprised, and…

I’m surprised by the election outcome, but unsurprised by the stupendous analytics failure in predicting it. In fairness, most election models did not certify a Trump defeat; they simply, and wrongly, indicated it was highly probable.

Among the histrionics I read this morning was the statement that “data is dead,” which is a bit much. However, as I’ve remarked before, modern analytics can focus too much on the process- and computer-driven aspects of modeling, rather than good data assessment and whether the questions being asked align with the answers we’re giving. And now we know: the questions being asked of polling data could not predict the true electoral outcome.

Time will tell, but I strongly suspect that the real election result was there all along – people just never bothered with good question and data assessment, in part because the numbers seemed to deliver the result that everyone expected. And then the numbers took over…

Been there, done that. I’ve often worked with good people and organizations who use analytics largely to confirm what they already believe is true. I understand, but also say that’s not really enough reason for an analytics investment. Surprises make an investment really worthwhile, if we can convince people that those surprises represent reality. Convincing people of the unexpected is less about the apparatus of transformations, reductions and nominal predictions that constitute formal data analysis, and more about good data questions and data assessment. For people have to agree on and have a common understanding of the question at hand, or all the algorithms on the planet are irrelevant – even the most amazing outcomes will simply be disbelieved.

That’s irritating in a way – algorithms and processes are comfortable. But they are not dominant, and analytics assessment is often difficult and frustrating, particularly when we discover we’re not answering the question we really wanted to understand, or do not have the information on hand to do so. I still find many people want to believe that computers will somehow deliver answers without much human intervention. Well, they can’t, and they won’t. Perhaps yesterday’s result was a wake-up call for us data users – we can’t trust the numbers when we don’t know the question they answer. If so, that would be a very good thing.

4 thoughts on “Surprised, and…”

While it is still early as far as final tallies go, there is a strong possibility that Trump will lose the popular vote (at 6:30am Trump is only ahead by 35,000 votes, with probably ten times that margin for Hillary still outstanding in California). I would argue that polling the popularity of the candidates, when we don’t choose by popular vote, becomes less indicative the closer the divide is in the country.

The other big surprise is that Kellyanne Conway did get it right, sending her candidate to key counties at the last minute in states the herd of pollsters had already written off as lost causes. She correctly identified the micro-opportunities in individual areas that could flip an otherwise safe state. She appeared to be looking at the answers to the right questions.

Great point. Trump would not be the first president-elect to lose the popular vote, but the electoral college is unlikely to disappear soon.
I agree with your observation about Conway – quite possibly the information was there for those willing to look hard enough. I suspect it was, but can’t be sure at this point. We’ll need to hear more about that.

You didn’t work the election. I did. This was a pure and simple case of GIGO. The number of “undecideds” was stupendous even up until yesterday morning. Whether they broke for Trump or stayed home will have to be determined.

Analytics cannot function with large numbers of null responses. This being a human enterprise, as you say, when analysts are forced to provide an answer anyway, assumptions have to be made – and that’s where biases can turn the data sour.
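That dynamic can be sketched in a few lines. The poll numbers and the allocation function below are entirely hypothetical, purely for illustration: with a large undecided share, the same raw poll projects opposite winners depending on the assumption used to allocate the undecideds – the assumption, not the data, decides the answer.

```python
def project_margin(cand_a, cand_b, undecided, split_to_a):
    """Project candidate A's final margin after allocating the undecided
    share according to an assumed split (fraction going to A)."""
    a = cand_a + undecided * split_to_a
    b = cand_b + undecided * (1 - split_to_a)
    return round(a - b, 3)

# A hypothetical poll: A 46%, B 44%, 10% undecided.
poll = dict(cand_a=0.46, cand_b=0.44, undecided=0.10)

# Assume undecideds split evenly: A is projected to win by 2 points.
print(project_margin(**poll, split_to_a=0.50))  # 0.02

# Assume undecideds break 3-to-1 for B: B is projected to win by 3 points.
print(project_margin(**poll, split_to_a=0.25))  # -0.03
```

A model that reported only the first number, without flagging that it hinges on an unverified even-split assumption, would look precise while being no better than a guess.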

Excellent point – the dynamic component was quite possibly greater than the reported difference. However, this is something that could be detected (as you did…) and in that case analytics models should report a “null” result, which they did not. So while I agree that GIGO is relevant, part of the job of analytics is to identify and report on that situation in predictive scenarios.