Limitations of Predictive Analytics: Lessons for Data Scientists

Predictive Analytics, an evolving technology, is opening new possibilities for forecasting future events by studying past performance. Now that Big Data enables Data Scientists to review massive amounts of data, users may hope that the accuracy of future predictions will only rise.

Yet actual field tests tell a different story. The element of “surprise” in Predictive Analytics is so high that even the best algorithms, computational models, and analytics tools can fail completely in some cases. There have been situations where even the best Data Scientists failed to take all the “unknown variables” into account, which ultimately led to incorrect data-driven predictions.

The Polling Prediction Failure

In “Trump, Failure of Prediction, and Lessons for Data Scientists,” readers learn that pollsters put Donald Trump’s chances of winning at between 15 and 30 percent, and as everyone later found out, they were way off in the “game of probability.” The statistician Salil Mehta had warned about the unreliability of polls, and the state-by-state vote count in a country as large as the U.S. made polling even more complicated. The failure of nearly all data-driven pollsters to predict the outcome of the 2016 U.S. presidential election gave the global Data Science community a big jolt.

In “Six Data Science Lessons from the Epic Polling Failure,” the author discusses some of the main reasons why polling forecasts, though conducted scientifically with advanced algorithms, failed to predict the actual election results. As Bill Schmarzo, CTO of the Big Data Practice at EMC Global Services, noted, “There’s tons of Big Data that’s used in these polls to try to predict, on a county by county basis, who’s going to show up.”

This time, despite refining their predictive models, Data Scientists missed the mark. Many have admitted to the limitations of Predictive Analytics: their models lose their magic touch over time, and making them work well takes tremendous mental concentration. The big realization that comes with the polling failure is that, unlike many other Big Data applications, politics is a domain where what voters say they will do and what they actually do can be quite different. Psychological behavior plays a bigger role than demographic traits.

In “How the Election Shined a Spotlight on the Limitations of Predictive Analytics Technology,” the author shows that failed predictions remind users of the risks of relying solely on Predictive Analytics, even when it is backed by the virtually unlimited scale of Big Data. The article states that Donald Trump’s unexpected victory “illustrates the pitfalls of flawed assumptions, wide margins of error, and lack of context,” all of which may be present in data-driven predictions.

McKinsey’s article “A Narrow View on Big Data and Analytics” raises an important point. It states that the current “obsession” with analyzing historical data for insights has stymied the growth of intelligent data models that could accurately guide future decisions. The danger in approaching data discovery with a desired outcome in mind is that the analysis becomes more “prescriptive” than “predictive,” and prediction is the real goal of all political forecasting.

Business Predictions Are Different from Polling Forecasts

What is true for business (and sports) is often not true for politics. The sheer “uncertainty” of voting behavior makes it too complex to predict. In business or sports, vast amounts of historical data can help reveal trends and patterns that are likely to influence future courses of action. In sharp contrast, past or present political data can neither accurately reflect voting traits nor help Data Scientists make reliable assumptions about future voting behavior.

The article “Pros and Cons of Predictive Analysis” reconfirms that while a good understanding of Predictive Analytics can enable Data Scientists to make accurate business forecasts, this science falls short of expectations when it comes to politics.

The article “Industrialized Analytics Implications of Large-Scale Predictive Analytics Models” describes how Cisco uses “propensity models” to forecast customers’ future buying behavior. Even there, the author warns that Data Scientists often leave out of their predictive models important variables that may later alter buyer behavior. Thus, the limitations of Predictive Analytics are apparent in customer analytics too.
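To make the omitted-variable risk concrete, here is a minimal Python sketch of a logistic propensity-to-buy score. It is not Cisco’s model: the feature names (past_purchases, site_visits, budget_freeze), the weights, and the bias are hypothetical values chosen purely for illustration. The “fitted” model never saw budget_freeze, so it scores a customer as a likely buyer even though the (equally hypothetical) real-world process says otherwise.

```python
import math


def logistic(z: float) -> float:
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def propensity(features: dict[str, float], weights: dict[str, float], bias: float) -> float:
    """Hypothetical propensity-to-buy score: logistic(bias + sum of weight * feature).

    Any feature the model has no weight for is ignored (treated as weight 0.0),
    which is how an omitted variable silently drops out of the forecast.
    """
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return logistic(z)


# Hypothetical bias and weights, for illustration only (not from any real model).
BIAS = -1.0

# What the model learned from historical data: it never saw budget_freeze.
model_weights = {"past_purchases": 0.8, "site_visits": 0.3}

# The assumed real-world process, in which a budget freeze strongly suppresses buying.
true_weights = {**model_weights, "budget_freeze": -3.0}

# A customer with a healthy buying history whose company has just frozen budgets.
customer = {"past_purchases": 2.0, "site_visits": 3.0, "budget_freeze": 1.0}

predicted = propensity(customer, model_weights, BIAS)
actual = propensity(customer, true_weights, BIAS)

print(f"model's propensity to buy: {predicted:.2f}")  # 0.82
print(f"'true' propensity to buy:  {actual:.2f}")     # 0.18
```

The same arithmetic applies to any scoring model of this kind: a variable that never made it into the training data contributes nothing to the prediction, no matter how strongly it drives real behavior.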

Lack of Understanding Human Behavior Affects HR Analytics

The same problem crops up when Big Data is applied to Human Resources (HR) issues. The standard HRMS (Human Resource Management System) data that forms the cornerstone of HR Predictive Analytics cannot guide Data Scientists to accurate HR forecasts.

In “HRMS and the Limits of Big Data,” the author warns that the HR function currently understands very little about the individual personalities that, to a large degree, control human behavior. Even with the latest algorithms, models, and employee performance metrics, HRMS data often fails to predict future human actions accurately. One view in Data Science is that traditional HR departments will begin to reap the benefits of Big Data and Predictive Analytics once their own staff invest enough time to develop technical savvy alongside their knowledge of people.

McKinsey’s “The Benefits and Limits of Decision Models” says that decision models can exert an indirect influence through the outcomes of their predictions. Because models are usually used to predict outcomes beyond human control, this influence becomes even greater when predicted and actual outcomes match up. Data-driven predictions can often influence or alter human behavior, though the extent of that influence is not scientifically known.

In business, C-suite executives combine vast amounts of data with sophisticated algorithms to make crucial decisions that shape corporate performance. Here, Data Science methods help business leaders overcome biases that would otherwise color individual judgment. The implied limitation of Predictive Analytics is that it can indirectly alter outcomes, even though that is not its goal.

Democratic Data Science: Self-Service Predictive Analytics

In the Self-Service Analytics world, every business user is a data analyst. This type of technology puts immense power in the hands of ordinary business users. The average user, with little or no knowledge of Data Science, can explore their data on Self-Service Business Intelligence (BI) platforms without any intervention from IT or a BI team. In this scenario, BI tools let users load data, run queries, and generate reports, mostly through templates.

The KDnuggets article “Advantages and Risks of Self-Service Analytics” identifies several risks: a lack of staff properly trained to use Self-Service BI tools, no timely verification of the tools’ accuracy, data inconsistency, and a lack of Data Governance. The article contains some Data Science lessons that data analysts need to think about.

About the author

Paramita Ghosh has over two and a half decades of business writing experience, much of it in the technology and business domains. She has written extensively for a broad range of industries, including but not limited to data management and data technologies. Paramita has also contributed to blended learning projects. She received her M.A. in English Literature from Jadavpur University in India in 1984, and began her career in the United States in 1989 after completing professional coursework. Having ghostwritten and authored hundreds of articles, blog posts, white papers, case studies, marketing content, and learning modules, Paramita has included writing one or two books on the business of business writing among her post-retirement projects. She considers her professional strength to be “lifelong learning.”