Predictive Analytics Takes a Victory Lap

Summary: Over the last eight years predictive analytics has become a fully mature technology with wide adoption among the largest and most successful companies. The Advanced Analytic Platforms we rely on to make our work more effective and efficient have also improved substantially.

Predictive analytics combines the core disciplines of data science to do the everyday heavy lifting of predicting consumer behavior and forecasting future values, plus a lot of other really valuable niche stuff like predictive maintenance and medical diagnoses. That’s well over 90% of all the jobs, projects, and value creation happening in our profession.

This deeply foundational set of tools is also referred to as classical statistical data science. Although deep learning does much the same thing, it’s getting all the hype because it underlies much of what is called AI these days.

But while deep learning and AI are getting all the press, we wanted to remind everyone how successful, ubiquitous, and mature predictive analytics has become.

How Many Companies are Using Predictive Analytics

When I first started writing about data science in 2013 I found only one study credited to Gartner that concluded that 12% of US companies had adopted predictive analytics. That sounded very low but tended to agree with my experience in finding new clients. At least 8 out of 10 companies I spoke to needed to understand what predictive analytics was and why they should use it, not why they should buy more.

We may never be able to answer this question with complete accuracy. It’s very tough to design a survey that gets at this and just as tough to find someone to pay for it. It’s tough because some respondents still won’t know the difference between predictive analytics and simple BI. It’s tough because the implementation of predictive analytics, especially in large companies, is broken up into many small groups and spread out across subsidiaries, divisions, or lines of business.

On this last challenge especially, how should those groups be counted? Each of them is an opportunity for a new data science hire or a new predictive analytic project.

The most recent number I find credible comes from a 2016 Accenture study that says 40% of companies have adopted. That sounds about right.

What that surely means though is that 100% of the largest companies have adopted and that the ranks get thinner as the companies get smaller. But then we come back to the question of subsidiaries and divisions. Do we count General Electric as one adopter or as several hundred, which would more accurately reflect the penetration in that company?

Another way to get at this is by looking at the number of data scientists at work. This is a little easier thanks to resources like LinkedIn but still by no means definitive. Although there is still a structural shortage of data scientists, there are now about 54,000 LinkedIn members with that title, about 23,000 of those in the US. That’s up from a reported 12,000 in 2014 (it’s unclear whether that was US or worldwide). Still, that’s enough to put 23 in each of the 1,000 largest US companies. That indicates pretty widespread adoption and doesn’t begin to account for people with different titles doing predictive analytics.
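For anyone who wants to check the back-of-the-envelope math, here is a minimal sketch using only the LinkedIn figures quoted above. The variable and function names are made up for illustration. Since it’s unclear whether the 2014 figure was US or worldwide, the growth multiple is computed both ways, as a range rather than a single number.

```python
# Figures quoted in the paragraph above (LinkedIn members with the title).
WORLDWIDE_DATA_SCIENTISTS = 54_000
US_DATA_SCIENTISTS = 23_000
REPORTED_2014 = 12_000          # unclear whether US-only or worldwide
LARGEST_US_COMPANIES = 1_000


def per_company(headcount: int, companies: int) -> float:
    """Average headcount if spread evenly across the given companies."""
    return headcount / companies


def growth_multiple(now: int, then: int) -> float:
    """How many times larger the current count is than the earlier one."""
    return now / then


per_co = per_company(US_DATA_SCIENTISTS, LARGEST_US_COMPANIES)

# Assumption: if the 2014 figure was US-only, compare it to today's US count;
# if it was worldwide, compare it to today's worldwide count.
growth_if_us = growth_multiple(US_DATA_SCIENTISTS, REPORTED_2014)
growth_if_worldwide = growth_multiple(WORLDWIDE_DATA_SCIENTISTS, REPORTED_2014)

print(f"Data scientists per large US company: {per_co:.0f}")
print(f"Growth since 2014: {growth_if_us:.1f}x to {growth_if_worldwide:.1f}x")
```

That works out to 23 per company, and growth of somewhere between roughly 1.9x and 4.5x since 2014 depending on which population the earlier figure covered.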

If we ever do get a good answer it will need to include some measure of penetration. You can still find smaller companies that don’t utilize any predictive analytics but they’re getting tougher and tougher to find. You can find a lot of organizations, including divisions in large companies, that are underusing predictive analytics. That’s the opportunity.

Predictive Analytics is a Fully Mature Technology

Calling predictive analytics a technology seems too narrow. Maybe something like ‘discipline’ is more accurate. However, our chief historian, Gartner, labels it a technology. Gartner’s Hype Cycles and Magic Quadrants are as close as we get to an objective view of our past.

I had hoped to go back at least a decade to track predictive analytics across the hype cycles but alas, although there were ‘data mining’ or other data-related hype cycles going back at least 15 years, Gartner kept reorganizing and renaming them, making tracking impossible before about 2014.

To prove the point about maturity, we should be able to track predictive analytics as it moves inexorably from the Trough of Disillusionment, across the Slope of Enlightenment, arriving finally on the Plateau of Productivity. And in fact that’s what we found.

From 2011 to 2013 predictive analytics marched solidly to the right with a prediction of full adoption within one to two years. In 2014 it fell off the chart leading us to believe that full adoption (whatever that means to Gartner) was at hand.

Holy Cow! What Gives?

To be a completely honest historian, the story doesn’t end quite there. Miraculously, in 2015 and 2016 it reappears at the top of the chart, on the Peak of Inflated Expectations.

What gives? Has our source of objective ground truth gone rogue? Well, kind of. You may or may not be aware that Gartner has multiplied the number of hype cycles very rapidly. The best known and perhaps most followed is the Hype Cycle for Emerging Technologies. That’s reflected in 2011 through 2014. Predictive analytics remains absent from these charts thereafter, which points to heavy adoption as an established technology.

2015 and 2016’s data points however come from brand new hype cycles for those years under the title “Hype Cycle for Advanced Analytics and Data Science”, shortened to just “Data Science” in 2016. In 2015 there was an essentially identical one called “Big Data” particularly notable for the fact that ‘Big Data’ itself was no longer a data point.

In 2017 the trail disappears into a thickening fog. Gartner has announced that there are now seven separate hype cycles for data and analytics.

Unfortunately, we can’t find a good explanation for putting predictive analytics back at the top of that precarious curve. If there was some new element introduced like streaming data or semi-structured data that increased risk it should have occurred many years ago. And both those enhancements are represented by other technologies on the 2015 and 2016 Data Science Hype Cycle charts. Maybe these guys just aren’t talking to each other.

Our Tools Are Better

Leaving aside this minor inconsistency, our core predictive analytic tool chest has become remarkably robust and mature in just the last few years. We compared the four years between 2014 and 2017 for the Gartner Magic Quadrant for Advanced Analytic Platforms which had reasonably stable definitions during that period. The arrows show the path from their 2014 position to their current rating.

Our four leaders, SAS, IBM, Knime, and RapidMiner, have grown closer together and show incremental improvement in both their ability to execute and their vision. Most if not all of these platforms now fully integrate text and semi-structured data, and accept additional code in R and Python.

There are now 8 choices ‘above the line’ in their ability to execute.

Six new competitors have entered the field (those without arrows) including two already ‘above the line’ in their ability to execute.

Some older competitors have dropped off during this period but newer competitors who perhaps have read our needs better have emerged to make our professional lives better.

The small declines for SAS and IBM seem to be related to the fact that they both now offer so many choices that it can be tough to figure out which packages to use.

There’s always room for improvement and always new opportunities to apply predictive analytics. We’ll continue to increase the number of data scientists at work and increase our penetration in business and the public sector. But on the whole we can say that predictive analytics has more than proved its worth and is now a mature technology found everywhere in companies and organizations throughout the economy. Predictive Analytics, take a victory lap. You earned it.

About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at:

Thanks for the reminder of how data science has evolved. There is no dispute in my mind about the great economic value that statistical science has brought to commerce, though we have to acknowledge that we are facing a world of disruptive changes. Regression and prediction will go only as far as there are historical data and patterns. As we become saturated with this well-founded science, companies are discovering the unpredictable reality caused by disruptive new economics such as cryptocurrency, the sharing economy, geopolitical events and much more. Following past data models to guide business execution is no longer sufficient for tactical and strategic execution by business executives. We see examples like the recent security breach at Equifax, Wal-Mart and other big retailers competing to beat Amazon in online retail and grocery delivery, Uber's rise and fall, .... It is this kind of challenge that made us start asking whether there are other sciences that can add value to the playbook of business executives.