A place where models, software engineering and the science of decision making come together to make the future a better place

Tuesday, June 9, 2015

Why your predictive analytics solution might just not work

I got a fair bit of interest and comments on my previous post about how the predictive analytics journey can feel a bit like a roller coaster.

To summarize, the four main phases in building a predictive solution are:
- Data munging => Sorrow
- Getting the first model output => Joy
- Realizing that the model still needs a lot of work => Sorrow
- Applying a seemingly imperfect model and seeing impressive customer or business impact => Joy

I also want to add a fifth phase to this: deploying the solution, which can be even more challenging and frustrating than actually building it.

In many organizations, the systems supporting the development of the predictive solution and the systems required to implement or deploy the solution are quite different. Analytical data warehouses (mostly the foundation for analytic model builds) are often optimized for generating insights and capturing additional insights for others to leverage. Therefore they contain lots of transformed variables and data. When these analytic stores are used to build models, these transformed variables invariably make their way into the model. Operational systems are optimized for speed of transaction processing and are therefore quite removed from the analytical data.
The picture looks somewhat like the one below.

Fig 1: How analytic and operational data can diverge from source systems

This disconnect between analytic systems and operational systems can be a BIG problem when it comes to monetizing the analysis. When you try to deploy the cool analytical model you just built, it is either really difficult to productionalize or, even worse, it cannot be implemented without considerable watering down.

This could be because:
- The transformations in the analytic data need to be re-done using the operational data, and that introduces delays in the deployment process
- Some of the data elements available in the analytic store might be altogether missing in the operational store

So the result is either a delay in getting to market, or only a fraction of the true value getting realized.
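One way to surface these two problems early is to audit the model's inputs against the operational store before deployment. The snippet below is a minimal sketch of that idea, not a prescribed method. All names in it (`audit_model_inputs`, `DeploymentGap`, the `derivations` mapping and the example feature names) are hypothetical and only serve the illustration. It assumes you can list the model's features, the fields the operational system exposes, and the raw fields each transformed variable is computed from.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DeploymentGap:
    available: List[str]           # present as-is in the operational store
    needs_rederivation: List[str]  # transformation must be rebuilt on operational data
    missing: List[str]             # cannot be sourced from the operational store at all


def audit_model_inputs(
    model_features: List[str],
    operational_fields: List[str],
    derivations: Dict[str, List[str]],
) -> DeploymentGap:
    """Classify each model feature by how hard it is to serve in production.

    `derivations` maps a transformed feature to the raw fields it is built from
    (a hypothetical piece of metadata you would have to maintain yourself).
    """
    operational = set(operational_fields)
    gap = DeploymentGap(available=[], needs_rederivation=[], missing=[])
    for feature in model_features:
        if feature in operational:
            gap.available.append(feature)
        elif feature in derivations and set(derivations[feature]) <= operational:
            # All raw inputs exist operationally, so the transformation can be
            # re-done there, at the cost of extra deployment time.
            gap.needs_rederivation.append(feature)
        else:
            # Either no known derivation, or some raw input is absent.
            gap.missing.append(feature)
    return gap


# Illustrative, made-up inputs.
gap = audit_model_inputs(
    model_features=["age", "avg_balance_90d", "bureau_score"],
    operational_fields=["age", "daily_balance"],
    derivations={"avg_balance_90d": ["daily_balance"]},
)
print(gap)
```

Features landing in `needs_rederivation` correspond to the delay case above, and features landing in `missing` correspond to the watered-down model. Running a check like this while the model is still being built lets you trade off a variable's predictive lift against its deployment cost.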

As a person who passionately believes in how data and analytics can change consumers' lives for the better, this is the most frustrating trough in the overall journey.
Interestingly, this last trough is a massive blind spot in the cognition of several well-meaning predictive modelers. Unless you carry the battle scars of trying to deploy these models in production, it is often hard to even conceptualize that this gap can exist.

Got to remember this though. Unless you are able to deploy your solution to a place where it touches real people, things really don't matter a whole lot.
My next post will cover how to close some of these gaps.

Krish Swamy - practitioner of predictive analytics

I am a quantitative practitioner of predictive business analytics. My job gives me the opportunity to indulge in my passion: using quantitative approaches to solve business problems and understand human behaviour. My specific skills are using regression and other statistical inference techniques.