Misleading correlations

The polish of today’s BI tools and advanced visualizations inflates the credibility of conclusions that appear to be supported by big data analyses. After all, if we can collect and properly condition volumes of data, in varied forms, at a rapid pace, in the face of uncertainty, then the end result must accurately represent the insights we need to manage our enterprise assets and business processes, right? In fact, this impressive slicing and dicing of big data into new insights can quickly lead us astray.

To leverage the IIoT, some businesses are busy tagging equipment, installing sensors, deploying data collection systems, and building big data management systems. More and more assets and their components are tagged with 2D barcodes, RFID tags, or GPS. Scanners, smart meters, fleet tracking, and mobile devices are linked together wirelessly and used extensively in maintenance and field auditing. ERPs have evolved to ingest these growing volumes of data. Machine-to-machine communication is rising rapidly, further automating the generation of big data: data streams are being generated without human intervention.

After rapidly generating, collecting, and conditioning mountains of big data to transform business practices, the next step is to make sense of it all. This effort yields holistic views of enterprise data that were previously unavailable. Where we take it from here can lead into dangerous waters.

Suddenly we spot patterns, relationships and connections that promise breakthrough revelations. We can start using these newly found insights to strategically shape our operations and draw high ROI out of this intricate big data management investment—right?

If we skip rigorous predictive analysis at this point, we risk letting our shiny new big data insights convince us to believe that interesting correlations represent more strongly bonded cause-and-effect links. Is the new maintenance concept recently adopted by our east coast repair facility really behind the improvements in equipment readiness that suddenly appeared at about the same time? Are the two loosely connected through a yet-to-be-discovered driving factor? Are we observing a mere coincidence? Is the new maintenance concept really a disaster that has yet to manifest itself in our historical data? Does it make sense to adopt similar maintenance practices on the west coast and overseas?
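To make the "yet-to-be-discovered driving factor" question concrete, here is a minimal sketch using simulated data. Everything in it is an assumption for illustration: the `driver`, `adoption`, and `readiness` series are hypothetical, and the noise levels are arbitrary. Two series that never influence each other still correlate strongly because both follow the same hidden driver, and the correlation largely disappears once that driver is accounted for.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after removing a least-squares line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=7):
    """Return (raw correlation, correlation after controlling for the driver)."""
    rng = random.Random(seed)
    # Hypothetical hidden driver, e.g. a seasonal drop in workload.
    driver = [rng.gauss(0, 1) for _ in range(n)]
    # Both observed series depend on the driver, not on each other.
    adoption = [d + rng.gauss(0, 0.5) for d in driver]
    readiness = [d + rng.gauss(0, 0.5) for d in driver]
    raw = pearson(adoption, readiness)
    partial = pearson(residuals(adoption, driver), residuals(readiness, driver))
    return raw, partial


if __name__ == "__main__":
    raw, partial = simulate()
    print(f"raw correlation:         {raw:.2f}")
    print(f"controlling for driver:  {partial:.2f}")
```

The catch, of course, is that this check only works when we already know which driver to control for. On a dashboard, the driver is usually the thing we have not measured.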

The decision-maker’s human nature is likely to push hard towards overstating the meaning of correlations—especially when they’re new and interesting. After all, we have to make use of all this costly big data effort, and we can see the correlation clearly in our wonderful, multi-dimensional dashboard. We really need to know if we should be updating maintenance practices in future operations based on what we’re seeing in our newly found data visualizations. Never fear, we have data scientists that can make sense of these findings. Let’s run down their analysis options:

We could use ML. We’d either have to make the best of the predictor devised from the existing set of historical data, or wait while the ML algorithm observes new samples and re-learns to improve its predictions. All the while, it is predicting by looking in the rearview mirror.

We could skip the reactive learning approach and come up with a statistical model that makes the best use of the data we already have. But we’ll need to segment our data sample or collect more. If we collect data over the next few months, can it really be mixed with last quarter’s observations? Will the model be valid?

We could apply traditional forecasting methods. We’d smooth out the peaks and valleys in our data plots and generate a clean, tidy trend line showing where we’re headed into the future. Wait, do all those simplifying assumptions in traditional forecasting hold true? Our business is global, multi-layered, full of uncertainty, and never in steady state. Does it make sense to use a simplified model?

What about just sticking to statistical inference? Can we check a hypothesis about whether or not our new maintenance practice is favorable? Which significance level is best? In case we’re wrong, how much risk can we accept? If our model is powered by history, does it apply to the future?
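As a rough illustration of the statistical inference option, the sketch below runs a one-sided permutation test on whether readiness after the new maintenance practice is higher than before. The readiness figures are invented for the example, and `permutation_p_value` is a hypothetical helper, not a method from any particular analytics package.

```python
import random
import statistics


def permutation_p_value(before, after, n_perm=5000, seed=0):
    """One-sided permutation test for mean(after) > mean(before).

    Returns the share of random relabelings whose difference in means
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(after) - statistics.fmean(before)
    pooled = list(before) + list(after)
    k = len(before)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[k:]) - statistics.fmean(pooled[:k])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical monthly readiness rates, before and after the change.
    before = [0.81, 0.79, 0.83, 0.80, 0.78, 0.82]
    after = [0.84, 0.86, 0.83, 0.87, 0.85, 0.88]
    p = permutation_p_value(before, after)
    print(f"p-value: {p:.4f}")
```

A small p-value here says only that the before and after samples differ by more than chance relabeling would explain. It says nothing about why they differ, or whether the difference will survive the operational changes that are not yet in the data.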

We haven’t really started to account for some other factors about our future operations that may be important:

Next month we’ll start a refurbishing program to update some old equipment.

Yesterday, maintenance crews in the field reported observing a failure mode that could possibly be new.

Some of our equipment just exceeded its intended life cycle, but seems to be operating just fine.

We don’t know it yet, but next quarter’s environmental conditions (beyond our control) will cause us to temporarily shut down a few facilities while we increase operations elsewhere.

Suddenly the complexities in our analysis just grew into a multi-dimensional monstrosity. And we have data, lots of it. It tells us a lot about the last year, which, by the way, looks nothing like the years ahead. And we found correlations in that data, lots of them. They look great on our dashboards. But what do they mean? Can we believe them?

Well, if our world is so complex that these fancy data science techniques and the huge investments in big data management can’t possibly capture it all, should we go back to simple spreadsheet calculations and hope for the best? Either way, our predictions will be wrong, so why not save on the effort and use those resources to recover from the inevitable surprises? Maybe we should return to reactive operation: putting out the closest fire and fighting the nearest alligators, like we’ve always done?

Is there a way to generate data from future operations and expand our modeling view holistically while ensuring a high degree of detail? Yes, there is. We’ll describe this approach after we look at pitfalls of traditional forecasting and BI-driven strategy in the next two sessions.

About Serg Posadas

Serg is VP of Industry Solutions at Clockwork Solutions. A published author and frequent speaker at conferences, Serg has over 20 years of experience in applied advanced predictive analytics deployments in industry and Department of Defense environments.
