Can We Predict the Future?

Well, maybe. Machine Learning is powerful, and can be used as a predictive tool in select cases. But many misunderstandings exist, and there are lessons to be learned before you start down the seemingly magical Machine Learning path.

As with any new tool or technique, expectations of value and precision run amok unless they are explained. We have implemented ML a number of times, using several different tools, often with promising results, but always complicated by misunderstandings and preconceived notions that challenge attempts to foster consensus.

Machine Learning (ML) tools and techniques have many possible uses, including detecting patterns that will likely repeat themselves. This is a powerful concept, grounded in statistical methods that reinforce its relevance, and numerous practical commercial uses exist. Recent releases of cloud-based tools make ML even more approachable.

Let’s explore a few misconceptions that we have noted in the last year or so.

ML is magic that few understand. On the surface, many will imagine that ML is a magical utility, using some modern form of pixie dust to crunch huge volumes of data with methods that only Ph.D. scientists could understand, and that in the end it can actually predict future events. Not true. To be fair, we (the IT community) have not done everything we can to dispel the myth, perhaps clinging to the romantic notion that we will finally find a way to over-please our business leaders. Instead, if we decide to invest in ML, we need to explain the concepts, inputs, limitations and expected value. As with many elements of IT, our success will be measured, in part, by the expectations we set.

ML is accurate, first time, every time. This is another flavor of misaligned expectations. ML programs are often iterative, growing over time through consistent research and incremental improvement. We have been part of several proof-of-concept projects where the definition of success was the accuracy of pinpoint predictions. As we all explore ML, we should reinforce that long-term value will come from an organizational commitment to continuous learning – much like successful analytics programs. The payback will be huge once executives align on the idea that ML starts with research and grows through iteration.

ML is an algorithm. Many imagine that ML is a transaction processing utility, an algorithmic version of a huge spreadsheet, into which facts are entered and from which direct answers are returned. This leads to the incorrect conclusion that individual events can be “pumped into” an ML system and specific recommendations for improvement will come out. In fact, ML is often most accurate, and most valuable in a business setting, when data is analyzed in aggregate rather than case by case. For example, predicting whether a particular call into your call center will result in cancellation is difficult. But predicting which marketing campaign will be effective for a select demographic can be very powerful. As you launch your ML program, look to define the use cases in your organization that are most practical, and be clear about those that are not a good fit.
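To make the aggregate-versus-individual point concrete, here is a minimal Python sketch. It is a simulation, not real data: the campaign names, the "true" response rates, and the sample size are all assumptions invented for illustration. With enough customers, the better campaign for the demographic stands out clearly in the observed rates, while a guess about any single customer stays close to a coin weighted toward "no".

```python
import random

# Assumed "true" response rates for one demographic.
# These numbers are invented for illustration, not measured results.
TRUE_RATES = {"campaign_a": 0.12, "campaign_b": 0.08}
CUSTOMERS_PER_CAMPAIGN = 5000


def simulate_responses(rate, n_customers, rng):
    """Return one boolean per simulated customer: did they respond?"""
    return [rng.random() < rate for _ in range(n_customers)]


def observed_rate(responses):
    """Fraction of simulated customers who responded."""
    return sum(responses) / len(responses)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
responses = {
    name: simulate_responses(rate, CUSTOMERS_PER_CAMPAIGN, rng)
    for name, rate in TRUE_RATES.items()
}

# Aggregate view: compare observed response rates across campaigns.
rates = {name: observed_rate(r) for name, r in responses.items()}
best_campaign = max(rates, key=rates.get)

# Individual view: with no per-customer signal, the best single guess is
# "will not respond". It is right most of the time, yet it never
# identifies an actual responder.
guesses = [False] * CUSTOMERS_PER_CAMPAIGN
responders_identified = sum(
    1 for guess, actual in zip(guesses, responses[best_campaign])
    if guess and actual
)

print("observed rates:", rates)
print("best campaign for this demographic:", best_campaign)
print("individual responders identified by the default guess:",
      responders_identified)
```

A real model would use customer features to do better than the default guess on individuals, but the gap between the two views is usually still wide, which is why campaign-level questions tend to be the more practical starting point.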

Get everything organized before starting. This is a classic tenet of waterfall-style development – that deep analysis and careful documentation are needed before investing in technology. In this case, bad idea. As mentioned above, ML is best done iteratively. But, on the other hand, the quality (and structure) of the data matters greatly. Surely, “garbage in, garbage out” applies here, particularly with a seemingly magical black box like ML? True, but you can get started in parallel. Let us take an example: imagine that you are trying to correlate product placement with sales uplift. Seems like a good use case, with lots of available data. But, in this case, it would be best to have a full year of data, for multiple products, for multiple locations, to allow for seasonality. And the sales data and store locations must be accurate. You could work for six months to get all the input organized and cleansed, or you could get started, improving quality and breadth in parallel. Take a subset of data, manually cleansed if you have to, and attempt to correlate factors. Demonstrate the potential. Small insights will result, and you can direct the next iteration. Virtually every company would love to have highly accurate data in their data warehouse, but few do. Think “minimum viable product.” A simple POC (properly explained) can go far to build confidence and spark imagination on possible applications.
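A first pass at the product placement example can be very small. The Python sketch below assumes a hand-cleansed subset of weekly sales for one product in two stores, with a flag for whether the product was on a featured display that week. The store names, the display flag, and every number are hypothetical, made up only to show the shape of the exercise. It computes the average sales uplift and a Pearson correlation between placement and units sold.

```python
from math import sqrt

# Hypothetical, hand-cleansed sample rows:
# (store, week, on_featured_display, units_sold).
# All values are invented for illustration only.
SAMPLE = [
    ("store_a", 1, 0, 120),
    ("store_a", 2, 1, 150),
    ("store_a", 3, 0, 118),
    ("store_a", 4, 1, 160),
    ("store_b", 1, 0, 80),
    ("store_b", 2, 0, 85),
    ("store_b", 3, 1, 110),
    ("store_b", 4, 1, 105),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def mean_uplift(rows):
    """Average units sold with the display minus average without it."""
    with_display = [units for _, _, shown, units in rows if shown == 1]
    without_display = [units for _, _, shown, units in rows if shown == 0]
    return (sum(with_display) / len(with_display)
            - sum(without_display) / len(without_display))


placement = [shown for _, _, shown, _ in SAMPLE]
units = [sold for _, _, _, sold in SAMPLE]

correlation = pearson(placement, units)
uplift = mean_uplift(SAMPLE)

print(f"mean uplift with display: {uplift:.1f} units per week")
print(f"placement vs. units correlation: {correlation:.2f}")
```

Eight rows prove nothing on their own, and the two stores clearly sell at different baseline volumes, which muddies the correlation. That is the kind of small insight that directs the next iteration: add more weeks to cover seasonality, more stores, and a per-store baseline.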

ML is a powerful set of concepts and methods, and the tools are becoming productive and practical. We recommend experimentation, research, and the all-important concise communication that will help set expectations. Value and use cases are everywhere and have the potential to be highly differentiating. Start small and give it a try.

What misconceptions have you uncovered in Machine Learning?

Written by Mike Strange, Office Managing Vice President of Pariveda Los Angeles