I’m Sorry CXOs, but You’re Mostly Doing Analytics All Wrong

We tend to talk about digitisation a bit like our parents and grandparents talked about sex in the sixties. Like we invented it.

Only we didn’t.

The first wave of digitisation arguably started in the late 80s and early 90s with the client-server revolution. Over that period, organisations shovelled money at Information Technology (IT) like it was going out of fashion. But as Robert Solow wryly observed in 1987, “you can see the computer age everywhere but in the productivity statistics”: at the level of the whole economy, productivity growth actually slowed over this period.

There is no return on investment in technology until we deploy in production and change the way we do business - and Gartner’s research indicates that at least 65% of the predictive models built in organisations today are never implemented in production. For all of the caffeine-driven work done in all of those innovation labs, most of the insight that is generated never leaves the lab except as PowerPoint. And since you can’t connect a slide deck to an operational business process, mostly we are not changing the way we do business - and so are never likely to generate any ROI. As a former boss of mine once remarked when declining to sign off one of my less well thought-through project proposals 20-odd years ago, “old business process + expensive new technology = expensive old business process.”

Business leaders need instead to start to think about Analytics as 1-2-3.

#3 is for deployment and operationalisation. There are several different analytic operationalisation design patterns that make more or less sense, depending on the use-case. But in very many cases, scoring predictive models in production and at scale requires us to process multiple features quickly and to ship the result (typically a prediction or a next best action) to a connected operational touchpoint. A feature store built on a high-performance, enterprise-grade, parallel Data Warehouse platform gives us scale, performance, high availability and reliability, channel integration – and, of course, rapid access to all of the variables in the feature store. And even where this design pattern doesn’t make sense – for example, where performance or network considerations mean that we need to score models at the “edge” – we still need to instrument these models to understand where they have been deployed and the predictions they are making, because predictive models have a shelf-life and need to be maintained, updated and, ultimately, retired.
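For the more technically minded, the pattern described above (read features, score, ship a next best action, and record what the model did) can be sketched in a few lines of Python. This is a toy illustration only: the in-memory `FEATURE_STORE`, the `DeployedModel` class, the `toy_churn_score` rule and the `next_best_action` threshold are all hypothetical names and values invented for the example, not part of any real platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical stand-in for a feature store: entity id -> feature values.
FEATURE_STORE: Dict[str, Dict[str, float]] = {
    "cust-1": {"recency_days": 3.0, "basket_value": 42.5},
    "cust-2": {"recency_days": 40.0, "basket_value": 8.0},
}


@dataclass
class DeployedModel:
    """A scored model plus the instrumentation needed to manage its shelf-life."""
    name: str
    version: str
    features: List[str]
    predict: Callable[[List[float]], float]
    audit_log: List[dict] = field(default_factory=list)

    def score(self, entity_id: str) -> float:
        # Pull the model's input variables from the feature store.
        row = FEATURE_STORE[entity_id]
        prediction = self.predict([row[name] for name in self.features])
        # Record which model and version made which prediction, and when.
        self.audit_log.append({
            "model": self.name,
            "version": self.version,
            "entity_id": entity_id,
            "prediction": prediction,
            "scored_at": datetime.now(timezone.utc).isoformat(),
        })
        return prediction


def toy_churn_score(inputs: List[float]) -> float:
    """Made-up scoring rule: long absence and small baskets raise the score."""
    recency_days, basket_value = inputs
    small_basket = 1.0 if basket_value < 10.0 else 0.0
    return 0.8 * min(recency_days / 60.0, 1.0) + 0.2 * small_basket


def next_best_action(prediction: float) -> str:
    """Translate a score into the action shipped to the operational touchpoint."""
    return "send_retention_offer" if prediction >= 0.5 else "no_action"


model = DeployedModel("churn", "1.0", ["recency_days", "basket_value"], toy_churn_score)
for customer in FEATURE_STORE:
    print(customer, next_best_action(model.score(customer)))
```

The point of the `audit_log` is the instrumentation argument: wherever the model runs, something should be able to answer "which version scored this customer, and what did it say?" when the time comes to update or retire it.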

#2 is for model training. It’s the sexy bit, where we get to geek out with cool technologies and to play with a dozen different algorithms (Support Vector Machines are soooo last year, dahling). And that’s a problem – because as an industry, we have dramatically over-rotated on the fun part at the expense of #1 and #3. That means that the productivity of our Data Scientists is lousy, our time-to-market is measured in months (if we’re lucky) – and that in at least 65% of cases, we never even get to production. For the foreseeable future, you will probably need to use multiple different technologies to support your model building activities. But you should insist that those tools pull data from the feature store, that any new features created are added back to the feature store – and that models are published in a format that can be consumed and run on your centralised and integrated data platform or an Edge Node. No exceptions. None.
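Those three rules can be made concrete with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the `FeatureStore` class is an in-memory stand-in, the "model" is a single threshold rather than a real algorithm, and plain JSON stands in for whatever portable model format your platform actually consumes.

```python
import json
from statistics import mean
from typing import Dict


class FeatureStore:
    """Hypothetical in-memory feature store: feature name -> {entity id -> value}."""

    def __init__(self) -> None:
        self._features: Dict[str, Dict[str, float]] = {}

    def add_feature(self, name: str, values: Dict[str, float]) -> None:
        self._features[name] = dict(values)

    def get_feature(self, name: str) -> Dict[str, float]:
        return dict(self._features[name])


def train_and_publish(store: FeatureStore) -> str:
    # Rule 1: pull training data from the feature store, not from a private extract.
    spend = store.get_feature("monthly_spend")
    visits = store.get_feature("monthly_visits")

    # Rule 2: any new feature created along the way goes back into the store.
    spend_per_visit = {key: spend[key] / visits[key] for key in spend}
    store.add_feature("spend_per_visit", spend_per_visit)

    # Toy "training": flag customers above the average spend per visit.
    threshold = mean(spend_per_visit.values())

    # Rule 3: publish the model as a portable artefact, not as a notebook.
    return json.dumps({
        "model": "high_value_flag",
        "version": "1",
        "feature": "spend_per_visit",
        "threshold": threshold,
    })


def run_published(model_json: str, store: FeatureStore, entity_id: str) -> bool:
    """What the production platform (or an edge node) does with the artefact."""
    spec = json.loads(model_json)
    return store.get_feature(spec["feature"])[entity_id] > spec["threshold"]


store = FeatureStore()
store.add_feature("monthly_spend", {"a": 100.0, "b": 20.0})
store.add_feature("monthly_visits", {"a": 4.0, "b": 4.0})
published = train_and_publish(store)
print(run_published(published, store, "a"), run_published(published, store, "b"))
```

Note that `run_published` needs nothing from the training tool except the published artefact and the feature store, which is exactly why the choice of modelling technology can stay flexible while the route to production does not.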

Machine Learning and AI really are going to become ubiquitous – and really will be the basis of competitive advantage in most industries. And that means we’re going to have to scale them; even on fairly conservative maths, within the next few years a typical, national grocery retailer is going to need to score at least 150M predictive models in production every single day just to run a competitive supply chain. Only organisations that think about Machine Learning and Artificial Intelligence in terms of Analytics 1-2-3 - with a heavy emphasis on #1 and #3 – are going to make the cut.

All the rest of us are doing is boosting the profits of the espresso machine makers.

(Author):

Martin Willcox

Martin leads Teradata’s EMEA technology pre-sales function and organisation and is jointly responsible for driving sales and consumption of Teradata solutions and services throughout Europe, the Middle East and Africa. Prior to taking up his current appointment, Martin ran Teradata’s Global Data Foundation practice and led efforts to modernise Teradata’s delivery methodology and associated tool-sets. In this position, Martin also led Teradata’s International Practices organisation and was charged with supporting the delivery of the full suite of consulting engagements delivered by Teradata Consulting – from Data Integration and Management to Data Science, via Business Intelligence, Cognitive Design and Software Development.

Martin was formerly responsible for leading Teradata’s Big Data Centre of Excellence – a team of data scientists, technologists and architecture consultants charged with supporting Field teams in enabling Teradata customers to realise value from their Analytic data assets. In this role Martin was also responsible for articulating Teradata’s Big Data strategy to prospective customers, analysts and media organisations outside of the Americas. During his tenure in this position, Martin was listed in dataIQ’s “Big Data 100” as one of the most influential people in UK data-driven business in 2016. His Strata (UK) 2016 keynote can be found at: www.oreilly.com/ideas/the-internet-of-things-its-the-sensor-data-stupid; a selection of his Teradata Voice Forbes blogs can be found online; and more recently, Martin co-authored a series of blogs on Data Science and Machine Learning – see, for example, Discovery, Truth and Utility: Defining ‘Data Science’.

Martin holds a BSc (Hons) in Physics & Astronomy from the University of Sheffield and a Postgraduate Certificate in Computing for Commerce and Industry from the Open University. He is married with three children and is a solo glider pilot, supporter of Sheffield Wednesday Football Club, very amateur photographer – and an even more amateur guitarist.
