Small Data: The Yin to Big Data’s Yang

Charlie Munger counsels that, to be a good scientist and investor, we must “Invert, always invert.” It is in this spirit that I picked up the book Small Data: The Tiny Clues that Uncover Huge Trends by Martin Lindstrom earlier this year from Amazon. After spending the last several years evangelising Big Data up and down Asia Pacific, I thought a healthy dose of “big data is not enough” would do me some good. At the very least, it could help lessen my commitment and confirmation biases.

It took a while, but Lindstrom’s book didn’t disappoint. In many ways, it reminds me of the rather wonderful body of work by Herb Sorensen and Paco Underhill on the science of retailing, but broader in scope and application. Lindstrom’s “small data” message is about the importance of collecting minute, detailed data through up-close, intimate observations and interviews, then digesting it with objective reflection on four criteria of the human condition (culture, rulership, religion, and tradition) to arrive at an understanding of what people truly want beneath the facade we sometimes hide behind.

In Lindstrom’s words: “If companies want to understand consumers, big data offers a valuable, but incomplete solution. I would argue that our contemporary preoccupation with digital data endangers high-quality insights and observations — and thus products and product solutions — and that for all the valuable insights big data provides, the Web remains a curated, idealized version of who we really are. Most illuminating to me is combining small data with big data by spending time in homes watching, listening, noticing and teasing out clues to what consumers really want.”

The above shows the kind of insights that can be obtained by putting cameras on consumers and observing where they spend time looking on their shopping trips.

I can certainly sympathise with the “big data is necessary but not sufficient” argument. In fact, one of my failed analytics projects corroborates it. A couple of years ago, I was tasked with diagnosing a major grocery retail chain’s problem with its Health & Beauty section. Like most retail businesses, they periodically send their sales data (in aggregated form) to Nielsen and get back a report on their performance against competitors across different product categories. This particular grocery chain was consistently underperforming on Health & Beauty sales and needed to do something about it.

I got hold of the detailed transaction data, all tens of terabytes of it, and started throwing every data science tool I had at the problem. I could find interesting patterns and draw somewhat surprising correlations. I could find the big heads and long tails, and also figure out which stores were underperforming when adjusted for the quality of their catchment areas (population size, literacy level, average income, etc., from census data). I had fancy heat maps, trend lines, and all kinds of visualisations.

But the project was an unquestionable failure after nearly 2.5 months of intense work. I could tell the customer “what happened” from the big volume of transaction data we collected, but I couldn’t figure out the “why”: the causal link between what they were doing (or not doing) in Health & Beauty and the relative underperformance against competitors. I simply had no clue. If you were looking for a case study of paralysis by analysis, that was it.
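For readers wondering what "underperforming when adjusted for the quality of the catchment area" can look like in practice, here is a minimal sketch. It is not the code from that project: the store names, the single `catchment_index` score (standing in for population, literacy, income and so on) and all the numbers are made up for illustration. The idea is simply to fit a line of sales against catchment quality and treat each store's gap from that line as its adjusted performance.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def catchment_adjusted_gaps(stores):
    """Map each store to (actual sales - sales expected from its catchment)."""
    xs = [s["catchment_index"] for s in stores]
    ys = [s["hb_sales"] for s in stores]
    intercept, slope = fit_line(xs, ys)
    return {
        s["store"]: s["hb_sales"] - (intercept + slope * s["catchment_index"])
        for s in stores
    }


# Hypothetical toy data: one composite catchment score per store.
STORES = [
    {"store": "A", "catchment_index": 1.0, "hb_sales": 10.0},
    {"store": "B", "catchment_index": 2.0, "hb_sales": 20.0},
    {"store": "C", "catchment_index": 3.0, "hb_sales": 18.0},
    {"store": "D", "catchment_index": 4.0, "hb_sales": 40.0},
]

gaps = catchment_adjusted_gaps(STORES)
# Stores selling less than their catchment would predict, worst first.
underperformers = sorted((n for n, g in gaps.items() if g < 0), key=gaps.get)
print(underperformers)
```

A real analysis would use several census variables and a proper regression library, but even this toy version makes the limitation plain: it tells you which stores fall short, not why.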

Looking back at the failed project, I know I should have spent more time forming hypotheses about possible causes of the underperformance and testing them. But knowing what to do is quite different from knowing how to do it. Hypothesis-forming is a creative process, and we know AI and Big Data still haven’t cracked the problem of computational creativity. So the human touch is still necessary here. Sorensen, Underhill, Fisher and others have provided us with mental models of what to look for in (brick-and-mortar) retailing, and they certainly help. What has been missing is a methodology for data scientists, both big and small ;), to observe and digest the small, mundane details of our daily lives in order to postulate subtle but important causal links behind statistical phenomena derived from big data. Lindstrom has now provided one. Well, almost.

There is, for sure, not enough in Lindstrom’s book for anyone to replicate his analyses and successes at reviving major brands like LEGO and Lowes, but there is enough of a skeleton framework there for interested and determined practitioners to make a science out of his art. I eagerly await further advances in this field over the next few years.

A Side Note on Health and Beauty

H&B is actually one of the most challenging sections to get right in grocery retailing. Most of what we know in this area comes from big-box retailing in western societies, but of course health and beauty by their very nature need localisation to meet different demands. Sachet shampoo is a classic example of a successful H&B innovation that caters to the unusual demand structure of Indian consumers: a lot of small and irregular consumption by a large number of people, adding up to a significant market.

The sachet shampoo market is made up of a large number of women who use shampoo irregularly: not at a fixed periodicity, but once in a while, for special occasions when they want their hair to look good. This sachet strategy is more widely applicable, including to baby diapers and gel toothpastes. Unilever was the company responsible for this innovation, and they had to tap sophisticated fuel injection pump technology to make this happen.

Research also shows that the emotional tone of a segment can be really important in converting sales. For example, in the Health & Beauty segment, people want information and a brightly lit and clean area. In particular, Tesco learned that H&B purchases can be an emotional choice, not a rational one based on price, and that customers value information and advice much more than low price. The highly successful Baby Club was formed as a result. The main goal of Baby Club was to provide timely advice on baby care, as well as vouchers for baby care products.

The Man Zone by HEB is one of my favourite H&B concepts:

Each of the above concepts comes not from big data, but from painstaking, human-intensive “small data” research and experimentation.