What If Big Data Doesn’t Work?

Let me say this in advance – I know that I am going to go to hell for writing this, in the same way that my Catholic friends see little hope for my immortal soul because I’m Jewish, and my born-again friends here in the south just shake their heads and hope that I’ll come to my senses before my time is up. But here goes:

What if Big Data doesn’t work?

The mythos of Big Data, what has the industry either very excited or completely freaked-out or bored to tears, depending on where you sit, is the belief that we finally have enough information about a person to predict their behavior with some level of statistical rigor. We’ve moved away from talking about Big Data as the analogy to Asimov’s psychohistory. He posited a science of mass behavior; the priests of Big Data espouse modeling at the individual level. But before Lenny sends me to Limbo, or at least Purgatory, I ask again:

What if Big Data doesn’t work?

Big Data’s promise relies on a set of assumptions, none of which may be valid (or, in fairness, may not be valid today but might be in the future).

Big Data assumes a deterministic view of the world, that a person’s behavior is sufficiently consistent and that the determinants of that behavior operate in a homogeneous manner across individuals of some cohort; that you can actually build a model of that behavior and use it for prediction. We have no reason to believe this is true except blind faith. Mostly, we can’t predict a lot of things beforehand with a model, either for human behavior or the more restricted domain of consumer behavior. This is why 80% of new product introductions fail. This is why we do experiments, as Howard Moskowitz has [rightfully and righteously] proclaimed on more than one occasion.

Big Data assumes sufficient acquisition of the causal factors involved in a decision and this may not be true. A simple opt-out will keep the modelers from knowing what television shows I watch, because Comcast is prohibited from sharing my viewing data (if they even keep it) with anyone else. Nor do they know what radio ads I’ve been exposed to, and because I’m a Luddite at heart and still have a bit of late 1960s paranoia, I don’t have a smartphone so I’m not getting mobile ads. In short, Big Data only works if it has all the relevant information, and it may never have that if consumer activists and privacy opt-in initiatives prevail.

Big Data assumes we know how to ask the proper questions and this may not be true. Big Data is only as smart as the researcher who is querying the database or creating the model. Contrary to some popular conceptualizations, it does not recognize patterns on its own, nor does it create statistical models on the fly. While some of this can be automated, the automation itself needs to be programmed, the type of model to assume (linear, Bayesian, structural equation, etc.) needs to be explicated, checks on the rationality of the results (collinearity biases, Heywood cases, etc.) built, and so forth. For those of you who read the Retailwire daily blogs, you know that retailers have not come close to figuring out this part. IBM’s Watson may be capable of making remarkable connections remarkably fast, but the day when it is cost effective to ask it how to sell more Charmin to Cottonelle users is not in our near future.

Will Big Data have some big wins? You betcha, if only because every supplier with access will be desperately seeking a highly-publicizable example. I just finished listening to Retailwire’s webinar on retailer usage of Big Data, and the short answer is they are not using it as much as you may think or in a very sophisticated way. But for every big hit you hear about, you will also hear about the big miscues (see Target and Pregnant Teenager – oops). And you can bet you won’t hear about all the little miscues – the ones where the model says “do this” and “doing this” doesn’t help the business. Because it’s just a model. Because it’s just a model based on imperfect or incomplete data. Because it’s just a human being asking the questions.

I’m off to Purgatory now. Mea culpa, mea culpa, mea maxima culpa.

18 responses to “What If Big Data Doesn’t Work?”

Hi, thanks for your article. If I understand it correctly, the little players in the fish pond are not taken into account and big data is designed for big players only. Where does that leave the little guy trying to get ahead? Is it not the responsibility of the top end user to then implement systems to capture the larger sector of global users? Am I off base? I am interested to know.

I would think that once an entity has figured out how to “own” the data, there could well be a profitable business in dealing with “small fish”. I’m thinking Dial-A-Datum, where you call up and say, “Are my buyers of X also buying Y or Z” and after tapping away for a minute, you get your answer. Fixed price for an answer (say $1,000) and at 10 answers per analyst per day, we’re all doing okay. Yes, you can steal this idea.

Thanks, but I would have no use for this type of information at this stage. Maybe later.
I do not think big data has realised how many small fish there are in the pond, or they would be paying more attention … just my thoughts on the matter. Maybe big data needs to catch up, lol.
enjoy your day
kathy

Good post, Dr. Needle:
I am glad there are others who are willing to have this discussion. The promise of Big Data is quite alluring, especially for those who are on the outside watching Watson and the other big players strut their stuff. I think the Big Data demonstrations we’ve been seeing provide an alternative perspective to traditional data collection/analysis processes that can still be useful to smaller players. Sure, when you’re talking about petabytes of data, and sophisticated mobile data collection apps, then maybe you’ll find something worthwhile (under the assumptions you cover nicely).

For the smaller players, though…the Big Data initiative has illuminated all of the different sources of data and methods for collection that are now available. Perhaps the volume is smaller, but additional points of triangulation are much easier to find and get now. As with any new technology, the early adopters will show some stunning successes, along with many, many less successful (and less publicized) attempts. But eventually the technology, idea, innovation, etc. trickles out further into the market. From my perspective, the theory behind Big Data may be more hype in practice. However, adoption and adaptation of select pieces of the puzzle may prove quite useful for smaller companies. Yet, as you note, it will be difficult to automate that process…you still need a human to explain why it’s being done that way.

Some would posit, with the advent of Big Data and its analytic pundits and precedents, that we’re in hell (or at least purgatory) already. 😉

Nice post. I especially appreciate your first bullet. I agree that data can certainly point toward re-occurring behavior models, but I have a harder time believing its ability to predict adoption of anything new. At least not in the near future.

And I firmly hope not to be proven wrong, at least for those of us who have some small belief in free will.

Isn’t the point about statistical inference that you don’t need perfect data or models? Given enough data and processing you can just run big Monte Carlo simulations to find out about predicted aggregate behaviour. You don’t need to know exactly who will buy beer and nappies; you just need to know enough about the small group of young fathers who will.

On the last point however, you are spot on. The tools are only useful if provided to smart people who truly understand what they are attempting to achieve and why. Here is the real bottleneck.
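The aggregate-versus-individual point in this comment can be made concrete with a minimal Monte Carlo sketch. It is an illustration only: the cohort size, the 30% purchase probability, the trial count, and the function names are all assumptions, not figures from the post or the comment.

```python
import random
import statistics

# Hypothetical parameters, chosen only for illustration.
N_PEOPLE = 1000   # size of the cohort (e.g. "young fathers")
P_BUY = 0.3       # assumed chance that any one member buys
N_TRIALS = 500    # number of simulated cohorts


def simulate_cohort(n_people, p_buy, rng):
    """Count the buyers in one simulated cohort of independent shoppers."""
    return sum(1 for _ in range(n_people) if rng.random() < p_buy)


def monte_carlo(n_people, p_buy, n_trials, seed=42):
    """Return the mean and standard deviation of buyers across trials."""
    rng = random.Random(seed)
    totals = [simulate_cohort(n_people, p_buy, rng) for _ in range(n_trials)]
    return statistics.mean(totals), statistics.pstdev(totals)


mean_total, sd_total = monte_carlo(N_PEOPLE, P_BUY, N_TRIALS)

# Spread of the cohort total relative to its mean: small means the
# aggregate is predictable even though each person is a coin flip.
relative_spread = sd_total / mean_total

# Best achievable accuracy when guessing a single person's choice,
# if all we know is P_BUY: always guess the more likely outcome.
best_individual_accuracy = max(P_BUY, 1 - P_BUY)

print(f"buyers per cohort: {mean_total:.1f} +/- {sd_total:.1f}")
print(f"relative spread of the total: {relative_spread:.3f}")
print(f"best single-person accuracy: {best_individual_accuracy:.2f}")
```

Under these assumptions the cohort total varies by only a few percent from run to run, while no guess about a single member can beat the majority-outcome rate. The sketch also assumes every member behaves independently and with the same probability, which is exactly the homogeneity assumption the post questions.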

Has “big data” already worked? Yes. It was a contributing factor to Obama’s win last week.

Is big data a cure for cancer? No. (Although ironically big data is likely to yield large dividends in medicine).

Can big data predict the actions of an individual? No. It’s not designed to. It can help put you into a group where certain behaviors are more likely but certainly not guaranteed.

Are many discussions about big data definitional? Yes. Starting with finding the boundary between regular sized data (Grande) and big data (Venti). It’s like trying to define the edge of the atmosphere and the beginning of space.

Why is big data (like Hansel in Zoolander) so hot right now? It means big money.

Great post. My 2-cents on the subject (and if Steve goes to Purgatory, I’ll be the one locked up in the next room in a straitjacket):

“Big Data assumes a deterministic view of the world, that a person’s behavior is sufficiently consistent and that the determinants of that behavior operate in a homogeneous manner across individuals of some cohort; that you can actually build a model of that behavior and use it for prediction.”

If we go to the root and take it upward, the accuracy increases. What I mean by this is that if we go as deep down as we can possibly go – I wish we could go microscopic! – we end up with a national trade balance and GDP, from which Moody’s, S&P and Fitch generate their ratings that affect millions of consumers. This is the lunacy of it all and my greatest fascination. A sun storm brings certain elements to the earth that trigger a boost in plant growth. Food prices drop; more residual income, retail thrives. If we correlate sun storms, agricultural output and currency rates, we can say with some accuracy what hedge funds will do and when they will act, which again tells us how they will position themselves against negatively correlated series. If we know the extent of the hedge fund move and their combined weight (i.e. trade volume), we can determine what effect that will have on consumers worldwide. The root is elemental, the first layer food, the second shelter, and the third impulsive shopping. To me, it seems that so-called big data is too preoccupied with this last step, which is nothing but a result of a string of precursors.

“Big Data assumes sufficient acquisition of the causal factors involved in a decision and this may not be true. In short, Big Data only works if it has all the relevant information, and it may never have that if consumer activists and privacy opt-in initiatives prevail.”

Until it goes atomic or even sub-atomic (energy and quarks), Big Data is an educated guessing game. It should be more accurate than what we’ve had so far, but I seriously think we need to map out cosmic energy flow before we get some degree of accuracy. Do Northern Lights affect consumers? Sure! They buy heavy coats and gas to get to the highlands … or even airline tickets to get to the Arctic! And the Aurora is caused by solar storms.

“Big Data assumes we know how to ask the proper questions and this may not be true.”

In my opinion, true Big Data – atomic or sub-atomic – yields a result without a question. It becomes a fact. No one asks the question: ‘If I leap off the Hoover dam, will I get smashed?’ Newton kinda made the question moot. The way I see it, Big Data will eventually metamorphose into facts we can’t even comprehend today, which is probably why we can forget formulating questions at this stage. Big data is NOT Big data yet, but it’s getting there.

“Will Big Data have some big wins?”

I want Big Data to prove that only by operating as a unity can we attain the level of growth and prosperity we all want, where natural resources are balanced globally and the entire planet and its population is in equilibrium as far as material gratification is concerned. Why does a fraction of the world have to suffer so another fraction may prosper? To me, real Big Data addresses the inter-connectivity of everything that takes place in this semi-closed system we call Home and proves that only by operating as a single mechanism (ants!) can we really start talking about progress and prosperity.

I think your piece is based on a false premise – which is “Big Data assumes….”

Big Data doesn’t assume anything. Researchers assume. People assume. Writers assume. Big data is just that, big data. There is lots of data out there, and more is generated every second. How people use this data, and whether there are people with sufficient skills and tools to analyze the Big Data remains an interesting question.

In my opinion, the tools and the skills are lagging behind (in general), but there’s rapid movement on the tools end at least, to catch up. Example – Google Analytics is a free tool that can crunch millions of records with multiple variables in seconds. That’s “big data” made small. I can use the tool to learn the effectiveness of an action – e.g. 1) write blog 2) promote blog 3) track who visits blog 4) track who takes additional step related to blog (e.g. subscribe to blog).

I’ve read a few posts recently that are railing against this thing called “Big Data” and I wonder whether it is a reaction against the popularization of the term, a lack of understanding of the issues related to it, or fear that the shift toward “Big Data” will create a professional challenge for some market researchers because they lack the skills that data analytics requires.

My take – There is lots of data out there that companies have at their fingertips that goes unanalyzed. Why? There is a lack of skilled people to analyze it. The Big Data discussion should focus on collaboratively educating each other about how to take on the challenge and share techniques, strategies and tools that work. I don’t get the Chicken Little “Sky is Falling” stuff that’s out there right now. It doesn’t make sense to me.