Why I don’t buy the Big Data “Red Hat of Hadoop” Story
September 19, 2013

It’s official. As of 19 August 2013, Big Data has passed the “peak of inflated expectations” and is hurtling down the rollercoaster ride to the “trough of disillusionment”. Hold on tight, boys and girls, this will be a white-knuckle ride. This is the moment when the rollercoaster crests the top: you sense the change, and then you hear the screams as you hurtle down.

It’s official because Gartner, which coined the hype cycle, declared it so. Gartner may be right or wrong, but it’s certainly official (and official does impact reality in Enterprise-Land).

It’s not your first time on a rollercoaster, so you knew the top was coming, right? Plenty of smart folks have been predicting this. Robin Bloor, for example, was articulate, analytical and ahead of the curve on this back in 2012.

The problem with the Big Data hype phase was that most of the energy went into selling technology (lots of servers, tools and consulting) and inflating the valuations of VC-backed companies; much less went into solving real business problems.

There is clearly a lot of value in data, whatever qualifier you put around it: Big Data, Total Data (451’s take on this), Realtime Data, etc. The value is real; it just may not be as big as it was sold to be.

The hype story was built around the use of Big Data by firms like Google and Facebook. Who would not want their success, and who would dispute that they are Big Data companies? Case closed, load up the truck with Hadoop-powered servers! Not so fast. These “born digital” companies are fundamentally different from your average enterprise. They don’t have rooms full of analysts poring over the world’s digital exhaust. They let their own platform-specific ad-serving engines do that.

So who does have “rooms full of analysts poring over the world’s digital exhaust”, finding correlations and insights in my tweets, emails, Skype calls, Likes and Amazon purchases? Cough, err, the NSA? Sure enough, a recent big round for Cloudera was led by In-Q-Tel, a fund that spots cool tech that the intelligence services could use.

Thankfully, George Orwell’s 1984 has not quite come true yet; there are still some educated people who do not work for Big Brother. So Big Data vendors have to find other companies to sell to.

One reason that narrowly defined Big Data is falling into the trough of disillusionment is that we are seeing lots of fierce battles in a relatively small market over who gets to be “the Red Hat of Hadoop”. There are three high-profile pure-play startups, each with their own Hadoop distribution: Cloudera, Hortonworks and MapR. OK, so Hadoop is not as big as Linux, but surely it’s still big enough to support one big IPO winner and a couple of acquisitions that make founders and investors rich? I see three problems with this story:

1. A couple of big fierce dogs, Intel and EMC, are already competing with the upstarts.

2. The enterprise gorillas have by now figured out how to play the open source game, so IBM, Oracle and Teradata are also incorporating Hadoop into their core offerings.

3. The breakthrough killer apps for Hadoop will be created by startups, and they won’t pay for a supported distribution. Hadoop’s Apache license is more permissive than the GPL that covers Linux, so many companies don’t need the paid version.

The other reason that Big Data is hurtling down the rollercoaster ride to the “trough of disillusionment” is that the fundamental premise of storing the world’s digital exhaust on your own servers goes against the grain of the Internet, which at its core is a distributed peer-to-peer architecture. The IT industry needs the Global 2000 to buy lots of servers, plus the related software and consulting. Commoditization in this market is brutal, driven by the server purchases of the mega-big born-digital ventures: Google, Facebook, Amazon, Microsoft. Enterprise IT sales teams need a story to hang their pitch to the Global 2000 CIO on. That story has been: “If you want to be like Google and Facebook, you need to buy lots of Hadoop servers.” It is breaking down as companies search for that elusive ROI and discover that they have “yet another data warehouse” (albeit cheaper than the old ones) or, more derisively, “digital landfills”.

One company that gets lumped onto the Big Data pile is Splunk. Their valuation is monstrous, over $6 billion at last look, and the stock is on a tear. They have benefited from the hype, and their stock will be hammered as Big Data hurtles down the rollercoaster ride to the “trough of disillusionment”. However, I think their value proposition is fundamentally different from that of the “Red Hat of Hadoop” distro players. Splunk did the equivalent of finding small change in the back of millions of sofas: all those log files hold treasure troves of data if you aggregate them intelligently. They used this to gain market entry and may well end up as “last man standing” as the market goes pear-shaped and they find the other ponies in the Big Data manure.

Another venture that gets lumped onto the Big Data pile, but which I think will continue to thrive, is Tableau, for the simple reason that they are about the consumption of Big Data, and consumption is the key to climbing out of the Slough of Despond and up the Slope of Enlightenment to the Plateau of Productivity.

Tableau is cool (I used it and loved it when I was COO at ReadWriteWeb in 2009), but I think we are only at the very beginning of figuring out how to intelligently consume the world’s digital exhaust.