How and Why Companies Choose Hadoop over EDW

Petabyte-scale enterprise data warehouses (EDWs) can already handle the massively parallel processing and high-performance data management that Big Data analytics requires, according to Forrester analyst James Kobielus. So why, with a tried-and-true solution already available, are companies so enchanted by Hadoop? That's the question he raises in a recent Information Management column.

Did they even consider high-end EDW solutions, which already come with the enterprise-friendly features IT loves?

Kobielus conducted case studies on Hadoop deployments for an upcoming Forrester report and found that, yes, most companies had considered a traditional EDW but settled on Hadoop for a number of reasons. Primarily, those reasons relate to Hadoop's open-source license:

Lower cost because they don't have to pay hefty proprietary licensing fees

More flexibility because they can modify the code

Access to "leading-edge innovations" from the Hadoop community

But Kobielus also sees another core reason for Hadoop's adoption: its vendor-agnostic approach to in-database analytics, coupled with MapReduce's powerful analytics framework, makes it perfect for the cloud EDW. He writes:

The bottom line is that Hadoop is the future of the cloud EDW, and its footprint in companies' core EDW architectures is likely to keep growing throughout this decade.

And what will companies do with Hadoop in the cloud? Expect them to do pretty much what they're already doing with Hadoop on the ground. Kobielus identified three popular use cases:

Use Case 1: Hadoop as a staging area to extract and transform large volumes of data of every kind (structured, unstructured and semi-structured) before loading it into an EDW or terabyte-scale analytical data marts.

The key here is in the variety of data formats. Unstructured and so-called semi-structured data can include anything from spreadsheets to event logs and video. It's a growing issue for enterprises. In a recent survey of 446 data managers and professionals, 78 percent of respondents said they're seeing more unstructured data, but even more telling is that more than half believe unstructured data will eclipse structured data in their organizations within the next 10 years.

Use Case 2: Hadoop as an event analytics layer, which basically means analyzing petabyte-scale event logs to optimize IT operations, detect fraud, spot trends across social networks and so on. Kobielus provides a long list, but you get the idea.
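To make the event analytics idea more concrete, here is a minimal, single-process Python sketch of the map/shuffle/reduce pattern that MapReduce jobs follow, applied to a toy event log to flag possible fraud. It is not Hadoop's API: the log format, the LOGIN_FAILED event name and the threshold of three failures are hypothetical assumptions for illustration. On a real cluster, the framework would distribute the map and reduce steps across nodes and perform the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log format (an assumption for this sketch):
#   "<timestamp> <user_id> <event_type>"
SAMPLE_LOG = [
    "2011-06-01T10:00:01 alice LOGIN_FAILED",
    "2011-06-01T10:00:05 alice LOGIN_FAILED",
    "2011-06-01T10:00:09 alice LOGIN_FAILED",
    "2011-06-01T10:01:00 bob LOGIN_OK",
    "2011-06-01T10:02:00 bob LOGIN_FAILED",
]


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (user_id, 1) for each failed login; silently skip malformed lines."""
    parts = line.split()
    if len(parts) == 3 and parts[2] == "LOGIN_FAILED":
        yield parts[1], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapped values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the per-user counts."""
    return key, sum(values)


def run_job(lines: Iterable[str], threshold: int = 3) -> tuple[dict[str, int], list[str]]:
    """Run map -> shuffle -> reduce, then flag users at or above the
    (hypothetical) failed-login threshold."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    totals = dict(reduce_phase(k, v) for k, v in grouped.items())
    flagged = sorted(user for user, count in totals.items() if count >= threshold)
    return totals, flagged


if __name__ == "__main__":
    totals, flagged = run_job(SAMPLE_LOG)
    print(totals)   # {'alice': 3, 'bob': 1}
    print(flagged)  # ['alice']
```

Because each map call looks at one line in isolation and each reduce call sees only one key's values, both steps can run in parallel across many machines, which is what lets the pattern scale to the petabyte-size logs Kobielus describes.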

Use Case 3: Hadoop as a content analytics layer. This is the one area he mentions where Hadoop enjoys a true competitive differentiator, thanks to MapReduce's modeling layer.

This is the use case sales and marketing will love, because it will allow them to skim "customer intelligence from Twitter, Facebook," and integrate "it all with segmentation, churn, and other traditional data-mining models," he writes, explaining, "MapReduce provides the abstraction layer for integrating content analytics with these more traditional forms of advanced analytics, and Hadoop is the platform for MapReduce modeling."

Kobielus says the Hadoop industry still has to address a few issues before the technology sees widespread adoption. First, he writes, Hadoop vendors must "focus on what their up-and-coming approach does better than EDWs, or does best within the context of a traditional EDW architecture."

And second, as he recently told CIO.com, he'd like to see rich IDEs (integrated development environments) for Hadoop, a gap he expects data integration vendors to move quickly to fill.