Is there a buck in Apache Hadoop?

Is there a buck (or two) in Apache Hadoop, the open-source framework for crunching gargantuan amounts of data stored on large hardware clusters?

If you’re one of the several IT vendors offering some sort of Hadoop-based platform, the answer is evidently “yes.” Whether through actual software sales or merely the fees related to support, it’s clear those companies intend to make some money off the rising interest in Hadoop. EMC recently jumped in with its Pivotal HD; Intel released a Hadoop distribution; Hortonworks and Cloudera recently announced additions to their own Hadoop offerings—and that’s just in the last month.

And it seems a safe move, considering how many analysts believe that Hadoop will remain popular for some time to come. For example, research firm IDC suggested last May that the Hadoop market could hit $812.8 million by 2016—a conservative estimate, in comparison to the billions that other analysts believe the framework will earn for various companies over the next few years.

But other analysts see a schism of sorts developing in the market. On one side of that divide are all the companies using Hadoop as the basis for a proprietary product of some sort; on the other is Hadoop as “pure” open-source technology. When an EMC executive recently indicated that his company was “all in on Hadoop,” 451 Research analyst Matt Aslett took to the blogs with a pithy quote differentiating the two:

“What does it mean to be ‘all in on Hadoop’? Based on a strict reading of Defining Apache Hadoop (a document that demands by its own words to be read strictly), being ‘all in’ on Hadoop means only one thing: being “all in” on Apache Hadoop.”

He added, a little further down:

“It is a matter of defining what users understand Hadoop to be, and what they understand it not to be. It is a matter of drawing a line between Hadoop—Apache Hadoop—and additional, proprietary, functionality beyond the scope of the project.”

Even with lots of companies jumping aboard Hadoop, hoping it’ll lead to fortune (or at least higher revenues), the framework’s open-source nature could blunt those efforts at commercialization: IDC, in its 2012 research note, suggested that competition between open-source and proprietary offerings will lower revenues for the latter.

“The Hadoop and MapReduce market will likely develop along the lines established by the development of the Linux ecosystem,” Dan Vesset, vice president of Business Analytics Solutions for IDC, wrote in a statement at the time. “Over the next decade, much of the revenue will be accrued by hardware, applications, and application development and deployment software vendors.”

Which leads to the next question: given the increasing number of Hadoop-based platforms filling the market, are we in a Hadoop bubble?