I'm the vice president of strategy at Silicon Valley Data Science. I was the founding chair of O'Reilly Strata, and founding editor of the journal Big Data. I create conferences, software and words. Current interests include data, the Internet of Things and the nature of work.

The Data Lake Dream

In 2013, I spent a lot of time talking about Hadoop’s development towards being a central destination for data. Hadoop may enter an organization for a specific use case, but data attracts data. Once in the door, Hadoop tends to become a center of gravity. This effect is amplified by the appeal of big data being not just about the data size, but the agility it brings to an organization.

However, to fill this role, Hadoop needs more than just a data crunching engine and a small army of willing Java programmers. It must become an enterprise platform that supports application development. By the end of 2013, the major Hadoop vendors had all formulated a platform strategy, be it the Cloudera Enterprise Data Hub or the Hortonworks Data Platform.

One phrase in particular has become popular for describing the massing of data into Hadoop, the “Data Lake”, and indeed, this term has been adopted by Pivotal for their enterprise big data strategy.

But what do the big data vendors mean by this?

The data lake dream is of a place with data-centered architecture, where silos are minimized, and processing happens with little friction in a scalable, distributed environment. Applications are no longer islands; they exist within the data cloud, taking advantage of high-bandwidth access to data and scalable computing resources. Data itself is no longer restrained by initial schema decisions, and can be exploited more freely by the enterprise.

I call it a dream, because we’ve a way to go to make the vision come true. It is, however, an accessible dream.

I’ve set out to describe the four levels of Hadoop maturity that lead us to the dream of the data lake. From these levels we can see where today’s Hadoop vendors are, and understand where our own organizations sit.
