“One of the hardest things for organizations to get their head around is getting data in the first place,” Dumbill told O’Reilly’s Mac Slocum. “A lot of CIOs will be, ‘Great, I want to do data science but I’ve got this database over here and this one over here and these all need to speak to each other and they’re in different formats and so on.’ In many ways, having data in a data lake provides you with a foundation (with) which you can start to integrate data with and then make it accessible as a building block in an organization.”

By creating a data lake, whether internally or in the cloud, you can make the data in it more widely available for business intelligence or data analytics via APIs or data services, he said. In this way, data becomes the “building block of new things” and changes application development.

“I think it’s something the whole industry is going to come to, and the reason I’m kinda getting behind it is, it necessitates a change of thinking about the way you build applications,” Dumbill said. “We talk about data as a raw resource. The data lake as a technology helps us to focus on it and then we think about data and what we can make from it and how it can help the business instead of thinking I need to buy tool A, buy program B and use application C.”

My research for a recent Enterprise Apps Today article, “The Down Low on Data Lakes,” leads me to believe that most experts agree with Dumbill but urge caution because the concept is still so immature. As Teradata’s GM Dan Graham told me:

“It's still so new there's more worst practices than there are best practices right now. There are just not enough repeatable implementations. In fact, the vision of the data lake is not exactly harmonious across the vendors and the customers.”

There’s also the question of how you keep data lakes from becoming data swamps. Dumbill acknowledges this problem in the podcast, noting that you “have to get search right,” but experts cite further concerns about:

Security

Data lineage for auditing and compliance

Data quality

Data governance

Right now, there are no established answers to these questions, and finding them may take the next five years.

“It's going to be a five-year journey to get this nailed down to where it's really humming along and doing what everybody wants it to do,” Graham said. “They may believe the data lake vision, but a fraction of them are actually building it right now.”

Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.
