My latest example of a data lake user -- coming in a story next week -- is a major drug manufacturer that is running Hortonworks on Amazon's cloud. The deployment lets it study all data related to the production of drugs -- 10+ years' worth of information, including plant-floor temperature and pressure sensor readings -- in order to spot yield variations by batch and determine how to optimize production yield. It's the sort of lighthouse implementation that shows others the way and makes adoption much more likely in the five-year time frame than the 15-year time frame. Stay tuned for the full story next week.
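As a rough illustration of what that kind of batch analysis looks like, here's a minimal sketch in Python/pandas. The file name and columns (batch_id, temperature, pressure, yield_pct) are hypothetical stand-ins, not details from the actual deployment:

```python
# Hypothetical sketch of spotting yield variation by batch from sensor data.
import pandas as pd

# One row per sensor reading, tagged with the production batch it belongs to.
readings = pd.read_csv("plant_floor_readings.csv")

# Collapse readings to per-batch process conditions plus the batch's yield
# (yield_pct is assumed to be repeated on every row for a given batch).
batches = readings.groupby("batch_id").agg(
    mean_temp=("temperature", "mean"),
    mean_pressure=("pressure", "mean"),
    yield_pct=("yield_pct", "first"),
)

# Flag batches whose yield falls well below the norm, then check whether
# process conditions correlate with the shortfall.
cutoff = batches["yield_pct"].mean() - 2 * batches["yield_pct"].std()
print(batches[batches["yield_pct"] < cutoff])
print(batches.corr()["yield_pct"])
```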

The estimates are high, but we'll get there eventually -- in 15-20 years? The data lake is an aspirational architecture, for bleeding-edge companies only right now. We'll see slow evolution for some time to come.

I'm told that Cloudera and Hortonworks are each logging deals at a rate of 50 to 70 per quarter. Throw in all the other Hadoop distributors and call it 500 to 700 per year. That's at current rates, which we can assume will accelerate. In 2013, for instance, sales teams at HP, Microsoft, SAP, and Teradata added Hortonworks' distribution to their portfolios, and we can confidently guess that having more feet on the street will increase sales. Cloudera notes that deal sizes and deployments are growing. And let's also assume that these suppliers are pursuing the biggest companies possible.

With these stats, it won't take long to see broad adoption among the Global 2000. From there, the question is how much serious, production-grade work these companies are doing on the platform and to what degree they are moving existing workloads onto Hadoop. In 2014 we're still in the early days. By 2020, I think the data-management arena is going to be very different.
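To spell out that arithmetic, here's a rough sketch in Python. The deal rates are the figures quoted above; the caveat about repeat and non-Global-2000 buyers is my own assumption:

```python
# Back-of-the-envelope math behind the adoption claim.
per_vendor_quarterly = (50, 70)      # Cloudera and Hortonworks, each, reportedly
two_vendor_annual = tuple(d * 2 * 4 for d in per_vendor_quarterly)  # (400, 560)
all_vendor_annual = (500, 700)       # rounding up for the other distributors

# If every deal were a distinct Global 2000 company (it won't be -- many deals
# are repeat buys or smaller firms), saturation would take only a few years.
years_to_cover = tuple(2000 / d for d in all_vendor_annual)
print(two_vendor_annual)   # (400, 560)
print(years_to_cover)      # (4.0, ~2.9) years at today's rate, before any acceleration
```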

Hadoop is something like a big, undifferentiated reservoir. It collects data from everywhere. The water coming out of your faucet, on the other hand, is charcoal-filtered and chlorinated. Yes, there's a lot of water in the reservoir, but remember where the stuff that's good to drink comes from. Sorry, we're having a drought here in Calif.

The $50 billion figure is the one that sounds high. The Data Hub/Data Lake concept foresees Hadoop as a place where you store many types of data, including high-scale machine data (log files, clickstreams) that many companies haven't been collecting or keeping until recently. That you have copies of such data in a hub does NOT mean that you are necessarily replacing transactional systems or even data warehouses. It's a cheaper platform on which to gain new insights and perhaps replace some workloads. If the Hub has high-scale data and lots of historical data long since deleted from operational systems, it's not at all unrealistic to see 60%, 70%, 80% of the total data footprint in an organization being stored on Hadoop.
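For what it's worth, the arithmetic behind that range is easy to sketch. The volumes below are made-up placeholders, purely for illustration:

```python
# Illustrative arithmetic for the 60-80% claim -- not figures from any company.
warehouse_tb = 100      # curated, relational data in the warehouse
machine_data_tb = 300   # logs, clickstreams, sensor feeds kept on Hadoop
history_tb = 100        # older records long since purged from operational systems

hadoop_tb = machine_data_tb + history_tb
total_tb = warehouse_tb + hadoop_tb
print(f"Hadoop share: {hadoop_tb / total_tb:.0%}")  # 80% -- high-scale data dominates
```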

"Rob Bearden, CEO of Cloudera-rival Hortonworks, said that he expects see "60%, 70%, 80%" of enterprise data moving into Hadoop over the coming years." Does this sound high to you, Doug? Sounds pretty high to me given state of tools and Hadoop talent.
