We'll hear more from Koehler on the predictive part of this implementation at the March 31-April 1 InformationWeek Conference in Las Vegas. The SUN platform was an essential starting point because it provides far more data than was previously available to drive accurate forecasts. More data brings greater accuracy, so stepping up from millions of data points four times per hour to billions of data points 15 times per hour is making a big difference, according to the company.

I'm sure that closing half its data centers will make the cloud vs. in-house TCO balance sheet look pretty darn good for a few years. However, at that data volume, has Koehler done any projections out five or 10 years? 20 TB a day adds up, after all. And cloud providers thus far have not passed Moore's Law savings back to customers.
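For scale, the five- and 10-year projections are easy to sketch. Here's my own back-of-envelope calculation (the assumptions are mine, not the company's): a steady 20 TB/day with no compression, deletion, or growth in the capture rate.

```python
# Back-of-envelope storage projection for a steady 20 TB/day ingest rate.
# Assumptions (mine, not from the article): no compression, no deletion,
# and no growth in the daily capture rate.
TB_PER_DAY = 20
DAYS_PER_YEAR = 365

def cumulative_petabytes(years: int) -> float:
    """Total raw storage after `years` of retention, in petabytes."""
    return TB_PER_DAY * DAYS_PER_YEAR * years / 1000  # 1 PB = 1,000 TB

print(cumulative_petabytes(5))   # 36.5 PB after five years
print(cumulative_petabytes(10))  # 73.0 PB after ten years
```

Even at today's cloud storage prices, tens of petabytes of raw capture is a line item worth projecting before signing a long-term deal.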

A key metric for the new system is "time to live" (TTL -- as in the time the data has left to live, not "live" as in on the air). As I understand it, they don't need all that detail forever. The fine-grained detail is for accurate forecasting now, today. Once the weather is history, far less data is needed to retain a historical record, so they can delete information they don't need. If you want to know more, you can ask Bryson Koehler directly at the upcoming (March 31-April 1) InformationWeek Conference. He's one of two guests on our When Big Data Platforms Make Sense (And When They Don't) panel session.
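The article doesn't name the NoSQL database involved, but many such systems (Cassandra, Redis, and DynamoDB among them) support per-record expiry natively. A minimal sketch of the concept, using a hypothetical toy store rather than any particular product's API -- fine-grained observations get a short TTL, summaries a long one:

```python
import time

class TTLStore:
    """Toy key-value store where each record carries a time-to-live.
    A hypothetical illustration of the TTL concept, not any real
    NoSQL product's API."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry timestamp)

    def put(self, key, value, ttl_seconds):
        # Record the absolute time at which this entry stops being readable.
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: evict lazily on read, as many stores do.
            del self._data[key]
            return None
        return value

# Fine-grained detail expires quickly; a daily summary lives much longer.
store = TTLStore()
store.put("obs:2014-03-20T12:00", {"temp_c": 11.2}, ttl_seconds=0.05)
store.put("summary:2014-03-20", {"high_c": 14.0}, ttl_seconds=3600)
time.sleep(0.1)
print(store.get("obs:2014-03-20T12:00"))  # None -- the detail has expired
print(store.get("summary:2014-03-20"))    # the summary is still available
```

The design point is that deletion becomes a property of the data itself, set at write time, rather than a batch cleanup job someone has to run later.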

Weather Underground is collecting 20 TB a day but doesn't need to save it all. At the same time, what a contribution to weather history and the understanding of worldwide patterns if it did. Granted, Weather Underground is not a non-profit, and doing so might convert it into one. But it's the first time in history we've had that much information in hand -- so much that we don't know what to do with it, other than grab short-term results and throw the rest away, so to speak.

I'm not certain the Weather Company is throwing detail away, but it's not likely it's keeping everything given that the 20-terabyte-a-day capture is within an operational system running on a NoSQL database. This granular data feeds near-term forecasting. I suspect they require less detail for historical trend analysis.

This is another good question I'm going to ask Bryson Koehler during our Big Data panel at the March 31-April 1 InformationWeek Conference.