Problems with Big Data

Big Data is constantly in the news. We've been asked at SQLServerCentral to try to develop some articles, perhaps even a Stairway, to explain what Big Data is and how we might use it. I'm still trying to grasp the concepts myself, and unlike the amorphous cloud, I'm still looking for some good examples of what Big Data really is.

When I ran across this piece warning that Big Data isn't the final answer to all our questions in the world, I wasn't surprised. The piece notes that Google Flu Trends hasn't been very accurate in its predictions of outbreaks. At first glance, this lends credence to the idea that the good, solid data analysis and mining techniques we've used for years are just as good as any new Big Data fad.

However, as I read more of the piece, I saw that the problem isn't that Big Data and the analysis of large quantities of information are flawed; it's that a solid hypothesis matters. Researchers need to be willing to evolve their algorithms as they learn more about a problem. They should probably also assume their algorithms are incorrect until those algorithms have proven they can predict actual trends over some period of time.

We'll constantly be searching for ways to better interpret information and make better decisions. No new technology or product is going to magically solve our problems. A good, solid understanding of the problem domain will continue to matter as much as the data itself.