Big Data Big Deal

Thursday, September 20, 2012

If you have been following business intelligence or data visualization at all, you may have noticed that almost everywhere you turn you run into Big Data. In tweets, blogs, and articles, people seem to be scrambling over each other to discuss it and work with it. As Stephen Few points out in his article below, Big Data isn't exactly new.

Data did not suddenly become big. While it is true that a few new sources of data have emerged in recent years and that we generate and collect data in increasing quantities, changes have been incremental—a matter of degree—not a qualitative departure from the past. Essentially, “big data” is a marketing campaign.

Like many terms that have been coined to promote new interest in data-based decision support (business intelligence, business analytics, business performance monitoring, etc.), big data is more hype than substance and it thrives on remaining ill defined. If you perform a quick Web search on the term, all of the top links other than the Wikipedia entry are to business intelligence (BI) vendors. Interest in big data today is a direct result of vendor marketing; it didn’t emerge naturally from the needs of users. Some of the claims about big data are little more than self-serving fantasies that are meant to inspire big revenues for companies that play in this space. Here’s an example from McKinsey Global Institute (MGI):

MGI studied big data in five domains—healthcare in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal-location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US healthcare were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US healthcare expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal-location data could capture $600 billion in consumer surplus.

If you’re willing to put your trust in claims such as a 60% increase in operating margin, a $300 billion annual increase in value, an 8% reduction in expenditures, and a $600 billion consumer surplus, don’t embarrass yourself by trying to quantify these benefits after spending millions of dollars on big data technologies. Using data more effectively can indeed lead to great benefits, including those that are measured in monetary terms, but these benefits can’t be predicted in the manner, to the degree, or with the precision that McKinsey suggests.

When I ask representatives of BI vendors what they mean by big data, two characteristics dominate their definitions:

New data sources: These consist primarily of unstructured data sources, such as text-based information related to social media, and new sources of transactional data, such as from sensors.

Increased data volume: Data, data everywhere, in massive quantities.

Collecting data from new sources rarely introduces data of a new nature; it just adds more of the same. For example, even if new types of sensors measure something that we’ve never measured before, a measurement is a measurement—it isn’t a new type of data that requires special handling. What about all of those new sources of unstructured data, such as that generated by social media (Twitter and its cohorts)? Don’t these unstructured sources require new means of data sensemaking? They may require new means of data collection, but rarely new means of data exploration and analysis.