Hi, and welcome to my blog.
I have been working for IBM and working with DB2 for the past 22 years, and I recently started to work with our new colleagues from Netezza. Although I work for IBM, the views expressed are my own and not necessarily those of IBM and its affiliates. The views and opinions expressed by visitors to this blog are theirs and do not necessarily reflect mine.

Monday, July 16, 2012

Adding a 4th V to BIG Data - Veracity

I talked a week or so ago about IBM’s 3 V’s of Big Data. Maybe
it is time to add a 4th V, for Veracity.

Veracity deals with uncertain or imprecise data. In traditional
data warehouses there was always the assumption that the data was certain,
clean, and precise. That is why so much time was spent on ETL/ELT, Master Data
Management, Data Lineage, Identity Insight/Assertion, etc.

However, when we start talking about social media data
like Tweets, Facebook posts, etc., how much faith can or should we put in the
data? Sure, this data can count toward your sentiment analysis, but you
would not count it toward your total sales and report on that.

Two of the now 4 V’s of Big Data are actually working
against the Veracity of the data. Both Variety and Velocity hinder the ability
to cleanse the data before analyzing it and making decisions.

Due to the sheer velocity of some data (like stock
trades, or machine/sensor generated events), you cannot spend the time to
“cleanse” it and get rid of the uncertainty, so you must process it as is,
understanding the uncertainty in the data. And as you bring multi-structured
data together, determining the origin of the data and which fields correlate
becomes nearly impossible.

When we talk about Big Data, I think we need to define trusted
data differently than we have in the past. I believe that the definition of
trusted data depends on the way you are using the data and applying it to your
business. The “trust” you have in the data will also influence the value of the
data, and the impact of the decisions you make based on that data.
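
To make the idea of use-dependent trust a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the source names, the trust scores, the purposes, and the `usable_for` helper are all hypothetical and not taken from any IBM product. It only shows that the same record can be good enough for one use (sentiment) and not for another (reported sales).

```python
from dataclasses import dataclass

# Hypothetical trust score per data source (0.0 = no trust, 1.0 = fully trusted).
# These numbers are illustrative assumptions, not measured values.
SOURCE_TRUST = {
    "order_system": 1.0,
    "sensor_feed": 0.8,
    "social_media": 0.3,
}

# Hypothetical minimum trust that each use of the data requires.
REQUIRED_TRUST = {
    "financial_report": 1.0,
    "sentiment_analysis": 0.2,
}


@dataclass
class Record:
    source: str
    payload: str


def usable_for(record: Record, purpose: str) -> bool:
    """Return True if the record's source is trusted enough for this purpose."""
    trust = SOURCE_TRUST.get(record.source, 0.0)  # unknown sources get no trust
    return trust >= REQUIRED_TRUST[purpose]


if __name__ == "__main__":
    # Made-up sample records.
    records = [
        Record("order_system", "sample order"),
        Record("social_media", "sample tweet"),
    ]
    for purpose in REQUIRED_TRUST:
        accepted = [r.source for r in records if usable_for(r, purpose)]
        print(purpose, accepted)
```

In this sketch the social media record counts toward sentiment but is kept out of the financial report, which is the distinction drawn earlier in the post. A real system would presumably need something richer than one fixed score per source.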

Great to see others finally coming to appreciate the "V"s of Big Data that Gartner first defined over 15 years ago, albeit without the professional courtesy of attributing them to us. Note however that only the original 3Vs I first identified back then are definitional qualities of Big Data. Other "V"s that people (cleverly?) add are not measures of magnitude. And value is an aspirational attribute at that. To see my original 2001 piece on the 3Vs: http://goo.gl/wH3qG. To see what Batman thinks of those being cute by adding other Vs: http://blogs.gartner.com/doug-laney/batman-on-big-data/. --Doug Laney, VP Research, Gartner, @doug_laney