Data, Data Everywhere

Like the Ancient Mariner who had “water, water everywhere, but not a drop to drink,” we have data, data everywhere, but not the understanding we need about it.

What is it?

What does “big data” mean? What defines it? Wikipedia focuses on the volume and speed of data. IBM talks about the 4 Vs – volume (size), velocity (how fast it moves), variety (how diverse the ways are in which it’s put together), and veracity (the question of what to believe). But that leaves out how complex the data is and what you’re doing with it. “Big data” is more about how you think of the data and its use, and the set of skills you have in place to work with it.

When we don’t know how to handle something, we give it a name, so we labeled the situation “big data,” but the words do not in fact describe what we mean. What we need to do is figure out how to use data. That’s the starting point for addressing the “big data” conundrum. Data collection and creation follow the 4 Vs. But the other side of the coin is how you use data and turn it into monetizable opportunities. As the world becomes more digitized, data velocity becomes intense, and creating new ways to use data becomes critical.

“Big data” means pulling multiple sources together and looking at the result. In the old days, we looked at “small data.” Volume and speed are not good ways to define it. The key is how we understand the data and take our analysis of it to the next level. Today we have better data competency, a better understanding of what data is supposed to tell us.

Combining data sets used to be difficult. Today, automated systems make data collection easy. The challenge becomes curation: selecting and combining data sets in ways that make sense. And yes, the data has to be accessible. It’s not enough just to capture and store it.

Data is like oil. Risky. Hard to extract. But, once refined, valuable. (Courtesy of Charlie Schick!)

How do you differentiate signal from noise?

The volume today is too great to eyeball data and find meaning. We need different tools. The kinds of questions we ask about the data haven’t changed. According to Asaf Evenhaim, what’s new is our ability to process the new volume of data. Visualizations can make the difference. They help us see patterns that show relationships, i.e., the signal within the noise.

With big data, it’s all noise until you find the signal, said Greg Jackson. All of it has value. What you have to do is discern the hypothesis in order to find the signal. Research used to start with a hypothesis; now we start with data, and that drives the hypothesis.

How important is domain knowledge?

Domain expertise is crucial to get value from data – to understand the business and have the ability to manipulate the data