Defining Big Data

“With big data, you can increase your margin by 60 percent.” “Big data means I can track everything my customers do and store it forever.” It sounds brilliant, doesn’t it? But admit it: most of us still don’t know what it means in practice.

When I first heard the phrase “big data” in 2011, I didn’t think, “Hey, that sounds cool”; I cynically thought, “Here we go again.” The big technology companies had come up with a new tagline to hook us all into thinking we needed to replace our old systems with new, faster, more expensive servers. I worried that the analytics industry was about to go through another cycle of technology-led implementations instead of thinking imaginatively and insightfully about how to use the data we already have. But maybe there is still time for us to grab the phrase and define what it should mean. Here are some thoughts:

Is big data about getting a huge new server to process billions of records and petabytes of data? No. One of the great things about the discussion around big data is that it gets people talking about the huge increases in computing power. But what if that focus were on the laptop you’re reading this column on, rather than on some server in the basement that only the grumpy IT team knows how to use? Most laptops are now powerful enough to process millions of records, which frees analysts from depending on major database systems to analyze their customers’ behavior. This democratization of data should mean that more people in your company can mine the data you already own. Invest in an analytical tool like SAS or SPSS, or even good old Excel, get a data extract onto your laptop, and start mining.
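As one illustration of “get a data extract onto your laptop and start mining,” here is a minimal Python sketch, offered as a stand-in for the SAS, SPSS or Excel workflow the column mentions rather than as a recommended tool. It assumes a hypothetical CSV extract with columns `customer_id` and `order_value`. The function names are likewise hypothetical. The sketch streams the file row by row with the standard library and totals spend per customer, so memory use grows with the number of customers, not the number of records.

```python
import csv
import io
from collections import defaultdict


def revenue_per_customer(rows):
    """Sum order values per customer from an iterable of dict rows.

    Assumes hypothetical columns 'customer_id' and 'order_value'.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += float(row["order_value"])
    return dict(totals)


def top_customers(path, n=10):
    """Stream a CSV extract from disk and return the n highest-spending customers."""
    with open(path, newline="") as f:
        # csv.DictReader yields one dict per row, so the file is never loaded whole.
        totals = revenue_per_customer(csv.DictReader(f))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Small in-memory example standing in for a real extract.
sample = io.StringIO("customer_id,order_value\nA,10.5\nB,3\nA,4.5\n")
print(revenue_per_customer(csv.DictReader(sample)))  # {'A': 15.0, 'B': 3.0}
```

On a real extract you would call something like `top_customers("orders_extract.csv")`, where the file name is only a placeholder.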

Does big data mean I can collect every bit of information about my customers? Storage isn’t expensive, after all, and I can use the cloud to keep all my data. No. The problem with this laissez-faire attitude to your data is that you fall into the trap of letting long-term development projects drive the analytics team instead of focusing on the here and now. If you’re storing everything, you aren’t placing value on the key customer data that can give you the edge over your competitors. It stops you thinking. It shifts the weight of your job toward data collection and technology and away from asking questions about the data that already exists. What’s the killer stat you need to put in front of your CEO so she remembers your name? Take a step back and think about what you need to calculate it. That’s the data you should be collecting, and more of it, not everything else.

Is big data about investment in infrastructure? Well, if it is, let that investment be in people and process, not technology. Let it be about an analyst investing time in thinking and in developing her data mining skills to find the value in data that’s already available to her. Web analytics is still dominated by technology people who can write brilliant tagging code but think long division is the summit of all mathematical achievement. If we’re going to make use of the data available right now, we need to invest in more data scientists and statisticians to lead the web analytics industry.

In two years’ time, we’ll look back and be able to say what big data came to mean. Let’s hope most of us will remember this as the year we were freed from being unable to access and analyze our own data, rather than as the year big data became the latest in a series of technology fads we blew our operations budget on.