Will the real big data please stand up?

Retailers by now know big data is a big deal, but its definition remains murky.

Here at Internet Retailer we see plenty of e-commerce jargon and marketing talk cross our PC monitors, but some buzzwords prove harder to translate than others. Big data is one of those. Everyone has heard of it, but not many believe they can define it—and those who do often disagree with each other on what it means.

At Wikibon, a professional online community of technologists from many industries, a manifesto about big data includes this definition:

"Underlying every business analytics practice is data. Traditionally, this means structured data created and stored by enterprises themselves, such as customer data housed in CRM applications, operational data stored in ERP systems or financial data tallied in accounting databases. But the volume and type of data now available to enterprises — and the need to analyze it in near-real time for maximum business value — is growing rapidly thanks to the popularity of social media and networking services like Facebook and Twitter, data-generating sensored and networked devices, both machine- and human-generated online transactions, and other sources of unstructured and semi-structured data. We call this Big Data."

The volume of this data, the text continues, is on the scale of petabytes and exabytes (2^50 bytes and 2^60 bytes, respectively) rather than yesteryear’s gigabytes and terabytes (2^30 bytes and 2^40 bytes). A byte is a unit of computer information made up of eight bits, each a zero or a one. Together the eight bits form a binary string that can take one of 2^8, or 256, different values, enough to encode a letter or a number.
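For readers who want to see those powers of two written out, here is a minimal sketch in Python. The names (UNITS, describe_units) are made up for illustration, and it assumes the binary definitions given above rather than the decimal ones (10^9, 10^12 and so on) that some storage vendors use.

```python
# Binary storage units named in the text, expressed as powers of two.
UNITS = {
    "gigabyte": 2**30,
    "terabyte": 2**40,
    "petabyte": 2**50,
    "exabyte": 2**60,
}

BITS_PER_BYTE = 8
VALUES_PER_BYTE = 2**BITS_PER_BYTE  # number of distinct 8-bit patterns


def describe_units():
    """Return one line per unit giving its exact size in bytes."""
    return [f"1 {name} = {size:,} bytes" for name, size in UNITS.items()]


for line in describe_units():
    print(line)

print(f"One byte can take {VALUES_PER_BYTE} distinct values")

# How many of yesteryear's gigabytes fit in one petabyte.
print(f"1 petabyte = {UNITS['petabyte'] // UNITS['gigabyte']:,} gigabytes")
```

Each step up the scale multiplies the previous unit by 2^10, or 1,024, so a petabyte holds a little over a million gigabytes.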

Retailers and even the vendors who help manage and analyze their data for targeted marketing, among other things, might not be so precise about the exact volume. But everyone is aware of the massive amounts of information produced by online behavior tracking, opt-in profiles at e-retail sites and social conversations—not to mention behind-the-scenes information around hardware, such as tracking hits to a particular server or monitoring global web traffic.

Therefore, we can all likely agree that big data in part refers to retailers having more data than they know what to do with. Even in the days of cataloging, retailers had access to tomes of information about their customers that informed their marketing campaigns. What seems to have changed is the approach. Today’s businesses are using tools and techniques originally developed by large online networks for solving problems on a large scale: accessing small sets of data from many sources quickly and processing them at the same time to produce near-instant responses. Think of Facebook’s news feed, which gathers stories from many sources in its network and then presents a small sample of those stories in response to a user’s Likes and preferences.

Richard Vermillion, CEO of analytics and marketing technology company Fulcrum, says big data does not necessarily imply a particular volume of data but a particular type, one that requires certain tools to cope with, such as parallel processing systems or MapReduce algorithms. For example, he says, a retailer with three million customers who wants to optimize 50 different offers for them must deal with 3,000,000 x 50, or 150 million, pieces of data, a much taller order than examining either the customers or the offers alone. “It actually might be small data, but the problem that you’re trying to solve blows it up to make it big,” he says.
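To make the arithmetic concrete, here is a rough sketch in Python of how the customer-and-offer problem multiplies, along with a toy version of the map-and-reduce pattern. It is not Fulcrum's method. The score function is a made-up stand-in for a real response model, all the names are hypothetical, and everything runs on one machine, whereas a real system would spread the map step across many.

```python
from functools import reduce

CUSTOMERS = 3_000_000
OFFERS = 50
PAIRS = CUSTOMERS * OFFERS  # every customer paired with every offer


def score(customer_id, offer_id):
    """Hypothetical stand-in for a real response model (a deterministic toy)."""
    return (customer_id * 31 + offer_id * 17) % 100


def map_step(customer_id, offer_ids):
    """Map: emit one (customer, offer, score) record per customer-offer pair."""
    return [(customer_id, o, score(customer_id, o)) for o in offer_ids]


def reduce_step(best, record):
    """Reduce: keep the highest-scoring offer seen so far for each customer."""
    customer_id, offer_id, s = record
    if customer_id not in best or s > best[customer_id][1]:
        best[customer_id] = (offer_id, s)
    return best


def best_offers(customer_ids, offer_ids):
    """Run the map step for each customer, then fold the records together."""
    records = (r for c in customer_ids for r in map_step(c, offer_ids))
    return reduce(reduce_step, records, {})


print(f"{PAIRS:,} customer-offer pairs in the full problem")

# Run the toy pipeline on three customers only, to keep the example quick.
sample = best_offers(range(3), range(OFFERS))
print(sample)
```

Neither list is large on its own. The 150 million records appear only once every customer is paired with every offer, which is the blow-up Vermillion describes.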

Probably, the definition of big data lies somewhere in the murkiness between the enormous volumes of data that a business captures and the even more enormous volumes of data generated in its subsequent analyses, which in turn require special tricks and tools.

Probably, the definition is not so important in the end.

What is important is that retailers are prepared to tackle the onslaught—consumers demand more personalization, relevancy and engagement from them every day, and, short of omniscience, that takes some number crunching.

As the Wikibon manifesto puts it:

"Make no mistake: Big Data is the new definitive source of competitive advantage across all industries. Enterprises and technology vendors that dismiss Big Data as a passing fad do so at their peril and, in our opinion, will soon find themselves struggling to keep up with more forward-thinking rivals."

"For those organizations that understand and embrace the new reality of Big Data, the possibilities for new innovation, improved agility, and increased profitability are nearly endless."

Nearly endless possibilities for profit? Sounds to me like what in earlier decades may have helped describe another elusive buzzword: Internet retail.

Now, as before, it’s a matter not of what a retailer has but of what it does with it. And what retailers have is data: growing, compounding, nearly endless, possibility-rich data.