Big Data Basics: What It Is, What It Means for Your Business

Big data can solve every business problem you have, transform society for the better, and possibly cure your fallen arches — if you believe the firestorm of ultra-positive hype on the subject.

In an effort to separate fact from fiction and get a grip on the reality of what big data can do for business, IBM Global Business Services conducted an extensive research project on the topic, in conjunction with the University of Oxford. The resulting report, titled “Analytics: The real-world use of big data” and subtitled “How innovative enterprises extract value from uncertain data,” was co-authored by Michael Schroeck, partner and VP of IBM Global Business Services and an IBM Distinguished Engineer.

In an interview with Travel Weekly PLUS editor in chief Diane Merlino, Schroeck covered the basics of big data, which the report describes as “a business imperative,” and examined its implications and applications in the travel industry. This is the first excerpt from the interview, covering the four “V”s that make up big data as defined by IBM: volume, variety, velocity and veracity.

Merlino: There’s a huge amount of buzz about big data these days, but what is it exactly?

Schroeck: We define it along what we call the four “V”s; this was consistent with what we learned from the survey respondents.

The first V is volume, which is what people typically associate with big data. That’s the massive amounts of information that organizations are being asked to deal with today. To put it in perspective, and give you a sense of how fast this is growing, it’s estimated that, worldwide, we have generated more data in the last 18 months than in all the previous years of history combined. One of the reasons is mass digitization and, clearly, social media is having a major impact.

A couple of other metrics to think about: Today there are over 100 billion Google searches every month. In 2007 that number was about 2.7 billion, so it’s gone from 2.7 billion to 100 billion per month in five years. Today there’s an hour of video uploaded to YouTube every second. There are 140 million tweets via Twitter every day. That gives you a sense of how these numbers are growing geometrically, and will continue to grow.

The challenge is, how can organizations do a better job of taking advantage of that information?

Merlino: So, volume is the first component of the big data definition. What’s next?

Schroeck: The second “V” is variety. That has to do with the variety of the information, specifically around unstructured data as opposed to the traditional focus of organizations on structured information: numbers, names, addresses, and so on.

In the new world of big data, organizations are looking at how they can harness unstructured or semi-structured data. This is information like e-mails, audio, or video, or some of the information I alluded to around social media like Facebook, Twitter or LinkedIn, to name just a few.

To bring some perspective, it’s estimated that about 80% of the information throughout the world is unstructured. Organizations see a tremendous opportunity associated with doing a better job of leveraging and analyzing that unstructured information.

Merlino: Everything associated with big data as you’ve defined it so far seems to be moving at the speed of light.

Schroeck: Well, the third “V” defining big data is velocity. The pace of change in today’s global economy is growing at an unprecedented rate. The demand for real-time information to keep up with that pace of change is an important driver here, specifically when you think about providing customers information in real time. Logistical information or real-time pricing and promotions require more ready access to information when a customer is on your Web site, or at a kiosk, or in front of an agent at an airport or a hotel.

To give you a sense of how velocity has the opportunity to change an entire industry, Amazon just rolled out a program where they guarantee same-day delivery for major markets, starting in the U.S. You can get on Amazon.com, order a product or book in the morning, and have it at your house, or wherever you want to ship it, in the afternoon. This is already having a major impact, and it will change the face of retail. We’re seeing the same kind of dynamics in many other industries, including travel and transportation.

Merlino: The fourth “V” in your big data definition is veracity, which seems quite different from the first three elements.

Schroeck: We’re bringing information in from a lot of different sources, many of which, like social media, are not controlled by organizations. How can we do a better job of ensuring the integrity and the trustworthiness of that information? The whole concept of governance and governing these new types of information has become very important.

As executives think about how to get more value from big data, they also need to understand the inherent uncertainty or ambiguity associated with that data. Examples include predicting the weather, or leveraging patient records to do a better job of prescribing treatments and anticipating outcomes. Regardless of the quality of the data, there’s a certain inherent uncertainty associated with those outcomes. It’s important for executives and users of that information to understand that.

Merlino: It seems to me that what you’ve done is to identify and then combine already-existing trends under this single category called big data. Is that so?

Schroeck: I think that’s right. To a large extent we see big data as a natural extension of the investments and commitment that organizations have been making over the last several years in things like data warehousing, business intelligence, or corporate performance management.

We see big data as a way to extend the value of those investments, and address some of the new trends and opportunities in the marketplace, like bringing unstructured information into those environments, and moving more towards real time.

As the study points out, we believe the right approach is a pragmatic one that starts out with the technologies you’ve already invested in, with the information you already have, and extends it from there.

Merlino: If it’s more of an extension of already existing initiatives, why is big data getting so much buzz now?

Schroeck: It’s a combination of things. Part of it is what’s going on in the marketplace.

In your industry, much research and analysis by customers has moved to the Internet. More sales are being driven through online shopping, and social media channels are being introduced and leveraged more and more. Customers often communicate about their experiences with a restaurant, a hotel, or an airline through social media. The ability of organizations to access that information — to know what their customers and potential customers are saying about their products and services, or about a new hotel they’ve opened, or a new promotion they’re running — has proven to be very valuable. That has clearly been one of the drivers around big data.

Another driver has been advancement in tools and technologies. While organizations have always aspired to do a better job of understanding their customers and anticipating customer behavior, the tools, technologies and processes required to do that were either not available or not practical. That’s changed, and now the tools and technologies exist to deal with the scale of the volume, and things like text analytics can deal with the variety. And they’re certainly more real-time in nature.

So, what’s happening in the marketplace, combined with the advancement of tools, technologies, and processes, has led to big data being at the top of executives’ agendas this year. We anticipate that will be the case for the next several years.

COMING UP: Is big data just for big businesses? How travel and transportation industries compare with other industries in the adoption of big data strategies.