Harnessing the Potential of Unstructured Data

A retailer analyzes the video feeds from its surveillance cameras to determine how customers are moving through stores and how much time they spend in front of different displays. After sifting through this information, the retailer rearranges the displays to guide customers to more popular products, leading to an uptick in sales.

A manufacturer collects historical data about earthquakes and overlays this information onto a map of its plants, allowing it to find holes in its disaster recovery plan that weren’t discernible before.

A multinational firm sponsoring a series of rock concerts learns from comments on social media sites that people are upset about a political stance taken by one of the bands scheduled to headline. The firm withdraws its sponsorship, avoiding a potentially costly PR snafu.

These all represent different ways that companies are gaining competitive advantages using nontraditional sources of information—what technologists refer to as “unstructured data.”

In the past, companies maintained data warehouses filled with “structured” data: information such as credit card transactions and customer records that was kept in transactional databases and was easy to organize, analyze and turn into reports.

However, the vast majority of the information that companies hold today is unstructured. That includes the surveillance video and social media comments described above, as well as tweets, satellite images, transcripts of contact-center conversations, voicemails and sensor data collected from highways, bridges and even refrigerators.

“There are so many new ways to capture data,” says Jeff Kaplan, managing director of THINKstrategies, a consulting firm in Wellesley, Mass. “Every time someone makes a keystroke on their smartphone or PC, they are creating data that has potential value to someone.”

Big Data: Complex and Opportunistic

It’s not surprising, then, that a third of data managers in one survey say the volume of unstructured information has already surpassed that of traditional relational data, or will within the next three years.

The September 2012 Aberdeen Group study “The State of Big Data” found that, on average, companies with more than five terabytes of data draw on 41 distinct data sources: 20 internal, 12 from business partners and nine from external sources. When companies launch Big Data projects, they tend to use many of these varied sources. The top sources for Big Data projects include transactional, structured data (95 percent), social and customer data (85 percent), clickstream data (83 percent) and internal, unstructured data (82 percent). Nathaniel Rowe, the Aberdeen Group analyst who conducted the study, was struck by how dominant a role traditional transactional data still plays in Big Data efforts. As he puts it, familiar information forms the core of companies’ analytic efforts, while additional data sources are used to augment, clarify or add insight.

Consider the retailer that used video surveillance footage to track customer movement in its stores. The insight came from combining the unstructured video data with the structured information generated by its point-of-sale cash registers. Indeed, many believe the Holy Grail of Big Data is to combine unstructured data wisely with structured information, such as customer survey responses, operational metrics and financial information.
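To make the idea of combining sources concrete, here is a minimal Python sketch. It assumes, hypothetically, that a video-analytics step has already reduced the footage to an average dwell time per display and that point-of-sale data has been summed to revenue per display. The function names, display IDs and figures are invented for illustration and do not describe the retailer's actual system.

```python
from dataclasses import dataclass


@dataclass
class DisplayStats:
    display_id: str
    avg_dwell_seconds: float  # assumed output of a video-analytics step (unstructured source)
    revenue: float            # assumed POS revenue for products on that display (structured source)


def combine(dwell: dict[str, float], sales: dict[str, float]) -> list[DisplayStats]:
    """Inner-join the two sources on display ID; displays missing from either side are dropped."""
    shared = sorted(dwell.keys() & sales.keys())
    return [DisplayStats(d, dwell[d], sales[d]) for d in shared]


def rank_by_revenue_per_second(stats: list[DisplayStats]) -> list[str]:
    """Order displays by revenue per second of average customer attention (assumes dwell > 0)."""
    ranked = sorted(stats, key=lambda s: s.revenue / s.avg_dwell_seconds, reverse=True)
    return [s.display_id for s in ranked]


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    dwell = {"A": 40.0, "B": 10.0, "C": 25.0, "D": 5.0}
    sales = {"A": 800.0, "B": 500.0, "C": 250.0}
    print(rank_by_revenue_per_second(combine(dwell, sales)))  # ['B', 'A', 'C']
```

With these sample inputs, display D is dropped because it has no sales record, and display B ranks first because it earns the most revenue per second of customer attention, even though customers linger longer at display A.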

The first forays into using unstructured data started nearly a decade ago. Companies would analyze frequently repeated words in warranty claims and service data, which allowed them to identify potential problems and take early action before the problems proliferated.
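The warranty-claim analysis described above can be sketched as simple term counting. The Python below is a minimal illustration under stated assumptions: the stop-word list, the `top_terms` helper and the sample claims are all hypothetical, and real text analytics would typically add stemming, synonym handling and tracking of term counts over time.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "of", "to", "on", "in", "after", "when", "it", "my"}


def top_terms(claims: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n terms mentioned in the most claims, ties broken alphabetically."""
    counts = Counter()
    for claim in claims:
        # set(): count each term at most once per claim, so one wordy claim cannot dominate.
        words = set(re.findall(r"[a-z]+", claim.lower()))
        counts.update(w for w in words if w not in STOP_WORDS)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


if __name__ == "__main__":
    # Invented sample claims for illustration only.
    claims = [
        "Door seal leaking after rain",
        "Water leaking near door seal",
        "Compressor noise when cooling",
        "Seal cracked and leaking",
    ]
    print(top_terms(claims))  # [('leaking', 3), ('seal', 3), ('door', 2)]
```

A term such as "leaking" surfacing across many separate claims is the kind of early signal that could prompt a closer look before a problem spreads.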

While such early efforts showed the promise and value of unstructured data, for the most part unstructured data has been difficult or impossible to process and analyze. The advent of Big Data technology, which can handle large volumes of data and process it more rapidly than traditional database systems, is making the widespread use of unstructured data possible for the first time.

Breaking Down the Data Silo

The Aberdeen study found that nearly four in 10 companies lament that data remains “siloed” and inaccessible for analysis. When you add up varying data sources, formats and silos, the end result is that less than a quarter of the information that companies control is even available for analysis.

Thomas Redman, a consultant and author of Data Driven: Profiting from Your Most Important Business Asset, says it’s only a matter of time before companies find ways to harness the power of their unstructured data. He adds, “I don’t like the term unstructured data. I just think of it as data that has not been structured yet.” And, indeed, Big Data technology provides new ways to tag or categorize video, tweets and other unstructured data so it can be analyzed quickly.

Looking forward, Redman sees the most immediate needs for Big Data in the financial industry, healthcare and government. The common feature these sectors share is that they collect and use huge amounts of data, which they must apply to solving complex, pressing problems.

Finding ways to make connections and correlations using both structured and unstructured data is part of the evolution of information technology. “The ability to structure information on the fly and find nuggets is a key aspect of Big Data,” Redman says. “People on the leading edge are doing projects with voice and other nontext information. That isn’t anywhere near mainstream yet, but the structuring of new types of data is never going to stop. The whole idea for structuring data is to know something that nobody else knows, and the search for those insights will be unending.”

Joe Mullich has received more than two dozen awards for writing about business, technology and other topics.