Unlocking Big Data

By Richard Harth

On an average day in cyberspace, 200 million "tweets" on topics ranging from pet behavior to political struggles circle the globe.

This chatter is a virtual treasure trove for advertisers and corporations who would like to harness it to learn more about public attitudes and trends. Making sense of these terabytes, petabytes, or even exabytes of data, however, has been a challenge.

Harsha Krishnareddy (M.S. ITM '11) is on the hunt. As a co-op on the IBM jStart Emerging Technologies team, he is designing components of BigSheets, a new IBM platform created to corral and tame big data sets.

Krishnareddy explains that BigSheets can rapidly comb through millions or even billions of documents from varied sources on the Web. The Twitter universe is exactly the kind of vast, unstructured data set that BigSheets excels at mining and deciphering. "We're looking for various tools that can crawl the social media, analyze them, and give back insights into data, in a very neat, intuitive manner," he says.

Like a virtual intelligence agency, BigSheets can trawl through the entire planet's online chatter over hours, days, or months, ferreting out pertinent remarks from the enormous haystack of unrelated commentary.

The information that BigSheets gathers is typically presented as a very large spreadsheet, hence the name. But Krishnareddy has also worked on data-visualization plug-ins that let users grasp huge data sets visually. These plug-ins build on earlier IBM tools, including one called Many Eyes, which can display the data as a bubble chart.

Once BigSheets has gathered its gigantic data set of tweets, it goes about interpreting them, using IBM natural language processing technologies. These were demonstrated to uncanny effect when Watson—IBM's "Jeopardy!"-playing phenomenon—trounced its human competitors in 2011.

As Krishnareddy explains, BigSheets uses related language software to carry out sentiment analysis of tweets, gauging how people are reacting to business products and services, upcoming films, and political candidates, to name just a few.

And Twitter is only the beginning. Ultimately, BigSheets aims to open up the full galaxy of unstructured data, making it usable in ways no one has yet foreseen.