So here’s a sobering thought: According to the Sloan Digital Sky Survey, there are 60 billion trillion stars in the observable universe. By the year 2017, according to Kevin Nowka, Director of Research for IBM, the total amount of data humans have generated will surpass that.
Much of that data, of course, is kitten memes.
Nowka spoke Tuesday at the Austin Forum on Science, Technology and Society held at the AT&T Executive Education and Conference Center on the topic: The Big Deal About Big Data.
According to the International Data Corporation, Nowka said, a fourth of the information in the digital universe would be useful for big data if it were tagged and analyzed, but only three percent of that useful data is tagged, and even less is analyzed.
Useful data includes data that could help tackle the world’s problems, such as wasteful medical spending, congested urban roadways, and environmental concerns. For example, the Institute of Medicine estimates that of the $2.7 trillion spent on U.S. healthcare in 2011, $750 billion was wasted because of unnecessary and inefficiently delivered services, excessive administration costs, missed prevention opportunities and fraud, some of which could be addressed with better data. Also, Nowka said, an IBM study revealed that one in three business leaders frequently make decisions based on information they don’t trust or don’t have.
After sharing more than a dozen statistics about the growth of data streams (the volume of data doubles every year) and exploring the burgeoning number of sources (everything from social media to connected automobiles), Nowka introduced IBM’s solution to the problem of making use of it all: Watson Core Technology. This computer system uses natural language capabilities, hypothesis generation, and evidence-based learning to analyze and find answers in huge amounts of data. It is already being used by medical institutions such as Memorial Sloan Kettering, which has trained the system to help diagnose and treat oncology patients, and in experiments with retail and institutional banking organizations.
IBM is beginning to offer the technology on a limited basis to individual and business customers through the Watson Engagement Advisor Early Customer Program. The system still only speaks English and doesn’t deal well with images, Nowka said. But it can analyze natural language for subjects and verbs, dates, and proper names, and it learns as it goes, categorizing information and refining its hypotheses.
The Austin Forum is a monthly speaker series sponsored by the Texas Advanced Computing Center.

Comments

Susan, one open source opportunity to mention is HPCC Systems from LexisNexis, a data-intensive supercomputing platform for processing and solving big data analytical problems. Their open source Machine Learning Library and Matrix processing algorithms assist data scientists and developers with business intelligence and predictive analytics. More at http://hpccsystems.com