And this data behemoth isn’t standing still. Its economic footprint will grow to $50 billion by 2017.[ii]

Dollar signs that big mean there is an unfathomable amount of data to be analyzed. And tons of it is being generated by the most recently digitized industry: healthcare.

The Healthcare Big Data Explosion

Some industry observers estimate that healthcare data totaled roughly 150 exabytes in 2011. Since then, it has increased at a rate of up to 2.4 exabytes per year.[iii] One exabyte is one billion gigabytes. Put another way, an exabyte is so large that it would take about one million powerful home computers to store it.
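For readers who want to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. It uses decimal (SI) units and assumes that a "powerful home computer" holds about 1 terabyte. That capacity is an assumption made for illustration, not a figure from the cited estimates, and the function name is hypothetical.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
TB = 10**12
EB = 10**18

# One exabyte expressed in gigabytes: one billion.
gigabytes_per_exabyte = EB // GB

# ASSUMPTION: a "powerful home computer" stores about 1 TB.
ASSUMED_HOME_COMPUTER_BYTES = 1 * TB

# Home computers needed to hold a single exabyte: one million.
computers_per_exabyte = EB // ASSUMED_HOME_COMPUTER_BYTES


def computers_needed(exabytes: int, per_computer_bytes: int = ASSUMED_HOME_COMPUTER_BYTES) -> int:
    """Whole number of computers needed to store the given number of exabytes."""
    total_bytes = exabytes * EB
    # Ceiling division using integers only, so no rounding error creeps in.
    return -(-total_bytes // per_computer_bytes)


if __name__ == "__main__":
    print(gigabytes_per_exabyte)
    print(computers_per_exabyte)
    print(computers_needed(150))  # the 2011 healthcare estimate
```

Under that 1 TB assumption, the 150 exabytes estimated for 2011 would fill roughly 150 million home computers.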

Let’s look at it from a financial vantage point. Over at McKinsey, researchers believe that big data analysis could deliver more than $300B in potential annual value to US healthcare.[iv]

But what exactly is the big data opportunity for healthcare? It’s not confined to the notes your physician types into your medical record during an appointment, even though EHRs are generating a lot of data that was previously scribbled onto pieces of dried plant matter and stored in file cabinets.

The real analytical opportunity is found in the vast oceans of unstructured data streaming out of healthcare organizations’ forms, files and machines. This clinically relevant information is mostly textual and undefined. It flows in varying volume and velocity through EHRs, lab and imaging systems, physician transcriptions, medical correspondence, insurance claims and the finance department. And it comprises 80 percent of all medical data.

What Kind of Data Are We Talkin’ ’Bout?

Big data is a blend of two types of data: structured and unstructured. Structured data is information someone enters into a specific database field. For example, in an EHR, structured data would be a patient problem or active medication list typed in by a physician. Structured data creates a predictable, consistent view of patients in a healthcare system. But although it can tell you what a patient’s medical problem is, it can’t answer many questions about why the patient is having that problem.

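The answers to those "why" questions sit in unstructured data: free-text notes, transcriptions, correspondence and the like. This information is not contained in traditional databases. But if it is extracted and analyzed, it is data gold for healthcare organizations trying to make fully informed decisions.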

So How Do You Get At It?

Unstructured data is notoriously difficult to access, standardize and share. Many big data analytical tools use natural language processing (NLP) to read language the way a person would and derive meaning from it. Next, content analytics software typically analyzes that text to squeeze the juicy bits out of it, such as names of people, risk factors, genetic information, medications, diagnoses and how they all relate to each other. Did this drug cause this side effect? Is it a symptom of this disease? Does it correspond with the findings of an obscure study?
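To make the idea concrete, here is a deliberately tiny Python sketch of pulling structured facts out of free text. It is not real NLP: it simply looks words up in small hand-written vocabularies, whereas production tools handle grammar, negation, abbreviations and context. The vocabularies, the function name and the sample note are all hypothetical and invented for illustration.

```python
import re

# HYPOTHETICAL mini-vocabularies; real systems rely on large clinical terminologies.
VOCAB = {
    "medication": {"metformin", "lisinopril"},
    "diagnosis": {"diabetes", "hypertension"},
    "symptom": {"dizziness", "nausea"},
}


def extract_entities(note: str) -> dict[str, list[str]]:
    """Return the vocabulary terms found in a note, grouped by category."""
    # Lowercase the note and split it into alphabetic words.
    tokens = re.findall(r"[a-z]+", note.lower())
    found: dict[str, list[str]] = {label: [] for label in VOCAB}
    for token in tokens:
        for label, terms in VOCAB.items():
            # Record each matching term once, in order of first appearance.
            if token in terms and token not in found[label]:
                found[label].append(token)
    return found


# An invented sample note, not real patient data.
note = "Patient with type 2 diabetes started metformin; reports nausea and dizziness."
entities = extract_entities(note)

if __name__ == "__main__":
    for label, terms in entities.items():
        print(label, terms)
```

Even this toy version turns a sentence into labeled fields that can be counted and queried. Working out how the pieces relate, for example whether the nausea is a side effect of the medication, is the hard part that real NLP and content analytics tools take on.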

Most other industries have been mining their unstructured data for years. Now that healthcare is finally unencumbered by the common pain points of paper-based or outdated IT systems, it is time for the industry to catch up to the big data paradigm shift.

In my next post, I will discuss why big data is so important to healthcare and how the industry can take better advantage of these reams of bits and bytes.