Structured vs. Unstructured Data – Working as One

Friday 19 February 2016

Historically, structured data has dominated data analysis for those looking to understand each aspect of their organisation. Already formatted in a way that machines can process, structured data was the obvious choice for organisations to base their analysis on.

Whilst it has the benefit of being easy to store, query and analyse, structured data is relatively limited compared to the potential insight revealed by unstructured data. For example, structured data is of limited help when analysing what motivates a consumer to purchase a specific product. Humans don’t think like machines: many of their purchases are driven by emotion, which leads to impulse buying, and this is something we cannot understand by looking at structured data alone.

Nowadays, with the incredible growth of data, particularly unstructured data (often labelled as big data), and the emergence of new technologies, it is possible to combine the analysis of structured, semi-structured and unstructured data to achieve better business outcomes than ever before. We shouldn’t be thinking of this as structured vs. unstructured data; we should be thinking of ways to integrate the two so that they complement each other.

Structured vs. unstructured data and the era of big data

Structured data is an organised collection of data, typically arranged in rows and columns. The purpose of this is to enable machines, which apply limited logic compared with a human, to understand and process information. The most well-known example of storing structured data for analytics is the relational database, which became popular alongside the SQL query language in the 1980s.
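To make the rows-and-columns idea concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The purchases table, its columns and the sample records are all invented for illustration; the point is only that once data sits in a fixed structure, a single query can summarise it.

```python
import sqlite3

# Hypothetical purchase records; the table and column names are illustrative only.
rows = [
    ("alice", "kettle", 25.0),
    ("bob", "toaster", 30.0),
    ("alice", "mug", 5.0),
]


def total_spend_by_customer(records):
    """Load records into an in-memory relational table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (customer TEXT, product TEXT, price REAL)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
        cursor = conn.execute(
            "SELECT customer, SUM(price) FROM purchases "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(total_spend_by_customer(rows))
```

The query can tell us how much each customer spent, but nothing in the table explains why they bought what they did.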

Unstructured data is the complete opposite of this. It is everything else that isn’t in a structured format. It was never meant to be understood by machines, as this type of information has been created specifically to be understood by a human mind. Examples of unstructured data include documents, emails, books, letters, social media posts, images, audio files and video files.

There are no limits on the format this data can take, its length, its contents or how it is organised. This is what has caused problems for so many years. The data is so wild that it has been largely left untamed, as very few believed any insight could be garnered from it.

This has all changed since the explosion of data generation from the turn of the millennium. The evolution of the internet has acted as a catalyst for this in many ways, constantly providing new channels where users are able to create, upload or share information – much of it unstructured.

Organisations realised that they couldn’t just ignore this rapidly growing, yet still untamed data. Here are a few statistics to illustrate this point…

The final and possibly most damning statistic comes from a 2013 study, which revealed that, despite the huge volumes of data now produced globally, only 0.5% of it is actually analysed.

Using structured and unstructured data in unison

Unfortunately, traditional business intelligence technologies just weren’t up to the task of pulling all of this data together at such high volume and velocity while coping with such a wide variety of data. The benefits on offer from analysing more than 0.5% of all global data meant that efforts to develop technologies for storing and analysing this data increased substantially.

The result? Technologies such as the popular open-source software framework Hadoop were created, giving many analytics organisations the means to augment their existing offerings and to store and process big data. Even before that, we recognised the value in this area and launched CXAIR to merge the traditional capability of analysing structured data with the analysis of unstructured data through search-engine technology.

Having a much bigger pool of data allows you to build a much clearer picture of what you’re trying to see, and so to make a much more informed decision. It is a commonly held view in analytics that more data beats smarter algorithms.

For example, let’s say that I am a clinician working for a healthcare organisation. It is my responsibility to understand, monitor and suggest ways to improve my patients’ wellbeing. Being able to link the unstructured data held in letters, emails and prescriptions sent between clinician and patient with each individual patient’s profile will give me a much better view of patient health and how their treatment is being managed. Furthermore, a report by Reuters in 2015 found that 71 percent of the patients asked in their study were happy to let their clinicians view their social media activity. The proposed benefit of access to this unstructured data is that clinicians become better informed about the early stages of disease, how patients manage medical conditions and whether there are any links to disease progression.
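As a rough illustration of that linking step, the sketch below attaches free-text documents to structured patient profiles and then searches across them. Everything in it is hypothetical: the Patient fields, the sample letters and the assumption that each document already carries a patient ID. It is not a description of how CXAIR or any other product works.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    """Structured profile: fixed fields, one record per patient (hypothetical schema)."""
    patient_id: int
    name: str
    condition: str
    notes: list = field(default_factory=list)  # linked free-text documents


def link_documents(patients, documents):
    """Attach each (patient_id, text) document to the matching patient profile."""
    by_id = {p.patient_id: p for p in patients}
    for patient_id, text in documents:
        patient = by_id.get(patient_id)
        if patient is not None:  # ignore documents with no matching profile
            patient.notes.append(text)
    return by_id


def patients_mentioning(by_id, keyword):
    """Return the names of patients whose linked documents mention the keyword."""
    keyword = keyword.lower()
    return sorted(
        p.name
        for p in by_id.values()
        if any(keyword in note.lower() for note in p.notes)
    )


# Invented sample data for illustration only.
patients = [Patient(1, "A. Smith", "asthma"), Patient(2, "B. Jones", "diabetes")]
documents = [
    (1, "Patient reports wheezing at night; inhaler use increased."),
    (2, "Repeat prescription issued for insulin."),
    (1, "Follow-up letter: symptoms improved after dosage change."),
]

linked = link_documents(patients, documents)
print(patients_mentioning(linked, "inhaler"))
```

A simple keyword match stands in here for the search-engine style of analysis; the useful part is that the free text ends up sitting next to the structured record it belongs to.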

Whilst the example above is specific to healthcare, I would argue that the same process of integrating structured and unstructured data can be applied to any industry vertical. The benefits realised could be massive. You only have to look at the investments companies have made in big data over recent years to understand the potential value of the opportunity.

We shouldn’t just be looking at ways to harness unstructured data. We should be looking at ways to harness and manage unstructured data alongside structured data, so that it supports the insight we already gather.