Big Data and Biotech – What Is The Big Deal?

by Dr Cameron Ferris | January 24th, 2017

“Big Data”. It’s a phenomenon that has captured the attention of businesses, scientists and governments alike[1]. The space is touted to grow to a $203 billion market by 2020, transforming industries from banking to manufacturing and making Data Scientist the “Sexiest Job of the 21st Century” according to the Harvard Business Review. Big data is certainly making a big impact.

This is no less true in the world of biotechnology. Big pharma companies are looking towards big data as a potential cure for declining R&D efficiency and empty drug pipelines, while the McKinsey Global Institute estimates that the use of big data could save the US healthcare system $100 billion annually.

So what’s the big deal? A quick look at recent trends in the gathering, analysis and use of big data in biotech provides some insights.

Bytes about bodies: the data transforming biotech

If ever there was a poster child for big data in biotech, it is DNA sequencing — our ability to read the code of human life. The efficiency with which we can gather this data has progressed at an astounding rate. Mapping the first human genome, completed in 2003, took over 10 years and cost around $3 billion. The process can now be completed in 1-2 days at a cost that is decreasing even faster than Moore’s Law[2] (Figure 1).
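To see just how much faster than Moore’s Law sequencing costs have fallen, a quick back-of-envelope calculation helps. It takes the $3 billion figure from above, assumes a halving of cost every two years (the Moore’s Law trend), and compares the projection with the roughly $1,000 price widely reported for a sequenced genome around 2017 — the latter figure is an assumption here, not from the article:

```python
# Sketch: compare the actual fall in genome-sequencing cost with what a
# Moore's Law-style halving every two years would have predicted.
initial_cost = 3_000_000_000      # ~cost of the first human genome, completed 2003
years = 2017 - 2003               # elapsed time in years

# Moore's Law-style projection: cost halves every two years
moores_law_cost = initial_cost / 2 ** (years / 2)

actual_cost = 1_000               # approximate widely reported 2017 figure (assumption)

print(f"Moore's Law projection: ${moores_law_cost:,.0f}")   # ~$23 million
print(f"Actual (approx.):       ${actual_cost:,}")
print(f"Gap: roughly {moores_law_cost / actual_cost:,.0f}x cheaper than the projection")
```

Even a trend as aggressive as Moore’s Law would leave sequencing at tens of millions of dollars per genome today — real costs beat that projection by four orders of magnitude.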

Figure 1: Cost of sequencing a human genome since 2001 (Source: NIH)

Beyond genomics, other sources of data about our bodies — like proteomics, metabolomics, transcriptomics, epigenomics and others — are expanding rapidly and combining to create personalised biochemical fingerprints. Data from the Human Microbiome Project could dwarf the human genome — by one oft-cited estimate there are about 10x more bacterial cells than human cells living in your body, though more recent estimates put the ratio closer to one-to-one.

Meanwhile, an explosion in smart devices and sensors is making it easier to gather and distribute this data, often in real time. For example, we have been working on the development of a novel biosensor platform for detection of glucose in saliva (Figure 2), with the potential to extend to other analytes. Devices like this offer enormous potential to provide previously untapped data about our bodies and behaviours.

Big means big: the challenge of analysis

All this data from our bodies certainly meets the definition of ‘big’, and that can present some challenges. Raw data from a single sequenced human genome is around 200 gigabytes; that’s equivalent to 50 high-definition feature films. Increasing sequencing activity has led to an explosion of genomics data (Figure 3) which has some researchers concerned about our ability to store and process it all. The world’s data centres — the repositories of all this information — already consume a reported 3% of global electricity supply.
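A quick sanity check of those storage figures, and of what they imply at population scale, can be sketched in a few lines. The ~4 GB size assumed for a high-definition film, and the one-million-genome biobank, are illustrative assumptions, not figures from the article:

```python
# Back-of-envelope check of the genome storage figures in the text.
genome_raw_gb = 200          # raw data per sequenced genome (from the article)
hd_film_gb = 4               # assumed size of one HD feature film

films_equivalent = genome_raw_gb / hd_film_gb
print(f"One raw genome ≈ {films_equivalent:.0f} HD films")

# Scaling up: a hypothetical biobank of one million genomes
genomes = 1_000_000
total_pb = genome_raw_gb * genomes / 1_000_000   # GB -> PB (10^6 GB per PB)
print(f"{genomes:,} genomes ≈ {total_pb:,.0f} petabytes of raw data")
```

At roughly 200 petabytes per million genomes, it is easy to see why researchers worry about storage and processing capacity as sequencing activity keeps accelerating.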

What’s the big deal?

The big deal is that if we can effectively harness and analyse this data — and our ability to do so is improving every day — it could revolutionise healthcare. And that has the world of biotech very excited indeed.

Much of the excitement centres around personalised medicine, where data about an individual is used to tailor their treatment. For example, since the Human Genome Project, some cancer treatments have been matched to the genetic profile of a patient’s tumour — Herceptin (trastuzumab), prescribed for breast cancers that overexpress the HER2 gene, is a well-known case. The opportunities here for improved therapies and more efficient drug development are vast.

But the impact of big data on biotech could be even further reaching. The application of deep machine learning could enable rapid discovery and re-purposing of drugs without setting foot in a biology lab. Smart devices connected within the internet of things will stream data for the monitoring of clinical trials and remote healthcare. Data analysis will provide new insights into drug safety and improved pharmacovigilance. These are but a few applications, and the list is constantly growing.

The age of big data has undoubtedly arrived — and it is going to be a big deal for biotech.

[1] If you’re not familiar with the term, “big data” simply refers to extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations.

[2] Moore’s Law refers to the observation that the number of transistors per square inch on integrated circuits (which is closely related to computing power) doubles roughly every two years.