Big data

21 June 2017


These days, “big data” is all over the news, and the news is not always good. We read about the loss of privacy, as a mere 50-odd Facebook ‘likes’ are enough for data miners to build an accurate portrait of your psychology; we notice that a single website visit triggers targeted advertisements wherever we surf; and we are confronted with large-scale data collection by security services across the world. No wonder big data is sometimes treated with suspicion.

Fortunately, such suspicion is largely absent in the life sciences, which have evolved into a big data field almost overnight. Gone are the days when carefully collected specimens had to travel half the known world on slow and capricious ships, or when decades of painstaking work would unravel only a single gene sequence or protein structure. Today, we have sleek machines that wink at us with a few brightly colored LEDs while plowing through samples day and night, humming all the while like bizarre data factories from the future. As a result, the data-gathering potential of the life sciences has started to rival that of the world’s social networks or top security services, as we continue to eavesdrop on the private lives of genes, proteins and metabolites.

In the life sciences, too, it has therefore become indispensable to have access to advanced artificial intelligence algorithms and hardware to process the plethora of heterogeneous and noisy data that we extract from living systems. In fact, for many researchers this is a constant source of stress: acquiring vast quantities of biological data is now so trivial that the inability to cope with it is often the key research bottleneck.

At the same time, our struggle to manage this data tsunami also means that we are typically unable to fully analyze our data, leaving large amounts of information locked inside our datasets. And so, while data generators will have extracted the results relevant to their own research, the leftover, untouched information in published datasets can be a true treasure trove for computational scientists with a penchant for (orthogonal) reprocessing of other people’s data.

Perhaps the most remarkable aspect of the big data cloud that surrounds our collective online presence is the uncanny ability of data miners to connect seemingly disparate and innocent tidbits of personal information into a tightly linked and reasonably accurate picture of one’s life. Because such integrative big data analytics could be a real boon for biological research, these approaches should be very much on every forward-looking computational biologist’s radar.

Whatever challenges still lie before us, big data in the life sciences is surely here to stay. And given the progress achieved so far, big data analytics will certainly play a key role in the biology of the not-too-distant future.

Clearly, it’s high time we dedicated an issue of VIBnews to the topic. Enjoy!