The promise and perils of a datafied world

Big Data: A revolution that will transform how we live, work and think by Viktor Mayer-Schönberger and Kenneth Cukier alternates between enthusiasm and apocalyptic caution

IN A former life I was a research assistant. After painstaking weeks spent gathering data, I was tasked with putting the numbers into a statistics application that would help us deduce our trends.

While I was analysing the figures, my boss peered over my shoulder and pointed at a record on the screen. "Get rid of that one," I was told. "Also that one, that one, that one and that one." They were outliers and they were going to mess up the findings. "You're never going to trust science again," said my superior with a rueful laugh.

But if you believe Viktor Mayer-Schönberger and Kenneth Cukier, science will be just fine because such practices are about to become as archaic as leeching. Indeed, in Big Data, Cukier, a business writer at The Economist, and Mayer-Schönberger, a professor at the University of Oxford, argue that the big data revolution will save science.

Big data seemed to reach the apex of its hype cycle around 2012, when journalists and experts variously extolled its virtues – or wrung their hands about its implications. And yet, somehow it remained elusive: what exactly was it? This book answers that question.

First, then, a definition. Big data describes the idea that everything can be digitised and "datafied" – thanks to cheaper storage, faster processing and better algorithms. And that really means everything, from your current location or liking for strawberry pop tarts, to your propensity for misspelling and degree of personal compassion. And not just your data: everyone's data.

This changes science, say the authors, by ridding us of random samples, which are prone to bias, and of the need to massage the resulting data to make it sufficiently representative of a larger population. Could biased sample sets be at the root of many failed attempts to replicate experiments?
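
The following minimal Python sketch makes the sampling argument concrete. It builds a hypothetical population and compares the answer from every record ("n = all") with the answer from a skewed convenience sample. Several details are invented for illustration and do not come from the book: the population itself, the assumption that support falls with age, and the "aged 50 or over" sampling frame.

```python
import random
from statistics import mean


def make_population(n: int, seed: int = 42) -> list:
    """Hypothetical (age, supports_policy) records.

    Assumption for illustration only: support for the policy falls with age.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        age = rng.randint(18, 90)
        p_support = 0.8 - 0.01 * (age - 18)  # 0.80 at age 18, 0.08 at age 90
        people.append((age, rng.random() < p_support))
    return people


def support_rate(records) -> float:
    """Share of records that support the policy."""
    return mean(1.0 if supports else 0.0 for _, supports in records)


def biased_sample(records, k: int, seed: int = 7) -> list:
    """A convenience sample that only reaches people aged 50 or over
    (a hypothetical stand-in for, say, a landline-only phone survey)."""
    rng = random.Random(seed)
    reachable = [r for r in records if r[0] >= 50]
    return rng.sample(reachable, k)


population = make_population(100_000)
true_rate = support_rate(population)  # "n = all": no sampling step at all
sampled_rate = support_rate(biased_sample(population, 1_000))
print(f"all records:   {true_rate:.3f}")
print(f"biased sample: {sampled_rate:.3f}")
```

The biased sample underestimates support because it never reaches younger people. Collecting more records through the same skewed frame would not fix this. Using every record removes the sampling step altogether.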

Whatever the answer, store everything and the need for proxies disappears. Instead of formulating hypotheses and then looking for confirmation in small, error-prone trials or experiments, scientists now have the storage, processing and algorithmic sifting power to simply trawl through the constellation of all data and spot trends.
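
The hypothesis-free "trawl" can be sketched in a few lines of Python. The idea is to compute the correlation between every pair of columns and report the strong ones, without deciding in advance which pairs matter. The column names, the figures and the 0.8 threshold are all hypothetical. Pearson correlation is just one simple choice of similarity measure, assumed here for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column correlates with nothing
    return cov / sqrt(vx * vy)


def trawl(columns, threshold=0.8):
    """Test every pair of columns, with no prior hypothesis, and return
    (name_a, name_b, r) for pairs whose |r| reaches `threshold`,
    strongest first."""
    hits = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    return sorted(hits, key=lambda h: -abs(h[2]))


# Hypothetical weekly figures; none of these numbers come from the book
columns = {
    "ice_cream_sales": [20, 25, 31, 40, 52, 60],
    "temperature_c":   [12, 14, 17, 21, 26, 29],
    "umbrella_sales":  [50, 44, 39, 30, 22, 18],
    "avg_shoe_size":   [42, 38, 44, 40, 41, 39],
}
hits = trawl(columns)
for a, b, r in hits:
    print(f"{a} ~ {b}: r = {r}")
```

The trawl reports correlations, not causes. Temperature plausibly drives both sales figures, and with enough columns some strong correlations will turn up by chance alone.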

Getting rid of the hypothesis would be a staggering change in the scientific method, but both Cukier and Mayer-Schönberger appear convinced it is a step in the right direction, freeing science from the (often unconscious) biases of scientists and increasing the accuracy of its findings.

But there is something even more revolutionary going on here. Reusing data used to be nigh-on impossible: data collected for one purpose could rarely be reshuffled to probe anything other than the question it was gathered to answer. This is no longer true – and that promises to have deep, potentially dystopian consequences.

The authors provide a brilliant metaphor for big data in the shape of the new Lytro camera, which, its makers claim, "captures and processes the entire light field" of a given view rather than just one plane of light. This means that you can focus later, during processing, on any plane you choose. Similarly, the ability to suck up massive amounts of unfiltered data means that any dataset can be reused an almost limitless number of times, for any purpose an algorithm designer can think up.

This opens up extraordinary possibilities. Want to tease out people's route to work from their cellphone data? There's an algorithm for that. Want to look through someone's Twitter feed to predict if they are prone to certain crimes? You can bet an algorithm for that will come along some time soon.
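
The following Python sketch shows how little it takes to tease a commute out of location records. It rests on several assumptions made only for illustration. The `Ping` record (weekday, hour, cell-tower ID) is a hypothetical format. The time-of-day rules (home is where the phone spends the night, work is where it sits during weekday office hours) are a simple heuristic, not a method described in the book or used by any particular phone company.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Ping:
    """One hypothetical location record from a phone."""
    day: int   # 0 = Monday ... 6 = Sunday
    hour: int  # 0-23, local time
    cell: str  # hypothetical cell-tower ID


def most_common_cell(pings):
    """Cell seen most often in `pings`, or None if there are none."""
    counts = Counter(p.cell for p in pings)
    return counts.most_common(1)[0][0] if counts else None


def infer_commute(pings):
    """Guess (home, route, work) with simple, assumed time-of-day rules."""
    home = most_common_cell(p for p in pings if p.hour < 6)  # where the phone sleeps
    work = most_common_cell(p for p in pings
                            if p.day < 5 and 10 <= p.hour < 16)  # weekday office hours
    morning = sorted((p for p in pings if p.day < 5 and 6 <= p.hour < 10),
                     key=lambda p: (p.day, p.hour))
    route = []
    for p in morning:
        # Keep each in-between cell once, in the order it was first seen
        if p.cell not in (home, work) and p.cell not in route:
            route.append(p.cell)
    return home, route, work


pings = []
for day in range(5):  # Monday to Friday
    pings += [Ping(day, 2, "A"), Ping(day, 4, "A"), Ping(day, 7, "A"),
              Ping(day, 8, "B"), Ping(day, 9, "C"),
              Ping(day, 11, "D"), Ping(day, 14, "D")]
print(infer_commute(pings))  # ('A', ['B', 'C'], 'D')
```

Real location data is far noisier than this. Still, the point stands: data logged only to route calls can be repurposed to reveal where someone lives and works.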

But do you really want every detail of your personal life mined ever more efficiently by every proto-Zuckerberg with an algorithm? I don't, but my phone company may decide to sell my data to start-ups for extra profit. And we would have no easy way to protect ourselves: just reading the terms and conditions of the privacy policies for the hardware, software and apps you already own would take a week and a half, with no breaks for sleeping or eating – and that figure will only grow.

Worse, is there an algorithm to watch over the algorithms? Where do you turn if, some day in the future, your love of pop tarts gets you barred from surgery? Or what if you're a prisoner and CCTV gait-analysis software decides you are still an unreformed character – and so you are refused parole?

There is an interesting tension in the book, which alternates between enthusiastic business-speak and apocalyptic caution. This comes across as an ongoing "conversation" between the authors. Since Mayer-Schönberger wrote Delete: The virtue of forgetting in the digital age, I suspect that it is his caution tempering Cukier's evangelism.

This ensures that we have a credible picture of the upside of big data. For example, it could help when inspection systems get overwhelmed – as the 55 inspectors tasked with ensuring the safety of 3500 Gulf oil production platforms a year before the Deepwater Horizon accident certainly did. If every single aspect of the rigs' operational data had been routinely collected, an algorithm could have spotted trouble early enough for action.
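
A rolling z-score check is one common, very simple way an algorithm might "spot trouble" in a stream of rig readings. It flags any reading that sits far outside the recent spread of the data. The following Python sketch assumes a single hypothetical pressure series, a 10-reading window and a threshold of three standard deviations. None of these come from the book, and real monitoring systems would combine many sensors and more robust methods.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=10, z_threshold=3.0):
    """Indices of readings more than `z_threshold` standard deviations away
    from the mean of the preceding `window` readings (a rolling z-score)."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is unusual
            if readings[i] != mu:
                flagged.append(i)
        elif abs(readings[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


# Hypothetical wellhead pressure readings: steady noise, then one sudden spike
pattern = [100, 101, 99, 100, 102, 98, 100, 101, 99, 100]
pressure = pattern * 2 + [130, 100, 101, 99]
print(flag_anomalies(pressure))  # [20]
```

Unlike a human inspector, such a check runs on every reading from every platform. That scale is what the authors argue was missing.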

Cukier and Mayer-Schönberger have pulled all this together in an elegant and readable primer. The one thing missing is a "next steps" section that doesn't just throw the problem into the laps of policy-makers, who are notorious for sitting on their hands until crises force them to act.

And my former boss may well be doing better research thanks to big data, but what secrets will the records of the lab's students divulge to future data-miners?

Book information
Big Data: A revolution that will transform how we live, work and think by Viktor Mayer-Schönberger and Kenneth Cukier
John Murray
£20

This obsession with big data is a sign of a lazy, explanation-free, empirical/inductive approach to science. Karl Popper would be spinning in his grave. The fact is, these things may have their uses, but in the hard sciences they are useless. You cannot, for instance, explain the mystery of dark matter by collecting lots of data and running a stock machine-learning algorithm. No new knowledge is produced: all the knowledge is created by the programmer who chooses the feature space. Data are theory-laden; how we represent something depends on the theory we are using.

Let the buzzword spread. But those of us who truly appreciate science from the Popperian, explanatory point of view understand that it is garbage.

Rodney
on March 26, 2013 2:46 PM

Big data is an interesting problem, but one that seems relatively easy to solve: use a machine intelligence capable of handling all that data, and therefore far more intelligent than the humans who created it.

Then again, as soon as such a DEM came into operation, how could you possibly tell whether the results it was passing on to you were simply the results you wanted to hear, chosen so that it could alter your thinking and behaviour in whichever ways its analysis showed would best serve its own survival, such as a stable world with expanding space resources and renewable energy?

Dreamer
on March 26, 2013 3:02 PM

You can't learn anything about the cold, hard facts of physics or chemistry this way, but psychology and sociology could benefit from large quantities of data on people's activities, and climatology and seismology would be thoroughly behind large databases full of information about everything, everywhere on the planet.
I care not how you interpret Popper's philosophy, but the hard sciences are not the only sciences worth studying.