The fisherman and the sea of data

What does a «data scientist» actually do? And do we need to be afraid of him or her?

Christoph Hugenschmidt

The first time I read the term big data, I thought to myself: «Yeah, yeah.» And, whoosh, big data had disappeared from my buzzword-filled brain. Do you remember «Web 2.0»? Or «Second Life»? Exactly!

But big data is still here and is getting bigger and bigger. The term has gone from a buzzword devised by marketing brains to shorthand for a paradigm shift. Big data has changed our lives and will increasingly do so, because it affects the way in which people, machines, companies and states find things out, answer questions, determine facts, make forecasts, develop knowledge and control things, for example the heating in single-family homes or fleets of whalers.

Imagine you are a sock manufacturer. To find out which colours socks for the Benelux market should have next spring, you used to ask the IT department to start collecting, at once, all the data on socks sold, their colours, the time of year and the region, and then to analyse this data. By the time the answers arrived, you already had different questions, and the answers were wrong anyway because you had forgotten to tell the IT staff that navy blue and azure blue are the same thing.


In the future, you will contact a «data scientist», who will cast a net into the «sea of data». He or she will combine all the data from sock sellers (colour, sales date) with historical weather data, a «sentiment analysis» of sock wearers on Twitter and Facebook, an analysis of fashion magazines, your salespeople’s last five years of e-mail traffic with Dutch sock shops and an analysis of all the sock-wearing selfies posted by 15-year-olds from the Benelux states.

And big data can do even more. To control traffic lights automatically, machines will read the number plates in photos taken by the many cameras in the Gubrist tunnel and combine them with holiday calendars, data from petrol stations and weather data. In this way, they will make the red-light phase slightly longer for traffic coming from north-western Switzerland at just the right time to keep traffic on the Gotthard route from coming to a complete standstill. Secret police officers will fish around in the «sea of data» and find out who is watching forbidden shows from abroad, alone or together with their neighbours. They will no longer need the unreliable and expensive army of informers they relied on in the past.

The «sea of data» will be fed from an endless number of sources: photos in social networks, recordings from trillions of sensors, e-mails, telephone calls, public data such as the minutes of public authorities’ meetings and land registers, medical treatment logs and dictionaries. Those who know how to cast nets in this data will find out more than we, as individuals, would even think to ask.

Is that good or bad? That depends on the fisher’s intentions and his or her customer. Ask your data scientist.

Christoph Hugenschmidt

The journalist and publisher founded the online newspaper «inside-it.ch» ten years ago. He has been working with computers since 1978 and still doesn’t know whether they are good or evil.
