Big data: necessary but insufficient to understand the brain

OPINION: Neuroscientists should select the data they need, rather than collecting more and more big data, says brain scientist Florian Engert in an interview with ScienceNordic.

Pernille Mette Damsgaard

Published Sunday 11 September 2016 - 08:59

“It’s absolutely necessary and absolutely insufficient,” says Florian Engert, a professor of molecular and cellular biology at Harvard University, USA, during an interview with ScienceNordic about big data in neuroscience.

He is provoking other scientists in the field with his controversial view that the big-data projects currently under way to map every neuron in the brain are focussed on the wrong questions. Rather than simply collecting big data, they should optimise their methods and produce smaller but practically useful datasets, he says.

Scientists need the right data, not big data

Big data refers to the collection, storage, analysis, and interpretation of enormous amounts of data.

Engert defines it as “whatever doesn’t fit on a laptop”.

But this definition is not fixed.

Technology develops, so what was considered big data years ago no longer falls into that category.

Geologists, meteorologists, and neurologists all work with big data.

Companies and authorities collect it in the form of our personal information.

One of the problems with collecting a huge amount of data to map the brain is that a lot of the raw data will be thrown away, and lots of work essentially wasted.

“The whole effort right now to prepare the world for these massive data streams--for curating and distributing this huge amount of data--I think that is ill placed. What we should be doing is put fractions of these efforts into just optimising the extraction algorithms,” says Engert in the video.

It is pointless to simply collect as much data as possible if that data cannot be turned into new knowledge about the brain, he says. In his opinion, scientists should weed out the relevant data before distributing it to other scientists.

“That’s why I don’t think we need big data, they don’t need to be distributed,” he says.

Two large big-data projects could tackle the problems

Engert led the research team that produced the first full image of neural activity in the brain, using a transparent baby zebrafish. He is currently involved with two large research projects collecting big data to map and understand the brain.

The BRAIN Initiative is a US project that aims to develop the technology and tools needed to measure and map neural activity in the human brain.

Another project, the Open Connectome Project, aims to produce a wiring map of the brain to understand the connections between the neurons and how they communicate. They are developing algorithms to sift through and identify the 100 trillion synapses throughout the brain, which would be impossible to do manually.

One way to picture how the two projects fit together is to imagine that the Open Connectome Project maps the water pipes in a house, and the BRAIN Initiative determines how much water is being distributed through the pipes.

The big challenge of projects like these is often discussed in terms of big data: how best to collect, store, and distribute all of these data. But Engert argues that this problem can be completely avoided if neuroscientists focus instead on small but information-rich data sets.

Turning a useless map into knowledge

To do this, big data projects need to totally rethink how they collect their data, says Engert.

“The essential ingredient that turns a useless map into an invaluable resource is the experimental design employed to gather and analyse the data,” he writes in a recent review article on the challenges of big data, published in the scientific journal Neuron.

He recommends a complete overhaul of the way that neuroscientists extract the information they need from the raw data, using algorithms to isolate the data of interest. This, he says, turns big data into something that could easily fit onto a laptop.
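To make the idea of extraction concrete, here is a toy Python sketch. It is not Engert's method or any project's actual pipeline. It assumes a made-up, fluorescence-like signal and uses simple threshold crossing as a stand-in for real spike inference, which is considerably more involved. The function names, threshold, and signal shape are all hypothetical, chosen only to show how a long raw recording can shrink to a handful of event times.

```python
import random


def simulate_trace(n_samples, event_times, noise_amp=0.02, seed=0):
    """Build a synthetic trace: small bounded noise plus a decaying bump per event.

    Purely illustrative; real imaging data looks far messier than this.
    """
    rng = random.Random(seed)
    trace = [rng.uniform(-noise_amp, noise_amp) for _ in range(n_samples)]
    for t in event_times:
        for k in range(20):  # each event decays over 20 samples
            if t + k < n_samples:
                trace[t + k] += 0.8 ** k
    return trace


def extract_events(trace, threshold=0.5):
    """Return the sample indices where the trace crosses the threshold from below."""
    return [
        i
        for i in range(1, len(trace))
        if trace[i - 1] < threshold <= trace[i]
    ]


# Hypothetical recording: 1000 raw samples containing three events.
true_events = [100, 400, 750]
trace = simulate_trace(1000, true_events)
events = extract_events(trace)

print(len(trace), "raw samples reduced to", len(events), "event times:", events)
```

In this sketch the raw trace holds 1000 numbers, while the extracted result holds only the three event times. That is the kind of reduction Engert has in mind, scaled down to something that runs in a fraction of a second.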

Developing the algorithms needed to optimise this data extraction process should be the focus of big data projects like the BRAIN Initiative and the Open Connectome Project, he says.

“Once we have optimized the extraction process--when we automatically get spikes from neural imaging data and automatically get wiring diagrams--once we’re there, then there is no big data any more.”