Stanford Medicine magazine examines science's deluge of big data

Jul 11, 2012

Colin Clark

Atul Butte among the machines storing data in one of Stanford's data centers.

Just as people leave digital trails these days, so do our cells — and they've been doing so for decades as a result of biomedical research. This data, much of it stored as genetic transcripts in huge public databases, constitutes a goldmine for research and drug development.

But the magnitude of the data, the speed at which it’s growing and the threat it could pose to individual privacy mean mastering "big data" is one of biomedicine's most pressing challenges.

"Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world," said data miner Atul Butte, MD, PhD, an associate professor of pediatrics, in an article in this issue. "If I don't analyze those data and show others how to do it, too, I fear that no one will."

The report's opening article describes a project launched by the chair of Stanford's genetics department, Michael Snyder, PhD, to generate billions of individual data points about his own physiology and then analyze them. The result was 30 terabytes of data, about 30,000 gigabytes, or enough CD-quality audio to play non-stop for seven years. He predicts that data collections such as his will become commonplace tools to personalize health care. He submitted his personal data to public databases, though he realizes many others who have their genomes sequenced will not make this choice.

The data boom is not only molecular. Health information gleaned from patient records is another rich resource. Another story in the special report shows that Stanford University Medical Center is among a handful of biomedical institutions building research databases from their patients' records so their researchers can use them to improve medical care.

The federal government is responding to the abundance of data with funding to support its development as a resource: In March, the Obama administration announced the Big Data Research and Development Initiative, committing $200 million to "greatly improve the tools and techniques needed to access, organize and glean discoveries from huge volumes of digital data."

The abundance of data could change how biomedical scientists conceive of their experiments in the first place. Hypothesis-driven science isn’t dead, say many scientists, but it’s not the most useful way to analyze big data sets.

"We've been so focused on generating hypotheses," said cardiologist Euan Ashley, MD, in the report's lead article. "But the availability of big data sets allows the data to speak to you. Meaningful things can pop out that you hadn't expected. In contrast, with a hypothesis, you're never going to be truly surprised at your result."

Inside the report:

The lead article on the data deluge in biomedicine, told through the story of genetics professor Michael Snyder, who made himself the subject of his own big data project, allowing the world to watch as his health took a nosedive.

A piece on the creation of a research database built on medical records from Stanford and Lucile Packard Children's hospitals.

An article on the rise in importance of biostatistics: Suddenly the "stodgy" old field of statistics is where the action is.

A Q&A with science fiction author Vernor Vinge, seven-time winner of the Hugo Award, whose stories explore themes including deep space and the singularity, a term he coined for the emergence of a greater-than-human intelligence brought about by the advance of technology.

This issue’s "Plus" section, featuring stories unrelated to the special report, includes:

A story on using gamification — with a computer game called Septris — to teach residents and doctors how to recognize when patients have sepsis, and how to treat them.

A feature on the mounting evidence that a single antibody, known as anti-CD47, could knock out many cancers.