Reproducing Statistical Results

Speaking in the MAA Carriage House on October 23, Victoria Stodden (University of Illinois at Urbana-Champaign) discussed the counterintuitive impact of computation and big data on transparency in scientific discovery and dissemination. As an example, she reminded her rapt audience about the hype that once surrounded Google Flu Trends.

Launched in 2008, Google’s flu-tracking service monitored web searches across the United States for words like “fever” and “cough.” It used these data to successfully predict the number of flu-related doctor visits up to nine weeks in advance.
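Google never published its search terms or its model, but the general approach, regressing case counts on search-term frequencies, can be sketched. The weekly data and the two-term linear model below are invented purely for illustration; they are not Google's actual terms, weights, or methodology:

```python
import numpy as np

# Hypothetical weekly data: relative search frequency for two flu-related
# terms ("fever", "cough") and reported flu-related doctor visits.
# All numbers are made up for this sketch.
search_freq = np.array([
    [0.12, 0.08],   # week 1: search shares for ["fever", "cough"]
    [0.15, 0.11],   # week 2
    [0.21, 0.16],   # week 3
    [0.30, 0.22],   # week 4
])
visits = np.array([1200.0, 1500.0, 2100.0, 2900.0])

# Fit a simple linear model: visits ~ X @ w, with an intercept column.
X = np.column_stack([search_freq, np.ones(len(visits))])
w, *_ = np.linalg.lstsq(X, visits, rcond=None)

# Predict visits for a new week of search activity (in-range inputs).
new_week = np.array([0.25, 0.19, 1.0])
predicted_visits = new_week @ w
print(round(predicted_visits))
```

The fragility Stodden described shows up even here: the fitted weights depend entirely on which terms are tracked and which weeks are fit, and none of that is visible to someone who sees only the predictions.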

“Everyone was so excited,” Stodden said, “but then it just seemed to not work anymore.”

A March 2014 article in Science revealed that Google had not only overestimated the number of United States flu cases in the 2012-13 season, but also consistently overshot the number since as early as 2011.

It doesn’t sit well with Stodden that the opacity of the Google Flu Trends research—Google never released the search terms or algorithms driving its predictions—prevented outsiders from determining why its accuracy flagged.

“We’re not able to actually dig in to what was going on,” she said. “And I don’t think it’s okay. I think we need to break open these methods and really understand the inferences coming from big data.” Understand why they might be right and, perhaps more importantly, why they might be wrong.

The scientific method has traditionally included two branches: the deductive (mathematics and formal logic) and the empirical (statistical analysis of controlled experiments). When publishing results under either of these umbrellas, researchers must disclose their reasoning and methods such that the results can be independently verified. Mathematics papers outline proofs; scientific monographs detail experimental set-ups.

In recent years, technological advances and the associated changes to scientific practice have prompted discussion of large-scale simulations and data-driven computational science as third and fourth branches of the scientific method. Stodden argued that these are only potential branches, since computational science is not currently communicated so that it can be routinely reproduced.

So how can the standard of communication be elevated? Stodden called for the scientific community—journals, funding agencies, and scientific societies, as well as individual researchers—to communicate “really reproducible research.” Stanford professor David Donoho paraphrased the concept:

“The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete...set of instructions [and data] which generated the figures.”

Stodden described some efforts that are underway. Dissemination platforms enable researchers to share software and data. Workflow tracking and research environments automatically record information so scientists don’t have to laboriously retrace their steps later. Embedded publishing moves beyond static PDFs to present research in bundles that constitute reproducible computational science (including executable source code, say).
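Workflow-tracking tools automate exactly this kind of bookkeeping. As a rough, hand-rolled illustration of what such tools record (the file names, parameters, and `record_run` function are invented for this example), here is a sketch that writes a provenance manifest for one analysis run:

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone

def record_run(data_path, params, out_path="run_manifest.json"):
    """Write a minimal provenance manifest for one analysis run.

    A hand-rolled sketch of what workflow-tracking tools automate:
    it records when the run happened, on what software stack, against
    exactly which input data, and with which parameters.
    """
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    manifest = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version,
        "platform": platform.platform(),
        "data_sha256": data_hash,   # pins the exact input file
        "parameters": params,       # e.g. model choice, random seed
    }
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=2)
    return manifest

# Usage with a toy data file (contents invented for the example):
with open("data.csv", "w") as f:
    f.write("week,visits\n1,1200\n")
m = record_run("data.csv", {"model": "least_squares", "seed": 42})
```

A manifest like this, published alongside the figures, is a small step toward Donoho's "complete set of instructions and data": a reader can verify they are rerunning the same code on the same input.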

“The majority of these tools are being constructed and shared and developed just by academics on the side,” Stodden said. “It’s not part of their day job. They just think it’s really important.”

Government Mandates

Washington also thinks it’s important, apparently, to put science—computational included—on a firmer footing. Stodden unpacked for her Carriage House audience three government mandates—all issued within the last two years—related to this issue. These documents call for “expanding public access to the results of federally funded research” and “making open and machine readable the new default for government information.”

Stodden, who framed her own discussion in terms of Mertonian norms and Robert Boyle’s seventeenth-century insistence on reproducibility, remarked on the distinctly different spin Washington puts on what are, at base, closely aligned concerns. Where Stodden speaks of scientific skepticism and access to datasets and source code, Washington talks American competitiveness, economic growth, start-ups.

A relevant notice of request for information, for example, bears the title “Strategies for American Innovation.” Good policy, Stodden said, demands sensitivity to how the purpose of scientific research is understood both in Washington and by researchers themselves.

The First Wave

For all the grassroots and governmental progress, it’s early days yet. Stodden envisions the big data deluge as a two-wave phenomenon.

“I think we’re still in the first wave,” she said. “It’s all very exciting to collect all this data. We haven’t seen all that much actually come out of the data yet, if we’re really honest with ourselves. There’s a lot of potential there.”

The second wave in big data will occur when researchers extract information and inferences from it and make them reliable and robust. They’ll have to communicate their methods of inference and knowledge generation, disclosing the complete methods, software implementations, and raw data that underlie their scientific conclusions. Making big data research reproducible poses many challenges, said Stodden, including managing privacy concerns and updating established standards of research dissemination. But scientists and policymakers have recognized the problem and are making progress toward solving it.