Victoria Stodden

Here’s the third in a series of notes from the Science 2.0 conference, a conference for scientists who want to know how software and the web are changing the way they work. It was held on the afternoon of Wednesday, July 29th at the MaRS Centre in downtown Toronto and attended by 102 people. It was a little different from most of the conferences I attend, where the primary focus is on writing software for its own sake; this one was about writing or using software in the course of doing scientific work.

Here’s the abstract of her talk:

As computation becomes more pervasive in scientific research, it seems to have become a mode of discovery in itself, a “third branch” of the scientific method. Greater computation also facilitates transparency in research through the unprecedented ease of communication of the associated code and data, but typically code and data are not made available and we are missing a crucial opportunity to control for error, the central motivation of the scientific method, through reproducibility. In this talk I explore these two changes to the scientific method and present possible ways to bring reproducibility into today’s scientific endeavor. I propose a licensing structure for all components of the research, called the “Reproducible Research Standard”, to align intellectual property law with longstanding communitarian scientific norms and encourage greater error control and verifiability in computational science.

Here’s her bio:

Victoria Stodden is the Law and Innovation Fellow at the Internet and Society Project at Yale Law School, and a Fellow at Science Commons. She was previously a Fellow at Harvard’s Berkman Center and postdoctoral fellow with the Innovation and Entrepreneurship Group at the MIT Sloan School of Management. She obtained a PhD in Statistics from Stanford University, and an MLS from Stanford Law School.

The Notes

Her research has been on how massive computation has changed the practice of science and the scientific method

Do we have new modes of knowledge discovery?

Are standards of what we consider knowledge changing?

Why aren’t researchers sharing?

One of my concerns is facilitating reproducibility

The Reproducible Research Standard

Tools for attribution and research transmission
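The notes don’t spell out what that licensing structure looks like in practice. As a rough sketch based on Stodden’s published description of the Reproducible Research Standard (an attribution license for text and figures, a permissive open-source license for code, a public-domain dedication for data), here is one hypothetical way a project might stamp those licenses onto its components; the directory layout and the `stamp` helper are my own inventions, not part of the standard:

```python
# Illustrative only: apply RRS-style licenses to the components of a
# computational research project. License choices follow Stodden's
# published description of the Reproducible Research Standard; the
# directory names and this helper are hypothetical.
from pathlib import Path

RRS_LICENSES = {
    "paper": "CC BY (text and figures; reuse with attribution)",
    "code":  "Modified BSD (reuse with attribution)",
    "data":  "Public-domain dedication (facts are not copyrightable)",
}

def stamp(project_root: str) -> None:
    """Write a LICENSE file into each component directory."""
    for component, license_text in RRS_LICENSES.items():
        directory = Path(project_root) / component
        directory.mkdir(parents=True, exist_ok=True)
        (directory / "LICENSE").write_text(license_text + "\n")

if __name__ == "__main__":
    stamp("my_project")   # creates my_project/{paper,code,data}/LICENSE
```

The design point is that each component of the research falls under a different legal regime, so each gets a license matched to it, and all of them permit reuse with attribution.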

Example: Community Climate Model

Collaborative system simulation

There are community models available

Built on open code, data

If you want to model something as complex as climate, you need data from different fields

Hence, it’s open

Example: High energy physics

Enormous data produced at LHC at CERN — 15 petabytes annually

Data shared through grid

CERN director: 10–20 years ago, we might have been able to repeat an experiment; they were cheaper, simpler, and on a smaller scale. Today, that’s not the case

Example: Astrophysics

Data and code sharing, even among amateurs uploading their photos

Simulations aren’t new: even in the mid-1930s, researchers were trying to calculate the motion of cosmic rays in Earth’s magnetic field via simulation
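That 1930s work (Carl Størmer’s cosmic-ray orbit computations, done largely by human computers) is a few dozen lines of code today. Here’s a minimal, purely illustrative sketch, mine rather than anything shown in the talk: a proton in a dipole model of Earth’s magnetic field, integrated with fixed-step RK4. The particle energy, launch point, dipole moment, and step size are arbitrary choices, and relativistic effects are ignored.

```python
# Sketch: charged-particle motion in a dipole model of Earth's magnetic
# field, integrated with fixed-step RK4. All parameters are arbitrary
# illustrative choices; relativistic effects are ignored.
import numpy as np

RE    = 6.371e6          # Earth radius, m
MU0   = 4e-7 * np.pi     # vacuum permeability, T m / A
M_DIP = 8.0e22           # Earth's dipole moment magnitude, A m^2
Q, M  = 1.602e-19, 1.673e-27   # proton charge (C) and mass (kg)

def b_field(r):
    """Dipole field B(r) = (mu0 / 4 pi) (3 r_hat (m . r_hat) - m) / |r|^3."""
    m = np.array([0.0, 0.0, -M_DIP])   # Earth's moment points roughly -z
    rn = np.linalg.norm(r)
    rhat = r / rn
    return MU0 / (4 * np.pi) * (3 * rhat * np.dot(m, rhat) - m) / rn**3

def deriv(state):
    """d/dt of (position, velocity) under the magnetic Lorentz force."""
    r, v = state[:3], state[3:]
    a = (Q / M) * np.cross(v, b_field(r))
    return np.concatenate([v, a])

def rk4_step(state, dt):
    k1 = deriv(state)
    k2 = deriv(state + 0.5 * dt * k1)
    k3 = deriv(state + 0.5 * dt * k2)
    k4 = deriv(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Proton at 4 Earth radii in the equatorial plane, moving at ~3% of c.
state = np.array([4 * RE, 0.0, 0.0, -1.0e7, 2.0e6, 1.0e6])
dt, path = 1e-4, [state[:3].copy()]
for _ in range(100_000):                 # ~10 s of simulated motion
    state = rk4_step(state, dt)
    path.append(state[:3].copy())
```

Plotting `path` shows the tight gyration and slow drift of a trapped particle, the kind of orbit that was enormously laborious to trace by hand.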

Example: Proofs

Mathematical proof via simulation vs deduction

My thesis was proof via simulation: the results were not controversial, but the methodology was
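To make the simulation-versus-deduction contrast concrete, here’s a toy example of my own, not from the talk: number theory proves, deductively, that the probability two random integers are coprime is 6/π²; a simulation can only accumulate evidence consistent with that theorem.

```python
# Toy contrast between deduction and simulation: the density of coprime
# pairs of integers is proven to be 6 / pi^2; a Monte Carlo run can only
# gather evidence consistent with that result, never a proof of it.
import math
import random

def estimate_coprime_probability(trials: int, limit: int = 10**6) -> float:
    """Monte Carlo estimate using integers drawn uniformly from [1, limit]."""
    hits = sum(
        math.gcd(random.randint(1, limit), random.randint(1, limit)) == 1
        for _ in range(trials)
    )
    return hits / trials

if __name__ == "__main__":
    est = estimate_coprime_probability(100_000)
    print(f"simulated: {est:.4f}   deduced: {6 / math.pi**2:.4f}")
```

No number of trials turns the estimate into a proof, which is why a simulation-based methodology can draw scrutiny even when its results do not.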