Repeatability, Replication and the Reproducibility Initiative

26 August 2012

Almost no one will contest that being able to reproduce the findings from scientific studies is key to advancing science – I say almost no one because in my experience you can always find one person to disagree with anything if you look hard enough. We all acknowledge it’s possible, have entire sessions devoted to fretting about John Ioannidis’ paper (which, ironically, has been stretched past what the paper actually supports, in my opinion), and nod sagely when people talk about making code available, writing clear methods sections, etc.

So when press releases and news reports about the Reproducibility Initiative started making the rounds on various blogs I read, I looked it over with interest. The concept is simple: reproducible results are good and should be rewarded. Validate your study through the initiative, and you’ll get a ‘Certificate of Reproducibility’ (of whatever worth that might be to you) and, more importantly for most career scientists, the replicated results can be published as an independent paper in the PLOS Reproducibility Collection, with the original study marked as reproduced in the parent journal if it’s one of the Initiative’s supporters.

That all sounds great…but as with all things, there’s a “but…” coming. Or, to my mind, several. More after the jump.

Cost: As I understand it, the Reproducibility Initiative (which I’m going to shorten to RI from here on out) works like this: you submit your study and some supporting information to RI – which is an offshoot of the commercial Science Exchange, a service linking providers of expertise with those who need that expertise – and they hand it off to a blindly matched provider who can validate your results. There’s a key phrase on their site that’s important: “You’ll be responsible for payment of services rendered unless sponsored by a partner organization.”

That means you’re footing the bill, or your grant is. You’re essentially paying for a second publication of your work (who knows how valuable those will end up being on a CV) out of progressively more stretched research funds. For a big lab this might not be a big deal, but for a small early-stage investigator’s lab running by the skin of its teeth, or a doctoral student’s project that managed to swing some funding? Yeah, right. Without a funding mechanism, all the RI does is look at a small subset of science – the part that is both “well heeled” and “reproducible”.

Feasibility: This is the one that actually worries me the most. What, exactly, can Science Exchange reproduce? For the most part, it appears their strength is in wet-lab life science work. Searches for ‘Biostatistics’ and ‘High performance computing’ yielded at best one or two relevant results. ‘Mathematical modeling’ produced none.

For many laboratory-based publications, this might work swimmingly. But how do you propose replicating a mathematical model? The best I can envision is what I’ve termed “Click Run” reproducibility, which means your code works the way you say it works and puts out what you said it puts out. I’ve discussed this a bit with a question on CrossValidated, but as far as I’m concerned, that level of “does your code do what we think it should” reproducibility is a low threshold.
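To make that threshold concrete, here is a minimal, entirely hypothetical sketch of what “Click Run” reproducibility amounts to (the function and seed are my own invention, not anything from RI or the post’s examples): with the random seed pinned, a re-run reproduces the reported number exactly – yet that tells you nothing about whether a fresh study would find the same effect.

```python
import random

def simulate_mean(seed: int, n: int = 1000) -> float:
    """Draw n uniform samples with a fixed seed and return their mean."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n)) / n

# The "claimed" result from our hypothetical paper, computed with a fixed seed...
claimed = simulate_mean(seed=42)

# ...and the replicator's re-run of the same published script.
rerun = simulate_mean(seed=42)

# With the seed pinned, the two numbers match exactly. The code "does what
# we think it should" -- but no new data and no new population were involved.
assert rerun == claimed
```

Passing a check like this is the whole of “Click Run” reproducibility: the code ran and emitted the advertised output, nothing more.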

If we’re talking about what you need to really “reproduce” an epidemiology study? It’s not working off the same cohort data. It’s not rerunning the code. It’s conducting an entirely new study of the same effect in a different population. That is what’s needed in the field; it’s what’s needed to address the concerns of someone like Ioannidis, and it’s entirely outside the scope of what RI can do.

It seems that science often struggles once it steps away from the bench and two-group controlled experiments. Statistics becomes murky, causation turns ephemeral, and when people talk about reproducibility, you have to start outlining exactly what you mean. I might have a post on that last one in a few days. In my mind, the goals of the RI are laudable, but between pricing small labs out on cost and leaving entire fields out on practicality, I wonder if it will do more than double the paper count of a few select labs. That would be a shame.