It seems odd that computers are involved in these kinds of errors. After all, we write our instructions down as programs, which are complete and unambiguous descriptions of our methods. We feed the programs to computers and they do exactly what the programs tell them to do. If there's an error, the scientific method should catch it when other researchers fail to reproduce the results. So why are errors slipping through?

That’s the question Mike and I were chewing over between talks at TEDxSHU in December 2015. The talks I heard there inspired me to think harder about finding an answer, and it seems like the first step to solving the problem is reproducing results.

Reproducibility Fail

My MSc. dissertation involved processing a load of data that I was given and running programs that I’d written to draw conclusions. Although my dissertation ran to many thousands of words, it was a fairly shallow description – my interpretation, in fact – of what the data said and what the code did. I can’t give you the data or the code as there were privacy and intellectual property concerns about both.

Picked apart, my dissertation really describes what I intended to tell a computer to do to run my experiment. It then claims success based on what happened when the computer did what I actually told it to do.

If you had my code, you could run it on your own data and see whether my conclusions held up. You could inspect it for yourself. You could see the tests I wrote and maybe write some of your own if you had concerns. You could see exactly which versions of which libraries I was using; maybe bugs have since been discovered that invalidate my conclusions. If you had my data, you could check that my answers were at least correct at the time and are still correct with more recent versions of those libraries.

Even if you had my code and my data, you wouldn’t know what kind of computer I did the work on or how it was set up. That alone could change the result: remember the Pentium FDIV bug? Finally, even with all that information, you’d still have to get hold of everything you need, wire it all up and run your checks. That’s quite a commitment of time and money, assuming you can still get hold of all that stuff months or years later.
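Recording the environment alongside the code and data would go some way toward answering the “what kind of computer” question. Here is a minimal sketch in Python, assuming the work was done in Python (the dissertation’s language isn’t stated). It uses only the standard library `platform` and `importlib.metadata` modules to capture the operating system, CPU architecture, interpreter version and installed package versions. The function name `snapshot_environment` is hypothetical.

```python
import json
import platform
from importlib import metadata


def snapshot_environment() -> dict:
    """Capture basic facts about the machine and software used for a run.

    Hypothetical helper for illustration: it records what the standard
    library can see, not every detail that could affect a result.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            packages[name] = dist.version
    return {
        "os": platform.system(),          # e.g. "Linux", "Windows", "Darwin"
        "os_release": platform.release(),
        "machine": platform.machine(),    # CPU architecture, e.g. "x86_64"
        "python": platform.python_version(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Save this next to the results so others can see how they were produced.
    print(json.dumps(snapshot_environment(), indent=2))
```

A snapshot like this doesn’t catch hardware faults such as the Pentium bug by itself, but it at least tells a reader what to compare against.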

Continuous Integration to the Rescue?

I’m sure I’ve only scratched the surface of the problem here. I’m not a researcher myself, nor am I claiming that my dissertation was in any way equivalent to an academic paper. It’s just an example I can talk about, and it’s enough to give me an idea. The problem sounds a little like the “works on my machine” problem that used to be rife in software development, and one of the tools we use to solve that is “continuous integration”.

Developers push their code to a system that “builds” it independently, in a clean and consistent environment (unlike a developer’s computer!). “Building” might involve steps like getting libraries you need, compiling and testing your code. If that system can’t independently build and test your code, then the build breaks and you fix it.
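The build-then-break cycle can be sketched in a few lines of Python. This is a toy, not how Codeship or any real CI service works. The `run_build` name and the example steps are hypothetical, and a “clean environment” here is only a fresh temporary directory rather than the fresh virtual machine or container a real service would use. Each step is a command run without a shell, and the build stops at the first step that fails.

```python
import subprocess
import sys
import tempfile


def run_build(steps: list[list[str]]) -> bool:
    """Run each build step in a fresh, empty working directory.

    Returns True if every step exits with status 0. Stops at the first
    failing step, which is what "the build breaks" means in this sketch.
    """
    with tempfile.TemporaryDirectory() as workdir:
        for step in steps:
            result = subprocess.run(step, cwd=workdir, capture_output=True, text=True)
            if result.returncode != 0:
                print(f"Build broken at step {step!r}:\n{result.stderr}")
                return False
    return True


if __name__ == "__main__":
    # Hypothetical steps standing in for "get libraries, compile, test".
    # A real build would first check the code out into workdir.
    steps = [
        [sys.executable, "-c", "print('fetching libraries...')"],
        [sys.executable, "-c", "assert 1 + 1 == 2, 'a test failed'"],
    ]
    print("Build passed" if run_build(steps) else "Build failed")
```

Stopping at the first failure keeps the feedback simple: the developer sees which step broke and fixes that before anything else runs.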

A solution along these lines would have to verify automatically that everything needed to get the code running (the code itself, configuration parameters, libraries and their versions, and so forth) is present and correct. If the solution could also accept data and results, and then verify that running the code against the data produces those results, then it seems like we’d have demonstrated reproducibility.
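The final check, that running the code against the data reproduces the published results, might look something like this minimal Python sketch. The `analyse` function, the sample data and the idea of publishing a SHA-256 digest of the results are all illustrative assumptions. Comparing digests only works when the output is exactly deterministic; floating-point results computed on different machines might instead need comparing within a tolerance.

```python
import hashlib
import json
import statistics


def analyse(data: list[float]) -> dict:
    """Stand-in for a researcher's analysis code (hypothetical)."""
    return {"n": len(data), "mean": statistics.mean(data), "max": max(data)}


def digest(results: dict) -> str:
    """Hash results in a canonical form so identical results give identical digests."""
    canonical = json.dumps(results, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(data: list[float], published_digest: str) -> bool:
    """Re-run the analysis on the data and compare with the published digest."""
    return digest(analyse(data)) == published_digest


if __name__ == "__main__":
    data = [2.0, 4.0, 9.0]
    published = digest(analyse(data))  # what the author would publish alongside the paper
    print(verify(data, published))          # True: the results reproduce
    print(verify(data + [1.0], published))  # False: different data, different results
```

Publishing a digest rather than the raw results is one way a service could confirm reproduction without exposing results the author wants to keep private, though that is a design choice, not a requirement.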

Setting up your own CI server isn’t necessarily straightforward, but Codeship, SnapCI and the like show that hosted versions of such solutions work, offer high levels of privacy and (IMHO) simplify the user experience dramatically. A solution like one of these, but tailored to the needs and skills of researchers, might help us start to solve the problem.

Tailored CI for Researchers

I think that the needs of a researcher might differ a little from those of a software developer. What kinds of tailoring am I talking about? How about:

quick, easy uploading of code, data and results, with every effort made to have it “just work” for researchers with minimal general computing skills

the option for more expert users to take more control of the build and test process in unusual situations

private by default, with the ability to share code, data and results with individuals or groups

the ability to let individuals or groups run your code on their data, or their code on your data, without them ever seeing your code or data

what-if scenarios: for example, does the code still produce the correct results if I update a library? What if I run it on a Mac instead of a Windows machine?

support for academic scenarios like teams that might be researching under a grant but then move on to other things

support for important publication concerns like citations

APIs to allow integration with other academic services like figshare and academic journal systems

I think that’s the idea, in a nutshell. I’m not sure whether it has already been done or is being done now, or, if not, what could happen next, so I’m punting it into the public domain. If you have any comments or criticism, or if there’s anything I’ve skimmed over that you’d like me to talk about more, please leave me a comment or ping me on Twitter.