Reproducible Research: Why it’s great, and why I’ll do it

January 4, 2011

I was thinking recently about a post I made a while back about my academic values, and happened to come across the Reproducible Research website. I can’t emphasise enough how much I agree with the idea of reproducible research. It would remove many of the problems that I have been having with my research, and would help my research to fulfil one of the key requirements of science: that it be fully reproducible.

When trying to implement image segmentation algorithms for my RTWOBIA project (which is still progressing, albeit very slowly), I had great problems with (a) understanding and (b) implementing the algorithms in the literature. Some of these papers deliberately left out details of the algorithms for commercial reasons (for example Baatz and Schape (2000), who describe the algorithm used for multi-resolution segmentation in eCognition but leave out some key details), while others simply didn’t include the details that would be required to implement the algorithm.

This was incredibly frustrating when trying to implement (and possibly extend) these algorithms. If I had been able to implement them (and I am still slowly working on the implementation), this would have benefited both me and the authors of the papers: I would have been able to continue with my research, and the authors would have received citations from me and greater exposure for their work.

The concept of reproducible research aims to change this, by ensuring that all published research is accompanied by a website containing all of the details of the algorithm, the code used by the authors to produce the results shown in the paper, and the data used in the study. There is a very simple how-to document available which lists everything that should be provided to enable the research to be fully reproduced, and a very readable paper by Vandewalle and Kovacevic (2009) which explains the idea and the benefits. Both readers and authors benefit. Readers will have all of the information needed to reproduce the research, and therefore all of the information needed to use the research in their own work, or to extend it. Authors are likely to get more citations because their work is easier to reproduce and, even more importantly, will find it easier to reproduce their research themselves. I’ve lost track of how many times I’ve struggled to reproduce research outputs that I produced myself, and ensuring that anyone can reproduce them should help the original authors to reproduce them too!

Unfortunately, I can see a number of issues with fully implementing the concepts of reproducible research in the fields of remote sensing and GIS. However, I will start by working on reproducible research pages for my current publications, and will write a follow-up post detailing the issues (and hopefully the solutions) involved in doing reproducible research in my field.