Shub Niggurath on Archiving Code

Excellent post by Shub Niggurath at his blog here discussing replication problems. It’s interesting to see how the same excuses play themselves out in different fields. Statisticians criticize authors for non-replicability of their results. The authors complain that the statisticians failed to replicate a previously unreported (and usually questionable) methodological procedure. We’ve seen this movie before.

Shub reports that Hothorn and Leisch (“Case studies in reproducibility,” Briefings in Bioinformatics) noted that one of our papers (MM 2005, EE) even included code in the running text of the paper to clarify certain points:

Acknowledging the many subtle choices that have to be made and that never appear in a ‘Methods’ section in papers, McIntyre and McKitrick go as far as printing the main steps of their analysis in the paper (as R code).

And that is what science should be. People write it in blogland all the time, but if you fear the release of your code then you fear the truth of your result. With the information storage capacity available today, there is little room for people to claim that somehow code should not be disclosed. I throw nothing away. Not one thing. Data storage is cheap to the point of being a non-issue entirely.

Any questions about disclosure of paper related code are moot unless the code is intended for sale. As is often the case in changing times, policies lag reality.

It is strange to see this all take place. I am specifically referring to watching the various fields of science reach an understanding that software is method, and that if you’re not sharing your methods for getting your answer, you are not showing your work. I am a bit of an open-source software fanboy. The ideas that created the notion of intellectual property never held much water with me. Copyrights and patents were intended to give original authors time to profit from their creativity, not to allow corporations to hold a population hostage forever through licensing restrictions and legal threats. It is amazing to see the pollution of ideas from the closed-software world used to justify creating “invulnerable” results in science.

It is science that should have been leading the way on the issue of fully open code sharing.

I particularly enjoyed the description of an erroneous result that depended not only on the software used, but on the particular *version* of the software used. A change in the default values of unspecified parameters changed the results.

That shows that they had not fully understood the implications of the default parameters of the algorithm used, let alone studied the sensitivity of their results to those parameters.
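To make the point concrete, here is a minimal Python sketch (the smoothing routine, its defaults, and the version change are entirely hypothetical) of how a silently changed default parameter can alter results between library versions, and why pinning every parameter explicitly avoids it:

```python
# Hypothetical illustration: a smoothing routine whose default window
# changed between "versions" of a library.

def smooth_v1(series, window=3):
    """Trailing moving average as shipped in 'version 1'."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def smooth_v2(series, window=5):
    """Same routine after an upgrade silently changed the default window."""
    return smooth_v1(series, window)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Relying on the default: results differ across versions.
assert smooth_v1(data) != smooth_v2(data)

# Pinning the parameter explicitly: results agree, and the choice
# is documented in the archived code itself.
assert smooth_v1(data, window=3) == smooth_v2(data, window=3)
```

Archived code that spells out every parameter, rather than leaning on library defaults, is robust to exactly this kind of version drift.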

With scientists writing and using code for crucial results, careful code review, QA testing and more become increasingly important.

While the presence of significant errors may well be rare, the consequences of such errors can be huge… and go undetected for quite a long time, as nicely illustrated by the sign-error example in the linked articles.

I have my own example from a number of years back: I discovered an error in the core math library of a well-known software provider. The error was sufficiently subtle that it affected only about one in a thousand physical computers (it was random, depending on certain hardware characteristics!). Because the vendor could not reproduce it quickly in their lab, they refused my bug report for more than five years… even though I could provide full proof of the issue and a working fix.
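One defensive habit that can catch this class of problem is to cross-check a vendor routine against a slow but independent reference implementation. A minimal Python sketch (the choice of `exp`, the sample points, and the tolerance are illustrative, not drawn from the incident described above):

```python
import math

def exp_reference(x, terms=40):
    """Slow Taylor-series exp, used only as an independent cross-check."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)
    return total

# Compare the library routine against the reference at a few sample points.
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    lib, ref = math.exp(x), exp_reference(x)
    assert abs(lib - ref) <= 1e-10 * max(1.0, abs(ref)), (x, lib, ref)
```

A handful of such sanity checks in an archived analysis script costs little and documents that the library behaved as expected on the machine that produced the published results.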

All it takes is for a subtle error to be present in the wrong place at the wrong time… and a huge amount of fallout can result.

Sheesh, that is an interesting, and scary, example. Given all the complexity of computational systems (even the vendor lagged in your example), it is all the more important that others can reproduce results on their own machines.

As to the comments in the article, yes, the code scientists write can be ugly and “bad,” but that certainly doesn’t mean it should be withheld — indeed, quite the opposite.

I have found bugs in computations in Excel, Fortran, and Mathematica, with the latter’s vendor being quick to fix them.
Also, Mathematica is a nice platform for documenting a full analysis. One can include not only the code and data (if not too large), but also the graphics and results, all in a single file.

In his excellent article, Shub reports the following recommendation by Victoria Stodden:

“We propose that high-quality journals such as Nature not only have editors and reviewers that focus on the prose of a manuscript but also “computational editors” that look over computer codes and verify results.”

Does anyone reading this think it would be practical to expect a reliable code review by pre-pub reviewers?

An absolute requirement that code be available without restraint concurrent with publication might be more effective. The hidden anonymous peer reviewers seem to have done the science little good, although possibly without them things could have been worse.

My impression of the point Stodden is trying to make is that simple reconstructibility of the code-derived results and graphs in a paper should be a prerequisite quality check before publishing, much as peer review is.

“In his paper on reproducible research in 2006, Randall LeVeque wrote in the journal Proceedings of the International Congress of Mathematicians:

‘Within the world of science, computation is now rightly seen as a third vertex of a triangle complementing experiment and theory. However, as it is now often practiced, one can make a good case that computing is the last refuge of the scientific scoundrel. Of course not all computational scientists are scoundrels, any more than all patriots are, but those inclined to be sloppy in their work currently find themselves too much at home in the computational sciences.'”

Can any reader who is familiar with both the peer-review process and the character of the code we are contemplating comment on whether peer reviewers can be expected to perform any useful check, given how these reviews are presently done? Is it too much free work? Would such reviews really detect the aberrations?

Shub, I thought I understood what Dr. Stodden was getting at. I just wondered if it was practical.

There are two things here, obviously: Is it practical? Is it desirable?

I think Stodden’s point is that, among all the checks a reviewer could perform on whether a paper’s basic claims are reproducible (which I know for a fact they don’t do anyway), checking whether the code runs is the easiest. So why not get it done?

But in reality, many a time, reviewers don’t know how to code, can’t be bothered to put in hard work and offer genuine criticism (“why should I do the authors’ work?”), let their eyes glaze over code and data, just want to offer generalized criticism, or, more commonly, can recognize bad code when they see it but don’t want to go there too much, because their own coding is exactly as bad. Therefore, putting this additional burden on reviewers won’t be practical, especially for journals like Nature, which want ‘efficient’ reviewers and not nosy people who ask too many questions.

Secondly, journals might rightly see this as grunt work that is properly the realm of the researchers, and therefore not want to take on additional responsibilities. In turn, many strongly feel that this should not be the case, given all the money journals make. See this interview with a researcher at Harvard about the dysfunctional scientific journal market (here).

The other reason I can think of as to why getting journals to run code verification is not a good idea is that it artificially adds another imprimatur of credibility, on top of the one scientific findings already seem to gain just by being published.

‘Published in Nature? That means the code must have been checked. No need to look there then.’ or even worse, ‘We hold Nature partially responsible for this error. They provided a statement of code verification.’

I, for one, never suggested that reviewers be obligated to check code. The advantage of archiving code is that it documents all the methodological decisions, so that someone seeking to replicate the results can reconcile differences more efficiently. From my own experience, archiving code at the time of publishing an article is helpful to the author, since it’s all too easy to forget precisely what you did, and to overwrite the version of the code that you used to get your results.

In the first code that we archived, I didn’t try to have the code do more than act as documentation. However, it later became clear to me that you could easily archive code that was fully turnkey, that readers were interested in being able to generate results for themselves, and that this helped them understand the article.
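A “turnkey” archive of the kind described above might have the shape sketched below in Python (every name here is hypothetical, and the inlined numbers stand in for the published dataset, which a real archive would load from a supplementary file or repository): a single entry point that loads the data, runs the documented computation, and emits the published numbers.

```python
# Minimal sketch of a turnkey analysis script (all names hypothetical).

def load_data():
    """Stand-in for reading the archived dataset from a supplementary file."""
    return [0.12, 0.15, 0.11, 0.18, 0.14]

def analyze(series):
    """The documented computation: here, just a mean and a range."""
    return {"mean": sum(series) / len(series),
            "range": max(series) - min(series)}

def main():
    results = analyze(load_data())
    # A real turnkey script would also regenerate each figure and table
    # in the paper from these intermediate results.
    print(f"mean={results['mean']:.3f} range={results['range']:.3f}")
    return results

if __name__ == "__main__":
    main()
```

The point is less the specific computation than the contract: a reader runs one script and gets the paper’s numbers, with every methodological decision visible in the source.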

I have to agree with Shub that the reviewers who are most likely to understand the thrust of the paper, and the relationships with other publications, are not likely to be expert in software. The purpose of publishing the code is to show the steps actually taken, rather than a description which is only approximate. Our host can expound at length on algorithmic descriptions which are kinda-sorta correct, but which overlook key implementation details (or which were implemented incorrectly). See http://climatecode.org/blog/2011/03/why-publish-code-a-case-study/ for an example of why kinda-sorta descriptions don’t cut it.

And lest one think that the benefit of code publication is just for those who wish to replicate results, I have to second Steve’s observation that clear, well-organized and well-commented code not only serves the reader, but also the author. It’s common, probably universal, that one doesn’t remember the exact steps even in one’s own program, when one returns to it after an interval of a few months. A lesson I learned long ago.

There are two aspects to checking that software is correct in this (and most other) contexts:

1) Whatever it is I’m trying to achieve, is it the right thing to attempt in the first place? (are my requirements understood, is my choice of algorithm suitable?)

2) When I try to achieve it, is my implementation correct and free from errors and unwanted side-effects?

This is colossally difficult to achieve even for people whose business is software development; a visiting programmer or dilettante is unlikely even to know what they don’t know.

The *best* you can do in this scenario, IMO, is to set standards for development in general when code is part of a submitted paper: formatting, naming, scoping, documentation, methodology, test scripts and results, etc.

If the included code does not adhere to the accepted best practice of the language at hand, the paper should be sent back until the software *does* clear at least this shallow quality hurdle.

This has a twofold effect:

1) it forces some degree of reflection and adherence to practices that are likely to avoid the most obvious errors

2) it makes the job of anyone deciding to attempt any in-depth analysis for more subtle errors that much easier, by virtue of excluding the donkey work of parsing random awful code.
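As a minimal illustration of the kind of test script such a standard might require alongside archived analysis code, here is a Python sketch (the `trend` function and the specific test cases are hypothetical, chosen only because slope estimates are a common ingredient in this kind of work):

```python
import unittest

def trend(series):
    """Least-squares slope of evenly spaced observations (the code under test)."""
    n = len(series)
    xbar = (n - 1) / 2
    ybar = sum(series) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(series))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

class TestTrend(unittest.TestCase):
    def test_exact_line(self):
        # A perfectly linear series should recover its slope exactly.
        self.assertAlmostEqual(trend([1.0, 3.0, 5.0, 7.0]), 2.0)

    def test_constant_series(self):
        # A flat series has zero trend.
        self.assertAlmostEqual(trend([4.0, 4.0, 4.0]), 0.0)

if __name__ == "__main__":
    # exit=False so the script can be run as part of a larger harness.
    unittest.main(exit=False)
```

Even tests this simple force the author to state what the code is supposed to do, which is half the battle when a reviewer or replicator later tries to judge whether it did it.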

This is a horrible idea. Now if an error is made people are less likely to catch it, as they will just be blindly reusing the same code. Having people write their own code based on the algorithm given is a better idea.
If skeptics aren’t able to reproduce results, it just means they aren’t qualified to speak on the science.