The Leak Reveals a Failure of Reproducibility of Computational Results

It appears the leak came about through a long battle to get the CRU scientists to reveal the code and data behind their published results, and it highlights a crack in the scientific method as practiced in computational science. Publishing standards have not yet adapted to the computational methods now used pervasively across scientific research.

Other branches of science have long-established methods for bringing reproducibility into their practice. Deductive or mathematical results are published only with proofs, and there are long-established standards for what counts as an acceptable proof. Empirical science has clear mechanisms for communicating methods, with the goal of facilitating replication. Computational methods are a relatively new addition to the scientist’s toolkit, and the scientific community is only now establishing comparable standards for verification and reproducibility in this new context. Peer review and journal publishing have generally not yet adapted to the use of computational methods and still operate as if results were purely deductive or empirical, creating a growing credibility gap in computational science.

Verifying Computational Results without Clear Communication of the Steps Taken is Near-Impossible

The near-impossibility of verifying computational results when reproducibility is not considered a research goal is illustrated by the miserable travails of “Harry,” a CRU employee with access to their systems who was trying to reproduce the temperature results. The leaked documents contain logs of his unsuccessful attempts. It seems reasonable to conclude that CRU’s published results aren’t reproducible if Harry, an insider, was unable to reproduce them after four years of trying.

This example also illustrates why leaving reproducibility to others, beyond a cursory description of methods in the published text, is wholly inadequate for computational science. Harry seems to have had access to the data and code used, and he still couldn’t replicate the results. The merging and preprocessing of data in preparation for modeling and estimation can encompass a very large number of steps, and a change in any one of them can produce different results. Likewise, when fitting models or running simulations, parameter settings and the sequence of function invocations must be communicated, because the final results are the culmination of many decisions; without this information, a replicator must match the original work at every small step, a Herculean task. Responding with raw data alone when questioned about computational results is merely a canard, not a serious attempt to facilitate reproducibility.
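The point about undocumented preprocessing decisions can be made concrete with a toy sketch. Everything here is hypothetical (invented function names, invented readings, an invented missing-value flag), not CRU’s actual pipeline; it simply shows how two runs that differ in a single unrecorded parameter yield different final numbers:

```python
import json
import statistics

def preprocess(readings, drop_below, smooth_window):
    """Toy pipeline: filter out suspect values, then apply a trailing
    moving-average smoother. Both steps depend on parameters that a
    replicator would need to know."""
    kept = [r for r in readings if r >= drop_below]
    smoothed = [
        statistics.mean(kept[max(0, i - smooth_window + 1): i + 1])
        for i in range(len(kept))
    ]
    return smoothed

# Hypothetical station readings; -99.0 is a common missing-value flag.
readings = [12.1, -99.0, 13.4, 12.8, 14.0]

# Two plausible parameter choices, differing only in one threshold.
params_a = {"drop_below": -100.0, "smooth_window": 2}  # keeps the -99 flag
params_b = {"drop_below": 0.0, "smooth_window": 2}     # filters it out

result_a = preprocess(readings, **params_a)
result_b = preprocess(readings, **params_b)

print(result_a == result_b)   # False: one threshold changes the output
print(json.dumps(params_b))   # the provenance a replicator would need
```

Without a record of exactly which parameters were used (the `json.dumps` line), someone re-running this pipeline could easily land on the other set of numbers and conclude the published result is wrong, when in fact only an unstated preprocessing decision differs.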

The story of Penn State professor of meteorology Michael Mann’s famous “hockey stick” temperature time series estimates is an example where lack of verifiability had important consequences. Release of the code and data used to generate the results in the hockey stick paper would likely have avoided the convening of panels to assess the work. The hockey stick is a dramatic illustration of global warming and became something of a logo for the U.N.’s Intergovernmental Panel on Climate Change (IPCC). Mann was an author of the 2001 IPCC Assessment Report, and was a lead author on the “Copenhagen Diagnosis,” a report released Nov 24 and intended to synthesize the hundreds of research papers about human-induced climate change published since the IPCC’s last assessment two years ago. The report was prepared in advance of the Copenhagen climate summit scheduled for Dec 7–18. Emails between CRU researchers and Mann are included in the leak, which happened right before the release of the Copenhagen Diagnosis (a quick search of the leaked emails for “Mann” returned 489 matches).

These reports are important in part because of their impact on policy. As CBS News reports, “In global warming circles, the CRU wields outsize influence: it claims the world’s largest temperature data set, and its work and mathematical models were incorporated into the United Nations Intergovernmental Panel on Climate Change’s 2007 report. That report, in turn, is what the Environmental Protection Agency acknowledged it ‘relies on most heavily’ when concluding that carbon dioxide emissions endanger public health and should be regulated.”

Discussions of Appropriate Level of Code and Data Disclosure on RealClimate.org, Before and After the CRU Leak

A dangerous ramification of the leak could be an undermining of public confidence in science and the conduct of scientists. My sense is that making code and data readily available, in a way that facilitates reproducibility of results, can help keep distractions from crowding out the real science: potential evasions of FOIA requests, whether data were fudged, or whether scientists acted improperly in squelching dissent or manipulating journal editorial boards. Perhaps data release is becoming an accepted norm, but code release for reproducibility must follow. The issue here is verification and reproducibility, without which it is all but impossible to tell whether the core science done at CRU was correct, even for peer-reviewing scientists.

4 Responses to “The Climate Modeling Leak: Code and Data Generating Published Results Must be Open and Facilitate Reproducibility”

Victoria,
I think your analysis is too simplistic. Open data and open source on its own is a fine and laudable goal, but it is not the problem in the CRU case, nor in the wider field of climate modeling.

First, most climate code and data is freely available. The research results are reproduced widely throughout the field, and the comparison and validation of results is built into the IPCC process. Can you name any other field of science in which 25 different research centres around the world are building large-scale models of the same phenomena and comparing the results in detail through controlled model inter-comparison projects? (See CMIP5 for a taste: http://cmip-pcmdi.llnl.gov/cmip5/)

Second, climate phenomena are sufficiently complex that *exact* reproducibility is effectively impossible. This deserves a longer explanation (I’m working on a paper on that), but for a taste, see: http://moregrumbinescience.blogspot.com/2009/11/data-set-reproducibility.html
What is needed instead is a different approach to reproducibility: independently arriving at comparable results via different methods. That is exactly what the climate science community has been doing for decades, and it is very effective (unfortunately, little has been written on this, but I’m working on it – it’s a fascinating discipline).

Finally, you’ve blown the Mann and Jones work out of all proportion. The field of dendrochronology is a minor sideshow in climate science, and a very immature discipline. Mann’s early work on it had errors, but of course it did – that’s what science is! Subsequent work has improved the methodologies without changing the results at all. But then the politicians and lawyers get involved, fail to understand the scientific process, and think that science should be a perfect process every step of the way. Can you honestly say that any of your publications could stand up to the kind of congressional-level scrutiny Mann’s have been subjected to? This is not how we do science.

Politically inspired attacks on climate scientists have produced an understandable siege mentality, which is what you see in the CRU emails. I have characterized it as a kind of denial-of-service attack, and it’s not healthy for science in any way. Here’s my take: http://www.easterbrook.ca/steve/?p=1001

The simple availability of code and data isn’t enough: they need to be shared in such a way that the results can be verified. According to HARRY’s readme file, he apparently had access to code and data from within the CRU and still wasn’t able to replicate the results. The file also shows that layers and layers of complex data processing are involved in the work he was trying to replicate – all part of the scientific research, and important for understanding and evaluating the results. I haven’t questioned the scientific results, but secretiveness or obfuscation around code and data makes it easier to believe that computational results could be in error, regardless of the field. Reproducibility of computational results is not a concern unique to climate science.

Reproducibility is being taken seriously in a growing number of fields, such as seismic research and the machine learning community. I don’t agree that the potential for public harassment justifies an exception to the scientific method. I’d argue instead that openness is especially important in the case of climate change because of its salience in public policy. I have a previous blog post on the release of government data arguing that the answer to bad speech is more speech (which doesn’t necessarily have to come from the original researchers).

Sharing code and data from results that were not prepared with reproducibility in mind is tough, and I see your work as vital in this effort – for example, your two recent papers in CiSE. I agree we self-correct in science, but this comes through openness and disclosure. Encouragingly, the view that science involves reproducibility of results appears to be resonating. Today The Times reported that the Met Office will conduct a review of its temperature data, since it used work emanating from the CRU – and it will do this in a transparent fashion because it “wants to create a new and fully open method of analysing temperature data.” In yesterday’s Scientific American, Michael Mann was quoted saying his most recent publication in Science includes the underlying code and data as a supplement – I’m hoping it was released in a reproducible fashion, and that we get to a place where our published results are routinely verifiable.