Existing global climate models are typically managed and controlled at a single site, with varied levels of participation by scientists outside the core lab. As these models evolve to encompass a wider set of earth systems, this central control of the modeling effort becomes a bottleneck. But such models cannot evolve to become fully distributed open source projects unless they address the imbalance in the availability of communication channels: scientists at the core site have access to regular face-to-face communication with one another, while those at remote sites have access to only a subset of these conversations – e.g. formally scheduled teleconferences and user meetings. Because of this imbalance, critical decision making can be hidden from many participants, their code contributions can interact in unanticipated ways, and the community loses awareness of who knows what. We have documented some of these problems in a field study at one climate modeling centre, and started to develop tools to overcome these problems. We report on one such tool, TracSNAP, which analyzes the social network of the scientists contributing code to the model by mining data from the project's existing code repository. The tool presents the results of this analysis to modelers and model users in a number of ways: recommendations of who has expertise on particular code modules, suggestions for code sections related to the files being worked on, and visualizations of team communication patterns. The tool is currently available as a plugin for the Trac bug tracking system.
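The core of such a tool can be illustrated with a short sketch. This is not the actual TracSNAP implementation; it is a minimal, hypothetical example of how commit records mined from a repository could feed the two recommendations the abstract mentions: counting which authors touch each file (expertise) and which files change together (related code).

```python
from collections import Counter, defaultdict
from itertools import combinations

def analyze_commits(commits):
    """Build simple expertise and co-change indices from commit records.

    `commits` is a list of (author, files_touched) pairs -- a stand-in
    for data mined from a version control repository.
    """
    expertise = defaultdict(Counter)  # file -> Counter of authors who touched it
    cochange = defaultdict(Counter)   # file -> Counter of files changed with it
    for author, files in commits:
        unique_files = sorted(set(files))
        for f in unique_files:
            expertise[f][author] += 1
        for a, b in combinations(unique_files, 2):
            cochange[a][b] += 1
            cochange[b][a] += 1
    return expertise, cochange

def experts_for(expertise, filename, n=3):
    """Authors who most often committed changes to `filename`."""
    return [author for author, _ in expertise[filename].most_common(n)]

def related_files(cochange, filename, n=3):
    """Files most often changed in the same commit as `filename`."""
    return [f for f, _ in cochange[filename].most_common(n)]

# Hypothetical commit history (file names are illustrative only):
commits = [
    ("alice", ["ocean.f90", "coupler.f90"]),
    ("bob",   ["ocean.f90"]),
    ("alice", ["ocean.f90"]),
    ("carol", ["atmos.f90", "coupler.f90"]),
]
expertise, cochange = analyze_commits(commits)
```

Real repository mining would also need to weight recent commits more heavily and filter out bulk or merge commits, but the same two indices underlie both recommendation features.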

Most scientific artifacts are never made available outside the lab that produced them, which reduces credibility and discourages collaboration. Some scientists have begun to advocate for reproducibility, open science, and computational provenance to address this problem, but there is no consolidated effort within the scientific community. There does not appear to be any consensus yet on the goals of an open science effort, and little understanding of the barriers. Hence we need to understand the views of the key stakeholders – the scientists who create and use these artifacts.

The goal of our research is to establish a baseline and categorize the views of experimental scientists on the topics of reproducibility, credibility, scooping, data sharing, results sharing, and the effectiveness of the peer review process. We gathered the opinions of scientists on these issues through a formal questionnaire and analyzed their responses by topic.

We found that scientists see a provenance problem in their communications with the public. For example, results are published separately from supporting evidence and detailed analysis. Furthermore, although scientists are enthusiastic about collaborating and openly sharing their data, they do not do so out of fear of being scooped. We discuss these serious challenges for the reproducibility, open science, and computational provenance movements.

A climate model is an executable theory of the climate; the model encapsulates climatological theories in software so that they can be simulated and their implications investigated directly. Thus, in order to trust a climate model, one must trust that the software from which it is built is robust. Our study explores the nature of software quality in the context of climate modelling: How do we characterise and assess the quality of climate modelling software? We use two major research strategies: (1) analysis of defect densities of leading global climate models and (2) semi-structured interviews with researchers from several climate modelling centres. Defect density analysis is an established software engineering technique for studying software quality. We collected our defect data from bug tracking systems, version control repository comments, and from static analysis of the source code. As a result of our analysis, we characterise common defect types found in climate model software and we identify the software quality factors that are relevant for climate scientists. We also provide a roadmap to achieve proper benchmarks for climate model software quality, and we discuss the implications of our findings for the assessment of climate model software trustworthiness.
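For readers unfamiliar with the metric: defect density is conventionally reported as defects per thousand source lines of code (KLOC). A minimal sketch, with purely illustrative numbers (not figures from the study):

```python
def defect_density(defect_count, sloc):
    """Defects per thousand source lines of code (KLOC).

    `defect_count` is the number of confirmed defects attributed to a
    module; `sloc` is its size in source lines of code.
    """
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return defect_count / (sloc / 1000.0)

# Illustrative example: 24 defects found in a 12,000-line module
density = defect_density(24, 12000)  # -> 2.0 defects/KLOC
```

In practice the hard part is not the arithmetic but the counting: deciding what qualifies as a defect, and attributing bug reports and repository fixes to the right body of code, which is why the study draws on bug trackers, commit comments, and static analysis together.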

Yes, for the geosciences, journals are the main publication venues – the conferences just have abstracts with minimal peer review (I think they just check for relevance), and accept them for presentation as either posters or talks. The poster sessions are on an incredible scale – thousands of posters set up simultaneously each day (but then the AGU meeting is probably one of the biggest scientific meetings in the world – 15,000 geoscientists are expected).

I have enjoyed wandering around the poster sessions at previous geosciences conferences, and have had some fascinating conversations as a result. It leads to much more interactivity than the more presentation-based conferences we have in SE, but it also requires the participants to be much more proactive at seeking out interesting posters and asking the presenters about their work.