I went to a talk yesterday by Mark Pagani (Yale University), on the role of methane hydrates in the Paleocene-Eocene Thermal Maximum (PETM). The talk was focussed on how to explain the dramatic warming seen at the end of the Paleocene, 56 million years ago. During the Paleocene, the world was already much warmer than it is today (by around 5°C), and had been ice free for millions of years. But at the end of the Paleocene, the temperature shot up by at least another 5°C, over the course of a few thousand years, giving us a world with palm trees and crocodiles in the Arctic. This “thermal maximum” lasted around 100,000 years. The era brought a dramatic reduction in animal body size (although note: the dinosaurs had already been wiped out at the beginning of the Paleocene), and saw the emergence of small mammals.

But what explains the dramatic warming? The story is fascinating, involving many different lines of evidence, and I doubt I can do it justice without a lot more background reading. I’ll do a brief summary here, as I want to go on to talk about something that came up in the questions about climate sensitivity.

First, we know that the warming at the PETM coincided with a massive influx of carbon, and the fossil record shows a significant shift in carbon isotopes, so it was a new and different source of carbon. The resulting increase in CO2 warmed the planet in the way we would expect. But where did the carbon come from? The dominant hypothesis has been that it came from a sudden melting of undersea methane hydrates, triggered by tectonic shifts. But Mark explained that this hypothesis doesn’t add up, because there isn’t enough carbon to account for the observed shift in carbon isotopes, and it also requires a very high value for climate sensitivity (in the range 9-11°C), which is inconsistent with the IPCC estimates of 2-4.5°C. Some have argued this is evidence that climate sensitivity really is much higher, or perhaps that our models are missing some significant amplifiers of warming (see for instance, the 2008 paper by Zeebe et al., which caused a ruckus in the media). But, as Mark pointed out, this really misses the key point. If the numbers are inconsistent with all the other evidence about climate sensitivity, then it’s more likely that the methane hydrates hypothesis itself is wrong. Mark’s preferred explanation is a melting of the Antarctic permafrost, caused by a shift in orbital cycles, and indeed he demonstrated that the orbital pattern leads to similar spikes (of decreasing amplitude) throughout the Eocene. Prior to the PETM, Antarctica would have been ice free for so long that a substantial permafrost would have built up, and even conservative estimates based on today’s permafrost in the sub-Arctic regions would have enough carbon to explain the observed changes. (Mark has a paper on this coming out soon).

That was very interesting, but for me the most interesting part was in the discussion at the end of the talk. Mark had used the term “earth system sensitivity” instead of “climate sensitivity”, and Dick Peltier suggested he should explain the distinction for the benefit of the audience.

Mark began by pointing out that the real scientific debate about climate change (after you discount the crazies) is around the actual value of climate sensitivity, which is shorthand for the relationship between changes in atmospheric concentrations of CO2 and the resulting change in global temperature:

Figure: Key relationships in the climate system (adapted from a flickr image by ClimateSafety).

The term climate sensitivity was popularized in 1979 by the Charney report, and refers to the eventual temperature response to a doubling of CO2 concentrations, taking into account fast feedbacks such as water vapour, but not the slow feedbacks such as geological changes. Charney sensitivity also assumes everything else about the earth system (e.g. ice sheets, vegetation, ocean biogeochemistry, atmospheric chemistry, aerosols, etc) is held constant. The reason the definition refers to warming per doubling of CO2 is because the radiative effect of CO2 is roughly logarithmic, so you get about the same warming each time you double atmospheric concentrations. Charney calculated climate sensitivity to be 3°C (±1.5), a value that was first worked out in the 1950s, and hasn’t really changed, despite decades of research since then. Note: equilibrium climate sensitivity is also not the same as the transient response.
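To make the “same warming per doubling” point concrete, here is a minimal sketch. The function name and the 280 ppm baseline are my own illustrative assumptions, the 3°C default is just the Charney central estimate quoted above, and the calculation assumes a purely logarithmic response with everything else held constant. It gives the equilibrium response only, not the transient one.

```python
import math

def equilibrium_warming(co2_ppm, baseline_ppm=280.0, sensitivity=3.0):
    """Equilibrium warming (°C) for a change in CO2 concentration.

    Assumes a purely logarithmic response: every doubling of CO2
    relative to the baseline adds `sensitivity` degrees.
    The 280 ppm baseline is an illustrative assumption.
    """
    if co2_ppm <= 0 or baseline_ppm <= 0:
        raise ValueError("concentrations must be positive")
    doublings = math.log2(co2_ppm / baseline_ppm)
    return sensitivity * doublings

# One doubling and two doublings from the assumed baseline.
for ppm in (280, 560, 1120):
    print(ppm, "ppm ->", round(equilibrium_warming(ppm), 2), "°C")
```

Going from 280 to 560 ppm and from 560 to 1120 ppm each add the same increment, which is why the definition is phrased per doubling rather than per ppm.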

Earth System Sensitivity is then the expected change in global temperature in response to a doubling of CO2 when we do take into account all the other aspects of the earth system. This is much harder to estimate, because there is a lot more uncertainty around different kinds of interactions in the earth system. However, many scientists expect it to be higher than the Charney sensitivity, because, on balance, most of the known earth system feedbacks are positive (i.e. they amplify the basic greenhouse gas warming).

Mark put it this way: Earth System Sensitivity is like an accordion. It stretches out or contracts, depending on the current state of the earth system. For example, if you melt the arctic sea ice, this causes an amplifying feedback because white ice has a higher albedo than the dark sea water that replaces it. So if there’s a lot of ice to melt, it would increase earth system sensitivity. But if you’ve already melted all the sea ice, the effect is gone. Similarly, if the warming leads to a massive drying out and burning of vegetation, that’s another temporary amplification that will cease once you’ve burned off most of the forests. If you start the doubling in a warmer world, in which these feedbacks are no longer available, earth system sensitivity might be lower.

The key point is that, unlike Charney sensitivity, earth system sensitivity depends on where you start from. In the case of the PETM, the starting point for the sudden warming was a world that was already ice free. So we shouldn’t expect the earth system sensitivity to be the same as it is in the 21st century. Which certainly complicates the job of comparing climate changes in the distant past with those of today.

But, more relevantly for current thinking about climate policy, thinking in terms of Charney sensitivity is likely to be misleading. If earth system sensitivity is significantly bigger in today’s earth system, which seems likely, then calculations of expected warming based on Charney sensitivity will underestimate the warming, and hence underestimate the size of the necessary policy responses.

I’ll be giving a talk to the Toronto section of the IEEE Systems Council on December 1st, in which I plan to draw together several of the ideas I’ve been writing about recently on systems thinking and leverage points, and apply them to the problem of planetary boundaries. Come and join in the discussion if you’re around:

At the beginning of this month, the human population reached 7 billion people. The impact of humanity on the planet is vast: we use nearly 40% of the earth’s land surface to grow food, we’re driving other species to extinction at a rate not seen since the last ice age, and we’ve altered the planet’s energy balance by changing the atmosphere. In short, we’ve entered a new geological age, the Anthropocene, in which our collective actions will dramatically alter the inhabitability of the planet. We face an urgent task: we have to learn how to manage the earth as a giant system of systems, before we do irreparable damage. In this talk, I will describe some of the key systems that are relevant to this task, including climate change, agriculture, trade, energy production, and the global financial system. I will explore some of the interactions between these systems, and characterize the feedback cycles that alter their dynamics and affect their stability. This will lead us to an initial attempt to identify planetary boundaries for some of these systems, which together define a safe operating space for humanity. I will end the talk by offering a framework for thinking about the leverage points that may allow us to manage these systems to keep them within the safe operating limits.

I had several interesting conversations at WCRP11 last week about how different the various climate models are. The question is important because it gives some insight into how much an ensemble of different models captures the uncertainty in climate projections. Several speakers at WCRP suggested we need an international effort to build a new, best of breed climate model. For example, Christian Jakob argued that we need a “Manhattan project” to build a new, more modern climate model, rather than continuing to evolve our old ones (I’ve argued in the past that this is not a viable approach). There have also been calls for a new international climate modeling centre, with the resources to build much larger supercomputing facilities.

The counter-argument is that the current diversity in models is important, and re-allocating resources to a single centre would remove this benefit. Currently around 20 or so different labs around the world build their own climate models to participate in the model inter-comparison projects that form a key input to the IPCC assessments. Part of the argument for this diversity of models is that when different models give similar results, that boosts our confidence in those results, and when they give different results, the comparisons provide insights into how well we currently understand and can simulate the climate system. For assessment purposes, the spread of the models is often taken as a proxy for uncertainty, in the absence of any other way of calculating error bars for model projections.

But that raises a number of questions. How well does the current set of coupled climate models capture the uncertainty? How different are the models really? Do they all share similar biases? And can we characterize how model intercomparisons feed back into progress in improving the models? I think we’re starting to get interesting answers to the first two of these questions, while the last two are still unanswered.

First, then, is the question of representing uncertainty. There are, of course, a number of sources of uncertainty. [Note that ‘uncertainty’ here doesn’t mean ‘ignorance’ (a mistake often made by non-scientists); it means, roughly, how big should the error bars be when we make a forecast, or more usefully, what does the probability distribution look like for different climate outcomes?]. In climate projections, sources of uncertainty can be grouped into three types:

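Internal variability: the natural fluctuations of the climate system that occur even when the forcings don’t change, which make it hard to say what the climate will do in any particular year or decade.

Scenario uncertainty: the uncertainty over future carbon emissions, land use changes, and other types of anthropogenic forcings. As we really don’t know how these will change year-by-year in the future (irrespective of whether any explicit policy targets are set), it’s hard to say exactly how much climate change we should expect.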

Model uncertainty: the range of different responses to the same emissions scenario given by different models. Such differences arise, presumably, because we don’t understand all the relevant processes in the climate system perfectly. This is the kind of uncertainty that a large ensemble of different models ought to be able to assess.

Hawkins and Sutton analyzed the impact of these different types of uncertainty on projections of global temperature over the range of a century. Here, Fractional Uncertainty means the ratio of the model spread to the projected temperature change (against a 1971-2000 mean):

This analysis shows that for short term (decadal) projections, the internal variability is significant. Finding ways of reducing this (for example by better model initialization from the current state of the climate) is important for the kind of near-term regional projections needed by, for example, city planners, and utility and insurance companies. Hawkins & Sutton indicate with dashed lines some potential to reduce this uncertainty for decadal projections through better initialization of the models.

For longer term (century) projections, internal variability is dwarfed by scenario uncertainty. However, if we’re clear about the nature of the scenarios used, we can put scenario uncertainty aside and treat model runs as “what-if” explorations – if the emissions follow a particular pathway over the 21st Century, what climate response might we expect?

Model uncertainty remains significant over both short and long term projections. The important question here for predicting climate change is how much of this range of different model responses captures the real uncertainties in the science itself. In the analysis above, the variability due to model differences is about 1/4 of the magnitude of the mean temperature rise projected for the end of the century. For example, if a given emissions scenario leads to a model mean of +4°C, the model spread would be about 1°C, yielding a projection of +4±0.5°C. So is that the right size for an error bar on our end-of-century temperature projections? Or, to turn the question around, what is the probability of a surprise – where the climate change turns out to fall outside the range represented by the current model ensemble?
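The arithmetic in that example is easy to get backwards (spread versus half-spread), so here is a small sketch of it. The function name is hypothetical, and it assumes the spread is symmetric about the model mean, which is a simplification of my own rather than something taken from Hawkins & Sutton.

```python
def projection_range(mean_change, fractional_uncertainty):
    """Return (spread, low, high) for a projected temperature change.

    `fractional_uncertainty` is the ratio of the full model spread to
    the projected mean change. The spread is assumed to be symmetric
    about the mean (an illustrative simplification).
    """
    spread = fractional_uncertainty * mean_change
    half = spread / 2
    return spread, mean_change - half, mean_change + half

# Model mean of +4°C with a fractional uncertainty of 1/4.
spread, low, high = projection_range(4.0, 0.25)
print(f"spread {spread}°C, range {low} to {high}°C")
```

With a mean of +4°C and a fractional uncertainty of 1/4, this gives a spread of 1°C and a range of 3.5 to 4.5°C, i.e. the +4±0.5°C quoted above.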

Just as importantly, is the model ensemble mean the most likely outcome? Or do the models share certain biases so that the truth is somewhere other than the multi-model mean? Last year, James Annan demolished the idea that the models cluster around the truth, and in a paper with Julia Hargreaves, provided some evidence that the model ensembles do a relatively good job of bracketing the observational data, and, if anything, the ensemble spread is too broad. If the latter point is correct, then the model ensembles over-estimate the uncertainty.

This brings me to the question of how different the models really are. Over the summer, Kaitlin Alexander worked with me to explore the software architecture of some of the models that I’ve worked with from Europe and N. America. The first thing that jumped out at me when she showed me her diagrams was how different the models all look from one another. Here are six of them presented side-by-side. The coloured ovals indicate the size (in lines of code) of each major model component (relative to other components in the same model; the different models are not shown to scale), and the coloured arrows indicate data exchanges between the major components (see Kaitlin’s post for more details):

There are clearly differences in how the components are coupled together (for example, whether all data exchanges pass through a coupler, or whether components interact directly). In some cases, major subcomponents are embedded as subroutines within a model component, which makes the architecture harder to understand, but may make sense from a scientific point of view, when earth system processes themselves are tightly coupled. However, such differences in the code might just be superficial, as the choice of call structure should not, in principle, affect the climatology.

The other significant difference is in the relative sizes of the major components. Lines of code isn’t necessarily a reliable measure, but it usually offers a reasonable proxy for the amount of functionality. So a model with an atmosphere model dramatically bigger than the other components indicates a model for which far more work (and hence far more science) has gone into modeling the atmosphere than the other components.

Compare, for example, the relative sizes of the atmosphere and ocean components for HadGEM3 and IPSLCM5A, which, incidentally, both use the same ocean model, NEMO. HadGEM3 has a much bigger atmosphere model, representing more science, or at least many more options for different configurations. In part, this is because the UK Met Office is an operational weather forecasting centre, and the code base is shared between NWP and climate research. Daily use of this model for weather forecasting offers many opportunities to improve the skill of the model (although improvement in skill in short term weather forecasting doesn’t necessarily imply improvements in skill for climate simulations). However, the atmosphere model is the biggest beneficiary of this process, and, in fact, the UK Met Office does not have much expertise in ocean modeling. In contrast, the IPSL model is the result of a collaboration between several similarly sized research groups, representing different earth subsystems.

But do these architectural differences show up as scientific differences? I think they do, but I was finding this hard to analyze. Then I had a fascinating conversation at WCRP last week with Reto Knutti, who showed me a recent paper that he published with D. Masson, in which they analyzed model similarity from across the CMIP3 dataset. The paper describes a cluster analysis over all the CMIP3 models (plus three re-analysis datasets, to represent observations), based on how well they capture the full spatial field for temperature (on the left) and precipitation (on the right). The cluster diagrams look like this:

In these diagrams, the models from the same lab are coloured the same. Observational data are in pale blue (three observational datasets were included for temperature, and two for precipitation). Some obvious things jump out: the different observational datasets are more similar to each other than they are to any of the models, but as a cluster, they don’t look any different from the models. Interestingly, models from the same lab tend to be more similar to one another, even when these span different model generations. For example, for temperature, the UK Met Office models HadCM3 and HadGEM1 are more like each other than they are like any other models, even though they run at very different resolutions, and have different ocean models. For precipitation, all the GISS models cluster together and are quite different from all the other models.
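To give a feel for what “similarity of spatial fields” means here, the sketch below uses a much simpler stand-in than the paper’s cluster analysis: it computes the root-mean-square difference between each pair of fields and reports each field’s nearest neighbour. This is not Masson & Knutti’s method. The lab names, the four-point “grid” and all the numbers are made up purely for illustration.

```python
import math

def rms_distance(field_a, field_b):
    """Root-mean-square difference between two equally sized fields."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must have the same number of grid points")
    total = sum((a - b) ** 2 for a, b in zip(field_a, field_b))
    return math.sqrt(total / len(field_a))

def nearest_neighbours(fields):
    """Map each field name to the name of its closest other field."""
    result = {}
    for name, field in fields.items():
        others = (n for n in fields if n != name)
        result[name] = min(others, key=lambda n: rms_distance(field, fields[n]))
    return result

# Toy temperature fields (°C) on a 4-point grid.
# Hypothetical labs and made-up values, not real model output.
toy_fields = {
    "LabA-v1": [14.0, 15.0, 22.0, 27.0],
    "LabA-v2": [14.2, 15.1, 21.8, 27.3],
    "LabB-v1": [12.5, 16.0, 23.5, 25.0],
    "LabB-v2": [12.8, 16.2, 23.1, 25.4],
}

for name, closest in nearest_neighbours(toy_fields).items():
    print(name, "is closest to", closest)
```

In this toy data each model’s nearest neighbour is the other model from the same hypothetical lab, which is the pattern described above for HadCM3 and HadGEM1. A real analysis would use full gridded fields with area weighting and a proper hierarchical clustering, neither of which is attempted here.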

The overall conclusion from this analysis is that using models from just one lab (even in very different configurations, and across model generations) gives you a lot less variability than using models from different labs. Which does suggest that there’s something in the architectural choices made at each lab that leads to a difference in the climatology. In the paper, Masson & Knutti go on to analyze perturbed physics ensembles, and show that the same effect shows up here too. Taking a single model, and systematically varying the parameters used in the model physics still gives you less variability than using models from different labs.

There’s another followup question that I would like to analyze: do models that share major components tend to cluster together? There’s a growing tendency for a given component (e.g. an ocean model, an atmosphere model) to show up in more than one lab’s GCM. It’s not yet clear how this affects variability in a multi-model ensemble.

So what are the lessons here? First, there is evidence that the use of multi-model ensembles is valuable and important, and that these ensembles capture the uncertainty much better than multiple runs of a single model (no matter how it is perturbed). The evidence suggests that models from different labs are significantly different from one another both scientifically and structurally, and at least part of the explanation for this is that labs tend to have different clusters of expertise across the full range of earth system processes. Studies that compare model results with observational data (e.g. Hargreaves & Annan; Masson & Knutti) show that the observations look no different from just another member of the multi-model ensemble (or to put it in Annan and Hargreaves’ terms, the truth is statistically indistinguishable from another model in the ensemble).

It would appear that the current arrangement of twenty or so different labs competing to build their own models is a remarkably robust approach to capturing the full range of scientific uncertainty with respect to climate processes. And hence it doesn’t make sense to attempt to consolidate this effort into one international lab.