The IPCC model simulation archive

In the lead up to the 4th Assessment Report, all the main climate modelling groups (17 of them at last count) made a series of coordinated simulations for the 20th Century and various scenarios for the future. All of this output is publicly available in the PCMDI IPCC AR4 archive (now officially called the CMIP3 archive, in recognition of the two previous, though less comprehensive, collections). We’ve mentioned this archive before in passing, but we’ve never really discussed what it is, how it came to be, how it is being used and how it is (or should be) radically transforming the comparisons of model output and observational data.
First off, it’s important to note that this effort was not organised by the IPCC itself. Instead, it was coordinated by the Working Group on Coupled Modelling (WGCM), an unpaid committee that is part of an alphabet soup of committees, nominally run by the WMO, that try to coordinate all aspects of climate-related research. In the lead up to AR4, WGCM took up the task of deciding what the key experiments would be, what would be requested from the modelling groups and how the data archive would be organised. This was highly non-trivial, and adjustments to the data requirements were still being made right up until the last minute. While this may seem arcane, or even boring, the point I’d like to make is that just ‘making data available’ is the least of the problems in making data useful. There was a good summary of the process in the Bulletin of the American Meteorological Society last month.

Previous efforts to coordinate model simulations had come up against two main barriers: getting the modelling groups to participate and making sure enough data was saved that useful work could be done.

Modelling groups tend to work in cycles. That is, there will be a period of a few years of development of a new model, then a year or two of analysis and use of that model, until there is enough momentum and new ideas to upgrade the model and start a new round of development. These cycles can be driven by purchasing policies for new computers, staff turnover, general enthusiasm, developmental delays etc. and until recently were unique to each modelling group. When new initiatives are announced (and they come roughly once every six months), the decision of the modelling group to participate depends on where they are in their cycle. If they are in the middle of the development phase, they will likely not want to use their last model (because the new one will almost certainly be better), but they might not be able to use the new one either because it just isn’t ready. These phasing issues definitely impacted earlier attempts to produce model output archives.

What was different this time round is that the IPCC timetable has, after almost 20 years, managed to synchronise development cycles such that, with only a couple of notable exceptions, most groups were ready with their new models early in 2004 – which is when these simulations needed to start if the analysis was going to be available for the AR4 report being written in 2005/6. (It’s interesting to compare this with nonlinear phase synchronisation in, for instance, fireflies).

The other big change this time around was the amount of data requested. The diagnostics in previous archives had been relatively sparse – the main atmospheric variables (temperature, precipitation, winds etc.) but not huge amounts extra, and generally only at monthly resolution. This had limited the usefulness of the previous archives because if something interesting was seen, it was almost impossible to diagnose why it had happened without having access to more information. This time, the diagnostic requests for the atmospheric, ocean, land and ice were much more extensive and a significant amount of high-frequency data was asked for as well (i.e. 6 hourly fields). For the first time, this meant that outsiders could really look at the ‘weather’ regimes of the climate models.

The work involved in these experiments was significant and unfunded. At GISS, the simulations took about a year to do. That includes a few partial do-overs to fix small problems (like an inadvertent mis-specification of the ozone depletion trend), the processing of the data, the transfer to PCMDI and the ongoing checking to make sure that the data was what it was supposed to be. The amount of data was so large – about a dozen different experiments, a few ensemble members for most experiments, large amounts of high-frequency data – that transferring it to PCMDI over the internet would have taken years. Thus, all the data was shipped on terabyte hard drives.

Once the data was available from all the modelling groups (all in consistent netcdf files with standardised names and formatting), a few groups were given some seed money from NSF/NOAA/NASA to get cracking on various important comparisons. However, the number of people who have registered to use the data (more than 1000) far exceeded the number of people who were actually being paid to look at it. Although some of the people who were looking at the data were from the modelling groups, the vast majority were from the wider academic community and for many it was the first time that they’d had direct access to raw GCM output.

With that influx of new talent, many innovative diagnostics were examined. Many, indeed, that hadn’t been looked at by the modelling groups themselves, even internally. It is possibly under-appreciated that the number of possible model-data comparisons far exceeds the capacity of any one modelling center to examine them.

The advantage of the database is the ability to address a number of different kinds of uncertainty: not everything of course, but certainly more than was available before. Specifically, these are the uncertainty in distinguishing forced and unforced variability and the uncertainty due to model imperfections.

When comparing climate models to reality the first problem to confront is the ‘weather’, defined loosely as the unforced variability (that exists on multiple timescales). Any particular realisation of a climate model simulation, say of the 20th Century, will have a different sequence of weather – that is, the weather pattern on Jan 31, 1967 in one realisation will be uncorrelated to the weather pattern on Jan 31, 1967 in another realisation, even though each run has the same climate forcing (increases in greenhouse gases, volcanoes etc.). There is no expectation that the weather in any one model will be correlated to that in the real world either. So any comparison of climate models and data needs to estimate the amount of change that is due to the weather and the amount related to the forcing. In the real world, that is difficult because there is certainly a degree of unforced variability even at decadal scales (and possibly longer). However, in the model archive it is relatively easy to distinguish.

The standard trick is to look at the ensemble of model runs. If each run has different, uncorrelated weather, then averaging over the different simulations (the ensemble mean) gives an estimate of the underlying forced change. Normally this is done for one single model and for metrics like the global mean temperature, only a few ensemble members are needed to reduce the noise. For other metrics – like regional diagnostics – more ensemble members are required. There is another standard way to reduce weather noise, and that is to average over time, or over specific events. If you are interested in the impact of volcanic eruptions, it is basically equivalent to run the same eruption 20 times with different starting points, or collect together the response of 20 different eruptions. The same can be done with the response to El Niño for instance.
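The ensemble-mean trick can be illustrated with a minimal sketch. Everything here is synthetic and assumed for illustration: the linear 'forced' trend, the Gaussian 'weather' noise, the parameter values and the function names are not taken from any model in the archive.

```python
import math
import random
import statistics


def simulate_run(n_years, trend_per_year, noise_sd, rng):
    """One synthetic realisation: a linear forced trend plus uncorrelated 'weather' noise."""
    return [trend_per_year * t + rng.gauss(0.0, noise_sd) for t in range(n_years)]


def ensemble_mean(runs):
    """Average across realisations, year by year."""
    return [statistics.fmean(values) for values in zip(*runs)]


def rms_error(series, trend_per_year):
    """Root-mean-square departure of a series from the known forced trend."""
    return math.sqrt(statistics.fmean(
        (x - trend_per_year * t) ** 2 for t, x in enumerate(series)
    ))


rng = random.Random(0)
N_YEARS, TREND, NOISE_SD = 100, 0.01, 0.15  # illustrative values only

# Twenty realisations share the same forcing but have different weather.
runs = [simulate_run(N_YEARS, TREND, NOISE_SD, rng) for _ in range(20)]
single_rms = rms_error(runs[0], TREND)
mean_rms = rms_error(ensemble_mean(runs), TREND)

print(f"single run, RMS departure from forced trend:     {single_rms:.3f}")
print(f"20-member mean, RMS departure from forced trend: {mean_rms:.3f}")
```

With uncorrelated noise, the residual noise in the mean should shrink roughly as one over the square root of the number of members, which is why a noisy regional diagnostic needs more members than the global mean temperature does.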

With the new archive though, people have tried something new – averaging the results of all the different models. This is termed a meta-ensemble, and at first thought it doesn’t seem very sensible. Unlike the weather noise, the difference between models is not drawn from a nicely behaved distribution, the models are not independent in any solidly statistical sense, and no-one really thinks they are all equally valid. Thus many of the pre-requisites for making this mathematically sound are missing, or at best, unquantified. Expectations from a meta-ensemble are therefore low. But, and this is a curious thing, it turns out that the meta-ensemble of all the IPCC simulations actually outperforms any single model when compared to the real world. That implies that at least some part of the model differences is in fact random and can be cancelled out. Of course, many systematic problems remain even in a meta-ensemble.
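A toy calculation shows why averaging across models can help even when the statistical prerequisites are shaky. It is only a sketch under strong assumptions: each synthetic 'model' is the reference field plus an error shared by all models plus an independent error of its own. The names and numbers are hypothetical and say nothing about how the real models differ.

```python
import math
import random
import statistics

rng = random.Random(42)
N_POINTS, N_MODELS = 200, 10
SHARED_BIAS = 0.3      # error common to every model (assumed, illustrative)
MODEL_ERROR_SD = 1.0   # size of each model's own independent error (assumed)

# Synthetic reference field and ten synthetic 'models' of it.
truth = [rng.gauss(0.0, 1.0) for _ in range(N_POINTS)]
models = [
    [x + SHARED_BIAS + rng.gauss(0.0, MODEL_ERROR_SD) for x in truth]
    for _ in range(N_MODELS)
]


def rmse(simulated, observed):
    """Root-mean-square error of one field against the reference."""
    return math.sqrt(statistics.fmean(
        (s - o) ** 2 for s, o in zip(simulated, observed)
    ))


# Point-by-point average over all models (the 'meta-ensemble' mean).
multi_model_mean = [statistics.fmean(values) for values in zip(*models)]
individual_rmse = [rmse(m, truth) for m in models]
mean_rmse = rmse(multi_model_mean, truth)

print(f"best single model RMSE: {min(individual_rmse):.2f}")
print(f"multi-model mean RMSE:  {mean_rmse:.2f}")
```

In this setup the independent part of the error averages down while the shared bias does not, so the multi-model mean beats every individual model but cannot do better than the error they have in common.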

There are lots of ongoing attempts to refine this. What happens if you try and exclude some models that don’t pass an initial screening? Can you weight the models in an optimum way to improve forecasts? Unfortunately, there doesn’t seem to be any universal way to do this despite a few successful attempts. More research on this question is definitely needed.

Note however that the ensemble or meta-ensemble mean only gives a measure of the central tendency or forced component. It does not help answer the question of whether the models are consistent with any observed change. For that, one needs to look at the spread of the model simulations, noting that each simulation is a potential realisation of the underlying assumptions in the models. Do not, for instance, confuse the uncertainty in the estimate of the ensemble mean with the spread!
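The distinction between the spread and the uncertainty in the ensemble mean is easy to see numerically. The sketch below uses made-up trend values and a made-up 'observed' trend, and the factor of two is just a rough rule of thumb assumed for illustration, not a formal test.

```python
import math
import statistics


def spread_and_standard_error(trends):
    """Return (ensemble mean, spread, standard error of the mean) for a list of trends."""
    mean = statistics.fmean(trends)
    spread = statistics.stdev(trends)            # sample standard deviation across runs
    std_error = spread / math.sqrt(len(trends))  # uncertainty in the mean itself
    return mean, spread, std_error


# Hypothetical trends (degC/decade) from eight individual runs.
trends = [0.12, 0.25, 0.18, 0.31, 0.09, 0.22, 0.27, 0.16]
observed = 0.10  # hypothetical observed trend

mean, spread, std_error = spread_and_standard_error(trends)

# The observation is one realisation, so it should be judged against the spread.
inside_spread = abs(observed - mean) <= 2 * spread
# Judging it against the standard error asks a different (and here misleading) question.
inside_std_error = abs(observed - mean) <= 2 * std_error

print(f"mean={mean:.2f}  spread={spread:.3f}  standard error={std_error:.3f}")
print(f"within 2 x spread: {inside_spread}; within 2 x standard error: {inside_std_error}")
```

The standard error shrinks as more runs are added, while the spread does not, so a test based on the standard error would eventually reject any observation that is not exactly on the ensemble mean.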

Particularly important simulations for model-data comparisons are the forced coupled-model runs for the 20th Century, and ‘AMIP’-style runs for the late 20th Century. ‘AMIP’ runs are atmospheric model runs that impose the observed sea surface temperature conditions instead of calculating them with an ocean model, optionally using other forcings as well. They are particularly useful if it matters that you get the timing and amplitude of El Niño correct in a comparison. No more need the question be asked ‘what do the models say?’ – you can ask them directly.

A comparison is useful only if it really provides a constraint on the models, and there are plenty of good examples of this. The ideal diagnostics are robust in the models, not too affected by weather, and can be estimated in the real world, e.g. Ben Santer’s paper on tropospheric trends and the discussion we had on global dimming trends; the AR4 report is full of more examples. What isn’t useful are short period and/or limited area diagnostics for which the ensemble spread is enormous.

CMIP3 2.0?

In such a large endeavor, it’s inevitable that not everything is done to everyone’s satisfaction and that in hindsight some opportunities were missed. The following items should therefore be read as suggestions for next time around, and not as criticisms of the organisation this time.

Initially the model output was only accessible to people who had registered and had a specific proposal to study the data. While this makes some sense in discouraging needless duplication of effort, it isn’t necessary and discourages the kind of casual browsing that is useful for getting a feel for the output or spotting something unexpected. However, the archive will soon be available with no restrictions and hopefully that setup can be maintained for other archives in future.

Another issue with access is the sheer amount of data and the relative slowness of downloading data over the internet. Here some lessons could be taken from more popular high-bandwidth applications. Reducing time-to-download for videos or music has relied on distributed access to the data. Applications like BitTorrent manage download speeds that are hugely faster than direct downloads because you end up getting data from dozens of locations at the same time, from people who’d downloaded the same thing as you. Therefore the more popular an item, the quicker it is to download. There is much that could be learned from this data model.

The other way to reduce download times is to make sure that you only download what is wanted. If you only want a time series of global mean temperatures, you shouldn’t need to download the two-dimensional field and create your own averages. Thus for many purposes, automatic global, zonal-mean or vertical averaging would have saved an enormous amount of time.
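For reference, the averaging a user currently has to do by hand after downloading a full two-dimensional field looks something like the sketch below. It assumes a regular latitude-longitude grid stored as a list of rows and uses cosine-of-latitude weights as a stand-in for grid-cell area; the function name and data layout are hypothetical, not part of any archive tool.

```python
import math


def global_mean(field, lats_deg):
    """Area-weighted mean of a regular lat-lon field.

    field[i][j] is the value at latitude lats_deg[i] and the j-th longitude.
    Grid-cell area on a regular grid is approximated by cos(latitude).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))
        zonal_mean = sum(row) / len(row)  # all longitudes at one latitude weigh the same
        weighted_sum += weight * zonal_mean
        weight_total += weight
    return weighted_sum / weight_total


# Tiny made-up example: three latitude bands, four longitudes.
lats = [-60.0, 0.0, 60.0]
field = [
    [0.0, 0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0, 10.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(global_mean(field, lats))
```

An unweighted average of this example would give 10/3, whereas the area weighting gives a value close to 5 because the equatorial band covers more of the globe. Serving that single number per time step instead of the full field is the saving being suggested.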

Finally, the essence of the Web 2.0 movement is interactivity – consumers can also be producers. In the current CMIP3 setup, the modelling groups are the producers but the return flow of information is rather limited. People who analyse the data have published many interesting papers (over 380 and counting) but their analyses have not been ‘mainstreamed’ into model development efforts. For instance, there is a great paper by Lin et al on tropical intra-seasonal variability (such as the Madden-Julian Oscillation) in the models. Their analysis was quite complex and would be a useful addition to the suite of diagnostics regularly tested in model development, but it is impractical to expect Dr. Lin to just redo his analysis every time the models change. A better model would be for the archive to host the analysis scripts as well so that they could be accessed as easily as the data. There are of course issues of citation with such an idea, but they needn’t be insuperable. In a similar way, how many times did different people calculate the NAO or Niño 3.4 indices in the models? Having some organised user-generated content could have saved a lot of time there.
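As an example of the kind of small script that could be shared alongside the data, here is a simplified sketch of a Niño 3.4 calculation. The box (5°S to 5°N, 170°W to 120°W) is the standard definition of the region; the data layout, the function names and the choice of climatology (each calendar month's mean over the whole series, with no running mean applied) are assumptions made to keep the example short.

```python
import math

NINO34_LAT = (-5.0, 5.0)     # 5S to 5N
NINO34_LON = (190.0, 240.0)  # 170W to 120W, expressed in degrees east


def box_mean(field, lats, lons, lat_range=NINO34_LAT, lon_range=NINO34_LON):
    """Cosine-latitude weighted mean of field[i][j] over the grid points inside the box."""
    total = 0.0
    weight_total = 0.0
    for i, lat in enumerate(lats):
        if not lat_range[0] <= lat <= lat_range[1]:
            continue
        weight = math.cos(math.radians(lat))
        for j, lon in enumerate(lons):
            # The modulo accepts longitudes given as -180..180 or 0..360.
            if lon_range[0] <= lon % 360.0 <= lon_range[1]:
                total += weight * field[i][j]
                weight_total += weight
    return total / weight_total


def nino34_index(monthly_sst, lats, lons):
    """Box-mean SST anomaly relative to each calendar month's mean over the series.

    monthly_sst is a list of 2-D fields, one per month, covering at least one year.
    """
    box = [box_mean(f, lats, lons) for f in monthly_sst]
    climatology = [sum(box[m::12]) / len(box[m::12]) for m in range(12)]
    return [value - climatology[t % 12] for t, value in enumerate(box)]


# Made-up two-year series: a seasonal cycle, with the second year 1 degree warmer.
lats = [-2.5, 2.5]
lons = [200.0, 220.0, 100.0]  # the last longitude lies outside the box
series = []
for t in range(24):
    inside = 26.0 + 0.1 * (t % 12) + (1.0 if t >= 12 else 0.0)
    series.append([[inside, inside, 999.0] for _ in lats])

index = nino34_index(series, lats, lons)
print([round(v, 2) for v in index])
```

Hosting one agreed version of a function like this would mean the index is computed the same way for every model, rather than slightly differently in every paper.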

Maybe some of these ideas (and any others readers might care to suggest), could even be tried out relatively soon…

Conclusion

The diagnoses of the archive done so far are really only the tip of the iceberg compared to what could be done and it is very likely that the archive will be providing an invaluable resource for researchers for years. It is beyond question that the organisers deserve a great deal of gratitude from the community for having spearheaded this.

169 Responses to “The IPCC model simulation archive”

Geoff Beacon (150) — I am an amateur here, but enough of one to provide fairly good answers to your three questions. (I hope others will pitch in as well.)

1. Yes. CO2 is slow and persistent.

2. Methane has a short lifetime in the atmosphere; about 20 years should do it.

3. All the fossil fuel derived CO2 released into the atmosphere needs to be removed, the sooner the better. That said, 20 years ought to be soon enough, although we don’t actually know all the damages done in that short time.

The answer to my question (150) might be changed by the size of feedbacks in the climate system. There have been several reports of positive feedbacks: failing carbon sinks, loss of soil carbon, methane from wetlands/tundra, more forest fires, the drying of the Amazon, the sea ice albedo effect, etc. There may also be anthropogenic feedbacks: turning up the air-conditioning in response to a warmer climate … but also turning down the heating.

Are these feedbacks too small to change the answer? If positive feedbacks exceed negative feedbacks, what extra CO2 must be removed at the end of the chosen period to reverse the net effect of these feedbacks?

The best simple answer I can give is to permanently sequester, as soon as possible, at least 350 GtC and stop adding (or else immediately sequester) the currently about 8.5 GtC being added yearly to the active carbon cycle. Then continual monitoring and scientific advances will discover what additional remediation is required.

I’d argue that 70 tons today is significantly worse than 70 tons over 70 years. The atmospheric half-life of carbon is long, but not infinite – today it might be 120 years; with feedbacks and sink saturation that might double, but 70 would still be a significant fraction.

You also have to consider the economic effect – the 70 tons over 70 years effectively delays the damage, and even if you think the discount rate on welfare should be zero, there’s still significant benefit from that due to time value of money.

Either way, the analysis has to extend to temperature effects (or better yet, impacts), not just atmospheric GHGs. “If one gigatonne of CO2 is released into the atmosphere now, how much CO2 must be extracted in 20 years time to counteract the effect of the initial release.” is a good question, as long as you mean achieving an equivalent welfare trajectory, not just an equivalent CO2 trajectory.

I played around with a simple integrated model to see what would happen. The results aren’t quite what I expected, but I think make sense. Essentially, I tried injecting a 100 GT pulse of carbon into the atmosphere over 1 or 70 years, starting in 2000 and running the model to 2200, with and without a modest temperature feedback to the carbon cycle.

Without temperature feedback to the carbon cycle:

For CO2 concentration, the 70yr pulse actually results in a higher concentration any time after about 2060, simply because the earth hasn’t had time to squirrel away the emissions from later years. However, the difference is not great. Either way, 75% of the stuff is still around in 2200 (not what you’d expect from average lifetimes, but I think reasonable due to diminishing marginal uptake).

For temperature, the picture is different. The 1yr pulse drives temperature about .16C above the no-pulse trajectory, with the effect peaking in 2040, and falling to about .06C by 2200. The 70yr pulse takes much longer to reach peak effect, a maximum of .11C in 2080, with about the same effect in 2200.

With temperature feedback to the carbon cycle:

The 1yr pulse triggers additional feedback emissions, such that in 2200 effectively 100% of the pulse is still around. The 70yr pulse still takes 60 yrs to reach the same level, but things end up in roughly a tie.

The temperature increase from the 1yr pulse is somewhat greater, peaking a little later at .17C. The 70yr trajectory never quite catches up.

The welfare effect is now 67% worse.

As a wild guess, this suggests that you could use a discount rate of 1 to 2% per year to convert between emissions today vs. emissions distributed over the future, without worrying too much about the feedbacks. That’s barely better than a wild guess and all the numbers above should be taken with a huge grain of salt. It ought to be possible to narrow things down without too much trouble.

I understand your point that the extended release delays the damage. I am happy, for the present, to assume a zero discount rate on welfare. But I would like to understand more about the “time value of money”. Is this a discount rate that has the rate of technological change as a component?

Let me stick to the example of building homes. Advances in construction will allow future homes to be built that have less embodied CO2 at similar cost. Consider a house of brick construction that is set back for five years so that it can be made from hemp and lime. Let us assume that this reduces the embodied CO2 from 70 tonnes to -10 tonnes … I have seen such claims. This would mean a saving of 80 tonnes of CO2 by delaying five years.

Discount rates can be a measure of the benefits from delaying expenditure. Would the “time value of money” account for the benefit from the delay, the “saved CO2”? If not, where can the “saved CO2” get into the discussion?

Concerning the variance in the weather data used as input to climate models, and the lack of correlation between them, I am wondering whether consideration has been given to constructing fixed weather data models, similar to what geographers use in constructing maps.

I am concerned that the European Union’s assessment of climate change may be compromised by an over-emphasis on “official science” and consequently on the current batch of computer models. Loss of arctic sea ice is not the worst consequence of climate change but it is an indicator of the amount by which “official science” underestimates reality. Predictions of the year in which the arctic loses its sea ice in summer, “Arctic Ice Zero”, are tests of the accuracy of current predictive models;

But this year the IPCC predicted Arctic Ice Zero in 2050.
Does this confirm that IPCC science is “two years out-of-date”?

Does the European Union have mechanisms for updating “official science”?

Yesterday, 12 December 2007, there was a report on the BBC website, “Arctic summers ice-free ‘by 2013’” (http://news.bbc.co.uk/2/hi/science/nature/7139797.stm).
So does the European Union have proper mechanisms for updating “official science”? Does the UK Government? Does anybody else?

I have skimmed the papers associated with the item you report. The main one is “Uncertainty in Climate Model Projections of Arctic Sea Ice Decline: An Evaluation Relevant to Polar Bears”. This does consider feedbacks associated with the albedo effect of sea and ice and cloud cover in the Arctic but not the other possible feedbacks. It is concerned with the Arctic sea ice extent and seems to take as a given the other aspects of general models.

I was wondering if the predicted extent of sea-ice could act as a test of these models. Have the models missed important temperature-related feedbacks that are just beginning to have their effects because the Earth’s temperature is just beginning to rise?

Have the models properly taken account of … positive feedbacks: failing carbon sinks, loss of soil carbon, methane from wetlands/tundra, more forest fires, the drying of the Amazon, the sea ice albedo effect, etc.?

I was surprised to hear that the models used for the recent IPCC reports did not include feedbacks for methane from melting tundra. This feedback may or may not be important but are other feedbacks missing?

If some models miss out some feedbacks, how is the whole ensemble changed?

I can’t see myself becoming a real climate scientist with a credible message for politicians, press or public. But I would like to pass one on.

I heard on the BBC news 24 today that glaciers are shrinking twice as fast as previously thought (the 9th of 10 items and lasting 9 seconds). Is this increased rate of melting a surprise? Does it indicate that “we’ve underestimated sensitivity or what carbon cycle feedbacks could do”?

The best we get from politicians is a grudging acceptance of IPCC AR4. Most journalists are worse, especially the BBC. Stephen Sackur interviewed Al Gore and Rajendra Pachauri a few months ago. Instead of challenging them with James Lovelock or James Hansen he challenged them with Bjorn Lomborg, an economist who uses a mid-range prediction from IPCC AR4 of 2.6 degC global temperature rise by 2100 to argue we can afford to wait.

If climate scientists have underestimated climate change then please let the rest of us know, so we can tell the economists, politicians, journalists and government officials the bad news.

Geoff Beacon (162) — I’m but an amateur regarding the very difficult subject of climatology. By now a moderately knowledgeable amateur, based upon a lifetime amateur interest in paleogeology. That said, in my opinion, IPCC has indeed underestimated climate change. From the IPCC 2001 TAR, the linked commentary points out that at that time they underestimated the temperature trend:

The IPCC AR4 is linked on the sidebar. They explicitly state that they have left out estimates for glacier and icecap melting, basically because so little is known. The projections of Arctic sea ice melting are off by decades (although I haven’t read that section of AR4, just commentary). Furthermore, AR4 states right out that the climate models do not do well in predicting precipitation patterns.

However, climatology is a rapidly developing subject: AR4 was obsolescent even before it was finished. Some climatologists are estimating larger climate changes than the (necessarily conservative) consensus position of IPCC.

I agree with James Hansen that not only must we stop adding carbon to the active carbon cycle, but that lots of carbon needs to be put back underground, securely and permanently. He suggests about 300-350 ppm, I believe. I suggest 315 ppm, largely because this was the concentration in the 1950s, not that I think it anything more than an important interim goal.

Eleven of the last twelve years (1995–2006) rank among the 12 warmest years in the instrumental record of global surface temperature (since 1850). The updated 100-year linear trend (1906–2005) of 0.74 [0.56 to 0.92]°C is therefore larger than the corresponding trend for 1901–2000 given in the TAR of 0.6 [0.4 to 0.8]°C.

(Chapter) 3.2 is referenced, and while various temperature records are referred to there, these figures appear to be from the HadCRU series.

The warming trend for 2001 – 2006 seems to be less pronounced than the cooling period 1901 – 1905.

As the cooling period 1901 – 1905 was omitted in the AR4 series, I wonder about the use of the word “therefore”. It has been suggested to me that the increased trend could be an artefact of the 5 year shift between reports.

That is, if the 1901 – 1905 cooling is removed, might that not “therefore” account (in part or whole) for the increased trend, rather than the “the 12 warmest years in the instrumental record”? (I’m no statistician and am unable to resolve this myself)

My questions are;

1) Are the figures in the SPM derived from HadCRU alone, or from a combination of that and other series (NCDC, GISS)?

2) Is the increased trend in any way an artefact of changing the period by 5 years (and if so, is the use of “therefore” in the SPM statement misleading)?

Though I am unable, my co-interlocutor may understand a sheer statistical response.

Barry, as the warming from 1995-2006 was part of a 30 year warming trend, I rather doubt that the start and end points affect the qualitative conclusion. Also, although the different temperature records use slightly different algorithms and so have slightly different values for a given year, the trends they show are consistent. Therefore, while magnitudes may vary slightly, it is doubtful that the important conclusion–we’re getting warmer–would be altered.