Atmospheric CO2

Ben Bond-Lamberty and Corinne Hartin

2016-07-29

One common use for CMIP5 data, and the RCMIP5 package, is to extract the global mean of some variable of interest for particular models, experiments, and/or ensembles. Here we give an example of this, computing the global mean atmospheric CO2 value by year.

Make sure the required data are present

As the package documentation notes (see e.g. https://github.com/JGCRI/RCMIP5), you first have to download the necessary data from an Earth System Grid Federation node; RCMIP5 won’t do this for you. Assuming this step is complete, we then check that all the data we’ll need are present. First load information about all the CMIP5 files available:

Here we see that two ensembles are available, both from the ‘rcp85’ experiment run by the ‘CanESM2’ model. Crucially, both are complete: the allHere flag is TRUE (it would be FALSE if, for example, we’d downloaded 2006-2049 and 2051-2100 but forgotten year 2050).
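The completeness check behind the allHere flag can be illustrated with a small base R sketch. This is not RCMIP5’s own implementation: years_complete is a hypothetical helper, and the start and end years are typed in by hand rather than parsed from file names.

```r
# Hypothetical helper (not part of RCMIP5): returns TRUE if the files' year
# spans cover every year from the earliest start to the latest end.
years_complete <- function(start_years, end_years) {
  stopifnot(length(start_years) == length(end_years),
            all(start_years <= end_years))
  # Expand each file's span into the individual years it covers
  covered <- unlist(Map(seq, start_years, end_years))
  expected <- seq(min(start_years), max(end_years))
  all(expected %in% covered)
}

# Files for 2006-2049 and 2051-2100 were downloaded, but 2050 was forgotten
gap <- years_complete(c(2006, 2051), c(2049, 2100))

# Files for 2006-2050 and 2051-2100: nothing missing
ok <- years_complete(c(2006, 2051), c(2050, 2100))

print(c(gap = gap, ok = ok))  # gap is FALSE, ok is TRUE
```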

Load the data

Time to load these data into memory. To do so we use the loadCMIP5 function, telling it to load the ‘co2’ variable from the ‘CanESM2’ model running the ‘rcp85’ experiment. Note that here we’re using the yearRange parameter to limit the data to five years:

loadCMIP5 reports that there are two ensembles to average (r1i1p1 and r2i1p1), the name of each file it loads, the dimension names, the fact that it’s only reading the first five years, and the overall data array size: 128 (lon) x 64 (lat) x 22 (plev, which will be renamed ‘Z’ when read in) x 60 (time). These data are converted to a data frame for fast processing in later steps.

Here we see the name of the variable, model, experiment, the fact that this came from two ensembles, the data range and spatial dimensions, time dimensions and frequency (‘mon’ - monthly), size in memory, and a note about the provenance (see below). Everything looks good. Assuming the ggplot2 package is installed, we might want to make a quick plot of the data to check:

> worldPlot(co2, time=1:12) # first 12 months, i.e. all of 2006

The resulting plot isn’t very interesting, as CO2 is a well-mixed gas, but would be useful if we were working with a spatially variable output.

Compute global means

There are three steps in our processing chain. First, we want to filter the data: this model reports atmospheric CO2 for 22 different levels (the ‘Z’ dimension), and we’re only interested in the level closest to the surface. Second, these are monthly data, and we’re interested in an annual average. Third, we want to reduce the gridded data to a global mean value. This can be done as follows:
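To make the three steps concrete, here is a minimal base R sketch that applies them to a small synthetic array rather than to real CMIP5 data. It does not use RCMIP5 at all. The grid size, the synthetic values, the choice of the first Z index as the near-surface level, and the cosine-of-latitude weights (a stand-in for grid cell area) are all assumptions made for illustration.

```r
set.seed(1)
nlon <- 8; nlat <- 4; nz <- 3; nyears <- 5
ntime <- nyears * 12                           # monthly time steps
lat <- seq(-67.5, 67.5, length.out = nlat)     # cell-centre latitudes, degrees

# Synthetic 4-D array with dimensions lon x lat x Z x time
x <- array(rnorm(nlon * nlat * nz * ntime, mean = 400, sd = 1),
           dim = c(nlon, nlat, nz, ntime))

# Step 1: filter to a single level (assumed here to be the first Z index)
surface <- x[, , 1, ]                          # now lon x lat x time

# Step 2: average the 12 monthly values within each year
year_index <- rep(seq_len(nyears), each = 12)
annual <- array(NA_real_, dim = c(nlon, nlat, nyears))
for (y in seq_len(nyears)) {
  annual[, , y] <- apply(surface[, , year_index == y], c(1, 2), mean)
}

# Step 3: area-weighted global mean, one value per year.
# Every row of w holds cos(latitude), so w[i, j] is the weight of cell (i, j).
w <- matrix(cos(lat * pi / 180), nrow = nlon, ncol = nlat, byrow = TRUE)
global_mean <- apply(annual, 3, function(slice) weighted.mean(slice, w))

# Collect the result in an ordinary data frame
co2_df <- data.frame(year = 2006:2010, value = global_mean)
print(co2_df)
```

The weighting matters because cells on a regular longitude-latitude grid shrink towards the poles; an unweighted mean would over-represent high latitudes.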

The summary reports that co2summary is an annual summary of filtered data, and a weighted mean (from the makeGlobalStat operation). Note the dimensions of the data: no longitude or latitude (since we summarized these to a global mean); a single level (Z), the one we filtered to; and 5 time points. The summary also explicitly tells us that these data have been filtered and summarized. At this point, we may want to convert this object to a regular data frame for easy plotting:

Provenance

With each operation we perform, RCMIP5 adds entries to the object’s provenance, a data frame that records various information about the data’s history. If we look at the provenance of the co2summary object, we see a record of the steps performed that produced it:

The provenance also records what function (with parameter values) wrote each message, the data dimensions at each step, and a checksum (MD5 hash) of the data. The provenance can be exported, saved alongside the data, etc.
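The idea of a provenance log can be sketched in a few lines of base R. This is an illustration only, not RCMIP5’s internal format: data_md5 and add_provenance are hypothetical helpers, the column names are invented, and the checksum is computed by writing the values to a temporary file so that tools::md5sum can be used.

```r
# Hypothetical helper: MD5 checksum of a numeric vector or array,
# computed by writing its values to a temporary binary file.
data_md5 <- function(x) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(as.numeric(x), f)
  unname(tools::md5sum(f))
}

# Hypothetical helper: append one processing step to a provenance data frame
add_provenance <- function(prov, caller, message, x) {
  dims <- if (is.null(dim(x))) length(x) else dim(x)
  entry <- data.frame(
    timestamp = format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
    caller    = caller,                       # which function wrote the entry
    message   = message,                      # what it did
    dim       = paste(dims, collapse = " x "),
    md5       = data_md5(x),                  # checksum of the data at this step
    stringsAsFactors = FALSE
  )
  rbind(prov, entry)
}

x <- array(seq_len(24), dim = c(2, 3, 4))
prov <- add_provenance(NULL, "load", "Loaded synthetic data", x)

x_mean <- apply(x, 3, mean)
prov <- add_provenance(prov, "summarize", "Averaged over first two dimensions", x_mean)

print(prov[, c("caller", "message", "dim")])
```

Recording a checksum at each step makes it possible to confirm later that a saved object is the one the log describes.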

Speed considerations

The CMIP5 data files can be very large, such that loading (not to mention processing) them is slow or impossible even on computers with plenty of memory. Some tips on dealing with this:

Use the yearRange parameter of loadCMIP5() to process smaller chunks of the data, re-combining only at the end. For the largest CMIP5 data sets, we’ve found it sometimes necessary to process 1-5 years at a time, even on a powerful machine.

If combining results as you go, don’t repeatedly rbind() data frames together. Either write data out to a tempfile, or pre-allocate one results array beforehand.

Order of operations matters! Process the most expensive (largest) dimensions first; for example, makeGlobalStat() should usually come before makeAnnualStat().
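The first two tips can be combined as in the following base R sketch. Here process_chunk is a hypothetical stand-in that returns synthetic values; in real use it would wrap a loadCMIP5() call restricted by yearRange, followed by the summary functions. The year span and chunk size are arbitrary.

```r
# Hypothetical stand-in for loading and summarizing one chunk of years.
# Returns one (synthetic) value per year.
process_chunk <- function(years) {
  400 + 2 * (years - 2006)
}

all_years <- 2006:2025
chunk_size <- 5

# Split the years into consecutive chunks of at most chunk_size years
chunks <- split(all_years, ceiling(seq_along(all_years) / chunk_size))

# Pre-allocate the full results vector once, then fill it in place,
# instead of growing a data frame with repeated rbind() calls
results <- rep(NA_real_, length(all_years))
names(results) <- all_years

for (yrs in chunks) {
  results[as.character(yrs)] <- process_chunk(yrs)
}

print(head(results))
```

Because only one chunk is held in memory at a time, peak memory use depends on the chunk size rather than on the full length of the record.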

Next steps and final notes

As noted above, we can simply use as.data.frame to convert our cmip5data object to a standard data frame. We could also saveNetCDF(co2summary) to save it as a Network Common Data Format (NetCDF) file, for example to send to a colleague; the provenance is saved as NetCDF global attributes, and optionally as an accompanying text file.

Given the above example, it’s straightforward to process a whole collection of CMIP5 data, such as all the co2 data across experiments and models. For example:
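A minimal sketch of such a loop, in base R, might look like the following. The function process_one is a hypothetical stand-in that returns synthetic values; in real use it would load the data for one model and experiment and reduce them to annual global means as described above. The name "ModelB" is a placeholder, and the list of experiments is illustrative.

```r
# Hypothetical stand-in for the per-combination processing chain.
# Returns a small data frame of synthetic annual values.
process_one <- function(model, experiment) {
  data.frame(model = model,
             experiment = experiment,
             year = 2006:2010,
             value = 400 + seq_len(5),
             stringsAsFactors = FALSE)
}

models <- c("CanESM2", "ModelB")        # "ModelB" is a placeholder name
experiments <- c("rcp45", "rcp85")      # illustrative choice of experiments

# Every model x experiment combination, one per row
combos <- expand.grid(model = models, experiment = experiments,
                      stringsAsFactors = FALSE)

# Pre-allocate a list, fill it, and combine once at the end
results_list <- vector("list", nrow(combos))
for (i in seq_len(nrow(combos))) {
  results_list[[i]] <- process_one(combos$model[i], combos$experiment[i])
}
all_results <- do.call(rbind, results_list)

print(head(all_results))
```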