Before looking at predictions for the future I thought it was worth reviewing this claim, seeing as it is so prevalent and is presented as being the current consensus of climate science.

Droughts

SREX 2012, p. 171:

There is medium confidence that since the 1950s some regions of the world have experienced more intense and longer droughts (e.g., southern Europe, west Africa) but also opposite trends exist in other regions (e.g., central North America, northwestern Australia).

The report cites Sheffield and Wood 2008, who show graphs of a variety of drought metrics from around the world over the last 50 years:

From Sheffield & Wood 2008

Figure 1

The results above were calculated from models based on available meteorological data. According to their analysis some places have experienced more droughts, and other places fewer droughts. Because the results are based on models we can expect that other researchers may produce different results.

AR5, published a year after SREX, says, chapter 2, p. 214-215:

Because drought is a complex variable and can at best be incompletely represented by commonly used drought indices, discrepancies in the interpretation of changes can result. For example, Sheffield and Wood (2008) found decreasing trends in the duration, intensity and severity of drought globally. Conversely, Dai (2011a,b) found a general global increase in drought, although with substantial regional variation and individual events dominating trend signatures in some regions (e.g., the 1970s prolonged Sahel drought and the 1930s drought in the USA and Canadian Prairies). Studies subsequent to these continue to provide somewhat different conclusions on trends in global droughts and/or dryness since the middle of the 20th century (Sheffield et al., 2012; Dai, 2013; Donat et al., 2013c; van der Schrier et al., 2013)..

..In summary, the current assessment concludes that there is not enough evidence at present to suggest more than low confidence in a global-scale observed trend in drought or dryness (lack of rainfall) since the middle of the 20th century, owing to lack of direct observations, geographical inconsistencies in the trends, and dependencies of inferred trends on the index choice.

Based on updated studies, AR4 conclusions regarding global increasing trends in drought since the 1970s were probably overstated.

The paper by Dai is Drought under global warming: a review, A Dai, Climate Change (2011) – for some reason I am unable to access it.

A later paper in Nature, Trenberth et al 2013 (including both Sheffield and Dai as co-authors) said:

Two recent papers looked at the question of whether large-scale drought has been increasing under climate change. A study in Nature by Sheffield et al entitled ‘Little change in global drought over the past 60 years’ was published at almost the same time that ‘Increasing drought under global warming in observations and models’ by Dai appeared in Nature Climate Change (published online in August 2012). How can two research groups arrive at such seemingly contradictory conclusions?

Another later paper on droughts, Orlowsky & Seneviratne 2013, likewise shows overwhelming evidence of more droughts:

From Orlowsky & Seneviratne 2013

Figure 2

Floods

SREX 2012, p. 177:

Overall, there is low confidence (due to limited evidence) that anthropogenic climate change has affected the magnitude and frequency of floods, though it has detectably influenced several components of the hydrological cycle, such as precipitation and snowmelt, that may impact flood trends. The assessment of causes behind the changes in floods is inherently complex and difficult.

AR5, Chapter 2, p. 214:

AR5 WGII assesses floods in regional detail accounting for the fact that trends in floods are strongly influenced by changes in river management (see also Section 2.5.2). Although the most evident flood trends appear to be in northern high latitudes, where observed warming trends have been largest, in some regions no evidence of a trend in extreme flooding has been found, for example, over Russia based on daily river discharge (Shiklomanov et al., 2007).

Other studies for Europe (Hannaford and Marsh, 2008; Renard et al., 2008; Petrow and Merz, 2009; Stahl et al., 2010) and Asia (Jiang et al., 2008; Delgado et al., 2010) show evidence for upward, downward or no trend in the magnitude and frequency of floods, so that there is currently no clear and widespread evidence for observed changes in flooding except for the earlier spring flow in snow-dominated regions (Seneviratne et al., 2012).

In summary, there continues to be a lack of evidence and thus low confidence regarding the sign or trend in the magnitude and/or frequency of floods on a global scale.

[Note: the last sentence of the cited text reads “..regarding the sign of trend in the magnitude..”, which I assume is a typo, so I changed “of” to “or”.]

Storms

SREX, p. 159:

Detection of trends in tropical cyclone metrics such as frequency, intensity, and duration remains a significant challenge..

..Natural variability combined with uncertainties in the historical data makes it difficult to detect trends in tropical cyclone activity. There have been no significant trends observed in global tropical cyclone frequency records, including over the present 40-year period of satellite observations (e.g., Webster et al., 2005). Regional trends in tropical cyclone frequency have been identified in the North Atlantic, but the fidelity of these trends is debated (Holland and Webster, 2007; Landsea, 2007; Mann et al., 2007a). Different methods for estimating undercounts in the earlier part of the North Atlantic tropical cyclone record provide mixed conclusions (Chang and Guo, 2007; Mann et al., 2007b; Kunkel et al., 2008; Vecchi and Knutson, 2008).

Regional trends have not been detected in other oceans (Chan and Xu, 2009; Kubota and Chan, 2009; Callaghan and Power, 2011). It thus remains uncertain whether any observed increases in tropical cyclone frequency on time scales longer than about 40 years are robust, after accounting for past changes in observing capabilities (Knutson et al., 2010)..

..Time series of power dissipation, an aggregate compound of tropical cyclone frequency, duration, and intensity that measures total energy consumption by tropical cyclones, show upward trends in the North Atlantic and weaker upward trends in the western North Pacific over the past 25 years (Emanuel, 2007), but interpretation of longer-term trends in this quantity is again constrained by data quality concerns.

The variability and trend of power dissipation can be related to SST and other local factors such as tropopause temperature and vertical wind shear (Emanuel, 2007), but it is a current topic of debate whether local SST or the difference between local SST and mean tropical SST is the more physically relevant metric (Swanson, 2008).

The distinction is an important one when making projections of changes in power dissipation based on projections of SST changes, particularly in the tropical Atlantic where SST has been increasing more rapidly than in the tropics as a whole (Vecchi et al., 2008). Accumulated cyclone energy, which is an integrated metric analogous to power dissipation, has been declining globally since reaching a high point in 2005, and is presently at a 40-year low point (Maue, 2009). The present period of quiescence, as well as the period of heightened activity leading up to the high point in 2005, does not clearly represent substantial departures from past variability (Maue, 2009)..

..The present assessment regarding observed trends in tropical cyclone activity is essentially identical to the WMO assessment (Knutson et al., 2010): there is low confidence that any observed long-term (i.e., 40 years or more) increases in tropical cyclone activity are robust, after accounting for past changes in observing capabilities.

AR5, Chapter 2, p. 216:

AR4 concluded that it was likely that an increasing trend had occurred in intense tropical cyclone activity since 1970 in some regions but that there was no clear trend in the annual numbers of tropical cyclones. Subsequent assessments, including SREX and more recent literature indicate that it is difficult to draw firm conclusions with respect to the confidence levels associated with observed trends prior to the satellite era and in ocean basins outside of the North Atlantic.

Lots more tropical storms:

From AR5, WG I

Figure 3

Note that a more important metric than “how many?” is “how severe?” or a combination of both.

In summary it is likely that there has been a poleward shift in the main Northern and Southern Hemisphere extratropical storm tracks during the last 50 years. There is medium confidence in an anthropogenic influence on this observed poleward shift. It has not formally been attributed.

There is low confidence in past changes in regional intensity.

And AR5, chapter 2, p. 217 & 220:

Some studies show an increase in intensity and number of extreme Atlantic cyclones (Paciorek et al., 2002; Lehmann et al., 2011) while others show opposite trends in eastern Pacific and North America (Gulev et al., 2001). Comparisons between studies are hampered because of the sensitivities in identification schemes and/or different definitions for extreme cyclones (Ulbrich et al., 2009; Neu et al., 2012). The fidelity of research findings also rests largely with the underlying reanalyses products that are used..

..In summary, confidence in large scale changes in the intensity of extreme extratropical cyclones since 1900 is low. There is also low confidence for a clear trend in storminess proxies over the last century due to inconsistencies between studies or lack of long-term data in some parts of the world (particularly in the SH). Likewise, confidence in trends in extreme winds is low, owing to quality and consistency issues with analysed data.

Discussion

The IPCC SREX and AR5 reports were published in 2012 and 2013 respectively. There will be new research published since these reports analyzing the same data and possibly reaching different conclusions. When you have large decadal variability in poorly observed data with a small or non-existent trend then inevitably different groups will be able to reach different conclusions on these trends. And if you focus on specific regions you can demonstrate a clear and unmistakeable trend.

If you are looking for a soundbite just pick the right region.
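The point about regional trends can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the "regions" are synthetic red-noise (AR(1)) series with no underlying trend at all, and the number of regions, the record length, the persistence `phi` and the noise level are arbitrary illustrative choices, not values from any of the studies cited above.

```python
import random


def ar1_series(n_years, phi, sigma, rng):
    """Red-noise series x[t] = phi * x[t-1] + white noise, with no trend."""
    x, out = 0.0, []
    for _ in range(n_years):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out


def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, ..., len(y) - 1."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def regional_slopes(n_regions=50, n_years=60, phi=0.7, sigma=1.0, seed=1):
    """Fit a linear trend to each of n_regions independent trendless series."""
    rng = random.Random(seed)
    return [ols_slope(ar1_series(n_years, phi, sigma, rng))
            for _ in range(n_regions)]


slopes = regional_slopes()
# The OLS slope is linear in the data, so the slope of the all-region
# average series equals the average of the regional slopes.
global_slope = sum(slopes) / len(slopes)

print(f"global mean slope: {global_slope:+.4f} per year")
print(f"steepest rise:     {max(slopes):+.4f} per year")
print(f"steepest fall:     {min(slopes):+.4f} per year")
```

Even though every series is trendless by construction, the fitted slopes scatter on both sides of zero, and the steepest "regional trends" are much larger than the slope of the average. Picking the right region gives the soundbite.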

The last 100 years have seen global warming. As this blog has made clear from the physics, more GHGs (all other things remaining equal) result in more warming. What proportion of the last 100 years is intrinsic climate variability vs the anthropogenic GHG proportion I have no idea.

The last century has seen no clear globally averaged change in floods, droughts or storms – as best as we can tell with very incomplete observing systems. Of course, some regions have definitely seen more, and some regions have definitely seen less. Whether this is different from the period from 1800-1900 or from 1700-1800 no one knows. Perhaps floods, droughts and tropical storms increased globally from 1700-1900. Perhaps they decreased. Perhaps the last 100 years have seen more variability. Perhaps not. (And in recognition of Poe’s law, I note that a few statements within the article presenting graphs did say the opposite of the graphs presented).

In the previous 19 articles of this series we’ve seen a concise summary (just kidding) of the problems of modeling ice ages. That is, it is hard to model ice ages for at least three reasons:

knowledge of the past is hard to come by, relying on proxies which have dating uncertainties and multiple variables being expressed in one proxy (so are we measuring temperature, or a combination of temperature and other variables?)

computing resources make it impossible to run a GCM at current high resolution for the 100,000 years necessary, let alone to run ensembles with varying external forcings and varying parameters (internal physics)

lack of knowledge of key physics, specifically: ice sheet dynamics with very non-linear behavior; and the relationship between CO2, methane and the ice age cycles

The usual approach using GCMs is to have some combination of lower resolution grids, “faster” time and prescribed ice sheets and greenhouse gases.

These articles cover the subject:

Part Seven – GCM I – early work with climate models to try and get “perennial snow cover” at high latitudes to start an ice age around 116,000 years ago

Part Nine – GCM III – very recent work from 2012, a full GCM, with reduced spatial resolution and speeding up external forcings by a factor of 10, modeling the last 120 kyrs

Part Ten – GCM IV – very recent work from 2012, a high resolution GCM called CCSM4, producing glacial inception at 115 kyrs

One of the papers I thought about covering in this article (Calov et al 2005) is already briefly covered in Part Eight. I would like to highlight one comment I made in the conclusion of Part Ten:

What the paper [Jochum et al, 2012] also reveals – in conjunction with what we have seen from earlier articles – is that as we move through generations and complexities of models we can get success, then a better model produces failure, then a better model again produces success. Also we noted that whereas the 2003 model (also cold-biased) of Vettoretti & Peltier found perennial snow cover through increased moisture transport into the critical region (which they describe as an “atmospheric–cryospheric feedback mechanism”), this more recent study with a better model found no increase in moisture transport.

So, onto a little more about EMICs.

There are two papers from 2000/2001 describing the CLIMBER-2 model and the results from sensitivity experiments. These are by the same set of authors – Petoukhov et al 2000 & Ganopolski et al 2001 (see references).

Here is the grid:

From Petoukhov et al (2000)

The CLIMBER-2 model has a low spatial resolution which only resolves individual continents (subcontinents) and ocean basins (fig 1). Latitudinal resolution is the same for all modules (10º). In the longitudinal direction the Earth is represented by seven equal sectors (roughly 51º longitude) in the atmosphere and land modules.

The ocean model is a zonally averaged multibasin model, which in longitudinal direction resolves only three ocean basins (Atlantic, Indian, Pacific). Each ocean grid cell communicates with either one, two or three atmosphere grid cells, depending on the width of the ocean basin. Very schematic orography and bathymetry are prescribed in the model, to represent the Tibetan plateau, the high Antarctic elevation and the presence of the Greenland-Scotland sill in the Atlantic ocean.

The atmospheric model has a simplified approach, leading to the description 2.5D model. The time step can be relaxed to about 1 day per step. The ocean grid is a little finer in latitude.
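To get a feel for how coarse the atmosphere and land grid described above is, here is a small arithmetic sketch. The 10º latitude step and the seven longitudinal sectors come from the quoted description; the nominal 1º x 1º grid used for comparison is my own illustrative assumption and is not a description of any particular GCM.

```python
LAT_STEP_DEG = 10.0  # latitudinal resolution, from the quoted description
N_SECTORS = 7        # equal longitudinal sectors, from the quoted description

n_lat_bands = int(180 / LAT_STEP_DEG)
sector_width_deg = 360 / N_SECTORS
n_cells = n_lat_bands * N_SECTORS

# Illustrative comparison only: horizontal cells in a nominal 1 x 1 degree grid.
n_cells_one_degree = 360 * 180

print(f"latitude bands:            {n_lat_bands}")
print(f"sector width:              {sector_width_deg:.1f} degrees")
print(f"horizontal cells:          {n_cells}")
print(f"1-degree grid cells:       {n_cells_one_degree}")
print(f"ratio (1-degree / coarse): {n_cells_one_degree / n_cells:.0f}")
```

On these numbers the horizontal grid has 18 x 7 = 126 cells, a few hundred times fewer than a nominal 1º grid, which is a large part of why the model is fast enough for many sensitivity experiments.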

On selecting parameters and model “tuning”:

Careful tuning is essential for a new model, as some parameter values are not known a priori and incorrect choices of parameter values compromise the quality and reliability of simulations. At the same time tuning can be abused (getting the right results for the wrong reasons) if there are too many free parameters. To avoid this we adhered to a set of common-sense rules for good tuning practice:

1. Parameters which are known empirically or from theory must not be used for tuning.

2. Wherever possible parametrizations should be tuned separately against observed data, not in the context of the whole model. (Most of the parameter values in Table 1 were obtained in this way and only few of them were determined by tuning the model to the observed climate).

4. The number of tuning parameters must be much smaller than the degrees of freedom predicted by the model. (In our case the predicted degrees of freedom exceed the number of tuning parameters by several orders of magnitude).

To apply the coupled climate model for simulations of climates substantially different from the present, it is crucial to avoid any type of flux adjustment. One of the reasons for the need of flux adjustments in many general circulation models is their high computational cost, which makes optimal tuning difficult. The high speed of CLIMBER-2 allows us to perform many sensitivity experiments required to identify the physical reasons for model problems and the best parameter choices. A physically correct choice of model parameters is fundamentally different from a flux adjustment; only in the former case the surface fluxes are part of the proper feedbacks when the climate changes.

What I remembered about EMICs and suggested in my comment was based on this 2010 paper by Ganopolski, Calov & Claussen:

We will start the discussion of modelling results with a so-called Baseline Experiment (BE). This experiment represents a “suboptimal” subjective tuning of the model parameters to achieve the best agreement between modelling results and palaeoclimate data. Obviously, even with a model of intermediate complexity it is not possible to test all possible combinations of important model parameters which can be considered as free (tunable) parameters.

In fact, the BE was selected from hundred model simulations of the last glacial cycle with different combinations of key model parameters.

Note, that we consider “tunable” parameters only for the ice-sheet model and the SEMI interface, while the utilized climate component of CLIMBER-2 is the same in previous studies, such as those used by C05 [this is Calov et al. (2005)]. In the next section, we will discuss the results of a set of sensitivity experiments, which show that our modelling results are rather sensitive to the choice of the model parameters..

..The ice sheet model and the ice sheet-climate interface contain a number of parameters which are not derived from first principles. They can be considered as “tunable” parameters. As stated above, the BE was subjectively selected from a large suite of experiments as the best fit to empirical data. Below we will discuss results of a number of additional experiments illustrating the sensitivity of simulated glacial cycle to several model parameters. These results show that the model is rather sensitive to a number of poorly constrained parameters and parameterisations, demonstrating the challenges to realistic simulations of glacial cycles with a comprehensive Earth system model.

And in their conclusion:

Our experiments demonstrate that the CLIMBER-2 model with an appropriate choice of model parameters simulates the major aspects of the last glacial cycle under orbital and greenhouse gases forcing rather realistically. In the simulations, the glacial cycle begins with a relatively abrupt lateral expansion of the North American ice sheets and parallel growth of the smaller northern European ice sheets. During the initial phase of the glacial cycle (MIS 5), the ice sheets experience large variations on precessional time scales. Later on, due to a decrease in the magnitude of the precessional cycle and a stabilising effect of low CO2 concentration, the ice sheets remain large and grow consistently before reaching their maximum at around 20 kyr BP..

..From about 19 kyr BP, the ice sheets start to retreat with a maximum rate of sea level rise reaching some 15 m per 1000 years around 15kyrBP. The northern European ice sheets disappeared first, and the North American ice sheets completely disappeared at around 7 kyr BP. Fast sliding processes and the reduction of surface albedo due to deposition of dust play an important role in rapid deglaciation of the NH. Thus our results strongly support the idea about important role of aeolian dust in the termination of glacial cycles proposed earlier by Peltier and Marshall (1995)..

..Results from a set of sensitivity experiments demonstrate high sensitivity of simulated glacial cycle to the choice of some modelling parameters, and thus indicate the challenge to perform realistic simulations of glacial cycles with the computationally expensive models.

My summary – the simplifications of the EMIC combined with the “trying lots of parameters” approach means I have trouble putting much significance on the results.

While the basic setup, as described in the 2000 & 2001 papers seems reasonable, EMICs miss a lot of physics. This is important with something like starting and ending an ice age, where the feedbacks in higher resolution models can significantly reduce the effect seen by lower resolution models. When we run 100’s of simulations with different parameters (relating to the ice sheet) and find the best result I wonder what we’ve actually found.

That doesn’t mean they are of no value. Models help us to understand how the physics of climate actually works, because we can’t do these calculations in our heads. GCMs require too much computing power to properly study ice ages.

So I look at EMICs as giving some useful insights that need to be validated with more complex models. Or with further study against other observations (what predictions do these parameter selections give us that can be verified?)

I don’t see them as demonstrating that the results “show” we’ve now modeled ice ages. The exact same comment also goes for another 2007 paper which used a GCM coupled to an ice sheet model that we covered in Part Nineteen – Ice Sheet Models I. An update of that paper in 2013 came with an excited Nature press release but to me simply demonstrates that with a few unknown parameters you can get a good result with some specific values of those parameters. This is not at all surprising. Let’s call it a good start.

Perhaps Abe-Ouchi et al 2013 is the paper that will be verified as the answer to the question of ice age terminations – the delayed isostatic rebound.

Perhaps Ganopolski, Calov & Claussen 2010 with the interaction of dust on ice sheets will be verified as the answer to that question.

Twelve – GCM V – Ice Age Termination – very recent work from He et al 2013, using a high resolution GCM (CCSM3) to analyze the end of the last ice age and the complex link between Antarctic and Greenland

Thirteen – Terminator II – looking at the date of Termination II, the end of the penultimate ice age – and implications for the cause of Termination II

Fourteen – Concepts & HD Data – getting a conceptual feel for the impacts of obliquity and precession, and some ice age datasets in high resolution

In one stereotypical view of climate, the climate state has some variability over a 30 year period – we could call this multi-decadal variability “noise” – but it is otherwise fixed by the external conditions, the “external forcings”.

In this stereotypical view, the only reason why “long term” (=30 year) statistics can change is because of “external forcing”. Otherwise, where does the “extra energy” come from? (We will examine this particular idea in a future article.)

Abrupt climate change is abundant in geological records, but climate models rarely have been able to simulate such events in response to realistic forcing.

Here we report on a spontaneous abrupt cooling event, lasting for more than a century, with a temperature anomaly similar to that of the Little Ice Age. The event was simulated in the preindustrial control run of a high-resolution climate model, without imposing external perturbations.

This is interesting and instructive on many levels so let’s take a look. In later articles we will look at the evidence in climate history for “abrupt” events. For now, note that Dansgaard–Oeschger (DO) events were the originally identified form of abrupt change.

The distinction between “abrupt” change and change that is not “abrupt” is an artificial one; it is more a reflection of the historical order in which we discovered “slow” and “abrupt” change.

Under a Significance inset box in the paper:

There is a long-standing debate about whether climate models are able to simulate large, abrupt events that characterized past climates. Here, we document a large, spontaneously occurring cold event in a preindustrial control run of a new climate model.

The event is comparable to the Little Ice Age both in amplitude and duration; it is abrupt in its onset and termination, and it is characterized by a long period in which the atmospheric circulation over the North Atlantic is locked into a state with enhanced blocking.

Here is their graph of the time-series of temperature (left), and the geographical anomaly (right) expressed as the change during the 100 year event against the background of years 200-400:

From Drijfhout et al 2013

Figure 1

In their summary they state:

The lesson learned from this study is that the climate system is capable of generating large, abrupt climate excursions without externally imposed perturbations. Also, because such episodic events occur spontaneously, they may have limited predictability.

Before we look at the “causes” – the climate mechanisms – of this event, let’s briefly look at the climate model.

Their coupled GCM has an atmospheric resolution of just over 1º x 1º with 62 vertical levels, and the ocean has a resolution of 1º in the extra-tropics, increasing to 0.3º near the equator. The ocean has 42 vertical levels, with the top 200m of the ocean represented by 20 equally spaced 10m levels.

The GHGs and aerosols are set at pre-industrial 1860 values and don’t change over the 1,125 year simulation. There are no “flux adjustments” (no need for artificial momentum and energy additions to keep the model stable as with many older models).

See note 1 for a fuller description and the paper in the references for a full description.

The simulated event itself:

After 450 y, an abrupt cooling event occurred, with a clear signal in the Atlantic multidecadal oscillation (AMO). In the instrumental record, the amplitude of the AMO since the 1850s is about 0.4 °C, its SD 0.2 °C. During the event simulated here, the AMO index dropped by 0.8 °C for about a century..

How did this abrupt change take place?

The main mechanism was a change in the Atlantic Meridional Overturning Circulation (AMOC), also known as the thermohaline circulation. The AMOC is a nice example of the sensitivity of climate. It brings warmer water from the tropics into higher latitudes. A necessary driver of this process is the intensity of deep convection in high latitudes (sinking dense water), which in turn depends on two factors: temperature and salinity. More accurately, it depends on the competing differences in anomalies of temperature and salinity:

To shut down deep convection, the density of the surface water must decrease. In the temperature range of 7–12 °C, typical for the Labrador Sea, the SST anomaly in degrees Celsius has to be roughly 5 times the sea surface salinity (SSS) anomaly in practical salinity units for density compensation to occur. The SST anomaly was only about twice that of the SSS anomaly; the density anomaly was therefore mostly determined by the salinity anomaly.
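The "roughly 5 times" figure in the quote can be sanity-checked with a linearised equation of state, Δρ ≈ ρ0(−αΔT + βΔS). This is a minimal sketch, not the calculation from the paper: the coefficients `ALPHA`, `BETA` and `RHO0` are round values I have assumed for cold surface seawater near 10 °C, and the 1 psu freshening is a hypothetical size chosen only to show the signs.

```python
ALPHA = 1.5e-4  # thermal expansion coefficient, 1/K (assumed round value)
BETA = 7.6e-4   # haline contraction coefficient, 1/psu (assumed round value)
RHO0 = 1027.0   # reference seawater density, kg/m^3 (assumed)


def density_anomaly(d_temp, d_salt):
    """Linearised equation of state: d_rho = rho0 * (-alpha * dT + beta * dS)."""
    return RHO0 * (-ALPHA * d_temp + BETA * d_salt)


# Temperature and salinity anomalies cancel in density when
# alpha * dT = beta * dS, i.e. when dT / dS = beta / alpha.
compensation_ratio = BETA / ALPHA

d_salt = -1.0          # hypothetical freshening, psu (illustrative size only)
d_temp = 2.0 * d_salt  # cooling about twice the salinity anomaly, as quoted

thermal = density_anomaly(d_temp, 0.0)  # cooling alone makes water denser
haline = density_anomaly(0.0, d_salt)   # freshening alone makes it lighter
total = density_anomaly(d_temp, d_salt)

print(f"dT/dS needed for compensation: {compensation_ratio:.1f} K per psu")
print(f"thermal contribution: {thermal:+.2f} kg/m^3")
print(f"haline contribution:  {haline:+.2f} kg/m^3")
print(f"net density anomaly:  {total:+.2f} kg/m^3")
```

With these assumed coefficients the compensation ratio comes out at about 5 K per psu, and a cooling only twice the size of the freshening leaves the surface water lighter overall, which is the sense of the quoted statement that the density anomaly was mostly determined by salinity.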

In the figure below we see (left) the AMOC time series at two locations with the reduction during the cold century, and (right) the anomaly by depth and latitude for the “cold century” vs the climatology for years 200-400:

From Drijfhout et al 2013

Figure 2

What caused the lower salinities? It was more sea ice, melting in the right location. The excess sea ice was caused by positive feedback between atmospheric and ocean conditions “locking in” a particular pattern. The paper has a detailed explanation with graphics of the pressure anomalies which is hard to reduce to anything more succinct, apart from their abstract:

Initial cooling started with a period of enhanced atmospheric blocking over the eastern subpolar gyre.

In response, a southward progression of the sea-ice margin occurred, and the sea-level pressure anomaly was locked to the sea-ice margin through thermal forcing. The cold-core high steered more cold air to the area, reinforcing the sea-ice concentration anomaly east of Greenland.

The sea-ice surplus was carried southward by ocean currents around the tip of Greenland. South of 70°N, sea ice already started melting and the associated freshwater anomaly was carried to the Labrador Sea, shutting off deep convection. There, surface waters were exposed longer to atmospheric cooling and sea surface temperature dropped, causing an even larger thermally forced high above the Labrador Sea.

Conclusion

It is fascinating to see a climate model reproducing an example of abrupt climate change. There are a few contexts to suggest for this result.

1. From the context of timescale we could ask how often these events take place, or what pre-conditions are necessary. The only way to gather meaningful statistics is for large ensemble runs of considerable length – perhaps thousands of “perturbed physics” runs each of 100,000 years length. This is far out of reach for processing power at the moment. I picked some arbitrary numbers – until the statistics start to converge and match what we see from paleoclimatology studies we don’t know if we have covered the “terrain”.

Or perhaps only five runs of 1,000 years are needed to completely solve the problem (I’m kidding).

2. From the context of resolution – as we achieve higher resolution in models we may find new phenomena emerging in climate models that did not appear before. For example, in ice age studies, coarser climate models could not achieve “perennial snow cover” at high latitudes (as a pre-condition for ice age inception), but higher resolution climate models have achieved this first step. (See Ghosts of Climates Past – Part Seven – GCM I & Part Eight – GCM II).

As a comparison on resolution, the 2,000 year El Nino study we saw in Part Six of this series had an atmospheric resolution of 2.5º x 2.0º with 24 levels.
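To put the two atmospheric resolutions side by side, here is a rough cell count. It is a sketch that assumes both models can be treated as regular latitude-longitude grids (roughly 1.125º x 1.125º with 62 levels for this study, 2.5º x 2.0º with 24 levels for the El Nino study); real spectral and model grids differ in detail, and `grid_cells` is just a helper name I made up for the illustration.

```python
def grid_cells(dlon_deg, dlat_deg, n_levels):
    """Cell count for a regular lat-lon grid (an idealisation of the real grids)."""
    n_lon = round(360 / dlon_deg)
    n_lat = round(180 / dlat_deg)
    return n_lon * n_lat * n_levels


ec_earth_cells = grid_cells(1.125, 1.125, 62)  # the abrupt-cooling study
el_nino_cells = grid_cells(2.5, 2.0, 24)       # the 2,000 year El Nino study
ratio = ec_earth_cells / el_nino_cells

print(f"this study:    {ec_earth_cells:,} atmospheric cells")
print(f"El Nino study: {el_nino_cells:,} atmospheric cells")
print(f"ratio:         {ratio:.1f}")
```

On this idealised count the newer model has about ten times as many atmospheric cells. The count ignores the shorter time step a finer grid usually needs, so the difference in computing cost is likely to be larger than the ratio of cells.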

However, we might also find that as the resolution progressively increases (with the inevitable march of processing power) phenomena that appear at one resolution disappear at yet higher resolutions. This is an opinion, but if you ask people who have experience with computational fluid dynamics I expect they will say this would not be surprising.

3. Other models might reach similar or higher resolution and never get this kind of result and demonstrate the flaw in the EC-Earth model that allowed this “Little Ice Age” result to occur. Or the reverse.

As the authors say:

As a result, only coupled climate models that are capable of realistically simulating atmospheric blocking in relation to sea-ice variations feature the enhanced sensitivity to internal fluctuations that may temporarily drive the climate system to a state that is far beyond its standard range of natural variability.

Notes

Note 1: From the Supporting Information from their paper:

Climate Model and Numerical Simulation. The climate model used in this study is version 2.2 of the EC-Earth earth system model [see references] whose atmospheric component is based on cycle 31r1 of the European Centre for Medium-range Weather Forecasts (ECMWF) Integrated Forecasting System.

The atmospheric component runs at T159 horizontal spectral resolution (roughly 1.125°) and has 62 vertical levels. In the vertical a terrain-following mixed σ/pressure coordinate is used.

The Nucleus for European Modeling of the Ocean (NEMO), version V2, running in a tripolar configuration with a horizontal resolution of nominally 1° and equatorial refinement to 0.3° (2) is used for the ocean component of EC-Earth.

Vertical mixing is achieved by a turbulent kinetic energy scheme. The vertical z coordinate features a partial step implementation, and a bottom boundary scheme mixes dense water down bottom slopes. Tracer advection is accomplished by a positive definite scheme, which does not produce spurious negative values.

The model does not resolve eddies, but eddy-induced tracer advection is parameterized (3). The ocean is divided into 42 vertical levels, spaced by ∼10 m in the upper 200 m, and thereafter increasing with depth. NEMO incorporates the Louvain-la-Neuve sea-ice model LIM2 (4), which uses the same grid as the ocean model. LIM2 treats sea ice as a 2D viscous-plastic continuum that transmits stresses between the ocean and atmosphere. Thermodynamically it consists of a snow and an ice layer.

The ocean, ice, land, and atmosphere are coupled through the Ocean, Atmosphere, Sea Ice, Soil 3 coupler (5). No flux adjustments are applied to the model, resulting in a physical consistency between surface fluxes and meridional transports.

The present preindustrial (PI) run was conducted by Met Éireann and comprised 1,125 y. The ocean was initialized from the World Ocean Atlas 2001 climatology (6). The atmosphere used the 40-year ECMWF Re-Analysis of January 1, 1979, as the initial state with permanent PI (1850) greenhouse gas (280 ppm) and aerosol concentrations.

In Part Three – Attribution & Fingerprints we looked at an early paper in this field, from 1996. I was led there by following back through many papers referenced from AR5 Chapter 10. The lead author of that paper, Gabriele Hegerl, has made a significant contribution to the 3rd, 4th and 5th IPCC reports on attribution.

We saw in Part Three that this particular paper ascribed a probability:

We find that the latest observed 30-year trend pattern of near-surface temperature change can be distinguished from all estimates of natural climate variability with an estimated risk of less than 2.5% if the optimal fingerprint is applied.

That paper did note that the greatest uncertainty was in understanding the magnitude of natural variability. This is an essential element of attribution.

It wasn’t explicitly stated whether the 97.5% confidence was with the premise that natural variability was accurately understood in 1996. I believe that this was the premise. I don’t know what confidence would have been ascribed to the attribution study if uncertainty over natural variability was included.

IPCC AR5

In this article we will look at the IPCC 5th report, AR5, and see how this field has progressed, specifically in regard to the understanding of natural variability. Chapter 10 covers Detection and Attribution of Climate Change.

From p.881 (the page numbers are from the start of the whole report, chapter 10 has just over 60 pages plus references):

I had trouble understanding AR5 Chapter 10 because there was no explicit discussion of natural variability. The papers referenced (usually) have their own section on natural variability, but chapter 10 doesn’t actually cover it.

I emailed Geert Jan van Oldenborgh to ask for help. He is the author of one paper we will briefly look at here – his paper was very interesting and he had a video segment explaining his paper. He suggested the problem was more about communication because natural variability was covered in chapter 9 on models. He had written a section in chapter 11 that he pointed me towards, so this article became something that tried to grasp the essence of three chapters (9 – 11), over 200 pages of reports and several pallet loads of papers.

So I’m not sure I can do the synthesis justice, but what I will endeavor to do in this article is demonstrate the minimal focus (in IPCC AR5) on how well models represent natural variability.

That subject deserves a lot more attention, so this article will be less about what natural variability is, and more about how little focus it gets in AR5. I only arrived here because I was determined to understand “fingerprints” and especially the rationale behind the certainties ascribed.

Subsequent articles will continue the discussion on natural variability.

Knutson et al 2013

The models [CMIP5] are found to provide plausible representations of internal climate variability, although there is room for improvement..

..The modeled internal climate variability from long control runs is used to determine whether observed and simulated trends are consistent or inconsistent. In other words, we assess whether observed and simulated forced trends are more extreme than those that might be expected from random sampling of internal climate variability.

Later

The model control runs exhibit long-term drifts. The magnitudes of these drifts tend to be larger in the CMIP3 control runs than in the CMIP5 control runs, although there are exceptions. We assume that these drifts are due to the models not being in equilibrium with the control run forcing, and we remove the drifts by a linear trend analysis (depicted by the orange straight lines in Fig. 1). In some CMIP3 cases, the drift initially proceeds at one rate, but then the trend becomes smaller for the remainder of the run. We approximate the drift in these cases by two separate linear trend segments, which are identified in the figure by the short vertical orange line segments. These long-term drift trends are removed to produce the drift corrected series.

[Emphasis added].

Another paper suggests this assumption might not be correct. Here is Jones, Stott and Christidis (2013) – “piControl” are the natural variability model simulations:

Often a model simulation with no changes in external forcing (piControl) will have a drift in the climate diagnostics due to various flux imbalances in the model [Gupta et al., 2012]. Some studies attempt to account for possible model climate drifts, for instance Figure 9.5 in Hegerl et al. [2007] did not include transient simulations of the 20th century if the long-term trend of the piControl was greater in magnitude than 0.2 K/century (Appendix 9.C in Hegerl et al. [2007]).

Another technique is to remove the trend, from the transient simulations, deduced from a parallel section of piControl [e.g., Knutson et al., 2006]. However whether one should always remove the piControl trend, and how to do it in practice, is not a trivial issue [Taylor et al., 2012; Gupta et al., 2012]..

..We choose not to remove the trend from the piControl from parallel simulations of the same model in this study due to the impact it would have on long-term variability, i.e., the possibility that part of the trend in the piControl may be long-term internal variability that may or may not happen in a parallel experiment when additional forcing has been applied.
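To make the disagreement concrete, here is a minimal sketch of the linear drift correction, applied to a synthetic control run. It is not the code used by either group. The series is invented for illustration: it has no model drift at all, only noise plus a slow oscillation (a hypothetical 2,000-year period) that is much longer than the 500-year run.

```python
import math
import random

def linear_fit(series):
    """Least-squares slope and intercept of a series against its index."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(series))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def remove_drift(series):
    """Subtract the fitted straight line, as in a linear drift correction."""
    slope, intercept = linear_fit(series)
    return [v - (intercept + slope * i) for i, v in enumerate(series)]

def variance(series):
    mean = sum(series) / len(series)
    return sum((v - mean) ** 2 for v in series) / len(series)

random.seed(0)
n_years = 500
# Hypothetical control run: year-to-year noise plus a slow internal
# oscillation (period 2,000 years, amplitude 0.2 K). No drift is added.
control = [
    0.2 * math.sin(2 * math.pi * year / 2000) + random.gauss(0.0, 0.1)
    for year in range(n_years)
]

fitted_slope, _ = linear_fit(control)
corrected = remove_drift(control)
residual_slope, _ = linear_fit(corrected)

print(f"fitted 'drift': {fitted_slope * 100:.3f} K/century")
print(f"variance before: {variance(control):.4f}, after: {variance(corrected):.4f}")
```

In this made-up case the fitted "drift" is entirely slow internal variability, and removing it reduces the variance of the series. Whether real control runs behave like this is exactly the open question in the quotes above; the sketch only shows that a linear fit cannot tell the two apart.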

Here are further comments from Knutson et al 2013:

Five of the 24 CMIP3 models, identified by “(-)” in Fig. 1, were not used, or practically not used, beyond Fig. 1 in our analysis. For instance, the IAP_fgoals1.0.g model has a strong discontinuity near year 200 of the control run. We judge this as likely an artifact due to some problem with the model simulation, and we therefore chose to exclude this model from further analysis.

From Knutson et al 2013

Figure 1

Perhaps this is correct. Or perhaps the jump in simulated temperature is the climate model capturing natural climate variability.

The authors do comment:

As noted by Wittenberg (2009) and Vecchi and Wittenberg (2010), long-running control runs suggest that internally generated SST variability, at least in the ENSO region, can vary substantially between different 100-yr periods (approximately the length of record used here for observations), which again emphasizes the caution that must be placed on comparisons of modeled vs. observed internal variability based on records of relatively limited duration.

The first paper referenced, Wittenberg 2009, was the paper we looked at in Part Six – El Nino.

So is the “caution” that comes from that study included in the probability of our models’ ability to simulate natural variability?

In reality, questions about internal variability are not really discussed. Trends are removed, models with discontinuities are artifacts. What is left? This paper essentially takes the modeling output from the CMIP3 and CMIP5 archives (with and without GHG forcing) as a given and applies some tests.
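The kind of test being applied can be sketched as follows. This is an illustration under stated assumptions, not the procedure from Knutson et al: the "control run" is synthetic AR(1) noise with invented parameters, the observed trend is a hypothetical number, and the windows are simply non-overlapping 60-year segments.

```python
import random

def trend(series):
    """Least-squares linear trend (per time step) of a series."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sum((i - x_mean) * (v - y_mean) for i, v in enumerate(series)) / sxx

def control_trends(control, window):
    """Trends of consecutive non-overlapping segments of a control run."""
    starts = range(0, len(control) - window + 1, window)
    return [trend(control[s:s + window]) for s in starts]

def exceedance_fraction(control, window, observed_trend):
    """Fraction of control-run trends at least as large (in magnitude) as observed."""
    trends = control_trends(control, window)
    return sum(abs(t) >= abs(observed_trend) for t in trends) / len(trends)

random.seed(1)
# Synthetic stand-in for a drift-corrected control run: AR(1) noise.
control = []
value = 0.0
for _ in range(5000):
    value = 0.6 * value + random.gauss(0.0, 0.1)
    control.append(value)

observed_trend = 0.012  # hypothetical observed trend, K per year over 60 years
fraction = exceedance_fraction(control, 60, observed_trend)
print(f"fraction of control trends as extreme as observed: {fraction:.3f}")
```

The answer depends entirely on the variability in the control run. If the control run underestimates real low-frequency variability, the fraction comes out too small, and the test itself gives no way to detect that.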

Ribes & Terray 2013

This was a “Part II” paper and they said:

We use the same estimates of internal variability as in Ribes et al. 2013 [the “Part I”].

These are based on intra-ensemble variability from the above CMIP5 experiments as well as pre-industrial simulations from both the CMIP3 and CMIP5 archives, leading to a much larger sample than previously used (see Ribes et al. 2013 for details about ensembles). We then implicitly assume that the multi-model internal variability estimate is reliable.

[Emphasis added]. The Part I paper said:

An estimate of internal climate variability is required in detection and attribution analysis, for both optimal estimation of the scaling factors and uncertainty analysis.

Estimates of internal variability are usually based on climate simulations, which may be control simulations (i.e. in the present case, simulations with no variations in external forcings), or ensembles of simulations with the same prescribed external forcings.

In the latter case, m – 1 independent realisations of pure internal variability may be obtained by subtracting the ensemble mean from each member (assuming again additivity of the responses) and rescaling the result by a factor √(m/(m-1)) , where m denotes the number of members in the ensemble.
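The rescaling factor is there because subtracting the ensemble mean removes some of the noise along with the forced signal: each residual has variance σ²(m-1)/m rather than σ². A minimal sketch with a synthetic ensemble (the signal, noise level and ensemble size are invented for illustration) shows the rescaled residuals recovering the noise variance that was put in:

```python
import math
import random

def internal_variability_residuals(members):
    """Subtract the ensemble mean at each time and rescale by sqrt(m / (m - 1))."""
    m = len(members)
    n_times = len(members[0])
    scale = math.sqrt(m / (m - 1))
    ensemble_mean = [sum(member[t] for member in members) / m for t in range(n_times)]
    return [
        [scale * (member[t] - ensemble_mean[t]) for t in range(n_times)]
        for member in members
    ]

random.seed(11)
m, n_times, sigma = 4, 5000, 0.2
# Every member shares the same forced response and has its own noise.
forced = [0.01 * t for t in range(n_times)]
members = [
    [forced[t] + random.gauss(0.0, sigma) for t in range(n_times)]
    for _ in range(m)
]

residuals = internal_variability_residuals(members)
pooled_variance = sum(v * v for member in residuals for v in member) / (m * n_times)
print(f"true noise variance {sigma ** 2:.4f}, estimated {pooled_variance:.4f}")
```

The m residuals at each time sum to zero, which is why the quote speaks of m – 1 independent realisations rather than m. The sketch also relies on the additivity assumption mentioned in the quote: the noise is simply added to the forced response.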

Note that estimation of internal variability usually means estimation of the covariance matrix of a spatio-temporal climate-vector, the dimension of this matrix potentially being high. We choose to use a multi-model estimate of internal climate variability, derived from a large ensemble of climate models and simulations. This multi-model estimate is subject to lower sampling variability and better represents the effects of model uncertainty on the estimate of internal variability than individual model estimates. We then simultaneously consider control simulations from the CMIP3 and CMIP5 archives, and ensembles of historical simulations (including simulations with individual sets of forcings) from the CMIP5 archive.

All control simulations longer than 220 years (i.e. twice the length of our study period) and all ensembles (at least 2 members) are used. The overall drift of control simulations is removed by subtracting a linear trend over the full period.. We then implicitly assume that this multi-model internal variability estimate is reliable.

[Emphasis added]. So there are two approaches to estimating internal variability: one uses GCM runs with no GHG forcing; the other uses the variation between different runs of the same model (with GHG forcing). Drift is removed as “an error”.

Chapter 10 on Spatial Trends

Figure 10.2a shows the pattern of annual mean surface temperature trends observed over the period 1901–2010, based on Hadley Centre/Climatic Research Unit gridded surface temperature data set 4 (HadCRUT4). Warming has been observed at almost all locations with sufficient observations available since 1901.

Rates of warming are generally higher over land areas compared to oceans, as is also apparent over the 1951–2010 period (Figure 10.2c), which simulations indicate is due mainly to differences in local feedbacks and a net anomalous heat transport from oceans to land under GHG forcing, rather than differences in thermal inertia (e.g., Boer, 2011). Figure 10.2e demonstrates that a similar pattern of warming is simulated in the CMIP5 simulations with natural and anthropogenic forcing over the 1901–2010 period. Over most regions, observed trends fall between the 5th and 95th percentiles of simulated trends, and van Oldenborgh et al. (2013) find that over the 1950–2011 period the pattern of observed grid cell trends agrees with CMIP5 simulated trends to within a combination of model spread and internal variability..

van Oldenborgh et al (2013)

Let’s take a look at van Oldenborgh et al (2013).

There’s a nice video of (I assume) the lead author talking about the paper and comparing the probabilistic approach used in weather forecasts with that of climate models (see Ensemble Forecasting). I recommend the video for a good introduction to the topic of ensemble forecasting.

With weather forecasting the probability comes from running ensembles of weather models and seeing, for example, how many simulations predict rain vs how many do not. The proportion is the probability of rain. With weather forecasting we can continually review how well the probabilities given by ensembles match the reality. Over time we will build up a set of statistics of “probability of rain” and compare with the frequency of actual rainfall. It’s pretty easy to see if the models are over-confident or under-confident.
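A small sketch of that check, with made-up forecasts rather than real ones: forecasts are grouped by stated probability and each group is compared with how often it actually rained. The "overconfident" system is synthetic, constructed so that the true probability is only half as far from 50% as the stated one.

```python
import random

def reliability_table(forecast_probs, outcomes, n_bins=5):
    """Return (mean forecast probability, observed frequency, count) per bin."""
    counts = [0] * n_bins
    prob_sums = [0.0] * n_bins
    hits = [0] * n_bins
    for prob, happened in zip(forecast_probs, outcomes):
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the top bin
        counts[index] += 1
        prob_sums[index] += prob
        hits[index] += 1 if happened else 0
    return [
        (prob_sums[i] / counts[i], hits[i] / counts[i], counts[i])
        for i in range(n_bins)
        if counts[i] > 0
    ]

random.seed(42)
forecasts = [random.random() for _ in range(20000)]

# Reliable system: rain occurs with exactly the stated probability.
reliable_outcomes = [random.random() < p for p in forecasts]

# Overconfident system (synthetic): the true probability is pulled halfway
# towards 50%, so stated probabilities near 0 or 1 are too extreme.
overconfident_outcomes = [random.random() < 0.5 + 0.5 * (p - 0.5) for p in forecasts]

reliable_table = reliability_table(forecasts, reliable_outcomes)
overconfident_table = reliability_table(forecasts, overconfident_outcomes)

for mean_prob, frequency, count in overconfident_table:
    print(f"forecast {mean_prob:.2f}  observed {frequency:.2f}  n={count}")
```

For the reliable system the two columns agree within sampling noise. For the overconfident one, forecasts of about 90% verify only about 70% of the time. The point for climate models is that this check needs many forecast-outcome pairs, which weather forecasting has and multi-decadal climate projection does not.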

Here is what the authors say about the problem and how they approached it:

The ensemble is considered to be an estimate of the probability density function (PDF) of a climate forecast. This is the method used in weather and seasonal forecasting (Palmer et al 2008). Just like in these fields it is vital to verify that the resulting forecasts are reliable in the definition that the forecast probability should be equal to the observed probability (Joliffe and Stephenson 2011).

If outcomes in the tail of the PDF occur more (less) frequently than forecast the system is overconfident (underconfident): the ensemble spread is not large enough (too large).

In contrast to weather and seasonal forecasts, there is no set of hindcasts to ascertain the reliability of past climate trends per region. We therefore perform the verification study spatially, comparing the forecast and observed trends over the Earth. Climate change is now so strong that the effects can be observed locally in many regions of the world, making a verification study on the trends feasible. Spatial reliability does not imply temporal reliability, but unreliability does imply that at least in some areas the forecasts are unreliable in time as well. In the remainder of this letter we use the word ‘reliability’ to indicate spatial reliability.

[Emphasis added]. The paper first shows the result for one location, the Netherlands, with the spread of model results vs the actual result from 1950-2011:

from van Oldenborgh et al 2013

Figure 2

We can see that the models are overall mostly below the observation. But this is one data point. So if we compared all of the data points – and this is on a grid of 2.5º – how do the model spreads compare with the observations? Are observations above 95% of the model results only 5% of the time? Or more than 5% of the time? And are observations below 5% of the model results only 5% of the time?

We can see that the frequency of observations in the bottom 5% of model results is about 13% and the frequency of observations in the top 5% of model results is about 20%. Therefore the models are “overconfident” in spatial representation of the last 60 year trends:

From van Oldenborgh et al 2013

Figure 3
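The counting behind this kind of result can be sketched in a few lines. This is a simplified illustration, not the method of the paper: the grid cells are synthetic and independent, there is no area weighting, and the "overconfident" ensemble is simply given half the spread of the synthetic truth.

```python
import random

def tail_frequencies(ensembles, observations, lower=0.05, upper=0.95):
    """Fraction of cells where the observation lies in the lower / upper tail
    of that cell's ensemble, judged by the share of members below it."""
    below = 0
    above = 0
    for members, observed in zip(ensembles, observations):
        position = sum(value < observed for value in members) / len(members)
        if position < lower:
            below += 1
        elif position > upper:
            above += 1
    total = len(observations)
    return below / total, above / total

def synthetic_case(n_cells, n_members, ensemble_spread, rng):
    """Observed trends have unit spread; the ensemble spread is set by the caller."""
    ensembles = [
        [rng.gauss(0.0, ensemble_spread) for _ in range(n_members)]
        for _ in range(n_cells)
    ]
    observations = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    return ensembles, observations

rng = random.Random(7)
reliable = tail_frequencies(*synthetic_case(4000, 40, 1.0, rng))
overconfident = tail_frequencies(*synthetic_case(4000, 40, 0.5, rng))

print(f"reliable ensemble:      {reliable[0]:.1%} below, {reliable[1]:.1%} above")
print(f"overconfident ensemble: {overconfident[0]:.1%} below, {overconfident[1]:.1%} above")
```

With a reliable ensemble each tail catches roughly 5% of the cells. When the ensemble spread is too small, both tails fill up well beyond 5%, which is the pattern described above.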

We investigated the reliability of trends in the CMIP5 multi-model ensemble prepared for the IPCC AR5. In agreement with earlier studies using the older CMIP3 ensemble, the temperature trends are found to be locally reliable. However, this is due to the differing global mean climate response rather than a correct representation of the spatial variability of the climate change signal up to now: when normalized by the global mean temperature the ensemble is overconfident. This agrees with results of Sakaguchi et al (2012) that the spatial variability in the pattern of warming is too small. The precipitation trends are also overconfident. There are large areas where trends in both observational dataset are (almost) outside the CMIP5 ensemble, leading us to conclude that this is unlikely due to faulty observations.

It’s probably important to note that the author comments in the video “on the larger scale the models are not doing so badly”.

Jones et al 2013

It was reassuring to finally find a statement that confirmed what seemed obvious from the “omissions”:

A basic assumption of the optimal detection analysis is that the estimate of internal variability used is comparable with the real world’s internal variability.

Surely I can’t be the only one reading Chapter 10 and trying to understand the assumptions built into the “with 95% confidence” result. If Chapter 10 is only aimed at climate scientists who work in the field of attribution and detection it is probably fine not to actually mention this minor detail in the tight constraints of only 60 pages.

But if Chapter 10 is aimed at a wider audience it seems a little remiss not to bring it up in the chapter itself.

I probably missed the stated caveat in chapter 10’s executive summary or introduction.

The authors continue:

As the observations are influenced by external forcing, and we do not have a non-externally forced alternative reality to use to test this assumption, an alternative common method is to compare the power spectral density (PSD) of the observations with the model simulations that include external forcings.

We have already seen that overall the CMIP5 and CMIP3 model variability compares favorably across different periodicities with HadCRUT4-observed variability (Figure 5). Figure S11 (in the supporting information) includes the PSDs for each of the eight models (BCC-CSM1-1, CNRM-CM5, CSIRO-Mk3-6-0, CanESM2, GISS-E2-H, GISS-E2-R, HadGEM2-ES and NorESM1-M) that can be examined in the detection analysis.

Variability for the historical experiment in most of the models compares favorably with HadCRUT4 over the range of periodicities, except for HadGEM2-ES whose very long period variability is lower due to the lower overall trend than observed and for CanESM2 and bcc-cm1-1 whose decadal and higher period variability are larger than observed.

While not a strict test, Figure S11 suggests that the models have an adequate representation of internal variability—at least on the global mean level. In addition, we use the residual test from the regression to test whether there are any gross failings in the models representation of internal variability.
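For readers who have not met a power spectral density comparison before, here is a minimal sketch using a plain periodogram. It is not the method behind Figure S11. Both series are synthetic: the "observations" contain an invented 50-year oscillation and the "model" is white noise, so the model is deliberately short of low-frequency variability.

```python
import cmath
import math
import random

def periodogram(series):
    """Return (frequency in cycles per step, power) pairs from a direct DFT."""
    n = len(series)
    mean = sum(series) / n
    anomalies = [v - mean for v in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            a * cmath.exp(-2j * math.pi * k * t / n) for t, a in enumerate(anomalies)
        )
        spectrum.append((k / n, abs(coefficient) ** 2 / n))
    return spectrum

def low_frequency_power(spectrum, min_period):
    """Mean power at periods longer than min_period (in time steps)."""
    powers = [power for frequency, power in spectrum if 1 / frequency > min_period]
    return sum(powers) / len(powers)

random.seed(3)
years = 150
observed = [
    0.3 * math.sin(2 * math.pi * t / 50) + random.gauss(0.0, 0.1) for t in range(years)
]
model = [random.gauss(0.0, 0.1) for _ in range(years)]

ratio = low_frequency_power(periodogram(model), 30) / low_frequency_power(
    periodogram(observed), 30
)
print(f"model / observed power at periods over 30 years: {ratio:.3f}")
```

Note how few spectral estimates there are at long periods: a 150-year record has only four periods longer than 30 years in this periodogram. That is the practical limit on any test of multi-decadal variability from the instrumental record.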

Figure S11 is in the supplementary section of the paper:

From Jones et al 2013, figure S11

Figure 4

From what I can see, this demonstrates that the spectrum of the models’ internal variability (“historicalNat”) is different from the spectrum of the models’ forced response with GHG changes (“historical”).

However, the ability to simulate climate variability, both unforced internal variability and forced variability (e.g., diurnal and seasonal cycles) is also important. This has implications for the signal-to-noise estimates inherent in climate change detection and attribution studies where low-frequency climate variability must be estimated, at least in part, from long control integrations of climate models (Section 10.2).

Section 9.5.3:

In addition to the annual, intra-seasonal and diurnal cycles described above, a number of other modes of variability arise on multi-annual to multi-decadal time scales (see also Box 2.5). Most of these modes have a particular regional manifestation whose amplitude can be larger than that of human-induced climate change. The observational record is usually too short to fully evaluate the representation of variability in models and this motivates the use of reanalysis or proxies, even though these have their own limitations.

Figure 9.33a shows simulated internal variability of mean surface temperature from CMIP5 pre-industrial control simulations. Model spread is largest in the tropics and mid to high latitudes (Jones et al., 2012), where variability is also large; however, compared to CMIP3, the spread is smaller in the tropics owing to improved representation of ENSO variability (Jones et al., 2012). The power spectral density of global mean temperature variance in the historical simulations is shown in Figure 9.33b and is generally consistent with the observational estimates. At longer time scale of the spectra estimated from last millennium simulations, performed with a subset of the CMIP5 models, can be assessed by comparison with different NH temperature proxy records (Figure 9.33c; see Chapter 5 for details). The CMIP5 millennium simulations include natural and anthropogenic forcings (solar, volcanic, GHGs, land use) (Schmidt et al., 2012).

Significant differences between unforced and forced simulations are seen for time scale larger than 50 years, indicating the importance of forced variability at these time scales (Fernandez-Donado et al., 2013). It should be noted that a few models exhibit slow background climate drift which increases the spread in variance estimates at multi-century time scales.

Nevertheless, the lines of evidence above suggest with high confidence that models reproduce global and NH temperature variability on a wide range of time scales.

[Emphasis added]. Here is fig 9.33:

From IPCC AR5 Chapter 9

Figure 5

The bottom graph shows the spectra of the last 1,000 years – black line is observations (reconstructed from proxies), dashed lines are without GHG forcings, and solid lines are with GHG forcings.

In later articles we will review this in more detail.

Conclusion

The IPCC report on attribution is very interesting. Most attribution studies compare observations of the last 100 – 150 years with model simulations using anthropogenic GHG changes and model simulations without (note 3).

The results show a much better match for the case of the anthropogenic forcing.

The primary method is with global mean surface temperature, with more recent studies also comparing the spatial breakdown. We saw one such comparison with van Oldenborgh et al (2013). Jones et al (2013) also reviews spatial matching, finding a better fit (of models & observations) for the last half of the 20th century than the first half. (As with van Oldenborgh’s paper, the proportion of observations falling outside the central 90% of model results was greater than 10%).

My question as I first read Chapter 10 was how was the high confidence attained and what is a fingerprint?

I was led back, by following the chain of references, to one of the early papers on the topic (1996) that also had similar high confidence. (We saw this in Part Three). It was intriguing that such confidence could be attained with just a few “no forcing” model runs as comparison, all of which needed “flux adjustment”. Current models need much less, or often zero, flux adjustment.

In later papers reviewed in AR5, “no forcing” model simulations that show temperature trends or jumps are often removed or adjusted.

I’m not trying to suggest that “no forcing” GCM simulations of the last 150 years have anything like the temperature changes we have observed. They don’t.

But I was trying to understand what assumptions and premises were involved in attribution. Chapter 10 of AR5 has been valuable in suggesting references to read, but poor at laying out the assumptions and premises of attribution studies.

..as regular readers know I am fully convinced that the increases in CO2, CH4 and other GHGs over the past 100 years or more can be very well quantified into “radiative forcing” and am 100% in agreement with the IPCC’s summary of the work of atmospheric physics over the last 50 years on this topic. That is, the increases in GHGs have led to something like a “radiative forcing” of 2.8 W/m²..

..Therefore, it’s “very likely” that the increases in GHGs over the last 100 years have contributed significantly to the temperature changes that we have seen.

So what’s my point?

Chapter 10 of the IPCC report fails to highlight the important assumptions in the attribution studies. Chapter 9 of the IPCC report has a section on centennial/millennial natural variability with a “high confidence” conclusion that comes with little evidence and appears to be based on a cursory comparison of the spectral results of the last 1,000 years proxy results with the CMIP5 modeling studies.

In chapter 10, the executive summary states:

..given that observed warming since 1951 is very large compared to climate model estimates of internal variability (Section 10.3.1.1.2), which are assessed to be adequate at global scale (Section 9.5.3.1), we conclude that it is virtually certain [99-100%] that internal variability alone cannot account for the observed global warming since 1951.

[Emphasis added]. I agree, and I don’t think anyone who understands radiative forcing and climate basics would disagree. To claim otherwise would be as ridiculous as, for example, claiming that tiny changes in solar insolation from eccentricity modifications over 100 kyrs cause the end of ice ages, whereas large temperature changes during these ice ages have no effect (see note 2).

The executive summary also says:

It is extremely likely [95–100%] that human activities caused more than half of the observed increase in GMST from 1951 to 2010.

The idea is plausible, but the confidence level is dependent on a premise that is claimed via one graph (fig 9.33) of the spectrum of the last 1,000 years. High confidence (“that models reproduce global and NH temperature variability on a wide range of time scales”) is just an opinion.

It’s crystal clear, by inspection of CMIP3 and CMIP5 model results, that models with anthropogenic forcing match the last 150 years of temperature changes much better than models held at constant pre-industrial forcing.

I believe natural variability is a difficult subject which needs a lot more than a cursory graph of the spectrum of the last 1,000 years to even achieve low confidence in our understanding.

Chapters 9 & 10 of AR5 haven’t investigated “natural variability” at all. For interest, some skeptic opinions are given in note 4.

I propose an alternative summary for Chapter 10 of AR5:

It is extremely likely [95–100%] that human activities caused more than half of the observed increase in GMST from 1951 to 2010, but this assessment is subject to considerable uncertainties.

Notes

At a September 2008 meeting involving 20 climate modeling groups from around the world, the WCRP’s Working Group on Coupled Modelling (WGCM), with input from the IGBP AIMES project, agreed to promote a new set of coordinated climate model experiments. These experiments comprise the fifth phase of the Coupled Model Intercomparison Project (CMIP5). CMIP5 will notably provide a multi-model context for

1) assessing the mechanisms responsible for model differences in poorly understood feedbacks associated with the carbon cycle and with clouds

2) examining climate “predictability” and exploring the ability of models to predict climate on decadal time scales, and, more generally

From the website link above you can read more. CMIP5 is a substantial undertaking, with massive output of data from the latest climate models. Anyone can access this data, similar to CMIP3. Here is the Getting Started page.

In response to a proposed activity of the World Climate Research Programme (WCRP) Working Group on Coupled Modelling (WGCM), PCMDI volunteered to collect model output contributed by leading modeling centers around the world. Climate model output from simulations of the past, present and future climate was collected by PCMDI mostly during the years 2005 and 2006, and this archived data constitutes phase 3 of the Coupled Model Intercomparison Project (CMIP3). In part, the WGCM organized this activity to enable those outside the major modeling centers to perform research of relevance to climate scientists preparing the Fourth Assessment Report (AR4) of the Intergovernmental Panel on Climate Change (IPCC). The IPCC was established by the World Meteorological Organization and the United Nations Environmental Program to assess scientific information on climate change. The IPCC publishes reports that summarize the state of the science.

This unprecedented collection of recent model output is officially known as the “WCRP CMIP3 multi-model dataset.” It is meant to serve IPCC’s Working Group 1, which focuses on the physical climate system — atmosphere, land surface, ocean and sea ice — and the choice of variables archived at the PCMDI reflects this focus. A more comprehensive set of output for a given model may be available from the modeling center that produced it.

With the consent of participating climate modelling groups, the WGCM has declared the CMIP3 multi-model dataset open and free for non-commercial purposes. After registering and agreeing to the “terms of use,” anyone can now obtain model output via the ESG data portal, ftp, or the OPeNDAP server.

As of July 2009, over 36 terabytes of data were in the archive and over 536 terabytes of data had been downloaded among the more than 2500 registered users.

“Natural forcings” = radiative changes due to solar insolation variations (which are not known with much confidence) and aerosols from volcanos. “No forcings” is simply fixed pre-industrial values.

Note 4: Chapter 11 (of AR5), p.982:

For the remaining projections in this chapter the spread among the CMIP5 models is used as a simple, but crude, measure of uncertainty. The extent of agreement between the CMIP5 projections provides rough guidance about the likelihood of a particular outcome. But—as partly illustrated by the discussion above—it must be kept firmly in mind that the real world could fall outside of the range spanned by these particular models. See Section 11.3.6 for further discussion.

And p. 1004:

It is possible that the real world might follow a path outside (above or below) the range projected by the CMIP5 models. Such an eventuality could arise if there are processes operating in the real world that are missing from, or inadequately represented in, the models. Two main possibilities must be considered: (1) Future radiative and other forcings may diverge from the RCP4.5 scenario and, more generally, could fall outside the range of all the RCP scenarios; (2) The response of the real climate system to radiative and other forcing may differ from that projected by the CMIP5 models. A third possibility is that internal fluctuations in the real climate system are inadequately simulated in the models. The fidelity of the CMIP5 models in simulating internal climate variability is discussed in Chapter 9..

..The response of the climate system to radiative and other forcing is influenced by a very wide range of processes, not all of which are adequately simulated in the CMIP5 models (Chapter 9). Of particular concern for projections are mechanisms that could lead to major ‘surprises’ such as an abrupt or rapid change that affects global-to-continental scale climate.

Several such mechanisms are discussed in this assessment report; these include: rapid changes in the Arctic (Section 11.3.4 and Chapter 12), rapid changes in the ocean’s overturning circulation (Chapter 12), rapid change of ice sheets (Chapter 13) and rapid changes in regional monsoon systems and hydrological climate (Chapter 14). Additional mechanisms may also exist as synthesized in Chapter 12. These mechanisms have the potential to influence climate in the near term as well as in the long term, albeit the likelihood of substantial impacts increases with global warming and is generally lower for the near term.

The CMIP3 and CMIP5 projections are ensembles of opportunity, and it is explicitly recognized that there are sources of uncertainty not simulated by the models. Evidence of this can be seen by comparing the Rowlands et al. (2012) projections for the A1B scenario, which were obtained using a very large ensemble in which the physics parameterizations were perturbed in a single climate model, with the corresponding raw multi-model CMIP3 projections. The former exhibit a substantially larger likely range than the latter. A pragmatic approach to addressing this issue, which was used in the AR4 and is also used in Chapter 12, is to consider the 5 to 95% CMIP3/5 range as a ‘likely’ rather than ‘very likely’ range.

Replacing ‘very likely’ (90–100%) with ‘likely’ (66–100%) is a good start. How does this recast chapter 10?

And Chapter 1 of AR5, p. 138:

Model spread is often used as a measure of climate response uncertainty, but such a measure is crude as it takes no account of factors such as model quality (Chapter 9) or model independence (e.g., Masson and Knutti, 2011; Pennell and Reichler, 2011), and not all variables of interest are adequately simulated by global climate models..

..Climate varies naturally on nearly all time and space scales, and quantifying precisely the nature of this variability is challenging, and is characterized by considerable uncertainty.

In (still) writing what was to be Part Six (Attribution in AR5 from the IPCC), I was working through Knutson et al 2013, one of the papers referenced by AR5. That paper in turn referenced Are historical records sufficient to constrain ENSO simulations? [link corrected] by Andrew Wittenberg (2009). This is a very interesting paper and I was glad to find it because it illustrates some of the points we have been looking at.

It’s an easy paper to read (and free) and so I recommend reading the whole paper.

CM2.1 played a prominent role in the third Coupled Model Intercomparison Project (CMIP3) and the Fourth Assessment of the Intergovernmental Panel on Climate Change (IPCC), and its tropical and ENSO simulations have consistently ranked among the world’s top GCMs [van Oldenborgh et al., 2005; Wittenberg et al., 2006; Guilyardi, 2006; Reichler and Kim, 2008].

The coupled pre-industrial control run is initialized as by Delworth et al. [2006], and then integrated for 2220 yr with fixed 1860 estimates of solar irradiance, land cover, and atmospheric composition; we focus here on just the last 2000 yr. This simulation required one full year to run on 60 processors at GFDL.

First of all we see the challenge for climate models – a reasonable resolution coupled GCM running just one 2000-year simulation consumed one year of multiple processor time.

Wittenberg shows the results in the graph below. At the top is our observational record going back 140 years, then below are the simulation results of the SST variation in the El Nino region broken into 20 century-long segments.

From Wittenberg 2009

Figure 1

What we see is that different centuries have very different results:

There are multidecadal epochs with hardly any variability (M5); epochs with intense, warm-skewed ENSO events spaced five or more years apart (M7); epochs with moderate, nearly sinusoidal ENSO events spaced three years apart (M2); and epochs that are highly irregular in amplitude and period (M6). Occasional epochs even mimic detailed temporal sequences of observed ENSO events; e.g., in both R2 and M6, there are decades of weak, biennial oscillations, followed by a large warm event, then several smaller events, another large warm event, and then a long quiet period. Although the model’s NINO3 SST variations are generally stronger than observed, there are long epochs (like M1) where the ENSO amplitude agrees well with observations (R1).

Wittenberg comments on the problem for climate modelers:

An unlucky modeler – who by chance had witnessed only M1-like variability throughout the first century of simulation – might have erroneously inferred that the model’s ENSO amplitude matched observations, when a longer simulation would have revealed a much stronger ENSO.

If the real-world ENSO is similarly modulated, then there is a more disturbing possibility. Had the research community been unlucky enough to observe an unrepresentative ENSO over the past 150 yr of measurements, then it might collectively have misjudged ENSO’s longer-term natural behavior. In that case, historically-observed statistics could be a poor guide for modelers, and observed trends in ENSO statistics might simply reflect natural variations..

..A 200 yr epoch of consistently strong variability (M3) can be followed, just one century later, by a 200 yr epoch of weak variability (M4). Documenting such extremes might thus require a 500+ yr record. Yet few modeling centers currently attempt simulations of that length when evaluating CGCMs under development – due to competing demands for high resolution, process completeness, and quick turnaround to permit exploration of model sensitivities.

Model developers thus might not even realize that a simulation manifested long-term ENSO modulation, until long after freezing the model development. Clearly this could hinder progress. An unlucky modeler – unaware of centennial ENSO modulation and misled by comparisons between short, unrepresentative model runs – might erroneously accept a degraded model or reject an improved model.

[Emphasis added].

Wittenberg shows the same data in the frequency domain and has presented the data in a way that illustrates the different perspective you might have depending upon your period of observation or period of model run. It’s worth taking the time to understand what is in these graphs:

From Wittenberg 2009

Figure 2

The first graph, 2a:

..time-mean spectra of the observations for epochs of length 20 yr – roughly the duration of observations from satellites and the Tropical Atmosphere Ocean (TAO) buoy array. The spectral power is fairly evenly divided between the seasonal cycle and the interannual ENSO band, the latter spanning a broad range of time scales between 1.3 to 8 yr.

So the different colored lines indicate the spectral power for each period. The black dashed line is the observed spectral power over the 140 year (observational) period. This dashed line is repeated in figure 2c.

The second graph, 2b shows the modeled results if we break up the 2000 years into 100 x 20-year periods.

The third graph, 2c, shows the modeled results broken up into 100 year periods. The probability number in the bottom right, 90%, is the likelihood of observations falling outside the range of the model results – if “the simulated subspectra independent and identically distributed.. at bottom right is the probability that an interval so constructed would bracket the next subspectrum to emerge from the model.”

So what this says, paraphrasing and over-simplifying: “we are 90% sure that the observations can’t be explained by the models”.

Of course, this independent and identically distributed assumption is not valid, but as we will hopefully get onto many articles further in this series, most of these statistical assumptions – stationary, gaussian, AR1 – are problematic for real world non-linear systems.

To be clear, the paper’s author is demonstrating a problem in such a statistical approach.
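For reference, the bracketing probability described in the quoted caption can be reproduced with a simple rank argument. If n values are independent draws from the same continuous distribution, the next draw is equally likely to land in any of the n+1 rank positions, so it falls between the smallest and largest of the earlier draws with probability (n−1)/(n+1). The sketch below is only that: it assumes the interval in the figure is a min-to-max envelope (the quote does not spell out the construction), it looks at a single frequency in isolation, and the function names are made up for illustration. With the 20 century-long segments it gives 19/21, or about 90%.

```python
import random


def bracket_probability(n):
    """P(next iid draw lies between the min and max of n earlier draws)."""
    return (n - 1) / (n + 1)


def simulated_bracket_probability(n, trials=10_000, seed=0):
    """Monte Carlo check of the rank argument, using Gaussian draws."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Count the trial if a fresh draw lands inside the envelope.
        inside += min(sample) < rng.gauss(0.0, 1.0) < max(sample)
    return inside / trials


# 20 century-long segments and 100 twenty-year segments
for n in (20, 100):
    print(n, round(bracket_probability(n), 3),
          round(simulated_bracket_probability(n), 3))
```

The Gaussian distribution is an arbitrary choice here. The rank argument holds for any continuous distribution, but only if the draws really are independent and identically distributed, which is exactly the assumption questioned above.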

Conclusion

Models are not reality. This is a simulation with the GFDL model. It doesn’t mean ENSO is like this. But it might be.

The paper illustrates a problem I highlighted in Part Five – observations are only one “realization” of possible outcomes. The last century or century and a half of surface observations could be an outlier. The last 30 years of satellite data could equally be an outlier. Even if our observational periods are not an outlier and are right there on the mean or median, matching climate models to observations may still greatly under-represent natural climate variability.

Non-linear systems can demonstrate variability over much longer time-scales than the typical period between characteristic events. We will return to this in future articles in more detail. Such systems do not have to be “chaotic” (where chaotic means that tiny changes in initial conditions cause rapidly diverging results).

What period of time is necessary to capture natural climate variability?

I will give the last word to the paper’s author:

More worryingly, if nature’s ENSO is similarly modulated, there is no guarantee that the 150 yr historical SST record is a fully representative target for model development..

..In any case, it is sobering to think that even absent any anthropogenic changes, the future of ENSO could look very different from what we have seen so far.

Notes

The formulation and simulation characteristics of two new global coupled climate models developed at NOAA’s Geophysical Fluid Dynamics Laboratory (GFDL) are described.

The models were designed to simulate atmospheric and oceanic climate and variability from the diurnal time scale through multicentury climate change, given our computational constraints. In particular, an important goal was to use the same model for both experimental seasonal to interannual forecasting and the study of multicentury global climate change, and this goal has been achieved.

Two versions of the coupled model are described, called CM2.0 and CM2.1. The versions differ primarily in the dynamical core used in the atmospheric component, along with the cloud tuning and some details of the land and ocean components. For both coupled models, the resolution of the land and atmospheric components is 2° latitude x 2.5° longitude; the atmospheric model has 24 vertical levels.

The ocean resolution is 1° in latitude and longitude, with meridional resolution equatorward of 30° becoming progressively finer, such that the meridional resolution is 1/3° at the equator. There are 50 vertical levels in the ocean, with 22 evenly spaced levels within the top 220 m. The ocean component has poles over North America and Eurasia to avoid polar filtering. Neither coupled model employs flux adjustments.

The control simulations have stable, realistic climates when integrated over multiple centuries. Both models have simulations of ENSO that are substantially improved relative to previous GFDL coupled models. The CM2.0 model has been further evaluated as an ENSO forecast model and has good skill (CM2.1 has not been evaluated as an ENSO forecast model). Generally reduced temperature and salinity biases exist in CM2.1 relative to CM2.0. These reductions are associated with 1) improved simulations of surface wind stress in CM2.1 and associated changes in oceanic gyre circulations; 2) changes in cloud tuning and the land model, both of which act to increase the net surface shortwave radiation in CM2.1, thereby reducing an overall cold bias present in CM2.0; and 3) a reduction of ocean lateral viscosity in the extratropics in CM2.1, which reduces sea ice biases in the North Atlantic.

Both models have been used to conduct a suite of climate change simulations for the 2007 Intergovernmental Panel on Climate Change (IPCC) assessment report and are able to simulate the main features of the observed warming of the twentieth century. The climate sensitivities of the CM2.0 and CM2.1 models are 2.9 and 3.4 K, respectively. These sensitivities are defined by coupling the atmospheric components of CM2.0 and CM2.1 to a slab ocean model and allowing the model to come into equilibrium with a doubling of atmospheric CO2. The output from a suite of integrations conducted with these models is freely available online (see http://nomads.gfdl.noaa.gov/).

There’s a brief description of the newer model version CM3.0 on the GFDL page.

In Part Three we looked at attribution in the early work on this topic by Hegerl et al 1996. I started to write Part Four as the follow up on Attribution as explained in the 5th IPCC report (AR5), but got caught up in the many volumes of AR5.

And instead for this article I decided to focus on what might seem like an obscure point. I hope readers stay with me because it is important.

Here is a graphic from chapter 11 of IPCC AR5:

From IPCC AR5 Chapter 11

Figure 1

And in the introduction, chapter 1:

Climate in a narrow sense is usually defined as the average weather, or more rigorously, as the statistical description in terms of the mean and variability of relevant quantities over a period of time ranging from months to thousands or millions of years. The relevant quantities are most often surface variables such as temperature, precipitation and wind.

Classically the period for averaging these variables is 30 years, as defined by the World Meteorological Organization.

Climate in a wider sense also includes not just the mean conditions, but also the associated statistics (frequency, magnitude, persistence, trends, etc.), often combining parameters to describe phenomena such as droughts. Climate change refers to a change in the state of the climate that can be identified (e.g., by using statistical tests) by changes in the mean and/or the variability of its properties, and that persists for an extended period, typically decades or longer.

[Emphasis added].

Weather is an Initial Value Problem, Climate is a Boundary Value Problem

With even a minute uncertainty in the initial starting condition, the predictability of future states is very limited

Over a long time period the statistics of the system are well-defined

(Being technical, the statistics are well-defined in a transitive system).

So in essence, we can’t predict the exact state of the future – from the current conditions – beyond a certain timescale which might be quite small. In fact, in current weather prediction this time period is about one week.

After a week we might as well say either “the weather on that day will be the same as now” or “the weather on that day will be the climatological average” – and either of these will be better than trying to predict the weather based on the initial state.

No one disagrees on this first point.

In current climate science and meteorology the term used is the skill of the forecast. Skill means, not how good is the forecast, but how much better is it than a naive approach like, “it’s July in New York City so the maximum air temperature today will be 28ºC”.
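One common way to put a number on skill is a mean-squared-error skill score: one minus the ratio of the forecast's error to the error of the naive reference. The sketch below uses that definition as an assumption (other skill scores exist), and the temperatures are hypothetical values invented for illustration.

```python
def mean_squared_error(predicted, observed):
    """Average of squared differences between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def skill_score(forecast, reference, observed):
    """MSE skill score: 1 is perfect, 0 is no better than the reference,
    and a negative value is worse than the reference."""
    mse_ref = mean_squared_error(reference, observed)
    if mse_ref == 0:
        raise ValueError("reference forecast is already perfect")
    return 1.0 - mean_squared_error(forecast, observed) / mse_ref


# Hypothetical daily maximum temperatures (degrees C) for five July days.
observed = [27.0, 30.0, 31.0, 26.0, 29.0]
climatology = [28.0] * 5                  # the naive "it's July, so 28" forecast
model = [27.5, 29.0, 30.0, 27.0, 29.5]    # a made-up model forecast

print(round(skill_score(model, climatology, observed), 3))
```

A forecast that merely repeats the climatological value scores exactly zero, however reasonable it sounds, which is the point of measuring against a naive baseline.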

What happens in practice, as can be seen in the simple Lorenz system shown in Part Two, is a tiny uncertainty about the starting condition gets amplified. Two almost identical starting conditions will diverge rapidly – the “butterfly effect”. Eventually these two conditions are no more alike than one of the conditions and a time chosen at random from the future.

The wide divergence doesn’t mean that the future state can be anything. Here’s an example from the simple Lorenz system for three slightly different initial conditions:

Figure 2

We can see that the three conditions that looked identical for the first 20 seconds (see figure 2 in Part Two) have diverged. The values are bounded but at any given time we can’t predict what the value will be.
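This behavior can be reproduced in a few lines. The sketch below assumes the classic Lorenz parameters (σ = 10, ρ = 28, β = 8/3), a fourth-order Runge-Kutta integrator with a fixed step, and initial x values that differ by one part in a million. These are illustrative choices and not necessarily the ones behind the figure.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (assumed classic parameters)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def x_series(x0, t_end=60.0, dt=0.01):
    """Return x(t) for a run started at (x0, 1, 1)."""
    state = (x0, 1.0, 1.0)
    xs = [state[0]]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt)
        xs.append(state[0])
    return xs


# Three runs whose starting x values differ by 1e-6.
runs = [x_series(1.0 + k * 1e-6) for k in range(3)]

late = slice(4000, None)   # samples from t = 40 onward
max_gap = max(abs(a - b) for a, b in zip(runs[0][late], runs[1][late]))
max_abs_x = max(abs(x) for run in runs for x in run)
print(f"largest late gap in x: {max_gap:.2f}, largest |x|: {max_abs_x:.2f}")
```

The two printed numbers summarize the point of the figure: the late gap between runs is comparable to the size of the attractor itself, while the largest value any run reaches stays bounded.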

On the second point – the statistics of the system, there is a tiny hiccup.

But first let’s review what is agreed upon. Climate is the statistics of weather. Weather is unpredictable more than a week ahead. Climate, as the statistics of weather, might be predictable. That is, just because weather is unpredictable, it doesn’t mean (or prove) that climate is also unpredictable.

This is what we find with simple chaotic systems.

So in the endeavor of climate modeling the best we can hope for is a probabilistic forecast. We have to run “a lot” of simulations and review the statistics of the parameter we are trying to measure.

To give a concrete example, we might determine from model simulations that the July sea surface temperature in the western Pacific (between a certain latitude and longitude) has a mean of 29ºC with a standard deviation of 0.5ºC, while for a certain part of the north Atlantic it is 6ºC with a standard deviation of 3ºC. In the first case the spread of results tells us – if we are confident in our predictions – that we know the western Pacific SST quite accurately, but the north Atlantic SST has a lot of uncertainty. We can’t do anything about the model spread. In the end, the statistics are knowable (in theory), but the actual value on a given day or month or year is not.

Now onto the hiccup.

With “simple” chaotic systems that we can perfectly model (note 1) we don’t know in advance the timescale of “predictable statistics”. We have to run lots of simulations over long time periods until the statistics converge on the same result. If we have parameter uncertainty (see Ensemble Forecasting) this means we also have to run simulations over the spread of parameters.
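As a rough illustration of what "running until the statistics converge" involves, the sketch below integrates the Lorenz system once and measures how much the mean of x differs from one averaging window to the next, for several window lengths. The parameters, the RK4 integrator, the spin-up period and the window lengths are all assumptions chosen for illustration. A single run from one initial condition is also only part of the exercise described above, which would repeat this over many initial conditions and parameter values.

```python
import statistics


def lorenz_x(n_steps, dt=0.01, state=(1.0, 1.0, 1.0),
             sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz equations with RK4 and return the x values."""
    def rhs(s):
        x, y, z = s
        return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

    def step(s):
        k1 = rhs(s)
        k2 = rhs(tuple(a + dt / 2 * b for a, b in zip(s, k1)))
        k3 = rhs(tuple(a + dt / 2 * b for a, b in zip(s, k2)))
        k4 = rhs(tuple(a + dt * b for a, b in zip(s, k3)))
        return tuple(a + dt / 6 * (p + 2 * q + 2 * r + w)
                     for a, p, q, r, w in zip(s, k1, k2, k3, k4))

    xs = []
    for _ in range(n_steps):
        state = step(state)
        xs.append(state[0])
    return xs


def spread_of_window_means(series, window):
    """Standard deviation of the means of consecutive non-overlapping windows."""
    means = [statistics.fmean(series[i:i + window])
             for i in range(0, len(series) - window + 1, window)]
    return statistics.pstdev(means)


xs = lorenz_x(100_000)[1_000:]          # drop the first 10 time units as spin-up
for window in (200, 2_000, 20_000):     # 2, 20 and 200 time units
    print(window, round(spread_of_window_means(xs, window), 3))
```

If the window-to-window spread is still large at a given window length, then a "climate" defined over that length still depends on where the window happens to start. The length at which it settles down is not known in advance, even for this simple system.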

So one body made an ad hoc definition of climate as the 30-year average of weather.

If this definition is correct and accepted then “climate” is not a “boundary value problem” at all. Climate is an initial value problem and therefore a massive problem given our ability to forecast only one week ahead.

Suppose, equally reasonably, that the statistics of weather (=climate), given constant forcing (note 2), are predictable over a 10,000 year period.

In that case, with near perfect models, we can be confident about the averages, standard deviations, skews, etc of the temperature at various locations on the globe over a 10,000 year period.

Conclusion

The fact that chaotic systems exhibit certain behavior doesn’t mean that 30-year statistics of weather can be reliably predicted.

30-year statistics might be just as dependent on the initial state as the weather three weeks from today.

Notes

Note 1: The climate system is obviously imperfectly modeled by GCMs, and this will always be the case. The advantage of a simple model is we can state that the model is a perfect representation of the system – it is just a definition for convenience. It allows us to evaluate how slight changes in initial conditions or parameters affect our ability to predict the future.

The IPCC report also has continual reminders that the model is not reality, for example, chapter 11, p. 982:

For the remaining projections in this chapter the spread among the CMIP5 models is used as a simple, but crude, measure of uncertainty. The extent of agreement between the CMIP5 projections provides rough guidance about the likelihood of a particular outcome. But — as partly illustrated by the discussion above — it must be kept firmly in mind that the real world could fall outside of the range spanned by these particular models.

[Emphasis added].

Chapter 1, p.138:

Model spread is often used as a measure of climate response uncertainty, but such a measure is crude as it takes no account of factors such as model quality (Chapter 9) or model independence (e.g., Masson and Knutti, 2011; Pennell and Reichler, 2011), and not all variables of interest are adequately simulated by global climate models..

..Climate varies naturally on nearly all time and space scales, and quantifying precisely the nature of this variability is challenging, and is characterized by considerable uncertainty.

I haven’t yet been able to determine how these firmly noted and challenging uncertainties have been factored into the quantification of 95-100%, 99-100%, etc, in the various chapters of the IPCC report.

Note 2: There are some complications with defining exactly what system is under review. For example, do we take the current solar output, current obliquity, precession and eccentricity as fixed? If so, then any statistics will be calculated for a condition that will anyway be changing. Alternatively, we can take these values as changing inputs in so far as we know the changes – which is true for obliquity, precession and eccentricity but not for solar output.

I’ve been somewhat sidetracked on this series, mostly by starting up a company and having no time, but also by the voluminous distractions of IPCC AR5. The subject of attribution could be a series by itself but as I started the series Natural Variability and Chaos it makes sense to weave it into that story.

In Part One and Part Two we had a look at chaotic systems and what that might mean for weather and climate. I was planning to develop those ideas a lot more before discussing attribution, but anyway..

AR5, Chapter 10: Attribution is 85 pages on the idea that the changes over the last 50 or 100 years in mean surface temperature – and also some other climate variables – can be attributed primarily to anthropogenic greenhouse gases.

The technical side of the discussion fascinated me, but has a large statistical component. I’m a rookie with statistics, and maybe because of this, I’m often suspicious about statistical arguments.

Digression on Statistics

The foundation of a lot of statistics is the idea of independent events. For example, spin a roulette wheel and you get a number between 0 and 36 and a color that is red, black – or if you’ve landed on a zero, neither.

The statistics are simple – each spin of the roulette wheel is an independent event – that is, it has no relationship with the last spin of the roulette wheel. So, looking ahead, what is the chance of getting 5 two times in a row? The answer (with a 0 only and no “00” as found in some roulette tables) is 1/37 x 1/37 = 0.073%.

However, after you have spun the roulette wheel and got a 5, what is the chance of a second 5? It’s now just 1/37 = 2.7%. The past has no impact on the future statistics. Most of real life doesn’t correspond particularly well to this idea, apart from playing games of chance like poker and so on.
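The two numbers above can be checked with exact fractions. This is just the arithmetic from the text for a single-zero wheel; the variable names are arbitrary.

```python
from fractions import Fraction

POCKETS = 37                      # single-zero wheel: 0 through 36
p_five = Fraction(1, POCKETS)

# Looking ahead, before any spin: both spins must come up 5.
p_two_fives = p_five * p_five

# After a 5 has already come up, only the next spin is uncertain.
p_second_five_given_first = p_five

print(f"two 5s in a row, looking ahead: {float(p_two_fives):.3%}")
print(f"second 5, given a first 5:      {float(p_second_five_given_first):.1%}")
```

The conditional probability equals the single-spin probability only because the spins are independent, which is the assumption that most real-world data do not satisfy.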

I was in the gym the other day and although I try and drown it out with music from my iPhone, the Travesty (aka “the News”) was on some of the screens in the gym – with text of the “high points” on the screen aimed at people trying to drown out the annoying travestyreaders. There was a report that a new study had found that autism was caused by “Cause X” – I have blanked it out to avoid any unpleasant feeling for parents of autistic kids – or people planning on having kids who might worry about “Cause X”.

It did get me thinking – if you have let’s say 10,000 potential candidates for causing autism, and you set the bar at 95% probability of rejecting the hypothesis that a given potential cause is a factor, what is the outcome? Well, if there is a random spread of autism among the population with no actual cause (let’s say it is caused by a random genetic mutation with no link to any parental behavior, parental genetics or the environment) then you will expect to find about 500 “statistically significant” factors for autism simply by testing at the 95% level. That’s 500, when none of them are actually the real cause. It’s just chance. Plenty of fodder for pundits though.
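The figure of about 500 can be checked with a small simulation. The sketch assumes that every candidate is truly unrelated and that each test is a well-behaved continuous test, in which case its p-value is uniformly distributed between 0 and 1 and can be modelled as a single uniform draw. The function name and the seed are arbitrary.

```python
import random


def count_false_positives(n_candidates=10_000, alpha=0.05, seed=0):
    """Count 'significant' results when every candidate is truly unrelated.

    Each test is modelled as one uniform draw standing in for its p-value,
    which is the assumed behavior of a p-value under a true null hypothesis.
    """
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(n_candidates))


hits = count_false_positives()
print(f"{hits} of 10,000 unrelated candidates pass the 95% bar")
```

The count varies a little from seed to seed but stays close to 10,000 x 0.05 = 500, even though none of the simulated candidates is a real cause.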

That’s one problem with statistics – the answer you get unavoidably depends on your frame of reference.

The questions I have about attribution are unrelated to this specific point about statistics, but there are statistical arguments in the attribution field that seem fatally flawed. Luckily I’m a statistical novice so no doubt readers will set me straight.

On another unrelated point about statistical independence, only slightly more relevant to the question at hand, Pirtle, Meyer & Hamilton (2010) said:

In short, we note that GCMs are commonly treated as independent from one another, when in fact there are many reasons to believe otherwise. The assumption of independence leads to increased confidence in the ‘‘robustness’’ of model results when multiple models agree. But GCM independence has not been evaluated by model builders and others in the climate science community. Until now the climate science literature has given only passing attention to this problem, and the field has not developed systematic approaches for assessing model independence.

.. end of digression

Attribution History

In my efforts to understand Chapter 10 of AR5 I followed up on a lot of references and ended up winding my way back to Hegerl et al 1996.

Gabriele Hegerl is one of the lead authors of Chapter 10 of AR5, was one of the two coordinating lead authors of the Attribution chapter of AR4, and one of four lead authors on the relevant chapter of AR3 – and of course has a lot of papers published on this subject.

As is often the case, I find that to understand a subject you have to start with a focus on the earlier papers because the later work doesn’t make a whole lot of sense without this background.

This paper by Hegerl and her colleagues uses the work of one of the co-authors, Klaus Hasselmann – his 1993 paper “Optimal fingerprints for detection of time dependent climate change”.

Fingerprints, by the way, seems like a marketing term. Fingerprints evokes the idea that you can readily demonstrate that John G. Doe of 137 Smith St, Smithsville was at least present at the crime scene and there is no possibility of confusing his fingerprints with John G. Dode who lives next door even though their mothers could barely tell them apart.

This kind of attribution is more in the realm of “was it the 6ft bald white guy or the 5’5″ black guy”?

Well, let’s set aside questions of marketing and look at the details.

Detecting GHG Climate Change with Optimal Fingerprint Methods in 1996

The essence of the method is to compare observations (measurements) with:

model runs with GHG forcing

model runs with “other anthropogenic” and natural forcings

model runs with internal variability only

Then based on the fit you can distinguish one from the other. The statistical basis is covered in detail in Hasselmann 1993 and more briefly in this paper: Hegerl et al 1996 – both papers are linked below in the References.

At this point I make another digression.. as regular readers know I am fully convinced that the increases in CO2, CH4 and other GHGs over the past 100 years or more can be very well quantified into “radiative forcing” and am 100% in agreement with the IPCC’s summary of the work of atmospheric physics over the last 50 years on this topic. That is, the increases in GHGs have led to something like a “radiative forcing” of 2.8 W/m² [corrected, thanks to niclewis].

Therefore, it’s “very likely” that the increases in GHGs over the last 100 years have contributed significantly to the temperature changes that we have seen.

To say otherwise – and still accept physics basics – means believing that the radiative forcing has been “mostly” cancelled out by feedbacks while internal variability has been amplified by feedbacks to cause a significant temperature change.

Yet this work on attribution seems to be fundamentally flawed.

Here was the conclusion:

We find that the latest observed 30-year trend pattern of near-surface temperature change can be distinguished from all estimates of natural climate variability with an estimated risk of less than 2.5% if the optimal fingerprint is applied.

With caveats that, to me, eliminated the statistical basis of the previous statement:

The greatest uncertainty of our analysis is the estimate of the natural variability noise level..

..The shortcomings of the present estimates of natural climate variability cannot be readily overcome. However, the next generation of models should provide us with better simulations of natural variability. In the future, more observations and paleoclimatic information should yield more insight into natural variability, especially on longer timescales. This would enhance the credibility of the statistical test.

Earlier in the paper the authors said:

..However, it is generally believed that models reproduce the space-time statistics of natural variability on large space and long time scales (months to years) reasonably realistic. The verification of variability of CGCMs [coupled GCMs] on decadal to century timescales is relatively short, while paleoclimatic data are sparse and often of limited quality.

..We assume that the detection variable is Gaussian with zero mean, that is, that there is no long-term nonstationarity in the natural variability.

[Emphasis added].

The climate models used would be considered rudimentary by today’s standards. Three different coupled atmosphere-ocean GCMs were used. However, each of them required “flux corrections”.

This method was pretty much the standard until the post 2000 era. The climate models “drifted”, unless, in deity-like form, you topped up (or took out) heat and momentum from various grid boxes.

That is, the models themselves struggled (in 1996) to represent climate unless the climate modeler knew, and corrected for, the long term “drift” in the model.

Conclusion

In the next article we will look at more recent work in attribution and fingerprints and see whether the field has developed.

But in this article we see that the conclusion of an attribution study in 1996 was that there was only a “2.5% chance” that recent temperature changes could be attributed to natural variability. At the same time, the question of how accurate the models were in simulating natural variability was noted but never quantified. And the models were all “flux corrected”. This means that some aspects of the long term statistics of climate were considered to be known – in advance.

So I find it difficult to accept any statistical significance in the study at all.

If the finding instead was introduced with the caveat “assuming the accuracy of our estimates of long term natural variability of climate is correct..” then I would probably be quite happy with the finding. And that question is the key.

The question should be:

What is the likelihood that climate models accurately represent the long-term statistics of natural variability?