The two strongest potentially credible constraints, and conclusions

A guest post by Nic Lewis

In Part 1 of this article the nature and validity of emergent constraints[1] on equilibrium climate sensitivity (ECS) in GCMs were discussed, drawing mainly on the analysis and assessment of 19 such constraints in Caldwell et al. (2018),[2] who concluded that only four of them were credible. An extract of the rows of Table 1 of Part 1 detailing those four emergent constraints is given below.[3]

Name of constraint | Year | Correlation in CMIP5 | Description
Sherwood D | 2014 | 0.40 | Strength of resolved-scale mixing between BL and lower troposphere in tropical E Pacific and Atlantic

Two of those four constraints, Sherwood D and Brient Shal, were analysed in Part 2 and found wanting. In this final part of the article I discuss the remaining two potentially credible constraints, Brient Alb and Zhai – which have much higher correlation with ECS than do Sherwood D and Brient Shal – and formulate conclusions.

Brient Albedo

Brient Alb is based on the correlation in CMIP5 models between ECS, and the relationship of shortwave (SW) reflection by low clouds over tropical oceans (TLC) with SST.[4] The authors found that estimates of the strength of that relationship derived from either deseasonalized or interannual variability correlated better with ECS than those based on seasonal or intra-annual variability. They used deseasonalized variability, which is primarily driven by large-scale phenomena such as El Niño-Southern Oscillation. The observational constraint, from CERES data over 2000-15, is quite tight. All but one of the models from NCAR, GFDL, GISS, UKMO and MPI, perhaps the best-known modelling centres, are ruled out by the Brient Alb emergent constraint.

The paper involves sophisticated statistical methods, as one might expect with Tapio Schneider [5] being joint author. Not many climate science papers involve concepts such as Kullback-Leibler divergence.[6] The authors use this divergence measure to weight models, as they are doubtful that fitting a linear relationship between a proposed emergent constraint and ECS – as is done in many emergent constraint studies – is appropriate. Brient & Schneider make the important point that:

procedures that … first infer a linear relation (regression line) between ECS and variables … from models and then use that linear relation to constrain ECS given observations… can be strongly influenced by ‘‘bad’’ models that are not consistent with the data but exert large leverage on the inferred slope of the regression line. If the slope of the regression line is strongly constrained by bad models, such a procedure can, misleadingly, yield very narrow ECS estimates that could not be justified by focusing on ‘‘good’’ models, which are broadly consistent with the data. By contrast, our multimodel inference procedure assigns zero weight to models that are inconsistent with the data.

Regression of ECS on the strength of the relationship between TLC reflection variability and SST with models weighted by how “good” they are, using either Brient & Schneider’s model weightings or those derived directly from model likelihoods given the observational probability density, explains almost none of the intermodel variance in ECS. That justifies their rejection of the usual linear relationship assumption.
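The leverage problem described in the quotation, and the effect of weighting models by how "good" they are, can be illustrated with a minimal sketch. All numbers below are synthetic and chosen purely for illustration; none are taken from CMIP5 models or CERES data. The function name and the Gaussian likelihood weighting are my assumptions, not Brient & Schneider's procedure.

```python
import numpy as np

def weighted_fit(x, y, w):
    """Weighted least-squares line y = a + b*x; returns (slope, intercept, R^2)."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    xm = np.average(x, weights=w)
    ym = np.average(y, weights=w)
    sxx = np.sum(w * (x - xm) ** 2)
    syy = np.sum(w * (y - ym) ** 2)
    sxy = np.sum(w * (x - xm) * (y - ym))
    slope = sxy / sxx
    return slope, ym - slope * xm, sxy ** 2 / (sxx * syy)

# Synthetic illustration only: five "good" models whose metric is close to the
# (hypothetical) observed value, plus two "bad" low-ECS models far from it.
metric = np.array([-1.10, -1.00, -0.90, -1.05, -0.95, 0.50, 0.80])
ecs = np.array([2.6, 4.1, 3.0, 3.8, 3.3, 2.1, 2.0])

# Hypothetical observational estimate of the metric and its uncertainty.
obs_mean, obs_sd = -1.0, 0.1

# Assumed weighting: Gaussian likelihood of the observed value at each model's metric.
weights = np.exp(-0.5 * ((metric - obs_mean) / obs_sd) ** 2)

_, _, r2_unweighted = weighted_fit(metric, ecs, np.ones_like(metric))
_, _, r2_weighted = weighted_fit(metric, ecs, weights)
print(f"R^2 unweighted: {r2_unweighted:.2f}, weighted: {r2_weighted:.2f}")
```

With these made-up values the unweighted fit explains well over half of the ECS variance, almost entirely because of the two outlying models, whereas the weighted fit explains close to none of it.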

However, Brient & Schneider’s Kullback-Leibler divergence based weighted model-averaging approach makes the assumption that uncertainty in the model and observational estimates of the TLC reflection–SST relationship should be identical. The divergence measure penalises differences in the widths (and shapes) of the observation and model derived TLC reflection-SST relationship estimates as well as differences in their means.[7] In my view the equal uncertainty assumption is not valid and accordingly the justification given for using the divergence measure does not stand up.[8]
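To see why a divergence measure penalises differences in width as well as in mean, consider the simplest case of two normal distributions, for which the Kullback-Leibler divergence has a well-known closed form. The sketch below assumes Gaussian estimates and one particular direction of the divergence; the paper's estimates need not be Gaussian, and the numbers are hypothetical.

```python
import math

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL divergence D(P || Q) between univariate normals P and Q."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

# Hypothetical observational estimate of the metric: mean -1.0, sd 0.2.
obs_mu, obs_sd = -1.0, 0.2

same = kl_gaussian(-1.0, 0.2, obs_mu, obs_sd)     # identical mean and width
wider = kl_gaussian(-1.0, 0.4, obs_mu, obs_sd)    # identical mean, twice the width
shifted = kl_gaussian(-0.8, 0.2, obs_mu, obs_sd)  # same width, mean off by one sd

print(same, wider, shifted)
```

In this illustration the model with exactly the right mean but twice the spread has a larger divergence from the observations than the model whose mean is a full standard deviation out, which is the behaviour objected to above.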

Aside from statistical issues, model-averaging is unsatisfactory given that many models are closely related to and/or have similar characteristics to other models, but their weightings are not reduced to reflect that.[9] For instance, the method gives substantial weightings to both IPSL-CM5A-LR and IPSL-CM5A-MR, two high-sensitivity models that differ only in their resolution.

More fundamentally, if a constraint is satisfied by one or more low sensitivity models and one or more high sensitivity models, how can it be considered to give useful information about ECS? The models are not a random sample. In such a case, whether there are more models with high sensitivity than low sensitivity that satisfy the constraint depends arbitrarily on development decisions by modelling centres, and their choices as to which offshoot model variants to include in CMIP ensembles. Accordingly, the fact that more of the constraint-satisfying models have high ECS than have low ECS constitutes very weak evidence that ECS is high. Brient Alb is such a case: both IPSL-CM5A-LR and IPSL-CM5B-LR, respectively a high (4.1°C) and low (2.6°C) ECS model, closely satisfy the observational constraint. Moreover, the later, lower sensitivity, CM5B model has improved representation of the convective boundary layer and cumulus clouds, of tropical SW CRF and of mid-level cloud coverage.

Brient & Schneider’s results in fact change little if a more reasonable method of weighting models that does not penalise differences in the uncertainty between the model and observational estimates of the TLC reflection–SST relationship is used. However, if in addition the IPSL-CM5A-LR model is replaced by a second copy of IPSL-CM5B-LR – giving a 2-to-1 weighting in favour of IPSL’s more advanced, improved CM5B model over the earlier CM5A version rather than vice versa – the resulting weighted CMIP5 model ECS uncertainty distribution differs little from the raw unweighted distribution, apart from ECS values below 2.5°C being less likely.[10]

It seems doubtful that the Brient Alb constraint actually provides much constraint on ECS. It does suggest that current models with an ECS of below 2.5°C are poor at simulating the observed TLC reflection–SST relationship, but that may be unrelated to their lower than average sensitivity. These conclusions are consistent with those of the study’s authors.[11]

Zhai

The Zhai constraint is very similar to that in Brient Alb, except that it uses seasonal variability in the extent of marine low clouds rather than deseasonalized variability in their total SW reflection. Zhai et al. also studied the sub-tropics (20°–40°S and 20°–40°N) rather than the tropics (30°S–30°N) and identified low cloud regions using a different method.[12] Also, the ECS for the NorESM1-M model is significantly wrong in the Zhai scatter plots, which suggests that their regression estimates, at least, might be somewhat inaccurate.[13]

Moreover, the linear relationship that the Zhai study fits between the seasonal variability derived relationship of low cloud reflectivity with SST and ECS is dominated by “bad” models that are inconsistent with the observational constraint. This is exactly the problem that led Brient & Schneider to eschew fitting a linear relationship. However, it appears that the constrained best-estimate for ECS that Zhai et al. derive, and its uncertainty, are simply the unweighted mean and standard deviation of ECS values for the seven models having seasonal variability derived relationships of low cloud extent with SST that are consistent with their observational estimate. That method makes much less allowance for uncertainty than does Brient & Schneider’s methodology, and accounts for the Zhai constraint on ECS being narrow.

Worryingly, the assessment of consistency with the observational estimate of the relationship of low cloud extent with SST based on seasonal variability differs greatly between the Zhai and Brient studies for several CMIP5 models.[14] If Brient & Schneider’s assessment of consistency with the observational constraint had been substituted for Zhai et al.’s for the four models for which they differ radically, the resulting constrained ECS range would have a virtually identical median to that of their CMIP5 model-ensemble.[15]
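The sensitivity of a "mean and standard deviation of the consistent models" estimate to the consistency assessment can be shown with a small sketch. The model names and ECS values are hypothetical placeholders, not the models or values used by Zhai et al. or by Brient & Schneider, and the function is only my reading of the approach described above.

```python
import statistics

def constrained_estimate(ecs_by_model, consistent_models):
    """Unweighted summary of ECS over the models judged consistent with observations."""
    selected = [ecs_by_model[name] for name in consistent_models]
    return {
        "mean": statistics.mean(selected),
        "sd": statistics.stdev(selected),
        "median": statistics.median(selected),
    }

# Hypothetical ensemble: model label -> ECS in degrees C.
ecs_by_model = {"A": 2.1, "B": 2.6, "C": 2.9, "D": 3.4, "E": 3.9, "F": 4.1, "G": 4.6}

# Two hypothetical assessments of which models match the observed metric.
assessment_1 = ["D", "E", "F", "G"]
assessment_2 = ["B", "C", "E", "F"]

unconstrained_median = statistics.median(ecs_by_model.values())
result_1 = constrained_estimate(ecs_by_model, assessment_1)
result_2 = constrained_estimate(ecs_by_model, assessment_2)
print(unconstrained_median, result_1, result_2)
```

Because every selected model counts equally and every rejected model counts for nothing, swapping the verdict on a handful of models is enough to move the constrained median back to the unconstrained ensemble median in this toy case.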

The Zhai methods have more shortcomings than those used by Brient & Schneider for their very similar emergent constraint, and the radical difference for four CMIP5 models in the two studies’ assessment of consistency with the observational constraint from seasonal variations is a major concern. Brient & Schneider found that models were not very good at reproducing the seasonal cycle in low cloud reflection, and the correlation with ECS for their seasonal variability measure was relatively low – much lower than Zhai et al. found. These issues, taken together, severely dent the credibility of the Zhai et al. constrained ECS estimate.

Summary and conclusions

It is fairly clear that all potentially credible emergent constraints on ECS in climate models that have been investigated really constrain SW low cloud feedback (Qu et al. 2018).[16] Even the Cox constraint,[17] which is based on fluctuation-dissipation theory, is strongly dominated by SW cloud feedback. That is also likely to be the case for emergent constraints that are proposed in future, since low cloud feedback is the dominant source of inter-model variation in ECS.

The fairly detailed Caldwell et al. (2018) review identified four emergent constraints that were potentially credible, although it did not investigate them in detail. Of these, more detailed examination casts doubt on the credibility of the Sherwood D and Brient Shal constraints, which in any event each explain only about 15% of the ECS variance in CMIP5 models.

The Brient Alb and Zhai emergent constraints are very similar; they both involve the variation of low cloud SW reflection with SST. Zhai makes much less allowance for uncertainty than does Brient Alb, hence its narrow constrained ECS range. However, in Brient Alb seasonal variations – as used in Zhai – were considered to produce a less satisfactory constraint than deseasonalized variations, since models are relatively poor at reproducing the observed seasonal cycle. There are also several cases where the two studies’ assessments of whether a model’s seasonal variations in low cloud reflectivity are consistent with the observations differ radically. Substituting the Brient Alb assessment for Zhai’s in those cases would bring the median Zhai constrained estimate into line with the median ECS of their CMIP5 model ensemble. Therefore, the Zhai constraint seems unreliable.

The main implication of the Brient Alb emergent constraint is that the relationship between deseasonalized tropical low-cloud SW reflection and SST in the low-sensitivity inmcm4 and GISS-E2 models is far from that observed. However, it is not clear that has much to do with low ECS as such, since the observed relationship is well matched by the MRI-CGCM3 and IPSL-CM5B-LR models, which both have bottom-quartile ECS values.

There is ample reason to doubt that the response of low-level cloudiness to environmental conditions, and hence low cloud feedback, is realistic in CMIP5 models. A recent review of shallow cumulus in trade-wind regions,[18] a major contributor to low cloud feedback in CMIP5 models – which parameterize clouds – had this to say:

In models with parameterized convection, cloudiness near cloud-base is very sensitive to the vigor of convective mixing in response to changes in environmental conditions. This is in contrast with results from high-resolution models, which suggest that cloudiness near cloud-base is nearly invariant with warming and independent of large-scale environmental changes.

It goes on to question:

whether the strongly negative coupling between low-level cloudiness and convective mixing in many climate models (as shown in Sherwood et al. 2014; Brient et al. 2015; Vial et al. 2016; Kamae et al. 2016) may be a consequence of parameterizing the convective mass flux in a manner that does not sufficiently account for its link to the mass budget of the subcloud layer.

This concern is supported by Zhao et al. (2016),[19] who found that by varying the cumulus convective precipitation parameterization in the new GFDL AM4 model they could engineer its climate sensitivity over a wide range without being able to find any clear observational constraint that favoured one version of the model over the others. The fact that developing aspects of a model can leave its satisfaction of observational constraints unaltered but drastically change its sensitivity seems to fatally undermine the emergent constraint approach, at least in relation to all constraints for which this can occur.

More generally, Qu et al. (2018) point out that any proposed ECS constraint should not be taken at face value, since other factors influencing ECS besides shortwave cloud feedback could be systematically biased in the models.

There is another serious problem with emergent constraints that do not involve the response of the climate system to increasing greenhouse gas concentrations over multidecadal or longer periods. It is that climate feedback strength in GCMs depends strongly on the pattern of SST increase, particularly in the tropics, one reason being that tropical marine low clouds respond to remote as well as to local SST. In simulations by atmosphere-only CMIP5 models driven by evolving SST patterns (AMIP simulations), if the observed historical evolution of SST patterns is used feedback strength is much greater (climate sensitivity is lower) than if the historical evolution of SST simulated by coupled CMIP5 models is used.[20] And feedback strength in AMIP simulations driven by the evolving SST pattern from simulations involving an initial abrupt increase in CO2 concentration is even lower.[21]
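The inverse link between feedback strength and sensitivity referred to above follows from the standard energy-balance relation: sensitivity equals the forcing from a doubling of CO2 divided by the climate feedback strength. A minimal sketch, assuming a doubled-CO2 forcing of 3.7 W/m² (a commonly used value, not one taken from the studies cited) and purely hypothetical feedback strengths:

```python
F_2XCO2 = 3.7  # W/m^2; assumed forcing for a doubling of CO2 concentration

def sensitivity_from_feedback(feedback_strength, forcing_2xco2=F_2XCO2):
    """Sensitivity (deg C) implied by a feedback strength given in W/m^2 per deg C."""
    if feedback_strength <= 0:
        raise ValueError("feedback strength must be positive for a stable climate")
    return forcing_2xco2 / feedback_strength

# Hypothetical feedback strengths, for illustration only.
for label, strength in [("weaker feedback", 1.0), ("stronger feedback", 2.0)]:
    print(label, round(sensitivity_from_feedback(strength), 2))
```

Doubling the feedback strength halves the implied sensitivity, which is why a difference in feedback strength between simulations driven by observed and by model-simulated SST patterns matters so much.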

The dependence of sensitivity on the SST warming pattern, in GCMs at least, implies that even if a valid, strong emergent constraint on ECS in coupled GCMs were found, and there were no shortcomings in the atmospheric models of GCMs that satisfied the constraint, that would be insufficient to constrain real-world ECS. Doing so would also require establishing that the long-term evolution of tropical SST patterns in coupled GCMs forced by increasing greenhouse gas concentrations is realistic, notwithstanding that CMIP5 historical simulations do not match the observed warming pattern.

It is interesting that Tapio Schneider, the joint author of the Brient Alb paper, with considerable mathematical/statistical abilities, advocates caution regarding emergent constraint studies.[22] Such caution is amply justified. It seems doubtful that emergent constraints will be able to provide a useful, reliable constraint on real-world ECS unless and until GCMs are demonstrably able to simulate the climate system – ocean as well as atmosphere – with much greater fidelity, including as to SST warming patterns under multidecadal greenhouse gas driven warming.


Nic Lewis March 2018

[1] An emergent constraint on ECS is a quantitative measure of an aspect of GCMs’ behaviour (a metric) that is well correlated with ECS values in an ensemble of GCMs and can be compared with observations, enabling the derivation of a narrower (constrained) range of GCM ECS values that correspond to GCMs whose metrics are statistically-consistent with the observations.

[5] Tapio Schneider developed RegEM, an impressive ridge-regression based algorithm for infilling missing data. A less satisfactory variant of RegEM has been much used by Michael Mann for paleoclimate proxy-based reconstructions.

[6] Kullback-Leibler divergence is a measure of relative entropy, which can be used to measure how similar two probability distributions are.

[7] Brient & Schneider’s method thus down-weights a model whose estimate has a larger or smaller uncertainty than the observational estimate even if the model’s and the observational mean estimates are identical.

[8] Brient and Schneider justify using a divergence measure based on the similarity of the model and observational estimate PDFs on the basis that “they are estimated from time series of the same length L so that their sampling variability can be expected to be equal if a [statistical] model is adequate”. But the observational estimate uncertainty includes measurement and related errors that are not present in the model estimate uncertainty (although these appear to be relatively unimportant in this case), while only the model estimates sample decadal/multidecadal climate system internal variability, which very possibly affects the TLC reflection–SST relationship. Moreover there is little overlap between the periods used to estimate the TLC reflection–SST relationship from model simulations and observations, and there were three major volcanic eruptions during 1959-2005 but none during the 2000-2015 observational period. Volcanic eruptions have complex, major effects on atmospheric circulation and may well temporarily disrupt the TLC reflection–SST relationship. More fundamentally, whether a GCM realistically simulates the actual TLC reflection–SST relationship and whether it simulates climate system internal variability (which will impact the uncertainty in the estimate of its TLC reflection–SST relationship) are two quite different matters, the second of which appears to have little relevance to the emergent constraint involved. Therefore, there is no reason to expect the uncertainty of the model and observational estimates of the TLC reflection–SST relationship to be the same, and comparing model and observational estimate PDFs (as opposed to just comparing their central measures) does not seem to me appropriate here.

[10] Weighting models by the likelihood of the observed TLC reflection–SST relationship at the model’s best estimate (mean) of it, widening the observational uncertainty to allow for the average uncertainty of the model estimate means, is a more reasonable approach. Their raw (unweighted) model ECS uncertainty 17-83% and 5-95% ranges of 2.4–4.25°C and 1.85–4.8°C can be closely matched by assigning each model’s ECS a standard deviation of 0.5°C. If one then weights the models on the proposed basis, the uncertainty ranges become 2.9–4.45°C and 2.3–4.9°C, very close to Brient & Schneider’s weighted ranges. However, if IPSL-CM5A-LR is replaced by a second copy of IPSL-CM5B-LR, the 17-83% and 5-95% ranges become 2.65–4.4°C and 2.15–4.85°C. Moreover, the median weighted ECS estimate is 3.6°C, well below the weighted posterior mode of 4.05°C that Brient & Schneider point to, and not much different from the raw model median ECS of 3.45°C.
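The calculation outlined in note [10] can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the actual calculation: the metric and ECS values are hypothetical, the widening of the observational uncertainty is assumed to be done by adding variances in quadrature, and the function names are mine. Only the 0.5°C standard deviation assigned to each model's ECS comes from the note.

```python
import math

def likelihood_weights(model_means, model_sds, obs_mean, obs_sd):
    """Weight each model by the Gaussian likelihood of the observed value at the
    model's mean estimate, with the observational spread widened."""
    avg_model_var = sum(s ** 2 for s in model_sds) / len(model_sds)
    # Assumption: widen by adding the average model variance in quadrature.
    total_sd = math.sqrt(obs_sd ** 2 + avg_model_var)
    raw = [math.exp(-0.5 * ((m - obs_mean) / total_sd) ** 2) for m in model_means]
    total = sum(raw)
    return [r / total for r in raw]

def mixture_cdf(x, ecs_values, weights, ecs_sd):
    """CDF of a weighted mixture of normals centred on each model's ECS."""
    return sum(w * 0.5 * (1.0 + math.erf((x - e) / (ecs_sd * math.sqrt(2.0))))
               for e, w in zip(ecs_values, weights))

def mixture_quantile(q, ecs_values, weights, ecs_sd=0.5, lo=0.0, hi=10.0):
    """Quantile of the mixture, found by bisection on its CDF."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, ecs_values, weights, ecs_sd) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs: per-model metric mean and sd, and per-model ECS (deg C).
metric_means = [-1.2, -1.0, -0.9, -0.3, 0.2]
metric_sds = [0.3, 0.3, 0.3, 0.3, 0.3]
ecs_values = [4.1, 2.6, 3.5, 2.1, 2.3]
obs_mean, obs_sd = -1.0, 0.2

weights = likelihood_weights(metric_means, metric_sds, obs_mean, obs_sd)
ranges = {q: mixture_quantile(q, ecs_values, weights) for q in (0.05, 0.17, 0.5, 0.83, 0.95)}
print(weights, ranges)
```

Replacing one model by a second copy of another, as done for the IPSL models, amounts to editing the corresponding entries of the two metric lists and the ECS list before recomputing the weights.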

[12] Although in Brient Alb the correlation of seasonal TLC SW reflection variability with ECS was relatively low, the seasonal cycle is stronger in the sub-tropics than in the tropics. However, much of the marine low cloud regions are situated in the 15°–30° zones, so Brient Alb should have captured most of the seasonal variability in overall tropical and sub-tropical low cloud reflection.

[13] Also, the model numbering differs slightly between Zhai et al. Table 1 and their Figures 2 and 3, which raises the possibility of models having been mixed up in their calculations.

[15] A crude revised central estimate is 3.4°C, being the median ECS of the 7 models (CGCM3.1, HadCM3, CanESM2, IPSL-CM5A, MRI-CGCM3, NCAR-CAM5, NorESM1-M) whose seasonal variability lies within the uncertainty range for the observational estimate, after substituting the Brient & Schneider consistency assessment for the 4 models where it differs radically. CGCM3.1 is a CMIP3-only model; its ECS is 3.4°C but excluding it would increase the constrained median ECS to 3.5°C, in line with the unconstrained median ECS for Zhai’s CMIP5 models (including HadCM3, which is both a CMIP3 and CMIP5 model) of 3.45°C.

20 Comments

“It is fairly clear that all potentially credible emergent constraints on ECS in climate models that have been investigated really constrain SW low cloud feedback”

Whether it is SW or LW feedback, it is definitely clouds.

Furthermore, under clear sky conditions, both LW earth spectrum and net incoming SW less outgoing LW are flat or slightly increasing. These measurements disagree with the prediction of the MODTRAN radiative transfer model.

Ron, sorry for the hastiness of the comment and the slow response. Been crazy.

The central point is that there is a fundamental conflict between MODTRAN and CERES, and it is difficult to see how they can both be right. A secondary point is that the net cloud radiative effect and the net all sky TOA (which I neglected to label as inverted) are very closely related. This latter point is important for any “emergent” sensitivity.

CERES data are divided into “all sky” or whatever the imager sees and “clear sky” where clouds are parsed out. All sky data should be generally better and more complete.
All sky TOA data can be summarized as LW and NET loss to space increasing; SW decreasing.
This is astonishing as on a warming planet with emergent sensitivity to clouds, we expect exactly the opposite.

Tapio Schneider’s new project to improve Earth system modelling: “Replacing the manual tuning process and the offline fitting of parameterization schemes to data from few locations, the ESM we envision will autotune itself and quantify its uncertainties based on statistics of all available data.” http://climate-dynamics.org/earth-system-modeling-2-0/
Manual tuning is slow, limited but also potentially biased (my italics). Let’s see if the programmers for this machine learning manage to work neutrally.

Nic concludes: “It seems doubtful that emergent constraints will be able to provide a useful, reliable constraint on real-world ECS unless and until GCMs are demonstrably able to simulate the climate system – ocean as well as atmosphere – with much greater fidelity, including as to SST warming patterns under multidecadal greenhouse gas driven warming.”

Whereas there seems to be a consensus by virtually everyone in climate science that cloud feedback is an extremely important unknown, could there not also be a consensus built that increased observational studies using focused, specialized models might be more productive in shedding light on the effect that clouds of all types and circumstances have on the Earth’s energy budget? It just seems to me that there are too many unvalidated pieces in the GCMs for studies of them to be expected to give statistically valid output. Once some of the pieces of the puzzle are definitively solved, the rest of the unknowns in the puzzle, the ones most difficult to study independently, could then be solved through GCMs. Are there any others who share this thought?

Nic, I know you have to be sensitive to political currents but do you have any opinions you can voice on where you would like resources allocated to constrain ECS?

Ron, quite a lot of effort is now being put into getting a better understanding of cloud (and convective) behaviour and improving its modelling, aided both by better observations and improved modelling capabilities. I think this is one area where resources should be focussed. However, I think it will be some time before a satisfactory outcome is achieved.

There are also other things critical to constraining ECS that need resolving. In particular, the failure of AOGCMs correctly to simulate evolving SST patterns over the historical period, for unknown reasons, seems to be a major contributor to the difference between energy-budget estimates of ECS using observations and the higher ECS typical in climate models.

It looks like our understanding of cloud formation, movement and dispersal clearly needs to be improved which would then eventually improve the cloud models, especially in the infrared. My question is: are there any similar issues in modeling ocean heat uptake, mixing, etc that is causing similar problems and are we putting resources into that issue if so? Thanks for your views.

EdeF, you ask “are there any similar issues in modeling ocean heat uptake, mixing, etc that is causing similar problems and are we putting resources into that issue if so?”

I certainly think resources should be put into improving ocean modelling; I’m unsure whether there is now adequate focus on this area. Ocean modelling isn’t something I know much about, but the resolution required to well represent eddies, mixing etc. in oceans is higher than for the atmosphere, so computational constraints may be a major issue.

Nic and EdeF: This figure may help:
It shows the spatial correlation between tos (CMIP5 mean) and ERSSTv5 for the timespan 1950…2015. The figure was made with the KNMI climate explorer, thanks to Geert-Jan.
The correlation is >0.3 (that’s faint enough!) in the tropics but not in the east Pacific (ENSO!). In the extratropics the correlation is below 0.2. Not a glorious chapter for the model mean IMHO!

Frank asks: Is this a failure of models to correctly predict how much SWR has been arriving at the surface of the ocean due to incorrectly reproducing local cloud SWR amount? If so, diagnosis and correction might be relatively straightforward. Or is this a consequence of local vertical diffusion of heat in the ocean prompted by local eddies (a phenomenon I remember from somewhere, but don’t understand)? Or could it be due to excess DLR from the lower surface of cloud that is higher than anticipated?

I find it surprising that publications show how various feedbacks in each grid cell change during 4X runs, but AR5 has no figures showing how the robust feedbacks observed during seasonal warming agree with those produced by models. When the day comes that climate science is willing to show this data, the ECS from AOGCMs might be worth comparing with that of EBMs.

Nic: In IPCC reports, one can find Figures showing how well the multi-model mean and each model reproduces observed temperature everywhere on the planet: Mean annual temperature, standard deviation (a measure of seasonal temperature change) and even diurnal temperature range. The IPCC wants readers to clearly see how well (or poorly) models reproduce observed temperature and temperature change around the globe.

If the climate science community were interested, they presumably could grid data for observed and modeled feedbacks in response to seasonal warming. Since seasonal warming is so large outside of the tropics, there are very large seasonal changes in LWR and SWR that can be observed from clear and cloudy skies, possibly from cloudy skies with different types of clouds: MBL clouds, shallow convection and deep convection. (This data is summarized globally in Tsushima and Manabe (2013) PNAS). I don’t know if such plots have appeared in publications, but IIRC the AR5 Chapter on Clouds offers no Figures comparing observation and simulation.

Gridding data for feedbacks would likely just show how poorly the models reproduce observations. There are reasons why typically only temperature comparisons are presented and why they are usually done on a global scale.