Probabilistic(?) estimates of climate sensitivity

James Annan (with Hargreaves) has a new paper out, entitled “On the generation and interpretation of probabilistic estimates of climate sensitivity.” Here is the abstract:

The equilibrium climate response to anthropogenic forcing has long been one of the dominant, and therefore most intensively studied, uncertainties in predicting future climate change. As a result, many probabilistic estimates of the climate sensitivity (S) have been presented. In recent years, most of them have assigned significant probability to extremely high sensitivity, such as P(S > 6°C) > 5%. In this paper, we investigate some of the assumptions underlying these estimates. We show that the popular choice of a uniform prior has unacceptable properties and cannot be reasonably considered to generate meaningful and usable results. When instead reasonable assumptions are made, much greater confidence in a moderate value for S is easily justified, with an upper 95% probability limit for S easily shown to lie close to 4°C, and certainly well below 6°C. These results also impact strongly on projected economic losses due to climate change.

The punchline is this. The IPCC AR4 states that equilibrium climate sensitivity is likely (> 66%) to lie in the range 2–4.5C and very unlikely (< 10%) to lie below 1.5C. Annan and Hargreaves demonstrate that the widely used approach of a uniform prior fails to adequately represent “ignorance” and generates rather pathological results which depend strongly on the selected upper bound. They then turn to the approach of representing reasonable opinion through an expert prior. They examine Beta distributions and Cauchy distributions, including a long-tailed Cauchy prior. They conclude that:

Thus it might be reasonable for the IPCC to upgrade their confidence in S lying below 4.5°C to the “extremely likely” level, indicating 95% probability of a lower value.
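To make the dependence on the upper bound concrete, here is a minimal sketch. It is not the calculation from the paper: the likelihood is a hypothetical one that treats the feedback parameter (taken here as 3.7/S, with 3.7 W/m^2 assumed for doubled CO2) as a Gaussian observation of 1.23 +/- 0.5, and the function name and all numbers are chosen purely for illustration.

```python
import numpy as np

def prob_exceeds(threshold, upper_bound, n=200_001):
    """P(S > threshold) under a uniform prior on (0, upper_bound].

    Hypothetical likelihood (an assumption, not from the paper): the
    feedback parameter 3.7/S is observed as 1.23 +/- 0.5 W/m^2/K.
    """
    s = np.linspace(0.01, upper_bound, n)        # evenly spaced grid over S
    feedback = 3.7 / s                           # implied feedback parameter
    likelihood = np.exp(-0.5 * ((feedback - 1.23) / 0.5) ** 2)
    # With a uniform prior the posterior is proportional to the likelihood.
    posterior = likelihood / likelihood.sum()
    return float(posterior[s > threshold].sum())

# Same likelihood, three different upper bounds for the uniform prior
for bound in (10, 20, 50):
    print(bound, round(prob_exceeds(6.0, bound), 3))
```

Because this likelihood does not fall to zero as S grows, the probability assigned to S > 6 keeps rising as the upper bound is raised, which is the kind of behaviour the paper objects to.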

This is a nice analysis with a useful link to economic decision making, but possibly with a fatal flaw.

There are many scientific papers and blog posts on this topic, but these provide a basic introduction if you aren’t familiar with the topic

Ignorance, pdfs, and Bayesian reasoning

Take a look at this figure from the IPCC AR4, which represents the pdfs or relative likelihoods for equilibrium climate sensitivity from a range of studies (both models and observations). The distributions cover a wide range. Recall that the IPCC AR4 states that equilibrium climate sensitivity is likely (> 66%) to lie in the range 2–4.5C and very unlikely (< 10%) to lie below 1.5C. Annan and Hargreaves find that it is very unlikely to be above 4.5C in the context of a Bayesian analysis, as a result of prior selection and expert judgment.

The issue that I have with this is that the level of ignorance is sufficiently large that probabilities determined from Bayesian analysis are not justified. Bayesian reasoning does not deal well with ignorance. The fact that others have created pdfs from sensitivity estimates and that economists use these pdfs is not a justification; rather, climate researchers and statisticians need to take a close look at this to see whether this line of reasoning is flawed.

My comments here relate specifically to determination of equilibrium climate sensitivity from climate models. Stainforth et al. (2007) argue that model inadequacy and an inadequate number of simulations in the ensemble preclude producing meaningful PDFs from the frequency of model outcomes of future climate. Smith (2006) goes as far as to say “Indeed, I now believe that model inadequacy prevents accountable probability forecasts in a manner not dissimilar to that in which uncertainty in the initial condition precludes accurate best first guess forecasting in the root-mean-square sense.” Stainforth et al. state:

The frequency distributions across the ensemble of models may be valuable information for model development, but there is no reason to expect these distributions to relate to the probability of real-world behaviour. One might (or might not) argue for such a relation if the models were empirically adequate, but given nonlinear models with large systematic errors under current conditions, no connection has been even remotely established for relating the distribution of model states under altered conditions to decision-relevant probability distributions. . . There may well exist thresholds, or tipping points (Kemp 2005), which lie within this range of uncertainty. If so, the provision of a mean value is of little decision-support relevance.

Furthermore, they are liable to be misleading because the conclusions, usually in the form of PDFs, imply much greater confidence than the underlying assumptions justify; we know our current models are inadequate and we know many of the reasons why they are so. These methods aim to increase our ability to communicate the appropriate degree of confidence in said results. Each model run is of value as it presents a ‘what if’ scenario from which we may learn about the model or the Earth system. Such insights can hold non-trivial value for decision making.

I agree with Stainforth et al. and Smith on this. Insufficiently large initial condition ensembles combined with model parameter and structural uncertainty preclude forming a pdf from climate model simulations that has much meaning in terms of establishing a mean value or confidence intervals.

Annan and Hargreaves recognize the potential inadequacy of the Bayesian paradigm:

However, it must be recognised that in fact there can be no prior that genuinely represents a state of complete ignorance (Bernardo and Smith, 1994, Section 5.4), and indeed the impossibility of representing true ignorance within the Bayesian paradigm is perhaps one of the most severe criticisms that is commonly levelled at it.

Imprecise probability measures

Annan and Hargreaves state:

While the Bayesian approach is not the only possible paradigm for the treatment of epistemic uncertainty in climate science (eg Kriegler, 2005), it appears to be the dominant one in the literature. We do not wish to revisit the wider debate concerning the presentation of uncertainty in climate science (eg Moss and Schneider, 2000; Betz, 2007; Risbey, 2007; Risbey and Kandlikar, 2007) but merely note that despite this debate, numerous authors have in fact presented precise pdfs for climate sensitivity, and furthermore their results are frequently used as inputs for further economic and policy analyses (eg Yohe et al., 2004; Meinshausen, 2006; Stern, 2007; Harvey, 2007)

While uncertainty has traditionally been represented by probability, probability is not good at representing ignorance and there are often substantial difficulties in assigning probabilities to events. Other representations of uncertainty have been considered in the literature that address these difficulties (see Halpern 2003 for a review), including Dempster-Shafer evidence theory, possibility measures, and plausibility measures. These alternative representations of uncertainty allow for ignorance and also surprises. While it is beyond the scope of this thread to describe these methods in any detail, a brief description is given of Dempster-Shafer evidence theory and possibility theory, including their relative advantages and disadvantages and how various concepts of likelihood play out in each of these methods. The description below follows Halpern (2003) and references therein.

Dempster-Shafer theory of evidence (also referred to as evidence theory) allows nonmeasurable events, hence allowing one to specify a degree of ignorance. In evidence theory, likelihood is assigned to intervals (referred to as sets), as opposed to probability theory where likelihood is assigned to a point-valued probability and a probability density function. Evidence theory allows for a combination of evidence from different sources and arrives at a degree of belief (represented by a belief function) that accounts for all the available evidence. Beliefs corresponding to independent pieces of information are combined using Dempster’s rule of combination. The amount of belief for a given hypothesis forms a lower bound, and plausibility is the upper bound on the possibility that the hypothesis could be true, whereby belief ≤ plausibility. Probability values are assigned to sets of possibilities rather than single events: their appeal rests on the fact they naturally encode evidence in favor of propositions. Evidence theory is being used increasingly for the design of complex engineering systems. Evidence theory and Bayesian theory provide complementary information when evidence about uncertainty is imprecise. Dempster–Shafer theory allows one to specify a degree of ignorance in this situation instead of being forced to supply prior probabilities which add to unity. (see also the Wikipedia).
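As a rough illustration of how belief and plausibility bracket a hypothesis, here is a small sketch of Dempster’s rule of combination. The three-bin frame ("low", "mid", "high"), the two sources, and their masses are hypothetical and chosen only to show the mechanics; mass placed on the whole frame stands for ignorance.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass;
    the masses of each function sum to 1.
    """
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b      # mass on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so that the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def belief(m, hypothesis):
    """Lower bound: total mass of sets contained in the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)

def plausibility(m, hypothesis):
    """Upper bound: total mass of sets that intersect the hypothesis."""
    return sum(v for s, v in m.items() if s & hypothesis)

# Hypothetical frame of discernment and two hypothetical evidence sources
FRAME = frozenset({"low", "mid", "high"})
source_a = {frozenset({"mid"}): 0.5, FRAME: 0.5}           # half left as ignorance
source_b = {frozenset({"mid", "high"}): 0.6, FRAME: 0.4}

m = combine(source_a, source_b)
mid = frozenset({"mid"})
print(belief(m, mid), plausibility(m, mid))
```

The gap between belief and plausibility is the part of the evidence that neither supports nor contradicts the hypothesis, which is what a single probability cannot express.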

Possibility theory is an imprecise probability theory that considers incomplete information. Possibility theory is driven by the principle of minimal specificity that states that any hypothesis not known to be impossible cannot be ruled out. In contrast to probability, possibility theory describes how likely an event is to occur using the dual concepts of the possibility and necessity of the event, which makes it easier to capture partial ignorance. A possibility distribution distinguishes what is plausible versus the normal course of things versus surprising versus impossible. Possibility theory can be interpreted as a non-numerical version of probability theory or as a simple approach to reasoning with imprecise probabilities. (see also the Wikipedia).
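The dual measures can be sketched in a few lines. The possibility distribution below, over four sensitivity bins, is entirely hypothetical; only the definitions (possibility as the maximum over outcomes, necessity as one minus the possibility of the complement) are standard.

```python
def possibility(dist, event):
    """Possibility of an event: the largest degree among its outcomes."""
    return max((dist[x] for x in event), default=0.0)

def necessity(dist, event):
    """Necessity of an event: 1 minus the possibility of its complement."""
    complement = set(dist) - set(event)
    return 1.0 - possibility(dist, complement)

# Hypothetical possibility degrees for sensitivity bins (deg C per doubling);
# 1.0 means fully possible, 0.0 means impossible.
dist = {"<1.5": 0.3, "1.5-4.5": 1.0, "4.5-6": 0.6, ">6": 0.2}

central = {"1.5-4.5"}
extreme = {">6"}
print(possibility(dist, central), necessity(dist, central))
print(possibility(dist, extreme), necessity(dist, extreme))
```

In this toy case the extreme bin keeps a nonzero possibility while its necessity is zero: it is not ruled out, yet nothing compels it.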

I don’t know why imprecise probability paradigms aren’t more widely used in climate science. Probabilities seem misleading, given the large uncertainty.

Black swans and dragon kings

On the one hand, it is “comforting” to have a very likely confidence level of sensitivity not exceeding 4.5C, since this provides a concrete range for economists and others to work with. It may be a false comfort. I discussed this issue in my AGU presentation “Climate surprises, catastrophes, and fat tails” .

Stainforth et al. state:

How should we interpret such ensembles in terms of information about future behaviour of the actual Earth system? The most straightforward interpretation is simply to present the range of behaviour in the variables of interest across different models (Stainforth et al. 2005, 2007). Each model gives a projected distribution; an evaluation of its climate. A grand ensemble provides a range of distributions; a range of ICE means (figure 2), a range of 95th centiles, etc. These are subject to a number of simple assumptions: (i) the forcing scenario explored, (ii) the degree of exploration of model and ICU, and (iii) the processes included and resolved. All analysis procedures will be subject to these assumptions, at least, unless a reliable physical constraint can be identified. Even were we to achieve the impossible and have access to a comprehensive exploration of uncertainty in parameter space, the shape of various distributions extracted would reflect model constructs with no obvious relationship to the probability of real-world behaviour.

Thus, how should we interpret and utilize these simulations? A pragmatic response is to acknowledge and highlight such unquantifiable uncertainties but to present results on the basis that they have zero effect on our analysis. The model simulations are therefore taken as possibilities for future real-world climate and as such of potential value to society, at least on variables and scales where the models agree in terms of their climate distributions (Smith 2002). But even best available information may be rationally judged quantitatively irrelevant for decision-support applications. Quantifying this relevance is a missing link in connecting modelling and user communities. Today’s ensembles give us a lower bound on the maximum range of uncertainty. This is an honest description of our current abilities and may still be useful as a guide to decision and policy makers.

Applying pdfs of climate sensitivity to economic modelling seems fraught with the potential to mislead and preclude the possibilities of black swans and dragon kings. Didier Sornette’s presentation at the AGU entitled “Dragon-Kings, black swans, and prediction” raises these issues, which need to be considered in the context of risk assessment and economic modeling.

Questions

In summary, given the large uncertainties, I am unconvinced by Annan and Hargreaves’ analysis in terms of providing limits to the range of expected climate sensitivity values. Expert judgment and Bayesian analysis are not up to this particular task, IMO.

I would appreciate some input and assessment on these issues from the denizens with statistical expertise. Imprecise probability theories (e.g. evidence theory, possibility theory, plausibility theory) seem to be far better fits to the climate sensitivity problem. As far as I can tell, these haven’t been used in climate science, other than in Kriegler’s 2005 Ph.D. thesis and my own fledgling attempts (e.g. Italian flag analysis; note Part II is under construction).

This is a technical thread; comments will be moderated for relevance (general comments about the greenhouse effect should be made on the Pierrehumbert thread). Relevant topics are the statistical methods used by Annan and Hargreaves, economic modelling using climate sensitivity information, and evidence for/against a large sensitivity (>4C) and for a low sensitivity (<1.5C).

144 responses to “Probabilistic(?) estimates of climate sensitivity”

It is my slightly informed speculation that at least one of the models (HadGEMx) may not come up with data from which a sensitivity can be calculated.

Hadley and perhaps other modelling groups will be incorporating a lot more dynamics with respect to CO2 levels. By which I mean that the emission scenarios will give rise to different CO2 concentrations in different runs. It seems that the dynamics will include full biosphere uptake/release modelling for CO2 and possibly other trace gases.

All in all, the results could be breathtaking. Well, that’s my rumour and I’m sticking to it. I anticipate the next AR round to set the tigers amongst the swans.

In my view the system uncertainties are so great that the models merely demonstrate a narrow form of physical possibility, to which no likelihood can be attached. Here is a simple test. Statistically a new ice age seems reasonably likely (10%?) to start in the next thousand years or so, and certainly it is possible. How do the models represent this possibility? So far as I know they do not, for the simple reason that the ice age mechanism is not included. This is true for other known unknown mechanisms as well.

THE GENIE OF UNCERTAINTY. My thoughts exactly. On the bright side, once the genie of uncertainty is let out of the bottle, it is mighty difficult to stuff back in. If the chance of an ice age over a particular period were 1% and the adverse impacts on humanity x10 those of the gloomiest AGW scenarios, then the precautionary principle surely tells us something very different to the IPCC narrative. I wonder who I should look to to do these sorts of sums properly? DavS

You should look at other data sources. The direction in which long term temperatures are moving is very much a result of the span chosen as a baseline. Proxy data, especially O-18/O-16 ratios, indicate that the planet may very well have begun the descent into the next glacial about 8,000 years ago, around which time post-Pleistocene temperatures seem to have peaked. At that scale, the course of global temperatures has been generally downward and the length of warm episodes appears to be shortening. So, statistically the “next” ice age may already have started about 8,000 years ago.

I’m afraid I have not had a chance to read all of this thoroughly yet, but noticed your call for “others” on climate sensitivity estimates. I am a fan of Stephen Schwartz of Brookhaven National Labs. His 2007 paper on climate sensitivity can be found at

Good papers and they were published after AR4. Annan and Hargreaves think it is “very likely” sensitivity is less than 4.5C. But the Schwartz paper increases the chance sensitivity is below 1.5C. The Schwartz estimate is based on CRU data. If there is a systemic warming bias in CRU data (as Climategate hints there may be), then the Schwartz estimate may be significantly too high.

The problem with Schwartz is that his method undershoots the known sensitivities of the climate models it is tested against. The fact that his method fails to identify known sensitivities in a plausible system should give pause to his results.

Two issues here:
(1) What does the model produce as sensitivities? and
(2) How well does that represent the real world?

On (1), for any given model you can produce sensitivities, but this still begs question (2). The fact that Schwartz’s simpler model produces lower values than more complex models raises reasonable questions around (2) that are worth exploring.

Because Schwartz’s model is simpler, it is easier to account for and quantify the uncertainty in it (in fact much of the uncertainty in complex GCMs is hidden, e.g. see Stainforth et al. referenced in the post). So if you take the view that you are interested not just in the mean but also in the variation in the estimate, Schwartz’s model, despite being simpler, gives you better information.

Thanks for this. I’ve been interested in the use of simpler models to start the process of selectively taking steps to add complexity where this is material to the matter in hand, rather than trying to do everything from first principles (as per GCMs). Schwartz’s model is useful in this regard.

I do think that forecasting how the climate might change on decadal timescales in the face of increased CO2 concentrations is a much more constrained problem than those that GCMs typically are set to deal with. By looking at the error terms in Schwartz’s model you can start to get a sense of where more information would be valuable.

On a tangent, but still within the issues raised in the original post, this kind of thinking is useful for analyzing extreme events (the ones in the fat tails). If we have fat tails in our pdfs then we can do one of a couple of things. We can run lots of experiments (aka model runs) to assess their actual pdfs, or we can specifically look for the physical circumstances that correspond to the large deviations and investigate those processes specifically.
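As a back-of-envelope guide to how many runs "lots" might mean, here is a sketch using the binomial standard error of a tail probability estimated by counting exceedances. It assumes the runs are independent draws from the distribution of interest, which is itself a strong assumption for model ensembles; the function names are made up for this example.

```python
import math

def tail_std_error(p, n_runs):
    """Standard error of a tail probability p estimated from n_runs
    independent runs by counting how many exceed the threshold."""
    return math.sqrt(p * (1.0 - p) / n_runs)

def runs_needed(p, relative_error):
    """Runs needed so the standard error is about relative_error * p."""
    return math.ceil((1.0 - p) / (p * relative_error ** 2))

# Example: a 5% tail, estimated from 100 runs, and the number of runs
# needed to pin that 5% down to within about a fifth of its value.
print(tail_std_error(0.05, 100))
print(runs_needed(0.05, 0.2))
```

The required number of runs scales roughly as 1/p, so the further out in the tail one looks, the more attractive the second option (studying the physical circumstances directly) becomes.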

Agreed. Guesswork isn’t expertise and expertise is fundamental to any Bayesian expert system. Put simply, you have to have a good track record of being largely correct to be classed as an expert. What has been largely ignored though is that the more skeptical have a far better track record of being correct with their predictions. Some folk may not like to believe it but that doesn’t make it less true.

I’m still waiting for Annan to realise that there are degrees of guesswork and the higher the value above the canonical 1 degree, the more the guesswork involved, therefore it should be weighted appropriately. So imo a truer analysis of climate science opinion should have a mode at around 1 degree. This is because feedback is assumed positive without much in the way of real evidence.

Estimating a range of climate sensitivity without regard to empirical data is just a self-fulfilling prophecy. Few people are likely to stray far from established range. You don’t even have to do much to reduce the climate sensitivity arising from models; just look for negative feedbacks to balance the positive ones or turn up the aerosol or natural variation cycles knobs.
Here’s one example: http://www.nasa.gov/topics/earth/features/cooling-plant-growth.html
with the line:
“To date, only some models that predict how the planet would respond to a doubling of carbon dioxide have allowed for vegetation to grow as a response to higher carbon dioxide levels and associated increases in temperatures and precipitation.”

But the annoying part is the phrase “the global warming trend that is expected”. Why is it expected? Because of the assumption of high climate sensitivity. But isn’t that what we are trying to establish in the first place? It all becomes just circular reasoning. The result is determined by your initial assumptions.

A true sensitivity analysis of most models would actually come up with massive cooling or massive heating or anything in between due to those massive error bars. Not my definition of useful! More people need to leave the desk and get in the field and collect data. There is just no way around it.

James,
I agree on the circular reasoning aspect. We saw the same thing between the Trenberth and Pierrehumbert papers. Trenberth said the CERES data was wrong (showed too much warming) and that 0.9 W fit the theory better. Pierrehumbert writes a paper citing the 0.9 W like it is a measurement and says the theory is confirmed. C’mon!

Rather than being exhaustive about estimating techniques (there are quite a few), I’ll stay pretty close to what was mentioned. One simple technique which has some history is to poll a group of experts (they don’t actually have to be experts) and ask for the least it could reasonably be, the most it could reasonably be, and the single value they think is their best guess. The least and most values are used as corrections and indicators of bias, and combined with the “best guess” to arrive at a corrected value for the best guess, as well as an indication of the uncertainty of the estimate, assuming the problem has a particular kind of underlying distribution to the estimates. You get an estimate of the value, as well as a standard deviation of the estimate.
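One widely used form of this kind of technique is the PERT three-point estimate, which assumes a beta-shaped distribution. Whether that is exactly the variant meant above is an assumption; the expert responses below are invented for illustration.

```python
from statistics import mean

def three_point(least, best, most):
    """PERT-style three-point estimate.

    Returns the weighted mean (best guess counted four times) and the
    conventional approximate standard deviation, (most - least) / 6.
    """
    estimate = (least + 4 * best + most) / 6
    std_dev = (most - least) / 6
    return estimate, std_dev

# Hypothetical responses: (least, best guess, most), deg C per doubling
responses = [(1.5, 3.0, 4.5), (1.0, 2.5, 6.0), (2.0, 3.0, 5.0)]

results = [three_point(*r) for r in responses]
pooled = mean(est for est, _ in results)      # simple unweighted pooling
for r, (est, sd) in zip(responses, results):
    print(r, round(est, 2), round(sd, 2))
print("pooled estimate:", round(pooled, 2))
```

A lopsided response such as (1.0, 2.5, 6.0) pulls the corrected estimate above the stated best guess, which is the bias correction the least and most values provide.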

This works fairly well, but the underlying assumption is you’re using the appropriate distribution for correcting the estimate and estimating standard deviation. More broadly, any of these types of estimating techniques assumes the experts (or non-experts) have enough underlying information to make informed guesses. The spread between an individual’s most and least estimates frequently gives insight into how well they understand the area. I could probably ask almost any group of people to come up with estimates for how long it would take to dig a trench in the sand of certain proportions, and get reasonable estimates to work with. That isn’t the case for something like the climate sensitivity. Perhaps the best chance would be to find people who have verifiably already determined this type of parameter for other complex non-linear systems and have them make estimates.

As an aside, there’s an adjunct technique for refining expert estimates. After the first round of estimates, the experts must assume that their estimates are wrong. Then they must say what causes them to be wrong, and give a revised set of estimates, which are then used. This doesn’t work if the experts won’t concede they could be wrong.

How to describe uncertainties was the most central theme in the InterAcademy Council Review of IPCC procedures. They presented many good suggestions, mostly stating that one should not use formulations that do not correspond to the nature of the uncertainties. Here are their recommendations without the explanatory text that followed:

Recommendation: All Working Groups should use the qualitative level-of-understanding scale in their Summary for Policy Makers and Technical Summary, as suggested in IPCC’s uncertainty guidance for the Fourth Assessment Report. This scale may be supplemented by a quantitative probability scale, if appropriate.

Recommendation: Chapter Lead Authors should provide a traceable account of how they arrived at their ratings for level of scientific understanding and likelihood that an outcome will occur.

Recommendation: Quantitative probabilities (as in the likelihood scale) should be used to describe the probability of well-defined outcomes only when there is sufficient evidence. Authors should indicate the basis for assigning a probability to an outcome or event (e.g., based on measurement, expert judgment, and/or model runs).

Recommendation: The confidence scale should not be used to assign subjective probabilities to ill-defined outcomes.

Recommendation: The likelihood scale should be stated in terms of probabilities (numbers) in addition to words to improve understanding of uncertainty.

Recommendation: Where practical, formal expert elicitation procedures should be used to obtain subjective probabilities for key results.

While I agree almost fully on the principles they state, I was left with the impression that they were in practice sympathetic to deviations from these principles in many situations. The value of the climate sensitivity is a prime example. I think that they should have concluded that there is not sufficient basis for presenting probability distributions, but they rather applauded using the PDF in this case.

Since the hypothetical way in which climate sensitivity is estimated by the IPCC has been shown to be wrong (Tomas Milancovic, Gerlich & Tscheuschner, no observed data, no scientific method etc.), does it matter what the accuracy is?

I would be more interested in this debate regarding climate sensitivity if climate scientists were working with a terrestrial data record that was uncorrupted. How about a third party audit on these records and an effort to get them cleaned up? Until then, my daily visits to this site have little impact on my quest for the “truth” regarding AGW.

WOW,
Not one cloud or drop of rain.
Proxies seem to forget that there is far more going on than just temperatures.
If using proxies that are affected by moisture, then that MUST be looked at, as stunted growth can also be caused by too little or too much rain.

An exceptionally good point. Too often the focus of this debate is binary; this skews the significance of any perceived changes in the data (ESPECIALLY proxies) and gives potentially false conclusions.

As someone unfamiliar with the technical intricacies involved in deriving some of the cited pdfs, I’m forced to rely principally on my intuitive conceptualization of probability and its implications. I’m particularly uncomfortable with a choice of Bayesian priors based on subjective belief, whether it be by experts or novices like me. When it comes to climate sensitivity, I wonder whether adequate attention has been paid to the use of reasonably objective priors derived from one era or one method as a basis for pdf estimates applied to a different era or method.

In particular, I’m struck by the fact that in AR4 WG1, two separate chapters, 8 and 9, are devoted to climate sensitivity, approached from two different directions. Chapter 8 uses modeled estimates of feedbacks, constrained by observational data typically obtained via modern technology such as satellite monitoring, to construct probable climate sensitivity values. In most cases, these range from about 2 to 4.5 C per doubled CO2 within the context of our current climate – with a most likely value between 2 and 3 C. On the other hand, chapter 9 describes attempts ranging far back into paleoclimatology to relate forcings to temperature change, sometimes directly (with all the attendant uncertainties), and more often by adjusting model parameters to determine the climate sensitivity ranges that allow the models to best simulate data from the past – e.g., the Last Glacial Maximum (LGM). Here, the climate sensitivity ranges tend to be larger; most still encompass ranges between about 1 and 6 C per doubling, but a few accommodate even a lower boundary, and several pdfs extend far beyond 6 C on the upper end. Because the parameter adjustment utilizes a choice of prior probabilities to generate pdfs, I wonder to what extent values of the kind described in Chapter 8 have been applied to constrain the priors used for the Chapter 9 approach. It seems to me that the two approaches are sufficiently independent to justify applying one to the other, although ensuring independence is sometimes difficult.

So many estimations of climate sensitivity have now been made, involving many different methods and eras, that I have the sense that our confidence in the general range of values that has emerged is reinforced by the convergence of data. Unfortunately, we are still left with a rather large range. As Annan and Hargreaves point out, the uncertainties, particularly at the high end, profoundly affect how we evaluate the costs of climate mitigation or the dangers of failure to mitigate.

I believe that there may be a larger issue with estimation of ECS from models, even before one considers how an ensemble of within-model or between-model results might sensibly be amalgamated into a pdf.

Schwartz 2010 points out that fits to observational data typically give rise to lower estimates of ECS, and also that model-derived values of ECS are strongly controlled by the degree of (negative) aerosol forcing incorporated into each individual GCM. Nothing new there. However, I don’t believe that anyone has tried to emulate the model results themselves to test the validity of ECS estimates derived “empirically” from model runs.

I very recently fitted a Schwartz-type model to the published GISS E results using the actual forcing data used by the GISS E model. Using superposition to forward integrate the flux perturbations and temperature responses to the aggregated forcing data, I optimised the fit to GISS E temperature and OHC anomaly data, by adjusting the ECS and the equilibration time constant in the Schwartz formulation, plus a factor applied to the volcanic forcing. The fit is really quite remarkable. This was the first fit to temperature (adjusted R^2 = 0.985):
and here is the fit to OHC:
Using the equilibration time from the OHC fit to update the temperature match reduces the R^2 value by a tiny amount (to 98.2%).
The net result is a close-to-perfect functional equivalence between the Schwartz model and the GISS E model. The truly extraordinary thing is that the ECS used in the Schwartz model is 0.345 deg/Wm2 with the equilibration constant set at 3.5 years. The latter would suggest that 99% of equilibration is over after 16 years. The published values for the GISS-E model are 0.75 deg/Wm2 (or 2.7 deg for a doubling of CO2) and a full equilibration time amounting to several hundred years.
I can, at the moment, see only three possible explanations for this, apart from gross implementation error:
a) GISS E is storing significant additional energy somewhere outside the ocean
b) there is a positive bias in model variability in long-term equilibration runs as total forcings are increased
c) the information content in the temperature and OHC series is not sufficient to allow accurate estimation of ECS and equilibration time; a flux response function with a very long tail (high ECS, high equilibration time) may give a result similar to the Schwartz exponential response function with a low ECS and low equilibration time.
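For readers who want to check the arithmetic, here is a minimal sketch of a single-time-constant (exponential) response of the kind described above. It is not the actual fitting code; the function names and the idealised constant forcing in the example are assumptions, while the 3.5 year time constant and 0.345 deg/Wm2 are the values quoted above.

```python
import math

def fraction_equilibrated(t_years, tau_years):
    """Fraction of the equilibrium response reached t years after a step."""
    return 1.0 - math.exp(-t_years / tau_years)

def time_to_fraction(fraction, tau_years):
    """Years needed to reach a given fraction of the equilibrium response."""
    return -tau_years * math.log(1.0 - fraction)

def temperature_response(forcing, sensitivity, tau_years, dt=1.0):
    """Superpose step responses to successive forcing increments.

    forcing: total forcing (W/m^2) at each time step
    sensitivity: equilibrium response in deg per W/m^2
    """
    temps = []
    for i in range(len(forcing)):
        total = 0.0
        for j in range(i + 1):
            # Forcing increment introduced at step j
            step = forcing[j] - (forcing[j - 1] if j > 0 else 0.0)
            total += sensitivity * step * fraction_equilibrated((i - j) * dt, tau_years)
        temps.append(total)
    return temps

# Values quoted above: time constant 3.5 years, 0.345 deg/Wm2
print(fraction_equilibrated(16, 3.5))     # fraction reached after 16 years
print(time_to_fraction(0.99, 3.5))        # years to reach 99%

# Idealised example: a constant 1 W/m^2 forcing held for 100 years
temps = temperature_response([1.0] * 100, 0.345, 3.5)
print(temps[-1])
```

With a single exponential the response to a sustained forcing is essentially complete within a few time constants, so any slower adjustment in the GCM would have to show up as a departure from this simple form.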

I would appreciate any other possible explanations that people may have.

My main point in the context of this thread is that the experts in climate science are pre-conditioned to accept model estimates of ECS because the paradigm is that the models are best able to encapsulate all of the variables. The reality however is that such confidence may well be misplaced.

Paul – How is your assessment influenced by recent data indicating that the rate of heat storage in the deep ocean may be greater than previously appreciated – see the preprint of Purkey and Johnson – J. Climate? The published paper is behind a paywall.

Fred,
I would like to chip in on this one. The deep oceans are more likely to warm when the upper oceans warm. Everyone agrees the upper oceans warmed dramatically in the 1990s and early 2000s. It would not be terribly surprising if some of this heat was caught in a down-welling.

The question is did the deep ocean continue to warm after 2003 when the upper ocean warming trend flattened? The paper, which I only skimmed, does not appear to demonstrate that it did.

Fred,
This paper does not – and cannot – change my assessment one iota. My results are not seeking to emulate real-world observations. My results reproduce the GISS E model results for the specific inputs used by that model which was frozen for AR4, and demonstrate that the reported ECS is suspect. When GISS decides how to accommodate abyssal energy and freezes their forcing data and model for AR5, I will repeat the exercise.

Hi again, Fred,
If I were to answer a different question, namely to what extent the P&J results are likely to influence attempts to derive ECS from a match to real-world data, I would be inclined to say that they should have a small negative effect on ECS estimates, since this is energy which is coming into the “surface” system in addition to energy from flux imbalance, and which therefore needs to be discounted from previous estimates of flux imbalance. But looking at the magnitudes reported, it is unlikely that they will shift estimates of radiative energy gained by more than 3% (downwards) for the models which have high sensitivity, which is basically all of the CMIP GCMs.

Maybe I misinterpreted your point, but the heat stored in the deep ocean is leaving rather than “coming into” the surface system, if by surface you refer to the land and upper ocean. It is being transferred downward, but is capable of influencing upper ocean temperature in a warming direction for observations over a sufficient interval – which is to say multiple decades. The accessibility of deep ocean heat to the climate system tells us that the equilibration time relevant to multidecadal climate sensitivity estimates is longer than an interval based on upper ocean measurements, and so sensitivity will be underestimated if only the shorter interval is used. Relevant to these points, the Schwartz original value of about 0.30 deg/Wm2 was later adjusted upward by Schwartz to 0.51 deg/Wm2, with a corresponding increase in estimated sensitivity – Schwartz 2008 JGR

Regarding Ron Cram’s point above, I agree that the deep ocean will warm more when the upper ocean warms than when it cools, but it can warm in either case as long as it is cooler than the upper 700 meters – for example, loss of OHC from the surface during early phases of an El Nino may still be accompanied by transfer of upper ocean heat to the deep ocean. The PJ paper doesn’t really address the last decade in detail, but as I suggested in a discussion with Roger Pielke Sr in the “missing heat” thread, I believe that we will have to wait for at least another decade before judging whether the OHC ascertained from upper ocean measurements is departing significantly from earlier long term upward trends. Measurement uncertainty is too great in my view to justify conclusions based on much shorter intervals.

Hi Fred,
Thank you. You wrote:
“Maybe I misinterpreted your point, but the heat stored in the deep ocean is leaving rather than “coming into” the surface system, if by surface you refer to the land and upper ocean.”
I mistakenly assumed that the Purkey and Johnson paper related to heat from marine vulcanism. I now see that it postulates localised thermalised downwelling and upwelling. You are correct and I was wrong in my statement.

Your last point, however, about Schwartz revising his estimate of climate sensitivity is not relevant at all to my first posting. Schwartz derived his estimates from real-world data and initially failed to fully account for autocorrelation in the data from which he was abstracting a trend, hence the correction. I am not trying to repeat his work. I am taking the simple energy balance model which he used and applying it to the GISS E inputs and then comparing modelled results. This gives a close to perfect match, but with parameters in the simple model that are very different from those reported by GISS.

Fred, the problem I have with this deep water heating is that it only seems to be invoked for the time period when the lines diverge on Trenberth’s energy budget graph.

Presumably prior to that, with the budget balanced, there is no nett transfer of energy to or from the deep in Trenberth’s mind. It’s not just about energy being transferred to the cold abyss, it’s this only happening post 2005. It seems straw clutching is being raised to the level of a working hypothesis.

Solar radiation can only penetrate so far. Plankton, sealife, impurities, salt changes, all have a factor in how much the sun heats the oceans and the depth it penetrates.
Next we have a planet rotating and a sun rotating, so these energy waves are not constantly hitting at a single point.

“c) the information content in the temperature and OHC series is not sufficient to allow accurate estimation of ECS and equilibration time; a flux response function with a very long tail (high ECS, high equilibration time) may give a result similar to the Schwartz exponential response function with a low ECS and low equilibration time.”

For the first part:

“the information content in the temperature and OHC series is not sufficient to allow accurate estimation of ECS and equilibration time;”

This is possibly true with regard to ECS; in particular, it needs to be shown that ECS is decisively linked to anything that is observable prior to equilibrium. Some of the effects that influence ECS may have very long time lags, very slow rates of initiation and be non-linear. That is, they may not be LTI (Linear Time Invariant) processes. It seems likely that ice albedo changes may be non-linear. Assumption of LTI seems to be implicit in most of these studies but is not shown to hold. Even if it does hold, without knowing the true nature of the response function it is not obvious that late onset effects can be taken into account.

Another issue is the use of an “equilibration time” even though the analysis shown in some papers indicates that it may not be meaningful, in that perturbation response time is not constant but scales with time. That is it is short when looking at short term perturbations and long when looking at long term perturbations.

For the second part:

“a flux response function with a very long tail (high ECS, high equilibration time) may give a result similar to the Schwartz exponential response function with a low ECS and low equilibration time.”

A very long tail can result with neither high ECS nor high equilibration time. Some functions have naturally long tails and are not exponential and hence have no meaningful time constant.

These considerations used to be accounted for in simplified models, notably upwelling diffusive ocean models which do exhibit the long tail and scaling of perturbation response times that are observed. Why these are not used in these studies is a mystery to me. It could be that nobody knows about them or that no oceanographers contribute to these studies, or that they are too difficult to understand or simply no longer available to the authors.

If one considers that the most significant single element in the determination of climatic response is the oceans, which I do, one might think that oceanic modelling would be a high priority in the construction of all models, from the simple to the AOGCMs. It seems to me that they are, and have historically been, the poor sisters of the modelling world. Perhaps the oceans are not as exciting as the atmosphere. I cannot comment on the AOGCMs used for ARx reports as they never seem to publish their OHC data, only their SSTs. I do know that an earlier version of a Hadley AOGCM was reanalysed using an upwelling diffusive model some years ago and the indications were that the model was burying large amounts of heat in the abyssal ocean. The need for a reanalysis was due to the model not recording oceanic data during its runs. I do not know if data recording has improved but I do note that none of the model data available at Climate Explorer includes oceanic data beyond SSTs. It might be very revealing to know what the models predict for 20th Century OHC. There is a suspicion that many or all of the models have a thermally heavy oceanic component, but it is just that, a suspicion. Until more detailed data is available there is nothing to compare to reality. There is one exception that I know of and that is that the GISS model did for a period 1993-2003 (ish) agree with the OHC data (with some further assumptions). Since then it seems doubtful that such an agreement could hold. It strikes me as bizarre that model OHC data is not better reported as it could be a decisive factor in evaluating the models’ strengths and weaknesses. Should it turn out that the models do indeed have oceanic thermal properties that cannot be reconciled with reality it would seem dubious to accept their informing of us with respect to long term temperature prospects.

Alex,
Thanks for a thoughtful and thought-provoking response.
You wrote:
“I can not comment on the AOGCMs used for ARx reports as they never seem to publish their OHC data, only their SSTs. I do know that an earlier version of a Hadley AOGCM was reanalysed using an upwelling diffusive model some years ago and the indications were that the model was burying large amounts of heat in the abyssal ocean.”
Firstly, on a factual matter, the OHC comparison which I published above…
…was not intended to be a comparison with real-world observations, only a comparison with the GISS E ensemble mean result, which should correspond to the reported GISS E temperature profile which was simultaneously matched. The noisy image makes this less than clear. I agree with you that generally OHC results are not published – which is frustrating to say the least.
On your second point, here, my first reaction on seeing the unbelievably good match to temperature was to leap to the assumption that GISS E was burying heat in the deep ocean – or losing heat to some invisible sink. However, the match to GISS E OHC seems to preclude this possibility AND to also suggest that the “0 to infinity” integral of the flux perturbation term implicit in the GCM, or, if you prefer, the total energy packet associated with an individual forcing step, has to be very close to that in the Schwartz model. This leaves limited room to manoeuvre when considering alternative response functions. I need to think further about your other points.

At the time of writing I was not aware of what was being presented on Real Climate. I have since seen it and it contradicts my view of the models having “heavy oceans”. I am surprised. I have posted on Real Climate

the first being an enquiry relating to obtaining the data (to which Gavin kindly responded) and the second which is a view on its implications in light of a Hansen… Gavin, et al paper from 2005 (details in posting) and in light of the “missing” heat, which the ensemble OHC seems to imply is not an issue. I await a reply to certain queries I included in this comment. Unless I get a response that will be as far as I go on Real Climate.

Their presentation seems to square the data with the models but begs the question of what the lower OHC uptake (I pose a figure at Real Climate) implies for the sensitivity. Normally one would infer a lowering of the short run sensitivity and a lessening of the “warming in the pipeline”. I do however suspect that this is not the whole story, as the legend on their graph is Ocean Heat Content 0-700m and one might conclude that the modelled values also refer to just this portion, and so the remainder could be anything, but the paper I cite there gives us a steer on how Model E used to perform in this respect.

That is it for now, and as usual I do not know what to make of the models, I will try and respond to your points, and hopefully soon.

Thanks, Steve.
The author, M.Crucifix, was formerly at the Met Office’s Hadley Centre, Exeter, UK. Towards the end of the paper Crucifix makes the following statements:
‘…climate sensitivity cannot be easily estimated from the Last Glacial Maximum global temperature…’
and
‘…global estimates of the LGM temperature only weakly constrain climate sensitivity for two reasons: (i) the forcing is not known accurately and (ii) the ratio between LGM and CO2 feedback factors cannot be accurately estimated from current state-of-the-art coupled models…’
I take that to be a conclusion by the author that the ice-core and other Pleistocene records as they stand provide no clear independent estimate of the ‘climate sensitivity’. That’s encouraging, anyway.
Coldish (formerly ‘hr’)

I found his comments on determining the validity of modeled climate sensitivity by other factors besides just temperature equally interesting. I have an interest in irrigation and its influence on the climate, and not just for the modern age. When mountains of glaciers are melting you end up with a lot of natural irrigation, new rivers, lakes and streams, and a completely new hydrologic system. I can’t imagine any paleoclimatology study being accurate without being able to accurately reconstruct how these influences would interact with the climate system.

I’m not sure how well this fits this discussion, but I have three major issues with using any of these techniques for this case.

The first issue is that they are being used to define whether there is or isn’t a problem. Since the possibility space for the existence of major problems is large, calculating probabilities or possibilities of each is sure to lead to false positives. Note that AGW is one of the more difficult problems to deal with. While I routinely use estimates to provide a basis for solving a problem, I have never and would never use them to define a problem due to the uncertainty in the underlying information. I’ve seen enough false positives using very well known hard statistical techniques to not venture into even more shaky territory.

Using a regrets based decision making process (which I take this analysis is an input to) is known to cause an irrational shift toward avoiding unlikely events.

Even in a regrets based decision process, the proposed process being used by the IPCC and others is fundamentally flawed. The process is essentially answering the question “what steps should be taken to counter AGW”. AGW doesn’t exist in a vacuum – the possibility space for disasters of various kinds is much larger. The question that actually needs answering is how much resources to spend in what time frame on which possibilities. Otherwise you can wind up like Australia – spending your money on the predicted AGW driven droughts while suffering a devastating flood.

Adding more and more sophistication to analysis and decision making doesn’t improve the fundamentals.

Lies, damn lies, and statistics. And the inferences aren’t derived from observation, but model outputs – models that can only do exactly what (human, hence systematically biased) programmers tell them.

Employing mathematical sleight of hand to pretend to in some way “know” what can only be accurately described as “unknown” is not helpful. To science or politics.

I was under the impression that AR4 assigned a very high probability (>90%) to the climate sensitivity being in the range of 1.5 to 6 deg/doubling. That would mean (assuming equal chance above and below that range) approx 5% chance of being below 1.5 (and not 10% as you stated).

“It is likely to be in the range 2C to 4.5C with a best estimate of about 3C and is very unlikely to be less than 1.5C. Values substantially higher than 4.5C cannot be excluded, but agreement of models with observations is not as good for those values.”

Meaning that (according to the fat tailed pdf often found) the chance for sensitivity below 1.5 is actually (much) lower than 5% according to AR4 and the <10% chance that Judith quoted isn't helpful to understanding that.

I know just enough about statistics to get into trouble but not enough to get out of it, but here goes anyway.

The IPCC gets its 2-4.5C climate sensitivity range from Table 8.2 of the AR4, which lists 19 climate model-derived equilibrium sensitivity estimates that have a mean of 3.2C and a standard deviation of 0.7C. This gives a range of 1.8C to 4.6C at the 95% confidence interval, and the IPCC rounds these numbers off to 2C and 4.5C.

However, all the IPCC models use basically the same assumptions and algorithms, so these estimates allow only for modeling uncertainty. What happens when we introduce other sources of uncertainty?

An obvious one is the assumption that CO2 continues to warm the atmosphere long after it is emitted – for periods of hundreds of years, according to some models – which is why the IPCC’s equilibrium sensitivities average 80% higher than its transient sensitivities. But other studies, notably the Schwartz study referenced above by Ron Cram, conclude that equilibrium is in fact reached in a few years, in which case the transient and equilibrium sensitivities will be the same. The bottom line seems to be that we still don’t really know whether the equilibrium or the transient sensitivity is the one we should be using.

What happens if we assume that both have an equal chance of being right? Well, we have to combine the two sets of estimates in AR4 Table 8.2, which gives us 38 estimates with a mean of 2.5C and a standard deviation of 0.9C, and 95% confidence limits of 0.7C and 4.3C. Now we have an uncertainty range of almost a factor of six.
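The two ranges quoted above can be reproduced with a short sketch, assuming the "95% confidence" limits were taken as the mean plus or minus two standard deviations of a normal distribution. That reading is an assumption on my part, and the function name is illustrative.

```python
def two_sigma_range(mean, std_dev):
    """Approximate 95% range for a normal distribution, taken here as
    mean +/- 2 standard deviations (an assumed reading of the figures)."""
    return mean - 2 * std_dev, mean + 2 * std_dev


# Equilibrium sensitivities only: mean 3.2C, standard deviation 0.7C
eq_low, eq_high = two_sigma_range(3.2, 0.7)

# Equilibrium and transient estimates pooled: mean 2.5C, standard deviation 0.9C
pooled_low, pooled_high = two_sigma_range(2.5, 0.9)

print(round(eq_low, 1), round(eq_high, 1))
print(round(pooled_low, 1), round(pooled_high, 1))
print(round(pooled_high / pooled_low, 1))  # ratio of upper to lower limit
```

Using 1.96 rather than 2 standard deviations narrows each range slightly, and the whole calculation treats the model estimates as independent draws from a normal distribution, which may not hold.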

Then we can add the uncertainties associated with the estimation of radiative forcings. Climate sensitivity is calculated by factoring temperature increases with forcings, but how accurately can we estimate the forcings? According to AR4 Figure SPM.2 the estimate of 1.6 w/sq m of total anthropogenic forcing since 1750 (natural forcings are deemed to be insignificant) could actually be anywhere between 0.6 and 2.4 w/sq m. If the forcing estimates obtained from the models are subject to the same levels of error then the range of climate sensitivity uncertainty goes out of sight (a crude estimate assuming only +/-33% uncertainty in the model radiative forcing estimates expands the 95% confidence range from 0.7-4.3C to 0.0-5.0C.)

We could add other uncertainties that would make the situation even worse, but there’s no need to. The above examples show that we can generate climate sensitivity uncertainty limits large enough to satisfy even the most rabid skeptic simply through good old-fashioned probabilistic analysis, without even worrying about plausibilities, possibilities, implausibilities and impossibilities.

One of the important observations of Schwartz is that a process that leads to equilibrium very slowly must be weak. Therefore it allows the warming to reach almost its full value rapidly. The very long delay then applies only to the small additional correction.

I was struck by the ensemble of pdfs in Figure 9.20 of IPCC AR4 that you referred us to. In my area (accounting research) there are mathematical constraints that limit the ranges of some variables, but in the absence of mathematical or logical restrictions we always find that pdfs spread outside the regions we would like them to stay in. There has been discussion on other threads of positive and negative feedbacks, making it clear that there is at least a logical possibility, however implausible, that negative feedbacks might dominate. But every pdf in Figure 9.20 goes hard to zero at or (usually) 0.5-1C above zero – this is clearly not just an artifact of the IPCC having cut its composite chart off at zero.
So either the underlying models are biased in that they cannot generate net negative feedbacks, or their results have been fitted to non-negative distributions such as the gamma. Either of those would bias the sensitivity upwards, and if everyone is doing it then this bias does not average out in the ensemble.
On a different point, there have been suggestions for getting priors from an ensemble of expert opinions. But what distinguishes an expert from a beginner is their wide experience of such similar cases. Experts do not (and sometimes cannot) describe how they reach their judgments. They appear to have built up templates of what happened in similar situations, and they retrieve the right templates based on cues (which they may not be able to identify if asked). There are thus no experts in the effect of forcings on climate – if there were, we would not need to be here. There are experts in the underlying physics, and in how particular models work; but asking them for their prior probabilities in this context is to take them outside of their expertise.

Paul – Zero probability of 0.5-1.0 deg C does not exclude negative feedbacks, since the no-feedback forcing estimate yields a value of 1.2 C in most estimations. A value of 0.5 would signify substantial negative feedback.

Fred, you write: “A value of 0.5 would signify substantial negative feedback.”

Wrong. This would only be true if there was such a thing as a no-feedback sensitivity of 1.2 C. Since the estimation of non-feedback sensitivity by the IPCC is based on wrong methodology, then the 1.2 C is meaningless.

You note that “… every pdf in Figure 9.20 goes hard to zero at or (usually) 0.5-1C above zero – this is clearly not just an artifact of the IPCC having cut its composite chart off at zero”.

Indeed it isn’t. The reason these pdfs show no negative climate sensitivities is that the estimates consider only periods when both forcings and temperatures increased (or decreased, such as in the case of the Pinatubo eruption), which makes it impossible to estimate a negative climate sensitivity. In other words, it’s an artifact of the data selection.

As to why they should all cut off at 0.5 or 1C, this could also be an artifact of data selection (periods with low positive climate sensitivities were rejected) or it could be that the analytical procedures are skewing the pdf. (In fact I got so interested in this question that I broke off writing this and carried out a spreadsheet analysis to find out what the pdf looks like when a more objectively-selected data set is used. I selected the period between 1890 and 2003, which includes intervals when radiative forcings increased but temperatures decreased, calculated climate sensitivities from the annual changes in surface temperature and radiative forcings, and plotted the pdf. It showed not only negative climate sensitivities but also a normal distribution, which leads me to believe that the pdfs in the IPCC figure are indeed artificially skewed.)

Roger – calculations based on annual changes will yield spurious results if compiled into a distribution rather than averaged out. The physics of the forcing/temperature relationship are such that much of the response emerges more prominently over multiple years than a single year. Using year to year variations is susceptible to domination by internal climate variation such as ENSO, which over the short term is likely to overwhelm external forcings, but over multiple years becomes subordinate to long term effects of persistent forcings.

I went back and applied a 1-2-2-2-1 filter to remove ENSO effects and got another normal distribution with a substantial negative population. I then applied 11-year smoothing to remove the Schwabe cycle and got the same thing. So I guess my question is; how long a period must I average the data over before I obtain a representative pdf of the “persistent forcings?”
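For anyone who wants to repeat the smoothing step, here is a minimal sketch of a 1-2-2-2-1 filter, assuming it is applied as a centered weighted moving average normalized by the sum of the weights (8) and that the two points at each end are simply dropped. Those details are assumptions, since the original spreadsheet is not shown, and the sample series is made up.

```python
def filter_12221(values):
    """Centered weighted moving average with weights 1-2-2-2-1.

    The weights are normalized by their sum (8). The first and last two
    points are dropped because the window would run off the series.
    """
    weights = [1, 2, 2, 2, 1]
    total = sum(weights)
    half = len(weights) // 2
    smoothed = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        smoothed.append(sum(w * v for w, v in zip(weights, window)) / total)
    return smoothed


# Made-up annual anomalies, for illustration only
example = [0.0, 0.1, -0.2, 0.3, 0.0, 0.2, 0.4, 0.1]
print(filter_12221(example))
```

A five-point window like this damps year-to-year noise but passes anything slower than a few years, so multidecadal ocean cycles would survive it.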

Roger – The centennial temperature trend is associated with positive forcings from about 1910 through the 1940’s, and from about 1978 to the present. During mid-century flattening, the forcings are uncertain but there is evidence from diurnal temperature variations to suggest an important role of negative aerosol forcing. I don’t see protracted intervals when forcing and temperature changed in the opposite directions, although such intervals would be common on a year to year basis. Maybe you can elaborate on what you did.

“I don’t see protracted intervals when forcing and temperature changed in the opposite directions.”

Between 1940 and 1970 the GISS global surface air temperature time series shows 0.15C of cooling but the GISS total radiative forcing estimates show an increase of 0.25 w/sq m. You can check the numbers at:

Roger – I believe you are misinterpreting the data. Mid-century forcing and its sign are not well determined, but the magnitude is thought to be small. The cooling was a dip around 1945-1950, but for much of the remaining interval, the temperature was flat with mainly interannual ups and downs. It was not a protracted multidecadal cooling trend. The 1940’s bump and dip were probably mediated mainly by internal climate variations involving ENSO, PDO, and AMO. I see no evidence of protracted positive forcing accompanied by protracted cooling. Here is the Hadcrut 3 dataset for global (land plus ocean) temperatures – Hadcrut3

Positing a true negative climate sensitivity is not supported by the data and has no plausible physical mechanism. I believe the fact that one can derive many conflicting conclusions from analyzing short term variations illustrates the problems of trying to extract decisive results about mechanism from such data. I believe it would be incorrect to conclude that the absence of negative climate sensitivity in the published analyses is the result of selective choice of data intervals. It’s based on physical reality.

“Positing a true negative climate sensitivity is not supported by the data and has no plausible physical mechanism.” I agree. The thesis that a doubling of CO2 causes a temperature decrease, all other things being equal, is of course untenable. However, I never claimed that my sensitivities were “true” sensitivities. They were calculated mathematically simply so that I could plot a pdf.

“I believe the fact that one can derive many conflicting conclusions from analyzing short term variations illustrates the problems of trying to extract decisive results about mechanism from such data.” Nine of the studies listed in IPCC AR4 Table 9.3 used data sets that were shorter than mine, and two of them (Wigley and Forster & Gregory 2006) used data sets that were only ten or twenty years long and which showed nothing but short-term variations.

“.. the absence of negative climate sensitivity in the published analyses is … based on physical reality.” This may well explain why the pdfs in the IPCC figure show no negative sensitivities – they were rejected as impossible.

Finally on the question of HadCRUT3. Speaking as someone with substantial professional experience in verifying data bases to regulatory agency standards, and having spent uncountable hours reviewing the procedures used to construct HadCRUT3, I would respectfully suggest that you base your analyses on a series that is more representative of actual 20th century surface air temperature variations.

Forcing away from an equilibrium would, all other things being equal, tend to raise the temperature but that would tend to add to the forcing; it is a runaway unless or until the sensitivity once again becomes positive.

Now there may be some evidence for this happening during ice ages. View the wiki graph:

Now if the data be reliable this has some interesting properties.
As the temperature drops (right to left) the variance or amplitude of the wiggles in the signal increases, but it also slows down (changes from 41kyr to 100kyr cycles). Some might regard this as the telltale of the approach to or occupancy of a bifurcation, which in turn may be due to the reciprocal of the sensitivity approaching zero (or passing through zero and becoming negative) during the transitions. One of the predictions of temporal chaos theory is that bifurcation approach would be compatible with the observation of increasing variance and increasing persistence (slowing down).

Now I do not know how this data is normally interpreted but if it is due to proximity to a bifurcation it would make trying to use the ice age data to try and infer current sensitivity a little dubious. As far as I am aware there is no direct evidence of a bifurcation currently lying at higher temperatures but it is a theoretical possibility if certain events occurred, most notably the triggered release of methane, lots of methane due to rising temperatures. That has to be squared against why it has not already occurred recently (current and previous interglacials) and apparently not commonly in the much more distant past (PETM event if that was due to clathrate release or CO2 productions and probably occurred at much higher temperatures). We also have the Antarctic thawing (~25myr ago) which also probably occurred during times warmer than the present and does not appear to have been associated with a PETM-like temperature spike. That is speculation based on all other things being equal, but they rarely are, but then so much speculation has similar assumptions.

I am having trouble conjuring a situation in my mind where in a stable system the climate sensitivity could be less than zero. For instance imagine, and it is just imagining as far as I know, that upon achieving a certain temperature the low cloud cover increases and the world cools. This would not make the world continue to cool as the temperature would get slightly below that threshold and would warm back to it. You would end up with stasis and a climate sensitivity of zero but not negative at least for any length of time. There have been hypotheses regarding tipping points such as the release of large amounts of fresh water into the ocean causing the Younger Dryas but this is not really a negative sensitivity but rather an adjustment of forcing to new surface conditions. This of course brings me to my major complaint about using paleoclimate data as a way to help set limits on today’s climate sensitivity. There is no reason to believe the climate sensitivity is stagnant and doesn’t change in profound ways with varying surface coverage.

I estimated sensitivities simply by dividing the temperature change by the TOA forcing change and multiplying by 3.7. This inevitably generated some “negative” sensitivities, particularly over the 1940-70 period (see my response to Fred Moolten above).

However, I don’t claim that the 1940-70 cooling is indicative of a negative climate sensitivity relative to increasing CO2. This cooling in fact had nothing to do with CO2 and probably had little or nothing to do with sulfate aerosols either. It was mostly if not entirely a result of a change from a warm to a cold PDO cycle in 1939 (with a later assist from the AMO), and I get my negative sensitivities because the forcing estimates don’t allow for the heating and cooling impacts of ocean cycles.
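To show how a "negative" value drops out of the arithmetic, here is a minimal sketch using the 1940-70 figures quoted earlier in the thread (0.15C of cooling against a 0.25 w/sq m forcing increase). It assumes sensitivity per doubling is taken as temperature change per unit forcing, scaled by 3.7 w/sq m per doubling of CO2; the function name is illustrative.

```python
FORCING_PER_DOUBLING = 3.7  # w/sq m per CO2 doubling (assumed scaling)


def apparent_sensitivity(temp_change, forcing_change):
    """Apparent sensitivity in deg C per doubling, computed from a raw
    temperature change and a raw forcing change over the same interval."""
    return FORCING_PER_DOUBLING * temp_change / forcing_change


# 1940-70 figures quoted earlier in the thread
print(apparent_sensitivity(-0.15, 0.25))
```

The sign is negative only because the numerator includes temperature changes from ocean cycles that the forcing estimate in the denominator does not account for.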

If F is the flux balance then the sensitivity is the reciprocal of ∂F/∂T not of dF/dT, you need to find the partial derivative which in general cannot be done, as you need to be sure that either you have factored out all the other things that can change F or know for certain that nothing else changed F whilst T varied.

That is the problem.

That dF/dT is sometimes negative is a happenstance due to other factors and does not inform us that ∂F/∂T is negative.

∂F/∂T is not necessarily an observable, it is a model parameter with a value that could be inferred from the data.

This is all tied up with another model parameter, the forcing, which is even more slippery.

By way of illustration, suppose that at time t=0 we have equilibrium, in that the flux balance is zero, at a temperature Ta. Then a forcing is applied, causing the system to develop, and it may approach another equilibrium state where the flux balance has returned to a value approaching zero but with a new value of the temperature, Tb.

ΔT = Tb-Ta but F ≈ 0 at the beginning and at the end hence ΔF ≈ 0.

ΔF/ΔT ≈ 0, so ΔT/ΔF ≈ ∞, which is clearly not the answer we want.

So we model F as a function of both the temperature and another factor g dependent on time but not on T.

F = F(T,g(t))

so dF = ∂F/∂T·dT + ∂F/∂g·dg

and for the experiment above

ΔF ≈ ∂F/∂T·ΔT + ∂F/∂g·Δg = 0

so ∂F/∂T·ΔT ≈ -∂F/∂g·Δg

and ∂F/∂T ≈ -(∂F/∂g·Δg)/ΔT

where -(∂F/∂g·Δg) is “considered” to be a “forcing” due to a change in some variable g (usually taken to be the logarithm of the CO2 content).

Now the scaling factor -∂F/∂g is not directly observable but can be modelled (RTE). In this case g is an observable, ln(CO2).

That is about the best I can do without it all getting horribly confusing. We are seeking to infer a value for a non-observable parameter called the sensitivity, by way of an observable, ln(CO2), and a modelled scaling factor.

Now if we could observe ∂F/∂T directly that would be fine, but so far no one has convinced us that it can be done.

Finally ∂F/∂T as an observable might not give us the correct value for ΔT.

The equation (strictly an approximation) rearranges as

ΔT ≈ -∂F/∂g·Δg/(∂F/∂T)

∂F/∂g is still a modelled value and we have a “≈” not a “=”, which reflects that the short-run or transient value of the sensitivity may differ from the long-run or equilibrium value.
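The distinction between ΔF/ΔT and ∂F/∂T can be checked with a toy calculation. The sketch below is only an illustration under assumed values: it uses a one-box model with F = λ·T − Q, where λ = 1.2 W/m^2/K, the step forcing Q = 3.7 W/m^2 and the heat capacity of 8 W·yr/m^2/K are all hypothetical choices, not numbers taken from the discussion above.

```python
# Toy one-box model (all parameter values are illustrative assumptions).
# Flux balance: F = lam * T - forcing, so dF/dT at fixed forcing equals lam.
# The temperature anomaly T relaxes according to C * dT/dt = -F.

def relax_to_equilibrium(lam=1.2, forcing=3.7, heat_capacity=8.0,
                         years=200.0, dt=0.01):
    """Euler-integrate the box model from T = 0; return (T, F) at the end."""
    temp = 0.0
    for _ in range(int(years / dt)):
        flux = lam * temp - forcing
        temp -= flux / heat_capacity * dt
    return temp, lam * temp - forcing


LAM, FORCING = 1.2, 3.7
delta_t, final_flux = relax_to_equilibrium(LAM, FORCING)

# Before the forcing is applied the system is in balance, so F starts at 0.
delta_f = final_flux - 0.0

total_ratio = delta_f / delta_t        # the "dF/dT" seen between the two equilibria
partial_estimate = FORCING / delta_t   # -(dF/dg * dg) / dT, the inferred partial

print(f"Delta F / Delta T between equilibria: {total_ratio:.3g}")
print(f"forcing / Delta T:                    {partial_estimate:.3f}")
print(f"lam used in the model:                {LAM:.3f}")
```

In this toy setting the ratio of total changes collapses towards zero while forcing/ΔT recovers the λ that was put in, which is the point made by the derivation. It says nothing about whether the real forcing can be modelled that cleanly.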

“If F is the flux balance then the sensitivity is the reciprocal of ∂F/∂T not of dF/dT, you need to find the partial derivative which in general cannot be done, as you need to be sure that either you have factored out all the other things that can change F or know for certain that nothing else changed F whilst T varied.”

I think we are all in agreement that apparent “negative sensitivities” are in fact not due to forcing/temperature relationships but to those “other things” you mention – particularly internal climate variations. However, that being the case, I would argue that it makes sense to exclude sub-zero priors from pdf estimates. One could argue, I suppose, that the same thing applies at the upper end, but we already know that the assignment of upper end priors is problematic, and it’s unlikely that moving it a few degrees one way or the other will make a huge difference. In fact, Annan and Hargreaves above assert that the Cauchy distribution minimizes the effect of large changes in the upper end, although I don’t feel qualified to evaluate the choice of this or other modalities.

I think there is general agreement that negative climate sensitivities can be attributed to “internal” effects such as cold PDO, AMO cycles etc. , and that these effects are not accounted for in TOA forcing estimates.

However, if we are to discard these negative estimates we must also adjust the positive estimates for the warming caused by warm cycles of the PDO, AMO etc. You claim that this would not make a huge difference, but actually it would.

Here is an example. Between 1910 and 1970 global surface temperatures increased by about 0.2C and radiative forcing increased by 0.4 w/sq m. This equates to a climate sensitivity of 1.9C, in line with the IPCC’s transient estimates. But between 1940 and 1970 the PDO, AMO etc. switched to cooling cycles and temperatures fell while forcings rose, giving us negative sensitivities. So we discard these data and analyze only the period between 1910 and 1940. Now we get a climate sensitivity of 8C (delta T = 0.35, delta F = 0.16.) This inflated estimate is largely if not entirely a result of ocean-cycle warming, maybe with an assist from the sun, but it still gets counted as a response to increasing CO2.
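The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch, assuming the apparent sensitivity is computed as 3.7 × ΔT / ΔF; the function name is hypothetical, and the 1940-70 figures are not quoted above but obtained by subtracting the 1910-40 changes from the 1910-70 changes.

```python
# Apparent sensitivity per CO2 doubling, assuming S = 3.7 * dT / dF.
F_2XCO2 = 3.7  # W/m^2, forcing for a doubling of CO2


def apparent_sensitivity(delta_t, delta_f):
    """Temperature change (C) scaled to a doubling-sized forcing (W/m^2)."""
    return F_2XCO2 * delta_t / delta_f


s_1910_1970 = apparent_sensitivity(0.2, 0.4)
s_1910_1940 = apparent_sensitivity(0.35, 0.16)

# Assumption: 1940-70 changes implied by subtraction, not stated in the text.
s_1940_1970 = apparent_sensitivity(0.2 - 0.35, 0.4 - 0.16)

print(f"1910-1970: {s_1910_1970:.2f} C")
print(f"1910-1940: {s_1910_1940:.2f} C")
print(f"1940-1970: {s_1940_1970:.2f} C (negative: cooling while forcing rose)")
```

The first two values round to the 1.9C and 8C quoted above, and the implied 1940-70 value comes out negative.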

Roger – I can’t confirm your statement that the AMO switched to cooling between 1940 and 1970 – AMO . Also, I believe the “assist from the sun” prior to the 1940’s was a substantial contributor to early twentieth century warming – Solar Influences on Climate

I do agree that internal climate variations can amplify or diminish computed climate sensitivity, but here we are talking about the role of priors and not the final estimates. Do you believe that if, using the type of analysis described in the Annan/Hargreaves paper, a uniform prior were extended at its upper end by say 0.4 deg C, that would significantly change the ultimate sensitivity estimate? The examples they cite make that seem unlikely.

I noticed that you calculate a (spurious) value of 8 C, which is already within the 0-10 uniform prior range in some of the Annan/Hargreaves examples. However, if the range had been 0 to 6 C (no-one would make the priors smaller than that), the 8 C value would require adding 2 C, not 0.4, but I still don’t think it would make much difference. For a 10 C upper limit, adding 2 C should matter even less. If you have data over longer intervals showing spurious values much greater than 8 C, that would change things, but I suspect that as the intervals lengthen, the errors will diminish.

I’ll look for it. In rereading my own comments, it occurs to me that I should have emphasized changing upper bound priors in the downward rather than upward direction, if we are confining lower bound priors to be above zero rather than lower – we should shrink the range from both ends. Quantitatively, I would still suggest that relatively small changes in priors will have only minimal influence on the final sensitivity estimates.

I would just note that all this would be mathematical nonsense if T meant what I think it means – namely a spatial average of temperatures Ts.
In this case Ts is not a function but a functional depending on the true temperature field T(x,y,z,t) and ∂F/∂Ts doesn’t exist – you can’t differentiate a function (F) with respect to a functional (Ts).
∂Ts/∂F would be a functional derivative of Ts which is useless in this case.
dF/dTs is even worse because dTs isn’t uniquely defined at all.

It happens much too often that people don’t even know what they may and may not do with functionals.
The whole concept of sensitivity is ill defined, and for a given variation of F there will be an infinity of different variations of the functional Ts depending not only on the path followed by the system but also on time.

Believing that those different variations of the functional Ts follow some invariant probability distribution independent of time has no rational foundation.
At least I have never read a beginning of a mathematical justification of such a belief.
One would have to prove a kind of least action theorem for the climate system, which has not been proven and would almost certainly be wrong.

“I would just note that all this would be mathematical nonsense if T meant what I think it means – namely a spatial average of temperatures Ts.”

This is indeed what T means – most climate sensitivity estimates are based on temperatures averaged over large areas (global, 60N-60S, the Northern Hemisphere, the Tropics etc). But because these averages typically lump together some very different regional trends, what we effectively get is a blend of apples and oranges. Then we compound the problem by mixing air temperatures with SSTs that don’t track the air temperatures (air temperatures show 0.3C of warming since 2001 but SSTs show none), giving us a blend of apples, oranges and pumpkins.

How does this blending impact things in practice? Well, when we combine 20th century global forcings with air temperature changes in smaller areas we get the climate sensitivities shown below (note: these estimates are intended only to show the range of variation. I don’t claim they’re “real”):

Roger – I would argue that temperatures aren’t averaged, and consequently average temperature is not used to compute global delta T. Rather, what are averaged are grid-based temperature anomalies, so that it is change (delta T) in each region from one year to the next that is used for averaging rather than global average temperatures. While precisely how to weight these anomaly means is subject to disagreement, they have value as long as the weighting function is not unreasonable and is consistent from one year to the next.

Well yes, we average temperature anomalies, not temperatures. But that’s what I did. I identified the “very different” regional trends by averaging temperature anomalies, and I used area-weighted averages of temperature anomalies for the larger areas in the examples.

Let me try to restate the problem in a different way. When we construct a global temperature time series we are making the implicit assumption that regional variations will cancel out when averaged, giving a globally-representative result. And because TOA forcing changes have been substantially the same everywhere the corollary assumption is that temperature changes will have been substantially the same everywhere too. But they weren’t. The pre-1970 warming and cooling periods that appear in the “global” record were actually confined almost entirely to high northern latitudes, and the post-1970 warming has been strongly asymmetrical, decreasing monotonically from over 1C in the Arctic to about 0.5C at the Equator and to around zero in the Antarctic. In other words, the “global” temperature record before 1970 isn’t global at all, and since 1970 it is broadly representative of temperature changes at and around the Equator but nowhere else.

So should we even be using the global surface temperature record as the metric for evaluating climate sensitivity? Well, we can argue the point both ways, but to me it’s like basing an analysis on the study which showed that the average adult US male is 5’7″ tall, weighs 157 pounds and has one breast and one testicle.

I don’t think my disagreement is with you. Rather, I have problems with Tomas’s claim that we are trying to do something mathematically impermissible – achieve a spatially averaged surface temperature. That is not what is happening when one calculates anomalies and then averages them.

Although I believe global temperature anomalies have proved informative, I agree with you that they should be complemented with regional data. Over the long haul, we are beginning to gain some understanding of how the two are interrelated.

Yes, I think the polling only works when the experts are “independent variables”. When groupthink has been hard at work for decades, you get nothing except a read on the current “consensus”. Consider, then, the Iron Lady’s description of “consensus”:

The process of abandoning all beliefs, principles, values, and policies in search of something in which no one believes, but to which no one objects; the process of avoiding the very issues that have to be solved, merely because you cannot get agreement on the way ahead. What great cause would have been fought and won under the banner: ‘I stand for consensus’?

“In summary, given the large uncertainties, I am unconvinced by Annan and Hargreaves’s analysis in terms of providing limits to the range of expected climate sensitivity values. Expert judgment and Bayesian analysis are not up to this particular task, IMO.”

There are two issues here.

First the specific question of what were A&H setting out to do, and how well did they do it?

Second how well can the uncertainties in climate sensitivity be represented by classic probabilistic statistics?

On the first, my reading is that all A&H were seeking to show was that if you wish to use Bayesian techniques to achieve posterior probabilities for climate sensitivity then using uniform priors as per IPCC Fig 9.20 leads to results that are very sensitive to the upper bound. They also criticise uniform priors as not being something anyone would actually believe.

This obviously calls the use of this prior into question.

They also make the point that Bayesian techniques do depend on the prior being “ignorant” of the observations, i.e. you need to watch out that you don’t use information that is in both the prior and the observations, otherwise you’ll obviously bias the results. On this basis they make the point that when you use outputs from climate models to generate the “observations” (apart from the obvious objections of Stainforth et al about not knowing the uncertainty) you are very unlikely to know what the overlap is with the prior.

Just what went into the pot to give the nice hot soup?

So overall I think they succeed in building a strong case for IPCC to cease using uniform priors in reporting climate sensitivities, and have shown that if IPCC wishes to use priors in reporting these, the best estimates reduce.

Turning now to the second question of whether using quantified probabilities is appropriate here, I’d note (if I have this correct) that the use of the word “ignorance” above is not the same as in Dr Curry’s reference later:

“The issue that I have with this is that the level of ignorance is sufficiently large that probabilities determined from Bayesian analysis are not justified. Bayesian reasoning does not deal well with ignorance. The fact that others have created pdfs from sensitivity estimates and that economists uses these pdfs is not a justification; rather, climate researchers and statisticians need to take a close look at this to see whether this line of reasoning is flawed.” my emphasis.

Here we are not talking about the prior being ignorant of the observations (a highly desirable thing in Bayesian terms). We are, as I understand it, starting to argue that we simply lack enough information to deal with probabilities in quantitative terms. This doesn’t just raise issues for measuring and reporting climate sensitivity, it extends on into any decision theory that may depend upon it.

I had a bit of a poke around here – specifically looking at Walley’s criticism of Bayesian inferences and of course found a group focused on this issue – The Society for Imprecise Probability: Theories and Applications (www.sipta.org). Some members of the climate science community seem to be members.

Stephen – If the fit of Hansen’s model to observed warming trends were the definitive criterion for climate sensitivity, it would exclude both very high and very low climate sensitivities, and would set the actual value at 3.3 deg C per CO2 doubling. In fact, other results yield different values, and it is the spread of values that leads to probability estimates of the type addressed in this thread.

One cannot simultaneously claim that the period 1945 to 1980 was artificially cooled by aerosols and then calculate a trend from that artificially cooled point as if it were purely CO2 warming as opposed to aerosol-reduced recovery. To be consistent you need to go further back, ie to the period before the cooling. That would give you half the trend.

If however you claim that this aerosol cooling was not really there, as you have to for the realclimate exercise (and perforce Gavin’s previous arguments about that were wrong too), then there is considerably more natural variation than anyone assumed and hence who is to say that we are not in another temperature blip as per the 40’s.

As you stated above, we need another 10 years to judge. If it continues to flatline (noting that gisstemp is a suspicious outlier in this regard) then that 3.3 gets less and less. If you use the better performing forecasting techniques available of course you would put rather more weight on recent data.

Here’s another scenario: the real trend is for the entire century and is 0.6/century and is a natural recovery from the little ice age. And if you don’t accept that as a possibility, then please tell us what caused the little ice age in the first place.

First on the AMO. You are quite correct. It didn’t switch to cooling until well after 1940. However, I believe I did mention in a previous post that the “assist” from the AMO was delayed.

Second on the influence of the sun. Yes, my forcings up until 1940 do include the solar contribution, which in fact turns out to be most of it. However, there was an equal increase in solar forcing between 1940 and 1970, when temperature trends reversed. My estimates also come from GISS, not the IPCC. The IPCC estimates a total solar forcing increase of only 0.12 w/sq m since 1750.

As to the role of priors, I think the problem is at the low end, not the high end. None of the Annan/Hargreaves priors go below zero, and while this may be physically realistic it does not allow for the fact that the observational data generate negative sensitivities, mostly because of ocean cycle warming and cooling effects that the radiative forcing estimates do not take into account. So when we truncate the priors at zero we are cutting off the tail of the data distribution at the left end but leaving the right tail in.

The zero prior also seems to have the effect of causing the posterior pdfs to go rapidly to zero just left of the mode. All three of the pdfs in Annan/Hargreaves Figure 2 in fact go to zero around 1C. But based on what we know a 1C climate sensitivity is far from impossible. In fact, if we assume that CO2 warming causes no significant positive feedbacks – and the jury is still out on this – then 1C or thereabouts is the number we would expect to get.

Anyway, based on these results I am now going to withdraw my earlier and ill-considered statement that climate sensitivity estimates are biased by subjective data selection (which now that I think about it was untenable anyway because most of the studies use the same +/- 100 year instrumental record). Instead I am going to posit that they are biased by priors that respect the theory but not the observational data.

Which brings up a question from me to you. What do you think the posterior pdfs would look like if the prior was lowered from zero to, say, minus 5?

Roger – Thanks for the comment. Regarding your last question, I find it hard to answer without specifying an upper bound. From memory, I believe that some rather extreme assessments of the data have yielded ranges of climate sensitivity values up to 10 C. If both the upper and lower bounds of the priors are extended 5 C beyond what is generally thought within the realm of physical plausibility, I believe the main result will be to increase the posterior probabilities in both directions with probably little change in the modal value. Would this increase in uncertainty move us in the direction of a more realistic understanding? I don’t think so, but that’s a matter of judgment. There is already much complaining that currently utilized upper bounds are too high and lead to unrealistically high probabilities of sensitivities far greater than the canonical 4.5 C limit. We should probably strive for a Goldilocks solution – not too high, not too low, but just right.

Finally, it’s slightly off-topic, because we are discussing Charney type sensitivities for climate responses that are discernible over the course of perhaps a few centuries at most, but Jim Hansen has argued that when longer term responses are included (e.g., disappearance of land-based ice sheets), a reasonable modal value is 6 C per doubling, and an upper limit is considerably higher.

I am curious about your model inputs. Solar increase would be quite low, ~0.1 W/m^2. The volcanic lull would be? (I think it is overestimated in most models since it persisted to circa 1960.) Oscillation shift appears to be the main driver, but it is difficult to model. While the AMO went negative in the mid sixties, its decrease started earlier. So I would think that “negative” should not apply to climate oscillations, rather min and max, as far as inclusion in the models. Negative feedback should be limited to items that directly reduce incoming radiation.

This may make little sense, but the climate shifts just move energy around. The impact on climate is a result of the conditions the energy encounters. That is why all La Niñas are not created equal.

Volcanic eruptions caused some large short-term negative forcings (according to GISS, that is – I think they were actually much smaller) but had no overall effect.

The impacts of ocean oscillations aren’t allowed for because these are “internal” heat sources. The GISS estimates only take TOA forcing changes into account.

We could get rid of the negative forcings by subtracting the impacts of ocean cycles etc. from the temperature record, leaving only the anthropogenic component. But to do this we would need to know exactly how the earth’s climate works, and if we knew this we wouldn’t need to do it.

I was thinking about how to include what little we do know about atmospheric and ocean cycles into the models. All of the cycles are quasi-cyclic it seems. So using a scaled absolute value of the best guess of the cycles would make more sense to me than trying to include them as variable positive and negative forcing.

I was thinking that the ~1913 to 1940 period could provide some information on what factor would be reasonable, if more accurate solar and aerosol forcing could be teased out from that period.

My little thought seems to have stirred up a small teapot sized tempest over at RC.

“I was thinking about how to include what little we do know about atmospheric and ocean cycles into the models”.

Strange you should ask. I did a modeling exercise a few years ago by tabulating different forcings – CO2 concentrations, sulfur emissions, total solar irradiance and the PDO and AMO – giving each a weighting and playing with the weightings until I got a best fit to the 20th century global temperature record. (This is officially known as a “phenomenological” model, but I find “experimental” a lot easier to say.)

What did I get? Well, I got very close matches between the “model” and smoothed observations (R around 0.98). I also got substantially the same results regardless of which solar reconstruction I used (there are many). About 40% of the warming was always natural and about 60% anthropogenic. But almost all of the natural warming, including the mid-20th century hump, was caused by the sun. The PDO and AMO contributed some minor bumps and wiggles but not much else.

Then I thought, comparisons involving only the global record don’t allow for the fact that temperature trends in some areas are quite different to the global trend, which as I noted in my earlier exchanges with Fred Moolten is a potential source of uncertainty in estimating climate sensitivity. So I segregated the global temperature series into 30-degree latitude-band series (60-90N, 30-60N, 0-30N, 0-30S, 30-60S, no data below 60 S), to see if I could fit them by varying ocean cycle weightings (I added the NOI/SOI, NAO and NAM) while keeping the anthropogenic and solar forcings constant.

Much to my surprise I found I could get reasonably good fits at all latitudes using a constant weighting of 0.3C/watt for the CO2 and solar forcings and by varying the ocean cycle weightings. But now the ocean cycles contributed three times as much to total 20th century global warming as did the sun (0.18 vs. 0.06C). Moreover, when combined the ocean cycles showed a strong 25-30 year peak-to-trough cyclicity in the N. Hemisphere, with peak-to-trough amplitudes decreasing systematically from north to south (60-90N = 1C, 30-60N = 0.5C, 0-30N = near zero).

I don’t know whether these results are totally realistic, but they certainly look plausible, and they identify ocean cycles as the main source of regional temperature variations. They also allow us to quantify and remove the impacts of these cycles, leaving us with a purely “anthropogenic” series that we can use to calculate climate sensitivity. In fact the 0.29 factor I used to weight the CO2 forcings already does this. It gives a sensitivity of 1.1C (0.29 × 3.7 ≈ 1.1).

(Fred Moolten, if you are there. I should have mentioned in our earlier discussions that I had done this work. However, I had a senior moment and clean forgot that I had done it.)

I notice in my post above that I identified ocean cycles as the main source of regional temperature variations. (Yes, in this case I am assuming that correlation does equal causation). However, this doesn’t work with the AMO, which tracks the large cyclic air temperature changes in the Arctic very closely but lags them by 5-10 years.

Roger – You may also be interested in evidence (admittedly tentative) that the PDO is driven in part by external forcings, including those arising from CO2 increases. The first link is to a J. Climate abstract (the full article is behind a paywall). The second is to a recorded presentation of the same material by Gerald Meehl –

Thank you. It’s a pity the evidence you refer to is behind a paywall. Since my taxes helped finance the research it irks me exceedingly to have to fork out yet more money to see the results. However, that’s O/T.

As to whether the PDO is driven by external forcings, the problem is that there is no long-term relationship between TOA forcing estimates and the PDO. The rapid increase in the PDO index between about 1975 and 1985 did coincide with increased TOA forcings, but after 1985 the PDO index began to fall while TOA forcings continued to rise. The abrupt decrease in the PDO between 1940 and 1950 – which is a mirror-image of the 1975-85 increase – is so far as I know inexplicable in terms of external forcing.

Check out the recorded presentation by Meehl. He doesn’t suggest that the PDO per se is externally forced. Rather, he suggests that the PDO represents the imposition of forced variability on natural variability of a stochastic nature, with the balance varying from one interval to another.

Annan & Hargreaves reference Forster & Gregory with respect to the latter’s determination of the feedback parameter, which they give as

L = 2.3 ± 1.4 W/m^2/K which is gaussian with σ = 0.7 W/m^2/K

F&G give the 95% confidence interval as [0.9,3.7] as one might expect, and derive a confidence interval for the sensitivity S based on the 2×CO2 forcing being 3.7W/m^2 giving the 95% confidence interval for S as [3.7/3.7,3.7/0.9] = [1.0,4.1]

so P(S<=4.1) ~ 97.5% which equates to P(S<=4.0) ~95%.

Now A&H go to a lot of trouble to find priors to combine with the F&G distribution to show that they can justify P(S<=4.0) ~95% which is F&G's result without priors. I am puzzled by this.

Next there are the pdfs and cdfs in A&H's diagrams.

Now these don't appear to resemble what I would expect to see when working with the inverse of the F&G distribution.

A&H give the likelihood function:

f(O|L) varies as Exp[−(2.3−L)^2/(2×0.7^2)]

which I like, but they seem to invert this to:

f(O|S) varies as Exp[−(2.3−3.7/S)^2/(2×0.7^2)]

which I don't like, but it fits the shape in their diagrams. This is the result of a simple substitution of 3.7/S for L, which is not, I think, the same as inverting the distribution.

Inverting the pdf with respect to L to give the pdf with respect to S should I think give:

1/Sqrt[2π×0.7^2]×Exp[−(2.3−3.7/S)^2/(2×0.7^2)] × (3.7/S^2)

where the last term (3.7/S^2) reflects the scaling of the density due to (dL/dS)

"A Gaussian likelihood in feedback space has the inconvenient property that f(O|L = 0) is strictly greater than zero, and so for all large S, f(O|S) is bounded below by a constant.

The additional term (3.7/S^2) ensures that f(O|S) is not bounded below and hence I don't think the following statement in A&H is true:

"Therefore, the integral of this function (with respect to S) is unbounded."

In fact being bounded below doesn't make much sense as it implies that the likelihood of S = 20 is more or less the same as the likelihood that S = 200 or 2000.

When the pdf is given as:

1/Sqrt[2π×0.7^2]×Exp[−(2.3−3.7/S)^2/(2×0.7^2)] × (3.7/S^2)

it integrates to unity as one would hope a pdf would and hence the integral of Exp[−(2.3−3.7/S)^2/(2×0.7^2)] × (3.7/S^2) is bounded.
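The change-of-variables claim can be checked numerically. The sketch below is only an illustration and assumes, as the argument above does, that the F&G Gaussian is read as a probability density for L; whether a likelihood should be transformed this way is the point in question. The function names and the integration range of 0.5 to 50 are hypothetical choices.

```python
import math

F2X = 3.7             # W/m^2 for doubled CO2
MU, SIGMA = 2.3, 0.7  # F&G feedback estimate, W/m^2/K


def likelihood_by_substitution(s):
    """Gaussian in L with 3.7/S substituted for L (no Jacobian)."""
    return math.exp(-(MU - F2X / s) ** 2 / (2 * SIGMA ** 2))


def density_in_s(s):
    """Gaussian density for L re-expressed in S, including |dL/dS| = 3.7/S^2."""
    norm = 1.0 / math.sqrt(2 * math.pi * SIGMA ** 2)
    return norm * likelihood_by_substitution(s) * F2X / s ** 2


def gaussian_cdf_in_l(l_value):
    """P(L <= l_value) under the same Gaussian."""
    return 0.5 * (1.0 + math.erf((l_value - MU) / (SIGMA * math.sqrt(2))))


def midpoint_integral(func, lower, upper, steps=100_000):
    """Plain midpoint rule, adequate for this smooth integrand."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


S_LOW, S_HIGH = 0.5, 50.0
numeric = midpoint_integral(density_in_s, S_LOW, S_HIGH)
exact = gaussian_cdf_in_l(F2X / S_LOW) - gaussian_cdf_in_l(F2X / S_HIGH)
floor = math.exp(-MU ** 2 / (2 * SIGMA ** 2))  # limit of the substituted form

print(f"integral of density over [{S_LOW}, {S_HIGH}]: {numeric:.6f}")
print(f"same probability computed in L:          {exact:.6f}")
print(f"substituted form at S=2000: {likelihood_by_substitution(2000.0):.5f}")
print(f"its large-S floor:          {floor:.5f}")
print(f"density in S at S=2000:     {density_in_s(2000.0):.3g}")
```

With the 3.7/S^2 factor the integral over S matches the probability computed directly in L, while the substituted form levels off at a nonzero constant for large S. Over the whole range S > 0 the density integrates to P(L > 0), which is just short of one for these parameters.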

From this point on the paper doesn't make any sense to me.

It would seem odd for this to be a mistake in the paper and for it to go unnoticed, but I cannot see it any other way. In a way I would rather be the one that has erred. So would anyone who can, please have a look at the paper. I have had to surmise some of this as the actual form of the function f(O|S) is not explicitly stated, which is a pity.

I have reproduced A&H’s diagrams and they do indicate that the likelihood function they used is as I indicated above.

I have had some time to think about this and the problem is not just in how they have calculated the likelihood function but in how they have interpreted Bayesian inference.

My understanding of Bayesian inference is as a scheme for updating model parameters in the light of new evidence. So the first question is: what is the model parameter we are seeking to update?

We can identify it from the pdf:

1/Sqrt[2π×0.7^2]×Exp[−(2.3−3.7/S)^2/(2×0.7^2)] × (3.7/S^2)

as being the 2.3 term which I shall call µ. F&G gave the estimate µ=2.3 (and also σ=0.7).

rewriting gives:

1/Sqrt[2π×σ^2]×Exp[−(µ−3.7/S)^2/(2×σ^2)] × (3.7/S^2)

Now we need to update µ (incidentally, the likelihood function for µ is still Gaussian; inverting the distribution did not change this). Importantly, the prior distribution needs to be of the form P(µ|α), a function of µ, not of S. So the information we need from the priors is not a distribution for S, which A&H have used, but a prior distribution for µ.

Perhaps the easiest way to achieve this and still use the priors that they have chosen would be to invert their priors from the S domain (ºK) to the L domain (W/m^2/ºK), where they would be distributions for µ (you should notice that µ still has the units W/m^2/ºK; inverting did not change this). Then perform a Bayesian inference update using the F&G distribution and invert back to the S domain.

I don’t dispute that the Bayesian inference update step can come down to forming the product of distributions, but I do say that they must be distributions for the parameter that is to be updated which in the A&H case I would argue they are not.

Now when I started to comment on this paper above I could see that there was something wrong with it, but I thought that it must be a small matter. I now consider that I was more misled by the paper (because it presumably passed review) than I should have been, which is my fault. If I am now right, it is a major issue, in that the paper is flawed to its conceptual core.

Now either I am getting a whole lot of things horribly wrong or the authors and presumably a gamut of other reviewers and readers are. Which is it to be? I cannot know that. Whichever is the case it will be a learning exercise, for one or many. As a consensus decision I would lose hands down.

Alexander,
Annan and Hargreaves use in their analysis a Gaussian empirical distribution for L. From this it follows that the experiment gives a finite probability for the case that L < 0 and S is not finite (climate is not stable, but there is a real tipping point). Depending on the form of the cost function this may lead to a very strong dependence of the costs on the prior. With the uniform prior the costs may well increase without limit with an increasing upper limit.

Thus their argumentation is to some extent true, but you seem to be right on the error that they have made and on the numerical results. With cumulative distributions switching from L to S can be done directly. Thus the upper limit of S = 10 corresponds to L = 0.37 and L has a value lower than this with a probability of 0.29% while the upper limit of 20 for S is exceeded with a probability of 0.13%. These values are in serious contradiction with their Table 1.
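The direct switch from L to S with cumulative distributions can be sketched as follows. This is only an illustration: it assumes the Gaussian for L with mean 2.3 and standard deviation 0.7 W/m^2/K quoted earlier, a forcing of 3.7 W/m^2, and it counts the L < 0 cases (no finite S) as exceeding any limit. The function name is hypothetical.

```python
from statistics import NormalDist

F2X = 3.7                               # W/m^2 for doubled CO2
feedback = NormalDist(mu=2.3, sigma=0.7)  # assumed Gaussian for L, W/m^2/K


def prob_sensitivity_exceeds(s_limit):
    """P(L < 3.7 / s_limit); L < 0 (no finite S) is counted as exceeding."""
    return feedback.cdf(F2X / s_limit)


for s_limit in (10.0, 20.0):
    l_limit = F2X / s_limit
    percent = 100 * prob_sensitivity_exceeds(s_limit)
    print(f"S > {s_limit:g}  <=>  L < {l_limit:.3f}: {percent:.2f}%")

print(f"P(L < 0): {100 * feedback.cdf(0.0):.2f}%")
```

Under these assumptions the two exceedance probabilities come out at roughly 0.29% and 0.13%, matching the figures quoted above.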

I read your postings with interest, but I’m afraid I can’t follow the mathematics all that well. This being the case, could one or both of you please answer some simplistic questions for me?

1. If there is indeed a flaw in the Annan & Hargreaves methodology, what would the practical impacts be? (means wrong, distributions wrong, both wrong, whatever).

2. Methodological flaws aside, do you think the climate sensitivity pdfs shown in the IPCC figure referenced at the beginning of this thread are realistic? They show highly skewed distributions and no values below zero, presumably because the prior doesn’t go below zero. Yet pdfs I constructed from the historic temperature data that were used to construct most of them show a normal distribution and up to 40% negative S values (caused by periods of natural cooling that occurred while CO2 forcings continued to increase). Does the zero prior automatically remove these negative values from consideration?

3. The pdfs on the IPCC figure all go rapidly to zero around 0.5 or 1C, and on A&H’s figure 2 they go to zero at 1C. Is this an artifact of the zero prior or is it something else? (I note here that there are some prominent climate scientists who think that 1C or thereabouts is the number we should get.)

Roger,
The problem, as I see it, is caused by the fact that we are considering two variables: the climate sensitivity S and its inverse L = F/S = (1-feedback), where F = 3.7 is the forcing of doubling CO2. Many papers have stated that they can better estimate the feedback (or L) and propose that its empirical estimate has an essentially symmetric Gaussian error distribution.

When the distribution of L is symmetric, the distribution of S has a long upper tail from the small values of L. This is presented in Fig. 2 of Knutti and Hegerl. The symmetric distribution of L typically extends all the way to zero, where S goes to infinity. Thus the empirical data allow, with finite probability, an unstable climate. With the distribution used in A&H this probability is small (0.05%), but still non-zero.

The long upper tail is proportional to 1/S^2 as Alexander explained. The probabilities of large values of S are very much smaller than in A&H when the correct distribution is used. It is not a minor correction, but negates the whole quantitative result that they have obtained. Taking into account the small probability (0.29%) of exceeding S = 10 has practically no influence on the results. The posterior value for the 95% certainty is 3.22 for the upper limit of 10, 20 or any larger value. The difference comes only in the third decimal. Thus their example would lead to the opposite conclusion, that the prior has no influence, instead of their conclusion that it has a large influence.

This is, however, not the whole truth, because their calculation of damage costs is based on the DICE model of Nordhaus, which leads to much smaller values than some other approaches. If one adopted some of the other estimates (e.g. Stern), the long term costs might still be very strongly influenced by the prior. The influence of the prior would also be much larger if the experimental data did not lead to such a low likelihood for L < 0. Thus the qualitative arguments of A&H make sense, while their example is seriously in error as far as I can see.

When the pdf of L is Gaussian, the pdf of S has a high tail that decreases proportionally to 1/S^2 at high values of S where the exponential part of the correct formula given by Alexander Harvey is constant. This makes the integral of the distribution finite (and exactly one as any pdf should have). Even so, the tail is long and fat compared to a Gaussian distribution. There are additional problems related to the negative values of L, but their share is very small in this case.

The paper of A&H appears to have left out the factor 1/S^2. This leads to a distribution that approaches asymptotically for large S a non-zero constant value. Thus this distribution does not have a finite integral of one as every proper pdf must have, but grows to infinity when the upper limit of the prior is increased. This is clearly a very serious error that leads also to totally wrong results.
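The role of the 1/S^2 factor discussed here can be sketched numerically. The snippet assumes the Gaussian for L quoted in the thread (mean 2.3, standard deviation 0.7); the function names are illustrative only. It compares the density of S obtained by the change of variables with the same Gaussian simply evaluated at L = 3.7/S, which on its own is not a normalisable density in S.

```python
import math

F_2X, L_MEAN, L_SD = 3.7, 2.3, 0.7  # values quoted in the thread

def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def pdf_s(s):
    """Density of S = 3.7 / L for Gaussian L: Gaussian in L times |dL/dS| = 3.7 / S^2."""
    return gaussian_pdf(F_2X / s, L_MEAN, L_SD) * F_2X / s ** 2

def gaussian_at_l_of_s(s):
    """The same Gaussian evaluated at L = 3.7 / S, without the Jacobian factor."""
    return gaussian_pdf(F_2X / s, L_MEAN, L_SD)

# With the Jacobian, doubling S at large S cuts the density to about a quarter (1/S^2 tail).
tail_decay = pdf_s(2000.0) / pdf_s(1000.0)
# Without it, the curve flattens out at a non-zero constant.
plateau_ratio = gaussian_at_l_of_s(2000.0) / gaussian_at_l_of_s(1000.0)

print(tail_decay, plateau_ratio)
```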

I waited a bit before replying to see if anyone responded to your statement that the A&H paper gives “totally wrong results”. I’m rather surprised that no one did. I guess they must all have moved on. It’s certainly hard to see how this issue could be considered unimportant.

Roger,
Alexander explains below, how he resolved the issue. It took me one night waking up at 5 in the morning to understand the point.

The uniform prior in range [0, U] in climate sensitivity S means that we have for L a strongly peaked prior

f(L) = 3.7/U / L^2 for L > 3.7/U
f(L) = 0 for L < 3.7/U.

The strong peaking of the prior at the lower limit makes my calculations invalid and gives evidently the results of the A&H paper.
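The form of this prior can be checked by simulation. A minimal sketch, assuming U = 10 purely for illustration: draw S uniformly on (0, U], convert to L = 3.7/S, and compare the empirical distribution with the cumulative form of the formula above, P(L <= l) = 1 - (3.7/U)/l for l > 3.7/U.

```python
import random

F_2X = 3.7   # forcing for doubled CO2 (W/m^2), as used in the thread
U = 10.0     # upper limit of the uniform prior on S (illustrative choice)

def prior_cdf_l(l, upper=U):
    """P(L <= l) implied by a uniform prior for S on [0, upper]."""
    l_min = F_2X / upper
    return 0.0 if l < l_min else 1.0 - l_min / l

rng = random.Random(0)  # fixed seed so the check is repeatable
# 1 - random() lies in (0, 1], which avoids dividing by S = 0.
samples_l = [F_2X / (U * (1.0 - rng.random())) for _ in range(100_000)]

l_test = 1.0
empirical = sum(l <= l_test for l in samples_l) / len(samples_l)
analytic = prior_cdf_l(l_test)

print(empirical, analytic)
```

No sample falls below L = 3.7/U, and the mass piles up just above that limit, which is the strong peaking described above.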

The next question is whether it is correct to exclude priors with such a strong peak near the lower limit. We must notice that the point L = 0 has a very significant physical meaning: it is the point where the system becomes unstable. L = 0 means that the positive feedback has the strength 1.0. We know that the climate has not been unstable. There are some oscillations, but not such as one would expect from strong positive feedback. Thus we can justify that the prior of L excludes the range L < 0.

The next question is whether we can consider it reasonable that various feedbacks would combine in such a way that the full feedback is prevented from reaching 1.0 by some mechanism, so that the total feedback ends up slightly below 1 when the individual factors would add up to a larger value without this mechanism. If some mechanism leads to such results, that would justify a prior in L that peaks strongly near 0. I am not aware of any physical mechanism that would lead to such a limit for total feedback.

Now I feel that I should have realized all this immediately. I even think that I understood this, when I read A&H paper a couple of months ago. Now I was, however, incapable of using the Bayesian thinking. How to determine the prior remains a difficult problem and the difficulties may be strongest for exceptionally large values and for values close to a lower or upper limit, as it is possible that the prior distribution should be allowed to go to infinity at such a limit. Infinity in the variable and possibility of infinity as a limiting value of pdf at a limit of the variable are situations, where the consequences of bad priors may distort the results most strongly.

Roger,
Perhaps it is worthwhile to go through, what is in the Bayesian analysis. I write this partly thus refresh the points in my own mind as I have not used regularly the Bayesian method and therefore tend to get confused by the details.

It is easier to start with a discrete set of alternatives. We have a discrete set of models labeled with S. The experiment can give values O in another discrete set.

As prior knowledge our model tells that the experiment will give each value of O with the probability p(O|S) for a particular value of S. For each value of S we have the prior probability p(S). From these values we calculate the sum over all values of S: p(O) = Σ p(O|S)p(S). Based on this prior knowledge on what our model tells about the probabilities of each value of O for each S and on the prior estimate of the distribution of S, we can check what a new experimental observation of the value O tells.

The Bayes formula tells the posterior distribution of the values S

p(S|O) = p(O|S) p(S) / p(O)

Now O has the value observed (2.3 in the example). The three factors on the right hand side are now known for each S and thus we can calculate the updated probabilities for each value of S. The updating factor p(O|S)/p(O) describes in the present case the inaccuracies in determining the value S empirically. The model involved in p(O|S) is the model of obtaining and processing empirical observations. Its shape is determined by the methods used in this process.
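The discrete update can be written out in a few lines. The three candidate sensitivities and all the probabilities below are hypothetical numbers chosen only to show the mechanics; none of them come from A&H or F&G.

```python
# Hypothetical discrete example: three candidate values of S and one observed outcome O.
prior = {1.5: 0.25, 3.0: 0.50, 6.0: 0.25}        # p(S), illustrative numbers
likelihood = {1.5: 0.30, 3.0: 0.50, 6.0: 0.10}   # p(O|S) for the observed O, illustrative

# p(O) = sum over S of p(O|S) p(S)
evidence = sum(likelihood[s] * prior[s] for s in prior)

# Bayes formula: p(S|O) = p(O|S) p(S) / p(O)
posterior = {s: likelihood[s] * prior[s] / evidence for s in prior}

for s in sorted(posterior):
    print(f"S = {s}: prior {prior[s]:.3f} -> posterior {posterior[s]:.3f}")
```

Dividing by p(O) is what makes the posterior probabilities sum to one.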

Coming to the present case. Here we have continuous variables, and that always brings new opportunities for errors. The discrete prior and posterior probabilities of different values of S are now described by functions of S: f(S) and f(S|O). This does not cause new technical issues. The updating factor p(O|S)/p(O) is a ratio of two functions of O. The confusion in my first consideration was related to this factor. I took only the numerator without giving proper notice to the denominator. The assumption used in A&H (following F&G) is that p(O|S) has the same Gaussian form as a function of O for each value of S, when O is not an estimate of climate sensitivity but of feedback strength, or L. This function is centered around 3.7/S and has a standard deviation of 0.7. We can perform the whole exercise having the model parameter S as climate sensitivity and the observation as strength of feedback. There is no explicit need to switch between the two variables L and S. We update the distribution of S by calculating L=3.7/S and checking what the likelihood of the observation 2.3 is for this value of L.
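The procedure just described can be sketched on a grid of S values. This is a rough illustration under the assumptions stated in the thread (observation 2.3, standard deviation 0.7, forcing 3.7), not a reproduction of A&H's actual code or of their Table 1. It compares a prior that is uniform in S on (0, U] with one proportional to 3.7/S^2, which is what a flat prior in L looks like in the S domain.

```python
import math

F_2X, OBS_L, SD_L = 3.7, 2.3, 0.7  # values quoted in the thread

def likelihood(s):
    """Unnormalised Gaussian likelihood of the observation 2.3 for L = 3.7 / S."""
    return math.exp(-0.5 * ((OBS_L - F_2X / s) / SD_L) ** 2)

def uniform_prior(s):
    return 1.0

def flat_in_l_prior(s):
    return F_2X / s ** 2

def upper_95(prior, upper, n=200_000):
    """95% posterior quantile of S on a midpoint grid over (0, upper]."""
    ds = upper / n
    grid = [(i + 0.5) * ds for i in range(n)]
    weights = [likelihood(s) * prior(s) for s in grid]
    target = 0.95 * sum(weights)
    running = 0.0
    for s, w in zip(grid, weights):
        running += w
        if running >= target:
            return s
    return upper

u10 = upper_95(uniform_prior, 10.0)
u20 = upper_95(uniform_prior, 20.0)
j10 = upper_95(flat_in_l_prior, 10.0)
j20 = upper_95(flat_in_l_prior, 20.0)

print("uniform in S:", u10, u20)
print("flat in L:   ", j10, j20)
```

With the uniform prior the 95% limit moves upward as U grows, while with the 3.7/S^2 prior it stays close to the value of about 3.2 mentioned earlier in the thread.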

A&H have done exactly that, but I think the paper could have been written more clearly in many places. The kind of confusions that Alexander and I had must be rather common. The paper states:

The goal of this paper is to explore one aspect in particular which has received rather limited (and in our opinion rather confusing) treatment in the literature — that is, the choice of prior f(S).

As all too often, I fail to remember to read my message before submitting it. This leads to such strange errors as the word ‘thus’ in the first paragraph above instead of ‘to’. I have my problems with English, but that explains only some of the errors. I hope these errors do not disturb too much.

In “Normal Science” it used to be enough to find the range of any estimate to 95% confidence level. If the range was too large or crossed 0, it was seen as either the theory being wrong or insufficiency of data. The result was to get back to the “lab”.
In “Post-Normal Science” we have to have different estimate ranges of differing probabilistic values. I marvel at the “breathlessness” climate science collectively feels to come up with these estimates. IMO politicians were “invited in” too early in the process by the then practitioners; politicians who are happy to work with 50% and sometimes even less.

I think I have finally arrived at a point that makes sense, it is all to do with the L to S domain flip.

In the L domain the F&G distribution can be viewed as a distribution with a flat prior, that is, it can be viewed as a likelihood × unity.

When it is flipped to the S domain it can be expressed as a likelihood (which varies as A&H indicate) times a prior given by 3.7/S^2; the two parts together give the flipped F&G distribution.

In the S domain the 3.7/S^2 is actually quite a strong prior.

Now they dispose of this prior and impose other, often much weaker priors (in the S domain) to indicate how “fat tailed” the distribution becomes with much weaker S priors than the implied F&G S domain prior.

They then use their stronger priors to show how the fat tail can be reduced.

That is all fine and dandy, but I really cannot see the point, as I cannot see any justification for weakening the priors, as compared with the S domain F&G prior, in the first place.

It is only when they reintroduce sufficiently strong priors that have a similar effect to the original disposed of F&G prior that they obtain the result that P(S 4.5ºC (less than 2.5% chance).

So whereas I have maligned them incorrectly, I have merely missed the mark. The paper is, I think, deceptive: it did not make it clear to me that the process started off by disposing of the strong S domain F&G prior. If I have missed where that is stated, I would be glad to have that pointed out to me.

Anyway I think I can now see clearly what they have done, and although it is formally correct, it seems to be vacuous. They demonstrate the tightening effect of various theoretical priors including an “expert” prior but none of them achieve anything beyond sticking with the original F&G prior.

I think that this would be most easily seen if their weak S priors (flat in S but truncated for values greater than some S) are flipped back into the L domain, where they will be of the form 3.7/L^2 (with a truncation below some low L). Multiply this by the L domain F&G likelihood function and the effect on the F&G distribution is, I believe, dramatic, heavily favouring low values of L (implying high values of S) but barring very low values of L. So their weak S priors flip to strong L priors.

Well all is well that ends well, (if this is the end). I think what they have done can be viewed as very artificial and I don’t think that it is very meaningful or insightful. The crucial step on which it relies seems to be the disposal of the F&G S domain prior, which was not obvious to me, and I doubt that it was obvious to many. If it is mentioned I did not notice it.

I apologise for failing to understand their method, but then I wonder how many readers have. I think that crucial steps in the method are not described and that if they were, the whole paper would appear in a different light. The bottom line is that they did not demonstrate that the use of an expert prior tightened the F&G distribution one jot. It was that point that led me to try and reconstruct their method; it was a merry game in a branch of mathematics that I have studiously overlooked for many years. I think I have now done this and, as I hoped, they were right and I was sort of wrong, but I think I was wrong in the right way and they the opposite.

I feel that I may have finally arrived at a position where I can critique the A&H paper more accurately.

The F&G distribution in its native L domain is a Gaussian distribution and can be thought of as the product of a Gaussian likelihood that varies as the distribution and an uninformative or flat prior.

I take that flat prior as the function that is unity from -a to a where a >> 0.7 (the standard deviation of the F&G distribution). The normalised product of the F&G L domain likelihood and this prior does indeed give the F&G L domain distribution.

Now the worrying part of the F&G distribution is in the negative half plane, where climatic instability is implied. Removing this by way of truncating the prior to the open range (0,a) does materially affect the distribution in both the L and S domains. But it must be noticed that in the S domain it also truncates just the negative half plane, which A&H do not include in their diagrams, i.e. they have implicitly excluded probabilities that refer to unstable climatic regimes. The only effect I can see that this has is to re-scale the S domain distributions very slightly when they are normalised.

Now when flipping the F&G distribution to the S domain the likelihood becomes the one they have used, but the prior that was uninformative in the L domain varies as 3.7/S^2, defined on the range (3.7/a,∞). The normalised product of their likelihood and this prior does give the flipped F&G distribution as required.

Now the question is how shall we describe the information content of this prior? Clearly it contains information required to produce the required distribution in the S domain, and looks strongly informative; but it is simply the transformation of an uninformative L domain prior. I can only think of it as defining a baseline prior against which alternative priors can be considered for their information content with respect to the prior necessary to obtain the F&G S domain distribution.

Viewed with respect to this “baseline” prior, the first priors they use, which are flat on the range [0,b] for some b, do not appear to be very informative (they look flat), but compared to the required or baseline F&G S domain prior they contain very different information and hence produce a final distribution very different to the F&G one. So in that sense they can be viewed as informative with respect to the original, native, required, or baseline F&G prior.

The A&H Cauchy and expert priors are informative, but with respect to the F&G prior the differences at the high S end are not particularly significant. It needs to be noted that their introduction more or less curtails the high S distribution at 4ºC, which is the same as for the F&G prior. However, the differences at low S are perhaps more noticeably significant and act to reduce the probabilities for low values of S, more so in the Cauchy case than the expert case.

So what I suspected was the case albeit not for the reasons I first thought.

The interpretation of the word “informative” gets a little slippery under the domain flip. I can only think of the problem as relating to differences between various pieces of prior information, and in this case the differences for high S between the Cauchy and expert priors on the one hand and the required F&G prior on the other are not significant. The term informative seems to be relative and hence likely to become subjective. I can make a case for A&H’s uninformative priors being highly informative: I just need to view them in the L domain, where they increase the probabilities for low values of L (hence high values of S). By that I simply mean that they have a significant impact on the F&G distribution with that effect.

So I arrive at the same general conclusion as immediately above. The result is vacuous in that it doesn’t illustrate anything much that was not native to the F&G distribution, and it is misleading in that it appears to claim a significant limitation on the probability of high S, but only compared to some rather extreme priors that one might think were anything but extreme. Unfortunately this can only come down to a point of view: I think it vacuous, others may think it informative. Well, what more can I say? I have done what I set out to do. I have figured out to my own satisfaction why I found the paper to be deceptive from the start, albeit not entirely for the reasons that I originally considered and which I consistently doubted, but for perhaps deeper considerations that should, I think, give one pause whenever priors are partitioned according to the criterion of being “informative”.

AH, you seem to be giving your review almost entirely in the form of ‘does it improve on F&G?’. I don’t think this is the purpose. Here is the end of the first paragraph after the abstract:

“More recently, a proliferation of probabilistic estimates explicitly based on calculations using observational data have also been presented (eg Andronova and Schlesinger, 2001; Gregory et al., 2002; Forest et al., 2002; Hegerl et al., 2006). Many of these results suggest a worryingly high probability of high sensitivity, such as P(S > 6oC) > 5% (Solomon et al., 2007, Box 10.2). The focus of this paper is to discuss some of the assumptions underlying these estimates, and implications for users.”

If F&G give the 95% confidence interval as [0.9,3.7] then this is a low upper bound and it is not the focus of the paper. The focus is on why some studies get low upper bounds like F&G’s 3.7 and why others get higher upper bounds. If there is a difference which studies are you going to believe? Maybe it is just obvious to you that you just use the study with the lowest upper bound but, if there is disagreement, then it might help to understand why other studies get different results.

You may find that the studies that have high upper bounds either use only some weak data and ignore some important information, or use a prior that is rather pathological. A uniform prior on (0,100) is clearly pathological: P(S>20) = 80% while the probability of 1.5<S<4.5 is a mere 3%. So a uniform prior is clearly the wrong shape. Previous incarnations of this paper have gone to town on showing that if you want an alarming result then it doesn't matter how good the data is, you can just increase the upper bound of a uniform prior to get the result wanted. This doesn't make the alarming result credible if the prior is not credible. This is not a desirable feature of a prior.
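The two percentages for the uniform prior on (0,100) follow from simple interval lengths, as this small sketch shows. The helper name is made up for illustration.

```python
def uniform_prob(lo, hi, upper):
    """Probability that S lies in [lo, hi] under a uniform prior on (0, upper)."""
    lo, hi = max(lo, 0.0), min(hi, upper)
    return max(hi - lo, 0.0) / upper

p_high = uniform_prob(20.0, 100.0, 100.0)   # P(S > 20)
p_mid = uniform_prob(1.5, 4.5, 100.0)       # P(1.5 < S < 4.5)

print(f"P(S > 20) = {p_high:.0%}, P(1.5 < S < 4.5) = {p_mid:.0%}")

# The prior mass in the 1.5 to 4.5 range shrinks as the upper bound is raised.
for upper in (10.0, 20.0, 50.0, 100.0):
    print(upper, uniform_prob(1.5, 4.5, upper))
```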

What A&H are saying is that it is important to show whether the pdf results are resilient to different choices of credible prior. If the data is good enough, the result won't change much. If the data is not good enough then the results will change a lot. Which is the case? A&H's paper sets out to show that the former is the case, i.e. there is data that is good enough to rule out the high sensitivities. The paper is far more about using F&G's likelihoods to achieve the above than trying to improve on F&G.

Some of my thoughts:
I think the climate couldn’t have been so ‘balanced’ with heavily positive feedbacks. As many electrical engineers know, systems with heavily positive feedback, even simple ones, need careful tuning of parameters to stay on course, and even then a slight disturbance will usually result in an uncontrollable oscillation. Paleoclimatologic evidence doesn’t suggest such imbalance in our climate.

In contrast, systems with negative feedback stay in control even without the “intelligent finger” tuning the parameters. It is so ironic that many times I hear people attacking Roy Spencer about believing in an “intelligent designer who keeps the planet in balance”, when in fact it is the systems with heavily positive feedback which need such a “designer”.

Inferring positive feedbacks from paleoclimatologic evidence also requires the assumption that the climate is stable until an external forcing influences it. We all should know by now that this is not the case. Without this assumption, climate sensitivity cannot be determined from the paleoclimatologic evidence.

Juakola – Thanks for the provocative reference, and its description of the nature, magnitude, and unpredictability of sudden climate changes. It’s clear that non-linear processes can lead to climate effects difficult to predict, although I believe the paper overstates the case by neglecting the role of trends predictable on the basis of known phenomena and principles.

The D/O and other rapid changes are fascinating – some are global although others appear to be regional. The Paleoclimatologic record suggests that when the sudden changes involve temperature, they are more probable during ice ages – i.e., intervals when the surface was covered with more snow and ice capable of mediating strong feedbacks than during interglacials. The same principle may not apply to desertification and hydrologic changes, as indicated by the examples in the paper.

Another fascinating question relates to the triggers for ice ages, which are generally thought to involve reduced summer insolation at 65 N as consequence of orbital forcing changes. If the mechanism involves the persistence of Arctic summertime ice, how will this be affected by anthropogenic temperature changes that appear to be driving the Arctic in the direction of increasing loss of summer ice? Currently, a new glaciation is not expected for something like another 30,000 years or more, based on the orbital calculations. Will this change as a result of anthropogenic influences, or is the time too far in the future to be susceptible to current trends?

One area where I tend to disagree with you involves the sign of feedbacks. With the possible exception of Jim Hansen, I believe that there is almost universal acceptance within climate science that net feedback is negative, and not vulnerable to turning positive from any plausible change in the near future. Therefore, while I agree about the instabilities of a system dominated by positive feedbacks, I don’t believe the Earth is an example of such a system.

I plead guilty to being mysterious. Yes, climate feedback is net negative by any ordinary understanding of feedback. The putative positive feedbacks include water vapor, ice/albedo, and presumably clouds, along with a variety of others including some involving the biosphere. All of these, however, are offset by the so-called Planck Response, which is simply the tendency of an object to shed more heat when its temperature increases, in conformity with the Stefan-Boltzmann law. Inconveniently, climate science terminology often tends to exclude the Planck feedback in discussing feedbacks, although it is always incorporated into actual computations. This creates the impression that net feedbacks are considered positive. When the Planck response is included, they are negative. The problem is a semantic one, not a scientific one.
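As a rough illustration of the size of the Planck Response, one can differentiate the Stefan-Boltzmann law. The sketch below assumes an effective emission temperature of 255 K, a textbook figure that is not taken from this thread; it is a back-of-envelope estimate, not how the response is computed in climate models.

```python
SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
T_EMIT = 255.0           # K, assumed effective emission temperature (not from the thread)
F_2X = 3.7               # W/m^2 forcing for doubled CO2, as used in the thread

# d(sigma * T^4)/dT = 4 * sigma * T^3: extra heat shed per degree of warming.
planck_response = 4.0 * SIGMA * T_EMIT ** 3

# Warming needed to shed the doubled-CO2 forcing if nothing else changed.
no_feedback_warming = F_2X / planck_response

print(f"{planck_response:.2f} W/m^2 per K, {no_feedback_warming:.2f} K")
```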

Fred, I guess you could be right; at least there is no counter-argument to your disagreement that I personally could come up with. I still would like to see and learn what exactly would keep the planet from runaway warming (or cooling) if the net feedback were strongly positive. If those were negative feedbacks, wouldn’t it mean they would be dominating the system?
Imho what seems evident is that the feedbacks are nonlinear and they build up a thermostat, thus keeping the earth in balance.

One still has to face the situation we have now. We can identify the presence of 150 W/m^2 of power being blocked by all sources, including atmospherics and clouds as well as GHGs. We have albedo of around 0.3 and captured incoming power of around 235-240 W/m^2, making the temperature rise of our current Earth 33 deg C.

That puts the sensitivity of the climate to power change at 0.22 deg C per W/m^2, averaged over all of the power blocking. According to the IPCC’s own assumptions a W/m^2 is a W/m^2. To assume this value changes much from the first W/m^2 to the last one causes lots of conceptual problems. Ultimately, the most recent W/m^2 change has to have a power sensitivity that is fairly close to the value of earlier ones.

Sensitivity to a CO2 doubling ultimately must translate into W/m^2 blocking increases relative to incoming power. Using the sensitivity from above yields the requirement that a 1 deg C rise requires a 1/0.22 = 4.5 W/m^2 increase in power blocking relative to the incoming solar power.

That poses the problem that if there is a 4.0 deg C rise due to a CO2 doubling, there must be an increase of 18 W/m^2 in power blocking caused by all of the factors affecting incoming and outgoing power. That means a 2 1/2 % increase in the total blocking due to CO2 (3.7 W/m^2 and an associated 0.8 deg C temperature rise) is now responsible for causing a 12% increase in the total blocking and 3 additional degrees C of added temperature. While I’m quite rusty at control system theory now, I don’t think that is stable or anywhere close.

However, of the required 18 W/m^2, we’ve accounted for 3.7 W/m^2, leaving 14.3 W/m^2. A quick look at absolute humidity rising with average RH held constant gives, for 5 deg C, only a 30% increase in H2O vapor. That is enough for maybe 3.1 W/m^2 of added blocking. So now we’ve got 6.8 W/m^2 and 1 1/2 deg C accounted for. One of the model assumptions, Hansen ’84 ???, is that cloud cover drops by one to two % if certain assumptions are maintained. That of course reduces the blocking slightly but allows more incoming SW to be absorbed, so it does increase the blocking relative to the incoming absorbed power as I’ve defined it above. Other factors potentially include added methane and perhaps reduced albedo due to land. However, we’re still missing 11.2 W/m^2 to achieve a 4.0 deg C rise, which must come from what I’ve just described or from something else I missed.
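The arithmetic in the last few paragraphs can be re-traced as follows. This sketch only reproduces the comment's own numbers and rounding (including its assumption that one averaged sensitivity applies to every W/m^2); it does not check whether that assumption is physically sound. The variable names are made up for illustration.

```python
# Figures quoted in the comment above; this only re-traces its arithmetic.
blocked_total = 150.0    # W/m^2 blocked by all sources
warming_total = 33.0     # deg C attributed to that blocking
co2_doubling = 3.7       # W/m^2 for doubled CO2
water_vapour = 3.1       # W/m^2, the comment's rough estimate
target_rise = 4.0        # deg C, the sensitivity being checked

per_watt = warming_total / blocked_total           # deg C per W/m^2
watts_per_degree = 1.0 / per_watt                  # W/m^2 needed per deg C
required = round(target_rise * watts_per_degree)   # whole W/m^2, rounded as in the comment

co2_share = co2_doubling / blocked_total           # CO2 doubling as a share of current blocking
required_share = required / blocked_total          # required increase as a share of current blocking
co2_only_rise = co2_doubling * per_watt            # deg C from the CO2 forcing alone

accounted = co2_doubling + water_vapour            # W/m^2 accounted for so far
accounted_rise = accounted * per_watt              # deg C accounted for so far
missing = required - accounted                     # W/m^2 still unexplained

print(per_watt, required, co2_share, required_share)
print(co2_only_rise, accounted, accounted_rise, missing)
```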

Call all of this (part of) a simple reasonableness check on a complicated calculation using statistics. Where is an error so far in the above comments?

A theoretical investigation of climate stability and sensitivity is carried out using three simple linearized models based on the top-of-the-atmosphere energy budget. The simplest is the zero-dimensional model (ZDM) commonly used as a conceptual basis for climate sensitivity and feedback studies. The others are two-zone models with tropics and extratropics of equal area. In the first of these (Model A), the dynamical heat transport (DHT) between the zones is implicit, in the second (Model B) it is explicitly parameterized.

It is found that the stability and sensitivity properties of the ZDM and Model A are very similar, both depending only on the global-mean radiative response coefficient and the global-mean forcing. The corresponding properties of Model B are more complex, depending asymmetrically on the separate tropical and extratropical values of these quantities, as well as on the DHT coefficient. Taking Model B as a benchmark, a criterion for assessing the sensitivities given by the ZDM and Model A is found. This criterion is not always satisfied for parameter ranges of physical interest.

The 2 × CO2 sensitivities of the simple models are studied and compared. Possible implications of the results for sensitivities derived from GCMs and palaeoclimate data are suggested. Sensitivities for more general scenarios that include negative aerosol forcings (inadvertent or geoengineered) in the tropics are also studied. Some unexpected outcomes are found in this case. These include the possibility of a negative global average temperature response to a positive global average forcing, and vice versa.”

“Bayesian reasoning does not deal well with ignorance.” This statement suggests a considerable ignorance of Bayesian methodology, where priors give a sound framework for including the fact that you know that you do not know the value of something into the analysis. Bayesian methodology is all about dealing properly with uncertainties (i.e. lack of knowledge, a.k.a ignorance).

Judith, you may not define uncertainty as being the same as ignorance, but that doesn’t mean that characterisation is correct, nor that it is the only characterisation that is correct. Ignorance is not a Boolean quantity; partial knowledge of the value of some quantity is equivalent to a complementary partial ignorance regarding that value. Both can be captured by a probability distribution within a Bayesian framework.

Note that the lexicon says “Recognized ignorance refers to fundamental uncertainty” which clearly implies that ignorance is a form of uncertainty.

Bayesian probability provides a framework for inference under uncertainty/ignorance. However the user needs to understand how to properly express the relevant uncertainties within that framework.

Regarding the Rumsfeld comment, I use it regularly when teaching Bayesian concepts. It is easy to deal with things you know you know. Bayesianism gives you a way to include the things you know you don’t know. There is no framework that can include the things you don’t know that you don’t know, for the simple reason that it is unbounded. However, most good scientists realise that EVERY finding is subject to that caveat, and follow David Hume’s good advice and apportion their belief according to the strength of the evidence.