Critique of the new Santer et al. (2019) paper

Ben Santer et al. have a new paper out in Nature Climate Change arguing that with 40 years of satellite data available they can detect the anthropogenic influence in the mid-troposphere at a 5-sigma level of confidence. This, they point out, is the “gold standard” of proof in particle physics, even invoking for comparison the Higgs boson discovery in their Supplementary information.

FIGURE 1: From Santer et al. 2019

Their results are shown in the Figure above. It is not a graph of temperature, but of an estimated “signal-to-noise” ratio. The horizontal lines represent sigma units which, if the underlying statistical model is correct, can be interpreted as points where the tail of the distribution gets very small. So when a line crosses a sigma level, the “signal” of anthropogenic warming has emerged from the “noise” of natural variability by the corresponding threshold. They report that the 3-sigma boundary corresponds to a p value of 1/741 while the 5-sigma boundary corresponds to a p value of 1/3.5 million. Since all signal lines cross the 5-sigma level by 2015, they conclude that the anthropogenic effect on the climate is definitively detected.
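For reference, the sigma-to-probability conversion is just the tail integral of the standard Normal. A quick check of the paper's quoted figures (assuming one-sided tails, which is what reproduces their numbers):

```python
from math import erfc, sqrt

def one_sided_p(sigma):
    """One-sided tail probability of a standard Normal beyond `sigma`."""
    return 0.5 * erfc(sigma / sqrt(2))

p3 = one_sided_p(3.0)  # ~0.00135, i.e. about 1 in 741
p5 = one_sided_p(5.0)  # ~2.9e-7, i.e. about 1 in 3.5 million
```

These tail probabilities are only meaningful, of course, if the statistic really does follow a standard Normal distribution, which is the issue taken up in section (d).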

I will discuss four aspects of this study which I think weaken the conclusions considerably: (a) the difference between the existence of a signal and the magnitude of the effect; (b) the confounded nature of their experimental design; (c) the invalid design of the natural-only comparator; and (d) problems relating “sigma” boundaries to probabilities.

(a) Existence of signal versus magnitude of effect

Suppose you are tuning an old analog receiver to a weak signal from a far-away radio station. By playing with the dial you might eventually get a good enough signal to realize they are playing Bach. But the strength of the signal tells you nothing about the tempo of the music: that’s a different calculation.

In the same way the above diagram tells us nothing about the magnitude of the temperature effect of greenhouse gases on the climate. It only shows the ratio of two things: a measure of the rate of improvement over time of the correlation between observations and models forced with natural and anthropogenic forcings, divided by a measure of the standard deviation of the same measure under a “null hypothesis” of (allegedly) pure natural variability. In that sense it is like a t-statistic, which is also measured in sigma units. Since there can be no improvement over time in the fit between the observations and the natural-only comparator, any improvement in the signal raises the sigma level.

Even if you accept Figure 1 at face value, it is consistent with there being a very high or very low sensitivity to greenhouse gases, or something in between. It is consistent, for instance, with the findings of Christy and McNider, also based on satellite data, that sensitivity to doubled GHG levels, while positive, is much lower than typically shown in models.

(b) Confounded signal design

According to the Supplementary information, Santer et al. took annually-averaged climate model data based on historical and (RCP8.5) scenario-based natural and anthropogenic forcings and constructed mid-troposphere (MT) temperature time series that include an adjustment for stratospheric cooling (i.e. “corrected”). They averaged all the runs and models, regridded the data into 10 degree x 10 degree grid cells (576 altogether, with polar regions omitted) and extracted 40 annual temperature anomalies for each gridcell over the 1979 to 2018 interval. From these they extracted a spatial “fingerprint” of the model-generated climate pattern using principal component analysis, aka empirical orthogonal functions. You could think of it as a weighted average over time of the anomaly values for each gridcell. Though it’s not shown in the paper or the Supplement, this is the pattern (it’s from a separate paper):

FIGURE 2: Spatial fingerprint pattern

The gray areas in Figure 2 over the poles represent omitted gridcells since not all the satellite series cover polar regions. The colors represent PC “loadings” not temperatures, but since the first PC explains about 98% of the variance, you can think of them as average temperature anomalies and you won’t be far off. Hence the fingerprint pattern in the MT is one of amplified warming over the tropics with patchy deviations here and there.
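The EOF extraction step can be sketched with synthetic data. The grid size below matches the paper's 576 cells, but the anomaly field itself is made up for illustration: a fixed spatial pattern whose amplitude strengthens over the 40 years, plus noise. The leading EOF (first right singular vector of the centered anomaly matrix) recovers the pattern, which is why one can loosely describe it as a weighted average over time of each gridcell's anomalies.

```python
import numpy as np

# Hypothetical stand-in for the multi-model mean MT anomaly field:
# 40 years x 576 gridcells (10x10-degree cells, poles omitted).
rng = np.random.default_rng(0)
n_years, n_cells = 40, 576
pattern = rng.normal(size=n_cells)           # a fixed spatial pattern
amplitude = np.linspace(0.0, 1.0, n_years)   # strengthening over time
anoms = np.outer(amplitude, pattern) + 0.1 * rng.normal(size=(n_years, n_cells))

# Leading EOF = first right singular vector of the time-centered anomalies.
U, s, Vt = np.linalg.svd(anoms - anoms.mean(axis=0), full_matrices=False)
fingerprint = Vt[0]                  # spatial "fingerprint", length n_cells
explained = s[0]**2 / np.sum(s**2)   # fraction of variance explained by EOF1
```

When the first EOF explains nearly all the variance, as in the paper (about 98%), the loadings track the underlying anomaly pattern closely, which is the sense in which the loadings can be read as average temperature anomalies.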

This is the pattern they will seek to correlate with observations as a means of detecting the anthropogenic “fingerprint.” But it is associated in the models with both natural and anthropogenic forcings together over the 1979–2018 interval. They refer to this as the HIST+8.5 data, meaning model runs forced up to 2006 with historical forcings (both natural and anthropogenic) and thereafter according to the RCP8.5 forcings. The conclusion of the study is that observations now look more like the above figure than the null hypothesis (“natural only”) figure, ergo anthropogenic fingerprint detected. But HIST+8.5 is a combined fingerprint, and they don’t actually decompose the anthropogenic portion.

So they haven’t identified a distinct anthropogenic fingerprint. What they have detected is that observations exhibit a better fit to models that have the Figure 2 warming pattern in them, regardless of cause, than those that do not. It might be the case that a graph representing the anthropogenic-only signal would look the same as Figure 1, but we have no way of knowing from their analysis.

(c) Invalid natural-only comparator

The above argument would matter less if the “nature-only” comparator controlled for all known warming from natural forcings. But it doesn’t, by construction.

The fingerprint methodology begins by taking the observed annual spatial pattern of temperature anomalies and correlating it with the pattern in Figure 2 above, yielding a correlation coefficient for each year. They then look at the trend in those correlation coefficients as a measure of how the fit improves over time. The correlations themselves are not reported in the paper or the supplement.
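A simplified sketch of that step (hypothetical data; note also that the pattern statistic in the actual paper is reportedly an uncentered spatial covariance rather than a centered correlation, so this is an illustration of the idea, not a reproduction of their code):

```python
import numpy as np

def yearly_pattern_stats(obs, fingerprint):
    """Correlate each year's observed spatial anomaly map with the model
    fingerprint, then return the OLS trend in those yearly statistics.
    obs: (n_years, n_cells); fingerprint: (n_cells,)."""
    stats = np.array([np.corrcoef(year, fingerprint)[0, 1] for year in obs])
    years = np.arange(len(stats))
    trend = np.polyfit(years, stats, 1)[0]  # slope per year
    return stats, trend
```

The slope of the yearly pattern statistics is the numerator of the signal-to-noise ratio; the denominator comes from the distribution of such slopes in the "natural-only" control runs.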

The authors then construct a “noise” pattern to serve as the “nature-only” counterfactual to the above diagram. They start by selecting 200-year control runs from 36 models and gridding them in the same 10×10 format. Eventually they will average them all up, but first they detrend each gridcell in each model, which I consider a misguided step.
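The gridcell-by-gridcell detrending can be sketched as follows (synthetic control run; the actual runs are 200 years over the 10×10 grid):

```python
import numpy as np

def detrend_gridcells(control):
    """Remove a linear OLS trend from each gridcell's control-run series.
    `control` is (years x gridcells). This is the step criticized in the
    text: any trend-like natural pattern is stripped from the comparator."""
    years = np.arange(control.shape[0])
    detrended = np.empty_like(control, dtype=float)
    for j in range(control.shape[1]):
        slope, intercept = np.polyfit(years, control[:, j], 1)
        detrended[:, j] = control[:, j] - (slope * years + intercept)
    return detrended
```

By construction every column of the output has a zero linear trend, which is precisely why (as argued below) the comparator cannot resemble a naturally forced warming pattern.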

Everything depends on how valid the natural variability comparator is. We are given no explanation of why the authors believe it is a credible analogue to the natural temperature patterns associated with post-1979 non-anthropogenic forcings. It almost certainly isn’t. The sum of the post-1979 volcanic+solar series in the IPCC AR5 forcing series looks like this:

FIGURE 3: IPCC NATURAL FORCINGS 1979-2017

This clearly implies natural forcings would have induced a net warming over the sample interval, and since tropical amplification occurs regardless of the type of forcing, a proper “nature-only” spatial pattern would likely look a lot like Figure 2. But by detrending every gridcell Santer et al. removed such patterns, artificially worsening the estimated post-1979 natural comparator.

The authors’ conclusions depend critically on the assumption that their “natural” model variability estimate is a plausible representation of what 1979-2018 would have looked like without greenhouse gases. The authors note the importance of this assumption in their Supplement (p. 10):

“Our assumption regarding the adequacy of model variability estimates is critical. Observed temperature records are simultaneously influenced by both internal variability and multiple external forcings. We do not observe “pure” internal variability, so there will always be some irreducible uncertainty in partitioning observed temperature records into internally generated and externally forced components. All model-versus-observed variability comparisons are affected by this uncertainty, particularly on less well-observed multi-decadal timescales.”

As they say, every fingerprint and signal-detection study hinges on the quality of the “nature-only” comparator. Unfortunately by detrending their control runs gridcell-by-gridcell they have pretty much ensured that the natural variability pattern is artificially degraded as a comparator.

It is as if a bank robber were known to be a 6 foot tall male, and the police put their preferred suspect in a lineup with a bunch of short women. You might get a confident witness identification, but you wouldn’t know if it’s valid.

Making matters worse, the greenhouse-influenced warming pattern comes from models that have been tuned to match key aspects of the observed warming trends of the 20th century. While less of an issue in the MT layer than would be the case at the surface, there will nonetheless be partial enhancement of the match between model simulations and observations due to post hoc tuning. In effect, the police are making their preferred suspect wear the same black pants and shirt as the bank robber, while the short women are all in red dresses.

Thus, it seems to me that the lines in Figure 1 are based on comparing an artificially exaggerated resemblance between observations and tuned models versus an artificially worsened counterfactual. This is not a gold standard of proof.

(d) t-statistics and p values

The probabilities associated with the sigma lines in Figure 1 are based on the standard Normal tables. People are so accustomed to the Gaussian (Normal) critical values that they sometimes forget that they are only valid for t-type statistics under certain assumptions, and those assumptions need to be tested. I could find no information in the Santer et al. paper that such tests were undertaken.

I will present a simple example of a signal detection model to illustrate how t-statistics and Gaussian critical values can be very misleading when misused. I will use a data set consisting of annual values of weather-balloon measured global MT temperatures averaged over RICH, RAOBCORE and RATPAC, the El Niño–Southern Oscillation Index (ESOI – pressure based version), and the IPCC forcing values for greenhouse gases (“ghg” comprising CO2 and other), tropical ozone (“o3”), aerosols (“aero”), land use change (“land”), total solar irradiance (“tsi”) and volcanic aerosols (“volc”). The data run from 1958 to 2017 but I only use the post-1979 portion to match the Santer paper. The forcings are from IPCC AR5 with some adjustments by Nic Lewis to bring them up to date.

A simple way of investigating causal patterns in time series data is an autoregression. Simply regress the variable you are interested in on its own lagged value plus lagged values of the possible explanatory variables. Inclusion of the lagged dependent variable controls for momentum effects, while the use of lagged explanatory variables constrains the correlations to a single direction: today’s changes in the dependent variable cannot cause changes in yesterday’s values of the explanatory variables. This is useful for identifying what econometricians call Granger causality: knowing today’s value of one variable significantly reduces the forecast error for tomorrow’s value of another.
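As a sketch of this setup (synthetic data and hypothetical series names, not the actual dataset used below), a lagged OLS regression with conventional t-ratios can be coded as:

```python
import numpy as np

def lagged_ols(temp, anthro, natural):
    """OLS of Temp_t on a constant, Temp_{t-1}, anthro_{t-1}, natural_{t-1}.
    Returns (coefficients, conventional t-ratios). Whether those t-ratios
    can be read off Gaussian/t tables is exactly the question at issue."""
    y = temp[1:]
    X = np.column_stack([np.ones(len(y)), temp[:-1], anthro[:-1], natural[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

# Synthetic illustration: a trending "anthro" series and AR(1) temperature.
rng = np.random.default_rng(3)
n = 200
anthro = np.linspace(0.0, 2.0, n)
natural = rng.normal(scale=0.2, size=n)
temp = np.zeros(n)
for t in range(1, n):
    temp[t] = (0.5 * temp[t-1] + 0.4 * anthro[t-1]
               + 0.1 * natural[t-1] + 0.1 * rng.normal())
beta, tstats = lagged_ols(temp, anthro, natural)
```

On data constructed this way the t-ratio on the trending regressor comes out large, which is the behaviour the example below exhibits.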

I ran the regression Temp = a1 + a2*l.Temp + a3*l.anthro + a4*l.natural, where a lagged value is denoted by an “l.” prefix. The results over the whole sample length are:

The coefficient on “anthro” is more than twice as large as that on “natural” and has a larger t-statistic. Also its p-value implies that, if there were truly no effect, the probability of observing a t-statistic this large would be about 1 in 2.4 billion. So I could conclude based on this regression that anthropogenic forcing is the dominant effect on temperatures in the observed record.

The t-statistic on anthro provides a measure much like what the Santer et al. paper shows. It represents the marginal improvement in model fit based on adding anthropogenic forcing to the time series model, relative to a null hypothesis in which temperatures are affected only by natural forcings and internal dynamics. Running the model iteratively while allowing the end date to increase from 1988 to 2017 yields the results shown below in blue (Line #1):

FIGURE 4: S/N ratios for anthropogenic signal in temperature model

It looks remarkably like Figure 1 from Santer et al., with the blue line crossing the 3-sigma level in the late 1990s and reaching about 8 sigma at its peak.

But there is a problem. This would not be publishable in an econometrics journal because, among many other things, I haven’t tested for unit roots. I won’t go into detail about what they are; I’ll just point out that if time series data have unit roots they are nonstationary and you can’t use them in an autoregression, because the t-statistics follow a nonstandard distribution and Gaussian (or even Student’s t) tables will give seriously biased probability values.

I ran Phillips-Perron unit root tests and found that anthro is nonstationary, while Temp and natural are stationary. This problem has already been discussed and grappled with in some econometrics papers (see for instance here and the discussions accompanying it, including here).
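The idea behind such tests can be sketched with a plain Dickey-Fuller regression, a simpler cousin of the Phillips-Perron test (illustration only, on simulated series, not the actual data):

```python
import numpy as np

def df_stat(x):
    """Dickey-Fuller statistic (no lag augmentation, constant included):
    regress d.x_t on a constant and x_{t-1}; return the t-ratio on x_{t-1}.
    Values well below roughly -2.9 reject a unit root at about the 5% level.
    This is a simplified stand-in for Phillips-Perron, for illustration."""
    dx = np.diff(x)
    X = np.column_stack([np.ones(len(dx)), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    resid = dx - X @ beta
    sigma2 = resid @ resid / (len(dx) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

rng = np.random.default_rng(4)
walk = np.cumsum(rng.normal(size=300))   # unit root: should fail to reject
noise = rng.normal(size=300)             # stationary: should reject
```

A trending series like the anthropogenic forcing behaves like the random walk here: the test cannot reject a unit root, which is what disqualifies it from the levels regression.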

A possible remedy is to construct the model in first differences. If you write out the regression equation at time t and also at time (t-1) and subtract the two, you get d.Temp = a2*l.d.Temp + a3*l.d.anthro + a4*l.d.natural, where the “d.” means first difference and “l.d.” means lagged first difference. First differencing removes the unit root in anthro (almost – probably close enough for this example) so the regression model is now properly specified and the t-statistics can be checked against conventional t-tables. The results over the whole sample are:

The coefficient magnitudes remain comparable but—oh dear—the t-statistic on anthro has collapsed from 8.56 to 1.32, while those on natural and lagged temperature are now larger. The problem is that the t-ratio on anthro in the first regression was not a true t-statistic; instead it followed a nonstandard distribution with much larger critical values. When compared against t tables it gave the wrong significance score for the anthropogenic influence. The t-ratio in the revised model is more likely to be properly specified, so using t tables is appropriate.
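The same machinery applies to the differenced specification; a minimal sketch (again on made-up data, not the actual series):

```python
import numpy as np

def diff_ols(temp, anthro, natural):
    """OLS of d.Temp_t on l.d.Temp, l.d.anthro, l.d.natural (no constant),
    i.e. the first-differenced specification from the text. Returns
    (coefficients, conventional t-ratios). A sketch, not the actual code."""
    dT, dA, dN = np.diff(temp), np.diff(anthro), np.diff(natural)
    y = dT[1:]
    X = np.column_stack([dT[:-1], dA[:-1], dN[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))
```

Because the differenced regressors are (approximately) stationary, the t-ratios this function returns can legitimately be compared against conventional t tables, which is the point of the exercise.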

The corresponding graph of t-statistics on anthro from the second model over varying sample lengths is shown in Figure 4 as the green line (Line #2) at the bottom of the graph. Signal detection clearly fails.

What this illustrates is that we don’t actually know the correct probability values to attach to the sigma values in Figure 1. If Santer et al. want to use Gaussian probabilities they need to test that their regression models are specified correctly for doing so. But none of the usual specification tests were provided in the paper, and since it’s easy to generate a vivid counterexample we can’t assume the Gaussian assumption is valid.

Conclusion

The fact that in my example the t-statistic on anthro falls to a low level does not “prove” that anthropogenic forcing has no effect on tropospheric temperatures. It does show that in the framework of my model the effects are not statistically significant. If you think the model is correctly specified and the data set is appropriate you will have reason to accept the result, at least provisionally. If you have reason to doubt the correctness of the specification then you are not obliged to accept the result.

This is the nature of evidence from statistical modeling: it is contingent on the specification and assumptions. In my view the second regression is a more valid specification than the first one, so faced with a choice between the two, the second set of results is more valid. But there may be other, more valid specifications that yield different results.

In the same way, since I have reason to doubt the validity of the Santer et al. model I don’t accept their conclusions. They haven’t shown what they say they showed. In particular they have not identified a unique anthropogenic fingerprint, or provided a credible control for natural variability over the sample period. Nor have they justified the use of Gaussian p-values. Their claim to have attained a “gold standard” of proof is unwarranted, in part because statistical modeling can never do that, and in part because of the specific problems in their model.

Good to see this paper taken apart but I fear you are making it more technically complicated than it needs to be.

The whole concept is bogus. Models are TUNED to fit the climate record by tweaking innumerable poorly constrained parameters. This gives modellers quite a lot of leeway to choose high sensitivity configurations by convenient parameter choices.

Hansen et al 2002 states quite openly that “models still can be made to yield a wide range of sensitivities by altering model parameterizations”. Indeed that is what they did when they abandoned earlier physics based work on volcanic forcing in favour of arbitrary tweaking in order to reconcile model output with the climate record.

This means that their “natural forcing” runs are equally convenient mythologies, based more on their own biases and expectations than any scientific reality.

If that is their “control” for these calculations the result is simply one of induction reflecting modellers’ choices. A strong AGW was built into the models largely by design. This paper simply affirms that is what models do. Bringing the satellite data into it is simply a red scarf trick.

Lacis et al 1992 (Hansen’s team with a name shuffle) used physical modelling to arrive at a scaling of 33 W for AOD. The later (tweaking) paper used 20 W. This meant they were reducing the calculated cooling “forcing” attributed to the same climate record, i.e. they were increasing the sensitivity of the model. This then allows them to effectively raise the sensitivity to CO2 to counterbalance. They play one off against the other.

If you now remove the anthropogenic forcing, the highly sensitive model configuration will run a lot cooler.

Now use this as your “control” and you will have a strong “5-sigma” detectable AGW “signal” but it is all of your own making.

Since the full model run was tuned to fit the climate record, switching to compare to satellite data is just a figleaf attempt to introduce some real data into a result which is totally a product of the models’ construction.

Indeed. There is a 5 sigma agreement between those models’ output and the hypothesis they represent. Models completely agree with the hypothesis upon which they were built.
Please note: to pretend to be able to detect a 5+ sigma agreement in the output of largely diverging climate models is preposterous (alternate epithets: obfuscation, joke). No one even knows if the limited set of selected model runs is even remotely representative of the future state of the atmospheric system. Judging by the actual performance of those models, they are not.

Oceans are like carbonated drinks: when a carbonated drink is warmed it supports a higher vapor pressure of CO2, and when the drink is cooled it supports a lower vapor pressure of CO2. There is some lag, but no one with any knowledge of basic science should be surprised by this.

On the other hand, a bottle of plain water and a bottle of carbonated drink at room temperature stay the same temperature as the room as the room temperature changes, with some lag of course.

I’d be happy to address your concerns in the peer-reviewed literature. I think that would be the appropriate place to respond.

That said, a brief response is necessary to some of the points you made. It would be unfortunate if readers of your blog post were unaware of our prior research – research which addresses many of the issues you have raised.

This is the only response I will make on Dr. Curry’s website.

1. We routinely consider “ANTHRO only” fingerprints – see, e.g., the discussion on page 7 of the Supplementary Material of the 2018 Santer et al. Science paper. That discussion explains why the “ANTHRO only” and HIST+8.5 fingerprints yield very similar results. In my opinion, it is not unreasonable to expect other scientists to read such background information, particularly since it is cited in the Nature Climate Change paper you are critiquing.

2. You suggest – incorrectly – that we never evaluate the adequacy of model-based estimates of internal variability. We routinely make such evaluations. Examples are given in Fig. S7 of the 2018 Santer et al. Science paper and in Figs. 9 and 10 of the 2011 Santer et al. JGR paper.

3. Readers of your blog post might infer that we are unconcerned with differences between modeled and observed tropospheric warming rates. That is not the case. Many of our publications have attempted to understand the causes of differences between simulated and observed warming rates in the early 21st century. In the 2017 Santer et al. Nature Geoscience paper, we find that a large error in model climate sensitivity – Dr. Christy’s preferred hypothesis for model-versus-data warming rate differences – does not explain the temporal structure of these differences.

4. The pattern comparison statistic we use in our “fingerprint” work is an uncentered spatial covariance. It is not a correlation.

5. Even if one ignores all pattern information and considers global-mean changes alone, the amplitude of observed tropospheric temperature changes remains large relative to model-based estimates of internal variability (see, e.g., Fig. 1E in the 2017 Santer et al. Scientific Reports paper). This holds even for University of Alabama tropospheric temperature data.

6. Whether we do or do not remove residual long-term drift from control run data has minimal impact on our results. We only detrend once (over the final 200 years of each control run). We do not detrend each L-year chunk we are processing when we estimate time-dependent S/N ratios.

7. It is true that “rebound” of tropospheric temperature from the cooling caused by Pinatubo contributes to observed warming over the satellite era. You neglect to mention that our group has studied volcanically induced “rebound” of tropospheric temperature since 2001 (see, e.g., Santer et al. 2001, JGR; Santer et al. 2014, Nature Geoscience). The rebound effect is relatively small over the entire 40-year satellite tropospheric temperature record. Additionally, it is impermissible to focus solely on “rebound” from the eruptions of El Chichon in 1982 and Pinatubo in 1991, and to ignore the cooling effects of early 21st century volcanic eruptions. The climate effects of post-2000 volcanic forcing have been studied in a number of publications (e.g., Solomon et al., Science, 2011; Ridley et al., GRL, 2014; Santer et al., GRL, 2015). The effect of these post-2000 eruptions is to reduce S/N ratios for analysis periods sampling temperature changes in the early 21st century.

8. In other fingerprint detection work, we have tested not only against model-based estimates of internal variability, but also against “total” natural variability (internally generated plus variability forced by changes in solar irradiance and volcanoes). See, e.g., the 2013 Santer et al. “vertical fingerprint” paper in PNAS. For changes in the vertical structure of atmospheric temperature, we can detect an anthropogenic fingerprint even against this larger “total” natural variability.

9. The control run distributions of noise trends are Gaussian (at least for tropospheric temperature).

No, it’s not. What he is suggesting is that nobody is going to show up in the literature.

An example is James Hansen’s paper that was reviewed online for everybody to see. Several people here suggested the paper was going to be destroyed by skeptic comments: they were finally going to get their licks in. What happened? Their online review comments came off as inept and wrong.

There is undoubtedly an anthropogenic fingerprint – shown most clearly in Harries et al 2001 and replication studies. These show the effect of changes in IR photon absorption and re-emission in all directions due to increased greenhouse gas concentration in the atmosphere.

But there is a great silence from some quarters on confounding aspects of model and Earth system dynamics. The impossibility of obtaining any but probabilistic estimates from climate models (TAR 2001, McWilliams 2007, Slingo and Palmer 2011). The nature of internal variability (Broecker 1996, Koutsoyiannis 2013). The tossed-coin accuracy of models of abrupt decadal change (Latif et al 2013). Large variability in TOA energy flux with changes in ocean and atmosphere circulation (Loeb et al 2012, Loeb et al 2017, Loeb et al 2018) – and their perpetual and chaotic shifting states (Koutsoyiannis 2013).

Most recent (post-hiatus) warming was the result of cloud feedback with a warmer eastern Pacific surface (Loeb et al 2018) – geophysics that is barely quantifiable (Koren 2017).

How close we are to tipping points would seem to be the irresolvable problem that can only be managed with pragmatic responses. A lack of pragmatism being lamentably endemic in some quarters.

There is much beyond the scope of this simple little Santer et al study that is discussed in the broader Earth system literature. My feeling is that there are far more interesting games afoot.

Re: “8. In other fingerprint detection work, we have tested not only against model-based estimates of internal variability, but also against “total” natural variability (internally generated plus variability forced by changes in solar irradiance and volcanoes). See, e.g., the 2013 Santer et al. “vertical fingerprint” paper in PNAS. For changes in the vertical structure of atmospheric temperature, we can detect an anthropogenic fingerprint even against this larger “total” natural variability.”

Yes, this is well-known to those who read the scientific literature. For example, increased solar irradiance would warm both the stratosphere and the troposphere. But that’s not what observational analyses show. Instead they show tropospheric warming, with stratospheric cooling that increases with increasing height. The lower stratospheric cooling was mitigated over the past couple of decades.

That vertical pattern fits what one would expect from increased CO2 (which warms the troposphere and causes stratospheric cooling that increases with increasing height) combined with anthropogenic ozone depletion (which causes stratospheric cooling that, in comparison to increased CO2, has a relatively larger effect lower in the stratosphere than higher in the stratosphere), followed by mitigation of ozone depletion by policies such as the Montreal Protocol that limited CFC release. Even contrarians like John Christy acknowledge this point:

“After Pinatubo (and perhaps El Chichón), LST [lower stratospheric temperature] declined to levels lower than prior to the eruption, giving a stair-step appearance. Ozone depletion and increasing CO2 in the atmosphere contribute an overall decline [emphasis added], so trends in global LST are clearly negative until approximately 1996. […]
Absence of lower stratospheric cooling in the global mean since 1996 is due to recovery of the ozone layer, especially at high latitudes, as the Montreal Protocol and its Amendments on ozone-depleting substances has taken effect [emphasis added] […] [page S19].
[…]
At higher levels of the stratosphere […] the observed trend is approximately −0.7°C [per decade] of which 75% is estimated to result from enhanced greenhouse gas concentrations and most of the remaining decline from ozone loss [emphasis added, and citations removed] […] [page S20].”
https://www.osti.gov/pages/servlets/purl/1474380

That’s rather telling, since McKitrick co-authored research with Christy. It’s also telling, since McKitrick previously made a big deal out of mitigated lower stratospheric cooling, even though this is a predicted response of mitigated ozone-depletion, as per the Montreal Protocol:

“As for stratospheric cooling, the RSS record is graphed here. I see 2 steps associated with volcanoes, but flat in between, and flat since 1994. The trend line obscures this. I do not see a steady cooling that would correlate with rising CO2 levels, but perhaps this has been addressed in print somewhere (I have not looked).”
http://archive.is/qBFzq#selection-3193.0-3197.177

Given this, I find it ironic that McKitrick says this in his blog article:

“They haven’t shown what they say they showed. In particular they have not identified a unique anthropogenic fingerprint, or provided a credible control for natural variability over the sample period.”

I await the day he addresses the anthropogenic vertical fingerprint (which, by the way, also includes the rising tropopause, along with cooling in the mesosphere and thermosphere, each of which has been observed). Until then, I don’t place much stock in his claims on the lack of a fingerprint, when he goes out of his way to side-step evidence on a vertical fingerprint.

Then we should expect in models “a degree of irreducible imprecision in quantitative correspondences with nature, even with plausibly formulated models and careful calibration (tuning) to several empirical measures.” https://www.pnas.org/content/104/21/8709

Fingerprints, in the context of climate science, don’t need to be unique. They instead need to distinguish between causes of warming, or between causes of whatever effect one is examining. That’s why, for example, a tropospheric hot spot is not a useful fingerprint: it would occur with any strong near-surface or surface warming in the tropics, regardless of the cause of the warming. In contrast, multi-decadal tropospheric warming plus stratospheric cooling is a fingerprint, since increased CO2 would cause it, but not increased total solar irradiance.

Multiple such fingerprints build up a more complete picture that allows one to narrow down to the most plausible cause on the basis of the evidence. It’s akin to a murder trial in which multiple pieces of evidence (ex: the blood type of the perpetrator’s blood at the scene, the perpetrator being a blood relative of the victim, the perpetrator’s shoe size for a shoe imprint, the type of car the perpetrator drove, etc.) jointly point to the defendant. That’s despite the fact that an individual piece of evidence would not point uniquely to just the defendant, but would instead argue against other suspects while still ruling in the defendant. Other folks like Gavin Schmidt, Andrew Dessler, and Mark Richardson have used a similar analogy:

“Models and observations also both show warming in the lower part of the atmosphere (the troposphere) and cooling higher up in the stratosphere. This is another ‘fingerprint’ of change that reveals the effect of human influence on the climate. If, for example, an increase in solar output had been responsible for the recent climate warming, both the troposphere and the stratosphere would have warmed. In addition, differences in the timing of the human and natural external influences help to distinguish the climate responses to these factors. Such considerations increase confidence that human rather than natural factors were the dominant cause of the global warming observed over the last 50 years [pages 702 – 703].”
https://wg1.ipcc.ch/publications/wg1-ar4/ar4-wg1-chapter9.pdf

BTW there was no “post-2000 rebound” – the lower climate system had finished “rebounding” before Y2K. It was the rebound from both El Chichon and Mt Pinatubo which was mis-attributed as the late 20th c. AGW.

Do you imagine that you can get away with your simplistic and overly rehearsed talking point while ignoring the bulk of my comment?

I showed the graph and said that there were other factors – leaving it at that.

“Understanding stratospheric temperature trends is a difficult challenge. Understanding the mechanisms behind this changes is much more of a conceptual challenge.

But over 40 years ago, it was predicted that the upper stratosphere would cool significantly from increases in CO2.

The depletion of ozone is also predicted to have an effect on stratospheric temperatures – in the upper stratosphere (where CO2 increases will also have the most effect) and again in the lower stratosphere where ozone is the dominant factor.” https://scienceofdoom.com/2010/04/18/stratospheric-cooling/

Ozone dynamics have a number of elements – it depends of course on UV and ozone production and destruction.

It is a more direct fingerprint of increased concentrations of greenhouse gases. Although again you have to wonder what the internal contribution to changes is. But then you have to have a capacity to wonder why.

“Fingerprints, in the context of climate science, don’t need to be unique.”

That’s priceless. The whole idea of the “fingerprint” metaphor is that it is something unique which shows undeniable proof of the “guilty party”. But of course in the context of climate science, global warming causes cooling and black can be white. So it does not surprise me that unique fingerprints don’t need to be unique.

I hope Santer reads this and realises that all his work over the years on “fingerprints” does not mean anything because they are not unique.

Climate models have no unique deterministic solutions. The spread between members of model ensembles arises not simply because there are structural differences between models, but because each model has multiple chaotically divergent solutions, with little more than modeler expectations to distinguish between plausible future projections.

“AOS models are members of the broader class of deterministic chaotic dynamical systems, which provides several expectations about their properties (Fig. 1). In the context of weather prediction, the generic property of sensitive dependence is well understood (4, 5). For a particular model, small differences in initial state (indistinguishable within the sampling uncertainty for atmospheric measurements) amplify with time at an exponential rate until saturating at a magnitude comparable to the range of intrinsic variability. Model differences are another source of sensitive dependence. Thus, a deterministic weather forecast cannot be accurate after a period of a few weeks, and the time interval for skillful modern forecasts is only somewhat shorter than the estimate for this theoretical limit. In the context of equilibrium climate dynamics, there is another generic property that is also relevant for AOS, namely structural instability (6). Small changes in model formulation, either its equation set or parameter values, induce significant differences in the long-time distribution functions for the dependent variables (i.e., the phase-space attractor). The character of the changes can be either metrical (e.g., different means or variances) or topological (different attractor shapes)…

Simplistically, despite the opportunistic assemblage of the various AOS model ensembles, we can view the spreads in their results as upper bounds on their irreducible imprecision. Optimistically, we might think this upper bound is a substantial overestimate because AOS models are evolving and improving. Pessimistically, we can worry that the ensembles contain insufficient samples of possible plausible models, so the spreads may underestimate the true level of irreducible imprecision (cf., ref. 23). Realistically, we do not yet know how to make this assessment with confidence.” James McWilliams

Optimistically we may regard the uses these opportunistic ensembles are put to as the equivalent of the drunk looking for his car keys under the streetlamp. Equivalent to the mathematical fallacy of doing what can be done rather than what needs to be done. Pessimistically we may regard the entire enterprise as scientific fraud. Chaos provides – however – infinite opportunities for tuning.

Nor do models include major sources of internal variability. “Such decadal mismatches between model-simulated and observed climate trends are common throughout the twentieth century, and their causes are still poorly understood. Here we show that the discrepancies between the observed and simulated climate variability on decadal and longer timescale have a coherent structure suggestive of a pronounced Global Multidecadal Oscillation. Surface temperature anomalies associated with this variability originate in the North Atlantic and spread out to the Pacific and Southern oceans and Antarctica, with Arctic following suit in about 25–35 years. While climate models exhibit various levels of decadal climate variability and some regional similarities to observations, none of the model simulations considered match the observed signal in terms of its magnitude, spatial patterns and their sequential time development.” https://www.nature.com/articles/s41612-018-0044-6

And there is more than enough evidence to show large variations in TOA energy flux caused by changes in ocean and atmospheric circulation.

“In summary, although there is independent evidence for decadal changes in TOA radiative fluxes over the last two decades, the evidence is equivocal. Changes in the planetary and tropical TOA radiative fluxes are consistent with independent global ocean heat-storage data, and are expected to be dominated by changes in cloud radiative forcing. To the extent that they are real, they may simply reflect natural low-frequency variability of the climate system.” AR4 §3.4.4.1

Ben, Robert
If it were possible to represent the dynamic climate state at a certain time, or over a certain period (depending on your chosen dimensionality), as an image, then it may be possible to measure a fractal dimension of that image. This could, for instance, be done with a Kolmogorov–Richardson box-counting method – again with boxes of any chosen dimensionality.

If this were done it may be that a change in the underlying dynamic, or “texture” of the climate state, might reveal itself as a change in fractal dimension of this image representation of the system. Perhaps the need for consistency and coverage of data going back a long time might make this challenging in practice.

(Even a simple 1-dimensional temperature plot with time has fractal dimension.)
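A minimal sketch of the box-counting idea, assuming the climate-state “image” has been reduced to a set of 2-D points scaled into the unit square (the function and demo are illustrative, not from any published analysis):

```python
import numpy as np

def box_count_dimension(points, box_sizes):
    """Estimate the box-counting dimension of a 2-D point set.

    points: (N, 2) array of coordinates, assumed scaled into [0, 1).
    box_sizes: sequence of box edge lengths.
    Returns the least-squares slope of log N(eps) against log(1/eps).
    """
    counts = []
    for eps in box_sizes:
        # Assign each point to a grid cell of edge eps; count occupied cells.
        cells = set(map(tuple, np.floor(points / eps).astype(int)))
        counts.append(len(cells))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

# Hypothetical demo: a straight line should have dimension close to 1.
x = np.linspace(0.0, 0.999, 10_000)
line = np.column_stack([x, x])
dim = box_count_dimension(line, box_sizes=[1/4, 1/8, 1/16, 1/32, 1/64])
print(round(dim, 2))  # close to 1.0
```

For a rough, texture-like curve the estimated slope would land between 1 and 2, which is the sort of change in “fractal dimension” the comment speculates might signal a change in the underlying dynamics.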

Dr. Santer has posted some responses to my essay above, to which I hereby offer some brief replies.

1 & 2: My essay is in response to the new paper and the claims based on the analysis therein. If ANTHRO-only fingerprints would have yielded very similar results, that should have been demonstrated, even with a brief statement and graph in the Supplement. Likewise there is no discussion of the adequacy of the model-based internal variability estimates in the paper. That such a discussion appears in the Supplement to another paper isn’t much help for understanding the issue in the context of this paper.

3. Readers might be concerned about this, but it is not the topic of my post.

4. Noted — nonetheless the point remains that the covariances are not reported.

5. And they are small relative to model-based estimates of warming. This is off topic.

6. Your Supplement says that the noise estimates rely only on the last 200 years of each control run, which is the detrended portion. If it makes no difference to the results you should have said so. It doesn’t alleviate the problem that there likely should be a warming pattern in the natural-only pattern. Detrending definitely would remove it, though there’s no guarantee such a pattern would have been there by chance in the first place.

7. First sentence: exactly my point. Even if the effect is relatively small, it would produce a “nature-only” pattern similar to the fingerprint, weakening the detection result.

8. Again, what you did in other studies doesn’t change the point of my critique of this study. The model-variability comparator is a critical component of the method and the one used herein looks implausible.

9. What matters is the S/N statistic itself. No specification tests are reported so we have no way of knowing whether the coefficients graphed in Figure 1 are independent and normally distributed.

The new Santer et al. study merely shows that the satellite data have indeed detected warming (not saying how much) that the models can currently only explain with increasing CO2 (since they cannot yet reproduce natural climate variability on multi-decadal time scales).

The aptly named Ben Santer – think about it! – is up to his usual trick of making some spurious claim in order to protect the climate establishment.

He did it in 1995 by making last minute changes to the IPCC’s 1995 report when the IPCC was under threat from the Technical Services Bureau of the UNFCCC. It looks like he might be doing it now because the “establishment” is under threat from Trump’s “Presidential Committee on Climate Security”.

The new Ben Santer et al. paper uses the “gold standard” – a standard which is no longer used by anybody.

The gold standard is a monetary system where a country’s currency or paper money has a value directly linked to gold.

The gold standard is NOT currently used by any government. Britain stopped using the gold standard in 1931 and the U.S. followed suit in 1933 and abandoned the remnants of the system in 1971. The gold standard was completely replaced by fiat temperatures, a term to describe temperatures that are used because of the IPCC’s order, or fiat, that the temperature must be accepted as proof of global warming.

Santer is using the resurrected, save-the-world-from-burning-up gold standard. It’s more progressive. It does not require fingerprints to be unique. Anyway, from a distance one fingerprint looks just like another.

If the recorded rise in atmospheric CO2 is not responsive to the recorded changes in emissions (https://tambonthongchai.com/2018/12/19/co2responsiveness/), how can someone detect a fingerprint of the emissions in a temperature record that certainly has not followed the CO2 rise very well?

I would just point out that even on Real Climate the graphs of TLT vs. models show a lot of divergence. Gavin uses longer baselines that make it appear less divergent than it is with shorter baselining periods. It’s a 40-year record and I think very hard to blame on “internal variability in the models” – unless of course that “variability” is much too high, which I suspect may be the case.

The significance of the missing hot spot is missing positive feedback.

The distribution of water vapor affects the efficiency of the atmosphere as a radiator to space.

Imagine atmosphere A.), as is, with an average amount of water vapor spread out from top to bottom, so that much emission takes place at higher and colder levels.

Now imagine atmosphere B.), where all the water vapor is contained in the lowest 1 meter, and no water vapor exists above.

Atmosphere B is much more efficient than atmosphere A because the emissions to space take place from a lower level.

To the extent that a hot spot increases both warming and water vapor aloft, this would tend to make the atmosphere even less efficient at radiating to space (positive feedback). Call this atmosphere C.), the modeled atmosphere.

But the observations don’t just indicate uniform warming with height, they indicate decreased warming with height in the troposphere. Call this atmosphere D.)

To the extent that water vapor change follows temperature change, atmosphere D is more efficient than atmosphere C at emitting to space.
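The efficiency argument for atmospheres A–D can be put in rough numbers with the Stefan–Boltzmann law: along a fixed lapse rate, a higher effective emission level means a colder emission temperature and therefore less flux to space. A toy single-emission-level calculation (the heights and the 6.5 K/km lapse rate are illustrative; real atmospheres emit from a spectrum of levels):

```python
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
T_SURFACE = 288.0  # K, illustrative global-mean surface temperature
LAPSE_RATE = 6.5   # K/km, typical tropospheric lapse rate

def olr_from_emission_height(z_km):
    """Outgoing flux if the atmosphere radiated from a single level at z_km."""
    t_emit = T_SURFACE - LAPSE_RATE * z_km
    return SIGMA * t_emit ** 4

low = olr_from_emission_height(4.0)   # emission from a lower level (drier aloft)
high = olr_from_emission_height(6.0)  # emission from a higher level (moister aloft)
print(round(low, 1), round(high, 1))  # the higher emission level emits less
```

Raising the effective emission level from 4 km to 6 km in this toy setup cuts the emitted flux by tens of W m⁻², which is the sense in which atmosphere B is a “more efficient” radiator than atmospheres A or C.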

As far as a reason for the failure of the models, one need look no further than the double ITCZ problem, which actually got worse with CMIP5. If the models are erroneously creating twice as much tropical convection as is observed, it would stand to reason that they would also create twice as much warming aloft as is observed.

Re: dpy6629 “I would just point out that even on Real Climate the graphs of TLT vs. models show a lot of divergence.”

You’ve been called on this multiple times, so it’s quite ironic that you’re still going on about this. As I’ve told you and your compatriots before, even the RSS satellite-based team admit much of the difference is due to flaws in the observational (satellite-based) analyses:

The hot spot is a sign of the negative lapse rate feedback. That is a negative feedback, not a positive one, and is not the same as the positive feedback from water vapor. And the hot spot exists. These points have been explained to you many times. For instance:

My statement is correct despite Sanakan’s obfuscation on the subject. Look on Real Climate’s models compared to data permanent page. The last 2 graphics show the TLT data and model results. One could also note that generally radiosonde data agree pretty well with the UAH satellite data. It is a stretch to blame the mismatch on “errors in the data.”

Re: “My statement is correct despite Sanakan’s obfuscation on the subject. Look on Real Climate’s models compared to data permanent page. The last 2 graphics show the TLT data and model results. One could also note that generally radiosonde data agree pretty well with the UAH satellite data. It is a stretch to blame the mismatch on “errors in the data.””

You can dodge it all you want by calling it “obfuscation”, but the point still stands: your own source (the RSS team) debunks you on this. The rest of us who actually read scientific sources, don’t need to fall for what you’re saying, just because you’re ignoring the evidence.

conference abstract:“Understanding and reconciling differences in surface and satellite-based lower troposphere temperatures
[…]
We find large systemic differences between surface and lower troposphere warming in MSU/AMSU records compared to radiosondes, reanalysis products, and climate models that suggest possible residual inhomogeneities in satellite records. We further show that no reasonable subset of surface temperature records exhibits as little warming over the last two decades as satellite observations, suggesting that inhomogeneities in the surface record are very likely not responsible for the divergence.” http://adsabs.harvard.edu/abs/2017AGUFMGC54C..05H

And homogenized Chinese radiosonde analyses from another research group, confirming the same point on a regional level:
figure 10 of: “An analysis of discontinuity in Chinese radiosonde temperatures using satellite observation as a reference”

[This is the part where you’ll make some excuse for avoiding the cited scientific evidence, likely with a reference to how there are too many words. That’s on par with not accepting scientific evidence, because you think scientific papers, textbooks, etc. are too long to read, but long contrarian blog articles are fine.]

Robert, Pretty much sums it up. This radiosonde dataset seems to lie in the middle of the satellite datasets. Proof-text quoting from the literature is biased because you select a few sentences from a vast body of literature, but it’s all Sanakan really has.

Re: “Robert, Pretty much sums it up. This radiosonde dataset seems to lie in the middle of the satellite datasets. Proof text quoting from the literature is biased because you select a few sentences from a vast body of literature, but its all Sanakan really has.”

You have no clue whether I’m cherry-picking from the scientific literature, because you don’t actually read the scientific literature. And to say that radiosondes lie in the middle of the satellite analyses is ridiculous, when you were shown that the satellite analyses show less warming than the radiosondes over the past two decades in the more updated “State of the Climate in 2017” image I cited than in the more outdated analysis Robert cited. That’s especially the case for the UAH analysis (version 6), which is a massive outlier. But again, I’ve dealt with you long enough to know you’re beyond rational persuasion by scientific evidence, as I’ve told you before. My posts are for the benefit of other people, to see the evidence you’re willfully evading.

I linked an image from an October 2017 publication – showing explicitly the MSU/AMSU comparison to a radiosonde reanalysis product. This is inconsistent only with Atomski’s motivated eyeballing. His quote from a conference abstract I ignored as mere motivated Googling.

I also linked an RSS comparison of tropospheric temperature with the 5 to 95% CMIP5 output.

Does he imagine that models can provide deterministic rather than probabilistic estimates – or that radiosonde data is more precise than MSU/AMSU data?

He constantly claims to read the literature – always on his severely limited talking points – never seems to get it quite right – and simply denies anything he can’t misinterpret on the basis of his scientifically unschooled yet conceited opinion. A complete waste of time.

Re: “I linked an image from an October 2017 publication – showing explicitly the MSU/AMSU comparison to a radiosonde reanalysis product.”

And the image I posted (comparing satellite-based estimates, radiosonde analyses, and re-analyses) was from August 2018. Moreover, the Chinese radiosonde image I posted was from 2018.

Do you understand that 2018 comes after October 2017, Robert? Or has your grasp of time descended to the level of your grasp on climate science?

Fair warning: if you’re going to be snarky with me, you best come prepared. Don’t bring your usual weak sauce.
;)

“His quote from a conference abstract I ignored as mere motivated Googling.”

Oh, that’s just sad. You act as if a conference abstract from RSS’ Carl Mears is irrelevant, even as you quote from stuff Mears co-authored in papers and on RSS’ website. And to put the cherry on top, you think that assuming an article was found using Google has any bearing on its veracity.

And although he may quote from a disembodied abstract – the details are important and not merely a string of words Atomski attaches his meanings to.

The image, as I said, explicitly compares MSU/AMSU data – including the latest versions – with a radiosonde reanalysis product. Did this change between 2017 and 2018? Nothing is inconsistent – other than Atomski’s motivated eyeballing interpretation. He is, in other words, a disingenuous pest.

Re: “All the analyses indicate an increase of lapse rate ( that is, decreasing rates of warming with height ).”

Nope. You claim that based on your unsourced, non-peer-reviewed analysis. And you don’t even get that part right, since your radiosonde image (top right, first row) shows increasing warming with increasing height in the tropics.

Anyway, the published evidence makes your case even worse. For example, the following analyses, co-authored by members of the RSS and UW satellite-based research teams, show the decreasing lapse rate in the RSS analysis with respect to TTT (stratospheric-cooling-corrected TMT) relative to TLT:

It makes no sense for you to claim NOAA/STAR shows an increasing lapse rate, since they don’t even have a lower tropospheric analysis. Comparing it to a near-surface analysis would show a lapse-rate reduction, as has already been shown in the literature multiple times. For instance:

Re: “It is a struggle to read let alone grasp the meaning of his disjointed missives.”

Yes, I understand that you’re struggling to understand the scientific topic under discussion. If you bothered to check the link you gave to Sherwood’s website, you’d realize the image you posted came from figure 7 of a 2003 paper entitled “Temporal homogenization of monthly radiosonde temperature data. Part II: Trends, sensitivities, and MSU comparison”.

To give you a refresher on time, Robert: 2003 was 16 years ago. Sherwood has published research since then, including research showing a reduction in the tropical tropospheric lapse rate (indicative of a hot spot). I suggest reading that research, instead of just citing internal variability you think you see in an image from 16 years ago. For example, read the following 2015 paper co-authored by Sherwood:

We’ve been over this multiple times, Matthew. The fact that you keep repeating these questions will mislead people into thinking you haven’t been answered, when you actually have been. Once again:

This is primarily not an issue with model accuracy. It’s instead primarily an issue with the inputted forcings, along with internal variability and remaining heterogeneities in the observational analyses. That’s been covered in papers such as:

The RSS team acknowledges this point as well, in the very link dpy6629 cited elsewhere:

“Why does this discrepancy exist and what does it mean? One possible explanation is an error in the fundamental physics used by the climate models. In addition to this possibility, there are at least three other plausible explanations for the warming rate differences. There are errors in the forcings used as input to the model simulations (these include forcings due to anthropogenic gases and aerosols, volcanic aerosols, solar input, and changes in ozone), errors in the satellite observations (partially addressed by the use of the uncertainty ensemble), and sequences of internal climate variability in the simulations that are different from what occurred in the real world. We call these four explanations “model physics errors”, “model input errors”, “observational errors”, and “different variability sequences”. They are not mutually exclusive. In fact, there is hard scientific evidence that all four of these factors contribute to the discrepancy, and that most of it can be explained without resorting to model physics errors.” http://archive.is/eBPga#selection-553.0-553.1048

And it’s consistent with published research on the role of errors in inputted forcings on projections of near-surface / surface trends. That makes sense, since tropospheric warming (particularly in the tropics) reflects an amplified response to near-surface warming. See, for instance:

Re: “Is there any graph showing the consistency of model results and data relevant to the “tropical hot spot”?”

You were just shown one, and you’ve been cited research on this multiple times. The hot spot is about amplification of warming with increasing tropical tropospheric height; or to put it another way: the hot spot is about a reduction in the magnitude of the tropical tropospheric lapse rate. The figure I posted shows that the models get that amplification largely correct, at least for the RSS analysis that isn’t as much of a rebutted outlier as is the UAH analysis. Here is the figure again:

So if past is prologue, you’ll now evade the evidence I cited to you, likely either by repeating your questions as if they weren’t answered, or moving the goal-posts to another topic without acknowledging your questions were answered.

If, and when, that happens, Matthew, I’m going to give you the same treatment I give to other evasive contrarians, and which I’ve given you when you’ve evaded evidence in the past: I’ll ask you questions that address the evidence-based points I’ve made to you. And you’ll get no other response from me on this issue until you answer the questions. Here are the questions I’ll ask you (you can save yourself some time by just answering them now):

1) Were you cited evidence of the hot spot existing [meaning: amplification of warming with increasing height in the tropics, reducing the magnitude of the tropical tropospheric lapse rate]?
2) Were you cited evidence that models get the hot spot largely right, insofar as they accurately represent the amplification of tropical tropospheric warming with increasing height?

I forgot to mention that ERA-Interim (a.k.a. ERA-I, a re-analysis from ECMWF) is known to under-estimate mid-to-lower tropospheric warming, as acknowledged by the ERA-I team and other researchers:

“A reassessment of temperature variations and trends from global reanalyses and monthly surface climatological datasets
[…]
ERA‐Interim exhibits a weaker overall warming trend over the period at 700 and 500 hPa; reasons why it is thought to underestimate trends in the lower and middle troposphere are discussed by Simmons et al. (2014).”

Despite this, ERA-I still shows the hot spot, with amplified tropical upper tropospheric warming relative to tropical near-surface warming. That’s shown in the papers I cited before, such as Simmons et al. (2014), “Common warming pattern emerges irrespective of forcing location”, and “Detection and attribution of upper-tropospheric warming over the tropical western Pacific”. See, for instance, figure 1c of that last paper:

In any event, the under-estimated warming issue was addressed in ERA5, the update to ERA-I. ERA5 is available at the link below, and it still shows the hot spot, with amplified tropical upper tropospheric warming relative to tropical near-surface warming:

I gave the figure that was given by Steve Sherwood and quoted the caption – which gives the source of the graph. Sherwood on the page discusses the updating of the data to 2015.

The point was of course variability over time. The point holds and the data is valid despite Atomski’s objection again on this silliest rationale imaginable. But then he is incapable of playing the scientific ball and not the man.

Wow – such an idiosyncratic pejorative stance in the service of an obsessional single mindedness I find myself at a loss to describe. That doesn’t happen often.

So let’s stick to concise facts. The models are right – they just give the wrong answer? Models are right – they just miss the geophysics of internal variability? Models are right – the inputs are wrong?

Of course models may have far more fundamental problems.

“Sensitive dependence and structural instability are humbling twin properties for chaotic dynamical systems, indicating limits about which kinds of questions are theoretically answerable. They echo other famous limitations on scientist’s expectations, namely the undecidability of some propositions within axiomatic mathematical systems (Gödel’s theorem) and the uncomputability of some algorithms due to excessive size of the calculation.” https://www.pnas.org/content/104/21/8709

Although this may be wrong given that it is from the last decade.

The ‘hot spot’ is a fingerprint of surface warming from whatever cause. The tell is that the lapse rate varies with time. The hot spot talking point evades the question of causation in the system. Something I suppose they assume is settled. It’s almost all anthropogenic?

Any who are still reading my comments know that I am a big fan of clouds.

“Marine stratocumulus cloud decks forming over dark, subtropical oceans are regarded as the reflectors of the atmosphere.1 The decks of low clouds 1000s of km in scale reflect back to space a significant portion of the direct solar radiation and therefore dramatically increase the local albedo of areas otherwise characterized by dark oceans below.2,3 This cloud system has been shown to have two stable states: open and closed cells. Closed cell cloud systems have high cloud fraction and are usually shallower, while open cells have low cloud fraction and form thicker clouds mostly over the convective cell walls and therefore have a smaller domain average albedo.4–6 Closed cells tend to be associated with the eastern part of the subtropical oceans, forming over cold water (upwelling areas) and within a low, stable atmospheric marine boundary layer (MBL), while open cells tend to form over warmer water with a deeper MBL.” https://aip.scitation.org/doi/10.1063/1.4973593

“Nope. You claim that based on your unsourced, non-peer-reviewed analysis.”

Point rejected.
The data is peer-reviewed and even if they were not already common knowledge, I’ve pointed you to the locations.
If you were really curious and not just argumentative, you might have done something similar for yourself. You haven’t.

“And you don’t even get that part right, since your radiosonde image (top right, first row) shows increasing warming with increasing height in the tropics.”

Point rejected.
The minor maximum, which shows up only in the 75%-complete RATPAC, is still of a warming rate lower than those of the lowest levels. There is decreasing warming with height with respect to the lowest layers.

“RSS and UW satellite-based research teams, show the decreasing lapse rate in the RSS analysis with respect to TTT”

Point accepted.
The contour plots belie the fact that there are only two MSU levels, so this is not a good means for comparison. I’ll change back to the notional rectangular depiction of the MSU layers. The RSS plot indicates similar rates of warming between the lower and middle troposphere.

“It makes no sense for you to claim NOAA/STAR shows an increasing lapse rate, since they don’t even have a lower tropospheric analysis.”

Point rejected.
STAR includes TUT (Upper Troposphere) and TMT, so there are levels to compare.

And there was a relevant paper published on this, this year, for the ERA-I re-analysis:

Point rejected.
Unfortunately, the reanalyses are not immune from bias.
In fact, they include and compound all the biases of the included data sets.
The vast majority of RAOB observations in the data bases are tiny fragments ( a few months here and there ) of the period of record and are mostly missing. It will probably never be known how accurate the spotty readings are, but one theory, to which I subscribe, is that the spottier the data, the more error prone it was ( the resources to make the measurements coincide with the resources to make the measurements well ). There is a HUGE variance among the re-analyses, particularly wrt the hot spot area.

Thanks for your ideas.
I look forward to your examination of the data.

“We use the temperature data collected at the tropical (20S to 20N) 200–300 hPa as found in three data products (RAOBCORE, RICH, and RATPAC), taking annual averages as this is the finest time resolution available from RATPAC for the pressure‐level quantity we require. Our test metric is the simple average of the 200‐, 250‐, and 300‐hPa temperature value provided at those levels in both the radiosonde data sets and models.”

But as James McWilliams said – opportunistic ensembles have more fundamental theoretical limitations than simply being wrong.

Atomsk’s Sanakan: We’ve been over this multiple times, Matthew. The fact that you keep repeating these questions will mislead people into thinking you haven’t been answered, when you actually have been

You always change the subject.

The facts are that the graphs displayed model results higher than the data, and if you had a graph displaying the fit of model to data you’d show it. You bring into the discussion hypothesized fixes that might align a future model to future data, but the accuracy of such fixes has not been demonstrated, only asserted.

1) Were you cited evidence of the hot spot existing [meaning: amplification of warming with increasing height in the tropics, reducing the magnitude of the tropical tropospheric lapse rate]?
2) Were you cited evidence that models get the hot spot largely right, insofar as they accurately represent the amplification of tropical tropospheric warming with increasing height?

To Turbulent Eddie:
Re: “The data is peer-reviewed”

But your analysis isn’t. And I don’t trust your ability to do a competent analysis for reasons such as you posting a fabricated image that it took me less than 5 minutes to figure out was fabricated (you basically got the image off David Evans’ contrarian blog):

Analogously, just because NASA has peer-reviewed data on Earth’s shape, that doesn’t mean I need to trust a flat Earther’s analysis of that data, especially if the flat Earther has posted fabrications before. Anyway, I’ve already read peer-reviewed analyses of the data, made by competent people who don’t post fabrications they found on a contrarian blog.

Re: “The minor maxima, which shows up only in the 75% complete RATPAC, is still of a warming rate lower than those of the lowest levels. There is decreasing warming with height with respect to the lowest layers.”

Again, I really don’t trust your images, nor your ability to interpret data, especially given your track-record. Actual published analyses show amplification of warming with height in the tropics for RATPAC. First an older analysis on this, and then a more recent one:

figure 4 of “Radiosonde Atmospheric Temperature Products for Assessing Climate (RATPAC): A new data set of large-area anomaly time series”

Re: “STAR includes TUT ( Upper Troposphere ) and TMT, so there are levels to compare.”

Not really, which is why peer-reviewed analyses of NOAA/STAR don’t do such an analysis. What you mean to say is TTT, not TUT, since TTT is the mid-to-upper tropospheric analysis. And TTT is just TMT with stratospheric cooling subtracted out, since such cooling contaminates TMT. So by definition, TTT would be larger than TMT, given cooling of the stratosphere. That’s why it’s pointless to do a comparison of them. If your analysis shows otherwise, then you’ve screwed up.
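As background on that correction: RSS computes TTT as a linear combination of the mid-troposphere and lower-stratosphere channels, approximately TTT = 1.1·TMT − 0.1·TLS, so any stratospheric cooling raises TTT relative to TMT. A sketch with illustrative (not measured) trend numbers:

```python
# Illustrative trends in deg C/decade; the combination follows the RSS-style
# weighting TTT = 1.1*TMT - 0.1*TLS (the numbers are invented for clarity).
tmt_trend = 0.13   # mid-troposphere, contaminated by stratospheric cooling
tls_trend = -0.30  # lower stratosphere, cooling

ttt_trend = 1.1 * tmt_trend - 0.1 * tls_trend
print(round(ttt_trend, 3))  # 0.173, larger than TMT whenever TLS cools
```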

Please go read about TTT and the stratospheric cooling correction for TMT:

Re: “There is a HUGE variance among the re-analyses, particularly wrt the hot spot area.”

Every updated re-analysis I know of that looks at tropical tropospheric warming (NCEP, ERA-I, ERA5, MERRA2, CFSR, JRA), shows greater tropical upper tropospheric warming than near-surface warming. Thus they all show the hot spot. The data sources on this have been cited to you a number of times, including in:

And if you really think that “HUGE variance” is an issue, then you should rely on the surface / near-surface analyses more than the satellite-based tropospheric analyses you habitually cite. But, of course you don’t do that, right? Even Carl Mears of the satellite-based RSS team admits that:

“Climate science special report: A sustained assessment activity of the U.S. Global Change Research Program
[…]
Each of these methods has strengths and weaknesses, but none has sufficient accuracy to construct an unassailable long-term record of atmospheric temperature change. The resulting datasets show a greater spread in decadal-scale trends than do the surface temperature datasets for the same period, suggesting that they may be less reliable [Appendix A on pages 432 – 433].”

Re: “Thanks for your ideas.
I look forward to your examination of the data.”

Already did, as illustrated in the link for re-analyses I gave you above, using a web tool from a peer-reviewed paper. One of your problems is that you think the fabrications you post are more plausible than peer-reviewed, reputable analyses.

So put up a graph displaying a close fit of model to data. Everything that you have displayed to date shows misfits, sometimes with post hoc explanations of how the models might have fit better had something different been done with them. The disparities have been pointed out to you.

It’s logical to presume the Left is not saying America was responsible for the global temperature during the Minoan and Roman warming periods nor during the Medieval warm period. But, using the techniques above, human CO2 is responsible and America is more responsible than others because its CO2 footprint is larger. Sounds like ‘socialist’ thinking applied to science.

I agree with that. You don’t need 5 sigma to be satisfied of a conclusion, and you can’t observe it anyway. You don’t have data on how often a 5 sigma excursion might be observed. You can only reach such a conclusion on the basis of a fitted theoretical distribution, fitted using observations more in the central part of the range. That is empirical, and you don’t have empirical evidence that the supposed distribution works at that extreme tail.
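For concreteness, the sigma-to-probability figures in Santer et al. (1 in 741 at 3 sigma, 1 in 3.5 million at 5 sigma) are just the one-sided tail areas of an assumed Gaussian, which is precisely the fitted-distribution assumption being questioned above. A scipy sketch:

```python
from scipy.stats import norm

# One-sided upper-tail probability of a standard normal at k sigma.
# These figures hold only if the fitted Gaussian describes the far tail.
for k in (3, 5):
    p = norm.sf(k)
    print(f"{k} sigma: p = {p:.3g} (about 1 in {1/p:,.0f})")
```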

It is futile to ask for 5 sigma, and futile to claim it. Valid claims work at much lower levels.

For once I’m in complete agreement, possibly for different reasons. A claim of a 5 sigma detection is nonsensical in this case, and even if it were applicable, what have they in fact detected? It’s warmed… well yes… it matches my models… well, hmm, yes… so largely modelled output, from the set of models that match what I thought, once we’ve taken away what the models think of as natural… well, I’d be really surprised if it didn’t!! But what have either you, I or indeed they learnt from their efforts?

When a scientist, or whoever, knows – i.e. not assumes, but really KNOWS – what our worldly climate would have been without the presence of men, he reaches a new, superior level of understanding, a level that was previously called “God”.

“Observed temperature records are simultaneously inﬂuenced by both internal variability and multiple external forcings. We do not observe “pure” internal variability, so there will always be some irreducible uncertainty in partitioning observed temperature records into internally generated and externally forced components.”

Which translates simply to, ‘we left out the AMO to do a fit up job on greenhouse gases’. Here is my gold standard, the AMO warmed in response to declining solar wind strength via negative NAO/AO, and must be strongly overwhelming the rise in greenhouse gas forcing.

Well – thank God for that. Ulric – right click – open image in new tab – copy and paste.

This is the By component of the Interplanetary magnetic field (top) and the related global electric field. My question is why By is consistently positive when the solar magnetic field reverses in the pseudo 11 year cycle?

Ulric, what is the significance of the “flow pressure” ? It seems like there was a spike just at the time of Mt P. and slow return over the next decade. Any effects from that will surely get mis-attributed to volcanic forcing.

The IPCC’s Global Warming Models predict higher Tropospheric Tropical Temperature (T3) as an anthropogenic signature due both to higher CO2 and amplified higher water vapor. This and similar hypotheses were tested by McKitrick and Christy (2018), and by Varotsos and Efstathiou (2019) with multiple lines of evidence – BUT they showed NO significance. The scientific burden of proof is on those making the majority anthropogenic climate change hypothesis.
See:
McKitrick, R. and Christy, J., 2018. A Test of the Tropical 200‐to 300‐hPa Warming Rate in Climate Models. Earth and Space Science, 5(9), pp.529-536. https://bit.ly/2wFXtRN
Varotsos, C.A. and Efstathiou, M.N., 2019. Has global warming already arrived? Journal of Atmospheric and Solar-Terrestrial Physics, 182, pp.31-38. https://bit.ly/2CQBAT8

“This, they point out, is the “gold standard” of proof in particle physics …”

They need to contact the BoM, who are struggling with their “physics” and consequential cyclone prediction …

“They have a few theories, Dr Watkins said, and are presently crunching the data to get to the bottom of it.
“There are several theories and at the moment the data is pouring in from satellites and everywhere,” he said.
But he said “basic physics” governed that [global warming] would increase the intensity of cyclones in the future.

It does not, however, explain this season’s anomaly.

“Being perfectly honest, [CO2 global warming] is a factor in most of our climate science these days but in terms of tropical cyclones you couldn’t put this season down to [CO2 global warming],” he said.

In an age wherein rigorous signal and system analysis methods have advanced very far, it’s utterly amazing that Santer et al. posit a starkly simplistic, stationary-process model of climate system behavior in their claim of “gold standard” proof of anthropogenic warming.

A well-refereed scientific journal should demand, at the very least, a cross-spectral demonstration of the relationship between putative “forcings” and the temperature record. Instead, we get a gross misapplication of freshman statistics concepts in a paper that should never have seen the light of day in any reputable scientific journal.

Ben Santer was searching for a human footprint back in 2011. Apparently, he is still searching.

Most recent global climate models are consistent in depicting a tropical lower troposphere that warms at a rate much faster than that of the surface. Thus, the models would predict that the trend for warming of the troposphere temperature (TT) would be at a higher rate than the surface.

Douglass and Christy (2009) presented the latest tropospheric temperature measurements (at that time) that did not show this warming. (Since then, this lack of warming has continued for another ten years without much change, but that is getting ahead of ourselves.)

Hence, in keeping with recent practice over the past few years in which alarmists promptly publish rebuttals to any papers that slip through their control of which manuscripts get accepted by climate journals, it was necessary for the alarmists to publish such a rebuttal.

Ben Santer took on this responsibility and the result was Santer et al. (2011). It is interesting, perhaps, that Santer included 16 co-authors in addition to himself; yet the nature of the work is such that it is difficult to imagine how 16 individuals could each contribute significant portions to the work. In other words, were many names added to give the paper political endorsement? In fact, when I redid all their work, it took me about one day!

Santer et al. (2011) were concerned with a very basic problem in climatology: how to distinguish between long-term climate change and short-term variable weather in regard to TT measurements. They treated the problem in terms of signal and noise: the signal is assumed to be a long-term linear trend of rising temperatures due to increasing greenhouse gas concentrations, that is obscured by short-term noise. However, the climate-weather problem is innately different from a classical signal/noise problem such as a radio signal affected by atmospheric activity. In that case, if the radio signal has a sufficiently narrow frequency band, and the noise has a wider frequency spectrum, the signal-to-noise ratio (S/N) can be improved with a narrow-band receiver tuned to the frequency of the radio signal. The radio signal and the noise are separate and distinct. By contrast, in the climate-weather problem, the instantaneous weather is the noise, and the signal is the long-term trend of the noise. The noise and signal are coupled in a unique way. Furthermore, there is no evidence that it is even meaningful to talk about a “trend” since there is no evidence that the variation of TT with time is linear.

Santer et al. (2011) were primarily concerned with estimating how many years of data are necessary to provide a good estimate of the putative underlying linear trend. They were also intent on showing that short periods with no apparent trend do not violate the possibility that over a longer term, the trend is always there. They derived signal-to-noise (S/N) ratios for both the temperature data and the model average by means that are not exactly clear to this writer.

As Santer et al. (2011) showed, one can pick any starting date and any duration length and fit a straight line to that portion of the curve of TT vs. time. They did this for various 10-year and 20-year durations. In each case, depending on the start date, they derived a best straight-line fit to the TT data for that time period. They found that the range of trends for 10-year periods was greater (-0.05 to +0.44°C/decade) than the range for 20-year periods (+0.15 to +0.25°C/decade).
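That window-trend exercise is easy to reproduce on synthetic data: fit a least-squares line to every window of a given length and compare the spread of slopes across start dates. A sketch (the series below is invented, standing in for the actual TT record):

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1979, 2011)
# Synthetic TT series: 0.2 deg C/decade trend plus noise (illustrative only).
tt = 0.02 * (years - years[0]) + rng.normal(0.0, 0.15, size=years.size)

def window_trends(years, tt, length):
    """Least-squares trend (deg C/decade) for every possible start date."""
    trends = []
    for i in range(years.size - length + 1):
        slope = np.polyfit(years[i:i+length], tt[i:i+length], 1)[0]
        trends.append(10.0 * slope)  # per year -> per decade
    return np.array(trends)

spreads = {}
for length in (10, 20):
    tr = window_trends(years, tt, length)
    spreads[length] = tr.max() - tr.min()
    print(f"{length}-yr windows: {tr.min():+.2f} to {tr.max():+.2f} C/decade")
```

As in Santer et al., the shorter windows show a much wider range of fitted trends than the longer ones.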

The trend line was steepest for a start date around 1988 (ending in the giant El Niño year of 1998). Prior to 1988 and after 1998, the trends were minimal.

Santer et al. described use of longer durations as “noise reduction”, which it is, provided that one assumes the overall signal is linear in time. It was still problematic that the trend was nil after 1998, which they rationalized by saying:

“The relatively small values of overlapping 10-year TT trends during the period 1998 to 2010 are partly due to the fact that this period is bracketed (by chance) by a large El Niño (warm) event in 1997/98, and by several smaller La Niña (cool) events at the end of the … record”.

However, as Pielke pointed out, the period after 1998 was 13 years, not 10, and furthermore, the period after 1998 had roughly equal periods of El Niño and La Niña and was not dominated by La Niñas as Santer et al. claimed. What Santer et al. (2011) implied was that an unusual conflux of a large El Niño early on and multiple La Niñas later on caused the trend to minimize for that unique period as a statistical quirk. However, that is like a baseball pitcher saying that if the opponents hadn’t hit that home run, he would have won the game.

In simplistic terms, the signal-to-noise ratio can be estimated as follows. For either 10-year or 20-year durations, the signal was the mean trend derived by a straight-line fit to the TT data over that duration. The noise was the range of trends for different starting dates. For ten-year durations, the trend was 0.19 ± 0.25°C/decade. For twenty-year durations, the trend was 0.20 ± 0.05°C/decade. The signal in each case is taken as the mean trend. The distribution of trends within these ranges was similar to a normal distribution. Thus, we can roughly estimate the noise as ~ 0.7 times the full width of the range. Hence, the S/N ratio for ten-year durations can be crudely estimated to be S/N ~ 0.19/(0.7 × 0.5) ≈ 0.5 and for twenty-year durations is S/N ~ 0.2/(0.7 × 0.1) ≈ 2.9. Santer et al. obtained S/N = 1 for ten-year durations and S/N = 2.9 for twenty-year durations. If it can be assumed that the signal varies linearly with time, one can then estimate what level of precision for the estimated trend can be obtained for any chosen duration. Santer et al. obviously believe that the signal is linear with time for all time. By some logic that escapes me, Santer et al. concluded that
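The back-of-envelope arithmetic can be checked directly (all numbers copied from the text, in °C/decade):

```python
# signal = mean trend; noise ~ 0.7 x full width of the range of window trends.
cases = {
    "10-year windows": (0.19, 0.5),  # (mean trend, full range width)
    "20-year windows": (0.20, 0.1),
}
results = {}
for label, (signal, width) in cases.items():
    results[label] = signal / (0.7 * width)
    print(f"{label}: S/N ~ {results[label]:.1f}")
```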

“Our results show that temperature records of at least 17 years in length are required for identifying human effects on global-mean tropospheric temperature”.

This conclusion seems to be grossly exaggerated. A more proper statement might be as follows:

Assuming that the variability of TT is characterized by a long-term upward linear trend caused by human impact on the climate, and that variability about this trend is due to yearly variability of weather, El Niños and La Niñas, and other climatological fluctuations, the recent data suggest that the trend can be estimated for any 17-year period with a S/N ratio of roughly 2.5.

Finally, we get to the nub of the paper by Santer et al. that asserted:

“Claims that minimal warming over a single decade undermine findings of a slowly-evolving externally-forced warming signal are simply incorrect”.

Here is where Santer et al. attempted to dispel the notion that minimal warming for a period contradicts the belief that underneath it all, the long-term signal continues to rise at a constant rate. Pielke Sr. argued that this was an overstatement and he concluded:

“If one accepts this statement by Santer et al. as correct, then what should have been written is that the observed lack of warming over a 10-year time period is still too short to definitely conclude that the models are failing to skillfully predict this aspect of the climate system”

However, I would go further than Pielke Sr. First of all, the period of minimal temperature rise was longer than 10 years. Second, there is no cliff at 17 years whereby trends derived from shorter periods are statistically invalid and trends derived from longer periods are valid. According to Santer et al., a trend derived from a 13-year period is associated with a S/N ~ 1.5, which, though not ideal, is good enough to cast some doubt on the validity of the models.

The continued almost religious belief by alarmists that the temperature always rises linearly and continuously is evidently refuted. If the alarmists would only reduce their hyperbole and argue that rising greenhouse gas concentrations produce a warming force that is one of several factors controlling the Earth’s climate, and there are periods during which the other factors overwhelm the greenhouse forces, perhaps we would have a rational description. Instead, the alarmists continue to find linear trends over various time periods, in some cases when they are not there.

[I]n the climate-weather problem, the instantaneous weather is the noise, and the signal is the long-term trend of the noise. The noise and signal are coupled in a unique way.

Alas, such analytically nonsensical notions are rampant on both sides of the climate debate. While the “climate signal” may be construed to be the very-low-frequency components of the physically measurable weather variables, signal and noise are mutually independent by definition. Nor can such a frequency-dependent distinction be made effectively by fitting linear trends of arbitrary length to the data.

It can be shown that the slope of such trends is a very crude band-pass filter; what is required is a low-pass filter with a physically meaningful cut-off frequency. Neither Santer et al. nor their critics show any analytic aptitude in recognizing such cut-off, let alone in distinguishing meaningfully between signal and noise.
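To illustrate the point on synthetic data: a low-pass filter with an explicit cut-off separates a slow component from high-frequency noise far more cleanly than the raw series does. A sketch using a Butterworth filter with an assumed 10-year cut-off (the physically meaningful choice of cut-off is exactly what is at issue; 10 years here is only an assumption):

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(2)
n = 480  # 40 years of monthly data
t = np.arange(n) / 12.0
# Synthetic record: slow "climate" component plus white "weather" noise.
slow = 0.2 * t / 10.0 + 0.1 * np.sin(2 * np.pi * t / 60.0)
series = slow + rng.normal(0.0, 0.3, size=n)

# Low-pass filter with an explicit cut-off frequency of 1/(10 yr).
cutoff_per_month = 1.0 / (10 * 12)
b, a = butter(4, cutoff_per_month / 0.5)  # normalized to the Nyquist frequency
smoothed = filtfilt(b, a, series)

# The residual against the true slow component shrinks once the
# high-frequency part is filtered out.
print(np.var(series - slow) > np.var(smoothed - slow))  # True
```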

“The global coupled atmosphere-ocean-land-cryosphere system exhibits a wide range of physical and dynamical phenomena with associated physical, biological, and chemical feedbacks that collectively result in a continuum of temporal and spatial variability. The traditional boundaries between weather and climate are, therefore, somewhat artificial.” https://journals.ametsoc.org/doi/pdf/10.1175/2009BAMS2752.1

Call it signal to noise and imagine you have a handle on Earth system geophysics?

There may not be a human fingerprint on tropospheric temperatures since 1978, but there very certainly is an El Nino fingerprint. Occurrence of El Ninos dominated over La Ninas from 1978 to 1998, a period when there was more global warming than any other period in the past 150 years. After the great El Nino of 1997-8, global temperatures have meandered in consonance with the Nino 3.4 Index, rising to a new height in the great El Nino of 2015-6, only to fall back after that to about the “pause”.

Well, I just felt the need to point out the obvious:
Despite his “This is the only response I will make on Dr. Curry’s website.”, Santer felt the need to leave a comment on this blog. This is a long way from Mann’s insults towards Judith over at RC!
Personally, I see that as a great victory for a calm and patient scientist running her blog against evil men pushing an agenda for personal power. Way to go, Judith!

“If you think the model is correctly-specified and the data set is appropriate you will have reason to accept the result, at least provisionally. If you have reason to doubt the correctness of the specification then you are not obliged to accept the result.”
“But there may be other, more valid specifications that yield different results.”

That’s the issue of investigating complex systems with only vaguely estimated parameters and data. You can build a whole lot of different “plausible” models and model variants and models containing models that even specialist statisticians have trouble understanding (which leads to the entertaining question of which specialist is the true specialist), and these might yield different results. Then pick the model that gave your favoured result.
Welcome to the era of speculative science (again).

It might be useful to add the absolute uncertainties (if they were calculable or known) to the model uncertainties (multiply if independent): in this case, the probability that the model is representative of the “real” climate/weather and of the alternative climate/weather without humans. Here that’s actually the only number that matters (the other being almost “1” (that “we done it”)).
Otherwise the whole 5-sigma stuff is worthless.

Studies of air trapped inside Greenland ice cores repeatedly show century-long warmer intervals going back 4000 years, compared to a 2001-10 baseline, and winters have been getting colder since 2010. Nothing we see today falls outside natural variation. The global warming hoax is a matter of ‘human nature,’ not science and accordingly, it can be argued that a very vocal majority of Western academics who believe humanity’s CO2 has actually caused global warming over the last half of the 20th century actually prefer superstition and imagination (false assumptions and unverifiable hypotheses) to scientific reasoning and scrutiny (logic and observational evidence).

Lacking any actual science-based reasoning the True Believers of global warming have been forced to argue against a holistic view of the world. Being anti-holistic, however, doesn’t come across well to the scientifically illiterate hippies that flesh out the membership of the climatism movement. To buy into the AGW model-makers’ hooey about the Earth having a fever you also must believe that nature is totally flummoxed about what to do with man’s CO2. The model-makers of climatism created their own ‘reality’ which is a metaworld where man’s CO2 produces enhanced greenhouse effects that raise Earth’s average temperature with disastrous consequences for all living things.

Thanks Ross for a nice summary. I was always taught to first plot out the data and see what it looks like. I downloaded your data and did histogram plots of the temp, anthro, and natural. The temp and natural are pretty much normally distributed. However, the anthro is not even close to a normal distribution. It is greatly skewed to negative and then rather flat. As you note, a t-statistic is certainly not justified for that data.
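The kind of check described above can be sketched as follows; the two series here are synthetic stand-ins (a gamma draw mimics a negatively skewed distribution), since the actual downloaded data are not reproduced in this comment:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
roughly_normal = rng.normal(0.0, 1.0, size=500)
left_skewed = -rng.gamma(2.0, 1.0, size=500)  # long negative tail

pvals = {}
for name, x in [("normal-ish", roughly_normal), ("skewed", left_skewed)]:
    _, pvals[name] = stats.normaltest(x)  # D'Agostino-Pearson omnibus test
    print(f"{name}: skew = {stats.skew(x):+.2f}, normality p = {pvals[name]:.3g}")
```

A series that fails such a normality check is a poor candidate for a t-statistic, which is the commenter's point.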

2) Changes in the vertical distribution of water vapour: Declines in lower stratosphere water vapour leading to cooling. Increases in low-mid troposphere water vapour, at least due to higher SST’s coupled with an increase in surface wind speeds over the oceans, leading to low-mid troposphere warming.

3) Reduced CO2 uptake in the warmer North Atlantic and in land regions made drier by the warm AMO phase (and increased El Nino).

All because ocean phases vary inversely to changes in climate forcing.

I have read Ross McKitrick’s criticism of Ben Santer’s paper but not Santer’s paper and from it I see the importance of correctly decomposing the CMIP5 model and observed data series into secular trend and noise. The issue with me becomes that of decomposing the series into trend, white and red noise and multi-decadal quasi cyclical components. I am not aware of how PCA would be capable of decomposing the cyclical components and separating those components from the trend component. I have been analyzing global CMIP5 model and observed temperature, forcing and TOA net R series using a recent modification of the original Empirical Mode Decomposition method called ceemdan. If those series are sufficiently similar to the mid-troposphere series analyzed by Santer and criticized by McKitrick, I would have some reservations about the Santer paper.

Applying ceemdan to artificially constructed series with known components, simulating those expected in the observed and modeled series, demonstrates that it provides a very good method for extracting components. My analyses of CMIP5 and observed temperature series in the historical period show the existence of approximate 20-, 30- and 60-70 year cycles, with the 60-70 year cycles shown to have a profound effect on the trend if they were combined with the actual trend, i.e. not extracted. The completely extracted trends are not linear. What is disconcerting about my further analysis is that, when the ceemdan decomposition is applied to the pre-industrial control temperature series for the CMIP5 models, nearly all of the significant longer-term cyclical components are no longer extracted. When using ceemdan to decompose the RCP CMIP5 temperature series from 2005 to 2100, lower-frequency cyclical components can be extracted but differ from those extracted in the historical period.

Together these findings could indicate that the multi-decadal cycles in the historical period in the CMIP5 models are there to better emulate the observed temperature series and are not necessarily as natural as these components would appear in the observed series. I find further evidence of these matching attempts when comparing the feedback parameter determined for the models using the 4xCO2 experiment with that determined for the observations using observed temperature, forcing and ocean heat content in conjunction with ceemdan extraction and the energy budget model. The feedback parameter for the models should produce significantly greater temperature trends for the models than the observed if the observed forcing and ocean heat content changes are applied to the models. The forcing and ocean heat content change can, of course, be used in models to compensate for the feedback parameter in the historical period temperature series.

I also have a problem with using a climate model ensemble mean for comparison to the observed in these analyses when comparing with individual models can paint a very different picture and correctly emphasize the wide range of model results. That range and variation can be shown to be much larger than that for multiple runs of individual models and additionally show that there are significant differences between many of the individual models (which have multiple runs where that test can be applied).

Constructive criticism of papers on this site is sometimes recognised and acknowledged, as in the case of Nick Lewis’ recent rebuttal of the ocean warming paper by Resplandy, which was even reported by the BBC: