Santer et al 2008 – Worse Than We Thought

Last year, I reported that, using up-to-date data, Santer’s claim that none of the satellite data sets showed a “statistically significant” difference in trend from the model ensemble (after allowing for the effect of AR1 autocorrelation on confidence intervals) no longer held: the claim was untrue for the UAH data sets and was on the verge of being untrue for RSS_T2. Ross and I submitted a comment on this topic to the International Journal of Climatology, which we’ve also posted on arxiv.org. I’m not going to comment right now on the status of this submission.

However, I re-visited some aspects of Santer et al 2008 that I’d not considered previously and found that it was worse than we thought.

In particular, I examined the assertion that there was no statistically significant difference in lapse rate trends between observations and the ensemble mean. Santer observed that differencing to form the lapse rate eliminated a considerable amount of common variability, reducing the AR1 autocorrelation and thus narrowing the confidence intervals (which is a fair enough comment):

Tests involving trends in the surface-minus-T2LT difference series are more stringent than tests of trend differences in TL+O, TSST , or T2LT alone. This is because differencing removes much of the common variability in surface and tropospheric temperatures, thus decreasing both the variance and lag-1 autocorrelation of the regression residuals (Wigley, 2006). In turn, these twin effects increase the effective sample size and decrease the adjusted standard error of the trend, making it easier to identify significant trend differences between models and observations.
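The effect Santer describes is easy to reproduce on synthetic data. The sketch below (my own illustration, not the actual temperature series) builds two series that share a common AR1 “climate noise” component: individually their trend-regression residuals have large variance and high lag-1 autocorrelation, but the difference series does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 252  # monthly data, roughly 1979-1999

# Common low-frequency variability shared by surface and troposphere
common = np.zeros(n)
for t in range(1, n):
    common[t] = 0.8 * common[t - 1] + rng.normal(scale=0.1)

# Each series = its own trend + the shared component + independent noise
surface = (0.010 / 12) * np.arange(n) + common + rng.normal(scale=0.05, size=n)
tropo   = (0.015 / 12) * np.arange(n) + common + rng.normal(scale=0.05, size=n)

def resid_stats(y):
    """Detrend by OLS; return (residual variance, lag-1 autocorrelation)."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    e = y - (slope * t + intercept)
    r1 = np.corrcoef(e[:-1], e[1:])[0, 1]
    return e.var(), r1

var_t, r1_t = resid_stats(tropo)
var_d, r1_d = resid_stats(surface - tropo)  # lapse-rate (difference) series
```

Differencing removes the shared component, so both the residual variance and the lag-1 autocorrelation drop, which is exactly why the difference test is “more stringent”: the effective sample size grows and the adjusted standard error of the trend shrinks.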

Santer noted that there were significant differences in lapse rate for the UAH data set but not for the RSS data, about which he said:

there is no case in which the model-average signal trend differs significantly from the four pairs of observed surface-minus-T2LT trends calculated with RSS T2LT data (Table VI).

If RSS T2LT data are used for computing lapse-rate trends, the warming aloft is larger than at the surface (consistent with model results)… When the d∗1 test is applied, there is no case in which hypothesis H2 can be rejected at the nominal 5% level (Table VI)

In the tropospheric data, we’d noticed that the RSS T2 trend difference was now very close to statistical significance, so it seemed worthwhile to check these claims using up-to-date data. (Note that Santer did not mention T2 lapse rates in this summary, though T2 data is referred to in various tables elsewhere.)

As CA readers know, there are three major temperature indices that one encounters in public debate: HadCRU, GISS and NOAA. These were used for comparing troposphere and surface trends in the CCSP report that is a reference point for both Douglass and Santer, as shown in their Table 1 below.

Figure 1. Excerpt from CCSP Report.

As a first exercise, I calculated the lapse rate trend between these three surface indices (collated into TRP averages) and RSS T2LT (and T2, as well as corresponding UAH results.)

Using 2007 data (then available), 4 of the 6 comparisons of RSS data to surface data were statistically significant even with Santer’s autocorrelation adjustment: GISS relative to both T2LT and T2; NOAA and CRU relative to T2. Using data currently available, 5 (and perhaps 6) comparisons are now statistically significant. (The t-statistic for the 6th, NOAA vs T2LT, is at 1.924.)
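For readers who want to check such numbers themselves, the test involved is an OLS trend t-test with the AR1 effective-sample-size (Nychka) adjustment used in Santer et al. Here is a minimal sketch on synthetic data; the function name and the illustrative series are mine, not code from the paper:

```python
import numpy as np

def adjusted_trend_test(y):
    """OLS trend with an AR1-adjusted standard error: the effective
    sample size n_eff = n (1 - r1) / (1 + r1) replaces n, inflating
    the standard error of the trend. Returns (trend, t-statistic)."""
    n = len(y)
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    e = y - (slope * t + intercept)
    r1 = np.corrcoef(e[:-1], e[1:])[0, 1]      # lag-1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)        # effective sample size
    s2 = (e @ e) / (n_eff - 2.0)               # residual variance with n_eff
    se = np.sqrt(s2 / ((t - t.mean()) ** 2).sum())
    return slope, slope / se

# Illustration: a known trend buried in AR1 noise
rng = np.random.default_rng(1)
n = 252
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.7 * noise[i - 1] + rng.normal(scale=0.1)
y = 0.002 * np.arange(n) + noise
trend, tstat = adjusted_trend_test(y)
```

A t-statistic above roughly 1.96 is “statistically significant” at the nominal 5% level in these comparisons, which is why a value like 1.924 sits just on the borderline.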

Using obsolete data (ending in 1999, as Santer did), the two GISS comparisons were “statistically significant” even with truncated data. The t-statistics were 2.7 (T2LT) and nearly 3 (T2), and weren’t even borderline. So what was the basis of Santer’s claim that “none” of the RSS lapse rate trends relative to surface temperatures was “statistically significant”?

Santer didn’t include GISS and NOAA in his comparison!

He deleted the GISS and NOAA composites used by the IPCC and others and replaced them with three SST series: ERSST v2, ERSST v3 and HadISST. The ERSST versions had sufficiently lower trends than NOAA and GISS that the discrepancy between the ERSST versions and RSS was no longer significant. Santer proclaimed victory.

In summary, considerable scientific progress has been made since the first report of the U.S. Climate Change Science Program … serious and fundamental discrepancy between modelled and observed trends in tropical lapse rates, despite DCPS07’s incorrect claim to the contrary. Progress has been achieved by the development of new T_SST, T_L+O and T2LT datasets, …

I have several problems with this supposed reconciliation. In my opinion, Santer should have reported the GISS and NOAA results. He could also have reported the ERSST results, but simply not reporting the significant GISS results is hard to endorse, particularly when Gavin Schmidt was a coauthor and familiar with GISS.

Secondly, if one sticks to the actual indices in common use, the discrepancy in lapse rate is just as real as ever. If the ERSST versions aren’t actually used in GISS, NOAA or HadCRU, what exactly is accomplished by showing that there is supposedly no statistically significant discrepancy between RSS and a surface version that isn’t used in the composites?

Third, Bob Tisdale observed that the first version of ERSST v3 (back in the old days of October 2008 before they “moved on”) incorporated satellite data in their estimates of SST. If so, then it seems relatively unsurprising that adjusting SST with satellite data reduces the discrepancy between SST and satellite data, but this hardly resolves the situation for data that hasn’t been adjusted by satellites or, for that matter, the unadjusted difference in the original ERSST3.

Fourth, there’s a nice bit of further irony. The low trends of ERSST v3 apparently aroused protests within the community, and the ink was barely dry on the publication of ERSST3 before they moved on. Bob Tisdale reported that, in November 2008, ERSST v3 (now ERSST v3A) was withdrawn and replaced by a new ERSST v3, not using satellites, with a higher trend. The “old” ERSST3 TRP trend up to end 1999 was 0.076 deg C/decade (I calculated this from a vintage gridded version of ERSST 3 that I located at another site); this was less than half the corresponding GISS trend and a little more than 60% of the CRU trend. But the new and improved ERSST3 TRP version presently online at the ERSST3 website (zonal) is 0.126 deg C/decade, a little higher than CRU. If they don’t adjust SST using satellites (and this adjustment seems to have been withdrawn after protests), the Santer reconciliation no longer works 🙂

In summary, contrary to Santer’s claim that none of the lapse rate trends are significant, all of them (or all but one of them) are significant relative to the three major indices using up-to-date data. As I said above, Santer et al 2008 is “worse than we thought”.

53 Comments

But what was the support for using SST data (“adjusted”, “readjusted”, whatever) in the first place, instead of the composites? The model projections were supposed to be compared against the difference between the combined (land+ocean) surface trend and the tropospheric trend(s), not the difference between SST and the tropospheric trend.

Steve M, I am happy to see you make these points about the Santer analysis and what the Santer paper did not say about significant trend differences. These are points that even a layperson like myself could see rather early in the analyses of Santer et al. (2008) made at CA.

I do not want to waste any time attempting to determine or attribute the motivations of authors of climate science papers, but some seem to have a common theme: carefully choosing the measurements and then, from that data and selective analysis, hand-waving their way to a rather definite conclusion.

Fortunately that approach often leads a layperson to where the paper has failed to provide all of the analysis. In Santer’s case, the lack of reporting of T2 difference comparisons with the surface series was obvious.

I hope the blog participants here appreciate the full impact of what you have reported in this thread. I have been doing some background work on characterizing some of these RSS, UAH, GISS and HadCru monthly and annual temperature series with ARIMA models and then deriving some synthetic series for Monte Carlo analyses.

I think that the deletion of the GISS series is totally repugnant. I don’t believe for a minute that they did not do the analysis with a GISS comparandum. Maybe it’s in the CENSORED directory with the bristlecones.

So, to review this newest “progress in climate science”:
Santer first deleted the GISS, NOAA and HadCru data altogether, and even then failed to report a significant inconsistency between RSS T2 and his “better” SST surface data, obtained by adjusting the original SST data with the very satellite data against which he should be comparing them! And then some comrades from the Team apparently spotted that the “adjusted” SST data were “too low”, and decided to raise them with a new “adjustment”! And now Santer’s “reconciliation” doesn’t work anymore, because his “surface” trend is too high again! The inconsistency remains! But he doesn’t retract the conclusions of his previous paper, that the inconsistency was removed, even though they were based on obsolete data, i.e. on the rejected adjustment of the SST!

Santer was identified by Stephen Schneider here as a master debater who would “slaughter” various blog proprietors, including moi:

Question: More specifically, the principal skeptic websites (Watt’s Up With That, Climate Skeptic, Climate Audit and Climate Science) that I look at regularly seem to think they are winning the day. They think data is coming in that questions the established paradigm.

Schneider: They have been thinking that as long as I have observed them and they have very few mainstream climate scientists who publish original research in climate refereed journals with them–a petroleum geologist’s opinion on climate science is as good as a climate scientist’s opinion on oil reserves. So petitions sent to hundreds of thousands of earth scientists are frauds. If these guys think they are “winning” why don’t they try to take on face to face real climatologists [or did he mean realclimatologists?] at real meetings–not fake ideology shows like Heartland Institute–but with those with real knowledge–because they’d be slaughtered in public debate by Trenberth, Santer, Hansen, Oppenheimer, Allen, Mitchell, even little ol’ me. It’s easy to blog, easy to write op-eds in the Wall Street Journal.

Hmm, Heartland invited many of them in 2008 and, as far as I know, in 2009 to participate, but they refused. None showed up. Realclimatologists preemptively advised all climatologists invited by Heartland not to take part in the show. That doesn’t seem like the highest level of debating self-confidence to me…

That was my thought also. The comments do not exactly sound like a detached scientist interested in different ideas and approaches to his field of work.

I think that perhaps some of these scientists do not realize that there are laypersons out there who have the ability to analyze these scientists’ works and make judgments of those works on their own. It is perhaps a bit more lonely for the laypersons than being in the club of peer review and consensus, but it makes their efforts no less valid.

The winner of debate is probably the better debater and maybe even the better scientist, but that would not say anything, in my mind, about the correctness of the winner’s debating position.

A funny aspect to Schneider’s comment is that he invited me twice to act as a reviewer for Climatic Change. Now there’s a long story to the reviews, but that doesn’t change the decision to invite me as a reviewer.

Thank you, Jeff Id, Ryan, Steve, Roman and the other Jeff for the time invested and for exposing fully the machinations of IPCC climate scientists. You have performed an important service, the value of which will eventually be understood and appreciated by many more than just those of us who read these blogs.

Schneider’s argument is just silly. You’re not arguing climate science, you’re arguing statistics. Climate science doesn’t get its own special statistical methods, no matter how much the proponents would like it to be so.

Schneider, with a sneer: “a petroleum geologist’s opinion on climate science is as good as a climate scientist’s opinion on oil reserves.”

Well, which would you rather have if you were betting your house on the statistical result: the opinion of a petroleum geologist who has been trained and depended on accurately and openly monitored statistics for his whole career or the opinion of a climate scientist who may or may not understand the sometimes novel statistical theories he employs and who contemptuously refuses to release his data and his methods?

If the question is about science, I’ll take the opinion which results from full implementation of the scientific method over the opinion which ignores the scientific method. Wouldn’t everyone?

If they don’t show their work, it ain’t science.

My teachers in elementary school insisted that we show our work on arithmetic problems. Simply writing the answer alone was never sufficient. This principle was followed by every teacher I had in every math and science class through college. I’ll bet every reader here learned the same thing from their elementary school teachers. Maybe some folks need to go back to elementary school?

Aren’t the “climate scientists” just doing data analysis on data connected with the climate? They almost certainly have expert knowledge about how the earth’s climate behaves (those that aren’t astronomers, that is), but at the end of the day they are doing data analysis. They could use the same techniques to study horse racing. The nub of the problem seems to be that they start out with the conclusion and work their data analysis backward from there.

Schneider is wrong; I doubt that they’d win a debate. But they won’t debate it anyway, because the political battle has been won and to have a debate would imply that there is a possibility they may be wrong.

Schneider is flat wrong that these so-called heavy hitters “like Kevin Trenberth” would win an open debate easily against the skeptical scientists, and I can prove it: I arranged, semi-recently, a written debate between Dr. Trenberth and Dr. Bill Gray; this debate appeared in the Fort Collins (Colorado) newspaper that I write for, and if anyone would like to read it, please download it (for free) from here:

And please pass it around. This extraordinarily edifying exchange got great local readership, but not nearly the national and international attention it deserved. We even conducted an informal poll, and Dr. Gray easily came out on top.

I realize that I opened the door on the Schneider comment, but let’s do this: at some point, I’ll respond to Schneider’s accusations, but let’s do it on a special purpose thread and leave this one to Santer.

I have one doubt. Suppose Steve and Ross, and Douglass et al and all the others, are right and Santer and the Community are wrong concerning the tropical hot-spot. What would that prove exactly? For a long time I believed it would be the final smoking gun against AGW.

But now, on closer examination, I don’t think so, at least not to the extent many skeptics usually assume. The IPCC ascribes the hot-spot to greenhouse gas increases ONLY because it assumes solar forcing to be too weak to produce large warming. The 1890-onwards solar forcing chart in the IPCC report produces uniform warming throughout the atmosphere only because the IPCC’s quantification of solar forcing minimizes it. But if the underlying physics is correct, tropical amplification is to be expected whatever the reason for the warming. The Earth warmed in the 20th century by 0.6 deg C. If CO2 was not the reason, something else must be, most plausibly a solar effect. But if the solar effect was much stronger than the IPCC says, capable of producing much of that warming, then it too should be expected to have produced a tropical hot-spot by now. So pointing to the absence of the hot-spot is not an argument against AGW, but against the underlying physics which assumes a lapse rate effect with warming, or maybe against the data. Whatever the cause of the warming, tropical amplification should have been visible by now. If it is not, that could mean only one of two things: 1) the physics as we know it is not a good representation of atmospheric processes, i.e. it is wrong; or 2) the data are not good (satellite or surface). I consider the first possibility less likely, so only the second remains. And I assume the problem is most probably in the surface data.

So, if I am correct, the skeptical emphasis on the absence of the hot-spot as a proof against AGW is misplaced. That absence should be treated as plausible evidence that the surface data in the tropics have a large artificial warming bias, whatever the reason for the warming. Ironically, what Santer et al did in attempting to correct the surface data downward was the most plausible way out of the tropical inconsistency, although Santer’s methodology was unsound.

I’m not speculating about the physics about which I’m not in a position to speak. If you wish to speculate about physics, I would prefer that you do so at one of the blogs that is interested in this sort of theory. I have no opinion on things like cosmic rays and have discouraged the discussion of such speculations here.

I’m commenting about Santer’s statistics and the reversal of some of his claims using up-to-date data. It’s Santer’s obligation to use up-to-date data in his calculations – not data ending in 1999.

Steve, I didn’t want to express elaborate opinions on cosmic rays or other solar theories either. I invoked them only for the purpose of casting doubt on the thesis that showing the absence of tropical amplification means disproving greenhouse warming. I agree with your assessment of Santer et al; I just wanted to comment on the wider implications of your critique, and of similar evidence presented by others, about this specific model-data inconsistency in the tropics.

Re: Ivan (#25), Ivan it may not be either/or, it could be a bit of both. But remember that the IPCC has vehemently denied that there is a warm bias in its surface data, going so far as to fabricate evidence to conceal the published evidence on this point (see Section 2.1 of this paper and this). The evidence of surface data warm bias is not confined to the tropics, but since that region includes a lot of very low income countries we should not be surprised that data quality over land is poor. As for data quality over the ocean in the tropics, there was a paper by Christy et al. in GRL a few years ago (which I’ll dig up tomorrow, though I’ve linked to it before) using buoy data that showed water surface temperature had a warming bias when used as a proxy for tropical marine air temperature, which is a common device since there are so few samples of marine air temperature in the tropics.
.
For my part I think the absence of stratospheric cooling since 1995 adds to the evidence that the parameterization of tropospheric warming from GHG’s is too sensitive. GHG’s are supposed to warm the tropical mid-troposphere and cool the stratosphere: whereas the sun wouldn’t do the latter. So Tropical Mid Troposphere minus global stratosphere should be increasing, i.e. if GHG sensitivity is that high, the series should be diverging. I invite curious readers to get the RSS data and see for themselves just how much divergence has taken place lately.

But if the underlying physics is correct, tropical amplification is to be expected whatever the reason for the warming. The Earth warmed in the 20th century by 0.6 deg C. If CO2 was not the reason, something else must be, most plausibly a solar effect.

Why is it necessary that there should be some other reason? Do we know enough about the physical processes and their interactions to say with absolute confidence that there must be forcings to account for this 0.6 deg C warming; i.e. the sum collection of the interacting systems cannot drift by 0.6 deg C (or some other significant amount) in the absence of a significant and measurable change in at least one of the assumed forcings?

Why is it necessary that there should be some other reason? Do we know enough about the physical processes and their interactions to say with absolute confidence that there must be forcings to account for this 0.6 deg C warming;

I cannot say, but most of the people who question AGW also single out a Sun-climate connection as the alternative explanation, either in the TSI variation-temperature correlation variant or the Svensmark-Shaviv cosmic rays hypothesis (more plausible in my opinion). If we accept any one of those hypotheses, we accept a significant solar external forcing of past climate on decadal, centennial and millennial time scales. The warming of 0.6 deg C is not outside the range of natural climate variability known from the past, but the past variations all involved, exactly according to the Sun-climate connection theories, a very significant external forcing. If Svensmark’s theory or the old-fashioned TSI-temperature correlation explains the recent warming, then we should expect tropical amplification as well. Cosmic ray flux variations modulating low cloud formation, or increasing TSI, are clearly external factors. Even if we assume, as Roy Spencer does, internal variations based on say the PDO, I think it’s not particularly hard to show that the PDO has a lot to do with TSI or cosmic ray variations.

If the IPCC tomorrow says “Svensmark is right, AGW is garbage”, the only things that would change are the estimates of the direct and indirect radiative forcing of the Sun (which would greatly increase) and of the radiative forcing of CO2 and CH4 (which would decrease), but not the property of the tropical atmosphere to warm by a certain amount in the presence of a strong forcing of whatever kind, and to warm more with increasing altitude than at the surface. Am I wrong?

I cannot say, but most of the people who question AGW also single out a Sun-climate connection as the alternative explanation, either in the TSI variation-temperature correlation variant or the Svensmark-Shaviv cosmic rays hypothesis (more plausible in my opinion).

I would say there doesn’t need to be an “explanation” because we haven’t seen anything out of the ordinary.

Ivan, AGW promotes an upper troposphere hot spot as its fingerprint. But CRF mediated by low cloud is low atmosphere, open ocean, and magnetic field strength correlated AFAIK. Very different, and consistent with observations, fingerprint.

Ivan, AGW promotes an upper troposphere hot spot as its fingerprint. But CRF mediated by low cloud is low atmosphere, open ocean, and magnetic field strength correlated AFAIK. Very different, and consistent with observations, fingerprint.

But, Davis, if the warming was caused by CRF changes, that would manifest itself in decreasing low cloud cover in the tropics. Is that any different from greenhouse-induced warming of the tropical sea surface that leads to a water vapor feedback, resulting in exactly the same decrease in low clouds and changes in lapse rate? The physics is the same, no matter whether the initial warming was a consequence of adding CO2 to the atmosphere or of a CRF-caused decrease in low cloud cover. That’s the problem.

It does matter what the cross-sectional pattern of warming is, and I am surprised there hasn’t been a study that open-mindedly tried to attribute warming on the basis of the different statistical matches to patterns of cross-sectional (latitude vs altitude) temperature change. It seems like a source of fairly definitive evidence, and perhaps now the battle is moving more towards that. Sorry, I didn’t mean to go OT, but the point is that it does matter.

Ross,
my initial doubt remains: if we assume extremely low sensitivity to CO2, a radiative forcing able to produce 0.6 deg C of warming during the 20th century (whatever that could be) must be seen in the tropical troposphere. Your previous comment and the linked papers point in that direction. I wonder whether the hot-spot must be there irrespective of the type of forcing responsible for the warming. I know that the IPCC assumes away the possibility of any other strong forcing capable of causing global warming. My question is: what if the IPCC is wrong? Tropical amplification is a physical phenomenon and it cannot disappear just because we conclude that the warming is caused by some other type of forcing.

Re: Ivan (#33), Ivan, if I understand your point, you are saying that: the warming of the last 100 years implies that there was a strong external forcing; the forcing has to be strong enough to produce the warming; the physics demands that it must have been amplified in the tropical troposphere; the failure to observe such warming so far may simply mean we have bad data.
.
By way of response: (i) we don’t need external forcings to produce warming or cooling, the climate is a chaotic system able to change itself; (iii) I doubt the physics really demands a tropical hotspot. The TT hotspot is a feature of climate models, but since it depends on parameterization of physical processes rather than a derivation from first principles, we are not able to say what the physics demands; (iv) always a possibility, especially in the tropics. However if we are going to disavow the surface data for the middle half of the Earth’s surface on the grounds of quality problems, then there’s a lot of data that will need to be disavowed, and when you’re done, there might not be enough data left to establish AGW as an issue.

Re: Ross McKitrick (#37), Ross, (i) it is not clear what exactly “internal variability” means in this context. ENSO, PDO and AMO are widely known modes of “internal” variability, but most of them are nonetheless very significantly correlated with solar activity variations, which is to say with one significant mode of “external forcing”. (iii) I cannot comment on this. I am not a physicist, but even if you are right, that would mean the hotspot is not a unique feature of AGW but rather a dubious effect of arbitrary parameterization in the models. Then the real target is not the hotspot (which should show up with any external forcing in such sloppy modeling) but the modeling process itself and the unrealistic physical assumptions built into the models. I am not sure to what extent this is true. (iv) I agree, and that is exactly the reason why they are trying so hard to “correct” the satellite data.

Perhaps I am not understanding you, Ivan, but my understanding is that the discriminant features are different from those you are talking about. The CRF effect is not thought to be evidenced in the tropics, but by correlations in other places, and GHG is not evidenced in the lower troposphere, but in the middle troposphere. Not wanting to get into the physics, but you seem to be rolling all the effects into a mean value of temperature increase, rather than looking at the higher moments.

When I started to read and understand the Douglass/Santer debates, I was struck by how the calculations seemed to veer from the actual point of contention, i.e. whether the differences/ratios between the observed temperature trends in the troposphere and at the surface in the tropics were or were not being accurately predicted by the climate models.

I looked at some differences in earlier posts here at CA, and recently Steve M presented some comprehensive calculations showing that the T2 - T0 trends for observed and modeled data are significantly different for UAH and RSS, and that the observed trend differences with the models are rapidly increasing to significance as the calculations are updated from 1979-1999 to 1979-2009. He also pointed to the difference that can derive from using different observed T0 data series.

What I intended to look at was combining the UAH and RSS data series differences and comparing them with the 48 simulations (from 19 models) for the trends T2-T0 and T2LT-T0. I also wanted to hone my skills using the KNMI site for obtaining temperature data and, of course, using R. I had downloaded the latest series of UAH and RSS troposphere data and the GISS (250 km) and HadCru3 land and ocean data series from KNMI into Excel, and had recently loaded the 48 model simulations of T0, T2LT and T2 from Climate Audit into Excel.

I did my calculations in R after loading the data series via the clipboard. All data series were for the tropical zone 20S to 20N, and all were in the form of anomalies normalized to the time period of interest, i.e. 1979-1999 or 1979-April 2009.

I found that the GISS (250 km) series from KNMI appeared to give very different trend differences with the models than HadCru3, and after validating the download with a second one, I added a third T0 series using GISS (1200 km) from KNMI. This series gave model comparisons much like those from HadCru3.

I benchmarked against the MM and Santer results to ensure that I was doing the calculations correctly. The results are shown in the table below.

The calculation results are presented in the table below and are in essential agreement with those that Steve M completed earlier. For the 1979-1999 time period, the combined UAH and RSS differences with the models showed statistical significance for T2 and T2LT when using the HadCru3 and GISS (1200 km) T0 surface series, and came close (85%) using the GISS (250 km) T0 surface series. In the 1979-April 2009 period, all the combined UAH and RSS differences were significantly different from the corresponding model differences.

Why the GISS (250 km) and GISS (1200 km) series yield such dramatically different results, while GISS (1200 km) and HadCru3 give such similar results, is a bit of a puzzle for me.

I did a rather comprehensive comparison with all these differences using the R acf (autocorrelation) and pacf (partial autocorrelation) functions, after reading how to separate the effects that an AR1 correlation can have on higher orders of correlation. I can show some of these graphs at a later time, but suffice it to say that AR1 appears to be by far the dominant mode of autocorrelation of the regression residuals in these differenced (and non-differenced) series.
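The diagnostic being described can be sketched outside R as well. Below is a minimal numpy-only version (my own implementation, not R’s acf/pacf) illustrating the signature of an AR1 process: the acf decays geometrically while the pacf effectively cuts off after lag 1.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation function at lags 0..nlags (biased estimator)."""
    x = np.asarray(x, float) - np.mean(x)
    c0 = (x @ x) / len(x)
    return np.array([(x[:len(x) - k] @ x[k:]) / len(x) / c0
                     for k in range(nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    p = np.zeros(nlags + 1)
    p[0] = 1.0
    phi[1, 1] = p[1] = r[1]
    for k in range(2, nlags + 1):
        num = r[k] - phi[k - 1, 1:k] @ r[1:k][::-1]
        den = 1.0 - phi[k - 1, 1:k] @ r[1:k]
        phi[k, k] = p[k] = num / den
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
    return p

# Synthetic AR(1) series with coefficient 0.6
rng = np.random.default_rng(2)
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal()

a = acf(x, 5)
p = pacf(x, 5)
```

If the pacf values beyond lag 1 are small, AR1 captures essentially all of the serial correlation, which is the condition under which a lag-1-only adjustment of the trend standard error is appropriate.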

The data were extracted from Excel files into R via the clipboard. The observed data were in a single file with UAH TM, UAH TL, RSS TM, RSS TL, HadCru3 and GISS1 in columns 1 through 6 respectively. The modeled data were in three files for MTO, MTL and MTM, with the 48 simulations collated by Steve M. The observed and modeled data, in the form of anomalies, were always normalized for the period of interest, i.e. 1979-1999 or 1979-April 2009.

I did some more analyses of the data I used in post #39 of this thread. I looked at the autocorrelations of the residuals for the observed and modeled data series by plotting the higher orders of the acf and then comparing that plot with the pacf. The pacf plots show the “partial” correlation between two variables: they measure the amount of correlation between them that is not explained by their mutual correlations with a specific set of other variables or higher-order interactions. I also examined normal Q-Q plots of the regression residuals for the observed series differences. Finally, I looked at the normality of the model series for T0, T2LT, T2, T2LT-T0 and T2-T0 using the Shapiro-Wilk test (shapiro.test function in R).

I believe the above analyses cover the assumptions needed for the Santer-type analysis found in Santer et al. (2008). Below I show a typical rendition of the acf and pacf plots and the normal Q-Q plot. I also list the p values for the Shapiro-Wilk test for the model and model difference series.

The acf and pacf plots would appear to my untrained eye to rather conclusively indicate that the assumption made in Santer, that a Nychka adjustment for AR1 serial correlation is sufficient, is valid. I think an AR1 model (ARIMA(1,0,0)) fits these series well.

The normal Q-Q plots of the observed difference series seem to me reasonable evidence that the normality assumptions made in Santer when handling the regression statistics of the observed series and observed series differences are valid.

I was a bit surprised when the Shapiro-Wilk test for normality showed the T0, T2LT and T2 modeled series to be normal, but that when I subtracted (differenced) these normally distributed series I obtained distributions that failed the normality test. I judge that this result needs more analysis.
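The check described above is straightforward to reproduce in outline. The series below are synthetic stand-ins (AR coefficients and lengths are my assumptions), so no particular p values are implied; the point is only the shape of the test:

```r
# Hedged illustration: Shapiro-Wilk on a model series and on the
# difference of two such series.
set.seed(123)
t2   <- arima.sim(list(ar = 0.8), n = 252)  # stand-in for a model T2 series
t2lt <- arima.sim(list(ar = 0.8), n = 252)  # stand-in for a model T2LT series
shapiro.test(as.numeric(t2))                # normality of the series itself
shapiro.test(as.numeric(t2 - t2lt))         # normality of the difference series
```

One caveat worth noting: shapiro.test assumes independent observations, so strong serial correlation in these series can itself distort the reported p values.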

As a lay person it always makes me nervous when data is added or removed from calculations because of ‘protests’ by some group or another that does not like the results it gives. Even if such a move is justified it gives the impression that the data is being cooked to produce a desired result, in this case a greater proof of ‘global warming’.

Thanks Jonathan, I’m aware of that. I’m just concerned that Steve has compromised his chances of publication. Maybe if he’s already certain it’s getting rejected then the move to arxiv won’t matter. I was just surprised by this move back when this post was written and forgot to post a comment.

Re: Plimple (#47), most physics journals do not consider a post to arxiv as prior publication. Those which did swiftly found that authors moved elsewhere. You still have to tread slightly carefully, but it is rarely a problem in practice.

Ben Santer sent two reactions by e-mail, one of which is here and the other — far more technical — is added as a comment:

The bottom line is that the identification of human effects on climate is a signal-to-noise problem. A human-caused warming signal is embedded in the rich, year-to-year and decade-to-decade noise of natural internal climate variability. Scientifically, we never had the expectation that there would be some monotonic warming signal in response to slow, human-caused changes in greenhouse gases, with each year inexorably warmer than the previous year. In detection and attribution studies, we beat down the large noise of year-to-year and decade-to-decade variability by looking at changes over longer sweeps of time. When you consider the entire satellite era (1979 to present), signal-to-noise ratios for global-scale changes in lower tropospheric temperature now exceed 5 – even for UAH lower tropospheric temperature data (see…”fact sheet“). This is what the discussion should focus on – the signal rather than the noise.

and in the comments Revkin pasted the 2nd item:

Ben Santer of Lawrence Livermore sent this response to my e-mail query along with what I’ve added at the bottom of the post:

I’m currently in the process of updating (with CMIP-5 simulation output) the analysis described in our 2011 JGR paper. Recall that the 2011 JGR paper was based on the analysis of older CMIP-3 simulations of forced and unforced climate change.

Our recent PNAS paper [ http://j.mp/pnassanter12 ] indicates that tropospheric temperature variability on 5- to 20-year timescales is, on average, larger in CMIP-5 than in CMIP-3 models. So based on the analysis of CMIP-5 simulations, it is likely that it will take longer than 17 years to discriminate between internal “climate noise” and an externally-forced tropospheric warming signal.

As described in both our 2011 JGR paper (see paragraphs 36 and 38) and our 2012 PNAS paper (see page 3), there are a number of possible explanations for differences between observed temperature trends and model trends in simulations of historical climate change. Dr. John Christy and Dr. Patrick Michaels claim that such differences are entirely due to model response errors. Such claims are scientifically incorrect. Errors in the imposed forcings – particularly the anthropogenic aerosol, stratospheric ozone, solar, and volcanic forcings – remain a serious concern. And as the history of the MSU debate has taught us, we certainly cannot rule out residual errors in the observations.

I did not read more than the abstract of this paper, but if the paper is showing evidence for human-caused warming over the satellite period, that is not exactly news.

I have used the Santer et al. (2008) methods for comparing model and observed temperatures over the period 1964-May 2013 and found a significant difference. I compared, one at a time, the observed series HadCRU4, GISS and NCDC against the models and runs of the RCP4.5 scenario from CMIP5. In all cases the model mean was running significantly higher than the observed series.
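A comparison of that general shape can be sketched as below. This is a simplified variant with synthetic series: each trend's standard error is AR1-adjusted and the difference is tested against a normal reference. Santer et al.'s actual d1* statistic uses the inter-model spread of trends for the model term rather than the AR1-adjusted error of the ensemble mean, so this is an illustration of the idea, not their exact test.

```r
# AR1-adjusted trend and standard error for one series.
ar1_trend <- function(y) {
  n    <- length(y); t <- seq_len(n)
  fit  <- lm(y ~ t)
  r1   <- acf(residuals(fit), lag.max = 1, plot = FALSE)$acf[2]
  neff <- n * (1 - r1) / (1 + r1)
  se   <- summary(fit)$coefficients[2, 2] * sqrt((n - 2) / (neff - 2))
  c(b = unname(coef(fit)[2]), se = se)
}

# Simplified d1-type test of observed trend vs. model-mean trend.
d1_test <- function(obs, model_mean) {
  o  <- ar1_trend(obs); m <- ar1_trend(model_mean)
  d1 <- (o["b"] - m["b"]) / sqrt(o["se"]^2 + m["se"]^2)
  unname(2 * pnorm(-abs(d1)))    # two-sided p value
}
```

A small p value here would correspond to the reported result that the model mean runs significantly warmer than the observed series.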