Dirty Harry 4: When Harry Met Gill

Yesterday, I noted that Steig had criticised previous developers of Antarctic gridded temperature data for not having “paid much attention” to West Antarctica (e.g. the NASA GISS trend map left the area blank due to lack of data meeting their quality standards) and reproached his predecessors (including, it seems, even Hansen) for “calculating with their heads”, rather than “doing the math”, which, in his case, seems to be inseparable from the use of Mannian algorithms.

In various venues, Steig and coauthors have said that they “get” similar results using their surface/AWS reconstruction and satellite (AVHRR) reconstruction, with the similarity of the AWS result being used to reject concerns over possible problems and biases in the AVHRR reconstruction (where the monthly data remains unarchived and the data set has received negligible attention as compared to microwave and surface data).

In yesterday’s post on the West Antarctica sector of interest, I observed that GISS and Steig both used the same surface and AWS data – the blank area in GISS came not from failing to “do the math”, but because the station quality didn’t meet minimum GISS standards (which are not particularly arduous). Differences between GISS and Steig arose, at least in part, because of Steig’s inclusion of these West Antarctica stations, which I listed: the Byrd and Russkaya surface stations and the Byrd, Siple, Mt Siple and Harry AWS stations. I plotted the AWS reconstructions (recon_aws) from Steig’s SI, showing that only Harry had a pronounced 1979-2003 trend (the period illustrated in Steig Figure 4), with Mt Siple AWS actually being negative. I ended the post with a teaser, saying that there was trouble with Harry.

FUBAR
Having identified Harry as a highly leveraged series in the AWS reconstruction, I decided to plot the GISS version against the READER version – this is the sort of routine plot that I do all the time. I had several versions of scraped GISS data, I had a vague recollection of a screw-up last June involving READER data (more on this in a moment) and thought – hmmm, let’s compare an old version with a new version, which yielded a plot looking like the one below (the script for this is in the first comment). As you see, there are HUGE differences between the new Harry and the old Harry.
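The old-versus-new comparison is mechanical enough to sketch. The script in the first comment is R; here is a hedged Python equivalent, with the file layout, filenames and rounding tolerance assumed for illustration rather than taken from either archive:

```python
# Hypothetical sketch: compare an "old" and a "new" scraped version of the
# same station series, month by month. The file format is assumed: rows of
# year, month, temperature (deg C).
def read_monthly(path):
    series = {}
    with open(path) as f:
        for line in f:
            year, month, temp = line.split()
            series[(int(year), int(month))] = float(temp)
    return series

def compare_versions(old, new, tol=0.05):
    """Return months present in both versions whose values differ beyond
    READER's one-digit rounding (tol is an assumed tolerance)."""
    common = sorted(set(old) & set(new))
    return [(ym, old[ym], new[ym]) for ym in common
            if abs(old[ym] - new[ym]) > tol]
```

Run on the two scraped Harry files, anything surviving `compare_versions` is a real divergence rather than rounding noise.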

Steig didn’t archive the data as used. But it turns out that it is possible to show that Steig used the “new” Harry version. Steig gives lat-longs for 63 recon_aws series, which can be matched to lat-longs of READER AWS stations (which I’ve matched to GISS IDs). It turns out that, through the wonders of RegEM, the Steig AWS reconstruction does not merely approximate the target AWS series – it matches it perfectly. So you can use the Steig AWS as a type of fossil version to check whether Steig used the old Harry or new Harry versions. As shown below, he used new Harry. The 1979-1999 anomaly version of new Harry matches the Steig version almost exactly. There is no possibility that Steig used the old Harry version.

Figure 2. Comparison of Steig recon_aws version to new Harry version
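The anomaly check behind this comparison can be sketched as follows. This is a hypothetical Python version, not the script actually used: it assumes a simple (year, month) -> temperature layout and a rounding tolerance of 0.05 °C.

```python
# Hedged sketch: convert a raw monthly series to anomalies against its
# 1979-1999 monthly means, then see whether the result matches the
# corresponding recon_aws column.
def monthly_anomalies(series, base_start=1979, base_end=1999):
    # Climatological mean for each calendar month over the base period
    means = {}
    for m in range(1, 13):
        vals = [t for (y, mo), t in series.items()
                if mo == m and base_start <= y <= base_end]
        if vals:
            means[m] = sum(vals) / len(vals)
    return {(y, m): t - means[m] for (y, m), t in series.items() if m in means}

def matches_recon(anoms, recon, tol=0.05):
    """True if every overlapping month agrees to within rounding tolerance."""
    return all(abs(anoms[ym] - recon[ym]) <= tol
               for ym in set(anoms) & set(recon))
```

If `matches_recon` holds over the whole overlap for new Harry but fails for old Harry, the “fossil” argument above goes through.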

Harry
The Harry AWS station is described here. It is located at 83.003S 121.393W 945 m (a different location than shown in Steig Table S2). It was installed in Nov 1994. Original AWS data at this site can be matched to READER data – for example, the file ftp://ice.ssec.wisc.edu/pub/aws/climate/1995/READER1355_1995.dat has a header line

Station Argos ID: 1355 Station Name: Harry Year: 1995

and the monthly values can be matched to READER Harry in 1995 (rounded at READER to one digit). Jan and Feb 1995 values are -10.6 and -14.1 respectively in both sources.

There was no particular pattern to the nomenclature in the Wisconsin data, so I ended up reading all the files, collating the header lines and making a details file, which is available at A/data/steig/details.wisc.tab (details). Using grep, I searched for “Harry” and “HARRY” and got 5 files.
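The collation step can be sketched like this. This is a hypothetical Python version (the actual collation was done with R and grep): it assumes a local mirror of the Wisconsin ftp tree and a header layout following the READER1355_1995.dat example above.

```python
import os
import re

# Hedged sketch: walk a mirrored directory tree, pull the station name out
# of each file's header line, and index the files by station. Everything
# about the paths is assumed.
def collate_headers(root):
    index = {}
    header_re = re.compile(
        r"Station Argos ID:\s*(\d+)\s*Station Name:\s*(\S+)\s*Year:\s*(\d+)")
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, errors="ignore") as f:
                m = header_re.search(f.readline())
            if m:
                argos_id, station, year = m.groups()
                index.setdefault(station.upper(), []).append((int(year), path))
    return index
```

With an index like this, `index.get("HARRY")` plays the role of the grep search, and the full details file is just the index written out.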

I collated the original data and compared it to new Harry (shown below only for years of overlap) and there was a match up to rounding between new Harry and Wisconsin Harry for 1994-1996 and 1999-2000, but the provenance of the other years was a mystery.

Figure 3. Compare Wisconsin Harry to New READER Harry.

Gill
CA readers made a number of useful suggestions yesterday on data provenance. I was fairly determined to track down the provenance of the “other” Harry data and by this time, I’d downloaded all the Wisc data (collated at CA in data/steig/wisc.tab). In the end, I simply examined all the Jan-Feb data in selected years to see if any data matched “Harry”. And thus, we met Gill. The “other” Harry data is derived from Gill. I’d figured this out yesterday when I wrote the teaser – a CA reader also figured this out for 1987-1989 this morning. But it’s not just 1987-1989; it’s 1987-1993 and 1997-1998. Also the “old” Harry was actually “Gill”. The graphic below compares READER/GISS New Harry with original Gill. Values are identical from 1987 to July 1994 and in 1997-1998. Values are different from Dec 1994 to Dec 1996 and for 1999-2000 where “Harry” has been spliced in the READER/GISS version.

Figure 4. Compare original Gill data to READER New Harry.
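The search that turned up Gill amounts to a brute-force match. A hedged Python sketch, with the data structures assumed (each series maps (year, month) -> temperature) rather than taken from the actual script:

```python
# Hypothetical sketch: for each candidate station, count the months in a
# mystery period whose values agree with the target series to within
# READER's one-digit rounding. A station whose overlap matches completely
# is a splice suspect.
def splice_suspects(target, candidates, years, tol=0.05):
    suspects = []
    for name, series in candidates.items():
        keys = [(y, m) for (y, m) in target
                if y in years and (y, m) in series]
        if not keys:
            continue
        hits = sum(abs(target[k] - series[k]) <= tol for k in keys)
        if hits == len(keys):      # every overlapping month matches
            suspects.append((name, len(keys)))
    return suspects
```

Applied over 1987-1993 with “new Harry” as the target, a search of this kind points straight at Gill.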

Gill is located on the Ross Ice Shelf at 79.92S 178.59W 25 m and is completely unrelated to Harry. The 2005 inspection report observes:

2 February 2005 – Site visited. Site was difficult to locate by air; was finally found by scanning the horizon with binoculars. Station moved 3.8 nautical miles from the previous GPS position. The lower delta temperature sensor was buried .63 meters in the snow. The boom sensor was raised to 3.84 m above the surface from 1.57 m above the surface. Station was found in good working condition.

I didn’t see any discussion in Steig et al on allowing for the effect of burying sensors in the snow on data homogeneity.

The difference between “old” Harry and “new” Harry can now be explained. “Old” Harry was actually “Gill”, but, at least, even if mis-identified, it was only one series. “New” Harry is a splice of Harry into Gill – when Harry met Gill, the two became one, as it were.

Considered by itself, Gill has a slightly negative trend from 1987 to 2002. The big trend in “New Harry” arises entirely from the impact of splicing the two data sets together. It’s a mess.
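The trend comparison is just an ordinary least-squares slope. A minimal sketch, in Python rather than the R used here, with illustrative numbers only:

```python
# Hedged sketch: OLS trend in deg C/decade, the kind of check used above to
# see that Gill alone trends slightly negative while the Gill+Harry splice
# trends strongly positive. Input is a list of (decimal_year, anomaly) pairs.
def trend_per_decade(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points) /
             sum((x - mx) ** 2 for x, _ in points))
    return 10.0 * slope   # per year -> per decade
```

Running this on Gill alone, then on the spliced series, isolates how much of the “New Harry” trend is an artifact of the splice.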

If you now turn to the READER information page, you’ll see that Harry is right underneath Gill. Did that contribute to the screw-up? Are there other corresponding errors? Dunno.

British Antarctic Survey Erases Harry Data
Yesterday, I observed that there was trouble with Harry, notifying readers that today’s post was on the way. Triggered by this teaser, a reader pointed out that Gill and Harry had been spliced (something that I’d noticed yesterday and was one of the reasons why I put out a teaser for today’s post). About 1 pm Eastern today, when I went to verify some information at BAS, I found (as a CA reader had also just observed) that BAS had erased the “New Harry” data and that a correct Harry version had suddenly appeared (different from either old Harry or new Harry) – which, in its way, is a pretty compelling verification of the points made above. The British Antarctic Survey issued only the following notice, notably omitting any credit to Climate Audit (following the example set by Hansen and Mann):

Note! The surface aws data are currently being re-proccessed after an error was reported in the values at Harry AWS (2/2/09)

They did not provide any description of the error. They erased the incorrect data without retaining any copy of the prior version. You can still get the incorrect data at GISS, who haven’t changed their data yet (and I’ve saved a copy which I’ll put online at A/data/steig.)

June 2008
Does any of this bring back memories of a prior incident? Last June, we noticed data for the Southern Ocean shifting at GISS as we examined the derivation of Wellington NZ (an arbitrary example, but an interesting one in the present context). See here. In the next comment, I noticed problems with Chatham Island (which is in the READER database, though we hadn’t turned our attention to it at the time); John Goetz identified the Chatham Island data as coming from the BAS site noting:

the record on the UK site has some errors,

We noted that GISS seemed to manually correct the BAS errors, see the next few comments 61-70, wondering what the basis of the edits was. Subsequently, we observed that GISS noted a few days earlier (without reference to Climate Audit) that changes had been made to Southern Ocean data due to problems with the READER collation:

June 9, 2008: Effective June 9, 2008, our analysis moved from a 15-year-old machine (soon to be decommissioned) to a newer machine. This will affect some results, though insignificantly. Some sorting routines were modified to minimize such machine dependence in the future. In addition, a typo was discovered and corrected in the program that dealt with a potential discontinuity in the Lihue station record. Finally, some errors were noticed on http://www.antarctica.ac.uk/met/READER/temperature.html (set of stations not included in Met READER) that were not present before August 2007. We replaced those outliers with the originally reported values. Those two changes had about the same impact on the results than switching machines (in each case the 1880-2007 change was affected by 0.002°C). See graph and maps.

Even though errors had been identified in one portion of READER, I guess they didn’t bother checking other READER data sets e.g. http://www.antarctica.ac.uk/met/READER/aws/awspt.html to see if maybe the problem that caused the Chatham Island problem occurred elsewhere. Had they done so, I’m sure that Steig et al would have appreciated it.

This would be funny if it were not actually quite depressing. What sort of standards do these people use for their work? When you have a set of data with the odd data point outside the rest, the first, and sensible, thing to do is to double-, if not triple-check for errors. That’s what was done here, and within a day the error was spotted. But there you have professional scientists with Ph.D.’s whose job it is to be rigorous and meticulous, and they just can’t do it.

Re: Urederra (#6), Urederra, this report gives some description of a “Boom Sensor”. It’s either a vertical or horizontal extension with the sensor near its end. Most likely a vertical extension would be used in this case because of snow.

Re: Urederra (#6),
If you look at the photo which Hu posted yesterday (http://uwamrc.ssec.wisc.edu/images/aws/harry_before06.jpg), you will see a piece of channel (aluminum?) bolted across the top of the tower section. This channel has two protrusions hanging below the near end. The piece of channel is the “boom” and the protrusions are the “sensors”, in this case identified by someone else yesterday as platinum resistance thermometers.

Let’s see if this will work: an image of the type of PRT Bridge temperature sensor U.Wisc. SSEC reports is used by the Harry AWS. The thing looking like a beer can is the radiation shield you see hanging from the underside of the AWS crossarm.

Long time lurker. Awesome work.
I guess the question becomes: what impact does Dirty Harry have on Steig’s report?
I assumed, since Mann was involved, that some special dataset would dominate the results. If Dirty Harry is the cherry in this particular pie, I would hope it would be so obvious that anyone could see the problems in some of these reports. Possibly we might get some reticence to blindly accept the press release associated with the next report?

The application of quality control doesn’t cost much, but is worth its weight in gold. It’s a pity tax-funded organisations don’t bother with verification, validation, data control and all those other quality control functions that most businesses use.

Speaking of credit where credit is due, James Hansen, Al Gore, and Lonnie and Ellen Thompson have all been recipients of $1M prizes from the Dan David Foundation, endowed with $100M by a businessman of that name. No one accuses them of being pawns of Mr. David, since the awards were in recognition of prior work, rather than payment for future services.

Perhaps there is a Dirty Harry Foundation or some such out there that would be willing to bestow a comparable sum on Steve for his past tireless (and uncompensated) quest for truth?

– The area of principal interest in the Steig et al analysis is West Antarctica, where they report a newly-discovered warming trend attributable, in part, to Mann’s clever ability to calibrate temperature to something-other-than-temperature and thereby reconstruct the non-existent temperature data.

– Within that region Steig introduces data from 4 Automated Weather Stations, and the AWS with the big trend is called Harry.

– In exploring the Harry data you compared the current GISS version (which Steig used) against one you downloaded in [SM: early] 2008. I gather that the 2008 [SM- current] GISS version equals the version in the READER [SM: Feb 1, 2009] archive. You noticed that the 2 are different, with the newer version showing large positive divergences in 1995-1997 and 1999-2003 (approx.) These divergences are very large, 5-15 C.

– In the Wisconsin temperature archive the Harry station is listed at a different location than Steig reported. However the Wisconsin and Steig Harry data are identical in the overlap years.

– After searching through the other Wisconsin records you found that the old Harry series (archived on GISS as of 2008) was identical to the data from another station called Gill. The Gill station is located somewhere else entirely.

– It turns out that new Harry–as used by Steig–is a splice of Gill and some relatively recent Harry data.

– The recent Harry data as recorded at Wisconsin is for years 1994-96, 1999-2000. There are also Harry data for years 2001-2004 (it looks like) in your first figure, which are from GISS 2009. Are these data also from the same station that contributed the Wisc data? [SM – data ends in 2002. Haven’t checked 2001-2002 yet. Wisc online archive doesn’t have 2001-2002 information, but it seems to be incomplete].

– The immediate questions of interest are: Are other Antarctic series in Steig similarly fouled up? [SM – Dunno] and, What happens to their results if new Harry is either deleted or replaced with a West Antarctic station that is not a splice of 2 unrelated sites, if such a series is available? The other question is whether bristlecone pines grow in West Antarctica: it sounds like they do, metaphorically at least.

Given that the real Harry dataset is so short (starting in 1994), I have to wonder if it’s even appropriate to include in the analysis. And if not, I further wonder what effect that would have on the final results. Any further reduction in the small warming trend they report would make a mockery of their conclusions.

Steve walks warily down the street,
With the brim pulled way down low
Ain’t no sound but the sound of his feet,
Machine guns ready to go
Are you ready, Are you ready for this
Are you hanging on the edge of your seat
Out of the doorway the bullets rip
To the sound of the beat
Chorus

Another one bites the dust
Another one bites the dust
And another one gone, and another one gone
Another one bites the dust
Hey, I’m gonna get you too
Another one bites the dust

Let’s suppose that the Team says that the error doesn’t “matter”. (Imagine that.) They haven’t archived their code or data as used. So it’s impossible to do a turnkey analysis with and without. Is it likely to matter? We know from other Mannian work that Mannian methods can be heavily dependent on a few series e.g. bristlecones, where different results are obtained with and without bristlecones – not that any of them mean anything. So it’s possible that you get different West Antarctic results with and without Harry. I’m not going to even try to do the calcs without full access to source code.

Also please remember that I didn’t parse all 65 sites to try to find one that was screwed up. I looked at the critical region – West Antarctica – and then the series in the critical region, identifying Harry as the strongest contributor to the trend. I looked at Harry first. I think that there’s certainly prima facie reason to believe that the West Antarctic recon would be different without Harry. Maybe I’ll apply for a grant to do the study.

If the climate models that used to calculate a cooling Antarctic now calculate a warming Antarctic, are you saying they will now have to go back to having to maybe calculate a cooling Antarctic? How do they keep control of what they are supposed to be calculating? Oh I forgot, they don’t have any quality control, so they can easily go back to calculating what they used to calculate before they were changed to calculate what they calculated after they calculated something else (or not) maybe.

I didn’t see any discussion in Steig et al on allowing for the effect of burying sensors in the snow on data homogeneity.

They don’t need to discuss that – the ptarmigan has known it for thousands of years, when weather gets really bad they let themselves get snowed over. Snow is an excellent insulator and temperature swings are strongly damped just a few centimeters down.

Thanks, Steve, for the research. Your website really earned its name on this project.

And thanks Ross, for the summary. I realize that Steve has to document, in the text, every single statement he makes, and cover off every single alternative explanation. That makes his claims bulletproof but also makes it a bit less accessible to the layman. Therefore it’s always helpful when someone writes a simple step by step summary.

The real clincher here would be if you could re-run the whole study with the correct data and see if the result is still statistically significant.

Very interesting piece of quality control work. I posted the following at RC – they do not appear to have closed comments or at least the comment box is still visible:

SM at CA has identified what appears to be a major error in the Steig et al paper that suggests that the perceived trend is an artifact of this particular error. Perhaps this is an opportunity to mend some fences and work towards a common goal of better data and clearer methods.

It would have been nice had SM actually notified the holders of the data that there was a problem (he didn’t, preferring to play games instead).

You see? It was all Steve’s fault he didn’t get a credit. If he’d done things differently, he would have been showered with adulations. Of course, it is entirely possible to use this reasoning to never give Steve credit – you can always say “he should have done it this way, then it would be alright…”. I suppose once you get in the habit of ex post reasoning, it is difficult to stop it…

Gav goes on:

As for the implications of the errors in the BAS Harry file on the study, that too is visible in figure S4b – removing Harry (and a bunch of other AWS stations) doesn’t change the answer in any meaningful respect

(My emphasis). Anyone remember certain reconstructions being robust to the removal of all dendroclimatic indicators? Now this time “it might be different”, but watch the pea…

As for the implications of the errors in the BAS Harry file on the study, that too is visible in figure S4b – removing Harry (and a bunch of other AWS stations) doesn’t change the answer in any meaningful respect.

Actually, there’s quite a big difference between S4a (full reconstruction) and S4b (subset). In the subset analysis there is much more warming in the West Antarctic, yet this analysis includes only one site in the West Antarctic (Byrd). This is a region about the size of Western Europe.

I must say it’s a pity that the SI isn’t clearer. I’ve read it several times and it’s very hard to work out exactly what they’ve done. If Harry, Siple, Mt Siple and Byrd AWS weren’t used (i.e. not part of the 42) then why are they in Table S2 (and Figure S4) at all?

Re: James Lane (#76), The AWS were part of the original reconstruction, in addition to the 42 surface stations. They are listed in the table since they are used for Fig S4a. The 15 stations for the alternative reconstruction used were those with “the most available data coverage since 1957” (page 4, supplementary info). Figure S4b shows the locations and correlations of all the data not used from Table S2 for its construction.

I admit I haven’t been following this study too closely, and I’ve made the mistake of weighing in here before with inadequate research and got things wrong. That said, I make a point of not learning from past mistakes ;)

If you are working with difficult data sets and novel methods, it is important to assess the sensitivity of the results to small changes in input data or method that could not be determined a priori. As a neutral example, consider a study which can be carved up 12 different ways – and when you try each way, you get five conclusions of “A”, five conclusions of “Not A” and two indecisive. This shows a great sensitivity to input data and shows that no meaningful interpretation can be made.

The team have a tendency to cherry pick methods that yield “A”, and post up responses saying look, we can get “A” in all these different ways. That doesn’t help when trivial mods can also yield “Not A”.

That’s why I get suspicious with the way Gavin answers the question. They say “method 1 gives the answer A”. Steve asks the question, “there are some peculiarities, what about method 5?”, and Gavin answers with “method 5 doesn’t matter, we get the same answer with method 7”. This doesn’t answer the original question. Not to say there is a problem with method 5; but the result for method 7 is nonresponsive, and doesn’t address the sort of concerns outlined above.

Of course – the usual caveat applies. I don’t know the answer, there may be nothing wrong. But when I see the usual team tactics, I get suspicious. If it is true that a few more data points can swing the West Antarctica trend one way, it may be the case that a few more points could also have the opposite effect.

Congratulations Steve. This is a data analyst’s nightmare, made so much worse by all the hype around it. This is why you eyeball everything, then get the same result from different approaches. Not just feed it into a Mannomatic muncher. This is looking like withdrawal of the paper could be on the cards. I’m remembering Steig’s comment at CA. Who’s going to get the last laugh now? It wouldn’t hurt these guys to get out of RC and spend more time on the blogosphere improving their communication skills.

Sorry to post this to the wrong thread – but is there any reason why comments are closed on the Antarctica warming thread? There is a fascinating discussion on at Climate Audit on this very topic, which is casting serious doubt over the credibility of the Steig study. It would be nice to be able to ask a few questions.

[Response: They were turned off last week since Eric is off to Antarctica. They are back on now. However, I wouldn’t get too excited about the current discussion (see figure S4 in the supplementary material). – gavin]

It seems that an error of this magnitude on one station could matter — as there are so few stations involved. I’m interested to see if this develops further. Or, if it is dismissed as having minor import.

Didn’t Steig write — implying that the trends of the station data were unimportant to his methodology and his findings. If so, would be interesting to have the fine details of his methodology so his confidence could be independently justified.

I wouldn’t get too excited about the current discussion (see figure S4 in the supplementary material) – gavin

In their commentary, they say:

In the full reconstructions, we used the complete weather station data (1957-2006) from all 42 of the READER database locations listed in Table S2.

One problem with this sentence is that there are 46 locations in Table S2 – not 42. There are 42 surface stations and 4 AWS stations. Did they use the “complete” data from S2 (46 series) or the subset of 42 stations? With code, you could check. Given that recon_aws is a 100% “reconstruction” of “Harry”, it seems certain to me that “Harry” was used in the reconstruction – regardless of any proposed interpretation of these sentences.

This figure also describes results from a 15-site version. Unfortunately, Steig didn’t archive those results. But I’ll take a look at the 15 sites in question.

It would be simple enough for Steig et al to end speculation about the effect of this gross error: archive their source code so that people can run it to obtain the reported results and re-run it with correct data. If they’re confident that the error doesn’t “matter”, why not?

Using the smaller 15-site subset, the only West Antarctica site (excluding the peninsula) appears to be Byrd. Its reconstruction had a small positive slope of +0.12 deg C/decade. Hard to explain a drastically warmer West Antarctica using only that dataset.

Don’t forget, the satellite (AVHRR) reconstruction was represented as the principal basis, and not the AWS network. Let’s see how the new circumstances are explained…or not.

Anyone notice how Gill is a fast mover? Any bets on how long it is before Gill’s fleet-footed movement by miles is cited as further evidence of Global Warming speeding the movement of the ice sheet and not accelerated ice sheet growth from cooling?

I didn’t know Steig’s article made the cover of Nature. I saw the cover photo on Steig’s webpage. Wow. Those guys at Nature are really going to hate you! You keep embarrassing them by finding errors in their articles. Keep up the good work, Steve.

Judging by some of the data selection criteria and the quick response of the British Antarctic Survey, someone’s going to have a red face. Warming must not be too important an issue to these scientists since the data from 60 some stations can’t be kept together. Jeez, how hard is it?

“Here, we use an additional masking technique in which daily data that differ from the climatological mean by more than some threshold value are assumed to be cloud contaminated and are removed. We used a threshold value of 10°C, which produces the best validation statistics in the reconstruction procedure.”
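For what it’s worth, the masking rule quoted there amounts to a one-line filter. A hedged Python sketch, with the data layout entirely assumed (obs maps day-of-year to a list of (year, value) pairs; clim maps day-of-year to its climatological mean):

```python
# Hypothetical sketch of the quoted rule: drop daily values more than
# `threshold` deg C from the climatological mean for that calendar day,
# treating them as cloud-contaminated.
def cloud_mask(obs, clim, threshold=10.0):
    return {doy: [(y, v) for (y, v) in vals
                  if abs(v - clim[doy]) <= threshold]
            for doy, vals in obs.items()}
```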

O.K., broken record by now, but I too, Mr. McIntyre, like and marvel at your work, as always. Hey, since I am after all a contributor to the tip jar, and am not a climate scientist (Hey, y’all, hold down the snickers there!) or a statistician, I have one request: Would you please be so kind, when you introduce in a post an acronym which is not listed in the site’s acronym list, to write it out fully before utilizing it as a stand-alone acronym? This would be a great help to readers such as myself. If you would be so kind.

Hmmm, The 15 site subset uses Byrd surface station data but not the Byrd AWS data according to Table S2. The 0.12 deg/decade was from the Byrd AWS reconstruction. Is the Byrd surface station trend more extreme? Doesn’t seem like it should be.

It seems apparent that what the world needs is some kind of trusted source of raw data from which to work. There seem to be many different variations of the “same” data and this is yet one more instance of several that reveals problems in the quality of the raw data. Incorrect data results in a waste of effort as researchers’ conclusions are found to be incorrect no matter how much care taken in analysis.

It would be a fine project for some major university to undertake the archiving and quality assurance of a (more) trusted bank of raw data. Such a project, if done objectively without “agenda”, could raise the institution’s prestige and provide teaching opportunities in areas of handling and archiving data.

Government agencies are probably not the best place to handle the quality control and archiving though they would be vital in the raw data collection process. The spotting of errors and creating a standard archive from data input from the several nations of the world would best be done by an academic institution or even a collaborative effort between several such academic institutions.

Until there is some integrity of and trust in the underlying data, all the climate research in the world is potentially a waste of energy in addition to being a potential source of embarrassment to those using those data sets. There is an obvious need here that no one government can undertake themselves. I would suggest a global climate data clearinghouse sponsored by the many great houses of research, possibly funded with grants from the various nations, with the data available in a standard format to anyone on the planet.

It seems apparent that what the world needs is some kind of trusted source of raw data from which to work. There seem to be many different variations of the “same” data and this is yet one more instance of several that reveals problems in the quality of the raw data. Incorrect data results in a waste of effort as researchers’ conclusions are found to be incorrect no matter how much care taken in analysis.

Well, no argument. See A. Watts for US sites. But, with all the money going into the Antarctic project (see all the previous references), and the problems with getting reliable installations as cited on the U of Wisc. descriptions, how can that be accomplished? Since Anthony shows that many easily accessible domestic sites are (snip) then how can essentially inaccessible snow bound sites be done better?

My earlier request for installation information still stands, but it is likely information which the equipment constructors and installers hold close. Photos of the equipment are only partially useful. Some specific things I would like to know: How are the guy wires or chains anchored, and what base is used? How is the equipment height above present snow surface decided? How many years are planned between visits to raise the equipment? Or not?

Although it won’t answer all of your questions, it may answer many and lead to sources who may answer more. See:
Antarctic Automatic Weather Stations Project

I have pursued most of the references in that and the U of Wisconsin site, and none address the mechanical and electrical design and construction. I have purposely avoided direct email, as the information is principally to satisfy some curiosity about how it is done now compared to “my day”, since I know they have a lot better things to do with their time. I had hoped someone knew of some other place where the specifics might have already been published. IEEE Journal, ASME publication, whatever.

#5 Steve, this blog IS about motives. Do you think you have so many readers who are interested in pure statistics? All the readers here think there is a motive. They themselves have a motive. You have a motive too.

Steve: so what. I ask that people don’t discuss motives here. If you want to do so elsewhere, that’s your right.

Francois, I disagree. This blog is about competence. Anyone with the motives one can presume for the “AGW Team” can still do good science if they are competent. On the information to date it appears in this case they are not.

That is to the extreme discredit of all six authors, the reviewers and the journal which published it.

It seems a standard line of argument: None of the errors matter because the result doesn’t change when you make gross errors in the input data. But that is very troubling for the nature of what they are doing. Why include data that is irrelevant to the final result?

If they claim that f(a,b,c) = a+b, then why include c in the arguments? Indeed, sometimes it seems worse: f(a,b,c) = “ex ante determined constant”. Sounds like that old trick where you tell someone to pick a number, any number, and then go through a process of arithmetic operations until you magically come to the answer of 4.

I know this question goes a step beyond what Steve has found, but what is the protocol if there really is a significant flaw in the Steig et al paper?

Should the authors withdraw the paper (something similar happened last year in a maths paper Steve referenced, but that may have been published electronically), and if so, what is the mechanism?
Would scientists simply not reference Steig’s paper knowing it was flawed?
Would nothing happen until Steve wrote a paper refuting the original and his paper completed peer review and publication?

Re: Fred Harwood (#74),
I meant “significant flaw” in the sense that it “matters” and has some effect on the conclusion – and that the authors agree with this. The question isn’t really specific to the Steig paper, so let’s put it another way:

What happens if someone points out a mistake that invalidates the overall findings of a paper? Should the authors take action, or does everyone wait for a peer-reviewed counter-paper?

Re: AndyL (#87),
In general, a paper that is shown to be incorrect should be withdrawn by the journal and can’t be cited as a reference paper for other work.
Team papers are different though. Steve hasn’t actually shown that the paper’s results are invalid, and he won’t be able to do this until he has enough data (i.e. the code and data sets used) to reproduce the results both with and without Harry. In the meantime the team can work on another method of producing the required results so that if/when it is invalidated they can immediately respond with “No it isn’t. If you do this, that and the other then you get near enough the same result.” At which point the cycle begins again.

Re: AndyL (#87), In some fields, if it turns out that a result is a complete artifact the authors withdraw the paper. It happens in biochem., medicine, physics. I have never heard of it happening in climate work, but maybe there are a few.

RE 57. Yup, we all have motives. How do we eliminate motives from results?
Publish your data is a good first step. Publish your code is a second good step. Pick critics as reviewers before publication is a good third step.

[Response: People will generally credit the person who tells them something. BAS were notified by people Sunday night who independently found the Gill/Harry mismatch. SM could have notified them but he didn’t.

“Independently” – puh-leeze. This error has been sitting at BAS for months. It’s been sitting in GISS versions. It was missed in Steig et al. I observed yesterday (Sunday) that there’s a problem with Harry, and someone else “independently” noticed a problem with Harry that night. BS. Their knowledge of a problem with Harry derived from the issue being raised at Climate Audit. And they were in a hurry too. They didn’t deliver the message by snail mail or FOI. They got right on it.

Gavin’s comments certainly do not help, but I genuinely think that some type of rapprochement is in order. I think we should dial down the snark and focus on the data and analysis. SM’s primary point is right on target: post the data and the code in a spirit of scientific cooperation. I think the paper was an interesting effort to address a significant “anomaly” in the AGW story. I do think it is particularly incumbent on those proposing a novel or breakthrough result that they provide the basis for others to replicate their analysis.

As for Gavin’s reference to the Supplementary information, I am intrigued by the three outliers in Table S1, namely, Butler Island, Elaine and Larsen Ice Shelf. Butler Island is particularly odd given its strong trend but relatively low correlation coefficient.

Re: MarcH (#94), MarcH — Nice post. Now I am really confused, as are others, as to why these AWS stations are included. The only kind of silly reason I can think of is that the increase in dfs somehow helps the overall significance tests.

If Steig and his crew had pre-peer reviewed his work the way Loehle did here at Climate Audit, none of this would have happened. The moral of this story is that if you pre-peer review new science papers on websites like Climate Audit, you are much less likely to publish error-riddled papers.

Great work Steve and excellent synopsis Ross. You guys are like Batman and Robin or Sherlock Holmes and Watson or…

This entire episode just reinforces my sceptical opinion of data handling, storage and -snip- in the climate science community. Errors happen, of course, but it’s just so ridiculous because we keep seeing this sort of déjà vu all over again. Bad data but it doesn’t matter, right Gavin? Ugh!

Sure, one errant station out of 46 may not be a result killer, but Steig as primary author is ultimately responsible for the veracity of all the data used and so he should voluntarily pull the article and re-verify ALL sources before re-submission. Who doesn’t do this?

To all you bright analysts that contribute to this site and especially SM this is simply a note of huge appreciation for ALL your extraordinary hard work. I expect that you all take tremendous pleasure from being able to clearly define the errors that you uncover.

Interesting thing about Byrd Station: it is on top of a significant hill. The hill is not natural; in the winter the buildings and structures slightly obstruct the wind, causing snow to drop and form a drift. In the austral spring, when the field personnel returned, it was often the case that the structures (canvas-and-wood Quonset huts called Jamesways) were pretty much fully buried. They were dug out and re-set on the snow surface or, in some cases, left buried and replaced with new structures.

There is a drill rig at Byrd like those you see operating in an oil field, just about completely buried with only the top portion exposed above the snow. It was used to collect an ice core that you sometimes see referenced in climate studies.

There are definitely local meteorological effects as the wind-driven air rises to clear the obstruction-generated hill. I do not remember where the weather station is with respect to the hill, but it seems possible to me that there might well be a small localized temperature effect, especially when one considers decadal trends of a few tenths of a degree C. I have no idea of the sign or the magnitude of such a hypothetical effect.

The obstruction-created hill is not unique to Byrd; there is something similar at Siple Dome, though not as pronounced, if I remember correctly. The camp at Siple Dome is not as old as Byrd, so there has been less time for the effect to accumulate. Byrd is semi-active, to my knowledge: I don’t think it is regularly staffed even in the summer, but is used only when there is a scientific justification to fly in people and supplies.

As is the case with many formerly active deep-field camps, there is an emergency food and fuel cache, which makes the camp an attractive place for continued occasional use. If you are going to be stuck somewhere in the deep-field Antarctic, it is reassuring to be somewhere that has a couple hundred gallons of jet fuel, some old structures and a couple hundred pounds of perhaps decade-old food in the general area.

I always enjoy reading the stuff on this site, even though most of it is over my head. Fubar, or fu..up beyond all recognition, brings back memories. It reminds me of CF, not to be confused with climate forecast. We used CF a lot in Vietnam, which stood for cluster fu… It’s doubtless coincidental, but I find it curious that a major climate crisis always occurs during wars of choice; global cooling during the Vietnam war and global warming during the war on terror. It’s probably just me. But it is more fun to rock out in London at the concert against global warming than hump the desert in Darfur.

Your service to the climate analysis community is invaluable. Are you ever asked to review data heavy papers like this in advance of publication?
Steve: I was asked to review Burger and did so. I was asked to review Wahl and Ammann 2005 (and did a detailed review posted elsewhere here.) They terminated me as a reviewer. I was asked to review a Mann submission to Climatic Change in 2004 and asked to see supporting code and data; the editor Stephen Schneider said that no one had ever made such a request in 28 years. I said – so what, I am. He said – it will need a new policy from the editorial board. I said – ask them. They agreed that I could get supporting data, but not code. So I asked for supporting data, which Mann refused to provide, and the paper never saw the light of day. However, by that time, Mann had made a check-kiting reference to the paper in Jones and Mann 2004 (and could then use the kited reference.) But I haven’t been asked to review anything in 3 years.

#65: you are absolutely right about the problem of redundant data in Team studies. When they brag about how they can get the same answer after chopping half or more of their data off they are really bragging about how they manage to load all the weight on a few series. This was clear when looking at the hockey stick data set. 95% of it is there for show, the conclusions all hang on a few bristlecones. They could put in white noise for the rest, for all the effect it had.

“I submitted that figure because it is a beautiful rendition summarizing our results. Not only was it not intended to mislead anyone, I don’t agree that it is misleading (except perhaps to those that want to make it so). It’s a pretty picture for heaven’s sake. Don’t read more into it than that.”

I must say it’s a pity that the SI isn’t clearer. I’ve read it several times and it’s very hard to work out exactly what they’ve done. If Harry, Siple, Mt Siple and Byrd AWS weren’t used (i.e. not part of the 42) then why are they in Table S2 (and Figure S4) at all?

I was wondering that myself. I was also wondering what the purpose of the subset test was. Does using less data somehow result in a better reconstruction? Seems backwards to me.

Also, regardless of which test you look at, the maps look rather dissimilar from the ModelE panels in Fig. S5. To the untrained eye, this would appear to indicate (assuming the tests are accurate) that Antarctica is not behaving as the models predict.

Steve, Thanks for the clarification re my overlay graphs on post 54 of West Antarctic, which you have asked to be continued here.

I noted that the period 2002-2006 or so (eyeballing the dates) was also essentially identical for Byrd and Harry. Is this also outside the recording period of one of the stations and so also a reconstruction? Guess I was confused seeing near-identical data given different location origins.

Is there any particular data I can help with given that Australia claims sovereignty to some 40% of the Antarctic continent and conducts active research?

I noted that the period 2002-2006 or so (eyeballing the dates) was also essentially identical for Byrd and Harry. Is this also outside the recording period of one of the stations and so also a reconstruction? Guess I was confused seeing near-identical data given different location origins.

That’s pretty much it Geoff. The records are

Byrd (station) 1957-1975
Harry 1977-2002
Byrd (AWS) 1980-2008

Presumably there’s a touch of Siple, Siple Dome and maybe Russkaya in the mix, but they are much further away. Most of the infilling will be between Byrd & Harry, as your graphs show quite eloquently.

Steve’s comment that “RegEM is just a complicated way of assigning weights” is a helpful way to think about the method.
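That “weights” view can be made concrete with a toy sketch. To be clear, this is not Steig’s actual RegEM code: the station names, coefficients and noise levels below are invented, and ordinary least squares stands in for the regression step that EM-type infilling methods iterate. The point is only that the infilled values end up as a fixed linear combination of the input series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monthly anomaly "stations" (names and numbers invented for
# illustration): two predictors and a target series to be infilled.
n = 120
byrd = rng.normal(0, 1, n)
harry = 0.8 * byrd + rng.normal(0, 0.5, n)
target = 0.6 * byrd + 0.3 * harry + rng.normal(0, 0.3, n)

# Least-squares infilling: the infilled values are a fixed linear
# combination of the predictor series, so the "reconstruction" is
# just a set of weights on the input stations.
X = np.column_stack([byrd, harry])
w, *_ = np.linalg.lstsq(X, target, rcond=None)
infilled = X @ w

print("implicit station weights:", np.round(w, 2))
```

However elaborate the iteration, the result is still a linear combination of the inputs, which is why a single heavily weighted series can dominate a reconstruction.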

But James, I have been looking today at data from Harry for the period 31 Jan 2009 to 3 Feb 2009. It’s at 83 deg S, 121.39 deg W, station 8900. The temp and atmospheric data resemble nearby Elizabeth 21361 and Linda 21362. So when DID Harry stop operating, and why does its data appear to have been played with on my graph for 2002-2006?

Harry ranged between minus 19 and minus 6 degrees C in those few days of half-hourly reports. From this you can, over longer terms, pick a trend of 0.8 deg a decade or whatever? I very much doubt it. The noise is not small.

That overlay method is extremely powerful for discovering the hand of man and I’ve used it for several years, so it’s no longer experimental.

BTW, it is quite easy to make an error when converting longitudes from “E” of Greenwich to “W” expression if you do not have the hang of it.

But there are photos elsewhere of Harry being dug out in 2006 and records from Harry on the Net for 2009, so what is the basis for saying Harry stopped in 2006?

Given that Harry 2002-2006 is almost identical to Byrd, is this not in the same “error” category as Gill? Seems to me like someone is fiddling the results.

If you look at the transmissions from these bases yesterday (as I did), about 5-10% are not transmitting; others have flat temp responses suggesting burial. Who is the poor person assigned to sort/validate the data before serious use? Where does the quality control happen? If I took down daily the reports that are coming from the Antarctic, I would have no confidence at all that I could compose a paper of adequate standard to be submitted. So why is it going up on the Net? To confuse people?

#65, #100: I keep thinking the same thing every time I hear of a new “team” study that is supposedly insensitive to the data. If the data is so inconsequential then whatever algorithm they are using is likely biased to give the answer they want. The team doesn’t seem to realize that this is not necessarily a good thing.

Same goes for their arguments that climate model predictions are consistent with all kinds of behavior from warming to cooling. Again, this just shows that the models are weak.

The really sad thing about the errors upon errors is that you and I are paying for this incompetence, and the journals seem unable to catch these errors. Thanks, Mr. McIntyre.
As a plumber, if I were to make similar errors repeatedly, I would not be able to afford insurance; however these people, in my employ, are getting raises. I think it’s time for a layoff.
Mike Bryant

Re: Mike Bryant (#110),
Mike,
I could not agree with you more. The last time I looked at the Stimulus Bill, the global warming crowd is slated to get some $400 million disbursed from the same discredited agencies that have financed Steig, Mann and all the others. In the real world of plumbers, welders, mechanics and other craftsmen, these guys wouldn’t get a nickel after the performances they’ve turned in.

At the end of the day, it is apparent that GISS miscoding of the site locations is what led Steig astray.

I don’t think you should write a major paper like this which is based on so few stations without at least double-checking whether the locations were correct. The entire method depends on the locations of the stations being correct (or is it actually that the method depends on the locations being incorrect?)

We still don’t know whether Antarctica is cooling or warming but the results will continue to be quoted by the warmers forever anyway.

At the end of the day, it is apparent that GISS miscoding of the site locations is what led Steig astray.

I don’t think you should write a major paper like this which is based on so few stations without at least double-checking whether the locations were correct. The entire method depends on the locations of the stations being correct (or is it actually that the method depends on the locations being incorrect?)

Bill I think it’s clear that the incorrect location of Harry in S2 is a typo. It’s located correctly in Figure S4. Also Geoff’s graphs discussed above make it pretty much certain that the incorrect long/lats for Harry weren’t used in the analysis. Why do you think it has anything to do with GISS?

Mann’s earlier mislocations Maine/France and Spain/Kenya however, were taken into the analysis.

Since you did not notify them fast enough of problems in their data, I suggest that you pre-notify them of the errors in their next paper before it is even published. The probability of an error must be close to 1.

Perhaps the answer is that this is the data that Steve M. chooses to look at first – he challenges the conclusions of the paper by looking at the parts of it that may have controlled the conclusion. As many have said, this is the best argument for publishing the data and code along with the paper, and even having people like SM review the paper prior to publication. Currently, these papers seem to be peer-reviewed by “like-thinkers”, which, though consistent with peer review, does not challenge the innate biases of the scientists publishing the paper. Perhaps the method of peer review used in the past is now bankrupt in the internet age, and we need to do something like a “public” (open source?) peer review.

One more observation about Steig’s practices here. In his readme you’ll note that he merely points at the data he “used” and “points” at the RegEM code.
What’s wrong with this? Well, suppose I write a paper in 2007 and I point at GISS data that I used in 2007. Now it’s 2009 and my link goes to the current GISS data. As we all know, month-by-month GISS updates change the past. That is, in 2007 GISS has a set of numbers for 1880-2007. In 2009 that very same data set has different numbers for 1880-2007. So one can’t merely point at a data source; one has to supply the data set “AS USED” – that’s right, make a copy. Same with references to code: supply the copy of the data as used and the code as used.
Ideally, if people followed the principles of reproducible research, you could download their data and code, AS USED, re-run it and get the figures in the paper.

Ideally if people followed the principles of reproducable reasearch [sic] you could download their data and code, AS USED, and re run it and get the figures in the paper.

Not only does this sound like good science practice; the climate science community should by now realize that they’ve turned their science into an adversarial process by staking claim to positions bearing on very controversial public policy choices. In an adversarial process, such as where one testifies as an expert witness, turning over all of your work product and supporting evidence is par for the course. No one proffers their opinions as an expert witness in litigation unless they are prepared to have their work examined with a fine-toothed comb. Here, though, there is resentment every time SM or anyone else looks too closely at “the evidence” being proffered on these matters. It is trite, but if they cannot stand the heat, what are they doing in the kitchen?

It was impressively arrogant of Steig to make that comment. Like he’s King Dalek. – Doctor Who.

I requested help from Gavin for data and code again on RC; it was cut, of course, but that wasn’t the point. I wouldn’t be surprised to find the Antarctic warming over 30 years – how could it not be, with the rest of the world doing it? It would be nice for them to share the code and data so those of us who don’t accept paper titles and news articles as fact can know.

Pre-Notification:
Mann, Hansen, Steig et al,
In the future it might be a good idea to run your data, your procedures, your computer code, your statistical constructs and any ideas for studies or papers, through this website to garner comments from the statisticians here before you proceed with publication. This may save you from the need to correct, adjust, amplify or do nothing at all to save face…
Thanks for your attention,
Mike Bryant, Plumber

So the data for Harry has been “fixed”. No explanation from the owner of what was wrong, or even of what was changed in their fix. How does one know if any other data was “fixed” today? If one now looked at the rest of the station data it might all appear correct, even though yesterday, and when it was used for the paper, it was incorrect. Not sure one can even check the other station data at this point and conclude anything. The READER .txt files’ only date and time stamp is from earlier today. I assume there are other ways to look at the data that might indicate which stations other than Harry (if any) were changed this morning.

This idea of changing data in a “live” database, with no easily accessible change history, is fraught with issues.

Steve,
Reading how the comment is progressing on RC (RC comments 160-161), they are saying (to completely paraphrase) that the AWS data is just validation of the rest of the work: that the 42 manned stations are used for data, and data from the 4 AWS is just additional support.

I’d be very interested to know what you think about that. I know that the data error makes them look bad, but do you even accept their contention that the AWS data is mere validation/support for the main thrust?

At the risk of making you repeat yourself, what are the obstacles to you performing further analysis on the rest of their data?

“ant_recon_aws.txt” contains the monthly-averaged data from Figure S3.
The 600 rows are months, starting with 1957 (January) at the top, and
ending in 2006 (December) at the bottom. The latitude and longitude for
each of the 63 columns are given in the files “aws_lats.txt” and
“aws_lons.txt”.
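Given that readme layout, per-station trends are a straightforward least-squares fit per column. A sketch follows; a synthetic array stands in for the actual file here, and with the real data one would substitute `np.loadtxt("ant_recon_aws.txt")` (the file name comes from the readme quoted above):

```python
import numpy as np

# Synthetic stand-in for ant_recon_aws.txt: 600 monthly rows
# (Jan 1957 - Dec 2006) by 63 AWS columns, per the readme.
rng = np.random.default_rng(1)
recon = rng.normal(0, 2, (600, 63))      # fake anomalies, deg C

years = 1957 + np.arange(600) / 12.0     # decimal time axis

# Least-squares slope for each column, converted to deg C per decade.
A = np.column_stack([years - years.mean(), np.ones(600)])
coef, *_ = np.linalg.lstsq(A, recon, rcond=None)
trend_per_decade = coef[0] * 10

print(trend_per_decade.shape)            # one trend per AWS column
```

With the real file this would show directly which of the 63 reconstructed AWS columns carry a pronounced 1957-2006 trend.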

I wonder what the selection criteria were?

I have had a flick through Antarctic AWS data from the last couple of days. About 10% of it is out of action and the rest wanders all over the place. Some stations record practically no change over about 100 temperature data points, while others have swings of 20 deg C. And from this we can extract a 0.1 deg C trend? Doubtful. I’d imagine that the reported trend is less than the instrumental calibration/drift etc.

In older times, we would include information as to error sources and magnitudes, these often being half the paper in explanation.

Conclusions
Observed temperature and precipitation for Antarctica show no statistically significant trends overall.
IPCC AR4 models have approx. the right snowfall sensitivity to temperature, but warm way too strongly in relation to observations. The cause is not related to the leading mode of atmospheric variability, the Southern Hemisphere Annular Mode. An anomalously strong coupling between water vapor and temperature is implicated that overwhelms the actual dynamical effect.
Obviously a lot more research is needed to isolate the precise cause among the models.
Does raise flags regarding the reliability of future projections of climate change over Antarctica.

Clearly the Steig paper set out to dispel concerns regarding the reliability of future projections of climate change over Antarctica.

Faraday and Rothera Point are two Peninsula stations. These longstanding (WW2 to present) stations are key to calculating the magnitude of the reported Peninsula warming. I’ve been looking at their data and, while I’ve noted no hairy problems, I did come across something I find to be interesting.

Here are plots of their monthly average temperatures for two periods. The first period is 1946-1975 and the second is 1976-2008:


I think it’s a good (though often futile for me) exercise to attempt to create physical explanations for such patterns.

In these plots I see divergence in the months (January to September) in which I’d expect there to be nearby open water. In late winter (September onwards) the once-open water has refrozen and the difference approaches zero.

I’m quite interested in anyone’s alternate interpretations. The reported Peninsula warming indeed seems to be ocean (open water)-driven and, presumably, the ocean warming is AGW-driven. Frankly, though, I expected to see a larger AGW impact on the all-ice late-winter period. And, there’s always the possibility that natural ocean current movement of ice might play a role in the reported warming.

The warm PDO waters allowed the atmosphere to stay slightly warmer due to a delay in the freeze start date. The actual water temperature during the periods would have to be checked to be sure.

The close temperature correlation between the warm and cold PDO during October-December seems to indicate the effect of having the ice/melt-water layer in contact with the atmosphere since that layer stays around freezing until the ice is fully melted.

Too bad they don’t have a data set that indicates the actual temperature of the ice when fully frozen. The data sets I’ve seen just plug in -1.8c to indicate ice. I’m sure the ice temperature must continue to drop lower in those latitudes.

I guess they haven’t figured how to get that type of information yet. Might make a good project for some sharp young scientist. I think accurately defined boundary layers would be important.

I’ve been blasted at RC for the temerity to suggest that Harry was included in the full reconstruction. Apparently it was not. But the fact that Harry was not included in a subsequent reconstruction makes it all tickety-boo. So why was Harry included in Table S2?

“The first AVHRR was a 4-channel radiometer, first carried on TIROS-N (launched October 1978). This was subsequently improved to a 5-channel instrument (AVHRR/2) that was initially carried on NOAA-7 (launched June 1981). The latest instrument version is AVHRR/3, with 6 channels, first carried on NOAA-15 launched in May 1998.”

Given ‘the team’s’ criticisms of the calibration problems, due to orbital drift etc., of the satellites used for the UAH and RSS temperature series, what are the implications for Steig’s reconstruction of its reliance on three separate, differently constructed instruments sequentially covering the period of record?

Well done Steve for the work and, Re: Ross McKitrick (#16), well done Ross. I know you’ve asked to keep it technical, but this episode, plus the GISS issues last year, deserves a much wider exposé than just this blog. The Team has been so badly caught out – isn’t this worth a story? And this is not being snarky – the core of this matter is the disingenuousness of the way that they conduct themselves, e.g. denying that it was Steve and CA that found these things and instead attributing them to “independent” sources. And following on from this is the way that they conduct their science. It needs ferreting out as much as the technical aspects of their science. What is at stake is ending their nonsense and seeing better climate science.

So Gavin knows who notified BAS. Could that bit of data be placed on line or in some recoverable source? Such indefatigable research ought to be rewarded. We need more such tireless sleuths. Don’t be chary of sharing, please.
==============================================

I would have expected as a minimum standard that both the data and the code were managed within some sort of version control repository. It then becomes a trivial matter to supply both code and data as at any given point in time, as well as supporting internal analysis.

I just posted the following to RC to counter their current excuse for not cooperating with Steve’s efforts. I don’t expect it to survive their “stringent testing” as the post seeks to correct the misimpressions that they are leaving regarding audits.

Eric wrote: “A good auditor doesn’t use the same Excel spreadsheet that the company being audited does. They make their own calculations with the raw data. After all, how would they know otherwise if the Excel spreadsheet was rigged?”

Utterly wrong.
Auditors have access to all the company financial information, including intermediate values across subdivisions and inclusive of budgets, forecasts and sales revenue. They rarely run the whole year’s financial data through again; that would be ridiculously expensive. Instead they do a series of tests against the accounts based on subsets of data. It is because of this that they need exactly the intermediate information that has been asked of climate researchers. The data used must (by law) be archived, not subject to the vagaries of external sources. The company directors are liable for the veracity of the data and its maintenance, punishable by imprisonment. No decent-sized company uses “spreadsheets” for its accounts; unlike in some “sciences”, changes to the data need to have an audit trail that survives for 7 years.

Well, I’ve inquired further at RC but I’m not sure I’ll get a response. I first asked if Harry was part of the reconstruction and Gavin said it was. Then Gavin said he “misspoke”. Eric then turned up to say it was clear as crystal (though apparently not to Gavin or me) that Harry was not part of the reconstruction.

So why is Harry included in Table S2 at all? And you could ask the same about Byrd AWS, as Byrd Station only extends to 1975. Byrd AWS is in the same category as Harry (one of the four records supposedly not included in the reconstruction).

I may be missing something, but it looks like the West Antarctic reconstruction relies on one station (Byrd) 1957-1975, in an area about the size of Western Europe. I’ll report any traction at RC.

The southern ocean temperatures around Antarctica have huge variability on short time scales, as well as definite up and down cycles on longer time scales.

Over time, there has been an increase in southern ocean temps, not unlike the rest of the oceans, but in most areas, there has been a down-swing since about 1975. This should have been reflected in the land and atmosphere temperatures.

The Peninsula area is one of the most active regions on the planet where energy is being exchanged between the oceans and the atmosphere. The Weddell Sea next to the Peninsula is the most active down-welling region of the deep ocean circulation system – where the warm surface water cools and sinks to become part of the deep ocean circulation.

The southern ocean influence could easily cause cooling in Antarctica over any time-scale despite the global warming influence. So I think we should just get these temperature trends right and there is no reason to expect that Antarctica will always be warming or always be cooling.

Dan, this admission of past mistakes and the way to correct them should be applauded. Trace analysis is difficult, and laboratories can lose reputation and business very easily if they are found to be in error. In the 40 years or so that I mucked around with labs, I was never confident in the results, especially those from my own. That remains my present attitude to trying to wring too much out of oxygen isotope ratios, a far more difficult task than the coal example you give.

Labs habitually overstate their competence. Years ago, when the first moon rock came from Apollo, I did a comparison of the 20 or so “top” labs that analysed them, and each was optimistic compared to the mean. So GCM errors are déjà vu. Different actors, minor script variation, same theme.

Information (or misinformation?) posted at BAS is changing by the minute.
You commented that they erased the incorrect data without keeping a copy.
Twenty minutes ago, the old data was there, with the statement

“The incorrect data file for Harry temperatures is available here”
and a link to it.

Now, it says

“The incorrect data file for Harry temperatures will be made availabe once they are recoved from backups”

So, allegedly, in the last few minutes someone has deleted the file rather than just moving it.
And there are now three typos in consecutive sentences.

It is probable that the work on Byrd/Harry that I graphed, taken further, would have turned up Gill rapidly. Then others reported and there was no need. The important point is that it was Steve’s observation on CA that started several lines of investigation going worldwide. Full credit should be given to Steve. I would not have gone looking if he had not made his original post.

There are also some calibration points at 0 deg C transmitted, and one station showing a flat line, no noise visible, as in failure mode. I have also seen a 444 number used as code for something unstated.

#151. Right from the outset, I made a point of putting code online so that calculations could be understood. Over time, I’ve raised my own standards by trying to make code “turnkey” as opposed to “reference” i.e. using code referring to online data, placing data online as necessary (often my collations of other data.) Some scientists wring their hands about changing data – well, as I’ve mentioned on many occasions, there is a long history in economics of the same problem. The procedure is pretty simple: you freeze a copy of the data as you used it (documenting when you downloaded it). You put the frozen copy online (citing the original) and use the frozen copy in your code showing where readers can substitute the fresh version if they want. There’s no need for hand wringing; it’s simple.
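The freezing procedure described above really is simple; a minimal sketch follows. This is a hypothetical helper with invented file names, not anyone’s actual archiving code: it stores a dated copy alongside a checksum so the data “as used” can be cited even after the live source changes.

```python
import hashlib
from datetime import date
from pathlib import Path

def freeze_copy(src: Path, archive_dir: Path) -> Path:
    """Store a dated, checksummed copy of a data file so the analysis
    can cite the data "as used" even if the live source later changes.
    (Illustrative sketch; in practice src would be the file just
    downloaded from READER, GISS, etc.)"""
    archive_dir.mkdir(exist_ok=True)
    frozen = archive_dir / f"{date.today().isoformat()}_{src.name}"
    frozen.write_bytes(src.read_bytes())            # the frozen copy
    digest = hashlib.sha256(frozen.read_bytes()).hexdigest()
    frozen.with_name(frozen.name + ".sha256").write_text(digest)
    return frozen

# Example with a toy data file (contents invented):
Path("harry.txt").write_text("1979 -28.4\n1980 -27.9\n")
copy = freeze_copy(Path("harry.txt"), Path("frozen"))
```

The script then reads the frozen copy, with a comment telling readers where to substitute a fresh download if they want.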

In addition, I’ve never regretted any occasion where I’ve spent the few minutes needed to tidy code to make it workable. I’ve got a lot of topics on the go and it’s easy to lose track of calculations. That’s one reason why I increasingly insert code even to blog posts.

The same thing for readers. Roman inserts code and it makes his calculations FAR more interesting to me than calculations sent by readers from a spreadsheet that might be just as interesting but where it’s hard to understand the method or what the calcs mean. I really encourage readers who’ve worked hard on a graphic or figure to include the code.

As a poster who has in the past put up analyses at CA without R documentation, I will readily concede that I agree with Steve M here. Most of my past analyses were posted with the intent of interesting others, not as any official treatise on the subject matter. I did a number of them to satisfy a personal curiosity about topics that I impatiently felt others were not covering.

I am currently learning R and, as a matter of fact, I downloaded a 50m file from Steig’s directory into R. The readme file said it had 600 rows and 5509 columns, but the download had different dimensions. Anyway, I would not know where to begin to fit a file of those dimensions into Excel. The download with mode = "wb" went quite fast compared to what I would have expected from copying and pasting it into Excel or Notepad.
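Checking a download against a readme's claimed dimensions doesn't need Excel at all. In R, `dim(read.table(...))` does it directly; here is an equivalent Python sketch (the 600 × 5509 shape below is just the readme's claim, which I have not verified):

```python
def file_shape(path):
    """Return (row count, sorted distinct field counts) for a
    whitespace-delimited file. A rectangular matrix gives exactly
    one field count."""
    rows, widths = 0, set()
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:  # skip blank lines
                rows += 1
                widths.add(len(fields))
    return rows, sorted(widths)
```

Against the downloaded file one would compare `file_shape(...)` with the readme's claimed `(600, [5509])` and see immediately where they disagree.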

I am curious if others here have downloaded the ant_recon.txt file and plotted the 1957-2006 temperature anomaly time series. I admit to showing more impatience here and particularly when knowing full well that others here have busy schedules and different priorities than I do.

But, with all the money going into the Antarctic project (see all the previous references), and the problems with getting reliable installations as cited on the U of Wisc. descriptions, how can that be accomplished?

Perhaps I miscommunicated the scope of what I was suggesting. Rather than complain about individuals or motives, the point was to suggest a process that makes it more difficult for these kinds of errors to appear in the future while at the same time, providing a mechanism for a real-world teaching opportunity for quality assurance and archiving sort of like a university teaching hospital provides a place for doctors to learn while also serving the community.

Such a clearinghouse/archive, as I had envisioned it, would not be responsible for the data collection. But what they could do is notice when a particular reading from a station was completely out of line with what could be expected and flag it, while notifying the collecting agency of the potential problem. Also, if data from one source (say, for example, NOAA) is missing values for a particular station but those values are available for the same station from another source, the missing values can then be filled with real observations and not calculated. They would not “adjust” the data in any way but would attempt to build a more complete and more accurate data archive, quickly spot problems such as readings carried over from a previous month or other data integrity issues, and alert the originator of the data to the problem.

Additionally, they could maintain an archive of this data in a standard format that all could have access to. So when Professor Bill, Dr. Bob, and Mr. Sam perform some analysis of the data, they are all working from the same standard source. It then becomes much easier for a third party to attempt to recreate the result, because that party would have access to the same data set and there would be no ambiguity over which “version” of the data was used.

The notion would be a quality-controlled, more complete, more trusted source of data that researchers, educational institutions, and other interested parties could access, leaving them less likely to be embarrassed by publishing results that turn out to be incorrect because the underlying data are incorrect.

The idea was to propose a positive solution rather than simply complain or speculate on motives.

No one likes to be corrected. And the natural tendency, unfortunately, is to attack the corrector. “Sure… maybe I made a little mistake, but you were wrong in the way you pointed it out!” The corrector is accused of being too loud or too soft, too slow or too fast, too vague or too picky, too condescending or too overbearing, etc., and somehow this blaming of the corrector makes the error-prone soul feel better about himself and allows him to continue to wallow in his mistakes.

My life experience is in the law. I’ve stumbled onto this AGW debate only lately and find it fascinating.

When I practiced criminal defense law I noticed that the best police officers were determined to “sell” their charges, not just to the courts but also to the accused. Charges being essentially “corrections”. The good officers would use all the techniques of a great salesman to convince the accused that, not only was this charge right, it was for his best good. However, despite their best efforts, the guilty often resorted to complaining about police tactics rather than focusing on dealing with their own faults.

In this spat about corrections to data we find the same response. Instead of being contrite or apologetic about having made an error, the writers of the paper are annoyed and angry at the person who discovered it. They minimize the error and vilify the finder (all the while refusing to give him credit for the very thing they are angry at him about). Logic and good sense have been pretty much lost. It’s a scenario we see a lot of in criminal courts.

Being a “good” police officer who can make useful and lasting corrections is a tough job. It’s one thing to find the errors and quite another to convince the error-prone that they need to change. That requires almost infinite tact and patience mixed with tenacity and a genuine love of the ones needing correction. If they sense you don’t like them, they almost never accept your correction.

I expect I will continue to follow these discussions until I am convinced one way or the other about AGW. In the meantime these spats that generate more heat than light do have a train wreck like entertainment value. What lawyer doesn’t enjoy following a good argument?

…Instead of being contrite or apologetic about having made an error, the writers of the paper are annoyed and angry at the person who discovered it. They minimize the error and vilify the finder (all the while refusing to give him credit for the very thing they are angry at him about). Logic and good sense have been pretty much lost…

Aside from the seemingly unknown question of whether the sensor was buried, and if so when, the real issue here is something along the lines of the original data having combined two stations. That the Harry AWS stuck out as a pronounced trend and a highly leveraged series means anyone checking their results should have come upon this issue, especially given that Table S2 shows the AWS for Mt. Siple and Harry at the same location but with different data. Forget the different data; shouldn’t it be obvious that there aren’t two stations with different names in the same place?

And an error in multiple datasets used in at least one paper that’s been wrong for months, then found by others the same day it was posted here, well, to steal a word, that’s bizarre. Almost as strange as the difference between Steve posting the issue here where anyone can read it, versus directly notifying people who may have been somewhat uncooperative and distant in the past. Nothing like telling him not to contact them any more of course. That would be far too obvious.
…

SUMMARY: We are adopting rules requiring companies to provide financial statement information in a form that is intended to improve its usefulness to investors. Companies will provide their financial statements to the Commission and on their corporate Web sites in interactive data format using the eXtensible Business Reporting Language (XBRL). In this format, financial statement information could be downloaded directly into spreadsheets, analyzed in a variety of ways using commercial off-the-shelf software, and used within investment models in other software formats.

[note this is an excerpt from the summary of the rule and I reordered the sentences to make it a bit more readable]

Suppose I write a paper in 2007 and I point at the GISS data that I used in 2007. Now it’s 2009 and my link resolves to the current GISS data. As we all know, month-by-month GISS updates change the past. That is, in 2007 GISS has a set of numbers for 1880-2007; in 2009 that very same data set has different numbers for 1880-2007. So one can’t merely point at a data source; one has to supply the data set “AS USED” – that’s right, make a copy. Same with references to code. Supply the copy of the data as used and the code as used.

.
I am an engineer at a medical device company, and I can tell you in no uncertain terms that the FDA requires archiving the data AS USED and the scripts/code AS USED. There is no other effective way to audit. The fact that any scientist would argue with this is mind-boggling to me.
.
As far as RegEM goes, the figures I see in the supplemental information do not give me much confidence in the results. In the area of Antarctica where the 4 AWS stations (Harry, Byrd, etc.) are, the authors predict ~+0.5C/decade trend. Individually, none of the stations save the errant Harry are even close to that trend.
.
It makes absolutely no sense that they can claim the AWS stations provide verification. Fig. S4(b) really says it all (eyeballing the colors):
.
Byrd: Actual 0.12 Modeled ~0.40
Mount Siple: Actual -0.06 Modeled ~0.30
Siple: Actual 0.16 Modeled ~0.40
.
The reported correlation coefficients are 0.63, 0.58, and 0.70, respectively. If correlation coefficients > 0.5 yield such poor matches with observed trends, just how far off are the ones in the ~ 0.3 range?
.
And how is it possible to have a positive correlation coefficient for Mount Siple when the trends go in opposite directions?
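That last question is easy to answer with synthetic data: when a shared seasonal cycle dominates, two series can correlate strongly even though their trends have opposite signs. A sketch (the amplitudes and trends below are invented purely for illustration, not Antarctic values):

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mx, my = sum(x)/n, sum(y)/n
    sx = math.sqrt(sum((a - mx)**2 for a in x))
    sy = math.sqrt(sum((b - my)**2 for b in y))
    return sum((a - mx)*(b - my) for a, b in zip(x, y)) / (sx * sy)

# Two stations sharing a 15-degree seasonal cycle,
# but with opposite 0.002 deg/month trends
months = range(300)  # 25 years of monthly values
warming = [15*math.cos(2*math.pi*(m % 12)/12) + 0.002*m for m in months]
cooling = [15*math.cos(2*math.pi*(m % 12)/12) - 0.002*m for m in months]

r = pearson(warming, cooling)  # dominated by the shared seasonal cycle
```

Here r comes out near 1 even though one series warms and the other cools: the coefficient is measuring the annual cycle, not the trend.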

While Antarctica has mostly cooled over the last 30 years, the trend is likely to rapidly reverse, according to a computer model study by NASA researchers. The study indicates the South Polar Region is expected to warm during the next 50 years. Findings from the study, conducted by researchers Drew Shindell and Gavin Schmidt… appear in this week’s Geophysical Research Letters. Shindell and Schmidt found depleted ozone levels and greenhouse gases are contributing to cooler South Pole temperatures.

This study also contains the model warming map that seems to have become a self-fulfilling prophecy. It also has what looks like the only NASA record I would now trust, given my current understanding of the climate factors at work: a record I’ve never seen before, as it has been serially replaced with warmer and warmer looking versions (see Warming Antarctica by Paintwork).

By the way, after reading and rereading the supplemental information, I think the purpose of the AWS stations was to try to demonstrate verification of the RegEM output only. I don’t think their primary purpose was to be used as entering arguments for the calculation.
.
Dunno. Still not certain if that interpretation is correct.

By the way, after reading and rereading the supplemental information, I think the purpose of the AWS stations was to try to demonstrate verification of the RegEM output only. I don’t think their primary purpose was to be used as entering arguments for the calculation.

Now that Eric has confirmed that the 4 AWS stations were not used in the full reconstruction, my conclusion is the same as yours. It seems that the only data from West Antarctica that was used is Byrd (1957-1975). Why the subset analysis (also using Byrd) shows more warming is a mystery.

A bigger mystery is how RegEM finds all this previously unknown warming in West Antarctica based on one site (Byrd), when the record for Byrd is essentially flat (slope 0.12, reconstructed).

Indeed.
.
I’m also concerned about the correlation statistics they give. Note how the trend from the subset analysis does not match the instrumental record for any AWS site . . . yet the correlation statistics between the AWS and the analysis are among the higher correlations. In my opinion, that shows rather clearly that RegEM did not produce physical results. The fact that the output somewhat closely matches the manipulated satellite data is immaterial; the output does not match the surface instrumental record. How this got beyond peer review as a positive validation of the analysis is beyond me.
.
Also, when it comes to evaluating the predicted vs. actual trends, the RE/CE method they use (monthly ground instrument temp vs. RegEM temp without regard to time) is inappropriate. It is a meaningless statistic for validating a trend because the trend is so small compared to the natural variability of the data. The trend will be overwhelmed by the fact that in summer, both go up and in winter, both go down, and the magnitude of that cycle is far greater than the trend being investigated. What should be done is a regression of a time series of the residuals. That would be much more sensitive to differences in trend.
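The residual-regression idea can be sketched as follows: subtract each calendar month's climatology first, then fit the trend to the anomalies. The numbers below are synthetic, chosen only to show a small trend hiding under a large annual cycle:

```python
import math

def monthly_anomalies(temps):
    """Subtract each calendar month's mean so the seasonal cycle drops out."""
    sums, counts = [0.0]*12, [0]*12
    for i, t in enumerate(temps):
        sums[i % 12] += t
        counts[i % 12] += 1
    clim = [s/c for s, c in zip(sums, counts)]
    return [t - clim[i % 12] for i, t in enumerate(temps)]

def ols_slope(y):
    """Ordinary least-squares slope of y against its index (per time step)."""
    n = len(y)
    mx, my = (n - 1)/2, sum(y)/n
    num = sum((i - mx)*(v - my) for i, v in enumerate(y))
    den = sum((i - mx)**2 for i in range(n))
    return num/den

# 30 years of monthly temps: a 10-degree seasonal swing
# hiding a 0.01 deg/month trend
temps = [10*math.cos(2*math.pi*(i % 12)/12) + 0.01*i for i in range(360)]
trend = ols_slope(monthly_anomalies(temps))
```

On the raw series any month-by-month fit statistic is swamped by the 10-degree annual swing; on the anomalies the 0.01 deg/month trend is recovered almost exactly.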

My comment below was rejected at RC today. Maybe we as a community should officially ‘hire’ Steve as our auditor…

eric: “Science is not the same as business. The self-appointed auditors of climate science don’t seem to understand that science has a built-in-auditing system — the fact that by proving someone else wrong, especially about an important issue, is a great way to get fame and success.”

The built-in auditing system of science works great in the long run when there is plenty of time for the process to work. That time may or may not be available in the special case of AGW. Most contributors and commenters here at RC say that time is very short and very costly decisions need to be made very soon. If that is the case, then in my opinion, arguing against a business-type time frame auditing process is helping to delay decisions to implement mitigation policies.

eric: “You don’t get to be an auditor merely by launching a blog, and you certainly don’t publicly speculate about your findings before (or even after) you’ve done the analysis. Above all, you have to demonstrate competence and integrity, and the company you work with has to trust you, or they won’t hire you.”

Outside independent auditors are often not hired by the company being audited (IRS for example). Also, trust seems less important here, since no commercial secrets should be at risk. You may worry about unfounded accusations, but in the long run those will not matter (the built-in auditing system of science should take care of that).

I submit that SM has in essence been ‘hired’ by the community of interested stakeholders that want to see an independent audit of the science that is being relied on for the potentially very costly policy decisions that need to be made soon.

If you can suggest a better and still fully independent auditor, I would be very interested.

From what I’ve been able to understand (which I admit is less than half of the meat that’s been shown), there are basically two issues with the Harry dataset: 1) the location was incorrectly listed, and 2) the data, depending on when you scraped it, is either “old Harry” or “new Harry”, which is mixed with Gill.

And now BAS has erased “new Harry”.

Ok, if that’s correct, then something that Steig said over at RC confuses me. He said that the error is irrelevant. That Harry was only used as a double-check against the satellite data. If that’s so, then why did the satellite pass this double-check when the data was so far off?

The coordinates are 64.1S, 298.4E, but Steig doesn’t ID it by name. I can’t find these coordinates in either Steig’s SI or in the U. Wisc. list of active AWS sites. They are roughly at Two Hummock Island, NW of the Antarctic Peninsula and E of Brabant Island. There were two old sites in the vicinity, Racer Island and Bonaparte Point, but I can’t find their coordinates.

I have no clue from either the paper or the SI how this was constructed. But the dotted lines in Figure 2 in the text are identified as the averages of these series across all AWS stations in each region (63 in all). Included would be line 26, which has the coordinates of Harry. The trends from these series are also plotted as the red or blue dots in Figure S3.

I wrote to BAS and asked them to archive old Harry as it was the version used in a prominent Nature cover story. It is now available once again on their site, though the working directory now has the new Harry.

Note! Notification was received from Steve McIntyre on the 3rd February 2009 of incorrect values of 1.0 in the temperature record from Chatham Island for January 1988, March 1988, October 1988, January 1989, February 1989, May 1989, October 1989 and November 1989. These values have now been removed.

It seems that we have a pretty obvious new case of “Harry & Gill” here. For the period October 1989 – March 2004 (with many gaps) monthly temperatures range from c. -6 to +2 Celsius, which is quite reasonable for a maritime subantarctic site. Then for January 2005 to August 2006 there is an even scrappier record with monthly temperatures varying from -10 to -35 Celsius, indicating an inland site up on the icecap. Unfortunately there are no pressure data for this later period, so it’s not possible to determine the altitude of this site.

I’ve taken the precaution of saving the data, in case somebody should “independently” notify BAS. The data are not available at archive.org, since BAS has used robots.txt to block archiving, a rather strange thing for a scientific organisation to do by the way.

Actually the “anomalous” jumps in temperature for Racer Rock are quite easily explained. The record for 2005-2006 is obviously spurious and vastly too cold, but RegEM has filled in the missing months with fairly reasonable values, hence the wild swings up and down. This just illustrates that RegEM leaves the original data untouched when filling in the gaps.
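This is not RegEM itself (which infills by regression against other series), but the structural point is easy to show with a toy infill: any gap-filler that preserves observed values will swing wildly between its reasonable infills and the spurious cold originals. The numbers below are invented:

```python
def infill(series, fill_value):
    """Fill missing (None) months, leaving every actual observation untouched."""
    return [fill_value if v is None else v for v in series]

# A maritime-looking record with gaps, plus spurious icecap-cold
# readings that the infill leaves exactly as received
racer = [-4.0, None, -35.0, None, -3.5, None, -32.0]
filled = infill(racer, -4.0)
```

The filled record alternates between about -4 and about -35, hence the wild up-and-down swings: the infills are sensible, the preserved originals are not.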

The most interesting thing about the Racer Rock record is that it shows that nobody, neither any of the authors nor a reviewer can ever actually have looked at this particular record, it’s so obviously and blatantly corrupted. Presumably the “supporting data” were simply churned out by a computer and mailed to Nature unread.

For the period October 1989 – March 2004 (with many gaps) monthly temperatures range from c. -6 to +2 Celsius, which is quite reasonable for a maritime subantarctic site. Then for January 2005 to August 2006 there is an even scrappier record with monthly temperatures varying from -10 to -35 Celsius, indicating an inland site up on the icecap. Unfortunately there are no pressure data for this later period, so it’s not possible to determine the altitude of this site.

Note that the .html versions of these tables are more complete than the .txt versions, since the former include percentage coverage, plus numerous months, marked in red, where the percentage was < 90%. In the .txt versions, there are no coverage figures, and the red months are just missing. I haven’t checked which version the Steig team used.

Note! The aws data for Racer Rock since April 2004 has been removed from the READER website as the values appear to come from a different station even though they were transmitted on the GTS (Global Telecommunications System) as 89261 which the WMO (World Meteorological Organization) still list as being Racer Rock (4/2/09)

I’m not sure how much trust to put in this database. So far two of the stations looked at here had corrupted data. These two happened to stand out. How many more stations have errant data, but with a smaller magnitude? Seems like the owners of the data need to do a complete audit of their data and determine how these errors are occurring (and then fix the process so they don’t happen again).

RE 184, 185,
I just realized that each site has 5 temperature file links on http://www.antarctica.ac.uk/met/READER/aws/, for 4 different times of day, plus “All”, presumably their average. All 5 of them for Racer Rock now quit after March 2004. Was there really data for 2005/6 yesterday? I could have just misspoken, as I assumed was the case in #184.

I try to stick to the fact/scientific/statistical(non- alleged snark/Perry Mason) threads. I also have a number of computers open here at home with open threads to monitor. Sorry I missed the new thread(s). I was following up on various links on this thread, but was overcome with mirth when I followed a link to this revelation. ;-)

Re: Joe Black (#194), I was just letting you know… Not sure how long you’ve been following, but Team antics has become a sort of entertainment of its own around here over the past several years. Benny Hill had an amazing look that expressed surprise, awe, and disgust all at once. He had a script in which he discussed this – I can remember laughing so hard I cried, but I cannot remember exactly what he said when he did it. :)

Steve, you continue to surprise me. Well done.
.
Finding myself bemused by the entire Harry episode, I had to have a map to understand what’s going on. Here’s the map:

Fig. 1. GISS temperature datasets in West Antarctica. The Peninsula extends out of the picture at the upper right. The Amundsen-Scott temperature record is at the South Pole (bottom center). Gill is down toward the lower left corner, on the Ross Ice Shelf.

You may have to open this in a separate window to see it clearly. At first look, you might think that’s enough temperature records to have a shot at telling what the temperature in the vicinity of Harry might look like. So I figured I’d take a look at Harry and his ‘hood. I must confess, I busted out laughing, you truly can’t make this stuff up. Here’s Harry and his pals:

Figure 2. GISS charts. These are screenshots from the GISS website. Note the different timespans (x axis), as well as the different temperature ranges (y axis).
.
Hey, call me crazy, maybe we’re starved for humor here in the remote islands, but I find that hilarious. Erin has a three year record. Doug has four years, but they’re in two groups of two. Elizabeth has six years, but only four of them are together. Siple has two runs of three years and one of four years. There’s not enough data in the whole lot put together to draw any solid conclusion about whether it’s warming or not.
.
The reason I did all of this was to give a first cut at an answer to whether it is a difference that makes a difference, or not. After seeing the data, I believe it will make a large difference. I say this for a number of inter-related reasons.
.
One is that by local standards (Harry and the five sites around him), Harry is a star. He has the longest continuous record and the largest trend. So whatever Harry does will outweigh the other, more spotty records from the surrounding stations.
.
Next, by Antarctic standards, Gill is a fairly long record, longer than Harry. And if you squint at it in a certain light, it almost passes for complete. As such, in any estimation of area-wide temperature it will carry more weight than the shorter sections of the other records. These include the other records surrounding Gill (Lettau, Linda, Elaine and others, not shown) which are also spotty and with shorter contiguous runs. So both Harry and Gill start out the heavyweights of their respective local neighborhoods.
.
This is related to the next reason. This is that when Harry met Gill, Harry filled in a bunch of blank spots in Gill’s data record. So the new composite record contains more data yet, and thus has even more weight than Gill, the previous record holder. It also has a larger trend. The new Harry+Gill is now way, way larger than anyone in Harry’s neighborhood.
.
Next reason is that there are so few records. If this splice were one of a hundred records, the effect would be less. But there’s only a handful of records in the central part of West Antarctica. This means each one carries more weight, particularly the largest one.
.
Next reason it will make a difference is apparent when you look at the range of the temperatures. Gill goes from a high of -26°C (high?) to a low of -30°. Harry, on the other hand, has a high of -23°, and a low of -28°. In other words, Harry is a couple of degrees warmer than Gill. When you create the new largest record by pasting a two degree temperature rise on the recent end, it will definitely affect the result.
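The splice effect is easy to demonstrate with made-up numbers: join a flat series to a second flat series about two degrees warmer, and a positive trend appears even though neither piece has one. Nothing below is actual station data:

```python
def ols_slope(y):
    """Ordinary least-squares slope of y against its index (per time step)."""
    n = len(y)
    mx, my = (n - 1)/2, sum(y)/n
    return (sum((i - mx)*(v - my) for i, v in enumerate(y))
            / sum((i - mx)**2 for i in range(n)))

gill  = [-28.0] * 120   # ten flat years of monthly means at Gill's level
harry = [-26.0] * 60    # five flat years about two degrees warmer

spliced = gill + harry           # past from one station, recent values from another
step_trend = ols_slope(spliced)  # a positive trend from the splice alone
```

Here the spliced record trends at roughly 1.8 degrees per decade purely from the step, even though each piece on its own has a slope of exactly zero.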
.
Next reason is a funny one. It is that Gill is in the mix twice, once as Gill and once as Gill+Harry. Even fairly large datasets can be strongly affected by a few similar individual records. In particular, when two records are identical over most of their length, but they are separated by a thousand km or so, the effect will be magnified.
.
Finally, the problems I mention above affect even simple areal averages, and they are generally magnified by any kind of weighting scheme. This is because a weighting scheme, whether based on ordinary least squares or RegEM or principal components, favors similarity by its very design. It is looking for the common factor, the unifying scheme, the “principal component”. We laugh about “teleconnection”, but here it is in action. When a weighting scheme finds that kind of long-distance correlation, guess what it picks as the “principal component”?
.
So that’s my considered opinion: it will make a very large change in the results. They say it won’t matter because the ground stations are just a double-check on the satellite data, but I doubt that … and if it is true that a big error in the ground data doesn’t affect their results, why include the ground data at all?
.
What I predict will happen is that they will re-run their whiz-bang temperature modification algorithm again, and find out that, well, no, we can’t say that West Antarctica is warming. At that point, they will say nothing. They will, in the words of the Mann himself, “move on”.
.
So if we get new results from Steig in the next little while, you can be sure that it made only a small difference as he claimed. On the other hand, if what we get is the strange thing that the dog did in the night (to misquote Mr. Holmes), you can be sure that it made a large difference and the revised version will never see the light of day.
.
w.

In fairness to Steig, I believe his method was to infill missing data based on a correlation to satellite temp data. As long as there is enough overlap to establish a correlation, it seems a reasonable approach to me. The only problem, as I understand it, is the satellite temp record does not show warming. So the only real warming seems to be coming from stations with problems like Harry and Racer Rock.

Willis:
Very interesting. Apparently Steig has re-run his data and is now indicating that the effects are minimal. But apparently others are finding other issues with the data and the selection of stations. Sorry, I should have linked you to the other threads, but perhaps someone who is more adept can provide the links.

It is referred to on the new thread at RC, where Gavin has reproduced some of Steig’s output that apparently show the impact of the correction to Harry. Also Steig briefly summarizes his analysis here.

If the first two AWS stations to get close scrutiny, namely Harry and Racer Rock, turn out to be badly corrupt, Reverend Bayes would be the first to argue that the posterior probability (starting from a uniform prior) that the next one to be comparably scrutinized is also corrupt is 3/4.
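The reverend's arithmetic here is Laplace's rule of succession: with a uniform (Beta(1,1)) prior over the corruption rate and n stations checked, all corrupt, the posterior probability that the next is corrupt is (n+1)/(n+2). In code:

```python
from fractions import Fraction

def rule_of_succession(successes, trials):
    """Posterior mean of a Bernoulli rate under a uniform (Beta(1,1)) prior:
    (successes + 1) / (trials + 2)."""
    return Fraction(successes + 1, trials + 2)

# Two stations closely scrutinized so far, both found corrupt
p_next_corrupt = rule_of_succession(2, 2)  # 3/4
```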

Some BAS procedure has evidently permitted these errors to creep into their archive of this otherwise admirable data set. Until those faulty procedures are identified and corrected, there is no reason to think that the other series in the data set are any more accurate.

It is therefore premature of Steig and Gavin to announce that the problem is solved, just because correcting or omitting these two sites has little effect on the Steig team’s conclusions.

I must compliment BAS that it took them less than 12 hours from when I first posted my comment on CA that Steig’s Racer Rock reconstruction series looked questionable, for them to withdraw the data!

Until Steig posts the code he used to apply Schneider’s Matlab routines to the data, it is impossible to evaluate his paper’s conclusions. Fortunately, on 1/23, Steig assured Steve that his data, and presumably complete code, would be online in the near future. I’m sure everyone at CA anxiously awaits this information.

A set of six Automated Geophysical Observatories (AGO) have operated on the Antarctic continent since 1993; although their primary instrumentation has focused on ionospheric and magnetospheric phenomena, they also have recorded and reported meteorological conditions (with significant gaps).

Due to the recent installation of an averaging filter to remove arbitrary data, a trend analysis has been conducted and is included in this study. It agrees with previous statements that the Antarctic interior is experiencing no major temperature deviations. However, it does suggest that satellite readings tend to be erroneous during all seasons except the Antarctic winter. …

Each AGO observatory is equipped with J-T Thermocouples capable of observing temperatures between -100°C and 100°C at an accuracy of 0.3°C. …

The satellites have undergone several revisions in calibration and algorithms to compensate for anomalous factors, but the reliability and accuracy of the data has been debated. With the AGO data now available, it is possible to determine the satellite’s accuracy in predicting surface temperatures in the Antarctic. …

The satellites and ground stations match most closely to each other during the Antarctic winter, most likely due to a lack of cloud coverage and solar activity. During all other seasons the satellite is often off in its readings by up to 20°C …

It was confirmed by the AGO stations that the satellite observations of the Antarctic continent were fairly accurate during the Antarctic winter, when there was a minimal amount of sunlight and cloud coverage to obfuscate the readings. There were significant errors in the satellite data during all other seasons, however, which may need to be addressed.

The mean statistical differences between the TOVS and AVHRR estimates were 5 °C for surface temperature, 4 °C for air temperature, and 1.5 kPa for vapor pressure deficit. …

This study differed from previous comparisons of satellite and surface measurements in that we have compared much larger spatial areas, over a longer time period, between satellite estimates of land surface variables. The results provide insight into estimation of environmental variables from remotely sensed observations, particularly when considered in conjunction with earlier comparisons of surface measurements with TOVS and AVHRR (see citations in Section 1). For example, data from three data-rich field experiments have been used to determine that air temperature could be retrieved from AVHRR to within 3.9 °C, surface temperature to 3.5 °C, and vapor pressure deficit to 1.09 kPa over a wide range of conditions (Prince et al., 1998). Comparison of the surface temperature, air temperature, and vapor pressure for the same experiment sites showed that the long-term averages were unbiased, but the standard deviation of the biases were 4 °C, 3.5 °C, and 3.5 mb, respectively (Lakshmi & Susskind, 2000). Similar comparisons of TOVS air temperature and vapor pressure retrievals to monthly observations at four airports around the world produced standard deviations of 1.1–2.4 °C for air temperature and 0.1–0.2 mb for vapor pressure (Lakshmi & Susskind, 1998). Comparisons of AVHRR and TOVS air temperatures to observations from a network of surface stations in the Arkansas Red River basin reported standard deviations of 3–5 °C (Lakshmi, Czajkowski, Dubayah, & Susskind, in press). Differences of this magnitude are not unexpected considering the spatial mismatch between local meteorological measurements and spatially integrated satellite sensor retrievals. (emphasis added)

The final implementation of our algorithm resulted in an MAE (Mean Absolute Error) of 6.09 °C and a standard error of the estimate of 2.73 °C (Tables 1 and 2), which is of a magnitude similar to those from previous studies. (emphasis added)

1 Calculated surface temperature accuracy
The accuracy of the remotely sensed surface temperature was estimated with the help of the real surface temperature, measured by thermometer at the ground stations. These two data sets have different ground resolution, but comparing calculated and measured temperature values we found that the average accuracy of the surface temperature sounding is less than 2.5 degrees (Table 1). It is interesting that the errors have different signs at day and night times. The main reason for this might be variation in atmospheric transparency and surface emissivity at different times of day. For some situations these differences exceed 3 degrees, and we connect this with the different ground resolutions and mismatches in the assumed surface emissivity. …

In clear-sky, highly transparent atmospheric conditions the temperature accuracy is ±2.5 °C when the surface emissivity is correctly considered. Omitting the emissivity gives errors of more than 10 °C. (emphasis added)

The most popular estimation method for land-surface temperature is a variant of the split-window technique developed for sea surface temperature [Price, 1984]. To apply this technique to land surface, corrections are applied to adjust for emissivities for different surface types. Several validation studies over different terrain types have indicated discrepancies ranging from 1 to 3 K. … In all the above studies, the satellite estimates were generally higher than the ground measurements. …

We find that observed biases with respect to the ground temperature, both during day and night, are small. However, except for one rainy day measurement, there consistently was a warm bias during the day and a cold bias during the night. (emphasis added)

Ship air-temperature data during the validation indicated moderate to strong inversions over sea ice under clear skies. These formed and decayed rapidly (tens of minutes) as clouds moved out of and into the zenith area. …
Air-temperature inversions and
air-temperature–skin-temperature differences
Air temperatures were recorded during all ship’s-rail observations by the ARISE cruise meteorological staff. The temperature sensor was located 21 m above the waterline. During the course of the validation experiments, we noted large differences between the KT-19.82 ice surface skin temperature measurements and this air temperature whenever clear-sky and low-wind conditions prevailed (Fig. 7). Temperature differences of 2–15°C were observed, and these formed and decayed rapidly (tens of minutes) as clouds moved into and out of the zenith area. We attribute this to strong atmospheric thermal inversions forming rapidly under clear-sky conditions, and breaking down when clouds covered most of the sky. Thermal inversions are common polar occurrences; however, these inversions had gradients intense enough (0.25 K m⁻¹) to significantly affect a comparison of meteorological station or automated weather station (AWS) 2 m air temperatures and satellite measurements of ice surface skin temperature. …

We further find that near-surface temperature inversions can be strong enough under cold, clear conditions to affect air-temperature–satellite-based ice skin temperature comparisons.

Comparison with available ground-based observations shows that TIR data provide good estimates of the near-surface air temperature (Ta), although they may be cooler than the actual Ta under strong surface inversion conditions (Comiso 2000). In addition, monthly means of TIR data have a clear-sky bias because infrared surface temperature estimates cannot be made in cloudy conditions. Since the net effect of clouds on surface temperature in the Antarctic is warming (e.g. King and Turner 1997), monthly cloud-free averages from the infrared observations tend to be cooler than in situ station observations by ~0.5 K (Comiso 2000). (emphasis added)

Some of these posts are self-explanatory. Mostly, I wanted to show that there appears to be a well-known and fairly large uncertainty in the satellite “temperatures”. In fact, satellites do not measure temperatures directly. Satellite temperatures are themselves estimates. Also, the Schneider dissertation mentions both clear-sky bias and clear sky temperature inversions. However, the Schneider dissertation does not say what sign or how large the clear-sky bias could be and claims that the cloud-free inversion has a cool bias. The uncertainties in the satellite temperature estimates are clearly larger than the trends that they have claimed to find. Also, it is not clear to me how the trend uncertainty that they are claiming can be so small given the uncertainties in the satellite temperature estimates, especially when you consider the (apparently unestimated) clear-sky bias as well as all the mathematical gymnastics that has apparently been done. Maybe I’m missing something.

Re: Phil (#213), Thank you. These are very useful posts. They certainly suggest that the potential issues with the AWS and station data could and should have been addressed earlier. IMHO they also suggest that the period under analysis should have been examined to ensure that the trend is not an artifact of some larger climate cycle. I suspect that the dedicated analysts here will soon shed additional light on these issues.

Like Bernie, I thank you for the posts. While they certainly make the satellite data questionable for Antarctica, they seem to do so for the surface data as well. While I’d certainly think that, with care and many stations, a more-or-less correct estimate could be made of Antarctic temperature, I don’t see how accuracies of less than 0.1 °C can be justified at the present time. I’d say more, but it’d just be redundant.

Re: Phil (#212), Thanks Phil for all of that. This raises some questions. 1) Did Steig carry through the satellite uncertainty into their analysis? 2) With the correct values for the 2 stations Harry and the other one, what are the new confidence intervals? I only see Gavin’s and Eric’s statements about effect on the trends.

Also, it is not clear to me how the trend uncertainty that they are claiming can be so small given the uncertainties in the satellite temperature estimates, especially when you consider the (apparently unestimated) clear-sky bias as well as all the mathematical gymnastics that has apparently been done. Maybe I’m missing something.

The claimed uncertainty probably does not include any instrument errors/uncertainties but only those arising from the reconstruction variations. One also must consider that, if anomalies are used, a constant bias in readings will have no effect on the trends.
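The anomaly point can be illustrated with a quick sketch (Python here for brevity; the blog's own scripts are in R): adding a constant bias to every reading shifts the whole series but leaves the least-squares slope untouched.

```python
def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, ..., n-1."""
    n = len(y)
    xbar = (n - 1) / 2.0
    ybar = sum(y) / n
    num = sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

# Synthetic monthly series with a known trend of 0.02 deg/month.
temps = [-1.0 + 0.02 * t for t in range(120)]
# The same series as read by an instrument with a constant +5 deg bias.
biased = [t + 5.0 for t in temps]

assert abs(ols_slope(temps) - ols_slope(biased)) < 1e-9  # identical trends
```

The catch, as noted below, is that one does not necessarily know the bias is constant.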

The errors in this case apparently arise from differences in clear and cloudy times and it would seem if that ratio changed over time a trend error would be introduced.

The point that bothers me the most is the inversions that occur and the resulting dependence of measured temperatures with the height of the sensor above the surface. From the “digging out” anecdotal evidence presented in other threads at CA on that height varying, I would have a concern – and particularly so if I thought the authors of the paper were as unaware of these conditions as they were with the trouble with Harry.

The claimed uncertainty probably does not include any instrument errors/uncertainties

I can see how that might be the case, but, given the extreme environmental conditions in measuring temperature and other variables in Antarctica (as opposed to, for instance, next to your backyard barbeque), shouldn’t the instrument errors/uncertainties have been included in this case? I think that, if they were not included, then they should have been before going on a worldwide press tour. Please see also the Docken and Petit paper in Re: Phil (#207).

Here I quote more fully:

The satellites and ground stations match most closely to each other during the Antarctic winter, most likely due to a lack of cloud coverage and solar activity. During all other seasons the satellite is often off in its readings by up to 20°C (see Figure 3). The variance differed depending on location and pass, as the 0200-hour samples tended to underestimate the temperature (see Figure 1) and the 1400-hour samples tended to overestimate (see Figure 2).

With respect to your comment

One also must consider that, if anomalies are used, a constant bias in readings will have no effects on the trends.

, I am not sure that using anomalies will cure these ills as the uncertainties that Docken and Petit identify don’t show a consistent bias on a daily or monthly basis. One would have to separate the two daily passes and try to “adjust” the data for bias independently for each pass, but that would just get us into the same adjustment maze that the surface temperatures suffer from. Furthermore, the sheer magnitude of the uncertainties (up to 20°C) should call into question the credibility of asserted trends of fractions of a degree.

Phil, my point about anomalies was only that a consistent bias would not matter but that any biases/errors are problematic because one does not necessarily know whether the errors are consistent or not.

Steig et al. (2009) notes in their paper that the satellite IR measurements were adjusted with a cloud-masking technique described in an earlier paper (where Antarctica looked much cooler) and then further adjusted by discarding temperature readings that were more than 10 degrees away from the climatological mean.

They selected 10 degrees because, as they state, it “produces the best validation statistics in the reconstruction procedure”. That is an obvious case of data snooping, but the point for our discussion is again one of an acknowledged bias that one cannot be certain is consistent.

In fact the 10 degree adjustment satellite IR reconstruction looks very much like a previously published reconstruction that used a 6 degree threshold for discarding data. The authors unfortunately do not detail what data was discarded with their procedure and whether it was more frequently colder or warmer temperatures or if such a procedure could actually throw out legitimate temperature variations.

The authors unfortunately do not detail what data was discarded with their procedure and whether it was more frequently colder or warmer temperatures

Looking at John Daly’s blissfully accessible data I see occasional clear indications (when seasonal records are separated) that summers tend to have steadier temperatures while winters tend to plunge sharply. So yes, cutting out variations over 10 deg might well produce a warming bias.
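The asymmetry can be sketched numerically (a hypothetical toy series in Python, not Steig's actual procedure): if winter cold plunges cross the discard threshold more often than warm excursions do, a nominally symmetric |T − climatology| > 10 filter warms the retained mean.

```python
climatology = 0.0  # assumed climatological mean for this toy series

# Toy winter series: mostly small excursions plus two sharp cold plunges.
series = [0.5, -0.5, 1.0, -12.0, 0.8, -15.0, 0.2, -1.0]

# Discard readings more than 10 degrees from the climatological mean,
# mimicking the kind of threshold filter described above.
kept = [t for t in series if abs(t - climatology) <= 10.0]

raw_mean = sum(series) / len(series)   # -3.25
kept_mean = sum(kept) / len(kept)      # ~0.17: the filter warms the mean
```

Whether the real discarded data skewed cold or warm is exactly the detail the authors do not report.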

Played around with the AWS reconstruction and have some things to share.
.
FIRST: Harry, Mt Siple, Siple, and Byrd were used in the reconstruction for the AWS sites – authors’ comments notwithstanding. I am 100% certain they were more than just comparisons. I downloaded all of the READER data, converted it to monthly anomalies, and plotted it against the reconstruction. There are some differences due to how I had to calculate anomalies (subtract recon anomalies from READER, take the mean of the result for each month, subtract that from READER . . . some months were sparse with actual data). Anyway, here are the results (please note the vertical scales on Byrd and Siple):
.
Byrd:

.
Harry (NOTE: This is using New Harry data . . . reasons for that later)

.
Mt. Siple:

.
Siple:

.
There is no doubt whatsoever that these were used in the reconstruction for the AWS comparison. I don’t yet know if they were used in the full reconstruction, but I’d be willing to bet they were. So now that we know that Harry was used, what effect did it have?
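The anomaly step described above can be sketched as follows (Python for illustration; the commenter's actual script is in R, and the handling of sparse months here is simplified to a per-calendar-month mean).

```python
def monthly_anomalies(raw):
    """Convert a monthly series (None = missing, starting in January) to
    anomalies relative to each calendar month's own mean."""
    by_month = [[] for _ in range(12)]
    for i, t in enumerate(raw):
        if t is not None:
            by_month[i % 12].append(t)
    # Climatology: the mean of each calendar month's available values.
    clim = [sum(vals) / len(vals) if vals else None for vals in by_month]
    return [None if t is None or clim[i % 12] is None else t - clim[i % 12]
            for i, t in enumerate(raw)]
```

With two years of data, for instance, each month's anomaly is just its departure from that month's two-year mean.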
.
Turns out, like Steig claims, not much at all.
.
I made anomalies for every month for the READER data and subtracted the reconstruction from them to make a time series of the residuals. Then I did a simple linear fit. Negative coefficients mean the reconstruction OVERESTIMATED the trend at the site; positive coefficients mean the reconstruction UNDERESTIMATED the trend. I then generated the following histogram:
.

.
Basically, even though the reconstruction used the bad Harry data, the effect when comparing reconstructed anomalies to actual is very minimal. If you take the mean of the slopes, you get a whopping -0.003. So RegEM overestimated the slopes just a wee tad.
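The residual-trend check amounts to something like the sketch below (Python with hypothetical data; the commenter worked in R). As above, a negative slope means the reconstruction overestimated that station's trend.

```python
def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, ..., n-1."""
    n = len(y)
    xbar = (n - 1) / 2.0
    ybar = sum(y) / n
    return (sum((i - xbar) * (v - ybar) for i, v in enumerate(y))
            / sum((i - xbar) ** 2 for i in range(n)))

def residual_slopes(reader, recon):
    """reader, recon: dicts of station -> equal-length anomaly series.
    Returns station -> slope of the (READER - reconstruction) residuals."""
    return {s: ols_slope([a - b for a, b in zip(reader[s], recon[s])])
            for s in reader}
```

The histogram above is then just the distribution of these per-station slopes, and the -0.003 figure their mean.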
.
I’m pretty convinced that Dirty Harry – though sloppy – has no substantive effect on their results. (I have another issue with their results, but I have to work through it first. Has nothing to do with Harry, though.) So my verdict is that Harry is interesting, but irrelevant to their conclusions.
.
Now . . . I downloaded R only about 2 weeks ago, so I’m a n00b. The following script most likely reflects a significant amount of n00bness. Columns 1:63 are the reconstruction data, 64:126 are the raw READER data, 127:189 are the monthly anomalies for the READER data, and 190:252 is the difference between READER anomalies and reconstructed anomalies. The missing file is “aws.txt” – which is just 1957, 1957.083, … that forms the x-axis for my plots. I don’t know time series stuff in R just yet, so I made an axis in Excel and saved it as text. You’ll need to do something similar for the script to work as written.
.
The script goes out and gets the READER data on its own.
.

Lucia drew my attention to these posts. Steig et al. were certainly aware of the material that the other Phil cites (some of it was from one of the authors’ graduate dissertations, for which Steig was the committee chair). They certainly took steps to circumvent the problems, e.g.:
“Results from our AWS-based reconstruction agree well with those from the TIR data (Fig. 2). This is important because the infrared data are strictly a measure of clear-sky temperature and because surface temperature differs from air temperature 2–3 m above the surface, as measured at occupied stations or at AWSs. Trends in cloudiness or in the strength of the near-surface inversion could both produce spurious trends in the temperature reconstruction. The agreement between the reconstructions, however, rules out either potential bias as significant.”
and:
“The TIR data are biased towards clear-sky conditions, owing to the opacity of clouds in the infrared band. Cloud masking is probably the largest source of error in the retrieval of TIR data from raw satellite spectral information. We have updated the data throughout 2006, using an enhanced cloud-masking technique to give better fidelity with existing occupied and automatic weather station data.”

Re: Phil. (#220), The language you quote was what, among other language in the Nature letter, prompted me to post the references that I did. There does not appear to be anything in the Nature letter or the language that you repeat that specifically addresses the large uncertainties in satellite temperature estimations – only generalities. In this regard, your post did not provide us with any new information or any insight that might further understanding of these issues as it only repeats what was stated in the original letter and does not specifically address the issues that I was trying to raise. If I am misunderstanding the point you are trying to make, please forgive me. Respectfully, the other Phil.

Quick add-on: Don’t need to check if Harry was used in the full recon; Steve did that in the first post.
.
So the authors’ initial statements (and the impression given in the supplemental material) that Harry et al. were merely for comparison are incorrect. They were used for both the full recon and the AWS recon.

Damn. There was a mistake in the script I posted. My main data frame was pre-formatted with the extra columns and the script doesn’t execute right starting from a blank workspace. You also need to read in a list of the station names, conveniently in alphabetical order just as they appear on the READER site.
.

One thing most people forget about Antarctica is that anything on the surface of the ice MOVES with the ice. I would think that the changing location of the station would automatically end any chance of determining temperature trends of 0.1 degrees Celsius. The best that can be done is to pick up the equipment and relocate it back to its original position periodically.

Another thing…I keep a backyard weather station. With no formal training other than my own hobby reading, I know to go out after it snows to clear off the temperature sensor shelter to get the most accurate low readings. Maybe these guys also didn’t realize that anything on the surface of the ice in Antarctica also gets BURIED over time.

Having not paid attention to CA for a while I find that while I was away something of interest to me has occurred and I would appreciate a little guidance from the crowd here.

I have not read all of the comments here or at RC and am hoping that someone will provide a little more detail on how the problem was detected and ultimately identified. The detail I am seeking is not so much who and when but how. In lieu of reading every comment I did some searching and noticed that “Tim C” identified the problem but did not find any associated code.

I must admit that I have spent several hours playing with GISTEMP step0 and did not catch the error and I would like to enhance my emulation by screening the source data for errors of this nature.

#240. There are two legs to the problem: 1) that the Harry record included data from a period when it did not exist; 2) that the spliced data came from Gill.

Of the two points, the only one that really matters is that Harry contains spliced data from a period when it did not exist; the Gill thing is the sort of interesting detail that adds color but doesn’t really matter.

I noticed that GISS and Steig used the same data (see my first post), so I was interested in why Mannian RegEM yielded different results than GISTEMP – the sort of thing that you’d think they would report in PRL. I noticed that Harry was a big outlier in West Antarctica and decided to look at it. I have several GISS versions and compared versions – I’m not sure why. I’ve got a lot of relatively well-organized data and can do this sort of thing in a couple of minutes. A huge difference popped out.
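The version comparison is the kind of two-minute check sketched below (Python, with a hypothetical key layout and threshold; the actual work was done in R): line up two scrapes of the same station and flag the months where the values disagree materially.

```python
def version_diff(old, new, thresh=1.0):
    """old, new: dicts mapping (year, month) -> monthly value.
    Returns the months where the two scraped versions disagree by more
    than `thresh`, with both values kept for inspection."""
    return {k: (old[k], new[k])
            for k in sorted(old.keys() & new.keys())
            if abs(old[k] - new[k]) > thresh}
```

On well-organized data this is routine; on Harry it lit up.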

I’ve done enough deconstruction of Team and Mannian stuff that I don’t overlook the possibility of really goofy things like accidentally splicing unrelated series – this is not the first time. If you’re fresh to this corpus of material, it wouldn’t occur to you to be alert for this sort of thing.

So I looked for info on Harry – I googled “Harry AWS” and got to the Wisconsin site. It showed that Harry didn’t exist prior to 1994. So I was pretty sure on Sunday afternoon that they’d done something goofy and asked readers to look for the provenance of Harry data before it existed. I posted up a script for scraping data from the Wisconsin website as well.

I went back just before bed and scraped all the files, the number of stations increased and varied by year, so the scraping was a little fiddly. I then looked through Jan-Feb values in a couple of years before Harry existed and found a match with Gill. I then downloaded all the Gill series and found that all the imaginary Harry data came from Gill.
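The matching step amounts to the sketch below (Python, with hypothetical station names and tolerance; the actual hunt worked through the scraped Wisconsin files in R): take the “Harry” values from before the station existed and test each candidate series for identical overlapping values.

```python
def find_source(orphan, candidates, tol=0.05):
    """orphan: list of (month_index, value) pairs predating the station;
    candidates: dict of station name -> {month_index: value}.
    Returns the candidates matching every orphan value within tol."""
    return [name for name, series in candidates.items()
            if all(idx in series and abs(series[idx] - v) <= tol
                   for idx, v in orphan)]
```

A couple of Jan–Feb values narrow the field quickly; checking the full overlap then confirms the match.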

There was another way. Gill proved to be one series above Harry in the BAS list. Tim C noticed that the two BAS series were identical.

I don’t know what Gavin did in his “independent” identification of the Gill provenance. However, as noted above, for BAS purposes, the provenance of the bad data doesn’t matter; only that the data is bad – something that Gavin obviously didn’t “independently” notice.

As observed elsewhere, we later learned that Monaghan et al 2008, cited by Steig, observed that Harry data was “suspicious” but apparently took no steps to investigate or pin down these suspicions.

Don’t count on Nature to do anything of the kind. When the National Academy of Sciences published regarding the Hockey Stick controversy (and agreed with Steve and Ross on every important point of science), Nature published an article claiming the NAS panel supported Mann. I wrote and asked Nature to withdraw that article since it was demonstrably false but they claimed they stood by the story. Nature is quickly losing the respect it once commanded.

Re: RomanM (#228),
Nic L:
Please double check. Roman is correct; at least, that is how GISS reports missing values. Standard procedure is to define missing values before doing any analysis, so I doubt very much that this type of error was made. I would be slow to report glitches until you ask one of the aficionados here to double-check. Unlike others of recent memory, no one here seems inclined to steal any of your credit. By the same token, adding noise to the system at BAS by false alarms will make it difficult for them and for others reporting errors – especially those not in the loop.

6 Trackbacks

[…] 2009 It seems that folks are all “wild about Harry” over at Climate Audit, with the revelations occurring there, and no good kerfluffle would be complete without some pictures of the weather stations in […]

[…] And now as Andrew Bolt has noted Steve McIntyre, who with Ross McKitrick uncovered the ‘hockey-stick’ nonsense in the first place, has delivered the coup de grace to the Steig/Mann Antarctica claim. Steig used data from a weather station called Harry. Bolt observes: Harry in fact is a problematic site that was buried in snow for years and then re-sited in 2005. But, worse, the data that Steig used in his modelling which he claimed came from Harry was actually old data from another station on the Ross Ice Shelf known as Gill with new data from Harry added to it, producing the abrupt warming. The data is worthless. Or as McIntyre puts it: […]

[…] is the first article on Climate Audit, then several others followed into February. Try not to miss, When Harry Met Gill It turns out there were errors in Automatic Weather Station (AWS) data, surprise, surprise. Also […]

[…] I had been admiring Steve McIntyre’s demolition of the Steig et al 2009 claims (much promoted in the Australian media) that Antarctic was “warming after all”. I think this is the first article on Climate Audit, then several others followed into February. Try not to miss, When Harry Met Gill […]