Carnage

Moving right along since the problem with Harry was identified on Super Bowl Sunday, BAS reports:

This is a list of corrections that have been made to the AWS data tables, with a link to the table before the corrections were applied; any suspected errors should be reported to Steve Colwell.

(2/2/09) The AWS data for Harry have been corrected after it was reported by Gavin Schmidt that data from Gill had been added where data for Harry did not exist. The incorrect data file for Harry temperatures can be accessed here.

(4/2/09) The AWS data for Racer Rock since April 2004 have been removed from the READER website, as the values appear to come from a different station even though they were transmitted on the GTS (Global Telecommunications System) as 89261, which the WMO (World Meteorological Organization) still lists as Racer Rock. The incorrect data file for Racer Rock temperatures can be accessed here.

(4/2/09) The AWS data for Penguin Point since January 2007 have been removed from the READER website, as the values received on the GTS appear to come from a different station and this AWS is reported as having been removed at the start of 2007. The incorrect data file for Penguin Point temperatures can be accessed here.

(4/2/09) The AWS data for Clean Air since January 2005 have been removed from the READER website, as the values received on the GTS appear to come from a different station and this AWS is reported as having been removed at the start of 2005. The incorrect data file for Clean Air temperatures can be accessed here.

I compliment the British Antarctic Survey for dealing with the situation professionally. Clean Air is in Steig Table S1, Harry in Steig Table S2. Penguin Point and Racer Rock are in neither table. But it seems not beyond imagination that the exclusion of bad data might make them eligible for Table S1 or S2 – whatever the eligibility criteria (and I don’t pretend to have any idea how eligibility for these exclusive clubs is determined). So the mere fact that they did not contribute to the present recon, if they didn’t, doesn’t mean that they shouldn’t have – once the records are corrected. Time will tell.

While Steig has asserted that I should know the impact of these various errors, I don’t. Maybe they’re negligible, maybe not. We’ll find out. In Steig’s shoes, as I’ve mentioned before, I would deal with the present situation by placing end-to-end source code online so that interested parties can readily assess the impact of these errors for themselves. If he’s certain that the errors “don’t matter”, that’s the best way to extinguish speculation.

Being stubborn about this is going to be unsustainable and he might as well get it over with.

Just my opinion.

Update: See here for a list of 6 surface (not AWS) stations where problems of one sort or another are pending.

151 Comments

Looking at Steig’s AWS reconstruction data (I am ashamed to say using Excel – my R skills are not yet good enough), there are quite a lot of very odd points. In about 88 cases, the temperature for a station for a particular month differs from the average of the temperatures for the previous and next months by more than 10 degrees C. There is a concentration of such points in August 1978, in June & July 2004 and in June, July and August 2006. Stations particularly affected are 17 (Elaine), 25 (Gill), 26 (Harry), 35 (Lettau) and 55 (Schwerdtfeger). And, as Hu McCulloch pointed out in #170, 52 (Racer Rock) has some exceptionally anomalous temperatures: -20.5 in August 2005 and -13.025 in January 2006, the adjacent months being quite close to zero in both cases. In all cases, the stations concerned had no actual temperature measurements in the READER database for August 1978 or June/July 2006.
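The screen described above can be sketched in a few lines. This is a hypothetical illustration with invented numbers, not the actual Excel calculation:

```python
# Hypothetical sketch of the screen described above: flag any month whose
# temperature differs from the mean of the adjacent months by more than
# 10 degrees C. The record below is made up for illustration.
import numpy as np

def flag_spikes(series, threshold=10.0):
    """Return indices i where |T[i] - (T[i-1] + T[i+1]) / 2| > threshold."""
    t = np.asarray(series, dtype=float)
    midpoint = (t[:-2] + t[2:]) / 2.0          # mean of previous and next month
    deviation = np.abs(t[1:-1] - midpoint)
    return [i + 1 for i in np.flatnonzero(deviation > threshold)]

# Toy example: a smooth winter record with one injected spike.
record = [-20.0, -18.5, -17.0, -2.0, -16.0, -15.5, -14.0]
print(flag_spikes(record))  # → [3]: the -2.0 value stands out against its neighbours
```

Run against each station’s reconstructed series in turn, a screen like this would reproduce the kind of list of odd points described above.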

These data series are, if I understand correctly, estimated temperatures reconstructed using data from the occupied weather stations. It seems very strange that a statistical reconstruction technique could produce such hugely anomalous temperature changes for particular months.

I will try to make my Excel file, with all Steig’s AWS data, station names & positions, and graphs, available if anyone wants it.

I have, incidentally, cross checked the longitudes and latitudes of all the AWS stations listed by Steig to the READER database. They all tally, except that Steig has not used Dome F (which only produced data from 1997 to 1999) and Casey New Airstrip (Dec 98 to Jun 99 only). Steig’s AWS reconstruction data is in station name order; e.g. Harry is in column 26. (NB From the READER data, Steig seems to have made a couple of minor mistakes in the Station Names listed in Table S1 of his SI: LGB120 should read LGB20 and Marily should read Marilyn.)

Further to my earlier post (#183), I have now had a first look at all the actual AWS and manned station data temperature records. I have found (and reported to BAS) a problem with AWS Elaine’s record – recent data may be coming from another station. However, the problem only starts in early 2008 and so does not affect Steig’s work. I should say that BAS responded very promptly and helpfully to my email to them.
I have also just queried with BAS some anomalies in the records for manned stations Adelaide (in 1975) and McMurdo (in 1992, 1994 and 1995), in that the All day temperature diverges from the average of the 00Z, 06Z, 12Z and 18Z figures. In the case of McMurdo that is just as well, since the 12Z temperatures shown for various months from 1995 to 1998 are close to +1000 degrees!
Nic
PS I also thank Phil for his useful posts, which are very relevant to appraising Steig’s work.
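The second kind of cross-check mentioned above – comparing the All day monthly mean against the average of the four synoptic (00Z, 06Z, 12Z, 18Z) figures, and flagging physically impossible values like the +1000-degree 12Z entries – might look something like this sketch (all numbers are invented):

```python
# Hedged sketch of a per-month sanity check: flag implausible readings and
# divergence between the all-day mean and the mean of the synoptic values.
# Tolerances and data are illustrative, not from the READER database.
def check_station_month(all_day, synoptic, tol=1.0, physical_max=60.0):
    """Return a list of problem descriptions (empty if the month looks OK)."""
    problems = []
    for t in list(synoptic) + [all_day]:
        if abs(t) > physical_max:
            problems.append(f"implausible temperature {t}")
    synoptic_mean = sum(synoptic) / len(synoptic)
    if abs(all_day - synoptic_mean) > tol:
        problems.append(
            f"all-day mean {all_day} diverges from synoptic mean {synoptic_mean:.2f}"
        )
    return problems

# A McMurdo-style pathology: a 12Z value near +1000 degrees.
print(check_station_month(all_day=-5.0, synoptic=[-8.0, -6.0, 998.0, -4.0]))
```

A bad synoptic value trips both tests at once: the reading itself is implausible, and it drags the synoptic mean far away from the all-day figure.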

There is … an increase of the trend at Racer Rock (the error there had given an erroneous cooling), but the other points are pretty much unaffected

Racer Rock is not referred to in either Table S1 or S2. I know that various people have tried to figure out what privileges attach to inclusion in Club S1 or Club S2. I must confess that I’ve been unable to keep track of the various accounts and welcome a summary from anyone who’s followed this aspect of the debate.

I’ve moved a couple of Nic’s posts from the Dirty Harry thread. Maybe a few more should be moved.

Steve, thanks for your work on this. So now there are four stations with bad data. Is that the end of it, do you think? Or do you see other questionable stations?

Also, I do not think Steig really understands the issue with the code. On RealClimate, he responded to a question from jeff id with this:

[Response: What is there about the sentence, “The code, all of it, exactly as we used it, is right here,” that you don’t understand? Or are you asking for a step-by-step guide to Matlab? If so, you’re certainly welcome to enroll in one of my classes at the University of Washington.–eric]

Re: Ron Cram (#5), when I think of good places to learn Matlab, a geology class is at the bottom of the list. A geology class taught by someone who doesn’t write his own code doesn’t even make the list.

Re: cbone (#9), Matlab code files (*.m) are just ASCII and can be opened with any text editor or word processor. Matlab is high level, and most of it should be easy to follow if you program in a different language.

Ryan O writes, in a comment way down on the previous “Gavin’s Mystery Man Revealed” thread,

Not to put a damper on everything, but Racer Rock, Penguin Point, and Clean Air will not have an impact on Steig’s results because they were not used for any reconstructions. The only sites that could potentially affect his results are Byrd, Harry, Siple, and Mt. Siple. So while additional AWS sites may be fubar, unless the fubar’d ones are Byrd, Harry, Siple, or Mt. Siple, it doesn’t change his paper.

I just see three problems here:

1. The version of Harry used in the Nature article is indeed FUBAR — see CA for the past week, passim.

2. We were initially assured by Steig and/or Gavin that the AWS stations were not used at all in the primary Nature reconstruction. Yet Steig now admits on his U. Wash. page that correcting Harry and Racer Rock does make a difference to the primary reconstruction, but that since the two huge errors largely offset one another, the net effect is small. So which is it — were the AWS stations just dragged in to snowblind the Nature reviewers, or were they really used somehow? And if they were used, why isn’t it important that they all be correct? What’s the significance of the 63 AWS reconstructions carefully archived on Steig’s site, if they’re irrelevant to the conclusions of the paper?

3. Observers can’t know what the Steig team really did until they post their actual code. Just pointing to routines on Tapio Schneider’s Caltech site, as at present, does not tell us how Steig et al employed those routines to obtain their results. Is this supposed to be Climate Science, or Climate Faith?

Well, maybe there might be more problems, but let’s take these three for starters …

3. Observers can’t know what the Steig team really did until they post their actual code. Just pointing to routines on Tapio Schneider’s Caltech site, as at present, does not tell us how Steig et al employed those routines to obtain their results. Is this supposed to be Climate Science, or Climate Faith?

I think the proper term is trust. Ref: RC
Personally, even if I trust someone, it doesn’t mean I trust them never to make mistakes or have bad luck…

Steve: It’s working code but is 8 years old and does not show particulars of the application. The data has to be input. The text says that the Schneider RegEM was “adapted”; this contains none of the adaptions. I put code online all the time that you can insert in your R session and get precise output; that way people can see what I did. So I know what’s possible. This is no more than citing an R package. It’s not “all the code”; if that’s what Steig thinks, then he has no command over his paper.

(5/2/09) Notification was received from Nicholas Lewis that the AWS temperature data for Elaine since February 2008 are incorrect; these data were received via the GTS and will be removed from the READER website soon. The incorrect data file for Elaine temperatures can be accessed here.

#11. I phrased this the wrong way. What I meant to say was: Gavin, there’s a problem with the Elaine station. Would you get right on it and locate the defect for us by tomorrow morning. Thanks in advance.

Y’know the beacon that Gotham City puts up in the sky to summon Batman?

Maybe I am missing something but: do we understand completely the reasons as to how the data from various stations became corrupted? Surely this is important as to assessing the integrity of the rest of the data?

Maybe I am missing something but: do we understand completely the reasons as to how the data from various stations became corrupted? Surely this is important as to assessing the integrity of the rest of the data?

I can make a guess. The logistics of research in Antarctica are very dependent on aerial supply, and there is a lot of flying in and out of unmanned strips out in the wilderness. Now, flying into a remote unmanned site without weather data is not at all desirable, so my guess is that there is a lot of shifting around of a limited number of weather stations (the one that was at “Casey New Airstrip” for just 6 months seems to be an obvious case). Presumably the Head Office back home is not always informed, which does seem a bit sloppy.

I believe I noticed a statement on RC that, for AWSs whose readings tended to track the average of two other stations over part of the record, Steig et al assumed missing data points could be replaced with the corresponding average value. I’ve been bitten by this kind of assumption before when using finite difference methods for heat transfer calculations. It may be valid under some conditions, but that’s no guarantee it will work across the board. In one series of tests, a small error at one node between the average value and the actual value propagated itself throughout the relaxation network and created a mammoth error elsewhere. The esteemed Dr. Steig et al shouldn’t be doing calculations anywhere near that complex, but the assumption should be tested, if it hasn’t been already.
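The propagation effect described above can be illustrated with a toy 1-D relaxation, in which each interior node is repeatedly replaced by the average of its neighbours. This is only an analogy to the heat transfer case, not Steig’s method:

```python
# Toy illustration (an analogy, not Steig's method): in an iterative
# relaxation where each interior node becomes the average of its
# neighbours, a small error in one fixed value spreads to every node.
import numpy as np

def relax(boundary_left, boundary_right, n_interior=8, iterations=500):
    """1-D Jacobi-style relaxation: interior nodes -> mean of neighbours."""
    t = np.zeros(n_interior + 2)
    t[0], t[-1] = boundary_left, boundary_right
    for _ in range(iterations):
        t[1:-1] = (t[:-2] + t[2:]) / 2.0
    return t

clean = relax(0.0, 10.0)
perturbed = relax(0.0, 10.5)           # a 0.5-degree error in one boundary value
print(np.round(perturbed - clean, 3))  # every interior node has shifted
```

The perturbed boundary value shifts every interior node, which is the sense in which a small local error can spread through the whole network rather than staying put.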

63 AWS reconstructions on Steig’s site might indicate that various combinations of AWS records were being examined for robustness.

On this occasion I agree with you completely. I said so on RC before checking in here. We’ll see if they flush me too. I posted my comment on my own blog so you can read it.

There is no reason that data and complete end-to-end scripts shouldn’t be released with publication. This caginess is good for nobody and the grant agencies should go out of their way to discourage it. All I can say in defense of Eric is that this sort of behavior is common in many sciences and isn’t, as far as I know, peculiar to climate. The fear of charlatans stealing ideas (this does happen and damaged my own scientific career) trumps the advancement of science. In the present case I can’t see how that even makes sense, but I think that’s the core impulse.

Re: Michael Tobis (#22),
Welcome back and nice post, Michael.
Gavin at RC countered Michael Tobis’s reasonable argument to post data and code completely by creating a “slippery slope” argument that leads to an impossible burden on researchers. Frankly, it is specious, and such an argument would have been eviscerated at the Oxford Union. Steve is right: this is an argument that Eric and Gavin cannot possibly win (which does not mean they will comply, simply that everybody will see them as being wrong).

FWIW my comment at RC:

Michael Tobis:
Nice post. Gavin seems to have adopted the position that you earlier enunciated, at least in part, but now seem to have clearly moderated. (The part in italics I just added, since it makes clearer that Michael’s original position was nuanced.) Sure, there are complexities and costs in posting code and data, and there may be an unnecessary effort imposed on researchers – but when the article in question is on the front page of Nature, addresses important issues, potentially reshapes existing preconceptions, the authors receive repeated requests for the code – and the code is clearly to hand and accessible to non-authors – then Gavin’s position is hard to justify.

I’ve been reading here for going on two years (and many other sites both for and against AGW theories). I’m a liberal from Massachusetts, an engineer by schooling (environmental), and would be considered by most to be of above average intelligence. Your post on your blog was the single most damning evidence of an out-of-control group of people that I’ve ever read. It’s to a point where someone just entering the debate would think they were reading some fanciful movie script instead of factual events. This is a conspiracy theorist’s wet dream (and no, I’m not one). It has been said before, but I’ll say it again. Sooner or later there is going to be a major news story about the stonewalling, bad data, and integrity issues that is going to put a black cloud over those involved that will never lift. It’s going to be ugly. After reading your elegant and to-the-point post, I hope you aren’t one of the people affected by it all, as you seem to understand the concept of repeatability. Thank you.

It looks like the organization isn’t adequately informed about what’s actually happening to the AWS in the field. Considering the field efforts necessary to maintain service for these AWS, the resulting (meta-)data quality is unsatisfactory.

From my own experience of people doing ad-hoc analyses in SAS in the pharma industry, I would say this is extremely likely. That’s not to say that they wouldn’t have at least some of their code stored, but I wouldn’t be at all surprised if they, say, make use of an existing derived dataset, the provenance of which has been lost to history. Something like “hey, we could use John’s dataset where he corrected for flibble effects” – “sure, let’s use that” – “do we know how he did the correction?” – “not exactly, but John’s a good guy, I’m sure it’s fine” – … etc.

Happens all the time in my industry, though NOT, of course, for the production-quality code that supports a new drug application. The FDA would tear our heads off.

It is not only the four automatic weather stations (AWS) listed in SI Table S2 that Steig appears to have used. The Terra Nova Bay station listed there as an occupied weather station seems in fact to be an AWS – it is the Italian AWS Eneide, WMO ID 89662 (ARGOS ID 7353). See http://www.climantartide.it/stazioni-meteo/tabella-eneide.php?lang=en. The station is listed on the READER website both under occupied weather stations (under the name Mario_Zucchelli, which is what the Italian Terra Nova Bay base was renamed) and under AWS (under the name Terra_Nova_Bay), both with the same ID and identical temperature data. The latitude given in READER in both cases differs slightly from that given on the Italian website, but there seems no doubt that both READER entries are for AWS Eneide.
I don’t know whether the inclusion by Steig of this AWS both as a manned station and an AWS affects the validation statistics of his AWS reconstruction (which used occupied station data to produce a reconstruction of AWS temperature data).

At this point, if I were Steig, I would withdraw the Dec 2008 Nature paper. Several problems with the data have been found, and there is a cloud of uncertainty over the validity of the conclusions presented in the article. So withdraw it, redo the calculations, and repost the article regardless of what the conclusions are.

I believe the main reason some climate scientists jumped on the AGW bandwagon is that they have to publish if they don’t want to “perish”, metaphorically speaking. And there is a clear bias in scientific papers against negative results, or inconclusive results if you want me to be more accurate.

As far as the peer review process is concerned, it would be nice to know what happened during the 10.5 months that passed between the submission of the manuscript and its publication. It seems very likely that at least one of the reviewers didn’t agree to the publication of this article in Nature as it was initially submitted; otherwise it would have been published earlier. Research articles in lesser scientific fields wouldn’t have been that lucky.

At this point, if I were Steig, I would withdraw the Dec 2008 Nature paper. Several problems with the data have been found, and there is a cloud of uncertainty over the validity of the conclusions presented in the article. So withdraw it, redo the calculations, and repost the article regardless of what the conclusions are.

You must be new in these here parts…I very much doubt that will happen.

The MO has been pretty consistent: brazen it out. Attack the auditors as stooges, amateurs, shills. Refuse to respond directly, and only do the minimum of what the journal requires (if anything). Very quickly get bored (while the ink is fresh is usually the time to start yawning and stretching) and move on to the next publication. Never ever admit anything was wrong except “trivia” that only obsessive amateur cranks would be worried about.

From the website here is the list of changes made to the Matlab code over the last 8 years. From a file called CHANGES on the site. The formatting appears to get mucked up when I paste it in (tabs I guess).

——————————–

03-Jul-07 * Corrected a problem in PEIGS with recent Matlab
implementations, in which eigenvalues were not
necessarily returned in descending order.

15-Mar-07 * Changed screen output of effective number of variables
(peff) in REGEM for TTLS option. REGEM was erroneously
writing out an effective number of degrees of
freedom, rather than an effective number of variables
(=truncation parameter). However, this was of no
consequence for the performance of the algorithm.

Note that this definition of the effective number of
variables is not precise for TTLS. A better definition
would be based on the filter factor formulation of
Fierro et al. (1997), but obtaining these would
involve additional computation, which seems
unnecessary for this merely diagnostic output.

01-Jul-02: * Users reported problems due to changes in the calling
sequence of EIGS in a beta version of an upcoming Matlab
release.

02-Jan-01 * Adaptation to Matlab 6 (minor changes in PEIGS). REGEM now
has an additional optional parameter OPTIONS.minvarfrac,
from which an upper bound on the regularization parameter
can be constructed.

04-Apr-00: * CovRes in REGEM is now a full matrix, no longer a
sparse matrix. Not allocating memory at initialization
of the sparse matrix had significantly slowed down the
algorithm.

22-Mar-00: * All variables in a dataset are now scaled at the
beginning of an EM iteration and regularization in
standard form is performed. Before, in each individual
record only the variables with available values were
scaled. Scaling all variables slightly changes the
objective of generalized cross-validation: instead of
estimating the regularization parameter for which the
expected *ms error* of the imputed values is minimum,
GCV now estimates the regularization parameter for
which the expected *relative ms error* of the imputed
values is minimum. REGEM with MRIDGE might thus produce
slightly different results; however, REGEM with IRIDGE
should produce the same results as before.

Thanks for requesting the code again. Steig is creating a huge problem for himself. This is what he said to me.

[Response: What is there about the sentence, “The code, all of it, exactly as we used it, is right here,” that you don’t understand? Or are you asking for a step-by-step guide to Matlab? If so, you’re certainly welcome to enroll in one of my classes at the University of Washington.–eric]

This is what he said to someone else on the thread.

I do routinely make all our data available, as does everyone else that I know. In this particular case, anyone legitimate who has asked for all our data, including the intermediate steps, has received it.

I don’t know how he figures that the routines that call RegEM are the “intermediate steps”, but they cut everything else I sent on this, both before and after my question.

Major changes in Harry led to only the slightest changes in results. This suggests that their methodology may be insensitive to input. Could be another hockey stick situation… anything you put in gives warming.

1. There’s no error
2. O.K. there was an error, but we spotted it independently, and it doesn’t change the results
3. Well, the error changes the results, but this is still consistent with the main theory
4. ???

A couple of points need highlighting so that they don’t get overwhelmed in the back and forth over details. First, as Thor points out in #37, there is the bizarre reliance on “trust” in comments Tamino made at RC. Second, there is the adjective “legitimate” used by Steig (see Jeff Id at #38) to distinguish those with whom he is willing to share his work from those he deems unworthy.

These revelations are telling. They are jaw-dropping in their implications.

The people involved in this affair can be divided roughly into two groups. One group had access, for an extensive period of time, to all the data and all the code. Members of this group actually invested significant professional attention to the preparation of this study. This group is the one Steig considers “legitimate” and Tamino says he “trusts”.

The second group consists of people who only spent a short amount of time looking over the study (without the benefit of the code and methodology) and quickly discovered that the data had some serious problems.

How can members of the first group argue that the second group isn’t advancing the cause of science? Don’t they realize that their petty obstinacy and obfuscation only serve to diminish their credibility? Do they really think that outside observers are going to accept their justifications for denying access to critical information for people who are clearly improving the state of the science?

The team is beginning to look a lot like Brer Rabbit after his bout with the Tar-Baby.

Thanks for requesting the code again. Steig is creating a huge problem for himself. This is what he said to me.

[Response: What is there about the sentence, “The code, all of it, exactly as we used it, is right here,” that you don’t understand? Or are you asking for a step-by-step guide to Matlab? If so, you’re certainly welcome to enroll in one of my classes at the University of Washington.–eric]

I saw the sarcastic comment above to Jeff Id on RealClimate and tried to post the following:

When Harry Met Aunt Sally:

Dear Professor Steig, why don’t you correct the data and results, just as you would advise your students to do in the classroom?

Having been on the receiving end of an audit that was plenty harsh (at CA in 2007–it wasn’t SM who was harsh), I can testify that it is indeed possible to be polite, to send data and yes even code in response to questions, even from hostile questioners.
Yesterday I predicted that the weather station data fiasco was not over. I was right and I continue to predict further screwups.

There is no reason that data and complete end-to-end scripts shouldn’t be released with publication. This caginess is good for nobody and the grant agencies should go out of their way to discourage it. All I can say in defense of Eric is that this sort of behavior is common in many sciences and isn’t, as far as I know, peculiar to climate. The fear of charlatans stealing ideas (this does happen and damaged my own scientific career) trumps the advancement of science. In the present case I can’t see how that even makes sense, but I think that’s the core impulse.

While it might be legitimate to embargo one’s code and laboriously assembled data (from all except the reviewers) until the article that uses it is published, there is no excuse for keeping it secret from “illegitimate” (ie potentially critical) eyes after the publication date.

I like fFreddy’s idea (#24) that the whole thing was done in interactive mode. MATLAB has such a mode, but of course for anything serious one should write a script that lays out all the steps where they can be double-checked before final execution. So maybe Steig simply has nothing to show us!

[Response: Michael, with all due respect, you are holding climate science to a ridiculously high ideal. i.e. that every paper have every single step and collation available instantly upon publication for anyone regardless of their level of expertise or competence. I agree that would be nice. But this isn’t true for even one of the 1000’s of papers published each month in the field. It certainly isn’t a scientific necessity since real replication is clearly best done independently and science seems to have managed ok up til now (though it could clearly do better). Nonetheless, the amount of supplemental information has grown enormously in recent years and will likely grow even more extensive in future. Encouraging that development requires that the steps that are being taken by the vanguard be praised, rather than condemned because there are people who can’t work out how to set up a data matrix in Matlab. And what do we do for people who don’t have access to Matlab, IDL, STATA or a fortran 95 compiler? Are we to recode everything using open source code that we may never have used? What if the script only runs on Linux? Do we need to make a Windows GUI version too?

Clearly these steps, while theoretically desirable (sure why not?), become increasingly burdensome, and thus some line needs to be drawn between the ideal and the practice. That line has shifted over time and depends enormously on how ‘interesting’ a study is considered to be, but assuming that people trying to replicate work actually have some competency is part of the deal. For example, if there is a calculation of a linear trend, should we put the exact code up and a script as well? Or can we assume that the reader knows what a linear trend is and how to calculate one? What about a global mean? Or an EOF pattern? A purist would say do it all, and that would at least be a consistent position, even if it’s one that will never be satisfied. But if you accept that some assumptions need to be made, you have a responsibility to acknowledge what they are rather than simply insist that perfection is the only acceptable solution. Look, as someone who is pretty heavily involved in trying to open out access to climate model output, I’m making similar points to yours in many different forums, but every time people pile on top of scientists who have gone the extra mile because they didn’t go far enough, you set back the process and discourage people from even doing the minimum. – gavin]

Re: theduke (#57), OK, just wow, Gavin. It has been mentioned here repeatedly that in numerous fields – including econometrics, protein structure, toxicology studies, medical studies, and many parts of engineering – it is required that one archive code and data, even massive data sets. Don’t play this violin about how hard it is. Of all fields of science, it is currently climate science that is making the greatest claims to be a basis for public policy. Even clinical trials only affect those who will end up taking the drug in question. I notice in the case of Steig’s paper that the whole analysis was redone in less than a day to get the new trends. It must be all set up for a push-button run, eh?

Craig, actually less than a day it seems. This is just more evidence of Gavin trying to complicate an uncomplicated issue. The whole idea that you, as the author of a paper, get to decide who is competent to review your work and who isn’t is utter nonsense and smacks of a “good ol’ boys” mentality – since obviously the only people Gavin and Steig think competent are those that are in agreement with the general theory or are actual colleagues or friends.

Gavin – stop talking yourself into a corner and stonewalling. In an age where we can stream HDTV over the net, it seems ridiculous to claim that uploading raw data and code that is already stored somewhere is a great effort or costs any real money.

Well, the reply and paste isn’t working for me this morning.
D.Nut, #16:
The simple answer is recordkeeping. Realize that the sites have names, AWS numbers, and Argos numbers (and maybe other ID). The Argos numbers are likely in firmware, so if an electronics package is moved, the Argos number and site name/AWS number no longer match. I’m not sure whether the AWS number is assigned to a location or to a specific electronics package. How this change is propagated down to the BAS and other users has not been discussed here, but there are likely to be many steps where an error could occur. BAS then just strings all the data under a given Argos number together without noticing that the site/Argos correlation has changed.
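A sanity check for the recordkeeping failure described above would be to scan the incoming data stream for Argos IDs whose associated site changes over time. This is a hypothetical sketch; the IDs and site names are invented:

```python
# Hypothetical sketch: if records are strung together purely by Argos ID,
# a change in which physical site carries that ID goes unnoticed. Scan a
# time-ordered log of (argos_id, site_name) pairs for reassignments.
def find_reassigned_ids(log):
    """Return {argos_id: [site names in order seen]} for IDs that were
    associated with more than one site over the record."""
    seen = {}
    for argos_id, site in log:
        sites = seen.setdefault(argos_id, [])
        if not sites or sites[-1] != site:
            sites.append(site)
    return {k: v for k, v in seen.items() if len(v) > 1}

# Invented example: one electronics package moves to a new site mid-record.
log = [(8914, "Racer Rock"), (8914, "Racer Rock"), (8914, "Unknown site"),
       (8900, "Harry")]
print(find_reassigned_ids(log))  # → {8914: ['Racer Rock', 'Unknown site']}
```

Of course, this presupposes that site metadata reaches Head Office at all – which is exactly the step that appears to have broken down.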

It does appear that Steig is admitting he doesn’t know how to write code. That is pretty damning in itself. In this day and age, a scientist is unable to write a simple script to automate his work?

OTOH, this may have been intentional if you know in advance that some of your methods might come under critical examination that you could not support directly.

However, even though this may not “officially” be computer code, that is a rather nebulous distinction. Whether one calls it a “procedure” (written instructions) or a “script” (code) is not really meaningful. It should all be released.

Re: Hu McCulloch #6
.
1. While true, unless bad Harry data made the RegEM results match the actual data better, it could not have had a large impact: http://www.climateaudit.org/?p=5054#comment-325029 – go up 2 posts.
.
2. No argument. They were very confusing about that. I even stated that myself in the previous thread.
.
3. No argument. I would like to see what they did as well. I made a post to this effect in the Mystery Man thread.
.
You seem to be implying that I am defending Steig’s reaction, or failure to check his data, or failure to provide his code and data as used. I have never said anything of the sort.
.
With all of the excitement over the bad AWS data, I just wanted to remind people what could and what could not affect the results:
.
1. The primary purpose of the Steig paper was to investigate temperature trends in Antarctica using satellite data. In order to remove the effect of cloud cover, they discarded some data. The resulting reconstruction was the primary focus of their paper. It’s a satellite reconstruction and has nothing to do with the AWS stations.
.
2. In order to justify what they did to the satellite data, they used RegEM and compared the results from RegEM to known AWS records. They claim good correspondence, so they felt that the RegEM results could be used as an independent check on their satellite results. So they compared the RegEM to the satellite. They claim good correspondence, which is their reason for saying their method for analyzing the satellite data represents an improvement over previous methods.
.
The AWS data is not primary to the conclusions; the satellite reconstruction is. With the exception of Harry, Byrd, Siple, and Mt. Siple, it’s not even secondary to the conclusions because it wasn’t used for the RegEM imputation – the only purpose of which was to provide an independent check on the satellite trends. You can run the script I posted to see that the other AWS stations were not used (just plot columns 190-252 . . . only Harry, Byrd, Siple, and Mt. Siple have residuals of ~0).
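The check described above can be sketched in Python (the data here are invented; the real script works on Steig's reconstruction matrix): where RegEM actually used a station's record, reconstruction minus actual is essentially zero over the overlap, while unused stations show non-zero residuals.

```python
# Sketch: identify which stations entered the imputation by looking for
# near-zero residuals between reconstruction and actual data.
import random

random.seed(0)
n = 100
stations = ["Harry", "Byrd", "Elaine", "Siple"]
actual = {s: [random.gauss(0, 1) for _ in range(n)] for s in stations}
recon = {s: list(v) for s, v in actual.items()}   # start as exact copies
# "Elaine" stands in for a station whose values were imputed, not copied:
recon["Elaine"] = [x + random.gauss(0, 1) for x in recon["Elaine"]]

def max_abs_resid(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# Stations with ~0 residuals are the ones whose data RegEM used directly.
used = {s: max_abs_resid(recon[s], actual[s]) < 1e-8 for s in stations}
```

Run against the actual reconstruction, this is the test that singles out Harry, Byrd, Siple, and Mt. Siple.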
.
Even if every other AWS station contained records from Key West, FL, it would not affect the results of the satellite reconstruction. All it would affect is the correspondence of the RegEM imputation to the historical record at the other AWS sites, and hence the degree to which the satellite data corresponds to RegEM – but it cannot affect the satellite reconstruction itself.
.
With all that being said, I feel (like you) that they should have posted the code as used and the data sets as used. No argument there. The initial response of Gavin and Steig at RC did not engender confidence, either – especially as it turns out 4 of the AWS stations were indeed used in the RegEM imputation.
.
At the same time, I think the general assumption that “problems in the AWS data” equates to “potential problems with the conclusions of the paper” is inaccurate. Problems in the AWS data for Harry, Byrd, Siple, and Mt. Siple – and only those 4 – could result in an inaccurate RegEM imputation, which, in turn, could result in a lower correlation between the satellite and RegEM reconstructions. This doesn’t alter the satellite data, but it may alter their error bars (which are already freakin’ HUGE, so I’m not sure that it really changes much).
.
For me, the biggest issue with the study is that they validate the satellite reconstruction using a RegEM imputation that is far outside of any benchmark test for RegEM that I know of. To me, the correspondence between the satellite and RegEM gives no increase in confidence in the results because:
.
1. The satellite cutoff was chosen based on which value corresponded best to the RegEM reconstruction – which destroys the independence of the methods.
.
2. I am not aware of any documented test of RegEM where its ability to infill so much data over such a large area has been categorized. This makes it a poor benchmark because its relationship to reality is unknown.
.
To me, those are the main issues in the Steig paper.

2. In order to justify what they did to the satellite data, they used RegEM and compared the results from RegEM to known AWS records. They claim good correspondence, so they felt that the RegEM results could be used as an independent check on their satellite results. So they compared the RegEM to the satellite. They claim good correspondence, which is their reason for saying their method for analyzing the satellite data represents an improvement over previous methods.

OK, so it affects the correspondence, which in turn should change the error bars. Did the errors change the correlation negatively or positively? If the correlation between the AWS records and the satellite data is now much lower, then trends of 0.1 degrees become even more meaningless than they might already be. In short, a poor correlation wouldn’t support their conclusions.

Re: Ryan O (#61),
Ryan O:
This is a very helpful summary of the substantive issues associated with potential errors with the AWS data. Gavin’s and Eric’s reactions to requests for code and possible data errors are another issue entirely.
Given what many readers here have access to and could do, do you have any ideas of what they could do to validate accessible data that is more central to Eric et al’s paper?

I really don’t understand their position on this. They don’t need to release a version that anyone can run on any platform. They need to:
1. Release the platform and operating-system information for the system(s) they used
2. Release the data and data sets
3. Release the final programming code used to generate the results they are reporting
4. Release a step-by-step journal of the decisions and assumptions they made during the study to get the results they are reporting.

It seems to me that if they did a reasonable job actually doing the study/model that each of these items should be included in the final write-up and package, prior to even thinking about releasing this to be published. Anything less is sloppy and should be considered a flawed study. The burden is on the author to present a clear picture of the tools they used and the steps they took so that their study can be independently replicated and verified by whomever feels like taking the time.

However, if you look at what was changed in the last two changes, it was only months with very spotty coverage which doesn’t make it into the mean monthly data used in the Steig et al paper

I don’t recall seeing anything about a criterion which sets a bar on when something is too “spotty” for inclusion. Does anyone know how this test is done? I don’t think that it is inherent to RegEM, but I don’t know this. Otherwise, to my knowledge, Tapio Schneider’s code doesn’t test for spottiness. Is it possible that Tapio Schneider’s code is not a complete rendering of the steps that they actually did?

#63. Their excuses are just that. No one’s asking that they provide code for every conceivable platform and language – only the code that they actually used in the language that they used. I do stuff in R and would provide code in R. If they used machine language or COBOL, that’s fine – whatever it was.

Steve, I think he is referring to the completeness of the AWS daily records. For example, the monthly means for Clean Air that were removed were based on data that was about 10-20% complete and doesn’t even show up on the .txt file. I don’t think he’s talking about anything with RegEM.
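A minimal Python sketch of the kind of completeness screen being discussed (the 90% figure comes from this thread; the implementation details are assumed): a monthly mean is published only if enough daily observations exist, so a month with ~20% coverage, like the removed Clean Air values, never makes it into the text files.

```python
# Sketch of a monthly-completeness filter: drop "spotty" months.
import calendar

def monthly_mean(daily_temps, year, month, threshold=0.9):
    """Return the monthly mean if coverage meets the threshold, else None.

    daily_temps is one entry per day of the month, None for missing days.
    """
    days_in_month = calendar.monthrange(year, month)[1]
    valid = [t for t in daily_temps if t is not None]
    if len(valid) / days_in_month < threshold:
        return None                     # too spotty: month is excluded
    return sum(valid) / len(valid)

# A January with only 6 of 31 days reported (~19% coverage) is dropped:
jan = [None] * 25 + [-25.0] * 6
```

With a 90% threshold, `monthly_mean(jan, 2005, 1)` returns None, while a month with 29 of 31 days present passes.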

[Response: The results featured on the cover and in the paper are based on AVHRR satellite data, not AWS data. The latter was used only as a secondary check on the results. And even the results using the AWS data are insensitive to the issues raised thus far. Really not that difficult to understand. -mike]

I wonder if Mann is aware of the paper Phil linked in #328 over on the “Gavin’s Mystery Man Revealed” thread.

AJ Abrams:
.
The mean slope of the time series of the residuals (plotting actual monthly anomalies from actual AWS data post-corrections vs. the RegEM reconstruction) changed by -0.003. In principle, you are exactly right. In this specific case, the change really is negligible.
.
This doesn’t mean, however, that RegEM is a worthy method of benchmarking a new satellite computation. All it means is that Harry didn’t change it much. It makes no statement whatsoever on the usefulness or appropriateness of the benchmark.
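The quantity quoted above, the change in the mean slope of the residual series, is just the difference of two OLS slopes. A sketch with invented series shows why a constant offset in the residuals leaves the slope untouched; only trend-like errors move it.

```python
# Sketch (series invented): the OLS slope of the residuals (actual minus
# reconstruction) measures whether data corrections change the fitted trend.
def ols_slope(y):
    """Least-squares slope of y against time index 0..n-1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    num = sum((i - xbar) * (yi - ybar) for i, yi in enumerate(y))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

resid_before = [0.01 * i for i in range(120)]       # toy residuals, 120 months
resid_after = [0.01 * i + 0.5 for i in range(120)]  # same trend, constant offset
slope_change = ols_slope(resid_after) - ols_slope(resid_before)
# A constant shift leaves the slope unchanged; the -0.003 quoted above came
# from an actual trend-like change in the corrected Harry data.
```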

This map nicely shows the mountain range that separates W Antarctica from E Antarctica, potentially giving it a different warming trend.

Dirty Harry is on the 240 line between Byrd and the pole. His gal Gill is on the Ross Ice shelf, on the 180 line at 80S. Since Gill is both lower elevation and further from the pole, she has a considerably warmer personality, whence the dramatic chemistry When Harry Met Gill.

Racer Rock is the upper leftmost point on the Antarctic Peninsula. Clean Air is at the S Pole, while Penguin Point is on the coast near the 150 line. Elaine is on the 180 line, further in from Gill.

I gather that the Nicholas Lewis who ratted out Elaine is our own Nic L, and not one of Hansen’s Bloodhounds. Nice catch, Nic!

It is incorrect that the primary source of the trends is the satellite data. The satellite data only begins in 1979. Dr. Schmidt has made it clear that the primary source of temperature data is the manned weather stations. The satellite data is used to provide spatial coverage. I believe that this means that they have used the satellite data once it became available to compute how locations between manned stations vary with the temperatures at the manned stations. They then rolled that covariance back in time with the data from the manned stations.

As to the AWS data I think they are using it for a similar purpose as the satellite data, but I’m not quite clear on this yet.

Gavin replied to Nicolas that open source isn’t common in commercial settings, completely missing the point which is about repeatability.

Just having a complete script to replicate your workflow is standard practice everywhere I’ve seen except in science. I don’t think it’s especially a matter of climatology. It’s simply that nothing better is expected in grant-driven science.

I have written code for everything from 2000 line utilities to 2,000,000 line flight simulations and trusted operating systems. In every case, packaging up the code so that the current shipped version could be downloaded, compiled, and run (and in most cases so that the binary as shipped matched the binary created by anyone downloading and recompiling – a necessary security measure) was a SIMPLE integral part of saving your work each day.

Any person who considers themselves a professional can save code and data into a source code control system or a versioning system. Heck, if you are absolutely lazy, then zip your working directory of the version that you are using to generate final product. At least you will have the scripts and makefiles to recompile.

The most amazing thing I see is that Gavin has talked about how the changes to the data might make it hard to reproduce results at a later date.

At one point many years ago, the company I worked for was sued by Microsoft. We eventually settled, but as part of the settlement, we agreed to provide Microsoft with a set of our code that would compile into the shipped product. Guess what. It was as simple as downloading the latest code set from the SCCS (source code control system) and putting it onto a tape. On the onerous scale, it was a 0.

I’m sending this message from Firefox running on Ubuntu 8.10 (Intrepid). Both Firefox and Ubuntu can be downloaded in their entirety as source code that ANYONE (regardless of their professional credentials) can compile and run (and change). This is millions of lines of code and data. Someone recently estimated that this is in the hundreds, if not thousands of man-years of work product. All of it is packaged complete.

How much work product is involved in Steig, or Mann’s most recent papers?

Next time you hear anyone tell you that it’s too much work to back up data and code, tell yourself that they’re lying. There’s no other explanation.

“A group of Spanish computer scientists measured the size of a Linux system similar to Ubuntu, and found that it contained around 230 million lines of source code. When they translated this into the effort spent on writing this code using a standard software industry cost estimate model, they found that it would correspond to almost 60,000 man-years of work (Amor-Iglesias et al. 2005).” http://www.antropologi.info/blog/anthropology/anthropology.php?p=2805&more=1&c=1&tb=1&pb=1

Nicolas:
.
Yet another example of unclear writing on Steig’s part (or inability to understand on my part). 🙂
.
With that clarification, the issue of bad data for Harry, Byrd, Siple, and Mt. Siple becomes more important . . . though corrections to AWS stations other than those 4 are still not very consequential.

#74. Michael, I was invited by Stephen Schneider to review an article by MBH replying to our 2003 article. I’d never reviewed an academic article before and I asked for supporting data and code. Schneider said that in 28 years of reviewing no one had ever asked for such things. I said – well, times change and I’m asking. He said – if reviewers have to do this, I’ll never get any reviewers. I said – I’m not saying what other reviewers should or shouldn’t do, but I’m prepared to do this. He said – I’ll need to get an editorial board policy change. I said – fine, get one. So they said that they would establish a policy requiring supporting data and source code. I said – not what you should have done, but here’s the supporting data that I want. Mann refused to provide it. As a reviewer, I observed to Schneider that Mann had failed to comply with the new journal policy. I don’t know what happened thereafter but the article was never published.

Mann cited the rejected/withdrawn article in Jones and Mann 2004 in a bit of academic check-kiting and this latter reference then became “authority” but the original check bounced.

Is it possible that Tapio Schneider’s code is not a complete rendering of the steps that they actually did?

I’m thinking, from some things being said here, that Dr. Steig’s meaning was that he only considers computer programs “code” and that things like scripts and “read me”s aren’t code and therefore you haven’t asked for them. This is disingenuous, of course, and reminiscent of “It depends on what the meaning of ‘is’ is.” Dr. Steig is a PhD and therefore an intelligent person. He should be able to figure out what people are asking for. That he refuses to do so does not speak well of him.

I don’t think Gavin has a clue about what it means to provide all the data and code. I worked in an industry where everything (data and code) was independently verified and archived, and this is going back 30 years. Regular auditing was carried out over that time to keep everyone on their toes and honest. Even calculations done 30 years ago can, in theory, be repeated today (although the computer platforms may be a bit different). If it could be done 30 years ago, it can be done today. Gavin has no idea what quality control, traceability and auditing mean. I suspect this applies not only to Gavin, but also to all tax-payer funded “climate scientists”.

Following up on my earlier post, after reading Dr. Schmidt’s reply to my query and going back to the article I think I can explain in English how this reconstruction worked. I think it is pretty clever, which I guess shouldn’t be surprising.

They wanted to figure out continent-wide trends for the Antarctic over a significant period of time, but the sites providing temperature data are quite sparse, with poor coverage in some areas. However, since about 1979 there is satellite coverage that can be used for temperature measurement. So what they did was take the manned-station observations during the period of satellite coverage and see whether, from those alone, they could predict the temperatures of the areas covered by satellite. They additionally tested whether they could predict the AWS data the same way.

The math to do the prediction is non-trivial, as the covariance seems to depend on things like the season and other factors, but including all these factors they report that they get a good result. In other words, by plugging the manned station data into their prediction machine they come up with a reasonable approximation of the values recorded by the satellite and the AWS.

To test this they made sure that the system worked well in two different time periods of the satellite era. It seems they also took the most complete 15 manned sites and ran it through the engine to see if it did a good job of predicting the other sites. I can’t tell whether they did this for the entire temperature record, or just the satellite era.

They then ran their engine from the pre-satellite era to the satellite era to fill in the gaps in spatial coverage. I assume that in the satellite era they just filled in using the actual satellite data. This is probably automatic with the RegEM algorithm.

Personally I would like to verify that they tested over the entire period the ability of the 15 “best” sites to predict the rest of the sites. Then the only other question is whether the manned station data is solid over the entire period from 1957.
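The scheme described in the preceding comments can be sketched in Python. To be clear about the assumptions: the paper uses RegEM, which infills all series jointly; the per-grid-cell regression below is only a simplified stand-in for that idea, and every number is invented. Learn, over the satellite era, how a grid cell co-varies with a manned station; then apply that relation to the station's pre-satellite record.

```python
# Sketch of "roll the covariance back in time": calibrate on the overlap
# era, then impute the grid cell where only the station record exists.
def fit_line(x, y):
    """OLS fit y = a + b*x; returns (a, b)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
        sum((xi - xb) ** 2 for xi in x)
    return yb - b * xb, b

# Overlap era: manned-station anomalies and a satellite-observed grid cell.
station_overlap = [0.1, -0.2, 0.3, 0.0, -0.1, 0.25]
grid_overlap = [2 * s + 0.05 for s in station_overlap]   # toy covariance

a, b = fit_line(station_overlap, grid_overlap)

# Pre-satellite era: only the station exists; the grid cell is imputed.
station_1957 = [-0.15, 0.05, 0.2]
grid_1957 = [a + b * s for s in station_1957]
```

The verification question raised above is whether this calibrated relation actually holds outside the overlap period, which a per-cell sketch like this cannot answer.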

Re: Nicolas Nierenberg (#86),
.
After you pointed me to Gavin’s explanation, I pretty much agree. I’m still unclear, though, on the actual usage of the satellite data, since Gavin said it was for spatial coverage. To me this would indicate that they didn’t use the satellite data instead of ground station data; they used satellite data to measure the difference in temperature between known points (manned stations) and imputed points on the grid. My read is that the temperature data itself actually still came from the ground stations, but the temperature difference between the manned stations and grid points with no ground observations came from using the satellite data essentially as a difference map.
.
Now it makes more sense why they seemed so concerned about the bad data. The primary data source is indeed ground observations – not necessarily AWS – but ground observations nonetheless.

My take on the reconstructions is that there are two independent reconstructions: (1) using the TIR (adjusted IR satellite measurements) and (2) using the AWS stations (selected AWS stations).

Since the AWS and TIR measurements are limited to the more recent past, the only way for the Steig paper authors to get back to 1957 (and use that potential dip in temperatures back then to make the claim that Antarctica is warming) is to “reconstruct” the manned stations that, at least as a group, go back that far. The overlap period of the manned stations with AWS stations and the TIR measurements is used for calibrating and verifying the reconstruction. Both the TIR and AWS reconstructions are calibrated using the 42 manned stations listed in Table S2 in the Steig paper SI (or at least that is what the paper says was used).

Now here’s the part that is not so straightforward in my understanding. The TIR measurements are calibrated to the 42 manned stations spatially, and evidently the manned stations going back in time are related to the many grids used for the TIR measurements – thus we see the huge files at Steig’s web site and file directory with all these grid temperatures going back to 1957 for the TIR reconstruction. The situation is the same for the AWS reconstruction, as those 63 stations go back to 1957 also.

The critical part of this reconstruction is the 42 manned stations used for the calibration.

It is, however, evident from looking at the AWS reconstruction table in the Steig directory that the raw AWS station data will play a role in that reconstruction in determining the reconstructed temperature trends — and particularly so in the period of overlap with the 42 manned stations.

As an aside, I do not see why the Steig paper authors’ claim of a warming Antarctica from 1957 to 2006 is of much importance vis-à-vis AGW. The authors, or at least some of the authors, conceded the following:

Looking at an approximate period going 30 years back can show no temperature trend to a slightly cooling one.

The decade of 1935-1945 was likely the warmest of the 20th century.

Combining these observations with claim of warming since 1957 (my unofficial results indicate that warming is not statistically significant) brings to light evidence of a cyclical nature of warming and cooling that would appear more of a natural origin than due to an AGW related trend. It would also point to how critical the start dates can be for reporting trends.

As a layman, I’m interested in how the paper at the link below might bear on Mann’s claim that “The results featured on the cover and in the paper are based on AVHRR satellite data, not AWS data.” (Phil linked the paper yesterday.)

The abstract states:

Due to the recent installation of an averaging filter to remove arbitrary data, a trend analysis has been conducted and is included in this study. It agrees with previous statements that the Antarctic interior is experiencing no major temperature deviations. However, it does suggest that satellite readings tend to be erroneous during all seasons except the Antarctic winter.

It’s important to understand that Mann has obscured where “the results featured on the cover and in the paper” actually come from. If you read Schmidt’s comments in that thread carefully, it’s clear that the results come from manned ground stations only. It has to be that way because the cover is based on history since 1957 and the satellites have only been up since 1982. Steig et al have used the AWS and AVHRR data to correct/calibrate/verify/infill/etc the manned ground station data.

If you want the paper withdrawn, it might be more fruitful to examine the data from the ground stations. If there are as many data integrity problems with those as with the AWS that everyone is scouring now, the paper is almost certainly toast.

Re: Punch My Ticket (#93),
Examine data from the ground stations? That’s all I could find for the Amundsen-Scott station. But I was looking for site reports of installation and alterations of the equipment. Nothing. Took forever just to find that the current station (installed 2004?) is partly by the runway and partly in the Clean Air Facility. Temperature is measured somewhere in the clean air zone.

It is not surprising that Ryan O (in #217 at Dirty Harry 4 thread) has found that Harry, Byrd, Siple and Mount Siple were used in Steig’s AWS “reconstruction”. According to my analysis, in most cases, the “reconstruction” temperature for the month for each station exactly equals the actual temperature per the READER database, where it exists, minus the average of the actual temperatures for that month (over all years for which valid actual data for that month exists). Unlike Ryan O, I get a perfect match for Siple. Byrd is one of the AWS where there is not a perfect fit for all or all but one or two months (usually December). The others where there is not an exact fit (for all but a few months at most) are Bonapart_Point, Butler_Island, Cape_Denison, Cape_Ross, D_47, GC41, GEO3, GFO8, Hi_Priestly_G1, LGB35, Larson_Ice_Shelf, Law_Dome_Summit, Limbert and Marble_Point.
It looks as if the reconstruction algorithm uses actual data from all the AWS for the dates that it is available but is unable in some cases (particularly at the ends of the actual records, I suspect) to get it to fit smoothly with the proxy data, and therefore modifies it. I hope to investigate further.

Re: Nic L (#95),
.
The first version of my script adjusted the means to be equal. The difference between anomalies at the beginnings/ends of actual data resulted in some minor mismatch (though this did not affect the trends). I redid the script without this adjustment – trends are exactly the same – but I still don’t quite get a perfect match for Siple. Do you round anything?

Re: Ryan O (#97),
No, I didn’t round anything, and I get matches with zero variance to about 8 decimal places for most data points. I did have a problem initially in that I had downloaded AWS data from the HTML pages, which include cases where under 90% of the temperature data was available. Steig evidently only used the data from months where over 90% of the data was available for a station, as included in the text pages (notwithstanding that in his SI Table S1 his cut-off point for including AWS stations was 40% data completeness). I foolishly, to save time, converted the HTML data by applying a 90% test to it, only to find that in a number of cases months shown as having 90% data completeness were not included in the text version. But you seem to have downloaded data from the text pages, so that can’t be the explanation of your not getting exact matches. For Siple, the adjustments that I get, based on the average temperatures recorded for the months of January to December, are:
Jan 11.8875000
Feb 17.7500000
Mar 25.1625000
Apr 27.5428571
May 30.2285714
Jun 31.1857143
Jul 32.8000000
Aug 32.0857143
Sep 29.1142857
Oct 25.8285714
Nov 18.4333333
Dec 12.2833333
Adding the relevant one of these adjustments to each actual temperature data point gives the figure in the reconstruction in every case, to at least 7 decimal places.
I am afraid I can’t post a script as this is all in a large Excel workbook, my knowledge of R being rudimentary at present, but I am happy to upload it anywhere suitable.
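Nic L's check translates directly into code (the temperatures below are invented; the real adjustments are the twelve Siple values above): the reconstruction should equal the actual temperature minus that month's all-years mean, i.e. the actual value plus a fixed per-month adjustment.

```python
# Sketch of the recon = actual + monthly adjustment relationship.
from collections import defaultdict

# Toy actuals keyed by (year, month):
actual = {(2000, 1): -26.0, (2001, 1): -27.0,
          (2000, 7): -33.0, (2001, 7): -32.0}

# Per-month adjustment = minus the mean over all years for that month.
by_month = defaultdict(list)
for (year, month), t in actual.items():
    by_month[month].append(t)
adjustment = {m: -sum(v) / len(v) for m, v in by_month.items()}

# The "reconstruction" is then the actual plus the month's adjustment.
recon = {k: t + adjustment[k[1]] for k, t in actual.items()}
```

With these toy numbers the January mean is -26.5, so the adjustment is +26.5 and the two January anomalies come out as 0.5 and -0.5. Matching to 7-8 decimal places, as Nic L reports, is exactly what this arithmetic predicts when the same monthly means are used.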

And what do we do for people who don’t have access to Matlab, IDL, STATA or a fortran 95 compiler? Are we to recode everything using open source code that we may never have used? What if the script only runs on Linux? Do we need to make a Windows GUI version too?

Gavin is just blowing smoke in the eyes of RC readers. This is a total red herring.

All that is being asked by Jeff Id, Michael Tobis, and everyone else here is that Steig post his code in whatever language he used, so that we can figure out what he actually did or didn’t do with which data. Since Tapio Schneider’s RegEM routines are in Matlab, it seems likely that Steig used that language to call them.

If Steig posted his own Matlab on his U. Wash. site, Steve or half a dozen other CA readers would have it translated into R for free within 24 hours. If that.

Although downloading and then retransmitting the image might be construed as a copyright violation, nothing has been copied here except the URL, which is free game. The image is in fact coming to your screen from Steig’s website, where he has placed it for all to see, and not from CA.

you’re writing R with a heavy Fortran accent. I’ll try to translate this script into R some time

Hahahaha! Assembly language would be closer to the truth. Never really played with Fortran (though, given that all the models seem to be written in it, I kind of wish I had).
.
If you do find the time to translate it, I’d be much appreciative. I probably learn best by example. For the moment, I just look up a function in R and brute-force my way through it. No elegance here. Haha. 🙂

I’ve read most, but not all, of the replies and have been following this since it started.
#102:
Considering all of the possible reasons for NOT releasing the code…
Is it possible that Steig doesn’t actually know what scripts/code were run because it was done by someone else (grad student?), and that with this major SNAFU, that person has said “No way…I am no longer involved, and I am not releasing ANYTHING. YOU deal with this.” ?

#104. Ryan, there are LOTS of different ways to do things in R. 95% of all do loops are avoidable, which leads to HUGE gains in the code being self-documenting, as it’s the blizzard of subscripts that makes much code hard to read. For example, I’d calculate the trend coefficients for the recon_aws with a single call to apply rather than a do loop.

This does a couple of things in an R-accent. It uses the function apply. (tapply is HUGELY useful for ragged arrays, as are sapply and lapply. They are amazing.) apply, sapply, … also encourage the use of functions rather than iterated do loops, clarifying what you did. I’d also avoid the use of arrays with things tucked in columns here and there. Make lists instead, or define new vectors.
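The loop-free style being described can be sketched in Python (the data and trend function are invented; the R equivalent would be a single apply(recon_aws, 2, trend) over the station columns): map one function over every column at once instead of iterating subscripts.

```python
# Sketch of apply-style column mapping: per-station trend slopes without
# an explicit indexed loop over columns.
def ols_slope(y):
    """Least-squares slope of y against time index 0..n-1."""
    n = len(y)
    xb = (n - 1) / 2
    yb = sum(y) / n
    return sum((i - xb) * (v - yb) for i, v in enumerate(y)) / \
        sum((i - xb) ** 2 for i in range(n))

# Toy "recon_aws": rows are months, columns are stations.
recon_aws = [
    [0.0, 1.0, -1.0],
    [0.1, 0.8, -0.7],
    [0.2, 0.6, -0.4],
    [0.3, 0.4, -0.1],
]
columns = list(zip(*recon_aws))               # transpose: one tuple per station
trends = [ols_slope(col) for col in columns]  # the apply(..., 2, ...) analogue
```

The point of the idiom is readability: the trend function is named once, and there are no subscripts to decode.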

Re: Steve McIntyre (#115),
Thanks. I’ll have a play with the apply functions. It might take me a while to get away from using arrays and substituting vectors instead…the arrays are a reflection of how I think about the data.
.
<–old-school. Which is another way of saying I’m usually too lazy to adapt to better ways. Haha! 🙂

Steve: arrays are good, it’s the do loops that you want to get away from. sapply does some beautiful things.

Re: Steve McIntyre (#115),
.
The only question I have about it: if you define a function of x and then apply it to a matrix where you leave one of the subscripts unspecified, does R assume that the unspecified subscript is x and iterate through all available subscripts?

Oh, one other question . . . is there any real difference between the “<-” and “=” methods of assignment? The documentation that comes with R always uses the former, but I’ve noticed that a lot of folks here use the latter. I didn’t know if there was a real difference, so I used “<-” . . . but I’d prefer to use “=” if they are equivalent.
Steve: AFAICT they are equivalent. I now use “=” since it quotes into WordPress better.

As someone who started thirty years ago with FORTRAN and COBOL under MVS and then VM/CMS — and who moved on to BASIC, to SQL databases, to Nomad2, and then on through a variety of languages such as DOS Command scripts, C++, PowerBuilder, VB, PYTHON, UNIX shell scripts, Informix 4GL — I have the following comments to make about Steve’s snippet of R code:

This piece of very cryptic but very powerful R code is all fine and dandy for someone who is thoroughly conversant in R and who has intimate prior knowledge of the code’s functional purpose, the methods and process flows employed in its internal construction, and the specifications of its desired data outputs.
.
Unfortunately, as efficient a way of employing R programming technique as this code snippet is, the code means absolutely nothing in the absence of an accompanying roadmap which explains in readable English:
— What functional purpose the code serves
— What data objects it acts upon
— What general sequence of process flow it uses
— What outputs are expected
— What roles it will play within some larger scheme and context of other data interpretation and analysis programs
.
What I am saying here in this note to Steve and to all the others now using R programming code as their primary means of technical communication is that if you are all of a mind to go full bore in taking on the role of climate science auditing vigilantes — to the extent of replicating the entire body of someone else’s computational analysis using your own equivalent R programs — then you all need to start applying some commonly accepted software QA/QC standards to your work.
.
I’m not saying, and do not expect, that this R code should be rewritten in some other manner so as to fit the needs of an external documentation standard. I’m saying that under a lifecycle QA/QC approach, this very cryptic piece of R code is, all by itself, far from being adequate for documenting the assumptions, the methods, the processes, and the contextual basis under which these various analysis replication efforts are being performed.
.
In other words, what is good for the goose is good for the gander, eh?

Scott Brim (119), You missed APL? It’s about as cryptic as “R” to a novitiate…

It’s a shame, but when a group of people get together and want to differentiate themselves from the herd as “specialists” in a particular area, the first thing they do is develop their own language. This has two effects: first, since we, the unwashed, can’t understand a word of it, it gives them a feeling of uniqueness; and second, it makes others, not a part of the group, think they must know what they’re talking about since “they use all them big words.”

Seriously, I agree with Scott that after many years close association with 25 or 30 computer languages, you eventually learn that every routine you write today may develop a life all its own and could easily come back to bite you several years in the future. The more documentation you put in your code today, the more time you will save in the future when you have to try to figure out what the heck you did “way back then”.

While I enjoy the clear, concise code routines included in the posts on this blog, they do take a lot of time to fully understand, although I know/expect that time will decrease as my experience with “R” increases.

I am seeing something developing here on CA that I’ve witnessed in other circumstances, wherein scientists and engineers develop a habit of skipping the inconvenient and time-consuming step of pre-planning, pre-designing, and pre-documenting their proposed analysis approach in a form that is easily understandable by others, the experts and the non-experts alike.

Instead, they jump straight into code writing while using the code itself as their primary means of technical communication wherein their ideas are first committed to paper — as opposed to using plain old technical English.

With a programming language like R, the temptation to do this kind of thing can become overwhelming, especially if there are schedules and deadlines which have to be met. I’ve been there myself using similar languages on mainframes writing truly massive software systems very quickly using 4th Generation language systems.

If you know both the 4GL language and a significant level of detail concerning the surrounding context of the application, and the external work that is being supported, you can figure things out. But if you don’t have a good measure of knowledge concerning each of these areas, the intensely cryptic nature of the code leaves you completely clueless as to what is actually being done.

What this boils down to is that CA code writers need to set an example and tell us what it is specifically they are doing with these individual chunks of highly concentrated R code, and they also need to describe the surrounding context and the implementing methodology of their overall analytical approach.

In doing so, I am, of course, asking for something much more than simply “code documentation” per se. I am asking CA code writers to plan and document their analytical work up front so as to be proactive in creating the kind of supporting material that will soon become necessary to defend their own analytical work against the flood of counter-criticism and counter-analysis that is sure to come.


I guess I’m not quite as concerned by the code on this site as you appear to be. While it would be helpful if it were better commented, if you include the discussions surrounding each post a lot of the documentation is there, just not included in the code. I tend to fall back on habits developed many years ago where, as I try to understand a new piece of code, I just rewrite it to add the necessary comments and documentation.

However, I do wonder how much of the resistance by the climate science community to providing the code, or even the data, is actually caused by their exceedingly poor documentation and code/data control procedures. I would bet most of them couldn’t accurately reproduce their work a year or two after publication. Some, like Steig et al, would not be able to reproduce their work even now, only a month after publication, if it were not for the British Antarctic Survey’s ability to recover the original data. Now if they can only remember which code routines they used… and in which order…

Hopefully all this will lead the climate science community to better code/data control procedures, and both them and the journals to improve data and code availability.

For a high-level language like R, you can pretty much include the actual code within a text narrative that says exactly what you’re doing. Lots o’ comments and then white space that makes the code stand out.

Reader Nic L emailed me inquiring about a difference between the Bonapart(e) Point AWS series archived at CA and the one presently online at BAS (no correction notice being posted at BAS for this series.)

BAS does not have an ASCII readable information file. I scraped the lookup names from their html page

If you are interested I have some code (at home) that I can post up to create a text file (csv) from the *pt.html files – it is not pretty, but it does the job. I use it as part of my GISTEMP step0 emulation.

I know that you like to do as much as you can in R, but here is a script (extracted from my GISTEMP step0 emulation) that downloads and diffs the data. Once I have the data local, I go ahead and start my calcs.
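For anyone wanting to try the same download-and-diff step, here is a minimal R sketch. `diff_reader` is a hypothetical helper of my own, and the READER URL pattern and local archive path shown in the usage comments are assumptions, not anything BAS documents.

```r
# Hypothetical helper: compare two versions of a READER AWS text file,
# read in as character vectors of lines. Not part of any BAS or GISTEMP tool.
diff_reader <- function(old, new) {
  n <- min(length(old), length(new))
  list(changed = which(old[seq_len(n)] != new[seq_len(n)]),  # lines that differ
       added   = length(new) - length(old))                  # net lines gained
}

# Assumed usage (URL pattern and local path are guesses):
# old <- readLines("archive/Harry.All.temperature.txt")
# new <- readLines("http://www.antarctica.ac.uk/met/READER/aws/Harry.All.temperature.txt")
# diff_reader(old, new)
```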

Here is an illustration of the changes to Bonapart/e Point. The black is the Steig original recon. There is pretty much an exact match to the anomaly version (red) of the Bonapart data that I originally scraped (as Nic L observed to me). The blue shows today’s Bonapart data, which is not the same as my scrape. Original Steig used the old version as evidenced by the match to red but not to blue. I checked the “corrected” Steig recon and it is mismatched in the blue dots and appears not to use today’s READER version for Bonapart Point.

Here’s what I got for counts. Ryan O, I don’t think that your list is right. D_47 matches. I think that you’ve got Harry there. In a few cases, the difference is adding Jan 2009, but there are some odd deletions and changes. As noted above, Nico changed a value.

Re: Steve McIntyre (#134),
I didn’t do station counts; I plotted the difference between the recon and the actual data for all data points that are currently present (current as of last week, anyway). Doing that will pick up data that is changed or added, but not data that is deleted. Here are examples of what I mean:
.

.
The count will pick up points that were removed – which the plots won’t do. The plots will pick up points that were changed or added. I went through every site and collated any that had points not on zero in the list above.
.
The plot thickens. 🙂
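The two checks described above can be sketched in a few lines of R: a plot of recon-minus-actual only flags points that were changed or added, while a count comparison on the NA pattern also catches deletions. `compare_versions` is an illustrative helper of mine, not code from either analysis.

```r
# Illustrative helper: classify differences between two versions of a series,
# where missing months are coded NA. Deletions only show up via the counts.
compare_versions <- function(old, new) {
  list(removed = which(!is.na(old) & is.na(new)),              # caught by counts
       added   = which(is.na(old) & !is.na(new)),              # caught by plots
       changed = which(!is.na(old) & !is.na(new) & old != new))
}
```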

Re: Ryan O (#135),
I think that it is worth separating the differences between the data that appears to have been used in Steig’s reconstruction and that in the current READER files at http://www.antarctica.ac.uk/met/ into two categories: (1) differences between the data Steig used and the files as Steve archived them, and (2) differences between Steve’s archived files and the current READER files.

It is fairly simple to find differences in the second category, whether they reflect data that is null in only one of Steve’s file and the current READER file, or a difference in numerical values. I have worked with the old Harry and Racer Rock files, since it is that data that is reflected in Steve’s file and in Steig’s reconstruction. The differences must presumably reflect unannounced changes that have been made to the READER files since Steve obtained them. I have now compiled a complete table of such differences and would be happy to upload a file of this if I can find somewhere to host it. Note that the announced changes to Penguin Point and Clean Air did not affect the text files and so are irrelevant; the announcement recently made re corrections to Bonaparte Point also states that it does not affect the text files, so it doesn’t seem to cover the differences I have found. AWS where differences occur are as follows:

It is more difficult to work out what the differences in the first category are, as we don’t have direct access to Steig’s data, and the adjustments he will have made to get to temperature anomalies from actuals will be based on the data he has used. However, I have been making quite good progress with it. Judging by the pattern of implied differences I am seeing – which involve missing data in Steig’s reconstruction, heavily concentrated in one particular month – I would guess that most of the category 1 differences are more likely to be due to errors made by Steig than to changes in the READER files between when he obtained the data and when Steve did.

I would guess that most of the category 1 differences are more likely to be due to errors made by Steig than to changes in the READER files between when he obtained the data and when Steve did.

.
That was my conclusion as well.
.

It is more difficult to work out what the differences in the first category are, as we don’t have direct access to Steig’s data, and the adjustments he will have made to get to temperature anomalies from actuals will be based on the data he has used.

.
This actually isn’t as hard as it might appear at first. You can get Steig’s anomalies exactly by simply subtracting the RECON from the ACTUAL and taking the mode of the result for each month. You don’t even have to know which months he used for the calculation.
.
What’s odd is that sometimes he only used partial data for the anomaly calculation, yet included the full data in the recon.
.
To get the mode of the data, you can use this:
.

modev = names(sort(-table(data[i,j])))[1]

.
The sort produces a vector of frequencies, ordered from most frequent to least frequent, and the name attached to each frequency is the data value it counts.
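As a self-contained sketch of the mode trick described above (function and variable names here are illustrative, not Steig's): subtract the recon from the actuals and take the most frequent difference within each calendar month.

```r
# Sketch of the mode-recovery trick: for each calendar month, the most
# frequent value of (actual - recon) is the climatology offset, regardless
# of which months went into the baseline. Names are illustrative only.
monthly_offset <- function(actual, recon, month) {  # month: 1..12 per point
  sapply(1:12, function(m) {
    d <- round(actual[month == m] - recon[month == m], 2)
    as.numeric(names(sort(-table(d)))[1])  # mode of the diffs (NA if no data)
  })
}
```

Outliers or changed points show up as differences that do not lie on the mode, which is exactly what the plots above display as points off zero.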

Okay. I downloaded the D_47 file 2 minutes ago, plotted, and ended up with the same graph as I posted earlier. At this point, unless BAS updated the D_47 file in the time between when you got it and now, I’m not sure why it would be different.

It’s worth noting that the monthly averages shown in the READER data appear to be simple averages of the 00z, 06z, 12z and 18z monthly temperature readings. That averaging method probably gives different results than the daily (min+max)/2 technique used in much of the rest of the world (including the manned Antarctic stations?).

This can be problematic if one tries to splice or otherwise mix averages derived by one method onto averages derived the other way.
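A toy example (invented temperatures, not READER data) shows how the two conventions can disagree on the very same day:

```r
# Toy daily cycle, deg C for hours 0..23 (made-up numbers, skewed on purpose
# so the two averaging conventions differ).
temps <- c(-5, -6, -7, -8, -7, -5, -2, 0, 2, 3, 4, 5,
            5,  4,  3,  2,  0, -1, -2, -3, -4, -4, -5, -5)
synoptic <- mean(temps[c(1, 7, 13, 19)])   # average of 00z, 06z, 12z, 18z
minmax   <- (min(temps) + max(temps)) / 2  # daily (min+max)/2 convention
# synoptic is -1.0, minmax is -1.5: splicing the two conventions shifts the mean.
```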

(11 Feb 2009) Incorrect data for Bonapart Point for the whole of 1999 and January 2000 have been updated with values from the quality controlled files, as the values received on the GTS appear to come from a different station. All of the incorrect values were displayed in red and did not appear in the downloadable text file.

(11 Feb 2009) The name of Bonapart Point has been changed to Bonaparte Point as this is the correct name.

(11 Feb 2009) The station height for Terra Nova Bay has been corrected from 81m to 92m.

(11 Feb 2009) It has been decided not to calculate AWS wind data from the GTS data as the sensors can sometimes freeze. Only values obtained from the quality controlled files provided by Madison will be displayed.

#144. Ryan, other than getting at the data backwards, can you figure out how Steig did a forward calculation of anomalies? In this sort of detection, I usually try short files and see if there are any clues.

It’s still kind of backwards, but when I removed from the raw data all points that didn’t lie on zero after calculating using the mode, and then recalculated anomalies using the data that was left, I get Steig’s anomalies exactly.
.
Prior to realizing the data changed between Steve’s archive and my download, I was mystified because it looked as though Steig used varying baselines. But when I used Steve’s archive, that problem disappeared (BAS has done a LOT of data changing over the past week apparently).
.
Personally, I think Steig removed some data points that looked like outliers and simply calculated the anomalies based on all available data. There’s also the possibility that he just used all the available data and there were quite a few changes between his data set and Steve’s. I think the former is probably more accurate based on which data points were removed. There’s probably some of the latter mixed in as well.
.
But in the end, once I used Steve’s archived data, I’m convinced Steig just calculated anomalies using all of the information available – minus any outliers he didn’t like.

Personally, I think Steig removed some data points that looked like outliers and simply calculated the anomalies based on all available data. There’s also the possibility that he just used all the available data and there were quite a few changes between his data set and Steve’s. I think the former is probably more accurate based on which data points were removed. There’s probably some of the latter mixed in as well.

I have worked out that the following points for which actual readings existed, at the time that Steve archived them, were not used by Steig. Once these points are removed in doing the spliced calculations, everything agrees to his reconstruction (subject in the case of some AWS and particular months of the year to constant differences for all years that actual data for that month has been used – see below).

The data points (months and actual mean temperatures) that do not appear to have been used by Steig are:

The huge concentration of deleted data points in December 2002 certainly seems to point in the case of that month to something Steig did rather than to any changes in the READER files before Steve archived them, but it could just be a processing error and lack of checking that is to blame rather than a deliberate decision to exclude the points. It does appear that almost all the missing December 2002 data points were below the December average for the station concerned, but I am not sure that the missing points are outliers.

As mentioned above, in some cases the amount that has to be deducted from each calendar month’s actuals to get the temperature anomaly per Steig’s reconstruction does not exactly equal the average up to December 2006 of that month’s temperature readings. The AWS affected are Butler Island, Cape Ross, GEO3, GF08, LGB35, Larsen Ice Shelf, Law Dome Summit, Limbert and Terra Nova Bay. Almost all the differences disappear if the averages for these AWS are calculated up to June 2008 (July 2008 for Cape Ross, October 2007 for Larsen Ice Shelf) instead of to December 2006, but a difference remains in 7 of the 756 cases. Very strange.
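The check described above amounts to recomputing each calendar month's mean over a chosen baseline window and subtracting it from the actuals. A minimal R sketch, assuming temperature, month and year vectors (the function and argument names are mine, not Nic L's):

```r
# Minimal anomaly sketch: each calendar month's mean over the baseline window
# (here, years up to base_end) is subtracted from the actuals.
anomalize <- function(temp, month, year, base_end = 2006) {
  clim <- sapply(1:12, function(m)
    mean(temp[month == m & year <= base_end], na.rm = TRUE))
  temp - clim[month]  # anomaly = actual minus that month's baseline mean
}
```

Changing `base_end` from 2006 to mid-2008 is the kind of baseline shift that makes most of the residual differences disappear.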

Further to my post #148, I have now ascertained from BAS that most of the December 2002 all-day monthly average temperatures were displayed in red and so did not appear in the text files until October(?) 2008, when quality controlled data became available. Therefore, it is unsurprising that they are missing from Steig’s reconstruction. Likewise, the missing late 2006 data points for Ferrell and Marble Point were probably not included in the text files when Steig obtained the data, as quality controlled files for them only became available in December 2008. Accordingly, contrary to my previous thinking, it appears that Steig was not responsible for any of these points being missing from his AWS reconstruction. However, BAS have confirmed that the earlier data I identified as missing were available for download in 2007.

#150. Steve, I’m afraid I have no idea why the Dec 2002 data wasn’t QCed until Oct 2008. That is in fact the last month for which QCed AWS data is stated to be available from the University of Wisconsin – Madison: see ftp://aws.ssec.wisc.edu/pub/aws/antrdr/readme.aupdates. AWS which Madison deals with seem to be listed at ftp://aws.ssec.wisc.edu/pub/aws/biglist.
I have checked the BAS READER HTML web pages and can confirm that (as of last Friday at least) there seems to be no QCed data (in black) yet for January 2003 to January 2007 inclusive for any of the stations on their list, with the exception of Ferrell (Nov & Dec 2006 and Jan 2007), Marble Point (Oct, Nov & Dec 2006 and Jan 2007) and Larsen Ice Shelf (Jan 2003 to Mar 2004 and Nov & Dec 2004). I have checked this back to the ftp directories at Madison, which are empty for 2005 and have only a few monthly data files for 2004 and 2006.
It is difficult to see how Steig could have downloaded his data in 2006, since it extends to December 2006. I suppose he could have downloaded data up to an earlier date during 2006, before the pre-2002 missing data points that I identified were in the READER text files, and then done another download in 2007 to get subsequent data up to December 2006, without replacing the data for earlier months that he had downloaded in 2006. Or maybe the missing pre-2002 data points were not after all added to the READER text files until later in 2007 than when Steig downloaded his data.