Y2K Re-Visited

Long-time CA visitors will recall the events in mid Sept 2007 when NASA GISS made abrupt changes to US historical temperature data without annotation – a month after the Y2K changes. Some fresh light has been shed on these events by the NASA FOI. At the time, I observed:

no wonder Hansen can’t joust with jesters, when he’s so busy adjusting his adjustments.

On Sep 12, Jerry Brennan wrote in reporting puzzling changes over the previous few days in the station history for Detroit Lakes MN, which we’d been using as a test case. I posted up the following illustration of the changes with the commentary shown in the caption:

Figure 1. Original Commentary: Jerry Brennan observed today that Hansen appeared to have already “moved on”, noticing apparent changes in Detroit Lakes and a couple of other sites. Here is a comparison of the Detroit Lakes (combined) as downloaded today, compared to the version downloaded less than 3 weeks ago. As you see, Detroit Lakes became about 0.5 deg C colder in the first part of the 20th century, as compared to the data from a couple of weeks ago.

When I noticed this, I sent an email to NASA notifying them of the effect and asking about it – see here.

A little later, again based on a suggestion from Jerry Brennan, I postulated that the most recent Hansen adjustment to his adjustments came from changing the provenance of his USHCN version once again, with the remarkable corollary that “the temperature increase in Boulder since the 1980s is about 0.5 deg more than they believed only a couple of weeks ago”:

It looks like this is the reason for the conundrum observed in my last post. I never thought of checking to see if Hansen had altered early 20th century values for Detroit Lakes MN between August 25 and Sept 10. It’s hard to keep up with NASA adjusters. As noted previously, no wonder Hansen can’t joust with jesters, when he’s so busy adjusting his adjustments.

As a result of revisions made within the last 2 weeks, NASA now believes that the temperature increase in Boulder since the 1980s is about 0.5 deg more than they believed only a couple of weeks ago. Boulder is the home of IPCC Working Group 1, the site of UCAR’s world headquarters, NCAR’s site and home to hundreds, if not thousands of climate scientists. You’d think that they’d have known the temperature in Boulder in the early 1980s to within 0.5 degree. I guess not.

When Hansen capitulated to pressure to release GISS code, I commented here on what I believed to be the relevant interest in temperature records – a comment that seems apt today ( in a CRU context):

Personally, as I’ve said on many occasions, I have little doubt that the late 20th century was warmer than the 19th century. At present, I’m intrigued by the question as to how we know that it’s warmer now than in the 1930s. It seems plausible to me that it is. But how do we know that it is? And why should any scientist think that answering such a question is a “hassle”?

In my first post on the matter, I suggested that Hansen’s most appropriate response was to make his code available promptly and cordially. Since a somewhat embarrassing error had already been identified, I thought that it would be difficult for NASA to completely stonewall, regardless of Hansen’s own wishes in the matter. (I hadn’t started an FOI but was going to do so.) Had Hansen done so, if he wished, he could then have included an expression of confidence that the rest of the code did not include material defects. Now he’s had to disclose the code anyway and has done so in a rather graceless way.

I also posed the following small puzzle in respect to the temperature records:

If Hansen says that South America and Africa don’t matter to “global” and thus presumably to Southern Hemisphere temperature change, then it makes one wonder all the more: what does matter?

Steve is still having fun with step0, but eventually he’ll find that we did not document how we determined the US brightness numbers we use in our homogenization.

Checking that step, I noticed that the program that reads Stutzer’s file is machine dependent, because some stations lie on the edge between 2 cells. If in these cases I use the brighter of the 2 cells, the choices become more robust. 8 of the about 400 rural stations would become peri-urban, and since we don’t distinguish between urban and peri-urban stations, these would be the only effective changes.

I looked at the US annual mean series in both cases: they differ by less than 0.003 deg C except between 1880 and 1900; in 1900 the difference new-old is -0.005 deg C, in 1880 +0.014 deg C.

I would prefer to add the newer version into the sources. I’d also like to take the text out of the tar file and make it available separately – this is probably the only thing that any normal person might be interested in. The big brightness file should probably also be stored separately for the convenience of those who would just want to look at the programs.

By the way, Steve’s newest “discovery” about the Detroit file is simply due to our switch from USHCN-1999 to USHCN-2005. He also claims (not in any email, just in his blogs – so it may not be true) that USHCN-2006 is also available; I know that NOAA has it but I can’t find it on the USHCN site.
Reto
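The edge-cell rule Reto describes – resolving a station that falls on the boundary between two brightness cells by taking the brighter of the two, so the rural/peri-urban classification no longer depends on machine-specific rounding – can be sketched roughly as follows. This is a hypothetical illustration, not the actual GISTEMP step0 code; the threshold value and function names are invented for the example.

```python
# Hypothetical sketch of the "brighter of two cells" rule: when a station
# sits on the boundary between two brightness-index cells, take the larger
# (brighter) value so the classification is deterministic across machines.

RURAL_MAX = 10  # hypothetical brightness threshold for "rural"

def brightness_for_station(cell_values):
    """cell_values: brightness indices of the 1 or 2 cells the station touches."""
    # Choosing the brighter cell removes the machine dependence.
    return max(cell_values)

def classify(cell_values):
    return "rural" if brightness_for_station(cell_values) <= RURAL_MAX else "peri-urban"

# A station squarely inside one cell, and one on an edge between two cells:
print(classify([3]))       # rural
print(classify([8, 12]))   # peri-urban: the brighter cell (12) decides
```

Under such a rule, an edge station whose dimmer cell reads "rural" but whose brighter cell reads "peri-urban" is consistently reclassified – which is exactly the handful of stations (8 of about 400) Reto reports changing.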

Hansen replied:

On 9/12/07, James Hansen wrote:

We should make the explanation about switching to newer USHCN available. But I don’t think that changes should be made that affect the results, even at the 0.00x level, unless/until we discuss it when I am in the office. A machine dependence is not something that we can be assaulted for, but changing the analysis in response to McIntyre adds fuel to the bonfire of his vanities. Jim

Later in the day, Hansen added:

Got Makiko’s phone message. I agree that we need to add a dated statement to a list each time we change the analysis procedure or input files that are used. This can be a very brief statement. In the present case a simple statement to the effect that we switched to XXX on yyy. If WMO or NCDC changes the data in a file, that is not our change, but when we make a change it should be noted.
Jim

The day closed with NASA deciding to post up a backdated change notice, the backdate being prior to my observation of the change, making it look like I’d failed to read what was on their website:

*** What’s New ***
(a) Sept. 10, 2007: The use of the USHCN station records was extended to 2005 from 1999.
(b) Please see “A Light On Upstairs?” (Aug. 10), “The Real Deal: Usufruct & the Gorilla” (Aug. 16), and ““Peak Oil” Paper Revised and Temperature Analysis Code” (Sept. 7) for discussions regarding the changes made on August 7, 2007 for 2000-2006 U.S. mean temperatures.
” to my web page called “graphs”, but not on the GISS temperature home page. Are my words OK?
Makiko

September 2007: The year 2000 version of USHCN data was replaced by the current version (with data through 2005). In this newer version, NOAA removed or corrected a number of station records before year 2000. Since these changes included most of the records that failed our quality control checks, we no longer remove any USHCN records. The effect of station removal on analyzed global temperature is very small, as shown by graphs and maps available here.

20 Comments

Got Makiko’s phone message. I agree that we need to add a dated statement to a list each time we change the analysis procedure or input files that are used. This can be a very brief statement. In the present case a simple statement to the effect that we switched to XXX on yyy. If WMO or NCDC changes the data in a file, that is not our change, but when we make a change it should be noted.
Jim

This is called version control and also has elements of source control in it.

SO NASA HAS DISCOVERED VERSION CONTROL!!!!!!!

I wonder how long it will be before these people discover the telephone and sliced bread. This is absolutely unbelievable. These people expect to be able to advise the world. They don’t even know how to manage their own operations.

OODT underlies both the NASA Planetary Data System (PDS) and — as described in the video — several medical-research data archives.

One requirement the presenter attributed to PDS is that, although new versions of data are frequently added, no version of archived data may ever be deleted.

I suspect there’s a History of Science doctoral dissertation that needs to be written, to dig up and document exactly how the evidently slipshod Climate Science data-archiving culture and the obsessive-compulsive data-archiving culture on the Planetary Science side diverged. I suspect it has something to do with the close association of Planetary Science mission scientists with the notoriously obsessive-compulsive aerospace engineers who build the hardware for space missions.

” The best security against revolution is in constant correction of abuses and the introduction of needed improvements. It is the neglect of timely repair that makes rebuilding necessary” Richard Whately (Prelate & theologian).

Incompetence would seem to be the plausible explanation for each of these errors taken individually. But at a more philosophical level – why is it that there is an ongoing need and/or desire to adjust the past? This seems to be a failure of scientific logic. Similarly – why do the preponderance of these errors point in the direction of recent warming, and why do they typically emerge only after investigation by inquisitive people outside the ‘Team’?

The Titanic is not a sensible analogy here. People died because of so many vanities:

The vanity of the engineer thinking it was unsinkable and so did not require a lifeboat for each passenger,
The vanity of the company wanting maximum speed,
The vanity of humanity assuming the rich deserve better treatment than the poor.

The vanity of the modelers thinking their models accurately represent reality,
The vanity of the reconstructors to think no one would check their work,
The vanity of the clique to think they could control peer review.

If I understand it correctly, version control is the record that something has changed – there is a new version. Revision control is the detail of what that change was. Were both not absent from Hansen’s ever-dynamic polemic?
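The distinction the commenter draws – a dated record *that* something changed versus a record of *what* changed – can be illustrated with a minimal sketch. This is a toy example only (any real project would use a tool like git or Subversion); the class and its names are invented for illustration.

```python
import difflib
from datetime import date

class VersionedText:
    """Toy sketch: keep every version of a text plus a dated note and a diff."""

    def __init__(self, text, note):
        self.history = [(date.today().isoformat(), note, text)]

    def update(self, new_text, note):
        _, _, old_text = self.history[-1]
        self.history.append((date.today().isoformat(), note, new_text))
        # "Revision control": the detail of what changed, as a unified diff.
        return list(difflib.unified_diff(
            old_text.splitlines(), new_text.splitlines(), lineterm=""))

    def changelog(self):
        # "Version control": the dated record that something changed.
        return [(d, note) for d, note, _ in self.history]

doc = VersionedText("USHCN-1999 input", "initial analysis inputs")
diff = doc.update("USHCN-2005 input", "switched USHCN source file")
print(doc.changelog())     # dated statements, as Hansen proposed
print("\n".join(diff))     # the actual change, line by line
```

The `changelog` output is essentially the "brief dated statement" Hansen's email describes; the diff is the extra detail that would have answered the Detroit Lakes question immediately.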

As someone who in auto-manufacturing had to put up with capability studies, gauge R&R, calibration traceability, control charting, etc., ad nauseam, can I just ask the default question: why is it ever necessary to adjust instrument data if it’s believed that the device is working properly, is in calibration and is being operated with a sound measurement method? I can understand that the results might be classed as the result of an assignable cause. I can understand that you might check the gauge, decide it’s gone wonky or the method is unreliable, and strike the results from calculation. But I don’t quite get why they should be altered… Am I being obtuse here? Surely the results are the results?

Not that I entirely approve of the tactic, but the idea is that when you’re making local measurements of a larger climatic phenomenon, you’re necessarily taking a sample at a point where even a slightly larger local averaging could produce results that differ from it. The homogenization techniques are designed, in theory, to let you estimate the average local temperature, using things like spatial averaging to attempt to isolate and repair both single-measurement outliers and trends not related to actual climate (siting and local environmental changes).
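As a toy illustration of the spatial-averaging idea (not GISS's actual homogenization, which is considerably more involved), a single station's reading can be compared against the average of its neighbours and flagged when it strays too far; the tolerance value here is invented for the example.

```python
def flag_outlier(station_value, neighbour_values, tolerance=2.0):
    """Toy spatial check: flag a reading that departs from the
    neighbourhood mean by more than `tolerance` degrees.
    Not GISS's actual method -- just the general idea."""
    neighbourhood_mean = sum(neighbour_values) / len(neighbour_values)
    return abs(station_value - neighbourhood_mean) > tolerance

# A reading of 14.1 C against neighbours near 13-14 C is plausible...
print(flag_outlier(14.1, [13.2, 13.8, 14.0]))  # False
# ...but 19.5 C against the same neighbours looks like a local artifact.
print(flag_outlier(19.5, [13.2, 13.8, 14.0]))  # True
```

Whether a flagged value should then be *repaired* (replaced with an interpolated estimate) rather than simply excluded is exactly the question the commenters above are raising.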

That said, I have not been thrilled by what I have seen of the actual techniques employed, especially since it is very difficult to determine whether the applied adjustments are actually improving the quality of the data.

I can understand how the actual results from a group of stations might have weightings and inclusion/exclusion criteria applied to them in the course of generating a derived set of aggregate readings, say for a defined district or a geographical ‘box’, but I still don’t see why changing the original data is involved. I’d also expect any such aggregation process, and indeed any decision points to exclude ‘outliers’, to be based on a published specification detailing: how they estimated the ‘normal’ mean and variability of the climate at each station; which distribution was used to deal with rare events and why; how they would ascribe any particular set of readings as normal or as indicative of a change in mean or variability; and what the weighting/exclusion algorithms are and how well they are conditioned in response to changes in the data. Is all that available?

The other thing suggested is that what’s really needed is a self-calibrating network that isolates locations that are drifting away from “true”. Having been involved in building sensor nets for industrial processes, I would have to say it’s difficult enough to get working when it’s been designed for purpose with malice and forethought and you have some control over the operational environment. Trying to synthesise that result from a set of stations NOT designed for it by torturing the data isn’t likely to give reliable results.

I skimmed a recent issue of Scientific American containing an article regarding global warming research skeptics and how some researchers are taking them more seriously, or at least allowing them a voice.

The article states that several skeptic theories have been disproved, yet continue to circulate nonetheless. Naturally, the author doesn’t waste any ink telling us which theories those are, who disproved them, and whether there have been subsequent rebuttals by the skeptics.

I know you can’t read the mind of the author, but can you shed some light on what you think this author considers “disproved”?