Will the Real USHCN Data Set Please Stand Up?

The GISS homepage formerly said:

The NASA GISS Surface Temperature Analysis (GISTEMP) provides a measure of the changing global surface temperature with monthly resolution for the period since 1880, when a reasonably global distribution of meteorological stations was established. Input data for the analysis, collected by many national meteorological services around the world, is the unadjusted data of the Global Historical Climatology Network (Peterson and Vose, 1997 and 1998) except that the USHCN station records included were replaced by a later corrected version.

In econometrics, you couldn’t use loosey-goosey phrases like “replaced by a later corrected version.” You’d have to identify the version. Actually AGU policies (which apply to JGR) require proper data citation, although climate scientists publishing in AGU journals flout this policy, including Hansen here. Following an email to Hansen and Ruedy of GISS, they changed the GISTEMP introduction from the wording above to the following:

Input data for the analysis, collected by many national meteorological services around the world, is the unadjusted data of the Global Historical Climatology Network (Peterson and Vose, 1997 and 1998) except that the USHCN station records up to 1999 were replaced by a version of USHCN data with further corrections after an adjustment computed by comparing the common 1990-1999 period of the two data sets. (We wish to thank Stephen McIntyre for bringing to our attention that such an adjustment is necessary to prevent creating an artificial jump in year 2000.)
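The kind of overlap adjustment described in that paragraph can be sketched as follows. This is only a minimal illustration, not the GISS code, which has not been published: the function name, the choice of a simple mean difference over the common period, the decision to shift the older series rather than the newer one, and all of the numbers are my assumptions.

```python
from statistics import mean


def splice_with_offset(old, new, overlap_years, switch_year):
    """Splice two annual series (dicts of year -> temperature) at switch_year.

    Assumption: the older series is shifted by the mean difference over the
    overlap period so that the join does not introduce an artificial step.
    """
    common = [y for y in overlap_years if y in old and y in new]
    if not common:
        raise ValueError("no overlapping years to compute an offset")
    offset = mean(new[y] - old[y] for y in common)
    # Shifted old series before the switch, new series from the switch onward.
    spliced = {y: t + offset for y, t in old.items() if y < switch_year}
    spliced.update({y: t for y, t in new.items() if y >= switch_year})
    return spliced, offset


# Made-up numbers, purely illustrative: the "old" series runs 0.5 cooler.
old = {y: 10.0 for y in range(1990, 2000)}
new = {y: 10.5 for y in range(1990, 2003)}
spliced, offset = splice_with_offset(old, new, range(1990, 2000), 2000)
```

Without the offset, the made-up series above would jump by 0.5 at the year 2000 join, which is the sort of artificial step the revised wording refers to.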

This doesn’t really clarify the provenance of the data. In his email, Ruedy added:

In 2000, USHCN provided us with a file with corrections not contained in the GHCN data. Unlike the GHCN data, that product is not kept current on a regular basis. Hence we used (as you noticed) the GHCN data to extend those data in our further updates (2000-present)

Well, I hadn’t really “noticed” that they had used GHCN data to extend the USHCN data. I’ve done lots of cross-comparisons with different variations trying to identify exactly where the GHCN raw data came from. I’ve put up many plots at CA showing GISS raw as compared to different USHCN versions and I’ve done many more that I’ve not posted up. At this point, there’s one thing that we can say for sure: I’ve now looked at GISS raw as compared to GHCN raw data in the post-2000 period and in the few sites that I’ve examined since receiving this email (Detroit Lakes, Port Angeles), there is an exact match. So at least we’ve tracked down one aspect of the provenance of GISS raw data. I’ve done a comparison plot below for Detroit Lakes MN, a series that we’ve looked at before. After 2000, the match is exact (the delta is 0.0). However, before 2000, the USHCN TOBS/adjusted series sort of match for a while but the match tails off in the earlier portions, with the GISS raw version being quite different.
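For readers who want to repeat this sort of check, here is a minimal sketch of the delta comparison. The function name and the station values are invented for illustration; they are not the Detroit Lakes numbers, and reading the actual GISS and GHCN files is left out.

```python
def compare_series(a, b, tol=0.0):
    """Return (deltas, mismatched_years) over the years both series cover.

    a and b are dicts of year -> temperature. A year counts as a mismatch
    when the absolute difference exceeds tol (0.0 means an exact match).
    """
    years = sorted(set(a) & set(b))
    deltas = {y: a[y] - b[y] for y in years}
    mismatched = [y for y in years if abs(deltas[y]) > tol]
    return deltas, mismatched


# Hypothetical values: the two versions differ before 2000 and agree after.
giss = {1998: 4.1, 1999: 4.6, 2000: 5.2, 2001: 5.9}
ghcn = {1998: 4.4, 1999: 4.8, 2000: 5.2, 2001: 5.9}

deltas, mismatched = compare_series(giss, ghcn)
# True when every mismatch falls before 2000, i.e. the post-2000 match is exact.
post_2000_exact = all(y < 2000 for y in mismatched)
```

An exact match (delta of 0.0 in every year) is a much stronger statement about provenance than a close match, which is why the tolerance defaults to zero here.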

So where does the GISS raw version come from? At this stage, I think that we can declare that it doesn’t come from any of the USHCN version 2 series (raw, TOBS, adjusted). If it did, then the versions would match as exactly as the GHCN raw and GISS raw after 2000. So where does it come from? Another stupid climate science guessing game, although hopefully it will be solved in a shorter time than the unsolved MBH99 confidence intervals.

Ruedy’s letter has opened up the possibility of really obsolete data being used – I hadn’t thought to look at really obsolete data. I browsed through some obsolete data in connection with Swindle – who would have thought that we’d be doing so again. Three possibilities spring to mind: (1) maybe he’s using something from USHCN version 1; this is online, but I don’t think that the dates are right; (2) maybe there’s an earlier USHCN version 2 that’s been overwritten; (3) maybe he’s using an old GHCN version before 2000. Maybe none of the above – hey, it’s climate science.

23 Comments

Steve:
Great detective work. My questions are “why” the convolutions? “Why” the imprecision? “Why” the apparent gross ineptness? It could be simply sloppiness but, if I follow what you are saying, it appears to be too convenient. Hopefully I am not succumbing to conspiritis.

A wise old programmer once told me that when you find an error in your code, you’d better canvass that whole area, because there is probably another. If Hansen has any software development skill, he, or some of his minions, ought to be scouring that section of code.

Certainly now is not the time he wants to be forthcoming with his black box. However, now more than ever us mere mortals that are slaved to his results are justified to demand a peek at it.

RE2 Thanks but I didn’t do much. Don Kostuch is the “Guy” who did the Detroit Lakes site survey. I might add that Don has been a steady and dedicated producer of surveys, having single-handedly done most of the state of Minnesota.

Don and I just set the stage with the survey and posting, and Eli, et al. helped too by snarkily complaining enough to motivate Steve to canvass the data looking to explain the jump.

Congratulations on getting them to admit one error. Based on your comment in #5, it sounds like you are about to find another. This kind of thing proves the truth of what I have been saying for a long time – important science is being done here every day.

Steve, now that Hansen has acknowledged the error, has he revised the GISS global temperature results? It might explain some of the difference between HadCRUT3 and GISS in recent years, when GISS is all like “warmest year in the historical record, yo”, and HadCRUT3 is like NOT! …

Why are you putting off-topic stuff in here, when Steve is trying so hard to keep stuff restricted to the head post topic, when we have other UHI threads and an Unthreaded thread? You hit your head too hard in pit?

I think you should ask Hansen for a fee for correcting his work. It is after all a NASA product from which I presume they gain some revenue, so it’s only right that you should be given an auditor’s fee.

If one is forthcoming then you can claim that you are no longer (like the rest of us) in the pay of ‘Big oil’ but have been funded by the ‘consensus’ instead.

#10
Jerry:
I thought the point was that no one actually knows where the data comes from because the documentation is incomplete or non-existent? It would be great if you had more definitive information as to the origins of the “one time special version” of data under question.

In a paper which can be found via http://pubs.giss.nasa.gov/abstracts/2001/Hansen_etal.html Hansen et al discussed using USHCN adjusted numbers instead of the raw numbers they had previously used. However, the USHCN “adjustment” for missing data was not going to be used; GISS would use its own approach for missing data. It now seems that a special set of USHCN adjusted data, absent the missing data adjustment, was prepared for GISS as a one time service by the USHCN folks. That’s my current guess. YMMV

From the Hansen paper linked by JerryB above, we have the following excerpt explaining the data and adjustments used in constructing the GISS data set. From this description, I judge that GISS uses USHCN data without homogeneity adjustments and uses the USHCN metadata for making their own homogeneity adjustment. I am readily confused by all these inter-uses of data sets so I will list here my understanding of what USHCN does to their raw data and then what GISS uses for their own adjustments:

USHCN adjustments:

Areal Edit for outlier and suspect data.

TOBS for time of observation changes.

MMTS for liquid-in-glass thermometer bias.

SHAP for a Karl rendition of a homogeneity adjustment.

FilNET for filling in missing data.

Urban for urban warming.

From the excerpt below I understand that GISS does its own homogeneity adjustments (to urban sites only??) and quality control (like the Areal Edit that USHCN does ??) and applies its own urban warming adjustments (with the night light data).

I need help from Steve M and JerryB in understanding if my view of the data adjustments is correct. Also, Steve M, what do USHCN (and GISS) raw and adjusted imply when you are looking at differences?

If I can better understand the above process, perhaps I can become less confused about what it was that GISS did starting in the year 2000.

The source of the monthly mean station temperatures for the GISS analysis is the Global Historical Climatology Network (GHCN) of Peterson and Vose [1997] and updates, available electronically, from the National Climatic Data Center (NCDC). This is a compilation of 31 data sets, which include data from more than 7200 independent stations. One of the 31 data sets is the U.S. Historical Climatology Network (USHCN), which includes about 1200 stations in the United States. The USHCN [Karl et al., 1990; Easterling et al., 1996a] is composed of stations with nearly complete records in the 20th century and with metadata that aid homogeneity adjustments. The GISS analysis uses the version of the GHCN without homogeneity adjustments, as adjustments are carried out independently in the GISS analysis. The GISS adjustments consist of data quality control and a homogeneity adjustment applied to urban stations. The data quality control, including comparison of each station with its several nearest neighbors, is the same in the current GISS analysis as described by Hansen et al. [1999]. The urban adjustment is improved in the current GISS analysis. The urban adjustment of Hansen et al. [1999] consisted of a two-legged linear adjustment such that the linear trend of temperature before and after 1950 was the same as the mean trend of rural neighboring stations. In the new GISS analysis the hinge year is a variable chosen to be that which allows the adjusted urban record to fit the mean of its neighbors most precisely. The current GISS analysis also uses satellite measurements of nightlights to identify urban areas and remote stations in the United States (and southern Canada and northern Mexico); only “unlit” stations are used to define homogeneity adjustments. For USHCN stations the time-of-observation and station history adjustments of Karl et al. [1990] are applied before the urban adjustment is made.
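To make the “two-legged” adjustment with a variable hinge year a bit more concrete, here is a minimal sketch of one way such a fit could be done. It is my reading of the description, not the GISS implementation: I assume the quantity being fitted is the annual difference between an urban station and the mean of its rural neighbors, that the two legs join continuously at the hinge, and that “most precisely” means least squares. All names and the synthetic data are hypothetical.

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_two_legged(years, diff, hinge):
    """Least-squares fit of diff ~ a + b1*min(t-h, 0) + b2*max(t-h, 0).

    Returns ([a, b1, b2], sum of squared residuals) for hinge year h.
    """
    rows = [(1.0, min(t - hinge, 0.0), max(t - hinge, 0.0)) for t in years]
    # Normal equations (X'X) coef = X'y for the three-column design matrix.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * d for r, d in zip(rows, diff)) for i in range(3)]
    coef = solve3(A, b)
    sse = sum((d - sum(c * x for c, x in zip(coef, r))) ** 2
              for r, d in zip(rows, diff))
    return coef, sse


def best_hinge(years, diff, margin=5):
    """Pick the hinge year with the smallest residual, away from the ends."""
    candidates = years[margin:-margin]
    return min(candidates, key=lambda h: fit_two_legged(years, diff, h)[1])


# Synthetic urban-minus-rural difference: flat until 1960, then 0.02 deg/yr.
years = list(range(1900, 2001))
diff = [0.02 * max(t - 1960, 0) for t in years]
hinge = best_hinge(years, diff)
coef, sse = fit_two_legged(years, diff, hinge)
```

Fixing `hinge` at 1950 instead of searching would correspond to the Hansen et al. [1999] version as described in the excerpt. The `margin` parameter is my own addition to keep both legs from being empty.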

If I ever figure out how these adjustments are actually done, you can be sure that I’ll report it. The methodological descriptions are inadequate without being able to inspect the software and see the actual calculations. Also please keep in mind that NOAA, CRU, GISS, GHCN all have slightly different methods, but many things are in common. So no short answers, you’ll just have to stay tuned.

I’m still waiting for an explanation for the literally incredible adjustment made to the NYC Central Park raw data as contained in USHCNv1 and further aggravated by the GHCNv2 adjustments which were in the opposite direction. The net difference between the GHCNv2 and USHCNv1 data for NYC Central Park is as much as 11°F for annual means in the 1961-1990 period!

As stated in many earlier posts, the net result of these “adjustments” is to lower the NYC temps in the 1961-1990 period and then raise the temps as the adjustments are reduced post 1990. And guess what, you’ve created a mini hockey stick! This would be so even if the raw annual mean data from 1961-2006 were isothermal. So far, all I see is a blank when it comes to defending this absurd data by NCDC/GISS. Is this stonewalling?
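The arithmetic behind that claim is easy to check with a toy example. The sketch below uses invented numbers only: a perfectly flat raw series and a hypothetical adjustment of -1.0 degree through 1990 that is phased out linearly by 2006. It is not the actual Central Park adjustment, whose size and shape differ.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def adjustment(year):
    """Hypothetical adjustment: -1.0 through 1990, tapering to 0.0 by 2006."""
    if year <= 1990:
        return -1.0
    return -1.0 * (2006 - year) / (2006 - 1990)


years = list(range(1961, 2007))
raw = [12.0] * len(years)  # isothermal raw series (made up)
adjusted = [t + adjustment(y) for t, y in zip(raw, years)]

raw_trend = ols_slope(years, raw)        # zero by construction
adj_trend = ols_slope(years, adjusted)   # positive, from the adjustment alone
```

Any trend in `adjusted` comes entirely from the time-varying adjustment, since the raw series has none.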

I glanced at the current Port Gibson, MS GISS time series versus what I had filed at Anthony Watts’ site last August, simply to see if anything had changed. The August version, with markings, is here.

The loss of pre-1958 data (“Gone with The Wind”, like the rest of Port Gibson’s antebellum past) is disappointing but I suppose the data still exists somewhere. The thing that baffles me is the temperature changes in some of the years from 1958-1980 (I circled several examples).

Seems that by now the 60s and 70s would be settled. I don’t see a pattern or anything material and I don’t suspect anything untoward, but I do find it odd.

I checked several other towns nearby and saw changes in those, too, in scattered years in the 60s and 70s.

Has there been subsequent documentation of the logic (rules) used to make these adjustments? (The scope of documentation should extend to adjustments made to input data.)
Have the system specifications for these adjustments been made public?

Surely an audit is necessary. James Hansen is quoted:
“As we predicted last year, 2007 was warmer than 2006, continuing the strong warming trend of the past 30 years that has been confidently attributed to the effect of increasing human-made greenhouse gases,” said James Hansen, director of NASA GISS.

“It is unlikely that 2008 will be a year with truly exceptional global mean temperature,” said Hansen. “Barring a large volcanic eruption, a record global temperature clearly exceeding that of 2005 can be expected within the next few years, at the time of the next El Nino, because of the background warming trend attributable to continuing increases of greenhouse gases.”

Has anyone filed a demand under the Freedom Of Information Act for the adjustment rules, system specifications, parameters and test data / expected results that were used for the system(s) that made these adjustments?

Of special interest would be (#18) “The GISS adjustments consist of data quality control and a homogeneity adjustment applied to urban stations. The data quality control, including comparison of each station with its several nearest neighbors, is the same in the current GISS analysis as described by Hansen et al. [1999]. The urban adjustment is improved in the current GISS analysis. The urban adjustment of Hansen et al. [1999] consisted of a two-legged linear adjustment such that the linear trend of temperature before and after 1950 was the same as the mean trend of rural neighboring stations. In the new GISS analysis the hinge year is a variable chosen to be that which allows the adjusted urban record to fit the mean of its neighbors most precisely.”

Further in #18:
“… uses satellite measurements of nightlights to identify urban areas and remote stations in the United States (and southern Canada and northern Mexico); only unlit stations are used to define homogeneity adjustments.”

Anyone who has driven from a mall parking lot to a suburban home with trees knows that the external temperature drops several degrees.

To discriminate between urban and rural by “lit” and “unlit” probably involves a parameter on the amount of light observed. A poor choice of parameter could well “smear” the urban heat dome into the suburbs.
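As a toy illustration of how much hangs on that parameter, the sketch below classifies three made-up stations under two different brightness cutoffs. The station names, brightness values and thresholds are all hypothetical; the actual GISS nightlight criterion is not given in the excerpt.

```python
def classify(stations, unlit_max):
    """Label each station 'unlit' if its brightness is at or below unlit_max."""
    return {
        name: ("unlit" if brightness <= unlit_max else "lit")
        for name, brightness in stations.items()
    }


# Hypothetical satellite brightness values on an arbitrary scale.
stations = {"downtown": 180, "suburb": 35, "farm": 2}

strict = classify(stations, unlit_max=10)  # suburb counts as lit
loose = classify(stations, unlit_max=40)   # suburb counts as unlit
```

With the looser cutoff the suburban station joins the “unlit” reference set, which is the smearing effect described above.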

Further, where is a reference to Dr. Watts’ work on the quality of the climate monitoring stations that are part of the U.S. Historical Climatology Network (USHCN)?

Lastly, where is the reference to a time series of what is classified as urban, suburban and rural?

As for me, I worked over 40 years in systems, process and data architecture. This included quality assurance (prevention of error) and quality control (elimination of error).

One of the most essential aspects of this work was identification of stakeholders and their agendas.

To find broad changes to data, made by undocumented processes, followed by a statement “As we predicted last year….” gives me the willies.

[…] of the data fed into the system. NASA’s GISS Surface Temperature Analysis (GISTEMP) had to be corrected because it overstated the recorded temperatures. Reported warming suddenly disappeared. There […]