Station Data

There is a vast amount of available station data. Versions have usually been adjusted and care has to be taken to be sure exactly which version has been used. Categories of information that I’ve tried to link here include both data and metadata. This is for orientation only. This does not include information on SST estimates or gridcell estimates.

Station Data in the Major Gridcell Composites
There are three major global indices of temperatures that incorporate station data: CRU, GISS and NOAA. Each of these groups primarily relies on the GHCN (Global Historical Climatology Network) for their input data. GHCN has two versions: 1 and 2. Each version contains max, min, mean and adjusted mean. A large proportion of the GHCN network is composed of the USHCN (US Historical Climatology Network). The USHCN network has two versions: 1 and 2 (not yet released), which do not coincide with GHCN versions. USHCN version 1 has raw, time-of-observation adjusted, adjusted and urban adjusted variations. Daily information is available for a subset of GHCN. Identification numbers are not consistent between USHCN and GHCN (and elsewhere). I know of no official concordance of USHCN and GHCN identifications and have archived my own.

Look at the directory for a list of available versions. There is a medium-sized zipped data file that is updated all the time (my present download – June 20, 2007) ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z

GHCN carries out their own adjustments: ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean_adj.Z – also updated all the time. About 25% smaller in size than the raw data.

GHCN identification numbers have 12 digits: the first 3 are the country code, the next 5 the nearby WMO station number and the next 3 a station identifier. All of the first 11 digits are needed to identify a station. This 11-digit code links back to the Station Inventory File. The 12th digit is the “duplicate number”, since the GHCN raw archive keeps versions that are scribally distinct.
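As an illustration of the identifier layout just described, here is a minimal Python sketch that splits a 12-digit identifier into its parts. The names `GhcnId` and `parse_ghcn_id` are hypothetical, and the example identifier is made up rather than taken from the inventory file.

```python
from typing import NamedTuple


class GhcnId(NamedTuple):
    country: str    # first 3 digits: country code
    wmo: str        # next 5 digits: nearby WMO station number
    station: str    # next 3 digits: station identifier
    duplicate: str  # 12th digit: duplicate number

    @property
    def station_key(self) -> str:
        """The 11-digit code that links back to the station inventory file."""
        return self.country + self.wmo + self.station


def parse_ghcn_id(raw: str) -> GhcnId:
    """Split a 12-digit GHCN identifier into its four fields."""
    if len(raw) != 12 or not raw.isdigit():
        raise ValueError(f"expected 12 digits, got {raw!r}")
    return GhcnId(raw[0:3], raw[3:8], raw[8:11], raw[11])


# Made-up identifier for illustration only, not a real station.
example = parse_ghcn_id("123456780001")
print(example.station_key, example.duplicate)
```

Grouping records by `station_key` brings together all the duplicates of one station, which is how the scribal variants can be compared side by side.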

Data: There are 3 GISS datasets: 0 – raw; 1 – combined; 2 – adjusted.
dset=0 contains multiple versions and for the most part, this information seems to match GHCN versions, including the duplicate number. As noted above, different versions of one station may only have scribal variations.
dset=1 is the “combined” time series at one station. Very occasionally, there are two dset=1 versions for one station. The form of “combining” records is very idiosyncratic, involving pervasive adjustments; there is a lengthy discussion at climateaudit on the topic. For USHCN stations, the “raw” GISS version is generally similar to the USHCN adjusted (FILNET) version.
dset=2 – “adjusted” using the GISS adjustment, which supposedly adjusts the trend of urban stations to the trend of rural stations within 1000 km.

There is no comprehensive version of the data. Each individual time series can be scraped from the GISS website: http://data.giss.nasa.gov/gistemp/station_data/ . It is VERY time-consuming to scrape the entire data set: I’ve done so on a high-speed cable connection and the downloading of 16 (net) MB of dset=0 data took nearly 24 hours.

27 Comments

Is there anywhere I can find the true raw temperature data, what proxies (and/or actual stations) are used for what years, and a taxonomy of the various datasets? What I’m trying to do is find out how many different historical sets of temperature data there are and what organizations archive and rely on what data, and what is the basis for that data. I’m aware that there are thousands of temperature stations all over the world; are there multiple organizations that collect that data, or is it collected and disseminated by only one organization? Assuming that more than one organization has this data, where is it kept, and is there a detailed audit trail of what the true raw readings were, and how and by what algorithms they may have been adjusted for movement of the station, changing environmental factors, etc.?

I apologize if this is OT or there is some simple place on this site where this is located, but I have looked at length to no avail.

Maybe what I’m looking for, at least for the raw data, is the GHCN data?

Sorry for all the questions, but what is the relationship between the GHCN data and the CRU, GISS and NOAA “indices”?

Thanks again!
Allen

P. S. What prompted this is a discussion with someone who claimed there were “thousands” of data sets that all showed warming over the last hundred years, and I took issue with that. Then, later, I saw the same claim in the comments section of a newspaper article on climategate. I was under the impression there were only one or two datasets that all the other researchers used, and that, if they were corrupted, it would have very far-reaching ramifications.

All the raw data I’m finding is in text form. How the hell do I convert it to a form suitable for use in R?

I can work out the obvious, very tedious way, but is there another, please?
Steve: I’ve posted utilities for converting station data to R. See climateaudit.info/scripts. Which station data are you looking at? PS – I’ve converted the Met Office dump into a more organized R format and could place this online.
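For readers who want a starting point outside R, the following Python sketch parses one fixed-width record of the kind found in v2.mean. It is not one of the utilities mentioned above. It assumes a layout of a 12-character identifier, a 4-character year, and twelve 5-character monthly values in tenths of a degree C, with -9999 marking a missing month; check that layout against the documentation in the GHCN directory before relying on it. The function name `parse_v2_line` and the sample record are made up.

```python
from typing import List, Optional, Tuple

MISSING = -9999  # assumed missing-value flag


def parse_v2_line(line: str) -> Tuple[str, int, List[Optional[float]]]:
    """Parse one fixed-width record into (id, year, 12 monthly values in deg C)."""
    station_id = line[0:12]
    year = int(line[12:16])
    values: List[Optional[float]] = []
    for month in range(12):
        start = 16 + 5 * month
        raw = int(line[start:start + 5])  # tenths of a degree (assumed)
        values.append(None if raw == MISSING else raw / 10.0)
    return station_id, year, values


# Made-up record for illustration only.
monthly = [12, 25, 61, 98, 134, 167, 189, 181, 150, 104, 58, -9999]
sample = "123456780001" + "1950" + "".join(f"{v:5d}" for v in monthly)

sid, year, temps = parse_v2_line(sample)
print(sid, year, temps)
```

Rows parsed this way can be written out as CSV and then loaded in R with `read.csv`.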

I have downloaded the Jones et al Model data for Oxford UK from “http://image.guardian.co.uk/sys-files/Guardian/documents/2009/12/08/uk.csv” (an extract from “http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/All.zip”) and compared it to the Actual temperatures as recorded at “http://www.metoffice.gov.uk/climate/uk/stationdata/oxforddata.txt” for the relevant dates, 1900-1980.

I have calculated the tMonthlyMean values from the Actual data and compared them to the Model tMonthlyMean figures on a month-by-month basis. These mostly show a ±0.05 difference (presumably due to rounding to the published 0.1 degree values, either by me or by others).

Can anyone tell me why, then, the last few years of the Model data (1978-1980) differ so widely from the Actual recorded temperatures? The Model is out by up to 2.2 degrees C, and shows an average of 0.36 degrees C of warming compared to the Actual temperatures, for just these last few years.

I read somewhere recently (unfortunately I cannot remember where) that the reason the temperature figures are changed or updated is that only thermometers that have been in existence for 20 years are taken into account initially; then, as others reach 20 years, they are added into the measurements, along with their readings going back 20 years.
Firstly, is this correct?
Secondly, if this is so, how many of these thermometers in waiting are there, and where are they situated.
In view of the long term planning to hoodwink the public (in my view of course), I would not put it past some people to have placed thermometers in locations which will favour the increase.
This is also related to the demise of many thermometers in rural locations as already documented.
I look forward to some comment on this.

“In view of the long term planning to hoodwink the public (in my view of course), I would not put it past some people to have placed thermometers in locations which will favour the increase.”

It may be your view, but that doesn’t mean you don’t need evidence. My experience of climate scientists is that they couldn’t give a toss whether the public believes them or not – and nor should they. It’s good if your work turns out useful, but you don’t know that until you’ve done it. As to placing the thermometers – how would you choose sites that “favour the increase”? I’ve been a meteorologist for 30 years and I would have thought it was impossible. Also, thermometers are located to aid weather forecasting, usually up to 5 days ahead at most and mostly for aviation, military, or agricultural purposes. The use of the data by climatologists is incidental; they have no influence over the siting of most instruments, even in the UK Met Office there is no overlap.

The USHCN files for 1-2 and 3-5 station site ratings include 5 of the Pennsylvanian stations. On the survey results maps it looks as if most, if not all of them, were rated.

Is there a table of the surface station ratings (1-5) for all the surface stations rated? While I found the documentation for the surveys at http://surfacestations.org I was unable to locate such a table.

Jennifer,
You may already have this, and it’s not actually a table, but Menne has posted a map showing two groups of ratings (1-2 and 3-5) for all the surfacestation.org ratings that Anthony has released:

I should remark that these were only a preliminary and partial release, and that Mr. Watts is (justifiably, imho) bent out of shape at Menne’s precipitous use of this data.
Good luck in your explorations.
John Slayton
Azusa CA

I really appreciate the effort. When I first found those lists I figured that I had everything. But, if you compare the number of stations Watts, et al surveyed (look at their map) with the information over at NOAA it looks incomplete to me.

For now I am continuing my analysis without this information. It would have been so nice to run a full comparison, but that can be added in when I’m done with the rest.

While you, personally, may find it inconceivable, the climategate emails, WATTS observations and the diarizations on this blog suggest to ME that such a practice is at least negligent, and may be both deliberate and widespread.

I posted these over at WUWT & chiefio; some readers may find it useful.

——————————————————————-

Previous discussions (the ‘lost’ stations in Honolulu and Dutch Harbour) have already called attention to the limited intelligence of available data searches. Trivial errors lead to blind alleys. Type in ‘MCMILLIN’, instead of ‘MC MILLIN’, and MMS will simply report that it couldn’t find a match. The list of version 2 stations available at:

Of course, I don’t really know which names are ‘correct,’ but MMS has more information, so I go with their names.

——————————————————————–

…and here’s the list of closed v2 stations. Many have been closed since the 90’s. One does wonder what motivates creating a new network incorporating stations that have been closed ten or fifteen years?

Steve,
It seems to be well established that the number of stations reported at GHCN is dropping (Peterson & Vose 1997, 1998 recently updated by D’Aleo & Watts).

I used to earn a living collecting energetic photons (up to 170 MeV). It would never have occurred to me to discard 80% of the data before starting the analysis. Is there any reasonable explanation for doing so in “Climate Science”?

Not all data sets are created equal. The GHCN keeps the short records. That is then fed into GIStemp, which tosses out anything shorter than 20 years (in the STEP2 processing just prior to the PApars.f program). So depending on where you stick your ladle in the river you get different results. IMHO the more “upriver” from the effluent producers the better 😉

@Allan Williams: Even the “GHCN UN-Adjusted” data are rather adjusted. I’ve yet to find actual “raw” sources. The “QA process” can fabricate data based on averages of nearby stations and can change the values of items.

@gallopingcamel: Yes, the GHCN set has dropped from over 6000 at peak activity to about 1200 now. Though in the last couple of years they have started adding stations back in (not clear if they are adding warming stations… but it looks like it.)

@ThosTHos: how to pick warming vs cooling sites? Easy. Very easy. First off you could just pick places with planned economic growth. Like, oh, Airports… I’m sure it’s just an accident that 92% of active thermometers are now at airports in France (and a similar number in the USA) for the GHCN data set… Furthermore, as the GHCN is a “historic creation” (as the apologists for the station dropping like to point out – dodging the point that it’s still a CHOICE what goes in…) it’s easy to run a code like “first differences” on the station data and find what’s warming and what’s not. I’ve just done this for GHCN for the entire world. Took me about a month, but I was only working on it part time:

The code I wrote for this is called dT/dt and it’s maybe 2 dozen lines long of active code. Oddly, I found many cooling countries. More oddly, most of them have their data end prior to today… In Southern Africa we have a very unphysical warming in one country that persists to today, but it is surrounded by countries where the falling or flat series end. They, of course, will be “filled in” from the one that was rising in codes like GIStemp.
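To make the “first differences” idea concrete, here is a minimal Python sketch. It is not the dT/dt code described above, which is not reproduced here. It assumes the simplest reading of the method: for one calendar month, subtract the previous year’s value from each year’s value and skip any pair that touches a gap. The function names and the sample series are made up.

```python
from typing import Dict, Optional


def first_differences(series: Dict[int, Optional[float]]) -> Dict[int, float]:
    """Return value[year] - value[year - 1] for every year where both exist."""
    diffs: Dict[int, float] = {}
    for year in sorted(series):
        current = series[year]
        previous = series.get(year - 1)
        if current is not None and previous is not None:
            diffs[year] = current - previous
    return diffs


def mean_change_per_year(series: Dict[int, Optional[float]]) -> Optional[float]:
    """Average of the available first differences, or None if there are none."""
    diffs = first_differences(series)
    if not diffs:
        return None
    return sum(diffs.values()) / len(diffs)


# Made-up July means (deg C) for a hypothetical station, with one missing year.
july = {1950: 20.0, 1951: 20.5, 1952: None, 1953: 20.0, 1954: 21.0}
print(first_differences(july))
print(mean_change_per_year(july))
```

Because only year-to-year changes are used, stations with different absolute levels can be averaged together without first computing anomalies. The price is that every gap discards the differences on either side of it.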

I asked on WUWT when the measurement of temperatures was switched from Tave = (Tmax+Tmin)/2 to Tmean taken directly from the area under the curve, since instruments can now measure continuously. I was told that this had not been done as it was unnecessary since, over a period, Tave-Tmean->0. I was also told that it was the difference between the trends in Tave and Tmean that mattered, not their absolute values, and that this difference was always zero. It was added that the work proving this had been done many times. So I asked for the reference to a single paper where this had been shown. I have had no reply. Can you refer me to any paper where these assertions have been shown to be correct, please?
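The two definitions can be compared on a synthetic day. The Python sketch below uses a made-up asymmetric daily temperature curve, not observed data, and computes Tave from the extremes and Tmean as the average of one-minute samples, which approximates the area under the curve divided by the period. It shows only that the two numbers can differ for a single day when the cycle is asymmetric. It says nothing about whether their trends differ, which is the question asked above.

```python
import math
from typing import Tuple


def diurnal_temperature(hour: float) -> float:
    """Made-up asymmetric daily cycle in deg C (illustrative only)."""
    phase = 2 * math.pi * (hour - 9) / 24
    return 15.0 + 5.0 * math.sin(phase) + 1.5 * math.cos(2 * phase)


def compare_daily_means() -> Tuple[float, float]:
    """Return (Tave, Tmean) for one synthetic day sampled every minute."""
    samples = [diurnal_temperature(m / 60) for m in range(24 * 60)]
    t_ave = (max(samples) + min(samples)) / 2  # (Tmax + Tmin) / 2
    t_mean = sum(samples) / len(samples)       # approximates the integral mean
    return t_ave, t_mean


t_ave, t_mean = compare_daily_means()
print(f"Tave = {t_ave:.2f}, Tmean = {t_mean:.2f}, difference = {t_ave - t_mean:.2f}")
```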

Is there a thorough history (current measurement conditions, discontinuities of metrology, local site changes) of every site, preferably with reference to appropriate local (or similar-to-local) climate studies on the thermal history distortions? Or even guidance on what to consider with record holes?

That’s what people really need in order to correct the temperature, not what GHCN and all the rest do, which is lazy maths.

Like Max and Min temperatures tell the whole story - there are temperature inversions.

Is there a thorough history of any site? Overly cynical, but the point survives the exaggeration. Example: Clayton, New Mexico. I drove up there this week and took some nice pictures of the present (AWOS) installation. To try to build a history, I contacted the local history museum. Incredibly, they have saved the old Stevenson screen for a display in the museum building. One of the curators remembered exactly where the screen had been positioned in front of the airport office, so I went back to take pictures of that location. The museum curators put me in touch with the surviving spouse of a man who had been the weather observer for many years. She, in turn, told me of a different well-defined location where the instruments had been when the station was first moved to the airport. So it was back to the airport to get pictures of that location. Somewhere along the line somebody pointed out that the station had been on top of the local gas company office (1930-1937). Yes, that building apparently survives, so it’s downtown to get pictures of that location.

By now I have pictures of 4 separate locations, and still do not have a complete record. I do not even know how to adjust for what is known. It’s a safe bet that the roof of a downtown office building was warmer than the entrance gate to the small airport. But how much warmer?

The existence of century-old weather data was at first beguiling. Just adjust for knowable biases, and use statistical treatment over a number of stations to tease a signal out of the background noise. But after looking at a good number of these stations, my personal conclusion is that it’s an impossible project. They have almost all been moved multiple times and it’s impossible to reconstruct previous environments.

Of course, there are other reasons to document these stations. The COOP system is a remarkable success story of volunteers who work in the public interest. Many of their personal histories are fascinating, particularly those of the 19th century. Somebody should write a book. : > )