A First Look at the CRU Station List

On Sept 28, 2006, Willis Eschenbach sent an FOI request for CRU station data. A year later, after many letters, we still do not have the CRU data as used, but we do have a list of the stations used. The list is slightly shorter (4138 stations) than the 4349 stations said to have been used in Brohan et al 2006, their most recent publication. It looks as though they sanitized the list somewhat – in Brohan et al 2006, they said that they removed 55 duplicates. I guess that they’ve identified 156 more duplicates (4 times as many as reported in Brohan et al). But perhaps there’s another reason. In my opinion, they should have delivered a list of 4349 stations – I’ve asked Phil Jones for this.

Secondly, the list has not been delivered in working order: the data is supposed to be mostly (“98%”) derived from GHCN, but the identification numbers do not tie in precisely to GHCN numbers. The CRU ID numbers are 6 digits versus 11 digits at GHCN. Many of the CRU numbers tie in to GHCN numbers as follows: the GHCN number is in the form CCCWWWWWDDD, where CCC is the country code, WWWWW is the WMO number and DDD distinguishes stations sharing a WMO number – for example, nearby (but different) sites can have the same WMO number. GHCN DDD identifiers seldom get out of single digits. CRU identifications of the form WWWWWD tie to GHCN numbers for 1782 sites. As seen below, many more sites can be identified with GHCN sites, but not without a further concordance. CRU says that they have a look-up table, but failed to disclose it. I’ve requested it.
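The tie-in described above can be sketched in a few lines of Python. This is only an illustration, not the script actually used: the function name is hypothetical, and it assumes that the single D in the CRU form is the last digit of the GHCN DDD field, which is a guess that is only plausible because DDD seldom gets out of single digits.

```python
def cru_style_id(ghcn_id: str) -> str:
    """Build a 6-digit WWWWWD candidate from an 11-digit GHCN ID (CCCWWWWWDDD).

    Assumption (not confirmed by CRU): D is the last digit of DDD.
    """
    if len(ghcn_id) != 11 or not ghcn_id.isdigit():
        raise ValueError(f"expected an 11-digit GHCN ID, got {ghcn_id!r}")
    wmo = ghcn_id[3:8]        # WWWWW: the WMO number
    modifier = ghcn_id[8:]    # DDD: distinguishes stations sharing a WMO number
    return wmo + modifier[-1]


# The GHCN ID quoted in the comments for ROME:
print(cru_style_id("62316239001"))  # 162391
```

A CRU ID that equals one of these candidates would count as a first-pass match; anything else needs the further concordance.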

Thirdly, and somewhat unbelievably, the CRU identifications are not unique. In one case, there are 6 stations with an identical ID number. Perhaps there is some still undisclosed list and they’ve delivered it in non-working form for some reason of their own. If there is no proper list, then I have no idea how they can define a look-up table that functions for non-unique ID numbers. For my own attempt at a concordance, I’ve added a duplicate number for each group of sites with overlapping ID numbers so that the new ID number is unique. (I wasted considerable effort before I figured out that they had non-unique ID numbers – imagine.)
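For anyone wanting to reproduce the de-duplication step, here is a minimal sketch of one way to do it. The suffix format (a hyphen plus a running count) and the function name are illustrative assumptions; the IDs in the usage line are made up.

```python
from collections import Counter, defaultdict


def add_duplicate_numbers(ids):
    """Append a running number to every ID that occurs more than once.

    IDs that are already unique are returned unchanged.
    """
    totals = Counter(ids)
    seen = defaultdict(int)
    unique_ids = []
    for station_id in ids:
        if totals[station_id] > 1:
            seen[station_id] += 1
            unique_ids.append(f"{station_id}-{seen[station_id]}")
        else:
            unique_ids.append(station_id)
    return unique_ids


# Made-up IDs, for illustration only
print(add_duplicate_numbers(["720000", "720000", "100100"]))
# ['720000-1', '720000-2', '100100']
```

Because the original IDs are purely numeric, a hyphenated suffix cannot collide with an existing ID.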

After doing this, in order to make a concordance, for each CRU site unmatched in a first pass, I then selected GHCN stations that were within 1 degree of latitude and within 1 degree of longitude and whose names had the first 6 letters identical. If there was only one, I declared a match and assigned the GHCN number. This didn’t match as many sites as could be matched, but reduced the unmatched sites to about 354 sites.
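The second-pass rule can be written down compactly. The sketch below is an illustration under stated assumptions rather than the script actually used: the dictionary field names are hypothetical, "within 1 degree" is taken as inclusive, names are compared case-insensitively, and longitude wrap-around near the date line is ignored. The stations in the example are invented.

```python
def match_station(cru, ghcn_stations, tol=1.0, prefix_len=6):
    """Return the GHCN ID if exactly one GHCN station passes the test, else None.

    A candidate must lie within `tol` degrees of latitude and of longitude
    and share the first `prefix_len` letters of the station name.
    """
    prefix = cru["name"][:prefix_len].upper()
    candidates = [
        g for g in ghcn_stations
        if abs(g["lat"] - cru["lat"]) <= tol
        and abs(g["lon"] - cru["lon"]) <= tol
        and g["name"][:prefix_len].upper() == prefix
    ]
    # Only a unique candidate is declared a match; ties are left for manual review
    return candidates[0]["id"] if len(candidates) == 1 else None


# Invented stations, for illustration only
cru_site = {"name": "Exampleton", "lat": 10.0, "lon": 20.0}
ghcn = [
    {"id": "A", "name": "EXAMPLETON AP", "lat": 10.4, "lon": 20.3},
    {"id": "B", "name": "OTHERVILLE", "lat": 10.2, "lon": 19.9},
]
print(match_station(cru_site, ghcn))  # A
```

Returning None for both "no candidate" and "several candidates" mirrors the conservative rule in the text: anything ambiguous is left for inspection by eye.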

I then made an ASCII tab-separated table in which I wrote down the CRU station plus lat, long and altitude and GHCN stations within a degree plus the same information and manually inspected the stations. I probably could have figured another semi-automatic method of reducing it further, but I also wanted to inspect the matches. In many cases, there was a fairly obvious match with the previous method failing due to multiple candidates or spelling variations. In this way, I added 175 matches, getting up to 3959 matches (a little under 96%), leaving 179 unmatched.
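As a check on the bookkeeping, the counts quoted above reconcile as follows. The variable names are mine, and the 354 figure is the approximate count given in the text.

```python
total_cru = 4138             # stations in the delivered CRU list
unmatched_after_auto = 354   # approximate count left after the automatic passes
manual_matches = 175         # matches added by manual inspection

matched = total_cru - unmatched_after_auto + manual_matches
unmatched = total_cru - matched
matched_share = 100 * matched / total_cru

print(matched, unmatched, round(matched_share, 1))  # 3959 179 95.7
```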

Here are ASCII files listing all CRU stations together with proposed GHCN identifications (ID, name, lat, long shown from GHCN) and the unmatched list: All, Unmatched. These are ASCII tab-separated and can be opened in Excel or read in R.

Unmatched Stations

The distribution of unmatched stations is really very strange – and I add here, that, for each country specified below, I’ve double checked manually against the GHCN inventory to confirm that there was at least one unmatched CRU station from that country. In total, I identified 29 countries where there were CRU stations that were not present at GHCN, including surprisingly stations from Canada, Australia and even the U.S. Here is a list of countries with at least one station that is unmatched in GHCN:

Argentina – CRU had quite a few stations not at GHCN.
Australia – nearly all match, but two CRU stations didn’t match GHCN – Maryborough and Brisbane Airport. Why these?
Austria – quite a few stations not at GHCN
Bolivia – a couple didn’t match
Brazil – a couple didn’t match
Canada – quite a few stations not at GHCN. I noticed a duplicate GHCN for Parry Sound, which is near Toronto and which occurs in two alter egos in GHCN.
Chile – a couple didn’t match. A couple were called “UNKNOWN” in the CRU list. Perhaps they are connected to the UCAR “Bogus Stations”.
China – quite a few stations not at GHCN
Denmark – a couple didn’t match
Dominica – a couple didn’t match
Germany – one didn’t match
Finland – one (Kuopio) didn’t match
Greenland – possibly a couple didn’t match
Guinea – one didn’t match
Iran – one didn’t match
Ireland – one may not match (Phoenix Park)
Israel – a few don’t match
Italy – a couple don’t match
Kyrgyz republic – one doesn’t match
Netherlands – a couple don’t match
Norway – a few don’t match
Oceania – a few don’t match
Peru – one doesn’t match
Sweden – a few don’t match
Syria – several don’t match
Taiwan – quite a few don’t match
UK – a couple don’t match (Kirkwall, Wick)
USA – about 25 don’t match e.g. Moroni, Lahontan
Russia – a couple may not match

It is quite weird to see these oddball stations crop up at CRU. I’m sure we’ll quickly track down where Moroni and Lahontan and their ilk come from, but it doesn’t seem to be GHCN.

Prior Excuses

With these results in mind, let’s review the history of CRU excuses as to why they should not be required to disclose information under the FOI Act – and it’s taken slightly over a year and many letters and appeals to even get this station list. In their original refusal, CRU stated that the data was already located at GHCN, as follows:

Datasets named ds564.0 and ds570.0 can be found at The Climate & Global Dynamics Division (CGD) page of the Earth and Sun Systems Laboratory (ESSL) at the National Center for Atmospheric Research (NCAR) site at: http://www.cgd.ucar.edu/cas/tn404/ Between them, these two datasets have the data which the UEA Climate Research Unit (CRU) uses to derive the HadCRUT3 analysis. The latter, NCAR site holds the raw station data (including temperature, but other variables as well). The GHCN would give their set of station data (with adjustments for all the numerous problems). They both have a lot more data than the CRU have (in simple station number counts), but the extra are almost entirely within the USA. We have sent all our data to GHCN, so they do, in fact, possess all our data.

In accordance with S. 17 of the Freedom of Information Act 2000 this letter acts as a Refusal Notice, and the reasons for exemption are as stated below

In response to a further request trying to pin them down, they stated that “more than 98%” of CRU data was on these sites and that the remaining 2% was collected under confidentiality agreements.

Our estimate is that more than 98% of the CRU data are on these sites. The remaining 2% of data that is not in the websites consists of data CRU has collected from National Met Services (NMSs) in many countries of the world. In gaining access to these NMS data, we have signed agreements with many NMSs not to pass on the raw station data, but the NMSs concerned are happy for us to use the data in our gridding, and these station data are included in our gridded products, which are available from the CRU web site. These NMS-supplied data may only form a very small percentage of the database, but we have to respect their wishes and therefore this information would be exempt from disclosure under FOIA pursuant to s.41. The World Meteorological Organization has a list of all NMSs.

Obviously, none of this justified not providing a list of stations, but that has taken another 6 months. In connection with the supposed confidentiality agreements, as reported previously, Doug Keenan asked for the countries with which there were confidentiality agreements that restricted access and was told:

Dear Doug,
I have done some searching in files – all from the period 1990-1998. This is the time when we were in contact with a number of NMSs. We have also got datasets from fellow scientists and other institutes around the world. All supplied data (eventually and sometimes at cost), but we were asked not to pass on the raw data to third parties, but we could use the data to develop products (our datasets) and use the data in scientific papers. It is likely that some of the NMSs and Institutes have changed their policies now – and that the people we were corresponding with (all by regular mail or fax) are no longer there or are in different sections. The lists below don’t refer to all the stations within these countries, nor to all periods, but to some of the data for some of the time.
The NMSs
Germany, Bahrain, Oman, Algeria, Japan, Slovakia and Syria

These are the only ones I can find evidence for. I’m sure there were a few others during the 1980s, but we have moved buildings twice since 1980.

Not sure how you will use this data.
Phil Jones

Above I summarized the countries for which there are stations that are not matched at GHCN. Remarkably, these include virtually none of the countries where Jones said that they had received data subject to confidentiality agreements – so that the confidentiality agreement excuse cannot apply for any of these countries. And for each of the countries for which Jones said that there was a confidentiality agreement (Bahrain, Oman, Algeria, Japan, Slovakia, Mali, India, Pakistan, Poland, Indonesia, Zaire and Sudan), I was able to cross-identify all CRU stations with GHCN identifications so that the confidentiality excuse didn’t affect anything.

At this point, the only unmatched stations which would appear to be covered by a reported “confidentiality agreement” are about 6 stations in Syria (about half at GHCN) and one German station (Wahnsdorff). Otherwise there is no valid excuse for not disclosing this station data. Of course it is possible that Jones has confidentiality agreements with Canada and Australia, but was embarrassed to report them and thus omitted them in the above list. We’ll see.

It is disappointing that the pretexts for not providing the data previously have turned out to be untrue. However, it should now be possible to develop a reasonable concordance for the CRU stations to GHCN where applicable and to identify provenances for the oddball stations, so as to make a concordance up to a very small number of stations – at which point analysis can begin.

31 Comments

I know that the Brisbane airport was relocated from the suburb of Eagle Farm to Pinkenba (a few kilometres east, towards the sea) during the 1980s. Could that be a source of its falling through the net?

“BRISBANE Airport’s weather station recorded its lowest ever reading today with a low of minus 0.1 – one degree colder than the previous record.
The reading taken at 6.39am reflected lower than normal temperatures across the region.
The official Brisbane minimum was 3.8 degrees at 6.53am, but at the airport the mercury dropped to -0.1.
At Archerfield, in Brisbane’s west, the temperature hit 1.7 degrees but temperatures plunged dramatically further west.
At Amberley, the mercury hit minus 4.8 at 6.42am and at Applethorpe on the Granite Belt it was an icy minus 7.7.
The Brisbane Airport reading was taken at the new weather station there which has been in operation since 2000.
But the minus .1 minimum didn’t last long – it held for just three minutes.

The previous low of 0.9 degrees was taken at the old airport weather station which was in operation from 1929 until 2000.”

(For context, Archerfield is an abandoned WWII aerodrome near Amberley, which is a large military aerodrome built before WWII. Amberley 27.37 S, 152.41 E. At Brisbane airport, the old and new met stations are probably less than 7 km apart and at same altitudes, flat coastal swamp. Brisbane airport is about 10-15 km east – seaward- of Brisbane City. So temps to the East and West of Brisbane were 3-8 degrees lower than the city itself. Does UHI exist?)

I understand the frustration with the Station IDs!! I’m trying to put together a table for USHCN stations that ties together the many versions of WMO IDs and NCDC IDs. The DDD really bugs me because some metadata files only use one D, some use none.

I manually looked at the first four Norway stations and many ID numbers seem to start with 6 digit USAF DATSAV3 station IDs. Here is the GSOD DATSAV3 list. It is being used for daily data from stations. ftp://ftp.ncdc.noaa.gov/pub/data/gsod/ish-history.xls
It looks to be a rat’s nest for correlating though.
Here are the first four stations and how they compare. Already one ID seems to be mismatched.

I looked through your data folder and found details.dat, cru.info.dat, PO.dat. I already have most of this information in tables (in addition to some more). Is there something in your data folder that I’ve missed?

It’s interesting that all the non-unique IDs are for stations located in the US. I’m guessing that there was some kind of truncation when the values were taken from some other source. For reference, here’s the list I’ve compiled:

So GHCN has data from station 16242 which is not in their station list.
The third station they list 62316239001 ROME has the same WMO code of the previous one and is very very close to it. I think no station 62316239001 really exists near the other one 16239000, maybe a relocation inside the airport of Ciampino, but I don’t think so.
Moreover the second data set for station 16239 starts in 1811 and it is impossible that the Rome/Ciampino airport was there. I’m sure that data come from the old Meteorological Observatory in the centre of the city, which started to record met obs in the 18th century and is closer to the ROMA/URBE airport
The missing station in the GHCN list, 16242 ROMA/FIUMICINO, is the main Rome (and Italian) airport, far from the city and very close to the sea (altitude: 2 m asl)

Regarding Milan,
v2 has three data sets for station 16080 MILANO/LINATE, one of which starts in 1763.
Milano/Linate is the city airport and, of course, wasn’t there in 1763. Old data come from the city centre old astronomical observatory of BRERA.
I think CRU kept the two series as they are, i.e. different, while GHCN put them in the same station file.

As Hans Erren has found, some of the unmatched locations are in GHCN V1.

Following are those three, plus several others, some of which also seem to
be in GHCN V2, but with some differences. Pardon the format, but I needed
to rearrange it to ease the search. If the ID has ten digits, it’s from
GHCN V1; if eleven, it’s from GHCN V2; otherwise it is from the unmatched file.

That’s good spotting to look at the old GHCN. As you observe, most of these are not in GHCN v2 e.g. Hobbs, Arequipa, …

I picked up the Tamale match afterwards.

Penghu is definitely a match as is Milano/Linate as is Juzno, I’ll edit accordingly.

I noticed Thule and left it as uncertain because of the change in elevation – it looks like there might have been a relocation, but maybe not. I guess that it’s a probable match tho.

Is Brisbane Eagle Farm the Brisbane Apt?

OK, now that we’ve identified a number of these stations in GHCN v1, that raises the question of ongoing provenance. In one sense you’d have to say that the unavailability of the stations in GHCN v2 ends the provenance. However, as Bob Koss observed, the ID numbers match GSOD numbers. So I’ll bet that GSOD provenance is involved somehow.

The more I read, the more concerned I am at the US-centric methodology dominating treatments given to the rest of the world.

Indeed, it seems apparent that selective deletion of data, accidental or intentional, from sets is capable of inducing trends that do not exist in Nature.

Automated methods of removing stations and parts of data because they are too short or too old or have too much missing data, built into code designed for the USA, might not work at all well in ROW places like Australia and New Zealand where quite good data sets go back to the early 1900s in many cases.

John V was arguing in another thread for the deletion of a lot of early USA decades because of sparse data. This does not mean that ROW should be treated the same way.

The unfortunate consequence is that ROW climate analysts are at least tempted to reject (if not actually rejecting) large quantities of valuable data for spurious reasons. There is a risk that the Bureau of Meteorology will produce a shortened, “sanitised” set that people regard as step one when in fact it is quite adjusted.

The Brisbane airport case above could be an example. The airport was rebuilt about 1980 at Pinkenba, after having been at Eagle Farm since the 1920s. The weather station did not appear to have been moved until 2000 or so, as the 2 airports were very close together. So, if an automatic logic check can’t reconcile a site change of location name (lats/longs too) with a site change of instruments because they happened at different times, is this a reason to drop one out?

Note also that at airports and lighthouses there is a trend to house the instruments some distance above the ground, in towers, which can have a non-trivial effect on night time inversion temperatures.

Tell me more exactly what you want to know about Brisbane and I’ll find out. I used to fly out of there at least monthly.

BTW, Archerfield aerodrome does still exist as a civil aerodrome, but unlike Amberley it is surrounded by suburbs and has changed from rural to urban since WWII, with most of the change starting about 1960. My apologies for not making that clear. Climate records do exist for Brisbane Metro, Brisbane airport (both Eagle Farm and Pinkenba), Archerfield and Amberley and, most likely, about a dozen other sites within a 50 km radius of Brisbane. Many would go back to the early 1900s. Although their absolute temp values might be questioned, their trends remain a valuable resource.

There are further Australian problems. There are towns missing from the CRU data that are present in the Bureau of Meteorology official data, eg Cooma, built in 1955 or so for the Snowy Mountains Hydro scheme, south of Canberra. It flourishes today with some ten thousand people. I would expect it to have a record from 1955 to present, as it is an oasis in an area of sparse towns and works with the ski industry, who sometimes take temps.

There is Tennant Creek in the centre of the Northern Territory, part of the overland telegraph and record keeping from 1874; started gold mining about 1950s, pop about 2,000, again surrounded by few towns. Not on the CRU file.

These are just 2 towns that came to mind as missing when I did an eyeball through the CRU data. I suppose I could find, at this rate, at least 20 more, each with some form of importance (say to fill in holes in area coverage).

There are other missing or quizzical data. Newer Brisbane Airport (see above) was reclaimed from coastal swamps. The old airport was not much higher on land. The CRU data have no altitude figures. The Bureau gives an elevation of 41m, which sounds like the top of a control tower. Or wise provision for inevitable sea level rise.

The Bureau gives an elevation of 41m, which sounds like the top of a control tower. Or wise provision for inevitable sea level rise.

That’s very interesting, Geoff. There is absolutely no way that Brisbane Airport’s tarmac (and general ground) level is at 41m elevation. As you say, the whole area is built from reclaimed mangrove swamp. They may have built it up a few metres, but nothing like 40.

There are only two possible explanations: the BOM is wrong, or the station is on top of a structure. (A few months ago, I would have thought the latter suggestion was laughable. Now, after surfacestations’ documentation, it becomes surprisingly credible.) I’m not sure that Joe Public could gain access to confirm and document its siting, in this day and age of The War Against Terror.

The following matches were found via a file named cruwlda2.zip which
once was available at http://www.cru.uea.ac.uk/advance10k/climdata.htm
but now the comment says “This dataset is no longer available …”
However, I got a copy back when it was available, and so did Warwick. 🙂

I did a quick check of the US portion of the CRU list to find CRU stations in Ohio. To my surprise, there was only 1 overlap with the USHCN set, namely Wooster Experimental Station.

The others were the following urban locations: Cincinnati, Columbus, Dayton, Akron/Canton, Cleveland, Sandusky (not to be confused with the distant Upper Sandusky USHCN site), Youngstown, and Toledo. The Columbus station is the thriving Port Columbus International Airport. MMS actually gives its baggage code, CMH, as one of its identifiers! The MMS Map tab Satellite option shows a jet taxiing down a nearby, heavily blackened runway. Furthermore, CMH is downwind of the city proper.

I haven’t checked the other urban sites, but I suspect they are also airports. Out west, where there are fewer international airports per square mile, the frequency of airports may be less. But in ROW, I suspect there are a lot of airports.

So is CRU measuring global warming, or global air traffic?

PS: Anthony gives Wooster a 3, but I think that is rather harsh. There’s just a dinky brick building whose corner is ~15 m from the sensor. You have to search the NSEW photos to even find it. More anon, elsewhere, perhaps.

Steve identifies the “Sidney” CRU station (#72531100, 40.3, -84.2) with Charleston [IL], 39.48, -88.16 [GHCN 42572531001]. In fact, this would be Sidney, Shelby Co., OH (N. of Dayton on I 75, between Piqua and Wapakoneta). It is not clear, however, whether this would be Sidney 1S (COOP 337693, NCDC 20015400; 40.27056, -84.15056) or the nearby Sidney Highway Dept (COOP 337698, NCDC 20015406; 40.298330, -84.163330). In either event, this is neither an urban airport nor a USHCN station, unlike the other Ohio CRU stations.
