Monday, February 15, 2010

E.M.Smith had a post at WUWT dramatically entitled "NOAA langoliers eat another 1/3 of stations from GHCN database". What happened was that he looked at the v2.mean file at NOAA dated the 8th Feb, and found a whole lot of stations had no report for Jan 2010. They were deemed to be eaten, even though a whole lot more turned up that day. The simple fact is that GHCN adds data as they become available. It's a simple storage database, and there's no reason to do otherwise.

Anyway, I wrote a script to download and list the stations in the current v2.mean at NOAA. It lists them in order of date of most recent report (newest first). The list may be helpful, because it is also sublisted by country. The code (a Linux shell script which calls two R scripts) is here. The list is below the jump.
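For readers who would rather not run the shell and R scripts, here is a minimal Python sketch of the same idea. It is not the code linked above. It assumes the v2.mean fixed-width layout (an 11-character station ID, a 1-character duplicate number, a 4-digit year, then twelve 5-character monthly values with -9999 marking a missing month), and the function names are my own, chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

MISSING = -9999  # assumed missing-value flag in v2.mean
RECORD_LEN = 11 + 1 + 4 + 12 * 5  # station ID, duplicate, year, 12 monthly fields


def latest_reports(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each 11-character station ID to the (year, month) of its latest non-missing value."""
    latest: Dict[str, Tuple[int, int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if len(line) < RECORD_LEN:
            continue  # skip short or malformed records
        station = line[:11]      # the duplicate number in column 12 is ignored here
        year = int(line[12:16])
        for m in range(12):
            field = line[16 + 5 * m: 21 + 5 * m].strip()
            if not field or int(field) == MISSING:
                continue
            ym = (year, m + 1)
            if station not in latest or ym > latest[station]:
                latest[station] = ym
    return latest


def newest_first(latest: Dict[str, Tuple[int, int]]) -> List[Tuple[str, Tuple[int, int]]]:
    """Sort stations by most recent report, newest first, then by station ID."""
    return sorted(latest.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))
```

To use it on a real file, uncompress v2.mean.Z first and pass the open file object to `latest_reports`. Duplicate records for one station are folded together, which is a simplification.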

Update 16/2
The date of the v2.mean.Z file I used was 15 Feb.

Here's a count of the numbers of stations terminating in various recent months. I found a v2.mean file I had for 23 Jan 2010, so I'm giving that count for comparison:

Year   Month   Count 2/15   Count 1/23
2010   1       1182         0
2009   12      179          1252
2009   11      31           111
2009   10      12           29
2009   9       132          8
2009   8       21           123

There's an interesting result buried in there. What looks initially like a cull in Sept 2009 now looks like a bunch of very slow reporters, running about 5 months late. I'll add below (when formatted) the list of Aug/Sep stations from Jan 23 for comparison.

Update: This list now added at the bottom.
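The counts of stations terminating in each month can be tallied along these lines. This is a sketch only, assuming you already have, for each snapshot, a dict from station ID to the (year, month) of its latest report. The names `terminating_counts` and `compare_snapshots` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]


def terminating_counts(latest: Dict[str, YearMonth]) -> Counter:
    """Count how many stations have each (year, month) as their most recent report."""
    return Counter(latest.values())


def compare_snapshots(
    snap_a: Dict[str, YearMonth], snap_b: Dict[str, YearMonth]
) -> List[Tuple[int, int, int, int]]:
    """Rows of (year, month, count in snap_a, count in snap_b), newest month first."""
    a = terminating_counts(snap_a)
    b = terminating_counts(snap_b)
    months = sorted(set(a) | set(b), reverse=True)
    # Counter returns 0 for a month that does not appear in one snapshot.
    return [(y, m, a[(y, m)], b[(y, m)]) for (y, m) in months]
```

A group of slow reporters shows up as a count that moves from one month's row to a later month's row between the two snapshots, rather than disappearing.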

Another update.
There's a batch of 39 stations from China which seem to have, at some time between 23 Jan and 15 Feb, been updated from Aug 2009 to Sept 2009. Also two from Afghanistan.
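Finding such a batch is a matter of comparing each station's latest month across the two files. A minimal sketch, assuming each snapshot is a dict from station ID to the (year, month) of its latest report, and assuming the first three characters of the station ID are the country code. The function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]


def advanced_stations(
    old: Dict[str, YearMonth], new: Dict[str, YearMonth], frm: YearMonth, to: YearMonth
) -> List[str]:
    """Station IDs whose latest report was `frm` in the old snapshot and `to` in the new one."""
    return sorted(sid for sid, ym in new.items() if old.get(sid) == frm and ym == to)


def by_country(station_ids: List[str]) -> Dict[str, List[str]]:
    """Group station IDs by their first three characters (assumed to be the country code)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sid in station_ids:
        groups[sid[:3]].append(sid)
    return dict(groups)
```

Calling `advanced_stations(jan23, feb15, (2009, 8), (2009, 9))` and passing the result to `by_country` would give the per-country batches, where `jan23` and `feb15` stand for the two snapshots.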

10 comments:

I think the better way to illustrate the point would be to sit there, download v2.mean every so often, and see how quickly the Jan 2010 reports get added in.

The entire WUWT post was just ridiculous. Over the top language throughout, with no moment to pause for common sense. EM Smith should learn that 'blog first, think later' is a sure recipe for putting your foot in your mouth. This isn't the first time.

Things that caught my eye: France, Greece, North Korea, Mongolia, Paraguay and Peru haven't been added at all yet, for Jan 2010. Once they come in, that'll change the numbers a lot.

Libya just gave up reporting after Jan 09. They must have a backlog. There are other countries with total or partial backlogs.

It looks like a lot of US stations died between 2004 and 2006, but I think most of these stations still appear in the USHCN files and therefore still make it into GISS. I've never cross-checked the USHCN and GHCN to see how many of them made it. Any thoughts on that?

EM Smith attributes (I think) much of the US reduction in recent years to GHCN's slowness in accepting USHCN v2. I haven't checked that myself, but the timing does seem to make that possible. But it may also be a recognition that the US was just over-represented, and if they need to economise their efforts, that's where they should do it.

In the same way, I think you're right that GISS will pick up the missing stations with USHCN, which is one of the data sets they use. But it probably doesn't matter - even with the reduced GHCN numbers, the US is likely the best covered part of the globe. I suspect GISS likes USHCN not for the denser cover, but the better metadata.

Well, after some random checks, the US stations that were dropped from GHCN are indeed in USHCN (sensible), and the US stations that remain in GHCN are not in USHCN (why not?). I didn't find any that were dropped completely in that time span, but there are probably some. I couldn't quickly whip up a comparison script because I don't see any obvious correlation between the station IDs of the two sets.

You can't use the USHCN alone because it isn't updated monthly. So GISS needs the US GHCN stations to do live updates each month. But the USHCN comes with TOB adjustments, which are important for the US record because changes in the time of observation introduced a systematic bias. So you want to also use the USHCN when it does come out.

Actually, now I'm curious. Both sets are oversampled for the US, but I'd think you'd get slightly different numbers from each, if USHCN v2.0 comes TOB-adjusted and pre-homogenised, and the GHCN is not. I wonder if GISS's US numbers change a bit whenever they add on the latest USHCN numbers.

By the way, I don't think v2.0 uses metadata for anything but the TOB adjustment.

Zeke, I've always pictured a poor little grad student slaving away in the basement, looking up old written records from 1895 and digitising them, or finding keystroke errors. I think the effect of any corrections will be more noticeable in 1895 than 1995, because of the density of data.

I took a peek at the GISS code, and if USHCN and GHCN both have data for the same station, it uses the USHCN version. Which means I gave up too quickly when trying to compare the station numbers; it must be easy to do...
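The precedence rule described here can be illustrated like this. It is emphatically not the GISS code, just a sketch of "prefer USHCN where both exist". It assumes a lookup table from GHCN station IDs to USHCN IDs is available, and `ghcn_to_ushcn` is a hypothetical name for it.

```python
from typing import Dict, List, Tuple

Series = List[float]


def merge_prefer_ushcn(
    ghcn: Dict[str, Series],
    ushcn: Dict[str, Series],
    ghcn_to_ushcn: Dict[str, str],  # hypothetical ID lookup, not taken from GISS
) -> Dict[str, Tuple[str, Series]]:
    """Return station -> (source, series), taking USHCN whenever both sets have the station."""
    merged: Dict[str, Tuple[str, Series]] = {}
    used = set()
    for gid, series in ghcn.items():
        uid = ghcn_to_ushcn.get(gid)
        if uid is not None and uid in ushcn:
            merged[gid] = ("USHCN", ushcn[uid])  # overlap: the USHCN version wins
            used.add(uid)
        else:
            merged[gid] = ("GHCN", series)
    # USHCN stations with no GHCN counterpart are kept under their own ID.
    for uid, series in ushcn.items():
        if uid not in used:
            merged[uid] = ("USHCN", series)
    return merged
```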

From the above, it looks like about 1200 stations are reliable reporters each month. Taking a very naive count, we might expect (1252-1182) to show up in the next week or so, or 70 more stations. I'm assuming that the number lost to QC is about the same in any given month.

The following are big chunks missing from Dec to Jan, for a total of 90:
North Korea, for 7.
Mongolia, for 34.
Paraguay, for 10.
Peru, for 12.
France, for 17.
Greece, for 10.

I wonder if one of the above countries then isn't actually a reliable reporter, but happened to catch up on the backlog at the end of the year.

So that tardy China block is being (maybe) updated each month, just trailing by a few months? Bizarre. I've been intrigued by these split-country groups. Algeria is also weird, with a group that's up to date, and a group that isn't.

I noticed a poor little station in Tibet was straggling on its own. Lhasa, I think.