Tuesday, October 25, 2011

BEST extended its temperature series back much further than its predecessors, to about 1800. Many have assumed that this was because BEST has more early data, but that is not so. Its data set for this period is pretty much identical to GHCN's.

You can see this from the KMZ file or the interactive Javascript plot. But I thought I should check it out in detail, so I have below a table of the actual stations, in one case from GHCNV2 (V3 is the same) and in the other BEST, taken from their data.txt file.

There are very few discrepancies. Out of about 275 pre-1850 stations, I count only 6 which BEST has and GHCN doesn't. There seem to be 9 that GHCN has and BEST doesn't, and 8 that BEST has apparently included twice. Details below the jump.

Here is a complete table of stations with pre-1850 data. The color scheme is:

Black: BEST
Red: GHCN
Purple: BEST unique
Blue: GHCN unique
Green: BEST duplicate

The first table lists only the exceptions. I have colored STYKKISHOLMU but not counted it, because it is in both, though BEST seems to have about 20 more years of data. Generally I did not color a station if the discrepancy in start date is less than three years. I have shortened names to 12 characters.

Name          Lat     Lon       Alt    Start  End
BERLIN-DAHLE  52.47   13.3      58     1769   2009
Wien - Hohe   48.2    16.4      203    1775   2009
UCCLE BELGIU  50.8    4.4       100    1794   2009
SHIP V        34      164       -999   1808   1829
EAST MILTON   42.22   -71.12    193    1811   2010
KIEV GMO      50.4    30.53     170    1812   2010
SAMSUN        41.28   36.33     4      1819   2010
FORT SNILLIN  44.9    -93.2     245    1820   1982
NEW YORK AVE  40.63   -73.96    7      1822   2009
CHARLESTON I  32.83   -79.97    8      1823   2010
STYKKISHOLMU  65.08   -22.72    10     1823   2010
MACDILL AFB/  27.85   -82.52    6      1825   2010
ITHACA CORNE  42.46   -76.46    292    1827   2010
KIRKWALL      59      -2.9      26     1827   2009
Helsinki/Kai  60.2    25        4      1829   2000
BLUE HILL OB  42.22   -71.12    195    1831   2006
STENNIS INTE  30.33   -89.44    6      1833   2010
WAVELAND      30.3    -89.38    2      1833   2006
AMHERST       42.38   -72.53    45     1836   2010
Graz-Univers  47.1    15.5      366    1837   2001
SALZBURG-FLU  47.8    13        439    1842   2010
WEST CHESTER  39.97   -75.63    137    1843   2006
STYKKISHOLMU  65.08   -22.73    8      1846   2010
SAN FRANCISC  37.62   -122.38   5      1847   2010
SATA FE COUN  35.64   -106.03   2039   1849
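For anyone wanting to repeat this kind of comparison, here is a minimal Python sketch of the bookkeeping. It is an illustration under stated assumptions, not the procedure actually used for the table: the record layout, the 0.05 degree coordinate tolerance, the function names and the toy stations are all hypothetical. Only the three-year start-date rule comes from the text above.

```python
from collections import namedtuple

# Hypothetical record layout; real inventory files carry more fields.
Station = namedtuple("Station", "name lat lon start")

def same_site(a, b, deg_tol=0.05):
    """Assumed rule: two records are the same site if lat and lon agree within deg_tol."""
    return abs(a.lat - b.lat) <= deg_tol and abs(a.lon - b.lon) <= deg_tol

def classify(best, ghcn, year_tol=3):
    """Return (best_unique, ghcn_unique, best_duplicates, start_mismatches)."""
    # A BEST record counts as a duplicate if an earlier BEST record is at the same site.
    best_dups = [s for i, s in enumerate(best)
                 if any(same_site(s, t) for t in best[:i])]
    best_unique = [s for s in best if not any(same_site(s, g) for g in ghcn)]
    ghcn_unique = [g for g in ghcn if not any(same_site(g, s) for s in best)]
    # Shared sites whose start years differ by year_tol or more (duplicates skipped).
    start_mismatch = [(s, g) for s in best if s not in best_dups
                      for g in ghcn
                      if same_site(s, g) and abs(s.start - g.start) >= year_tol]
    return best_unique, ghcn_unique, best_dups, start_mismatch

# Toy inventories, invented purely for illustration.
best = [Station("TOY A", 50.0, 10.0, 1800),
        Station("TOY A DUP", 50.0, 10.01, 1820),
        Station("TOY B", 40.0, -70.0, 1810),
        Station("TOY C", 60.0, 20.0, 1830)]
ghcn = [Station("TOY A", 50.0, 10.0, 1801),
        Station("TOY C", 60.0, 20.0, 1840),
        Station("TOY D", 30.0, -90.0, 1825)]

best_unique, ghcn_unique, best_dups, start_mismatch = classify(best, ghcn)
for label, group in [("BEST unique", best_unique), ("GHCN unique", ghcn_unique),
                     ("BEST duplicate", best_dups)]:
    print(label, [s.name for s in group])
```

Matching on coordinates alone is crude. Two genuinely different stations in the same town would be merged, so a real check would also look at names and the data itself.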

And here is the full table, including exceptions, ordered by start date:

Thanks, Carrick. Yes, I don't know what effect duplication would have, because they don't seem to do area-based weighting. With area weighting, the effect is limited; the duplicated site is up-weighted relative to others in the cell, but the cell itself is not up-weighted. Since one hopes the sites in the cell are correlated, this doesn't matter too much.

They do a spatial uncertainty weighting. I can't see that this counters the duplication effect.
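The area-weighting point can be seen in a toy calculation. The sketch below is not BEST's method or any published gridding code. It assumes equal-area cells and made-up anomaly values, and simply compares a plain station average with a cell-by-cell average when one record is listed twice.

```python
from collections import defaultdict

def simple_mean(stations):
    """Unweighted mean over all station records."""
    return sum(value for _, value in stations) / len(stations)

def cell_mean(stations):
    """Average the stations inside each grid cell, then average the cells equally.
    (Real area weighting would also weight each cell by its area; omitted here.)"""
    cells = defaultdict(list)
    for cell, value in stations:
        cells[cell].append(value)
    cell_avgs = [sum(v) / len(v) for v in cells.values()]
    return sum(cell_avgs) / len(cell_avgs)

# Hypothetical anomalies: two stations in cell "A", one in cell "B".
stations = [("A", 1.0), ("A", 0.8), ("B", 0.0)]
with_dup = stations + [("A", 1.0)]  # the first station listed twice

shift_simple = simple_mean(with_dup) - simple_mean(stations)
shift_cell = cell_mean(with_dup) - cell_mean(stations)
print("shift without cell averaging:", round(shift_simple, 4))
print("shift with cell averaging:   ", round(shift_cell, 4))
```

With these numbers the duplicate moves the plain average by about 0.1 but the cell average by less than 0.02, because the duplicate only changes the balance inside cell A.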

One hopes the duplicates would be perfectly correlated. If you look at the data itself there might be issues (for whatever reason).

I was looking at your .inv files, and they got me thinking: they're certainly useful to me for plotting, but it might be helpful to have a file that integrates all 4 together without the duplicates. Does BEST do so?

Robert, I expect that BEST have included GHCN and GSOD stations, sorting out duplicates as best they can. They also seem to have an expanded set of Antarctic data - they probably use all the data that GISS includes. CRUTEM3 I'm not so sure about - that came out at a fairly late stage of the BEST project.

As you'll see from this listing, BEST is not perfect in doing the sorting. But it may be the best available combined listing.

You can get a feel for what BEST covers from the KMZ file. If you click on a pushpin, with all folders open, it usually pops up more pushpins from the different databases. Almost always, one of them is yellow (BEST). You also see the ambiguity - there are often two yellows, which could mean two stations in the same town.

Nick: Following up on that early morning comment - I guess their modified Kriging would handle the duplication, since the duplicates would be perfectly correlated.
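A small numerical check of that intuition, using ordinary minimum-variance weights for a common mean, w = C^-1 1 / (1' C^-1 1). This is a generic textbook calculation, not BEST's modified Kriging, and the correlation values are assumptions chosen for illustration. The duplicate pair is given a correlation of 0.999 rather than exactly 1, since exact duplicates would make the covariance matrix singular.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def mean_weights(cov):
    """Minimum-variance weights for estimating a mean shared by all stations."""
    x = solve(cov, [1.0] * len(cov))
    total = sum(x)
    return [v / total for v in x]

dup_corr = 0.999  # assumed: stations 0 and 1 are near-duplicates
far_corr = 0.5    # assumed: correlation of either with a separate station 2
cov = [[1.0, dup_corr, far_corr],
       [dup_corr, 1.0, far_corr],
       [far_corr, far_corr, 1.0]]

weights = mean_weights(cov)
print([round(w, 3) for w in weights])
```

The two near-duplicates end up sharing roughly the weight a single station would get (about a quarter each, against about a half for the third station), which is the sense in which a covariance-based weighting absorbs duplication. A simple average would instead give the duplicated site two thirds of the total.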

I had discussed this with Chad at one point---basically a Monte Carlo method for generating "realistic" temperature series, including all of the warts in the real data, but based on "known" underlying global temperature patterns.

At the moment, there seems to be a lot of "cut and try" going on, with little validation testing of this sort to determine which set of algorithms performs the "best".
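To make the idea concrete, here is a toy version of such a test in Python. It is only a sketch of the general approach, not the method discussed with Chad: the station count, trend, noise level, gap fraction and the naive averaging algorithm being tested are all assumptions, and a serious test would add spatial patterns, station moves and other "warts".

```python
import random

def synthetic_network(n_stations, years, trend, noise_sd, missing_frac, rng):
    """Each series = known trend + station offset + noise, with random gaps (None)."""
    network = []
    for _ in range(n_stations):
        offset = rng.gauss(0.0, 5.0)  # stations differ in absolute temperature
        series = [None if rng.random() < missing_frac
                  else offset + trend * t + rng.gauss(0.0, noise_sd)
                  for t in range(years)]
        network.append(series)
    return network

def naive_average(network):
    """Algorithm under test: subtract each station's own mean, then average by year."""
    anomalies = []
    for series in network:
        present = [v for v in series if v is not None]
        mean = sum(present) / len(present)
        anomalies.append([None if v is None else v - mean for v in series])
    averaged = []
    for t in range(len(network[0])):
        vals = [a[t] for a in anomalies if a[t] is not None]
        averaged.append(sum(vals) / len(vals))
    return averaged

def ols_slope(y):
    """Least-squares slope of y against its index (units per year)."""
    n = len(y)
    tbar = (n - 1) / 2
    ybar = sum(y) / n
    num = sum((t - tbar) * (v - ybar) for t, v in enumerate(y))
    den = sum((t - tbar) ** 2 for t in range(n))
    return num / den

rng = random.Random(0)  # fixed seed so the run is repeatable
true_trend = 0.01       # assumed "known" trend, degrees per year
network = synthetic_network(20, 100, true_trend, 0.5, 0.2, rng)
recovered = ols_slope(naive_average(network))
print("true trend:", true_trend, "recovered:", round(recovered, 4))
```

Repeating this over many seeds, and over several candidate algorithms, would give an error distribution for each algorithm against a truth that is known by construction.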

I'll note that BEST seems to have more high-frequency noise than the other series, and this doesn't appear to me to be a good thing.

Once again, it might be better to circulate your work among colleagues and have them find the errors rather than do a media blitzkrieg and then find your mistakes.

CCE, Yes, that may be the explanation. I just don't know. In their first release, their large flags file (647 Mb) wouldn't unzip. In the second release, the enormous sources.txt file (3.2 Gb) wouldn't unzip. Maybe between the releases I have access to it all, but I'd prefer to wait for a version that has everything.

Anon, Yes, but these things are big. The zipped BEST data is 259 Mb. At the Google sites I have only 100 Mb, with a 20 Mb file limit. I have put the BEST data into the more compact GHCN format; 22 Mb zipped. Even that I'd have to split.

I've been toying with the idea of Amazon, which costs, but not too much.

My geography is not what it should be and my printer quit when I asked for a map showing lat and long so I can't plot the stations. It looks like these early stations are concentrated in Europe and North America.

Hu McCulloch noticed a few exceptions, notably Mauritius (S Indian Ocean (Dodos), started 1787) and Madras (1796). There were some early stations in N Canada - Churchill, which started in 1768, is up on Hudson Bay, pestered by polar bears. York Factory is another.