Tuesday, April 6, 2010

I tried some of the more exotic options that I described in the last post: ConUS, and the "kept" and "cut" GHCN classes - referring to stations that were maintained after the GHCN project finished its historical phase, and those that were not. Conspiracy-oriented folks have been suggesting that GHCN preferentially dropped stations that were not warming. I also checked B.C. and Africa.

I made some slight changes to the code. I mentioned last post the fix of Carrick's bug; this time I found issues when a selection had years with no data at all. These related to plotting - such years had been set to zero, which could affect both the plot and the trend; now they are set to NA, which R handles well.
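The zero-vs-NA fix can be illustrated with a minimal sketch. The variable names (`ann` for annual anomalies, `nst` for station counts) and the toy numbers are my own, not from the actual code; the point is that NA, unlike zero, is skipped by R's summaries and by `lm()` (whose default `na.action` is `na.omit`).

```r
yr  <- 1990:1994
ann <- c(0.12, 0.05, 0, -0.08, 0.21)  # toy anomalies; 1992 had no stations
nst <- c(40, 38, 0, 35, 33)           # stations reporting in each year

# Years with no stations had been left at 0, which silently biases
# the plot and the fitted trend; mark them as missing instead.
ann[nst == 0] <- NA

m  <- mean(ann, na.rm = TRUE)         # missing year is skipped
tr <- coef(lm(ann ~ yr))["yr"]        # lm() drops the NA row by default
```

A zero in 1992 would have pulled the mean and trend toward zero; as NA, the year simply contributes nothing.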

Update: Zeke has queried the downturn at the end of the "Cut" plot - stations with no data after 1994 in the GHCN set (and with at least 50 years of data before that). It turns out to be very dependent on the cutoff year.

There were 842 stations with data up to '91 and none since; 1007 stopped in '92 or before, 1023 by '93, and 1036 by '94. That means the last point in the "Cut" plot below is based on about 13 stations, so there are big fluctuations at the endpoint depending on where you stop.

I've plotted below "Cut95" - stations whose last year of data was '94 - which was my original choice. I've also plotted Cut94, Cut93, and Cut92. The earlier cutoffs are more reliable at the ends.
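The tallies above can be sketched in a few lines of R. The vector `last_yr` is a stand-in for each station's final year with data (in practice it would be derived from v2.mean); the values here are toy data, not the real GHCN counts.

```r
# Toy final-data years for a handful of stations
last_yr <- c(1991, 1994, 1993, 1992, 1994, 1990, 1995)

# For each candidate cutoff, count stations with no data after it
cut_counts <- sapply(1991:1994, function(y) sum(last_yr <= y))

# With the real data the post reports 842, 1007, 1023, 1036 for
# cutoffs '91-'94, so the final "Cut" point rests on 1036 - 1023 = 13
# stations - hence the endpoint sensitivity.
```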

Plots

Figure: Continental US

Figure: Africa

Figure: "Dropped" GHCN sites

Figure: "Continued" GHCN sites

Figure: British Columbia

The last stages of the "dropped stations" plot depend a lot on where you stop.

Zeke, yes, I think the transition to zero stations needs more work. It's possibly a weakness of this implementation of least squares: a graph point determined by only a few stations will be relatively lightly weighted, unless the weight function is varied.

The weighting takes account of spatial sparsity (through the grid) - maybe something similar is needed in time for series like this.

Joseph: there is always a danger of spatial coverage issues predominating when you get too selective a station set. E.g. the low-population-density stations in http://rankexploits.com/musings/wp-content/uploads/2010/03/Picture-78.png

@Zeke: Sure, but I've actually looked at disjoint population size groups, e.g. 10 million to 12 million, 12 million to 15 million, etc. There's a statistically significant temperature slope trend once you pass the 1 million mark.

It would be difficult to explain that regression trend as a regional bias.

We have Nick's version of ConUS, your version, GISS; add Joseph's... Maybe you guys could convince Tamino to do one; I'm working on asking Jeff. Or in a pinch we could subset v2.mean to just ConUS and feed it to Jeff's algorithm rather than modding his code.

By doing that we come pretty close to comparing apples to apples. GISS does some fiddling with GHCN and USHCN and drops some US stations entirely.
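The subsetting step suggested here is simple to sketch, assuming (as in the GHCN v2 country list) that station IDs begin with a 3-digit country code and that 425 is the United States. The IDs below are toy values for illustration, not real v2.mean records.

```r
# Toy GHCN v2 station IDs (first 3 digits = country code)
v2_ids <- c(
  "42572472000",  # country code 425 - assumed here to be the US
  "10160355000"   # some other country code
)

# Keep only ConUS-candidate stations by country code prefix
us <- v2_ids[substr(v2_ids, 1, 3) == "425"]
```

In practice one would filter the lines of v2.mean itself the same way (the station ID leads each record), then feed the reduced file to the averaging code unchanged.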

In the end what you have is a "proof" of the claim (which we all know to be largely true) that the various methods of averaging produce the "same" answer.

Given that the U.S. is much smaller than the globe and pretty densely sampled with mostly complete records, it really shouldn't matter much what gridding method and anomaly calculation method you use. It will make a difference if you use GHCN or USHCN, and what version you use (e.g. USHCN F52/GHCN v2.mean_adj vs. raw versions) since the U.S. has a known cooling bias in the raw data (from TOBs and MMTS at least).