Tuesday, February 23, 2010

Yet another fevered analysis on adjustments from Willis Eschenbach, this time on GIStemp in Alaska. As usual, everything that he doesn't understand means someone is fudging ("Fudged Fevers").

Ironically, there was huge pressure from sceptics that led to GISS to completely release their code and data some years ago. They may have hoped that the fudge fans would at least look at what is in the code first to find where it's done. But no.

The adjustment in question is GISS's way of dealing with the Urban Heat Island effect. The sceptic chorus is that measured warmth is an artefact of measurements being taken near cities, at airports or whatever. There is indeed a UHI, and here's what GISS does about it.GIStemp information

Update Apparently an issue now is whether GISS erred badly in classifying Matanuska AES as urban according to satellite-observed night brightness. I have added below a satellite map pic of Matanuska AES. It's not far from Wasilla. It is in fields, but about 1 km from a major freeway intersection, and about a mile from what looks like a car sales lot.

GISS adjustment principle

They now estimate the urban status from satellite brightness. This is a good idea, because brightness correlates well with energy usage, which in turn matches the waste heat that causes UHI. It's also more precise, because you can see where the stations are relative to the brightness. And their adjustment has the general effect of removing the trends from urban stations and replacing them with a trend inferred from nearby rural stations.

The weighted average they use tapers with distance, and reaches out to 1000 km from each urban station. If more rural stations are available, they will reduce the radius to 500 km. When they have the trend of this average, they make a difference of that with the urban station and add it. The resulting station has its original ups and downs, but the trend of the rural stations. And when you make a global average and look at the trend, the ups and dowwns are smoothed away, and the global trend is made up entirely of rural stations.

Well, almost. They allow for the possibility that stations which now have bright lights may have been rural at some time in the past. So instead of fitting a simple line, they allow the line to have a "knee" with a slope change.

The Alaskan examples.

Willis chose two stations - Anchorage (a city) and Matanuska, which was rural until recently. Both these are now classified from the brightness test as urban.

His Anchorage plot
is actually straightforward. It slopes downward uniformly, suggesting a regular UHI warming. If the correction is made, the global warming trend would be less. He queried the steps, but this just reflects the fact that station temperatures in GISS are rounded to the nearest 0.1C (and stored as integers). His main complaint was that the slope did not curve with the population growth. Indeed it is a little surprising that the algorithm did not insert a "knee". However, it makes little difference to the inferences that are made about global warming, since inexactitude about when UHI was at its peak tends to even out over many stations. Other cities had their peak growth in those years.

His complaints then focussed on Matanuska. Here's his plot:

Here the complaint is of the steepness of the rise, which reflects an anti-UHI adjustment, and, I think, the fact that in the end no nett adjustment is made.

A GISS emulation.

I've written a poor man's GIStemp adjustment program - you can find the code here. It reads data from the GHCN v2.mean and the GISS v2.inv inventory file. You choose a "target" urban centre. It first scans the v2.inv to work out which stations are within Rad distance - Rad is usually 1000 or 500 km. It then scans the big v2.mean file to locate the rows that come within range, and in the process works out monthly averages for each station over its data range. With duplicates, the rule is that for any year it takes only one value per station, which is the first one encountered. Note that these averages do not relate to a fixed time period. If we were looking further afield, that would be a problem, but it's OK for the neighborhood of a single city.

The next loop looks over the same rows from v2.mean, but now we know which ones to choose. An anomaly is made by subtracting, for each station including the city, the monthly mean. That makes it possible to replace missing values by zeroes, which avoids seasonal distortion. Then the distance-weighted annual average of rurals is created, and from it, the annual values for the city is subtracted to make corr[], the correction vector.

Next the line-with-knee fit is done, using lm(), Every possible knee location is tested, and the one with minimum RSS chosen. Finally it is all plotted.

Matanuska.

OK, here's our Matanuska plot. It shows the city reading in red, the rural average in green, and the difference in black. Note that if we added the black curve as a "correction" to the red one, it would completely replace it by the green curve. No info from Matanuska would go into the global index. If UHI had been suspected, it's gone.

But in fact GISS fits the line segments as shown, and adds that. That has the same effect on trend, but leaves short-term signals. I'm not sure what the benefit of that is.

Comparing this plot with GISS, the knee comes a bit later, and the late rise is even steep er (a HS?). The earlier downslope is less pronounced.

Commenters at WUWT asked why this location was being adjusted at all, as it isn't obviously a big city. The answer is that GISS uses the objective satellite brightness criterion, and I'm sure those folks at WUWT would object if GISS replaced that with subjective judgements. But it may indeed be a false positive - does that matter? Not much; as long as enough rural stations remain, it doesn't matter unduly if a few were removed without necessity.

Update: Matanuska on the map:

The AES is near the top right

Anchorage.

Here's a difference - I do get a slight knee. But it isn't very pronounced. As a numerical matter, the change of RSS with the knee/no knee options is quite small. Otherwise again we see the fairly long descent, corresponding to the expected UHI correction.

Let me emphasise again that this GISS correction does not leave Anchorage with a merely modified trend - it replaces the trend with that of rural stations.

Code and algorithm.

Again, the R code is here. I'll write more about the algorithm soon, and with details from the GIStemp code. For the moment though, I'll just reprint some comments I posted at WUWT.

On the GISTEMP code
The relevant parts of the code are in directory STEP2 of the GISTEMP source. The first thing to note is the file v2.inv in ./input_files. It lists the station data, and for the two Alaska stations:
42570273000 ANCHORAGE/INT__________________ 61.17 -150.02__ 40____8U__173FLxxCO 1A 5WATER__________ C__ 53
42570274001 MANTANUSKA AES__________________61.57 -149.27__ 46__225R__ -9FLxxCO30x-9TUNDRA__________C__ 18
The second last number gives the GHCN brightness rating - C is highest. So both will be adjusted.

The adjustment is done in the subroutine adj() of padjust.f. Temperatures are held as integers, to tenths of a degree - 121 is 12.1C. The relevant code fragment is:
____ do iy=iy1,iy2
________sl=sl1
________if(iy.gt.knee) sl=sl2
________iya=iy
________if(iy.lt.iy1a) iya=iy1a
________if(iy.gt.iy2a) iya=iy2a
________iadj=nint( (iya-knee)*sl-(iy2a-knee)*sl2 )
iy is the year. You'll see that there is provision for a "knee" and two slopes (switching at the knee). There are also limits beyond which the adjustment will be held stationary.

So the effect of the "nint" is that the adjustment is piecewise linear, and forced to the nearest integer value - ie 0.1C.

So now you can see where those plots come from. Anchorage has no knee, but just a steady slope adjustment, made stepwise by the nint. Mantanuska has a knee, and a steep slope following the knee.

The rationale seems to be that a broad slope correction is made to match the city to have the trend of its rural surrounds. The knee is presumably to allow for a transition in the past from rural to urban.

More on the algorithm used. In PApars.f they compute, as they stated, for each urban station (in the do 200 loop) the AVG() of yearly distance weighted average values of nearby urban stations. They then calculate the difference between that and the urban values URB(). This information is passed to getfit() in t2fit.f (via the common block FITCOM).

In getfit, the line-with-knee fit to AVG-URB is laboriously computed by trying every possible knee (within 5 yrs of the end of range), and choosing the value with least residual SS (in trend2(), tr2.f). The resulting slopes and knee are what ends up in the adj() routine set out in my previous post, and with the jaggedness of integer conversion become the plots shown in the head post here.

Thinking a bit more about the method GISS is using. The plots Willis has shown are just approximators to the difference between the "urban" station and the weighted average of surrounding "rural" stations. If you think there's something extreme about them, it's probably not in the adjustment arithmetic - it reflects the actual behaviour of that difference. There's nothing particularly bad about a LS fit of a bent two-line segment.

Adjustment vs Omission
So what does the adjustment achieve? Suppose you put more effort into getting a better approximator to the difference. In fact, you could just use the exact difference. That would completely replace the urban station result with the average of surrounding rural. The eventual global trend would then be just provided by rural stations. That is good if you have enough of them.

So why include the urban stations in the calc at all? They have an odd effect - by putting them in, and then replacing them by the average of local rural stations, you upweight the effect of those local stations in the global average. This probably wouldn't matter much, and the effect is further muted anyway by the gridding.

The piecewise linear approx leaves a small residual effect of the urban stations in the global average, but only contributing short-term variation. The trend effect has been removed. And since the short-term effects are almost totally lost in the averaging, it seems to me that there's little difference between GISS UHI adjusting and leaving out urban stations completely.

Update - Orland Ca
Steven, in comments, asked for a run of Orland Ca, which is another station classified urban, but arguably not. Here's the plot:

Further update
I've plotted the number of rural stations used in the Orland calc, along with the weighted sum, in each year. The weighted sum counts each one multiplied by the linear distance taper factor (range 1:0).

Monday, February 15, 2010

E./noscript>M.Smith had a post at WUWT dramatically entitled NOAA langoliers eat another 1/3 of stations from GHCN database. What happened was that he looked at the v2.mean file at NOAA dated the 8th Feb, and found a whole lot of stations had no report for Jan 2010. They were deemed to be eaten, even though a whole lot more turned up that day. The simple fact is that GHCN adds data as they become available. It's a simple storage database, and there's no reason to do otherwise.

Anyway, I wrote a script to download and list the stations in the current v2.mean at NOAA. It lists them in order of date of most recent report (newest first). The list may be helpful, because it is also sublisted by country. The code (a linux shell script which calls 2 R scripts) is here. The list is below the jump.

Update 16/2
The date of the v2.mean.Z file I used was 15 Feb.

Here's a count of the numbers of stations terminating in various recent months. I found a v2.mean file I had for 23 Jan 2010, so I'm giving that count for comparison:

Year_

Month_

Count 2/15_

Count 1/23

2010

1

1182

0

2009

12

179

1252

2009

11

31

111

2009

10

12

29

2009

9

132

8

2009

8

21

123

There's an interesting result buried in there. What looks initially like a cull in Sept 2009 now looks like a bunch of very slow reporters, running about 5 months late. I'll add below (when formatted) the list of Aug/Sep stations from Jan 23 for comparison.Update Thislist now added at the bottom

Another update.
There's a batch of 39 stations from China which seem to have, at some time between 23 Jan and 15 Feb, been updated from Aug 2009 to Sept 2009. Also two from Afghanistan.