Homogeneity Adjustment – Part I

My curiosity about the mathematics behind the homogeneity adjustment finally led me to take a close look at Hansen Step 2. This turned out to be an incredibly torturous task. A quote from a Stephen King short story, The Jaunt, came to mind as I plowed through lines of code: “It’s eternity in there…”. However, I think I’ve decoded enough of the information to begin explaining it in layman’s terms. Part I deals with the process of preparing the data to be adjusted, while a planned follow-up post will describe the actual adjustment.

There are a number of programs involved in prepping the data for adjustment. Most are involved in reformatting, splitting, and trimming files. These programs don’t really do anything meaningful to the data, but it is important to understand the input and output file formats in order to follow the real work that takes place later. One prep program, toANNanom.exe, creates the seasonal and annual anomalies we have come to know and love on this blog.

The important program in data preparation is PApars.exe. It is one of the better commented programs in the set, but before the champagne is uncorked at NASA, it should be noted that it is about the only program containing comments. Detracting from the comments that do exist are the utterly confusing variable names and lack of formatting.

With that in mind, what follows is a summary of the preparation process this program undertakes.

The highlights of this summary are:

Urban adjustments are not consistently based on rural stations from 0km to 1000km. Adjustments are based on stations from 0km to 500km, or on stations from 500km to 1000km, but never both.

Rural stations in the range of 500km to 1000km carry the same weight as stations in the 250km to 500km range.

The USHCN brightness index determines whether or not a station is rural or urban over most, but not all, of North America. For all other stations, the GHCN flag is used.

The first comment in this program states the purpose is “to combine for each urban station the rural stations within R=1000km and write out parameters for broken line approximations to the difference of urban and combined rural series.” As we will discover later, this is not quite true.

A subsequent comment provides a little more detail:

The combining of rural stations is done as follows: Stations within Rngbr km of the urban center U contribute to the mean at U with weight 1.- d/Rngbr (d = distance between rural and urban station in km). To remove the station bias, station data are shifted before combining them with the current mean. The shift is such that the means over the time period they have in common remains unchanged. If that common period is less than 20 (NCRIT) years, the station is disregarded. To decrease that chance, stations are combined successively in order of the length of their time record.

There is the option to pass a different distance to the program (such as R=1200km) and a different overlap period (such as NCRIT = 30 years), but the shell script does not do so.

Rural or Urban

A first pass is made through the data to determine which stations are rural and which are urban. If the station has a USHCN brightness index, that index is used to make the rural / urban determination. In those cases, a station with an index of “1” is rural and all others are urban. If the station does not have a USHCN brightness index, the GHCN R/S/U flag is used, with “R” indicating rural and “S” or “U” urban. The GHCN brightness index is ignored.

Use of the USHCN brightness index extends outside the continental US to some locations in Canada, Mexico, and the Caribbean. Also, not all USHCN sites have a brightness index, notably Alaska and Hawaii. Note that when using the GISS website to interactively get station data, a “rural” station is determined by examining the GHCN flag, not the USHCN brightness index. Thus, stations such as Cedarville are listed as being rural on the GISS website, but are treated as urban when calculations are done.
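To make the decision rule concrete, here is a minimal sketch in Python of the classification as I read it; the dictionary keys are invented for illustration and are not the names used in PApars.exe:

```python
# A minimal sketch of the rural/urban decision as I read it; the dict keys
# ("ushcn_brightness", "ghcn_flag") are invented for illustration and are
# not the names used in PApars.exe.
def is_rural(station):
    """Return True if the station is treated as rural."""
    brightness = station.get("ushcn_brightness")  # None if no USHCN index
    if brightness is not None:
        # USHCN brightness index: "1" is rural, anything else is urban
        return brightness == 1
    # No USHCN index: fall back on the GHCN R/S/U flag; only "R" is rural
    return station.get("ghcn_flag") == "R"
```

This is why a station like Cedarville can carry a GHCN “R” flag yet still be treated as urban in the calculations: when a USHCN brightness index is present, it wins.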

As the pass is made through the data, the latitude and longitude for each station are collected, as is the temperature data. The temperature data is stored as integers (in units of 0.1°C), so it is converted back to real values by dividing by 10.

Sorting by length of station record

From here on I ran into problems determining when months are used and when years are used. PApars.exe and most of the other programs look like they can operate on years, months, or seasons. Which unit they operate on depends on information placed in each file format along the way. Some of it appears implicit. Unfortunately, one cannot depend on the variable names for resolving the issue. For example, is variable N3L expressed in months, years, or hair follicles? As a result, my use of specific month or year units in the following discussion indicates my best judgment as to what is actually used, but I recognize I may be wrong.

A pass is made through the rural station data to sort the stations according to the number of valid months of temperature data. First, the number of valid monthly records is determined (i.e., a month with a temperature value of 9999 is not valid). Then the stations are sorted in order of the station with the most valid months of temperature data to the station with the least valid months of temperature data. A station with 1000 consecutive months of data is not given preference over a station with 1001 months of occasional data.
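The sort, as I understand it, can be sketched like this (again with invented names; 9999 is the missing-value sentinel mentioned above):

```python
# Sketch of the sort, assuming monthly records stored as lists with 9999
# as the missing-value sentinel (names are mine, not the Fortran's).
MISSING = 9999

def sort_by_record_length(stations):
    """Most valid months first; gaps in the record carry no penalty."""
    def valid_months(station):
        # Count only months with a real temperature value
        return sum(1 for t in station["monthly"] if t != MISSING)
    return sorted(stations, key=valid_months, reverse=True)
```

Only the count of valid months matters, so a long but gappy record outranks a slightly shorter continuous one, as described above.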

Identifying “nearby” rural stations

Next, a pass is made through all urban stations to generate a list of “nearby” rural stations. This is done in two steps, first by looking at stations within 500km of the urban station and then, if no rural stations are found within 500km, looking at the remaining stations out to 1000km. If rural stations within 500km exist, it is quite possible any stations from 500km to 1000km will not be considered.

First, using the spherical law of cosines, the cosine of the angle between the coordinates of the urban station and all rural stations is calculated:

COS_U_R = SIN(LAT_U) * SIN(LAT_R) + COS(LAT_U) * COS(LAT_R) * COS(LON_U – LON_R)

COS_U_R is compared with the previously calculated “critical” cosine value COS(500/Earth_radius). If the resulting COS value is less than the critical value, the rural station is too far away and is skipped. The SIN and COS values for each station’s latitude and longitude are pre-calculated and saved so that each iteration through a station does not require a new calculation.

If a rural station is within the distance limit of the urban station, the distance between the two is calculated. However, the program does not calculate the great circle distance. This is to avoid the use of ARCCOS, which must have been quite expensive 20 years ago. Instead, the shorter chord distance is calculated, and a ratio is produced, as follows:

Dist_U_R = (Earth_radius / 500) * SQRT(2 * (1 – COS_U_R ))

From this, weight is assigned to the station using WT = 1 – Dist_U_R

As an example, a station 5 km away is assigned a weight of 0.99, and a station 499 km away a weight of 0.002. These seem appropriate if the range is 0km to 500km.

However, if no rural stations are found within that range, the search is extended out to 1000km. In this extended case:

Dist_U_R = (Earth_radius / 1000) * SQRT(2 * (1 – COS_U_R ))

The same weighting function is applied. The implication is that if an urban station has no rural neighbors within 500km, but does have neighbors between 500km and 1000km, those rural stations are given the same weight as those rural stations 250km to 500km from other urban stations. It seems odd to me that such an asymmetric weighting would be used.
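The distance and weighting calculations above can be sketched in Python as follows. This is my rendering of the formulas in the post, not the Fortran itself, and the exact Earth radius constant used in PApars.exe is an assumption on my part:

```python
import math

# The exact Earth radius constant used in PApars.exe is an assumption here.
EARTH_RADIUS = 6375.0  # km

def cos_angle(lat_u, lon_u, lat_r, lon_r):
    """Spherical law of cosines: cosine of the central angle between the
    urban and rural stations (coordinates in degrees)."""
    pu, pr = math.radians(lat_u), math.radians(lat_r)
    dlon = math.radians(lon_u - lon_r)
    return (math.sin(pu) * math.sin(pr) +
            math.cos(pu) * math.cos(pr) * math.cos(dlon))

def weight(cos_u_r, range_km):
    """Chord-based distance ratio and the WT = 1 - Dist_U_R weighting.

    range_km is 500 for the inner search and 1000 for the extended one;
    the same linear formula is applied in both cases, which produces the
    asymmetry discussed in the text.
    """
    # max() guards against a tiny negative value from floating-point rounding
    chord = math.sqrt(max(0.0, 2.0 * (1.0 - cos_u_r)))
    dist_ratio = (EARTH_RADIUS / range_km) * chord
    return 1.0 - dist_ratio
```

For a station about 5 km away this gives a weight near 0.99 in the 500km search, matching the example above; the same physical separation is penalized exactly half as much in the 1000km search.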

Combining an urban station’s nearby rural stations into a single series

Next, a pass is made through all rural stations that are within the 0km to 500km (or 500km to 1000km) zone from the urban station. The stations are visited in order of temperature record length, which was sorted previously. The purpose of this pass is to create a combined rural time series for the urban station. Each rural station’s record is added to the combined series as follows:

(Assume the station to be added is “NEW” and the combined series is “COMBINED”.)

The average temperature of all months in common between NEW and COMBINED is found for both series. Note that this is a single number representative of each series.

A bias is calculated by subtracting the NEW average from the COMBINED average.
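A minimal sketch of that bias shift, assuming equal-length monthly series with None marking missing months (my reading of the step, not the actual Fortran, which uses a 9999 sentinel):

```python
# My reading of the bias shift, assuming equal-length monthly series with
# None marking missing months (the Fortran uses a 9999 sentinel instead).
def bias_shift(combined, new):
    """Shift NEW so its mean over the common period matches COMBINED's."""
    common = [(c, n) for c, n in zip(combined, new)
              if c is not None and n is not None]
    if not common:
        return list(new)
    mean_combined = sum(c for c, _ in common) / len(common)
    mean_new = sum(n for _, n in common) / len(common)
    bias = mean_combined - mean_new
    # Apply the shift to every valid month of NEW before combining
    return [n + bias if n is not None else None for n in new]
```

After the shift, NEW’s mean over the common period equals COMBINED’s, so adding NEW leaves the mean over that period unchanged, as the code comment quoted earlier claims.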

After this is done, the urban time series is examined. Two tests are made against the combined series. In the first test, if the urban and combined series have fewer than 20 years in common, the urban station is dropped.

In the second test, if the number of years in common is fewer than 2/3 the period of overlap, the early years are dropped and the comparison is re-done from a new starting point. The formula for the new starting point is NEW_START = LAST – (COMMON – 1) / (2/3). NEW_START and LAST are offsets from the first year of record (1880). Thus, if there are 110 years of overlap (1880 to 1989) but only 50 years have valid records in common, the new starting point would be 110 – (50 – 1)/(2/3) = 36 (integer value). This is then added to 1880 to yield a new starting year of 1916. The analysis is rerun from that point.
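The arithmetic in that example can be checked directly; the function below is just the formula from the post, with integer truncation as described:

```python
# The formula from the post, with integer truncation; LAST and COMMON are
# year counts offset from the first year of record.
BASE_YEAR = 1880

def new_start_year(last, common):
    """NEW_START = LAST - (COMMON - 1) / (2/3), then added to 1880."""
    return BASE_YEAR + int(last - (common - 1) / (2.0 / 3.0))
```

With 110 years of overlap and 50 valid years in common this yields 1916, matching the worked example above.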

Note that if there are fewer than 20 years in common for rural stations within a 500km radius, the program will make an attempt to find and combine rural stations between 500km and 1000km. The rural stations within 500km are never combined with the new stations between 500km and 1000km – they are dropped from the urban station’s analysis.

Assuming there is enough overlap between the urban and combined rural time series, a new time series is calculated by subtracting the urban series from the combined rural series. Also, the mean of this new series is calculated over the period of overlap. Both calculations appear to be done on the annual averages and not the monthly averages, but I am not entirely certain at this point. This information – along with many other parameters – is passed in a call to GETFIT, which will be the subject of my future post.
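Assuming the calculation really is done on annual values, the difference series and its mean might look like this (a sketch under that assumption, with an invented year-keyed representation):

```python
# A sketch under the assumption that annual values are used; the inputs
# are dicts mapping year -> annual value (an invented representation).
def difference_series(rural_combined, urban):
    """Return the (rural - urban) series and its mean over the overlap."""
    years = sorted(set(rural_combined) & set(urban))
    diffs = {y: rural_combined[y] - urban[y] for y in years}
    mean = sum(diffs.values()) / len(diffs) if diffs else 0.0
    return diffs, mean
```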

138 Comments

Then the same “rural” station may be used more than once for different, “nearby” urban stations? If so, I take it the weighting may be different each time, depending on the specific “rural” station’s distance from the urban station then in question.

So if Canada, with all of its cooling rural stations, is left out of the analysis, urban sites within 500km of the border will be biased towards warming US sites to the south even more than if they were 1000km to the south.
Therefore the effect of the Hansen homogenization scheme, combined with large missing data, is to make the northern states of the US even warmer than they should have been with the Canadian data included.
Brilliant.
Hansen makes even more money and gives even more interviews (he’s up to a 12-a-day habit) on how much he’s being censored.

Many an hour have I spent tearing my hair out while going over uncommented legacy code! It’s amazing that the code for this data analysis is so sloppy. You’d think that people proposing to spend trillions of our dollars could at least take the time to make their code clear and comprehensible.

The “science is settled” crowd should see the gory details behind how these allegedly unassailable conclusions are arrived at.

Tesla – my sentiments exactly. From Steve’s description (and my own examination of the code), the GISS homogeneity adjustment code is sloppy at best. What I find amazing are the inconsistencies in the rural site weights. What were they thinking?!?

So. Why do the homogeneity adjustments at all? Didn’t you just pollute two sets of data? If you know the urban data is suffering UHI, then show it separately, or show it adjusted compared to non-adjusted “rural” stations. But why do all this finagling? If nothing else, it just brings the data into question. And adjusting a station based on a station 250 MILES or more away????? I. Don’t. Think. So.

#1 Yes, a single rural station can take part in the adjustment of multiple urban stations. The weight it carries depends on its distance from each urban station.

#2 Theoretically that can happen, although I don’t know if it actually does. Although the Canadian record in recent years seems to have disappeared, the homogeneity adjustment looks back through the entire record, so earlier years from a now “missing” station would influence the adjustment.

This post by John prompts me to ask a question that has been nagging at me for quite a while. As I (a non-scientist or engineer) understand it, Hansen and crew are adjusting temperature records, primarily urban, based on what I presume they believe to be “pristine” or “uncontaminated” records of rural sites and then combining all 1200+ of them into an overall history. Has anyone compiled a list of these “uncontaminated” sites to see if the distribution and quantity of these sites would in fact be representative of the temperature history of the U.S., or worldwide for that matter? Along that line, what is the minimum number of sites and distribution pattern required to provide a scientifically defensible history of temperatures in the U.S.? Is it 10, 33, 79, 120, or any other number? I would have the same questions for South America, Europe, Africa, Asia and Australia. I don’t understand how adjusting sites that are contaminated by urban heat adds anything to our understanding of the true surface temperature record, particularly if there are enough uncontaminated sites with long-term records. Maybe someone has addressed these questions in the past and can direct me to the answers. Thanks.

These 500km and 1000km cut-offs probably seem reasonable to inhabitants of Canada or the USA. But they sound slightly bizarre to rather smaller-scale European ears. Just for fun I opened up a map of western Europe on Google. I then cut a couple of circular holes out of a piece of paper with radii of 500km and 1000km.

Assuming that London qualifies as Urban, one would have to replace the London figures with those from rural stations up to 500km distant. That’s all of England up to the border with Scotland. All of Wales. The west coast of Ireland. Most of Belgium, a fair chunk of the Netherlands, and the north of France going about as far south as Nantes.

Now I’m sure there will be no problem finding 20 or so stations within that area. But supposing there aren’t 20 rural stations?

Then we need to consider the stations in the 500 to 1000km range. So that’s all of Scotland, the rest of Ireland, the rest of France. All of Germany, Switzerland, and Denmark. Plus northern Italy, and the western parts of Austria and the Czech Republic. And possibly the south-western tip of Norway.

That is, the GISTEMP temp history for the USA48 agrees very well with a temp history generated using only the best stations without any adjustments.

Steve: As we’ve discussed on many occasions, the GISS methods used for the ROW and the US are different, as are the temperature histories. The issue now, as it was then, is whether the difference in history is a result of the different methods or regionalization. Until you’ve got some results for the ROW, please stop discussing US results as a “vindication” unless you include an explicit caveat that the ROW situation is different, as you are aware. If we can deduce ROW results with a sensible method, then we can sign off on this. Until then, the question is open.

#12 I don’t think the issue is irrelevant. What it means is that a rural station X km away from Urban station A will have as much influence on the adjustment of Urban station A as it will have on the adjustment of a second Urban station B, which is 2X km away. I find that peculiar.

#10 – JoeH. Go to surfacestations.org for a look at an on-going audit of the USHCN stations. Warning: it’s not a pretty sight. The number of “uncontaminated” stations is a small fraction of those surveyed so far.

“The same weighting function is applied. The implication is that if an urban station has no rural neighbors within 500km, but does have neighbors between 500km and 1000km, those rural stations are given the same weight as those rural stations 250km to 500km from other urban stations. It seems odd to me that such an asymmetric weighting would be used.”

Hmm. Suppose you have an urban station X with 2 ‘nearby’ rural stations A and B. A is at 490km distance, B at 510km distance. So, in the basic scenario, A will influence X with a small weight. Suppose now that A becomes dysfunctional (or becomes ‘urban’) for whatever reason. In that case, B will start to influence X with a greater weight than A did previously! My intuition tells me this can’t be right, or am I too dumb to see it?
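The numbers in that scenario can be checked against the post’s 1 – d/range formula (treating chord and arc distance as equal, which is a fair approximation at these scales):

```python
# Numeric check of the scenario above, using the post's 1 - d/range
# weighting; chord vs. arc distance differs negligibly at these scales.
w_A = 1 - 490 / 500   # A at 490 km, found in the inner 500 km search
w_B = 1 - 510 / 1000  # B at 510 km, found only in the extended search
```

So the intuition checks out: once A drops out, B at 510 km carries a weight of about 0.49, roughly 25 times the 0.02 that A carried at 490 km.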

These 500km and 1000km cut-offs probably seem reasonable to inhabitants of Canada or the USA. But they sound slightly bizarre to rather smaller-scale European ears. Just for fun I opened up a map of western Europe on Google. I then cut a couple of circular holes out of a piece of paper with radii of 500km and 1000km.

They don’t seem reasonable at all to me. Even 10km away weather conditions can be very different.

To me, the distance cutoffs only sound reasonable to the extent east-west distance is treated with higher priority than north-south distance, and to the extent there is homogeneous land surface and land use.

Two questions raised by this:

* Am I missing something or is a station 200km west treated the same as a station 200km south? In the (eastern) USA, there’s a north-south gradient of approximately 4 degrees C per 200km N/S (see USDA zone map below).

* Are there data/methods to simply map “surface variability” of the planet, and/or “land use variability”? At the very least, one can look at the map below and understand that the western US and Canada are climatologically quite different from the east– and that even a 100km distance is inhomogeneous for planting let alone climate analysis.

These 500km and 1000km cut-offs probably seem reasonable to inhabitants of Canada or the USA.

In Canada, the city of Ottawa is about 600km away from Toronto. The climates are completely different. Toronto residents say that they will not consider taking a job in Ottawa because the winters are so much longer and colder. Toronto is on one of the Great Lakes but Ottawa is about 200km from the same lake. This may not matter for trends, but the temperature in Toronto is in no way predictive of the temperature in Ottawa.

#19 Actually, Hansen does this to try to “save” urban stations. If there are no rural stations available with enough data to adjust the urban station, then the urban station is dropped from the global analysis.

#21 There is no accounting for direction in the analysis. A circle is drawn around the urban station and any rural stations within the circle are candidates.

Consider what would happen if this technique were applied to Toronto. From the map in #21, Toronto is in Zone 5. Rural stations within 500km would likely be in Zone 3, or at least Zone 4, since Toronto is in an area that is almost completely built up.

What happens to a ‘rural’ station’s data, which are heavily weighted due to proximity, when it ceases to be rural due to urban expansion and is (or should be) reclassified? How often is the USHCN brightness index redetermined?

I think #21 hit the nail on the head here. The fact that the scaling doesn’t take latitude into account seems like a gigantic oversight.

I know it is improbable, but given this system of scaling, a station in Toronto Canada could get adjusted from as far north as the shores of the frigid Hudson Bay to as far south as balmy Charlotte North Carolina.

North America is probably not a place where the wider reach would be seen… but what about southern South America?

In South America you could see a severe North/South adjustment bias, as no stations even exist East/West.

I don’t get it. The use of 250-1000km distance figures with a presumption of relatedness or unrelatedness over that distance seems like a stretch. Don’t we have other reliability/bias measures that a layman can understand, like:

* Percent of area within 100 meters covered with concrete/asphalt/buildings?

* Percent of area within [some pertinent distance] undeveloped/undisturbed?
* Number of meters from nearest building?

And wouldn’t a 500 km distance measure north and south be more likely to produce a natural temperature difference than east-west? So why use radius?

What are the odds that the parameters chosen for this process just happen to wind up giving more weight to stations that show warming more than stations that don’t? Maybe somebody should wager a new Nick Bauer SupremeOne90 Hockey Stick on that, eh?

I did some analysis using only the best rural stations (as identified by SurfaceStations.org) last fall. For the USA48, the agreement between GISTEMP and my temp history is very good.

Remind us again of how many CRN12 rural stations were used in your comparison and the statistical significance of that comparison. We should also be aware of the high noise levels within a group of stations such as CRN12 and CRN12R and the need to do these comparisons using larger numbers of CRN rated stations.

And wouldn’t a 500 km distance measure north and south be more likely to produce a natural temperature difference than east-west? So why use radius?

I think in the US coastal areas (east and west coasts) east/west distance can give you as much if not more variation than north/south. And where you have mountains, one side usually has drastic climatic difference from the other.

#31. Rural stations would make sense. However, after the early 1990s, the data sets consist almost entirely of MCDW sources, which seem to be nearly all airports, many in urban settings.

Also the US situation is quite different from the ROW, as there are many rural stations in the US. At this point, there is negligible verifiable information on the quality of ROW stations. Most of the Chinese stations appear to be very urban. And the tests in Jones et al 1990 purporting to show urban immateriality have been shown to be flawed to the point where Doug Keenan filed a misconduct complaint against one of the authors of Jones et al 1990.

Is there a precedent for using what is essentially one set of data to modify another set of data?

For instance, at one time I was involved in University research including emissions testing. We were testing the effects of biodiesel blends at varying rates in different engines. We would have never used the data from a Detroit Series 60 to modify the data from a Cat 3406, but this seems to be exactly what this process does. After we were done testing, you could look at the results of a series of tests, and compute the means and all that, but if you had data that was distinct it would stay distinct. If we knew we had data that was damaged, by say a bad lens on an opacity meter that wasn’t caught during testing, that data would be discarded; we wouldn’t try to substitute the results from another engine. This appears to be what these homogeneity adjustments do. They take data that they know is contaminated, and then adjust it with other data that may or may not be contaminated. I don’t see how this is good science or good statistical procedure. I believe they do it because more stations appear to give more weight to the numbers, but I believe the procedure is highly misleading.

Now I think we all know what the net effect of that would be, don’t we? No ‘unprecedented’ warming in the latter part of the 20th century. And if that is the case (which it is) then we don’t have any need to ACT NOW in order to save the planet. With no significant warming trend the GHG hypothesis is essentially falsified, as there is no way to justify a 1.5 to 4.5 C rise (and remember that’s the low end of the GCM prediction range) without invoking completely ludicrous positive feedback mechanisms and ‘tipping points’.

If you can’t show a 0.6 to 0.7 C rise over the last century due to man-caused global warming from CO2 emissions rising from a pre-industrial 280 ppm to 380 ppm (when the actual number based on truly rural-only stations and satellite data turns out to be more like 0.2 C), what chance have you got of justifying 3C over the next century? This would rather expose the scam of positive water vapour feedback in the GCMs, wouldn’t it?

The late John Daly recommended that quite some time ago. He also recommended that temperature data from compromised sites such as airports and marine ports, or any site where human activity might bias and compromise the temperature data, not be used.

His six-point criteria for selecting stations for analysis of temperature records are (1) remoteness, (2) a physically well-maintained site, (3) a long, continuous station record, (4) compliance with WMO standards and protocols, (5) meticulous record keeping, and (6) a permanent site location. There are very few of these sites left.

Go to GISTEMP and check the plot for the Parowan Power Plant. Why did the plot stop at 2000? Probably because it is still a flat line with no trend.

Why is this type of plot important? In this arid region there is little or no greenhouse effect due to water vapor and clouds. If CO2 contributes to warming of the air, then we would anticipate a slight but discernible increase in the mean annual temperature. No such increase is observed. Therefore the hypothesis that CO2 causes warming of the air is falsified. Ditto for Death Valley and Alice Springs, until the ABM moved the ref station to the airport.

Since 45 of the 100 ref stations for the ABM data base are now located at airports, the data is no longer useful. This is one sneaky method to manipulate the data base so that the country’s temperature trend will rise faster than it should have. Actually, weather stations are for the benefit of people and commerce, not scientists.

RE#40 The Antarctic is the best desert to look at CO2-based photon recycling (or the greenhouse effect). If CO2 does increase the rate at which IR radiation is reflected from the atmosphere back to the ground, you should see it there. It has a lovely time series and no urban heat effect.
There has been no change in the rate it cools in winter, nor in the rate it warms in summer. That is why it is ignored, along with the North American ‘hot’ deserts.

“Has anyone compiled a list of these “uncontaminated” sites to see if the distribution and quantity of these sites would in fact be representative of the temperature history the U.S. or worldwide for that matter?”

So I answered him. I did not use the word “vindication”. I stuck to the question he asked and restricted my answer to the USA48 (twice). Do you disagree with my answer for the USA48?

Not only is the difference in latitude in observing sites significant, but am I to understand that the urban/rural temperature comparisons are also elevation independent?

For example, take Boston MA and Worcester MA located about 50km to the west of Boston. Both could be characterized as urban stations. The temperature difference between Worcester and Boston under the predominantly climatological west wind conditions is about 2.5 to 3.0°C. with Worcester being cooler than Boston. The reason for this is that Worcester Airport is at an elevation of 304m while Boston Logan Airport is 6m.

This relationship is well known to operational forecasters but seems to elude academics, who appear to be more interested in arguing over how many angels can fit on the head of a pin rather than what the real temperatures are and how accurate they are within the capabilities of the instrumentation.

Apparently it is some sort of violation of a bizarre gentleman’s agreement that the science community is loathe to even suggest that some people working on climate science, especially involving global temperatures, may be agenda driven rather than involved in an honest pursuit of the truth.

For example, until now no one on this site has provided an explanation for the USHCNv1 urban correction to the NYC Central Park monthly and annual temperature data, which was altered, or adjusted if you prefer, downward from the raw data by more than 6°F in the 1960-1990 period. This adjustment was reduced incrementally after 1990, and, what do you know, although plots from the raw data were nearly isothermal during this period, the adjusted “data” suddenly acquired a strong rising trend. Is no one interested in belling the cat?

I’m not talking about Steve who has made every effort to remain neutral and above the fray which is befitting of an auditor, but surely there are some others out there that might have ideas which would illuminate the reasons for the current mess we’re in regarding temperature data.

#43 aurbo:
The adjustment is made to the trends, not to the absolute temperatures. For that reason the elevation should have little effect. That is, if there is an underlying long-term trend in Boston there should be a similar underlying long-term trend in Worcester.

JohnV #44, even at a trend level, there’s plenty of evidence that temperature can’t be assumed to be uniform in X, Y or Z. See the altitude trend maps. See the adjustments to USDA zone maps over time. They’re subtle but there to be seen. Here’s 1960 vs current:

RE #16 and #17
If I’m reading the code correctly, the weights come into play only when there are multiple rural stations. The objective seems to be to produce a pseudo rural record that is a distance weighted average of the available rural records within the range being considered. Nothing is done to the urban record until later when that piecewise linear function is used to adjust urban trends. If there is only one rural record it has a weight of 1 in the pseudo record. So in the case of the example in #16, station A has a weight of 1. If it goes away, station B has a weight of one.

JohnV we should actually revisit the list of “best stations”. at the time you did the study we looked at stations that were listed as “R” which we thought meant rural. but in the US, “rural” is determined by nightlights. just a nit,
but it is what it is.

That said, we also have the issue that Kenneth raises about significance. I think the effect size we are looking for (i.e., the difference in trends) is small (we’ve agreed on this before) and you won’t see it with a small sample of CRN12R. Hopefully more stations will get surveyed and there will be more data.

#49 When the single rural station is put into what you called the combined series, the rural station has a distance weight, but it’s also divided by that weight. So it has a net weight of 1 in the combined record. Isn’t this what you described above?

#48 steven mosher:
We should *eventually* look at re-analyzing with more stations, but this is probably not the time or place. IIRC, the majority of my analyses were done with rural stations that were approved by this community — they were truly rural.

In looking at it again, I think you are right, but not for the reason you mention. It looks like the subroutine that combines stations is called only if two or more stations exist to be combined. If only one station exists, the routine is not called, and the weights are not applied anywhere but in that routine. I don’t know if it ever happens in reality, but it is a nice catch.

We should *eventually* look at re-analyzing with more stations, but this is probably not the time or place. IIRC, the majority of my analyses were done with rural stations that were approved by this community — they were truly rural.

John, in the meantime when you reiterate your comparison test (CRN12R versus GISS) here at CA you should be careful, in my mind at least, to indicate that the result you show was without determining statistical significance. I think you are aware of the large variation in temperature trends for the USHCN stations, even for those within shouting distance. When one attempts to compare two groups where one has a large variation and the difference being tested is expected to be small, a large red flag should appear somewhere.

Do I recall correctly that you made geographic adjustments in your comparison? With small samples this adjustment would be essential.

RE 51. Kristin picked them. I’m suggesting that we double check them against the nightlights index, which nobody understood at that point in time.

Nothing more, no criticism intended, just a quick double check. So post the list of stations, we can do a quick double check, and put the issue to bed. You’re an engineer, you know how this stuff works: the tedious, crappy part of the job.

re # 55 mosher, Thank you for the kind words. I have spent so much of my life writing and trying to understand FORTRAN programs that I got interested in this GISS stuff when you and John Goetz raised similar issues a few months ago. I’m not sure I understand more or less than the rest of you. It’s a mess, and it’s not because it’s written in FORTRAN. I think it grew piecemeal like so many other programs from its generation. And it’s hell.

Looking at surfacestations.org, and focusing on the stations with 4 or 5 siting ratings, it’s pretty easy to see that the trends at these stations are almost uniformly positive and out-of-sync with nearby (not 500-1000 klicks away) rural stations. Anthony has posted lots of comparisons of pairs of this type. It therefore seems out of line with all common mathematical sense that the 1-2 rated stations would give the same trend as the 4-5 rated stations for US temperature. I suspect that, as was suggested above, some non-rural stations are included.

Also, when Michaels and McKitrick studied correlation of temperature rise worldwide with economic/development factors, they concluded that 1/3 to 1/2 of the worldwide temperature rise over the last few decades was due to those factors (which are proxies for urbanization). Again, I find it hard to understand why the 1-2 sites wouldn’t show that different trend given their results.

Another thing I have been chewing on.
Suppose you have an urban station X with a ‘nearby’ rural station A at 510km. I don’t have a calculator at hand, but the weight of A should be around 0.7 (right?).
Now: add a second rural station B at 1 meter from A. Obviously, A’s and B’s temperatures will be very similar, but their combined weight on X will now be 1.4! Put 5 or 10 more rural stations at the same spot and X’s temperature chart will be totally screwed by the rural stations.

I think that, in the end, that is what GISS does: it overweights rural areas with a large concentration of stations. If you find out that the concentration of stations is e.g. higher above 60°N (for the sake of the example), and we know that the northern part of the earth has warmed more, that would explain the faster warming of GISS. If that’s true, GISS will also cool faster if global cooling comes around. I have also read some data (sorry, no source) about autocorrelation of GISS data being lower, so that fits in my theory of GISS overweighting certain areas.

Bruno #58, Those 3 lines of code produce a combined rural record that is a weighted average of the individual rural records within the distance range. The weights are proportional to the inverse distance from the urban site and THE WEIGHTS SUM TO 1. They don’t grow absolutely. So if there are only 2 rural stations within range and they are both the same distance from the urban station, they each get a weight of 1/2 in the combined rural record.
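The normalization described above can be sketched as follows. This is a minimal illustration, not the actual PApars.f code; the linear weight function w = 1 - d/R is my assumption for illustration, and the key point is only that the weights are normalized to sum to 1:

```python
def combine_rural(temps, distances, radius_km=1000.0):
    """Weighted average of rural station temperatures.

    Raw weights fall off with distance (assumed here: w = 1 - d/R)
    and are normalized so they sum to 1. Because of the
    normalization, adding a duplicate station at the same spot does
    NOT double its influence on the combined record.
    """
    raw = [1.0 - d / radius_km for d in distances]
    total = sum(raw)
    weights = [w / total for w in raw]   # normalize: sum(weights) == 1
    return sum(w * t for w, t in zip(weights, temps))

# Two stations at the same distance each get weight 1/2:
print(combine_rural([10.0, 12.0], [300.0, 300.0]))   # 11.0

# Adding a duplicate of station B only shifts the average toward B
# by re-normalizing the weights; the weights never "grow" past 1:
print(combine_rural([10.0, 12.0, 12.0], [300.0, 300.0, 300.0]))
```

So in Bruno’s example, station B at 1 meter from A would halve A’s weight rather than stacking on top of it.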

I do plan to re-visit the analysis but I do not want the hassle of doing it here (at least not right now). The list of stations is available in my source data in the link I posted above. You can also browse the directory for other analyses.

Speaking for the USA48 only (and acknowledging that the rest of the world could be different):

1. GISTEMP agrees well with the temperature history from the best rural stations;
2. GISTEMP does not agree as well with the temperature history from the worst rural stations;
3. GISTEMP does not agree as well with the temperature history from un-adjusted urban stations;
4. There is a real difference between the best and worst stations, but it is not as large as some suggest;

My source data and results are still available online.
If you dispute any of these points, show me where you think I went wrong and we can improve the analysis.

John V in Post #60 you restated the CRN12 versus GISS comparison without any acknowledgement of the small CRN12 sample size and the large within group variations for temperature trends. Could you link me to these data in a form that I can download to Excel so that I might look at the statistical significance of the comparison?

Try the zip files starting with “_20070919” and “_20071018”. I don’t have time to provide a lot of support right now, but if you can be patient I do hope to re-visit these analyses soon (for sufficiently large values of soon). In the meantime, I will refrain from posting claims if you refrain from posting challenges. Deal?

John V, I have the IDs for the 24 stations that Kristen submitted and the 17 CRN12R that you ended up using in your analysis. I will look at USHCN data for trends with these stations using the Urban series and comparing it to the other stations in that data set.

I will be looking at the variability of the trends in CRN12R and the possibility of improved capability of determining statistical significance using a larger sample size.

John V, if you do not issue a CRN12R vs GISS proclamation without disclaimers, what is there for me to challenge? Therefore it is a deal, by default.

John Goetz, I’ve been bothered that the GISS process will use rural stations within 500 km or 500-1000 km, but never both. I think there are conditions when it will use both ranges. The GISS program first tries 500 km and, if it doesn’t find any, it then looks at a 1000 km range. So up to this point you are right, it’s either/or. But when they get to fitting that 2-piece linear equation, they require at least 3 rural records in the combined rural record. And they can have, for example, the case where there are 2 rural records within 500 km, and they haven’t checked the 500 to 1000 km range. In that case, they will then check the 500-1000 km range and add in those records to the combined record. I think this is what is going on immediately below statement 195. In any case, with the caveat that I can’t be totally sure from this mess of code, maybe you want to look at it too.

Edit to # 68, I was not precise about the actual procedure, but the conclusion is correct. The program first tries 500 km. If it finds no rural stations within 500 km, it looks over all rural stations again for those within 1000 km. When these steps are done, it tries to fit the 2 piece line function. But if it has 1 or 2 stations at this point and they come from the 500km range, this means it has not checked the 1000 km range yet. So it then does so.

John Goetz, I’ve been a bit careless in these previous 2 posts because I didn’t fully understand the criteria for calling the fitting program. Please bear with me.

There are 2 places in the code where the 1000 km range is tried. The first is when there are no rural stations within 500 km. The second case (below statement 195) is when rural stations within 500 km have been found but their combined record is considered insufficient for fitting the broken line. In this latter case, when the 1000 km pass is run, it can pick up stations in the 500 km range and stations in the 500-1000 km range.
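The control flow being described can be sketched roughly like this. Function names and data shapes here are mine, not PApars.f’s, and this is only my reading of the logic discussed above:

```python
def select_rural(urban_xy, rural, ncrit=20):
    """Sketch of the two-pass radius search (my interpretation).

    Each rural station is a dict with "x", "y" coordinates (km)
    and a set of "years" with valid data.
    """
    def dist(s):
        dx, dy = s["x"] - urban_xy[0], s["y"] - urban_xy[1]
        return (dx * dx + dy * dy) ** 0.5

    def record_years(stations):
        # Stand-in for the length of the combined record.
        return len(set().union(*(s["years"] for s in stations))) if stations else 0

    near = [s for s in rural if dist(s) <= 500]
    # If nothing is within 500 km, fall back to 1000 km right away:
    chosen = near if near else [s for s in rural if dist(s) <= 1000]
    if near and record_years(chosen) < ncrit:
        # Combined record too short for the broken-line fit:
        # redo over 1000 km, which now picks up BOTH the 0-500 km
        # stations and the 500-1000 km ones.
        chosen = [s for s in rural if dist(s) <= 1000]
    return chosen
```

The second branch is the case below statement 195: the 500 km stations were found but were insufficient, so the wider pass mixes both ranges.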

I believe you are correct. What I see in the code is that if the combined series is less than 20 years in length (NCRIT), then it is discarded, and the analysis is completely redone across all stations within a 1000 km radius in the hope of building a combined series that is greater than 20 years in length.

#71 John, They check the variable N3 against NCRIT which is set to 20 in a parameter statement. N3 is incremented by 1 when IWT(IY) is >= 3. IWT comes from COMBIN and is, I think, a count of the number of stations with a valid data point for period IY. So it looks like they want the combined record to have at least 20 periods in which at least 3 rural stations have data. Or something like that. But I’m puzzled because according to Hansen’s report cited by Steve Mc, there must be 3 rural records for at least 2/3 of the time period being adjusted. That’s not quite what they do when they resort to checking 1000 km.
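If that reading of IWT is right, the test amounts to something like the following sketch (the meaning of IWT here is the guess stated above, not a certainty):

```python
NCRIT = 20  # per the parameter statement in PApars

def record_is_sufficient(iwt):
    """iwt[i] = number of rural stations with valid data in
    period i (my guess at IWT's meaning, per the comment above).

    The combined record passes if at least NCRIT periods have
    data from 3 or more rural stations.
    """
    n3 = sum(1 for count in iwt if count >= 3)
    return n3 >= NCRIT
```

Note this is a count of qualifying periods, not a 2/3-of-the-span criterion, which is why it doesn’t obviously match the description in Hansen’s report.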

John Goetz, Once the program has tried the 1000 km range, the urban station will be dropped if there are fewer than 20 periods with at least 3 rural stations. If 20 or more such periods are found, then it uses the 2/3 criterion to decide whether to “drop early years.” But when it does this it goes back to statement 191 and still tries to call the fitting subroutine. I don’t know what drop early years means.

The way I interpreted the 2/3 rule in the code was that if the number of years with valid temperatures was less than 2/3 the total years spanned, then early years were dropped and the 20-period test was done again. For example, if the combined record starts in 1891 and ends in 1990, but the years 1901 to 1940 are missing, then the early years of 1891 to 1900 are dropped, and 1941 to 1990 are used to do the adjustment.
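One way to read that drop-early-years step is the following sketch (my interpretation of the rule, not the actual FORTRAN):

```python
def drop_early_years(years_with_data):
    """Sketch of the 2/3 rule as interpreted above: while fewer
    than 2/3 of the years in the record's span have valid data,
    drop the earliest year and retest.
    """
    yrs = sorted(years_with_data)
    while yrs:
        span = yrs[-1] - yrs[0] + 1
        if len(yrs) * 3 >= span * 2:   # valid years >= 2/3 of span
            return yrs
        yrs = yrs[1:]                  # drop the earliest year
    return yrs
```

Run against the example above (1891-1990 with 1901-1940 missing), this drops 1891-1900 and keeps 1941-1990, matching the interpretation.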

I don’t think it’s quite accurate to say that the preliminary steps don’t do anything meaningful to the data. Specifically, the splitting of the data into six files processed separately surely means that for stations near to the edge of a zone, one or more of the nearest rural neighbours could actually be in the adjacent zone, and will not be found by this method.

The splitting of the data may have been a necessity in the dark ages (papars.f appears to be written in Fortran IV, which I thought had long vanished from the earth) but there’s really no excuse with modern computers for not doing the calculations properly.

On a related note, I like the cavalier way they add 0.8 degrees to part of the Lihue record in step 1. Obviously something had changed at the site, but one feels they could have made a bit more of an effort than this to find out what, and to argue that this was the best way of dealing with it (rather than, for example, discard that data).

Just to let John V know that I am not stalling on my promise to analyze CRN12R versus USHCN Urban temperature trends, I decided to produce and post my analysis in stages.

I have compared the 1920-2005 temperature trends for the CRN12R stations against the USHCN stations using the USHCN Urban Calculated Mean temperature series. I used only those stations that had a complete set of data for the period 1920-2005. For the entire USHCN station population this meant eliminating 183 of the 1221 stations and for the CRN12R eliminating 2 of 17 stations. The 2 CRN12R stations eliminated appeared to have approximately the same trends as the other 15 stations.

I calculated the trends for the stations using annual mean temperatures for the period 1920-2005 and then averaged them for CRN12R and USHCN. I also calculated the standard deviations for both groups in order to look at statistically significant differences.

The average of the trends for the USHCN stations = 0.40 degrees C per century.

The average of the trends for the CRN12R stations = 0.52 degrees C per century.

The difference was not close to being statistically significant, but one must note that the CRN12R trends were higher in this comparison than the USHCN stations.

I then did a multiple regression of the USHCN station trends for 1920-2005 using station altitude, latitude and longitude as independent variables. The regression fit showed statistical significance for all variables. I then used the regression equation to compute a predicted trend for the CRN12R and USHCN stations using the average station altitude, latitude and longitude for each group of stations. The predicted trends were:

Predicted trend for the USHCN stations = 0.37 degrees C per century.

Predicted trend for the CRN12R stations = 0.58 degrees C per century.

Comparing the regression prediction for USHCN and CRN12R stations shows that the latitude, longitude and altitude differences between USHCN and CRN12R stations can account for all and more of the actual difference in temperature trend.
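The kind of geographic regression and group prediction described above can be sketched with synthetic data. This is only an illustration of the method (numpy’s least-squares routine stands in for whatever Excel procedure was actually used, and the data are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
lat = rng.uniform(30, 48, n)
lon = rng.uniform(-120, -70, n)
alt = rng.uniform(0, 2000, n)
# Synthetic trends (degC/century) with a built-in latitude effect:
trend = 0.3 + 0.01 * (lat - 39) + rng.normal(0, 0.2, n)

# Fit trend ~ const + lat + lon + alt by ordinary least squares.
X = np.column_stack([np.ones(n), lat, lon, alt])
beta, *_ = np.linalg.lstsq(X, trend, rcond=None)

# Predicted trend for a group of stations, evaluated at the
# group's AVERAGE latitude, longitude and altitude, as in the
# CRN12R-vs-USHCN comparison above:
group = np.array([1.0, lat.mean(), lon.mean(), alt.mean()])
predicted = group @ beta
```

Comparing the two groups’ predictions then shows how much of the raw trend difference is attributable to geography alone.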

In future posts on these and related analyses I want to present all the background data used in the above calculations and do a similar comparison of the CRN123 and CRN45 stations with the latest Watts and team quality designations.

In this post I will list the results of two regressions of the 1920-2005 temperature anomaly trends using the USHCN Urban Calculated Mean data series for stations with complete data. I did multiple regressions of the trends (in degrees centigrade per century) for the CRN1 through CRN5 stations against the station latitude, longitude, altitude and CRN rating.

In the first regression (Reg1), the CRN123 stations were assigned a value of 1 and the CRN45 stations a value of 0. In the second regression (Reg2), the CRN1 stations were assigned a value of 1, the CRN2 stations a value of 2, and so on up to the CRN5 stations, assigned a value of 5.

In the table immediately below are listed all the pertinent results for both regressions.

In the second table immediately below, the multiple regression equations were used to generate predicted trends from the results above and the listed values in the table for latitude, longitude, altitude and CRN rating. The same table shows the predicted trend components and the sum of the components for the various CRN combinations. In the final column the actual average trends for the CRN combinations are listed for comparison with the predicted values.

In the third and final table below, the standard deviations, number of stations and probability (p) for a statistically significant difference between groups are listed along with the means and the adjusted means. The means were adjusted using the contributions from the regression equation for latitude, longitude and altitude (see the table above for subtotals and note the differences for the CRN groups). I used altitude even though it does not appear to be a statistically significant factor in the regression equation.

In conclusion it is apparent: (1) that the adjusted mean trends follow a rather logical progression of increase with the CRN designation, (2) that those differences are more readily shown using a multiple regression, (3) that the small sample sizes for some of the CRN groups do not allow a good statistical comparison, and (4) that the agreement of the predicted CRN trends from the regression model with actual trends is reasonably good and indicates that the model captures an estimate of the relationship of CRN rating with temperature anomaly trends.

I plan to use this approach for more detailed analyses of the CRN rated stations. In the meantime, this analysis strongly suggests that the Watts team station quality checks and ratings have meaning vis-à-vis temperature trends.

I was a bit surprised that my previous post on this subject matter did not receive any responses, and particularly so without a reply from John V. It would be a great help to me if I could provoke some comments on the methods that I have used to compare the CRN rating combinations. My previous post in this thread was rather long and detailed and, therefore, I plan to keep this post short and in summary mode.

Below is a table giving the anomaly temperature trends for the period 1920-2005 using the USHCN Urban data set for calculated mean temperatures and CRN ratings from the Watts team. I have used this time period in previous analysis since extending back in time beyond this point produces a large increase in percentage of stations with missing data. For this analysis I used only stations with a complete set of data 1920-2005. I also plotted the temperature trends in the graph below versus the CRN ratings 1 through 5.

What one sees from these temperature trends versus CRN ratings is certainly qualitatively what one would expect from what the Watts team has indicated and recorded in their evaluations. The difference between the best rating, CRN1, and the worst, CRN5, is certainly significant and relatively large compared to the overall trends for this time period.

The trends were adjusted using the regression values obtained from the multiple regression of station trend versus the station’s latitude, longitude, altitude and CRN rating. Note that the corrections were largest for the CRN5 and CRN2 stations. With the correction, the CRN5 trend remains significantly larger than any of the other stations, while the correction for the CRN2 stations brings them better in line with their rating by the Watts team, even though they seem not to differ from the trend of the CRN3 rated stations.

Using a correction for station latitude, longitude and altitude also helps explain the result that John V obtained in comparing the CRN12R with the GISS stations trends. As it turns out on further analysis, one sees that of the 17 stations used by John V, 15 were CRN2 rated and only 2 were CRN1 rated. In my previous analysis comparing CRN12R with the remainder of the USHCN stations, I used only stations with complete data and thus the CRN12R yielded 13 CRN2 stations and 2 CRN1 stations. The adjustment for CRN2 stations is large and it appears to have an anomalously larger trend for its CRN rating.

In conclusion, I think these results, if they pass muster here, should be expanded on or redone by a Watts team member and perhaps published as a means to show what their efforts have revealed. I can provide any background data that might be required. I could understand anyone wanting to wait for a complete or nearly complete set of ratings data.

The results presented here seem to me to shout out that micro site quality is a major issue in temperature trend measurement and that perhaps this analysis could get some attention to just how important the effort made by the Watts team is in this regard.

Interesting and detailed work. I have resisted doing analysis until I had at least 50% of the network surveyed and better spatial distribution. I have many surveyed stations that need to be rated and put into the database; they have been backlogged for about a month. I expect to add about 40.

I’d also be interested for that same data from 1979 – present for a couple of reasons.

First, it can be compared with satellite records. Second, it’s near the nadir of the post WWII cooling, just after the PDO flipped to warm. Third, it was during the 1980s that so many of the stations became CRN4 and 5 violators in the first place (thanks to the infamous “cabling issue”).

Well done Kenneth. Now I would like to see Lucia include the raw unadjusted data from the CRN1 stations in her statistical tests of the probability of the IPCC forecasts. Maybe you could link your calculations and results to Watts database, and we can see the further stations coming in?

I think the IPCC forecasts will be mathematically improbable. Also, it may be that there has been no material recent warming in the US, the most accurately surveyed area in the world.

kenneth, i had a thought. remember peterson’s claim that urban stations were located in cool parks? this notion essentially argues that microsite exposure trumps uhi. that now seems testable. which is the better predictor of trend: urbanity (population) or CRN rating?

I am hesitant to comment due to the obvious conclusion, assuming that this analysis will hold, which is the one Mosher stated in #87. But I would define it differently. The conclusion would be that the homogenization by GISS ensures that “land use”, since siting in a way is replacing a pristine site with human-influenced changes at a site, is the predominant signal whether there is any AGW or not. The bias could be substantial, and the conclusion that the human signal has been separated from the unforced climate signal will have to be re-evaluated. The signal will have to be split into at least 3 parts: AGW, UHI, and “land use”. If it holds up, it will most likely mean that “the science is NOT settled” and much of the AR4 will have to be rewritten, if the UHI and “land use” effects are substantial.

I would like to comment after more data and analysis is done. It does give some support or at least appearance of support to the “cool park” syndrome. In other words, the worst of both worlds…cooling at the best sites, heat contamination at others, some with both.

I agree with #84. I think this analysis has to be done with the satellite data to see whether both cooling and warming biases have been introduced or not, and whether they can be separated.

For CRN12R, I used those stations so designated by John V. John V came up with 17 stations using a list of 20 or so stations that were designated rural by, I believe, Kristen. I used 15 of the 17 that had a complete set of data going back to 1920 for the USHCN Urban data set. The 17 John V used and the 15 I used had only 2 CRN1 stations.

Steven, I thought you were aware of how John V arrived at his CRN12R stations. I was not so interested in the basis for the CRN12R stations as I was to determine whether other factors were influencing John V’s comparison.

Anthony, I just wanted to be sure that you and your team were aware of this analysis and was hopeful that you could pursue analyses of your own either using similar methodology as I did or that of your own. When you have reached a completion point in your station evaluations, I was also hoping that you or a team member would summarize and analyze the results for a publication.

I was motivated by John V’s jumping in with preliminary analyses that appeared to minimize the impact of your station ratings on temperature trends, with what I thought were very small sample sizes and samples with large standard deviations. My most recent results showed differences with CRN ratings, the size of which I had not anticipated. That is why I was hoping for comments on the analysis and perhaps someone showing me weaknesses in the methods used.

If anyone here wants the data I used for this analysis I can email it to them and point to the original references used.

re 89. ya, I think we need to recheck that. Given what I know about nightlights now, I’m just being cautious. That said, I still think it an interesting question to regress on CRN and then regress on pop.

re 89. one other reason to drop altitude from the regression is that the station may have gone through altitude changes during its history. USHCN would have adjusted the temp to account for this effect.

#90. Kenneth. Can I take up your kind offer of email data to mark.rostron at bethere.co.uk

Many thanks.

I will put together the information for you and send it some time today. If you have any questions or need further information/details do not hesitate to email your queries to me. The data are in Excel SSs.

I want to be clear that when I regressed altitude, latitude and longitude of the USHCN stations against anomaly temperature trends from 1920-2005 that all were found to be significant. When I included the CRN station rating with these three independent variables in a multiple regression, I found that latitude, longitude and CRN ratings were significant but that altitude was not. I am not sure how to interpret that finding.

Steven, when you asked about including population in a regression I was hesitant to do it in light of what appears to be a strong micro site signal in the station trends and how to attribute a population to the individual USHCN station. Based on my past findings, I would have to add population as an independent variable to the other significant independent variables of latitude, longitude and CRN rating.

It would be easy to include population if you can lead me to a method of attributing population to a USHCN station. If Watts has not included the information in his spreadsheets then I do not know about it.

By the way, the complete set of data that Watts has placed in one spread sheet makes these analyses of the USHCN stations much, much easier than would be the case if one had to go hunt it up in separate places and for that I want to express my appreciation.

RE95 Thanks for the recognition Kenneth. I had originally thought that maybe the best way was to spread each data type into individual files, using either a big-endian or little-endian format, along with small snippets of code that process each file, then submit the results into intermediary files. And then those dozens of files get acted upon by a larger program which spits the results out into a new data file which could then be interpreted.

Then I thought to myself “hey, other people might want to examine this, replication will be important”. So I opted for a real-world solution that does require an archaeological compiler and fluency in today’s programming equivalent of Latin. ;-)

Using the same methods as previously stated for doing a multiple regression with the temperature anomaly trend for 1920-2005 as the dependent variable, I added population as a fifth independent variable to latitude, longitude, altitude and CRN rating. I used the population rating from Watts’ SS for population, with R = 1, S = 2 and U = 3 (AW has other population measures in his comprehensive SS, but this seemed a good place to start). The results are listed below.

In this rendition, the multiple regression R^2 increases to 0.26 and all of the independent variables can be considered statistically significant (P-Value less than 0.05).

I’ll need to take another look at adjusting the CRN ratings versus temperature anomaly trends with population added to latitude, longitude and altitude.

Note that I did not take the time to convert the trend data from degrees Fahrenheit per year to degrees Centigrade per century. To do this merely multiply the coefficients listed below by 500/9.
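The factor of 500/9 comes from combining °F to °C (a factor of 5/9) with per-year to per-century (a factor of 100):

```python
def f_per_year_to_c_per_century(coef):
    # (5/9 degC per degF) * (100 years per century) = 500/9
    return coef * 500.0 / 9.0

# e.g. a trend of 0.009 degF/year is about 0.5 degC/century
```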

re 97. the altitude adjustment would happen in SHAP. perhaps use your good graces to get SHAP from them. It would probably be a simple lapse-rate adjustment. Not sure if Menne’s work supersedes this or incorporates it.

#99. It’s always worth foraging through http://www.climateaudit.org/data for collations of data. I’m sorry that it’s not conveniently linked through a web page as there’s only so much I can do, but the directories are not blocked and the indexing tends to be “plausible”. For example …data/giss is a good place to look for giss data. I tend to call information files either “info” or “details” and in this case there is a giss.info.dat and giss.info.tab. As always, I encourage people to suck it up and use R. Again there’s a little trick

download.file(url, "temp.tab", mode="wb"); load("temp.tab")

or you can read the ASCII *.dat file, which I ALWAYS write tab-separated (and which thus imports into Excel, if you must).

re 102 ya I found it. I recall that earlier you did a comprehensive test of all the stations, figuring the trend for each and ending up with a pretty contour plot. Since Kenneth likes the regression approach to this issue, I was wondering what your opinion would be of taking the metadata, including CRN, and doing a stepwise regression to predict trend?

I have used the ipop column from Watts’ comprehensive Excel SS and converted the R,S and U designations to 1,2 and 3, respectively, for doing a multiple regression, as noted above. It is worth noting that on using the Urban adjusted USHCN data set for the trend regression against the 5 independent variables (see above), the independent variable for population was statistically very significant. The R^2 value for the regression improved from 0.23 to 0.26 when adding population as an independent variable.

Also, the adjustment that USHCN might make for the absolute change in temperature when a station elevation changes would not much affect my regressing all the qualifying USHCN station altitudes against the temperature anomaly trend from 1920-2005. What I am looking for in these regressions is what effect altitude has on the trend, i.e. whether the trend is greater at higher station elevations.

In a separate post I want to show the adjusted CRN 1 through 5 station trends using all the independent variables latitude, longitude, altitude and population.

Re: #102

or you can read the ASCII *.dat file, which I ALWAYS write tab-separated (and which thus imports into Excel, if you must).

That casual, but stinging “if you must” came across with more effect than a lecture. I admit to having an Excel habit that I should attempt to break – if I am going to go any deeper into these types of analyses.

#104. I’m merely reporting from personal experience. One’s productivity goes up 1000-100000% by forgetting about Excel. It’s a useful tool, but not for handling bulk statistics like we do here. Plus I love the ability of R to directly access URLs.

Another important point for R: I can post up a script that anyone can run and get the same results, seeing exactly what I did and doing any variations that they might want. When people post up interesting analyses using Excel, mostly I find them far less useful than they should be for the work involved, because I can’t do the run for myself and confirm the result and see what people did. So the ground is far more likely to be fallow.

So it appears I did not avoid the lecture completely. My problem is that I think I use the full capabilities of Excel (pivot tables come to mind) and that what I have done here in these analyses did not take much effort.

Having said that I know if I used R, I could write a few simple statements and R would do that work even more efficiently than Excel. Your last point about posting the script for R and using it to readily replicate other posters’ work is well taken and an aspect of R I had not thought through before.

Below I list the multiple regression equation for the trend anomaly 1920-2005. From it and the average of the independent variables for the USHCN stations with CRN ratings of 1,2,3,4 and 5, I can construct the components (COMP in the table below) of regression predicted trends for each CRN rating. Further I can then use the differences in Sub Totals of the population, latitude, longitude and altitude components to adjust the CRN actual trends for these 4 independent variables. The actual and adjusted trends versus CRN rating are shown in the graph below.

Adjusting the CRN trends with population included as an independent variable in the regression improves the expected progression of trends with CRN rating over the adjustment made without population as a factor.

Dang, Kenneth! One thing that bugged me when I did the Opentemp stuff was that I got results for CRN5 that made sense, but the 2, 3 and 4 made no sense. Essentially, by correcting for lat and lon and pop, you demonstrate the effect of CRN on trend.

Ken. I believe that the temp data you are using is USHCN after Urban Heat Island effect adjustment. Is it possible to show the anomaly for CRN1 with USHCN before UHI adjustment? If the CRN1 stations are good they shouldn’t need any UHI adjusting. I looked at your workings and don’t have the know-how to reconstitute the USHCN tables into a single flat file, as you have them. I think you are very close to showing that all the recorded temperature increase for the USA is an artifact of poor data collection. If that is true, the rest of the world data is even more likely to be flawed.

Kenneth, I would like to offer some suggestions for improvements in the analysis that you have done. Two of the variables in your regression are categorical, not numeric: CRNRate and PopRate. However, in your regression, they have been treated as numeric. In doing so, there is an implicit (but very real) assumption that a change of one unit in these variables has the same effect on the response variable Trend, regardless of which specific change it is. For example, the difference between CRNRate 1 and CRNRate 2 is treated by the regression as identical to the change between categories 2 and 3, categories 3 and 4, or categories 4 and 5. Similarly, for PopRate, the difference between a 1 and a 2 is seen as identical to the difference between a 2 and a 3. Both of these are assumptions that I would find difficult to justify.

My suggestion is that it is more appropriate to do an Analysis of Covariance on the data. It is basically a regression on the three numeric variables, Lat, Long, and Alt, using different constant terms for each combination of categories of CRNRate and PopRate. The assumptions that I mentioned earlier are no longer necessary to do the analysis. From what I could tell, Excel cannot do this analysis simply – it has an add-in for Analysis of Variance, but I couldn't find one for this procedure. Like Steve Mc., I would point out that it is pretty easy to carry out this type of analysis in R using the lm() function (or in most other statistics programmes that you might have access to). I can do this analysis for you if you can provide me with access to the same data that you used. I am of the opinion that the resulting analysis would be more justifiable and possibly provide more insight than the simple regression.
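[Editor's note: the numeric-versus-categorical point above can be sketched with synthetic data. This is not the actual station analysis (which was done in Excel and later Minitab); the variable names and effect sizes are made up for illustration. The sketch fits the same response twice, once with the rating as a single numeric column and once expanded into dummy (indicator) columns, showing that the dummy coding can capture an unevenly spaced category effect that the numeric coding forces to be linear.]

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
rating = rng.integers(1, 6, size=n)   # hypothetical rating, categories 1..5
lat = rng.uniform(25, 50, size=n)     # one numeric covariate

# True category effects are deliberately NOT equally spaced,
# so the numeric coding is mis-specified.
cat_effect = np.array([0.0, 0.1, 0.1, 0.5, 1.2])
trend = cat_effect[rating - 1] + 0.01 * lat + rng.normal(0, 0.05, n)

# (a) numeric coding: one column, assumes equal spacing between categories
X_num = np.column_stack([np.ones(n), rating, lat])
beta_num, *_ = np.linalg.lstsq(X_num, trend, rcond=None)

# (b) dummy coding: one indicator column per category above the baseline (1)
dummies = (rating[:, None] == np.arange(2, 6)[None, :]).astype(float)
X_dum = np.column_stack([np.ones(n), dummies, lat])
beta_dum, *_ = np.linalg.lstsq(X_dum, trend, rcond=None)

def r2(X, b):
    resid = trend - X @ b
    return 1 - resid.var() / trend.var()

print("R^2 numeric:", round(r2(X_num, beta_num), 3))
print("R^2 dummy:  ", round(r2(X_dum, beta_dum), 3))
```

Because the numeric model is nested inside the dummy model, the dummy fit can never be worse, and it is noticeably better whenever the category effects are unevenly spaced.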

MarkR, as I recall the urban adjustment in the USHCN Urban data set is small as determined by making the comparison that you suggested. I can review what I found previously.

I would not want to jump to any conclusions at this point based on the regression results. I am hoping that someone in the Watts team or posters at CA will review the calculations and publish something if the results merit it. I would also wait for the Watts team to complete their CRN ratings.

If one looks at all the possibilities someone may come along and say that the CRN rating progression determined by the regression is correct but that CRN ratings 1 and 2 are biased in a cooling direction and CRN 3 and 4 are biased in a warming direction.

RomanM, I agree with your assessment that using numerical values for the independent variables in the manner I did makes an assumption/approximation that is not necessarily true. I took the use of a qualitative dummy variable that can take a value of 0 or 1 a step or two too far. I could have made a comparison, I think, using CRN123 (0) and CRN45 (1), and a population rating of R/S (0) and U (1).

I can provide you with all the CRN 1 through 5 station data that you require in an Excel SS. I know some posters here are hesitant to post their email addresses, and if you are, I would understand. If we do not have a more convenient way for you to access these data, let me know and I will post my email address.

Thanks for taking the time to critique my analysis. It is much appreciated.

Kenneth, what the ANOVA and ANCOVA procedures do is indeed a regression with dummy (or indicator) variables to estimate the effects of categorical variables. It is easier and simpler just to let a programme do the dirty work. You can e-mail me at ram44 at nb.sympatico.ca. An Excel file is just fine since I usually transfer data between programmes using the clipboard. However, looking at the data will have to wait a couple of hours while I go to a golf course to celebrate my first official day of retirement after a 40-year academic career.

re 110. The CRN ratings are not strictly categorical. For example, CRN 5 is supposed to indicate a bias of 5C (under certain conditions) and CRN 4 a bias of 4C under certain conditions, so it's not really categorical. But CRN 1 is no bias, so it's not totally numeric. With population there is also another field that has actual population, but most of the values for really small towns are missing.

We can add the newly rated stations after Roman does his calculations.

I think the Roman’s suggested method is doing it is more correctly, even though as you note the categories are not entirely qualitative. We would still have the assumption that rating difference 1 to 2 was the same as 2 to 3, etc. which I think it is, but only to a rather gross approximation.

I’ll do my dummy variable combos regressions with 0 and 1 values when I finish some urgent tasks for SWMBO.

Even though there may be some meaningful numeric property inherent in the station ratings, the current analysis also assumes that that property translates linearly into its effect on the trends. This assumption might also be one that could be avoided.

re 117. Well, that was what stunned me about the near linearity of the graph in 107. The ratings, as far as I know, are supposed to be "estimates" of the effect size, so I was always prepared, for example, to see no difference between 2 and 3, or no difference between 1 and 2, or no difference between 3 and 4. That's why I focused on comparing, say, 5 versus CRN12. Anyways,

RE116. It’s fun to RomanM look at this stuff as well. Plus I’m not talking about
adding the new stations to the model. I’m saying test the model against
the new stations. Your model says trend = pop + lat +lon +alt +crn +constant

Anthony has 40 new stations. Predict the trend given the metadata. Make sense?
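[Editor's note: the out-of-sample test proposed here can be sketched as follows. The data are synthetic stand-ins, not the actual USHCN stations or the 40 newly rated ones; the coefficients and scales are invented. The sketch fits the linear model trend = pop + lat + lon + alt + crn + constant on the already-rated stations, then predicts trends for a held-out batch from metadata alone and scores the prediction error.]

```python
import numpy as np

rng = np.random.default_rng(1)

def make_stations(n):
    # columns: pop class, lat, lon, alt, crn rating (all synthetic)
    meta = np.column_stack([
        rng.integers(1, 4, n),        # pop class 1..3
        rng.uniform(25, 50, n),       # lat
        rng.uniform(-125, -70, n),    # lon
        rng.uniform(0, 2000, n),      # alt (m)
        rng.integers(1, 6, n),        # crn 1..5
    ]).astype(float)
    coef = np.array([0.15, 0.01, 0.002, -0.0001, 0.2])  # invented effects
    trend = meta @ coef + 0.3 + rng.normal(0, 0.05, n)
    return meta, trend

meta_old, trend_old = make_stations(300)   # stations already rated
meta_new, trend_new = make_stations(40)    # stand-in for the 40 new stations

# fit on the old stations
X_old = np.column_stack([np.ones(len(meta_old)), meta_old])
beta, *_ = np.linalg.lstsq(X_old, trend_old, rcond=None)

# predict the new stations from metadata alone and score the error
X_new = np.column_stack([np.ones(len(meta_new)), meta_new])
pred = X_new @ beta
rmse = np.sqrt(np.mean((pred - trend_new) ** 2))
print("out-of-sample RMSE:", round(rmse, 3))
```

If the model captures the real drivers of trend, the out-of-sample error should be close to the noise level; a much larger error would suggest overfitting or a missing factor.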

#121. In fairness, one can also say that Hansen at least makes an effort in this direction, as opposed to CRU and NOAA. But for Hansen’s method to work as advertised, he has to have a reference core of CRN1-2 stations and a system of identifying them – a point that John V frustratingly ignored. In the US, there is a plausible core of stations and the lights system, while erratic, isn’t completely hopeless and is better than nothing. However, in the ROW, his execution seems to be a complete mess, poorly planned and hopelessly executed. Cities of 5 million can be classified by Hansen as R-stations. Without CRN classifications of the ROW stations, the Hansen adjustment seems to be pretty much meaningless outside the US, as evidenced by the approximate balance of negative and positive urban adjustments.

re 121. Ya, my supposition is that his method was developed by looking at a few examples, epitomes: adjusting Tokyo, adjusting Phoenix. Looking at those he would say "hey, my method works," and then he fed the meat to the grinder. So he made an effort, albeit ham-fisted. My question about lights is this: what does it add in terms of fidelity beyond a simple population index? We replace one index (R/S/U) with another index (dark, dim, bright). Hansen, following Imhoff, would argue that nightlights are indicative of population density, which apparently in Hansen's mind is more highly correlated with UHI than things like impervious surfaces (concrete). (A side note: in Hansen 99, I believe he suggested he was going to use a vegetative index, citing Gallo or Owen, I can't recall.) Nevertheless, nightlights may be a proxy for population density, and population density may be a proxy for UHI. It would be instructive to actually test that using population density figures derived from other sources.

RE116. It’s fun to RomanM look at this stuff as well. Plus I’m not talking about
adding the new stations to the model. I’m saying test the model against
the new stations. Your model says trend = pop + lat +lon +alt +crn +constant?

Steve Mosher,

We need to await Roman’s correct modeling and results before proceeding – and something I deem well worth waiting for now that I am more fully aware of his credentials.

Anthony has 40 new stations. Predict the trend given the metadata. Make sense?

Rather ironic that old Kenny Fritsch, who is always calling for out-of-sample tests, would not see what you had in mind here. Not sure the sample sizes allocated among 5 CRN ratings will be sufficient to do a reasonable statistically significant test, but perhaps we have a better chance if we do a CRN123 versus CRN45 comparison.

I have seen Roman’s comprehensive and impressive analysis. There is some good meat in there to think about and I think we have to give Roman some slack to look the results over before posting them here.

That impatience thing is what I think got Michael Mann in trouble with revealing the hockey stick, and it sometimes afflicts my impatient twin, Kenny, with adverse results.

OK, I’ve looked at the data Ken sent to me and ran some analyses similar to his. They were done using the Minitab statistical package. I haven’t used R a lot for this type of work and Minitab requires less typing and generates the output we need here in a relatively easy fashion.

Ken ran a regression of the station trends on the CRN rating, including the variables population, latitude, longitude and elevation (all as numeric), to isolate the effect of CRN on the observed trends. IMHO, treating CRN and population in a linear fashion was hard to justify. Thus, I suggested an analysis of covariance with the variables latitude, longitude and elevation as covariates, but with CRN rating and population as categorical. Ken was kind enough to send me the data he used to do the analysis. For purposes of clarity, I recoded the population variable to A, B, C from the values 1, 2, 3 used by Ken. A is the lowest population level (Rural) and C is the highest (Urban).

The model used was as described above (with interaction between rating and population included). The data set was unbalanced with respect to population. E.g., for stations with CRN rating 1, 35% were pop A and 40% were C, while for rating 4 stations, 68% were A and less than 7% were C. Comparing simple averages would be like comparing apples and oranges, so a more sophisticated statistical methodology is needed to adjust for the effect of population on the trends. The results of the analysis:

All of the factors were statistically significant in their effect on trend. I ran the analysis without the covariates and the R-sq value dropped dramatically. Since the design was unbalanced, adjusted sums of squares were used for testing the various effects – each factor is tested after all of the other factor effects have been accounted for. The diagnostics for the analysis indicated no obvious problems with assumptions.

The adjusted means from Minitab (Trend is in degrees C) and standard errors:

The adjusted means here are calculated by replacing the covariate value (lat, long, and Elev) by the average for that variable. Population is accounted for by assuming that 1/3 of the stations in each CRN rating are of each population level, A, B, C. Individual pairwise comparisons of the 5 CRN rating means showed that each of level 1 to 4 differed from level 5 significantly, but the four did not show a significant difference among themselves. Graphically, the above adjusted means look like:

and

Since the actual percentages of A, B and C stations in the sample were 59.69%, 26.43% and 13.88% respectively, I also calculated adjusted means with these percentages and compared these to both the previous adjusted means and the unadjusted means of the trends:

At Ken’s request, I also ran a similar analysis after combining the station ratings into two categories: CRN123 and CRN 45, and the population also into two categories: (AB) and C:

Elevation plays a smaller role here. It is also pretty obvious that, as a group, the urban stations with rating 4 and 5 stations differ substantially from the others. Plots are not included since there are only two categories for each factor and the numbers above are simple enough to digest.
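[Editor's note: the covariate-adjusted means described above can be sketched as follows. The data are synthetic, not the USHCN set, and the actual analysis was done in Minitab; the effect sizes here are invented. The sketch fits a model with CRN and population as indicator variables plus lat/long/elev as covariates, then evaluates the fitted trend at the mean covariate values, averaging over the population classes with equal 1/3 weights, as in the first set of adjusted means.]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
crn = rng.integers(1, 6, n)
# pop classes A, B, C coded 0, 1, 2, drawn with the sample shares quoted above
pop = rng.choice([0, 1, 2], size=n, p=[0.5969, 0.2643, 0.1388])
lat = rng.uniform(25, 50, n)
lon = rng.uniform(-125, -70, n)
elev = rng.uniform(0, 2000, n)
# invented generating model: +0.1 per CRN step, +0.2 per pop class
trend = (0.1 * crn + 0.2 * pop + 0.01 * lat + 0.002 * lon
         - 0.0001 * elev + rng.normal(0, 0.05, n))

def row(c, p, la, lo, el):
    d_crn = (np.arange(2, 6) == c).astype(float)   # CRN dummies, baseline 1
    d_pop = (np.arange(1, 3) == p).astype(float)   # pop dummies, baseline A
    return np.concatenate([[1.0], d_crn, d_pop, [la, lo, el]])

X = np.array([row(c, p, la, lo, el)
              for c, p, la, lo, el in zip(crn, pop, lat, lon, elev)])
beta, *_ = np.linalg.lstsq(X, trend, rcond=None)

# adjusted mean per CRN rating: covariates at their means, pop weighted 1/3 each
weights = np.array([1 / 3, 1 / 3, 1 / 3])
adj_means = []
for c in range(1, 6):
    adj = sum(w * row(c, p, lat.mean(), lon.mean(), elev.mean()) @ beta
              for p, w in zip(range(3), weights))
    adj_means.append(adj)
    print("CRN", c, "adjusted mean:", round(adj, 3))
```

Swapping the equal weights for the observed class shares gives the second set of adjusted means; the covariate and CRN parts of the computation are unchanged.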

If the tables post in an unreadable fashion, you can download a pdf document containing some of the computer output (and the graphs) here:

With respect, what is the goal of your analysis? It seems that the same conclusions were reached a long time ago (perhaps with less conclusive methods). What is the question we are trying to answer here?

I would like to publicly thank Roman for the effort that he put into this comprehensive analysis and for putting it into a form that can be readily comprehended from his explanations. After communicating by email with Roman, I think I can say that this old dog has learned a new trick or two. I hope the interested parties here will study the data and calculations so that we can have a conversation about the more interesting aspects of it, the population and CRN interaction coming to mind.

After seeing Roman’s methods and having him explain them to me, I wholeheartily agree with his analysis and understand the errors and shortcomings of my more simple-minded approach. I feel more confident of the apparent conclusions now that a statistician with Roman’s credentials has analyzed the data. In the end I would hope when the Watts team reaches a completion point with their CRN evaluations that their representatives and Roman could combine to publish the results. I gave Roman the data from the USHCN Calculated Mean data set for the period 1920-2005, future analyses might want to consider other data sets and time periods — even though I thought I had good a prior reasons for using this data set and time period.

This type of analysis is used to separate the factors which simultaneously affect the measured trends, understand how they interact and to quantify the size of the effect of each factor. As well, within the context of the model, we can make some decisions about which effects might be really there and which might be explained as random variation. I look at it as a piece of the larger picture of how to interpret the temperature record.

It’s very late on the thread but I have to say thanks for the nice work, I really wondered how much these stations were affected by quality. It looks like just sorting them on quality will get the job done in the future.

It’s very late on the thread but I have to say thanks for the nice work, I wondered how much these stations were affected by quality. It looks like just sorting them on quality will make a major improvement.

Thanks for the compliment. It was a fairly standard Analysis of Covariance. It would be better to have a larger set of stations (and possibly some changes in the model) if it were to be redone. Yes, it does indicate that there might be some differences due to the quality of the stations.

With respect, what is the goal of your analysis? It seems that the same conclusions were reached a long time ago (perhaps with less conclusive methods). What is the question we are trying to answer here?

Clayton, I have been aware of the Watts team's findings for some time, and of some qualitative statements made about the expectations from the ratings, but I have not seen any comprehensive analysis such as Roman presents here. I agree with Roman that it does not give a complete or final picture, but its form allows for some interesting and important talking points, in my view anyway.

Does anyone have any potential explanations or conjectures for the interaction of the CRN station rating with the station population rating? I have been attempting to come up with some of my own, but in the end after some clearer thinking had to reject them. Perhaps some of the Watts team members have some insights into this phenomenon.

Just to remind you of what it is I am talking about here, I am summarizing RomanM's interaction graphs in #130. The approximate range of the trends across CRN1-5 is 0.3 for Rural stations, 0.9 for Suburban stations, and 1.5 for Urban stations. The order of trends from lowest to highest across CRN1-5 is 2,3,1,4,5 for Rural, 1,4,3,2,5 for Suburban, and 1,2,3,4,5 for Urban stations.

What stands out to me in this result is that Urban-sited stations give a measure of what one might expect from the CRN ratings in both spread and progression from 1 through 5. Suburban stations appear intermediate between Rural and Urban in spread, as one might expect for an interaction, but for both Rural and Suburban stations the CRN progressions seem to get disrupted.

In general, one might want to view these results as the Rural and Suburban environment having a mitigating effect on the CRN micro site differences, i.e. they are more forgiving of these differences than the Urban environment.

I suppose if we could come up with a reasonable explanation of this effect, we could go back to the regression model with expanded CRN ratings and look at the results, but I will leave that judgment to RomanM.