The Talking Points Memo

The NOAA Talking Points memo falls well short of a “full, true and plain disclosure” standard – aside from the failure to appropriately credit Watts (2009).

They presented the following graphic, which purported to show that NOAA’s negligent administration of the USHCN station network did not “matter”, describing the comparison as follows:

Two national time series were made using the same gridding and area averaging technique. One analysis was for the full data set. The other used only the 70 stations that surfacestations.org classified as good or best… the two time series, shown below as both annual data and smooth data, are remarkably similar. Clearly there is no indication for this analysis that poor current siting is imparting a bias in the U.S. temperature trends.

Figure 1. From Talking Points Memo.

Beyond the above sentence, there was no further information on the provenance of the two data sets. NOAA did not archive either data set nor provide source code for reconciliation.

The red graphic for the “full data set” had, using the preferred terminology of climate science, a “remarkable similarity” to the NOAA 48 data set that I’d previously compared to the corresponding GISS data set here (which showed a strong trend of NOAA relative to GISS). Here’s a replot of that data – there are some key telltales evidencing a common provenance with the red series in the Talking Points graphic.

An obvious question is whether the Talking Points starting point of 1950 is relevant. Here’s the corresponding graphic with the 1895 starting point used in USHCN v2. Has truncating the graphic at 1950 “enhanced” the visual impression of an increasing trend? I think so.

Figure 3. As Figure 2, but to USHCN v2 start

The Talking Points’ main point is its purported demonstration that UHI-type impacts don’t “matter”. To show one flaw in their arm-waving, here is a comparison of the NOAA U.S. temperature data set and the NASA GISS US temperature data set over the same period – a comparison that I’ve made on several occasions, including most recently here. NASA GISS adjusts US temperatures for UHI using nightlights information, coercing the low-frequency data to the higher-quality stations. The trend difference between NOAA and NASA GISS is approximately 0.7 deg F/century in the 1950-2008 period in question: obviously not a small proportion of the total reported increase.

Figure 4. Difference between NOAA and NASA in the 1950-2008 period, in deg F following NOAA (rather than deg C).

As has been discussed at considerable length, the NASA GISS adjusted version runs fairly close to “good” CRN1-2 stations – a point which Team superfans have used in a bait-and-switch to supposedly vindicate entirely different NASA GISS adjustments in the ROW (adjustments which appear to me to be no more than random permutations of the data, a point discussed at considerable length on other occasions).

For present purposes, we need only focus on the observation that there is a substantial trend difference between NOAA and GISS trends.
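For readers who want to see what such a trend-difference calculation involves, here is a minimal sketch using synthetic annual series in place of the actual NOAA and GISS data; the slopes below are invented purely to illustrate the arithmetic.

```python
# Synthetic annual data standing in for the NOAA and GISS US series; the
# slopes (1.2 and 0.5 deg F/century) are invented for illustration only.

def ols_trend(x, y):
    """Ordinary least-squares slope of y on x (units of y per unit of x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

years = list(range(1950, 2009))  # the 1950-2008 period in question
giss = [0.005 * (y - 1950) for y in years]  # 0.5 deg F/century
noaa = [0.012 * (y - 1950) for y in years]  # 1.2 deg F/century

# Trend of the NOAA-minus-GISS difference series, expressed per century.
diff_trend = ols_trend(years, [n - g for n, g in zip(noaa, giss)]) * 100
print(round(diff_trend, 2))  # 0.7
```

The real calculation would of course regress the archived annual values; the point is only that the 0.7 deg F/century figure is the OLS slope of the difference series.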

Given that, when NOAA’s Talking Points claims a supposedly negligible difference between the average of their “good” stations and the NOAA average (which we know runs hot relative to GISS), this arguably raises issues about the new USHCN procedures.

Y’see, while NOAA doesn’t actually bother saying how it did the calculations, here’s my guess as to what they did. The new USHCN data sets (as I’ll discuss in a future post) show ONLY adjusted data. No more inconvenient data trails with unadjusted and TOBS versions.

When I looked at SHAP and FILNET adjustments a couple of years ago, one of my principal objections to these methods was that they adjusted “good” stations. After FILNET adjustment, stations looked a lot more similar than they did before. I’ll bet that the new USHCN adjustments have a similar effect and that the Talking Points memo compares adjusted versions of “good” stations to the overall average.

So what they are probably saying is this: after the new USHCN “adjustments” (about which little is known as the ink is barely dry on the journal article describing the new method and code for which is unavailable), there isn’t much difference between the average of good stations and the average of all stations.

If the NASA GISS adjustment procedure in the US is justified (and most Team advocates have supported the NASA GISS adjustment in the US), then the Talking Points memo merely demonstrates that there is something wrong with the new USHCN adjustments.

Every article that I’ve examined that purports to show that UHI doesn’t “matter” – Jones et al 1990; Peterson…, Parker – follows a similar strategy. (See past CA posts.) They purport to stratify a data set into “urban” and “non-urban” and then show that there is no difference, ergo UHI doesn’t matter. In every case that I’ve examined, the stratification is a jumble.

For example, in Peterson’s data set, there are rural sites in the “urban” stratum and vice versa. There is a noticeable difference between trends for properly stratified data in Peterson’s data set, but not for a Peterson-style jumble. The mechanisms for achieving the jumble vary.

Steve, when you write things like “a point which Team superfans have used in a bait-and-switch” it just undermines your credibility. Please stick to a dispassionate presentation of the facts.

Steve: Unfortunately it’s a correct statement of events. I’ve spent a lot of time distinguishing the two situations. If you and others had stepped up and corrected prior attempts to spin the GISS US adjustments to vindicate GISS ROW adjustments, then maybe I wouldn’t have felt obliged to observe this one more time. Do you agree with the substantive point that Hansen’s ROW adjustments are little more than random and that the GISS US adjustment methodology is different than the GISS ROW adjustment methodology?

I think that’s really the whole point of the UHI adjustments, to homogenize rural and urban trends – so when they point to the fact that the highest quality stations alone (predominantly rural) match the overall average – they are merely highlighting an intended result of the adjustment process.

The true comparison would use the raw data. It would be nice if somebody could post a distribution of trends for the 70 highest quality stations, and a distribution of trends for the entire network. I am betting there is a significant difference.

The CRN ratings have nothing to do with the urban/rural location of the site, only the microsite itself. I don’t know if the CRN1/2s are rural or not. Looking at an older version of Anthony’s spreadsheet it looks like a lot of the 1/2 sites are at airports with the lights field as mostly bright (which I assume to be considered urban). Don’t know what the final tally will be once Anthony completes his analysis.

If they don’t let people see what they are doing with their super duper, secret adjustment algorithm, they deserve no more respect than the Great Oz. A tip of the hat to all the “Totos” out there exposing the “man behind the curtain”.

Question — there was a post at WUWT which laid out some of the bizarre steps which are taken from the time an observer records a reading at a temperature site until it becomes part of the dataset. I don’t think it went into any detail on the manner in which such readings are adjusted and massaged by the readings at surrounding sites (within 250 or 500 miles). Has anyone ever prepared a concise but complete explanation of all the adjustments which affect the record of a temperature reading?

Put another way – I expect that most people simply assume that the temperatures are read at a monitoring station and recorded in the dataset. That is, they could go to some govt record and see what the temperature was on such and such a date at any given site (e.g. the avg temp at my local monitoring site might have been 79 degrees on May 29, 1967). And they assume that the number recorded at the site that day is still the number in the records today and that today’s reading will similarly be available years from now. Most would be shocked to learn how often the numbers get changed.

Has anyone got a good description of all the different ways and different times that number might have been “adjusted” and any govt explanation for the adjustments?

The new USHCN improved adjustments were published quite recently without any code or intermediates. I haven’t had an opportunity to experiment with the method. In order to do so, I’d first have to try to replicate what they did, which is never all that easy when you lack both code and intermediates. I emailed Menne of NOAA asking for code.

Steve, when you write things like “a point which Team superfans have used in a bait-and-switch” it just undermines your credibility. Please stick to a dispassionate presentation of the facts.

How can it undermine his credibility if it’s true?

If you can’t stand a little spicy editorializing once in a while, you can go to other websites that won’t offend your sensibilities.

I would add that we just had massive and hugely expensive legislation passed based on the mystical messaging of the temperature data that Steve is exposing. One commentator called the legislation an “abomination.” So if Steve occasionally rumbles in the jumble, I think you should cut him some slack.

It is clear that they are using data from surrounding stations to provide the adjustments to a given station. As a result it is circular to compare the adjusted 70 station trend to the adjusted rest of the network. The Reno example that they show on this page gives some idea of how large these adjustments can be. It is likely that the person who wrote the talking points memo didn’t really understand this.

They are also no longer making a separate UHI adjustment since this inter-station process is supposed to handle that effect as well.

What would be interesting would be to get the data after the time of day and equipment adjustments, but before the inter-station normalization adjustments. Then a meaningful comparison between the 70 “good” stations and the rest of the network could be made.

Based on some earlier work I agree that there is evidence that the GISS temperature set in the US is pretty good. I also saw evidence that the temperature set was good in other areas with a relatively high education level.

When I first saw that graph comparing the 70 with the 1228, I thought it was more than ‘remarkably similar’ – it seemed to me ‘suspiciously similar’, particularly since 10 states are not represented at all in the 70. If I were a statistician, I would be tempted to check whether the fit was better than you’d expect by chance.

Has Anthony or anyone else attempted a good-sites / all-sites comparison?

I did this exercise in fall 2007 for 35 or so sites, as did John V, but haven’t updated it. The GISS adjustment in the US does a reasonable job of matching “good” sites. That’s the idea in the present post – the NOAA average doesn’t.
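The “better than chance” check suggested upthread could be sketched along the following lines, using an invented network of per-station trends; all numbers here are hypothetical, and the real test would use actual station trends.

```python
# Draw many random 70-station subsets from a synthetic 1228-station network
# of per-station trends and see how closely a random subset's mean tracks
# the network mean. All numbers are invented for illustration.
import random

random.seed(0)
n_stations = 1228
# Hypothetical per-station trends (deg F/decade); real values would come
# from regressions on the individual station records.
station_trends = [random.gauss(0.6, 0.3) for _ in range(n_stations)]
network_mean = sum(station_trends) / n_stations

def subset_mean(trends, k):
    """Mean trend of a random k-station subset."""
    return sum(random.sample(trends, k)) / k

# Distribution of |subset mean - network mean| under random selection.
deviations = sorted(abs(subset_mean(station_trends, 70) - network_mean)
                    for _ in range(2000))

# If the actual 70 "good" stations agreed with the network more closely
# than, say, the 5th percentile of random subsets, the fit would be
# suspiciously tight rather than merely "remarkably similar".
fifth_percentile = deviations[len(deviations) // 20]
print(f"5th percentile of random-subset deviation: {fifth_percentile:.4f}")
```

The same resampling idea would work on trends, on annual values, or on smoothed series; the point is to put “remarkably similar” on a null distribution.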

It sounds to me that they are saying that when they adjust all of their data, essentially smoothing it by changing all sites to look like their neighbors, the good sites don’t look any different than the bad sites.

But how does that speak to the quality of the data? If you “fix” good sites by averaging them with bad sites, it just gives you the same crappy data the bad sites had.

The penny is going to drop with the general public at some point about these adjustments. I was just watching a news report about the UK heat wave predicted this week and one of the reporters was saying that because of the buildings and roads the cities can be up to 8 degrees hotter than the countryside. He even had a thermometer showing how it was 30.1 °C in London. I’m just waiting for someone to point out to them, well if the cities have been getting bigger but your station has remained in the same place will the temperature reading not be affected as well? And then that great word, and how has it been ‘characterised’ (or ‘characterized’ for all the non-UK people)?

Two national time series were made using the same gridding and area averaging technique.

The problem with the above statement is that averaging temp or weather over areas serves no meaningful purpose. Two places 10 miles apart (say one on the ocean side of an island and one on the mainland side of the same island) can have very different weather, especially at the same time of day.

I live approx 70 miles north of Seattle, and we get about 10 inches less rainfall per year. Seattle’s average rainfall is closer to that of Dallas, yet Seattle’s weather is very different. How is averaging our weather with Seattle’s of any use whatsoever? How is a “global mean temperature” valid in any way?

Two posts piqued my interest here: the one on CRN ratings and the fact that RomanM is reading these posts.

RomanM did a proper factor analysis of the temperature trends using data that I provided him for USHCN stations and the Watts team CRN ratings (with attribution to the Watts team) – after I made a false start in an analysis of my own. I will let RomanM link you to the analysis if he is so inclined.

A proper analysis, from this layperson’s perspective, requires that factors such as latitude, longitude, population (rural, suburban, urban) and altitude be taken into account. RomanM’s analysis included these factors and evidently the one under discussion here did not.

It is also important to remember that the temperature trends have large variations from station to station and within a CRN rating classification. As a result, large numbers of stations are required to determine statistically significant differences. As I recall, RomanM’s analysis showed differences between classifications, but showing statistically significant differences required comparing CRN123 to CRN45. There are simply not that many stations classified as CRN12.

As an aside, I have previously shown here that USHCN Version 1 and Version 2 give statistically significantly different trends from 1920-2006 (the 1920 start was selected due to the amount of missing data prior to that time) when comparing only the 1000-odd stations that had a full set of data and were common to both series. In the Version 1 vs Version 2 analysis, some individual stations varied by as much as 0.2 to 0.3 degrees C per decade depending on the version used.

I had previously shown with a similar analysis that Version 1 USHCN and GISS US had statistically significantly different trends.

Papers that I have read on these series differences attempt to minimize the differences and usually point to starting dates that do minimize differences. I think the authors walk a fine line by presenting new and improved versions without indicating that that probably makes the old (or new depending on your perspective) version wrong and could also call into question the confidence intervals claimed for the original series.
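The version-to-version comparison described above can be framed as a paired test on per-station trend differences. Here is a rough sketch with simulated differences; the distribution parameters are invented, and the real inputs would be the trend differences for the ~1000 stations common to both USHCN versions.

```python
# Paired comparison of two dataset versions: per-station trend differences
# tested against zero. The differences are simulated for illustration.
import math
import random

random.seed(1)
n = 1000
# Hypothetical per-station trend differences (deg C/decade), V2 minus V1.
diffs = [random.gauss(0.02, 0.05) for _ in range(n)]

mean_d = sum(diffs) / n
var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
t_stat = mean_d / math.sqrt(var_d / n)  # one-sample t statistic

# With ~1000 pairs, |t| > 1.96 is significant at roughly the 5% level.
print(f"mean difference {mean_d:.4f} deg C/decade, t = {t_stat:.1f}")
```

Restricting the comparison to stations with full records in both versions, as described above, is what makes the pairing legitimate.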

The analysis Ken is referring to can be found here. It should not be looked at as definitive in any particular way, but rather more as an example of what should be done in assessing the effect of station quality on trends given the presence of other factors.

Taking the latitude and longitude into account is actually introducing some regionality into the mix. A better idea might be to make the regions categorical probably based on geographic characteristics.

Using altitude as a numeric variable should also be reconsidered. However, since it wasn’t going to be “published”, I did the analysis reasonably quickly and with less attention than I would otherwise.

I’ve been reading your stuff off and on for some time. Your work is fantastic! In a world where media types just uncritically repeat everything they are told by environmentalists, you are the voice of reason that gives me hope that rational science will prevail. Thanks.

Until someone has a complete understanding of all the ways the raw data is adjusted, I don’t know how anyone can compare the “accuracy” of one set of adjusted data from “good” sites with the “accuracy” of an entire data set of adjusted data.

Q — is this an accurate summary of the claim? The algorithm does not distinguish between good sites and bad sites, so it doesn’t know what part of the data is good. About 90% of the data is bad. The extent of the error varies for each site, so they don’t know by how much the data is off. And they say their algorithm can correct the entire data set.

Someone please tell me that my understanding of their claim is wrong. Because if that’s what they are claiming …. ummm, let’s just say I’d start wondering if Michael Mann was involved.

Thanks Roman. I missed this post before ( jeez swore it existed, but he’s rarely right about anything.. haha)

WRT altitude: the approach that Hansen, Jones, NOAA, etc. all use is to ignore altitude. Now, this makes some sense because we are looking for trend data. However, there is something nobody has ever thought about: it only makes sense over time if the mean altitude of all sites used for the index stays the same. That is, if over time you drop and add sites to the entire database, then you had better check what happens to the mean altitude of all sites.

The altitude is in the analysis as a linear covariate. I would likely categorize it into probably three categories (Low, Middle, High) in a future rehash of the same problem. However, this could introduce an interaction with a regional variable based on geographic considerations. This interaction would have to be allowed for as well.


When you are comparing CRN classifications, you do not know whether longitude, latitude, altitude, or other factors that could affect temperature trends are distributed evenly among the classifications. This is particularly true when the number of stations is small, as with CRN12.

My point in all these analyses is the hope that Watts will invite RomanM (or a reasonable proximity) to do a proper statistical analysis of the CRN classifications. John V’s original attempt, in my mind, was a quick reaction to Watts’ work and a very poorly thought out comparison in an attempt to defend the GISS data. I still see TCO making the blog rounds claiming that John V’s analysis put the lie to Watts’ claims. He totally does not understand the underlying factors, and I think neither do many other observers.

Re: steven mosher (#29), ha, since a serious number of rural sites were lost around 1990, then the temperature may well have gone up just from altitude effects if, as seems likely, rural stations are higher on average.

Steve: In Figure 4 you say that you are reporting in Fahrenheit, and the caption supports that; however, earlier in the article you state that the trend is “.7 degrees C per century”… Which is it? I know that you are being consistent in the math (I checked your work; NCDC does indeed run hot relative to GISS in the US, and after some initial confusion as to why my numbers appeared to show a much bigger discrepancy than expected I realized – D’oh! – I had forgotten to change from F to C!) but the language is confusing.

RE Steve #37, RomanM #32,
Corrected URL now downloads, but the .tab file won’t open with either Excel or Notepad. Is this an R-specific format? .tab seems to be both a Gnu freeware format and a different, proprietary Mapinfo format. If it were truly just tab separated as in the GNU version, I would think Excel would have no problem with it.

RE 35. The issue is this. Let’s assume that I sample 1000 stations in 1900, that the average temperature is 0 C, and that the average altitude of all stations is 1000 ft. Now suppose I sample the same sites in 2009, and that there is no increase in temperature over that 109-year period. I get an increase of zero and a trend of zero.

Now, let’s change the sampling in 2009: sites drop out, I reject some for QA reasons, etc. Say my sample falls to 800 – still plenty of sites. But assume that the dropped sites were skewed toward higher altitudes, so that my average altitude for all sites is now 900 feet instead of 1000 feet – a mere 100 ft change. What will happen to the temperature? The average will increase by 0.198 C. Why? Because the lapse rate is 1.98 C per 1000 feet of altitude change.

The one way around this is to adjust all sites to the same altitude – that is, correct for the lapse rate. Very simply, if the average altitude of sites in 1900 was 1000 ft and the average altitude of all sites today were 900 feet, you would see a spurious warming trend; if altitude increased, you would see a spurious cooling trend. Now, you could assume that the sites added and dropped were normally distributed, but why take that chance when you can just adjust for the lapse rate? Adjusting for the lapse rate is something that NOAA used to do when a site changed location (SHAP used to do this), so if a site X changed altitude over time, SHAP would adjust for the change in lapse rate. But what people have assumed is that the average altitude of all stations over time is a constant. My bet? It’s not.
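The arithmetic in the comment above can be checked in a couple of lines; the only input is the quoted lapse-rate figure, and the station counts play no role once the mean altitudes are given.

```python
# Apparent warming produced purely by a shift in the mean altitude of the
# sampled network, using the lapse-rate figure quoted in the comment above.
LAPSE_RATE_C_PER_1000FT = 1.98

def spurious_warming(mean_alt_before_ft, mean_alt_after_ft):
    """Spurious temperature change (deg C) from a change in mean station altitude."""
    drop_ft = mean_alt_before_ft - mean_alt_after_ft
    return LAPSE_RATE_C_PER_1000FT * drop_ft / 1000.0

print(round(spurious_warming(1000, 900), 3))   # 0.198: the example above
print(round(spurious_warming(1000, 1050), 3))  # -0.099: higher sites, spurious cooling
```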

It’s not that much of a problem. In the analysis of variance (or covariance) setting, altitude can be a factor with different categories (called levels) representing ranges of altitudes. During the analysis, the average for each level can be estimated. If the percentage of stations for each level changes, we simply adjust the weighting for each level to keep fixed percentages.

For a simple example, suppose there are two altitude ranges with 50% of the stations in each in 1900. In 1910, the percentage shifts to a 40-60 split. Estimate the mean for each group separately and average the two results using a 50-50 split on the weights. The bias is removed and the cost is a possible increase in the error bars due to the imbalance in the numbers of observations.

ANOVA and ANCOVA do this in a more complicated multifactor setup. When looking for trends, one can keep the weights fixed throughout the entire time period, possibly to reflect area considerations.

What you say is especially likely since so many airports have been used in recent years. Airports tend toward low altitudes, so one would expect a warm bias over time. Add to that urban encroachment on airports and more concrete, etc., at airports and it’s hard not to have warming.

Re 42. I think I get it, Roman. My concern is that the effect size of microsite bias might be rather small (I once guesstimated on the order of 0.15 C), so that small changes in the altitude distribution over time could swamp that signal. Kenneth and I discussed this at some length; he took the path of trying to account for altitude differences and I ignored them. It dawned on me a couple of days ago that, with sites changing over time (which sites are in and out), I was most certainly wrong to assume that the average altitude of all sites would be consistent over time.

Re 43. Well, I wouldn’t make any bets one way or the other. I would simply look at the average altitude of all sampled sites over time. Now, it’s complicated by this fact: if a site, X432657, changes its altitude over time – say it migrates from 1000 feet to 500 feet – you would see this under USHCN V1. In USHCN V1, the SHAP adjustment would take those years at 1000 feet and rebaseline them to 500 feet (warm the record), and the metadata would indicate that the site was at 500 feet. You’d have to go look at station history to reconstruct the changes in altitude. This also complicates, I believe, Roman’s analysis: the altitude record (AFAIK) is the last known altitude for that site. Further, if the site is a composite site, then its record is a jumble of altitudes. The point is, to do this analysis properly I think we need metadata (variables like population, site elevation, lat/lon) OVER TIME. We have some of it but not all.

In V2, changes in a site’s altitude, lat and lon are all munged together in one generalized break-point analysis routine.

So, all the detail one used to have with FILNET, SHAP, TOBS, etc. is now smeared away into one homogenization routine. I really don’t like that very much.

I have never really got right into the UHI issue, and I suspect that neither have the purveyors of the temperature series.

In your post you say “NASA GISS adjusts US temperatures for UHI using nightlights information”. But what can that really mean? Does it mean that they assume that stations within bright nightlights are affected more by UHI effects than stations in ‘dark’ areas? But if so, what is that telling them? Or maybe they are looking at changes in nightlights intensity over time? And what adjustments, if any, do they actually make?

The real concern with UHI is not UHI itself, but Delta UHI (using the term ‘Delta’ to mean ‘change’ as in calculus) over time. It is Delta UHI that causes a change in the temperature anomaly over time. The difficulty is, how do you measure Delta UHI, particularly when there may be other confounding factors (changing population of temperature stations, changing instrumentation, changing measurement protocols such as time of day and frequency of readings, degrading Stevenson screens, growth of vegetation increasing shade etc).

Measuring UHI (as opposed to Delta UHI over time) is quite simple since anybody can actually measure the UHI of a city or airport simply by driving by in a car equipped with an external temperature readout, and measuring the anomaly at, say, 1 minute or 30 second intervals (time, not lat/long though a GPS would allow the latter) as the car progresses from rural to urban and back to rural again. I recall that WUWT presented such a traverse some time ago.

Measuring Delta UHI is quite a different matter, since to do it properly, one would need an array of such traverses over time – for example if for a particular city, one had a temperature traverse for each of 1900, 1910, 1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, and 2009, we could reasonably say that we have measured the Delta UHI over 109 years, and the Delta UHI over that time has been x degrees. Of course it is highly unlikely that such measurements exist. And even there, there are confounding factors. The traverses would have to be done at the same time of day, on the same date each year to be meaningful. Clearly such records are unlikely to be available.

Another way to measure Delta UHI is to compare clearly urban temperature records with clearly rural stations nearby. Often this shows warming for the urban temperature record, but no warming for the rural record. A complication here is that so called ‘rural’ records are often located at airports, which, as is well documented, have their own UHI effect which is likely to have grown over time.

There is another way to measure Delta UHI, which is to separately look at the trend of temperatures for night-time readings compared with the trend for day-time readings. It is argued that Delta UHI generally manifests in increases in night time temperatures while the day time temperatures are more stable.

There are clearly many questions relating to UHI, and the way in which Delta UHI may have corrupted the temperature records in many parts of the world, giving the illusion of more Global Warming than might actually be the case. The purveyors of the temperature series have hardly been forthcoming in their disclosure about how they have adjusted for UHI effects. Generally one is left with the impression that we should just trust them, because they are climate scientists.

In an effort to explore understandings of this issue, I turned to Wikipedia (yes, I know!) on UHI, and found a fairly useful account of what a UHI is, but a much less convincing account of how Delta UHI might affect the temperature record.

A view often held by skeptics of global warming, is that much of the temperature increase seen in land based thermometers could be due to an increase in urbanization and the siting of measurement stations in urban areas.[36][35] However, these views are mainly presented in “popular literature” and there are no known scientific peer-reviewed papers holding this view.[37]

The Fourth Assessment Report from the IPCC states the following.

“Studies that have looked at hemispheric and global scales conclude that any urban-related trend is an order of magnitude smaller than decadal and longer time-scale trends evident in the series (e.g., Jones et al., 1990; Peterson et al., 1999). This result could partly be attributed to the omission from the gridded data set of a small number of sites (<1%) with clear urban-related warming trends. In a worldwide set of about 270 stations, Parker (2004, 2006) noted that warming trends in night minimum temperatures over the period 1950 to 2000 were not enhanced on calm nights, which would be the time most likely to be affected by urban warming. Thus, the global land warming trend discussed is very unlikely to be influenced significantly by increasing urbanisation (Parker, 2006). … Accordingly, this assessment adds the same level of urban warming uncertainty as in the TAR: 0.006°C per decade since 1900 for land, and 0.002°C per decade since 1900 for blended land with ocean, as ocean UHI is zero.[38]”

In the past two years, only ~130 US stations (Alaska and Hawaii included) were used to create the US average temperature. Almost all are located at airports, and all but ~15 of those sites are considered urban (and those are all airports).

So…GISS is homogenizing 2008 / 2009 temperatures with stations that “stopped” reporting in 2007 or earlier.

Do we know how many of the so-called rural stations are in the “best sited” bucket?

John, I guess that USHCN’s change of methodology poses a bit of a problem for GISS. USHCN hasn’t updated its former methodology since 2007 and, as I recall, GISS looks for the old data set in Step 0. :)

John, it would be worthwhile (IMO) focusing on the Hawaii stations as this narrows down all the smoothing influences. Plus we can use the CRU gridcell information to reverse engineer their method a bit.

Re 47. Yes Lucy. Let me put it this way: it’s a testable hypothesis. Take all the global stations that GISS uses in 1990 and calculate the average altitude of the stations. Now do the same thing in 2009. You can be almost 100% sure that the mean altitude will be different. If the altitude is lower, then some of the warming you see from 1990 to 2009 is the result of using a sample with a different average altitude. Every 50 feet is worth about 0.05 C. Simple test. I did a similar test long ago when Kenneth and I were discussing the lapse rate issue WRT his analysis, but it didn’t dawn on me to turn the difference over time into a delta C.

Re: Steve McIntyre (#53), You keep making this point but neither GISS nor CRU nor NCDC collect data in this sense. The National Weather Services of each country transmit data in different formats, for different stations with different degrees of quality control – the daily temperature data and monthly means (the CLIMAT reports) for instance. Only stations that report monthly CLIMAT data are in GHCN – so in line with your drive to improve attribution, perhaps you could direct your ire and questions to the national weather services?

RE 51. How is GISS working now? I thought they worked off USHCN raw, but I couldn’t find a V2 raw. Also, Hansen likes to do his own building of a reference station, and it looks like Vose and company have stitched together stations in their product.

The owners of two datasets both claim that their data is “remarkably similar” to the temperature data of the best-rated stations in the US, even though the two datasets differ by a “remarkable” 0.7 °F/century.

This appears to be a major issue for NOAA, and I can't imagine how they expect to get away with it.

I hope this story keeps getting attention here and elsewhere, not least because it raises questions about other datasets controlled and modified by NOAA.

RE Steve Mosher #52,
If GISS is averaging station-specific anomalies, a change in average altitude (or latitude) should give no bias to the trend one way or the other.

But RE John Goetz #49, if GISS is using 15 rural stations to “correct” 130 urban stations, it may as well just discard the urban stations and average the 15 rural stations. “Correcting” the urban stations just gives the false impression of a larger sample than really exists.

Every article that I’ve examined that purports to show that UHI doesn’t “matter” – Jones et al 1990, Peterson…, Parker – follows a similar strategy. (See past CA posts.) They purport to stratify a data set into “urban” and “non-urban” and then show that there is no difference; ergo, UHI doesn’t matter. In every case that I’ve examined, the stratification is a jumble.

The issue you’ve described here is a subtle one, so subtle that I didn’t understand what you were getting at until I read this post twice. After the second time a light bulb went off in my head. As others have pointed out this is essentially a circular argument and quite astonishing that the people carrying out those studies haven’t realised the fatal flaw. I understand now why you are talking about the TOBS adjusted data which has not been homogenised and why it’s so critical that it’s no longer available. What you really need to test the site quality issue is data which has been adjusted to remove all known (and provable) biases like TOBS but where the series have not “polluted” each other by being adjusted against each other. You can’t properly analyse the difference between two data sets when the adjustment method mixes the content of the sets together – they need to be independent.

For example, in Peterson’s data set, there are rural sites in the “urban” stratum and vice versa. There is a noticeable difference between trends for properly stratified data in Peterson’s data, but not for a Peterson-style jumble. The mechanisms for achieving the jumble vary.

Well that’s just silly – if one is going to work out whether there’s a difference between set A and set B the first thing one must do is ensure that set A and set B have been properly defined. Comparing set C and set D isn’t going to give you the right answer.

If I understand what you are getting at, you want to get the TOBS adjusted, unhomogenised anomaly data for all the sites Anthony has surveyed, divide it into two sets based on the assessed site quality, and then compare the average of those two sets to see if the site quality actually makes a difference. It’s truly unfortunate that the data you need to do this is no longer available. I don’t suppose you kept an out of date copy before they pulled it?

Forgive me if someone already suggested this (I haven’t had time to read the comments). Can you verify your suspicion that this is adjusted CRN1 & 2 data by comparing its historical values to the historical raw data values that used to be available? If they match, no foul. If they differ as you predicted, you have a strong form of verification. My assumption here is that you have archived the raw data that used to be available, or have some access to it somehow.

This can be viewed in MS Explorer, and interpreted using the readme.txt and station list on the NOAA FTP site. Estimated and incomplete values are indicated with E, Q, X and I codes immediately after the monthly and annual figures.

The first field is the 6-digit station code, run together with a “3” indicating these are average values, and then the YYYY date. The fixed field formats would be easily legible in FORTRAN, but would take some doing to parse in MATLAB.
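For what it's worth, a record of the sort described is not hard to slice up in Python either. The parser below is only a sketch of the idea: the field widths are illustrative guesses, not the actual readme.txt layout, so check the NOAA readme before relying on any of this.

```python
# Rough sketch of parsing a fixed-width USHCN-style record.
# The field widths here are illustrative, NOT the real readme.txt spec.
def parse_record(line):
    """Parse 'SSSSSS3YYYY' followed by 12 fixed-width (value, flag) pairs."""
    station = line[0:6]
    element = line[6]          # '3' = average values, per the comment above
    year = int(line[7:11])
    monthly = []
    pos = 11
    for _ in range(12):
        value = line[pos:pos + 6].strip()
        # E/Q/X/I codes mark estimated or incomplete values (per readme.txt)
        flag = value[-1] if value and value[-1] in "EQXI" else ""
        if flag:
            value = value[:-1]
        monthly.append((int(value), flag))
        pos += 6
    return station, element, year, monthly

# Example line built to the illustrative layout above:
line = "0123453" + "1998" + "".join(f"{v:5d}E" for v in range(12))
station, element, year, monthly = parse_record(line)
print(station, year, monthly[0])  # 012345 1998 (0, 'E')
```

The same slicing approach works line-by-line over the whole FTP file; the fiddly part is getting the real column widths right from the readme.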

RE Joeshill #19 (slightly edited for airport-filter compatibility),

It sounds to me that they are saying that when they adjust all of their data, essentially smoothing it by changing all sites to look like their neighbors, the good sites don’t look any different than the bad sites.

But how does that speak to the quality of the data? If you “fix” good sites by averaging them with bad sites, it just gives you the same cr*ppy data the bad sites had.

Hu, if you’re going to work on this, I’ll make a zipfile of the time series data that I’ve collated rather than you spend time scraping the data. But you could also (and I recommend this) learn enough R to download R objects even if you write them to file yourself.

RE Steve #63,
Thanks, but I have no plans to use this data myself — I just wanted to look at it, which I can now do.
In any event, if this data has been “homogenized” by making good stations conform to the bad ones and vice-versa, there is no point in anyone’s using it for anything.

Before I go back and check my sources and calculations I was hoping that someone here at CA has done some comparison work between Version 1 (USHCN Urban) and Version 2 of the USHCN series.

I found a trend difference between Version 1 and Version 2, but more troubling to me is that none of the 1,000-odd USHCN stations with a complete set of data for 1920–2006 had exactly the same trend in Version 1 and Version 2. There was in fact a wide range of trend differences for the same stations in a number of cases.

If indeed the authors of these series wiped the adjustments clean for Version 1 and then applied the Version 2 change-point adjustments, I would assume that getting exactly the same results would not be very likely. Does that assumption sound reasonable?

If the adjustments can vary the temperature trends as much as I apparently have found for individual stations (there is obviously some averaging out of these differences when all stations are taken into account), then I am curious what that says about the validity of the adjustments. If Version 1, which was used for many years with claims for the validity of its adjustments, can change this much in a newer version with a different adjustment approach, what does that say about the potential for the new version to be inaccurate?

I suspect an analysis of the stations with the biggest version differences might shed some light on these questions.
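One way to quantify the version differences Kenneth describes is a per-station trend comparison, sketched below. The data structures (dicts of `{station: {year: temp}}`) are hypothetical stand-ins, not the actual USHCN file format.

```python
# Sketch: per-station OLS trend comparison between two dataset versions
# (the data structures are hypothetical stand-ins for USHCN v1/v2).
def ols_trend(series):
    """Least-squares slope of temp vs year, in degrees per century."""
    years = sorted(series)
    n = len(years)
    mx = sum(years) / n
    my = sum(series[y] for y in years) / n
    num = sum((y - mx) * (series[y] - my) for y in years)
    den = sum((y - mx) ** 2 for y in years)
    return num / den * 100.0

def trend_differences(v1, v2):
    """Per-station trend difference (v2 - v1) for stations in both versions."""
    return {s: ols_trend(v2[s]) - ols_trend(v1[s]) for s in v1 if s in v2}

# Toy example: the v2 adjustment steepens one station's trend.
v1 = {"011084": {y: 10.0 + 0.005 * (y - 1920) for y in range(1920, 2007)}}
v2 = {"011084": {y: 10.0 + 0.008 * (y - 1920) for y in range(1920, 2007)}}
print(trend_differences(v1, v2))  # about +0.3 deg/century for this station
```

Sorting the resulting dict by absolute difference would surface exactly the stations with the biggest version discrepancies for closer inspection.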

He made the same mistake JohnV made when he did the same analysis: he used CRN 1 and 2 stations that are in urban heat islands. If you research USHCN v2, the authors admit that not all UHI is accounted for by v2. In their paper they do not say it in those exact terms; they say that the algorithms cannot account for inhomogeneities where the same inhomogeneity exists in many of the stations in the same series. UHI, in general, is one of three problems with USHCN v2.

If you know how to duplicate the NOAA graph, it might be amusing to see how badly the raw data has to be mangled before it overwhelms the homogenization. I didn’t see links to previous articles where you might have info on duplicating the NOAA graph.
