Surface Stations

People have quite reasonably asked about my connection with the surface stations article, given my puzzlement at Anthony’s announcement last week. Anthony described my last-minute involvement here.

As readers are probably aware, I haven’t taken much issue with temperature data other than pressing the field to be more transparent. The satellite data seems quite convincing to me over the past 30 years and bounds the potential impact of contamination of surface stations, a point made in a CA post on Berkeley last fall here. Prior to the satellite period, station histories are “proxies” of varying quality. Over the continental US, the UAH satellite record shows a trend of 0.29 deg C/decade (TLT) from 1979-2008, significantly higher than their GLB land trend of 0.173 deg C/decade. Over land, amplification is negligible.
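For readers who want to check figures like these against the published series, a per-decade trend is just the OLS slope of a monthly series scaled by 120 months. A minimal sketch on synthetic data (the numbers below are illustrative, not UAH's):

```python
import numpy as np

def trend_per_decade(monthly_anoms):
    """OLS slope of a monthly anomaly series, in deg C per decade."""
    t = np.arange(len(monthly_anoms))           # time in months
    slope, _ = np.polyfit(t, monthly_anoms, 1)  # deg C per month
    return slope * 120.0                        # 120 months per decade

# Synthetic 30-year series: 0.2 deg C/decade trend plus noise
rng = np.random.default_rng(0)
months = 360
series = 0.2 / 120.0 * np.arange(months) + rng.normal(0, 0.2, months)
print(round(trend_per_decade(series), 2))  # recovers roughly 0.2
```

The same calculation applied to the actual UAH TLT series over the CONUS grid cells is what produces numbers like the 0.29 deg C/decade quoted above.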

Anthony had asked me long ago to help with the statistical analysis, but I hadn’t followed up. I had looked at the results in 2007, but hadn’t kept up with it subsequently.

When Anthony made his announcement of big news, I volunteered to check the announcement – presuming that it was something to do with FOIA. Mosher and I were chatting that afternoon, each of us assigning probabilities and each assigning about a 20% chance to it being something to do with the surface stations project.

Anthony sent me his draft paper. In his cover email, he said that the people who had offered to do statistical analysis hadn’t done so (each for valid reasons). So I did some analysis very quickly, which Anthony incorporated in the paper and made me a coauthor though my contribution was very last minute and limited. I haven’t parsed the rest of the paper.

I hadn’t been involved in the surface stations paper until after his announcement though I was familiar with the structure of the data from earlier studies.

I support the idea of getting the best quality metadata on stations and working outward from stations with known properties, as opposed to throwing undigested data into a hopper and hoping to get the answer. I think that breakpoints methods, whatever their merits ultimately demonstrate, need to be carefully parsed and verified against actual data with known properties (as opposed to mere simulations where you may not have thought of all the relevant confounding factors). To that extent, Anthony’s project is a real contribution, whatever the eventual results.

It seemed to me that random effects methodology could be applied to see the impact on trends of the various complicating factors – ratings category, urbanization class, equipment class. (Using the grid region as a separate random effect even provides an elegant way of regional accounting within the algorithm.) This yielded apparent confirmation in expected directions: a distinct effect for urbanization class in the expected direction; of ratings in the expected direction; and of max-min in the expected direction.
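As a rough illustration of the kind of analysis described, and only that: the sketch below is a fixed-effects dummy-variable simplification on invented data, not the random-effects fit itself, which would be done with something like R's lme4 or statsmodels' MixedLM (where the grid region would enter as the grouping factor):

```python
import numpy as np

# Invented data: 500 stations with a siting-rating class and an
# urbanization class, each shifting the station's trend.
rng = np.random.default_rng(42)
n = 500
rating = rng.integers(0, 2, n)   # 0 = well sited, 1 = poorly sited
urban = rng.integers(0, 2, n)    # 0 = rural, 1 = urban

# Simulated station trends (deg C/decade): base trend plus class effects
trend = 0.15 + 0.05 * rating + 0.08 * urban + rng.normal(0, 0.05, n)

# Design matrix: intercept, rating dummy, urbanization dummy
X = np.column_stack([np.ones(n), rating, urban])
coefs, *_ = np.linalg.lstsq(X, trend, rcond=None)
print(np.round(coefs, 3))  # close to the planted [0.15, 0.05, 0.08]
```

The appeal of the full mixed-model version is that the regional random effect soaks up geographic clustering, so the category coefficients are not confounded with where the stations happen to be.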

Whenever I’m working on my own material, I avoid arbitrary deadlines and like to mull things over for a few days. Unfortunately that didn’t happen in this case. There is a confounding interaction with TOBS that needs to be allowed for, as has been quickly and correctly pointed out.

When I did my own initial assessment of this a few years ago, I used TOBS versions, and I am annoyed with myself for not properly considering this factor. I should have noticed it immediately. That will teach me to keep to my practice of not rushing. Anyway, now that I’m drawn into this, I’ll have to carry out the TOBS analysis, which I’ll do in the next few days (at the expense of some interesting analysis of Esper et al.)

I have commented from time to time on US data histories in the past – e.g. here, here and here, each of which was done less hurriedly than the present analysis.

I posted the following at the Black Board with the hopes of seeing more discussion of the issues raised in the Watts paper:

“I see that SteveM at CA has started a thread on the subject of the Watts paper. Maybe some of these critical details can be revealed there. I would hope that a reasonable discussion could avoid the personality issues. I have questions about the use of change point algorithms that have not been answered to my satisfaction to date and I have a great deal of interest in the benchmarking of these various algorithms by testing against realistic simulated data where the truth is known. I would think that change point analysis could hypothetically be the best method of adjusting non homogeneous temperature data. Unfortunately I am aware of the limitations of these methods when working with noisy data. The key to validating any system is testing it with realistic data.”

SteveM, when you say the following I would agree, but how do you use the actual data, as opposed to simulated data where the truth is known, to test an adjustment process? Obviously, if we know the truth for a given part of the actual raw data, it would be best to use that data. My problem is that I am not at all certain that we can find actual data where we know the truth, and if we could, whether these data would include sufficient typical non-homogeneities to properly test the process.

“I think that breakpoints methods, whatever their merits ultimately demonstrate, need to be carefully parsed and verified against actual data with known properties (as opposed to mere simulations where you may not have thought of all the relevant confounding factors). To that extent, Anthony’s project is a real contribution, whatever the eventual results.”

I was very frustrated by Watts’ lack of specificity in the text of his paper in reference to the temperature data sets he was using. I am assuming that Raw means the raw data before TOB adjustment and Adjusted means the finally adjusted temperatures after TOB and then application of the Menne change point algorithm.
The links below are to the Watts paper text and figures:

Manfred,
I have graphs presented by M.A. Vukcevic that show that the increase in average temperature is mostly caused by higher temperatures in the winter, while summer temperatures are fairly constant.

Manfred, what you suggest is possible, but should be performed blindly and using notarized predictions.
One would, a priori, select some metric which one believed would describe particularities of sites.
One would then predict from the temperature data alone that in a particular locale there are, say, n=14 sites of class (1), n=25 sites of class (2), and so on.
The locations/names/coordinates and the predicted classifications are then legally sealed and stored.
One would then dispatch volunteers to assess sites, without them knowing ANYTHING about an individual site and have them perform the assessment of site quality.
This data is then collated.
Only when the predictive and recorded data-sets are complete would one compare the two.
Such a study would be very powerful.
If one were to base one’s classifications for predictive identification of site quality on the US data from the Watts study, and use it in conjunction with European, Japanese or Australian volunteers, then one would have a very good, statistically coherent study.
I am sure that many of the readers at Bishop Hill could be recruited to examine local station records and siting issues.
However, this would all depend on the predictions being done first, in secrecy, and being deposited in a notarized fashion. This is how similar studies are done in the biomedical field.
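The final unsealing step of such a protocol is simple once both lists exist. A toy sketch, with entirely hypothetical station IDs and class numbers:

```python
# Only after both lists are complete are the sealed predictions and the
# blind field assessments compared. All values here are invented.
predicted = {"stn01": 1, "stn02": 2, "stn03": 2, "stn04": 4, "stn05": 3}
assessed  = {"stn01": 1, "stn02": 3, "stn03": 2, "stn04": 4, "stn05": 3}

matches = sum(predicted[s] == assessed[s] for s in predicted)
agreement = matches / len(predicted)
print(f"exact agreement: {agreement:.0%}")  # prints "exact agreement: 80%"
```

In a real study one would report a full confusion matrix and a chance-corrected statistic such as Cohen's kappa, not just raw agreement, since with only a few classes some agreement occurs by luck.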

I was proposing another ivory-tower algorithm instead, easy to investigate for those who have the skills to process the raw data. The first result would probably be a histogram of raw-data trends for stations with equal trends in tmin, tmax and tmean, compared with the others.

As I understand it, the main issue with Muller’s UHI non-detection is the a priori assumption that low-UHI stations have low dUHI. Station ratings are about UHI, but only dUHI matters for temperature trends. The connection between UHI and dUHI is complicated, as shown in the log (or square) population law; better-sited stations may experience more dUHI due to small changes than already heavily contaminated sites.

Additionally, station ratings are questionable outside the ensemble verified by Anthony Watts’s surface station project.

Comparison of tmin, tmax and tmean trends fills this gap in Muller’s poorly understood and unverified assumption, because dUHI is typically directly visible as differences in these trends.

The result would still be a lower limit for dUHI, because some awkward environment change, or environment changes and weather combined, may affect all trends in the same way.

But it may already be good enough to detect some dUHI. This approach could then be made more complex with additional selection criteria such as station rating.
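A minimal sketch of the proposed tmin/tmax comparison, on synthetic data with an invented night-warming signal (the 0.1 C/decade screening threshold is arbitrary, chosen only for illustration):

```python
import numpy as np

def decadal_trend(series):
    """OLS trend of a monthly series, in deg C per decade."""
    t = np.arange(len(series))
    return np.polyfit(t, series, 1)[0] * 120.0

# Synthetic 30-year station: a dUHI-like signal that warms nights (tmin)
# faster than days (tmax), which the proposed comparison should flag.
rng = np.random.default_rng(1)
t = np.arange(360)
tmin = 0.30 / 120 * t + rng.normal(0, 0.3, 360)  # 0.30 C/decade planted
tmax = 0.10 / 120 * t + rng.normal(0, 0.3, 360)  # 0.10 C/decade planted

gap = decadal_trend(tmin) - decadal_trend(tmax)
print(round(gap, 2))  # roughly 0.2: tmin warming faster than tmax
flagged = gap > 0.1   # crude screening threshold (arbitrary)
print(flagged)
```

Applied across a network, the histogram of such gaps would give the first-cut separation the comment proposes, subject to the caveat above that some environmental changes move all three trends together.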

Please note: many times in the past, commenters here and at other sites have been critical of every author of a climate science paper for a single detail, and assigned responsibility for an error to every author. Perhaps now, it will be allowed that every ‘author’ of a scientific paper doesn’t necessarily know every detail of every point made.

Another point. The fact that a statistician was brought in over the weekend to finish a paper does not reflect well on the entire effort. A scientific paper is not a homework assignment. I’m actually surprised Steve would put his name on such an effort. Apparently, at least one problem has arisen as a result. I have no idea how this work will shake out, but it certainly didn’t start well. Any such work should be gone over with a fine tooth comb before seeing the light of day.

“Please note: many times in the past, commenters here and at other sites have been critical of every author of a climate science paper for a single detail, and assigned responsibility for an error to every author. Perhaps now, it will be allowed that every ‘author’ of a scientific paper doesn’t necessarily know every detail of every point made.”

I’ve been reading this blog since it first came online, and I have trouble thinking of any examples of what you describe. Your description may be accurate for “other sites,” but I don’t think it’s accurate for this one.

It may turn out that there are problems, but the idea behind this was to put it out into the blogosphere for trial by fire. This wasn’t peer-reviewed, and as it turns out, neither was the BEST paper. Both are subject to this pre-peer review review. I don’t think anybody is going to be surprised if problems turn up that require some revision.

It is hard to blog-review in detail without the data they used being publicly available, but we will try to look at what we can.
Steve: Zeke, I’ll talk to Anthony about making the classifications public right now. He’s a bit sensitive from past experience, but I think that there’s a better chance of the classification being put to good use if it’s public now.

I think Anthony is within his rights to withhold his new WMO-standard classifications from all but reviewers and coauthors until this paper is accepted for publication.

But of course if he wanted to release them now, that would be great! I’m dying to know what happens to Wooster and Circleville here in OH, as well as Boulder! Back in 2007, I predicted that Boulder would rise from 3 under the now-obsolete CRN standards to 2 under the equivalent Leroy 1999 standards.

1. Anthony has put it out for blog review and cited Muller as a precedent for this practice. That practice included providing blog reviewers with data.

2. Anthony brought Steve on board at the last minute even though he’s been working on this paper for a year. Steve has a practice as a reviewer of asking for data. Since we bloggers are asked to review this, we would like the data.

3. If they want to release the data with limitations, that is fine too. I will sign an NDA to not retransmit the data and to not publish any results in a journal.

4. You have to consider the possibility that Anthony and Steve could now stall for as long as they like, never release the data, and many people would consider this published paper to be an accepted fact.


Will you continue your association with the paper if the relevant data (stn ids and siting classifications) is not made public and archived in a timely manner?


On the other hand, is it publishable in a journal if it’s completely out there on a blog page?

Release of all the classifications is of course necessary for a complete review. If Anthony is so extremely gun-shy about releasing the whole batch immediately, why not start quickly with a useful subset, such as a truly random 10% or perhaps a few states’ worth?

“A new manuscript by Muller et al. 2012, using the old categorizations of Fall et al., found roughly the same thing. Now, however, Leroy 2010 has revised the categorization technique to include more details of changes near the stations. This new categorization was applied to the US stations of Fall et al., and the results, led by Anthony Watts, are much clearer now. Muller et al. 2012 did not use the new categorizations. Watts et al. demonstrate that when humans alter the immediate landscape around the thermometer stations, there is a clear warming signal due simply to those alterations, especially at night.”

Hmm. I suppose that sitting on data and not reporting adverse results, or merely taking one’s name off the paper, just got a bit dicey.

Perhaps co-author Christy should be sent a notice that the results he testified about were not fully baked.

GDN, you have to remember that preprints have always circulated in every field, but today we have preprint servers such as arXiv which have fuzzed the standard. Publication today is essentially regarded as the publication of a peer-reviewed paper; however, it is always best to consult a journal’s specific policy. This dichotomy is seen in the fact that many people leave the original version on arXiv for people outside the paywall to read, and the journals do not object much.

General practice is to post/send out preprints at the same time as submission because having a manuscript in good enough shape to post/send out means that it is also in good enough shape to submit. There could be a few days either way in general. People are using arXiv today to establish precedent for submissions because of how long review can take.

Mr. Mosher knows my email, and has my telephone number, and mailing address, and so far he hasn’t been able to bring himself to communicate his concerns to me directly, but instead chooses these potshots everywhere.

The project was worked on for a year before we released it; a number of people looked at it at various stages. Dr. John Christy was in fact the one who suggested we should put a note in about TOBS at the end, saying we will continue to investigate it, because he knew it would be an important consideration. I concurred. We also knew that to do it right, the TOBS comparison couldn’t simply rely on the “trust us” data from NCDC. Christy had already been through that with his study of irrigation effects in California and had to resort to the original data on B-91 forms to disentangle the issue.

What we are finding so far suggests NCDC’s TOBS times (we have the master file for all stations) don’t match what the observers actually do. That’s a discrepancy that we need to resolve before we can truly measure the effect along with siting.

Mr. Mosher would do well to note this comparison.

1. When The Team gets criticized on a technical point, they typically dismiss it with a wave of the hand, saying “it doesn’t matter”. Upside down proxies, YAD061, and lat/lon conflations are good examples.

2. When we get criticized on a technical point, we stop and work on it to address the issue as best we can.


Over on The Blackboard, I discussed a problem I have with your Figure 23. The first panel of it displays the location of all the compliant sites with your updated classification scheme. Table 4 of your paper says 13 previously compliant sites are no longer considered compliant. However, when I checked the 71 sites listed as Rank 1/2 on the Surface Stations website, it seemed 30 were missing from your map. I even went ahead and generated an image of your map with the location of those 71 sites marked.

If my results are right, either that figure or that table must be wrong. It’s also possible the problem goes deeper. There’s no way for me (or any other reader) to tell.

Could you tell me if I’ve messed up somewhere, or explain what’s going on if I haven’t?

Actually, I may have an answer to my own question. I think a lot of the sites that didn’t show up on that map are airports. If so, Figure 3 (23 was a typo) may actually be showing Rank 1/2 sites sans airports. It doesn’t say so, and it probably should show the airports, but it’s not a big deal. It would also explain why my visual count of locations on the map came up short.

As an aside, anyone could probably regenerate the station list from that image if they wanted. All not publishing such a list does is add a layer of tedium. It might prevent a perfect replication of the list when sites are located very close to each other, but otherwise…

“We also knew that to do it right, the TOBS comparison couldn’t simply rely on the “trust us” data from NCDC.”

But what do you propose to rely on? A complete reinterpretation of the B-91 forms? I understand that your contention is that even these can’t be relied on.

The fact is that a TOBS adjustment is required if time of observation has changed. NCDC has reports that the times did change. Adherence to stated times may be imperfect, but that doesn’t mean that the reported changes can be ignored.

I highly recommend everyone read (or re-read) this link to a post from last fall by Steve, which is also linked in the second paragraph of this post. It’s a “big picture” view of his thoughts on the subject of BEST and satellite temp data. Also very accessible for those of us who are not fully conversant with all the math and science discussed here.

One issue with the temperature data related to urban/suburban/rural is that, given no microsite bias, all three temperature records are valid. The real point is that they are only valid for the area around them that is uniform with the point at which the data was taken. Heat is heat, but if the site only represents 0.01% of the grid cell, then its contribution is only 0.01 percent. Trying to homogenize that out makes no physical sense. This doesn’t address all the problems, but it would at least take that out of the equation.
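The area-fraction point here can be made concrete with a toy calculation; the anomalies and fractions below are invented for illustration:

```python
# Weight each station's anomaly by the fraction of the grid cell it
# plausibly represents, rather than averaging stations equally.
stations = [
    # (anomaly in deg C, fraction of cell area the site represents)
    (1.50, 0.0001),  # urban site: hot, but represents a tiny area
    (0.40, 0.2999),  # suburban ring
    (0.20, 0.7000),  # rural remainder of the cell
]

equal_wt = sum(a for a, _ in stations) / len(stations)
area_wt = sum(a * f for a, f in stations) / sum(f for _, f in stations)

print(round(equal_wt, 3))  # 0.7
print(round(area_wt, 3))   # 0.26
```

The equal-weight cell average is dominated by the hot urban site even though it represents almost none of the cell's area, which is exactly the distortion the comment is pointing at. The hard part in practice is estimating those representative fractions, which is where metadata and siting surveys come in.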

In keeping with the traditions of climate audit … turnkey code and data? You guys put the paper up for review, we can hardly start without the data.

A few questions.

1. Did you double-check the ratings or just put the data in an algorithm?
2. Since the new site ratings seem to depend upon some manual labor done using Google Earth, did you have occasion to do a spot check on the accuracy of those ratings?

To the latter point, I’ll draw your attention to just one of many comments about the accuracy of Google Earth, and suggest that an audit would definitely have to go down to the raw data there. I hope folks kept records.

“The imagery in Google Earth is stretched based on the angle of the aerial object taking it. Further, the 3D terrain of Google Earth is low resolution, so the imagery can shift depending on the inaccuracy of the terrain. (You might want to try turning off the terrain and making comparisons). To make matters worse, Google has not described in detail how it goes about registering imagery. Human factors are probably involved in alignment and especially in stitching together images for esthetic appearance sake.

The key point is that Google Earth is NOT a GIS or survey-grade dataset. They don’t promise to be, and they discourage its use for those types of applications.”

3. Since Wickham made her station list available to you prior to submission, will you make your station list available to others?

4. Why did you stop at 2008?

5. What you say about amplification here differs from what you wrote in the paper.

6. What does a comparison with CRN show?

7. You use USHCNv2 metadata to classify rural/urban. Did you check that? Do you accept that definition of rural?

8. How were the grid averages computed?

Steve: As I mentioned, I’ve been involved with this paper for only a few days. You know my personal policies. I did some limited statistical analysis, which, to my considerable annoyance, I need to revisit. As you know, I don’t have a whole lot of interest in temperature data, which is an absolute sink for time. So I’m going to either have to do the statistics from the ground up according to my standards or not touch it anymore.

Steve … re: the Google Earth imagery – since the Leroy 2010 standard appears to deal only with sinks and sources within 100m, and realistically, in many or most cases, with sinks and sources much closer than that – is any error in Google Earth of any importance?

Seems no matter how much you warp, twist or stretch, over 100m or less you cannot create a huge error. And even if it were something on the order of 25%, it would seem that would not make all that significant a change in the ratings formula result, with the exception of a few stations with sinks/sources right at 100m that might suffer a shift. I’d think those cases would be very few, if any.

You might think that, but that’s just the kind of thing you want to audit.

As I understand it, photos were used and measurement tools were used on those photos. So I would expect that a complete dataset would include:

1. a copy of the photos.
2. a description of the method used.
3. a calibration test of the method.
4. the measurements produced.

Then you want to check that the rating was actually done properly. Remember, just because the rater says it’s a 3 doesn’t mean it’s a 3. Raters make mistakes.
Was it blind-rated? Did the rater know he was re-rating a site previously rated at 1? Who was the rater?

There are lots of things you want to know, and check, and not take people’s word for. You know, apply the same standards of scrutiny.

Recall, if you will, that Anthony requested paper copies of B-91s to check NOAA. I think Steve may also have made requests for this data. So a standard of checking seems to me to have been set, all the way down to the raw data if it’s available.

Why couldn’t you use the Google Maps satellite view (or other aerial pictometry) as a cross-check?

That data is not stretched or warped. Measure a handful of sites where you think an error may exist in Google Earth, then do the same in Google Maps satellite view (or Terraserver or any of the similar services). That should give you at least a close idea.

All that said, again, considering the Leroy 2010 method uses at most a 100m distance, I just can’t see any such warping or other inaccuracy being more than 5-10 percent, if that. And I cannot think there is any large number of stations where a 5-10 meter error over 100 meters would cause a class change.

I don’t disagree that it should be checked in the interest of accuracy, but it seems a low priority to me.

Steven:
As a very large critic of you and your *style*, I agree 100% with your take on using GE. (I also applaud your relentless hammering for the data and code.) Google Earth is a great program that I use all the time, but you always need to confirm with ground truth and/or USGS topographic maps and/or aerial photography. However, this level of investigation is just a small part of data-collection site evaluation.

For commercial real estate toxics due diligence, we are bound by professional standards to look at historical USGS topo maps, historical Sanborn fire insurance maps, historical city directories, historical air photos, and public agency records. All of this research and subsequent analysis costs between $1,000 and $2,000 per site.

This minimum professional standard of care is very likely beyond the capability of Surface Stations crowd sourcing.

In defense of Watts’ Surface Stations (I am a huge critic of WUWT, BTW), they are attempting to do the first steps of the fundamental research that should have already been carried out in each state and county by undergrads at local colleges and universities, paid for by the feds between NASA, NOAA and USGS. One year, $1M each, and Bob’s your uncle.

Until individual site evaluation is conducted, all of the temperature data sets are “a pig in a poke”, no matter how much statistical lipstick is rigorously applied. It sounds like there are enough holes in the Watts et al. study to discredit and derail the effort of reducing uncertainty in temperature records.

Howard … your numbers and info are quite interesting and valuable. As is your grudging appearance of respect for Watts’ work ;-).

You point out just how ridiculous this whole temperature data process seems to be: alleged experts who don’t or won’t do the comparatively tiny amount of proper due diligence to assure the quality of the temp records.

AGW research has been widely reported to receive billions in funding worldwide, and similarly huge sums in the US … yet they won’t do the simple due diligence of proper site inspections and reviews.

Using your numbers – at $2,000 each – we’re talking a couple of million dollars in total for a detailed professional review of all 1200 stations.

That seems a tiny price to pay – in fact, it would seem that such a survey should be done every 5-10 years on all of the core reporting stations.

“As you know, I don’t have a whole lot of interest in temperature data, which is an absolute sink for time.”

A big reminder of why we should be so grateful to guys like Anthony Watts and Steven Mosher.

Having mentioned three people I greatly respect I’ll add that when I read Anthony’s teaser on Friday and all the hoopla it generated, including the published ruminations of the said Steves, something told me to “Calm down,” as one aforementioned just said to another. (The only evidence you have for my calmness and even boredom at this point is that you won’t find any record of me joining the speculation game before Anthony’s press release on Sunday. Mind you, I liked the guy who ended ‘That’s the report from my gut’. But my gut was telling me that this wasn’t worth the attention. And I think that now Steve Mc may be regretting he gave it the small amount of attention he did.)

That isn’t to say that I criticise Mr Watts in the least. He was having a go at something a bit different. Worth a shot – and once problems are sorted out, who knows what the end result will be. And I also feel that Mosher’s decision months back to get pretty heavily involved in BEST was an excellent one.

I love that feeling of seeming to face multiple ways at once, don’t you? Something I feel is essential to get even close to the truth in the climate game 🙂

Steve, these guys are trolling you. They are trying to manipulate your integrity to go after Anthony. No skeptic believes you contributed anything more to this paper than what was stated nor do they for a moment believe what is posted is the final product. Everything is clearly stated as preliminary and pre-publication.

Steve, the discussion and conclusions of Watts et al. (2012) state: “We are investigating other factors such as Time-Of-Observation changes which for the adjusted USHCNv2 is the dominant adjustment factor during 1979-2008.”

This makes this web publication even weirder to me. At least Anthony Watts seems to have ignored this problem knowingly. How did he expect to pass review while leaving such an important confounding factor out of the analysis? And it is not as if analyzing it would require a whole new study; that might have been an excuse for a less important confounder.

I am aware of the theory behind the TOBS, but it has struck me as an adjustment of a contrived scenario.

Background: I grew up as the son of an aeronautical engineer who had a fluid-based min-max thermometer that he religiously recorded when he came home at the end of each work day and during the weekends. He would plot the readings on 11×17 K&E 1×1mm graph paper, one sheet per year, stacked on top of each other, year after year, for about 35 years.

Did he always make the recording at 6pm? No. He recorded when he got home, or after the hottest part of the day.

I am trying to figure out how a TOBS adjustment can or should be made to a min-max temperature record, provided that there was a gap between the min and max markers and the current mercury levels. A guy doesn’t do this to record bad data.

Yes, cold fronts would come through at 10pm, making the low that occurred at 4am and was recorded at 6pm not the low of the calendar day. That low would be recorded the next day. But how can TOBS adjust for that without recording min-max temperatures many times per day? Does that data exist? I doubt it. And why would it be a TOBS ‘adjustment’ instead of the min of several mins recorded that day?

Does the metadata exist to support blanket significant TOBS adjustments? Convince me that TOBS is not a contrived adjustment to get the answer “we want.”

Steve: allowing for a TOBS adjustment is reasonable enough. When max-min thermometers are read daily, if they are read in late afternoon near the daily maximum, a hot day can end up contributing to the maxima of two consecutive days, and the cooler next day is not counted. The adjustment is made relative to theoretical midnight readings.

I understand the suspicion of these various adjustments, which often seem arbitrary, but this one is fair enough.
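The double-counting mechanism described above is easy to check with a toy simulation. The sketch below uses synthetic hourly temperatures (all parameters invented) and compares the mean of daily maxima from a max-min thermometer reset at 17:00 against one reset at midnight; the afternoon reset lets a hot day's heat set the maximum for two consecutive observation windows:

```python
import numpy as np

# Synthetic hourly temperatures: diurnal cycle peaking at 15:00, with a
# random amplitude each day plus measurement noise. Purely illustrative.
rng = np.random.default_rng(7)
days, reset_hour = 365, 17

hours = np.arange(days * 24)
daily_amp = np.repeat(5 + 3 * rng.normal(size=days), 24)
temps = 10 + daily_amp * np.sin(np.pi * (hours % 24 - 6) / 18).clip(0) \
        + rng.normal(0, 0.5, days * 24)

def mean_of_maxima(temps, reset):
    """Mean daily max when the max-min thermometer is reset at `reset` h."""
    maxima = []
    for d in range(1, days):
        # Observation window runs from one reset to the next
        window = temps[(d - 1) * 24 + reset : d * 24 + reset]
        maxima.append(window.max())
    return float(np.mean(maxima))

bias = mean_of_maxima(temps, reset_hour) - mean_of_maxima(temps, 0)
print(round(bias, 2))  # positive: afternoon resets inflate mean maxima
```

A hot afternoon still dominates the window that starts at 17:00 that day, so it sets the recorded maximum for both that window and the previous one; midnight resets never share the hot hours between windows. The size of the bias depends on the diurnal range and day-to-day variability, which is why the real adjustment has to be estimated climatologically rather than applied as a constant.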

Fall et al. examined raw, TOBS, and fully adjusted data. Their primary conclusions were based on the fully adjusted data, since raw data can have lots of other confounding issues (TOBS changes, instrument changes, instrument moves, etc.) that may be correlated with urbanity or CRN rating and skew the result.

The problem is that everyone who follows this in detail knows that there is a TOBS adjustment. We know why it is made, and we know that the adjustment has been tested and validated. We know that every time you see a weird result with data, or something too good to be true, you check to make sure that the proper TOBS correction has been applied.

In fact, we spent considerable time here at CA going over TOBS.

Anytime Anthony does work, my first question is always:
Did you use TOBS?

Where has it been tested and validated? Citation, please. When I read the original paper the adjustment is based on, the authors made it clear that it was neither tested nor validated against real data, but in fact was largely guesstimated. That it cools the past and warms the present by around 0.4C is enough to tell us it should have been tested somewhere. I’ll be interested in Steve’s findings. I suspect this is a minefield.

“The satellite data seems quite convincing to me over the past 30 years and bounds the potential impact of contamination of surface stations”

Why are you so sure? Have you studied the satellite data and methodology with the same level of auditing scrutiny as you applied to the paleo-reconstructions, to claim so confidently? Or is it simply convenient to say so, in order to dismiss the potentially “toxic” conclusion that the surface data might be “cooked”? Correct me if I am wrong, but the procedures for collecting and processing the satellite data to create a temperature record are extremely complicated, much more so than in the case of the surface record, and both satellite records have undergone more than one revision already, all of those revisions substantially increasing the trend. What is the specific basis for your belief that satellite data has more integrity than the surface record?

If TOBS is a serious problem then I’d say we need to go back to the old observation times, or use both. And it also seems like we should erect shelters at each location that are the same as the original ones, along with the MMTS style shelters. Recording electronic sensors, or digital cameras pointed at thermometers, could be used in the old shelters.

Steve: “Anthony sent me his draft paper. In his cover email, he said that the people who had offered to do statistical analysis hadn’t done so (each for valid reasons). So I did some analysis very quickly, which Anthony incorporated in the paper and made me a coauthor though my contribution was very last minute and limited. I haven’t parsed the rest of the paper.”

So, you allowed your name to be added to the list of coauthors without reading the paper itself?!

Steve: If the paper is submitted anywhere, I will either sign off on the analysis or not be involved. I didn’t “allow” or not “allow” anything in respect to the discussion paper.

“I support the idea of getting the best quality metadata on stations and working outward from stations with known properties,… To that extent, Anthony’s project is a real contribution, whatever the eventual results.”

I agree on the idea, if achievable. But how does Anthony’s project contribute? The data seems to be just a photo (and maybe some Google Earth measurement) at a particular point in time (2009). The metadata needed for trend is of station history.

Nick, maybe Anthony’s project will get some bureaucrat to authorize a few thousand dollars for a site road trip. Reading Anthony’s reply to Revkin, it seems to me that he has a point in his argument about getting out of the office and doing some field work. The basis of the temp trend is the stations. Does it not make sense to know how they are sited and what can affect them before you go through all of your modeling tests?

“I agree on the idea, if achievable. But how does Anthony’s project contribute? The data seems to be just a photo (and maybe some Google Earth measurement) at a particular point in time (2009). The metadata needed for trend is of station history.”

And that can be readily accomplished without leaving your desk. Do not do smiley emoticons.

I agree that Watts’ evaluations are snapshots, but if those snapshots present a different picture than the metadata, what then?

“if those snapshots present a different picture than the metadata, what then?”
They don’t present a “different” picture. They present an unrelated picture. To analyse trend, you need to know about the past. It’s possible that the current photos could be used to aid interpretation of past metadata, but I can see no indication that this has been done.

Oh god, Nick, here we go again. Very obviously I am saying that if the snapshot shows a different picture than the metadata would imply for the time of the snapshot, there is a problem. And by the way, a snapshot could be used for validating metadata even further back in time than the time of the snapshot. For example, there are changes that the snapshot might show that could be tracked by other means, like when a parking lot was blacktopped or an air conditioner was installed. I am under the impression that even good metadata does not account for these micro climate changes seen in these snapshots.

Having said that, I have continued to repeat that one must know when and how and over what time period the micro climate changed to become what is documented in the snapshot, in order to fully utilize it. That is particularly true where studies use only a brief period of the last 30 years. If a really low quality station evaluated today was a low quality station 30 years ago, then we can expect no effect on the 30 year trend. Also, would a slowly evolving change be found by change point algorithms or metadata?

Here is what we know: Someone recorded a min and a max value at a recorded time on a specific day. We will assume the record is good enough so that 4s can be told from 8s, 7s from 1s, 6s from zeros. That potential source of error is for another day.

Someone recognizes that there is a TOBS issue with the max on day B falling on the same day as an unusually cold min. Did that max occur before the cold front, or is it a holdover from the day before?

As I see it, we have two basic assumptions:
1. the recorder is conscientious, so that we can trust the max and min, no action necessary, or
2. the recorder is an idiot or doesn’t care about getting it right.

If 2, then we recognize that the max MIGHT be in error.
Our choices are:
2A: Record the potential error value in the measurement, and thus increase the error bars of our analysis, or
2B: Fudge the number to some TOBS-adjusted estimate and… do what to the overall error estimate? Leave it unchanged?

Naturally, I’m in favor of 1, with 2A as a fall back if the metadata indicates sloppiness of the recorder. 2B seems unacceptable to me ever since high school chemistry lab some 40 years ago.

If people believe TOBS is really important, then it should primarily be seen in increasing the uncertainty of results to a point that few conclusions can be made, not in strengthening the signal.

And what is this “estimated at midnight” baloney? What on earth in the written record indicates that the min happened at 11:50 or 00:10? Does it really make a difference to the 100 year climate record if the “day” was from midnight to midnight or 18:00 to 18:00? Is a six hour timeshift over a 100 year record that critical to the result? Moving the temperature from an absolute written record at 18:00 to an estimated record at midnight does nothing to improve accuracy. My skepticism is pegging off scale.
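For what it’s worth, the mechanism the adjustment targets can be sketched in a few lines of Python. This is a toy simulation with made-up numbers, not the actual NCDC procedure: if a max/min thermometer is reset in the late afternoon, one hot afternoon can contribute to two consecutive recorded daily maxima, so the mean recorded max runs warm relative to a midnight reset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 5000

# hypothetical hourly temperatures: a diurnal cycle peaking mid-afternoon
# on top of independent day-to-day weather swings
daily_level = np.repeat(rng.normal(15.0, 5.0, n_days), 24)
diurnal = 8.0 * np.sin(2 * np.pi * ((np.arange(n_days * 24) % 24) - 9) / 24)
temps = daily_level + diurnal

def mean_daily_max(reset_hour):
    """Mean of recorded daily maxima when the observer resets at reset_hour."""
    windows = temps[reset_hour:]
    windows = windows[: (len(windows) // 24) * 24]
    return windows.reshape(-1, 24).max(axis=1).mean()

bias = mean_daily_max(17) - mean_daily_max(0)  # 5 pm observer vs midnight
print(f"warm bias from 5 pm resets: {bias:.2f} deg")
```

A network switching from afternoon to morning observation would therefore remove a warm bias from the max record, which is the sign of the effect being argued over; the size of the bias here is purely an artifact of the made-up numbers.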

Has anyone ever been able to study whether taking Tmin and Tmax daily over a month, a year, etc. adequately represents what happens to temps all through the varying 24 hour cycles? e.g., aren’t there some days and nights where many more minutes and hours are closer to Tmax or closer to Tmin? Does it all somehow average out or could there be significant distortions of the real “physical” temps because we don’t have 24 hour continuous data in the historical record? Do satellites now provide any adequate comparison for “validation” purposes? I may not be phrasing any of this right, I’m not in this field, but wondering if Tmin and Tmax can be enough for an accurate representation even if we had good enough data for those numbers?

Some stations measure once a day. Others measure once per hour; the newest network (CRN) does it every 10 seconds. There will be different readings recorded by different measurement systems. It doesn’t make much difference as long as the chosen method is applied consistently at each station. Might on average be worth a couple tenths of a degree difference due to the method used.

Something I’ve never seen discussed is what happens if the chosen method for a station gets changed. I think that should result in a separate record being created. Although each method is self-consistent, the difference in methods may end up creating an unwanted step-change in the record if only a single record continues to be maintained.

Columns F-G are Tmax, Tmin.
Column H is (Tmax+Tmin)/2
Column I is the average of 24 hourly readings ultimately derived from the averages of 10 second readings which are taken from the average of three different thermocouples.

Note that sometimes column H is higher than I and sometimes the reverse is true.

It is certainly a well thought out system. Too bad the system is only about ten years old. Also too bad is that they don’t include humidity readings in the data files, which I know they record.
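The min/max question above is easy to illustrate with a toy diurnal profile (made-up numbers, not CRN data): whenever the daily temperature curve is asymmetric, (Tmax+Tmin)/2 and the true 24-hour mean diverge.

```python
import numpy as np

hours = np.arange(24)
# hypothetical asymmetric profile: a short warm afternoon, a long cool night
temps = 10.0 + 12.0 * np.exp(-((hours - 15.0) ** 2) / (2 * 3.0 ** 2))

minmax = (temps.max() + temps.min()) / 2.0
hourly = temps.mean()
print(f"(Tmax+Tmin)/2 = {minmax:.2f}, 24-hour mean = {hourly:.2f}")
```

With this shape the min/max estimate sits above the true mean; a profile with a long warm plateau would bias it the other way. As noted above, the bias mostly cancels in trends so long as the shape of the day doesn’t change systematically over time.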

Output plots are average difference over the year for different observation times, and daily differences for 5pm vs 7am. The yearly average difference for CRNH0202-2011-AK_Barrow_4_ENE has similar shape, but smaller spread.
Steve: very relevant.

That is the kind of issue I was groping toward from my layman’s perspective: that it *might* matter by more than a tenth or two if a temp record is only (Tmin + Tmax)/2.

This is maybe more about the error bars than any specific correction that could be made, but I was thinking about how temps can *sometimes* be volatile during a 24 hr period, especially as weather fronts move in or out, etc. Maybe it all averages out, but if there are any cloud cover changes, as discussed at that BH article, then there might be warming or cooling that is not about “global warming” per se (as anything related to CO2).

The point is this. If I told you that the thermometer was moved from a grassy field to under an air conditioner, you would say that things changed and you would want to investigate that. If the time at which the observation was taken changes, we would also want to investigate that.
We cannot pay attention to changes in observation practice in a selective manner. If we complain about changes to instruments, we have to complain about changes in time of observation. When we actually look at the EFFECT of changing time of observation, we see very clearly that it biases the answer. Changing TOB changes the temperature. Attention to details like this is something that WUWT fans should appreciate. There are two approaches to TOB changes:

1. split the data, and call it two stations.
2. correct the bias.

Bias correction for TOB has been investigated: at John Daly, here at CA, and in the literature.

Ignoring the need for a correction, pretending that observing practice matters for microsite but doesn’t matter for changing the TOB, is not best practice.

I think that Watts’ initial intentions were good in that he kept updating the CRN ratings for stations as the team doing the work turned them in. Some participants at these blogs, including me, did some preliminary calculations. While the results appeared to vary with who was doing the calculations, or more importantly with how the calculations were being done, they were not overwhelmingly different from the adjusted results from USHCN (although I thought that with sufficient data one might be able to see significant differences). The gallery at the time was expecting some dramatic differences, or so was my perception. My point at that time was that the number of CRN 1 and CRN 2 stations was very small, and that given the noisy temperature trend data amongst even closely spaced stations, seeing a statistically significant difference due to CRN rating would require either a very large difference in trends or a larger number of stations in those classifications. I even suggested grouping CRN123 versus CRN45 at that time.
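The sample-size point can be made concrete with a back-of-envelope calculation. All the numbers here are hypothetical, and spatial correlation between nearby stations is ignored, which makes this an optimistic lower bound: with a given scatter of individual station trends, the smallest detectable trend difference between a small well-sited group and a large poorly-sited group scales as sigma*sqrt(1/n1 + 1/n2).

```python
import math

sigma = 0.10              # assumed scatter of station trends, deg C/decade
n_good, n_poor = 20, 600  # hypothetical CRN12 vs CRN45 group sizes

se_diff = sigma * math.sqrt(1.0 / n_good + 1.0 / n_poor)
detectable = 1.96 * se_diff  # roughly the smallest difference significant at 95%
print(f"detectable trend difference ~ {detectable:.3f} deg C/decade")
```

The small group dominates the error term, so adding yet more CRN45 stations barely helps; merging classes (CRN123 versus CRN45) is one way to shrink the 1/n1 term.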

I admired Watts, and particularly his team’s efforts, in going out into the field and looking at the micro climate conditions first hand. I have often thought that climate scientists, like economists, fling around data and statistics of which they do not have an intimate understanding, and the result could be garbage in and garbage out.

I was puzzled when Watts withdrew the updating of the CRN ratings, until I realized he hoped to get the data analyzed and published. He was slow in accomplishing this task, and in the meanwhile others published papers based on the CRN findings before Watts did his first paper. I have not been happy with the approach taken by any of these papers, including the Watts coauthored one.

Now Watts has different rating criteria that evidently give different results, which obviously makes this result, if it were to hold up, a publishable event. I do not understand whether the prepublication is a matter of shopping the results around or not, but I see no way in hell it can be published without the original data and code. After all, Watts is not exactly a climate science regular who might be given that exception.

It seems to me that the big deal with the paper is that a group has taken the trouble to examine the sites and analyse them using physical thinking. This seems to me to be potentially a far superior approach to the sort of adjustment flummery used heretofore. If detail needs sorting out, so be it: at least it won’t be smuggled into the literature, errors and all, by pal review.

I must say, though, that I might not have liked having my name added to a paper in a rush. (It happened to me a couple of times, and both times my new colleagues managed to get my name wrong!!)

Well, I am glad at least Mosher wants to apply the same standards to this paper as to one written by, say, Michael Mann. And I’m afraid I can’t understand how “the statistics” for anything like this can be done over a weekend. It’s troubling. I can’t understand the paper very well, there isn’t enough detail.

The USHCNv2 monthly temperature data set is described by Menne et al. (2009).
The raw and unadjusted data provided by NCDC has undergone the standard quality-control screening for errors in recording and transcription by NCDC as part of their normal ingest process but is otherwise unaltered. The intermediate (TOB) data has been adjusted for changes in time of observation such that earlier observations are consistent with current observational practice at each station. The fully adjusted data has been processed by the algorithm described by Menne et al. (2009) to remove apparent inhomogeneities where changes in the daily temperature record at a station differs significantly from neighboring stations. Unlike the unadjusted and TOB data, the adjusted data is serially complete, with missing monthly averages estimated through the use of data from neighboring stations. The USHCNv2 station temperature data in this study is identical to the data used in Fall et al. (2011), coming from the same data set.

Is it not the case that Anthony is simply using real_climate_science that would underscore a comparison of oranges to oranges thereby avoiding inconsistency from within the real_climate_science community with respect to their own accepted science (oranges) ??

If this is the case then it would seem to me that criticism surrounding (TOB) is a moot point.

I realize it was probably just miscommunication between the two of you, or maybe a last minute honour Watts thought to give you by listing you since you had pitched in, but one of the things that reassured me about the mathematics behind the paper was your involvement. It was disappointing, to say the least, to have enthusiastically touted this paper to some friends, and then see your post.

But … let that be a lesson to me. I despise self-interested cognitive biases in science, but am hardly immune.

I realize this is an interruption in what you would otherwise be doing, Steve, but I hope that you can help to tighten up the paper and salvage the value there is within it, which I hope is high. But, failing that, if it needs to be criticized, I hope you’ll do that too, with respect and rigour both.

Let the science prevail.

(That said, the conclusions of the paper make 100% intuitive sense to me and I won’t be in the least surprised to see them borne out.)

“Watts et al. say a statistically significant signal was found in data using minimum adjustments.”

The fact that a statistically significant signal can be found in a set of data says nothing about the accuracy of that data. And any conclusion you draw from the statistical analysis is only as strong as the underlying data is accurate.

The raw data is known to have problems. If you refuse to address them, and if they’re significant, it’s garbage in, garbage out.

Watts needs to show that homogenization algorithms are wrong. You do that by analyzing the algorithms and showing where they are wrong, not by asserting that an analysis of raw, flawed data must be better just because it shows a lower trend. That’s essentially what Watts is doing.

Steve: Yes and no. I agree with your comment about the importance of addressing problems in raw data – that’s obviously been a major concern of mine with respect to bristlecones, Yamal and so on, where there are problems more serious than “tobs”. I also agree with your remark about assuming something is better because the result meets expectations. Again a criticism of mine with respect to proxy reconstructions.

I also think that the deconstruction of homogenization algorithms is a different job from presentation of the surface stations data classification and that the two jobs should be kept separate.

“I also think that the deconstruction of homogenization algorithms is a different job from presentation of the surface stations data classification and that the two jobs should be kept separate.”

Wrong, because Watts is declaring that the homogenization algorithms are wrong, and pretty much stating that it’s due to a desire to show an inflated trend. He didn’t just present his surface stations data classification (hidden, as Mosher has pointed out, where’s the data?), he says they prove the homogenization algorithms are wrong.

If he didn’t go down that path, I’d agree with that. But not only has he gone down that path, but that’s his entire schtick for years, and that’s the major conclusion of his “work”.

Watts “surface stations data classification” is the result of applying Leroy (2010) siting standards to the existing readily available station data. Not a thing I can see to stop you from duplicating his work and verifying or disproving his results.

Watts identified the data used. He identified the siting standards used. He listed the process they took. And he showed his results, including how the stations shifted in rating categories from the prior Leroy (1999) standards.

I am a complete layman. I read the Watts report, did a little reading – mostly at blogs like here, and with 5 minutes of searching I was able to find all the above data links.

Of course I was a fool for doing so as after doing the digging, had I bothered to read the references I would have found all of this data was listed in the Watts report itself.

I believe that is all of the data required to reproduce Watts work. I even included the NCDC station history metadata in case you don’t want to do the extensive visual and/or onsite inspection Anthony and his help spent well over a year doing.

Seems to me instead of complaining about his work – if you want to refute it you should just jump in and have at it. Do the work and show where he is wrong.

A. Scott: Watts “surface stations data classification” is the result of applying Leroy (2010) siting standards to the existing readily available station data. Not a thing I can see to stop you from duplicating his work and verifying or disproving his results.

One thing that would stop us from verifying his results is that he has not provided a list of the USHCN stations that he has classified, the classifications that were assigned, or the methodology used to assign them.

The fact that Google has aerial imagery, that Leroy 2010 explains a new classification scheme, and that USHCN provides its station data freely to the public does not somehow make Anthony Watts’ refusal to provide the station ids that he used, the Leroy 2010 station classifications that he used, or the methods used to make that classification in his paper any more palatable. Hide the data; hide the code! 😆

Steve: I agree that there is little point circulating a paper without replicable data – even though this unfortunately remains a common practice in climate science. It’s not what I would have done. I’ve expressed my view on this to Anthony and am hopeful that this gets sorted out. Making the data set publicly available for statistically oriented analysts seems far more consistent with the crowdsourcing philosophy that Anthony’s successfully employed in getting the surveys done than hoarding the data like Lonnie Thompson or a real_climate_scientist.

It would have been nice if you’d spoken out on any of the occasions in which I’ve been refused data. You are entitled to criticize Anthony on this point, but it does seem opportunistic if you don’t also criticize Lonnie Thompson or David Karoly etc.

(I know you’re not really a co-author, as it’s normally understood, but until you make him take your name off the paper, you are a co-author. Time to choose, are you, or not? If not, make him remove your name from the paper, publicly.)

Steve has frequently said that when a novel statistical method is introduced, there should be a paper on details of the technique that is separate from the paper using the technique. His comment above seems no different.

Is TOBs bias such an issue given the Watts’ study period was 1978-2008? I note that Fig 3 in the Menne et al paper “THE UNITED STATES HISTORICAL CLIMATOLOGY NETWORK MONTHLY TEMPERATURE DATA – VERSION 2” shows that the TOBs bias trend flattened after 1990… and so the impact on Watts’ study should be limited to only the 80’s (if at all).

In any case from the Menne paper:

“The net effect of the TOB adjustments is to increase the overall trend in maximum temperatures by about 0.012°C dec-1 and in minimum temperatures by about 0.018°C dec-1 over the period 1985-2006.”

So even if we accept that this applies across the entire Watts’ study period, then this is still only a 10th of the trend Watts is highlighting – i.e. 0.145°C per decade.

That’s odd. In my version of the Menne et al BAMS paper, the TOBS adjustments are shown in Fig 4 and the corresponding text, starting on p 996, says: “The net effect of the TOB adjustments is to increase the overall trend in maximum temperatures by about 0.015°C decade-1 (±0.002) and in minimum temperatures by about 0.022°C decade-1 (±0.002) during the period 1895-2007.”

Eyeballing Fig 4, over 1979-2008 the trend difference due to TOBS looks like 0.06 °C/decade.

Got that from the above… also didn’t realise it was different from the BAMS paper – but interestingly the trend for 1985-2006 is specified there [and this is the period of interest].

My point is, given that Anthony is only referencing the period from 1978-2008, something like 50% of the stations would ALREADY have changed TObs (Time of Observation) from evening to morning, so these cannot be an issue. That should logically reduce this already small overall trend. See DeGaetano 2000:

But thinking about this a bit further, maybe there is an MMTS conversion issue in play, and not only in that you go from a LiG reading to an electronic one. Menne et al suggest that most of the HCN sites were converted in the 80’s. Modern base units record daily min/max temps for up to 35 days, so I would assume that this would be done on a strict daily basis (midnight – midnight).

So a conversion from an old style MMTS to a newer one may introduce another TOB issue, where you go from a morning reading to a midnight one. Is this understood and accounted for? Wouldn’t this introduce a warming bias?

Here is the 1986-2006 version. Good to see that peer review works: showing an impact of TOBS changes that ignores most of the 1970-90 switchover is disingenuous at best.

But if the change in observation was from afternoon to morning, I’m not sure the adjustment makes sense. “The net effect of the TOB adjustments is to increase the overall trend in maximum temperatures by about 0.012°C dec and in minimum temperatures by about 0.018°C dec over the period 1985-2006”. The bias in maximum readings is a positive bias, namely the measurement is the max of the tail end of the previous day and the current day’s maximum. How can fixing that bias yield an increase in max temps?

This is a particularly funny thread. People are worried that Anthony rushed things and that led to errors.

Yes, he did, and it did. That wasn’t the point of the timing of the release. He hasn’t submitted, so there’s no need to rush from this point. He’s doing exactly what he said: let the blogs have at it. What Steve Mc is doing here will only add to the paper. The work on the station sitings can stand alone without the TOBS consideration. The TOBS will only give a fuller picture.

The timing was a righteous poke back at some tricks played earlier. I like it. Goose/gander.

From my limited non-technical understanding, the data is readily available publicly, with the exception of Anthony’s siting results using Leroy 2010. This includes the raw and adjusted temp data along with the Leroy 2010 rating specs, which would allow anyone to do their own duplication of the work.

To me that seems preferable here – anyone attempting to duplicate should start from the beginning, rather than working backward from the conclusions.

The paper notes they applied the readily available specs of Leroy 2010 to the Fall 2010 USHCNv2 data set.

They identify the data they use:

“We make use of the subset of USHCNv2 metadata from stations whose sites have been classified by Watts (2009)” and; “site rating metadata from Fall et al (2011)”.

They further narrow:

“Because some stations used in Fall et al. (2011) and Muller et al. (2012) suffered from a lack of the necessary supporting photography and/or measurement required to apply the Leroy (2010) rating system, or had undergone recent station moves, there is in a smaller set of station rating metadata (779 stations) than used in Fall et al (2011) and Muller et al. (2012), both of which used the data set containing 1007 rated stations.”

Seems correct to expect Steven Mosher and Zeke would have access to this station data as Watts used the same data as Muller 2012 in this regard?

They included description of data used, methods – how they calculated numbers, and their conclusions.

To me it would seem much more relevant, for those interested in replicating to follow the entire process – and see how their siting category counts came out.

And only THEN compare to Watts conclusions.

I would also be interested in seeing how the USCRN stations, which were designed per Leroy 1999 (“which was designed for site pre-selection, rather than retroactive siting evaluation and classification”) fare under a review using Leroy 2010.

Watts 2012 notes “Many USHCNv2 stations which were previously rated with the methods employed in Leroy (1999) were subsequently rated differently when the Leroy (2010) method was applied in this study”…

Again, it would be very interesting, and potentially valuable, to see if the new USCRN sees the same siting quality results using Leroy 2010.

Having personally visited the CRN site north of Seattle, and seen photos of a number of the other sites, I’d be quite surprised if any ended up below the top site ranking using Leroy 2010.

Regarding replication, it seems to me that the different aspects such as statistical analysis, checking Leroy 2010 scores, and so forth, are best done by those with expertise and interest in those areas. There’s no reason to demand that a single person or group do it all, or that it be done at the same time, or even in a certain order.

Personally, I think it will be pretty clear upon looking at 10 or 20% of the sites whether Anthony’s new Leroy 2010 scores are done correctly, and that evaluation of the statistical analysis should not wait for a re-scoring of all the sites.

Steve, you note that amplification is negligible over land in models with respect to long term trends. This is true globally in models, but I have to wonder the extent to which this effect varies (in models) from location to location. I wonder about this because a few years back I empirically estimated the amplification factor globally based on interannual temperature fluctuations. I found it to be in the model ball park (perhaps a bit larger, actually), which implied a large warming bias in the surface data, a large cooling bias in the satellite data, some less large combination of those two, or an unknown real climatic effect on lapse rate variation that only operates on the long term and is absent from current models:

I was motivated to see if I could get a similar result for the US, so I compared USHCN data from NCDC with UAH data for the same area. Much to my surprise, the slope for twelve month smoothed and subsequently detrended data (UAH as X, USHCN as Y) indicated more variation of temperature at the surface: a slope of about 2.23 (2.32 if you don’t detrend). This leads to a very slight cooling of the surface relative to the satellite record adjusted to surface variation levels. This suggests to me that global trends are biased warm but there is not likely to be a significant bias in the US record.
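A sketch of that kind of calculation on synthetic data (the series below are made-up stand-ins for UAH and USHCN, with a built-in surface/satellite variance ratio of 2): smooth both series with a 12-month running mean, detrend each, then regress surface on satellite.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 360  # 30 years of monthly anomalies
t = np.arange(n)

# synthetic stand-ins for the two records, sharing interannual weather
# but with the surface varying twice as much (the built-in "amplification")
common = rng.normal(0.0, 0.2, n)
sat = 0.0015 * t + common + rng.normal(0.0, 0.05, n)
sfc = 0.0015 * t + 2.0 * common + rng.normal(0.0, 0.05, n)

def smooth12(x):
    """Twelve-month running mean."""
    return np.convolve(x, np.ones(12) / 12.0, mode="valid")

def detrend(x):
    tt = np.arange(len(x))
    slope, intercept = np.polyfit(tt, x, 1)
    return x - (intercept + slope * tt)

x = detrend(smooth12(sat))
y = detrend(smooth12(sfc))
amp = np.polyfit(x, y, 1)[0]
# recovers a value close to, slightly below, the built-in factor of 2
# (measurement noise in x attenuates the OLS slope)
print(f"estimated amplification: {amp:.2f}")
```

One design caveat worth noting: noise in the x variable biases an OLS slope low, so a regression like this tends to understate the true amplification, and the 12-month smoothing leaves the residuals autocorrelated, so naive confidence intervals on the slope would be too narrow.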

So Climate Audit has new rules. My own experience was that
1) if you are impolite
2) if you have intellectually nothing to offer
you get snipped.
Seems this guy, who is and has, is treated differently.

The phrase says more about the person who used it than it does about Steve, so leaving it in place could be a fair response.
Steve: Precisely so. I expect regular Climate Audit readers and commenters to comment politely and am disappointed when they don’t. If someone does not comply with such policies, I prefer that people do not respond to such comments.

General practice is to post/send out preprints at the same time as submission because having a manuscript in good enough shape to post/send out means that it is also in good enough shape to submit. There could be a few days either way in general. People are using arXiv today to establish precedent for submissions because of how long review can take.

As Eli recalls that is what Berkeley did, and it is quite standard. What Watts did is post a draft, and a draft full of blunders.

You may not call it a “draft”, but the initial Berkeley release was rushed out filled with a generous amount of errors. Furthermore, it appears that the updated “manuscript in good enough shape to post/send out” is not quite so good enough. However, the BEST folks did not seem to be hindered from milking the media without indicating the publication status of the document.

RomanM,
It is true that we do not know the publication status of the BEST document. We could, however, interpret Mosher here as implying that McKitrick’s review may have been found to be “not quite so good enough” by the editors: http://scienceblogs.com/stoat/2012/07/30/cage-fight/

SOP at AGU journals now. If they think major changes are needed they reject with a suggestion to rewrite in view of the referee’s reports. It unclogs the pipeline. BEST has updated their web page to show that one of their papers has been accepted subject to some changes in the methods paper which is also under consideration.

This has been the polite language for rejecting non-viable statistics journal manuscripts as far back as I can remember. At that point, the paper is no longer under consideration for publication by the journal so it has indeed been rejected. Should the manuscript be rewritten, it is resubmitted as a new paper.

Your efforts in spinning facts based on concepts such as “it depends on what the exact meaning of the word rejected is” come across as comical…

Steve- I normally don’t snip critics but you’re making an untrue factual allegation here. I did not do “much of the statistical analysis” in the paper. I did not even see the paper until Friday; I did one analysis, which unfortunately did not catch a latent problem. It would have been more appropriate to acknowledge me than to list me as a coauthor, but unfortunately I did not catch this as the grandchildren were over visiting on Saturday night and Sunday morning and I missed some emails.

Not to beat on this too much, and Eli suspects that is what Tony told you and what he believed, but Christy is ALSO an author and was quite aware of the hearing. He could have been pushing Watts without showing his hand. Just sayin’, but in any case the optics are awful.

There’s no reason Anthony and coauthors couldn’t now take it down, now that it’s been helpfully crowd reviewed. This wouldn’t be a retraction, since it was just a circulation draft in the first place. Leaving it up creates the impression this is a semi-official version.

Please keep in mind that Anthony’s pre-publication covers the USA 48 contiguous States (CONUS). Ultimately, we seek a world estimate of temperature change, with uncertainty quantified, plus more confidence in ascribing change to various factors.

In Australia (another 2% of the area of the globe) there has been discussion about writing a pre-publication similar to the above. However, this might be impossible to do well, if at all. It would be even harder to do, for example, over the Antarctic.

The Watts pre-publication deals mainly with 2 techniques. The first is microsite changes of some magnitude, identified using methods attributed to Leroy (2010). In Australia, there are over 1,200 sites with temperature data, but there are also many sites where man has placed few or no objects that can store and re-radiate heat in the way the Leroy method compensates for. Conversely, there are many that could be analysed this way, but why bother when there are many without the complications?

The second major Watts technique relates to the ways that USA data are treated after collection. ‘The identified biases include station moves, changes in instrumentation, localized changes in instrumentation location, changes in observation practices, and evolution of the local and microsite station environment over time.’

Again, there are many Australian sites where many of these factors are absent. It would be perverse to seek out sites heavily affected by these to see if the Watts corrections work in Australia as well as in the USA, though a handful of important sites could use examination.

I raised Antarctica in these 2 contexts. There is a relationship between analysis method and the population density of countries. There has been a good deal of prior work done in Australia, which has a land area similar to CONUS but 5% of the USA population. The work that is being done tends towards the conclusion that official estimates are inflated; qualitatively, it would not be surprising to find that more work on Australian data would give trend results similar to those reported by Anthony & Co. (The story of Steig et al 2009 and its rebuttal by O’Donnell et al 2010 is well documented for the Antarctic.)

In short, one can select a number of pristine Australian sites to analyse for trends over the last 30 years. I have done some 45 of these and found a wide trend variation, from about +0.47 to −0.27 degrees C per decade linear, far greater than the CONUS trends. Therefore, we seem to have a noise problem. If there is to be further development of statistics as Steve Mc has foreshadowed, for example to cope with TOBS, then it would be great if that statistical proficiency could also dig a little deeper into signal:noise topics.
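The per-station trend calculation behind figures like these can be sketched in a few lines. This is a generic OLS trend on hypothetical annual data, not the actual Australian station analysis; the 0.15 C/decade signal and 0.3 C noise level are illustrative assumptions of mine:

```python
import numpy as np

def decadal_trend(years, temps):
    """OLS slope of annual mean temperature against year, in deg C per decade."""
    slope, _intercept = np.polyfit(years, temps, 1)   # slope in deg C per year
    return slope * 10.0

# Hypothetical 30-year annual series: a 0.15 deg C/decade signal plus noise
rng = np.random.default_rng(0)
years = np.arange(1979, 2009)
signal = 0.015 * (years - years[0])
print(round(decadal_trend(years, signal), 6))   # recovers the 0.15 signal
print(round(decadal_trend(years, signal + rng.normal(0, 0.3, years.size)), 3))
```

With 0.3 deg C of year-to-year noise on a 30-year series, the standard error of the fitted slope is roughly 0.06 deg C/decade, so individual station trends can scatter widely even around a common signal, consistent with the spread reported above.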

Finally, the vast bulk of liquid-in-glass observations in Australia were made at 0900 hours, so TOBS can be selectively ignored.

Yes, the validation of historic temperatures is boring and not mentally stimulating, but without a verified temperature base, there is not much point in using advanced statistics. That has been a criticism of BEST.

Since my instinct is that there has been a bit of warming, and one of my complaints has been that there seems to have been no non-risible estimate of it based on land surface measurements, I am glad to think that, after due criticism and correction (if required), there may soon be such a non-risible estimate.

As for what might have caused such a rise, Lord knows. Is there any evidence worth a hoot?

A (perhaps naïve) question: Do MMTS stations have the same TOBS issues, or do they record the 00:00-23:59 max/min temperatures properly in a way that doesn’t require a manual reset at some specific time of day? I ask because the Watts et al result for “Rural MMTS, no Airports” (see slides #45 and #52 in the “Overview” PPT http://wattsupwiththat.files.wordpress.com/2012/07/watts-et-al-station-siting-7-29-12.ppt) is the most eye-catching of all, with practically no warming.

I’m pretty sure MMTS would have TOBS issues if there was a change in the time of observations. This shows that MMTS thermometers have to be manually read, which implies they’re capturing the same min/max over the last 24 hours.

It seems that this has changed recently, though: Anthony now has an update in his second discussion second thread where he notes that “With the advent of the successor display to the MMTS unit, the LCD display based Nimbus, which has memory for up to 35 days (see spec sheet here http://www.srh.noaa.gov/srh/dad/coop/nimbus-spec.pdf) they stopped worrying about daily readings and simply filled them in at the end of the month by stepping through the display.”

As I said earlier, a conversion from an old-style MMTS base station to a newer one may introduce another TOB issue – where you go from a morning reading to a midnight one. Is this understood and accounted for? It should introduce a warming bias.
Steve: this issue is well known to specialists.

I can see why SteveM is a little peeved. If one is going to be involved in a project one should be there from the beginning, even if one’s contribution is small and comes towards the end.

Anthony Watts and his ‘team’ made an enormous effort to get to where they are now. They have seen NOAA/NASA take their data and publish a rebuke long before any reasonable conclusions could be made and even before the data was complete. The derisive comments here from the lukewarmers at Lucia’s site and the trolls from RC are, IMHO, missing the ‘trick’ that Anthony and his team are doing.

They have made public the body of a paper which has not yet been submitted for publication. Whether by design or accident (and only they know), this has allowed them to refine the body of the paper and pick up possible data options which will make the paper more ‘airtight’. If you are not a GW supporter you need your work to be very much more precise, accurate and without error than if you are a ‘team’ member. As we have seen from the likes of Mosher, owzyafarther etc, non-team-member papers are treated in a much more rigorous manner than otherwise would be the case.

As a senior project manager I was guilty of using and abusing people like SteveM because their input(s) are guaranteed to aid the success of the project, which was always my primary target. The people I abused in this way were not always happy about it, and that led to a reduction in their valuable work. There is a balance to be struck, I now realise, and I hope that AW and SMc will find a way. This work is too important not to be completed with its best possible outcome.

For me; I think a public apology to SteveM (I suspect that Steve doesn’t need it) might go a long way.

Steve: Anthony and I have chatted. There was a misunderstanding due to the rush and the time zone and my grandkids staying overnight. I signed off on Saturday at dinner and we went out with them in the morning. I missed some emails in the evening and Sunday morning until later. I would have suggested an acknowledgement. But again, in the rush, I missed something that I wouldn’t normally miss and I’m very annoyed at myself.

Anthony: Steve created an entire section, and in fact referred to it as “my section” in emails. An acknowledgment would have been insufficient in my view.

As much as I regret that I have to wait longer for the promised report on Esper 2012, I have a feeling that a serious look at TOBS might be rewarding. It might just show that US meteorologists did a good job during the last 30 years (I am talking about the people using the equipment), and that TOBS, as it should be in the age of the computer and automated measuring, is no serious issue anymore.

This entire debate about TOBS is a red herring, and here is why: if the difference between the raw data and the final product is mainly due to the TOBS adjustments, then you would expect to see similar raw trends for rural and urban stations, and for “compliant” and “non compliant” stations, don’t you?

However, that’s not the case at all. The raw trend for all the 1,2 rural stations, airports excluded, is 0.108 C per decade, three times lower than the reported official trend. And when you take into account just the MMTS rural stations without airports the trend is 0.032, essentially flat! At the same time the raw data for the non-compliant 3,4,5 class stations show 0.212 C; for all stations, urban and rural 1,2 – 0.155; for the class 3,4,5 stations raw 0.246; and for all stations adjusted 0.3. How come the TOBS adjustments for the good and rural stations are so much higher than for the bad and urban ones? Obviously the bulk of the difference between the good and bad stations has nothing to do with TOBS. And the MMTS rural 1,2 stations, with the trend 0.032, are the only ones which are relevant for assessing the real climatic warming. So the entire fuss about TOBS is beside the point.

From the paper:

“The gridded average of all compliant Class 1&2 stations in the CONUS is only slightly above zero at 0.032°C/decade, while Class 3,4,5 non-compliant stations have a trend value of 0.212°C/decade, a value nearly seven times larger. NOAA adjusted data, for all classes of rural non-airport stations has a value of 0.300°C/decade nearly ten times larger than raw data from the compliant stations.

These large differences demonstrated between regional and CONUS trends accomplished by removal of airports and choosing the rural subset of stations to remove any potential urbanization effects suggests that rural MMTS stations not situated at airports may have the best representivity of all stations in the USHCNv2.”

Question: Why then does the same paper trumpet the 0.155 trend as relevant, when it includes both airport and urban station data, as well as measurements made by the older and less reliable equipment?

then you would expect to see similar raw trends for rural and urban stations, and for “compliant” and “non compliant” stations, don’t you?

No, because TOBS biases are much more important for rural stations than for urban stations, and rurality in turn happens to be correlated with station quality. That’s the whole point of the “confound” term used by Steve in the post.

First, the “good” stations could be equally urban and rural, depending upon micro-siting. Quality does not have anything to do with urban vs rural.

Further, why is TOBS more important for rural stations? You mean, the “real” climatic trend is larger than the raw trend at urban, poorly placed stations? And that the more urban and more poorly placed the station is, the less likely it is to experience problems with TOBS?

Finally, if the rural stations are generally more likely to have TOBS issues than urban ones, are the non-airport rural stations then also more likely to have TOBS issues than the airport rural ones (since the trend at the rural airports is three times higher than at the non-airport rural stations!)? And do the MMTS rural stations have much greater problems than CRS rural stations, since the MMTS trend is about flat, whereas the CRS trend is 0.108 C per decade? Man, that would really be a fine-tuned intelligent design, made by God in order to preserve the global warming hype! 🙂

To summarize your and Steve’s argument – the fact that the good and rural stations show almost no warming trend whereas the badly placed and urban ones show huge warming is not a consequence, as one might think, of the latter being affected by, you know, UHI, but, au contraire, of the former not being “properly adjusted”. I knew there must be some logical explanation.
Steve: please do not presume that I’m overlooking obvious points. Rural good stations whose TOBS changes from afternoon to morning have measurably lower trends than rural good stations with no TOBS change. You can’t ignore this merely because it gives a result that you “like”.

How about rural good stations with MMTS having a substantially lower trend than good rural CRS (if I am not wrong, the MMTSs do not have any TOBS issues, since they automatically record the highest and lowest temperature for a given day?)? Or rural airport stations having a 0.240 C trend, whereas non-airport rural ones have from 0.032 to 0.108, depending upon location and measurement technique? Is that also due to TOBS changes? I am not claiming that the TOBS adjustments are irrelevant, I just don’t see how they could explain those differences.
Steve: TOBS is a different issue than automatic recording. It affects MMTS as well. I don’t know why you’re arguing in such categorical terms. I didn’t say that TOBS accounts for everything. Only that it is a confounding factor that needs to be disentangled and it wasn’t. The statistical analysis needs to be re-done. It will be re-done.

ok, I support your effort to recalculate. But the reason why I was and am skeptical is the following: we have data of various degrees of validity according to several independent criteria. The best data available have the lowest trend, and, lo and behold, as you go further to less and less reliable data, the trend increases. Occam’s Razor seems to favor the elementary explanation that urban and poorly sited stations have an artificial warming bias, especially visible at airports. This relationship in the data is so obvious, so overwhelming, that it seems to me exceedingly unlikely that it could be just a product of some simple non-adjusting error. The TOBS problem would have to “target” with such precision only the good and rural stations, and so many factors would have to coincide, in order to make TOBS a significant factor. The MMTS rural, non-airport data would have to have dramatic TOBS problems, but not the rural MMTS airport stations; 1,2 compliant urban stations would have to have TOBS issues, but not the 3,4,5 class urban stations; MMTS rural stations would have to have systematically more TOBS problems than CRS; the airports in general would have to have much fewer problems than the non-airport stations. And on it goes. What is the basis for these expectations/predictions? Belief that all those factors could operate in harmony borders on religion.

Ivan, it would be worth your time to look into a sample of actual stations in order to better understand how temps were recorded in real life. There are many sources of non-uniformity that can correlate with station quality or location.

Here are a few that I’ve seen:
1) High-quality rural stations are often run on an individual basis rather than by an institution (e.g. farmer volunteer vs. airport office). If a single individual is responsible for the readings in the former case, TOB may take a big jump as that responsibility passes from person to person; in the latter case, there are often “official” procedures in place specifying TOB.

2) The MMTS (the “first” version, not NIMBUS) is not necessarily “better” than the manual liquid thermometer/Stevenson screen system. For example, IIRC the MMTS has to be manually reset each day to clear it for monitoring the next day’s min/max temps. In addition, the MMTS outdoor sensor is physically connected to the indoor control box via a cable. The default cable shipped was of limited length, and so greatly limited the siting choices for the MMTS temp sensor vs. the connection-free earlier manual liquid thermometer.

you are just illustrating my point. It is one thing to say that the rural stations may not be as reliable as some people think, because of the general problems with personnel you note, but quite another to claim that the data must of necessity exhibit a strong cooling bias! Why would changing persons necessarily alter TOBS so as to increase the trend? Why not decrease it? What is the basis for believing that there exists ANY bias on that account, let alone a warming bias? Various changes could simply cancel each other out.

Also, your second paragraph only strengthens my case; you are pointing out that there is likely a warming bias in MMTS.

Ivan, all I’m saying is that before making claims and assumptions about what is likely and what is not, one needs to look at the actual data.

There are many possible sources of bias, some of which may be issues in practice, and some of which may not. For those that are issues, they are not necessarily random, but may be systematic. One really needs to look at the actual data and metadata in order to get a feel for the specific situation — the data awaits you!

This is certainly an interesting discussion — is TOBS important? Is the temperature record a time sink? Is the temperature record accurate? If somebody asked me if the temperature record was accurate — I would ask “Which one”?

Analyzing and clarifying the temperature record may be a time sink. However, if I understand correctly the paleoclimate records are calibrated against these many temperature records — and not everybody chooses the same record. (Correct me if I am wrong.) So my understanding is that the most meticulous paleoclimate record could be rendered worthless by calibrating against a temperature record of dubious accuracy. If one has a paleoclimate series of dubious accuracy plotted against a temperature time series of dubious accuracy — what exactly was shown by all this effort?

…and I did not even ask about TOBS yet. I read Anthony’s latest comment (in his second comment thread) on how the data collection was done and I agree with his general premise that corrections for TOBS might not be as revealing as some imagine. His comments on the data collectors and the methodology match what I know about problems in other areas — ones that are far less problematic than a partly volunteer data collection network.

I guess those are the sort of things a simple layman (in climate science) might ask of the experts. I hope this makes sense to others.

Steve: the temperature data issues are not really relevant to paleoclimate calibration as the uncertainties and issues are not germane. I begrudge the time because the differences in dispute are IMO rather small, but the data sets are large and complicated.

I think that the newest records done with the “memory” stations could be high value — a personal opinion. I think Anthony makes the point very well that the confusion with TOBS might not invalidate previous records but perhaps degrades their worth to some unquantifiable degree. Much of the confusion appears to be due to human nature: people simply did not record data at the same time every day. There appears to be no way to untangle that portion of the data. Maybe somebody can develop a statistical test that proves when the recorders were visiting their grandkids or grocery shopping. As much as I respect our host I suspect that developing that test is beyond even him — perhaps even Drs. Hansen and Mann.

However, Anthony does point out effectively that some of the recent data could be just fine and perhaps should become the gold standard for judging the remaining data. Since the change to the new record keeping thermometers in the 1980s (I think) maybe there is not enough data to perform any reliable calibrations.

I should clarify one point. Every project that I am currently working on professionally came about because of the Global Temperature Record and the “proof” of AGW. Every project that I am not working on was cancelled because of the same proof in the sense that projects were cancelled because of extra costs in resource exploration and extraction. IOW — It would be difficult to argue that extra costs and at least some of the economic downturn did not occur because of concerns about GHG emissions. Bad economy — no projects. Bad economy for many reasons of course — but there is a contribution in my area from this particular debate. Call this debate a contributing factor of some undefinable proportion and leave it there please. It’s not the point of this discussion.

So does this discussion affect me? Rather directly I would say — right in the pocketbook — at least in my current projects.

“1.A quality control procedure is performed that uses trimmed means and standard deviations in comparison with surrounding stations to identify suspects (> 3.5 standard deviations away from the mean) and outliers (> 5.0 standard deviations). Until recently these suspects and outliers were hand-verified with the original records. However, with the development at the NCDC of more sophisticated QC procedures this has been found to be unnecessary.

2.Next, the temperature data are adjusted for the time-of-observation bias (Karl, et al. 1986) which occurs when observing times are changed from midnight to some time earlier in the day…”

——————————-

The first step should already remove some of the error due to double counting tmin or tmax measured at critical times such as 7 am or 2 pm. It actually removes the cases with the largest contribution to the TOBS adjustment – double counts with a large difference from the true value. I don’t see any reduction of the TOBS adjustment for errors already removed in step 1.
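The quoted step 1 can be sketched as follows. The quote does not specify the trimming fraction or whether the standard deviation is itself trimmed, so the 10% trim and ordinary sample SD here are assumptions:

```python
import numpy as np

def trimmed_mean(values, trim=0.1):
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    v = np.sort(np.asarray(values, dtype=float))
    k = int(trim * v.size)
    return v[k:v.size - k].mean()

def qc_flag(value, neighbours, trim=0.1):
    """Flag one station reading against surrounding stations, using the
    thresholds quoted above: > 3.5 SD -> "suspect", > 5.0 SD -> "outlier"."""
    z = abs(value - trimmed_mean(neighbours, trim)) / np.std(neighbours, ddof=1)
    return "outlier" if z > 5.0 else ("suspect" if z > 3.5 else "ok")

neighbours = np.arange(20.0, 30.0)   # ten surrounding-station readings
print(qc_flag(25.0, neighbours))     # ok
print(qc_flag(37.0, neighbours))     # suspect (about 4.1 SD from the mean)
print(qc_flag(45.0, neighbours))     # outlier (about 6.8 SD)
```

A reading flagged this way would then go to hand verification (or, per the quote, to NCDC’s newer automated QC), so the point in the comment stands: many of the worst double counts never reach the TOBS adjustment step.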

I see that a comment I posted has been moderated away. In it I said that the paper was the result of the efforts of volunteers without government or university grants or other aid, that you, Steve McIntyre, would do your usual thorough and objective analysis of the TOBS issue, and wondered whether TOBS would be an issue for all classes of stations, which might not change the qualitative results of bias resulting from poorly placed weather stations.

If I inadvertently violated blog protocol or offended someone, I certainly am sorry since that was not my intention. That being said, I am curious why my comment was removed.
Steve: If people are going to participate in public debate, their work should be held to professional standards. I deleted the comment, because I don’t want to support excuse making.

1. Did you double check the ratings or just put the data in an algorithm?
2. Since the new site ratings seem to depend upon some manual labor done using Google Earth: Did you have occasion to do a spot check on the accuracy of those ratings?
3. Since Wickham made her station list available to you prior to submission, will you make your station list available to others?
4. Why did you stop at 2008?
5. What you say about amplification here differs from what you wrote in the paper.
6. What does a comparison with CRN show?
7. You use USHCNv2 metadata to classify rural/urban. Did you check that? Do you accept that definition of rural?
8. How were grid averages computed?

Reading the rest of thread, I’m at a loss to find cogent answers to any of them. Anyone?

Steve: better to ask Anthony. As I mentioned in the post, I was not involved in the writing of the paper other than contributing a rushed statistical analysis that unfortunately exacerbated the TOBS problem. Anthony was trying to be polite by adding me as a coauthor, but an acknowledgment would have been appropriate. I was offline on Saturday night and Sunday morning as our grandchildren were visiting and didn’t deal with this issue and it got overtaken by the rush. An unfortunate misunderstanding. However, since I’ve had my fingers burned, I’m now chipping in and trying to ensure that the matter is dealt with correctly. In the meantime, please don’t grind at me as though I was the person who did the work. I can inquire on some of these issues, but that’s all that I can do.

The TOBS adjustment is based on the idea that for a once-a-day observation of the minimum and maximum temperatures of the previous 24-hour period, the closer the observation time is to the typical time of minimum or maximum temperature in the diurnal cycle, the more likely it is that there will be a duplicate or near-duplicate reading for two days. Averaged over many days, this could artificially bias the minimum or maximum to be more extreme, if not corrected for.

As the trend in the US rural stations, which at least until very recently employed these min/max instruments, has been from early evening observation (5pm or 7pm in most of the sources I’ve found) to early morning observation (usually 7am), this has been presumed to put an artificial cooling bias into the temperature record, so a net positive correction, increasing as more stations have been converted, has been added to the raw data.
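The presumed mechanism can be illustrated with a toy simulation, assuming a sinusoidal diurnal cycle and day-constant weather noise (both simplifications of my own, not from any paper). Daily minima are taken over 24-hour windows ending at the hour the observer reads and resets:

```python
import numpy as np

rng = np.random.default_rng(1)
days = 3650                                   # ten years of hourly data
hour = np.arange(24 * days)
# Sinusoidal diurnal cycle (max near 3pm, min near 3am) plus weather noise
# held constant within each calendar day
temp = (10.0
        + 5.0 * np.cos(2 * np.pi * ((hour % 24) - 15) / 24)
        + np.repeat(rng.normal(0, 3, days), 24))

def daily_minima(series, reset_hour):
    """Min/max-thermometer minima: the minimum over each 24 h window ending
    when the observer reads and resets the instrument."""
    t = series[reset_hour:reset_hour + 24 * (days - 1)]
    return t.reshape(days - 1, 24).min(axis=1)

# A 7am reset sits near the diurnal minimum, so an unusually cold morning
# can set the minimum of two successive windows; a midnight reset cannot
bias = daily_minima(temp, 7).mean() - daily_minima(temp, 0).mean()
print(round(bias, 2))   # negative: morning observation biases the minima cool
```

Under these assumptions the morning-read minima average a few tenths of a degree below the midnight-window minima, which is the cool bias the positive TOBS correction is meant to undo; the size of the effect obviously depends on the assumed noise and cycle amplitudes.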

Since the more recent 7am readings have been very close to the typical sunrise minimum daily temperature, the problem of “duplicate” minimum daily temperatures should be particularly acute. However, I have seen anecdotal reports that many of the volunteer COOP weather observers have long been aware that this would yield bad daily readings on many days, and so would reset the minimum reading in the early afternoon so that the next morning’s reading would always show the low for that morning and not data from the previous morning.

Now, I realize that the plural of anecdote is not data… but it seems to me that it would be quite straightforward to evaluate properly if this is the case for individual stations. For a station, when the time of observation is shifted from early evening to early morning, the number of “duplicate” minimum readings in the raw data should increase greatly if there is no other resetting. Probably the best measure would be to compare the lag-1 autocorrelation in the minimum readings before and after the change. If there is no significant increase in this metric, the adjustment may well be unwarranted.
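The proposed lag-1 autocorrelation check is straightforward to sketch. The carried-over-minimum behaviour below is simulated, not drawn from real station records, and the change-date bookkeeping for an actual station is left out:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return float(d[:-1] @ d[1:]) / float(d @ d)

# Simulated daily minima: independent before the change of observation time;
# after the change, a cold morning's value is carried into the next day's
# reading (the "duplicate" described above)
rng = np.random.default_rng(2)
before = rng.normal(5, 3, 400)
after = rng.normal(5, 3, 400)
cold = after < 2
after[1:][cold[:-1]] = after[:-1][cold[:-1]]   # repeat yesterday's cold minimum

print(round(lag1_autocorr(before), 2))   # near zero for independent minima
print(round(lag1_autocorr(after), 2))    # elevated when duplicates are common
```

Applied to a real station, a marked jump in the lag-1 autocorrelation of the raw minima, timed to a documented shift to morning observation, would support the adjustment; its absence would suggest the observer was already resetting in the afternoon as described.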

Mike B at Tamino’s blog has pointed out that substantial chunks of the current draft (Sections 2.1 and 2.2 in particular) are word-for-word identical with Fall et al 2011. These will obviously need revising before publication to avoid accusations of self-plagiarism. Just a heads-up for Steve M, assuming he wishes to stay a co-author on the final paper.

(1) Fluctuations are generally likely to be greater, the colder the average temp one is dealing with. So, tropics: tiny fluctuation. Temperate zones: moderate fluctuation. Arctic zones: huge fluctuation of temp. Thus during the warming-earth period we’ve seen during 1979-2008, there is a good a priori reason that min temperatures should be seen to rise more than max temperatures. Likewise, during a cooling-earth time I would expect the opposite.

(2) So what should be paid very close attention to, is records where the overall max trend appears to be zero. If in such cases, min trend shows at all, these should be regarded with suspicion as likely to be an artefact of land use changes. There may be statistically rather few of these. But we have to make use of what we have, and if something shows up that looks likely to be significant, we have to find another way to winkle it out of the other effects on all records.

(3) The correlation between station dropout and temperature rise, that Ross McKitrick showed, should be investigated. As noted above, Arctic temperature fluctuations are far greater than temperate zone fluctuations – and this may play into Ross’ results.

(4) IIRC, Roy Spencer and Andrei Ilarionov showed that the UHI effect on trends is far more marked on very rural areas becoming slightly less rural, than on urban areas becoming even more urban. A significant effect, that might escape detection by even Leroy 2010?

In the nuclear industry, the analysis product itself, and the process used to produce that analysis product, are viewed together as being one unified thing. Likewise, the raw data and the interpreted data, and the analysis methods used to analyze that raw and interpreted data, thus producing information, are also viewed together as being one unified thing.

Within the nuclear industry, in viewing the fitness of an analysis product for its intended purpose, any deficiencies found in the raw data, in the analysis methods, in the interpreted data, or in the overall process used to produce that analysis product mean that the analysis product itself is deficient. If so, the analysis product is sent back for repairs and rework, including repairs to the process methods, as necessary.

This is why, for example, it took twenty years and cost fifteen billion dollars to study and analyze the Yucca Mountain site for use as the nation’s high level nuclear waste repository. Moreover, a good portion of that fifteen billion dollars was spent documenting all facets of both the process methods and the analysis product so that it was all totally accessible and totally transparent.

The validity of the surface temperature record of the United States, and the suitability of its employment in climate science, is arguably a more important public issue to be properly addressed than is the suitability of the Yucca Mountain site for storing the nation’s nuclear waste.

And yet no one proposes spending fifteen billion dollars to properly study the validity of the surface temperature record of the United States. The job of looking at important problems in the surface temperature record, problems which are not yet properly addressed by the government agencies responsible for that information, is instead being left to ad-hoc associations of volunteers acting in the public interest.

Ad-hoc or not, the volunteers who have done this important work must be held accountable for the quality of their product. If the product is deficient in terms of process and/or content, it has to be sent back for repair and rework.

In today’s wired world, it has never been more true that an ounce of prevention is worth a pound of cure. It is plainly obvious that a disciplined internal peer review of the Watts et al paper prior to its public release would have prevented much of the controversy that is now developing about it. Haste makes waste, in other words.

While I can understand Steve McIntyre’s irritation with revisiting the statistics due to the TOB issue, I feel many in the climate blogosphere are investing undue hope that this may invalidate Anthony Watts’ paper. Perhaps the amount of time people have spent trying to tease a climate signal from low-resolution meteorological data, or the large body of work resting on this dubious foundation, has confused some into thinking that Anthony Watts needs to calculate a climate signal using alternate methods. He does not actually have to do this for this paper to have an impact.

Some may recall the shrill demands that Steve McIntyre produce his own proxy temperature reconstruction when he brought the infamous hockey stick into question. All that was actually necessary was to demonstrate that there were one or two issues with short-centring proxy data before principal component analysis; for example, short-centring red noise prior to PCA also produces hockey sticks.

All that Anthony Watts need demonstrate is
1. The amount of metadata required to turn a low resolution meteorological record into a climate record.
2. That this metadata is unavailable, has not been used or has not been used correctly in the surface temperature products on which much of climate science has been based.

In criticizing Richard Muller’s claimed flip from skeptic to believer in CAGW and other behaviour, Rabett says “This ain’t saying that the BEST project was useless, they have developed some interesting methods, and pushed the surface temperature instrumental record back somewhat. It wasn’t that others were unaware of such records, but the level of trust was, let us say, about where Michael Mann stands in Steve McIntyre’s mind.”

Watts did make a mistake, but Muller doesn’t get a free pass. He and his team promised to be transparent. If you go to the BEST web site and download the code, the associated readme file says the code isn’t cleaned up and may not work. Obviously in the latest paper, according to peer review, not all the methods used were elucidated. So let’s not beat up Watts only here. Muller hasn’t lived up to his promises, either.

One more strong indication that this entire TOBS panic is probably a waste of time (or should I say – a deliberate diversion).

Roy Spencer developed his own surface temperature index for the USA 48, by using only the stations which have a homogeneous method of taking data over time (four measurements per day) and which are hence completely free of any TOBS biases. In addition, he corrects the data for urban heat bias by the so-called population density adjustment. The linear trend thus obtained for 1973-2012 is 0.145 C per decade.

Now, Spencer does not correct for the two things that Watts does: the micro-siting issues and the airports. If he had done that, the overall trend would likely have been much lower. The class 1-2 “compliant” stations typically show about half the warming of the class 3-5 non-compliant stations. Assuming a random selection of good and bad stations in Spencer’s index, we should expect the trend to fall substantially when corrected for the micro-siting warming bias (because the class 3-5 stations are much more numerous than the class 1-2 stations).

So, in all likelihood, the real climatic trend at TOBS-bias-free stations is almost certainly lower than 0.150 deg C/decade (the trend reported in Anthony’s paper) and likely much lower than that. If anything, Anthony has significantly EXAGGERATED the real climatic trend.
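The dilution arithmetic behind this expectation can be sketched in a few lines. The only figures taken from the discussion above are the ~0.15 deg C/decade compliant trend and the roughly 2:1 ratio between non-compliant and compliant trends; the 80% non-compliant share is an assumed number for illustration only, not a figure from the paper.

```python
# Back-of-envelope sketch of the micro-siting dilution argument.
# compliant_trend is quoted in the comment above; noncompliant_trend
# applies the claimed ~2x ratio; frac_noncompliant is a hypothetical
# illustration of the "much more numerous" class 3-5 stations.

compliant_trend = 0.15       # deg C/decade, class 1-2 stations
noncompliant_trend = 0.30    # deg C/decade, assumed ~2x compliant
frac_noncompliant = 0.80     # assumed share of class 3-5 stations

# Trend of a naive all-station average that ignores siting quality:
blended = ((1 - frac_noncompliant) * compliant_trend
           + frac_noncompliant * noncompliant_trend)
print(f"blended trend: {blended:.2f} deg C/decade")  # 0.27 with these inputs
```

If a random mix of good and bad stations enters an index, the blended figure rather than the compliant figure is what the index recovers, which is the sense in which correcting for micro-siting would be expected to pull the trend down.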

I read Watts as saying he would release raw data, though I don’t remember when (e.g. soon, or when the paper was submitted for formal publication). I have the impression they were going to post a few more things once they recovered from the main effort.
But A. Scott has pointed out here that the root raw data is already publicly available, that Watts listed his sources, and that Watts has explained his method, including reference to published station evaluation criteria. Ideally, the earlier in the data chain that verifiers/critics start, the better. But hopefully solid data is generated to build further on; Watts is in effect claiming (unlike Muller) that the existing data is not solid, specifically that there are serious errors/omissions.

So the effect of time-of-observation adjustments on the data needs more work; did others like Muller even mention it? (Though “Ivan” downplays the significance of TOBS, others disagree; it seems a secondary factor.)

Hastiness has been acknowledged, unlike others who can’t acknowledge fundamental errors in their work. The lesson may be to avoid chasing the schedule of sleazy types like Muller, who released a paper so flawed that his own team and rabid alarmists like Michael Mann have panned it. (Though Mann may be paying Muller back for pointing in 2004 to McIntyre and McKitrick’s breaking of his hockey stick.)

Seems to be good technical discussion amongst the blather here.
As for knowing only the current state of each station, that is an important step. It facilitates substantially answering the question “Is this station’s data accurate today?” If the answer is no, the station’s data should be removed from the database until the question “How quickly did the station get to that inaccurate state?” can be answered. (Presumably there are very few cases where the environment has improved, such as a walkway or road being torn up after re-routing.)
At least some of the answer can be obtained by a huge amount of slogging through construction records, news reports (campus newspapers, newsletters of host/data-collection organizations, etc.), and old photographs. This is perhaps somewhat amenable to automation (modern search and image-recognition techniques if the data is in computer format; otherwise there is the additional cost to photograph/scan). Cash to manage the research can be mailed to me at …. 😉
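One way to make that slogging pay off is a minimal data structure for date-stamped siting ratings, so each station’s record can be trusted only for the periods in which its quality is known. This is an illustrative sketch, not code from any project; the station ID, dates, and ratings below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from bisect import bisect_right

@dataclass
class StationHistory:
    """Date-stamped siting ratings (e.g. classes 1-5) for one station.
    Ratings must be appended in chronological order."""
    station_id: str
    changes: list = field(default_factory=list)  # [(effective_date, rating), ...]

    def add_rating(self, effective: date, rating: int):
        self.changes.append((effective, rating))

    def rating_on(self, when: date):
        """Return the rating in force on a given date, or None if unknown."""
        dates = [d for d, _ in self.changes]
        i = bisect_right(dates, when)
        return self.changes[i - 1][1] if i else None

# Hypothetical example: a station degraded when a walkway was built in 1995.
h = StationHistory("USC0099999")
h.add_rating(date(1980, 1, 1), 2)
h.add_rating(date(1995, 6, 1), 4)
print(h.rating_on(date(1990, 1, 1)))  # 2
print(h.rating_on(date(2000, 1, 1)))  # 4
```

With such a history in hand, data before a known degradation can be kept while later data is excluded or adjusted, rather than discarding the whole record.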

More seriously, I am personally skeptical that much of the fussing over instrumental temperature analysis is worthwhile toward the goal of predicting change and thus facilitating preparation. I suggest it is even less worthwhile in the blame-humans debate, as satellite temperatures, estimation of temperatures in past centuries and millennia, calculation of the human contribution to the CO2 increase, the physics of CO2’s effect on heat flow in the atmosphere, and the inaccuracy of theories (“models”) are far more important. So I’ll leave slogging through the history of surface stations to those criticizing Watts’ work. 🙂

Re “Curt Posted Aug 1, 2012 at 7:38 PM”
Please elaborate on how an observation close to the time of minimum and maximum can cause a bias.

I do expect that an observation of an automatic min-max logging device close to the minimum or maximum could be in error, as the min or max may not yet have been reached.
The common case would be changing weather, a chinook being an extreme example (temperature can rise or fall at a rate of several degrees per hour).
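For what it’s worth, the TOBS bias is usually explained not as catching the extreme too early, but as a max-min thermometer reset near the warmest (or coldest) hour carrying one day’s extreme into the next day’s 24-hour window, double-counting it. A toy simulation with synthetic hourly temperatures (all numbers invented for illustration) shows the direction of the effect:

```python
import math
import random

random.seed(0)

def hourly_temps(n_days):
    """Synthetic hourly temperatures: a sine diurnal cycle (min ~5 AM,
    max ~5 PM) around a daily mean that varies randomly day to day."""
    temps = []
    for _ in range(n_days):
        mean = 15.0 + random.gauss(0, 3)          # invented weather variation
        for h in range(24):
            temps.append(mean + 10 * math.sin(2 * math.pi * (h - 11) / 24))
    return temps

def minmax_mean(temps, obs_hour):
    """Average of (Tmin + Tmax) / 2 over 24 h windows ending at obs_hour,
    mimicking a max-min thermometer reset at the time of observation."""
    vals = []
    i = obs_hour
    while i + 24 <= len(temps):
        window = temps[i:i + 24]
        vals.append((min(window) + max(window)) / 2)
        i += 24
    return sum(vals) / len(vals)

t = hourly_temps(365)
for h in (0, 7, 17):
    print(h, round(minmax_mean(t, h), 2))
# With these assumptions the 17:00 observer reads warm and the 07:00
# observer reads cool relative to midnight: the classic TOBS pattern.
```

The size of the bias in real data depends on actual day-to-day weather, so this sketch only demonstrates the mechanism, not its magnitude.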

I am wary of minimums and maximums; I don’t see how they represent climate for the debate over the causes of change. (They are of value for activities sensitive to temperature, such as damage to plants or equipment from freezing or overheating. But intuitively, an integration seems better to me, because the main debate is over heat being trapped.)

(As for quick temperature change, I’m remembering the story from Transair, an airline that served Whitehorse YT out of Winnipeg MB. One day they had to hurry to get everyone on board and take off, because the temperature was dropping toward the 737 Classic’s authorized ground minimum of -55F, IIRC. That area of the Yukon is often the coldest in Canada, though IIRC even -55F was not common there.)

11 Trackbacks

[…] – indicate a warming over the U.S. closer to NOAA’s estimate. This point was raised by ClimateAudit blogger Steven McIntyre: “Over the continental US, the UAH satellite record shows a trend of 0.29 deg C/decade (TLT) from […]

[…] 24 different models). Note that McIntyre is a co-author of Watts et al., but has only helped with the statistical analysis and did not comment on the whole paper before Watts made it public. We suggest that he […]

[…] However, if he is so distinguished, why does he feel it necessary to rely upon Watts et al (2012), which the esteemed Professor apparently co-authored? Whatever the extent of Christy’s actual involvement, this unpublished paper is now receiving significant constructive (but very damaging) criticism; and being disavowed by one of the other high-profile co-authors – Steve McIntyre. […]