The UK Met Office “Subset”

Around Dec 8, 2009, the UK Met Office released “value added” data for a “subset” of 1741 stations – see here, describing the release as follows:

The data downloadable from this page are a subset of the full HadCRUT3 record of global temperatures, which is one of the global temperature records that have underpinned IPCC assessment reports and numerous scientific studies.

The data subset consists of a network of individual land stations that has been designated by the World Meteorological Organization for use in climate monitoring. The data show monthly average temperature values for over 1,500 land stations

In question 7 of the webpage, they asked and answered rhetorically:

7. Why are you releasing a subset of the data now?

We can only release data from NMSs when we have permission from them to do so. In the meantime we are releasing data from a network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release.

Today, I’m going to do a quick analysis of the Hadley subset, which has some interesting attributes.

A “Subset”?
First, I checked whether the Hadley Subset is actually a subset of the CRU station list archived in response to Willis Eschenbach’s FOI request.

The metadata for the Hadley Subset states that the “source” of the data is either “Jones” or “Jones+Anders” (Anders Moberg being the coauthor of Jones and Moberg 2003, the “peer-reviewed” publication of CRUTEM2 in the litchurchur.)

What is one to make of this inconsistency? My surmise is that the CRU station list provided in response to the FOI request must have been inaccurate with respect to the omitted stations, though it’s also possible that there were some changes between 2007 and 2009. I strongly doubt that the Met Office obtained and collated station data from third party sources not via CRU.
Any Additional Data for which we have Permission to Release
The Met Office said that the Subset was a “network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release”.

CRU said that they would seek permission from national meteorological services (NMSs) to release station data. The Met Office didn’t report on the progress of this supposed program. However, if the Subset includes “any additional data for which we have permission to release”, as the Met Office has represented, then this should be evident in the station lists as follows: all stations in the CRU station list for an agreeing country would necessarily be in the Subset station list.

I compared the two lists by country. (This comparison is hampered by the appalling sloppiness of the country designations in both CRU and Met Office station lists. I know that climate scientists like to make fun of taking care of such details, but it doesn’t take that long to allocate 1741 stations to consistently spelled countries and it’s the sort of thing that’s helpful if you’re administering a set of confidentiality agreements. The 2-digit codes are related to countries, but don’t yield a precise matching.) In any event, I spent an hour and a half or so making a consistent allocation to countries and the Met Office is welcome to the results.

Denmark has been in the news lately. So let’s examine whether all the Danish CRU stations are in the Hadley Subset. Alborg and Koebenhavn are in the Hadley Subset, while CRU stations Vestervig, Tarm, Bogo, Hammerodde Fyr, Nordby and Tranebjerg aren’t. If we can rely on the Met Office statement that they included “any additional data for which we have permission to release”, this means that Denmark has not yet provided permission for release of 6 stations. (It’s a little odd to think that they would have consented to the release of 2 stations and not the other 6, but hey…)

Similar puzzles abound in the Hadley Subset. Other countries in a similar situation to Denmark – with some stations released and others not – include Norway, Sweden, Finland, Ireland, Iceland, Greenland, Netherlands, Switzerland, Spain, Portugal, Germany, Canada, USA, Australia, Russia, India, China … Actually most countries in the world.

The countries with complete CRU inventories in the Met Office Subset make a much shorter list: UK, France, New Zealand, and some small countries like Latvia and Lithuania. The most surprising aspect of the list is that it is dominated by Third World and especially African countries: Laos, Vietnam, Ecuador, Tunisia, Guinea, Senegal, Niger, Mali, Ethiopia, Somalia, Eritrea, Kenya, Uganda, Tanzania, Chad, Liberia, Cote d’Ivoire, Ghana, Benin, Zimbabwe, Zambia, Madagascar, Namibia.

It seems surprising that these countries would have answered the bell for permission to release data so much faster than European countries.

Reviewing the Met Office statement on the release:

In the meantime we are releasing data from a network of stations designated by the World Meteorological Organisation for climate monitoring together with any additional data for which we have permission to release.

My interpretation is that the “additional data for which we have permission” has made a negligible contribution to the Hadley Subset – indeed, it looks possible that the only “additional data” in the Hadley Subset comes from Britain itself (plus possibly France, New Zealand.)

The Designated Network
This raises an interesting question about the “network of stations designated by the World Meteorological Organisation for climate monitoring” and the permissions attached to this network, as it doesn’t appear that new permissions have a material impact on the release.

The Met Office did not provide a reference to a document describing the “network”, linking only to the WMO. Googling the phrase “network of stations designated by the World Meteorological Organisation for climate monitoring” leads to the GCOS network – a list of 1025 stations is here – and perhaps this is what the Hadley Center has in mind.

There is considerable overlap between the Hadley Subset and the GCOS network but neither is a subset of the other. In a first cut matching effort, I compared the 5-digit identification of the GCOS network to the first 5 digits of the Hadley identifier and matched 760 stations, leaving 981 unmatched. There didn’t appear to be any particular country pattern to the residuals.

Conversely, of the 1025 GCOS stations, the same 760 matched, leaving 265 unmatched.
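The first-cut matching described above can be sketched as follows. The identifiers here are made up for illustration; the actual lists have to be taken from the Met Office and GCOS pages:

```python
# Match GCOS 5-digit WMO numbers against the first 5 digits of
# Hadley Subset identifiers - the "first cut" join described above.
def match_by_wmo5(hadley_ids, gcos_ids):
    gcos = set(gcos_ids)
    matched = [h for h in hadley_ids if h[:5] in gcos]
    unmatched = [h for h in hadley_ids if h[:5] not in gcos]
    return matched, unmatched

# Toy example with hypothetical identifiers:
hadley = ["061860", "260630", "999990"]
gcos = ["06186", "26063"]
m, u = match_by_wmo5(hadley, gcos)
```

Run against the real lists, this sort of join gives the 760 matched / 981 unmatched split reported above.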

This leaves a number of puzzle-type questions about the Hadley Subset:
– how were these selected?
– why is the Jones version of GCOS stations not considered confidential?
– why is the Jones version of non-GCOS stations in the Hadley Subset not considered confidential?
– why was this not made available last summer?

I’m not suggesting that climate science stands or falls on these questions. For now, they are merely interesting little puzzles.

Update (10 pm Dec 27). A reader observed in the comments that the Regional Basic Climatological Network (RBCN) dataset (see here; datalist ftp://ftp.wmo.int/wmo-ddbs/RBCN_DEC2009.xls) has a much closer genetic relationship to the Hadley Subset than GCOS. I’ve confirmed the closer relationship, though not all problems are resolved.

The CRU data set (FOI version) contained 4138 stations, of which 1521 were matched in the Hadley Subset and 2617 unmatched.

Of the 1521 Hadley Subset matches, the first five digits matched RBCN WMO numbers for 1477, with 44 unmatched. The 44 unmatched were from only a few jurisdictions: UK, France, New Zealand (three jurisdictions mentioned in my notes above) plus one from Liberia and one from Bahamas.

Conversely, of the 2617 CRU FOI stations excluded from the Hadley Subset, 145 were matched in the RBCN data set and 2472 were unmatched. The 145 stations matched in the RBCN data set but not included in the Hadley Subset were mostly from USA, Canada, Mexico and South Africa, with a few stragglers from Mongolia, Panama, Dominica, Israel and one from Russia (or as CRU refers to it – USSR). [Update Dec 28 – a reader observed below that these 145 stations all had a non-zero sixth digit. If an extended 6-digit RBCN identification is defined using the SUBINDEX field, there is not a match at the 6-digit level for 144 of the 145 stations. Using 6-digit identifications, of the 2617 FOI exclusions, there is only one common identification in the RBCN data set – 718011: St John’s West CDA in RBCN and St Johns UA in CRU.]
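The reader’s 6-digit reconciliation can be sketched like this: extend each 5-digit RBCN WMO number with its SUBINDEX digit and match against the full CRU/Hadley identifier (the field name SUBINDEX is from the RBCN datalist; the record layout here is a simplification):

```python
# Build an extended 6-digit RBCN identification from the 5-digit WMO
# number plus the SUBINDEX field, then match at the 6-digit level.
def rbcn_id6(wmo5, subindex):
    return f"{wmo5}{subindex}"  # e.g. "71801" + 1 -> "718011"

def match_by_id6(cru_ids, rbcn_records):
    """rbcn_records: list of (wmo5, subindex) pairs."""
    rbcn6 = {rbcn_id6(w, s) for (w, s) in rbcn_records}
    return [c for c in cru_ids if c in rbcn6]

# St John's (718011) is the lone 6-digit match among the FOI exclusions.
cru_excluded = ["718011", "718010"]
rbcn = [("71801", 1)]
matches = match_by_id6(cru_excluded, rbcn)
```

At the 5-digit level both hypothetical stations above would match; at the 6-digit level only 718011 survives, which is the pattern that resolved the 145 stations.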

Let’s hypothesize for now that (1) the Hadley Subset consists of CRU stations in the RBCN network; and (2) additional permissions to date are restricted to UK, France and New Zealand. Then the failure to include the 145 excluded matches might be sloppiness; or perhaps a different RBCN version; or perhaps we haven’t quite got the right network. [Update – as noted above, a reader has clarified that the 6-digit identification resolves the 145 stations.]

As noted in a prior post, the webpage stated:

The stations that we have released are those in the CRUTEM3 database that are also either in the WMO Regional Basic Climatological Network (RBCN) and so freely available without restrictions on re-use; or those for which we have received permission from the national met. service which owns the underlying station data.

I remain puzzled as to how the RBCN network would supersede the supposed confidentiality agreements.

As they say – the Devil is in the detail, and the more the Met Office shares, the more peculiar it becomes.

One observation – since Somalia hasn’t had a functioning government for several years and is generally considered to be a political basket case – it’s wonderful that they responded to the Met Office request 😉

Somalia: Permissions may have been given long, long ago. And once given, one would think they are in effect until rescinded.

The whole issue of permissions, to me, seems ludicrous. I recall some mention of them having some monetary value, but I fail to even imagine who would think temp data has value other than scientific, and if science has come to charging money for data, the scientists aren’t any better than the corporations they revile.

The met stations are governmental facilities, aren’t they? What government sells data? Is that a common thing? Anybody?

NIST in the US has certain rights by law to sell reference materials and reference data so as to help it recover some of the costs of assembling and supplying the reference materials and data required by industry. But it is the only US agency I’ve heard of that has that power.

It may be that stations with too short a record are not used. I would also bet that many of the available stations out there (like WMO) are “lost” to CRU/Hadley, as we have seen in Russia. For these they don’t need permission since they already have it or the countries have posted online, just an internet connection (someone please tell them about the internet…).

The Swedish “NMS” is called SMHI. Here is a link to the data that they make available on the web; there is some kind of license agreement there as well, which (more or less) says the data may be used in a non-commercial way.

Maybe it’s possible to complete the data for Sweden by downloading the numbers from there. Or at least increase the understanding in CRU classification of IP data and non-IP data for specific countries.

It could be Sir Humphrey, but they seem to lack his subtlety. I have made an FOIA request as to who authorised the Met Office funding for the activities in collecting the signatures of the 1700 or so scientists, and how much it had cost. I received a reply which included a statement telling me that there was overwhelming and growing evidence of AGW etc. etc.

They went on to tell me who the petition was for, who had authorised it and how many wo/man hours of time had been spent on it by Met Office staff.

Unfortunately, I cannot share this information with you because the note below was added to the email from the Information Officer:

“The information supplied to you continues to be protected by the Copyright, Designs and Patents Act 1988 (the Act). Unless specifically permitted by the Act, any reproduction of the information, in whole or in part, requires the permission of the copyright holder.”

I have written back to them asking for some minor clarifications, and explaining that my request was not an attempt to smear any scientists anywhere and was solely to do with the use of public funds by a public organisation that had no mandate to use the funds in that way.

We do have some fair use provisions in this country. ‘Unless specifically permitted by the act’ sounds very narrow but is actually incredibly broad when you realise that it includes such concepts as ‘criticism’. In any case, the information, once rephrased, is not subject to copyright.

It is indeed odd that the Hadley Subset has an assortment of weather station omissions in many cold northern countries and yet a complete set of weather stations in many warm southern countries. Surely this is just a fluke?

I was being somewhat sardonic — tongue planted firmly in cheek. Your point to the lack of stations in many third world nations, including the general lack of stations in most of the southern hemisphere compared to the northern hemisphere is spot on.

I expect the CRU will be forced to explain in the very near future why they included some stations and omitted others when the station data was available to them. There may be good scientific reasons for exclusions (station moves, missing data, equipment divergence, TOB, elevation, and so forth); it could even be simple oversight. But what ClimateAudit readers want to know is this: is it Mannian type statistics at work to keep the temperature plots “on message”?

Is there any logic at all – scientifically speaking, not “keeping the message clear” speaking – for leaving out ANY met stations, ANYWHERE? Especially since there are so many fewer stations than 25 years ago. What could possibly be the justification? I haven’t heard any reasons at all, much less good reasons.

A now-deceased meteorologist friend pointed out to me in 1999 or 2000 the drop in met stations, and he was laughing derisively at it, saying that at a time when we want more and more data, it is beyond belief that they would have been eliminating stations. RIP, Norm…

I am wondering if it would be appropriate for CA readers to contact the authorities in each of the countries with missing data, in the local language. If we ask politely maybe we can have the data in the public domain or at least know what are the actual issues/licence polices being applied country by country, and who we have to persuade.

If the idea is liked, we should produce a standard letter, as we do not want to appear to be aggressive or to make excessive demands on time through multiple slightly different requests, particularly as I suspect at least some of the data is available, just not used/collated properly, and in some cases the METDATA is so wrong/incomplete that the record cannot be identified.

So, while doing this, I realized that the GHCN v2.mean contains multiple records for a single station id. So far, I have been just editing this by hand. Is there a preferred method for merging these multiple records for a single station id into a single continuous series? Some public script I don’t have to reinvent?

Steve: Look back at some of the 2007 discussions of GISS here and at chiefio. The combining of station versions is the first statistical operation – this yields Jones’ “value added” data and GISS dset1 (done differently). I’ll do a post on this some time.
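For a sense of what that first statistical operation involves, here is a deliberately naive sketch: average whatever duplicate records report in each month. This is an assumption-laden toy only – GISS dset1 adjusts for offsets between duplicates before combining, and Jones’ method differs again – but it shows the shape of the problem:

```python
# Merge multiple GHCN v2.mean duplicate records for one station id
# into a single continuous series by averaging, month by month,
# whichever duplicates report a value. Naive sketch only: it ignores
# systematic offsets between duplicates, which real methods adjust for.
def merge_duplicates(records):
    """records: list of dicts mapping (year, month) -> temperature."""
    merged = {}
    keys = set().union(*(r.keys() for r in records))
    for key in sorted(keys):
        vals = [r[key] for r in records if key in r]
        merged[key] = sum(vals) / len(vals)
    return merged

# Two hypothetical duplicates for one station id:
dup_a = {(1951, 1): -2.0, (1951, 2): -1.0}
dup_b = {(1951, 2): -3.0, (1951, 3): 4.0}
series = merge_duplicates([dup_a, dup_b])
```

Where only one duplicate reports, its value carries through; where both report, they are averaged.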

The way that the stations are sorted into folders after they are unzipped is truly amazing. Larger countries like China and the US are placed in multiple folders, whereas Central American countries are lumped together in one folder. They still have some stations listed as being in the USSR. If the Lon-Lat entries are correct, then they will be easier to correctly sort by country. Since the Lon-Lat have only 2 or 3 significant figures (nearest tenth of a degree), the locations are imprecise.

The temperature data are monthly averages, but of what? Tmax? Tmin? Daily Averages? The files that I have looked at, and I’ve only opened a small number of the files, seem to start in the 1950s.

If this is the typical way that CRU organizes its data, then it is small wonder that there are problems. I do not mean to imply that this means that all their papers are wrong. I just have difficulty understanding why scientists would not sort data into folders with the very simple algorithm of one country = one folder. Why would they sort the station data at all if their sort does not improve order?
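The “one country = one folder” sort the comment asks for is a few lines of scripting. A sketch, assuming (hypothetically) that the country can be parsed from the file name as everything before the first underscore:

```python
import os
import shutil

# Sort station files into one folder per country. The naming
# convention "<country>_<station>.dat" is an assumption for
# illustration; the real files would need their own country lookup.
def sort_by_country(src_dir, dst_dir):
    for name in os.listdir(src_dir):
        country = name.split("_", 1)[0]
        folder = os.path.join(dst_dir, country)
        os.makedirs(folder, exist_ok=True)
        shutil.move(os.path.join(src_dir, name),
                    os.path.join(folder, name))
```

The hard part, as noted above, is not the script but getting consistent country designations to key on in the first place.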

The location matters if a station is moved 6 nautical miles from its original site in the midst of rice paddies to the centre of a town of 300K inhabitants – the new station location.

The UHI is established fact and is typically larger in magnitude than what I have read is the magnitude attributed to AGW over the past 100+ years. Since heat rises, the UHI has little effect on areas located outside the urban heat source. Consequently, the location matters quite a bit if the change in temperature over time is to be used to calculate the global temperature change rather than the amount of local urbanization which has occurred over the duration of the record.

I’m not suggesting that climate science stands or falls on these questions. For now, they are merely interesting little puzzles.

That’s quite disingenuous, Steve, if I may. Either it’s important and should be brought to people’s attention, or it is of no consequence and is not worthy of note. May I suggest you are in a “having cake and eating it” position?

I realize you can’t be held responsible for what the more rabid followers you have say and write, but I can pretty much guarantee that in the next few days, your posts will be used to “prove” that the data is suspect and thus the science is as well in an if/then manner.

You should know this by now. The fact you continue to do posts like this suggests to me that this is intentional. If you really do believe this, then by all means, state it up front. But if you really are all about improving the data, so that the science and policy based on it is improved, there are more professional ways of achieving that end.

By not stating outright your belief that climate science falls on this, you might appear to have washed your hands of it, but that’s not how it’s perceived by those outside your group of followers.

The kind of self censorship you advocate (i.e. don’t say something true because it might be misrepresented) serves to undermine the credibility of climate science because it leaves people wondering how much they have not been told because it ‘might hurt the cause’.

In a politicized debate like AGW, every word and turn of phrase becomes fodder in a big PR debate — on both sides. Those who want to claim they are staying above that have to be aware of how their words play out in the public. To turn a blind eye to how one’s words are taken is to be deliberately naive. In this case, either the state of the data is important to the credibility of climate science or it isn’t important. If Steve thinks it’s important, then say so. Don’t “nudge nudge wink wink” about it.
Steve: Susann, if you wish to discuss Hadley Center, you are welcome to do so, but please don’t coatrack this.

Since we are currently unaware if this is significant or not, offering an opinion from the current state of ignorance on whether or not climate science could “fall” because of this is premature. Of course it could, but it may not. Once we can replicate the transformations from the base data, we might be better positioned to know where to start on a sensitivity analysis, which would tell us if it “matters” or not. As ever, Steve is kindly blogging his “lab notes” – we can’t just flick to the last page and peek at the answer because that bit hasn’t even happened yet, let alone been blogged.

So far, it is still a mostly free internet. This is a preliminary look at the data. Why must anyone act as if “climate science” stands or falls on every detail. If it is a solid edifice on a solid foundation, it will stand regardless. I personally don’t see a lot of evidence that it is.

A lot of damage can be done — needless or otherwise — by PR campaigns which take this bit of thread and that bit and turn it, spinning it for their own purposes.

A lot of damage can be done by sloppy or dishonest science too. Even if the answer is right, it undermines confidence in the actors responsible for the product, and hence in the product itself.

If you look at the subset of stations used, you’ll note that a fair number of these are US stations, probably GHCN network stations and almost certainly not stations that need “permission to do so”.

Looks to me like we’re being handed another set of half-truths from a group of people who in the past have had trouble with honest interchanges of information.

BTW, as I’ve noticed on Jeff ID’s blog, the accompanying perl scripts do virtually nothing (400 lines of code); all of the “heavy lifting” has been done already in the “value added (now with lemon)” data set. One script averages data within 5°×5° grid cells without any weighting to account for station density within the cell. The second just does a freshman-physics cos-weighted average.
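The two steps the comment describes – an unweighted within-cell average, then a cos-latitude-weighted average across cells – can be sketched as follows. This is a toy reconstruction of the described procedure, not the Met Office perl code itself:

```python
import math

# Step 1: average station anomalies within each 5x5 degree cell,
# with no weighting for station density inside the cell.
# Step 2: average the cell means weighted by cos(latitude), since a
# 5x5 degree cell near the pole covers less area than one at the equator.
def grid_cell(lat, lon):
    return (int(lat // 5), int(lon // 5))

def cos_weighted_mean(stations):
    """stations: list of (lat, lon, anomaly) tuples."""
    cells = {}
    for lat, lon, anom in stations:
        cells.setdefault(grid_cell(lat, lon), []).append((lat, anom))
    wsum = total = 0.0
    for members in cells.values():
        cell_mean = sum(a for _, a in members) / len(members)
        # weight by cos of the mean latitude of the cell's stations
        w = math.cos(math.radians(sum(l for l, _ in members) / len(members)))
        wsum += w * cell_mean
        total += w
    return wsum / total
```

For example, a cell at 60°N gets half the weight of a cell on the equator (cos 60° = 0.5), which is the “freshman-physics” part of the comment.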

It’s interesting that you challenge whether someone has tried to replicate the work. The whole point of getting the COMPLETE data set is that one can’t replicate the work without it. Of course we don’t have the complete data set so as an auditor, Steve is not in a position to know whether the work is good or not.

It’s rather like a financial auditor looking in to the books of a company and learning that all of the records are on various scraps of paper in several shoe boxes, some of which haven’t been made available yet. That in itself is disturbing, but it doesn’t mean the books have been cooked, so Steve is rightly deferring judgement.

Wrong premise, I suppose. CA is primarily about the interests of its authors, so if one of them thinks something is worth discussing, it is automatically worth publishing here.

“I realize you can’t be held responsible for what the more rabid followers you have say and write, but I can pretty much guarantee that in the next few days, your posts will be used to ‘prove’ that the data is suspect and thus the science is as well in an if/then manner.”

This does not matter. CA is not a PR blog for some organization. It is about auditing climate science as far as it is interesting to the contributors. People’s interpretations are theirs and theirs only, and it is hard to find a reason why Steve or other authors should even take these into account when deciding what to publish here. They may choose to do so but there is no obligation.

With Steve’s track record of finding errors and misrepresentations, a little coyness is more than tolerable to most long time readers. This is a blog after all and there’s little harm in teasing future posts.

Look at it like a forensics puzzle (I do) – until you have all the pieces, you don’t have the big picture and don’t know how everything fits together. So this data set situation is an interesting tidbit, but it only has significance in the bigger picture, which is still unknown. One poster said it was evidence of sloppy work; I prefer to view it as evidence of a cavalier attitude toward the science process.

I’d suggest that being “cavalier toward the science process” is bad science, pure and simple.

I disagree with Susann that not showing your work is allowable in science. That is why The Royal Society demanded Briffa turn over his data: they weren’t about to let someone be cavalier or let themselves start being sloppy after 400 years or whatever. How Science and Nature allowed it is unbelievable. It is antithetical to science.

“That’s quite disingenuous, Steve, if I may. Either it’s important and should be brought to people’s attention, or it is of no consequence and is not worthy of note. ”

False dilemma. The issue in my view is this. You have mails where Jones contemplates messing about with the data to confuse the issue and annoy people. Want the cite? Go find it yourself. The issue is this: CRU think that they can make the issue go away by releasing some data and some code. They can’t. They have to release the data we requested and the code we requested.
WRT the confidential data they also have a little problem lurking, but I won’t spoil that just yet. Bottom line here: CRU continue to make a hash of this.

“I realize you can’t be held responsible for what the more rabid followers you have say and write, but I can pretty much guarantee that in the next few days, your posts will be used to “prove” that the data is suspect and thus the science is as well in an if/then manner.”

It’s not the data that is suspect. It’s the behavior.

“You should know this by now. The fact you continue to do posts like this suggests to me that this is intentional. If you really do believe this, then by all means, state it up front. But if you really are all about improving the data, so that the science and policy based on it is improved, there are more professional ways to achieving that end.”

Professional ways of doing it? Did you even read the mails? Here’s a clue-by-four for you: go lobby CRU to clean up their act.

“By not stating outright your belief that climate science falls on this, you might appear to have washed your hands of it, but that’s not how its perceived by those outside your group of followers.”

“Climate science”, whatever that is, doesn’t fall on this. You miss the point. The temperature record is the most important piece of observational data we have. You think CRU, WMO, et al. can come up with a naming system that allows people to check what they did? Ya think?

WRT the confidential data they also have a little problem lurking, but I won’t spoil that just yet. Bottom line here: CRU continue to make a hash of this.

Mosher, you seem like a straight-shooter, despite the statement above. You tease!

What I would appreciate is that people just come out and state what they *really* think so we can know instead of carrying on this “pondering puzzles” routine.

Let me see — the initial rejections, and the FOI rejection and the reasons given were false, specifically the claim that the CRU had agreements with countries and had to get permissions from the countries to release the data. The FOI release earlier in the year was incomplete, cherry picked, false. Faced with the revelations of the CRU hack/leak, the CRU / Met decided to release data to save face and is hoping for it all just to go away. There may be evidence to prove that the initial FOI was flawed/fraudulent in this *current* release and so the data released are checked against other data to see if that is the case.

So far, nothing incriminating has been found in this current release, from what I can see, despite a few possibilities earlier on before people read the release notes more carefully.

Am I being too blunt?

Am I close?

Steve: You are not close. Earlier this year, the Met Office refused an FOI request for station data – a distinct refusal from the CRU refusal. Six months later, they seem to be able to provide a substantial portion of the data previously refused. In order to analyse this change of heart, we’ve analysed the provenance of the data now provided to see what might have changed in the six month period so that both their present provision of data and their previous refusal were justified.


I stand corrected about which FOI this referred to. Your comparing the Met data release to CRU data archived in response to Willis’s FOI request seemed to indicate there was a connection between the FOI requests/refusals.

So, then, to get on track, the Met refused an FOI request earlier this year based on the argument it had confidentiality agreements with certain countries. I take it you don’t believe the Met Office’s FOI denial was legitimate. Since the CRU hack/leak, the Met Office has archived new data – perhaps in response to it. Now you are checking to see if there is some evidence that the denial was wrong in the release. Hence, the comments on the confidentiality agreements as the justification for the denial.

When I read your post, I read it as an attempt to see if you can find evidence that proves the original FOI denial was flawed. That’s pretty blunt. Am I still off base?

Susann
I leap in at risk of getting it wrong too. But someone compared Steve’s work to auditing someone whose records are scribbled on scraps of paper and kept in several shoeboxes, not all of which have yet been presented for audit. Steve’s website is an audit website. Auditing is not commonly recognized as having scientific method underpinning it, and so often, real-life audits ARE out of tatty shoeboxes. Yet the process of auditing is strict, and as such can be called “scientific”. It is also a vital but neglected part of any science affecting global policies – and it should be transparent too. The likelihood in normal auditing is that there may be problems to resolve, but not major fraud, despite tatty shoeboxes. Nevertheless there MAY be major problems. An auditor has to sift the evidence right through, although patterns may grow clearer during sifting, and suggest areas requiring closer attention in order to reveal issues clearly enough. A competent auditor will sift thoroughly enough to make the real issues self-evident, and thus avoid theorizing altogether.

Steve sharply discourages all public theorizing – perhaps we can see why.

Surely part of the point being made here, is that the climate data is stored incredibly carelessly. Another example of this is to be found in the HARRY_README.TXT, where a programmer records his struggles to determine the format of files of data. This level of muddle is breathtaking for a project of this importance.

The Harry file certainly appears to show the data to be in a very bad state. I don’t *know* that it is. Harry sure seemed at a loss. I’ve seen a lot of IT people and data people commenting that it is shocking. If it is in such a bad state, does that matter for anything other than appearances and face or does it indicate some kind of deeper problem — either malignancy or fraud — with the data? I don’t know. I don’t have enough evidence or experience to judge. In light of the CRU hack or leak, if the data is really sloppy and unprofessional, I suspect things will drastically improve.

Steve’s entire premise is how important the data is AND HOW THEY ADJUST IT. He is sent partial data in terrible shape, with no information about how they adjusted any of it. What is he supposed to tell anyone, other than it being incomplete and poorly arranged and internally inconsistent?

He can’t say any more than that, and that is ALL he’s said.

If they choose to make the data public in such a slipshod and incomplete way, who is to blame for that? The recipient or the provider?

One question I have is: If it is in this bad of shape, how did Jones/Briffa process it in any coherent way? If it is difficult for Steve to sort out, the people at HadleyCRU must have had a bitch of a time with it, too. This kind of organization seems as if it would invite errors and make programming code very much more difficult than it needs to be. Perhaps not, but…

I would also take exception to your referring to people on this site as being “rabid.” My experience has been that they want to see the data and see the truth of the numbers come out. They are pleased when an item here or there goes their way, and they have become distrustful of HadleyCRU and Mann and the IPCC, but I haven’t heard a lot of blood lust here at all. Aren’t you being a bit sensitive about the AGW side having had a bad six weeks?

This post is quite rude, Susann, if I may say so. This site has by far, IMO, the least vitriol and the most technical postings. Rabid? Please. Rabid people cannot follow the postings here; I have enough trouble, and I have had my rabies shot.

Rude? Hardly. I fall under the ‘skeptic’ label and have in the past posted essentially the same question that Susann posted and been hounded away. Lots of people post glib remarks like ‘garbage in == garbage out’ all the time. That is rude, whether the people the remark is directed to are present or not. And the fact that garbage in != garbage out doesn’t seem to be addressed.

I’m a fan of Steve’s and have defended him elsewhere. That is why I would like to see this blog be far and away the best blog on the web. Steve is certainly under no obligation to obey any of my wishes, but there is the unspoken understanding when you have a blog and put up a tip jar that people are allowed to (within reason) have their say.

To bring this back on topic, I think this is an interesting post. But I don’t think I am out of line by saying that this proves nothing. Nothing. The UK Met Office may be sloppy. So what? Perhaps scientists ‘in the know’ already know how to decrypt the country codes and make sense out of the alphabet soup. That is not ideal. But it is possible. And this data may have been correctly used by the people who are in the know.

In the meantime, I personally can’t prove that this means anything either way. I’m looking forward to the follow-up post that does prove something.

Chris

Steve: not every post proves something. We’ve learned something already: the Met Office didn’t identify the “network”, but it seems to be the RBCN network. We don’t know why they were able to release Jones’ data for the RBCN network stations on Dec 8, 2009, but unable to do so in July 2009. My guess is that there isn’t any valid reason: if there were legally binding confidentiality agreements in July (something which is far from established), nothing relevant has changed since. This is all that is at stake here. However, in post-Climategate statements, it would be nice if they provided clear explanations of what changed.

Steve, you write:
“This comparison is hampered by the appalling sloppiness of the country designations in both CRU and Met Office station lists. I know that climate scientists like to make fun of taking care of such details, …”

I find that quite appalling.
Sloppiness in dealing with one’s data, even – or perhaps especially – in such ‘unimportant’ things like properly labelling what one has got, and then making fun of it, allows one to infer the sort of mind these ‘scientists’ have: juvenile, loving their gadgets and playing with them, and having a laugh at all those old-fashioned people who think such petitesses actually matter …

And if they are sloppy in these small things, we can be certain there is much more sloppiness in the larger ones – else why the despairing notes in the HARRY_READ_ME file.

Scientific results and claims are supposed to be transparent, traceable and verifiable. What the Met Office has presented certainly is not. Is this hopeless, or is there a chance to fill in the gaps and sort out the mess?

Sean,
I’d say let’s do it, but only if it becomes clear the Met is too incompetent to get their datahouse back in order. They have been assigned to do a job after all. If they can’t, they ought to be sent home.

2. As far as continuing to perform the audit based on this latest data, the original phrase “network of stations designated by the World Meteorological Organisation for climate monitoring” and its ‘subsets’ led me to

One has to think there are people working long hours to get this all in some sort of presentable shape (and in a way that will eventually make them look correct, if not geniuses about it). For them to release something this incoherent shines a bad light on what condition things are in at the Met Office.

One would think the first release would be parts that were “shovel ready”, something to get out there, just to stop the rabble from overrunning the palace.

But if this bit was in that bad of shape, HOW did the researchers deal with it without pulling their hair out? HARRY_READ_ME.txt may not be just a one-off; it suggests such struggles have been par for the course. And since that file is from 2009, when one would have thought they’d have come up with an efficient system after 20+ years, HOW BAD WAS IT BEFORE?

The Global Telecommunication System (GTS) is defined as: “The co-ordinated global system of telecommunication facilities and arrangements for the rapid collection, exchange and distribution of observations and processed information within the framework of the World Weather Watch.” – WMO No. 49, Technical Regulations
——
“The GTS has a hierarchical structure on three levels:

“WMO GTS is the backbone system for global exchange of data and information in support of multi-hazard, multipurpose early warning systems, including all meteorological and related data; weather, water and climate analyses and forecasts;”

I suspect that Susann has either not had the experience some others have had here in attempting to decipher what CRU (and the Met Office) is offering up or she has simply let her partisan leanings show through.

The IPCC uses CRU temperature data sets, and thus one has to judge not only CRU for its sloppiness in data handling and presentation and its less than transparent presentation of methods, but also those organizations and individuals, like the IPCC, that use the data – and in a rather unquestioning way.

As Steve M notes, nothing in terms of the temperature record may hinge on the CRU handling of data, but it does indicate that the consensus, thinking it apparently already has the final answer, treats sloppy data handling as merely a public relations problem and not one directly affecting the science.

In the end, what it says to me is that the consensus thinking and approach, like the overconfident athlete, has become sloppy.

CRU handling can make all the difference between reality and fantasy. Look at a different way of analysing the GHCN by someone who seems to know what he is doing and posts all his work: http://justdata.wordpress.com/
Look at the 7th chart. Is this the temperature reality? (I’m not saying it is – I don’t know enough about statistics to know if his method is valid).
Wouldn’t this temperature record trash the models as they are tweaked to the CRU reality?

Like it or not – these are typical challenges that come with ‘raw’ data, and we should be thankful for these small steps. As a scientist who routinely manages global datasets (points, lines and polys) in his work – country code is amongst the least useful metrics for sorting data. OK for an audit via spreadsheet, but a better way is to compare/contrast based upon lat-long. Station and country names change – latitude and longitude do not.
Steve: country is relevant if you’re administering confidentiality agreements as I observed in the post.

The subset of stations is evenly distributed across the globe and provides a fair representation of changes in mean temperature on a global scale over land.

So a possibility might be that they selected this subset in order to fulfill this criterion.

Another sentence here:

As soon as we have all permissions in place we will release the remaining station records – around 5000 in total – that make up the full land temperature record. We are dependent on international approvals to enable this final step and cannot guarantee that we will get permission from all data owners.

So the collection of stations in the current release might have started from scratch (in some sense, perhaps created a new computer database), and since it will take them more time to complete the process for all of those about 5000 stations, they decided to release a workable subset as an intermediate result of their effort, driven by the above criteria. Just a guess, of course.

In other words, it might be those stations of those 5000 which they already have, minus those which they need to remove in order to get a “fair representation”.
Steve: how did they get permission to show some Danish stations and not others?

As soon as we have all permissions in place we will release the remaining station records – around 5000 in total – that make up the full land temperature record. We are dependent on international approvals to enable this final step and cannot guarantee that we will get permission from all data owners.

As I pointed out, it’s not likely that most of the US stations that haven’t been released had any sort of NDA. As far as I can see, they are using the GHCN network, for which all of the data is publicly available already.

What they are saying is IMO almost certainly not the full truth (I stop well short of calling them liars, since sloppy explains everything).

minus those which they need to remove in order to get a “fair representation”

a) They didn’t say they removed any points to “get a fair representation” (the only reason listed was NDA related).
b) They said they did use them in the full CRU product, so they obviously think they needed them.

So, from my point of view, we’re back to “sloppy science” again. I also don’t see any point in contacting them when the inference is this obvious, and a more complete product is in the pipeline.

The press release says that “The subset of stations is evenly distributed…”.

This might imply that they had to remove some stations from the subset (even though they had permission to use them), in order to achieve that even distribution. That’s what I tried to say (see also below).

Steve: how did they get permission to show some Danish stations and not others?

Could it be that they left out certain stations for which they have (or do not need) permission, to balance unavailable stations in other regions? The resulting average would very likely change drastically if they include ALL stations in area A but only 20% in area B, so instead of coughing up the remaining 80% of B (unavailable because of “confidentiality”, or the insatiable dog), they remove 80% of A to redress the balance. Doing so adds the convenience of being able to cherry-pick those 20% of A that make the outcome of the subset most similar to the published “complete” result.

This seems to be a good explanation of my suggestion above. I’ll repeat the part of the press release which I quoted, and highlight the crucial part, since those responding to my comment seem to keep missing it:

The subset of stations is evenly distributed across the globe and provides a fair representation of changes in mean temperature on a global scale over land.

I think the page makes it quite clear that this is an intermediate release on the path to releasing a much larger set of about 5000 stations.

The page also has a Q&A which attempts to answer some questions, but apparently they didn’t foresee all the questions this forum might have.

By the way, Question 8 seems to apply to what is discussed here, but wasn’t quoted yet (my emphasis):

8. Why these stations?

The choice of network is designated by the World Meteorological Organisation for monitoring global, hemispheric and regional climate.

To compile the list of stations we have released, we have taken the WMO Regional Basic Climatological Network (RBCN) and Global Climate Observing System (GCOS) Surface Network stations, cross-matched it and released the unambiguous matches.
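The cross-matching step the Met Office describes can be sketched in a few lines: take the two station lists, match on WMO station identifier, and keep only the one-to-one matches. The station entries below are invented for illustration, and `unambiguous_matches` is a hypothetical helper, not anything the Met Office has published.

```python
# Sketch of "cross-matched it and released the unambiguous matches":
# keep only WMO ids that appear exactly once in BOTH lists.
# Station entries are invented for illustration.

from collections import Counter

rbcn = [
    ("03772", "LONDON/HEATHROW"),
    ("91680", "NADI AIRPORT"),
    ("91680", "NADI AERODROME"),    # duplicate id in one list -> ambiguous
    ("06180", "COPENHAGEN/KASTRUP"),
]
gcos = [
    ("03772", "HEATHROW"),
    ("91680", "NADI"),
    ("61998", "VOSTOK"),            # present in one list only -> no match
]

def unambiguous_matches(a, b):
    """Return the WMO ids that occur exactly once in each station list."""
    ca = Counter(wmo_id for wmo_id, _ in a)
    cb = Counter(wmo_id for wmo_id, _ in b)
    return sorted(i for i in ca if ca[i] == 1 and cb.get(i) == 1)

print(unambiguous_matches(rbcn, gcos))  # -> ['03772']
```

Note how a duplicated identifier in either list disqualifies the station entirely, which would be one plausible reason a station with data could still be absent from the released subset.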

I am familiar first hand with the kinds of problems Steve is discussing — people will give you data when you ask, but not necessarily the way you might like it. It takes our data people many many hours to get the data in order for the very simple modelling we do and rather basic statistical analysis that goes on as part of policy development.

Your argument might carry some weight if this was a new project for CRU and/or Met. Exactly how many years does it take to work the bugs out?

My experience is that you don’t *find* the bugs unless there is a reason to suspect they exist, especially for old data — or when someone new takes over and gets a look at the data as they orient themselves to it. And then, if there isn’t the funding or the staffing to fix the problems, they remain. The people who created the datasets in the first place probably know how bad they are…

I honestly don’t know what the case is with the CRU so I can’t speculate.

When I go to my data people and ask for certain numbers from them, they give me all sorts of provisos and qualifications about it, and this is pretty simple epidemiological stuff I’m working with.

Well, thank you for that. My forty years in the IT business impressed on me that you NEVER presume the data is correct. If you are not looking for bugs all the time then you are not doing your job.

I should clarify that the data techs are aware of the limitations of the data, and are always working to try to improve it, but it is often a work in progress to which new data is continually added, new errors introduced that have to be weeded out, etc. You often don’t find all the errors unless you are actively working with the data.

This is just my experience as someone who uses data products rather than generates them, so it is limited. I am not an IT specialist. Recently, during the H1N1 pandemic, we found a problem with our database when we did some modeling and came up with an anomalous result. That result sent us back to the data, and we discovered a problem we didn’t know of until then.

and by the way…. epidemiology is NOT simple. You are always a reporting cycle behind and are as likely to target the cats as the vector rather than the rats.

One should remember to include a winky face when trying to be humorous. 😉

I looked up the Nadi Airport coordinates, according to the released data set (lat -17.8 long 177.5). That should be on the left side of +/- 180 which you labelled as zero. On the map I see it as close to the city Nadi.

I think the plot on your post is wrong, not the Met station data. (You have it correct in the table.) Besides, Wikipedia has almost the same coordinates for Nadi Airport, unlike what the blog says.

Nadi and Nausori airports are both on Viti Levu, Fiji’s main island, less than 100 km apart, equalling slightly more than 1 degree of longitude. The dateline does not pass through that island.

In other words, the Met station table must be wrong in putting one of them east (+177.5) and the other west (-178.6) of the dateline. Whatever their preferred method is (positive = East or West of DL), they should stick to it throughout, and they don’t. Debating whether it’s the Nadi or the Nausori coordinates that are wrong amounts to nitpicking – the point is that they can’t be BOTH correct as in the Met table: Either Nadi has to be changed to -177.5, or Nausori to +178.6 .

OTOH, Ono-I-Lau is shown on Google maps to the south-east of both Nadi and Nausori, approximately at the place it appears in the Met Office plot. Here the “other sources” seem to have failed.

Nuku’alofa again is on the wrong side of the dateline in the Met plot; it lies in Tonga, roughly east of Ono-I-Lau.

As the others (especially Udu Point) do not seem to agree in their relative positions when looking them up in Google maps, I’ll try to post the link to a map with markers for all 8 locations. Sorry I am a newbie to this and have no idea how to show a lat/long grid or individual coordinates in Google Maps.

(Looks like someone beat me to it, but I will post my response anyway)

Norbert, your lack of awareness in your quest as defender of AGW becomes more and more apparent in your comments.

Exactly which Wiki did you check for the coordinates of Nadi? I presume it was this one
which gives coordinates 17°45′19″S 177°26′36″E. The information in the Met data gives them as -17.8 and 177.5, apparently the same. However, the more alert reader of the post would have read the following text above the table:

I have mentioned before that for some unknown reason, Met and CRU prefer to do the opposite of what one might normally expect for coding longitude values. East of Greenwich, their longitudes are negative and those west are positive – not what one would expect for drawing maps and not what one might generally find in other global reference venues.

So Met places the airport on the RIGHT-hand side of the dateline and not on the correct left side. As an added check, look for Nausori Airport on Google maps. It is east of Nadi (between Nadi and the dateline), but it has the opposite sign for longitude, so both of them cannot be correct.

[Update to my comment: One can check that Met uses the opposite sign for its longitudes by simply plotting all of the stations. You will get a mirror image of the world map.]

Everything else I have seen always uses + for eastwards. However, I just looked up this on Wikipedia:

“For calculations, the West/East suffix is replaced by a negative sign in the western hemisphere. Confusingly, the convention of negative for East is also sometimes seen. The preferred convention—that East be positive—is consistent with a right-handed Cartesian coordinate system with the North Pole up. A specific longitude may then be combined with a specific latitude (usually positive in the northern hemisphere) to give a precise position on the Earth’s surface.”

I’m not an expert on that, but I guess they should abandon using negative for east. That does look like a mess.
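If the sign convention in the station files really is reversed (positive = west of Greenwich), as commenters above infer, converting back to the usual positive-east form is trivial. A minimal sketch under that assumption; `met_to_standard` is a hypothetical helper name, and the Nadi coordinate is the approximate value discussed in this thread:

```python
# Assumed convention (per the discussion above): the file stores longitude
# with positive = WEST of Greenwich. Standard convention: positive = EAST.
# Plotting the raw values without this conversion mirrors the world map.

def met_to_standard(lon_met):
    """Convert an assumed positive-west longitude to the conventional
    positive-east form, normalized into (-180, 180]."""
    lon = -lon_met            # flip the sign convention
    if lon <= -180.0:
        lon += 360.0          # wrap back into range
    elif lon > 180.0:
        lon -= 360.0
    return lon

# Nadi Airport is at roughly 177.4 deg E; a positive-west file would
# store it as -177.4.
print(met_to_standard(-177.4))  # -> 177.4 (east of Greenwich, west of the dateline)
```

Whichever convention a table uses, the point made above stands: it must be applied consistently, and two stations a degree apart on the same island cannot carry opposite signs.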

Jimchip Posted Dec 27, 2009 at 2:28 PM
WMO has an error in its GTS MTN data – Bracknell UK Met Office closed about five years ago and moved to Exeter, circa 150 miles SW where temperatures will probably be generally warmer (always a good thing). The office site was demolished and replaced by a housing development. Perhaps someone has forgotten to tell WMO.

Isn’t the most important part of the Met’s data release the term “value added”? If “value added” means “artificially adjusted”, then it’s useless for research. The only data that anyone can use is the raw data, so the researcher can apply their own adjustments for things such as the urban heat island, fully stating what they have done.

The only data that anyone can use is the raw data, so the researcher can apply their own adjustments for things such as the urban heat island, fully stating what they have done.

If it’s purely replication, you need to know the methods so raw data alone would not suffice. If you want to do your own analysis, you still have to be able to make the adjustments and I imagine there is quite a lot of debate over how to do that. That’s one of the real topics of debate — how the data is or isn’t adjusted.

I don’t know how much of “Climategate” is due to tribalism, how much is due to sloppy data keeping, how much is due to bad practice and how much, if any, is due to outright fraud.

If it’s purely replication, you need to know the methods so raw data alone would not suffice. If you want to do your own analysis, you still have to be able to make the adjustments and I imagine there is quite a lot of debate over how to do that.

No, it goes beyond replication. If I were a researcher examining some feature of the climate in relation to actual temperatures, would I wish to rely on undisclosed changes made to the raw data by some other group? Not likely.

If CRU wishes to adjust their data they are entitled to do so. However, if they expect this data to be used in serious scientific endeavours (and justify the money provided for their existence), then they should be providing two data sets: the raw one and their “value-added” version (with a complete explanation of how the data has been altered). This should have been the case years ago.

My understanding so far (although this is a complicated subject) is that CRU considers (or considered) it the responsibility of the meteorological services to provide the raw data, since that is where it is coming from. It seems to me the raw data goes through several phases (probably being worked on in different locations, at different times) until it is used in a specific research paper. And the best version to use might depend on the use case.

You didn’t answer the question of course. It’s the elephant in the room. The margin for error could be equal to or greater than the temperature of the “global warming” being reported to us all. And we have no idea what it is do we? This is outrageous and not funny at all.

After seeing some of the arguments for reducing measurement errors, I really wonder if some scientists were out sick when the subject of propagation of errors was covered.

The only way that an adjustment can reduce the error is to know what the correct value truly is. Unfortunately, the correct value is at best approximated by the device doing the measurement. An adjustment is just a guess of the magnitude and the sign of the true error. This guess could just as easily be further from the correct value as closer to it.

Therefore, I prefer the raw data. This gives me data that will have measurement errors which I understand and I may be able to quantify. Raw data lacks the adjustments made by a researcher who believes that his adjusted data is superior.
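The point about adjustments can be checked numerically: if an adjustment is an independent guess at the measurement error, uncorrelated with the actual error, it cannot reduce the error on average; it adds its own variance. A rough Monte Carlo sketch with invented numbers (not any real station data):

```python
# Illustration: "adjusting" a measurement by an independent guess at its
# error INCREASES the expected error. All numbers invented.

import math
import random

random.seed(1)
true_value = 15.0      # the unknown true temperature
n = 100_000            # number of simulated measurements

raw_sq = adj_sq = 0.0
for _ in range(n):
    err = random.gauss(0.0, 0.5)        # actual measurement error (sd 0.5)
    measured = true_value + err
    guess = random.gauss(0.0, 0.5)      # adjustment: an uncorrelated guess at err
    adjusted = measured - guess
    raw_sq += (measured - true_value) ** 2
    adj_sq += (adjusted - true_value) ** 2

print("raw RMSE:      %.3f" % math.sqrt(raw_sq / n))   # ~0.50
print("adjusted RMSE: %.3f" % math.sqrt(adj_sq / n))   # ~0.71, i.e. 0.5 * sqrt(2)
```

The adjusted error is worse by a factor of sqrt(2), which is just the usual propagation-of-errors result for adding two independent error sources. Only an adjustment correlated with the actual error (i.e., based on real knowledge of the bias) can improve matters.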

I missed this earlier and just completed reading the transcript – the program is excellent, I wish a US media outlet (ANY US media outlet) would do a program as accurate and on-topic as the Finnish program!

For those with a background in signal processing, can I suggest the following:

Take a daily temperature record. Smooth it with a 30 day average filter. Plot the spectrum of the filter.

Then decimate the record by sampling it every 30 days. Compute the spectrum of the decimated signal (analytically), remembering that the sampled signal spectrum is the convolution of the spectra of the signal and the sampling process. Put realistic data into this process.

Is the data aliased? The signal is completely degraded.

Then take the monthly signal and apply a 12 month average to get a yearly signal, and then, in true climatological manner, smooth it with an averager. You will discover that this introduces trends and excursions that do not exist in the data.

If this is how the temperature data is processed, I am absolutely appalled. I have done real time signal processing on invasive cardiac signals for the last 25 years (having designed and built the specialist equipment as well as being a cardiologist) and I could not possibly apply methods such as these in a clinical environment.

I have written a brief research note on this and sent it to SM. If there are any signal processors reading this, please repeat my calculations, and I am happy to send you my research note, which I will refine over the next week or so.
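The exercise proposed above can be sketched numerically. This is a minimal synthetic illustration (invented signal, not the Met Office’s actual processing): a daily record containing a 360-day annual cycle plus a fast 7.2-day component is smoothed with a 30-day boxcar, then decimated to one sample per 30 days. The boxcar is a poor anti-alias filter, so the fast component folds back as a spurious 180-day cycle that does not exist in the original data.

```python
# Aliasing demo: 30-day boxcar average, then decimation by 30, creates a
# spurious 180-day cycle from a 7.2-day component. Synthetic data.

import numpy as np

days = np.arange(3600)                                # ten 360-day "years", daily
annual = np.sin(2 * np.pi * days / 360.0)             # slow annual cycle
fast = 0.5 * np.sin(2 * np.pi * days / 7.2)           # fast 7.2-day component
signal = annual + fast

# 30-day boxcar smoothing: its frequency response has large sidelobes,
# so the fast component is attenuated but NOT removed.
smoothed = np.convolve(signal, np.ones(30) / 30.0, mode="valid")

# Decimate to one sample per 30 days. The new Nyquist frequency is
# 1/60 cycles/day, far below 1/7.2, so whatever survived must alias.
decimated = smoothed[::30]

spec = np.abs(np.fft.rfft(decimated))
freqs = np.fft.rfftfreq(decimated.size, d=30.0)       # cycles per day

annual_bin = int(np.argmax(spec))                     # dominant peak
rest = spec.copy()
rest[annual_bin] = 0.0
alias_bin = int(np.argmax(rest))                      # next-largest peak

print("dominant period: %.0f days" % (1.0 / freqs[annual_bin]))  # 360 days (real)
print("spurious period: %.0f days" % (1.0 / freqs[alias_bin]))   # 180 days (an alias)
```

The 180-day peak is pure artifact: nothing in the constructed signal varies on that timescale. A proper low-pass filter before decimation (or antialiased resampling) would suppress it.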

My only problem with a daily temp record (not possible methods of analysis) is that the ‘day’ can be (Tmax + Tmin)/2. There has been other work looking at Tmin (nighttime) vs. Tmax (daytime). There is just no way to know, at this point, unless one gets a chance to examine and (at least try to) understand how a good ‘set’ of data is acquired and analyzed.

For what it’s worth, I’ve been in the energy sector for 25 years and counting. I’m on the Operations end of things, so can’t claim to be an expert, but for years we calculated anticipated load (EHDD, or effective heating degree day) using the simple formula you describe to get a temperature average. 10 or 15 years ago, we outsourced weather predictions to professional firms that take into account cloud cover, wind chill, etc. as well as predicted continuous temperature over a particular day. For historical actual data, the same methodology is used but on real measured data.

The newer methodology (usually) yields a better result for load forecasting, but the calculated average temperature is often quite different than the simple arithmetic mean of Tmax and Tmin. The implication is that one should use one or the other methods for a given period of time, but don’t mix and match. Just as a guess, most or all of the older data probably uses your simple formula since it would be unlikely that many stations would have the capability of continuously recording temperature. Even if they did, how many would have the expertise or time to integrate 24-hour periods on the paper charts that were only recently supplanted with electronic recorders?

Steve: can we leave this topic alone… nothing turns on it for this thread.

If this is how the temperature data is processed, I am absolutely appalled. I have done real time signal processing on invasive cardiac signals for the last 25 years (having designed and built the specialist equipment as well as being a cardiologist) and I could not possibly apply methods such as these in a clinical environment.

It’s useful to have the perspective of other practicing scientists/clinicians. Do you think your methods of analyzing cardiac signals applies to something like determining global temperature over time?

Experience in time series data, especially the introduction of artifacts by any processing of the data – and how to detect these artifacts – would seem to me to be entirely relevant (although perhaps not to this thread). Do you have any reason to believe otherwise?

You appear to be advancing the argument that analyzing cardiac signals (as an example of non-climate-related time series) is not useful in considering the determination of global temperature over time.

This is notwithstanding your opening sentence, “It’s useful to have the perspective of other practicing scientists/clinicians.” Emphasis added.

It would be useful to readers if you would state your points explicitly.

No, in all honesty, I don’t *know* if it would be useful. I asked Richard Saumarez to respond with *how* it would be useful so that I could learn. We have a minerals consultant/former policy analyst criticizing the stats used in dendro and gosh, doing dendro himself, so why not a cardiologist providing advice on a method to establish global temperature?

If CRU wishes to adjust their data they are entitled to do so. However, if they expect this data to be used in serious scientific endeavours (and justify the money provided for their existence), then they should be providing two data sets: the raw one and their “value-added” version (with a complete explanation of how the data has been altered). This should have been the case years ago.

I agree. I have no problem with data being altered, given what is involved in obtaining temperature readings around the globe. I agree that why and how it was altered should be transparent so that other scientists can replicate it and that we can be certain of its findings. This should be part of justifying its budget and for any use of its data in policy development.

Susann,
One of my professional specialties is demographic data analysis, tracking thousands of variables over time for every geographic location in a region. In many ways, not very different at all from maintaining a collection of temperature data.

You are correct that data sources are often messy.

However, it is truly disquieting (to put it mildly) that after many years of running this process, CRU has been unable to produce a quality-assured data production system that carefully tracks the data in every way. One learns how to deal with messy source data.

Take a look at Weather Underground for a data management system that’s run a bit differently. At random, I chose an African nation and was quickly and easily able to pull up the available data. I can easily see which stations are reporting, which have missing data, etc etc etc.

By the way, Susann, your following statement…

By not stating outright your belief that climate science falls on this, you might appear to have washed your hands of it, but that’s not how it’s perceived by those outside your group of followers.

…indicates you haven’t really listened to Steve’s stated position. He has stated quite clearly (on national television for that matter) his belief that we don’t know.

In that sense, this post adds to the uncertainty. If a high level of uncertainty means the “fall” of climate science, so be it. But that’s a nuance you have apparently missed.

…indicates you haven’t really listened to Steve’s stated position. He has stated quite clearly (on national television for that matter) his belief that we don’t know.

In that sense, this post adds to the uncertainty. If a high level of uncertainty means the “fall” of climate science, so be it. But that’s a nuance you have apparently missed.

If climate science falls, it should fall because it is *wrong*, plain and simple, not because its practitioners are sloppy or the scientists working in it tribal, or imbued with massive egos out for glory — or because its findings are too uncertain or doubts have been raised about the scientists. Doubt and uncertainty are not enough to scupper a whole discipline because those are legitimate parts of the development of any science.

If the records have been adjusted incorrectly — either inadvertently or fraudulently — such that the results do not reflect reality, I want to know but I also want to know that those determining whether they have or not are competent to determine it.

The primary reasonable outcome from Climategate is the more widespread understanding that the confidence has been overblown, the science presented as more certain, more settled, than it is. In that sense, the public visibility and/or acceptance of the AGW climate science campaign is “falling.”

OTOH, the need for more, better science to be done in this field is rising rapidly in my book.

Looks to me like we’re being handed another set of half-truths from a group of people who in the past have had trouble with honest interchanges of information.

I agree that Climategate emails appear to show a reluctance to provide him with data — that much seems clear. The bottom line for me is whether it matters to the issue at hand — the reliability of the temperature record and whether the warming climate scientists claim it shows is real. I’ll wait for the investigation into the matter to conclude to answer that. Everything else is just speculation and rumor.

What is not speculation and rumor is that the emails show that Jones was in a panic to prevent the release of the data that would allow the basis of his claims to be closely examined. Now why do you suppose he would feel that way?

What is not speculation and rumor is that the emails show that Jones was in a panic to prevent the release of the data that would allow the basis of his claims to be closely examined.

There is a whole group of scientists who subscribe to the concept of intellectual property rights. If you give away your work as soon as you’ve done it, you’ll never be anything but a data pack mule for other researchers.

It’s clear IMO from Jones’s (and Briffa’s) exchanges that this IPR issue was the major roadblock for releasing their data.

My opinion is if the work is published, you need to provide the data used in obtaining those results, and at the least equivalent code to what you used to produce your results.

When it is felt that your code contains non-releasable IPR sections, a suitable compromise might be to release a version of the code in PERL or MATLAB that obtains substantively the same result as your private C++ version (which is much faster).

Clearly Jones panicked at the prospect of his work being closely analyzed: hence the sudden (and wholly implausible) claim that contractual, confidentiality agreements prevented him from sharing the data, which apparently had commercial value (who’ll give me $1000 for Singapore Temperatures 1930-1935?). And then the even more implausible claim that the legal documents have been lost! But what is odd is that the Met Office have bought into Jones’ fantasy. Sooner or later it will be exposed that there are no legal agreements which prevent the sharing of the data, which will be a problem for Mr Jones (as he then will be). Why should the Met Office want a part of that problem?

Clearly Jones was bothered — perturbed, frustrated — I don’t know which verb properly applies — by the requests for data, the possibility of FOI requests, and the FOI requests themselves. Apparently the FOI officer(s) and commissioner agreed with his reasons, so unless he outright lied to both, he has some backing in this. I’m not saying I agree with the decisions – in fact, I think the decision to deny access caused more harm than benefit – but if the law was followed, it was followed. It appears the FOI officials looked into the matter and decided for Jones. I don’t know if all the i’s were dotted and t’s crossed — I’ll have to wait for the investigation. There are many laws with loopholes and aspects I disagree with, but short of lobbying to have the FOI laws changed, paying for more enforcement, or revising how much latitude the FOI officers have, there’s not much to do about it besides wait for the investigation.

First, whether the email deletion took place has not been proven, nor is it proven that any deletion they might have done was against the FOI laws. It is a possibility. If the email system is backed up, the emails should be retrievable, depending on how they run their system. I won’t take anyone’s word that they broke the law — that is to be determined by official investigation. I want to see what the investigation finds.

That issue aside, the issue is whether the FOI officers/commissioners acted properly in agreeing to deny the request. I don’t know if the CRU has confidentiality agreements with countries that prevent all data from being released — they claim they do, and unless Jones was lying through his teeth, I imagine they do. This would be something easily checked. I would presume the FOI people did check as part of their own due diligence — if not, they will pay the consequences.

Do I think there is any legitimate legal reason not to release the data, aside from the confidentiality agreements which may exist? No. Should they have released the data? Yes. Are they playing a game of catchup right now because they did not release the data? Yes. This is a hard lesson to learn, but in a politicized issue like this, gaffes, errors, and stumbles are to be expected.

Steve: CRU was unable to produce relevant confidentiality agreements last summer. In addition, in their November 18 email, they resiled from their July claim that the supposed “confidentiality agreements” included specific clauses preventing distribution to “non-academics”. Given CRU’s inability to produce relevant confidentiality agreements, it is apparent that their FOI officer sent out refusals without carrying out any due diligence to determine that there were confidentiality agreements containing language of the type claimed by Jones. My guess, and it’s only a guess, is that the university FOI officer is not happy about the information that he received from CRU. Before you opine on these matters, why don’t you read the blog posts on the past history of the FOI requests. It’s well documented on the blog.

My guess, and it’s only a guess, is that the university FOI officer is not happy about the information that he received from CRU. Before you opine on these matters, why don’t you read the blog posts on the past history of the FOI requests. It’s well documented on the blog.

People opine all the time here without benefit of evidence or based on flimsy evidence at best.

I have read the blog however I don’t have it memorized. You may have documented your experiences with FOI requests, but that’s only half the story. I don’t know what the FOI officers thought or how they interpreted the request and how they developed reasons for refusal. I have worked with FOI officials before on data requests, so I know how our people respond to such requests from the public.

Are there any FOI laws anywhere in the world that make it OK to instruct others to delete emails subject to an FOI request?

Not having read every piece of FOI legislation in the world, I can’t answer that. I don’t imagine so; however, stranger things have happened. Laws are written with many loopholes for just these kinds of occasions. You’d have to be naive to think otherwise. Just speculating here, but it might be that they wanted to delete emails on their *personal* machines to make it harder for such emails to be found, as investigators would have to go back into the archives and find them. This would create a (minimal) barrier to easy release of info, but it might raise the cost of the release such that the requester might not be able to pay, etc.

I really don’t know. You can’t claim they broke the law until they are actually charged and found to have done so. They can look guilty as heck, but so do the celebrities on the cover of The National Enquirer…

This leaves a number of puzzle-type questions about the Hadley Subset:
– how were these selected?

I don’t know; however, a retrospective shows how the word ‘subset’ is used in the emails. ‘Subset’ is used as a commonly understood term in the emails, not as a merely generic, mathematically defined word.

“But this is fine, since the IPCC AR4 and other assessments are not saying the evidence is 100% conclusive (or even 90% conclusive) but just “likely” that modern is warmer than MWP. So, yes, it should be possible to find some subsets of data where MWP and Modern are comparable and similarly for some seasons and regions. And as you’ve pointed out before, if any season/region is comparable (or even has MWP>Modern) then it will probably be the northern high latitudes in summer time (I think you published on this, suggesting that combination of orbital forcing, land-use change and sulphate aerosols could cause this for that season/region, is that right?).”

> What we’ve done (in the new water vapor work described above) is to
> evaluate the fidelity with which the AR4 models simulate the observed
> mean state and variability of precipitable water and SST – not the
> trends in these quantities. We’ve looked at a model performance in a
> variety of different regions, and on multiple timescales. The results
> are fascinating, and show (at least for water vapor and SST) that
> every model has its own individual strengths and weaknesses. It is
> difficult to identify a subset of models that CONSISTENTLY does well
> in many different regions and over a range of different timescales.

————————————-

(May, 2007 Schmidt to PJones) A long email but interesting beyond ‘subsetting’:

“Some work on modelling a subset of those effects has been done for the last glacial maximum or the 8.2 kyr event (LeGrande et al, 2006), but there have been no quantitative estimates for the late Holocene (prior to the industrial period).”

————————————-

Here’s one example where ‘subset’ is used as a verb: I couldn’t resist the longer quote:

Gil,
One other good plot to do is this. Plot land minus ocean. as a time series.
This should stay relatively close until the 1970s. Then the land should start moving away from the ocean. This departure is part of AGW. The rest is in your Co2 increases.
Cheers
Phil
Gil,
These will do for my purpose. I won’t pass them on. I am looking forward to the draft paper. As you’re fully aware you’re going to have to go some ways to figuring out what’s causing the differences.
You will have to go down the sub-sampling, but I don’t think it is going to make much difference. The agreement between CRU and GISS is amazing good, as already know. You ought to include the NCDC dataset as well…”

“…There are also discrepancies in the WWII period. I have not subset the reanalysis to correspond to a particular dataset’s missing mask as all 3 have different coverages. I’ll be making plots for the paper (with a draft coming) soon.
best wishes,
gil”

—————————————

There are other uses of the term. I left out debate/discussions regarding claims that McIntyre or McKitrick or Michaels or others were using the ‘wrong’ subset.

Here’s how station list configuration and PDO can interact to introduce a false temperature trend. Note: there are innumerable potential examples and mine is not an accusation of any sort!

In 1966, the rounding methodology for temperature measurement recording changed to “up” vs. “away from zero”. The impact is that the average of temperatures below 0F is raised by .5F for glass bulb thermometers (the norm at the time).

A proper bias correction would leave the US average temperature unchanged. However, if a station reconfiguration was used to correct for the rounding increase, by adding southern stations in the cool part of a PDO phase but on the cusp of a change to a warming regime, the average might be unchanged initially, but it would falsely increase over the next half of the PDO cycle.
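
For what it’s worth, the claimed 0.5F figure can be reproduced arithmetically. A minimal Python sketch, assuming “away from zero” meant round-to-nearest with ties away from zero, and “up” meant always rounding toward the ceiling — both readings of the comment are my assumptions, not documented practice:

```python
import math

def round_away(x):
    """Round to the nearest whole degree, ties away from zero."""
    return math.copysign(math.floor(abs(x) + 0.5), x)

# Sub-zero readings with fractional parts spread uniformly over one degree
temps = [-10.0 + k / 100.0 for k in range(100)]

mean_away = sum(round_away(t) for t in temps) / len(temps)
mean_up = sum(math.ceil(t) for t in temps) / len(temps)

bias = mean_up - mean_away  # ~ +0.5F for these below-zero readings
```

Round-to-nearest is unbiased on average, while ceiling adds half a degree; for readings above 0F the two rules agree, which is why the bias shows up only below zero.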

I am not sure of the definition of coatracking, but if man-made global warming were such a major factor, wouldn’t it be able to overwhelm cold phases of the PDO? It didn’t in the 40s–70s cold phase, and it isn’t doing so now.

P Gosselin, fair point. MET/CRU are more likely to get results. How about giving them a limited amount of time? If the process is not clearly making progress, then we start. I suspect they will just ask once to show willing, but not actually make waves to free the data — but we can.

I have to say I am primarily interested in getting the data online and updated regularly and automatically, not picking a fight with the MET office. From a first view, the MET office/CRU re-coding is compatible with an open auditable record, and we need to nurture them for this.

Every time I hear “subset” my spidey sense changes it to “cherry picked”. Why not all the data? Given their line that this represents average temperatures, it sounds as if they picked stations, rather than being constrained by the mysterious confidentiality agreements.
It will be interesting to compare relative trends of today’s released data vs. the future subsets that might be released.

There is another approach which can sidestep some of the issues with adjustments of RAW data, especially homogenization of data from different stations (which can have the same ID despite being in different locations) or different instruments at the same station. It was suggested by a commenter on WUWT named “supercritical,” and perhaps by others also.

Instead of trying to generate a “global average” temperature and extracting a trend line from it, we should be compiling *decadal* records from every station which has a homogeneous and continuous record — no changes in location, instrument, TOB, etc. — for at least a decade, for each decade for which those conditions hold true for that station. Let’s call those “qualified decadal records.” We average the results for all QDRs within a 1 X 1 degree geographic grid cell (or 5 X 5 if not enough data for the 1 X 1), for each decade. Then splice together the decadal averages for each grid cell — end to end, without regard for the absolute temperature values — to form a 100+ year record for that cell (because we are only interested in the *trend* over that period, not the actual temperatures). We can then average the resulting trends over the globe or within latitudinal bands.

This method dispenses with all the uncertainties, and opportunities for manipulation, which efforts to homogenize and otherwise “adjust” raw data present. It would not give us a useful global temperature, which we don’t need anyway, but it would give us a reliable historical trend, at least for those portions of the globe having data in most of their grid cells.
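
As a rough illustration only — the function name and the chaining rule are my own choices, not “supercritical’s” — one way to operationalize “splicing without regard for absolute values” for a single grid cell is to re-zero each decade’s average to its own first year and chain the segments onto a running offset:

```python
import numpy as np

def spliced_cell_series(cell_records, decade_starts):
    """cell_records: {station_id: {year: temperature}} for one grid cell.
    For each decade, average only the stations with a complete record
    for that decade (the 'qualified decadal records'), discard the
    absolute level, and chain the decadal segments end to end."""
    series, offset = [], 0.0
    for d0 in decade_starts:
        years = list(range(d0, d0 + 10))
        qdrs = [st for st, rec in cell_records.items()
                if all(y in rec for y in years)]
        if not qdrs:
            series.extend([np.nan] * 10)  # no qualified stations this decade
            continue
        annual = [np.mean([cell_records[st][y] for st in qdrs]) for y in years]
        seg = [a - annual[0] for a in annual]  # within-decade changes only
        series.extend(offset + s for s in seg)
        offset += seg[-1]
    return series
```

One caveat of this particular chaining: it drops the change across each decade boundary, so the recovered century trend is a slight underestimate; a first-difference formulation would avoid that at the cost of more bookkeeping.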

Contrarian is absolutely right. That is why the data on the website justdata.files.wordpress.com/2009/12/rawtempbylatitude_2.jpg is interesting. It shows the raw temperature data only for those stations with continuous records from 1900 to 2009. There were only 613 stations, and only 70 degrees of latitude are covered, all in the northern hemisphere. There simply is not enough data to determine the average air temperature of the earth in the early 1900s. Of the 7 ten-degree latitude bands where long-term records are available, 5 showed no change in temperature, while the stations in the 10 and 20 degree latitude bands (±5 deg) show nearly a 5C increase in temperatures over the 100-year period. I wonder how much of this is due to urban heat island effects from development near the weather stations.

Susann I think you miss the point of Steve’s posts. He finds something interesting to him, he posts on it. Sometimes he follows up and develops an end game answer and posts on it, sometimes CA readers follow up and develop an end game answer and they post on it, sometimes they add a tidbit of knowledge that they have on the topic that Steve can use, sometimes independent sources release additional info on the topic or otherwise posit an answer. And sometimes the end answer remains a mystery. Suggest you just relax, sit back and watch.

Gavin,
First the figures are just for you – don’t pass on!!! I don’t normally see
these. I just asked my MOHC contact…”

2. more insight into ‘how’:

“The rigorous QC that is being talked about is done in retrospect. They don’t do much in real time – except an outlier check.
Anyway – the CLIMAT network is part of the GTS. The members (NMSs) send
their monthly averages/total around the other NMSs on the 4th and the 18-20th
of the month afterwards. Few seem to adhere to these dates much these days, but
the aim is to send the data around twice in the following month. Data comes in
code like everything else on the GTS, so a few centres (probably a handful, NOAA/CPC,
MOHC, MeteoFrance, DWD, Roshydromet, CMA, JMA and the Australians)
that are doing analyses for weather forecasts have the software to pick out
the CLIMAT data and put it somewhere…”

3. Some insight into selection (?)

“At the same time these same centres are taking the synop data off the system
and summing it to months – producing flags of how much was missing. At the MOHC they compare the CLIMAT message with the monthly calculated average/total.
If they are close they accept the CLIMAT. Some countries don’t use the mean of
max and min (which the synops provide) to calculate the mean, so it is important
to use the CLIMAT as this is likely to ensure continuity. If they don’t agree they
check the flags and there needs to be a bit of human intervention.”
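
The acceptance rule described in that quote can be paraphrased as code — the tolerance and missing-day threshold below are invented placeholders for illustration, not MOHC’s actual values:

```python
def accept_climat(climat_value, synop_monthly_mean, missing_days,
                  tol=0.5, max_missing=5):
    """Accept the reported CLIMAT monthly value when it agrees with the
    mean computed from the daily synops and enough synops were present;
    otherwise flag for human intervention (thresholds are hypothetical)."""
    if missing_days > max_missing:
        return "review"
    if abs(climat_value - synop_monthly_mean) <= tol:
        return "accept"
    return "review"
```

As the quote notes, preferring the CLIMAT value when the two agree preserves continuity for countries that do not compute the monthly mean from (max+min)/2.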

4. I made a previous comment but couldn’t recall the reference:

“What often happens is that countries send out the same data for the following month.
This happens mostly in developing countries, as a few haven’t yet got software to
produce the CLIMAT data in the correct format. There is WMO software to
produce these from a wide variety of possible formats the countries might be using.
Some seem to do this by overwriting the files from the previous month. They
add in the correct data, but then forget to save the revised file. Canada did
this a few years ago – but they sent the correct data around a day later and again
the second time, after they got told by someone at MOHC.
My guess here is that NOAA didn’t screw up, but that Russia did. For all countries
except Russia, all data for that country comes out together. For Russia it comes
out in regions – well it is a big place! Trying to prove this would need some Russian
help – Pasha Groisman? – but there isn’t much point. The fact that all the affected
data were from one Russian region suggests to me it was that region.”

As someone mentioned earlier, this blog has operated as a cooperative venture in trying to determine what is true and what is wrong or exaggerated. We also want the truth “wherever it leads”.

You can think of it as an on-line seminar where we exchange ideas in an open forum. There are a lot of capable professionals here who contribute their relevant skills and experience in analyzing the material at hand. These include people with a surprisingly wide expertise ranging from various aspects of climate science, engineering, biology, physics, economics to the advanced methodology of mathematics and statistics needed to understand and evaluate what is dealt with. Steve Mc. is the driving force who keeps the blog moving in a productive direction.

Your suggestion that only a fully developed “finished” result be made public just would not work. In order to function, the topics need to be current and we need to be able to exchange ideas in a timely fashion. We do not have an infrastructure to work in camera nor the time or funds to set up what would be required – many of these people work full time and do this as a hobby.

If Steve has become popular enough to have people pay attention to what he says, that is only an indication that he is doing good work and doing it right. His presentation style is interesting and wry, but honest – you seem to interpret it as inciteful rather than the way most of the rest of us see it, insightful. I suspect that you are just not familiar with his viewpoints. If you hang around and watch and read (and ignore the occasional extreme comment in a blog which does not a priori censor individuals based on having opposing viewpoints), you might change your mind.

If Steve has become popular enough to have people pay attention to what he says, that is only an indication that he is doing good work and doing it right. His presentation style is interesting and wry, but honest – you seem to interpret it as inciteful rather than the way most of the rest of us see it, insightful.

Steve has become very popular — I won’t go so far as to say he’s doing good work and doing it right — that is yet to be proven in my view. I have read his papers and blog and I’ve read the criticisms of his papers and his blog and I am undecided. He has raised doubts — both in the papers he has co-authored and in his blog posts. Are those doubts warranted? That I do not know.

If I see some of his posts as inciteful instead of insightful, I suspect it’s because I don’t share the viewpoint of most people here and so I notice the word choice more.

I will say that if this all leads to a stronger more transparent climate science, then the work has been worth it. But it does ultimately depend on the climate and that won’t be clear for a while.

Steve – one more time, would you please move your OT ruminations about the merit or lack of merit of this site to Unthreaded and off a topical thread.

Oops, I missed the request upthread to move that conversation to the OT thread, sorry.

As a possible blog enhancement, may I recommend considering adding a ‘noisy’ flag to each comment, settable by the moderators? Viewers can choose to view the post with noisy comments filtered (the default) or unfiltered.

Or, alternatively, enable the moderators to send noisy or OT comments to OT or Rejected threads, automatically tagged with the thread from which they were removed.

Steve: there are lots of things that we would like to have, but we are limited by the wordpress environment which has the great advantage of stability and speed. I’m sick of the blog crashing and being so slow that management was prohibitively time-consuming.

Ron Broberg in a comment above observed a much closer genetic relationship of the Hadley Subset to the Regional Basic Climatological Network (RBCN) than to GCOS. I’ve examined this dataset and agree. It’s too bad that the UK Met Office didn’t describe the provenance of their network.

I’ve updated the post to describe the matched and unmatched stations. The puzzles are much reduced. The main questions remain: what is the basis on which the Met Office decided that RBCN stations were not subject to the supposed confidentiality agreements? And were there any relevant changes between June 2009 and Dec 8, 2009?

While I missed it the first time I read the Met’s subset webpage, I don’t think it is fair to characterize the UK Met Office as hiding the provenance of their network. The following quote is listed just above their applet for downloading station data.

The UK Met Office states: The stations that we have released are those in the CRUTEM3 database that are also either in the WMO Regional Basic Climatological Network (RBCN) and so freely available without restrictions on re-use; or those for which we have received permission from the national met. service which owns the underlying station data. http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html

Steve: I “missed” that statement the first couple of times that I read the webpage as well. As did other CA readers to date. It’s possible that they added a clarification after the matter was raised here. We’ve seen changes in other official webpages in response to comments here – with the changes being inserted on the fly without notice. I didn’t save the webpage when I first inspected it. It’s also possible that we both missed the statement. The Google cache at the time of this comment (Dec 28 8 am Eastern) is a Dec 28 cache and sheds no light on whether the page might have been amended. Further Update: Arggh. As Ron points out, there is convincing evidence that the webpage is unchanged – I quoted it myself!!! I haven’t previously handled the RBCN data set and missed the reference. Shame on me.

“The WMO Regional Associations define the Regional Basic Synoptic Networks of surface and upper-air stations adequate to meet the requirements of Members and of the World Weather Watch. The WMO Regional Associations also define Regional Basic Climatological Networks necessary to provide a good representation of climate on the regional scale, in addition to global scale. The WMO Executive Council Working Group on Antarctic Meteorology is in charge of reviewing the Antarctic Basic Synoptic Network and the Antarctic Basic Climatological Network.”

Steve, one thing the CRU emails go over again and again, and even Susann seems to imply is :

1) Release of data will lead to more requests, more FOI
2) Our precious time will be used up complying to FOI requests
3) We should therefore not respond to any FOI requests.

The current sequence of events is leading down precisely this path. The Met data seems insufficiently annotated.

I think the basic flaw is in Jones’s basic questioning of the very legitimacy of ‘non-scientist’ requests for data. The fact that the data is sloppy then sits on top of this initial skewed thinking. So for Jones, presumably, he cannot release the data even if he wanted to, because it is in such a messy state. But yet, he can cover it up by claiming this messiness to be the very complexity that non-scientists cannot comprehend. From therein starts the deviously circular logic that FOI requests will waste time for CRU. An attitude of openness would have taken the whole matter down a different path. There are many scientific communities that practice open sharing of data of this nature.

Why won’t CRU/Jones participate in a joint exercise with the skeptics – in annotating the data and publish it? That would carry more weight than the “SPM”.

“…more transparent climate science, then the work has been worth it. But it does ultimately depend on the climate and that won’t be clear for a while.”
I think there is enough value in scientific endeavor of the kind that simply says “theory X cannot be right because of reason Y” for it to be pursued in its own right. All science does not have to be synthetic. Maybe Mr McIntyre’s inquiries will unclog a few minds who will tomorrow synthesize something else.

We can and should try to fix the past record as a one-off, but we need to also ensure the present comes in in a usable form. For me that means online raw measurements in machine readable form from the collecting countries as soon as practical. I have no problem if the raw is published 12 months late for commercial reasons. Then we can QA the data and fix the issues while it is fresh. As I understand it, in the past there have been get-togethers, and years of figures have been swapped in one go. So there is no back channel to raise QA issues in a timely manner.

“If we can rely on the Met Office statement that they included “any additional data for which we have permission to release”, this means that Denmark has not yet provided permission for release of 6 stations. (It’s a little odd to think that they would have consented to the release of 2 stations and not the other 6, but hey…)”

I’m sorry, I believe in calling a spade a spade. It certainly looks like the Met Office is not telling the truth about withholding data on account of “permissions”. It is incredible (not credible, totally unbelievable) that the Met Office would withhold data for 6 stations for lack of “permission” while releasing data for 2.

Something is rotten about this Danish yarn, and if their credibility is shot here in Denmark, this whole “permission” yarn doesn’t sound credible.

“The most surprising aspect of the list is that is dominated by Third World and especially African countries”

The southward March of the Thermometers?

“It seems surprising that these countries would have answered the bell for permission to release data so much faster than European countries.”

Actually, release by third-world countries but not by first-world countries makes perfect sense to me. Those first-world countries are steeped in two things lacking in the third world: strong IPR laws with the public legal wherewithal to prosecute violations, plus the administrative need to find “external” funding sources for government departments. It’s not too hard to find a client in, say, Europe that finds such data useful and is willing to pay for it, but selling weather data to people who are, in large part, surviving on US$2/day is never going to be profitable, let alone desirable – just getting it to them would likely exceed any fee they could reasonably be expected, and be willing, to pay.

Mr. Mosher
I believe you are correct that individual scientists control some datasets. If you read the GHCN preview to V.2 paper you can read the story of trying to acquire data from an individual scientist with the Met office in Rwanda who was later lost in the chaos in that country.

They also mention Professor Wernstedt from Penn State who for decades collected data from nonindustrialized countries. His “World Climatic Data” book was popular back in the 1970’s. I expect there are many other private collectors who have contributed to all of the different datasets, and they may indeed have confidentiality agreements.

It seems that unlike purveyors of blog science (see denialdepot) the Met Office actually took time and care over this release. It is a pity that Mr. McIntyre clearly did not read the website issued concurrently with the data. The answer to question 8 on the site clearly explains the provenance of the lists. If that isn’t enough, the provenance of the networks is again given, this time not in a drop-down menu answer, just next to the datafile download link. The page has not been updated since before Christmas as far as I can ascertain, and certainly well before this top post. If this is hiding the provenance, I’d hate to think what disclosure to Mr. McIntyre’s acceptable standards would require – flashing neon lights perhaps? Then, if Mr. McIntyre had bothered to do a thorough search on the station lists, it would be immediately apparent – as it was to me after all of 5 minutes’ work on a cruddy old laptop – that the 145 “missing stations” that became the subsequent focus have a non-zero 6th id and a station name that disagrees with that in the WMO network. Therefore the Met Office did not disclose data for stations that were not in the network. It looks like they did a far more thorough job of network checking than here. Is that fraud or incompetence? No. Instead it looks to me like a national met service undertaking a full and transparent effort, and then people not reading the very small manual attached and shouting fraud when there is none. It does not reflect well on this blog.

In summary a pretty conclusive blog science 0 Met Office 1 scoreline. It would be hoping too much for this post to be updated in the header to reflect these verifiable facts and I fully expect this posting not to appear or to get removed fairly pronto. In the eventuality that it isn’t I expect flame posts ignoring the facts contained. Please surprise me pleasantly in all regards …

Steve: thank you for figuring out the handling of the 145 matches. I agree with your point about the 6th digit and have amended the head post accordingly. This is precisely one of the benefits of discussing things on a blog; sometimes even careful people miss things as I did here. A few hours later, someone else figured out the answer (you in this case) and that part of the puzzle is solved.

However, your allegation that I had shouted “fraud” on this point is simply untrue. I enumerated several possibilities for the non-matching of the 145 stations and “fraud” was not one of them.

There are other puzzles in respect to this list and perhaps they will all be resolved. For example, do you know why the Jones value-added version of the RBCN network is not covered by the supposed “confidentiality agreements”?
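
If the 6th-digit observation above holds, the matching rule is easy to sketch. The encoding assumed here — a 5-digit WMO id plus a trailing sub-station digit, with a non-zero 6th digit marking a non-primary station — is my reading of the comment, not documented Met Office practice:

```python
def parse_station_id(raw_id):
    """Split a 6-digit CRU-style id into its 5-digit WMO id and a
    trailing sub-station digit (assumed encoding, for illustration)."""
    s = str(raw_id).zfill(6)  # restore a leading zero lost from an int id
    return int(s[:5]), int(s[5])

def is_primary_wmo_station(raw_id):
    """Only ids whose 6th digit is zero map directly onto a WMO station."""
    wmo, sub = parse_station_id(raw_id)
    return sub == 0
```

Under that reading, the 145 unmatched stations would simply be the ones where `is_primary_wmo_station` is false, which is consistent with their names disagreeing with the WMO list.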

Accepting your “facts” at face value, it hardly changes the validity of the body of work done by Steve and others at this blog. Might I suggest that you spend more time reading and less time writing posts – like I do – since the “facts” you bring forward had already been identified in other posts.

I believe subset here refers to the land surface temperatures being a subset of the combined land and sea data. Reviewing the US station data, it is almost exclusively urban locations, with a bias for coastal locations. Montana, the 4th largest US state with an area of nearly 150K sq miles, has one station, located in Helena. New York has two stations. Texas, the largest US state in the lower 48 with an area of nearly 270K sq miles, has only 6 stations. Looking further afield, the country of Sweden, with an area of 173K sq miles, has 8 stations. Could they not get the Swedes on board? The province of Quebec, with an area of nearly 600K sq miles, has 8 stations.

The sparsity of stations and their location and urbanization may have real consequences when the temperature data is later compared with the proxy data. Proxy data is mainly found out in the countryside and in the mountains. In the new world, the temperature data from the early 1900s will be from urban areas that are in their infancy. As the century progresses, these urban areas will be growing rapidly and the UHI effect will come into play. Meanwhile, proxies located way out in the hinterlands will remain fairly unaffected by UHI. If more (or I should say some) stations from rural, mountainous areas were included, this would result in a more representative temperature trend.

I think that I have seen papers where the TREND from rural stations has been compared to the TREND from urban stations that have been adjusted for UHI and the results are very similar. I’m sorry I don’t have a reference handy, but perhaps one of the other more well read visitors will have one. If you can’t locate one after spending time on Google, let me know and I’ll see if I can hunt it down.

I only looked at my hometown site…Tucson…and a few other locations in Arizona (Phoenix, Duncan, and Safford). For Tucson and Phoenix, it appears to be the F6 data without any modification for UHI, instrument changes, etc. Safford and Duncan are coop sites. Warming at any specific site then would be a function of natural global warming (including PDO/ENSO), man-made global warming, and the other factors (like the ones above). Yearly averaged temperatures showed Phoenix warmed the most…3C between 1960-1995, Tucson about 1.5C from the mid 70s to the mid 90s; Safford and Duncan about 1C from the mid 70s to mid 90s. How does one determine for any particular station the weighting of these factors? Using Duncan and Safford as a guide, the 1C temperature change probably didn’t have any UHI effects. So Tucson’s UHI was about .5C and PHX a whopping 2C. The 1C of warming at Duncan and Safford would be a combination of PDO (coincided with cold to warm phase), generic global warming, and man-made global warming. The 800lb gorilla being PDO phase. Since the mid 90s, there has been no discernible warming at any of the stations! There has been a major change in instrumentation (the HO83s and ASOS), which I suspect has resulted in an increase in temperatures also.

Not exactly the ones with a long history or purely rural ones or those studied, compared and homogenized by Czech meteorologists or whatever…

Praha – Libus is a station at the southern outskirts of the Prague city and there was definitely a change in the surroundings during the last 40 years.
On the other side, it belongs to the Czech Hydrometeorological Institute, so it is well-kept and serviced.

I remember searching some years ago in the satellite maps for the Mosnov and Turany meteorological gardens and these were quite close to the runways, like it is usual at the airports.

There’s a paper in Czech saying that at the airport Brno-Turany, higher maximum temperatures have been measured than in the very center of the city of Brno, because the Vaisala measuring system positioned close to the runway is capable of catching increases caused by airplanes turning around in the vicinity of the temp sensor! http://www.amet.cz/webmendel/MendelClanekPD05.pdf
Fig 4 (Obr. 4) meteo-garden
Fig 6 (obr. 6) curves of difference between MAX temps at airport Turany (yellow) and the city center of Brno (blue).
Telling…

It might be worth noting that 50 meters behind a large jet engine of the type used by modern airliners the “wind speed” is about 20 knots at idle power and 90 knots at full power (= Category 2 hurricane). The widths of the plumes are about 15 degrees and 40 degrees respectively.

The innuendo about incomplete station reporting is that climate scientists are hiding something. There is no evidence of this, nor is there any evidence that full reporting would alter the overall results indicating that the planet is, after all, getting hotter.

Similarly, the many ClimateAudit posts about the East Anglia CRU break-in did not include concerned opinions about who actually committed the crime. It’s as if the Watergate break-in coverage had focused on revelations about the DNC’s polling data.

Steve: We discussed the University of Victoria break-ins, including on-the-scene reports of computer thefts from the U of Victoria Anthropology Department, in earlier posts, noting the additional possibility that the laptop thefts had been masterminded by Macavity the Cat. Early posts included speculations on whether the zipfile originated within the university or not.

They broke into many departments – not just climate associated ones. Just a band of thieves trying to steal computers.

And who cares if it was illegal – the behavior is still exposed. Since CRU themselves broke the law over the FOIA, they are no better.

There is evidence in the e-mails that they were hiding something. Not disclosing that a graph was created with different types of data is hiding something. Mann even denied doing this in an older RealClimate comment a long time ago.

The compensation for the Urban Island effect is still being kept secret by the CRU.

“The innuendo about incomplete station reporting is that climate scientists are hiding something. There is no evidence of this, nor is there any evidence that full reporting would alter the overall results indicating that the planet is, after all, getting hotter.”

I don’t question that we are in a warming trend. However, one puzzling aspect of it is the disagreement between surface measurements and satellite/radiosonde measurements. It would be useful to see the analysis methods, to check whether bias is being introduced at some point.

And, in point of fact, they have been hiding things. Briffa: Big claims based on insufficient data and questionable source material. (And that famous decline!) Mann: Big claims based on questionable methods and source material.

Hansen shows temperatures rising faster than any other group. What accounts for this?

mike roddy – I would back Macavity over oil companies, those actively involved in tar sands and offshore development, as the culprit in this crime. This is just my hunch, albeit like yours without proof, for he also has an alibi, (and one or two to spare). Just read up his bio and you might be swayed too. There is just no one like Macavity.

It looks like most of the stations (all but about a third) are not in rural areas, going by the station locations listed.* The rural stations are mostly from certain areas, like N. America and Siberia.

*(If I open a map and there’s a sizeable community or an airport within a tenth of a degree of the location listed I’m assuming that’s where the station is, and in some cases it’s clear from the station name.)
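That rule of thumb could be sketched in a few lines of code. This is only an illustration of the proximity test the commenter describes; the settlement list, coordinates, and 0.1-degree tolerance below are invented placeholders, not actual station metadata:

```python
# Sketch of the commenter's rule of thumb: flag a station as "urban" if any
# sizeable settlement or airport lies within 0.1 degrees of its listed
# coordinates. All coordinates here are made up for illustration.

def is_urban(station_lat, station_lon, settlements, tol=0.1):
    """Return True if any settlement is within `tol` degrees in both lat and lon."""
    return any(
        abs(station_lat - lat) <= tol and abs(station_lon - lon) <= tol
        for lat, lon in settlements
    )

settlements = [(33.45, -112.07), (32.22, -110.97)]  # e.g. Phoenix, Tucson
print(is_urban(32.25, -110.95, settlements))  # near Tucson -> True
print(is_urban(35.00, -105.00, settlements))  # remote location -> False
```

A tenth of a degree of latitude is roughly 11 km, so this is a crude screen at best, but it matches the eyeball method described above.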

I think a plausible explanation works like this. The WMO is engaged in various projects to update the monitoring of climate. Perhaps they provide assistance or funding. In each nation there would be some subset of stations. For example, in Australia there are 100 such stations. RBCN is described here

The Member countries belonging to one of the six WMO Regional Associations have committed to operate a Regional Basic Climatological Network (RBCN), an agreed selection of surface and upper-air meteorological observing stations of the WWW/GOS. World-wide, about 2,600 surface stations and 510 upper-air stations belong to the six RBCNs and Antarctic Basic Climatological Network and generate monthly averages of meteorological parameters measured at the surface and in vertical layers of the atmosphere up to 30km. The data are archived in and made available by the two WMO World Data Centres. RBCNs are based primarily on Regional Basic Synoptic Networks and include all GCOS surface (GSN) and upper-air (GUAN) stations supplemented by other CLIMAT and CLIMAT TEMP reporting stations needed for description of regional climate features. There are over 1,000 GSN and 160 GUAN stations within RBCNs.

So, there is a selection of stations (they have special requirements) that have agreed, per WMO guidelines, to share their data. Stations not included could be left out for a variety of reasons.
Jones uses these stations plus others. Each country and selected scientists are now being contacted to see if they will provide their data freely, in the clear. They may have in the past, but Jones just wants a clear record of that. I suspect (tell you later) that the requests for this went out right after your FOIA denial.
So, they are posting the data THEY KNOW is in the clear, while they wait for the other data. Also, Jones uses CLIMAT data. This is clear from the mails. CLIMAT data appears to come in real time.

The problem Jones and CRU will have now is keeping the data properly marked and segregated. It’s a common problem for those of us who have worked with classified data. The other problem they have is this: they cannot sign confidentiality agreements without a finding that the confidential data is NECESSARY to their mission. I’ve FOIA’d them on this question: did they determine that the confidential data is necessary?

What is most telling for me is that there is now a complete lack of trust in anything the Met Office says or does. They release some data and the first action required is to analyze it for veracity, completeness and accuracy. It is now assumed, rightly or wrongly, they are hiding something or tweaking their data to follow the Team Game Plan.

This is not a good thing, this is not supposed to be the way we perceive our governmental agencies.

Some nice bed the Met Office has made for themselves. It will take a lot of complete openness for their credibility to be restored. They could start by having a policy of releasing raw data and homogenization methodology used to produce the enhanced data products.

Having grown up in a place that grew a lot of corn and a lot of tobacco, I can tell you the temperature in the tobacco fields was higher than the temperature in the corn fields, even though the fields were separated by nothing more than a single-lane dirt road.

Does one simply average the two readings… or does one take into account how many acres of corn compared to tobacco and use a weighted average, or does one simply throw out one of the readings, to come up with the ‘true’ surface temperature record of the small town I grew up in?

You may wish to reconsider your new format of “nesting” comments. This format simply seems to encourage certain people to hijack the thread, promotes food fights, and has made this particular set of comments almost unreadable (both in substance and in form).

Re: Rossb (Dec 28 15:14),
Traditionally, we get food fights etc on any popular thread 😦
Working on improving the situation. This comment is an example… soon hope to give “Pasted links” back to most everyone 🙂

MrPete,
I really appreciate all your work in making sure the blog runs smoothly. I do realize that food fights have occurred on other popular threads (I have been reading this blog since 2007). But I can’t remember a previous occasion where there have been so many comments snipped and where two commentators have so hijacked a thread – not even when Tom P was floating around with his own “innovative” brand of analysis. It seems to me that nesting encourages food fights. (I do like your example post and I second the request for post numbers).

Steve: the hijacking of threads is really annoying. I’m about to start asking some commenters to stay in the penalty box for a while.

The term hijacking assumes that people are helpless and must go along with the hijacker because there is a gun to their head. Sorry, no gun. I always thought that if people don’t want to discuss some issue that arises from the main thread, they would move along. Apparently, a lot of people do want to discuss side issues. But rather than force Steve to “put me in the penalty box”, I’ll bow out of this discussion that apparently has been solved by actually reading the release notes that went along with the data…

Steve: Susann, the release notes do not state why the RBCN data could be released in December but could not be released in July. Your mischaracterization of the issues is really quite childish.

Yes, there are always commenters who wish to sidetrack any thread. However, 99% of readers do not comment on threads. Over time, feedback has been overwhelmingly that they prefer threads to stay on given issues and not to be diverted onto pet issues of the commenter. Otherwise, I’ve found that all threads degenerate editorially into the same discussion. Thus I’ve increasingly tried to keep threads topical – something that’s been hard with the recent influx of commenters who are unwilling to respect this editorial policy.

I took it you had two puzzles when reflecting on the data. One was its provenance, and you spent some considerable time pondering and guessing what it was by comparing it to the CRU data. It took your readers to point out that the information on its provenance was readily available in the notes. Another was why it was possible to release the data in December but not July. You’ve offered up several possible reasons.

So I stand corrected — one of the issues you raised with the data remains unanswered.

Provenance has not been settled. The original hypothesis that provenance was from GCOS was superseded by RBCN, as the latter appears to have fewer stations unaccounted for. However, there is still no exact match.
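The subset check described here amounts to a set difference on WMO station identifiers: which stations in the released file do not appear in the candidate network list? A minimal sketch, with invented id lists standing in for the Hadley subset and the RBCN roster:

```python
# Compare the WMO ids in a released station subset against a candidate
# network list and report the stations left unaccounted for. The id lists
# below are invented placeholders, not the actual Met Office metadata.

def unaccounted(subset_ids, network_ids):
    """Return subset stations that do not appear in the candidate network."""
    return sorted(set(subset_ids) - set(network_ids))

hadley = ["010010", "030050", "722900", "772950"]
rbcn   = ["010010", "030050", "722900"]
print(unaccounted(hadley, rbcn))  # ['772950'] -> not an exact subset
```

If the result is non-empty for every candidate network tried (GCOS, RBCN, etc.), then, as noted above, there is still no exact match and provenance remains open.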

You mean, Moser, that since I’ve been gone, this place has been just one big happy agreeathon with no criticism of Steve’s posts, no foodfights, no OT, and no coatracking? I thought y’all were skeptics.

With one notable exception (the lorax thread) we have not had BIG FOOD FIGHTS since you left. Spurious correlation is your best argument. Use it, no charge. That does NOT entail your contention:

“You mean, Moser, that since I’ve been gone, this place has been just one big happy agreeathon with no criticism of Steve’s posts, no foodfights, no OT, and no coatracking? I thought y’all were skeptics.”

I mean EXACTLY what I say I mean. No BIG food fights.
Which English word in that sentence can I google for you?

Has it been a happy agreeathon? No. Read the Yamal threads. Read the threads on CRU FOIA. Read.

No food fights; like I said, no BIG food fights. Food fights where threads had to be zambonied regularly.

Steve – thanks for the analysis. It may not prove anything, but for this one regular reader of your blog and Tip Jar contributor, it lets me know my contribution is never wasted, as you’re always on the job, working hard to dig out the facts on this important global issue.

From reading Harry, it appears that the data is sloppy, but other than the Harry file I have nothing to base that conclusion on. Nor do I know whether data held by other sciences or agencies is any more or less sloppy. Yes, Steve claims none of it meets his requirements, which I assume come from the minerals industry, but does any scientific data meet them? People can claim “I worked for umpteen years in science and we never had such sloppy data”, but how do I know they’re legit?

We take Harry at face value and assume he is revealing something about the shape of the data he is working on. Yet, one could also read Harry to be about a data tech who doesn’t know how to use the data, whereas previous techs may have been well-versed in its use and mysteries. IIRC, Harry kept making these mistakes and chastising himself about it. Maybe it’s a bit of both? Who can say for certain? None of us except the principals involved.

In which case could you refrain from asserting “we” can’t know something when in fact it is you that doesn’t know.

Many people on this blog make claims that they or “we” know many things about climate science, Jones, the data, the emails, the hack/leak, etc. that I don’t know so this seems to be a common occurrence.

I would be willing to suggest that with respect to the Met release of data, and the CRU data and staff, unless you actually work for the Met office or in the CRU, were part of the FOI team, and/or were involved in creating the data, that we — any of us — don’t know much for certain. We have a limited amount of the evidence and thus can only conclude so much from it — partial at best, and potentially erroneous because what we lack access to may be the more important information. Some of that evidence will not be available until the investigation is complete.

Another problem solved – spent ages wondering how the start & end years for a particular station (WARATAH) were being corrupted. Turns out they weren’t – I’d written ‘getmos’ to trim empty years, but forgot to check the return flag! Duh.

No one. Previous code not properly documented. Within just a few lines of your cherry-picked quote are several examples. One that is more relevant to this thread is the following:

I am very sorry to report that the rest of the databases seem to be in nearly as poor a state as Australia was. There are hundreds if not thousands of pairs of dummy stations, one with no WMO and one with, usually overlapping and with the same station name and very similar coordinates. I know it could be old and new stations, but why such large overlaps if that’s the case? Aarrggghhh! There truly is no end in sight. … And a long run of duplicate stations, each requiring multiple decisions concerning spatial info, exact names, and data precedence for overlaps. … One thing that’s unsettling is that many of the assigned WMo codes for Canadian stations do not return any hits with a web search. Usually the country’s met office, or at least the Weather Underground, show up – but for these stations, nothing at all. Makes me wonder if these are long-discontinued, or were even invented somewhere other than Canada! Examples:
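The duplicate-pair problem described in that quote could, in principle, be screened for mechanically: pair up records that share a station name and nearby coordinates where exactly one of the pair lacks a WMO code. A sketch with invented records (the field names, tolerance, and example stations are assumptions for illustration, not CRU's actual format):

```python
# Screen a station list for the "dummy pairs" Harry describes: same name,
# nearly identical coordinates, one record with a WMO code and one without
# ('' here means no code). The records below are invented examples.

def find_dummy_pairs(stations, tol=0.2):
    """Pair records with the same name and coordinates within `tol` degrees,
    where exactly one record of the pair lacks a WMO code."""
    pairs = []
    for i, a in enumerate(stations):
        for b in stations[i + 1:]:
            if (a["name"] == b["name"]
                    and abs(a["lat"] - b["lat"]) <= tol
                    and abs(a["lon"] - b["lon"]) <= tol
                    and (a["wmo"] == "") != (b["wmo"] == "")):
                pairs.append((a, b))
    return pairs

stations = [
    {"name": "WARATAH", "lat": -41.45, "lon": 145.53, "wmo": "959570"},
    {"name": "WARATAH", "lat": -41.44, "lon": 145.54, "wmo": ""},
    {"name": "HOBART",  "lat": -42.89, "lon": 147.33, "wmo": "949700"},
]
print(len(find_dummy_pairs(stations)))  # 1 suspect pair
```

Of course, as the quote notes, each flagged pair still needs a human decision on spatial info, exact names, and data precedence for overlaps.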

Norbert,
Completely out of context. Making a mistake does not make one an incompetent software engineer as suggested by Susann. The mess is obvious and it wasn’t Harry’s making. No real database, just a collection of poorly documented flat files. Data files that are sometimes little endian and sometimes big endian. Some files with values scaled up so they can be stored as integers while other files are not. In short, spaghetti code and spaghetti data.

Susann did acknowledge that it appears the data is sloppy, and just pointed out a theoretical possibility based on the fact that you can’t completely rely on a single person’s report. Even the above quote by Phil contains the sentence:

I know it could be old and new stations, but why […]

He is writing those comments at a point where he hasn’t figured things out yet. There might have been some documentation that Harry didn’t know about or couldn’t relate to. The documentation might have been on the servers of some Nat Met Service that he didn’t look at because he forgot that someone asked him to do so 5 weeks ago, when he was overly tired as he sometimes seems to have been. He might have been looking at some experimental files from someone who was working only for a few weeks, thinking it was the real thing. You just can’t reliably tell from a single person’s text file.

Beyond this, I don’t want to discuss this further since a) it appears somewhat off topic and b) there are other reasons to believe that there have been data storage problems. (For me personally, it’s just difficult to tell to what extent.)

@Mike Roddy
“The innuendo about incomplete station reporting is that climate scientists are hiding something. There is no evidence of this, nor is there any evidence that full reporting would alter the overall results indicating that the planet is, after all, getting hotter.”
(end of quoted post)

There is no innuendo: the self-named climate scientists obviously manipulated and “hid” data (prevented/obstructed the release of data). All you need to do is read the leaked emails.

Certainly the northern hemisphere contains the great majority of Earth’s continental landmasses. On a country-by-country basis, though, we suspect that small, undeveloped, predominantly equatorial polities will exhibit Warmist biases sufficient to excite the Met’s climatological voyeurs. Random selection would hardly exhibit such equatorial clusters… in compiling this egregious “subset”, why else would MO functionaries so conspicuously neglect more easily available alternatives? If there is an objective, rational basis for this list, interested parties deserve to hear it.

(1) Require registration to post. That will eliminate most of the drive-by food fights.

(2) Set up a registration system where people can reveal their technical or scientific training. Maybe even have some threads where only technical minded people can post, and also some sandbox threads for the kids.

It is typical of this whole episode that the Met Office has not yet, to my knowledge, published a significant list of countries that would be adversely affected by data release because of existing agreements.

Fortunately, it is a very simple matter to write to both a government and the Met Office to ask if your country is involved in a primary or a secondary “third party data sharing” agreement.

If the answers agree and are in the negative, that makes the actions of the Met Office look a bit lame.

Some countries’ copyright documents require first-party/second-party agreements before use of data; some do not automatically allow third-party data sharing, or second-party alteration of data without notification to the first party. Infilling of data would be an example of the latter.

Why not write to your government? It does not have to be under FOI in the first instance, so “Nature” cannot complain about the gigantic work load.

The UK Met Office publishes the Central England Temperature [CET] data from 1659 to the present. If you compare this with the NASA global temperatures for the period 1880-2008, using averages over 1950-1980 for each data set, there are differences. However, just eyeballing the moving averages, you can conclude that the CET data tracks the way the global temperatures move. Therefore, with a leap of faith, you can say that the CET data reflects what happened to global temperatures prior to 1880, so we don’t need the Mann-type proxies, which probably have a much wider error budget than my leap of faith.

Why is this interesting? Because the CET data shows a number of periods when the rate of increase in temperature is similar to the latter end of the Mann hockey stick, and when we come out of the Maunder Minimum the rate of increase is greater than any other part of the curve.

Now I realise that eyeballing and leaps of faith are not very scientific, but if anyone has the statistical ability to put this stuff into better shape, I would be happy to send them the Excel spreadsheet I used. Contact me via http://www.akk.me.uk
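For anyone wanting to put that eyeball comparison on a common footing: the standard move is to convert each series to anomalies from the same 1950-1980 base period before overlaying them. A minimal sketch with invented numbers (the CET values below are placeholders, not the real record):

```python
# Re-express a {year: temperature} series as anomalies from the mean over a
# common base period (1950-1980 here), so two series with different absolute
# levels can be overlaid. The toy values below are invented.

def anomalies(series, base_start, base_end):
    """Return {year: temp - base_period_mean} for every year in the series."""
    base = [t for y, t in series.items() if base_start <= y <= base_end]
    mean = sum(base) / len(base)
    return {y: t - mean for y, t in series.items()}

cet = {1940: 9.2, 1960: 9.4, 1970: 9.3, 1980: 9.6, 2000: 10.3}
a = anomalies(cet, 1950, 1980)
print(round(a[2000], 2))  # anomaly relative to the 1950-1980 mean
```

The same function applied to both series puts them on the 1950-1980 baseline the commenter mentions, after which moving averages or trends can be compared directly.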

Mike,
Presumably you’ve seen all this – the forwarded email from Tim. I got this email from McIntyre a few days ago. As far as I’m concerned he has the data – sent ages ago. I’ll tell him this, but that’s all – no code. If I can find it, it is likely to be hundreds of lines of uncommented fortran ! I recall the program did a lot more that just average the series. I know why he can’t replicate the results early on – it is because there was a variance correction for fewer series. See you in Bern. Cheers
Phil

Worse and worse! In 2009 Jones says the code is “…likely to be hundreds of lines of uncommented fortran”.
(1) Fortran??? I bet he made copious use of COMMON blocks – banned as a constant source of hard-to-find errors in large (>60 statements of unstructured code) programs by any decent programmer as far back as the late 1970s!
(2) “… hundreds of lines…” – sounds unstructured.
(3) “uncommented” – Omigod!

In summary, this all sounds like an utterly avoidable disaster. What Jones seems to be talking about would be OK for a two-week project by a couple of undergrads, but would NOT be accepted even from a 102-level trainee programmer in any self-respecting IT shop.

Who ARE these guys? I honestly cannot believe that such apparent crass ignorance of, and incompetence in, basic programming skills is alive and well in an institution that is the cornerstone of so much political energy and man-centuries of effort, to say nothing of squintillions of our money! Does no one care that rank amateurs are writing the code upon which so very much hinges?

Problem with Los Angeles temp data: I have been auditing the Los Angeles temperature data available from the CRU site (station 772950, data from 1894 to 2009) against the historic temperature data from the National Weather Service for station 045114 in the same area (data from 1906 to 2009). The plots overlay exactly until 1991. The general trend in the July temperature is a gradual rise until the 1980s, then a decrease until the present. Starting in 1991 the CRU data takes a massive 2-4 deg C drop below the NWS data, for whatever reason. The general trend is the same. Not sure what is going on here, but it doesn’t look good.

I have found the problem with the Los Angeles temperature data. Until 1991 the temperature data tracks the downtown Civic Center temperature exactly. For some reason, in 1991 CRU starts using the LAX data, which is regularly 2 deg C cooler, and they use this up to the present. I have overlaid the LAX and Civic Center plots, and although they are only about 12 miles apart, there is a constant temperature gradient between them due to LAX being right on the ocean. I don’t know if this move is an attempt to take care of the downtown UHI effect or to compensate for a station move. In 1999 the downtown Civic Center station was moved about 6 mi SW to the campus of USC. The consequence of this is that the 1980s stand out much more than they would have had the station data remained downtown. One of my concerns about the low number of stations is that if only one or two stations are used per 5 x 5 grid, any error at one of these stations is bigger than if you had more stations. In the Los Angeles grid, there are only two stations in the CRU data: Los Angeles and San Diego. The CRU name for the Los Angeles station is LA International, which indicates LAX; however, LAX data is only being used since 1991. I suspect that station moves will be much more of a problem than anyone expects. Doesn’t look like much value was added to this station. Conclusion: one out of one grid fails audit so far.
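The overlay check described in this audit (two series matching exactly until a step change appears) can be automated by scanning for the first year the series differ by more than a threshold. A sketch with invented values that mimic the reported ~2 deg C drop in 1991; these are not the actual CRU or NWS numbers:

```python
# Find the first year at which two {year: temperature} series diverge by
# more than a threshold, as in the CRU-vs-NWS overlay described above.
# The series below are invented to mimic the reported 1991 step change.

def first_divergence(a, b, threshold=1.0):
    """Return the first common year where |a - b| exceeds `threshold`, else None."""
    for year in sorted(set(a) & set(b)):
        if abs(a[year] - b[year]) > threshold:
            return year
    return None

cru = {1988: 23.1, 1989: 23.0, 1990: 23.2, 1991: 21.1, 1992: 21.0}
nws = {1988: 23.1, 1989: 23.0, 1990: 23.2, 1991: 23.3, 1992: 23.1}
print(first_divergence(cru, nws))  # 1991
```

Run over every station pair in a grid cell, a check like this would flag candidate station moves or source switches of the kind found here, though each flag would still need manual inspection.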

Completed audit of the Los Angeles 5 deg x 5 deg grid, which contains only a Los Angeles weather station (at one time right next to the Santa Ana Freeway on top of a parking garage!) and a San Diego station at SD International Airport. The CRU ref # for San Diego is 722900; the local station # is 047740. Evaluated the years 1914 to 2005, and CRU is a perfect match for the NWS data for station 047740. Gradual temperature climb up to the mid 1980s, then a sharp fall-off to the present, much like we have seen elsewhere. Recent color plots of this grid on the big board will show up as blue = cooling. Question is, where is the UHI correction? I don’t see any. This wraps up the LA grid. Not much value added.

This whole discussion tells me only that the management of crucial data – the foundation stone for a proposed world-wide tax-and-spend of multiple trillion dollars – seems to be in the charge of a bunch of blatant amateurs, who are so unbelievably incompetent that they do not understand the basics of good data management and data schema design. Alternatively, they’re deliberately trying to obfuscate the issue.

PS: I may not know much about climate science, but I’ve spent my working life in IT design and implementation.

PPS: No version numbers on data in all.zip means to me that it’s raw data. If it’s not, then it’s processed data (the result of applying some algorithm to the raw data). But it’s unlikely that the raw data was only processed once (one pass with one algorithm with one set of run parameters), and so where’re the version numbers?

PPPS: A whole database in 1,741 separate text files, each with a sort-of schema at the top??? I wish I’d been able to get away with that sort of thing. Aside from being hugely labour-intensive to maintain, it couldn’t be better designed for maximum errors in such maintenance. Unless, of course, it’s produced automatically from some other source – maybe a proper database? But then, from some of the leaked emails, I’ve gathered that when they run models they do it from a command line, like we used to do back in the 70s, and specify the file or files to use in a run as a command parameter. Is that normal in academia? If so, you really need to get up to date if your results are actually going to be the basis for big decisions – much less gigantic world-wide ones that could cost us trillions in taxes!!! This is not IT; this is infantile mucking about by untrained amateurs. Oh my God!!!

Interesting selection of stations in Denmark – as Ulrik mentioned, the two stations released (Copenhagen + Aalborg) are in metropolitan areas, while all of the other listed stations are in the boonies! It is difficult to believe that this is a coincidence.

In addition, in Denmark most institutions and organizations dealing with science, education and public services are organized at the national level. It is therefore highly unlikely that the selective release is due to legal or confidentiality issues, as all of these stations are controlled by the same jurisdiction.

There is clearly something fishy going on here…
