Updating the CRU and HadCRUT temperature data

The latest incarnation of the CRUTEM land surface temperatures and the HadCRUT global temperatures are out this week. This is the 4th version of these products, which have undergone a number of significant changes over that time and so this is a good opportunity to discuss how and why data products evolve and what that means in the bigger scheme of things.

The paper describing the new CRUTEM4 product is in press at JGR (Jones et al, 2012), and makes a number of important observations. First, on the evolution of the CRU temperature data set: from CRUTEM1 back in the mid-1980s, which used a limited selection of station data and ‘in-house’ homogenization, to CRUTEM2 around 2003, CRUTEM3 in 2006, and now CRUTEM4, which draws on a wider range of data sources and relies much more on homogenization efforts from the National Met Services themselves.

Second, the paper goes into some detail about access to data, the reasons for the above changes, the history of homogenization efforts, and the current status of those efforts. Much of this is excellent background information that deserves to be more widely known. For instance, the differing timeliness of the source streams – CLIMAT (monthly average reports, available almost immediately), MCDW (Monthly Climate Data for the World, available after a few months), and the World Weather Records (WWR, issued once a decade; the last covered the 1990s, with the next due soon) – has a much larger influence on spatial coverage and station density than would be ideal.

The third point is how this product differs from similar efforts at GISTEMP, NCDC, and (though not mentioned) the Berkeley Earth project. GISTEMP and NCDC draw the majority of their records from GHCN, so there is a lot of overlap (95% or so), but there are big differences in how they deal with grid boxes with missing data. GISTEMP interpolates anomalies from nearby grid points, Berkeley uses kriging, while NCDC and CRUTEM estimate global means using only grid boxes with data. Since many missing data points in CRUTEM3 were in the high latitudes, which have been warming substantially faster than the global mean, this was a source of a low bias in CRUTEM3 (and HadCRUT3) when these data products were used to estimate global mean temperature trends. The increase in source data in CRUTEM4 goes some way toward removing this bias, but it will likely still remain (to some extent) in HadCRUT4, because the sea-ice-covered areas of the Arctic are still not included. Another improvement is in how the error bars are estimated, accounting for data sparsity, autocorrelation, structural uncertainty, and assumptions made in the synthesis.
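The size of this coverage effect is easy to illustrate with a toy calculation (the zonal trend profile below is invented purely for illustration; this is not the CRUTEM data or algorithm): if the grid boxes that are missing warm faster than average, an area-weighted mean taken over only the sampled boxes comes out low.

```python
import math

# Hypothetical zonal-mean trend profile in which high latitudes warm
# faster than the tropics (roughly the observed pattern; numbers made up).
lats = list(range(-85, 90, 10))                               # band centres
trend = [0.5 + 1.5 * (abs(lat) / 90.0) ** 2 for lat in lats]  # degC/60yr
weight = [math.cos(math.radians(lat)) for lat in lats]        # area weighting

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Full coverage vs. dropping bands poleward of 65N/65S, with no
# interpolation into the gaps (the grid-box-average approach).
full = weighted_mean(trend, weight)
kept = [(t, w) for lat, t, w in zip(lats, trend, weight) if abs(lat) < 65]
partial = weighted_mean([t for t, _ in kept], [w for _, w in kept])

print(f"full coverage: {full:.3f} degC/60yr, with polar gaps: {partial:.3f}")
```

Omitting the fast-warming polar bands pulls the global-mean trend down; filling the gaps by interpolation (GISTEMP) or kriging (Berkeley) avoids that particular bias at the cost of other assumptions.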

The CRUTEM4 data is available here and here, along with links to the full underlying raw data (minus Poland) and the code used to process that data (this is not quite finished as of when this post went live). This is a big step forward (but like the release of the code for GISTEMP a few years ago, it is unlikely to satisfy the critics).

So what does the CRUTEM4 data look like?

Overall, changes are small (see figure to the right, showing the trend (°C/60 years) for CRUTEM3 (top) and CRUTEM4 (middle), and the difference in their trends (bottom)). There is no change to the big picture of global warming in recent decades, nor in its regional expression. Where there are noticeable changes, it is in coverage of high latitude regions – particularly Canada and Russia, where additional data sources have been used to augment relatively sparse coverage. Given the extreme warmth of these regions (and the Arctic more generally) in recent years, combined with the CRUTEM procedure of only averaging grid boxes where there is data (i.e. no interpolation or extrapolation), this extra coverage makes a difference in the trends.

There will of course be an impact on the combined ocean and land temperature record, HadCRUT4. This incorporates (and brings up to date) the HadSST3 product that we discussed last year. The paper describing HadCRUT4 is also in press (Morice et al, 2012).

As expected, the changes (a little from both data sets) lead to a minor rearrangement in the ordering of ‘hottest years’. This is not climatologically very significant – the difference between 1998 and 2010 is in the hundredths of a degree, and most of the attribution work on recent climate changes is looking at longer term trends, not year to year variability. However, there is now consistency across the data sets that 2005 and 2010 likely topped 1998 as the warmest years in the instrumental record. Note that neither CRUTEM4 nor HadSST3 are yet being updated in real time – they only go to Dec 2010 – though that will be extended over the next few months.

There are a number of issues that might need to be looked at again given these revisions. Detection and attribution efforts will need to be updated using CRUTEM4/HadCRUT4, though the changes are small enough that any big revisions are extremely unlikely. Paleo-reconstructions that used CRUTEM3 and HadCRUT3 as a target might be affected too. However, the implications will be more related to the mid-century and 19th-century revisions than to anything in the last decade.

We can make a few predictions though:

We can look forward to any number of contrarians making before and after plots of the data and insinuating that something underhand is going on. Most of the time, they will never link to the papers that explain the differences. (This is an easy call because they do the same thing with GISTEMP all the time). (Yup).

Since the “no warming since 1998/1995/2002” mantra is so seductive to people who like to focus on noise rather than signal, the minor adjustments in the last decade will attract the most criticism. Since these fixes really just bring the CRU product into line with everyone else, including the reanalyses, and are completely unsurprising, we can expect many accusations of groupthink, deliberate fraud and ‘manipulation’. Because, why else would scientists agree with each other? ;-)

Joking aside, there are some important points to be made here. First and foremost is the realisation that data synthesis is a continuous process. Single measurements are generally a one-time deal. Something is measured, and the measurement is recorded. However, comparing multiple measurements requires more work – were the measuring devices calibrated to the same standard? Were there biases in the devices? Did the result get recorded correctly? Over what time and space scales were the measurements representative? These questions are continually being revisited – as new data come in, as old data is digitized, as new issues are explored, and as old issues are reconsidered. Thus for any data synthesis – whether it is for the global mean temperature anomaly, ocean heat content or a paleo-reconstruction – revisions over time are both inevitable and necessary. It is worth pointing out that adjustments are of both signs – the corrections in the SST for bucket issues in the 1940s reduced trends, as do corrections for urban heat islands, while correction for time of observation bias in the US increased trends, as does adding more data from Arctic regions.

Archives of data syntheses however, are only really starting to be set up to reflect this dynamic character – more often they are built as if synthesis just happens once and never needs to be looked at again. There is still much more work to be done here.

But even while scientists work on ironing out the details in these products, it’s worth pointing out what is robust. All data sets show significant warming over the 20th Century – regardless of whether the raw data comes from the ocean, the land, balloons, ice melt or phenology, and regardless whether the data synthesis is performed by the scientists in Japan, Britain, the US, individual bloggers or ‘sceptics’.

The idea that all the world’s top scientific agencies have been traduced would be laughable if so many people had not been taken in by it. The “conspiracy” is on the other “side” and is by definition unskeptical, no matter what label it chooses for itself or acquires from people who are disgusted with it.

Have you ever been in a room with several scientists? Having risen like cream to the top of their fields (which is not easy, try it before assigning yourself to be a judge), they all have strong self-esteem and are fond of their own opinions. Agreement among them is a strong indicator of truth.

The envy of intelligence – the urge to tear it down rather than use its capabilities – is pushing the climate toward an ever greater likelihood of ever greater destructive potential.
—
Usage quibbles on “data” are out of date. (If you were translating from the Latin, “are” would be correct.) Common usage now for a body of data is “is”. I used to fuss about this, but English is a living language, and American English even more so.

Disproportionate polar warming (the Arctic responds much faster than the Antarctic to greenhouse warming) has been a prediction of enhanced greenhouse warming ever since the early 1980s, so that observation is entirely consistent with expectations; e.g.

A whole load of other “completely separate measurements” have been made that are similarly consistent with expectations from our understanding of the Earth’s response to enhanced greenhouse forcing. These include (1) enhancement of tropospheric water vapour concentrations, (2) enhanced-greenhouse-induced effects on stratospheric temperature and tropopause height, (3) effects on relative increases in day and night temperatures, (4) the response of atmospheric circulation to warming and (5) latitudinal changes in precipitation trends, changes in (6) sea level and (7) mountain glacier extent, and so on…

The increase in surface temperature and accumulating ocean heat are also consistent with expectations. All of these measurements “support” “CO2 GW”, which, btw, is essentially a truism and isn’t going to be “falsified” any time soon!

If you want to know all the details about how NASA computes global-average temp estimates, you can find a wealth of information on the NASA/GISS web-site (data.giss.nasa.gov). Full code and documentation can be found there.

It turns out that the global-average temperature estimates are extremely insensitive to variations in station selection, as I demonstrated in my previous post. I was able to produce results that look very similar to the official NASA/GISS land index results by applying a very simple anomaly-averaging procedure to raw temperature data from just a few dozen rural stations.

The bottom line is, the surface temperature network is so spatially oversampled in most places that station selection is not critical at all for global-average temperature estimates. Pick virtually any subset of stations with sufficient global coverage, and you will get results consistent with the results that NASA has published.

The basic “bare-bones” procedure for computing very good “first cut” global-average temperature estimates from the raw station data really is quite straightforward. A competent programmer/analyst should be able to replicate the NASA land-temperature index quite closely just from the information that I have provided in my posts here.
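For what it’s worth, the commenter’s “simple anomaly-averaging procedure” can be sketched in a few lines (station names and values below are made up; real analyses use hundreds to thousands of stations and a fixed baseline such as 1961–90): each station is converted to anomalies relative to its own baseline mean before averaging, so differences in absolute station temperature drop out.

```python
# Hypothetical station records (degC, consecutive years); the stations
# have very different absolute temperatures but a similar warming signal.
stations = {
    "rural_A": [10.0, 10.1, 10.3, 10.2, 10.5],
    "rural_B": [-2.0, -1.8, -1.7, -1.8, -1.4],
    "rural_C": [25.1, 25.3, 25.4, 25.3, 25.7],
}

def anomalies(series, base_start=0, base_end=3):
    """Express a series relative to its own baseline-period mean."""
    base = sum(series[base_start:base_end]) / (base_end - base_start)
    return [t - base for t in series]

# Average the anomalies (not the raw temperatures) across stations.
anoms = [anomalies(s) for s in stations.values()]
n_years = len(next(iter(stations.values())))
mean_anom = [sum(a[y] for a in anoms) / len(anoms) for y in range(n_years)]

print([round(a, 3) for a in mean_anom])
```

Averaging raw temperatures instead would let the hottest or coldest stations dominate, and would jump whenever a station entered or left the network; averaging anomalies is why station selection matters so little.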

In terms of measurements to assess the AGW thesis, basic temperature readings are obviously important. But what of other indicators – the apparently missing tropospheric hotspot, disproportionate polar warming (Arctic: yes, Antarctic: no)? Given that average surface air and shallow ocean temperatures are going sideways, and deep ocean temperatures are not well enough known to assess whether heat is “hiding” there, what efforts are there to seek some other, completely separate measurements to either support or falsify CO2 GW?

Well, when one adjusts for known temperature influences – ENSO, the solar cycle, and volcanoes – the ‘sideways’ temperature simply disappears.
If you don’t think that accounting for lower solar input, volcanic aerosols, and the known effects of the El Niño/La Niña cycle is appropriate, do tell us what you base this on.

This is fully in keeping with the consensus view that a decade or more can be weather, and that it takes more time than that to falsify ‘CO2 GW’.
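The kind of adjustment being described is, in essence, a multiple regression of temperature against time plus the known exogenous factors, with the fitted ENSO/solar/volcanic contributions then subtracted (broadly in the spirit of published adjustment exercises such as Foster and Rahmstorf’s). A minimal sketch with entirely synthetic indices (every number below is invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40                                    # years of synthetic data
t = np.arange(n)

# Made-up stand-ins for ENSO, solar-cycle and volcanic aerosol indices.
enso = np.sin(0.9 * t) + 0.3 * rng.standard_normal(n)
solar = np.sin(2 * np.pi * t / 11.0)
volc = np.where((t > 20) & (t < 24), -1.0, 0.0)

true_trend = 0.017                        # degC/yr, chosen for the example
temp = (true_trend * t + 0.10 * enso + 0.05 * solar + 0.15 * volc
        + 0.03 * rng.standard_normal(n))

# Regress temperature on intercept + trend + the known factors.
X = np.column_stack([np.ones(n), t, enso, solar, volc])
coef, *_ = np.linalg.lstsq(X, temp, rcond=None)

# 'Adjusted' series: subtract the fitted ENSO/solar/volcanic contributions.
adjusted = temp - X[:, 2:] @ coef[2:]

print(f"recovered trend: {coef[1]:.4f} degC/yr")
```

Once the short-term factors are regressed out, the underlying trend in the adjusted series is much easier to see, which is the point of the exercise.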

It’s worth pointing out that there’s really no doubt in any quarter about CO2’s direct effect on global warming: about 1°C per doubling (before feedbacks), and you won’t find the likes of Spencer, Monckton, or Lindzen arguing with that. The effects of CO2 on infrared radiation can be easily measured in the lab.
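That back-of-envelope number follows from the standard simplified forcing expression ΔF = 5.35 ln(C/C0) W/m² combined with a no-feedback (Planck) response of roughly 0.3 °C per W/m² (both are round approximations, not exact constants):

```python
import math

def no_feedback_warming(c_ratio, planck_response=0.3):
    """No-feedback warming (degC) for a given CO2 concentration ratio.

    Uses the simplified forcing expression dF = 5.35 * ln(C/C0) W/m^2
    and an approximate Planck response of ~0.3 degC per W/m^2.
    """
    forcing = 5.35 * math.log(c_ratio)   # W/m^2
    return planck_response * forcing     # degC

dT = no_feedback_warming(2.0)            # a doubling of CO2
print(f"{dT:.2f} degC per doubling (no feedbacks)")
```

The disagreements in the literature are over the feedback multiplier applied to this number, not the number itself.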

The disagreements, such as they are, deal with natural feedbacks to a system warmed by additional CO2.

If you think the consensus has it all wrong, please start by explaining the current greenhouse effect (~33°C) without positive feedbacks like water vapor.

As for disproportionate warming of the poles, it’s well explained by the fact that the north pole is a shallow ocean surrounded by land that has warmer currents flowing into it while Antarctica is a big block of ice surrounded by circumpolar currents.

It does no good, Double, to insinuate that the consensus has it wrong because measurements that you might like to see aren’t there. There are very good, and well known, physical reasons for why the measurements you ‘expect’ couldn’t possibly happen. Or should we not expect less heating when TSI declines?

“completely separate measurements to either support or falsify CO2 GW”

You mean like an observed increase in the altitude of the tropopause, an observed decrease in energy being radiated at top of atmosphere in the CO2 band with a corresponding but lesser net increase in energy being radiated in other bands, and consequently observed stratospheric cooling—the real fingerprint of enhanced greenhouse warming, all of which was predicted?

“Normals are computed for as many NWS stations as reasonably possible. Some stations do not have sufficient data over the 1981–2010 period to be included in Normals, and this is the primary reason a station may not be included. Normals are computed for stations that are part of the NWS’s Cooperative Observer Program (COOP) Network. Some additional stations are included that have a Weather Bureau — Army — Navy (WBAN) station identification number, including the Climate Reference Network (CRN). Normals are only computed for stations in the United States (including Alaska and Hawaii) as well as U.S. territories, commonwealths, compact of free association nations, and one Canadian CRN station.
How many stations will be included in the normals?

The 1981-2010 Climate Normals includes normals for over 9800 stations. Temperature-related normals are reported for 7500 stations and precipitation normals are provided for 9300 stations, including 6400 that also have snowfall normals and 5300 that have normals of snow depth….”

If you’ve been reading denial sites, there’s no evidence for the “conspiracy” claims about hiding or excluding station information. None.

I just love it when glibertarians go all sciencey on us. It’s like watching the actors on CSI pronounce their lines phonetically! And then when they proceed to score about half a dozen own goals by citing predictions of climate science as mysterious puzzles…well it just doesn’t get any better.

I love the smell of Dunning-Kruger in the morning. Smells like victory!

Jason, there are three major ‘official’ datasets; each is described in the scientific literature, in detail. Yes, links are available for all, but I’m not going to chase them all down for you. For GISTEMP, I would start here:

I see links to multiple papers over the years, and I’m sure that some of your questions will be answered in those papers. But be prepared to spend some time.

I’ve seen similar links for NCDC data; I don’t specifically recall them for HADCRUT, but I bet they are out there.

A possible shortcut would be to check the index of climate-related data links at Tamino’s site, “Open Mind.” Just Google it.

I believe the data and methods for the unofficial but ambitious BEST project are available as well. And of course there are other initiatives, too, which have discussion of related issues in excruciatingly technical detail. Some of those should be searchable from Open Mind.

Finally, the core of all three ‘official’ data sets is the Global Historical Climate Network (GHCN). So a possible research shortcut would be to just read up on the GHCN criteria.

Re station selection – if you use Anthony Watts’ skeptical criteria and only select known good rural stations, thereby eliminating urban heat islands that are inflating global warming, you find that uhm, er, ah.., oh look, over there – it’s a bunch of hacked CRU emails!

15 Derek said, “The Whitehouse climate bet was simple and was indeed set by James Annan in 2007 that between 2007 and 2011 there would be no new record (1998 was the warmest using HadCRUT3). Since in HadCRUT4 no new record was set over the same period the outcome of the bet would be the same. What is it you don’t understand?”

Since 1998 was the year to beat, the fact that 2005 beat it is either irrelevant, as the year is outside the 2007–2011 window, or proof that the bet was won. To use it as a gotcha substitution for 1998 – well, that’s a demonstration of your character. Finally, since 2010 beats 2005 in HadCRUT4, your argument fails and seems to devolve into a lie.

So, yes, it’s simple. Even using your dishonest methods, 2010 beat 1998 AND 2005, and the bet was won. Why do you pretend otherwise?

47 Dez sez, “One obvious answer is that they work for the same / parallel organization in society.”

Have you ever met a scientist? Ever seen a few interact at work or a social gathering? Exploration, dissection and disagreement galore, but perhaps most telling is the caveats. Scientists are most interested in the times a statement is wrong. Keeping them from exposing the man behind the curtain would be like herding cats.

Honesty is pseudo-sacred in science. Peter Gleick was universally condemned even though his transgression was merely using social engineering to get the goods on what many believe to be a truly evil entity. You’re going to get that group of people to falsify data and not just write bad papers (which certainly would have to have flawed equations), but collude to accept them and only them in peer review?

Now, what is this parallel organization you speak of? We’ve got lots of universities and some governmental agencies, such as NASA, involved. Who “controls” them? Well, for eight years Bush was the big boss. His opinion was well-known. He said, “I read the report put out by the bureaucracy,” in response to an EPA report on global warming.

Think back to high school. Remember the nerdy kids who could pretty much write their own ticket to whatever college and career they wanted? (well, maybe not pro football or movie star) Some became doctors, some lawyers, and some entered the financial industry. They all expected to be paid very well for their efforts. But what of the nerdy kid who chose science? He knew going in that he’d probably never make more than a middle manager.

What does that tell you about his motivations? Your conclusion that such a person would commit crimes, shred professional ethics, and sell his soul for a few dollars paid to his employer, not himself (presumably in the hopes of a raise) – well, it just doesn’t stand up in my mind. Why not spend a couple years as a banker and retire instead?

But scientists are people, and some get corrupted. Which ones would be most susceptible? I propose that it would be the contrarians. Their money often comes from private sources, so it can be paid directly to the scientist without any silly rules attached, like requiring the funds to be accounted for as used for research as opposed to a vacation or car. This means they can be corrupted without breaking the law or violating too many ethical boundaries beyond the bad science itself. Perhaps Fred Singer is an example.

So, it seems like a witch hunt to me, with the appropriate(?) minor adjustments in bank accounts being the result. But what of the $1.6 million? Hansen is doing rather well for himself. But to attribute motivation, one has to look to the past. There was no money in climate science when Hansen switched to it ~1980, nor in 1988 when he gave his famous testimony. No, it is obvious that Hansen did and is doing what he thinks is right. Yep, $1.6 mil fell in his lap. Karma can be a satisfying concept.

Watts’ accusation is actually evidence you’re wrong, Dez. If Hansen were trying to please his employer, then why isn’t he being a team player? Rebelling against a permission form by not conscientiously filling in all the blanks, such as cost of provided travel! He even committed the ultimate sin, getting the boss’ boss upset. (you know, President Bush). And I’m sure it helps his career to use up his vacation time visiting various jails. Sorry, the guy just doesn’t sound money hungry or conformist or like someone who would deviate from his beliefs for fame or fortune.

Yep. That’s why satellite gravimetrics were used to make the worldwide loss of glacier mass appear uhmm, er, LESS than the field measurements of surface ablation? How could a worldwide conspiracy of scientists smart enough to completely dominate the peer-reviewed literature be so dumb? Must be a trick to make us think they don’t have a hippy liberal econazi sozialist plan for world domination.

Snarkasm aside, what the comparison shows is that surface ablation – loss of thickness due to warming – when converted to mass, results in a larger mass loss than the satellites show. The problem is not that glacier loss is slowing, as widely misreported in the denialsphere. Some of the surface melt water flows into cracks and pores in the glacier, staying where satellite gravimetry still senses its mass. Some soaks into local porous rock and soil, not moving far enough to be distinguishable by satellite. There are undoubtedly errors in the surface-density assumptions used to convert ablation to mass. But the glaciers aren’t melting more slowly.

I find statements that conflate global warming with the global temperature increase troublesome.

For me, global warming = positive radiative forcing by greenhouse gases etc. The global temperature increase is one manifestation of global warming.
Keeping these distinct would, I think, help communications, e.g. regarding claims that global warming has stopped.

[Response: You are fighting a losing battle on this one. Global warming defined as the increase in (long-term) global mean temperature is by far the most natural and sensible definition. Arguing for something else just looks like semantics. – gavin]

This is a matter of clarity of definition rather than semantics. The global temperature increase is exactly what it says it is.

Warming is a different concept which, as you well know, can be manifest in a range of parameters, e.g., a change of state.

To conflate these is to help lose the communication battle.

[Response: You have it exactly backwards. Redefining perfectly well understood terms (like warming) to be some specific, relatively obscure, technical term is only a recipe for confusion. And we don’t want that, right? – gavin]

Can someone give me a citation for the study that shows the warming expected of natural systems ex-CAGW? It seems to me that we should expect warming of some level in an interglacial period. I know the study has been done; I just don’t know the citation so I can read it.

2 Loss of land and ocean snow and ice: result, enhanced seasonal flooding, loss of vital water resources, and a range of environmental and economic damages.
3 Sea level rise: result, flooding of coastal zones including major cities, loss of low-lying fertile land, and a threat to the existence of small island states.

The latter two are not necessarily reflected in the global temperature record at all.

Therefore, to say global warming = global temperature increase alone (which I take to be your position) seems to miss two well-understood communication points.

Also, as the annual global temperature is inherently variable, it will inevitably give rise to spurious and confusing statements/headlines such as “global warming has stopped”.
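This is easy to demonstrate with synthetic data (the trend and noise levels below are invented, chosen merely to be roughly temperature-like): even when a series warms steadily by construction, a fair fraction of short windows will show a negative fitted slope.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(50)
trend = 0.017                                           # degC/yr, illustrative
temps = trend * years + 0.12 * rng.standard_normal(50)  # trend + annual noise

# Count 10-year windows whose fitted slope is negative, despite the
# real long-term warming trend built into the synthetic data.
neg = 0
windows = 0
for start in range(0, 41):
    w = temps[start:start + 10]
    slope = np.polyfit(np.arange(10), w, 1)[0]
    windows += 1
    neg += slope < 0

print(f"{neg}/{windows} ten-year windows show 'cooling'")
```

The full 50-year fit recovers the underlying trend; only the short windows are noise-dominated, which is exactly why “no warming since year X” claims keep appearing and keep being wrong.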

JimBrock @73 — At this time the orbital forcing is low, so without anthropogenic influences it is possible (in such an alternate universe) that this would be the descent into the next glacial. There is an earlier guest post by W.F. Ruddiman on this topic here at RealClimate.

Jim Brock @73, interglacial warming from peak Milankovitch insolation forcing and natural amplifying feedbacks peaked ~8000 years ago at what’s known as the Holocene Climate Optimum. Use that term as your search text string in Google Scholar to find the relevant literature. We’ve been in a long, slow decline ever since, with excursions both above and below the trend due to natural variability, of course.
See: http://www.globalwarmingart.com/images/b/bb/Holocene_Temperature_Variations_Rev.png

I have noticed a trend towards talking about 20th-century warming now being significant (is this an attempt to lower the bar for 21st-century warming?) and more emphasis on the trend over this period – when did anyone ever think there was a cooling trend?

As attribution work is apparently looking at longer timeframes:

a) What is the consensus view on when anthropogenic warming became detectable or significant in the temperature record (considering the long cool period in the mid-20th century and the CO2 curve)? Does ‘significant’ imply AGW began in 1900?

b) Accepting the normal period as previously defined (1961–90), does HadCRUT4 change the baseline?

c) Is the test period not more directly related to the models/predictions for the 21st century?