Thermal Hammer Part Deux

This is basically a continuation of the documentation of Roman’s improved method of combining temperature anomalies. First, there are several plots which show the simple “average the anomalies” method used universally in climate science, overlaid on the improved method.

Figure 1 - Two plots of global temperature anomaly. Red is the standard method used in climate science, and black is the improved method.

Note the slightly higher trend in the black method. This is created by the required offset corrections for incomplete temperature stations. Looking back, it’s kind of funny, I wouldn’t even calculate an Antarctic trend without the offsets. Nobody had to tell me, hey Jeff you should….. it’s obvious!! They are a conceptual necessity in calculation of a global trend. The more I consider it, the more idiotic it seems that climate ‘science’ doesn’t do it.
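To see why the offsets matter for incomplete stations, here is a minimal R sketch with two made-up stations that share the same true trend. The station values, the 0.02 C/yr trend and the 1980 start date are all assumptions for illustration, and the simple overlap-mean offset is a stand-in, not Roman’s full method.

```r
# Two hypothetical stations sharing the same true trend.
years <- 1950:2009
true_trend <- 0.02                          # assumed, degrees C per year
a <- 10 + true_trend * (years - 1950)       # full record
b <- 15 + true_trend * (years - 1950)       # warmer site, same trend
b[years < 1980] <- NA                       # incomplete station: starts in 1980

# Simple method: anomaly against each station's own mean, then average.
anom_a <- a - mean(a, na.rm = TRUE)
anom_b <- b - mean(b, na.rm = TRUE)
simple <- rowMeans(cbind(anom_a, anom_b), na.rm = TRUE)

# Offset method: shift B to match A over the overlap, then average.
overlap <- !is.na(a) & !is.na(b)
offset_b <- mean(a[overlap] - b[overlap])
combined <- rowMeans(cbind(a, b + offset_b), na.rm = TRUE)

# Fitted trends for the two combined series.
trend_simple <- unname(coef(lm(simple ~ years))[2])
trend_offset <- unname(coef(lm(combined ~ years))[2])
cat("simple:", trend_simple, " offset:", trend_offset, "\n")
```

With these made-up numbers the simple average picks up an artificial step down when the second station enters, so its fitted trend comes out lower than the offset version, which recovers the assumed trend.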

Below is the difference in trend according to these new methods. Please don’t pay too much attention to the confidence intervals b/c the data is filtered.

How cool is that! I mashed together global trends using my own standard method in Figure 3 and came within a few decimals of the pros. Like sea ice, this is not too bad.

Roman’s method (Figure 4) is so obviously better, as is the method I used for Antarctic trend, but they are not used by the pros. It’s slightly higher in trend, as we would expect.

It’s too much, it has to be said: the waministas who would trade their grandmas for a few tenths of a degree didn’t think of the most obvious and correct method to calc temp anomaly. I’m not even talking about Roman’s mastery of the subject but just about offsetting two anomalies to match each other. The method always increases trend, but it is correct!! Wow, in Mann08, I was convinced that the idiocy was fraud; this time, I think it’s just incompetence. It is like a whole new world has opened up.

Let’s leave my overactive (and over-vocal) thoughts for another post though. I promised the complete code last night and nobody called me out on it. I’ve attached everything except some plotting functions below. The plotting functions are not required for verification and they do add a lot of false complexity to the appearance. One of my goals is to get people to take a crack at using R and actually running this stuff. It’s not so heady that you can’t do it, and there is a ton to learn. If you need the plotting code, leave a comment.

#1, I tend to let these comments go to others with more opinion. Life is incredibly busy (far too) but my hope is that people realize that I always read.

There are a ton of things I could write on the meaning of temp rise. It would be cool to get someone like the Lurker to let it rip. IMO, it’s good to combine data in the best possible way, no matter the result or meaning. I mean it’s a pure goal. We don’t get a choice, nobody is forcing anything on anyone else. It’s just data. Cold as space and as predictable as addition. We can actually calculate a proper temperature anomaly from what appears to be weak data in need of serious QC.

When you say — does it matter.

The answer is yes, to modelers who believe they can predict the anomaly of the atmosphere. Consider that we have hundred million dollar computers calculating temperature models, and guys like Ben Santer comparing their output to temperature trend for accuracy. YET NOBODY BOTHERED TO OFFSET [snip] ANOMALIES!!!

Willis’s post has to do with the comparison between correlation, which (as in the Antarctic posts here) is highly dependent on short-term variance, and trend, which affects correlation hardly at all. It’s actually an odd post for him because he’s very good with numbers.

What this post, like the previous post, shows, is that there is no ‘hide the decline’ in the algorithms that create global temperature series. They are basically correct for the data that is being shoved into them. That means that those who would argue with global warming have got to focus on the data, b/c the trends presented are reasonable in a mathematical sense.

What this post and the previous also show, is that the boys who like warming, don’t have a clue what they are doing. It’s obvious and correct to offset anomalies when calculating trend. If you need the usage of a supercomputer to calculate your model, why the hell don’t you calculate anomaly offsets for trend on your laptop.

Geoff Sherrington said

Not wanting to sound downbeat, but to me the next logical step would be to look at the ways that various experts would handle gridding/interpolation etc. You sound like you could do with a holiday first, though. Even a Roman Holiday.

timetochooseagain said

Now there is just one final nut to crack: how to account for various sources of bias that might be present in the GHCN data. By which I mean, anything that might induce spurious or unrepresentative trends that hasn’t been accounted for.

That’s a tricky one.

Oh, quick question, have you thought about looking at Max/Min temps? I believe, correct me if I’m wrong, that GHCN lists those separately, in addition to the mean they get from them. A lot of us surface data skeptics think that Max temps would be more representative of the thermal state of the atmosphere and less influenced by local biases. The Diurnal range is also supposed to be a big signal in the data (of something, but people disagree on what).

Thanks, Jeff and Carrick,
I revived the savePlot commands and found my version of R insisted on type=”jpeg”, not jpg. That fixed, it ran for a long time, through all the gridcell trending (showing reasonable plots), but stopped at line 350 where “weight” had not been created.

Re: Nick Stokes (Mar 26 07:30),
I went to this post, and found this code for weights:
weight=sin(seq(2.5,180, 5)*(pi)/180) #cell weights
meanweight=mean(weight)
weight=weight/meanweight #normalize cell weights
That let me proceed. Objects hadcrut and glava were missing, but I could just skip those plots. Then the function calc.anom() was missing. I guess I could find that without too much trouble, but by now a whole lot of plots have appeared, and they look pretty reasonable. “Global Temperature Anomaly” has a super HS at the end – an artefact, but matches well otherwise. Fig 4 of the previous post seems identical, as do the hemisphere anomaly plots. The temp plots look much the same.
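For anyone wondering what those weights do, here is a short sketch. The sequence is the colatitude of the centre of each 5-degree band, and sin(colatitude) is the same thing as cos(latitude), so each band is weighted roughly by its area. The band anomalies and the gap below are made up for illustration.

```r
# Centres of 36 five-degree latitude bands, as colatitude in degrees.
colat <- seq(2.5, 180, 5)
weight <- sin(colat * pi / 180)      # proportional to band area
weight <- weight / mean(weight)      # normalize so the weights average to 1

# The same weights written in terms of latitude.
lat <- 90 - colat
weight_lat <- cos(lat * pi / 180)
weight_lat <- weight_lat / mean(weight_lat)

# Hypothetical band-mean anomalies with a gap near one pole.
band_anom <- rep(0.5, length(colat))
band_anom[1:3] <- NA
# weighted.mean() drops the NA bands and their weights together.
global_anom <- weighted.mean(band_anom, weight, na.rm = TRUE)
```

Dropping empty bands this way simply renormalizes over the bands that do have data; whether that is the right treatment for missing cells is a separate question.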

Bob H. said

It seems the processing of the data has been refined to the point where the results are reliable based on the input data. The problem, as I see it, is the input data has been badly corrupted through UHI effect and other seemingly random adjustments (New Zealand, Australia, etc.) and poor station siting (Watts’ station siting project) plus the station dropout since 1990.

Once these problems have been corrected from the raw data, perhaps we can then determine what the real numbers are. Until then, it’s GIGO.

Jeff, I really think you need a fixed anomaly base period here (eg 1961-1990). I described here the problem caused if you let it vary. You are subtracting an anomaly adjustment made up of fragments from different time periods, and trend in the data will appear as trend in the anomaly subtraction, unless you use a common period.

RomanM said

Nick, I don’t think that you understand how the method works. The anomaly is not calculated individually for the “fragments”, but is calculated for the finished series. The entire calculation prior to that point is based on an “anomaly basis” common to all of the series – it’s called “0 degrees C”.

One of the things that clued me into the fact that this may be a good approach was the property in the original equivalent SS that Tamino proposed to minimize. Although it may not be immediately apparent from my formulation (but can be proved reasonably easily), the offsets are chosen in a way which also minimizes paired differences between offsetted values at the same time of measurement. If there is a trend common to all the series present in the temperatures, it is removed by the differencing and should not mathematically affect the calculation of the offsets. The trend is the mean of the offsetted values. All of this, however, is done simultaneously.
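A rough way to see the simultaneous fit in R is below. This is only a sketch under simplifying assumptions: one offset per station (the actual method uses monthly offsets), made-up station offsets, a made-up common trend, and a plain lm() fit rather than the code used for the post.

```r
set.seed(1)
n_time <- 40
n_stn <- 3
time <- seq_len(n_time)
true_offset <- c(0, 4, -3)            # hypothetical station offsets, degrees C
common <- 0.05 * time                 # hypothetical trend shared by all stations

# Long-format data: one row per station and time step.
dat <- expand.grid(time = time, stn = seq_len(n_stn))
dat$temp <- common[dat$time] + true_offset[dat$stn] + rnorm(nrow(dat), sd = 0.1)

# Make the records incomplete: station 2 starts late, station 3 ends early.
dat <- dat[!(dat$stn == 2 & dat$time <= 15) & !(dat$stn == 3 & dat$time > 30), ]

# Fit one common value per time step plus one offset per station, all at once.
fit <- lm(temp ~ 0 + factor(time) + factor(stn), data = dat)
cf <- coef(fit)

# Station 1 is the reference level, so its offset is zero by construction.
est_offset <- c(0, unname(cf[grep("^factor\\(stn\\)", names(cf))]))

# The combined series is the set of fitted time effects; its slope is the trend.
combined <- unname(cf[grep("^factor\\(time\\)", names(cf))])
slope <- unname(coef(lm(combined ~ time))[2])
```

In this toy setup the estimated offsets land close to the assumed ones even though the stations cover different periods and share a trend, which is the behaviour described above.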

Show us that this is happening. If you have math to back up what you say (or at least some sort of example illustrating the effect of what you refer to), I will be happy to consider what you say. Otherwise, this is arm waving not mathematics.

#19, I may have misunderstood but I thought Nick was referring to the standard no-offset method compared to yours in the post above. It could be the reason that I’m about a tenth of a degree lower in trend than the published version.

Zeke Hausfather said

Interesting work Jeff. Looks like you are ready to dive into segmenting and slicing up the stations to see what results you get from various subsets. I’ve been working on something along those lines over at Lucia’s place using satellite nightlights, population density, and urban boundary data as proxies for urbanity, with the assumption that rural places were most likely rural in the past so UHI effects will be observable in the difference between trends in adjacent urban and rural stations. Ron Broberg over at the Whiteboard has done some good work improving the GHCN station metadata by referencing it to various spatial datasets.

Checking my model quickly, it looks like I get a 0.258 C per decade trend, pretty close to what you get. For reference, GISS land temps show 0.190 C for that period and NCDC land temps show 0.294. I’m somewhat impressed how far apart they are.

Re: RomanM (Mar 26 16:00),
Roman, it’s not an issue just for fragments. It applies when combining any groupings – stations, gridcells etc. If the data covers different periods, and the anomalies are calculated with respect to those periods, then the means that you subtract have a trend. This is a far bigger effect than the artificial seasonal effect that you identified.

I don’t know if you are reading my comments to you at Lucia’s or above but there are two issues Roman solved. One is the seasonal ripple, the second is to realign anomalies by offsetting them to remove steps.

Re: Jeff Id (Mar 26 15:35), “What should I do with series that don’t have data for that time period?”
That’s the big issue with all anomaly gridding methods. It’s why GISS compute anomalies for grid cells rather than stations, because the grid cell is more likely to have some data for the period.

My own non-standard suggestion is to do a regression over some period which is as close as possible to 1961-90 but has enough data, and then use the end-1975 fitted value as the mean to subtract.

I managed to get your program to run to completion using the following calc.anom function, which I think implements your intent:
calc.anom = function(tsdat) {
  anom = tsdat
  for (i in 1:12) {
    sequ = seq(i, length(tsdat), by = 12)  # positions of calendar month i
    anom[sequ] = tsdat[sequ] - mean(tsdat[sequ])  # subtract that month's mean
  }
  anom
}
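One thing to watch with a function like this is missing months: mean() returns NA if any value in the month’s sequence is NA, which would wipe out that whole calendar month. Below is a hedged variant that skips missing values. The name calc_anom_na and the synthetic test series are my own, not part of the original script.

```r
# Same monthly-mean removal as calc.anom, but tolerant of missing months.
# calc_anom_na is a hypothetical name used only for this sketch.
calc_anom_na <- function(tsdat) {
  anom <- tsdat
  for (i in 1:12) {
    sequ <- seq(i, length(tsdat), by = 12)   # positions of calendar month i
    anom[sequ] <- tsdat[sequ] - mean(tsdat[sequ], na.rm = TRUE)
  }
  anom
}

# Ten years of synthetic monthly data: seasonal cycle plus a small trend.
months <- 1:120
temps <- 10 + 8 * sin(2 * pi * months / 12) + 0.002 * months
temps[c(5, 40)] <- NA                        # two missing months
anom <- calc_anom_na(temps)
```

Only the two missing months stay NA in the result; every other value has its own calendar month’s mean removed.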

Re: RomanM (Mar 26 17:45),
Yes, OK, I’ve gone through the math and I now agree that with the simultaneous fitting of offsets and combined function (as with your method and Tamino’s), you don’t need a fixed base period to avoid spurious trend effects. However, the cost is that the offsets will vary as the data period varies. If new data comes in, all the past offsets change. GISS publishes gridded anomalies for 1978, say, and they would change with 2010 data. You need a fixed period to avoid that.

The objection to the varying base period still applies to the implementation of the “standard non-offset anomaly” which is presented here for comparison. There the fixed common period is needed, and is part of the “standard”. So a comparison without it is meaningless.

Jim said

What if you plotted two charts – one for the grids with data in the same latitude around the globe and one for grids in the same longitude line around the globe. Smooth both graphs. For a grid with no data, use the average of the two (monthly) smoothed graphs at that grid. This would give an estimate based on the shape of the two temperature curves for the entire circumference of the Earth.

Carrick said

I also agree that Nick’s right and the normal method needs to be done over a base period. Roman’s method doesn’t use anomaly. The whole thing is compiled as seasonal data before the final trend was anomalized using a method employing Roman’s improved trend calc.

Geoff Sherrington said

I had to OCR from a poor photocopy given to me by the BoM. Sorry about the quality. If you need equations repeated I’ll JPEG them.

The point is that the 1200 km radius correction is invoked. The problem is that I do not know if it is invoked in the global data you have been using above. I suspect it is, so there is a double correction.

I seem to be due to pay some indulgences before the BoM will talk to me again because a few years ago I mentioned that global temperatures might not increase. If one of you has a better connection, you might be able to find precisely what adjustments the BoM has made to the data (and over what periods) that you are using in the above valuable global example.

Geoff Sherrington said

Nick Stokes “It’s why GISS compute anomalies for grid cells rather than stations, because the grid cell is more likely to have some data for the period”

This is a judgement call, because there is no guarantee that an estimate made from other stations in a grid cell is any better or worse than a straight guess of a missing value for a station, or an average of data on either side.

One day we might get to a stage where we can simply leave all missing data out of global calculations, do no guesses or interpolations, then calculate a difference to see if it is significant. First we have to find a reliable series to compare the difference. It might be that the effects of all missing values sum to nothingness – they might be equally distributed over seasons, latitudes, altitudes, have cancelling TOBS adjustments and so on. Maybe this present work is a large investment in finding ways to overcome missing data and I hope that it allows an examination of whether missing data matter or not. I am forever uncomfortable about assigning values to missing data because all methods are guesses, by definition.

I tried my short cut method of finding an anomaly by fitting a regression and using a fixed year value in place of the mean. I tested the 1978-2009 period, and used 1994 as the reference year, being a sort of mid-point. This required just a change to calc.anom(). Using the original calc.anom() I got a global trend of 0.211 C/Dec (cf Jeff’s 0.2107). Using the regression I got 0.2217 C/Dec, which is moving towards agreement with Roman’s method (0.2476), but not very far. However, I found that there was more dependence on the choice of year than I liked. Using 1975 brought the trend back to 0.2156 C/Dec.

RomanM said

There appear to be some R technical problems with your script. My computer hiccupped on several of the lines.

It appears that you are regressing the raw data on a monthly basis over the entire range and zeroing the intercept at the given year. I am not sure that the programme actually does that correctly since the results I was getting on a sequence of 240 normal variates looked a little weird.

You are using different slopes for each month and I don’t know that that is desirable. The other possible difficulty I perceive with this is the assumption of linearity inherent in the process. I am having some trouble visualizing exactly what anom actually represents at the end of this entire calculation.

RomanM said

Same for Roman’s just so prior years are stabilized against future data additions.

Yes, you have a point here.

I ran an example with four series with reasonable amounts of data. I did four runs cutting the data at 1979, 1989, 1999, and 2009, respectively. Differences among the four resulting combined series were less than a maximum .05 in more recent years (with most much closer to zero), however some reached .15 a century back when only one station was contributing.

The reason is that the offsets may change slightly as more information becomes available for comparing the input series. Anomalizing the combined series does not change the situation substantially. However, if the entire period is not used, the problem of some series not being usable any more becomes an issue. As well, the uncertainty of the combined series also is increased. Some thought needs to be put into finding procedures regarding these stabilization issues.

Carrick said

RomanM, when you add more data there are two effects, one is geometric, namely the centroid of the data shifts forward in time. Subtracting the value for a constant baseline (e.g., 1960-1980) just “freezes” the baseline so it doesn’t drift as you add more data.

The second effect is a statistical one, and is a “real” effect: As you add more data, you are using more information, and that obviously influences your estimates of the offsets.

Re: RomanM (Mar 28 12:40), “You are using different slopes for each month and I don’t know that that is desirable.”
Yes, it’s a short cut to emulate the effect of a fixed base period without the difficulties that ensue when there are missing values there. It’s for comparison, not claimed to be better than your use of a common trend. The idea is that instead of subtracting the mean of each grid set, you fit a line and subtract its value in a fixed year.
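For readers following along, the idea can be sketched roughly as below. This is not the code that was pasted; calc_anom_reg, its arguments, and the synthetic 1978-2009 series are assumptions for illustration. It fits a separate line for each calendar month and subtracts the fitted value at the reference year.

```r
# Hypothetical helper: per-calendar-month regression anomaly.
# 'tsdat' is a monthly vector assumed to start in January of 'start_year'.
calc_anom_reg <- function(tsdat, start_year, ref_year) {
  anom <- tsdat
  year <- start_year + (seq_along(tsdat) - 1) %/% 12
  for (i in 1:12) {
    sequ <- seq(i, length(tsdat), by = 12)          # positions of calendar month i
    d <- data.frame(y = tsdat[sequ], yr = year[sequ])
    fit <- lm(y ~ yr, data = d)                     # NA rows dropped under the default na.omit
    ref <- unname(predict(fit, newdata = data.frame(yr = ref_year)))
    anom[sequ] <- tsdat[sequ] - ref                 # subtract the fitted value at ref_year
  }
  anom
}

# Synthetic monthly data for 1978-2009: seasonal cycle plus 0.02 C/yr.
n <- 12 * 32
year <- 1978 + (seq_len(n) - 1) %/% 12
temps <- 10 + 8 * sin(2 * pi * seq_len(n) / 12) + 0.02 * (year - 1978)
temps[3] <- NA                                      # one missing month
anom <- calc_anom_reg(temps, start_year = 1978, ref_year = 1994)
```

On exactly linear synthetic data the anomaly is zero throughout the reference year. Real data are noisy, so the fitted value at the reference year carries regression error, which may be one reason the result is sensitive to the year chosen.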

I’ve pasted the code as it ran here. I have been trying (unsuccessfully) to locate a technical problem because, as I mentioned, it is more sensitive to the choice of base year than I think it should be. Could you be using a different na.action.default for lm()? I’m using the default na.omit.

Geoff Sherrington said

Sorry for the junk transmission in my 35. I’ll tidy it up. Different browsers/hosters can have subtle (to me) differences and sometimes I get it wrong the first time.

The essence is whether countries like Australia have already made adjustments of the type you incorporate, before you get the data. I simply do not know the answer. The important graphs from above for visual purposes are

Re use of correlation coefficients, one gets a different answer using daily Tmax over a short period, to annual Tmax over a long period, for parts of the same pairs in a group, and a different answer again after smoothing. How do you decide the fundamentally correct calculation? It points to correlation coefficients changing as instruments change, since some of them can do smoothing like daily spike rejection.

Carrick said

The missing data and the adjustments are the bits of fudge where more certainty gets assigned than is warranted.

I’ll take TOBS for example. If you have 60 measurements per month and your instrument error is like that assumed by Jones, then your monthly estimate is good to .03C 1sd.

So a monthly temp (say March) is say 10C +-.1C at a 95% CI.

Now you find out that this 10C was taken at 6AM as opposed to midnight, the standard time.

What’s a poor climate scientist to do?

Well, build a TOBS model. This model is empirical. They look at years of hourly data from stations. They note the month, the lat, the lon, the position of the sun. Then they do a regression. This regression is a prediction. The TOBS regression says this:

F(lat,lon,month, 6AM) = -.5C. In most cases the prediction has an SE of between .1 and .2C. It all depends.

So now, how well do you know that march temperature?.. The estimate comes out to 9.5C.. what about the error?

I imagine that it’s vanishingly small. Still, I’d like to see how it’s treated. Same for the MMTS adjustments.
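One simple way it could be treated, purely as a back-of-envelope sketch, is to assume the measurement error and the TOBS regression error are independent and add them in quadrature. The 0.15 below is just a mid-range pick from the .1 to .2 SE quoted above, and independence is an assumption, not something established in this thread.

```r
se_month <- 0.03     # 1-sd monthly measurement error quoted above, degrees C
se_tobs  <- 0.15     # assumed mid-range SE of the TOBS regression prediction
raw      <- 10       # example March mean, degrees C
tobs_adj <- -0.5     # example TOBS adjustment, degrees C

adjusted <- raw + tobs_adj
# Independent errors add in quadrature.
se_total <- sqrt(se_month^2 + se_tobs^2)
# Approximate 95% interval, assuming normal errors.
ci95 <- adjusted + c(-1, 1) * qnorm(0.975) * se_total
cat("adjusted:", adjusted, " 1-sd:", round(se_total, 3),
    " 95% CI:", round(ci95, 2), "\n")
```

Under those assumptions the combined 1-sd comes out slightly above the regression SE, so the adjustment term rather than the instrument term would dominate the error budget.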

Zeke Hausfather said

Cross-posting this from Lucia’s comment thread, since I’m not sure you are actively reading it. Would it be possible to obtain annual anomalies 1900-present for this method? I’m planning on learning R myself in the near future, but at the moment I’m looking to put together all the various surface temp reconstructions floating around (as well as GISS/NCDC/CRUT land temps) on a chart.

If you go to my post at 35 above, the whole lot is hyperlinked by accident so you can read most of the text. The figures were dropped out in the OCR/rename/post processes. Let me know if you need figs or equations. Geoff.

Geoff Sherrington said

44 Steven Mosher I’m surprised how hard it is to discover if a country submitted data to the CRUs of the world in raw, averaged, adjusted or whatever form, over what periods and also whether this has been upgraded as the info is constantly being improved by people digging back through the original meta data sheets and making revisions to the home country data. My worry is that better statistical methods for assembling global data are doubling up on some adjustments without the authors knowing that they do not have truly “raw” data. I have an email from one country authority asserting that their data are “raw” when in fact they are rather modified.

Without making a long, involved comment on the INSANITY of “average temperature”. (Sorry, it’s patent nonsense.) If one wishes to think there is some significance in it…let’s note this, the rise (if real, I have HUGE trouble with the QA of the data, which I’ll get to in a moment)…is “linear” and “consistent” from 1900 on. When one uses the Mauna Loa CO2 data as a baseline, and the allegations with regard to CO2’s “heat trapping” (which I also debate..yes, I’m an Elasser/Milkosky radical…CO2 is a NET EXCHANGE AGENT in the troposphere, and makes no difference until above 2500 PPM!) there is obviously a “cause/effect” problem.

This heads us towards asking what causes overall cycles, taking “man” out of the question.

With regard to DATA QUALITY. WHEN, just WHEN does anyone think we transitioned from “hand recorded”, “guess at the high or low” data…and “rotating drum” recorders?????

Does ANYONE REMEMBER that there were NO satellites in 1900? 1920, 1950, even up to 1980??? Does anyone have a CONCEPT that the “hand recorded data” has virtually NO VALUE for 0.1 C values? (.2 F?)….Does anyone understand GARBAGE DATA IN GARBAGE ANALYSIS OUT?

The reason for the gradual rise may very much be due to a systematic data bias error due to changing technology.

I see NO error bars. I see NO attempt to quantify error. (Although BEST does have that. I will give them credit for that, but ZERO credit for the great “Average temperature” flaw.!!!!)

I hope and pray that some day, some BRIGHT INTELLIGENT people will Wake UP! And realize the futility of all this “average temperature” nonsense.

The ONLY thing that you can do with temperature averages is look at a LOCAL REGION, and see if there are “seasonal shifts” going on.

I’ve done that for MN. AND AFTER REMOVING THE UHI from the Minneapolis/St. Paul area, and “guessing” at the “error bars” for the prior to 1920 data, I cannot (beyond statistical doubt, STANDARD DEVIATION ANYONE?) say that since the first temperature record (Fort Snelling, Minneapolis MN) there has been any SIGNIFICANT change in the overall weather/temperature profiles in MN. (Barring the fact that 1820-1840 was the end of the “mini-ice age” and does show early winters and late springs. And averages somewhat lower than the following 40 years, but then 1880 to 1920 pans out pretty close in histograms and trends to the 1820 to 1840 realm. Again, comparisons made by HISTOGRAMS in terms of “degree days”, and NOT by strictly averaging temperatures…insanity.)