Un-Muddying the Waters

A few days ago, in a comment at CA, NASA blogger Gavin Schmidt stated that, over land, the ratio of lower troposphere trends to surface trends in the GISS ER model was 0.95 (+- 0.07) and thus there was no downscaling over land of tropospheric trends.

Schmidt linked to a November 2009 realclimate post, “Muddying the Peer Reviewed Literature”, which criticized Klotzbach et al 2009 for using an amplification factor of 1.2 over both land and ocean in their similar downscaling of satellite trends to surface for comparison to surface trends. (The realclimate post was published just before Climategate and not previously discussed here.)

Schmidt waxed eloquent against the Klotzbach et al article for being “heavily promoted”, containing “fundamental flaws” and wasting “a huge amount of everyone’s time”. One feels that this criticism would be more aptly directed at (say) Mann et al 2008, which “heavily promoted” its supposedly groundbreaking no-dendro reconstruction using upside-down and contaminated Tiljander sediment, with a “huge amount of everyone’s time” wasted in dealing with the seemingly wilful obtuseness of both Schmidt and Mann in failing to retract or amend the original article and realclimate posts. But that’s another story.

Schmidt’s 2009 realclimate article also reported that the overall GISS ER amplification factor was 1.25 and, over ocean, was 1.4, describing the calculation as follows:

First I calculated the land-only, ocean-only and global mean temperatures and MSU-LT values for 5 ensemble members, then I looked at the trends in each of these timeseries and calculated the ratios. Interestingly, there was a significant difference between the three ratios. In the global mean, the ratio was 1.25 as expected and completely in line with the results from other models. Over the oceans where most tropical moist convection occurs, the amplification in the model is greater – about a factor of 1.4. However, over land, where there is not very much moist convection, which is not dominated by the tropics and where one expects surface trends to be greater than for the oceans, there was no amplification at all!

The land-only ‘amplification’ factor was actually close to 0.95 (+/-0.07, 95% uncertainty in an individual simulation arising from fitting a linear trend), implying that you should be expecting that land surface temperatures to rise (slightly) faster than the satellite values.

In response to Schmidt’s criticism, Klotzbach et al re-did their calculations in a correction in which, instead of an overall amplification factor of 1.2 applied to both land and ocean, they used an amplification factor of 1.1 over land and 1.6 over ocean. (Re-reading Schmidt’s original post, one becomes aware of what Schmidt didn’t mention: he had drawn attention to and taken umbrage at Klotzbach’s too-high amplification factor over land, but not at the too-low amplification factor over ocean.)

Klotzbach et al (2010) stated that they got similar results to the original article with these corrected amplification factors and did not retract the article as Schmidt had called for.

The discrepancy between the amplification factors used in Klotzbach et al 2010 and those reported by Schmidt at realclimate in November 2009 is an inconsistency that, purely as an empirical point, ought to be resolvable by trivial calculations on GISS ER runs.

Fresh Calculations
I’ve re-calculated tropospheric and surface trends over land, ocean and combined for the GISS models (including GISS ER) using Chad Herman’s excellent collation of A1B models (including GISS) that I placed online last week. Chad took care with land-ocean distribution and, for the GISS runs, used the GISS land mask (see Chad’s documentation at climateaudit.info/data/chadh/src.rar).

The script for my calculations follows. These calculations closely replicate the amended values reported by Klotzbach et al 2010. (Note: see here for a discussion of these calculations, done by Ross McKitrick. I either wasn’t aware of this exchange or lost track of it during Climategate.)

For the period 1979-2008 (used by Klotzbach et al), I obtained an average GISS ER amplification factor over land of 1.10, exactly matching the Klotzbach et al 2010 results and outside the confidence intervals reported by Schmidt in November 2009 and reiterated in his CA comment last week (0.95 +- 0.07). I experimented with other trend periods to try to match the Schmidt results and was unsuccessful. For the period 1979-2002 (which seems to have been used in Schmidt 2009), I got 1.057. For the A1B period 2010-2040, as an experiment, I got 1.107.
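The amplification factor here is simply the ratio of two OLS trends. A minimal sketch of the calculation (Python for illustration rather than the R of my script, and with made-up annual series rather than actual GISS ER output):

```python
import numpy as np

# Synthetic annual anomalies over a Klotzbach-style 1979-2008 window.
# The trend values (0.20 K/decade surface, 0.22 K/decade troposphere) are
# made up for illustration; they are not GISS ER output.
years = np.arange(1979, 2009)
rng = np.random.default_rng(0)
t = years - years[0]
tas = 0.020 * t + rng.normal(0, 0.05, years.size)  # surface series
tlt = 0.022 * t + rng.normal(0, 0.05, years.size)  # tropospheric series

# OLS trend of each series against time, then the ratio of the two slopes.
trend_tas = np.polyfit(years, tas, 1)[0]
trend_tlt = np.polyfit(years, tlt, 1)[0]
amplification = trend_tlt / trend_tas  # ~1.1 by construction, within noise
print(round(amplification, 2))
```

The point is structural: time is the regressor in both fits, and the amplification factor is the quotient of the two fitted slopes.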

Nor does the discrepancy appear to arise from differences in gridcell allocation between land and ocean, since discrepancies in the same direction arise over ocean, where Schmidt reported an amplification factor of 1.4 (Klotzbach’s amended value was 1.6). I got 1.56 for the period 1979-2008 used by Klotzbach and 1.59 for the 1979-2002 period used by Schmidt, values that were close to those of Klotzbach et al 2010 and considerably higher than those reported by Schmidt.

On a combined basis, Schmidt reported an amplification factor of 1.25, while I obtained a value of 1.36 for the 1979-2008 period (1979-2002: 1.345).

In summary, none of Schmidt’s results (land – 0.95; ocean – 1.4; combined – 1.25) were reproduced despite considerable care in the calculation. It would be nice to see the precise provenance of Schmidt’s results.

Update Nov 8: Gavin Schmidt explained in a comment below that his result was calculated as follows:

Secondly, I calculated the amplification factors via linear regression of the annual anomalies of MSU-LT and SAT (1979-2005; GISS-ER only). You are calculating the ratio of two OLS trends.

There were two reasons why I calculated the ratio of the two OLS trends. It is what Gavin had said that he had done in the realclimate post. And it’s the right way to do the calculation. From Gavin’s comment, it seems that he regressed TAS~TLT or TLT~TAS. By experimenting, it seems evident that he regressed TAS~TLT as this yields “amplification factors” in the reported range. However, this is an incorrect approach to the problem.

First, extending the script below, here is evidence that Gavin got his 0.95 through TAS~TLT regression:

As discussed in comments, this is not the correct approach to the problem. The correct approach was the straightforward one said to have been used: the ratio of the trends. This can be seen in the diagram shown below where the blue (“pseudo-TLT”) series goes up at a higher trend (0.1 units/year) than the green (“pseudo-TAS”) series (0.05 units/year) – an “amplification factor” of 2, but has a lower variability (both have the same rnorm error, but the blue series has an amplitude of 0.25 of the green series).

If you take the trends of each series and divide, you get a value of around 2 (1.96 in the example that I did.) However, if you regress “TAS” against “TLT”, you get a coefficient of 0.598. If you regress “TLT” against “TAS”, you get a value of 0.91. Neither value is remotely close to the true (by construction) value of 2. Actually, it seems to me that the reciprocal of the coefficient from Schmidt’s regression of TAS~TLT would be less bad than the coefficient that he seems to have used. In this case, it would give a value of 1/.598 = 1.672, much closer to the true value than the estimate from Schmidt’s method. In the original case, the reciprocal of Schmidt’s value of 0.95 gives a value of 1.05, which is less than the actual ratio of trends, but again less bad.

But it’s easy enough to calculate the ratio of trends. I think that this small problem can now be said to be resolved.
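The toy experiment above can be reconstructed in a few lines (Python here for illustration; the exact coefficients depend on the particular noise draw, so the 0.598 and 0.91 figures will not be matched exactly):

```python
import numpy as np

# Pseudo-TLT trends up at 0.1 units/year, pseudo-TAS at 0.05 units/year,
# with the same rnorm-style noise draw scaled to 0.25 amplitude on TLT
# (the construction described in the text; numbers vary with the seed).
rng = np.random.default_rng(1)
t = np.arange(30.0)
noise = rng.normal(0, 0.25, t.size)
pseudo_tas = 0.05 * t + noise          # "green" series: lower trend, bigger noise
pseudo_tlt = 0.10 * t + 0.25 * noise   # "blue" series: higher trend, smaller noise

# Correct approach: ratio of the two OLS trends against time (~2 by construction).
ratio = np.polyfit(t, pseudo_tlt, 1)[0] / np.polyfit(t, pseudo_tas, 1)[0]

# Schmidt-style regression of one series on the other: time never enters,
# and the coefficient is attenuated well below the true factor of 2.
beta_tas_on_tlt = np.polyfit(pseudo_tlt, pseudo_tas, 1)[0]
beta_tlt_on_tas = np.polyfit(pseudo_tas, pseudo_tlt, 1)[0]
print(ratio, beta_tas_on_tlt, beta_tlt_on_tas)
```

Running this with different seeds moves the individual numbers around, but the pattern is stable: the ratio of trends recovers roughly 2, while neither series-on-series regression coefficient comes close.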

The satellite data seem very dubious. Old satellites have not been matched with new, algorithms may not be the same, and slopes are certainly different.
TSI measurements with ACRIM require continual matching between satellites and over time – see Leif Svalgaard’s data.

Secondly, I calculated the amplification factors via linear regression of the annual anomalies of MSU-LT and SAT (1979-2005; GISS-ER only). You are calculating the ratio of two OLS trends. My error bar was the 95% confidence interval on the regression on a single run, what are the error bars in your calculation?

Steve: In your realclimate post, you stated: “I looked at the trends in each of these timeseries and calculated the ratios.” As you observe, in my calculations, I calculated the ratio of the tropospheric trend and the surface trend – my interpretation of the language in the realclimate post. I take it from your present comment that you regressed the TLT data against the TAS data. I interpret this as a different procedure than the one described in the realclimate post. Be that as it may, this new information will probably assist reconciliation.

Third, I used the MSU-LT calculated by the model itself, rather than afterward via the vertical temperature structure – there will be some variation there related to vertical interpolations and (perhaps) surface conditions. I doubt that is particularly significant though, but it might be worth checking.

Finally, I don’t know exactly what runs you downloaded or how you spliced the 20thC runs to the A1B projections, so given that 2005 was the last year of the GISS-ER 20th C runs, let’s stick to 1979-2005 to reduce the number of complications.

Steve: the 5 runs at PCMDI.

The data I used are online at ftp://ftp.giss.nasa.gov/outgoing/sat_msu_1979-2005.tar.gz – all data (including ensemble mean and topography file) are also in GISS fortran binary format.
Steve: “GISS fortran binary format” may mean a lot to you but is not a format that I am familiar with. With the help of CA readers, I’ve parsed some GISS binary files in the past, but it would be more useful if you archived data in a more modern format or at least gave some information on the binary setup.

Unfortunately, I don’t have R code, only Matlab. I’ll post Matlab code separately (in case it doesn’t format well), and will leave it to someone else to code an R file reader, working from the above description or from the Matlab.

Oddly enough, adding the four arrays (ocean, lake, ground, glaciers) doesn’t produce a value of 1.0 at all grid cells as I expected. Some cells are a little high, some a little low. Continental coasts tend to be low. Caspian Sea (?) is high.

I am sure he will be forthcoming with the templates, and perhaps even the code. After all, his reputation depends on it, and I know he wants to be thought highly of. A link to the correct templates and perhaps the code for the file interface would do.

If he doesn’t come through with at least the templates and perhaps the code he may as well have handed out encrypted data. Nobody wants to be seen behaving in that manner — especially the head of a climate program.

People who work in some (most? many?) forms of engineering use Matlab and FORTRAN for a reason. When I wrote network/nodal analysis programs I did not even consider C or Pascal. I might have considered Pascal if it had the complex number package then that it does now. Now C++ has a “Complex Number” package – though if you are used to FORTRAN, why bother switching?

This discussion of file formats borders on religious arguments akin to discussions of angels dancing on the head of a pin. If you have the file formats this is not a big deal — you can write a simple data interchange program and upload the file to a SQL server database engine — then you only need add new data when it becomes available.

If someone handed you an Oracle/SQL Server/Progress/Interbase file, would you complain? Or would you just load the appropriate database, or find someone who could put it in the format you want?

All this is about is the data structure. There is still lots of FORTRAN talent about to do a simple translator…

And Dr. Schmidt will provide, as he is a scholar and a gentleman and we say please and thank you – like all proper folk do…

“Secondly, I calculated the amplification factors via linear regression of the annual anomalies of MSU-LT and SAT (1979-2005; GISS-ER only). You are calculating the ratio of two OLS trends.”

If I’m understanding your procedure correctly, it seems to be estimating something different – that is, the “amplification” in interannual variability, rather than the trend amplification relevant in Klotzbach et al. I’m not sure if these different amplifications are expected to be similar in GISS-ER (although it appears not, based on discrepancies between the two methods), but certainly they are different in measured surface temperatures vs. TLT temperatures.

In my opinion, (as on other occasions), Troyca is right on the money. If you want the ratio of two trends, then you do precisely that: calculate the ratio of two trends.

If you regress TAS against TLT, you get numbers that closely replicate Gavin’s 0.95. The difference does not arise from exotica in the data sets. It arises from the structure of the calculation.

Gavin’s calculation, as Troy observes, doesn’t measure the ratio of the trends. It measures something different – as can be readily confirmed by thinking about the matrix algebra for the regression coefficients: in the TAS~TLT regression, time doesn’t even enter into the matrix algebra. I have no issue with the calculation described in the original blog post – only with the implementation.

It seems to me that we may have resolved a small issue: Gavin was justified in criticizing the original Klotzbach calculations (and my post), but his own calculations also appear to be erroneous. The revised amplification factors in Klotzbach et al 2010 (responding to Gavin’s original criticism) look properly calculated to me.

While Gavin’s method is not calculating the ratio of trends, I’m not sure that Klotzbach did that properly either. While I’m unfamiliar with Stata, which Dr. McKitrick used, it appears that he calculated the amplification ratio [sat trend / surface trend] in each gridcell, and then computed the area-weighted mean of the ratios over those cells labelled as ocean, and again over those not labelled as ocean. The results were 1.602 and 1.106 respectively.

I think a better calculation, from the perspective of the Klotzbach paper at least, would be to compute the average trend over ocean cells (resp., non-ocean) for satellite & surface, and then take the ratio. Using the values in the provided .csv files, I obtain 1.549 & 1.052 respectively.

Note that the new TOPO file which Gavin provided has ocean fractions and land fractions for each gridcell, so one can use a more subtle weighting than all-or-none. It’s not clear whether this method is superior to selecting all-ocean or all-land cells, but I tried it anyway, computing area-weighted trends over land for the six runs in Gavin’s .zip file. The satellite-to-surface trend ratios for the six runs varied from 1.006 to 1.136. The ratio of average trends (0.270 K/decade sat; 0.258 K/decade surface) came to 1.044. The last would be the figure comparable to 1.052 obtained above from the McKitrick dataset. The area-weighted ocean amplification from the above method comes to 1.516.
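The difference between the two averaging orders is easy to see in a small sketch (made-up gridcell trends and area weights for illustration, not the actual .csv values): averaging per-cell ratios overweights cells with small surface trends, whereas the ratio of area-weighted mean trends does not.

```python
import numpy as np

# Hypothetical per-gridcell decadal trends and area weights (illustration only).
sat_trend  = np.array([0.30, 0.10, 0.25, 0.02])  # K/decade, satellite
surf_trend = np.array([0.25, 0.05, 0.24, 0.01])  # K/decade, surface
weight     = np.array([0.40, 0.20, 0.30, 0.10])  # gridcell area weights

# Method 1: area-weighted mean of per-cell ratios (McKitrick-style).
mean_of_ratios = np.sum(weight * sat_trend / surf_trend) / np.sum(weight)

# Method 2: ratio of area-weighted mean trends (troyca's suggestion).
ratio_of_means = np.sum(weight * sat_trend) / np.sum(weight * surf_trend)

print(mean_of_ratios, ratio_of_means)
```

With these numbers the mean of ratios comes out noticeably higher than the ratio of means, because the two cells with tiny surface trends contribute ratios of 2.0 despite carrying little of the total trend signal.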

I agree with your viewpoint on a more correct procedure and (implicitly) used that procedure in my own calculations, that yielded numbers that I reported (which were similar to the K2010 numbers). I’m relying on my own calculations rather than the K2010 calculations.

In these calculations, I relied on Chad’s collation which in turn used the PCMDI archive. Gavin has pointed to differences in the various versions and these might contribute to the slight discrepancy between the numbers that you got and the numbers that I got.

Regardless, I think that we can all agree that Gavin’s methodology for his calculation was incorrect.

Whether any of these numbers are good estimates of the “true” amplification factor is another question again.

I don’t have time to write this up right now as I’m going to be away for a couple of days, but the retrieval scripts provided above permit the comparison of GISS ER amplification factors to other models. Amplification factors for GISS ER over ocean, over land and over land-and-ocean are in the top quartile among models. The possibility of the GISS ER model being somewhat “wrong” on this point should not be precluded. I presume that Schmidt will argue something like this rather than concede that his actual calculation was incorrect.

It is disillusioning to see sea surface temperature trends differing from satellite data by more than a factor of 1.6 and/or climate models still producing such big errors.

As, according to new HadSST3, sea surface temperatures have only increased by 0.3 (or 0.2) degrees since the 1940s, even the possibility that temperatures did not increase at all since then is refuted.

Steve, here is a NASA web page explaining that the GISS-ER model run data archived at PCMDI may not be current. It links to an ftp with updates here. I found the page while attempting to replicate Gavin’s graphs comparing 20th century GISS-ER OHC model runs with observations (RC posts here and here) using Troy Masters’ data and code from this post. Troy’s data source was the PCMDI archive. PCMDI had a total of nine twentieth century runs (including OHC) for GISS-ER archived. As far as I could tell, the NASA ftp directory only had OHC updates (“thetao” files) for 5 of the 9 20th century runs.

“The scripts are very time consuming to prepare and much too reminiscent of translating medieval Latin”: “medieval Latin” is much easier than Classical Latin – must Climate Scientists get even their analogies wrong?

Fortran is not that hard to learn or use (especially with ‘gfortran’ available), but I appreciate that people are more effective using languages they are familiar with. Thus I’ve made a quick and dirty version of the data in netcdf format as well:

Thanks for the updates to the script – it is now functional, and I will investigate your numbers further.

The land-mask you point to is fine and should be identical to the one I used.

The correspondence of simulations is more tricky than ‘the 5 runs at PCMDI’ unfortunately. There were 9 20thC GISS_ER runs in total which unfortunately had an error in the 1979+ stratospheric ozone forcing.

Subsequently, 5 of them were rerun (1979-2005) with corrected ozone forcing and that is what I am making available here. Note however, that the continuation runs for the A1B scenario were not re-run (and so they continue from a subset of the original 9 runs). This is much messier than it ought to have been, but versioning of data in CMIP3 was not ideal.

I know how to read Fortran. I learned it in 1968. My criticism of Fortran for statistical work is based not on a lack of knowledge of Fortran, but on my appreciation for the accomplishments of the contributors to both R as a language and to its statistical packages.

IIRC R’s gunzip can’t handle COMPRESS data but gunzip.exe can. That may be why Mr. McIntyre has gotten into the habit of using it.
Steve: Nicholas is my guru on retrieving compressed formats into R. Every time that I used mode=”wb”, I subconsciously tip my hat to him.

Unfortunately, using the “system()” call does tend to make your code less reliable across various OSes. I’ve never gotten gunzip.exe to work properly. Plus, you’re asking people to download an exe to their systems.
Steve: I agree about downloading .exe. I tried the R.utils version and it worked fine. Thanks for this. I’ve re-edited the script.

And while you’re making minor “improvements” I’d like to suggest using the built in compression support in untar. According to the online doc (I don’t write R code so I’ve not tested it) there is pretty complete support for various compression types.

This is either a wrapper for a tar command or for an internal implementation written in R. The latter is used if tarfile is a connection or if the argument tar is “internal” or “” (except on Windows, when tar.exe is tried first).
…
Since it uses gzfile to read a file it can handle files compressed by any of the methods that function can handle: at least compress, gzip, bzip2 and xz compression, and some types of lzma compression.

Arguments
compressed: logical or character. Values “gzip”, “bzip2” and “xz” select that form of compression (and may be abbreviated to the first letter). TRUE indicates gzip compression, FALSE no known compression (but the tar command may detect compression automagically), and NA (the default) that the type is inferred from the file header.

I’ll give it a spin. One issue, however, is building a system that works on all OSes. With tar on Windows I’ve often run into the problem of tar.exe being flaky. On the Mac it just works. But there are ways of handling that.

I’ll have a look at it because in the climate world (and others) we have to deal with gz inside tar, tar inside gz, bz2, and the hardest nut (which Nicholas cracked): .Z
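As a sketch of the same auto-detection idea outside R (Python stdlib here, with hypothetical file and member names): tarfile’s “r:*” mode sniffs gzip/bzip2/xz compression from the file header, which sidesteps dependence on an external tar.exe or gunzip.exe, though LZW .Z is still not handled by the standard library either.

```python
import io
import os
import tarfile
import tempfile

def fetch_member(archive_path, member_name):
    """Read one named member out of a (possibly compressed) tar archive.
    Mode "r:*" lets tarfile detect gzip/bzip2/xz transparently."""
    with tarfile.open(archive_path, "r:*") as tar:
        return tar.extractfile(member_name).read()

# Build a tiny .tar.gz in a temp directory so the demo needs no external tool
# or download; the member name mimics the GISS-style files discussed above.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo.tar.gz")
payload = b"pseudo MSU data"
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo("MSU2R.demo.nc")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

print(fetch_member(archive, "MSU2R.demo.nc"))
```

The same two-line read pattern works unchanged whether the archive is plain tar, .tar.gz, .tar.bz2 or .tar.xz.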

This has been an ongoing theme of Steve’s. Ideally a script should just run. It should go get the data, download it, and feed it to the methods. Look ma, no hands. That sounds really easy, but our experience is that there is often SOMETHING that requires some hand work. In the example above you see a pretty good example of the ideal.

Now everybody is free to do what they like with this data and code. That’s a beautiful thing. Open inquiry, no requests to reluctant PIs, it just works.

yup. Fluency in use of R-tools to decompress data for turnkey scripts is something that takes time. Nicholas wrote a few scripts for me and I’ve applied those.

One avenue that I started down, but didn’t pursue, was scripts to download data from OpenDap websites or from java websites like KNMI. I managed to figure out how to ping KNMI and extract data efficiently, but they changed their handshakes and they don’t seem to work anymore. My pings were purely empirical as I don’t understand the handshake protocols, but creating handshake procedures within R for some of these sites would be a very worthy activity.

Handshaking to PCMDI to create a R version of OpenDAP would also be very worthy. Some of the statistical analysis of models is pretty trivial. If one could establish R handshakes to PCMDI, it would do much to increase accessibility and interest. Curt Covey of PCMDI has commented here from time to time and would probably be supportive of such an effort. (I think that they should commission such work, but even if they aren’t commissioning such an effort, Covey would at least not frustrate it.)

One thing R is lacking (last I looked) is a good SOAP package, and REST support as well.

Also, some folks think they are doing a favor by having a menu-driven data selection approach as their only way to get data, which means you have to use things like RCurl to issue an http request to the server to ‘simulate’ a request made through the menu system. Someday I’m going to figure RCurl out so we can get that data easily as well, and program in a user id and password, which RCurl can handle.

It’s too bad that more people haven’t got the idea of fully “turnkey” scripts. I view these things almost as a sort of fancy hyperlink.

Even people who’ve made an effort to archive supporting data and code – BEST is a recent example, as is the Schmidt 2009 archive with data in “binary GISS format” – do not fully appreciate the power and desirability of making these materials efficiently available, equivalent to a turnkey script generating figures and tables.

Let domain specific tools handle domain specific tasks and resist the pointless urge to re-invent the wheel. For the compressed archive issue get cygwin (better, dump windows) and then use your analysis software to generate and execute what is a trivial, easily customized, one line shell script to download, decompress, and unarchive.

As for file format, netCDF is a good choice– I often use it since there are native libraries for many popular programming languages and canonical tools to convert to ASCII for the outliers.

I “get” the problem; we disagree on the solution space. These wrapper utilities still depend on and inherit all of the constraints of the local programs: gunzip.exe and tar.exe in Steve’s case. If you attack the real weakness head on, the first handful of lines in Steve’s script melt away and the reader can focus on the analysis.

The issue for me isn’t focusing on the analysis. The issue is getting something that works cross-platform, independently of the particular installation the user has. Also, the tar on my Windows, as I noted, has a horrible track record with files. Simply not to be trusted.

The simplest version works with either the built-in (tar=internal) or external (GNU/Cygwin) tar and supports vanilla tar, gzip, bzip2, and xz formats but not LZW .Z.

untar(“sat_msu_nc.tar.gz”, files=c(“MSU2R.E3Af8foM20A.nc”))

Under Windows this works if one is lucky enough to have GNU tar installed since tar.exe is called preferentially.

untar(“sat_msu_nc.tar.Z”, files=c(“MSU2R.E3Af8foM20A.nc”))

Under Unix one would need to specify tar=/path/to/gtar (Linux tar == gtar)

untar(“sat_msu_nc.tar.Z”, files=c(“MSU2R.E3Af8foM20A.nc”), tar=”/path/to/gtar”)

Since there is a library that supports uncompress, it might be interesting to have a simple function that converts a .Z file into a .gz file in one go – zipper(“foo.Z”, “bar.gz”) – for the few times .Z files have to be dealt with.

bob

Steve: for .Z files, I’ve successfully used system(‘”gunzip.exe”‘,…). Works well enough for the occasions.

Another good equal-area projection for climate illustrations besides the 1805 Mollweide is the 1772 Lambert Cylindrical Equal Area projection, which preserves orientation not just in the EW direction like Mollweide, but also in the NS direction. It is particularly nice when the aspect ratio is set to 2:1 (40.8d N/S standard latitudes), as originally suggested by CA’s JohnA. This aspect ratio makes the prime meridian look half the length of the equator, as it is (ignoring the slight non-sphericality of the globe).

Although I get most of my TV news from the Comedy Channel, I am very impressed that CBS News with Scott Pelley uses an interrupted Mollweide globe as its background. The interruptions reduce shearing of the continents while preserving equal-area.

Thanks, Steve, for this post, chasing up what I (among others) asked about on the Thoughts on BEST thread. In turn, your (and troyca’s) findings here throw up some new questions, as well as shedding some new light on your points re. surface vs. satellite temperatures.

First, the larger-than-one downscaling factor provides some more support for your suggestion that the surface land temps really do exaggerate temperature trends.

Next, the fact that the direct regression of TLT vs. surface temperatures in the models yields a ratio of 0.95, whereas the ratio of their two time trends is 1.2, is interesting in terms of the response and time-lags, as discussed e.g. in your series of posts following Spencer’s paper (e.g. this one). Here, both regressions are based on GISS model data – do the Spencer and related analyses not suggest that the models may get this sort of thing wrong?

Finally, if there really is a substantial downscaling factor, is this not a signature in some sense of a decreased adiabatic lapse rate with rising surface temperature? I’m a bit out of my depth here, but I wonder if anyone can comment on the implications here to things like sensitivity, as again discussed in those threads.

Gavin on here debating the numbers. People taking a closer look. New methods to facilitate analyses being developed. Is there SCIENCE going on here? We welcome the robust debate. This is what it is all about. I believe even Gavin enjoys the challenge. It makes one feel alive.

I’ve added an excerpt to the head post (see the Update Nov 8 section above). I think that the small problem of this post – the discrepancy between the revised downscaling of Klotzbach et al 2010 and Schmidt’s Nov 2009 realclimate post – is now fully reconciled. The amended numbers of Klotzbach et al 2010 appear reasonable.

Gavin Schmidt’s criticism of the downscaling over land in Klotzbach et al 2009 and of my original graphic in the Closing BEST Comments post was justified, but his own figures for downscaling were incorrect. The diagnosis of the discrepancy was complicated by the fact that his actual method did not correspond to the most reasonable interpretation of his realclimate article. Thanks to Gavin’s clarifications, we now have what seems to be a definitive diagnosis of the discrepancy and where Gavin got wrongfooted. It seems to me that it would be constructive to note the resolution of the discrepancy in the original RC post.

“If you take the trends of each series and divide, you get a value of around 2 (1.96 in the example that I did.) However, if you regress “TAS” against “TLT”, you get a coefficient of 0.598. If you regress “TLT” against “TAS”, you get a value of 0.91.”

OH NO, not again!

When will these amateurs learn how and when to do linear regression?

Linear regression only works reliably when you have a controlled variable, i.e. one quantity that has substantially less noise, error and uncertainty than the other.

As Steve correctly states, you should regress each temperature against time, a controlled variable, then take the ratio.

If you regress one uncontrolled and noisy quantity against another, you will get an attenuated slope.

This is exactly what GS has done and exactly what he has got.

As Steve’s test shows, it does not matter if you do TLT vs TAS or TAS vs TLT; either way round you get an attenuated value, albeit a different one.

This is the same issue as has plagued the climate sensitivity argument for years. None of these jokers seem to get even the high school maths right.

1) If the noise on TLT and the noise on TAS are uncorrelated, then both methods (ratio of trends vs regression of TAS against TLT) will give similar results.

2) If the noises are correlated AND the ratio of their amplitude is equal to the amplification factor (which here is 2), then the regression TAS against TLT will give better results, because some of the noise will cancel out.

3) If the noises are correlated AND the ratio of their amplitude is NOT equal to the amplification factor, as in Steve’s example, then the ratio of trends will give better results.

4) In all cases, ordinary least squares linear regression is fine for estimating the trends of TAS against t and TLT against t and then taking the ratio. BUT to estimate the slope of TAS against TLT, a total least squares linear regression is obviously the correct choice, so that the slope is invariant when switching TAS and TLT.
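Point 4 is easy to check numerically: the TLS fit is the same geometric line whichever variable you call x. A minimal sketch of two-variable TLS via SVD (Python, with synthetic series of my own construction, not the GISS data):

```python
import numpy as np

def tls_slope(x, y):
    """Total least squares (orthogonal regression) slope of y on x via SVD."""
    m = np.column_stack([x - x.mean(), y - y.mean()])
    # The right singular vector of the smallest singular value is the
    # normal (nx, ny) to the best-fit line nx*(x - xbar) + ny*(y - ybar) = 0.
    nx, ny = np.linalg.svd(m, full_matrices=False)[2][-1]
    return -nx / ny

rng = np.random.default_rng(1)
t = np.arange(200)
x = 0.05 * t + rng.normal(size=200)    # noisy series, trend 0.05
y = 0.10 * t + rng.normal(size=200)    # noisy series, trend 0.10 (true slope vs x is 2)

s_xy = tls_slope(x, y)
s_yx = tls_slope(y, x)
# Unlike OLS, switching the variables inverts the slope: s_xy * s_yx = 1
```

Here both error variances are equal (unit rnorm noise on each series), which is the case where elementary TLS behaves well.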

“Science would work a lot faster if authors corrected their own work when errors were found.”

Gavin Schmidt: http://rankexploits.com/musings/2009/food-fight-at-climate-blogs-with-poll/
Steve: The comments in the linked post include comments on Nov 17-18, 2009 with a few from Jeff Id on Nov 18 (then unaware of the link on his own blog). Climategate went viral on Nov 19. Gavin Schmidt was in possession of the Climategate dossier on Nov 17 and presumably had other things on his mind. He was very quick off the mark at the first mentions at Lucia’s on Nov 19.

If one posits a model for the anomalies: msu_i = a * sat_i + noise, or individual trends in time: msu_i = b * t_i + noise and sat_i = c * t_i + noise, it is easy to see that they are equivalent if a = b/c. Different ways to estimate either ‘a’ directly, or via the ratio ‘b’ and ‘c’ will differ only in how the different levels/types of noise work out – in the (impossible) limit of very small noise of course the answers will be exactly equal. For the particular case here, the results are not actually sensitive to this distinction within the uncertainties:

Neither a regression through the annual anomalies, nor the actual calculation of the ratio of trends is statistically distinguishable from the 1-to-1 line. This was the point made to Klotzbach et al, and on your previous post where you assumed without any backing that the ratio must be 1.4.

Perhaps you’ll be notifying the Wall Street Journal that your claim that the satellite trends over land were less than half the BEST result was incorrect?

NB: Note that the runs I am looking at have trend ratios of between 0.99 and 1.11 (mean 1.03). Looking across models one might want to distinguish those that have stratospheric ozone forcing and those that don’t.

It is common ground on the part of, say, Lindzen on one side and the IPCC models on the other that trends in the lower troposphere (where satellites measure temperature) will be higher than at surface, because increased evaporation of water at surface and condensation at altitude will increase the trend at altitude. Overall the tropospheric trends are anticipated to be about 40% higher than surface (this is the 1.4 divisor). While the overall number is 1.4, the GISS model presumes that the effect is concentrated above oceans: an amplification factor greater than 1.4 above oceans but only 1.0 or so on land (Pielke Sr says 1.1 using GISS but I can’t comment on this). If you calculate the downscaled-to-surface equivalent using satellite data, you get lower trends than CRU and much lower than BEST or NOAA.

My surmise is that the difference is due to some combination of urbanization, algorithm artifacts etc.

>>
If one posits a model for the anomalies: msu_i = a * sat_i + noise, or individual trends in time: msu_i = b * t_i + noise and sat_i = c * t_i + noise, it is easy to see that they are equivalent if a = b/c. Different ways to estimate either ‘a’ directly, or via the ratio ‘b’ and ‘c’ will differ only in how the different levels/types of noise work out
>>

Let’s substitute your equations into the first one, recognising your point that the noises are not the same in each case:

msu_i = b * t_i + noise_b = a * (c * t_i + noise_c) + noise_a

b = a * c + a * noise_c / t_i + (noise_a - noise_b) / t_i

“it is easy to see that they are equivalent if a = b/c”: b = a * c

let’s try that assumption:

0 = a * noise_c / t_i + (noise_a - noise_b) / t_i

a = (noise_b - noise_a) / noise_c

To follow your logic, “it is easy to see” that ‘a’ is nothing but noise if we follow your precepts.

Different ways to estimate ‘a’ will differ by whether they are VALID or not.

Throwing out the time dependency of your two time series is always a bad start. Trying to regress one noisy signal against another is simply, flat out, wrong.

Look up the conditions and assumptions of linear regression methods. You WILL get an incorrect result if you don’t respect the conditions.

All that your linked graph illustrates is that the level of noise in “SAT anomaly” and/or non linear component of the relation between the two IN THE MODELS is sufficiently small that your regression fit looks OK to the eye.

Steve says:
>>
Actually, it seems to me that the reciprocal of the coefficient from Schmidt’s regression of TAS~TLT would be less bad than the coefficient that he seems to have used. In this case, it would give a value of 1/.598 = 1.672
>>

Indeed, and this is no surprise. It is clear that the TLT data has much less noise and is far more linear than TAS in your pseudo data. It therefore makes a half-decent candidate as the base for the regression, though as you have found, the regression value is still less than the actual programmed ratio. This is precisely because the noise, albeit small, is not negligible.

This is the same issue as I pointed out in your exploration in the absolute temperatures discussion and that I have laboured to get noticed in relation to climate sensitivity.

As soon as climate “scientists” stop abusing linear regression a lot of the discrepancies and ensuing argument will disappear.

On a side note, I’m bewildered by the recurring comments of the “climate scientists cannot properly handle linear regression” species, the one below being a case in point:

“Throwing out the time dependency of your two time series is always a bad start. Trying to regress one noisy signal against another is simply, flat out, wrong.”

The world of linear regression is not restricted to the Excel-provided ordinary least squares linear regression. Of course it’s a bad idea to apply an OLS regression to msu against sat, among other reasons because one would want to be able to switch the variables and have the resulting slope be the inverse of the initial slope (something an OLS regression won’t do).

However that kind of problem is easily handled by a total least squares linear regression. If Gavin applied a TLS regression to his data, then it’s perfectly sensible and valid.

Roman —
I can’t say that I have ever actually used TLS, but from reading up some on it after discussions here on CA, it sounds to me like a very reasonable way to handle the errors-in-variables problem.

However, the elementary treatments I have seen just assume that the variances of all the errors are equal. In fact, they ordinarily wouldn’t be equal, and in order for the method to be identified, you have to know what their relative size is (or what the absolute size is on one side). Then you can rescale the variables so that the errors are equal, and use the elementary method. This gives you “y on x” in the limit when you know that x has no measurement error, and “x on y” in the limit where you know that x has measurement error but there are no regression errors.

Is there a standard way to compute standard errors or CI’s for the coefficients in TLS? I haven’t seen that. (Since the measurement-error-only case corresponds to the calibration problem, the CI’s may be nonstandard).

Another issue that is glossed over is the intercept term — most regressions include an intercept, which is the coefficient on a unit “regressor”. However, there is never any measurement error on unity, so it has to be handled differently. A quick fix-up is just to subtract the means from all variables, which forces the regression through the origin, and then to shift it to pass through the variable means instead. But then we still need an estimate of the uncertainty of the restored intercept term.
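The quick fix-up just described can be sketched as follows (my own illustration in Python, assuming the elementary equal-error-variance TLS; this is not code from any of the packages mentioned here):

```python
import numpy as np

def tls_fit_with_intercept(x, y):
    """Center both variables, fit the TLS slope through the origin,
    then restore the intercept so the line passes through the means."""
    m = np.column_stack([x - x.mean(), y - y.mean()])
    # Normal vector to the best orthogonal-fit line of the centered data
    nx, ny = np.linalg.svd(m, full_matrices=False)[2][-1]
    slope = -nx / ny
    intercept = y.mean() - slope * x.mean()   # shift line through (xbar, ybar)
    return slope, intercept

rng = np.random.default_rng(2)
xt = np.linspace(0.0, 10.0, 100)              # noise-free "truth"
x = xt + rng.normal(scale=0.1, size=100)      # observed with measurement error
y = 3.0 * xt + 5.0 + rng.normal(scale=0.1, size=100)
slope, intercept = tls_fit_with_intercept(x, y)
```

As the comment notes, this restores the point estimate of the intercept but says nothing about its uncertainty.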

The widely used econometrics package EViews doesn’t seem to have TLS. An Ivo Petras has contributed a TLS package to Matlab File Exchange. It may work, but as such it has no Matlab endorsement. (I recently found a very helpful program to solve Nash Equilibria there, but it only worked after I corrected two bugs.)

The usual approach is to center all of the variables involved at zero (the line can be moved by de-centering the result later) and as Steve says an svd procedure applied to estimate the coefficients. In the case of two variables, there is an explicit solution. Inference on the result would be difficult and resampling or asymptotic large sample results would likely be the only type of inferential methodology available. I have not bothered to research what these methods might be.

I have some real concerns that the methods don’t really properly take into account the individual “errors” for the predictors or the responses. For each observation, there is in effect a single “residual value” which is apportioned to each of the variables (in exactly the same proportions for each observation) where the apportioning is determined by the fitted line (or plane as the case may be). I wrote up some criticism of the procedure at my blog a year ago.

ymat and xmat are matrices containing the response variables and the predictor variables, respectively. They should be centered before use. The output is the regression coefficients and the predicted values for ymat. The residuals can be calculated by subtracting the latter from ymat. In the case of a single variable for each, the coefficient is the slope and the predicted values form a straight line.
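Roman’s code itself is not reproduced here, so as a stand-in, this is roughly what such a routine looks like (a Python sketch using the classic SVD-based TLS solution; the function and argument conventions are my own, not Roman’s):

```python
import numpy as np

def tls_multi(xmat, ymat):
    """Multivariate total least squares via SVD.
    xmat (n x p) and ymat (n x q) are assumed already centered.
    Returns the coefficient matrix and the fitted values xmat @ bmat."""
    p = xmat.shape[1]
    v = np.linalg.svd(np.hstack([xmat, ymat]), full_matrices=False)[2].T
    # Partition the right singular vectors; the last q columns give the solution.
    bmat = -v[:p, p:] @ np.linalg.inv(v[p:, p:])
    return bmat, xmat @ bmat

# Toy usage: one predictor, one response, built-in slope of 2
rng = np.random.default_rng(3)
xt = np.linspace(-5.0, 5.0, 50)
xmat = (xt + rng.normal(scale=0.05, size=50)).reshape(-1, 1)
ymat = (2.0 * xt + rng.normal(scale=0.05, size=50)).reshape(-1, 1)
xmat = xmat - xmat.mean(axis=0)       # center before use, per the description
ymat = ymat - ymat.mean(axis=0)
bmat, yhat = tls_multi(xmat, ymat)
```

Note that the fitted values here simply apply the fitted relation to the observed predictors, which skirts the apportioning-of-residuals question raised above.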

Do I “approve” of the procedure? I guess there are probably some uses, but IMHO, I don’t see that it really handles the problem that all of the variables can contain uncertainty in any reasonable fashion.

Thanks for the link to your blog page, Roman. Since this is getting OT here, we should move this discussion there.

But as you note in one of your comments on your page, the variables must be scaled so that their errors have equal variance before running standard TLS. If this is not done, measuring temperature in F rather than C changes the results, and equalizing the variances of the variables themselves can give absurd results.

I try to discourage people from over-generalizing since it usually diverts attention from the actual issue at hand. My interest in this post was not to make claims about all climate scientists and all regressions. I was focusing on one particular example in which it appears certain to me that Schmidt (1) used a method inappropriate to the task; (2) erroneously reported the reciprocal of the coefficient. As seems to be policy in the field, no one concedes even the smallest point.

If the amplitude of the noise is downscaled by the same factor as the trend (factor 2 in your example), then taking the ratio of the trends or the slope of RMA regression of blue against green will give similar results.

Gavin’s error can perhaps be seen more clearly with an example with no noise. For data 1 (d1) and data 2 (d2) with identical slope and no error but an offset of deltaT (d1 = d2 + deltaT), plotting either d1 vs d2 or the ratio of pairs of points at each time t (d1/d2 at t) gives a positive trend. This will only go away if each data set is converted to anomalies with the same variance.

Clarification: The problem will only go away if both data sets are normalized to the same mean, and I am not sure about different variance, especially since a different slope will give a different variance of the data even if the “noise” is the same – dividing by the variance will tend to reduce the slope of the data with higher actual slope, I think, and thus should not be done.

A range of [0.784, 1.234]… and a mean (if you think that is sensible) of 0.9708. Lest anyone think that volcanoes or something are affecting this, the same calculation for 2010-2100 gives a range of [0.914, 1.097] and a mean of 0.9897.

So, I think that this implies that an (non)-amplification factor of ~1 is pretty close to a consensus.

Steve: in a previous comment, I stated:

I don’t have time to write this up right now as I’m going to be away for a couple of days, but the retrieval scripts provided above, permit the comparison of GISS ER amplification factors to other models. Amplification factors for GISS_ER over ocean, over land and over land-and-ocean are in the top quartile among models. The possibility of the GISS ER model being somewhat “wrong” on this point should not be precluded. I presume that Schmidt will argue something like this rather than concede that his actual calculation was incorrect.

Looks like I got this prediction 100% correct :) Schmidt’s original calculation was incorrect but, needless to say, there was no admission. As I noted in my previous comment, GISS amplification factors are in the top quartile among models, and other models give results more like Schmidt wanted. So other models do support a lower amplification factor over land – but only to the extent that the GISS model is incorrect in this respect, which, I take it, Schmidt is now conceding.

Regarding the range of amplification values: the expectation that TLT will exhibit ‘amplification’ over TAS globally (land + ocean) is due to water vapour feedback from surface warming causing a change in the moist adiabatic lapse rate. Essentially this means the rate at which the atmosphere gets cooler with altitude reduces. Of course, being the moist adiabatic lapse rate, we would expect this effect to be dominant over the oceans but not so much over the land. I suspect this is why Gavin expects a near 1-to-1 ratio over land, though obviously the reality (even virtual reality) is a bit messier.

I actually think the range of values over the long term (2010-2100) is quite well constrained – 0.9 to 1.1 – given the complexities involved. Over the 1979-2005 period short term fluctuations play a role in widening the range.

Regarding Steve’s suggestion of a UHI/station-siting component I don’t think this works because the same relationship is seen over the oceans.

If we take the model expected ocean/land ratio to be 1.6/1.1 = 1.455, we can see that 1.455*(mean of land Tsat/Tsurf ratios) = 0.997

The mean of the ocean Tsat/Tsurf ratios is 0.982. We can see that the relationship between land and ocean TLT/TAS in the models is the same as the relationship between land and ocean TLT/TAS in observations. Whatever the problem is it doesn’t seem likely that UHI is playing much of a role here.

Further to this, the RSS TMT data suggests that trends in the satellite products are not even self-consistent between layers with regards to the physics. We would expect tropical (20S to 20N) mid-troposphere trends to be larger than tropical lower-troposphere trends, yet the reported values are TLT = 0.144; TMT = 0.116. I’m not sure where to obtain UAH TMT data but I suspect their values would be similar.

If we’re accepting the physics, as was the premise for the original post, the data point to a systemic problem with realised trends in the RSS and UAH products.

Perhaps a consensus might be obtained by trying to see if modellers can reach agreement on about 2/3 of the problems and then just split the difference on the balance. If this works for difficult negotiations, it might work for model wars.

people will ask stupid questions, IP will be lost, nothing is proved by running the same code over, people will lose the opportunity to learn for themselves.

The simple facts are these: facts we can observe based on Steve’s ongoing experiment with changing how we publish science.

1. Vague descriptions in text are replaced with precise definitions of what is done.
2. Code is built upon and improved
3. Others can quickly translate into their language of choice
4. Issues of debate are quickly clarified and personality doesn’t dominate (it’s always there, but it’s not dominating the conversation).
5. res ipsa loquitur. (monktopus does my latin)

Simply, there never has been a good reason for NOT releasing the code (OK, a couple of exceptions).

I don’t object to your words or anyone’s words because of the author. In fact, I believe that your freedom of choice in words is somewhat of an illusion. Hence the intentional fallacy. Looking at your sentence I do not object to a single word. In fact I have never read a sentence of yours where I objected to the words. Still, you could take notice of how the process works differently when power is shared.

We could first take notice of the proof by assertion. We could then grant that sharing code shares power and makes model wars work differently. We could finally take note that this might not suffice to settle the problem of reaching a consensus through difficult negotiations.

There is no assertion. There is a request that you attend to the evidence in front of you, for once. There is a request that the arguments of the past be weighed in light of the evidence. We could take note that other strategies for achieving consensus in this area have failed miserably, for once. Again, attend to the evidence. We could take note that the following strategies ( listed in private documents and attachments) have been tried and have failed.

1. Ignoring the other side
2. discrediting the other side
3. Denying the other side their legal rights
4. ridiculing the other side
5. ‘educating the youth’
6. sandbagging the other side

In short, the record has evidence of what we all have experienced one time or another. Bad faith begets Bad Faith. Time for a new approach and a new negotiating team. But that’s just 25 years of negotiating experience ( as bad cop ) speaking.

The request seems quite tangential to my own remark, which is not really mine anyway, and to Tom Gray’s remark regarding the “consensus”.

The request does not acknowledge what we have conceded in the sentence starting with “We could then grant.”

The request appends yet another “Yes, but Climategate”, starting with “We could take note that the following strategies”, covering up most of last comment, but also the previous one, and also a reply to Richard Drake.

The request appeals to Moshpit’s own authority.

So Moshpit’s request is being sent with an untrue claim, the imposition of a burden I do not hold, the refusal to recognize a concession I already offered twice, the famous Yes but Climategate, and an appeal to his own authority.

Auditors can speculate on what would happen if bad faith begot bad faith.

Auditors can speculate on why we should bother with people with no honor.

To talk to another team, to pick on another Team than the Kyoto Flames.

To talk to another person, a good start would be to use less categorical statements, to forget about self-representation, and to honor past commitments.

This applies to everyone, including Gavin regarding his commitments hitherto.

Gavin, dealing with noise is a well-known problem, with epic amounts of literature in Electrical Engineering (in particular, Communications). Defending your proposed model by invoking what happens when no noise is present is an incredibly flawed approach, and one that would be laughed at by any communications journal, where rigorous analysis would be required. Since all climate data is noisy, quantized, sampled and filtered before anything is done with it, such a hand-wavy argument for what is a reasonably simple problem dealing with noise doesn’t exactly instill confidence in one’s analysis of this data.

Furthermore, Craig’s comment is also apt. Gavin’s claim (minus noise amplification issues) is correct so long as the models are linear. If instead they are of the form sat_i = a*t_i + b + noise, or if the noise doesn’t have a mean of zero, then his claim is incorrect.

Note that Gavin’s original calculation was not the equation that he is presently arguing but the opposite direction. The “amplification factor” of 0.95 can be obtained by a regression of TAS~TLT, not TLT~TAS. In effect, Gavin modeled Surface Temp = a * Tropo + noise. The regression Tropo = a * Surface + noise gives a very different coefficient. :)
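The direction-of-regression asymmetry is easy to confirm on synthetic data (a sketch with made-up series and a built-in amplification factor of 1.2; nothing here uses the actual GISS output):

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(300)
surf = 0.02 * t + rng.normal(scale=0.5, size=300)         # toy surface series
tropo = 1.2 * 0.02 * t + rng.normal(scale=0.5, size=300)  # amplified by 1.2 aloft

a_surf_on_tropo = np.polyfit(tropo, surf, 1)[0]   # Surface = a * Tropo + noise
a_tropo_on_surf = np.polyfit(surf, tropo, 1)[0]   # Tropo = a * Surface + noise
# OLS is not symmetric: the two slopes are not reciprocals of each other,
# and their product equals the squared correlation, which is below 1.
```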

No, TLS is a whole other story. If you want to regress one noisy dataset against another, you need to go into errors-in-variables and have a good idea of the noise and error magnitudes in both. That is usually difficult or impossible, depending on what is known about the data.

The best solution here is to apply OLS in a valid way, which is what Steve’s ratio of trends is doing.

GS’s attempt at doing OLS on a scatter plot is naive, ill-informed and can only give wrong answers.

He’s gone a bit quiet. So either he’s learnt something here and it’s an embarrassed silence, or he has decided that this is just one of those “communications problems” and he has gone away to work out how best to get his message across to all us mere mortals who are too foolish to bow to his superior intellect.

I cannot understand why you are pairing the points. Either one model is correct and the other 23 are all wrong, or they are as valid as each other. If that is the case then you have to treat both TAS or TLT as a population.

What is interesting about the 24 TAS and TLT numbers that Gavin provides is the line shape.
Rank each row, smallest to largest, and plot by rank number. The two data-sets have identical line shapes; in modeling terms this screams screw-up.

The “land effect” and “ocean effect” don’t magically begin at the shore, nor is the “ocean effect” evenly distributed across all areas of all oceans. I don’t have a clear mental picture of the physical basis of the two calculated lapse rates.

Just in case you really wanted to know about the tilde–I couldn’t tell whether you were serious, but it seemed like a reasonable question to me–I’ll give a reply that may serve until one of the cognoscenti deigns to respond.

Judging from my vast (last-couple-of-weeks) study of R, I’d say the tilde is what R uses to separate the variable being modeled from the model. That is, y~x, where x and y are equal-length vectors (e.g., time series), tells R that you want it to find the scalar values m and b that minimize the sum of the squares of the elements of the vector y-mx-b.
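For anyone following along without R, a rough Python equivalent of what fitting y ~ x does (the data here are just a placeholder; for a simple line, numpy’s polyfit should return the same m and b that R’s lm(y ~ x) reports):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 3.0 * x + 2.0            # an exact line, so the least squares fit is exact

m, b = np.polyfit(x, y, 1)   # minimizes sum((y - m*x - b)**2), like lm(y ~ x)
```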

Thank you, and Joe Born. All the answers were exactly what I expected to hear – but as Joe Born says, us guys in the cheap seats like to follow some of the action… Despite studying Eng. Math and theoretical math (but not that much stats and certainly not R), I still wonder if someone is using a notation system that I don’t know – and it is usually a coin toss on that issue… So it helps to let us guys in the cheap seats know where to find the explanation. Maybe the rest of us should just stick to poles and zeroes and eating popcorn… ;-)

Joe: Yes I am in the same boat — but also working through Octave and a couple of other languages as well — so it has been a busy couple of weeks.

I have R code to calculate century averages, standard deviations etc. using factoring and then cbind(), but I’m having the problem where NAs are resulting in centuries being excluded. Any idea how I can calculate century averages excluding NAs?