Steig Mystery (almost) Solved!

Thanks to some insight from Ryan O (#43), we have been able to make progress on the puzzle of where the values in the Steig paper came from. In fact, it is now quite apparent that Steig et al used the reconstructions based on the AVHRR satellite data rather than the ones based on the AWS dataset.

First, we should make it clear that Shuman used a regression which was based on the annual sequences which we looked at in the previous post (and got a perfect fit both in the trends as well as in the 95% error bounds). To demonstrate this, we use R starting with the monthly data defined in comment RomanM (#20). This has to be converted to anomalies before the regression can be carried out.

Although the trends in both cases are basically the same, the error bounds differ substantially. The mathematical reason for this is that although the annual averages will be less variable than the monthly values, there are 12 times as many monthly data points. However, there may also have to be considerations made for autocorrelation, the strength of which may differ for the two types of series. Because of this and possible non-linearities in the relationship, no data analyst would automatically jump to the conclusion that the calculation giving the lower error bound for the trend is somehow “more accurate”.

Anyway it is clear that the Shuman results are based on the annual series.

The values calculated as “distance” are in the same numeric order as the actual distances (but with considerably less calculation error). Now we convert to annual values and then calculate the trends for the ten closest points to each AWS.
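The "distance" formula itself is not reproduced here. As a rough sketch of the idea, one proxy with the same-ordering property is the straight-line chord between two points on a unit sphere, which increases monotonically with great-circle distance. The function names and the small grid below are made up for illustration and are not the reconstruction grid.

```python
import math

def chord_distance(lat1, lon1, lat2, lon2):
    """Straight-line (chord) distance between two points on a unit sphere."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    x1, y1, z1 = math.cos(p1) * math.cos(l1), math.cos(p1) * math.sin(l1), math.sin(p1)
    x2, y2, z2 = math.cos(p2) * math.cos(l2), math.cos(p2) * math.sin(l2), math.sin(p2)
    return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)

def great_circle_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine great-circle distance in km, used here only as a cross-check."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_points(station, grid, k=10):
    """Indices of the k grid points closest to the station, ranked by chord distance."""
    ranked = sorted(range(len(grid)), key=lambda i: chord_distance(*station, *grid[i]))
    return ranked[:k]

# Hypothetical grid around Byrd Station (80S, 120W).
byrd = (-80.0, -120.0)
grid = [(-80.0 + 0.5 * i, -120.0 + 2.0 * j) for i in range(-3, 4) for j in range(-3, 4)]
closest_ten = nearest_points(byrd, grid, k=10)
```

Ranking by the chord avoids the inverse trigonometric step, and the ten nearest grid points come out the same as they would with the true distances.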

The trends are closer than for Byrd, but they are still not quite on the mark. Error bounds are off (by close to 50%) again.

So, Steig et al did NOT use the annual series calculated from the satellite reconstruction in calculating the posted values for the trends at Byrd and Siple. However, the trends aren’t that far out so it is reasonable to try one more approach. Did they run a regression on the monthly series?

There are three possible candidates (none of which are among the closest ten grid points – grid point 265 is the 19th closest grid point to Siple’s co-ordinates).

So, we have discovered the following:

The Shuman reconstruction was based on actual AWS data and was extended by using satellite data to the 19 year period. They did their regression on an annual basis generating a trend and an error bound. The Shuman results (without the caveat placed by the original authors on the interpretation of the trend) were used to “enhance” Figure 3 of the Steig paper.

The Steig reconstruction was based on the manned surface station data and different satellite data. It did not use ANY actual AWS data. The regression was done on a monthly basis thus reducing the apparent error bounds by a factor of about 1/3. No other overt numeric comparison of the results is indicated in the paper or the SI although supposed seasonal similarities are indicated as existing.

Some comparison.

There is still a bit of a puzzle with regard to which grid point(s) or average of grid points may have been used, but the overall picture seems to have been solved.

You would think that they would make things more clear. The original quote from the paper

Independent data provide additional evidence that warming has been significant in West Antarctica. At Siple Station (76S, 84W) and Byrd Station (80S, 120W), short intervals of data from AWSs were spliced with 37-GHz (microwave) satellite observations, which are not affected by clouds, to obtain continuous records from 1979 to 1997 (ref. 13). The results show mean trends of 1.1 +/- 0.8C per decade and 0.45 +/- 1.3C per decade at Siple and Byrd, respectively (Ref. 13). Our reconstruction yields 0.29 +/- 0.26 C per decade and 0.36 +/- 0.37C per decade over the same interval. In our full 50-year reconstruction, the trends are significant, although smaller, at both Byrd (0.23 +/- 0.09 C per decade) and Siple (0.18 +/- 0.06 C per decade).

suggests that the AWS reconstruction would be the more appropriate one to use. Spelling this out in the SI couldn’t hurt.

Fred (#3), don’t hold your breath waiting for any “corrections”. It ain’t gonna happen. My suspicion (just a guess) is that the authors didn’t bother to look more closely at the Shuman result than just quoting the numbers. They probably weren’t even aware of the details of how Shuman arrived at them. If you are going to go to the length of actually using them (instead of your own) in a graphic, I would think that it might be imperative to ensure proper comparability.

Re: romanm (#4),
In regards to Fred’s comment also, is there an official rebuttal process, ie is everything conclusive enough and far enough along to have a published Roman, Ryan, Jeff, Jeff (2009) (sorry if I left anyone out)? Or do we still need the original data to be able to make definitive statements about Steig et al(2009)?

Darn!!! I had a great lengthy reply written out for you, but my computer ate it!

I may come back to this topic later, but for the moment, the short answer is no, I don’t think we have anything substantial of the type that would be worth pursuing with the editors of Nature at this time.

The bottom line was that I am of the opinion that we have discovered a lot of questionable things:

– using wrong data

– doing the wrong thing in a throwaway comment

– possible misdirection of the press and public by exaggerating the “significant” nature of their rather numerically unimpressive results

– using the “Antarctic average” rather than focusing on the different nature of the behaviour of the climate in East and West

– using a time period that starts with lower temperatures when we can see how changes of one or two years in the range of a period can reverse supposed trends

– using linear trends when the temperatures don’t seem to respond that way for many of the stations

– PC results (e.g. PC3 of the satellite recon) that are spurious-looking

– other theoretical statistical issues – were accommodations made for autocorrelation?

None of these things will make a dent with the Nature editors. What we need is the intermediate satellite data results and information that was crucial but not mentioned (in particular what percent of the variability did their 3 PCs account for – something that should have been in the SI but wasn’t). I think that there are still some skeletons in that data that are worth hunting down.

(Please. No piling-on on the disclosure issue or the snip-scissors will come out!)

Re: romanm (#8),
Thanks for the reply!
Is the issue of using RegEM for infilling with non-gaussian and non-spatially random missing data encompassed in your list or was that resolved as a non-issue? There’s been so much detailed breakdown it’s been hard to keep up :)
Also if Nature was not apt to publish, assuming the data is made available and the problems persist, would there be other publications that might consider taking this analysis?

Well I’ll be waiting for the next edition of Nature to publish a full retraction of Steig and the paint by numbers front cover art and to publish your article explaining how apples and oranges are, in fact, different.

I hate to be a pain, and I know it’s unreasonable, but rather than referring to another thread to get some data/functions et al, could you please include the appropriate R, commented out if necessary, so it’s turnkey? I also note one of Steve’s recent posts includes a link to a Steig file on his D: drive which didn’t match anything in the Steig directory by the time that I was going to run it on a 640 cpu, 4Tb memory machine which has current R installed, and people were talking about factorising it. My plan was to do what Steve asked and send him the W response, and then give him an account to do it himself on real machines. I know Steve tries to make things turnkey, and I know you end up commenting things out to remove redundant downloads but..

I appreciate the request, but it becomes a matter of inflating the size of the post if every time you add a new procedure you have to go back to first principles. I try to be complete so I did make a reference to two other places where the initial work was done. The burden was shifted to the reader to catch up. I always run the script from scratch to try to guarantee that if you follow my instructions, the script will run. Steve is a (much) more prolific poster so it is not surprising that he may have personal references in his scripts. I will make an extra effort to be more complete.

Re: Scott Brim (#10),
Hey you’re speaking to the converted. I’ve usually done one or the other of both these things. This ***one*** time I didn’t do either, and my laptop decided to drop the wireless to our home router. When I tried to go back, it didn’t. I even searched the files in the IE temporary directory for one or two keywords in case the text was stored before sent – but to no avail. Rats!

The least squares solution of the first equation produces betas which are equal to the monthly means. In effect what you are doing is turning the values into anomalies by subtracting the monthly means from each value.

The name that I usually use for 0-1 dummy variables is indicator variables.
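A minimal sketch of that equivalence, using made-up numbers rather than the Antarctic series: regressing on twelve 0-1 indicator variables gives coefficients equal to the monthly means, so the residuals are simply each value minus the mean for its calendar month. The function name is hypothetical, and the series is assumed to start in January and to cover at least one full year.

```python
def monthly_anomalies(values):
    """Return (anomalies, monthly_means) for a monthly series starting in January.

    Subtracting each calendar month's mean is the same as taking residuals
    from a least squares fit on twelve 0-1 indicator variables.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for i, v in enumerate(values):
        sums[i % 12] += v
        counts[i % 12] += 1
    means = [s / c for s, c in zip(sums, counts)]
    anomalies = [v - means[i % 12] for i, v in enumerate(values)]
    return anomalies, means

# Three years of invented temperatures: a seasonal cycle plus a small drift.
cycle = [-15, -20, -28, -33, -35, -36, -37, -36, -34, -29, -22, -16]
series = [cycle[m] + 0.1 * year for year in range(3) for m in range(12)]
anoms, means = monthly_anomalies(series)
```

By construction the anomalies for each calendar month sum to zero, which is what removes the seasonal cycle before the trend regression.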

Re: RomanM (#18), normally, I cringe when I see ‘anomalies’ because they are usually calculated as deviations from the mean of a limited time period. So, apparently, my brain did not comprehend what it was seeing in your code.

I also dislike the use of the term “anomaly” because of the overtones which go beyond the meaning as it is intended here. One of the definitions of the word is “peculiar, irregular, abnormal”. When applied to temperature data (similarly to the case of “significant trends”), the ordinary listener adds their own emotional interpretation.

A much better descriptor might be to refer to them as “deviations”, or as “centered” or “standardized” values. However, I used the term because it is common practice in temperature data discussions.

In the case that you are referring to (deviations from the mean of a limited time period), I would think that the adjective “indexed” would be more appropriate.

RomanM: I had a great lengthy reply written out for you, but my computer ate it!

If you have a lengthy post, write it in Word first (or else Notepad) and then copy/paste it into the text box. Or else copy it to the clipboard from the text entry box just before hitting Submit Comment, in case the upload process fails for some reason.

It would appear that a comprehensive writeup of CA’s analysis efforts to date is in order, one which also documents what material is currently missing that would allow the analytical reconstruction of Steig et al (2009) to move forward to final completion.

RomanM: When I tried to go back, it didn’t. I even searched the files in the IE temporary directory for one or two keywords in case the text was stored before sent – but to no avail.

Most blog software holds the text input form only in memory, and only within the scope of the process which is driving that instance of the form. Once the process terminates, the form goes to the bit bucket, taking whatever text you entered with it. (Easy come, easy go.)

Roman, I hope this isn’t too far off topic. I think we’re getting closer to reasonably useful processing of the satellite data. It has several differences from what was originally plotted and Jeff C has noticed some possible odd and unmentioned processing steps. It’s not ready yet but it’s coming along quickly.

I made a movie of the satellite data and found the edges of the Antarctic are apparently contaminated with sea surface data.

Keep up the good work Jeff. Independent analysis of the satellite data is useful. If your results differ from those presented in the paper in any substantial way, this could possibly become an issue to be raised with Nature putting the onus on Steig to justify why their results differ.

I really can’t believe how ludicrous this situation is. You shouldn’t have to try and guess how someone else carried out an experiment. Reproducibility, far more than peer review, is absolutely vital to ensuring the scientific fidelity of any given answer. Nature is extremely poor in presenting such methods, but they really should ensure that such information is available on request, and the paper withdrawn if it isn’t, or the exercise is pointless :( .

It’s sort of amusing that Nature magazine should have to be notified of this work; surely the Editor has someone reading this blog. How much more ridicule before that stiff upper lip starts to tremble?
==========================================================

[snip][snap][snop]… I think you guys are having too much fun here. If Steve decides he even wants to come back from Thailand, will he recognize this blog when/if he does? Will you let him have it back?

Has anybody noticed (because I have… for the PAST TWO YEARS, every day)? This picture has hardly changed from one day to the next: http://wxmaps.org/pix/temp8.html. The vast part of South America has been below normal nearly every day for the past two years. Check it yourselves…

It’s much noisier than I thought it would be in its original form. The data is definitely not very similar to the reconstruction. I think the noise level could be the reason behind the PC breakdown but the data has gaps so it’s not a simple process to run PC analysis.

I don’t know for sure but it seems reasonable to simply truncate the array to drop the missing months, run PCA and put them back later. The sat grid covariance shouldn’t be distorted by the 3 missing months. I’m not even sure if R can handle this calc yet.

Roman, I’m sorry if this is too off topic; just snip it. It seems like you or Ryan (or perhaps some of the others here) are probably my best bets for an answer.

When I got to the ftp site after clicking “FTP site” on Jeff’s link (and then “accept” their conditions), the initial ftp page seemed to keep cycling as it tried to load. I stopped the cycling by clicking the red “stop” x on IE, then hitting “refresh”. I could navigate the site smoothly after that.

The files seem to be monthly with two different times 0200 and 1400. Which (if any of these) ones are you looking at? It looks like a lot of scraping would be needed.

Re: Jeff Id (#38), – regarding running PCA on the http://stratus.ssec.wisc.edu/products/appx/appx.html monthly data, you can simply replace the missing months with the monthly mean, or could we run all 5509 series through RegEM? I have not tried this yet and have no idea if RegEM could handle it, but it might be fun to see if it works.

I’m also experimenting with deleting cell/month combinations that exceed +/- 10 deg C of the monthly mean for each cell (similar to what Steig discusses on page 1 of the SI). I was going to infill them with the mean data, but if RegEM will work I’ll use it.
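As an illustration only, here is a sketch of the mask-then-infill idea for a single cell. It is not the procedure from the Steig SI, and the function name, threshold handling and data are assumptions. Missing months are marked with None, values further than the threshold from that calendar month's mean are dropped, and every gap is then filled with the mean of the remaining values for that month.

```python
def mask_and_infill(values, threshold=10.0):
    """Mask outliers relative to each calendar month's mean, then mean-infill.

    values: monthly series for one cell, January first; None marks a missing month.
    """
    cleaned = list(values)
    # Pass 1: drop values more than `threshold` deg C from the monthly mean.
    for month in range(12):
        idx = [i for i in range(month, len(cleaned), 12) if cleaned[i] is not None]
        if not idx:
            continue
        mean = sum(cleaned[i] for i in idx) / len(idx)
        for i in idx:
            if abs(cleaned[i] - mean) > threshold:
                cleaned[i] = None
    # Pass 2: fill every gap with the mean of the surviving values for that month.
    for month in range(12):
        idx_all = list(range(month, len(cleaned), 12))
        kept = [cleaned[i] for i in idx_all if cleaned[i] is not None]
        if not kept:
            continue
        fill = sum(kept) / len(kept)
        for i in idx_all:
            if cleaned[i] is None:
                cleaned[i] = fill
    return cleaned

# Five invented years: one suspicious January (-45) and one missing January.
januaries = [-20.0, -21.0, -19.0, -45.0, None]
series = []
for jan in januaries:
    series.append(jan)
    series.extend([-30.0] * 11)
cleaned = mask_and_infill(series)
```

Mean infilling shrinks the variance of the filled months, which is one reason running the series through RegEM instead might be worth trying.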

Re: Ryan O (#44), Ryan O – how were you going to process the daily AVHRR data? Do you have CASPR running?

Re: romanm (#41), Jeff Id has assembled all of the monthly temperature data into two files, one for 0200 and one for 1400. I have some scripts that combine four cells into one (to make them 50 x 50 km), remove the cells over water, calculate monthly means, and convert to anomalies.

Re: Jeff C. (#48), No, no CASPR. I wasn’t going to try to duplicate Steig from the raw data unless Comiso/Steig provide a pretty detailed explanation of how they did it. I was going to attempt to calibrate the Chan 4/5 data to ground measurements. More like Shuman, less like Steig.

I was going to attempt to calibrate the Chan 4/5 data to ground measurements

Do you have a link for Shuman? I’m not familiar with his methods. I performed a ground calibration on the monthly AVHRR data by forcing the South Pole cells to match the Amundsen-Scott surface data and offsetting the rest of the AVHRR cells accordingly. I got trends that look surprisingly close to the Comiso trends shown in Monaghan and the Steig SI.

The procedure they used (basically, applying a sinusoidal correction to the sat data) won’t work for AVHRR. But, like any calibration, the principle is the same. Find an expression that forces the sat to fit the ground for a sample of the data and then test the expression vs. the rest of the data. A good calibration will have good out-of-sample predictive power; a bad calibration will not.

The question is how much are clouds going to bugger up the calibration . . . and I don’t know the answer to that one.
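The fit-on-a-sample, test-on-the-rest principle can be sketched with a plain straight-line calibration. This is neither Shuman's sinusoidal correction nor anything tuned for AVHRR; the data are invented and the helper names are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def rmse(predicted, observed):
    """Root mean squared difference between two equal-length sequences."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

# Invented satellite and ground temperatures with a small alternating error.
sat = [-60.0 + 1.5 * i for i in range(24)]
ground = [0.8 * s + 2.0 + (0.3 if i % 2 else -0.3) for i, s in enumerate(sat)]

# Calibrate on the first half, then check predictive power on the second half.
slope, intercept = fit_line(sat[:12], ground[:12])
in_sample = rmse([intercept + slope * s for s in sat[:12]], ground[:12])
out_of_sample = rmse([intercept + slope * s for s in sat[12:]], ground[12:])
```

If the out-of-sample error is much larger than the in-sample error, the calibration expression is fitting noise in the calibration period rather than a stable sat-to-ground relationship.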

I’ve just sent off a letter to Steig and Comiso officially requesting the TIR data and protocols under Nature’s availability policy, with a copy to Nature’s Chief Physical Sciences Editor Karl Ziemelis.

Steig has already point blank refused Steve’s request for this data. Comiso said he would comply, but has been sitting on Steve’s request to him for over a month now. Nature authors are required to make data “promptly” available to readers, not when it suits them.

It wouldn’t hurt if Ryan, Roman, and/or the 2 Jeffs, with their superior knowledge of the details, would also request this data, also making it clear that they will take any refusal to Ziemelis for appropriate action. It may be fun to try to rebuild this data from scratch, but it’s Steig and Comiso’s job to just tell us how they did it and what the intermediate matrices are.

Dear Drs. Steig and Comiso,
As you are undoubtedly aware, Nature has a strict policy that, “An inherent
principle of publication is that others should be able to replicate and
build upon the authors’ published claims. Therefore, a condition of
publication in a Nature journal is that authors are required to make
materials, data, and associated protocols promptly available to
readers without preconditions.”
( http://www.nature.com/authors/editorial_policies/availability.html ,
original emphasis).
Pursuant to this policy, I hereby request that you make publicly
available the data and protocols that were used to construct the AVHRR
Antarctic temperature reconstruction file ant_recon.txt that underlies the
principal conclusions of your letter, “Warming of the Antarctic
ice-sheet surface since the 1957 International Geophysical Year,”
which appeared in Nature for 22 Jan. 2009.
It appears that detailed raw AVHRR data from NSIDC
(http://www.nsidc.org/data/avhrr/) was first edited down to a 300 X 5509
matrix of monthly averages representing 50km square areas for 1982-2006.
This was then processed to remove observations believed to be
contaminated by cloud cover. The resulting matrix was then either directly entered
into Tapio Schneider’s RegEM program along with surface station data,
or else was first reduced to rank 3 (the same rank as the output matrix)
before being entered.
The data and protocols I am requesting be made publicly available are therefore
1. The detailed raw AVHRR data for 1982-2006 that was employed,
or else the identity of the specific publicly available files that contain this data.
According to http://www.nsidc.org/data/nsidc-0094.html, the 25km
EASE data is at present publicly available from NSIDC only through 2000,
while the 5 km EASE data only runs through June 2005, so you evidently
had access to data that has not yet been released by them.
2. The protocol by which this detailed data was reduced to monthly
50 km averages. This could either be in the form of a detailed description or
commented computer code. In particular, how were coastal grid square
averages computed?
3. The reduced 300X5590 matrix of pre-cloud-masked data.
4. Any details of the cloud-masking protocol that go beyond the discussion
on p. 1 of the online Supplementary Information.
5. The 300X5590 matrix of cloud-masked data.
6. The protocol by which this matrix, with its many missing observations,
was reduced to rank 3 (if it was) before being entered into RegEM.
The 600X5509 output file ant_recon.txt already online at Dr. Steig’s webpage http://faculty.washington.edu/steig/nature09data/ is very helpful and
informative, as are his link to Tapio Schneider’s RegEM code and the
online SI. However, the additional information specified above is
necessary in order for other researchers to fully replicate the
findings of this very important letter.
I am writing to Dr. Steig as lead author, but also directly to Dr.
Comiso, since as I understand the division of labor on this paper,
the preparation of the AVHRR data was primarily his responsibility.
If it would expedite matters, perhaps Dr. Comiso could post the
requested information at his own institution, eg with a link at http://earthobservatory.nasa.gov/IOTD/view.php?id=36736 .
If items 1 and 2 above prove to be time-consuming, it would
be very helpful if just the two matrices in items 3 and 5 could be
made available immediately.
Thank you very much in advance for your help in this matter. As
you are undoubtedly aware, Nature’s availability policy goes on to provide that,
“After publication, readers who encounter refusal by the authors to comply
with these policies should contact the chief editor of the journal (or the chief biology/
chief physical sciences editors in the case of Nature). In cases where editors
are unable to resolve a complaint, the journal may refer the matter to the authors’
funding institution and/or publish a formal statement of correction, attached online
to the publication, stating that readers have been unable to obtain necessary
materials to replicate the findings.”
Sincerely yours,
J. Huston McCulloch

I note that some of the matrixes you reference in 1-5 above show 300 (or 600) x 5509 while others are x 5590. Is this a typo or not? And if so, is it just in your typing to this blog or also to Nature? If the latter, you might want to issue a correction.

Would it be better for us to all go after now or add to any subsequent complaint should that become necessary?

You can’t complain that you’ve been refused the data, unless you have first requested it yourself. Likewise, I can’t complain that Steve has been refused — only Steve can do that. I’m hoping he does file a complaint in the near future, and I think that maybe a few others’ rattling of the cage may improve his odds of results.

So I would urge at least a couple of you who are most familiar with the details to submit similar requests. For the sake of brevity, you might want to just second my request, but should add any details that I overlooked. Then, after you are refused or ignored after a reminder sent a week later, you can send in your own complaint.

Steve got blown off by Nature on MBH98, but times have changed, editors have turned over, Mann is not the lead author, and Comiso (if not Steig) seems inclined to cooperate, so maybe we’ll get lucky this time around.

I note that some of the matrixes you reference in 1-5 above show 300 (or 600) x 5509 while others are x 5590. Is this a typo or not? And if so, is it just in your typing to this blog or also to Nature? If the latter, you might want to issue a correction.

My bad, but I think they will figure it out. I’ll correct this in my complaint if necessary. At first I had them all 600X5509, but then at the last minute fortunately remembered that the AVHRR data itself would only be 300X5509.

Although the trends in both cases are basically the same, the error bounds differ substantially. The mathematical reason for this is that although the annual averages will be less variable than the monthly values, there are 12 times as many monthly data points. However, there may also have to be considerations made for autocorrelation, the strength of which may differ for the two types of series. Because of this and possible non-linearities in the relationship, no data analyst would automatically jump to the conclusion that the calculation giving the lower error bound for the trend is somehow “more accurate”.

If there is no serial correlation, there should be no substantial difference between standard errors from monthly or annually averaged data. There are 12 times as many monthly data points, but then the variance of the annual averages will be only about 1/12 that of the monthly values, so it should be a wash.
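A quick numerical check of the "wash" argument, as a sketch under the stated assumption of uncorrelated errors with known standard deviation. The 19-year span matches 1979 to 1997; the monthly standard deviation of 1.0 is arbitrary and the function name is hypothetical.

```python
import math

def slope_se(times, sigma):
    """Standard error of an OLS slope when errors are white noise with s.d. sigma."""
    tbar = sum(times) / len(times)
    sxx = sum((t - tbar) ** 2 for t in times)
    return sigma / math.sqrt(sxx)

years = 19
sigma_monthly = 1.0  # arbitrary; it cancels in the ratio

# Monthly series: 12 points per year, time measured in years.
monthly_times = [m / 12.0 for m in range(12 * years)]
se_monthly = slope_se(monthly_times, sigma_monthly)

# Annual means: variance is 1/12 of the monthly variance, so s.d. is sigma / sqrt(12).
annual_times = [float(y) for y in range(years)]
se_annual = slope_se(annual_times, sigma_monthly / math.sqrt(12))

ratio = se_monthly / se_annual  # close to 1 when there is no serial correlation
```

The ratio comes out within a fraction of a percent of 1, so a monthly error bound that is much tighter than the annual one points to serial correlation (or to the different t critical values) rather than to any real gain in information.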

There might still be a slight difference in confidence bounds that arises because t critical values are being computed with fewer degrees of freedom with the annual data than with the monthly. But this would be only a very small effect unless the number of years is getting really small (as may be the case with raw AWS data and even with Shuman’s ~10-yr AWS recons).

If the monthly serial correlation is truly AR(1) with ρ ≈ .3, the serial correlation should almost entirely wash out at the annual frequency since .3^12 = 5.e-7. However, Ken Fritsch, at Comment #14 of the It’s a Mystery thread, reports,
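For reference, the lag-1 correlation between successive annual means under a stationary monthly AR(1) model can be computed directly from the autocorrelations ρ^k. The sketch below (function name hypothetical) gives a value that is not literally ρ^12, because the last months of one year sit next to the first months of the following year, but it is still very small.

```python
def annual_lag1_correlation(rho, months=12):
    """Lag-1 correlation between successive annual means of a monthly AR(1) series."""
    # Month i of one year and month j of the next are (months + j - i) steps apart.
    cross = sum(rho ** (months + j - i) for i in range(months) for j in range(months))
    # Months i and j within the same year are |i - j| steps apart.
    within = sum(rho ** abs(i - j) for i in range(months) for j in range(months))
    return cross / within

rho = 0.3
twelve_month_correlation = rho ** 12          # about 5e-7, as quoted above
r_annual = annual_lag1_correlation(rho)       # about 0.03
```

Either way, an annual correlation of a few hundredths is an order of magnitude below the monthly 0.3, so annual correlations of half the monthly value or more would not be expected from AR(1) errors.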

For AR1 correlations for the AWS and TIR reconstructions from 1957-2006, I found:

For TIR, the annual correlation is still over half the monthly, and for AWS it is actually greater! This is not at all what would be expected from AR(1) errors.

One possibility is that this is just measurement error in the correlation coefficients. For large samples (eg n = 600 as with the monthly 1957-2006 data), the standard error of the correlation coefficient itself is approximately 1/sqrt(n) = .041 with n = 600, so the monthly correlations are off-the-chart significant.

For n = 50 observations, as with annual 1957-06 data, this is a bad approximation, and should be replaced with the exact Durbin-Watson test for serial correlation. The DW-stat and its exact p-values can be computed with MATLAB function DWTEST, and also with an R function of the same name.

Using MATLAB, I get comparable correlation coefficients, and DW = 1.42 (2-tailed p = .015) for annual AWS, and DW = 1.60 (2-tailed p = .081) for the annual TIR trend line. The annual serial correlation is therefore still highly significant for AWS, indicating that AR(1) may not be a perfect model for it after all. The TIR annual serial correlation is significant at the 10% level on a 2-tailed test, and even at the 5% level on the 1-tailed test originally favored by D&W (1-tailed p = .040). So perhaps the AR(1) model isn’t that great for TIR either. But still, adjusting for it is a lot better than no adjustment at all. In any event, it doesn’t hurt to try both monthly and annual and see if the answer is robust.
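For readers without MATLAB or R to hand, the statistic itself is simple to compute from the regression residuals. This sketch gives only the DW value and the rough lag-1 correlation it implies through DW ≈ 2(1 − r); it does not compute the exact p-values, which need the distribution theory built into the packaged tests. The function names are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def implied_lag1(dw):
    """Rough lag-1 autocorrelation implied by DW ~ 2 * (1 - r)."""
    return 1.0 - dw / 2.0

r_aws = implied_lag1(1.42)  # annual AWS value quoted above
r_tir = implied_lag1(1.60)  # annual TIR value quoted above
```

On this approximation the quoted DW values correspond to annual lag-1 correlations of roughly 0.29 for AWS and 0.20 for TIR. A DW near 2 indicates no first-order serial correlation.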

(Ken uses an R pkg named durbin.watson in his Comment #281 on the Deconstructing Steig thread, with only slightly different results. For some reason he gets an error message with this, but still obtains answers.)

(Ken uses an R pkg named durbin.watson in his Comment #281 on the Deconstructing Steig thread, with only slightly different results. For some reason he gets an error message with this, but still obtains answers.)

Hu, the error message that I left in was due to my attempted call to the function durbin.watson without loading the package called “car”. I had to load that package before doing any DW testing in R.

As I recall I left the error message in because I had asked Roman for help when I could not call up the DW function and Roman responded, in college professor style, by giving me sufficient info to figure it out for myself. Now I know (something) about loading packages and libraries into R.

On precision, accuracy and packaged tests for them, I have a hypothetical.

Start with a daily maximum average temperature from a single site from 1860 onwards. In 1860 the thermometer was mercury-in-glass and it hung from a tree. One could draw a graph of temperature with time for a few decades, and select a way to draw error bounds. Nothing fancy, been done many times.

Then the thermometer gets put inside a home-made chamber. There is a jump in the data. At some recent date some authority like GISS or Hadley decided to add 1 degree to the past average, as an “adjustment”.

Where should the error bounds go? Should not the lower one stay where it was and the upper one be shifted up one degree?

Next the thermometer is put in a regular Stevenson screen which has a systematic 0.5 degree difference. Up goes the main curve again, but where should we put the error bounds?

Then the lights start to come on and the UHI effect is introduced into the data and the thermometer is replaced by a thermocouple then a thermistor in a different screen box. Adjustments like TOBS and FILNET are projected back to 1860 for some reason. Same question, what do we do to the error bounds?

There seems to be a tendency to compute statistical error bounds about the new average line each time a new average is adjusted. It seems to me that the correct error bounds should enclose all of the readings, including the early unadjusted ones. Do you agree in principle? Bias versus precision becomes relevant.

The same topic arises with GCMs. As I’ve written before, the error of an ensemble should not be the error of the selected runs handpicked for comparison in a round robin. It should include all runs by all participants, even ones that looked wonky, unless there was a known mathematical error to cause rejection. Otherwise you get inbreeding, which we all know can be damaging to future propagation.

Since my computer language ceased at my failure to comprehend “BBC Basic”, I don’t get any of the calculation stuff, but I am most impressed by the efforts various contributors to this blog have spent, in getting to the bottom of all things “Manomatical”.
I’m especially impressed by the use of “Nature”‘s own availability policy to get the data that Steig has refused to divulge.
I see that Dr Schmidt over at (Un-)Realclimate still persists in his mantra of “Steig is robust”.

Nice letter Hu. I hadn’t contacted any of the coauthors myself but had emailed Steig who replied to communicate with his coworker at the university. After some initially hopeful communications I was told to wait 3 months. Since it sounded hopeful that Comiso was going to release the data soon I was simply waiting. At this point, I have no idea why it’s taking so long to push send on their computer.

RE Jeff Id, #58,
My guess is that the reason Steig switched so quickly from cooperative to stonewalling is that Mann intervened. Comiso sounded forthcoming at first, but since we haven’t heard anything more from him, it looks like Mann got to him as well. So unfortunately, it seems like politely reading them the riot act is the only way to get the data now.

Who was the coworker Steig referred you to? It’s the job of Steig and/or his co-authors to provide the data, not some grad student. Nature requires the data and protocols to be provided “promptly”, not in 3 months. You should definitely write back to Steig that this is inadequate, and that Ziemelis will be hearing from you next week if the required data and protocols aren’t on his website by then.

I would interpret an algorithm to be a mathematical “protocol” to transform one set of numbers into another, whence the authors are required to provide their algorithms as well as their data. This could either be in the form of a written and detailed explanation of what they did, or just well-commented code.

It sounds like durbin.watson would give slightly different results each time it is used due to sampling or resampling error, whereas dwtest should give the same answer each time.

Note that R’s dwtest defaults to the normal approximation for n > 100, while Matlab doesn’t do this until n > 400. Either will let you override this default. They may also differ in whether the default test is 1-tail or 2-tail.

Thanks, Hu, for the details on these methods – they can be used to take my application of them beyond the black box. It also means more work for me, as I now have to determine what kind of differences I’ll see between methods.

Raw Data.
All of the data used in the temperature reconstructions are from publically available data sources. Any queries about the raw data, or access to it, should be directed to the appropriate data centers, listed below.
Weather station data (both occupied and automatic weather stations) from the British Antarctic survey: http://www.antarctica.ac.uk/met/READER
Note that corrections have been made to AWS stations “Harry” and “Racer Rock”. See the READER web site for details.

Oops, I didn’t cut and paste my entire comment correctly… When writing to Nature, it might help to reference Steig’s data link (above in the previous post) specifically and note where it is not complete enough to replicate the results. This might help head off the reply that they have already provided the data requested.

There’s something I must have missed in these threads which is fundamental to understanding the import of some of the stats analyses that form quite a feature of this blog: /exactly/ what do authors mean by statements like those in #4 (which happens to be a quotation from the original paper)? Does anyone know what statements like “mean trends of 1.1 +/- 0.8 C per decade” are intended to tell us? Is the 0.8 one standard error, or is it the SE times the appropriate t multiplier, which is a number rather close to 2 if one is considering the 95th percentile of the two tailed t distribution? If the latter, then some of the quoted stats are apparently believably significant. It would be nice to know for sure, but we must rely on the original authors for an explanation, though in this instance they are not members of our community. I’d prefer always to see confidence intervals at a specified probability level, which of course encompass the sample size in their values.

Does anyone know what statements like “mean trends of 1.1 +/- 0.8 C per decade” are intended to tell us? Is the 0.8 one standard error, or is it the SE times the appropriate t multiplier, which is a number rather close to 2 if one is considering the 95th percentile of the two tailed t distribution?

I can confirm what Hu says. In the It’s a Mystery thread initial post, for this exact reason, I calculated the 95% confidence intervals for the trends using the R confint function (which would use the t distribution and the appropriate degrees of freedom for the standard error of the trends) and divided the length of the interval by 2. When rounded, the results matched the Shuman “error” terms exactly indicating that they were indeed 95% error bounds.
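For anyone who wants to see what "half the length of the 95% interval" amounts to, here is a minimal sketch in Python rather than R. It assumes a plain OLS fit of an annual series against time with no adjustment for serial correlation; the function names (`ols_trend`, `t_quantile`, `half_width_95`) are made up for illustration and are not taken from R's `confint`. The t quantile is found numerically so that nothing beyond the standard library is needed.

```python
import math

def ols_trend(y):
    """Slope and its OLS standard error for y regressed on time 0, 1, ..., n-1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((x - xbar) ** 2 for x in range(n))
    slope = sum((x - xbar) * (v - ybar) for x, v in enumerate(y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((v - intercept - slope * x) ** 2 for x, v in enumerate(y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 df
    return slope, se

def t_cdf(t, df, steps=2000):
    """Student t CDF for t >= 0, by Simpson's rule on the density."""
    logc = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(logc) / math.sqrt(df * math.pi)
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Quantile of the t distribution for p > 0.5, by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def half_width_95(y):
    """Half the length of the 95% confidence interval for the trend."""
    slope, se = ols_trend(y)
    return t_quantile(0.975, len(y) - 2) * se
```

With 50 annual values the multiplier is the 0.975 quantile of t on 48 degrees of freedom, a little over 2, which is why a 95% bound looks like roughly twice the standard error. Multiply the slope and half-width by 10 to quote them per decade.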

Roman, if you or anyone serious enough wants the NSIDC AVHRR data, drop a comment on my blog with your email. I can send it in a gridded timeseries prior to 50km re-gridding or after. Since I don’t have a nice ftp site, the 80 MB takes about 20 minutes to upload to a free file share site so I don’t want to do it too many times.

RE Robinedwards, #65,
Shuman’s Table 3 indicates that the +/-0.8°C per decade represents a 95% confidence interval, i.e. about 2 times the standard error. They admit that of the 4 AWS they study, only Siple appears to show a significant trend either way.

However, I am not certain whether or not they made any adjustment for serial correlation.

Can you match these with R’s dwtest? If you rerun R’s durbin.watson, do you get the same results each time, or just close?

For several trials with the annual AWS reconstruction in R using the durbin.watson (car) test, I obtain 1.4182 for the DW stat every time, but the p changes with each trial: 0.020 (T1), 0.022 (T2), 0.014 (T3) and 0.024 (T4).

I need to try the other DWtest to which you linked me. I believe we think/know that that test is the same as the one in Metlab.

Using the dwtest(lmtest) in R on the AWS annual reconstruction 1957-2006, I obtain a DW stat = 1.4182 and a p= 0.01165 with every trial. I used the default with the “pan” algorithm (sample size less than 100). The iterations were 15.

RE Roman, #68,
Does Shuman consider serial correlation, or just use OLS se’s? It wouldn’t be hard for serial correlation to wipe out even the significance of the Siple trend. Is the SC in these regressions significant using R’s dwtest?

checking in briefly from Chiang Mai. Actually there is quite convenient internet access in all sorts of places here as it’s oriented to backpack travelers, all of whom are documenting their steps on Facebook.

Roman, thanks for these interesting contributions. Made me feel very proud both of the forum provided here and, if I may put it this way, a style of doing things.

Thanks as well to Ryan O, the rwo Jeffs, Hu and others for interesting comments.

the “rwo Jeffs”? A bit too much Tiger Beer maybe? Steve, do you really want to come back from Thailand? And my earlier question still stands…. these guys are having a lot of fun with your site… will they let you have it back? Enjoy your vacation… and you other guys.. keep up the good work.

RE Geoff Sherrington,#73,
That’s a big and important topic that goes way beyond this narrow thread.
However, the secular consistency of adjustments and equipment will be discussed at length in [snip] a forthcoming report by Anthony Watts. Stay tuned to WUWT for details!

The R routine dwtest (in lmtest) sounds like it uses the same non-random PAN algorithm as MATLAB’s dwtest, but it can’t be exactly the same, since the MATLAB version defaults to the N(0, 1/T) approximation to the distribution of r1 above T = 400, while R’s version defaults to this above T = 100. (Either will let you override the default, however.)

The numbers we are getting for the DW for annual averages of Steig’s AWS series are off in the 5th significant digit (1.4182 vs 1.4189). This is a little odd, but of no practical consequence. I just took the file as was, and didn’t round any intermediate results. Computing DW itself (if not its p-value) is straightforward, and should be replicable to machine precision.
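Since the statistic itself (as opposed to its p-value) is the easy part, here is a minimal sketch of it in Python rather than R or MATLAB. It uses the standard textbook definition, the sum of squared successive differences of the residuals divided by their sum of squares, applied to residuals from a straight-line fit against time. The function names are illustrative only, and this does not attempt the Pan algorithm that the p-values depend on.

```python
def trend_residuals(y):
    """Residuals from an OLS straight-line fit of y on time 0, 1, ..., n-1."""
    n = len(y)
    xbar = (n - 1) / 2
    ybar = sum(y) / n
    sxx = sum((x - xbar) ** 2 for x in range(n))
    slope = sum((x - xbar) * (v - ybar) for x, v in enumerate(y)) / sxx
    intercept = ybar - slope * xbar
    return [v - intercept - slope * x for x, v in enumerate(y)]

def durbin_watson(resid):
    """DW = sum of squared successive differences / sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# Usage: DW for the trend regression of an annual series `y`
# dw = durbin_watson(trend_residuals(y))
```

Values near 2 indicate little first-order serial correlation, and values well below 2 indicate positive serial correlation. Two implementations fed identical residuals should agree far beyond the 5th significant digit, so a 1.4182 vs 1.4189 gap points to slightly different inputs rather than to the formula.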

I believe we think/know that that test is the same as the one in Metlab.

Using the dwtest(lmtest) in R on the AWS annual reconstruction 1957-2006, I obtain a DW stat = 1.4182 and a p= 0.01165 with every trial. I used the default with the “pan” algorithm (sample size less than 100). The iterations were 15.

Something must be wrong with one of the routines, since MATLAB gave me p = .01510 for the same problem. What are you getting for TIR?

MATLAB’s routine doesn’t mention iterations. Perhaps this is somehow related to the discrepancy, but I have no idea how this PAN algorithm works.

Last year, I tried printing out the original Durbin and Watson 1950 and 1951 Biometrika articles to learn where their critical values come from, but got blown away. I see now that both the numerator and denominator have exact chi-square distributions (provided D&W’s otherwise peculiar formula is used), but evidently these are not independent, else the ratio would just have an F distribution. The distribution of the ratio may be related to the arcane Wishart distribution, which generalizes chi-square to non-independent contributions. It arises also in Brown’s paper on Calibration which UC has called to our attention.

BTW, Jim Durbin visited OSU ten years ago or so, and gave a very stimulating series of lectures on Kalman filtering, which served as the basis for his subsequent book on that topic, and got me excited about using it for some problems I was working on. Quite a long and productive career!

Using the dwtest(lmtest) in R on the AWS annual reconstruction 1957-2006, I obtain a DW stat = 1.4182 and a p= 0.01165 with every trial. I used the default with the “pan” algorithm (sample size less than 100). Iterations from 10 to 100 did not change the DW stat or the p value.

When I used an alternative normal approximation with dwtest(lmtest) I got a p-value = 0.01216 and the DW stat remained at 1.4182. I will try the annual TIR when I recover from having grandkids over for the week. Thanks for the warning on metlab since a man of my age cannot afford to lose any more teeth – implants are darned expensive. It shall be MATLAB for ever and a day.

RE #85,
Thanks, Ken —
The documentation for R’s dwtest at http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/lmtest/html/dwtest.html gives references to 2 papers by Farebrother describing the algorithm (originally due to Jie-Jian Pan) supposedly used in both the R and MATLAB dwtest routines, despite their different answers. I’ll try encoding it when I get a chance and see what I come up with for this Steig data.

Meanwhile, all these details are digressing pretty far from Roman’s topic of the validity of Shuman’s figures as cited by Steig. Let’s move further discussion over to the Serial Correlation thread.

This thread is especially interesting to me because the recommended technique of using dummy (indicator) variables to help cope with pronounced seasonal factors is something I’ve used routinely for climate data ever since 1992, when I first typed in the 1400 rows of values for Kew/Greenwich temperatures. I now use another but equivalent method which is marginally quicker (in my software) for me to implement. The obvious benefit is that temporal resolution improves dramatically relative to using annual means, and, perhaps more importantly, residuals from subsequent regressions are much lower. The term I use is “monthly differences”, which is neutral as regards one’s feelings or biases about what the values mean.

A further advantage of using monthly differences is that the incidence of missing values is proportionately lower, in that one or two missing months do not mean sacrificing the whole year’s data. For many Arctic and Antarctic data sets this is of considerable practical importance, as many will have noticed.
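A minimal sketch of one way to form such monthly differences, in Python. It assumes that "monthly differences" means each value minus the mean of all available values for the same calendar month, which is what regressing on twelve month dummies produces; that reading, the function name, and the use of `None` for a missing month are assumptions made for illustration.

```python
def monthly_differences(values, start_month=1):
    """Subtract each calendar month's mean from a monthly series.

    values: monthly observations in time order, with None for a missing month.
    start_month: calendar month (1-12) of the first observation.
    Missing months stay None and do not affect any other month's mean.
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for i, v in enumerate(values):
        if v is not None:
            m = (start_month - 1 + i) % 12
            sums[m] += v
            counts[m] += 1
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return [
        None if v is None else v - means[(start_month - 1 + i) % 12]
        for i, v in enumerate(values)
    ]
```

A missing June costs only that one June: the other eleven months of the year still contribute, whereas an annual mean for that year would have to be dropped or patched.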

It will be pretty obvious that the main, large scale, conclusions from Annual and Monthly Difference data will be closely similar, but the latter turn out to be (in general) far more interesting, to me anyway.

I’ve posted in other threads, just occasionally, regarding the widespread – indeed almost universal – practice of fitting simple linear models to climate data. The advantage of adopting this type of underlying model (assumption?) is that it is very simple. My strong opinion is that it is over-simple, since virtually every plot of climate time series data makes it abundantly clear that climate does not behave linearly on almost any time scale. Why not use the observations themselves to help in the choice of a model that is much more feasible and more plausible? Well, perhaps no-one ever got fired for fitting a linear model! This does not mean that it is in any sense an optimum model.

The technique I use, and which I strongly recommend some people here to try, is first to form the monthly differences, by whatever method comes most naturally. These, by definition, have a mean zero. Now, form their cumulative sum by simply summing the MDs successively. Any missing value yields a missing value for the cusum, and is simply ignored in subsequent processing.

The resulting data set will be grossly autocorrelated, but this, for present purposes, is immaterial. We are not interested in the intimate statistics of the cusum but merely its general form.
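The cusum step can be sketched in a few lines of Python. This assumes the input is already a series of monthly differences with `None` marking a missing month, and it follows the rule stated above: a missing difference gives a missing cusum entry and the running total simply carries on from the last available value. The function name is illustrative.

```python
def cusum(diffs):
    """Running sum of monthly differences, with None passed through as None."""
    total = 0.0
    out = []
    for d in diffs:
        if d is None:
            out.append(None)   # missing in, missing out
        else:
            total += d
            out.append(total)
    return out

# A series that sits 1 unit below its mean, then steps to 1 unit above it:
step = [-1.0] * 5 + [1.0] * 5
print(cusum(step))
# [-1.0, -2.0, -3.0, -4.0, -5.0, -4.0, -3.0, -2.0, -1.0, 0.0]
```

The toy example shows why a step change in level appears as a kink: the cusum falls in a straight line while the series is below its overall mean and rises in a straight line once it is above, so the change point is the vertex.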

Now plot the cumulative sum on the time axis. In very many cases the plot will be striking. What you may find is that the cusum plots as what appears to be a collection of roughly straight segments of varying duration which terminate in an abrupt change of slope which heralds another roughly straight segment. The pattern may be repeated several times on different scales. However, in many data sets there is a “grand scale” linear section that may endure for even a hundred years, indicating a very stable long term temperature regime, with several brief excursions in both directions.

To see these things for yourself I would advise initially looking at data for the North West Atlantic (Greenland/Iceland) for which there are several individual sites available. There is also a “consolidated” set published by Vinther et al in 2007, which goes back to the 18th century. If you use this one you will see a most spectacular cusum plot with a very pronounced change point at the end of 1922 (September) of about 2 C. Yes 2 degrees C, occurring in the space of a month or so. Prior to that date a remarkably stable regime existed, with a very slight but significant rise. After the “event” the climate was again stable at a higher temperature for around 60 years. The conventional view of the data is that there was a marked change in temperatures over the first part of the 20th century, although no-one, as far as I am aware, has noticed that the temperature increase was actually a step change.

The cusum has /no/ predictive properties. Indeed, the typical stepwise nature of climate change seems to me to indicate that reliable area or regional forecasting is impossible. Vinther’s data up to Sept 1922 gives absolutely no hint of a change. Just fit trend lines to the data partitioned at that date to assure yourself that the contemporary observer/analyst would have had zero expectation of a change until it occurred.

Many other data sets exhibit similar phenomena, usually masked by the normal climate noise until uncovered by old-fashioned industrial quality control techniques. I could write endlessly on this topic!

If you could email me I’d be able to provide much more on this.

Robin

P.S. My first “Reply” seems to have been lost somewhere. I’ve had to re-write this stuff – rather tedious.