Incredible work. I’ve used Ryan’s work as my initial foray with co-workers to show the poor quality of team statistics and the shaky ground of AGW. It has opened more eyes to reality than anything else I could have said. In a way it actually helps that it made the cover of Nature (i.e., if this is the best that is out there…). Less complicated than explaining the broken hockey stick, and it has dropped a few jaws. I especially liked when Ryan ‘played God’ by inputting a ‘known’ Antarctic climate history and having Steig’s method double the warming trend. A truly valuable and noble effort.

Thanks, guys, and thanks, Steve for the topic.
.
We would all probably be best served, though, to pretend my stuff is the newest Mann paper and try to rip it apart. Better to have that happen now than later. ;)

Re: Ryan O (#4), Ok then.
1) Disambiguate – how much of the change is due to the extra PCs and regpar and how much is due to your change in the satellite data. Those are very different things. What is the 3PC/regpar=3 reconstruction with your ‘corrected’ input?
2) Error bars. It’s all very well to say that the 3PC/regpar=3 reconstruction with the model-frame input gives a different trend, but since it was extremely unlikely to give the exact same answer, it would have to be greater or less. If we knew the error bar on the trend value we could see if it was significantly more. (I would be surprised)
3) You have a big problem with validation given that you ‘corrected’ the initial satellite data to the stations in the first place. You should prove the validity of your method without this step (which frankly is rather ad hoc in any case).
4) Perhaps we could see a clear statement that you consider that the aims of the Steig et al paper were valid, that the tools they used were valid (since you use the same ones), and that tediously repetitive ad homs against Steig and Mann are an unnecessary distraction?
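Point 2) above can be made concrete with a crude overlap check: two independent trend estimates differ "significantly" at roughly 95% only if the gap between them exceeds ~1.96 times the combined standard error. A minimal sketch (the function name and the assumption of independent, normal errors are mine, not from either analysis; the numbers are illustrative, with Steig’s 0.118 +/- 0.065 95% CI converted to a standard error of roughly 0.033):

```python
import math

def trends_differ(t1, se1, t2, se2, z=1.96):
    # Two-sided ~95% check on the difference of two independent
    # trend estimates (deg C/decade) with standard errors se1, se2.
    # Assumes independence and normal errors -- a simplification.
    return abs(t1 - t2) > z * math.sqrt(se1 ** 2 + se2 ** 2)

# Illustrative: 0.118 with se ~0.033 against 0.071 with a similar
# error bar -- the difference does not clear the combined error.
print(trends_differ(0.118, 0.033, 0.071, 0.033))  # -> False
```

The point of the sketch is only that a trend difference of ~0.05 deg C/decade is easily swallowed by error bars of this size, which is the crux of the question.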

Regarding #3, I wonder how much difference in trend this method would produce with Ryan’s model-frame method and a higher number of PCs.

I did a summary of the reconstructions a short while ago. This was, I believe, prior to Ryan’s method of using ‘model frame’ data. In RegEM the satellite data is not replaced no matter how well the long-term trend matches the surface stations. This means we get the trend from this extremely noisy data overlaid on the surface stations. Anyway, the summary of the direct RegEM methods is below.

Re: victor (#5),
.
Excellent questions.
.
1. With satellite corrections (spliced): 0.071 Deg C/Decade
With satellite corrections (model frame): 0.060 Deg C/Decade
Without satellite corrections (spliced): 0.088 Deg C/Decade
Without satellite corrections (model frame): 0.068 Deg C/Decade
.
I consider the model frame solution to be a better representation of surface temperatures. If you use the spliced solution (like Steig did), the post-1982 portion is entirely satellite data. No ground information is included. This is inappropriate, since the satellite data is never explicitly calibrated to the ground data. For the model frame, the change is 0.008 Deg C/Decade; the spliced is about double that at 0.017 Deg C/Decade. The geographic distribution of the trends does not visibly change. If you wanted to verify this yourself, the script is here: http://noconsensus.files.wordpress.com/2009/05/recon.doc Substitute “all.avhrr” for “avhrr.cal” in the do.recon() commands.
.
2. Steig’s overall trend was 0.118 +/- 0.065 Deg C/Decade – which places my trend (barely) within the 95% confidence intervals prior to correction for autocorrelation. However, remember that this “overall” trend mashes the whole continent together. In the three major geographic areas – Peninsula, East Antarctica, and West Antarctica – the story is different. Mine shows statistically significant cooling in East Antarctica from 1967-2006 and very slight warming from 1957-2006; Steig’s shows statistically insignificant warming. Mine shows statistically insignificant warming in West Antarctica (but cooling on the Ross Ice Shelf) from 1957-2006; Steig’s shows statistically significant warming. The trends in the Peninsula are also outside each other’s 95% confidence intervals.
.
3. There is no way to “prove” the correction outside of a) seeing if other users of AVHRR data have noticed the same problems with the same satellites and published it in the literature (and they have); and, b) testing if there is a statistically significant difference in the means for each satellite period (and there is). There is a vast array of papers on correction factors for the AVHRR satellites and a fairly wide spread in correction factors. Most are concerned with single-channel analysis, though, which has no direct applicability to our case. Unless I want to reprocess the entire 1.2 TB AVHRR data set and learn how to cloud mask, this is as good as it will get. A longer explanation can be provided if you desire.
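The test in (b) above, a check for a statistically significant difference in means between satellite periods, can be sketched as a Welch-style two-sample comparison. The data below are synthetic stand-ins with a built-in offset, not the actual AVHRR series:

```python
import math
import random

def welch_t(a, b):
    # Welch's t statistic for a two-sample comparison of means with
    # unequal variances; |t| well above ~2 with samples this large
    # suggests a real offset between the two periods.
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)

# Two synthetic "satellite periods" with a 0.5 deg offset between them:
random.seed(0)
period1 = [random.gauss(0.0, 0.3) for _ in range(120)]
period2 = [random.gauss(0.5, 0.3) for _ in range(120)]
print(abs(welch_t(period1, period2)) > 2)  # offset is detectable
```

With monthly-scale sample sizes, even a modest inter-satellite offset produces a very large t statistic, which is why the per-satellite difference-in-means test is so decisive.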
.
4. I’ll break this one down into parts:
.
a) I feel that the principles behind doing this kind of reconstruction are valid.
.
b) While I used similar tools, they are not exactly the same (the RegEM algorithm is modified) and the methodology is significantly different in at least one aspect. I do not impute the ground stations and PCs together in a single step. The principle of the reconstruction (as stated by Steig) is to use the satellite data to provide covariance information to predict temperatures at locations other than ground station locations. Including the PCs in the ground station imputation step results in the satellite data determining ground temperatures at ground station locations since the PCs are allowed to affect the imputation. Separating them does not change the overall trend, but it does change how strong the correlations/anticorrelations are between locations and prevents RegEM from producing artifacted solutions at regpars ~ 7 and higher.
.
c) As the thread at WUWT can attest, I do not believe that the accusations of fraud or academic misconduct have a place in a discussion of the technical details of the paper. As I stated several times at tAV, I have interacted with Steig several times via email and have found him to be professional and helpful. I personally have had no bad experiences with Steig.
.
And that is where I will leave that. ;)

Re: Ryan O (#9), With respect to point 2) you need to compare the trend and uncertainty of the 3PC/regpar=3 model-frame reconstruction with your ‘true’ model-frame trend. If the uncertainty is comparable to the others quoted, the 0.102+/-0.06(?) is not significantly different from the 0.07 that is the target answer. You can’t use a match that close to undermine the same error calculation in the real world.

On 3), that answer is unlikely to satisfy reviewers since there is still the circularity of ‘validating’ against the data you corrected the satellites with. You need to withhold the truly independent data for any useful validation exercise.

Re: Mark T (#11), The WUWT thread is full of them: “Liars”, “Data-diddling doo-doo brains”, “enviro-mentalists”, “eco-Taliban”, ‘Climate crazies’, ‘Apocalyptic conartists’, on CA “Mannian” this that and the other, juvenile references to “the Team”, “total garbage”, “It’s hard to avoid the suspicion that they considered other parameter combinations and did not consider combinations that yielded lower trends”, and on and on…

You cited inflammatory terms used at another blog, that I do not administer and which has different policies than Climate Audit. I ask people not to use terms that impute motives to people or inflammatory terms of the type that you objected to at WUWT and by and large people comply with this policy. I do not pre-screen comments; sometimes this policy is not heeded and if something has slipped through, I ask that people draw it to my attention so I can delete the offending term or post.

The term “ad hominem” is described at Wikipedia as follows:

An ad hominem argument, also known as argumentum ad hominem (Latin: “argument to the man”, “argument against the man”) consists of replying to an argument or factual claim by attacking or appealing to a characteristic or belief of the source making the argument or claim, rather than by addressing the substance of the argument or producing evidence against the claim. The process of proving or disproving the claim is thereby subverted, and the argumentum ad hominem works to change the subject.

I do not see how the following comment can be construed as “ad hominem” within that definition:

“It’s hard to avoid the suspicion that they considered other parameter combinations and did not consider combinations that yielded lower trends”

The comment does not allege that Steig or Mann is linked to Fenton Communications or allege that Mann has a track record of dubious statistical methodologies. It’s a criticism, but it is not an “ad hominem” criticism. Look, only one of two things is possible here: that they neglected to examine any other parameter combinations; or they considered other combinations (which necessarily yielded lower trends). You tell us what you think.

The term “Hockey Team” originated at Real Climate, not here. All I’ve done is shorten it somewhat. Climate scientists regularly use the term “community” to describe themselves. Would you be happier if I started using the term Community?

We are talking here about algorithms and methods developed and applied by Mann, used in an article of which Mann is a coauthor. “Mannian” is a bit more colloquial than “Mann et al”; if you regard the term “Mann et al” as pejorative, I would perhaps understand your perspective, but I get the impression that you don’t.

By and large, I avoid using phrases like “total garbage”, but I concede that they do slip in from time to time. But inelegant as the phrase was, your characterization of its use as an “ad hominem” is totally false, as the phrase was used to describe various assertions in Steig et al. As noted above, “ad hominems” are used to change the topic, while the criticisms levelled in the paragraph are on point and technical and pertain to the substance of the article.

Virtually all of the above is total garbage. We’ve seen in earlier posts that the first three eigenvector patterns can be explained convincingly as Chladni patterns. This sort of problem has long been known in the climate literature, dating back at least to Buell in the 1970s – see the posts on Castles in the Clouds. “Statistical separability” in this context can be shown (through a reference in Schneider et al 2004, by two coauthors) to be the separability of eigenvalues discussed in North et al (1982). Chladni patterns frequently occur in pairs and may well be hard to separate – however, that doesn’t mean that the pair can be ignored. The more salient question is whether Mannian principal component methods are a useful statistical method when the target field is spatially autocorrelated – an interesting and obvious question that was clearly not on the horizon of the Nature reviewers.
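North et al (1982) give a rule of thumb for this: each sample eigenvalue carries a sampling error of roughly λ√(2/N), and adjacent eigenvalues whose error bars overlap form an effectively degenerate, non-separable pair. A sketch of that rule (the spectrum below is hypothetical, not the one from Steig’s AVHRR data):

```python
import math

def separable(eigvals, n_samples):
    # North et al (1982) rule of thumb: sampling error of each
    # eigenvalue is ~ lambda * sqrt(2/N). Adjacent eigenvalues whose
    # error bars overlap are effectively degenerate (non-separable).
    err = [lam * math.sqrt(2.0 / n_samples) for lam in eigvals]
    flags = []
    for i in range(len(eigvals) - 1):
        gap = eigvals[i] - eigvals[i + 1]
        flags.append(gap > err[i] + err[i + 1])
    return flags  # True = pair (i, i+1) is separable

# Hypothetical spectrum in which eigenvalues 2 and 3 nearly coincide,
# as Chladni-type paired patterns tend to:
print(separable([10.0, 5.0, 4.8, 1.0], n_samples=600))
```

Under this rule a near-degenerate pair cannot be confidently ranked, so retaining one member of the pair while discarding the other is an arbitrary choice.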

Ryan’s objective was to determine whether or not Steig’s hypothesis was falsifiable using variations of Steig’s own methods. Ryan’s work did not serve as a newer substitute for Steig’s hypothesis. Ryan succeeded in achieving his objective; therefore his objective cannot reasonably be described as the “nonsense” you suggested insofar as the falsifiability of Steig’s hypothesis is concerned.

Steig’s hypothesis attempted to demonstrate a method by which missing air temperature observations could be inferred and/or imputed across a continental Antarctic expanse, using statistical methods to infer relationships between surface, satellite and virtual locations. It remains to be demonstrated how it is scientifically possible for such physical relationships to exist in the Antarctic in accordance with the principles of meteorological and climatological science. While an infinite number of relationships between the climates of observational locations can be inferred in the non-physical domain of a numerical analysis used in a statistical hypothesis, few of those numerically calculated potential relationships are known to have an actual physical manifestation in the real world we inhabit.

If
Station A climate = lower latitude coastal climate
And
Station B climate = higher latitude plateau climate
And
Station C climate = intermediate latitude plateau climate
And
Station D climate = higher latitude plateau climate
And
Station A climate ≠ Station B climate
And
Station B climate ≠ Station C climate
And
Station C climate ≠ Station A climate
Then
((Station A climate + Station B climate)/2) ≠ Station D Climate
Conclusion
Steig’s obscure statistics also ≠ Station D Climate
Because
Steig’s statistics fail to recognize and take into consideration the physical barriers establishing unequal climate domains and their physical constraints

Re: victor (#15), First of all, not one of your assertions amounts to an ad hominem. They are, at best, insults. I’d suggest you give yourself a lesson in logic and study up on what an ad hominem argument really is. Steve outlined it fairly well, notably without insulting you once. You don’t do your own argument (which is quite clearly a straw man) any justice by not even understanding the basic definitions of the terms you apply. Second of all, you were replying to this thread, in particular, which does not lodge even a single insult, let alone an argumentum ad hominem, nor does Ryan O’s post on which this thread is based.

Re: victor (#15), I am not entirely sure what your objection is. You seem to be concerned with the continent-wide trends, which I freely admit are within each other’s 95% CIs. That is only one measure, however. If you break the continent up into different regions (Peninsula, West, East) and then calculate trends, they are outside of each other’s 95% CIs.
.
I also believe you are misunderstanding the point of the Steig paper. Whether the overall average is 0.12 Deg C/Decade or 0.07 Deg C/Decade is not nearly as important as the geographic location of the positive trends. If the positive trends are limited to the Peninsula, then no one really cares because we already know that. If the highest positive trends are on the Ross Ice Shelf, then a lot of people care, and they care a great deal.
.
In order for the catastrophic sea level rises vis a vis the IPCC to come true, there must be a major loss of continental ice from Antarctica. Larsen and Wilkins are ice cubes. While somewhat symbolically important, they are meaningless insofar as sea level rise is concerned. The Ross Ice Shelf is a wholly different story.
.
Disintegration of the Ross Ice Shelf provides a direct means for deglaciation of the Antarctic interior. If the ice shelf goes away, the speed of advance of the interior glaciers to the sea increases dramatically – and sea level follows. If the highest trends in Antarctica really were located geographically on the Ross Ice Shelf, there might very well be cause for concern. Hence the big deal with the Steig paper. The fact that overall Antarctica had warmed was not the main point. The fact that the geographical distribution of trends was much different than had been previously assumed is the main point.
.
So I think your focus on the overall trend is somewhat misplaced and a bit myopic. In this case, it’s location, location, location.
.
I also feel that you have a misunderstanding of the point of the analysis that the two Jeffs and I have done. The point is not really to present a rock-solid alternative to Steig’s analysis; the point is simply to show that the choices Steig made to arrive at his results are unsupportable. Choosing only 3 PCs simply does not allow enough geographical resolution to say that West Antarctica is warming. Regardless of whether our reconstruction is accurate, Steig’s cannot be. As shown in the previous “Coup de Grace” post, you can feed in data that has a minimum on the Ross and Weddell Ice Shelves and a maximum in the Peninsula, and what pops out is a maximum on Ross. Steig’s method simply does not have the geographic resolution necessary to make his claims.
.
If you wish to believe that my reconstruction is all nonsense – well, to be honest, I feel somewhat the same way. I myself feel that there are nonsense aspects of it and I would not hang my hat on the reconstruction by any means. My reconstruction suffers from many of the same unknowns as Steig’s. While I do not feel as strongly as D. Patterson about the conceptual basis of the analysis, there are fundamental problems with this type of extrapolation that are not addressed in either reconstruction.
.
So in summary, the main points of the analysis are:
.
1. Steig’s analysis cannot properly locate trends geographically.
2. Steig’s analysis likely overstates both the magnitude and statistical significance of the reported trends.
3. There is no indication that the Ross Ice Shelf is in any danger of collapse (it is, in fact, growing).
4. The errors and uncertainties with these types of analyses have not, in the past, received serious enough consideration by the scientists performing them. This type of analysis can lead to wildly different conclusions based on small changes to either algorithm parameters (see Steve McI’s ERSSTv3 thread for another example) or input data. As such, these types of analyses should be accompanied by all of the assumptions that must be met for the analysis to be valid, along with a discussion of how those assumptions may or may not be met.
.
That goes for my reconstruction as well as Steig’s. You are right to be skeptical of mine. However, you seem to think that this uncertainty undermines the point of the analysis – when the actual point of the analysis is the fact that these uncertainties have not been – and likely cannot be – quantified.
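The kind of sensitivity analysis being argued for here can be stated in a few lines: run the same reconstruction over a grid of parameter choices and report the spread of resulting trends rather than a single number. A toy sketch, where the `reconstruct` stand-in is purely illustrative (the real routine is the RegEM-based code, not this lambda):

```python
def sensitivity_sweep(reconstruct, pcs_options, regpar_options):
    # Evaluate a reconstruction over a grid of retained-PC counts and
    # regpar values, returning the resulting trend per combination.
    return {(k, r): reconstruct(k, r)
            for k in pcs_options
            for r in regpar_options}

# Toy stand-in whose "trend" depends on the parameter choices:
trends = sensitivity_sweep(lambda k, r: 0.05 + 0.01 * k - 0.005 * r,
                           pcs_options=[3, 5, 7],
                           regpar_options=[3, 6])
spread = max(trends.values()) - min(trends.values())
print(round(spread, 3))  # parameter choice alone moves the "trend"
```

If the spread across defensible parameter choices is comparable to the reported trend itself, the single published number is not robust, which is exactly the point of the exercise.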

I do not know what is motivating Victor’s comments here in his apparent holding-action defense of a peer-reviewed paper, but the responses they provoke, in particular Ryan’s comments above, provide good reviews of and justification for the sensitivity studies performed to date.

Victor, if you are interested in furthering this discussion, I would recommend that you ask more specific questions that show you are familiar with the details of the analyses.

Stay away from the personality issues and from attempts to construe Ryan’s intentions and results, and the comments of blog participants, into something they are not. Those actions call into question the sincerity of your questions and comments.

Ryan, climate scientists seem to have a lot of trouble understanding the idea that you can work through a method under slightly different assumptions as a sensitivity study without “presenting” the variation as an alternative positive reconstruction. We got this all the time with our analysis of MBH – we didn’t present an MBH-style analysis without bristlecones or with two PCs as an alternative reconstruction – we stated clearly that we did not endorse either the choice of proxies or the method. We were merely illustrating impacts, rather like showing what happens with different regpars or PCs.

Wahl and Ammann spent much of 50 pages arguing that an MBH-style reconstruction without bristlecones or with reduced bristlecone weights didn’t make sense – only they called this a “MM reconstruction” and huffed and puffed endlessly about its iniquities. Of course, we agree that an MBH-style reconstruction without bristlecones didn’t make sense; our dispute was whether a reconstruction that failed without bristlecones could succeed merely by adding them back in. I raised this issue very clearly as a reviewer of Wahl and Ammann – Schneider terminated me as a reviewer and let Wahl and Ammann proceed without dealing with the point – debating Schneider-style, I guess.

At this stage, I don’t think that you’re in any position to offer regpar variations as your own “alternative” reconstruction, and I would strongly advise against use of any wording that indicates in any way that you acquiesce in the Steig methodology. I doubt whether anyone fully understands all the statistical properties of the method right now, including, apparently, Steig or the Nature referees. We know something about its properties under regpar and PC variations in some empirical circumstances, and I think that you are entitled to observe that regpar choices other than 3 trump the Steig reconstruction according to criteria set out in article xyz, but you shouldn’t go beyond that. Even with warning labels attached everywhere, it will be misrepresented, but you still need to always attach warning labels or the Community will latch onto the one instance where you didn’t. You’ll find that the Community will be far more interested in proving why you’re wrong than in understanding why Steig’s is even more wrong.

Another point – Steig’s error bars only cover OLS errors on the trend, a point that Hu discussed a couple of months ago. They are not designed to include structural errors of the regpar type. This matters because the trend of a reconstruction that trumps the Steig reconstruction may not be significantly different from zero (and probably won’t be), whereas the Steig reconstruction purported to be.
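Hu’s autocorrelation point can be quantified with the usual first-order correction: if the trend residuals follow AR(1) with lag-1 autocorrelation r, the effective sample size shrinks to roughly n(1−r)/(1+r), so the OLS standard error widens by √((1+r)/(1−r)). A sketch of that approximation (one common correction, not necessarily the specific calculation Hu used):

```python
import math

def ar1_inflation(r):
    # Factor by which an OLS trend standard error widens when the
    # residuals follow AR(1) with lag-1 autocorrelation r
    # (effective-sample-size correction, valid for 0 <= r < 1).
    return math.sqrt((1 + r) / (1 - r))

# Even modest monthly autocorrelation widens error bars noticeably:
print(round(ar1_inflation(0.3), 2))  # -> 1.36
```

At the autocorrelations typical of monthly temperature anomalies, the naive OLS bars can understate the trend uncertainty substantially, which is why OLS-only error bars flatter any reported significance.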

Re: Steve McIntyre (#22) and Re: Kenneth Fritsch (#21), Unfortunately, it’s almost a catch-22 situation. In order to show how sensitive an analysis is, you must present the alternatives using different inputs. Unfortunately, those who respond then seem to focus on the problems with the alternatives (in spite of the fact that you freely admit that the alternatives are likely to be just as hokey as the original), and, once having “disproved” the alternatives, subsequently declare vindication of the original study.
.
I find it rather odd that the burden of proof shifts like this and few people question it. It is not your burden to show that a no-bristlecone reconstruction is valid. The fact that a no-bristlecone reconstruction has crappy verification statistics makes no statement on the validity of a bristlecone reconstruction. It’s merely a sensitivity test. It is the burden of the original researcher (MBH, in this example) to prove the validity of the original study.
.
Yet the field seems content with setting up the sensitivity test itself as a strawman.
.
I imagine – assuming we are able to publish in the first place – that the Antarctic paper may experience the same problem.

Obviously I agree 100% with your statement here. I’m glad that you understand the point so clearly. It’s really quite bizarre, isn’t it?

It’s very hard for someone from outside the field to appreciate how pervasive this sort of reasoning is.

Because you’re pretty new to this, you have had a chance to read the papers in an interesting order and have a fresh perspective. Your start on the methodological issues came through Antarctica and not through MBH, so you get to read these issues both with some knowledge of the kind of problems involved and without an investment in the prior issues. Somehow the Community has lost its way when it comes to dealing scientifically with the MBH statistical issues.

Re: Steve McIntyre (#28), They’re really identical issues. Funny how things come full-circle. The unfortunate thing is (whatever personal issues various folks have had in the past) they are all bright people – many of them are probably a lot brighter than me. But somehow they have become blind to one of the basic tenets of science: he who makes the claim has the burden of proof.
.
Or, as Carl Sagan said, “Extraordinary claims require extraordinary evidence.”
.
Note that he did not say that they require extraordinary inference.

Re: Ryan O (#20), I referred to the continent wide trend in the model-frame test because that is the only one you originally highlighted, and was described as the ‘coup de grace’. With Hu’s comment about the trend uncertainty once the temporal auto-correlation is factored in being as large as ~0.09 degC/dec, then it is very likely to be the case that the continent-wide trend in the model-frame calculation is easily within uncertainty of the ‘actual’ trend – thus not a ‘falsification’ of the Steig methodology as claimed elsewhere. The East and West Antarctic trends (and their uncertainties) in the model-frame test would be interesting.

Re: Steve McIntyre (#22), I’m going to go out on a limb and suggest that the “Community” doesn’t actually mind when people try different things to get closer to whatever the underlying truth is – since that is what the different groups working on all these issues are trying to do as well. What they probably object to is the feeding frenzy that you encourage (or at least don’t try and prevent) that goes way beyond sensitivity studies on methodology. All Ryan’s sensitivity studies show long-term Antarctic warming, yet Steig’s work is apparently ‘falsified’, and any use of PCA or RegEM is described as Mannian (and by inference in these parts, incompetent or worse). Ryan is to be applauded in at least attempting to quantify what the available data can and cannot show (even if one might quibble with the details). If one avoids dealing with the ultimate consequences of any sensitivity study, it encourages a great deal of mindless extrapolation as seen most clearly on the WUWT threads. While you aren’t in a position to deal with the WUWT commentators, the pattern of behaviour is not exclusive to that site. Your advice to Ryan is I think completely counterproductive if the goal is actually to advance science and knowledge. Any methodology can be criticized on a priori grounds but doing so is completely unproductive. Much more interesting is putting alternatives out there to see whether they can stand up to the attention that gets applied to the mainstream stuff. Of course, if the goal is simply to feel smug and bask in the adulation of people who are only looking for anti-mainstream talking points, then your advice is probably appropriate ;)

Re: Ryan O (#23), I think you have this wrong. The goal in these kinds of studies is to see what can be said robustly given the imperfect data, the different analysis methods available and the uncertain understanding of the system. Steig et al made a reasonable case (with plenty of verification and backing from different methodologies) that a little more could be deduced than had been clear earlier. As far as I can tell, at no point did they claim that their reconstruction was absolute truth or that there weren’t continued uncertainties. Now you are claiming that you can robustly derive even smaller-scale structures in the trends than they claimed – and are therefore potentially advancing the science. That’s great if it stands up, but that will only be the case if different approaches produce the same results and the verifications work out. That is up to you to show, not for Steig et al.

Re: victor (#38), I think you need to re-read the Steig paper. Even in just their abstract they claimed that:

Here we show that significant warming extends well beyond the Antarctic Peninsula to cover most of West Antarctica, an area of warming much larger than previously reported. West Antarctic warming exceeds 0.1 °C per decade over the past 50 years, and is strongest in winter and spring.
…
Instead, regional changes in atmospheric circulation and associated changes in sea surface temperature and sea ice are required to explain the enhanced warming in West Antarctica.

.
They are calling for wholesale changes to the physical description of what is happening in Antarctica based on their results. Far from a discussion about uncertainties, they are taking their reconstruction as fact and recommending that circulation models be updated to reflect that.
.
The article is replete with discussions about how their reconstruction is robust. Any potential errors are generally dismissed as “not significant”. Look at their discussion of “spatial coherence” or the discussion about satellite calibration, for examples.
.
You also have entirely misunderstood the point of our reconstruction. Our goal was not to derive smaller-scale structures than Steig; our goal was to show that Steig could not resolve the structures he claimed to resolve. We prove this by feeding our reconstruction into his method as input. Also, you seem to put too much stock in Steig’s “different methodologies”, as they are all 3 PC solutions. They are essentially variations on a theme and are not fundamentally different from each other.
.
Without intending to be too snarky, I think you need to re-read both Jeff Id’s and my own analyses on the Air Vent, as well as the Steig paper. You do not seem to have a good grasp of either of them. I apologize if that sounds harsh, but based on your questions, it is the truth.

You’re getting kicked around a bit on some of your weak points but the one you make about correcting sat data makes some sense.

Regarding your point number 3, I think you have made a valid argument which needs to be addressed. However, Ryan’s corrections make good sense to me; the satellite AVHRR data Jeff C looked at from the NSIDC is far too noisy to accept short-term trends as real – they simply can’t be discerned from the massive cloud-based fluctuations. The long-term trend of the AVHRR is (without a study) well outside the surface station trend CI. The only remaining value of the satellite data becomes the ability to distribute the trends of individual surface stations appropriately based on area of influence. In Steig et al, the distribution from too few PCs is nothing more than a smudge. Ryan and others ;) have shown that reasonably clearly.

Since all of the above details are well confirmed, the usefulness of the satellite data comes down to this: the correct weighting and distribution of surface station information with minimal distortion of the surface station trend.

I mean, we really don’t want the satellites to distort far more accurate temperature measurements, and in Steig08 that is exactly what happens. IMO, Ryan’s corrections minimize the distortion, bringing the reconstruction far closer to reality, yet satellite trends (and spurious correlation) are still affecting the station trends. But it’s better.

If we want to know temperature trends of the Antarctic, my firm opinion is that it’s hard to do better than simple area weighted plots of surface stations.
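A simple area-weighted station average of the kind referred to here is just a weighted mean, where the hard (and unshown) part is deriving each station’s area of influence, for example from a grid or a Voronoi-style tessellation. The function and numbers below are hypothetical illustrations:

```python
def area_weighted_mean(anomalies, areas):
    # Weighted mean of station temperature anomalies, each weighted by
    # its assigned area of influence (consistent units, e.g. km^2).
    return sum(a * w for a, w in zip(anomalies, areas)) / sum(areas)

# Hypothetical: a warm coastal station with a small footprint barely
# moves the average against a large, slightly cooling interior region.
print(round(area_weighted_mean([0.5, -0.05], [1.0, 9.0]), 3))  # -> 0.005
```

The design point is that area weighting keeps a handful of Peninsula-type stations from dominating a continental average, without any covariance modelling at all.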

Now, if we want improved spatial distribution of trends, and we recognize that the short-term fluctuations in AVHRR dominate the covariance in RegEM, the offsets Ryan adds should make little difference to the distribution of eigenvalues across the Antarctic; however, in RyanEM the SST information dominates the trend, bringing the partially corrected satellite data further into alignment with the surface stations, and we may be able to improve accuracy.

It’s my thought then that the corrections to AVHRR should be more substantial in order to force consistency with surface station data trend while maintaining the short term covariance. How do we convince peers in a journal that this is correct when they just headlined a broken paper?

My 3 year old says it best. My not know.

I tried to do exactly that by completely removing the sat data except for covariance; TCO came by and beat up my effort, completely missing the point.

Re: Jeff Id (#25), Jeff, just to add to that, with the method we used for the latest reconstruction, doing the calibration to the AVHRR data changes the 1957-2006 trend from 0.068 to 0.060. It has a negligible effect on the trend. It also has a negligible effect on the geographical location of the trends – I certainly can’t tell the difference visually between the two.
.
So I think that the methodology in our reconstruction is improved as compared to Steig’s. But as you’ve said many times before at tAV and elsewhere . . . does it really represent the 50 year temperature history of Antarctica? There is simply no way to tell.
.
The (somewhat) frustrating thing is that the community appears blind to the possibility that there is simply not enough data to draw reasonable conclusions.

perhaps we could see a clear statement that you consider that the aims of the Steig et al paper were valid, that the tools they used were valid (since you use the same ones), and that tediously repetitive ad homs against Steig and Mann are an unnecessary distraction?

On your second point, I do not see how Ryan (or anyone else) could possibly be in a position to issue a “clear statement” that the Steig tools are “valid” on the basis of present knowledge of these tools. After considerable effort, we’ve collectively more or less now managed to figure out what Steig actually did. Ryan has commenced the analysis of the sensitivity of this method to PC retention parameter and TTLS truncation parameter – something that, in my opinion, should have been already studied and reported on by authors of a study featured on the cover of Nature.

In my view, opining on the “validity” of the Steig method in total is well beyond the scope of Ryan (or anyone else’s) studies of this method to date. In order to do so, the method in total – and there are many quirky features over and above regpar and PC parameter selection – would need to be connected to known statistical literature. Again, in my opinion, this ought to have been the responsibility of the originating authors.

In my opinion, Ryan’s study of the sensitivity of results to arbitrary choices of regpar and PC parameters in no way constitutes an endorsement of the “validity” of the method and you should not impute this.

I discourage people at this blog from commenting on authors’ objectives and motivations, as this generally leads to pointless speculation. We can each derive personal opinions on their motivations from their handling of press interviews at the time of publication, but at this blog anyway, I encourage people to separate such opinions from their comments on the methodology itself. I see no purpose in Ryan or anyone else commenting on the motivations/aims of the authors.

Ryan’s comments have been free of ad homs against Steig and Mann, “tedious” or otherwise. For that matter, I believe that my own posts have been as well. I’ve criticized their methodology and have observed that, in the absence of any adequate explanation, their selection of parameters could be construed as opportunistic, but I do not see evidence of “tediously repetitive ad homs” against Steig and Mann. On the contrary, at the outset of this discussion, I urged readers to appraise the application of RegEM to Antarctic temperatures on its merits, without placing any adverse interpretation on Mann’s involvement, noting that the prospects for a sensible application of this method were higher in this context than in a bristlecone-based proxy reconstruction.

In addition to previous responses to your post, Steig’s entire premise, using observations of surface air temperatures at coastal weather observation sites (with or without satellite observations) to interpolate and infer missing intra-Antarctic values to whole and fractional degrees, is fundamentally invalid. Such a premise has no more real-world validity than using observations from the Aleutian Islands to interpolate and infer missing surface air temperature values for broad swaths of the Hindu Kush mountain ranges. The Antarctic has many environments whose air temperatures are the results of very different physical processes. Statistics have a role in analyses, but their application to a non-existent physical relationship can only yield a non-existent, fantasy conclusion in any event. In other words, regardless of the statistical methods used and their validity or lack thereof, Steig has invalidly asserted and/or imputed a physical relationship shared between the air temperature environments of the observation sites which does not and cannot exist in the physical world of the Antarctic continent.

Kudos to Ryan O. His demonstration should make him famous. A time-series histogram showing the fraction of stations with missing data employed in both his and Steig’s reconstructions would add a revealing dimension to this splendid benchmark study.
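A sketch of the histogram in question, using a made-up station matrix (rows are months, columns are stations, NaN marks missing values; the real reconstructions use the READER records):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy station matrix: rows = months, cols = stations, NaN = missing
months, stations = 600, 40
data = rng.normal(size=(months, stations))

# Make early decades sparser, roughly as in the Antarctic record:
# each station starts reporting at a random month in the first ~33 years
start = rng.integers(0, 400, size=stations)
for j, s0 in enumerate(start):
    data[:s0, j] = np.nan

# Fraction of stations missing, month by month -- the proposed histogram
frac_missing = np.isnan(data).mean(axis=1)
print(frac_missing[0], frac_missing[-1])
```

The same one-liner over the real station matrix would show how thin the early record is.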

Not wanting to pile in, but Victor, you are either being obtuse or trolling. D. Patterson is entirely correct to state that showing via sensitivity analysis that a particular method has significant problems is unrelated to whether the original method is valid as applied. Or are you trying to claim that if a disputed method gives a disputed result, the only valid criticism mechanism is to disavow the entire method?

And to criticize CA for transgressions or otherwise for comments on WUWT, WTF ? You’re stretching man, you’re stretching. I think what you mean to say is “No, sorry, I can’t find any specific ad-homs on CA contrary to what I alleged”, however you don’t appear to have the intellectual honesty to say so.

Now if we want improved spatial distribution of trends, and we recognize that the short-term fluctuations in AVHRR dominate the covariance in RegEM, the offsets Ryan adds should make little difference to the distribution of eigenvalues across the Antarctic. The reconstructed portion won’t change much from the corrections to the sat data. However, in RyanEM, SST information dominates the trend, bringing the partially corrected satellite data further into alignment with surface stations.

It’s my thought then that the corrections to AVHRR should be MORE substantial rather than less, the purpose being to force consistency with the surface station trend while maintaining the short-term covariance.

We would all probably be best served, though, to pretend my stuff is the newest Mann paper and try to rip it apart. Better to have that happen now than later.

While it’s OK to tabulate uncorrected SE’s (or double SE’s) as a point of reference, the bottom line is whether a value is significant after correction for autocorrelation.

In my post, Steig 2009’s Non-Correction for Serial Correlation, I show that Steig et al did not in fact make any such correction, not even using the simple (and adequate in this case) Santer-Nychka-Quenouille-Bartlett method, despite a suggestion in the SI that they had. I found that the adjustment was not sufficient in itself to overturn their results except for E. Ant., and so not worthy of a letter to Nature. However, in order to be meaningful, your modified trend values should be reported with corrected SEs (or CIs). Have you done this yet with any of your reported results?

In my post, I am able to replicate Steig’s overall trend of .118 °C/decade, but get an adjusted se of .0458, which gives an adjusted CI of +/- .092, not .065 as you give.
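For anyone wanting to try this at home, a bare-bones sketch of the Quenouille/Santer-style correction Hu describes, applied to a synthetic AR(1) series (the persistence parameter is an assumption): the effective sample size is shrunk by (1 − r1)/(1 + r1).

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic monthly anomaly series with AR(1) persistence (phi is assumed)
n, phi = 600, 0.5
x = np.zeros(n)
eps = rng.normal(0.0, 0.1, n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[i]
t = np.arange(n)

# OLS trend and its naive (white-noise) standard error
slope, intercept = np.polyfit(t, x, 1)
resid = x - (slope * t + intercept)
se_naive = np.sqrt(np.sum(resid**2) / (n - 2) / np.sum((t - t.mean())**2))

# Quenouille/Santer-style correction: shrink the effective sample size
# using the lag-1 autocorrelation of the residuals
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
n_eff = n * (1 - r1) / (1 + r1)
se_adj = se_naive * np.sqrt((n - 2) / (n_eff - 2))

print(se_naive, se_adj)  # the adjusted SE is wider whenever r1 > 0
```

With monthly climate data r1 is typically well above zero, so the corrected CI can easily be half again as wide as the naive one, as Hu’s numbers show.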

I’ve been thinking about this as well; I remembered your post before. In the simple reconstruction linked above the trend was 0.04. If the corrected CI from your post translates, there is no statistically significant trend, and really it’s not even close.

Re: Jeff Id (#31), Ditto here. I just haven’t done the calculations yet. The uncorrected values for the 13 PC reconstruction reach significance by a hair. I fully expect that the end result will be that they are not statistically significant, except in the Peninsula. I also expect that even after correction, East Antarctica will show statistically significant cooling.
.
And I agree entirely that these corrections need to be made. ;)

With regard to your first question, you need to appreciate that because the methodology deals with multivariate statistics concepts, it is not easy to provide a practical, working understanding of the procedures without first explaining a great deal of underlying background. I wrote an arm-waving description which you might look at here.

For the second question, there are situations where one has measured a large number of different (but substantially inter-related) variables which one might wish to use in a statistical analysis. Several principal components may very well summarize the information contained in those variables, thereby simplifying subsequent analysis.

For example, several years ago, I was consulted on a study where someone had done chemical analyses of soil samples from somewhat more than a hundred archaeological sites, recording the amounts of almost one hundred different minerals and other substances in those samples. The intent was to look for similarities between those sites. The problem was substantially simplified by extracting between five and ten principal components from the original measurement variables and then doing a cluster analysis on the sites using just the PCs. As well, in some regression problems, PCs can be used to overcome collinearity difficulties caused by highly correlated predictor variables.
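A stand-in for that soil-chemistry case (simulated data, not the actual study) shows how a handful of PCs can absorb nearly all the variance of ~100 inter-related variables, after which the site scores can feed a cluster analysis in five dimensions instead of one hundred:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated stand-in: ~100 measured variables driven by a handful of
# latent factors plus measurement noise
n_sites, n_vars, n_factors = 120, 100, 5
latent = rng.normal(size=(n_sites, n_factors))
loadings = rng.normal(size=(n_factors, n_vars))
data = latent @ loadings + 0.5 * rng.normal(size=(n_sites, n_vars))

# PCA via SVD of the centered data matrix
centered = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
var_explained = s**2 / np.sum(s**2)

# Five PCs carry nearly all the variance; the site scores (u * s) can
# then be clustered in 5 dimensions rather than 100
scores = u[:, :n_factors] * s[:n_factors]
print(var_explained[:n_factors].sum())
```

The clustering step itself is omitted; the dimension reduction is the part that made the original problem tractable.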

Despite the superb job that the RJ2 crew has done to clarify the methodological issue, the epistemological issue of how well Antarctic temperature “trends” are now known remains murky. The reasons are two-fold: 1) spatio-temporal volatility of decadal-scale temperature variations that determine the linear data slopes, and 2) paucity of actual measurement coverage on a continental scale.

The challenging, counter-intuitive feature of measured temperature time-series is that, as distance between stations increases, the decrease in cross-correlation comes primarily from loss of coherence at the lowest frequencies, rather than the highest. Whether this is due to instrumentation drift, differing microclimatic patterns, or data “adjustments” is not known. But the rapid demise of cross-spectral coherence at precisely those frequencies that most strongly affect fitted trends is quite ubiquitous throughout the globe. It is only at sub-decadal time scales that coherence holds up well as station separation increases into hundreds of kilometers and beyond. Between Punta Arenas and Base Orcada (the two stations closest to Antarctica with long enough records for reliable cross-spectrum analysis) the squared coherence stands at 0.80 at the 4.4-yr spectral band, drops to 0.03 at the 6.3-yr band, and never rises to a significant level at the multidecadal bands. The interpolation of trends over many hundreds of kilometers is hazardous, at best.
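The frequency dependence of coherence is easy to demonstrate on synthetic series (these are not the Punta Arenas/Orcada records): give two stations a shared sub-decadal signal but independent slowly-wandering components, and the Welch-averaged squared coherence collapses at the lowest bands.

```python
import numpy as np

def msc(x, y, nperseg):
    """Welch-averaged magnitude-squared coherence (bare-bones sketch)."""
    nseg = len(x) // nperseg
    pxx = pyy = pxy = 0.0
    for k in range(nseg):
        seg = slice(k * nperseg, (k + 1) * nperseg)
        fx = np.fft.rfft(x[seg])
        fy = np.fft.rfft(y[seg])
        pxx = pxx + np.abs(fx) ** 2
        pyy = pyy + np.abs(fy) ** 2
        pxy = pxy + fx * np.conj(fy)
    return np.abs(pxy) ** 2 / (pxx * pyy)

rng = np.random.default_rng(3)
n, nperseg = 1200, 120   # a "century" of monthly data, 10-yr segments

# Shared sub-decadal signal; independent slowly-wandering components
fast = rng.normal(0.0, 1.0, n)
slow_a = np.cumsum(rng.normal(0.0, 0.05, n))
slow_b = np.cumsum(rng.normal(0.0, 0.05, n))
a = fast + slow_a + 0.3 * rng.normal(0.0, 1.0, n)
b = fast + slow_b + 0.3 * rng.normal(0.0, 1.0, n)

C = msc(a - a.mean(), b - b.mean(), nperseg)
# Lowest resolved band (decadal) vs. a sub-decadal band
print(C[1], C[nperseg // 4])
```

scipy.signal.coherence does the same thing with proper windowing; the hand-rolled version just keeps the sketch self-contained.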

At 5.5 million square miles, Antarctica is ~1.5 times the size of the United States. Yet, even supplemented by AWS stations, the total number of useful (though by no means intact) records is only 97, with the great majority from coastal sites. Even the mineral deposits of the continent, which lie static over the ages, could scarcely be estimated from 97 bore holes, no matter how carefully the PC analysis and RegEM were done. The challenge of estimating the time-varying temperature record of the continent carries the burden of an entire additional dimension. And time is precisely the domain over which linear multidecadal trends fitted to measured data at a fixed station vary strongly from decade to decade. The intrinsically oscillatory nature of temperature time series produces oscillatory autocorrelation functions, unlike those of various statistical models. The computed trends should not be confused with secular ones, nor with “measured trends,” as some would have it.

What has been produced in the RJ2 reconstruction is a putative continental trend map that applies only to the precise end-year of the reconstruction. Change the end-year and the map will change appreciably if there’s sufficient underlying realism. The unmistakable methodological superiority of the reconstruction no doubt leads to the temptation to mistake it for reality. In the interest of scientific integrity, that temptation should be resisted. In light of experiments revealing the impact of inadequate data via three-dimensional Fourier transforms, I’m with Jeff Id that the best one could do in the face of inadequate records is a straightforward area-weighted average. And I would add that there’s no onus on not knowing what is unknowable. There is a fine line, however, between sophistication and sophistry.

Probably a pertinent set of graphs. The first is the station density, or the number of different stations reporting by month:
.

.
The second is the record length of each individual station included in our reconstruction:
.

.
I would think that before anyone could present a reconstruction using this or a similar method that a series of sensitivity analyses would need to be done that would include the following:
.
1. A set of reconstructions where various long record length sites are removed.
.
2. A set of reconstructions where the effect of random offsets (to simulate instrument bias) is determined, particularly in the pre-1965 timeframe where the total number of stations is 15 or less. The same physical measurement instrument has not been in continuous use since day 1, so the effect of potential biases when instruments were replaced/moved needs to be assessed. Some research would need to be done to establish a reasonable spread of biases to test.
.
I personally have no intention of doing this on our reconstruction because I don’t think that either Jeff or I plan on claiming our reconstruction as the “actual 50 year temperature history of Antarctica”. Rather, I would think that we would state these as issues that would need to be addressed prior to anyone making claims based on reconstructions of this type.
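Item 2 above is straightforward to set up as a Monte Carlo. The sketch below uses a synthetic station record and an assumed bias spread (sd = 0.3), and simply records how much a single random step offset perturbs the fitted trend:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 600
t = np.arange(n)

# Synthetic station record: an assumed 0.05 deg/decade trend plus noise
base = (0.05 / 120) * t + rng.normal(0.0, 0.8, n)
base_trend = np.polyfit(t, base, 1)[0] * 120

# Monte Carlo: inject one random step (an instrument change) at a random
# month and record the resulting perturbation of the fitted trend.
# The bias spread (sd = 0.3) is an assumption to be refined by research.
deltas = []
for _ in range(500):
    x = base.copy()
    change = rng.integers(n // 4, 3 * n // 4)
    x[change:] += rng.normal(0.0, 0.3)
    deltas.append(np.polyfit(t, x, 1)[0] * 120 - base_trend)
deltas = np.array(deltas)

print(deltas.std())  # trend error induced by offsets alone, deg/decade
```

The real exercise would inject offsets into the actual pre-1965 records and rerun the reconstruction, but even this toy version gives a feel for how large the offset-induced trend spread can be relative to the trends being reported.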

Victor, it is you who has it wrong, not Ryan. Steig et al made no case whatsoever about what can be said robustly. Ryan is not claiming robustness. In fact the opposite. His main point is that Steig et al is not robust.

The challenging, counter-intuitive feature of measured temperature time-series is that, as distance between stations increases, the decrease in cross-correlation comes primarily from loss of coherence at the lowest frequencies,

I don’t think the lowest frequencies are having any effect at all on the correlation. There is very little trend, and the anomalies have high variance, which dominates the correlation. Maybe I’m misunderstanding?

Perhaps you’re dealing with mere snippets of records, wherein the comparatively high-frequency components dominate the sample variance and the computed spatial cross-correlation. What I’m seeing in century-long records is that the highest spectral densities are at low frequencies (multidecadal periods), where the cross-spectral coherence is usually low, even between neighboring stations. Clearly, it’s the low-frequency content that determines the apparent temporal “trend.”

Any methodology can be criticized on a priori grounds but doing so is completely unproductive.

I disagree 100%. One of my underlying objectives in these various threads is to try to understand the statistical properties of these various ad hoc methodologies that seem to arrive from outer space. Unfortunately, the original authors exacerbate the problems with unsatisfactory descriptions of their methodology, which, together with their failure to archive working code, leads to considerable frustration and wasted effort in merely getting to the starting blocks of a statistical analysis – which, in a way, is where we are now. Had Steig placed working code and data online at the outset, as I suggested to him in the most cordial possible terms, much subsequent frustration would have been avoided. Steig chose not to – an unwise decision in my opinion.

any use of PCA or RegEM is described as Mannian (and by inference in these parts, incompetent or worse).

RegEM-TTLS is a methodology that can fairly be described as “Mannian” in that I’m unaware of any other authors anywhere in the statistical universe who have employed this particular methodology. When RegEM-TTLS is used in combination with prior PCA, the combination is particularly idiosyncratic. I am as familiar with the literature in this field as anyone, and I do not know of any exposition providing a thorough description of the statistical properties of the method – which, in my opinion, remain substantially unknown. Ryan’s recent experiments show some peculiar properties. We showed peculiar properties of methods used in MBH, and Smerdon has shown peculiar properties of a prior RegEM variation, not used here. I have argued on many occasions against the use of obscure statistical methodologies for the derivation of controversial applied results. That applies here as well. If you regard the use of obscure methodologies with unknown properties to derive important applied results as being “incompetent or worse”, then you are entitled to your opinion. I prefer to limit my commentary to the methodology rather than the hominem.

Steig’s work is apparently ‘falsified’,

I said at the outset of these threads and have re-iterated it on many occasions: it seems odd to me that Antarctica wouldn’t be warming along with the rest of the world; regardless of statistical methodologies, the instrumental record is narrow and could easily be affected by inhomogeneities of the sort that are objected to elsewhere; and, even if models are unsuccessful in some aspect of Antarctica, this would not signify the “falsification” of models to me as there are peculiar one-off aspects to Antarctica. I also said that RegEM, whatever it is, was more likely to work in Antarctica than on bristlecones.

“Falsification” is not a term that I use. I don’t know why you keep blaming me for comments that others make. Indeed, I regularly discourage “piling on” comments.

Re: Jeff Id (#46), I saw that post today. Interestingly, the only way to decide whether something has been overfit is to have performed a very long, arduous but necessary manual crank of all the data, i.e. not reconstructed, and characterised for each station, and to compare it with the reconstruction. There’s no replacement for thoroughness, no matter how unpalatable.
The above posts have demonstrated there are many ways to skin a cat, but you still need to know that it’s actually a cat. And the discussion keeps coming back to the obvious question: Can you actually produce a decent trend analysis of Antarctica with the data we have? I think that should be the a priori position before people start reading into PCs too much.

The above posts have demonstrated there are many ways to skin a cat, but you still need to know that it’s actually a cat. And the discussion keeps coming back to the obvious question: Can you actually produce a decent trend analysis of Antarctica with the data we have?

Re: Ryan O (#56), Sometimes in science the correct answer is “no, you can’t do that”. As in, “no, this airplane design won’t fly” or “no, the atom is not like plum pudding”. To insist that we can always obtain an answer just because we want one is not scientific.

It appears this is a result of the persistent belief that by embarrassing specific scientists, the entire edifice of ‘global warming’ will fall.

As if validating a study is “an effort to embarrass” instead of proper science. As if those behind the verification were expecting “the entire edifice will fall”, something that they never said nor implied.

What hope is there for actual communication to take place with these people?

And, again, the complaint of

blogs full of ad hominem attacks

Do they never read the comments at RC? CA is the model of behavior they should aspire to.

In case anyone’s curious, here’s my reply. It hasn’t gotten through moderation yet.
.
Eric,
.
As the “Ryan O” to which you refer, I would like to have the opportunity to respond to the above.
.
First, the discussion in North about statistical separability of EOFs is related to mixing of modes. Statistical separability is never stated or implied as a criterion for PC retention except insofar as degenerate multiplets should either all be excluded or all be retained. I quote the final sentences from North (my bold):
.

The problem focused upon in this paper occurs when near multiplets get mixed by sampling error. So long as all of the mixed multiplet members are included, there is no special problem in representing data and the same level of fit. However in choosing the point of truncation, one should take care that it does not fall in the middle of an “effective multiplet” created by the sampling problem, since there is no justification for choosing to keep part of the multiplet and discarding the rest. Other than this, the rule of thumb unfortunately provides no guidance in selecting a truncation point for using a subset of EOF’s to represent a large data set efficiently. Additional assumptions about the nature of the “noise” in the data must be made.

.
The criteria set forth in North do not suggest, in any way, shape or form, that only statistically separable modes should be retained. As statistical separability is not a constraint for either SVD or PCA, mixed modes often occur. Indeed, there is a whole subset of PCA-related analyses (such as projection pursuit and ICA) dedicated to separating mixed modes. PP and ICA by definition would not exist if statistical separability were a criterion for PC retention, as both PP and ICA require the PC selection be made ex ante. Calling statistical separability a “standard approach” to PC retention is unsupportable.
.
Second, as far as verification statistics are concerned, the improvement in both calibration and verification using additional PCs is quite significant. This obviates the concern that the calibration period improvement is due to overfitting. I have provided fully documented, turnkey code if you wish to verify this yourself (or, alternatively, find errors in the code). The code also allows you to run reconstructions without the satellite calibration being performed, to demonstrate that the improvement in verification skill has nothing whatsoever to do with the satellite calibration. The skill is nearly identical and, in either case, significantly exceeds the skill of the 3-PC reconstruction. The purpose of the satellite calibration is something else entirely (something that I will not discuss here).
.
Thirdly, this statement:

Further, the claim made by ‘Ryan O’ that our calculations ‘inflate’ the temperature trends in the data is completely specious. All that has done is take our results, add additional PCs (resulting in a lower trend in this case), and then subtract those PCs (thereby getting the original trends back). In other words, 2 + 1 – 1 = 2.

is misguided. I would encourage you to examine the script more carefully. Your results were not used as input, nor were the extra PCs “subtracted out”. The model frame of the 13-PC reconstruction, which was calculated from the original data (not your results), was used as input to a 3-PC reconstruction to determine whether 3 PCs had sufficient geographical resolution. The result is that they do not. I refer you to Jackson ( http://labs.eeb.utoronto.ca/jackson/pca.pdf ) for a series of examples where similar comparisons using real and simulated data were performed. Contrary to your implication, this type of test is quite common in principal component analysis.
.
Lastly, I take exception to the portrayal of the purpose of this to be, in your words:

It appears this is a result of the persistent belief that by embarrassing specific scientists, the entire edifice of ‘global warming’ will fall. This is remarkably naive, as we have discussed before. The irony here is that our study was largely focused on regional climate change that may well be largely due to natural variability, as we clearly state in the paper. This seems to have escaped much of the blogosphere, however.

.
Nowhere have I stated my purpose – nor have I ever even implied – that my analysis makes any statement on AGW whatsoever. The purpose was to investigate the robustness of this particular result.

Re: Ryan O (#51), Five will get you ten that your post will never appear at RC. Just checked in over there, and the level of polite discourse continues unabated. For example, this description of Ryan O’s work: it’s “supermarket tabloid-quality material.” No doubt contributed by some statistical genius.

RE Jeff Id, #46,
Steig’s discussion over at RC makes no mention of CA but at least does provide a link to the argument of “someone ‘Ryan O'” on Jeff’s site The Air Vent, and makes an argument, based on North et al 1982, for why 3 is an appropriate number of PCs for this problem. At least we’re getting some dialogue here.

Many legitimate CA posters use web names so that their actual names don’t turn up a million hits on a Google search. However, given Ryan’s intense involvement here, with perhaps a future article under his full name, maybe this would be a good time to introduce himself to CA and RC, even if he prefers to continue to post under Ryan O.

I have some thoughts on a potential problem with North’s Preisendorfer-like rule, but will have to take a harder look at North’s paper first.

I hope the dialog will continue. It may actually be helpful rather than adversarial. PCA and EM need some kind of proper validation in climate science; there are several examples of misapplied PCA and, in my opinion, EM. The methods are QC’d ad hoc and are producing a number of errors. It’s complicated, though, and those in climatology are too accepting of covariance as a confirmation of quality.

In Ryan’s reply above, he’s mentioned that the verification stats are almost the same no matter which satellite data is used (corrected or not). Honestly, it’s difficult for me to figure out how a PhD misses the fact that the verification stats in this case come from the large magnitude of the short-term signal and are unrelated to the long-term trend. After you look even briefly at the plots of the signal from satellite and ground, it becomes pretty clear. I’m willing to learn, though, if there is a reasonable explanation that I’m missing.

JeffId–
The peculiar thing about Eric’s comment was that he’d just written a post commenting on your blog post. So the position would appear to be:
1) He comments on your post in his blog post, at RC, a non-peer reviewed blog forum.
2) He doesn’t discuss with you in the comments attached to his blog post.

At the appropriate point, you and RyanO will need to collect things together and publish. In the meantime, you at least have the benefit of some of eric’s objections stated publicly.

Re: lucia (#62), TBH, it’s kind of flattering to think that we generated a whole RC post! I’m not terribly concerned with how it plays out on RC, but it does (and it pains me to say this) validate one of TCO’s favorite refrains: Publish.
.
Besides, we may be all impressed with ourselves for the moment, but the real test is whether we impress anyone else enough to dedicate space to it in a journal.

It’s quite remarkable that PC retention should once again emerge as a battleground issue. Needless to say, Steig does not refer to Wahl and Ammann’s “convergence” test in his overfitting article.

It will be pretty hard for them to have a PC retention test that threads the needle between keeping out the lower-order PCs in Steig and at the same time providing a rationale for carrying the non-temperature-related bristlecones.

The citation from Smith et al 1996 referred to above is very much against Steig’s argument here – particularly when we’ve seen the effect of Steig trying to squish 5509 gridcells into 3 PCs: this manufactures spurious correlations between stations that are far apart.

Comments at Steig’s RC post are already closed. This is an all-time RC record for the shortest time to comment closing (about 12 hours), easily breaking the old record which I believe to have been previously held by Ritson’s post on autocorrelation.

I don’t tend to read other blogs much, despite contributing to RealClimate. And I’m especially uninterested in spending time reading blogs full of ad hominem attacks. But a handful of colleagues apparently do read this stuff, and have encouraged me to take a look at the latest commentaries on our Antarctic temperature story. Since I happen to be teaching about principle component analysis to my graduate students this week, I thought it would be worthwhile to put up a pedagogical post on the subject. (If you don’t know what principle component analysis (PCA) is, take a look at our earlier post, Dummy’s Guide to the Hockey Stick Controversy).

Forgive my rant here, Steve, and snip away, but this paragraph is so disingenuous and the attitude so feigned, I can’t help myself.

First, after what M and M did to MBH98 etc. ad nauseam, Dr. Steig’s claim that he doesn’t much read other blogs that critically review his work and the work of his colleagues is not credible. He just had a major paper published in a glossy science mag. He knew it would be controversial and subject to intense review on this website as well as others. The paper’s claims and methodology have not withstood statistical scrutiny on this website and elsewhere. So what does he do? He attributes his discovery that there was fierce scrutiny of his paper on competing websites to a mere “handful of colleagues” who “apparently do read this stuff.” {guffaw} The implication is that Dr. Steig, in his contrived nonchalance, is apparently a blogger who only reads blogs he agrees with, which, if true, shows a striking lack of scientific curiosity, especially when it relates to his own work. (Would that we could all have such supreme confidence in our work product!) He attributes his disdain for blogs to the ad hominem nature of some of the posts on them, and yet posts on a blog that, although tightly controlled and regularly censorious of opposing points of view, contains as many ad hominem attacks per 100 posts as any other science blog. Nor were Dr. Steig’s comments on RealClimate about his critics and detractors before his trip to Antarctica exactly collegial. He can dish it out with the best of them.

And then, as if that were not enough pretense in one paragraph, he comically relates that he just happened to be teaching his students (surprise!) about PCA (“Principle Component Analysis”) and therefore will take a little time out from his laborious duties to “put up a pedagogical post” on the subject after Ryan O et al have just taught HIM a lesson on the same subject. In so doing, he inexplicably spells the word “Principal” wrong. This was corrected in the text of the post after his error was pointed out to him in post #4, but really, if you claim to be an expert in Principal Component Analysis, why can’t you spell it correctly? Is that how they spell it in the Dummy’s Guide to the Hockey Stick Controversy?

Re: theduke (#66), According to Steig’s words inviting improvements, there is virtue in an initial analysis that supports global warming. However, I and others don’t see it that way. Rather, I don’t see much virtue in half-baked analysis that supports global warming. For starters, it creates a lot of work for other people doing it properly. Second, it should not draw conclusions, esp. conclusions in Nature, when the analysis hasn’t been optimized across the model space.

Here’s a suggestion: If Dr. Steig and others like him have the courage of their convictions, and are genuinely interested in advancing the science, they should bring their findings here to ClimateAudit or some similar website for scrutiny before going out and submitting them to publications that, regardless of how strenuous the peer-review process, may not have the wherewithal to fully ascertain the accuracy of the findings.

This is rather easy to tell. If your answer depends on the number of PCs included, then you haven’t included enough.

Unless of course it’s Steig in the Antarctic, where the rule doesn’t apply.

Is there any reason for the rules being different? Maybe it’s because MBH was a NH reconstruction and Steig is a Southern Hemisphere reconstruction. Other than that, I can’t think of any difference relevant to the PC retention policy.

IMO the main problem with Steig’s “educational” post, and to some extent with the discussion here, is the focus on PC retention as in ordinary PCA. That is not what they are doing in their Antarctic paper. Instead, what is actually done is closely related to Principal Components Regression, where it is well known that low-variance components may be very important. I guess a quote from Jolliffe (1982) is in order.

Hill et al. (1977) give a thorough and useful discussion of strategies for selecting principal components which should have buried forever the idea of selection based on size of variance. Unfortunately this does not seem to have happened, and the idea is perhaps more widespread now than 20 years ago.


Overfitting is not mentioned in the pdf. Is this relevant as well:

A cautionary note on PCR: In practice, zero eigenvalues can be distinguished only by the small magnitudes of the observed eigenvalues. Then, one may be tempted to omit all the principal components with the corresponding eigenvalues below a certain threshold value. But then, there is a possibility that a principal component with a small eigenvalue is a good predictor of the response and its omission may decrease the efficiency of prediction drastically.

Unfortunately, it is comments like these that do not get directly into the discussion. They do, however, have a place in our knowledge base and help us individually judge the evidence and methods in papers like Steig et al. (2009).
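Jolliffe’s point, that a low-variance principal component can be the best predictor in a Principal Components Regression, is easy to see with synthetic data. A minimal sketch (invented variable names and scales, not Steig’s data or code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Three predictors: two high-variance, one low-variance.
x_big1 = rng.normal(scale=10.0, size=n)
x_big2 = rng.normal(scale=8.0, size=n)
x_small = rng.normal(scale=0.5, size=n)
X = np.column_stack([x_big1, x_big2, x_small])

# The response loads only on the low-variance direction (plus a little noise).
y = 3.0 * x_small + rng.normal(scale=0.1, size=n)

# PCA via SVD of the centered predictor matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # PC scores, ordered by decreasing variance

# Correlation of y with each PC: the *last* (smallest-variance) PC wins.
corrs = [abs(np.corrcoef(scores[:, j], y)[0, 1]) for j in range(3)]
print(corrs)
```

Dropping the third PC on variance grounds alone would discard essentially all of the predictive power, which is exactly the trap the quoted passage warns against.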

RE Ryan O, #51, 53,
I’ve read North now and you’re quite right that he expressly states that he is not trying to derive a stopping rule. He’s considering an application where the EOFs may have a physical interpretation of interest, and notes that if a given EOF has an eigenvalue that is insignificantly different from that of another EOF, the two EOFs are only determined up to linear combinations, and hence neither will show the underlying pattern by itself.

Preisendorfer’s “Rule N” (which has nothing to do with the sample size N, since it is just the 14th of several alphabetically identified stopping rules he considered) is much more relevant, but still has a problem, IMHO. His null is that all the eigenvalues are equal so that no linear combination has any more explanatory power than any other. The estimated eigenvalues will then be different with probability 1 and will ordinarily decay gently when they are arranged in decreasing order. The exact distribution of the j-th largest estimated eigenvalue is messy since it is the j-th largest of N identically distributed random variables, so he proposes simulating it by Monte Carlo means, taking into account the serial correlation of the data. The distribution of the very largest estimate will be considerably higher than North’s formula evaluated with all lambdas equal would predict, since it is the largest of N such estimates, and not a single such estimate as in North’s case.

Preisendorfer’s simulated critical values work fine for the first eigenvalue estimate, but once having rejected that the first is equal to the others, I believe they give the wrong distribution for the second, since they are based on the original null that all N (or N-m if m seasonal or other parameters have been estimated) of the eigenvalues are equal. If the first eigenvalue was say 40% of the total variance, then there is only .6 as much variance to spread over the remaining N-1 (or N-m-1) eigenvalues under the revised null that these remaining eigenvalues are all equal to one another, but not to the first. Accordingly, the critical value for the second eigenvalue would be only about .6 as high as for the first in this example, and so forth. So the rule tends to stop much too soon. (Remember that the “singular values” generated by SVD of the data matrix are not the eigenvalues themselves of the covariance matrix, but just their square roots.)

If you did have a data set with a pair of equal eigenvalues that were substantially higher than the others, and if a rule like Preisendorfer’s (with or without modification as above) were applied to it and the first lit up as significant, the second would almost surely light up as well. So the event of multiple eigenvalues doesn’t pose a particular problem for stopping rules per se.

Of course, in order for equal eigenvalues to be a meaningful indicator of non-correlation, each variable must first have been normalized to have equal variance. While it may be useful to apply an SVD directly to e.g. Steig’s AVHRR file, as Steig has apparently done, it would not be meaningful to apply a stopping rule like Preisendorfer’s Rule N to the resulting (squared) singular values.
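The normalization point shows up immediately in a toy example (hypothetical scales, not the actual AVHRR data): two uncorrelated series on different scales give a wildly unequal eigenvalue spectrum from the raw covariance matrix, even though nothing interesting is going on.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Two uncorrelated series measured on very different scales.
a = rng.normal(scale=100.0, size=n)
b = rng.normal(scale=1.0, size=n)
X = np.column_stack([a, b])

# Unnormalized: the eigenvalues are grossly unequal even though the
# variables are uncorrelated, so an "all eigenvalues equal" null is
# meaningless here.
lam_raw = np.sort(np.linalg.eigvalsh(np.cov(X.T)))[::-1]

# Normalized to unit variance: the eigenvalues are near-equal, as the
# null of a stopping rule like Rule N presumes.
Xn = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
lam_std = np.sort(np.linalg.eigvalsh(np.cov(Xn.T)))[::-1]

print(lam_raw[0] / lam_raw[1])  # huge ratio, driven purely by units
print(lam_std[0] / lam_std[1])  # close to 1
```

Applying a Rule-N-style test to the raw spectrum would "detect" structure that is nothing but a difference in measurement units.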

I hope Steig sees this discussion in time to correct his graduate lectures on PCA this week!

BTW, North expressly assumes zero serial correlation in the data matrix. If the data are serially correlated, perhaps an adequate fixup would be just to adjust his N for “effective DOF” à la Santer, Nychka, Quenouille and Bartlett. But as it stands, his standard errors wouldn’t be valid, even for the limited question he addresses.
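A sketch of that kind of effective-DOF adjustment, using the simple lag-1 AR(1) formula n_eff = n(1 - r1)/(1 + r1) (the Santer et al. recipe may differ in detail; the series below is synthetic):

```python
import numpy as np

def effective_n(x):
    """Quenouille/Bartlett-style effective sample size from the lag-1
    autocorrelation: n_eff = n * (1 - r1) / (1 + r1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    r1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)
    return n * (1.0 - r1) / (1.0 + r1)

# AR(1) series with strong persistence: far fewer independent values
# than the nominal sample size.
rng = np.random.default_rng(7)
n, phi = 1000, 0.8
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

print(effective_n(x))  # far below n; theory suggests roughly n * 0.2 / 1.8
print(effective_n(rng.normal(size=n)))  # near n for white noise
```

Plugging n_eff rather than n into North’s rule-of-thumb standard errors would be one way to make them defensible for serially correlated fields.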

Hu, I’ve just done a post on the inapplicability of North et al 1982 to the problem at hand. (This was something that I’d visited in February, but deserves re-visiting.) Steig’s “tutorial” on this topic suggests that it would not be out of line for Steig to take a refresher course.