Ryan O: More Mannian Algorithm Problems

Ryan O has observed a remarkable property of the Mannian algorithm used in Steig et al’s Antarctic temperature reconstruction described in a lengthy post at Jeff Id’s here and cross-posted at Anthony’s. Source code here (the source code style BTW evidencing engineering tidiness from which we should all take heed). I’m reporting here on one aspect of the post; readers are urged to consult either of the original postings.

As I understand his exposition, he took a hypothetical Antarctic temperature history (his “model_frame”) in which the overall Antarctic trend was 0.060 deg C/decade, with cooling on the Ross and Weddel ice shelves and near the South Pole and with a maximum trend in the Peninsula, as illustrated below (showing 1957-2002 trends):

Ryan extracted from this:
1) the post-1982 gridcells in lieu of AVHRR data;
2) for stations, the same pattern of missing data as in the actual Steig reconstruction;

He then did a Mannian reconstruction in which he:
1) used 3 PCs for the “AVHRR” data;
2) set the PTTLS regpar parameter equal to 3.

In this case, we know the “correct” answer (0.06 deg C per decade.) Instead of getting the correct answer of 0.060 deg/C , the Mannian algorithm yielded a trend that was 70% higher (0.102 deg C/decade).

In climate science terminology, this trend is “remarkably similar” to the trend reported in the original article. Ryan dryly observed:

If “robust” means the same answer pops out of a fancy computer algorithm regardless of what the input data is, then I guess Antarctic warming is, indeed, “robust”.

Ryan’s example is pretty convincing evidence that this particular Mannian algorithm is biased towards yielding higher trends. However, I think that it’s important to understand the mechanism of the bias a little more clearly.

For example, in our analysis of Mannian PCA, it was important that we were able to demonstrate the mechanism of the bias – short-segment centering led to the algorithm in effect “mining” for HS-shaped series, which, in that case, were the bristlecones. In the present case, our collective understanding of the problem is still a little empirical – though obviously even that seems to be a considerable advance, to say the least, on the level of analysis achieved by the Nature referees. IMO it should be possible to provide a more analytic explanation. With this end in mind, some time in the next few days, I’ll post some notes that will attempt to connect RegEM with known statistical methods i.e. methods used by someone other than a Team member.

34 Comments

Paul, I’m fairly sure that it would depend on the exact structure of the data – i.e. which areas were cooling, which were warming and which stayed the same. Given the quirks of this algorithm it may be possible to have a net change of 0, where the interior is cooling and the coast is warming, and have it pop out a strong positive trend, ignoring the cooling interior like it seems to have done in this case. We won’t know until somebody tries it, though. That’s the problem with sui generis methods.

#2 Yes. Depending on the structure, you can get it to enhance cooling instead of enhancing warming. If you run RegEM at high enough regpars in this particular case, you can get massive cooling (you can also get massive warming). If you run it at regpar=1, the whole continent cools. regpar=2, the whole continent warms.
.
Earlier I had made this for the Air Vent:
.

.http://noconsensus.wordpress.com/2009/05/06/regem-the-magic-teleconnection-machine/
.
In order to prevent this type of behavior, I decided to impute the ground stations separately, and then use that result to impute the PCs. Doing it that way yields more consistent solutions, without all the spurious correlations and sign flipping.
.
This has some philosophical appeal. Throwing everything into the meat grinder at once allows the PCs to drive each other’s imputation both directly (because when only the first regpar eigenvalues are used to approximate the data, they are no longer orthogonal and hence will show correlations with each other) and indirectly (because they will affect the ground station imputation, which, in turn, affects their own imputation). Because the record length of many stations is short, this allows spurious correlations and some odd behavior. Also, since the PCs by definition start out as orthogonal, it does not make any sense to allow them to interact with each other.
.
Imputing separately helps minimize these problems. It allows actual ground station temperatures to determine the station covariance (which does not remain constant in time). During imputation, the PCs are then “bent” to the actual ground station solution. Rather than trying to force the stations to fit the satellite-determined covariance, the stations themselves are left alone (after all, we have actual thermometers there to measure temperature) and satellite covariance is used to describe what happens between stations. This would seem to be a more physically appropriate method. It would also seem to dictate using the model frame (where the actual data is not spliced back in) rather than what PTTLS returns.
.
Steve – have you made any headway in what Steig means by “statistically independent”? The only thing I can come up with is a concept from ICA – where two variables are statistically independent if they contain no mutual information. If this is what he means, it is an inappropriate stopping point for PCA because the only constraints in PCA is that each successive component explains the maximum possible variance and that they are all orthogonal to each other. PCA allows mixing of signals, so different components often contain portions of an underlying factor.

If that is the type of statistical independence they wanted, then they should have used a proper selection tool (such as broken stick) and performed ICA on those selected components. I cannot see any justification for using “statistical independence” as the selection tool itself.

have you made any headway in what Steig means by “statistically independent”?

I posted something on this a while ago – though I understand that it’s hard to find anything. I’m 99.99% certain that the following describes Steig’s use of the term.

In Chladni figures on regular geometric shapes, some eigenvalues are identical. In such circumstances, any linear combination of the eigenvectors is also an eigenvector with the same eigenvalue. So the eigenvectors with identical eigenvalues are not “independent”.

North (1982) considered the situation where eigenvector had close but not identical eigenvalues and observed that the introduction of noise readily caused linear combining of the eigenvectors – thus they were not “statistically independent”.

BTW I think that you are being excessively deferential to recipes for PCA retention. In the case at hand, I think that it is important to start with a model of the temperatures being spatially autocorrelated, with stations being irregularly distributed, and trying to deduce the most appropriate method of calculating an average temperature at any given time.

This is actually a lot like an ore reserve calculation – something that a number of CA readers are familiar with – where kriging is a tried, true and stable method – and works out about the same as the regular gridding methods that you and the Jeffs have experimented with.

Re: Steve McIntyre (#4), I would interpret your response as meaning what Steig’s really referring to is that he selected PCs that show no degeneracy and have eigenvalue confidence intervals that do not overlap. I have the North paper, so I’ll give it another read. A bootstrapped eigenvalue/eigenvector test would show whether the latter claim is true. I do have a script for doing that, but it’s for an old version of R and I haven’t had time to update it yet. For the former, degeneracy (i.e., doublets) would seem to be a poor selection tool since obviously significant factors often show up in PCA as doublets.
.
On the PC thing, personally I would prefer to use 15 PCs – not the 11 that broken stick says – because there is clear resolution of the Peninsula warming at 15+ PCs. I suppose the excessive deference to selection rules was mostly to show that there is no generally accepted test that I can find that would indicate that Antarctica can be properly described by only 3 PCs. Another reason for the deference to the selection rules is simply because I personally have little practical experience with PCA and I felt I needed something concrete to guide my decisions.
.
In this case, once you get up to about 15 PCs, the number you take simply doesn’t matter. The answer doesn’t change. That indicates to me that if you discard additional PCs and only take 3, you are discarding important information about the covariance. Another indication of this is comparing the PC/RegEM method to gridded methods (like the gridded AWS reconstruction that Jeff Id and Jeff C did). As you include additional PCs, the PC/RegEM trend values and distributions approach the gridded solution. To me, this means that using only 3 PCs results in throwing away important information.
.
The purpose of the selection rule discussion was to show that there are also acceptable (albeit heuristic) tools that show a similar thing: 3 PCs are simply insufficient.
.
As an aside, it seems like doing the separate imputation of the ground station matrix and then imputing the PCs results based on a fully-populated ground matrix is analogous to kriging. The spatial weights (left eigenvectors) of the PCs serve as the kriging estimator and also serve to show the precondition that there is sufficient spatial dependence for the interpolation to yield valid results. I must admit that my linear algebra skills are not sufficient to show this, but based on performing the 100+ reconstructions and watching how the ground station matrix drives the reconstructed values, I believe this interpretation to have some validity.

Have you read Wahl and Ammann 2007 on PC retention? Wahl and Ammann is the Team attempt to rehabilitate MBH and they argue that it is a mathematical law that you need to retain enough PCs to obtain convergence – i.e. it is supposedly a law of nature that the bristlecones should be included in the MBH reconstruction regardless of whether they are a valid temperature proxy.

It’s turgidly written, but the hypocrisy reaches astonishing levels even for the Team. Bishop Hill wrote a nice history in his post “Caspar and the Jesus Paper.

Re: Steve McIntyre (#7), I did read Wahl and Ammann a while ago, but it was before the Steig stuff forced me to obtain a sort-of working understanding of PCA, so my eyes glazed over during many parts. I will take your suggestion and read it again – I imagine I will understand it better this time around.
.
The other interesting thing is that I’ve gone back and read some of your past posts on Mann and the criticisms that you and Ross had made make a lot more sense now – as does the Wegman report.
.
<–Time to re-read many things. :)

Re: Steve McIntyre (#7), Not to turn this into a Wahl/Ammann review thread, but I found some of their arguments confusing. The most confusing was the argument that the inclusion of series that do not demonstrate temperature information in the 20th century can still be valid temperature proxies because of some larger relationship with ENSO and other climatological processes that can be assumed to contain some unquantified amount of temperature information. This makes no sense.
.
The argument that just because the inclusion of these series results in better reconstruction statistics indicates that they are valid temperature proxies seems problematic. This is a multivariate problem, and one could easily argue that they result in better reconstruction statistics because they are good CO2 proxies. In the period over which sufficient instrumental data exists to perform calibration/verification (about 1800-present) CO2 has approximately followed temperature. Prior to that, however, the CO2/temperature relationship breaks down. So it would seem incumbent upon MBH and WA to show that their reconstructions do not actually represent something besides temperature – and I see no evidence to indicate that. WA even admit as much, but then hand-wave it away by saying there’s nothing in the literature saying otherwise – so the assumption must be valid. Um . . . eh?
.
I also don’t understand the reliance on RE nearly exclusively, especially given that most of the reconstructions yield such poor CE statistics. Given that the total temperature change during the investigation period is approximately 1 degree and the noise is much higher than that, valid reconstructions should result in nearly equal RE and CE statistics. I can show this for the Antarctic reconstructions. Even with positive RE and CE scores, areas that show a significant divergence between RE and CE also show a significant divergence in the computed temperature trends. Not only do both need to be positive, but they also need to be close (though the latter requirement can be relaxed if the real temperature trend is strong).
.
Besides, as I mentioned before, I think RE is a terrible statistic upon which to rely. Steig’s reconstruction results in RE statistics that are almost entirely positive, yet the model frame from the Steig method can be shown to have a statistically significant, opposite trend sign in many regions with positive RE statistics.
.
I have to think more about their PC arguments. Even after 2 reads I didn’t really understand why they did what they did.
.
Regardless, I don’t see how they present any evidence that the MBH technique can differentiate temperature/precipitation/CO2 factors to produce a reconstruction that is a temperature reconstruction.
.
And I’d be willing to bet – based on the series you have to include to get “good” RE values – that you’d get even better statistics if you called it a CO2 reconstruction. A gas-ometer, as it were, not a thermometer.
.
Okies. All further comments from me about WA will be on the PC side of things – in order to stay relevant to the topic at hand.

The only reason they “rely” on the RE statistic is because it’s the only statistic that the MBH reconstruction passes. One of the battleground issues re MBH was their withholding of the adverse (failed) verification r2 statistics in the early network – even though their Figure 3 illustrated this statistic for the AD1820 network where it passed. Mann denied even calculating this statistic to the NAS Panel, even though MBH98 refers to it and his source code produced for the House Energy and Committee shows that he calculated it. Wahl and Ammann initially withheld the information as well. Oddly enough, I was asked to review their submission and asked for it in my capacity as an anonymous reviewer; they refused to provide it, Stephen Schneider supported their refusal and I was terminated as a reviewer. I filed an academic misconduct complaint against Ammann and it was reluctantly included. It’s a long sordid story. See for example http://bishophill.squarespace.com/blog/2008/8/11/caspar-and-the-jesus-paper.html

I have to think more about their PC arguments. Even after 2 reads I didn’t really understand why they did what they did.

It’s actually very relevant to Steig, since Wahl and Ammann style retention (developed ad hoc to prop up MBH) undermine Steig – a point that I’ve made from time to time.

Here’s what Wahl and Ammann are up to: the bristlecones are in the PC4 using covariance PCs. MBH had an incorrect PC methodology (a point that attracted much attention when we reported it a few years ago) and this demoted the bristlecone hockey stick from the PC1 in the North American network to the PC4. MBH had retained only 2 PCs in this network. Retaining only 2 PCs resulted in a non-HS recon.

The “reason” for doing what they did was to develop an after the fact method for “getting” the bristecones into the recon. Wegman stated that the Wahl and Ammann methodology had “no statistical integrity”. IPCC ignored Wegman and adopted Wahl and Ammann as the last word on the matter. There is no evidence of the PC retention methodology advocated in Wahl and Ammann actually being used in MBH and it is impossible to replicate observed retentions in other networks using the ad hoc method.

Burger and Cubasch make the subtle point that since RE and CE are only different, by definition, by the difference between the means of the calibration and verification periods and that if the series is stationary (as has to be assumed for any proxy to “work” unless the non stationarity is accounted for) then those means should be equal and thus RE and CE should be identical. They also point to evidence that calculating RE and CE as separate measures is a rather new invention in climate science.

“On the verification of climate reconstructions” by G. Burger and U. Cubasch, the authors note the following:

Before we explain our testing procedure, we recall one of the most basic estimation principles: that a regression/verification exercise is generally nonsense if calibration and validation samples are not drawn from one and the same population. Consequently, differences between sample properties, such as calibration and validation mean, must be considered completely random. Accordingly, verification methodology was developed from and for stationary records, such as weather or riverflow (Lorenz, 1956; Nash 5 and Sutcliffe, 1970; Murphy and Winkler, 1987; Wilks, 1995).

In that method, mean and variance are usually seen as population parameters relative to
which errors are to be measured. In fact, from the original articles wherein RE and
CE were introduced, (Lorenz, 1956) and (Nash and Sutcliffe, 1970), respectively, they
were only two different names for one and the same thing: the reduction of the squared
10 error relative to the variance of the predicted quantity. Only later they appear as distinguished entities (Briffa et al., 1988; Cook and Kairiukstis, 1990; Cook et al., 1994),
in that reference is explicitly made to calibration (RE) or validation (CE) variance.

That latter score, CE, has been attributed to the hydrologic study (Nash and Sutcliffe, 1970).
However, we have not been able to find therein any reference to a validation mean, 15 nor in any of the articles we checked from the hydrologic literature (e.g. Legates and
McCabe, 1999; Wolock and McCabe, 1999). It thus appears that RE and CE were originally envisaged as identical measures but have been mistaken for distinguished entities in dendrochronology. We emphasize that their difference solely reflects sample properties and must be considered random.

FWIW, I posted a link to Ryan’s work on WUWT to Tamino’s blog. It didn’t see the light of day. In fact anything I post there containing the words ‘Steig’, ‘Santer’ or ‘Mann’ doesn’t get through, no matter how innocuous my comment may be.

Steve M. and Ryan O.–
I am learning a lot from your post(s) and back-and-forth here. Thanks! For me, this is great science discussion. Only on CA. (Just a moment here while I genuflect–well, I’m being a little snide there, which is unfortunate, because I genuinely do think this is a terrific blog.)

Ryan do you know what happens if you make exactly the same calculations except with all the inputted trends reversed? i.e. flip all the input series anomalies around 0, so that all warming becomes cooling and all cooling becomes warming, does the output also become inverse or does something else happen?

If flipping the input trends doesn’t flip the output trends we know something is seriously wrong with the algorithm. If it does, that suggests its problems are primarily spacial in nature.

By the way, Jeff Id had done something similar to this a few weeks ago:
.http://noconsensus.wordpress.com/2009/04/20/retrended-regem-recon/
.
What happens to the pre-1982 data in his is quite interesting.
.
Re: Nicholas (#12), If you flip the signs of the data, the signs of the reconstruction follows. I wouldn’t say that Steig’s method has a warming bias per se, rather I would say that it improperly locates trends geographically and exaggerates whatever trend exists.
.

I wonder what happens with simpler test data.
* Give every station exactly the same temperature for all periods.
* Increase all stations at the same linear rate.
* Decrease all stations at the same linear rate.
* Increase all stations until midpoint, then decrease to reach the starting temperature.
* Decrease all stations until midpoint, then increase to reach the starting temperature.

Yes, I know some steps require information from variations, but it would be interesting to see if anything amusing happens with simple data. The obvious modification to provide slightly more complex data is to use low-amplitude sine waves of suitable frequencies.

You have all of the elements needed for a rebuttal letter to Nature challenging the Steig et al. results as being arbitrary and based on personal choice of PC’s and Regpar. The graphic “Which Antarctica Do You Want?” is a compelling demonstration of the arbitrary result of that choice.

It would be a simpler, faster, approach than a full fledged paper, and quite effective I’m sure. The disposition of such a letter will also tell us what Nature’s editorial board is really made of.

I enourage you in the strongest possible terms to write such a rebuttal letter to Nature. Posting here, on WUWT, Air Vent, etc. is a wonderful way to bring it to the attention of our circle of climate netizens, but it has a limited reach and as the ever vigilant Tamino demonstrates, the effect of causing a “circling of the wagons” rather than elicit a response of scientific integrity.

“Ignore it and hope it goes away” sometimes works when you don’t expect your opponent to fully rise to the challenge. In this case, the only way to rise to the challenge is to confront it at the peer reviewed publishing level.

The Team won’t likely step up to correct their own error, or even to challenge it at the current level, since at this stage any acknowleding it and stimulating any additional exposure is troublesome to them.

If Nature remains a true periodical of science, a rebuttal letter will be accepted and the challenge met by The Team.

It is an important litmus test of the state of science today, and I urge you to make this test on behalf of all of us whom have faith the in maxim: “science is self correcting”.

If it is, a rebuttal letter will elicit a demonstration.

Thank you for your consideration.

Steve M: just as a point for later verification of a possible future topic, you might want to make a note of the IP address this comment was posted from.

Re: Anthony Watts (#27), At first I thought that was the route to go, but I’m not sure it’s the best way. The real point here is not whether Antarctica is warming/cooling/schizophrenic, but, rather, the methodology employed. I have far less concern over what Antarctica is doing than I do over inappropriate methodology that seems to be Kinko’d all over the place.
.
My feeling right now is that, in order to present a worthwhile critique of the methodology, that a full-fledged paper is the most appropriate. There simply isn’t space in a rebuttal.
.
The prospect of a full-fledged paper is a bit intimidating, TBH, but I personally feel that it is the best way. However, it is definitely something that I cannot do alone. I believe there are a few folks interested in helping out, so I think I would prefer to pursue that road.
. ;)

Re: Ryan O (#28), You could cast your tiles upon Nature in a note that something seems to be wrong, and it’s either something wrong with what they published or with the author’s following their rules (and thus difficulty in properly duplicating their work). Either way, they should be concerned about how it reflects upon them.

Not that their reputation hasn’t already been damaged. One can hope they’ll care about getting it repaired before too many academics lose interest in being connected to them.

Steve, just asking – have you checked this work by Ryan? I believe the 2 Jeffs have, but haven’t seen if anyone else has verified the calcs / working out. cheers. ps I’d do it if I could – just way way past me.

Wow. A whole week posted at CA, WUWT and, of course, Airvent, and not a peep, cheep or chirp anywhere, except a snarky dismissal from Tamino and a lamer-than-usual foray from Flanagan. It’s quiet. Too quiet.

Sorry, Jeff… I got the reference from you… Tamino’s thread on dangerous curves… and it wasn’t his reply but rather his Greek Chorus… I guess I owe Tamino a sort of apology: he hasn’t made a snarky dismissal but has left it to his sycophants. (I was going to add “yet”, but I’ve got a feeling that this is one argument he (and others) would really, really like to avoid.)