Wahl, Ritson and Ammann on VZGT

Wahl, Ritson and Ammann, the authors of two rejected comments on MM05 to GRL – see here and here for our Replies to the rejected comments – have joined forces and published a critical comment on Von Storch et al [2004] in Science, to which von Storch et al have issued a Reply. realclimate has issued an editorial here.

There’s quite a bit of back story to this exchange, which I’ll cover in more detail on another occasion. For now, here are a few quick observations. I’ve pointed out on this site for a long time both (1) that VZGT had not correctly implemented MBH procedures; and (2) that I did not think that VZGT had correctly diagnosed the problems with MBH98.

I had identified at least three different problems with the VZGT implementation of MBH: (1) they did not appear to implement a re-scaling step unreported in the original MBH98 article. As I pointed out in my AGU05 presentation, the variance differences alleged by VZGT did not exist empirically; (2) in the GRL Comment on MM, VZ did not accurately implement the goofy MBH principal components method, seemingly not fully comprehending just how bad the method was; [Update – May 4, 2006: I’ve reconciled code with Eduardo. Their description in GRL of what they did certainly suggested otherwise, but they did implement the key features of the Mannian PC method – so the differences with us are probably due to the next item.] (3) relying on Jones and Mann 2004, the VZ (and the VZGT) pseudoproxies wildly over-estimated the temperature signal content of MBH proxies and did not allow for "bad apples".

I don’t blame (and didn’t blame) VZGT for any of these "problems"; the fault lies entirely with Mann et al. (1) How could VZGT replicate a re-scaling step that was never mentioned in the original article? (2) While I think that we provided enough information in our articles to decode the MBH principal components method, I can readily understand how people would assume that the method was more reasonable than it really was and think that it wasn’t possible that MBH used such a weird method; but it was possible and it did happen. (3) Absent a detailed investigation of MBH98 proxies of the type that we’ve carried out, anyone relying on MBH information would assign much better behavior to the proxies than is justified.

To the above three points, Wahl et al add a fourth: that VZGT calibrated on detrended proxies rather than non-detrended proxies. In reply, VZGT argue that Wahl et al overstated the impact of this methodological difference on their results, which they claim to be valid even with calibration on non-detrended proxies. Both Wahl et al., and especially realclimate, gloat over this seeming "error" in the VZGT implementation. However, to the extent that VZGT incorrectly implemented this MBH procedure, one can certainly see some basis for the misunderstanding. In a criticism of MM03, Mann et al said:

The use of gridpoint standardization factors based on undetrended data (MM) to unnormalize EOFs that had been normalized by standardization factors of detrended data (MBH98) implies a pattern of bias in the projection of an eigenvector onto the surface temperature field that is increasingly large in regions where the 20th century trend is large.

Similarly in the Corrigendum SI, MBH stated:

Standard deviations were calculated from the linearly detrended gridpoint series, to avoid leverage by non-stationary 20th century trends. The results are not sensitive to this step.

While neither of these points specifically refers to proxies, MBH procedures are so poorly and inaccurately described that one can see why VZGT might innocently assume that if detrending was the "correct" MBH method in standardizing gridcell standard deviations, then it might very well be what MBH did with proxies. In the Original Supplementary Information to MBH at Nature (now deleted but preserved in a University of Massachusetts mirror), MBH report results from a detrended (DET) run. So even if VZ have inadequately modeled one MBH variant, one could see why they might at least think that they had modeled the DET alternative.
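To make the trended/detrended distinction concrete, here is a minimal numpy sketch – the series, coefficients and noise magnitudes are all invented for illustration, and this is not the actual VZGT or MBH setup. The point is simply that a proxy carrying a nonclimatic trend yields an inflated calibration coefficient unless both series are linearly detrended before calibration:

```python
import numpy as np

def detrend(x):
    """Remove a least-squares linear trend from a series."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def calibrate(proxy, temp, detrended=True):
    """Regress proxy on temperature over the calibration period.
    With detrended=True, both series are linearly detrended first,
    so a shared (possibly nonclimatic) trend cannot inflate the fit."""
    if detrended:
        proxy, temp = detrend(proxy), detrend(temp)
    return np.polyfit(temp, proxy, 1)[0]  # slope coefficient

rng = np.random.default_rng(0)
n = 100
t = np.arange(n)
temp = 0.01 * t + rng.normal(0, 0.2, n)       # temperature with a trend
proxy = 0.5 * temp + rng.normal(0, 0.2, n)    # genuine climate signal
contaminated = proxy + 0.02 * t               # plus a nonclimatic trend

# Non-detrended calibration absorbs the nonclimatic trend as "signal":
b_raw = calibrate(contaminated, temp, detrended=False)
# Detrended calibration sees only the year-to-year covariance:
b_det = calibrate(contaminated, temp, detrended=True)
print(b_raw, b_det)  # b_raw comes out inflated relative to b_det
```

Either choice is defensible in principle; the issue in the exchange is which one MBH actually used, which the sketch obviously cannot settle.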

I must confess to feeling a certain amusement at Mann savaging VZGT for allegedly "incorrectly" implementing his precious methodology. Back in 2003, when we sought clarification of MBH methodology, Mann refused, on the basis that von Storch and Zorita had found his existing disclosure sufficient to implement his methodology (see Mann correspondence). In the Corrigendum SI, Zorita et al 2003 is cited on 2 different occasions as an accurate implementation of MBH methodology. In summer 2004, we advised Nature that the Corrigendum SI remained insufficient; Nature said that anything further was up to the authors. So if VZ subsequently misinterpreted MBH, surely Mann has only himself to blame. All in all, surely it proves a point we’ve been making for a long time: code should be archived so that this sort of confusion is avoided. Even now, code archived by MBH is incomplete – how do they calculate confidence intervals? This mystery would be resolved in 2 seconds by looking at code. Likewise their supposed Preisendorfer calculations. Neither was archived last summer.

Realclimate completely mischaracterizes the handling of the detrending issue by Bürger and Cubasch, accusing them of following VZGT in using detrended calibration. In fact, Bürger and Cubasch carefully distinguish between trended and detrended calibration, analyzing each as separate "flavors". Nothing wrong with that.

The closing paragraph of the VZGT response raises two important issues, which are familiar to readers of this site. They state:

It is commonly accepted that proxy indicators may contain nonclimatic trends. This is particularly true with tree-ring data (8), which were intensively used in the study by MBH98. The calibration and validation of any statistical method using nondetrended data are dangerous, because the nonclimatic trends are interpreted as a climate signal. Only in the case that the trend in the proxy indicators can be ascertained to be of climate origin is a nondetrended calibration and validation permissible.

Does this sound to anyone like they are coming around to recognizing the impact of bristlecones on MBH? They go on:

In the validation period, in contrast, the correlation between the (5-year-smoothed) reconstructed and observed NHT in the validation period 1856 to 1900 is 0.23. This low correlation skill in the validation period has been recently acknowledged (9 – citing Wahl and Ammann, submitted to Climatic Change).

They do not cite M&M – unfairly under the circumstances, since these are points that we originally made and have dragged out of Wahl and Ammann, kicking and screaming. Citing the verification r (correlation), which is not mentioned in Wahl and Ammann, rather than the verification r2, which is so directly associated with our critique, seems a little sly. However, on balance, I’m happy to see them coming round to the points that we made.
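The arithmetic here is trivial but worth making explicit: a verification correlation of r = 0.23 corresponds to r2 of about 0.05, i.e. the reconstruction tracks only about 5% of the observed variance in the validation window. A small sketch (the obs/recon series below are invented numbers, purely for illustration):

```python
import numpy as np

def verification_stats(obs, recon):
    """Pearson correlation r over a verification period, and r^2 -
    the share of observed variance the reconstruction explains."""
    r = np.corrcoef(obs, recon)[0, 1]
    return r, r ** 2

# The quoted validation correlation of 0.23 implies:
print(0.23 ** 2)  # ~0.0529, i.e. about 5% of variance

# On an invented 45-year verification window (e.g. 1856-1900):
rng = np.random.default_rng(0)
obs = rng.normal(0, 1, 45)
recon = 0.25 * obs + rng.normal(0, 1, 45)
r, r2 = verification_stats(obs, recon)
print(r, r2)
```

Quoting r rather than r2 is not wrong, but the squared form makes the low explained variance much harder to miss.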

I think that Mann’s going to regret that Wahl et al have put all of this back on the table. It’s pretty amazing how they complain on the one hand about people criticizing a "10 year old paper" – with Mann, arithmetic within 25% is pretty good – and then continually pick at scabs by publishing stuff like Wahl and Ammann [Climatic Change] and Wahl et al [Science]. Without Wahl and Ammann, I’d have "moved on" to other long overdue studies. However, as long as they keep contesting things – and especially when they do so with misrepresentations and withheld data – then I’m content to keep returning the ball from my side of the court.

27 Comments

I think that Mann’s going to regret that Wahl et al have put all of this back on the table.

I agree, especially since it gave an opportunity for von Storch to respond. I especially like the end:

It is commonly accepted that proxy indicators may contain nonclimatic trends. This is particularly true with tree-ring data (8), which were intensively used in the study by MBH98. The calibration and validation of any statistical method using nondetrended data are dangerous, because the nonclimatic trends are interpreted as a climate signal. Only in the case that the trend in the proxy indicators can be ascertained to be of climate origin is a nondetrended calibration and validation permissible. In realistic circumstances, however, it can lead to an overfitting and lack of skill outside the calibration period. In this respect, the observed and reconstructed NHT shown in figure 1A in (1) only agree in the period with a large linear trend (centered in 1930). In the validation period, in contrast, the correlation between the (5-year-smoothed) reconstructed and observed NHT in the validation period 1856 to 1900 is 0.23. This low correlation skill in the validation period has been recently acknowledged (9). Furthermore, whenever the observed NHT deviates from the centennial linear trend (e.g., around 1950) the reconstructed NHT does not follow the observed temperature. In our opinion, these are indications of a dangerous nondetrended calibration.

Given all the huffing and puffing at realclimate this morning, it’s amusing to look back at the post which was in controversy last fall, Is Gavin Schmidt Honest? The post discussed various defects in the VZ simulations as they applied to the VZ comment on M&M, cited so lovingly by realclimate. It was posted through only after I raised a huge stink here, and then Gavin censored my further attempts to comment.

But it’s ironic that this was the very topic of the Is Gavin Schmidt Honest post.

Step 1. Thoroughly mix a bizarre concoction of esoteric methods and unobtainable data. If the obscure methods are ones you have devised yourself one late summer afternoon on the back of an envelope, all the better. Describe inadequately.

Step 2. Determine the end of the world is nigh by affirmation of the consequent. Publish in Nature.

Step 3. Await requests for information from other scientists.

Step 4. Provide erroneous data and methods to other scientists. Other scientists publish a paper claiming the data and methods are bunk.

Step 5. Simmer slowly on a low heat.

Step 6. Publish article pointing out other scientists completely misunderstood the technique. Make lots of disparaging remarks about the other scientists being evil and incompetent, not necessarily in that order.

Step 7. Sit back with smug grin as politically motivated cheerleaders uncritically lap up anything you put down for them, whilst simultaneously complaining about cheerleading and political motivation on other sites.

Just a short comment to say that the notation VZ is not completely fair: there are other authors who contributed decisively to these papers.

I am following the debate here and at realclimate. From my perhaps privileged position in this matter, I can say that I am learning some things about science, but also a lot about human nature as well. It is really interesting.

I’m not content that you spend so much time on these distracting nitpicking arguments. A quick reply ought to be sufficient, and then on to opening up new wounds. But instead you only have one decent paper published (GRL05). And I don’t count EE and I don’t count replies to comments.

You are fighting the enemy too much in the place of his choosing, vice driving (new) stakes into his heartland.

I’ve been thinking some more about the trending-detrending issue. I’m not sure that it makes any difference to the different variances and to this extent, for what it’s worth, I’m inclined to endorse your reply.

Wahl et al note the difference in methodology and assert that this methodological difference causes the difference in results. But they do not prove this. Your reply certainly indicates that their assertion is not correct. There are other variables, e.g. scaling and re-scaling, and the proxies themselves. I have zero confidence in the ability of any of the three musketeers to analyze this situation and place little weight on their diagnoses. However, I don’t think that matters are by any means reconciled. This hand is far from being played out.

I’m more inclined to place the differences on the properties of the proxies themselves. Until you do runs with pseudoproxies that more realistically represent MBH proxies – contaminated bristlecones, some proxies correlated to precipitation, some that are just noise with no signal at all – then you haven’t really modeled the MBH dog’s breakfast. I think that that’s where a reconciliation will come.
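By way of illustration only – the "engines" and all parameters below are invented, not a model of the actual MBH network – a heterogeneous pseudoproxy set along those lines might be generated like this:

```python
import numpy as np

def make_pseudoproxies(temp, precip, n_each=5, rng=None):
    """Build an illustrative mixed pseudoproxy network:
    temperature-sensitive, precipitation-sensitive, pure-noise,
    and 'bad apple' series carrying a nonclimatic trend."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(temp)
    t = np.arange(n)
    proxies = []
    for _ in range(n_each):
        proxies.append(0.5 * temp + rng.normal(0, 1, n))    # temperature signal
        proxies.append(0.5 * precip + rng.normal(0, 1, n))  # precipitation, not temperature
        proxies.append(rng.normal(0, 1, n))                 # no signal at all
        proxies.append(rng.normal(0, 1, n) + 0.02 * t)      # nonclimatic trend ("bad apple")
    return np.array(proxies)

rng = np.random.default_rng(1)
temp = rng.normal(0, 1, 200)
precip = rng.normal(0, 1, 200)
network = make_pseudoproxies(temp, precip, rng=rng)
print(network.shape)  # (20, 200)
```

A network like this, rather than uniformly well-behaved signal-plus-noise series, is the kind of thing that would test how a reconstruction method copes with a mixed bag.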

I’ve done runs with MBH data – goofy PC1 and all – with and without detrending. After the 15th century, I didn’t get any material difference with and without detrending. The only material differences occur in the 15th century, where detrending reduces the weighting of the bristlecones! I agree entirely with the observations in your closing paragraph and hope that my snippiness at not being cited didn’t prevent this agreement from being clear.

My guess is that we are not too far away from being able to get a synthesis of our approaches – we have consistently argued that MBH is driven by a spurious regression with bristlecone growth. You’ve acknowledged this as an issue, but maybe it’s time to re-visit it. I think that you’ll get to the same conclusion from a different direction.

TCO, if you hang with scientists you can’t get away with seriously off-topic posts by hiding them in German. If you like metal oxides, then you should contact Frank de Groot at the University of Utrecht in Holland, and learn all about L-edge x-ray spectroscopy.

Now, since you are here, would you mind answering the following question.
In your Science (von Storch et al.) paper (combined with the response), your key contribution, in my opinion at least, is that you show, using simulated temperatures and proxies, that the “regression methods” used in MBH are flawed, since they underestimate variability in the pre-calibration period. Moreover, it is evident from your response that this is especially true if the noise is colored. These are, in my opinion, also the key points raised by Steve. So my question to you, as a working scientist, is: how do you justify the fact that you did not cite Steve’s results?

#18. Let me project the answer for Eduardo: he and Hans probably think that it would have created too much of a commotion, got everybody’s back up and interfered with their presentation and they are under enough pressure as it is.

Yes, I would have liked a citation. Eduardo is a very decent guy and I’m sure that he’s a little embarrassed about the non-citation, but I’m not going to get into a war with him about it. He, Fidel and Hans have been nice to me privately and, while I would appreciate more public support e.g. a citation here, they’ve been virtually alone in the climate science community in having spoken favorably about me in the past.

So Jean, while I would have liked a trick here, I can live without it. There are more cards to be played. I appreciate the support, but I’m confident that things will sort out under more favorable conditions.

Eduardo, if you’re reading this, I’m working on a long and interesting note synthesizing these issues, which I think that you’ll be interested in. If you’re online, why don’t you wait for that and we won’t pick at scabs.

Hopefully Eduardo will appreciate, as I did, your explaining the GCM results he is using so intensively. Will your homunculus JohnA, who runs this website, refrain from insulting everyone who uses the word “GCM”? Will you stop publishing personal relationships between people to prove that there are “independency problems”? Will you and your disciples, instead of soberly analysing data and their problems, stop yelling “frawd” and “communist Greenhouse conspiracy” all the time, like McCarthy in his best days?
For all those so happily analysing the ECHO simulations: at the end of the simulation Eduardo is using, you will find a strange and strong rise of temperature in the 20th century. It must certainly be a lie.

I’ve never used words like "communist greenhouse conspiracy" or anything remotely like that. I do not yell "frawd" "all the time" and have intentionally stayed away from any such suggestions. I have repeatedly said that, for policy purposes, one should rely on a consensus; however, for scientific purposes, every detail should be probed. In terms of data analysis, I think that the record of this blog is exemplary.

I do not believe, for example, that the articles, Mann and Jones 2003; Jones et al 1998; Briffa et al 2001 and MBH, for example, are "independent" in terms that a civilian would understand. Your outrage should be directed at people making the ludicrous claims that these studies are "independent". Being non-independent doesn’t mean that the studies are "wrong". Each one stands or falls on its own merits. But don’t tell me that they are "independent".

Epica, that a model shows increased temperatures in the 20th century is a necessary, but not sufficient, outcome of a model. As far as the instrumental record is reliable, the model outcome must follow the same pattern, or it should be discarded. That doesn’t mean that models have any predictive power, as there are too many sets of parameters which can all give the same “curve fitting” of the past century/centuries, from 3-5 times direct solar (at the cost of the CO2/sulfate tandem) to positive and negative responses of cloud cover to increased temperatures… After all, we have only one equation, with one or two dependent variables (temperature, precipitation), many independent variables (solar, volcanic, GHGs, aerosols,…), and interdependent feedbacks (like cloud cover and air/ocean circulations)…

That even state-of-the-art models have trouble reflecting reality can be seen in figure 1 of Johannessen et al., where the ECHAM4 model underestimates the measured temperatures in the 1930-1940 period, and overestimates the influence of GHGs and aerosols in the second period (and thus overall for GHGs)…

I must confess to feeling a certain amusement at Mann savaging VZGT for allegedly “incorrectly” implementing his precious methodology. Back in 2003, when we sought clarification of MBH methodology, Mann refused, on the basis that von Storch and Zorita had found his existing disclosure sufficient to implement his methodology (see Mann correspondence). In the Corrigendum SI, Zorita et al 2003 is cited on 2 different occasions as an accurate implementation of MBH methodology.

Yes, quite laughable how some can speak out of both sides (whichever suits them at the time!).

It’s pretty amazing how they complain on the one hand about people criticizing a “10 year old paper” – with Mann, arithmetic within 25% is pretty good – and then continually pick at scabs by publishing stuff like Wahl and Ammann [Climatic Change] and Wahl et al [Science].

I’m more inclined to place the differences on the properties of the proxies themselves. Until you do runs with pseudoproxies that more realistically represent MBH proxies – contaminated bristlecones, some proxies correlated to precipitation, some that are just noise with no signal at all – then you haven’t really modeled the MBH dog’s breakfast. I think that that’s where a reconciliation will come.

FWIW, I have to second this. When you actually look at the raw proxy data (which Steve has been very good about posting), you see how bizarre the underlying data actually is. Using well-behaved time series to model them doesn’t begin to capture their diversity. Perhaps it would be more realistic to generate the pseudo-proxies using a number of different engines with very different properties.

If the color (or other properties) of the proxies interacts with the method, then I agree that one must use pseudo-proxies that mimic that nature. However, I worry that sometimes Steve tends to weave a bit in his logic. During a discussion of methodology, inserting comments about CO2 contamination of bristlecones is a non sequitur. That can be decided entirely independently of a discussion of mining.
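On the “color” point, here is a minimal sketch (parameters invented) of why it matters: AR(1) “red” noise concentrates variance at low frequencies, producing trend-like excursions that white noise does not, which is what makes it interact with trended calibration:

```python
import numpy as np

def ar1_noise(n, phi, sigma=1.0, rng=None):
    """Red (AR(1)) noise: x[t] = phi * x[t-1] + e[t]. Higher phi
    means 'redder' noise, with more trend-like low-frequency swings."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.zeros(n)
    e = rng.normal(0, sigma, n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

rng = np.random.default_rng(2)
white = rng.normal(0, 1, 1000)
red = ar1_noise(1000, phi=0.7, rng=rng)
print(lag1_autocorr(white), lag1_autocorr(red))  # near 0 vs. near 0.7
```

Whether pseudoproxy noise should be white or red like this is exactly the sort of property that has to be matched to the real proxies before the simulation results carry over.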

It’s not possible to cover every technical specification in a single paper, no matter how collaborative or how widely researched the paper may be. Again, it goes without saying that an older document may sometimes contain information relevant to new times.