VZ and Huybers Comment and Reply

The von Storch and Zorita Comment and the Huybers Comment, together with our replies, were published by GRL this week. I previously posted up on VZ here and on Huybers here, here and here and have nothing to add at this time.

The original VZ Comment is here; our Reply is here; Huybers’ comment is here and our Reply here. All rights reserved to AGU; not to be reproduced without permission. VZ issued a press statement here.

Both VZ and Huybers agree that the MBH98 PC method is biased towards finding hockey sticks. VZ did not accurately implement the full hockey stick algorithm and this affects their results, so their affirmation of the "Artificial Hockey Stick" is based on only a partial implementation. VZ pointed out that our GRL simulations were in a red noise situation and argued that, in a situation where there was an actual signal, the mining tendency of the algorithm would not "matter". In such a situation, the real signal in effect competes with the mining tendency for the attention of the PC algorithm. With proxies that have a signal content, the signal will win out and even the biased MBH98 algorithm will find the signal; however, if the proxies have a very high noise level, then the MBH98 algorithm will find a hockey stick instead of the signal. The MBH98 proxies do not have the correlation to gridcell temperature hypothesized by VZ and a high noise situation applies.
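For readers who want to see the "mining" tendency in a red noise setting, here is a minimal sketch, not the code from our GRL article. It assumes plain AR(1) noise with a lag-one coefficient of 0.9, and network dimensions (581 years, 70 series, a 79-year calibration window) chosen only to loosely resemble the network under discussion. The function names and the simple "hockey stick index" are illustrative assumptions. It compares PC1 after centering on the calibration window only with PC1 after centering on the full period.

```python
import numpy as np

def ar1_series(n_years, n_series, rho, rng):
    """Generate independent AR(1) 'red noise' columns with lag-one coefficient rho."""
    noise = rng.standard_normal((n_years, n_series))
    x = np.zeros_like(noise)
    x[0] = noise[0]
    for t in range(1, n_years):
        x[t] = rho * x[t - 1] + noise[t]
    return x

def pc1(data, center_rows):
    """First principal component after subtracting column means taken over center_rows."""
    centered = data - data[center_rows].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

def hockey_stick_index(series, calib):
    """|calibration-window mean minus overall mean|, in units of the overall std."""
    return abs(series[calib].mean() - series.mean()) / series.std()

rng = np.random.default_rng(0)
n_years, n_series, n_calib = 581, 70, 79      # illustrative sizes (assumption)
calib = slice(n_years - n_calib, n_years)      # last 79 "years"
full = slice(0, n_years)

short_hsi, full_hsi = [], []
for _ in range(50):
    x = ar1_series(n_years, n_series, rho=0.9, rng=rng)
    short_hsi.append(hockey_stick_index(pc1(x, calib), calib))  # short-centered
    full_hsi.append(hockey_stick_index(pc1(x, full), calib))    # conventionally centered

mean_short = float(np.mean(short_hsi))
mean_full = float(np.mean(full_hsi))
print(f"mean hockey stick index, short-centered PC1: {mean_short:.2f}")
print(f"mean hockey stick index, full-centered PC1:  {mean_full:.2f}")
```

The series here contain no signal at all, so any difference between the two averages comes from the centering choice alone. Adding a common signal of varying strength to the columns would be the natural way to explore the VZ argument.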

If there are some "bad apples" i.e. nonclimatic hockey stick series, even 1-2 bad apples can substantially distort the result. Even 1-2 bristlecones in a network of over 50 other sites suffice to produce a seemingly "significant" PC using the MBH method. So it’s even more biased in an environment with bad apples than in a red noise situation.

As to Huybers, there is no reason to think that the correlation PC1 is a magic bullet for extracting a "signal" from the MBH98 proxy mess. It does give more weight to bristlecones – but surely that’s the point at issue. We do not say that the covariance PC1 is a magic bullet, but it’s what you would do if you were implementing a "conventional" analysis based even on Huybers’ statistical references [Preisendorfer and Rencher]. I’ve commented on this at length.

Huybers does not confront the problem of the failed cross-validation R2 statistic; I think that the referees should have required him to deal with this, but they did not. If you’ve got a failed R2 statistic, explaining a spurious RE statistic is only a matter of curiosity. We had shown how the flawed PC method was intimately related to spurious RE statistics. Huybers criticized our explanation. We modified this explanation slightly, implementing further details of MBH98, including the most up-to-date information on the source code. With only slight modifications, we once again showed that spurious RE statistics resulted from simulations (which included the re-scaling step proposed by Huybers). So our explanation that the RE statistic was spurious stands.
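To make the RE versus R2 distinction concrete, here is a small sketch under the usual textbook definitions. These are an assumption about the exact forms used in the papers: RE is one minus the ratio of the squared verification errors to the squared departures from the calibration-period mean, and R2 is the squared correlation over the verification period. The data are invented. The "reconstruction" gets the mean level of the verification period right but has no year-to-year skill by construction.

```python
import numpy as np

def reduction_of_error(observed, predicted, calibration_mean):
    """RE = 1 - SSE / (sum of squares about the calibration-period mean)."""
    sse = np.sum((observed - predicted) ** 2)
    baseline = np.sum((observed - calibration_mean) ** 2)
    return 1.0 - sse / baseline

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    return np.corrcoef(observed, predicted)[0, 1] ** 2

rng = np.random.default_rng(3)
n_verif = 50
calibration_mean = 0.0   # assumed mean of the calibration period

# Hypothetical verification period that is colder on average than calibration.
observed = -1.0 + 0.3 * rng.standard_normal(n_verif)
# Hypothetical reconstruction: right mean level, but its wiggles are
# independent noise, so it carries no interannual information.
predicted = -1.0 + 0.3 * rng.standard_normal(n_verif)

re_stat = reduction_of_error(observed, predicted, calibration_mean)
r2_stat = r_squared(observed, predicted)
print(f"RE = {re_stat:.2f}, R2 = {r2_stat:.2f}")
```

In this toy case RE is rewarded for the shift in mean between the two periods while R2 sees no relationship at all, which is why a high RE next to a failed R2 calls for a benchmark from simulations rather than a fixed threshold.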

Neither VZ nor Huybers discuss the peculiar association of MBH98 with the bristlecones. Huybers suggests that bristlecones would be good to research "in the future". Give me a break. This should have been done by MBH98 before they published, not after.

There’s nothing in either of these comments that resurrects MBH98 or refutes any of our findings.

Steve, regarding your comments above on the PC methodology: even if one were to find that one or two of the tree ring sites were “good” temperature proxies, in that there was no evidence of fertilization or other variables to explain 20th century growth, would that make this approach valid? I don’t think so. All of the tree ring sites are attempting to measure the same thing. If 99 of them show no response to temperature but 1 is correlated with 20th century temperature, it is probably just spurious correlation. PCA should only be used, if at all, to sort among DIFFERENT types of variables, such as to see if tree rings are the best proxy, compared to ice cores and sediments and so forth.

Steve, two of your links are not working. One is to your reply to VZ, and the reason is a typo in the URL that resulted in ".or" rather than ".org". I supplied the missing "g" and got your reply. The link to your reply to Huybers’ comment simply doesn’t work and results in a 404 "not found" error.

The last sentence of the published VZ article reads “Other concerns raised by MM05 [see e.g., Crok, 2005] about the MBH methodology have not been dealt with.” And the last sentence of the draft comment from Huybers reads “Those biases truly present in the MBH98 temperature estimates remain important issues, and corrections for these biases will be taken up elsewhere.” Two very significant statements. Can someone confirm that the last statement in the final version of the Huybers comment is as in the draft?

The final paragraph in the printed reply from Huybers reads “In summary, MM05 show that the normalization employed by MBH98 tends to bias results toward having a hockey-stick-like shape, but the scope of this bias is exaggerated by the choice of normalization and errors in the RE critical value estimate. Those biases truly present in the MBH98 temperature estimate remain important issues, and corrections for these biases will be taken up elsewhere”.

For VZ the reply ends “Finally, we note that we have strictly addressed here the question of the PCA-centering within the MBH98 algorithm. Other concerns raised by MM05 [see, e.g., Crok, 2005] about the MBH methodology have been not dealt with”.

At least Huybers commits in print to take up these other issues, and hopefully von Storch will as well.

A. Steve M. seems much more willing to dig into the math and to discuss the actual statistical arguments on his blog than is done here. There is even a comment from an author here in this thread that says you are “just observing” the argument. That’s not scientific. You should care about the answer and you should (at least Michael Mann) have the experience and the mathematical skill to engage on the issue. You should be reading both comment and reply and thinking about the issues and what is right and why…and ought to have an opinion. It’s not about “being proved right”, but about discovering the truth. You should dig into it. Let’s look at the algebra!

B. There are some extraneous posts to the hockeystick controversy (e.g. 7, 18, etc.). If you are going to police the skeptic sillies (and I admit there are many), you should do the same with the cheering choir section (re offtopic posts).

C. (On content) Both of the comments say that there were problems with the MBH method. They may disagree on the impact of it. But they still say that it was flawed. You should look into this (completely honestly and objectively) and acknowledge it or disagree with it based on mathematical, objective reasoning (not tendentious). Even if it doesn’t change the overall impact (and I’m not granting that), if it was technically wrong, you should acknowledge it. Otherwise, you’re defending a position rather than being a scientist. A scientist cares only for truth and will concede any point if he feels it to be true, regardless of some overall larger argument or some policy issue.

D. Given that the comments had replies (on topic and on content) regarding the RE and the PC centering issue, I don’t see how you can say that this is settled. Surely, you should at least summarize Steve’s reply argument so that you have captured the thinking. (If you can see a flaw in his thinking, point that out too, of course.) But just claiming victory seems off. Surely, you should do what any objective reader would do, which is to consider both comments and replies.

E. If you want Steve to come here and engage on the specific math arguments, I’m sure he would be glad to. However, his posts are not being allowed.

I tried posting a challenge on RealClimate to discuss the bristlecone pines the other day and it hasn’t appeared either. And I wrote a follow-up to that one asking if the previous message had been unsuitable and have seen no response to that either. Of course I don’t blame the people blocking messages they don’t want to appear for not responding off-line to say so since I’m sure they know that if they admit they’re censoring posts from non-sycophants they blow their cover and the fact will be all over the blogosphere (starting here) in nothing flat. But again, it’s still just a holding action since eventually if enough people are blocked from posting on their site, and make it known in their usual haunts, the truth will out.

So thanks for the Latin, George, but I’m still looking for a Latin phrase that says the equivalent of: “Delay, the next best thing to being right.”

Well, while we’re talking about it, I entered a comment into RealClimate a week or so ago which mysteriously disappeared. No e-mail explaining why or anything, just didn’t appear.

[For info: the comment was on a discussion of the use of vineyards in Europe as a proxy for regional temperature. William C dismissed vineyards as a proxy as they might be influenced by other factors as well as temperature. I observed that pretty much all proxy measurements have this shortcoming and using such an argument was a prime example of a mechanism for subconscious data selection, and very dangerous in a world of statistical analysis]

I was very careful how I worded my comment – no direct accusations, but it could probably be easily misread as such, so I wasn’t entirely surprised when it didn’t get published. But TCO’s comments above really should have been included.

RealClimate’s posting on this topic is disappointing, but unsurprising. They only post up half the story (the claim they couldn’t find the replies is pathetic – I’m sure they are well aware of this site) and comments like “Hockey Team 2, MM 0” just show the level of debate they wish to engage at; the censorship of dissenting scientific comments coupled with promotion of approving political comments speaks volumes.

RE#13: “William C dismissed vineyards as a proxy as they might be influenced by other factors as well as temperature. I observed that pretty much all proxy measurements have this shortcoming and using such an argument was a prime example of a mechanism for subconscious data selection, and very dangerous in a world of statistical analysis”.

It’s a beautiful looking site and the tone is very nice (well except a bit for the peanut gallery). The refusal to post comments on the substance worries me, though. Lambert’s site is a cut above in terms of free discussion.

They still haven’t put this post up. I think they got shamed into putting up Steve’s post when several people started sending it and the blogosphere started turning to see how RC was refusing to let him on.

A recent comment by Steve on [Huybers] being bright made me look at the comment by Huybers. I read the draft on his site and it seemed quite interesting. I disagree with the "come on" comment from Steve and the comment that Huybers should be "forced to address R2". This type of comment from Steve is what I call the "but A!" fallacy: unwillingness to deal with comments on B and wanting to shift discussion to A. Huybers was very clear (and Steve could learn from that) in disaggregating the issues. He fits right in with my contention that an error in methods should be provable as an error in methods and not conflated with other issues (e.g. CO2) given that BOTH errors in methods and data selection are being alluded to. I have some more comments on the Huybers comment, but will reread Steve’s response first. But on the issue of disaggregation of issues and argument one by one vice conflation I stand firm.

#20. Some of your comments here are both tiresome and incorrect. Huybers is bright, but his main field is elsewhere.

In one way, the exchange with him illustrates the inefficiency of journal exchanges as a means of resolving issues.

In our original article, the idea was simply to show that you could have a spurious RE statistic, which was then on the horizon as an issue. Huybers argued that this demonstration did not include one then unreported step of MBH; his own calculations left out another step (making proxies), which we then considered and got an RE benchmark similar to our original article.

I could reconcile precisely to what he did and I discussed this further on the blog with some ideas that I didn’t have at the time of our response. Our Reply completely out-flanked the original point. What should happen in a world not driven by little journal articles and little gotchas is that we reconcile our calculations and then present the results jointly.

I suggested that to Huybers, because I was totally convinced that any reasonable person would acknowledge that our Reply out-flanked his, but the reconciliation was worth reporting. I hadn’t met him at the time; I met him only last December, and maybe the approach would have worked now. The hard thing for readers is that their take-away views are mostly driven by bias. People who want to think poorly of us will think that Huybers has a gotcha; people who don’t will think that we’ve replied fully.

In fact, we have replied fully. I worry about these things more than anybody – if there’s something wrong with our position, I want to know and get it out of the way. But there is no loose end on the Huybers front.

1. As I said, I reread the Huybers draft and am going through your reply right now, Steve. I may be tiresome, but I’m correct, not incorrect, when I observe that a correct comment by Huybers on “B” is irrelevant to your desire to discuss “A”. Huybers was very clear when he said that the issue of bristlecone CO2 was a separate issue. He and I are right, right, right, right in being scientifically detached and examining issues one by one without conflating criticisms. You are not.* I won’t belabor the “editorializing” further unless there is something really new that you have to say that breaks through.

2. I’m glad that you found an error in his RE calcs and that he found one in yours. His comment was still useful in pointing out the error. (And please don’t get enraged and point out that the error was due to Mann not documenting his methods…I agree that this was outrageous.) I use the term error in its broadest meaning and from the standpoint of variance from ultimate truth=error vice error=mistake by individual that we should laugh at him for.

3. I’m getting around to posting something from that whole big book that the Tukey article was in. Just got it from ILL.

*I was so blown away by your comment that you didn’t want to publish small papers to address individual points as that ‘might make it look like the Team had won’ that I just didn’t comment on it until now. Not only is your approach completely wrong from the standpoint of the detachment of a truth-seeking scientist. It’s even wrong tactically from the standpoint of your campaign. That little comment disappointed me so much I didn’t want to post for a while (then I had some gin and relapsed….)

I realize I’m jumping into an old discussion, so sorry if I missed something. Steve makes the comment that the Mann procedure ‘mines’ for a hockey stick, while VZ state that a true signal will compete with the mining effect, and make it ‘not matter’. Has anyone ever bothered to create a series of real data, but data from bizarro world where the last decade of the ’90s was actually the coldest in the past 400 years? Not red noise, but data with an actual signal. Would Mann still find the signal? Let’s say there was data from one area, Capistrano, and all those Swallows made it appear warmer there than it really was? Hockey stick? How many ‘Capistranos’ would it take? And would something like that really prove anything? And, is there any worth to a demonstration like that, given the recent NAS paper?

And TCO, if you can remember back several weeks, what sort of gin was that?

re #23: Yes, Mann would still find a hockey stick. The reason is, as again stated by Steve, for PCA (and for the linear regression used also) the sign is immaterial. So in my understanding of the (whole) MBH procedure it is enough to have a (few) series for which there is a “blade” in the end. It does not matter if the blade is pointing actually upwards or downwards, it will get “corrected” later.
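The sign point can be checked on toy data. The sketch below is an illustration with invented series, not the MBH procedure itself: a hypothetical target with a late upturn, one proxy with its "blade" pointing up, and the same proxy flipped. An ordinary least-squares fit with an intercept stands in for the regression step, which is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical target with a late upturn, standing in for instrumental temperature.
target = np.concatenate([np.zeros(150), np.linspace(0.0, 3.0, 50)])
proxy_up = target + 0.3 * rng.standard_normal(n)   # blade pointing up
proxy_down = -proxy_up                              # same series, blade pointing down

def ols_fit(proxy, target):
    """Least-squares fit of target on one proxy with an intercept: (fitted, slope)."""
    design = np.column_stack([np.ones_like(proxy), proxy])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef, coef[1]

fit_up, slope_up = ols_fit(proxy_up, target)
fit_down, slope_down = ols_fit(proxy_down, target)

def pc1_scores(data):
    """Scores on the first principal component of column-centered data."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

# Four hypothetical low-variance noise series sharing the network with the proxy.
others = 0.5 * rng.standard_normal((n, 4))
pc_up = pc1_scores(np.column_stack([proxy_up, others]))
pc_down = pc1_scores(np.column_stack([proxy_down, others]))
pc_corr = np.corrcoef(pc_up, pc_down)[0, 1]

print("fitted values identical:", np.allclose(fit_up, fit_down))
print("slopes are negatives of each other:", np.isclose(slope_up, -slope_down))
print("|correlation| between the two PC1s:", round(abs(pc_corr), 6))
```

The regression simply changes the sign of the coefficient, and PC1 is the same series up to an overall sign, so an upside-down blade ends up pointing whichever way the target requires.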

I have not proved this, but I think (speculate) that the fundamental property of the series responsible for the hockey stick creation under Mannian methods is not actually redness of the proxies; all you need is a series whose mean in the calibration period departs from the overall mean. If you have autocorrelated proxies, this is more likely than with “white noise” series. I tried to illustrate that a long time ago here (my first post to Climate Audit :), but it was ignored by the readership :( IMHO, it shows nicely the difference between Mannian PCA and normal PCA.

Where is this critical? Well, MBH used PCA to reduce the dimensionality of their tree-ring networks. Thereby, Mannian PCA creates some hockey stick shaped “proxy indicators” before the actual calibration and estimation procedure. This is the main critique published in McIntyre and McKitrick, GRL 2005, and clearly acknowledged in the NAS report (see pp. 86-88). However, IMHO, the NAS panel failed to see the implications of this.

Mann tried to argue that the normal PCA would also have those hockey sticks in the lower numbered PCs, and one should include more PCs in the case of real PCA (so that those hockey sticks get included). This is complete BS: among other arguments, the true PCA is the BEST (in mean square sense) summary of data, so it is impossible to have a situation where another linear summary of data (Mannian PCA) would need fewer “PCs” than the true PCA.

The VZ critique, on the other hand, is irrelevant as one has to see the Mannian PCA procedure in context: after the “PCs” are selected they are _equally_ valid “proxy indicators” and the NH temperature is reconstructed from these proxy indicators. In other words, suppose one has a “true signal” (say PC1) and a spurious hockey stick signal (say PC2), and they both get selected. If the “true” signal does not have “a blade”, the spurious signal will outweigh it in the later MBH steps. See the MM reply to VZ for a better explanation. Essentially, IMO, in the case of MBH tree ring proxies (NOAMER especially), the thing is that the “true” temperature signal is either very weak or nonexistent, so it gets dominated by a few pines which have a spurious correlation to Mann’s temperature proxies.

I think Steve has demonstrated a hockey stick with some stock data if that’s what you were interested in.

Now I take it that a question on the table here is – if the true history is a HS, then does the bias “matter”? The answer is a bit subtle and has got lost in much of the debate, but the NAS Panel has done a decent job here.

If the true history is a HS, then you should be able to get a HS without using Mannian methods e.g. with an average. However, contrary to Bloomfield’s statement at the NAS press conference, if you take the average of MBH proxies, you don’t get a HS. You need to mine the data to get a HS – the PC method is one component; the regression module is another under-discussed component. Now PCs are not the ONLY way to mine the data to get a HS. If you just manually comb through the data and select HS-shaped series, perhaps arguing that ex-post only these series are “temperature-related”, you can also get a HS. Arguably something like this happens in the “other” HS studies.

The main problem with Mannian PCs, in combination with their regression method, is that they are “too powerful” in detecting a HS. Because they will produce a HS from noise, one cannot claim that any given HS is statistically significant. The NAS Panel caught this issue pretty well and more or less rejected MBH claims to statistical skill.

None of the “other studies” made comparable claims to statistical skill and that’s presumably why they were less prominently featured by IPCC.

“the true PCA is the BEST (in mean square sense) summary of data, so it is impossible to have a situation where another linear summary of data (Mannian PCA) would need fewer “PCs” than the true PCA.”

Jean, based on what I’m reading, the PCA method creates a basis set of eigenvectors. Any other set of PCs then would just be linear combinations of the “best” PCs anyway, correct?

re #26: Yes, you can (linearly) recreate data from any basis you have. Also, the change of basis is a linear transform. The PCA gives the best basis in the sense that if you take n basis vectors (PCs) corresponding to the largest eigenvalues and any other n vectors obtained from the original data with a linear transformation (i.e., any other basis vectors), then (the projected) PCs give you the smallest error (MSE) with respect to the original data. In Karhunen-Loeve transform (KLT) jargon, this is sometimes called the “best approximation property of the KLT”.
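The best approximation property is easy to check numerically. The following is a small sketch on invented data (sizes and names are arbitrary assumptions): it projects centered data onto the leading k principal directions and onto 200 randomly drawn k-dimensional orthonormal bases, and compares the mean squared reconstruction errors.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_vars, k = 300, 8, 3

# Invented correlated data: white noise passed through a random mixing matrix.
mixing = rng.standard_normal((n_vars, n_vars))
data = rng.standard_normal((n_obs, n_vars)) @ mixing
data -= data.mean(axis=0)            # conventional (full-period) centering

def projection_mse(data, basis):
    """Mean squared error after projecting rows onto the orthonormal columns of basis."""
    approx = data @ basis @ basis.T
    return float(np.mean((data - approx) ** 2))

# Leading k principal directions = first k right singular vectors.
_, _, vt = np.linalg.svd(data, full_matrices=False)
pca_mse = projection_mse(data, vt[:k].T)

# Competing k-dimensional orthonormal bases drawn at random.
other_mse = []
for _ in range(200):
    q, _ = np.linalg.qr(rng.standard_normal((n_vars, k)))
    other_mse.append(projection_mse(data, q))

print(f"PCA basis MSE:             {pca_mse:.4f}")
print(f"best of 200 random bases:  {min(other_mse):.4f}")
```

No competing basis of the same size can come in below the PCA error, which is the property invoked above. Random bases are only a spot check, of course, not a proof.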

The CA text mentions the error with a biased estimate, btw. I haven’t gotten into the details yet to speak intelligently on this subject. Most everything I do assumes zero mean so there is never an issue with that sort of bias (this is because the input to an antenna element, for example, is AC coupled, so there is no DC component… this is not completely the case in practice, however, and is the subject of much study).

TCO, if the true mean is removed from a block, covariance and correlation matrices are identical. Or is that not what you’re getting at? By removing only the mean from an arbitrary interval, there will be a difference, the bias I mention above, in using covariance vs. correlation.

TCO, if it makes you feel any better, the NAS Panel cited our Reply to Huybers and agreed with our position on covariance versus correlation i.e. that neither was a priori "right". They rejected Huybers’ argument that correlation was in some sense preferable on a priori statistical grounds, specifically stating that use of one or the other in a specific situation would have to be justified scientifically, not by a priori statistics, a position equivalent to the one in Reply to Huybers on this point. Our own position is that any output from PCA as applied to a tree ring network has to be proved as a temperature signal somehow and that Preisendorfer’s Rule N merely proves that there is a pattern, not that the pattern is a temperature signal.

I think the reason correlation and covariance are neither right could lie in higher-order statistics that seem to be present in the data. Rather, I think the mean as well as variance are time varying, w.r.t. tree-ring width/density, likely due to the abundance of other factors that influence their growth. When your statistics vary, your choice of mean/variance is, as shown, dependent upon your window size, interval choice, etc. Removing the mean of the whole sample gives one result, removing it for interval [t1,t2] gives another, for [t3,t4] yet another, etc.

Mark, I suppose you are the Mark who is a starting PhD student and currently studying ICA?? If so, I propose a few questions for you to think about while you continue your studies: in PCA, PCs are naturally ordered according to their corresponding eigenvalues (variances), but in ICA all the components are assumed to have unit variance. Now
1) Why is it possible to assume unit variances?
2) Is there a correct way to order ICs (=independent components)? If so, what is the correct criterion?

Question 1) is an easy one, and you can find the answer in many places. Question 2) is a rather hard, research-level question, and you are unlikely to find a direct answer anywhere in the literature. Once you think you have the right answers, email me (jean_sbls@yahoo.com) and I’ll give you the correct answers ;)

1. Obviously if the techniques give you different results then one is more appropriate than the other. Which to use depends on the situation.

2. I thought your comments in emails about either one being a priori defensible were fine. The problem came in discussion on this blog and in your reply to Huybers where you took the interaction to be purely about defending your honor and made a one-sided defense of the covariance matrix. In addition to being one-sided, you were either tendentious or not thinking well enough with your remarks about “units”. As I clearly showed by both common sense and referral to people that I googled, it’s not just an issue of units, but an issue of scale of variance versus relevance to the problem. A responsive tree versus an unresponsive tree for instance. If you knew this but failed to share it, (when asked!)…that’s TENDENTIOUS. If you didn’t think it through…well now you know.

3. In addition, your discussion in general on this blog (Mark’s confusion is perfect example btw) as well as the GRL paper (but not the EE paper…but I’m TALKING about the GRL paper), conflates acentricity with standard deviation dividing. Examine one factor at a time! The Huybers paper was helpful in clarifying this. And don’t tell me about the EE paper. Touching second base once doesn’t allow you to skip it next time around the bases. And this is NOT an issue of having a long explanation in one area and then citing it in another. Because you don’t tend to do that. Instead you “simplify” and conflate issues and do so in the way which overmagnifies the acentricity issue. At a maximum it’s dishonest. At a minimum it’s sloppy, poor communication.

Mark T, read the Huybers comment. Heck, read the remark in the EE paper. There is a difference with the correlation and covariance matrix PCs (especially important if only a few are retained) even when the proxies are centered on the mean for the overall interval not 20th century.

They are each flavors: centered/uncentered and divide by standard deviation/don’t divide. It’s a 2 by 2.
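The 2 by 2 can be written down directly. This is a sketch on invented data, not a replication of anything in the papers. The two switches are where the mean is taken (full period or a late "calibration" window) and whether each series is divided by its standard deviation. As a simplification, the standard deviation is always computed over the full period so that only one factor changes at a time. All names and sizes are assumptions.

```python
import numpy as np
from itertools import product

def leading_pc_share(data, short_center, divide_by_std, calib):
    """Share of the total sum of squares carried by PC1 for one flavor."""
    reference = data[calib] if short_center else data
    prepared = data - reference.mean(axis=0)
    if divide_by_std:
        # Full-period standard deviation in every cell (simplifying assumption).
        prepared = prepared / data.std(axis=0, ddof=1)
    singular_values = np.linalg.svd(prepared, compute_uv=False)
    return float(singular_values[0] ** 2 / np.sum(singular_values ** 2))

rng = np.random.default_rng(6)
n_years, n_series = 300, 10
# Invented network: white noise series with differing variances.
data = rng.standard_normal((n_years, n_series)) * rng.uniform(0.5, 3.0, n_series)
data[-50:, 0] += 4.0                      # one hypothetical series with a late shift
calib = slice(n_years - 50, n_years)

grid = {}
for short_center, divide_by_std in product([False, True], repeat=2):
    grid[(short_center, divide_by_std)] = leading_pc_share(
        data, short_center, divide_by_std, calib)

for (short_center, divide_by_std), share in grid.items():
    print(f"short-centered={short_center!s:5}  divide-by-std={divide_by_std!s:5}  "
          f"PC1 share={share:.3f}")
```

The cell with full centering and division by the standard deviation is ordinary correlation-matrix PCA, and the cell with full centering and no division is ordinary covariance-matrix PCA. Comparing along one axis at a time isolates the centering effect from the scaling effect.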

#38. TCO, if anything Steve is too even-handed with respect to correlation PCs. I vote for covariance matrix PCA, and if your data are in incommensurable units then sort that out first in a separate step.

When I learned PCA (not that it’s used much in economics), and when I’ve seen it discussed in stat package manuals, and in Rencher, which was Huybers’ citation, the use of correlation matrix PC’s gets a caveat that the covariance approach does not. If the data are already in common units then further standardization removes information you want the optimization to account for. There’s also a problem that correlation PCs (unlike covariance PCs) have a non-unique expansion, but I’ve forgotten the details just now (it’s in Rencher as well).

Dividing by the SD is just one of a zillion ways to transform a data series into a common index. Huybers tried to defend correlation PCs by citing the presence of 2 density series amidst 68 width series, but as we pointed out these are redundant in the data base since the width series from the same sites are also present, so applying the additional normalization overstates the role of those 2 sites. So, going with the discussion in Rencher, a reasonable person would conclude that for the MBH data the covariance matrix option is preferred. However, we didn’t want people to think the argument is simply one versus the other; the bigger point in that context is that if the results are robust it shouldn’t matter. But, if you were my student and you brought the MBH data set in, I would say that if you must use PCA, use covariance PCA.

If, on the other hand, you had, say, interest rates, stock prices, sulfur emissions, population and US National GDP statistics in $trillions, then you’d need to scale them into comparable units somehow. You could index them or standardize them, whatever makes sense. Then do covariance PCA. If you standardize by dividing by the SD you’d have a correlation PC, but you need to be sure that that’s a sound way to scale the data. So it should be a separate step with its own justification. Applying correlation PCs slips in a standardization step without saying why it’s the right one.
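The equivalence described here can be verified on made-up numbers. In the sketch below, three hypothetical variables are in wildly different units. It checks that the covariance matrix of the standardized data is the correlation matrix of the raw data, and shows how the leading covariance PC of the raw data is dominated by whichever variable has the biggest numbers.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
scales = np.array([1.0, 10.0, 1000.0])      # hypothetical variables in very different units
raw = rng.standard_normal((n, 3)) * scales
raw[:, 1] += 5.0 * raw[:, 0]                # induce some correlation between two of them

# Explicit standardization step: subtract the mean, divide by the standard deviation.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
cov_of_standardized = np.cov(standardized, rowvar=False)
corr_of_raw = np.corrcoef(raw, rowvar=False)

def leading_loadings(matrix):
    """Absolute loadings of the eigenvector with the largest eigenvalue."""
    _, vecs = np.linalg.eigh(matrix)        # eigenvalues returned in ascending order
    return np.abs(vecs[:, -1])

cov_loadings = leading_loadings(np.cov(raw, rowvar=False))
corr_loadings = leading_loadings(corr_of_raw)

print("cov(standardized) equals corr(raw):",
      np.allclose(cov_of_standardized, corr_of_raw))
print("leading covariance loadings: ", np.round(cov_loadings, 3))
print("leading correlation loadings:", np.round(corr_loadings, 3))
```

So "correlation PCA" is covariance PCA plus one particular rescaling, and the rescaling is where the judgment call lives. Any other indexing scheme could be substituted in the standardization line and would need its own justification.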

Yes, there is a danger in transforming inappropriately (as with not transforming). I think the best way to discuss this is reasonably pointing out the issues and the reasons for one versus the other. My issue with Steve had to do with the argumentative stance in discussion (for instance with “units” or appeals to texts when he is capable of discussing things forthrightly). In addition, I feel that he has (in GRL and in the blog) conflated “Mannianism” (i.e. the bizarre, never-seen-in-the-world acentricity) with the decision on correlation/covariance. I agree that you all note the issue in the EE article more appropriately, although even here I prefer the clarity of a Huybers or Burger/Cubasch explanation. And I don’t think that noting things correctly in EE allows you to conflate things in other discussions.

1. Jean, I don’t want to overstate the impact of the (unprecedented) centering convention by mixing in a (precedented and very debatable) “dividing by standard deviation convention”. It’s just common sense when you’re testing the effect of variable A, not to simultaneously change variable B (along with A) in your test comparison.

2. Read the Huybers comment. He really lays this out nicely.

3. Look at Burger and Cubasch flavor work (as a framework) for how to do logical analysis of multiple issues.

TCO, after this I’ll give up. I still don’t understand what your problem is. Mann did not remove the mean. He removed the mean of the calibration period. Mann did not divide by the std. He divided by the std of the calibration period. So are you asking exactly which one of those is causing the hockey stick? Or are you asking which would have been a “correct” way to proceed, correlation/covariance PCA? What part exactly should I re-read from Huybers?

I was not aware that his std division was for the calibration period as well.

But still, the thing that was bizarre was the off-centric nature (of either, mean or std deviation). Steve conflates issues and overstates the impact of the bizarre by not dividing by std deviation in his test case.

Mark was using the correlation matrix in the sense of engineering. Mark, in statistics, by the correlation matrix it is meant the matrix of correlation coefficients.
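Since the two usages keep colliding in this thread, here is a short numerical sketch on invented data. It compares the engineering "correlation matrix" (the second-moment matrix E[x x^T]), the covariance matrix, and the statistician's matrix of correlation coefficients.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
# Invented data with non-zero means and unequal variances.
x = rng.standard_normal((n, 3)) * np.array([1.0, 2.0, 5.0]) + np.array([0.5, -1.0, 3.0])

second_moment = x.T @ x / n                        # sample estimate of E[x x^T]
mean = x.mean(axis=0)
covariance = np.cov(x, rowvar=False, bias=True)    # bias=True divides by n
corr_coeff = np.corrcoef(x, rowvar=False)          # matrix of correlation coefficients

# With the mean removed, the second-moment matrix reduces to the covariance.
x0 = x - mean
second_moment_zero_mean = x0.T @ x0 / n

print("E[xx^T] = covariance + mean mean^T:",
      np.allclose(second_moment, covariance + np.outer(mean, mean)))
print("zero-mean E[xx^T] = covariance:",
      np.allclose(second_moment_zero_mean, covariance))
print("correlation-coefficient diagonal is all ones:",
      np.allclose(np.diag(corr_coeff), 1.0))
```

For zero-mean data the engineering correlation matrix and the covariance matrix coincide, which is the sense in which they were called identical earlier in the thread. The correlation-coefficient matrix additionally divides out each series' standard deviation, and that is the step at issue in the covariance versus correlation PC discussion.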

I was speaking of the same thing… At least, the correlation matrix I mention is E[XX.’] where .’ is the matrix transpose (I use that rather than ‘ as the latter is Hermitian in Matlab). The result is R = [r00 r01 r02…;r10 r11 r12…;r20 r21 r22…]. Unless you are referring to the ultimate correlation matrix of R = [1 r01;r10 1] relating the correlation, I assume, of the “signal” with temperature?

Yes, I am the same Mark starting on the PhD, btw. Picking a topic has muddled my brain so excuse me if I say something out of line enough to cause confusion. That is not my goal.