More on MBH98 Cross-Validation R2

I have previously discussed here and here Mann’s answer to the following question from the House Committee:

"7 c. Did you calculate the R2 statistic for the temperature reconstruction, particularly for the 15th Century proxy record calculations and what were the results?"

Mann stated:

"My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model, based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in the view of other reputable scientists in the field, it is not an adequate measure of “skill.”

My previous discussion was based on the Supplementary Information where the cross-validation R2 statistic is notably not reported. However, I did not discuss the following Figure from the Nature article itself, which is well worth discussing.

Original Caption to MBH98 Figure 3. Spatial patterns of reconstruction statistics…. For the r2 statistic, statistically insignificant values (or any gridpoints with unphysical values of correlation r < 0) are indicated in grey. The colour scale indicates values significant at the 90% (yellow), 99% (light red) and 99.9% (dark red) levels (these significance levels are slightly higher for the calibration statistics which are based on a longer period of time). A description of significance level estimation is provided in the Methods section…. The Methods section quoted in the caption says only: "For comparison, correlation (r) and squared-correlation (r2) statistics are also determined."

Before discussing the above figure, Mann’s full response to the House Committee on this question was as follows:

A(7C): The Committee inquires about the calculation of the R2 statistic for temperature reconstruction, especially for the 15th Century proxy calculations. In order to answer this question it is important to clarify that I assume that what is meant by the “R2″ statistic is the squared Pearson product-moment correlation, or r2 (i.e., the square of the simple linear correlation coefficient between two time series) over the 1856-1901 “verification” interval for our reconstruction. My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model, based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in the view of other reputable scientists in the field, it is not an adequate measure of “skill.” The statistic used by Mann et al. 1998, the reduction of error, or “RE” statistic, is generally favored by scientists in the field. See, e.g., Luterbacher, J.D., et al., European Seasonal and Annual Temperature Variability, Trends and Extremes Since 1500, Science 303, 1499-1503 (2004).

The last sentence deserves some analysis, since, other than the Luterbacher article cited here by Mann, I have been unable to find an article in which the RE statistic is used without also quoting the R2 statistic. Last year at realclimate, Mann cited Cook et al [2004] as additional support for this position, but dropped this citation after I pointed out that Cook et al [2004] also report R2 statistics (which were significant). It would be nice to check Luterbacher’s work to see if he has a significant R2 statistic, but, unfortunately, Luterbacher has not archived any data to check this. (He published in Science, which has a poor track record in this regard – I’ll post about this some time.) However, a full analysis of this last sentence will have to wait for another day.

In the original SI, the cross-validation R2 statistic was not reported. You can see columns for calibration beta (which is equivalent to the calibration period R2) and for the verification beta, plus some r^2 and g^2 statistics pertaining to Nino, but, if you look closely, there is no verification R2 statistic. We remarked on this in MM05a and MM05b.

We then pointed out that the source code shows that the cross-validation R2 statistic was calculated for each step; indeed, that is how the table in the SI was produced. A poster at timlambert.org recently drew attention to this Figure, presumably to refute the idea that the R2 statistic had not been presented by MBH98. So what does Figure 3 in the original article show?

The text says the following:

In the reconstructions from 1820 onwards based on the full multiproxy network of 112 indicators, 11 eigenvectors are skilfully resolved (nos 1–5, 7, 9, 11, 14–16) describing ~70–80% of the variance in NH and GLB mean series in both calibration and verification. (Verification is based here on the independent 1854–1901 data set which was withheld; see Methods.) Figure 3 shows the spatial patterns of calibration β, and verification β and the squared correlation statistic r2, demonstrating highly significant reconstructive skill over widespread regions of the reconstructed spatial domain. 30% of the full spatiotemporal variance in the gridded data set is captured in calibration, and 22% of the variance is verified in cross-validation. Some of the degradation in the verification score relative to the calibration score may reflect the decrease in instrumental data quality in many regions before the twentieth century rather than a true decrease in resolved variance.

So what we have here in Figure 3 is a graphic showing cross-validation R2 statistics by gridcell for the AD1820 step which has 112 "proxy" series and very different success than the controversial 15th century step. The "full" multiproxy network of 112 "proxies" includes 12 instrumental temperature series which are hardly "proxies" for temperature. One would expect some "skill" in reconstructing temperature, especially in the northern Europe area, when you are spotted 12 actual temperature series. In this network, the cross-validation R2 statistics were favorable and they were not only reported but presented in a prominent graphic.

MBH posted up a graphic demonstrating high cross-validation R2 statistics in a step when they obtained high cross-validation R2 statistics. Most readers would conclude that similar results applied in other steps. However, if the SAME graphic is done for the AD1400 network (I’ve done this and will try to locate it and post it up), the graphic is extraordinarily unfavorable. They’ve provided results for the AD1820 step, but not the AD1400 step, which is the controversial one. So it’s a little hard to reconcile MBH98 Figure 3 with the answer to question 7C to the House Committee.

128 Comments

I’m sure Barton will be calling on you to sort out these little technical details, Steve, as I’m sure he doesn’t have anyone on staff that can make sense of the response, and I doubt the folks who actually do this work every day will contract out to give him a hand. Make sure your v-mail doesn’t get full.

Mann says “My colleagues and I did not rely on this statistic in our assessments of ‘skill’ (i.e., the reliability of a statistical model, based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in the view of other reputable scientists in the field, it is not an adequate measure of ‘skill.'”
It appears that “not adequate” may be a way of saying “not sufficient.” In other words, Mann is staking out ground to say that R2 is not a sufficient condition for accepting a model. Others can agree with that and still find it completely implausible that he would omit R2 from his results.

Quote
“My one (anonymous) interaction with Mann, his co-workers and his critics was last year when I acted as a referee for an exchange of views submitted to Nature. After a couple of iterations I came to conclusion that I simply could not understand what was being done sufficiently well to judge whether Mann’s methodology was sound, but I certainly would not endorse it. At least one other referee came to same conclusion.”

There are 11 calculation steps in MBH98 with changing proxy networks. MBH98 (including SI) did not report the r2 test for any step prior to the AD1820 step. So the answer to your question is: MBH98 (and also MBH99).

The two statistics are complementary. Spuriously high RE statistics can arise (as we showed in our GRL paper) and the R2 statistic is a check against that. Both statistics should be reported.
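For concreteness, here is a minimal sketch in Python of how the two verification statistics are computed, using the standard textbook definitions rather than MBH’s actual code: RE is benchmarked against the calibration-period mean, while r2 measures correlation alone.

```python
import statistics

def re_score(obs, pred, calib_mean):
    """Reduction of Error: 1 minus the sum of squared errors, divided by the
    sum of squared deviations of the verification observations from the
    calibration-period mean."""
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ssm = sum((o - calib_mean) ** 2 for o in obs)
    return 1.0 - sse / ssm

def r2_score(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo, mp = statistics.fmean(obs), statistics.fmean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (vo * vp)

# toy verification period: here the reconstruction tracks the observations
# closely, so BOTH statistics come out high
obs  = [0.1, -0.2, 0.3, 0.5, 0.0, -0.4, 0.6, 0.2]
pred = [0.0, -0.1, 0.2, 0.4, 0.1, -0.3, 0.5, 0.3]
print(round(re_score(obs, pred, 0.0), 3))  # → 0.916
print(round(r2_score(obs, pred), 3))       # → 0.93
```

When a reconstruction genuinely tracks the verification data, the two statistics agree; the interesting cases, discussed below, are the ones where they diverge.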

A cross-validation r2 for the AD1400 step of 0.02 does not demonstrate "skill" and, in my opinion, would have seriously affected the reception of MBH98 if referees and readers had been aware of it.

But here’s an example of what I mean: MBH98 claimed to use “conventional” PC calculations, yet theirs were of the non-centered variety. Mann et al. even later admitted that “conventional” would denote centered PC calculations. It seems to me that either the reviewers (1) didn’t know what “conventional” meant in this context, (2) knew what “conventional” meant but didn’t realize “non-conventional” means were used in the actual calculations, or (3) actually assumed that “conventional” referred to non-centered PC calculations (which would match how the calcs were done but would be incorrect terminology and the wrong assumption according to Mann et al’s later admission). I would say that in any of those cases, the reviewers were confused and couldn’t “make sense” of what they were reviewing.

David H: thank you for the interesting quote about an apparent panel discussion. However, the quote is not with respect to MBH98, but rather to something else. The issue brought forth by MJ is that the reviewers for MBH98 “apparently couldn’t make sense of a lot of things” – therefore I’m seeking evidence for this assertion.

Steve: I agree that an r^2 of .02 is too low; I’m quite sure folk who do this for a living can scan a figure and if they have concerns about the gray blotches, they could contact M, B, or H and get further details.

Certainly with a first paper, you want to look at that and see if there is room for improvement in the future. As we see in subsequent papers, the time period mentioned has been further refined. None of these subsequent refinements to the initial foray indicate temps higher than today.

How is that “hypothetical?” Is there any other rational explanation for why the reviewers would allow “conventional” to be used to represent “non-conventional?”

Surely you don’t mean to suggest the reviewers knew what “conventional” meant, knew MBH used “non-conventional” methods, but decided to allow the term “conventional” in the publication to be used instead of “non-conventional.”

You claimed (in current #2) that the reviewers of MBH98 apparently couldn’t make sense of the material provided. Surely there is evidence of this confusion – correspondence, e-mail, notes, phone logs, transcribed conversations, second-hand accounts, lipstick scrawled on a napkin, quotes lost to linkrot.

I’m sure you have an evidentiary basis for your assertion. I’m just asking you to pony up is all, and not suggesting anything with regard to redefining terms, conventions as to usage, etc.

I have provided one significant example demonstrating an apparent lack of understanding on the part of the reviewers of what MBH submitted (that is, unless maybe you think it was a conspiracy to deceive).

It will require some logic to get you through the process. Try it, you might like it.

With reference to referees, the quote I used is about MBH98 but not prior to its original publication. Because of the many simple minor errors I feel sure that the original reviewers could not possibly have looked beyond what was actually going to be published and would have no idea that this would be a controversial paper.

On the other hand Steve has given a blow by blow account of his attempt to publish his audit in Nature last year, and the quote refers to the refereeing of the Mann et al. response which Nature would have published alongside Steve’s. With two of the referees unhappy with, or at least unable to understand, the justification of Mann’s use of non-centred PCA, there is good reason for some to smell a rat. Nature wanted Steve’s work written this way, then that, but I think it may be that they were stalling and in the end declined because they could not get Mann’s reply through the referees.

Well, I hope you’re having fun Dan0. Of course everyone here knows what you’re up to, but it’s always nice to see a new Mann-shill drop by. Maybe someday we’ll have the great mann himself show up and outpompous you all. Unless, of course, one of the regulars is actually him in disguise.

Which brings up an interesting question. Some of the people here, both pro- and anti- Mann don’t go by their actual names. If someone well known were to show up in disguise, could that person be recognized by language used or specific points made or avoided? Or, to make this just a teensy bit on-topic, how many words by a known writer and an incognito writer each would you need to make a good guess, statistically speaking?

Danzero, “Others can agree with that [R2 is not a sufficient condition for accepting a model] and still find it completely implausible that he would omit R2 from his results.

Pardon me, but I’m confused:

1. are you stating Mann omitted reporting r^2 in a paper? Which paper omitted r^2 from the results?
2. And are you stating that Mann is correct in using RE instead of r^2?

Thank you,

D”

You are indeed confused. See the word “sufficient” – do you understand what that means?

1. MBH98 did not publish ALL of the RELEVANT r2 results that are now known to have been calculated
2. No, RE ALONE is inadequate, or not sufficient on its own. You can have a “good” RE yet lack significance. RE should be cross validated. r2 alone is also not sufficient.

What you seem incapable of understanding (although I am sure that really you understand all too well) is that the selective use of results to cast your conclusions in the most favourable light is unacceptable in all scientific fields. If you possess information that invalidates your conclusions and you fail to release that information, what does that say about your integrity?

The evidence here is pretty damning, and all you can do to dispute it are a few semantic quibbles ?

Steve: I agree that an r^2 of .02 is too low; I’m quite sure folk who do this for a living can scan a figure and if they have concerns about the gray blotches, they could contact M, B, or H and get further details.

ROTFLMAO.

Danzero, are you sure you have been following this discussion? Many people here have contacted M, B, or H, to get further details. Without a subpoena from Congress, all of them (myself included) have gotten nothing.

The Idsos’ CO2 site has access to US temperature graphs for various stations in the US. You have to pay to subscribe, and I looked at the graph and fit for Aberdeen, MS, 1890 to 2000, using default values. The linear regression had an R2 of 0.0% and F statistics which seemed reasonable.

R2 of 0.0% ??????

Conclusion? Temperature variation had nothing whatsoever to do with time, so the linear fit computed is specious. It isn’t warming or cooling, and this verifies Brignell’s point on his NumberWatch site that a linear trend can be fitted to any truncated data series, but it is essentially meaningless. But visually one senses that there is some sort of trend, cooling, so the issue remains. I find that using a log scale for the Y axis tends to dampen the extreme values.
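The point about fitting trends to truncated series can be illustrated with a toy calculation; pure white noise stands in here for a station record, since the station data themselves are not reproduced in this thread. A least-squares trend line can always be fitted, but its R^2 shows it explains essentially nothing.

```python
import random

random.seed(7)

def linear_fit_r2(y):
    """Least-squares fit y ~ a + b*t over t = 0..n-1; returns (slope, R^2)."""
    n = len(y)
    t = list(range(n))
    mt = sum(t) / n
    my = sum(y) / n
    sxy = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    sxx = sum((ti - mt) ** 2 for ti in t)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)   # fraction of variance explained by trend
    return slope, r2

# 111 "years" of pure white noise: the fit always returns SOME slope,
# but R^2 is near zero, so the trend is statistically meaningless
noise = [random.gauss(0, 1) for _ in range(111)]
slope, r2 = linear_fit_r2(noise)
print(slope, r2)
```

A nonzero fitted slope with a near-zero R^2 is exactly the situation the commenter describes: the regression runs, but it resolves nothing.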

I recall that some urban stations had R2 > 50% which is a reasonable fit of the linear regression to the data, but still not being able to explain all of the variation.

From that one might be permitted to note that urban heat island effects are causing the CO2 content of the atmosphere to increase. Since it is the same atmosphere at Aberdeen as that at NYC say, given total mixing of atmosphere components by circulation etc etc, it is pretty clear that increasing CO2 is having nil effect at Aberdeen.

Similarly if Mann et al computed R2 stats that were essentially zero, then it is fairly obvious what the conclusion is.

Ok, I was serious about my question… I’m a relatively new researcher in geochemistry and don’t understand why there is such a fuss about this first paper when several have been done after it? I mean, there you have all the data available to check if the statistics are correct? Or is this about the credibility of Mann?

Well yes, I know there is a debate about the statistics and proxies and I guess time will tell… though my money is on Mann for the minute ;)

But what I don’t understand is why all the fuss about the first paper if it’s hard to get all the info about it? Wouldn’t it be easier to check one of the newer papers, or several of them? Or is it too time-consuming to get into these as well?

Magnus, I’ve done a considerable amount of work on other multiproxy studies. For some of them, it’s not merely time-consuming to get data (time-consuming I can work at), but impossible. For example, Briffa et al. 2001 is a prominent study, but Briffa et al did not disclose the identity of their 387 sites and Briffa has refused to acknowledge any inquiries on the topic. Esper et al 2002 lists his sites, but most of them are not archived publicly – I suspect that many are at the password-protected SO&P location – exactly why tree ring data needs to be password-protected eludes me. I’ve had no luck getting access. I’ve been able to get data for Mann and Jones 2003, but haven’t been able to replicate the method.

I’ve collected data for Jones et al 1998 and most of Moberg et al 2005. Crowley says that he has “misplaced” the data for Crowley and Lowery 2000 except for transformed smoothed versions and doesn’t remember where he got some of the critical series. It’s quite a bit of work doing the collection and initial validation, but I’ve got a lot in hand.

As to the focus on Mann, I started on the Mann paper because it was featured by IPCC 2001. Also it was the source of the catch-phrases “the warmest year in the millennium” and “warmest decade in the millennium”, which were used as an incantation by the Canadian government in the Kyoto promotion here. The MBH98 study is still used in all the spaghetti diagrams. No one has backed off from it.

The issues in the other studies vary. I’ve posted on this site a lot about the Polar Urals and Tornetrask sites; you can locate these by googling climateaudit Urals or climateaudit Tornetrask. These are key building blocks for Jones et al 1998 and you’ll see where I’m going with it. Crowley and Lowery depends on bristlecones and Polar Urals, so you can see where that’s going.

I’m also feeding the blog which I like doing, but it does distract me from finishing these other studies.

Steve, correct me if I’m wrong, but aren’t the R^2 and RE implicitly (if not explicitly) contained in the confidence intervals for the reconstruction? The decreasing robustness of reconstructions is graphically represented by the increasing confidence intervals.

I would also suggest that Mann is probably right in that R^2 is almost unheard of in meteorological/climate circles as a skill measure – Murphy 1995 (Weather and Forecasting, 681–) is one of many papers which make the reason for this blatantly obvious… To quote: “these coefficients (R or R^2) ignore unconditional or conditional bias” and “these coefficients are not measures of accuracy and skill”. World Meteorological Organisation guidelines for verification of long-lead forecasts (http://www.wmo.ch/web/www/DPS/SVS-for-LRF.html) do not even mention R or R^2. Of course, we all know that verification is a multi-dimensional problem and the more measures used the better, but criticism of a paper based on the failure to use your favorite “skill” measure is… a bit rich.

David, what surely is rich is to calculate a statistic, find it unfavourable, then exclude that unfavourable statistic from your report.

Also, Steve isn’t (in my reading of this topic) criticising MBH for not using r2 instead of, say RE. What he is saying is that you can have a favourable RE statistic and the result may still lack statistical significance. Relying on RE alone is as bad as, say, relying on r2. However using both measures provides a good deal more information, and you acknowledge that. And it seems quite definite that in MBH the r2 was calculated for all steps. In fact it is used for the 1820 step to argue for reconstructive skill. Why, then, are the other r2 calculated values specifically not used, because if it was because it shows that the results for those steps lack significance then that is a very serious omission. Would you not agree ?

And that is just one of the serious criticisms of MBH98 & 99. Would you like to argue that the Bristlecone Pine records are indeed an appropriate temperature proxy, or that we can rely on the Gaspe Cedar records?

David, on confidence intervals, no. The MBH98 confidence interval calculations are a mess. I’ll just pick up one point for now. They calculate it as 2 times the standard error in the calibration period. A more reasonable calculation would be 2 times the standard error in the verification period. The standard error is related to the R2 and, with this very low R2, the confidence intervals would be larger than natural variability.
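The difference the choice of period makes can be sketched with synthetic data (the series below are illustrative, not the MBH residuals): a model that fits tightly in calibration but has no verification skill yields a far wider 2-standard-error band when the verification residuals are used.

```python
import math
import random

random.seed(1)

def resid_se(obs, pred):
    """Standard error (population form) of the residuals obs - pred."""
    resid = [o - p for o, p in zip(obs, pred)]
    m = sum(resid) / len(resid)
    return math.sqrt(sum((r - m) ** 2 for r in resid) / len(resid))

# hypothetical model: tight fit in calibration, no real skill in verification
calib_obs  = [random.gauss(0, 1) for _ in range(100)]
calib_pred = [o + random.gauss(0, 0.3) for o in calib_obs]   # small residuals
verif_obs  = [random.gauss(0, 1) for _ in range(50)]
verif_pred = [random.gauss(0, 1) for _ in range(50)]         # unrelated series

print(2 * resid_se(calib_obs, calib_pred))   # narrow band, roughly 0.6
print(2 * resid_se(verif_obs, verif_pred))   # much wider band, roughly 2.8
```

Basing the +/- 2-sigma band on the calibration residuals understates the uncertainty whenever verification performance is worse than calibration performance, which is the point being made about MBH98.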

It’s not a matter of something being a “favorite statistic”. If the MBH index has a true relationship with temperature sufficient to provide confidence intervals, then it is impossible for it to yield a cross-validation R2 of 0.

I’m very familiar with Murphy’s work and quote it in our GRL article. Murphy’s view is that the R2 is necessarily greater than the RE statistic. If you calculate the “skill score” (which is the term that Murphy uses), you start with the R2 and subtract the biases. Look at equation 7.20 in Wilks (1995) citing Murphy, which ironically Mann has cited.
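That decomposition can be checked numerically. The identity below (the one the comment attributes to Wilks equation 7.20, citing Murphy) says the MSE skill score against the verification-period climatology equals r2 minus a conditional-bias term and an unconditional-bias term; the forecast series here is arbitrary synthetic data, and moments are computed in population form so the identity holds exactly.

```python
import math
import random

random.seed(42)

def moments(x):
    """Mean and population standard deviation."""
    m = sum(x) / len(x)
    s = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return m, s

# arbitrary forecast/observation pair: the forecast is damped, noisy, biased
o = [random.gauss(0, 1) for _ in range(200)]
f = [0.6 * v + random.gauss(0, 0.5) + 0.2 for v in o]

mf, sf = moments(f)
mo, so = moments(o)
r = sum((a - mf) * (b - mo) for a, b in zip(f, o)) / (len(f) * sf * so)

# MSE skill score relative to the climatological (mean) forecast
mse = sum((a - b) ** 2 for a, b in zip(f, o)) / len(f)
ss = 1.0 - mse / (so * so)

# Murphy: skill = r^2 - conditional bias - unconditional bias; skill <= r^2
ss_decomp = r ** 2 - (r - sf / so) ** 2 - ((mf - mo) / so) ** 2

print(ss, ss_decomp)   # identical up to floating-point rounding
```

Since both subtracted terms are squares, the skill score can never exceed r2, which is exactly the sense in which "the R2 is necessarily greater than the RE statistic."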

This is a very important point, our simulations showed that the simulated hockey stick shaped PC1s had high RE and low R2 scores. So the R2 is simply a method for checking against a certain type of spurious result which is characteristic of data mining operations.
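A self-contained toy version of that kind of spurious result (synthetic data, not the actual simulations from the GRL paper): a "reconstruction" that captures only the mean shift between calibration and verification, with noise otherwise unrelated to the observations, scores a high RE and a near-zero r2.

```python
import random
import statistics

random.seed(0)
n = 50

# calibration-period mean of the target is 0 by construction;
# verification observations sit at a new mean level of 1.0
calib_mean = 0.0
obs = [1.0 + random.gauss(0, 0.3) for _ in range(n)]
# the synthetic "reconstruction" captures only the level shift,
# plus noise that is independent of the observations
pred = [1.0 + random.gauss(0, 0.3) for _ in range(n)]

# RE rewards matching the level shift relative to the calibration mean
sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
ssm = sum((o - calib_mean) ** 2 for o in obs)
re_val = 1.0 - sse / ssm          # high, despite no year-to-year skill

# r2 sees only the (absent) covariation between the two series
mo, mp = statistics.fmean(obs), statistics.fmean(pred)
cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
r2_val = cov * cov / (sum((o - mo) ** 2 for o in obs) *
                      sum((p - mp) ** 2 for p in pred))   # near zero

print(re_val, r2_val)
```

The two statistics diverge precisely because RE is benchmarked against the calibration mean while r2 is insensitive to level shifts, which is why reporting only RE can mask a reconstruction with no interannual skill.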

Finally, and this is an important point, Mann calculated the R2 scores and published them when they were advantageous e.g. the AD1820 network and other publications. In public securities offerings for mining promotions, which I’m used to, you’re not allowed to withhold adverse results. I have heard no argument as to why climate scientists should have lower disclosure requirements than mining promoters. Mann should have disclosed the adverse R2 results. If he wanted to argue that these statistics should be ignored because of a high RE statistic, he could do so, but that’s up to the readers to judge.

>David, what surely is rich is to calculate a statistic, find it unfavourable, then exclude that unfavourable statistic from your report…..

Ed, this is getting close to accusing a scientist of fraud. Do you have any evidence to support such a claim? BTW why stop at R^2 and RE? 2 is obviously better than 1, but 3 is better than 2. In our office we calculated around 10, but could easily find another 10 to add to the list.

I am not a paleoclimatologist and have no interest in debating the merits of various paleoclimatic timeseries used in a study which is nearly 10 years old.

My response: maybe it is an accident? Michael Mann has released some code (said to be used in the preparation of MBH98, see earlier threads) recently and that code does calculate the r2 statistic as part of its operation. Mann certainly claims to have calculated r2 for the 1820 step as he published the results. Mann did not publish the r2 calculation for other steps, and using the closest emulation possible so far (not total replication, that has not been possible, but Mann & others claim that the extremely similar W&A emulation vindicates their approach) the r2 for at least some of those steps is very close to 0.

So are you suggesting:

1. Mann did not calculate the r2 statistic for other steps and hence was unaware of such results; or
2. The r2 was calculated for the other steps, they are not unfavourable, even though not released; or
3. The r2 was calculated, they were unfavourable, they were not released, and that is acceptable behaviour.

If 1, then Mann has released the incorrect code, and somehow he calculated the r2 for just the 1820 step and ignored that check for the other steps. That is yet another major mistake on his part, the release of the wrong code portion. If 2, then I’m afraid you need to show some evidence for that belief, as all the evidence available contradicts that point. If 3, are you sure that’s acceptable ?

So which do you favour ? It is true that Mann did not publish r2 for the 1400 and other steps, it is true he published it for the 1820 step, it is true that he has released code that at least on the face of it shows that if that code was used then r2 was calculated.

So the possibilities are a major mistake, accidental omission, or deliberate omission. Unless you have some other suggestion, which do you favour ?

I regret having to be so blunt, but so many apparent AGW partisans who come to this blog simply refuse to accept the data that is right there in front of them. You need to confront the possibility that MBH98 is deeply flawed and should be withdrawn, or present evidence that refutes the points on offer. The relevance of MBH98 is discussed elsewhere.

With the evidence at hand I do not have an opinion as to why Mann did not include R^2, though note that such an exclusion is commonplace in climate verification. You certainly have an opinion, but only weak and, at best, circumstantial evidence to support it.

It would appear that you believe MBH98 to be rather more important than most climate scientists do. It is now nearly 10 years old, and dozens of subsequent papers have appeared which broadly support the conclusions drawn in the original paper. The modifications made to the general insight yielded by MBH98 are small fry in comparison to the changes that have occurred with the MSU data, for example. Surely it is time to deal with the science at hand, rather than harking back with the benefit of hindsight to attack papers nearly a decade old.

David, you say that the exclusion is “commonplace in climate verification”. Can you provide some examples (other than Luterbacher et al 2004, which may yet prove to have some hair on it)?

I’ve posted from time to time on major problems with Jones et al [1998], Crowley and Lowery [2000] and Briffa et al [2001]. These papers are typically not “independent” either in terms of authorship (due to overlaps of coauthors) or in terms of core proxies (e.g. the problematic bristlecones, Polar Urals, Dunde series).

MBH was applied in Canadian policy in 2002 and 2003, continues to be employed in the various spaghetti graphs.

Do you agree that it is possible for there to be spurious RE statistics? Can you provide an explanation for why the cross-validation R2 statistic was not reported in the SI? Can you explain why the IPCC said that this reconstruction had “skill” in its cross-validation statistics, knowing that it had a cross-validation R2 of ~0?

David, it is as I feared. The “weak and circumstantial” evidence is right in front of your face if you will but look at it. You did not answer any of the questions, you are not of course obliged to, but I suspect you won’t because the answers are not to your liking. Or how about Steve’s questions ?

You are, I must say, following exactly the classic, standard, AGW partisan’s defence. Claiming the evidence against MBH is “weak and circumstantial” when it is neither weak nor circumstantial, refusing to look at that evidence or answer any relevant data-related questions about it, then as a fall back, “It is now nearly 10 years old, and dozens of subsequent papers have appeared which broadly support the conclusions drawn in the original paper”. Yep, and those papers use the same questionable data.

Challenge: If the r2 was calculated AND it was unfavourable, would it be ethical to publish claiming statistical significance and skill when you possessed evidence to contradict your claim ? Simple question, any chance of a simple answer ?

Re #31 Magnus: the question is not one of whether or not there is warming. Rather it is the cause(s), extent and significance of the warming. In the specific MBH case, is the present warming unprecedented? That is the case MBH were making and the IPCC exploiting.

M&M have demonstrated that the MBH results are invalid and do not support that conclusion. Interestingly, none of the subsequent studies that I have seen support that conclusion either. If the present warming is not unprecedented, and in fact was likely exceeded in the MWP without deleterious consequences, then the case for both AGW and Kyoto becomes very doubtful. Note also that M&M are not saying that the MWP was warmer; they are just saying that the case for unprecedented warming cannot be supported by the evidence presented.

What makes the issue more important is that exposing all of the details of the specific MBH paper ends up illuminating a lot of questionable practices in Climate “Science” that cast a degree of doubt on the whole field, and demand a much more objective, rigorous and inclusive approach. Murray

David, care to answer the questions ? Care to actually look at the “weak and circumstantial evidence” ?

The problem is, the evidence against MBH98 & 99 is so strong that no active AGW proponent (and that includes Realclimate) will actually argue about the data. They will only make semantic distinctions, pick at one’s funding sources, or attempt to dismiss the whole matter either as “weak and circumstantial” or as “nearly 10 years old and others have published similar results”.

Fine, if you must: is it time to declare victory? Are MBH98 & 99 now totally discredited? Unless, that is, anyone cares to argue that:

1. Bristlecone Pines ARE a temperature proxy
2. The Gaspe Cedars are a reliable proxy series
3. That without BCPs and Gaspe the MBH reconstruction still shows a similar shape and leads to a similar conclusion
4. That RE alone is an adequate measure of the reconstruction
5. That having calculated a statistic that undermines your case, leaving the statistic out of your paper is ethically acceptable

If not, abandon MBH98 & 99, move on, and let’s take up the next questions. Other posters such as Paul Golsing have accepted that MBH no longer holds water, fine, now the debate can proceed.


“We are speaking about the shaft of the hockey stick, not the blade,” says von Storch. “We have no conflict about anthropogenic warming. That’s not the point.”

“This pattern is clearly common to every single proxy and model reconstruction produced, a fact that is often overlooked or ignored by people discussing climate change in the last millennium,” said Heinz Wanner, moderator of the discussion and head of the Swiss Climate Research Center.

So Danzdero, care to answer the questions ? See if you can do better than every other AGW proponent. Answer or accept the verdict, MBH98 &99 are dead in the water.

Oh, that first link, to Environmental Science and Technology: “After this fleeting brush with fame, McIntyre retreated to Canada and began a more aggressive attack on the hockey stick. He launched a blog to attract attention to his research and created a website where he posted his manuscripts that had been rejected by Nature. But in early January of this year, he finally had a paper accepted into a real science journal, Geophysical Research Letters (GRL).” I mean, let’s be nice, polite, and even-handed about this. Ho ho ho.

For what it’s worth, I didn’t start this blog following our first article, but well over a year later. Mann started his blog first and devoted much of his initial attention to preemptive strikes on our articles. John A. suggested that I should try blogging and so I did. I’ve been surprised at the amount of interest and hits – it keeps me working at it.

It never ceases to surprise me that eminent climate scientists spout off about “quacks” and “garbage” and insinuate connections with ExxonMobil and then claim that we’ve made ad hominem attacks. On the blog, I do make fun of the Hockey Team every so often, but they surely deserve a bit of satire; by and large, I think that I’ve got a pretty good record of staying away from ad hominem attacks, whereas Mann and Crowley and others have made scurrilous attacks against me.

Thanks for your reply, Edzero.

I think it’s quite clear that an original work has been improved upon. But more to the point, what is the value of being stuck in the past, talking about the past?

Don’t you think your time would be better spent using your prodigious powers of analysis on, say, the latest work in the field? What about those shoddy analyses of satellite data? There’s some low-hanging fruit there that amateurs with no experience in the discipline oughta be able to contribute to the science. The balloon data have been reanalyzed too. Lots of opportunity in the present!

Dano, it is my impression that many, if not virtually all, climate scientists have received no training in the statistical properties of highly autocorrelated series and have not endeavoured to remedy this lack of training by reading up-to-date professional literature on the topic in statistical journals (which is often more technical than many of them are used to). These “amateurs” (e.g. Briffa, Jones, Bradley etc. etc.) nonetheless regularly contribute to academic journals, where their submissions are “peer reviewed” – typically by other “amateurs”, who also lack any training or expertise in the statistical properties of autocorrelated series. I presume that these are the “amateurs” to whom you are referring.

From a statistical viewpoint, “I lean in favor of Mann,” says statistician George Shambaugh of Georgetown University. “There is an increase in the 20th century that is greater than the cyclical patterns found by either group since 1550. And since the early 1900s, we have been hotter than any time since then.”

But your comment that “many, if not virtually all, climate scientists have received no training in the statistical properties of highly autocorrelated series and have not endeavoured to remedy this lack of training by reading up-to-date professional literature on the topic in statistical journals” is likely true across most scientific disciplines.

There are plenty of opportunities to find errors in, say, medicine. We could have a, say, “Vioxx audit” website where we analyze the results of tests to the lil’ bunnies and the human subjects, and then the tests the FDA saw and didn’t act on. Or you could look at, oh, chemistry. Or crash test results. Or the percentage of times economists are right yet we act on their output anyway.

You may have started a cottage industry, Steve: blankaudit.com. Fill in the blank with your hobby of choice.

But seriously, you may want to start a survey in the sciences. See how many authors seek the counsel of statisticians when crunching on, say, SPSS or just looking at the data. Or you could survey the stats departments of Unis with Atm departments and find the percentage of folk who help out with papers. Or just how many folk help out with papers in general.

Dano, there is nothing wrong with making a mistake, especially when you are out of your field and couldn’t get a mathematician to help you out.
The problem is refusing to admit to having made an error.

Dano, I can’t comment on disciplines other than econometrics. Issues of autocorrelated series are very important in econometrics and any graduate student is specifically trained to be at least aware of the problems. The possibility or probability that there are statistical defects in other fields hardly makes it less likely that there are statistical defects in climate science studies. This may be a valid investigation, but I can’t study everything in the world.

Hey Edzero, the answers are like arguing the importance of whether the hood bolts on a Model T are 1/2 inch or 5/8. Who cares when I’m working on a Honda Civic?

Sure, the groundbreaking paper didn’t get everything right. You keep wanting to smack the dog with a rolled-up newspaper, but the poor beast died three years ago. Get a new dog. I’m partial to golden retrievers, myself.

Danzero, thanks for the acknowledgement that MBH98 & 99 are dead ducks. Care to tell your friends at Realclimate of your conversion?

I acknowledged no such thing, and your argumentation tactics don’t work with me, lad.

The science has moved on but the general conclusion holds. You don’t see anyone with a Wegener-bashing fetish, do you? You should move on too, Ed – there are plenty of recently-published papers awaiting your particular brand of, er, analysis.

Danzero, I am simply trying to apply your logic, twisted though it is. The mere fact that anyone on the AGW side can admit that MBH has errors is, after all, a major breakthrough. So MBH “didn’t get everything right” and yet “the general conclusion holds”? OK, so the paper is wrong somewhere, and yet it is right? You are rather having a “Dan Rather” moment, you know: MBH is factually inaccurate, but it deserves to be right.

Ed: No, I would say that as opposed to a Dan Rather moment you should think of it as a von Storch moment. You know: “We sent in a comment that the glitch [McIntyre] detected in Mann’s paper is correct, but it doesn’t matter,” von Storch says. “It’s a minor thing.”

>Geophysical Research Letters (GRL).” I mean, let’s be nice, polite, and even handed about this. Ho ho ho.
>Comment by Ed Snack

Ed, have you actually read and understood the GRL paper? Perhaps you might like to explain why only the first PC/EOF is considered in the calculations of PDFs…. didn’t Mann use 2, and doesn’t the “conventional” PCA analysis require 4 PCs to capture the same fractional variance? Perhaps you might enlighten us as to what the PDFs of PC2, PC3, and PC4, and of PC1+PC2 and PC1+PC2+PC3+PC4, are. Surely this is the key issue… The GRL paper is a nice distraction, though.
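For readers unfamiliar with the fractional-variance point being argued here, this is a minimal sketch (synthetic data, not the actual NOAMER proxy network) of how the cumulative fraction of variance captured by the first k principal components of a centered data matrix is computed:

```python
# Sketch only: cumulative explained-variance fractions from a PCA,
# computed via SVD of a conventionally centered matrix. The data matrix
# X is synthetic and stands in for a proxy network (80 years x 20 series).
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (80, 20))
X[:, 0] += np.linspace(-2, 2, 80)   # build in some shared structure
X[:, 1] += np.linspace(-2, 2, 80)

Xc = X - X.mean(axis=0)             # conventional full-period centering
s = np.linalg.svd(Xc, compute_uv=False)
cum = np.cumsum(s ** 2) / np.sum(s ** 2)

for k in (1, 2, 4):
    print(f"first {k} PC(s) capture {cum[k - 1]:.0%} of the variance")
```

The centering convention matters: MBH98's short-segment centering changes `X.mean(axis=0)` to a subperiod mean, which reorders how much variance each PC appears to carry.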

>Fine, if you must, is it time to declare victory ? MBH 98 & 99 now totally discredited ?

I entered this discussion to see whether this site was really about science. I think you have pretty well answered the question. There ain’t victory in science; this is not some battle between Mr Right and Mr Wrong. I appreciate that likening global warming to some war in which victory is to be sought may appeal to your system of beliefs, but most of us view it as just plain silly.

>David, on confidence intervals, no. The MBH98 confidence interval calculations are a mess. I’ll just pick up one point for now. They calculate it as 2 times the standard error in the calibration period. A more reasonable calculation would be 2 times the standard error in the verification period.

So… Mann’s big statistical sin is not using the more reasonable method….

being interpreted as “Dano, are you agreeing that MBH98 was wrong?” At least we get a seemingly honest clarifying question.

But then we have someone else not bothering to clarify the statement with a question, just stating “thanks for the acknowledgement that MBH98 & 99 are dead ducks”. Anyone able to use the scroll button can see I acknowledged no such thing. Besides, I’ve been using dog and car metaphors, not duck metaphors.

Then we get feigned confusion that MBH “didn’t get everything right” while “the general conclusion holds”, as if something being wrong means the whole paper is wrong. I hope this isn’t considered the pinnacle of analysis.

So what’s the deal Dan0? Is this your week in the barrel per the Hockey Team Protocols? All you’re doing is wasting everyone’s time as you’re unwilling to actually discuss the subjects at hand. Some weeks it’s mostly Peter Hearnden, others John Hunter, sometimes Tim Lambert and now you.

Surely you know Steve has posted complaints about every one of the other multi-proxy studies, with possibly one exception. Essentially every one of them either doesn’t have the actual data specified available or uses the Bristlecones as a temperature proxy, or both. Therefore to claim things have moved on and so should Steve is disingenuous or impossible or both.

BTW MBH is still used in the spaghetti graphs, including most recently Richard Kerr in Science.

Let’s be more polite to Dano. He’s obviously no supporter of mine and has spoken harshly elsewhere (as has Peter Hearnden), but I think that he’s making a conscious effort to avoid a sci.environment flame war and is making points that he believes in. I would appreciate it if others resisted the temptation to flame. I realize that lines are hard to draw, but everyone can draw back a little on their rhetorical flourishes.

David, can you even try to answer the query as to whether you consider that the Bristlecone Pines are a temperature proxy ? Get that sorted and maybe it is worth worrying about the number of PCs to include.

Steve, in deference to your wishes as proprietor of this blog, I shall ignore Dan, ducks, cars, misdirections and all.

1. The subsequent papers in the depiction I linked to show more variability than the original paper.
2. The subsequent papers in the depiction I linked to agree with the original paper that current warming is greater than in recent past.

That is what the depiction I linked to shows. None of this is new, or news. Nor has anyone here discovered 1 or 2.

>Surely you know Steve has posted complaints about every one of the other multi-proxy studies, with possibly one exception. Essentially every one of them [has an issue]. Therefore to claim things have moved on and so should Steve is disingenuous or impossible or both.

isn’t quite right – I’m saying there’s an inordinate focus by some posters on a paper that has been passed by in the community; the opportunities for…er…auditing are in the latest papers. Namely, Moberg et al. and the S&C data analyses. As this is climateaudit and not MBHaudit, I’m pointing out the box. Why there is an effort to argue whether the hood bolt of a Model T is 1/2 or 5/8 is silly; sure, some still wax poetic about kissing their first girl in the rumble seat, but more people look at the sleek lines of the RX8.

Steve:

Thank you. My daddy told me to behave when in someone else’s house. My sub for Science is an e-sub – I can’t find the Kerr article you mention that contains the spaghetti graph (the latest is about Pielke Sr) – Kerr just writes the blurbs, BTW, & likely doesn’t control layout or content – that’s an editor’s job.

Yes this is Climate Audit. Now try thinking about what an audit is. Can you imagine what would happen if an auditee told the auditor, “We’ve moved beyond the 1998 annual report. Audit our 2002 report instead”?

An audit, practically by definition, must be carried as far as possible, preferably until all discrepancies have been reconciled. This is precisely the problem with MBH98: the auditees refuse to turn over the last of the books so that the audit can be finished. When that is done, the final report can be written.

Now please note, nobody expects that MBH98 or any other paper is the final word on the subject. But you can’t have a moving target.

Now if you want something practical to do, provide references, preferably ones which can be accessed on-line, which show why Bristlecone Pines are usable as a temperature proxy. This will be useful not just in evaluating MBH98 but most of the later multi-proxy temperature reconstructions as well.

>Now if you want something practical to do, provide references, preferably ones which can be accessed on-line, which show why Bristlecone Pines are usable as a temperature proxy. This will be useful not just in evaluating MBH98 but most of the later multi-proxy temperature reconstructions as well.

My mom said to always ask ‘please’ when you are having someone do work for you, esp. if you don’t want to do it yourself and expect someone else to do it for you. Anyway,

Gosh the paleo people might know something about the implication of your demand. But what about the folk who use the tree for other things as well, not just multi-proxy temperature reconstructions? Are their results spurious too?

Anyway, you may be on to something, Dave, because there’s a whole buncha folk who are not aware of the things you apparently are aware of, and are led woefully astray:

Two independent calibrated and verified climate reconstructions from ecologically contrasting tree-ring sites in the southern Colorado Plateau, U.S.A. reveal decadal-scale climatic trends during the past two millennia. Combining precisely dated annual mean-maximum temperature and October through July precipitation reconstructions yields an unparalleled record of climatic variability. The approach allows for the identification of thirty extreme wet periods and thirty-five extreme dry periods in the 1,425-year precipitation reconstruction and 30 extreme cool periods and 26 extreme warm periods in 2,262-year temperature reconstruction. In addition, the reconstructions were integrated to identify intervals when conditions were extreme in both climatic variables (cool/dry, cool/wet, warm/dry, warm/wet). Noteworthy in the reconstructions are the post-1976 warm/wet period, unprecedented in the 1,425-year record both in amplitude and duration, anomalous and prolonged late 20th century warmth, that while never exceeded, was nearly equaled in magnitude for brief intervals in the past, and substantial decadal-scale variability within the Medieval Warm Period and Little Ice Age intervals.

To build a tree-ring chronology that reflects past temperature variability, upper treeline Bristlecone Pine (P. aristata), whose ring-width variability is a function of temperature, were sampled at timberline in the San Francisco Peaks (~3,536 m), where temperature is most limiting to growth (Figure 2). Increment core and sawed samples were collected from living and dead Bristlecone Pine on both Agassiz Peak and Humphreys Peak. Long chronologies were constructed by crossdating the deadwood samples with the living tree specimens. Prior to AD 659 the chronology is composed entirely from deadwood material. The individual growth rings of each sample were measured to the nearest 0.01 mm. The measured series were converted to standardized tree-ring indices by fitting a modified negative exponential curve, a straight line, or a negatively sloped line to the series. This process removes the age/size related growth trend and transforms the ring-width measurement values into ring-width index values for each individual ring in each series (Fritts, 1976). Several statistics were calculated to gauge the reliability of the tree-ring series (Cook and Kairiukstis, 1990; Wigley et al., 1984) (Table I). [pg 468]

For the precipitation reconstruction, the October through July precipitation climate series was log-transformed prior to the calibration. This procedure results in the climate data being more normally distributed and more linearly related to the tree-ring data. A stepwise multiple linear regression model was developed using the “standard” (Cook 1985) tree-ring chronologies from Flagstaff, Navajo Mountain, and Canyon de Chelly. The pool of potential predictors includes nine variables: the three chronologies lagged −1, 0, and +1 years from the precipitation year (previous October–July). Predictors were allowed to enter the model stepwise in order of importance until R2 reached a maximum and root-mean-square-error (RMSE) a minimum. Three predictors were used in the final model: the three chronologies at lag 0. The inclusion of additional predictors beyond these three does not increase calibration R2 or decrease RMSE substantially, indicating that predictors four through nine do not improve the quality of the reconstruction.

Validation was done using the predicted residual sum of squares (PRESS) method (Weisberg, 1985). Crossvalidation statistics indicate a successful reconstruction. The validation reduction of error statistic (RE), which is analogous to calibration R2, is 0.71. Any positive value of this statistic indicates that the model does a better predictive job than the calibration period mean. Also indicative of a good reconstruction, the validation RMSE remains low and does not differ much from the calibration RMSE in this three-predictor model. Analysis of residuals does not indicate any violations of model assumptions. The residuals are independent of both predictor and predictand values. They are essentially normally distributed and show no apparent trends. The Durbin-Watson statistic demonstrates acceptable autocorrelation in the residuals. The regression model explains 74% of the variance in the precipitation with a calibration period of 1896–1988 (Table II) (Figure 3B). [pg 470]
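The RE statistic quoted in that passage can be sketched in a few lines (made-up numbers, not Salzer’s data). RE compares the model’s validation-period errors against those of the calibration-period mean used as a constant predictor:

```python
# Sketch of the reduction-of-error (RE) statistic:
#   RE = 1 - SSE(model) / SSE(calibration-period mean as predictor),
# evaluated over the validation period. A positive RE means the model
# beats the calibration mean. All numbers here are synthetic.
import numpy as np

def reduction_of_error(observed_val, predicted_val, calibration_mean):
    sse_model = np.sum((observed_val - predicted_val) ** 2)
    sse_mean = np.sum((observed_val - calibration_mean) ** 2)
    return 1.0 - sse_model / sse_mean

rng = np.random.default_rng(1)
obs_val = rng.normal(0.5, 1.0, 40)            # validation-period observations
pred_val = obs_val + rng.normal(0, 0.5, 40)   # a model that tracks them
cal_mean = 0.0                                # calibration-period mean

re = reduction_of_error(obs_val, pred_val, cal_mean)
print(f"RE = {re:.2f}")                       # positive: model beats the mean
```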

Abstract: From 1900 to 1993, latewood frost rings occurred in 1903, 1912, 1941, 1961, and 1965 in 10 to 21% of the sampled bristlecone pines at Almagre Mountain, Colorado. In early to mid September in each of those years, a severe outbreak of unseasonably cold air from higher latitudes produced a memorable or historic late-summer snowstorm in the western United States. Record subfreezing temperatures during these snowstorms probably caused the latewood frost rings, shortened (by about 1 mo in 1912) already colder than normal growing seasons, and caused crop damage in parts of the Western United States. Latewood frost rings recorded in relatively high percentages of the sampled trees (such as the 1805 event in 61% of sampled trees) were probably caused by multiple severe outbreaks of unseasonably cold air from higher latitudes that occurred from early September (possibly as early as mid- or late August) to mid-September. Analyses of 1900-1992 temperature data for two widely separated Colorado stations, Fort Collins and Colorado Springs, show that average summer (June-September) temperatures during latewood frost-ring years in this century were 1.5 and 2.0 degrees C cooler than normal, respectively.

This looks like you can use latewood to ID individual cold years in the record. That sounds useful.

The ratio of the actual ring widths to these expectations produces a set of dimensionless indices that can be averaged arithmetically with cross-dated indices from other trees into a mean chronology suitable for studies of climatic and environmental change. We show that tree-ring indices calculated in this manner can be systematically biased. The shape of this bias is defined by the reciprocal of the growth curve used to calculate the indices, and its magnitude depends on the proximity of the growth curve to the time axis and its intercept. The underlying cause, however, is lack of fit. To avoid this bias, residuals from the growth curve, rather than ratios, can be computed. If this is done, in conjunction with appropriate transformations to stabilize the variance, the resulting tree-ring chronology will not be biased in the way that ratios can be. This bias problem is demonstrated in an annual tree-ring chronology of bristlecone pine from Campito Mountain, which has been used previously in global change studies. We show that persistent growth increase since AD 1900 in that series is over-estimated by 23.6% on average when ratios are used instead of residuals, depending on how the ring widths are transformed. Such bias in ratios is not always serious, as it depends on the joint behaviour of the growth curve and data, particularly near the ends of the data interval. Consequently, ratios can still be used safely in many situations. However, to avoid the possibility of ratio bias problems, we recommend that variance-stabilized residuals be used.

Looks like they found some useful issues in analyzing the data. Fortunately it’s been known for some time now.
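The ratio-versus-residual bias described in that abstract can be sketched as follows. Everything here is synthetic: the growth curve is a simple (not “modified”) negative exponential, deliberately ill-fitting, and the ring widths are made up rather than the Campito Mountain data; the point is only that dividing by a fitted curve that approaches the time axis inflates late-series ratio indices.

```python
# Ratio indices (rings / fitted curve) vs residual indices (rings - fitted
# curve) under lack of fit. The fitted curve sits below the data near the
# end of the record and close to the time axis, so the ratio blows up.
import numpy as np

t = np.arange(200, dtype=float)
true_curve = 2.0 * np.exp(-t / 60.0) + 0.05      # age-related growth trend
rng = np.random.default_rng(2)
rings = true_curve + rng.normal(0, 0.02, t.size)

# Deliberately ill-fitting growth curve (the "lack of fit" in the abstract).
fitted = 2.0 * np.exp(-t / 50.0) + 0.02

ratio_index = rings / fitted           # conventional dimensionless indices
residual_index = rings - fitted        # the recommended alternative

late = slice(150, 200)
print("late-period mean, ratio index:   ", round(ratio_index[late].mean(), 2))
print("late-period mean, residual index:", round(residual_index[late].mean(), 2))
```

The ratio index shows a large spurious late-period “growth increase” while the residual index shows only the modest absolute misfit, which is the shape of the bias the authors describe.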

Last,

These were among the refs in the papers I included above:

o Epstein, S., Krishnamurthy, R.V., 1990. Environmental information in the isotopic record in trees. Philosophical Transactions of the Royal Society 330A, 427–439. [uses stable isotopes in P. aristata as a qualitative indicator of temp]

Well, I hope Steve comments on your post as he has mentioned some things you’ve presented here which actually argue against Bristlecones being useful as temperature proxies. I wish I had more time just now, but we’re going camping this weekend and leave in an hour or two, so I have to go finish packing. I’ll be back in the loop Tuesday.

In our E&E 2005 article, we discuss nearly all of these studies, even the Mooney and Schulze articles from the 1960s. For example re Schulze et al.

For example, bristlecones are famously long-lived, but despite this, do not appear to senesce [Lanner and Connor, 2001; Connor and Lanner, 1991]. They occur in an unusual strip bark form, where the bark in most long-lived trees dies around the circumference except for a small strip on one side. Unlike most pines, they continue to respire during the winter thereby consuming photosynthate [Schulze et al., 1967].

Here’s a reference to Mooney et al 1964:

Graybill and Idso [1993] attributed the anomalous 20th century growth of strip-bark forms to CO2 fertilization. There are some possible reasons why CO2 fertilization may affect high-altitude strip bark forms more strongly than lower-altitude entire-bark forms, and there is specific evidence for CO2 fertilization for vegetation from the White Mountains, California, where important bristlecone pine stands are located [Mooney et al., 1964]. The response to changes of CO2 concentration in controlled experiments is strongly non-linear and attenuates as CO2 levels increase. CO2 levels at the high altitudes of bristlecone pines (3000–3500 m) are significantly lower than at sea level and, at the lower CO2 levels at high altitude, the response to increased CO2 levels is in a range with stronger response.

Here’s the most relevant quote which mentions Fritts, Mooney and others:

Within bristlecone pine literature, LaMarche and Stockton [1974] pointed out that high-altitude bristlecone pine stands have both a lower limit and upper limit and argued that bristlecone pine growth at the lower limit was controlled by precipitation and at the upper limit by temperature. Hughes and Funkhouser [2003] found regional correlations among high-altitude bristlecone pine growth, which they attributed to regional climate, but still concluded that the anomalous 20th century growth was a “mystery”. Even in upper limit stands, the bristlecone pine stands in the PC1 are located in semi-arid regions and the bifurcation in LaMarche and Stockton [1974] may be overly simplistic. Studies of actual bristlecone pine growth have shown that it is limited by soil moisture [Fritts, 1969; Beasley and Klemmedson, 1973]. Even in higher stands, their principal botanical competition in many locations is with big sagebrush [Wright and Mooney, 1965; Mooney et al., 1964] with bristlecones outcompeting big sagebrush on moister dolomite substrate. This effect is vividly illustrated by Figure 2 of Wright and Mooney [1965], where the sharp geological contact between the dolomite and sandstone is clearly shown by the change from bristlecone pines to sagebrush at the same elevation. The same effect is also probably shown in the charming 19th century painting (Figure 7), where a sharp change in vegetation at the same elevation is easily observed. There is evidence that higher moisture levels in the 20th century in the American Southwest accounted for high growth rates in New Mexico [Grissino-Mayer, 1996; D’Arrigo and Jacoby, 1991], where two of the LaMarche and Stockton [1974] sites are located. The effect may extend to other locations. In the classical bristlecone pine sites of the White Mountains, where a weather station operated close to Sheep Mountain and Campito Mountain from 1954 to 1980, records show low ring widths correlate to drought, even in upper limit stands. 
Mann and Jones [2003] pointed out that precipitation proxies need to be carefully distinguished from temperature proxies and a complete demonstration that these effects have been separated in bristlecone pines is obviously required. Williams [1996] reported that a continuous climate record since 1951 at Niwot Ridge in the Colorado Front Range, near a bristlecone pine site, showed a decline in mean annual temperature and an increase in annual precipitation amount.

We didn’t mention Scueria, but I was familiar with the article. We mention Cook and Peters 1997 as follows:

Finally, there may even be problems with the site chronologies as indexes of actual growth. Cook and Peters [1997] pointed out that conventional dendrochronology techniques resulted in a bias in 20th century results at Campito Mountain, one of the Graybill sites in the NOAMER PC1. Presumably the same effect applies to other bristlecone pine sites.

and again here:

Cook and Peters [1997] discussed above, explored spurious end-of-sample growth bias as an artifact of tree-ring chronology de-trending. Amazingly, in addition to the Campito Mountain bristlecone pine site, their other main example was the Gaspé series (cana036). In order to eliminate this bias, the underlying tree ring chronologies would have to be re-calculated, a calculation which would have the effect of reducing its hockey-stick shape, with implications that stand alone from any of the other issues raised in this paper.

While the Cook and Peters issues have been known for some time, they were not considered in MBH98 calculations (or other multiproxy calculations).

Salzer’s article was not out at the time of our publication. I got a pdf of Salzer last week, but haven’t gone through it yet. My recollection of the San Francisco Peaks chronology is that it did not have the big hockey stick bend of the bristlecone sites in controversy e.g. Sheep Mountain and Campito Mountain, but I’ll have to check this and will correct this point if my recollection is incorrect. I haven’t seen a URL for Salzer’s dissertation. Sometimes people put their dissertations on line and I’ve found some dissertations much more informative than the final articles. Dano, you’d probably have more luck than me in getting a pdf copy of Salzer’s dissertation from him; if you do, I would like to look at it.

Anyway, we consulted the older references for our EE article. Our comments on bristlecones were reviewed by a knowledgeable and published author in the field prior to publication, who complimented us on our identification of the Graybill-Idso bristlecones in Mann’s PC1 and was absolutely thunderstruck that these bristlecone series were essential to the Mann reconstruction. I’ll review the Salzer article and post a separate comment on it next week as it is new and it seems relevant.
Cheers, Steve

>David, can you even try to answer the query as to whether you consider that the Bristlecone Pines are a temperature proxy ? Get that sorted and maybe it is worth worrying about the number of PCs to include.

Ed, I have made it clear from the start that I am not a paleoclimatologist (and neither are you!). I do not have the knowledge to judge whether the Bristlecone Pines are a temperature proxy or not (and…..). Besides, this is implicitly captured in the reconstruction technique anyway. Your criticisms of MBH98 are pretty weak if they simply centre on one proxy set, which is important only for the first ~100 years of their reconstruction. Even if you are right, doesn’t that still leave the other 500 years of the reconstruction hockey-stick-like? And the previous period… well, that has been filled in by subsequent studies, which even our resident “statistician” hasn’t yet faulted in the literature. Perhaps you might open your eyes a little and see the bigger picture.

I look forward to hearing from you (or Steve) about the PDFs of the other PCs. Of course, anybody with the slightest knowledge of PCA knows what the answer is (and I suspect you won’t like it).

David

PS

Regarding the non-use of R^2, I suggest you google myself, or Drosdowsky and Chambers, for a couple of examples. These are just some examples close to home, and no reviewers complained about our non-use of R^2.

I don’t have the time to ask Salzer for his paper, Steve – I have a research proposal to consider this weekend, and if I accept it you won’t see me here or at Tim’s place either, so…no rest for the weary. And all this is off the top of my head on a Friday, so hopefully it’s coherent:

Anyway, I see the EE paper is about the variability in the shaft; I have no issue with an original paper being found to have room for improvement. Wegener had fingers wagged at him because he proposed no mechanism for drift.

Subsequent papers have indicated that the MM reconstruction in those time periods you take issue with was closer to the error bars than the center line. This is not news. Nor is it news that early work gets improved upon by those who follow it. That’s how it works, and I’m glad it works that way.

If you are trying to dispense with the icon, great. It appears as if the shaft needs to be bent, and the issue is with the amplitude of the bend. Fine. No one, however, is stating that there’s a problem with the blade or its length; I don’t see you arguing that either (mind you, I only have so much time in the day). That is, no one in ISI is disputing blade length, only shaft amplitude (your paper is not in ISI, so no one can see it); if you have had inquiries from experts in the field regarding your paper, I look forward to their responses and subsequent investigations based on your paper. Yours is the first, and thus not the final word, despite the vigor of some Jack Russell terrier-like comments.

But to the specific Pinus l./a. issue [BTW, they are among my favorite trees, and that sheep piccie on pg 87 is an interesting discussion about exotic grasses, fire and nutrient cycling changes due to European grazing influences – for another time]. As I read the excerpts above (thank you), it is not that Pinus l./a. is not a good proxy, it is that there are problems with it. There is a difference – surely the annual rings reflect environmental conditions. Your issue is the CO2 fertilization issue, which complicates usage, not negates it.

So, if I understand your argument, you dispute the temp reconstruction in the past, due to specific issues with the data that require additional calibration. The result is your Fig 1 in the EE paper. As I caution everyone, it is not wise to point at one thing and say ‘a-ha’. We see the wisdom in this even in the MBH, as subsequent papers show the shaft shouldn’t be a shaft; so it is here – only subsequent confirmation or denial will allow commenters to confidently assert one way or another.

The d13 paper used in your other post to affirm your view, in my view, merely states that neither water-use efficiency nor cambial growth rate can be a sufficient indicator for changes in the biomass of natural forests, which means, to me, that plant rxn to elevated CO2 isn’t reflected in the cambium (which is what they looked at – they didn’t dig up fine roots over a period and measure the difference). These plants are hard to study because of their slow growth rate. There may be something in their metabolism or apportionment that is different than in other plants, and the author’s conclusions can’t be used to blow up to different forest types (they are a little sloppy here, Ed Snack, but this sloppiness doesn’t negate the findings of their paper).

Again, the papers used in these discussions indicate, to me, the difficulties in using Pinus l./a. as a proxy, not the impossibility of it. Nothing here shows Pinus l./a. shouldn’t be used as a proxy – you just have to do some more stuff to the data to make it work.

Alright, I gotta get this paper pile off my desk Steve. Thanks for the discussion & depending I may have some time next week,

Dave, as to the other studies, I've looked at most of them and have posted up some major problems with several of them. I acknowledge that I have not published in the literature on them yet, but I hope to remedy that; I'd hoped to have done so by now. Blogging doesn't help get it done unfortunately.

On the R2 statistic – there are a couple of reasons why it's important. First, Mann's methods tend to produce spurious RE statistics, so it's important to examine R2 as a cross-check to ensure statistical validity – this is no different than looking at Durbin-Watson statistics. Secondly, if you've calculated adverse results, then, in my opinion, you have an obligation to report them regardless of whether the referees are attentive to the presence/absence of the information. I feel very strongly about this.

This undoubtedly reflects my experience with prospectuses, where promoters are required to sign affidavits declaring "full, true and plain disclosure", which means disclosure of adverse results. The obligation exists regardless of whether a securities commission thinks about asking for the data. No one has ever given me a reason why scientists should have lower disclosure standards than mining promoters. So in this case, I judge Mann by standards applicable to promoters. He knew about the adverse information; he failed to disclose it.

I think that it's evident that he knew that disclosure of the R2 of ~0 would have a very bad effect on public acceptance of his results. So it's hard for me to find an innocent explanation of this. If he felt that the R2 statistic was unimportant, then his obligation was still to disclose it, provide the readers with his argument as to why it didn't matter and let the reader decide. I think that he knew what the reader's reaction would be and intentionally took the decision out of the reader's hands.
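To make the RE-versus-r2 distinction concrete, here is a toy numerical sketch (this is not MBH's actual algorithm; the series, seed and magnitudes are all invented for illustration). RE measures skill relative to the calibration-period mean, so a reconstruction that merely captures the mean shift into the verification period can score a healthy RE while its verification r2, which only rewards tracking year-to-year variation, sits near zero:

```python
import numpy as np

def re_stat(obs, rec, calib_mean):
    """Reduction of Error: skill relative to the calibration-period mean."""
    sse = np.sum((obs - rec) ** 2)
    sse_clim = np.sum((obs - calib_mean) ** 2)
    return 1.0 - sse / sse_clim

def r2_stat(obs, rec):
    """Squared Pearson correlation over the verification period."""
    r = np.corrcoef(obs, rec)[0, 1]
    return r ** 2

rng = np.random.default_rng(0)

# Toy verification period: observed temperature is a trend plus noise.
n = 46                                   # e.g. the 1856-1901 verification interval
obs = 0.01 * np.arange(n) + rng.normal(0, 0.1, n)

# Hypothetical calibration mean sitting below the verification mean:
calib_mean = obs.mean() - 0.3

# A "reconstruction" that matches the verification mean but is otherwise noise:
rec = obs.mean() + rng.normal(0, 0.1, n)

print(f"RE = {re_stat(obs, rec, calib_mean):.2f}")   # clearly positive
print(f"r2 = {r2_stat(obs, rec):.2f}")               # near zero
```

The point of the sketch: a positive RE here reflects nothing but the baseline offset, which is why a near-zero cross-validation r2 is an important cross-check rather than a redundant statistic.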

I think that it's evident that he knew that disclosure of the R2 of ~0 would have a very bad effect on public acceptance of his results. So it's hard for me to find an innocent explanation of this. If he felt that the R2 statistic was unimportant, then his obligation was still to disclose it, provide the readers with his argument as to why it didn't matter and let the reader decide. I think that he knew what the reader's reaction would be and intentionally took the decision out of the reader's hands. [emphasis added]

I started reading the climate journals in the early 80s and Science/Nature probably late 80s. I’m sometimes a geek, but hey.

I didn’t scrutinize MBH98, and I’m more than a casual reader. Most people skim journal articles and don’t read that deeply.

I would guess maybe 50-75 people on the planet scrutinized it the year it came out, 2/3 of them grad students because they were told to. I have a hard time with this phrase.

I didn’t scrutinize MBH98, and I’m more than a casual reader. Most people skim journal articles and don’t read that deeply.
I would guess maybe 50-75 people on the planet scrutinized it the year it came out, 2/3 of them grad students because they were told to. I have a hard time with this phrase.

Hmmm. So why do people make such a big deal about the consensus among published climate science professionals if this is all the attention they pay to such ground-breaking work ?

Let's talk about the referees. This information was withheld from them first. What would a referee at Nature have done with a reconstruction with an R2 of ~0, if the information had been displayed prominently enough that he had to address the issue? Would he have said – the R2 doesn't matter? Or would he have said: look, there's something wonky about this model?

If you were doing a prospectus, you’d have had to prominently disclose a risk factor along the lines of:

RISK FACTOR: Readers should be aware that the reconstruction contained herein badly fails many standard cross-validation tests, including the R2, CE, sign test and product mean test, some of which are 0. Accordingly, the apparent skill of the RE statistic may be spurious and the reconstruction herein may bear no more relationship to actual temperature than random numbers. Readers should also be aware that the confidence intervals associated with this reconstruction may be meaningless and that the true confidence interval may only be natural variability.

RISK FACTOR: Readers should be aware that the reconstruction contained herein cannot be replicated without the use of bristlecone pines. Some specialists attribute 20th century bristlecone pine growth to nonclimatic factors such as carbon dioxide or other fertilization or to nontemperature climate factors or to a nonlinear response to temperature. If any of these factors prove to be correct, then all portions of the reconstruction prior to AD1625 will be invalidated.

That’s what I mean by prospectus-like disclosure. I’ll re-visit this language over the weekend: I don’t know why I haven’t written prospectus-type risk factors before.

I have no intention whatsoever to discuss GW in depth on this blog; it is not my main field and I intend to do research in my own field (which for the moment is in the mining industry). Though I have read much in the field, and my point stands. I have no reason to distrust all the research that has been done on GW that points in the same direction. If McIntyre succeeds in his attempt and gets published, and his arguments hold up over a longer time, I will of course believe him. Though I don't think he is there yet.

1. Hmmm. So why do people make such a big deal about the consensus among published climate science professionals if this is all the attention they pay to such ground-breaking work

You didn’t comprehend what I wrote. The small number scrutinizing WAS the professionals. You’ll remember that today’s science is very specialized. And they meet at conferences and have at least an idea of what each other is doing.

Plus, the standard with new work is: let’s see if anyone else comes up with the same thing – you want to see a result a few times before you pay close attention. But, the dozens of single-proxy papers, of course, said the same thing, so the finding was not new and thus not a surprise.

This paper integrated multi-proxy records to get more than a regional view – that was one of the reasons for the paper, to see if the larger picture was the same as the multiple regional pictures – e.g. the papers out of China, Siberia, etc. It tried to stitch together the regional findings into a larger finding. The finding per se was not new. The extent was new.

2. I would extend what Magnus said (and I agree with him) to say that it is important for more than one person to find a difference – look what happened to S&C.

They – and only they – early on went down their path because they were the only ones with access to the data. After more had access, their analyses were found to be wrong (and amateurs didn’t find the errors).

So, when we look at what is going on here, Steve needs to show, somehow, that ~ a dozen multiproxy papers are wrong, and the dozens of regional papers are wrong too.

The ~dozen multiproxy studies did not all do the same thing, nor did the dozens of regionals. A daunting task, but not impossible. And if Steve shows this, well, good on him. I'm confident that most others will say the same thing, despite conspiracy-theorist commenters' implications to the contrary.

Best,

DanØ

PS, I compose in HTML, but I like how the buttons are made available for others to use the language. It's a nice feature, and makes it possible for a wider array of folk to write with clarity.

Dano, the task is not quite as hard as you think. If I were a little faster worker and I weren't trying to do too many things, I'd have it done by now. I've also been spending quite a bit of time updating my statistics; it doesn't have an immediate payoff re the multiproxy studies, but I like the math.

One issue common to all of them is proxy cutoff in 1980. If proxies in the 1990s are not going off the charts, then there is no guarantee that the proxy methods would pick up warm periods in the past. There's evidence of upside-down U quadratic proxy relationships (see my note on TTHH). This would compromise all of these multiproxy studies, which are based on linear relationships between temperature and proxy.
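A minimal sketch of the upside-down-U point (purely hypothetical numbers – the `proxy_response` function, the optimum, and the calibration range are all invented for illustration, not taken from any of the studies above): if a proxy responds quadratically to temperature with a growth optimum, a linear calibration fitted on one flank of the curve will read a past excursion beyond the optimum as cool rather than warm, so the method cannot be guaranteed to recover past warm periods.

```python
import numpy as np

rng = np.random.default_rng(1)

def proxy_response(temp, t_opt=1.0):
    """Hypothetical inverted-U proxy: growth peaks at t_opt, declines either side."""
    return 1.0 - (temp - t_opt) ** 2

# Calibration period: temperatures below the optimum, where the response
# is monotonic, so a linear fit calibrates well.
t_calib = rng.uniform(-0.5, 0.5, 100)
p_calib = proxy_response(t_calib) + rng.normal(0, 0.02, 100)
slope, intercept = np.polyfit(p_calib, t_calib, 1)   # regress temperature on proxy

# A past warm excursion beyond the optimum: the proxy value drops back down,
# and the linear model maps that drop onto the cool flank.
t_warm_past = 1.8
p_past = proxy_response(t_warm_past)
t_reconstructed = slope * p_past + intercept

print(f"actual past temperature:        {t_warm_past:.2f}")
print(f"linearly reconstructed reading: {t_reconstructed:.2f}")   # far too cool
```

The reconstruction lands near the cool-branch solution of the quadratic, which is the sense in which a linear temperature-proxy assumption can silently erase a warm period.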

Here’s a quick status of what I’ve looked at, numbered relative to the Wikipedia spaghetti graph:
1. Jones et al 1998. This doesn't hold up. I've posted up notes here on many issues and have drafted quite a bit of an article. Problems include Polar Urals, Tornetrask, Dunde.
2. MBH99 – enough said
3. Crowley and Lowery 2000. This doesn’t hold up. Problems include Polar Urals, Dunde, bristlecones, cherrypicking.
4. Briffa et al 2001. This cuts off in 1960. Detailed analysis is prevented by lack of identification of the sites by Briffa. In my opinion, citation of this study should be prohibited until Briffa identifies his sites.
5. Esper et al 2002. Most of the data is under lock and key at SO&P so it’s hard to get traction here. Again I think that this study should be deep-sixed until the data is archived.
6. Mann and Jones 2003. Problems include: bristlecones, cherrypicking, methodology. We’re into the same old Mann issues on methods. Whatever anyone may think about my credentials, I’m as good as anyone else at figuring out what Mann does and I haven’t figured this one out yet, so I don’t think anyone else has. Jones doesn’t even have the weighting factors.
7. Jones and Mann 2004. See above.
8. Huang. Only goes back to 1500.
9. Moberg. I got started on this last spring and obviously need to deal with it. It is not very robust. One of the proxies with high 20th relative to MWP is Yang, where the contrast comes from Dunde and Guliya. The other comes from a proxy showing increased cooling offshore Somalia, which is translated into a proxy evidencing strong global warming. He uses bristlecones, but only for high-frequency and inverts their sign.

Obviously guys like Esper and Briffa aren’t real keen on having me go through their stuff. It’s almost like it would take an act of Congress for them to produce their data.

the standard with new work is: let’s see if anyone else comes up with the same thing – you want to see a result a few times before you pay close attention.

So you are saying that when several people have got the same results, then you start checking to see whether they have done the job properly ? Or do you just assume that because several people have done it, it must be right ?

Back in #87
So are you accepting that all the multi-proxy studies Steve mentions are severely flawed in their execution ?
With regard to the single-proxy studies you refer to, please could you narrow the search a bit. Which of these single-proxy studies supports the hypothesis that the 1990s were the warmest decade of the millennium?

So you are saying that when several people have got the same results, then you start checking to see whether they have done the job properly ? Or do you just assume that because several people have done it, it must be right ?

Is this an accepted tactic here, misstating what I say? Gimme a break. It means you don’t get excited over one result, you want to see a few similar results. Try going to Uni and reading hundreds of papers, you’ll get it then – people in that setting read many papers a week and get excited over few. Individual papers these days, if robust, are building blocks and rarely foundations.

So are you accepting that all the multi-proxy studies Steve mentions are severely flawed in their execution ?

Why would you state this…ah, yes. I need to talk to my editor – I didn’t word that very well, that’s a rhetorical intent that implies the sole voice shouting in the wind. Further explanatory background is my response just above. Can’t wait to get hit for that one.

With regard to the single-proxy studies you refer to, please could you narrow the search a bit. Which of these single-proxy studies supports the hypothesis that the 1990s were the warmest decade of the millennium?

I don't have them at hand, I have some at home. IIRC, there are some of these examples in the S&B paper (which I presume you have), but they are Chinese stalactites, boreal spruce, Pinus syl., and at least one borehole that I can think of right now.

You’ll remember that the paper arose partially out of an older idea that the MWP might be regional because these single proxies had a strong signal in some areas and not others, so some folk thought that trying to get a global distribution of proxies was one way to test regionality (and a good way to overlap proxies to test how good they were, and to get more data…).

Dear DanØ
I must say that I am very impressed by your attempts to occasionally answer questions, and by your civility.

“Plus, the standard with new work is: let’s see if anyone else comes up with the same thing – you want to see a result a few times before you pay close attention.”
While this is a generalisation, it is obviously wrong. The cold fusion claims got worldwide attention, yet no-one was able to repeat them. The McLachlan claims in Science got worldwide attention, but no-one was able to repeat them. These were examples of claims which got vast amounts of attention because their claims were so important. The opposite is also true; Langmuir's pathological-science lecture gives examples of methods which were published, widely accepted, and turned out to be pure fantasy.

“So, when we look at what is going on here, Steve needs to show, somehow, that ~ a dozen multiproxy papers are wrong, …”
You continually fly this canard, which is misleading. The attempt to replicate MBH’98 is about finding out what went on in MBH’98; and this is logically irrelevant to the truth, or otherwise, of other climate reconstructions which are independent of MBH’98. Likewise, whether there are a dozen, or a million other reconstructions with the same result, has no impact on whether MBH’98 is a good scientific study.
So in short, there is no need to disprove dozens of other studies, because that is irrelevant to what is actually being looked at: whether one particular study is good.

"I think it's quite clear that an original work has been improved upon. But more to the point, what is the value of being stuck in the past, talking about the past?"
This seems to miss the fundamental bedrock of science. That which discriminates science from religion, is the issue of replication. Scientific work must be capable of replication; religion relies on faith. And the issue here is if any of these studies has been directly replicated, and tested. If none of them are capable of replication, then they are all valueless; that is what science is all about. It is surprising that you would want to argue otherwise.
What we do know is that MBH as published had a seriously defective methodology, which precluded replication of the results; and no-one even noticed until M&M 2003. It seems you want to continue building an edifice of paper, paper stacked precariously on paper, when M&M are telling you that your building has defective foundations.
yours
per

I think that it’s evident that he knew that disclosure of the R2 of ~0 would have very bad effect on public acceptance of his results.

DanØ

I would guess maybe 50-75 people on the planet scrutinized it the year it came out, 2/3 of them grad students because they were told to. I have a hard time with this phrase.

It is very difficult to follow your logic. An R2 value of 0 effectively says that the results are not statistically significant, which MBH set out as a prerequisite for their method. If MBH had disclosed this statistic in his paper, then
1. the referees would likely have rejected the paper, since the conclusions depend upon data points that are arguably not statistically significant
2. if your 50-75 peers read the paper, and the paper itself says that its data are not statistically significant by one measure, and that the main conclusions of the paper hang on that result, that is going to seriously affect their evaluation of the study.
3. both 1 and 2 would feed through into public acceptance, since the media rely on comments and recommendations from peers.

I think your guess at 50-75 people “scrutinising” is pure fancy on your part. What is clear is that none of them spotted the many serious errors that were present in the original MBH, and told Nature or MBH. If they had, surely Nature or MBH would have instantly issued a correction ? So scrutiny may mean something different for you than for me.

So you are saying that when several people have got the same results, then you start checking to see whether they have done the job properly ? Or do you just assume that because several people have done it, it must be right ?
Is this an accepted tactic here, misstating what I say? Gimme a break. It means you don’t get excited over one result, you want to see a few similar results.

This was me trying to extrapolate from what you said, which is why it was phrased as a question. What I was trying to understand was, at what point do you feel that someone should check the work being presented in these papers ? You implied in the original post, and seem to be implying again here, that you never feel the need to check what you read in your journals.
You said that you only really pay attention after several people get the same result; to me, this implies you are not relying only on the peer-review process.
However, this is extrapolation on my part, so let me ask you explicitly :
When do you think that climate scientists should try to exactly replicate other climatologists’ work ?

With respect to :

So are you accepting that all the multi-proxy studies Steve mentions are severely flawed in their execution ?
Why would you state this…ah, yes. I need to talk to my editor – I didn’t word that very well, that’s a rhetorical intent that implies the sole voice shouting in the wind. Further explanatory background is my response just above. Can’t wait to get hit for that one.

I don’t understand your reply. Your post #87 has no answer for my #88; your post #85 precedes Steve’s post #86 where he points out the problems with most of the other major multi-proxy studies. So I ask again, are you accepting that all the multi-proxy studies Steve mentions are severely flawed in their execution ?

With respect to the single-proxy studies, when it is next convenient for you, it would be very kind if you could dig out some references to single-proxy studies that support the hypothesis that the 1990s were the warmest decade of the millennium. I don't recall ever reading such a thing, so it would be very helpful. My thanks in advance.

fF, I see this passage as relevant (and all the time I have to address), and I appreciate your attempts at forthrightness:

This was me trying to extrapolate from what you said, which is why it was phrased as a question. What I was trying to understand was, at what point do you feel that someone should check the work being presented in these papers ? You implied in the original post, and seem to be implying again here, that you never feel the need to check what you read in your journals.

You said that you only really pay attention after several people get the same result; to me, this implies you are not relying only on the peer-review process.

However, this is extrapolation on my part, so let me ask you explicitly :
When do you think that climate scientists should try to exactly replicate other climatologists’ work ?

I’m working on a Sunday, fF, so this will be brief and hopefully not grumpy:

1. You implied in the original post, and seem to be implying again here, that you never feel the need to check what you read in your journals

I have a specialty. Everyone has a specialty. Specialists write papers. Sometimes teams get together to write interdisciplinary research. People outside a specialty probably don't have the expertise to check papers. I wonder how many people have 'checked' that Bristlecone dC13 paper? Probably nobody. No one has time. You do your research and it extends or refutes work. You don't pore over the details of an already-written paper because you don't have the time, unless you are using that paper as a reference or basis for your own work. It's not done. I'm not saying this is good or bad, it just is. There seems to be a big misunderstanding of this reality among some commenters.

In this particular case (that is: the reason this website exists), you can see the number of gray boxes and say 'they need more data there'. You can see the wide error bars and say there's quite a bit of uncertainty there – it's not hard. You say to yourself there's a result and judge for yourself whether it's robust or not. The 'checking' comes from more papers and their findings. After a few findings, you look again. You can see the 'checking' has found the shaft is not straight, but more like the error bars. This is not new, or news.

The paper was used as a datum for decision-making. Decision-makers make decisions this way all the time. Decision-makers make decisions with results from econ papers that have low r^2s and Ts all the time, because econ is only one way to predict human behavior. Decision-making is like making sausage.

2. What I was trying to understand was, at what point do you feel that someone should check the work being presented in these papers ?

When someone has time. See my comments above for the reality of this. The ‘checking’ comes from other papers being written.

3. You said that you only really pay attention after several people get the same result; to me, this implies you are not relying only on the peer-review process.

No, this implies I only have so much time and therefore one result doesn’t merit shouting from the rooftops.

4. let me ask you explicitly :
When do you think that climate scientists should try to exactly replicate other climatologists’ work ?

Nobody has time to do this, in any field. But to answer the question, climate scientists should try to exactly replicate other climatologists' work when they have enough time to do so.

I’m researching a possible research agenda (not climate per se) for someone, outlining possible avenues for new research and giving a few test hypotheses for a direction. I’m hired to do work like this because researchers and academics are unbelievably busy and don’t have the time for everything. If the data are there, this person will pitch a research agenda and attempt to get funding for someone to do further research to test a hypothesis or two. Why am I doing this? Because no one has 1) thought of it or 2) no one’s gotten to it yet. I don’t know why. All I know is I’m doing it because no one else has the time to do it.

5. So I ask again, are you accepting that all the multi-proxy studies Steve mentions are severely flawed in their execution ?

No. I don’t have that expertise, so I rely on the folks who do for my information.

I'm saying Steve is the only voice saying these researchers are in error – and I don't mean pundits or James Glassman-type shill authors. I take all single voices with caution and wait for other voices, like I said above. S&C was a single voice and they were wrong, like I said above. Their being wrong does not make Steve wrong, rather it sets the context for the argument Steve puts forth. Publish some papers and empirically derive how they are wrong, and have debate on that empiricism.

That is the scientific process. Humans made this process. If’n you don’t like the process, there are a number of good policy schools on this continent that would benefit from energy you can expend changing this process to ensure there is enough time to step back and revisit. Meanwhile, society rushes on without revisiting and we are pulled along with it.

Dano, I don’t disagree with you about the need for caution with single-voice argument, especially when the voice is, as you point out, not a recognized authority. Many of the people who agree with me do so much too quickly and these give me little satisfaction (but satisfaction all the same).

However, now that you’ve spent a little more time visiting, I think that you’ll agree that even if I’m wrong about MBH, I’m not trivially wrong. Otherwise I wouldn’t be still standing. I can’t imagine that any of Mann’s supporters are very happy about his explanations of the cross-validation statistics, the bristlecone pines, the screwed-up principal components method, the fiddling with the Gaspe series, or the myriad of little things. I think that they must realize that he’s obfuscating, but they go along with it because it’s not helpful to their cause if he blows up.

A real trouble with the above particular issues is not just that there might be a problem, but that the problems were known about by Mann in the first place and not reported. There’s lots of evidence of this. This could get very ugly if pursued and all these issues are on the Barton Committee radar screen. Given these skeletons in his closet, Mann was very foolish to make a spectacle of himself with the Wall Street Journal, where he caught the eye of some serious people.

The Barton Committee exactly understood these issues in securities terms; the learned societies have completely missed the point. Mann’s answers might play to believers, but he didn’t answer any of the hard questions. I think that Mann is in a wretched position if the Barton Committee actually does turn its sights on him. He might have got bailed out by Hurricane Katrina, which looks like it might eat up every politician’s energy for a year.

You say to yourself there's a result and judge for yourself whether it's robust or not. The 'checking' comes from more papers and their findings. After a few findings, you look again.

it seems you completely fail to grasp the need to replicate studies in science. Your version of ‘checking’ seems to be papers in the same field that have a similar sentence in their conclusion. This is just such a fundamental error it is difficult to describe.

And your attitude to the papers that have tried to replicate MBH ? It seems you believe it is wrong, because others studies, doing entirely different things, came to similar conclusions as MBH.
yours
per

I think DanO is moving closer and closer to the position of “Steve has shown MBH to be seriously flawed, not worth considering any more, etc.” but wants Steve to better address the issues of other reconstructions or even do some work to try to reconstruct himself (or some global analysis that shows the limitations of a best possible reconstruction.)

I think DanO is moving closer and closer to the position of “Steve has shown MBH to be seriously flawed, not worth considering any more, etc.”

Actually, I can eyeball the latest papers. I don’t need Steve to tell me that the shaft has been improved by more recent work.

but wants Steve to better address the issues of other reconstructions or even do some work to try to reconstruct himself (or some global analysis that shows the limitations of a best possible reconstruction.)

yes.

it seems you completely fail to grasp the need to replicate studies in science. Your version of 'checking' seems to be papers in the same field that have a similar sentence in their conclusion. This is just such a fundamental error it is difficult to describe.

No. Anyone with a scroll wheel on their mouse can see thru this rhetorical tactic, viz. current #93.

And your attitude to the papers that have tried to replicate MBH ? It seems you believe it is wrong, because others studies, doing entirely different things, came to similar conclusions as MBH.

No. Anyone with a scroll wheel on their mouse can see thru this rhetorical tactic, viz. current #85 (2).

Dano: "No. Anyone with a scroll wheel on their mouse can see thru this rhetorical tactic, viz. current #93."

hmmm. Is this where you say, "You don't pore over details of an already-written because you don't have the time"; or "The 'checking' comes from more papers and their findings"? From your reply, I am not even sure you understand what replication is, never mind its relevance.

Dano: "No. Anyone with a scroll wheel on their mouse can see thru this rhetorical tactic, viz. current #85 (2)."

That would be when you said, "So, when we look at what is going on here, Steve needs to show, somehow, that ~ a dozen multiproxy papers are wrong": it seems you truly do not understand that it is fundamentally important that MBH'98 be able to support the conclusions it draws, all by itself. Instead, you continue to argue that anything said about MBH doesn't matter, because there are another 12 multi-proxy studies. The presence or absence of a dozen, or a million, other different studies has no relevance for finding out whether MBH'98 is a good study.
yours
per

Not to mention that Steve IS working on all the other proxy papers and finding various sorts of problems with them. Does the Hockey Team think they can out-race Steve in publishing papers? As far as I can see the rhetoric of the team is just a delaying tactic in having to admit that the whole proxy publication ‘industry’ is sick.

Which isn’t to say it will always be, but good future work requires operating on the patient to remove the infection and that requires the patient to stop trying to prevent the surgeon from cutting.

I think I’ll quit there as carrying the analogy further is likely to get too gory for a family friendly forum.

He agrees that it's poor work. Just thinks it may be (luckily) correct. Hence the need to look at more recent studies to see if they bear out the stick shape or are also flawed. (That is, if you actually care about the temp reconstruction.)

It is not unusual in science for someone to first come out with a conclusion without really having valid proof of it. I think it is kind of cheesy, but they often get credit for it, versus the people who do the later (valid) work to show the effect. This is one reason people push to publish fast and why some crappy papers get written (and in prestigious journals). Sometimes they are even able to morph their original "discovery" to try to claim title to the "real discovery" that comes out later. Bell Labs was notorious for this sort of stuff. It's a big kerfuffle regarding credit and quality work and all that, but in general, people eventually get to the real answers (maybe not expeditiously… but this is how it happens).

The presence or absence of a dozen, or million, other different studies, has no relevance for finding out whether MBH’98 is a good study.

Well, who's using that old study anyway, I mean besides some bulldog-types here? Whoever is telling decision-makers that MBH98 is the latest understanding of the science is misinforming them. Decision-makers ask for the latest information with which to make decisions; they don't want old stuff. Folks trying to decide how big a jail to build don't look at crime rates in 1950. Traffic engineers don't build new roads on capacity data that terminated when they were children.

Plus, future projections used in decision-making are complex and past climate is not the sole component in the projections. Adaptive management techniques are only partially informed by comparative climate levels – as the CO2 ppmv levels are far higher than in the past, strict slave-like adherence to the past is simply bad decision-making.

Why all this focus on the past anyway? It sounds like some kind of pathology. This sounds like some sad guy with a comb-over, stuck in the glory days of how great he was in high school. I mean, don’t we laugh at 40-something guys who still wear their high school letter jackets and who don’t listen to any music that wasn’t created on vinyl? And if for some reason they are still married, don’t we look askance at their women?

BTW, TCO, you can keep that tag open, sir & thx, but plz cease attempts to characterize, as judgments such as ‘poor’ are only achieved with hindsight.

After all, that’s the issue here – all this hindsight and how wonderful we are at applying it to old stuff that happened in the past. My, aren’t we clever. What is needed is a change in human perception that allows us to judge new information for its relevance and clarity. If this, somehow, happens from the discussions here, focus on economics next as there’s a real need there.

Dano: “About a dozen newer studies. Nice, shiny, fresh, better informed evidence to inform decision-makers with. New, new, newer! Fresh!”
“Well, who’s using that old study anyway, I mean besides some bulldog-types here? ”
“Why all this focus on the past anyway? It sounds like some kind of pathology.”
Well, I am astonished. A whole new approach to science. The argument seems to be that you can publish any old crap, and so long as someone else publishes in the same field before you are exposed, then it is okay. If you publish junk, it is just some pathological focus on the past if someone cares.
If I have to point out that the king is not wearing any clothes, I will do so. Science is defined by the ability to replicate results and experiments. It really matters if an individual paper is not reproducible. Even at the very simplest, it may be that there are methodological considerations common to many of these papers; a flaw in one such paper may expose the same flaw in many.
And just to point out the obvious; all the new, fresh, better informed papers in Nature or Science matter not one whit, if they are not reproducible.
yours, per

I’m not convinced that the new studies are “improvements” on MBH98. But it’s hard to get traction on them. Esper doesn’t disclose data. There are some pieces of Moberg missing. Moberg is just a different set of “proxies” than Mann. Moberg doesn’t use up-to-date proxies, but recycles a lot of old data.

The argument seems to be that you can publish any old crap, and so long as someone else publishes in the same field before you are exposed, then it is okay. If you publish junk, it is just some pathological focus on the past if someone cares.

Anyone with a scroll wheel and able to string two thoughts together can check above and see, David, that I’m not making this case. I find it interesting you choose this type of argumentation.

What you got against my letter jacket, music and lack o’ marriedness?!

Ahhh…TCO must stand for The Comb-Over. I’ll choose a new metaphor. :o)

I’m not convinced that the new studies are “improvements” on MBH98. But it’s hard to get traction on them. Esper doesn’t disclose data. There are some pieces of Moberg missing. Moberg is just a different set of “proxies” than Mann. Moberg doesn’t use up-to-date proxies, but recycles a lot of old data.

Unless found otherwise, the new studies are improvements on knowledge, not on a specific paper.

That is: what is the question they are attempting to answer? Whether 20thC was warmer than the recent past, not whether MBH98 is right.

Why do temps matter? This is the question. Who wants to know? There are different reasons for knowing this answer. Those who combine or conflate reasons likely have a reason for doing so. The issue is whether CO2 growth is an issue.

But anyway, Steve, decision-makers who are properly briefed place marginal importance on whether the MBH shaft is straight or curvy. Why? AD 840 didn’t have the same CO2 ppmv in the atm as today. That’s a problem, so models must be used to inform the future.

See, decision-makers aren’t making decisions in the past. They are making decisions about the future. They aren’t looking at Mann, they are looking at model projections and asking how realistic the projections are.

Try this: after your next paper is done, you should audit the CO2 measurements. They are far more important for deciding the future than past temps. Adaptive management strategies only fractionally consider temps. And the decision-makers already know this.

All this glorifying of the past isn’t doing a good job of influence. Someone is way behind with this strategy. By the time you have your arguments down on this issue, you’ll be farther behind yet, because the public, too, wants to know what will happen in the future, not what happened in the past.

This is what I’m getting at. I enjoy the different little arguments here, which are enlightening.

àÆàanàÆàÅ said:“Anyone with a scroll wheel and able to string two thoughts together can check above and see, David, that I’m not making this case.”
“About a dozen newer studies. Nice, shiny, fresh, better informed evidence to inform decision-makers with. New, new, newer! Fresh!”
“Well, who’s using that old study anyway, I mean besides some bulldog-types here? ”
“Why all this focus on the past anyway? It sounds like some kind of pathology.”
“The “checking’ comes from more papers and their findings”
“So, when we look at what is going on here, Steve needs to show, somehow, that ~ a dozen multiproxy papers are wrong,…”
àÆàanàÆàÅ, you are not making the case associated with your words. ok, why should I doubt you ?
Why not put your case really clearly, so even I can understand ?
yours, per

Dano, if I’m not putting words in your mouth, you’re saying that the multiproxy studies of climate history over the millennium are irrelevant to a “properly briefed” policy maker and that the real issue is “model projections and asking how realistic the projections are.” For a “properly briefed” policy maker, I’m inclined to agree. That raises the question of why there was so much focus on the hockey stick in the presentation of IPCC TAR results, if it was irrelevant to a “properly briefed” policymaker. And it was even more heavily relied upon in Canadian government presentations than in IPCC TAR presentations. When I first saw it, it struck me as highly promotional (in the mining promotion sense of the word “promotion”), which is why I got interested in it. I do not accept the argument that the hockey stick has been highlighted by skeptics; the highlight on the hockey stick was already in IPCC TAR. So why did they focus on it if it was irrelevant? I think that the answer lies in their decision-making about how to promote Kyoto.

I don’t know all the back history. However, there seems to have been a viewpoint after IPCC 2AR that they had to “get rid of the MWP” in order to sell Kyoto. (Andre at UKweatherworld speculated that Overpeck was the source of this phrase, not attributed by Deming more specifically than someone important in the AGW industry.)

My other surmise (and this is only a guess) is that they needed something more than model-based projections to close the deal with policy-makers. Maybe they had not had much luck selling model-based projections to policy-makers, because policy-makers were gun-shy of buying into the models, perhaps because they were suspicious of models whose track record was by no means unblemished. I think that it’s an interesting question why the hockey stick was featured so heavily in IPCC TAR promotions – this is a different question than the validity or non-validity of the hockey stick or the question whether 2xCO2 will have an impact of 0.3 deg C, 2.5 deg C or 6 deg C on climate.

If I were a “properly briefed” policy maker, I would want to know all about the issues affecting differences between these 3 alternatives – e.g. water vapor feedbacks, forcing. I would like some disaggregated analysis of each of the 5 biggest issues in the models. I’ve looked at IPCC TAR with this in mind and it is absolutely shocking how little information there is on these questions. They report model outputs and conclusions, but no detailed analysis about why the models differ. I’d want to have a balanced review of whether there are any risks of systemic modeling biases across all models, where they are likely to occur and the direction of their impact.

If I wanted to understand the factors affecting model projections from the point of view of a “properly briefed” policy maker, there’s little food in IPCC TAR. My approach tends to be from the point of view of a “properly briefed” policy maker. I have a lot of experience in trying to make judgements from scientific [geological] presentations and some math skill that not every policy maker would have. To meet the standards that I would want, if I had a big policy job, I’d commission a brand-new study of the key models, to set out the issues. This is what I’d have expected from IPCC TAR in the first place – the question is: why didn’t they deliver? And, back to the start, why did they promote the hockey stick so much if it was irrelevant (as it may be)?

My other surmise (and this is only a guess) is that they needed something more than model-based projections to close the deal with policy-makers. Maybe they had not had much luck selling model-based projections to policy-makers, because policy-makers were gun-shy of buying into the models, perhaps because they were suspicious of models whose track record was by no means unblemished.

Close the deal with policymakers. Gun-shy. Yada. These sound like phrases coming from a value-judgement issue POV. If you want policy-makers to consider your value judgements over others’, you have to accept the fact that your value judgements may not win out (if this is a win-lose deal). If you wish politics to run on Enlightenment principles only, with no value judgements in the calculatin’, then you have to completely restructure society.

If I were a “properly briefed” policy maker, I would want to know all about the issues affecting differences between these 3 alternatives – e.g. water vapor feedbacks, forcing. I would like some disaggregated analysis of each of the 5 biggest issues in the models. I’ve looked at IPCC TAR with this in mind and it is absolutely shocking how little information there is on these questions.

In this model, “properly briefed” policymakers have enough time and brain power to get up to speed on the issues. This is what aides are for.

If you want answers to the questions you feel are underrepresented, you have to ensure your views get heard. In our society, there are ways of ensuring this happens. May I suggest you learn these ways?

if I had a big policy job, I’d commission a brand-new study of the key models, to set out the issues. This is what I’d have expected from IPCC TAR in the first place – the question is: why didn’t they deliver? And, back to the start, why did they promote the hockey stick so much if it was irrelevant (as it may be)?

Again, you are looking in the past, judging with hindsight. When one looks in the present, one only has so much information. This issue is how to tell when a conclusion is robust. Even then, decision-makers use other ways of knowing and rationality is only one tool in the kit.

But to the point of the para: I disagree. The scenarios were run and probabilities assigned. What more do you want?

The second problem is perhaps a little more subtle, and it is that there is no obviously correct way to attach probabilities to the individual scenarios. The scenarios are essentially presented as possibilities, with no assessment of their probabilities. Obviously they are intended to cover a range of reasonable possibilities (they would have little use otherwise) but they are quite explicitly NOT assigned any sort of relative likelihoods:

Dano, I understand the role of aides in all this – you’re splitting a pointless hair here. My point about the models is both hindsight and foresight. If I were, in your terms, the aide to a policymaker, I would want to see a disaggregation of the factors that are important in climate models and an assessment of how each factor affects the range of 2xCO2 sensitivities. I say this as someone who has a genuine interest in the question. It’s something that I’d like to see.

As to your swipe that I should learn how to make my views heard, I’ve been gratified by the amount of interest that’s been expressed in my views. As I’ve mentioned from time to time, I initially looked at some of these issues purely for a personal interest in big promotions, with no expectation of affecting policy. The only policy that I’ve opined on are policies relating to the archiving of data and methods and policies relating to disclosure and due diligence. I have no personal views at present on substantive climate policy, as I’ve often said. I think that proponents of climate policy would serve their own interests most effectively by copious and diligent archiving of data and methods and not behaving like prima donnas.

What I infer from your comments is that you have no problem with policy makers being misled, as long as they are misled in a direction which you find acceptable.

You are inferring what you want to hear. Allow the parenthetical phrase to enter your thoughts.

“May I suggest you learn” what the scenarios are for?

I’m quite familiar with scenario analysis and adaptive management.

However, I could have done a better job with that passage you take exception to, and a pithy phrase doesn’t cover the spectrum of scenarios and the management that arises out of them; I’ll ask for your indulgence to expound in a different comment.

I would want to see a disaggregation of the factors that are important in climate models and an assessment of how each factor affects the range of 2xCO2 sensitivities. I say this as someone who has a genuine interest in the question. It’s something that I’d like to see.

Me too. That’s gotten at e.g. here, so it’s not like it’s not done at all. I’m sure this will come up and be included in the WGs. I think the evidence is robust enough, but the skill at relaying the evidence in the environment of competing interests is lacking.

I think that proponents of climate policy would serve their own interests most effectively by copious and diligent archiving of data and methods and not behaving like prima donnas.

First, this conflates science and policy, and I’m not sure this conflation is correct. Policy folk are largely not doing primary research, but the primary researchers occasionally do testify.

Second, there are 14 technical articles in this week’s edition of Science. How many of these papers have diligent archives for inspection by the public?

Third, what you consider to be a swipe is a proffered hint at typical, human, expected behavior modes that are judged to be donna-like, whether they are prima, seconda, terza, whatever – there might be a cause-and-effect relationship. And humans are humans, not perfect, and can reasonably be expected to react in a certain way when confronted with unexpected situations. That is: it shouldn’t have come as a surprise that someone uttered a TLA when the rules weren’t followed – it’s a human reaction.

The refusal to show data as required by journal policies is very poor behavior. It is not in keeping with the ideals of science (à la Wilson, Feynman). It is tendentious rather than truth-seeking. And yes, scientists are human and don’t always act as they should. But on issues like this, we should push them to do so. If not, we should be very hesitant to trust their reports.

… You don’t pore over the details of an already-written paper because you don’t have the time, unless you are using that paper as a reference or basis for your own work. It’s not done. I’m not saying this is good or bad, it just is. There seems to be a big misunderstanding of this reality among some commenters.

Perhaps things are different in your field/work environment, but I’ve certainly experienced the opposite. In fact, in graduate school, all students were required to spend a semester class doing nothing other than reading selected papers spanning the last 50 years and much of the breadth of biology, in order to determine for each of those papers whether the methods used and data reported were appropriate and/or sufficient to support the conclusions of that paper.
The idea was (and is) that personal, independent evaluation of others’ work, even outside your specialty, is a useful and important skill for a scientist to possess and to exercise. With practice, it needn’t even take a lot of time.

Dano says:

You can see the wide error bars and say there’s quite a bit of uncertainty there – it’s not hard. You say to yourself there’s a result and judge for yourself whether its robust or not. The “checking’ comes from more papers and their findings….

Without a personal, independent evaluation of the robustness of the calculation that produced the error bars, you have no way of knowing whether they are in the ballpark or not. Similarly, unless you can make a personal evaluation of the robustness of the work in the “…more papers and their findings,” all you are left with is an appeal to authority (the authors of the original & follow-on works).
I can’t answer for you, but I certainly would at least want to make sure an author’s work passed a number of my own tests before relying on them in an appeal to authority.
The various problems in the published MBH98 work, as well as the authors’ subsequent behavior, would seem to me to have severely limited MBH’s credibility as authorities to be appealed to.

Dano says:

The paper was used as a datum for decision-making. Decision-makers make decisions this way all the time. Decision-makers make decisions with results from econ papers that have low r^2s and Ts all the time, because econ is only one way to predict human behavior. Decision-making is like making sausage.

Just because bad/wrong things are done doesn’t justify other bad/wrong things.

Oops – I should have said, ‘I could have worded it better’, and avoided yet another opportunity for someone to misstate what I said to make a point. It happens a lot around here, so my bad. When will I learn?

But if you want to discuss how and when to use scenario analysis and adaptive management, I’m game – we can use that thread for me to re-state too. So go for it. I’m not comfortable at scales above a watershed, but I’ll try and keep up with you.

Armand,

I too spent time (more than a quarter) as a grad student poring over papers. Your 1st reply doesn’t address primary researchers’ time constraints.

Your 2nd reply appears to argue folk can judge the quality of the work of something out of their field.

I presume, then, you have looked at the copious r^2s and error bars provided by S&C and registered your complaint, and discussed that work in the S&C section here on this site.

Your The various problems…as well as the authors’ subsequent behavior, would seem to me to have severely limited MBH’s credibility… presumes that they acted this way to lots of people, rather than one person. Your argument would be a lot stronger if you could show this happened to more than one person.

And your last reply: decision-makers need you. Need you badly. They are doing ‘bad/wrong’ things. Make them stop. Society needs you to act to stop this.

Re: #119
I’ll keep it succinct to help you out with your time management (grin).

Your 1st reply doesn’t address primary researchers’ time constraints.

I refer you to my statement: “… With practice, it needn’t even take a lot of time.” Again, I’m just relating my observations from the environments I’ve been in — YMMV. I’ll also note that defending others’ work without at least a rough evaluation of that work yourself seems risky to me.

Your 2nd reply appears to argue folk can judge the quality of the work of something out of their field.

Yes, to at least a first approximation.

I presume, then…

Why?

Your … presumes that they acted this way to lots of people…

I disagree.

Your argument would be a lot stronger if you could show this happened to more than one person.

Well, who’s using that old study anyway, I mean besides some bulldog-types here?

See figure 2 in Damon and Laut (EOS, 28 September 2004), referenced from the Contrarians page of Schneider’s web site. Damon and Laut are attempting to refute a number of claims to do with solar-climate links. Sure, they have a later reference to Jones and Moberg, but figure 2 would look rather different with MBH omitted. (The graphic says MBH ’98, while the references say MBH ’99, and the actual “MBH” data is shown from 1850 to 2000, an end-date which is pretty smart for a reconstruction.)

Anyway, one can presume that, because Schneider is willing to cite Damon and Laut, he believes MBH 98 or 99 is still load-bearing in scientific argument.

Dano said:
“Your argument would be a lot stronger if you could show this happened to more than one person.”

Why ?
In 2003, this was the one group (M&M) that was trying to replicate (audit) MBH’98. Their behaviour to this one group speaks volumes. Strangely enough, they didn’t behave the same way to all the other people in the world who, coincidentally, were not trying to replicate their data.
The logic of your position is not crystal clear to me.
I think the logic of your position is not crystal clear to me.
yours
per

Dano said:
“Well, who’s using that old study anyway, I mean besides some bulldog-types here?”
I followed up Jo Calder’s line of thought.
According to Web of Science, MBH’98 has 384 citations, of which 30+ were published in 2005.
I think you would have to be a “denialist” to argue that MBH’98 isn’t an important contemporary study.
yours
per

Stop beating up on Dan. I think you all are so excited to have an “opponent” over here, you get into these ridiculous arguments about whether you proved him wrong on a point and he refused to admit it. Instead of that, why not try to understand his points, draw him out…figure out exactly where the differences are…discuss the issues of philosophy of science with depth…drug his beer…and drag him into the circle to be our “tree-boring bitch” (oops I mean, vital coauthor).

OK, if I can’t beat up on Dan (– I didn’t think I was btw), can I try William Connolley instead? Connolley has a new feature on his blog Ask Stoat where you can put forward questions for him to answer. John Fleck asked (among other things):

R2 vs RE: Sorry to JF for not replying, busy etc: but I’m obviously out of touch: what is it code for? Is this some Bartonism?

Now, maybe this is an attempt at humour, but I don’t get much support for that reading from the context. If it’s not an attempt at humour, I’m shocked, shocked that so staunch a H-S supporter should have such a lacuna in his statistical arsenal. Someone should tell Tim Lambert.

Jo – Connolley describes himself as “climate modeler” in his realclimate bio and has contributed an article on trend estimation to Wikipedia (although the article does not use these statistics). I presume that he is simply trying to avoid answering the question. Sometimes you have to ask things more than once from the Hockey Team, and in baby steps, when they’re being deliberately obtuse.
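For readers wondering what the R2-vs-RE argument in this thread is actually about, here is a minimal sketch of the two verification statistics. This is not MBH’s actual code; the toy series, the 46-“year” verification interval, and the choice of calibration mean are illustrative assumptions. The RE (reduction of error) statistic compares a reconstruction’s squared errors against the naive benchmark of predicting the calibration-period mean, while r² measures only the correlation of the wiggles, which is why a flat reconstruction at roughly the right level can score a positive RE while its r² stays near zero.

```python
import numpy as np

def verification_stats(obs, rec, calibration_mean):
    """Squared Pearson correlation (r^2) and reduction-of-error (RE)
    for a reconstruction over a verification interval.

    RE = 1 - SSE(reconstruction) / SSE(calibration-mean benchmark);
    RE > 0 means the reconstruction beats simply predicting the
    calibration-period mean. RE is at most 1 (perfect reconstruction)."""
    obs = np.asarray(obs, dtype=float)
    rec = np.asarray(rec, dtype=float)
    r = np.corrcoef(obs, rec)[0, 1]
    sse = np.sum((obs - rec) ** 2)
    sse_benchmark = np.sum((obs - calibration_mean) ** 2)
    return r ** 2, 1.0 - sse / sse_benchmark

# Toy illustration (hypothetical numbers): a reconstruction that tracks
# the mean level but not the year-to-year wiggles can have a positive RE
# yet a near-zero r^2 -- the crux of the dispute over which statistic
# demonstrates "skill".
rng = np.random.default_rng(0)
obs = 0.5 + 0.3 * rng.standard_normal(46)   # 46 "years", e.g. 1856-1901
rec = 0.5 + 0.01 * rng.standard_normal(46)  # flat at the right level, unrelated noise
r2, re = verification_stats(obs, rec, calibration_mean=0.0)
```

A perfect reconstruction gives r² = 1 and RE = 1; a reconstruction no better than the calibration mean gives RE = 0; and RE goes negative when the reconstruction is worse than that benchmark, which is why a verification RE near zero (or an unreported r²) is what the thread’s participants keep circling back to.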