Peer Reviewing Ammann and Wahl #1 – The Correspondence

Mann cited Ammann and Wahl’s recently released paper at NAS (it was not available to us in time for the NAS panel, although I’d seen and reviewed an earlier draft). After reading it, Per said that he thought the reviewers had done a lousy job.

Now I was only a reviewer for the first draft (after sending in my review, I seem to have been replaced; I never heard anything more from Climatic Change). I provided many detailed comments that were simply ignored. Obviously I have an interest in the matter, but Schneider knew of my interest and presumably that was one of the reasons that he asked me to review. (I had previously reviewed a submission by MBH to Climatic Change in 2004, which has never seen the light of day. My review was detailed, so Schneider knew how I reviewed things. In fact, my review of the MBH submission led Climatic Change to introduce a limited data policy for the first time, so I made a positive contribution.) In this case, however, Schneider disregarded most of my comments – which, by and large, pertain to objective things even though there is a controversial edge.

The only materially new sections of the revised article are the discussions of RE versus R2, low frequency versus high frequency, Appendixes 1-3, plus of course the table of verification statistics. Ammann and Wahl say that they have provided the verification statistics so they are "available to the community". Good, and I’m glad that they saw the light. But let’s be clear about this – they were not added as a simple response to a reviewer request. They refused to provide this information to me as a reviewer, and Schneider abetted this. Readers of this blog know that, as recently as December AGU, Ammann was still refusing to disclose the verification r2 and similar statistics. It was only because we filed a complaint with UCAR about misconduct in withholding the results, and because I’ve got an audience at the blog and have hammered away at Ammann for attempting to withhold adverse results, that they finally disclosed them.

I’ll have quite a bit to say about the article itself in a few days. (I’ve still got to bring the NAS panel to the end of last Friday.)

No one should be under the impression that, if I’d been a little "nicer" to Ammann, he would have done this on his own. I’ve made some very polite offers to Ammann and have been ignored. He fought disclosure to the bitter end. If you don’t believe me, read on. This goes up to our review of Ammann and Wahl, which I’ll post up tomorrow or the next day. I’ve started with some requests from me to Ammann long before the Climatic Change process began. I’ve described my discussions with Ammann at AGU elsewhere.

Correspondence with Ammann and Climatic Change

Dec. 22, 2004 SM to Ammann (no answer)

Dear Dr Ammann,
I attended your presentation at the AGU last week on your emulation of MBH. As you may be aware, I have considerable background with this and am interested in the project. When do you anticipate posting up your results? I would be interested in any results that you are able to share at the present time.
Thanks, Steve McIntyre

Jan 4, 2005 SM to Ammann (no answer)

Dear Dr Ammann,
Michael Mann has been citing an article of yours under review as both confirming his results and discrediting findings attributed to McIntyre and McKitrick. I would appreciate a chance to look at your article. Thanks, Steve McIntyre

May 12, 2005 Clim Chg to SM

Dear Dr. McIntyre,
Attached is a letter from Stephen Schneider requesting review of the above referenced paper, which is also sent as an attachment (ms and four figures).

Please acknowledge receipt and let us know if you need a hard copy.
Regards,
Katarina Kivel

May 12, 2005 SM to Clim Chg

I appreciate the invitation and would be happy to provide a review within 4 weeks. Regards, Steve McIntyre

May 12, 2005 Clim Chg to SM

Dear Dr. McIntyre,
Thank you for confirming receipt and for your interest in providing comments. Attached are also the Guidelines for Reviewers and Climatic Change Editorial Policy. CCedpolicy98.pdfCCGuidelinesRevs98.pdf
Regards, Katarina Kivel

May 13, 2005 Ammann and Wahl to M&M

Dear Steve McIntyre and Ross McKitrick,
we have finally submitted our manuscripts containing our own reproduction of the Mann-Bradley-Hughes climate reconstruction including a now complete analysis and verification of suggested modifications put forth in your GRL and Energy and Environment articles.

It is our understanding that you should get the two papers to review shortly (or you might have received them already). If you should not receive such a request, please let us know so that we can send you a copy.
Best regards,
Caspar and Gene Wahl

May 13, 2005 SM to Ammann (no answer)

Dear Caspar,
Thanks for the email. I’ve received the CC paper, but not the GRL paper, which will probably arrive next week, so I wouldn’t mind seeing it.

Obviously there’s quite a lot of interest in this topic. Yours is the 4th Comment submitted so far to GRL, so replying has become a small industry. We have 2 Replies to finalize for next week. I suspect that they’ll run all of them at the same time.

I’ve started reconciling your code to our code and finding a lot of similarities so far. I’m glad it’s in R. While you characterize your results quite differently than we do in our EE article, many of the conclusions on calculations seem pretty similar (you note similarities in a couple of places in the CC submission). In my opinion, the key issues are going to be assessing the quality of bristlecones as a determinant of world climate history and sole reliance on RE statistics without insuring against possibilities of spuriousness.

While both parties have different objectives in terms of conclusions that they wish to emphasize, I’m 99% sure that there will be a great deal of common ground in terms of code. In order to focus debate, I would like to suggest that we try to work towards some joint statement on how we have emulated MBH98 and on any residual differences between our methods. I’m annotating as I go, and if there is some possibility of doing a joint statement, I’ll share these comments rather than using them for controversial purposes.

I would characterize both algorithms as emulations, as neither of us has "reproduced" MBH98 in audit terms, although each of us has replicated enough of the characteristics to make analytical statements. I think that your website language is somewhat misleading in this respect. In passing, it seems a little churlish that you should criticize von Storch (correctly) for not attempting to replicate MBH methods in your CC article, while at the same time, not acknowledging our emulations which attempt and substantially accomplish what you criticize vS for not doing.

Regards, Steve McIntyre
June 6, 2005 SM to Climatic Change

In a first look at the submission by Wahl and Ammann, I noticed the following missing information and data, which I require in order to finish the review.

The authors rightly place considerable importance on the need to report verification statistics on climate reconstructions (see page 7) and stated (page 23) that such verification statistics would be available at a website. However, I was unable to locate them in the article or at the website, other than the Reduction of Error statistic, the distribution of which is at issue and which certainly should not be the only significance test cited. Could you please have the authors provide the following verification statistics for each of the runs cited in the article:

Thanks very much. I anticipate making a number of comments after receiving this information.

June 10, 2005 Response by Ammann and Wahl

Dr. Stephen Schneider
Editor in Chief, Climatic Change
Dear Dr. Schneider:
This communication is in response to a request for additional information (specifically calculation of a number of statistics) for our submission, # 3321.

Our general conclusion is that the statistics we already have included in mss. #3321 are the most meaningful for the purpose of examining the validity of reconstructions of decadal and multi-decadal trends of surface temperature over the last millennium. Extending the set of measures to include those requested would add only very-high frequency (interannual) information that cannot, by construction, examine the fidelity of reconstructing longer-term trends. Thus, these measures are not directly relevant to the purpose of mss. #3321. We explain our reasoning in detail below.

We also would like to emphasize that the purpose of making our code and data sets available
(cf. http://www.cgd.ucar.edu/ccr/ammann/millennium/CODES_MBH.html ) is to facilitate examination of the MBH reconstruction and the other scenarios we examine. Of course, the reviewer is free to use these tools to calculate these statistics him/herself. Indeed, we are already aware of one such use at the following website, http://www.climateaudit.org (S. McIntyre).

EXPLANATION

General Overview
First, and most generally, the statistics requested by the reviewer measure in the high frequency (interannual) range, as explained in detail below for the various measures. It was not by omission of consideration that we did not include any of these measures, but rather that we consider the exact interannual tracking of the climate reconstructions we have done to be, at most, of minor consequence to determining their usefulness. The reason for this is that it is at the scale of low-frequency information (multi-decadal to secular variation) that issues about last-millennium climate reconstructions are the most salient. This is clear from the last-millennium paleo-reconstruction literature, and the scientific debate has generally shifted towards this set of issues in regards to the MBH reconstruction in particular, as demonstrated by the attention given to the von Storch et al. (2004) and Moberg et al. (2005) examinations of it. In mss. #3321 itself, we address the primary issue of whether or not the early 15th century can reasonably be considered anything like the later 20th century in terms on N. Hemisphere average surface temperature. Individual years are not at issue here, but rather averages on the order of 2-5 decades. We also address the low-frequency amplitude issues raised by von Storch et al. and Moberg et al., in recognition of their importance.

The measures we use, RE and deviation from the mean of the verification period, are specifically included to account for this consideration. RE, by design, picks up a combination of both high and low frequency information in the independent verification period (explained below), and the deviation of the reconstructed verification-period mean from its instrumental counterpart picks up the lowest frequency information possible, at the scale of the entire verification period (1854-1901). We believe that the combination of these two measures is appropriate to characterize the reconstructions for the primary task of discerning long-term deviations from the calibration-period (1902-1980) mean, which is the heart of the matter for last-millennium reconstructions.

Consequences of Focusing on High Frequency-only Measures
If we were to employ high-frequency-only measures, our primary conclusion concerning the trajectory of N. Hemisphere temperature over the 600 years would not be fundamentally altered. That is, the results presented by McIntyre and McKitrick, which we find to be without merit based on RE and deviation from verification period mean, would still remain without merit. None of these results would be altered into significance by the use of high-frequency-only measures, thus the MM "correction" to MBH that the early 15th century was at least as warm as the late 20th century would be refuted in any case. What could possibly change is that some of the MBH "segments" (based on varying richnesses over time of the proxy data) and some of the WA scenarios we present might not pass verification significance testing at the highest-frequency domain. If one wanted to use this frequency domain as the primary gauge of significance (which we argue, as above, is not at the heart of the matter), then the most impact such consideration would have would be to make moot the reconstruction scenarios thus judged. In such a process, some information that is demonstrably valid at lower frequencies could be lost, but no new information would be added.

An analogy to the frequency spectra of musical instruments is apt in this regard. A violin and flute playing A440 are both producing sound pressure waves with a fundamental frequency of 440 cycles per second. Although they are playing the same note, what allows us to readily detect that two quite different instruments are being played is the sonic energy being produced at higher frequencies by each instrument (called "harmonics", or more generally "overtones"). The energy and frequency spectra of these higher-frequency components of the whole sound differ for families of instruments, and indeed for each individual instrument. Using high-frequency-only measures of merit as the final arbiters in validating climate reconstructions would be analogous to using only the overtones to characterize the sound being produced by different instruments. Doing so would allow us to determine which instrument is being played, but would lose the information of what notes they are actually playing! In climate reconstruction, such a process would involve losing trend information in relation to a standard (typically the calibration period mean), but would focus on year-to-year fidelity. It is exactly this result that use of the statistics requested as primary validation criteria would entail.

Conclusion
Based on these considerations, we believe that the measures of merit we have reported in mss. #3321 are appropriate to validation at the frequency domains that are salient in last-millennium climate reconstruction of hemispheric/global averages. That high-frequency information has, at least some, relevance we do not argue, but we do strongly argue against using high-frequency-only measures as final arbiters of significance. To do so could result in throwing out demonstrably valid decadal/multi-decadal information, which we believe is a scientifically inappropriate waste of information.
Considerations Concerning the Requested Statistics and the RE Statistic
All of the requested statistics we have examined (the first four) isolate high-frequency (interannual) information on reconstruction performance. Each of the four measures is evaluated in this regard here. The RE statistic is evaluated in relation to the other statistics in (5).

1) The product moment correlation coefficient (r) Any arbitrary offset in the means of the series being compared leaves (r) completely unchanged, meaning that it can have either low or high values that are entirely unrelated to the low-frequency performance of the reconstructions.

2) The coefficient of efficiency (CE) In the case of CE, a related issue arises in that, by design, the mean of the period being examined is the standard against which deviations in the instrumental values are calculated (cf. (5) below). Thus, CE is incapable of measuring the ability of the reconstructions to detect changes in mean behavior (in relation to the calibration period mean) of the instrumental data being used for verification.

3) The sign test This test is, again by construction, a high-frequency-only statistic. It measures only year-to-year changes in the direction of sign of the reconstructions in relation to those of the actual values.

4) The product means test The product means test can also be a test that is insensitive to detection of changes in climate average behavior–depending on the mean values used for calculating the "cross-products of the actual and estimated yearly departures from their respective mean values" (Cook et al., 1994). If these means are both over the verification period, then again, this statistic is a high-frequency-only measure. It is this use that we expect from the context of the Cook et al. explanation.

5) The reduction of error statistic (RE) RE is identical to CE, with one exception. For a given period of interest, both subtract from one the ratio of the sum of squared residuals of reconstruction to the sum of squared deviations of the instrumental values from their mean. In the case of CE, as mentioned in (2), the mean for the instrumental values during the verification period is the verification-period mean itself. In the case of RE, the mean for the instrumental values during the verification period is the calibration-period mean. This difference allows the RE of verification to detect as useful information changes in the mean of the reconstructed values from the calibration-period mean. RE rewards this detection, and thus it can register as a valid reconstruction one that does lose some high frequency fidelity in the verification period, but which retains useful low-frequency fidelity in relation to offsets from the calibration period mean. Cook et al. discuss this "odd behavior" that a high-frequency test (they mention r2) can show poorer performance than RE in such a situation. However, this discussion is concerned with ensuring that high frequency reconstruction fidelity is the target of interest; conversely–and most importantly–the detection of differences of mean between the calibration and verification periods is not considered as a target of examination.

Concerning whether RE is "at issue" The requester mentions that the RE statistic is at issue, a claim that Dr. Ammann and I have shown is made moot by the results of our indirect tests in ms #3321. In addition, Dr. Ammann and I have shown in other material referenced in mss. #3321 that the analysis of McIntyre and McKitrick in GRL (2005)–which claims RE significance levels are improperly determined by Mann, Bradley, Hughes–is itself deeply flawed. Thus, the argument in the request is incorrectly put in this regard, and it also ignores that we do use an entirely separate statistic–the deviation from verification period mean. [my bold]
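The RE/CE distinction drawn in (2) and (5) above is easy to see numerically. Here is a minimal sketch in Python, using synthetic series rather than the MBH or Wahl-Ammann data (numpy assumed; the helper names are mine):

```python
import numpy as np

def re_stat(obs, recon, calib_mean):
    # RE: the reference level is the calibration-period mean
    return 1.0 - np.sum((obs - recon) ** 2) / np.sum((obs - calib_mean) ** 2)

def ce_stat(obs, recon):
    # CE: the reference level is the verification-period mean itself
    return 1.0 - np.sum((obs - recon) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

# Synthetic verification period (48 "years", like 1854-1901): observations
# sit 0.5 units below a calibration-period mean of zero, and the
# "reconstruction" is a flat line at that shifted mean, i.e. it captures
# the change of mean but has no year-to-year skill at all.
rng = np.random.default_rng(0)
calib_mean = 0.0
obs = -0.5 + 0.1 * rng.standard_normal(48)
recon = np.full(48, -0.5)

re_val = re_stat(obs, recon, calib_mean)  # high: rewards capturing the mean shift
ce_val = ce_stat(obs, recon)              # at or below zero: no interannual skill
print(re_val, ce_val)
```

A reconstruction that captures nothing but the change of mean scores a high RE and a CE at or below zero; that asymmetry is exactly what the authors rely on and what the requested statistics would have probed.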

June 10 Schneider to SM (Cover Letter for A&W Response of June 10, 2005)

Dear Dr. McIntyre,
With regard to your request, authors Eugene Wahl and Caspar Ammann claim (see attached) that much of the data you have requested can be derived from information they have already given and argue that high-frequency results are not what is in debate in most of the literature. In fact, I wonder, given Ed Lorenz’s classical contributions on unpredictability of weather, which leads to stochasticity of high-frequency climate, what could we learn at an interannual time scale about longer term issues like the multi-decade averaged paleo-climatic temperature reconstruction that has been in dispute?

In any case, if you have strong arguments to the contrary, of course, I will be happy to receive them and pass them on to the authors.

In addition, given the nature of this issue, it is not unlikely that there will be unresolved methodological and philosophical differences among reviewers and between reviewers and authors on this topic. If, as I suspect, that turns out to be the case, then the usual practice at Climatic Change, when there is no closure between some reviewers and authors, is to commission “springboard” editorials that openly raise these issues of dispute, so that the broad interdisciplinary readership of Climatic Change can be better enlightened on what is technical and what is paradigmatic disagreement.

But, it is premature to predict an impasse between reviewers and authors until all reviews are in and the authors’ revision is resubmitted.

Thank you for your efforts as a reviewer.
Sincerely,
Stephen H. Schneider

Editor

June 15 SM to Climatic Change

Dear Katarina,
Could you please send me a pdf of the following publication. I would like to refer to it in order to respond to the recent letter from Wahl and Ammann in connection with my review of their CC submission. Dr. Schneider should have a copy of the article around (he cited it in a recent presentation in England). If not, could you obtain it from the authors.

At their website, they say the following in connection with this study: "This result indicates that modern-period validations of reconstructions based on relatively poor-quality proxies can give a strongly false sense of security about the likely long-term reliability of these reconstructions." Thus, it bears strongly on their current argument for refusing to produce verification statistics that I believe to be relevant.
Regards, Steve McIntyre

June 15, 2005 Ammann Response to June 15 Request

Dear Editors,
This request for an additional manuscript is rather puzzling to me. It appears highly unusual that a reviewer would be requesting through an editor material that is (a) not mentioned anywhere in the manuscript and (b) not at all relevant to the research contained in the paper under review. The submitted manuscript, including the online distribution of the reconstruction code, seems quite sufficient for performing a review.

The requested paper is completely irrelevant for the review because it is based on Climate Model data, studies a few single isolated grid points, and focuses on high-frequency interannual climate variability. All three of these issues are not under consideration in the Climatic Change manuscript where the aim is to introduce an open-source code to redo Mann-Bradley-Hughes (MBH) within its own framework and evaluate a number of recently raised criticisms that concern century scale hemispheric climate. Neither the criticisms nor our evaluation addressing them question the fundamental assumptions underlying MBH. This is clearly stated in our submission. There is no element in the thrust of the manuscript that the reviewer is considering that has any link to the mentioned paper on Stationarity and Fidelity of ENSO using climate model data.

After brief consultation with E. Wahl, I politely decline this request and would ask the reviewer in question to get in touch with us if he or she is interested in this science unrelated to the manuscript under consideration.
Best regards,
Caspar Ammann

June 22, 2005 SM to Climatic Change (no answer)

Dear Dr. Schneider,
In your letter of June 10, 2005, you suggest that I, in my capacity as a reviewer, should carry out computer runs in order to obtain the data that I requested from Wahl and Ammann. You stated as follows:

With regard to your request, authors Eugene Wahl and Caspar Ammann claim (see attached) that much of the data you have requested can be derived from information they have already given.

Wahl and Ammann’s exact words were:

We also would like to emphasize that the purpose of making our code and data sets available (cf. http://www.cgd.ucar.edu/ccr/ammann/millennium/CODES_MBH.html ) is to facilitate examination of the MBH reconstruction and the other scenarios we examine. Of course, the reviewer is free to use these tools to calculate these statistics him/herself.

First, the website in question only contains information on one scenario. It does not include information on the other scenarios examined. These are promised after publication in Climatic Change. So the data sets involved in the “other scenarios” are, as a matter of fact, not available.

Second, the availability of “much of the data” is no substitute for the availability of all the data. In my experience, it is usually the data that is hardest to obtain that is most likely to prove problematic.

Third, availability of source code does not affect the authors’ responsibility to provide requested data and results. The availability of source code is very important for verification and replication, but it is not a substitute for provision of important and standard statistics as calculated by the authors.

Finally, last year, with respect to another paper, you explicitly took the view that Climatic Change reviewers were not expected to run source code. Your words were:

Reviewers are not expected to rerun authors’ codes. (Jan. 25, 2004)
[it] is not generally a reviewer responsibility to perform replication analyses–as a practical matter we’d have precious few pro bono reviewers if each were required to perform replication work on complex codes–theirs or anyone else’s. (Feb. 19, 2004)

Wahl and Ammann may not be aware of this CC policy, but I find it particularly odd that you should have adopted the position of your recent letter. I request that you re-consider your decision in the light of the opposite position that you took last year.

Thank you for your consideration on this matter. I also find both the reasons for the refusal by Wahl and Ammann to provide the requested results and the refusal itself to be very unacceptable. I will send you comments on this in a separate email.

Regards, Steve McIntyre

June 22, 2005 SM to Climatic Change (no answer)

The response letter from Wahl and Ammann states that they "have shown in other material referenced in mss. #3321 that the analysis of McIntyre and McKitrick in GRL (2005)–which claims RE significance levels are improperly determined by Mann, Bradley, Hughes–is itself deeply flawed."

This "other material" is not on the present record. 1) Could you ask them to briefly summarize the flaws in the analysis of RE significance levels that they are referring to here. 2) if the other material is unpublished, could you ask them to provide a copy of the other material so referenced. 3) could you find out from them the approximate anticipated publication date of the other material?

Regards, Steve McIntyre

July 7, 2005 SM to Schneider

Dear Dr Schneider and Ms Kivel,
I have not heard back from you on my most recent correspondence. Proceeding on the information available to me, I have enclosed my review of the Wahl and Ammann submission, which certainly took more time than I would have wished.

However, I have benefited from reading many interesting articles in Climatic Change and am happy to assist with any reviewing where you think that my assistance will be of benefit to CC.
Regards, Steve McIntyre

July 8, Clim Chg to SM

Dear Dr. McIntyre,
Steve Schneider has asked me to let you know that he appreciates your reviewing the manuscript by Wahl and Ammann in a timely manner. But since it is important that the process be as thorough as possible, please let us know if you need more time to prepare a revised version of your review based on this response from the authors to your most recent request, or if you are satisfied with your review as is.

We look forward to hearing from you.

Regards,
Katarina Kivel

_____________________________________________________

RESPONSE FROM EUGENE WAHL AND CASPAR AMMANN

The attached article text is in response to the request from the reviewer received June 30, 2005. It is the full text of an article submitted by Caspar Ammann and myself to GRL, which was declined. The decline decision was not for technical reasons, but because GRL had several comments on the same initial paper by McIntyre and McKitrick (2005), of which ours was one, and the editor chose to decline for reasons of repetitiveness. We disagree with this decision from an editorial policy standpoint; however, we are planning to submit this text to another journal besides GRL. What is attached is exactly the text to which we refer in mss 3321. The attachment and this paragraph together provide a full response to the reviewer’s questions 2 and 3.

In response to the reviewer’s question 1, in the quote from our response letter to the reviewer’s earlier request for additional information (response dated June 10) we are actually not commenting on an analysis of the significance of RE levels as such. Rather, we are highlighting that the analysis in McIntyre and McKitrick (2005)–on which the question regarding RE significance levels is itself based–is flawed. We demonstrate these flaws in the attached text, which as mentioned is in the process of being re-submitted (the fundamental scientific content will be unchanged on re-submission). The purpose of mentioning these flaws is to show that the McIntyre and McKitrick article, which is used as a basis for justifying higher RE levels for significance than those commonly used in dendroclimatology and by MBH, is itself at issue for being conceptually inaccurate, and thus cannot be a strong basis on which to question standard analyses of RE significance.

July 8 SM to Clim Chg

No, it is unnecessary to change anything. Regards, Steve McIntyre

I find their continued characterisation of r^2 as a “high-frequency” statistic really irritating. Yes, it excludes the mean, and for a finite-length series (e.g. 45 years) this effectively reduces its response to the very low frequency components, but it is fully responsive at a frequency of 0.02 cycles per year, and still 50% responsive at a frequency of 0.01 cycles per year. It still produces a substantial response to centennial-scale variations! What is more disturbing is that at frequencies below this (say 200 years per cycle and below), there is clearly huge scope for spurious relationships considering the proximity of the calibration period.

On a slight side note, I assume the RE statistics are produced by setting the y-axis zero to the 1902-1980 training-period average temperature. One of the less attractive things about RE is that the relative merit of the mean is determined by the relationship between the standard deviation of the data and the mean of the data set. Greater weight is applied to whichever dominates. If you use the period (say) 1960-1980 to zero the temperature anomaly, as opposed to (say) 1880-1900, you get a very different score for RE. It shouldn’t (see below?***) affect the significance of the result, but it makes interpreting RE just by looking at a figure rather more tricky, especially for temperature anomalies where the average is positioned somewhat arbitrarily.

It has crossed my mind that I could come up with some absurd billion-to-one against examples of where RE values are anomalous, by messing about with the reference standard deviation to mean ratio, just as they have done with r^2. But I think it would be wise not to stoop to that level: such an analysis of a statistical skill score seems to me to be a somewhat pathological pursuit.

(*** not so sure about this, thinking about it. Because changing the mean would change the relative importance of getting the mean or the standard deviation right, perhaps it is possible to bring a score in or out of significance by such an approach?)
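A toy calculation makes the baseline point concrete. This is a synthetic sketch (numpy assumed, numbers invented, not the actual MBH series): the same reconstruction errors score very differently depending on where the anomaly zero, i.e. the calibration-period mean, is placed.

```python
import numpy as np

def re_stat(obs, recon, calib_mean):
    # RE with the reference level set to the calibration-period mean
    return 1.0 - np.sum((obs - recon) ** 2) / np.sum((obs - calib_mean) ** 2)

rng = np.random.default_rng(1)
obs = 0.1 * rng.standard_normal(48)           # verification-period "observations"
recon = obs + 0.1 * rng.standard_normal(48)   # a reconstruction with some error

# Identical residuals, two choices of anomaly zero:
re_near = re_stat(obs, recon, calib_mean=0.0)  # baseline near the verification mean
re_far = re_stat(obs, recon, calib_mean=0.5)   # baseline offset by 0.5 units
print(re_near, re_far)
```

The second score is far higher purely because the offset inflates the denominator, which is why reading RE off a figure depends on where the anomaly zero was placed.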

The letter was from an editorial assistant, so I wouldn’t have concerned myself about address style, as I had many other matters on my mind. Usually the issue doesn’t arise. I was Mr McIntyre for the NAS panel, if you’re worried about it, and they felt that my comments were worth hearing. That was our last correspondence with CC. What do you think is explained here?

Spence, if you divide any of the classic spurious regressions in half into a “calibration” and “verification” period – and remember that these guys have already peeked at the data – you will get a “significant” RE.
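A minimal illustration of this, again with synthetic data and numpy assumed: two unrelated noisy series that merely share a drift, calibrated on the first half and "verified" on the second, yield a strongly positive RE.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
t = np.arange(n)
# Two unrelated noisy series that happen to share a drift
# (the classic spurious-regression setup):
proxy = 0.01 * t + 0.05 * rng.standard_normal(n)
temp = 0.01 * t + 0.05 * rng.standard_normal(n)

# Calibrate a linear fit on the first half, verify on the second:
slope, intercept = np.polyfit(proxy[:50], temp[:50], 1)
recon = slope * proxy[50:] + intercept
obs = temp[50:]
calib_mean = temp[:50].mean()

re_val = 1.0 - np.sum((obs - recon) ** 2) / np.sum((obs - calib_mean) ** 2)
print(re_val)   # strongly positive despite no causal relationship
```

Any statistic whose denominator is anchored to the calibration-period mean will reward a continued drift, relationship or not.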

Schneider had gone on record as being very critical of E&E for not allowing Mann to review our original article. When Schneider asked me to review a submission by MBH to Climatic Change in 2004, I presumed that it was to be consistent with his objections to E&E. And, in fairness to him, he did give me an opportunity to review the MBH submission, which was never published. (That did not prevent Mann from citing it in Jones and Mann [2004] as supposedly trashing us; Jones and Mann [2004] was then used as authority in Rutherford et al [2005] that we had supposedly been trashed, but the original MBH article was never published.)

When Ammann and Wahl came along, I assume that Schneider had two motives: (1) to be consistent with the stated CC policy mentioned above; (2) that I’d done a good review on the MBH submission.

It never occurred to me that titles were relevant to this. As you say, I had nothing to do with causing the error. However, I will correct the matter when I correspond again with CC. But I’d be very surprised if titles had anything to do with it. If he was worried about it, McKitrick would have been happy to provide additional review. In terms of competence, I was obviously highly qualified to provide a review, as you will undoubtedly agree when I post up the actual review.

True, my knee-jerk response to their over-analysis of r^2 is to over-analyse RE, which probably just muddies the water even more than they have managed on their own. The reality is, as ever, pretty simple. Under these circumstances, the RE statistic is clearly not sufficient.

Steve B, let me ask you a question. Look at the sentence towards the end of the June 10, 2005 response by Ammann and Wahl in which they say that the material referenced [their GRL submission] supposedly trashes us. On June 6, 2005, GRL had rejected the Ammann and Wahl submission. Do you have any problems with what Ammann and Wahl said here?

What is the reason you posted all these communications? As a visitor here, I think you’ve told the tale of your tribulations well by referencing them. You’ve convinced me of your difficulties, and now they have been validated by the NAS activities and the lack, so far at least, of effective responses to your positions, findings and questions. This seems like overkill to me.

There are really a lot of misrepresentations in Ammann and Wahl. I tried to deal with them in the review, which I’ll post up tomorrow, but, as you will see, the review was mostly ignored. The trouble so far is that the misrepresentations get repeated. UCAR issued a press release saying that Ammann and Wahl had shown that all our claims were “unfounded”, which was repeated by Houghton to a Senate committee and by Mann and others to the Barton committee and now to the NAS panel. Ammann and Wahl is cited in IPCC 4AR as completely trashing us.

Unfortunately if I don’t pick all the spitballs off the wall one by one, they get repeated. I also want to emphasize that problems with Ammann and Wahl were brought up long ago and just ignored. NAS isn’t going to get into this type of thing. I’m particularly irritated with Ammann as I personally discussed the problem of misrepresentations with him and he essentially spit in my eye. It may be overkill, but indulge me a little.

Re #12: Maybe I’m missing something, but all they mention in the letter is material referenced in their CC submission. If they had said material “submitted to GRL” instead that would have been objectionable. If it was rejected for reasons of too much competition rather than poor quality I don’t see that it would have been especially relevant to mention that in the June 10th letter. Do you have any reason to suspect the reference in the CC submission wasn’t corrected in a timely manner?

Re #9: jae, do you think everyone who catches you in pointless fibs is pompous? (Remember “Greenland must have been greener in the MWP because of the name” and “Norse farms have appeared from under retreating Greenland glaciers”?)

In their letter refusing to provide verification statistics, Ammann and Wahl said:

In addition, Dr. Ammann and I have shown in other material referenced in mss. #3321 that the analysis of McIntrye and McKitrick in GRL (2005)–which claims RE significance levels are improperly determined by Mann, Bradley, Hughes–is itself deeply flawed. Thus, the argument in the request is incorrectly put in this regard,

The only "material referenced in mss #3221" by Ammann and Wahl is "Ammann, C.M. and E.R. Wahl, ‘Comment on “Hockey sticks, principal components, and spurious significance” by S. McIntyre and R. McKitrick’, in submission to Geophysical Research Letters)." So they are referring to their rejected GRL submission, which, further, contained no discussion of the RE statistic, much less a demonstration that our analysis of it was flawed. So it was the GRL submission and I’m glad that you agree that their use of this was objectionable.

But I’m not objecting to the accuracy of the reference in the paper, for goodness sake. I’m objecting to its use in the Response Letter, as justification for withholding adverse results, as though it were still alive. It was not “in submission to GRL” on June 10. And not just to the use, but to the misrepresentation of the contents of the paper. If you tried to do that to a securities commission or to an auditor, you’d be roasted. If someone tried to trick you like that in business, you’d be a fool to do business with them. The old and wise saying: if you lie down with dogs, you catch fleas. How could you trust them? Obviously they were a bit unlucky – what were the chances that I would have been involved in both processes? But they tried to get cute and got caught. You simply can’t try to trick people that way.

As to corrections, absolutely they did not correct in a timely manner. They took a risk by announcing their submission to GRL. Having done so, when it was rejected by GRL, they had an obligation to announce the rejection as well. You’d have to do that in mining promotions and I don’t see why UCAR should have lower standards than mining promotions.

Something similar happened in Jones and Mann 2004. They cited MBH submitted to Climatic Change and did not issue a correction when the MBH submission was rejected.

Judging from what I have learned here, I think this is how peer review works:

1) An author or authors submit a paper to a journal for publication.
2) It’s given to a couple of people in the field to review.
3) If the reviews are positive, the paper is published.
4) If the reviews are negative, the review task is reassigned to some different peers.
5) Go to step 3.

People like Steve McIntyre, Ross McKitrick, Louis Hissink and others who work in fields like mining and economics have to prove that their statistical analyses are as solid as possible, because if they don’t investors will not put money in, the company might lose money, etc. As such, they know a lot about verification, how to avoid statistical fallacies (e.g. cherry picking biases) and how to make the strongest effort possible to guarantee the outcome is as accurate as possible.

They are asking for climate statisticians like Mann, Bradley, Hughes, Ammann, Wahl, etc. to use similarly exhaustive methods in order to make sure their results are valid, as they feel that, given the serious policy implications of these findings, as much effort should be taken to validate climate studies as mining studies or economic projections.

They seem to be resisting these efforts. I don’t think that necessarily makes them dishonest. However, their methods are not fully above board (for example, the way that they refer to Steve’s work as being “discredited” without actually providing the details of how they have arrived at that conclusion). I think it reflects badly upon them.

I don’t think we’re the only ones who feel this way either. Steve reports that a number of scientists back up his requests for reproducibility and auditing of studies. Personally, I’m not going to assign motives, I think we can all come to our own conclusions why some people are being “difficult”.

RE: #7, 9
Steve B. was asking a legitimate question, even if it was minor and pretty tangential to the main subject being discussed; but that happens quite a bit around here. So in this case I think you should cut him some slack. Generally I find some truth in the notion that Steve M. gets snubbed because he is not a PhD.

Re: #17
Steve B., if you are a supporter of the idea that only published work counts, or at least that you can only quote published work, then the idea that someone quoted work that they knew had been rejected for publication, should bother you. Now maybe you think that it’s OK to quote unpublished work in reference to other work you are attempting to get published, and if that’s so, then I can see why you have no problem with Ammann doing just that. Otherwise I don’t understand your reluctance to admit that what Ammann did was unsettling, to say the least, and perhaps outright dishonest. What say you?

Anyone who deserves a PhD knows that the degree is not a necessary requirement for contributing to knowledge. Few, if any, of these guys have a doctorate in “climatology” (whatever that is), and they do not necessarily have any more authority than anyone else to discuss ALL topics included under the “climatology” field. Their PhD dissertation and research dealt with one tiny aspect of some scientific discipline, like the number of whiskers on catfish, and the degree may make them a “world-class expert” on that little area, but not necessarily on any other area. Probably none of the “climatologists” know as much about statistics as Steve M. There are far too many supercilious, arrogant PhDs in the world who think the degree confers some type of general superiority. It doesn’t.

If you publish a paper, it has to stand alone, with all the references. For this reason, you cannot publish a paper where you rely upon results that are unpublished, or “in press, but I can’t tell you where, and they might not really be accepted”.

It is very difficult to submit a paper whose conclusions rely on material in another paper, which may or may not be accepted. Referees are allowed to come to a conclusion on both papers, and to reject the paper on the basis that the supporting paper is not sound.

In this context, misrepresenting a paper as accepted, when it is not, clearly represents a standard of behaviour which requires no scientific training to interpret.

Have to concur with Per (#28) that a paper has to stand alone. Given A&W are claiming that it was space pressure rather than content which saw their paper rejected, why haven’t they published on-line? As it is, there is a load of cross-referencing to this vital critique which muddied the water but where’s the beef?

#19 If I’m not mistaken, in their response to your review, Ammann and Wahl cited a manuscript that had been submitted and that they knew to have been rejected. Is that right? If so, it is doubly unusual. First, it would be unusual to cite a manuscript that has already been rejected. They claim the rejection was because it was repetitive, but how would any reviewer know that? The result is that the reviewer has no possibility of checking arguments within the cited work. In the second place, common practice is that responses to reviewers should be self-contained. That is, any argument made in reply must be complete within the body of the reply. It would be OK to cite a previously reviewed and published, and thus available, paper. It is *not* OK to both cite a rejected paper and to not reproduce in full the analysis that refutes a reviewer’s criticism.

I also find it unusual that Ammann and Wahl declined your request for a pdf of “Stationarity and Fidelity of Simulated El Niño-Southern Oscillation…” Their answer presumes their own full knowledge of all possible significance of the paper in question. However, they could not have known what the reviewer had in mind — some significance that might have escaped them. In my experience, what a reviewer wants, a reviewer gets.

A third aspect I find strange is Schneider discouraging you from trying to reproduce calculational aspects of the work. His only concern should have been that you do so within the stated review period of 4 weeks or so. With that, he should have both encouraged you to proceed, and thanked you for being so thorough. This sort of behavior has been my experience with editors when I have recalculated some results of papers I’ve had under review.

The only avenue an author has during a contentious review is to either show that the reviewer is wrong or being argumentative for its own sake, or else to withdraw the manuscript and re-submit elsewhere. In the former case, an author would have to explicitly show the reviewer’s arguments are unfounded; either by reference to other published work or else by some detailed and definitive analysis presented in reply. I have at times replied to reviewers with several figures embedded in the response, made just for that reply, along with multiple pages of detailed analysis.

In short, Steve, the process and exchanges you experienced in reviewing Ammann and Wahl’s paper look very strange to me. Schneider let them get away with things he should not have allowed, which I think is what permitted the review to go as it did. Schneider became part of the process rather than staying above it. Without a fully satisfactory response to a reviewer’s strong criticism, it is the job of the editor to reject the paper, not to publish anyway with an editorial follow-up. It is certainly not the job of an editor to act in a partisan manner, supporting the refusal of authors to provide information a reviewer cogently requests. My only caveat is that you should have explained why you wanted that information. Perhaps you did. If so, then any rationale for refusal goes away.

I suspect that there could be some difficulty, and I may sympathise with the editor to some extent.

It is the editor’s job to look after the peer-review process. There will be various referees, and SM will obviously have a subjective viewpoint, since his work is being criticised. The editor must look at all the referees’ comments, and must master the arguments to be able to come to a sensible view about how to proceed, which he can then pass on to the authors.

Part of the issue is that you may have to persuade the editor of your viewpoint. A&W provided a lot of argument, and a very simple analogy, for the editor to follow on the issue of R2 vs RE. There were even three graphs to make a point! Even if this is palpable rubbish, you still have to explain to the editor why it is rubbish, and this is especially true if the editor’s expertise does not cover the area of the manuscript, because they are then more easily swayed by seductive argumentation.

If this boils down to an editor/ referee making judgements outside their specialist area of knowledge, it won’t be the first time. They are often dependent on their referees, and what they say. Having said that, even I can see that there is quite a bit of weirdness in there.
cheers
per

One place to see Mann’s (2003) two-millennium hockey stick is at http://stephenschneider.stanford.edu/. I first saw the hockey stick on a Nova program on PBS five years ago, and Schneider’s bio says he consults for Nova, so he probably had a hand in that program also. He has made up his mind.

Re:#19, #27.
I’ve got to agree with Pat Frank (#30) that “in the second place, common practice is that responses to reviewers should be self-contained.”
However, it seems that the problem began earlier, when AW did not supply a copy of all non-published references (i.e. their GRL submission) along with their manuscript for review. This is an absolute requirement when submitting a manuscript for peer review in all the journal processes I’ve seen. This makes it doubly “unusual” that they did not submit a copy along with their response-to-reviewer.
Their attempt to justify that RE was not at issue might be ascribable to their “lack of skill,” except that, besides misrepresenting the contents of their unpublished GRL paper, they didn’t even explicitly reference it in their June 10 response. I can’t imagine a serious scientist writing in a point-by-point detailed response that a point is justified by “…other material referenced in mss. #3321” unless they are deliberately trying to hide the identity of that “other material.”

I also strongly agree with per’s comments in #31, especially “Even if this is palpable rubbish, you still have to explain to the editor why this is rubbish,…” Note the simplistic examples that AW give to show how r2 can go wrong.

#31, 34. This wasn’t in the version that we commented on. If you look at our actual review posted up today, I don’t think that we left much out. Here I merely posted up the introductory correspondence.

Steve M,
I don’t seriously think that you were removed as a reviewer because you don’t have a PhD. More likely is that you were asking questions that they felt were too tough and/or didn’t want to answer. I just thought that Steve B’s idea had some merit and should not be dismissed out of hand.

1) The high frequency component of the actual climate is real. 2) The high frequency component is of the same magnitude as the low frequency component.

Even though the various hockey stick graphs span a much longer range (1000–2000 years), they are also reporting a temperature variation of almost the same range (0.6–1 °C). The satellite vs. surface measurement debate also illustrates a parallel debate in the climate research community concerning the difficulties of quantifying low frequency drift. If the high frequency components of the satellite and surface measurements didn’t match, then there would be a completely different debate. At least one of the two would have been considered to be seriously wrong.

Am I correct that the tree ring proxies have trouble with short term correlations? If the short term bumps and dips in the graphs can be ignored, it’s more believable that spurious correlations might occur. Is there something wrong with my comparison?

Re #38
Leaving aside the whole issue of whether tree rings can really measure temperature reliably in a useful fashion, the whole high frequency vs. low frequency aspect of the reconstructions has only been brought up as a poor last-ditch attempt to justify ignoring r2 (not because of anything inherent in the tree rings). I don’t think tree ring proxies have any worse characteristics in the short term than in the long term.

Having read Steve’s withering review of the original WA submission, I’m absolutely amazed that it has gone to press largely unchanged. It really makes you wonder about the quality of other peer-reviewed papers in the field (actually any field).

And I don’t agree with per – I think there is plenty to find fault with in the editor’s behaviour. It’s not about arcane statistical discussion. You only need to read MBH and MM05(a&b) to realise that WA completely misrepresent MM’s papers. I doubt that a bright 12 year old would have a problem seeing that.

Steve, I hope you have an opportunity to respond to this in Climatic Change.

Given A&W are claiming that it was space pressure rather than content which saw their paper rejected, why haven’t they published on-line?

I don’t quite see that as being the main issue, John. It seems to me that they (and others) have used their unpublished work as proof that M&M should be discounted. If they had only just submitted the first paper, and were then responding to the review of the second shortly thereafter, you could understand them counting on the first [after all, no one submits a paper in the expectation it will be rejected].

However, this entire saga has played out over an extended period of time. During that time, A&W have relied on it, Mann has relied on it, Houghton has relied on it, et al.

This isn’t even the most egregious problem. What A&W said on 10 June was that they had shown [read: proven] via their submitted, but unpublished paper, that M&M was “deeply flawed”.

However, at the time they wrote this statement, they knew that their submitted paper had been rejected [this occurred on 6 June].

Bear in mind that they did this in defense of a paper submitted to Climatic Change. However, the rejected paper they were referring to as negating M&M was one submitted to GRL – two totally different papers.

So, let’s succinctly summarise what’s happened:

* A paper was submitted to GRL
* A later paper was submitted to CC
* A critical review was received in relation to the CC paper
* The GRL paper was rejected
* In response to the CC paper critical review, the authors cited the GRL paper as negating the criticism
* The authors’ assertion was not just that the GRL paper (which they already knew was rejected) supported their position, but that it effectively rebutted the criticisms of their new CC paper

Now, I would argue that this is a triple sin. The first sin, is referring to a rejected paper at all.

The second, is knowing it had been rejected, but still referring to it as supporting your point of view.

The third, is not specifically (other than by an obscure alpha-numeric reference) disclosing the fact to the journal and the reviewer that your “proof” of the criticism being unfounded was not only in a paper that had already been rejected, but in a paper which YOU had authored.

There is no possible excuse for this sequence of events. I hesitate to call things frauds, and I suspect that A&W, having a deep emotional and professional commitment to their work, found it difficult to accept the criticism and rejection of it. Nonetheless, if Steve’s account is accurate, there is no other possible explanation, except that they were disingenuous…and they lied.

It is also a serious and (I would suspect) a reviewable matter in relation to a paper staying on the published record, when an incorrect fact is used to support the case for its publication.

One point of correction, in relation to my third sin – A&W did actually identify themselves as the authors of the mss.#3321 piece, so perhaps their sins are only 2.5…although what’s 0.5 between friends?