Well, well. Look what the cat dragged in.

We seem to be having occasional success in getting things archived. CSIRO was shamed into providing the data for their Drought Report and David Stockwell has now reported on this.

Earlier this year, we reported a form of academic check kiting by Ammann and Wahl, where they had referred to Supplementary Information for key results, but failed to provide the Supplementary Information. Flaccid peer reviewers and flaccid editors at Climatic Change either didn’t notice or didn’t care. Given that RE significance had been a major issue both in the original MM articles and in the twice rejected GRL submission by Wahl and Ammann, you’d think that someone would have spent a couple of minutes checking out whether the argument in the SI actually worked. But, hey,…

The editors of Climatic Change didn’t have any information about the SI. When I contacted Caspar Ammann for the SI, he replied early this year in the typically ‘gracious’ Team style:

why would I even bother answering your questions, isn’t that just lost time?

So this became one more issue on the blog. Some readers get tired of the litany of non-compliance. Look, I get tired of the non-compliance too. Ammann’s case was particularly egregious because the article actually referred to and relied on the SI, which was then withheld. In some cases, sunshine works. CSIRO grudgingly archived their drought data and, a couple of days ago, I noticed that Ammann had grudgingly put up his Supplementary Information (without notifying me despite my outstanding request.)

I’ve been criticized for not replying to Wahl and Ammann, but, unlike, say, IPCC section authors considering this material, I actually like to be able to examine the Supplementary Information and this has only been available for a couple of weeks (and, in my case, effectively only a couple of days.)

Some of the results in this SI are simply breath-taking. I hardly know what to say or where to begin.

For now, I’ll assume that readers are aware of the issues involved in RE significance and why this has been a battleground issue. MBH asserted that an RE statistic of 0.0 was “99% significant”. In MM2005a (GRL) and MM2005c (GRL-Reply to Huybers), we observed that very high RE statistics could be thrown up merely by red noise handled in MBH style, noting a 99th percentile of 0.54 in our Reply to Huybers (which improved on the simulations of MM2005a, where the issue was first raised). We didn’t argue that these particular benchmarks were written in stone, merely that “great caution” needed to be used in interpreting RE statistics – a point that I’ve amplified recently by observing the very high RE statistics associated with classical spurious regressions.
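To pin the statistic down for readers: the RE (Reduction of Error) compares the squared error of a reconstruction against the squared error of simply using the calibration-period mean. A minimal sketch (in Python rather than the R used in the scripts discussed below; the function and variable names are mine, not from any of the papers):

```python
import numpy as np

def reduction_of_error(obs, pred, cal_mean):
    """RE = 1 - SSE(reconstruction) / SSE(calibration-period mean).

    RE = 1 for a perfect reconstruction; RE = 0 if the reconstruction does
    no better than the calibration mean; RE < 0 if it does worse.
    """
    sse = np.sum((obs - pred) ** 2)
    sse_clim = np.sum((obs - cal_mean) ** 2)
    return 1.0 - sse / sse_clim

# A perfect prediction scores 1; predicting the calibration mean scores 0.
obs = np.array([0.1, -0.2, 0.3, 0.0])
print(reduction_of_error(obs, obs, obs.mean()))                      # 1.0
print(reduction_of_error(obs, np.full(4, obs.mean()), obs.mean()))   # 0.0
```

The fight over “RE significance” is then about the distribution of this number when the inputs are red noise rather than climate signal.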

Both Ammann and Wahl and Wahl and Ammann cite a benchmark of 0.0. I’ve been keenly awaiting their “proof” of this benchmark. Ammann and Wahl had said:

Individually-established thresholds are not necessary for verification significance evaluations based directly on the characteristics of a target temperature series, such as Northern Hemisphere average surface temperature. For these cases, an RE threshold of zero can still be considered appropriate.

A typical climate science “argument” that we see so often: just re-assert your original position (and then cite it). Notwithstanding the fact that the results in our Reply to Huybers had not been rebutted in the peer reviewed literature, and that the Ammann and Wahl submission to GRL purporting to rebut those results had been rejected twice, Wahl and Ammann, relied upon by IPCC, used an RE benchmark of 0.0, a decision reported in Ammann and Wahl as follows:

MBH and WA argue for use of the of Reduction of Error (RE) metric as the most appropriate validation measure of the reconstructed Northern Hemisphere temperature within the MBH framework, because of its balance of evaluating both interannual and long-term mean reconstruction performance and its ability thereby to avoid false negative (Type II) errors based on interannual-focused measures (WA; see also below). A threshold of zero was used in these studies, above which a hemispheric reconstruction was regarded as possessing at least some skill in relation to the calibration period climatology (MBH98, Huybers 2005; WA).

Now, as I’ve observed elsewhere, MBH did not actually make the above “argument”; this is simply wishful thinking. In the running text of Wahl and Ammann, we are referred to their Appendix 2 and then, confusingly, back to the companion article Ammann and Wahl:

We consider the issue of appropriate thresholds for the RE statistic in Appendix 2; cf. Ammann and Wahl (2007, this issue) where additional examination of significance thresholds is reported.

And later in the running text of Wahl and Ammann, they again re-assert (without proof) the benchmark of zero, once again referring to their Appendix 2 and back to Ammann and Wahl, which had referred to Wahl and Ammann.

Numerically, we consider successful validation to have occurred if RE scores are positive, and failed validation to have occurred if RE scores are negative (Appendix 2; cf. more detailed discussion in Ammann and Wahl, 2007).

Elsewhere in the running text to Wahl and Ammann, the RE benchmark of zero is referred to again:

The verification RE scores for this scenario (Table 2) are only slightly above the zero value that indicates the threshold of skill in the independent verification period

Now to Wahl and Ammann Appendix 2, tantalizingly titled “Appendix 2: Benchmarking significance for the RE statistic”, where the following result is stated:

When we applied the Huybers’ variance rescaled RE calculation to our AC-correct pseudoproxy PC1s, we generated a 98.5% significance RE benchmark of 0.0. We find that the combination of AC-correct pseudoproxy PC series with the variance-rescaled RE calculation provides the most appropriate mimicking of MBH possible in this simple case of examining the potential verification skill available merely from the non-climatic persistence contained in PC1 of the ITRDB N. American data. Additional analysis of benchmarking RE significance is provided in Ammann and Wahl (2007).

As written, this is simply a re-statement of results in Huybers 2005, which pertained only to a univariate case, and which were superseded by the results in our Reply to Huybers, which is not mentioned here (a point that we raised in connection with their rejected GRL submission, which also failed to consider all the relevant literature.) Inconsistently, Ammann and Wahl appears to note that our Reply to Huybers (MM05c) contained results not available in the earlier study:

Particularly, MM05c (cf. Huybers 2005) have evaluated the extent to which random red-noise pseudoproxy series can generate spurious verification significance when propagated through the MBH reconstruction algorithm.

In their July 2008 Supplementary Information, they have commendably archived RE statistics for many different simulations, including separate simulations for each of the 11 MBH98 steps plus some other cases. However, these results are not based on univariate testing (in the style of Huybers 2005 and MM2005a, early considerations of the matter), but simulate networks somewhat in the manner of MM05c (Reply to Huybers), though they inexplicably (or explicably) leave out the PC calculation step.

In each reported case, they did 1000 runs and collated calibration RE (which is close to calibration r2) and verification RE statistics. Although I’m only going to consider the AD1400 step today, here’s a script that reads these results for further analysis; the RE values for their MBH emulation are also input here. Each of the 11 items in the list WA.stat is the table of verification stats for that step.

The code for collecting quantiles from these tables is shown by Ammann here. Although the code uses the R language, it’s written in a Fortran style, with many do-loops that are totally unnecessary in R and that make it much harder to see what they are doing. One of the beauties of R is that it makes it possible to use vector approaches that cut the number of lines dramatically while making the code more transparent. Here’s their code for the calculation of quantiles. Further below, I show my re-written version in my own style, making sure that the results reconcile.
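To illustrate the contrast (my own sketch in Python, not a transcription of either version): a Fortran-style loop computes one quantile per pass, while the vectorized form does the same job in a single call, and the two reconcile exactly:

```python
import numpy as np

def quantiles_loop(x, probs):
    """Fortran-style: explicit loop, one quantile per iteration."""
    out = np.empty(len(probs))
    for i in range(len(probs)):
        out[i] = np.quantile(x, probs[i])
    return out

def quantiles_vec(x, probs):
    """Vectorized style: one call returns all the quantiles."""
    return np.quantile(x, probs)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)            # stand-in for a column of simulated RE values
probs = [0.5, 0.9, 0.95, 0.99]
print(np.allclose(quantiles_loop(x, probs), quantiles_vec(x, probs)))  # True
```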

One more preliminary. Their code considers various thresholds for a previously unheard-of statistic – the ratio of calibration RE to verification RE – but their Table S1 only reports on the case where a benchmark of 0.75 is used. We are assured that this benchmark is “conservative”, but given that this particular ratio has never, to my knowledge, been used anywhere in the statistical literature, I’m not sure on what basis it could be called “conservative”, “liberal” or anything else.

I examine their results for the AD1400 step (h=1) below, using the Table S1 calibration RE/verification RE ratio of 0.75, and set up three logical vectors representing the conditions considered by Ammann and Wahl: 1) whether the calibration RE was positive; 2) whether the cal RE/ver RE ratio was above the arbitrary benchmark of 0.75; and 3) whether the verification RE exceeded the MBH RE as calculated in the WA emulation.
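Their R script isn’t reproduced in this excerpt, but the three conditions can be sketched as logical vectors like so (a Python sketch; the column values and the MBH emulation figure are placeholders of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
cal_RE = rng.normal(0.6, 0.2, n)    # placeholder: calibration RE column of the table
ver_RE = rng.normal(0.0, 0.25, n)   # placeholder: verification RE column of the table
RE_mbh = 0.46                       # placeholder: verification RE from the WA emulation

cond_cal_pos   = cal_RE > 0                  # 1) calibration RE positive
cond_ratio     = cal_RE / ver_RE > 0.75      # 2) cal RE / ver RE ratio above 0.75
cond_beats_mbh = ver_RE > RE_mbh             # 3) verification RE exceeds the MBH emulation
```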

As a first cut, I calculated the quantiles for the verification RE statistic (column 2), where MBH and then Wahl and Ammann had advocated a 99% significance benchmark of 0.0, as opposed to the 99% benchmark of 0.54 presented in our Reply to Huybers (0.59 in MM2005a). Here’s their result: not 0.0, not 0.02, not 0.05, not 0.10, not 0.20, but 0.52. WTF? (Yes, microscopically lower than our 0.54, but they didn’t include a PC calculation step.) Even without a PC calculation step, they got extremely high RE values from simulations in an MBH style. Is this mentioned anywhere in their article? Nope. I’m speechless.

In our articles, we had reported that the MBH verification RE was at an elevated percentile, but not one that could be precluded in statistical terms. We noted that the pattern of statistics – high RE, failed verification r2 and CE – was typical of red noise situations, indicating spurious regression, and we placed weight on the pattern as much as on any individual result. Up to a few percentiles one way or another, Ammann and Wahl got exactly the same results (unreported).

Without endorsing any aspect of their analysis, it’s interesting to see how they extracted a few more squeaks from this pig to make it “99% significant”.

First, they checked whether any of the calibration RE values were negative and, if so, assigned the verification RE a value of -9999, but left the -9999 value in the calculation, pushing the MBH result a little to the “right”. In this case, there were only 7 (!) cases where the calibration RE was negative (what does that say?), and the quantiles are pushed only a little at the lower end, leaving the upper values unchanged. So this issue is completely irrelevant to the analysis and mere sleight of hand.
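The mechanics are easy to mimic (again a Python sketch with made-up numbers, not their data): replacing a handful of values with -9999 drags out the bottom of the distribution but cannot raise the upper quantiles:

```python
import numpy as np

rng = np.random.default_rng(0)
ver_RE = rng.normal(0.0, 0.25, 1000)   # placeholder verification RE values
cal_RE = np.full(1000, 0.5)
cal_RE[:7] = -0.1                      # mirror the 7 negative-calibration cases

screened = ver_RE.copy()
screened[cal_RE < 0] = -9999.0         # the substitution described above

# Replacing values with -9999 can only pull quantiles down, never up,
# so the upper benchmark is essentially untouched.
print(np.quantile(ver_RE, 0.99), np.quantile(screened, 0.99))
```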

Next, they checked the ratio of calibration RE to verification RE and, if it was under 0.75 (“conservative” – they didn’t mention whether it was “rigorous”), assigned those verification RE statistics a value of -9999. No fewer than 419 out of 1000 were consigned to this category. This yielded a 99% benchmark of 0.4837 – so even with all this tweaking, they couldn’t get a 99% benchmark of 0.0. Did they report this? Nope.

Where does the “99% significant” in Table S1 come from? They calculate the proportion of cases that meet the above conditions and also exceed their MBH emulation.
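As I read the procedure (sketched below in Python with placeholder numbers, not their data), the reported “significance level” is simply one minus the fraction of screened runs that beat the MBH emulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
cal_RE = rng.normal(0.6, 0.2, n)     # placeholder calibration RE values
ver_RE = rng.normal(0.0, 0.25, n)    # placeholder verification RE values
RE_mbh = 0.46                        # placeholder MBH-emulation verification RE

# A run "counts" only if it survives both screens AND exceeds the MBH emulation.
passes = (cal_RE > 0) & (cal_RE / ver_RE > 0.75) & (ver_RE > RE_mbh)
significance = 1.0 - passes.mean()   # the Table S1 number, on this reading
print(round(significance, 2))
```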

This result (rounded up to 0.99) is carried forward to their Table S1 which states:

Table S1: Verification RE Significance Levels (all at 0.75 as a minimum threshold for the calibration/verification RE ratio): One minus the significance level shown gives the estimated chance of committing a Type I error of falsely rejecting the null hypothesis of no significance for each scenario.

Proxy Network    RE Significance Level
1400-network     0.99
1450-network     0.96
etc….

And similarly for other steps. These results are summarized in Ammann and Wahl as follows:

Furthermore, the MM05c proxy-based threshold analysis only evaluates the verification-period RE scores, ignoring the associated calibration-period performance. However, any successful real-world verification should always be based on the presumption that the associated calibration has been meaningful as well (in this context defined as RE > 0), and that the verification skill is generally not greatly larger than the skill in calibration. When the verification threshold analysis is modified to include this real-world screening and generalized to include all proxies in each of the MBH reconstruction segments – even under the overly-conservative conditions discussed above – previous MBH/WA results can still be regarded as valid, contrary to MM05c. Ten of the eleven MBH98 reconstruction segments emulated in WA are significant above the 95% level (probability of Type I error below 5%) when using a conservative minimum calibration/verification RE ratio of 0.75, i.e. accepting poorer relative calibration performance than the lowest seen in the WA reconstructions (0.82 for the MBH 1400-network)

I try not to use adjectives but, in my opinion, the representation of these results by Ammann and Wahl (and the related announcements by UCAR) is reprehensible. Instead of refuting our analysis of RE benchmarks, they’ve confirmed it (with slight discrepancies explainable by different handling of PC series). They’ve dragged this thing out for years. It’s taken until July 2008 to see their SI. Had this SI been available in February 2006 when Wahl and Ammann was supposedly “accepted”, one wonders whether it would have been so uncritically relied upon by IPCC. Instead, they failed to provide the SI to reviewers; reviewers and editors didn’t care; they refused to provide it when initially requested; and now, when it grudgingly becomes available, the results are completely at odds with their representations.

Now is the time to write a letter to the editor of the journal based on info in the SI. Not a materials complaint, but a scholarly contribution, highlighting consistency between results in the SI and previous papers by M&M, and outlining the logical consequences.

Steve, an “Executive Summary” would be both helpful to most of your readers and a necessary item to inform those who need to be informed. Hopefully, after you have digested this early version of the situation, you will be able to write that.

As I understand it, it’s a measure of whether a prediction has statistical skill over a validation period. They claimed a threshold of zero, which isn’t supposed to happen; it would attribute skill to random numbers.

Luis: From what I understand, the importance of W&A is that the IPCC relied upon it when it wasn’t even accepted yet, it uses RE when it should have used R^2, and it referred to information that wasn’t available – rather a hidden circular reference type of thing. Then, of course, the referenced information doesn’t support the claims.

But hey.

Why would I want to give you my data, you’d just try and poke holes in my work.

Steve:
The benefits of dragging this out were twofold. First, many readers will have forgotten the original paper(s), and ‘refuting’ them will gain little traction. Second, and more importantly, the political documents built upon them have been written and sent forward and will, in many cases, become law. Getting the original paper debunked might be possible eventually, but the political damage has been done. Which is the principal reason for all the foot dragging, whether it be dendros, or paleos, or icemen or whomever.

Does any of this rise to academic misconduct? I state this as a question because I really don’t know, but if careers can be affected and publicized, that could bring serious discredit to reports using these studies and the subsequent policies relying on them. Otherwise, George M has a point: the ship has sailed.

Here is my reconstruction of these unfortunate events. Please correct me if I am wrong:

1. W&A wanted to write in AR4 that M&M 2003 had been discredited.

2. To do this they used invalid statistical methods (dissected by Steve above) which they knew perfectly well could not withstand due diligence.

3. They withheld the supplementary information that would have allowed Steve (or anybody else) to prove that their methods were invalid until after AR4 was published.

4. They submitted these results to a journal AFTER the cut off for submitting papers to the IPCC for AR4 (thus minimizing the opportunity for somebody to correct them before AR4 went to press).

5. They falsified the acceptance date of the paper in a crude attempt to permit its use in AR4.

6. They proceeded to write the following in AR4:

McIntyre and McKitrick (2003) reported that they were unable to replicate the results of Mann et al. (1998). Wahl and Ammann (2007) showed that this was a consequence of differences in the way McIntyre and McKitrick (2003) had implemented the method of Mann et al. (1998) and that the original reconstruction could be closely duplicated using the original proxy data.

The IPCC AR4 is clearly the most influential document in setting emissions policies around the world.

Each time it is used by policy makers, the question must be asked: “How reliable is the information presented in the AR4?”

The answer is routinely given as follows: “The IPCC AR4 represents the consensus of the vast majority of the world’s climate scientists. It has been vetted by multiple levels of review, and is governed by a rigorous framework ensuring that dissenting views are fairly and accurately presented. The IPCC AR4 is therefore extremely reliable.”

W&A have effectively demonstrated, in one instance, the utter falsehood of this statement.

It is my belief that, if effectively communicated to a larger academic audience, this incident could have a truly profound impact on perceptions of the IPCC’s credibility.

Let me answer questions about where this article fits in with a little analogy. Recently we discussed Li’s failed proof of the Riemann hypothesis, where I sarcastically commented that this proof would be easier in climate science, where they could merely refer to “rigorous” and “conservative” tests, without actually having to validate them. In fact, it would have been fairer to express this in terms of the “Team” rather than over-generalizing, a point that I conceded. Li’s “proof” fell apart when the following comment was made on a blog:

The “proof” is that of Theorem 7.3 page 29 in Li’s paper, but I stopped reading it when I saw that he is extending the test function h from ideles to adeles by 0 outside ideles and then using Fourier transform (see page 31).

Li promptly withdrew the paper.

Let’s compare this to Wahl and Ammann and compare how they handled the matter of whether the 99% RE benchmark was 0.0 or not. Since you’re all tired of RE statistics, we’ll substitute ideles and adeles. To impart some of the flavor of the role of IPCC deadlines in the Wahl and Ammann fiasco, let’s suppose that there was a $1 million prize for a proof of the Riemann Hypothesis accepted by December 2005 and in peer reviewed literature by the end of February 2006. I’m not sure how to analogize the role of complicit editors and the IPCC – but maybe you picture a situation where the editor of the lucky journal got $500,000 and the sponsors of the contest expect to get funding of $50 million if their initiative is successful. (I’m not suggesting such venality is at play here, I’m merely trying to describe a situation where editors and sponsors are not very interested in ensuring regulatory compliance.)

In May 2005, pseud-Wahl and pilt-Ammann announce that they’ve proved the Riemann Hypothesis, which UCAR announces in a national press release. Sir John Houghton announces this to congress. pseud-Wahl and pilt-Ammann (submitted, 2005) says:

we extend the test function h from ideles to adeles by 0 outside ideles and then use Fourier transform (see pilt-Ammann and pseud-Wahl, under review, 2005).

A peer reviewer asks:

Please show the proof for the extension of the test function h from ideles to adeles by 0 outside ideles.

pilt-Ammann and pseud-Wahl reply via the editor:

This request by the reviewer is specious. We’ve shown (in pilt-Ammann and pseud-Wahl, under review) that arguments that you cannot extend the test function h from ideles to adeles are without merit.

Unluckily for them the reviewer learns that pilt-Ammann and pseud-Wahl has been rejected and notifies the editor:

pilt-Ammann and pseud-Wahl, under review, has been rejected. If they are to utilize the argument that you can extend the test function h from ideles to adeles by 0 outside ideles, then this should be incorporated into the paper in question.

Outraged by the temerity of crypto-GRL in rejecting their submission, pilt-Ammann and pseud-Wahl resubmit to crypto-GRL, and submit a revised version of pseud-Wahl and pilt-Ammann (submitted), which the compliant editor accepts on the last day of the contest (Feb. 28, 2006). The accepted paper, now pseud-Wahl and pilt-Ammann, “in press” 2006, says:

we extend the test function h from ideles to adeles by 0 outside ideles and then use Fourier transform (see pilt-Ammann and pseud-Wahl, under review, 2006).

Two weeks later, pilt-Ammann and pseud-Wahl, (under review, 2006) is rejected. What to do? Even the compliant editor has his limits; he refuses to publish an article which, for a disputed step in the proof, cites “pilt-Ammann and pseud-Wahl, (under review, 2006)”, when that article has actually been rejected. However, if they withdraw pseud-Wahl and pilt-Ammann (“in press” 2006) in order to re-tool the rejected companion article, they won’t win the prize. So the editor and pilt-Ammann and pseud-Wahl cook up a cunning scheme. pilt-Ammann and pseud-Wahl will write a new article and use the same name, pilt-Ammann and pseud-Wahl, and submit it to the compliant editor. All the references in the prizewinning article to “pilt-Ammann and pseud-Wahl, under review” will be replaced with references to “pilt-Ammann and pseud-Wahl, 200x” so that no one at contest headquarters will be the wiser. Everyone will maintain the pretence that everything was copacetic as of Feb 28, 2006.

It takes six months to re-tool, but sure enough, six months later, in August 2006, the new article, “pilt-Ammann and pseud-Wahl, 200x” is submitted to the compliant editor. This time, the new article, “pilt-Ammann and pseud-Wahl, 200x” attributes the missing step to the Supplementary Information:

we extend the test function h from ideles to adeles by 0 outside ideles and then use Fourier transform (see Supplementary Information).

Though inconsistently, they occasionally cite the first article as authority, completing the circle:

pseud-Wahl and pilt-Ammann 200x extend the test function h from ideles to adeles by 0 outside ideles and then use Fourier transform.

Although the new article “pilt-Ammann and pseud-Wahl, 200x” cited its Supplementary Information for the step in controversy, they did not attach the Supplementary Information with the submission, which the compliant editor overlooks. None of the reviewers ask to see the Supplementary Information said to show that you can extend the test function h from ideles to adeles by 0 outside ideles. Nor does any reviewer even comment on the fact that the Supplementary Information is not available for inspection.

To everyone’s relief, the reviewers approve “pilt-Ammann and pseud-Wahl, 200x”, now pilt-Ammann and pseud-Wahl, 2007. This triggers the release from purgatory of the original prize winning article, pseud-Wahl and pilt-Ammann, 2007, which now cites the newly approved companion article as authority:

we extend the test function h from ideles to adeles by 0 outside ideles and then use Fourier transform (see pilt-Ammann and pseud-Wahl, 2007).

The sponsors of the contest are happy since they’ve already used this in their promotions. Everyone’s happy, it seems.

Until one day, the original reviewer – the one who actually asked for the proof for the extension of the test function h from ideles to adeles by 0 outside ideles – looks for the Supplementary Information. The journal says – we don’t know anything about it, talk to pilt-Ammann. So the reviewer writes pilt-Ammann, who answers haughtily:

why would I even bother answering your questions, isn’t that just lost time?

However, unfortunately, the reviewer happens to operate a popular blog and he starts raising questions: where is the Supplementary Information proving that you can extend the test function h from ideles to adeles by 0 outside ideles? So grudgingly, pilt-Ammann posts up the Supplementary Information in July 2008, over 3 years after the original submission. Unfortunately, the Supplementary Information doesn’t show that you can extend the test function h from ideles to adeles by 0 outside ideles; in fact, it shows the exact opposite.

#17 To an academic, salary without dignity is life in hell. Just focus on accuracy in science and let the chips fall where they may.
#14 It is not “huge”. It’s a tempest in a teapot. In all likelihood nothing will happen. A principled response, however, is to follow through regardless, doing the right thing whatever the consequences.

Steve: Nothing turns on this in the “big picture”. In a big enough view, nothing much matters. However on a personal basis, how many times have I had to hear that Ammann and Wahl had proved this or proved that? And nothing is what they say it is. It’s very hard to reply to this dreck, because even the most even-tempered reply cannot avoid being confrontational.

There are two shell games to watch here. “PrimePaper” asserts something without proof, referring to “SecondPaper.” “SecondPaper” keeps getting rejected… but ultimately (wayyy after the required publishing date for “PrimePaper”) is rewritten and injected into the published literature so that only minor tweaks to PrimePaper are necessary to complete the loop… with everyone hoping nobody will notice the shell game. Let alone the specious unprovable assertion used.

Our story:

In May 2005, Wally and Andy announce that they’ve proven e=mc2, which UCAR announces in a national press release. Sir John Houghton announces this to congress. As part of their proof, “PrimePaper: Wally and Andy” (submitted, 2005) says:

we assume 2 + 2 = 5 (see “SecondPaper: Andy and Wally”, under review, 2005).

A peer reviewer asks for the proof that 2 + 2 = 5. Andy and Wally reply via the editor:

This request by the reviewer is specious. We’ve shown (in “SecondPaper: Andy and Wally”, under review) that arguments that you cannot prove 2 + 2 = 5 are without merit.

Unluckily for them the reviewer learns that “SecondPaper: Andy and Wally” has been rejected and notifies the editor:

“SecondPaper: Andy and Wally”, under review, has been rejected. If they are to utilize the argument that you can assert 2 + 2 = 5, then this should be incorporated into the paper in question.

Outraged by the temerity of crypto-GRL in rejecting their submission, Andy and Wally resubmit to crypto-GRL, and submit a revised version of “PrimePaper: Wally and Andy” (submitted), which the compliant editor accepts on the last day of the contest (Feb. 28, 2006). The accepted paper, now “PrimePaper: Wally and Andy”, “in press” 2006, says:

we assume 2 + 2 = 5 (see “SecondPaper: Andy and Wally”, under review, 2006).

Two weeks later, “SecondPaper: Andy and Wally” (under review, 2006) is rejected. What to do? Even the compliant editor has his limits; he refuses to publish an article which, for a disputed step in the proof, cites “SecondPaper: Andy and Wally (under review, 2006)”, when that article has actually been rejected. However, if they withdraw “PrimePaper: Wally and Andy” (“in press” 2006) in order to re-tool the rejected companion article, they won’t win the prize. So the editor and Andy and Wally cook up a cunning scheme. Andy and Wally will write a new article and use the same name, “SecondPaper: Andy and Wally”, and submit it to the compliant editor. All the references in the prizewinning article to “SecondPaper: Andy and Wally, under review” will be replaced with references to “SecondPaper: Andy and Wally, 200x” so that no one at contest headquarters will be the wiser. Everyone will maintain the pretence that everything was copacetic as of Feb 28, 2006.

It takes six months to re-tool, but sure enough, six months later, in August 2006, the new article, “SecondPaper: Andy and Wally, 200x” is submitted to the compliant editor. This time, the new article, “SecondPaper: Andy and Wally, 200x” attributes the missing step to the Supplementary Information:

we assume 2 + 2 = 5 (see Supplementary Information).

Though inconsistently, they occasionally cite the first article as authority, completing the circle:

“SecondPaper: Andy and Wally 200x” assume 2 + 2 = 5.

Although the new article “SecondPaper: Andy and Wally, 200x” cited its Supplementary Information for the step in controversy, they did not attach the Supplementary Information with the submission, which the compliant editor overlooks. None of the reviewers ask to see the Supplementary Information said to show that you can assume 2 + 2 = 5. Nor does any reviewer even comment on the fact that the Supplementary Information is not available for inspection.

To everyone’s relief, the reviewers approve “SecondPaper: Andy and Wally, 200x”, now named “SecondPaper: Andy and Wally, 2007”. This triggers the release from purgatory of the original prize winning article, “PrimePaper: Wally and Andy, 2007”, which now cites the newly approved companion article as authority:

we assume 2 + 2 = 5 (see “SecondPaper: Andy and Wally, 2007”).

The sponsors of the contest are happy since they’ve already used this in their promotions. Everyone’s happy, it seems.

Until one day, the original reviewer – the one who actually asked for the proof of the assumption that 2 + 2 = 5 – looks for the Supplementary Information. The journal says – we don’t know anything about it, talk to Andy. So the reviewer writes Andy, who answers haughtily:

why would I even bother answering your questions, isn’t that just lost time?

However, unfortunately, the reviewer happens to operate a popular blog and he starts raising questions: where is the Supplementary Information proving that you can assume 2 + 2 = 5? So grudgingly, Andy posts up the Supplementary Information in July 2008, over 3 years after the original submission. Unfortunately, the Supplementary Information doesn’t show that you can assume 2 + 2 = 5; in fact, it shows the exact opposite.

Ammann and Wahl conclude that the highly publicized criticisms of the MBH graph are unfounded.

As you notice from this press release, UCAR timed it to coincide with one of my rare presentations, which they refer to:

Caspar Ammann, a paleoclimatologist at the National Center for Atmospheric Research (NCAR), is available to comment on the so-called hockey stick controversy discussed by Stephen McIntyre and Ross McKitrick today at the National Press Club in Washington, D.C.

Needless to say, we were asked about this and had no information on exactly how they purported to show that our results were all “unfounded”. And so over three years later, we finally see the SI. What a bunch of rascals. And IPCC – when I asked them to ask authors of unpublished papers for SI – they threatened me with expulsion. And of course Ammann sends his IPCC comments to Briffa in secret and IPCC refuses to release them – their “open and transparent” process. Buncha…

I’m mad about this because of the unmitigated mendacity.

On a more positive note, I’m thinking of a new poetry thread entitled:

I am the very model of a modern climate scientist,

I can program my computer in a style archaic and diabolical,
I can simulate in Fortran and list in order alphabetical,
I can document my articles using references real and hypothetical
[Chorus explains – in press and preparational]
In short, I am the very model of a modern climate scientist,

For my knowledge of statistics, though I’m plucky and adventury
Has only been brought down to the middle of the century
But still, in matters modeling and simulationist,
I am the very model of a modern climate scientist

None of this has ever been about science or truth; it has always been about worldview and politics. Journalists don’t become journalists to report the facts; they want to “make a difference.” The same can be said for many scientists, particularly the ones involved in activism. Facts won’t matter after the damage has been done. They will simply say “my bad” and move on. You will feel late to the party and wonder why you did all that work finding the truth that people are bored with and no longer interested in. See how that works?

#20 I understand your emotion. Such behavior is reprehensible. I guarantee that these events are being noticed by people who matter. I said “it’s a tempest in a teapot”. I didn’t say it didn’t matter. The big picture is composed of many smaller pictures, all of which matter.

My point is that a lynching is not required for people prone to hanging themselves. Give ’em some more rope.

What would happen if you prepared a summary in high school level journalese with the key facts in the first paragraph? [snip]

Following are links to some formal approaches:

Caspar Ammann is at the Climate Global Dynamics Division, National Center for Atmospheric Research, Boulder, Colorado, part of the University Corporation for Atmospheric Research, UCAR.

See: Ethics UCAR
“As a publicly supported corporation, we are charged with the public trust and must commit ourselves to spending federal money wisely and ensuring that our work is honest and above reproach.” Katy Schmoll, UCAR Ethics Official
x1662, kschmoll@ucar.edu

. . .All UCAR employees have a responsibility to foster an environment that promotes intellectual honesty and integrity, and that does not tolerate misconduct in any aspect of research or scholarly endeavor. As defined in the Federal regulations, research misconduct means fabrication, falsification, or plagiarism in proposing or performing research, reviewing research proposals submitted to any Federal agency, or reporting research results. Fabrication means making up data or results and recording or reporting them. Falsification means manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record. Plagiarism means the appropriation of another person’s ideas, processes, results, or words without giving appropriate credit. Research misconduct does not include honest error or differences of opinion.

A finding of research misconduct will only be made after an inquiry and an investigation as set forth in the procedures. UCAR will take appropriate disciplinary action against individuals upon a finding that research misconduct has occurred. . . .

The above regulation only applies to conduct that occurred on or after April 17, 2002. For alleged misconduct that occurred before April 17, 2002, we use this definition and follow the procedure described in 45 CFR 689 above.

The GSA maintains a list of debarred individuals and organizations (EPLS) in order to facilitate excluding listed persons from participation in government contracts or receiving government benefits or financial assistance.

There are a couple parts of this that I just don’t get. The first is why would anyone go to this extreme? The IPCC, UCAR, NCAR, etc are so clearly in the driver’s seat that I can’t see why they even feel the need to defend temperature reconstructions at any level. They could simply acknowledge M&M, and say that reconstructions need more work. Then they can claim that past temperatures are irrelevant anyway, and do so effectively just by repeating a mantra of “humans are releasing CO2 at a previously unnatural rate and we are scared of it.” Heck, Steve is so careful about not over-reaching that they could agree with everything he says and it still would not change anything. So, why do this? What am I missing?

The second thing is just a feeling I have — that there must be a lot of people that feel betrayed by their pals. If you were working hard trying to add solid science on the subject of AGW and some small group of cowboys does stuff like this, how would you feel? It has to occur to them that this kind of behavior will raise a cloud of suspicion over their own work, too. Bender, I am very interested in your thoughts if you care to share.

Steve: There’s a difference between the overarching IPCC movement and the interests of the Mann and Ammann crowd, who support the movement, but have personal attachments to their own temperature reconstructions. It surprises me, just in management terms, that IPCC managers of WG1 made no attempt to get control of this particular issue, which was festering throughout AR4, even being mentioned as such by the chapter 6 Review Editor. If I’d been managing the process, I’d have inquired whether this line of argument was really relevant to what we were doing and, if it wasn’t, I’d have cut my losses and jettisoned the commitment to this area. Indeed, as an AR4 reviewer, I suggested that. From an IPCC viewpoint, jettisoning non-essential arguments would have the benefit of forcing exposition, including by their own authors, onto the more important issues, for which the public is hungry.

Another aspect of the debate has surprised me, given my business background. All of us have seen business leaders in unfriendly cross-examination where smart guys somehow can’t remember the day of the week. It’s amazing how stupid they seem to be; you have to remind yourself that they’re smart. But businessmen would rather appear stupid than crooked. I think that the opposite is the case with academics. It seems like they’d rather appear crooked than stupid.

I’m sure that some climate scientists find this sort of exchange disquieting. But none really have any appetite for other people’s controversy and it’s a lot easier institutionally to go along with the Mann side of the dispute.

I still hold my belief that if you find the right journalists telling the story in a simple but dreadful “robust” way, it may be a “W&Agate”.

I also hold the belief that if this becomes public knowledge it is good for science, for the pressure mounts on the IPCC to be much, much more rigorous in its process, or else to further show that they are in premeditation mode and not just in incompetent mode.

One should note, regarding the advice that this is only a “tempest in a teapot” and that one should “give ’em some more rope”, that other people with entirely different intentions on these matters may not be so blasé, but rather working furiously so that their own view gets reported and selected.

I agree with D. Patterson #34 on this, but due diligence is still needed if you want this to be public. Science blogs are interesting, but I really doubt their true impact on public perception of this. My two cents of course, and I could be very wrong about this.

“However, CE is not the only measure of skill; Mann et al. (1998) used the more traditional “RE” score, which, unlike CE, accounts for the fact that time series change their mean value over time. The statistically significant reconstruction skill in the Mann et al. reconstruction is independently supported in the peer-reviewed literature.”

This is the Wikipedia conclusion in the “Hockeystick Debate” section. A good start would be to challenge that, although I doubt there would be any success because the AGW team seem to have gained all the political high ground. They’ve also convinced a number of high profile government advisers who won’t be anxious to go back to their political masters and tell them they may have made a mistake. Then they have hordes of non-scientific supporters in the environmental movements who see AGW as the potential way of dragging us back into the 17th century. The other component is the people who have been persuaded that “ALL” the scientists are saying that GW is a result of man-made CO2 emissions. You have to admit they’ve run a pretty good campaign and it might still be running in 2100 before people begin to realise it is overstated puff. Either way it looks like an uphill struggle, but keep up the good work Steve.

So calculating R2 is a “wrong, incorrect” thing to do. Poor old R2. Down and out on its luck, rejected by climate scientists, forced to wander the streets with only cheap bottles of booze in which to drown its sorrows.

But hey, Wahl and Ammann come to the rescue. They brush its hair, buy it a shiny new suit and rename it “Calibration RE” – and hey presto, it is a useful metric to be embraced by the community again. The rehabilitation is almost perfect.

The ad hoc trimming of the Monte Carlo results is hilarious as well. I wonder how they came up with 0.75 as a threshold? Presumably if they had increased that threshold just a tick the significance would have gone from 0.989 to 0.991. No sneaky rounding required! They must have had a reason not to. What is the actual value of this ad hoc statistic for their reconstruction? Between 0.75 and 0.8 would just be funnier than funny.

(They wouldn’t be able to choose a number like 0.7621 because it would be obviously cherry picked. But they must have had a reason not to go with a “robust, conservative” 0.8 as well. Or perhaps it just throws out too many of the Monte Carlo results to yield a reliable 99% figure?)

If we go through every possible statistic we can think of (including a few nobody has heard of before), set the threshold (post hoc) just below the actual reconstruction, I bet we could get a really good statistical significance on the Monte Carlo test. Like a Juckesesque 99.99%.

Tough one, isn’t it. Imagine you were basically told to progress your career you had to defend something like MBH98. But they had choices. Like: try a different career?


Ooops, read the text more carefully. 0.82 is the figure. I guess they were worried that 0.8 was too obvious a cherry pick… hmm, like 0.75 isn’t. (sarcasm there). So I guess it doesn’t make the “funnier than funny” grade, just plain old “funny”. And depressing. All at the same time.

Jeez, please be aware that John Denver was born AFTER 1900. I would thus think there must be something wrong with your data. Also, the point marked “highly derivative” would be even higher if it were moved to the right along the curve. There are probably other things wrong too. For example, 80 Bazillion and 42 D should be in standard form. And A.D. 1900 should read C.E. 1900. Otherwise, it is O.K.

Steve, I am impressed and thankful you are here doing what you are doing. Only a very few people on earth have the intellect, energy, experience, contacts, and time to even attempt what you are doing.

I think the “AGW exists, is bad, and must be stopped at virtually any cost” viewpoint can “die”, eventually. But, it will take a combination of taxpayers objecting to the cost (when it finally starts to hit the pocketbook) plus the publicizing of contrary facts such as only a very few people, like you, can provide. Thank you. Disclaimer: While I am skeptical of many AGW claims, if your objective review ultimately proves the AGW case, then that is a good thing too — all that should matter are the facts.

Meantime, I will continue to visit your site daily as it is among the very few objective sites that pursue this topic in depth. Moreover, because you present the details (e.g. scripts, datasets) like no other, yours is by far the best site I have found for learning the ropes — so I can eventually perform critical analyses myself. In that sense, you provide yet another invaluable public service.

Re #42: Where’s the speculation? Ammann already openly admitted to Steve that his approach to MBH98 was based on what would be good for his career. That much is documented on this very site.

Mind, there is another admission here. Consider this: the monte carlo run generates a bunch of random results, RE scores. To work out the quantile, you have to line ’em all up in a row, left to right, from lowest to highest. Your goal is to get the MBH98 RE score in the top percentile; for 10,000 runs, that means getting the MBH98 score in the top 100 values.

But there’s a problem. It isn’t in the top 100. You have to shift some of those pesky results.

Getting rid of the runs doesn’t really help: if you remove values equally above and below your current position, your percentile remains the same. So you have to shift results on the right (i.e., those with high RE scores) over to the left of your current location. That’s what the -9999 in the code does: a very negative value shifts it off the scale to the left, instead of removing it.

Think about it: it is a clear admission that a high RE score on its own is inadequate to determine the significance of the reconstruction. Some high RE scoring results are considered worthless.

And what metric do they use to determine which are worthless? Effectively, those that scored poorly in their calibration r-squared in relation to their RE score.
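A toy version of the arithmetic (all numbers invented) shows why replacement with -9999, rather than removal, is the move that matters: trimming the same fraction of runs from each side of your score leaves its percentile rank essentially unchanged, while replacing high-scoring runs with -9999 pushes the same score into the top percentile without changing the count.

```python
# Stand-in for 10,000 simulated RE scores; our "observed" score sits at the
# 98.49th percentile -- 9,849 runs below it, 150 above, so NOT in the top 100.
scores = list(range(10_000))
target = 9849

def rank(value, pool):
    """Fraction of the pool strictly below `value`."""
    return sum(s < value for s in pool) / len(pool)

below = [s for s in scores if s < target]
above = [s for s in scores if s > target]

# Trimming ~10% from each side: the rank barely moves.
trimmed = below[len(below) // 10:] + [target] + above[: -(len(above) // 10)]

# Replacing the 100 highest runs with -9999: the pool still has 10,000 entries,
# but 100 runs move from "above us" to "below us" -- now inside the top 100.
shifted = [-9999 if s >= 9900 else s for s in scores]

print(rank(target, scores), rank(target, trimmed), rank(target, shifted))
```

The replacement trick buys exactly what symmetric trimming cannot: a shift of mass from the high tail to the low tail, with the denominator intact.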

This whole story is utterly amazing. Steve, I suggest you might get better rhyming if you try “climatologist” instead of “climate scientist,” although it has a more limited meaning. Anyway, branching out a little because I just couldn’t resist:

My name is John Wellington Wells,
My specialty’s forecasting hells.
No blessings, all curses,
And what’s even worse is
You can’t reproduce it – oh, well!

I can raise you scores of projections,
And that without statistics,
Unmoved by journal rejections,
Using methods veiled and mystic.

My apologies to W.S. Gilbert, but it’s what I could come up with on short notice.

#47. When the standard deviation of the reconstruction is equated to the standard deviation of the target in the calibration period, the calibration RE and calibration r2 are directly linked by the geometry – which I posted as a teaser a couple of years ago without any bites. They relate because of the geometry of an isosceles triangle, a pretty little exercise in what used to be high school geometry.
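For what it’s worth, here is one way the link can be written down (my reconstruction of the relation, not necessarily the exact identity Steve has in mind): if the reconstruction is rescaled to the target’s calibration-period mean and standard deviation, the two centered vectors form the equal legs of an isosceles triangle, and calibration RE = 2r - 1 exactly, where r is the calibration correlation. So the two statistics carry the same information. A quick numerical check:

```python
import math
import random

random.seed(1)

def stats(x):
    """Mean and (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / n)
    return m, sd

def corr(x, y):
    mx, sx = stats(x)
    my, sy = stats(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)

def re_score(obs, fit):
    """RE over the calibration period: 1 - SSE / sum((obs - mean(obs))^2)."""
    m, _ = stats(obs)
    sse = sum((o - f) ** 2 for o, f in zip(obs, fit))
    sst = sum((o - m) ** 2 for o in obs)
    return 1 - sse / sst

# Arbitrary target and reconstruction over a "calibration period".
target = [random.gauss(0, 1) for _ in range(79)]
recon = [0.4 * t + random.gauss(0, 1) for t in target]

# Rescale the reconstruction to the target's calibration mean and sd.
mt, st = stats(target)
mr, sr = stats(recon)
rescaled = [mt + st * (v - mr) / sr for v in recon]

r = corr(target, rescaled)
assert abs(re_score(target, rescaled) - (2 * r - 1)) < 1e-9
```

The identity is algebraically exact under variance matching, which is why sifting RE scores by “calibration RE” after disparaging r2 is using r2 under another name.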

But it is a tangled and illogical web. After all their blithering against r2, they use “calibration RE” to sift out “bad” RE scores.

One of the reasons for my exposition of the check kiting is specifically to show that there is overwhelming evidence that these particular steps in W&A, though ostensibly occurring in a “peer reviewed” journal, were never examined or even thought about by a reviewer.

Also, in fairness to Ammann and Wahl, while the SI may be execrably late and provided grudgingly, they have provided materials which can be used for analysis. Precisely because of that, some of these matters can be precisely diagnosed, as opposed to the typical situation where you have to try to guess what they did and then they say “nyah, nyah, nyah, you didn’t guess right, ha, ha, ha.”

If the MWP and LIA were real global events, then it seems to me that the century and millennium scale variability is rather larger than the IPCC estimates. That variability may even be greater than the changes counted as evidence of AGW. Further, it seems to me that the models do not account for or explain such natural variability.

In other words, if the MWP and LIA were real, then everything else comes crashing down. It would not be a disproof, but it would be a breaking of “proof”.

#24. As to why the broader community puts up with this, I think that they face the sort of problem that churches faced when confronted by sexual misconduct problems by individual priests. Responsible people in that community clearly suppressed information, not merely for venal institutional goals, though those were undoubtedly also present, but out of a genuine concern that any damaging of the image of the church would hurt not just the church, but the flock itself, by standing as an obstacle to their salvation.

I think that many climate scientists view this site as “damaging”, not because they can put their finger on anything incorrect that is said here, but because they think that the exposure of individual situations, regardless of the facts of the individual situation, will interfere with the greater public good of global salvation.

I have more confidence in the public. I think that the public can distinguish between individual clergy and the wider institutions.

Steve: Ross has wondered when Ammann would stop carrying Mann’s luggage around for him. Did you notice that A&W and W&A do not even acknowledge Mann, whose fingerprints are all over these papers? Indeed, the main lines of argument are set out in Mann’s early RC posts, which we cite in our MM 2005b (EE), and before that in Mann’s Nature reply in 2004. But not even an acknowledgment? Does that count as plagiarism as well?

I have a simple answer on this one Steve: they’re trying to get around Wegman’s network analysis. Mann encouraged them to publish without co-authorship or reference so A&W and W&A would appear to be independent.
Steve: I know why they did so – to appear “independent”. My question is different – did they fall into the trap of using Mann’s prior work without citation? A problem that arises only because they were being too cute in trying to appear “independent”.

Has anyone asked Wahl & Ammann for a comment? It would seem appropriate to give them the opportunity to reply before jumping the gun with calls for formal investigation. Climatic Change might be interested in making a statement or comment as well.

And if they choose not to answer, that would be interesting to report too.

I realize you don’t want your analogy pushed on this site, but I’m sure you know similar lines of thought have been noted quite a bit by people who post here.

But one short extension which seems to be valid in this case. People often wonder if the problem among the priests is exacerbated by the rule of celibacy on the part of priests and the easy access to young children. Exactly what these would correspond to in the case of climate science world might be debated but I think the “cloistering” effect of the universities and access to journalists might be good parallels.

I’ve added reference material to the post including various versions and links to correspondence and reviews, many of which have appeared earlier on the blog.

#58. I discourage people from running off in all directions. Poorly phrased complaints by well-meaning but not exactly informed intervenors can act as a vaccination – permitting an easy reply on something inaccurate. I’m not sure what I want to do. On an earlier occasion, I filed an academic misconduct complaint against Ammann, objecting, inter alia, to his withholding of the verification r2 results (which confirmed one of our findings). He had refused to provide this information in response to a request from a reviewer. In a personal lunch, I encouraged him to disclose matters fully and comprehensively because it was the right thing to do. Ammann said that he had no intention of disclosing this adverse information. So if anyone wonders about my views, I think that that should speak for itself.

While the complaint was dismissed, the adverse results were reported in the 2006 version and noted in the NAS report (after we drew their attention to this.)

In terms of complaining, people have to think about the purpose. In that case, I had a practical objective, where I was still trying to ensure that the publication record was not distorted by Ammann withholding results, and, if that could be accomplished without personal consequences to Ammann, that was fine with me. Still is. I don’t understand his recklessness in this matter; it’s not like I’m just going to stand by idly.

Part of the difficulty for me is that I’m not really interested in a Comment-Reply exchange under these circumstances. The problems arise because Ammann has failed to fully disclose results, not because of legitimate differences in interpretation. I’m more inclined to pursue matters with the journal in this instance, pointing out to them that their reviewers did not have access to the SI, that the SI did not support the article and requesting a Corrigendum prior to contemplating a Reply.

It’s pretty clear why W&A did this, because they thought they would get away with it. It was a calculated risk. If it weren’t for Steve McIntyre they would have gotten away with it. IPCC’s rush to judgement was the perfect vehicle for their get away.

I would be very careful about calling fraud. Scientists are under pressure to use new methods all the time, which they may not be expert on. They also must constantly push what they know as far as they can. All of this causes mistakes. A statistician is no doubt horrified by much of what gets published. On the other hand, I am sometimes horrified by the nonsense that is claimed based on valid statistics by people who don’t know the science itself (like assuming trees respond linearly to temperature). The fault that can be claimed here and elsewhere is when people have a basic error pointed out and ignore the criticism.

There are some interesting practical uses for this, now that I’ve settled down a little from yesterday. Notice how high the calibration R2’s are in a Mannian set-up with red noise – something that I’ve observed, but a point that is easier to make using Ammann’s data. Think about where this leaves Juckes and his version of “99.99% significant”.

There is a common jury instruction which tells jurors that if they find a witness to have lied in one aspect of his testimony they may disregard all of his testimony. In actual practice what usually happens is not only is that witness not believed but any of the witnesses associated with him are viewed more circumspectly.

Don’t the numerous reputable scientists at the IPCC realize that their credibility is at stake as well when they tolerate repeated stonewalling and sleight of hand such as this?

#13, #61 — the #13 layman’s summary misses an important element of the shell game. My #19 summary is still too much detail I suppose. Here’s #13 with the extra part (and a few minor edits to avoid needless offense):

3. W&A submitted a paper to be cited in AR4 (at deadline), and put the methods in a secondary paper, also not published at AR4 deadline.

4. The secondary paper was twice rejected, holding up publication of the primary paper cited in AR4.

5. W&A submitted a third version of the secondary paper to the journal that was waiting to publish the original primary paper (see step 3), hoping nobody would notice this sleight of hand. In this version, the (invalid) statistical methods were relegated to a Supplementary Information (SI) addendum.

6. W&A withheld the SI, which would have allowed reviewers, Steve (or anybody else) to show their methods were invalid. The SI was withheld until long after AR4 was published.

7. W&A’s primary paper was actually submitted to the journal AFTER the cut off deadline for submitting papers to the IPCC for AR4 (with the effect that nobody had opportunity to comment or correct before AR4 went to press).

8. Someone falsified the acceptance date of the primary paper in a (successful) attempt to permit its use in AR4.

9. The following was accepted as part of AR4:

McIntyre and McKitrick (2003) reported that they were unable to replicate the results of Mann et al. (1998). Wahl and Ammann (2007) showed that this was a consequence of differences in the way McIntyre and McKitrick (2003) had implemented the method of Mann et al. (1998) and that the original reconstruction could be closely duplicated using the original proxy data.

Thus, the ball of yarn is unwound:
* Now that the SI has been released, Steve was quickly able to falsify it (above)
* Falsified SI means the secondary paper is falsified.
* Falsified secondary paper means primary paper is falsified.
* Falsified primary paper means AR4 statement is falsified.

A good lesson in why the entire chain of evidence for scientific analysis must be available and time given for knowledgeable people to validate or falsify the results. Why should any paper “submitted” or “in press” or “being researched” be citable?

How science should work is: if something is falsified, you either need to evaluate the data differently or give it up. Example: Years ago I thought catastrophe theory was interesting and found that a grazing system model could generate a butterfly manifold. HOWEVER: I also showed that the folds were so close that you would never be able to detect this topology with field data. Thus, a curiosity only. End of story. I did not declare that it was real in spite of never being detectable.

An incredible amount of work, persistence and intellect went into this. Congratulations.
Consider the following:
1) Toto has pulled aside the curtain; not everyone has noticed yet, but I think they will.
2) For example, Anthony Watts’ website viewer numbers are going up steadily and at a good clip.
3) Anyone concerned about the wrong-headed AGW project needs to work scientifically, if they are able, including conversations with other scientists who need to be educated. The rest of us need to speak out and support people like Steve McIntyre and Anthony Watts with money (websites cost money, bandwidth costs money, journals and books cost money, assistants cost money) and volunteering if appropriate (Surfacestations).

[snip – politics]

Steve: I have never personally suggested that any policies be changed and have said that, if I were a policy maker, I would rely on advice from major institutions until there was a consensus in another direction.

Add to that that Ammann circumvented IPCC comment procedures by submitting comments directly to Briffa, rather than through the public process. So that when we submitted comments on the Second Order Draft, Briffa’s retort used analysis from Ammann and Wahl 200x (then not even submitted), which could not be demonstrated on any published literature.

An additional point of annoyance to me and perhaps why I’m pretty hard-line about this, is that in December 2005, I suggested to Ammann that, since our codes reconciled, we should attempt to write a joint paper stating what we agreed on – and in practical terms, 99% of our results reconcile – and, if that failed, revert to the status quo. Think how much time and energy would have been saved.

In retrospect, I’m sure that one of the over-riding considerations for Ammann blowing off this sensible suggestion was the looming IPCC deadline and the fact that IPCC was then naked in terms of any response by the Mann camp, other than realclimate posts.

“Part of the difficulty for me is that I’m not really interested in a Comment-Reply exchange under these circumstances.”

I don’t imagine there’s very much they could usefully say, and I wouldn’t really expect a reply from them. I just thought that it would act as an inoculation against the claim that rather than check it out with the authors first, it had been immediately splashed across the internet. I would assume that if you went to the journal, the first question they’d ask would be whether you had checked with the authors – to see whether they’d posted the right data, say.

But if it was me, the first thing I’d do is see if any other eminent statisticians could take a quick look at the paper and the data and replicate/confirm what you say. Not that I doubt, but it then makes it more than just a blog post in the eyes of the scientific community, and follows the forms about replication of controversial results. I would think you would want to get things nailed down first on a scientific level, rather than have the political climate-sceptic blogosphere pick it up and start running with it. But it’s really none of my business, of course.

#68. I’m mulling that over and, if necessary, will go that route. However, I think that, prior to doing that, I may want to get a couple of rulings from the journal. If the issues relate to disclosure as opposed to differences of interpretation, then surely that goes to the duties of the editor and the scientists, as opposed to a third party statistician. I realize that I tend to think more in terms of prospectuses than science disputes; however, when it comes to disclosure and misrepresentation, I view the primary responsibility as lying with the scientists and journals, as opposed to placing an obligation on me to write Comments. I realize that this is not the first instinct of an academic, but I’m not convinced that it’s an incorrect approach (nor of the opposite). I’m thinking about it.

If this were a mistake why haven’t they admitted it was a mistake? Keep in mind they were questioned early on about this very issue. As Steve states in his analogy:

A peer reviewer asks:

Please show the proof for the extension of the test function h from ideles to adeles by 0 outside ideles.

pilt-Ammann and pseud-Wahl reply via the editor:

This request by the reviewer is specious. We’ve shown (in pilt-Ammann and pseud-Wahl, under review) that arguments that you cannot extend the test function h from ideles to adeles are without merit.

and furthermore:

Until one day, the original reviewer – the one who actually asked for the proof for the extension of the test function h from ideles to adeles by 0 outside ideles – looks for the Supplementary Information. The journal says – we don’t know anything about it, talk to pilt-Ammann. So the reviewer writes pilt-Ammann, who answers haughtily:

why would I even bother answering your questions, isn’t that just lost time?

If this was a “mistake” it was a “premeditated mistake.”

There is a Zen story called “Eating the Blame”:

One day at the monastery of Master Fugai Ekun, ceremonies delayed preparation of the noon meal, which forced the cook to hurry. He took up his sickle and quickly gathered vegetables from the garden, then threw it all into a soup pot – unaware that in his haste he had cut off the head of a snake.
At the meal, the monks were highly complimentary of the delicious soup, but the Roshi himself found something odd in his bowl. He summoned the cook and held up the snake head. “What is this?!?”
“Oh, thank you, Roshi!” the cook said, and immediately ate it.
— “Eating the Blame” Zen Story

W&A should have “eaten the blame”, (and pulled their paper), as soon as Steve said, “What is this?!?” Then they could have “moved on” with some credibility.

Returning to SpenceUK’s point about the calibration RE/r2. Let’s ponder the tweaking a little more. The 95th percentile for calibration RE/r2 using Ammann’s own simulations is 0.51, while the observed MBH (WA variation) is 0.39, well short of the 95th percentile.

Ammann posits a “real world” scenario where the decision to take a model to “verification” depends first on a positive calibration RE and then on a ratio of calibration RE to verification RE. How absurd. I’ve NEVER seen this particular sequence of tests used in ANY study, nor does it make any sense. If you’re going to use the calibration RE as a first hurdle, surely “real world” requirements will be more than a calibration RE of 0! If you’re going to claim “99% significance” as Ammann does, shouldn’t you at least be at the 95th percentile in calibration RE? Their AD1400 network is only about the 65th percentile against red noise.

I agree with bender. It’s more damning and ultimately more fun to watch somebody hang themselves than to lynch them by posse. What you have to do is make sure the scaffold is built and there is plenty of rope.

When the standard deviation of the reconstruction is equated to the standard deviation of the target in the calibration period, the calibration RE and calibration r2 are directly linked by the geometry – which I posted as a teaser a couple of years ago without any bites.
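The geometry teased here can be checked numerically. Below is a toy sketch (made-up data, not the MBH or WA code): when a reconstruction is rescaled so that its mean and standard deviation match the target’s over the calibration period, the calibration RE collapses algebraically to 2r − 1, where r is the calibration correlation — tying the calibration RE directly to the calibration r2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up target and a correlated, wrongly scaled "reconstruction"
y = rng.standard_normal(100)
x = y + rng.standard_normal(100)

# Variance matching: rescale the reconstruction to the target's mean and sd
y_hat = (x - x.mean()) / x.std() * y.std() + y.mean()

# Calibration RE, computed against the calibration-period mean
re = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Calibration correlation
r = np.corrcoef(y, y_hat)[0, 1]

print(re, 2 * r - 1)  # identical up to rounding: RE = 2r - 1
```

The algebra: with matched mean and variance, the sum of squared errors is 2nσ²(1 − r) while the reference sum of squares is nσ², so RE = 1 − 2(1 − r) = 2r − 1.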

c#43 So Huybers used univariate variance matching in his RE simulation, something other than what Mann et al do when they compute the verification RE. But it seems to me that even the PC1 doesn’t pass Huybers’ test. Am I right? Something silly going on here.

Steve: UC, my abject apologies. Of course, you got the geometry in one bite. I need to move it to the right thread though. I’ll comment on Huybers’ re-scaling separately.

“where MBH and then Wahl and Ammann had advocated a 99% significance benchmark of 0.0, as opposed to the 99% benchmark of 0.54 presented in our Reply to Huybers”

Does this mean that Wahl and Ammann had published (whether in a response or a paper) a claim that they had reproduced the 99% significance benchmark of 0.0, when their own SI for an earlier paper showed that they had almost exactly matched the original MM claim of a 99% significance benchmark of 0.54?

Steve: The SI for Ammann and Wahl 2007 (SI available in July 2008) shows that the 99th percentile value for their red noise REs from their AD1400-type network was 0.52 (as opposed to the value of 0.54 reported in MM2005c). They throw out some values on questionable pretexts, but even doing so, only reduce the value to 0.483. Wahl and Ammann 2007 state the benchmark is 0.0. Be careful in how you use x% significant as that’s not necessarily a sensible form of expression. But the point is right.

Your lead article is very long, and not easy to follow by someone who doesn’t already have the background. I believe many readers would value a shortish summary along the lines of:

For data X and model Y here is what the RE statistic does.
Here is what its distribution would be in a standard case.
Here is what A&W assert in the paper is a significant value.
Here is what A&W assert in the SI is a significant value.
The effect of the size of this discrepancy is …
Plus of course any other brief information you can think of.

This exercise might even help you in drafting a letter to that Journal!

Cheers and hoping I may thank you in anticipation,
Rich.

Steve: I realize that this article starts in the middle of the story. It’s not that I don’t know this, but I can only do so much. What I wanted to do here was document in a timely fashion what I identified in the SI and what was new for me, writing to some extent for readers that have been following the conversation. This won’t be the end of this, don’t worry.

First they checked whether any of the calibration RE values were negative and, if they were, they assigned the verification RE a value of -9999, but left the -9999 value in the calculation, pushing the MBH result a little to the “right”.
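A toy illustration of why that matters (the distributions and the “observed” value here are invented, not Ammann’s): substituting -9999 for a simulation’s verification RE, but leaving the -9999 entries in the ranking, can only move a fixed observed value up the percentile ladder — i.e., to the “right”.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented null distributions of calibration and verification RE values
cal_re = rng.normal(0.1, 0.2, 1000)
ver_re = rng.normal(0.2, 0.2, 1000)

observed = 0.4  # a made-up "observed" verification RE

# The step described above: flag sims with a negative calibration RE
# as -9999, but leave the -9999 values in the ranking
flagged = np.where(cal_re < 0, -9999.0, ver_re)

def percentile_rank(null, x):
    """Fraction of the null distribution lying below x."""
    return float(np.mean(null < x))

print(percentile_rank(ver_re, observed))   # apples-to-apples rank
print(percentile_rank(flagged, observed))  # rank after the -9999 substitution
```

Every substituted entry is below any plausible observed RE, so the observed value’s percentile rank can only stay the same or rise.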

#76. Expressed in CCE terms (and I’m finally on the same wavelength with you here), let me review the history of attempts to come to grips with this, so that you can see exactly where the SI fits in.

Mann started off doing univariate simulations, comparing against a null model of AR1 red noise series with a coefficient of 0.2, which doesn’t throw up high RE values.

The “real” issue is that spurious regressions throw up high RE values – e.g. Yule’s C of E marriages versus mortality and that the RE test is not “rigorous” but something that is passed by all sorts of oddball non-relationships.

However, in MM05, we approached it from a different angle. We observed that a univariate model between Mannian PC1s generated from red noise (with a lot of HS series) threw up a lot of high RE values, so that an RE test was ineffective against a null of biased picking (which is what the Mannian PC1 method did.) I’ve expressed it here more pointedly than our original article, but the point is the same. We showed this by simulations in which we fitted the PC1 by regression and then re-scaled at the NH temperature level. This threw up a lot of high RE values (MM2005a).

A little more information about Mannian methods became available, in part through the WA code available in May 2005, which showed a re-scaling step at the temperature PC stage. Huybers observed that under this process, you went back to a 99th percentile of 0. The reason (as shown in a CA post) is that a lot of the HS series overshoot.

But Huybers hadn’t modeled the impact of networks (nor had we in MM2005a). In our Reply to Huybers, we considered this and did new simulations mixing one red noise PC1 with 21 white noise series, and found empirically that this yielded high RE values (99th percentile 0.54). So matters stood.

Wahl and Ammann Appendix 2 describe simulations that are along the lines of what Huybers did and beside the point relative to our Reply – an observation that we made in our Reply to their GRL submission – which may have contributed to its rejection.

Although Wahl and Ammann maintain this description, the procedure in SI is different from what they say they did. They’ve actually done simulations more along the lines of our Reply to Huybers. A few differences – they skip the step of making synthetic PC1, but they do red noise models of the proxy networks. They also get very high RE values, in effect, confirming our finding that you can get a lot of high RE values from red noise networks, although they do not report this (and actually report the opposite.)

This article is a bit of a two-edged sword, as with the verification r2 before it. Once again, in practical terms, they confirm our results. The problem is that they say the opposite.

As an aside, to do their simulations, they use our code (their code uses the same variable names as our code – things like “Data1”, “NM”) – which I would probably do differently now with a bit more coding experience. Needless to say, they do not refer to our code directly; they attribute the code to Huybers, saying that it is “based” on our code. Interestingly, Juckes et al also applied my technique for generating pseudoproxies (applying a Hosking algorithm), though they don’t attribute us. Again, I’d probably do it a bit differently now, but it’s not material.

I think the word “malfeasance” could be used here quite comfortably. I would even go so far as to say that it is demanded by the circumstances. When Steve McIntyre, who is usually so restrained, uses words like “reprehensible” and “mendacity” to describe the behaviour of these scientists, you know it’s got to be really bad. What a shame that so much of the AGW science — and its subsequent policy fallout — is based upon the work of such unprofessional scientists and on a system that allows them to get away with such behavior.

If I were retired, I’d have time to digest all of this. But I’m not, and so I have to slap pieces together here and there to get a feel as to what’s going on.
I just read No. 16, and I’m truly amazed how all this progressed.
snip

Steve
When you can, I would find it helpful to see graphs comparing these different distributions with the associated 95%, 99% etc. cutoffs and 0.0, . . . 0.52 results to better understand and integrate your findings of R2 vs RE etc.

At Craig Loehle,
You’re right to an extent, but c’mon… like there isn’t a pattern here?
Their behaviour when requested for data sets, pressed, etc pretty much says it all. It’s not like they concede their mistake and withdraw their papers, like Li graciously did with his failed proof of the Riemann hypothesis.
No way. Instead they give you the run-around, smoke and mirrors, rope-a-dope, stonewalling, kick, scream, and cold shoulder. When finally backed hopelessly into a corner, they just walk away and thumb their noses at you – the bad guy.
What a bunch of respectable individuals! Taxpayers and the public have every right to get angry. Too much is at stake here.
Steve: In “big picture” terms, I don’t think that a lot is at stake. But in terms of my academic corpus, such as it is, there’s something at stake. Academics protect their pieces of turf and so will I. If Ammann wants to play games, so be it.

I think Steve McIntyre has this just right. It’s a turf battle and has only minor significance in the AGW arena as a whole. Few outside the academic arena would have noticed if, as Steve Mc recommended, the entire chapter on paleoclimate reconstructions had been deleted from the IPCC AR4. Extreme AGW stands and falls on the GCMs. Paleoclimate has not played a major role in validating or invalidating those. The paleoclimate proxies don’t have enough resolution, precision or geographic coverage for that.

OTOH, if you could show that the cloud parameterizations in the models had been tweaked to deliberately increase climate sensitivity as opposed to tuning for hindcasting, you might have something. Good luck with that.

The longer the IPCC and the journals wait before finally admitting error, though, the worse the PR repercussions will be. See cover-ups, history of.

The Team is now so deeply invested in their artificial reality, developed as it was “for the greater public good of global salvation”, that they are now totally insulated from their own cognitive dissonance.

They have wrapped themselves in the warm comfy fur of moral superiority, where ends justify means and procedures can be modified to achieve the higher moral imperative that is simply beyond “old style” scientific methods.

All they can do now is keep praying that the climate will show a huge acceleration in temperature, to prove they were correct, if not always right.

I’ll try to give an intuitive explanation of the stats issue here. You have a data set, X. Using the data you estimate a parameter B, so the parameter can be expressed as a function of the data, B=f(X).

Now you want to decide if your parameter estimate is “significant”. In a regression model, “significant” means a large distance from zero, so you find a numerical measure of distance, D, between B and 0, and ask whether it is large or not. Note that D is a function of the data and of B, so we’ll call it D=g(X, f(X)). In many cases there are tables you can look up, but if not you can do a Monte Carlo test. This involves making up a set of random numbers N, and computing g(N, f(N)) each time. g(N, f(N)) is just a number, and the very very important thing is that the function f inside the brackets is exactly the same as the function f you used to estimate B. Apples-to-apples, that sort of thing. You do that 1000 or 10,000 times, and now you have a list of numbers. Rank them and look at the 95th percentile, which is called the critical value, C. If your value of D is larger than C then we say your estimate is “significant”, in the sense that it is farther from zero than you’d get just by chance.
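The recipe above can be sketched in a few lines. This is a generic illustration with an invented estimator (a least-squares slope) and an invented distance measure — not the MBH machinery; the essential point it demonstrates is that the SAME function f is applied to the noise as was applied to the data.

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x, y):
    """The estimator f: here a stand-in, a least-squares slope."""
    return np.polyfit(x, y, 1)[0]

def g(x, y, b):
    """The distance-from-zero measure D: here simply |b|."""
    return abs(b)

# Made-up data set X with a genuine linear signal
x = np.linspace(0.0, 1.0, 50)
y = 5.0 * x + rng.standard_normal(50)

b_hat = f(x, y)
d = g(x, y, b_hat)

# Monte Carlo: apply the SAME f to pure noise, 1000 times, and rank
null = [g(x, n, f(x, n)) for n in rng.standard_normal((1000, 50))]
c = np.percentile(null, 95)  # the critical value C

print("significant" if d > c else "not significant")
```

With a real signal in the data, d lands well beyond the noise-only critical value; replace the signal with noise and it won’t.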

In the MBH case (and prediction problems generally), D is defined by the closeness of B to some target value S. So instead of a distance measure we want a closeness measure, but call it D anyway. There are closeness measures that have tables, but the paleo people like the RE statistic that does not have tables (this preference is not well-grounded, but leave that aside). So we do a Monte Carlo test as before, generate 1,000 numbers using g(N, f(N)), and compare D to the 95th percentile value.

Steve and I showed in our 2005 GRL paper that a proper Monte Carlo test would show that Mann’s closeness measure D is less than the appropriate critical value C, so his model does not have significant predictive skill for telling us historical temperatures. Wahl and Ammann later claimed that they could establish Mann’s result is actually significant and we’re wrong.

But now, with the release of the data and code archive, we can see that they showed no such thing. Mann’s RE score is less than the Monte Carlo critical value even in their calculations. What Wahl and Ammann have done is something else. They have shown that if you generate a list of Monte Carlo numbers using some new function h(N), then the 95th percentile is less than Mann’s D.

They generate a list of numbers g(N, h(N)) and use that to tell us whether the estimate B=f(X) is significantly close to the target. But the problem is that h differs from f, so this is an invalid comparison.

In particular, they are reporting the critical value for a conditional distribution of estimations, where the posterior conditions are {RE exceeds 0} and {calRE/verRE exceeds 0.75}. These particular conditions are ad hoc and ridiculous and all that, but the main point is they were not imposed on the original estimation. So the list of numbers being generated does NOT test the significance of Mann’s reconstruction. By their own reckoning, an apples-to-apples test based on g(N, f(N)) shows the usual insignificance. End of story.

The additional commentary above, by way of analogies, explains why all this matters. Wahl and Ammann’s paper was the fig leaf used by the IPCC to defend its reliance on the hockey stick, it was the basis of Sir John Houghton’s claims in Senate testimony, it was the basis of an NCAR press release against us, etc. And it comes down to an argument of a purely contrived, erroneous and nonsensical character. The leading lights of the climate change intelligentsia – the IPCC, Houghton, NCAR, etc. – took their stand on this stuff.

While I understand Steve’s pro forma deference to accepting advice from the institutional consensus for the purpose of policy formation, at a certain point the institutions become discredited as regards their advisory function by their failure to maintain the procedures that were set for them, as well as failing on a prima facie basis to produce quality work.

I agree, Raven. I’m not convinced the public places as much faith in scientists talking about models as they do pretty pictures such as the hockey stick. Data is immaterial. John Q. Public picks up the paper and sees the hockey stick, not some statistical analysis or GCM output. Ultimately, what the public thinks is what matters in this debate.

But you’re doing exactly the same thing as the Team, assuming something that you don’t and possibly can’t know: the magnitude, timing and geographical extent of the MWP. I agree that it probably happened and was probably about as warm and possibly warmer than current temperatures. But I don’t know that with high confidence and I certainly don’t have sufficient data with sufficient precision to validate or invalidate climate models.

Even if we did have good temperature proxies, we don’t know what ocean currents were doing then compared to what they’re doing now. And that’s just one example of data that would be necessary for a proper comparison.

The usefulness of this for me is that whenever anyone tries to make the argument to authoritative ‘peer review’ I can reference Steve’s little cat feet, here. That this evidence of academic chicanery has to do with the famous Crook’t Hockey Stick is just so much gravy. This is a story that can resonate, illustrative as it is of the state of science in climatology. In a cooling world, people wonder what went wrong.
======================================================

But you’re doing exactly the same thing as the Team, assuming something that you don’t and possibly can’t know: the magnitude, timing and geographical extent of the MWP.

I don’t think Raven is making an assumption about whether the MWP was global or not, or how warm it was. Rather, Raven is pointing out the impact a warm MWP has on the assumption of unprecedented current temperatures as well as what is predicted by the GCMs.

Surely there is strong evidence for both the MWP and the preceding Roman Warm Period:

1. The Schnidejoch Pass

2. Grapes in the UK

3. Norse settlement in Greenland (among the birch forests, running pigs then cattle and sheep and growing crops for 400 years. Their cadavers are still being dug out of the permafrost – further evidence that Greenland was MUCH warmer back then.)

4. The evidence amassed by the Idsos at http://www.co2science.org, showing huge numbers of papers supporting the thesis of a GLOBAL MWP which was warmer than the present, with very few papers indicating the contrary.

The major contrary evidence to date has been the hockey stick. Always questionable on the grounds given above, it was dealt death blows by Steve and Ross, but some of these were parried by Wahl and Ammann. Wahl and Ammann have now been shown to have misrepresented their results, and their attempted parry has failed.

The shield is broken. The Hockey Stick is Dead. The MWP/RWP thesis seems to explain the facts better than any other.

DeWitt Payne says:
“But you’re doing exactly the same thing as the Team, assuming something that you don’t and possibly can’t know: the magnitude, timing and geographical extent of the MWP.”

I was only arguing that the existence of a warm MWP would affect the credibility of the GCMs and the catastrophic AGW argument. I did not claim that the MWP actually existed. I realize that the data to answer the question does not really exist.

I agree with Ross and Pat. The problems, as described by Steve, are institutional not merely individual. They not only involve A&W, but the journals their paper(s) were submitted to and appeared in, the NCAR, the Lead Authors, Review Editors, etc. of the IPCC, and so on. Pielke Snr made the point well following the release of the draft CCSP Report; we’re confronted by a climate oligarchy whose intent is to report their own singular view of climate science and nothing else.

#45 Somehow these findings need to be publicised well beyond this blog. Not sure how best to proceed.
For what it is worth, RobR, I have given this thread a notice and link in Stay Warm, World and on the USA Freedom Forum as I share your feelings on this.

I’ve been trying to understand better the background of this story and I naturally went to RealClimate to see what they made of it.

In here, a “dummies’ guide to the latest HS controversy”, they basically state that MM2005 is (insert ad hominems of your choice).

But they also say this:

MM05 claim that the reconstruction using only the first 2 PCs with their convention is significantly different to MBH98. Since PC 3,4 and 5 (at least) are also significant they are leaving out good data. It is mathematically wrong to retain the same number of PCs if the convention of standardization is changed. In this case, it causes a loss of information that is very easily demonstrated. Firstly, by showing that any such results do not resemble the results from using all data, and by checking the validation of the reconstruction for the 19th century. The MM version of the reconstruction can be matched by simply removing the N. American tree ring data along with the ‘St Anne River’ Northern treeline series from the reconstruction.

It also says:

Basically then the MM05 criticism is simply about whether selected N. American tree rings should have been included, not that there was a mathematical flaw?

Yes. Their argument since the beginning has essentially not been about methodological issues at all, but about ‘source data’ issues. Particular concerns with the “bristlecone pine” data were addressed in the followup paper MBH99 but the fact remains that including these data improves the statistical validation over the 19th Century period and they therefore should be included.

So does this all matter?

No. If you use the MM05 convention and include all the significant PCs, you get the same answer.

Questions I have:

1) Does MM2005 really output the same answer as MBH98 if you include “all the significant PCs”? If so, what was the point of writing that paper on those statistical issues? A refinement issue?

2) Was MM2005’s real intention to exclude the infamous bristlecone material from the reconstructions?

I’m a little confused out here. A third question might be: where does this week’s find leave RealClimate’s ‘HS controversy for dummies’ page?

Thank you in advance.

Steve: Have you read our 2005 E&E article? We discuss the various permutations and combinations there. But note that we did NOT present any squiggle using Mannian proxies as an alternative view of the world, as we categorically did not endorse their proxies or methods. We merely observed that minor variations in methods led to very different results, all turning on the weight assigned to bristlecones by the methodological permutation. While it is very difficult for a statistical method to be so bad that one can say it is “wrong”, we did observe that about Mannian PC1. We reported results for 2 PCs and for 5 PCs, observing, as Mann acknowledged, that one led to an HS and the other didn’t. Since we were not presenting an alternative view of the world, we were neither “including” nor “excluding” bristlecones. However, Mann had stated that their reconstruction was robust to the presence/absence of all dendroclimatic indicators; for this representation to be true, it would also have to be robust to the presence/absence of bristlecones, which it obviously wasn’t. We observed that this important warranty in MBH98 was false – indirectly acknowledged to be false in this RC post and elsewhere – a point that RC has repeatedly refused to confront. Of course, Mann had previously studied the impact of excluding bristlecones in his CENSORED directory, though these results were notoriously unreported. In effect, in this reconstruction, bristlecones are really all that matters; therefore, we said that people should carefully examine whether bristlecones are a valid proxy, hence the ongoing interest in bristlecones at this blog, which has included our own sampling program at Almagre, proving the Starbucks Hypothesis.

Thank you for your comprehensive reply. I also took the time to read Dr Ross McKitrick’s detailed essay about the subject, and your words resonate.

What gets confusing is that the rationale you are mentioning, and that is detailed further in that essay, seems to come from a completely different planet from what RC makes it out to be. Am I missing something on RC’s page, or can I conclude that they are misrepresenting the issues at hand?

Another question, perhaps a silly one that has been asked before bazillions of times,

Given the ability of MBH98’s code to create HSs from random series, I’ve often been confronted with the argument that while this ability was real, it operated on a rather small scale (1%, from what I can gather from McKitrick’s essay) compared to the final graph. While the HS presented numbers like ±2 K in its graph, the evidence for the HS from random series is a graph with numbers like ±0.02 K. If it were simply a matter of adding up the results, this wouldn’t matter at all in the final product.

What I thought, though, is that it depended on the scale of the amplitude of the graph – that is, that the HS shape is dependent on the scale of the general amplitude of the graph, and not on absolute K values. And yet, values of 0.02 K were chosen for that graph, so I remain in doubt. Can you confirm this?

Thank you
Steve: That point is a total red herring, and notably never advocated by Mann or Ammann. The scale of the simulated PC1s is exactly the same as the scale of the MBH PC1. In the svd algorithm, the norm ||·|| = 1. In subsequent steps, they re-scale the PC1 to a 1902-1980 standard deviation of 1, blowing it up, and then it enters into a regression (so the re-scaling doesn’t matter other than through the coefficient.) The size of the effect is precisely the same size as in MBH. So the criticism is phony. However, the real issue is ultimately bristlecone validity. If they can mysteriously intuit climate throughout the world, then one would have to abandon objections. The question is whether they have a unique ability to intuit climate. If their ability is not unique, then you should be able to deduct them from the MBH network and still get an MBH answer. Ammann has finally acknowledged that you can’t deduct them from the MBH network and still get a HS from that network in the early portions of interest.

In the MBH case (and prediction problems generally), D is defined by the closeness of B to some target value S. So instead of a distance measure we want a closeness measure, but call it D anyway. There are closeness measures that have tables, but the paleo people like the RE statistic that does not have tables (this preference is not well-grounded, but leave that aside). So we do a Monte Carlo test as before, generate 1,000 numbers using g(N, f(N)), and compare D to the 95th percentile value.

If I may add to what Ross has said about the RE statistic, it does not have tables, because the distribution of the RE statistic (unlike say, the t-distribution or the normal) depends strongly on the centering and form of the instrumental record used in each particular case. Although the RE statistic is not scale dependent (i.e. you get the same value of the statistic whether in F or C temperatures), it depends heavily on the choice of the baseline (what a value of 0 represents). Let me illustrate.

Suppose that we have a reconstruction for anomalies (zero represents an arbitrarily chosen point), where the typical temperature anomaly in the record is of the order of magnitude of 1 degree and the typical “error” in the validation period is +/- 2 degrees. A simple calculation indicates that the RE statistic will have a value of about RE = 1 – (2^2)/(1^2) = -3. Admittedly, this is a very poor reconstruction. However, all is not lost. Instead of predicting the anomaly, we will reconstruct the “mean global temperature”. According to NCDC this is about 13.9 C, so we add this value to both the instrumental record and the reconstruction. The numerator of the RE statistic does not change, but the denominator increases, so that the RE is now about 1 – (2^2)/(14^2) = .98. Want to do better? Translate the temperatures to Kelvin. Adding another 273 to both series, we get an RE of about .99995. Can’t do much better. Same reconstruction, same comparison temperature record. Even though the effect will be smaller, the RE statistic will also change if a different time range is selected to calculate the anomaly.

The problem with the RE statistic is that it appears to be meaningful only when the variables involved are ratio-scaled (i.e. the value zero actually means “none” and is not merely an arbitrary point, and all values are positive). In this case, the RE value is basically the error expressed as a proportion of the values being estimated, squared, and subtracted from one (although even in that case I would not choose to define the RE as the climate folks have done). As adapted by the climatologists in AW and MBH for anomalies, the statistic has no meaning and no proper interpretation. As well, it lends itself to “improvement” through ad hoc manipulation techniques (e.g., variance matching).
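The arithmetic above is easy to reproduce. A toy sketch (made-up series, with the RE computed against a zero baseline as framed in the comment): shifting both the target and the reconstruction by a constant leaves the errors untouched but inflates the denominator, driving the RE toward 1.

```python
import numpy as np

rng = np.random.default_rng(3)

def re(y_hat, y_ref):
    """Toy RE with the denominator taken about zero, per the framing above."""
    return 1 - np.sum((y_ref - y_hat) ** 2) / np.sum(y_ref ** 2)

# Made-up series: anomalies of order 1 degree, errors of order 2 degrees
anom = rng.normal(0.0, 1.0, 500)
recon = anom + rng.normal(0.0, 2.0, 500)

# Same reconstruction, same errors -- only the baseline offset changes
results = [re(recon + off, anom + off) for off in (0.0, 13.9, 13.9 + 273.15)]
print(results)  # the RE climbs toward 1 as the baseline offset grows
```

The three values track the comment’s hand calculation: strongly negative for anomalies, near .98 in degrees C about the global mean, and nearly 1 in Kelvin.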

In their paper, AW state on p.77

A threshold of zero was used in these studies, above which a hemispheric reconstruction was regarded as possessing at least some skill in relation to the calibration period climatology (MBH98, Huybers 2005; WA).

Since you can get an RE statistic equal to zero with a reconstruction consisting of nothing but the value 0 repeated over and over, this statement indicates a naivety which is remarkable in people whose work relies so heavily on understanding their own statistical tools. In the example above, we would get REs of 0 by predicting that the mean global temperature was 0 C every year back to the year 1000. If measured in Kelvin, an RE of 0 would be obtained by predicting that the world has been at absolute zero for the last 1000 years! In this latter case, we could improve our reconstruction considerably by predicting that it was always at 0 C, since that would increase the RE to roughly .997.
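The zero-skill point is mechanical, and a two-line check makes it concrete (a hedged toy using the same zero-baseline RE definition discussed above, with made-up numbers):

```python
import numpy as np

def re(y_hat, y_ref):
    """Toy RE with the denominator taken about zero."""
    return 1 - np.sum((y_ref - y_hat) ** 2) / np.sum(y_ref ** 2)

y = np.array([0.3, -0.5, 1.2, 0.1, -0.9])  # a made-up anomaly record
zeros = np.zeros_like(y)                    # forecast "no anomaly, ever"

print(re(zeros, y))  # 0.0 -- a forecast with no information at all
                     # sits exactly at the zero benchmark
```

Predicting the baseline value every year makes the numerator equal the denominator, so RE = 0 identically, whatever the data.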

As a lay person it has always struck me as odd that back in 2007, when SteveM’s work forced GISS to admit that 1934 was hotter than 1998 in the US, the accompanying cry from the Team was that the US represented around 2% of the world’s surface area and, therefore, this made no difference to the global climate scene.

Yet the BCPs, which occupy a tiny, tiny, tiny percentage of that 2% surface area, are somehow deemed by them to be representative of temperature on a global scale and can remove historically documented evidence that the temperature was higher in times past from the record!

#103 Luis, I’d like to add another point to the discussion on whether Steve and I “left out” key data. The series in question are principal components, which are weighted averages over the entire input data set. All the data are used to form each PC, only the relative weights change. The reason Mann’s result originally seemed plausible was his claim that the hockey stick shape was the “dominant pattern” in the North American data set, i.e. PC #1. But that was due to the flawed PC method. Had the PC computations been done correctly, the hockey stick shape would have been confined to a low-order PC accounting for only 8% of the variance of the North American group. It would have been obvious that there’s something wrong with a Northern Hemisphere reconstruction whose shape dramatically changes based on whether one such data series is included or not. And nobody would have believed that this one series should be allowed to over-ride the entire remaining data set, which is what happens in Mann’s computation.

Later we discerned that the hockey stick shape is confined to a small and controversial sample of bristlecone pines. The particular flaw in the PC method caused all the weight to be loaded on them in the final reconstruction. If they are removed, the hockey stick shape disappears absolutely no matter what variations in methodology are used. We argued in our 2005 E&E paper (which I recommend as a very comprehensive treatment of the issues which remain even up to the present Ammann and Wahl fiasco) that the bristlecones should not be used in reconstructions, and the NAS panel agreed with this. We also showed that Mann had experimented with removing the bristlecones from his original data set and observed the disappearance of the hockey stick, but not only failed to report this but even claimed later that his result was robust to the removal of dendroclimatological indicators (including bristlecones). Because we argued that the bristlecones should not be included, some of our critics have claimed that we are, once again, deleting “key” data. This came up during the drafting of the IPCC report when this claim was made in an early draft.

Mann had experimented with removing the bristlecones from his original data set and observed the disappearance of the hockey stick, but not only failed to report this but even claimed later that his result was robust to the removal of dendroclimatological indicators (including bristlecones).

… the failed test results having been stored in a directory, later deleted, named (ironically and unfortunately) “CENSORED” – meaning data systematically withheld in a form of sensitivity analysis

…This statistically, mathematically, and morally bankrupt “shell game” is just another tiny speck in a growing inkblot that most people could care less to see.

Does anybody else here believe (like I do) that George Orwell was a prophet of monumental proportion considering the tilted direction that the U.S. and the rest of the world is leaning? Incrementally, the scientists are becoming tools of the State, and as such, they are just as prone to the type of doublespeak we are accustomed to hearing from politicians. Once science is content to accept whatever propaganda the state wants us to swallow, how far are we really from concentration camps, storm troopers, and eventually, gas chambers? Sure, that seems a bit dramatic, but I bet such ideas seemed dramatic to most reasonable Germans and Poles in 1938. I think Orwell’s prophecy just missed it by a few decades. Maybe it should have been called 2014 or 2024.

I wonder if they will put my face in a cage with a rat when I refuse to admit that 1+1=3. Hopefully, honest people like Steve and the visitors to his website will hold off the actualization of Orwell’s vision until after I’m gone…

I’m proud to be part of history at this point. The Orwell spectre is there – but it is also galvanizing people into response, into bringing their best side forward: courage, integrity, persistence, and so forth. I’m a non-scientist who realized that a potentially extremely serious case of obscured corruption had arisen in a key area of Science – so I’ve taught myself key elements of the science (stats – not yet – but I’ve read McKitrick’s account of the sorry story). I may represent the tiny fraction of ordinary people who actually think for themselves, but “ordinary people” is such a huge number that even a “tiny fraction” of this is a lot of people.

Underneath the “lid” of “consensus”, a new and very interesting science is coming to birth. The pages of blogs like CA and Watts Up are littered with diamonds. I’ve got a vision for a new wiki that simply puts the William Connolleys to one side and puts up the real science. Obviously it needs sufficient rigor. But people like myself need to be able to access good basic info. I started reading this thread wondering what SI was, and whether RE meant “reverse engineering”; the acronyms are explained in the text, but if you miss that you’re sunk. No glossary.

I cannot set up a wiki at this point but I can put the idea out. Might write a leader about all this – especially if someone here says “yes!” Meanwhile I’ve done a basic info page (click on my name) – which also tells the story of my own U-turn.

Lucy:
I should probably take this to the forum or the open thread but I did want to say that you have an interesting website and one whose polite and low key approach I greatly appreciate. Good luck with your various initiatives.

Oops. Disregard what I said about the RE statistic and the baseline. I was misled by the formula given in the file rePH.m.txt:

%Reduction in error statistic. For the MBH98 and MM05 studies, y_hat is the
%proxy or simulated observations and y_ref is the instrumental record.
%
%function re=rePH(y_hat,y_ref);
%
%PH, WHOI 2005.

function re=rePH(y_hat,y_ref);

re=1-sum( (y_hat(:)-y_ref(:)).^2 )/sum(y_ref.^2);

What I didn’t realize was that this formula assumes the calibration period temperature data have been standardized to mean 0, so that the calibration-period mean is implicitly used as the baseline for the validation-period temperature. An RE equal to 0 is equivalent to the reconstruction obtained by predicting the average of the calibration temperature data for every year. What Ross said in #89 still goes – the statistic has no simple distribution and nothing to recommend its use over the usual measures used in mainstream statistics, while it has many drawbacks pointed out in MM and by other posters on CA.
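A minimal Python transcription of the rePH.m formula (my own sketch, not the authors’ code) makes the RE = 0 benchmark concrete: once the instrumental record is centered on the calibration-period mean, a “reconstruction” that predicts that mean (i.e. zero) in every year scores exactly RE = 0.

```python
import numpy as np

def re_stat(y_hat, y_ref):
    """Reduction of Error (RE), following the rePH.m convention above:
    y_ref (the instrumental record) is assumed already centered on the
    calibration-period mean, so the implicit no-skill baseline is
    predicting that mean (i.e. zero) in every year."""
    y_hat = np.asarray(y_hat, dtype=float)
    y_ref = np.asarray(y_ref, dtype=float)
    return 1.0 - np.sum((y_hat - y_ref) ** 2) / np.sum(y_ref ** 2)

# A constant-zero "reconstruction" scores exactly RE = 0 -- the very
# benchmark that MBH asserted was "99% significant".
rng = np.random.default_rng(0)
y_ref = rng.normal(size=100)
y_ref = y_ref - y_ref.mean()          # center, as the formula assumes
print(re_stat(np.zeros(100), y_ref))  # 0.0
```

Note that if the reference series is not centered first, the same formula silently benchmarks against zero rather than the calibration mean, which is exactly the confusion described above.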

“Perception is reality” is a fundamental concept in today’s information-saturated world. Few have the time or the energy to critically examine the stream of information coming their way from a wide variety of sources concerning a wide variety of topics.

Anyway …. My informal poll of a variety of friends and neighbors inside and outside of my usual scientific/engineering circles indicates the following general belief concerning recent temperature trends, both regional and global: The strongly upward trend predicted by the Hockey Stick in 1998 has not only been fully realized over these last ten years, it has actually been exceeded.

So I ask them, on what basis do they believe this, i.e. that the predicted upward trend has in fact been fully realized, or even exceeded?

The responses generally go like this, more or less: “We hear about this topic on the news all the time, a strongly upward trend is the gist of what the news is reporting as fact, and no one with any credibility has offered any kind of contrary evidence or contrary information that temperatures are not rising as fast as was being predicted ten years ago.”

Topic 2: 1984 as an Operative Paradigm in 2008:

David Eisenbeisz: …. Incrementally, the scientists are becoming tools of the State, and as such, they are just as prone to the type of doublespeak we are accustomed to hearing from politicians….

I am in regular contact with the “scientists” you speak of and they are in no way tools of state influence operating within a 1984 style of paradigm. It’s all about money, influence, and the self-congratulating egotism which goes with being “right” about some important social, economic, or environmental question.

The reality here is that The State has become the scientists’ own tool for promoting their power and influence in the economy and in society — the basic motivation being attainment of the significant economic and personal rewards which go with gaining that kind of power and influence over millions of other people.

The global warming industry — although mostly a state-owned business operation — is just that, a business operation functioning in a decentralized, globalized marketplace, of which these climate scientists comprise an important human resource both as departmental managers and as production workers in the industry’s day-to-day information manufacturing operations.

I believe personally that here in the US in the Year 2008, government-owned global warming businesses have become an important manpower sink for scientific and engineering talent which in previous decades would have been otherwise employed in America’s smoke-stack, industrial supply, and/or consumer goods manufacturing industries.

So … I do not believe that anything like a 1984 mind control paradigm is responsible for the general public’s perception that the world’s actual temperature trend over the last decade is now confirmed by direct measurement to be sharply upward, in general conformance with AGW alarmist predictions.

It’s all about what kind of information generates a business profit within this globalized, decentralized information/ideas marketplace.

You know, after reading Bishop Hill’s review of the perils of Ammann and Wahl, this thread becomes much more comprehensible.

I never realised before that statistics was such a flexible branch of mathematics, whereby you can invent a statistical test for significance if you don’t like the ones everyone else uses. And get it published in Climatic Change.

I have a transcript if you email me, but these lines were good (or bad).

PM KEVIN RUDD: Well, I just look at what the scientists say. There’s a group of scientists called the International Panel on Climate Change – 4000 of them. Guys in white coats who run around and don’t have a sense of humour. They just measure things.

DR TIM FLANNERY: That is the most important thing. Stop burning coal and other fossil fuels and stop putting carbon dioxide into the atmosphere because that is what is warming the atmosphere and that is what’s driving the changes.

DR JAMES LOVELOCK: At the best, wind power cannot provide more than a tiny fraction of the energy needs of civilisation. I think it’s one of those things politicians like because it can be seen that they’re doing something.

TARA BROWN, reporter: So, if nothing else, their (politicians’) hearts are in the right place?
DAVID EVANS: Yeah, sure, however their brains are in the wrong place and we didn’t elect them for their hearts, they’ve got to use their brains as well.


Since you can get an RE statistic equal to zero with a reconstruction consisting of nothing but the value 0 repeated over and over, this statement [that an RE over 0 is significant] indicates a naivety which is remarkable in people whose work relies so heavily on understanding their own statistical tools.

It highlights to me once again the oddity of climate science, which is that it is the only scientific field of study whose subject is not a physical phenomenon. Physics studies the physical world and how matter interacts physically, chemistry studies elements and molecules and their chemical interactions, biology studies life … but climate science studies not the physical phenomenon of weather, but the average of weather. Thus the subject matter of climate science is not a physical phenomenon; it is a mathematical average. I can think of no other physical science for which this is true.

Now, you’d think that people claiming scientific expertise in a field whose subject matter is not a physical phenomenon but a mathematical average would have some grounding in the mathematics of averages and standard deviations and their unbiased estimators, along with error values and their measurement and propagation, and the like.

And you’d think that people delving even further into the study of said mathematical average, people who were say involved in trying to estimate said mathematical average using proxies, would be well versed in calculating and understanding things like RE and R^2 and CE and LTP and STP and autocorrelation …
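For readers meeting these acronyms cold, the standard verification measures differ mainly in the baseline they benchmark against: RE compares the reconstruction’s errors to predicting the calibration-period mean, CE to predicting the verification-period mean, and r² is just the squared correlation. A sketch under those usual textbook definitions (the function and variable names are mine, not from any of the papers discussed):

```python
import numpy as np

def verification_stats(y_hat, y_obs, cal_mean):
    """RE, CE and r^2 over a verification period.
    RE benchmarks the squared errors against predicting the
    calibration-period mean; CE against the verification-period
    mean; r^2 is the squared Pearson correlation."""
    y_hat = np.asarray(y_hat, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    sse = np.sum((y_obs - y_hat) ** 2)
    re = 1.0 - sse / np.sum((y_obs - cal_mean) ** 2)
    ce = 1.0 - sse / np.sum((y_obs - y_obs.mean()) ** 2)
    r = np.corrcoef(y_hat, y_obs)[0, 1]
    return re, ce, r ** 2
```

Because the verification-period mean minimizes the baseline sum of squares, RE is always at least as large as CE for the same reconstruction, which is one reason a series can “pass” RE while failing the stricter CE and r² checks.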

Willis, you raised a fascinating point. Earth really doesn’t have a climate, does it? It has instantaneous weather, and the gradients of energy flux. It has topography, too, but that’s again only relevant to the energy flux. One can speak of LTP and dynamical memory as regards weather trajectories and climate, but those are really mathematical expressions and a kind of heuristic to make classical sense of what’s going on. Their only physical meaning is that the system at time ‘t’ is dependent on the state of the system at time ‘t-1.’ So, you’re right. Climate, both global and local, is really an artificial reconstruction of what the average of energy flux was doing in the past. It has no independent existence.

There are a couple of link problems in the references at the end. There is no link to the version 1 of Ammann & Wahl, and also the link to Ammann & Wahl version 2 has a file creation date of May 05, suggesting that it’s actually version 1.