Wednesday, June 8, 2011

The Wegman Report was a report to Congress, invited by Rep. Barton, Chair of the House Energy and Commerce Committee. The report has recently been revealed as heavily plagiarised. It was the centerpiece of hearings directed at Michael Mann's "hockey-stick" papers (MBH98, Nature 1998; MBH99).

However, this post is about the science. The thrust of the WR scientific criticism of MBH is that they used an inappropriate mean to normalize the proxy data - the mean for the calibration period, rather than the full period. This would tend to produce hockey-stick results.

The WR was based on papers by McIntyre and McKitrick, particularly MM05b (GRL). Wegman used their code, archived here. An important claim, frequently cited, is that the MBH algorithm would generate results of hockey-stick appearance even if the data consisted of red noise with no such tendency. To this end, they showed three figures based on red noise simulations:

Fig 4.1 compared the first PC generated from such a simulation with the MBH reconstruction.

Fig 4.2 showed a histogram of "hockey-stick index" (a difference of means, as a measure of HS shape) for 10,000 simulations, using the limited and the full mean. It showed a normal unimodal distribution for the full mean ("centered"), and a bimodal distribution for the partial mean ("decentered").
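As an illustration, the "hockey-stick index" can be sketched in a few lines. This is my own Python sketch based on the description here (recent-calibration-period mean minus full-period mean); the scaling by the full-period standard deviation is an assumption about the MM05 definition, not taken from this post:

```python
import numpy as np

def hs_index(series, calib_len=78):
    """Hockey-stick index: mean over the recent calibration window
    minus the full-period mean, scaled by the full-period sd
    (the sd scaling is an assumption, not stated in the post)."""
    series = np.asarray(series, dtype=float)
    recent = series[-calib_len:]
    return (recent.mean() - series.mean()) / series.std()

# A flat noise series scores near zero; one that jumps up
# at the end scores strongly positive.
rng = np.random.default_rng(0)
flat = rng.normal(size=581)
stick = flat.copy()
stick[-78:] += 3.0
print(hs_index(flat), hs_index(stick))
```

A centered, stationary series thus has an HS index near zero, and the histogram of many such indices is the unimodal distribution described above.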

Fig 4.4 came with this caption: "One of the most compelling illustrations that McIntyre and McKitrick have produced is created by feeding red noise [AR(1) with parameter = 0.2] into the MBH algorithm. The AR(1) process is a stationary process meaning that it should not exhibit any long-term trend. The MBH98 algorithm found ‘hockey stick’ trend in each of the independent replications." It showed twelve HS-like PC1's generated from the MBH algorithm.

Deep Climate did a thorough investigation of these graphs and their provenance, to complement the work he and John Mashey did on the plagiarism. Regarding these plots he found:

the HS PC's shown were anything but random samples. In fact, the 10000 simulations had been pre-sorted by HS index, and the top 100 selected. A choice was then made from this top 100.

Although Wegman had said that "We have been able to reproduce the results of McIntyre and McKitrick (2005b)", the PC in Fig 4.1 was identical to one in MM05b. Since the noise is randomly generated, this could not have happened from a proper re-run of the code. Somehow, the graph was produced from MM05 computed results.

The red noise used in the program was very different to that described in the caption of Fig 4.4.

In this post, I mainly want to concentrate on the first issue. How much of the HS shape of the PC's that they showed was due to the MBH decentering (and there is some), and how much to the artificial selection from the top 1% of sorted HS shapes? To this end, I tried running the same algorithm with the same red noise, but using correct centering.

This post arises partly from a thread at Climate Audit. A commenter, oneuniverse, undertook the task of re-running the MM05 code. You can find his comments near here. His results are here.

I should first point out that Fig 4.2 is not affected by the selection, and oneuniverse correctly points out that his simulations, which do not make the HS index selection, return essentially the same results. He also argues that these are the most informative, which may well be true, although the thing plotted, HS index, is not intuitive. It was the HS-like profiles in Figs 4.1 and 4.4 that attracted attention.

However, it is also clear that the selection process did make the plots in Figs 4.1 and 4.4 more HS-like.

My own results may differ slightly from those of oneuniverse, in that I took the view that the MM05 code was mixing multiple issues. They noted that MBH also standardised twice by dividing by sd. So they did an explicit svd calc for the MBH emulations, but a standard R prcomp for the centered emulations. I took the view that since it is the decentering that is being studied, it is better to compare the effect of changing just that. I don't believe the other differences matter much, but I think it is better practice to vary one thing at a time.

I'll focus on Fig 4.4, since the PC shown in Fig 4.1 might as well be taken from that selection. Here is the original from the Wegman Report:

And here is my corresponding emulation, using the same selection procedures. It isn't identical to the WR version, but shows the same features.

Now here is what you get if you use centered means in the same program. I did this by replacing the calibration mean by the full sample mean but leaving everything else (there's a bit more to it - see update below).

Clearly, there is also a strong appearance of HS shape. But this has nothing to do with the decentered mean. It is the result of the prior selection for HS shape that Wegman used.
You'll notice that the scaling is also somewhat different. This is due to MBH dividing twice (in effect) by the sd in normalising, at least in the MM05 version. I'm not sure that this is a good idea, but it shouldn't make much difference in PCA. The second denominator in the MBH case is larger, because it is calculated relative to the deviant mean. Incidentally, the scaling in the original MM05 code was very different again.

And here is a properly representative sample of the decentered PC's. I simply took a consecutive block (actually 9001-9100) and made the same selection from those, instead of the sorted subsample. I didn't change the numbers within the 100 - because the data is randomly generated, it should be possible to then choose a subsample arbitrarily, rather than regenerate a random selection.

Now we see that there is still some tendency to HS shape, but much less. It can go either way, as expected. In the PCA analysis, sign doesn't matter, so the sign variations don't cancel.

Finally, here is the corresponding randomly chosen centered version. There is essentially no HS tendency.

The last two plots are a fairer indication of the HS tendency than is seen in Fig 4.4 of the Wegman report. It isn't nothing, but it isn't as neat as portrayed there.

Update: I should clarify "leaving everything else". The "mannomatic" transform in MM05 is:

mannomatic = function(x) {
  N = length(x)
  i = (N - MK):N
  xstd = (x - mean(x[i])) / sd(x[i])
  sdprox = sd.detrend(xstd[i])
  mannomatic = xstd / sdprox
  mannomatic
}

I've modified the MM05 version for clarity. MK was set to 78, the number of years in the calibration region. It determined the range i, which is used both for mean and standard deviation, and the further normalisation. My "centered" version sets MK = N-1 (all years) rather than 78.
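For readers more comfortable outside R, here is a loose Python port of the transform above. The helper sd.detrend is not shown in the post; I've assumed it returns the sd of a linearly detrended series, so treat that part as a reconstruction:

```python
import numpy as np

def sd_detrend(x):
    """SD of x after removing a linear trend (assumed behaviour of
    the R helper sd.detrend, which isn't shown in the post)."""
    t = np.arange(len(x))
    resid = x - np.polyval(np.polyfit(t, x, 1), t)
    return resid.std(ddof=1)

def mannomatic(x, MK=78):
    """Port of the MM05 transform. R's (N-MK):N is MK+1 points
    (1-based), so the window here is the last MK+1 values.
    MK = len(x)-1 gives the centered ("full mean") variant."""
    x = np.asarray(x, dtype=float)
    i = slice(len(x) - MK - 1, len(x))
    xstd = (x - x[i].mean()) / x[i].std(ddof=1)
    return xstd / sd_detrend(xstd[i])
```

With MK = 78 the mean, sd, and detrended-sd normalisation all come from the 79-year calibration window only; with MK = N-1 every statistic is computed over the full series, which is the single change studied in this post.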

Update: Conclusion.

Wegman, using the code of MM05b, claimed that the technique of MBH (decentering) would yield hockey-stick shaped PC1's even from red noise input (1st fig above).

The 2nd fig above (first below) confirms this. However, the effect is partly due to MBH, and partly due to a very artificial selection in the MM05 code, where a subsample (100 from 10,000) was selected for HS shape prior to display.

The third fig above shows that this artificial selection will itself create HS shapes without decentering - no MBH effect.

The fourth fig above shows how Wegman's Fig 4.4 should have looked, without the artificial selection. Some HS effect, but not nearly as much.

The final fig above just confirms that with no selection and no decentering, the HS goes away.

Following the suggestion of oneuniverse, I have recalculated the effect of selection with a centered algorithm, done according to the original MM05b algorithm instead of my adaption of the MBH algorithm. It corresponds to the third plot above. It is of course from a different red noise instance, since the program was re-run. The prcomp() algorithm returns a very different scaling.

It appears also to have a very strong HS appearance - but judge for yourself. Again, I emphasise that this is done with the centered MM05 algorithm, as used by Wegman, and the HS derives simply from the artificial selection, not from anything unusual in MBH.

144 comments:

Very nice, and Eli notices you plotted everything on the same scale, which raises the question as to why it appears that the short term noise in the centered version is higher than in the decentered version??

Eli, yes, it relates to my comment after the second of my plots. MBH, at least as implemented by MM05, normalises by dividing by the sd for the calibration period. Then they divide again by the sd of a detrended version, also in the calibration period.

I don't know why that last step is there, if it is what they did. If the sd in that calibration period is larger than that for the full 581 years, then that will have the effect of reducing the scale (of the decentered version), so the short term noise looks less.

"In this post, I mainly want to concentrate on the first issue. How much of the HS shape of the PC's that they showed was due to the MBH selection process (and there is some), and how much to the artificial selection from the top 1% of sorted HS shapes?"

In order to do this, you should really compute the 'MBH' and 'centered' PC's in the same way as MM05b. (And then examine what the top 1% of centered, and random samples, look like). Yet your analysis above uses a different function (the modified 'mannomatic') than MM05b does. While you say the differences are small, for correctness you should use the same function, otherwise you are, as you put it, "mixing multiple issues".

It can be noted that while the presentation of Figs 4.1 and 4.4 may not be satisfactory or even that informative, it's Figs 4.2 and 4.3, and Appendix A, that objectively demonstrate the biased nature of the MBH algorithm. Figs 4.2 and 4.3 do not involve any selection by HS index. You have described the Fig. 4.2 histograms above - and while the HS index may not be intuitive, it can be confirmed by inspection that sorting by the HS index is fairly effective in separating out the hockey sticks. Fig. 4.3 demonstrates the effect of not using the short-centered 'mannomatic' on MBH's data (the hockey stick disappears). Wegman also provides a mathematical analysis in Appendix A of the biasing effect of short-centering in the general case.

re: DeepClimate's investigation

As I mentioned at CA, I also checked DeepClimate's claim about the necessity of a high (0.9) value of rho for an analysis of the bias using AR(1). DeepClimate wrote:

"It’s true that NRC did provide a demonstration of the bias effect using AR1 noise instead of ARFIMA. But it was necessary to choose a very high lag-one coefficient parameter (0.9) to show the extreme theoretical bias of “short-centered” PCA. Indeed, that high parameter was chosen by the NRC expressly because it represented noise “similar” to McIntyre’s more complex methodology."

Contrary to what DC says, however, the biased nature of the short-centered algorithm can be observed by using AR(1) noise with lower values of rho than the 0.9 specified by DeepClimate. The MBH algorithm is able to create hockey-stick shaped PC1's from AR(1) with rho = 0.5, 0.2, 0.1 and 0.0001 - these are the values I've checked. (The average of the absolute HS index decreases with decreasing rho, but the bimodal versus unimodal distribution of the HSI for the MBH vs. centered methods is still clear, and an inspection of the random samples gives visual confirmation of the bias.) I'll be able to put up some graphs for these runs this weekend.
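The short-centering bias under discussion can be sketched with a toy decentered PCA. To be clear, this is not the MM05 code: there is no mannomatic sd-rescaling, and I use rho = 0.9 (the NRC's parameter choice, mentioned in the quoted passage) so the effect is visible in a single run; lower rho weakens the effect, as discussed above:

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1(n, rho):
    """AR(1) red noise: x[t] = rho*x[t-1] + e[t]."""
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = rho * x[t - 1] + e[t]
    return x

def pc1(panel, decenter, m=78):
    """First PC of a (years x proxies) panel. Decentering subtracts
    only the mean of the last m years; centering, the full mean."""
    mu = panel[-m:].mean(axis=0) if decenter else panel.mean(axis=0)
    u, s, vt = np.linalg.svd(panel - mu, full_matrices=False)
    return u[:, 0]

def hsi(x, m=78):
    return (x[-m:].mean() - x.mean()) / x.std()

# 50 pseudoproxies of 581 years of AR(1) noise with rho = 0.9
panel = np.column_stack([ar1(581, 0.9) for _ in range(50)])
print(abs(hsi(pc1(panel, True))), abs(hsi(pc1(panel, False))))
```

Across repeated panels the decentered |HSI| of PC1 comes out systematically larger than the centered one; any single seed can of course be unlucky.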

It's true that AR(1) with low values of rho cannot simulate the proxies very well, but DeepClimate misses the point. As one of the Nature reviewers of the MM paper pointed out (posted by Ross McKitrick at CA):

"MBH seem to be too dismissive of MM’s red noise simulations. Even if red noise is not the best model for the series, they should have reservations about a procedure that gives the ‘hockey stick’ shape for all 10 simulations, when such a shape would not be expected."

Oneuniverse, I think the point about the status of Figs 4.1 and 4.4 is well made by that Nature reviewer's quote: "a procedure that gives the ‘hockey stick’ shape for all 10 simulations, when such a shape would not be expected." I presume he's referring to Fig 4.4. That's what people notice. And he's been misled.

Incidentally, I presume he saw a copy which was made by the code but didn't finally appear in MM05b, but surfaced in the WR.

I agree that the "semi-mannomatic", as I noted, involves using different ranges for sd as well as mean. I still think that varies fewer things. However, I'll do the original MM05 style as well, as you did, for completeness.

I thought the "late-centering" issue can only be a problem if the data already have some kind of hockey stick to begin with.

The way I understood it, the problem is in calculating the covariance matrix by simply multiplying the data by itself (Cov(X) = XtX). This works A-OK if all the data is centered, that is, if you subtract the mean from each time series, then multiplying any two timeseries point by point and taking the average will indeed give you the covariance (e.g. it's expected to return zero if the two timeseries don't correlate, as the positive and negative terms will tend to cancel out).

But if the data is not centered, then this might create "fake covariance". For example, if all the data points in two timeseries are positive, then multiplying them pointwise will return a positive, non-zero value, even if the two series have zero correlation.

MBH apparently center on the late period of the data, rather than the whole data. This can be a problem if the average for the late period differs from the global average, because then the data will be de-centered and fake covariance appears.
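The "fake covariance" effect described above can be checked numerically. A minimal sketch (mine, not from MM05): two independent series with nonzero means have near-zero true covariance, but the raw average of their cross-products is far from zero:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two independent series, both with mean 5: true covariance ~ 0.
x = rng.normal(loc=5.0, size=10_000)
y = rng.normal(loc=5.0, size=10_000)

uncentered = np.mean(x * y)                           # ~ 25: "fake covariance"
centered = np.mean((x - x.mean()) * (y - y.mean()))   # ~ 0
print(uncentered, centered)
```

Short-centering sits between these two extremes: each series is centered on its late-period mean, so any series whose late mean differs from its full mean retains an offset, and those offsets generate exactly this kind of spurious covariance.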

But if the late period mean differs from the whole-data mean, this precisely means that the data does exhibit a hockey-stickish shape in the first place!

So IOW the "late-centering" doesn't create hockey sticks out of nowhere; rather, it can only emphasize already-existing hockey stick trends by pushing them up in the order of PC's.

If I am correct (which I doubt) then comparisons based on PC1 alone are disingenuous. They suggest that the method generates non-existing HS out of nowhere, when they have simply promoted the existing HS to a higher PC. You would need to show the effect of the late-centering on the full, final reconstruction - or at least show all the PC's that are to be included for this reconstruction, as per your selection criterion.

toto, I think that's right - I'd be interested in oneuniverse's view. The process selects HS's for PC1, but doesn't create them. In the final reconstruction, either of MM05a (the EE paper) or W&A, correcting the centering made changes at the 1400 end, but not much at the recent end. I'm planning to look more at the other eigenvectors - the MM05 simulation code needs a bit of work to make it easy to store multiple eigenvecs.

It's speculative to decide that the reviewer is (probably) referring to Fig.4.4. The reviewer refers to "all 10 simulations", while Fig.4.4 shows 12 sticks from MM05b's 10,000 simulations.

Also, please note that the manuscript submitted to Nature may be different to the published GRL paper. It may be an idea to ask Ross what the reviewer may have been referring to.

As we've both confirmed, a random selection of results from the centered method(s) shows no hockeysticks, while a random selection from the MBH method shows a significant presence of hockeysticks.

For example, looking at the random samples of 100 from my run (viewable from the link you kindly provided), I can see that about 75% from the MBH method are strong hockey sticks, while about 1% from the centered method could be admitted as hockey sticks.

Eli: PCA attempts to maximise variance in PC1. If all the proxies are centered, then it has to get high variance by using short term noise. If they're decentered it can get high variance from the proxies that wander a long way from the calibration period mean.

Nick: I'd be interested in seeing a comparison between random and cherry-picked AR(1) simulations --- I suspect the top 1% trick was put there to emphasise the HSI in those results.

"Under the MBH98 data transformation, the distinctive contribution of the bristlecone pines [ie. the hockey-stick shape] is in the PC1, which has a spuriously high explained variance coefficient of 38% (without the transformation – 18%). Without the data transformation, the distinctive contribution of the bristlecones only appears in the PC4, which accounts for less than 8% of the total explained variance."

It's worth pointing out that the strip-bark trees (bristlecones and foxtails) which provide MBH's hockey-stick shape are poor proxies for temperature, and should not have been used in temperature reconstructions. According to MM05, 93% of the variance in MBH98's PC1 is accounted for by 15 strip-bark sites, 13 of which are from Graybill & Idso 1993, wherein it is remarked that the tree-ring widths at these sites do not correlate well with local and regional temperatures:

“It is notable that trends of the magnitude observed in the 20th C ringwidth growth are conspicuously lacking in all of the time series of instrumented climatic variables that might reasonably be considered growth-forcing in nature.”


Tamino cites Ian Jolliffe in order to support his attempted defense of short-centered PCA:

"You shouldn’t just take my word for it, but you *should* take the word of Ian Jolliffe, one of the world’s foremost experts on PCA, author of a seminal book on the subject. He takes an interesting look at the centering issue in this presentation."

Numerous commenters pointed out that Jolliffe's presentation doesn't support Tamino, and in fact Jolliffe cautions against using decentered PCA. Tamino dismissed these remarks.

Ian Jolliffe then commented at Tamino's:

"It has recently come to my notice that on the following website, http://tamino.wordpress.com/2008/03/06/pca-part-4-non-centered-hockey-sticks/ .. , my views have been misrepresented, and I would therefore like to correct any wrong impression that has been given.

An apology from the person who wrote the page would be nice.

In reacting to Wegman’s criticism of ‘decentred’ PCA, the author says that Wegman is ‘just plain wrong’ and goes on to say ‘You shouldn’t just take my word for it, but you *should* take the word of Ian Jolliffe, one of the world’s foremost experts on PCA, author of a seminal book on the subject. He takes an interesting look at the centering issue in this presentation.’ It is flattering to be recognised as a world expert, and I’d like to think that the final sentence is true, though only ‘toy’ examples were given. However there is a strong implication that I have endorsed ‘decentred PCA’. This is ‘just plain wrong’."

He goes on to say: "It therefore seems crazy that the MBH hockey stick has been given such prominence and that a group of influential climate scientists have doggedly defended a piece of dubious statistics. "

Ian Jolliffe also wrote to Steve McIntyre, to let him know that McIntyre and McKitrick had, unlike Tamino, correctly understood his presentation:

"You have accurately reflected my views there, but I guess it’s better to have it ‘from the horse’s mouth’."

So Tamino claims that one of the "world's foremost experts" on PCA supported Tamino's claim that decentered PCA is appropriate. Said expert then turns up to let everyone know that Tamino has misrepresented him, that the idea that he has endorsed decentered PCA is "plain wrong", and asks for an apology from Tamino.

This must have been embarrassing for Tamino, and is maybe the reason that he's deleted these pages from the website. (Your links are to a web archive.)

Incorrect--a whole bunch of Tamino's posts were lost, in part thanks to an overly aggressive 'skeptic'.

Regardless, nobody's perfect, not even Wegman and McIntyre. In fact, they are far from perfect.

Nick,

So your findings seem to suggest that M&M (and Wegman) did cherry-pick those replications to support their assertion/belief that the MBH algorithm produces a HS even if the data consists of red noise?

Maybe an abstract at the head of your post might help readers less familiar with the nuances of this technical post. Tks.

Oneuniverse #3, as you suggested I have added an appendix in which the selection was done from the prcomp() algorithm in MM05b (the original). The second graph I showed used the original MM05 implementation of MBH.

I suspect the Nature reviewer was seeing an earlier version of Fig 4.4. As you've noted, the code does some weird (and undocumented) things (which I took out) with the 7th and 8th plots - they look as if they have been added as an experiment. So it's quite likely that an earlier version had 10 plots.

What is probably an important issue here is that Mann et al. have addressed these issues in subsequent work, no? Even if some might argue otherwise, if MBH98 was so flawed, or the HS an artifact of their methodology (as some McIntyre followers here seem to believe), how is it that numerous subsequent independent paleo reconstructions, using different proxies and methodologies, have also revealed hockey sticks in the temperature reconstructions?

In contrast, McIntyre has been vigorously pushing and endorsing the problem-plagued Wegman report until very recently, although he may be less keen now given the recent devastating revelations. Was he all these years really unaware of the very obvious problems with his MM05 code and with Wegman's "analysis" (I would argue that Wegman did not actually make an independent analysis as claimed)? Strikes me that someone who claims to be in the know, like McIntyre, should have known that. So that leaves a few options: McIntyre knew and turned a blind eye all these years, or he knew that what he was giving Wegman would give Wegman the answer he wanted, or he did not quite know what he was doing when he wrote the code for MM05.

This all reflects much more poorly on MM05 and Wegman than it does on some flaws in what was a seminal and ground breaking paper published over 12 years ago now.

Nick, if you were to venture to make an independent reconstruction that addresses the issues you have identified, would it too have a HS? IIRC, the changes involved amount to hundredths of a degree C.

"The comment above from Oneuniverse about Ian Jolliffe is interesting too. Did Tamino ever apologise?"

Yes, he did, and it became apparent that there was some misunderstanding over terminology, as well as Mann's methodology (Jolliffe said that he actually couldn't figure out exactly what Mann had done from the paper itself).

"#5 The final fig just confirms that with no selection and no decentering, the HS goes away."

That's not "the HS goes away from Mann's paper" but rather "the HS goes away from the red-noise samples". Others verified years ago that PCA analysis of Mann's original data with no decentering showed a very similar HS as Mann's original paper. Put those two thoughts together ...

Alex #20, 1) Yes, I think that's likely. 2) See other comments (and also my fig on decentered unselected). It does promote such shapes in the PC list - these are PC1 plots. It would have a lesser effect in a reconstruction using several PC's.

Actually toto, no, it will happen any time. But it will tend to be stronger the "redder" the noise is. But it will never -- for realistic noise parameters -- produce more than 5% or so that resemble the hockey-stickness of real data. That's what we have verification procedures for.

oneuniverse, yes, Tamino did apologise. The misunderstanding was about one detail of his presentation; the math in my second link stands.

here is Jolliffe's first comment on Tamino's; and here Tamino's response. There's more in that thread.

Are you able to explain the selection procedure and what is wrong with it a bit more clearly?

I see you have said "the HS PC's shown were anything but random samples. In fact, the 10000 simulations had been pre-sorted by HS index, and the top 100 selected. A choice was then made from this top 100."

Are you saying Wegman & M&M did thousands of code runs but only selected the ones that looked most HS like and used that as evidence that the PC technique was always producing HS results when in fact it was only tending to do this?

Alex #30, basically, yes. There's more detail in the Deep Climate post that I linked. They do a run of 10,000 simulations, which is the basis of the histogram in Fig 4.2. Then the MBH-style PC's computed are sorted by "hockey stick index". This is the difference between the recent 78 year mean and the 581 year mean. It's a measure of HS shape, as plotted in Fig 4.2.

Then from the top 100 of that sorted list, they choose one (#71) for display in Fig 4.1, and 12 for display in Fig 4.4.

Naturally, this accentuates the HS shape of the sample. That's why I put the MM05 results, which should have no HS tendency, through the same selection. Indeed, after the same sampling, HS shapes emerged (2nd of my figs).
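The selection step described here is easy to reproduce in outline. This sketch (mine, not the MM05 R code) feeds in plain white noise, which has no HS tendency at all, to show how much the sort-and-pick step alone contributes:

```python
import numpy as np

rng = np.random.default_rng(4)

def hsi(x, m=78):
    """Difference between the recent 78-year mean and the full
    581-year mean, scaled by the overall sd (scaling assumed)."""
    return (x[-m:].mean() - x.mean()) / x.std()

# 10,000 stand-in "PC1s": pure white noise, no HS tendency built in
sims = [rng.normal(size=581) for _ in range(10_000)]

ranked = sorted(sims, key=hsi, reverse=True)   # sort by HS index
top100 = ranked[:100]                          # archive the top 100
displayed = top100[:12]                        # 12 for a Fig 4.4-style plot

print(np.mean([hsi(s) for s in sims]),         # ~ 0 for the population
      np.mean([hsi(s) for s in displayed]))    # strongly positive
```

Even with no algorithmic bias at all, the displayed dozen all have elevated recent-period means: that is the contribution of the selection alone, separate from any decentering effect.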

I had a look at the CA thread you linked and it seems to me that Steve McIntyre did respond re the selection procedure.

STEVE M: "I’ve commented on the lack of due diligence by academic inquiries in the past. After saying that strip bark chronologies should be avoided in reconstructions, the NAS panel illustration used reconstructions dependent on strip bark chronologies. I asked North about that in an online seminar and he had no answer – he said that my questions were always tough. In a seminar at Texas A&M, North said that his inquiry just “winged” it. The figure showing the distribution of the HSI index is from the full sample. Mann’s claim was that his reconstruction was “99% significant” – whatever that means. Our article did not say that all simulations generated a HS pattern. The point was that MBH operations applied to red noise could generate high HSI-index results.

"In a real data set, Mann’s data mining algorithm picked out Graybill bristlecone chronologies in the North American network and moved them into a much higher PC – which we reported in our articles. The Graybill chronologies were known beforehand as problematic as we observed at the time and the NAS panel recommended that they be ‘avoided’." END STEVE M.

I get the feeling there is a straw man against the M&M position, i.e. that they never asserted in the first place that the Mann algorithm always produced HS shapes? Thus sorting by HS index might be valid to show that it tended to do so?

Steve also seems to acknowledge somewhat that the Wegman Report is shown to be diminished in its significance after revelations of lack of due diligence.

All in all I can't find real disagreement between what you've found here & what M&M found in 2005?

Alex #31, no, I was talking about the Wegman Report in that thread, as here. And Wegman, in the caption to Fig 4.4, said "The MBH98 algorithm found ‘hockey stick’ trend in each of the independent replications." And in Fig 4.1, "However, the top panel clearly exhibits the hockey stick behavior induced by the MBH98 methodology." No mention of the artificial selection procedure, which would have produced similar results for Wegman's own methodology as well.

Steve in his reply, changed the subject to what was said in MM05b, and didn't respond on what Wegman said.

It's true that MM05b did not include a plot like Fig 4.4, though the code did create one. Judging from the Nature reviewer's quote, a similar figure was in that earlier submission, leaving the reviewer with the impression that 10 out of 10 figs showed a HS shape. Now one can't be sure that the selection was used there, but as shown here, you don't generally get that unanimity without it.

What MM05b did say was "The simulations nearly always yielded PC1s with a hockey stick shape, some of which bore a quite remarkable similarity to the actual MBH98 temperature reconstruction – as shown by the example in Figure 1." And they also didn't mention the selection procedure that had been used to get the example in Fig 1 (identical to Wegman's Fig 4.1).

what I meant was that you could separately show the effects of centering and of variance normalization. In my understanding Fig 4.2 doesn't separate these, and I think it is pertinent. Here's your chance :-)

(BTW I really prefer histograms as they show properties of the population, not just examples of members from which it is hard to generalize visually)

Why has SteveM not called out the pick 100 as wrong, at least in Wegman's work? He is just incredibly evasive and hard to pin down.

A real mathematician, a real scientist wants to understand things, to fix errors, even in his own work, and to make sure that what is understood is 100% correct.

SteveM seems much more like a lawyer or a junior high school debater. Unwilling to cede even small technical points, and always trying to change the subject to the larger debate even when the question clearly under consideration is a micro-point.

That to me is both dishonest...and cowardly.

A lot of this stuff, he has been resisting since 2006.

I think it is very obvious that he has the brains to recognize some of his errors and of Wegman's. But he does not speak out.

I've put up the histograms for the 10,000 simulations - they're linked in Nick's head-post. 'figure2.jpg' is a replication of MM05's results (using a fresh run of simulations). 'figure2_semi_mannomatic.jpg' shows the histograms using simulations following Nick's specification above.

The MBH algorithm, unlike the centered algorithm, can be seen to tend to produce hockey-sticks in the PC1.

1. The red noise proxies that McIntyre generated were way 'overcooked' persistence-wise because he used ARFIMA rather than the AR1(.2) algorithm that Wegman thought/assumed he used. This is more akin to using AR1(.9) than the AR1(.24) and AR1(.4) algorithms that Mann himself originally used.

2. The 100 PC1's with the highest 'hockey stick index' (a term McIntyre coined) were mined from 10,000 runs and archived separately. The Wegman Report Fig. 4.4 shows a selection of 12 'hockey sticks' taken from that extreme cherry-picked 100.

So it turns out that the Wegman Report was pretty much a stitch-up, and Wegman took McI's stuff at face value. In other words: he did absolutely zero due diligence on it before presenting to Congress.

I know this is a contentious issue that has been dragging on in the blogs for around 5 years now, but I can't understand why everyone is ignoring the elephant in the room when it's plain to see from Deep Climate's analysis what happened here.

It would appear to me that this thread about Wegman's due diligence in his congressional testimony and report has muddied the waters about the effects of the MBH de-centering of the PCA with regards to producing hockey sticks. I am attempting with some difficulty to keep clear what the evidence shows. There are a number of comments that have a personal flavor to them that really only take away space from eliciting what the evidence on de-centering effects are and how it is interpreted. There are conclusions and then updates to the analysis and then vague language about what the updates mean.

Oneuniverse has been presenting the counter evidence and I am attempting to put this all together for my own edification. From your posted comment below, oneuniverse, I am attempting to link to your figures, but have not been able to find the links. Could you be more specific in your directions? Your conclusion would appear to be in near complete opposition to what I see from a number of posters here - although in my mind they have not been as clear as you have been.

"I've put up the histograms for the 10,000 simulations - they're linked in Nick's head-post. 'figure2.jpg' is a replication of MM05's results (using a fresh run of simulations). 'figure2_semi_mannomatic.jpg' shows the histograms using simulations following Nick's specification above.

The MBH algorithm, unlike the centered algorithm, can be seen to tend to produce hockey-sticks in the PC1."

TCO #35, I looked through Steve's posts at the time and oddly he doesn't seem to have ever commented directly in detail on the stat content of the Wegman report. He just quoted it - even in the Nat Post op-ed at the time.

I did look through his very interesting talk at Ohio State U and found this eerie observation: "We also observed that they had modified the principal components calculation so that it intentionally or unintentionally mined for hockey stick shaped series. It was so powerful in this respect that I could even produce a HS from random red noise.

This last observation has received much publicity. However, we did not and do not argue that this is the only way that a HS series can be obtained from red noise: there is the old fashioned method - manually select series with a hockey stick shape and then average."

There's background on the red noise hockey sticks in Ross McKitrick's "What is the Hockey Stick Debate About?" PDF (April 4, 2005). They make an appearance in his figure 7.

He says: "In 10,000 repetitions on groups of red noise, we found that a conventional PC algorithm almost never yielded a hockey stick shaped PC1, but the Mann algorithm yielded a pronounced hockey stick-shaped PC1 over 99% of the time. The reason is that in some of the red noise series there is a 'pseudo-trend' at the end, where a random shock causes the data to drift upwards, and before it can decay back to the mean the series comes to an end. The Mann algorithm efficiently looks for those kinds of series and flags them for maximum weighting. It concludes that a hockey stick is the dominant pattern even in pure noise.

In Figure 7, seven of the panels show the PC1 from feeding red noise series into Mann’s program. One of the panels is the MBH98 hockey stick graph (pre-1980 proxy portion). See if you can tell which is which."

*****

Now, to a mathematical idiot like me, it sure seems that he is saying that those hockey sticks were pulled out at random (since a "pronounced hockey stick" appears "over 99% of the time"). That's a rather different picture from hockey sticks pulled from the top 1% most-hockey-stick-like PCs.

I wouldn't be surprised if they were used in other presentations similar to this.

The main finding (imo) of testing with artificial data with no underlying trend is that the MBH algorithm is shown to be a biased algorithm by strongly tending to create a hockey stick shaped "PC1" (I put "PC1" in quotation marks because the non-standard decentered MBH analysis is not actually PCA). To echo one of the reviewers of MM's submission to Nature, one should have reservations about a procedure that tends to select hockey-stick shapes where none are expected.
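For anyone who wants to see the bias for themselves, here is a toy sketch of the two centering conventions (my own Python illustration under stated assumptions, not McIntyre's archived code or the full MBH procedure; it does short-centering only, with no per-series normalization, on AR1(0.2) noise):

```python
import numpy as np

rng = np.random.default_rng(1)
n_years, n_proxies, cal = 581, 70, 79   # 1400-1980; calibration 1902-1980

def ar1_network(rho, rng):
    """Simulate a proxy network: n_proxies independent AR(1) series."""
    x = np.zeros((n_years, n_proxies))
    for t in range(1, n_years):
        x[t] = rho * x[t - 1] + rng.standard_normal(n_proxies)
    return x

def pc1(data, short_centered):
    """PC1 time series via SVD, after subtracting either the
    calibration-period mean (MBH-style) or the full-period mean."""
    mean = data[-cal:].mean(axis=0) if short_centered else data.mean(axis=0)
    u, s, vt = np.linalg.svd(data - mean, full_matrices=False)
    return u[:, 0] * s[0]

def hs_index(series):
    """Blade-period mean minus full-period mean, in standard deviations."""
    return (series[-cal:].mean() - series.mean()) / series.std()

hsi_cen, hsi_dec = [], []
for _ in range(100):                     # small run; MM05 used 10,000
    data = ar1_network(0.2, rng)
    hsi_cen.append(abs(hs_index(pc1(data, short_centered=False))))
    hsi_dec.append(abs(hs_index(pc1(data, short_centered=True))))

print(f"mean |HSI|, full centering:  {np.mean(hsi_cen):.2f}")
print(f"mean |HSI|, short-centering: {np.mean(hsi_dec):.2f}")
```

The short-centered runs give a markedly larger mean absolute HS index than the fully centered ones, with no selection step involved.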

re: Huybers' "full normalisation"

Huybers raises the possibility of a third way of standardizing:

"Thus, a third normalization is proposed where records are adjusted to zero-mean and unit variance over their full 1400 to 1980 duration, a standard practice in PCA [Preisendorfer, 1988 p22; Rencher, 2002 p393] here referred to as “full normalization”."

Yet as Huybers also mentions, "NOAMER records are standardized chronologies".

MM, in their reply to Huybers, note that "full normalization" is not considered to be standard practice by the statistical authorities cited by Huybers, and explicitly recommended against by one of them, for data such as the NOAMER records:

"Huybers’ [2005] two statistical authorities either do not recommend standardizing variance for PC analysis on series with common units [Preisendorfer, 1988] or recommend the opposite (i.e., a covariance PC calculation) [Rencher, 1995, p. 430; see also Overland and Preisendorfer, 1982; Rencher, 1992]. Only Rencher [1995] even mentions the possibility of standardizing variance of networks in common units in exceptional circumstances that do not apply here."

Kenneth #38, Again, apologies for the spam filter - I know you've been caught twice recently. I did think of moving from Blogspot, but I see wordpress sites are having trouble too.

The links to the oneuniverse posts are: Comments near here. His results are here.

There's not much difference in our results. Wegman's fig 4.2 does not use selected results, and seems robust - my plots and his are much the same. I don't think the other plots are very different either. My argument goes:

1. Wegman's fig 4.4 is the combination of two factors - the MBH tendency to make PC1 HS-shaped, and his artificial selection of HS shapes.

2. So I looked at the effect of these factors individually. In the 2x2 set of plots, the top RH plot shows selection alone, the bottom LH one shows MBH alone. In both cases, there is a HS effect. Personally, I think the selection has more effect, but you can judge that yourself. Oneuniverse did these cases too, with, I think, similar results.

3. The bottom left is how Fig 4.4 should have looked.

In particular, look at p.58, where (eminent, more than Wegman) statistician Grace Wahba is mentioned by Gerald North: "Grace Wahba, who some of you may know at Wisconsin, she sent me an email and she says: Hey they used my name and they said I was a referee. He sent it to me about 3 days beforehand and I sent him a bunch of criticisms which they didn't take into account."

I don't know what she wrote, but if you look at the next entry on p.58, you'll find Noel Cressie, who kindly forwarded his comments to me; they were posted for me by DC here. As described, the timing made them irrelevant to the actual WR, but Wegman certainly had them before the hearings, though you would never know that. Read the whole thing, but the part that bears on this is:

"3. In Figure 4.4, MM showed the hockey stick to bend upwards in all their (well chosen) realizations. In fairness, you should show some realizations where it bends downwards too."

A. McIntyre HAD to see that AR1 (0.2) label and know it was screwed up. He is a hawk for those kinds of things... and he knows his code and Mann's code better than anyone... plus he knows there are different options and branches in his code... and even helped Wegman run his stuff (thus could assess how well Wegman understood the topic). But he was silent.

B. A dummy like me noticed it, and I'm just a blog reader. And I pointed it out in 2006(ish). But McI still did not comment on Wegman's mistake (a mistake related to use of his confusing code).

The gist of Noel Cressie's email is found in the unequivocal statement he opens with:

'I concur with the technical contents of your report'.

Don't you think?

On your point, it seems he considered it less important than points 1 & 2, which is why he makes it point 3. The wording suggests he didn't think it was a big deal. Maybe its placement after the qualification 'in fairness' led Wegman, who may have been busy, not to notice the recommendation?

This seems to reinforce the view that all of this is a bit of a quibble, and that the bottom line should remain that the MM05 paper was right.

oneuniverse, the point I am trying to make (which became clear in my head only now) is that, while M&M05 lifted the autocovariance behaviour of their synthetic proxies verbatim, warts and all, from the corresponding real proxies, they did not do so with the variances but set those to unity. This is why they never explicitly normalized their synthetic proxies, or studied the effect of doing so -- there was no need to do so, obviously. Or, you could say they did it implicitly. Your computation confirmed this.

This means that it is still not a clean comparison of centering conventions only. As far as I can see, this little detail is nowhere pointed out in the paper...

Nick, I'd be interested in comparing the mean of a (large) ensemble of red noise with decentered against centered PCs using as close an emulation of MBH as you can do. (I suppose this is McIntyre's emulation of MBH, right?)

I realize pink noise isn't a good emulation of the actual proxies (some of which have a lot more warts than just being red-noise-ish), but it would be a fairer test of whether decentering leads to any distortion beyond the selection process of the MBH algorithm itself, than just "eyeballing" it.

Carrick #51, I'm not sure what you have in mind here. Is it to complete the reconstruction (mean) MBH style for simulations?

I am quite interested in doing that. I have an unfulfilled intent from last year to do more with the Wahl and Ammann code, and I have the complete thing running now, so I want to look at what the various PC's do, and how they respond to centering.

"Contrary to what DC says, however, the biased nature of the short-centered algorithm can be observed by using AR(1) noise with lower values of rho than the 0.9 specified by DeepClimate. The MBH algorithm is able to create hockey-stick shaped PC1's from AR(1) with rho=0.5, 0.2, 0.1 and 0.0001 - these are the values I've checked..."

I call *bullshit* on that claim. Like TCO says above, even people with minimal stats fu like me can see through that. Try again? This is pathetic.

If anyone can explain to me, in terms that a scientifically literate layman will understand, where I've gone wrong with my assumption that the 'persistent, trendless red noise' ARFIMA proxies generated by McIntyre have no basis in nature, I will put up and shut up.

You shouldn't be happy with applying a procedure that tends to find a hockey-stick shaped PC1 in random trendless data, to real world data. The standard centered method, recommended by the stats literature, performs as one would expect for the red-noise data, and doesn't tend to select hockey-sticks for PC1.

I've put up plots of runs for AR1 rho=0.1 and AR1 rho=0.0001 (at the same Google Sites page linked to by Nick). These are for 2000 rather than 10,000 PC1's, to save time.

The random samples from the results of the centered method (MM05 spec) are essentially flat (but 'noisy').

The random samples from the results of the MBH method contain many hockey sticks (over 50%). The hockey sticks are more pronounced for rho=0.1, but also present for rho=0.0001.

The samples from the 'top 100' (sorted by HS index) are also interesting. The MBH method has generated strong hockey sticks for both rho=0.1 and rho=0.0001, whilst those from the centered method are almost completely flat, showing that the 'selection' by HSI scrutinised by Nick has almost no effect for these low-rho AR1 runs, while the MBH decentered step is still selecting hockey-stick shaped PC1s.

I also put up the histograms for the two runs, which display the characteristic biased behaviour of the MBH method in each, although the average of the absolute HS indices decreases with decreasing rho, as mentioned at CA.

Alex #49: "This seems to reinforce the view that all of this is a bit of a quibble, and that the bottom line should remain that the MM05 paper was right."

Surely not?

The bottom line might be that MM05 was right on this specific point: The effect of short-centering in preferentially producing HS shapes from red noise data (precisely how red still seems to be a subject of debate, but I'm sure we'll see that sorted shortly).

However I have yet to see a convincing argument that MM05 was correct in choosing the number of PCs without reference to the holdout data. That seems to be pretty crucial.

CCE, That Texas post is so full of snark and bile that I can't even make out what the allegation is.

"In my first post, the other day on this, I observed that Ammann’s simulations, like ours, threw up a LOT of high RE values – EXACTLY as we had found. There are nuances of differences in our simulations, but he got a 99th RE percentile of 0.52, while we got 0.54 in MM2005c. Rather than disproving our results, at first blush, Ammann’s results confirmed them."

It's this alternation between "They're all shifting targets" and "We got it first" that just makes no sense.

McIntyre is a sophist. I've said a million times that I agree that short-centering has an effect. The problem is that McI conflated other issues with that one to exaggerate the extent. This is both dishonest in terms of overmaking the case... and poor insight in terms of confusing issues.

The man is not someone who REALLY wants to drill to understanding. He only wants to penetrate to insight when it helps him, hurts his opponent. But not purely just to understand. This is the behavior of a sophist (a junior high school debater).

Note that the fellow STILL has not updated his screwup with the MMH paper post. And that he locked the thread when it was apparent that he had made a mistake.

The man loves Clinton. Clinton was an equivocator. McI is scum, people. Ed Zorita is a prince of a man.

Firstly, an apology from me is in order. Though we are obviously on different 'sides' of this issue, you have been nothing if not civil. It was late here in Ireland the other night when I last posted, and I was tired. You know how that goes. So now that I've calmed down a bit, we can discuss further...

In your response to my little rant, you said (#55):

"You shouldn't be happy with applying a procedure that tends to find a hockey-stick shaped PC1 in random trendless data, to real world data."

In fact, I think this single sentence sums up almost the entire disagreement between McIntyre and the 'Team'. The issues here are two-fold:

1. McIntyre's so-called 'trendless red noise' simulations were generated based on 581 years of data (ca. 1400 - 1980) from 70 North American sites. If we are talking about the same thing, then it is the data contained in McIntyre's MM05 on-line archive (at: ftp://ftp.agu.org/apend/gl/2004GL021750/) called '2004GL021750-NOAMER.1400.txt'. And presumably, this is the data you generated your own 2000 simulations from, albeit using an AR(1) algorithm with various values of 'rho' (the NRC calls it 'phi') ranging from 0.5, 0.2, 0.1, all the way down to .0001. But wait... both you and McIntyre refer to these random simulations as "trendless". Well, therein lies a huge problem: the simulations are based on "real world data", which has *alleged climate signal in it*! Therefore, "trendless" is an oxymoron.

But OK, even assuming that, it is still a valid/good idea to add some random noise to the data to see if your principal component algorithm can still show some skill in extracting the real signal. The idea is to simulate the effect of non-climatic influences such as disease. And this is the first thing I have a real problem with: McIntyre used ARFIMA to generate his simulations, and this algorithm *way* over-cooks the persistence of the red noise. Deep Climate reckons that ARFIMA, which is approximately equivalent to AR(1) with an autocorrelation coefficient (rho or phi, what have you) of 0.9, introduces effects which persist for almost 20 years. In no way does this emulate what happens in nature, which is why I said in my previous post that it has "no basis in nature". This concept is discussed at greater length in this guest post from 2006 by David Ritson at RC:

"Von Storch et al added proxy-specific noise that was highly correlated from year to year. It was characterized as AR1, or a simple Markoff process, noise with 70% of a previous years history carried over from a previous year to the next year (this a corresponds to a one year autocorrelation coefficient of 0.7). A factor of 0.7 corresponds to a decorrelation time of (1+.7)/(1-.7) or 6.3 years and reduces the effective number of independent data-points by the same factor i.e. 6.3. If the noise component of real proxy data were really so strongly red, not only the precision of results of Mann et al. (the target of the von Storch et al's analysis) but indeed of all previous millennial paleo-reconstructions would be substantially degraded. In the past it has been generally accepted that the added noise should be only slightly red, if not white (uncorrelated). Von Storch et al. provided no rationale for why they assumed such large year to year correlations."
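The decorrelation-time formula in the quote, tau = (1 + rho)/(1 - rho) for AR(1) noise, is also where the "persist for almost 20 years" figure for ARFIMA-like rho = 0.9 comes from (a quick arithmetic check in Python; the choice of rho values is mine):

```python
def decorrelation_time(rho):
    """AR(1) decorrelation time: tau = (1 + rho) / (1 - rho)."""
    return (1 + rho) / (1 - rho)

for rho in (0.9, 0.4, 0.2):
    print(f"rho = {rho}: tau = {decorrelation_time(rho):.1f} years")
```

rho = 0.9 gives tau = 19 years, against 1.5 years for the AR1(0.2) of Wegman's Fig 4.4 caption; the effective number of independent data points shrinks by the same factor.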

The 'Team' reckons that an autocorrelation coefficient of between 0.15 and 0.3 effectively simulates real world conditions. McIntyre's ARFIMA simulations? Not so much. In fact, even further from reality than von Storch's! So that's the first big problem with what McIntyre did, and which Wegman just copied without even understanding what McIntyre had done (which is exactly why you can't let someone like McIntyre or Wegman blindly throw statistics at a process in nature that they don't understand, and why the plagiarism aspect of the Wegman Report also comes into play. But that we will have to leave for another day). And now on to the second problem:

2. The two sides of this issue have been going round and round on this since about 2003, thus I don't expect I'm going to be able to add any clarity or closure here. So I'll keep it brief: if you use a centered algorithm like McIntyre does to pick out principal components, you have to be prepared to include other principal components besides the first one (PC1). Denying (sorry) this basic tenet is akin to saying that PCA does not work!

In summary: the reason why your AR(1) red noise simulations picked out hockey stick-shaped PC1's even with an autocorrelation coefficient of .0001 with the MBH de-centered algorithm is because, as Ritson puts it in a letter to Wegman (source: http://www.meteo.psu.edu/~mann/house06/RitsonWegmanRequests.pdf):

"Surely you realized that the proxies combine the signal components on which is superimposed the noise? I find it hard to believe that you would take data with obvious trends, would then directly evaluate ACFs without removing the trends, and then finally assume you had obtained results for the proxy specific noise! You will notice that the M&M inputs purport to show strong persistence out to lag-times of 350 years or beyond. Your report makes no mention of this quite improper M&M procedure used to obtain their ACFs. Neither do you provide any specification data for your own results that you contend confirm the M&M results. Relative to your Figure 4.4 you state 'One of the most compelling illustrations that M&M have produced is created by feeding red noise (AR(1) with parameter = .2 into the MBH algorithm'.

In fact they used and needed the extraordinarily high persistances contained in the attatched figure to obtain their 'compelling' results. Obviously the information requested below is essential for replication and evaluation of your committee's results. I trust you will provide it in timely fashion."


You wrote: "The 'Team' reckons that an autocorrelation coefficient of between 0.15 to 0.3 effectively simulates real world conditions. "

Yet the NOAMER network has autocorrelation coefficients between 0.03 and 0.79, with a mean of 0.415. (The highest correlation coefficients belong to the bristlecone pines.)

This isn't relevant to the AR1 analysis whose results I just described. You've misunderstood what I did - the AR1 runs I generated weren't based in any way on the NOAMER network, or any 'real-world' data - each series was artificially (randomly) generated data with the characteristics of ARIMA(1,0,0) with the same fixed rho specified at the beginning of the analysis (depending on the run, 0.1, or 0.0001 etc). For such data, PCA simply shouldn't tend to find hockey-sticks for the PC1, but that's what the MBH decentered method does. The centered procedure recommended by the stats literature doesn't do this. The MBH decentered "PCA" is biased.
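For concreteness, "artificially (randomly) generated data with the characteristics of ARIMA(1,0,0)" means nothing more than this (my own Python sketch, not oneuniverse's actual code):

```python
import numpy as np

def ar1_series(rho, n, rng):
    """ARIMA(1,0,0), i.e. AR(1): x[t] = rho * x[t-1] + white noise."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = rho * x[t - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(42)
x = ar1_series(0.5, 100_000, rng)

# The sample lag-1 autocorrelation should recover the specified rho
lag1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(lag1, 2))
```

There is no trend term anywhere in the recursion; any "blade" in such a series is a transient excursion, which is exactly why a procedure that reliably finds blades in it is suspect.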

As mentioned earlier MM05 noted that using the MBH method, the hockey stick is in PC1, for which analysis it explains 38% of the variance, while using the centered method, the hockey stick is in PC4, where it explains only 8% of the variance.

[To Eli also: ] According to Michael Mann's 22nd Nov. 2004 post at RealClimate, "PCA Details", the first 2 PCs were retained in MBH98 ("1902-1980 zero reference period, data normalized by detrended 1902-1980 standard deviation"), and the first 5 PCs should be retained for the MM analysis ("1400-1971 zero reference period, data un-normalized") using the same rules. It's confusing though, as MBH98 does include a discussion of the first 5 PCs ("It is interesting to consider the temporal variations in the first 5 reconstructed PCs (Figure 5a).").

However, I'd rather not get into the potentially time-consuming question of exactly what rules were used by MBH98, whether these corresponded to the RealClimate post or their Corrigenda, and indeed what the suitable rules might be. Suffice it to say that for the purpose of temperature reconstruction, the bristlecone and foxtail strip-bark proxies shouldn't have been used. And without them, there is no hockey-stick signal in the MBH98 temperature reconstruction. Graybill and Idso 1993 had already commented that the magnitude of trends in ring-width growth at the strip-bark bristlecone sites they considered (13 of which are used in MBH, and include the strongest hockey-stick signal, from Sheep Mountain) is "conspicuously lacking in all of the time series of instrumented climatic variables that might reasonably be considered growth-forcing in nature." (Indeed, their paper was an investigation of what had caused the growth, so it was careless or reckless of MBH to use these sites.) McIntyre and McKitrick, in their reply to von Storch, confirmed Graybill and Idso's finding by demonstrating the bristlecones' near-zero mean correlation to CRU gridcell temperatures.

Additionally, the NAS 2006 publication, "Surface Temperature Reconstructions for the Last 2,000 Years" stated that "“strip-bark” samples should be avoided for temperature reconstructions" (advice that Mann chose to ignore in later publications). While that was a statement made about strip-barks in general, we can certainly say that the strip-bark proxies used by MBH are not useful proxies for temperature for the instrumental period (which covers the period of the "blade" of the hockeystick).

Eli, Indeed so. I've been looking at the other PC's in those red noise simulations. PC1 is not so dominant. The eigenvalues (from SVD MBH) of one instance went: 73.02, 46.92, 44.80, 39.81, 39.55, 37.79...

Actually, I think MBH98 used a variable number depending on period. For the 1400-1450 stretch they used only one, and for 1450-1500 there were 2, as I recall. They considered overall up to 16 - they just used as many as they could. W&A used 4 during those periods, but said it made little difference.

1U #70, Sorry, I don't know what happened there (it was about 1.30 am local time). There doesn't seem to have been a spam filter problem. I think I can see everything - is anything missing now in what you can see?

Alex #68, The statement about getting a HS shape in 99% of cases has achieved wide circulation, but is obviously unsatisfactory, since no quantitative measure of HS shape is specified. It's probably based on the histograms (WR Fig 4.2), which certainly do show a HS bias for PC1. But of course, as Eli said, PC1 is only part of a recon.

I had a weird experience at WUWT a while ago. I came under attack from Steve for "fabricating a fake history" etc; he claimed that he had anticipated all the Wahl and Ammann results back in the 2005 EE paper. Specifically, their reconstruction of MBH using correct centering. This was contrary to the widely held perception (shared by Wegman) that this hadn't been done, and the McI/Wegman story was that W&A were not believable because A had been Mann's student.

So I looked through MM2005EE again. Fig 1 does show a recon with centering. It's not a straightforward recomp like W&A did - Gaspe cedars have been removed etc. But indeed you can extract a case where a recon with centered and decentered methods are compared. And in terms of HS, it makes very little difference. Who knew? The difficult part is 1400-1500, where in MBH98 there is an acknowledged shortage of data (that's why they stopped at 1400), and the calc is less robust. But, surprisingly, Steve is now saying that they anticipated WA's result.

Ross McKitrick's account to APEC also presents a very different picture of the 2005 papers.

(not being able to use blockquotes and/or italics on this site as a way of differentiating who said what is getting confusing, thus you will be 'OU' and I will be 'SM')

OU: You wrote: "The 'Team' reckons that an autocorrelation coefficient of between 0.15 to 0.3 effectively simulates real world conditions."

Yet the NOAMER network has autocorrelation coefficients between 0.03 and 0.79, with a mean of 0.415. (The highest correlation coefficients belong to the bristlecone pines.)

SM: I wasn't referring to the autocorrelation coefficient of the *data*, but rather of the *red noise*. This will become clearer below...

OU: This isn't relevant to the AR1 analysis whose results I just described. You've misunderstood what I did - the AR1 runs I generated weren't based in any way on the NOAMER network, or any 'real-world' data - each series was artificially (randomly) generated data with the characteristics of ARIMA(1,0,0) with the same fixed rho specified at the beginning of the analysis (depending on the run, 0.1, or 0.0001 etc). For such data, PCA simply shouldn't tend to find hockey-sticks for the PC1, but that's what the MBH decentered method does. The centered procedure recommended by the stats literature doesn't do this. The MBH decentered "PCA" is biased.

SM: Oh. So you didn't use a representative sample of the actual data to generate your random proxies from? That's what the climate scientists did (and what McIntyre did with the NOAMER data, but he used an unrealistically large autocorrelation coefficient with a decorrelation period of almost 20 years). The idea is to inject a realistic amount of random noise into the actual climate data, to simulate non-climatic influences on the trees like disease or insect infestation. That's why the noise has to have a realistic decorrelation period of just a year or two. You know, to simulate what happens to trees in the real world? Then you run a ton of random simulations and see if your PCA algorithm can still pick out the climate signal from amongst the noise, and that the signal it picks out can be verified against the period where the signal you are trying to isolate (in this case, temperature) *is known*. This is called the 'calibration period', and was 1902 - 1980 for MBH98.

If you didn't generate your random proxies based on real world data, then you're not trying to replicate what MBH98 did. In fact, you're no longer doing science. Rather, it's what tamino likes to call 'mathturbation'. Just playing with numbers that have no physical basis behind them. And I'm guessing your AR(1) runs must be tending to trend either up or down towards the end. There's no way in hell any PCA algorithm, even MBH's, could pick hockey stick-shaped PC's out of data with no trend in it!

And round and round we go. These blog threads never end conclusively. That's why we need the peer-reviewed literature.

Obviously, I disagree with your assessment of the validity and usefulness of my analyses.

To hopefully help you understand what I've done, please note I've posted the results of four _different_ analyses (each with its own "fig.2" histograms).

The first is a replication of the MM05 ARFIMA analysis, using a fresh set of synthetic series (this corresponds to the analysis in the "Update - Appendix" section of Nick's post above). The second, following Nick's suggestion, was a tweak of the first analysis, and is the same as Nick's main analysis above. The third was to test the MBH and centered algorithms on AR1 noise with rho=0.1, and the fourth was to test the MBH and centered algorithms on AR1 noise with rho=0.0001. (I also tested with values of 0.2 and 0.5 but have yet to post the results.)

The analyses using AR1 noise showed that the MBH algorithm is still tending to generate hockey-stick shaped PC1s even with low values of rho (DeepClimate's lag-one coefficient parameter).

This disproves DeepClimate's statement that "it was necessary to choose a very high lag-one coefficient parameter (0.9) to show the extreme theoretical bias of “short-centered” PCA."

But the treatment in GRL did not adequately explain the issue of centered versus noncentered being at the same time conflated with covariance versus correlation. Huybers blew that up good. SM pointing to EE is begging the point. He was playing a thimble game.

Interestingly, his buddy Ross actually advocates the covariance as being "more correct".

I tried having an in-depth conversation with them on that a while ago, pointing out examples from blood chemistry (it is not just "units" but scale factors: ppm cyanide and ppm chloride need normalization! You can't treat all ppm chemicals in the blood as interchangeable just because they share the same units.) Despite it being right on topic, and him being able to drive the discussion a lot better than I could, McI just resisted and blew it off.

McI is actually VERY careful to get the benefit of the conflation, even of advocating covariance without actually signing onto it. He lives in some wave-particle duality land of artful-dodger shiftiness. Really juvenile when you get down to it.

Ross is not so subtle and actually pins himself down as pro covariance.

Huybers really nailed this stuff. After I reread his comment for the second time, I realized Steve's post ranting about it was mostly squid ink. Huybers really nails it. You don't even need linear algebra to understand it. Just knowing that y needs to be fixed if you are looking to vary x and measure its impact on z.

This stuff has been going on for years, the guy has been resisting for years, and his brew crew have been ignoring it. Still remember pulling a couple scales off of yapdog Bender a long time ago...

You can see in the right-hand panel of that figure that the PC1s generated with an AR(1) autocorrelation coefficient of 0.2 in no way exhibit a hockey stick shape.

So either the NRC and Deep Climate are wrong, or there is a mistake in your code/analysis (i.e. most likely a trend in your AR(1) simulations that you are assuming to be trendless). If your base data is flat (with just red noise added), then even MBH's decentered PCA *cannot possibly find a trend that isn't there*. Ever consider the possibility that you are making a mistake somewhere?

Steve #79: You can see in the right-hand panel of that figure that the PC1s generated with an AR(1) autocorrelation coefficient of 0.2 in no way exhibit a hockey stick shape.

Call it what you want, the bias is still apparent. DeepClimate acknowledges this, writing "the biasing effect of “short-centered” PCA is much less evident when applied to AR1(.2)".

The post-500 values are quite consistently higher than the pre-500 values. Try increasing the number of PC1s from 5 to 500 - the post-500 values are again quite consistently higher than the pre-500 values, it's not a fluke - there's a bias in the MBH algorithm. Try changing phi in the code from 0.2 to 0.1, or 0.01, or 0.001, or 0.0001 - again, the post-500 values are quite consistently higher than the pre-500 values. This is not an underlying pattern in the AR(1) noise, and it shows the biased nature of the decentered MBH algorithm.
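For anyone wanting to reproduce this kind of test, here is a minimal sketch. This is my own code, not the NRC or MM05 scripts, and it uses a simplified hockey-stick index (calibration-minus-rest difference of PC1 means, in units of the PC1 standard deviation) and a 79-row "calibration period" standing in for 1902-1980. It compares PC1 from fully-centered PCA against PC1 from MBH-style short-centered PCA on the same kind of AR(1) noise; with phi = 0.9, as in the NRC's demonstration, the decentered method's preference for hockey-stick shapes is stark, while at phi = 0.2 the difference should be much weaker, as DeepClimate notes:

```python
import numpy as np

def ar1_matrix(n, m, phi, rng):
    """n-by-m matrix of independent AR(1) series with lag-one coefficient phi
    (start-up transient ignored for brevity)."""
    x = np.zeros((n, m))
    x[0] = rng.standard_normal(m)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal(m)
    return x

def pc1_hsi(X, center_rows, calib):
    """|HSI| of PC1: proxies centered on `center_rows` only, HSI measured as
    the calibration-vs-rest difference of means in units of the PC1 std."""
    Xc = X - X[center_rows].mean(axis=0)      # short- or full-centering
    u, s, _ = np.linalg.svd(Xc, full_matrices=False)
    pc = u[:, 0] * s[0]                       # leading principal component
    rest = np.ones(len(pc), bool)
    rest[calib] = False
    return abs(pc[calib].mean() - pc[rest].mean()) / pc.std()

rng = np.random.default_rng(42)
n, m, sims = 581, 70, 30                      # "years" 1400-1980, 70 proxies
calib = np.arange(n - 79, n)                  # the "1902-1980" rows
full = np.arange(n)

for phi in (0.9, 0.2):
    short = np.mean([pc1_hsi(ar1_matrix(n, m, phi, rng), calib, calib)
                     for _ in range(sims)])
    cent = np.mean([pc1_hsi(ar1_matrix(n, m, phi, rng), full, calib)
                    for _ in range(sims)])
    print(f"phi={phi}: mean |HSI| decentered={short:.2f}, centered={cent:.2f}")
```

Absolute values are compared because the sign of a principal component is algebraically arbitrary.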

It might have helped if DeepClimate had included the red line that's also generated by the NRC code. For some reason, he omits this. According to the NRC: "The first eigenvector of the covariance matrix for this simulation is the red curve in Figure 9-2, showing the precise form of the spurious trend that the principal component would introduce into the fitted model in this case." If the red line is included in the right-hand panel, one can see that it has a sharp upward kink at 500 (the start of the baseline) on the x-axis for phi=0.2, 0.1 and 0.0001.

The point of using random data with such low AR1 coefficients is to show that the MBH algorithm has a biased impact even on such data. The actual NOAMER tree-ring data have higher AR1 coefficients, with a mean of 0.4 and going up to 0.79 (not far from the 0.9 used by the NRC), and so the bias is more prominent.

1u #80: "It's not a hockey-stick, since the beginning part has higher values than the final part - the 'blade' of the stick doesn't have unprecedentedly high values."

Yes, I said "The difficult part is 1400-1500, where in MBH98 there is an acknowledged shortage of data." They are down to using just 1 PC.

That was discussed in W&A as well. But that's not a methods limitation - as time passes, more data is acquired.

From W&A: "Similar to MM03, this scenario yields much warmer NH temperatures for the 15th century than both MBH98 and WA, which are also at odds with 15th century temperatures in other empirical reconstructions (see Jones and Mann, 2004). According to our assessment, however, this result does not have climatological meaning because the reconstructions clearly fail validation tests,..."

And BTW, the distribution of the HSI for short-centering is simply (the square root of) chi-squared with 70 degrees of freedom, the number of synthetic proxies to which the PC computation is applied. This is easy to show once you see what is happening (this also to Steve Metzler #82).

Now as a point of terminology, I don't think you can call this 'bias': a chi-square distribution is a sum of squares of standard normal distributions, which are very much random. And what matters is not just this effect, but the outcome of the full verification chain, using as many PCs as appropriate, and applying the same centering convention both to the data and the synthetic proxies that provide the verification bounds (and, of course, using a realistic autocorrelation model). There are valid reasons for objecting to short-centering (like Jolliffe's) but this is not one of them IMHO.

Nick #83, re: MM2005EE, Fig.1. You had written: "But indeed you can extract a case where a recon with centered and decentered methods are compared. And in terms of HS, it makes very little difference."

Whatever the climatological significance of the temperature reconstruction, the application of centered vs decentered methods to the data does produce differences - the 15th century results are depressed with the decentered method, while with the centered they exceed 20th century levels.

I considered it more informative to see how the results vary for a range of phi - in addition to the NRC's 0.9, I chose 0.5, 0.2, 0.1 and 0.0001 (close to 0). I also ran it using white noise - again, the biased MBH method produces elevated (or depressed) results for the baseline period.

You object to the use of the term "bias". We could instead use the NRC report's language: the MBH method, unlike the centered method, can introduce a spurious trend-like appearance in the leading principal component. "Bias" seems a good shorthand, also used by DeepClimate - as in biased towards selecting a PC1 with a spurious trend from random data that doesn't contain that pattern.

Martin: There are valid reasons for objecting to short-centering (like Jolliffe's) but this is not one of them IMHO.

The use of a method that can introduce spurious trends should bother you ("spurious trends" if one interprets the results of the decentered PCA in the same way as if they were produced by normal PCA, as MBH do). The recommended method is centered PCA, which is understood, and doesn't introduce such spurious trends.

Ian Jolliffe, author of one of the seminal books on PCA, wrote at Tamino's: "There are an awful lot of red herrings, and a fair amount of bluster, out there in the discussion I’ve seen, but my main concern is that I don’t know how to interpret the results when such a strange centring [MBH's method] is used? Does anyone? What are you optimising? A peculiar mixture of means and variances? An argument I’ve seen is that the standard PCA and decentred PCA are simply different ways of describing/decomposing the data, so decentring is OK. But equally, if both are OK, why be perverse and choose the technique whose results are hard to interpret?"

Jolliffe noted in a review of one of MM's submissions that if you use decentered PCA, you are no longer successively maximizing variance as in normal PCA, and you cannot then sensibly interpret the results as explaining variance (yet this is what MBH do by treating the results of the decentred PCA in the same way as if they're from centered PCA).

In McIntyre and McKitrick's discussion of this figure, they describe a questionable and undisclosed ad-hoc manipulation of the data by MBH:

The middle panel ("Archived Gaspé") shows the effect of merely using the version of the Gaspé series archived at WDCP, rather than the version as modified by MBH98, accounting for a material change in the early 15th century. The only difference between the two series is the extrapolation of the first four years in MBH98. Under MBH98 methods, a series had to be present at the start of a calculation step in order to be included in the interval roster. In only one case in the entire MBH98 corpus was this rule broken – where the Gaspé series was extrapolated in its early portion, with the convenient result of depressing early 15th century results. This extrapolation was not disclosed in MBH98, although it is now acknowledged in the Corrigendum [Mann et al., 2004c]. In MBH98, the start date of this series was misrepresented; we discovered the unique extrapolation only by comparing data as used to archived data.

Hey, thanks for the explanation. Now I can see where the 'kink' in PC1 is coming from! It would indeed appear that oneuniverse is indulging in a bit of useless 'mathturbation'.

So now we're back to Square 1: if you consider all the PCs that are relevant, it doesn't really matter much to the overall results. This fact has been independently verified ad nauseam by other parties since MM05.

OTOH, if you only consider PC1 like McIntyre and oneuniverse are insisting you must do... ah, forget it, it's time to move on. We're off topic with this sub-discussion anyway, and it's not fair to Nick.

I don't quite agree... it is an interesting point. But in the end it is indeed irrelevant.

> you cannot then sensibly interpret the results as explaining variance

Yep oneuniverse, that was Jolliffe's objection. But all this means is that the result is theoretically sub-optimal (slightly, I would guess), by including more noise that should have been excluded, and/or excluding some signal that should have been taken along (and anyway the choice of the cut-off PC number is a judgment call, rules of thumb like Preisendorfer's notwithstanding). This doesn't mean that this suboptimal solution is unphysical, and indeed that's what Wahl and Ammann demonstrate.

I'm late back to the dance, but I agree entirely with Eli's comment here: McIntyre has been forcing the card on PC1 for eight years, but to evaluate the Mann reconstruction you have to look at all the PCs he used and the effect of noise on them.

Regarding Martin's question: (Did you detrend like they did? I think you need to.)

Agreed. The short non-technical answer is "yes, I detrended" and did so on a window-by-window basis in reconstructing the periodogram. Thank you for the reference.

"Bias" seems a good shorthand, also used by DeepClimate - as in biased towards selecting a PC1 with a spurious trend from random data that doesn't contain that pattern.

While I hugely respect DC, I think he is wrong on this. If "bias" is present in short-centred reconstruction, then it is also present in standard-centred -- only to a lesser degree. Let me elaborate.

As also your histograms show, the hockey stick indices from synthetic proxies are distributed on both sides of zero: there are as many negative as positive ones. The mean HSI is zero, for both centering choices.

Now, for these synthetic proxies the sign of the HSI coming out is just an algebraic accident without deeper meaning -- very different from real proxies where the sign is indeed physically meaningful.

Now if you choose instead to look at the absolute values only, then the mean HSI is positive... for both centering choices. Yes, it is a lot bigger for short-centering; but for standard centering it is not zero -- look at your histograms again. If you want to call that "bias" (as apparently you do, as quoted above), it applies to both cases. The difference is only in magnitude.
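Martin's point is easy to check numerically. Here is a quick sketch of my own, using white-noise proxies and a simplified HSI rather than the thread's exact setup: the signed HSI of PC1 takes both signs under both centering conventions, because the sign of a principal component is algebraically arbitrary, while the mean absolute HSI is positive in both cases:

```python
import numpy as np

def signed_hsi_pc1(X, center_rows, calib):
    """Signed hockey-stick index of PC1 under a chosen centering convention."""
    Xc = X - X[center_rows].mean(axis=0)      # short- or full-centering
    u, s, _ = np.linalg.svd(Xc, full_matrices=False)
    pc = u[:, 0] * s[0]
    rest = np.ones(len(pc), bool)
    rest[calib] = False
    return (pc[calib].mean() - pc[rest].mean()) / pc.std()

rng = np.random.default_rng(5)
n, m = 581, 70
calib = np.arange(n - 79, n)                  # "1902-1980"
full = np.arange(n)

for name, rows in (("decentered", calib), ("centered", full)):
    hsis = np.array([signed_hsi_pc1(rng.standard_normal((n, m)), rows, calib)
                     for _ in range(200)])
    print(f"{name}: mean HSI {hsis.mean():+.3f}, mean |HSI| {np.abs(hsis).mean():.3f}")
```

With real proxies the sign is physically meaningful, as Martin says; for synthetic noise only the magnitude, and how its distribution differs between the two centering choices, carries information.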

Even later to the party: Who says that AR1 is a good choice for the noise model for testing purposes? If you model the actual series with fracdiff rather than arima, many of the series show long-term persistence (d > 0.1). I never did finish what I started on this, but a quick and dirty test showed a lot of HS behavior from randomly generated data based on a set of fracdiff models with coefficients calculated from the actual series.
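For anyone wanting to experiment with DeWitt's suggestion, here is a sketch (my own code, not his fracdiff fits) of simulating fractionally integrated noise from its MA representation; d between 0 and 0.5 gives a stationary series whose autocorrelation decays as a power law rather than the exponential decay of AR(1):

```python
import numpy as np

def frac_noise(n, d, rng, burn=500):
    """Fractionally integrated noise (1-B)^(-d) e_t, 0 < d < 0.5, via its
    MA representation: psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k."""
    k = np.arange(1, n + burn)
    psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
    e = rng.standard_normal(n + burn)
    x = np.convolve(e, psi)[:n + burn]
    return x[burn:]                 # discard the start-up transient

def acf(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(9)
x = frac_noise(5000, 0.45, rng)
print(acf(x, 1), acf(x, 50))        # long memory: lag-50 correlation decays slowly
```

By contrast, an AR(1) series with phi = 0.2 has lag-50 autocorrelation of 0.2^50, about 10^-35, i.e. effectively none, which is why the choice of noise model matters so much for how much HS behaviour random pseudo-proxies can show.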

DeWitt #93, I'm not sure there has been an argument that AR1 is better than, say, fracdiff. It's arisen because Wegman said it was AR1 when it wasn't.

I think your tests would be interesting. If by HS behaviour you mean a specific rise in the calib period, that would be surprising. Or could it just be that the very persistence means long excursions from the mean are possible at any stage?

John, yes, it is odd. Currently the main page says 100, but there are 95. Blogger allows posters to delete their own comments, and I think there was at least one that was duplicated because of the spam filter. Maybe the main page count doesn't notice deletions.

When I liberate comments from the overactive spam filter, that can change the numbering of later visible messages.

Yes, a rise during the calibration period in about 30% of the random series large enough to cause selection of those series by the standard criterion in Mann et al. 2007(?). It's on The Air Vent somewhere. It started to look too much like work to finish it.

The modes are different. For centered, the most common HSI values are zero and near zero, while for decentered they're not.

Now if you choose instead to look at the absolute values only, then the mean HSI is positive... for both centering choices. Yes, it is a lot bigger for short-centering; but for standard centering it is not zero -- look at your histograms again. If you want to call that "bias" (as apparently you do, as quoted above), it applies to both cases. The difference is only in magnitude.

I'm not saying that the short-centered method is biased because the mean absolute HSI is positive. It's biased because, unlike the centered method, it's prone to introducing spurious trends in the PC1, as noted by McIntyre and McKitrick, and the NRC report.

As I noted earlier, my use of 'spurious' follows the NRC report's language in their discussion of the MBH algorithm. (See #81)

Huybers, in his reply to MM05, also confirmed that the MBH algorithm is biased:

"The reason for the bias in the MBH98 PC1 can be understood by considering that PCA maximizes the variance described by each principal component where variance is measured as the sum of the squared record, sigma^2=sum over t of (x_t)^2 , and x is not necessarily zero-mean. The MBH98 normalization tends to assign large variances to records with a pre-1902 mean far from the 1902 to 1980 mean, and records with the largest variance tend to determine PC1. This bias was checked using a Monte Carlo algorithm independent of MM05’s."
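Huybers' decomposition is worth writing out: for any record, (1/n)·Σx_t² = var(x) + (mean of x)², so the "variance" maximized by short-centered PCA rewards records whose residual mean (after subtracting only the calibration-period mean) is far from zero. A toy illustration with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(581)      # a "proxy": unit white noise, 581 "years"
x[:502] += 1.0                    # pre-"1902" mean sits 1 unit above calibration
x = x - x[-79:].mean()            # MBH-style: subtract the calibration mean only

sum_sq = np.mean(x**2)            # what decentered "PCA" treats as variance
true_var = x.var()                # the actual variance about the full mean
print(sum_sq, true_var + x.mean() ** 2)   # identical: mean square
                                          # = variance + squared mean
```

In expectation the residual mean here is about 502/581 ≈ 0.86, so the sum of squares exceeds the true variance by roughly 0.75; that surplus is exactly the mechanism that lets records with a pre-calibration mean offset dominate PC1.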

Statistician Noel Cressie didn't object to Wegman's usage of the terms "bias" and "biased" in Appendix A of the WR. (Point 4 of Cressie's letter shows that he has gone through Appendix A, in which Wegman notes "Because the training period has higher temperatures, this biases the overall data lower for the period 1400-1995, thus inflating the variance." The A.3 section, in which Wegman provides an estimate of the bias in the general case, is titled "Numerical Methods and Bias Illustration".)

Statistician Richard Smith (Smith 2010, reply to Li 2010), in recounting MM05's paper, notes and does not contest that "In many cases, the simulated series combined with the MBH algorithm resulted in a spurious hockey stick-like curve for the trend."

The only objection I can see is that the usage of the terms "biased" and "spurious" may not be specific enough (although it's fairly clear from context): the short-centered "PCA" method is biased, and is prone to introducing spurious trends, if one is interpreting the results in the same way as standard centered PCA, i.e. as successively maximising variance, which is what MBH were doing.

Any decent statistician knows there are errors that can skew results, but the question is whether a theoretical problem makes an actual difference in the results. Go back to #46. In SSWR, p.58, I wrote:

'It is exactly what one might expect a busy statistician to say. I would paraphrase the key points as: I concur with the MBH decentering issue, so compute it the "right" way.'

For instance, if I have a sample of heights from 100 people, but I compute a mean based on only 99, my answer is *wrong* ... but only if the missing person is a supergiant does the error make any difference. No reviewer is going to disentangle the mess with AR models and then redo the analysis with centering, and then do a statistical analysis of the differences, especially when the published error band makes this all silly.

So, do people accept my interpretation of Cressie, as written in SSWR p.58, 60, 71? Or not?

Thanks, oneuniverse for your clear explanations and references to past comments. The muddy waters get stirred with a misinterpretation here and some personality issues thrown in there and you continue on the path of a logical analysis.

In the big picture I would assume that it is obviously apparent that errors, omissions and lack of due diligence can be pointed to on all sides of the climate science issues. If we can get past all that baggage and attempt to analyze the evidence that has been presented and ask intelligent questions, we can all decide for ourselves the apparent worth of these arguments. I see several issues that always seem to be overlooked in discussions of the temperature reconstructions and in particular those of MBH and progeny.

There are several areas of contention in the methods used in these reconstructions and it becomes difficult to keep the overall criticisms and defenses in order and perspective when only a part of it is being discussed. Defenses sometimes appear to assume that a point of contention from the critics is the only point and therefore can be written off as a nit pick or least not a game stopper when taken alone.

The reconstructions (and those doing the reconstructions) that stick the instrumental record on the current end of the series can truly confuse the issue and particularly so when the reconstructed part does not follow the instrumental record or ends relatively early. What part of the blade of the hockey sticks or semblances of hockey sticks is attributed to the instrumental record?

Do we sufficiently consider the evolution of the MBH reconstructions, as the graphs would appear without the instrumental record stuck on the end, and note these reconstructions becoming less hockey-stickish?

There are other points of contention in, for example, the robustness of the reconstructions without certain proxies and the potential variance reduction that a number of reconstruction methods can evidently produce.

1U, Kenneth, etc. It's odd to me that people keep rehashing whether decentred PCA is biased, or whatever. It's clearly not an optimal method, and was probably used by mistake in MBH98 and 99. That is 12+ years ago. It hasn't been used again, and won't be. So I try personally not to get too deep into those arguments.

As far as methods are concerned, W&A have shown that the original MBH calc was little affected over most of its range by this choice. The 15th C end was wobbly, but that was because data was running low - we now have much more. And of course we now have many other reconstructions, none of which used decentred PCA.

Then people tend to switch to talking about bristlecone pines etc. But that has nothing to do with bias in variant PCA.

This is an issue for me because I do find the analysis issues mathematically interesting, and I've been rerunning the W&A code and thinking of blogging on it. But I'm ambivalent, because in anything but the peculiar climate argument context, there just wouldn't be any point.

Kenneth #107, "What part of the blade of the hockey sticks or semblances of hockey sticks is attributed to the instrumental record?" In my view, we're talking about global temperature reconstructions. So you plot your best information about temperature in each year. Back to about 1850, that means instrumental. Proxies are definitely second-best information.

So what proxies tell us (to the extent we believe it) is just that temperatures 0-1850 didn't vary much relative to the recent rise.

That's why I find the "hide the decline" talk frustrating. No-one sensible thinks that we should revise our understanding of post-1960 temperatures based on tree-rings. The legitimate divergence question is whether the pre-1850 proxy-based data is reliably calibrated, not whether there is temperature validity in the "hidden" curves. And that has been much discussed.

I think the issue with the pre-1850 data is not so much the calibration, but doubt as to whether the tree-ring data is a reliable temperature proxy at all. If tree ring width has declined during a period when temperatures are known to have increased, this must at least raise the possibility that the same thing has happened before. The doubt can't be simply allayed by asserting that the recent discrepancy must be due to some unknown human-related factor - although I'm sure that dendros with a scientific conscience are going into the issue a good deal deeper than that.

Personally I'd very much like to see what you have to say on the subject of W&A. Whenever I try to work out what the issues actually are with MBH98 I just get completely confused by the torrents of verbiage at CA. This post has been exceptionally helpful in clarifying things, and it would be great to see the same thing done with regard to W&A.

Andrew, "I think the issue with the pre-1850 data is not so much the calibration, but doubt as to whether the tree-ring data is a reliable temperature proxy at all." I'd regard those as pretty much equivalent, and so I'm sympathetic to what you're saying. But I do think that conscientious dendros have looked into it, and say that while divergence is a problem, there is real validation - the proxies have value.

Nick, it is sometimes difficult for me to attempt to stand in your shoes and view these issues the way you do. Surely you must judge as others of us do that the goodness of a reconstruction is how well it correlates with temperatures, and particularly those temperatures furthest away from the series mean. If the reconstruction cannot follow well the current warming it obviously raises question about past proxy responses to higher temperatures. That makes the attachment of the instrumental record where the reconstructions show a poor fit and even a decline during the current warming period especially distressing for those who expect a scientific view of the results.

Further, even for a case where the reconstruction ends early and there is, thus, no major decline to hide, I would not judge it good science practice to tack on the instrumental record, as it has not been clearly established that the reconstruction temperatures are the "same" as those measured from the instrumental record. We have the point of contention on reconstruction methods being capable of matching the variances and underestimating the heights of the hills and depths of the valleys. I believe another way the reconstruction and instrumental records differ is that the tacked-on instrumental record is derived from the whole globe or continent, depending on the reconstruction, or at least that is the attempt, while the reconstructions are much more limited in spatial coverage.

The point is made at the Skeptical Science blog that the divergence does not occur with the south-designated proxies, but that the proxies (south, north, east, west) all track well prior to the divergence period. They track well if we confine "tracking well" to mean that the proxies tend to go up and down together over the period the graph covers, but not if we expand that definition to whether they go up and down to the same extent over a given short time period. The proxies could well be reacting in concert with climate variables, but not necessarily to temperature alone, nor with the same amount of response.

Of course, the question left unanswered is: are the proxies from the north, east and west locations typical of what we might expect from the proxy response to higher temperatures, or those from the south? And further, what are the responses at individual sites, rather than an average of proxies from an arbitrarily selected latitude and longitude range? What effect would the divergence have on the proxy/reconstruction algorithm, assuming that the divergence affects the calibration period? We know that some tree ring proxies have shown the opposite of divergence, that is, phenomenal growth like 5 to 6 sigma. Do we understand that growth? It tends to get lost when we look at averages.

A point of contention arises from the selection process used to qualify tree ring width and density proxies, and whether random samples selected with a well-thought-out prior rationale might lead to different results for reconstructions. The selection process that I see being used for reconstructions could be considered more as in-sample testing. Therefore, to answer some of this point of contention one would need to sample the same trees, or reasonable facsimiles, for the years following the original gathering of data and measurements for the published reconstructions. The cry for this has been to bring the proxies up to date, and it could provide out-of-sample testing.

Nick #108: "As far as methods are concerned, W&A have shown that the original MBH calc was little affected over most of its range by this choice."

The original MBH calc had the HS in PC1, explaining 38% of variance. The centred method has the HS lower down the PC order, explaining less of the variance. How does this affect the uncertainties? W&A do not recompute these - their analysis is incomplete.

Then people tend to switch to talking about bristlecone pines etc. But that has nothing to do with bias in variant PCA.

W&A acknowledge that the MBH bristlecone/foxtail proxies do not correlate with local climatological measurements (ref. Graybill and Idso 1993, MM05). They claim that the climate field reconstruction method can overcome this obstacle, stating:

"It is important to note in this context that, because they employ an eigenvector-based CFR technique, MBH do not claim that all proxies used in their reconstruction are closely related to local-site variations in surface temperature. Rather, they invoke a less restrictive assumption that whatever combination of local meteorological variables influence the proxy record, they find expression in one or more of the largest-scale patterns of annual climate variability to which the proxy records are calibrated in the reconstruction process (Mann et al., 2000)."

This "less restrictive assumption" needs to be justified on physical grounds, yet MBH and W&A never do so. The established science of dendroclimatology is based on a physical understanding (however limited) of trees' response to local conditions - we know that the physical factors that affect trees (such as temperature, precipitation, humidity, insolation, availability of nutrients etc) all do so through local physical interaction with the tree. MBH and WA never justify how it can be that large-scale patterns of climate variability can affect the tree without leaving a discernible signal in the local climatological measurements.

"And of course we now have many other reconstructions, none of which used decentred PCA."

Almost none of these reconstructions agree with MBH, according to which persistent warming only started post-1900.

Kenneth #113, My general view is that it is desirable to get the best picture we can of past temperatures. It may be good, maybe less so. The reason is not to "prove" AGW - the case for AGW comes from the forcing - radiative physics plus emission of gases. But naturally one would like a check on whether the expected temperature rise has been observed. We've seen a substantial temp rise measured with thermometers, so tick that one.

But even so, maybe such rises have been common? That's where proxies come in. If it turned out that there were big past temp variations, that wouldn't disprove AGW - we still have the basic physics plus the observation of the expected temp rise. If you predict a rise, and observe it, that's the best you can expect. But if the rise is "unprecedented", then that does become extra evidence.

So we have the proxy account. It isn't as solid. The divergence definitely detracts from it. But still, there's a lot of different kind of proxy evidence now, and while it's not consistent, the recent warming certainly still looks unusual.

To repeat - the aim is to get the best estimate available of the temp for as far back as we can. Some parts will be good, some less so. Eventually, going back, we'll run out of data. But the way of showing that estimate will be a graph which, for each year, shows the best temp estimate that we have. That will be part instrumental, part proxy, indicated.

Variant parts, like the "decline", are relevant to error estimates, and bear on general credibility, but they are not actually part of the temperature estimate.

The Mann off-center mean method was undeclared, does cause a SMALL bias, and was probably an error. The problem from the McI and cheerleader side is they have routinely exaggerated that effect rhetorically by changing more than one factor at a time. Some of the Mann side (e.g. Tamino and Mann and Dhogza) do not want to admit a single error, even a very small one. (Although Tammy came to Jesus a bit when he interacted with Jolliffe... really stuck in his craw that silly poster TCO was right on Jolliffe.)

The problem from the McI and cheerleader side is they do not want to admit some of the errors and sophistries on their side (pick 100, changing more than one thing at a time but running with offcentering rhetorically, noise model questionable and not properly disclosed, etc.) And when called on it, the McI and McI nuthuggers will "run for the ice" (except they run for the bristlecones or changing the scope of discussion from McIs exaggerations to some general topic).

I realized this once I finally reread the Huybers comment carefully. Really reread it rather than the McI crit OF IT. Huybers is West Point trained. I trust him to tell the truth. To walk into the captain's room and say "I threw your damned palm trees over the side. Now what's this damned no liberty all about?" McI is a Clinton-equivocation-loving pennystock shell company operator.

You guys gotta learn to look for truth regardless of who it helps. Watts does not do this. Nor does McI. Lucia and Moshpit claim to. But they really don't either. You gotta have the sac to look for the truth. And to call your dinner-buddy and blogging ally on being a liar.

Nick Stokes, what you say about the instrumental record standing on its own rather obviously gives further reason not to connect it to the reconstructed series. As you noted, the recent rapid rise in temperatures can be readily shown on its own in a separate graph, since that rise, and anyone's concern about it (without relating the rise to detrimental or beneficial effects to humankind), is a separate issue from past temperatures.

Showing it hooked to the end of the reconstructed series is not placing it there to show all the "best" temperatures available as you defend that action, but rather to equate the quality of the reconstruction as being at the same level as instrumental record and further as a selling point to emphasize the recent warming as unprecedented by scientist/advocate types.

Remember, there is no attempt to splice the instrumental record to the reconstructed series, but rather simply to place it there at the end. I look at the hooking-on of the instrumental record in the same light as I do the use of spaghetti graphs when they obscure the differences between proxies and/or reconstructions.

#121 KF: The reconstructions have no meaning if they are not related to the thermometer record. Validity is checked during the overlap (approx. 1850-1970). Not showing thermometer temperatures with reconstructed temperatures would be the worst kind of academic misconduct.

I would encourage showing the instrumental temperatures and proxy responses, and in detail, in the calibration and verification periods. Unfortunately those correlations are often not very good and are not shown in good detail in the published reconstructions.
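To make "validity is checked during the overlap" concrete: the usual check splits the instrumental overlap into a calibration interval and a withheld verification interval, then scores the reconstruction on the withheld part with statistics like correlation and the reduction of error (RE). A toy Python sketch with made-up numbers (not any published reconstruction's data or exact procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def reduction_of_error(obs, pred, cal_mean):
    """RE: 1 minus SSE relative to always predicting the calibration-period mean.
    RE > 0 means the reconstruction beats that naive baseline."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - cal_mean) ** 2)

# toy "instrumental" series over a 120-year overlap, plus an imperfect "reconstruction"
temp = np.linspace(-0.2, 0.4, 120) + 0.1 * rng.standard_normal(120)
recon = temp + 0.15 * rng.standard_normal(120)

# later half calibrates, earlier half verifies (made-up split)
cal, ver = temp[60:], temp[:60]

r = np.corrcoef(recon[:60], ver)[0, 1]
re = reduction_of_error(ver, recon[:60], cal.mean())
print(f"verification r = {r:.2f}, RE = {re:.2f}")
```

In the real papers the reconstruction is fit only on the calibration interval, so the verification score is a genuine out-of-sample check; here the "reconstruction" is just the target plus noise, which is enough to show what the statistics measure.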

In fact, John McManus, now that you have brought up this issue, I would add that calibration/verification is important and should be shown in a separate, better-displayed graph in these publications. We would then have the reconstruction in one graph, the calibration/verification that is supposed to tie the instrumental record and reconstruction together in another, and the instrumental record in a separate graph - or at least in a very easily distinguished separation within a combined graph. Separate graphs of the stepwise process would make the important relationship, and the quality of that relationship, much clearer.

You make a good point: Why do we not see more detailed graphs illustrating the instrumental record and reconstruction response in the calibration/verification process?

Take a look at the graphs in the Mann et al. 1998 SI and in Mann et al. 2008 that show proxies/reconstructions distinctly and visually separable from the instrumental data. You can readily see in these graphs that most proxies dip late while the instrumental record is going up at an accelerating rate. Now compare those graphs to ones where the instrumental record is tacked onto the end of a reconstruction or several reconstructions and the separation of the proxy response from the instrumental record is difficult to discern. They present very different pictures and stories.

In my post above I should have referenced the graphs in the Mann et al. 2008 article and SI and not the SI of the 1998 paper.

Also note in these graphs a tendency for divergence between proxies/reconstructions and the instrumental record for the non-tree-ring proxies/reconstructions - in addition to the better-publicized divergence for the tree-ring proxies/reconstructions.

I tried to access the SI but was blocked. 1998 Fig. 5 shows PCs; I don't think these are proxies. I can't see that there is actually a different story being told.

The 2008 paper graphs are easier to read, being in colour. This is probably due to the evolution of journals, not author malfeasance.

As shown in Nick's post above the attempt to improve graph clarity continues and will continue.

Coming to dendro through archaeology, the divergence problem is old news: older than MBH 1998. It has been recognised, and theories have been put forward, for decades, but a few things are established. The divergence occurs only sporadically, in a few species in a few places. Tree growth is dependent on climate. Tree rings contain climatic information that is useful. Divergence has little effect on archaeological knowledge.

It has always seemed to me that by ignoring archaeology, climate sceptics weaken their arguments.

I am not sure it is true that no-one is admitting that the PC process Mann et al. used is not a good method. Nobody uses that method any more, which is a pretty clear indication that they don't trust it.

That's the real way that scientists admit what they have done isn't correct - they abandon it.

The issue is that people are trying to make this method into an example of 'poor science' or dodgy science. The people defending it are trying to make it clear that the first method, though not optimal, was actually good as a starting point. Other scientists worked off it and developed their own techniques. The newer reconstructions may not follow the original exactly, but they aren't really all that different.

People who are critical of proxy reconstructions in general should put their money where their mouth is: make their own and defend them. Motherhood statements about how most proxies appear to dip in the late twentieth century are not sufficient. If their real beef is that dendrochronology is useless, they should take that up with dendrochronologists; find out why dendros are confident.

Yeah... IOW, incapable of disaggregating the issues and worried that an admission of even a trivial error will be taken and run with. Therefore not admitting it. This is the behavior of (poor) Internet debaters. Not scientists. Real mathematicians and scientists have no issue getting into the nitty gritty and admitting a minor error, and don't worry about some MF taking it and running with it.

The difference is that Mann is addressing it in the literature; McI just snipes from his blog. It's a huge difference.

There's no point in Mann saying his PC variant was bad. It serves no purpose for anyone in science, because there's no real quantifiable way to say how bad it is. Instead he has tried various other methods, and so have many other authors, and they all basically back up the original version. Mann is behaving exactly as a scientist should; someone claims his method is bad, so he tries to do the same thing using a different method. It then becomes irrelevant.

John McManus, we need to focus on what those graphs (Fig. 2 of the main article, Mann et al. 2008) are telling us and not some peripheral issue. The graphs in the SI are much the same as those in the main paper and show the divergence of the reconstructions (and thus the divergence in the proxies that were used for the reconstructions) for both the tree-ring and non-tree-ring reconstructions. Look at the graphs in Fig. 2 as opposed to Fig. 3 (where Fig. 3 is the usual rendition we see of reconstructions with the instrumental record tacked on).

The body of the paper of Mann et al. 2008 makes the following comment:

"This latter finding suggests that the divergence problem is not limited purely to tree-ring data, but instead may extend to other proxy records."

> I’m going to spend some time studying it, before making any other comments. Here. About this. Except that I just got a surprise on another subject, and now I’m suspicious of everybody. Guess I should have been from the start.

Nick - did you ever look into M&M's process used to create the "persistent red noise"? I'm not an R person (I think it's loaded on my laptop, but I've used it at most once). I can't see in McIntyre's code where the proxy data being used is detrended, but hosking.sim is a black box to me. Does hosking.sim detrend the data, or is Ritson's assertion to Wegman that they contaminated the noise correct? If so, wouldn't doing it properly likely reduce the occurrence of HSs?

McIntyre seems willing to concede Ritson might be right: "But even if Ritson should eventually show that he's right, this is not something that anyone else has noticed so far or that he's been able to persuade any non-Team people about. It's an issue that he did not raise in his own comment on our GRL article; it's not an issue that's discussed in Wahl and Ammann or 'vetted by Nychka'."

I take that to mean SteveM didn't have an answer to Ritson's claim. Martin Vermeer seems to back Ritson up: "... the problem is that the hockey-stick shape, if not removed, contaminates the noise model. That is, you get much 'redder' (more serially persistent) noise than if you did it correctly. This makes finding spurious hockey sticks much more likely."
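Ritson's point is easy to demonstrate in miniature. This is a hedged Python sketch of the direction of the bias, not of what hosking.sim actually does (it fits a full persistence model, not just a lag-1 coefficient): estimate persistence from a series that still contains a deterministic late "blade", then compare to estimating it from the residual with the blade removed. The blade size and AR(1) parameter here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, the simplest persistence estimate."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

n = 581
t = np.linspace(0.0, 1.0, n)

# AR(1) noise with modest persistence (rho = 0.2, as in the Fig 4.4 caption)
eps = rng.standard_normal(n)
noise = np.empty(n)
noise[0] = eps[0]
for i in range(1, n):
    noise[i] = 0.2 * noise[i - 1] + eps[i]

# a deterministic late-period "blade" (illustrative size), added to the noise
blade = np.where(t > 0.85, 20.0 * (t - 0.85), 0.0)
proxy = noise + blade

rho_raw = lag1_autocorr(proxy)                 # noise model fit to the raw series
rho_detrended = lag1_autocorr(proxy - blade)   # fit after removing the signal

print(f"lag-1 autocorrelation, raw: {rho_raw:.2f}")
print(f"lag-1 autocorrelation, signal removed: {rho_detrended:.2f}")
```

The raw fit comes out appreciably redder than the true noise, and noise simulated from such an inflated fit wanders more on multidecadal scales - which is Vermeer's point about spurious hockey sticks becoming more likely.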

I have looked a little, and it's been suggested I do more. It's a bit hard to pin down. Wegman has clearly mis-described what the code does, but what it does is not measured by simple numbers. It uses a Hosking method to fit to the observed autocorrelation. I believe that is unwise, but it isn't a coding issue. So far, Hosking is a black box to me too, and probably to everyone involved.