MBH Proxies in Bo Li et al

We have a new article from the Mannian school, this time involving supposedly “independent” NAS panelist, Doug Nychka, and geologist Caspar Ammann, who is very enthusiastic about calculating covariances using Mannian proxies. The lead author is a statistician, Bo Li. The article purports to use MBH99 proxies and says

we do not critically evaluate these proxies but simply apply them as the best available set at the time of 1999

They then proceed to use what appears to be Mann’s PC1, plus no fewer than four different time series from Quelccaya (Core 1 dO18 and accumulation, Core 2 dO18 and accumulation), with each core remarkably and mysteriously teleconnecting in different ways with “climate reconstruction fields”, such that mere covariances are interpreted as having physical meaning. But more on the statistics on another occasion. The article concludes by saying:

The authors thank Dr. Michael Mann for providing the proxy data.

Jean S observed that the proxy series do not match the MBH versions that he had. I checked this as well. The figure below compares MBH99 Quelccaya Core 2 accumulation (with its characteristic pattern that Hans Erren figured out a few years ago) and the Quelccaya Core 2 version used by Li et al. The two proxy series are clearly not the same; the correlation is 0.70, and correlations of about this value are typical. It’s hard to say what Mann has done this time. I checked the acf’s of a couple of versions to see if Mann had given them smoothed versions but, in the examples that I looked at, the new series did not have additional autocorrelation. The correlation of the original version to the new version was consistently about 0.70 for all of the series – which is close to the square root of 1/2 (≈0.707) – but why? All these proxies could have been obtained from WDCP, so I don’t know why they didn’t use the original data. Pretty mysterious.

Here’s another example:

UPDATE: The above surmise that noise was added to the data has proved correct. Bo Li has confirmed that noise was added to the archived proxies: the data sets on her webpage are synthetic samples, while the paper itself used the original proxies. She has accordingly corrected the description of the data sets on her webpage to “synthetic” instrumental and “synthetic” proxy, and responded cordially, apologizing for any inconvenience. This resolves the proxy issue, which was an odd one. When I get a chance, I’ll look at the statistics.

The data sets that I posted on my website are not the real Northern Hemisphere temperature and the MBH99 proxies. They are generated by adding white noise with unit variance to the standardized real data. The pseudo data sets on my website only serve as a toy example to try the R code that I used in my paper. However, the results in Li et al. (Tellus, in press) are based on the real data instead of the pseudo data. I am sorry that I did not explain very clearly what the data set on my webpage is and also sorry for the confusion that I brought to you as a consequence. I have modified my webpage to make the point more explicitly.

In our paper, we looked at the residuals in calibration (14 proxies against not PC1 of instrumental, but full N-Hem average) and found that there is enough serial correlation that an AR2 process is warranted to represent the “noise”. Then, the approach was to simulate a set of ensembles that fulfill the criterion during calibration so that the explained variance is the same but that the noise is different (using that AR2). Then, because we wanted to establish a method to study the maximum decadal values, we then simulated all series. Simulating time-evolving series was necessary because we have serial correlation, and thus decadal maxima can be computed. The goal was not to be producing the best reconstruction that is currently possible, but to demonstrate a way of how one could go and address the decadal max question. This question was not well addressed in the NRC report, and thus this work is a followup as collaboration between stats and geoscience to show how one could better answer the significance question for the old MBH99 framework.
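
Ammann’s description above can be sketched as a toy simulation. The AR(2) coefficients, series length and ensemble size below are invented for illustration, not Li et al.’s fitted values (and Python is used here, though the paper’s code is in R); the point is only that every ensemble member shares the same serial-correlation structure while the noise realization differs, so that decadal maxima can be read off each simulated series:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar2(n, phi1, phi2, sigma, rng):
    """Simulate n steps of x[t] = phi1*x[t-1] + phi2*x[t-2] + eps[t],
    with eps ~ N(0, sigma^2); a 100-step burn-in is discarded."""
    x = np.zeros(n + 100)
    eps = rng.normal(0.0, sigma, n + 100)
    for t in range(2, n + 100):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + eps[t]
    return x[100:]

# Illustrative (not fitted) AR(2) noise parameters.
phi1, phi2, sigma = 0.5, 0.2, 1.0

# 100 noise realizations with identical autocorrelation structure but
# different noise; 1000 "years" each, grouped into decades for the maxima.
ensemble = np.array([simulate_ar2(1000, phi1, phi2, sigma, rng)
                     for _ in range(100)])
decadal_max = ensemble.reshape(100, 100, 10).max(axis=2)
```

Because each member reproduces the same calibration-period noise structure, the spread of `decadal_max` across members gives a distribution for the decadal maximum – the quantity Ammann says the paper targets.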

A couple of questions: the MBH99 proxy versions used in this article do not match the original proxies. It’s not a linear transformation or re-scaling, since they all have a correlation of about 0.7 with the original proxies. Any idea what’s going on? Why did you use a different version?

Do you have any idea how confidence intervals in MBH99 were calculated? There is no statistical reference in MBH99, and the method is unknown to climateaudit readers, including several professional statisticians and post-docs. I don’t get much cooperation, but perhaps you can obtain some information or at least a statistical reference. Perhaps Ammann knows.

Hans Erren has an explanation on his website. What happens is that the ice layers get compressed and are measured to only 1 digit. Then they assume that the compression is exponential yielding the results shown in the top panel through a re-expansion.

But let’s not get too caught up in this issue; let’s focus on what Mann is doing. I’ve added another example to avoid this distraction.

Ian:
That pattern really jumps out, like a repeating signal of some sort. It must have a mathematical form. It looks like the same set of data repeated but displaced to the left and shifted up slightly. “Weird” is exactly right.

Not to sidetrack this too much, but after reading Hans’ correspondence from Thompson, it seems like his response offers an opening for acquiring the elusive Quelccaya dataset. If his claim is that he cannot provide the data because it is not in electronic form, surely it would be simple for him to fax the handwritten pages to somebody? Or to xerox and snail mail them? I am sure we could find a way to get the handwritten form into electronic form pretty easily.

In Hans’ explanation the input data is a cyclical accumulation between 0.5 and 2.5 – does this represent a seasonal change, i.e. the regular seasonal cycle? If so, has Mann removed this seasonal signal to produce his result?

As the compaction formula used is not mentioned in TMBK1985 nor at the NOAA archive, Lonnie Thompson was asked for the original core log data. After a reminder, he responded as follows on 10 March 2004:

Dear Hans: Just returned from China! Unfortunately, those logs are all hand done. These data where not put on electronic format. We have just redrilled the Quelccaya ice cap in 2003 and brought back two frozen ice cores and will be producing a new log based on this new data. Unfortunately, right we are processing Bona-Churchill ice cores and the new Quelccaya and Coropuna cores are in the cue.

#19. Not that the Bona-Churchill cores have been reported. As I’ve mentioned before, I predicted a long time ago that the delay in the Bona-Churchill results is because they aren’t “good” – I predict that dO18 goes down in the 20th century. It’s not a risky prediction, since dO18 at Mount Logan went down a lot – of course, this is due to changes in regional wind circulation, but if it had gone the other way, Thompson would interpret it as evidence of GW.

>> What happens is that the ice layers get compressed and are measured to only 1 digit. Then they assume that the compression is exponential

If you all had read every detail in this paper by one of the world’s foremost experts on ice cores, you wouldn’t be so surprised by what looks like faked data.

For example, based on this underlying assumption, the experts expected the B-17 planes to be at a 12 meter depth. I saw the documentary; they actually found the planes nearly 80 meters down. I urge you again to look at figure 3 very closely.

These data where not put on electronic format. We have just redrilled the Quelccaya ice cap in 2003 and brought back two frozen ice cores and will be producing a new log based on this new data. Unfortunately, right we are processing Bona-Churchill ice cores and the new Quelccaya and Coropuna cores are in the cue.

Probably just my sensitivity, but doesn’t this statement by Thompson encapsulate the tendency we have seen before in climate science to hurry through a study in sloppy fashion and then move on – in this case, to move on from a “move on”? It seems they feel they need to get the word out, and eventually they will get it right, or at least into a form that can be replicated.

The pattern in the “Original” first graph between 1000 and 1400 is clearly a result of some very brutal rounding: only a few values are a priori allowed for each year. But why aren’t the lines with the allowed values simply horizontal? There is some superimposed parabolically decreasing trend, and moreover the distance between the allowed values gets smaller as you go to the future.

I find this kind of rounding very strange. It might have a small effect on temperature averages but it brutally influences the counting of temperature variations, especially the very high-frequency ones. One could understand this rounding if they had digital thermometers with 1 digit in the year 1200. But the actual numbers are clearly obtained differently and should be continuous. Clearly, the error margins are huge but what’s the point of allowing 3-5 different temperatures only?

The lines won’t be horizontal if a graph is showing growth rates in an index level with brutal rounding.

Thus, for example, consider an index set to 100 in 2007 and rounded to one decimal place. The growth rate calculated from that index will show the pattern you observe. (Although that is not exactly what is happening here, it is a very common occurrence.)
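
A minimal sketch of this rounding effect (the index values and step sizes below are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical index level near 100 with small annual changes,
# recorded -- like the ice-core measurements -- to only one decimal place.
level = 100 + np.cumsum(rng.normal(0, 0.05, 500))
rounded = np.round(level, 1)

# Year-to-year differences of the rounded series fall on a coarse grid
# (multiples of 0.1), so a plot of growth rates shows only a few "allowed"
# lines; because the denominator drifts with the level, those lines are
# not quite horizontal -- the pattern described above.
growth = np.diff(rounded) / rounded[:-1]
```

Plotting `growth` against time reproduces the banded look of the “Original” graph: a handful of allowed values whose spacing shifts as the index level drifts.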

An unbiased sampling strategy is essential to any statistical paper. It must be based on a quantitative methodology that could be reproduced by another, which also enables future out-of-sample data to be added in.

The behind-closed-doors selection of these proxies simply renders any statistical significance worthless, because it enables cherry picking. But then, the Team likes cherry pie, and it seems the newest team member has a taste for it as well.

The versioning issues are just farcical. How many different temperature reconstructions can you achieve by swapping out different versions of different proxies? This would be a useful indicator of how much uncertainty is introduced by versioning issues alone.

Credit to Bo Li at least for making her data and methods more visible, and credit to Steve McIntyre for shaming them into behaving more like scientists; this should enable more efficient analysis of the work, which can only help to advance our understanding of historical temperature reconstructions.

After playing a while with the Li proxies, and performing various tests, here’s my GUESS as to how the proxies were obtained. I’m not saying this makes any sense, nor am I guessing who did this (if it was done). UC and others should try to prove me wrong :)

It’s as simple as this:
take the MBH99 proxies, standardize them to zero mean and unit variance, and then add standard Gaussian N(0,1) white noise!
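
Jean S’s conjecture is easy to check numerically. Below is a rough sketch (Python here, though the paper’s code is in R; the “proxy” is an arbitrary stand-in series, not an actual MBH99 proxy). If the recipe is right, the correlation with the original should come out near 1/√2 ≈ 0.707 and the standard deviation near √2 ≈ 1.41 – matching the ~0.70 correlations reported above:

```python
import numpy as np

rng = np.random.default_rng(42)

# An arbitrary stand-in "proxy" (red noise; any shape works).
proxy = np.cumsum(rng.normal(size=1000))

# The conjectured recipe: standardize to zero mean and unit variance,
# then add standard Gaussian N(0,1) white noise.
z = (proxy - proxy.mean()) / proxy.std()
noisy = z + rng.normal(size=z.size)

r = np.corrcoef(z, noisy)[0, 1]   # expected near 1/sqrt(2)
s = noisy.std()                   # expected near sqrt(2)
```

The 0.707 comes out because the signal and the added noise each contribute unit variance: cov(z, z+e) = 1 while var(z+e) = 2.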

Jean S, perhaps a completely dumb idea, but what would their other “proxy” data source — the temp record — look like if the same procedure were applied? (Standardize to zero mean and unit variance, and add standard Gaussian N(0,1) white noise.)

My gut-level confusion about their process is that they have some proxies with the desired shape; they convert to 1000 years of noisy low-variance “data”, which they then “whip into shape” presumably by fitting to a temperature record… and they assume that all of this is reasonably accurate information. Take the numbers you want to come out, hide them behind noise door #2, mix, analyze, and *poof* what a surprise.

It just feels like linking assumption to assumption to create a tissue of presumption.

[If so, who has a good calculator puzzle that might be appropriately similar if simplified, in a fun way? I’m thinking of things like the old “pick any three-digit number; enter it twice into your calculator (e.g. 123123); divide by eleven; divide by your original number; divide by thirteen. The answer is… seven! Wow….]

#35. Thompson won’t disclose sample data. That was one excuse at the time. But there are many posts here about attempts to get at Thompson’s data (prominently used in An Inconvenient Truth), and Thompson just denies access. Most recently, Ralph Cicerone of NAS explicitly endorsed Thompson’s denial of data access.

I received a cordial reply from Nychka who asked Bo Li to identify the source of the proxies as he had “assumed that it was Mike’s original 14 proxies.” On the query about MBH confidence intervals, he said:

As to the MBH standard errors I have never read a precise description what was done. (Caspar any help here?) I bet it is something fairly simple like using the standard error in the verification period.

I replied that MBH98 confidence intervals were calculated on the calibration period, but MBH99 procedures were something to do with the spectrum of the residuals. I also observed that, if MBH98 CIs had been calculated on verification period errors, they would obviously have been the same size as natural variation, since the verification r2 was approximately 0.

Nychka was one of two statisticians on the NAS panel investigating MBH and other surface reconstructions.

take MBH99 proxies, standardize them to zero mean and unit variance, and then add standard Gaussian N(0,1) white noise!

I think that you may be right about this. The standard deviations of the “new” versions are about 1.4, which would make sense as the sum of two near-orthogonal N(0,1) series – for most of you, this is just a version of the Pythagorean Theorem.

I’ve posted here on a number of occasions that you can get results like MBH with one spurious regression plus white noise. It would be amusing if Nychka was inadvertently moving acceptance of this viewpoint along.

As to the MBH standard errors I have never read a precise description what was done. (Caspar any help here?) I bet it is something fairly simple like using the standard error in the verification period.

Color me gobsmacked. This is the data that he is using as an input for his study, and he never bothered to check to see if the data was any good?

take MBH99 proxies, standardize them to zero mean and unit variance, and then add standard Gaussian N(0,1) white noise!

The residuals are not autocorrelated, they do not correlate with each other, and they do not correlate with the original data… Seems that you are right. Heh, if the residuals are pseudo-random values from Matlab (or whatever software), then someone forgot the randn(‘state’,sum(100*clock)) command… Oh dear ;)

Why are the points along the X-axis shown as points, rather than bars? It is quite clear that there is much error in the time base the further back in time we go. It is also quite clear that the further back in time we go, the more the data along the y-axis is smoothed, naturally: each sample from the core contains more decades of signal when the ice is compressed than when it is less compressed.
It would be quite easy to work out what the actual error range is, just from ice compression, using a simulation. The further back you go, the bigger the pixels are.

I wrote to Ralph Cicerone as you suggested and have not had an answer. I saw one other similar Cicerone letter answered elsewhere in CA and it was just evasive. Has anyone had any luck?

My USA friends advise me not to annoy any important USA person with an Italian-sounding surname. I’m still trying to work out why. I would not believe that people with Italian surnames are deficient in intelligence. I just put it down to a racist statement, or a statement about race-horse heads in beds.

Change of topic. For those not familiar with correlation coefficients, would it be instructive to show some scattergrams with various r? As a mining man too, we seldom placed much weight on an r below 0.9, or even 0.95, depending on the purpose of the calculation. When people talk about an r of 0.7 or less, I stop reading.

Second-last topic: can an authority direct us to a quantitative paper on the estimation of age in ice cores from stable oxygen isotopes? I cannot understand how a qualitative description of a partitioning weather effect suddenly turns into a quantitative numerical estimate. Besides, fluid inclusion studies in rock minerals are hedged by responsible authors with all sorts of caveats and enjoy only low acceptance of reliability.

UC – worthwhile thought, but it doesn’t look like that’s the case from her script. This particular data set seems to be read in on the same basis as instrumental data. So this example seems to be embedded.

BTW the statistical analysis completely evades the topic of spurious regression – a topic which seems to be incomprehensible to climate scientists. They all need to take a course in stock markets and random walk to get a better sense of how you can trick yourself.

I think that you may be right about this. The standard deviations of the new versions are about 1.4, which would make sense as the sum of two near-orthogonal N(0,1) series – for most of you, this is just a version of the Pythagorean Theorem.

For the electronicers among us: adding two equal, independent noise sources increases the noise power by 3 dB, i.e. the power doubles. A 3 dB power change corresponds to an increase by a factor of about 1.41 in noise current and noise voltage into a constant load.
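
The arithmetic, for anyone who wants to check it:

```python
import math

# Two equal, independent noise sources: powers add, so total power doubles.
power_ratio = 2.0
db = 10 * math.log10(power_ratio)    # power change in decibels (~3.01 dB)
amp_ratio = math.sqrt(power_ratio)   # voltage/current ratio into a fixed load (~1.414)
```

The same √2 shows up in the proxy standard deviations: adding unit-variance noise to a unit-variance signal doubles the power, i.e. multiplies the standard deviation by about 1.41.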

#49. Their excuse: “the best available set at the time of 1999″. The problems with the bristlecones were already known in 1999. The bristlecones gave an answer that they wanted; they were already known to be flawed, so they were obviously not the “best available” even then.

That’s one thing that bugs me about so many in the climate change community.

Their willingness to work with data that is known to be bad, as long as it’s “the best that is available”.

Such an attitude is acceptable when you are doing a research project and are just trying to get a basic understanding of the phenomena in question.

However, these same “scientists” will then turn around and tell us that their understanding of how the atmosphere works is very high (based on what, bad data?) and that their models are sufficient to justify overturning the world’s economy – still based on data that many of them admit is deeply flawed.

Re:#47 “BTW the statistical analysis completely evades the topic of spurious regression – a topic which seems to be incomprehensible to climate scientists. They all need to take a course in stock markets and random walk to get a better sense of how you can trick yourself.”

Unfortunately, climate scientists are not the only ones unable to grasp this concept. Over the years, I’ve come across a number of statisticians who don’t understand it either. I once saw a neural network model built to predict regions where a competitor was likely to target a company’s prime customers. What it actually did was very nicely report the areas that the competitor ALREADY HAD targeted.


“found that there is enough serial correlation that an AR2 process is warranted to represent the noise. Then, the approach was to simulate a set of ensembles that fulfill the criterion during calibration so that the explained variance is the same but that the noise is different (using that AR2)”

No worries.

There is only one example in the paper: the one with the supposedly original proxies. Also the figure code supposedly creates the images in the paper; how can you reproduce those if you don’t have the original data? Unless, of course, the added noise does not really matter.

Li:

The goal was not to be producing the best reconstruction that is currently possible, but to demonstrate a way of how one could go and address the decadal max question. This question was not well addressed in the NRC report, and thus this work is a followup as collaboration between stats and geoscience to show how one could better answer the significance question for the old MBH99 framework.

Yes, and this is the reason why the results are presented in conferences in this fashion (my bold):

The short instrumental record of about 100-150 years forces us to use proxy indicators to study climate over long time scales. The climate information in these indirect data is embedded in considerable noise, and the past temperature reconstructions are therefore full of uncertainty, which blurs the understanding of the temperature evolution. To date, the characterization and quantification of uncertainty have not been a high priority in reconstruction procedures. Here we propose a new statistical methodology to explicitly account for three types of uncertainties in the reconstruction process. Via ensemble reconstruction, we directly obtain the distribution of decadal maximum as well as annual maximum. Our method is an integration of linear regression, bootstrapping and cross-validation techniques, and it 1) accounts for the effects of temporal correlation of temperature; 2) identifies the variability of the estimated statistical model; and 3) adjusts the effects of potential overfitting. We apply our method to the Northern Hemisphere (NH) average temperature reconstruction. Our results indicate that the recent decadal temperature increase is rapidly overwhelming previous maxima, even with uncertainty taken into account, and the last decade is highly likely to be the warmest in the last millennium.

I suppose that due to the lack of space they omitted the info that their results are based on 14 harshly criticized proxies (insufficient spatial coverage, bristlecones, Mannian PCA, Thompson’s Quelccaya, etc.).

I also find it extremely interesting that the paper cites (twice) the Shumway and Stoffer (2006) book. The same book argues that the global temperature series is a random walk… Dr. Li, please re-read especially pages 57-63, although I recommend studying the whole book in detail.

1) Well, it is hard to tell if it should be called classical or inverse. That is, they still haven’t figured out that the problem under study is known in statistics as (multivariate) calibration. Their model (eq. (1)) is misspecified: I wonder if Li and Nychka, as statisticians, have ever wondered why in the regression model the variables on the left (usually denoted by Y) are called dependent/response variables and the ones on the right (X) are independent/explanatory variables. For useful references, see, e.g., here.

2) Don’t be too harsh, UC :) That would actually amount to testing the validity of the model. I see some positive signs here: they implicitly admit that MBH99 is a terrible overfit (p.4 “For example, MBH98 and MBH99 applied essentially an OLS to fit the linear model”, and on the next page they notice that even the GLS fit is an overfit).

I found a pdf of what appears to be a presentation on Bo Li’s website. Is there also a paper somewhere? Or is the pdf it (at least for now)?

Although I need to study these results to be more specific, my first reaction to what is up on the website is that Bo Li has made a critical error that many young statisticians make: not asking enough questions before diving into the mathematics. As a result, we’re left with many of the same fundamental flaws M&M exposed in the original hockey stick, but covered up with more complicated mathematics.

Residuals from Li et al. Fig 1. are obtained using b) for sure. That’s the one that assumes that very special prior for the temperature. And as those residuals are correlated with temperature, that AR(2) noise model is partly based on temperature autocorrelation! But I’m walking on thin ice here, as I haven’t had time to comprehend the full paper ;)

Motivated by the recent discussion of uncertainty in the MBH99 reconstruction (North et al., 2006), we illustrate our statistical procedures for the purpose of this article by restricting our network of proxy records to the 14 series originally used in MBH99 for the period back to the year 1000 (see table 1 in MBH99).

As ICE (b in post #64) was the correct method to replicate Figure 1 (and the R code indicates that ICE it is), I tried the same method with the AD1820 network. There are 112 proxies, so overfitting (if present) should be more visible with this network (compared to AD1000). Here’s the result, with 1850-1859 as a verification period:

The difference between the prediction and the actual observations is an unbiased estimate of the statistical prediction error. If there is no overfitting, the variance of the observed prediction error is expected to be equal to the prediction variability derived from our linear model.

If I got it correctly, for the AD1820 step the variance of the observed prediction error is 300 times the predicted error. Compare that to the one they obtained with the AD1000 step:

This approach suggests an inflation factor of 1.30.
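
The inflation-factor idea under discussion can be illustrated with a toy calibration/verification exercise. This is a generic sketch, not Li et al.’s actual AR(2)/GLS procedure; the sample sizes, slope and noise level are all invented:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy calibration/verification split: fit a linear model on the calibration
# period, predict the verification period, and compare the realized
# prediction-error variance with the variance the fitted model implies.
n_cal, n_ver = 80, 20
x = rng.normal(size=n_cal + n_ver)
y = 0.5 * x + rng.normal(size=n_cal + n_ver)

xc, yc = x[:n_cal], y[:n_cal]
xv, yv = x[n_cal:], y[n_cal:]

b, a = np.polyfit(xc, yc, 1)          # slope, intercept from calibration
model_var = (yc - (a + b * xc)).var(ddof=2)   # variance the model claims

pred_err = yv - (a + b * xv)
observed_var = pred_err.var(ddof=0)

# If observed_var >> model_var, the calibration fit is overconfident;
# sqrt(observed_var / model_var) is one way to define an inflation factor
# for widening the reconstruction's confidence intervals.
inflation = np.sqrt(observed_var / model_var)
```

When the observed prediction-error variance greatly exceeds what the calibration fit implies – UC’s factor of 300 for the AD1820 network, versus the 1.30 quoted above for AD1000 – the fitted model is badly overconfident.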

Quite interesting – Wahl and Ammann would probably deem the #65 reconstruction unreliable, and they would have to remove proxies to obtain an acceptable RE.

#66: Yes, that applies directly to MBH99 – that’s why I said they admit that MBH99 is a terrible overfit. But notice that the inflation factor of 1.3 is not directly for ICE (with the AD1000 step); it’s for their model where they allow AR(2) noise (GLS fitting).