Can’t See the Signal For the Trees

ABSTRACT: A new method is proposed for determining whether a group of datasets contains a signal in common. The method, which I call Correlation Distribution Analysis (CDA), is shown to be able to detect common signals down to a signal:noise ratio of 1:10. In addition, the method reveals how much of the common signal is contained by each proxy. I applied the method to the Mann et al. 2008 (hereinafter M2008) proxies. I analysed all (N=95) of the M2008 proxies which contain data from 1001 to 1980. These contain a clear hockeystick-shaped signal. CDA shows that the hockeystick shape is entirely due to the Tiljander proxies plus high-altitude southwestern US “stripbark” pines (bristlecones, foxtails, etc). When these are removed, the hockeystick shape disappears entirely.

=============================================

Pondering the wide variety of distributions in the M2008 proxies led me to try to understand some of the statistical properties of a group of proxies. In particular, I wanted to understand more about how to find out if there is a common signal in a group of proxies. Before I got into the question of what the signal might look like, I wanted to know if the signal existed.

To do this, I invented a curious game that I called “hide the signal”. In this game, I took a group of random pseudoproxies, which I standardized to a common mean of 0 and a standard deviation of 1. To these, I added a common signal of different sizes. Then, I tried to devise processes that would reveal the existence of the signal. Not the shape of the signal necessarily, but the size of the signal and how it was distributed amongst the proxies.

After some experimentation, I adopted a procedure which successively removes the proxy with the least correlation with the average of the remaining proxies. The procedure looks like this:

1. Repeat a loop that does the following:

a) Calculate the row means (yearly means of a proxy matrix where the rows are years and the columns are individual proxies) of all of the “proxies” for each year. Let me designate this resulting dataset as RM, for the row means. It has one entry for each year, the mean (average) of all the proxy values for that year.

b) Invert (swap plus for minus) all proxies that have negative correlations with RM. (Since we are looking at the correlation with the average of all proxies, there are typically not many correlations which are negative). Repeat until there are no negatively correlated proxies with respect to RM.

2. Repeat a loop that does the following:

a. Calculate RM, the row means of the group of proxies.

b. Identify and record the proxy with the minimum correlation with RM.

c. Remove from the dataset the proxy with the minimum correlation. Repeat until there is only one proxy left.

At the end of this process, graph the minimum correlation values.

My logic ran like this. If I keep removing the proxy that least resembles the common signal, I will end up with the (possibly few) proxies containing the majority of the signal. It also allows me to investigate how the information was distributed among the proxies.
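For concreteness, the two-step procedure above can be sketched in code. The original implementation is in R (described in the appendix); this is my own illustrative translation into plain Python, with the proxies stored as a list of per-proxy series:

```python
import statistics

def correlation(a, b):
    """Pearson correlation of two equal-length series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def row_means(proxies):
    """Mean across all proxies for each year (RM in the text)."""
    return [statistics.fmean(col) for col in zip(*proxies)]

def cda(proxies):
    """Return the minimum correlations recorded as proxies are removed one by one."""
    proxies = [list(p) for p in proxies]
    # Step 1: flip any proxy that correlates negatively with RM,
    # repeating until no flips are needed (each flip changes RM).
    flipped = True
    while flipped:
        flipped = False
        rm = row_means(proxies)
        for i, p in enumerate(proxies):
            if correlation(p, rm) < 0:
                proxies[i] = [-v for v in p]
                flipped = True
    # Step 2: repeatedly record and remove the proxy least correlated
    # with the row means of the remaining proxies.
    min_corrs = []
    while len(proxies) > 1:
        rm = row_means(proxies)
        corrs = [correlation(p, rm) for p in proxies]
        worst = min(range(len(corrs)), key=corrs.__getitem__)
        min_corrs.append(corrs[worst])
        proxies.pop(worst)
    return min_corrs
```

The list of recorded minimum correlations, graphed left to right, is the “correlation distribution” discussed below.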

Knowing nothing about this new procedure, I decided to start by examining datasets containing no information at all. I looked at results from random distributions of various types. I created 100 “proxies” containing 1000 “years” of data for each type of distribution.

The first big surprise in this was the following graph:

Fig. 1. Correlation distribution in a variety of random datasets. The results, while not identical, are so close that the top one covers them all.

I found this quite amazing. In terms of how the information is spread out among the individual proxies, there was absolutely no significant difference between any of the standard kinds of distributions. It seemed like that could be very useful, although I did not know how.

At a minimum, it meant that I could measure the area above the line. This is aided by the fact that a group of 100 identical proxies would produce a correlation distribution which is a straight line of value “1”, since every proxy has a correlation of 1.0 with their average. This meant that, if there were a signal present, I could directly measure the amount of the signal without knowing anything about that signal.

Finding this line of investigation very promising, I moved on to hiding a signal in the proxies. For signal data, I took the first thousand months of the HadCRUT3 dataset. Wanting to make “hide the signal” a hard game, I decided to create datasets with the same structure as the HadCRUT dataset. This particular dataset is well described as an “ARMA” dataset.

The “AR” in ARMA means auto-regressive, and “MA” is moving average. Each has a coefficient. For the HadCRUT dataset (first 1000 points), the relevant coefficients are AR = ~ 0.8, MA = ~ -0.4.
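As an illustration of what such noise looks like (my own sketch, not code from the post), an ARMA(1,1) series with those coefficients can be generated with a simple recursion:

```python
import random

def arma_series(n, ar=0.8, ma=-0.4, seed=None):
    """Generate an ARMA(1,1) series: x[t] = ar*x[t-1] + e[t] + ma*e[t-1]."""
    rng = random.Random(seed)
    x = []
    x_prev, e_prev = 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        x_t = ar * x_prev + e + ma * e_prev
        x.append(x_t)
        x_prev, e_prev = x_t, e
    return x
```

With ar=0.8 the series is strongly persistent; the negative MA term (ma=-0.4) partially cancels that persistence, which is the effect noted below in the discussion of Fig. 2.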

As a baseline, I calculated the information distribution of a random ARMA (0.8, -0.4) group of “proxies”. My second surprise came when I graphed that and a few other ARMA datasets, shown as Fig. 2. I found that the correlation in the ARMA datasets is distributed differently than in all of the random datasets. It does make sense that as the autocorrelation increases, the correlation with the overall average (RM) should increase.

It is also of interest that the addition of a negative “MA” term serves to oppose the increased correlation with RM due to an increase in autocorrelation. For example, the ARMA(0.8, -0.4) proxy is very close to the normal dataset.

In any case, I then added various levels of signal to the random proxies, as shown in Fig. 3. Again, a surprising result. It is indeed possible to measure the signal to noise ratio in a group of datasets, without knowing the details of the signal. It turns out that the S/N ratio can be calculated directly from this information.
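One way to see how such a direct calculation can work (my own sketch of the idea, not the post's method): if each proxy is unit-variance noise plus a common signal whose standard deviation is s times the noise, a proxy's expected correlation with the (nearly noise-free) stack average is roughly r = s/√(1+s²), so s can be recovered as r/√(1−r²):

```python
import math
import statistics

def pearson(a, b):
    """Pearson correlation of two equal-length series."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def estimate_snr(proxies):
    """Invert r = s/sqrt(1+s^2) using the mean correlation of each proxy
    with the stack average (row means)."""
    rm = [statistics.fmean(col) for col in zip(*proxies)]
    r = statistics.fmean([pearson(p, rm) for p in proxies])
    return r / math.sqrt(1.0 - r * r)
```

For example, with 100 proxies of 500 “years” at a true S/N of 1:5, the estimate comes back near 0.2, slightly inflated because each proxy's own noise also appears in the stack mean.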

Fig. 3. Information distribution in a group of “proxies” with uniformly added signal. The proxies are ARMA (0.8, -0.4). The signal is the HadCRUT3 dataset, which has the same ARMA coefficients. The blurred areas behind each line are approximately the 95% CI. “S/N” is the signal to noise ratio.

To determine how the results of the analysis would look if the signal is only present in some of the proxies, I then experimented with hiding the signal in a subset of the proxies. The analysis was able to isolate just the proxies containing the signal. Here are sample results where only some of the proxies contained the signal:

Figure 4. Correlation distribution in a group of proxies where only some proxies contain signal.

Finally, I investigated the effect of adding signal in different amounts to different proxies. In other words, rather than adding the same amplitude of signal to each proxy, I added a random amplitude of the same signal to each proxy. Since the signal is larger in some proxies than in others, the distribution takes a different shape.

Figure 5. Correlation distribution in a group of proxies where each proxy has a random strength signal added to it (same signal, different amplitude). Again, colored areas show approximate 95% CI.

Now, with that as a prologue, I felt that I had a pretty good grasp of the possible kinds of correlation distributions I might find in the M2008 proxies. So, I went to take a look. The first thing I did was to calculate the yearly averages. I wanted a long term record, so I took all of the M2008 proxies that extend from 1001 to 1980. There are 95 of these proxies.

I took the 1001-1980 data from these 95 proxies. I standardized them all to a mean of zero and a standard deviation of one. Then I averaged them, as shown in Fig. 6, and standardized the result. Note that the recent average is ~4 standard deviations from the mean.
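The standardize-average-restandardize step is simple; here is a minimal sketch (my own, for illustration):

```python
import statistics

def standardize(x):
    """Scale a series to mean 0, standard deviation 1."""
    m, s = statistics.fmean(x), statistics.stdev(x)
    return [(v - m) / s for v in x]

def stacked_average(proxies):
    """Standardize each proxy, average across proxies per year, restandardize."""
    z = [standardize(p) for p in proxies]
    avg = [statistics.fmean(col) for col in zip(*z)]
    return standardize(avg)
```

The restandardized average is what is plotted in Fig. 6; the “~4 standard deviations” remark refers to that final series.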

Fig. 6. Standardized average of 94 M2008 Proxies

Hmmm … that looks suspiciously like the original MBH98 results, the infamous “HockeyStick”. So, I applied my new method to see how much of a common signal there was in the M2008 proxies, and which proxies contained the signal. Here is the result of playing “hide the signal” with the M2008 proxies.

OK, what are the anomalies in that distribution? The first thing that caught my eye was the three proxies close together at the top. Quite odd.

The next is that there are two clear groups of proxies. The first group is the first 70 proxies or so. It has lower correlation values, with a signal/noise ratio of around 1/5. This is followed by about 25 proxies in odd-shaped clumps with relatively higher correlation values, with a S/N ratio of up to 1/3.

I wondered if this represented different types of proxies, like ice cores or trees. I also wondered what proxies were at the upper right. Fig. 10 shows that breakdown:

Figure 10. Red is lake sediment, black is tree rings.

No obvious pattern, except that the top three are lake sediment, and the top 25 are mostly tree rings. Here is the legend for the top 25 proxies:

Mystery solved. The three in red at the top, and one further down, are all the Tiljander lake sediment series, which are known to be corrupted.

Once we remove the four Tiljander proxies, it is obvious that the whole edifice is built on a few closely related high-elevation, moisture limited pine trees located in the southwestern US. These tree rings make up no less than 19 of the 21 remaining top proxies after Tiljander is removed. In other words, the bristlecones are back and with a vengeance.

I guess the deal is that no self-respecting paleoclimate reconstruction would be complete without the bristlecone pines (PILO), which make up no less than 12 of the remaining top 21 (after Tiljander is removed). In addition we have the bristlecone’s cousins, the limber pine (PIFL) and the foxtail pine (PIBA). All of these records are from similar ecosystems and contain similar signals. The overwhelming majority were collected by Graybill. His work has been called into serious question by Linah Ababneh’s thesis, wherein she was unable to replicate his results.

If I ran the zoo, I’d throw out all of those high altitude pine tree ring records. They are known to have problems, their use has been recommended against, and the principal investigator’s work is under a cloud. I would omit them.

After removing the Tiljander and the southwestern US pines, the average looks like this:

Figure 11. Long term M2008 proxies without Tiljander and southwestern US high-elevation pine proxies.

Note that the hockeystick shape has disappeared entirely. The range now runs about ±3 SD, a more reasonable result than the +4 to -2 SD obtained with the bristlecones.

Having removed the Tiljander and the southwest high-altitude pinus spp, I looked at what created the pattern in the 73 remaining proxies.

So once again, the signal is dominated by a closely related group. These appear to be a group of Argentinian tree ring proxies.

Conclusions and discussion

Why do I find it suspicious that a simple average gives a hockeystick? What does this tell us about the initial choice of proxies for the M2008 analysis?

1. The longer term proxies in M2008 are totally dominated by the Tiljander and the 19 southwestern US “stripbark” pine proxies. It is those proxies, and those proxies alone, that create the Hockeystick shape found in the signal.

2. Both those groups of proxies have been discussed in the literature, and have been found wanting. Sixteen of the 19 pine series are from Graybill. They are not valid proxies.

3. Once these two groups (Tiljander and pinus) are removed, the new signal is dominated by another group of related tree-ring records, this time from Argentina.

4. These groupings, and their dominance of the results, indicate a systematic problem with the initial selection of proxies. The problem is that a number of closely related records from one geographical area can easily overwhelm and dominate the common signal. This makes it clear that closely related groups of proxies should be averaged before inclusion, to prevent the domination of the common signal.

5. Correlation distribution analysis is a useful tool for determining whether a group of proxies contains a signal in common, and which proxies contain the signal.

6. At some point, after Steve figures out Mann’s method, the proponents of Mann’s work are sure to claim that the hockeystick signal is really there, regardless of the method used … yes, it really is there, but only in the Tiljander and bristlecones. Garbage in, garbage out …

7. Even when/if we can finally come to agreement on the existence of some historical common signal in the proxies, we will be faced with a new question … what does that signal represent? Temperature? Moisture? Some combination of both? Neither? Here there is no clear answer of any kind.

My best to everyone,

w.

=============================================

APPENDIX:

The R function that does the work of the correlation distribution analysis looks like this:

The function expects a listing of the column ID numbers (which identify the columns in the matrix “x”) to be available as a variable called “colpoint”. The variable “rampfactor” contains the values of the correlations of each of the proxies. The variable “removed” contains a list of the ID numbers of the proxies in the order in which they are removed.

Not tested for turnkey, but it should be plenty to spell out the workings of the correlation distribution analysis.

Bravo! Publish it! But, if you were a “real” climate scientist, your 1-page code would take at least 30 pages and use 12 temporary files. I hear Gil Grissom left CSI, maybe you can apply for his job, Willis.

Willis, very simple analysis, clearly described, devastating results. I predict peer review rejections on the grounds that it would not be of interest to the journal subscribers. Please do not resubmit, or submit to any other climate journal.

Re: John Norris (#12), If you look a couple of places in the code, Willis has an ampersand character, followed by “lt” and a semicolon. This is a code for the “less than” character, which would screw up WordPress if put in directly. Amusingly, in his note, WordPress converted this code into the actual character.

WOW! Great sleuthing, Willis. Awesome! But, I suppose Mann et al. are probably now working on Mann, 2018, which will use the same “important proxies” plus some new ones. For his sake, I hope Linah Ababneh has not yet published her results by 2018.

What is so pretty about Willis’ work here, is that he publishes the code, shows exactly how the method behaves with different noise structures and how well it can detect a signal in data under different assumptions. How’s that for demonstrating a method? Very clean and nice. The opposite of hand-waving. Re: thefordprefect (#13), In your graph the data are rising from 1850; the black horizontal line is just an arbitrary baseline for computing an offset.

Nice work. I myself was thinking about piddling around with some of the first things you tried.

One thing, wouldn’t it be most useful to use this method to analyze dominant signals in particular groups of proxies? For instance, tree-ring proxy data can be affected by any number of things, CO2 percentage, precipitation, temperature, etc..etc.. If the method of finding a global temperature record is going to rely on multiple types of proxies (presumably to gain accuracy), wouldn’t this method be best used to identify the dominant signals in various types of proxies *BEFORE* they are combined into a historical temperature record? The temperature signal is going to take time to find in each of them, so this method that essentially plots out the correlation depending on proxy should help explain to those looking to analyze the dominant signals what is actually happening in tree rings, or what is the strongest trend in ice-cores, or sea-sediment plankton counts, etc…

Also, since this method helps you identify the proxies with the strongest correlation to an assumed signal couldn’t an inverse fourier method be used on specific proxies (as identified by this method) and the results compared to known periodic oscillations in precipitation, temperature, etc..? This might eventually tell you exactly what signal you’re seeing (if we get lucky).

The original data is available as R tables (thanks to Steve M.). The data itself is here, and the details (type of proxy, authors, etc.) are here. Use an R statement like:

manndata = load(“mann.tab”)

manndetails = load(“details.tab”)

to extract the data once the files have been downloaded.

thefordprefect, you ask, does the hockeystick in this data match the instrumental record? Well, no … but then the original hockeystick didn’t fit that well with the instrumental record either. However, this hockeystick is very similar to the original hockeystick. See MBH 98 or the IPCC TAR for the original. In addition, the rise in the data you show above begins, not in 1980, but in 1880 or so.

John Norris, the WordPress software running this blog doesn’t like “less than” symbols, it thinks that they are the start of some HTML tag and chokes. It has replaced them with ” & lt ; “. See line 11 of the R code above, which says:

x[,correls<0]=-x[,correls<0]

You’ll need to replace the ” & lt ; ” with the actual “less than” symbol. (Unfortunately, it translated the < symbol in my note, but not in the R code).

Arn Riewe, you say:

Even if the Tiljander, Bristlecone and Argentinian proxies were not tainted, wouldn’t that point to a regional rather than global result?

Yes. To me, it means that the duplication of very closely related proxies is skewing the result. Out of the 95 M2008 proxies that extend from 1001 to 1980, about 20% are bristlecone and allied pine proxies from the southwestern US. When you stack the proxy deck like that, the region with the most proxies takes over and you get a regional result.

thefordprefect, you ask, does the hockeystick in this data match the instrumental record? Well, no … but then the original hockeystick didn’t fit that well with the instrumental record either. However, this hockeystick is very similar to the original hockeystick. See MBH 98 or the IPCC TAR for the original. In addition, the rise in the data you show above begins, not in 1980, but in 1880 or so.

I apologise for this, but I honestly do not see the rapid rise in your Fig 6 occurring at the same time as the Mann temperature curve. Unless I am looking at the wrong thing, the rises begin with a 50-year difference. The rest of the dips and bumps are aligned well, so it’s not an offset in the whole data set. Surely if the signal you are finding is “THE hockey stick” then it should overlay the Mann data exactly?

I’m sure I read somewhere (and Lucia in 78 alludes to this) that if the proxies did not match the instrumental record then they were downgraded – The instrumental record from 1900ish (? my guess) onwards must be the gold curve and if the proxy does not fit then it is not valid at this date – it could however, match date for date with an earlier sequence on another set of proxies and consequently may have some validity.
An overlay of your fig6 and Mann’s

Remember my statement, that my result was “very similar to” the original hockeystick. Are they identical? No, I didn’t say they were. I would not expect them to be, as they contain different proxies and are calculated in very different ways.

Given that they are based on different proxies and are calculated in different ways, however, makes the similarity to the original hockeystick all the more surprising. Part of the difference is likely due to the presence of the four Tiljander proxies. These skyrocket starting a couple hundred years ago or so (IIRC, I don’t have the originals handy), and thus they likely drag the final “blade” part of the stick up earlier than if they were absent … but I digress.

My point in mentioning the hockeystick shape of my results was that:

1) we get a hockeystick shaped graph, which is very similar to Mann’s original “Hockeystick” and contains many of the same features (e.g. dips in ~ 1350, 1450, and 1700, etc.), by simple averaging of all of the proxies in Mann et al. 2008 which cover the period 1001-1980, and

2) the blade of that hockeystick disappears entirely when we remove two sets of proxies which are known to have problems (Tiljander and southwestern US “stripbark” pine spp., the problems of which have been discussed both in the literature and on this site.) The blade does not become smaller when that is done. It disappears.

This confirms the strong dependence of the original Hockeystick on the stripbark pines of the southwestern USA, which has been discussed extensively on this site. It has continued through all of the subsequent “independent confirmations” (which were neither independent nor confirmatory) of the original hockeystick. And it has continued up to the present, with the M2008 paper. This dependence on a single group of proxies makes a mockery of all the claims of those studies to be “robust” or to apply to the globe or the NH. They are neither robust nor global, they are at best a measure of the SW USA, and likely not even that.

My only criticism is that you’re writing this here instead of in a journal! Get it out there; this seems to be a valid criticism and it deserves peer review.

Why publish here? I am a self-taught amateur scientist. As such, my knowledge is extensive and need-driven rather than comprehensive. For all I know, UC or Lucia or Koutsoyiannis or one of the heavyweights in statistics will come along and say “Oh, that’s Jacobsen’s method, see “Journal of Arcane Statistics 1957”.

Plus if there’s errors, I’d rather find them here …

In fact, I’m hoping someone does say it’s a known method, it will save me the trouble of re-writing this for a journal. Have to redo all my graphs in b/w, re-write it in some much more ponderous style, you know the drill.

And yes, I plan to submit it if in fact it is a new method of proxy analysis. I’m also taking recommendations on where to submit it if that is the case.

Willis E, I think you have described the essence of layperson participation in the discussions and analyses of scientific papers and methods — and taken it a step or two further.

I, being in no position to do a comprehensive evaluation of your effort, await those who are. Regardless of the outcome, you are to be commended for your efforts and for making this blog an even more thought-provoking place to participate.

I can’t help feeling that Willis’ work belongs in a Journal of Statistical Climatology and not a blog. Especially since he claims to have created a novel statistical test or set of tests which have yet to be described and replicated.

Perhaps of interest is that the standard deviation increases at about 1850, just at the point that measured temperatures are available.

Just out of interest here is a different temp graph for Oxford UK (figures unmodified from Met office). Temperatures normalised on a month by month basis to the average of the same month in the period 1960 to 1990. No clever processing on the 5-year average – it is just an average!

As you can see, it’s reasonably flat until 1982, then a steady rise (a bit of a dip from 1878 to 1895)

What spurious off-topic point are you trying to make and why is this graph relevant? From your earlier post, it also appears that you didn’t look at the time scale on the hockey stick structure of e.g. figure 11 in Willis’ post when you compared it to the graph in 13. Otherwise you would realize that the global average graph is at the tail end of Willis’ graph (yes, it shows rising temperatures from about 1880) and represents only the “blade” end of the stick, not the entire stick.

The Oxford graph is possibly different in that it compares the monthly average temperature against the monthly average, averaged over the reference period 1960 to 1990. I assume the usual is to average the yearly temperature over the reference period and then subtract this from the current yearly average. This method shows no rise until perhaps 1931 – nowhere near the 1850 claimed in the main article.
You say “represents only the “blade” end of the stick, not the entire stick.” (blade=shaft?) which is not what I understood by the comment:
“Fig. 6. Standardized average of 94 M2008 Proxies
Hmmm … that looks suspiciously like the original MBH98 results, the infamous “HockeyStick”. ”

RomanM, the “colStats” R function allows you to collect statistics from a matrix on a column by column basis. It has the form

colStats(x,whichtest)

where “x” is the matrix to analyse, and “whichtest” is the name of a function to apply to each column.

You can use one of the standard R functions, or a user-defined function, for “whichtest”. In my case, I use a function I defined that calculates the correlation of a column with the row means of all of the columns.

In R, you can type “?colStats” in the console to get the full definition.
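In other words (a rough Python analogue of the idea, not the R function itself), `colStats` is just a per-column apply over a matrix:

```python
import statistics

def col_stats(x, whichtest):
    """Apply `whichtest` to each column of a row-major matrix (list of rows)."""
    return [whichtest(list(col)) for col in zip(*x)]
```

For example, `col_stats(x, statistics.fmean)` returns the column means, and passing a user-defined function that correlates a column against the row means reproduces the per-proxy correlations used in the analysis.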

Re: Willis Eschenbach (#26)
Thanks. I had looked in R and could not initially locate it. I looked deeper and found it in the fUtilities library, which needs to be loaded before the function can be accessed.

But isn’t this just cherry picking? You find the hockey stick pattern and then remove it, justifying the removal after the fact. The real work is proving that the removed proxies should be removed before the fact, independent of what shape they have.

What if there were no known issues with Graybill or Tiljander? You are just picking a shape you want to remove and then removing it.

But isn’t this just cherry picking? You find the hockey stick pattern and then remove it, justifying the removal after the fact. The real work is proving that the removed proxies should be removed before the fact, independent of what shape they have.

What my Correlation Distribution Analysis shows is where amongst the proxies the common signal is located. It does not say “remove it” or “don’t remove it”.

There are two separate issues here. The first is, from whence ariseth the hockeystick? My analysis shows that the HS is located entirely in two groups of proxies, Tiljander lake sediments and SW US stripbark pines, both of which have large known issues. If Mann et al. had done their homework, and listened to the various statistical authorities, they would not have included either of them at all. No cherry picking there, nothing after the fact, the issues were well known beforehand and were detailed on this site and elsewhere long before my analysis.

My analysis does not show that those proxies should not be used. It merely identifies those proxies as the keys to the result. If you remove those proxies, the hockeystick disappears … and there are excellent ex-ante reasons to remove them, which are known and discussed in the relevant literature. As far as I can tell, the only reason that the AGW adherents keep using the stripbark pines is that they know that without them, the hockeystick disappears … but I digress.

The second question has to do with the location of the remainder of the signal after Tiljander and stripbark proxies are removed. Again, we find that much of the signal is located in a closely related group of proxies, this time in Argentina. This raises a separate issue — what should be done with a bunch of related proxies from a single (or geographically small) location.

My own feeling on this issue is that in general, such groups should be averaged beforehand, and only their average should be included in the stack of proxies. Regardless of whether the proxies are valid or not, adding 20 or so closely related proxies from the SW US, and another seven closely related proxies from Argentina, stacks the deck quite heavily regarding the final outcome.

Regardless of whether the proxies are valid or not, adding 20 or so closely related proxies from the SW US, and another seven closely related proxies from Argentina, stacks the deck quite heavily regarding the final outcome.

Isn’t there a discussion about how to deal with averages over known subgroups in “The Mismeasure of Man”? I seem to recall one “proof” that northern Europeans had superior cranial capacity involved averaging skull size over samples, with qualitative conclusions based on the notion that the ethnic groups with the larger skulls would have higher intelligence. The “Northern European” sample consisted almost exclusively of male skulls. Some other groups’ samples included female skulls, and in one case almost all female skulls. So, in that case skull size was the proxy. You have to be careful about weighting to get an average.

That’s essentially what I was wondering. But I was wondering if you compared the statistical procedure in Willis’ method with PCA, just what relationship is produced. Perhaps Steve or UC or Jean S is already at work on this.

Isn’t the CDA method assuming that there is only one signal of interest in the data (the row means) and quantifying how much of it each proxy contains? So the assumption is that the data contains one signal plus noise.

I always understood that PCA would identify and extract multiple independent signals from the data and would quantify how much each of these signals contributed to the variance of the data.

I always understood that PCA would identify and extract multiple independent signals from the data and would quantify how much each of these signals contributed to the variance of the data.

Close, but not exact. The components are only uncorrelated, not independent. Independent Component Analysis (ICA) is used to extract independent components, and generally requires higher order statistics such as kurtosis or negentropy.

What I like about this is that it has correctly found the outliers without any prior knowledge of the reason that they are outliers. I don’t have sufficient stats knowledge to make any comment on the methodology, and it will be interesting to see follow-ups by other stats experts to confirm its robustness as a tool (no offence to Willis) i.e. peer review. I can see a myriad of other applications for it coming soon. Well done.

As it stands now, many people unfamiliar with the twists and turns of Mannian statistical sleight of hand won’t really get the significance. This has to be reduced in a crucible, almost to soundbite level to be effective on a broad scale.

Isn’t there a discussion about how to deal with averages over known subgroups in “The Mismeasure of Man”? I seem to recall one “proof” that northern Europeans had superior cranial capacity involved averaging skull size over samples, with qualitative conclusions based on the notion that the ethnic groups with the larger skulls would have higher intelligence. The “Northern European” sample consisted almost exclusively of male skulls. Some other groups’ samples included female skulls, and in one case almost all female skulls. So, in that case skull size was the proxy. You have to be careful about weighting to get an average.

Very true.

All that my method of correlation distribution analysis does is identify the sub-groups. What to do with them is another question, about which there is substantial literature.

My bozo method would be to identify each group and think about what they represent, and why they might be so similar, and whether they are responding to moisture, or temperature, or what. Look deeper into the mystery.

For the purposes of initial identification of further subgroups in the Mann dataset, on the other hand, I’ll just average all the Argentine patagonian cypress proxies into one single proxy, standardize it, and re-run the analysis with the Argentinian cypress getting one vote instead of seven … but time, I need more hours in the day. This is very much a work in progress.

First, I have not preselected the proxies except for requiring that they have data from 1001 – 1980. In particular, I have not excluded wildly heteroscedastic datasets, which probably should be done. Some of them have data points eight or more standard deviations away from the mean … riiiiight … but I left them in, I wanted to see how it flew with hardcore nasty data first.

Next, note that the correlation distribution analyses above, both with and without the Tiljander/stripbark datasets, indicate some kind of common signal across the datasets. It matches a signal to noise ratio of about 1:5, assuming random distribution of the signal. I’m quite curious about that signal. Will it disappear when the Argentine cypress are averaged out? I don’t think so.

I say that in part because I have run my function in reverse … a curious concept. What I have done is I took the list and flipped it (greatest to least correlation) and removed them in that order. I set it up to graph the remaining proxies as it calculated each one, to make a simple kind of flipbook movie.

Starting with the hockeystick shape as shown in Fig. 6 above, the removal of the top proxies quickly removes the hockeystick. After ten are gone, the modern data is even in amplitude with the early data. After the top 25 are removed (Tiljander/stripbark), we get Fig. 11 above.

But beyond that, the shape doesn’t change much for a while. The shape after removing the Argentine cypress is not that much different from before. Certain of the features are maintained through a good part of the remainder of the dataset.

This suggests an interesting method for extracting the common signal. This would be to pare the proxy dataset from both ends … the proxies that don’t make a difference to the common signal, and the ones that really make a difference. Leave the ones in the middle …

How to do that, and does it improve things? I think that progress can be detected by using the correlation distribution analysis technique. At present, it shows a curve which matches a randomly scattered signal with a S/N of 1/5. Some proxies have more of the common signal, and some have less. We want a smaller, better correlated dataset, in particular one with a high S/N ratio.

Perhaps, flip the function around and pare from the other end (greatest correlation with RM) and stop at some predetermined point. I haven’t graphed what that looks like yet … instead of increasing correlation, you’re removing it. I suspect that the last third or so of the dataset will be some real weirdo proxies, they’ll be real different from each other, the outliers … and that’s valuable information as well. It would not surprise me if some of the wildly heteroscedastic proxies ended up in that group. Yes, always more to do.

Outstanding piece of analytical tool development.
Have you looked at the anomalously low figures once you have stripped out the US pines and Tiljander lakes? I am not sure what a point with less than no signal means, but presumably it means a negative correlation with the “hidden” signal. Is there any evidence that these proxies are unreliable? More when I have thought about this in more detail, although I confess to being a rank amateur at statistical analysis.

Actually, by graphing Oxford’s instrument-measured temperatures, ‘thefordprefect’ points out exactly the problem with the Tiljander, Graybill, etc. proxies that make the hockeystick the hockeystick. ‘thefordprefect’ is using a regional record as a substitute for a global average; he is saying ‘look at Oxford, that proves the global rise in temperature!’ when in fact all it proves is the temperature in Oxford. Likewise, all the Argentinian tree rings and SW US tree rings indicate growth occurring in particular years in a regional area, and the lake sediments indicate deposition during particular years in a particular lake.

This results in the message from the other 75 or so records that make up the bulk of the global proxies being drowned out by the shouting from these few.

As with many of the others, I am in awe at what you are reporting here. Not only does it appear to be new, it also seems very sensible. I particularly appreciate your follow up comment (Willis Eschenbach (#41)) because it clarifies things about where to go next.

However I have 3 thoughts.

First, I would be very curious to see the breakdown between (initial) positive and negative correlations. I understand why you are taking the absolute value in this test, but I wonder if there is not some kind of worthwhile additional information to be found if we note, by some manner or other, which proxies were positively and which negatively correlated to the final result.

Secondly, it occurs to me that the correct course of action is not to remove all suspicious proxies but to remove all but one. Thus, since you discover that 19 proxies come from trees in SW USA, you should simply take one of them as representative of that part of the signal. Ditto with mud in Scandinavia (? think that’s Tiljander’s origin?) and trees in Argentina. If you do that, what happens?

My expectation would be that you get something similar to fig 11 even with one representative proxy from Argentina, SW USA and Scandinavia but I don’t know.

Finally, assuming smart statistical minds can confirm that this seems like a sensible thing to do, running this test over the gridcells that make up GISS, HADCRU and the other instrument/satellite records ought to give us an idea of whether these records also overweight certain areas.

Willis:
I think Patrick M. has a point, but a point that can be readily checked. I think your approach is very elegant and easy to understand but it may in fact be very similar to various types of factor analysis.
Your conclusion, however, remains, even if the method is not as novel as it may first appear.
As to the cherry picking charge – this is silly IMHO. How else is one meant to isolate or understand the primary explanatory variables in a signal with many different components? The error comes in bundling a whole bunch of proxies together, assuming that you do not know anything about those proxies and that you do not have to worry about the quality of the individual proxies. There is a whole literature on meta-analysis techniques, and I am pretty certain this problem of a few dominant “proxies” or studies will have come up before.

It reminds this layman a bit of PCA, but in this case it might be called principal proxy analysis, PPA. In any case, “better” is irrelevant; it is what it is, and what it is is strong evidence that no matter what model year of hockey stick we look at, it is always made of the same wood, stripbark bristlecones. Mann is just putting new lipstick on the same tired old pig.

#52. Willis sent a draft of this post to me some time ago and I encouraged him on the topic. I asked him a question related to #52 – how this method tied into PCA (as opposed to why it was “better”).

As others have observed (e.g. John A), I’ve objected on many occasions to the use of “novel” statistical methods on contentious data sets to produce applied conclusions. What’s sauce for the goose is sauce for the gander and just because some readers “like” the results and believe that they are relevant for policy purposes is all the more reason for not jumping to conclusions based on novel methods.

The dependence of the most recent Mann version on bristlecones and Tiljander sediments can be identified by elementary methods – indeed, we’d already discussed this a while ago.

I think that it’s safer to regard Mann’s data set as an interesting example with perverse data that provides motivation for the method, rather than as an end in itself. If the method is to be useful, then, as John A and others observed, it has to be placed in the context of other methods without worrying too much about the outcome with Mann’s data set.

In this light, the plot of correlation distributions by proxy looks rather like an eigenvalue scree plot done from smallest to largest, as opposed to the usual convention of the other way around. Given that PCA is simply a singular value decomposition of a correlation matrix (covariance matrix), I can’t help but think that there is a mathematical connection of some type between the methods, and I think that it would be necessary to fully canvass this topic, starting with some simple experiments.


#53. One of the problems with Mannian data sets is that not everything becomes a “proxy” merely by being called one. For example, Mann inverted the Tiljander mud from the interpretation of the author. (Tiljander said that high 20th century sedimentation was due to such mundane things as ditches and bridge construction and had nothing to do with climate.) Mann’s data mining method blithely ignored this. Similarly with the Graybill proxies. Ababneh, as Willis observed, did not replicate Graybill’s results. On several counts, the Graybill chronologies cannot be relied on. The continuing addiction of the Team to Graybill chronologies is perhaps understandable but not justifiable.

Interesting technique, it seems like a good method to find the strongest signal. Please consider using the non-infilled dataset and redo the same analysis. I know it’s not like you get paid to do these things but I wonder how much signal was available before infilling.

I think you’re probably right that some signal is in these series but this quote caught my eye.

At some point, after Steve figures out Mann’s method, the proponents of Manns work are sure to claim that the hockeystick signal is really there, regardless of the method used … yes, it really is there, but only in the Tiljander and bristlecones. Garbage in, garbage out …

I am curious how much signal is left when it has not been artificially added in by a cut and RegEM paste. I haven’t had a chance to look at the proxy list you used to see the amount of infilling but it had a significant effect on the signal level in the total data set. I did a post last month where I was able to extract any pattern I wanted from the M08 data using CPS.

“Why is your analysis any better than PCA?”
This analysis shows directly which odd proxies (or groups of them) have the strongest impact on the result.
If you omit some groups of proxies from the rest of the data, you will notice little change.
That means that if you use, like Mann, some of these odd proxies, you will always get a certain result independently of how you combine the rest of the data. In this case the odd proxies should be removed to find a more universal result.

Congrats to Willis for his hard work. As someone on the outside, I guess I’m surprised that this is new stuff. The hockey stick is 10 years old. I would have assumed that it would have been pulled apart and had each piece carefully inspected a long, long time ago. [Of course, I once assumed that “peer-review” meant scientists had actually checked the work, too. Or that thermometers were placed properly. Or that IPCC pronouncements incorporated ALL the science. Or that IPCC participants followed published guidelines. Or that “scientists” provided details of their work so that others could check it out. Silly me.]

My question — is this post significant because the methodology Willis came up with is a new tool for analysis (useful in other contexts)? Because I thought we already knew that Mann’s hockey stick was dependent on garbage proxies.

Isn’t there a nicer way to average them though? Take the number of proxies, and imagine them distributed evenly across the globe. Measure the sum of the distance of one location to all the others.

Now, take the actual proxies and measure the distance from each to all the others. Divide the average sum by the sum for each, and you get the factor that each proxy needs to be multiplied by to find its proper contribution to the average trend (I think).

You may have to do it on land alone, and I’m not sure how to do that. You could ask, what is the smallest grid with a number of cells equal to the number of proxies, that covers the land surface of the world? Then average the sum for the centerpoints of that grid.

At any rate, even if the above is all wrong, I’m sure there are statistical methods for compensating for geographical clustering in a sample.
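[As an editorial illustration only: one way the distance-sum idea above might be sketched, with all function names mine. Note that the ratio is taken the opposite way round from the comment's wording, since a proxy packed into a cluster has a *small* summed distance to the others and should be down-weighted, while an isolated proxy should be up-weighted.]

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

def decluster_weights(sites):
    """Weight each site by its summed distance to all the other sites,
    scaled so the weights average 1: clustered sites get small weights,
    isolated sites get large ones."""
    n = len(sites)
    sums = [sum(haversine_km(sites[i], sites[j]) for j in range(n) if j != i)
            for i in range(n)]
    mean_sum = sum(sums) / n
    return [s / mean_sum for s in sums]

# Two clustered sites plus one isolated one: the loner gets the biggest weight.
weights = decluster_weights([(35.0, -110.0), (36.0, -111.0), (-40.0, -70.0)])
```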

I know you don’t really want to see it, but someone asked what was happening to CO2, and others criticised only one location (it was only an example!!), so a few extras are added. I still do not see how the “hockey stick” above agrees with the HADCRUT data – I thought the proxy data had been aligned with the instrumental record when it was available.

Congrats on an excellent analysis. Although Patrick M has a point, it is a bit moot. Whether the statistical methodology used is PCA or CDA or anything else, the point to be understood and appreciated here, in both Steve’s previous analysis and now in Willis’s excellent analysis, is that the Mann 2008 hockey stick, just like the MBH 98/99 hockey stick, is not robust to the removal of certain proxies, namely BCPs. This time Mann didn’t need to accidentally provide a CENSORED folder for us to find this out.

snip – please don’t make policy editorials. NExt time, the post will be deleted.

Is it a worthwhile internet exercise to create hidden signals in a few thousand pseudoproxies and see who can find the inserted signal?

I mean, this is what we’re going for right? Since the world wants an accurate historical temperature record, and since individual proxies can range on one axis from pure noise to strong (but unknown) signals, and on another axis from useful temperature signal, to completely useless… Would anyone even bother trying to solve this kind of a puzzle?

SETI astronomers have been looking for unknown (but presumed) signals from stars for decades. I’m sure their methods would translate over somewhat easily. The differences are they typically only analyze *ONE* data stream at a time for signals (narrow-band-emissions). They don’t pick/combine jumbles of a hundred or so from thousands of recordings from different stars where it isn’t known which of the thousands have a valid signal in them.

Now, take the actual proxies and measure the distance from each to all the others. Divide the average sum by the sum for each, and you get the factor that each proxy needs to be multiplied by to find its proper contribution to the average trend (I think).

You mean, sort of the way one might average a set of surface temperature measurements? 🙂

Obviously, no one estimating GMST would put a whole bunch of thermometers in one region, only a few in other regions and then weight by thermometer. Given the small surface coverage, it’s difficult to know quite what to do given the limitations on what is available– but some sort of rational weighting is needed. Or, failing that, one needs to explore how different rational choices of weighting affect the result. If different choices result in large differences in results, that reduces our confidence in any particular weighting.

If there is a “signal” in a 1000 series combined with “regular” noise, you can extract the signal in a variety of different ways and the results look pretty similar.

Although the Team loves to get all wrapped up in different multivariate methods – and 99% of all readers seem to adopt this premise – I don’t think that this is the main issue. As I’ve observed dozens of times, the problem is that everybody seems to want to skip the step of showing that bristlecone ring width chronologies are replicable temperature “proxies” (or Finnish varves) and then show that these things work out of sample on new data. Or explain why Yamal is a “proxy” and not Polar Urals Update. Until results can stand still for one proxy in one region, applying multivariate methods to networks of these things ends up accomplishing nothing.

Re: Steve McIntyre (#77),
Of course, your point is more important than how to weight the individual ‘proxies’. If some set of ‘proxies’ don’t track temperature, then their weight should be exactly zero. That’s true no matter how many non-proxies they have or how the non-proxies are distributed over the planet surface.

When one looks at Fig. 11, it does seem that there may be a temperature signal there, though a noisy one. The MWP and the LIA are there, and the modern upturn. It even seems likely that some of the historically known short cold episodes are visible (mid 12th century, 1690s).

Looking at the Oxford temperature record, my immediate reaction is that something happened there in ca 1995. Note the abrupt step change in minimum temperatures with no commensurate change in maximums. Looks like a siting or instrumentation issue to me.

thefordprefect: Willis published an analysis of Mann 2008 proxies. If you want to compare his graphs with Mann’s, that’s reasonable. If you do that I think you find they match quite well. But I don’t see the point in comparing them to instrument data trends unless you are trying to argue that Mann 2008 is invalid because it does not match either. Even then, that’s a discussion for a different day.

lucia and others who have questioned the weighting … I am in total agreement with Steve M. that people get too wrapped up in the details of the weighting. Before we get to the individual proxy weighting, before we consider geographical weighting, before we try to determine whether we should use CDA or some other method, we need to examine the proxies themselves.

Thats partly why I have taken this tack, precisely because there is no weighting in my method. I wanted to find out, if we take the weighting out of the equation, what can we find out about the proxies themselves? Which ones make a difference, that is to say which ones contain the common signal, and which don’t?

In this case (M2008) it is important because they’ve thrown everything into the mix – good proxies, bad proxies, things that seem to not be proxies at all, it’s a real mess. In part, I suspect that this was done so that they could claim “robustness”. Yes, if there are say 95 proxies in the 1001-1980 mix, the result doesn’t change much when you pull one or two of them out, or even take out all four of the Tiljander proxies … but does this make their result “robust”? And yes, if a fifth !! of the proxies that stretch from 1001 to 1980 are high altitude southwestern US pine proxies, you can pull out any five of them without much changing the result … does this make their results “robust”?

For those that want to compare my results with the modern instrumental temperature record, you’re misunderstanding what I have done. I have looked for the existence of a common signal in the Mann proxies. I have made no claims about what that signal might be. It might be temperature, it might be moisture, it might be some combination, it might be neither.

Nor am I making a new or original historical temperature reconstruction, we’re miles from that. I’m just trying to figure out what common signal exists in the M2008 proxies, and which proxies are most responsible for the shape of the signal.

My results are similar to Mann’s original hockeystick for a simple reason. They are both built on southwestern stripbark pine proxies. Take those out, and the edifice disappears. The similarity of the results also weakens the argument that there is anything special about Mann’s complex and abstruse methods.

Next, someone asked about using the non-infilled data. My understanding was that only the recent data was infilled, which is why I used the period 1001 – 1980.

Finally, I agree wholeheartedly with Steve M. regarding the use of novel statistical methods. That’s why I provided baseline results from applying my methods to a variety of random datasets, along with results from datasets actually containing known signals distributed in different ways. If Mann were to do the same with PCA or any of his novel methods, I’d look much more kindly upon them.

I make no sweeping claims for my method, I have thrown it out here for further examination. It seems to do a good job of identifying various proxies of interest.

I think your comments speak well to the issue that a blog post does not have to suffer the inhibitions of a published paper. You are also not making any claims beyond the revelation of your analysis procedure for comment at this blog.

Who’d a thunk that that Wyoming cowboy, instead of going into town or drinking coffee around the campfire, was pondering climate issues.

Let’s say the scientific validation of the proxies as representing temperature had been done in the first place. Is it then true that this technique would not be of great value for this problem? Or might it be of value in identifying regional anomalies?

In my opinion, the best contribution the author made to the discussion of global warming is at the very end of this article:

“Even when/if we can finally come to agreement on the existence of some historical common signal in the proxies, we will be faced with a new question … what does that signal represent? Temperature? Moisture? Some combination of both? Neither? Here there is no clear answer of any kind.”

To use any proxy as a thermometer substitute, we should have evidence that tree rings or anything else used as a proxy really measure temperature and that all other effects are pretty negligible. Otherwise the proxy, or even a real device such as a thermometer, shows something, but it is not the real temperature. That is why meteorologists place thermometers away from direct sun: to avoid radiative effects which may totally swamp the real signal, the temperature we would like to measure.

Could somebody find a correlation between thickness of rings and temperature? I doubt it. Just consider a thought experiment: what will happen if the temperature is colder (or warmer) but humidity is extremely low? I guess a tree will wither.

If there is a correlation, it should be proven in real experiments with trees, but such an experiment would require dozens of years, and it is a big question whether it would show anything.

snip – I am getting very tired of people editorializing on policy and politicians contrary to blog policy.

Congratulations to WillisE on a very clear exposition. While the principle that we should be cautious about ‘novel’ methods should always apply, there is a difference here. The real novelty here is that the method is simple, easily understood, and pared down to its essentials, while the methods used by Mann and his allies are ever more intricate and arcane, and fortified with impenetrable language. Even Jolliffe took years to come to grips with one part of it, his own specialty.

SteveM has done invaluable service by parsing Mann’s methods in forensic detail, and finding them wanting. What the debate also needs are more elegant alternatives. I always thought “surely there is a simpler way” and Willis has provided one.

The Willis analysis could easily have confirmed Mann’s if it had found that the ‘best’ proxies were independent analyses scattered around the globe. It did not.

Congratulations willis. Publish and be damned, as they say.
But make damn sure that you emphasise the misuse of Tiljander, the contribution of the Graybill series, which “should be avoided”, plus Ababneh’s failure to replicate Graybill’s results and the inclusion of other, questionable, moisture-limited tree ring series.
Oh yes, and what do they say about this over at RC? “There are none so blind as those who will not see…”

If I understand correctly, the correlation coefficient r between two data series (M data points each) can be interpreted as the scalar product of two vectors (created by centering and normalizing the data) in M-dimensional space, or the cosine of the angle between them. Would it make sense to use arccos(r) as an angular distance measure between proxies, and do some data clustering analysis (quality threshold, or somesuch)? I think this could identify groups of proxies carrying common signal.

If I understand correctly, the correlation coefficient r between two data series (M data points each) can be interpreted as the scalar product of two vectors (created by centering and normalizing the data) in M-dimensional space, or the cosine of the angle between them. Would it make sense to use arccos(r) as an angular distance measure between proxies, and do some data clustering analysis (quality threshold, or somesuch)? I think this could identify groups of proxies carrying common signal.

What I have done is a kind of clustering analysis. However, I don’t think that transforming the dataset as you suggest, using arccos(r) instead of r, would change my results.

I say this because all the arccos transformation does is map 1 ≥ r ≥ −1 monotonically onto 0 ≤ arccos(r) ≤ π. Yes, there is some “rubber-band” distortion, and the ordering is reversed (larger correlation means smaller angle), but nobody changes places. Adjacencies in one transform will still be adjacent in the other. Groups in one will be groups in the other. So my results would not be affected.
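[Editorial aside: the monotonicity point is easy to check numerically; a two-assertion Python snippet, not part of the original analysis code:]

```python
import math

r_values = [-0.8, -0.3, 0.0, 0.4, 0.9]      # sample correlations, increasing
angles = [math.acos(r) for r in r_values]    # angular distances, in [0, pi]

# acos is strictly decreasing on [-1, 1]: larger correlation, smaller angle.
# The ranking is reversed but never reshuffled, so groupings are preserved.
assert all(a > b for a, b in zip(angles, angles[1:]))
assert math.isclose(angles[2], math.pi / 2)  # r = 0 sits at 90 degrees
```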

Cluster analysis can indeed be a useful tool in this situation. There are several different methodologies for cluster analysis and each methodology has a variety of ways for defining “distances” to be used in the cluster process. It takes some practice to decide what method is the “best” for a particular situation.
You might wish to compare your results to the ones you get when using a hierarchical method with the distance between two proxies defined as 1 − |correlation|, where the larger the correlation, the “closer” the two proxies are. In this case, I used the average of all correlations between the proxies in one cluster and the proxies in the other to define the distance between two clusters. In R:
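[The R listing referred to here did not survive in this excerpt. As a stand-in, a rough Python sketch of the same recipe — distance = 1 − |correlation|, average linkage — written as a minimal hand-rolled agglomerative loop on synthetic data; all names are mine:]

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in proxy matrix: rows are years, columns are proxies.  Columns 0
# and 1 share a common signal, so they should end up in the same cluster.
signal = rng.standard_normal(500)
proxies = rng.standard_normal((500, 5))
proxies[:, 0] += signal
proxies[:, 1] += signal

# Distance between two proxies: 1 - |correlation|.
dist = 1 - np.abs(np.corrcoef(proxies, rowvar=False))

# Minimal agglomerative clustering with average linkage: repeatedly merge
# the two clusters whose mean between-cluster proxy distance is smallest.
clusters = [[i] for i in range(dist.shape[0])]
merges = []
while len(clusters) > 2:
    best = None
    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            d = np.mean([dist[i, j] for i in clusters[a] for j in clusters[b]])
            if best is None or d < best[0]:
                best = (d, a, b)
    d, a, b = best
    merges.append(sorted(clusters[a] + clusters[b]))
    clusters[a] = clusters[a] + clusters[b]
    del clusters[b]
```

With this setup the first merge pairs the two signal-bearing proxies, which is exactly the behaviour the dendrogram displays for the Tiljander and Graybill groups.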

Willis, congrats on a well-thought out novel approach. As mentioned by others, it is related to Multivariate Approaches such as PCA, Factor Analysis, and (in particular) Cluster Analysis.

Clustering techniques are particularly useful because they allow you to go beyond a one-at-a-time approach and look for optimal groups. I used to use so-called “fuzzy clustering” approaches to looking for outlier/influential groups in multiple regression residuals.

I don’t know if R has any fuzzy clustering capabilities, but it might be worth examining.

Your dendrogram sure as heck pulls the Tiljander group out, doesn’t it? And the Graybill stuff too.

And err a whole bunch of others. It seems like each author has a cluster – there’s a Thompson cluster, a Curtis cluster, the Argentine cluster…

I’m not enough of a statistician to understand what exactly the significance of this result is (and I can see the argument that nearby records of like thingies (trees, lakes etc.) will tend to be related) but combined with Willis’ work you really see how you can vary the weights of different regions and get different results.

In some ways this reminds me very much of Fourier analysis and transforms (or maybe a reverse transform). Perhaps this is not surprising; I think the mean of the Fourier components of a curve is effectively the curve/(no. of components)…

I don’t think of this as a transformation of the data. The dendrogram is just a nice visual way of displaying how similar (or dissimilar) the proxies are to each other, where similarity is defined as the strength of the correlation between them. Choosing another distance measure or another method of forming the clusters might produce a somewhat different picture.

By drawing a horizontal line across the graph at a given level, you “cut” the set of proxies into clusters (like bunches of grapes). Where to draw the line depends on how many clusters you would like, or how strong you want the similarity within clusters to be.

After removing the Tiljander and sw US pines, you may recall we found seven Argentinian cypress. Removing those, things look somewhat better …

Figure 13. Further decomposition of the M2008 proxies

Despite looking better, we find that the top four are again a group … this time of lake sediments. In fact, the tree rings seem to have lost their luster at this point; here are the contestants from the top down:

Once again, the top contestants are in groups. Now trees are in again, with two groups of four clearly visible: western juniper (JUCO) and bald cypress (TADI). I’m removing each group and recording the group memberships as I go. I’m surprised at how good the algorithm is at separating these proxies out into groups.

When I get done, I’ll average each group, re-standardize, and add all of the resulting average proxies back into the proxy dataset … my goodness, what a pile of work just to get down to a stack of individual data points.

In order to understand better what is going on in Willis’ correlation procedure, it is useful to look at it in a slightly different mathematical context. This can then give you a better ability to compare it to what other known procedures do.

From multivariate analysis considerations, it is not difficult to show that the correlation of a proxy with the mean of all the proxies is equal to the average of the correlations of that proxy with each other proxy (including itself). In math terms, it is the average of the row (which corresponds to that proxy) of the correlation matrix. With this relationship in hand, we can translate Willis’ program to operations purely on the correlation matrix (call it cormat).

1. Repeat a loop that does the following:
a) Calculate the row means (yearly means of a proxy matrix where the rows are years and the columns are individual proxies) of all of the “proxies” for each year. Let me designate this resulting dataset as RM, for the row means. It has one entry for each year, the mean (average) of all the proxy values for that year.

b) Invert (swap plus for minus) all proxies that have negative correlations with RM. (Since we are looking at the correlation with the average of all proxies, there are typically not many correlations which are negative). Repeat until there are no negatively correlated proxies with respect to RM.

Calculate the averages of all rows of cormat. If any are negative, flip the signs of all of the values in that row, then of all of the values in the corresponding column. This process can change previously positive row means to negative, so continue until none of them is negative. It is not immediately obvious to me that this is always possible, nor that it always leads to a unique sequence of row averages. That would need to be shown mathematically.
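[A Python sketch of this sign-flipping step, translated for illustration; RomanM worked in R, and all names here are mine:]

```python
import numpy as np

def flip_negative_rows(cormat, max_iter=1000):
    """Step 1 on the correlation matrix: while any proxy's row mean is
    negative, flip that proxy's sign by negating its row and its column
    (the diagonal entry is negated twice, so it stays +1)."""
    cm = cormat.copy()
    for _ in range(max_iter):
        means = cm.mean(axis=1)
        neg = np.flatnonzero(means < 0)
        if neg.size == 0:
            return cm          # converged: no negative row means remain
        i = neg[0]             # flip one proxy at a time, then re-check
        cm[i, :] *= -1
        cm[:, i] *= -1
    raise RuntimeError("sign flipping did not settle (see caveat above)")

# Example: three like-signed proxies plus one recorded upside-down.
rng = np.random.default_rng(1)
sig = rng.standard_normal(300)
x = 0.5 * rng.standard_normal((300, 4)) + sig[:, None]
x[:, 3] *= -1
fixed = flip_negative_rows(np.corrcoef(x, rowvar=False))
```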

2. Repeat a loop that does the following:
a. Calculate RM, the row means of the group of proxies.
b. Identify and record the proxy with the minimum correlation with RM.
c. Remove from the dataset the proxy with the minimum correlation. Repeat until there is only one proxy left.
At the end of this process, graph the minimum correlation values.

Determine which proxy has the lowest row average. Delete the corresponding row and column of the correlation matrix. Go back to the beginning and repeat the process in the reduced correlation matrix until you get down to the last one (actually the last two, since they both have the same row sum).
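The translated procedure above can be sketched directly on the correlation matrix. Here is a minimal Python version for illustration (the actual program discussed in this thread was written in R; the structure below is my own rendering of the two loops, not that program):

```python
import numpy as np

def cda_removal_order(X):
    """Run the removal procedure purely on the correlation matrix.

    X is a (years x proxies) array. Returns the proxies in order of
    removal, each with its correlation with the mean of the proxies
    remaining at the time it was removed.
    """
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each proxy
    C = np.corrcoef(X, rowvar=False)           # cormat
    labels = list(range(C.shape[0]))
    order, cors = [], []
    while len(labels) > 1:
        # Step 1: flip the signs of any row (and matching column) whose
        # row sum is negative, repeating until none is negative.
        rs = C.sum(axis=1)
        while (rs < 0).any():
            j = int(np.argmin(rs))
            C[j, :] *= -1
            C[:, j] *= -1                      # C[j, j] is flipped twice, so stays 1
            rs = C.sum(axis=1)
        # Correlation of each proxy with the mean of the remaining ones:
        # row sums divided by the square root of the sum of all elements.
        cor_with_mean = rs / np.sqrt(C.sum())
        # Step 2: record and remove the minimum-correlation proxy.
        j = int(np.argmin(cor_with_mean))
        order.append(labels.pop(j))
        cors.append(float(cor_with_mean[j]))
        keep = np.delete(np.arange(C.shape[0]), j)
        C = C[np.ix_(keep, keep)]
    return order, cors
```

On pseudoproxies that share a common signal plus one pure-noise series, the noise series is removed first at a low correlation, while the signal-bearing proxies survive with high correlations.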

Running this in R will give the order of removal and the correlation of the removed proxy at the time of removal. I didn’t do a lot of testing of the program, so if there are any bugs, I’d be happy to fix them.

This is the result on what I believe to be Willis’ 95 proxies. There appear to be some minor differences with Figure 7 but I couldn’t get Willis’ program to run to check them out.

Re: RomanM (#102)
Don’t ya just hate it when you make a little mistake. I slightly misspoke myself in the previous post. The row means of the correlation matrix are the covariances of the (standardized) proxies with the mean. The correlations are equal to the row sums divided by the square root of the sum of all the elements of the matrix. The corrected program is
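The corrected relationship is easy to check numerically. A quick sketch (in Python rather than R, with made-up standardized data; this is a check of the identity, not RomanM's corrected program):

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_prox = 200, 8
X = rng.standard_normal((n_years, n_prox))
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardized proxies
C = np.corrcoef(X, rowvar=False)            # correlation matrix
rm = X.mean(axis=1)                         # mean across proxies, per year

# Row means of C are the covariances of the standardized proxies with rm
cov_rm = np.array([np.cov(X[:, j], rm, ddof=0)[0, 1] for j in range(n_prox)])
assert np.allclose(C.mean(axis=1), cov_rm)

# The correlations with rm are the row sums of C divided by the
# square root of the sum of all the elements of C
cor_rm = np.array([np.corrcoef(X[:, j], rm)[0, 1] for j in range(n_prox)])
assert np.allclose(C.sum(axis=1) / np.sqrt(C.sum()), cor_rm)
```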

I’m curious about this. It’s not about trees, but it does deal with numbers and signals and trends and such.

There’s a pretty much flat global mean temperature anomaly since 1881 (hovering within +/- 0.2 of 0), and a trend of about 0.2 if each of the first 44 years has 0.2 added, the next 57 are left alone, the next 17 have 0.2 subtracted, and the next 11 have 0.4 subtracted.

Is there some sort of formula that may tell us if some number of years from 2009 would have to have something added to each to get them back around the zero line? Or even if the next period would be .6 (.2 more each period) or .8 (doubling each period).

Can it answer the eternal question: What was it about 1881-1924, 1925-1980, 1981-1997 and 1998-2008?

Roman:
So how might one evaluate the strength of a common underlying signal (possibly a temperature signal) as opposed to a common locale/common method “signal” using cluster analysis? It seems to me that the dendrogram suggests that there is no strong underlying signal because a relatively small change in the position of the cut-line quickly produces what appear to be groups of proxies by researcher/method/locale.

I don’t think that cluster methodology will necessarily be helpful in that regard, and its role should not be overestimated. In some ways, the results you see here should be expected. It would not be a good situation if proxies from the same location told wildly different stories. Whether the relationships are due to temperatures or something else is another matter. Properly summarizing the proxies and relating them to local conditions is a good start. Other formal statistical techniques, as Steve mentions above, are needed to evaluate the different problems of proxy estimation: cherry picking, divergence, etc.

RomanM:
Hmmm, I agree that same method, same location should cluster but if clustering just produces clusters on this basis don’t method variances swamp any possible common signal?

Steve:
Do you have a handy reference to Brown and Sundberg?

The more I think about this, the more daft the whole notion of clumping together these proxies appears. As Steve said at the beginning you first have to specify and perhaps isolate the temperature signal in a given proxy before any of this aggregation and splicing takes place. Otherwise I think we simply end up with another form of RM’s question of what is the average global temperature. Trying to figure it out is OK: Pretending to be definitive is a bit over the top.

When collecting/analyzing proxies, one would expect to find a certain sameness in the proxies. Otherwise, it would indicate that other factors may have corrupted the integrity of the proxies.
In Mann’s case, he went ahead and used certain proxies which did not exhibit ‘sameness’, and he selectively chose ones which supported the hockey stick shape.

The problems are actually somewhat different. We expect to find “sameness” in the proxies. However, putting a bunch of closely related proxies into the mix means that they are likely to be overweighted in the final result. That’s the first problem.

The second problem is that not all proxies are valid. Proxies like the Tiljander series and the “stripbark” pine species are widely known to be invalid. Mann uses them anyhow, because they create the “hockeystick”.

1. The division into clusters is done mathematically, and is very successful at identifying clusters of similar proxies.

2. It is expected that there would be similarities between proxies collected in close proximity to each other. However, including a host of them merely distorts the results.

3. A couple of these clusters contain known bad proxies (Tiljander and bristlecones). These should be removed. In addition, those proxies with no archived records should be removed.

4. I would remove any significantly heteroscedastic proxies as well. There are not a lot of them in these 95 records, from memory about a dozen.

5. I would replace all of the clusters which are from a single species and geographical area (e.g. Argentinian Cypress, Eastern USA Cypress) with the standardized average of that group.

6. In the final cluster called “Mixed Bag”, and in the two Speleothem/Sediment clusters, I would average any pairs of similar records from the same geographical area (e.g. the “lee_thorpe_speleo” pair, the GISP and Fisher ice core pair, the Curtis lake sediment pairs).

7. At that point, with distinct proxies from a variety of geographical locations, I would look at the correlation distribution analysis and the clustering analysis to see what remained, and consider my next step.
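For point 4 in the list above, one simple screen for heteroscedasticity (not necessarily the test used in “When Good Proxies Go Bad”; this function is a hypothetical sketch) is to compare the variance of the first and second halves of each record:

```python
import numpy as np
from scipy import stats

def is_heteroscedastic(x, alpha=0.05):
    """Two-sided F-test on the variance of the first half of a series
    vs. the second half. A crude screen, not a definitive diagnostic."""
    x = np.asarray(x, dtype=float)
    a, b = np.array_split(x, 2)
    f = np.var(a, ddof=1) / np.var(b, ddof=1)
    p = 2 * min(stats.f.cdf(f, len(a) - 1, len(b) - 1),
                stats.f.sf(f, len(a) - 1, len(b) - 1))
    return p < alpha
```

A proxy whose variance changes sharply partway through the record gets flagged; a record with stable variance does not.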

The classical approach to clustering in paleoecology is agglomerative and hierarchical (historically, this reflects the development of these methods from the phenetic school of taxonomy). Modern clustering techniques for very large data sets (e.g., genetic networks) are based on information theory and are typically neither hierarchical (i.e., they cannot be depicted as dendrograms) nor agglomerative; rather, they are based on network linkage patterns.
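For readers unfamiliar with the agglomerative, hierarchical approach, here is a small Python sketch on made-up proxies from two hypothetical “sites”. The correlation distance and average linkage are my assumptions for illustration; the analyses discussed in this thread used R:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
s1, s2 = rng.standard_normal((2, 300))      # two independent "site" signals
# three noisy proxies per site (hypothetical data)
X = np.column_stack(
    [s1 + 0.5 * rng.standard_normal(300) for _ in range(3)]
    + [s2 + 0.5 * rng.standard_normal(300) for _ in range(3)])

C = np.corrcoef(X, rowvar=False)
D = 1.0 - C                                 # correlation distance
np.fill_diagonal(D, 0.0)

# Agglomerative (bottom-up) clustering; Z can be drawn as a dendrogram
Z = linkage(squareform(D, checks=False), method="average")
groups = fcluster(Z, t=2, criterion="maxclust")
```

Cutting the dendrogram at two clusters recovers the two sites, which is exactly the "same location, same story" behavior discussed above.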

Cluster methodology was around before “palaeoecology” was an invented word and indeed it has been successful in picturing relationships in diverse fields – for example, data derived from DNA fingerprint studies in zoology and biology. The artificial intelligence movement produced some nice methods that go beyond cluster analysis.

One drawback is that cluster analysis contains essentially no information on cause and effect. It does not answer WHY clusters form or HOW FAR apart they should be.

In some DNA reconstructions (like the evolution of Man), people have tried to postulate a constant mutation rate and so calculate the ages at which dendrogram clusters separated. These ages can then be compared with other methods like isotope dating. But is this correct? Is there a different rate for each family/genus/species, etc.? Is the rate different in hot areas versus cold areas, if mutation is a heat-dependent reaction? And so on. The point is, the picture is pretty but the interpretation is not.

I suspect that some of what is seen in the above cluster diagram is the human desire to publish data that are in close agreement with each other when theory says they should be. Viewed another way, the clusters might be separating precision (close cluster induced by author) from accuracy (different author, more remote cluster).

And this is all before one considers the complications of global geography in the example you give. And any other real effects.

There seem to be two different claims running around this site and others. 1) Mann’s methods produce hockey sticks from noise. 2) The hockey stick is produced by bad proxies that are hockey stick shaped. These seem like two different claims. Is it one or the other being claimed by different people, or are both being claimed by the same people at the same time?

Re: Brian Macker (#123), it’s a bit more complicated than that! But, roughly speaking, both, since: (1) Mann’s methods are known to preferentially extract hockey sticks from basically noisy data which contains any hockey-stick-like features, and (2) the hockey stick in Mann’s reconstruction is known to be extracted from a small number of bad proxies that contain such features.

Re: Brian Macker (#123), thanks for your question. The two claims reflect two different problems.

1. In the original 1998 Hockeystick paper by Mann, Bradley, and Hughes (MBH98), there was a mathematical error. The effect of this error was to “mine” for hockey stick shaped data series. This error was pointed out by McIntyre and McKitrick, and it was subsequently fixed by Mann.

2. In MBH98 and in all subsequent studies that claimed to be “independent confirmations” of MBH98, proxies with known problems were used. In the majority, these are proxies from the SW USA, from bristlecone pines (PILO above) and other closely related “stripbark” pine species. These form the first group in my analysis at the head of this post, because of the huge effect that they have on the results.
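The “mining” effect in point 1 can be illustrated with a toy example: center a pure-noise matrix on the full period versus only on the late “calibration” period, and see which series dominates the first principal component. This is a simplified sketch of short centering, not Mann’s actual algorithm, and all the sizes are made up:

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_series, n_cal = 580, 50, 79
X = rng.standard_normal((n_years, n_series))
X[-n_cal:, 0] += 2.0          # series 0 gets a late "hockey stick" blade

def pc1_loadings(data, center):
    """First principal component loadings after subtracting `center`."""
    A = data - center
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[0]

full_center = X.mean(axis=0)            # conventional (full-period) centering
short_center = X[-n_cal:].mean(axis=0)  # "short" centering on the calibration period

w_full = np.abs(pc1_loadings(X, full_center))
w_short = np.abs(pc1_loadings(X, short_center))
```

Under short centering, the hockey-stick series sits far from its (late) mean for the whole pre-calibration period, so its apparent variance is inflated and it dominates PC1; under full centering it is just one noisy series among fifty.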

In this latest 2008 study by Mann et al. (M2008), there is an additional group of bogus proxies. These are the “Tiljander” lake sediment proxies, which are stated by the original researcher to be contaminated after about 1800 but were used by Mann et al. anyway.

Finally, there are individual proxies which have problems which should disqualify them from any reasonable analysis. These include the Tornetrask and Fisher Agassiz proxies.

The squabble over the methods and whether to use EIV or PCA or some other analysis misses the point for me. As I have shown above in Fig. 6, a simple average reveals the hockeystick shape in the proxies. However, it is equally simple to show that this hockeystick shape is entirely the effect of the bad proxies. When the bristlecones and the Tiljander proxies are removed, the hockeystick shape disappears entirely, as shown in Fig. 11.

There are, of course, other problems. Sometimes, if two proxies exist in a geographical area, only one is used. This is usually the one with a recent rise in the temperatures.

Another problem is the use of unarchived datasets. M2008 at least gives the data used, but for some proxies this is the only version in public circulation because the original dataset was never archived. Sometimes two or three different versions of the “same” proxy have been used in various studies which have claimed to “confirm” the hockeystick.

There is a more subtle question which remains, however, which is related to your first question above. This is the effect of any form of “matching” to the recent temperature record. The problem is that the instrumental temperature record contains a recent rise. Suppose we took a set of a thousand random pseudoproxies covering the years 1001 – 2000. We select out of those the ones that have a good correlation to the 150 years of the instrumental record, and we average them. What will the result look like?

Well, all of them will have a rise in the last 150 years, but be randomly distributed before that. When they are averaged, the random part (the first 850 years) will average out to a straight line, while the recent part will show a rise, and voila! The hockeystick appears.

This is why I have used a straight average in my analysis above, because if you use any method to fit random proxies to the instrumental record you will get a hockeystick.

If you truly want to use such a matching method, I suggest that what you need to do is apply that same method to a group of random pseudoproxies. Then subtract the hockeystick shaped result (which you will get from any method) from the results of applying the same method to your actual proxies. This will remove the inherent bias in the method.
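This screening effect is easy to reproduce. The sketch below screens white-noise pseudoproxies against a made-up rising “instrumental” series (the proxy count, noise model, and correlation threshold are all assumptions) and averages the survivors:

```python
import numpy as np

rng = np.random.default_rng(2)
n_years, n_prox, n_instr = 1000, 1000, 150
X = rng.standard_normal((n_years, n_prox))          # pure-noise pseudoproxies
# hypothetical "instrumental record": a noisy rise over the last 150 years
instr = np.linspace(0.0, 1.0, n_instr) + 0.2 * rng.standard_normal(n_instr)

# "Calibration": keep only proxies that correlate with the instrumental rise
recent = slice(n_years - n_instr, n_years)
cors = np.array([np.corrcoef(X[recent, j], instr)[0, 1] for j in range(n_prox)])
passed = cors > 0.1
recon = X[:, passed].mean(axis=1)                   # the "reconstruction"

early_mean = recon[:n_years - n_instr].mean()       # handle of the stick
late_mean = recon[-30:].mean()                      # tip of the blade
```

The pre-calibration portion averages out to roughly zero, while the screened-in noise produces a rise over the calibration period, and the hockeystick appears from nothing but noise.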

#125. Willis, I agree with most of this, but it’s not entirely correct to say that the PC1 error was ever really acknowledged. Mann’s posts at realclimate denied any error whatever and there has never been any backtracking from this. Tamino recently made a series of posts denying that Mann’s method was erroneous.

They argued that they could “get” a HS using conventional PC methods by changing the number of retained PCs in the North American network to 5 – which thereby included the bristlecones. (This is obviously not news to you, but I’m just rehashing a little.)

Rutherford et al 2005 used the incorrect PC1s, as did Mann et al 2007 – without a blink or an apology. They were also used in Hegerl et al 2007, Osborn and Briffa 2006, and even in the IPCC AR4 spaghetti chart (despite the protests of a couple of reviewers).

Steve, thanks for correcting the record. Amazing that after Wegman and everyone else has called uncentered PCA an error, they still all deny any bad math. I note, however, that they have given up on principal component analysis.

The reason for their addiction to the bristlecones, and to the Tiljander proxies, is made clearer by the following graph. It shows the cluster dendrogram from Fig. 14 above. It also includes the graphs of the average of the cluster over time. Note how most of the groups have no hockeystick shape.

Figure 15. Correlation clusters for all Mann proxies which extend from 1001-1980. Graphs at left show the 1001-1980 yearly average of all proxies of the linked cluster.

Can you get a hockeystick using PCA, without Mann’s error? Easily. Can you get a hockeystick using EIV, or CPS, or other methods? No problem. As Figure 6 in the head post shows, you can get a hockeystick by plain old garden variety averaging.

But can you get a hockeystick without Graybill’s bristlecones? Very difficult. That’s why Tiljander is newly added to the mix, to provide a hockeystick in the absence of tree ring data. Look at the shapes of the clusters, it’s all visible there.

My plan is to continue exploring the M2008 dataset. My next move will be to remove the heteroscedastic proxies (per my previous post, “When Good Proxies Go Bad”). I know that will get rid of the Tiljander proxies. I will also remove all of the Graybill data, for the reasons outlined above.

Next, the practice of including multiple proxies from the same or very close locations seems unjustifiable. People who wouldn’t consider doing that with instrumental records seem to find it acceptable in proxies. I don’t. Take a look at the first two proxies in the “mixed bag” section above. They are a C13 record and an O18 record from the same location. They are more like each other than they are like any other proxy in the record. I would take the average of them (and other similar pairs) and keep that average in the dataset. I would not include both. The same is true for, say, the West Coast USA Juniper group or the Argentinian Cypress. I would remove the group and leave the average. Having seven very similar records from Argentina skews the results.

Having removed or averaged the relevant proxies, I will then re-run both my correlation distribution analysis and the cluster analysis, and see what the remaining common signal might look like. Bear in mind that we have no idea what that signal might represent, if anything.

None of this will make it online soon, however. One of the joys of living on a tropical island is boat trips. One of my friend’s sons is getting married on an island a couple hundred miles away. We’re taking his boat (25 foot, twin 135 hp outboards) island hopping out to the Western Province for the wedding. We’ll stop along the way at a small raised coral barrier island that he and I have some kind of traditional title to, spend some time there, then run out west to the wedding.

And although I have my little Coleman inverter with battery clips to run my computer off the boat’s battery … there’s no phone and no net on the small island. Where the wedding is to be held there’s internet … sometimes …

Part of what I love about living here is that I get to see the daily occurrence of the tropical thunderstorms. Those marvelous heat engines are what keep the earth from getting too hot. And in the boat, I get to see the storms up close and personal. So for the next few days, I’ll be cloud-watching. (I tell my wife that it’s “meteorological research”, it sounds much more grown up. Strangely, she remains unconvinced.) So if you don’t hear from me for a few days, it’s because I’ve got my feet on the ocean … and my head in the clouds.

Finally, my thanks and appreciation to everyone who is contributing to this thread. To those asking questions, to those answering questions, to those providing information and insight, and to the lurkers, my very best to you all. Keep the questions and insights coming, I’ll be back soon.

Figure 15 provides a lot of explanatory power in a small package. Really good.

I know you commented above that we shouldn’t get so caught up in the weighting, but for people like me who think “weighting?” right away, the figure would provide even more explanatory power if some measure of weighting could be added.

I briefly described the technique in the base note to a math professor who is an expert in signal processing. He was not familiar with the technique; he saw no flaws in it at first glance. That is not a studied evaluation, just a first impression. He offered two fields where different data collection methods are sometimes combined: the US Dept of Defense and medical imaging. Hope this helps.

Regarding your comment (127) that (centred?) PCA with bristlecones can give the hockeystick shape: it looks like you’d have to fiddle with the data somewhat, even so. The following link is to a presentation by Edward Wegman. Slide number 15 compares the original vs. the centred PCA method, and centring seems to have flattened the hockey stick completely. Keep up the good work.

To review the bidding: when last heard from I was about to head out in a small boat to attend my friend’s son’s wedding. It was to be held on another island a couple hundred miles away (300 km). I’m Willis, I wrote the head post of this thread. And this is a tale of a South Pacific wedding.

A beautiful morning, up before tropical dawn, sky almost clear … except off the far distance between the launch ramp and the nearest island, one lone renegade thunderstorm that had refused to die from the night before. I reckon it was night-adapted, and was sitting where some warm currents met.

We put the boat in the water just before sunrise, floated off the trailer and headed West. Things were going swimmingly, we ran at thirty mph (45 kph) for about an hour, boat was feeling good. Then the alarm goes off, “Low oil reserve”. Say what? Oil injected two stroke engine, we check the oil reservoir, it’s working perfectly, full to the brim. Engine is running well, but we decide to err on the side of caution and come back to port. Looks like we’re flying out west to the wedding. Better than drifting out to sea for the wedding. Ah, well, it was another part of life’s rich pageant, and a beautiful one. This morning, I thought the dawn thunderstorm would build up with the day, but to my surprise, it faded away quickly with the coming of the morning warmth. Always something new to see. And any morning spent messing about in a small boat in a vast ocean is a very good morning.

Plus I got to fly over and around the thunderstorms instead of boating under and around them.

So, we flew to the wedding. The airstrip is on a tiny, airstrip sized island just offshore from the town. The town is small, about five thousand people, despite which it’s the second largest city in the country, which tells you something. It is a seaside provincial capital in a country where everyone travels by canoes large and small. The town stretches a couple of km along the harborside on both sides of a single road, with calm water everywhere in the harbor. On the sea side of the one unpaved road it is mostly Chinese shops built on land and extending out over the water on pilings. Tied up all along the waterfront are boats of all sizes, from the inter-island trading ships tied up at the main wharf, to the small paddle and sailing canoes tied up everywhere from the main market to the far end of town.

When we got to that little town, where the wedding was to be held, everybody knew about the wedding. All the guys in the boat taking us from the airport to town knew about the wedding. Social event of the season. For me it was a rare chance to watch the social whirl while being outside of it. I’m well known in that town; I lived for three years on a nearby island, and I’m considered to be part of my friend’s family.

The wedding, like all island weddings, was a long and complex affair. My friend (the father of the groom) and I had absolutely nothing to do with putting it together, it was all “sait blong Meri” (“side belong Mary” means women’s business) so his wife was in charge, we were just out of town guests. My friend’s house is up on a hill overlooking the harbor and the vast surrounding ocean, you can see over a dozen islands from the house. It has a roofed second story verandah that extends clear around the house, tropical style. We sat up on his verandah “watching the whole thing come down in perfect harmony-y-y-y-y”, as Taj Mahal sang. They were putting stakes in the ground to hold up the bunting at the entrance to the reception.

Now, everything in the South Pacific islands happens by consensus. As a result, we watched as it took a minimum of 3 to 5 people to put a stake in the ground. No sledgehammer, of course, so they found a stone. Much laughter, comments and speculations, jokes about still being in the stone age. After the stone didn’t work, my friend’s oldest son, about 35 now, a bear of a man, came up with what looked like the crankcase of a long dead bike, gears and all. More laughter. He gave a few strokes of the gearbox, enough to show how it’s easily done … if you have biceps the size of my thighs, as he does. He then handed the gearbox to some younger guy, probably a relative, and walked away. His part was done.

The younger relative staggered a bit, the gearbox was heavier than it looked, he set it on the ground. This was followed by a lengthy discussion among the assembled stake advisory group, about where, and how, much of which is inaudible from where I’m sitting up on the verandah. Different folks squat and look at the stake. Under their direction, the gearbox man pulls the stake back to vertical. He gives it a stroke. Then they decide it’s in the wrong place. It is pulled up and held several places, finally settling a few inches from where it started.

Then the gearbox man gets the nod. He’s young and strong, but clearly not a man to make weighty decisions like when to whack a stake. He gives the stake a few more strokes, and sets the gearbox down again. The sun is blazing hot. Someone (not the gearbox man) pulls the stake back to vertical. More laughter. More strokes. Now, clearly, the stake is almost deep enough to hold. The question of whether it is in fact deep enough to hold brings much hilarity and extended discussion. The decision is finally made for a few more strokes of the gears.

Unfortunately, during the discussion, the gearbox man has walked away to do something else. After some time and further discussion, another man is selected, and he picks up the gearbox. Another couple strokes. More discussion. Finally the verdict is in. The stake is good. For now. And next … next, we watch them put in the second stake.

This second stake had the additional needs of being level at the top with the first stake, and parallel to the first stake. Well, if you thought the first stake discussions had been spirited …

All this time, a constantly changing cast of lovely folks has joined and left the stake advisory group. In the South Pacific, the number of advisors rises proportionally to the complexity of the problem. With the added challenges of being level and parallel added to the stake problem, a larger group of advisors was inevitable. The opinions of people walking past were solicited and discarded. Soon, it was clear that there were factions developing among the advisory group. Some advised moving the first stake. Someone would tilt the stake to the right, then the gearbox man would get the nod. Then once he set the gearbox down, someone else would tilt it back to the left. Another couple of blows.

And all of this accompanied by the laughter, and the comments, and the total lack of any sense of hurry that make the South Pacific such a great place to live. Yes, in fact, it did take a group that varied between five and ten people at least an hour to drive two stakes in soft soil … but I tell you it was an hour spent in joyful pursuit of a social interaction that had absolutely nothing to do with productivity. It was a pleasure and an honor to have the opportunity to watch them contribute their part to making a fun reception for everyone.

My friend’s kids and their inlaws were out in full force. Every single one was there, the oldest son in from his house on the island the holy man gave my friend, then the groom, and my friend’s youngest daughter who is a lawyer. His most traditional son and his wife are in from the family’s ancestral village on another island. My friend’s two youngest sons (aged 21 and 24) are there, and his oldest daughter and her husband and kids. Scads of grandchildren. People had already built the floors of the pavilions for the grooms and brides parties on the hillside outside my friend’s house. They put up the tarps, and they started to wire up some lights so they could work after dark … but by then it was heading towards dark, and people had put in a full day. Then everyone went home.

The work wasn’t done, and the wedding was going to be the next day. I scratched my head … my friend and I had another beer, watched the sunset over the ocean.

Then, this being the South Pacific, after a couple hours a half dozen guys came back and to the accompaniment of much joking and horseplay, they wired up the lights after dark using flashlights … I nodded my head. Had a glass of water. When the lights were finally wired up, they continued working into the night, putting the tarps up over the floors in case of rain.

Next morning they had the wedding. Assuming that our compadres would find us there, and that a stiff drink was likely the proper foundation on which to start a wedding day, my friend (the father of the groom) and I wandered down to the bar of the only hotel in town. Finding both assumptions true, we sat on the second story there and watched what seemed like the entirety of the little town stream by on their way to the wedding. It was 10 AM, and already hot. I had on my suit, but I hadn’t put on the coat and tie. The wedding was to start at 11:00. At 11:00, one of his kids called us to make our appearance. I tied my tie, put on my suit coat, and went out to face the music. We walked the few blocks to the church. It was very hot by eleven.

Of course, this being the South Pacific, it was a false alarm, they weren’t ready for the father of the groom yet. But about then his wife drove up with the truck. We parked it across from the church and sat in air conditioned splendor watching the folks arrive, until they did require the father and mother of the groom. At which point they went to get pictures taken, the bride looked ravishing, everyone was duded to the max, all of the four groomsmen were my friend’s sons, the angelic looking young ring bearer was his grandson. And no angel, I might add.

A charming gentleman was at the front of the church. He asked which family I’m with. I say the groom’s. He directed me to the left side, where I found a spot directly under a fan. But this is the South Pacific, so after a while the same man came up to me and said he’s sorry, but I’m on the wrong side. I look around, he’s right. I get up, go round, find a spot near a fan on the other side, and continue to sweat.

All the dignitaries were there. The Premier of the Province. Provincial Members. The local holy man who gave my friend and I an island was there. His eyeballs always look like they might spin like pinwheels. Interesting guy. The church was jammed, packed to the rafters, with people standing outside and people looking through the windows.

The officiating minister gave an alternately impassioned and inaudible sermon, with the impassioned part blasting out of the poor overdriven church speakers in an almost incoherent stream. We stood. We sang. We sweated. We sat. We said “amen”. Mercifully, it was over soon. Everyone streamed out laughing and fanning themselves with the programs.

The next three hours were consumed with setting up for the reception. The hot stone motu cooked food was brought to the central location. Last minute adjustments were made to the bunting. The wedding cake was set up under the cake tent. People started streaming in, every family bringing one and often two big trays full of food. People milled around, talking story, laughing, chasing flies off the food.

Since it was time, my friend and one of his many sons and I went down to pick up the holy man, who is in fact a British OBE as well as being very strange. He’s bought an old dive boat that he travels and lives on. I went on board. It’s like a floating village. I mean it’s just like a village, people sitting around, things hanging everywhere, boxes scattered around on deck, general disarray, bunches of fruit hanging off of the winches. There he was, chewing betel nut and with his hair totally frizzed out. He remembered who I was. He got in the truck with his people, and my friend’s son and I rode in the back of the truck up the rocky, rutted hill to the wedding. He got out and was led to his easy chair. Notable people sat in plastic chairs, there were enough chairs for maybe forty people, hundreds of others stood or sat on the ground.

Then when everyone had arrived the speeches began. They were MC’d by the groom’s maternal uncle, who I guess is the highest ranking man in that branch of the family. Entranced with the sound of his voice, he spoke too softly to hear, and even those who could hear him were not paying attention. The PA system wasn’t much help either, but with maybe three hundred people there, all of them wanting to talk to each other, the PA system didn’t really have a chance.

I wandered around in the crowd, sometimes able to hear the father of the bride, sometimes not. I drifted around the tables where some food was already placed. When I came around the far side, I found a long string of kids sitting on a very long, low bench made of boards laid on beer crates. I sat down with them to watch the show. I could see why the kids liked it, you could hear the PA system, you could see everyone at the bride’s and groom’s tables from there, plus you could see all the people in the chairs and standing round. I sat down at the end of the line of kids.

After the bride’s father finished his speech, the MC gave a rambling monologue and turned it over to my friend, the father of the groom. He gave a much more rambling and slightly inebriated monologue about family. He thanked everyone, he mentioned a whole bunch of people in particular, pointing them out in the crowd, but they’re hard to see.

Then he mentioned my name, and pointed me out. But because I can see everyone, everyone can see me. Everyone in the seats looks over at me, the only adult sitting in the sun on a twelve inch high bench with a bunch of school kids, the crazy gringo. People standing up behind the seats look over at me. Heck, even the whole row of my six to ten year old ex friends, at the end of whose line I was sitting in peaceful anonymity, turns as one child and looks at me, their eyes wide. I tip my hat and smile to one and all, pull the brim down over my eyes, and study the ground until my friend mentions other people in the family and the carnival moves on …

I laughed about that as I drifted back through the crowd. Timing is everything, I thought. I went back up to the second story verandah. I could feel a thunderstorm brewing. The MC kept MCing, and he had almost gotten to the food part, when the father of the bride decided he wanted to talk again. I could smell the rain was coming, and he wanted to talk some more. Finally he gave up the microphone, and the people were loosed on the tables of food. I heard a couple more peals of thunder.

Now, there were three serving tables. The MC had decided that table 1 was for the honored guests to get their food; table 2 was no pork (for the Seventh-day Adventist Christians); table 3 included pork dishes. But he announced this in a very confusing and inaudible way, and he was wrong anyhow, so everyone lined up for table 3 and the line stretched until forever … I sat on the verandah and watched. Some people got some food and sat down to eat. Food was delivered to the bride’s and groom’s tables and they began to eat.

Meanwhile, I could see the thunderstorm bearing down on us. I could feel the thickening of the air, the rise in humidity before the rain. Breaking my vow not to try to influence events in any direction, I went downstairs and out to the groom’s table, and asked the groom’s mother if she realized that the thousands of dollars worth of wedding presents were approaching inundation. She said yes. I said OK. I had given the happy couple cash in hand, so I wasn’t worried, my present was safe. I went back up topside to the verandah to watch things unfold. Meanwhile, the lightning and thunder had gotten closer.

Soon the first raindrops came. Fortunately, it started out light. This gave people a bit of warning. One of the groom’s sisters started to move presents. Some other people joined her, and as the rain increased, the tempo increased. Just before the last presents made it under the house, the sky burst open, pelting rain, with lightning and thunder blasting insanely close by all around. Half a second or less from the flash to the boom. It was so overpowering that a few people just covered their heads and whimpered at the intensity of the storm, but most everyone jumped up and vanished. Disappeared. It was amazing how fast people could move when impelled by driving rain, lightning and thunder. Within about ten minutes, all of the people were gone. Not only that, but every scrap of food on the place had vanished as well. All the food that had been put on the serving tables was picked up on the fly as people fled the storm; boxes were emptied into cars; every woman found her own pots or plates, folded them up, and vanished. The family stayed, and the folks under the tarps were OK, but the tent over the wedding cake was starting to go, and it was only screening on the sides. Two young guys grabbed a tarp and wrapped it around the tent to keep the cake dry, and they stood there in the pouring rain cracking jokes until someone found a piece of rope to tie the tarp down.

The bride and groom, and the families, and me as well, were glad that the rain had driven the people away. It saved having to pry them out of drunk corners at midnight. After the rain, it was clear again and cool. And besides, there was still the custom part of the marriage to attend to, so the bride went home with her father and her people to get ready for that.

Then the women of the groom’s family all painted their faces, and took some particular tree branches. They, and the men (except for the groom) all piled in the trucks and drove to the bride’s home. There, the women all yelled and waved branches and shouted for the bride to come out, to come with them to her new home.

The father of the bride came out. He explained that first off, he had returned two of the three bands of shell money that was paid as bride price, along with all of the cash money. This was because the bride’s family is from an island where the land passes through the matrilineal lines, and the bride’s mother is a princess in that line. So the first thing the bride’s father said was, even if she did go, unlike in most marriages here, she was not giving up all membership in her tribe to join her husband’s people. As symbolized by them returning two of the three bands of the shell money bride price, she retained some rights in her own line.

And then, with that over, he said what custom demanded, which was that in any case none of that mattered ’cause she wasn’t going anyhow. She was just too precious to them, it was all over, she was their little snowflake, they couldn’t let her go, forget about it, no way it would happen, the nice ladies from the other island might just as well go home.

At this, the women from the groom’s side redoubled their screaming, and they danced a threatening dance, with the branches held as though they were bird wings. Back and forth they danced, chanting some ancient half-understood chant. Then the women all rushed the bride’s house, where against token resistance the women physically picked her up, and carried her to the truck. They drove home screaming at the top of their lungs all the way, louder than I’d have thought possible, and when they arrived back home they once again picked the bride up, about six women picked her up and carried her into the house where the groom was and put her down. Everyone cheered, the bride and groom beamed in a kind of abashed fashion, and at last all of the marriage festivities were over.

Now that, I thought while sitting on the verandah enjoying the evening, was a South Pacific wedding.

So, that was my week, and the reason I haven’t posted. We now return you to your regularly scheduled programming.

I pondered on mathematical questions in between beers and marriages, and did some work on my laptop. I’m preparing a new post. The short version: of the 95 M2008 proxies that cover the period 1001-1980, a bit less than a third are heteroscedastic (p less than 0.05), a bit more than a third are homoscedastic (p greater than 0.95), and a third are in between. Care to guess who is in the heteroscedastic pile? Graybill, Tiljander, Fisher Agassiz, Dongge, mongolia-darrigo, Thompson Quelcayya, Tornetrask, all the usual suspects.

I haven’t read extensively, but in all that I have read I’ve never seen it mentioned that plants ingest CO2. As such, to me it’s not surprising that independent of temperature, terrestrial plant characteristics will in part depend on the amount of CO2 in the atmosphere. Of course, temperature could also have an effect. But by examining “tree rings”, how do you separate the influence of temperature from the influence of atmospheric CO2 levels? Isn’t it possible that tree rings are mostly a proxy for the amount of CO2 in the atmosphere? If so, “tree-ring” correlation with CO2 levels says nothing about temperature–one way or the other. Furthermore, if true, a change in atmospheric CO2 levels would result in a change in “tree-ring” characteristics; and by examining “tree-ring” data you would conclude that, yep, CO2 levels have increased in recent times–duh. It’s likely that there are tests to separate the effects of CO2 levels and temperature on “tree-rings” and all I’m demonstrating is my ignorance; but if no such test or tests exist, then to use “tree rings” as a proxy for temperature, and in particular to claim that atmospheric CO2 levels govern atmospheric temperature, is patently ridiculous. Any comment?

Re: Reed Coray (#134), Yes, well, the “what does the CO2 represent” argument is ever present and ignored by the team. Since they have a correlation with temperature, they dutifully attribute it to temperature, ignoring CO2. You cannot separate CO2 and temperature in a non-linear system in which multiple inputs are correlated with each other. Heck, by definition CO2 is causing temperature rise… plant food and all that. Since PCA does not ascribe origin (can’t), the team has decided to make its own rules in this regard.

Re: Reed Coray (#134), thanks for the comment. You ask if the tree ring proxies are responding to CO2 rather than temperature.

From my perspective, I think I can say a few (very few to date) things about the M2008 proxies that extend from 1001 to 1980 (the group I have been analyzing).

1) There appear to be two signals in the proxies, and the hockeystick shaped one appears to be bogus.

2) Once we remove the few proxies that create the hockeystick, there appears to still be a common signal in the proxies.

3) We don’t have a clue what that signal might represent.

For some of the tree ring proxies, CO2 might (or might not) be a factor. We simply can’t tell yet. For other kinds of proxies, this is less likely, but still possible.

But truly, we don’t know what it means when we see a host of proxies that take a dip in e.g. 1650. Are they responding to temperature? To moisture? To CO2? To the acidity of the rain? To some combination of the above? To none of them?

I am currently doing further research on the question of what the signal in the proxies looks like. But as to what the signal might mean, what it actually represents … further deponent sayeth naught.

Willis: Great story about the South Pacific. My father-in-law spent some time there, and he tells very similar stories. But he was there 65 years ago, during WWII! Sounds like things truly do not move very fast on some of those islands!

I would say yes, the signal is a combination. The plants are responding to some combination of temperature, moisture, carbon, rain pH, strength and number of hours of sunlight, cosmic rays, wind, animal waste, cloud cover, atmospheric pressure and of course, (of course!) the x factor.

The trick is decoupling them all, isn’t it. Perhaps that is not possible to any reliable degree.

Re Reed Coray #134, and subsequent 135-138,
I did try including CO2 in regressions involving the MBH99 treering proxies considered by Li, Nychka and Ammann (Tellus 2007), as discussed in CA thread “More on Li, Nychka and Ammann”, especially comment #32.

I found that including CO2 greatly weakens but does not eliminate the apparent significance of temperature for the famous bristlecone-loaded PC1. Temperature does not significantly affect Patagonia with or without CO2. CO2 slightly weakens the significance of temperature for Fennoscandia, and completely knocks it out for Urals. I concluded,

So although CO2 does not always eliminate or even reduce the effect of temperature on treering growth, it often does. Accordingly, the uncertainty bounds of a proper CCE temperature reconstruction based in whole or in part on treerings may be substantially increased by including this factor.

Second, I read Hu McCulloch’s thread “More on Li, Nychka and Ammann”. Most of the statistics discussion in that thread is beyond my ability to comprehend at this time. However, I do have one comment on how such information is presented. I worked in the military-industrial complex for 36 years and I thought we had a patent on “acronymism”, but I was wrong. If you don’t beat us, you sure come in a close second. I know (a) your blogs are primarily meant for a discussion among people familiar with the field, and as such acronyms are an efficient tool in the exchange of information, and (b) constructing an “acronym list” is a considerable effort. However, the global warming debate is being argued at the layperson level as well as the expert level. Although I shudder at the thought, it might turn out that lay people will ultimately decide global warming policy, not scientific experts. Lay people may comprehend a minuscule portion of the information your blogs are attempting to convey, but that portion will be enhanced if the “mystery of the acronyms” is removed. So my suggestion is that somewhere in each blog (or in a URL reference within each blog), the blog editor include an acronym list for all acronyms appearing in that blog. I know this is extra work, but it might be time well spent.

Too many years, it seems, since you’ve missed the tree (revealed by Mike Davis) for the military-industrial complex, which has it all over climate science WRT acronyms. The latter isn’t even close, IMO, based on 44 years in that former arena.

Re: 144 and 145. Thank you Mike; and John, you’re right on both counts. When it comes to acronyms, climate change doesn’t hold a candle to the military-industrial complex; and I can’t see the trees for the forest. I’m a dummy. All I had to do to find that information was do a search for “ACRONYM”. Sorry.

“Variance” is a mathematical measurement of how much a dataset (say a historical temperature record) varies around its average value.

“Heteroscedasticity” in a record means that the variance in the record changes over time. Homoscedasticity means the opposite, that the variance in the record doesn’t change over time.
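For readers who want to kick the tires, here is a minimal sketch of the kind of test involved. It is not the exact code behind the figures; it is an illustrative variant of the Goldfeld-Quandt idea (compare the variance of the early part of a record against the late part with an F test), and the 20% middle drop is my assumption, not a value from the post:

```python
import numpy as np
from scipy.stats import f as f_dist

def gq_pvalue(series, drop_frac=0.2):
    """Goldfeld-Quandt style test for changing variance over time.

    Split the series into early and late segments (omitting the middle
    drop_frac of the data) and compare their variances with an F test.
    Small p suggests variance grows over time (heteroscedastic); p near 1
    suggests it shrinks; middling p is consistent with homoscedasticity.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    k = int(n * (1 - drop_frac) / 2)   # size of each end segment
    early, late = x[:k], x[-k:]
    # ratio of the two sample variances follows an F distribution
    # under the null hypothesis of constant variance
    F = late.var(ddof=1) / early.var(ddof=1)
    # one-sided p-value: chance of an F ratio at least this large
    return f_dist.sf(F, k - 1, k - 1)
```

A record whose variance balloons toward the end (a hockeystick blade, say) will score a tiny p here, while white noise will land somewhere in the middle of the unit interval.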

In a previous post, “When Good Proxies Go Bad”, I gave examples of both types of proxies. I also discussed the theoretical reasons the heteroscedastic proxies should not be used for temperature reconstructions.

Some of the M2008 proxies are so extremely heteroscedastic as to be ridiculous, such as the Tiljander proxies. Some of these contain data points six standard deviations away from the mean.

I looked at the distribution of the scedasticity of the 95 Mann et al. 2008 proxies which span 1001-1980 (Fig. 1). I found a fascinating and most unusual distribution. More than half of the proxies fall in the two end bins combined, from p = 0.0 to 0.01 and from 0.99 to 1.0. There is little grouping apart from that; the rest are spread out pretty evenly, with the exception of a small clump at the left end (0.0 to 0.05).

In fact, it extends further than that. Even when we go down to p less than 0.005 and p greater than 0.995, there is still almost half the data in the two ends.
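Tabulating the tails is straightforward. A sketch, assuming you already have one scedasticity p-value per proxy (the function name and the default cutoff of 0.005 are mine, chosen to match the discussion):

```python
import numpy as np

def tail_fractions(pvals, cut=0.005):
    """Fraction of proxies in the extreme heteroscedastic tail (p < cut),
    the extreme homoscedastic tail (p > 1 - cut), and the middle."""
    p = np.asarray(pvals, dtype=float)
    lo = np.mean(p < cut)        # extreme heteroscedastic end
    hi = np.mean(p > 1 - cut)    # extreme homoscedastic end
    mid = 1.0 - lo - hi          # everything in between
    return lo, mid, hi
```

For the M2008 set described above, the point is that `lo + hi` stays near one half even at this very tight cutoff.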

I wanted to see what the proxies at the extreme left end of Fig. 1 looked like, so I plotted the annual means of just the proxies with p less than 0.005. The result is shown in Figure 2. Here is a list of those proxies.

Graybill, Thompson, Tiljander, Fisher Agassiz … google any of those within site:climateaudit.org and you’ll get plenty of information about them. The list also overlaps extensively with the “Graybill” and “Tiljander” clusters revealed by the previous cluster analysis (above).

When I plotted the extremely heteroscedastic proxies, I also wanted to see the effect of including all of the proxies in the middle of Fig. 1. To my surprise, adding the 49 mid-range proxies to the extreme proxies didn’t make much difference. Oh, the blade of the hockeystick goes higher in the extremely heteroscedastic group. But tripling the size of the initial group hasn’t changed the overall shape.

Figure 2. Yearly averages of two groups of M2008 proxies. Blue and (smoothed) orange lines are the averages of the proxies which are heteroscedastic at p less than 0.005. Black and smoothed yellow lines are those proxies plus all other proxies with p less than 0.995. Orange and yellow are smoothed with a 31 point centered Gaussian filter.
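For those who want to reproduce the smoothing, here is one way to build a 31-point centered Gaussian filter like the one used for the orange and yellow lines. The caption does not state the sigma of the kernel, so the width/6 choice below (roughly three sigma per half-window) is an assumption:

```python
import numpy as np

def gaussian_smooth(y, width=31):
    """Centered Gaussian filter of the given (odd) width.

    The sigma used in the original figures is not stated;
    width / 6 is assumed here so the kernel tapers to near
    zero at the window edges.
    """
    half = width // 2
    t = np.arange(-half, half + 1)
    sigma = width / 6.0
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()            # normalize so a flat series stays flat
    # mode="same" keeps the output aligned with the input years;
    # the first and last half-window of points are edge-affected
    return np.convolve(y, kernel, mode="same")
```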

Finding this quite interesting, I then proceeded to look at the other, right-hand end of Fig. 1. I plotted the extremely homoscedastic proxies. And once again I added in the 49 mid-range proxies to see what difference they made. Figure 3 shows the results. Once again I was surprised. Again there was little difference between just the 25 extremely homoscedastic proxies (p>0.995) and the larger group which also includes the 49 mid-range proxies. Once again, the dataset size has tripled with very little change in the average. Odd.

Figure 3. Yearly averages of two groups of M2008 proxies. Blue and orange lines show the proxies that are homoscedastic at p greater than 0.995. Black and yellow lines are all proxies with p greater than 0.005.

Again, while there is a bit of a difference at the start of the record, overall the two groups track quite closely.

Now, I find this a curious outcome. The Goldfeld-Quandt test for heteroscedasticity has neatly divided the data into three groups – two groups that make a whole lot of difference, and one much larger group that doesn’t make much difference when combined with either of the first two smaller groups … most strange.

I suspect (but as this is an ongoing investigation I don’t know) that several things are at play here.

First, a fairly small (N=20-25) subset of proxies is driving the hockeystick shape in Fig. 2, and another group of the same size is driving the shape shown in Fig. 3.

The second is that there are still pairs, triplets, and clusters in the data. One thing I’ve found out in this excursion is that when there is no weighting on the individual proxies, the result can be driven by a small number of similar proxies. For example, there are a bunch of Argentine Cypress records that both cluster analysis and correlation distribution analysis show to be very similar. This large number of Argentine proxies will obviously contribute disproportionately to the end result. They should be replaced with their average.
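Mechanically, replacing a cluster with its average might look like the sketch below. The data layout (a dict of yearly series) and the cluster naming are mine for illustration; the hard part, deciding which proxies belong in a cluster, is exactly the human-judgment step discussed later in the thread:

```python
import numpy as np

def collapse_clusters(proxies, clusters):
    """Replace each cluster of similar proxies with its yearly average.

    proxies  : dict of name -> 1-D array (one value per year)
    clusters : list of lists of names that should count as one record
    Proxies not named in any cluster pass through unchanged.
    """
    clustered = {name for members in clusters for name in members}
    out = {name: v for name, v in proxies.items() if name not in clustered}
    for i, members in enumerate(clusters):
        # one synthetic record per cluster, so seven similar Cypress
        # series cast one vote instead of seven
        out[f"cluster_{i}"] = np.mean([proxies[n] for n in members], axis=0)
    return out
```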

The third issue, of course, is the lure of the signal … the lure that drives scientists mad. One possible explanation for a middle group which doesn’t make much difference to either end group is that there is a signal which is common across all of the proxies. The heteroscedastic proxies agree with the rest in the early part of the record but go mad after about 1800. So when we add in the middle proxies, only the recent end changes …

But like I say, all of that is speculation. My main conclusion out of this is that yes, the heteroscedastic proxies should be excluded from historical reconstructions. I will be proceeding under this assumption.

Once I remove the heteroscedastic proxies, my next job is boring, and it can’t be done by machine. I have to go through the cluster analysis, find the pairs and triplets and clusters that should be averaged out, and see if I can finally winnow this astounding pile of proxies down to something real.

Further reports as time and the tides allow, best to everyone, I wish you thunderstorms, with sunlight far reaching on the sea …

Can I suggest an alternative to averaging out the members of a proxy family? The problem with that approach, as I see it, is that it requires a certain amount of human judgment and so opens up possible accusations of cherry picking. You might average several South American tree rings of a given species together. But would you also average in the same species of tree at a geographically distant point? Or would you average in geographically close proxies of a different kind?

How about doing things in the same way as they are done for the Instrumental Temperature Record? After all, the purpose of a proxy reconstruction is either to compare it with the instrumental record or to use it in lieu of it.

So, instead, average together all the proxies in each given 5° x 5° grid-cell. In that way a large number of proxies in one place can’t weigh the results too heavily.
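[The grid-cell idea is simple enough to sketch. The function below is an illustration of the suggestion, not anything from the post itself; the record layout and the 5° default are assumptions for the example:]

```python
import numpy as np
from collections import defaultdict

def gridcell_average(records, cell=5.0):
    """Average proxies within each cell x cell degree grid box,
    then average the boxes together (one vote per occupied cell).

    records : list of (lat, lon, series) tuples, series as 1-D arrays
              on a common set of years.
    """
    boxes = defaultdict(list)
    for lat, lon, series in records:
        # integer box coordinates: everything in the same 5x5 box
        # shares a key and gets averaged together first
        key = (int(np.floor(lat / cell)), int(np.floor(lon / cell)))
        boxes[key].append(np.asarray(series, dtype=float))
    cell_means = [np.mean(members, axis=0) for members in boxes.values()]
    return np.mean(cell_means, axis=0)
```

So ten bristlecone series crowded into one box would count the same as a single lake-sediment record sitting alone in another box.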

Re: Nick Moon (#149), you ask about averaging. I use averaging in lieu of any weighting system. This is because I have no evidence at all about the makeup or the significance of the signal. So I have no reason to think that one proxy is “better” than another.

In addition, as I showed in the head plot, if there is a signal, it is revealed by averaging. If we knew more we could likely extract the signal better, but we don’t. For my money, if it can’t be seen in an average, we’re getting into sketchy territory. And in this group of proxies, averaging does reveal a common signal.

You ask why not do it like the instrumental record … we can’t, because we have no common scale. With temperature records, I know that 30°C in Vladivostok is the same as 30°C in Tahiti, and a ten degree drop in either one means the same. With proxies, all we have are records which may or may not contain a signal, and which do not have a common scale.

You also talk about cherry picking. I have not done the job yet, but I’ve been thinking about ex ante criteria for averaging.

There are some easy cases. If there are two (or more) samples from the same location, of the same type, by the same investigator, I’d average them. Start from there and work up to the hard cases. I’m doing an investigation, not looking for a final answer yet.

The key to me is the similarity of the cluster, and its location among the other clusters. For instance, the proxies in the Argentinian Cypress cluster (above) are all from nearby sites, and have a very distinctive shape. They form their own cluster, and it’s the second to last cluster assimilated (Graybill is last, at the top).

Now, I see nothing wrong ex ante with the cluster … but clearly we don’t want to have seven of them balanced against single proxies from other geographical areas.

From what I have seen, I don’t think the details of exactly who is averaged will make a whole lot of difference. Once you pull out the heterogeneous proxies (or alternately the Graybill and Tiljander clusters), the overall shape seems fairly robust. But, I haven’t done the shovel work yet. I will do it a bit at a time, so I can compare the effects of various actions.

Re: BarryW (#150), you ask about trying various cutoffs. I’ve played with it quite a bit. As Figure 1 shows, the majority of the signal is in the ends of the histogram. Where you put a cutoff point matters very little. You see in Fig. 3 above the results of setting the cutoff at 0.005 and 0.995, and they differ very little.
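[One way to see for yourself that the cutoff matters little: compute the group average at two different cutoffs and compare. A sketch; the names `pvals` and `proxy_matrix` are placeholders for one p-value per proxy and a years-by-proxies array:]

```python
import numpy as np

def group_mean(pvals, proxy_matrix, cut):
    """Yearly mean of the proxies whose p-value falls below cut.

    proxy_matrix : 2-D array, rows are years, columns are proxies
    pvals        : one p-value per column
    """
    mask = np.asarray(pvals) < cut
    return proxy_matrix[:, mask].mean(axis=1)

# If the extremes dominate, the averages at cut=0.005 and cut=0.05
# should correlate very highly, e.g.:
# r = np.corrcoef(group_mean(p, X, 0.005), group_mean(p, X, 0.05))[0, 1]
```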

I think, perhaps, I wasn’t explaining myself very well. The issue here is, should 100 similar proxies (say all bristlecones or all juniper trees) really be thought of as 100 times more significant than 1 speleothem? By doing simple averaging you are giving each individual proxy the same weight. If you identify proxies as members of a cluster, average within each cluster, and then average the clusters together, then you are giving each cluster the same weight. Which strikes me as more reasonable than the first simple average.

But it still calls for a certain degree of judgement. I wasn’t accusing you of cherry picking; I was trying to find a way of avoiding that being an issue.

My alternative suggestion is that proxies are identified by lat+long to a given 5° by 5° grid-cell. You are then clustering the proxies together by grid-cell. Take the average for each grid-cell, then average all the grid-cells together. Course, you’ll be doing this with anomalies, not actual temperatures. But this is roughly how the instrumental records seem to work. If one cell has 100 thermometers it doesn’t get more weight than the 1 thermometer in the next cell.

That is different from your approach, but probably not by much. If a given proxy family covered a large area then it would get a little bit more weight. Although not by as much as if you use all the individual proxies separately.

The approach I suggest would also reduce the influence of geographically close but unrelated proxies. There might be a location somewhere on the globe, with plenty of long lived trees, interesting caves, deeply sedimented lakes and plentiful branches of Starbucks. But should our estimate of global temperature be more weighted by this geographical region, simply because more different proxies have been produced from that location? The actual number of proxies seems pretty small – when compared to the number of thermometers used to create the instrumental record. So I’m not sure that this can be a major issue. Nevertheless, in principle, it seems wrong that one part of the globe should be over-represented in the global average simply because it has produced more data.

Nick:
My problem with your suggestion is the obvious fact that we do not know what any of these items are proxies of to start with. It would require testing them against local temperature first to see if they are good proxies, if you average all like items in an area …
Well, having read this interesting progress, I agree with Willis.
Willis:
I am awaiting the results of this step and the future steps that you take along this road.

I wasn’t suggesting anything that radical. Willis was proposing that he would replace the proxies in a cluster with a single average for that cluster. That seems sensible; otherwise, 10 trees of the same species in the same area have ten times the weight.

My suggestion was that you could achieve probably much the same effect by a different method: namely, first average the proxies in a grid cell. It isn’t exactly the same. If Bristlecone or Tiljander proxies happen to not all be in the same grid cell, they’ll have a bit more impact. There would be a difference in the other direction if there were two different clusters in the same geographical location. If you average by clusters, that would give two signals. If you average by grid-cell, you’d only have one.

I wasn’t proposing that Willis start a whole new research project or start hunting down potential new proxies. I was simply offering an alternative way of attempting to treat the issue that sometimes you have a cluster of 10 similar trees in one place, and other times you only have one tree or 1 lake sediment record. If you are simply averaging the data together, then it matters that every record gets equal weight.

[…] Willis Eschenbach also did a very cool calculation where he found a 20% common signal in the proxy data. If both methods are correctly done, that would mean the trees are far more sensitive to moisture or something else other than temp! […]
