Jones et al 2009: Studies Not "Independent"

One of the ongoing Team mantras has been that the Mann hockey stick has been supported by a “dozen independent studies”. Obviously, I’ve disputed the claim that these studies are “independent” in any non-cargo cult use of the term “independent”. A new article by Jones and multiple coauthors (Holocene 2009) comments on this issue.

Let’s review the bidding a little on this topic. The viewpoint expressed here (and which is pretty obvious upon merely inspecting the proxies used in the reconstructions) is that the studies are not independent either in authorship or in proxy selection – with certain stereotyped proxies (Graybill bristlecones, Briffa Yamal) being used over and over again, thereby contaminating any supposed independence. This lack of independence is readily verified. And yet the opposite claim is repeatedly asserted.

Mann’s reply to questions from the House Energy and Commerce Committee in 2005 made the same claim about a “dozen independent studies”:

Recent work since the TAR has provided further support for this conclusion, which is now common to more than a dozen independent studies published in the peer-reviewed scientific literature.

After protesting about realclimate censorship here (at CA) on October 29, 2005, I managed to get a comment on this issue accepted at RC a few days later (on Nov 3, 2005).

The various supposedly “independent” reconstructions are not in fact independent either in authorship or proxy selection. There are important defects in each such study individually with proxy quality and robustness with respect to outlier results.

This prompted the following inline response by William Connolley:

[Response: At the moment, this looks like wild assertion / mud slinging. Given that the various reconstructions are the same on the important points, it seems that the major conclusions are robust. Asserting that everyone else is wrong and only you are right is implausible – William]

Realclimate, showing a highly questionable commitment to its claim that “questions, clarifications and serious rebuttals and discussions are welcomed”, then censored my attempt to support my argument.

That the authors are not independent can be seen merely by inspecting the names of the coauthors of the Team studies in the usual spaghetti graph. Briffa et al [2001] with coauthor Jones is obviously not “independent” in authorship from Jones et al [1998] with coauthor Briffa. Jones and Mann [2004] and Mann and Jones [2004] are not independent of Briffa (Jones) et al 2001 or Jones (Briffa) et al (1998). MBH (Mann, Bradley and Hughes [1998, 1999]) is not independent of Bradley and Jones [1993], which in turn is not independent of Hughes and Diaz [1994] or Bradley, Hughes and Diaz [2003], etc etc. To say that these supposedly “independent research groups” are not in fact “independent” in any sense familiar to non-climate scientists is hardly “wild assertion/mudslinging” as Connolley claimed.

Wegman touched on this in 2006, but his “social network” analysis didn’t really capture the essence of the issue. Unfortunately he focused on Mann’s coauthorships, rather than Jones and Bradley, who are much more central figures. He also failed to consider the canonical Stick studies as a group – which would have yielded a more germane result than the one actually produced.

The issue of proxy re-use is just as serious (or more serious). Wegman et al 2006 had an instructive illustration of this, showing proxy usage by study drawing on information that I provided in a spreadsheet. Wegman observed of this figure:

It is clear that many of the proxies are re-used in most of the papers. It is not surprising that the papers would obtain similar results and so cannot really claim to be independent verifications.
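Wegman's point about re-use can be made concrete by tabulating which proxies each study uses and measuring the pairwise overlap. What follows is only a sketch: the study labels and proxy names are placeholders invented for illustration, not the contents of the spreadsheet mentioned above, and the Jaccard index is just one possible overlap measure.

```python
from itertools import combinations

# Hypothetical proxy rosters: placeholder names, not real study data.
proxy_usage = {
    "study_A": {"proxy_1", "proxy_2", "proxy_3"},
    "study_B": {"proxy_1", "proxy_2", "proxy_4"},
    "study_C": {"proxy_5", "proxy_6"},
}

def jaccard(a, b):
    """Share of proxies common to both sets (0 = disjoint, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pairwise_overlap(usage):
    """Return {(study_x, study_y): (shared proxies, Jaccard index)}."""
    return {
        (x, y): (usage[x] & usage[y], jaccard(usage[x], usage[y]))
        for x, y in combinations(sorted(usage), 2)
    }

for pair, (shared, score) in pairwise_overlap(proxy_usage).items():
    print(pair, sorted(shared), round(score, 2))
```

Two studies that share most of their inputs would score near 1, while genuinely independent proxy selection would score near 0.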

Without acknowledging a source, IPCC AR4 conceded that the reconstructions shown in its new spaghetti graph were “not entirely independent” [carefully not mentioning bristlecones or Yamal]:

As with the original TAR series, these new records are not entirely independent reconstructions inasmuch as there are some predictors (most often tree ring data and particularly in the early centuries) that are common between them, but in general, they represent some expansion in the length and geographical coverage of the previously available data (Figures 6.10 and 6.11)

Citing IPCC AR4, Jones et al 2009 now concedes that “several chronologies” have been utilized in “virtually all published studies”, again carefully not mentioning bristlecones or Yamal:

Several chronologies extending over a longer time span, with variability displaying a strong and direct association with changing local temperatures, have been utilized in virtually all published studies aimed at reconstructing Northern Hemisphere (NH) or global average surface temperature changes during the millennium leading up to the present (Jansen et al., 2007[IPCC AR4]).

Note that Jones et al 2009 substantially embellished the actual IPCC AR4 wording. IPCC AR4 did not state that these repetitively used chronologies had “a strong and direct association with changing local temperatures”. Indeed, the Review Comments run against this claim. As an IPCC AR4 reviewer, I contested the use of bristlecone/foxtails in a spaghetti graph of proxies supposedly qualified against local temperature, observing (see SOD Review Comment 6-1144) in respect of a series from Osborn and Briffa 2006 (which claimed a correlation of 0.19, itself hardly “strong and direct”):

I checked the correlation of this data to HadCRU2 gridcell temperature and only obtained an insignificant correlation of 0.04. The authors said that they had cited the temperature data incorrectly, that they had actually used CRUTEM2 yielding a correlation of 0.19 and that HadCRU2 data was spurious in its early portion (1870-1887) because there was no station data. However there is station data at GHCN going back to the data in HadCRU2. D’Arrigo et al 2006 considered using foxtails and rejected the use of this data because it did not meet standards of being correlated to gridcell temperature, expressed in very similar terms to Osborn and Briffa 2006. The contrasting views of D’Arrigo et al 2006 certainly establish that the relationship is “ambiguous” and that this proxy should not be used on multiple grounds.

Notwithstanding the fact that the actual correlation was 0.04 (as could be easily calculated), IPCC rejected my call that they not be shown in a spaghetti graph of temperature proxies as follows:

Some of what the reviewer says may be true, but is as yet unpublished and the current review is based on multiple strands of evidence, among which the results of Mann and colleagues remains relevant.

Whether or not I had published the incorrectness of the claimed Osborn and Briffa correlation is irrelevant. My calculation was correct and the authors either knew it or should have known it. (And, needless to say, IPCC did not require that some point be published if the opinion was adverse to us e.g. Ammann’s secret emails to Briffa, withheld from the IPCC Review Process.)

From this weak and contested gruel – an actual correlation of 0.04 to local temperature – Jones and the extended Team have ratcheted the argument in favor of the non-independent proxies up to a claim of a “strong and direct association with changing local temperatures”.

And they wonder why they have a PR Challenge.

In fact, I’m not sure that anyone presently claims that bristlecone/foxtail chronologies have a “strong and direct association with changing local temperatures”. When confronted with opposing evidence, Mann’s defence was “teleconnection” – that bristlecone chronologies teleconnected with world temperatures.

Anyone glancing at either the list of authors or the data can see that “independent” isn’t really a word that can be used to describe all this. Or is there perhaps some special definition of the word being used here?

Sam, it’s gobsmacked. And not me, anymore, because I have been numbed by so many almost unbelievable incidents like this. Don’t those folks have any idea just how foolish they look to those who bother to look closely at their work?

On the other hand an r2 of 0.04 explains 400 percent more of the variation than an r2 of 0.00, so I guess it depends upon how you define a “strong and direct association.” 🙂

But only 400%? .04 is ∞% compared to 0, we can always repeatedly subtract 0 from .04 as many times as we want! It’s the subtraction that never ends, sort of like keeping track of authors and datasets for the team.

A meteorological teleconnection is where a meteorological event in one location sets off a corresponding event in another. For example, if there is a strong high pressure ridge at 500mb along 140°W a low pressure trough often forms downstream over the Great Basin.

Using teleconnections to apply conditions in caves to the atmosphere in another part of the world is really stretching the concept.

Does this: “Fourth, a greater use of climate model simulations is needed to guide the choice of reconstruction techniques”, from the abstract, mean that proxies will be assessed by how well they reproduce the model simulations?

The teleconnection of Mann was pure voodoo: the notion that the strong growth of the strip-barked bristlecone pines during the 19th and early 20th Centuries in Western Colorado was not correlated with local mean temperature but was mysteriously teleconnected to “the global temperature field” – presumably by some process unknown to physics but which may involve fairies.

Every decent statistician would call such a “teleconnection” a spurious correlation between unrelated time series, of which there are many noted in mainstream statistical literature. But not Mann or the IPCC.

As Steve pointed out in his more recent resampling of those same trees, the magic bristlecones appear to have conveniently dropped that spooky “teleconnection” in more recent times, a fact confirmed by the mysteriously self-censored PhD thesis of Linah Ababneh.

The actual choice of data is irrelevant so long as there is poor causal correlation between the measured variable and the global temperature and the same invalid selection process is used: data series must match the GISS recent (increasing) temperature “construction” but there is no requirement to match anything before that.

All “independent” studies using these selection criteria will produce a “hockey stick” and be meaningless.

Unfortunately, Connolley is hardly the worst of the warmer cabal there. In fact, my (few) dealings with him have been cordial (but unproductive of change). Some of the others seem utterly impervious to reason.

What happens if you declare the satellite measurements to be “Our best instrument”, and use the various individual surface measurements as proxies as input to Mann’s own methods?

The whole point of Mann’s method is determining the relative validity of inadequate proxies. All the sites with microsite and potential urban heat island issues should thus be detected and given a low weighting.

(He might even call the best fits teleconnected.)

Steve: This has nothing to do with anything in Mann’s studies. Let’s avoid simply piling on.

The former source of additional error can be estimated by using known properties of the calibration algorithm (such as standard formulae for estimating the standard error of regression coefficients – used, eg, by Briffa et al., 2002a) though pseudo-proxy studies (see below) can provide an alternative estimate of the total calibration error and should be used more in future work.

Clever move, pseudo-proxies. This way they can dodge the correct equations (for a year or two, the game is about to be over .. ).

Re: Steve McIntyre (#23), Of course. Reading one of these climate papers is a lesson in tediousness. They throw in 2 or 3 references every other sentence. It’s a bear to read and even harder to understand. I often wonder if they do this intentionally to obscure the work they’ve done, or perhaps to prop up their papers among each other (social network). A quick browse at engineering papers (from academia, particularly through the IEEE) and it is night and day.

Gads, my last paper only had 8 references in 6 pages, and one of them was simply because I had to reference my advisor’s book. 🙂

As a scientist who knows next to nothing about the ‘science’ of palaeoclimatology, I find it amazing that 30 authors can get together and agree about the subject of this paper. The language used is mostly indecipherable. The only thing I could clearly see where the 30 authors would be expected to agree was the claim that much more work needed to be done, i.e. the usual cry for more funding. I thought the ‘science’ was settled (or that’s what successive government ministers have stated in the UK), or hasn’t Jones been told that by his bosses and paymasters? Is there anyone else beyond these 30 who could provide truly independent studies?

Re. non-independence of papers: I don’t understand the underlying problem here. The constant reuse of the same proxies seems to suggest a dearth of new data, but I do not see why this should be the case. If the issue is as important as it seems, then funding data collection should not be an insurmountable problem, it’s not like collecting rock samples from Mars. Have all the good sites (ie. reasonably accessible and likely to yield good data) been sampled?

Something from that article jumped out at me. Jones et al. claim that the Greenland ice cores correlate very well with temperature measured along the coast of Greenland since the 1700s. At the same time RC writes, in the article linked to in a comment to the previous post, that historical claims of Greenland being warmer during the MWP are under dispute. I couldn’t look into this because the source RC was linking to was gone.

Using only Greenland ice cores calibrated to Greenland temperatures, what did the climate look like when the Vikings colonized it?

Another thing that relates to this is the age of the studies. IIRC, the Graybill tree core data is getting close to 30 years old.

If I can find the time, I’d like to get the dates of the “relevant” studies and see when the “data gather date” is, to see the average age of the proxies. If the scientists are using this data to show correlation to the climate (which they peg at 30 years), how many of the currently used proxies cover the last 30 years (current climate)?

The 28 recommendations at the end of the Jones, et al. paper make fascinating reading. Number 5 stands out particularly in its call to study the divergence problem and update older proxies.

(5) There is pressing need for further study of the likely precedence and causes of the apparent ‘divergence’ between instrumentally recorded and some dendroclimatically estimated temperature trends (typically some high-latitude NH regions) in recent decades. This emphasizes the priority requirement for systematic updating of many existing tree-ring data, to continue in parallel with efforts to expand the representation of data into new areas.

1. Reading the paper, it appears that Jones et al. actually do recognize that there are issues with prior reconstructions and that there is a lot of work that can be done to improve them. This should be applauded, not mocked.

2. I don’t understand why it took three years to get what amount to fancified meeting minutes published.

3. Even when preceded by the word insignificant, “correlation of 0.04” is an oxymoron. The proper word would be “uncorrelation” or “noncorrelation” or “total lack of correlation”. Even if the correct value is 0.19, lack of useable correlation might be the appropriate descriptor. Of course, we expect some degree of correlation between temperature and tree ring proxies but if that correlation (either r or R^2) is <0.2, how can one construct a meaningful temperature reconstruction from that series?
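For a simple linear relationship, the share of variance explained is the square of the correlation coefficient, which puts the two disputed values in perspective. A minimal sketch of the arithmetic, using only the 0.04 and 0.19 figures quoted in the post:

```python
def variance_explained(r):
    """Fraction of variance accounted for by a linear fit with correlation r."""
    return r * r

for r in (0.04, 0.19):
    share = variance_explained(r)
    print(f"r = {r:.2f} -> r^2 = {share:.4f} ({100 * share:.2f}% of variance)")
```

On these figures the series would account for about 0.16% of the variance in local temperature at r = 0.04, and about 3.6% at r = 0.19.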

Re: Bob North (#33), 1) agreed, however, they (the Team) keep saying this while doing nothing about it. Perhaps one day they’ll listen.
2) Choir. Preaching.
3) Not really an oxymoron, but a redundancy. Yes, however, how can one reconstruct past temperatures when the correlation with known temperatures is this low? More preaching to the choir. Hopefully, if Steve keeps pointing these issues out, they’ll really “move on” and drop the reconstruction mess.

Bob, I agree with your point 1. This is actually quite a useful resource for those who want hard data to undermine a claim that temperature is the major factor determining tree ring widths. And they also explicitly call for updating data, which has the potential to refute some reconstructions. But they spoil the impression a bit by continuing to refer to “divergences” when reconstructions fail to agree with observations. Is there anything in their statistical methodology to distinguish between a divergence and a refutation?

Re: davidc (#58), No, not really, but it’s not really as simple as a yes/no answer. The ultimate problem is that due to non-linearity and/or stationarity issues, you have no way of knowing when in the past divergence has happened. It may be possible to account for it now if you figure out what caused it, but that is only because you have pretty good information regarding the entire environment, i.e., all the inputs. What was it like 500 years ago? No way to know without the reconstructions, but then you have a circular argument. As a result, you’re kind of forced into the null hypothesis which isn’t really a “refutation,” but rather a “this conclusion cannot be supported with this line of evidence.” The argument keeps coming back from this as “well, other lines of evidence support our conclusions” which is immaterial (and a complete crock, where’d these guys learn their science?). THIS evidence does not support this conclusion; that has nothing to do with whether everything else is right or not.

I would note that only going back 1000 years is very convenient for obtaining a hockey stick, because if there was a MWP around 900 or 1000 yrs ago, you need to go back at least 1400 years to clearly see the hump in the curve. If you only go to 1000 yrs ago it is simply too easy to get a linear segment up to the end of the LIA.

That the authors are not independent can be seen merely by inspecting the names of the coauthors of the Team studies in the usual spaghetti graph. Briffa et al [2001] with coauthor Jones is obviously not “independent” in authorship from Jones et al [1998] with coauthor Briffa. Jones and Mann [2004] and Mann and Jones [2004] are not independent of Briffa (Jones) et al 2001 or Jones (Briffa) et al (1998). MBH (Mann, Bradley and Hughes [1998, 1999]) is not independent of Bradley and Jones [1993], which in turn is not independent of Hughes and Diaz [1994] or Bradley, Hughes and Diaz [2003]

You can guess who the peer-reviewers of each paper were just by looking at which authors are present in one paper but missing in the other. 😀

So I did some analysis of the references cited in the Jones et al. 2009 paper.

I excluded the names Z1.ZN in the “A1, A2 and A3 :DATE Article X in book Y Z1 Z2 and Z3, editors” cites* to arrive at the following statistics

There are (E&OE) 506 references in total, of which 71 are single-author ones.
There are 1158 different authors identified in total (some may be dupes and/or duds but I hand edited a few obvious errors).

The authors with the most collaborators are Mann,M.E, who collaborated with 92 others:
Bradley,R.S 7 times
Rutherford,S 6 times
Hughes,M.K 5 times
Schmidt,G.A 4 times
Jones,P.D 4 times
Ammann,C.M 3 times
Wanner,H 3 times
Shindell,D.T 3 times
Goosse,H 2 times
Waple,A 2 times
Xoplaki,E 2 times
Renssen,H 2 times
Timmermann,A 2 times
Luterbacher,J 2 times
Briffa,K.R 2 times
Wahl,E.R 2 times
Keimig,F.T 2 times and a few dozen singletons

and Luterbacher,J who collaborated with 92 others
Wanner,H 17 times
Xoplaki,E 10 times
Casty,C 7 times
Dietrich,D 4 times
Riedwyl,N 4 times
Pauling,A 4 times
Pfister,C 4 times
Jacobeit,J 3 times
Rutishauser,T 3 times
Zorita,E 3 times
Esper,J 3 times
Gyalistras,D 2 times
Mann,M.E 2 times
Paeth,H 2 times
Küttel,M 2 times
vonStorch,H 2 times
Frank,D 2 times
Grosjean,M 2 times and the dozens of singletons

The most prolific duo appears to be Jones,P.D and Briffa,K.R who collaborated a total 22 times together.

If I can work out a way to do it I’ll put up the raw data for others to analyze.

*Adding in these editor names would increase quite a lot of collab stats
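For anyone wanting to reproduce this kind of tally, the counting step can be sketched as below. This is not the script used for the figures above: the author names are placeholders, and the parsing of the reference list into author lists (the hard part) is assumed to have been done already.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical reference list: each entry is the author list of one paper.
references = [
    ["Author_A", "Author_B", "Author_C"],
    ["Author_A", "Author_B"],
    ["Author_C", "Author_D"],
    ["Author_E"],  # single-author paper: contributes no pairs
]

def collaboration_stats(refs):
    """Count joint papers per author pair and distinct collaborators per author."""
    pair_counts = Counter()
    collaborators = defaultdict(set)
    for authors in refs:
        unique = sorted(set(authors))  # guard against a name listed twice
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
            collaborators[a].add(b)
            collaborators[b].add(a)
    return pair_counts, collaborators

pair_counts, collaborators = collaboration_stats(references)
single_author = sum(1 for authors in references if len(set(authors)) == 1)
print("single-author references:", single_author)
print("most frequent pair:", pair_counts.most_common(1))
print("collaborator counts:", {a: len(c) for a, c in collaborators.items()})
```

Sorting each author list before pairing means that (A, B) and (B, A) are counted as the same duo.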

The Jones et al. (2009) temperature reconstruction review goes against much of the analysis methods utilized here at CA, i.e. the review defends the current state of reconstructions by making very general statements about volumes of referenced work, while it has been shown in analyses and discussions here that proper analysis requires looking at individual papers in detail.

I would judge that the Jones approach is better suited to a defense attorney pleading his case in court than to a scientific review. I would be curious how various scientists in the climate science community view it.

I always wonder about the habit (convention) of quoting r (or r^2) as if it/they were some useful practical measures of the value or indeed validity of a presumed correlation. (Note that in using or quoting one of these statistics we seem to be thinking of it in terms of its /predictive/ value).

When I was using statistical techniques in industry, quoting r or r^2 with no further information was not far removed from a capital offence. It was regarded as an attempt to conceal something from an unsuspecting reader. What is required is not simply a statistic such as r, but a statement about the data set from which it was computed. This might be how many data pairs or regression points were used to compute the statistic. However, for someone lacking instant access to statistical tables this may be only a minor improvement. What is required is a forthright statement displaying the probability of the obtained r or r^2 occurring due to chance effects alone under the H0 that the true correlation is zero.

Only this sort of approach delivers information that is of real value to the interested reader.
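The kind of statement asked for here can be sketched in a few lines. The version below uses the Fisher z-transform with a normal approximation rather than exact tables, assumes the data pairs are independent (autocorrelated series have fewer effective pairs), and uses a purely hypothetical sample size of 100, since none of the sources quoted above gives one.

```python
import math

def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation = 0.

    Uses the Fisher z-transform with a normal approximation and assumes
    n independent data pairs.
    """
    if n <= 3:
        raise ValueError("need more than 3 data pairs")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

N_PAIRS = 100  # hypothetical sample size, not taken from any study
for r in (0.04, 0.19):
    print(f"r = {r:.2f}, n = {N_PAIRS}: p ~ {fisher_p_value(r, N_PAIRS):.3f}")
```

Reporting r together with n and the resulting probability lets the reader judge the correlation without reaching for tables.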

Going a further step along this route, another practical measure of the utility of such a statistic in the context of regression would be the derived confidence interval for a prediction, for chosen and useful values of the independent variable(s), at some conventional probability level, such as 80%, 90%, 95% or 99%. This would induce some kind of realism in the inferences that were based upon the statistic.
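A prediction interval of this sort can be sketched for a simple one-variable regression as follows. The data are made up for illustration, and the normal quantile stands in for the Student t quantile that a careful treatment of small samples would use, so treat the result as an approximation.

```python
import math
from statistics import NormalDist, mean

def prediction_interval(xs, ys, x0, level=0.95):
    """Approximate prediction interval for a new observation at x0.

    Ordinary least squares on one predictor; the normal quantile is used
    in place of the t quantile, which understates the width for small n.
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))  # residual std. error
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    centre = intercept + slope * x0
    return centre - half_width, centre + half_width

# Made-up calibration data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 0.9, 2.2, 2.8, 4.1]
for level in (0.80, 0.90, 0.95, 0.99):
    low, high = prediction_interval(xs, ys, x0=2.5, level=level)
    print(f"{level:.0%} interval at x0 = 2.5: [{low:.2f}, {high:.2f}]")
```

With a weak correlation the residual standard error dominates, and the interval becomes nearly as wide as the spread of the data itself.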

Francis:
Interesting. The software that Wegman used to identify the overlapping networks might be useful. It may also be important to classify whether the reference was used to support a position or to refute a position.
I am planning to look at this software for a non-climate project I have but I have not really looked at this software yet.

Re: TerryS (#54), I believe the majority of the quote is not him, but someone asking him if he felt daunted by the fact that “over 100 authors disagree” (paraphrased) with his theory, to which he replied “why, it only takes one?” (again, paraphrased). It is likely an anecdote as well, so probably didn’t even happen the way we have all heard. 🙂

Hmmm, mathematicians have an “Erdos number”. Perhaps now is a good time to institute a Jones or Mann number. Thus if an author had a high “Jones number”, we might be more assured of seeing new data and analysis.

Don’t besmirch the Erdos number by suggesting the invention of a “He-the-Mann” number! My Erdos number is four (two ways) and my thesis advisor had a two (but I never wrote a joint paper with him).

I think that it should be noticed that there seems to be a distinct “anti-correlation” between the global temperature anomaly and the number of co-authors of climate papers. Anybody want to test for causality? 😉

Sorry for the OT question, but are you saying that you never wrote a joint paper with your thesis advisor?

This was over 40 years ago! My thesis topic was highly theoretical and the original topic came out of a question I raised in a course with the advisor. He was off in Europe most of the time that I was writing the thesis and I left to take a teaching position before he returned so we finished the details by mail (before the internet was available). I published the results solo.

I did proofread (and correct some proofs in) galley proofs for two of his books. That should be worth an “adjustment” of at least .5 in the Erdos number! 🙂

In the data I’ve assembled the 3rd most prolific collaborator (Masson-Delmotte,V 91 collabs) appears to not have ever collaborated with Mann – though they do share a few collaboratees e.g. Briffa

This probably means that even in the data I’ve mined Mann numbers would tend to be between 1 and 5 – I’ll need to check. Of course to do a proper “Mann number” I’d need to add the book editors and that would probably add links
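Checking this amounts to a breadth-first search over the co-authorship graph, exactly as for an Erdos number. A minimal sketch, with a toy graph of placeholder names standing in for the mined data:

```python
from collections import deque

# Hypothetical co-authorship graph: placeholder names only.
coauthors = {
    "Centre": {"A", "B"},
    "A": {"Centre", "C"},
    "B": {"Centre"},
    "C": {"A", "D"},
    "D": {"C"},
    "Loner": set(),
}

def collaboration_distances(graph, source):
    """Breadth-first search: fewest co-authorship links from source to each author."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances  # authors missing from the result are unreachable

print(collaboration_distances(coauthors, "Centre"))
```

Adding the book editors would simply add edges to the graph, which can only shorten the distances.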

Re: FrancisT (#61), Mann numbers as high as five would prove nothing at all. Scientists are a pretty interconnected bunch: I’m an Erdos 3, but am not any kind of mathematician. At a guess, any number over 2 is just as likely to be chance as evidence of a real link. See here for musings by Gavin and friends on this topic.

Wow, things have changed a lot. I bet he made more contributions to your thesis and subsequent publication than many of the “coauthors” listed in Jones et al 2009.

I find it very unfair that the work done by the experimental scientists who spent long hours/months doing the tedious and hard field work and the work done by these co-signers of a review article are regarded as worth the same. Or even less, if you start comparing impact indexes and the number of citations your papers get.

The Einstein “quote” is unsourced and he probably never said it. It’s a poetic romantic sort of explanation of the scientific method, or actually I suppose specifically falsifiability. “No amount of experimentation can ever prove me right; a single experiment can prove me wrong.” Maybe one day in passing he said something like “Experimente prüfen sie nicht, sie prüfen sie korrekt sind, falsch dass seien sie.” but there’s no record.

Now the story on “other scientists” and being incorrect is supposedly from when he and Bohr were fighting, maybe something surrounding the EPR paradox perhaps, or maybe even dating back to the mid 1920s. Nothing sourced (like his answer to Born’s question if God approved of gambling in a 1926 letter that “I, at any rate, am convinced that He does not throw dice.”) and nothing as eloquent. His step daughter Margot was on the subject of the constant annoying discourse between the two and their supporters, and thought it might destroy Einstein’s reputation. She asked something like “Daddy, are you afraid Niels will show you wrong?” to which he replied along the lines of “Right, wrong, eh. He’s only one man…. Eff that guy anyway.” He then patted her on the head and said “Now go make us some noodles dear.”

Steve: Let’s have a moratorium on discussing the Einstein quote. The point is fine, but it sounds grandiose in the context of anything being discussed here.

In a typical way, they twisted the point about non-independence. The Mann number was amusing, but irrelevant to the common sense point that Briffa, Jones et al 2001 is not “independent” of Jones, Briffa et al 1998 or Bradley and Jones 1993 or Mann and Jones 2003 in any sense of “independent” familiar off the Island.

BTW my Erdos number probably is 4 or 5. McIntyre – McKitrick – Essex (and Essex will surely have a low Erdos number of no higher than 2). So if CA readers did a paper with McKitrick (or McKitrick AND Essex), you’d lower your Erdos numbers.