Is Yamal Homogeneous? An Esper-Style Answer.

Starting with the first of my recent posts on Yamal, I raised the issue of whether the CRU12 actually came from a population homogeneous with the subfossil population.

Although Briffa’s online response to my Yamal posts stated that CRU has “stressed” potential problems arising from “inhomogeneous” sources in their “published work”, I have been unable to identify any CRU article that describes a procedure for testing two populations for homogeneity or any article that reports the results of such a test. (Yes, Melvin and Briffa 2008, 2009 catalogue various biases and problems associated with RCS methodology, but, as already noted, this discussion is not connected to the actual statistical literature or directly to homogeneity.)

In the absence of such discussion in CRU literature, in my last post on Jaemtland, I observed that Esper et al 2003 directly considers the problem of population inhomogeneity. While Esper et al 2003 doesn’t describe a quantitative statistical procedure for assessing homogeneity, it describes a shall-we-say artisanal procedure which Esper used to assess inhomogeneity in a proposed combination of data.

Today I’ll apply the Esper methodology to both the Briffa 2000 Yamal data set and the extended Briffa 2009 Yamal data set. The results are provocative.

Yamal (Briffa 2000) Esper-Style
Esper compared populations for homogeneity by plotting RCS curves on the same quadrant. The left graphic below shows (again) my close emulation of the Esper methodology for comparing two Jaemtland populations. Based on the visual difference between these two curves, Esper concluded that the biological populations were distinct, that the two populations could not be combined in a simple RCS one-size-fits-all procedure (and that the 19th century step was spurious, speaking only to a shift between populations and not to growth patterns.)

On the right, I’ve shown the corresponding graphic for the Briffa 2000 data set, again stratifying the population into two strata – this time, comparing the CRU12 to the subfossil population (plus 5 Shiyatov trees sampled in 1963 and not participating in the post-1970 “boom”). (BTW Jeff Id presented virtually the same curve within a day or two of the issue being raised – see link.) I’ve added the smooth curves from fitting a negative exponential to the entire population – a variant to curve fitting after averaging widths for each age-year a la Briffa-Esper, a variant that I prefer since it is much more justifiable in statistical terms. (To date, I haven’t thoroughly canvassed the impact of this preference and I have other options in my RCS function.)
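For readers who want to experiment with the two fitting variants just described, here is a minimal sketch – in Python on synthetic data, whereas my own calculations use R – of fitting a generalized negative exponential either to every (age, width) measurement in the population, or, Briffa/Esper-style, to the mean width for each age:

```python
import numpy as np
from scipy.optimize import curve_fit

def neg_exp(age, a, b, k):
    # Generalized negative exponential: width = a * exp(-b * age) + k
    return a * np.exp(-b * age) + k

rng = np.random.default_rng(0)

# Synthetic pooled ring-width measurements: (age, width) pairs across cores
ages = rng.integers(1, 300, size=2000)
widths = neg_exp(ages, 1.5, 0.02, 0.4) * rng.lognormal(0, 0.3, size=2000)

# Variant 1 (preferred in the post): fit the curve to every measurement
p_pop, _ = curve_fit(neg_exp, ages, widths, p0=(1.0, 0.01, 0.3))

# Variant 2 (Briffa/Esper-style): average widths for each age first,
# then fit the curve to the resulting age-profile means
counts = np.bincount(ages, minlength=300)
sums = np.bincount(ages, weights=widths, minlength=300)
mask = counts > 0
age_grid = np.arange(300)[mask]
profile = sums[mask] / counts[mask]
p_prof, _ = curve_fit(neg_exp, age_grid, profile, p0=(1.0, 0.01, 0.3))
```

On clean synthetic data the two variants recover similar curves; the interesting cases are where they diverge, which is exactly the preference question flagged above.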

Esper et al 2003 reported that the difference between the two RCS curves on the left was sufficiently significant to establish population inhomogeneity and forestall combining the two inhomogeneous populations in a one-size-fits-all RCS chronology. What are we to make of the corresponding curves for the Briffa 2000 Yamal population on the right?

The difference between curves in the right graphic “looks” just as significant to me as the difference between curves in the left graphic. Unfortunately, neither Esper nor Briffa establishes quantitative procedures for assessing such significance. My own opinion – and I’m speaking here in purely statistical terms – is that any reasonable significance test which rejected homogeneity for the Jaemtland populations (left) would also reject population homogeneity of the CRU12 with the Yamal subfossil population (right).

While Melvin and Briffa 2008 did not discuss procedures for testing population homogeneity, their “differing-contemporaneous-growth-rate” bias is an artisanal term for an inhomogeneity issue. They provide, in passing, a description of this sort of bias (ironically in a context reversed from the one at hand):

In an RCS chronology, if in one period fast-grown trees outnumber slow-growing trees (or vice versa), artificial medium-frequency trends (i.e., of non-climate origin) might result. It is at the recent end of a chronology that the influence of downsloping indices, derived from fast-growing trees, may not, in general, be balanced by the upsloping index series from slower-growing trees. The result, even under constant climate conditions, is an overall negative bias, seen in the final century or most recent decades of the chronology.

Unfortunately, this somewhat blurred recognition of potential problems of population inhomogeneity did not prompt the authors to canvass statistical literature for testing procedures to evaluate such inhomogeneity.

Summing up to this stage, an Esper-style procedure for examining population inhomogeneity provides considerable evidence of inhomogeneity between the CRU12 and the Yamal subfossil population – a possibility that I raised in my initial post. I say “evidence” because I haven’t reduced the Esper procedure to a formal statistical test. However, given the importance of population homogeneity as an issue (an issue that CRU supposedly “stressed”), it is really their obligation to demonstrate population homogeneity, something that was never done for the Briffa 2000 population.

As already noted, Briffa has “moved on” to an expanded Yamal population. He presented a classic “moving on” defence, i.e. that problems with the published population didn’t “matter” because they could “get” a similar result with a new population, even if the ink is barely dry on the new data set and the new population is still unpublished in the peerreviewedlitchurchur. (As previously noted, the moving-on defence never acknowledges the correctness of criticisms of the old data set; nor does it rebut those criticisms. They just “move on” using a cloud of ink as camouflage, sort of like a giant squid or team of giant squids in flight.)

The Briffa October 2009 Yamal Dataset
In my initial post, I observed that inclusion of the Schweingruber Khadyta River (russ035w) data set had a substantial impact on the modern portion of the Yamal RCS chronology.

In his October 2009 response, Briffa argued that he could still “get” an RCS chronology that was similar to the Briffa 2000 version using an expanded Yamal data set – the moving-on defence. Briffa’s online response provided the first location information on the sites of the CRU12 – 2 of which came from a site on the Jahak River (JAH), 5 from a site on the Porzayakha (POR) and 5 from a site on the Yadayakhodyyakha (YAD), all shown in the Location Map below. The Schweingruber Khadyta River (KHAD) site is also shown on the map below.

Yamal Location Map

It seemed implausible that only 12 cores had been taken in 1988 and 1990 from the JAH, POR and YAD sites. This suspicion proved correct. Each site had additional cores, measurement data from which was included with the online article (JAH – 25; POR – 12; YAD – 10) – still rather small samples, but 4 times as many as the CRU12. The CRU10 made up 10 of the 22 POR-YAD cores. A table summarizing core and tree counts is shown below.

Site                      Briffa 2000 Cores (Trees)    Briffa 2009 Cores (Trees)
Subfossil                 235 (219)                    235 (219)
Shiyatov                  5 (5)                        5 (5)
Jahak (JAH)               2 (2)                        25 (23)
Porzayakha (POR)          5 (5)                        12 (12)
Yadayakhodyyakha (YAD)    5 (5)                        10 (10)
Khadyta River (KHAD)      0 (0)                        34 (18)

Briffa conceded that it was reasonable to include KHAD data in an RCS chronology, but also argued that the KHAD data was “atypical” and that a sensitivity reconstruction with KHAD data was “arguably the least defensible” (as compared to corresponding versions separately adding in YAD, POR and JAH data.)

Briffa agreed that the Khadyta River (KHAD) site met their selection criteria, explaining its omission on the grounds that they “simply” hadn’t considered its inclusion. However, he argued that its influence was overstated in my initial calculation. Schweingruber had taken two cores per tree, while the Russian samplers typically took one core per tree. In his online response, Briffa averaged the two Schweingruber cores for each tree, thereby reducing their influence relative to the other cores (inconsistently not doing the same thing for Russian trees with duplicate cores, though they are few and it’s not a big issue). This seems like a fair enough procedure, though there seems to be a bit of opportunism – a similar situation arose when a Schweingruber series was added to Taimyr and it doesn’t appear to me that the same thing was done over there. For present purposes, let’s stipulate to KHAD averaging and the addition of the new data and once again do Esper-style testing for population inhomogeneity.

In the next graphic, I report on the results of stratification tests on the expanded population. In my original post, I contrasted the chronology resulting from the combination of subfossil and KHAD data to the corresponding chronology combining subfossil and the CRU12 (primarily POR-YAD). As noted above, Briffa vehemently argued that the KHAD data was “atypical” and provided the “least defensible” reconstruction. In the graphic below, I’ve expanded the initial comparison using the Esper-style analysis: on the left, I’ve contrasted the subfossil population (here strictly subfossil, excluding the 5 Shiyatov trees) with a modern stratum consisting of KHAD plus JAH and the 5 Shiyatov trees (48 trees in total), and, on the right, the same comparison of the subfossil population to a second modern stratum, extending the CRU12 to the POR-YAD combination (22 trees, including the CRU10 – there were 2 JAH in the CRU12).

The results are self-evident. The age-dependence curve of the KHAD-JAH-Shiyatov population is remarkably similar to the corresponding subfossil curve, while the POR-YAD population isn’t. In respect to the POR-YAD comparison, it appears to me that, as with the CRU12, any significance test that distinguishes between the Jaemtland populations a la Esper will also determine inhomogeneity between the Yamal subfossil population and the POR-YAD population (but not the KHAD-JAH-Shiyatov population). The Esper-style test indicates that it is the KHAD-JAH combination and not the POR-YAD combination that offers a homogeneous extension of the subfossil population to the modern period, and that a simplistic attempt to use “all” the data is merely a backdoor method of getting the inhomogeneous POR-YAD cores into the mix – a line of argument that we’ve seen previously with bristlecones.

Although the KHAD-JAH population is evidently much more homogeneous with the subfossil population than the POR-YAD population, Briffa’s online article commentary gives the opposite impression, stating of my initial sensitivity test using KHAD data:

exclusive use of the KHAD data (i.e. the most ‘extreme’ of McIntyre’s alternative chronologies) likely provides an atypical representation of the more general long-term course of changing tree growth, as represented by the data from the other sites.

Briffa’s argument, in effect, entirely ignored the problem of population inhomogeneity – notwithstanding CRU claims to have “stressed” this issue. They dumped all the data into the hopper including the inhomogeneous YAD-POR data (extending the CRU12) and got a chronology that was sort of similar to the original chronology. They argued that there was:

“no additional information with which to justify the exclusion of any of these data”

This is obviously not correct. There is enough information to carry out an Esper-style test for population homogeneity, a test that indicates that the KHAD-JAH population is relatively homogeneous with the subfossil population, while the POR-YAD population isn’t. Contrary to Briffa’s claim, this inhomogeneity test mandates either a substantial adjustment to the POR-YAD population, or, if that is unavailable, the exclusion of the data, as Esper did with Trondelag and Jaemtland. Calculating the resulting chronology, here is the 20th century portion:

Adding the apparently inhomogeneous POR-YAD populations will, of course, result in a HS. The issue is the inhomogeneity of the POR-YAD population with the Yamal subfossil population. Briffa said that they “stressed” the importance of population homogeneity.

But it’s not enough to pay lip service to the concept. It needs to be demonstrated.

The Esper-style test is by no means revealed truth, and doubtless there are better ways of testing for population homogeneity well known within the conventional statistical literature. But at least the Esper method is recognized within the dendro literature and is, to my knowledge, the only test for population homogeneity reported in the dendro literature (I’d be happy to test any other methods drawn to my attention).

Application of the Esper-style test demonstrates Briffa’s failure to carry out any test for population inhomogeneity and demonstrates the futility of the analysis in Briffa’s online post. It also supports the surmise made in my original post – that there was a serious issue with the population inhomogeneity of the CRU12, an issue now seen to extend into the expanded POR-YAD 22.

A note that I’ll be following up on – in carrying out the above analysis, the age-RW plots compare age to average ring widths for each age. These values do not reconcile to values in Briffa’s online article (even where core counts are matched exactly). It appears that the age-ring width plots in Briffa’s online article do not show actual average ring widths but adjusted average ring widths – the adjustments might be based on the algorithm described in Melvin and Briffa 2008, but this is just a guess.

76 Comments

Fantastic work. The resulting chronology isn’t that different from one I calculated by running each region separately – which should also be somewhat reasonable if homogeneity is an issue. Individual regions aren’t guaranteed to be the same either.

As I had previously surmised, it was implausible that only 12 cores had been taken in 1988 and 1990 from the JAH, POR and YAD sites. This proved correct. Each site had additional cores, measurement data from which was included with the online article ( JAH- 25; POR- 12; YAD -10), still rather small samples, but 4 times as many as the CRU12. The CRU10 made up 10 of 22 POR-YAD cores.

This has been making me crazy, but I don’t have enough experience. Here we have sets of data from different regions with consistent numbering schemes which happen to contain some of the data from the Yamal HS but not other data. Was this data sorted out in the original Russian paper, and is that why it wasn’t used in Yamal? Is this a compilation of different collections by different people? Or is it just more sorted and scrapped data in paleo? If it’s the last, then someone with a noisy blog should do a post showing both halves – scrapped vs non-scrapped.

The graphic below contrasts RCS curves from JAH-YAD-POR sites selected into the CRU12 versus sites not selected (and only available now). The bulk of the excluded cores (23) are from JAH (which I grouped with KHAD in the comparison in the post as it seemed to group with KHAD rather than YAD-POR and I prefer to start with simple contrasts); there are 12 from YAD-POR.

Re: jeff id (#2), I share your continued confusion at all the sites and updates and what’s included and what’s excluded. Steve, a table would be nice; textual descriptions are just not doing it for me.

this in particular left me scratching my head and wondering what the point is

I’ve contrasted the subfossil population (here strictly subfossil, excluding the 5 Shiyatov trees) with a modern stratum consisting of KHAD plus JAH and the 5 Shiyatov trees (48 trees in total), and, on the right, the same comparison of the subfossil population to a second modern stratum, extending the CRU12 to the POR-YAD combination (22 trees, including the CRU10 – there were 2 JAH in the CRU12).
The results are self-evident. The age-dependence curve of the KHAD-JAH-Shiyatov population is remarkably similar to the corresponding subfossil curve, while the POR-YAD population isn’t.

The difference between curves in the right graphic “looks” just as significant to me as the difference between curves in the left graphic. Unfortunately, neither Esper (nor Briffa) establish quantitative procedures for assessing such significance. My own opinion – and I’m speaking here in purely statistical terms – is that any reasonable significance tests which rejected homogeneity for the Jaemtland populations (left) would also reject population homogeneity of the CRU12 with the Yamal subfossil population (right).

Just compare parameter estimates from the negative exponential fits. If either the intercepts or the decay rates are significantly different from each other at the 0.05 probability level then the curves are highly unlikely to be drawn from the same population.
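As a sketch of what that comparison could look like – not a procedure from Esper or Briffa, and the estimates and standard errors below are hypothetical placeholders – here is a Wald-style z-test on the difference between a parameter estimated independently in two fits:

```python
import numpy as np
from scipy import stats

def param_diff_test(theta1, se1, theta2, se2):
    """Two-sided z-test that a parameter is equal in two populations,
    assuming approximately normal, independent estimators."""
    z = (theta1 - theta2) / np.hypot(se1, se2)
    return z, 2 * stats.norm.sf(abs(z))

# Hypothetical decay-rate estimates from two negative exponential fits
z, p = param_diff_test(0.020, 0.002, 0.035, 0.003)
```

Note that comparing one parameter at a time ignores the correlation between parameters within each fit; a joint test on the full parameter vector would be more stringent.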

I realize that one can use the properties of the curves for a test of significance. And doubtless there are other tests of population homogeneity within the literature. For now, I’m trying to canvass the artesanal methods and ensure that I’ve understood them properly before trying to propose how or whether to quantify them.

It’s too bad that engineers haven’t figured out how to attach wind turbines to Team publications in order to capture the energy from arm waving. Arm waving seems to be a highly renewable form of energy and, unusually for wind energy, seldom seems to have still moments.

Surely the fit for each data set should deliver parameter estimates, their marginal confidence intervals AND a parameter cross-correlation coefficient. So you get the two-dimensional (linearised, presumably) confidence REGION for the fitting parameters for each data set. Wouldn’t it be those regions that should be compared?
Admission: it’s been a while since I worked on this sort of thing. But I was a Draper-and-Smith man in my time.

These are not reported in Team methods. Despite Delayed Oscillator’s criticism of me for not using ARSTAN, one advantage of using standard R functions is that I can extract standard errors. I’ll have to modify my wrapper program to report these values; it’s easy to do, but will probably have to wait a day or two.

In a quick look at one fit, the standard errors were “small” relative to the coefficients: t values of 40-100.

There are a whole series of issues in the dendro world that are “stressed” and “acknowledged” but never tested for: population homogeneity, stationarity, divergence, precipitation effects, nonresponders within a population. These issues and others are mentioned, and then the analysis is carried out as if they had somehow been resolved by the mere fact of mentioning that one is aware of them. And yet, each of these issues, unless proven to be of no consequence, essentially invalidates the whole exercise. “Yes, we are ready for takeoff, if we have gas and there is no ice on the wings and the runway is clear” (said the captain, without verifying that any of these things is true).

The difference between curves in the right graphic “looks” just as significant to me as the difference between curves in the left graphic. Unfortunately, neither Esper (nor Briffa) establish quantitative procedures for assessing such significance. My own opinion – and I’m speaking here in purely statistical terms – is that any reasonable significance tests which rejected homogeneity for the Jaemtland populations (left) would also reject population homogeneity of the CRU12 with the Yamal subfossil population (right).

You can be sure that if there is a test that does not reject one and does reject the other, no matter how inappropriate, it will be used.

The problem with qualitative curve comparisons: as a freshman taking physics, I got my first test back. It was on oscillations, springs, etc. The teacher said to “sketch” the curves of the equations. I did. I saw the first set of markings all the way to the end: perfect score. Then the teacher went back and marked it down to an 85% because my drawings were not neat enough – this guy did not like to ever give an A. First pass – perfect, but he could change his mind. The problem with Bender’s suggestion is that the CI on the right-side YAD data red line will be huge, making almost anything “compatible” with it (though not for the black line).

the ci on the right side Yad data red line will be huge, making almost anything “compatible” with it

Well, if we accept Rob Wilson’s proposition that it is fair to screen populations on the basis of their variance during any interval of concern then this alone could be used as a criterion for distinguishing the two populations. What are the residual variances during the phase that concerns you? Divide the one variance by the other and you have an F-statistic to tell you if the two variances are different.
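A minimal sketch of that F-test, run on synthetic data (the classical test assumes approximate normality and is sensitive to heavy tails):

```python
import numpy as np
from scipy import stats

def variance_f_test(x, y):
    """Two-sided F-test for equality of two sample variances."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    f = x.var(ddof=1) / y.var(ddof=1)
    dfx, dfy = len(x) - 1, len(y) - 1
    # Two-sided p-value: double the smaller tail probability
    p = 2 * min(stats.f.cdf(f, dfx, dfy), stats.f.sf(f, dfx, dfy))
    return f, min(p, 1.0)

rng = np.random.default_rng(1)
f_eq, p_eq = variance_f_test(rng.normal(0, 1, 200), rng.normal(0, 1, 200))
f_ne, p_ne = variance_f_test(rng.normal(0, 1, 200), rng.normal(0, 3, 200))
```

With equal true variances the test (correctly) rarely rejects; with a threefold difference in standard deviation it rejects overwhelmingly.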

Is “artesenal” Queen’s spelling of “artisanal” (a lovely word to describe what they are doing, denoting both skill and, shall we say, identifiable idiosyncracy of the practitioner) or is there some other meaning intended by that spelling?

Steve: Yes. While I have the odd typo, usually I don’t make spelling mistakes, but I did here. My use is modeled on the use of the word for “artisanal” mining operations – which, as here, indicates that there is a practical, if unsophisticated, intent.

No prob, bro. Just wanted to make sure I wasn’t missing an intended wink somehow. It’s still a lovely and evocative usage of the word. Of course, the individual artisan may in the end turn out to be a prodigy and redefine his field – but the burden of proof is on him and his defenders.

Come on, guys – since it is obvious that dendroclimatologists make these comparisons by eyeballing, then eyeballing the comparisons that Steve M has presented shows that Briffa did not take homogeneity into consideration here. Instead of us all getting our underwear in a knot, why not wait and see if Briffa replies to this latest consideration. It is tiring to hear the dendros talk a good game and then turn around and ignore their own pronouncements – evidently in haste to publish papers. Let Briffa come up with a statistical test to justify his selections and rejections – even if it will be after the fact. Heck, I’ll even accept a reply that says: Gee, I never even thought about doing a test – or we just took someone else’s word for it.

Steve: you should try the variance F-test, suggested earlier, in #18, on Wilson’s base case of Polar Urals vs Yamal. I’m willing to bet the difference in variances here is trivial compared to the difference pointed to by Craig Loehle.

S. McIntyre:
Eyeballing, I (my eyes) have a problem with the gray (subfossil) portion of the “Briffa 2000 Esper-style Homogeneity Test”. My eyes tell me that the right-most one-third of the smoothed curve is cutting too low (too many data points above the smooth, too few below). Now you are the statistical expert, not I. Wanna clue me in here?

There are only a few points that contribute to the age curve on the right. You can fit an nls curve on all the measurements OR on the age-profile curve. I fit the nls curve on the entire population, rather than to the age-profile curve, but the PLOT here shows the age-profile curve with the population fit. As you observe, there isn’t a precise visual match. Briffa and Esper both fit the curve to the age-profile curve rather than the population. Reasonable people can perhaps disagree on procedure, but on first blush, it makes more sense to me to fit to the population.

The age-profile curve is the average ring width for each age. Esper uses a biweight mean, while, AFAIK Briffa uses a simple mean – not the most defensible decision with the non-normal distributions, but used here for consistency.
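For reference, here is one common formulation of the biweight (bisquare) robust mean – not necessarily the exact variant Esper used – which illustrates why the choice matters for skewed ring-width distributions:

```python
import numpy as np

def biweight_mean(x, c=9.0, tol=1e-6, max_iter=50):
    """Tukey's biweight robust mean, iterated to convergence.
    c=9 is a conventional tuning constant; outliers beyond
    c * MAD from the center get zero weight."""
    x = np.asarray(x, float)
    m = np.median(x)
    for _ in range(max_iter):
        s = np.median(np.abs(x - m))  # MAD scale estimate
        if s == 0:
            return m
        u = (x - m) / (c * s)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        m_new = np.sum(w * x) / np.sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# One wide outlier ring drags the simple mean but not the biweight mean
vals = np.array([1.0, 1.1, 0.9, 1.05, 0.95, 8.0])
```

Here the simple mean is about 2.2 while the biweight mean stays near 1.0 – exactly the kind of divergence a few anomalously wide rings can produce at poorly replicated ages.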

I’ve posted it in the RCS – One Size fits all thread but it seems relevant here as well (and probably when Steve posts his next thoughts in this sequence).

Using a very standard statistical approach I have asked the same question – do POR-YAD-JAH trees have the same growth profile as the sub-fossil trees in the Yamal dataset?

To answer the question I run the following random effects regression:
log(rw(tree,year))=constant+climate(year)+age(tree)+(age(tree).I(live))+error

climate(year) indicates that I estimate a parameter for each year that can loosely be called climate but is just the average year effect
age(tree) indicates that I estimate a parameter for each age of a tree
age(tree).I(live) indicates that I estimate a parameter for each age of a live (YAD, JAH, POR) tree

This parameterisation means that the estimates for age(tree).I(live) are differences from the profile for all other trees in the sample. A test for their significance can be constructed as a standard Chi squared test. This test is so standard that it is canned in my statistics package and most others (testparm in Stata if you are interested).

The graph of the difference between the live trees and the subfossil trees with 95% confidence intervals (2 standard errors) is below. The axis is in natural logs, so the divergence is such that ring widths of these trees are about 60% (e^(-0.5) ≈ 0.61) of the subfossil ring widths over the first 100 years of their life, which matches what one can eyeball off the RCS curves above:

The test for the equality of all these coefficients to zero gives these results:
chi2(377) = 1223.50
Prob > chi2 = 0.0000

i.e. the coefficients are jointly significantly different from zero (at any meaningful significance level you want to consider)

If you want to be more precise, the growth from 1-100 years is significantly lower than average and the growth from 300+ years is significantly above average, while growth from 100-300 is not significantly different from average for the rest of the trees in the Yamal data set. I note that you could eyeball this from the RCS curves that Steve has posted here if you were looking for it. But I’m more comfortable talking about standard statistical tests of parameter significance than eyeballing a curve with unknown standard errors.

So yes, the estimates are noisy and have relatively wide standard errors. Notwithstanding that, it is possible to draw statistically based conclusions that the live trees are inhomogeneous with the subfossil trees.
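The regression and test above were run in Stata; as a rough illustration of the same idea – not the commenter’s actual code or data – here is a self-contained Python sketch on synthetic data, with a coarse age-band interaction standing in for one-dummy-per-age to keep it small:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Tiny synthetic panel in the spirit of the comment's regression:
# log(rw) = year effect + age effect + (age-band shift for live trees) + noise
n_years = 30
rows = []
for t in range(40):
    live = t < 10                 # first 10 trees form the "live" stratum
    birth = t % 15                # deterministic births: all ages occur in both strata
    for y in range(birth, n_years):
        age = y - birth
        shift = -0.5 if (live and age < 10) else 0.0   # built-in inhomogeneity
        rows.append((0.02 * y - 0.01 * age + shift + rng.normal(0, 0.1),
                     y, age, live))

logrw, year, age, live = (np.array(c) for c in zip(*rows))
live = live.astype(float)

# Age-band dummies for the live-tree interaction
bands = np.column_stack([age < 10, (age >= 10) & (age < 20), age >= 20]).astype(float)

# Design: intercept + linear year + linear age + live x age-band terms
X = np.column_stack([np.ones(len(logrw)), year, age, bands * live[:, None]])
beta, *_ = np.linalg.lstsq(X, logrw, rcond=None)

# Wald chi-squared test that all live x age-band coefficients are zero
resid = logrw - X @ beta
sigma2 = resid @ resid / (len(logrw) - X.shape[1])
cov = sigma2 * np.linalg.inv(X.T @ X)
b, V = beta[-3:], cov[-3:, -3:]
chi2_stat = b @ np.linalg.solve(V, b)
p_value = stats.chi2.sf(chi2_stat, df=3)
```

The built-in 0.5 log-unit depression of young live-tree growth is recovered by the first interaction coefficient, and the joint test rejects homogeneity decisively – the same logic as the commenter’s chi2(377) = 1223.50 result, at toy scale.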

Yes, and this one for the less numerate among us: “Notwithstanding that, it is possible to draw statistically based conclusions that the live trees are inhomogeneous with the subfossil trees.”
======================================

I would like to see a new study named “McIntyre and Briffa 2010” (or Briffa and McIntyre) to provide a definitive statistical validation of the entire dendro methodology for deducing climatological effects based on tree-ring analysis.

There are logical dangers in preferring a series of curve fits to establish significance, because the fitted curves might not be optimal or suitable. Better to encourage the authors to do their own?

As you know, in geology, the observer cannot change the rocks to suit the outcome. There is some artificial advantage in dendro in being able to select and reject trees based on whether they look appropriate. In the final strict analysis, the method fails if one cannot select the trees before the event. So, while it is useful to demonstrate that there are differences between groups as you show, there is still the major problem of tying the relative differences to a climate signal, more specifically to temperature.

It’s like pulling decaying teeth one by one after taking considerable trouble to show they need replacing with expensive individual crowns. After a while, it is easier to decide that all-round dentures are the solution. And knowing that dentures are not capable of the performance of the originals.

The modern treeline is right at 67 deg 30 min, so I would suspect the POR live trees circa 1990 are right at the upper limit of their viability. The other sites, being south of this line, will not be as stressed and should be more homogeneous in their behavior, which you have shown here. I suspect the Team wanted to use trees right on the edge to maximize any sensitivity. More heavily boring in the upper reaches of KHAD would have been the pragmatic thing to do.

I would suspect the POR live trees circa 1990 are right at the upper limit of their viability.

You “suspect” that but things are not always as they seem. For example, the Polar Urals site is a little bit to the south but measurably higher. The ring widths for POR and YAD are high relative to the 60 Schweingruber sites in NW Siberia.

Steve, I also like the giant squids analogy. Now the Hockey Team has a mascot! Here in Orange County we have the Anaheim Ducks. The IPCC/CRU has the East Anglia Giant Squids, a hockey team that makes their own sticks and covers their tracks.

The downwards red peaks like around year 1470 are not supported by much grey matter. In contrast, the upwards red peaks like the following one to 1520 or so, are filled with grey. There are a couple of exceptions, but mostly the downwards cusps seem to have less original data around them and so look white inside.

Yes, yet note the understated tone of Steve’s language this time. He ends the preamble this way:

The results are provocative.

Now, what is “provocative” to one may be “disquieting” to another, so it really doesn’t pay to get into semantics. But I don’t see any of Steve’s dozens of critics lining up to thank him for his muted language this time.
Just goes to show you that people pay more attention to packaging than substance. Which is precisely how we got into this whole hockeystick mess.

I’m wondering what will happen when this sort of methodology is applied against Taimyr.

BTW there are a couple of Schweingruber data sets that seem to be virtually co-located with Taimyr sites, that weren’t included in the expanded Briffa 2008 data set. I guess that Briffa simply didn’t consider them either.

BTW there are a couple of Schweingruber data sets that seem to be virtually co-located with Taimyr sites, that weren’t included in the expanded Briffa 2008 data set. I guess that Briffa simply didn’t consider them either.

Why would you consider hunting for more data when you “get” the answer that all the other “independent” studies are “getting”? Once you’ve “gotten” what you were trying to “get” you “simply” “move on”. The “self-correcting” part comes later.

I have a cool new way of doing a generalized exponential RCS chronology using the well-known nlme package – one that yields a LOT of diagnostics. I hadn’t previously used the SSasymp function in this context, but it is precisely the same as the generalized neg exponential.
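For anyone checking the equivalence: R’s SSasymp parameterization, per the stats/nlme documentation, is Asym + (R0 − Asym)·exp(−exp(lrc)·t), which maps onto the generalized negative exponential a·exp(−b·t) + k via a = R0 − Asym, b = exp(lrc), k = Asym. A quick numeric confirmation (in Python, for self-containedness):

```python
import math

def ssasymp(t, Asym, R0, lrc):
    # R's SSasymp self-starting asymptotic regression model
    return Asym + (R0 - Asym) * math.exp(-math.exp(lrc) * t)

def neg_exp(t, a, b, k):
    # Generalized negative exponential as used for RCS curves
    return a * math.exp(-b * t) + k

# Mapping: a = R0 - Asym, b = exp(lrc), k = Asym
a, b, k = 1.5, 0.02, 0.4
vals = [(ssasymp(t, k, a + k, math.log(b)), neg_exp(t, a, b, k))
        for t in range(0, 300, 50)]
```

The two parameterizations agree to machine precision across the age range.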

I guess that one defence would be that this is not comparing apples with apples, as only trees alive in the 20th century have the huge uptick, because only the 20th century temps had that uptick. It could be claimed that this invalidates comparing the RCS growth curves. Unfortunately this may be hard to argue against as, by definition, there is not a full overlap between live and (sub)fossil trees.

There is a bit of an overlap, and I suppose that could be tested to see if adding the different groups together produces a tighter/lower SD(year) plot in the resultant chronology during the calendar years of the overlap than would be the case for uncorrelated data. It might be better to plot SD(year)/MEAN(year), where MEAN(year) is of course the resultant chronology. This boils down to the simple question “how well do they correlate during the overlap period?”, as that determines the way the SDs behave additively.

When I try this it does not look like the groups fit at all well, but I cannot calculate a relevant statistic. So it is just an impression.

FWIW, when I try to plot the SD/MEAN for just the fossil trees (over the full 2200 years), it looks a bit of a roller coaster, leading me to think that there are probably other inhomogeneities (around 210, 420 and 1720) among the fossil trees.
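A sketch of the SD(year)/MEAN(year) diagnostic described above, assuming a hypothetical layout of a years × cores matrix with NaN wherever a core has no ring for that calendar year:

```python
import numpy as np

def per_year_cv(mat):
    """SD(year)/MEAN(year) across cores; mat is years x cores,
    NaN where a core does not cover that calendar year."""
    mean = np.nanmean(mat, axis=1)
    sd = np.nanstd(mat, axis=1, ddof=1)
    return sd / mean

# Toy example: 3 calendar years, 3 cores (one core missing the first year)
mat = np.array([[1.0, 3.0, np.nan],
                [2.0, 2.0, 2.0],
                [1.0, 2.0, 3.0]])
cv = per_year_cv(mat)
```

A roller-coaster in this series during the overlap (or within the subfossil-only span) is the visual symptom of the inhomogeneities the comment describes; turning it into a formal statistic would require a model for the expected SD under homogeneity.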

Steve, at the time Briffa wrote his response to you I bet he did not really understand how badly he had screwed up. This post holds his errors up to the light so everyone can see the cracks. I’m just wondering if Briffa gets it even now. If he does, my guess is he will not attempt a substantive response. If he is like Mann, he will say it is nonsense and then attempt to stop journals from publishing your criticisms. If he still doesn’t get it, he may think he can still win and attempt another response. I’m hoping for a further response. It promises to be a hoot!

In Benoit Mandelbrot’s book, The (Mis)Behavior of Markets, there is a short discussion of tree rings. This had not made much of an impression when I first read the book. Now, however, I wonder if it is applicable to the homogeneity issue and to the whole question of tree selection.

My apologies if this is something everybody knows and has already been factored into the discussions. Here is a short précis (as I understand it) from page 184 of the 2006 Basic Books paperback edition:

…unlike the roll of dice where the probability of an outcome is independent of all past outcomes, events in the real world such as yearly Nile river levels, the price of cotton, or the movement of the stock market have some degree of dependence on past events. Of possible interest to the current topic, Mandelbrot talked about tree rings having a degree of long-term dependence and he used a plot of bristlecones from Mount Campito in the White Mountains of California. Among other things, he commented that

“Adjacent tree rings, the marks of growth only a year or two apart, are highly correlated. Beyond a few years, the correlations fall; the pattern from one decade or century (my italics) to the next is more haphazard. But the correlations fall more slowly than expected. In fact, it is 150 years before they are so insignificant that to distinguish them statistically from chance, the usual tests are powerless.”

Statistics, fractal mathematics, and power laws are far outside my expertise, but I did wonder how Briffa and the rest have dealt with this issue. Is this another homogeneity factor? Does it further complicate the use of tree rings for climate temperature purposes? Again, sorry if this has been covered on CA, but I could not find a direct reference to it.
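For what it's worth, the contrast Mandelbrot describes is easy to illustrate numerically: an AR(1) series has geometrically decaying autocorrelation, while a long-memory (fractionally integrated) series stays correlated out to long lags. A toy simulation (my own sketch, nothing to do with actual tree-ring data):

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation at a single lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(1)
n = 50_000

# AR(1) with phi = 0.5: autocorrelation ~ 0.5**lag (geometric decay)
e = rng.normal(size=n)
ar1 = np.empty(n)
ar1[0] = e[0]
for t in range(1, n):
    ar1[t] = 0.5 * ar1[t - 1] + e[t]

# ARFIMA(0, d, 0) with d = 0.4: hyperbolic decay, acf ~ lag**(2d - 1).
# Build it by convolving white noise with the fractional-difference weights.
d = 0.4
k = np.arange(1, 1000)
psi = np.concatenate(([1.0], np.cumprod((k - 1 + d) / k)))
long_mem = np.convolve(rng.normal(size=n + psi.size), psi, mode="valid")[:n]

# At lag 50 the AR(1) correlation is essentially zero,
# while the long-memory series is still visibly correlated.
```

This is the sense in which "the correlations fall more slowly than expected": at lag 50 the geometric series is indistinguishable from chance while the fractional one is not.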

We’ve talked about this before but not for a few years – and I’d prefer that you hold the thought for another occasion. Mann argues that the autocorrelation is low-order AR1 red noise at most and hotly contests long autocorrelation.

There is some other relevant information that I wasn’t aware of when I commented on this sort of topic in 2005 or so. For example, it is now crystal clear to me that simple mechanical deformation (see the close spaced cores in our Almagre posts) can cause century-scale high sigma excursions in ring widths. At present, I don’t know how to model this sort of structure. I’m not sure that long-range dependence is the mot juste for it, but I wouldn’t exclude it either.

….. it is now crystal clear to me that simple mechanical deformation (see the close spaced cores in our Almagre posts) can cause century-scale high sigma excursions in ring widths ….

In our Mechanics of Materials classes, we engineers learn that simple mechanical deformation of a non-homogeneous material can sometimes result in an unexpected pattern of strain distortion within the material.
So, within this contextual framework, the following question naturally arises: do the outermost chronologically younger tree rings respond to simple mechanical deformation in patterns that are materially different from those in the older parts of the tree?

Re: Scott Brim (#60),
Take a look at two nearby cores from an Almagre strip bark tree discussed at http://www.climateaudit.org/?p=2239. Here's a ring width plot from the prior post showing the huge amount of difference between cores a few inches from one another.

Black – drilled from S in center of strip; red – from SE from edge of strip

I am a bit doubtful of the Esper style when applied to the Yamal live trees, for the simple reason that it is possible that the difference in the growth curves could be signal. The data being compared consist of both a growth curve and signal. If someone maintains it is signal, then there needs to be another way to settle the issue.

I have considered how to try to compare apples with apples, and I think there may be some value in using PCA to isolate the dominant signal from the noise.

If you construct the chronology, but instead of averaging to obtain the signal for each calendar year you extract the first PC and calculate the unexplained variance, you may have a measure of fit for each core in terms of the ratio of unexplained to explained variance, with low values indicating a good fit.

I have attempted this and I think I have produced a PC/EOF pair that maximises the explained variance; obviously it is a sparse matrix, and so it is not the normal PCA procedure.

In order to compare like with like I only used data from the years of overlap of the live and fossil trees (1580-1971) and have excluded all trees that do not have a minimum of 97 years in that range (the length of the shortest history of the live trees in that range, i.e. YAD081 once truncated at 1971). That left just 57 cores.

After that I proceeded to cast out trees based on the highest ratio of unexplained to explained variance, recalculating PC1 each time.
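For concreteness, here is a sketch of the casting-out loop as I've described it, simplified to complete data (my real matrix is sparse, which needs a more careful PC extraction). At each step, fit a rank-1 (PC1/EOF1) approximation, score each core by its unexplained/explained variance ratio, and drop the worst:

```python
import numpy as np

def cast_out_order(X):
    """Iteratively remove the core fitting PC1 worst.

    X: cores-by-years matrix (complete data, for simplicity).
    Returns core indices in the order they were cast out.
    """
    remaining = list(range(X.shape[0]))
    order = []
    while len(remaining) > 2:
        M = X[remaining]
        M = M - M.mean(axis=1, keepdims=True)    # center each core
        u, s, vt = np.linalg.svd(M, full_matrices=False)
        fit = np.outer(u[:, 0] * s[0], vt[0])    # rank-1 reconstruction
        resid = ((M - fit) ** 2).sum(axis=1)
        expl = (fit ** 2).sum(axis=1)
        ratio = resid / np.maximum(expl, 1e-12)  # unexplained/explained, per core
        worst = int(np.argmax(ratio))
        order.append(remaining.pop(worst))
    return order
```

A core that shares nothing with the dominant pattern gets a huge ratio and goes first; cores typical of PC1 survive longest.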

To begin with, PC1/EOF1 explain about 80% of the variance and PC1 looks similar to the Briffa chronology (correlation .77), but clearly not the same. Some of the difference will be due to excluding cores not meeting the 97-year criterion, but some simply because it is a PC, not an average.

The casting-out process creates an ordered set about which certain statistics can be calculated. I chose to look at whether the YAD, POR etc. cores were cast out atypically early. So if, for instance, all five YAD cores were cast out in the first eleven, that would be significant, and a value could be put on how likely that was compared to the most likely number to be cast out (5*11/57 =~ 1).

The casting out proceeded in a way compatible with chance; the only group to go noticeably early was JAH(2), after 22 casting outs, but that is not an unlikely event.

I was quite surprised that the fossil cores did not stand up for each other better; they were the dominant group, but they did not. They did not seem to agree with each other any better than they did with the live cores.

The process stopped when there were five cores left, one of which was the truncated YAD081, the other four being fossil cores. The next core to go would have left the chronology disjoint.

The final PC1, with just the five cores, resembled the original PC1 (57 cores; correlation .83). So the process seems to have kept cores typical of the original set, and maybe really has cast out cores peripheral to the original set in some sort of even-handed way. I was surprised that the fit between these PCs looks particularly good in the middle years (1670-1850), where I suspected that divergence between the live and fossil cores would be strongest; but the process found fossil cores typical of that period, which originally contained a high proportion of live trees, without retaining any live cores (the remaining YAD081 doesn't start until 1875).

Well, one thing is certain: the method I chose did not discriminate between the live and fossil cores, and the method was picked before the process was run and the result known. Perhaps it shouldn't have, as it is simply not doing what I had hoped; or perhaps the fossil cores are just as inhomogeneous internally as the larger set is.

I have to add that I did initially try another method, using PC2 to pick outliers, but I did not understand what it meant, and on occasion it kicked out a core that resulted in a drop in the proportion of explained variance, which suggests that it was not necessarily picking the worst-fit cores. It was also taking too long. But it did cast out the YAD(5) very early, I suspect because they agreed on how they disagreed with the rest of the cores.

Re: Alexander Harvey (#58), I tend to agree with you about the usefulness of the Esper method. At least with my method the estimation is simultaneous and so you can separate the year effects from the age effects. (Because the trees are not exactly the same age in any given year you can get this identification.) I wouldn’t, however, go so far as to call either of them ‘signal’. Who knows exactly what it is – it is just the variance that can be ascribed to age effects and year effects from that data. Are you ascribing ‘signal’ to year effects or are you meaning something else?

Furthermore, my method is OLS, and therefore BLUE (best linear unbiased estimator) – that is, it has the lowest theoretically achievable variance of all linear unbiased estimators (if some assumptions are satisfied). Or, rewording it, it has the maximum possible explained variance of any linear unbiased estimator. That seems to be what you are looking for with your PC approach, isn't it? With OLS you have it.
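To illustrate the simultaneous estimation (a toy version only, with made-up effects and my own dummy-variable setup, not the actual method): code log ring width as an age effect plus a calendar-year effect and solve for both in one least-squares pass. The staggered germination dates are what identify the two effect sets, since trees of different ages coexist in each year.

```python
import numpy as np

# Toy model: log ring width = age effect + calendar-year effect + noise.
rng = np.random.default_rng(3)
n_ages, n_years = 30, 40
age_eff = -0.03 * np.arange(n_ages)           # juvenile growth decline
year_eff = rng.normal(0.0, 0.2, n_years)      # the "chronology" to recover

obs_age, obs_year, obs_w = [], [], []
for start in range(0, n_years - n_ages + 1, 2):   # staggered germination years
    for a in range(n_ages):
        obs_age.append(a)
        obs_year.append(start + a)
        obs_w.append(age_eff[a] + year_eff[start + a] + rng.normal(0, 0.05))

# Design matrix: a dummy per age plus a dummy per year (year 0 dropped
# as baseline, since only the sum of the two effects is observed).
n = len(obs_w)
X = np.zeros((n, n_ages + n_years - 1))
X[np.arange(n), obs_age] = 1.0
for i, y in enumerate(obs_year):
    if y > 0:
        X[i, n_ages + y - 1] = 1.0

beta, *_ = np.linalg.lstsq(X, np.array(obs_w), rcond=None)
year_hat = np.concatenate(([0.0], beta[n_ages:]))  # estimates year_eff - year_eff[0]
```

The recovered year effects track the true "chronology" up to the baseline shift, with the age curve estimated at the same time rather than in a separate standardization step.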

I am a bit doubtful of the Esper style when applied to the Yamal live trees. For the simple reason that it is possible that the difference in the growth curves could be signal.

There are a couple of different issues here. First is the simple one that population inhomogeneity will cause break points in a one-size-fits-all standardization. This is nothing to do with dendro. It’s elementary statistics. Esper’s Jaemtland example shows an instance with RCS.

So anybody using RCS has to figure out a way of showing population homogeneity. Esper’s method was the only one that I’d seen. I’m not asserting that it’s “right” – only that it is an approach within the dendro literature.

I agree that separating “signal” from inhomogeneity is not all that simple a problem. However, the Esper test stands against Briffa’s claim that the KHAD data is somehow anomalous and the POR-YAD data is representative. Briffa’s own “tests” don’t test population homogeneity. If dendros have no way of demonstrating population homogeneity, then it’s hard to justify one-size-fits-all RCS standardization.

While it’s all very well for readers to propose some bright idea, I am generally critical of homemade methods that are not related to known statistical literature. Testing for homogeneity should start with a test reported in a statistical textbook in some roughly similar context – there is considerable literature on growth curves.
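As one concrete example of a textbook procedure (a Chow-type F test for equality of regression coefficients; the crude log-linear growth curve here is my own illustration, and the iid-error assumption is certainly violated by real ring widths): fit the growth curve to the pooled data and to each population separately, and compare residual sums of squares.

```python
import numpy as np
from scipy import stats

def chow_test(age1, w1, age2, w2):
    """Chow-type F test: same linear growth curve in both populations?

    Fits log(width) = b0 + b1*age (a crude exponential-decay curve)
    pooled and separately; returns (F, p).  Assumes iid normal errors,
    which real ring widths violate -- illustration only.
    """
    def rss(age, w):
        X = np.column_stack([np.ones_like(age), age])
        beta, *_ = np.linalg.lstsq(X, np.log(w), rcond=None)
        r = np.log(w) - X @ beta
        return float(r @ r)

    k = 2                                     # parameters per curve
    rss_pool = rss(np.concatenate([age1, age2]), np.concatenate([w1, w2]))
    rss_sep = rss(age1, w1) + rss(age2, w2)
    n = len(w1) + len(w2)
    F = ((rss_pool - rss_sep) / k) / (rss_sep / (n - 2 * k))
    p = stats.f.sf(F, k, n - 2 * k)
    return F, p
```

A small p rejects the hypothesis that one growth curve fits both populations, which is at least a quantitative starting point; a serious version would have to deal with autocorrelated residuals and the nonlinear curve.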

Check out page 3 of this presentation on tree rings by the University of Western Ontario. Their intent is to show some of the scarring and distortions that can happen in real-world trees. In this example a mature tree with very small outer rings has been scarred by a glacier, and after that there is a sudden huge growth spurt, with rings very much larger than just a few years before. Note that if you were to take cores of this tree at right angles, one would be normal and the other would show a pathology. This is why you need to have a tree guy check the core samples to rule out pathologies before including them in climate studies.

The presented homogeneity analyses might ultimately reveal an even more serious problem for tree-ring chronologies as a climate proxy than the need (so far ignored in the published studies) for a quantitative test of population homogeneity.

The discussion revolves around fast-growing short-lived trees and slow-growing long-lived trees, which display very different ring width patterns as a function of age. Trees that are “destined” to live long often display, at the same age, a slower growth rate than trees “destined” to die young. Just from common sense, it would seem legitimate to ensure that the temperature of each calendar period in the chronology is reconstructed using the same proportions of data from the different tree groups.

But the problem is that, taking dead trees from a fossil population, we already know which trees were “destined” to live long and which happened to die young. In contrast, in a living population, looking at trees of a given age we cannot say which will live long and which will die young; this information belongs to the future. Of course, we could try to figure that out by comparing growth rates among the living trees. However, as we also need to remember the possible climate signal (which is unknown), we will be unable to objectively decide whether a given tree is a normal fast-growing tree at a zero climate signal or an (otherwise) slow-growing tree with a growth rate enhanced by some climate signal.

In other words, the lack of information about the “destined” life span of a given tree in a living population can render the problem of proving population homogeneity between fossil and living samples mathematically unsolvable.

Re: Anastassia Makarieva (#64),
If you can’t differentiate those destined to die young versus those destined to live long, either on the basis of (a) chronology trends or (b) climate responses, then there is no material difference to worry about; your “unsolvable problem” is irrelevant.

Re: bender (#66), Given that short-lived and long-lived trees show a different response to the climate signal, I must be able to know whether a given tree is short-lived or long-lived from independent sources. For fossil trees I do have such information, from their ultimate age. For living trees this information is lacking.

For a hypothetical example, to reconstruct temperature for year 1500 I use 10 tree rings of 50-60 years of age, five of which come from short-lived trees and five from long-lived trees. Now I want to compare these data with similar 50-60 year old tree rings for year 2009. And I also want five to be from short-lived trees and five from long-lived, to ensure a plausible comparison. But for living trees that are 50-60 years of age today I do not know which will live long and which will die next year. Hence, in this case I cannot in principle achieve population homogeneity with the year-1500 sample.

But for living trees that are 50-60 years of age today I do not know which will live long and which will die next year. Hence, in this case I cannot in principle achieve population homogeneity with the year-1500 sample.

As I recall from Melvin’s thesis, he talks about using the tree’s diameter as a predictor of eventual tree life expectancy (fast-growing trees die early) and thus using that as a factor in the expected growth rate curve.

That would make the difference eventually manifested in tree life time a function of growth rate. A slow-growing tree that is killed at a young age (by lightning or another act of nature unrelated to the tree itself) would surely be different from one that grows fast and dies young because of that growth rate.

Re: Kenneth Fritsch (#71),
What I mean is: if there is no climate signal, you are right, one is always able to discriminate between a fast-growing and a slow-growing tree. But presuming an unknown climate signal, how can one discriminate an inherently slow-growing tree that has been growing faster for the last 50 years due to AGW from a fast-growing tree without AGW? If both tree classes reacted strictly proportionally to temperature, this information could still be inferred by comparing the fastest to the slowest trees in the population. But if the fast-growing and slow-growing trees react in a quantitatively different manner to temperature changes, there will be no objective way to tell them apart in a signal-affected living population, and no way to meaningfully compare them with the fossil population. The climate signal can potentially change not only the mean growth rate of all trees, but also the frequency distribution of growth rates. Alternatively, the population structure with respect to growth rate and ultimate age can change dramatically without any climate signal.

I would think that the (strong) climate signal in tree ring growth could be differentiated from the background tree ring growth for a fast-growing tree, and further that a climate signal is not going to be present for all 50 years. If one removes the climate and other noise, we obtain a growth rate curve, and that growth rate curve will be different for a fast- and a slow-growing tree.

I am not sure how this is best done without confounding a strong climate signal with fast growth, but I would guess that using multiple aged trees would be required.

I think personally that for Yamal the problem with older trees and tree rings is a measurement error that is present for all tree ring ages, but that error provides much more leverage on the narrower older tree rings in the resulting delta calculation and thus produces huge CIs.

Given that the idea of tree-ring analysis is to infer a climate signal from the changing growth rate of trees, my question is: perhaps an analysis of population homogeneity between the fossil and living samples should be made based on metadata rather than on the growth rate itself? Otherwise it will always be possible to conclude, as essentially Briffa et al. did, that the two populations are so different because they lived under very different climate signals (AGW and no AGW). After all, Esper, who concluded that the Trondelag trees grew “too fast”, and Briffa, who concluded that his 20th century sample grew precisely as fast as needed to produce a hockey stick, look equally subjective.

I mean, suppose that in a hypothetical tree population there are yellow and red trees, with yellow trees doubling growth rate per 1 degree of temperature rise, and red trees decreasing their growth rate by 10% under the same conditions. Then the analysis of population homogeneity should ensure that the 20th century is NOT predominantly represented by yellow trees at the expense of red ones compared to the fossil population. If, on the other hand, yellow and red trees are indistinguishable in their climate response, then it will not matter for the population homogeneity analysis whether the sample is largely composed of red or yellow trees.

In this light the issue of population heterogeneity does not appear to be a purely statistical one. It should explore homogeneity with respect to those traits that have proved to elicit a differential growth response under the same climatic conditions. If, for example, vigorous trees with large thick branches, which Steven Mosher once mentioned, react more favorably to climate change than thin-branched trees, then one should test for homogeneity regarding the size of branches in the fossil versus living population. But again this hypothetical analysis would be based on metadata (tree color, branch size), not on the data (tree growth rates) themselves.

I may be mistaken, but I interpret Melvin’s (2004) graphs as clearly suggesting that short-lived and long-lived trees display very different climate responses. So showing that Briffa’s tree-age structure for the 20th century is distinct from the fossil population with respect to the ultimate tree age should already be a proof of population inhomogeneity.

These are just a few thoughts, and I would readily agree if this turns out to be another irrelevant idea; but in my view it is difficult to expect that a general CA reader, however interested, would deliver arguments at the same in-depth level as the CA professional core does.

Steve and Anastassia:
I don’t disagree with your remarks. I re-iterate: IF you can show a heterogeneous response for this specific instance, THEN you have a strong case. I would not start by assuming, rather generally, that the “problem is unsolvable”. You must start by showing that it needs solving.

YAMAL peninsula
Latest research into photosynthesis has shown that it is a chemical process with strong electrical undertones. It should not be unexpected that the process is affected by changes in the magnetic field to which it may be exposed.
While the geomagnetic field was declining in the Southern Hemisphere and the western parts of the Northern Hemisphere, the Yamal peninsula, as well as the wider region of central Siberia, is an exception in this respect; here the geomagnetic field has been on a strong rise since the 1930s, as shown in this graph:
See also (work in progress): http://hal.archives-ouvertes.fr/docs/00/41/83/04/PDF/NATA.pdf