The Briffa-Osborn Variance Adjustment

UC inquired about the variance adjustment in Osborn et al (Dendrochronologia 1998), which is used in many Team publications. The number of series in many reconstructions declines as you go back in time. If you take an average of standardized series (the CVM method), the variance over an early time interval, when fewer series are available, will be larger than the variance in a later time period. The BO variance adjustment was used originally in proxy reconstructions, but this procedure or a variant seems to have been introduced into some of the CRU temperature gridcell series as well. The adjustment is described as follows:

Each regional mean thus obtained tended to have greater variance during years when few chronologies were available to contribute to the average; this effect was corrected for by scaling by the square root of the effective number of independent samples available in each year.

First they state

Let’s assume that one starts with a set of series all standardized to 0 mean and sd 1. Then if $\bar{X}$ is the average of n series with a mean correlation $\bar{r}$,

(1) $\mathrm{Var}(\bar{X}) = \frac{1 + (n-1)\bar{r}}{n}$

If the series are uncorrelated ($\bar{r} = 0$), the variance goes down to 1/n;

(2) $\mathrm{Var}(\bar{X}) = \frac{1}{n}$

whereas if the series are perfectly correlated ($\bar{r} = 1$), the variance stays at 1. They assert:

“an artificial signal will be introduced into the variance of Xbar if the sample size varies through time.”
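The variance formula is easy to check numerically. Here is a minimal Python sketch, assuming Gaussian series that all share the same pairwise correlation $\bar{r}$ (my simplification; real chronologies will not be this tidy). It compares the simulated variance of the mean with $(1 + (n-1)\bar{r})/n$. The function names are mine, not from the paper.

```python
import numpy as np

def simulate_mean_variance(n, rbar, n_years=200_000, seed=0):
    """Empirical variance of the mean of n unit-variance series with common correlation rbar."""
    rng = np.random.default_rng(seed)
    common = rng.standard_normal(n_years)        # component shared by all series
    own = rng.standard_normal((n_years, n))      # component specific to each series
    # Each series has variance rbar + (1 - rbar) = 1 and pairwise correlation rbar.
    series = np.sqrt(rbar) * common[:, None] + np.sqrt(1.0 - rbar) * own
    return series.mean(axis=1).var()

def theoretical_mean_variance(n, rbar):
    """Variance of the mean of n standardized series with mean correlation rbar."""
    return (1.0 + (n - 1) * rbar) / n

for n, rbar in [(2, 0.3), (10, 0.3), (10, 0.0), (10, 1.0)]:
    sim = simulate_mean_variance(n, rbar)
    theo = theoretical_mean_variance(n, rbar)
    print(f"n={n:2d} rbar={rbar:.1f} simulated={sim:.3f} theoretical={theo:.3f}")
```

The two limiting cases in the loop ($\bar{r} = 0$ and $\bar{r} = 1$) correspond to the uncorrelated and perfectly correlated cases discussed above.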

Comparing (1) and (2), they define the “effective independent sample size” as follows:

(4) $n' = \frac{n}{1 + (n-1)\bar{r}}$;

They express (1) as follows

(5) $\mathrm{Var}(\bar{X}) = \frac{1}{n'}$

When one thinks about this, this is a very odd terminology. This is measuring not so much the “independent sample size” as the relative lack of coherency in the sample – but let’s proceed, holding this thought. They make the unsurprising observation: “If $\bar{r}$ is low, variance will increase strongly as n falls below 10.” They observe that $\bar{r}$ is about 0.6 in western U.S. conifers; about 0.3 in eastern US deciduous hardwoods and as low as 0.2 in deciduous European sites; and from 0.28 to 0.71/0.74 for Siberian RW and MXD sites. They illustrate (Fig 2a) the average of 8 sites in S Europe where variance increases pre-1750 as n decreases ($\bar{r}$ is only 0.07).
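To get a feel for the numbers, here is a small Python sketch that tabulates the variance of the mean, $1/n'$ with $n' = n/(1 + (n-1)\bar{r})$, for a few of the $\bar{r}$ values mentioned above. The function name `effective_n` and the choice of n values are mine.

```python
def effective_n(n, rbar):
    """Effective independent sample size n' = n / (1 + (n - 1) * rbar)."""
    return n / (1.0 + (n - 1) * rbar)

sample_sizes = (1, 2, 5, 10, 20)
for rbar in (0.07, 0.3, 0.6):
    # Variance of the mean of n standardized series is 1 / n'.
    variances = [round(1.0 / effective_n(n, rbar), 3) for n in sample_sizes]
    print(f"rbar={rbar}: Var(Xbar) for n={sample_sizes} -> {variances}")
```

With low $\bar{r}$ the variance changes a lot between n = 1 and n = 10. With high $\bar{r}$ it hardly moves, because additional series add little new information.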

They go on to say:

“The method presented here is theoretically based … Equation 4 provides the time-dependent effective sample size if supplied with the time-dependent available sample size. We would then expect the variance of the mean timeseries to vary according to equation (5). If we adjust the mean timeseries by

(6) $Y(t) = \bar{X}(t)\sqrt{n'(t)}$

then we would expect the variance Var(Y) to be independent of sample size (but would still have any real variance signals that are present in the data)”
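For what it's worth, the constant-variance claim is a one-line calculation. I am taking the adjustment to be multiplication of the mean by $\sqrt{n'}$, with $n' = n/(1 + (n-1)\bar{r})$, and treating $n'$ as a known constant in each year (both are my reading of the text, not a quotation):

$$\mathrm{Var}(Y) = \mathrm{Var}\left(\bar{X}\sqrt{n'}\right) = n'\,\mathrm{Var}(\bar{X}) = n' \cdot \frac{1}{n'} = 1$$

This holds only to the extent that $\bar{r}$ is known and the variance formula for $\bar{X}$ actually describes the data. It says nothing about what the scaling does to any signal contained in $\bar{X}$.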

They go on to discuss a couple of variations, where $\bar{r}$ varies with time, but the idea is the same. Briffa and Osborn do not provide any third-party statistical references for this procedure.

Here is a function to implement the BO adjustment. rbar0 can be a time series or a constant.
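A minimal Python sketch of such a function follows. It assumes the adjustment is simply multiplication of the yearly mean by $\sqrt{n'}$ with $n' = n/(1 + (n-1)\bar{r})$. Apart from `rbar0`, the names (`bo_adjust`, `xbar`, `n`) are my own, and any later rescaling to temperature units is left out.

```python
import numpy as np

def bo_adjust(xbar, n, rbar0):
    """Scale a mean series by sqrt(n'), with n' = n / (1 + (n - 1) * rbar0).

    xbar  : mean of the standardized series in each year
    n     : number of series available in each year (n >= 1)
    rbar0 : mean inter-series correlation, a constant or one value per year
    """
    xbar = np.asarray(xbar, dtype=float)
    n = np.asarray(n, dtype=float)
    # Broadcasting lets rbar0 be either a scalar or a per-year array.
    rbar0 = np.broadcast_to(np.asarray(rbar0, dtype=float), xbar.shape)
    n_eff = n / (1.0 + (n - 1.0) * rbar0)
    return xbar * np.sqrt(n_eff)

# Illustrative values only: a short mean series with a varying number of sites.
xbar = np.array([0.5, -0.2, 0.1, 0.3])
n = np.array([1, 2, 8, 8])
print(bo_adjust(xbar, n, 0.3))                      # constant rbar0
print(bo_adjust(xbar, n, [0.3, 0.3, 0.2, 0.2]))     # time-varying rbar0
```

Note that a year with a single series (n = 1) is left unchanged, while years with many weakly correlated series are scaled up the most.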

I’ve included a script here illustrating the use of this method in attempting to replicate the archived version of Jones et al 1998. I can more or less replicate a smoothed version of the archived reconstruction, but the difference between my attempt to replicate Jones et al 1998 and the archived version can be up to 0.5 deg C in individual years. (And we’re told that these reconstructions are accurate to within a couple of tenths of a degree or so.)

Top – comparison of emulation to archived as smoothed; bottom – difference between emulation and archived version.

As to the Briffa-Osborn adjustment itself, if you have series with relatively little inter-series correlation, one expects the variance of the average to increase as the number of series falls, by reason of the Central Limit Theorem. Does the Briffa-Osborn adjustment do anything other than disguise this? I think that someone on the Team needs to prove the validity of the methodology statistically. Of course no one on the Team bothers. They just advocate a recipe and then assert it.

I don’t have the original Osborn paper, but Frank et al seems to explain the method sufficiently:

In dendrochronology it is common practice to create a mean-value function as the best estimate of the trees’ signal at a site. This averaging process helps eliminate noise particular to individual trees and cores thereby increasing the signal quality.

IOW, measurement contains wanted part (signal) and unwanted part (noise). Noise term can be reduced by averaging (I wouldn’t use term ‘eliminate’ here). Measurements share a common signal, so averaging does not affect the signal part.

The variance of the mean-value function, however, depends upon the number of series averaged together and their interseries correlation (Wigley et al. 1984).

This is obvious. If we use only one tree, variance of the measurement

$y = s + n$

over time is

$\sigma_y^2 = \sigma_s^2 + \sigma_n^2$,

assuming uncorrelated s and n. If there are more trees (say N of them), we take the average. Average doesn’t affect the signal part, but it reduces the power of the noise. Efficiency of this reduction depends on how correlated the noise term is between the trees. For a given year, the average is

$Y = \frac{1}{N}\sum_{i=1}^{N} y_i = s + \frac{1}{N}\sum_{i=1}^{N} n_i$

Expectation value is (given s, and zero-mean noise)

$E[Y \mid s] = s$

Looks good to me. If you scale Y, you’ll obtain a biased estimate of s, right? Now, where do I go drunk?
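The bias question can be looked at with a toy simulation. The Python sketch below assumes a common signal s plus independent Gaussian noise in each tree, with variances I picked arbitrarily. Each tree is standardized, so the inter-tree correlation is known by construction, and the mean is then scaled by $\sqrt{n'}$ with $n' = n/(1 + (n-1)\bar{r})$. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_years = 50_000
signal_var, noise_var = 1.0, 4.0
total_sd = np.sqrt(signal_var + noise_var)
rbar = signal_var / (signal_var + noise_var)   # correlation between any two trees, by construction

s = np.sqrt(signal_var) * rng.standard_normal(n_years)   # common signal

def plain_and_adjusted_mean(n_trees):
    """Return (xbar, y): the mean of standardized trees and its sqrt(n')-scaled version."""
    noise = np.sqrt(noise_var) * rng.standard_normal((n_years, n_trees))
    trees = (s[:, None] + noise) / total_sd     # each tree standardized to unit variance
    xbar = trees.mean(axis=1)
    n_eff = n_trees / (1.0 + (n_trees - 1) * rbar)
    return xbar, xbar * np.sqrt(n_eff)

def slope_on_signal(y):
    """Least-squares slope of y on s, i.e. how strongly the signal shows up in y."""
    return np.cov(y, s)[0, 1] / s.var(ddof=1)

xbar_few, y_few = plain_and_adjusted_mean(2)
xbar_many, y_many = plain_and_adjusted_mean(20)

print("variance of adjusted mean:", y_few.var(), y_many.var())
print("signal slope, plain mean :", slope_on_signal(xbar_few), slope_on_signal(xbar_many))
print("signal slope, adjusted   :", slope_on_signal(y_few), slope_on_signal(y_many))
```

In this setup the plain means carry the signal with the same amplitude whether 2 or 20 trees are averaged. The adjusted means have equal variance but different signal amplitude, with the 2-tree case scaled down relative to the 20-tree case.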

I am not sure I buy the independence of signal(s) and noise(n), where the signal is a pure temperature signal. If y=f(t, p, tp , x), where t is the temperature signal, p the precipitation and tp an interaction term and x is all other factors including random noise then if s = f(t) and n=f(p, tp , x) then s and n can clearly be correlated. Can you really separate strong interaction effects between t and any other factors, with p being the obvious one, by this approach? Does this make sense? For example, and perhaps simple mindedly, if a given ring width is produced by average temperatures and average precipitation or by above average temperatures and above average precipitation, how do you separate temperature and precipitation? But I assume that this is so obvious a point dendrochronologists must have addressed it, would they not? It sounds like, for example, they choose sites based on some assumptions that attempt to control for other factors like precipitation – but frankly the logic of assuming constant that which inherently fluctuates is very puzzling. This is, I assume, part of the argument for up-to-date records so that this assumption of independence can be tested.

Am I reading this correctly: “…this effect was corrected for by scaling by the square root of the effective number of independent samples available in each year”?

If one assumes that each sample contains a signal plus noise, doesn’t different scaling for different years distort the signal? After averaging chronologies of different length, the result must be the signal plus different amount of noise for different time periods, depending on the number of effective samples in each time period. Isn’t the effect of the correction that you lower the signal amplitude for periods where you have less data!? Instead of increasing the error bars!!

I think Martin is right, the difference in variance is due to different amount of noise cancellation, correct? If so, you just have to live with the higher noise when you have less samples, and represent this as larger error bars. Scaling will affect both the signal and the noise, thus masking the potential signal in the periods where there are less samples. I don’t see how this can possibly be justified. Why is this procedure being used repeatedly if it hasn’t been shown to be a valid statistical technique?
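As a rough illustration of the "larger error bars" alternative, here is a Python sketch that leaves the mean untouched and widens the uncertainty band in years with fewer series. It assumes independent noise with a known standard deviation in every series, which is a strong simplification. The function name and the 1.96 multiplier (an approximate 95% band under Gaussian noise) are my choices.

```python
import numpy as np

def mean_with_error_bars(values, noise_sd, z=1.96):
    """Row-wise mean of the available (non-NaN) series with a +/- z * SE band.

    values   : 2-D array, one row per year, one column per series, NaN where missing
    noise_sd : assumed standard deviation of the independent noise in each series
    Every year is assumed to have at least one available series.
    """
    values = np.asarray(values, dtype=float)
    n = np.sum(~np.isnan(values), axis=1)      # series available in each year
    mean = np.nanmean(values, axis=1)
    half_width = z * noise_sd / np.sqrt(n)     # standard error shrinks as 1/sqrt(n)
    return mean, mean - half_width, mean + half_width

# Illustrative data: one series in the first year, four in the second.
data = np.array([[1.0, np.nan, np.nan, np.nan],
                 [1.0, 2.0, 3.0, 4.0]])
mean, lower, upper = mean_with_error_bars(data, noise_sd=1.0)
print(mean, lower, upper)
```

The signal estimate is the same as the plain average; only the stated uncertainty changes with the number of samples.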

If one assumes that each sample contains a signal plus noise, doesn’t different scaling for different years distort the signal?

Yes. This adjustment leads to a biased estimate.

Isn’t the effect of the correction that you lower the signal amplitude for periods where you have less data!? Instead of increasing the error bars!!

Yes. In the past, we have sparser data. Past variations will be scaled towards zero. Increasing the error bars is not a legal move in climate science. Those bars might reach the current temperature levels, and that won’t do.

#7

Why is this procedure being used repeatedly if it hasn’t been shown to be a valid statistical technique?

For example, and perhaps simple mindedly, if a given ring width is produced by average temperatures and average precipitation or by above average temperatures and above average precipitation, how do you separate temperature and precipitation? But I assume that this is so obvious a point dendrochronologists must have addressed it, would they not? It sounds like, for example, they choose sites based on some assumptions that attempt to control for other factors like precipitation – but frankly the logic of assuming constant that which inherently fluctuates is very puzzling. This is, I assume, part of the argument for up-to-date records so that this assumption of independence can be tested.

Amazingly, I don’t think the dendrochronologists HAVE addressed this fundamental and extremely important issue. They seem to avoid the question like the plague. Tree rings are often very good proxies for moisture. I don’t think they are generally valuable as “thermometers” for many reasons.

Multiplication of the mean timeseries with the square root of Neff at every time t theoretically results in variance that is independent of sample size.

Seems that Eq (6) is not correct, X and Y mixed (??)

Steve wrote:

Because there is little reason to believe that the annual variance in the early period was substantially greater than at present, Briffa and Osborn [1999] proposed a variance adjustment methodology (applied here) as follows.

Because Briffa and Osborn have never heard of filtering theory (specifically, the problem of estimating the state of a stochastic dynamical system from noisy observations), they decided to go the easy way and just scale the observation so that the result looks good.

RE: #12 – I would however concede that some species in Marine West Coast, and in the wetter coastal portions at the northern margins of Mediterranean climates may be local temperature proxies. But how many such places are there on earth and how few of the overall claimed set of global tree ring proxies are actually found in such places?

A site mean with fewer effective samples might be expected to have a lower (temperature) signal to noise ratio. Then the signal part has lower amplitude after normalization than in a mean from a site with more effective samples. If all site means were adjusted to compensate for this before the total mean is calculated, the signal part in the total mean would be independent of which sites are included at a specific time.

How this compensation should look I don’t know, but multiplying with n’ is probably in the right direction.

But this compensation should be done uniformly over time, and differently for different site means, before calculating the total mean. I can’t see that this is what they do.