Esper et al 2009 on West Siberia

Esper et al (Global Change Biology, in press) “Trends and uncertainties in Siberian indicators of 20th century warming” is relevant to our present consideration of Briffa’s Yamal, which I will get to shortly. The cutline in their abstract declares in effect that the divergence problem is not as “bad as we thought”:

Despite these large uncertainties, instrumental and tree growth estimates for the entire 20th century warming interval match each other, to a degree previously not recognized, when care is taken to preserve long-term trends in the tree-ring data. We further show that careful examination of early temperature data and calibration of proxy timeseries over the full period of overlap with instrumental data are both necessary to properly estimate 20th century longterm changes and to avoid erroneous detection of post-1960 divergence.

However, it doesn’t seem to me that this is supported very convincingly by their data analysis. They analyze the (archived) Schweingruber data to 1990-1991 plus a considerable number of recent measurements, all of which are unarchived – non-archiving seems to have become standard practice among Esper and Euro dendros. Esper et al also comment on the instrumental record, worrying about adjustments to a degree that would not be out of place in a CA thread. Here are a few excerpts.

Here is an excerpt from Esper’s Figure 3, showing the effect of different standardization methods (Hugershoff, Negative Exponential, RCS and 300-year spline) on the average of over 70 chronologies. As you can readily see, on an overall basis, there is a decline in both RW and MXD for a very large population of Siberian sites since 1940 or so. Esper’s abstract and conclusions emphasize the fact that the post-1940 decline in the RCS version (red) is somewhat less than the post-1940 decline in the Hugershoff version (purple) – and also that the 19th century RCS rise is greater than the 19th century Hugershoff rise. However, in our consideration of Yamal, the slight difference in post-1940 decline is irrelevant: once again, the large population doesn’t show the huge Yamal rise. The issue, as stated on many occasions, isn’t just the “divergence” of Briffa’s Yamal chronology from Khadyta River, but its “divergence” from growth patterns throughout western Siberia – making one wonder about possible inhomogeneity in the Yamal population, an issue that I’ll return to.

Esper specifically showed results from a “new” (and Euro-unarchived) west Siberian network, summarized in the next graphic. The “new” network ends up at a z-score in 2000 of almost exactly zero, while Briffa’s Yamal is exploring stratospheric multi-sigma deviations.

Esper Fig. 6. Updated WSIBnew tree-ring data and coherence with regional temperatures. Top panel shows the seven new MXD and eight new TRW RCS-detrended site chronologies together with their mean (WSIBnew) and the mean of all records in the WSIB clusters C1-3 (WSIB). While the latter extended only until 1990, WSIBnew reached 2000. Middle panel shows the WSIB and WSIBnew tree growth data scaled over the 1881–1990 (WSIB) and 1881–2000 (WSIBnew) periods to regional JJA temperatures. JJA and WSIB data have been decadally smoothed. Bottom panel shows the WSIBnew MXD and TRW timeseries together with JJA temperatures over the 1970–2000 period. Details on the updated WSIBnew sites, and all other tree-ring locations, are listed in supplementary Table S2. RCS, regional curve standardization; TRW, tree-ring width.

Esper also questions a variety of issues in the station histories, mentioning UHI, regional inhomogeneity in adjustment practices (see the Discussion and Conclusion for these) and GHCN adjustments. On regional adjustment to temperature records, they say:

In addition, the homogenization methodologies currently applied particularly in large-scale approaches, have difficulties in identifying and correcting for systematic biases that simultaneously affect data across larger regions (Parker, 1994; Frank et al., 2007a; Thompson et al., 2008). If we, for example, consider the substantial changes of instrumental summer temperatures that were recently applied to early station data in Europe and elsewhere (see both Frank et al., 2007a; Bohm et al., 2009, and references therein), it appears premature to solely use early temperature readings for proxy transfer and evaluation of DP in remote high latitude regions

The top panel below shows a graphic displaying GHCN adjustments of the sort that I did here a couple of years ago in connection with Hansen’s Y2K problem, emphasizing that the adjustments are as large or larger than the temperature changes being measured (a familiar CA point). In the caption to the bottom panel, he says: “Negative deviations were inverted, combined with positive values, and decadally averaged.” I don’t understand the purpose of this procedure and have had enough needles in my eyes for a while.

Esper Fig. 8 Differences between raw and adjusted (GHCN) temperature station records. Upper panel shows the single June, July, and August adjustments of all 13 Siberian stations and their mean timeseries (bold). In the lower panel the adjustments were averaged to mean JJA mean timeseries and sorted by stations in WSIB, ESIB, and NESIB. Negative deviations were inverted, combined with positive values, and decadally averaged. Ust is Ust’-Maja, Sur is Surgut, and Dud is Dudinka (see Table S3).

For our present consideration of Yamal, the evidence from the Esper networks in western Siberia is one of declining ring widths in the last half of the 20th century. Briffa’s Yamal is an exception to this general pattern – a point that is not discussed or reconciled in Briffa’s response thus far. Esper cautions in respect to RCS standardization:

It seems important to note, however, that RCS-detrended data generally contain greatest uncertainties, require large datasets, and are prone to biases caused by inhomogeneous sample collections (Esper et al., 2002, 2003a). Particularly relevant to the Siberian data analyzed here could be biases due to (i) the tendency that the oldest trees often grow most slowly (Melvin, 2004; Esper et al., 2007b; Wunder et al., 2008), and (ii) the composition of data from only living trees and relatively homogeneous age-structure (Esper et al., 2007a, 2009). The former bias is likely more relevant for TRW than MXD – because of the greater amount of variance contained by the agetrend (Schweingruber et al., 1979) – and would ultimately increase positive long-term trends in RCS chronologies.

In his response, Briffa made no effort to defend the methodology of the original Yamal chronology beyond declaring that it was done in good faith, instead moving on to argue that they can “get” a similar chronology from a somewhat larger data set, as presented last week. The most important issue – as stated here and elsewhere by Esper – is the potential “bias caused by inhomogeneous sample collections”, an issue that I’ll consider in connection with the new Yamal data in a forthcoming post.

“Negative deviations were inverted, combined with positive values, and decadally averaged.” sounds like a convoluted way of saying “average absolute value of adjustments in each decade” or “decadally averaged magnitude of adjustments”.

That assumes, of course, that “combined with positive values” means summed or added rather than some other unspecified manipulation.
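If that reading is right, the caption's procedure can be sketched in a few lines. This is only an illustration of the "decadally averaged magnitude" interpretation offered above, not Esper's actual code; the function name `decadal_mean_abs` and the sample values are made up for the example.

```python
from collections import defaultdict

def decadal_mean_abs(adjustments):
    """adjustments: dict mapping year -> (adjusted minus raw) temperature.
    Returns a dict mapping decade start year -> mean of |adjustment|."""
    by_decade = defaultdict(list)
    for year, adj in adjustments.items():
        # "Inverting" negative deviations and pooling them with the positive
        # ones amounts to taking the absolute value of every adjustment.
        by_decade[(year // 10) * 10].append(abs(adj))
    return {dec: sum(v) / len(v) for dec, v in sorted(by_decade.items())}

# Made-up adjustment values, for illustration only.
example = {1901: -0.4, 1905: 0.2, 1909: -0.6, 1912: 0.1, 1918: -0.3}
result = decadal_mean_abs(example)
```

Under this interpretation the lower panel measures the size of the adjustments and discards their sign, so a decade of large offsetting adjustments would plot high even though its signed mean is near zero.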

This appears to be yet another case where the journal’s policy, in theory at least, requires enough information be published that others can replicate the results, but that in practice this requirement is not monitored or enforced.

Here is what the Global Change Biology journal says about the Material and Methods section of a primary research paper:

4. Materials and methods
This should allow replication of all experiments described and demonstrate the validity of those experiments for the research being conducted.

The policy for supplementary material at Global Change Biology is set by the parent company, Wiley-Blackwell.

Wiley-Blackwell is able to host online approved supporting information that authors submit with their paper. Supporting information must be important, ancillary information that is relevant to the parent article but which does not or cannot appear in the printed edition of the journal.

and

Supporting information may also be displayed on an author or institutional website.

The most important issue – as stated here and elsewhere by Esper – is the potential “bias caused by inhomogeneous sample collections”, an issue that I’ll consider in connection with the new Yamal data in a forthcoming post.

Figs 3 – 7 including core counts and an esperesque reconstruction from code SteveM primarily developed. Also my post has caught the attention of Delayed Oscillator who is apparently a publishing dendroclimatologist although he doesn’t say who. Hopefully, he’ll clear up some of the issues with standardization.

Ah! So into the lion’s den, no wonder he’s so sensitive to every criticism I write. He even snipped me on his latest thread for saying that “besides these problems being drop dead obvious.” He left the part where Briffa has outlined the very same problems and everything else. He/She is nervous about this particular skeptic for sure.

—-
I’m surprised that Esper’s spline sits above his negative exponential correction in the headpost. That could happen with an exponential RCS fitting to the trees individually instead of the average so older trees don’t contribute much. One of my messing around regressions did that but it was a visually poor standardization fit.

Re: jeff id (#11), yes, if you search around the site you will find photos (1) taken during field research, when he tried to update his site from abroad and had problems. The date was Feb 08, as I recall.
The picture was of a fog-laden forest (looked to be coniferous). Some tree dude could prolly look at it. There are other bits of data that would allow me to figure out who he is, but he hasn’t pulled any junk like Eli did or like Tamino did, so he is not abusing anonymity and he gets a pass from me.

My new “mantra” questions:
What, if any, selection criteria are described in the paper? [Ideally, this is clearly stated.]
To what extent, if any, are the criteria dependent on the data (sets) fed into the analysis? [Ideally, not at all.]
To what extent are the criteria followed? [Was data left out that fit, or included that doesn’t fit?]

Steve, in figure 3 what is the difference between RCS and negative exponential? After RomanM’s recent post and its discussion, nomenclature is confusing me. I thought there were “negative exponential RCS” methods and “other (later) RCS” methods. Is there something known as “negative exponential” that is not an RCS method? Thanks.

Re: NW (#12),
Article and SI are uploaded to http://www.climateaudit.org/pdf/tree . My interpretation is that “EXP” is “conventional” standardizing (i.e. tree by tree but as stated here, prohibiting positive slopes). Prior to Briffa, trees were standardized one by one; the problem with this method is that you don’t get centennial variability. Whether the centennial variability of Briffa’s one-size-fits-all has relevant meaning is of course a different issue.

We applied four detrending methods, RCS, HUG, EXP, and SPL, to the single TRW and MXD measurement series by calculating ratios (residuals for HUG; Briffa et al. 1998) between the raw data and fitted growth curves. For each method, detrended series were averaged per site using a robust bi-weight mean and the variance of the mean chronologies stabilized for changes in sample replication and interseries correlation (Frank et al., 2007). Chronologies were truncated over the 1801-1990 common period (1801-2000 for the WSIB update), and subsequently averaged to form the mean cluster chronologies (C1, C2, …, C7). RCS was applied on a site-by-site basis (Esper et al., 2007), and the mean of all age-aligned data (i.e., the Regional Curve) smoothed using a 10-year spline (details in Esper et al., 2003). HUG included growth curve functions with positive slopes; EXP excluded such functions but utilized the long-term mean instead. SPL included the application of a cubic smoothing spline with a 50% frequency-response cutoff for 300-year waveforms (details in Cook and Peters, 1981).
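To make the EXP description concrete, here is a rough sketch of tree-by-tree negative exponential detrending with the fallback to the series mean when only a rising curve would fit. It is not ARSTAN's algorithm, R's nls, or Esper's code. The curve a*exp(-b*t) + k is fitted by scanning a grid of b values and solving for a and k by linear least squares at each one, a simple stand-in chosen so the example needs no starting values. The names `fit_neg_exp`, `detrend_exp` and `b_grid`, the grid range, and the rejection of negative asymptotes are all assumptions of this sketch.

```python
import math

def fit_neg_exp(widths, b_grid=None):
    """Fit a*exp(-b*t) + k to one series. Returns (a, b, k), or None if
    no declining curve with a non-negative asymptote is found."""
    n = len(widths)
    if b_grid is None:
        b_grid = [0.001 * i for i in range(1, 501)]  # assumed search range
    best = None
    sy = sum(widths)
    for b in b_grid:
        e = [math.exp(-b * t) for t in range(n)]
        se = sum(e)
        see = sum(x * x for x in e)
        sey = sum(x * y for x, y in zip(e, widths))
        det = n * see - se * se
        if abs(det) < 1e-12:
            continue
        # For fixed b the model is linear in (a, k): solve the normal equations.
        a = (n * sey - se * sy) / det
        k = (sy - a * se) / n
        if a <= 0 or k < 0:
            # a <= 0 means a rising curve, which EXP excludes; the k < 0
            # check is a guard added for this sketch only.
            continue
        sse = sum((y - (a * x + k)) ** 2 for x, y in zip(e, widths))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    return None if best is None else best[1:]

def detrend_exp(widths):
    """Return ratio indices: raw width divided by the fitted growth curve,
    or by the long-term mean when no declining curve fits."""
    fit = fit_neg_exp(widths)
    if fit is None:
        mean = sum(widths) / len(widths)
        curve = [mean] * len(widths)
    else:
        a, b, k = fit
        curve = [a * math.exp(-b * t) + k for t in range(len(widths))]
    return [w / c for w, c in zip(widths, curve)]

# Synthetic series for illustration: one declining, one rising.
declining = [2.0 * math.exp(-0.05 * t) + 0.5 for t in range(100)]
rising = [1.0 + 0.01 * t for t in range(50)]
idx_declining = detrend_exp(declining)
idx_rising = detrend_exp(rising)
```

Because each tree gets its own curve, any slow trend shared across trees is largely absorbed into the fits, which is the loss of centennial variability mentioned above.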

Re: Steve McIntyre (#14),
Re: romanm (#15),
Hi Steve and romanm,
I’ve suggested this before and here I go again. All these standardization approaches (the most common negative exponential, but also Hugershoff, RCS, and many more) are easily performed using the ARSTAN program freely available online through the usual dendro sites (also mathematical expressions and plots of all or a summary of the curves used are included in the program).

I know you prefer to perform these analyses using your own scripts but I suggest you at least try these options (with these data, if they become available, or with any other dataset) so you can be sure you are comparing “apples and apples” and see if your results match the original results/figures being discussed in your posts. In this case, I can assure you that all the options used by Esper can be performed/replicated in 5 minutes using ARSTAN.

Some concepts, e.g. standardization options in this new paper (and in many other dendro papers) may seem “obscure” but they are really not that complex: they have been discussed many times in the dendro literature.

[Steve: just because a reader – even Roman – doesn’t know the dendro practice, please do not assume that I don’t. I’ve carefully analysed what ARSTAN does and doesn’t do. And I’ve carefully benchmarked against ARSTAN]

I think you would make much more progress if you used, and became more deeply familiar with, the many tools that dendrochronologists have created and placed freely online for analyzing tree-ring data. For example, I remember reading in some past replies/comments to Rob Wilson’s notes that COFECHA was the program used to standardize some tree-ring data. This is a clear example where you are confusing the purpose of COFECHA with the purpose of ARSTAN in the process of building a tree-ring chronology. COFECHA is used to check the dating of raw, undated tree-ring measurements whereas ARSTAN is used to standardize and average the properly dated series into a site chronology.
Steve: Again this is a needlessly condescending remark as it applies to me. CA readers do not have the same experience as I do.

I am more interested in fully understanding the procedures used than I am in merely duplicating a single chronology. Running ARSTAN may not provide the necessary information for evaluating exactly how the procedure works and will almost certainly not be easily changeable for the purpose of examining the effects of altering the procedures.

This sort of procedural analysis has been something I have done regularly in my professional capacity. A simple description (and these ARE mathematically simple procedures) is all that I wish to start from.

but I suggest you at least try these options (with these data, if they become available, or with any other dataset) so you can be sure you are comparing “apples and apples” and see if your results match the original results/figures being discussed in your posts. In this case, I can assure you that all the options used by Esper can be performed/replicated in 5 minutes using ARSTAN.

The reconciliation presented some interesting numerical analysis issues which do not resolve clearly in favor of ARSTAN even given their own assumptions. For some trees in the cana036 dataset (interestingly also used in Melvin 2007), I got exact replication using nls. In some cases, ARSTAN quit too soon and I could get a neg exp fit that ARSTAN missed. In some cases, whether or not you got a fit depended on your starting point. And most remarkably, the first core fit by ARSTAN was always fit wrong.

Another advantage of using R is that I can write routines to do entire networks in a few minutes. ARSTAN requires manual handling of the data. It’s 25 year old programming for the most part and very cumbersome for the simple tasks undertaken.

Plus, like Roman, my interest is statistical. What statistical interpretation can be placed on dendro recipes? As I’ve mentioned elsewhere, I view their methods as “artesanal” – not “wrong”, but done for practical purposes with little understanding of statistical practice. I don’t say this to deprecate things, because the problems are interesting and difficult when you combine crossed random effects, heteroskedasticity, nonlinearity, non-normality, autocorrelation,…

Between your two posts I did a google search on the site and the first one I went to (which may have been the #1 hit), was the one in post 25. I like the way you start it:

I’ve already received my first condescending comments from the dendro world about the mysteries of standardization. Just to pre-empt some further pontification presuming that I know nothing about these mysteries, I’m posting up some notes that I wrote in 2004 on standardization – which was what I would have been working on had people just accepted the defects in the Mannian multiproxy articles at the time.

The trouble with having posted several thousand blogs is that people who are not regulars don’t know what’s been already discussed here. Hence the value of searching before complaining.

The curve created from the mean of ring width for each ring age is smoothed by using a suitable mathematical smoothing function (Briffa et al. 1992; Esper et al. 2003; Melvin et al. 2007) to create smoothly varying RCS curve values for each ring age.

Thus a single curve is calculated from the raw data with no prior manipulation of the individual trees ring sequences. That curve can be a spline or the negative exponential (discussed in the post One Size Fits All) among others.
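A minimal sketch of that single-curve idea follows, under simplifying assumptions: every series starts at ring age 0 (no pith offset), and a centered moving average stands in for the spline or negative exponential smoothing used in the literature. The function names and the `half_window` parameter are hypothetical, chosen only for this illustration.

```python
def regional_curve(trees):
    """trees: list of ring-width lists indexed by ring age (0 = first ring).
    Returns the mean ring width at each ring age across all trees."""
    max_age = max(len(t) for t in trees)
    curve = []
    for age in range(max_age):
        vals = [t[age] for t in trees if len(t) > age]
        curve.append(sum(vals) / len(vals))
    return curve

def smooth(curve, half_window=2):
    """Centered moving average; a stand-in for the spline smoothing."""
    out = []
    for i in range(len(curve)):
        lo = max(0, i - half_window)
        hi = min(len(curve), i + half_window + 1)
        out.append(sum(curve[lo:hi]) / (hi - lo))
    return out

def rcs_chronology(trees, start_years, half_window=2):
    """Divide every ring by the one regional curve value for its age,
    then average the resulting indices by calendar year."""
    rc = smooth(regional_curve(trees), half_window)
    by_year = {}
    for widths, start in zip(trees, start_years):
        for age, w in enumerate(widths):
            by_year.setdefault(start + age, []).append(w / rc[age])
    return {yr: sum(v) / len(v) for yr, v in sorted(by_year.items())}

# Two identical made-up trees starting in different years.
trees = [[4.0, 3.0, 2.0, 2.0], [4.0, 3.0, 2.0, 2.0]]
chron = rcs_chronology(trees, start_years=[1900, 1902], half_window=0)
```

Since all trees are divided by the same curve, a group of trees that grew faster than average for their age stays above 1 in the chronology. That is what lets RCS retain long-term variability, and also why the makeup of the sample matters so much.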

The same growth functions can be applied on individual trees or on groups of trees from different sites and this appears to be what the other names mentioned refer to. I haven’t been able to track down an exact definition of what constitutes the Hugershoff method yet.

This result is important. It bears directly on Briffa’s recent argument (and the one parroted by dabbler Deep Climate) to the effect that his Yamal chronology is not the anomaly; it’s anything that McIntyre generates that is the anomaly. We’ll see about that – if these data can ever be wrested from Esper.

Re: bender (#17),
The Schweingruber data is available. Instead of listing the series with known IDs, Esper listed sites in photo format with lat long information – which may or may not enable the Schwein sites to be tied down. As to the “new” data, I’m sure that it will be the usual runaround if I pursue it.

Maybe it’s time to experiment with Aarhus Convention FOIs against European dendros.

“Negative deviations were inverted, combined with positive values, and decadally averaged.” sounds like a convoluted way of saying “average absolute value of adjustments in each decade” or “decadally averaged magnitude of adjustments”.

Is that a fancy way of saying that any negative signal was isolated from the rest of the signal, inverted, and added back in? That cannot be what it sounds like.