The "Full" Network

The “full” network isn’t.

Mann et al 2008 describe a “full” network consisting of 1209 proxies:

We made use of a multiple proxy (“multiproxy”) database consisting of a diverse (1,209) set of annually (1,158) and decadally (51) resolved proxy series … Reconstructions were performed based on both the “full” proxy data network and on a “screened” network (Table S1)

The locations of the “full” network are illustrated in their Figure 1, captioned “spatial distribution of full (A) and screened (B)” network. Their Figure S7 compares “long-term CPS NH land (a) and EIV NH land plus ocean (full global proxy network) (b) with and without using tree-ring data”. Their Figure S8 compares “long-term CPS NH land (a) and EIV NH land plus ocean (b) reconstructions (full global proxy network)”. Their Figure S11 shows “plots of the NH CPS Land reconstruction (A) and EIV Land plus ocean (full global proxy network)”.

Table S1 shows breakdown counts by dendro and non-dendro, by annual-resolved and total, by NH, SH and Global. The top portion of Table S1 is excerpted here:

The problem is that their actual calculations almost certainly did not use the network with the count statistics shown in Table S1, but a truncated version of it: their “full” network uses only 663 (560 NH) of the 1,209 sites (1,036 NH). Jean S has run the gridproxy.m program using the FULL option, so we have been able to “prove” this for this part of the program (CPS) through an actual run of the program. (I’m not sure that the EIV portion of the program can be run with the present code.)

First, here is a table comparing counts to the top-right (NH all) counts in Table S1. The column headed “Matlab” shows the counts obtained from the gridproxy.m program, while the column headed “Reported” shows the corresponding column from Table S1. I’ll explain the calculation of the other three columns, which I calculated from first principles, below. For now, you can see that my NH Total in the third column matches the counts from gridproxy.m exactly in nearly every case and in no case differs by more than 1.

NH Dendro   NH Other   NH Total   Matlab   Reported
      421        139        560      560       1036
      297        129        426      427        700
      162        124        286      287        408
       90        120        210      211        277
       62         39        101      102        151
       28         34         62       62        102
       19         30         49       49         77
       11         30         41       41         63
        6         24         30       30         46
        4         21         25       25         40
        4         20         24       24         37
        4         20         24       24         34
        4         19         23       23         33
        4         18         22       22         28
        4         18         22       22         28
        3         18         21       21         27
        2         17         19       19         24

An Unreported Screening Procedure on the Full Network
The difference appears to result from an unreported screening procedure on the “full” network.

Mann’s SI does describe a screening procedure on his “screened” network:

Screening Procedure. To pass screening, a series was required to exhibit a statistically significant (P < 0.10) correlation with either one of the two closest instrumental surface temperature grid points over the calibration interval (although see discussion below about the influence of temporal autocorrelation, which reduces the effective criterion to roughly P < 0.13 in the case of correlations at the annual time scales)

The implementation of a procedure along these lines can be discerned in gridproxy.m in the case (confusingly) entitled “WHOLE”, which actually means screened. At one stage of the program, correlation hurdles are defined for the various proxy classes (the proxy class codes set the hurdle). “Annual” and “decadally” resolved proxies have different benchmarks. Documentary series and speleothems (the 5000 and 6000 classes) have somewhat different benchmarks, and corals (the 7000 class) are checked for negative correlation. The relevant lines are:
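The Matlab excerpt has not survived here, but the screening logic described in the surrounding text can be paraphrased in a few lines. This is my own Python sketch, not Mann's code: the class codes and the 0.106 hurdle come from the post; the function name and structure are invented for illustration.

```python
# Hypothetical paraphrase of the hurdle-testing step in gridproxy.m.
# Class codes per the post: 9000 tree-ring width, 8000 ice-core O18,
# 7500 MXD, 7000 corals, 6000 speleothems, 5000 documentary.
ONE_SIDED_POSITIVE = {9000, 8000, 7500, 4000, 3000, 2000}  # a priori positive sign
ONE_SIDED_NEGATIVE = {7000}                                # corals: a priori negative
TWO_SIDED = {6000, 5000}                                   # sign not specified

def passes_screen(r, proxy_class, hurdle=0.106):
    """Apply the one-sided/two-sided screen; r is the observed correlation
    with the nearest instrumental grid cell, hurdle is 'corra' (0.106 for
    annually resolved proxies, per the post)."""
    if proxy_class in ONE_SIDED_POSITIVE:
        return r > hurdle            # negative correlations always fail
    if proxy_class in ONE_SIDED_NEGATIVE:
        return r < -hurdle
    if proxy_class in TWO_SIDED:
        return abs(r) > hurdle
    return False
```

For example, a tree-ring series (class 9000) with r = -0.45 fails the screen even though |r| clears the hurdle comfortably; that is the mechanism that cuts the NH count.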

Later in the program, they test the observed correlation for each proxy against the relevant benchmark (their code is very inelegant; they speak Matlab with a heavy Fortran accent). For the “screened” network, corra is 0.106.

If a series has a negative correlation to temperature (as about half of the tree ring series do), the test shown above will lead to the exclusion of this series. So this test either intentionally or unintentionally eliminates all the series with negative correlations to gridcell temperature – thus the reduction in NH count from 1036 to 560. Jean S got this count from running gridproxy.m and I got this number by independently comparing reported correlations to a benchmark of 0. So while it’s not impossible that I’ve misinterpreted something here, until we see some alternate explanation, I think that this interpretation should be regarded as valid.

Now this creates problems – both with the statistical significance of what was reported and with the accuracy of the methodological description in the PNAS article, both familiar themes in this corpus.

We’ve discussed on many occasions that you can “get” a HS merely by picking upward-trending series from networks of red noise. (David Stockwell had a good note on this phenomenon on his blog a couple of years ago; my first experiments of this type were on the cherry picks in the original Jacoby network.) Since gridcell temperature series from the mid-19th century to the late 20th century by and large have an upward trend, this test is equivalent to a simple pick operation. Whatever the merits of effectively screening the data set for upward-trending data, this affects statistical benchmarks, since the appropriate null is obtained by applying the same sort of operation to red-noise networks.
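The pick effect is easy to reproduce. Below is a toy Python simulation of my own devising (not Mann's code): the 0.106 hurdle is from the post, while the series count, AR(1) coefficient, and calibration length are arbitrary assumptions. Screening pure red noise for correlation with an upward-trending "calibration" series produces a composite with an uptick at the modern end, while the unscreened composite stays flat.

```python
import random
random.seed(0)

def red_noise(n, rho=0.6):
    """AR(1) 'red noise' with autoregression coefficient rho."""
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + random.gauss(0, 1)
        out.append(x)
    return out

N_SERIES, N_YEARS, CAL = 1000, 600, 100      # last CAL steps = calibration period
trend = [i / CAL for i in range(CAL)]        # stand-in for rising gridcell temps

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db)

series = [red_noise(N_YEARS) for _ in range(N_SERIES)]
picked = [s for s in series if corr(s[-CAL:], trend) > 0.106]

def mean_recon(group):
    return [sum(s[t] for s in group) / len(group) for t in range(N_YEARS)]

full, screened = mean_recon(series), mean_recon(picked)

def uptick(recon):
    """Modern-end elevation relative to the pre-calibration mean."""
    pre = sum(recon[:-CAL]) / (N_YEARS - CAL)
    end = sum(recon[-20:]) / 20
    return end - pre
```

The screened composite shows a clear modern uptick (a hockey-stick blade) even though every input series is pure noise, which is why the significance benchmarks must be recomputed under a null that applies the same screen.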

Secondly, whatever one may think of the Mannian PC algorithm (and defenders are reduced to a pretty hard core right now), few people defend the non-reporting of their modification of the PC algorithm or their failure to assess the statistical significance and properties of those changes (quite aside from whether they can “get” a HS using some other method). We’re into something pretty similar here: again there seems to be an unreported screening operation on the “full” network. The value of archiving code is once again demonstrated, as the unreported operation can only be discerned by parsing code. Because Mann et al commendably archived code concurrent with publication, it hasn’t taken years to identify the issue. (The precise problem with Mannian PCs was likewise identified only when a little bit of code was inadvertently left in a directory identified for the public after MM03.) So code really does help pin down problems.

If the period for testing for correlation with temperature were something like 500 years, then maybe those with no correlation are “bad” proxies without the screen being potentially spurious (a “mining for hockey sticks” exercise), but even then negative correlations would be just as good as positive (just be careful with the sign). Had they used r2, they wouldn’t have made the error of thinking that a negative correlation means no predictive power.

I guess I am bemused by the decision to use r > |.106| as the criterion for including a proxy series. Why not r > |.206|? An r of |.106| corresponds to an r2 of about 1%, which means that there are innumerable other factors – some likely to be more potent – contributing to the proxy metric. Without specifying these, how does one go about reconstructing the past temperature record without error bars that make the whole exercise problematic? I mean, if you had two thermometers in the same grid cell that calibrated at r=.10, would you use either of them? Or do I have the wrong end of the stick here?

#5. Also recall the pick two procedure for their calculation of correlations.

The other aspect to this is reverse engineering. It’s hard to rationally justify the purposes of many of these methods, but they all end up affecting weights. Some preliminary calcs by Jean S indicate that the weights are highly concentrated on a few proxies. Lucy and Charlie Brown one more time. So the issue will be whether these few proxies (bristlecones, Finnish sediments with Mann orientation) are wonder thermometers. Same old same old.

I tried sending you an email about this, but it seems to have become lost. There are quite a few dead links to climate2003.com on the CA homepage.

Thought you might like to know for cleanliness purposes.
Steve: That’s an old website. I got tired of paying for it when I had space at CA but need to spend some time posting the pages again under a new alter ego. So much to do, so little time.

If you want to prove that the mean climate in the past 1000 years was without structure, picking proxies with an r=.1 with climate is ideal. Take a bunch of proxies that each explain barely more than 1% of the variation in temperature, average them, and you get 0 for each point in time… a nice flat hockey stick handle.

#7. Hard to say. I don’t think that anyone has got the next portion to work yet.

There are any number of calculations which can be used to show the pointlessness of certain operations. At some point, you’d think that people in the trade have to take some responsibility for underwriting subprime proxies.

I have to mention that it is not just the picking process that results in creation of the hockey stick. It is also the amplification process: an ideal scale and offset match the proxies to recent temperatures to the best possible extent, which deamplifies the historic data.

A simple picking process amplifies current data while averaging the remainder. If there is a real historic signal, it is on the correct scale. I demonstrated this effect on this post. The data was not amplified, just picked.

It is very important that the discussion doesn’t become “why not use better or higher correlation values?” Any noise created by processes other than the one you are looking for will cause this effect. Sorting a set of data with a higher correlation hurdle gives the same-shaped result as a low one, only with a noisier graph because fewer series pass calibration.

Jeff, here are some posts by David Stockwell along similar lines: http://landshape.org/enm/2006/03/ (the pictures seem to be lost; I’ll check with him about this). This was in part pursuant to passing remarks here. I’ll re-read your post and try to understand the interaction.

An elementary exercise that grad students have to go through, before collecting data, is to check whether the planned sample size, given the expected variability, will enable them to detect an effect of a given size (a power analysis). With a correlation of .1 as a cutoff, it is unlikely that any signal (such as the MWP) would be detectable, no matter how many proxies are tossed in the hat.
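To put a rough number on that, here is a standard power calculation via the Fisher z-transform (my illustration, not from the post; the critical values are the usual normal quantiles for a two-sided 5% test at 80% power):

```python
import math

def n_required(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r,
    using the Fisher z approximation n = ((z_a + z_b)/atanh(r))^2 + 3."""
    z_alpha, z_beta = 1.959964, 0.841621   # standard normal quantiles
    zr = math.atanh(r)                     # Fisher z of the target correlation
    return math.ceil(((z_alpha + z_beta) / zr) ** 2 + 3)
```

For r = 0.1 this comes out to roughly 780 observations, far more than a ~150-year calibration/verification interval supplies; for r = 0.3 it drops to under 100.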

So this test either intentionally or unintentionally eliminates all the series with negative correlations to gridcell temperature

It is intentional. They state in the main text:

Where the sign of the correlation could a priori be specified (positive for tree-ring data, ice-core oxygen isotopes, lake sediments, and historical documents, and negative for coral oxygen-isotope records), a one-sided significance criterion was used. Otherwise, a two-sided significance criterion was used.

So the first if (and the corresponding one for the “decadal” proxies later) tests for positive correlation. This includes categories 9000 (tree-ring width), 8000 (ice-core oxygen isotopes? – yes), 7500 (MXD), 4000 (lake sediments), 3000 (composite??? – SM: 3000 – a few temperature reconstructions from tree rings unaccountably not classed with dendro; 3001 – ocean sediments) and 2000 (Luterbacher). It is worth mentioning that the four Lake Korttajärvi series are in category 4000.

The second if tests for negative correlation and is designed only for category 7000 (coral oxygen-isotope records?).

Now the third if, for categories 6000 (speleothems) and 5000 (what is this category??? [SM – documentary; 5000 mostly precipitation, 5001 mostly Chinese], it sure looks like the historical documents category… maybe 5000 and 4000 should be switched??), is the one for series with an a priori unknown sign of correlation. The interesting thing is the line right after:

temp=x(kk:kkk,i+1)*sign(z(ia,i));

So although the sign of the correlation is not known, it is known a priori that the series with negative correlation should be flipped!
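In other words, for the "two-sided" classes the screen effectively tests |r| and then reverses the orientation of any negatively correlated series. A Python paraphrase of the quoted Matlab line (the function name and structure are my own, for illustration only):

```python
def screen_and_orient(series, r, hurdle=0.106):
    """Two-sided screen plus the sign-flip in
    temp = x(kk:kkk, i+1) * sign(z(ia, i)):
    a negative correlation both passes and gets its orientation reversed."""
    if abs(r) <= hurdle:
        return None                        # fails the two-sided screen
    sign = 1.0 if r >= 0 else -1.0
    return [sign * x for x in series]      # flip negatively correlated series
```

So a series that correlates at r = -0.3 is retained and flipped, even though the sign of its response was supposedly not known a priori.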

Where the sign of the correlation could a priori be specified (positive for tree-ring data, ice-core oxygen isotopes, lake sediments, and historical documents, and negative for coral oxygen-isotope records), a one-sided significance criterion was used. Otherwise, a two-sided significance criterion was used.

Steve, does this mean they expected a positive correlation for the Lake Korttajarvi sediments you mentioned in your “It’s Saturday Night Live” post? Are lake sediment studies normally positively correlated and the Korttajarvi sediments involved a different kind of data collection procedure which actually gives a negative correlation?

Steve: I can’t imagine that all the possible lake sediment parameters are known a priori to be positive.

I’ve made a few inline comments on #15. In my post, I said that 5000,6000 were speleothems and corals. Brain cramp. Corals are 7000 series, 6000 is speleothem, 5000 documentary.

I’m still not convinced that they intentionally shrank the “full” network without reporting the screen and shrink. While their stats and commentary on the “full” network are incorrect, I suspect that this particular screen was inadvertent; otherwise, they would be making an intentional misrepresentation – something that I’m reluctant to conclude on these facts.

I’m sure that you’ve noticed that they use Mannian smoothing – wouldn’t it be nice to encounter one method that was used in conventional literature?

The function lowpassmin cycles through end point padding alternatives. I’ve experimented with 10-point Butterworth filters using the R function from the signal package and noticed very serious end point ringing for the Socotra and Dongge series. I’m not used to Butterworth filters and don’t want to over-interpret this, but the effect is quite large on a first pass.

Butterworth basically just means “no ripple”. i.e. the frequency response falls off from a shelf into a nice smooth curve with no wiggling around. There are filters with steeper falloff but they have ripple in the “shelf” portion in exchange. There are also filters with less phase shift but they also have ripple.

Regarding the coding style: I think what Soronel was trying to say is that a good programmer uses named constants to make the code readable. In C you could do:

Electronics texts are not very helpful on the impact of these filters on noisy stochastic series, which are a different design issue.

The issue that I’m looking at is end effects, where the Butterworth filter seems to produce some spurious ringing at the ends of some series, e.g. Dongge. But I’m also not used to these filters, and there may be some differences between the R and Matlab implementations that are producing an artifact. Jean S has done a Matlab calc on Dongge but I’m stuck for now on transposing it to R.

Given that the smoothing is probably not a good idea in the first place (see Matt Briggs), spending time on potential end point issues resulting from Mannian smoothing is a very unedifying project.

The issue that I’m looking at is end effects, where the Butterworth filter seems to produce some spurious ringing at the ends of some series, e.g. Dongge.

I assume, from this comment, that you are interested in the time domain response of the Butterworth filter and not the frequency domain that previous posters have described. It has been a long time since my circuit lectures, but your comment seems to be describing the impulse response of the filter. Perhaps more knowledgeable people can comment, but this is an inherent part of the filter design. The ringing will change depending on the shape of the signal: a signal with an abrupt cutoff will have a perceptible ring. This can be heard on the audible ringing tone of a telephone system. The tone will be low-pass filtered after D-to-A conversion, and depending on the state of the signal when it is cut off, a pop (ringing) may be heard. So the signal sounds like ring-pop.

From an earth sciences perspective, a more important, underlying question is whether signal processing/filtering methods developed for electronics are, in fact, applicable to climatological data and analysis, both for real data and for proxies.

Also recall the pick two procedure for their calculation of correlations.

Last weekend I tried implementing the Mannian Pick Two procedure on my golf game: the course wasn’t busy, so I tried teeing off twice on each hole and playing the better of the two drives. This gave me a much better score than the stuffy conventional rules. The PGA should take note!

Can anyone give an overview of the “5000 – Documentary” proxies? Because the central European historians are pretty emphatic about the Medieval Warm Period, and it would be interesting to see if any of that information is in here in the first place.

From tracking wheat harvests through birthrates and taxes, there’s a fair chunk of records.

Mr. McIntyre : I think Don is right and you are indeed dealing with the impulse response properties of the filter.

I was just trying to help by describing what the term “Butterworth” actually means. What it describes is a frequency domain property (maximal passband flatness and no ripple). I realize this isn’t very helpful statistically but thought it would be good to at least know what the term means.

I am fairly sure that you are both right that a Butterworth filter will exhibit significant ringing after a step change. It’s arguable what the correct method is for using a Butterworth on a series which has finite length. The only 100% valid approach is to truncate the smoothed result by the filter width, so that all your output values are only involving valid input sample values.

If you’re not going to do that you’d have to, at the very least, replicate the end values to avoid injecting high frequencies into the filter near the ends. Step changes are not what this type of filter is designed to handle at all. But even that isn’t really a very clever thing to do, since you’re affecting the trend by having a dramatic loss of all frequency components of the signal near the ends. Neither is linear interpolation very clever, that fixes the first derivative but not the second derivative, etc.

So really where does it stop? You’d basically need a way to generate data at the ends which has the same spectral properties of the data itself but won’t affect the trend and append that. I’m still not sure that’s a totally valid solution but it’s the best I can come up with. I’d truncate it.

They go on to say that if you want minimal ringing after a step change you should use a Thomson filter. Maybe that’s the better way to go in this case. So if you’re going to do your own study, you might want to look into that. Won’t help for pre-existing studies using Butterworth though.
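The start-up transient under discussion is easy to reproduce without any toolbox. Below is a minimal pure-Python sketch of my own (a second-order low-pass designed via the bilinear transform, not the 10-point filter or lowpassmin from the Mann code); filtering a step from zero initial conditions shows the same kind of ringing that padding a series with a step injects at its ends:

```python
import math

def butter2_lowpass(fc):
    """Second-order Butterworth low-pass coefficients via the bilinear
    transform; fc is the cutoff in cycles per sample (e.g. 0.05 for a
    20-year low-pass on annual data)."""
    K = math.tan(math.pi * fc)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * K + K * K)
    b = [K * K * norm, 2 * K * K * norm, K * K * norm]
    a = [2 * (K * K - 1.0) * norm, (1.0 - math.sqrt(2.0) * K + K * K) * norm]
    return b, a

def filt(b, a, x):
    """Direct-form difference equation, zero initial conditions."""
    y = []
    for n, xn in enumerate(x):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(b[0] * xn + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2)
    return y

b, a = butter2_lowpass(0.05)
out = filt(b, a, [1.0] * 400)   # step input seen from zero initial conditions
# The output overshoots 1.0 before settling (ringing), even though the
# filter's DC gain is exactly 1 -- the padding-dependent end effect in miniature.
```

Truncating by the filter width, as suggested above, simply discards the samples where this transient dominates.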

I’m amused by the topic of filter design having cropped up. It’s many years since I did any of this kind of thing and I never really understood the electronics. But I knew someone who did. His standard technique for testing analogue filters was to put a square wave into the filter and look at the output on a scope. This lets you see the impulse response. A Butterworth filter would, if I recall, ring a bit: you’d get a high frequency spike that then decayed very quickly. Bessel filters wouldn’t; they were, in effect, critically damped.

I’m more than a little bit amazed to find this coming up in a discussion on statistics.

But I’m having real trouble imagining what applying such a filter to this kind of data means. If one believed that the climate was a collection of overlapping cycles of different lengths – then this technique could be used to remove the shortest cycles.

But proxies aren’t a continuous signal. They are a series of data points, and the variation between successive data points could be quite large – not dissimilar to the change at an end point. If you are getting ringing at an endpoint, are you also getting ringing within the dataset? This rather depends on the frequency of sampling (Nyquist?) and also on what smoothing might already have been applied.

Steve: Take a look at the Dongge cave smoothing on the new thread. Is this the sort of effect that you’ve noticed:

I can say from my experience in a different field (audio engineering) that Butterworth filters are commonly used as low- or bandpass filters when “treating” noisy analogue sound recordings, because of their nice flat response in the frequency domain. In my own audio restoration work I am *not* using them any more however, as I find the VERY noticeable impulse distortion objectionable – a steep BW filter will turn a sharp audio impulse (no matter if caused by a scratched record or by a percussive instrument) into a declining sinusoidal wave of perceptible length, about 5 to 10 times as long as the original impulse, audible as a “ringing” noise like from a dampened plucked string. Definitely NOT what you’d want to use if – like with the temperature curves discussed here – you’d need to preserve the curve SHAPE rather than (a certain part of) its spectral properties! By now, there are linear-phase digital lowpass and bandpass filters available for audio and video use that avoid these defects. Sorry I have no idea how exactly these are implemented mathematically, but IMHO this would be a field where to look for better filtering/smoothing methods.

I’m starting a thread on Butterworth filters. Given that frequency fidelity is not an issue in these reconstructions (nobody argues for stable underlying cycles), why, as others observe, use this particular method? Anyway, please take this to the other thread.