Ryan’s Tiles

Ryan O has produced a very interesting series of Antarctic tiles by calculating Steigian trends under various settings of retained AVHRR principal components and retained Truncated Total Least Squares eigenvectors (Schneider’s “regpar”). The figure below collates the trend tiles provided by Ryan in a previous comment, arranged roughly in order of increasing retained AVHRR PCs from top to bottom and increasing retained TTLS eigenvectors from left to right. Obviously, in terms of any putative regional reconstruction, the results are totally unstable under what de Leeuw would describe as “uninteresting” variations of regpar and retained PCs.

I want to review two things in today’s note. First, the instability reminds me a lot of a diagram in Bürger and Cubasch GRL 2005, which built on our prior results. There’s something remarkable about the Bürger and Cubasch 2005 presentation that we’ve not discussed before. Second, I thought that it would be worthwhile to review what Steig actually said about fixing on PC=3 and regpar=3, in light of this diagram. We’ve touched on this before, but only in the context of varying regpar, not the joint variation of retained PCs and regpar.

Figure 1. Collation of Ryan O Trend Tiles. I caution readers that I haven’t verified these results. However, Ryan has built on my porting to R of the Schneider RegEM-TTLS algorithm and has placed relevant code online, in keeping with the open source analysis that we’ve all been conducting during this project.
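As an aside for readers who want to experiment, the kind of sensitivity shown in the tiles is easy to reproduce in miniature. The sketch below is my own toy example in Python, not Ryan’s R code; the field, its dimensions and the retained-PC values are arbitrary. It computes the trend of a spatial mean after truncating a data matrix to k principal components, and lets you watch the trend move as k changes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy spatio-temporal field: 300 months x 50 grid cells, a weak common
# trend plus independent noise.
t = np.arange(300)
field = 0.001 * t[:, None] + rng.normal(size=(300, 50))

def trend_after_truncation(X, k):
    """Slope (per time step) of the spatial mean after keeping k PCs."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]     # rank-k reconstruction
    return np.polyfit(np.arange(len(Xk)), Xk.mean(axis=1), 1)[0]

# Sweep the number of retained PCs, analogous to one axis of the tiles.
trends = {k: trend_after_truncation(field, k) for k in (1, 3, 7, 20)}
```

At full rank the truncation does nothing and the trend of the spatial mean is recovered exactly; at low k the answer depends on which patterns happen to be retained, which is the essence of the instability in the tiles.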

Bürger and Cubasch 2005
Buried in the Supplementary Information to Bürger and Cubasch 2005 is the following graphic which shows high early 15th century values under some “flavors” of MBH98 parameters – “flavors” corresponding to parameter variations with reduced bristlecone weights, which correspond to similar diagrams in MM05b (EE) especially.

Bürger and Cubasch SI Figure 2. SI readme says: This directory contains 3 additional Figures showing …, (2) the analysis for the MBH98-step for AD 1600 [ a different step than the AD1400 step discussed in MM05a,b] …[Figure] 1600.eps [shows] the 32 variants from combining criteria 1-5 (grey, with CNT=0), distinguished by worse (light grey) or better (dark grey) performance than the MBH98-analogue MBH (10011, black). Note the remarkable spread in the early 16th and late 19th century. [my bold].

This figure is not only not presented in the article itself; it is not even referred to in the running text, which refers to the Supplementary Information only as follows:

Figure 1 shows the 64 variants of reconstructed millennial NHT as simulated by the regression flavors. Their spread about MBH is immense, especially around the years 1450, 1650, and 1850. No a priori, purely theoretical argument allows us to select one out of the 64 as being the “true” reconstruction. One would therefore check the calibration performance, e.g. in terms of the reduction of error (RE) statistic. But even when confined to variants better than MBH a remarkable spread remains; the best variant, with an RE of 79% (101001; see supplementary material), is, strangely, the variant that most strongly deviates from MBH.

Bürger and Cubasch Figure 1 is shown below. While it is somewhat alarming for anyone seeking “robustness” in the MBH quagmire, they refrained from including or even referencing the diagram that would be perceived as giving fairly direct support to our work. I don’t blame Gerd Bürger for this at all; he cited our articles and has always discussed them fairly. In 2005, the mood was such that Zorita and von Storch felt that their ability to get their 2005 Science reply to Wahl and Ammann through reviewers would be compromised if they cited us in connection with bristlecones and MBH, and so they discussed the issue without citing us (Zorita apologized afterwards), even though we were obviously associated with the issue and they were well aware of this. In the Bürger and Cubasch case, the diagram was buried in the SI. (We have obviously been aware of this diagram and have used it from time to time, including in our NAS presentation.)

Bürger and Cubasch 2005 Figure 1.

I apologize for the digression, but I think that there are some useful parallels between the non-robustness observed in Bürger and Cubasch 2005 and in Ryan’s tiles. The reason for such instability in the MBH network was the inconsistency between proxies – an issue that we referred to recently in our PNAS Comment on Mann et al 2008, where we cited Brown and Sundberg’s calibration approach to inconsistency – something that I’ll return to in connection with Steig.

Regpar and PC=k in Steig et al 2009
On earlier occasions, the two Jeffs, Ryan and I have all remarked on the instability of trends to regpar choices, noting that the maximum for the overall trend occurred at or close to regpar=3. It was hard to avoid the impression that the choice of regpar=3 was, at best, opportunistic. Let’s review exactly how Steig et al described their selection of regpar=3 and their selection of PC=3.

In the online version of their article (though not all versions), they say (links added by me):

We use the RegEM algorithm [11- T. Schneider 2001], developed for sparse data infilling, to combine the occupied weather station data with the T_IR and AWS data in separate reconstructions of the Antarctic temperature field. RegEM uses an iterative calculation that converges on reconstructed fields that are most consistent with the covariance information present both in the predictor data (in this case the weather stations) and the predictand data (the satellite observations or AWS data). We use an adaptation of RegEM in which only a small number, k, of significant eigenvectors are used [10 – Mann et al, JGR 2007]. Additionally, we use a truncated total-least squares (TTLS) calculation [30 – Fierro et al 1997] that minimizes both the vector b and the matrix A in the linear regression model Ax=b. (In this case A is the space-time data matrix, b is the principal component time series to be reconstructed and x represents the statistical weights.) Using RegEM with TTLS provides more robust results for climate field reconstruction than the ridge-regression method originally suggested in ref. 11 for data infilling problems, when there are large differences in data availability between the calibration and reconstruction intervals [10 – Mann et al, JGR 2007]. For completeness, we compare results from RegEM with those from conventional principal-component analysis (Supplementary Information).

The monthly anomalies are efficiently characterized by a small number of spatial weighting patterns and corresponding time series (principal components) that describe the varying contribution of each pattern… The first three principal components are statistically separable and can be meaningfully related to important dynamical features of high-latitude Southern Hemisphere atmospheric circulation, as defined independently by extrapolar instrumental data. The first principal component is significantly correlated with the SAM index (the first principal component of sea-level-pressure or 500-hPa geopotential heights for 20S–90S), and the second principal component reflects the zonal wave-3 pattern, which contributes to the Antarctic dipole pattern of sea-ice anomalies in the Ross Sea and Weddell Sea sectors [4 – Schneider et al J Clim 2004; 8 – Comiso, J Clim 2000]. The first two principal components of TIR alone explain >50% of the monthly and annual temperature variabilities [4 – Schneider et al J Clim 2004.] Monthly anomalies from microwave data (not affected by clouds) yield virtually identical results [4 – Schneider et al J Clim 2004.]

Principal component analysis of the weather station data produces results similar to those of the satellite data analysis, yielding three separable principal components. We therefore used the RegEM algorithm with a cut-off parameter k=3. A disadvantage of excluding higher-order terms (k > 3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this tradeoff because the Peninsula is already the best-observed region of the Antarctic.

Virtually all of the above is total garbage. We’ve seen in earlier posts that the first three eigenvector patterns can be explained convincingly as Chladni patterns. This sort of problem has long been known in the climate literature, dating back at least to Buell in the 1970s – see the posts on Castles in the Clouds. “Statistical separability” in this context can be shown (through a reference in Schneider et al 2004, a paper by two of the coauthors) to mean the separability of eigenvalues discussed in North et al (1982). Chladni patterns frequently occur in pairs and may well be hard to separate – however, that doesn’t mean that the pair can be ignored. The more salient question is whether Mannian principal component methods are a useful statistical method when the target field is spatially autocorrelated – an interesting and obvious question that is clearly not on the horizon of Nature reviewers.
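For reference, the North et al (1982) “rule of thumb” behind this notion of separability is simple enough to state in a few lines: the sampling error of an eigenvalue estimated from N samples is of order λ·sqrt(2/N), and neighbouring eigenvalues are “separable” only if their gap exceeds that error. A rough sketch, my paraphrase of the rule in Python (the effective sample size N is assumed known, which in autocorrelated fields it rarely is):

```python
import numpy as np

def north_separable(eigenvalues, n_samples):
    """North et al. (1982) rule of thumb: eigenvalue lambda_i is separable
    from its neighbour lambda_{i+1} if their gap exceeds the sampling
    error delta_lambda ~ lambda_i * sqrt(2 / N)."""
    lam = np.asarray(eigenvalues, dtype=float)
    err = lam * np.sqrt(2.0 / n_samples)   # sampling error of each eigenvalue
    gaps = lam[:-1] - lam[1:]              # gap to the next eigenvalue
    return gaps > err[:-1]

# Example: the near-degenerate pair (5, 4.9) is not separable.
flags = north_separable([10.0, 5.0, 4.9, 1.0], n_samples=50)
```

Note that a failure of separability for a pair (as with Chladni-style degenerate pairs) is an argument for retaining or discarding the pair together, not for ignoring it.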

Obviously the above few sentences fall well short of being any sort of adequate argument supporting the use of 3 PCs. In fairness, the use of 3 PCs seems to have been developed in predecessor literature, especially Schneider et al JGR 2004, which I’ll try to review some time.

However, the regpar=3 decision does not arise in the earlier Steig Schneider literature and is entirely related to the use of Mannian methods in Steig et al 2009. The only justification is the one provided in the sentence cited above:

Principal component analysis of the weather station data produces results similar to those of the satellite data analysis, yielding three separable principal components. We therefore used the RegEM algorithm with a cut-off parameter k=3.

This argument barely even rises to arm-waving. I don’t know of any reason why the value of one parameter should equal the value of the other. It’s hard to avoid the suspicion that they considered other parameter combinations and did not report the combinations that yielded lower trends.
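For readers who want to see what the TTLS step in the quoted methods actually does, here is a minimal single-predictand sketch of truncated total least squares, following the augmented-matrix partitioning of Fierro et al 1997. It is an illustration of the technique only – in Python rather than the Matlab/R of the actual RegEM code – and the function name is mine:

```python
import numpy as np

def ttls(A, b, k):
    """Truncated total least squares for A x ~ b (single right-hand side),
    via the SVD of the augmented matrix [A b] and the block partitioning
    of the right singular vectors (Fierro et al. 1997)."""
    n = A.shape[1]
    C = np.column_stack([A, b])
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    V = Vt.T
    V12 = V[:n, k:]                  # predictor rows, discarded directions
    V22 = V[n:, k:]                  # predictand row, discarded directions
    return -V12 @ np.linalg.pinv(V22)

# Consistent toy system: with k equal to the number of predictors,
# TTLS recovers the ordinary solution.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = A @ np.array([1.0, -2.0, 0.5])
x_hat = ttls(A, b, 3).ravel()
```

The truncation parameter k here plays the role of Schneider’s regpar: it fixes how many singular directions of the combined data are treated as signal, and everything outside them is discarded.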

25 Comments

This was interesting (including the digression, especially in light of the “caspar and the jesus paper” post I just read).

I’m just curious about the following:
In the previous post(s) about PC retention, the suspicion of an opportunistic choice of parameters came through very strongly.
Now the above picture of tiles shows some even more “auspicious” parameter combinations (assuming all tiles are on the same scale).

My question therefore is whether you’d have a hunch why they didn’t choose a combination yielding an even higher trend than 3/3.
Is it just getting too unstable “out there”? – see 7/9 strong cooling, 7/10 strong warming…
Or too hard to justify an otherwise arbitrary choice of parameters?

Can the tiles be shown across the two-parameter space, with the PC value on the x axis and regpar on the y axis? Heck, fit a model to the surface of different higher-order outputs derived from the maps (e.g., net cooling or warming, variability, homogeneity). Is there any relationship between the two parameters?

The sign instability in the above figures is a function of using a PCA variant of RegEM. The PTTLS version (Tapio’s version and the one used by Steig) does not cause as many sign flips; however, the absolute magnitude of the artifacting is the same. I chose the PCA ones for the panel because the artifacting is obvious, but it would probably not be appropriate for a publication because Steig didn’t use the PCA version – he used PTTLS. The artifacting in PTTLS is just as real, but not as easy to see because the signs of the PCs don’t flip as frequently.

The adaptation of the PCA version – which approaches the final eigenvector number iteratively – is by far the most stable of the three. It doesn’t cause the artifacting you see in the panels above until you get to regpar=15 or so. The IPCA version is the one I used for the reconstruction in the previous topic.

One thing I ought to point out is that when the artifacting begins occurring, it’s quite obvious. There is a step change in the output between a non-artifacted regpar setting and an artifacted one. It is easiest to see in the individual station time series of the ground station predictor matrix. Non-artifacted runs have about the same variance in the imputed time periods as in the original data time periods. Artifacted runs show a massive difference in variance. This artifacting may only occur in one or two of the series, so you have to look at all of them (it also shows up in the RE/CE and r statistics as a step change).

I tabulated these results over at the Air Vent, but they’re probably pertinent here as well.
Another pertinent point is that if you use too low a regpar, the imputed variance ends up being much smaller than the variance of the actual data. This, too, is quite obvious. Many of the imputed station records in the ground station predictor matrix using Steig’s method show virtually a flat line pre-1982 (same with some from the AWS recon). The interesting thing is that examining artifacting in the ground station matrix leads you to the same conclusion as PC selection rules like the broken stick.
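The variance diagnostic described in the two comments above is easy to automate. A minimal sketch (my own hypothetical helper, not Ryan’s actual code): compute, for each station series, the ratio of imputed-period to observed-period variance. Ratios far above 1 flag the artifacting step change; ratios near 0 flag the over-damped flat-line imputation:

```python
import numpy as np

def imputed_variance_ratio(series, observed_mask):
    """Variance of the imputed period divided by variance of the observed
    period for one station series.  Ratios far from 1 flag a step change
    between the infilled and the original portions of the record."""
    s = np.asarray(series, dtype=float)
    mask = np.asarray(observed_mask, dtype=bool)
    return s[~mask].var() / s[mask].var()
```

In practice one would run this over every column of the ground station predictor matrix, since (as noted above) the artifacting may show up in only one or two series.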
Re: Paul29 (#2), There’s not really a relationship like what you’re looking for. The flipping of the PCs is almost random, and which ones flip depends on the stations you use and the RegEM variant you use.

Re: Axel (#4), Before I started doing this, I thought so, too. But as is the case with everything, there are subtleties that I didn’t realize until after having done a bunch of reconstructions.

Your followup actually asks two separate questions, so I will rephrase them and answer them separately. First:

1. Do more satellite PCs improve accuracy?

Yes. Including more satellite PCs definitely results in an increasingly faithful reproduction of the original satellite data. Around 7 PCs or so, the Peninsula warming begins to resolve properly to the Peninsula rather than West Antarctica. This resolution continues to improve until you get to ~20 or so PCs. After that, there’s virtually no difference in the plots as the number of included PCs increases.

There is, of course, a tradeoff. The more PCs you include, the longer RegEM takes to converge. So the objective is to include enough PCs to obtain a physically useful result that is still calculable.

2. Does a higher regpar setting improve accuracy?

To a point. Setting regpar too low, of course, does not provide enough information to obtain a physically useful result. Increasing the regpar setting will resolve this problem.

Unfortunately, you can’t just keep increasing the regpar. There are two reasons for this:

a. As you increase regpar, the resolution gets so fine that the initial infill for the missing values is “retained” by the algorithm rather than being “forgotten”. Indeed, if you set regpar equal to the full rank of the input matrix, it converges in one iteration and simply returns the original values plus the values of the initial infill.
b. Due to the short record length of many of the ground stations, setting regpar too high allows the solution matrix to be too “flexible” – causing spurious correlations between stations, PCs, noise, and the initial infill. This is the cause of the strange patterns you see in the panels.
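Point (a) can be demonstrated in miniature: when the truncation parameter equals the full rank of the infilled matrix, the truncated-SVD reconstruction returns the matrix unchanged, initial infill included, so an EM-style iteration has nothing left to update. A toy sketch (not Tapio Schneider’s RegEM; the sizes and the zero infill are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data matrix with missing values replaced by an initial infill of
# zeros, as on the first pass of an EM-style imputation.
X = rng.normal(size=(40, 8))
missing = rng.random(X.shape) < 0.2
X[missing] = 0.0                          # initial infill

def truncated_reconstruction(X, k):
    """Rank-k reconstruction of the (centred) matrix: the core of each
    truncated-SVD regularization step."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k] + mu

# With k equal to the full rank (8 columns here), the step reproduces X
# exactly, initial infill included: nothing is "forgotten", and the
# iteration converges immediately to the starting values.
assert np.allclose(truncated_reconstruction(X, 8), X)
```

At smaller k the reconstruction genuinely differs from X, which is what allows the iteration to replace the infill with something informed by the covariance structure.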
There is a lot of discussion in the literature about PCA, including discussion of false “teleconnections” due to spurious correlations, especially when records are short or sparse. There’s a discussion relevant to this topic in this paper:
ftp://ftp.geosc.psu.edu/dbr/Publications/200507_Polar_Geography.pdf
The North et al (1982) paper on PCA also has a lot of salient points:
http://www.icess.ucsb.edu/gem/North_et_al_1982_EOF_error_MWR.pdf

Steve,

I figured my musings on WA & AW are more appropriate here than the first thread, since the intent of this seems to be more of a discussion about PCA and divergent reconstruction results than about Antarctica specifically. Anyway, I’ve read both WA and AW (I hadn’t read AW previously), as well as Wegman, MM03, MM05a&b, and Huybers. Slightly OT and entirely irrelevant, but I have to say that both WA & AW were unnecessarily obtuse. I have many thoughts, but only enough time for one of them at present.

The “convergence” argument. I have to admit I do not fully understand their argument. To me, they appear to be conflating unrelated issues: a) “convergence” of the representation of the underlying data; and b) “convergence” of the reconstructions to the “right” answer. They later compound this by justifying the inclusion of suspect tree ring series on the grounds that their exclusion does not yield the “right” answer and that they must therefore contain important climatological information – which is not even subtly circular; it’s blatantly circular.
For a): this is true by definition. Each successive PC includes more and more information about the underlying data set, and inclusion of all PCs returns the original data set. I would agree that enough PCs must be included to provide a faithful enough resolution of the data that subsequent analysis returns meaningful results. Including more may make the problem computationally difficult, but should not result in significant changes to the results.
For b): This one required a valueless series of cross-referencing to figure out. The basic idea seems to be that they iteratively include PCs until they get the original MBH reconstruction – and then stop. The argument would have merit, except:
(1) They do not continue iteratively adding PCs to see if the results change further. This is equivalent to determining the right answer a priori and then continuing to add information until that answer is achieved. However, I suspect that the answer wouldn’t change even if they did include more, because:
(2) I could find no reconstruction of theirs, excluding both the Gaspe series and all bristlecone/foxtail pine series, that was able to produce anything like MBH. To obtain the “right” answer, they had to include at least some mixture of these.
(3) They justify this apparent violation of the convergence rule by ex post excluding non-converging reconstructions on the basis of poor RE statistics. There is no statistical justification for doing this. It’s cherry-picking the results. It is also a problematic assertion to make, as it would imply that the remainder of the series used in the reconstruction do not contain important climatological information, and, therefore, should not be included.
In summary, I feel the “convergence” argument inappropriately uses a by-definition property of the PCA representation of a data matrix as a benchmark for the reconstruction results, imposing an ex ante target on what the reconstruction should look like.
I also have some additional thoughts about (3) above, but I think I will put those in one of the WA threads since it doesn’t really relate to PCA at all.

Re: Ryan O (#9), also in the mix is the NAS panel statement that strip bark (bristlecone/foxtail) be “avoided” in reconstructions. Mann et al 2008 purported to respond to NAS but once again used Graybill bristlecones. We criticized them for this; Mann’s reply purported to justify their inclusion in Mann et al 2008 on the basis that Wahl and Ammann was subsequent to the NAS panel and superseded it (though it was considered in preprint by both NAS and Wegman). The whole thing is really quite bizarre.

Re: JimR (#12), You need to read caspar-and-the-jesus-paper.
To cut a long story short, the paper didn’t technically make the deadline, but I would guess that they argued to themselves that it would have done so if it wasn’t for those pesky nitpickers and so made special dispensation.

Re: DaveR (#13), I think that JimR is well aware of the history and his point here is actually a rather nice piece of judo, as the timing is quite relevant.

The NRC/NAS panel stated that strip-bark samples should be “avoided”:

While ‘strip-bark’ samples should be avoided for temperature reconstructions, attention should also be paid to the confounding effects of anthropogenic nitrogen deposition (Vitousek et al. 1997), since the nutrient conditions of the soil determine wood growth response to increased atmospheric CO2 … (Kostiainen et al. 2004)

Mann et al 2008 purported to have applied NRC recommendations, stating (and this was not an incidental point as they refer to the NRC panel on several occasions):

We were guided in this work by the suggestions of a recent National Research Council report (35) concerning an expanded dataset, updated data, complementary strategies for analysis, and the use of thoroughly tested statistical methods.

However, notwithstanding the NRC position on strip bark, and although bristlecones were hardly an incidental point in the debate, Mann et al 2008 used Graybill bristlecone chronologies anyway. We noted this in our PNAS comment as follows (recall that we were limited to 250 words):

Although Mann et al. purport to “follow the suggestions” of ref. 5, they employed “strip-bark” dendrochronologies despite the recommendation of ref. 5 that these chronologies be “avoided”

Mann’s reply was:

Finally, McIntyre and McKitrick misrepresent both the National Research Council report and the issues in that report that we claimed to address (see abstract in ref. 2). They ignore subsequent findings (4) [Wahl and Ammann :)] concerning “strip bark” records

In climate science, unfortunately, no more credence can be attached to this sort of published statement than to a blog posting at realclimate.

First, Mann’s claim that we “misrepresented” the NRC panel is obviously untrue. They used the word “avoided” and we quoted them. It’s black and white.

Mann here takes the position that Wahl and Ammann (2007) was “subsequent” to the NRC report. Here are some relevant dates:

June 2, 2006: IPCC deadline for expert and Government comments on second draft. As an IPCC reviewer, I observed that Wahl and Ammann, referred to by IPCC, was then only in preprint, citing a rejected companion article, and did not meet IPCC publication deadlines (in print and referenceable in permanent version by end Feb 2006).

June 16, 2006 – IPCC distributes comments to Chapter Lead Authors

June 22, 2006 – NRC/NAS report released. Press release issued.

June 26-28, 2006 – Fourth Lead Author meeting, Bergen, Norway considers comments on the second order draft and revisions to produce the final draft start immediately afterwards.


AUGUST

Annotated responses to all comments on the second draft need to be completed and sent to the TSU by CLAs by August 4

Martin Manning had sent an instruction to reviewers in a pdf file dated 1 July 06 saying,

“In preparing the final draft of the IPCC Working Group I report, Lead Authors may include scientific papers published in 2006 where, in their judgment, doing so would advance the goal of achieving a balance of scientific views in addressing reviewer comments. However, new issues beyond those covered in the second order draft will not be introduced at this stage in the preparation of the report.

Reviewers are invited to submit copies of additional papers that are either in-press or published in 2006, along with the chapter and section number to which this material could pertain, via email to ipcc-wg1@al.noaa.gov, not later than July 24, 2006. In the case of in-press papers a copy of the final acceptance letter from the journal is requested for our records. All submissions must be received by the TSU not later than July 24, 2006 and incomplete submissions can not be accepted.”

July 19, 2006 – First day of Hockey Stick hearing at House Energy and Commerce Committee

On July 24, 2006, I sent the following notice to IPCC WG1:

The following two peer-reviewed reports are relevant to chapter section 6.6. Both have been peer-reviewed, although neither appears in journals.

Wegman et al 2006 is published at energycommerce.house.gov/108/home/07142006_Wegman_Report.pdf. The peer review process was described in testimony to the House Energy and Commerce Committee.

North et al 2006 is published by the National Academy of Sciences.

Both reports confirm that an incorrect and biased principal components method was used in Mann et al 1998. The NAS panel additionally reported that bristlecones should be “avoided” – a criticism prominently made in McIntyre and McKitrick 2005a, 2005b. The NAS panel agreed that Mann et al 1998 failed some verification statistics and withdrew confidence interval claims attached to this reconstruction.

They stated that verification period residuals should be used in estimating confidence intervals – as opposed to the calibration period confidence intervals used in the IPCC SOD (a criticism that I previously made).

Both are important studies and need to be carefully assimilated.

As JimR observes, if Wahl and Ammann was “subsequent” to the NRC panel report of June 22, 2006, then it did not meet IPCC deadlines. In practical terms, it neither met IPCC deadlines nor was subsequent to NRC. NRC had a preprint of Wahl and Ammann – I sent it to them and they discussed it. Wegman also discussed a preprint, observing that it had “no statistical integrity”. As of the IPCC deadline, it was still not accepted in final form, as the publication version cites an article (Ammann and Wahl 2007) that was not even submitted until August 2006.

Unfortunately you will just continue to get buried in new submissions. In another 20 years or so after the US has wrecked its economy it will be realized that the warning signs had been present all along, just like the intel leading to the Iraq war.

I’m amazed that “TCO” has popped up with his usual “Publish” statement now.
And he’d probably be right: a joint paper from yourself, Ryan O and the two Jeffs on the issues surrounding the use of these nifty mathematical doo-dahs, with Steig et al as an “example”, would be worthwhile.

Re: Adam Gallon (#20), A comment to Nature would certainly seem appropriate. Doesn’t need to be snarky (guaranteed not to get published if it is). Just point out that the trend (and spatial trend patterns) depend critically on the choice of algorithm parameters, and that Steig’s justification for their choice is probably invalid: the first three principal components are likely just to be Chladni patterns, not “important dynamical features of high-latitude Southern Hemisphere atmospheric circulation”. Throw in a few pretty tiles and it would be a compelling counter.

Better watch out about the comment to Nature: they’re liable to say that it’s been too long for this to be considered.

It’s getting to where a new variation of climate science is being born: statistical climate science. It’s being shown, again and again, that the choices of retained PCs and regpar have an effect on the results, causing drastic changes with the same data being used.

Considering that the “Team” has stated they’re not statisticians, and have refused help from mainstream statisticians, then some statisticians need to step forward and post articles in their OWN peer-reviewed journals. They’re not being heard in the “climate” journals…

Just a thank you to the statisticians here for helping a plumber understand that PCs and regpars can make data say whatever you want it to. Why can a plumber understand this simple truth and the preeminent climate scientists and scientific journals cannot?
Thanks,
Mike Bryant
PS Climate Scientists should be ashamed.

You’re right. Data decomposition and reconstruction by principal components goes badly awry whenever the premises of the Karhunen-Loève expansion, which underlies all PC methods, are forgotten. The most fundamental premise is some well-defined spatio-temporal frame over which there are multiple data sets. The whole purpose of the KL expansion was to reduce these data sets to more economical, more easily intelligible metrics.

It is one thing to obtain the natural eigenmodes of tidal oscillation of some basin such as the Black Sea from partially incomplete data. It is quite another to try to determine the temperature record throughout Antarctica, which is not spatially homogeneous, has no natural time frame apart from the diurnal and annual, and lacks adequate data to begin with. Sadly, the mistaken idea that PC methods cure all data deficiencies has taken hold in some quarters and often leads to the travesty of highly arbitrary results so clearly displayed by Ryan’s tiles.

Whenever PCs are used as a time-series analysis tool over time-frames that are not naturally defined, it should arouse the suspicion that someone is trying to claim more information than is truly available in the data.

