Is Kaufman 'Robust'?

A common meme in Team-world these days is that any issues or errors are minor and that none of them “matter”. As we peel back the layers of Kaufman et al, this is the first line of Team defence.

The rhetorical impact of Team reconstructions largely derives from the modern-medieval differential: is it in the red or is it in the black?

Thus when one sees study after study which has modern-medieval differentials that are always just slightly in the black, any prudent analyst would arch his/her eyebrow slightly and examine any accounting policies that may have contributed to getting the result in the black. (I use the term “accounting” intentionally, since the term implies that there be policies for the inclusion/exclusion of particular data sets and their truncation.) And let there be no doubt: when one is dealing with CPS reconstructions with very small data sets (20 or so), it is quite possible to affect the differential through a small subset of the data.
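To see how little it takes, here is a toy sketch (entirely synthetic data, not the Kaufman roster) of how a single strongly-trending series in a roster of twenty can move the modern-medieval differential of a simple composite:

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(900, 2001)

# Twenty standardized "proxies" of pure noise, plus one hypothetical
# series with a pronounced 20th-century blade.
proxies = rng.normal(0, 1, size=(20, years.size))
hs = rng.normal(0, 1, years.size)
hs[years >= 1900] += 3.0  # the blade

def differential(stack):
    """Modern (1900-2000) minus medieval (900-1100) mean of a composite."""
    comp = stack.mean(axis=0)
    return comp[years >= 1900].mean() - comp[years <= 1100].mean()

d_without = differential(proxies)
d_with = differential(np.vstack([proxies, hs]))
# One series out of 21 shifts the differential by roughly 3/21 of its
# own excursion, easily enough to flip "red" to "black" in a close call.
```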

Sometimes the accounting exceptions look innocent enough, e.g. MBH's unique extension of the Gaspé tree ring series back to 1400 from 1404. However, such variations in accounting policy invariably seem to enhance HS-ness in the composite, and each such variation needs to be examined.

Obviously there are a few methods that I’ve learned to look for: does the study use Graybill bristlecones? Does it use Yamal? Does it use upside-down Tiljander? Are there any truncations or extensions of the series? Is the most modern version of the series used?

I noticed almost instantaneously that Kaufman used Yamal and upside-down Tiljander (though mitigating the impact of upside-down Tiljander by truncating it at 1800). He used two other Finnish series, both of which are, as far as I can tell right now, used in an orientation upside-down to that proposed by the original authors. Kaufman truncated the Blue Lake varve series because of supposed non-temperature inhomogeneity in the early portion of the series, but didn't truncate the later portion of the Loso Iceberg Lake varve series, where there was a definite inhomogeneity. Kaufman appears to have used an old version of the Hallet Lake series (which was replaced over a year ago, in Nov 2008, at NCDC; otherwise, the inconsistencies between the Kaufman version and the NCDC version are inexplicable). In addition to Yamal, Kaufman used two other Briffa versions, while not using seemingly plausible tree ring series at Tornetrask (Grudd) and Indigirka, Yakutia.

I’ve done a quick sensitivity analysis in which I’ve done a CPS average (980-1800 base) with the following variations:

1. Current version of Hallet Lake (proxy 2) and the non-truncated version of Blue Lake (proxy 1). (I haven’t checked whether this “matters”, but there didn’t seem to be any overwhelming reason to use an obsolete version of Hallet Lake or a truncated version of Blue Lake.)
2. The three Briffa series (Yamal, Tornetrask-Finland, Taymyr-Avam) are replaced by Polar Urals (Esper version), Tornetrask (Grudd version) and Indigirka (Moberg version). (I think that this is the sensitivity that carries the water here and my guess is that most of the difference arises from the Briffa data. I’ve provided materials that make this easy for anyone interested to check.)
3. The three Finnish proxies are used in the orientation of the original authors i.e. flipped from the Kaufman version.
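For readers who want to reproduce this kind of variation, the CPS step itself is simple enough to sketch. The function below standardizes each series over the base period and averages; the five-series roster and names are hypothetical stand-ins, not the actual Kaufman data (which is in the linked materials):

```python
import numpy as np

def cps_composite(proxy_dict, years, base=(980, 1800), flip=()):
    """Simple CPS: standardize each series over the base period, then
    average. `flip` lists series to negate first (e.g. to restore the
    original authors' orientation)."""
    mask = (years >= base[0]) & (years <= base[1])
    rows = []
    for name, series in proxy_dict.items():
        s = -series if name in flip else series
        rows.append((s - s[mask].mean()) / s[mask].std())
    return np.vstack(rows).mean(axis=0)

# Hypothetical roster: substitute a different version of one series and
# flip another, then compare composites.
years = np.arange(900, 2001)
rng = np.random.default_rng(1)
roster = {f"p{i}": rng.normal(0, 1, years.size) for i in range(5)}
base_comp = cps_composite(roster, years)
roster["p0"] = rng.normal(0, 1, years.size)       # substitute version
alt_comp = cps_composite(roster, years, flip=("p1",))
```

Note that flipping a series before standardization simply negates its standardized contribution, which is why orientation choices feed straight through to the composite.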

Here’s the result. Obviously there is a lot in common in the general appearance of the two composites – the difference between the two is that there is nothing “unprecedented” about the 20th century in the latter case.

Top – CPS average of 23 Kaufman proxies; bottom – variation as described above.

One can calculate the relative contribution of each accounting decision to the change in appearance. The largest contribution comes from Yamal versus Polar Urals. Each accounting decision has some impact on the modern-medieval differential. I’ve uploaded data sets and scripts so that interested readers can experiment for themselves and will attach an explanation script in the first comment.
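A schematic of the disaggregation exercise: hold the roster fixed, apply one accounting change at a time, and log the shift in the modern-medieval differential. (The roster, names and substitute series here are hypothetical; the real alternatives are in the uploaded scripts.)

```python
import numpy as np

def mm_differential(comp, years):
    """Modern (post-1900) minus medieval (pre-1100) mean of a composite."""
    return comp[years >= 1900].mean() - comp[years < 1100].mean()

def disaggregate(base_roster, changes, years):
    """Apply each change (a function roster -> roster) in isolation and
    report its effect on the modern-medieval differential."""
    base = np.vstack(list(base_roster.values())).mean(axis=0)
    d0 = mm_differential(base, years)
    effects = {}
    for name, change in changes.items():
        r = change(dict(base_roster))
        comp = np.vstack(list(r.values())).mean(axis=0)
        effects[name] = mm_differential(comp, years) - d0
    return effects

years = np.arange(900, 2001)
rng = np.random.default_rng(2)
roster = {f"p{i}": rng.normal(0, 1, years.size) for i in range(4)}
flat = rng.normal(0, 1, years.size)   # stand-in substitute series
changes = {"swap p0 for substitute": lambda r: {**r, "p0": flat}}
effects = disaggregate(roster, changes, years)
```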

I do not claim that the bottom graph is more reasonable or less unreasonable than the first graph. My point here – as on many other occasions – is that just calling something a “proxy” doesn’t mean that it is a “proxy”. The sensitivity of the modern-medieval differential to different roster selections means that the data is not consistent enough to yield a “robust” result.

Regarding the unprecedentedness of the calibration range: there is of course the marked lack of centennial scale variance in the pre-calibration range in comparison to the high slope in the instrumental calibration range, à la von Storch 04 and of course (Id 09) — no need to add more to that.

It’s a nice demonstration of your earlier point about which series create the curve.

Next download information about proxies (previously collated from SI and elsewhere), the archived version of Kaufman’s decadal values and original data versions that I’ve located (experimenting to match Kaufman decadal where I could):

One can do a variety of sensitivity plots if one wants to “disaggregate” the effect of each accounting decision. Some “matter” more than others. I’ve illustrated the effect of the group of alternatives in the figure, but provided materials for interested readers to examine sub-variations if they want.

And what should be published, a primer on doing basic sensitivity testing?

The intro to this thread, in my mind, and despite other deficiencies revealed here about Kaufman (2009), is game, set and match. It will not be published, but could well discourage Kaufman from commenting here – and not because of any real or perceived impoliteness.

I agree, a beautifully simple – and devastating comment. I do not see any reason that this should not be published. There are no deep statistical issues here but Steve has a unique expertise with the underlying data, and it is important to expose in the literature the arbitrary choices and manipulations leading to yet another HS in Kaufman (2009).

As Steve has more than adequately demonstrated, Kaufman’s choice of proxies is most definitely NOT ‘arbitrary’. Rather, as Steve’s sensitivity study shows, Kaufman’s choices appear designed to resurrect the Team’s mantra that the (claimed) warming trend towards the end of the 20th century is ‘unprecedented in the last 1000 years’. Following the criticism the Team received after the NSF and Wegman reports, this is the Team’s latest attempt to re-establish the hockey stick for the IPCC sans Graybill bristlecone pines – all nicely timed to maximise publicity in advance of Copenhagen in December. Now all we have to do is to keep searching the internet for any folders on FTP servers labelled CENSORED!

And what should be published, a primer on doing basic sensitivity testing?

Obviously that is what is required, because neither the authors nor the reviewers seem to have any clue about it, and yet the term “robust” is routinely trotted out simply because other studies with the same sensitivity issues back up the first! This state of affairs will continue until these issues are out in the open and scientists wise up.
.
IMO, a review article should be published on the state of the climate multi-proxy reconstruction field. The various “tricks” of the trade need to be revealed and the impact they have on the data clearly displayed. This should include, maybe in a separate article, guidelines for best practice. I realise this would be a lot of work, but Steve already seems to have published a lot of the content on this site already.
.
The advantages in such an approach would appear to be that, not only do you rebut all past reconstructions using shoddy methodology, but you also rebut all future reconstructions which continue to use such methods.

Once again the unreliability of proxies as a method for accurate climate records is exposed. One would think that scientists would stop creating papers like this… no, on second thoughts, you wouldn’t: it’s easy “science” to do and easy to get published. One is reminded of medieval science, when experiments were frowned upon and the way to do Natural Philosophy was to examine the texts of History and write erudite commentaries (albeit in Latin, not R!). Then along came Pascal, Bacon, induction and the scientific method.

If Irving Langmuir were still with us today, he would have a field day with these reconstructions as instructive examples of his ‘pathological science.’ In particular his symptom #2 (of 6) would seem to apply:

2. ‘The effect is of a magnitude that it remains close to the limit of detectability, or many measurements are necessary because of the very low statistical significance of the results.’

However this doesn’t quite fit. He might need to propose a new symptom (#7):

‘The effect is of such exquisite sensitivity that almost any other choice of data or method of data analysis eliminates the effect.’

Langmuir’s assumption was that pathological science was the result of honest but misled scientists, whose desire to believe overcame their scientific judgement. Had he envisioned the current state of climate science, his symptom #6 would have read as follows:

6. ‘The ratio of supporters to critics rises up to somewhere near 90%* and then falls gradually to oblivion.’

This state of affairs will continue until these issues are out in the open and scientists wise up.

The issues are out in the open and have been for some time – as far as I know CA and the papers Steve has produced are available around the globe. Without doubt CA is read (and on occasion contributed to) by the authors of the questionable studies.

The question isn’t about “wising up”, it’s about “acting”: When are they, and the scientific journals and institutions, going to put their house(s) in order? And if they aren’t going to do it who is?

Re: curious (#15), When they can no longer feign ignorance by pretending each successive study doesn’t fall foul of the same methods previously criticised. A similar reformation is currently ongoing in the field of cognitive neuroscience after a review looked at the state of play and found methods wanting compared to the claims made.
.
The editor of the Journal of Cognitive Neuroscience wrote this in an editorial:

Unrealistic, financially motivated claims about functional brain imaging can have a negative impact on society at large and on our field. If too much is promised and not delivered, funders may become wary of cognitive neuroscience and skeptical about its genuine potential. Bad advice given to businesses concerning marketing and personnel selection could lead to expensive mistakes, and bad advice given to governments concerning security screening and interrogation could lead to far worse. Yet imaging is being offered for these applications now, with scant evidence of validity.

The parallels between a “financially motivated” cognitive science field and a “politically motivated” climate field are clear, and the sloppy science both turn out to make a point is just a symptom of that. But at least in neuroscience, the downsides of doing sloppy science are being recognised for the damage they can do to the field.

It would be nice to have the code and data. I doubt that the Yamal builders want that info public. By its shape, it’s pretty obvious there has been a large-scale sorting and removal of data which didn’t make the consensus cut.

I used the data you linked above and ran a curve without Yamal and a couple of other varieties this morning. Just removing Yamal makes the 400 AD peak slightly above all of the rest.

You guys are jumping the gun a little bit. That the recon is not robust does not mean the central conclusion of the paper is wrong.
.
You are going to have to dig a little deeper, because the main claim in the paper is that (1) there was a fairly smooth cooling trend from 1AD to 1900AD (caused by orbital forcing), and (2) this trend reversed suddenly in the 20th c. despite the continued cooling that should have been expected from the orbital forcing.
.
Even if the historical recon is completely wrong, you still have the GCM output to refute. (It’s the fact that the recon matches the GCM output (and the fact that both exhibit very little noise) that makes the past cooling trend so compelling, and the current warming reversal so surprising (OMGIWTWT).) i.e. If the GCM is correct and the proxies are spurious junk, then the paper’s conclusion is still correct.
.
GCM component aside, the key questions are (1) how the calibration exercise leads to a strong match to the modern instrumental record and (2) how the reconstruction step leads to such a smooth, noise-free cooling trend 1AD-1900AD.
.
Take Steve M’s alternative curve above and re-run the Kaufman analysis and I doubt the conclusion will change. Because the primary result in this paper is not the exact shape of the recon. As long as there is a long slow cooling trend and a 20th c. reversal, the primary conclusion will hold.
.
To really refute this paper is going to require a deep analysis of the fundamental validity of these “proxies”.
.
Note: nothing I say here invalidates any of Steve M’s statements. I am just contextualizing them.
.
There’s more to say, but I don’t have a lot of time right now.

“main claim in the paper is that (1) there was a fairly smooth cooling trend from 1AD to 1900AD (caused by orbital forcing), ”

The following suggests the opposite.

“What does The Milankovitch Theory say about future climate change?
Orbital changes occur over thousands of years, and the climate system may also take thousands of years to respond to orbital forcing. Theory suggests that the primary driver of ice ages is the total summer radiation received in northern latitude zones where major ice sheets have formed in the past, near 65 degrees north. Past ice ages correlate well to 65N summer insolation (Imbrie 1982). Astronomical calculations show that 65N summer insolation should increase gradually over the next 25,000 years, and that no 65N summer insolation declines sufficient to cause an ice age are expected in the next 50,000 – 100,000 years ( Hollan 2000, Berger 2002).”http://www.ncdc.noaa.gov/paleo/milankovitch.html

Not necessarily. The time scales of the Kaufman premise (the 20th century Arctic should have continued cooling) and the NOAA prognosis that you quote (25, 50, 100 Ky) are very different. One is in the recent past, the other is in the far future, 5-50 times longer than the 2 Ky study period. If you think Kaufman’s GCM run is bunk, you’ll have to prove that using more direct means than a generic text from the web.

#14 DavidJR: “IMO, a review article should be published on the state of the climate multi-proxy reconstruction field. The various “tricks” of the trade need to be revealed and the impact they have on the data clearly displayed. This should include, maybe in a separate article, guidelines for best practice. I realise this would be a lot of work, but Steve already seems to have published a lot of the content on this site already.”

I agree with the above. Rather than taking on one paper at a time in a direct adversarial manner, a “review” paper on “observed pitfalls and traps” of reconstructions would IMO be a useful paper.

SM has lots of problematic papers to choose from. SM already has a lot of the code and a lot of the databases put together, so an example of best practices on openness of methods should be easy. Jeff Id as co-author would bring to the party his turnkey R programs that use Mann’s own database to make any version of hockey stick that one wants. (http://noconsensus.wordpress.com/2009/06/20/hockey-stick-cps-revisited-part-1/)

The two of you have already done most of the hard work. The key now is to establish a baseline standard for proxy reconstructions by putting the words on paper and going through the peer review process.

that makes the past cooling trend so compelling, and the current warming reversal so surprising (OMGIWTWT).)

I just meant that I don’t feel that anything about this paper is compelling. There is nothing here whatsoever. Even negotiating whether one aspect or another is temperature is beyond my understanding. I really don’t understand how this curve might be related to temperature in any way at all.

I see what Bender is getting at. You still see a long slow cooling period and an abrupt change. All the same I’d still rather have that rebound than a continuation of the cooling and I suspect the polar bears would too.

I don’t think that bender’s being helpful in using the word “robust” to describe two quite different issues.

“Robust” as to a reconstruction can have a useful operational meaning in describing whether or not the “key” properties of the recon (e.g. the medieval-modern differential) survive elementary sensitivity tests, such as the use of slightly different proxies. By way of example, I’ve shown here that the modern-medieval differential is not robust to some simple changes. What about the long-term decline – could I select alternative proxies so that the composite goes up slightly over the past millennium rather than down? I haven’t thought about it, but I certainly wouldn’t preclude it. As a start, one could try flipping proxies upside-down to match the Tiljander view of sediment orientation.

I have no idea what operational meaning would attach to bender’s use of the term “robust” as to the paper.

My analysis here is along the lines of my usual analysis. I’m not asserting that the medieval period is warmer than the modern period – only that this sort of data doesn’t prove the point. This doesn’t preclude the possibility of the point being proven with some other better data. That’s one reason why I like to look carefully at new data sets, such as the sediment series here – to see if maybe someone has found some new better data.

Right now, it seems to me that varvochronology looks like it’s got as many, if not more, problems than dendrochronology, but I’ve only started looking at the datasets.

I don’t think that bender’s being helpful in using the word “robust” to describe two quite different issues.

This is true. I was being intentionally ambiguous. I’m in part pointing out that there is some ambiguity in the title of the OP (Is “Kaufman” the recon or the paper). I’m also pointing out that the validity of the paper’s conclusions do not depend so sensitively on the precise shape (or robustness) of the recon – because the modern-medieval comparison does not lie at the crux of this paper.
.
Sidepoint: AFAIK there is no reason to believe that the circumpolar Arctic region was particularly warm during the MWP. No grapes. No Vikings. No logs of NW passages.

I have no idea what operational meaning would attach to bender’s use of the term “robust” as to the paper.

I mean they might have got the right answer, but using an incorrect method. (Wegman’s formula for bad science.)

My analysis here is along the lines of my usual analysis. I’m not asserting that the medieval period is warmer than the modern period – only that this sort of data doesn’t prove the point.

I agree. And you know what the counter is: “it doesn’t matter; our argument is valid under just about any circumstance”.

Right now, it seems to me that varvochronology looks like it’s got as many, if not more, problems than dendrochronology, but I’ve only started looking at the datasets.

I am way more concerned about the breakpoint non-stationarities in varvochronology than I am with the nonlinearities and ambiguities of dendrochronology. The way to attack this paper and this recon is by a very tight focus on the science of varvology. So I applaud your focus. And, for the record, that’s precisely why I said from day one I want to hear, not from Kaufman, but from his varvologist friends. Sort of like how I prefer listening to Rob Wilson rather than Michael Mann: you’re at least going to learn something.

As I’ve said before, the way out of the recon dilemma has to be through better proxies – so my first interest is seeing whether these things might do the job. I don’t necessarily assume the worst – despite what people may think.

However, there were some things that quickly put me on guard for this paper – the recycling of Briffa’s Yamal proxy was definitely a bad sign. At one of the PI meetings, one of the scientists wondered whether they shouldn’t do a report on their sediments without mixing all the other junk.

It would have been far more interesting for me to see them try to reconcile the proxies on a local/regional basis. Look at BSi or varve widths or whatever on a consistent basis – just like the NSF abstract. That’s what these folks need to do to progress.

In that sense, it’s disappointing to see an old-school cherry-pie with Briffa’s tired old proxies in new makeup.

Re: Steve McIntyre (#43),
I can’t disagree with anything you say. My main point in all this is simply to contextualize. To show how and why there is more to this paper than the tortured fragile nonsense of MBH98. It’s quite a bit simpler, more clever, less fragile. Even if varvoclimatology is junksci (which it may be!) there is the problem of the arctic cooling GCM run, which obviously doesn’t turn on the recon. While you are working out the problems with the proxies, someone like lucia needs to attack the GCM run.
.
My purpose is primarily to quell some of the ridiculous piling on from people that don’t know what they’re talking about. This paper is not quite the fragile house of cards that MBH98 was.
.
If the modern calibration step can be shown to be junk, this will invalidate the historical recon. But the residual – the GCM run vs. instrumental record – would still stand.

Bender, can you conjecture why these proxies, which are all assumed to be reacting to the same Arctic temperatures, can look so different from one another? If climate is that local and unique at these proxy sites, many more proxies would be required to obtain a reasonable average reconstruction for 60 N and north. Or are we looking at noise that is unique to an individual proxy, or at least type of proxy, that will somehow average out to reveal a temperature signal? The latter conjecture would require a lot of explaining or a lot of arm waving.

Do you think that Kaufman could do better than conjecture in reply to these queries?

Re: Kenneth Fritsch (#45),
I think they are “reacting to the same arctic temperature”, but that the reaction is very weak. I have very little confidence in this varvology. It is very strange to me that this paper could make it to Science without more substantial proof of concept at the more fundamental level. Dendroclimatology, by contrast, has a long history of establishment – and yet you see how weak a link there is there between temperature and trees. Something fishy in this. Smells like fast-tracked pseudoscience.
.
Nobody commented on my read of Lamoureux & Bradley. That’s a mistake. The foundational stuff that Steve is just getting into is critical. That Bradley and Tiljander do not see eye-to-eye is a major tipoff that there may be no consensus on this science.
.
So to answer your question: noise dominates at the individual proxy level because the local signal is very weak. The scaling up of the weak signal can be overcome by cherry-picking, not to fit an individual motif (HS), but to generate a global mean motif. Two criteria: cherry-pick the 20th c. to match the instrumental; cherry-pick the 1900-year cooling trend to match the GCM run.
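bender's screening point can be illustrated on pure noise: generate white-noise "proxies", keep only those that correlate with a 20th-century instrumental-style ramp, and the average of the survivors acquires a blade even though no individual series contains any signal. (A toy illustration of selection bias, not Kaufman's actual procedure.)

```python
import numpy as np

rng = np.random.default_rng(3)
years = np.arange(1000, 2001)
# Hypothetical "instrumental" target: flat, then a 20th-century ramp.
target = np.where(years >= 1900, (years - 1900) / 100.0, 0.0)

# 500 pure-noise series; screen on correlation with the target.
noise = rng.normal(0, 1, size=(500, years.size))
r = np.array([np.corrcoef(s, target)[0, 1] for s in noise])
picked = noise[r > 0.05]

modern = years >= 1900
blade = picked[:, modern].mean() - picked[:, ~modern].mean()
# The screened subset shows a modern uplift purely from selection bias,
# while the full ensemble averages to roughly zero everywhere.
```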

This paper is 90% about proxies and I don’t see any reason why this portion is “less fragile” than any other cherry-pie CPS reconstruction.

I re-read the paper quickly in respect to the modeling – this is mentioned passim and the paper hardly can be construed as a presentation of a climate model. Plus there are some really odd things about the model presentation.

The Kaufman SI says that the model is NOT of the last 2000 years but of the 5600-3600BP period:

The length of the simulation was 2400 years, corresponding to 6000 to 3600 BP (years before 1950). The time period analyzed here is the 2000-year period from 5600 to 3600 BP. No similar simulation with orbital forcing is available for the most recent two millennia covered by the proxy records.

I went back through the PI minutes. The first meeting said:

First a simulation with CCSM for the past 2000 years is desirable. This will allow the long time series to be put into a temporal perspective and would extend through the 20th century. This simulation is already scheduled as part of the CCSM Paleoclimate Working Group computer allocation.

The second meeting said:

Caspar suggested that it is not too soon to get started with some model-data experiments, even with preliminary time series. To begin, we can try model-data comparisons of our paleodata reconstructions vs existing model reconstructions since 1750 (see “Action Item” above). We can also aim to compare spatial patterns and amplitudes of reconstructed vs modeled climate change in recent times as a first test.

This paper is 90% about proxies and I don’t see any reason why this portion is “less fragile” than any other cherry-pie CPS reconstruction.

1. The proxy premise IS fragile. It’s the overall argument that I am saying is not as fragile.

2. The text is not indicative of the logical structure of the paper. The figures are. The GCM run is one of two load-bearing structures propping up the argument. Take out the proxies and the argument stands, even though you’ve now killed 90% of the text.

Re: Steve McIntyre (#52),
I am too trusting a person, Steve. I saw the GCM data and merely *assumed* it was for the logical period: 1AD-1900AD. Your read was more careful than mine. Trust but verify. Kudos.

Re: Steve McIntyre (#47),
This whole sequence is fishy. I’m glad you pulled it out. I never would go to such depth as to read PI minutes. Maybe taking down the GCM card will not be as hard as I thought. Maybe they “winged it” a little.

These sea-ice, snow, and terrestrial feedbacks were represented in a transient, mid-late Holocene simulation with the Community Climate System Model (CCSM3) (8, 25). For the period from 5600 to 3600 years ago, orbital forcing was the strongest time-varying forcing of this simulation. Results from this simulation show that the relation between summer insolation and temperature in the model is the same as for the proxy reconstruction, thereby supporting the connection between the Arctic summer cooling trend and the orbitally driven reduction in summer insolation (Fig. 4).

The language of the SI repeated below isn’t any longer, but is more precise.

The length of the simulation was 2400 years, corresponding to 6000 to 3600 BP (years before 1950). The time period analyzed here is the 2000-year period from 5600 to 3600 BP. No similar simulation with orbital forcing is available for the most recent two millennia covered by the proxy records.

I don’t see how anyone would deduce from the wording in the article that they used the 5600-3600BP period in their Figure 4. It’s certainly not brought to the reader’s attention. I do not accept “space limitations” as a reason for not being more precise as the method could have been stated easily within the space limitations. I see little point in speculating on motives for why things were arranged as they were, because we’ll never know.

For my contribution to testing the robustness of the 23 proxy “Arctic” reconstruction, I used several a priori criteria for my selections. I did no substitutions of proxies, only proxy removals, to avoid any biases in my choice of substitute proxies and primarily to avoid the additional work of putting the substitutions into proper and standardized form.

My first reconstruction used strict adherence to the 1-2000 time period for the series and removal of proxies with controversial flipping issues, cherry picking of TR proxies, and one proxy where an older version was being used by Kaufman et al.

My second reconstruction used a less restrictive time period of 1-1980 with all the other criteria of the first reconstruction.

My third reconstruction used the criteria of the first but added back the cherry picked TR proxies 22 and 23.

My fourth reconstruction used the criteria of the second reconstruction but added back the cherry picked proxies 22 and 23.

My fifth reconstruction used the time period 980-2000 with the criteria of the first reconstruction.

My sixth reconstruction used the time period 980-1980 with the criteria of the first reconstruction.

Note that all reconstructions ending in 1980 also have data points for two points beyond that date that are listed in pink and represent a Kaufman averaging of the proxies that extend beyond 1980.

The time series for the 6 reconstructions are presented below along with the average of the 23 proxy Kaufman (as averaged by Kaufman et al) for comparison.

The 23 proxy average, as noted before, contains very different time series that, in my mind, happen on averaging (the Kaufman way) to give a sufficient hockey stick appearance, with a marginally higher modern warm period than any in the 2000 years, to pass muster for a publication with the claim made by Kaufman et al.

The restricted first reconstruction shows a rather flat temperature trend for the 1-2000 time period. This reconstruction would be publishable in the context of the current warm period not being unprecedented. It could be argued that the proxy number is too low to make any conclusions.

The second reconstruction, less restricted in time, gives more proxies and yields the same conclusion as the first one. It has a bit more of a hockey stick appearance than the first one but does not show a higher modern warm period.

Adding back the “cherry picked” TR proxies 22 and 23 (the third and fourth reconstructions) appears to my eyes to show a lesser hockey stick shape and a relatively lower modern warming period than the 23 proxy Kaufman average, and would make the case made by Kaufman et al graphically less convincing.

The fifth and sixth reconstructions would have shown an unprecedented modern warming period with a nice hockey stick appearance using the criteria that I used for the 1-2000 period. If the Kaufman authors could have found a rationale for truncating the series to the period 980-2000 or 980-1980, they could have made the claims they made in the paper only for the last 1000 years instead of 2000 years – even though the modern warming trend appears to start too early as it does in the Kaufman 23 proxy average.
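The removal-only protocol above is easy to make reproducible: encode each a priori criterion as a flag on the proxy roster and recompute membership for each variation. A schematic (the flags and four-proxy table are made up for illustration; they are not the actual Kaufman metadata):

```python
# Hypothetical proxy table: each entry flags the issues a criterion screens on.
proxies = {
    "p1": {"flipped": False, "cherry_picked": False, "ends": 2000},
    "p2": {"flipped": True,  "cherry_picked": False, "ends": 2000},
    "p3": {"flipped": False, "cherry_picked": True,  "ends": 1980},
    "p4": {"flipped": False, "cherry_picked": False, "ends": 1980},
}

def roster(min_end=2000, drop_flipped=True, drop_cherry=True):
    """Return names surviving an a priori screen (removals only, no substitutes)."""
    keep = []
    for name, meta in proxies.items():
        if meta["ends"] < min_end:
            continue
        if drop_flipped and meta["flipped"]:
            continue
        if drop_cherry and meta["cherry_picked"]:
            continue
        keep.append(name)
    return keep

recon1 = roster(min_end=2000)                     # strict: full period, all screens
recon2 = roster(min_end=1980)                     # relaxed end date
recon3 = roster(min_end=1980, drop_cherry=False)  # add back the screened TR series
```

Stating the screens as code rather than prose makes each reconstruction's membership auditable at a glance.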

1) Do the esteemed scientific journals that publish the work of ‘the Team’ consider the issues of concern with their work that have been identified on this site to be irrelevant or wrong?

2) Do those journals also consider the publishing/archiving of data and methodology for published papers to be irrelevant or just unnecessary?

You see it seems to me that the problem may not necessarily be with ‘the Team’ but with the journals that effectively promote their work. An analogy may help show where I am coming from.

The current financial crisis started with teams of bankers originating loans on sub-prime mortgages, which by definition were of poor quality. This would not have spawned a global banking crisis had it not been for the global securitization market, where these loans were parcelled up and sold to other banks around the world. Most importantly, that could not be done without the rating agencies. It was these bodies who gave these parcels of bad loans the highest quality credit rating – they gave the portfolios credibility in the market (just like the scientific journals give credibility to published work with the moniker of peer review). However, we should not forget that the rating agencies only opine on the securities; they do not underwrite or guarantee them, and thus caveat emptor, or buyer beware, prevails. So the other banks who bought these assets and allowed the risk to be spread around the globe were culpable too (in much the same way as those who accept climate papers published in journals at face value without checking them).

Now some auditors may have questioned this process along the way but they were obviously drowned out by those who felt the risk reward was too good to miss and who could rely on the validity of their judgement backed by the opinion of the rating agencies. (Much the same as we today see folks still giving credence to the work of ‘the Team’ because of the validity given to it by the esteemed scientific journals who promote their work through publication despite the many concerns expressed about the methodology).

The point of all this is to say perhaps the scientific journals should be challenged more, don’t you think?
Steve: I haven’t been shy about the journals. I’ve certainly done what I can (and readers have helped) to motivate journals to require authors to archive data and code. Some progress is being made on data – it’s a lot easier to get action now than it was on MBH98.

Bender, I think if you take a hard look at all the 23 proxy shapes you will see only one that shows the hockey stick shape, even recognizing that some of the proxies are truncated before the blade.

Your conjecture was that, despite the modern warm period not necessarily being higher than past periods in the 2000 year time frame, the hockey stick appearance was the key to the K09 paper. You alluded to the model run agreements.

What then would become critical would be the break point in the handle and blade of the hockey stick to correspond with GHG rapid increases. My break point analysis may have been less sensitive to end of series break points, but a visual comparison shows, by my old eyes, that the break comes too early in the series and this agrees with my calculations. Using the raw annual data would, of course, make finding the end point break point less difficult than dealing with 10 year averages.

Model run agreement with K09 or my renditions of it would imply that natural processes over a 2000 year period have nearly as large or larger effects on “Arctic” temperatures than AGW. That would I think be something new.

My next question was to ask for the time series generated by the models, but I think Steve M has shown that that will probably not be forthcoming. It would be most interesting to see where the models would take the 2000 year trend in the next several centuries and, since we think that ice ages are initiated by cooling at 60N and north, whether those models would indicate an early ice age (without AGW). Do not most models indicate 20 to 30K years before our next ice age? Would it be an embarrassment to show that AGW (with salutes to the fossil fuel executives) averted an impending ice age?

Re: Kenneth Fritsch (#58),
1. Ken, what matters is that the recon mean (~PC1) has HS shape. This does not require that all, or even most, of the individual series have HS shape. The few proxies that load heavily on PC1 are the ones driving the recon. (And what’s driving the calibration is a separate, but equally important, question. And same thing there: it’s the mean that matters to the result, not variations at the level of individual proxies.).
.
This highlights a point I made years ago in one of my first posts at CA: calibrations and recons that ignore the noise in individual series are false. They suppress the true amount of uncertainty on recon estimates. I recommended a bootstrap procedure to estimate that uncertainty robustly. To my knowledge the field has foolishly resisted this suggestion.
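For what it is worth, a minimal sketch of the kind of bootstrap meant here (function name and all numbers are illustrative, not anything from the Kaufman SI): resample proxies with replacement and recompute the composite each time, so the spread of resampled composites reflects between-proxy inconsistency.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_recon_ci(proxies, n_boot=1000, alpha=0.05):
    """Bootstrap CI for a composite (row-mean) reconstruction.

    proxies: (n_time, n_proxy) array of standardized proxy series.
    Resamples the proxy columns with replacement and recomputes the
    composite, so the envelope reflects between-proxy noise rather
    than assuming the proxies agree.
    """
    n_time, n_proxy = proxies.shape
    recons = np.empty((n_boot, n_time))
    for b in range(n_boot):
        idx = rng.integers(0, n_proxy, n_proxy)  # resample proxies
        recons[b] = proxies[:, idx].mean(axis=1)
    lo = np.quantile(recons, alpha / 2, axis=0)
    hi = np.quantile(recons, 1 - alpha / 2, axis=0)
    return proxies.mean(axis=1), lo, hi
```

With only ~20 inconsistent series, an envelope like this would be expected to be very wide.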

2.

Would it be an embarrassment to show that AGW (with salutes to the fossil fuel executives) averted an impending ice age?

Perhaps not to the authors – because I think that is precisely what they are knowingly implying. But to the alarmist movement: yes, I would think so. I would like to see Gavin, Tamino et al. wriggle out of this one.

The timing of the next Ice Age is an intriguing issue that would be worth writing about. In the Pleistocene, interglacials haven’t been all that long and it’s logical enough to wonder about the Holocene, which has been going on a while in Pleistocene interglacial terms. IPCC compares the present interglacial to a long one around 400,000 BP and says that no return of glaciation is expected for 30,000 years or so. I have not studied the relevant literature to see why they think this, but it would be interesting to do so.

It would be useful to crosscheck this against the implications of Kaufman Fig 4F.

“IPCC compares the present interglacial to a long one around 400,000 BP and says that no return of glaciation is expected for 30,000 years or so. I have not studied the relevant literature to see why they think this, but it would be interesting to do so.”

I have, and to put it briefly it seems fairly likely to be true. However, at present and for the next couple of millennia the insolation is only slightly over the level where glaciation started at the end of the Holstein Interglacial (the long interglacial referred to above). Which seems to me a very good reason not to experiment with “geo-engineering” to cool the planet.

Ken, what matters is that the recon mean (~PC1) has HS shape. This does not require that all, or even most, of the individual series have HS shape.

Bender, you will have to do better than a generalization. I was looking for an explication/explanation.

The proxy responses show long episodic periods of change that are rather unique to that proxy. Some proxies show little trend over the entire time period: others show a downward trend towards the end of the series. If the proxies are responding to the same long term underlying trend and contain noise and/or localized effects, I would think that that trend would be obvious in individual proxies over a 2000 year period.

Otherwise noise and/or local effects are obliterating the trend, in which case the uncertainty becomes huge and implies that many more proxies would be required to find a signal – if there was one to find.

The second question a scientist would ask (after at least attempting to explain the proxy differences), I would think, is how difficult it is to extract (cherry pick) and combine a number of individual proxy responses to obtain a hockey stick shape, and what the probability would be of that occurring by chance. Such an exercise would require an exposition of all the available proxies while avoiding, at least initially and for this analysis, hand-waving away any of them – or rejecting any without good rationale.
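The chance-selection question can be sketched with a toy Monte Carlo (all numbers and names here are hypothetical, not Kaufman’s data): generate trendless red-noise pseudo-proxies, screen them by correlation against a target “blade”, and average the survivors. The screened composite inherits the blade even though the pool contains no signal.

```python
import numpy as np

rng = np.random.default_rng(1)

def ar1(n, phi=0.7):
    """Red-noise (AR1) pseudo-proxy with no underlying trend."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

n_years, n_candidates = 200, 500          # hypothetical pool size
target = np.linspace(0.0, 1.0, n_years)   # a "blade"-like target shape

pool = np.column_stack([ar1(n_years) for _ in range(n_candidates)])

# "cherry-pick": keep only the pseudo-proxies that happen to correlate
# with the target above an admission threshold
r = np.array([np.corrcoef(pool[:, j], target)[0, 1]
              for j in range(n_candidates)])
picked = pool[:, r > 0.2]

# the composite of the screened noise inherits the target's shape
composite = picked.mean(axis=1)
```

Comparing the screened composite to the unscreened pool average makes the selection effect visible immediately.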

Ken,
RE: bootstrap calibration. If Steve M were to plot curves instead of bars on the OP graphs and then over-plot 95% confidence interval lines, then my point might become a lot clearer. I bet the confidence envelope is extremely wide, suggesting the calibration statistics (and hence recon significance levels) are way over-inflated.

The only rational approach to confidence intervals in these sorts of calibration that I’m aware of is Brown and Sundberg. I know right now what a Brown and Sundberg analysis will show – that the proxies are too inconsistent to yield anything other than a floor-to-ceiling CI.

I am very suspicious of leave one-or-two out bootstraps when the data set has been cherry picked. By now, the Team knows that they have to have a spare in case Yamal (or the bristlecones) is in the penalty box. Remember Mann 2008: he “proved” bristlecones didn’t matter by doing a run with upside-down Tiljander (untruncated) and “proved” Tiljander didn’t “matter” by doing a run with bristlecones.

Hallet Lake BSi looks like it would be a very serviceable spare if Yamal was in the penalty box in a leave-one-out test. But until one can reconcile Hallet Lake BSi to Goat Lake BSi to Cascade Lake BSi and show that there is a reason for preferring one to another, it’s all just cherry-pie.

The only rational approach to confidence intervals in these sorts of calibration that I’m aware of is Brown and Sundberg. I know right now what a Brown and Sundberg analysis will show – that the proxies are too inconsistent to yield anything other than a floor-to-ceiling CI.

Yup.

I am very suspicious of leave one-or-two out bootstraps when the data set has been cherry picked.

Steve – I was not for a second suggesting that you have been shy about the journals nor that readers have helped in the process of moving things forward. Indeed I commend you and the folks here for all your/their efforts and the progress that has been made.

The fact remains that ‘the Team’ enjoys an air of credibility (despite Wegman as well) when their work is published with the ‘peer reviewed’ stamp of authority in these esteemed journals yet so many issues of challenge (that are readily repeated with each publication) can readily be cited by you and others.

Surely it is a challenge that should be squarely laid at the feet of every scientist worthy of such a title – are they willing to defend their standing and call these journals to account?

Or perhaps scientists have been corrupted by money grabbing (funding I believe is the euphemism) as badly as bankers?

If the proxies are responding to the same long term underlying trend and contain noise and/or localized effects, I would think that that trend would be obvious in individual proxies over a 2000 year period.

I understood your question the first time and I already answered. Are you not reading what I write?

The only rational approach to confidence intervals in these sorts of calibration that I’m aware of is Brown and Sundberg. I know right now what a Brown and Sundberg analysis will show – that the proxies are too inconsistent to yield anything other than a floor-to-ceiling CI.

As long as there are a lot more observations than proxies (not true in Kaufman’s case), an optimal univariate proxy index can be found using the covariance matrix of univariate regression residuals of proxies on temperature, so that there is no need to deal with his chi-squared F-based CI’s. In fact, a Brown-like consistency test can be performed directly on the proxies, rather than on their univariate reconstructions as he does. (Again, since Kaufman has more proxies than observations, there is no way to do this without imposing a lot of restrictions on the covariance matrix.)
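A rough sketch of the index construction described here – a simplified classical-calibration estimator in the spirit of Brown’s work; the implementation details are illustrative, not taken from either Brown or Kaufman:

```python
import numpy as np

def gls_calibration(P, T):
    """Optimal univariate proxy index from univariate regression
    residuals, as outlined above.

    P: (n_obs, q) proxies over the calibration period; T: (n_obs,)
    temperature. Requires n_obs comfortably larger than q so the
    residual covariance is invertible -- exactly the condition that
    fails in Kaufman's case.
    """
    n, q = P.shape
    Tc = T - T.mean()
    Pc = P - P.mean(axis=0)
    beta = Pc.T @ Tc / (Tc @ Tc)              # univariate slopes
    resid = Pc - np.outer(Tc, beta)           # regression residuals
    S = resid.T @ resid / (n - 2)             # residual covariance
    Sinv = np.linalg.inv(S)
    w = Sinv @ beta / (beta @ Sinv @ beta)    # optimal index weights
    def estimate(p_new):                      # T-hat for new proxy vector
        return T.mean() + w @ (p_new - P.mean(axis=0))
    return w, estimate
```

The weights are normalized so that w·β = 1, which makes the estimator unbiased when the proxies really do respond linearly to temperature.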

Below I have calculated the trends (change in standard proxy response per decade) for all 23 Kaufman proxies from the start of the series to 1945, or to the proxy’s end date if it ends before 1945. The Kaufman authors claim the end of the gradual cooling in the Arctic was approximately 1950 and, therefore, the proxies, if responding to this gradual cooling due to decreases in summer insolation, should show a somewhat similar decline.

The table is presented below with the proxies from the same general area separated by an intervening space. I also include a map of the spatial dispositions of the proxies from the Kaufman paper.
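The trend computation itself is just an OLS slope over the pre-1945 portion, scaled to units per decade. A minimal sketch (function name is mine; it assumes a simple year/value array, whereas the actual Kaufman proxies are decadally resolved):

```python
import numpy as np

def trend_per_decade(years, values, end=1945):
    """OLS trend in (standardized) proxy units per decade, from the
    start of record through `end` (or the proxy's last value if it
    ends earlier, via the mask)."""
    mask = years <= end
    x, y = years[mask], values[mask]
    slope = np.polyfit(x, y, 1)[0]   # units per year
    return slope * 10                # units per decade
```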

Re: Kenneth Fritsch (#72),
This illustrates my point that the individual proxies are inconsistent. If they’re responding to a common climate signal, the response is so weak that the wild noise (demonic intrusions) dominates. Don’t beg me to explain the intrusions. I can’t. And I’m not interested.
.
The next step is to report the mean pairwise correlation between all series. Then compare this number to the sort of number that dendros find acceptable. I bet the mean correlation here is less than 0.2, whereas dendros require 0.3-0.4 before a calibration can be considered credible enough for a recon.
.
The step after that is to compute the standard deviation for each time point and plot the mean +/- 2sd above and below the mean curve. Yes, the envelope is “finite”. But practically speaking, it will be “ceiling to floor”. I can’t explain the demonic intrusions, but I can show what their presence implies.
.
The step after that is a proper bootstrap calibration. Not a leave-some-out type of test. Resampling.
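The first two diagnostics could be sketched like this (toy code; the 0.3-0.4 bar is the inter-series correlation benchmark mentioned above, and both function names are mine):

```python
import numpy as np

def mean_pairwise_corr(X):
    """Mean of the off-diagonal entries of the proxy correlation
    matrix -- the number to compare against the dendro 0.3-0.4 bar.
    X: (n_time, n_proxy)."""
    R = np.corrcoef(X, rowvar=False)
    q = R.shape[0]
    return (R.sum() - q) / (q * (q - 1))

def envelope(X, k=2.0):
    """Mean curve with +/- k standard deviations across proxies at
    each time point."""
    m = X.mean(axis=1)
    s = X.std(axis=1, ddof=1)
    return m, m - k * s, m + k * s
```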

Re: bender (#73), bootstrap resampling should be better than jackknife leave-one-out. But if I recall correctly these methods are designed to handle fair random samples from an unknown distribution. When, as may be true in this case, the samples are not chosen at random, but with some prior end in mind, then you really only learn about the sampling process, not the underlying system. And that’s much harder to fix.

Re: Jonathan (#74),
There is no way to use statistics to correct for cherry-picking. That is not what I’m proposing. I’m saying that DESPITE the fact that they cherry-picked the proxies, the study is still flawed – because of the inconsistency that Ken is pointing to. To prove it, resample from the proxy confidence envelope, not the actual proxies. Do this 1000 times and generate 1000 calibrations and 1000 recons. Most of them will be junk. The spread among the family of recons will be ceiling to floor such that ANY GCM run would lie within the envelope. i.e. GCM-proxy “match” is not particularly compelling.
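A minimal sketch of that resampling idea (crudely assuming Gaussian noise around each proxy as its “confidence envelope” – a real envelope would come from calibration residuals, and the function name is mine):

```python
import numpy as np

rng = np.random.default_rng(2)

def envelope_resample_recons(proxies, noise_sd, n_runs=1000):
    """Draw pseudo-proxies from each proxy's uncertainty envelope
    (here simply Gaussian with per-proxy noise_sd), recompute the
    composite each time, and return the whole family of recons.
    The spread of the family is the point: wide enough and any GCM
    run lies inside it."""
    n_time, n_proxy = proxies.shape
    recons = np.empty((n_runs, n_time))
    for b in range(n_runs):
        pseudo = proxies + rng.standard_normal(proxies.shape) * noise_sd
        recons[b] = pseudo.mean(axis=1)
    return recons
```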

With reference to the color coded and contoured trend map shown in my post #72, I found in the Kaufman SI that this map was constructed using ERA-40 reanalysis data for the period 1958-2000. I either need to be a scientist or come up with a healthy number of pounds to obtain these data.

In the meantime, I went to the KNMI site to determine the number of GHCN stations that reported temperatures for the 1958-2000 period for the region 60N and poleward. Here is what I found for the Arctic divided into three zonal areas:

80N-90N:
Stations = 1

70N-80N:
Stations = 16

60N-70N:
Stations = 101

These stations are all land occupied. I am puzzled how the map in the Kaufman paper can show such localized detail with the small number of stations that were available – as indicated by the GHCN data. Can anyone out there help with this one? I plan to look at the variations in the temperature trends for the 1958-2000 period for the GHCN individual stations noted above.

I suspected that the only way to obtain the map’s detail from spatially sparse data would be kriging. Are you referring to a high resolution Digital Elevation Model?

I wanted to look at the data used to construct the map because I have a difficult time accepting that the very wide ranging negative to positive and localized trends over 43 years is the proper interpretation of the map – even though it seems straightforward.

As I recall, in pondering the use of ERA-40 data, the 1958-2000 period was the available reanalysis period from that series. Evidently a legitimate scientist can obtain the ERA-40 data series free, while non-scientists have to pay for it. I’ll look at KNMI data in the meantime.

I have a difficult time accepting that the very wide ranging negative to positive and localized trends over 43 years is the proper interpretation of the map

I was going to ask: why?
.
But then I reviewed the map, knowing that you are pretty careful even in your casual observations. And I have to agree. It is interesting that the strongest instrumental trends occur in areas where there are no proxies. The centre of Hudson Bay, for example. It would be good to plot the weather stations on this map to confirm that the coloured blobs are indeed centred on those station locations. Curiously, in the areas where the proxies are located, the instrumental trends are very weak. #1, #17, #22 might be considered exceptions. For a proper calibration you want to examine the areas with the warmest vs. the coolest trends. This obviously was not done. How many of these “proxies” consistently show the correct, opposing pattern when they are chosen from those anomalous little blue patches?

It’s interesting to compare all of these climate papers to the sort seen in medical literature.
I’ve spent many years as a medical rep, so have read hundreds of such papers in my time and discussed them with members of the medical profession.
If I were to present a clinical paper, where the methods used to “prove” the efficacy of a therapy utilised the methods seen to “prove” the hypotheses behind global warming/climate change, I’d be shown the door with great alacrity.
The methods used all appear to be based upon data dredging & post-hoc analysis.
If a pharmaceutical agent were to be subjected to such processes, it wouldn’t get anywhere near being granted a licence.
The medical equivalent of these papers:
“We gave it to x patients and took data from the y patients who responded favourably and ignored the z patients who didn’t respond; we also ignored the side effects any patients suffered from, so this shows that it works and is safe & effective to use.”
The climate equivalent:
“We looked at x proxies and took data from the y proxies that fitted our theory and ignored the z proxies that didn’t; we also ignored the bits of data from these selected proxies that diverge from the theory when compared to known temperatures, so this shows that we’re in an era of unprecedented temperatures.”

Re: Adam Gallon (#75),
There are limitations in climate science that can’t be gotten around, so it is pretty fruitless to compare scientific methods in such disparate fields. Replication, randomization, manipulation are not available in climate science the way they are in lab science. That is why the models are so critical.

bender, I agree that the models are the relevant thing for the “big picture”, but unfortunately a lot of the sales promotion is linked to proxy studies and unprecedentedness. We have too many examples of “bad” results not being reported or not being used: Jacoby’s few good men, Bona Churchill, the Polar Urals update, Indigirka River tree rings,… There seems to be a disincentive in the field to publish “bad” results.

For completeness of the table from my previous post at this thread, the average trend (Kaufman averaging) of the 23 proxies was -0.00038 for the period 1-1945.

Since the map of the proxy sites appears to have reproduced well from my previous post, it can be noted that the trends in the region 60N poleward are very localized and varying over the 43 year period shown. Evidently these local trends vary from approximately -3 to 7 degrees C per century over that period. I could not find a reference for the map depictions and thus feel compelled to look at the KNMI series for some kind of confirmation of this localized climate in the Arctic. I do not see how the 23 proxies, with closely spaced groupings within the 23, could give a representative picture of the Arctic. Perhaps a statistical analysis of this proposition is required.

I also am wondering how many temperature records make up the enclosed iso-temperature areas, that is, how much more localized and varied the climate would appear if more records were available.

Re #73, I looked at the pairwise correlations a few days ago, and they appear to be a complete mish-mash of values, positive, negative, large and very small. What was pretty obvious in plots of the proxies against Year or time (whose scale remains a mystery to me!) is that there are some “observations” that appear to be totally out of touch with the remainder in the series. A case in point is for proxy 23, where two observations call attention to themselves, being outside 3 standard deviations from the sample mean. For P22 the two “outliers” occur at the end of the series. Quite a number of the proxies also have values that in an experimental situation would certainly be deemed worthy of close study of the original notebooks or of being repeated. It is singularly unfortunate that the original data seem to be inaccessible in a case like this. Even clerical (transcription) errors cannot be ruled out.

The data as a whole seem to me to be strange. They give the impression of having undergone some sort of transformation that yields scales with standard deviations fairly close to 1 and means that are also frequently in the range 0 to 1. They are certainly not “standardised” in the usual sense (mean 0, variance 1), and to me this implies that they should not be assembled into row means without due consideration. There has been some discussion here of the underlying distribution(s). Some are remarkably normal, others decidedly not, and quite often (but certainly not exclusively) the values that produce non-normality are concentrated around what I presume is the end of the series.

Another unfortunate aspect of the data is the lack of observations near both ends of the series. I realise that there may be good technical reasons for this, but it certainly puts a considerable weighting factor onto those proxies for which a complete set of values is available.

In common with others here I feel uneasy (to put it mildly) that data like these might be ascribed a huge importance by the politico/academic consortium that is ruling the world as far as climate change is concerned.

But one additional thing the climatologists do, when flipping proxies as Mann08 and Kaufman09 have done with Tiljander, is to switch the “recovered” count with the “died of side effects” count, as required to achieve the desired conclusion.

… but, yes, your interpretation is correct. I expect that is what the data will show. Remember the scale. We are talking -0.3 to +0.5C per decade. Over 40 years that’s only 1-2C. Plot a map of the s.e.’s on those slope estimates. That will put the significance of the colored blobs in context.
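Computing those s.e.’s is routine: for a single station or grid cell it is just the OLS slope standard error. A sketch (ignoring autocorrelation, which would widen the s.e.’s further; the function name is mine):

```python
import numpy as np

def slope_with_se(years, temps):
    """OLS trend (deg C per decade) and its standard error -- for
    putting the coloured trend blobs in context."""
    x = years - years.mean()
    y = temps - temps.mean()
    slope = (x @ y) / (x @ x)
    resid = y - slope * x
    n = len(x)
    se = np.sqrt((resid @ resid) / (n - 2) / (x @ x))
    return slope * 10, se * 10   # per-decade units
```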

My preliminary calculations, using KNMI station data, show that, indeed, the summer temperature trends are significantly smaller than those for the entire year and that the summer months in the Arctic can have negative trends over the entire 1958-2000 period.

These calculations using the KNMI data are slow going. ERA-40 access would require much less effort. I will sample the 60N-70N data and then determine a fit with the map. Currently I have no reason to doubt the map.

Neither do I. What I am questioning is the location of the proxies. For a strong calibration (where you know the proxy really is a proxy) you want your samples to be located in strongly contrasting areas in terms of instrumental temperature trend (whereupon you run the risk of falsely concluding that trends in the proxy that match the instrumental record are temperature-related, but you have the advantage that if there is no match, then you truly don’t have a proxy). If you locate the proxies in areas with no trend you have to hope that annual temperature fluctuations are highly variable and that your proxy responds quickly to temperature.
.
Locating the samples in the red and blue pools maximizes your chance of falsifying the proxy’s validity at the risk of over-estimating its sensitivity. (At this point I think we’re far more concerned that these things just are not temperature proxies than we are about possibly inflated sensitivity coefficients.)

With data in hand for the 1958-2000 period from KNMI, I was able to verify the wide-ranging and localized trends in the Kaufman Arctic, defined as 60N poleward.

After some thought on the matter, it is my view the composite plus scale (CPS) method used by Kaufman et al assumes at the very start (standardizing the proxies from the various Arctic locations) that the proxies are all reacting to the same climate conditions, i.e. assumes the Arctic climate is uniform over the entire area.

I have also thought that the further the actual climate deviates from that assumption, the more uncertainty that deviation adds to the final reconstruction results. I am not aware of a method of calculating that added uncertainty.
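For readers unfamiliar with CPS, a bare-bones sketch of the method as described above (my own simplification, not Kaufman’s code): standardize each proxy, average into a composite, then rescale the composite to the instrumental mean and variance over the calibration period. The uniform-climate assumption enters at the standardize-and-average step.

```python
import numpy as np

def cps(proxies, inst, calib_mask):
    """Composite-plus-scale sketch.

    proxies: (n_time, n_proxy); inst: instrumental series over the
    calibration period (length = calib_mask.sum()); calib_mask: boolean
    over n_time. Treats every proxy as sampling one common 'Arctic'
    signal -- the assumption questioned above.
    """
    Z = (proxies - proxies.mean(axis=0)) / proxies.std(axis=0, ddof=1)
    comp = Z.mean(axis=1)                     # the composite
    c = comp[calib_mask]
    scaled = (comp - c.mean()) / c.std(ddof=1)
    # scale composite to instrumental mean/variance in calibration window
    return scaled * inst.std(ddof=1) + inst.mean()
```

By construction the recon matches the instrumental mean and variance over the calibration window, whatever the proxies actually did.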

Another assumption in K09, I think, is that the summer month (JJA) anomalies are perfectly correlated with the annual anomalies. I suppose for a purely scientific paper this assertion would not be necessary, but the paper’s authors were probably aware of the policy implications, and that requires being able to say something about the annual anomalies.

The K09 authors, keenly aware of the assumptions noted above, attempt to show evidence to support them. They point to the correlations of summer month anomalies to the annual ones for the Arctic area as a whole. In my mind, the correlations should be broken down by some logical subdivisions of the Arctic. To that end I show these regional correlations in the table presented below.
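The correlation itself is simple to compute from monthly anomaly data (a sketch for one station or region; the KNMI retrieval is not shown and the function name is mine):

```python
import numpy as np

def jja_vs_annual_corr(monthly):
    """Correlation of summer (JJA) anomalies with annual anomalies
    for one station/region; monthly is an (n_years, 12) anomaly
    array with columns Jan..Dec."""
    jja = monthly[:, 5:8].mean(axis=1)   # Jun, Jul, Aug
    ann = monthly.mean(axis=1)
    return np.corrcoef(jja, ann)[0, 1]
```

Looping this over regional subsets of stations gives the breakdown argued for above.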

The K09 authors do not address the Arctic local variations in trends other than to show the trend color coded map in their paper with no discussion of the import of that variation to the uncertainty it must add to the reconstruction. Below in the table noted above I added larger area regional trends for the Arctic.

The authors also point to the CPS reconstruction showing an almost straight-line decline for 2000 years and then claim an upward swing in the HS blade for the final 50 years that would correspond with presumed AGW effects. I have shown in previous posts that breakpoints in the reconstruction, and in the individual proxies that contribute to the composite, do not correspond to the authors’ claim for the most recent 50 years. I made a proviso that annually resolved data, unlike the decadal resolution presented in K09, would perhaps make end-point break points easier to detect. With that in mind I found rather complete monthly data in KNMI using CRUTEM 3 data for the period 1898-2008.

I did break point calculations in R, as documented previously, on those data for the spring (MAM), summer (JJA), autumn (SON) and winter (DJF) months and for the combined annual series. The graphs of the series with breakpoints are shown below. Note that while perhaps one or two break points found in most of the series could be construed to fall close to the 50-years-before-present time, the segment slopes are in the wrong directions. The instrumental period does not appear to agree with the K09 thesis, and I believe that Lucia has found this same inconsistency. I personally do not see the K09 conclusion as consistent with their reconstruction results.
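For readers wanting a rough replication without R, a crude single-breakpoint search (a stand-in for the R breakpoint routine referred to above, not the same algorithm) is easy to write: fit separate OLS lines to the two segments at every candidate split and keep the split with the lowest total squared error.

```python
import numpy as np

def best_breakpoint(years, values, min_seg=10):
    """Single-breakpoint search by exhaustive two-segment OLS.
    Returns the year at which the second segment begins."""
    best_year, best_sse = None, np.inf
    for i in range(min_seg, len(years) - min_seg):
        sse = 0.0
        for x, y in ((years[:i], values[:i]), (years[i:], values[i:])):
            coef = np.polyfit(x, y, 1)               # per-segment OLS line
            sse += ((y - np.polyval(coef, x)) ** 2).sum()
        if sse < best_sse:
            best_year, best_sse = years[i], sse
    return best_year
```

On the annual CRUTEM data this kind of search is exactly what makes an end-of-series break near 50 years before present easy to accept or reject.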

In this post I merely wanted to make my points about what I see as the K09 assumptions implicit in using CPS methodology about instrumentally measured temperature anomalies and the added uncertainty when analyzed regionally in the Arctic and not averaged over the entire Arctic as the K09 authors did. The break point inconsistency becomes clearer with the use of instrumental data that are resolved annually.

Re: Kenneth Fritsch (#97),
So there are no temperature upticks occurring at the time(s) the CRU Yamal chronology diverges upward from Schweingruber’s, which are the early 1950s and early 1970s. Interesting. These series are averages for the entire Arctic region. How hard would it be to extract the data for the Yamal region, for comparison?