Karoly and Gergis vs Journal of Climate

On June 10, a few days after the Gergis-Karoly-Neukom error had been identified, I speculated that they would try to re-submit the same results, glossing over the fact that they had changed the methodology from that described in the accepted article. My cynical prediction was that a community unoffended by Gleick or upside-down Mann would not cavil at such conduct.

The emails http://www.climateaudit.info/correspondence/foi/gergis/Part%202a%20Journal%20Correspondence.pdf show that Karoly and Gergis did precisely as predicted, but Journal of Climate editors Chiang and Broccoli didn’t bite. Most surprising perhaps was that Karoly’s initial reaction was agreement with the Climate Audit criticism of ex post correlation screening. However, when Karoly realized that the reconstruction fell apart using the methodology of the accepted article, he was very quick to propose that they abandon the stated methodology and gloss over the changes. In today’s post, I’ll walk through the chronology.

Karoly’s first technical response (June 7 Melbourne) to Neukom’s confession was a surprisingly strong endorsement of the criticism of non-detrended correlation screening, going so far as to agree with me by name:

Thanks for the info on the correlations for the SR reconstructions during the 1911-90 period for detrended and full data. I think that it is much better to use the detrended data for the selection of proxies, as you can then say that you have identified the proxies that are responding to the temperature variations on interannual time scales, ie temp-sensitive proxies, without any influence from the trend over the 20th century. This is very important to be able to rebut the criticism is that you only selected proxies that show a large increase over the 20th century ie a hockey stick .

The same argument applies for the Australasian proxy selection. If the selection is done on the proxies without detrending ie the full proxy records over the 20th century, then records with strong trends will be selected and that will effectively force a hockey stick result. Then Stephen Mcintyre criticism is valid. I think that it is really important to use detrended proxy data for the selection, and then choose proxies that exceed a threshold for correlations over the calibration period for either interannual variability or decadal variability for detrended data. I would be happy for the proxy selection to be based on decadal correlations, rather than interannual correlations, but it needs to be with detrended data, in my opinion. The criticism that the selection process forces a hockey stick result will be valid if the trend is not excluded in the proxy selection step.
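
The distinction Karoly is drawing can be illustrated with a toy calculation (all data synthetic and purely illustrative): a drifting red-noise “proxy” can correlate with a trending target even though it contains no temperature signal, while the detrended correlation exposes the absence of any interannual relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1921, 1991)                 # a 70-year calibration window
# A trending "temperature" target (synthetic)
target = 0.01 * (years - years[0]) + rng.normal(0, 0.15, years.size)

def detrend(x):
    """Remove the least-squares linear trend from a series."""
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# A random-walk "proxy": no temperature signal, just a drifting level
proxy = np.cumsum(rng.normal(0, 0.05, years.size))

r_full = corr(proxy, target)                    # trend included
r_detr = corr(detrend(proxy), detrend(target))  # interannual covariation only
```

Selection on the full correlation tends to admit records whose only qualification is a trend, which is exactly the screening concern Karoly concedes here.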

Neukom replied immediately (8:55 June 7 Melbourne) that he agreed, but warned that peril lay that way, since they had very few proxies that met even such a minimal standard:

I agree, but we don’t have enough strong proxy data with significant correlations after detrending to get a reasonable reconstruction….

Meanwhile, Gergis and Karoly were drafting a notice to the Journal of Climate, with the notice being sent the next day (June 8), describing the error as an “unfortunate data processing error”, explaining that they had used detrending in a related SH paper, but had inadvertently failed to do so in the AUS paper:

When we went to recheck this on Tuesday [June 5], we discovered that the records used in the final analysis were not detrended for proxy selection, making this statement incorrect.

The detrending of proxy records had been done in another paper on Southern Hemisphere temperature variations that we had been writing simultaneously. So we wrongly assumed the same thing had been done in the Australasian paper..[REDACTED] … this was not picked up until now.

Although it was an unfortunate data processing error, it does have implications for the results of the paper. We wish to alert you to this issue before the paper goes into final production.

They asked that the paper be removed from the online section and asked how to proceed:

Please let us know how you’d like us to proceed, be it through a revised or new submission.

The following day (June 9 Melbourne), Journal of Climate editor Chiang gave Gergis some bad news: he had decided to rescind acceptance and asked Gergis to withdraw the paper, inviting her to re-submit afresh once the analysis had been re-done:

After consulting with the Chief Editor, I have decided to rescind acceptance of the paper- you’ll receive an official email from J Climate to this effect as soon as we figure out how it should be properly done. I believe the EOR has already been taken down.

Also, since it appears that you will have to redo the entire analysis (and which may result in different conclusions), I will also be requesting that you withdraw the paper from consideration. Again, you’ll hear officially from J CLimate in due course. I invite you to resubmit once the necessary analyses and changes to the manuscript have been made.

I hope this will be acceptable to you. I regret the situation, but thank you for bringing it to my prompt attention.

On June 11, Gergis forwarded this to Karoly, Phipps and Gallant without comment. Despite this seemingly categorical email from Journal of Climate, Karoly, Gergis and the University of Melbourne publicly maintained that the article was merely “on hold” or “under revision”.

Later in the evening of June 11, Karoly reviewed potential options for his coauthors, ranging from more or less ignoring results using the method of the accepted article (his option 1), to showing both sets of results (option 2), to re-submitting results using the methodology set out in the accepted paper (option 3). By this point, Karoly had moved away from re-submitting using the method of the accepted article, preferring either option 1 or option 2. Karoly forwarded to his coauthors an email from Michael Mann in which Mann accused me of “dishonesty”, adding the following commentary:

Following some email discussions with Mike Mann and helpful discussions with you both last week, there appear to be several different approaches that we can take with revising the Australasian temp recon paper. I am going to go through some of them briefly, and then raise some suggestions for further data analysis that might be needed.

1. Amend the manuscript so that it states the actual way that the proxy selection was done, based on correls that included trends and were significant at the 5% level. The calibration was also done using the full data variations, including trends, over the calibration period. As Mike Mann says below and in the attached papers, this is a common approach. Don’t seriously address the proxy selection for detrended data

2. Revise the manuscript to present results for reconstructions based on both proxy selections for full correls and proxy selections for detrended correls. Expand the paper to show both sets of results and explain why the full correls ‘are better.

3. Redo the analysis for proxy selection based on what the manuscript says, proxy selection based on detrended correls, which gives only about 9 selected proxies and only one prior to 1400. No reliable reconstruction prior to 1400.

4. Redo the analysis based on proxy correlations with local/regional temps at interannual and decadal timescales, not the Australasian area average; select proxies that have strong local temperature signals, then average the proxies to get the area average temperature. This approach is like what Raphi is doing for the SH paper, I think.

My preference is now for 1. or 2. above, and not for 3. Now for some technical questions.
1. Raphi, did you estimate the significance level of the correlations between the target and the individual proxies allowing for the autocorrelation in the proxies and the reduced degrees of freedom? Some of the comments on the CA web site suggest that they can only get sig correlations for the 27 proxies if you assume 70 degrees of freedom, effectively ignoring autocorrelation. Do you have different values for the sig correlations for each proxy, because the autocorrelation is different for each proxy?

2. In a table like the one you provided last week, can you give for each proxy record, for the 1920-1990 period, the correlation, no.of degrees of freedom and sig level for the full data, detrended data and low pass filtered data. This will help us with proxy selection.

3. It is not surprising that there are many fewer significant correlations for the interannual variations and some are even of the opposite sign for the full correlations. The spatial pattern for the temp response to ENSO, which is the main contributor to Aust temp variations at interannual time scales, is not uniform over Australasia, being quite different in NZ or Law Dome than Australia. Ailie or Raphi, can you do a map using the modern temp data for the correlations of interannual variations of gridded temp data with the target, area average Australasian temps? Then redo the map for the full data, including the trend. My guess is that the correlns will be much larger scale for the full data. This will help to explain some of the proxy selection issues for interannual variations.
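
Karoly’s first technical question, significance allowing for autocorrelation and reduced degrees of freedom, can be sketched as follows (a standard AR(1) adjustment of the kind he describes, e.g. the Bretherton et al. 1999 form; this is not necessarily the calculation Neukom actually used):

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a series."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def effective_n(a, b):
    """Effective sample size for the correlation of two autocorrelated
    series (the common AR(1) adjustment, e.g. Bretherton et al. 1999)."""
    r1, r2 = lag1_autocorr(a), lag1_autocorr(b)
    return len(a) * (1 - r1 * r2) / (1 + r1 * r2)

rng = np.random.default_rng(1)
white = rng.normal(size=70)
red = np.empty(70)                 # AR(1) series with phi = 0.7
red[0] = rng.normal()
for i in range(1, 70):
    red[i] = 0.7 * red[i - 1] + rng.normal()

n_white = effective_n(white, white)   # close to the nominal 70
n_red = effective_n(red, red)         # substantially fewer independent values
```

With 70 annual values, a strongly persistent proxy carries far fewer than 70 effectively independent observations, so the critical correlation for 5% significance is correspondingly higher, which is precisely why assuming 70 degrees of freedom flatters the screening.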

Neukom responded immediately, generally agreeing, mentioning that he’d had similar advice from David Frank, a frequent Esper co-author.

On June 12, co-author Phipps opposed re-doing the paper using the stated methodology (option 3) on the grounds that this would not yield a “viable” reconstruction:

Based on the various emails circulated over the past few days, it appears that we will not have a viable millennial-scale reconstruction if we pursue the detrended approach. I therefore feel that we should use the raw data to validate the proxies…

My preference is therefore for David’s Option 2, with Option 1 as my second choice. I dislike Option 3 as it will not leave us with a viable reconstruction. I also dislike Option 4 as it strikes me as essentially starting again from scratch – which seems unnecessary given how far this work has already progressed, and also seems out of proportion to what is only a matter of fixing a technical issue.

Despite the very discouraging email from Journal of Climate editor Chiang on June 9 rescinding acceptance and requesting withdrawal, Karoly told Retraction Watch on June 13 that the paper was merely “on hold”:

The paper has been put on hold, while an issue with the data processing and methods that we have identified is checked. The paper has not been withdrawn nor has it been retracted.

Karoly also told The Australian that their plan was to re-submit the paper using the intended method

A fresh analysis of the data will be done, using the intended method, and the effect on the study conclusions is uncertain.

In the same article, Karoly told the reporter that “the Gergis team had not seen these [Climate Audit] posts before June 5” – a claim refuted by the many references to Climate Audit and the name-and-shame post in the emails among the Karoly-Gergis coauthors.

Despite this public posture, the Karoly coauthors were concurrently trying to persuade the Journal of Climate to either forget or minimize use of the methodology of the accepted article. On June 14 (14:55), Gergis received a reminder email from Hayley Charney, Chiang’s editorial assistant at Journal of Climate, re-transmitting Chiang’s rejection email of June 9, to which Gergis had not responded.

However, instead of withdrawing the paper as Chiang had requested, Gergis argued (June 14) that the error didn’t matter (TM-climate science): that the error was nothing more than words describing the proxy selection method and not flaws in the analysis. (Here, Gergis took a swipe at “amateur climate skeptic bloggers” – though one of the “amateur skeptic bloggers” had been a coauthor on a recent Journal of Climate article edited by Editor-in-Chief Broccoli.) Gergis requested that they be entitled to submit a “revision”, rather than being required to withdraw and re-submit. Gergis argued that they be permitted to more or less disregard the methodology of the accepted article in the revised article itself, instead consigning discussion of results using the methodology of the accepted article to Supplementary Information:

Just to clarify, there was an error in the words describing the proxy selection method and not flaws in the entire analysis as suggested by amateur climate skeptic bloggers.

Over recent days we have been in discussion with colleagues here in Australia and internationally about the use of detrended or non detrended data for proxy selection as both methods are published in the literature .

People have argued that detrending proxy records when reconstructing temperature is in fact undesirable (see two papers attached provided courtesy of Professor Michael Mann) .

While anthropogenic trends may inflate correlation coefficients, this can be dealt with by allowing for autocorrelation when assessing significance. If any linear trends ARE removed when validating individual proxies, then the validation exercise will essentially only confirm the ability of the proxies to reconstruct interannual variations. However, in an exercise of this nature we are also intrinsically interested in reconstructing longer-term trends. It therefore appears to be preferable to retain trends in the data, so that we are also assessing the ability of the proxies to reconstruct this information.

Both approaches have been widely used in the past, and that both are supported in the literature. Thus we believe that either approach is entirely justifiable. In terms of revisions to our paper, we plan to compare the influencing of using detrended and non detrended proxy selection in a supplementary section but it is very unlikely to result in a rewrite of the paper. Instead, there will be correction of the correct method used in the paper and reference to additional supplementary material where appropriate.

Given this paper was originally submitted for review on 3 November 2011 and was extensively reviewed by three expert assessors, my strong preference would be for permission to submit a revision of the original manuscript rather than an entirely new submission. That said, we will of course follow your advice on how best to proceed.

Chief Editor Broccoli, who had been copied on the correspondence, sharply challenged (June 15) the inconsistency between Gergis’ original report that they had inadvertently failed to implement the stated methodology (a “data processing error”) and their present position that they had carried out the analysis as they intended but had merely misdescribed the methodology:

Your latest email to John characterizes the error in your manuscript as one of wording. But this differs from the characterization you made in the email you sent reporting the error. In that email (dated June 7) you described it as “an unfortunate data processing error,” suggesting that you had intended to detrend the data. That would mean that the issue was not with the wording but rather with the execution of the intended methodology.

Would you please explain why your two emails give different impressions of the nature of the error?

Chiang promptly (June 15) added his own commentary, very sensibly observing that they had presumed from the original notice that the authors were going to re-do the analysis to “conform to the description” in the paper and that he had asked them to withdraw the paper because the journal could not assume that the results would remain unchanged:

Both Tony and I read your initial email (dated June 8 for me, I’m in Taipei) to mean that you had intended to detrend during the predictor selection, but that subsequently you had discovered that you had not. Given that you had further stated that “Although it was an unfortunate data processing error, it does have implications for the results of the paper,” we had further took this to mean that you were going to redo the analysis to conform to the description of the proxy selection in the paper.

Assuming this to be true, my reasoning was that since you are likely to use a different subset of proxies in the recalculation, it allows for the possibility of a significantly different result and conclusion. It was on this basis that I requested that you resubmit withdraw the paper (and not because of flaws in the analysis method). I understand that the results may well remain essentially the same after the redo, but this is not something that I can assume to be true .

I hope this clarifies my decision. I’ll wait for your response to Tony’s query before I get back to you on your June 14 email?

Gergis took about 10 days to respond and was unable to give a coherent answer to Broccoli’s sensible question. She argued (June 25) that their original notification (that it was a “data processing error”) was yet another mistake; once again asked that they be able to consign any discussion of results using the methodology of the accepted article to Supplementary Information; and asked that all this be characterized as a mere “revision” (not even a “major revision”, a term of art in Journal of Climate review processes that was used to obstruct O’Donnell et al 2010.)

Sorry for the delay in responding to your emails as I have been on leave over the past week but am now back in regular email contact.

Just to clarify our position:
The message sent on 8 June was a quick response when we realised there was an inconsistency between the proxy selection method described in the paper and actually used. The email was sent in haste as we wanted to alert you to the issue immediately given the paper was being prepared for typesetting.

Now that we have had more time to extensively liaise with colleagues and review the existing research literature on the topic, there are reasons why detrending prior to proxy selection may not be appropriate. The differences between the two methods will be described in the supplementary material, as outlined in my email dated 14 June.

As such, the changes in the manuscript are likely to be small, with details of the alternative proxy selection method outlined in the supplementary material. The careful checking and analysis will take a little time but we expect to submit the revised manuscript for consideration by the journal again before the end of July. Like any other revised paper, we would expect it to be sent for peer review again.

As I mentioned previously, given this paper was originally submitted for review on 3 November 2011 and was extensively reviewed by three expert assessors , our team’s strong preference would be for permission to submit a revision of the original manuscript rather than an entirely new submission. That said, we will of course accept your decision on how best to proceed.

However, Chiang didn’t roll over, though he did give Gergis and coauthors a larger window than they probably deserved. Instead of formally requesting immediate withdrawal (as he had signaled on June 9), Chiang granted Gergis four weeks to submit a revision (a date that would accommodate the IPCC deadline), but stated that this was a “hard deadline” and that failure to meet it would mean “rejection”. Further, Chiang firmly rejected Gergis’ suggestion that the body of the article be allowed to ignore results using the methodology of the accepted article, telling Gergis that the sensitivity of the reconstruction to detrended/non-detrended correlations was an issue that should be “addressed” and that this would be a “good opportunity to demonstrate the robustness of your conclusions”. This latter comment was polite but very pointed; its import would have been unmistakable to Karoly and Gergis.

I’ve discussed your case again with Tony, and have come to a decision regarding the handling of your manuscript.

I will allow the modifications to your manuscript to be accepted as a revision, to be submitted on or before July 27, 2012 (EST) – so a month from today. Upon receipt, the manuscript will be sent out for re-evaluation .

Please note that this is a hard deadline, in order to keep the revision schedule within reasonable limits. If the revision is not submitted by July 27, the paper will be rejected.

In the revision, I strongly recommend that the issue regarding the sensitivity of the climate reconstruction to the choice of proxy selection method (detrend or no detrend) be addressed. My understanding that this is what you plan to do, and this is a good opportunity to demonstrate the robustness of your conclusions.

Our team would be very pleased to submit a revised manuscript on or before the 27 July 2012 for reconsideration by the reviewers. As you have recommended below, we will extensively address proxy selection based on detrended and non detrended data and the influence on the resultant reconstructions.

Postscript.
Gergis and Karoly apparently did not meet the July 27 hard deadline. This was noted in a CA post of August 2, which drew attention to an update on a University of Melbourne webpage. The University webpage continued to say that the article had been “accepted” and that a “revised” version would “likely” be submitted before the end of September. (This language seems at odds with Chiang’s email saying that the article would be rejected if the hard deadline wasn’t met, though it is possible that the arrangements were varied in correspondence subsequent to the tranche presently available.)

The Journal of Climate website was changed to say that the article had been “withdrawn” by the original authors, again contradicting the University of Melbourne webpage.

Meanwhile, the submission to Science by the PAGES 2K Consortium (of which Gergis was a member), which the IPCC Second Order Draft used as a replacement citation for the Gergis reconstruction, cited the Gergis et al article as “under revision”, a status that seems inapplicable once Chiang’s hard deadline had passed.

The article has now apparently been re-submitted to the Journal of Climate. One wonders precisely how Gergis et al will go about “demonstrating the robustness of [their] conclusions” as editor Chiang had asked them to do.

120 Comments

1/”If the selection is done on the proxies without detrending ie the full proxy records over the 20th century, then records with strong trends will be selected and that will effectively force a hockey stick result. Then Stephen Mcintyre criticism is valid”.

2/”Over recent days we have been in discussion with colleagues here in Australia and internationally about the use of detrended or non detrended data for proxy selection as both methods are published in the literature .

People have argued that detrending proxy records when reconstructing temperature is in fact undesirable (see two papers attached provided courtesy of Professor Michael Mann) .

While anthropogenic trends may inflate correlation coefficients, this can be dealt with by allowing for autocorrelation when assessing significance. If any linear trends ARE removed when validating individual proxies, then the validation exercise will essentially only confirm the ability of the proxies to reconstruct interannual variations. However, in an exercise of this nature we are also intrinsically interested in reconstructing longer-term trends. It therefore appears to be preferable to retain trends in the data, so that we are also assessing the ability of the proxies to reconstruct this information.”

Scientific study resubmitted.
An issue has been identified in the processing of the data used in the study, “Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium” by Joelle Gergis, Raphael Neukom, Ailie Gallant, Steven Phipps and David Karoly, accepted for publication in the Journal of Climate.
The manuscript has been re-submitted to the Journal of Climate and is being reviewed again.

I take an opportunity for an OT question since I know the “language police” makes frequent visits here 😉 Should it be spelled “resubmitted” or “re-submitted”, the above uses both?

Since reading Lynne Truss’s nicely written “Eats, Shoots and Leaves” I’ve become much more relaxed about the use of hyphens in English. There are no hard and fast rules – at least that’s how I remember it from Ms Truss. Use the little critters whenever one’s meaning-in-the-sentence is clearer that way. In the UoM case I’d say either way is fine but they should at least have been consistent, as a matter of good style.

How about this: if doing it right (detrending) leaves you with too few series to analyze, then stop. How about don’t do it. Using any old method because it enables you to get some sort of result is just rubbish. The emails clearly show an understanding of the problem of data mining/spurious hockey sticks. Doing it wrong when you know it is wrong…what can I say.

I wonder if the authors will stop trying to get this work published, despite its problems, because they’ve been given the heads-up that the ‘important bits’ of it will make it into AR5 published or not, as seems to be the case?

I think “going to make it into AR5” is too strong, personally. They’re still going to try but this time the email spotlight comes ahead of the final decision, not years afterwards. That’s making it all a tad difficult.

“I think “going to make it into AR5″ is too strong, personally. They’re still going to try but this time the email spotlight comes ahead of the final decision, not years afterwards. That’s making it all a tad difficult.”

The Gergis paper in question is probably no worse than many reconstruction papers that have been and will be included in IPCC reviews, certainly if compared with Mann (2008). If they question the Gergis paper it opens up the whole process to second-guessing past reviews, and since the review process has more to do with a policy position than neutral science, I would not think that is going to happen.

The fallacy of the proxy selection process used for many of these reconstructions is simply not understood by most climate scientists and I would say a number of posters here.

And those are the hard facts of life. The sooner we understand and accept those facts the better we will know that, perhaps at this point in time for nobody’s benefit and intellectual curiosity other than our own, we need to continue to analyze and question these works.

I agree both on Mann(08) being worse than what we currently know of Gergis(12) and on the worth of CA’s critique even if it’s not possible to affect the decisions of AR5 in any way. But bending the IPCC rules is something less technical people can understand. People have read about Casper and the miraculous resurrection of his paper for AR4. At some point the game really will be up. If not now, when? as some renegades said in a tougher spot than this.

True, but this one gives them something they want: southern hemisphere ‘poof’ of the hockey stick. The Team are smart enough to know a lack of such proof is an issue, although they never admit that in public. Now it seems Gergis has given it to them, hence the need for it to be in AR5 one way or another.
And so, although it would be nice to think the IPCC would not try this trick now they know people are looking for it, on past performance would anyone be surprised if they did? After all, no matter how much damning it would get on web sites like this, how much other press coverage do you think this ‘trick’ will actually get?

This is not science, this is politics, and so it’s got very different rules.

Of course they want ‘poof’ (exactly) of a southern hemisphere hockey stick. I’m not saying they won’t try, in fact I’ve said they will try. I’m just saying it’s harder for them this time. It’s not just Climate Audit, it’s the editors at Journal of Climate, it’s The Australian and many other individuals across the world, so that even if Gergis(12) goes into AR5, via PAGES 2k or any other sleight of hand, it will have a significant effect on the credibility of the whole. And that the IPCC could really do without, as governments take a much harder look at the whole area at a time of austerity. Because CA and Jean S moved so fast, and because the University of Melbourne moved so fast in response to FOI, we have a very different situation from AR4. It’s win-win, whichever way they take this now.

Re: Kenneth Fritsch (Oct 30 16:20),
It is illustrative to remind readers about Mann’s answer when the screening issue was raised by M&M in their PNAS comment to Mann et al (2008).

McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates ‘‘hockey sticks’’ is unsupported in peer-reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.

Jean S, thanks for this reminder. Because the topic is undiscussed in climate science academic literature, we cited David Stockwell’s article in an Australian newsletter for geologists (smiling as we did so.)

The topic has been aired in “critical” climate blogs on many occasions, but, as I observed in an earlier post, the inability to appreciate this point seems to be a litmus test of being a real climate scientist.

McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates ‘‘hockey sticks’’ is unsupported in peer-reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.

Ironically this claim by Mann reflects either an unfamiliarity with issues associated with screening regression/validation, or is just blatantly dishonest.

Thanks for the link (should have checked it before writing) but it seems to be dated in May, unless it contains some sort of later update.

[Jean S: They’ve been updating the text few times without a notice. The May date refers to the original date the news item was published.]

Possibly this won’t matter a bit to AR5, since the SOD Figure 5-12 from the PAGES 2k Consortium shows amazing agreement on the thousand-year hockey stick for 8 of the 9 geographic areas (excepting only Antarctica). Assuming the IPCC has any sense of shame left, they might just drop Australasia and still have 7 of 8 properly falling into line.

I believe the Gergis et al material has been subsumed into the PAGES 2k consortium report, which has been submitted to Science and is presently referenced in Chapter 5 of the SOD to AR5. So presuming it gets a friendly review from Science, it won’t matter if they resubmit to Journal of Climate. In fact, it would probably be better for them if they didn’t. (Less unfriendly attention).

This whole issue of selection by (a) not detrending, which looks for proxies that are better thermometers at lower frequencies, and (b) detrending, which looks for proxies that might follow the wiggles and waggles of the higher frequencies of the instrumental record but not necessarily follow the instrumental trend, does not get around the issue that the selection cannot be after the fact; it must rather be based on a rational criterion established before the fact.

The conflict discussed in the email exchanges, between committing to an established selection criterion and taking all the selections that criterion provides (with some option of excluding outliers that can be explained away), versus hunting after the fact, is real. If you have looked at large numbers of candidate temperature proxies, you will notice that some proxies do respond to obvious disruptions in temperatures at higher frequencies, for example a volcanic eruption, or even to the instrumental temperature record. The defenders of using these proxies for temperature will expound: see, the proxy is responding. In fact I have seen Rob Wilson make this very point in a CA post. The problem is that the proxy response is often at different intensities for like proxies in the same locale. Such proxies could give a reasonable high-frequency response and correlation but not act as good thermometers in detecting trends, and it is trends that these reconstructions are primarily concerned with measuring.

It is my view that if one were to establish and test rational criteria for proxy selection, a good proxy thermometer would have to match lower-frequency trends and not be exclusively a higher-frequency wiggle match. If, however, one did not use established criteria for selection but rather hunted for proxies that match the recent warming trend, one quickly runs into the fact that proxies, many of which exhibit autocorrelation and even long-term persistence, can show series-ending trends both upward and downward. Selecting an upward-trending proxy could well then be selecting for random persistence over time and not a temperature response.
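The persistence point is easy to demonstrate. In this hypothetical sketch (the series length, AR coefficient, and window are illustrative, not taken from any reconstruction), roughly half of a population of pure AR(1) noise series end on an upward slope, so trend-based selection will always "find" responders even in data that contains no temperature signal:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1_series(n, phi, rng):
    """Generate an AR(1) series x_t = phi * x_{t-1} + e_t."""
    e = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = e[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + e[i]
    return x

n_series, length, window = 1000, 500, 100
t = np.arange(window)
up_trends = 0
for _ in range(n_series):
    x = ar1_series(length, phi=0.9, rng=rng)
    slope = np.polyfit(t, x[-window:], 1)[0]  # trend of the final "century"
    if slope > 0:
        up_trends += 1

frac_up = up_trends / n_series
# By symmetry about half of the persistent-noise series end on an upward
# slope; screening for upward trends selects persistence, not signal.
print(f"fraction ending with an upward trend: {frac_up:.2f}")
```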

The fact that these finer points are not discussed in the climate science community is telling for me and makes me very wary of their conclusions.

The reproach addressed to screening is the loss of variance outside the calibration period. In itself, screening is not the cause of the lost variance; the use of poor proxies is responsible. Screening causes a phenomenon similar to Mann’s trick, incorporation of instrumental characteristics over a limited period (here, the screening period). We can therefore say that the origin of the hockey stick is not the method of proxy selection but:

1° The poor quality of proxies.
2° The presence of a blade in instrumental data.

There’s something else: the proxies used are probably not all so bad. The problem then is that they are selected not against true temperatures but against bad instrumental data, which may cause the elimination of good proxies in favor of bad series that better follow a false reference.

Even if this was not the exact reason for the Gergis claim to apply the screening on detrended values, that method can effectively avoid this trap. A second precaution should then be taken: eliminating series with no good cross-correlation throughout their duration. And if nothing remains after these selections, then so be it; we at least know that we don’t know.

Selecting an upward trending proxy could well then be selecting for a random persistence over time and not a temperature response.

When looking at the Gergis et al proxies, it appears that selecting on full-data correlation (i.e., no detrending) selects inhomogeneous, non-stationary time series with discontinuities that create an artificial upward trend by matching against the instrumental trend. Of course, any comparable inhomogeneous series in the selection pool with a downward trend would be excluded. This automatically creates the situation which Roman wrote about in his CA article here. Instead of the error term in the regression being unbiased (expected slope value of 0), allowing full expression of the climate signal, it now has an expected positive value which will compress the climate signal in the reconstruction.

Workplace chatter as Phil Jones instructs apprentice Raphi in the delicate art of sausage making and guarding the secret sauce:

Dear Phil,
Thanks a lot!
….
– The issue of the importance of the proxies is indeed complex. I made a lot of tests.
For instance I am aware that CAN 11 first lets the skills drop but over the entire
period the skill is better if I take it. Also, often the REs of the mean and the
fraction of locations with positive REs react in opposite direction by adding/removing
one proxy. The influence of each proxy changes with every new proxy combination. (that
is also the reason why I started now working with ensembles instead of just using one
“optimized” proxy set).
– I’ll make a compilation of my proxy records and send it to you. and I’ll send you the
contacts for the confidential material.
Cheers
Raphi
Ralphi,
I’m not saying you should use these, but they might do better than those in GHCN.
Hopefully it won’t take long to check. I suspect there will be little in it, so you might
be able to use what you already have.
I’m fully aware of how sensitive a PCR program can be to the addition/deletion of one
site!
Cheers
Phil

One must feel somewhat sad for Karoly. From being a well-respected University of Melbourne physicist, he now appears inexorably bound in the Tim Flannery (former “Australian of the Year”) direction, as a figure of mirth.

The Journal of Climate has taken some correct and customary actions, and its editors are to be thanked for them, but they could do more. For example, the peer review seems to have occupied a couple of months and three people near the start of 2012. Yet, in subsequent material, we are told that some of the work was still in progress in June 2012. Is it ethically correct for a reviewer to pass judgement on a paper whose work is unfinished? There is an alternative, which is to publish a shorter paper using only completed data and conclusions, then add to it later as more results come in.

There is also the problem of recalcitrant authors. Some of the cast of thousands are refusing, still, to supply data beyond the calibration period. Others are playing simple chess games about moving authors, dates, data, status (Pages) etc around an imaginary board to (a) thwart people like Steve and (b) presumably, to conceal deficiencies in work that could do with a deeper look.

The Editor of Journal of Climate would be most helpful to the scientific process if he requested SIs with the rejected proxy studies from each of the 30 or so named authors. This was the start of the matter. At one stage there was doubt that the minor authors could be rounded up in less than a month simply to sign their names on a paper to give it standing. It seems to me that those autographs are still needed as part of the process. Does the IPCC accept unsigned documentation?

Yes Betapug – the 2009 email with Phil quoting his seminal 1986 papers to newbie Ralphi has me in stitches. I commented on it back in early August after a reader, Skiphil, alerted me to it at my blog. See comment 16: http://www.warwickhughes.com/blog/?p=1688
Not often does something of such beauty come into view.

I had googled “Karoly & Neukom” in idle curiosity. Seems both Skiphil and I were struck by Phil’s wry awareness of the sensitivity of “the addition/deletion of one site!”.

I liked the tone of the bartering of ingredient information:

“I’m putting together a large UK bid on SH proxies – for Dec 1. Maybe I can
mention this paper and the EU project and NCCR funding?
We won’t hear about that bid till next April and then it wouldn’t start till 2011. The
point I’d like to mention is that you’ve got an excellent dataset together. So the
question is will the raw data be available once your paper is submitted/published?”

“the possibility of a significantly different result and conclusion. It was on this basis that I requested that you resubmit withdraw the paper (and not because of flaws in the analysis method). I understand that the results may well remain essentially the same after the redo, but this is not something that I can assume to be true .”

So if the results do not show a hockey stick, the editors are not interested?

No, it will show that the result is not robust to the issue of detrending. This was a question from editor Broccoli and would likely be a factor in the decision on whether or not to accept the paper. I presume that, at the very least, this issue will have to be discussed in any published paper, with reasons why one result was preferred over another which gave a different result.

This is a major put-down by the editor, which informs the authors that their suggestions on this issue are not acceptable and would result in the paper not being accepted for publication. It is a curt and abrupt dismissal of the authors by the editor.

Results that are not significant are just as important, but the paper would be very short, especially in the results/discussion sections, and would be of more use next time, or to a new group of researchers who get involved. But most of all it would be non-JoC material: although it’s “important” in a Feynman-like way, it simply isn’t interesting enough for a “top end” journal, and it would be non-newsworthy. And when you’ve already created such a fuss….

That’s why the authors are happy to forget about the basic tenet of science…

My guess is the conclusions of the original paper are dead in the water, and with it the JoC submission. The JoC editors summed it up nicely: “My understanding that this is what you plan to do, and this is a good opportunity to demonstrate the robustness of your conclusions.”

In other words, the editors already know the robustness is the key issue now. The authors were obviously never taught how to do a literature review. Just because Mike Mann did it, doesn’t make it an acceptable method nowadays.

Even if they had never mentioned the infamous detrended data, the JoC wouldn’t have accepted it after review.

The subtext throughout the Karoly-Gergis coauthor’s response seems clear enough. They know their conclusions are incontrovertible. They are just having some minor technical problems in finding the evidence and methods which will support them.

“…I also see the point that the selection process forces a hockey stick result but:
– We also performed the reconstruction using noise proxies with the same AR1 properties as the real proxies.
– And these are of course resulting in a noise-hockey stick. But they are not able to reconstruct the full amount of 20th century warming and basically loose all interannual variability (and decadal before the calibration period). (attached figure, solid is proxy reconstruction, dashed is noise reconstruction, dotted instrumental)
– The noise recons have no skill (negative REs all the way through; second plot attached).
So it is truly easy to reconstruct a hockey stick with our screening but not one with reasonable variability back in time. And the REs show that we can get some skill also at interannual timescales with our proxies (and not with noise), also evident by the correlation of 0.75 of our reconstruction with the target after detrending.
I can also run a reconstruction using the proxies that were excluded. This reconstruction will most probably also show a hockey stick, but again bad skill. This will show that the hockey stick does not depend on the proxy screening…”

—————————————————-

Is this working? Justifying selection after selection with “skill”, i.e. improved correlation in the pre-calibration period? I think not.

Assume, in a thought experiment, that for simplicity temperatures had two maxima of the same size in the past, one a thousand years ago and one around the present.

The proxies would be of varying quality: some with good correlation with temperature, some correlating only for part of the time, and some not at all.

To answer the question of whether present temperatures are unprecedented, an average over all proxies may already give a noisy but correct answer, if the poor-quality proxies cancel like noise. The skill, however, is “bad” in Neukom’s terms.

What happens after screening with present-day measured temperature data?

Proxies that may have been good in previous times but not in present time are excluded.
Proxies that have been good in present time but failed to show the maximum a thousand years ago are kept.
As a consequence, the present maximum may then be enhanced, while the maximum a thousand years ago is flattened. A hockey stick is generated.
The overall correlation value, however, may be increased, because a subset was chosen that showed temperature correlation at least during some time.

Thus, for deciding if present temperatures are unprecedented, the first reconstruction with “bad” skill MAY give the right answer, the reconstruction with “good” skill WILL give the wrong answer.
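The thought experiment above can be simulated. In this hedged sketch (the proxy counts, noise levels, and bump positions are all invented for illustration), screening on calibration-period correlation keeps the proxies that are "good only recently" and discards the noisy-but-honest ones, flattening the earlier maximum exactly as described:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
t = np.arange(n)

def bump(center):
    return np.exp(-0.5 * ((t - center) / 15.0) ** 2)

# True temperature: two equal maxima, one "medieval", one modern
signal = bump(100) + bump(470)

proxies = []
for _ in range(200):                       # good throughout, but very noisy
    proxies.append(signal + 2.0 * rng.standard_normal(n))
for _ in range(200):                       # responds only in the modern era
    p = rng.standard_normal(n)
    p[400:] = signal[400:] + 0.3 * rng.standard_normal(100)
    proxies.append(p)
proxies = np.array(proxies)

cal = slice(400, 500)                      # "instrumental" screening period
corr = np.array([np.corrcoef(p[cal], signal[cal])[0, 1] for p in proxies])
kept = proxies[corr > 0.5]                 # the screened network

recon_all = proxies.mean(axis=0)           # simple average of every proxy
recon_scr = kept.mean(axis=0)              # average of the screened network

def past_vs_modern(recon):
    # Ratio of the reconstructed past maximum to the modern maximum
    return recon[80:120].max() / recon[450:500].max()

print(f"{len(kept)}/{len(proxies)} proxies pass screening")
print(f"past/modern maximum, all proxies: {past_vs_modern(recon_all):.2f}")
print(f"past/modern maximum, screened:    {past_vs_modern(recon_scr):.2f}")
```

In this construction the unscreened average recovers a clear (if attenuated) past maximum, while the screened average shows almost none of it: the "good skill" reconstruction gives the wrong answer to the unprecedentedness question, as argued above.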

At the risk of appearing naive, what is the likely meaning of the term “attaching dishonesty email” as used by Karoly in the sentence:

“Following some email discussions with Mike Mann [attaching dishonesty email] and helpful discussions with you both last week, there appear to be several different approaches that we can take with revising the Australasian temp recon paper.”

DGH, I don’t see what’s supposed to be the problem with that as a general statement. If you find you accidentally didn’t implement the methodology you claimed to use (it happens), it seems reasonable to say you’re uncertain of the effect fixing your mistake will have.

Somehow these “scientists” don’t believe that a negative or inconclusive outcome is a valid result. It is a perfectly valid result. Most experimental designs aren’t well constructed and should give inconclusive results. Other experimental designs are so poorly constructed that they give spurious results. Only a select few should be so good as to give a definitely positive result.

In this case, are we to scrap the bad for the even worse? I, for one, would welcome more journals publishing inconclusive results to show that more about the world is unknown than “science” would suggest. This, however, might shatter the pretense of certainty that certain “scientists” would like to construct.

ObtuseFaction, it’s certainly true most experimental designs “should give inconclusive results” (I’m not sure I’d agree about them being poorly constructed). However, I don’t think that means we should necessarily publish inconclusive results.

Journals are usually about the advancement of knowledge. Saying, “We know nothing new” does not contribute to that advancement. As such, we should not expect most experiments to produce publishable results.

Traditionally, in brief, a scientist sets up a hypothesis for testing, describes how it is to be tested, mentions identified difficulties, then proceeds to evaluate if the hypothesis is supported or not.
The scientist does not say “Let’s poke this with a stick again because we got mixed up the first time around, and if it still looks presentable then we’ll publish it as a revision.”
The Scientific Method has a logical approach.

The result with proxies which pass detrended screening would not be inconclusive in general; one could form an estimate of Australasian temperatures which goes back perhaps a few centuries. But it would not be able to reconstruct 500 to 1000 years ago, as the original study claimed to do.

To my mind, this is merely a logical consequence of the paucity of proxies. Even the proxies which are available are drawn from a very limited geographic area, hence the uncertainty range should be quite large.

HaroldW, are you sure one could form any sort of valid temperature reconstruction given their stated methodology? They had little enough data when they used the wrong methodology. If only ten or so series passed screening with the “right” methodology, I’m not sure anything could really be concluded.

That said, if they could get some sort of useful reconstruction, I’d certainly have no problem with them publishing it. I don’t think they need to go back 1000+ years to be able to publish.

John Andrews, it isn’t really a publishable result to say, “We looked at some data, and we couldn’t find anything because there was too much noise.” It’s not that the authors couldn’t find a hockey stick. It’s that they couldn’t find anything. As far as their method is concerned, there is no signal in the data, so there is nothing to talk about. Nobody would expect that to be publishable.

Now then, if one examined data other people claimed to find a hockey stick in and showed you cannot find a hockey stick without cherry-picking/using biased methodologies/etc., that should be publishable (emphasis on should).

But who wants articles from authors saying they looked at data and found nothing?

MikeN, only one of their proxies going back prior to 1400 would pass screening if they had implemented the procedure they described. As far as their methodology is concerned, most of those series are not temperature proxies. As such, they don’t have the data to make a reconstruction.

The fact their methodology cannot find any temperature signal for the past does not mean they should announce their methodology fails to find a particular signal for the past. The “no hockey stick” you refer to could just as easily be described as finding no MWP. It’s a meaningless statement.

When you have no information for a period, you cannot draw conclusions about that period. All you can do is say, “I don’t know.” It is difficult for that finding to merit a paper.

(That said, I would love to see what would happen if people used unbiased methodologies to examine all the proxy data used in reconstructions so far. I suspect the result would be that no conclusions could be drawn about the temperature differential between current times and the MWP.)

This is the sort of attitude that would have rejected the Michelson-Morley experiment, “the greatest null result in the history of science” on the grounds that it failed to find something and therefore should not have been reported.

Macumazan, I’m not sure whose attitude you’re referring to, but this is a simple matter of null hypotheses. If our null hypothesis is that the data show nothing, a failure to find anything means little. On the other hand, if our null hypothesis is that the data will show something, such as in the Michelson-Morley experiment, a failure to find anything may mean something.

As a general rule, a lack of results isn’t worth publishing. This is just a general rule though. Since a lack of results may sometimes be surprising, that lack may sometimes be worth publishing. If you want to consider those exceptions, you must use a rule which discusses null hypotheses.

The fact somebody states a simple, general rule does not mean they are unaware of or unwilling to accept exceptions.

Brandon, in some fields of science it’s quite common to publish a lack of response to a scenario, to save future workers from re-inventing the wheel. For example, I have written or helped write many reports on exploration of mineralised areas where our testing showed the minerals to be uneconomic to recover, merely anomalous. So the report did indeed read like “We looked at some data, and we couldn’t find anything because there was too much noise”. Probably the majority of our reports read this way, because economic ore deposits are but a few among the many anomalies examined. Irrespective of the type of statistical work we did to arrive at this conclusion, it was still well worth the effort to note the result, because it was very expensive to derive and we had to be darned sure we were right. Indeed, the difference between profit and bankruptcy sometimes rested on the confident use of statistics. That was one of several measures of excellence.
Besides, in many cases, the regulations under which we worked required us to do just that.
Further besides, there is no win when one merely starts to mine a piece of ground at random in the hope that some minerals will be discovered. In climate work, there is no win in supposing that a property is an indicator of temperature just because you hope that it will show a response after the sausage machine has marinated some inputs.

Detrending does help a bit for establishing correlations, but a much better method is to use the rate of change for all variables. If relationships are real, then they will show up as correlations in the rates of change. This depends much less on the general shape of the curve (detrending removes only its first component) and much more on the specific fluctuations. Do bear in mind that correlations will be lower with rates of change, but the results are more meaningful. For annual data, the individual years will also then be pretty much independent data points, unlike detrended data, where strong year-to-year correlations make the correlations found rather spurious.
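A minimal illustration of this suggestion (synthetic data; the trend and noise levels are invented for the sketch): two series that share only a trend correlate strongly in levels but show essentially no correlation in their rates of change:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
t = np.arange(n)

trend = 0.015 * t
temperature = trend + 0.3 * rng.standard_normal(n)
proxy = trend + 0.3 * rng.standard_normal(n)  # shares only the trend

r_levels = np.corrcoef(proxy, temperature)[0, 1]
# First differences (year-to-year rate of change) strip the shared trend
r_rates = np.corrcoef(np.diff(proxy), np.diff(temperature))[0, 1]
print(f"correlation of levels:          {r_levels:.2f}")
print(f"correlation of rates of change: {r_rates:.2f}")
```

The level correlation here is driven entirely by the common trend; the rate-of-change correlation, which is what would indicate a genuine year-to-year relationship, is near zero.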

Yes, splitting into frequency ranges is worth doing. However maximizing the r2 is not the objective IMO. Using longer time period smoothing reduces the significance of r2 because effectively n (no. of points in data) has been reduced. Time period of changes should represent the period in which cause and effect can be expected to be linked on reasonable grounds.

Don’t agree a lack of results isn’t worth publishing. Both the protocol and the results (lack) are of interest to anyone researching the field. At the very least, showing an approach to avoid. At most, offering a surprise if there should have been results.

If they can’t reliably go back beyond 400 years then they should publish for those 400 years. This is still useful information for the literature. The data aren’t always as amenable as we would like them to be. They should accept this and not go for a longer reconstruction at the expense of forcing a specific shape of the reconstruction thereby making the result meaningless.

It is not my view that detrended calibration is “right” either. Jeff Id and I are in 100% agreement that you either take all proxies from ex ante class or none of them.

Detrended calibration serves as a test of the robustness of the proxies. As editor Chiang observed, comparing the two results offers Gergis and Karoly a chance to show the “robustness” of their method. My guess is that their resubmission will show the non-robustness of their method and that they will argue that the non-robustness doesn’t “matter” (TM-climate science) and that the version that they like is “right” because it has a better RE statistic.

Then use a simple average, right? This isn’t really complicated, even for the rank outsider. You don’t cherry-pick. You use all proxies that you judge for physical reasons, before you start, may correlate with temperature. And let the chips fall where they may.

Simple average over proxies identified as responsive. That is: if you think “granny smith apple trees growing on the south facing slope of hills in Conn.” are proxies, you use all “granny smith apple trees growing on the south facing slope of hills in Conn.” What you don’t do is take data from 100 qualifying “granny smith apple trees…”, compute the correlation coefficient with the thermometer records, and decree the trees with the best correlation “responsive” and those with the worst “unresponsive”. The latter will result in loss of variance outside the thermometer record.

In the case where the thermometer record looks as it does “loss of variance” translates into “hockey stick with fairly straight shaft.”
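The variance-loss mechanism can be sketched with pure red noise screened against a trending "instrumental" segment. None of these series contains any temperature information, yet the screened average acquires a blade over the calibration period and a flat shaft before it (a hedged sketch; the AR coefficient, threshold, and counts are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n_proxies, n_years, cal = 1000, 1000, 100
instrumental = np.linspace(0.0, 1.0, cal)    # trending calibration target

# Pure AR(1) red noise: no temperature signal anywhere
phi = 0.7
shocks = rng.standard_normal((n_proxies, n_years))
proxies = np.empty_like(shocks)
proxies[:, 0] = shocks[:, 0]
for yr in range(1, n_years):
    proxies[:, yr] = phi * proxies[:, yr - 1] + shocks[:, yr]

# Screen on full (non-detrended) correlation over the calibration window
corr = np.array([np.corrcoef(p[-cal:], instrumental)[0, 1] for p in proxies])
kept = proxies[corr > 0.3]                   # the "responsive" series
recon = kept.mean(axis=0)

shaft_sd = recon[:-cal].std()                # pre-calibration variance collapses
blade = recon[-10:].mean() - recon[-cal:-cal + 10].mean()
print(f"{len(kept)} of {n_proxies} noise series pass screening")
print(f"shaft std dev: {shaft_sd:.2f}; blade rise: {blade:.2f}")
```

The independent noise averages toward zero outside the screening window (the straight shaft), while the selected-for trends reinforce inside it (the blade), which is the "hockey stick with fairly straight shaft" described above.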

How will the ex ante class be chosen, then? Isn’t the purpose of screening to create an ex ante class that is valid? If it is not known whether the proposed proxies are really “proxies” or not, would not the inclusion of extraneous proxies be counterproductive?

This sort of problem is familiar in econometrics – see the literature on data snooping. Clearly one has to do studies to choose the class of proxies (be it white spruce ring widths at treeline or delD in ice cores), but once you choose the class of proxy, you should take all of them. If you conclude that there is a confounding factor that should have been taken into consideration in your ex ante decision, then one can posit a new ex ante criterion, but you then need fresh data to prove the point.

For example, Mt Logan dO18 goes down in the 20th century and is screened out in typical “climate science” reconstructions. Its going the wrong way is said to be due to “regional” variation. But once you admit regional variation, who’s to say that a greater increase in some other location (e.g. Thompson’s Dasuopu) isn’t due to regional variation going the “right” way. The screening gives a HS bias to the composite.

This seems completely obvious to me, though it seems to baffle the climate community.

“Isn’t the purpose of screening to create an ex ante class that is valid”

Yes, but ‘valid’ for physical reasons and not because *some* of the wiggles match temps during some calibration period. Some of those wiggles will match just by chance. This does not make it ‘valid’ in the non-calibration period.

“would not the inclusion of extraneous proxies be counterproductive”

No. That information would be useful in determining if the class is valid (on physical grounds).


Here’s my two cents on this:

I’ve had experience with the issue of screening in other areas of science. We developed a screening protocol from a population sample, but none of the population sample that led to that screening protocol were used in the final statistical analysis (prevalence measures in this case).

As we progressed, we identified issues, and we accordingly changed the protocol. As we did, we labeled these as population samples I, II and III. As you said, you should always collect new data if you think the change is important; however, you should never apply the new criteria to an existing population sample, otherwise a potential idiosyncratic bias gets introduced (non-modelable biases), and as such it has to be avoided.

Data from population sample I should never be included in sample II, and so forth. Indeed, one thing you look at is whether going from sample I to sample II results in a statistically significant shift in quantities of interest (e.g., prevalence).

In my opinion, this is the only right way to do screening. The sorts of deviations from this practice that are often practiced in paleoclimate lead to uncharacterizable errors. If you can’t characterize the true uncertainty, as far as I’ve always been taught, the measurement is worthless.

1) The difference between trended and detrended calibration: what does it say about calibration? Does it show calibration to be a robust process, or does it indicate that calibration is not robust and is sensitive to methodology?

2) A hockey stick only shows up with trended calibration; without trended calibration there is no hockey stick. Thus the hockey stick may be a result of trended calibration, not of climate change.

However, both these findings would appear to be at odds with what the authors of the paper believed they would find when they set out to do the study, so they are unlikely to recognize the significance of what they have actually found.

What Karoly and Gergis have found is very significant as a study in statistical methods as applied to climatology. However, this significance is lost to them because it wasn’t what they were looking for.

In effect they have set out to find gold. Instead they have found platinum. Since they were looking for gold, they are trying to find a way to say what they found is “goldish”.

I gather that Gergis et al will not soon be publishing a paper that documents what the data really show. Results that “fail to support” or “appear to contradict” the original hypothesis are perhaps even more useful to science than those which merely support a hypothesis.

If no publication occurs, a non-scientific question: how long before they have to return the ARC funding consumed in that scientific research?

Many of your posts are valuable long after the date of the initial post, even years later. I appreciate that all your posts include the year in the header (some websites still don’t do that, unfortunately). However, in the body of the post, at least once or twice at the beginning, it would be very helpful to the reader — especially the reader a year or two from now — to see what year this was all happening in. For example, the first reference could be “June 10, 2012” rather than just “June 10.”

Apologies again for raising a minor nit. Kudos for the excellent work in tracking all this down and putting it together in a way that all can see what is happening. Extremely valuable.

The PAGES 2K Consortium article was submitted to Sciencemag on July 30, 2012, one day before the IPCC deadline. According to the available emails from Journal of Climate, Gergis et al 2012 would have been rejected on July 27 (unless there is a subsequent waiver that we don’t know about.)

Nonetheless, its status was falsely described by the PAGES 2K Consortium (of which Gergis was a member) as being “under revision”.

I haven’t noticed this discussed anywhere. Some interesting slides in this doc…. What seems to be a seminar presentation by Phipps says that there is a special Journal of Climate issue devoted to PAGES Aus2k, due to appear in 2013.

When building regressive models, one should consider what it is that is trying to be established. I suggest that we are interested in future rate of change of temperature. Therefore rate of change of temperature is the best thing to use in models, and both raw data and detrended data have problems as noted.

Phipps CV revised as of Nov. 19, 2012. Still showing the Gergis et al (2012) as “in review” which is no surprise since it was only re-submitted recently. Anyone know anything of the item [23] listed below with Phipps as lead author? That one may well bear close scrutiny when it’s available:

I had not seen this before….
Gergis has a personal website (quite different from her blog which was ‘disappeared’ last spring when the controversy began). Gergis et al (2012) is listed as “resubmitted September 2012, in review” on the publications page:

Climate scientists and U. of Melbourne news office take no responsibility for correcting their misinformation.

This (below) is still on the web along with various other “news” items scattered about from last May, as of now uncorrected.

I searched the ScienceDaily site, and no other item with updates or corrections seems to have been published.

This is interesting because the article is simply a regurgitation of the U. of Melbourne press release, as noted at the end of article.

So there is no responsibility taken by either the co-authors or the U. of Melbourne for correcting all the misinformation that THEY sent out and which was published according to their overhype in May 2012. They think it is enough to merely add a “re-submitted” note and leave all the propaganda uncorrected. Very revealing…..

Science News

1,000 Years of Climate Data Confirms Australia’s Warming

ScienceDaily (May 16, 2012) — In the first study of its kind in Australasia, scientists have used 27 natural climate records to create the first large-scale temperature reconstruction for the region over the last 1000 years….

The paper will be presented at the “PAGES Goa 2013” meeting (yes, Goa, India) in mid-February 2013. As the deadline for abstracts was September 8th, I think the abstract found here reflects the resubmitted paper.

Comparing the Goa abstract to the abstract of the original manuscript reveals only very minor changes. They seem to be still using screening (“temperature sensitive palaeoclimate network”), but neither the screening method nor the number of proxies in the new network is given in the abstract. However, they still claim a “skilful reconstruction” for the whole period, so I think we can safely exclude 3. from Karoly’s list of options (anyone surprised?). Since the “explained inter-annual variance” has changed (69% -> 59%), I think 1. (and also 2.) can also be ruled out. Which only leaves 4 … maybe they adapted some variant of Mann et al (2008) opportunistic local screening? Anyhow, it will be interesting to see if they even acknowledge the possibility of the screening fallacy.

Is it “interesting” that Eric Steig of RC is now listed as a co-author?? I had not seen his name associated with this paper or with the SH group of the PAGES project before, although of course he does work on Antarctica…. would be interesting to know when and why he was brought into this group. Perhaps doing the dirty work as a cut-out for Michael Mann and the RC team…..

There’s that favorite word “unprecedented” — this time for their vast “network” of SH proxies.

btw what are the “300 sites” mentioned… do they really have so many or are they dividing previous “proxy” locations into multiple “sites”??

ahhh, I was commenting upon a different abstract for another SH paper, which involved some of the same co-authors, but is not the one referenced on this thread…. that abstract appears further down in the PDF linked by Jean S.

Since Skiphil’s last update (Jun 29, 2013), Phipps seems to have updated his publication list a couple of times. Gergis et al. is nowhere to be found. Gergis, on the other hand, does not seem to have updated any of her publication lists. I have the feeling that the paper is gone for good. 🙂

Steve: I wonder what Gergis thinks about Mann using NH proxies to reconstruct SH temperature.

[…] Australia’s Joelle Gergis (another in a long line of activist scientists) and her colleagues did their best to resurrect Michael Mann’s notorious hockey-stick. The madness in their methodology evidently failed them. […]