McShane and Wyner Discussion

McShane and Wyner, previewed in August, has now been published by the Annals of Applied Statistics as a Discussion Paper here, with an accompanying editorial by Michael Stein and discussions by 13 different groups (one of which is a short comment by Mc and Mc), together with an excellent Rejoinder.

BTW, using Firefox I wasn’t able to open the papers by clicking, but I was able to download them.

CA readers are well aware of my own view that the fundamental problem in paleoclimate is not the need for some novel multivariate method, but better proxies and reconciliation of discordant existing “proxies”. CA readers are also aware that Team reconstructions use highly stereotyped proxies over and over again in different guises – bristlecones, Yamal – sort of an ongoing version of the Dead Parrot skit in Monty Python. McShane and Wyner used the Mann et al 2008 data set, which quixotically introduced the Tiljander sediments, the modern portion of which was contaminated with bridge-building sediments.

Given the central role of these specific proxies in the target data set, I checked the various discussions to see if anyone mentioned either bristlecones or upside-down Tiljander. Given the academic-ness and non-engineering-ness of the discussion, these fundamental issues of data quality are, needless to say, not noticed by the new entrants to the discussion.

Berliner of Hu McCulloch’s Ohio State, in a short comment, says sensible things about the poor quality of the proxies without specifically attending to nuances like bristlecones or upside-down Tiljander. My guess is that he would further roll his eyes if he were aware that this sort of thing is so deeply embedded in the field.

Other than a brief mention by Ross and me, the only discussants to mention bristlecones and Tiljander were Schmidt, Mann and Rutherford.
Schmidt et al analyse a subset of Mann et al 2008 proxies which excluded the Tiljander proxies (which they coyly described only as “potentially contaminated” – ignoring both the clear original statements and subsequent clarifications by Mia Tiljander that the modern portion was contaminated). However, this subset includes multiple Graybill bristlecone series – data sets well known to have been selected by Graybill for strip bark. The NAS 2006 panel had recommended that these data be “avoided” in reconstructions; Mann et al 2008 said that they were adhering to the NAS 2006 recommendations, but used bristlecone chronologies anyway. CA readers are well aware of the pea-under-the-thimble character of the Mann et al 2008 sensitivity analyses related to Tiljander and bristlecones – they purported to show that bristlecones didn’t “matter” – in their highly publicized nodendro reconstruction – by using upside-down contaminated Tiljander sediments, and then to show that the upside-down sediments didn’t “matter” by using bristlecones. In a grey supplementary information to a different article (Mann et al 2009), key conclusions about the nodendro reconstruction were quietly withdrawn (without any notice being attached to the original article).

Schmidt et al further disseminate the disinformation that Mann et al 2008 performed a meaningful sensitivity test on the impact of bristlecones and upside-down Tiljander – remarkably failing to cite the partial withdrawal of these results in Mann et al 2009.

Salzer et al 2009, referred to here, notably did not cite Ababneh’s discordant results on Sheep Mountain bristlecones, where she was unable to replicate Graybill’s results. The failure to reconcile Ababneh’s results has been well known to CA readers for a long time and it is bizarre that people attempt to reconstruct past climate using data sets where conflicting results are simply ignored, rather than reconciled.

In a response reminiscent of Wegman’s imperious dismissal of after-the-fact changes to MBH methodology by Wahl and Ammann to “get” the desired result (see Wahl and Ammann, Texas sharpshooters), McShane and Wyner dismiss Schmidt et al’s post hoc ad hoc editing of the data set:

The process by which the complete set of 95/93 proxies is reduced to 59/57/55 is only suggestively described in an online supplement to Mann et al. (2008). As statisticians we can only be skeptical of such improvisation, especially since the instrumental calibration period contains very few independent degrees of freedom. Consequently, the application of ad hoc methods to screen and exclude data increases model uncertainty in ways that are unmeasurable and uncorrectable.

Yup.

McShane and Wyner comment that there is an error in Schmidt et al Figure 2, which they were able to diagnose precisely through examination of online code. They archly observe that the error arose through “improper centering”. Plus ça change.

Before proceeding, however, we must note a troubling problem with SMR Figure 2. Visual inspection of the plots reveals an errant feature: OLS methods appear to have non-zero average residual in-sample! Upon examining the code SMR did provide, we confirmed that this is indeed the case and discovered the models were fit incorrectly. The culprit, ironically, is an improper centering of the fitted values.
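As a sanity check on that claim, here is a minimal numpy sketch (synthetic data of my own, nothing to do with SMR’s actual figure) of why an OLS fit with an intercept must have zero-mean residuals in-sample, and how re-anchoring the fitted values to a different baseline breaks that:

```python
import numpy as np

# Synthetic calibration data: a trend plus a deterministic wiggle
x = np.linspace(0.0, 1.0, 100)
y = 2.0 + 1.5 * x + 0.1 * np.sin(7.0 * x)

# OLS with an intercept: the normal equations force the in-sample
# residuals to average exactly zero
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
print(abs((y - fitted).mean()))   # essentially zero (machine precision)

# "Improper centering": re-anchoring the fitted values to the mean of
# an arbitrary sub-period, as one might when plotting against a
# reference baseline, shifts them off the data
rebased = fitted - fitted[:30].mean() + y.mean()
print(abs((y - rebased).mean()))  # visibly nonzero
```

Whatever SMR actually did in their script, a nonzero average in-sample residual is exactly the symptom this kind of shift produces.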

118 Comments

I accessed the paper, the editorial, the discussion papers and the rejoinder through Anthony’s site. Would it be impolite to request that this thread not become a polemic for the last sentence of the editorial?

What a joy it is to have real statisticians involved in the debate! Behold:

we observe that the simulations appear to have smoother NH temperatures than the real data. This is confirmed by the much smoother decay in the autocorrelation function. Furthermore, the partial autocorrelations appear to die out by lag two for the simulations whereas for the real data they extend to lag four

the result about the lag one correlation coefficient confirms Smerdon and Kaplan (2007)’s observation that the “colored noise models used in pseudoproxy experiments may not fully mimic the nonlinear, multivariate, nonstationary characteristics of noise in many proxy series”
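The diagnostic they are using is easy to reproduce. A small Python sketch (simulated series, nothing to do with the actual pseudoproxies): a persistent AR(1) process stands in for a “smooth” simulation, and its autocorrelation decays slowly where white noise’s does not:

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of a series at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(1)
noise = rng.normal(size=2000)          # white noise: ACF near zero at all lags

# Persistent AR(1): a stand-in for a "smooth" simulated temperature series
smooth = np.empty_like(noise)
smooth[0] = noise[0]
for t in range(1, len(noise)):
    smooth[t] = 0.9 * smooth[t - 1] + noise[t]

print("white noise ACF:", [round(acf(noise, k), 2) for k in (1, 2, 3)])
print("AR(1) ACF:      ", [round(acf(smooth, k), 2) for k in (1, 2, 3)])
```

Comparing decay rates like this is precisely how one checks whether simulated series are “smoother” than the real data.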

I love the use of the term “snooped” in reference to data selection in the conclusions:

“Second, we take the data as given and do not account for uncertainties, errors, and biases in selection, processing, in-filling, and smoothing of the data as well as the possibility that the data has been “snooped” (subconsciously or otherwise) based on key features of the first and last block.”

McS-W note that SMR’s use of RegEM code (rather than simpler, better-tested stats code) is questionable: “RegEM appears to be a classic, improvised methodology with no known statistical properties” and “we cannot rule out the possibility that RegEM was tailor-made to the specific features of this simulation”.

Wow. Cherry-picked code used on cherry-picked data. That must be what Rosanne D’Arrigo meant about making cherry pie.
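For readers wondering what a “regularized EM” imputation even does, here is a toy numpy sketch of the generic idea — iteratively refilling missing values with ridge-regression predictions. This is emphatically not Schneider’s RegEM (which updates covariance estimates and regularization adaptively), just an illustration of the shape of the improvised methodology being discussed:

```python
import numpy as np

def em_style_impute(X, lam=0.1, n_iter=30):
    """Toy EM-style imputation: repeatedly ridge-regress each gappy
    column on the others and refill its missing cells with predictions.
    NOT Schneider's RegEM -- just the generic shape of the idea."""
    X = X.copy()
    miss = np.isnan(X)
    # Initialize missing cells with their column means
    X[miss] = np.take(np.nanmean(X, axis=0), np.nonzero(miss)[1])
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = miss[:, j]
            if not rows.any():
                continue
            # Predictors: an intercept plus all other columns
            # (the ridge penalty also shrinks the intercept; fine for a toy)
            others = np.column_stack([np.ones(len(X)), np.delete(X, j, 1)])
            A, b = others[~rows], X[~rows, j]
            coef = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
            X[rows, j] = others[rows] @ coef
    return X

rng = np.random.default_rng(5)
z = rng.normal(size=(200, 1))                  # one shared "climate" factor
truth = z @ np.ones((1, 4)) + 0.3 * rng.normal(size=(200, 4))
holes = truth.copy()
holes[rng.random(holes.shape) < 0.1] = np.nan  # knock out ~10% of cells
filled = em_style_impute(holes)
err = np.abs(filled - truth)[np.isnan(holes)].mean()
print("mean abs error on imputed cells:", round(err, 3))
```

Even this toy version has a regularization parameter and an iteration count whose statistical consequences are unexamined — which is exactly the “no known statistical properties” complaint.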

Our paper demonstrates that the relationship between proxies and temperatures is too weak to detect a rapid rise in temperatures over short epochs and to accurately reconstruct over a 1000 year period.

Pretty much what Phil Jones himself asserted in a “private” email. But it sure is nice to see the analyses.

there is reason to believe the wide confidence intervals given by our model of Section 5 are optimistically narrow. First, while we account for parameter uncertainty, we do not take model uncertainty into account. Second, we take the data as given and do not account for uncertainties, errors, and biases in selection, processing, in-filling, and smoothing of the data as well as the possibility that the data has been “snooped” (subconsciously or otherwise) based on key features of the first and last block. Since these features are so well known, there is absolutely no way to create a dataset in a “blind” fashion. Finally, and perhaps most importantly, the NRC assumptions of linearity and stationarity (NRC, 2006) outlined in our paper are likely untenable and we agree with Berliner in calling them into question.

Truth-telling in paleoclimatology. Unprecedented. I never thought I would see the day.

I accessed the paper, the editorial, the discussion papers and the rejoinder through Anthony’s site. Would it be impolite to request that this thread not become a polemic for the last sentence of the editorial?

I did exactly the same and felt exactly the same once I’d read a few of the WUWT comments. As I’d read Michael Stein’s introduction I genuinely felt welcomed and appreciated as one of the great unwashed who had stumbled in from the blogosphere! We all know there’s a price to be paid for promotion of ‘unprecedented truth-telling in paleoclimatology’, as Bender has just called it. I salute the editor responsible and would defend him to the last.

“Thus, while research on climate change should continue, now is the time for individuals and governments to act to limit the consequences of greenhouse gas emissions on the Earth’s climate over the next century and well beyond.”

There is no excuse for this. It is almost as if he was panicking that by publishing such a critical paper (McShane and Wyner) someone might call him a (gasp) “denier” so he had to make sure to show his support for the cause.

I am incredibly disappointed to see this sort of language injected into a discussion about statistics and paleo-climatology.
Steve: No more discussion of this editorial comment please. We’re all used to this sort of thing by now. PLEASE LIMIT DISCUSSION TO SPECIFIC STATISTICAL ISSUES RATHER THAN HYPER-VENTILATING ABOUT THIS SORT OF THING.

There seems to be a lack of organization in the statistical climatology camp: Tingley says that Lasso is scientifically unsound and demonstrates its poor performance by not bothering to use cross-validation to choose the bounding parameter. In contrast, his collaborators Craigmile and Rajaratnam are critical of Lasso, saying it doesn’t have oracle or shrinkage properties, but end up suggesting an adaptive Lasso, i.e. a tweaked version of Lasso.
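For what it’s worth, cross-validating the Lasso penalty is routine in standard tools. A scikit-learn sketch on simulated data (the setup — three real predictors among fifty noise series over a short calibration period — is my own invention, not Tingley’s):

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(2)
n, p = 120, 50                        # short calibration period, many proxies
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]           # only three predictors carry signal
y = X @ beta + rng.normal(size=n)

# A fixed, arbitrary penalty can shrink everything away
fixed = Lasso(alpha=5.0).fit(X, y)
print("nonzero coefs at alpha=5.0:", np.count_nonzero(fixed.coef_))

# Cross-validation chooses the penalty from the data instead
cv = LassoCV(cv=5).fit(X, y)
print("CV-chosen alpha:", round(float(cv.alpha_), 3),
      "| nonzero coefs:", np.count_nonzero(cv.coef_))
```

Judging Lasso by its behavior at one arbitrary penalty, rather than the cross-validated one, is the apples-to-oranges comparison being complained about.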

I liked what Mc&W say about purposefully using transparent and uncontroversial methods rather than some Bayesian quagmire. Unfortunately all the trends suggest that the move towards involving real statisticians in the process will just lead to messy, overcomplicated Bayesian models which, as Mc&W say, make a whole host of explicit and implicit assumptions that have an unknown bearing on the final outcomes. Richard Smith is the only warmist statistician I can think of who isn’t advocating such an approach. He still thinks a principal components approach is viable.

No doubt they’ll have yet another unresponsive “response” where – all heat and no light – rhetoric outweighs reason by 10:1. (We can probably expect the same in their response to O’Donnell’s refutation of Steig. Because they honestly do not appear capable of much else.)
BTW McS&W specifically pointed to the effective irreplicability of SMR code. Michael Mann, will you please drop the line that this code is protected IP, and please start using turnkey scripts that any of us can run? This is the way science is turning. Join the movement. Free the code.

The O’Donnell response meme has already been set by the Team. That is: “O’Donnell et al is not a refutation of Steig but a refinement that is yet another confirmation of the seminal work of Steig showing Antarctic warming.”

This is a classic response to the innovator by a group which cannot accept change. Anything and everything is a confirmation of their ideas. They cannot be wrong.

It is simply amazing to me that these people hold so tenaciously to the few magic proxies. There are literally tens of thousands of tree cores available to them, and they insist on including a particular set of 25(?) or so bristlecone trees. If these 25 are controversial, drop them, even if the “critics” are wrong about them. If you can’t get your result without them, it isn’t much of a result.

From the rejoinder: “[B]efore embarking on our discussion of their work, we must mention that, of the five discussants who performed analyses (DL, Kaplan, SMR, Smerdon, and Tingley), SMR was the only one who provided an incomplete and generally unusable repository of data and code.”

This seems like a classic Team move. Now, having been pinned into a corner, SMR will claim that McShane and Wyner implemented the RegEM EIV algorithm incorrectly (without providing any specifics, of course) and declare M&W’s rejoinder to be “bizarre”, “disingenuous”, “unworthy of a response” and ultimately, “wrong”. For the next 5 years we will hear from other climate scientologists how SMR demonstrated “convincingly” that M&W implemented RegEM improperly and therefore “completely debunked” M&W. Oh brother, I’ve seen this movie before.

The only responsible reply is this:
“We are in the process of moving quickly from arcane proprietary codes to publicly available turnkey scripts. Replicability is critical to the scientific function (and accountability to the political function), and we want to be part of this exciting movement.”

I join the authors in expressing dissatisfaction with some paleoclimate analyses. I endorse their claim that there has been underestimation of uncertainty in paleoclimate studies. The implication that additional participation of the statistics community is needed is undeniable.

Well perhaps they’ll finally STOP advocating purely statistical approaches, such as MBH98 etc and other nonsensical approaches that flip series upside down because the proper orientation is inconvenient to the result they’re trying to “get”.

Lecturing US on the necessity of models that have physical meaning! Puh-leeze!

MBH98 was not a purely statistical approach in their view (or at least in how I interpret their view). They would argue that it was a combination of a scientific approach and a statistical approach. The ‘science’ part of it, they would argue, is the method they use to select proxies to get the answer they want to get the answer the IPCC wants to get a more reliable result.

See my comment herein about mechanistic physical models of temperature response. Univariate linear correlation is hardly any biologist’s or physicist’s or chemist’s – or even statistician’s! – idea of what it means to model proxy behavior. Only with the team meme.

Re: oneuniverse (Dec 14 12:42),
Pretty slim pickings for comments over at RC. Given the damning indictment that M+W lay on Team statistical techniques I think it’s safe to say that Gavin has had to use the “cough button” 2 or 3 times for every comment he’s allowed through. A day after posting and so far they’re having a better debate about criteria/criterium/criterion usage rather than anything M+W have said.

The fact that our paper was of interest not only to academic statisticians and climate scientists but also economists and popular bloggers [1] bespeaks of the importance of the topic.

where footnote [1] says
“Steve McIntyre of Climateaudit and Gavin Schmidt of Realclimate”.

And page 14:

We fault many of our predecessors for assiduously collecting and presenting all the facts that confirm their theories while failing to seek facts that contradict them. For science to work properly, it is vital to stress one’s model to its fullest capacity (Feynman, 1974).

When I skimmed MW’s rejoinder yesterday, was I the only one to read things into the order these apparent outsiders were mentioned? Pretty delicious.

But our host is a hard man to please:

Given the central role of these specific proxies in the target data set, I checked the various discussions to see if anyone mentioned either bristlecones or upside-down Tiljander. Given the academic-ness and non-engineering-ness of the discussion, these fundamental issues of data quality are, needless to say, not noticed by the new entrants to the discussion.

Even with JoC (maybe, eventually, when the ‘system’ manages to stomach O’Donnell et al) and Annals of Applied Statistics paying this increasingly popular blogger the compliment of taking his statistical arguments seriously, it’s going to be hard to buy him off with all the kudos of academic-ness, I fear.

All of the comments on McS&W are now publicly available for peer review. So go ahead and review.

Let the talk go on forever. Phil Jones in a secret email said it himself: past climate is unknowable to the level of precision that we wish (and that some of us imagine). This is McS&W’s conclusion. Debate this. The rest is spin.

Cancun, Mexico – In an unscheduled announcement on the steps of the Sacrificial Hall just outside the Cancun Climate Conference, Michael Mann (renowned author, top government funding recipient, esteemed blogger, who also does occasional government funded political research) revealed his latest startling findings to glassy-eyed breathless throngs of believers, almost all slowly milling about, their clothing tattered, and their arms stretched out in front of them (unlike traditional zombies, virtually all of them had their palms up, and were mumbling something about UN grants).

Using temperature proxies laboriously dug up from beneath thousands of feet of ice in Greenland, Mann, Bozo, et al were able to successfully debunk the myth of the Medieval Warm Period. “Our data show that contrary to the deniers’ claims, temperatures were significantly colder during the Medieval time period — in fact, all of the trees we found from the Medieval time period were buried, and I mean buried, under tons of ice. Using the latest developments in advanced statistical methods pioneered by Steig (his new time/temperature/location adjusting algorithm), called ‘Statistics SMEAR’ (Spreading Metadata Everywhere Annuls Results), we were able to make an actual mathematical proof that between 2000 BC and 1500 AD, the average Greenland sort of area temperature was no higher than it is today, on average, mostly.”

He went on to say “Besides that, the sudden die-off of ferns, parrots, crocodiles and virtually all other forms of sub-tropical life on Greenland simultaneously, which we pinned down to somewhere between 1200 and 1300 AD, just clinches it. If that doesn’t prove how cold it was in that averaged out time period, then frankly, I don’t know what will.”

In a shocking, but not surprising second announcement, given by Mann’s esteemed colleague (and frequent peer reviewer) Bozo (aka “The Clown”), the Team announced that in unearthing their new Greenland proxies they also stumbled upon a colony of over 275 frozen prehistoric cavemen. In an appeal for an emergency UN research and security grant of some 30 billion dollars, Bozo was quoted as saying “Normally, a couple hundred cavemen thawing out and rambling about wouldn’t be a big deal. But if they gather together and choose a leader, a “Captain Caveman” so to speak, well, then WE’VE GOT A REAL CRISIS ON OUR HANDS!!”

This is a great discussion, and Mc&W puts into print what so many of us have been talking about here, the myriad of problems with the paleoclimate reconstructions. And McShane and Wyner pull no punches, either. I note RealClimate are trying to go with the “it’s not peer reviewed” trick, and this will no doubt be used to keep some criticism from AR5, but this paper really digs deep into the belly of the beast.

Great to see bender in full swing as well. This must be music to your ears, bender 🙂

There are so many good points made in M&W, I’m sure lots of people will dig into it here and pull out some choice quotes. But here’s a little aside I noticed in Nychka and Li’s discussion. They make a distinction without a difference – that Wegman’s original report did not address the PC retention issue, but did so in later questions. Well, that is true, but hardly a big deal, and Wegman was rightly critical of this ad hoc methodology (as is common throughout paleoclimatology, a point made by Mc&W and others, but apparently lost on Nychka and Li). Nychka goes on to say:

The string of references that are cited by MW on page 6 beginning with Mann and Rutherford (2002) established the robustness of the reconstruction with respect to centered versus noncentered methods if several PCs are included.

To me, “robust” means if I kick it, hit it with a hammer, play with the parameters, I get the same result. Here, Nychka and Li note that the entire results and conclusions are highly sensitive to the choice of a parameter which is almost impossible to objectively define. To me, that is the opposite of robust.

This is typical of the team approach to statistics. Fudge the figures until you get the answer you first thought of, then declare it to be “conservative” or “robust”, even if it patently isn’t.

Bad news for the team. Real statisticians don’t play by those rules, and the real statisticians have just taken up hockey.

“This must be music to your ears, bender”
More like medicine for my sick heart. Sick from all the glossy deception. Junk science, junk food for thought. Sick mind. More medicine! More medicine! Stop the eternal spinning of this web of deceit! It’s sickening!

It would be interesting to plot some measure of the aggravation in your posts over the years, bender. You started out, innocently enough, merely looking for a better explanation. That has changed. That you are clearly angrier now compared to then is not what I find sad, the reasons for your anger are what I find sad.

Climategate changed things because it proved that there was a concerted c********y to subvert the objective peer review process. That level of “advocacy” is taking it too far.

I’ve been quiet during this phase because it was mostly heat, no light. Unhealthy.

Now, with the publication of O’ Donnell’s and McKitrick’s papers, there is some light, and with that an obligation to discuss. But there is at the same time more sickening proof that the scientific process is indeed under attack from a focused group of extremist alarmists that have such a preconceived estimate of GAGW in their minds that they won’t allow the data to speak. Their great computers fill the hallowed halls.

If warming is such a threat to planetary health, then it obliges us to tend toward maximum objectivity and transparency. Efforts by faux-scientific advocates to inflate and hype the rate of warming do the public more harm than good.

This article (MW) has stimulated much valuable discussion and helped to focus attention on an important area for the application of statistics. Given the short amount of space, however, we reluctantly comment only on the second and last sections

Kaplan opens with a valuable comment that the team should pay attention to:

McShane and Wyner (2010; hereinafter MW2010) demonstrated that in many cases a comprehensive data set of p = 1138 proxies (Mann et al., 2008) did not predict Northern Hemisphere (NH) mean temperatures significantly better than random numbers. This fact is not very surprising in itself: the unsupervised selection of good predictors from a set of p ≫ n proxies of varying sensitivities might be too challenging a task for any statistical method

Note he is not surprised by the result. In other words, there’s not much novelty here. It is a mundane but practical result that should be obvious to any paleoclimatologist properly trained in statistics.

Ummm. How damning is THAT?

This echoes the sentiment of Berliner:

The authors note that it is common to assume that proxy observations are linearly related to climate variables and they proceed with this assumption. This seems untenable to me (for an extreme example see the Yellow River data in Figure 6). Even if linearity is plausible, lumping all spatial-temporally distributed data of various types, qualities, and degrees of relationship to climate variables into a variance-covariance based summarization (principal components or EOF’s) with no underlying analysis gives me pause. I am not surprised by difficulties in then extracting usable information.

The muck seems directed at McW&S, but – SPLAT – who gets the worst of it?

When the team say a paper “hasn’t been through peer-review”, what they mean is they were not able to control the editorial process themselves.

Kaplan says “McShane and Wyner (2010; hereinafter MW2010) demonstrated that in many cases a comprehensive data set of p = 1138 proxies (Mann et al., 2008) did not predict Northern Hemisphere (NH) mean temperatures significantly better than random numbers. This fact is not very surprising in itself: the unsupervised selection of good predictors from a set of p ≫ n proxies of varying sensitivities might be too challenging a task for any statistical method”

but the idea that it is valid to “select” proxies from a huge universe of tree ring and other data is data mining and begs the question of why some trees/proxies will be selected and not others. If some trees show no response to temperature during the 20th Century or grow slower as it gets warmer, how do we know that the ones we “selected” behaved properly in the past?
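The snooping/screening worry is easy to make concrete. A small simulation (pure-noise “proxies” of my own construction, not the Mann et al 2008 network): screen random series on calibration-period correlation and the composite of the survivors tracks the target by construction:

```python
import numpy as np

rng = np.random.default_rng(3)
n_cal, n_series = 100, 1000
target = np.linspace(0.0, 1.0, n_cal)     # a trending "temperature" record

# 1000 pure-noise "proxies": by construction none carries any signal
proxies = rng.normal(size=(n_series, n_cal))

# Screen on calibration-period correlation, as screening-based
# reconstructions do
r = np.array([np.corrcoef(p, target)[0, 1] for p in proxies])
passed = proxies[r > 0.2]
print(f"{len(passed)} of {n_series} noise series pass screening")

# The composite of the survivors inherits the target's trend purely
# from the selection step
composite = passed.mean(axis=0)
print("composite/target correlation:",
      round(float(np.corrcoef(composite, target)[0, 1]), 2))
```

The screened composite “responds to temperature” even though every input is noise — which is why skill must be assessed on data not used in the selection step.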

Does the ‘truly comprehensive set’ actually exist in one place? If not, what would it include? Not just all tree rings, presumably, but all other possible proxies? How does one select what might be a proxy? On what basis would one leave out bristlecones, Tiljander, Yamal? What constitutes ‘over-snooping’? (It’s always struck me as highly likely but how does one establish it?)

I thought I was having scrolling problems, but the following text appears twice:

“CA readers are well aware of my own view that the fundamental problem in paleoclimate is not the need for some novel multivariate method, but better proxies and reconciliation of discordant existing “proxies”. CA readers are also aware that Team reconstructions use highly stereotyped proxies over and over again in different guises – bristlecones, Yamal – sort of an ongoing version of the Dead Parrot skit in Monty Python. McShane and Wyner used the Mann et al 2008 data set, which quixotically introduced the Tiljander sediments, the modern portion of which was contaminated with bridge-building sediments. As CA readers are aware, Mann et al 2008 purported to prove that bristlecones didn’t “matter” by a reconstruction with upside-down Tiljaner and to prove that upside-down Tiljander didn’t matter by using bristlecones. (The effect of excluding both was sort of considered in a grey Supplementary Information to a different article (Mann et al 2009), which implicitly withdrew central claims of Mann et al 2008 about its highly-publicized no-dendro reconstruction.)”

Note the extreme differences in approaches by SMR and MW. SMR says 10 PCs are too many and that 95 proxies are too many, with no substantial reasons given – not even a bizarre comment. MW says let us see what the data say, and then proceed to produce some results. It is so refreshing to hear clear and understandable points being made and not some fuzzy statements that are evidently supposed to stand on the merits of who said them. It is like Schmidt, Mann and Rutherford never heard of data snooping and the statistically unaccountable uncertainties that it presents.

Note also that SMR once again invoked the Team rule that if you do not follow exactly what they did in their paper that the critique is completely and irrevocably overturned. It would appear that that tactic did not work in this case. I truly believe that the SMR response suffers from the authors’ inability to turn off the methods used for discourse at RC and other advocacy venues and turn (back) on the approach that most good scientists use in these types of exchanges.

Since I have been looking at AR in proxies of late, it was good to hear MW and MM note the differences in AR that are seen between pseudoproxies, instrumental temperatures and proxies. It means more to me now when MM differ with WA on the origins of the AR in proxies, i.e. an artifact of the proxy and not necessarily the underlying climate/temperature response.

“SMR implement their allegedly objective [selection] criteria in non-standard and arbitrary ways and several times in error. When correctly implemented, the number of principal components retained varies across each “objective” criterion from two to fifty-seven. Using ten principal components, therefore, can hardly be said to induce “statistical overfitting” claimed by SMR.”
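The point that supposedly objective retention rules disagree is simple to illustrate. A numpy sketch on simulated proxy-like data (the three-factor structure is my own invention): two textbook rules give different PC counts on the same matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
n_years, n_prox = 149, 57                 # calibration-length rows, proxy columns
factors = rng.normal(size=(n_years, 3))   # three genuine shared factors
load = rng.normal(size=(3, n_prox))
data = factors @ load + 2.0 * rng.normal(size=(n_years, n_prox))
data = (data - data.mean(0)) / data.std(0)   # standardize, as for a correlation PCA

# Eigenvalues of the correlation matrix via the SVD of the data matrix
eig = np.linalg.svd(data, compute_uv=False) ** 2 / (n_years - 1)

kaiser = int(np.sum(eig > 1.0))                  # Kaiser eigenvalue-above-one rule
frac = np.cumsum(eig) / eig.sum()
var90 = int(np.searchsorted(frac, 0.90) + 1)     # 90%-of-variance rule
print("Kaiser rule keeps", kaiser, "PCs; 90%-variance rule keeps", var90)
```

When standard rules disagree this much on one data set, declaring any particular retained count “objective” is doing a lot of unearned work.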

Must be hell to not be the smartest guy in the room anymore, eh Dr. Mann?

From BERLINER…….”Rather, the issue involves the combination of statistical analyses and, rather than versus, climate science.”

This has gotten a lot of kudos from both RC and WUWT, but I’m left wondering: other than incorrectly crunching numbers, what does a climate scientist do? In the end, isn’t it all statistical analysis? I mean, it isn’t really a question of whether CO2 absorbs and then re-emits, but how much. Same with water rising or albedo effects and quantitative cloud mass etc.

It’s always been a numbers game, regardless of how one looks at it. Starting with the ever-changing historical temps, the proxies, or the physical science. I’m all for cooperative collaboration, but other than being a cheerleader, what would the climatologists do? Run amok finding more things to be scared of?

The defining issue is use of a physical/mechanistic model to describe the proxy’s temperature response (amongst other factors). If the model is univariate linear correlation, then it ain’t the sort of model the skeptics are interested in – it’s a purely statistical model. The issues are multivariate, nonlinear, nonstationary, and contamination effects. THAT’s what real statisticians mean when they talk about the need for “models”. Honest assumptions about how these “proxies” behave.

So despite the applause from both sides for this glib remark, there are large differences in perspectives. The fact is the best paleos truly do applaud the approach of mechanistic modeling. The problem is that it’s hard. So while good paleos work on challenging problems, Mann runs amok with his aphysical models that are fueled by heavy data snooping and a breathtaking lack of accountability for sources of error and uncertainty.

In other words, the skeptics are right to applaud, and the bandwagoners at RC – well, they either do not know what they are applauding, or do not really care as long as policy momentum is sustained.

Exactly, if it were easy, I’d already come up with the definitive equation and we’d all know exactly what the impacts of increased GHG’s were. Sadly, even the most basic statistics are some heavy lifting for me. (I really don’t know if PC10 is better than PC4 in this case or not.)

What strikes me though, is that it was presumably posted by Gavin and MM. This seems to be an acquiescence of sorts. And maybe you’re right, maybe neither one understands what they’re stating. (I did point out to Gavin that the 80% probability wasn’t what M&W were trying to assert, so maybe…..) IDK, the verbiage they used, even in the discussion paper, seemed a bit like they gave some ground. (Except when referencing M&M, as expected, they were a bit snarky.) To me, anyway, its interesting.

Ken, that was great! Good licks on guitar and the voice seemed like it was made for blues! Drum and bass were ON TIME! Please pass on kudos to the band. I’d ask for the chords, but I’m not sure I can translate it to my acoustic and countryfy it.

[quote]but some of their criticism is simply bogus. They claim our supplemental code was not usable, but in fact we provided a turnkey R script for every single figure in our submission – something not true of their code, so that is a little cheeky of them [as is declaring one of us to be a mere blogger, rather than a climate scientist ;-)][/quote]

[quote]Additionally, MW make an egregiously wrong claim about centering in our calculations. All the PC calculations use prcomp(proxy, center=TRUE, scale=TRUE) to specifically deal with that, while the plots use a constant baseline of 1900-1980 for consistency. They confuse plotting convention with a calculation.[/quote]

[quote]“Second, we take the data as given and do not account for uncertainties, errors, and biases in selection, processing, in-filling, and smoothing of the data as well as the possibility that the data has been “snooped” (subconsciously or otherwise) based on key features of the first and last block.”

[Response: Except that they didn’t. They used there own set of data, and are complaining about ‘ad hockery’ when it was pointed out. – gavin]
[/quote]

So, which is it..? Was this code available or wasn’t it? Was the data available or wasn’t it? Is the centering erroneous or not? It seems like everyone thinks the answers to these questions are obvious… Obviously I’m not so clear 🙂

Allow me to frame the debate in a way so as to enhance understanding by the general public. Sometimes we have to make judgements as to who is likely to have behaved in a certain way based on a nuanced interpretation of the preponderance of the historical record, which, taken as a whole, yields a robust conclusion.

[quote]but some of their criticism is simply bogus. They claim our supplemental code was not usable, but in fact we provided a turnkey R script for every single figure in our submission – something not true of their code, so that is a little cheeky of them [as is declaring that one of us to be a mere blogger, rather than a climate scientist 😉 [/quote]

MW said they couldn’t use SMR’s code, while SMR seems to indicate the inverse……. uhmm, different versions of the program? Different OSs running the same program? There’s a myriad of reasons why this could be, and quite possibly neither side is to blame, other than for not taking a predetermined stance. If one is going to share code, then the OS, database, and language version must be agreed beforehand. They all have to be the same, or odds are the same results won’t occur.

[quote]Additionally, MW make an egregiously wrong claim about centering in our calculations. All the PC calculations use prcomp(proxy, center=TRUE, scale=TRUE) to specifically deal with that, while the plots use a constant baseline of 1900-1980 for consistency. They confuse plotting convention with a calculation.[/quote]

This, I found amusing. This is a programming command. It doesn’t speak to the actual maths of the program. It will, most likely, center something, whatever it is the program was coded to do, with the caveat that it is properly placed within the program. It is not implicit that anything was properly centered. It very well could be, but RC’s response doesn’t show that it is.
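For readers unfamiliar with what that command actually does, here is a minimal numpy sketch (toy numbers, my own illustration, not anyone’s actual code) of the difference between centering/scaling inside a PCA calculation and anchoring a series to a baseline for plotting:

```python
import numpy as np

# Toy proxy matrix: 5 "years" x 3 "proxies" (made-up numbers).
X = np.array([[1.0, 10.0, 100.0],
              [2.0, 12.0,  98.0],
              [3.0, 11.0, 103.0],
              [4.0, 13.0,  99.0],
              [5.0, 14.0, 104.0]])

# What R's prcomp(X, center=TRUE, scale=TRUE) does before the PCA proper:
# subtract each column's mean and divide by its sample standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The principal components are then obtained from the SVD of Z.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * s  # PC scores

# A plotting baseline is a separate, later step: shift a series so its
# mean over a reference window (here rows 0-2, standing in for
# "1900-1980") is zero. It changes the picture, not the calculation.
recon = scores[:, 0]
recon_plotted = recon - recon[:3].mean()
```

The SVD route is the standard way to compute PCs of a standardized matrix; the point of the sketch is only that the baseline shift at the end has no effect on the PCA itself.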

This back and forth is even cuter….[quote]“Second, we take the data as given and do not account for uncertainties, errors, and biases in selection, processing, in-filling, and smoothing of the data as well as the possibility that the data has been ”snooped” (subconsciously or otherwise) based on key features of the first and last block.”

[Response: Except that they didn’t. They used there own set of data, and are complaining about ‘ad hockery’ when it was pointed out. – gavin]
[/quote]

Well, sort of: they (MW) took the entire db, not the select few that M08 used. But, from the rejoinder, right underneath Figure 1: [quote]Results using the reduced set of 55 Mann et al. (2008) proxies (excluding Tiljander) are plotted with solid lines whereas results using the full set of 93 proxies are plotted with dashed lines. Two features stand out from these plots. First, the differences between the fit of a given method to the full or reduced set of proxies are quite small compared to the annual variation of a given fit or compared to the variations between fits. Second, the RegEM EIV method produces reconstructions which are nearly identical to those produced by OLS PC4 and OLS G5 PC5.[/quote]

Ouch, it seems MW did consider (after criticism) both sets. However, it should be pointed out 1) that they found the exclusion of some proxies to be …… curious, and 2) that their statement about taking the data as given is nearly verbatim from the original paper. It was that they found no reason to exclude almost half of the data for arbitrary purposes. You should note the bottom of page 4 of the rejoinder, where it shows the proxies had already undergone a ‘minimum requirements standard’ to be included in the series. Just because the data doesn’t fit a preconceived notion of what it’s supposed to look like is no reason to exclude it.

in fact we provided a turnkey R script for every single figure in our submission

This may or may not be true. McShane & Wyner are best positioned to address this claim. But do note that this could be yet another case of the pea under the thimble, for which they are famous. McShane & Wyner did NOT assert that a turnkey R script was not provided for every figure in the paper. They asserted that the analysis was unreplicable because heterogeneous code fragments were distributed higgledy-piggledy around the web. These are not the same things! A turnkey script for figures is good. (And hurrah, if that’s what they did.) A turnkey script to recreate an entire analysis is better.

McShane and Wyner say that the scripts for the “downstream” analysis in the Schmidt comment were available and worked fine, but the issue was the “upstream” RegEM for Mann et al 2008, where they found the scripts incoherent and were unable to get them operational in finite time. That seems entirely possible.

Mann and Schmidt have a pattern of delivering responses that barely glance off the point, then asserting with counter-vehemence something that appears contrarian. But behind the bluster is a dodge.

This is a tactic that works well in a limited exchange such as through the peer reviewed litchurchur. But in the blogosphere where bandwidth can be less limiting, there is only one winning approach: tell the whole truth and nothing but the truth.

Playing pea & thimble here is equivalent to NOT telling the whole truth. They do it all the time.

This has always been curious to me. Many speak of such things, but I’m pretty sure I don’t understand. I don’t code for a living, but I’ve programmed. The word “turnkey” makes me envision some sort of vending machine. What “code” is it in? More importantly, why is it so? If it isn’t blocked properly or commented properly, then it’s just gibberish. I’m beginning to think a computer scientist needs to be involved in the process also. There are ‘best practices’ and standards for such things. I’ve often wondered why “super computers” were necessary for such computations……. I’m beginning to see why.

Re: suyts (Dec 14 23:21), Turnkey means that it works out of the box as long as you follow the instructions.

So: if you say “run program A to create data Y. Then run program B to process data Y and produce graph 1.2”, then you supply the code to do exactly that: reproduce the research. Ideally with as few manual steps as possible. Ideally you hit RUN and the paper pops out. (Think Sweave.)
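A minimal sketch of that idea in shell (the file names and “programs” here are invented stand-ins, not anyone’s actual pipeline): one driver script goes from raw data to figure with no manual steps in between.

```shell
#!/bin/sh
# run_all.sh -- hypothetical turnkey driver: data in, figure out.
set -e   # stop at the first failure rather than silently continuing

# Stand-in for "program A": create data Y.
echo "1 2 3 4 5" | tr ' ' '\n' > data_Y.txt

# Stand-in for "program B": process data Y and produce "graph 1.2"
# (here just a text summary playing the role of a figure).
awk '{s+=$1} END {print "mean:", s/NR}' data_Y.txt > figure_1_2.txt

cat figure_1_2.txt
```

The point is the single entry point: a reader runs one script and gets the published artifact, instead of hand-assembling fragments.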

I see…. I think. So it not only produces the code for you to see, but also executes it. Thanks, Steven. But there’s more involved in the process than just the coding. While I assume the maths are checked, that doesn’t ensure the validity of the program. For instance, I once worked at a company that programmed lunch programs. (Don’t laugh, it had to follow fed guidelines!) At any rate, at the time we used FoxPro. The command was proper. The data was proper. Literally, 995 times out of 1000 it executed properly. It took myself and another person several weeks to find where it went wrong. Why? Unknown. I’d always thought it was the way FoxPro translated to boolean, but I could never find the step. My buddy thought it was confined to a specific inputted value, but that wasn’t the case.

My point is, code?, yeh anyone can. Much like climatologists can do statistics. Database management?, yeh, that’s easy, unless you want to do it right.

Don’t get me wrong, I’m not the one. But if we want to do it right, we shouldn’t half a$$ it. Get the right people for the right job.

UC did succeed in getting it (Mann 08) to run. At the time we spent quite a few mann-hours getting the published code to run. We identified a few pieces of code that were missing, and after complaints here, those ended up in Mann’s archive. So the original code (still available at PNAS and NOAA, I guess) is not even runnable. The code at Mann’s site is, but requires quite a lot of editing work to get it to run properly. For instance, it uses a lot of absolute path references, which of course need to be edited.

… that is a little cheeky of them … as is declaring that one of us to be [sic] a mere blogger, rather than a climate scientist

Oh dear, the good people at RealClimate spotted exactly the thing I did and it’s made them feel bad about themselves and the world. Poor, poor Gavin. I’m deeply sorry I drew everyone’s attention to it yesterday. To be grouped with Him Who (Previously) Was Not Named by McShane & Wyner (neophytes who clearly don’t have the faintest clue about the dizzy heights of climate science) and, what’s more, to be second in the list of two. I feel really bad now.

There are not four Tiljander data series — only three. The primary series recorded by Tiljander et al. were X-Ray Density, varve Thickness, and Lightsum. Lightsum is the portion of varve thickness contributed by mineral matter. (Varve Thickness and Lightsum can each be measured in millimeters; diagrammed here.)

Darksum is taken to be the portion of varve thickness contributed by organic matter. It was calculated as:

Darksum (mm) = Thickness (mm) – Lightsum (mm)

There are only two degrees of freedom among Thickness, Lightsum, and Darksum.

The authors of Tiljander et al. (2003) suggested that the pre-1720 portions of XRD, Lightsum, and Darksum contain climate-related signals. They made no such claim for Thickness.
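The two-degrees-of-freedom point is easy to check numerically (the varve values below are invented for illustration): because Darksum is defined as Thickness minus Lightsum, a matrix built from the three series has rank 2, not 3.

```python
import numpy as np

# Hypothetical varve measurements in mm (illustrative numbers only).
thickness = np.array([1.20, 0.95, 1.40, 1.10])
lightsum  = np.array([0.70, 0.55, 0.90, 0.60])

# Darksum is defined as the remainder: the organic contribution.
darksum = thickness - lightsum

# Stack the three series as columns: the third column is an exact
# linear combination of the first two, so the matrix has rank 2.
M = np.column_stack([thickness, lightsum, darksum])
rank = np.linalg.matrix_rank(M)
```

So treating Darksum as an independent fourth series adds no new information beyond Thickness and Lightsum.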

This comment was slotted into position #27 at RealClimate.org this morning, within an hour of appearing at ClimateAudit.org (after passing moderation or being fished out of spam). Unfortunately, RC’s moderators did not append any remarks to it.

“…data are more important than models and models are more important than specific modes of inference. In the present context, this suggests focusing efforts on the development of new climate proxies and the attendant statistical issues in processing them into usable forms. More broadly, statisticians need to engage the entire climatological community in questions of what raw data to collect and in how to process these data into forms that can be broadly used.”

Stein’s data centric approach seems obviously correct. Beyond that, isn’t Stein subtly implying that there is a problem with the data and that new data needs to be collected?

Stein’s process seems awesome. Soliciting lots of responses (and forcing them to be terse and to the point), and forcing the rejoinder and responses all to be written in a timely fashion really expedites the process.

Also liked Stein’s plug for transparency of data, metadata, and code.

Steve: this sort of discussion paper has a long tradition in statistics. The J. Roy. Stat. Soc. has many such papers; the Brown 1981 paper on multivariate calibration, discussed at some length here in the past, was a Discussion Paper. I agree 1000% that discussion papers yield far more insight into a paper than a carefully varnished standalone paper. Arguably technical blogs provide some of that function for non-Discussion Papers.

>>
In addition, nearly 10% of the Mann et al 2008 network (105 series) are series derived from the Briffa et al 2001 network, notorious for its late 20th century decline. However, actual data after 1960 has been deleted and replaced by data infilled by a RegEM process (Rutherford et al 2005). Use of the actual post-1960 data will further erode performance of the proxy reconstruction.
>>

Well, that’s a great final word, Steve. Really good to get that into the literature.

So “hide the decline” is still going on, even in Mann et al 2008. Astounding!

I can’t see this as anything other than scientific fraud. Cutting out real data because it “does not fit” and replacing it with some simulated data that does.

Blender:
>>
Well perhaps they’ll finally STOP advocating purely statistical approaches, such as MBH98 etc and other nonsensical approaches that flip series upside down because the proper orientation is inconvenient to the result they’re trying to “get”.
>>

I don’t think this was because the correct orientation was “inconvenient”. It’s because this proxy was so contaminated that it actually fitted the thermometer record better with a negative scaling factor (i.e. upside down).

Now any honest scientist would have instantly realised the data had a BIG problem. But because Mann et al wanted to reduce the MWP, it suited their ends to invert this proxy.

Doing this had the twofold effect of reducing the MWP of the proxy ensemble and the overall variability of the past 1000 years, thus giving weight to their “unprecedented warming” hypothesis.

Read through a few of the solicited comments about McShane and Wyner’s paper. There was lots of criticism. It would be so interesting to have the same crew of commenters review the original Mann et al paper.

Stephen, I was once a colleague of Richard A. Davis, one of the contributors to the discussion. He and his co-author, Brockwell, are very highly regarded in the mainstream time series world. I am sure his comments merit serious consideration. It is heartening that statistical academia is now taking an interest in the subject.

Curious that Romm says very little about the uncertainty on that hockey stick – which is what the paper is actually about, and indeed what half the debate is about. Also curious that he fails to echo the statisticians’ key criticisms re: the use of models that allow for inhomogeneities in the proxy responses. And finally there is the snooping issue (see the comment by Craig Loehle in this thread), which gets short shrift in all the discussion thus far.

Why does Romm engage in such selective reporting? Is he “reporting” or is he building a case for a pre-conceived position?

In a nutshell, we may paraphrase Phil Jones: unfortunately, current paleoclimatic approaches do not provide sufficient precision to make robust conclusions about modern vs. medieval warming.

No one disagrees that we are coming out of a natural cooling phase of unknown origin called the “Little Ice Age”. So the paleoclimatologists really have nothing to add to the debate. The strongest support for Romm’s alarmism comes from the climate models. In terms of predicting the future, the paleoclimatic data are almost completely irrelevant – a fact which the editor himself points out.

Using these NH4+ concentrations as a temperature proxy, we reconstructed tropical South American temperatures over the last ∼1600 years. Relatively warm temperatures during the first centuries of the past millennium and subsequent cold conditions from the 15th to the 18th century suggest that the MWP and the LIA are not confined to high northern latitudes and also have a tropical signature.

It sort of disagrees with Mann et al. (2009) – except that there are a couple of anomalous warm brown MWP spots in Peru and Iran near the middle of their blue Pacific and Pacific pools (Fig 1). Curious that these spots are not smudges.

I see that Romm relies on material supplied by a website called “Skeptical Science”. If that website is truly “skeptical” then it should revise its maps of the “medieval climate anomaly” as proposed by Mann in his 2009 Science paper. Because McShane & Wyner (2010) illustrate that that reconstruction has zero skill, implying that the maps have zero value. (Why let a little uncertainty get in the way of a nice story?)

When using statistics there should be a sharp distinction between the field in question and the statistics. When we apply statistics to a particular set of data, all the meaning of the data has been abstracted away; the statistics should, I think, treat a time series the same way regardless of whether it came from economics or paleoclimatology. Outliers should be determined by objective metrics. If a data set contains questionable data from the specific field, the data set should not be used. I believe that MW deal only in abstract statistics on data which could just as well represent economic data.

In SMR, the statement (page 2, lines 30-33) reads:

In the frozen 1000 AD network of 95 proxy records used by MW, 36 tree-ring records were not used by M08 due to their failure to meet objective standards of reliability

If the data was unreliable, why was it included in the data set? If the data is considered unreliable due to mathematical problems (i.e. outliers), then what is the objective measure used to determine that it is unreliable? The issue is not whether or not 36 tree-ring records should be included, but whether SMR conflated paleoclimatology with mathematics (in the form of statistics).

McShane and Wyner’s rejoinder states (page 4):

Consequently, the application of ad hoc methods to screen and exclude data increases model uncertainty in ways that are unmeasurable and uncorrectable

I think MW are saying the same thing that I was trying to get across. Introducing paleoclimatic reasons to exclude records conflates paleoclimatology with mathematics.

It seems to me that buried in the basis of all these manipulations by Mann et al. are the assumptions that each proxy measures a local or regional temperature; that each such local or regional temperature is the sum of a global average temperature term and a regional bias term; and that the regional bias terms are either positive or negative and, when averaged over many proxies, sum to zero. Hence, adding up many proxies produces a measure of the global average temperature, according to this theory. Unfortunately, these assumptions do not seem to have been validated. Even a casual examination of the actual proxies shows that they vary wildly from one to another. What seems to happen is that, even if these assumptions were true (a consummation devoutly to be wished), the regional bias terms are orders of magnitude greater than the global average term, and the global average term is buried in an ocean of noise generated by the regional bias terms. Mann and company have attempted to use complex methods to extract the signal from the noise, but as M&W and many commentators have shown, the process suffers from many ills, not the least of which is the imperfection of the proxies themselves. I would quote Carl Wunsch:
“Sometimes there is no alternative to uncertainty except to await the arrival of more and better data” (Wunsch, 1999).
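That signal-buried-in-noise intuition is easy to check with a toy simulation (every number below is invented; this is not the Mann et al model, just an illustration of the signal-to-noise arithmetic):

```python
import numpy as np

rng = np.random.default_rng(0)

n_years, n_proxies = 1000, 50
# Small common "global" term every proxy shares (amplitude 0.1)...
global_signal = 0.1 * np.sin(np.linspace(0, 6 * np.pi, n_years))
# ...swamped by large independent "regional bias" noise (sd 1.0).
regional_bias = rng.normal(0.0, 1.0, size=(n_years, n_proxies))

proxies = global_signal[:, None] + regional_bias

# Averaging 50 proxies shrinks the noise sd by sqrt(50), to roughly
# 0.14 -- still larger than the 0.1 signal we are trying to recover.
mean_series = proxies.mean(axis=1)
residual_sd = (mean_series - global_signal).std()
```

With biases this large, even the average of 50 proxies has residual noise comparable to the signal itself; recovering the global term then requires either far more proxies or far better ones, which is the commenter’s point.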