Esper on In-Site Cherry Picking

I noticed the following quote from Esper et al 2003 (referenced in an earlier post):

It is important to know that at least in distinct periods subsets of trees deviate from common trends recorded in a particular site. Such biased series represent a characteristic feature in the process of chronology building. Leaving these trees in the pool of series to calculate a mean site curve would result in a biased chronology as well. However if the variance between the majorities of trees in a site is common, the biased individual series can be excluded from the further investigation steps. This is generally done even if the reasons for uncommon growth reactions are unknown.
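To make the procedure in the quote concrete, here is a minimal sketch of one way such a screen could work. The passage gives no formula, so the following are assumptions: "common variance" is measured as each series' correlation with the mean of the *other* series at the site, and a series is excluded below a hypothetical cutoff of 0.3. Nothing in this criterion refers to a physical or biological cause. The only test is agreement with the rest of the pool.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def screen_against_site_mean(series, threshold=0.3):
    """Split series indices into (kept, dropped) by correlation with the mean
    of all *other* series at the site. The threshold is hypothetical."""
    kept, dropped = [], []
    for i, s in enumerate(series):
        others = [t for j, t in enumerate(series) if j != i]
        site_mean = [mean(vals) for vals in zip(*others)]
        if pearson(s, site_mean) >= threshold:
            kept.append(i)
        else:
            dropped.append(i)
    return kept, dropped


# Toy site: three series share a pattern, one runs against it.
common = [1, 3, 2, 5, 4, 6]
site = [common, [x + 0.1 for x in common], [2 * x for x in common], [-x for x in common]]
print(screen_against_site_mean(site))  # ([0, 1, 2], [3])
```

The dissenting series is dropped whatever the reason for its behaviour, which is exactly the situation the quote describes ("even if the reasons for uncommon growth reactions are unknown").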

I posted up the next quote from Esper previously, but it’s worth repeating in the present context:

However as we mentioned earlier on the subject of biological growth populations, this does not mean that one could not improve a chronology by reducing the number of series used if the purpose of removing samples is to enhance a desired signal. The ability to pick and choose which samples to use is an advantage unique to dendroclimatology.

These statements just make my jaw drop. If the information at a site level is being fiddled with (“adjusted”) in the way that Esper indicates here (and in my opinion, it would be a worthwhile investigation for someone to check whether it is – and not just from the bosses), how can you ever rely on anything? If the field workers know that the boss has a "signal" in mind and their methods do not require 100% of the data to be recorded, you have biases way beyond D’Arrigo cherry picking individual sites. Maybe I’m just interpreting injudicious comments in an adverse light, but the comments are hugely inappropriate for authors of studies being relied on by IPCC and policy-makers and deserve to be looked into.

36 Comments

Well, it seems to me that IF a sufficiently high significance level were used to reduce chance correlations to a very low level, and IF trees continued to respond in the same linear way to the same factor outside the calibration period, it might work. But these are big ifs, and there is no indication they are satisfied.
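The first "if" is easy to quantify. A minimal sketch, assuming the candidate series are independent and each is tested once at level alpha: the chance that at least one pure-noise series passes is 1 - (1 - alpha)^n, and the Bonferroni correction (dividing alpha by n) is one standard way to hold that family-wise rate down. The count of 20 series is hypothetical. Real tree-ring series are autocorrelated, which makes chance correlations more likely than the nominal alpha suggests, so these numbers are if anything optimistic.

```python
def prob_any_false_pass(alpha, n_tests):
    """Probability that at least one of n_tests independent null series
    passes a significance test at level alpha purely by chance."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(family_alpha, n_tests):
    """Per-test level that keeps the chance of any false pass at or below family_alpha."""
    return family_alpha / n_tests


n = 20  # hypothetical number of candidate series screened
print(round(prob_any_false_pass(0.05, n), 3))  # 0.642
strict = bonferroni_alpha(0.05, n)
print(round(strict, 4), round(prob_any_false_pass(strict, n), 3))  # 0.0025 0.049
```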

Even if it did work, the flaws in the logic of picking cherries, making cherry pie, and then proclaiming “Surprise! Surprise! I got a pie with cherries!” are obvious. You really need to show something unexpected from the independent reconstructions, like a sudden big dip or peak. The way they all just wander back through time in a largely incoherent way is entirely expected of data with no strong signal. So you can’t say “Surprise! I got no variations in temperature! The 20th century is unusual!” Circular.

You’re exactly right, Steve. If workers don’t know why some trees are deviating from a trend, then they also don’t know why other trees are following the same trend. What results is an ingrained circularity: Certain trees display conformity with a process. That’s mere association. But then the conformity is taken as process-forced causality. The assumed causality is finally offered as proof of the process. That’s circularity. My own guess, recalling from your posts that Ammann and Wahl, Jacoby, and D’Arrigo all essentially justify that same mode of rationalization, is that dendroclimatologists are so intent on finding a climate signal that their thinking itself is no longer moored to scientific rationality.

Hannah Arendt wrote a book a long time ago, about Eichmann and the banality of evil. She made the point that the outlook of what is normal can shift imperceptibly, but steadily, taking people’s minds along with it. People become acclimatized to the bizarre and, eventually, their sense of social normalcy is what would in the past have been considered insane. That’s an extreme example, but I think a process like that might have happened in dendroclimatology. Their sense of what is correct has migrated, and now what is patently incorrect seems reasonable. All driven by the extreme pressures of climate politics.

This was Goodstein’s point about scientific misconduct: the experimenters thought that they already knew the correct answer.

Dave Stockwell: I wanted to ask you a question. When you ran your reconstruction of climate using the same methods but with red noise, were you surprised at how “realistic” the results appeared and how much they looked like peer-reviewed published studies?

While we here (for the most part) agree that Cherry Picking is good for pies, it’s not for science in general or climate proxies in particular. However, there is a valid point that the noise needs to be removed so that the “signal” can come through.

Obviously, cherry picking data so that you come out with a pre-desired signal is not the way to go. You need to cancel the noise, not remove any signals you don’t like. In general it’s similar to noise-cancellation systems (headphones, speakers, Dolby, etc.); the problem is identifying the noise to filter, and that’s where I’m stuck. Essentially you need to figure out the baseline tree growth without any outside influence. I guess you could estimate tree growth X based on normal temperature/humidity/sunlight for the area, then factor that out. That means you’d need a noise signal for each sampling area, which makes sense.

Still, I don’t see how you’re going to pull temperature alone out of that. You’re still going to have to deal with confounding factors like sunlight and moisture.

I’m still trying to figure out how good tree growth in the 20th century is a bad thing for us.

And the important bit I left out is that you apply the noise cancellation to ALL of the tree-ring samples, not throw out those you don’t like. Throwing out outlying results (extreme differences from the norm) is one thing, but you have to throw out the outliers at both extremes.

From what I’ve been able to gather from Steve’s work, they threw out everything except the outliers at one extreme that fit their ideal. If anything, those particular samples are the ones that should have been tossed.
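The difference between trimming both tails and trimming one is easy to see on a toy sample. A minimal sketch with hypothetical numbers, symmetric about zero: dropping the single lowest and single highest value leaves the mean at 0, while dropping only the two lowest values pushes it to 1, even though no new information was added.

```python
from statistics import mean


def trim_both_tails(values, k):
    """Drop the k smallest and the k largest values."""
    s = sorted(values)
    return s[k:len(s) - k]


def trim_low_tail(values, k):
    """Drop only the k smallest values, keeping the high tail intact."""
    return sorted(values)[k:]


# Toy sample, symmetric about zero (hypothetical numbers).
sample = [-3, -2, -1, 0, 1, 2, 3]
print(mean(sample), mean(trim_both_tails(sample, 1)), mean(trim_low_tail(sample, 2)))
```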

I asked the people at RealClimate about tossing out some samples, and they told me that it was based on “…physical, chemical and biological” criteria that were determined a priori. IF that were true (and the quote from Esper above suggests it’s NOT), then I guess that would be a more reasonable thing to do. Yet here is Esper saying, as far as I can tell, that they are throwing out the data that doesn’t fit their idea of a “signal”, i.e., they are not throwing it out BEFORE they do the analysis, but AS they are doing it. And ISTM that the PC method used in MBH does *exactly* the same thing with what’s left: automated cherry-picking after manual cherry-picking. It’d be surprising NOT to find what you’re looking for using *that* method.

Oblique, but not off-topic: I just read a book called “The Fly in the Cathedral”, a great and engaging book about the smashing of the atom at the Cavendish. It’s a very interesting read, and it offers a picture of science right before it became such a huge enterprise.

Rutherford and his people had been surprised by some nuclear disintegration results a famous (I forget which) German laboratory was getting. At stake were some very fundamental properties of the newly discovered nucleus, and bragging rights, so the situation was tense from the start. The interactions between the labs got somewhat heated, but eventually Rutherford was invited to come look at the setup the Germans were using.

At the time, before the advent of Geiger counters or other electronic means, the process of measuring nuclear disintegrations involved literally counting, by eye, scintillations from zinc sulfide coated glass. This took keen powers of observation, dark-adapted eyes, and the patience of Job.

When Rutherford observed the situation, he found that relatively untrained clerical workers were being used to do the counting. Further, he found that the experimenters were telling the counters what results they expected. Lo and behold, those were the results they always measured.
When controls were in place (i.e. no preconceptions given to the counters) the mysterious effects went away. The Germans were chastened, but everyone accepted the findings.

So, and I ask this naively, but seriously: is the only criterion for including a tree that it shows a trend that the experimenter already believes to be there, and the only criterion for excluding a tree that it does not show the expected trend? Is there no more theoretical justification than this, or did I miss it?

Of course I had Cavendish as a person for some reason until someone corrected it.

Along those lines, again somewhat tangential but illustrating the same point: during the Castle Bravo test of the first dry-fuel fusion bomb, the device ran away to three times the expected yield (irradiating scientists and Japanese fishermen).

What they later determined was basically this: the nuclear physicists (second only to rocket scientists in intelligence status?) had used lithium deuteride as the fusion fuel. The lithium in it is a mix of Li-6 and Li-7. It was known that Li-6 would absorb a neutron and split into a helium nucleus and a tritium nucleus, adding tritium to the fusion reaction.

They did all the calculations and determined that Li-7 would not contribute in this way.

Most of the lithium in the device was Li-7, which they did not think would take part in the reaction.

What it actually did, when struck by fast neutrons, was break apart into tritium, helium, and an extra neutron, feeding both more fuel and more neutrons into the reaction.

The point I’m trying to illustrate is that even the smartest of scientists (which I think we can agree the postwar Los Alamos group was/is) can sometimes miss something that has a drastic effect on the results. In this case the results showed that something had gone wrong. In paleoclimate reconstructions, since we do not have an actual instrumental record, you can’t just say “You figured this, but the actual readings were different.”

Of course if we had the actual readings the reconstruction wouldn’t be necessary.

Dave Eaton
Great story about the English and German labs. The main reason that the bias in the German lab was discovered was that there was competition between the two labs. So the English had an incentive to “audit” the Germans’ results.

Why don’t we have more academic competition in the case of these climate studies? Too much is accepted on the word of the researchers who publish their results. Is it because of the politics of the IPCC?

I hate to attribute it to the political culture of academia and the UN. But when I do the mental experiment of thinking of other politically correct subjects and whether there would be true critical analysis in the academic world, I have to say that I do not have much confidence in their objectivity.

#14 – John, I’ve been mulling over the possibility that all these international meetings to build consensus may be reducing traditional academic criticism. For example, it’s a lot easier criticizing D’Arrigo or Hegerl when you don’t know them. In real life, they seem to be pleasant and decent people, so once you know them, it’s difficult to be hard-edged about criticizing what they write, even though it should be done. By going to IPCC meetings and all the international conferences, they all become friends and write papers together. So instead of Esper and Schweingruber criticizing Jacoby and D’Arrigo and vice versa, maybe they pull their punches. Just a thought.

A guy in my email group just brought the paper below to our joint attention. It concerns epidemiological studies, but I thought it had a lot to say about what’s going on in proxy climatology, as illustrated here in Climateaudit. Especially the part that says studies are *less* likely to be true when there is “lesser preselection of tested relationships [and] there is greater flexibility in designs, definitions, outcomes, and analytical modes.” That seems to encompass dendroclimatology, in spades.

Here’s the title and abstract:

Why Most Published Research Findings Are False

John P. A. Ioannidis
Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr
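The framework in the abstract comes down to one expression given in the body of the paper: the post-study probability (positive predictive value) that a claimed relationship is true. A minimal Python sketch, with R the pre-study ratio of true to null relationships probed, alpha the significance level, beta the type II error rate, and u the bias term (the share of analyses that would not have produced a finding but get reported as one anyway). The numeric inputs below are illustrative, not taken from any climate study.

```python
def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    """Post-study probability that a claimed finding is true.

    R     -- pre-study odds: true relationships per null relationship probed
    alpha -- type I error rate
    beta  -- type II error rate (power = 1 - beta)
    u     -- bias: share of analyses that would not have produced a finding
             but are reported as one anyway
    """
    true_claims = R * (1 - beta) + u * beta * R
    false_claims = alpha + u * (1 - alpha)
    return true_claims / (true_claims + false_claims)


print(round(ppv(R=1.0), 3))                   # 0.941: well-powered, plausible hypotheses
print(round(ppv(R=0.1, beta=0.8), 3))         # 0.286: long-shot hypotheses, low power
print(round(ppv(R=0.1, beta=0.8, u=0.3), 3))  # 0.116: the same, plus analytic flexibility
```

"Greater flexibility in designs, definitions, outcomes, and analytical modes" enters through u: the more freedom there is to pick series and methods after seeing the data, the larger u gets and the lower the probability that the published claim is true.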

I find that the culture in chemistry is a little different from that of physics, and more different still from that of climate science. Even the more typical, reliable reactions one might run are so dependent on so many factors (people write books on nothing but solvation effects in reactions, for instance) that although one starts with some end in mind, no one really ‘predicts’ what will happen with any authority, and even if they do, the rest of us wait for the pudding before declaring that it tastes good. Hence, we tend to really scrutinize our data, because really, it’s all you’ve got for certain.

As a post doc, I worked with physicists, who were often very certain (and thus often very wrong) how an experiment would come out. Not a knock on them- they were also very careful with their data. But they were very model-driven, nonetheless, and somewhat dependent on the theorizers.

There really aren’t ‘theoretical chemists’ in exactly the same way there are theoretical physicists who predict in detail the outcome of experiments. They exist in name, but I really think the function is different. So I am admittedly not that familiar with the sort of theorizing that physicists do and that I think prevails in climatology.

ETSid’s point is interesting. While an atomic bomb is pretty complex, it is still more predictable than climate, and there one sees predictions coming up short. I suppose all that theoretical climate scientists have are models, since real-time measurements over epochs are kinda hard. But given all that, I wonder that people are so ready to trust their predictions in the absence of demonstrably reliable models or particularly good data. It seems to me that the handling of data, especially data that did not fit the model, would be especially important. Just not including it seems nutty, but I’m not a climatologist by any stretch. Unless you can explain why you don’t include it, it just looks to me like you are deliberately ignoring counterevidence. That can’t be good, can it?

If there is an ‘a priori’ reason, let’s hear it. But the writing of Esper and the offhand comments of D’Arrigo suggest otherwise.

There seem to be quite a few chemists in here. I spent a lot of time at the bench, running calibration curves for instruments and estimating limits of detection. One thing I learned is that throwing out data points is extremely perilous. Even if you think a point is way out of line, it usually turns out to be part of your overall variability.

#16, Steve, yes, I understand that many academics are “nice” people. Still, that does not explain a lot about the lack of critical analysis of each other’s papers.

When I was in graduate school in the 1970s, the Monetarist-Keynesian debate was in full swing. The adherents to both schools went to the same national meetings, but that did not stop them from deconstructing each other’s models with great glee and trashing their results. Of course, there was an element of Democrat v. Republican in there to provide competition.

As far as being nice to each other, they were polite but mostly kept to their own crowd. So the most important element was that there were two schools of analysis and they didn’t let each other get away with the nonsense that we see in climate “science.”

It’s very telling, IMO, that none of those that are so critical of the “skeptics” will address the cherry-picking nonsense. Maybe there’s a discussion being conducted somewhere else? It is an extremely glaring issue, and they are going to have to address it sometime.

RE: “Hannah Arendt wrote a book a long time ago, about Eichmann and the banality of evil. She made the point that the outlook of what is normal can shift imperceptibly, but steadily, taking people’s minds along with it. People become acclimatized to the bizarre and, eventually, their sense of social normalcy is what would in the past have been considered insane. That’s an extreme example”

It is not as extreme as you may imagine it to be. One of my hobbies is the study of past totalitarian outbreaks and of the ongoing reluctance of Western intellectuals and leaders to truly confront it as the cancer it is. What do we do to free the patient of cancer? Only total eradication will suffice. This cancer analogy is interesting. I believe that after WW2, the cancer went into remission but not full remission. Anyone objectively analyzing the world today can see that it is spreading again. And just like way back when, totalitarian thought has this nasty way of endearing itself to those in the West who gravitate toward social engineering and utopianism. One of the things that made me wake up from my own past Gaia worship was a realization that I was starting to get sucked into that. I was starting to make excuses in my own mind for evil, terror and totalitarianism rising elsewhere in the world. It’s amazing how this rise of Gaia thought has so strongly paralleled the rise of a new strain of evil and totalitarianism. Go look at http://www.dieoff.org – the imprint of Margaret Sanger is there.

Re #19 David, it is just as perilous not to throw out data as it is to throw it out. If we have a subset of data which we have good reason to believe comes from a different population, then retaining that data will bias our estimate of the population in question, by an amount that depends on the size of the subset and on how far the two population parameters diverge.
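For a mean, the size of that bias is simple arithmetic: if a fraction w of the sample comes from a population with mean mu_other instead of the target mean mu_target, the pooled mean is off by w * (mu_other - mu_target). A minimal sketch with hypothetical numbers; note that it takes the foreign subset as already correctly identified, which is the part the rest of this thread argues about.

```python
def pooled_mean_bias(mu_target, mu_other, frac_other):
    """Bias of a pooled mean when a fraction frac_other of the sample is drawn
    from a different population (mean mu_other) instead of the target (mean mu_target)."""
    pooled = (1 - frac_other) * mu_target + frac_other * mu_other
    return pooled - mu_target  # equals frac_other * (mu_other - mu_target)


# Hypothetical: 20% of series come from a population whose mean is 1.5 units higher.
print(round(pooled_mean_bias(0.0, 1.5, 0.2), 3))  # 0.3
```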

Along the same lines, let me offer a rationale for selecting (I won’t say “cherry picking”) one set of data series over another. Suppose we know that ring width or density is a function of temperature and other variables:
Y(i,t) = f(X(i,t), Z1(i,t), …, Zk(i,t)), where i denotes the particular series and t the time.
When we ran our regressions (during our initial research into the relationship of this species to various environmental variables), we could not reject the equality of coefficients from one series to another, even at the 10% level. Hence we have concluded (now “know”) the relationship Y = f(X).
We want to use Y as a proxy for X (sorry to those who want to see a T for temperature, but during my novitiate I was badly beaten every time I used “i”, “j”, “k”, “n” or “t” as a variable name), but some of our series (call this index set i1) do not correlate well with X while others (set i2) do. Now remember, we “know” the partial correlation between Y and X (holding the other variables constant) is strong. That’s what we tested. We figure that the variation in the Z’s is confounding the correlation between Y and X for set i1, so we go with i2.
I have just selected a set of series based on correlation. Is this cherry picking? On the surface yes, but in essence no. I think the test is that the moment the cherry-picking issue is raised, the researcher says, “Oh yes, I see it can look like that, but here are the background relationships and tests. See the incredibly stable partial correlations across all of the series? So we are not really selecting on correlation but rather on lack of variation in the Z’s.” (For all you wild econometrics fans: we are selecting on high multicollinearity in the Y-to-Z relationship.) If you don’t see something like that, then I guess cherry picking would be a maximum likelihood estimate of behavior. :-)
BTW, the fact that set i2 correlates well for one period does not guarantee it is good out of sample, unless there is reason (as opposed to wish) to infer that the Z’s will not confound the i2 series in the out-of-sample period the way they confounded the i1 series in the sample period.
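The test described above (simple correlations with X that vary across series, alongside partial correlations holding Z constant that stay stable) can be computed with the standard first-order partial correlation formula. A minimal sketch under loud assumptions: a single confounder z that has actually been measured, and hypothetical toy data in which both series respond to x identically and differ only in how much z leaks in.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def partial_corr(y, x, z):
    """First-order partial correlation of y and x, holding z constant."""
    r_yx, r_yz, r_xz = pearson(y, x), pearson(y, z), pearson(x, z)
    return (r_yx - r_yz * r_xz) / ((1 - r_yz ** 2) * (1 - r_xz ** 2)) ** 0.5


# Hypothetical data: x is the target variable, z a confounder.
x = [1, 2, 3, 4, 5, 6]
z = [3, -1, 4, -2, 5, 0]
y_quiet = [xi + 0.1 * zi for xi, zi in zip(x, z)]  # little confounding (like set i2)
y_noisy = [xi + 3.0 * zi for xi, zi in zip(x, z)]  # heavy confounding (like set i1)

for name, y in [("quiet", y_quiet), ("noisy", y_noisy)]:
    print(name, round(pearson(y, x), 2), round(partial_corr(y, x, z), 2))
# quiet 0.99 1.0
# noisy 0.16 1.0
```

Selecting on the simple correlation would keep only the "quiet" series, but the partial correlations show both series carry the same X relationship. If the Z's are not measured, this check cannot be run at all.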

The problem is that the Hockey Team creates automated cherry-picking algorithms which, as we’ve seen from Dave Stockwell’s reconstruction, produce something that looks like climate variation but in fact has no information in it at all.
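A toy version of that effect, in the spirit of the red-noise exercise mentioned earlier in the thread (the AR(1) model, phi = 0.9, the 80-step calibration window, the 0.3 cutoff and the seed are all assumptions for illustration, not Dave Stockwell's actual settings): generate series with no climate signal, keep only those whose last 80 points correlate with a rising trend standing in for the instrumental record, and average the survivors. The composite's calibration correlation is guaranteed to be at least the cutoff, because averaging series that each correlate positively with the trend cannot pull the correlation below the smallest of them. The earlier portion, where nothing was selected, tends to average out toward a flat line.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def red_noise(n, phi, rng):
    """AR(1) 'red noise': x[t] = phi * x[t-1] + Gaussian white noise. No climate signal."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def screened_composite(n_series=200, length=400, calib=80, phi=0.9, threshold=0.3, seed=1):
    """Keep only noise series whose last `calib` points correlate with a rising
    trend (a stand-in for the instrumental record), then average the survivors."""
    rng = random.Random(seed)
    trend = list(range(calib))
    survivors = []
    for _ in range(n_series):
        s = red_noise(length, phi, rng)
        if pearson(s[-calib:], trend) >= threshold:
            survivors.append(s)
    if not survivors:
        return [], 0
    return [mean(vals) for vals in zip(*survivors)], len(survivors)


composite, kept = screened_composite()
print(kept, "of 200 noise series survive the screen")
print("calibration r:", round(pearson(composite[-80:], list(range(80))), 2))
print("pre-calibration mean:", round(mean(composite[:-80]), 2))
```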

Then the belief engine kicks in: the data appears “robust” and fits the instrumental record with apparent high scores. Nobody at the journal questions it because they usually are ignorant of spurious correlations themselves. The “scientist” gets invited into the IPCC process…and so on.

But back in the real world, the original work is literally meaningless, but because it resembles a whole body of work which is also meaningless, nobody on the inside of the ivory tower is going to be anything other than supportive when some rogue Canadian statistician calls them on it.

#27-“nobody on the inside of the ivory tower” Dammit, John A, please stop being so universally and unqualifiedly damning in your references to academic scientists. I’m (almost) an academic scientist and work with them all the time, and am very _other_than_ “other than supportive” of our local Canadian rogue statistician. On the contrary, I am very, very grateful for him. I expect there are in fact academic scientists who have visited here that view things similarly.

A likely critical problem with dendroclimatology is that it is a small field. The few practitioners (how many are there... maybe a couple of dozen?*) have all developed and used the same methods and have accessed the same data. They have all reviewed one another’s papers and found them worthy. That means they have all committed to the same methodological routines and have accepted the same reality as normalcy. In that case, an attack on the methods of one is an attack on the methods of all.

Most of the rest of science includes a much larger number of people and many more diverse methods. The criticisms are not bound by a narrow range of methods, the population is large enough to permit wide competition, and that population is able (at least in principle) to express critical evaluation without any personal entanglements.

Just last year I published a paper that overthrew one of the central explanatory paradigms in my field (I hope definitively). That paradigm was — and perhaps still is — supported by virtually all the academic heavy-weights. I had no trouble publishing the paper in a first rank specialist journal and suffered no specially severe reviewer slings and arrows; nor editorial obstacles. The point is that Steve’s experience in climate science is, in my experience, just not typical of science practice.

I value and enjoy your comments, John, and understand your bitterness and suspicion. I feel it myself. But I cannot let go by in silence your unqualified damning of academic scientists and of the practice of science. It’s not as you portray it. Soapbox mode OFF. :-)

Pat Frank: your arguments make perfect sense, and I think you are right. BUT, where the hell are those academic folks when we need them. Why don’t they speak up? Surely they can see that a few bad apples are spoiling their whole damn orchard!

#29-Jae, you know what? Most of the academics I speak with believe in AGW. The propaganda has been that effective, and very few seem to take the trouble to read the published science for themselves. In my experience, very few scientists outside the climatology field even know of the controversy produced by Steve’s work.

But back in the real world, the original work is literally meaningless, but because it resembles a whole body of work which is also meaningless, nobody on the inside of the ivory tower is going to be anything other than supportive when some rogue Canadian statistician calls them on it.

What I meant was that an ivory tower of climate scientists has been built and occupied. I have the greatest respect for academics in a wide range of disciplines, such as Ross McKitrick, Chris Essex, Sallie Baliunas, Willie Soon, Doug Hoyt, Richard Lindzen, Pat Michaels, Ian Castles, David Henderson, George Taylor, Lubos Motl… (and I’d like to apologize to all the others who have been so kind to correspond with me for not mentioning them by name but I must go on)

If anything the building contractors for the Ivory Tower are the IPCC. I think in the normal course of events, MBH98 would have been widely ignored as an aberration had it not been for being the poster child of global warming in the TAR. As Steve has already related, the IPCC has institutionalized misconduct by refusing to properly audit the papers submitted to it.

I’m not anti-academic, and I’d like to make that clear.

I am a philosophical skeptic, which means that whatever proceeds from a person’s mouth or pen or keyboard in explanation of a phenomenon will be subject to checking, testing, and proper inquiry, especially for those who claim expertise. Since I am not an academic, I have not taken the oath of allegiance to other academics to not question them when they make outlandish claims based on flimsy support.

Almost every academic I’ve met (and my father was a college lecturer) has any number of wild, speculative ideas based on nothing more than hunches that are discussed with other academics who also have even more bizarre ideas. It goes with the territory of creativity that at any one time, a creative person will have 100 ideas of which 99 will be flat out wrong.

If the process of peer review means anything, it is supposed to block those 99 from seeing the light of day, but unfortunately it filters rather poorly and lets bad ideas that should have stayed in the Senior Common Room out to cause immense damage (for example, the notion of “global mean temperature”) that cannot be withdrawn. These memes cause damage and panic in the body politic.

Because the only “proof” of the A part of GW is in the flawed studies and models that we’ve been discussing for a year in here.

The point, so easy to miss apparently, is that extraordinary claims require extraordinary evidence and thus far, it seems we have nothing of the sort. That they keep pushing it as “settled” or “inconclusive” is propaganda at best.

It’s interesting to me that you’re “a philosophical skeptic.” Science is constructed to be objectively skeptical. I co-published a short paper in “Free Inquiry” in 2004 that discusses the difference, titled “Science is Not Philosophy.” If you’re interested in a pdf copy, let me know.

Academic scientists take no oath of allegiance to support the professional positions of other academic scientists. Fact-and-theory-based contention is the order of the day, normally.

Some years ago, I remember finding a book on Michel Serres, a philosopher at Stanford, written by another philosopher at Stanford. It wasn’t a hagiography but it expressed open admiration of him and his philosophy, as such. This sort of book does not get written by scientists about scientists. That is, books about the science of other scientists are admiring only to the extent the science itself has withstood the test of falsifying inquiry, and (apart from personal attributes) admiring of the scientist to the extent that s/he adhered to the scientific method. The reputation of past scientists can crash if it is discovered they falsified their results.

This sort of later fact-based judgment does not fall upon philosophers or theologians. The judgment criteria are different, resting only upon inner logic and coherence, and whether the expression is supple and eloquent. Relational consistency of external fact and internal theory (i.e., objectivity) does not enter at all.

One Trackback

[…] it is actually just the opposite. It reminds me of the wonderful quote from Esper 2003 (discussed here): this does not mean that one could not improve a chronology by reducing the number of series used […]