Greene: I am not, nor have I ever been a member of a data-mining discipline

Many of the problems in multiproxy studies have very strong parallels in econometrics. It’s econometricians – who know all about autocorrelated series and data mining – rather than statisticians who really should be involved in climate statistics.

Here are some quotes from Clinton Greene’s article on data mining, from an issue of J Econ Meth (2000) that was devoted to the topic.

One of Greene’s resonant conclusions is that sometimes analysts simply have to wait for more data to provide a true out-of-sample test. He points out that repetitive use of the same data sets makes statistical tests invalid. He uses the example of an econometrician who develops his theory from data up to 1980 waiting until 2010 to test it. There’s much to be said for this in the present quandary of proxy studies – if Gaspé cedars for example were a "key proxy" up to 1980, then there’s a very easy way to see if they are a proxy for temperature – bring the damn series up-to-date.

Here are some comments from Greene, then I’ll mention a comment by Bradley below.

From this perspective data-mining refers to invalid statistical testing as a result of naive over-use of a sample. In particular, the use of a sample both for learning-inspiration and for testing of that which was learned or mined from the sample. Any test of a theory or model is corrupted if the test is conducted using data which overlaps that of any previous empirical study used to suggest that theory or model. The moral is clear. Scientific and reliable knowledge requires experiments. And good econometrics free of distorted statistic distributions requires repeated investigation of data which is at least new, even if not experimentally controlled. Given the difficulty in economics of testing in an unexamined data set it would then seem that the corollary to “avoid data-mining” is “stick to pure theory”, a position taken by many.

But testing in un-mined data sets is a difficult standard to meet only to the extent one is impatient. There is a simple and honest way to avoid invalid testing. To be specific, suppose in 1980 one surveys the literature on money demand and decides the models could be improved.

File the proposed improvement away until 2010 and test the new model over data with a starting date of 1981. In this simple approach the development and testing of theory would be constrained to move at a pace set by the rate at which new data becomes available (at which experiments are conducted). That the development of science must be constrained by the pace of new experimentation seems obvious enough. The mistake is to suppose every new regression is a new experiment. Only new data represents a new experiment. I do not consider this a pessimistic outlook. This is because I think much can be learned from exploring a sample. Patience and slow methodical progress are virtuous. And the impatient can conduct forecast based stability and encompassing tests with only a few years of new data. But seeing economists behave as though they do not believe in the central role of constraints and of inputs in the production of reliable knowledge, certainly is grounds for pessimism….

Statistics from data-mined specifications provide informal but valuable evidence or suggestions. The claim that one specification is less data-mined than another is not sufficient to justify formal interpretation of regression statistics as in classical statistics. All are guilty and a measure of explicit data-mining does not discriminate between useful and un-useful work. In-sample “tests” are useful as design criteria but only out-of-sample tests are precisely meaningful applications of statistics. Without out-of-sample testing there is no distinction between running regressions and constructing historical (ex post) narratives….

But the most important fear of data-mining stems from legitimate doubts about the validity of most testing in over-worked time series data and from the false hope that if our own contribution to data-mining is small then our own research will somehow circumvent the problem of pre-test distortion of standard statistic distributions. But avoidance of explicit specification search will not cure our discomfort with sloppy in-sample “tests” nor will it insulate us from dishonest or deluded results. Specification search is the only way to learn from the data. Out-of-sample testing of data-mined specifications is the only way to conduct statistically valid tests and create reliable knowledge. All else represents a wish for scientific validity unconstrained by scientific inputs, technology, method and sensibility.
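Greene’s point about pre-test distortion can be illustrated with a small simulation. This is a hypothetical setup, not anything from Greene’s paper. A target series and 20 candidate "proxies" are generated as pure independent noise, so every true correlation is zero. The 81 in-sample points stand in for 1900-1980 and the 30 out-of-sample points for Greene’s 1981-2010 wait. The candidate that correlates best with the target in sample is then given a nominal two-sided 5% test, both in sample and out of sample. The critical value uses the Fisher z approximation, which is an assumption of this sketch. The helper names (`pearson`, `critical_r`, `mining_trial`, `rejection_rates`) are illustrative only.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def critical_r(n, z=1.96):
    """Approximate two-sided 5% critical |r| for n independent pairs (Fisher z-transform)."""
    return math.tanh(z / math.sqrt(n - 3))


def mining_trial(rng, n_candidates, n_in, n_out):
    """One trial in which every series is independent white noise (all true correlations are zero)."""
    n = n_in + n_out
    target = [rng.gauss(0, 1) for _ in range(n)]
    candidates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_candidates)]
    # Specification search: keep the candidate that fits best over the in-sample span only.
    best = max(candidates, key=lambda c: abs(pearson(c[:n_in], target[:n_in])))
    sig_in = abs(pearson(best[:n_in], target[:n_in])) > critical_r(n_in)
    # Out-of-sample test: the same, already-chosen candidate on data it never saw.
    sig_out = abs(pearson(best[n_in:], target[n_in:])) > critical_r(n_out)
    return sig_in, sig_out


def rejection_rates(n_trials=500, n_candidates=20, n_in=81, n_out=30, seed=1):
    """Fraction of trials in which the mined model looks 'significant' in and out of sample."""
    rng = random.Random(seed)
    hits_in = hits_out = 0
    for _ in range(n_trials):
        sig_in, sig_out = mining_trial(rng, n_candidates, n_in, n_out)
        hits_in += sig_in
        hits_out += sig_out
    return hits_in / n_trials, hits_out / n_trials


if __name__ == "__main__":
    in_rate, out_rate = rejection_rates()
    print(f"nominal 5% test passed in sample: {in_rate:.0%}, out of sample: {out_rate:.0%}")
```

With 20 candidates, the in-sample pass rate should land near 1 - 0.95**20, roughly 64%, even though nothing is there. The out-of-sample rate should stay near the nominal 5%. Only the second number is a valid test, because the fresh data played no part in choosing the specification.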

The Bradley comment that I’m thinking about is one at the UCAR press conference in 2005 to promote the hockey stick. Someone asked him what the proxies showed after 1980. Bradley said that they continue to function just fine after 1980, citing Oerlemans in a just-published Science paper. Now I think that glaciers going up and down are probably a pretty decent proxy for temperature, but citing Oerlemans is a bait-and-switch. Glacier movements were not a proxy in ANY of the multi-proxy studies, and Oerlemans only goes back through the LIA. The question is whether bristlecones or tree ring densities function as temperature proxies after 1980. Most of the evidence is that they don’t – the Divergence Problem.

If Greene or some other econometrician were asked to consider the Divergence Problem, there would be little doubt as to the conclusion – the model of a linear relationship between ring width and temperature extending to warmer temperature ranges was invalidated. End of story. If someone wanted to provide a new model, then it would have to be tested, but the last one failed. In D’Arrigo et al 2006, Rob Wilson’s good influence at least ensured that they admitted the post-1985 failure, but they astonishingly went on to compare MWP and modern levels after admitting the failure.
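The Greene-style test of a claimed linear proxy relationship is a split-period check. You calibrate on the early period, then ask whether the same line still holds on later, warmer data. This is a purely illustrative sketch, and its assumptions do not come from the post. The temperature and proxy series are synthetic. The hypothetical proxy tracks temperature linearly below an arbitrary `cap` and stops responding above it. Temperature is regressed on the proxy, which is one calibration choice among several. 1980 is used as the cutoff. The helper names (`ols`, `split_period_check`, `synthetic_series`) are illustrative only.

```python
import math
import random


def ols(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def rmse(residuals):
    """Root-mean-square of a list of residuals."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def split_period_check(years, temps, proxy, calib_end):
    """Calibrate temperature on the proxy for years <= calib_end, then verify on later years."""
    cal = [i for i, yr in enumerate(years) if yr <= calib_end]
    ver = [i for i, yr in enumerate(years) if yr > calib_end]
    a, b = ols([proxy[i] for i in cal], [temps[i] for i in cal])

    def residuals(idx):
        # Observed minus reconstructed temperature.
        return [temps[i] - (a + b * proxy[i]) for i in idx]

    res_cal, res_ver = residuals(cal), residuals(ver)
    return {
        "calib_rmse": rmse(res_cal),
        "verif_rmse": rmse(res_ver),
        # Positive bias: the reconstruction under-predicts the observed warmth.
        "verif_bias": sum(res_ver) / len(res_ver),
    }


def synthetic_series(cap, seed=0):
    """Hypothetical data: a warming trend plus noise. The proxy follows temperature
    linearly below `cap` and stops responding above it (an assumed response)."""
    rng = random.Random(seed)
    years = list(range(1900, 2011))
    temps = [0.015 * (yr - 1900) + rng.gauss(0, 0.1) for yr in years]
    proxy = [min(t, cap) + rng.gauss(0, 0.05) for t in temps]
    return years, temps, proxy


if __name__ == "__main__":
    for cap in (float("inf"), 1.3):
        stats = split_period_check(*synthetic_series(cap), calib_end=1980)
        print(cap, {k: round(v, 3) for k, v in stats.items()})
```

If the linear model really held, verification errors should be comparable to calibration errors, which is the uncapped case. In the capped case, the verification RMSE should come out several times the calibration RMSE, with a positive bias. No amount of in-sample fit over the calibration period would reveal that failure.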

Clinton A. Greene (2000) “I am not, nor have I ever been a member of a data-mining discipline” Journal of Economic Methodology 7(2) 217-230

9 Comments

There is a climatologically apt irony present in the immediately preceding paper in the same issue of the journal. Here’s the abstract of the paper by Pagan and Veall, with the relevant parts bolded:

“We maintain that the actions of researchers show that data mining is a necessary part of econometric inquiry. We analyse this phenomenon using the analogy of an industry producing a product (econometric analyses). There is a risk of selective reporting as Mayer indicates but we argue that other researchers (competition) will ensure that the sensitivity of truly important findings is checked. Hence, initial researchers have an incentive to analyse sensitivity from the beginning and so produce a quality product. Some suggestions are made towards encouraging this process. The ‘general to specific’ approach to data mining as promoted by Hoover and Perez can be valuable but it is premature to eliminate other strategies.”

The bolded part immediately elucidates the problem brought to proxy climatology by Michael Mann. His obscure language in publication, his secrecy concerning methodology, and his continuing obstructionism have frustrated the “competition” that should have had the opportunity to check his results. (One might say the same about Phil Jones.)

This analogy has the further benefit of putting Wegman’s social network analysis in its proper light. That is, by roping most proxy workers into a collaborative relationship, Mann (deliberately or not) ensured there is no competition at all. He produced in the proxy community the business-equivalent of an interlocking directorate. The result was monopoly and restraint of trade. The “incentive to analyse sensitivity from the beginning and so produce a quality product” was entirely short-circuited.

As a result, the product was not quality, as you and Ross have shown.

Unfortunately for us all, Mann’s product had an addictive ingredient that has enslaved the minds of his consumer base. They are very angry about being exposed and very threatened by the incipient removal of a product that eases so much (psychological) pain, and are going to great lengths to keep it in the market.

A. R. Pagan and M. R. Veall (2000) “Data mining and the econometrics industry: comments on the papers of Mayer and of Hoover and Perez” Journal of Economic Methodology 7(2) 211-216

It’s common sense that you become less critical of people when you know them. I’ve corresponded with Rob Wilson; he’s been pleasant to me and I’d be pretty reluctant to get hard-edged with him.

Wouldn’t all the climate conferences have an impact? When I went to university many years ago, I’m sure that professors weren’t haring off to conferences all over the world every few weeks. Now they see each other a lot and that must surely smooth things over.

#2, I can only speak from my own experience in chemistry, which is that people put their own ideas ahead of professional friendship. Among the professionally ethical, their ideas must be analytically defensible. Among the less ethical (often those with the most ego-driven ambitions), their ideas are defended, whether ultimately analytically defensible or not. I’ve seen intense debates, even publicly in conferences, between prominent scientists who knew one another well, but were contesting mutually exclusive interpretations. (I’ve had a couple such debates myself.)

So, it often doesn’t matter whether people know one another or not, or have met at conferences. It matters whether someone is advancing ideas that contradict one’s own, and what one thinks about the meaning of the relevant data. Ideas are, after all, the currency of science. Those who have the best ones are the richest. That’s what the fight’s all about, and the passions are real.

People of good personality, like Rob Wilson, almost always will gracefully concede if they’re shown wrong through an unarguable analysis. Others will show bad grace, right or wrong. The personal lesson from practicing science is that it is always better to not be ego-involved in the outcome. The challenge is to be thoughtful, not to be right. Nature decides who’s right, and nature is always trickier than anyone knows.

The critical ingredient leading to clarity, as always, is that data and methods be fully exposed. I’ve never, ever, seen anything in chemistry like the deliberate and dishonest obscurantism that infects climate science.

The term “data mining” as I know it refers to a method of scanning corporate archives for useful legacy data. Wikipedia describes the misuse of statistical data as “data dredging” or “data fishing”.

Data mining has been defined as “the nontrivial extraction of implicit, previously unknown, and potentially useful information from data” [1] and “the science of extracting useful information from large data sets or databases” [2].

Used in the technical context of data warehousing and analysis, the term “data mining” is neutral. However, it sometimes has a more pejorative usage that implies imposing patterns (and particularly causal relationships) on data where none exist.[citation needed] This imposition of irrelevant, misleading or trivial attribute correlation is more properly criticized as “data dredging” in the statistical literature. Another term for this misuse of statistics is data fishing.

Articles on “data mining” usually distinguish between the senses in which it is used.

“Data snooping” is another term which may be more apt for what’s going on with the multiproxy studies. The point is that Osborn and Briffa or Hegerl know in advance what particular proxies look like and so the data has been “snooped” in advance and just as in the econometric cases, statistical tests no longer apply as they would to fresh data.

At least back in 1988 when I took my first Econometrics class, “data mining” had a specific meaning.

It is used to refer to the practice of looking at the data (graphing, computing descriptive statistics, running regressions, reading previous studies using the same data set, etc.) and *then* coming up with hypotheses that match the patterns in the data.

Since the hypotheses are formed to conform to whatever is seen in the data, there is no chance they can be rejected. If, instead, the data set were used only to form ideas about a model, and the model were then tested against a separate but equally applicable data set, that would be a proper application.

If you torture a data set enough, it will confess to what you want to hear.

When I look at climate related work, I do not see many testable hypotheses. I see a lot of maintained hypotheses. And, an alarming tendency to equate correlation with causation.

The idea seems to be to search for ways to construct series that correlate with human activity without seriously questioning the connection between actual temperature movements and the constructed series.