Demetris Koutsoyannis at 11th Statistical Climatology Meeting

The aim of IMSC is to promote good statistical practice in the atmospheric and climate sciences and to maintain and enhance the lines of communication between the atmospheric and statistical science communities.

Geoffrey Boulton’s associate, Gabrielle Hegerl, was an organizer. One of the important sessions was entitled “Reconstructing and understanding climate change over the Holocene”.

It’s a different sort of stochastic process from the all-too-artificial ARMA processes that dominate present-day analyses, and it is well worth paying attention to.

Demetris wrote me to say that he attended on Tuesday and Wednesday, including the session “Reconstructing and understanding climate over the Holocene” (see p. 18 of the “final program and logistics”), and commended the first presentation, by Heinz Wanner (“Holocene climate change – facts and mysteries”).

He said that Mann’s talk included an interesting cartoon with an ensemble of hockey sticks, one of which is being broken by an angry guy.

He reported that Climategate came up in a talk by Reinhard Böhm, who, according to Demetris, “promoted a thesis that it is dangerous to put raw data open on the internet because some would misuse them.” Demetris said that, in his own talk the next day, he tried to respond to this (and “to comment on Mann’s cartoon with the ensemble of hockey sticks – but Mann wasn’t there”).

Demetris sends the following extended commentary on his exchange with Böhm.

One of the interesting talks I attended at the 11th International Meeting on Statistical Climatology (http://cccma.seos.uvic.ca/imsc/11imsc.shtml, University of Edinburgh, Scotland, 12-16 July 2010) was that by Reinhard Böhm. He is the author of a recent (2008) book in German, “Heiße Luft – Reizwort Klimawandel: Fakten – Ängste – Geschäfte” (Hot air – buzzword climate change: facts – fears – business). His talk was given in the session “Climate Data Homogenization and Climate trend/Variability Assessment” and was entitled “Bridging the gap from indirect to direct climate data – experience with homogenizing long climate time series in the early instrumental period”. The long abstract can be seen on p. 90 of the book of abstracts, accessible from http://cccma.seos.uvic.ca/imsc/11imsc/final_program_abstracts.pdf. In his talk he referred to Climategate and discussed the question of whether original climatic data should be available to the public or not. His main point was that the original data are contaminated with biases and inhomogeneities and thus need homogenization. Therefore, only processed data are useful and should be available to the public.

I disagree with this thesis and I addressed three questions to him at the end of his talk:
1. If I homogenize a data set of an area, do you think that there might be a possibility that I introduce more biases than were originally contained?
2. If you studied the climate of that area, would you rely solely on my processed data, or would you also retrieve the original data?
3. Do you think that the original data should be available to the interested scientists or not?

To my question 1 he replied “yes”, which I appreciate, given that I believe that standard procedures for consistency checking and homogenization are strongly affected by inappropriate statistical assumptions (e.g. iid variables with exponential distribution tails), which are invalidated in the real world. To question 2 he replied that, if I explain the procedures I followed, he would rely on my processed data. About question 3 he said (if I understood well) that a reply would need a long time, but in brief the raw data should not be available on the internet because some could misuse them, e.g. by choosing only a few stations that demonstrate a specific behaviour that they want to advocate.

I was not happy about the last answer and, next day, I found the opportunity to reply indirectly, using the first slide of my own talk (http://www.itia.ntua.gr/en/docinfo/991/), which contains the title and the web link to my presentation. I said that the online availability is not just for this presentation. Rather, in my group we believe in transparency and have agreed that everything we produce, papers, reports, data, etc., should be openly available on the internet. And I continued “Please feel free to misuse them but, also, please be advised that transparency is the most powerful weapon against misuse”.

This universe, which is the same for all, has not been made by any god or man, but it always has been, is, and will be an ever-living fire, kindling itself by regular measures and going out by regular measures.

Interesting use of Hurst-Kolmogorov statistics to model long-term system effects of climate. If man-made global warming were a fact, you might see changes in the HK parameter at some point in time. Maybe going back thousands of years you find a value of 0.84, but then recently something changes and 0.78 seems to work better. I wonder if Mr. K has looked into this?
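The commenter’s suggestion (fitting H separately to different epochs of a record and comparing the values) is easy to prototype. Below is a minimal sketch of the aggregated-variance estimator of the Hurst coefficient; the function name and choice of scales are illustrative, not taken from any of the papers discussed. For an HK (fGn-like) process, the variance of m-value block means scales as m^(2H−2), so H follows from the slope of log-variance against log-scale:

```python
import numpy as np

def hurst_aggvar(x, scales=(1, 2, 4, 8, 16, 32)):
    """Estimate the Hurst coefficient H by the aggregated-variance
    method: for an HK (fGn-like) process the variance of the means of
    m-value blocks scales as m**(2H - 2), so H comes from the slope of
    log-variance against log-scale."""
    x = np.asarray(x, dtype=float)
    logm, logv = [], []
    for m in scales:
        n_blocks = len(x) // m
        if n_blocks < 8:          # too few blocks -> unreliable variance
            continue
        block_means = x[:n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        logm.append(np.log(m))
        logv.append(np.log(block_means.var()))
    slope = np.polyfit(logm, logv, 1)[0]
    return 1.0 + slope / 2.0

# Sanity check: white noise has no persistence, so H should come out
# near 0.5.  Splitting a long record into epochs and comparing the two
# estimates would probe the commenter's question about a changing H.
rng = np.random.default_rng(0)
h = hurst_aggvar(rng.normal(size=20000))
```

Note that sampling variability alone makes H estimates from short epochs quite uncertain, which is part of why epoch-to-epoch differences like 0.84 vs 0.78 need careful significance checking.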

On page 28 of the linked presentation, an analysis of the Greenland ice core temperature proxy is included, yielding a Hurst parameter of 0.94, very similar to the 0.95 of the recent Svalbard temperature series. I think this is also similar to the HadCRU global temperature series. It seems that the lower troposphere temperature from MSU satellites has a slightly higher value.

Care is required when assessing proxy measures though, as the statistical processes applied to the proxies may not be neutral with regard to the HK behaviour. This particularly applies, for example, to the multi-proxy studies where sampling criteria and calibration can have a big impact on the modern / historic relationship.

On a separate note, it is great to see Anagnostopoulos et al referenced as being “in press”. I did wonder if this paper would be blocked at the peer review stage, but it seems it will now be published. For me, whether these studies are dismissed or ignored in AR5 will be an important marker as to whether the IPCC have turned over a new leaf.

The idea that raw climatic data is comprehensible only to the self-anointed cognoscenti who call themselves “climate scientists” smacks not only of academic hubris, but of Orwellian censorship in the guise of “homogenization.” I’m happy to see that Dr. K stood up against such self-serving nonsense.

I’m not that happy to see Dr. K claim in his presentation that Papoulis treats only “memoryless” stochastic processes. As a thoroughly state-of-the-art EE, Papoulis is certainly aware of capacitance (i.e., storage or memory) effects that give most physical systems a decaying impulse response function. This gives rise to outputs with transient autocorrelation functions even under pure white-noise inputs. There is a well-established analytic framework, well known in system analysis, for characterizing not only the output signal but its relationship to the forcing or input signal for a very wide range of systems and inputs, including narrow-band “cyclical” signals.

The HK formalism that Dr. K embraces is very much more limited, confining itself to non-negative acfs and strict power-law spectral structures. This formalism necessarily excludes “cyclical” signals. But such signals are ubiquitous in geophysics. I’m particularly unhappy with the proposal of a stochastic process consisting of randomly distributed step-changes and white noise. Since physical systems are driven by energy, any true step-changes are very unlikely in the geophysical setting.

The old HK formalism certainly has its place in the analytic firmament, but it’s not the method of choice in dealing with real-world geophysical data.

I’m not that happy to see Dr. K claim in his presentation that Papoulis treats only “memoryless” stochastic processes.

No, I do not say that. I say that, according to Papoulis (and I agree with him), memory is a notion referring to systems and not to processes. Thus, we can have systems that are memoryless or not memoryless. Naturally, his book covers both.

The HK formalism that Dr. K embraces is very much more limited, confining itself to non-negative acfs and strict power-law spectral structures. This formalism necessarily excludes “cyclical” signals. But such signals are ubiquitous in geophysics.

It depends on what you mean by “cyclical”. Strictly cyclical, with constant period? Then the cyclical behaviour does not need a stochastic description. Fluctuating behaviour with ups and downs but without constant period? Then the important thing is whether such fluctuations demonstrate persistence or antipersistence. For persistent behaviours, as in all examples I show in the presentation, the HK framework is good and is the simplest – but you can extend it by adding more parameters if you do not like parsimony. For antipersistent behaviours, HK is not sufficient and certainly needs extension. See my recent paper: Koutsoyiannis, D., A random walk on water, Hydrology and Earth System Sciences, 14, 585–601, 2010 (http://www.itia.ntua.gr/en/docinfo/923/)

I’m particularly unhappy with the proposal of a stochastic process consisting of randomly distributed step-changes and white noise. Since physical systems are driven by energy, any true step-changes are very unlikely in the geophysical setting.

This is just a working example to demonstrate and explain that “memory” is a misleading concept and term. I propose replacing “memory” with “change”. But I do not propose to use this as a full description of the natural process. I propose HK instead.

“It depends on what you mean by “cyclical”. Strictly cyclical, with constant period? Then the cyclical behaviour does not need a stochastic description. Fluctuating behaviour with ups and downs but without constant period?”

Or perhaps a cyclo-stationary SP with a very rich spectral density — some powerful components of which are of very long wavelength (~100,000 years and more, as evidenced by ice-age periodicity) while others are of very short wavelength (seasonal, annual and decadal oscillations). Constant period? — eventually; but it could be a very long time. (Would such a series contain a unit root when analyzed over a short time interval of, say, 2,000 years? Might it appear non-stationary over such a time interval?)

It’s like the old golfing joke — golfer turns to his caddie and says, you think I can get a 4 iron there? Caddie pauses, looks at the golfer and says, ‘eventually’.

You state:”…memory is a notion referring to systems and not to processes.”

The concept of memory is no less applicable to processes that are derived from the white-noise “innovations” process, e.g., AR or IMA processes, or combinations of both. Unlike the white-noise process, whose acf is the Dirac delta function, the acf of these processes is a transient function of the lag variable.
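The point about transient acfs can be checked numerically. The sketch below (with names and parameters of my own choosing, purely for illustration) simulates an AR(1) process driven by white-noise innovations and computes its sample acf, which decays as phi^k rather than vanishing beyond lag 0 as the white-noise acf does:

```python
import numpy as np

def ar1_sample_acf(phi, n=50000, maxlag=5, seed=1):
    """Simulate x[t] = phi*x[t-1] + e[t] (white-noise innovations e)
    and return the sample acf r(0..maxlag).  For AR(1) the acf decays
    as phi**k -- a transient function of the lag, unlike the
    white-noise acf, which is zero beyond lag 0."""
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    x = np.empty(n)
    x[0] = e[0]
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(maxlag + 1)])

acf = ar1_sample_acf(0.6)   # acf[k] should be close to 0.6**k
```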

The reason I put “cyclical” in quotes is to provide an indication that I’m not referring to strictly periodic functions (e.g. diurnal planetary rotation) whose acfs are likewise strictly periodic and of the same fundamental period, rather than transient. I’m referring to irregular cycles such as those widely observed in a host of geophysical processes (e.g., ocean waves, quasi-biennial oscillation of upper troposphere winds, etc.). Their acfs take the oscillatory form of damped sinusoidal functions–quite distinct from the always-positive, monotonically decaying acfs assumed in HK formalism. Their power spectra (Fourier transforms of acfs) show strong peaks and valleys, rather than the monotonically decaying power-law spectral structures of red or pink noise.
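A damped-sinusoidal acf of the kind described arises, for example, from an AR(2) process with complex characteristic roots. The following sketch (parameters chosen by me for illustration, not taken from any dataset mentioned in the thread) shows the sample acf turning negative near half the nominal period, which no always-positive power-law acf can reproduce:

```python
import numpy as np

def ar2_acf(r=0.9, period=8.0, n=200000, maxlag=8, seed=2):
    """Simulate an AR(2) process x[t] = a1*x[t-1] + a2*x[t-2] + e[t]
    with complex characteristic roots (modulus r, angle 2*pi/period).
    Its acf is a damped sinusoid with negative lobes, and its power
    spectrum has a peak near 1/period."""
    w = 2 * np.pi / period
    a1, a2 = 2 * r * np.cos(w), -r * r
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    x -= x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(maxlag + 1)])

acf = ar2_acf()   # oscillatory: positive at small lags, negative near lag 4
```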

HK does have its place in analyzing, say, turbulent diffusion processes with corresponding spectral structures. But it provides a model that is ill-adapted to the analysis of oscillatory phenomena. It is such phenomena that are ubiquitous in the records of climate variables.

The concept of memory is no less applicable to processes that are derived from the white-noise “innovations” process

Yes, if you put such “white-noise innovations” at the focus of your view of nature. This is usual, but in my opinion misleading (even if you use “filters” to introduce “memory”). For instance, it leads to the perception of a static climate, as shown in my slides.

But it provides a model that is ill-adapted to the analysis of oscillatory phenomena.

If you find some time, try to read my “A random walk on water” paper, which starts from an oscillatory process.

I thought we were discussing the basic assumptions of HK formalism, which often lead to the estimation of the Hurst coefficient from acf(1) alone, versus exploiting the entire sample acf, as is done in power spectrum estimation.

My view of nature, based on physics as well as signal analysis, encompasses nonlinearities and nonstationarities. That is an entirely different subject, however.

sky, my impression is that you are a modern guy 🙂 and I am not surprised that your view of nature encompasses fashionable notions such as “nonlinearities” and “nonstationarities”. But I am a traditional guy, who tries to explore very old wisdom contained in the fragments of Heracleitus, the logic of Aristotle, etc., sometimes in conjunction with modern wisdom (see my #comment-236134 above re. the contrast of Heracleitus with Einstein).

May I, therefore, draw your attention to Antisthenes and his quote “The start of wisdom is the visit (study) of names” — see more details in my presentation in http://www.itia.ntua.gr/en/docinfo/944/

In this respect, may I ask you, how do you define nonstationarity(ies)? (See my own view in the presentation linked above.) Are you aware that, when in your comment above you write “acf(1)”, you have tacitly assumed stationarity (otherwise, you would have to write acf(t, t+1))? Are you aware that when you write “the estimation of the Hurst coefficient” you have already assumed stationarity and ergodicity?

May I also ask you, how do you define nonlinearity(ies)? What is the domain of your definition? Deterministic or stochastic? Note that a stochastic approach need not be nonlinear to produce trajectories that resemble real-world phenomena, whereas a linear deterministic approach is generally a poor representation of reality. Moreover, the principle of maximum entropy with the Boltzmann-Gibbs-Shannon definition of entropy leads to linear dependence of variables (in a stochastic framework, of course). See more details in http://www.itia.ntua.gr/en/docinfo/799/

After many decades of professional experience in analyzing and modeling geophysical processes, I’m considerably less “modern” than you suspect. There’s nothing “fashionable” about the terms “nonstationary” and “nonlinear.” Both are time-honored, familiar technical terms that tell us what the process or system is NOT.

Nor did I ever remotely suggest that a nonlinear stochastic approach is necessary to obtain “trajectories” that “resemble” the real world. On the contrary, my reference to standard linear methods of signal analysis (BTW, well described by Papoulis in a book by that name) should have alerted you to that.

I’m quite aware of the basic things you mention. Are you aware of the limitations of HK when dealing with processes whose spectrum contains very strong peaks and valleys?

Since, as you say, you are not as “modern” as I initially suspected and since you do not give definitions for these concepts different from mine, I think we have good reasons to converge. To make a step further toward convergence, my reply is yes, I am aware of the limitations you mention. Please see slide 12 of my presentation in Edinburgh, where I say: “The HK process does not provide a ‘perfect’ and ‘detailed’ mathematical tool for geophysical processes. Rather it is the most parsimonious and simplest alternative to the classical, independence-based, statistical model”.

Since, as you say, you are quite aware of the basic things I mention, I hope you will agree with me that the notions of nonstationarity and nonlinearity have been badly abused, and understand why I insist on their correct use.

While the HK framework is a step upward from simplistic white-noise models, it INFLEXIBLY PRESUMES a stochastic structure that is almost never exhibited by time-series of climate variables. The latter almost invariably exhibit oscillatory behavior (due to spectral peaks) that is a game-changer with respect to the primitive concept of persistence. It is a simple model, to be sure. But I would not characterize HK as “parsimonious,” as if the essential features had been captured. Let’s leave it at that.

I tried to put an end to this exchange, which does not seem to attract the interest of anybody else, by stressing our convergence. But I fully disagree with your last comment, particularly with your use of “never” and “parsimonious”. But I still wish to put an end to it, so I will not explain my reasoning any more. You know who I am, you can read my papers (some I have indicated above), all of which are available online (at least as preprints), and you can find my reasoning there.

In professional circles, real-world evidence is what speaks most convincingly. By ensemble averaging 110-year-long series of annual average temperatures recorded at a score of broadly representative US stations little affected by UHI, the following sample acf r(m) is obtained for m = 1…25 yrs by a demonstrably unbiased estimation algorithm:

Note the fairly regular secondary peaks spaced 6-7 years apart and the persistently negative values at the longer lags.

Applying the HK formalism means ignoring these essential features, while prescribing a monotonic decay based on the value r(1) = 0.339. But that value is strongly influenced by the intra-decadal oscillations and tells us virtually nothing about the persistence of the multi-decadal oscillations that lead to increasingly negative values at the longer lags. The latter climatic oscillations show the highest power density by far, accounting for a third of the total variance in the first few spectral bands alone. The intra-decadal oscillations that produce the “jittery” year-to-year variability are more of academic interest vis-à-vis climate change. HK tells us nothing reliable about either in this case, which is structurally not much different from what is observed around the globe.
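The qualitative effect described here (a modest r(1) coexisting with persistently negative values at long lags) can be reproduced with a toy series; the 60-yr sinusoid and noise level below are purely illustrative and are not the commenter’s actual station data:

```python
import numpy as np

def sample_acf(x, maxlag):
    """Sample autocorrelation r(0..maxlag) of a series."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.dot(x, x)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / denom
                     for k in range(maxlag + 1)])

# 110 "annual" values: a ~60-yr oscillation plus year-to-year noise.
rng = np.random.default_rng(4)
t = np.arange(110)
series = np.sin(2 * np.pi * t / 60.0) + 0.8 * rng.normal(size=110)
r = sample_acf(series, 25)
# r[1] is moderate, while r turns negative at the longer lags (near
# half the oscillation period) -- the situation where prescribing a
# monotonic decay from r(1) alone discards the dominant feature.
```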

Demetris raises a very important issue regarding the ease with which climate data can trick the observer into false trend significance, if the long memory is not properly modeled. Good thing the IPCC was right on top of this.

For some years now I have been wondering at the comparative absence of visible depth in this creature called climate science. It is refreshing and even necessary to have the input of people such as you and the other inspirational people mentioned by name here. Such first class work sets a target for the young folk joining in and the older existing people who need to lift the game – we hope that they can rise to the hurdle, as we had to in our respective disciplines.

The “can’t put it in public because it will be misused” argument just always resonates with me as the priests not wanting the Bible in German or English or any other language that people could then read for themselves and say “Hey, waitaminute…!”

The concepts of autocorrelation presented in this presentation/paper are indeed unique and point out some aspects of time-series analysis that require serious consideration in dealing with temperature data and the averaging thereof.

One thing that I would like to find is a Fast Fourier Transform plot of temperature data over time. We know that there are at least two main frequency responses that will exist – a daily 24-hour period and a seasonal variation over each year. There are likely much higher frequencies on a local and perhaps regional basis as weather patterns fluctuate due to a variety of circulation effects. There are also significant frequencies associated with El Niño and La Niña, as well as other oceanic oscillations. I’m sure we could also see the characteristic oscillations due to sunspots and axis-wobble effects. By examining such a frequency plot there may be ways to extract better information about the long-term trend of temperature as a function of GHGs and other climatic sensitivities. Does anyone know if such a study has been performed or published?
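The diurnal and annual peaks the commenter expects are easy to confirm on synthetic data. This sketch (all names and parameters are mine, for illustration only) picks out the dominant periodogram periods of an hourly “temperature” series built from a 24-hour and an annual cycle plus noise:

```python
import numpy as np

def dominant_periods(x, dt_hours=1.0, k=2):
    """Return the k strongest periodogram periods (in hours) of a
    series sampled every dt_hours, ignoring the zero-frequency term."""
    x = np.asarray(x, float) - np.mean(x)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt_hours)
    idx = np.argsort(power[1:])[::-1][:k] + 1   # skip the DC bin
    return sorted(1.0 / freqs[idx])

# Synthetic "temperature": annual (8760 h) and daily (24 h) cycles
# plus white noise, sampled hourly for four years.
rng = np.random.default_rng(3)
t = np.arange(24 * 365 * 4)
temp = (10 * np.sin(2 * np.pi * t / 8760.0)
        + 4 * np.sin(2 * np.pi * t / 24.0)
        + rng.normal(0, 1, t.size))
print(dominant_periods(temp))
```

On real station data the same approach would also surface the ENSO-band and longer oscillations discussed in the reply below this comment, though with far wider confidence intervals.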

Frequency-domain (power spectrum) analyses, along with other well-known time-series methods, are seldom seen in the climate-science literature, where rather simplistic time-domain methods prevail. What they reveal can be put in a nutshell as follows:

1. In tropical and temperate zones, fairly narrow-band semi-centennial and longer oscillations dominate the spectrum of the longest, least-corrupted annual-average station records available.

2. There is far less spectral power near the period of the 22-yr Hale solar cycle, and negligibly little near the 11-yr Schwabe cycle. A significant minor peak appears, however, at the 5.5-yr second harmonic in many regions whose coherence with the El Niño Index tends to be high.

3. A broad continuum, whose features vary considerably from region to region, accounts for the remaining power.

What this implies for estimating “trends” dovetails with, and extends further, Dr. K’s conclusions about much greater uncertainty than is commonly assumed with simple AR noise models. A moment’s reflection about 30-yr linear “trends” in the face of strong oscillations, say ~60 yrs in period, tells us that they too are oscillatory.

Thanks for the information – do these analyses appear in the refereed literature?

I quite agree about linear trends of short duration actually being oscillatory in the face of other longer period steady-state alternating changes. It is a fundamental issue in using ramp changes as a forcing function that has led to many misinterpretations of cause and effect relationships linear or otherwise.

My colleagues and I have made no attempt to publish these results of empirical analysis, being entirely content to use them simply as a guide in our modelling. I should also point out that, given the inadequately short time series available, the spectral indications provided thereby are quite provisional, with confidence intervals corresponding to a chi-square variable with ~10 degrees of freedom.

I’ve actually done Fast Fourier Analysis for the Central England Temperature, the Armagh temperature record, the Solar Cycle since the 18th Century, the Aswan Nilometer record, as well as fun things like the Dow 30 and a 70-year tidal record for Halifax, Nova Scotia.

(Willy = an Orca called “original data”, which is seen at the surface only on rare occasions and which is usually well hidden in a deep ocean called “national or subnational climate data archives”)

It is interesting to learn once again about the advantages and disadvantages of having a common “lingua franca” (English). In the case of the short discussion in Edinburgh between a Greek and an Austrian, we did quite well, I think, and I only want to briefly add something concerning Demetris’ question 3:

Yes, I clearly answered that I would not like to have all “original” data easily accessible via the web for everyone. But then I added that they should be at the free disposal of everybody on demand. This “filter” provides the great advantage that the user receives the necessary additional information about what he gets: e.g. a temperature time series with a long-term negative trend caused by a (quite common) relocation from the historic center of a city to the airport in much more rural surroundings.
I strongly believe that progress in science is to a great part based on trust in the work of others. We can by no means always start right from zero – this would in fact make any progress in science impossible. In my special case (homogenizing) I congratulate everybody – let’s say from Canada – who is capable of finding the reason for inhomogeneities in – let’s say Greek – time series in the station histories kept in the archive of Demetris’ national weather service. I suppose the older ones are handwritten in the Greek language, and supposedly also in Greek script?

I stop here with a short passage I have posted in another blog (Klimazwiebel) this January:
I hope my plea for “tricking” is not misunderstood but regarded as what it is – an attempt to see things in a more differentiated and sophisticated way. A completely liberal data policy may seem to be the only acceptable and achievable alternative at first sight. But not every modification of the original data has the intention to “hide the truth” – on the contrary, the overwhelming majority of such attempts are meant to help effectively unveil the truth.

Another “last word” to “geo”: Yes, you are right! But I, as one of these homogenizers see my role as one of those “translators” of the Bible from an un-understandable language (the original data, being a mixture of climate signals well hidden in non climatic noise, biases, outliers..) into an easier understandable language which is nearer to the “truth” (climate) but of course not “the truth”. And I at least try to explain how I did the job of translating in publications, conference talks, blogs… I hope you are as strict also against the translators of the bible (how is your Aramaic? I hope good enough to go to the roots of the “original” bible).

I, as one of these homogenizers see my role as one of those “translators” of the Bible from an un-understandable language (the original data, being a mixture of climate signals well hidden in non climatic noise, biases, outliers..) into an easier understandable language which is nearer to the “truth” (climate) but of course not “the truth”.

Your analogy is perhaps more apt than you know. Without going OT or into banned discussion of the “R” word :), those who know something about bible translations are well aware that translators over the years tend to introduce their own biases… a Biblical version of “hide the decline”… everything from switching the apostle “Junia” (a woman’s name) to “Junius” (nonexistent man’s name) to reversing the sense of an entire argument by leaving out one little three letter word: “NOT!”

The argument that we should simply trust others’ work has little support in science history, let alone climate science. People make mistakes, are subject to confirmation biases and more. As proven time and time again here and elsewhere.

If you had presented your argument as “I believe it is crucial for meta data to be made available together with raw data”, you would have found a strong chorus of support in this blog community. Strong support. Are you aware of the Reproducible Research methodology?

IF data homogenization/cleaning were done with publicly available code, it might be OK to prefer the cleaned-up data. However, in too many cases the “fixes” have turned out to be nonsense. Early versions of the Hadley data claimed that they adjusted their data by eye. We have seen GISS “adjust” perfectly good rural stations based on urban ones. “Adjusted” and “homogenized” data end up having a bigger trend than pure rural stations. The urban heat island effect is claimed to have been removed from the global data (GISS) or is simply ignored (Hadley), when clearly it has not been removed and cannot be ignored. Rural stations have been dropping out of the database (the great dying of thermometers), even though the stations still exist in most cases, and it is pretended that this can’t have any effect, etc. In paleoclimate studies, we have seen evidence ignored that strip-bark pines have anomalous growth, Tiljander proxies used upside down, lat-long locations transposed, and these errors are never acknowledged or fixed. So people are not happy with the adjustments and would really like to see what is going on, in detail.

Your comment is right on. All the efforts at replication “confirming” the veracity of CRU and GISS temperature records all seem to ignore the crucial issues you raise. THIS is where the real scam lies.

As a data libertarian ( and code libertarian) I’m not sure I get the point of this:

“Yes, I clearly answered that I would not like to have all “original” data easily accessible via the web for everyone. But then I added that they should be at the free disposal of everybody on demand. This “filter” provides the great advantage that the user receives the necessary additional information about what he gets.”

As I understood it your purpose was to preclude or diminish the possibility that the data would be misunderstood or misused.

1. All data, whether raw or processed, is subject to this. There is nothing inherently different about raw data. It is numbers.

2. There is nothing to ensure that people will read your documentation, whether you post it with your data or supply it after registration.

3. If your goal is to diminish the misuse of data, then it’s best to do the following: post all code used to process the data.

4. License your code in such a way (GPL) that it enforces sharing back derivative code.

The idea that registration is required to provide additional information is rather quaint.

Steven Mosher says, “The idea that registration is required to provide additional information is rather quaint.”

I agree with this comment, and I would like to also add that the only rational purpose I can see to requiring such “registration” or identification of oneself is to enhance the process of identifying “enemies” so they can be added to one of the “blacklists” that have been revealed (or perhaps one that hasn’t yet been revealed), and can thus be “dealt with accordingly”, as is the customary intention of those who maintain “blacklists”.

Your reference to translations from Greek would only serve to underscore the point, particularly if the original Greek “data” was being interpreted/extrapolated (say from words like “sort of cold, kind of cold, very cold”), rather than literally translated (say between very well-understood number systems).

Dr. Böhm you state, ”But I, as one of these homogenizers see my role as one of those “translators” of the Bible from an un-understandable language (the original data, being a mixture of climate signals well hidden in non climatic noise, biases, outliers..) ”

For the last three years, I have been asking climate scientists and climate science skeptics for a precise technical definition of the often-used term “climate signal”, a term which appears ubiquitously in the peer reviewed climate science literature.

In the nuclear industry, we have precise, well documented definitions for the commonly-used scientific and engineering terms that we employ in our various technical literature articles and in our various safety analysis documents.

These definitions are more than simple dictionary entries, they have several important descriptive elements associated with them, including the dimensional units to be used, the kinds of physical processes from which the term is derived, the physical or technical contexts in which the term is properly employed for a data analyses or for an experiment, guidelines for the proper application of the term in a research project report or in a safety analysis report, and the historical provenance of the technical term.

So far, of those I have inquired of who seem to possess the proper scientific qualifications, no one has come forth with a precise technical definition for the term “climate signal” — one which comes anywhere close to the definitional standards for terms we employ in the nuclear industry.

Nor can I find such a precise definition out on the Internet, nor has anyone pointed me to a climate science reference text which contains a precise definition, one which employs the aforementioned descriptive elements.

The lack of a precise technical definition for the term “climate signal” must certainly present difficulties for the climate science community in performing their experiments and in writing their various analyses, in that the term could not be relied upon to be consistently and professionally applied among nominally independent research projects and among nominally independent data analysis efforts.

Dr. Böhm, may I ask a favor of you? For the benefit of both myself and the Climate Audit readership, may I ask you to take some time from your schedule to write a precise technical definition for the term “climate signal” — one which employs the aforementioned descriptive elements, using appropriate citations as needed?

Climate being an abstract, largely statistical, construct, the short answer to your question is that there is no “climate signal” in any strict sense of the phrase. One can meaningfully speak, however, of the temperature (or any other variable’s) signal at a particular location. In that context, the signal is simply the ideal time-history, free of any measurement error and extraneous influences. This is in contradistinction to what is recorded, processed and stitched together as a time-series from various readings of instruments that have been deployed under the name of a single station, whose exact siting and general environment may have changed materially over decades and centuries.

It is precisely the influence of the latter factors (well summarized here by Craig Loehle’s comment) that introduces significant discrepancies from the “true” signal in the station records. Particularly onerous are station moves, instrumentation changes, and the creation of a temperature “bubble” due to urban growth or land-use changes (deforestation) near the station. They all introduce extraneous components that corrupt the time-series, with UHI and many land-use changes invariably introducing a spurious “trend.”

The attempt to deal with these extraneous temporal components (which goes under the misnomer of “homogenization” – a spatial or ensemble concept in rigorous science) rests entirely upon ad hoc data adjustments. It is these adjustments, starting with TOB and ending with outright changes to the monthly average values of the historical record, that render the whole exercise of estimating the “signal” perilously unreliable.

Instead of attempting to “correct” corrupted records, they should best be ignored in regional or “global” compilations of average temperature variations. But, with more than 80% of century-long records outside the USA being urban records, this would mean that the greatly incomplete spatial coverage throughout the globe would become even more glaring.

Bureaucratic, rather than altruistic, motives dictate the unscientific ad hoc practice of “homogenization.”

Sky, thank you for your response. While it doesn’t meet the kind of definitional standard for “climate signal” I was hoping for, your response does provide some very illuminating insights into the topic.

You say, “Climate being an abstract, largely statistical, construct, the short answer to your question is that there is no “climate signal” in any strict sense of the phrase.”

The climate science community, especially the paleoclimatologists, seem to view their work in extracting “climate signals” from instrumental temperature series and from temperature proxy data (tree rings, etc.) as being very much akin — if not fully equivalent — to what electrical engineers do when they extract time series wave pattern data from various electromagnetic phenomena.

Now, if one reads their peer-reviewed papers in detail, one gets the impression that climate scientists believe they have indeed extracted a “signal” in a very strict sense of meaning, a signal which is the climate science equivalent of what the electrical engineers extract when they investigate waves in the electromagnetic spectrum.

Thus the climate scientists justify the use of mathematical data-extrapolation tools of various kinds, a prominent example being regularized EM (RegEM), a tool which has been used very successfully by electrical engineers to fill in missing time-series data in a variety of practical, real-world signal-processing applications.
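As an aside, for readers unfamiliar with the basic idea: here is a toy, EM-flavoured imputation sketch of my own (numpy only, with illustrative data). It is emphatically not the actual RegEM algorithm of the climate literature; it only shows the general pattern of repeatedly ridge-regressing a gappy column on the other columns to fill its missing values.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_impute(X, lam=0.1, iters=20):
    """Fill NaN entries of X by iteratively ridge-regressing each
    incomplete column on the remaining columns. A toy, EM-flavoured
    sketch -- NOT the actual RegEM algorithm."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    # start from a column-mean fill
    for j in range(X.shape[1]):
        X[miss[:, j], j] = col_means[j]
    n = X.shape[0]
    for _ in range(iters):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            # regress column j on all other columns (plus intercept)
            A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
            obs = ~miss[:, j]
            beta = np.linalg.solve(
                A[obs].T @ A[obs] + lam * np.eye(A.shape[1]),
                A[obs].T @ X[obs, j])
            # overwrite only the originally missing entries
            X[miss[:, j], j] = A[miss[:, j]] @ beta
    return X

# Illustrative data: three strongly correlated columns, 30 values knocked out.
z = rng.standard_normal(300)
X_true = np.column_stack([z,
                          z + 0.05 * rng.standard_normal(300),
                          -z + 0.05 * rng.standard_normal(300)])
X_obs = X_true.copy()
X_obs[:30, 0] = np.nan
filled = ridge_impute(X_obs)
```

When the columns really do share a common signal, as here, the fill is far better than a mean fill; the open question raised below is whether proxy data satisfy anything like that premise.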

However, when I see how often the term “climate signal” is being employed in climate science papers in such a variety of technical and interpretive contexts, but without a clear definition as to what the term “climate signal” actually represents in terms of the physical phenomena being investigated, I have to ask myself the following question:

Regarding time-series temperature data from both instrumental and proxy sources, does there currently exist a sufficient body of knowledge concerning the underlying physical phenomena associated with interactions among climatological, biological, and geological processes which would enable climate scientists: a) to rationally employ mathematical tools such as RegEM in generating their time series temperature data; and b) to accurately determine what the time series data actually represents in terms of physical climate change?

My short answer to both your questions is: almost certainly NOT! Although many climate scientists affect the lingo of system and signal analysis, they are far removed from any rigorous signal extraction as is done by EEs, even when dealing with instrument records. And when it comes to proxy records, one look at what they call “transfer functions” is enough to make one chuckle.

You may find Prof. Koutsoyiannis’ opinion on the term “climate signal” illuminating, though it is not a direct reply to your question:

When classification of a specific process into one of [deterministic and random] fails – and it usually does … – then a separation of the process into two different, usually additive, parts is typically devised…. This dichotomous logic is typically combined with a manichean perception, in which the deterministic part supposedly represents cause-effect relationships and reason and thus is physics and science (the “good”), whereas randomness has little relationship with science and no relationship with understanding (the “evil”). The random part is also characterized as “noise”, in contrast to the deterministic “signal”….

The entire logic of contrasting determinism with randomness is just a false dichotomy.

D. Koutsoyiannis (2010), “A random walk on water”, Hydrology and Earth System Sciences 14, 585-601. http://itia.ntua.gr/en/docinfo/923/ (the quotation is from the beginning of the second page).

1. The Greek vs. Austrian/German discussion reminded me of a chat with Armin Bunde, the speaker before me in this session, who is German. He complained that I had translated the Heraclitus fragment in my last slide into English. He said I should not have – evidently he knows classical Greek very well. Also, the Bible metaphor reminded me of long discussions with the late Vit Klemes about the Great Commandment “Love your neighbour”, which in the Greek original is somewhat different: “Agapa ton plesion sou”. Plesion is not exactly neighbour. By the way, the original of the New Testament is in Greek, not in Aramaic.

2. I have some 20 years’ experience with raw data (including handwritten reports and the paper tapes of autographic devices) and their processing. A long time ago, I started an initiative to computerize all hydrometeorological data of all agencies in Greece in a distributed database system (see e.g. http://www.itia.ntua.gr/en/docinfo/32/). The system we designed contains, for each station, digitized raw data and as many versions of processed data as are available. We are working to convince the Greek authorities to make everything openly available on the internet, and we are making good progress. Some have agreed already. I am very satisfied with our system design, but I regret one thing: the system offers access to the digitized raw data but not to the handwritten papers and tapes. We are now trying to reshape the system design so that it also contains scanned images of the original papers and tapes. I now know that this is important because many times, in a specific extreme storm for example, I wish to access the tape of the pluviograph, particularly when there is disagreement with the daily total recorded at the conventional gauge.

3. The multiple versions of processed data do not necessarily indicate mistreatment of the data or any intention of fraud. When I revisit a data series I usually change the processing based on my new knowledge. For this, I need to retrieve the original data. For example, in the past I used to follow standard homogenization techniques. Now I am more attentive. See my recent poster: Koutsoyiannis, D., A note of caution for consistency checking and correcting methods of point precipitation records, International Precipitation Conference (IPC10), Coimbra, Portugal, 2010 (http://www.itia.ntua.gr/en/docinfo/1002/).

Your descriptions of data-archiving and data-handling issues refer to your experiences with hydrological records, including readings of rainfall gauges.

From an outsider’s perspective, there is a striking difference between these practices and (what appear to be) the generally accepted practices with respect to temperature records.

I would have assumed that the precipitation records provide a valuable lesson, close at hand, for scientists concerned with temperature records.

E.g. —

* Archiving primary data is superior to not retaining it.
* Technological advances can facilitate open access to data.
* Easy access to data is preferred.
* Metadata should be preserved, preferably linked to primary data.
* Adjustments to data should be transparent and traceable.

My assumptions were naive, it seems. These are clearly not teachings that have been accepted by the paleotemperature-reconstruction community.

I agree with your statement, but better than scanned handwritten notes is the ability to get the guy who did the work in the first place on the phone.

I’ve wasted weeks of time working with an old dataset which, the guy who generated it later told me, was nonsense and should be ignored. He worked out it was nonsense only several months after it was generated, because another dataset produced by the same instrument was nonsense; he looked at the instrument and discovered it was faulty.

Much more important than scanned notes is the phone number of the group who generated the data.

It will take this geologist some hours to fully digest Dr. Koutsoyiannis’ work. A first glance at the poster and a quick read of the presentation look most interesting and, as with much of his other work, highly thought-provoking. I was, however, quite disquieted by Dr. Böhm’s comments as reported. His comments posted above suggest he either does not understand, or chooses not to understand, the importance of raw-data preservation and discrimination. It is possible that we have a slight translation problem between German and English. I suggest Dr. Böhm take a few hours out of his busy schedule and reread Karl Popper’s “The Logic of Scientific Discovery”, perhaps in the original German. He might also read my short essay Data (retreadresoruces.com/blog), which discusses the relationship between data and metadata and the requirements the “scientific method” places on them.

Perhaps one of the main issues to come out of Climategate and the subsequent discussions of openness and transparency (as highlighted above) will be a reversion to the forgotten element of the scientific method – replication.

Academic research, with the nature of competitive grant awards and ‘publish or perish’, means that academics are now not really in the business of replicating previously published work (i.e. verifying that using the data and method as described in the original work produces the same results, and without the use of fundamentally erroneous assumptions). The hockey-stick spaghetti forest – where a series of semi-independent reconstructions applying various (but similar in intent) statistical methods (and similar assumptions) to overlapping datasets all provide publishable papers with similar conclusions – is a good example of where such competitive academic science gets you.
By comparison, a single publication and then a serious critical examination by attempted replication doesn’t get many people many citations and so doesn’t help with the next grant application, but can actually move the field on much more effectively (either by validating the initial work or by highlighting areas that can be improved).

This is where a combination of improved transparency (including access to raw data as well as descriptions of the methodology applied) and the ‘interested amateur’ (especially courtesy of the blogosphere) can develop a role – there are now lots of people with good science or maths backgrounds with the time and inclination to look at the data and methods and hence attempt both ‘pure’ replication (i.e. same method, same data, same assumptions) and to evaluate how minor changes (of reasonable methodological choices, input data and assumptions) have an effect.

That’s where work like Steve’s can bear most fruit, in showing how several of the reconstructions are influenced by one Yamal tree, some Californian bristle cone pines or a Finnish lake sediment sequence demonstrated to be inverted (where the reasons for recent changes are known because of real world conditions not relating to climate).

It is worth noting also that such ‘amateur’ replication work is already going some way to supporting the original work – for example, the work of Mosher and others on temperature reconstructions is showing that CRU and GISS use reasonable assumptions and methodological choices in producing their gridded products, contrary to some of the sceptical claims. There are still questions (not yet investigated) regarding the scale of influence of both UHI effects (on the large scale) and micro-site effects on the original input data, but so far the replication studies are helping to demonstrate that the warming is not an artefact of the processing.

BTW when I refer to ‘interested amateur’, I’m really meaning someone who is outside the academic mainstream within the field being discussed, and so includes people such as academic statisticians who can look at things with fresh eyes.

There are several things that impressed me about this conference, as well as Steve’s post on my contributions – and not only that. First, I was impressed when Peter Guttorp, whom I did not know in person, wrote to me that he and Peter Craigmile were organizing a session on long-term memory and invited me to speak in it. This made me happy, even though I do not favour the term “memory”, as I explain in the presentation. Second, I was impressed by the many flattering comments I heard from people who attended my talk and whom I met for the first time. Third, I was impressed when Steve wrote to me about my contributions and asked my impressions of the conference. I knew that he was in London during the time of the conference, but it seems he watches everything on climate. Reinhard Böhm’s comment above (I notified him about this post) was also very kind of him. I thank them all, as well as all the folks contributing to this discussion. I agree with most of the comments and I think I owe some clarifications. I am having a hectic time, so, unfortunately, I have to make these clarifications in instalments.

First instalment, re Steve’s note on my “comment on Mann’s cartoon with the ensemble of hockey sticks”. This comment was roughly as follows, and refers to my slide no. 9 or 10 (both contain the same figures). The upper figure, with the static climate, is what classical statistics produces. The lower figure is produced by Hurst-Kolmogorov statistics and seems to correspond to real-world climate. Cut a small part of the lower figure, that is, of reality. When referring to real-world climate, this part is unavoidably small, because in reality the period of instrumental measurements is short. Use inappropriate/classical statistics to produce a static climate for the past, as in the upper panel. Splice the small real-world part onto the longer static part. You have a perfect hockey stick. Since inappropriate/classical statistics is so widespread in climatology, an ensemble of hockey sticks is no surprise.
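The recipe above can be sketched numerically. A crude, library-free way to mimic HK-like behaviour (all lengths, time constants and variances below are my own illustrative choices, not anything from the talk) is to sum a few AR(1) components with widely spaced characteristic times; splicing a short tail of such a series onto a flat “classical” past then produces a hockey stick by construction:

```python
import numpy as np

rng = np.random.default_rng(42)

def hk_like(n, taus=(2, 20, 200, 2000)):
    """Sum of AR(1) components with widely spaced characteristic times:
    a crude stand-in for a Hurst-Kolmogorov process (illustration only)."""
    x = np.zeros(n)
    for tau in taus:
        rho = np.exp(-1.0 / tau)
        innov = rng.standard_normal(n) * np.sqrt(1.0 - rho**2)
        c = np.empty(n)
        c[0] = rng.standard_normal()
        for t in range(1, n):
            c[t] = rho * c[t - 1] + innov[t]
        x += c
    return x / np.sqrt(len(taus))

# "Reality": a long HK-like record, of which we observe only the tail.
reality = hk_like(2000)
instrumental = reality[-150:]          # the short instrumental period

# "Classical" past: a static climate, i.e. white noise around a constant
# mean with the small variance classical statistics would infer.
static_past = instrumental.mean() + 0.2 * rng.standard_normal(1850)

# Splice flat shaft onto the real-world blade: a perfect hockey stick.
hockey_stick = np.concatenate([static_past, instrumental])
```

The multi-decadal averages of the HK-like series wander far more than those of the static past, which is exactly why the splice produces a blade where the two meet.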

My questions are these:
1. Is your stochastic HK process perhaps the same as an ARIMA(0,1,n) process?
2. Does the exact nature of the stochastic process matter less than the dramatic effect that the presence of that stochastic process has on the error bounds of observed trends?

I am happy to see Steve’s formulation “all-too-artificial ARMA processes that dominate present-day analyses”. I fully agree! They are indeed “all-too-artificial”, and they mislead us. They mix up the essential properties of a stochastic process with technicalities and logistics. Essential properties are whether a process is stationary or nonstationary and, if it is stationary, its marginal and joint distribution. The latter is described through its autocorrelation (or autocovariance) function, its power spectrum, or its climacogram. All three concepts are equivalent to each other, and any one can be derived from another by an appropriate transformation. The logistics are how we generate a time series from a process. In the ARMA approach the two are thought of as the same thing, i.e. a specific ARMA process implies a specific autocorrelation function.

Another essential issue is how many parameters we need to describe the joint distribution – the dependence in particular. Obviously, we should seek as few parameters as possible, in accordance with the principle of parsimony. But ARMA models may need too many parameters to make a realistic model.

As I have shown in Koutsoyiannis, D., A generalized mathematical framework for stochastic simulation and forecast of hydrologic time series, Water Resources Research, 36(6), 1519–1533, 2000 (http://www.itia.ntua.gr/en/docinfo/18/), the logistics are a simple problem given modern computing abilities. I have provided general algorithms applicable to any type of autocorrelation, without any reference to ARMA structures.
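The point that generation “logistics” need not be tied to ARMA forms can be illustrated generically. The O(n³) Cholesky construction below is my own illustration, not the (far more efficient) algorithms of the WRR paper: given any valid autocovariance sequence — here that of fractional Gaussian noise with an assumed H = 0.8 — a Gaussian series with exactly that dependence can be generated directly.

```python
import numpy as np

def gaussian_series(acf, rng):
    """Generate a zero-mean Gaussian series whose autocovariance sequence
    is acf[0], acf[1], ..., via Cholesky factorization of the Toeplitz
    covariance matrix. Generic but O(n^3); illustration only."""
    n = len(acf)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    cov = np.asarray(acf)[lags]      # Toeplitz covariance matrix
    L = np.linalg.cholesky(cov)      # requires acf to be positive definite
    return L @ rng.standard_normal(n)

# Example autocovariance: fractional Gaussian noise with H = 0.8 --
# no ARMA structure anywhere in sight.
H = 0.8
k = np.arange(200, dtype=float)
acf = 0.5 * ((k + 1)**(2 * H) - 2 * k**(2 * H) + np.abs(k - 1)**(2 * H))

x = gaussian_series(acf, np.random.default_rng(3))
```

Any autocovariance function (equivalently, power spectrum or climacogram) that is positive definite can be plugged in unchanged; the generation step never needs to know whether an ARMA representation exists.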

In terms of the questions: no, HK is not ARIMA(0, 1, n). The former is stationary, the latter is not. I do not favour focusing our discussion on issues like “unit roots”, and using over-sophisticated algorithmic procedures on the complex plane for problems of simple logic. I will give a simple example. Let us assume we have daily measurements from a raingauge. Does the observer empty the raingauge bucket every day after taking the measurement? Then I will use a stationary description of these data. If he does not, and he provides accumulated data (indeed, there exist cumulative devices which are placed in inaccessible areas), then I will use a nonstationary description. Alternatively, I can make some subtractions to find the rainfall depth on each day, i.e. to “stationarize” the data, and again use a stationary description. Of course, one may also try to fit ARIMA models to the original data, without bothering to ask or think about whether they are accumulated rainfall depths or not. Then he can locate the model’s root on the complex unit circle, and subsequently decide about stationarity or nonstationarity. But this is not my approach.
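The raingauge example is easy to make concrete (synthetic numbers, purely illustrative): the “nonstationarity” of the accumulated record is an artefact of how the device records, not a property of the rainfall itself, and simple subtraction removes it exactly.

```python
import numpy as np

rng = np.random.default_rng(7)

# Daily rainfall depths from an (illustrative) emptied-every-day gauge:
# a stationary description is natural.
depths = rng.gamma(shape=0.5, scale=4.0, size=365)

# The same rainfall as reported by a cumulative device that is never
# emptied: the record is nondecreasing and nonstationary by construction.
accumulated = np.cumsum(depths)

# "Stationarizing" by simple subtraction recovers the daily depths
# exactly -- no unit-root machinery on the complex plane required.
recovered = np.diff(accumulated, prepend=0.0)
```

Whether to model the series as stationary is thus answered by asking how the data were recorded, before any formal test is run.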
On the other question, re the “exact nature of the stochastic process” and the “presence of that stochastic process”, I have some difficulty understanding it. In my view these formulations imply that stochastic processes are material things that can be “present” and have some “exact nature”. In my view they are just models, i.e. abstract mathematical constructions, and there is no one-to-one correspondence between the abstract world of models and the real world.

I think we are failing to see the (common) forest because of the trees. Considering the discussion over at Bart’s (in which you, too, participated 🙂), I think we are basically approaching the same critter from different directions.

All the consequences of the high Hurst coefficients you reported are the same as those of a (near) unit-root process (i.e. same forest, behind different trees):

– high uncertainty in parameter estimates (also the ones describing expected change, or if you will ‘the trend’)
– very high prediction uncertainty (widening prediction intervals with increasing forecasting horizons)
– deterministic (linear) trends are bogus

Considering this, don’t you think it’s a bit of a stretch to classify a process exhibiting a Hurst coefficient of 0.99 as ‘stationary’, and thereby very different from the integrated-process representation? For all practical purposes (see again your conclusions), this process can very well be described as non-stationary.

So, in my view, the point is not whether the series *has* a unit root (nothing *has* a unit root, and I think we agree on this, philosophically), but whether it should be modeled as having one. For those unaware of the context of this discussion, see the Bart thread (linked above), where I gave an entire battery of formal arguments for why I believe the process should be modeled as containing a unit root.

As I see it, the method you propose can be mapped to a specific ARFIMA (where FI = ‘fractionally integrated’) process. In other words, it is (very, very) long-term stationary, as physics would predict. However, the number of observations available, i.e. the instrumental record in the case of temperatures, makes estimating any such process pointless.

Think of it this way: in order to ‘measure’ long-term stationarity you need a time frame over which (at least a part of) this stationarity can be observed. The instrumental record falls well short of this requirement, and the results of ARFIMA estimates can thus be described as ‘arbitrary’ and useless for any ‘trend’ inference. Needless to say, this didn’t stop climate scientists from estimating it and publishing the results as some kind of ‘proof’ of something (what exactly, has eluded me so far).

Hence, as I proposed earlier, we should model it as a full-fledged unit root process (ie. a non-stationary one), as it resembles one so much. Just like Kaufmann and for example Beenstock did when proceeding with cointegration analysis.

Also, I strongly disagree (as I did in our earlier e-mail conversation 🙂) with the notion that ARMA processes require a large number of parameters to describe. The main advantage of the ARMA/ARIMA structure is that it *is* parsimonious, owing to its enormous flexibility – hence the modeling framework’s ability to outperform most, if not all, structural models in terms of prediction accuracy.

In the case of the temperature record I managed to describe the ARIMA process ‘governing’ the series with 3 parameters (for both versions of the stochastic trend), which can hardly be considered ‘overfitting’. One simply has to employ information criteria responsibly when performing comparative diagnostics.
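The parsimony-by-information-criteria point is easy to illustrate. The numpy-only sketch below (my own construction, with conditional least squares and a simulated AR(1) series, not the commenter’s actual fitting procedure) compares AIC across candidate AR(p) orders; the penalty term stops the order from growing without bound.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate an AR(1) series; then let AIC choose among AR(p) candidates.
n, phi = 500, 0.6
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

def ar_aic(x, p):
    """AIC of an AR(p) fit by conditional least squares.
    (Simplification: effective sample size differs slightly across p.)"""
    if p == 0:
        resid = x - x.mean()
        k = 1                                    # intercept only
    else:
        Y = x[p:]
        lagged = np.column_stack([x[p - i:len(x) - i] for i in range(1, p + 1)])
        A = np.column_stack([np.ones(len(Y)), lagged])
        coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
        resid = Y - A @ coef
        k = p + 1                                # intercept + AR coefficients
    sigma2 = resid.var()
    m = len(resid)
    loglik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (k + 1) - 2 * loglik              # +1 for the variance parameter

aics = {p: ar_aic(x, p) for p in range(4)}
best = min(aics, key=aics.get)
```

Here AIC strongly prefers some dependence over none, while the parameter penalty discourages padding the model with extra lags.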

***

However, it comforts me that we arrive at the same conclusion: there is much more natural variability, and much less trend precision, than is generally understood.

This convergence in conclusions is indeed what one would expect when employing different, yet consistent, methods to study the same observed process. As you can see, I completely agree with your assertion:

“In my view they are just models, i.e. abstract mathematical constructions and there is not one-to-one correspondence of the abstract world of models with the real world.”

————-

Dear Steve,

What exactly do you mean by “Steve: Different animals. The process affects the error bars”?

A complete model describes the entire process, not just a part of it. Error bars, for example, are themselves generated by a process’s deviations from a base model. In this sense, the error bars are not ‘affected’ by a process; rather, they and their dependence are a part of it. Can you clarify, please?

Also, what does ‘all-too-artificial’ mean in this context? I don’t see any reason why an HK approach would be any less ‘artificial’ than an ARIMA one. Any model – especially one summarizing a hypercomplex system with unknown boundary conditions and a practically infinite set of determinants into a finite set of parameters (in the case of trends: fewer than 5) – is by definition ‘artificial’.

One Trackback

[…] happened. For example (thanks to reader John Moore), at the 11th Statistical Climatology Meeting, Demetris Koutsoyannis asked another scientist whether, as a rule, “original data should be available to the […]