Uncertainty in Air Temperatures

Pat Frank has a new article recently published in E&E on the statistical uncertainty of surface temperatures. He has requested an open venue for discussion of his work here. This is an opportunity for readers to critically assess the methods and understand whether the argument/conclusion is sound. – Jeff

—————-

Uncertainty in Surface Air Temperature -by Pat Frank

All you fellow evil climate-change deniers out there may recall that last December, Energy and Environment published my paper [1] on the unapt statistics used by the scientists at the CRU/UEA and UK Met [2] to represent uncertainty in the global average surface air temperature (GASAT).

The paper uncovered two mistakes in the published CRU error model:

1) The statistics of random error were uncritically and wrongly applied.

2) Systematic instrumental error was completely neglected.

The GISS methodology is as poorly realized, apparently, given the very small error bars that consistently accompany their published anomaly trends (described below).

When these mistakes were rectified the estimated lower limit of uncertainty in any given global annual surface air temperature anomaly proved to be (+/-)0.46 C. This uncertainty is sufficient to make the surface air temperature anomaly trend statistically indistinguishable from 0 C, at the 1-sigma level, over the entire 20th century.

Figure 3 from that paper:

For those interested, the reprint is freely available, courtesy of Bill Hughes at Multi-Science Publishing (pdf download). A summary discussion also appeared last January here on tAV and on WUWT.

That paper represented only half the study, however. The second half is now just out, again in E&E, (2011) 22(4). It’s open access and a reprint is available here (pdf download).

The new paper uncovers and explores the consequences of two further mistakes in the CRU (UK Met, GISS) GASAT statistical uncertainty model. The CRU model:

1) Implicitly asserts that all daily temperatures are constant across any given month.

2) Completely neglects magnitude uncertainty [designated as (+/-)s], which is the uncertainty in a mean that must be applied when averaging a measured set of observables that have inherently variable magnitudes.

In Ref [1], the CRU statistical model was shown to strictly follow signal-averaging “Case 1,” wherein the 1/sqrt(N) error reduction is applied to the measurement mean, and magnitude uncertainty, (+/-)s, is absent.

Given these statistical conditions, the necessary physical corollary to Case 1 is that all the measurement error is random and that all the observables are of inherently constant magnitude.

What does it mean in terms of air temperature, when “all the observables are of inherently constant magnitude”? It means that the CRU statistical uncertainty model has an unstated and implicit physical assumption: one of constant air temperature.

Again: when the uncertainty model is made to uniformly exclude (+/-)s, the statistics necessarily constrains the physics and imposes the condition that the observable is of an inherently constant magnitude.

The CRU statistical model excludes magnitude uncertainty, (+/-)s, and so includes a hidden physical assumption: daily air temperatures are constant across any given month (in the full statistical model, both day and night temperatures, actually).
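The hidden assumption can be made concrete with a short simulation (the numbers below are illustrative choices of mine, not CRU data or the paper's model): for a constant-magnitude observable measured with random noise, the Case 1 1/sqrt(N) reduction applies to the mean; for a set of inherently variable daily temperatures, the magnitude spread (+/-)s remains no matter how many readings are averaged.

```python
# Minimal sketch (illustrative numbers, not CRU data): Case 1 signal
# averaging vs. averaging inherently variable daily temperatures.
import numpy as np

rng = np.random.default_rng(0)
N = 30                    # readings, e.g. one per day for a month
sigma_noise = 0.2         # random read-error, deg C (assumed)

# Case 1: one constant observable tau_c, measured N times with noise.
tau_c = 15.0
readings = tau_c + rng.normal(0.0, sigma_noise, N)
sem = readings.std(ddof=1) / np.sqrt(N)   # 1/sqrt(N) reduction applies

# Variable case: N inherently different daily temperatures.
daily = 15.0 + rng.normal(0.0, 2.0, N)    # ~(+/-)2 C day-to-day variation
measured = daily + rng.normal(0.0, sigma_noise, N)
s = measured.std(ddof=1)                  # magnitude uncertainty (+/-)s

print(f"Case 1 uncertainty in the mean: +/-{sem:.3f} C")
print(f"magnitude uncertainty (+/-)s:   +/-{s:.3f} C")
```

The point of the sketch: (+/-)s does not diminish with more readings, because it reflects the real spread of the observables themselves, which is precisely the quantity a Case 1 model discards.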

The climate scientists compiling the global average surface air temperature anomaly record have failed to recognize this error, and unfailingly applied this statistical model since at least 1986.

Some may ask: so what if the formalism of the GASAT statistical uncertainty model implicitly asserts constant temperatures across any given month?

Well, if the assumption is that air temperature is physically constant in any given month, and it is so assumed, then there can exist no physically real temperature gradients across that month. Every day within every given month must have the same temperature. Every monthly temperature trend must have zero slope.

Since, during any month, the CRU statistical uncertainty model does not allow any temperature trend to be physically real, no such trend can carry any external climatological meaning. The CRU statistical uncertainty model therefore requires that any non-zero trend represent a problem internal to the measurement process.

The CRU statistical model requires that non-zero trends in observed monthly temperatures must be assigned to experimental error.

In the CRU model, every non-zero trend in monthly temperature becomes a systematic experimental error; statistically imposed and entirely artificial.

From a physical point of view, it is obvious nonsense to suppose that monthly temperature trends represent experimental error. But we’re not analyzing physics. We are analyzing statistics, and examining the ramifications of a statistical model. Statistics knows nothing of physics. Statistical models, as such, are not bound by physical theory. Climate scientists have uncritically applied an incorrect statistical model to a physical problem. That incorrect statistical model enforces an impoverished physical view of surface air temperatures. That impoverished physics, in turn, imposes a distorted interpretation onto the measurement results – surface air temperature.

The first part of the new E&E paper demonstrates that this very basic mistake exists in the CRU error model, and then examines its consequences. For example, it assesses the false uncertainty imposed on the October 1998 temperature records of seven globally distributed surface climate stations.

Here are the results in summary:

Table 1: Imposed Uncertainty in October 1998 Air Temperature Anomalies

The statistical uncertainty model is so poorly constructed that it converts any non-zero monthly temperature trend into an experimental error. This experimental error is then imposed upon the temperature anomaly record. The climate science community has apparently overlooked this problem entirely, remaining incognizant of the false error imposed by the impoverished physics their poorly constructed statistical error model enforces.

These statistically imposed uncertainties are enormously larger than the 160-year anomaly trend of 0.8 C, and make a mockery of the whole process. Of course, these huge uncertainties are artefactual and can be eliminated merely by using physically more appropriate statistics. But climate scientists have never moved to a physically more realistic statistical model, and so the problem persists.

In applying an incorrect statistical model, the climate science community has avoided a real systematic uncertainty of order (+/-)0.5 C and has imposed a false artefactual uncertainty of order (+/-)5 C. The ~(+/-)5 C artefactual uncertainty is a methodological embarrassment but not physically real. A revised and appropriate statistical model will avoid it.

That’s the present circumstance. So far there’s no evidence of any official intent to correct the model. Of course, bringing in a physically appropriate statistical model means taking systematic measurement error into account, along with other non-random elements including magnitude uncertainty. Doing so, of course, leads to an air temperature anomaly trend like that in Ref. [1] Figure 3 (above), which is statistically not different from zero.

But what, then, about magnitude uncertainty?

CRU and the UK Met use the 1961-1990 anomaly normals to set the zero line for their global average surface air temperature anomaly trend. Part IV of the new E&E paper provides the magnitude uncertainties for the 30 sets of twelve months within the 1961-1990 normal period.

Here they are from the paper:

Table 2: Magnitude Uncertainties in the 1961-1990 CRU Monthly Normals

Month       (+/-)s (C)    Month        (+/-)s (C)
January       0.194       July           0.128
February      0.212       August         0.133
March         0.199       September      0.126
April         0.137       October        0.155
May           0.144       November       0.154
June          0.132       December       0.200

CRU May 2005 data set.

From these monthly (+/-)s values, the root-mean-square average annual magnitude uncertainty is (+/-)0.17 C. This number is the uncertainty in the magnitude of any given annual anomaly during the normal period, due to the natural and inherent variation in each average monthly temperature anomaly magnitude during the 30 years of the CRU normal period.
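A check of the arithmetic (my own, not the paper's code): with divisor N = 12 the r.m.s. of the Table 2 values comes out (+/-)0.16 C, while divisor N-1 = 11 gives (+/-)0.17 C; the quoted (+/-)0.17 C evidently corresponds to the latter, though that divisor inference is mine.

```python
# Root-mean-square of the twelve monthly (+/-)s values from Table 2.
# Divisor choice (N vs. N-1) is my assumption, computed both ways.
import math

monthly_s = [0.194, 0.212, 0.199, 0.137, 0.144, 0.132,   # Jan-Jun
             0.128, 0.133, 0.126, 0.155, 0.154, 0.200]   # Jul-Dec
ss = sum(s * s for s in monthly_s)
rms_n   = math.sqrt(ss / len(monthly_s))        # divisor N = 12
rms_nm1 = math.sqrt(ss / (len(monthly_s) - 1))  # divisor N-1 = 11

print(f"r.m.s. (N):   +/-{rms_n:.2f} C")    # +/-0.16 C
print(f"r.m.s. (N-1): +/-{rms_nm1:.2f} C")  # +/-0.17 C
```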

In a proper analysis of the 160 years of global air temperature anomalies, this magnitude uncertainty should be propagated into the temperature anomaly calculation and appear as (part of the) error bars in any graphical presentation of the anomaly trend.

But how many times have we seen that done — by any group? (Hint: not once; not ever.)

In 1988 [3], Jim Hansen implied that the normal-period magnitude uncertainty should be included in estimates of the climatological significance of surface air temperature anomaly trends, when he made a big deal in his paper about the (+/-)0.13 C magnitude uncertainty in the GISS 1951-1980 normal.

In his words: “What is the significance of recent global warming? The standard deviation of annual-mean global-mean temperature about the 30-year mean is 0.13 C for the period 1951-1980. Thus the 1987 global temperature of 0.33 C, relative to the 1951-1980 climatology, is a warming of between 2-sigma and 3-sigma. If a warming of 3-sigma is reached, it will represent a trend significant at the 99% confidence level.”

So, Jim Hansen says that if the global anomaly trend emerges from the magnitude uncertainty bars, it becomes statistically significant. But notice what else Jim Hansen has done here: 1951-1980 now embodies the definitive climatological variability of an entire century of temperatures. If the anomaly trend exceeds the 1951-1980 3-sigma, it is climatologically significant to 99% confidence for the entire GISS 1880-1988 period.

In 1988 Jim Hansen testified before the Senate Committee on Energy and Natural Resources that if the anomaly trend reached +0.4 C, then there would be 99% certainty (0.39 C = 3-sigma), that AGW would indeed be observationally apparent.
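Hansen's sigma arithmetic is easy to reproduce from the quoted figures alone (a sketch, using only the numbers in his statement):

```python
# Hansen (1988) significance arithmetic, from the quoted figures.
sigma = 0.13            # 1951-1980 magnitude uncertainty, deg C
anomaly_1987 = 0.33     # 1987 anomaly relative to 1951-1980, deg C

print(f"{anomaly_1987 / sigma:.1f} sigma")  # 2.5 sigma: between 2 and 3
print(f"{3 * sigma:.2f}")                   # 0.39: the ~0.4 C threshold
```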

Again, in his own words [4]: “[C]ausal association of the greenhouse effect and the global warming … requires first that the warming be larger than natural climate variability and, second, that the magnitude and nature of the warming be consistent with the greenhouse mechanism. … The observed warming … is almost 0.4 degrees Centigrade by 1987 relative to climatology, which is defined as the 30 year mean, 1950 to 1980 (sic) and, in fact, the warming is more than 0.4 degrees Centigrade in 1988. The probability of a chance warming of that magnitude is about 1 percent. So, with 99 percent confidence we can state that the warming during this time is a real warming trend. … [T]he results of global climate model calculations [show] … the expected warming is of the same magnitude as the observed warming. Since there is only a 1 percent chance of an accidental warming of this magnitude, the agreement with the expected greenhouse effect is of considerable significance. (my emphasis)”

So there we have it. The entire 99% significance of AGW rested upon the definitive 20th century climatology, which was ‘whatever happened between 1951 and 1980.’ That’s a pretty flimsy foundation for such an earth-shaking conclusion.

In his 1988 testimony, Jim Hansen’s Figure 1 showed the senators the GISS almost-1988 global temperature anomaly trend. He gave a 95% confidence error bar of 2-sigma = (+/-)0.14 C [5], but offset it from the actual trend line. This 2-sigma error bar represented [6], “The greatest source of error or uncertainty in the derived temperature changes [which] is due to the incomplete spatial and temporal coverage provided by the finite number of meteorological stations.” Jim Hansen’s climatologically critical (+/-)0.13 C magnitude uncertainty was nowhere to be seen on his graphic (notice that he mentioned nothing about instrumental error, either).

Here’s how the GISS 1988 anomaly trend could have looked to the senators if Jim Hansen had decided to include (+/-)s:

Recall the CRU 1961-1990 normal (+/-)s = (+/-)0.17 C. Three sigma of that is (+/-)0.51 C, not (+/-)0.39 C. Natural variability doesn’t seem to respect the rather arbitrary definitions of climatology.

Here is what the global average surface air temperature anomaly trend looks like with the CRU normal-period error bars, set to the climate-science accepted Hansen-definitive 99% certain 3-sigma:

Figure: The global average surface air temperature anomaly trend, with error bars representing 3-sigma of the (+/-)0.17 C normal magnitude uncertainty; error bars are (+/-)0.51 C.

It appears that when the definitive climatology changes from definitively 1951-1980 into definitively 1961-1990, the global average surface air temperature anomaly trend changes from definitively 99% real to definitively not emergent from the zero line.

That certainly demonstrates a result robust in the climate science sense.

But why should any 30-year period define 160 years of natural variability? We know that climate began emerging from the LIA somewhere around 1850 [8, 9]. If Earth’s climate has been in a single phase since then, a best estimate for the natural variability of air temperature is the annual jitter across the entire 160 years.

The even more definitive definitive natural variability, calculated from the full 1850-2010 CRU data set, turned out to be (+/-)s = (+/-)0.28 C [10]. When this anomaly magnitude uncertainty is combined as the r.m.s. with the estimated lower limit of measurement error, (+/-)0.46 C [1], the uncertainty associated with the surface air temperature anomaly trend is 1-sigma = (+/-)0.54 C.
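This quadrature (r.m.s.) combination can be checked in a couple of lines (my arithmetic check of the quoted values):

```python
# r.m.s. combination of the 160-year magnitude uncertainty with the
# lower-limit measurement uncertainty from Ref. [1].
import math

s_natural = 0.28   # 1850-2010 magnitude uncertainty, deg C
s_measure = 0.46   # lower-limit measurement uncertainty, deg C

one_sigma = math.hypot(s_natural, s_measure)  # sqrt(0.28^2 + 0.46^2)
print(f"1-sigma: +/-{one_sigma:.2f} C")       # +/-0.54 C
print(f"3-sigma: +/-{3 * one_sigma:.1f} C")   # +/-1.6 C
```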

The full 160-year variability allows that the Jim Hansen approved 99% definitive uncertainty limit becomes (+/-)1.6 C, which definitively indicates that we’re only half-way to demonstrating a definitive AGW thermal influence on climate.

Here’s what the CRU 2010 anomaly trend looks like with 3-sigma of just the 160-year (+/-)0.28 C natural variability:

Figure: The 1850-2010 global average air temperature anomaly trend, with (+/-) 3-sigma of 160 years of natural variability; the error bars are (+/-)0.84 C.

Here’s the analogous plot, but with 99% significant 160-year natural variability centered on the zero line:

Figure: The 1850-2010 global average air temperature anomaly trend, with (+/-) 3-sigma of 160 years of natural variability centered on the zero line; the error bars are (+/-)0.84 C.

It’s very clear that the entire anomaly trend never emerges from the range of natural variability. Even the intense 1998 El Nino anomaly of +0.53 C does not exceed the 95% level of natural variability, (+/-)0.56 C about zero.
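The 95% comparison is a one-liner, checking the quoted figures:

```python
# 95% (2-sigma) band of the 160-year natural variability vs. the
# 1998 El Nino anomaly, using the figures quoted in the text.
s160 = 0.28            # 160-year magnitude uncertainty, deg C
el_nino_1998 = 0.53    # 1998 El Nino anomaly, deg C

two_sigma = 2 * s160
print(f"{two_sigma:.2f}")                # 0.56
print(el_nino_1998 < two_sigma)          # True: inside the 95% band
```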

For 160 years, the global air temperature has been wandering about well within its self-defined natural limits.

Summary points concerning the last 25 years of global air temperature anomaly studies, as carried out by the community of climate scientists:

1) They have uncritically assumed random read-error.

2) They have completely neglected systematic instrumental error.

3) They have almost completely neglected magnitude uncertainty, (+/-)s.

4) They have statistically imposed an impoverished physics. This in turn forced very large, very false, and entirely unrecognized uncertainties onto the GASAT anomaly record.

5) They have missed the fact that the 160-year natural variability, alone, obviates any reasonable alarm about recent trends in surface air temperature.

6) After rectifying both the incorrect and the neglected measurement statistics, the empirical uncertainty in the 160-year trend of global average surface air temperature anomalies produces a trend statistically indistinguishable from zero Celsius. That is, from the point of view of measurement statistics, a line along the zero axis is as informative about the trend in global surface air temperature anomalies as is the measured trend line itself.

None of these mistakes have been corrected in the peer-reviewed literature that purports to construct and report the GASAT anomaly trend.

Finishing up with the conclusion paragraph from the paper [10]: “Analysis of the statistical protocol commonly used to estimate uncertainty in the global average surface air temperature anomaly index shows it to be fatally flawed. It should be discarded in favor of one that explicitly reflects the lack of knowledge concerning the error variances in surface climate station temperature measurements. The magnitude uncertainty in the normals period of the global mean air temperature anomaly has rarely been evaluated. This temperature magnitude uncertainty represents the minimal variability one may expect in mean annual temperature over a given climate regime, presuming that the normals period is representative. Assuming the 1961-1990 normals period is representative, then (+/-)0.51 C captures 99.7% of the 20th century global average air temperature variability. If the global climate has been in a single phase over the interval 1856-2004, then (+/-)0.84 C captures 99.7% of the intrinsic climatological air temperature variability of the 20th century. These considerations imply that most or all of the variation in global average temperature observed over the 20th century can most parsimoniously be assigned to spontaneous climate fluctuations that also display the pseudo-trends reflecting persistence. It appears that there is no particular evidence for an alarming causative agent in the 20th century global average surface air temperature trend. Therefore policies aimed at influencing this trend are empirically insupportable.”

312 Responses to “Uncertainty in Air Temperatures”

lucia said

I’m not convinced by this paper. I think Pat is getting jammed up by never quite defining T(bar) and using it two different ways. I’m looking at it and pondering whether the best ‘blog’ way to show the problem is using Monte Carlo or algebra. I’ll be thinking about this a bit though.

stan said

Howard said

I don’t pretend to know stats, but do know field data collection very well. What strikes me is that the uncertainty does not decrease since 1880. With improvement in monitoring methods and coverage into the modern era, wouldn’t the uncertainty decline through time?

Paul Linsay said

The bigger issue is the concept of global average temperature. I think it’s physically meaningless. On any given day temperature varies from 240 K to 310 K across the earth, the daily variation at any location is roughly 10 K, and annual variations run the gamut from a few degrees to many. Absolutely no physical process is affected by it (unless you’re Michael Mann measuring teleconnections). As for a rise of 0.5 C, my thermostats at home can barely deal with a change that small.

On top of it all, the measurements that go into global average temperature are from badly maintained, uncalibrated instruments with no quality control of the data collection.

It would be much better to force the modelers to publish global temperature maps and compare their results to the satellite measurements. I’d bet the comparison would be extremely embarrassing.

It’s time to get away from statistical arguments and push the discussion towards physical reality. As far as I’m concerned, the stats arguments are more theology than science and allow hocus-pocus to hide data.

timetochooseagain said

The problem with arguments based on uncertainty is that it is just as likely that a trend is really much larger as it is that the trend is actually zero. That the signal is small relative to the noise is a good point, but I wouldn’t make the leap that the signal isn’t there.

Pat Frank said

Thank-you again, Jeff, for giving me the opportunity to speak out. And thanks to everyone for your interest.

First off, Dr. Francis Massen of the Meteorological Station of the Lycée Classique de Diekirch, Luxembourg, wrote and pointed out a typographical error on article page 412, two lines under equation 9. The “T_bar_prime” in parenthetical expression relating r_i is missing the prime.

That line should read, “… for the Gaussian fits to the bias temperatures (r_i = tau_i_prime – T_bar_prime) …” I’ve sent a short corrigendum to E&E to correct that.

Francis also mentioned that older versions of Adobe appear to not render the article pdf properly, and some of the expressions may not appear correctly. Francis particularly noted that the “prime” disappeared from the T_bar_prime in the line just under equation 8. Everything seems to come out right in Adobe Reader 9.4.6, which is what I have (Mac version).

#1, Lucia, is it possible that the typo I missed has misled you? T_bar_prime is defined in equation 6.

#3, Howard, uncertainty should decrease with improved equipment and monitoring, yes. However, in my first paper, I estimated a lower limit of uncertainty from data obtained using modern equipment under ideal conditions of monitoring and maintenance. A modern lower limit is applicable to the full range of data, because it will be less than the uncertainty in older data.

The magnitude uncertainty discussed here doesn’t include the measurement error. It was calculated from the values of the monthly standard deviations of the full 160 years of anomalies.

“Physical, mathematical, and observational grounds are employed to show that there is no physically meaningful global temperature for the Earth in the context of the issue of global warming. While it is always possible to construct statistics for any given set of local temperature data, an infinite range of such statistics is mathematically permissible if physical principles provide no explicit basis for choosing among them. Distinct and equally valid statistical rules can and do show opposite trends when applied to the results of computations from physical models and real data in the atmosphere. A given temperature field can be interpreted as both ‘‘warming’’ and ‘‘cooling’’ simultaneously, making the concept of warming in the context of the issue of global warming physically ill-posed.”

And you’re also right that fully acknowledging the systematic uncertainties in the global surface air temperature anomalies calls into question the calibration of GCMs.

Pat Frank said

Timetochooseagain, there is no suggestion that the signal isn’t there. Please note that the sentence at the end of the penultimate paragraph is qualified, “…from the perspective of uncertainty statistics…”

The physically true trend line could be almost anywhere within the statistical uncertainty bars. The sentence is a bit extreme, I agree, but I made it to draw attention to the fact that the uncertainty is so large that no one knows what the true trend looks like. All the alarm is about a trend that has no strict physical meaning.

#6 Brian, you’re right, thanks. I never seem to get the text fully clean.🙂

lucia said

Pat–
The difficulty is that I’m scratching my head as soon as I get to “case 2”. In fact, there is some puzzlement in case 1– but this just has to do with which statements are definitional. For example: in the second line on 972, you give an equation for Tbar. Is this definitional? It must not be because if it was, there can be no uncertainty. But if it’s not definitional, what’s the “noise”, n_bar in (2) relative to? tau_c in (1)?

The difficulty is that while I may guess right, I have to guess. And this is happening in case 1. So, as I move on to case 2, I now have to consider all the various options.

So, in short: my puzzlement starts well before equation 6. Mostly, I don’t know what your errors are supposed to be relative to. That is: what is the definition of “the right thing”? (This sounds stupid, but it’s not. Depending on what analysis someone is doing, we need to know what ‘perfection’ is before we can discuss all the sources of error.)

timetochooseagain said

9-I get that, although I think that the data is a little more informative than a flat line, because the flat line is not at the assumed mean/median/mode/center of the probability distribution in any year. While the flat line may fall within an arbitrary probability threshold, in some years the flat line is less likely than in others. Better to quantify the probabilities than to just say it’s within the 99 or 95% bounds. A hypothetically relatively steep line is also as “informative” if it falls within the intervals in all years.

I think over the shorter period of the last thirty years we have very accurate data from satellites (verified against radiosonde data), which confirm that the surface trend (for that period) is erroneously large, so physically the flat line is much more reasonable than a steep line, even though both are equally likely statistically.

phi said

kim said

Heh, so it seems when I say ‘the best correlation between CO2 levels and temperature was in the last quarter of the last century’ I really have not much basis for saying that. Curiouser & Curiouser.
=================================

Carrick said

I don’t pretend to know stats, but do know field data collection very well. What strikes me is that the uncertainty does not decrease since 1880. With improvement in monitoring methods and coverage into the modern era, wouldn’t the uncertainty decline through time?

Yes, it would have to. Increased coverage, more uniform methods for processing data, etc. E&E lives up (or down) to its reputation.

Carrick said

Phi, actually I do understand how the data are collected, and I understand how to correctly model different sources of error. I give Pat credit for putting his work out for others to take target practice at; I have too much to work on to bother with shooting it down. I’m glad Lucia’s taking an interest, and hope she follows through.

The real test of objectivity will be if this error analysis is shown to be in error, would he and you admit it.

phi said

Carrick,
I said above that I did not think there is a significant problem of accuracy, so I do not think the statistical methods of error handling are appropriate. This is an issue of bias and in this respect, there is no improvement since the nineteenth century, on the contrary.

Pat Frank said

#10, Lucia, I didn’t initially realize you were referring to the first paper. You’re referring to the equation for the mean on page 971.

In paper 1, Case 1 discusses signal averaging when the observable has an inherently constant magnitude. Equation 1 separates the ith measurement of tau_c into the inherent magnitude of the event, tau_c, and the random noise attending the ith measurement, thus t_i = tau_c + n_i.

Obviously, in a real measurement, the noise can’t be separated from the inherent magnitude like that. Separation is made in equation 1 so that the components of a case-1-type measurement can be discussed explicitly.

The equation in line 2 of page 971 describes how the mean of the measurements is calculated. Since each t_i includes n_i, which is the noise uncertainty in the ith measurement of tau_c, then although the value of the empirical mean can be computed exactly, the meaning of the measurement mean includes the uncertainty in the mean as imposed by the cumulation of the noise in the several measurements. That is the message conveyed by equation 2.

The difference in meaning between a measurement mean and the mathematically unambiguous value of a computed average is the difference in meaning imputed within science and engineering, as opposed to the valuational meaning produced by the calculational operation alone.

Please see references 14-17 in paper 1; especially 16 and 17. This is all very standard, and I don’t see that you have to guess at anything.

The “perfection” you require can be thought of as the true inherent magnitude of a measured observable. Any measurement has a variety of errors within its methodology — some random (thermal noise), some systematic (aging or temperature-sensitive electronics). We never, ever know the true magnitude of the observable we’re measuring (even in a highly precise calibration experiment).

If one agrees with Niels Bohr that science is only about what can be observed rather than about what is, then the “true” magnitude of an observable is almost reduced to a philosophical construct. It’s unobservable. If you like, the “perfection” is a categorical definition developed within statistics, and adopted by science in order to make an estimate of measurement uncertainty.

Pat Frank said

#11, timetochooseagain, I generally agree with your first paragraph. I’ve been in occasional email conversation with Michael Limburg in Germany, who contacted me after paper 1 appeared. Michael’s diploma thesis has involved making an exhaustive study of the sources of uncertainty in the surface air temperature measurement record. He recently observed that the only proper way to plot the air temperature anomaly trend is as a thick grey band about zero, with no line at all showing within. I agree with him, and expect that you’d agree as well.

Unfortunately, Michael has apparently run into some professional opposition that has prevented him from publishing his results — a circumstance that is shamefully common in climate science. He’s a young guy who should not have to bear such pressure.

About satellite temperatures, I agree they’re far better than the surface air temperatures. The claim is that they’re good to (+/-)0.03 C, relative to temperature measurements simultaneously acquired by balloon-launched radiosondes. But it looks to me like that number reflects precision, rather than accuracy. Evaluations of radiosondes themselves appear to show biases, i.e., systematic errors, that should be propagated into reported satellite temperatures.

Pat Frank said

Carrick, as I noted to Howard in #8, Figure 1 plots a modern lower limit estimate of uncertainty. As an uncertainty estimate calculated from calibrations of a modern temperature sensor, it shouldn’t decrease with time to the present.

A better estimate, admittedly, would be to enter in the absolute error of a LIG thermometer in a Stevenson screen, ratioed in to the uncertainty estimate according to the fraction of LIG sensors entering measurements into each annual anomaly. Then one would see the uncertainty bars increasing in magnitude at least back to about 1990. But unfortunately, I have not found a calibration study of LIGs in Stevenson screens referenced to a high precision standard, i.e., an experiment like that of Hubbard and Lin applied to LIG/CRS. All the studies appear geared to getting transfer functions between LIGs in Stevenson screens to the MMTS sensors brought in after about 1990, so that the surface station temperatures can be adjusted after the change-over.

Your comment about E&E is unworthy. Isn’t science judged by the quality of the argument?

Steve Fitzpatrick said

A lot of pretty smart people (including those of a strongly skeptical POV) have looked at this, and not come to your conclusions. I will (over the next week) go over your paper in detail, but the chance of a large, fundamental mistake seems to me quite low. I would be cautious about being too sure of this.

timetochooseagain said

23-The uncertainty in the balloon records should not propagate to the satellites, as the radiosonde records are merely used to verify that they are all in reasonable agreement. However, it is a fair point that the radiosondes have their issues with accuracy. But given the good agreement with the satellite records, I tend to think that in terms of trends, and even year to year variability, these are probably small. Since there is reasonable agreement, in order to argue for a trend much higher or lower than what they show, one would need to argue that completely different sources of data have biases of the same magnitude in the same direction, for completely different reasons, which is pretty unlikely. A recent paper to look at how certain or not we are about the tropospheric trends is:

They concluded that, accounting for systematic biases in some of the datasets, they could get a best estimate of the trend to within about ±.03 degrees per decade, at least for the tropical data, and I expect elsewhere the agreement is even better. So I think we can be reasonably confident in the satellite data, which indicate that recent trends are pretty small and not much to get alarmed over.

Phil said

The Essex et al. (2007) paper mentions different methods of averaging data. Currently, I believe that temperatures or their anomalies are averaged using a simple average (i.e., the sum of T_i from i=1 to n, divided by n). It is my understanding that in climate modeling radiation is the principal mode of heat transfer, and that convection and conduction are considered to be of minimal importance. Essex mentions, IIRC, that a more appropriate method for averaging temperatures when radiation predominates would be the sum of (T_i)^4 divided by n, where the T_i are in kelvins. Observing that most of the warming appears to be at high latitudes, it occurred to me that the anomaly method makes the implicit assumption that an anomaly of 1°C at high latitudes is the same as an anomaly of 1°C at low latitudes. Since radiation varies with the 4th power of the absolute temperature, this assumption would not necessarily be true, given the large difference in absolute temperatures between high and low latitudes on Earth.

Some time ago I tried to make some rough calculations of what the effect on the trend over the last century or so would be if this method of averaging temperatures were used instead of the simple averaging of anomalies. I came up with a very rough estimate that the warming trend may be overstated by as much as 100% – that is, the trend using the average of the 4th power of absolute temperatures would be about half of the oft-quoted trend of about 0.7°C. Unfortunately, I never posted my rough calculations and I would have to do them all over again, so I am posting this simply for purposes of discussion, with no support for the size of the apparent bias. In conclusion, it is possible that an anomaly at high latitudes may be worth only about half as much as an equal anomaly at low latitudes. Given that the high-latitude anomalies are much greater than the low-latitude ones, this may result in a significant overstatement of the global average temperature trend over the last century or so as calculated by simple averaging of anomalies.
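[Phil's fourth-power idea can be sketched with a toy calculation. All the numbers below (two stations, a 1 K polar anomaly) are invented for illustration; this is not a reanalysis of any dataset.]

```python
# Toy comparison of simple averaging vs. fourth-power (radiative) averaging.
# All numbers are invented for illustration; this is not a reanalysis.

def radiative_mean(temps_k):
    """Effective temperature whose 4th power is the mean of the T^4 values."""
    return (sum(t ** 4 for t in temps_k) / len(temps_k)) ** 0.25

# Two idealized stations, one polar (250 K) and one tropical (300 K),
# with a 1 K anomaly applied at the polar station only.
baseline = [250.0, 300.0]
warmed = [251.0, 300.0]

simple_change = sum(warmed) / 2 - sum(baseline) / 2
radiative_change = radiative_mean(warmed) - radiative_mean(baseline)

print(f"simple-average change:    {simple_change:.3f} K")
print(f"radiative-average change: {radiative_change:.3f} K")
```

[In this toy case the radiatively weighted change (about 0.36 K) is smaller than the simple-average change (0.5 K), in the direction Phil describes; the real-world magnitude would depend on the actual latitude distribution of anomalies.]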

Phil said

Carrick said

Pat, your error bars, regardless of how you obtained them, do not pass the smell test, because the much sparser coverage prior to 1950 introduces both measurement and systematic error sources. Error bars should grow as you go back in time before 1950, and should be relatively stable since then. Urbanization has not occurred uniformly, so if you take land-usage changes into account, again you shouldn’t see a constant error bar associated with that, but one that changes over time.

I think if I were to make a serious effort about improving the art, I’d sit down and digest carefully Roger Pielke Sr’s work.

As far as my not having time to work on this but still having time to comment on it… well, I don’t have time to look at your paper in detail, but I’m still allowed an opinion, I think, based on “first cut” sanity tests. And my opinion (which is informed) is that statements like this: “He recently observed that the only proper way to plot the air temperature anomaly trend is as a thick grey band about zero, with no line at all showing within”, while red meat to the “faithful”, are mind-numbingly silly.

Regarding E&E, they have a very bad reputation for poorly reviewing the work that gets submitted to them. I think your paper (assuming it survived) would have been greatly strengthened by a strongly critical review (not negative, just critical).

Finally, I will affirm that I believe you are acting in good faith. Perhaps you could quote the phrase that you felt suggested that I considered this otherwise?

MP said

I would estimate the standard errors of the mean temperatures with a Tamino/Roman/Berkeley type of method, by inverting the Hessian of the weighted sum of squared errors, and use the inverse Student’s t cumulative distribution function with the appropriate degrees of freedom to estimate the 95% confidence interval for each mean temperature. The variance of the temperature for each month can be estimated from the residuals. This method takes into account covariance between the parameters (mean temperatures).

Or alternatively, estimate standard errors by bootstrapping, e.g. resampling of stations, generating many reconstructions (which could lead to missing months, I guess). From this the 95% confidence interval can be estimated using the Studentized bootstrap or the percentile bootstrap. It could be computationally costly, but the benefit is that you do not have to make any assumptions about the underlying distributions.
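[MP's second suggestion can be sketched in a few lines. The station anomalies below are synthetic, invented purely for illustration, and this is the percentile-bootstrap variant of the two MP mentions.]

```python
import random

random.seed(42)

# Synthetic monthly anomalies for 20 hypothetical stations (values invented
# for illustration): a common 0.3 C signal plus independent station noise.
stations = [0.3 + random.gauss(0, 0.5) for _ in range(20)]

def mean(xs):
    return sum(xs) / len(xs)

# Percentile bootstrap: resample stations with replacement many times, then
# read off the empirical 2.5% and 97.5% quantiles of the resampled means.
n_boot = 5000
boot_means = sorted(
    mean(random.choices(stations, k=len(stations))) for _ in range(n_boot)
)
lo, hi = boot_means[int(0.025 * n_boot)], boot_means[int(0.975 * n_boot)]

print(f"mean anomaly {mean(stations):.3f} C, 95% CI [{lo:.3f}, {hi:.3f}]")
```

[As MP notes, nothing here assumes a distribution for the station values; the interval comes entirely from the resampling.]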

phi said

This is all well and good, but as long as you take no interest in the practical conditions of the measurements, you will not know what the values whose margins of error you are processing actually represent. These values are simply not representative of regional or global average temperatures. The divergence of the TLT record over Northern Hemisphere land makes this clear.

Or alternatively, estimate standard errors by bootstrapping, e.g. resampling of stations, generating many reconstructions (which could lead to missing months, I guess). From this the 95% confidence interval can be estimated using the Studentized bootstrap or the percentile bootstrap. It could be computationally costly, but the benefit is that you do not have to make any assumptions about the underlying distributions.

This sounds interesting; does anyone know of someone using a re-sampling approach like this? Something like this, or a bottom-up error propagation (which could also be implemented by Monte Carlo) seems straight-forward enough. Maybe it’s the quality of the meta-data that makes doing the error propagation hard?

Pat Frank’s argument is totally erroneous. His analysis makes an error which should be obvious to everyone who thinks about how the temperature record is being used.

If there is a systematic error in a given thermometer, as long as it does not vary systematically with time, it should not contribute to a significant error in a TREND, which is what is used to calculate the global temperature anomaly. After all it is not the absolute value of the temperature which is the issue only the size of the change in temperature over time.

Mark T said

Actually, any background systematic error that varies with the amplitude will necessarily vary with time (unless the amplitude is fixed) and will necessarily impact a trend. Thermometers, in general, tend to have errors that are a function of amplitude.

Next time try to be a bit smarter regarding your arguments as that one took all of a second to refute.

W.r.t my comment in the other thread, I must conclude your data analysis skills are minimal at best, if you have any, and thus you do not have much of a science, logic, or engineering background.

Anonymous said

About the error bars: how does one deal with the difference between GISS (they use raw data) and CRU, which homogenizes temperatures? According to Böhm et al. 2001, raw data are affected by a cooling bias of about 0.5 °C per century. This is obviously huge; what do we do? Objectively? Ethically?

“Actually, any background systematic error that varies with the amplitude will necessarily vary with time (unless the amplitude is fixed) and will necessarily impact a trend. Thermometers, in general, tend to have errors that are a function of amplitude.”
Mark,

A background systematic error that was proportional to the amplitude of variation was not part of Frank’s analysis. It was never alluded to in his paper. He analysed a temperature offset that varied from thermometer to thermometer, but did not say anything about an offset of the slope of temperature readings versus the variation of actual temperature.

A constant systematic offset for each thermometer was assumed in his analysis. He never calculated the impact of systematic offsets on the average temperature trend. He neglected the fact that fixed offsets in the temperature readings of an individual thermometer would cancel out.

If you have a reference that shows a significant proportionality error for thermometers, please provide one. Otherwise I must conclude that you are making it up as you go along.

You are reluctant to admit that the emperor has no clothes on. You simply nod your head in assent with any analysis that seems to debunk global warming because you like the conclusion, and overlook a glaring error in analysis.

Nearly all measurement devices have a non-linear error, this is fairly common knowledge for anyone with any level of data analysis education. If you’re going to continue to pretend to be such an idiot, I’ll just assume you’re making it up as you go along because you have nothing else.

Pat Frank said

#26 Steve, to my knowledge no one else has taken the systematic instrumental error determined for surface station temperature sensors, and applied it to the air temperature anomaly record.

#27 Timetochooseagain, I’m in no position to knowledgeably discuss radiosondes or satellites, and so will have to let the topic lapse.

#28 Phil, your approach sounds interesting and I’d encourage you to pursue it a bit. Chris Essex is a great guy. He’s open to contact and has helped me a lot. So, why not contact him with your idea? He’d probably be willing to advise you about whether it’s a valid method. You may have a GRL submission.🙂

Pat Frank said

#30 Carrick, the systematic error bars in 2010 E&E 21(8) represent a *lower limit* estimate using the careful calibration experiments Hubbard and Lin applied to a modern MMTS sensor. In the paper, I specifically mention that the lower limit of uncertainty from a modern sensor underestimates the uncertainty in past measurements, because past sensors were less accurate and even more prone to the systematic errors stemming from environmental conditions. Your “smell test” is misapplied.

A full-blown uncertainty estimate would include the systematic errors of LIG thermometers in CRS shields, as they occurred under various conditions of local climate over past time, back through 1880. This would require knowing those uncertainties, or estimating them by carefully calibrating representative sensors in field experiments sampling different climates across the globe, or in a series of well-controlled lab experiments. No one has done such experiments.

If one had such data, then the uncertainties would increase into the past. But we don’t have such data, and a lower limit estimate becomes valid. A lower limit estimate need not increase into the past. It needs only to be lower than past uncertainties. See pages 981 and 982 in 2010 E&E 21(8) for a discussion of this very point.

The anomaly magnitude uncertainty in 2011 E&E 22(4) is just a measure of natural variability: the annual variability calculated from the monthly variabilities. If one had the data, one could estimate the uncertainty in the total natural variability. That sigma would also increase into the past. But one doesn’t have the data.

You weren’t commenting on my work, Carrick, you were sniping at it — making disparaging remarks without substance. E.g., “E&E lives up (or down) to its reputation,” and, “The real test of objectivity will be if this error analysis is shown to be in error, would [Pat] and you [Phi] admit it. My bet is on won’t.”

You’re welcome to have any opinion you like and have every right to express them all. But I have the same right to call you on the shallow ones.

Representing an uncertainty envelope as a grey band without an internal line may be “mindnumbingly silly” to you, but it nevertheless represents uncertainty in a trend in a way that is scientifically entirely honest.

You wrote, “Regarding E&E, they have a very bad reputation for poorly reviewing the work that gets submitted to them.”

A reputation established by whom? Certainly Gavin Schmidt, Tamino, and Michael Mann are in no position to tarnish the reputation of others. Who else, and how do they know?

You also wrote, “I think your paper (assuming it survived) would have been greatly strengthened by a strongly critical review (not negative just critical).”

As you raised the issue of review, E&E 2010 21(8) and E&E 2011 22(4) were once a single manuscript that was submitted to the AMS Journal of Applied Meteorology and Climatology. It went through four reviewers, two associate editors and three rounds of review with them. Three of the four reviewers recommended publication. One of them was adamantly opposed from the outset. This latter reviewer merely continued to repeat his objections after I had repeatedly shown they were meritless. The editor accepted the one reviewer over the three, and rejected the manuscript. The three reviewers had many critical questions but none of them found any serious error in the analysis. Neither did the fourth, but his adamancy carried the day. That particular reviewer also gratuitously accused me of dishonesty. The whole process took a full year. It was a fairly standard climate science experience, for a skeptical submission.

On the other hand, one of the E&E reviewers found an error in one of the equations that had gotten by all four AMS reviewers, two AMS AEs, and the editor. So, which journal provided a better review?

When I submitted to E&E, I was open about the AMS submission and its outcome. The publisher of E&E, Mr. William Hughes, asked to review the prior reviews before he fully agreed to my submission. My entire experience with E&E was of standard high quality, and I speak as a well-published experimental scientist.

You wrote, “Finally, I will affirm that I believe you are acting in good faith. Perhaps you could quote the phrase that you felt suggested that I considered this otherwise?”

You can find it in #19, where you bet that neither I nor Phi would admit to having made an error even were it objectively demonstrated. You imputed we are both dishonest (or blind or stupid). However, I appreciate and accept your affirmation.

Pat Frank said

#31 MP your method only computes the variances of the reported numbers. It reveals nothing at all of systematic instrumental error, which is the main concern of my study.

Phi has it right in #32. The “practical conditions of measurements” includes the most basic question of all, when worrying about measurement variance, which is how accurate the sensor is under real field conditions. This rock-bottom issue of data quality control seems to be invariably invisible to people discussing global anomaly trends.

#35 Eadler, systematic error does vary with time. It varies continually with local weather conditions. I discuss this in paper #1, and the calibration experiments of Hubbard and Lin deal with this subject extensively. You need to read the literature before making a judgment of right and wrong.

#39 Carrick you made a distinction without a difference. Loss of objectivity is to yield principle to subjective desire. What else is that but a sacrifice of ethics to preference? You implied Phi and I would be dishonest. If you propose blind subjectivity, you’re suggesting absent professional recognizance; also known as ethical incompetence. However you parse it, it’s unfair and unjustified.

#42 Eadler your analysis is wrong. I did not apply a constant offset. In fact, if you look at the paper, and specifically at the Legend to Figure 1, you’ll see that I actually removed the bias offset from the calibration data before doing the Gaussian fit. From Figure 1, Legend: “each temperature bias relative to the R. M. Young probe was removed to yield a common mean of 0 C.”

The lower limit uncertainties I calculated are taken from the standard deviations of the systematic error. They are not a constant offset bias. In the paper, I also point out that the offset bias will almost certainly vary with each new data set, because the environmental conditions that produce the systematic error vary continuously with time (p. 978).
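[As a generic illustration of why a systematic floor behaves differently from random noise, here is a textbook propagation sketch. This is not a reproduction of the paper's equations, and the numbers (0.2 C noise, a 0.46 C floor, 1000 readings) are invented: the random part of the uncertainty in a mean averages down as 1/sqrt(n), while a fully correlated systematic part does not.]

```python
import math

def mean_uncertainty(sigma_random, sigma_systematic, n):
    """Generic propagation sketch: the random part of the uncertainty in a
    mean of n readings averages down as 1/sqrt(n); a fully correlated
    systematic part does not average down at all."""
    return math.sqrt(sigma_random ** 2 / n + sigma_systematic ** 2)

# Invented numbers: 0.2 C of random read noise, a 0.46 C systematic floor,
# and 1000 readings entering the average.
print(f"{mean_uncertainty(0.2, 0.46, 1000):.3f}")  # 0.460 -- floor dominates
print(f"{mean_uncertainty(0.2, 0.00, 1000):.3f}")  # 0.006 -- noise averages away
```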

I refer several times in the paper to the published calibration experiments of Hubbard and Lin (refs 35, 42, 44, 46 & 49), which are specifically concerned with systematic error. I suggest you avail yourself of those papers. You’re outspoken in your criticisms, not stopping at personal disparagement, without having done your homework.

Mark T said

Eadler has neither the requisite knowledge nor the experience to offer valid criticism, Pat. He is merely a parrot regurgitating what he finds from others, without the ability to assess the veracity of their claims. I would not be nearly as patient… though, I suppose, I have been in the past, just not nearly as often as you.

phi said

GISS and CRU have contradictory approaches to the quality of the raw data. That at most one of them can be correct (in fact, I think they are both wrong) means that some of the variability in the series does not have the randomness attributed to it by applying the theory of errors. The measurement errors are overestimated, which is what one observes when comparing surface temperatures with proxies. Meanwhile, the biases are ignored, and they are important.

phi said

To understand the case, an important point should be noted. The underlying physical phenomenon rises more or less linearly in terms of regional averages, but that is not the case at individual stations, where its evolution can be chaotic. It is therefore very difficult to distinguish its effect from measurement errors, at least if we restrict ourselves to a statistical analysis.

I had a short discussion with Lucia on the topic yesterday and I have taken the time to carefully review what you have written and I think it has some problems. I was concerned initially because after having compiled temperature records myself, the variance is visibly different in time and visibly smaller than the stated error bars above – the thumbnail engineer in me says the error bars were too big.

Lucia pointed this out to me, so I don’t get credit for finding it, but the problem I have with the method is that monthly weather variance is treated as uncertainty; I don’t believe this is at all correct. Local weather noise, if recorded accurately, is what the mean is composed of. Constructing error bars based on this expected variance tells us little about the error in our knowledge of the actual earth temperature for that month. Think of it this way: if the thermometers were 100% perfectly maintained and 100% perfectly accurate and precise, with perfect global coverage, you would still have huge error bars by your method, yet global air temps would be nailed down perfectly. Jones 97 handled the separation of weather noise and measurement noise reasonably, although I can’t vouch for the result as I have not replicated any of it.
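[The perfect-thermometer thought experiment above can be made concrete with synthetic data; the 15 C mean and 5 C daily weather spread are invented for illustration.]

```python
import random
import statistics

random.seed(0)

# A month of "true" daily temperatures with large weather variability
# (the 15 C mean and 5 C spread are invented for illustration).
true_daily = [15.0 + random.gauss(0, 5.0) for _ in range(30)]

# Perfect instruments: every measured value equals the true value exactly.
measured = list(true_daily)

true_mean = statistics.mean(true_daily)
measured_mean = statistics.mean(measured)
weather_sd = statistics.stdev(measured)

print(f"spread of daily temps: {weather_sd:.2f} C (large)")
print(f"error in monthly mean: {abs(measured_mean - true_mean):.2f} C (zero)")
```

[The daily spread is several degrees, yet the monthly mean is recovered exactly, which is the distinction between weather variance and measurement uncertainty being argued here.]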

Also, the conclusions about trend in the blog post and paper aren’t supported by a point to point statistical analysis. The trend is known better than any individual monthly value.

My point is that weather noise cannot be assumed to be a stochastic error. This is intuitively true due to the thermal mass of the earth. Not only will proximal stations be correlated, indicating an accurate representation of actual temperature, but distant stations may experience a measurable decorrelation, although the complexities of climate ensure two stations shouldn’t be permanently decorrelated, so detecting the effect with simple correlation math is likely a waste of time. PCA might pull a few decorrelated patterns out, but without extreme care they are as likely garbage as good.

Steve Fitzpatrick said

I think you (and Lucia) have put your finger on the issue. If any measurement process is assumed to be perfect in terms of the measurement itself, then the uncertainty in the measured variable must be zero; any analysis of that data which suggests otherwise (via combination of variation in the measured variable with variation in the measurement process) is simply mistaken.

Carrick said

#39 Carrick you made a distinction without a difference. Loss of objectivity is to yield principle to subjective desire. What else is that but a sacrifice of ethics to preference? You implied Phi and I would be dishonest. If you propose blind subjectivity, you’re suggesting absent professional recognizance; also known as ethical incompetence. However you parse it, it’s unfair and unjustified.

Good grief, whatever. Humans being subjective (which is the norm) is now suddenly being ethically incompetent. Yeah.

Steve Fitzpatrick said

You may not have meant it that way, but I think most people would take:

“The real test of objectivity will be if this error analysis is shown to be in error, would he and you admit it. My bet is on won’t.”

as meaning willful denial of reality. Suggesting to someone that they are probably going to willfully deny reality is always going to piss that someone off.

phi said

Carrick said

I’ll teach you another thing, phi: people are allowed their opinions without having to launch into a Ph.D. thesis to explain why. And sometimes, as with you, it’s obvious when the audience is unreceptive to what I consider reason, so you know when to cut your losses.

In the Brohan paper I could not find Pat’s eq. 2 or anything similar; Brohan does not include weather noise in the statistical model for a single grid cell. I agree with Jeff that weather noise should not be part of the statistical model that was designed to capture measurement uncertainty in a single grid cell. However, weather noise will affect the uncertainty in global temperatures when global coverage is not complete, and that effect should decrease as global coverage increases.
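[The coverage point above can be illustrated with a toy globe. The 1000 cells, the 0.4 C true anomaly, and the 1 C spatial spread are all invented for illustration: the error in a global mean built from incomplete coverage shrinks as more cells are observed.]

```python
import random
import statistics

random.seed(1)

# Toy "globe": 1000 grid cells whose monthly anomalies vary from place to
# place (weather noise), around a true global anomaly of 0.4 C. All numbers
# are invented for illustration.
cells = [0.4 + random.gauss(0, 1.0) for _ in range(1000)]
truth = statistics.mean(cells)

def sampling_error(n_observed, trials=2000):
    """RMS error of a global mean built from n_observed randomly placed
    cells, i.e., from incomplete coverage."""
    errs = [statistics.mean(random.sample(cells, n_observed)) - truth
            for _ in range(trials)]
    return statistics.mean(e * e for e in errs) ** 0.5

sparse, dense = sampling_error(20), sampling_error(500)
print(f"coverage 2%: {sparse:.3f} C   coverage 50%: {dense:.3f} C")
```

[With perfect instruments the remaining error here is purely a coverage effect, which vanishes as the observed fraction approaches one.]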

phi said

About Brohan 2006, homogenization and the effect of disturbances, two quick notes (with or without a receptive audience):

1. The CRU largely uses series already homogenized by national institutes; the results shown are for this reason due only to a residual homogenization.

2. The effect of urbanization is evaluated in papers based on the distinction between rural and urban stations. This method is highly questionable because it reduces the disturbances from urbanization to the UHI alone. There is no a priori evidence that the UHI component is predominant. If it is not, the result can only be a dramatic underestimate of the bias.

Geoff Sherrington said

Pat, thank you for putting into correct terminology the vague expressions of unease I’ve had for years, long after leaving the formality of the white lab coat and the bench of quantitative instruments. One gains a “feel” for error from bench work. Your paper is in accord with that “feel”, which is a highly unscientific form of expression, but gives me much comfort. I shall read your paper several times before commenting again; but at first blush it is a most important contribution. Thank you.

Pat Frank said

#50 Jeff, thanks for your thoughtful comments. I’m composing a reply and should be able to post it in a day or so.

#64 Geoff, thanks. Like you, I felt some innate unease when I first read the way Brohan 2006 treated station error. I have to sweat experimental error all the time, as I’m sure you did, and just couldn’t see how measurement error could be so readily dismissed.

The situation turned out to be much more complicated than I’d first thought (when does that not happen?), which is why I went through the cases step-by-step in the first paper. I wanted to be very sure, for myself first, that all the parameters were laid out clearly.

Geoff Sherrington said

Pat, have you seen this paper? It’s not so much for analysis of propagation of errors in time series as for the accuracy of devices supposed to give similar results.
With regards to my many friends at Bureau of Meteorology, Australia.


phi said

Carrick,
You ask Pat for his standard of proof. I could ask you the same question. On the Blackboard, I have presented five elements converging on the same conclusion: the over-estimation of twentieth-century warming (NH land) by a value close to one degree. I could look for others and come to six, seven, eight, nine or ten developments showing the same thing. At what number will you acknowledge the existence of a flaw?

The situation turned out to be much more complicated than I’d first thought (when does that not happen?), which is why I went through the cases step-by-step in the first paper. I wanted to be very sure, for myself first, that all the parameters were laid out clearly.

Your error occurs in paper 1. You went astray by never defining the “true” value corresponding to what CRU is trying to report (or alternatively, what you think a meaningful “true” value might be). It may be that your error bars are “errors” relative to something someone somewhere might hypothetically, under some circumstance, want to discuss or report. I don’t know what that thing might be, but what I do know is that your error bars are irrelevant to the error in the quantity CRU is trying to observe and report.

Mark T said

My comment was not nasty in the least, Carrick. Eadler does not have any knowledge of the things he is arguing. This is well known, which is why it is not a surprise that his arguments are typically fluff or so far off they are obviously irrelevant. His comments to me in particular amount to pretty good evidence he has never had to calibrate any piece of equipment, let alone a thermistor.

Neither Pat nor Phi needed to take outrage, since there was no reason to do so. Your assumption that he did so with you is actually ridiculous. There was no martyr card. He merely pointed out that you shouldn’t question his ethics and, once you stated that was not your intention, he graciously accepted that as sufficient explanation.

Carrick said

You ask Pat for his standard of proof. I could ask you the same question. On the Blackboard, I have presented five elements converging on the same conclusion: the over-estimation of twentieth-century warming (NH land) by a value close to one degree. I could look for others and come to six, seven, eight, nine or ten developments showing the same thing. At what number will you acknowledge the existence of a flaw?

At the point where you put the numbers in and show that, quantitatively, it matters for estimation of the global mean temperature trend. Not hand-waving: a start-to-finish analysis that meets the rigors of a peer-reviewed document.

Carrick said

My comment was not nasty in the least, Carrick. Eadler does not have any knowledge of the things he is arguing

LOL. Not nasty “in the least”. Woah. Learn thyself, Mark.

In any case, I don’t know how you could possibly know whether Eadler has “any of the knowledge of the things he is arguing”. Total hyperbole, designed to shut down arguments. Attack the critics, keep it on a personal level, SOP.

Regarding Eadler’s comments, the presence of systematic effects doesn’t affect the measurement of the trend, unless the error varies over time. You made what amounts to a poor argument about how systematic temperature effects could depend on temperature, but then you never showed whether it mattered.

Neither Pat nor Phi needed to take outrage, since there was no reason to do so. Your assumption that he did so with you is actually ridiculous. There was no martyr card. He merely pointed out that you shouldn’t question his ethics and, once you stated that was not your intention, he graciously accepted that as sufficient explanation.

This is pure hyperbole on your part. I never suggested he “took outrage”. I was questioning the motives for taking umbrage over the suggestion that he would have trouble admitting fault, were it pointed out. (You guys act like that never happens in RL, and it is somehow insulting to suggest that it could.)

As I said, it’s easier to attack people and vilify them than it is to deal with substantive criticisms.

Like Jeff says, people need thick skins if they are going to blog, and they need an even thicker skin if they are going to publish peer-reviewed literature, never mind in a journal not known for its quality control, like E&E.

Carrick said

Pat, I feel that I owe you a somewhat less cryptic explanation of my concerns about E&E. The comment you make about judging a paper regardless of its source is a good one; my comment was more about how E&E has done you a disservice by allowing the paper to be published without as thorough a vetting as it deserves. As a result, I think you have a paper that has substantive flaws in it.

My suggestion for future papers is to start by submitting them to JGR or other “main stream” journals for this type of work. If after the review process is over, it fails to get accepted (and you disagree with the decision), then migrate to another easier to publish in journal, after taking into account any criticism that you think was appropriate.

My suggestion for future papers is to start by submitting them to JGR or other “main stream” journals for this type of work. If after the review process is over, it fails to get accepted (and you disagree with the decision), then migrate to another easier to publish in journal, after taking into account any criticism that you think was appropriate.

As you raised the issue of review, E&E 2010 21(8) and E&E 2011 22(4) were once a single manuscript that was submitted to the AMS Journal of Applied Meteorology and Climatology. It went through four reviewers, two associate editors and three rounds of review with them. Three of the four reviewers recommended publication. One of them was adamantly opposed from the outset. This latter reviewer merely continued to repeat his objections after I had repeatedly shown they were meritless. The editor accepted the one reviewer over the three, and rejected the manuscript. The three reviewers had many critical questions but none of them found any serious error in the analysis. Neither did the fourth, but his adamancy carried the day. That particular reviewer also gratuitously accused me of dishonesty. The whole process took a full year. It was a fairly standard climate science experience, for a skeptical submission.

On the other hand, one of the E&E reviewers found an error in one of the equations that had gotten by all four AMS reviewers, two AMS AEs, and the editor.

Steven Mosher said

“The bigger issue is the concept of global average temperature. I think it’s physically meaningless.”

question: I tell you that the average of all sampled places on the globe is 14.5C. I now ask you to estimate the temperature at a random location x,y. Provide your best estimate.

question: when people claim that it was warmer in the MWP or cooler in the LIA, is that a physically meaningful claim?

question: is it warmer on the surface of Venus? what does this mean, physically?

“It would be much better to force the modelers to publish global temperature maps and compare their results to the satellite measurements”

1. They do publish this data. You can turn it into maps if you like.
2. You do realize that the temperatures recorded by satellites rely on physics models?
a. are you willing to accept the physics models that satellites depend upon as being true?
b. can you sense a trick question when you read one?

Carrick said

Contrary to Pat Frank’s discussion in section 2, the spread in the temperatures over the “M” days of the month makes absolutely no contribution to the uncertainty in CRU’s ability to estimate the mean value of the M temperatures

I’m not sure why propagating error through an integral (sum) is so fraught.

I think the more interesting stuff is in that Hubbard and Lin paper; it seems like a motivated skeptic could grid up some of the screens in one of the open FEM packages and do some rough estimates of the biases to go along with their empirical work; maybe even design a better one. It’s interesting that these sorts of heat transfer problems have been used extensively as toy problems by the folks from Sandia for researching UQ methods.

Mark T said

This is pure hyperbole on your part. I never suggested he “took outrage.”

No, you said “I don’t see any similar level of shock or dismay on yours or pat’s part.” Not hyperbole by me at all, just a different phrase with the same intended meaning. Why don’t you learn thyself as well?

You refer to my nasty comment yet cannot grasp why someone opening their work to public scrutiny would take offense to your questioning their objectivity. Hypocrisy is the worst of your traits.

W.r.t. eadler, I justified my response perfectly. He asked if I had a reference that showed that temperature sensors have a significant proportional error and I provided one. Apparently you did not see the link to a typical measurement non-linearity? Are you also saying that I have to point out all of the non-linearities in every MMTS sensor out there? If you would like we can also discuss drift/non-linear R-T curves for thermistors, which require regular recalibration – something I would expect a physicist to know about. We can also discuss why just about all measurement equipment used in labs requires recalibration at regular intervals.

Anyway, a non-linearity in amplitude WILL change a trend uncertainty, this is a truth, but the magnitude of that change is what must be determined, on which I did not comment. If you can keep your replies to what I was actually referring to, you won't look like you haven't actually read it.

There’s a point at which you expect people engaged in a debate to understand the basic premises underlying all arguments. Clearly eadler does not as evidenced by his accusation that I was making up such non-linearities, or even the very basic error he made criticising Pat’s paper (which is when I made my nasty comment.) Should I also be required to explain why 2+2=4?

Steve Fitzpatrick said

Which means (based on what Pat says) that all the editors and reviewers at a well known climate science journal seem to have missed a rather glaring conceptual error, and made their acceptance/rejection decision based on other factors. Hummm…. why am I not terribly surprised by this? Perhaps because we have seen ‘other factors’ considered before.

Jeff Id said

If you are measuring the same thing a hundred times, the uncertainty is easy to estimate. The problem in determining the error here is different because each measurement at a different point on Earth is of a different 'true' condition. In global temperature averaging, the true condition is dependent on the location and weather patterns of Earth. The large variance in measurements of the true condition created by natural variance does not indicate the accuracy to which we know the true condition, but rather the typical variance of the true condition. Pat's paper seems to incorporate the true expected variance as error in the knowledge of the true condition. It also states, above, that because the variance of the data comprising individual points in his paper exceeds individual values in the trend, we don't know the trend. Whether that statement is true or not is not supported in any way by the statistical analysis in the paper (or perhaps I missed it). With enough data, we know the trend better than any individual value.

Steve Fitzpatrick, wouldn’t it be great if Pat Frank shared the full text of the reviews? I’d be interested to know the arguments the one ‘adamant’ reviewer made that Pat Frank found to be without merit.

Carrick said

Which means (based on what Pat says) that all the editors and reviewers at a well known climate science journal seem to have missed a rather glaring conceptual error, and made their acceptance/rejection decision based on other factors. Hummm…. why am I not terribly surprised by this? Perhaps because we have seen ‘other factors’ considered before.

You can get people who are behaving irrationally during the review process. Many (most) reviews are not helpful. I agree with Jstults, it would be interesting from a sociological perspective to see whether any of the reviews were substantive, and if they were, whether Pat actually grasped the criticism.

Steven Mosher said

Well, Pat has had a run-in with a reviewer who refused to budge, when Pat thought we had shown the reviewer to be in the wrong. And Pat attributed that to the fact that he is a skeptic.

Now, we speculate that it was Pat who refused to budge when he was shown to be wrong.

Before Lucia completes her Monte Carlo, I'll suggest that people should raise their hands on whether they will accept a Monte Carlo analysis or not.

I bet that the folks who support Pat won't accept the results sight unseen, but Carrick, Jeff, SteveF, Kenneth and I are all willing (I bet) to accept a Monte Carlo result sight unseen.

Takers? And if you have questions about how Lucia will do the test, ask them up front. Otherwise I predict that when you don't like her results you will change the subject, or mumble some nonsense about global temperature being meaningless.
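[For concreteness, here is one guess at the shape such a Monte Carlo might take — all numbers (station count, weather spread, measurement sigma) are illustrative assumptions, not Lucia's actual test:]

```python
import random
import statistics

def simulate_mean_error(n_stations, sigma_weather=2.0, sigma_meas=0.2,
                        n_trials=2000, seed=0):
    """Spread of (measured global mean - true global mean) over many
    synthetic 'months'. Station truths vary with the weather, but only
    the measurement noise survives into the error of the mean."""
    rng = random.Random(seed)
    errs = []
    for _ in range(n_trials):
        truth = [rng.gauss(14.5, sigma_weather) for _ in range(n_stations)]
        meas = [t + rng.gauss(0.0, sigma_meas) for t in truth]
        errs.append(statistics.fmean(meas) - statistics.fmean(truth))
    return statistics.pstdev(errs)

# More stations -> a smaller error in the mean (~sigma_meas/sqrt(N)),
# behavior that a fixed (+/-)0.46 C error bar cannot reproduce.
print(simulate_mean_error(50), simulate_mean_error(500))
```

Under these assumptions the spread of the estimated mean about the truth shrinks with station count, which is exactly the observable behavior the test would check Pat's bars against.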

The problem in determining the error here is different because each measurement at a different point on Earth is of a different ‘true’ condition.

This doesn’t seem to be a problem to me. The functional (monthly global average) just maps the vector of station values to a number. The fact that the true value for each station is different really doesn’t matter. This seems like a straight-forward error propagation problem: quantify uncertainty in the output given the uncertain inputs. Am I oversimplifying this, and missing something? Maybe the concern is that biases caused by radiation and natural convection on the screens introduce error correlations that aren’t being properly accounted for (this doesn’t seem like a particularly huge difficulty to treat either)?
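[The straight-forward propagation described above can be sketched in a few lines — assuming independent input errors and a linear functional; the weights and sigmas below are illustrative:]

```python
import math

def propagated_sigma(sigmas, weights=None):
    """1-sigma uncertainty of y = sum(w_i * x_i) for independent inputs:
    sigma_y = sqrt(sum((w_i * sigma_i)**2)). Defaults to a plain mean."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    return math.sqrt(sum((w * s) ** 2 for w, s in zip(weights, sigmas)))

# 100 stations each read to an assumed +/-0.5 C: the station truths may
# span tens of degrees, but only the per-reading sigmas enter here.
print(propagated_sigma([0.5] * 100))  # 0.5 / sqrt(100) = 0.05
```

Note that the true station values never appear in the formula — only the per-input uncertainties and the weights of the functional, which is the point being made above.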

Carrick said

You refer to my nasty comment yet cannot grasp why someone opening their work to public scrutiny would take offense to your questioning their objectivity. Hypocrisy is the worst of your traits.

I don't have any problem with people making tough statements, not even that harsh comment of yours, as long as you are being honest about what you think. I'm just amused about the unequal standards that get applied to Pat's "supporters" and his "detractors". That I am easily amused hardly makes me a hypocrite (the double standards you guys apply make somebody a hypocrite; probably not me).

As to the other, to be blunt, I think it's just f**king stupid to take offense when somebody comments on one's lack of objectivity: that we all lack objectivity should be an assumption that guides us in how we seek truth in science. If you can't handle having your objectivity on a particular matter called into question, you really need to pick something a little more "soft core" to futz with, like maybe fantasy football.

Regarding nonlinearities in amplitudes, I think it’s a non-issue here. The biggest problem facing the measurement of temperature anomalies would be an offset shift over time. While nonlinearity in amplitude response could cause this, a more plausible source is the change in the geographic distribution of instrumentation sites over time.

Carrick said

While I'm thinking about it, I don't mean to limit sources for a shift in the temperature offset over time to just changes in the geographical sampling. UHI and land-usage changes, station shifts, and changes in instrumentation and method of measurement (e.g., TOD adjustment) are some that come to mind. The real trick is figuring out how to use a network of sensors, and the knowledge that there is spatial/temporal coherence in the data and a geographic effect on the temperature trend, to tease out the relative importance of these various effects.

Steve Fitzpatrick said

The fact that the true value for each station is different really doesn’t matter. This seems like a straight-forward error propagation problem: quantify uncertainty in the output given the uncertain inputs. Am I oversimplifying this, and missing something?

No, you have it right I think… the fact that the stations vary in true temperature has nothing whatever to do with uncertainty in the average of all stations. Place 10,000 separate thermometers at each station location and average the individual readings; the uncertainty in the measurement at each station then falls by a factor of (10000)^0.5 = 100. I believe Pat's 'uncertainty' will remain very high, even with 10,000 independent daily readings at each station location.

Mark T said

There’s no double standard, Pat noted your question directed towards his own work. My comment was directed towards Beadle (damned phone wants to fix it to Beadle and I am tired of correcting it.) It is, however, hypocritical of you to not recognize that your comment could easily be construed as nasty while you harp about a double standard.

Oh, and I have offered no position on the paper or his methods because I have not investigated. I merely pointed out Beadle's flawed comment. I am neither a supporter nor a detractor at this point.

There's nothing wrong with taking exception to accusations of a lack of objectivity, particularly given that the accuser is basing his opinion solely on the fact that he disagrees with the work (apparently, since no supporting evidence was provided.) Perhaps Pat just expected a bit more out of you given what he knows about your background?

Oh, while I agree that the measurement bias may be a non-issue, your claim that it is, is just as specious as the claim that it is not, without some more analysis or data to back it. Good thermistors do not drift much, but the cheapies do. They also have a fairly non-linear R-T curve which should be investigated.

Place 10,000 separate thermometers at each station location and average the individual readings; the uncertainty in the measurement at each station then falls by a factor of (10000)^0.5 = 100. I believe Pat's 'uncertainty' will remain very high, even with 10,000 independent daily readings at each station location.

But the errors in those repeated measurements (which includes both the random and systematic components) are not independent (unless you used 10k different screen-types), so I don’t think it’s correct to say you’d get the standard sqrt(n) convergence in your uncertainty about the air temperature at that time/location (you’re right about making claims about the measurement for a particular instrument). This is a good point, but tangential to the point about the actual mechanics of doing the error propagation (which for this functional is trivial, unless I’m being blind); it’s deciding on the input pdf’s that seems like the hard part to me.
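[A toy illustration of that point — assuming the shared error component is a common screen-type bias; the 0.2 C sigmas and counts are hypothetical numbers, not measured values:]

```python
import random
import statistics

def mean_spread(n_inst, sigma_rand=0.2, sigma_shared=0.2,
                n_trials=500, seed=1):
    """Spread of the n-instrument mean when every error has a shared
    component (e.g., a common screen-type bias) plus an independent
    component. Averaging shrinks the independent part as ~1/sqrt(n)
    but never touches the shared part."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, sigma_shared)
        means.append(shared + statistics.fmean(
            rng.gauss(0.0, sigma_rand) for _ in range(n_inst)))
    return statistics.pstdev(means)

# The spread floors near sigma_shared (~0.2 here), not sigma/sqrt(n):
print(mean_spread(10), mean_spread(2000))
```

With 2000 co-located instruments the spread sits near the shared sigma rather than the naive sigma/sqrt(2000) ≈ 0.004, which is why the sqrt(n) convergence claimed above needs the independence caveat.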

Steve Fitzpatrick said

That we all lack objectivity should be an assumption that guides us in how we seek truth in science.

Sure, and I doubt many people who work in science would argue otherwise, at least in the abstract. But telling someone (be that someone Eric Steig, Pat Frank, Lucia, Carrick, or me) that you think they are not being objective (influenced by politics, hysteria, fear, delusions, paranoia, vested interest, or whatever), doesn’t prove that is the case, and it sure does not help them think about the question more objectively, especially if they in fact are not thinking objectively. When tempted to suggest that someone is not being objective, it might be helpful to consider that it could just as well be you who is not being objective. Making a reasoned argument about why someone is mistaken is almost always going to be more constructive than telling them they are not being objective, whether they are being objective or not. It all comes down to what you are trying to do: piss people off or agree on what is correct.

Steve Fitzpatrick said

Use whatever combination of thermometers and screen types you think (or can show) will narrow the uncertainty in an individual station reading. The point is that Pat’s method would appear to suggest high uncertainty regardless of how perfectly individual station readings represent the true temperatures at those stations.

Steven Mosher said

'As to the other, to be blunt, I think it's just f**king stupid to take offense when somebody comments on one's lack of objectivity: that we all lack objectivity should be an assumption that guides us in how we seek truth in science. If you can't handle having your objectivity on a particular matter called into question, you really need to pick something a little more "soft core" to futz with, like maybe fantasy football.'

precisely. personally, I’d rather have somebody question my objectivity rather than my math. the reason is simple. We all lack objectivity. I do know how to correct for my own lack of objectivity. I submit to tests designed by those who disagree. They propose a test and If I judge it to be a fair test, I submit.

On Pat's claim that the uncertainty bars are that large, I wonder what it means?

As others have noted the bars do not grow or shrink with the number of stations.

This is problematic on its face. But it also suggests a test.

I could construct an average from 500 stations and calculate Pat's error bars. They would not differ from the ones he has. The mean would not differ and the variance would not differ. (Hint: he has months in the data with fewer than 500 stations.)

What do his error bars suggest I will get as an answer if I then sample 4,000 other stations? In short, what OBSERVATIONS could we make to show Pat that his error bars are too wide? We know what to look for if his error bars are too narrow.

There is no problem with the stations measuring different values; the problem is that in ascertaining how well we know the true average of the multiple values, you cannot include the variance of the 'real' temperature. If you have a pot of 100 degree water and a pot of 33 degree water, the average is 66.5 degrees. How well you know that average has not one bit to do with the difference in temperature of the two pots. If the accuracy of each thermometer is within 0.001 degrees, you wouldn't claim we know the average only to within 30 C. In terms of anomaly, the variance is on the order of a few degrees C from station to station; similarly, this difference says nothing about how well you know the average. It does say something about the variance in weather, but that is all it says.

As I read it, Pat's paper implies that this variance, which exists in nature, is in fact error in our accurate knowledge of the temperature, when really it is variance in correctly known temperatures.
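[The two-pot arithmetic above can be checked directly; the pot temperatures and the 0.001 degree thermometer accuracy are the figures from the comment:]

```python
import statistics

# Two pots at 100 C and 33 C, each read by a thermometer accurate to
# +/-0.001 C (the numbers used in the comment above).
true_temps = [100.0, 33.0]
sigma_inst = 0.001

mean_temp = statistics.fmean(true_temps)              # 66.5
spread_of_truth = statistics.pstdev(true_temps)       # 33.5 C of real variance
sigma_of_mean = sigma_inst / len(true_temps) ** 0.5   # ~0.0007 C of ignorance

print(mean_temp, spread_of_truth, sigma_of_mean)
```

The 33.5 C spread describes how different the pots are; the sub-millidegree figure describes how well the average is known. Conflating the two is the mistake being attributed to the paper.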

It does say something about the variance in weather, but that is all it says.

Precisely. Mind you: The variance in the weather can be interesting and can be the subject of a study itself. However, this variance does not contribute to the uncertainty in the computed mean — or at least not in the way Pat suggests. (Owing to lack of perfect station coverage, it can have an effect. But that effect is nothing like Pat’s estimate.)

Ian said

precisely. personally, I’d rather have somebody question my objectivity rather than my math. the reason is simple. We all lack objectivity. I do know how to correct for my own lack of objectivity. I submit to tests designed by those who disagree. They propose a test and If I judge it to be a fair test, I submit.

But won’t your assessment of whether a test is fair or not also be potentially influenced by a lack of objectivity?😉

Carrick said

Oh, while I agree that the measurement bias may be a non-issue, your claim that it is, is just as specious as the claim that it is not, without some more analysis or data to back it. Good thermistors do not drift much, but the cheapies do. They also have a fairly non-linear R-T curve which should be investigated.

I've done the analysis and I've got data of my own to boot (1-second sampling rate, two years of data, non-aspirated enclosure). The analysis on the effect of a change in geophysical distribution leading to a shift in the temperature offset was "written up" in comments on this thread. The other systematics that I commented on have been noticed and explored in the literature. If you want a list of references, I could probably dig them up for you, but Google Scholar is your friend. MikeC put a list of some of them together at one point on Lucia's blog.

SteveF #95, a shorter version is “you can be right, or you can be happy”.

Mark T said

I've done the analysis and I've got data of my own to boot (1-second sampling rate, two years of data, non-aspirated enclosure). The analysis on the effect of a change in geophysical distribution leading to a shift in the temperature offset was "written up" in comments on this thread. The other systematics that I commented on have been noticed and explored in the literature. If you want a list of references, I could probably dig them up for you, but Google Scholar is your friend. MikeC put a list of some of them together at one point on Lucia's blog.

That makes your comment above about how weak my argument was, well, a little weak itself – a vague reference, not really backed up other than "I think it is a non-issue." Don't get me wrong, I do not doubt that you have such data when you say so (nor do I doubt your word in general), but next time, try to apply the same standard to your own posts as you do to others'.

At any rate, are you referring to your own sensor (the first line?) That’s not the type of analysis I think we’d need. I think we’d need an actual audit of the stations to see which have been calibrated (and when) along with a spot-check of various known uncalibrated sensors (by calibrating them) to understand how prevalent the problem is. I’m not sure if the GHCN stuff in that thread you linked is quite what I was thinking (to extract any systematic calibration type issues.) Maybe… dunno for sure.

Certainly station moves would be big as would a change from one type of sensor to another, though whether they would impact the trend or not is another matter as well.

Pat Frank said

Before posting a reply to Jeff’s #50, I’d like to clear up a pervasive mistaken understanding here about the meaning of magnitude uncertainty.

Some people apparently think I have treated magnitude uncertainty as a kind of measurement error. Magnitude uncertainty is not a measurement error and I do not treat it that way.

I’ll give a more detailed explanation when I respond at Lucia’s, but will summarize here.

Magnitude uncertainty arises when taking a mean of magnitudes that vary inherently. In a physical system it shows the heterogeneity of a physical state. It is not a measure of statistical uncertainty. It’s a measure of physical variability.

For example, if one wants to discuss the physical meaning of the temperature of a given month, is it enough to say that, e.g., January 1998 in Nivala, Finland had a mean temperature of -5.6 C?

Does presenting that value alone mean every day reached -5.6 C and stopped there? The average min-max of that month was -21.7 C and +1.9 C (no bathing suits there!). The mean magnitude of the January 1998 temperatures at Nivala, Finland has a standard deviation of (+/-)6.1 C.

That (+/-)6.1 C represents the thermo-physical heterogeneity within January 1998 in Nivala, Finland. It's a measure of natural variability. Discussing the physical meaning of January 1998 temperatures in Nivala is not complete without that. A mean temperature of -5.6(+/-)6.1 C tells us much more about the state of January than does -5.6 C alone.

The (+/-)6.1 C is the magnitude uncertainty of January 1998 in Nivala, Finland. It is not a measurement error. It's a measure of the thermal heterogeneity of the month of January.

It would not matter whether one owned a perfectly accurate, perfectly precise temperature sensor that can make perfectly correct measurements. The temperature magnitudes vary inherently. When a mean is calculated there will be a statistical deviation of intrinsic thermal variability about that mean, due to the fact that each January temperature has an inherently unique magnitude. It's unavoidable.

If someone wanted to compare the temperatures of January 1998 [-5.6(+/-)6.1 C] and January 2010 [-14.7(+/-)6.6 C] in Nivala, Finland, then exclusion of the magnitude uncertainties of the two months would make the comparison almost meaningless. 2010 averages colder, but without knowing the temperature variability of the months, the mean difference has no context.

Likewise annual anomalies. A magnitude uncertainty can be calculated for each of the 30 sets of 12 months in the GISS 1951-1980 climate normals, or the CRU 1961-1990 climate normals. Plotting the trend in anomalies without giving at least the magnitude uncertainty of the normals, as an indicator of natural variability, makes the trend almost meaningless. How does one judge whether the trend is peculiar unless one knows how much the anomalies vary naturally?
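[The mean-plus-magnitude-uncertainty recipe described above amounts to a mean and a standard deviation; the daily values below are made up for illustration, not the actual Nivala record:]

```python
import statistics

def magnitude_uncertainty(values):
    """Mean and 'magnitude uncertainty': the standard deviation of
    inherently varying magnitudes about their mean (a measure of
    physical heterogeneity, not a measurement error), in the sense
    described in the comment above."""
    return statistics.fmean(values), statistics.stdev(values)

# Hypothetical daily means for a cold month (illustrative numbers):
january = [-2.1, -9.5, -14.0, -1.3, 0.8, -7.7, -12.2, -4.6, -3.0, -6.4]
mean, s = magnitude_uncertainty(january)
print(f"{mean:+.1f} (+/-) {s:.1f} C")
```

Reporting the pair, mean (+/-) s, conveys both the mean state and the spread of states that went into it, which is the claimed point of the magnitude uncertainty.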

Jim Hansen mentioned the GISS magnitude uncertainty of (+/-)0.13 C in passing in his 1988 testimony. My second Figure above merely plots them. The vertical bars do not represent a measurement error. They represent the inherent temperature variability of the 1951-1980 normals.

Likewise, Figure 3 plots the CRU 1961-1990 anomaly magnitude uncertainty of (+/-)0.17 C at the 99% (3-sigma) level, and Figures 4 and 5 plot the magnitude uncertainty of (+/-)0.28 C calculated for the entire 160-year 1850-2010 period, also at the 99% level.

They do not represent measurement errors. They represent the natural variability of temperature over those years.

I tried to be very clear about the meaning of magnitude uncertainty in my papers.

The concept is introduced in paper 1, page 972, top of the page. “However, a further source of uncertainty now emerges from the condition tau_i is not equal to tau_j. The mean temperature, T_bar, will have an additional uncertainty, (+/-)s, reflecting the fact that the tau_i magnitudes are inherently different.”

At the bottom of page 972, “The magnitude uncertainty, (+/-)s, is a measure of how well a mean represents the state of the system. A large (+/-)s relative to a mean implies that the system is composed of strongly heterogeneous sub-states poorly represented by the mean state.”

Paper 1 p. 977, middle, “This [magnitude] uncertainty transmits the confidence that may be placed in an anomaly as representative of the state of the system.”

Paper 2, p. 408, “Magnitude uncertainty stems from the variation in time and/or space of the inherent intensities of the measured observables.”

Page 414, “The large magnitude uncertainty, (+/-)s_bar, indicates heterogeneity and that the mean temperature is a poor measure of the state of the system.”

Page 416, “This [magnitude uncertainty of] (+/-)0.17 C is the 1-sigma uncertainty estimating the intrinsic magnitude of variation in the annual mean anomaly normal used to set the zero line for the global average air temperature anomaly trend in the CRU data set.”

Does any of that sound like the description of a measurement error?

Note that weather noise, as I will discuss in my next post and as defined by CRU and accepted in my papers, will tend toward zero in taking 30-year and 160-year means. It will have only a minor effect on the magnitude uncertainty. However, the magnitude uncertainty will contain an unknown amount of systematic measurement error.

Pat Frank said

#50 Jeff, Lucia was apparently concerned about the analysis in paper #1, which appeared on tAV here. Uncertainty variances in temperature measurements appear in that paper, but are not part of the magnitude uncertainty discussed above.

Am I right in thinking that when you wrote, “the variance is visibly different in time and visibly smaller than the stated error bars above,” you mean in the first Figure above, which was Figure 3 from paper #1 and which appeared first on tAV last January?

I’ll assume so for this discussion, but if I’m mistaken about which variances you mean, please respecify them in terms of which other figure above concerns you.

If you mean Figure 3 of paper 1, then the variance in Figure 3 has nothing to do with weather noise. But let me go through your points stepwise.

You wrote, “the problem I have with the method is that monthly weather variance is treated as uncertainty, I don’t believe this is at all correct.”

Page 969-970 quotes Brohan, et al., 2006 (B06) as follows: "The station temperature in each month during the normal period can be considered as the sum of two components: a constant station normal value (C) and a random weather value (w, with standard deviation sigma_i)." They go on to say, "[If] the ws are uncorrelated then for stations where C is estimated as the mean of the available monthly data, the uncertainty on C is (sigma_i)/sqrt(N)."

Figure 3 in paper #2 (p. 415) shows the 30-year mean October temperatures for Anchorage Alaska. Compare the smooth almost linear trend with the September 1998 trend at Athens, Figure 2 (p. 413). Virtually all the weather noise is gone from the Anchorage trend in the manner expected if the weather noise is approximately random. I found the same loss of weather noise for the 9-year May temperature average of Kahoolawe, Hawaii.

You wrote, “Local weather noise, if recorded accurately, is what the mean is comprised of.”

You’ve gone one step too far. The mean is comprised of the measurements. The meaning of the measurements is what we’re discussing. Please look again at the 30-year Anchorage October average. All that’s left is a trend that reflects the declining insolation as winter approaches. The temperatures of virtually every month, even July and August or December/January, will include a trend across the month that is due to the change in solar angle. Excursions away from this trend are due to the vagaries of local weather. So, the daily temperatures, alone, can be separated into two components: a trend across the month due to systematically varying solar irradiance and the almost random daily excursions due to weather.

That is, t_i = tau_i + w_i,

where t_i is the measured temperature, tau_i is the part of the temperature due to that day’s insolation, and w_i is the thermal excursion due to that day’s weather (measurement error is neglected to focus on just the temperature magnitude).
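[A toy version of the t_i = tau_i + w_i decomposition, using a straight-line least-squares fit as a stand-in for the smooth insolation trend; the month of data is synthetic, not a real station record:]

```python
import random
import statistics

def split_trend_and_weather(temps):
    """Split daily temperatures t_i into a linear 'climatology' trend
    tau_i (the systematic insolation part) plus residual weather
    excursions w_i, via an ordinary least-squares line."""
    n = len(temps)
    xbar = (n - 1) / 2
    ybar = statistics.fmean(temps)
    slope = (sum((i - xbar) * (t - ybar) for i, t in enumerate(temps))
             / sum((i - xbar) ** 2 for i in range(n)))
    tau = [ybar + slope * (i - xbar) for i in range(n)]
    w = [t - f for t, f in zip(temps, tau)]
    return tau, w

# Synthetic month: a cooling insolation trend plus random weather jumps.
rng = random.Random(3)
month = [15.0 - 0.2 * day + rng.gauss(0, 2.0) for day in range(30)]
tau, w = split_trend_and_weather(month)
print(statistics.fmean(w))  # residuals average to ~0 by construction
```

Averaging many realizations of the same month would leave only the tau part, which is the behavior attributed above to the smooth 30-year Anchorage normals.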

You can see further examples by looking at the 30 year normals for any region. Those for the Philadelphia region are here, for example. Choose a city, choose "Daily/monthly normals" and "Avg Temperature." Hit "Go." The chart that comes up shows the 30-year average temps for every day in Jan-Dec, 1971-2000. The trends are very smooth and there is almost no weather noise. But in Spring there is a smooth trend to warmer temperatures and in Autumn a smooth trend to lower temperatures.

These smooth trends are the “climatology,” and reflect systematic insolation. Weather noise consists of the daily temperature jumps in a given month, that average away when 30 years of that month are averaged.

The Gaussian fits to monthly weather noise in paper 2 Figure 2, Figure 3, and Table 1, almost all give a very good fit r^2, and the same is true in Tables S1-S4.

Please also notice, in this regard, that on page 413 of paper 2, just under Figure 2, weather noise is treated as random: "Weather noise [(+/-)1.88 C, Figure 2b] was treated as stationary, and for 60 measurements (+/-)sigma_w = (+/-)1.88/sqrt(60) = (+/-)0.2 C." Under this circumstance, weather noise would average away as 1/sqrt(N), and would end up making essentially no contribution at all to the final error variance. So, any variances I calculated would in any case contain no weather noise.

So, honestly, I don’t understand how you got the idea that I am “Constructing error bars based on this expected variance [of weather noise].”

The error bars I calculated in paper 1, and displayed in Figure 3 from that paper (first Figure above) were calculated from the systematic air temperature error produced by an MMTS sensor operating under ideal field conditions.

If conditions were as you propose, namely, “if the thermometers were 100% perfectly maintained and 100% perfectly accurate and precise with perfect global coverage,…”, then the systematic error would be zero and I would get measurement error bars of zero magnitude.

I don't understand how you can think that "you would still have huge error bars by your method" unless you're making a conceptual confusion between the magnitude uncertainty, described in the second through fifth Figures, and the uncertainty due to systematic error displayed in the first Figure.

Magnitude uncertainty would remain even if all the sensors were perfectly accurate and precise and properly distributed globally. The magnitude uncertainty describes the spread about a mean that is calculated from a set of values that vary inherently in their magnitude.

I discuss this in my prior post, and I hope there’s no further problem here.

Those uncertainties — systematic error and magnitude uncertainty — are physically orthogonal, and are statistically independent. Systematic error can be estimated separately by precise calibration experiments and subtracted from the total uncertainty in a mean to estimate any residual magnitude uncertainty. That condition is mentioned as part of Case 2, page 972 of paper 1: “If the sensor (+/-)sigma_n has been measured independently, then (+/-)s can be extracted as (+/-)s = (+/-)SD(-/+)(sigma_n/sqrtN) because measurement noise and magnitude scatter are statistically independent.”

You wrote, “ the conclusions about trend in the blog post and paper aren’t supported by a point to point statistical analysis.”

The error bars in the first Figure are supported by the statistical analysis given in paper 1. The magnitude uncertainty bars in Figures two through five are calculated from the 12 sets of 160 monthly anomalies provided by CRU in their 2010 data set. The statistical basis for that calculation is given in paper 2.

I've been very clear that the error bars in the first Figure represent a lower limit of uncertainty calculated from the careful calibration experiments of Hubbard and Lin. In paper 1, the argument is made that temperature sensors under ideal conditions of maintenance and field placement should produce less systematic error than an equivalent sensor more poorly placed and more poorly maintained in the surface stations of the GHCN. I make the further point that LIG thermometers in CRS shields, also poorly placed and poorly maintained, will produce systematic errors likely to be even larger than the systematic errors of electronic sensors. These are reasonable arguments that are buttressed by comparing the systematic error of an HMP45C PRT sensor in an MMTS shield with one in a CRS shield in paper 1, reference 13. The CRS shield yielded systematic errors twice as large.

You wrote, “The trend is known better than any individual monthly value.” The trend is not known to better than the lowest systematic sensor error in the ensemble of sensor systematic errors used to calculate the total uncertainty in the mean of the measurements, unless somehow those errors accidentally cancel. But no one has ever monitored the systematic error of temperature sensors globally, and so there’s no way to know whether they cancel or not. In the event of that ignorance, an error estimate has to be propagated into each annual anomaly as sqrt[(sum of squared errors)/(N-1)], and the trend itself is not known better than that.
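A minimal sketch of this propagation rule, assuming the intended combination is a root-sum-square of the individual error estimates over N - 1 (the function name and the illustrative values are mine, not from either paper):

```python
import math

def propagated_uncertainty(errors):
    # Combine per-measurement error estimates in quadrature,
    # sqrt(sum of squared errors / (N - 1)): since the signs of
    # systematic errors are unknown, no cancellation is assumed.
    n = len(errors)
    return math.sqrt(sum(e * e for e in errors) / (n - 1))

# Illustrative only: twelve monthly error estimates of +/-0.2 C each.
print(round(propagated_uncertainty([0.2] * 12), 3))  # -> 0.209
```

Note that the quadrature sum barely shrinks with more measurements, which is the point being made: uncharacterized systematic error does not average away.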

You wrote, “My point is that weather noise cannot be assumed to be a stochastic error.”

Since you’ve used “weather noise” to indicate the full magnitude of a temperature measurement, which it is not, this point is in limbo. But again, Figure 3 in paper 2 provides evidence that weather noise declines in a random-like manner, and the data provided at the Philadelphia site corroborates this. Brohan et al. (2006) treat weather noise as random.

Finally, you wrote, “Not only will proximal stations be correlated indicating an accurate representation of actual temperature…”

Jim Hansen’s classic paper on this is Hansen, J. and Lebedeff, S., Global Trends of Measured Surface Air Temperature, J. Geophys. Res., 1987, 92(D11), 13345-13372, where he showed the correlation of surface air temperatures from adjacent surface stations, which declines to about 0.5 at 1200 km.

But what determines local air temperature? Local solar irradiance, wind, cloud cover, albedo around the sensor, etc.

Hubbard and Lin show that these identical physical circumstances determine the systematic error produced by surface air temperature sensors. This means the same climate determinants that cause temperatures to be correlated across regions will also cause systematic error to be equally correlated across regions.

Correlation of the measured temperatures of proximal stations therefore does not indicate accurate representation of air temperature. It only indicates correlation of the temperature measurements, which include correlated systematic error.
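The arithmetic behind this point can be sketched with the textbook variance formula for the mean of equally correlated errors (this is my illustration; `sd_of_mean` and the numbers are hypothetical, not from either paper):

```python
import math

def sd_of_mean(sigma, n, rho):
    # Standard deviation of the mean of n station errors, each with
    # standard deviation sigma and pairwise correlation rho:
    #   Var(mean) = sigma^2 * (1/n + rho*(n-1)/n)
    # For rho > 0 this floors at sigma*sqrt(rho), however large n gets.
    return sigma * math.sqrt(1.0 / n + rho * (n - 1) / n)

print(round(sd_of_mean(0.5, 100, 0.0), 3))  # uncorrelated: shrinks as 1/sqrt(n)
print(round(sd_of_mean(0.5, 100, 0.5), 3))  # correlated: stuck near sigma*sqrt(rho)
```

Uncorrelated errors average away as stations are added; regionally correlated systematic error does not, which is the crux of the argument above.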

This problem of correlated systematic error has been entirely overlooked by climate scientists. I had a discussion about this point at WUWT with a climate scientist posting as EFS_Junior. You can read his opening argument here and my first reply here, which includes a discussion of regionally correlated systematic error. The discussion with EFS_Junior continued on in further posts.

The common practice in climate science of validating surface air temperatures by showing correlations among the records of regional surface stations is almost certainly misconceived and wrong.

It looks to me, finally, Jeff, that your concerns stem from a misapprehension of what I have done coupled with an erroneous view of the meaning of weather noise.

Thanks for the detailed reply. I hope you don’t mind the discussion because it is interesting.

Hopefully, you don’t feel everyone is just piling on for the heck of it. First, I am not disputing that you can take a station’s variance into account in determining how well that individual station has represented the weather at that point. This is different from how accurately the station knows the weather, as you have pointed out above, and it is also different from the ability to discern a trend at that station. Where this all gets really tricky is when you begin combining the stations, which, as you point out, correlate at 0.5 even at distances of 1200 km. Air temps are a measure of the energy in the atmosphere at a given point. Because the atmosphere has thermal mass and inertia, we should reasonably expect the large daily fluctuations in one region to be offset by canceling daily fluctuations nearby or elsewhere, though not at any spatially consistent pair of temperature stations. This is why PCA might pull out a valid, consistent decorrelation region, but might not as well – different subject.

“But no one has ever monitored the systematic error of temperature sensors globally, and so there’s no way to know whether they cancel, or not. ”

This is where we don’t agree. There is a way and that is the problem I have.

If you take the global average, you can see the variance from that average over a 20 year period with x number of stations. You can model the noise using various methods like ARIMA or FARIMA, depending on your mood and which form you feel best fits the noise, and calculate the knowledge of trend directly from that data. You can even go back to infer what the true sigma of the noise is from uncorrelated, non-canceling weather events. Simply fitting a trend line using Quenouille-style estimates works well also, but without all the fancy stuff, let’s look at a section of global average surface temperatures.

Here is some code for downloading and plotting the CRUTEM global temperature data.

It's kind of interesting because the plot of the data since 1995 looks like this:

I detrend the line and calculate the standard deviation of the detrended global average land temperatures to be 0.201; multiplying by 2.58 gives a 99% interval of +/-0.52 for a single point, based primarily on weather variance (due to the dominant magnitude of the weather over other sources of variance). Confirming the result further, the unfiltered peak-to-peak max-min range in the dataset is 1.28 C, or +/-0.64. Recognizing that I have ~200 data points, about 2 should fall outside the 99% range, so it makes sense that the peak-to-peak value is slightly higher. This will shrink your error bars in the last figure above from +/-0.84 to +/-0.52 – which you would expect if there is thermal mass in the distributed atmosphere. This is only for the high station count in recent years and doesn't include any systematic errors accounted for in your paper, but it does include all of the weather variance. In addition, these error estimates do expand in the past when less data was available.
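Since the linked code isn't reproduced here, this is a sketch of the detrend-and-take-the-SD step run on synthetic anomalies standing in for the CRUTEM series (the 0.2 C noise level, 200-point length, and small trend are assumptions chosen only to mimic the numbers quoted above):

```python
import random
import statistics

def detrended_sd(y):
    # Remove the least-squares linear trend from y and return the
    # standard deviation of the residuals: the quantity multiplied
    # by 2.58 above to get a 99% interval for a single point.
    n = len(y)
    xbar, ybar = (n - 1) / 2.0, statistics.fmean(y)
    sxx = sum((i - xbar) ** 2 for i in range(n))
    slope = sum((i - xbar) * (yi - ybar) for i, yi in enumerate(y)) / sxx
    resid = [yi - (ybar + slope * (i - xbar)) for i, yi in enumerate(y)]
    return statistics.stdev(resid)

# Synthetic stand-in for ~200 monthly anomalies.
random.seed(0)
series = [0.01 * t + random.gauss(0.0, 0.2) for t in range(200)]
print(round(2.58 * detrended_sd(series), 2))  # 99% half-width for one point
```

With real CRUTEM data loaded in place of `series`, the same two lines reproduce the +/-0.52 style of calculation described above.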

Since multiple points are used in the calculation of trend, this doesn't answer the question of how well we know trend for any timeframe, and trend is the key to any AGW argument. Using AR1 modeling and a Quenouille-style estimate for noise, we can calculate from the exact same noise the trend from 1900 to today. I've tested this method against Monte Carlo in the past and find it to be pretty darned close, so I would be interested if someone would show me why it isn't true. I've shown here in the past that 1995 to present is barely significant, but by your statement above, the whole 1900 to present trend is non-significant. This is absolutely NOT the case.

From 1900 to present I calculated the following values
Trend = 0.08436108 C/Decade +/- 0.01136167

So the trend is about 8 times the 99% significance interval, including all of the non-systematic random errors. A point-to-point error analysis therefore doesn't support the claim that we don't know the trend.
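A sketch of the kind of AR1/Quenouille-style correction described here, assuming the common form n_eff = n(1 - r)/(1 + r) applied to the OLS trend's standard error (the helper and its details are mine, not the exact code behind the quoted 0.084 +/- 0.011 figures):

```python
import random
import statistics

def ar1_trend_ci(y, z=2.58):
    # OLS slope of y per step, with the confidence half-width inflated
    # for lag-1 autocorrelation via the Quenouille-style effective
    # sample size n_eff = n*(1 - r)/(1 + r).
    n = len(y)
    xbar, ybar = (n - 1) / 2.0, statistics.fmean(y)
    sxx = sum((i - xbar) ** 2 for i in range(n))
    slope = sum((i - xbar) * (yi - ybar) for i, yi in enumerate(y)) / sxx
    resid = [yi - (ybar + slope * (i - xbar)) for i, yi in enumerate(y)]
    # lag-1 autocorrelation of the residuals
    r = sum(a * b for a, b in zip(resid, resid[1:])) / sum(a * a for a in resid)
    neff = max(3.0, n * (1.0 - r) / (1.0 + r))
    se = statistics.stdev(resid) / sxx ** 0.5
    return slope, z * se * (n / neff) ** 0.5

# Illustrative check on synthetic data with a known trend of 0.05/step.
random.seed(1)
demo = [0.05 * i + random.gauss(0.0, 1.0) for i in range(300)]
slope, halfwidth = ar1_trend_ci(demo)
```

For strongly autocorrelated residuals, r approaches 1 and the interval widens sharply, which is why ignoring AR1 structure overstates trend significance.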

Now all of the above analysis assumes the weather noise is random and that represents how well we know temperature. This is inaccurate in my opinion and represents an upper bound on any result that should be obtained by a more detailed analysis. The reason I say that is because there is variance in the weather so if you are asking the question, “How well have we recorded that variance?”, you cannot include that variance in the quality estimate.

In the first equations of Pat’s first paper, he’s averaging that variance between stations. If you ask the question, “What is the probability of an individual station measuring x distance from the mean?”, Pat’s initial equations seem to answer that question. (+/-)s = (+/-)SD(-/+)(sigma_n/sqrtN). It is not a very interesting question though and you cannot make conclusions about quality of the knowledge of the mean from it because in this case each station is not measuring the same thing. If you ask the question – “How much is random weather likely to affect my measured variance each month?”, you need to do an analysis of the average values similar to what I’ve done.

In trend measurement, you want to ask the question, “How likely is it that this trend is a result of random variation?” This is an important question and weather noise becomes critical to answering it. There is another question – “How well have we measured the trend?” which has to do with instrumental error and in climatology is less studied than it should be but it would not include weather variance.

So in the plots of global temperature data, the interesting error bars should represent only how well the instruments have recorded the mean – not including weather variance. Where the above post goes wrong is in the conclusions drawn from the equations, because they are answering a different question.

Nigel Harris said

I’m not a statistician, so please don’t be too hard on me if the following is all wrong. I found the conclusion of your paper – that we can’t say with any certainty that warming has happened – so stunningly counter to my intuitive interpretation of graphs of temperature anomaly that I had to make the effort to understand your logic.

I came unstuck very quickly, around equation (6). A few lines after that, you show a formula for the “magnitude uncertainty of T-bar” (+/-s). This seems to me to be an estimator of the population standard deviation (As I said, I’m not a statistician, so please forgive what is probably a crashingly horrible misuse of technical terms here) . I was expecting to see a formula more like the standard error of the mean – which would be your +/-s divided by sqrt(N).

Your “magnitude uncertainty of T-bar” seems to me to be a measure of how well T-bar estimates the value of an individual temperature measurement. I don’t understand why this is the appropriate error term in the context of assigning error bars to averaged temperature. Surely it is the value of T-bar as an estimator of the *mean* of the population, rather than the value of T-bar as an estimator of any individual value that matters? And for this, I’d have thought you need to divide by sqrt(N).

Suppose I have 25 identical cups of water, each containing the same mass but at a different temperature. The cups range from 50 to 74 C in 1-degree increments, and no heat is gained or lost. If I know each cup's temperature from identical thermometers accurate to +/-0.01 C at 95% confidence, how accurately do we know the average cup temperature?
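The numbers in this example can be checked directly. A quick sketch, assuming the +/-0.01 C 95% bound converts to a 1-sigma value by dividing by 1.96:

```python
import statistics

cups = [float(t) for t in range(50, 75)]   # 25 cups, 50..74 C
mean = statistics.fmean(cups)
spread = statistics.stdev(cups)            # scatter of cup temps about the mean
sigma = 0.01 / 1.96                        # 1-sigma thermometer noise
sem = sigma / len(cups) ** 0.5             # precision of the 25-reading mean

print(mean)              # 62.0
print(round(spread, 2))  # 7.36
print(round(sigma, 4))   # 0.0051
print(round(sem, 4))     # 0.001
```

These are the 62, 7.36, 0.0051, and 0.001 C figures that recur in the replies below: the spread of the cup temperatures and the precision of the mean are different quantities.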

Nigel Harris said

OK, I see some very useful responses to other posts above, which have helped me understand where you’re coming from. I still have a big problem with your contention that “in every case, a magnitude uncertainty, ±s, must also be included as part of the uncertainty in an annual anomaly”. If the +/-s term is similar to the sample standard deviation, then it seems to me that even if the annual temperature history for the last 160 years consisted of a completely linear increase, you would still conclude that we can’t say with certainty that an increase has occurred, as the observed value +2SD at the start of the plot would be greater than the observed value -2SD at the end of the plot.

They go on to say, “[If] the ws are uncorrelated then for stations where C is estimated as the mean of the available monthly data, the uncertainty on C is (sigma_i)/sqrt(N).”

They are correct. Because in their view, the problem of “weather noise” has to do with lack of station coverage. So, had the stations been located somewhere different, they would have experienced weather noise differently. In this case, the estimate of the difference is based on assuming the point measurements had been located elsewhere, and they estimate it the way they say.

In contrast, your method that loses the sqrt(N) in the denominator makes no sense. To understand that we need to step back to simpler problems – and we need you to go back to Case 2 in paper 1, which makes a claim that is utterly, totally, completely wrong. This claim appears to propagate into paper 2.

Instead of discussing Alaska and Athens, you need to first create some synthetic data sets and think through what would constitute a perfect measurement of the mean temperature, what would constitute imperfect measurements, and show your method works. (It won’t.)

#50 Jeff, Lucia was apparently concerned about the analysis in paper #1, which appeared on tAV here. Uncertainty variances in temperature measurements appear in that paper, but are not part of the magnitude uncertainty discussed above.

Does “above” relate to something other than paper 2?

Also, I am not concerned with uncertainty variances in temperature measurements. Those belong in the uncertainty analysis. I am concerned with your believing that the variance in the perfectly measured temperatures would contribute to the uncertainty in computing the mean.

If the issue I discuss in my post is not relevant to paper 2, that is a stunning revelation. Let me point out that your entire introduction suggests otherwise: your section “1. MEASUREMENT UNCERTAINTY IN SURFACE AIR TEMPERATURES” begins by discussing this made-up uncertainty.

Equation (1) in paper (2) is introduced with “For any fluctuation or trend in air temperatures, the magnitude uncertainty in the mean temperature is given as”

This is followed by s=….
This is precisely the ‘fake’ uncertainty I discuss in my post.

You continue after (1) with

“B06 assumed that temperature measurement error is random and declines as 1/sqrt(N), and disregarded magnitude uncertainty, ±s, which should have followed the description of station normal error, εN.”

You proceed to re-iterate material discussed as “case 2” in paper 1 and cite paper 1. It is the mistake in paper 1 that appears to not only propagate into paper 2 but forms the backbone of everything in paper 2.

To show more: In equation (7) in paper 2 you seem to be explaining how to include the variance into the uncertainty. (Of course, the true contribution should be zero.)

Beyond that, it’s somewhat difficult to comment. I find much about the writing style in your paper confusing. For example: When you say “spurious errors” do you think the “spurious errors” are real and need to be included in the uncertainty? Or do you mean Brohan included errors that don’t exist in his uncertainty (this would make Brohan’s too large.) It seems to me you are using the term “spurious” backwards. But maybe there is something I’m missing.

Pat–
After you answer Jeff’s question in 113, I have an extension for you that will clarify what Brohan is doing when accounting for “weather noise”. It should clarify why his method is right and yours is wrong.

joshua.stults said

If further discussion depends on Jeff’s #113, then it should be clarified. The example and questions given are unfortunate. The example population is finite, uniformly distributed and completely sampled. There is no sampling error in calculating a variance (or a mean, or a …), and the coverages given (based on incorrect calculations) make no sense. As written, it is more likely to increase confusion rather than decrease it.

The example population is finite, uniformly distributed and completely sampled.

I think that is a feature, not a bug. In fact, I think we absolutely need to get an answer for a finite population before we move forward to the case where the sample is drawn from an ensemble of all possible cups.

The example demonstrates the difference between knowledge of the mean and the variance of the sample. If your measurements all came from the same cup of water, Pat’s equations apply to his question. If they come from different ones, they do not.

Steve Fitzpatrick said

Let me offer a simple example. Suppose I have a continuous chemical process conducted in a homogeneously mixed tank, where a solid compound (product) is formed from a mixture of volatile liquid compounds (raw materials). There is a specified conversion rate that is desired, and I control reaction conditions to try to match that specified conversion. The conversion can be determined by measuring the solid content of an (instantaneous) sample withdrawn from the tank. I can verify the variability of my measurement process by testing a series of individual samples drawn from the tank, splitting each of them into multiple aliquots and measuring the solid content of each of the (identical) aliquots. Suppose further that I verify the measurement process is extremely accurate and repeatable (eg, +/-0.001% conversion), but that individual samples of the process show considerable (but acceptable) variation around the target conversion of +/- 3% (+/- two sigma), and that variation is well behaved… essentially Gaussian noise around the desired process mean. The process has substantial “natural” variance….. I know that a deviation in one sample of up to + or – 3% of the long term average is not significant.

The question is: can I or can I not measure a long term trend in the process that I can prove is a) statistically significant, but which is b) much smaller in magnitude than the “natural variation” in my process? The correct answer (assuming I collect sufficient data) is yes, for sure I can, because a small trend becomes ever clearer as the number of samples rises. Your formulation seems to suggest that the relatively large natural variation of the process blinds me forever from measuring a much smaller long term trend.
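Steve's claim, that enough samples recover a trend much smaller than the process noise, is easy to check by simulation. A sketch with made-up numbers (noise sigma of 1.5, matching the +/-3% two-sigma variation, and a drift of 0.001 per sample, far below the noise):

```python
import random
import statistics

def fitted_slope(y):
    # Ordinary least-squares slope of y against sample index.
    n = len(y)
    xbar, ybar = (n - 1) / 2.0, statistics.fmean(y)
    return sum((i - xbar) * (yi - ybar) for i, yi in enumerate(y)) / \
           sum((i - xbar) ** 2 for i in range(n))

random.seed(42)
slopes = []
for _ in range(200):  # 200 repeats of a 5000-sample process record
    y = [0.001 * i + random.gauss(0.0, 1.5) for i in range(5000)]
    slopes.append(fitted_slope(y))
print(round(statistics.fmean(slopes), 4))  # -> 0.001, despite sigma = 1.5
```

The fitted slopes cluster tightly around the true drift even though any single sample deviates from the mean by far more, which is exactly the point of the tank example.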

Steve Fitzpatrick said

Pat,
Finally, suppose there is natural variation within my tank… it is not nearly so homogeneous as I imagined. If I simultaneously draw samples from different locations and measure how the conversion of the individual samples varies, I can determine the spatial variability of the process within the tank…. and I learn that what I thought was only Gaussian temporal variability is in fact mostly Gaussian spatial variability. I ask the same question again: can I or can I not determine a small long term trend in the process, one much smaller than the combined spatial and temporal variability, even with a limited number of sample locations? I am pretty sure I can; neither spatial nor temporal variability hides that trend if I have sufficient data.

Steven Mosher said

Of course. And typically a back and forth ensues with each side proposing various test proposals. Typically, however to object to a test you have to explain WHY you think the test is unfair.

Mosher: I think women love guys with salt and pepper hair.
Carrick: you biased piece of crap, you’ve got salt and pepper hair, of course you think that.
Mosher: well, propose a test.
Carrick: let’s do an internet poll.
Mosher: not fair.
Carrick: why?
Mosher: too many young ladies, it’ll be skewed.
Carrick: ok, we will control for age.
Mosher: Arrg. I meant they love them in person, pictures don’t do us distinguished guys justice.
Carrick: ok, we’ll do live evaluations, like a police line up.
Mosher: No, it has to be a social thing, like we get to talk to them.
Carrick: no, wait, then you’re testing your ability to talk, and then it doesn’t matter if you have hair or not.

Carrick: are you really interested in making testable claims or not?

I’ve found that if somebody is interested in making testable claims you can come to agreement.

Pat.
I’ve responded to your “point-by-point”. The main issues are:
1) Case 2 is wrong. The uncertainty you call ‘s’ should be zero.
2) It seems you think this somehow doesn’t matter because you really used Case 3b. My response is to point out that Case 3b includes the fake uncertainty ‘s’, and so it is also wrong.
3) You blather about somehow really focusing on individual temperatures. My response: Your abstract presents results about means. When applied to means, your uncertainties are wrong.

My final conclusion: You are very confused. Carrick was correct when he wrote, “Pat, your error bars regardless of how you obtained them do not pass the smell test”. The reason your error bars do not pass the smell test is that you are getting confused about what you are computing, you think your ±s error is relevant to what you are computing, and you are, basically, very confused.

You really need to sit down and think about what would constitute a perfect, error free result. Then identify each thing you think is an error. Then figure out how to model that with Monte Carlo to test your notions of what constitutes standard errors for the actual observations you are trying to report. Because you are getting yourself very, very confused.

I then pointed out that (+/-)s and sigma_n are statistically independent. How is it remotely possible to suppose, as you did, Lucia, that I meant (+/-)s to be a measure of the statistical uncertainty in a computed mean?

The reason it is possible for me, Lucia, to think he meant ±s to be a measure of statistical uncertainty in the computed mean is that he says so in his paper. For example:

T-bar is his symbol for the mean. He explains how to compute it in equation (5) of his paper. He says s gives the uncertainty in T-bar. So, he is saying this is the uncertainty in the mean. Moreover, it is clear from the rest of the paper that he means ±s to be the uncertainty in T-bar, so he is attributing it to the mean.

There are many similar examples of confusion in his comment at my blog. The confused bits are often phrased in snotty ways (e.g., “How is it remotely possible to suppose, as you did, Lucia, that I meant…”). But worse, anyone who reads his paper can also see that he must be confused, because it is more than “remotely possible to suppose” he meant what I think he meant. I am merely reading what his paper literally says.

So either,
a) Pat doesn’t know what he wrote in his paper.
b) Pat doesn’t know what he wrote means.
c) Pat is periodically confusing the mean for an individual measurement
d) Pat is confused about statistics in general.

Moreover, to a large degree, Pat wants to defend his clear, obvious mistake at the outset of paper 1 by avoiding engaging it, and by writing a huge number of indefensible statements in a snide tone.

As for my motives which you question: My motive in not rebutting every bit of blather is simple: I don’t want to waste my time. My motive for observing that Pat is confused is this: To communicate that he is confused. There is no “underlying” motive.

At the end, you asked this: “When the cups are mixed do we know the temperature will be 62 +/- 19 C or 62+/- 0.002C?”

In the first part of my reply, I’ll deal with your question as you asked it. In the second part, I’ll get to the crux issue.

I’m not going to be tricky here, but I will be precise. The way you framed your question, Jeff, it asked about our future knowledge, “… do we know the temperature will …?”

The physically correct answer to your question is: neither of your choices. We would only know the final temperature by measuring it. That means we’d know the temperature only to within our limits of measurement uncertainty.

According to your example, that’s (+/-)0.0051 C. So if we measured the final temperature of the total volume of water, we’d only know it to a precision of 62(+/-)0.0051 C, i.e., the 1-sigma limit of our measurement.

And if, prior to pouring the cups together, we had thought about the difference between predicting and knowing, we’d have predicted that we would only know the final temperature to (+/-)0.0051 C.

Nevertheless, even before we poured the cups of water together, we’d be able to predict that if we took 25 measurements of the final state of the total volume of water, our knowledge would improve to 62(+/-)0.001 C. But the 25 separate measurements of the final volume of water are strictly necessary for knowledge to (+/-)0.001 C precision.

So, thanks Jeff for asking the question in a way that allowed me to discuss the difference between knowing and predicting, and illustrating the importance of framing questions about precision using precise language.

But now to part 2. Let’s get to the heart of the issue. Notice what you did, Jeff, to reach the end of your example: you poured the cups of water together.

You took 25 separate cups of water and made 1 single pot of water of them. In getting to the end of your example, you changed the physical state of your system.

It went from heterogeneous to homogeneous.

When the cups of water were poured together, the “62(+/-)7.36 C” lost all relevance to the question. That (+/-)7.36 C is the standard deviation of the variation of the magnitudes of the temperatures of the individual cups about the mean temperature.

When you poured the cups together, all the heterogeneity disappeared. That magnitude standard deviation, reflecting heterogeneity, lost all physical connection with the final state of the water, which was homogeneous.

Look what else happened. Before the cups were poured together, the 62 C — the measurement mean — was a statistic only. It had no physical reality.

When you poured the cups together, the 62 C became a physically real magnitude.

So, in framing your question, Jeff, between the beginning and the end, you changed a physically heterogeneous system into a physically homogeneous system, and you changed a statistic into a physical magnitude.

The entire relevance of your question rests on ignoring these conversions. But the conversions, from physically heterogeneous to physically homogeneous and from a statistical mean to a physical magnitude, make the relevance of your question spurious.

So your final state, coming after these physical conversions, has lost connection to the initial physical system and really has lost all of the meaning you intended. Your two final choices have no immediate connection to your final physical state.

So, with all that as context, here’s a question for anyone reading this: Have you ever seen all the 31 days of July poured into a single day? I chose July because Lucia chose that month for her own example.

So, how to pour all the days of July into a single 24-hour day. We have to do that, because that’s the only way to convert the heterogeneous magnitudes of the 62 daily min-max temperatures into a single homogeneous temperature representing the new homogeneous physical state of July collapsed into 24 hours of one temperature — your final pot of water, Jeff.

Of course, we cannot collapse all 31 days of July into a single homogeneous state, with a single physically real homogeneous temperature equal to the previous statistical mean temperature.

Your cups of water all co-existed at a single time. They can be physically combined. The days of July are permanently separated in time. They cannot be physically combined.

The July temperature mean is a statistic. It will always be a statistic. It will never, ever be the physically real magnitude of a single homogeneous 31-day day.

It’s as though your cups could never be combined, Jeff. As though their multitudinous and heterogeneous temperatures could never be converted into a single homogeneous temperature.

Now the list of temperatures becomes a relevant illustration. Let’s suppose we had one cup a day, for 25 days, with the 25 cups sequencing through your 25 temperatures. Each cup exists only one day. We measure the one temperature of the day’s cup, one at a time, once a day, a different cup for 25 sequential days. Our measurement precision is 1-sigma = (+/-)0.0051 C.

We have 25 measurements. At the end of the 25th day and the 25th cup, we can say with your example above, that the mean of those temperatures is 62(+/-)0.001 C.

However, we never removed the heterogeneity of the temperatures of the system of individual cups. The total volume of water was never combined and never reached 62 C. The 62 C was never, ever a real physical state, except, as it happens, in the 13th cup of water.

And even then, on the 13th day in the 13th cup, the 62 C of that cup — a real physical state — was not equivalent to the combined temperature mean of 62 C — a statistic.

Since the cups of water can never, ever be combined, the temperature heterogeneity of the system of cups can never, ever be removed.

If we want to fully describe that irreversibly heterogeneous system of cups and their temperatures, we can certainly say their statistical average temperature was 62(+/-)0.001 C.

However, we would improve the physical accuracy of our description only by including information about the ineluctable physical variation of the individual temperatures about the statistical mean. That would be 62(+/-)7.36 C.

A statistical mean, not a physically real state.

That (+/-)7.36 C represents the magnitude uncertainty of the mean and represents the real physical variability of the system of 25 cups of water. That (+/-)7.36 C tells us about the heterogeneity of the total physical state encompassing the 25 individual sub-states represented by the cups of water.

The (+/-)0.001 C only tells us about the precision of our knowledge of the statistical mean.

The two uncertainties are different things. The magnitude uncertainty can never be declared truly zero even if all of the temperatures are judged physically indistinguishable to the best accuracy and precision of our best thermometer.

Magnitude uncertainty of the mean of a physically heterogeneous state = the natural variability of a heterogeneous system = (+/-)s.

Statistical uncertainty of the mean of a physically heterogeneous state = measurement precision = (+/-)sigma.

They are not the same, (+/-)s is not zero, and it is not an error to pay separate attention to it.

A physical system consisting of sub-states of heterogeneous magnitudes cannot be fully described without noting the variability of those magnitudes about the mean state. The mean state, as calculated, is a statistic — a convenient fabrication that has no physical reality.

Virtually the entire post above concerns the magnitude uncertainty = the natural variability derived from the irreversibly heterogeneous magnitudes within the set of annual anomalies.

Noticing magnitude uncertainty (= natural variability) is not an error. Plotting the magnitude uncertainty as vertical bars around the mean state is not an error. The vertical bars tell us about the natural variability of a physically heterogeneous state.

I hope that clears things up about (+/-)s. Honestly, I had no idea the concept of natural variability would prove so hard to communicate.

Noticing magnitude uncertainty (= natural variability) is not an error. Plotting the magnitude uncertainty as vertical bars around the mean state is not an error. The vertical bars tell us about the natural variability of a physically heterogeneous state.

First: It is an error if you actually say that natural variability is an error attributed to the mean, which you do in your paper. You do so repeatedly.

Second: Although the natural variability is something real, the full spread of natural variability is not the uncertainty with respect to assessing whether the average temperature for year A is different from year B. It is irrelevant for assessing whether the earth’s surface has warmed.

Suppose Jeff presented you with a batch B of 25 cups, which now have temperatures ranging from 52 C to 76 C, and told you he was going to mix batch B. He then called the previous batch A. Then, he asked you
a) to predict whether, after mixing, the previous batch A would be warmer or cooler than B,
b) to state how confident you were in your prediction,
c) to estimate the difference in temperature of mixed batch B and mixed batch A, and finally
d) to estimate the uncertainty in the difference between mixed batch B and mixed batch A.

You ought to be able to predict that batch B will be warmer than A; you ought to be able to state that you are very, very confident of that prediction; and you ought to be able to estimate, with confidence, how much warmer B will be than A.

And the fact is, the average temperature of batch B is warmer than the average temperature of batch A both before mixing and after mixing.

This is true whether in your mind means of unmixed cups are only “statistics” or “real physical states”. One might argue that even temperature is not a “real physical state”, since it is an average of a property over many individual molecules.

If all your uncertainty intervals mean is that the spreads of temperatures in batches A and B overlap – well… all righty. But this information is utterly uninteresting and irrelevant to assessing warming. It is irrelevant to commenting on CRU. And people who read your paper need to understand that your “uncertainty intervals” have nothing to do with uncertainty in the annual averaged surface temperatures.

For this reason: Your paper should be ignored as presenting something utterly irrelevant to any interesting question touching on assessing whether on average, the surface temperature has warmed.

Jeff’s question implies this experiment is repeated many times, and he asks what the distribution of the temperature measurements would be, made by measuring each cup and then combining them. Your statement that we could only know the temperature of the final mixture by measuring it is false. Given the conditions Jeff has set, the average of the temperatures of the individual cups is a physically valid way of determining the final temperature of the mixture, according to the conditions of his experiment.

Your point that we can’t know anything about the error in Jeff’s procedure by making a measurement with it only once is an irrelevant diversion. Jeff is really asking, “What does the distribution of measurements made this way look like?”
We must make the measurement many times and produce a distribution. It is simple to express this mathematically. The noise in any one trial of the combined measurement of the mixture will be:
(1/25) * Sum(n_1 … n_25)
If the n’s are distributed normally with some variance, we should observe that the mean of 25 measurements has a standard deviation 1/5 the size of that of the individual cup measurements, as Jeff points out.
This answer would be no different if the cups were mixed, then divided into 25 sample parts, and the temperatures of each part were measured and averaged. This is undeniable.
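The 1/5 claim is easy to check with a quick Monte Carlo sketch. The per-measurement noise s.d. of 0.1 C is an assumed illustrative value; the 52 C to 76 C cup temperatures follow the example above:

```python
import random
import statistics

random.seed(0)

SIGMA = 0.1    # assumed per-measurement noise s.d., illustrative only
N_CUPS = 25
TRIALS = 20000

# The true cup temperatures are fixed (52 C .. 76 C, as in the example);
# only the measurement noise varies from trial to trial, so the spread
# of the trial means reflects measurement noise alone.
true_temps = [52 + i for i in range(N_CUPS)]

trial_means = []
for _ in range(TRIALS):
    measured = [t + random.gauss(0, SIGMA) for t in true_temps]
    trial_means.append(statistics.mean(measured))

# sd_of_mean comes out close to SIGMA / sqrt(25) = SIGMA / 5
sd_of_mean = statistics.stdev(trial_means)
```

The spread of the trial means is one-fifth the single-reading noise, exactly the sqrt(N) behavior under discussion.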

It is a straw man argument to point out that a mean value does not fully represent a physical state. No one is claiming that an average is anything but an average. While an average does not fully describe the behavior of a variable, it is a significant quantity worth recording.

The invention of the term “heterogeneous system” doesn’t really justify making a distinction between Jeff’s gedanken experiment with the cups of water, and the measurement of air temperature on successive days.

What is wrong is to confuse the real physical variations of the individual measurements with the noise due to the measurement procedure. They are different things.

Looking at the big picture, the satellite temperature record and the surface temperature record don’t look that different, despite the fact that they use totally different methodologies. That indicates that they are both describing the same real physical phenomenon – the earth is getting warmer.

Mark T said

Thank you for your detailed reply, and I do hope you don’t feel piled on about this. There are some errors in your answer above which are separate from the main issue of the paper, but I do appreciate that you didn’t get into the nuance that the cups are not normally distributed in temperature, which is a side issue.

Specifically, if the error in measurement is truly random with a normal distribution, then our knowledge of the mean is 0.001 rather than the single-measurement knowledge of 0.005. A simple test of this would be to say you have 25 cups at an exact temperature of 62 C, all measured separately on different days with the instruments as specified above. How well do you know the mean? It gives a mathematically identical result to mixing them all and measuring 25 times. Knowing the day you measured each unchanging cup, or whether the cups are mixed later, does nothing to the precision of your measurement from a math standpoint.
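The claimed identity (25 separate readings of identical 62 C cups versus 25 readings of one mixed 62 C pot) can be checked with a short simulation. The instrument noise s.d. of 0.005 C is the value used in this discussion:

```python
import random
import statistics

random.seed(1)

SIGMA = 0.005   # instrument noise s.d. for this discussion's example
N = 25
TRIALS = 20000

means_separate = []   # 25 cups, each exactly 62 C, one reading per cup
means_mixed = []      # one mixed 62 C pot, read 25 times
for _ in range(TRIALS):
    means_separate.append(statistics.mean(62 + random.gauss(0, SIGMA) for _ in range(N)))
    means_mixed.append(statistics.mean(62 + random.gauss(0, SIGMA) for _ in range(N)))

# Both procedures give the mean the same precision: SIGMA / sqrt(25) = 0.001.
sd_separate = statistics.stdev(means_separate)
sd_mixed = statistics.stdev(means_mixed)
```

The two spreads are statistically indistinguishable: from a math standpoint it does not matter whether the 25 readings come from 25 identical cups or one mixed pot.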

In my opinion, nobody has found anything wrong in the math of your paper. I haven’t replicated it or done any work to that level. What we disagree on is the question that your math answers, or at least I think so because the answers don’t support the conclusions of the blog post.

Magnitude uncertainty of the mean of a physically heterogeneous state = the natural variability of a heterogeneous system = (+/-)s.

Statistical uncertainty of the mean of a physically heterogeneous state = measurement precision = (+/-)sigma.

They are not the same, (+/-)s is not zero, and it is not an error to pay separate attention to it.

I agree with the three points in the quote, I also agree that the variability about the mean state has meaning. I even agree that weather variability creates error in the measurement through incomplete sampling that must be accounted for.

In paper 1, the equation you used for the safest possible assumption of complete lack of knowledge of error, avg(s) = sqrt(N * s^2/(N-1)), effectively indicates the variance distribution of weather is not only not normal but apparently antinormal, as any amount of sampling produces an average value anywhere between the extremes. I don’t have any experience with a distribution like this, having encountered it here first. By the oversafe usage of the zero-knowledge error distribution, what you have inadvertently estimated is the basic probability distribution of an individual temperature station about the mean.

This calculation does not accurately answer the question, “What is the likelihood that by normally distributed random error processes the ‘true’ mean is actually x degrees from the calculated mean?” While the weather processes have some degree of non-normality in distribution, autocorrelation (spatial or temporal) negates the usage of this equation.

Zeke’s subsampling post is a powerful demonstration that the true error bars are affected by sample number. This confirms beyond a doubt the point I’m trying to make. Not only are the bulk of the errors in mean much smaller than your result, they contract with more sampling as you would expect from error with a less antinormal distribution.

And again, even if the antinormal distribution were correct, it wouldn’t be enough to conclude that we don’t know the trend. That has a different error calculation entirely.

Then, in other sections, you use the anomaly variance as a raw input into the understanding of the average. This is just incorrect, as the cup example shows. Were the monthly data distribution from station to station truly random white noise, you would be correct, but again this is weather, which has autocorrelation due to thermal mass, and anomalies in different regions have different trends, guaranteeing a differential value unrelated to the accuracy of the mean. Where this variance would have an effect is if you didn’t sample all the cups. Jones 98 used some unusual correlation statistics to estimate this noise. Zeke has given an elegant way to back-calculate the uncertainty ‘s’ due to weather noise, temperature, and all random factors, which makes very few assumptions necessary. His method would also allow verification of the sqrt(N) issue by using different sample sizes.

I doubt you will be convinced by my discussion so I wonder if you can interpret the difference between Zeke’s result and yours. Perhaps that can narrow our points of view.

Steve Fitzpatrick said

People measure and evaluate small trends in noisy processes all the time. It is very common. Your internal process variability (‘s’) has nothing at all to do with uncertainty in a trend, except to the extent that a larger ‘s’ means you need more data to reach a specified level of uncertainty for the trend estimate. Your formulation suggests that small trends in noisy processes are essentially invisible forever, since the contribution of internal process variability to the “total uncertainty” can never be reduced with additional measurements. You are simply wrong about this, but I suspect you will go to the grave thinking otherwise. Too bad.

Pat Frank said

I generally agree with your second paragraph, and essentially gave the same view in my own post, when mentioning the 25 measurements of your final pot of 62 C water: “Nevertheless, …”.

I’d like to emphasize, though, that measuring one pot of 62 C water 25 times gives you a precise temperature that represents a single physically homogeneous state. Measuring 25 cups of 62 C water gives you a precise statistic that represents the mean of 25 equivalent states. I.e., the first 62 C is a temperature, and the second 62 C is a statistic.

Your question in #113 conflated a statistic with a temperature. The precisions are identical, but the meanings of the results are not.

You wrote, “What we disagree on is the question that your math answers, or at least I think so because the answers don’t support the conclusions of the blog post.”

My blog post extends a point taken from paper #2. Lucia’s objection, and by extension yours, concerns paper #1. Specifically, Lucia contends that in paper #1 I represent (+/-)s as an uncertainty somehow equivalent to measurement error in a temperature mean. Let’s please keep the discussion focused on paper #1, comprising Lucia’s objection.

But I do need to emphasize for clarity’s sake that the conclusions in the blog post stemmed from paper #2, not from paper #1. The post presents anomaly magnitude uncertainty in exactly the way Jim Hansen used it in his 1987 paper, and in his 1988 testimony before Congress: as a measure of the natural temperature variability of recent climate. That meaning includes Figures 2-5.

The conclusions in the blog post, therefore, have nothing to do with paper 1, or with Lucia’s objection; or with the first Figure in the post, which stems from paper #1. By conclusions I mean the italicized text at the end, not the summary points.

You wrote, “In paper 1, the equation you used for the safest possible assumption of complete lack of knowledge of error, avg(s) = sqrt(N * s^2/(N-1)), effectively indicates the variance distribution of weather is not only not normal but apparently antinormal…”

Jeff, nowhere in paper 1 do I assign that equation to the variance distribution of weather or to weather noise. Nowhere. That equation, half of equation (6), represents magnitude uncertainty = the natural variability of inherently heterogeneous magnitudes.

The full form of that equation is (+/-)s = sqrt[ Sum_i (tau_i - T_bar)^2 / (N-1) ], under explicit conditions that only measurement noise is present and that noise is stationary random.

Please correct your view: (+/-)s has nothing whatever to do with weather noise variance. It is never, ever, associated with weather noise in either of my papers.
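That formula is just the sample standard deviation of the tau_i, and the distinction being drawn can be checked numerically: (+/-)s estimates an inherent spread and converges to it, while the standard error of the mean shrinks as 1/sqrt(N). A minimal sketch, with assumed illustrative values (a 15 C mean and 2 C inherent spread, not figures from either paper):

```python
import random
import statistics

random.seed(2)

def magnitude_spread(n):
    """(+/-)s = sqrt( Sum_i (tau_i - T_bar)^2 / (N-1) ), i.e. the sample
    standard deviation of inherently different magnitudes tau_i."""
    taus = [15 + random.gauss(0, 2.0) for _ in range(n)]
    return statistics.stdev(taus)

s_small = magnitude_spread(30)
s_large = magnitude_spread(30000)

# By contrast, the standard error of the mean does shrink with N.
sem_large = 2.0 / 30000 ** 0.5
# s_large stays near the inherent 2.0 C spread; sem_large is ~0.012 C.
```

More sampling pins down the *size* of (+/-)s more precisely, but it cannot make an inherent magnitude variation smaller.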

Further, concerning weather noise, in paper 2, page 408, just under equation (1), I wrote that, “Over long times local temperature excursions due to “weather noise” may average away as 1/sqrtN [2], but deterministic monthly temperature trends need not.“, which shows that I treat weather noise as normal, not as antinormal.

Page 414, paragraph 2: “Figure 3 shows the 30-year average October temperature series from Anchorage, Alaska. Comparison of Figure 3 with Figure 2 shows that weather noise has almost completely averaged away in the 30-year data, consistent with the assumed stationarity. (emphasis added)” Normal random.

Weather noise: always treated as normal random, never as antinormal.

You wrote, “what you have inadvertently estimated is the basic probability distribution of an individual temperature station about the mean”

“This calculation does not accurately answer the question, “What is the likelihood that by normally distributed random error processes the ‘true’ mean is actually x degrees from the calculated mean?””

Since (+/-)s has nothing to do with weather noise, I’m not sure where that leaves your first sentence above. However, the question you posed in the second sentence is not a question I posed or tried to answer in either one of my papers.

The question I posed in paper 1 comes right in Section 1.1, “The scope of the study”:
“This study evaluates a lower limit to the uncertainty that is introduced into the temperature record by the estimated noise error and the systematic error impacting the field resolution of surface station sensors.”

The estimate of Folland, et al., comes up for examination on page 974, Section 3.1. The average noise uncertainty estimate: “It is now possible to evaluate the (+/-)0.2 C uncertainty estimate of Folland, et al. [12], … [which] was not based on a survey of sensors nor followed by a supporting citation.”

and after a survey of the literature, Section 3.1 concludes, “the (+/-)0.2 C estimate in Ref. [12] is the assessed (+/-)sigma_bar_prime_noise of Case 3b above, namely an adjudged assignment taken to represent the average uncertainty from an ensemble of surface station measurement noise variances of unknown magnitude and stationarity. It does not represent the magnitude of random noise for any specific measurement, nor does it represent the noise variance of any specific sensor, nor is it an average of known stationary variances. (emphasis added)”

So, the paper posed two questions: what is the meaning of a subjectively adjudged estimate of average uncertainty and what is the meaning of systematic measurement error.

You wrote, “Zeke’s subsampling post is a powerful demonstration that the true error bars are affected by sample number. This confirms beyond a doubt the point I’m trying to make. Not only are the bulk of the errors in mean much smaller than your result, they contract with more sampling as you would expect from error with a less antinormal distribution.”

Here’s what you yourself wrote at the head of Zeke’s post, Jeff: “It presented a set of very tight error bars based on weather variance, sampling errors, and any other random events which affect measurements. The error bars don’t incorporate any systematic bias… (emphasis added)”

Recall with what paper 1 is concerned: systematic instrumental error.

But you yourself noted Zeke’s analysis says nothing about systematic error, or about the uncertainty bars associated with it. It follows that Zeke’s analysis says nothing about the results in my paper.

You wrote, “Then, in other sections, you use the anomaly variance as a raw input into the understanding of average.”

Not in paper 1, I don’t. Nor in paper 2. I didn’t do it in the post here, either, so really I’ve no notion of where you got that idea.

Maybe you mean magnitude uncertainty, (+/-)s, but I use that to understand the physical variability of heterogeneous states, not to understand the statistical measurement uncertainty in a mean.

Then you wrote, “This is just incorrect as the cup example shows.”

Your cup example lost all connection to the analysis of anomaly uncertainty when you had all the cups poured into a single pot, for the reasons I laid out. Sorry to say your cup example showed nothing except that magnitude variation is different from measurement uncertainty — exactly as expressed in paper 1, equation (6).

The rest of your penultimate paragraph is about weather noise, which I’ve already shown has nothing whatever to do with the final uncertainties I assessed in paper 1. In paper 2, also as already noted, I treat weather noise as random, but there, too, the final uncertainties have nothing to do with it.

You finished with, “I doubt you will be convinced by my discussion so I wonder if you can interpret the difference between Zeke’s result and yours. Perhaps that can narrow our points of view.”

As you noted, Zeke’s analysis concerned “random events.” My analysis concerns systematic events. That should be enough to establish, for good and all, the reason for a substantial difference between Zeke’s results and mine.

By now I truly hope you realize that my uncertainty analysis was not concerned with weather noise and did not treat weather noise as antinormal.

Pace Lucia, it does not treat (+/-)s as a measurement uncertainty, either.

Pat Frank said

#143 Steve, you wrote, “Your internal process variability (‘s’) has nothing at all to do with uncertainty in a trend…”

We can agree on that, if you mean something about measurement uncertainty. You do agree, don’t you, that the internal variability bars have something to do with determining whether the trend is due to natural causes?

But then you wrote, “except to the extent that a larger ‘s’ means you need more data to reach a specified level of uncertainty for the trend estimate,” which is wrong.

(+/-)s is the variability due to an inherent variation in magnitude. If the magnitudes are the result of a deterministic process, you’d have no knowledge about whether their mean should get larger or smaller with time or with sample size, unless you had a predictive and falsifiable theory about the process of which they are an observable.

You wrote, “Your formulation suggests that small trends in noisy processes are essentially invisible for ever, since the contribution of internal process variablitiy to the “total uncertainty” can never be reduced with additional measurements. You are simply wrong about this, but I suspect you will go to the grave thinking otherwise. Too bad.”

You’re mixing “noisy” with magnitude variability, Steve. They’re not the same thing.

Maybe I can clear this up with an illustration. Suppose you’re measuring the time evolution of an observable.

Suppose further that you have an excellent instrument with systematic instrumental error much smaller than the magnitude of your observable plus stationary noise. Let’s suppose you take several measurements a day, combine those into a daily mean, and plot the means with time.

The daily mean includes almost no systematic error, so we can neglect that. The stationary noise reduces as 1/sqrtN in the mean. But you have some magnitude variability among the daily observables that shows up in the mean. You can tell because the daily standard deviation you calculate is itself always greater than expected for 1/sqrtN statistics. If you developed a way to independently measure the noise intensity, by the way, you’d be able to get a good estimate of every day’s magnitude variance.

Suppose now you have a set of daily mean values that show a time-wise trend with a small positive slope, relative to the zero established by the daily mean of an arbitrary start date. You plot the daily magnitude variability bars around the means. The precision of the mean is known by the 1/sqrtN sigma_noise derived from the repetition rate of your daily measurements.

You plot the daily mean values.

1) If the positive slope does not cause the trend to emerge from the 1/sqrtN noise uncertainty bars, you can’t say the trend is physically real. By “emerge” I mean that the uncertainty bars around your trend no longer reach the mean value you derived on your arbitrary start date.

2) If the trend line emerges from the 1/sqrtN sigma_noise, but is within the width of the magnitude variability bars — the (+/-)s — then you can say the trend is real (to 1 sigma), and that it’s within the limit of natural variation.

3) If the slope produces a trend that emerges from the 1/sqrtN sigma_noise bars, and also emerges from the (+/-)s bars, then you can say the trend is physically real (to 1 sigma) and has exceeded the natural variability of the daily mean magnitudes (to 1 sigma).
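The three numbered criteria can be sketched as a small classifier. The function and the values in the usage note are illustrative assumptions restating the 1-sigma tests described above, not formulas from either paper:

```python
import math

def classify_trend(delta, n_meas, sigma_noise, s):
    """Classify a trend excursion `delta` (departure from the arbitrary
    start-date mean) against the two 1-sigma bars described above: the
    measurement bar sigma_noise/sqrt(N) and the magnitude-variability
    bar (+/-)s. Illustrative sketch only."""
    noise_bar = sigma_noise / math.sqrt(n_meas)
    if delta <= noise_bar:
        return "not distinguishable from measurement noise"
    if delta <= s:
        return "real (1 sigma), within natural variability"
    return "real (1 sigma), exceeds natural variability"
```

For example, with hypothetical values sigma_noise = 0.2, N = 100 measurements per day, and s = 0.5, excursions of 0.005, 0.3, and 0.8 fall into cases 1, 2, and 3 respectively.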

I hope that clears things up, and I really wish you and some others here, and at Lucia’s, would cease with that negative personal commentary. It adds nothing to the discussion.

Steve Fitzpatrick said

It stems from frustration, which comes from you not understanding at all what you are talking about, but insisting that you do and that most everyone else in the world is wrong… that is indeed frustrating. It is also frustrating because the kind of utterly erroneous analysis you insist is correct can be pointed at by climate scientists as an example of the illegitimacy of other, carefully reasoned, skeptical arguments about the true size of AGW and the true value of climate sensitivity.

Your notion of how ‘s’ impacts the ’emergence of a trend’ above the natural variance is just wrong. (As Lucia says, “utterly, completely….” wrong.) I have no idea how someone who works in science or engineering could ever come to such a conclusion, but I will ask two more questions to see if there is any hope of showing you where you are mistaken.

You appear to think that the max/min daily average is not an adequate representation of the true daily mean temperature. Consider for a moment that the daily mean could be calculated in other ways, for example, nearly continuous measurement of temperature at each station… say a very accurate reading every minute, digitized and stored on a computer. At the end of each 24 hours, we would have not 2 readings to average, but 1440. Would the average of that number of individual readings at a station be sufficient to remove most all doubt about the accuracy of the daily average for that station? If so, would installing similar continuous monitoring stations uniformly all over the Earth (say one for each 25 square km of the surface) remove most all of the uncertainty you describe using ‘s’?

If your answer to both questions is ‘yes’ then there may be hope you can see your error. If not, then there is no hope, and I bid you good luck and goodbye. Life is too short to waste time on such things.

Steve Fitzpatrick said

No, it really is frustration. I do not know how to tell someone that they are wrong in ways other than what has already been tried in this thread and over at Lucia’s. Pat simply does not understand what he is talking about. To accept his conclusions about our level of uncertainty in measurements of complex systems would mean that most all of science and engineering are founded on erroneous analyses. As Lucia said “It’s nuts.”

Jeff Id said

Your answer is hard to follow but I’ve worked it out. I do feel that this has become a little one sided in communication but I understand the spot you are in.

Jeff, nowhere in paper 1 do I assign that equation to the variance distribution of weather or to weather noise. Nowhere. That equation, half of equation (6), represents magnitude uncertainty = the natural variability of inherently heterogeneous magnitudes.

You are correct; my interpretation was in error. You have assigned a sigma of +/-0.2 specifically and then assumed some kind of anti-normal distribution. My error was caused by the previous descriptions of how to calculate sigma. However, this ‘worst case’ interpretation is still unusual considering the sources of error this value represents.

I really wish I hadn’t added that confusion to this issue; it is a separate problem, so let’s ignore it.

————–

Equation 2 of paper 1 – I’ve never used LaTeX, so here is the screen grab:

Note the wording, mean temp plus minus mean noise. This is correct though because you have limited it to multiple measurements of a single temperature. Where it all goes wrong is example 2.

The mean temperature, Tbar , will have an additional uncertainty, ±s, reflecting the fact that the tau magnitudes are inherently different

This is not correct, and this is where you begin to improperly incorporate weather noise into the problem. My cup example is a clear demonstration of this fact.

Therefore under Case 2, the uncertainty never approaches zero no matter how large N becomes, because although ±σn should automatically average away, ±s is never zero.

Now we see the result. Note that further sampling has the same effect as in the equation I just admitted to misinterpreting. Due to statistical conflation of the question, they are mathematically identical, and my critique of the result was correct in my previous comment. It should be clearer now how it was confused. What you have done is conflate the error in knowledge of a mean of a single true temperature value with the error in the mean of multiple true values: the cup problem. In fact, the result is mathematically identical to my ‘misinterpretation’.

You have in fact, incorporated weather noise in a perfectly anti-correlated fashion with the result being that your answer predicts the ‘likelihood of a single station deviating x degrees from the mean’ instead of our knowledge of the mean.

Again, I would suggest you pay closer attention to Zeke’s result as it does directly contradict yours and proves exactly what I’m saying.

Paper 2 starts with the same problem in Eq1.
————–

So what is the right way?

sigma^2 of a single station = sigma^2 of measurement error + sigma^2 of total sampling error spread to this single station

A little confusing, but from Zeke’s result we can estimate the true sampling error from gridding by adjusting the number of stations and recalculating the resulting distribution. This would allow back-calculation of the sampling error as well as an estimation of the normality of the distribution of those errors. This sigma could then be distributed across all of the temperature stations, with an equal part per station.
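The subsampling idea can be sketched with a toy simulation. Every number here (station count, common signal, station-to-station scatter) is an assumed illustrative value, not data from Zeke's post:

```python
import random
import statistics

random.seed(3)

# Hypothetical station anomalies for one month: a common 0.4 C signal
# plus 1.5 C of station-to-station scatter (both values assumed).
stations = [0.4 + random.gauss(0, 1.5) for _ in range(5000)]

def subsample_spread(n, trials=2000):
    """S.d. of the mean of n randomly chosen stations: an empirical
    sampling-error estimate at network size n, in the spirit of the
    subsampling exercise described above."""
    means = [statistics.mean(random.sample(stations, n)) for _ in range(trials)]
    return statistics.stdev(means)

spread_25 = subsample_spread(25)
spread_400 = subsample_spread(400)
# The spread contracts roughly as 1/sqrt(n), so going from 25 to 400
# stations cuts the sampling error by about a factor of 4.
```

Repeating this at several network sizes maps out how the sampling error contracts, which is the back-calculation described in the paragraph above.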

Pat Frank said

You asked: 1) “Would the average of that number of individual readings at a station be sufficient to remove most all doubt about the accuracy of the daily average for that station?”

Answer: Yes.

2) “If so, would installing similar continuous monitoring stations uniformly all over the Earth (say one for each 25 square km of the surface) remove most all of the uncertainty you describe using ‘s’?”

Answer: (+/-)s has nothing whatever to do with measurement uncertainty.

I really have to apologize. I must have written the paper in a very unclear way.

Your question 2 shows that you somehow completely missed the point that s^2 is the variance of inherently different magnitudes. Inherently different. How does taking more measurements reduce the size of an inherent difference?

Here’s a walk-through: Take all the annual anomalies from all your every-25-square-km global surface climate stations. Combine them into a global average annual anomaly for each given year. Do that for 30 years, so that you have a 30-year trend in global average annual anomalies.

Fit the trend with an OLS line. The fitted line is your normal. Subtract the fitted line from your global average annual anomaly trend. The processed data are your normalized anomalies. Your 30 annual anomalies are now normalized to a common zero.

Do all the normalized anomalies have one constant magnitude? Or do they vary about zero in some way?

If they vary about zero in some way, does the variation about zero have anything to do with measurement error? If not, does that variation about zero have to do with possible inherently different average annual temperatures among the various years? Does that variability in annual averages have a standard deviation?

Suppose you had 160 years of annual anomalies, from 1850 through 2010. Do a linear OLS fit to all 160 years of anomalies. Subtract the OLS line from the 160 years of annual anomalies. These are your 160 years of annual anomalies normalized to a common zero.

Remember, they represent 160 years of every-25-square-km-of-the-global-surface anomalies. Are all the normalized anomalies of a constant magnitude? If not, does the magnitude difference represent measurement error?

If not, is there a standard deviation that describes the 160-year scatter of normalized anomaly magnitudes about the normal zero? Does that standard deviation represent an inherent magnitude variation of the anomalies about their common zero?

Would the magnitude variation reduce significantly if you had climate stations every 1 square km across the globe?

If not, what happens to the relevance of your question 2?

Those questions are illustrative only. We all know the answers.
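The walk-through (fit an OLS line, subtract it, take the standard deviation of the normalized anomalies) can be sketched in a few lines. The trend and scatter values below are assumed, illustrative only:

```python
import random
import statistics

random.seed(4)

# 30 hypothetical annual anomalies: a 0.01 C/yr trend plus 0.15 C of
# year-to-year scatter (both values assumed for illustration).
years = list(range(30))
anoms = [0.01 * y + random.gauss(0, 0.15) for y in years]

# OLS fit: the fitted line is the "normal".
xbar = statistics.mean(years)
ybar = statistics.mean(anoms)
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(years, anoms))
         / sum((x - xbar) ** 2 for x in years))
intercept = ybar - slope * xbar

# Subtract the fitted line: the residuals are the anomalies
# "normalized to a common zero".
residuals = [y - (slope * x + intercept) for x, y in zip(years, anoms)]
scatter = statistics.stdev(residuals)   # the (+/-)s of the normalized anomalies
```

The residuals average to zero by construction, and their standard deviation describes the inherent year-to-year magnitude variation about the trend line; denser station coverage would sharpen each annual mean but not eliminate this scatter.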

Again, my apologies, Steve. Apparently writing, “… (+/-)s, [reflects] the fact that the tau_i magnitudes are inherently different,” made it seem to you like I meant that (+/-)s represents random measurement error.

Longer than I thought when I began, but all in the interests of transparent science.🙂

JR said

It took some reading and digesting, but I get the gist of what you are talking about. However, I’m not an expert in statistics. I think others’ filters are getting in the way and I find it a bit amusing that their frustration with you is actually self-imposed.

2) “If so, would installing similar continuous monitoring stations uniformly all over the Earth (say one for each 25 square km of the surface) remove most all of the uncertainty you describe using ‘s’?”

Answer: (+/-)s has nothing whatever to do with measurement uncertainty.

That’s not an answer, because Steve didn’t ask you about measurement uncertainty. He asked about “the uncertainty you describe using ‘s’”. In paper 2, before equation 1, you call s “the magnitude uncertainty in the mean temperature”. So that would be “the uncertainty you describe using ‘s’”.

I have no idea what the point of your series of rhetorical questions is supposed to be.

If not, what happens to the relevance of your question 2?

Having read all your rhetorical questions, and thought of the answers I would give, it seems to me that SteveF’s question remains relevant. It would be interesting to read the answer to the question he actually asked.

Pat Frank said

#156 Lucia, you wrote, “That’s not an answer because Steve didn’t ask you about measurement uncertainty.”

Steve’s second question was all about measurement uncertainty. That’s exactly revealed by the third-from-the-last sentence in his post: “If your answer to both questions is ‘yes’ then there may be hope you can see your error.”

The only way a “yes” answer is possible for Steve’s second question, is if he construes s^2 to represent the variance of a random measurement error. It does not.

Steve reveals his fundamental misunderstanding of (+/-)s in his expectation of a “yes” answer.

If you really think that a “yes” is correct and that Steve’s second question is relevant to (+/-)s, then you fundamentally misunderstand (+/-)s as well.

Pat Frank said

#152 Jeff, first my thanks for your continuing civil discussion. Also, thanks very much for fixing the close-link html tag for me in #144. 🙂 I’m now using a real time on-line HTML editor, and so the format should be better.

My response to your comments and screen grabs is a bit long, so I broke it up into one general part plus one part per screen grab. 🙂

Part 1: I see the problem. Let’s step back a minute and see what Section 2 set out to do. Section 2 follows the general approach outlined in Section 1.

Section 1.1 sets out “The scope of the study”.

Section 1.1., paragraph 2, sentence 1, quote: “Basic signal averaging is introduced and then used to elucidate the meaning of the estimated (+/-)0.2 C average uncertainty in surface station temperature measurements as described by Folland, et al. [12]. (emphasis added)”

Basic signal averaging is introduced: there is no necessary application to climate, air temperature, or weather.

Let me state this right up front: The Cases are not about science. They are about math, using temperature as a contextually convenient illustration.

Sentence 2: “An estimate of the noise uncertainty in any given annual temperature anomaly is then developed. (emphasis added)”

So, Section 2, about the math, comes first. Then an uncertainty model for air temperature is developed after Section 2 and after the concepts of basic signal averaging are introduced.

Only the basic concepts of signal averaging are introduced in Section 2.

Section 2 is Signal Averaging. Section 2 is not ‘Averaging the Temperature within a functional model of local air temperature.’

The cases offered are strictly limited to illustrating how signal averaging works. The use of air temperature in those illustrations is only a contextual convenience.

I could just as well have made the illustration using absorption spectra. Equation 1 could have been concerned with absorption intensity, a_i = alpha_i + n_i, etc.

The math would have been identical, and the eventual application to air temperature later in the paper would have proceeded the same way.

It’s true that I use air temperature to illustrate the concepts of signal averaging. This seemed convenient because, after all, the paper was about uncertainty in air temperature. But apparently using temperature to illustrate the math of signal averaging has misled many people into thinking that I was discussing climatology. I really truly regret the trouble this has caused everyone. If I could do it again, I’d probably use totally neutral notation.

But weather noise is not part of Case 2, because Case 2 is not about anything except how the contents of equation (4) are treated under the conditions of variable signal and constant noise.

It’s a simplified example to demonstrate the evolution of the statistical treatment of data as the conditions become step-wise more complicated.

If I had wanted to make a more complete model of local air temperature at that point, equation (4) would have included a term for weather, w_i, and tau_i would have been more closely defined.

But the intent was strictly to explore the limited case of signal averaging for a variable signal + constant noise; nothing more.

Case 2 is irrelevant to weather noise. It doesn’t include anything except a discussion of the general statistical approach to a signal of variable magnitude under conditions of stationary noise. Nothing more than that.

Pat Frank said

About grab 1, Jeff you wrote: “Note the wording, mean temp plus minus mean noise. This is correct though because you have limited it to multiple measurements of a single temperature.”

Case 1 is not limited to “multiple measurements of a single temperature.” You’ve re-interpreted what I wrote. Case 1 is limited to multiple measurements of a constant temperature: Case 1, quote: “signal-averaging repetitive measurements of a constant temperature.”

The difference is important and systematic. It’s the difference between 25 measurements of your one pot of 62 C water (single temperature), and 1 measurement each of 25 cups of 62 C water (constant temperature).

Although the uncertainty statistics are the same, the difference is between a system of one state and a system of many states. The first set of measurements produces a temperature mean and its precision; the second set produces a statistical mean and its precision.
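A minimal numerical sketch of those shared uncertainty statistics (hypothetical values; Python used purely for illustration, not from either paper):

```python
import random

random.seed(0)
TRUE_TEMP = 62.0   # the constant "true" water temperature, deg C
NOISE_SD = 0.2     # stationary measurement noise, deg C
N = 25

# N noisy readings of a constant temperature (one pot or 25 identical cups:
# the statistics are the same either way)
readings = [TRUE_TEMP + random.gauss(0.0, NOISE_SD) for _ in range(N)]
mean = sum(readings) / N

# the sample standard deviation estimates the noise sigma_n
sd = (sum((r - mean) ** 2 for r in readings) / (N - 1)) ** 0.5

# the precision of the mean improves as 1/sqrt(N)
sd_of_mean = sd / N ** 0.5
print(round(mean, 2), round(sd, 3), round(sd_of_mean, 3))
```

Either way, the mean recovers the constant temperature and its precision tightens as 1/sqrt(N); the one-state vs. many-state distinction lives in the physical interpretation, not in the arithmetic.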

Sorry to seem pedantic here, but the difference is scientifically important and your misinterpretation of what I wrote is emblematic of the problem, as I’ll show.

Pat Frank said

Now grab 2, Jeff, where you say I got it wrong. Here’s your quote where you establish where I got it wrong:

Quoting me: “The mean temperature, Tbar , will have an additional uncertainty, ±s, reflecting the fact that the tau magnitudes are inherently different”

And you follow that with, “This is not correct, and this is where you begin to improperly incorporate weather noise into the problem.”

Equation (4) gives the model for Case 2. Here it is in full: t_i = tau_i + n_i, where tau_i is not equal to tau_j, etc., and n_i is stationary noise.

Weather noise appears nowhere in that model. Equation (4) has no term representing weather.

But you’ve interpreted the generality of magnitude difference to mean “weather noise.” Except that it does not.

It means magnitude difference, in a general sense. Magnitude difference is merely illustrated using temperature as an example. Case 2 is not about climatology. You have imposed a new climatological meaning on it: weather noise.

For Case 2, the noise, n_i is defined as stationary. The tau_i are defined as of inherently different magnitudes. That’s the end of Case 2: the be-all and the end-all.

Different magnitudes, stationary noise, what happens to the statistics. No weather.

Nothing about weather. Weather is nowhere in Case 2. Weather does not appear among the axiomatic terms about tau_i and sigma_i that strictly define and limit the model examined under Case 2.

It’s a simple model exploring the most basic of signal averaging statistics.
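A toy numerical version of Case 2 (the deterministic sinusoidal tau_i and all numbers below are my hypothetical choices, not anything from the paper) shows the point: the magnitude scatter (+/-)s persists no matter how large N grows, even though the noise is stationary:

```python
import math
import random

random.seed(1)
N = 1000
NOISE_SD = 0.2   # stationary measurement noise

# Case 2: inherently different "true" magnitudes tau_i plus stationary noise n_i
taus = [10.0 + 5.0 * math.sin(2 * math.pi * i / N) for i in range(N)]
t = [tau + random.gauss(0.0, NOISE_SD) for tau in taus]

t_bar = sum(t) / N
sd_total = math.sqrt(sum((x - t_bar) ** 2 for x in t) / (N - 1))

# the scatter of the noiseless tau_i about their mean: the magnitude term +/- s
tau_bar = sum(taus) / N
s = math.sqrt(sum((tau - tau_bar) ** 2 for tau in taus) / (N - 1))

# total scatter is dominated by +/- s, which does not shrink with N
print(round(sd_total, 3), round(s, 3))
```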

Case 2 does not include weather. Nowhere in paper 1 do I ever evaluate weather noise using Case 2 statistics. Nowhere in paper 1 do I ever evaluate weather noise at all.

You have inappropriately carried climatology and weather into Case 2 under the mistaken impression that “inherently different magnitudes” must mean weather noise. But it does not.

Case 2 is not about climate. It is about the statistics of signal averaging. It is a general case with a very limited axiomatic meaning. That meaning is given in the few lines in the paragraph under 2.2. Case 2 and equation (4). No more than that.

Temperature is merely used as a convenient illustration. I see now that using temperature to illustrate the statistics was a mistake. It has led you, and probably Lucia, into thinking that I was analyzing climate. I was not. I truly regret my apparently poor choice of illustrative notation. It was misleadingly seductive.

Pat Frank said

Jeff, now let’s look at your screen grab 3. In your post, the screen grab 3 link is followed by your quote from the bottom of the paragraph in that grab.

Quoting me: “Therefore under Case 2, the uncertainty never approaches zero no matter how large N becomes, because although ±σn should automatically average away, ±s is never zero.”

The sentence starts: “Therefore under Case 2…”

What is “under Case 2”? Well, it’s that t_i = tau_i + n_i. That and no more than that. No weather.

The Case 2 model does not include weather. It includes no more than the generality of variable magnitudes and stationary noise. That is why the sentence begins, “Under Case 2…”

The analysis and the results of Case 2 follow from the axiomatic definitions at the beginning of Case 2. Nothing more. Grafting weather onto Case 2 is not valid. Grafting weather onto Case 2 is imposing a new meaning that is not present in the original.

Case 2 is not a complete analysis of factors entering into local air temperature. It is limited to the artificial conditions of variable temperature and stationary measurement noise, alone, in order to explore the consequences attending to those conditions, alone.

Case 2 is the model of intermediate signal-averaging complexity, part-way through the process of developing the tools to finally examine the meaning of the estimated average of read-error published by Folland, et al. 2001 (in paper Section 3).

The paragraph in your screen grab 3 goes on to say that, under Case 2, (t_i – T_bar) = (n_i + delta_tau_i). Do you disagree that, under Case 2, this equation is true?

If (t_i – T_bar) = (n_i + delta_tau_i ) is true under Case 2, then isn’t it also true that under Case 2 (+/-)s represents the standard deviation of the delta_tau_i values?

And doesn’t each delta_tau_i represent the inherent magnitude variation of each t_i about T-bar, under Case 2?

And isn’t it true that weather noise is nowhere included under Case 2?

Then you wrote, “Now we see the result. … What you have done, is you have conflated the error in knowledge of a mean of a single true temperature value with the error measurement of a mean of multiple true values — the cup problem.”

On the contrary: Case 2 equation (6) explicitly separates out the noise-related measurement uncertainty from the magnitude uncertainty. It does the opposite of conflation.

Equation (6) shows how the increased complexity of Case 2 produces a result that is different from Case 1.
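To illustrate separation rather than conflation (a hypothetical sketch: subtracting a known noise variance from the total variance is a standard decomposition, offered here as an illustration, not as the paper’s eq. (6) verbatim):

```python
import math
import random

random.seed(2)
N = 5000
NOISE_SD = 0.2   # known stationary noise

# inherently different magnitudes, plus stationary noise
taus = [random.uniform(5.0, 15.0) for _ in range(N)]
t = [tau + random.gauss(0.0, NOISE_SD) for tau in taus]

t_bar = sum(t) / N
var_total = sum((x - t_bar) ** 2 for x in t) / (N - 1)

# separate the known noise variance out of the total: what remains is magnitude variance
s = math.sqrt(var_total - NOISE_SD ** 2)

# compare against the scatter of the noiseless tau_i themselves
tau_bar = sum(taus) / N
s_true = math.sqrt(sum((tau - tau_bar) ** 2 for tau in taus) / (N - 1))
print(round(s, 3), round(s_true, 3))
```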

The fact that I illustrated the cases in terms of temperature should have led no one to conclude that the cases represent full models of daily or monthly temperature.

No one should graft onto a Case a meaning that is not axiomatically and explicitly stated in the definition of that Case.

From post part 1: The Cases are not about science. They are about math, using temperature as a contextually convenient illustration.

You then wrote, “You have in fact, incorporated weather noise in a perfectly anti-correlated fashion…”

Case 2 does not include weather noise. What is this preoccupation with weather?

Case 2 is about the statistical consequences of inherently different magnitudes. It’s not about weather.

The inherently different magnitudes of Case 2 are not about the temperature excursions of daily weather. Case 2 is about how inherently different magnitudes, themselves, affect the statistical calculation of standard deviation. Please stop imposing a full climatological meaning onto an illustratively limited statistical model.

When I actually discuss weather noise, it’s always represented as random. Nevertheless you continue to insist I make it antinormal, even though it appears that way nowhere in my papers.

So here’s the problem that’s leading to these difficulties.

My three stepwise Cases explore the statistical consequences of strictly limited and axiomatically defined conditions of t_i and n_i. These conditions increase in complexity across the Cases, in a build-up manner.

You have read those Cases, specifically Case 2, and grafted a climatological meaning — weather noise — onto “inherently different magnitude” that it does not have. It has only the definitional meaning I gave it. “Inherently different magnitude” does not mean weather noise. It means inherently different magnitude. Period.

You have read what I wrote, imposed your own new meaning on it, assigned your new meaning to me, and then criticized my result in terms of your meaning.

Steven Mosher said

Pat Frank said

Jeff, in your discussion of Zeke’s analysis, you wrote that it “is entirely different from Pat Frank’s weather noise discussed in previous posts …”

Jeff, none of my posts have discussed weather noise. Not one.

The head post here discusses the natural variability of the anomaly temperatures. It’s ironic, because the “magnitude uncertainty” (= natural variability) of the 30-year anomaly normals in post Figure 2 and Figure 3 should include virtually no weather noise at all.

The 30-year averaged October temperatures for Anchorage, AK, in Figure 3 of my paper 2, demonstrate this: due to averaging, all the weather noise is gone from the temperature trend.

The 160-year anomaly natural variability, in post Figure 4 and Figure 5, should include even less weather noise.

You also wrote, “In Pat’s work, the error due to weather was the total variance of different stations.”

In my papers, uncertainty due to weather noise never makes it into any of my final uncertainty estimates. I’ve already pointed out that in paper 1, Case 2 (+/-)s does not represent weather noise.

In paper 2, where weather noise is considered (under Figure 2), it is treated as random and decremented away as 1/sqrtN. In those terms, uncertainty due to weather noise would decrement to almost zero in an annual anomaly.
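A toy sketch of that decrementing (all magnitudes hypothetical):

```python
import math
import random

random.seed(3)
WEATHER_SD = 3.0   # hypothetical daily weather-noise scatter, deg C
N_DAYS = 365

# daily weather-noise excursions about a (here, zero) deterministic signal
daily = [random.gauss(0.0, WEATHER_SD) for _ in range(N_DAYS)]
annual_mean_noise = sum(daily) / N_DAYS

# expected residual after averaging: WEATHER_SD / sqrt(N)
expected = WEATHER_SD / math.sqrt(N_DAYS)
print(round(annual_mean_noise, 3), round(expected, 3))
```

A 3 C daily scatter, treated as random, leaves only about 0.16 C of residual in the annual mean.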

Weather noise does not play into any of the uncertainty bars in any of the five Figures that appear in the head post.

Pat Frank said

#152, Jeff, you also wrote, “Again, I would suggest you pay closer attention to Zeke’s result as it does directly contradict yours and proves exactly what I’m saying.”

My paper 1 discusses the uncertainty statistics of systematic instrumental error and of a subjective estimate of read error. How does any of Zeke’s analysis of the uncertainties due to random errors have anything to do with that?

You also wrote, “Paper 2 starts with the same problem in Eq1.” I.e., that I have “incorporated weather noise in a perfectly anti-correlated fashion…”

Paper 2, eq. 1 gives the usual standard deviation about the mean of any measurement, including measured air temperature, “for any fluctuation or trend in air temperatures.”

Just below eq. 1, I noted that, “Over long times local temperature excursions due to “weather noise” may average away as 1/sqrtN [2], but deterministic monthly temperature trends need not.”

You have claimed that I treat weather noise as antinormal. In that sentence, I treat weather noise as Gaussian normal, contradicting your claim.

Note that in paper 2, (+/-)s finally is in fact applied to real air temperatures, and in doing so, weather noise is brought into the picture. In paper 2, weather noise is treated as a random fluctuation that averages away, and Figure 3 is given as an example. Weather noise is not treated as antinormal in my paper.

Nothing I’ve actually written in the two papers supports your view of them, Jeff. Nor Lucia’s.

You finished with, “So what is the right way?

sigma^2 of a single station = sigma^2 measurement error + sigma^2 total sampling error spread to this single station”

My paper 1 deals, first, with the sigma^2 of a single station = sigma ^2 measurement error part of your “right way,” and specifically derives a lower limit estimate of the systematic part of instrumental measurement error.

The paper then extrapolates that single-station lower limit of systematic error into a global average uncertainty as an r.m.s. mean, on the condition that the lower-limit systematic error of a modern MMTS sensor is globally less than the systematic error produced by the less advanced LIG thermometers in CRS screens that constitute the bulk of the 20th-century global air temperature record.
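The r.m.s. combination step can be sketched as follows (the per-station error values below are hypothetical placeholders, not data from the paper):

```python
import math

# hypothetical per-station lower-limit systematic errors, deg C
station_sys_err = [0.46, 0.50, 0.48, 0.44, 0.52, 0.46]

# combine the per-station lower limits as the r.m.s.,
# giving a global lower-limit uncertainty estimate
rms = math.sqrt(sum(e ** 2 for e in station_sys_err) / len(station_sys_err))
print(round(rms, 3))
```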

Pat Frank
1) I understand that s is the variance in the population.
2) I also understand that in both your papers you consistently state it is the uncertainty in the mean.

Whether you call the “mean” a statistic or a physical quantity, what ‘s’ is not is the uncertainty in the mean. Even if the mean is a statistic, it still has a known uncertainty, and that uncertainty is not equal to ‘s’.

No amount of equivocation will change that: not introducing words like “measurement”, not talking about heterogeneous systems, not going off on tangents about physical vs. non-physical quantities, and not claiming something unstated is implied in questions or criticisms of your paper or post. Your ‘s’ is not the uncertainty in the mean.

Steve Fitzpatrick said

You did not answer my second question directly, but used instead many other words that seem to imply your answer is ‘no’.

‘s’ is completely irrelevant to any discussion of whether or not a measured trend in the mean is statistically significant. It is irrelevant to any discussion about uncertainty in that trend, or about our confidence in whether a measured trend is ‘natural’ or ‘man-made’. It is a parameter you are inappropriately using to justify your personal lack of confidence in, or perhaps your belief in a lack of relevance of, the measured mean temperature trends.

Your approach is bizarre and utterly wrong, and there appears no chance that you will ever understand that you are wrong. You are wasting people’s time, but no more of mine.

Jeff Id said

I have tried to explain what you have done to the best of my ability. I do realize now that you are unaware of how t – tbar incorporates weather noise no matter how you define it. Daily variance in the population of temperature sensors is primarily weather noise.

There is and always will be a differential between temperature sensors. As one heat/cold wave moves across a region, you get first positive and then negative perturbations in temperature anomaly. The average is in the middle; the knowledge of that average is not defined by the variance. When you use t – tbar to define uncertainty in tbar, you are using the noise in the weather to define uncertainty whether you realize it or not. The standard deviation of those values is absolutely not the uncertainty in the mean. Were they truly normal random measures of the same mean value, SD/sqrt(N) would apply; but they are not truly homogeneous, as different regions show different trends over time and there is local correlation over time, so even that is conservative. The SD value per station can be calculated from Zeke’s method and applied, with that smaller value, using SD/sqrt(N) to get the true sigma error in the mean.
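A toy Monte Carlo of that distinction (all numbers hypothetical): the station-to-station spread is large, yet the mean over many stations is known far more precisely than the spread suggests.

```python
import math
import random

random.seed(4)
N_STATIONS = 400
TRUE_ANOM = 0.30   # hypothetical true regional anomaly, deg C
SPREAD_SD = 2.0    # station-to-station spread (dominated by weather), deg C

anoms = [TRUE_ANOM + random.gauss(0.0, SPREAD_SD) for _ in range(N_STATIONS)]
mean = sum(anoms) / N_STATIONS
sd = math.sqrt(sum((a - mean) ** 2 for a in anoms) / (N_STATIONS - 1))

# for independent stations, the uncertainty in the mean is SD/sqrt(N), not SD
sd_of_mean = sd / math.sqrt(N_STATIONS)
print(round(sd, 2), round(sd_of_mean, 3))
```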

As to claims about climate, the papers both use the word ‘trend’ in them. They both declare the global mean trend to be well inside the error of knowledge. I would recommend that you take any claims of climate trend knowledge out as even if your method were correct, that has not been supported in these calculations.

Again, Zeke’s simple Monte Carlo demonstration proves these points. If we had error bars in knowledge of the mean anywhere near yours, he wouldn’t be able to do what he did.

Pat Frank said

Removing the word “temperature” will remove the apparently irresistible impulse you folks have to add climatological meanings inappropriately to what is axiomatically defined as a strictly limited signal model for the development of signal-averaging statistical tools for later use.

For t_i substitute y_i.

For tau_c substitute upsilon_c.

For tau_i,j, etc., substitute upsilon_i,j, etc.

For T_bar substitute Y_bar.

So, for example, Case 1 equation (1), page 970 becomes:

y_i = upsilon_c + n_i, (1)

where y_i is the measured intensity, upsilon_c is the constant “true” intensity, and n_i is the random noise associated with the i_th measurement.

The variable metric along the abscissa could be time, space, wavelength, frequency, you-pick-it. The observable is just the intensity of some arbitrary signal along that metric.

These substitutions throughout Section 2 will make it clear that the Section is strictly dealing with the statistics of signal averaging, just as Section 1 stated would be done and as is introduced in the opening sentences of Section 2.

Making these substitutions, the Section will clearly step through examples of serially more complicated signal-intensity and noise combinations, showing how the statistics of signal-averaging change with each case.

For the Case 2 that has caused everyone so much trouble, then, the signal averaging model is axiomatically limited to signals in which the “true” intensities are not equal:

upsilon_i =/ upsilon_j,

where “=/” means ‘is not equal to.’

The other part of Case 2 is that the noise is still stationary, i.e., sigma^2_i = sigma^2_j, etc.

No more meaning is allowed to Case 2 than that.

Apart from these changes in notation, the step-wise statistical development through the three Cases remains identical.

The application of the Case statistics to understanding the meaning of a subjectively adjudged estimated error, such as was offered by Folland, et al, 2001, remains identical.

The only difference is that all references to “temperature” are removed from the three Cases. The more abstract notation still carries the entire statistical message originally intended, which is about the evolution of standard deviation.

But with the use of abstract notation, no one will be seduced into reflexively adding in any meaning to the Cases that is not explicitly stated in the axiomatic definitions given at the outset of each signal averaging Case.

Here’s how the following sentence under Case 2, for example, will change:

Original: “The mean temperature, T_bar, will have an additional uncertainty, (+/-)s, reflecting the fact that the tau_i magnitudes are inherently different. The result is a scatter of the inherently different temperature magnitudes about the mean …”

Abstracted: ‘The mean intensity, Y_bar, will have an additional uncertainty, (+/-)s, reflecting the fact that the upsilon_i magnitudes are inherently different. The result is a scatter of the inherently different intensity magnitudes about the mean …‘

There is now no temptation to find some cryptic meaning about ‘weather intensity’ in Case 2, and impose that meaning on the rest of the paper. Nevertheless, the statistical meanings associated with the two sentences are identical.

It should now be very clear that Section 2 is only about basic concepts of signal averaging.

Maybe I should have used abstract notation in Section 2 from the outset. But it never, ever occurred to me — not a hint of a wisp of a suspicion — that anyone would misunderstand Section 2 in the manner we’ve all experienced here.

None of the four AMS reviewers from JAMC, not even Dr. Adamantly_Opposed, nor the two associate editors, raised any problem with understanding the intended meaning of Section 2. Nor did the E&E reviewers, and at least one of those must have read Section 2 carefully, because (he) found an error in the original equation (6) that everyone else missed.

Not a hint of a problem from any of them.

But I truly regret the storm that was caused, and that the way I wrote caused so many of you to have a problem parsing my intended meaning. Sincere regrets to you all for that.

Pat Frank said

Well, Lucia, if you can’t see the difference between a measurement uncertainty described by reference to random measurement noise and a magnitude uncertainty described by reference to inherently different intensities, then I can’t help you.

Pat Frank said

#168 Jeff, when you — if you — read #169, I truly hope you will realize that the three Cases, and specifically Case 2, have no climatological meaning.

You’ve imposed meanings that are strictly disallowed by the definitional limits of the signal averaging models. That’s all they are: signal averaging models. None of the cases have anything to do with climatology, weather noise, or even air temperature. They’re statistical models. They’re about math.

Why is that so hard to grasp?

Your last sentence shows you don’t realize that the same physical processes that determine air temperature determine systematic sensor measurement error. This is the message of the work of Hubbard and Lin (and others).

As air temperatures are regionally correlated, so, with high likelihood, will be systematic error. To the extent that systematic error is correlated with temperature, you won’t be able to detect it by comparing independently selected subsets of the temperature record. Monte Carlo tests of sensor error are made in complete ignorance of the global structure of sensor systematic error. They will not tell you anything definitive about it.
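That blindness to a shared bias can be sketched as a hypothetical toy (the 0.4 C common bias is an arbitrary choice): subsets of biased sensors agree with each other while both miss the true value.

```python
import random

random.seed(5)
TRUE_TEMP = 15.0
SHARED_BIAS = 0.4   # hypothetical systematic error common to all sensors
NOISE_SD = 0.3
N = 1000

readings = [TRUE_TEMP + SHARED_BIAS + random.gauss(0.0, NOISE_SD) for _ in range(N)]
random.shuffle(readings)
half_a = readings[: N // 2]
half_b = readings[N // 2 :]

mean_a = sum(half_a) / len(half_a)
mean_b = sum(half_b) / len(half_b)

# the two independently selected subsets agree closely with each other...
print(round(mean_a - mean_b, 3))
# ...yet both are offset from the true temperature by the shared bias
print(round(mean_a - TRUE_TEMP, 2), round(mean_b - TRUE_TEMP, 2))
```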

Pat Frank said

Jeff, the major measurement error variance in daily temperatures is due to systematic sensor error.

Hubbard and Lin measured the errors produced by their test sensors by reference to air temperatures simultaneously recorded by a high precision, high accuracy standard sensor (an R. M. Young probe in an aspirated shield, known to record accurate temperatures under high solar loading).

The errors they recorded were the differences between the highly accurate air temperatures and the temperatures recorded by the test sensors (ASOS, MMTS, etc.). The systematic measurement errors they presented in their papers, as (test minus standard) differences, were strictly independent of weather noise. Temperature excursions due to weather were subtracted away by taking the differences between simultaneously measured temperatures.

There is no weather noise component whatever in any of the systematic errors described in paper 1. Anyone who has read that paper carefully should have fully realized that fact.

Re: Pat Frank (Jul 19 01:29),
Pat, you say “the same physical processes that determine air temperature determine systematic sensor measurement error.”
How can that be? Environmental processes determine air temperature, human manufacture and use of instruments determines systematic sensor error. There is no causal link between air temperature (or its causes) and instrument or sampling problems.

“For Case 2 measurements the noise variance, ±σn, and the magnitude uncertainty, ±s, must enter into the total uncertainty in the mean temperature”

Magnitude uncertainty defined as eq 6.

Where delta_tau is the difference between the “true” measured temp and the true mean temp. In your words: “delta_tau represents the difference between the ‘true’ magnitude of tau_i and T-bar, apart from noise.”

How is the full standard deviation of tau additional measurement error ‘s’ of Tbar? These are both defined as ‘true’ noiseless signals. And how is it that this exact statement does not include weather noise? And how is it that measurement of this noiseless ‘source of error’ adds to the total error in knowledge?

Richard T. Fowler said

Pat Frank said

#174 Brian, you’d not say that if you read the sensor calibration papers of Hubbard and Lin.

#175 Steve, my knowledge of Portuguese is zero. My mistake, sorry. 🙂 However, if you read post #169, your astute understanding of measurement of trends in complex systems should lead you to realize that the point you’re criticizing isn’t about measurement of trends in complex systems. It’s about statistics and the evolution of standard deviation under axiomatically restricted conditions.

Pat Frank said

The full quote is “delta_tau represents the difference between the “true” magnitude of tau_i and T-bar, apart from noise.”

Jeff, you asked, “How is the full standard deviation of tau additional measurement error ‘s’ of Tbar?”

The standard deviation of delta_tau is not additional measurement error. Delta_tau_i does not represent an error. It is not part of a random normal spread around T_bar.

The magnitudes of the delta_tau_i do not represent random fluctuations about a mean.

They represent the outcome of inherently different magnitudes — similar to magnitude variations (intensity variations) one might get when taking measurements of the observable of a deterministically and systematically varying process.

This is what I meant by the properties of the case being axiomatically defined. The tau_i were defined as having inherently different magnitudes, and no more than that.

If the delta_tau_i were to represent random fluctuations about a mean, I would have defined them to be so. But absent that definition, a tau_i attribute of random variation about a mean can not be assumed (or imposed).

The first sentence you quoted above, said that, “… (+/-)s, must enter into the total uncertainty…” Total uncertainty. Not measurement uncertainty. Not even total measurement uncertainty. Total uncertainty.

Part of the total uncertainty of the mean of a set of measured observables of inherently different intensities (magnitudes) is the magnitude uncertainty itself, which is apart from the measurement uncertainty. It is a measure of the non-random variation in intensity one would obtain if one measured the system again.

In a science/engineering context it’s a measure of the natural variability of the observable magnitudes associated with a deterministically varying system.

Usually, a magnitude uncertainty is reported separately from a measurement uncertainty, if they can be known separately, as, e.g., value (+/-)sigma, (+/-)s.

Or sigma and ‘s’ can be combined as the r.m.s. if one wanted to express the total measured variation in observational magnitude, as recorded by your instrument.
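For example (arbitrary illustrative numbers), the two reporting conventions just described:

```python
import math

value = 14.3   # hypothetical mean observable
sigma = 0.2    # measurement (noise) uncertainty
s = 1.5        # magnitude uncertainty (natural variability)

# reported separately:
print(f"{value} (+/-){sigma}, (+/-){s}")

# or combined as the r.m.s. total observed variation:
total = math.sqrt(sigma ** 2 + s ** 2)
print(f"{value} (+/-){round(total, 3)}")
```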

You asked, “And how is it that this exact statement does not include weather noise?”

Because weather noise is not part of the statistical model. The model isn’t about climate. It’s not about daily temperature. As I pointed out explicitly in #169, it’s not even about temperature at all. It’s about how standard deviation changes from Case 1 when the observables come to have an inherently different magnitude but the measurement noise remains stationary.

You asked, “And how is it that measurement of this noiseless ‘source of error’ adds to the total error in knowledge?”

Magnitude uncertainty is not a source of error. Magnitude uncertainty is not an error in knowledge. Magnitude uncertainty is positive knowledge. It’s knowledge of natural variability. It’s knowledge of the natural variability of a system that exhibits inherently different intensity magnitudes. Magnitudes that are non-randomly distributed.

Pat Frank said

Well, the paper is about sensor errors. What about the errors over the roughly 70% of the surface area that happens to be water? I don’t see any attempt to correct the erroneous data gathered prior to 1980, which in January 2001 was found not to correlate with the temperatures measured on land, i.e., from the atmosphere rather than from the water, as were the older oceanic data readings.

These global Temperature anomaly proxies have more changes than any carillon on earth.

Pat Frank said

1) that Section 2 of paper 1 comprises a derivation of statistical cases, and nothing else?
2) that none of the reported measurement error variances is contaminated by, or includes, weather noise?
3) that (+/-)s is never, ever represented as a measurement error?

Pat Frank said

#181 Lasandra, you’re quite correct. Systematic error in SSTs is mentioned in passing on page 983 of the 2010 uncertainty paper under Section 3.2.4, but the subject is not explored in any detail.

I’ve been looking at SSTs since, though, and it’s pretty clear that the model used for bucket correction is unable to remove the uncertainty from past recorded sea surface temperatures. Even recent satellite SSTs are calibrated against buoy temperatures that are probably not accurate to better than about 1 C.

Jeff Id said

The discussion has been fun and I’ve enjoyed it to this point. I certainly didn’t expect it to get so wild when you asked to put the paper here, and it seems a little unfair for me to have a nice guy put up his paper and then keep beating it up.

I don’t know how to say it to you more clearly. The station spread IS weather noise and IS NOT a strong source of total uncertainty. It doesn’t matter that you define it verbally or mentally as ‘not’ weather noise or ‘not’ measurement uncertainty. You have the idea stuck in your mind that I am mixing up measurement uncertainty with total uncertainty, when it is actually you. You have suggested that I and others can’t separate temperature from our minds and are not objective about it, when in reality it makes no difference. Like the cup example, the spread in station values is almost completely unrelated to any uncertainty. I wish it were simpler, so I could say it is NOT related to uncertainty at all; but there is uncertainty caused by incomplete sampling of this spread (a missing/extra cup), and that uncertainty is affected by the magnitude of the spread.

Imagine a billion sensors perfectly distributed and calibrated taking perfect measurements, or a trillion. How well would you claim we know the temperature of the gas then? It is still about knowledge of the mean temperature of the gas right? Your method would still provide the wild error bars yet we would be able to detect the thermal signature of a cow fart in Zimbabwe. I do work in engineering and I have to say that there is no such thing as magnitude uncertainty according to your definitions.

And again, you should retract any discussion of trend knowledge as those conclusions are completely unsupported by the math.

—-

As a last attempt to explain the problem, in your most recent reply to my objections above you wrote:

“Magnitude uncertainty is not a source of error. Magnitude uncertainty is not an error in knowledge. Magnitude uncertainty is positive knowledge. It’s knowledge of natural variability. It’s knowledge of the natural variability of a system that exhibits inherently different intensity magnitudes. Magnitudes that are non-randomly distributed.”

“Knowledge of natural variability”: as I have pointed out, your calculation is an estimate of the probability of an individual station deviating X degrees from the mean. It is not the ‘magnitude uncertainty’ of the global mean temperature.

Pat Frank said

1) Figure 3 in paper 1 includes the systematic measurement error of an MMTS sensor operating under field conditions, as reported by Hubbard and Lin in their Figure 2 of reference 13 in paper 1. The (+/-)0.46 C uncertainty bars in paper 1 have nothing to do with magnitude uncertainty.

2) You’re an engineer. You measure the resistances of 100 resistors of 100 different ohmages. You calculate the mean resistance of the 100 resistors. You calculate the standard deviation of the set of resistances about the mean resistance. You conclude the standard deviation represents the variation of the 100 resistances about the mean resistance, but are told that, “those conclusions are completely unsupported by the math.“
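In code, the resistor example looks like this (hypothetical resistance values):

```python
import math
import random

random.seed(6)
# 100 resistors of 100 inherently different resistances (ohms)
resistances = [random.uniform(90.0, 110.0) for _ in range(100)]

mean_r = sum(resistances) / len(resistances)
s = math.sqrt(sum((r - mean_r) ** 2 for r in resistances) / (len(resistances) - 1))

# s describes the real spread of the 100 resistances about their mean;
# it is a property of the set, not a measurement error in mean_r
print(round(mean_r, 1), round(s, 2))
```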

Pat Frank said

Jeff, you also wrote: “your calculation is an estimate of the probability of an individual station deviating X degrees from the mean.”

The calculation producing the bars in Figures 2 through 5 above is sqrt{ sum[(global annual anomaly)_i minus (mean of global annual anomalies)]^2 / (N-1) }, where N is the number of annual anomalies.

How does that calculation have anything to do with the temperature of an individual station?
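As a quick numerical sketch of the calculation described above (the anomaly values are invented for illustration, not real CRU numbers), note that only the global anomaly series itself enters; no individual station temperature appears anywhere:

```python
import math

# Hypothetical global annual anomalies (deg C); not real CRU values.
anomalies = [-0.12, 0.05, 0.21, -0.08, 0.14, 0.02, -0.19, 0.10]
N = len(anomalies)
mean = sum(anomalies) / N

# Sample standard deviation of the anomalies about their own mean:
# s = sqrt( sum_i (a_i - mean)^2 / (N - 1) )
s = math.sqrt(sum((a - mean) ** 2 for a in anomalies) / (N - 1))

print(f"mean anomaly = {mean:+.4f} C, s = {s:.4f} C")
```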

I’m actually having trouble supposing the alternative, but if you are referring to the uncertainty bars in Figure 1, above: The systematic errors Hubbard and Lin report represent the probability that an individual sensor measurement will deviate X-degrees from the true air temperature. Not “X degrees from the mean.”

If you really do think X degrees from the mean, then you must not have read the discussion of the results of Hubbard and Lin in paper 1 page 978, under Section 3.2.2. Uncertainty due to systematic impacts on instrumental field resolution.

If you like, I’ll send you their 2002 paper and you can check their results yourself.

Pat Frank said

H&L GRL paragraph [1]: “Air temperature measurement biases caused by solar radiation and ambient wind speed are well-known. … This study investigated the air temperature biases caused in the commonly used radiation shields in the United States, including the ASOS, MMTS, Gill, CRS, ASP-ES, and NON-ASPES shields along with corresponding temperature sensors.”

Legend to Figure 2: “Statistical distributions of (a) daytime air temperature biases, (b) nighttime air temperature biases, and (c) overall air temperature biases for all air temperature systems in the measurements.”

The bulk of the uncertainty bars in head post Figure 1 is the MMTS measurement bias, i.e., the systematic error.

Systematic biases collected with reference to a standard accurate temperature (provided by the RM Young probe), are not distributions about a mean temperature. They are distributions around a mean systematic error.

2) You’re an engineer. You measure the resistances of 100 resistors of 100 different ohmages. You calculate the mean resistance of the 100 resistors. You calculate the standard deviation of the set of resistances about the mean resistance. You conclude the standard deviation represents the variation of the 100 resistances about the mean resistance, but are told that, “those conclusions are completely unsupported by the math.“

This is an OK example, except that we know resistors have a bimodal distribution, owing to the subsampling of higher-accuracy components from the center of the distribution. Assuming, though, that we are testing a truly normal distribution, how accurately would you know the mean?

The standard deviation in this case would be the likelihood of an individual measurement deviating x ohms from the mean. By the central limit theorem we know that the mean would be known to within m·sigma/sqrt(n), where m is the number of standard deviations you wish to establish as your confidence level. Your paper’s analogous result indicates the likelihood of an individual resistor value deviating from the mean.
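This distinction between the spread of individual values and the uncertainty of the mean can be sketched in a few lines (the batch mean of 100 ohm and spread of 5 ohm are invented for illustration):

```python
import math
import random

# Illustrative numbers only: a batch with true mean 100 ohm, spread 5 ohm.
random.seed(0)
true_mean, sigma, n = 100.0, 5.0, 10_000

readings = [random.gauss(true_mean, sigma) for _ in range(n)]
sample_mean = sum(readings) / n

# Spread of individual values: stays near sigma however large n gets.
sample_sd = math.sqrt(sum((r - sample_mean) ** 2 for r in readings) / (n - 1))

# Standard error of the mean: sigma / sqrt(n), far smaller.
se_mean = sample_sd / math.sqrt(n)

print(f"individual s.d. ~ {sample_sd:.2f} ohm, s.e. of mean ~ {se_mean:.3f} ohm")
```

The spread of individual readings stays near 5 ohm no matter how many are taken, while the uncertainty of the mean shrinks with sqrt(n).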

As to the last bit in quotes, this really bothers me. You need to understand that knowledge of ‘trend’ is not the same as knowledge of a single value. Your work only claims to establish the knowledge of a single value so that ‘trend’ conclusions are unsupported and not even remotely addressed by the math.

Pat Frank said

Jeff, you wrote, “Your paper’s analogous result indicates the likelihood of an individual resistor value deviating from the mean.”

No, it does not. Somehow you and Lucia have decided that the error bars represent the standard deviations about a temperature mean.

They do not.

They represent the standard deviation about the error mean.

Most of that error is the systematic error recorded by Hubbard and Lin, 2002 (paper 1, ref. 13), as the difference between the temperature measured by a test sensor and the same temperature simultaneously measured by a high-accuracy reference sensor.

I’ve been very clear about this. It is described in detail in paper 1, Section 3.2.2.

You and Lucia are not correct. The uncertainty bars do not represent what you claim they do.

And just to reiterate: Case 2, as such, was not applied to any of the variances calculated in either of my papers. Nor was Case 2, as such, applied to any of the standard deviations appearing in the head post figures.

Lucia’s focus on Case 2 is entirely irrelevant to any of the calculated variances.

Jeff, you also wrote, “Your work only claims to establish the knowledge of a single value so that ‘trend’ conclusions are unsupported and not even remotely addressed by the math.”

No, it doesn’t. My work claims to estimate a lower limit of systematic error in any and every given global average annual anomaly anywhere along the 160 year time series. Paper 1, page 981ff: “The meaning of an ideal lower limit of measurement uncertainty provides that it is of lower magnitude than the uncertainty in each and all of the other homologous single-station measurements, worldwide.”

Global application of the lower limit estimate is addressed by the math on page 982, equation (11)ff. The uncertainty bars represent the standard deviation of the estimated lower limit of systematic measurement error resident in every single global average annual anomaly.

The systematic error bars are relevant to every single global average anomaly in the trend, and constitute a band of globalized measurement uncertainty about the trend. The bars represent the lower limit of systematic error in the accuracy of the anomalies making up the trend.

No one knows those numbers better than the systematic measurement error allows. You’re arguing that the trend can be known more accurately than the accuracy limit of the anomalies of which it is constituted.

Pat Frank said

In order to make the resistor example relevant to the systematic measurement errors calculated in paper 1, the engineer would have to use two Ohmmeters and take two sets of resistance measurements on 100 resistors of different Ohmage.

His/her goal is to calibrate one of the two Ohmmeters. One Ohmmeter, the standard Ohmmeter, is a high-accuracy meter. Its measurements represent the “true” resistance standard. The second Ohmmeter, the test Ohmmeter of which s/he wants to know the accuracy, produces another set of resistance measurements, the test measurements, from the identical set of resistors. Multiple measurements of each resistor reduces random error to low levels in each final resistance measurement. Any remaining significant error is systematic error.

These 100 difference-resistance values, delta_Ohm(s), represent the systematic measurement errors of the test Ohmmeter. The errors are binned into frequencies showing how often errors of a certain magnitude occur in each unit of delta_Ohm. A plot of the binned errors yields a distribution of the frequency of error across the magnitude of error.

The distribution yields a mean error and an error standard deviation. The mean and the standard deviation represent a measure of the accuracy of the test Ohmmeter.
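The two-Ohmmeter calibration described here can be simulated in a few lines (the +0.3 ohm bias and noise levels are invented; the point is only the procedure):

```python
import math
import random

# Invented numbers: the test meter carries a +0.3 ohm bias; both meters
# have small random noise that repeat measurements average away.
random.seed(1)
true_R = [10.0 * (i + 1) for i in range(100)]  # 100 different ohmages

def measure(R, bias, noise_sd=0.05, repeats=200):
    """Mean of many repeats, so random error is driven to low levels."""
    return sum(R + bias + random.gauss(0.0, noise_sd)
               for _ in range(repeats)) / repeats

standard = [measure(R, bias=0.0) for R in true_R]  # high-accuracy meter
test     = [measure(R, bias=0.3) for R in true_R]  # meter being calibrated

delta_ohm = [t - s for t, s in zip(test, standard)]
mean_err = sum(delta_ohm) / len(delta_ohm)
sd_err = math.sqrt(sum((d - mean_err) ** 2 for d in delta_ohm) / 99)

print(f"mean systematic error ~ {mean_err:+.3f} ohm, width ~ {sd_err:.4f} ohm")
```

The recovered mean error lands on the built-in bias, and the residual width reflects only what the repeats failed to average away.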

If the test Ohmmeter was used to measure resistances in the engineer’s lab, those resistances will not be known to better accuracy than the systematic error of the Ohmmeter. Any trend of lab resistances measured by that test Ohmmeter, e.g., resistance change with temperature, will not be known to better accuracy than the systematic error of the Ohmmeter.

That’s the experiment Hubbard and Lin did with air temperature sensors. They tested well-sited, well-maintained sensors against a high-accuracy standard sensor, and collected tens of thousands of simultaneous air temperature measurements across months of experimental time. They found the systematic temperature errors produced by the test sensors and plotted their frequency of error magnitudes.

The result is a mean systematic error and a standard deviation of that error for each test sensor. The experimental mean systematic error and the standard deviation of that error represent the systematic error of that sensor and indicate the accuracy of the temperatures that sensor will measure under ideal field conditions.

An error measured under ideal field conditions represents a lower limit of systematic error, and that is the error I propagated through the calculation of a global average air temperature anomaly. The propagated systematic error is what is plotted on the trend of global average air temperature anomalies in Figure 3 of paper 1, which is also Figure 1 of the head post.

Pat Frank said

Jeff, I didn’t try to bother you by quoting you in #185. I was just responding to your comment that, “I do work in engineering and I have to say that there is no such thing as magnitude uncertainty according to your definitions.”

And yet my consistent definition of magnitude uncertainty, the one to which you were responding, is the standard deviation about the mean of observables that differ inherently in magnitude. This is exactly what you then claimed, as an engineer, doesn’t exist.

So, the resistor example just demonstrated magnitude uncertainty, as I have always defined it, and that you claimed didn’t exist.

Pat, I get what you are saying about systematic error. If you have a bunch of sensors with the same bias then you don’t get a 1/sqrt(n) reduction in your uncertainty by adding more of the same sensor.

In that case of biased sensors you’d average over the unknown bias predictors (wind speed, pressure, insolation) to get an uncertainty distribution for the mean (as you point out, this distribution could be sharpened if you use knowledge of those predictors). Here’s a question though (it’s not rhetorical, I really want you to answer): if the predictors (weather) varied (in space) in an uncorrelated way at each station, would you expect the systematic error to reduce with the normally expected 1/sqrt(n) behavior?

I think if the unknown bias predictors varied in a perfectly correlated way then you could have an infinite number of sensors and not have any reduction in the uncertainty of the mean (this is the result in your paper right?), and if with some less than perfect correlation, then each additional sensor buys you something less than [1/sqrt(n)-1/sqrt(n+1)]. This seems like a pretty standard result. I think it’s your claims about trends in time that are nonstandard.
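This correlated-vs-uncorrelated behavior is easy to check by Monte Carlo (all numbers invented: per-station bias s.d. of 0.5 C, 100 stations):

```python
import random

# Illustrative Monte Carlo: per-station bias s.d. 0.5 C, 100 stations.
random.seed(2)
TRIALS, N, BIAS_SD = 2000, 100, 0.5

def mean_abs_error_of_network(correlated):
    errs = []
    for _ in range(TRIALS):
        if correlated:
            b = random.gauss(0.0, BIAS_SD)
            biases = [b] * N                 # one shared bias for all stations
        else:
            biases = [random.gauss(0.0, BIAS_SD) for _ in range(N)]
        errs.append(abs(sum(biases) / N))   # error of the network mean
    return sum(errs) / TRIALS

uncorr = mean_abs_error_of_network(False)
corr = mean_abs_error_of_network(True)
print(f"uncorrelated biases: {uncorr:.3f} C; perfectly correlated: {corr:.3f} C")
```

With independent station biases the network-mean error shrinks roughly as 1/sqrt(N); with a perfectly shared bias, adding stations buys nothing.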

We are not getting anywhere. The errors in the uncertainties from the early equations have been pointed out; now we can move on to 3.2.2.

Every value you give in the paper is calculated one month at a time. This does NOT equate to knowledge of trend, which is a different statistical test. You ask if knowledge of the trend can exceed knowledge of systematic error in a single measurement; the answer is yes. I would also point out again that you make an assumption in your equations that the combination of systematic errors does not follow the central limit theorem. This is flatly contradicted by Figures 1 and 2 of Section 3.2.2.

“Any trend of lab resistances measured by that test Ohmmeter, e.g., resistance change with temperature, will not be known to better accuracy than the systematic error of the Ohmmeter.”

If the systematic error were of known normal distribution, based on known factors such as wind, sun loading, etc. on your meter, you certainly could measure the resistance multiple times and get a mean to greater accuracy than the known systematic error of the ohmmeter. The problem here is that your definition of systematic is interpreted in your equations as an unrealistic worst case. This is demonstrably false, as shown in Figs. 1 and 2.

Again from your own work :

All these systematic errors, including the microclimatic effects, vary erratically in time and space [40-45], and can impose nonstationary and unpredictable biases and errors in sensor temperature measurements and data sets.

And more importantly:

found to originate principally from solar radiation loading and wind speed effects

So I will point out here that any ‘systematic’ errors originating primarily from sun/wind will not create long-term trends at sensor stations. They cannot, as the systematic biasing both increases and decreases over daily timescales. Over periods of 30 years (or likely quite a bit less), the systematic nature of error in a daily signal becomes moot. A systematic error with a high frequency and normal distribution will result in an ever more accurate determination of the mean over time. Also, I have to point out that Zeke’s method takes the sun and wind loading into account.
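The contrast drawn here is between an error that swings both ways on daily timescales and one that truly never averages out. A minimal sketch (the 0.3 C magnitudes are invented):

```python
import random

# Illustrative only: a zero-mean "weather" error vs. a constant 0.3 C bias.
random.seed(3)
n_days = 30 * 365

# Error that swings both ways on daily timescales (wind/sun variability).
weather_err = [random.gauss(0.0, 0.3) for _ in range(n_days)]
mean_weather_err = sum(weather_err) / n_days   # shrinks toward zero

# A genuinely constant bias survives any amount of averaging.
constant_bias = 0.3

print(f"30-yr mean of the varying error: {mean_weather_err:+.4f} C")
print(f"30-yr mean of the constant bias: {constant_bias:+.4f} C")
```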

No. Systematic measurement errors (i.e. biases) are those that do not average out over repeat measurements. So, the difference between the mean over the 100 measurements from the single ohmmeter and the measurement from the calibration source is an estimate of the systematic error for that ohmmeter.

The standard deviation of the 100 measurement errors tells us nothing about the magnitude of the systematic error. It tells us something about the magnitude of the random error from that Ohm-meter.

The errors are binned into frequencies showing how often errors of a certain magnitude occur in each unit of delta_Ohm. A plot of the binned errors yields a distribution of the frequency of error across the magnitude of error.

The histogram of errors from a single ohmmeter provides the distribution of errors for single measurements made with that particular ohmmeter. But the spread of these errors isn’t the systematic error (i.e. bias) for this ohmmeter, because averaging over multiple measurements by the same ohmmeter will reduce the magnitude of the random error at a rate proportional to 1/sqrt(N). With enough measurements, the random component of the error can be driven down and only the systematic error remains. But it’s important to know that the spread arising from the random component of the measurement error is not the “systematic error”. The fact that there is a spread demonstrates that this error is not the systematic error!

The systematic error is estimated by finding the average error over all measurements in your histogram. This can be estimated in a calibration experiment by comparing the average measurement from repeat samples with an ohmmeter to the calibration standard. But the spread in individual measurements does not give the systematic error (i.e. bias).
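The distinction being drawn here, spread of single-measurement errors versus the average error that survives repeats, can be sketched directly (bias of 0.25 and noise s.d. of 1.0 are invented):

```python
import math
import random

# Invented numbers: each single measurement's error is a fixed 0.25 bias
# plus random noise with s.d. 1.0.
random.seed(4)
BIAS, NOISE_SD, N = 0.25, 1.0, 50_000

errors = [BIAS + random.gauss(0.0, NOISE_SD) for _ in range(N)]

mean_err = sum(errors) / N   # the average error: estimates the bias
spread = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (N - 1))

print(f"spread of single-measurement errors ~ {spread:.3f}  (random part)")
print(f"mean error over {N} repeats ~ {mean_err:.3f}  (the systematic bias)")
```

The spread stays near the noise s.d. regardless of N, while the mean error converges on the bias: the spread and the bias are different quantities.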

Wouldn’t you agree that thermometer systematic error by wind and sun can have a relatively high frequency and therefore negligible effect relative to the 30 yr signal mean? It is still non-random, yet has a demonstrably systematic effect that won’t affect trend appreciably.

First: I agree that thermometer error by wind and sun can have relatively high frequency and therefore negligible effect relative to the 30 yr signal mean. So these have negligible effect on determination of the trend. (Note: I took out the adjective “systematic”, which Pat introduced and which is causing problems. I have done this because it is the specific properties of the data, e.g. its high frequency, and not the adjective Pat wants to apply, that dictate whether the error will be negligible.)

Now on to the semantics: The difference is that I’m not sure I agree that I would call these errors systematic in the context of the measurement process for monthly temperature anomalies. I’ll explain this a bit.

First: The reason I limited my focus in 195 to Pat’s single-ohmmeter measurement was to discuss systematic error in a very simple case. Note that in that very simple case, Pat mis-identified ‘systematic error’. Now I’m going to go on to the more complicated one and discuss why I think he is mis-identifying what ought to be called ‘systematic error’ here too. (And once he labels it wrong, he then applies math that might make sense if the label were correctly applied, but makes no sense for a set of measurements with the properties of the time series at hand. You are correctly identifying the treatment.)

On to ‘systematic vs. not systematic’.

In the context of repeat measurements of temperature under constant wind/sun etc. conditions, the mean of multiple measurements from a single thermometer in a single installation may contain some residual error. This error can be viewed as systematic. That is: A portion of the error could easily be a function of those specific wind/sun etc. conditions.

A remaining portion could be mis-calibration of the thermometer. We can call that latter portion fundamental systematic error; it could be measured under ‘perfect’ conditions in a laboratory. (I can’t think of a good adjective for the former, which arises from non-ideal installation in the field. Maybe “installed parametric error”?) Both contributions to error are deterministic from some point of view, but your question pertains to the former, which I think in any case is the error Pat is suggesting is large.

Viewing the error due to wind/sun conditions as systematic would be the correct view if someone was doing a heat transfer study with measurements taken over the course of 15 minutes on a particular day. From this view, any error that arises at the present temperature, insolation etc. would be systematic– it would not average out over the collection of temperatures taken during that 15 minutes. If that’s the only data you have and you can’t average over other data, then you would like to know or estimate how large this systematic error might be.

So, in a sense, this error arising due to sun/wind conditions is systematic. You would want to know how this affected your Nusselt number v. Reynolds or Rayleigh number correlation etc and you need to treat that as systematic to the data point you collected.

If one repeated the experiment the next day, or at a different site, or under slightly different wind conditions and possibly different insolation, that set of data would have an error that could not be averaged out of that data used on its own. Once again, you have systematic error in the determination of the temperature which cannot be averaged out over that day. If you could vary wind conditions, you could begin to assess the magnitude of systematic error in the individual data points.

But now let’s change the experiment. Suppose your experiment involved holding something at a set temperature (which you also measure) and you can vary other conditions intentionally. That is: instead of having a set wind speed or insolation level, you shade the test section, vary the wind, etc. You now take 100 measurements, but each is at a different wind speed, insolation, etc. Moreover, you aren’t actually studying the effect of wind speed, insolation, etc. on the temperature. You just permit these to vary randomly. You just measure temperature 100 times.

You now have 100 temperature measurements, each with some errors. But notice that from the point of view of this experiment, quite a bit of the ‘installed systematic error’ due to wind speed, insolation, etc. is effectively random. It is true that these errors are parametrically deterministic, since the error is a function of wind speed etc. But over the course of the experiment, these factors varied randomly. Some will result in readings that are too high; some too low.

So, the error due to these factors varies, and if you average over multiple measurements, you will reduce the error. What you end up with is that the systematic error for this thermometer installation is not equal to the spread of errors due to wind/insolation etc. It is smaller. To estimate its magnitude, you have to do a calibration experiment in situ with side-by-side temperature readings, and compare the readings from the experimental test rig to the system you believe to be “true”.
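This scenario, an error that is deterministic in wind yet effectively random over the experiment, can be sketched with a purely hypothetical error model (err = 0.5 − 0.1·wind is invented for illustration):

```python
import random

# Hypothetical error model: reading error = 0.5 - 0.1 * wind_speed, i.e.
# deterministic in wind (calm reads high, windy reads low); wind itself
# varies randomly from reading to reading.
random.seed(5)
TRUE_T = 20.0

def reading(wind):
    return TRUE_T + (0.5 - 0.1 * wind)

winds = [random.uniform(0.0, 10.0) for _ in range(100)]
temps = [reading(w) for w in winds]

worst_single = max(abs(t - TRUE_T) for t in temps)
mean_err = sum(temps) / len(temps) - TRUE_T

print(f"worst single-reading error: {worst_single:.3f} C")
print(f"error of the 100-reading mean: {mean_err:+.3f} C")
```

Individual readings can be off by up to half a degree, but because the wind-driven errors go both ways, the error of the mean is much smaller than the spread.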

Now, supposing that your goal is to measure the monthly mean temperature using your test rig, you would compare a series of monthly means to the monthly means from the calibration setup and find out if there is a bias in the mean of the monthly means from the test rig. That’s what I call the systematic error for this experiment. And I think I am using the term correctly in the context of the time series of monthly means.

Notice that in this context, what I call systematic error is unlikely to have high-frequency components. If there is no drift, the systematic error will be constant (or at worst a function of time of year, like “July”). In both these cases it would be taken care of by the anomaly method.

If the ‘systematic’ error has a slow drift, that can introduce a problem with determining a trend. But note this is not the error Pat Frank is discussing. Moreover, everyone is well aware of the problems of drift, which when measuring monthly temperature averages can be due to changes in the conditions of or around the Stevenson screens.

Finally: If we call the daily errors due to different wind conditions “systematic error”, then, of course, these errors cancel out when determining the trend. The reason is that those errors have some random distribution. Even if the probability distribution function is unknown and totally weird, the random component of this error will average out over many measurements, and so, if you have many, they will not affect determination of the trend.
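The trend point can be made concrete with ordinary least squares (all numbers invented): a constant offset shifts every point equally and leaves the slope exactly unchanged, while a randomly varying error perturbs the slope only slightly when there are many points.

```python
import random

# Illustrative: 1200 "monthly" values on a true slope of 0.01 per step.
random.seed(6)
N, TRUE_SLOPE = 1200, 0.01

def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

xs = list(range(N))
clean  = [TRUE_SLOPE * x for x in xs]
biased = [y + 0.46 for y in clean]                    # constant offset
noisy  = [y + random.gauss(0.0, 0.3) for y in clean]  # randomly varying error

print(ols_slope(xs, clean), ols_slope(xs, biased), ols_slope(xs, noisy))
```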

Normally, this sort of random error is not called “systematic”. Pat’s labeling it “systematic” (possibly because it could be seen as systematic in a different sort of experiment) does not cause this sort of randomly varying error to magically assume the properties of a bias error when, in fact, it is a random error.

So: The short answer is “I agree with you these have no effect on the trend”. The longer answer involves quibbling about Pat mis-labeling these errors “systematic errors” and then afterwards arguing that, based on the incorrect label he applied, the random component of these errors somehow stops acting like random error and must be treated as bias error. Which it is not, in the context of the statistical treatment we apply when trying to determine a long-term trend over many decades.

Pat Frank said

#195, Lucia, you wrote, “Pat Frank, … Part of your difficulty is that you seem to mis-classify random errors as systematic errors. … No. Systematic measurement errors (i.e. biases) are those that do not average out over repeat measurements.”

In reading my #191, you appear to have missed this: “Multiple measurements of each resistor reduces random error to low levels in each final resistance measurement. Any remaining significant error is systematic error.”

That obviates your diagnosis and your analysis but reiterates a penchant, noted elsewhere, for careless reading.

In one country at least, the daily temperatures can be affected by ‘subjective adjustment’. One possible form is to delete an anomalously high day in the midst of a mob of regular days and report it as missing data. Now, when the adjusters come along, they might easily fill in a value, for ease of calculation, that is close to those around it. This is an uncertainty, and it adds to the magnitude of uncertainty. It is conceivable that it is a bias in one direction, so it won’t even out through the randomness of repeated examples or the (sometimes) wrong use of the law of large numbers or the central limit theorem.
A few years ago I looked at infilling by taking a similar month from another year nearby and just doing a cut and paste. (Sorry about the table – I’m part way between 32 bit XP and 64 bit Windows 7 and not all transforms are ready yet). As I wrote in 2008

”An example illustrates some infilling difficulties. It was chosen from the Broome data because the year 1959 seemed anomalously hot. Reasoning that this might be from an extended summer into autumn, maximum daily temperatures for the month of May 1961 (close to an average year) were subtracted in turn from those of May 1959, the hot year. The following table shows the extremes of departures that are within the data, in this case without looking for an extreme example. The range of 18 degrees Tmax difference in 18 seasonal days is daunting. If data were missing in May 1959, it would not be accurate to replace them with data from May 1961, except that the monthly average would not change a great deal.

The mean of the two months from different years is similar, so a stats calculation on a time series of months would give much the same answer for uncertainty.
However, and this is the point to be stressed, a stats analysis of a string of days would be hugely wrong, especially if you used the Tmax values with the Tmin values to get a Tmean, or if you were looking at trends.
The point is that the type of uncertainty that has to be considered in real life goes beyond a perfunctory analysis of a simplified, synthetic system. I’m using ‘uncertainty’ so you don’t confuse it with standard deviation or the like. The uncertainty that you can live with depends on the use to which it is to be put.
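The infilling point, two months can share essentially the same mean while differing wildly day by day, can be sketched with invented daily values (not the actual Broome record):

```python
import random

# Invented daily Tmax values, not the actual Broome record: the point is
# only that two series can share a mean while differing day by day.
random.seed(7)
may_a = [round(30 + random.uniform(-6, 6), 1) for _ in range(31)]
may_b = may_a[:]
random.shuffle(may_b)   # same values, rearranged: monthly mean identical

mean_diff = abs(sum(may_a) / 31 - sum(may_b) / 31)
worst_daily = max(abs(a - b) for a, b in zip(may_a, may_b))

print(f"difference in monthly means: {mean_diff:.6f} C")
print(f"largest day-by-day difference: {worst_daily:.1f} C")
```

A statistic computed on monthly means sees no problem, while any day-level analysis (Tmean from Tmax/Tmin, daily trends) would be badly wrong.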

These 100 difference-resistance values, delta_Ohm(s), represent the systematic measurement errors of the test Ohmmeter. The errors are binned into frequencies showing how often errors of a certain magnitude occur in each unit of delta_Ohm. A plot of the binned errors yields a distribution of the frequency of error across the magnitude of error.

This is simply wrong. I didn’t say you always mistake systematic errors for random ones. I only said that you do so, and you do do so in some instances. That you sometimes don’t make the mistake doesn’t obviate the need for you to stop mis-diagnosing systematic errors in the other instances.

Feel free to do all the work you like before replying. But when you do, look through what you write, and if you don’t intend to describe a particular error which is clearly not systematic, don’t describe it as systematic. Mixing blunders with correct things and then applying the blunder in the follow-on analysis is not an appropriate approach. Complaining that people didn’t notice the correct concept, which you express but later abandon when doing further work, is silly.

I am not a statistician, so I admit that I am semi-lost at times in these discussions, although I think I get the gist of most of it.

Years ago, I read about a moon that circled one of the large planets (either Saturn or Jupiter). This moon appeared to have a molten core like Earth’s, but it was at the time presumed to be dead and cold, so the scientists were shocked by the finding. Because of the molten internal mass, the scientists realized that the moon was geologically active. Due to its molten core, this moon also had a magnetic field. Scientists determined that this moon had these characteristics because of the gravity inflicted upon it by the host planet, if not by other planetary bodies as well.

It was not an expected finding.

In other words, the gravity inflicted on it created friction in this moon which, in turn, created internal heat.

Because science is all measurement, can one of you scientists explain to me whether or not something like this might occasionally happen to the Earth?

We can explain major ice ages and major warm periods throughout the earth’s history by slight changes of earth’s axis, or position of the land masses, or by position of the earth’s orbit.

But what about the minor changes like those of today or the Medieval warm period?

Is it possible that the scientists and statisticians are missing something really important?

Pat Frank said

#200, Lucia, how is it “simply wrong,” that the difference remaining between an accurate measurement standard and a test measurement represents the systematic error of the test instrument — after both measurements were repeated sufficiently to remove random error?

where the two magnitudes represent the respective means of multiply-repeated measurements of the identical observable taken under the same conditions by a test instrument and an accuracy-standard instrument, respectively,

Lucia, how is it “simply wrong,” that the difference remaining between an accurate measurement standard and a test measurement represents the systematic error of the test instrument — after both measurements were repeated sufficiently to remove random error?

Please reread your statement, which I quoted and will now requote, and afterwards engage my actual criticism. The reason I wish you to reread the statement I quoted is that my criticism is of that statement, not of the one you are trying to make me defend.

These 100 difference-resistance values, delta_Ohm(s), represent the systematic measurement errors of the test Ohmmeter. The errors are binned into frequencies showing how often errors of a certain magnitude occur in each unit of delta_Ohm. A plot of the binned errors yields a distribution of the frequency of error across the magnitude of error.

In the quote above, you don’t discuss the difference between the accurate measurement standard and a test measurement and call that the systematic error. Instead, you discuss the spread in the resistance of 100 resistors and call this spread the systematic error.

The reality is that the ‘difference-resistance’ values, delta-Ohms, from 100 different resistors, each with a different resistance, do not represent an error between some “true” resistance of some hypothetical resistor and a mismeasurement of that resistor. At least the way you wrote it, these are 100 different resistors, each of which can have a different resistance from each other and from some reference. So, the spread has absolutely nothing to do with “systematic error”. It is merely the spread in resistance in a batch of 100 different resistors. It is not a systematic error.

You ought to admit that what you wrote is simply wrong. Ignoring that you wrote this and claimed it was a systematic error, when it manifestly is not a systematic error, is rather pointless on your part.

Either you didn’t mean what you wrote about the resistors, or you are just totally befuddled.

If you are going to continue this, I ask that

1) you engage my criticism instead of asking me to defend something I did not say.
2) Tell us if you really believe the spread in resistance of 100 different resistors is actually “systematic error” and
3) finally, since the answer to (2) must obviously be “no” admit you either said something you don’t mean or try to explain why it is systematic error to us all.

(If you chose the latter course on (3) bear in mind that you are simply wrong.)

Doing 1-3 can get us somewhere. But pretending I haven’t been absolutely specific that it is your application in the example that I am criticizing, when I keep quoting and requoting your example? Sorry, but that is just bizarre.

Pat Frank said

By the way, in re-reading the various posts, I noticed again your comment, Lucia, in post #139 that, “First:It is an error if you actually say that natural variability [i.e., (+/-)s — P] is an error attributed to the mean which you do in your paper. You do so repeatedly. (emphasis added)”

In contrast, nowhere, repeat nowhere, in either paper did I ever describe or represent natural variability or (+/-)s as “an error attributed to the mean.” It’s always represented as an uncertainty due to inherent and natural, i.e., physical, variability in magnitude.

Yet one more bit of careless reading, followed by the imposition of a new and wrong meaning on my text: this time that uncertainty = error. It does not.

Pat Frank said

#203 Lucia, here’s how I described the calibration experiment in my #191: First sentence: “In order to make the resistor example relevant to the systematic measurement errors calculated in paper 1, the engineer would have to use two Ohmmeters and take two sets of resistance measurements on 100 resistors of different Ohmage.”

The second paragraph sets up the experiment: “One Ohmmeter, the standard Ohmmeter, is a high-accuracy meter. Its measurements represent the “true” resistance standard. The second Ohmmeter, the test Ohmmeter of which s/he wants to know the accuracy, produces another set of resistance measurements, the test measurements, from the identical set of resistors.”

Let’s divide that into steps to make it completely clear.

We have:

1) 100 resistors known to be of different Ohmage

2) High accuracy Ohmmeter: accurate measurements of the 100 resistors = O_a_i, where O_a_i are the accurate Ohm readings and i = 1 to 100 representing the resistors in item 1).

The description goes on: “Multiple measurements of each resistor reduces random error to low levels in each final resistance measurement. Any remaining significant error is systematic error.”

So, 4), multiple repeat measurements of each resistor using each Ohmmeter reduce random error to low levels. Each final O_a_i and O_t_i represent the mean value of multiple measurements of each of the 100 resistors and, in each O_a_i and O_t_i mean resistance value, the random error is small.

In the remaining description, the 100 delta-Ohm values are magnitude-binned and the frequencies of the binned errors are plotted versus the magnitudes of the errors. The error mean is the average systematic measurement bias of the test Ohmmeter and the standard deviation yields the statistical width of the systematic error.
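The procedure described above can be sketched numerically. This is only an illustration of the described steps: the 0.5% test-meter bias, the noise level, and the resistance range below are hypothetical numbers I have assumed, not values from either paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 resistors of different (hypothetical) true resistance, in Ohms
true_r = rng.uniform(100.0, 10_000.0, size=100)

def measure(true_vals, bias_frac, noise_frac, n_repeats=30):
    """Average n_repeats readings per resistor; averaging shrinks random error."""
    noise = noise_frac * true_vals * rng.standard_normal((n_repeats, true_vals.size))
    reads = true_vals * (1.0 + bias_frac) + noise
    return reads.mean(axis=0)

# Standard meter: negligible bias. Test meter: assumed +0.5% systematic bias.
o_a = measure(true_r, bias_frac=0.0, noise_frac=0.001)    # O_a_i
o_t = measure(true_r, bias_frac=0.005, noise_frac=0.001)  # O_t_i

delta_ohm = o_t - o_a          # the 100 difference-resistances
bias = delta_ohm.mean()        # average systematic bias of the test meter
width = delta_ohm.std(ddof=1)  # statistical width of the systematic error

# the magnitude-binned frequencies described in the text
counts, edges = np.histogram(delta_ohm, bins=10)
```

In this sketch `bias` recovers the assumed test-meter offset because the weather-free "true" values are subtracted away in `delta_ohm`, leaving only the difference between the two meters.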

Now let’s see how you described the experiment. You wrote that, “In the quote above, you don’t discuss the difference between the accurate measurement standard and a test measurement and call that the systematic error. Instead, you discuss the spread in the resistance of 100 resistors and call this spread the systematic error.”

First: my experiment isn’t fully described in the quote you provided.
Second: your description of the experiment requires that we remove item 3), item 5), and item 6) from the experiment the way I described it in #191.
Third: even the part you did quote includes this: “the engineer calculates the measured (test resistance minus accuracy standard resistance) and gets delta_Ohm(s); 100 difference-resistances. (bolding added)”

And immediately after quoting that, you wrote, “In the quote above, you don’t discuss the difference between the accurate measurement standard and a test measurement and call that the systematic error. (bolding added)”

Compare the statements, Lucia: what you claim I didn’t discuss is exactly what I did discuss.

And then you wrote that the experiment I described “… is merely the spread in resistance in a batch of 100 different resistors. It is not a systematic error.”, when in fact the experiment is not merely a spread of resistances but is the difference between a calibration standard resistance and a test resistance. It is a systematic error.

Pat Frank said

Just an observation to add to #205: Lucia, you’ve made the same mistake interpreting the experiment described in #191 as you’ve made previously and insistently about the systematic measurement error bars in Table 1 and Figure 1 (and propagated into Figure 3) in paper 1.

You have taken the standard deviation around the mean of systematic error — i.e., the standard deviation around the low-random-noise mean of the difference measurements between an accuracy standard and a test sensor — and mistakenly supposed it to be merely the standard deviation of the mean of a set of measurements.

That this is your supposition is shown in your post #100, for example, where you wrote that, “… in ascertaining how well we know the true average of the multiple values, you cannot include the variance of the ‘real’ temperature. … As I read it, Pat’s paper implies that this variance, which exists in nature, is in fact error in accurate knowledge of temperature when really it is the variance in the correct knowledge of temperature.”

Lucia has the same supposition, as is shown in the very next post #101, where she agrees with you: “Precisely. Mind you: The variance in the weather can be interesting…”

However, paper #1 Figure 1 does not show the standard deviation around a measured temperature mean. It shows the standard deviation around the systematic error mean — the mean of the systematic error of temperature sensors obtained by a careful calibration experiment. The systematic error is the difference — the delta_C — between the air temperature measured by a test sensor and the identical air temperature measured by the accuracy-standard sensor.

“The systematic measurement errors originating from the field exposure of the Min-Max Temperature System (MMTS), Automated Surface Observing System (ASOS), the Gill shield, and other commonly used electronic temperature sensors and shields have been investigated in excellent detail by Lin and Hubbard [35] and found to originate principally from solar radiation loading and wind speed effects.
…
“Under ideal site conditions Hubbard and Lin recorded thousands of air temperatures using MMTS, ASOS, Gill, and other sensors and shields [13, 35, 42], and compared them to temperatures simultaneously measured using a calibrated high resolution R. M. Young temperature probe with an aspirated shield. For each recorded temperature, the measurement rate was 6 min.^-1 integrated across 5 min., reducing the random noise in each aggregated temperature measurement by 1/sqrt(30). The temperature data thus consisted primarily of a bias and a resolution width relative to the “true” temperature provided by the R. M. Young probe. Figure 1 shows…”

The errors reported in paper #1 Figure 1 (and Figure 3) obviously do not represent the standard deviation of weather noise about a mean of measured temperature.

They represent the standard deviation of the difference-errors between a test temperature measurement and an accurate calibration standard measurement of the identical temperature. The units are delta C, and all weather noise has been subtracted away as the difference between two simultaneous air temperature measurements — test minus standard.

In paper #1:

Table 1, Figure 1, and Figure 3 do not show standard deviations of weather noise.

Table 1, Figure 1, and Figure 3 show standard deviations around the mean of systematic error.

For those unfamiliar with the term, the mean “bias temperature” represents how far off the measured temperature is from the correct temperature, on average.

The standard deviation around that bias temperature represents the statistical width of the systematic error envelope.

The bias temperatures and systematic error standard deviations are reported in paper #1 Table 1. The reported frequencies of the systematic errors for the three test sensors are plotted in Figure 1, and those for the MMTS sensor are propagated into Figure 3.

Jeff, from the very start your criticisms and Lucia’s have rested on a mistaken supposition that I merely calculated weather noise in my paper and represented the weather noise variance as measurement error. Your supposition, and Lucia’s, is wrong. I presented the systematic error — bias temperature standard deviations — derived from precise calibration experiments.

As a consequence, your criticisms and Lucia’s have been misguided right from the beginning. Your criticisms have concerned an imagined calculation that is nowhere in paper #1.

Therefore, your comment in post #194, that “The errors in the early equation uncertainties have been pointed out,…” is also wrong because your criticisms have never concerned themselves with what actually appears in paper #1.

I have pointed out that the early equations are there only to establish a statistical context for the assessment of Folland’s estimated average read-error in Section 3.1, and not just once but over and over again. And again. And again, and …. But this never seems to make an impression. You just go on misinterpreting what is very clear.

In fact, Jeff, it’s not until your post #194 that you finally address something that actually is in the paper, namely systematic sensor error.

In the remaining description, the 100 delta-Ohm values are magnitude-binned and the frequencies of the binned errors are plotted versus the magnitudes of the errors. The error mean is the average systematic measurement bias of the test Ohmmeter and the standard deviation yields the statistical width of the systematic error.

What you write is still wrong, or, if you don’t like the word “wrong,” it is merely totally inept. You suggest a step of “magnitude-binned and the frequencies of the binned errors are plotted versus the magnitudes of the errors.” A more compact way to say this is “create a histogram.”

Then you suggest “the standard deviation yields the statistical width of the systematic error”. But this would be an idiotic conclusion if you actually collected data from 100 resistors.

First: If all 100 resistors had the same resistance, getting a distribution would tell you that you did not take a sufficient number of tests to reduce the random error to zero. So, in this case, the spread is not the systematic error. (But I think this is not the case you mean to discuss.)

Second: If the 100 resistors have different resistances, that spread is still not the “statistical width of the systematic error,” or it isn’t unless you lack sufficient insight into systematic error and, consequently, inflate what you interpret as ‘systematic error’ by blinding yourself to standard ways of using 100 resistors to appropriately diagnose systematic error (or do a calibration). (For that matter, the “error mean is the average systematic measurement bias” of the 100 resistors is a pretty bumbleheaded way to diagnose systematic error.)

Assuming you have truly done enough repeat measurements with the low accuracy ohmmeter and done them appropriately (so as to avoid the first issue), the systematic error at a given resistance can be determined using 1, not 100, resistors. With one resistor, you now have the systematic error “e1” at a particular resistance “om1”: (e1, om1). e1 is the systematic error (if you’ve done this right.)

That’s it. You now know the systematic error of that uncalibrated (or imperfectly calibrated) ohmmeter at that resistance.

Because the systematic error lies with that ohmmeter, at that resistance using 100 resistors provides no benefit over using one. (Or, the only benefit it might provide is to show you that you screwed up and didn’t take enough measurements or didn’t vary conditions sufficiently to actually get the systematic error. That is: if you used 100 identical resistors you might discover that what you thought was systematic error was random error.)

So: You get the systematic error of the ohmmeter at that resistance by taking many, many, many measurements (appropriately varying conditions) using that 1 resistor. The difference between this mean and the true value is the systematic error for the ohmmeter, at that resistance. There is no histogram involved. There is no “spread over 100 resistors” involved.

But of course, you can learn something by doing a test with 100 resistors. In fact, you might learn something that permits you to reduce the effective systematic error when this ohmmeter is used. You could calibrate the ohmmeter.

If you suspect or worry that the systematic error is a function of resistance, then you might want to test using a range of resistances. So, you might use 100 resistors. But you’d have to be uninformed to then bunch up all the data into one slug, decree the difference in the means to be the systematic error, and declare of the spread in these errors that “the error mean is the average systematic measurement bias” and “the standard deviation yields the statistical width of the systematic error.”

You can then construct 100 pairs of (ei, omi). You can then plot (ei, omi). Every single point in this series is the systematic error as a function of resistance. If you are untrained, you might look at the plot and say “duh”. Or, you might say, “Hey, let me forget this, create a histogram and call it a systematic error, even though only untrained chimps would do this.”

But the fact that a person who doesn’t use information readily at hand to get a good estimate of the systematic error might do something stupid doesn’t turn the variance of these errors into “systematic error” of the ohmmeter. If the systematic error for the ohmmeter is a function of resistance, then the systematic error for the ohmmeter is each value at each resistance. That is: it varies with resistance. You don’t get the “correct” value by averaging. If it’s not a function of resistance, the systematic error will be found to be a constant value.

But a more common thing to do with data from 100 resistors (i.e. something other than the measurement device whose systematic error you are trying to determine) is called “a calibration”. In this case, you would create pairs of (om-t-i, omi), where om-t-i is the average from multiple measurements of the test resistors, plot that, and find a curve fit through the pairs. Most commonly, one fits a line through the data, though other choices are possible if they make sense.

That is, you can create “a calibration” for that one test ohmmeter.

If you use this calibration to measure the resistance, then

* the distance between the fit and the “correct fit” (which you don’t know) is the systematic error at a particular resistance. By using an infinite number of resistors, you can generally reduce the magnitude of the error in the calibration, but you can’t know this value for that ohmmeter.

* If you make the assumption that the “true fit” is linear and you use a linear calibration, then your estimate of the systematic error will (a) be a function of resistance and (b) depend on the uncertainty estimate of the fitting parameters for your calibration. The values required to do this will be spit out by “LINEST”. But these are estimates of the systematic error based on an assumption.

* You can create a histogram of the residuals to the fit (i.e. errors in the resistance). The variance in the residuals to the fit will generally be taken as a symptom of random, not systematic, error that remains because you didn’t really do an infinite number of measurements, and the calibration standard isn’t really perfect. Alternatively,
* the residuals might suggest that the linear calibration doesn’t work, and that a portion of the residuals is systematic error, so
* these residuals, which will be much smaller than the spread you would use to estimate the standard error, will represent the upper bound for the systematic error in the calibrated ohmmeter.
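This calibration-and-residuals procedure can be illustrated with a toy example. The gain (+0.5%) and offset (+2 Ohm) errors of the test meter and the noise level below are assumed numbers for the sketch, not measured values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Standard-meter values (taken as truth) for 100 resistors, in Ohms
om_std = np.linspace(100.0, 10_000.0, 100)

# Test-meter averages with an assumed gain + offset systematic error,
# plus a little residual random noise from finite repeat measurements
om_test = 1.005 * om_std + 2.0 + 0.5 * rng.standard_normal(100)

# Calibration: fit om_std as a linear function of om_test
a, b = np.polyfit(om_test, om_std, deg=1)

# A calibrated reading corrects (most of) the systematic part
raw = 5000.0
calibrated = a * raw + b

# Residuals to the fit bound the error remaining after calibration;
# their spread is far smaller than the raw spread of (om_test - om_std)
residuals = om_std - (a * om_test + b)
```

The point of the sketch is that the fit removes the resistance-dependent systematic part, leaving only the small residual scatter, rather than lumping everything into one histogram.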

But taking resistances measured over 100 resistors (i.e. the thing measured), lumping these errors all together, and calling the spread in those errors the systematic error of the ohmmeter (i.e. the thing used to measure) is either (a) wrong or (b) sufficiently inept as to be indistinguishable from wrong.

Now, if you want to find the “typical” standard error of an ohmmeter (i.e. the device used to measure), then you can do something with steps similar to what you discussed. But the test will involve a large number of ohmmeters (i.e. measurement devices). Other details will vary depending on whether you want to find the standard error at a specific resistance, or a standard error in a measurement after calibration.

But the fact is: Your discussion of how to determine the standard error for a measurement device doesn’t work if you mean to be explaining how to find systematic error for an individual measurement device chosen to use in an application and it doesn’t work if you mean to describe the systematic error of a typical measurement device selected from a batch of measurement devices. You are confusing which things need to be varied, and how that variation is involved in determining anything one should interpret as “systematic error”.

In contrast, nowhere, repeat nowhere, in either paper did I ever describe or represent natural variability or (+/-)s as “an error attributed to the mean.” It’s always represented as an uncertainty due to inherent and natural, i.e., physical, variability in magnitude.

B.S. In paper one, you write

The mean temperature, τ̄, will have an additional uncertainty, ±s, reflecting the fact that the τ_i magnitudes are inherently different

This is attributing ‘s’ as the uncertainty in “the mean temperature”.

You then give a formula for the uncertainty, ±s, in equation 6. This ‘s’ is the variance. At a minimum, this is not “nowhere”. It is at least one place where you do this.

In fact, you propagate this use into case 3 in paper 1, use it throughout paper 1, and it carries forward into paper 2. Your application of this is such that, whether you can grasp this or not, you have introduced the natural variability into your estimate of the error in the mean.

Numerous people have tried to explain this to you. I don’t know why you don’t get this, but your very long discussions, which consistently show evidence of missing an important point, not grasping the point people are trying to get you to focus on, and rehashing points no one disputes, suggest to me this is all hopeless. Bye.

Richard T. Fowler said

Lucia, your efforts have been _greatly_ appreciated because for those of us who have not done much work with statistics, confusion tends to increase with the amount of bafflegab. I am very grateful you stuck with it this long.

Steve Fitzpatrick said

It is hopeless. It does not matter how clearly you explain it, he will NEVER understand; he does not want to understand. When I encounter someone walking down the street talking to air molecules, I just do my best to avoid contact. Maybe a good strategy in this case as well.

Pat Frank said

#208 Lucia, if one did the experiment I outlined in #191, one would get a histogram of resistance bias error vs frequency of error for the test Ohmmeter.

Admittedly, the experiment as I described it would also yield information about the resistance bias error per Ohm, also for the test Ohmmeter, over the given range of resistance of the reference resistors. But that’s not a bad thing, is it?

Apart from that, I’d like to thank you for your very muddled reply, with its piquant dash of ad hominem-brand paprika.

Entertaining as that is, Lucia, let’s leave it all aside. You’ve claimed that I’m confused and have presented “the variance in the weather” as an error variance.

Paper 1 has two Tables and three Figures. Please point out which one represents weather variance.

No such thing as “air molecules,” by the way, Steve. Air is neither a discrete compound nor a polyatomic element. Molecules of the air, yes; “air molecules,” no. Just thought you’d like to know when you’re avoiding people on the street.

Pat Frank said

In my paper, (+/-)s is always represented as an uncertainty. It is never represented as an error.

Even the quote you use as your primo example of error actually uses the word uncertainty.

I’ll reiterate my point yet once again:

nowhere, repeat nowhere, in either paper did I ever describe or represent natural variability or (+/-)s as “an error attributed to the mean.” It’s always represented as an uncertainty due to inherent and natural, i.e., physical, variability in magnitude

except this time I’ve emphasized the important words for you, so that they’re no longer overlooked or misunderstood.

Just to be sure: error, no; uncertainty, yes. Throughout the papers. No exceptions.

Notice that Case 2 and Case 3 specify the statistical condition that the data points have inherently different magnitudes. Guess what an inherently non-zero variance imparts to a mean.

Mark T said

In contrast, nowhere, repeat nowhere, in either paper did I ever describe or represent natural variability or (+/-)s as “an error attributed to the mean.” It’s always represented as an uncertainty due to inherent and natural, i.e., physical, variability in magnitude.

emphasis mine.
lucia replied:

B.S. In paper one, you write

The mean temperature, τ̄, will have an additional uncertainty, ±s, reflecting the fact that the τ_i magnitudes are inherently different

This is attributing ‘s’ as the uncertainty in “the mean temperature”.

I call shenanigans. This comment by you, lucia, is utter nonsense. He is not stating that s is an “error attributed to the mean” as you have stated. An uncertainty is not the same thing as an error. Full knowledge (error-free) of the measurements of each of the components of a system is not necessarily sufficient for full knowledge of the true mean of those same components. It may be time-varying, or the mean may not exist at all (a Cauchy distribution, for example.)

Thanks for the comment; the Cauchy distribution is effectively what is assumed by the papers. I couldn’t find the name of the distribution I was looking for earlier: one for worst-case systematic errors in temperature sensors from wind and sun loading, which has no mean. Zeke’s Monte Carlo demonstration from actual data wouldn’t converge to narrower bands if this distribution were appropriate, yet visually normal distributions were plotted in Figures 1 and 2 of paper 1. I’ve been struggling to find a way to tell this to Pat, but I think you have done it.

Pat,

This is the problem I see in section 3.2.2 of the paper but couldn’t find the right terminology for. The frequency of the systematic error is extremely high in relation to the trend, and it is very heavily oversampled. The assumption of a Cauchy distribution is inherent in your math and, like the trend conclusions, there is no math whatsoever to back up the claim. You assumed complete lack of knowledge of this distribution as a worst-case understanding, whereas through your own plots from Hubbard and Lin it is visually apparent that this is not a Cauchy distribution we are dealing with. Again, Zeke’s method confirms this fact beyond any shadow of a doubt.

“Over long times local temperature excursions due to “weather noise” may average away as [ 1/ sqrt N] , but deterministic monthly temperature trends need not.”

Unfortunately, Pat used the word ‘trend’ when none of the math addresses trend, but the point I’m trying to make is that while it is true that monthly temperatures need not converge by 1/sqrt(n), there is strong evidence that they do converge. Figures 1 and 2 of paper 1 show that the distribution is visually very much normal. Zeke’s method does converge with more sampling. The magnitude of the defined systematic sensor error is large relative to Zeke’s error bars and of high frequency, so it is obviously not a Cauchy distribution we are dealing with. Thanks again Mark, I think you may have pushed the discussion forward.

An uncertainty is not the same thing as an error.
Right, measurement error is different than unexplained variation. Your instrumentation gives you the one, your model choices give you the other. Even when you can calculate the mean of a time varying signal with zero error, you are still left with unexplained variation (residuals) about that mean. These would probably be important to consider in time series analysis, maybe they don’t matter to what CRU is trying to report. These are all interesting concepts. It seems like it would be more fruitful to do some more resampling, people seem to have an intuitive grasp of those methods/results.

I don’t know what past history some of you folks on this thread have had with Pat Frank, but the petty rudeness is disappointing.

Mark T said

Perhaps. I have not dug into the paper so specific details are out of my range. What is apparent, possibly, is that both sides are making assumptions regarding distributions that are perhaps unknown (or unknowable) without justification.

Detail is difficult to provide from my phone, but I am actually working a problem now in which variances are not adding; standard deviations are adding (which means no sqrt(N) increase in SNR). Without i.i.d., basic concepts fail. One must assume the worst case, and on this point, Pat is correct.

I understand that if you assume you know nothing, worst case is worst case. What I think would clarify it for you is Zeke’s post and figures 1 and 2 of Pat’s first paper. I’ve tried endlessly to point out that we do know the distribution of the ‘systematic’ error, and that the systematic error is, per Hubbard, of very high frequency and of a visually normal distribution. What I also have repeatedly said is that the uncertainty of a single point in time cannot create conclusions regarding knowledge of trend – you have to take into account how many single points are used in the creation of the trend.

Because the oversampling is so high, for Pat to justify his papers, he would need to define Hubbard’s systematic error distribution as a Cauchy distribution and demonstrate statistically that this is the case. No effort was made to do that analysis and evidence against that conclusion was presented in the form of figures. What’s more, the constriction of the error bars with increased sampling in Zeke’s post is flat out proof that it is not the case. Figures 1 and 2 of Pat’s first paper are strong evidence and it probably wouldn’t take a lot of stats to settle the issue of whether the distribution has a mean.
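The distinction being argued here can be checked with a toy simulation: for normal errors the spread of the sample mean narrows with more data, while for Cauchy errors it does not. The distributions below are illustrative stand-ins, not Hubbard and Lin's data.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_spread(draw, n, trials=2000):
    """Robust width (16th-84th percentile) of the sample mean over many trials."""
    means = draw(size=(trials, n)).mean(axis=1)
    return np.percentile(means, 84) - np.percentile(means, 16)

w_norm = {n: mean_spread(rng.standard_normal, n) for n in (10, 1000)}
w_cauchy = {n: mean_spread(rng.standard_cauchy, n) for n in (10, 1000)}

# Normal errors: the width of the mean shrinks roughly as 1/sqrt(N).
# Cauchy errors: the mean of N draws is Cauchy with the SAME scale,
# so more sampling would not narrow the bands at all.
```

A narrowing of resampled error bars with sample size, as in Zeke's exercise, is therefore evidence against a Cauchy-like (mean-free) error distribution.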

Mark T said

Could be, and without digging further I’m just pissing in the wind. I apologize for not adding more than I have, but deep technical analyses are difficult to muster when you look at the stuff all day. I still owe Roman some stuff regarding his filtering problem (I haven’t forgotten).

Richard T. Fowler said

I’m sorry … are we actually now arguing about whether a mean surface air temperature for Earth even _exists_??

I’m sorry but if someone thinks they need statistics (or further “digging”) to believe that Earth has a mean surface air temperature, then something has gone horribly wrong here. And I say that with the utmost “confidence” anyone can imagine.

Earth has a mean surface air temperature because there exists a volume, with reference to Earth’s center of gravity, which contains a non-zero mass and a certain quantity of heat, which volume has a thickness equal to the thickness of atmosphere required to produce a definitionally correct measurement of surface air temperature at any particular latitude and longitude. A more rigorous proof is certainly possible, but not necessary. Anything further would simply make a mockery out of the subject that is under discussion.

I’m sorry if this comment does not meet every reader’s minimum “PC” standard, but I believe it needed to be stated.

Richard T. Fowler said

The equations Pat presented mean you don’t gain any better knowledge of the mean by additional measurement. He presents them from the perspective that we don’t know anything about the nature of systematic sensor noise and expands from there. If the worst case is assumed, as you would with no knowledge, additional measurements don’t give you better knowledge of the mean. I, and others, have argued that additional measurements do converge to an improved mean. Zeke’s method, which I keep referencing, simply grabs random temperature stations and recalculates an average. If the systematic error had no mean, then Zeke’s method would result in very large uncertainty bars, as Pat has shown. Instead they converge to very narrow ones. It would mean that, from an instrumental perspective, our knowledge of global temperature in the 1900s is as good as today’s.

Pat Frank said

#215, Mark, thank-you. You’re right. Knowledge of measurement precision is necessary but not sufficient to give full knowledge of a system.

In a system of inherently variable observables, the sort of system to which (+/-)s was invariably attached in my papers, the mean will always have a variance. A description of the system purely in terms of its mean, without recognition of the variance, would be misleading.

Also, in #220 you noted that, “Without i.i.d., basic concepts fail.” This is exactly the central point. When measurement error is systematic, the assumption of i.i.d. is mistaken.

Pat Frank said

If one assumes the systematic error is i.i.d., one can immediately decrement it as 1/sqrt(N). I see two mistakes there. The first is theoretical: assuming an i.i.d. structure without any analytical justification. The second is empirical: one has decremented an error to yield a false precision.

It’s possible that a given systematic error is i.i.d. or i.i.d.-like, but this must be determined empirically; perhaps by a Monte Carlo analysis of error subsets or by the Quenouille method you mentioned earlier.

As you stated, assuming i.i.d. may well yield a less accurate error estimate.

But making that assumption in ignorance of the structure of the error is a mistake in and of itself. As an experimental scientist, I would never make that assumption. As a working engineer, it’s a safe bet that you wouldn’t either. There’s no a priori reason to expect any systematic error to be random.
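The core of this disagreement is what averaging can and cannot remove. A minimal sketch, with an assumed shared bias and noise level chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

true_t = 20.0   # hypothetical true air temperature, deg C
bias = 0.3      # assumed shared systematic sensor bias, deg C
sigma = 0.2     # per-reading random noise, deg C

err = {}
for n in (10, 100_000):
    readings = true_t + bias + sigma * rng.standard_normal(n)
    err[n] = readings.mean() - true_t

# The random component decrements as 1/sqrt(N), so the error of the
# mean converges toward the shared bias, not toward zero: averaging
# cannot remove a systematic offset common to all readings.
```

If instead each sensor drew its bias independently from a zero-mean distribution (the i.i.d. picture), the biases themselves would also average away; the two pictures differ exactly in whether the 1/sqrt(N) decrement applies.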

Your assumptions are reversed, as is your error. You assumed worst-case noise behavior when there IS analytical justification not to. That is my point for 3.2.2. Zeke’s post is proof, and figures 1 and 2 should leave one with strong suspicion of normal behaviour.

“There’s no a priori reason to expect any systematic error to be random.”

This is correct, but it doesn’t have to be random in order to have a proper mean and standard deviation so it is off point. Your defined systematic error has very high frequency in space and time. You have plotted what appears to be normal type distributions about a mean for this error. Then you assumed an unrealistic worst case scenario for standard deviation and again you made conclusions about trend which are not addressed by the math.

I don’t disagree that you have estimated a zero knowledge upper limit for standard deviation of a point, but there is a ton of evidence that your calculation is way off from the real performance of temperature measurements. For instance, the month to month variability in the global average (which includes actual signal) is far lower than your error bars.

If I were to do this analysis now, I would probably talk with a statistician about a normality test on the distribution. We do have enough data. I would then use Zeke’s methods to calculate a pseudosigma, which would be slightly scaled by the true non-normality of the distribution. This could be calculated for different sample sizes, and that would give a far tighter error bar based on the data’s actual characteristics. From his post you can see, though, that CRU isn’t far off. Then if I were to estimate trend knowledge, I might model the global average as an AR process having a standard deviation equivalent to my pseudosigma and do a Monte Carlo analysis with a least squares fit for trend error bars. Even with your worst-case sigma, knowledge of trend is far greater than you have stated.
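The resampling idea here can be sketched simply: repeatedly grab random station subsets, recompute the mean, and take the spread of those means as a pseudosigma. The station count, signal, and per-station error below are assumed numbers, not Zeke's actual data.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical station anomalies for one month: a common 0.5 C signal
# plus 0.4 C of independent per-station error (illustrative only)
stations = 0.5 + 0.4 * rng.standard_normal(5000)

def pseudosigma(n, trials=500):
    """Spread of the mean over random n-station subsamples (Zeke-style)."""
    means = np.array([rng.choice(stations, size=n, replace=False).mean()
                      for _ in range(trials)])
    return means.std(ddof=1)

s_small, s_large = pseudosigma(50), pseudosigma(2000)
# s_large << s_small: the empirical error bar constricts with more
# stations, which is only possible if the error distribution has a
# finite mean (i.e., is not Cauchy-like).
```

Running this at several subsample sizes, as suggested above, traces out how the error bar actually scales with N rather than assuming a worst case.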

Unfortunately, that is what your Tbar minus t equations represent: weather noise. So we didn’t make any progress there either. I recognize now that you substitute Hubbard’s result in place of those equations’ measurement error, though, if that helps.

I’m sorry to be difficult about this but these look like fundamental errors to me. I’m wrong a lot though😀

steve fitzpatrick said

Thanks for that clarification, but I am a chemist by training and quite aware of the composition of air (both molecular and atomic). I’m not sure how many beyond you would object to my original description of the odd behavior one sometimes encounters on the street, but perhaps you would be happier with “talking to molecules in the air”.

Pat Frank said

#233 Steve, you ignored the request to prove your claim by showing a weather noise contamination in any one of the three Figures or two Tables in paper 1. In the absence of your demonstration, your claims of error are empty of content.

#232 Jeff, then please point out the weather noise contamination in any of the variances displayed in the three Figures and two Tables of paper 1.

I see no point in going on to discuss the treatment of systematic error in this paper when you are still stuck on statistical Case 2 — which nowhere mentions weather noise. Not even once.

Pat Frank said

#232 Jeff, the T_bar minus t_i represents what happens to the statistical expressions when the t_i have inherently different magnitudes. Case 2 is not an analysis of monthly temperature means. It’s about the evolution of statistical equations as the conditional axioms vary.

It’s about the basic statistics of signal averaging. It’s not about real, physical, monthly temperatures.

steve fitzpatrick said

#234,
I am not the one claiming additional measurements of a non-homogeneous system do not narrow the uncertainty in the calculated average temperature. This postulate is too strange to be credible… as lots of people have been trying to tell you. You are confounding variation of the process (the infamous ‘s’) with uncertainty in the average of the measurements, and whatever specific notation you use to advance this argument (at least in your own mind) does not matter. Take it to the extreme: imagine a high-resolution temperature sensor located on each 100 square meters of the Earth’s surface, ~5 trillion of them, reporting a precise temperature every second, recorded for 100 years. Do you still claim that the average temperature trend has a large uncertainty? Given enough investment in measurement equipment, we can have whatever level of uncertainty we desire for Earth’s average temperature trend. Your suggestion that there is a large unaccounted-for uncertainty in the historical average temperature record because the system has weather variation is just wrong. Of course, I do not expect you will believe me, or anyone else for that matter (your many comments make that clear). Too bad for you.

He substituted a value for his ‘s’ later on that didn’t represent T_bar minus tau, or any of the early equations presented, for that matter. You need to change your wording to match this substitution, like this:

Take it to the extreme: imagine a typical temperature sensor located on each 100 square meters of the Earth’s surface, ~5 trillion of them

Then leave the rest the same and you are accurately representing his claim.

steve fitzpatrick said

Mark T said

I am not the one claiming additional measurements of a non-homogeneous system do not narrow the uncertainty in the calculated average temperature.

They only do if the distributions (I’m not talking about errors) are i.i.d. and if the mean of the measurement is a) stationary, and b) actually exists. (That an actual mean temperature exists is not controversial, as RTF described above, but a mean of the measurement is a different beast.)

This postulate is too strange to be credible… as lots of people have been trying to tell you.

It is not strange at all if Pat is correct regarding the distributions of what is being averaged. If Jeff is correct regarding Zeke’s information, however, then the opposite is likely true. My guess is that the “answer” is somewhere in between, but the engineer in me says to assume worst case until it can be proved otherwise.

Pat Frank said

#237, Jeff where does weather noise contribute to any of the variances in paper 1? Point it out.

Where in paper 1 do I ever suggest that weather noise contributes to any of the reported variances? Point it out.

As with Steve, if you can’t point these out, your claim of error is empty of content.

Here’s what I actually say about (+/-)s (p. 972): “the uncertainty never approaches zero no matter how large N becomes, because although (+/-)sigma_n should automatically average away, (+/-)s is never zero.”

Does “is never zero” sound anything like Steve’s ‘never diminishes’?

The axiomatic system is inherently heterogeneous. There will always be a non-zero variance around a state mean, representing the inherent heterogeneity of the system. Nowhere do I suggest what you and Steve purport, namely that, “additional measurements of a non-homogeneous system do not narrow the uncertainty in the calculated average temperature.”

I only suggest that the variance representing state heterogeneity never goes to zero.

I swear, you folks suffer from the worst case of facultative dyslexia I have ever encountered. You seem unable to read what I actually write.

Pat Frank said

He himself says that his analysis says nothing about systematic error. In your own head post, you wrote that Zeke’s analysis includes, “weather variance, sampling errors, and any other random events which affect measurements. The error bars don’t incorporate any systematic bias… (emphasis added).”

So, Zeke’s analysis doesn’t remotely consider systematic measurement bias errors, doesn’t consider the likely regional correlation of systematic station biases, and does not consider any other sort of systematic error at all.

And then you use Zeke’s analysis of random errors to criticize my paper, which almost exclusively examines systematic measurement bias. By your own testimony your criticism is apples and onions.

Once again: show me where weather noise enters any of my reported paper 1 variances.

Steve Fitzpatrick said

There will always be a non-zero variance around a state mean, representing the inherent heterogeneity of the system.

Yes, on Earth it is called weather, or longer-term processes (ENSO, AMO, etc.). Or maybe it is turbulent flow inside a mixed vessel. You are drawing a distinction without a difference. The fact that a complex heterogeneous system varies over time is 100% independent of, and irrelevant to, the measurement of the mean for the system. The uncertainty in the trend of the mean can be as small as we like it to be, based on our efforts at measurement. If you are trying to say something about the nature of the mean trend, then you need to start worrying about all kinds of different issues (stationarity, causal versus chaotic change, persistence, etc.). But none of those in any way impacts the reality that the trend in the mean of a heterogeneous system can be accurately determined if the system is adequately sampled.

Mark T said

#240: Jeff, the first thing I noticed was this statement – “The error bars don’t incorporate any systematic bias,” which seems to indicate Zeke’s results cannot be applied to Pat’s argument as a direct counter, i.e., they are putting forth hypotheses that are not mutually exclusive. I don’t disagree with Zeke’s results in general, however, though I only glossed over the thread.

There will always be the minimum resolution error (+/- 1/2 the smallest unit), which will average out since (you’d expect) most thermometers resolve to 1 C or 1 F and are read at different times, by different people, with similar methodologies. These errors are almost always going to be close to i.i.d., so you’d expect to see some sqrt(N) sort of reduction using the subsampling approach from that alone, possibly from other errors too. The question, of course, is how much overlap there is with what Pat is doing/claiming.
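The resolution-error point is easy to check numerically. This is a hedged sketch, with the 10–20 C range of "true" readings purely illustrative: rounding each reading to the nearest whole degree puts a uniform ±0.5 error on it, and the error of the mean then shrinks roughly as 1/sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(0)

def rounding_error_of_mean(n_readings, n_trials=2000):
    """Std of the error in the mean caused solely by rounding each
    reading to the nearest whole degree (resolution +/- 0.5)."""
    errs = np.empty(n_trials)
    for k in range(n_trials):
        true = rng.uniform(10.0, 20.0, n_readings)  # illustrative true temps
        errs[k] = np.round(true).mean() - true.mean()
    return errs.std()

e1, e100 = rounding_error_of_mean(1), rounding_error_of_mean(100)
print(e1, e100)  # ~0.29 for one reading, ~0.029 for 100 readings
```

The single-reading figure matches the theoretical 1/sqrt(12) ≈ 0.289 for a uniform ±0.5 error, and a hundred nearly-i.i.d. readings cut it by about sqrt(100).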

I have some MATLAB simulation work to get kicked out tonight that I’m avoiding as well, btw. One of those contract jobs that just won’t go away…

Pat Frank said

In consideration of those for whom the debate here may be confusing, I’ve prepared a concordance of dyslexiana consisting of written points from my paper #1 and the malapropos meanings assigned to them by Lucia, Jeff, and Steve Fitzpatrick. Steve has not been so universally involved, but it’s been clear his sympathies lie with the view of Lucia and Jeff.

The following inapt correspondences have been culled from, and can be found in, thread posts made by Lucia, Jeff, and Steve. The list concerns only climate related malaprops. I saw no point in bringing in analogous examples concerning Ohmmeters and resistors. Some terms have multiple entries because they were given alternative LJS meanings.

Herewith: wherever left-side terms appear in paper 1, LJS insist on applying one or another right hand meaning.

Wherever the concepts in column 1 appear in my paper 1, merely insist on applying the meanings attached in column 2. Then you, too, will be able to find copious errors in the paper. Where multiple malappropriate terms appear for the same concept, feel free to choose.

NB: my second paper is an entirely different and separate study. It, in part, assesses the inadvertent error entrained by the impoverished physical meaning of the overly simplistic statistical model applied by the CRU scientists to monthly temperature means. Concepts and analyses appearing in one paper cannot be applied uncritically to the other.

Pat Frank said

#244 Steve, I have no problem with anything in your post. The uncertainties I calculated in paper 1 have nothing whatever to do with magnitude uncertainty or with weather noise.

Case 2 is a statistical case exemplifying a rigorously defined and limited signal averaging concept of intermediate case complexity. Nothing more. It is a mistake to suppose it was applied to any of the variances I presented in the paper.

You’ve now posted four times since I asked you to show which of the two Tables or three Figures of paper 1 is contaminated with weather noise, or a (+/-)s mistake. You still have not pointed out the errors you purport. Neither has Lucia.

In his latest post, I see that Jeff has backed off of his prior claim that I mistakenly presented the variance of weather noise as a measurement error, i.e., “I get that Pat hasn’t included weather noise in his final calculations for Table 1,2 and the figures.” This self-correction is a long time coming.

#249, I’ll talk about it a bit. One thing which has confused a few people is that there are varieties of systematic error. Zeke’s method definitely doesn’t include long-term systematic errors from UHI, and neither does Pat or Hubbard. Zeke’s method does, however, include systematic bias introduced by daily factors such as sun and wind loading. It is something that Pat has also denied above; MarkT missed it due to lack of time, and it seems a critical point to make in my next post.

Nebuchadnezzar said

Jeff, one way to crack open the systematic error might be to consider the Hubbard models and ask what change in conditions (solar radiation, wind speed, etc.) would be needed to explain 0.8 degrees of warming over the twentieth century.

From what I can make of Pat’s paper, it seems like one of the uncertainty terms comes from the variance associated with the difference between the MMTS and the aspirated instrument which measures the true temperature. Those differences have physical causes.

#251, that is the interesting bit. It seems fairly clear to me what the answers would be from Pat’s paper. See figures 1 and 2, and then consider how systematic daily weather-induced errors would affect Zeke’s Monte Carlo test. MarkT didn’t get that far in his analysis.

Richard T. Fowler said

Thank you for your replies at 225 and 226. So, if I’m understanding correctly, in order for Pat Frank to be right, there would have to be no correlation between thermometer measurements and the actual temperature at the place and time of each measurement.

If that is what you’re saying, then it shouldn’t take a lot of effort to show that the premise of Frank’s conclusion is false. Why would anyone assume the worst case? Is there any evidence that the worst case has ever existed for an experiment of this nature? It seems that every single thermometer used in the experiment would have to deliver randomly varying readouts over the entire length of the experiment. In this experiment, that length of time would be decades for some thermometers, and certainly years for most of the others.

Zeke, if you are reading this, thank you for your efforts. I think they are key to understanding Frank’s results.

Mark T said

In general there are plenty of reasons (not directed specifically at this discussion): lack of knowledge of distributions, lack of knowledge of independence, knowledge of correlation, as well as a few other things, I’m sure. The sqrt(N) cancellation is a bound requiring very specific conditions to be met before it can be assumed true. Without knowledge that said conditions have been met, the next bound is perfect correlation, i.e., worst case. There is nothing in between, even if the answer is somewhere in between.
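The two bounds Mark T names can be made concrete. In this sketch (sensor count, sigma, and trial count all illustrative), N sensors carry either independent errors or one perfectly correlated error; the sigma of the mean lands at sigma/sqrt(N) in the first case and does not shrink at all in the second:

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials, sigma = 100, 5000, 1.0

# i.i.d. errors across the N sensors: sqrt(N) cancellation applies
iid_means = rng.normal(0.0, sigma, (trials, N)).mean(axis=1)

# perfectly correlated errors: every sensor shares one draw, nothing cancels
shared = rng.normal(0.0, sigma, trials)        # one shared error per trial
shared_means = np.tile(shared[:, None], N).mean(axis=1)

print(iid_means.std())     # ~ sigma / sqrt(N) = 0.1
print(shared_means.std())  # ~ sigma = 1.0
```

Real station-error correlation presumably lies between these extremes, which is exactly why Mark T calls the worst case a bound rather than an estimate.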

Richard T. Fowler said

Thank you for your reply. Your information is interesting; however, speaking as a non-statistician, I feel _quite_ comfortable assuming something in the middle for the purpose of auditing a trend analysis. Which as I understand it is what Pat Frank was trying to do — audit CRU’s trend analysis.

Mark T said

I am actually facing a similar problem at the moment. The variance at the output of a correlator would be N (the length of the correlator) for i.i.d. data. It is instead about 2.6*N because, given the way the correlator works, the i.i.d. assumption is not valid (which I can prove). The 2.6 is not worst case, either, so I am unfortunately tasked with finding all of the cross-correlations. I am lucky in that everything is stationary and the 1st and 2nd moments all exist.

Note that both the CLT and LLN require that a mean and variance exist and are finite.
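The correlator example generalizes: positive correlation between samples inflates the variance of their sum above N. A toy sketch, where the repeat-by-2 correlation structure is invented purely for illustration and is not Mark T's actual system:

```python
import numpy as np

rng = np.random.default_rng(4)
N, trials = 64, 4000

# i.i.d. +/-1 chips: variance of the length-N sum is N
iid_sum = rng.choice([-1.0, 1.0], (trials, N)).sum(axis=1)

# correlated chips: each value repeated twice (adjacent samples identical),
# so pairwise correlation pushes the variance of the sum up to ~2N
base = rng.choice([-1.0, 1.0], (trials, N // 2))
corr_sum = np.repeat(base, 2, axis=1).sum(axis=1)

print(iid_sum.var(), corr_sum.var())  # ~64 and ~128
```

The inflation factor here is 2 by construction; a factor like Mark T's 2.6 would come from a messier cross-correlation structure, found only by working out all the cross terms as he describes.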

Mark T said

I’m sure you do, but your answer is no better than any other if Pat is correct. If so, it is standard to use worst case, because then you know one thing: it cannot be any worse. I have more on your question regarding the true mean above, but it will wait, since phone typing is difficult.

Richard T. Fowler said

Mark T said

steve fitzpatrick said

Mark T #253
“The sqrt(N) cancellation is a bound requiring very specific conditions be met before it can be assumed true.”
Yes, but doesn’t the result of Zeke’s approach suggest that those conditions are in fact met in reality (or reasonably so)?

Mark T said

Steve, potentially, yes. At least, it is strong empirical evidence that the answer is nearer to the CLT result than to worst case. I condition that with the thought that I can still envision cases that can be misleading.

Mark

PS: solved my peak issue; now I need to solve the off-peak case. It is apparently worse.

steve fitzpatrick said

I think the important point here is that the kind of additional uncertainty you are talking about (due to not meeting the conditions for sqrt(N) reduction) is not really related to what Pat Frank was proposing. What is more, reasonable tests (like Zeke’s effort) can be used to illuminate whether there is indeed a problem with satisfying those conditions. Pat appears to accept all of the conditions for sqrt(N) reduction of the ‘normal’ uncertainty.

What people are objecting to is not questions like you raise, but rather Pat’s suggestion that there is a whole different kind of uncertainty (never before recognized, with much larger magnitude) which has nothing whatever to do with how people routinely estimate uncertainty.

Pat Frank said

Temperature plus systematic error would behave as temperature. Someone could decide that systematic error is invariably i.i.d. but that would be an unjustified speculation. I expect you to make the i.i.d. assertion, Jeff. It’s wrong.

Pat Frank said

#253 “if I’m understanding correctly, in order for Pat Frank to be right, there would have to be no correlation between thermometer measurements and the actual temperature at the place and time of each measurement.”

Pat Frank said

#262 Steve wrote, “I think the important point here is that the kind of additional uncertainty you are talking about (due to not meeting the conditions for sqrt(N) reduction) are not really related to what Pat Frank was proposing.”

Yes it is really related to what I proposed in paper 1. With respect to the Folland, et al. 2001 subjective read-error estimate, it’s exactly what I proposed. Mark T is the only person routinely posting here who gives evidence of having both read my paper and understood it.

You also wrote, “What people are objecting to is … rather Pat’s suggestion that there is a whole different kind of uncertainty (never before recognized, with much larger magnitude) which has nothing whatever to do with how people routinely estimate uncertainty. (emphasis added)”

And that whole different kind of uncertainty is systematic measurement error, which of course “has nothing whatever to do with how people routinely estimate uncertainty.” And also, of course, systematic measurement error has been “never before recognized“; presumably not by any one including not by any experimental scientist. Do I need to add a “/sarc” here?

Anonymous said

“Mark T is the only person routinely posting here who gives evidence of having both read my paper and understood it.”

I have not read the latest work, but I am pretty sure I understand your arguments and I did look at your first stuff back when it was posted over at WUWT (recall I spent far too many syllables explaining why i.i.d. is necessary for the CLT or LLN to apply.) I have not made any decision on whether I think your necessary assumptions are justified, however, but neither have I decided i.i.d. with a mean and variance that exists is justified. Zeke’s study has potential, but I’m not convinced it will pick up what you are asserting.

What got my curiosity piqued was the inadvertent creation of a data set that did not obey the CLT in my day job. Once I worked through the math I understood why, but the parallel to this situation was more interesting. That’s also when I boned up on Cauchy distributions, and perused a paper on detection in alpha-stable noise, which is another interesting distribution in the same class as the Cauchy. Even more interestingly, a normal distribution is in the same class but it has a defined finite mean and variance.

Pat Frank said

Jeff, in #184 “I do work in engineering and I have to say that there is no such thing as magnitude uncertainty according to your definitions.”

So, in #185, I gave an example of magnitude uncertainty, (+/-)s: “You’re an engineer. You measure the resistances of 100 resistors of 100 different ohmages. You calculate the mean resistance of the 100 resistors. You calculate the standard deviation of the set of resistances about the mean resistance. You conclude the standard deviation represents the variation of the 100 resistances about the mean resistance…”

This example exactly comports with the definition of magnitude uncertainty given in paper 1, Case 2ff: a measure of state heterogeneity.

Now we get Jeff in #248, “magnitude uncertainty doesn’t exist by your definition”

So, according to Jeff, my definition of magnitude uncertainty is wrong, is right, and is wrong again. This is the sort of coherence I have been getting here.

In any case, in a heterogeneous system, even if one makes enough measurements to know the mean to arbitrary accuracy, the magnitude variance of the mean remains to reflect the heterogeneity. This variance, which reflects the heterogeneity of a state and which is how I have defined it in paper 1, is what Jeff says doesn’t exist.

In #189, by the way, Jeff went on to write, “Assuming though that we are testing a truly normal distribution, how accurately would you know the mean?”

There was no assumption of normal distribution in my example and the accuracy of the mean was not at issue there.

Jeff’s question about the accuracy of the mean is both beside the point and reflects his supposition (at that time, since retracted) that the uncertainties I reported in paper 1 were mere weather noise variance. But Jeff goes on to discuss the CLT as though it were pertinent.

This is another persistent problem I face here: the unwarranted imposition of additional and entirely unsupported conditions and meanings onto my words. And then projecting that textually unwarranted imposition onto me.

It would be unconscionable if it were deliberate. I believe it’s innocent, though. Somehow, Lucia, Jeff, and Steve Fitzpatrick are caught in some trap, and seem unable to grant my words the strict meanings they actually carry.

Pat Frank said

In #248 Jeff entered a denial: “heterogeneous state__________________________stationary state — bull – these are false claims Pat read carefully. I’m just going to put bull there because none of us are that uneducated, and none of us made these claims. (bolding added)”

Jeff in #142, “Zeke has given an elegant way to back calculate the uncertainty ‘s’ due to weather noise, temperature and all random factors which makes very few assumptions necessary.

Likewise, Steve Fitzpatrick, “Your internal process variability (‘s’) has nothing at all to do with uncertainty in a trend, except to the extent that a larger ‘s’ means you need more data to reach a specified level of uncertainty for the trend estimate.” Steve here implies a 1/sqrt(N) reduction in ‘s.’ This, in turn, requires state stationarity. Q.E.D. #2

Likewise, #132, Lucia: “1) Case 2 is wrong. The uncertainty you call ‘s’ should be zero.” The only way ‘s’ can be zero is if the state is homogeneous. That’s Q.E.D. #3. It’s not that none of you made these claims. All of you made these claims.

Jeff also wrote in reply, “(+/-)s is never zero___________________________(+/-)s never diminishes — +/-s is not correctly defined, who cares if it never diminishes.”

So do you, apparently: “You need to change your wording to match this substitution like this–… Then leave the rest the same and you are accurately representing his claim.”

Q.E.D. again.

And the most ironic part of your conjoint objections is that I never applied (+/-)s to represent the uncertainty in a mean trend. I applied (+/-)s, in terms of the annual anomalies, to represent natural variability, and whether the trend exceeded that, or not. The reality of the trend is not at issue in this case, but only its relationship to the variability of the system.

Jeff, in post #142, “Zeke has given an elegant way to back calculate the uncertainty ‘s’ due to weather noise, temperature and all random factors … I doubt you will be convinced by my discussion so I wonder if you can interpret the difference between Zeke’s result and yours. (bold added)”

Richard T. Fowler said

Pat, just for the record, I was trying to paraphrase, not you, but Jeff, which I believe I have adequately done.

Apparently you don’t agree that a Cauchy distribution is necessary for your results. Specifically, you don’t agree with Jeff’s statement in 221 that

“Because the oversampling is so high, for Pat to justify his papers, he would need to define Hubbard’s systematic error distribution as a Cauchy distribution and demonstrate statistically that this is the case.”

Mark T said

For the record, my initial mention of the Cauchy distribution was only to point out an example in which increasing the sample size does not decrease the uncertainty of the mean (it actually increases). There are others, though I have not come across any that would specifically apply. One problem with the Cauchy is that the tails are fat, which leads to extreme values that are not seen in real data (this is sort of a problem with Gaussian distributions as well, since real data is somewhat bounded).

I have been thinking of toy examples, such as the mean of random walks, but even those have issues that would render them inappropriate. Damn you guys for forcing me to think! 😉
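The Cauchy behavior Mark T describes is easy to demonstrate: the sample mean of N standard Cauchy draws is itself standard Cauchy, so its spread never tightens, unlike the Gaussian case. A small sketch (sample sizes and repetition counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def iqr_of_sample_means(dist, n, reps=200):
    """Interquartile range of `reps` sample means, each over n draws."""
    means = np.array([dist(n).mean() for _ in range(reps)])
    q75, q25 = np.percentile(means, [75, 25])
    return q75 - q25

norm_100 = iqr_of_sample_means(rng.standard_normal, 100)
norm_10k = iqr_of_sample_means(rng.standard_normal, 10000)
cauchy_100 = iqr_of_sample_means(rng.standard_cauchy, 100)
cauchy_10k = iqr_of_sample_means(rng.standard_cauchy, 10000)

print(norm_100, norm_10k)      # Gaussian: shrinks ~10x with 100x the data
print(cauchy_100, cauchy_10k)  # Cauchy: stays ~2 regardless of n
```

The interquartile range is used instead of the standard deviation because the Cauchy has no finite variance, which is the very pathology under discussion.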

Cauchy isn’t the right distribution either. Do you know of any distribution which has a defined mean that is not better known through more sampling? This is the worst-case assumption of no knowledge of systematic error. I’m going to work on that post tonight a bit. I have answered Pat’s objections quite clearly, to my understanding. Perhaps I will try again later, but first I would like to discuss the problems with his assumptions that are described as a lack of assumptions.

In this case, the aforementioned “population” whose mean we are discussing comprises temperatures at an infinite number of points along a two-dimensional surface which exists near the surface of Earth, all at a single point in time. (I’m disregarding for now the effect of improper averaging of daily max and min temps, since it is a separate issue from the one I’m trying to address.)

At the same time, by your own admission in #268,

Statement 2. “This variance, which reflects the heterogeneity of a state, [. . .] is what Jeff says doesn’t exist.”

Please note that by the word “variance” in Statement 2, you were referring to the “variation” of ohmages of a sample of resistors, which “variation” you identify as being the “standard deviation”. Witness:

So, with Statement 2, made by you, you are claiming that the standard deviation of a statistically significant sample “reflects the heterogeneity of a state”. While at the same time, with Statement 1, you are claiming that the same standard deviation of the same sample is “an example of magnitude uncertainty” by which you mean uncertainty about the population mean.

How do you explain this contradiction?

From my point of view, Jeff is not flip-flopping as you suggest in #268, nor is he incoherent on this particular point. Furthermore, he is hardly the only poster here who has made that point.

Mark T said

Jeff, I agree it is not. No, I do not have a specific one. The problem arises in that everything is practically bounded; even Gaussians suffer from this. There is no such thing as an infinite temperature. Statistically, such things can happen, though. Cauchy-like, but without the extreme tails.

Mark T said

The distribution I have with my problem has three components, all correlated. Two are Gaussian; the third is product-normal. I know the mean and variance, but in one case the tails are too thin, leading to an overestimate of large-value occurrences, and in the other case the tails are too fat, leading to an underestimate of large-value occurrences. But it is stable, so tweaking can yield reliable estimates.

Pat Frank said

#275 Richard, in my paper, the very first time (+/-)s appears I define it as “reflecting the fact that the tau_i magnitudes are inherently different.”

You wrote that Case 2 concerns the mean of, “… temperatures at an infinite number of points along a two-dimensional surface which exists near the surface of Earth, all at a single point in time.”

No, it does not. Case 2 is none of that. The case is one of simple signal averaging statistics involving a signal with properties strictly and artificially limited to an intensity and a noise, period. Your description injects physical meaning into Case 2 that it does not have.

Further, time is nowhere specified in Case 2. Your “all at a single point in time” is pure invention. Just as is your “infinite,” your “two-dimensional,” your “surface,” and your “Earth.” In fact, even your “temperature” is an invention in that you represent it as a physically real temperature. None of that appears in Case 2. The t_i are never represented to be physically real measurements and the tau_i are never represented to be physically real temperatures.

Case 2 concerns only a set of intensities plus noise. Nothing else. It’s an artificial, mathematical system constructed to show the evolution of the statistical equations as the system varies away from Case 1, the simplest statistical model.

Notice that the only difference between Case 1 and Case 2, is that the tau_i are now allowed to vary inherently in magnitude. Case 2 merely notices that in describing T_bar, the inherent variation of the tau_i about T_bar must now be included.

Further down, in regard to noticing this variation about T_bar, Case 2 states that, “The magnitude uncertainty, (+/-)s, is a measure of how well a mean represents the state of the system.” Where is the error in that statement?

Now, let’s look at the “contradiction” you queried. You wrote, “So, with Statement 2, made by you, you are claiming that the standard deviation of a statistically significant sample “reflects the heterogeneity of a state”. While at the same time, with Statement 1, you are claiming that the same standard deviation of the same sample is “an example of magnitude uncertainty” by which you mean uncertainty about the population mean.”

Is that really what I meant by “Statement 1” magnitude uncertainty in #268? In #185 I wrote that the SD, “represents the variation of the 100 resistances about the mean resistance.” In #268 I called this SD, “a measure of state heterogeneity.”

So, I didn’t mean “the uncertainty about the population mean,” did I. I meant the variation about the mean due to 100 inherently different Ohmages, didn’t I. That represents magnitude uncertainty as I defined it in the paper, doesn’t it.

In #185, I specified resistors of different Ohmages. That is, the resistances are inherently different. The system of 100 resistors is heterogeneous. The resistance mean will have a standard deviation reflecting that heterogeneity.

So, now, how does saying that the standard deviation reflects the “[magnitude] uncertainty about the population mean” contradict saying that the same standard deviation reflects “the heterogeneity of [the mean] state“?

In fact, there is no contradiction. The same standard deviation is both a statistical measure of the variation in the 100 resistances about the mean resistance, and it is also a measure of the magnitude uncertainty in the mean due to system heterogeneity, as I defined it in the paper.

To illustrate magnitude uncertainty: if, with closed eyes, one blindly chose out a single resistor from the heap, one’s only immediate knowledge about the chosen resistor would be that its resistance should be within Ohms_bar(+/-)s. That’s because even though one didn’t know which resistor was picked one does know the mean(+/-)SD of the heterogeneous system of resistors.
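The distinction Pat draws here can be put in code: the sample SD (his (+/-)s, the state heterogeneity) stays put as more resistors are measured, while the standard error of the mean shrinks. A sketch under an invented assumption of resistances uniform on 10–1000 Ohms:

```python
import numpy as np

rng = np.random.default_rng(3)

def spread_and_sem(n):
    """Sample SD (state heterogeneity, the +/-s above) and standard error
    of the mean for n resistors drawn uniformly on 10..1000 Ohms."""
    r = rng.uniform(10.0, 1000.0, n)
    s = r.std(ddof=1)
    return s, s / np.sqrt(n)

s_100, sem_100 = spread_and_sem(100)
s_10k, sem_10k = spread_and_sem(10000)
print(s_100, sem_100)  # SD ~286 Ohms; SEM ~29 Ohms
print(s_10k, sem_10k)  # SD ~286 Ohms still; SEM ~2.9 Ohms
```

Both quantities come from the same sample, which is why the same standard deviation can measure heterogeneity and, divided by sqrt(N), the precision of the mean; the dispute in this thread is over which of the two belongs on the anomaly error bars.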

You wrote, “From my point of view, Jeff is not flip-flopping as you suggest in #268, nor is he incoherent on this particular point. Furthermore, he is hardly the only poster here who has made that point.”

I made three consistent representations of magnitude uncertainty, which Jeff claimed were wrong, right, and then wrong again. That’s hardly coherent.

You have imposed physical meanings onto Case 2 — e.g., … temperatures … along a two-dimensional surface … near the surface of Earth, all at a single point in time,” none of which are part of Case 2. You have invented untoward meanings and imposed them onto the Case. You have made the same error of impositional meanings that Jeff insistently makes.

Then you’ve assessed Case 2 in terms of your mistaken and imposed meanings and gone on to project your mistakes onto my work. As Jeff does.

So, you’re right that Jeff is not the only one who has made this point. So have Lucia and Steve F, and now you. You’ve all imposed novel and entirely unjustifiable physical meanings onto Case 2 and then run off across the countryside bannering supposed errors.

Let me also add here that Case 2 implies no distribution of magnitudes. Discussion of any purported distribution of the Case 2 tau_i, Cauchy or otherwise, also puts a meaning on the Case that it was not given and does not have. Adding further meaning will change Case 2 into something foreign and modify Case 2 into something else entirely. The mathematics governing something else entirely has no bearing on Case 2 itself.

Pat Frank said

#273, “Do you know of any distribution which has a defined mean that is not better known through more sampling. This is the worst case assumption of no knowledge of systematic error.”

This is exactly the case for global temperature sensors virtually throughout the entire 20th century. There is no defined error mean, and there has been no sampling or monitoring of the real time in situ systematic measurement error of surface station sensors. The systematic measurement error is not available in station metadata.

There is virtually no empirical knowledge of the actual systematic sensor measurement error present in the global 20th century surface air temperature record, and therefore no empirical knowledge of its limits of accuracy.

DeWitt Payne said

Richard T. Fowler said

“That’s because even though one didn’t know which resistor was picked one does know the mean(+/-)SD of the heterogeneous system of resistors.”

Pat,

The “mean(+/-)SD”?

The mean standard deviation?

Seriously?

How many different standard deviations does a set of variables have?

And since we’re looking for a trend in the _mean_ temperature, not the standard deviation, then how is this relevant??

Furthermore, I was not talking about Case 2. I was talking about your description of a sample of resistors. I was also talking about the analogous system with respect to mean Earth temperature (analogous to your resistor example). I was not imposing anything on _your_ words. I was bringing up what you should have been talking about from the beginning — trends in the population _mean_. That is what matters, and it is all that matters. And by equating sample standard deviation with uncertainty in the _mean_ of either the population or the sample, you are specifically and assuredly NOT talking about any such trends.

Regarding your comments to me about signal/noise — none of that enters into your resistor example, so it is not relevant to the question I asked you. For what it’s worth, you should be looking for signal in the MEAN … not in any standard deviation. WHO CARES IF THE STANDARD DEVIATION CHANGES!?!?!?

And by choosing to examine CRU’s trend analysis (which was an attempt to characterize trends in MEAN TEMPERATURE, not STANDARD DEVIATION OF TEMPERATURE), you are choosing to talk about … wait for it … MEAN TEMPERATURE! Not any standard deviation of temperature, or any other parameter of temperature. But for some reason you will not choose your words according to the actual physical thing (MEAN TEMPERATURE trend) that you supposedly chose to inspect.

I also note that I haven’t seen anywhere in your comments where you acknowledge that it is the trend in the POPULATION of variables that is what we’re trying to characterize with our analysis, rather than just the sample. Your 100 resistors must be seen as a sample of a larger population in order for your analogy with global temperature to be apt.

That POPULATION is the INFINITE SET OF POINTS which you claim I have invented. I didn’t invent that! That set of variables is the subject of both CRU’s study and (supposedly) yours. Except it appears you weren’t studying the same set that they were, and in fact that is the one you SHOULD have been focused on. Thus, while it may be that you never THOUGHT of your study population as being an infinite set of temperatures, I didn’t “invent” that idea. I’m just characterizing it, and others here on this page have implicitly done so as well. With the notable exception of yourself. And that speaks volumes about yourself. Absolute volumes.

Richard T. Fowler said

I’m aware that it can matter if a standard deviation changes over time. My sentence that you quoted is an example of hyperbole. Hyperbole exists to make a point. If my point was not apparent to you, you might have a second look at my words.

WRT yelling, is it yelling if you raise your voice because the person you’re speaking to is decidedly hard of hearing? And even if it is (I would argue that’s an unfair generalization inapplicable to this case) is it fair, nonetheless, to say, “stop yelling” when the absence of emphasis has resulted in such a thorough avoidance of a point that was made?

I made a point. The point was rather, um, shall we say “important”. The point was 100.0% disregarded, and new points were brought up with a decidedly negative tone that suggests, to me, ill intentions. (I’m speaking of course of Pat Frank.)

If I assume honesty, and I am treated, in turn, to the appearance of dishonesty, Houston we have a MAJOR major problem. I’m sorry if you don’t LIKE my way of trying to deal with that. I’m _sorry_ if it’s not your STYLE. But it’s the only way I know that is proven to sometimes work in such a case.

A lot of people have said, well, sometimes you just have to disengage, and that is true. But that doesn’t have to mean you can never add some emphasis. With some people, you have to add more than with others to get the same results.

I would also remind you that the stated purpose of this entire set of studies was the examination of a prior analysis of M E A N T E M P E R A T U R E T R E N D.

Steve Fitzpatrick said

Richard T. Fowler said

Dude. You have your style, I have mine. I wasn’t disparaging your style. You were taking umbrage at mine (and continue to do so with your comment “Typing in all caps doesn’t make it any clearer”), and I was simply saying, in essence, I’m sorry that you feel that way. And I am. Because I don’t think it’s necessary for you to feel that way.

My point in typing in all caps is for the words to be noticed. Typing in all caps makes them more noticeable. If you don’t like looking at all caps, you are free to disregard the message.

Regarding “60% …”, that would be a lot more results than the others here have gotten. But I think you have missed the main point which is that I am profoundly offended by what I’ve read thus far, and the author doesn’t seem to be aware how offensive his words are. So I’m giving him a taste of his own medicine.

It’s called “education.” Again, sorry if you don’t like it, but it is sometimes important.

If it doesn’t work, then there is the last resort, which is I go to E&E and interact directly with their editors.

One way or another, I promise you this will eventually get addressed properly. If I have to organize a demonstration, I will do it. I organize a pretty good demonstration when the situation warrants.

The point of all this is you don’t mess with people’s reputation. As someone who believes temperatures are falling, my reputation is being placed on the line with a study like this. If the errors are innocent, then fine. But if the author will not admit a serious error even after it appears he may have already understood it, then that’s not so fine.

I hope you can understand. This is not just a parlor game, it’s real life. In real life, presently, there are a lot of people out there who cannot afford to have kids because of what’s being done. If they are being deliberately impoverished to prevent them from having any kids, the legal term for that is “ethnic cleansing”.

Pat Frank said

#281 RTF: “Pat Frank says in #278,
“That’s because even though one didn’t know which resistor was picked one does know the mean(+/-)SD of the heterogeneous system of resistors.”
“Pat,
The “mean(+/-)SD”?
The mean standard deviation?
Seriously?
How many different standard deviations does a set of variables have?”

“mean(+/-)SD” = the mean plus-or-minus the standard deviation, Richard. How hard is that to figure out? Isn’t ‘mean(+/-)SD’ the way one usually expresses a mean plus its uncertainty?
I swear, the misreadings exposed here are so naive that it almost seems like some of you folks must be misinterpreting my text on purpose.

Pat Frank said

#281 Richard, you also wrote, “I was talking about your description of a sample of resistors. I was also talking about the analogous system with respect to mean Earth temperature (analogous to your resistor example).”

If you track back through the thread, you’ll discover that the resistor example was in reference to Jeff’s claim that magnitude uncertainty as I defined it did not exist.

If you’ve read my paper, you’ll know that I defined magnitude uncertainty to reflect the heterogeneity of a system when the intensities of observed signals display inherently different magnitudes. The offer of varied resistors provided a simple example of magnitude uncertainty.

So despite your perception that, “Furthermore, [you were] not talking about Case 2.“, in fact you were talking about Case 2 because the resistor example was made in the context of a discussion with Jeff about his (mis)perception of Case 2.

So, it seems you meant to raise a different issue, but you chose the wrong context. That confused your point. It seems, Richard, that you jumped into the conversation without informing yourself of its content.

Most of the rest of your discussion in #281 is irrelevant because it follows from your misread of my “mean(+/-)SD,” in which you mistakenly assigned it to represent something like the mean of a set of standard deviations.

That misread of conventional notation is unique to you on this thread. Congratulations. But there’s been so much tendentious misreading of my text on this thread that you’ll just have to be satisfied with your modest place in a well-populated constellation.

You also wrote that, “And by choosing to examine CRU’s trend analysis…” I chose to analyze CRU’s misapplication of “random” to the subjective read-error of Folland, et al, 2001, and the total neglect of sensor systematic error. Figure 3 in paper 1 just applies a lower limit uncertainty estimate to the GISS (not CRU) trend, representing a first order rectification of those mistakes.

You wrote, “Your 100 resistors must be seen as a sample of a larger population in order for your analogy with global temperature to be apt.”

I never analogized the resistor example to global temperatures. More misconstrual on your part, Richard.

Immediately following the above quoted sentence, you wrote, “That POPULATION is the INFINITE SET OF POINTS which you claim I have invented. I didn’t invent that! That set of variables is the subject of both CRU’s study and (supposedly) yours.”

One suspects that even the CRU scientists would hesitate before a claim that they’re working with an “infinite set of points,” where your “points” clearly = temperature measurements.

In statistics, the infinite set of points is a mathematical fiction called the “parent” set, Richard. The real experimental data is a more limited set, often called the sample population, which is assumed to represent a sub-sample of the ideal parent set. See Bevington and Robinson, page 7: “If we could take an infinite number of measurements, … [this] distribution is called the parent distribution. … [The] measurements we have made … form the sample distribution. (my bold)”

So it appears you did invent the “POPULATION” of an “INFINITE SET OF POINTS,” Richard, and doubly so. The CRU scientists don’t possess such a data set; your invention #1. And statistics specifically limits infinite data sets to a mathematical ideal that is not extended to real data sets; your invention #2.

You went on, “Thus, while it may be that you never THOUGHT of your study population as being an infinite set of temperatures, I didn’t “invent” that idea.”

It seems you did.

However, just to clarify: my view that you made an invention followed from the fact that you jumped into my conversation with Jeff about the meaning of Case 2. Your subsequent misconstruals of meaning clearly show that you jumped in without knowing the context. You then went on to inadvertently assign the meaning of “temperatures at an infinite number of points along a two-dimensional surface which exists near the surface of Earth, all at a single point in time.” to Case 2.

Now you have clarified that you didn’t mean to do that. Thanks.

With respect to what you did do, I doubt that very many would take the GHCN data set to be infinite or that any subset of “daily max and min temps” represent temperature averages “at a single point in time.”

And as my not discussing that which I never imputed, namely infinite data sets, told you “volumes… Absolute volumes.” about me, what do your tendentious misreadings, your careless misconstruals, and your inapt inventions tell you about you?

Item 2) has the corollary claim that the Case 2 (+/-)s is the standard deviation of weather noise and represents the uncertainties I reported, e.g., here, here, and here.

LJS also clearly thought that the variances I reported in Figures 1-3 and Tables 1, 2 represented this weather noise.

Item 1) was under continuous debate until I challenged Lucia, Jeff, and Steve F. to demonstrate which of the three Figures and two Tables contained a weather noise variance.

None of the three met the challenge. The reason was, of course, that none of the Figures or Tables actually include a weather noise variance.

Jeff, almost grudgingly, has admitted as much in his more recent head post: “I get that Pat hasn’t included weather noise in his final calculations for Table 1,2 and the figures…”

Lucia and Steve Fitzpatrick have evidently been unable to muster even that much grace. Lucia disappeared from this thread after Mark T’s admonition, and though Steve Fitzpatrick has continued to post here he has remained silent on this challenge even though it’s foundational to his criticism of my work.

This means that LJS error item 1) now stands entirely refuted; refuted by its demonstrated absence.

The loss of LJS item 1) left LJS error item 2) isolated to Case 2. The absence of Item 1) means that Item 2) had no ramifications. It propagated no error into any of the Figures, Tables, or conclusions in the paper.

Regarding item 2), RomanM made a very concise comment on the meaning of paper 1 Case 2 in Jeff’s new thread, namely that, “All of the discussion of weather and signals is a red herring in the context of the cases presented by Pat in his paper. … T_bar is … an unbiased estimator for the parameter tau_bar.”

Roman’s conclusion about T_bar and tau_bar is correct. His conclusion is already strictly implicit in Case 2, in the definition that the noise is stationary. Stationary noise means that at very large N the noise uncertainty in the mean tends to zero with zero bias, so that T_bar tends to tau_bar (cf. p. 972).

But more generally, Roman’s observation is that Case 2 is strictly self-referential. T_bar refers to tau_bar only. There is no reference to weather noise anywhere in Case 2. There is no explicit or implicit connection between the tau’s and weather in Case 2. “Weather” is not in the definition of Case 2 conditions. Tau_bar is never connected to weather. There is no implicit meaning of “weather” in Case 2 tau_bar.

This illuminates the Case 2 “(+/-)s”: it also has no specific connection to weather or to weather noise. And Case 2 (+/-)s has no specific connection to any of the variances in the Figures or Tables of my paper.

LJS error Item 2) is now also left empty of content.

The entire corpus of error Item 1) and error Item 2) is now extinct.

The LJS claims of error in this thread are left without any content at all.

And the refutation of error Items 1) and 2) came only after the actual content of my paper was tested against the LJS claims.

That means the weight of the debate in this thread has stemmed from the careless readings — or careless non-readings — and facile misinterpretations of the text of paper 1 by Lucia, first, and then by Jeff and by Steve Fitzpatrick.

You apparently haven’t followed (or perhaps haven’t grasped) the debate, DeWitt. But that apparently didn’t stop you from finding an opinion that suits you. Take heart, though: you’re in good company.

Pat Frank said

#283 RTF wrote, “The point was 100.0% disregarded, and new points were brought up with a decidedly negative tone that suggests, to me, ill intentions. (I’m speaking of course of Pat Frank.)”

And of course it has turned out that your point included both a misunderstanding of the very obvious meaning of “mean(+/-)SD,” i.e., ‘mean plus-or-minus its standard deviation,’ and was dropped into the middle of the debate carrying the wrong context.

I’ve lost track, is the topic of this thread whether Pat Frank’s method treats known, but uncorrected bias errors correctly, or is it the state of Pat’s soul? One of those might be interesting to discuss, the other is known but to God, and RTF apparently…

Show of hands: who thinks the scale of uncorrected bias errors gets smaller with the sample size? [folds hands in lap] who thinks uncorrected bias errors do not depend on the sample size? [puts hand up] who thinks this is a really interesting case where the bias depends on a deterministic forcing that is correlated across the sensors? [puts other hand up]

It would be really interesting to see someone actually address the issue Pat raises (no, Zeke’s monte carlo, while interesting and useful, does not do that).

Carrick said

It’d also be interesting for jstults to tell us what test he would find convincing, and even more to try and apply some of the issues Pat has raised to the surface temperature record and how it is analyzed.

If you don’t put pen to paper and do the math, you can get the physics to do what you want. (You can also get it to do what you want if you have conceptual errors that you refuse to admit to, but that’s another issue entirely.)

If Pat takes the time to answer my question on the other thread, I may spend the time to address jstults’ concerns. I started writing another post on this and gave up, because the error I describe keeps being repeated throughout. If he can explain the specific contradiction I’ve brought up, maybe there is the potential for understanding. I would hate to put the effort into another post without resolving what appears to me to be a very obvious contradiction.

If an analysis requires that you consider additional assumptions and a chain of reasoning that it doesn’t actually present to address an issue, then it is non-responsive to that issue (this also happens to be an issue for which Zeke specifically disclaimed relevance). It would be interesting to see a responsive analysis. In the absence of a responsive analysis, fruitless gish-gallop will continue…

The fact that Zeke wrote that his study doesn’t include systematic bias, does not change the fact that high frequency local systematic bias effects are considered. I’m quite sure he would agree with me on this but I understand that people would be confused without further explanation. The analysis does not require additional assumptions as they are explicit in Pat’s paper.

It’d also be interesting for jstults to tell us what test he would find convincing,

An analysis that acknowledged and at least attempted to treat known, but uncorrected bias errors would be a good place to start; the example of section 4.2 in that paper I linked seems particularly germane.

Handwaving about the frequency or distribution of the bias errors is neat, but talk is cheap (heck, I tried to make much the same point myself). Claiming that an analysis answers questions that it can’t (bootstrap is a way to estimate the sampling distribution; perhaps it could be modified to treat known but uncorrected bias, but no one has done that yet) is a perfect way to let error fester. It does nothing to raise the level of discourse. The dog we’re left with, the suggestion that bias error shrinks with sample size, won’t hunt. Of course bias error does not depend on sample size.
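The sample-size point is easy to illustrate numerically. In the toy sketch below (the temperature, bias, and noise values are invented purely for illustration), random measurement noise averages away as n grows while a shared uncorrected bias does not:

```python
import numpy as np

rng = np.random.default_rng(7)
true_temp = 15.0    # hypothetical true temperature
bias = 0.5          # uncorrected systematic offset shared by every reading
noise_sd = 0.2      # random measurement noise; this part averages away

for n in (10, 100, 10_000):
    readings = true_temp + bias + noise_sd * rng.standard_normal(n)
    err = readings.mean() - true_temp
    print(f"n={n:6d}  mean error = {err:+.3f}")
```

The random component of the error shrinks as 1/sqrt(n), but the mean error converges to the bias (+0.5 here), not to zero, no matter how large n gets.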

Jeff, I’m not sure how to respond to your last comment; it strikes me as self-contradictory.

The systematic bias in Pat’s paper is due primarily to sun and wind loading. There is also systematic bias created by UHI and site moves. These are vastly different effects when looking at multiple measurements and trends. One may be reduced as 1/sqrt(n); the other may not. Zeke’s comment doesn’t pertain to a large number of stations experiencing weather-related systematic error, but rather to things which can affect trend. His analysis would certainly show large changes in the monthly mean if the systematic effects of sun and wind didn’t reduce with increased sampling. It is also fairly easy to imagine that not everywhere is sunny on the same day. Sounds silly, but that is what non-shrinking error bars mean. In the case of UHI-style long-term effects, such an assumption may be accurate, though. I may actually do that post someday.

Hopefully that gets you thinking in the right direction.

Currently Pat is claiming that his ‘magnitude error’ is not the error in mean. His paper says it is the error in mean, I’m trying to get him to parse the contradiction. If you are interested in raising the level of discourse, perhaps you should read that one and understand where this discussion has gone.

[…] We’ve been discussing Pat Franks recent temperature uncertainty publications for quite some time on this other thread. I’m not pleased to say that nearly zero ground has been made in understanding the problems in this work and even some very sharp people have missed the mark. I’ll give it an hour this morning, and work away at it for a bit until I’m finished. Math isn’t always boring but we have already covered this stuff so you guys might not like it. The paper being discussed is online here. […]

Mark T said

I just rolled through for a quick check (since a recent post was on the list) and noticed the high frequency comments again. I just want to remind everyone that “high frequency random noise” implies correlated samples. Indeed, in order for Gaussian/normal distributions to be uncorrelated (and thus, independent) they must be white, i.e., cover the entire spectrum equally. I believe the same will hold for many (if not all) distributions though I have only tested a handful.

Just a thought…

Oh, Jeff, I played around with some Cauchy data the other day on a whim. I was able to confirm the notion that increasing sample size increases variance in the parameter estimation. Very interesting. The extreme values are the reason (from an empirical standpoint) a mean never converges.
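Mark T's Cauchy observation is easy to reproduce. A minimal sketch (not his actual code; the seed and trial counts are arbitrary choices) compares how the spread of the sample mean behaves for normal versus Cauchy draws as the sample size grows, using the interquartile range since the Cauchy variance is undefined:

```python
import numpy as np

rng = np.random.default_rng(42)

def iqr_of_sample_means(sampler, n, trials=2000):
    """Interquartile range of the sample mean across many trials."""
    means = sampler((trials, n)).mean(axis=1)
    q75, q25 = np.percentile(means, [75, 25])
    return q75 - q25

for n in (10, 1000):
    iqr_normal = iqr_of_sample_means(rng.standard_normal, n)
    iqr_cauchy = iqr_of_sample_means(rng.standard_cauchy, n)
    print(f"n={n:5d}  normal IQR={iqr_normal:.3f}  cauchy IQR={iqr_cauchy:.3f}")
```

For the normal case the IQR of the sample mean shrinks roughly as 1/sqrt(n); for the Cauchy case it stays put, because the sample mean of n standard Cauchy variates is itself standard Cauchy, so the mean never converges.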

Carrick said

I just rolled through for a quick check (since a recent post was on the list) and noticed the high frequency comments again. I just want to remind everyone that “high frequency random noise” implies correlated samples. Indeed, in order for Gaussian/normal distributions to be uncorrelated (and thus, independent) they must be white, i.e., cover the entire spectrum equally. I believe the same will hold for many (if not all) distributions though I have only tested a handful.

It’s pretty easy to see this is a general result. A process with a Dirac delta autocorrelation has a flat (“white”) Fourier transform (this is a trivial result of course).

Since the Fourier transform is a linear operator whose inverse exists and is unique, the only spectral distribution that will yield a Dirac delta autocorrelation function must be white noise. Any other spectral distribution must have a finite-width nonzero autocorrelation function. (Of course that’s saying a different thing than saying the amount of correlation in the noise “matters” for a given measurement.)
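This is also straightforward to check numerically. The sketch below (my own construction, not Carrick's) low-passes white noise with a simple moving average, which removes high frequencies and, as predicted, introduces sample-to-sample correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4096
white = rng.standard_normal(n)

# Low-pass the white noise with a length-8 moving average; removing the
# high-frequency content makes neighboring samples correlated.
kernel = np.ones(8) / 8
smooth = np.convolve(white, kernel, mode="same")

def lag1_corr(x):
    """Sample autocorrelation at lag 1."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

print("white noise lag-1 autocorrelation:  ", lag1_corr(white))
print("band-limited lag-1 autocorrelation: ", lag1_corr(smooth))
```

The white-noise lag-1 autocorrelation is near zero; the band-limited series shows strong positive correlation (for a length-8 moving average the theoretical lag-1 value is 7/8).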

Pat Frank said

What you call an error in this case is not an error and the “contradiction” is not a contradiction. The problem evaporates when one realizes that “uncertainty” in science can have a broader meaning than it does in statistics.

Pat Frank said

#297 and #193, Jstults, sorry, I didn’t see your question before. You asked, “if the predictors (weather) varied (in space) in an uncorrelated way at each station, would you expect the systematic error to reduce with the normally expected 1/sqrt(n) behavior?”

That’s a good question. In my view, the short answer is, ‘yes, but only if the distribution of systematic bias errors could be shown to be random.’

“Uncorrelated” is a tricky concept, and I may not have it right here. But in an empirical science, uncorrelated merely means that the value of one observable does not depend in any way on the value of another observable. In that case, “uncorrelated” doesn’t say anything about the distribution of the magnitudes of the observables.

If the magnitudes are due to systematic (deterministic) effects, I don’t see any reason to assume they have a random distribution. In the absence of an organizing and valid theory, one would have to measure the effects in some statistically valid way to empirically determine their distribution in time and space.

The specific biases we’re talking about here arise from the systematic effects on temperature sensors. There is no theoretical reason to suppose that systematic biases are randomly distributed across time or space, even if they are uncorrelated across locations. To impose this condition on the data would require a falsifiable meteorological theory saying something to the effect that the net weather on planet Earth sums to zero over some climatologically relevant time period — say one year for an annual global average temperature.

If the local surface weather effects, summed over Earth, averaged to a mean of zero weather, we could suppose that the systematic temperature biases would also sum to zero, but even then only if a statistically valid distribution of temperature sensors had been deployed across the surface.

In the absence of a statistically valid distribution of sensors, the weather effects would be asymmetrically sampled and the systematic biases would not average away, even if the total surface weather summed to zero.

Even in this case, under these less than ideal circumstances, the mean systematic bias and its SD might be estimated and reduced using some rational approach, but they would not be zero.

So, my answer is yes, but, and yes, but. And I could be wrong.

But honestly, I don’t think that set of criteria will ever be met for the 20th century temperature record, which, after all, is the record of relevance for any causative conclusion about recent climate warming.
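Pat's asymmetric-sampling argument can be illustrated with a toy simulation (entirely hypothetical numbers; `station_bias` is an invented stand-in for per-site systematic error). Even when the biases sum to exactly zero over the whole network, a sample that over-represents positively biased sites retains a residual bias that does not average away:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical network: station biases drawn from a zero-mean distribution,
# then centered so the global sum of bias effects is exactly zero.
n_stations = 10_000
station_bias = rng.normal(0.0, 0.3, n_stations)
station_bias -= station_bias.mean()

# Statistically valid (symmetric) sampling: the bias averages toward zero.
sample = rng.choice(station_bias, 1000, replace=False)
print("symmetric sample residual bias: ", sample.mean())

# Asymmetric sampling: oversample positively biased sites (e.g. sunny,
# sheltered locations). The residual bias does not average away.
weights = np.where(station_bias > 0, 3.0, 1.0)
weights /= weights.sum()
skewed = rng.choice(station_bias, 1000, replace=False, p=weights)
print("asymmetric sample residual bias:", skewed.mean())
```

The symmetric sample's residual bias is consistent with zero, while the asymmetric sample retains a residual of roughly a tenth of a degree here, and no increase in the number of sampled stations removes it as long as the sampling stays skewed.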
