Wednesday, January 14, 2009

David Stockwell has analyzed the frequency of the final digits in the temperature data by NASA's GISS led by James Hansen, and he claims that the unequal distribution of the individual digits strongly suggests that the data have been modified by a human hand.

With Mathematica 7, such hypotheses take a few minutes to be tested. And remarkably enough, I must confirm Stockwell's bold assertion although - obviously - this kind of statistical evidence is never quite perfect and the surprising results may always be due to "bad luck" or other explanations mentioned at the end of this article.

Update: Steve McIntyre disagrees with David and me and thinks that there's nothing remarkable in the statistics. I confirm that if the absolute values are included, if their central value is carefully normalized, and the anomalies are distributed over just a couple of multiples of 0.1 °C, there's roughly a 3% variation in the frequency of different digits which is enough to explain the non-uniformities below. However, one simply obtains a monotonically decreasing concentration of different digits and I feel that they have a different fingerprint than the NASA data below. But this might be too fine an analysis for such a relatively small statistical ensemble.

This page shows the global temperature anomalies as collected by GISS. It indicates that the year 2008 (J-D) was the coldest year in the 21st century so far, even according to James Hansen et al., a fact you won't hear from them. But we will look at some numerology instead.

Looking at those 1,548 figures

Among the 129*12 = 1,548 monthly readings, you would expect each final digit (0..9) to appear 154.8 times or so. That's the average statistics and you don't expect that each digit will appear exactly 154.8 times. Instead, the actual frequencies will be slightly different than 154.8. How big is the usual fluctuation from the central value?

Well, the rule is that the abundance of each digit, centered at N=154.8, obeys the normal distribution whose standard deviation is roughly sqrt(154.8). Well, it's actually sqrt(139.32), as argued below, but let's avoid unimportant complications here. It means that if you compute the average value of e.g. "(N_i-154.8)^2" over "i" going between 0 and 9, you should again obtain 154.8.

It's not zero, proving that some deviations from the "quotas" are inevitable. On the other hand, it is pretty small for a square of the difference of rather large numbers. And the value seems to be precisely determined. If you generate a random list of 1548 digits between 0 and 9, the average value of "(N_i-154.8)^2" over 10 digits will be remarkably close to 150 or so.

I've played this random game many times, to be sure that I use the correct statistical formulae.
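The random game can be replayed in a few lines. What follows is only a minimal Python sketch, not the Mathematica notebook: the function names, the seed, and the number of trials are my own illustrative choices. It draws 1,548 uniformly random digits, computes the average of "(N_i-154.8)^2" over the ten digits, and averages that statistic over many fake datasets.

```python
import random
from collections import Counter

N_READINGS = 1548            # 129 years x 12 months, as in the text
EXPECTED = N_READINGS / 10   # 154.8 occurrences per digit


def mean_squared_deviation(counts):
    """Average of (N_i - 154.8)^2 over the ten digits 0..9."""
    return sum((counts[d] - EXPECTED) ** 2 for d in range(10)) / 10


def random_dataset_statistic(rng):
    """Statistic for one fake dataset of uniformly random final digits."""
    digits = rng.choices(range(10), k=N_READINGS)
    return mean_squared_deviation(Counter(digits))


def average_statistic(trials=2000, seed=0):
    """Average the statistic over many fake datasets (seed is arbitrary)."""
    rng = random.Random(seed)
    return sum(random_dataset_statistic(rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(average_statistic())
```

With a couple of thousand trials, the printed average should land near 139.32, the 154.8*0.9 figure quoted in this article, rather than at 154.8 itself.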

Because the hypothetical bias that was introduced was man-made and men such as James Hansen usually care what a number looks like, we should look at the final digit of the absolute value of the temperature anomaly, expressed in units of 0.01 °C. You still expect 154.8 copies of each digit because a random temperature anomaly is expected to have a quasi-uniform distribution over intervals whose length clearly exceeds 0.1 °C.

And indeed, the "man-made" signal seems to be much stronger if you take the absolute value of the GISS figures. It turns out that if Stockwell's explanation is correct, Hansen's team loves 0 and 1 as the final digits but they abhor 4 and 7. ;-) For the absolute values, the numbers for 0-9 are

186, 178, 170, 157, 130, 170, 147, 131, 137, 142

Once again, these numbers were expected to be close to 154.8. The higher frequency of the digit 0 (186) can be explained by rounding if it ever occurred; the high frequency of 1 (178) and the low frequencies of 4 (130) and 7 (131) don't admit such an explanation.

At any rate, the average value of "(N_i-154.8)^2" over 10 digits "i" in the GISS absolute value data set is 370.16, much larger than 154.8. I've repeated the same exercise 10,000 times (i.e. 10,000 fake GISS datasets) with random digits to determine that the probability that the expression defined in the previous sentence exceeds 370 is smaller than 0.5%: roughly 40 events (fake datasets) out of 10,000 have this property.
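For readers without Mathematica, here is a hedged Python sketch of the same test. It recomputes the statistic from the ten counts listed above and estimates, with fake datasets of random digits, how often the statistic exceeds the observed value. The function names and the seed are assumptions of this sketch, and the estimated probability fluctuates from run to run because it rests on only a few dozen rare events.

```python
import random
from collections import Counter

# Final-digit counts for digits 0..9 of the absolute GISS values, from the text.
GISS_COUNTS = [186, 178, 170, 157, 130, 170, 147, 131, 137, 142]


def mean_squared_deviation(counts):
    """Average squared deviation of the counts from their uniform expectation."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 for c in counts) / len(counts)


def tail_probability(threshold, n_readings, trials=10_000, seed=1):
    """Fraction of random-digit datasets whose statistic exceeds threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tally = Counter(rng.choices(range(10), k=n_readings))
        counts = [tally[d] for d in range(10)]
        if mean_squared_deviation(counts) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    observed = mean_squared_deviation(GISS_COUNTS)
    print(round(observed, 2))  # 370.16
    print(tail_probability(observed, sum(GISS_COUNTS)))
```

The second printed number is the Monte Carlo estimate of the probability discussed above; it should be of the order of a few tenths of a percent.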

In fact, my statistics were good enough to determine that the correct expected average value of "(N_i-154.8)^2" over 10 digits "i" shouldn't be 154.8 but rather 154.8*0.9 which is actually 139.32. ;-)

The reduced result is essentially due to the fact that the abundance of one of the digits is not random but determined by the remaining nine: the fluctuations have to be reduced for the sum of the frequencies to equal 1,548. This detail makes things even worse for the hypothesis that the non-uniformities are due to chance.
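The factor of 0.9 can also be seen analytically. The short derivation below is a sketch under the assumption that each of the n = 1,548 final digits is independent and uniformly distributed, so that each count N_i is binomial with p = 1/10; the symbols n and p are mine.

```latex
N_i \sim \mathrm{Binomial}(n, p), \qquad n = 1548, \quad p = \tfrac{1}{10}

\mathbb{E}\left[(N_i - np)^2\right] = \mathrm{Var}(N_i) = np(1-p)
  = 154.8 \times 0.9 = 139.32

\sum_{i=0}^{9} \frac{(N_i - np)^2}{np} \;\approx\; \chi^2_{9}
\quad\Longrightarrow\quad
\mathbb{E}\left[\frac{1}{10}\sum_{i=0}^{9} (N_i - np)^2\right]
  = \frac{np}{10} \cdot 9 = 139.32
```

The nine degrees of freedom in the last line are the same statement as the constraint that the ten frequencies must add up to 1,548.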

I could calculate all these numbers analytically, too, but it is safer to do the actual Monte Carlo statistics to avoid mistakes in the statistical formulae. The qualitative summary of our particular statistical test would be identical, and it is this:

Using the IPCC terminology for probabilities, it is virtually certain (more than 99.5%) that Hansen's data have been tampered with.

And that's the memo. ;-)

Mathematica 7 notebooks

But unlike the IPCC's adjectives, my adjectives and calculated probabilities follow standard statistical algorithms that you can fully check by downloading a few files (rather than the central value of "gut feelings" of a few extremist would-be scientists). The evidence is of statistical character and it is not rigorous. But it exists.

at least for the 361 final monthly digits in the lower-troposphere global UAH MSU datasets (with signs removed).

The expected frequency per digit is 36.1 and the average squared oscillation should be 0.9 times 36.1. I got around 37 which is surely tolerable - almost exactly what is statistically expected (even though the detailed distribution would also indicate that UAH abhors 7 as the final digit). The probability that you get a result above 37 is comparable to 50%.

In plain English, I don't see any evidence of man-made interventions into the climate in the UAH MSU data. Unlike Hansen, Christy and Spencer don't seem to cheat, at least not in a visible way, while the GISS data, at least their final digits, seem to hint at their anthropogenic origin.

Rounding: Fahrenheit vs Celsius

Steve McIntyre has immediately offered an alternative explanation of the non-uniformity of the GISS final digits: rounding of figures calculated from other units of temperature. Indeed, I confirmed that this is an issue that can also generate a non-uniformity, up to 2:1 in the frequency of various digits, and you may have already downloaded an updated GISS notebook that discusses this issue.

The assumption of this alternative theory is that the initial temperature, in Fahrenheit degrees (with a sensible distribution), is written up to 0.01 °F (and maybe, it could be 0.1 °F or averages of readings accurate up to 1 °F could play a role, too). At this point, it doesn't matter whether you round it to the closest or the nearest lower multiple of 0.01 °F because the distribution is still uniform. However, now you multiply the rounded Fahrenheit reading by 5/9 and round the result to a multiple of 0.01 °C.

At this point, it matters how you round it. If you choose the closest integer multiple of 0.01 °C, you will see that zeros are underrepresented. On the other hand, if you truncate the number to the floor, the nearest lower multiple of 0.01 °C, you will see that the final digits 4,9 are underrepresented, roughly by a 2:1 ratio to other digits.

It doesn't matter where the temperature distribution is centered because 0.9 °F = 0.5 °C happen to be rational numbers, so this is the periodicity after which both Fahrenheit and Celsius last digits start to repeat. They are synchronized.
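The rounding effect is easy to reproduce with exact integer arithmetic. The sketch below is a toy model under my own assumptions, not the GISS procedure: the readings are non-negative multiples k of 0.01 °F, each is multiplied by 5/9, and the result is either rounded to the nearest multiple of 0.01 °C or truncated to the floor. The function name and its parameters are hypothetical.

```python
from collections import Counter


def last_digit_counts(rounding, n_values=90):
    """Tally final Celsius digits for readings k * 0.01 degF, k = 0..n_values-1.

    90 consecutive values span 0.9 degF = 0.5 degC, the period named in the text.
    """
    tally = Counter()
    for k in range(n_values):
        if rounding == "nearest":
            # floor(5k/9 + 1/2) in integers; exact ties cannot occur here.
            hundredths_c = (10 * k + 9) // 18
        elif rounding == "floor":
            hundredths_c = (5 * k) // 9
        else:
            raise ValueError("rounding must be 'nearest' or 'floor'")
        tally[hundredths_c % 10] += 1
    return [tally[d] for d in range(10)]


if __name__ == "__main__":
    print("nearest:", last_digit_counts("nearest"))
    print("floor:  ", last_digit_counts("floor"))
```

In this toy model, truncation halves the share of the digits 4 and 9, as described above. Rounding to the nearest multiple halves the share of 0 and, under these particular assumptions, of 5 as well.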

I can't get 4,7 underrepresented but there may exist a combination of two roundings that generates this effect. If this explanation is correct, it is the result of a much less unethical approach by GISS than the explanation above. Nevertheless, it is still evidence of improper rounding.

Non-uniform distributions

Another reason for the non-uniform representation of different digits could be the fact that the standard deviation of the temperature during 120 years is simply not large enough for the digits counting 0.01 °C to be sufficiently uniform.

For example, if you follow Steve McIntyre and study rounded (to the closest integer) absolute values of numbers that are normally distributed with the standard deviation 100 and centered around 0, you will get about 10% for the digit "0" but the expected share of digits "1" to "9" monotonically decreases from 10.3% to 9.7%. Similar effects, perhaps with some shifting (or even a smaller standard deviation, relative to the "decade" where the 0.01 °C digit tests all possible values), could be enough to explain the "data" here.
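These shares can be computed without any sampling, directly from the normal cumulative distribution function. The following Python sketch assumes exactly the setup of the previous paragraph (mean 0, standard deviation 100, rounding to the closest integer, then taking the absolute value); the function names and the cutoff at 20 standard deviations are my own choices.

```python
import math


def normal_cdf(x, sigma):
    """CDF of a zero-mean normal distribution with standard deviation sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def final_digit_shares(sigma=100.0, max_value=2000):
    """Expected share of each final digit of |round(X)| for X ~ N(0, sigma^2)."""
    shares = [0.0] * 10
    for m in range(max_value + 1):
        # Probability that X rounds to the integer m.
        p = normal_cdf(m + 0.5, sigma) - normal_cdf(m - 0.5, sigma)
        if m > 0:
            p *= 2.0  # +m and -m give the same absolute value
        shares[m % 10] += p
    return shares


if __name__ == "__main__":
    for digit, share in enumerate(final_digit_shares()):
        print(digit, f"{100 * share:.2f}%")
```

The printed shares for the digits 1 to 9 should fall monotonically from about 10.3% to about 9.7%, with the digit 0 very close to 10%.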

However, the particular theory by Steve from the previous paragraph still excludes the GISS distribution at the 98.7% confidence level (it would be "extremely likely" that something other than his "null hypothesis" is going on). I think that Steve is making a mistake with his "percentile" functions if he believes that it is only 75%. The updated GISS Mathematica notebook shows you how I got the figures.

To summarize, I don't claim that there exists a final proof of a mistreatment. But there exists a statistical argument to consider this possibility. There are several conceivable and stronger arguments that would convince me that Stockwell's hypothesis is false, especially

a nearly complete theory of rounding (or a priori non-uniform distribution for the digits) that leads to statistical predictions that are compatible with the observed distributions

a complete reconstruction of the final digits by repeating automatic algorithms that don't contain any manual editing.

Unfortunately, an argument based on the assumption of complete scientific integrity of the boss of GISS won't convince your humble correspondent at this moment.

Finally, whether or not David's (or my) suspicion in this case is justifiable and/or correct, his is a good test to reveal human interventions that should work for large enough datasets with large enough standard deviations. If someone claims to have memorized 1,548 digits of Pi and "7" only appears 100 times in his version of Pi while "1" appears 200 times, it's clear that you shouldn't trust the person even if he tells you that he works with 27 artists from the whole European Union!