Phil Jones was Wrong

Wrong about what, you wonder? During an interview for the BBC he was asked, “Do you agree that from 1995 to the present there has been no statistically-significant global warming?” Jones replied, “Yes, but only just.”

Sorry, Phil, I disagree.

Jones based his statement on straightforward analysis of the data and reported his result honestly. The HadCRUT3v data set (which is from Jones’s institution) shows warming since 1995, but it’s not statistically significant. But then, climate trends over such short time spans (yes, for climate 15 years is a short time span) don’t usually pass statistical significance tests.

But by removing the influence of exogenous factors like el Nino, volcanic eruptions, and solar variation (or at least, approximations of their influence) we can reduce the noise level in temperature time series (and reduce the level of autocorrelation in the process). This enables us to estimate trend rates with greater precision. And when you do so, you find that yes, Virginia, the trend since 1995 is statistically significant. And that’s true for all 5 major data sets, including the HadCRUT3v data set from Phil Jones’s organization.
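The removal step can be sketched as an ordinary multiple regression: include the exogenous factors as regressors alongside the linear trend, and the trend coefficient is estimated with a smaller standard error. Here is a minimal illustration on synthetic data — the factor series, coefficients, and noise level are all invented for the sketch, not the actual ENSO, volcanic, or solar series used in the analysis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 180                      # 15 years of monthly data (synthetic)
t = np.arange(n) / 12.0      # time in years

# Made-up stand-ins for exogenous factors, plus a true trend of 0.017 deg/yr
enso = np.sin(2 * np.pi * t / 3.7)     # hypothetical ENSO-like index
volc = np.exp(-((t - 5.0) ** 2))       # hypothetical volcanic cooling pulse
temp = 0.017 * t + 0.10 * enso - 0.15 * volc + rng.normal(0, 0.08, n)

def trend_and_se(y, X):
    """OLS fit; return the trend coefficient (column 1) and its std. error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    sigma2 = np.sum((y - X @ beta) ** 2) / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

ones = np.ones(n)
b_raw, se_raw = trend_and_se(temp, np.column_stack([ones, t]))
b_adj, se_adj = trend_and_se(temp, np.column_stack([ones, t, enso, volc]))
print(f"raw: {b_raw:.4f} +/- {2*se_raw:.4f} deg/yr")
print(f"adjusted: {b_adj:.4f} +/- {2*se_adj:.4f} deg/yr")
```

With the factors regressed out, the same trend is pinned down with a visibly smaller 2-sigma error bar — which is exactly why significance can emerge over spans where the raw data fail the test.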

Let’s analyze the adjusted (exogenous-factor-compensated) data we got from our last analysis. We’ll first fit a trend line since 1975 to estimate the autocorrelation of the noise. Then we’ll estimate the warming rate (and its probable error) by linear regression on the data from various start years to the present, for all start years from 1975 to 2005.
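The start-year scan can be sketched as follows — again on synthetic "adjusted" data rather than the real series, and with the ARMA(1,1) noise correction used in the actual analysis omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1975, 2011)    # annual means 1975-2010 (synthetic)
temp = 0.017 * (years - 1975) + rng.normal(0, 0.09, years.size)

rates = {}
for start in range(1975, 2006):
    m = years >= start
    t = (years[m] - start).astype(float)
    X = np.column_stack([np.ones(t.size), t])
    beta, *_ = np.linalg.lstsq(X, temp[m], rcond=None)
    resid = temp[m] - X @ beta
    sigma2 = resid @ resid / (t.size - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    rates[start] = (beta[1], 2.0 * se)   # rate and 2-sigma half-width

for start in (1975, 1995, 2005):
    rate, bar = rates[start]
    label = "significant" if rate - bar > 0 else "not significant"
    print(start, f"{rate:.4f} +/- {bar:.4f} deg/yr", label)
```

The error bars widen as the span shrinks, so the shorter the start-year-to-present span, the harder it is for even a steady trend to pass the 2-sigma test.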

Here’s the adjusted data based on the HadCRUT3v data set:

Here are the warming rates estimated from each starting year to the present:

Note that until 2001, the error bars (which are 2-sigma) don’t include the possible value zero. Therefore, until 2001 the warming is statistically significant (using the usual de facto standard in statistics).

Not only is the warming since 1995 statistically significant, the warming since 2000 is as well!

The same result emerges from the analysis of adjusted data from NASA GISS:

Certainly, we shouldn’t fault Jones for not estimating the influence of exogenous factors when computing warming rates and their statistical significance. His analysis was straightforward and standard, and his statement was scientifically conservative (which is a common practice among scientists).

Just as certainly, we should fault those who claim that Jones’s “not statistically significant” statement means “not warming.” For most of those who do so, it’s not even an honest mistake.

69 responses to “Phil Jones was Wrong”

Thanks – I can’t begin to estimate the number of times that I’ve had to explain, yet again – and quite often to the same ‘usual suspects’ – that “not quite statistically significant warming” is not the same thing as “no significant warming.”

I think most folks actually get the difference, but for some it’s apparently another ‘inconvenient truth.’

Oh, it’s worse than that. Most of the “skeptics” I’ve had to deal with equate “not quite statistically significant warming” with NO warming. And when you note that this is incorrect, they just point to the infamous Daily Mail headline and say, “See, there it is in the newspaper, you’re wrong.”

Sorry, I know this is an old post but I have to add: I had an all-out brawl with Lubos Motl himself, on Peter Sinclair’s YouTube channel, over this issue. Lubos was literally saying that if the warming did not rise to statistical significance then there was NO warming. He absolutely refused to acknowledge that there would be little difference between a 95% confidence level and a 94% confidence level.

It might be fun for someone to construct a statistics spreadsheet for global monthly temperatures and plot the time it has taken to reach the 95% significance level at monthly intervals :-). This way it would also help the deniers, who could ask: “has the global temperature risen statistically significantly since… (I don’t know – August 1997?)”

Not in my estimation, using monthly data and the methods outlined by Tamino in the post “How long” (and several posts on ARMA(1,1) correction). For HadCRUT3v the trend is significant since 1994; see this figure:

[Response: Only by removing the influence of exogenous factors (el Nino, volcanic, solar) can you show statistical significance post-1995.]

Hi,
by coincidence, I was wondering aloud on RC yesterday about the Phil Jones quote, and how now that the 2010 data are in, a naive linear fit to the annual means for HadCRUT3v 1995-2010 (inclusive) gives a warming (0.11 ± 0.10 deg/decade) just barely within the 95% confidence level (p=0.041). Being a Bear of Very Little Stats Skills, that’s about as far as I get. Shall I take it that there is enough auto-correlation in the annual means to nudge the confidence down below the magic 95%?

[Response: Yes. In fact the autocorrelation is small enough that you can’t even establish it with statistical significance from such a small data set. But from study of the monthly data, we know it’s there.

And by the way, 95% is often thought of as “magic” but it’s actually arbitrary. It’s certainly useful! But it’s not magic.

All of which serves to emphasize the many uncertainties that are often unappreciated in statistical analysis. That’s why I like to divide statistical results into three categories: 1. no way; 2. yes way; and 3. things that make you go “hmmm…”]
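For readers wanting a feel for the size of the autocorrelation effect: a common rule-of-thumb correction for AR(1) noise inflates the trend variance by (1+φ)/(1−φ). This is a simplification of the ARMA(1,1) treatment used in the post, and the numbers below are hypothetical, not estimates from real data:

```python
import math

# Rule-of-thumb AR(1) correction: trend variance is inflated by (1+phi)/(1-phi).
# phi = 0.2 is a hypothetical lag-1 autocorrelation, not estimated from data.
n, sigma, phi = 16, 0.09, 0.2          # e.g. 16 annual means, 1995-2010
t_sd = math.sqrt((n * n - 1) / 12.0)   # std. dev. of evenly spaced time values
se_white = sigma / (t_sd * math.sqrt(n))          # white-noise slope std. error
se_ar1 = se_white * math.sqrt((1 + phi) / (1 - phi))
print(f"white-noise SE: {se_white:.4f}, AR(1)-corrected SE: {se_ar1:.4f}")
```

Even a modest positive autocorrelation widens the error bar enough to push a borderline result just below the 95% threshold.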

Could we see the warming rate graphs also for non-adjusted data? Maybe even plotting both on the same graph to see the difference?

I think this kind of warming rate/trend start year graph is maybe the clearest and simplest way of showing how nonsensical the “warming stopped in xyz” argument is, all in a single graph! (and maybe a few words of explanation for people who are not mathematically inclined)

Fredt34: are you the ‘Fred’ who translated that page?
I was looking for an email address to send a correction and couldn’t find one. In TreizeVents’ fourth figure, the curve fit to the residual trend is quadratic, not exponential. I confirmed this by reproducing the graphs myself, but it’s also apparent by visual inspection; the trend curve turns up at the left-hand end.

I have wanted to ask this question before. I don’t know much stats, and I may frame this poorly, but the question of statistically significant warming is whether you can exclude the null hypothesis of no warming. But if you have a long term significant trend shouldn’t the null hypotheses be that the trend has not changed? Is that sensible and how would you test that?

[Response: Very intelligent question.

From a purely statistical point of view, one can adopt either choice. Both enable us to estimate the probability distribution for getting the observed result (which is one of the real purposes of adopting a null hypothesis).

In my opinion, under the present circumstances it’s much more sensible to adopt “trend unchanged” as the null hypothesis than “trend zero.”

It’s also important to be aware (statistically, that is) that the null hypothesis isn’t just “trend unchanged” or “trend zero.” It’s “linear trend plus autocorrelated noise” or just “autocorrelated noise.” And we shouldn’t forget that rejecting the null can happen in multiple ways, and rejection only shows that *some* part of the null is incorrect.

Since both hypotheses are linear, in this case they end up being tested the same way. If the 95% confidence interval excludes the long-term trend we reject “trend unchanged,” if it excludes zero we reject “trend zero.” Some statisticians reject the idea of hypothesis testing altogether, and just state the confidence interval.

And of course, with very short time spans the confidence interval includes both the long-term trend and zero, so we couldn’t reject either null. That’s a sure sign of trying to read too much into a brief span of data.

Finally, I’ll mention that there are other (unaccounted-for) uncertainties, for instance, uncertainty in the estimation of autocorrelation parameters, and the fact that we’ve tested all starting values (in integer years) from 1975 to 2005 so we have multiple chances to “reject the null.” So, the error bars in these plots should be a bit wider than they are and any null rejection comes with a few grains of salt. But for data since 1995, the value zero is far enough outside the confidence interval that we can be confident rejecting the “trend zero since 1995” hypothesis.]
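The two null hypotheses reduce to two interval checks on the same confidence interval. A minimal sketch, with hypothetical numbers chosen to mimic the situation described above (zero excluded, long-term rate not excluded):

```python
# Hypothetical numbers, not results from any real data set.
rate, half_width = 0.011, 0.010   # trend since 1995 and 2-sigma bar, deg C/yr
long_term = 0.017                 # long-term (since 1975) rate, also hypothetical

lo, hi = rate - half_width, rate + half_width
reject_trend_zero = not (lo <= 0.0 <= hi)          # "no warming" null
reject_trend_unchanged = not (lo <= long_term <= hi)  # "trend unchanged" null
print(reject_trend_zero, reject_trend_unchanged)   # True False
```

With these numbers we would reject "trend zero" but not "trend unchanged" — warming is significant, and there is no evidence the rate has changed.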

I recently got into a very frustrating argument about the Phil Jones quote, and I have to say, as interesting as I find this post it is entirely the wrong response for those who misunderstood Phil. After all, is Tamino now admitting that it hasn’t warmed since 2001? (No.)

My take on the question of the null hypothesis is:

1. There is no “one true null hypothesis” out there that is logically or empirically determined and if we haven’t proven it wrong it is right. That is to say, don’t get too hung up on it.
2. “Statistically insignificant” does not mean “accept the null.” Statistical insignificance is always achievable with little enough data; only a statistically significant result tells you something. As Tamino shows, warming over the most recent few years is not statistically significant. And that will always be true of a short enough span, no matter how long warming continues.
3. For choosing a null hypothesis, I think of Occam’s razor — in the absence of evidence, the simplest answer wins. So a good rule of thumb is, after my hypothesis, what’s the next simplest possibility? (Not necessarily the very simplest possibility — you don’t want to make your test too easy.) So if you’re trying to establish that there is a trend, you test against no trend. If you are trying to show that warming is speeding up or slowing down/reversing, you test against trend unchanged.

Funny that so many people claim warming is slowing down/stopping/cooling, but no one ever tries to demonstrate that with statistics.

Eric, People get WAY too hung up on the whole “null hypothesis”. Fisher really only introduced it because statistical tests were inherently comparative, and if you were interested in the truth or falsehood of a given hypothesis, there was no way to treat that.

All the null does is give you an alternative to which you can compare. In most analyses, it is in fact very unlikely that the null is true. All the failure of a significance test tells you is that your description of the data with your hypothesis isn’t significantly better than it is without. Moreover, the level of “significance” is arbitrary (usually set at 2 sigma), and this also introduces a whole new devil’s playground for dishonest operators like Lindzen to play in.

Most of the denialists are innumerate idiots. Those who are savvy, are only interested in lying with statistics rather than teasing out the truth with them. They’re a sad, sad group.

[Response: There’s another useful attribute (in most cases) of the null hypothesis: it’s simple enough that it enables us to *calculate* what the probability of various outcomes is. Only then can we evaluate how likely or unlikely a given result is.

I think null hypotheses are often useful, sometimes even important. For example, the null hypothesis “This treatment has no effect on the progress of the disease” really cuts to the heart of the matter. But I agree that there’s too much emphasis on null hypotheses (and too little awareness of their implications!).]

Good points, but I think there’s a simple answer to your observation about the lack of efforts to use proper statistics to demonstrate “cooling.”

To wit, those who don’t understand statistics at all, can’t, and those who do understand statistics recall the lawyer’s precept not to ask questions when their answers are likely to hurt your case, and don’t.

After all, is Tamino now admitting that it hasn’t warmed since 2001? (No.)

Just to clarify, is that Tamino’s (No.)?

On Kevin’s comment, as a lay person of average intelligence, I was able to use Wood for Trees to help a skeptic better understand Tsonis and Swanson’s “shift”. Tsonis and Swanson admit the “shift” disappears if one uses GISTEMP. I’m starting to doubt that HadCRUT is the better of the two series, and if people don’t present both series in a claim, I somewhat doubt they are saying anything useful; in fact, I’m starting to think they are being intentionally deceptive.

Ray, agreed, but “2 sigma is arbitrary” is the wrong response to this sort of thing too — after all, even with Tamino’s corrections, warming since 2002 hasn’t been significant at the 1 sigma level, so does that mean it has stopped?

The argument I make is: the 30-year trend is significant, so it’s warming, unless you have better evidence. But an insignificant result is not valid evidence for making any case; only a statistically significant result putting a low upper bound on recent warming would make that case. You don’t have that? That’s because no one does.

Eric L., The point is that the null hypothesis is arbitrary. In practice you never accept the null hypothesis – you merely fail to reject it. After all, even 60% significance against the null shows that the null is likely not a good explanation either. People need to remember that the purpose of the null is merely to serve as a comparison for the hypothesis.

Again, this is just something people need to understand about science if they want to get by.

I have a less ambitious discussion about volcanic dimming, Nino34 and GISS trends here.

Your readers may be interested in my CTS.csv file which includes the 5 major global temperature anomaly series as well as Nino34, PDO, AMO, AO oscillations, CO2, SATO, stratospheric temperature, and volcano VEIs. 15 climate monthly time series all in one csv file so that do it yourself climate scientists can learn climate data analysis by doing without having to download and manipulate the source files.

If you can give me the link to the solar file you used, I will add it to CTS.csv so that your readers can have all the data they need to reproduce your analysis in Excel, R or whatever analysis package they prefer.

The thing that needs to be mentioned here though, is that Phil Jones was fielding a laundry list of questions from this BBC reporter. I’m sure he knew the reporter was not asking for an explanation that involved factoring out the various drivers over the last 15 years, though he could well have gone into that if he’d wanted to. He was just answering the question as it was presented, at face value.

[Response: Indeed. In fact, the title of this post is mainly tongue-in-cheek.]

Phil Jones has participated in several research papers in which components of natural internal variability in global temperatures have been removed, revealing the underlying global warming signal.

I cite one such paper, which includes some very nice graphs:

“Identifying Signatures of Natural Climate Variability in Time Series of Global-Mean Surface Temperature: Methodology and Insights”
DAVID W. J. THOMPSON, JOHN M. WALLACE, PHIL D. JONES, JOHN J. KENNEDY

ABSTRACT
Global-mean surface temperature is affected by both natural variability and anthropogenic forcing. This study is concerned with identifying and removing from global-mean temperatures the signatures of natural climate variability over the period January 1900–March 2009. A series of simple, physically based methodologies are developed and applied to isolate the climate impacts of three known sources of natural variability: the El Niño–Southern Oscillation (ENSO), variations in the advection of marine air masses over the high-latitude continents during winter, and aerosols injected into the stratosphere by explosive volcanic eruptions. After the effects of ENSO and high-latitude temperature advection are removed from the global-mean temperature record, the signatures of volcanic eruptions and changes in instrumentation become more clearly apparent. After the volcanic eruptions are subsequently filtered from the record, the residual time series reveals a nearly monotonic global warming pattern since ~1950. The results also reveal coupling between the land and ocean areas on the interannual time scale that transcends the effects of ENSO and volcanic eruptions. Globally averaged land and ocean temperatures are most strongly correlated when ocean leads land by ~2–3 months. These coupled fluctuations exhibit a complicated spatial signature with largest-amplitude sea surface temperature perturbations over the Atlantic Ocean.

Yep. Jones was set up there. The BBC included questions solicited from ‘skeptics’ in that interview, and that particular question came from Lindzen via Lubos Motl. In characteristic style, Motl brags about it thus:

In this thread, I discussed 1995 as opposed to 1994 because that’s the year that BBC asked Phil Jones about, and for a good reason. 1995 is the earliest year when the statistical significance of the trend from that year to 2009 safely fails. Since 1994, you could get a technically significant trend. It would still not be a robust result because a small change of the beginning year would destroy the statistical significance …

Not that I have a great reason to ask, but would it be advantageous to do your fit on the hemispheric scale first and then combine the two hemispheres? Different lags to exogenous factors in different hemispheres perhaps and a greater cooling effect from volcanoes in the NH. Just a thought?

[Response: It sounds like a very interesting experiment. In particular, the greater fraction of ocean vs land in the southern hemisphere might cause significantly different response between the hemispheres. Only the result will reveal whether or not it’s a significant improvement. Incidentally, the Ammann et al. volcanic forcing data are on a latitude-longitude grid, so one could also generate separate hemispheric averages (rather than use the global average).]

When you do a multiple regression like this, how do you deal with training and prediction so that you don’t regress over the whole period? For example, if I am trying to reconstruct natural variability components over the past 100 years, should I use the first 50 as a training period? Or is it preferable to do something like train on 2 years, predict the 3rd, then repeat twice using the alternate configurations (i.e. years 1 and 3 predict year 2, years 2 and 3 predict year 1)?

Thanks.

[Response: I’m not doing prediction or extrapolating to the past or future, I’m estimating the trend, so I do use the whole period.

If you are withholding a segment for validation, I recommend making the hold-out period a single contiguous block either at the beginning or the end of the time span.]

The Jones interview is an example of how a scientist can be led by an interviewer — and thereby mislead the public.

While the interviewer might not have intended it as such, given the intended audience, the question

“Do you agree that from 1995 to the present there has been no statistically-significant global warming?”

was certainly misleading (and those who provided the question most likely intended as much)

And Jones should have been prepared to better deal with it, but probably not by talking about “exogenous factors”, which might just have made some people think he was trying to avoid the question (or perhaps even think he’s a pervert. ~@:>)

What it means is that Jones should probably have rephrased the question before he answered it so most people would have a better idea what it really means — and, more importantly, what it does not mean.

Most members of the general public have not a clue what “statistically significant” means. In fact, most of them probably ignore the “statistically-significant” part entirely and hear/see only “no global warming”. Either that or their eyes glaze over and they tune out entirely when they hear/see the word “statistic”.

In the interview, Jones did make the points that “The positive trend is quite close to the significance level. Achieving statistical significance in scientific terms is much more likely for longer periods, and much less likely for shorter periods. ”

But he really didn’t explain what that means and does not mean (ie, only a statistician could love Jones’ answers, especially “Yes, but only just [no statistically-significant global warming]”)

He might have stated in simple English that, in general one can’t gauge what is going on with temperatures (increasing, decreasing or remaining flat) over short time periods to the same level of confidence as for longer time periods (a few decades) but that this does not mean that scientists can’t still say with relatively high confidence that global warming has continued since 1995.

The level of confidence for “global warming since 1995” is a little less than the normal “standard” (satisfied over the longer term), but not by much and it really should make no difference in the grand scheme of things.

FWIW, I think Prof. Jones answered the question exactly as a scientist should, i.e. directly and without hesitating to give an answer which could be viewed as going against his scientific position. Scientists first and foremost are there to advance knowledge in their field; if they are good at communicating their work to the general public, that is a bonus, nothing more. It is unreasonable to expect every scientist to be sufficiently devious to anticipate every way their statements will be misconstrued by an adversarial press that isn’t bound by a duty to seek the truth (rather than controversy). That is an unfamiliar environment for most scientists, and one that most would be keen to avoid given the chance. Prof. Jones wasn’t given much of a choice in the circumstances.

Rephrasing a perfectly good scientific question would have been a bad thing to do as it would have led (rightly) to the criticism of evasion. It is a slippery slope from science into rhetoric and I don’t blame him for not taking even the first step (unlike many).

BTW many scientists (and even statisticians) don’t fully understand what statistical significance means, it isn’t just the general public.

It is unreasonable to expect every scientist to be sufficiently devious to anticipate every way their statements will be misconstrued by an adversarial press that isn’t bound by a duty to seek the truth (rather than controversy).

Agreed, but it’s really not that difficult in this case to see what that question was intended for. That question was pretty clearly not asked of Jones to help the tech-savvy in the crowd understand that the 95% confidence interval about the trend from 1995 to date included the “0” trend (no warming), but so that Joe Public would read Jones’ answer as “confirming” that there was “no warming since 1995” and others would quote him as such (and on and on).

There’s no need to be devious at all, but maybe the suggestion to “rephrase the question” is misguided, since, as you point out, he was actually providing the best (most accurate) scientific answer to a scientific question.

But perhaps he might have gone into more detail about what it meant — and, particularly what it did not mean: that there has been no warming since 1995.

The real issue is that the audience for a journalist (and for the scientist talking to the journalist) is primarily not scientists, but the general public.

And even then, if all this stuff had no impact on public policy, scientists could stick strictly with scientific jargon and leave it to the journalists and/or the public to figure out what it all means in practical terms.

Unfortunately, given that it has a potentially large impact on policy and given all the misinformation being bandied about, scientists have to at least be aware of how their words might be used or even misused. In this case, the words don’t even have to be “twisted” because most people don’t have any idea about statistical significance and almost certainly read “no statistically-significant global warming” as “no global warming”.

But I don’t fault scientists for giving accurate truthful answers to questions. The ultimate responsibility lies with those who twist and otherwise misuse what they say.

I have to admit that I didn’t immediately read the question as being *that* loaded, it was only when I saw some of the silly interpretations in the blogs… As I said, it is a quite unnatural environment for most scientists.

The Jones interview is an example of how a scientist can be led by an interviewer — and thereby mislead the public.

While the interviewer might not have intended it as such, given the intended audience, the question

“Do you agree that from 1995 to the present there has been no statistically-significant global warming?”

was certainly misleading (and those who provided the question most likely intended as much)

Horatio, sorry, but you miss the point. The BBC did not provide that question themselves. They solicited questions from GW skeptics prior to the interview, and that particular one came from Richard Lindzen via Lubos Motl. They are *scientists* who fed the BBC an intentionally misleading question so that Jones would be made to look foolish in the eyes of the public.

One way that Jones could have answered simply and accurately, and still convey the import of the trend, would have been to state the probability that the trend since 1995 was not random.

If I recall correctly, the value he was working with was 0.94… In that case, telling people that there is a 94% likelihood that the increasing trend is not due to random chance would probably have convinced many people of the nature of what is occurring with global temperatures.

Yes. I believe the confidence level from 1995 – 2009 was at least 92%, so it just misses by a hair. Go back just one year and start at 1994, and that pushes it over the 95% mark; the main reason being that 1998 was an outlier that skews the starting portion of the period upwards.
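Loosely speaking, the quoted percentages correspond to how many standard errors the trend estimate sits away from zero. A minimal sketch using a normal approximation (strictly, a confidence level is not a "probability the trend is not random," but this is the usual informal reading):

```python
import math

def confidence_from_t(t_stat):
    """Two-sided confidence that the trend differs from zero
    (normal approximation to the t distribution)."""
    return math.erf(abs(t_stat) / math.sqrt(2.0))

# A trend estimate 1.9 standard errors above zero gives roughly the
# "94%" figure mentioned; 1.96 standard errors gives the usual 95%.
print(round(confidence_from_t(1.9), 3))    # 0.943
print(round(confidence_from_t(1.96), 3))   # 0.95
```

This also shows how thin the line is: moving the start year by one shifts the t statistic only slightly, yet carries the result across the arbitrary 95% threshold.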

You don’t say quite how you’ve calculated the error bars (apologies if you’ve said it elsewhere and I’ve missed it). Are you taking autocorrelation into account? And is there any dof adjustment for the noise removal?

[Response: Yes (based on modeling the noise as an ARMA(1,1) process) and yes.]

From eye-balling the above graphs for the adjusted data, it looks like the 2-sigma range about the trend starting in the vast majority of the years over the last 3 decades includes a warming trend of 0.2C/decade.

The “failure” to include 0.2C/decade warming appears to occur only with the HadCRUT data set, and then only for trends starting in just a few of those years: 1997 and 1998 (corresponding to the strong El Nino) and 2001 (for which the top of the 2-sigma range falls just below 0.2C/decade).

Horatio seems to recall a claim (or perhaps two) in the blogosphere of “IPCC Projections Falsified” using 0.2C/decade as the criterion … and a trend starting in 2001.

I’m pretty sure this is the first time (personally) that I’ve seen two WUWT ‘contributors’ having a go at each other. Interesting, because nearly everyone at WUWT has their own little pet theories, and they never seem to point out the problems with each other’s bold statements of ‘fact’ as long as they are anti-AGW.

Tamino, there is one thing that bothers me about this kind of analysis. At one level, the question is the familiar one – how much of the variance can be explained by factor x and factor y. However, this approach seems slightly different. A variable is calculated from multiple regression to remove the estimated effects of various known factors; then statistics are calculated on the regression. However, does the estimate of the significance of the regression allow for the error in estimating the known factors? Something doesn’t seem quite right to me.

Are you going to publish this?

[Response: Yes, and yes.

Suppose, as an example, that we removed the estimated influence of some factor but the data weren’t precise enough to determine it. Then the estimated influence would be zero (within the error limits), and we’d effectively be removing nothing, just analyzing the original uncorrected data. In such a case, removing the estimate wouldn’t do any good, but it wouldn’t do any bad either.

In fact even if the estimated influence were just noise, it would remove a *false* estimate, and that would make the estimated trend more imprecise. But it would also decrease the *degrees of freedom* in our analysis, and that increases the estimated uncertainty. So, our estimated trend would still be statistically valid (i.e., correct within its error limits).

The fact is that the exogenous factors which were removed *do* show a statistically significant response, so they improve the quality of our trend estimate. In fact the trend uncertainty is greatly reduced by removing the external factors.

I hope this makes things clearer; I know it’s not an intuitively obvious result.]
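The point about a noise-only factor can be checked directly: regressing out a covariate that has no real influence leaves the trend estimate essentially unchanged, at the cost of one degree of freedom. A synthetic sketch (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 120
t = np.arange(n) / 12.0
temp = 0.017 * t + rng.normal(0, 0.10, n)   # trend plus white noise
junk = rng.normal(0, 1, n)                  # a "factor" with no real influence

def trend_and_se(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]               # one dof lost per regressor
    sigma2 = np.sum((y - X @ beta) ** 2) / dof
    return beta[1], np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

ones = np.ones(n)
b_plain, se_plain = trend_and_se(temp, np.column_stack([ones, t]))
b_junk, se_junk = trend_and_se(temp, np.column_stack([ones, t, junk]))
print(f"without junk factor: {b_plain:.4f}, with junk factor: {b_junk:.4f}")
```

The two trend estimates agree well within their uncertainties, so including a worthless factor doesn't invalidate the analysis — it just fails to help.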

Thanks for that explanation. My first thought was to achieve the same thing by putting everything into a single model, including the trend (as another factor), and testing the significance of the various factors with a conventional ANOVA-type model. However, I see that this wouldn’t help with looking at the time range for significance, because you want to use the whole data set to estimate the effect of ENSO etc. but only part of the data set for estimating the trend.

If asked “Do you agree that from 2001 to the present there has been no statistically-significant global warming?”, based on your analysis above is the following answer possible (from length of error bar) and sensible?

Yes, but this means that
‘warming at a reduced rate’ is more likely than
‘warming has continued at the same rate or more’ and that is more likely than
‘no warming’.

[Response: Certainly not. It’s expected that the rate estimate will change as the error estimate grows — the likelihood that the estimate would stay exactly the same even though the true rate is unchanged is very small. Occam’s razor: warming has continued at the same rate.

Likewise, if the rate estimate had *increased* as the uncertainty grew, that’s really not evidence of acceleration — the rate remains unchanged is still most likely.

And: I regard GISS data as a more realistic estimate than HadCRU anyway.

Seriously: there’s no evidence that global warming has stopped, or even slowed.]

crandles, You are under a common misconception about what happens as sample size decreases. Your best estimate may well not change, or may even increase. However, when looking at significance, you are looking at the lower end of the confidence interval, and the confidence interval widens on both the top and bottom.

We know the question was a setup. Why do people still believe they had a perfect denialosphere talking point prepared only for Jones’ “straight scientific” answer, and not for some more elaborate ones? “Jones now raping statistical methods to ‘prove’ Global Warming Myth” – how does that sound as a summary of “the warming trend has 92% confidence, which is very close to the usual value used to determine statistical significance”? Or “Jones cherry-picking data” as a response to “Well, if we start in 1994 instead of 1995, the trend is statistically significant”?