Wednesday, December 5, 2012

Global Warming Redux

Matt Asher, a popular blogger on statistics has taken a fresh look at global warming. He has come to the conclusion that based on the temperature data for the last 131 years, as provided by NASA, that the change in temperature could be explained by random noise and most importantly that the trend in temperature is not statistically significant. He does this by using an approach of randomly sampling annual changes in temperature from a distribution that has a mean of zero and a standard deviation equal to the historic standard deviation in changes in temperature. In his starting model, he then simulates this approach many times and shows that 56% of the time the simulated change in temperature over the entire period is more severe than the actual historic change in temperature. Based on this, he concludes that the change in temperature over the last 131 years could be explained by random fluctuations.
But is he correct?
I decided to test the data myself using a Bayesian approach.

My core model is an moving average time series model. This model basically says that the temperature in one year equals the temperature in the prior year + random change. The random change is a function of the excess of the prior year’s temperature over the mean expected temperature for that year. The model suggests that extreme temperatures in one year are followed by temperatures closer to the mean in the following year. However, the mean temperature can change over time. The key issue is to test whether the mean temperature has a positive or negative trend.

I also allow for secular changes in the temperature trend assumption and in the standard deviation assumption.

The formal model is
[Temp in year t] = [temp in year t-1] + c + b(t - (n+1)/2) + theta * (difference between temperature in year t-1 and mean expected temperature in year t-1)
where:
c= annual trend in change in temperature (the Trend Factor)
b= annual change in Trend Factor
theta = regression to the mean factor
t=1,…, n; n=131
I also assume that the change in temperature has a standard deviation that can change over time (parameter d - see code at the bottom of the post).
The Bayesian approach treats every parameter above as a random variable. It estimates a distribution for the value “c” which is the Trend Factor. I further calculate the probability that c is greater than zero (called “prob.positive.trend”). The results of this analysis show that the prob.positive.trend is estimated as 97%!
This suggests that there is very strong evidence that there is a secular trend in the change in temperature.
If I simplify the model to:
[Temp in year t] = [temp in year t-1] + c + theta * (change in temperature in year t-1)
and assume that the standard deviation of temperature remains constant, I get a similar result for the prob.positive.trend.
The estimated distributions of each of the parameters are shown below (click to enlarge):
(You will note that “prob.positive.trend” has a bimodal distribution at 0 and 1. This value is set to 1 if c>0 and 0 if c<0. 97% of the time c>0)
So why do my results contradict Asher’s? Firstly Asher shows that if you start taking into account correlations from one year to the next that his 56% probability drops to 39%. Although the way he does this is a little odd and that potentially is the source of the problem. He samples data in 2 or 3 year batches resulting in the correlation being “broken” after this period. It also might be that the Bayesian model is more sensitive than the non-parametric approach followed by Asher. Finally, he does not incorporate the regression to the mean feature of changes in temperature. This is a significant feature in temperature trends. and constrains a "path" of temperatures from deviating too wildly from the mean.
In summary, proponents for global warning are still safe. The data from NASA appears to strongly support global warming trends.
Please feel free to leave a comment on this page and let me know why you think the two approaches (mine and Asher’s) lead to different conclusions despite using the same data.
_________________________________
The code for doing the calculations is done in R and JAGS. The R code is:
library(rjags)library(lattice)
temps <- read.csv(file='C:\\patth.to.annual.temp.data\\temperature.csv')#temps$Temp = average annual temperature calculated from NASA data for last 131 years
beta.dat.jags <- list(n=nrow(temps), y=temps$Temp, start.temp = temps$Temp[1] )
jags.fit <- jags.model("temp.jags", data = beta.dat.jags, n.chains = 2, n.adapt = 500000)
data.samples = coda.samples(jags.fit, c('theta', 'c', 'b','d','sigma.mean','prob.positive.trend'), 400000)
summary(data.samples)densityplot(data.samples)
The results produced by the “summary” function are (click to enlarge):