Friday, May 16, 2008

Roger gets it right!

But only where he says "James is absolutely correct when he says that it would be incorrect to claim that the temperatures observed from 2000-2007 are inconsistent with the IPCC AR4 model predictions. In more direct language, any reasonable analysis would conclude that the observed and modeled temperature trends are consistent." (his bold)

Unfortunately, the bit where he tries cherry picking a shorter interval Jan 2001 - Mar 2008 and claims "there is a strong argument to be made that these distributions are inconsistent with one another" is just as wrong as the nonsense he came up with previously.

Really, I would have thought that if my previous post wasn't clear enough, he could have consulted a numerate undergraduate to explain it to him (or simply asked me about it) rather than just repeating the same stupid errors over and over and over and over again. This isn't politics where you can create your own reality, Roger.

So let's look at the interval Jan 2001-Mar 2008. I say (or rather, IDL's linfit procedure says) the trend for these monthly data from HadCRU is -0.1C/decade, which seems to agree with the value on Roger's graph.

The null distribution over this shorter interval of 7.25 years will be a little broader than the N(0.19,0.21) that I used previously, for exactly the same reason that the 20-year trend distribution is much tighter than the 8-year distribution (averaging out of short-term variability). I can't be bothered trying to calculate it from the data, but N(0.19,0.23) should be a reasonable estimate (based on an assumption of white noise spectrum, which isn't precisely correct but won't be horribly wrong). This adjustment doesn't actually matter for the overall conclusion, but it is important to be aware of it if Roger starts to cherry-pick even shorter intervals.
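As a rough check on that broadening, here is a minimal sketch of how the standard deviation of a least-squares trend scales with record length under a pure white-noise assumption. The month counts (96 for an 8-year interval, 87 for Jan 2001 - Mar 2008) and the helper name are assumptions for illustration. This is not necessarily how the 0.23 above was obtained.

```python
import math

def trend_sd_factor(n_months):
    # For equally spaced white noise, Var(OLS slope) is proportional to
    # 12 / (n * (n**2 - 1)); return the corresponding standard-deviation factor.
    return math.sqrt(12.0 / (n_months * (n_months ** 2 - 1)))

n_long = 96    # assumed: 8 years of monthly data
n_short = 87   # assumed: Jan 2001 - Mar 2008, i.e. 7.25 years

# How much wider the trend distribution gets when the record is shortened.
ratio = trend_sd_factor(n_short) / trend_sd_factor(n_long)
sd_short = 0.21 * ratio   # scale the 8-year width quoted in the post

print(f"broadening factor = {ratio:.3f}")
print(f"scaled sd = {sd_short:.3f} C/decade")
```

Scaling the whole 0.21 this way gives roughly 0.24, in the same neighbourhood as the rough 0.23 used here.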

So, where does -0.1 lie in the null distribution? About 1.26 standard deviations from the mean, well within the 95% interval (which is numerically (-0.27,0.65) in this case). Even if the null hypothesis was true, there would be about a 21% probability of observing data this "extreme". There's nothing remotely unusual about it.
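The arithmetic above can be sketched in a few lines, assuming the null distribution really is Gaussian with the N(0.19,0.23) parameters quoted in the post. The variable names are illustrative only, and the interval is taken as the mean plus or minus 2 standard deviations.

```python
import math

def normal_cdf(z):
    # Standard normal cumulative distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd = 0.19, 0.23   # null distribution from the post, C/decade
observed = -0.1         # HadCRU trend, Jan 2001 - Mar 2008, C/decade

z = (observed - mean) / sd
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(z)))   # chance of a deviation at least this large
lower, upper = mean - 2.0 * sd, mean + 2.0 * sd  # approximate 95% interval

print(f"z = {z:.2f}")
print(f"two-sided p = {p_two_sided:.2f}")
print(f"interval = ({lower:.2f}, {upper:.2f})")
```

This reproduces the figures in the text: about 1.26 standard deviations, about 21%, and an interval of (-0.27, 0.65).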

By the way, this calculation ignores the fact that Pielke cherry picks the starting point of the interval to maximize the z-value. This can be factored into the calculation by looking at the distribution of the maximum z-value over a period instead of looking at the distribution of the z-value at a fixed point in time.

You'd think that being able to cherry pick outside of a 2 sigma envelope shouldn't be that hard. One of 20 randomly selected intervals ought to disprove* these statistics. Make that one out of 40, since we want to fail on one particular side of the curve.

* Disprove, in this case, means "demonstrate ignorance of probability"

In Pielkeworld, when "there is a strong argument to be made", it means that Pielke can't actually be bothered to (horrors!) actually make the argument. All he needs to do is to draw some nice little diagrams, and then invite you to feel his desired conclusion. Quo errat demonstrator.

And when you show him to be wrong, he'll just ignore you, or he'll go `you're right, but...' and then move on to the next bellyfeeling exercise. All the while pretending that his bizarre pronouncements haven't all been falsified or shown to be unfalsifiable.

Gavin confirmed via email that the actual distribution of trends from 7 years of model data is N(0.20,0.24) which is obviously consistent with my estimate of N(0.19,0.23) for 7.25 years. These are all pro-rated as 10-year trends for consistency.

Chuck - exactly. Even with cherry-picking, there is absolutely nothing unusual about the last few years. It is strikingly ordinary, which is pretty obvious from just looking at it and hardly needs a formal analysis. The only mildly surprising thing at all in the last 30 years was the 1998 El Niño, which is about 2.5 sd from the trend line (about a 1% event if we assume Gaussianity, although assuming Gaussianity for the extreme outliers is a pretty dodgy assumption anyway).

So faced with this stupendously normal data which is worth absolutely nothing to the denialists, they actually make it worth less than nothing by producing desperately wrong analyses.

> You'd think that being able to cherry pick outside of a 2 sigma envelope shouldn't be that hard.

He is only cherry picking one endpoint of the interval (the starting time - the end time is constrained to be the present). Therefore, unless he is picking very short periods, he doesn't have that much wiggle room.

I ran a simulation and got about a 30% chance of being able to get a p-value of 5% or less when cherry picking the starting point over forty years.
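A simulation of this kind could be sketched as follows. This is not the code behind the figure above, just a minimal illustration under stated assumptions: forty annual values of pure white noise with known unit standard deviation, a true trend of zero, the end point fixed at the present, a two-sided 5% test, and a minimum interval of 5 points. The function names and defaults are hypothetical.

```python
import math
import random

def max_abs_z(series, min_len):
    # Largest |z| of the OLS trend over all start points, end point fixed.
    n = len(series)
    best = 0.0
    for start in range(0, n - min_len + 1):
        seg = series[start:]
        m = len(seg)
        xbar = (m - 1) / 2.0
        sxx = m * (m * m - 1) / 12.0   # sum of squared deviations of 0..m-1
        sxy = sum((i - xbar) * y for i, y in enumerate(seg))
        z = sxy / math.sqrt(sxx)       # slope divided by its sd (unit noise sd)
        best = max(best, abs(z))
    return best

def cherry_pick_rate(n_points=40, min_len=5, trials=2000, z_crit=1.96, seed=0):
    # Fraction of null series where some start point gives |z| >= z_crit.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        series = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        if max_abs_z(series, min_len) >= z_crit:
            hits += 1
    return hits / trials

fixed = cherry_pick_rate(min_len=40)   # only one possible start: no cherry picking
picked = cherry_pick_rate(min_len=5)   # start point chosen after seeing the data

print(f"fixed start:  {fixed:.3f}")
print(f"cherry-picked: {picked:.3f}")
```

With a fixed start the rejection rate should sit near the nominal 5%; letting the start point float raises it, and by how much depends on the minimum interval length and the noise model assumed.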

What you and Roger are arguing about is not worth arguing about. What is worth arguing about is the philosophy behind comparing real-world data to model predictions. I work in the chemical industry. If my boss asked me to model a process, I would not come back with an ensemble of models, some of which predict an increase in a byproduct, some of which predict a decrease, and then claim that the observed concentration of byproduct was "consistent with models". That is just bizarre reasoning, but, of course, such a strategy allows for perpetual CYAing.

The fallacy here is that you are taking models, which are inherently different from one another, pretending that they are multiple measurements of a variable that differ only due to random fluctuations, then doing conventional statistics on the "distribution". This is all conceptually flawed.

Moreover, the wider the divergence of model results, the better the chance of "consistency" with real-world observations. That fact alone should signal the conceptual problem with the approach assumed in your argument with Roger.

I don't see what the problem is, Tom C. It seems obvious that the less specific a set of predictions is, the more difficult it is to invalidate. So yes, consistency doesn't necessarily mean that your model is meaningful, especially over such short terms. But I don't see how it's conceptually flawed.

Roger is digging himself deeper into his hole. Look at http://sciencepolicy.colorado.edu/prometheus/archives/prediction_and_forecasting/001431the_helpful_undergra.html, quick, before he takes it down again. He's confused the distribution from the models with various estimates of the obs trend.

The term "independent models" makes no sense in statistics. If there are two models modelling the same phenomenon, they're either the same, or they're mutually exclusive.

(You can combine models to form a new mixture model, which is again different from all other models; but the analysis is totally different from pretending that the separate models are "independent" observations of some sort.)

When combining models into an ensemble, do you simply give equal weightage to each of the models, or do you adjust the mixture weights using e.g. Expectation-Maximization? Does this question even make sense? Thanks!