Friday, August 23, 2013

We have recently uploaded to academia.edu a manuscript, coauthored by us two and two others, with the title 'Can climate models explain the recent stagnation in global warming?', in which we compare the magnitude of recently observed trends in global mean temperature - trends over the last 10 years and over the last 15 years (1998-2012) - with the ensemble of trends simulated by climate models participating in the Coupled Model Intercomparison Projects CMIP3 and CMIP5. Recent trends as low as or lower than that observed in the HadCRUT4 data set, of merely 0.4 C/century, are reproduced by at most 2% of the scenario simulations. Two other analyses of the development of global mean temperature have also been considered, with trends of 0.8 C/century (GISS) and 0.4 C/century (NCDC) - these trends show up in the ensemble of scenario simulations in at most 4.7% and 0.6% of all cases, respectively. Obviously, there is some uncertainty in the trends, but our overall conclusion that the present trends are at the margin of the distribution generated by the available A1B and RCP4.5 scenarios is robust against this uncertainty.

To increase the size of the simulated ensemble of model-suggested trends, we analysed not only the recent simulated trends under the A1B and RCP4.5 scenarios, but all n-year segments in the period up to 2060, in which the assumed external forcing increases linearly as in the emission scenarios A1B and RCP4.5. These scenarios describe changing emissions of greenhouse gases and aerosols, but not changing solar activity, volcanic activity or any cosmic influences, since scenarios (let alone predictions) of these factors for the next decades are very difficult to construct. We let n vary between 10 and 30 years.
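The pooled-segment counting described above can be sketched in a few lines. The code below is a minimal illustration under assumed inputs (one annual-mean temperature series per ensemble member; function names and the synthetic data are hypothetical, not the authors' actual analysis code):

```python
import numpy as np

def trend_per_century(series, start, n):
    """Least-squares linear trend (degrees C per century) of an
    annual-mean series over the n years starting at index `start`."""
    years = np.arange(n)
    slope = np.polyfit(years, series[start:start + n], 1)[0]
    return slope * 100.0  # per year -> per century

def fraction_below(ensemble, observed_trend, n):
    """Fraction of all overlapping n-year trends, pooled over all
    ensemble members, that are at or below the observed trend."""
    trends = np.array([
        trend_per_century(member, s, n)
        for member in ensemble
        for s in range(len(member) - n + 1)
    ])
    return (trends <= observed_trend).mean()
```

For example, comparing a 15-year observed trend of 0.4 C/century against an ensemble of scenario runs would amount to `fraction_below(ensemble, 0.4, 15)`; a result of 0.02 corresponds to the "at most 2%" statement in the text.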

If the slow trend derived from the GISS, HadCRUT or NCDC analysis were to continue for a total of 20 years, such a trend would occur in at most 0.9% of all cases. Of course, this statement is conditioned on the presently available set of scenario calculations (in CMIP3 and CMIP5). The title "Can climate models explain the recent stagnation in global warming?" was likely misleading, as we did not examine climate models in general, but merely the output of contemporary climate models subject to a specific class of scenarios, namely those best mimicking the recent development. Maybe a title like "Is the ongoing warming consistent with the developments envisaged by scenario simulations exposed to realistic increases in GHG forcing?" would have been more appropriate.

This manuscript was submitted to Nature, but it was not accepted for publication. Unfortunately, the reviews are subject to copyright rules by Nature, and we are not allowed to reproduce them here. The manuscript had been viewed more than 3000 times as of 22 August 2013, with most visits coming from spiegel.de, but also many from bishop-hill.net. We want here to set straight some misinterpretations that may have arisen in the blogosphere, e.g. at Bishop Hill, and that may also have been present in the review process at Nature.

The main result is that climate models run under realistic scenarios (for the recent past) have some difficulty in simulating the observed trends of the last 15 years, and that they are not able to simulate a continuation of the observed trend for a total of 20 years or more. This main result does not imply that anthropogenic greenhouse gases have not been the most important cause of the warming observed during the second half of the 20th century. That greenhouse gases have been responsible for at least part, or even most, of the observed warming is not based only on the results of climate simulations, but can be derived from basic physical principles, and thus it is not really debated. It is important to stress that there is to date no realistic alternative explanation for the warming observed in the last 50 years. The effect of greenhouse gases is seen not only in the trend in global mean near-surface temperature, but has also been identified in the spatial pattern of the observed warming and in other variables, such as stratospheric temperature, sea-level pressure and others.

However, climate model projections are not perfect. They are in a constant state of revision and improvement. The comparison between simulations and observations, and the identification of any mismatches between the two, is thus a very important, and probably unending, task in climate research. This manuscript should be viewed from this perspective. Nevertheless, the basic features of man-made climate change have been robustly described by these models over time, even if more detail has been added, and rates of change have shifted somewhat in the course of time.

To understand the present mismatch, we suggest four different explanations; none points to a falsification of the concept that CO2 and other greenhouse gases exert a strong and likely dominant influence on the climate (the statistics of weather). None represents a falsification of climate models. But all point to the need for further analysis and improvement of our tools - which are scenario simulations with climate models - for describing possible future developments.

One is an underestimation of natural climate variability, which could be related to variations in the heat uptake by the ocean and/or internal variations of the energy balance itself (such as cloud cover). Another possibility is that the climate sensitivity of the models may be too large, but a longer period of mismatch would be required to ascertain this, as 15-year trends are still strongly influenced by internal climate variations. A third possibility is that the set of external forcings prescribed in the CMIP5 simulations lacks a component of relevance. In particular, the CMIP ensembles assume a constant solar irradiance, due to the difficulties in predicting solar activity. However, solar irradiance displays a negative trend over the last 15 years, which could be part of the explanation of this mismatch. Finally, although the number of simulations that produce a trend as subdued as observed is small, it is still not zero. The last 15 years may have been an outlier, especially considering that the starting year, 1998, experienced a strong ENSO event and was therefore anomalously warm. Thus, further analyses are necessary, and we intend to carry them forward.

At present, we cannot disentangle which of the different possible explanations is the best - maybe a combination - but the conclusion is not that GHGs play a minor or no noteworthy role in ongoing and expected future climate change. One conclusion we do draw is that the A1B and RCP4.5 scenarios, which are used in very many impact studies, suffer from some limitations.

Our paper does not represent a crisis in the understanding of the climate system, but a wake-up call that scenarios have to be prepared better, and that all impact studies should expect that details of future scenarios, concerning the speed of change and the intensity of natural variability, may turn out quite differently.

53 comments:

Karl Kuhn
said...

Dear Drs Zorita and von Storch,

thank you very much for your willingness to share and discuss your carefully worded conclusions with the audience of the Klimazwiebel. The following two sentences caught my attention, and I would like to pose two questions, being myself someone illiterate in climate science, but deeply involved in the modelling of complex biophysical-economic simulation systems.

"That greenhouse gases have been responsible for, at least, part or even most of the observed warming, is not only based on the results of climate simulations, but can be derived from basic physical principles, and thus it is not really debated."

Is the water vapor feedback effect part of these 'basic physical principles', and can measurements of water vapor content in recent decades be replicated by the models you investigated?

"It is important to stress that there is to date no realistic alternative explanation for the warming observed in the last 50 years."

I infer from this that climate science and perhaps even climate simulation models CAN explain the warming period of the first half of the 20th century without invoking the greenhouse gas effect?

The paper seemed to be fairly clear in its abstract that the warming stagnation over the last 15 years "is no longer consistent with model projections even at the 2% confidence level". But maybe the Bishop Hill headline "models falsified" was an exaggeration.

As I am sure you are aware, Nature rejects the vast majority of papers it receives. Has the paper been submitted to another journal?

The strong position taken in the beginning of the article is negated by the discussion that follows it.

The various possibilities raised in the second half of the article are in play owing to only one reason: the lack of a match between model output and real temperatures. But the confidence in, and the magnitude of, the attribution of 20th-century warming to CO2 derive in no small measure from the same models.

The model(s) should do better than just scraping by narrowly. If tomorrow temperatures increase, will the models 'perform' better? Sure they will!

Verification in science comes only with prediction, i.e., getting the right answer for the right reason. Everything else is moot.

Hans - did the reviewers discuss the choice of CMIP5 simulations? Having 21 members (out of 62) from the various flavours of the GISS GCM, which all have very low amplitude variability, does not seem very representative?

http://rankexploits.com/musings/2013/von-storch-et-al-stagation-in-warming/
Lucia discusses the paper, and adds her own analyses. Michael Tobis comments.

My own question: How does the out-of-sample error for these models compare with the in-sample error from the "training period" of last century's temperatures? If it is much greater (which I don't know) doesn't that imply that there's a bigger problem - that the model design was essentially curve-fitting - than that the climate sensitivity is a little high?

As an example, many are saying now that the "missing heat" has gone into the deep ocean; that's why surface temperatures are too low for a decade or more. But what was happening for the last century? We have absolutely no data on the deep ocean temperatures from then, but is it the least bit believable that the effect started only since the year 2000? How did all those models correctly model surface temperatures for a century, without taking this obviously big effect into account? If so, how can anyone claim that the models are "just the physics" and are not tuned?

Dr. Hawkins, I think the Lucia post and several around it discuss the climate variability of the models. She suggests that the variability of the models is _already_ too high; that is, that the real climate historically varies less than the models do. Obviously you can make the models fit any data by putting enough noise into them.

I'm not really addressing your point - it certainly makes sense to take a representative sampling of the models if one is studying them, but I thought it was relevant.

You cannot call this science, surely? That you cannot explain the last 15 years is mainly because you cannot explain at all what natural variation consists of or how powerful it is. What gave rise to the Little Ice Age, for example? If you predict something to rise parabolically and it goes the opposite way, do you not at the very least deeply reconsider your predictions?

As this is so minor a trend it may even be natural and benign. Only the models convert innocuous warming into catastrophe and they are completely falsified.

Regardless of the 1 degree that is supposed to come from a doubling of CO2 by hugely simplistic 1D theory, the remainder was supposed to come from postulated positive feedbacks that just don't seem to be there. 1 degree is beneficial even according to the IPCC so the panic is about nothing.

Also, the entire idea that man's contribution could be teased out from nature's comes from models that are falsified. They are not imperfect, as some like to say; they are inadequate and unfit for the purpose of policy.

In short, that serious energy policy is based on this bag of speculative argumentation is an indictment of the entire field of climate science. You could be marching us off a cliff of high energy costs, low growth, poverty and starvation based on what amounts to mere pessimistic guesswork.

You might see it all as a curiosity, worthy of further funding ad infinitum. However, we in the real world see it as too much academic hubris resulting in a massive waste of public and private funds in the middle of a deep recession.

"It is important to stress that there is to date no realistic alternative explanation for the warming observed in the last 50 years."

There are a number of temperature reconstructions available with time intervals from centuries to millennia, containing temperature frequencies and their power strengths in analysed spectra. Despite the unknown physical nature of the mechanism, it is a realistic alternative to the projection models explaining the warming in the last 50 years, because the relevant temperature frequencies can be identified as astronomical functions. From the power strengths of the frequencies it is possible to simulate the global temperature, except for the effects of the ocean impedances and the delay in time, and except for the volcano drops.

http://www.volker-doormann.org/images/solar_tides_1850.gif

Adding the fast functions, which have the most interference with the ENSO delays, it becomes clear that the simulation from temperature frequencies is an alternative method to explain the global warming. This was also the basis of the idea of M. Latif in 2008, when he transferred temperature frequencies from the 1950s and correctly forecast a stagnation in temperature.

http://www.volker-doormann.org/images/stagnation_lativ_2008.gif

My experience with models of heat currents in streaming fluids was based strongly on physics and the physical properties of the fluids in a geometric model of 1:1.

http://www.volker-doormann.org/images/pbliz8000.jpg

I think this is an unalterable supposition for a physical model with no alternative.

"The CMIP5 models have a large diversity in their simulated variability:"

It is good that we are in agreement, though you may be missing a point here.

If enough models, each with sufficient inter-decadal amplitude of global temperature change are included (in the ensemble), the confidence intervals will necessarily widen. At one point, the composite of models will essentially turn non-falsifiable, i.e., produce a range of temperatures that real-world temperatures would never fall out of.

The test of predictability comes from real-world temperatures falling within the confidence intervals of an ever-decreasing number of models.

If the range produced by models includes one model that is flatline, and another that shows a temperature rise with slope corresponding to 0.7 (say, for example), that set of models would be non-falsifiable.

you wrote 'Despite the unknown physical nature of the mechanism..'. Then this is no valid explanation in the modern sense.

We can, for instance, also 'explain' the orbits of the planets with Fourier analysis, but this explanation would be Ptolemaic. It would not include the real cause of the orbits (gravitation). I could be brave and explain that the sun rises every day because I also have breakfast every day.

Indeed, there is no alternative explanation, so far, for the observed warming of the last 50 years that does not include GHG. This does not mean that climate models are perfect and provide accurate predictions. But I would really like to see an alternative physical theory that tells me why temperature is rising near the surface and has risen by 0.8 C in the 20th century, why it is cooling in the stratosphere, and why sea levels have been rising for the last 200 years by 20 cm (i.e. why the planet is gaining energy as a whole). In other words, we have to require from any alternative theory the same level of accuracy and falsifiability that we require for GHG. I may be wrong here because I cannot fathom all the papers that have been published, but I cannot remember any publication prior to 1998 that predicted the current stagnation. Please correct me if you have better information.

I would also like to see a falsifiable prediction for the next years. If the 20th century warming has been caused only by natural mechanisms, when will temperature start to drop, and by how much? Can anyone be specific here?

Ed Hawkins/4 - No, the reviewers did not comment on a possible bias related to the usage of relatively many GISS-family scenarios. We used all the data available at the CMIP5 data base and made no selection.

Paul Matthews/2 - "As I am sure you are aware, Nature rejects the vast majority of papers it receives. Has the paper been submitted to another journal?" Yes, a rejection by Nature is by no means a catastrophe but rather standard. But it was clear that we would have a hard time arguing against the reviewers, who first of all pointed to "not really innovative" and "depends all on the warm year 1998". We are now extending the manuscript (for Nature it had to be very short), and will submit it well after the IPCC publication of the WGI report, in about half a year or so.

Karl Kuhn/1 - "Is the water vapor feedback effect part of these 'basic physical principles', and can measurements of water vapor content in recent decades be replicated by the models you investigated?

I infer ... that climate science and perhaps even climate simulation models CAN explain the warming period of the first half of the 20th century without invoking the greenhouse gas effect?"

Ad 1: part I, yes, part II: I do not know, I guess others can answer.

ad 2: Yes, the first part of the climate variations in the 20th century could be explained by natural variations - see, for instance, the early study Hegerl, G., H. von Storch, K. Hasselmann, B.D. Santer, U. Cubasch, P.D. Jones, 1996: Detecting anthropogenic climate change with an optimal fingerprint method. J. Climate 9, 2281-2306. Note the paper is 19 years old; first submission in August 1994.

you gave here four possible explanations, but in the draft linked above I can only find three. Did the fourth explanation presented here (I would call it the "bad luck hypothesis") emerge in the peer review process?

Hans von Storch: We used all the data available at the CMIP5 data base, made no selection

This is hard to believe, as the CMIP5 database currently lists 43 models for which the relevant data are available. Even if you failed to download one or the other, as easily can happen, you either somehow "forgot" on the order of 20 models, or you made a selection indeed. (I'm not saying a planned one to bias the results, as the other poster implied).

Andreas, we did not discuss the trivial explanation - the small-likelihood event of, as you call it, "bad luck". For scientists this is obvious and does not need to be discussed - at least I would presume so - but for a more general public it may be worth listing it explicitly.

@22 hvw, the CMIP5 site is a 'meta-site'. This means that the data are not actually stored there: it is a hub that redirects the user to the individual sites that actually store the data. Sometimes the files are broken, or the variables listed at the CMIP5 site are not actually available. We are checking this, though.

This question is, however, not relevant here. We have now re-done the analyses with data from another source that stores the global averages directly, and the results remain unchanged. For instance, using the RCP4.5 runs (109 in total) until 2060, the HadCRUT4 temperature trend in 1998-2012 lies below the 2% percentile of the RCP4.5 ensemble: the same as in our manuscript.

Other blogs reach similar conclusions, as do other published papers.

Perhaps in this case we should try to find out why in this period the model ensemble is barely compatible with observations, and in doing so improve models and in the end come up with better projections, instead of uncritically dismissing the message beforehand.

There may be simple explanations for the stagnation, e.g. that the heat is going into the deep ocean. But then, we have to find out why models have problems in sending this amount of heat to their respective model ocean.

I am well aware that the relationship between the ESG database and actually existing and usable files is not perfect. It's a pain. However, I am sitting in front of actual files of tas for rcp45 from 44 models. Given that an unavoidable weak point of such an analysis is that you are restricted to an "ensemble of opportunity", I believe it is highly desirable to make sure to use anything that is available. If I were a reviewer, I'd be bitching big time if you presented only a subset and state that the incompleteness is "not relevant here".

That said, I do not strongly believe that the results change significantly if you include all models. The paper is nice and clear, and somebody has to do this first step. Otherwise I agree with Andreas below that this cannot be all we've got to offer. This is an exploratory result which doesn't provide a robust answer to the question of the likelihood of this 15-year trend happening, conditional on the models having no collective error that would lead to an underestimation of variability on that timescale. More research is needed :).

Drs. Storch and Zorita, I'm wondering why the ensemble of models is the right metric to be using. Would it be a good idea to identify which models failed, and are rejected, and which are not rejected (yet)? Why not get rid of the ones that didn't work, and proceed with what remains?

thank you for your input. The number after the semicolon, as you said, represents the number of files, which may refer to 6-hourly, daily and monthly means for different sub-periods altogether. For instance, for model GFDL-CM3 you indicate 59 files. The CMIP5 site at Lawrence Livermore Nat. Lab. includes just 1 realization of model GFDL-CM3 for scenario rcp4.5 (I just checked this).

In my previous comment - maybe you overlooked it - I indicated that we have repeated the analysis, downloading the global means from the Climate Explorer for a total of 109 simulations, and the results are the same as in our manuscript.

you are absolutely right that the numbers of files I listed are useless. Not because they refer to different timesteps (it's all monthly) but because the files are split into different intervals. If you would like to compare with what you got from Climate Explorer (would that be regarded as an authoritative source anyway?), here are the numbers of realizations per model that I can see; from what you have, that is about 74%:

That brings me to another question: apparently you are using multiple realizations for a model, if available. Doesn't that give undue weight to the models with many runs? In other words, would different realizations of the same model not be expected to show a similar, model-specific variability?

Mike, in some sense, estimations of the climate sensitivity based on Bayesian methods are based on what you are proposing. They are essentially weighted averages, the weights being a measure of how close a model is to past observations. However, this quickly becomes a more fundamental question: if I were a pilot and one among three on-board computers disagreed with the other two, I would not build an average among the three. I would try to understand why this happens. On the other hand, it may very well happen that the model one would reject because it fails to reproduce the temperature trends is the one that produces a better annual cycle of, say, precipitation.

I would essentially agree with you that one goal should be to disregard the worst models, but there are different opinions on this. In the end, the question boils down to 'what does an ensemble of models represent, when at most only one can be right?'

It must be rather frustrating for you to have your paper rejected by Nature and then see today a paper published in Nature saying more or less the same thing: "Overestimated global warming over the past 20 years".

I wonder what is the difference between the two papers, apart from the names of the authors?

Paul, we are mostly interested in the ability of scenario simulations to describe the present stagnation, not in explaining the stagnation. That is quite different. What I find difficult with the "other" paper is that it is again an a posteriori explanation (like cold European winters caused by less Arctic sea ice in the preceding fall), and just one. There are in principle others, and we would need to do some work to disentangle the plausibility of the different explanations.

hvw said: "Doesn't that give undue weight to the models with many runs? In other words, would different realizations of the same model not be expected to show a similar, model-specific variability?" They do. I've done this a different way, combining the distributions by model. I've counted each entry at the Climate Explorer as a model to estimate a typical variability for a model, but based the estimate on models with more than 1 run in the projection. (My method requires repeat runs from a model to estimate the variability due to initial conditions only.)

If you examine my figure in that post you'll see the variability of trends differs from model to model as does the mean trend.

The results are similar to von Storch and Zorita's.

I haven't organized the code to collect together some models listed as several cases (e.g. E2-H_p1, _p2, _p3 had better be considered 1 estimate of the variability; if this is done, likely E2-R and MPI-ESM should be similarly grouped).
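One way to read this grouping is as a pooled within-model variance estimate: only models with repeat runs contribute, so the spread reflects initial-condition variability rather than model-to-model differences. A minimal sketch under an assumed input format (a mapping from model name to its run trends; the function name and data are hypothetical, not her actual code):

```python
import numpy as np

def within_model_variance(runs_by_model):
    """Pooled estimate of run-to-run trend variance, using only models
    with more than one realization, so that each model's runs are
    compared against that model's own mean trend."""
    ss, dof = 0.0, 0
    for trends in runs_by_model.values():
        t = np.asarray(trends, dtype=float)
        if t.size > 1:
            ss += ((t - t.mean()) ** 2).sum()  # sum of squares about this model's mean
            dof += t.size - 1                  # degrees of freedom from this model
    return ss / dof
```

This automatically gives a model with many runs more degrees of freedom without letting its mean trend dominate, which is one way to address the unbalanced-ensemble concern raised above.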

lucia, thanks for the info. Your approach to dealing with such an unbalanced ensemble sounds like an improvement.

A new study (http://www.nature.com/nature/journal/vaop/ncurrent/full/nature12534.html) seems to point to a link between ENSO-related SST patterns and the currently observed small global temperature trend.

I wonder whether something can be learned by sorting the models under consideration by their performance in capturing ENSO.

Another thought: If we assume (or better hope) that modelled global temperature variability doesn't change much with the system's position on a warming trend (and you and HvS and EZ apparently do that by considering the distribution of n-year trends stationary in a 55 year interval), then it might be worthwhile to examine the AMIP runs with respect to their decadal variability. But someone already did this, I suppose ...

hvw - the new study published by Nature on a possible link to ENSO is certainly encouraging, but it is typical of how things are negotiated - somebody suggests one solution, which explains what happens, but it does not help to sort out our question of what is wrong with the scenario simulations. It is only one solution; there may be others, and before declaring that our problem is solved we must be able to exclude them.

hvw: "lucia, thanks for the info. Your approach to dealing with such an unbalanced ensemble sounds like an improvement." I don't know if it is an improvement - but it has the potential for addressing whether the estimate of the variance in trends is over-dominated by models with smaller variances in trends, which some like Ed Hawkins suspect to be the case.

In this regard: it is worth noting that if we examine residuals from the linear trends relative to what we see for the earth, on average the models have too much natural variability, not too little. Mind you: this test is dominated by variability at timescales less than the trend length, and some models have less small-scale variability than the observations. And also: the test is ambiguous (high model residuals could arise from excessive internal variability in models or from failure to correctly model volcanic eruptions). Nevertheless, the test can be done, whereas in contrast tests comparing variability at long time scales to earth variability have such low power as to be practically impossible. And this test, which has the advantage of being 'doable', does not point to individual models having too little internal variability on average.

hvw: "I wonder whether something can be learned by sorting the models under consideration by their performance in capturing ENSO." I was planning to apply an ENSO adjustment to the models, which must be done if one is going to compare ENSO-corrected observations to model outcomes. I grabbed the required model data, but haven't done it yet. (I've got to get off my duff and do it.)

If the models do simulate ENSO properly, this should narrow the variance in trends for the models. As we have had La Niñas recently, it ought to move the earth trends more positive. How the two will pan out together I don't know - but I anticipate it will be similar to what's in the Fyfe paper.

FWIW: I prefer the comparisons without ENSO as more useful for a variety of reasons, including the fact that once one considers ENSO, one has a variety of choices for how to remove it, and many choices means that one might hunt for the method that gives the answer the analyst 'prefers'.

hvw: "If we assume (or better hope) that modelled global temperature variability doesn't change much with the system's position on a warming trend..." That's an important issue. But this assumption that the variance in n-year trends is identical over all periods is testable, using the exact same methods we can use to test whether the variances in trends from different models differ from each other. So the assumption seems warranted (or at least is not inconsistent with the model data available).

We can compute the variability in trends across runs of identical models over matched periods and test if this variability changes over time. (The other test is to see if variability differs across models).

I have done so in the past with the AR4 models and there is no particular evidence the natural variability in trends increased (or decreased) over the 20th or modeled 21st centuries or that the variance differs from period to period.

In contrast, the same method used to detect whether variances differ across models confirms that they do differ. The variance in trends is larger in some models than in others. I need to repeat this and formalize it. (I think I mostly did 10-year trends too.) I haven't done this check for the AR5 models, mostly because I have a number of items on the 'to check' list.
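The variance comparisons described here can be formalized without distributional tables, for instance with a permutation test on the ratio of trend variances between two groups of runs (two models, or one model over two matched periods). A sketch with synthetic inputs; the function name and setup are illustrative assumptions, not the method actually used:

```python
import numpy as np

def variance_ratio_perm_test(trends_a, trends_b, n_perm=5000, seed=0):
    """Two-sided permutation test for equality of trend variances
    between two groups. Each group is mean-removed first, so only
    the spread (not the mean trend) is compared."""
    rng = np.random.default_rng(seed)
    a = np.asarray(trends_a, float) - np.mean(trends_a)
    b = np.asarray(trends_b, float) - np.mean(trends_b)
    observed = a.var(ddof=1) / b.var(ddof=1)
    obs_stat = max(observed, 1.0 / observed)  # extremeness in either direction
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        r = pooled[:a.size].var(ddof=1) / pooled[a.size:].var(ddof=1)
        if max(r, 1.0 / r) >= obs_stat:
            hits += 1
    return observed, hits / n_perm
```

A small p-value indicates the two groups' trend variances differ; applied to the same model over two periods, the same test probes the stationarity assumption discussed above.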

I do not think that one can order models according to skill or quality. It all depends on the metric, and there is no way of choosing a "best" metric. Why should ENSO be more important than extreme rainfall in Asia, than the MJO, or the formation of blockings, just to mention a few?

Also, by jumping on the ENSO part, you have made a choice among the three (or four) explanations for the inconsistency of the observed recent trend vs the A1B/RCP4.5 trends - you say: it is natural variability. But how do you know it is not a lack of external forcing, or possibly a slight overestimation of the GHG response? Maybe we even had only bad luck, and this stagnation is a two-in-a-hundred rare event?

Does the natural variability explanation - which I personally find attractive - have a specific political utility, namely that we do not need to touch on the quality of the forcing nor on the quality of the response to forcing?

By the way, when the natural variability is not right, and the models usually describe the full variability in the 20th century well, then the additional/missing natural variability must have been accounted for by forced variability. If there is too little natural variability, then the response is overestimated; if there is too much, then it is underestimated.

We need time and patience to deal with these issues and should not jump on the most convenient explanation why our scenarios fail in describing the recent (and quite possibly soon ending) stagnation.

Like you, I doubt we can rank models according to quality. I think hypothetically it could be done. But -- as you say, why prefer the ability to mimic ENSO over the MJO? Hypothetically, if one model were sufficiently bad at everything, we could throw that one away.

On the 'ENSO part', the only reason I think it's worth examining whether ENSO is an explanation is that, when presented with data showing the current observations are skirting or outside the range of the models, some people always immediately suggest it is ENSO, as hwp did just above. Since methods of correcting earth observations for ENSO exist, when someone suggests 'it's just ENSO', it can be worth looking into that issue and seeing whether applying such a correction changes the result.
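One common flavor of such a correction is to regress the temperature series on a lagged ENSO index and remove the ENSO-congruent part. The sketch below uses entirely synthetic data, and the 4-month lag, the noise levels and the regression setup are my assumptions, not any published method; real analyses differ in detail.

```python
# Sketch: remove an ENSO-congruent component from a temperature series
# by multiple regression on a lagged ENSO index plus a linear trend.
import numpy as np

rng = np.random.default_rng(1)
n = 360                                   # 30 years of monthly data
months = np.arange(n)

# Synthetic smoothed "ENSO index" and a temperature series that
# contains a trend, a lagged ENSO imprint (coefficient 0.1), and noise
enso = np.convolve(rng.normal(0, 1, n + 24), np.ones(12) / 12, "same")[:n]
temp = 0.015 / 12 * months + 0.1 * np.roll(enso, 4) + rng.normal(0, 0.05, n)

lag = 4                                   # assumed ENSO-to-temperature lag
x = enso[: n - lag]
y = temp[lag:]

# Fit ENSO and the trend simultaneously so the ENSO coefficient
# is not contaminated by the trend
A = np.column_stack([np.ones(len(x)), months[lag:], x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

corrected = y - beta[2] * x               # strip the ENSO-congruent part
raw_trend = np.polyfit(months[lag:], y, 1)[0] * 120
cor_trend = np.polyfit(months[lag:], corrected, 1)[0] * 120
print(f"raw trend {raw_trend:.3f}, ENSO-corrected trend {cor_trend:.3f} C/decade")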

"namely that we do not need to touch on the quality of the forcing nor on the quality of the response to forcing?"

I actually favor these two as the more likely reasons, because I don't think the main reason for the discrepancy is ENSO.

"models usually describe the full variability in the 20th century well,"

Do models describe it "well"? And how well? Collectively, the variability of 10-year trends in the models used in the AR4 exceeds the variability of 10-year earth trends in the 20th century by between 2% and 30%, depending on whether the comparison is made between the models and HadCrut3, NCDC or GISTemp.

The collective model variability in 10-year trends exceeds that of the earth despite the fact that (a) the earth observations include measurement errors on top of other variability and (b) some of the AR4 models did not include volcanic or solar forcings. Each factor individually should tend to make the variability of observed earth trends larger than that in the models -- and yet earth trends are, if anything, somewhat smaller. (The amount depends on whether one chooses HadCrut3, NCDC or GISTemp for the comparison.)

"then the additional/missing natural variability must have been accounted for by forced variability"

In fact, with some of the models in the AR4, we can see large variability. But if tabulated, the excess might be overlooked -- because those model runs contained no volcanic forcings. And so while the very large spikes in earth temperature frequently coincided with volcanic eruptions, those in the model simply occur due to that model's internal variability. For example, see echam5: http://rankexploits.com/musings/wp-content/uploads/2009/06/figure2_echamp.jpg

So, what we have here is a model whose variability of 10-year trends might not look so poor when the variability of 10-year trends in single runs over the 20th century is tabulated and compared to that of earth trends, but which, to some extent, achieved that result precisely by leaving out volcanic forcings, which are thought to have caused a portion of the variability in 10-year trends for the earth.

Certainly there are other models whose variability seems possibly too small. But if we make the comparison in the aggregate, the variability of 10-year trends in individual models seems more likely to be on the high side than the low side.

"We need time and patience to deal with these issues"

I agree with this. Unfortunately, with only one earth, one can't go to the lab and collect replicate earth observations, which would be very useful if we could have them.

#41, 44

I am not sure that the models should be weighted by the number of realizations they provide. This would assume that the models are independent, which has been shown not to be true. Let us assume we have 50 realizations with model M. Perhaps 2 of these realizations have been done on another computer, or with another compiler, or someone changed a comma in the FORTRAN code. Formally, these two realizations belong to a different model, and yet in reality it is almost the same model. If we compute the variances separately and combine them, these two realizations would be unduly overweighted.

We have two sources of variability for the trends: the structural (model) variability and the internal variability. By weighting the ensemble, we are implying that the first is more important. Why?

This question was indirectly addressed in the manuscript (supp. info). The ensemble is not a random sampling of a putative 'model space'. Actually, we do not know what the ensemble represents, and so weighting the ensemble is not per se better.
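The difference between the two weightings can be made concrete with a toy example. The trend samples below are invented (a 50-run "model" and a 2-run "model" with different means); nothing here comes from the CMIP archives.

```python
# Toy contrast of the two weightings: pooling all realizations
# versus giving each model equal weight.
import numpy as np

model_runs = {
    "M1": np.random.default_rng(2).normal(2.0, 0.6, 50),  # 50 realizations
    "M2": np.random.default_rng(3).normal(0.0, 0.6, 2),   # only 2 realizations
}

all_runs = np.concatenate(list(model_runs.values()))

# Weighting by realization: M1's 50 runs dominate the ensemble mean
mean_by_run = all_runs.mean()

# Weighting by model: each model contributes equally
mean_by_model = np.mean([runs.mean() for runs in model_runs.values()])

print(f"mean by run   = {mean_by_run:.2f}")
print(f"mean by model = {mean_by_model:.2f}")
```

If M1 and M2 were secretly the same model, weighting by model would unduly overweight M2's two runs, which is exactly the concern raised above.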

eduardo, I agree with you that weighting is not necessarily better. On the other hand, it's not necessarily worse either. To some extent, weighting by run vs. weighting by model are just different choices, and getting similar results both ways merely shows a degree of robustness. That is: the result isn't emerging merely because of a somewhat arbitrary choice.

For example, while your example explains the difficulty of treating two things that claim to be different models as different when they are really the same, a similar issue would hold if the 50-run model and the 2-run model really were different, with different parameterizations or solution methodology, but we weighted by runs. In that case, each model provides an independent estimate of the variance of runs about a mean given that set of parameterizations. Meanwhile, the difference in the means between the two models gives an estimate of the effect of structural uncertainty.

Addressing your example where the same model is run 50 times and called "A" and then run 2 times and called "B": computing the variance by combining the two models wouldn't result in a great deal of bias; it will tend to be the same as if we simply pool all 52 runs and compute over the full 52. My understanding is that the difficulty is merely that we get a less precise estimate. And while the variance computed this way will be unbiased, the standard deviation will tend to have a low bias arising from the small sample size of 2 runs. So weighting by model would, in this instance, be a suboptimal use of the data, but not truly horrible. Meanwhile, if they had been different models, weighting by runs could result in model A's variance swamping the analysis.
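A quick simulation illustrates both claims at once: drawing a 50-run and a 2-run sample from the same distribution, the pooled variance comes out essentially unbiased, while the standard deviation estimated from the 2-run sample alone is biased low. The numbers are purely illustrative.

```python
# Simulation: pooled variance is unbiased; sd from n=2 is biased low.
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
n_trials = 20000

sd_small, var_pooled = [], []
for _ in range(n_trials):
    a = rng.normal(0, sigma, 50)       # "model A", 50 realizations
    b = rng.normal(0, sigma, 2)        # "model B", 2 realizations
    sd_small.append(np.std(b, ddof=1))
    var_pooled.append(np.var(np.concatenate([a, b]), ddof=1))

print(f"mean sd from n=2 samples: {np.mean(sd_small):.3f}  (true sd = {sigma})")
print(f"mean pooled variance    : {np.mean(var_pooled):.3f}  (true var = {sigma**2})")
```

For normal data with n=2 the expected sample sd is about 0.80 of the true value, which is the low bias referred to above.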

I elected to do it by model because (a) I wanted to look at individual models anyway, to see how their means and variances looked relative to observations; (b) back when the AR4 was published, the multi-model mean highlighted in graphs and tables was obtained by first computing model means and then averaging over those model means, so my graphs mimic that methodology; and (c) computing a pooled variance from individual models gives a cleaner estimate of typical internal variability, stripped of the variance that springs from structural uncertainty, which cannot be done by computing the variance in trends over runs without first separating into models. As such, the variance weighted by model is a better model-based guide to variability arising from uncertainty in initial conditions. (Assuming the models get the 'weather' right, of course.)

I did, by the way, agree with the comment in your manuscript that the ensemble is not really a random sampling of a putative 'model space' (while replicate runs of an individual model may be).

Fyfe, Gillett and Zwiers construct an empirical distribution of the difference of trends (model minus observations) based on bootstrapping (see their supplementary information). They do take into account the different numbers of realizations, but their scheme implies a much smoother weighting than just weighting by the number of realizations.

eduardo -- I looked at the supplement in Fyfe and saw they don't weight by realizations. My discussion is addressing hvw's concern. I pointed out that I get the same thing weighting by model rather than by realization.

I didn't plow through the details in Fyfe enough to know what happens in the case where the distribution of model runs about the model mean is normal and the distribution of model means is normal, and how that compares to what I did. I just skimmed. That gives me the gist, but I often have to sit down and think through limits to fully understand how methods relate.

I think hvw's concern when criticizing weighting by runs might be about your paper, where things are weighted by run/realization.

But as I noted: I get more or less the same results with different weightings and using a different method. So as a practical matter, I don't think the choice is making much difference. And also, I'm not claiming one weighting is necessarily better than the other especially given that we really are not able to pull models randomly from a set of 'all possible models'.

The number of realizations enters the estimation of the empirical distribution of trends under the null hypothesis.

In the supp. info: "the deviation in the j-th trend for model i that is induced by internal variability. Since the model i ensemble is generally small, the deviations are smaller than would be representative of an infinitely large replication of runs for model i, and so to compensate for that loss of variance, multiply the difference M_ij − M_i. by [N_i / (N_i − 1)]^(1/2)."
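The effect of that inflation factor is easy to demonstrate numerically: deviations of each run from its small-ensemble mean underestimate the internal variability by a factor (N−1)/N in variance terms, and multiplying by sqrt(N/(N−1)) restores it on average. The synthetic setup below (normal "trends", N = 3) is mine, not taken from the paper.

```python
# Demonstrate the sqrt(N/(N-1)) variance compensation from the
# Fyfe et al. supplementary information on synthetic data.
import numpy as np

rng = np.random.default_rng(5)
sigma = 1.0                  # true internal variability of the "trends"
N = 3                        # small ensemble size for model i
n_trials = 50000

raw, inflated = [], []
for _ in range(n_trials):
    m = rng.normal(0, sigma, N)            # N realizations of model i
    dev = m - m.mean()                     # M_ij - M_i.
    raw.append(np.mean(dev ** 2))
    inflated.append(np.mean((dev * np.sqrt(N / (N - 1))) ** 2))

print(f"mean squared deviation, raw     : {np.mean(raw):.3f}")
print(f"mean squared deviation, inflated: {np.mean(inflated):.3f}  (target {sigma**2})")
```

The raw deviations average (N−1)/N = 2/3 of the true variance; the inflated ones recover it, which is exactly the "loss of variance" the quoted passage compensates for.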

So it is not a direct model weighting, but the number of realizations is taken into account indirectly.

The stratosphere has not been cooling since 1995, so no need to find an explanation for that at all! Stratospheric cooling was in fact the official IPCC "fingerprint" for AGW, and in any other field the hypothesis would have been rejected, rather than the wrong-footed "experts" being allowed to suggest a slew of contradictory and unphysical excuses for a "warming masked by cooling". Whither Occam's razor?

Explaining the brief, minor and beneficial heating period of the 20th century is less useful than explaining historical cooling periods. What caused the recovery from the ice ages when CO2 was at its maximum levels? What caused the Little Ice Age? What caused the drop after the 1940s, and what causes the current plateau? As it happens, the only plausible theories available for all of these involve amplified solar forcing. CO2 cannot explain cooling at all. And since whatever causes the cooling likely also causes the heating, CO2 is not required to explain the 20th century.

Solar forcing was the dominant consensus theory for centuries. It is also still perfectly valid for both the Arctic and the US48 temperature datasets, the only ones with little likely influence from urban heat islands.

And if I were to explain that the current plateau should really have started in the 60s, when sunspots levelled out, but that aerosol reduction from the Clean Air Act caused a temporary cooling, then you'd rightly say that I had just made that up. Yet that is the currently accepted reasoning for the inability of the CO2 hypothesis to explain the post-40s drop in temperature. This juxtaposition demonstrates the facile logic that is allowed only if you are a catastrophist.

It is an undeniable fact that the cumulative comment count of this blog has remained stagnant for nine days now!

This is extremely unlikely according to our ensemble of blog-comment simulations, in which only 2 in a hundred show a similar behavior. The paper was rejected by Nature, but this incident still points not only at a possible sudden death of Klimazwiebel, but puts into question hitherto undoubted results, on which our models are based, about the character of cumulative anything.

Sustainable use of KLIMAZWIEBEL

The participants of KLIMAZWIEBEL are a diverse group of people interested in the climate issue: among them people who consider the man-made climate change explanation true, and others who consider this explanation false. We have scientists and lay people, natural scientists and social scientists, people with different cultural and professional backgrounds. This is a unique resource for a relevant and inspiring discussion. This resource needs sustainable management by everybody. Therefore we ask you to pay attention to these rules:

1. We do not want to see insults, ad hominem comments, lengthy tirades, ongoing repetitions, or forms of disrespect toward opponents. Lengthy presentations of amateur theories are also not welcome. Postings violating these rules will be deleted.
2. Please limit your contributions to the issues of the different threads.
3. Please give your name or use an alias - comments from "anonymous" should be avoided.
4. When you feel provoked, please refrain from ranting; instead, try to delay your response for a couple of hours, when your anger has evaporated somewhat.
5. If you want to submit a posting (begin a new thread), send it to either Eduardo Zorita or Hans von Storch - we will publish it within a short time. But please, only articles related to climate science and climate policy.
6. Use whatever language you want - but maybe not a language that is rarely understood in Hamburg.