Blogs and peer-review

Nature Geoscience has two commentaries this month on science blogging – one from me and another from Myles Allen (see also these blogposts on the subject). My piece tries to make the point that most of what scientists know is “tacit” (i.e. not explicitly or often written down in the technical literature) and it is that knowledge that allows them to quickly distinguish (with reasonable accuracy) which new papers are worth looking at in detail and which are not. This context is what provides RC (and other science sites) with the confidence to comment both on new scientific papers and on the media coverage they receive.

Myles’ piece stresses that criticism of papers in the peer-reviewed literature needs to be in the peer-reviewed literature and suggests that informal criticism (such as on a blog) might undermine that.

We actually agree that there is a real tension between a quick and dirty pointing out of obvious problems in a published paper (such as the Douglass et al paper last December) and doing the much more substantial work and extra analysis that would merit a peer-reviewed response. The approaches are not, however, necessarily opposed (for instance, our response to the Schwartz paper last year, which has also led to a submitted comment). But given everyone’s limited time (and the journals’ limited space), there are fewer official rebuttals submitted and published than there are actual complaints. Furthermore, it is exceedingly rare to write a formal comment on a particularly exceptional paper, with the result that complaints are more common in the peer-reviewed literature than applause. In fact, there is much to applaud in modern science, and we like to think that RC plays a positive role in highlighting some of the more important and exciting results that appear.

Myles’ piece, while ending up on a worthwhile point of discussion, illustrates it (in my opinion) with a rather misplaced example that involves RC – a post and follow-up on the Stainforth et al (2005) paper and the media coverage it got. The original post dealt in part with how the new climateprediction.net model runs affected our existing expectation for what climate sensitivity is and whether they justified a revision of any projections into the future. The second post came in the aftermath of a rather poor piece of journalism on BBC Radio 4 that implied (completely unjustifiably) that the CPDN team were deliberately misleading the public about the importance of their work. We discussed then (as we have in many other cases) whether some of the responsibility for overheated or inaccurate press actually belongs to the press release itself and whether we (as a community) could do better at providing more context in such cases. The reason why this isn’t really germane to Myles’ point is that we didn’t criticise the paper itself at all. We thought then (and think now) that the CPDN effort is extremely worthwhile and that lessons from it will be informing model simulations some time into the future. Our criticisms (such as they were) were mainly associated instead with the perception of the paper in parts of the media and wider community – something that is not at all appropriate for a peer-reviewed comment.

This isn’t the place to rehash the climate sensitivity issue (I promise a new post on that shortly), so that will be deemed off-topic. However, we’d be very interested in any comments on the fundamental issue raised – how science blogs and traditional peer review do (or should) intersect, and whether Myles’ perception that they are in conflict is widely shared.

185 Responses to “Blogs and peer-review”

Does 11C have any more merit in relation to the new CO2 article just posted? If Hansen is projecting increased climate sensitivity of around 6C, up from the previous average of 3C, then does 11C become more likely, or have I got this wrong?

In addition, does it matter much that AGW is happening 30x faster than previous episodes did?

Geoff Sherrington, I think you may have misinterpreted Eli’s post. He was not suggesting that amateurs contribute significantly to the state of knowledge in complex fields such as climate change, but rather that their participation where they can allows them the experience of doing science.
Moreover, I think it is beyond doubt that amateurs have contributed significantly to the fields he mentioned–or do you consider the Shoemaker-Levy collision with Jupiter to be trivial?

pete best (101) wrote “…does it matter much that AGW is happening 30x faster than previous episodes did?” Yes. It is difficult for organisms to evolve quickly enough to survive in the new conditions. So The Sixth Mass Extinction seems to be underway…

Ray, the models do offer “insights” and many times have helped us make sense of the data and new insights have sent us back to the data to find confirmation that we weren’t aware of before.

However, there are issues about when and if the models transition from providing “insight” into the plausibility of hypotheses to having the skill and accuracy to resolve a quantitative issue such as attribution of the 20th century warming.

The “peer review” of climate models does not seem up to this task. Models have often been used in several peer-reviewed articles touting bold results about the climate before they have been subjected to diagnostic studies and found wanting in various ways. What part of the early publications were a real scientific result? The authors certainly don’t have to retract the model results, because those were what they were and, barring mistake or fraud, model results are model results. But the bold extrapolations from model to climate are not retracted, and involved embarrassing assumptions in reasoning that should not have passed peer review, and unfortunately are being repeated even today.

For example, should a model based leap in reasoning such as “The observed signal falls outside the range expected from natural variability with high confidence (P

Martin, the left angle bracket symbol is mistaken by the blog software for the beginning of an HTML tag, and the stupid-paperclip-feature fills in what it imagines you wanted to type. “View Source” and see what it did to your posting. There’s some trick to avoid this.

Ray, the models do offer “insights” and many times have helped us make sense of the data and new insights have sent us back to the data to find confirmation that we weren’t aware of before.

However, there are issues about when and if the models transition from providing “insight” into the plausibility of hypotheses to having the skill and accuracy to resolve a quantitative issue such as attribution of the 20th century warming.

The “peer review” of climate models does not seem up to this task. Models have often been used in several peer-reviewed articles touting bold results about the climate before they have been subjected to diagnostic studies and found wanting in various ways. What part of the early publications were a real scientific result? The authors certainly don’t have to retract the model results, because those were what they were and, barring mistake or fraud, model results are model results. But the bold extrapolations from model to climate are not retracted, and involved embarrassing assumptions in reasoning that should not have passed peer review, and unfortunately are being repeated even today.

For example, should a model based leap in reasoning such as “The observed signal falls outside the range expected from natural variability with high confidence (P less than 0.01). … We conclude that natural internal climate variability alone cannot explain either the observed or simulated changes.” pass peer review muster, when there are peer review results showing that the model is unable to reproduce the signature of the solar cycle found in the observations and has other published issues in the diagnostic studies? I wouldn’t let such a leap get past conference reviews such as I have participated in, yet even prestigious journals seem to have a lower standard of review for climate science.
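The quoted statement amounts to a significance test against a model-generated null distribution. A minimal sketch, with synthetic numbers standing in for both the control run and the observations:

```python
import random

# Sketch of the attribution-style test quoted above, with synthetic
# numbers: compare an "observed" trend against the distribution of
# trends produced by a control run (natural variability alone).
random.seed(7)

control_trends = [random.gauss(0.0, 0.05) for _ in range(1000)]  # C/decade
observed_trend = 0.17                                            # C/decade

# one-sided empirical p-value: fraction of control-run trends at
# least as large as the observed one
p = sum(t >= observed_trend for t in control_trends) / len(control_trends)
print(f"empirical p-value: {p:.3f}")
```

The mechanics are trivial; the commenter’s complaint is about the null distribution itself – if the control run understates natural variability, the p-value inherits that bias.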

Models and climate are said to “match” each other or to be “realistic” in peer reviewed papers, usually without any rigorous analysis or explanation of what is being called a match, and whether that match is sufficient to prove the skill required for the bold results claimed.

I submit that assessing quantitative climate model results and claims is beyond the scope of normal journal peer review, and must await the diagnostic studies that often occur years later. Yes, it is professionally limiting to have to wait years after you have completed your work to be able to publish original research results that might be argued to be relevant to the climate. By then you will already be deep into working on version n+1, which hopefully has higher resolution and fewer and better parameterizations. If it is any consolation, I’ve heard that other fields, such as fusion research, have it worse.

Martin, the attribution of the current warming epoch to greenhouse gasses is in no way dependent on computer models. The fact that warming is occurring and that the warming is exceptional rests purely on empirical observations, historical data and analyses of various proxies. Now you could argue that some of these could be flawed–indeed, some are. However, when they all point to the same conclusion, that is what provides high confidence. The level of greenhouse forcing is determined from a variety of sources–and this says we should have warming, whether you put it into a computer model or not.
The models are not a fit to temperature data, but rather a best fit for forcers to a variety of independent data, with the goodness of fit to temperature data (among other things) as a validation. And overall, the models fare very well in these validations.
I agree that there are many residual uncertainties in climate models–greenhouse forcing just doesn’t happen to be one of them.

Ray, Agreed. The GHG forcings are well established as exceptional. The uncertainties are in the feedbacks, the resultant signatures and in the solar forcing and coupling. The current plateau in solar forcing is arguably just as exceptional (and questionable) as the recent climate is. Unfortunately, anthropogenic greenhouse forcing alone only gets you less than a third of the way to the recent warming. We need modeling of the feedbacks, etc. for a reason, I just don’t see evidence that they are ready for this particular task yet.

Gavin: A sensitivity analysis of models in my line of work would be to take all the parameters to their error bar extremes and find out what is the total potential locus of results. I’d say that aerosols at their extreme values would always produce cooling curves in some scenarios. This is not climate sensitivity but model parameter sensitivity. If this kind of analysis is never done then I’d be amazed.

[Response: You confuse testing the model against climatology (which has a fixed (but uncertain) aerosol distribution) and a transient experiment with changing aerosol distributions. Now there are no limits whatsoever on what you could have for a theoretical transient – so there is not much use in trying anything other than good estimates of the observed trends. And when dealing with the climatology you aren’t looking at the transient signal at all. – gavin]
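The kind of parameter-extremes sweep described in the comment above is easy to sketch. The linear response function and the error bars below are invented for illustration and bear no relation to a real GCM:

```python
import itertools

# Corner-case sweep over a toy (linear) response -- not a GCM, and all
# numbers invented: evaluate the output at every combination of
# parameter error-bar extremes to find the total locus of results.
def response(sensitivity, aerosol_forcing, ghg_forcing):
    """Hypothetical equilibrium warming: sensitivity (K per W/m^2)
    times net forcing (W/m^2)."""
    return sensitivity * (ghg_forcing + aerosol_forcing)

# central value +/- error bar for each parameter (illustrative)
params = {
    "sensitivity":     (0.8, 0.3),   # K per W/m^2
    "aerosol_forcing": (-1.2, 0.8),  # W/m^2 (poorly constrained)
    "ghg_forcing":     (2.6, 0.3),   # W/m^2
}

extremes = [(c - e, c + e) for c, e in params.values()]
results = [response(s, a, g) for s, a, g in itertools.product(*extremes)]
print(min(results), max(results))   # full range across the 8 corners
```

With these made-up numbers the eight corner cases span roughly 0.15 to 2.75 K, and most of that spread comes from the poorly constrained aerosol term; for a real GCM each corner would be a full model run, which is part of why such sweeps are expensive.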

Ray Ladbury:”both the agreement of model predictions AND the thick tail are important results. The fact that predictions agree by and large supports the contention that the most important contributors to climate are well understood.”
No. The only proper validation test for model runs is how well they agree with reality. If we understood climate well then of course the model results would agree, but the reverse is very obviously untrue. I have seen this argument used before (e.g. Held in relation to droughts) and I still can’t believe any scientist thinks in this backwards fashion.

James G. I think you misunderstand my point–the agreement of the models says that there is agreement on the most important forcers–it suggests consensus.
James G and Martin,
The fact that the models agree as well as they do with overall trends in the very noisy climate data suggests that on the whole they are correct. The uncertainty comes in for the less well constrained aspects, and the takeaway message here is that that uncertainty is overwhelmingly on the positive side. Since that is also where the highest costs are, those fat tails could wind up dominating the risk.
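A toy expected-cost calculation shows how a fat positive tail can dominate the risk even when the central estimate is unchanged. The probabilities and the cubic damage function below are invented for illustration, not taken from the literature:

```python
# Hypothetical sensitivity outcomes (deg C) with assumed probabilities,
# plus an assumed convex damage function; none of these numbers come
# from any actual study.
probs = {2.0: 0.25, 3.0: 0.50, 4.5: 0.20, 8.0: 0.05}

def damage(t):
    return t ** 3   # arbitrary convex "cost" of t degrees of warming

expected = sum(p * damage(t) for t, p in probs.items())
tail_share = probs[8.0] * damage(8.0) / expected
print(f"share of expected damage carried by the 5% tail: {tail_share:.0%}")
```

Here the 5% tail carries over 40% of the expected damage: with convex costs, the rare high-end outcome matters as much as the likely central one.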

I worked on a government research project and the results were not released. Neither I nor my project lead, JD Reynvaan, is allowed (under the NDA we signed) to make public any of the results independently, so if the agency we were working with doesn’t release it themselves, the data will not be publicized. I wonder how much research data gets suppressed utilizing NDA terms?

Lydia, In aerospace research, we are not allowed NOT TO PUBLISH our results, although if a particular vendor is involved, we may call them “Vendor A”. I’m not sure how this is legal if the research is government funded.

thanks for coming over here to discuss this issue with us.
In my view our contentious sentence “we feel that the most important result … is that by far most of the models had climate sensitivities between 2ºC and 4ºC” is not wrong, it is certainly not an indication of a lack of fact checking at RealClimate, and also it is not a criticism of your paper. Rather it is a valid interpretation of your results. Other scientists may interpret your or my data differently – this is sometimes annoying but part of a healthy scientific discourse. Even today I still think that this in fact is your most important result. Let me try to explain.

You argue that you started from a model with about 3 ºC climate sensitivity, and then randomly perturbed parameters in all directions – so of course the peak of your distribution is near 3 ºC. I agree with that. As you know we also work with model ensembles, and what I found the most interesting question both with our own and with your ensembles is: how broad is this peak? I.e., how quickly do you get away from those 3 ºC when you change the parameters? How strongly do you need to tweak a model to get a really different climate sensitivity? Note that we did not write that the most interesting result is that you get a peak near 3 ºC – we wrote that the most interesting result is that most models remain inside the range 2-4 ºC when you perturb the parameters. I still find this small spread far more interesting than the outliers.
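The “width of the peak” question can be illustrated with a toy perturbed-physics ensemble. The feedback-to-sensitivity relation below is the standard 1/(1−f) amplification, but the base value, spread, and cap are invented for illustration, not CPDN’s numbers:

```python
import random

# Toy perturbed-physics ensemble (not CPDN's model or numbers): perturb
# a feedback parameter around a base case tuned to ~3 C sensitivity and
# ask how much of the ensemble stays inside the 2-4 C range.
random.seed(0)

def sensitivity(feedback):
    """Standard amplification: no-feedback warming / (1 - feedback)."""
    return 1.2 / (1.0 - feedback)

base_feedback = 0.6   # gives 1.2 / 0.4 = 3.0 C for the base model
ensemble = [sensitivity(min(random.gauss(base_feedback, 0.1), 0.95))
            for _ in range(10000)]

inside = sum(1 for s in ensemble if 2.0 <= s <= 4.0) / len(ensemble)
print(f"fraction of ensemble in 2-4 C: {inside:.2f}")
print(f"highest sensitivity drawn: {max(ensemble):.1f} C")
```

Because sensitivity goes like 1/(1−f), even a symmetric spread in the feedbacks produces a skewed distribution: a tight peak with a one-sided tail of high outliers, which is qualitatively the shape under discussion here.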

Concerning the outliers, my prior expectation (and maybe this is where we differ) is that a model version with 11 ºC climate sensitivity is very likely just an unrealistic model, which would fail a number of quality checks – we discuss this in detail in our original post. This is why per se I do not find those tails very interesting – yes, by tweaking parameters enough I can get a model to behave very strangely, but so what? I certainly would not go to the mass media suggesting that 11 ºC could be a realistic climate sensitivity (not even as an extreme case of a wide range), before I had performed some pretty rigorous testing on these high-sensitivity models. (Now this is a criticism – not of your paper but of your media outreach.) If I had a climate model with 11 ºC climate sensitivity that had passed the kind of validation tests discussed in the IPCC report – e.g., which gives a realistic present-day climate, including seasonal cycle, a realistic response to perturbations like the Pinatubo eruption, a realistic 20th-Century climate evolution and a realistic Ice Age climate – well, then I would call a press conference. But I don’t think anyone has ever produced such a model.

Thanks also to all the other contributors to this discussion – I think this is excellent, and a good advertisement for science blogging.

If I may semi-digress into the issue of skeptics: One thing to pin them down on, considering their rejection of the clearness of “global warming”: Ask, “OK, so you’re not sure that global warming is definitely occurring, or that if so it is mostly man-made. But do you at least accept the action and presence of “greenhouse gases?” Water vapor is one [some skeptics like to talk of how the effects of water vapor swamp those of CO2], do you also accept that CO2 is a “greenhouse gas?” And if so, shouldn’t we at least be concerned about a long-term rise in its concentration, even if we can’t agree on just what the outcome has been and will be?” That would take away a lot of steam from their evasions, and force some acknowledgment of the causal stresses in any case.

BTW, with possible solar cooling etc. we really should IMHO take a closer look at the interaction of all stimuli and not focus narrowly on the CO2 issue.

Some of you seem oblivious to the circular reasoning:
. If a climate sensitivity of 3 degrees is an input then it isn’t too surprising that 2-4 degrees should be your dominant output range.
. If you make a best guess on the highly uncertain aerosol parameter then you are obviously selecting it on the basis of what it takes to match real world temp trends. It is then guaranteed that the output temperature trends will match.
. If models all come from similar roots with similar parameters, it’s not surprising they’d have similar results. In fact even if they had different equations and/or different inputs but still achieved similar outputs we still wouldn’t be confident which inputs were actually correct. So model run matching is virtually meaningless.
. And how do you know a “good estimate” in an uncertain parameter? Is it by a show of hands? I know that sparsity of data makes such estimates necessary but we still need to match predictions with reality to be confident in the models.

Hank: Now point out where the obs don’t agree with reality and look for the obvious shortcomings in that paper. I’m happy though that coupled models are improving – the fewer the inputs and the greater the internal dependencies the more we can trust them.

JamesG, your missive makes clear that you are utterly ignorant of how climate modeling is done. First, parameters are not “fit” to the temperature data. Rather they are independently constrained by other datasets. The process of constraint and datasets vary from one modeling team to the next, so the models really are independent tests. The fact that the models agree as well as they do demonstrates that the conclusions of these independent analyses converge more or less on the forcings. The validation of the models is the degree to which they match trends (not values) in the temperature data and other tests (e.g. Pinatubo response).
Until you understand how the modeling is done, it’s hard to take your criticisms seriously.

a short reply to Neil (118): I accept the action (physics) and presence of “greenhouse gases”. I also accept the greenhouse gas capability of CO2 and H2O (though maybe not in the exact proportion professed). I think we ought to be concerned (or at least very curious) over the potential increase in global CO2 and curious about its effect on global climate. (I’m not using “curious” as a throw-away or a pejorative.) I’m not clear at all how this “takes the steam” out of my skepticism.

Rod B: I suppose that by “curious” you mean, just how much effect CO2 has had/would have is not patently clear. But its stimulative effect surely calls out for some sort of mitigation strategy (albeit in “reasonable” moderation), does it not?

PS Feel free to explain more of your skepticism, something I surely tolerate in modest doses.

Neil (re 122, and 121, 118): That’s a close enough definition of “curious” in this context to understand my point. But, no, I don’t think the mathematical precision and uncertainty in some of the processes yet warrant mitigation. With one philosophical exception that hangs in the back of my mind, and that is the “inevitable dooms day” scenario — holding off any mitigation until (if ever) my skepticism is scientifically answered, but finding it’s then too late to make any difference in the progression to extinction. Plus a small pragmatic exception for those “mitigation” activities whose costs are insignificant or that produce other desirable benefits. For instance putting some effort and resource into reducing our use of or dependency on fossil fuels, like alternate energy sources, is helpful in its own right (to a degree), and if it also buys a little insurance against AGW, that seems a good thing.

One example of my maybe three areas of skepticism relates to the mathematical and physical relation between concentration and forcing (the old ln of the fifth power of the concentration ratios), the degree of radiation absorption broadening as concentration increases, and in the general area of molecular absorption, transfer, redistribution and/or re-radiation of energy and/or temperature. However, my current mode is an onus on me to research the science a bit more before I resurrect it fully in RC. My areas of skepticism have been discussed quite a bit on RC in the past. It became evident I had to improve the science behind my questioning so others didn’t waste their time reciting the same arguments over and over.
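For reference, the concentration-forcing relation Rod is recalling (“ln of the fifth power”) looks like a garbled memory of the widely used simplified CO2 forcing fit of Myhre et al. (1998), ΔF = 5.35·ln(C/C0) W/m², which is logarithmic in the concentration ratio, not a fifth power. A quick worked case of that standard expression:

```python
import math

# The widely used simplified CO2 forcing fit (Myhre et al. 1998):
# dF = 5.35 * ln(C/C0) in W/m^2 -- logarithmic in the concentration
# ratio.
def co2_forcing(c, c0, alpha=5.35):
    return alpha * math.log(c / c0)

print(co2_forcing(560.0, 280.0))   # doubling of CO2: ~3.7 W/m^2
```

The logarithm is also why each successive doubling of CO2 adds roughly the same forcing, which is part of the band-saturation/broadening discussion Rod alludes to.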

I would think that the model results should converge on the climate, not the forcings, so I am not quite sure what you are saying.

The benefit of considering the model efforts “independent” disappears if, despite this independence, they are documented to have correlated errors, as the AR4 models have been shown to share a positive surface albedo bias, a delay or deficit in Arctic ice cap melting, and a missing or weak solar cycle signature, all of these relative to the observations of the actual climate.

The real question for the peer reviewers is what level of agreement of the models with the climate, or with each other should be required before the claims that authors want to make in peer review papers should be allowed to pass.

The diagnostic studies show that the models have significant disagreement with the climate, that is arguably relevant to the relative attribution to solar activity and anthropogenic GHGs, and thus any validation of the models for projection of GHG scenarios. A positive albedo bias differentially impacts solar response, and a failure to reproduce the solar cycle signature that is present in the observations hurts the models’ credibility in attribution to GHGs rather than solar.

What is really misunderstood is that model comparisons, in spite of a lot of swapping of code and data, are all over the place like a mad woman’s breakfast. It is ludicrous to say that good model agreement endorses the modelling processes, because there is not good model agreement.

Oh, I forgot. There is one comparison in Douglass et al Int J Climatol (2007) “Comparison of tropical temperature trends with model predictions”, which has one model agreeing in places with the mean of 22 models. The units are millidegrees C per decade, and they calculate delta T for 13 altitudes. How’s this for a match?
Good model, first column. Average of 22 next, then SD next column.

Must be a good model if it agrees to one part in a thousand degrees C/decade for 3 different altitudes. Now how do you suppose that might have happened? Is this reality or is it Dreamtime? How come we use all those significant figures, when the overall error is perhaps of the order of 5,000 millidegrees?

Are these models good when the SD several times exceeds the value? (Why you guys don’t work in proper units like degrees Kelvin eludes this scientist).

Sorry Ray, It’s not me who misunderstands. I understand only too well.

[Response: Not really. It’s easy to find metrics where the SD among models is larger than the expected signal. Just make it regionally restrictive enough and use a short time period. I personally would not use such cases to argue for model robustness, and so picking a clearly coincidental match to argue against robustness is just odd. – gavin]

Come off it Gavin, you are supposed to be a math person. What are the odds of 3 consecutive numbers agreeing so closely? 10^n, will you provide the n?

As a person used to looking at numbers, does this not arouse even a teeny bit of innate suspicion?

You must have looked at spaghetti plots of model comparisons lately. You must agree they are a mess. I suggest it is well past the time that modellers had a conference to solidify criteria that must be met in order to justify any further modelling. It must be costing zillions for the tiny increments of progress that come in year after year. Maybe the whole premise is impossible given present technology and data. Intractable problems are real and known.

[Response: Wooo….. numerology! And if you take the digits of the numbers that match you get my telephone number. Must mean something right? ‘Zillions’? – I wish. – gavin]

a failure to reproduce the solar cycle signature that is present in the observations hurts the models’ credibility in attribution to GHGs rather than solar.

WHAT solar cycle signature? As far as I know nobody has been able to detect a clear effect from the 11- or 22-year cycles, though many people have tried. Every time someone seems to find one, someone else points out a statistical flaw in their method. Read Tamino’s “Open Mind” blog for a thorough discussion of this issue:

Re: “Scientists should embrace the open scientific debate, and anyone who challenges that should be made very, very clear that without open debate, there simply is no science, no matter how much one is in favor of or opposes to particular people, statements and actions.”
Yes, but open scientific debate should happen among scientists (and it often does). A climate scientist and professor I recently met commented on such wisdom by asking “what would you do if your kid were sick? Would you go to a doctor or find 100 random people and take a democratic vote (or open scientific debate) about what they think may be ailing your child?” I love the question because I think it makes obvious that if you want/need a real, scientific answer, you go to an expert/authority and not for open debate.
BUT I don’t mean I am against science blogging. Absolutely not. I have become an avid science blogger thanks to RealClimate and know it is a great way of reaching and interacting with the lay public. It is a necessary and very effective tool for educating the lay public about the science. And it’s fun, and RC is the best of the best. But it is not necessarily intended to debate the science… IMHO.

“…the puzzling thing about the C&T results, which will need to be sorted out by a more detailed analysis of the oceanic response in the IPCC AR4 models. I can only explain their result by a combination of high sensitivity and low thermal inertia. But why should the thermal inertia that explains the solar cycle response be so much lower than the thermal inertia needed to explain the seasonal cycle over oceans. Camp and Tung are on to something interesting, but I don’t think it can be said that we understand what is going on yet. …”

Re 132. In a quick read of the paper, I did not find albedo either, though a more careful look might reveal that it is there. Hard to see how they would miss something that basic.

In any case, this paper does not in any way that I can see undermine our current understanding of AGW. The authors analyse the solar cycle as an oscillating, 11-year-cycle forcing that adds to or subtracts from AGW, depending on the part of the cycle one is in.

They state in the abstract: “From solar min to solar max, the TSI reaching the earth’s surface increases at a rate comparable to the radiative heating due to a 1% per year increase in greenhouse gases, and will probably add, during the next five to six years in the advancing phase of Solar Cycle 24, almost 0.2 °K to the globally-averaged temperature, thus doubling the amount of transient global warming expected from greenhouse warming alone.”

The paper doesn’t undermine the current understanding of AGW; it undermines the evidence for credible quantitative attribution (e.g. claims of “most”) of the recent warming to GHGs. Such claims are largely model based. Note this result from the Camp and Tung paper:

“Currently no GCM has succeeded in simulating a solar-cycle response of the observed amplitude near the surface. Clearly a correct simulation of a global-scale warming on decadal time scale is needed before predictions into the future on multi-decadal scale can be accepted with confidence.”

Of course, confidence in attribution (not just projection) of the recent warming, probably would also be enhanced by models able to reproduce the solar response, since solar is the competing hypothesis for the recent warming. I don’t see models as ready yet to differentiate between the competing hypotheses.

Based on conference presentations by Tung and Camp, raypierre believes there is another paper that will be more detailed in its diagnostics on the various AR4 models.

Ray Ladbury
I last clashed with you when I said that picking a few “good” rural stations from Anthony Watts’ effort should independently verify the GISS US48 graph. You vehemently disagreed, somehow arguing that there could be a cooling signal introduced. I had argued that small amounts of very accurate data are intrinsically better than huge amounts of dross that need adjusting but you argued that all data was valuable, even obviously bad data. Well I was right and you were wrong. But then I was only stating the obvious. There are always certain things that are blindingly obvious in science and yet there are always frustrating characters like you who fight tooth and nail to avoid seeing them. The many circular arguments in climate science, some of which I highlighted, are indeed obvious too, if you are prepared to step out of your comfort zone of blind faith and start behaving like a scientist.

Of course the models are tuned to match 20th century temperature trends, and the latitude of the aerosol parameter “constraint” is what allows a lot of that tuning. It’s as plain as the nose on your face, and even if it hadn’t been, you could turn to the peer-reviewed studies from aerosol experts (non-skeptics, I stress) who confirm it.

Your original reasoning was that if the model runs matched then they must be correct, which was by itself laughable in its naiveté. Now you say that if they all match the real trends then they must be correct. Well no! Because a) if they are set to match the trends then it would be a pretty bad model that didn’t manage it, so all this exercise accomplishes is to show up the really bad models; and b) hind-casting is a necessary but not a sufficient condition of modeling, as your teachers should have drilled into you before you were allowed to ever practice science. It is very easy to reach the “correct” result by the wrong method in modeling, which you’d know if you’d done any modeling. Such is the value of knowing the answer beforehand. Prediction is the only proper method of finally validating models, and even that isn’t really sufficient if you (like Hansen) make such a wide range of predictions that it is almost impossible to fall outside of it.

JamesG, I don’t have the first idea of what you are blathering about. Do you? You seem to be under the misapprehension that the only purpose data can serve is to give you “the answer”. They also serve the purpose of error modeling, etc. Sensitivities are not “tuned” to temperature datasets. They are independently constrained to the extent possible. Aerosols are less constrained than other forcers. However, there are multiple studies that suggest that aerosol forcing is in the right ball park, including those on the Mt. Pinatubo eruption, etc.
I see you’ve gotten no better at providing supporting documentation for your assertions. As such, I do not feel bound to respond to groundless assertions except to say that it is clear from your posts that you don’t have much experience analyzing data. So, have a good life with your pseudoscience. I’ll stick with physics.

I had argued that small amounts of very accurate data are intrinsically better than huge amounts of dross that need adjusting, but you argued that all data were valuable, even obviously bad data. Well, I was right and you were wrong.

Ray Ladbury replied:

it is clear from your posts that you don’t have much experience analyzing data

I do have “much experience analyzing data.” It’s my life’s work.

I’ve often told skeptical astronomers that visual photometry — imprecise estimates made by eye, by amateurs — can easily outperform more precise instrumental photometry, by virtue of sheer numbers. At first they generally refuse to believe it. Then I show them results from visual photometry which detected fine structure which was nowhere to be found in instrumental work. Usually they still refuse to believe it. Then I show that the visual-data results were later confirmed by instrumental measurements — but only when they became numerous enough to do the job.
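The point about sheer numbers is just the statistics of averaging: the error of the mean of N independent estimates shrinks as sigma/sqrt(N), so enough crude estimates beat a single precise one. A minimal sketch, with entirely made-up numbers (not any particular survey):

```python
import random

random.seed(42)

TRUE_MAG = 10.0  # the "true" brightness being measured (hypothetical)

def mean_estimate(n_obs, sigma):
    """Average n_obs independent measurements, each with random error sigma."""
    return sum(random.gauss(TRUE_MAG, sigma) for _ in range(n_obs)) / n_obs

# One precise instrumental measurement: error ~0.01 mag
precise = mean_estimate(1, 0.01)

# 10,000 crude visual estimates: error ~0.2 mag each, but the error of
# their mean shrinks as 0.2 / sqrt(10000) = 0.002 mag
crude = mean_estimate(10_000, 0.2)

print(f"instrumental error: {abs(precise - TRUE_MAG):.4f}")
print(f"visual-mean error:  {abs(crude - TRUE_MAG):.4f}")
```

Run it a few times with different seeds and the averaged visual estimates come out closer to the truth than the single precise measurement, on average, which is exactly the effect described above.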

Of course the models are tuned to match 20th century temperature trends

Of course nothing. Global climate models are not in any way, shape or form tuned to 20th century temperature trends. The only climate data that goes into those models is grid data — albedo, elevation, terrain type, normal cloud cover, etc. for each grid square — and things like the composition of the atmosphere and the solar constant. The rest is physics. When a GCM is revised, it’s because someone has found a way to represent the physics better.

Re 134 Martin, it seems to me that you are really grasping at straws. I see nothing in the C&T paper that would justify your statement that “it undermines the evidence for credible quantitative attribution (.e.g. claims of “most”) of the recent warming to GHGs.” Nor do the authors make such a claim.

Must be a good model if it agrees to one part in a thousand degrees C/decade for 3 different altitudes. Now how do you suppose that might have happened? Is this reality or is it Dreamtime? How come we use all those significant figures, when the overall error is perhaps of the order of 5,000 millidegrees?

Are these models good when the SD several times exceeds the value? (Why you guys don’t work in proper units like degrees Kelvin eludes this scientist).

BTW when you say that the model results are “all over the place”, you shouldn’t quote Douglass et al., because all they really show is that model runs are all over the place… it’s called weather. Natural, unforced variability, on many different time scales. The fact that models somewhat faithfully display this real-world behaviour shows that they are good, not bad.

But to get to your question, your first column is not a “good” model, it is model 15, which just happens to be very close to the average of all 22 models. Very close, because it was picked to be close, out of 22 alternatives on offer. And indeed it is, as the table shows: two very similar, smooth curves crossing once. Very close in three points. Duh.

“All those significant figures” were chosen by Douglass et al., and for good reason if you do numerical manipulations. Agreed, they don’t mean much, but you don’t want to drop precision along the way.

About the SDs (which contain weather variability on top of true model uncertainty), none of them exceed the value “several times”, or even once — except the second one for the 100 hPa level, and the reason for that is in the paper (hint: it’s based on corrupted values.) And yes, the last one, but that happens to be a small value. (A measurement value of zero is invariably exceeded by its own SD — wonder why?)
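For what it’s worth, the role of the ensemble spread can be sketched with a toy Monte Carlo: a single realization (i.e. the real world) should be judged against the spread sigma of individual runs, not against the much tighter sigma/sqrt(N) of the ensemble mean. All numbers here are made up; only the 22 “models” is a nod to the paper:

```python
import math
import random

random.seed(1)

N_MODELS, SIGMA, TRUE_TREND, TRIALS = 22, 0.1, 0.2, 2000

rejected_sem = 0    # "inconsistent" by the sigma/sqrt(N) criterion
rejected_sigma = 0  # "inconsistent" by the sigma criterion

for _ in range(TRIALS):
    # N model runs: a shared forced trend plus each run's own "weather noise"
    runs = [random.gauss(TRUE_TREND, SIGMA) for _ in range(N_MODELS)]
    mean = sum(runs) / N_MODELS
    # the real world is just one more realization from the same distribution
    reality = random.gauss(TRUE_TREND, SIGMA)
    if abs(reality - mean) > 2 * SIGMA / math.sqrt(N_MODELS):
        rejected_sem += 1
    if abs(reality - mean) > 2 * SIGMA:
        rejected_sigma += 1

# The sigma/sqrt(N) test flags a perfectly consistent reality most of the
# time; the sigma test flags it ~5% of the time, as a 2-sigma test should.
print(rejected_sem / TRIALS, rejected_sigma / TRIALS)
```

Even with a reality drawn from the very same distribution as the model runs, the sigma/sqrt(N) criterion declares it “inconsistent” in the majority of trials.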

And BTW trends are the same in C and K. And it’s Kelvin, not “degrees Kelvin”, if nit-picking is the game :-)

The authors only claim that model credibility should be questioned for multidecadal prediction (which should of course be “projection”). However, put on your peer-reviewer hat for a moment: are you going to accept model-based attribution claims for AGW, now that there is a question about the models’ skill in reproducing solar effects on the climate that are in the observations? Wouldn’t that be like accepting the results of an election when it appears that the electronic machine doesn’t count the votes of one of the leading candidates? There are other documented reasons to question the skill of the AR4 models; this failure goes specifically to the question of whether the current models can be used to reject or dismiss the relative importance of the possible solar contribution.

Peer reviewers will be aware of this and other diagnostic literature, and will reject attribution and projection claims based on models that continue to demonstrate these problems. Past published results based on these and presumably also earlier models will be questioned as well. This is the scientific process.

Martin, I am not really qualified to debate this, so I will leave that to others who are. But it seems to me that C&T are dealing with an oscillating forcing of 11 years period, with an amplitude of roughly the same order as the annual increase in GHG forcing. Though this may be an area where the models need additional refinement, it is hard to see how it could overturn the current conclusions about the role of AGW. I will give the paper a more careful read.

What evidence can you adduce that it was Douglass et al who composed the averages I cited – and not the people producing the model simulations?

Are not model runs that are all over the place named “disagreement” or “failure to work”? It is not my argument that this effect arises because these are “weather” figures. I would question them no matter what the nature of their origins.

How does a model “just happen” to be close to the average, so really darned close, given the lottery-like odds of the numbers?

Don’t you accept there is the possibility of more than statistical coincidence? If you don’t, never ask me for a job.

I used the term “degrees Kelvin” because I thought some scientists should be reminded of a name they seldom use, Kelvin, when it should be widely used.

The referenced critique of Douglass et al is full of holes and unreliable. For example, the rebuttal quotes:

“However, the chances that any one realisation would be within those error bars, would become smaller and smaller. Instead, the key standard deviation is simply sigma itself. That defines the likelihood that one realisation (i.e. the real world) is conceivably drawn from the distribution defined by the models.”

There is no innate relationship between the convergence of models and the connection with the real world. All there is, is a convergence of models. How, in the future, do you model volcanic eruptions and ENSO effects? That’s plain stupid.

Is it not time to cast in stone the criteria that must be met before more modelling is deemed worthwhile? The payback on investment so far makes good spaghetti graphs, but tell me one – just one – REAL WORLD physical thing of commensurate value that it has done.

[Response: You seem to have relevant experience to bring to this discussion, yet you spend your time indulging in rhetorical excesses. That’s just a waste of everyone’s time. If you really think that people are just making up terabytes of data in order to fake some desired convergence, there is no point in further discussion; it’s just paranoia. As to the real-world practicality of modelling, it’s easy: attribution and prediction. You can choose to go into the future blind to the consequences of your actions, or you can learn from the past and adjust your future behaviour accordingly. To the extent that models can consistently explain past climate change (and they can), they have credibility when projecting those changes ahead. For metrics where they indicate that weather and interannual variability dominate, one would expect that to continue. That is practical information, whether you agree or not. – gavin]

I think you understate the value of the models; there is much qualitative value independent of attribution and projection. There have been insights into ocean circulation subsequently verified by observations, notably in the Southern Ocean. There have been insights into mode changes that are possible in ocean circulation that would have significant climatic effects and probably explain data in the paleo record. Exploration of the effects of the closing of the Isthmus of Panama due to plate tectonics has generated insights and hypotheses. There have even been insights into the ENSO and PDO phenomena, into the behavior of stratospheric gravity waves, and into other phenomena. But resolving questions around this recent warming requires a level of skill and accuracy higher than that which produced the many past successes of the models. We are moving from qualitative insight and hypothesis generation into quantitative attribution and projection of near-term climate forced by small changes in current levels of forcing.

While we would expect progress to be marked by a convergence in model results, due to the shared physics of the climate, such convergence is not a guarantee of progress: it may be partially due to correlated error or to poorly constrained degrees of freedom.

#145 Geoff Sherrington: As to who did the ensemble averaging, I don’t know. It shouldn’t matter, if it was done correctly. The number of runs per ensemble is given in the table.

Are not model runs that are all over the place named “disagreement” or “failure to work”?

No, only disagreements between properly constructed ensemble averages may be called “disagreement”. This here is (largely) weather / interannual variability. Weather really exists, you know; just look out of the window :-)

How, in the future, do you model volcanic eruptions and ENSO effects?

You don’t. The models also don’t prepare breakfast for you; expecting them to is plain stupid. (Now hindcasting for model verification can take these things into account. Different story.)

About the statistical agreement: it’s a mix of physics and dumb luck. The physics is in the general arc shape that the models have in common: in the lower troposphere, the adiabatic lapse rate gets smaller when it warms; in the upper troposphere, which is dry, it doesn’t; even further up it is cooling.(*) Net result: an arc. Your “coincidence” is just two such smooth curves crossing at a small angle. Result: three successive values agreeing within 0.001 degs.

(*) See the article I referred to for a good explanation.

Riddle: how big should a group of people be for two of them to have the same birthday with 50% probability? (A year having 365 days and the probability being evenly distributed.) People very commonly overrate the unlikelihood of coincidences, especially ones that aren’t specified in advance…

BTW I’m left wondering where Gavin Schmidt got the impression you know what you are talking about… polite as always, I guess.

Still re #145 Geoff Sherrington: I see from your “refutation” of the critique of Douglass et al. that you don’t even understand what the critique is about. The point is that the radiosonde results presented there all contain one and the same realization of “weather noise”, that of the one and only real world we live in — but this is artfully covered up by showing the nice agreement among several (in one case cherry-picked) radiosonde post-processing result sets.

Your questioning of the value of climate modelling reminds me of a story (IIRC from Robert Jungk’s “Brighter than a Thousand Suns”) in which General Groves, during the Manhattan Project, was told by his scientists that the sturdy steel-frame tower in the New Mexico desert, with the complicated apparatus on top, would completely evaporate as a result of the experiment in preparation. “Yeah, sure,” he is reported to have answered.

It is hard to sell your stuff when working on something that has never ever happened before…

Geoff Sherrington, you seem to have some fundamental misunderstandings of how scientific modeling–and climate modeling in particular–is done. Many of the climate models use quite different datasets to constrain the forcers, so the convergence of the models does in fact provide evidence in support of anthropogenic causation. What is more, the fact that most of the uncertainty lies on the high side of the consensus projection is also significant, since this high side may well drive risk. I would suggest that you become familiar with how climate models are implemented and used before blindly assuming your experience is 100% applicable without modification.
Might I suggest that Realclimate serves as a wonderful resource for scientists not directly involved in climate science to come and learn about the subject. The peer reviewed literature is the more appropriate venue if you feel you have substantive criticisms of methodology.