I ended up in a brief discussion on Twitter about models, in which the other party was suggesting that models can be manipulated to produce a desired result. Their background was finance, and they linked to some kind of Financial Times report to support their view. I, quite innocently, pointed out that this was more difficult with physical models, and the discussion went downhill from there. I do think, however, that my point is defensible, which I shall try to do here.

What I mean by physical models is those that are meant to represent the physical world, as opposed to – for example – financial, or economic models. The crucial point about physical models (which does not – as far as I’m aware – apply to other types of models) is that they’re typically founded on fundamental conservation laws; the conservation of mass, momentum, and energy. This has two major consequences: you are restricted as to how you can develop your model, and others can model the same physical system without needing to know the details of yours.

A set of fundamental equations that is often used to model physical systems is the Navier–Stokes equations. These are essentially equations that describe the evolution of a gas/fluid in the presence of dissipation. The first of these equations represents mass conservation,

$$\frac{\partial \rho}{\partial t} + \nabla \cdot \left( \rho \mathbf{v} \right) = 0,$$

which we can expand to:

$$\frac{\partial \rho}{\partial t} + \mathbf{v} \cdot \nabla \rho + \rho \, \nabla \cdot \mathbf{v} = 0.$$

What this equation is essentially saying is that the density, $\rho$, in a particular volume cannot change unless there is a net flux, $\rho \mathbf{v}$, into that volume.
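As a toy illustration of why this matters for tuning (my own sketch, not code from the post), here is a one-dimensional finite-volume advection scheme. Because each cell can only gain exactly what its neighbour loses, total mass is conserved to machine precision regardless of how the other numerical parameters are chosen:

```python
import numpy as np

# 1-D finite-volume advection on a periodic domain with an upwind flux.
# Mass is conserved by construction: cells exchange flux, nothing is lost.
n, u, dt = 200, 1.0, 0.001
dx = 1.0 / n
x = np.linspace(0.0, 1.0, n, endpoint=False)
rho = 1.0 + 0.5 * np.exp(-((x - 0.5) ** 2) / 0.01)  # initial density bump

mass_before = rho.sum() * dx
for _ in range(500):
    flux = u * rho                                 # upwind flux (u > 0)
    rho = rho - (dt / dx) * (flux - np.roll(flux, 1))  # periodic boundaries
mass_after = rho.sum() * dx

print(abs(mass_after - mass_before))  # difference at machine-precision level
```

The density profile itself depends on the resolution and time step, but the conserved total does not; that is the constraint a physical model cannot escape.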

We can also write the equivalent equation for momentum,

$$\frac{\partial \left( \rho \mathbf{v} \right)}{\partial t} + \nabla \cdot \left( \rho \mathbf{v} \mathbf{v} \right) = -\nabla P + \nabla \cdot \boldsymbol{\tau},$$

which we can again write out (but which I’ll only do for the $x$ component) as

$$\frac{\partial \left( \rho v_x \right)}{\partial t} + \nabla \cdot \left( \rho v_x \mathbf{v} \right) = -\frac{\partial P}{\partial x} + \left( \nabla \cdot \boldsymbol{\tau} \right)_x.$$
The left-hand side is similar to that of the equation for mass conservation; the momentum in a volume can change if there is a net flux of momentum into that volume. There are, however, now terms on the right-hand side. These are forces. The first is simply the pressure force; if there is a pressure gradient across this volume, then there is a net force on the volume, and the momentum will change. The final term is the viscous force. Microscopically you can think of this as representing a change of momentum due to the exchange of gas particles between neighbouring volumes. If there is no viscosity, then there will be no net change in momentum. If there is viscosity, then the momentum gained will be different to the momentum lost, and there will be a net change in momentum. This can then be expressed macroscopically as a force.

You can also write out a similar equation for energy, which can also include the change in energy due to work done by neighbouring volumes, energy changes due to viscous dissipation, and – potentially – heat conduction. I won’t write this one out, as it just gets more and more complicated and harder and harder to explain. The point is, though, that these equations describe the evolution of a gas/fluid and are used extensively across all of the physical sciences; from studying star and planet formation, through to atmospheric dynamics. They conserve mass, momentum, and energy. There are certain parameters (viscosity, heat conduction, …) that can be adjusted, but these are typically constrained both by physical arguments and by observations. For example, one could set the ocean heat diffusion to be so small that the surface warmed incredibly fast. It would, however, be fairly obvious that this was wrong given that it would neither match the observed surface warming, nor the warming of the deeper parts of the oceans.
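To make the ocean-heat-diffusion example concrete, here is a minimal one-dimensional diffusion sketch under my own assumptions (not code from any actual ocean model). Whatever value you pick for the diffusivity kappa, the total heat content is fixed by construction; what changes is how the warming is partitioned between the "surface" and the "deep" cells, which is exactly what observations can check:

```python
import numpy as np

def diffuse(kappa, steps=2000, n=100, dx=1.0, dt=0.1):
    """1-D diffusion with insulated ends: total heat content is conserved,
    but the profile depends strongly on the chosen diffusivity kappa."""
    T = np.zeros(n)
    T[0] = 100.0                            # heat injected at the "surface"
    for _ in range(steps):
        flux = -kappa * np.diff(T) / dx     # Fourier's law between cells
        T[1:-1] -= (dt / dx) * np.diff(flux)
        T[0] -= (dt / dx) * flux[0]         # insulated outer boundaries
        T[-1] += (dt / dx) * flux[-1]
    return T

T_small, T_large = diffuse(0.05), diffuse(1.0)
# Total heat is the same for both choices of kappa...
print(T_small.sum(), T_large.sum())
# ...but the "deep" warming differs, so observations can discriminate.
print(T_small[50], T_large[50])
```

Tuning kappa cannot create or destroy energy; it can only move it around, and the resulting partition either matches the observed warming or it doesn't.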

To be clear, I’m not suggesting that physical models are somehow better than other types of models, or that physicists are somehow better than other types of researchers; I’m simply pointing out that the existence of fundamental conservation laws makes it quite difficult to produce some kind of desired result using a physical model. I’m also not saying that it isn’t possible, simply that it’s harder when compared to models that don’t have underlying conservation laws. It’s also easier to pick up on such issues, given that you don’t need to know the details of the other model in order to model the same system independently. Essentially, that it may be easy to engineer a desired result with some models does not necessarily make it the case for all types of models. Of course, I’m sure there are subtleties that I haven’t considered; this isn’t meant to be a definitive argument. Also, if someone can convince me that I’m wrong, feel free to try and do so.

Physical models are hard because they have a large number of constraints to respect. Economic models only deal with a few observables at a time, so the observation space is poorly constrained. In addition to conservation laws, physical models must also be consistent with observations outside their own field of application. This is why you cannot accept homeopathy or other pseudoscience based only on a few experiments. In climate science, this makes you reject the solar hypothesis even if a statistical signal exists.

In climate science, this makes you reject the solar hypothesis even if a statistical signal exists.

I’m not quite sure what you mean by this. I guess maybe what you mean is that you can find a correlation between solar activity and climate, but when you consider the actual physics, the energetics isn’t consistent with a high sensitivity to changes in solar forcing, and all the other ideas (cosmic rays, magnetic fields, …) are not based on physically plausible models (yet).

BBD,
I think the argument was simply “I have experience in some kind of modelling, my experience tells me that it’s possible to engineer all sorts of desired results with the models with which I have experience, therefore this is true for all models”.

Harry,
I’m not quite sure why you regard them all as conceptual models. However, even if “physical model” isn’t ideal, I did try to define what I meant and I’m happy to use a better descriptor if anyone can think of one.

Economics appears to be largely opinion: there are very few generally-agreed laws with which to construct universally-accepted financial models. Every economist you ask provides a different prediction as to how financial situations turn out.

This is self-evidently true because otherwise how could stock markets operate? With a workable model, everyone would be able to predict forthcoming crises and act accordingly. In reality every financial event that comes along is unforeseen—the few gamblers who happen to call it right being the exceptions that prove the rule.

No wonder they don’t understand physical models based on long-established and proven rules and laws.

I think where things like Murphy v Smith are at loggerheads is that they’re talking about fundamentally different things. Murphy is talking about the size of the physical economy, which of course must conform to the various conservation laws and so on, which leads to the obvious implications that Murphy spells out. However, Noah comes from a world in which a Gauguin can be valued more highly than an Airbus A320, and frankly there ain’t no conservation law that can account for that. They’re both right, but are talking at cross purposes.

I think the argument was simply “I have experience in some kind of modelling, my experience tells me that it’s possible to engineer all sorts of desired results with the models with which I have experience, therefore this is true for all models”.

Another important difference is that air and water molecules do not read scientific journals, whereas many important players in the economy do base some of their decisions on the outputs of economic models. This means that the models themselves have feedbacks into the reality they are trying to model – when these feedbacks are positive, interesting things happen.

I think where things like Murphy v Smith are at loggerheads is that they’re talking about fundamentally different things.

Yes, that was my impression too. For example, Murphy interpreted infinite literally, whereas Smith interpreted it as no limits for the foreseeable future (IIRC).

Gauguin can be valued more highly than an Airbus A320, and frankly there ain’t no conservation law that can account for that. They’re both right, but are talking at cross purposes.

Yes, a good point.

My impression was that both did a poor job of considering the other party’s position. I think both Murphy and Smith’s posts came across as somewhat condescending, based on the idea that the other party was rather clueless when it came to things outside their area of formal expertise. A pity, because I think they were both making interesting points.

Using the Navier-Stokes equations as an example of a conservative system? You may be aiming high. Using as a recent example the fact that various media in Australia, the UK and North America once again credulously reported on a “NASA warp drive” that isn’t, even very basic insight into conservation of energy and momentum seems to be lacking in a large percentage of the population, even some educated in science or engineering.

Magna,
Yes, I read some of that article, but didn’t make it to the end (it was long and rather depressing). I think I see where he’s coming from. I’m sure there are many “skeptics” who are very nice, decent people, who I would find very interesting and quite enjoy meeting and spending time with. It is just a bit depressing that in today’s educated societies, we still see so much pseudo-science being accepted.

Ultimately it all comes back to the $ not being a physical unit, it’s just a symbol, and is perhaps best thought of as a kind of hybrid between a tool of exchange and an information feedback mechanism (price) which dictates how much effort humans devote towards certain behaviours.

It’s an important one though, because it leaves the door open to the kind of financial modelling that can “create” wealth by extracting a large number of tiny percentages of dollars and moving them into someone’s pocket. My take on this is that financial modellers engineer their programs to produce a desirable result for them or their employers, and then assume that everyone else does the same. It’s analogous to the political discussion between “conservative” businesspeople and the rest of the world, where the conservative thinks, “I gamed the system to get where I am, because that’s what business values as ‘innovation’, and I assume everyone is playing the same game. How could you succeed otherwise?” (Yes, I’m stereotyping.)

Sam,
Thanks, that’s a good article and is – I think – roughly how I understand the situation. We don’t need to tie GDP/economic growth to some resource. As long as a resource is sufficient, we can continue to increase economic activity and – hence – economic growth. At least, that’s what I think the argument is.

The work of climateprediction.net provides evidence supporting your view. Through the magic of distributed computing they have run thousands of climate model simulations with a broad range of parameter values. All the runs show warming, and the runs which match the past, show warming consistent with IPCC projections. This is strong evidence that you can’t get anything you want from physics-based climate models despite the uncertainty in many of the parameterizations. With increasing CO2, the basic physics inevitably leads to warming. For example, http://www.climateprediction.net/wp-content/publications/NatGeoSci_2012a.pdf

T-rev and john,
That’s the one I linked to at the end of my post and which was criticised by Noah Smith for not understanding the standard economic terminology and confusing resource growth with economic growth. Sam (in the comments above) links to some interesting articles that try to reconcile the two positions.

Smith doesn’t so much interpret “infinite” differently as simply claim it’s outside the scope of his field, much like nobody worries about whether Navier–Stokes applies to the heat death of the universe.

The whole eternal growth thing seems to be a hobby horse not of economists but of people who like to trash them. As somebody on Murphy’s or Smith’s post put it, economics is not what commenters at the Oil Drum say it is.

A significant point you allude to but don’t really emphasize is that physical models ‘have’ to match observations to be considered good. Back in the mists of time when I was an astronomy grad student, my Master’s thesis was about the effect of using different variations of the mixing-length model of convection in stellar models. My supervisor thought it might be able to cause variations in the giant-branch temperature.
I tried several variations, but it turned out that once you calibrated the main mixing-length parameter so a solar model matched the sun’s temperature, the giant branch didn’t move that much. The interior structure of the stellar envelope could change significantly, but at the time there were no real observational constraints on this.

With respect, I think you all might be missing the point of the initial argument, which I think is valid. I would restate it something like this:

Mathematical models are useful tools to help us understand relationships between causes and effects in many different fields of study. These range from economic models to climate models and many others. Simple models – with few variables, a short time-frame, and well understood relationships between the variables – tend to be exceptionally reliable – mostly because they can be tested and revised until the assumptions built into these models can be “tuned” to match observed outcomes.

The danger in any model, however – whether it is a financial one, an astronomical one, or even a climate model, is when the model gets complex, has many “tunable” variables and even some unknown influences. This danger is greatest when people are trying to use these models to predict the future.

Planes have crashed, bridges have collapsed, and whole economies have been ruined because well-intended professionals mistakenly “trusted” their models to get it right. No model driven exercise is immune from this risk – including the current use of various complex climate models that are sometimes being used as “proof” that drastic action is needed to prevent some future climate catastrophe.

Complex models – like the Ptolemaic model of our solar system – or the Communist model of economics – frequently fail because skilled system modelers will “tune” the variables in such a way as to create a desired outcome – one that makes sense to them, and one that often just reinforces their own initial core beliefs.

This risk of a model failure applies equally to models based on physical processes or social ones.

With respect, I think you all might be missing the point of the initial argument

With respect, I think you’re missing the point that I’m making. I’m certainly not suggesting that physical models are always right, while other models can be horribly flawed. I’m suggesting that the existence of conservation laws means that it is not as easy to get a desired result with a physical model as it is with models that do not have underlying conservation laws. I don’t think anything you’ve said disputes that point. Just because we use the word “models” to describe theoretical calculations across many fields does not make the “models” somehow equivalent.

“What I mean by physical models is those that are meant to represent the physical world, as opposed to – for example – financial, or economic models. The crucial point about physical models (which does not – as far as I’m aware – apply to other types of models) is that they’re typically founded on fundamental conservation laws; the conservation of mass, momentum, and energy. ”

That is the claim.

Let’s break your claims down.

1. Physical models represent the physical world… so do economic and financial models.
There is nothing supernatural about economics, nothing non-physical.
First mistake: a false dichotomy.
2. Physical models are based on conservation laws. Economic and financial models are not.

a) This is a hard claim to prove, so you pleaded ignorance.
b) Showing conservation laws in economics probably undoes your argument.

Physical models represent the physical world…. so do economic and financial models

I disagree. Economic and financial models certainly represent aspects of the physical world, but not in a way that is independent of our own choices and decisions. If you don’t like the term “physical” then change it to “systems where – given the initial assumptions – the outcome does not depend on societal choices and decisions”.

The field is called
ECONOPHYSICS

Okay, but that there is a field that does use conservation laws, doesn’t really change my point. I’ll concede that there are other types of models that do use conservation laws, but this is not universally true. I wasn’t really trying to dismiss economic/financial modelling, I was really trying to distinguish between those models that are constrained by fundamental conservation laws, and those that are not.

you make a claim about physical models being harder to “tune” to a desired outcome.

This is an empirical claim. Settled by empirical means, not blog comments.

I agree that I haven’t shown this to be true. However, I do think that the underlying conservation laws make it harder to arbitrarily tune models that aim to represent physical systems. Call it an opinion, if you want, but it’s not entirely uninformed.

I’ll expand on this a bit. Something that you regularly encounter as an active physicist is other physicists who may have no actual expertise in computational modelling, but who can see an issue with a model purely by doing some kind of basic sanity check – does it conserve energy, for example. This is one reason why I think it’s harder to arbitrarily tune these types of models; you just need an understanding of basic physics to see if a model result is plausible or not, and you don’t necessarily need to know all that much about the actual workings of the model.
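A hypothetical version of such a sanity check (my illustration, not from the discussion): integrate a simple Kepler orbit with a leapfrog scheme and confirm that the total energy drift stays small. A reviewer needs no knowledge of the code's internals; the conserved quantity alone flags whether the result is plausible.

```python
import numpy as np

def total_energy(r, v):
    # Kinetic plus gravitational potential energy, in units where GM = 1
    return 0.5 * np.dot(v, v) - 1.0 / np.linalg.norm(r)

r = np.array([1.0, 0.0])   # start on a circular orbit
v = np.array([0.0, 1.0])
dt = 0.01
E0 = total_energy(r, v)

for _ in range(10000):
    a = -r / np.linalg.norm(r) ** 3
    v_half = v + 0.5 * dt * a   # kick
    r = r + dt * v_half         # drift
    a = -r / np.linalg.norm(r) ** 3
    v = v_half + 0.5 * dt * a   # kick

drift = abs(total_energy(r, v) - E0)
print(drift)  # small compared with |E0| = 0.5, so the run passes the check
```

A model that failed this kind of check would be suspect no matter how well it matched some target quantity, which is precisely the constraint that is missing when there is no conserved quantity to check.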

Actually, I’ll expand on this too 🙂 That there is a field that uses physical concepts in their modelling, doesn’t necessarily mean that these underlying concepts are fundamentally true. In modelling physical systems, energy is conserved. We don’t decide to conserve energy; we have to conserve energy. Applying a similar conservation law to economic modelling doesn’t immediately mean that that law is universally true.

Having created both physical models and financial models, I definitely see a difference. I agree with Mike’s general point, which can be summarized by the famous quote: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” However, I disagree with: “This risk of a model failure applies EQUALLY to models based on physical processes or social ones.” Risk of failure? Yes. Equally? No. The reasons are precisely those that ATTP laid out.
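The elephant quote can be demonstrated in a few lines (my own illustration, not anything from the thread): give a polynomial as many free parameters as there are data points and it will "fit" pure noise perfectly, which says nothing about predictive skill.

```python
import numpy as np

# Fit an 8-parameter polynomial to 8 points of pure noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = rng.normal(size=8)                    # pure noise: no signal to find
coeffs = np.polyfit(x, y, 7)              # 8 free parameters, 8 points
residual = np.max(np.abs(np.polyval(coeffs, x) - y))
print(residual)                           # essentially zero: a "perfect" fit
```

With enough free parameters any dataset can be matched exactly; conservation laws matter because they remove exactly this kind of freedom.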

In economic and financial models, there are few constraints. Ask anybody who made an NPV model how they determined their growth rate or their WACC, factors which greatly affect the result. You won’t get a pretty answer. Most of what I learned in creating NPV models came during the act of building them, not from analyzing the results. Macroeconomic models, of which I have less but some experience, have constraints, but nothing close to physical models. On the extreme end of physical models are quantum mechanics models, where you have variables constrained to discrete values – there is nothing comparable in the economic world, at least that I can think of. I remember working on a QM model with 15 equations and 15 constraints, and needing a Monte Carlo simulation to work for quite some time to find just one solution that satisfied the equations (albeit these were older, slower computers).

Depending on the remaining degrees of freedom, it may be easy or hard to find many solutions. The historical climate record should be enough to constrain the models, I would think, and thus the future predictions. I’ve never worked on a climate model.

Econophysics – can I expand on this red herring, too? ATTP nailed it again with “That there is a field that uses physical concepts in their modelling, doesn’t necessarily mean that these underlying concepts are fundamentally true.” I worked on the Black–Scholes equation for option pricing, which is used as an example in the Wiki. It is based on a few assumptions that are not necessarily true (not as true as energy conservation, for example), such as future volatility matching historical volatility. The B-S equation (and similar equations to price derivatives) cannot handle drastic changes in the macro-market, such as a market crash. Nobel-prize-winning Scholes was on the board of Long Term Capital Management – ask him how the crash of 1998 worked out for them. By contrast, great movements in energy caused by natural climate variability must be constrained by energy conservation among other laws.
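For reference, the Black–Scholes call-price formula mentioned above fits in a few lines (standard textbook form; the parameter values below are my own made-up inputs). The fragile input is the volatility sigma, which the model treats as known and constant – exactly the assumption that fails in a crash:

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call, assuming constant volatility."""
    d1 = (log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Same option, two different guesses about future volatility:
c_low = bs_call(100.0, 100.0, 1.0, 0.05, 0.2)
c_high = bs_call(100.0, 100.0, 1.0, 0.05, 0.4)
print(c_low, c_high)
```

Nothing in the model constrains which sigma is "right"; doubling it raises the price from roughly 10.5 to roughly 18, with no conservation law to flag either answer as implausible.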

“Actually, I’ll expand on this too 🙂 That there is a field that uses physical concepts in their modelling, doesn’t necessarily mean that these underlying concepts are fundamentally true. In modelling physical systems, energy is conserved. We don’t decide to conserve energy; we have to conserve energy. Applying a similar conservation law to economic modelling doesn’t immediately mean that that law is universally true.”

No one here who works in the sciences has even done the FIRST thing required.

Actually test the proposition.

We have personal anecdotes… “I’ve worked in both.”
We have claims made with zero expertise.

I’ve worked with both physical models and financial ones.

the physical ones were way easier to tweak because you had constraints.

the physical ones were way easier to tweak because you had constraints.

I’m not following you.

one more anecdote.

It is just a blog.

However, I think you’re somewhat missing the point of what I’m getting at. This was really a response to a suggestion that seems quite common. Someone has experience of a form of modelling in which – according to their experience – it is quite easy to engineer a desired result. They then extrapolate that to claim that this is true of models in general. Now, I also have experience of modelling. I don’t think that this is true for the models with which I have experience. I’m, therefore, disputing the suggestion that simply because some models are easily engineered to give a desired result, this is true for all things that we call models. So, at best we’re all wrong and can say nothing about the relative merits of models from different disciplines. Alternatively, if people from one discipline regard their models as easily tunable, and those from another do not regard theirs as easily tunable, maybe this is telling us something.

No one here who works in the sciences has even done the FIRST thing required.

Actually test the proposition.

As good as this would be, I’m not sure how one would do this. I still don’t think that this is required to point out that some models are constrained by universally accepted conservation laws, and others are not.

In hydrogeology they are generally called numerical models as opposed to physical models (sand-filled boxes), electric analog models (circuit boards), arithmetic models (back of envelope), etc. A conceptual model is usually the first step, where you compile and digest all of your data to set your constants, parameters, boundary conditions, calibration and verification targets, prediction scenarios, etc. Next, one might run some lower-dimension arithmetic models to verify the conceptual model is reasonable. Once a numerical model is constructed, you take it on a shakedown cruise, because there are always obvious input and assumption problems that need to be fixed. Then the model is “calibrated” against a period of historical data by adjusting various factors within “reasonable” limits. Once calibrated, a verification step consists of comparing the model output to another period of historical data. Once that is done, it is used to make predictions of potential future scenarios like no action, more pumping, less pumping, big floods, big droughts, mitigation measures, etc.

The bottom line is that numerical models can be tweaked and tuned six ways from Sunday. I have no idea about financial models, but I bet the same is true. The limit to tweaking and tuning is human imagination, not the subject matter.

In my experience with models, they are very useful in spotlighting what you don’t know, thereby driving a new field data collection focus. The problem is that human nature and societal pressures can pull modellers into over-tuning to prove that the model reproduces nature well and present a positive result. Bosses and clients never want to hear that you need to drill more holes and burn more samples.

Isn’t it just a matter of the number of unknown variables in a model?

In a simple model – where constrained physical values like weight, gravity, tensile strength, etc. are all well known – no one would argue with the excellent point that ATTP is trying to make….

In an economic or social model, however – the constraints are fuzzier at best, so the results are less predictable – which is why advertisers, pollsters, economists and others struggle so mightily to duplicate Asimov’s “psychohistory” premise…..

But that’s not what we’re talking about here.

Assume an engineering model has five “known” physical constraints. Can you model it accurately? Probably….

Assume an economic model has five “predicted” behavioral and financial constraints. That model is more suspect, and I believe that is the point that ATTP is trying to make. If so – I agree.

But assume a climate model has five “known” physical constraints and five “unknown” physical constraints. My guess – and this is probably subject to a math analysis that is beyond my ability to do – is that the margin of error in this situation is probably nearly as great as for the purely economic model.

As a businessman, engineer, and sometimes marketing/advertising consultant – I am ACUTELY aware of how easy it is to get 9 out of 10 things right, but be totally hosed by the 10th factor that you did not account for properly ……

This leads me to be “skeptical” of the models from climate scientists who believe they have everything figured out, especially when their predictions over the past 10-15 years don’t seem to be as accurate as they hoped…..

This leads me to be “skeptical” of the models from climate scientists who believe they have everything figured out, especially when their predictions over the past 10-15 years don’t seem to be as accurate as they hoped…..

They’ve never claimed to believe that they have everything figured out. Also, until recently climate models were known to do poorly on decadal timescales. They may have hoped for better, but I doubt they’re that surprised that it hasn’t quite worked out that way. Also, an apples-to-apples comparison and updated forcings already improve the comparison.

Also, I’m not claiming that physical models are always right, or that they can’t be tuned. I’m suggesting that producing any kind of desired result is more difficult when your model is based on fundamental conservation laws, than when it is not.

In hydrogeology they are generally called numerical models as opposed to physical models

Well, I was trying to distinguish between purely numerical models and those that are founded on the fundamental physical conservation laws.

The bottom line is that numerical models can be tweaked and tuned six ways from Sunday. I have no idea about financial models, but I bet the same is true. The limit to tweaking and tuning is human imagination, not the subject matter.

My argument is that this isn’t true if you’re modelling a physical system using something like the Navier–Stokes equations. You can certainly do some tuning, but there are limits.

ATTP:
I get what you meant and you asked for another term. Perhaps “mathematical physics model”, because “physical model” in the US (that’s the only experience I can speak to) typically means a petri dish, bucket chemistry, sand box, wind tunnel, etc.

I agree that if you are modelling a sphere in a uniform incompressible flow field with a low Reynolds number, there are definite limits to tweaking these types of “tinker toy” (Mickey Mouse?) simulations. However, since this is a climate-focused blog, I assumed you were commenting about real-world models with highly uncertain/complex boundary conditions, uncertain/complex parameters, multi-phase flow, turbulence, transient conditions of heat, chemistry and biology, long-term and short-term internal variance, planetary to particulate scales, etc. I know that geological fluid-mechanical and heuristic models can be very highly tuned, and they are several orders of magnitude less complex and uncertain and several orders of magnitude more constrained than GCMs. Are there limits to modifying GCMs? If so, what are they (besides computing power)?

I get what you meant and you asked for another term. Perhaps “mathematical physics model”, because “physical model” in the US (that’s the only experience I can speak to) typically means a petri dish, bucket chemistry, sand box, wind tunnel, etc.

Ahh, sorry. That might explain why so many seem to find my use of “physical model” a bit odd. Useful to know.

The latter part of your comment is a fair point. These are complex models and I have rather glossed over these complexities. I would argue, however, that just because there is complexity doesn’t necessarily mean that it’s very tuneable. However, if you start adding chemistry and biology then my analogy would start to break down.

Are there limits to modifying GCMs? If so, what are they (besides computing power)?

I don’t really have a good answer to this. I suspect it depends somewhat on what scales you mean. My suspicion would be that it’s quite hard to modify globally averaged results substantially, but that given the inherent complexity and non-linearities, it may well be that you can substantially influence regional effects in GCMs.

Whatever quibbles might be raised, there is a fundamental difference between climate and financial models in the underlying causative agency: the value or quantity of primary interest.

In climate models energy is conserved, and this determines the value of the temperature changes. Those temperature changes and the associated energy flows can be measured as objective physical values, and the conservation laws constrain the results. The number of joules, or the temperature rise, are not arbitrary values.

In contrast, financial models are primarily concerned with monetary value and flows of wealth. These are arbitrary social values. As recent (and historical) events demonstrate, money can be created and destroyed. Unlike energy, money flows are not unidirectional and irreversible. In fact, the predominant direction of flow for financial value is in direct contradiction with the comparable constraints on the flows of energy.

It is this root difference in the nature of energy and money that makes climate and financial models different at the epistemological level.

It may be possible to model financial systems with an a priori assumption that financial value is a conserved quantity, and place constraints on the nature of its movement through the system, but those are arbitrary impositions contradicted by observations. A mere mimicry of physical models that HAVE to conform to energy conservation laws to be legitimate. Because the constraints on physical models are ineluctable, they attain a greater degree of epistemic validity.

“We’ve arranged a global civilization in which most crucial elements profoundly depend on science and technology. We have also arranged things so that almost no one understands science and technology. This is a prescription for disaster. We might get away with it for a while, but sooner or later this combustible mixture of ignorance and power is going to blow up in our faces.”
― Carl Sagan

“The strain of anti-intellectualism has been a constant thread winding its way through our political and cultural life, nurtured by the false notion that democracy means that ‘my ignorance is just as good as your knowledge.’” ― Isaac Asimov

No, I don’t believe this is true in the sense that ATTP wrote. He gave examples of equations from physics that are used in physical models; there are virtually no similar equations in economics that *can* be used. In fact, one need only look at the current state of the world economy – especially Greece and the plans put forward for it just 5 years ago – to see how poorly tied to reality economic models are.

For instance, is there a universally accepted equation that describes the effects of tax cuts on GDP? Personal income? Budget deficits? Income equality? No. Virtually every economic model will be based on assumptions specific to the model itself.

Moreover, the general assumptions (Real Business Cycles, Rational Actors, the GOP version of dynamic scoring, etc., etc) made by many economic modelers have been shown to be at odds with reality for decades – but that doesn’t stop the modelers from using them as if *nothing* was wrong with them. These models are based on *idealized* behaviors that are never found in the real world. You can call them supernatural, unnatural, or whatever you like – but they don’t model the real physical world.

Paul Romer has recently been discussing economic models and their modelers. Here’s one excerpt: “One of the issues that I raised in a conversation concerned the support (in the mathematical sense) of the distribution of productivity in the Lucas-Moll model. Their assumption 1 states that at time 0, the support for this distribution is an unbounded interval of the form [x,infinity). In response to objections of the general form “this assumption is unacceptable because it means that everything that people will ever know is already known by some person at time 0,” Lucas and Moll present a “bounded” model in which the support for the distribution of productivity at time zero is [x,B]. Using words, they claim that this bounded model leads to essentially the same conclusions as the model based on assumption 1. (I disagree, but let’s stipulate for now that this verbal claim is correct.) I observed that Lucas and Moll do not give the reader any verbal warning that because of other assumptions that they make, the support for the distribution of productivity jumps discontinuously back to [x,infinity) at all dates t>0 so it is a bounded model in only the most tenuous sense.”

“… in only the most tenuous sense.” — I.e., not in any real-world, physical sense. This is *not* a parameter that has been ‘tuned’ to make the model better fit observations; it’s simply an assumption used because it gives a result the modeler likes. Romer makes the argument that this type of model is making economics an adversarial endeavor as opposed to a scientific one.

Paul Krugman has touched on the same issue with economic models many times over the years. He wrote in June of this year, “…Lucas’s attack on Romer rested in part on the claim that government spending on a new bridge would lead consumers, anticipating future taxes, to offset it one for one with cuts in their own spending; this is completely wrong if the spending is temporary.

But aside from exposing the intellectual decline and fall of the Chicago School, is this the way we should go about modeling such things? Well, yes, sometimes, because rigorous intertemporal thinking, even if empirically ungrounded, can be useful to focus one’s thoughts. But as a way to think about the reality of spending decisions, no. Ordinary households — and that’s who makes consumption decisions — have no idea what the government is spending, whether it is temporary or permanent…”

Again, models that have no bearing on reality. Useful in an academic sense – but not as a representation of the physical world.

There doesn’t exist an economic model more complicated than basic IS/LM that can accurately reflect reality.

I’m particularly atrocious at brevity, but even allowing for the fact that others are v good at being succinct, I still don’t understand how anything complex like the subject at hand can be discussed properly on this ‘Twitter’ of which you speak. (sorry to be such a Luddite).

Kevin O’Neill: “Again, models that have no bearing on reality.” The thought has occurred to me that there is an urgent need for much better computer models of economic systems, and that this is not an easy problem to solve. I realise that there are alternatives to neo-classical models, but I’m not sure how well-developed and useful any of these are yet.

He talks a lot about the similarities and differences between physics and economics — well, finance — as well as their use of theories versus models. Here is a segment that ties in with your post. The lead up starts around 9:30 on the audio.

“[S]tock prices or the returns on stock prices behave like smoke diffusing. And there’s something similar about them, but it’s not an accurate description in the way that, say, Newton’s Laws attempt to be an accurate description. It’s really based on an analogy to something you do understand, which is smoke diffusing, and saying maybe stock prices behave a lot like that.”
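The diffusion analogy in that quote is easy to sketch: treat log-returns as an unbiased random walk, and the spread of outcomes grows like the square root of time, just as diffusing smoke spreads. Nothing forces real prices to obey this; it is only an analogy, which is the quote’s point.

```python
# A sketch of the diffusion analogy: model daily log-returns as an unbiased
# random walk and check that the spread of outcomes grows like sqrt(time),
# the signature of diffusion. An analogy, not a law.
import random
import statistics

def simulate_log_prices(n_paths, n_steps, sigma, seed=0):
    """Final log-prices of independent random-walk paths."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        log_p = 0.0
        for _ in range(n_steps):
            log_p += rng.gauss(0.0, sigma)   # one day's log-return
        finals.append(log_p)
    return finals

sd_short = statistics.stdev(simulate_log_prices(2000, 25, 0.01))
sd_long = statistics.stdev(simulate_log_prices(2000, 100, 0.01, seed=1))
print(round(sd_long / sd_short, 2))  # roughly sqrt(100/25) = 2
```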

“The evidence suggests that truer equations of fluid dynamics can be found in a little-known, relatively unheralded theory developed by the Dutch mathematician and physicist Diederik Korteweg in the early 1900s. And yet, for some gases, even the Korteweg equations fall short, and there is no fluid picture at all.

“Navier-Stokes makes very good predictions for the air in the room,” said Slemrod, who presented the evidence last month in the journal Mathematical Modelling of Natural Phenomena. But at high altitudes, and in other near-vacuum situations, “the equations become less and less accurate.”

Regarding economics, I think one important and basic principle is that when two entities trade, or engage in an economic transaction, both win. And that is why they do it (at least insofar as they are homo economicus…), so economies tend to be driven by imbalances, not conservation.

I’m not sure that the physical constraints matter that much. Many important aspects of climate models are parameterised and not based purely on physics. Although perhaps physical constraints are why warming is always larger than the hypothetical zero-feedback situation.

I’d say the fact that modelling attempts to predict the choices of individual humans is one key issue.

And another key issue is how the models are tested and validated. Climate models mostly go through a barrage of tests against the real world, and exist in a reasonably open peer reviewed framework. Experts who know just as much as the modellers are free to criticise, replicate and possibly improve on a particular model. Many financial models are cooked up by an individual analyst in an office, and are tested by the analyst’s boss, and a few key stakeholders who for the most part do not have the same expertise as the modeller. A lot of the validation does not depend on scientific analysis and replicability, but on political power struggles based on how the model output might reflect or impact on a specific stakeholder. And whether the results pass a ‘gut feel’ test for the critical decision maker. This is my experience with financial modelling, within a small number of entities. I couldn’t say for sure whether others in financial modelling have experienced a decision making process with more scientific validation.

Parameterisation exists in climate models. I believe that climate models can be tuned to some extent, and the question is how far? Can the amount of tuning be meaningfully compared to the amount of tuning that can be done in a finance model? If a financial modeller produced a model predicting that total company expenditure next year was going to be in a range between 1.5 billion and 4.5 billion dollars, they’d either be told to do the model again, or shown the door. If company expenditure last year was 2.8 billion, then the modeller could perhaps easily tune the model to 1.5 or 4.5, but either result would be unacceptable unless unusual circumstances existed. Would it be fair, or even meaningful, to then compare this to the canonical climate modelling range of 1.5 to 4.5 degrees C for climate sensitivity?

The key observation in my mind is that if climate models could be tuned to produce any answer, then why haven’t they been? Considering the vested interests involved, surely at least one person would have the combination of expertise and motivation to produce a climate model with a sensitivity of 0.5 if this was actually possible?

Take two scenarios:
1) From 2007 to 2009, US gov’t spending rose 25%, while at the same time revenues fell by over 20% and the Federal Reserve was printing money at unprecedented rates. What were the predicted effects on inflation, interest rates, and the price of government bonds?

2) In 1938 Guy Callendar proposed that temperatures over the previous 50 years could be explained by increased concentrations of atmospheric CO2. Extrapolating forward, how has Callendar’s CO2 hypothesis turned out?

Now, the difference between the accuracy of the predictions in these two scenarios is that Callendar was modeling a *physical* system. That system has some additional terms that can dampen or accentuate the results over short periods (volcanoes, natural variability), but those are also physical and can be included in the model to make it even more accurate. Whereas the economic scenario depends on lots of other variables, many of which are counter-intuitive, exceptions to general rules, or inherently difficult to predict (human reactions to events – both individual and collective).
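As a rough sketch of the kind of calculation involved – using the standard logarithmic CO2–temperature relation and an assumed, purely illustrative sensitivity of 2 °C per doubling, not Callendar’s own numbers:

```python
import math

# A sketch in the spirit of Callendar's calculation, using the standard
# logarithmic CO2-temperature relation. The sensitivity below is an
# illustrative assumption, not Callendar's own figure.

def warming_from_co2(co2_ppm, co2_ref_ppm=280.0, sensitivity_per_doubling=2.0):
    """Equilibrium warming for a CO2 level, relative to a reference level."""
    return sensitivity_per_doubling * math.log2(co2_ppm / co2_ref_ppm)

# Roughly 1900 -> 2015 CO2 levels (approximate observed values, in ppm).
print(round(warming_from_co2(400.0) - warming_from_co2(295.0), 2))  # 0.88
```

The inputs (CO2 levels, sensitivity) are physically measurable or constrainable quantities; there is no free knob that lets you choose the answer independently of them.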

To my knowledge, the only economists that correctly predicted the economic results were those who looked at very simplistic models. Basic Hicksian IS/LM – an economic tool that’s been around longer than Callendar’s 1938 paper and that many economists scoff at as being too simplistic to be of any use. Elegant math divorced from reality is more to their taste.
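For readers unfamiliar with it, the simplest textbook IS/LM calculation fits in a few lines (the coefficients below are purely illustrative, not a calibrated model):

```python
# Textbook linear IS-LM sketch with illustrative, made-up coefficients:
#   IS (goods market):  Y = c0 + c1*Y - b*r + G
#   LM (money market):  M = k*Y - h*r
# Solved jointly for output Y and the interest rate r.

def solve_is_lm(c0, c1, b, G, M, k, h):
    # Substitute r = (k*Y - M) / h from LM into IS, then solve for Y.
    Y = (c0 + G + b * M / h) / (1.0 - c1 + b * k / h)
    r = (k * Y - M) / h
    return Y, r

base_Y, base_r = solve_is_lm(c0=50, c1=0.6, b=20, G=100, M=60, k=0.2, h=10)
stim_Y, stim_r = solve_is_lm(c0=50, c1=0.6, b=20, G=120, M=60, k=0.2, h=10)
print(stim_Y > base_Y and stim_r > base_r)  # True: spending raises Y and r
```

Note there is no conservation law here: the coefficients are behavioural assumptions, which is exactly the contrast being drawn in this thread.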

I’d say leave economic models to the economists, financial models to financial experts and climate models to climate modelers. None of them can be tweaked to produced any answer you want without changing the assumptions or “what ifs” or making a nonsense of the model. Sometimes people think the assumptions are sound and sometimes not. (In climate models, some people accept models that show low climate sensitivity and some don’t.)

If economic models were useless, we’d not have had relatively successful monetary policies with inflation constrained within a few percentage points over decades. Bearing in mind that these models rely on assumptions and the feedback is human behaviour and human expectations. So when an economic model predicts something, people have some “faith” in the model and adjust their behaviour accordingly. Whatever the model predicts, people’s behaviour will counter the prediction to some extent. Which is often a good thing. (The human behaviour response adds something of an unknown, but not entirely unpredictable element to economic models.) In general the people managing monetary and fiscal policy rely on people modifying their behaviour to keep the economy in check (with interest rate signals, taxation changes etc used as behaviour modification tools).

Climate model projections are also affecting people’s behaviour, so that maybe, just maybe, we’ll veer off the RCP8.5 pathway (which is in part based on economic models) and shift down a pathway or two.

With the large complex coupled climate models (as opposed to simpler climate models), one of the main things that makes them more difficult to tweak to “get an answer you want” is the number of components they have, the number of people involved and the fact that they are tested on hindcasts.

To reply to some of what has been said so far with respect to economic modeling:

I think that it’s important to keep in mind the fact that not all economists are equal and not all economics is equal. That is, if we weed out all conservative economists and conservative economics (the only exception being those very few, narrowly defined, legitimate contributions made by some conservative economists like Friedman, outside of which is nothing but nonsense – and this includes Friedman’s nonsense outside of his few legitimate contributions), what we end up with is a collection of economists (like Paul Krugman and Joseph Stiglitz), and of economics, that actually has a pretty good track record in terms of projections.

People need to read what they have been saying over the past year, especially, and not take seriously the never-ending barrage of conservative misstatements as to what they actually say and do not say.

Here are some quotes from some of Krugman’s recent and very informative writings (note: cookies must be on to connect to New York Times documents):

“What am I talking about here? “Derp” is a term borrowed from the cartoon “South Park” that has achieved wide currency among people I talk to, because it’s useful shorthand for an all-too-obvious feature of the modern intellectual landscape: people who keep saying the same thing no matter how much evidence accumulates that it’s completely wrong.
…
I’ve already mentioned one telltale sign of derp: predictions that just keep being repeated no matter how wrong they’ve been in the past. Another sign is the never-changing policy prescription, like the assertion that slashing tax rates on the wealthy, which you advocate all the time, just so happens to also be the perfect response to a financial crisis nobody expected.

Yet another is a call for long-term responses to short-term events – for example, a permanent downsizing of government in response to a recession.”

“And as I have often argued, these past 6 or 7 years have in fact been a triumph for IS-LM. Those of us using IS-LM made predictions about the quiescence of interest rates and inflation that were ridiculed by many on the right, but have been completely borne out in practice. We also predicted much bigger adverse effects from austerity than usual because of the zero lower bound, and that has also come true.”

“It’s true that the Hicksian framework I usually use to explain the liquidity trap is both short-run and quasi-static, and you might worry that its conclusions won’t hold up when you take expectations about the future into account. In fact, I did worry about that way back when. My work on the liquidity trap began as an attempt to show that IS-LM was wrong, that once you thought in terms of forward-looking behavior in a model that dotted all the intertemporal eyes and crossed all the teas it would turn out that expanding the monetary base was always effective.
But what I found was that the liquidity trap was still very real in a stripped-down New Keynesian model. And the reason was that the proposition that an expansion in the monetary base always raises the equilibrium price level in proportion only actually applies to a permanent rise; if the monetary expansion is perceived as temporary, it will have no effect at the zero lower bound. Hence my call for the Bank of Japan to “credibly promise to be irresponsible” – to make the expansion of the base permanent, by committing to a relatively high inflation target. That was the main point of my 1998 paper!
…
And a few years after I published that paper, the BoJ put it to the test with an 80 percent rise in the monetary base that utterly failed to move inflation expectations. In general, Japanese experience gave us plenty of reason to realize that macroeconomics changes at the zero bound. So it’s still a puzzle that so many macroeconomists tried to apply non-liquidity-trap logic in 2009 – and just embarrassing that they’re still doing it.
…
In the end, while the post-2008 slump has gone on much longer than even I expected (thanks in part to terrible fiscal policy), and the downward stickiness of wages and prices has been more marked than I imagined, overall the model those of us who paid attention to Japan deployed has done pretty well – and it’s kind of shocking how few of those who got everything wrong are willing to learn from their failure and our success.”

Economics and finance are like physics and chemistry: related but different and separate (despite the interdisciplinary field of financial economics). Modelling traditions are different in the two fields, and heterogeneous within each. Economics and finance are many times bigger than climate science, and model diversity is correspondingly larger. The separation between applied and research models is more pronounced in economics and finance than it is in climate science.

General circulation models are not physical models as defined above. Indeed, GCMs have to break at least one fundamental equation – most break the law of conservation.

Conservation of what? Is this your mass argument again? They add CO2 to the atmosphere without reducing the mass of the planet? If so, I don’t think it is possible – given numerical accuracy – to account for this mass loss. If you mean something else, feel free to elaborate.

I’m not quite sure what you’re getting at with your second comment, or why it’s relevant.

The separation between applied and research models is more pronounced in economics and finance than it is climate science.

Again, not sure what your point is. One reason that there isn’t necessarily a large separation between applied and research models in the physical sciences is that the equations are well-defined and you don’t use a different formalism for applied models, compared to research models.

I noticed you were projecting again on Blair King’s post in which he makes up an awful lot of things, while claiming not to have done so. Of course, I get the impression – given your many corrections to your paper – that accuracy isn’t all that important to you.

For a physical model, the ideal is to start with basic, well established physics and come up with a model that reproduces the phenomena of interest. So a really good climate model ‘knows’ nothing about trade winds or ENSO or seasons… these features just appear as you run the model. I’m not sure that climate models are 100% there yet – every time you have a parameterisation you are ‘telling’ the model something that you’d rather have emerge. But they are close.

I don’t see anything like that happening in macroeconomic models – indeed the very fact that there is a split between microeconomics, which does look at the economic effects of individual actions, and macroeconomics which deals with the big stuff – shows this.

“most break the law of conservation” ? Would you like to point at which one?

“Climate models are mathematical representations of the climate system, expressed as computer codes and run on powerful computers. One source of confidence in models comes from the fact that model fundamentals are based on established physical laws, such as conservation of mass, energy and momentum, along with a wealth of observations.” https://www.ipcc.ch/pdf/assessment-report/ar4/wg1/ar4-wg1-chapter8.pdf

Mike,
Richard will probably not respond (as that would be out of character) but I think he’s referring to the fact that GHGs are added to the atmosphere without their mass being removed from the planet. It could be something else, probably equally trivial.

Climate model projections are also affecting people’s behaviour, so that maybe, just maybe, we’ll veer off the RCP8.5 pathway (which is in part based on economic models) and shift down a pathway or two.

Well, yes, but – I would argue – the physical model is how our climate respond to a specified emission pathway. Of course, if we don’t follow that pathway, the response will be different, but the physics won’t be.

I think that if someone’s critique of GCMs is ‘They break the law of conservation’ you can safely ignore their opinions on the subject. I’d be embarrassed to make a howler like that, and I’m not an academic.

KeefeAndAmanda –

Yes, Simon Wren-Lewis ( http://mainlymacro.blogspot.co.uk/ ) at Oxford calls this ‘MediaMacro’ in which a story is told about the economy that is completely at odds with mainstream macroeconomics, but just happens to justify the standard conservative agenda.

“…this makes it rather hard to do advanced atmospheric chemistry in a GCM.”

rather reinforces ATTP’s point.

Chemistry is important in the climate system. Methane is converted to CO by the action of hydroxyl radicals in the troposphere and stratosphere, and this should be taken into account in order properly to account for the radiative effects of methane emissions. CO2 partitions into the oceans by dissolution and then reaction with water to form carbonic acid, bicarbonate and carbonate (on very long timescales CO2 is removed from the atmosphere by chemical weathering), etc.

But this chemistry isn’t “done” in a GCM – it can’t be done and obviously doesn’t need to be. The rate at which methane is converted to CO2 can be determined, and so the methane degradation “chemistry” is parametrized using a rather well-constrained equation (or set of equations) with appropriate rate constant(s). The partitioning of CO2 into oceans follows Henry’s Law in some form, and the chemical partitioning of hydrated CO2 among its charged dissolved species can be calculated using the Henderson-Hasselbalch equation. All of these parametrizations are very highly constrained, and so there is little room for “tuning” these aspects of atmospheric chemistry. Unlike those of economic models, these parametrizations are not somewhat subjective elements subject to the notions of the modeller about the effects of consumer confidence, interest rate variations, and stock market fluctuations.
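To show how constrained these parametrizations are, here is the Henry’s Law plus Henderson-Hasselbalch calculation in a few lines (rounded textbook constants for freshwater at about 25 °C, not values taken from any particular GCM):

```python
import math

# Sketch of the highly constrained carbonate "chemistry" described above,
# with rounded textbook constants (approximate, freshwater, ~25 C):
# Henry's law sets dissolved CO2 from its partial pressure, and a
# Henderson-Hasselbalch step sets the bicarbonate/CO2 ratio from pH.

K_HENRY = 0.034        # mol/(L*atm), CO2 solubility in water (approximate)
PKA1 = 6.35            # first apparent pKa of carbonic acid (approximate)

def dissolved_co2(p_co2_atm):
    """Henry's law: [CO2(aq)] = kH * pCO2."""
    return K_HENRY * p_co2_atm

def bicarbonate(p_co2_atm, ph):
    """Henderson-Hasselbalch: [HCO3-] = [CO2(aq)] * 10**(pH - pKa1)."""
    return dissolved_co2(p_co2_atm) * 10.0 ** (ph - PKA1)

co2_aq = dissolved_co2(400e-6)          # ~400 ppm atmosphere
hco3 = bicarbonate(400e-6, ph=8.1)      # a typical surface-water pH
print(hco3 > co2_aq)   # True: at pH 8.1 bicarbonate dominates
```

The only inputs are measured constants and observed conditions; a modeller has essentially no freedom to “tune” the outcome.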

Much of the thinking behind the view that it has been successful monetary policy which has kept inflation low in the last few decades is backed up by DSGE models in which one of the assumptions is that central bank policy largely controls the rate of inflation. I think this might actually be one of the times when it’s correct to say that the assumptions of the models give you the result that you want! If recent years have taught us anything, it’s that monetary policy is comparatively weak. Interest rates are on the floor, unconventional policies like QE are in play, and still the growth that is sought remains elusive. I think it’s more likely that fiscal and political forces are behind the lack of inflation. Things like de-indexation of payments, declining unionisation, and changes in trade policy could all plausibly be linked to lowered inflation.

Kevin,

IS/LM doesn’t really work, given that we live in a world of endogenous money, and loanable funds is not an accurate description of how the monetary system in fact operates. The people who seem to have been most right about the effects of government spending in the US are the endogenous money/MMT crowd. It’s also incorrect to call QE ‘money printing’, as it is in fact an asset swap program designed to help target an interest rate. This has been one of the big issues with a lot of economic models, in that they treat banks completely incorrectly and hence missed the cause of the crisis. This is probably all Milton Friedman’s fault, dating from when he wrote that idiotic nonsense about only judging models based on the accuracy of their predictions, not the correctness of their assumptions. In fact it’s probably fair to blame the dire state of economics today mostly on Friedman, although things look to perhaps be moving in the right direction these days.

And as for people saying that you have to prove economic models don’t conserve quantities: well, there are plenty of economic growth models which basically assume that technology is manna from heaven. That’s hardly realistic.

I don’t know if this is what Richard is referring to, but early GCMs used flux corrections (discussed here) which is a consequence of coupling different models together in which the surface fluxes may not be the same.

These were essentially empirical corrections that could not be justified on physical principles, and that consisted of arbitrary additions of surface fluxes of heat and salinity in order to prevent the drift of the simulated climate away from a realistic state.

But,

By the time of the TAR, however, the situation had evolved, and about half the coupled GCMs assessed in the TAR did not employ flux adjustments.

As a result, there has always been a strong disincentive to use flux corrections, and the vast majority of models used in the current round of the Intergovernmental Panel on Climate Change do not use them.

The equations that describe chemical reactions take mass conservation as their starting point. The equations inside a GCM do not conserve mass. If the chemistry is simple, a fudge factor will make this problem go away. Not so for complex chemistry.

I’m also slightly confused about your claim that they don’t conserve mass. If you’re referring to the removal of mass from the planet that provides the gravitational force, this is clearly irrelevant. If you’re referring to the density not changing when GHGs are added to the atmosphere, again this is a negligible effect. Precisely in what way do they not conserve mass?

The equations are one thing, when you code them and introduce parameterisations etc. you may lose some of the intended properties.

Earlier versions of ECHAM6 do not conserve energy, neither in the whole nor within the physics, and small departures from water conservation are also evident.
Analysis of the CMIP5 runs suggest that these issues persist with ECHAM6.

The equations are one thing, when you code them and introduce parameterisations etc. you may lose some of the intended properties.

Yes, I think I said something along those lines in the post. My point was more to do with the claim that one can engineer desired results with models which – I would argue – is not necessarily that easy if the models are founded on fundamental conservation laws. My claim isn’t that the models are correct, or that the parametrizations don’t have any effect, or can’t be tuned at all.

I’ll have a look at that link, thanks. However, a point I would make is that non-conservation still tells you something (i.e., it’s diagnostic).

Richard,
So your point is that the models don’t do everything perfectly, or have some problems? Sure, I don’t think I said otherwise. It would be extremely surprising if there weren’t issues and if they couldn’t be improved. That doesn’t mean, however, that they’re not founded on fundamental conservation laws or that they can be tuned to produce any desired result.

I’ll expand a bit on Oliver’s point. If you read the next section of Stevens et al., it says

Since the CMIP5 runs, an attempt has been made to identify the origin of departures from mass and thermal energy conservation within the framework of the ECHAM6 single column model. A variety of model errors relating to the inconsistent use of specific heats, how condensate was passed between the convection and cloud schemes, or how vertical diffusion was represented over inhomogeneous surfaces have been identified and corrected.

This is one of the points. Non-conservation tells you something – there are model errors, which you can then aim to correct. Just to clarify the point I’m trying to make in this post; it’s not that models of physical systems can’t be wrong, or won’t have errors; it’s simply that being based on fundamental conservation laws means that it’s unlikely that you can tune them to produce any desired result (assuming you’re being honest) and that these conservation laws allow you to assess the validity of the model and identify errors.
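The diagnostic use of conservation can be shown with a toy model: in a closed 1-D diffusion scheme the total “mass” should stay constant, so a drifting total flags a coding error. The bug below is deliberately planted for illustration, not taken from any real model.

```python
# Toy illustration of conservation as a diagnostic: total mass in a closed
# 1-D diffusion model should be constant, so a drifting total flags an
# error. The "buggy" branch is a deliberately planted mistake.

def diffuse(cells, coeff, buggy=False):
    """One explicit diffusion step on a closed (no-flux) domain."""
    new = list(cells)
    for i in range(len(cells) - 1):
        f = coeff * (cells[i] - cells[i + 1])       # flux from cell i to i+1
        new[i] -= f
        new[i + 1] += f * (0.99 if buggy else 1.0)  # buggy: gain != loss
    return new

state = [4.0, 1.0, 0.0, 2.0]
total0 = sum(state)
for _ in range(100):
    state = diffuse(state, 0.2)
print(abs(sum(state) - total0) < 1e-12)   # True: conserved to round-off

leaky = [4.0, 1.0, 0.0, 2.0]
for _ in range(100):
    leaky = diffuse(leaky, 0.2, buggy=True)
print(abs(sum(leaky) - total0) > 1e-4)    # True: the planted bug leaks mass
```

Exactly as in the ECHAM6 example: the budget check doesn’t fix anything by itself, but it tells you an error exists and roughly how big it is.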

No, Wotts, that’s not my point at all. GCMs have a remote relationship with physics. The models are full of fudge and tuning factors. The fact that they roughly represent observations may, for all we know, reflect skill in model calibration rather than skill in forecasting.

No, Wotts, that’s not my point at all. GCMs have a remote relationship with physics. The models are full of fudge and tuning factors.

And I think this is nonsense. You really shouldn’t get your information from climate denier blogs. In a sense, you may well be illustrating my point. Just because the models with which you’re familiar are full of fudge and tuning factors, doesn’t mean this is true for all models.

Post-2007/8 financial crisis, economists that relied upon analysis using IS/LM have largely been proven correct. Where is the group of MMT economists that had equal predictive success over this period?

Here’s Randal Wray in 2011 – ”I expect resumption of the financial crisis any day…” Now, that was three years *after* the meltdown. In Q1 2011 the empirical data shows the number of ‘problem banks’ was peaking. Is that what influenced Wray? I don’t know, but he’s still waiting.

The statement “GCMs have a remote relationship with physics” is clearly nonsense.

Hmmm….

This is true wrt sub-gridscale phenomena (clouds, rain, etc.), which are much finer scale than model resolution, and thus not represented by the resolved physics.

Also, there’s a lot of instability in model results. That’s why model runs make good spaghetti graphs.

That doesn’t come from the physics so much as the numerical methods attempting to solve the physical equations. How much of that is a reflection of natural variability and how much is artificial variance introduced by lines of FORTRAN?

Turbulent Eddie: Nice try. It’s not the FORTRAN. Climate models are the best and most thoroughly reviewed software out there. There is nothing like it.

A spaghetti graph from a site run by a man with a high school diploma. Is that really credible in your mind? Really?

So, as you well know, multi-decadal declines as well as monstrous inclines are in all the data. In fact you are comparing ensemble averages of multiple runs to actual temperatures. By definition they should be different. A huge part of the ‘running cool’ mythos is caused by 1998’s weather, namely an El Niño 2C hotter in one year.

For a rigorous comparison you should probably attempt to remove the effects of weather and other short term effects from your comparison.

Or are you really one of those numpties that believes climate scientists predict volcanoes, solar cycles and the weather? I kinda want an answer to that. I mean, I think we need to know whether you think climate scientists can replace NASA, the USGS, and of course NOAA ENSO Watch. I mean… you actually think all that? Truly? What about the Tooth Fairy?

Notice that TE is being typically deceptive with graphs. To obtain a visually impressive spread he shows all the RCPs – from 2.6 to 8.5. Given the very wide spread of forcings this represents, you would expect a wide range of model results. But TE is trying to pretend that this wide spread is model uncertainty. I don’t know about you, but I find this sort of behaviour irritating.

The notion that GCMs can’t conserve mass or can’t do chemistry is nonsense.

There are certainly dynamical cores that exist that aren’t mass conservative (and they work fine in some applications – NWP for instance), but there are many others that are. GISS ModelE dynamical schemes are mass conservative and have been used to do chemistry for more than a decade (see Shindell et al, 2013 http://pubs.giss.nasa.gov/abs/sh05600u.html for a recent description). Other climate models (NCAR CESM, HadGEM etc.) also manage this without a problem.
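The reason flux-form (“finite volume”) schemes conserve mass by construction is worth spelling out: every unit of mass leaving one cell enters its neighbour, so the total cannot change. A toy 1-D upwind scheme shows this – it is illustrative only, not the dynamical core of ModelE or any other real GCM.

```python
# Sketch of why flux-form schemes conserve mass by construction: mass
# leaving one cell enters its neighbour, so the total is unchanged no
# matter how crude the scheme. Toy 1-D upwind advection, periodic domain.

def advect(density, courant):
    """One upwind step for a rightward wind (requires 0 <= courant <= 1)."""
    n = len(density)
    # Mass entering each cell through its left face this step.
    flux = [courant * density[i - 1] for i in range(n)]
    # Each cell gains through its left face and loses through its right.
    return [density[i] + flux[i] - flux[(i + 1) % n] for i in range(n)]

rho = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
total0 = sum(rho)
for _ in range(50):
    rho = advect(rho, 0.4)
print(abs(sum(rho) - total0) < 1e-12)  # True: conserved to round-off
```

The scheme can still be inaccurate (upwind advection is quite diffusive), but it cannot create or destroy mass – accuracy and conservation are separate properties.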

This is an example of an old pattern. GCMs obviously have developed from simpler concepts and in the early days had many constraints. For any topic you can think of, you can generally go back to the first time people implemented it and find it was done imperfectly. Sometimes workarounds were found to deal with the problem (flux corrections, convective adjustment etc.), but after years of work, people found more fundamental ways of dealing with these issues and many of these fixes were discarded. Claims that models ‘don’t include’ water vapour, or can’t do clouds, or chemistry or sea level or whatever are generally bunk.

Yet we still find people (like Tol) confidently asserting, as fundamental problems, points that (even when they were issues) were merely temporary fudges or approximations.

What strikes me as more shocking is that they still insist that the models ‘can’t’ do something, even after they have been pointed to examples of models doing exactly what it is claimed is impossible.

It is interesting how economists who are obtuse in the extreme regarding the subject they claim to have expertise in (economics) veer towards certainty, unencumbered by doubt (thanks to undocumented 25-year-old chats with alleged climate modellers), when it comes to chemistry, physics, climate models, and who knows what else. Is there any limit to these renaissance intellects? I quake in awe … at their self-delusions and arrogance.

I didn’t say that you were resolving it, but subgrid processes are not, necessarily, independent of physics. How you implement sub-grid processes in a model is, typically, motivated by the physics of the process that you’re trying to implement at a sub-grid level.
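To make that concrete, here is a minimal sketch (in Python; the constant and field are purely illustrative) of a physics-motivated subgrid closure: a Smagorinsky-type eddy diffusivity. The coefficient `c_s` is a tuned parameter, but the functional form is dictated by the physics of turbulent mixing at the grid scale.

```python
import numpy as np

def smagorinsky_diffusivity(u, dx, c_s=0.17):
    """Smagorinsky-type subgrid eddy diffusivity (1-D sketch).

    The unresolved mixing is tied to the *resolved* strain rate |du/dx|:
    c_s is a tuned parameter, but the (c_s * dx)**2 * |S| form is motivated
    by the physics of the turbulent energy cascade.
    """
    strain = np.gradient(u, dx)               # resolved velocity gradient
    return (c_s * dx) ** 2 * np.abs(strain)   # K = (c_s * grid)^2 * |S|

# Toy resolved velocity field on a coarse grid
x = np.linspace(0.0, 1.0, 101)
u = np.sin(2.0 * np.pi * x)
K = smagorinsky_diffusivity(u, dx=x[1] - x[0])
# Diffusivity is largest where the resolved shear is strongest
print(K.max())
```

The point of the sketch: the parameter is tuned, but the shape of the scheme is not arbitrary; it is constrained by the physics being represented.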

ATTP – this is a long discussion thread, but going back to the moniker ‘physical model’, I wonder whether the key differentiator is that models like GCMs are characterized in several ways:

(1) Fundamental … from underlying proven physics like Navier-Stokes (which are in turn dependent on classical physics), with wide applications, as you note

(2) Bottom up … But equally they are used in a ‘bottom up’ way that enfolds the underlying physics in higher-level phenomena

(3) Emergent … This emergence of phenomena (such as the greater warming of the Arctic, and the warming of the lower atmosphere while the upper atmosphere cools) is key: it is ‘not obvious’ from the basic equations but is a very robust result, with no parametrisation required at the phenomenological level!

(4) Transparent (parameters) … As you say, where parametrisation is unavoidable (like ‘how fast will methane be released from warming tundra?’), ongoing research will provide an ongoing tightening of the bounds – it is informed parametrisation, and even ‘how fast will the economy of China decarbonise?’ can be estimated based on transparent evidence/observations, projections and plans.

(5) Validated (parameters) … All the physical parameters are ultimately linked to real-world phenomena (like the speed of methane release from carbon sinks), and these estimates are continually monitored and improved (so any setting of these parameters is not arbitrary, self-fulfilling guesswork, but transparent and justifiable).

mt: Oil companies do not have enormous technical talent at their disposal. I want to make that utterly clear to you. I already have one patent used for pipelines because they’ve been doing it wrong for 30 years. I’m about to get another for drilling… ’cause they’ve been doing it wrong for 45 years.

In fact if you look into what the industry does you’ll find a lot of ‘business’ papers which are long on bragging and short on details.

Tol believes GCMs can be tuned to give any desired result as that’s what economists do. And their success is splattered about for all to witness. Perhaps people in glass houses should stop modeling. It only exposes their nakedness.

mwgrant: The video does indeed reference what we know of all other software. Give it a watch.

Even if the code for the models looked like old garbage, it’s been viewed, reviewed, reused, tested, and verified over and over. There is no industry that does that. Not even NASA. I think the closest would be open source. But what makes climate models different is that they are being reused competitively. Meaning, the goal is to cooperatively compete for better results. This would be like companies and businesses sharing all their technical know-how directly with their competitors.

Yes. I appreciate the link and your memory is correct. But the claim in a TED talk really is just a repetition of anOilman’s comment. I guess it can be a basis for anOilman’s comment, but it really does not get to the basis of it. Even there it is only a slide, an assertion. [I am sure there is some reference, but not here.] While I have little doubt about the considerable effort expended on checking the climate models, comparative statements like those above and in the TED talk tell us little useful about the level of quality of the code(s). Hence such comments are subject to abuse and, far more important, to misunderstanding and misapplication by others.

I noticed that Easterbrook transitioned quickly to the important question of ‘is it good enough?’ That is something that must be determined by the context of the use of the model(s), rather than by bug-density comparisons with unrelated code.

Clearly the QA of the code is dependent on the circulation and the use in the community of practitioners. Microsoft users are a huge community, but their required expertise for the use of the products (in the context of product capabilities) is low. On the other hand, climate models (and other physical models) require more knowledge and expertise on the part of the user, but the user bases are much, much, much smaller. To me this makes comparison more suspect.

Also the reference in TED is to ‘bugs’. That is a different matter than errors in the conceptual models [as mentioned by Howard]. Code is only the implementation of a model; it is not the entire model.

BTW I did not notice the reference at the bottom of the slide earlier. If I can dig it up I want to look at it to see just what was done. Note that the comparison indicated in the slide is not with all other software. The genre that is missing (based on the slide) is physical models such as the USGS MODFLOW family of codes. These are used in both the government and commercial sectors, often in contexts where litigation and/or regulatory actions are in play. They are exceedingly well documented and tested by the authors, but even more QA is imposed on their specific applications. (And yes, they no doubt contain errors.)

Bottom line to me: it is one thing to indicate the level of effort put into the conceptual underpinning and the code, and it is another to state that it stands above all others. Just be realistic, that is all I ask.

mwgrant,
Sure, I don’t know if they are the most tested. Steve Easterbrook’s presentation seemed to suggest that they’re pretty good. MT’s link suggests they still have issues, which isn’t a great surprise.

Hmmm, my reading list just gets bigger. BTW it is interesting that the cited paper does not take up documentation. The root ‘document’ appears twice in the paper and in a very narrow context. Good extensive QA documentation is/can be a blessing or a burden. But in a dispute it is essential.

@-“In particular, they suffer from the problems described here:
[link]”

Climate models are judged on how informative they are, not on the elegance of their code.

I wonder if econometric models are ‘better’ in terms of conforming to a set of software design principles, i.e.:

F. Upstreaming, distribution, and community building:
In order to provide attractive alternatives to forking, maintainers
must be diligent to create a welcoming environment
for upstream contributions. The maintainers should nurture
a community that can review contributions, advise about
new development approaches, and test new features, with
recognition for all forms of contribution.

Models are a work in progress and always will be. The insinuation that they are so badly borked as to be actively misleading runs into trouble when the focus is widened to include observational and palaeoclimate data.

Richard Tol wrote:
“As I have said many times before, it is the exaggerated and poorly informed claims of environmentalists that make climate science such an easy target.”

Wait, I think you got it backwards: climate science and GCMs are not dependent on environmentalists’ claims. It’s the other way around; environmentalists tend to have a good basis for their worries. Unfortunately, more uncertain science and models probably mean more reasons to worry.

Turbulent Eddie and Richard Tol both seem to think that parameterization means it’s not physics.

By this argument none of fluid dynamics is physics. All physical laws governing a fluid’s behaviour are a statistical approximation of the resulting interactions of large numbers of atoms interacting a bit like billiard balls. Of course the statistical approximations can be extremely accurate, in part because the number of atoms in even a small parcel of air is so enormous that statistical variations cancel out to a very high degree.
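This cancellation can be checked numerically: the relative fluctuation of a bulk average over N molecules shrinks like 1/sqrt(N). A rough sketch with toy numbers (the mean and spread of the ‘speeds’ are illustrative, not real molecular data):

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_fluctuation(n, trials=500):
    """Relative scatter of the mean 'molecular speed' averaged over n molecules."""
    # Toy speeds: mean 500 m/s, spread 100 m/s (illustrative values only)
    samples = rng.normal(loc=500.0, scale=100.0, size=(trials, n))
    means = samples.mean(axis=1)
    return means.std() / means.mean()

# The scatter of the bulk average falls off roughly as 1/sqrt(n):
# multiply n by 100 and the relative fluctuation drops by about a factor of 10
for n in (100, 10_000):
    print(n, relative_fluctuation(n))
```

For a real parcel of air, with N of order 10^19 molecules per cubic centimetre, the same scaling makes the bulk quantities effectively deterministic.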

Michael Hauber – good points, though they tend to corroborate Tol’s comment: “GCMs have a remote relationship with physics”.

Here’s an interesting quote from Wiki:

As model resolution increases, errors associated with moist convective processes are increased as assumptions which are statistically valid for larger grid boxes become questionable once the grid boxes shrink in scale towards the size of the convection itself. At resolutions greater than T639, which has a grid box dimension of about 30 kilometres (19 mi),[11] the Arakawa-Schubert convective scheme produces minimal convective precipitation, making most precipitation unrealistically stratiform in nature.[12]

mw: Steve Easterbrook, Michael Tobis and William Connolley have written a fair amount on issues such as version control, bit reproducibility, and other software engineering issues. There have also been conferences and sessions at the Fall AGU.

The TL;DR version is that this was another bloody flag issue, but some of the discussion was interesting and may have led to some marginal improvements.

If you want to say that based on and constrained to the experience you indicate Easterbrook appears to make a good argument in regard to the implementations of the codes that is understandable as your opinion/assessment. But of course I commented on anOilman’s comment, “…Climate models are the best and most thoroughly reviewed software out there. There is nothing like it.” That is a very different statement.

Your dots do not connect with regard to anOilman’s comment. Sorry. Now if you can come back and say that you failed to mention it but indeed you have also spent hundreds of hours reviewing the climate codes and attendant QA documentation, all of the codes and QA documentation established for agency environmental transport codes… hmmm, all the performance assessment modeling done for WIPP, all of the performance assessment modeling done for Yucca Mountain, all of the NRC codes, etc.–all of this and much more in depth–well then you might be able to take a stab at the best of the best. Again I do not see that experience in your background. But why fool with such a useless comparative statement when the question is,”are the codes good enough for the intended use?” Exaggeration encourages suspicion.

“The TL;DR version is that this was another bloody flag issue, but some of the discussion was interesting and may have led to some marginal improvements.”

Again, the important question is: are they good enough for the intended application? Frankly, in my opinion, the answer is ‘yes’. The big problem lies with what the intended use is. For me–and the basis for ‘yes’–is that too much emphasis has been put on model results for the policy decision process. That is they are about as useful as a hockey stick, i.e., they paint a picture through a blurry lens and due to time constraints that is what we have to go on.

mt: Issues in software are well known. Usually the call for throwing out old software and replacing it with new ’ware is made by newbs. You can look up a lot about this. It’s an immense schedule impact, and more often than not it introduces a lot more bugs. That is what your article recommends. Industry sentiments would concur, but at least we have the earlier Mozilla flavor of Netscape to use: http://www.joelonsoftware.com/articles/fog0000000027.html

The other thing is that you haven’t substantiated that there is a problem. So, it is illogical to think that you can miraculously replace old code with new code and achieve a better result. Different, yes. Better? Maybe. (How much better was BEST again?… )

It is also a misguided belief to think you can just toss money and hire experts to solve a problem. Life doesn’t work that way. Take it from me, I have to dodge the money that gets tossed. What happened to Shell’s submersible well cap… oh yeah, crushed like a beer can. Sign me up for that!

mwgrant: I was referring to Easterbrook earlier. It’s been a while since I saw that. It’s also been a tad longer since I had a subscription to a software engineering management journal.

Defects per unit of code is indeed the metric used to measure quality. There are many grades of defect, from simple implementation to design. Different defects take different amounts of time to identify and fix. Design defects are the most serious and have the greatest long term effect on a project.
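The metric anOilman describes can be made concrete in a few lines. A trivial sketch; the severity weights here are purely illustrative, not taken from any particular standard:

```python
# Defects per thousand lines of code (KLOC), with and without severity
# weighting. The weights below are purely illustrative.
SEVERITY_WEIGHT = {"implementation": 1, "interface": 3, "design": 10}

def defect_density(defects, kloc):
    """Raw defect count per KLOC - the usual headline quality metric."""
    return len(defects) / kloc

def weighted_density(defects, kloc):
    """Severity-weighted defects per KLOC: design defects dominate the score."""
    return sum(SEVERITY_WEIGHT[d] for d in defects) / kloc

defects = ["implementation", "implementation", "design"]
print(defect_density(defects, kloc=12.5))    # 3 defects in 12,500 lines
print(weighted_density(defects, kloc=12.5))  # one design flaw outweighs
                                             # several implementation slips
```

The weighted version makes anOilman’s point visible: a single design defect moves the score far more than several implementation slips.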

Have you ever worked with software? The stuff that’s going out these days, is bad. NASA missed Mars, the military missed the target, and Microsoft… (just a second, I gotta get another update.)

” The stuff that’s going out these days, is bad. NASA missed Mars, the military missed the target, and Microsoft… (just a second, I gotta get another update.) ”

So it seems. So it seems. And let me go ahead and spit it out: all of the mentioned activities have more than likely ‘instituted formal quality assurance programs’. G-r-r-r-r!

So where am I coming from? A different background and perspective than you. Three-plus decades coding and using models in nuclear waste facility performance assessment and in impact assessments of contaminated sites, in regulatory and commercial contexts. IMO in that arena code defects are an incomplete measure of quality. Often QA involves lots of documentation, e.g., validation and verification packages for codes, internal review, and extensive archives including copies of all materials referenced. It could be painful, but it is essential. In addition to the code(s) run, detailed characterization of the underlying site-specific conceptual models is needed in the case of numerical groundwater models, and of the exposure scenarios when human and ecological impacts are studied. There are more things, but the takeaway is that, from my perspective on the use of models in regulatory and litigation contexts, a defect-count metric would miss some potentially very serious problems.

If I were to guess, I would guess that the climate codes are much larger than the codes I refer to above, and defect count might be a more significant metric for them. However, they are still (aggregate) physical models, and so QA should also document and qualify the conceptual models that shape components of the larger system model. Defect counts simply do not do that.

It should be clear now that my bias is that climate models and codes have traits in common with large commercial codes (MS) as well as traits in common with codes for the physical models used in environmental regulatory work. This second set of traits includes things like use of numerical solutions for PDEs, real difficulty in characterizing the physical domain modeled, and difficulty setting initial and boundary conditions. Also there are multiple phases present in the systems and nonlinear processes occurring. And of course the systems have their stochastic aspects. How all of these sorts of non-code factors are handled has to be documented in order for the code to be used with the specified assurance developed in the upfront QA plan–where goals are set. In short I see climate codes subject to the same practices used in the regulatory arena–and given the tenor of the present projections–in the most severe manner.

TH: A lot of these metrics that we develop come from computer models. How should people treat the kind of info that comes from computer climate models?

Hansen: I think you would have to treat it with a great deal of skepticism. Because if computer models were in fact the principal basis for our concern, then you have to admit that there are still substantial uncertainties as to whether we have all the physics in there, and how accurate we have it. But, in fact, that’s not the principal basis for our concern. It’s the Earth’s history-how the Earth responded in the past to changes in boundary conditions, such as atmospheric composition. Climate models are helpful in interpreting that data, but they’re not the primary source of our understanding.

TH: Do you think that gets misinterpreted in the media?

Hansen: Oh, yeah, that’s intentional. The contrarians, the deniers who prefer to continue business as usual, easily recognize that the computer models are our weak point. So they jump all over them and they try to make the people, the public, believe that that’s the source of our knowledge. But, in fact, it’s supplementary. It’s not the basic source of knowledge. We know, for example, from looking at the Earth’s history, that the last time the planet was two degrees Celsius warmer, sea level was 25 meters higher.

And we have a lot of different examples in the Earth’s history of how climate has changed as the atmospheric composition has changed. So it’s misleading to claim that the climate models are the primary basis of understanding.

mwgrant,
As BBD points out, many do not regard climate models as nearly as crucial as you seem to think they are. We have a very good idea of the global effect of increasing emissions from paleo work, and from basic physics. The decision as to whether we should be reducing our emissions, or not, could be made without reference to GCMs. Where GCMs might actually be more important is in determining what sort of adaptation strategies we should be considering. Determining the regional effects of increasing our emissions isn’t all that easy without GCMs. Admittedly, this is an area where there is less confidence in the output from GCMs (we’re more confident about the overall warming and changes to the hydrological cycle – see Hargreaves & Annan – than we are about regional effects). However, this is all we have at the moment. They’ll get better, but that doesn’t mean that one shouldn’t use their output now to inform policy.

mwgrant: I have worked in a variety of software development roles in a variety of industries with varying degrees of quality and processes.

Commercial software is utterly different from climate software. Utterly.

You are neglecting the sheer amount of review and reuse that goes on with the actual climate model code. To be very, very clear, it’s cooperatively shared. Different eyes look at it all the time to make sure it works right, and then share their findings.

That never happens with commercial code. Do you have any idea how much a code review costs? And you think a company would do 10 more with the exact same guys just to make sure? No. Companies aren’t that stupid. Installing a money furnace would be cheaper I think.

Cell phones have large, complex protocol stacks. It takes hundreds of programmers to write this stuff, and it’s pretty scary to also try and make the phones. So most companies buy this source code from a common source for less than it would cost to write it themselves, then concentrate on making the phones.

In the early days, we found ourselves in the curious position of being ahead of our competition. We were flying all over the world testing our phones, and kinda resented spending all that money and sharing the results. We were supposed to share our findings with the protocol stack developer, but we also knew who’d get the update… So we didn’t share everything that we had found. 6 months later, our dear competitor still had the bugs we had already found, and consequently they were unable to certify their phones.

Too bad they didn’t have cooperative software sharing.

I’m not cognizant of the details for the requirements for climate models. That is unrelated to whether they work and should not be discussed/mentioned in the same context as software quality.

“If I were to guess I would guess that the climate codes are much larger than the codes I refer to above and defect count might be more significant metric for them.”

Given that the source to NASA GISS Model E is online, you could check for yourself, for that particular model.

“However, they are still (aggregate) physical models and so QA also should document and qualify the conceptual models that shape components of the larger system model. Defect counts simply do not do that.”

Since NASA GISS’s Model E project has a nice web site attached to it, including a list of papers in the academic press which describe specifications and results, other references to academic papers whose results are incorporated in the model, and links to other documentation, it seems you could spare us at least some of the questions you’re posting if you just did a little research.

I haven’t bothered looking for similar information for other models recently, though in the past I’ve found a bunch of documentation on the Hadley Centre’s GCM.

“For me–and the basis for ‘yes’–is that too much emphasis has been put on model results for the policy decision process.”

This is both false and classic denialist rhetoric, aka concern trolling.
—————————————–
It is simply a statement of my opinion based on my experience working with environmental models. In your response you choose to ignore the sentence that follows:

“That is they are about as useful as a hockey stick, i.e., they paint a picture through a blurry lens and due to time constraints that is what we have to go on.”

and in particular the last half of that sentence: due to time constraints that is what we have to go on.

‘false and classic denialist rhetoric’? Now who is really the troll here?

aTTP

So you see, my opinion is the opposite of your reading: “many do not regard climate models as nearly as crucial as you seem to think they are.” My statement is that the value of models in the present context is over-stated. We are under time constraints in a decision process and have to objectively move that process forward given what imperfect inputs we have in hand. That is the essence of decisions.

As for my concern with model QA, my last lengthy response to anoilman points out my perspective on model and code QA in the environmental regulatory arena – a different background than his. In both cases QA is a big deal. However, how QA is approached is different. I found this interesting.

A final note to BBD — you are quick to use handles for people you neither know nor understand. Hackneyed political terms usually are not productive.

TE, Nothing that you (or Tol) have said supports the claim that “GCMs have a remote relationship with physics”.

Statistics is the key word.
There’s an apt line:
“Models are like sausages: People like them a lot more before they know what goes into them”

But you kind of know that the GCMs are not good and pure, right?

Because why else would the GCM results be so different?
They all use the same physics, right? (If they don’t, there would be a big to-do about which physics is correct, but the physics is mostly “settled”, right?)

But statistics and the simple instability of numerical methods give us the variance that we observe between models, which are all attempting to represent the same physics.
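Sensitivity of this kind is easy to demonstrate: integrate the same deterministic equations twice from imperceptibly different starting points and watch the trajectories separate. A toy sketch using the Lorenz equations (purely illustrative; nothing here comes from any actual GCM):

```python
import numpy as np

def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations - identical 'physics' every run."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])  # same equations, start differs by 1e-8

for _ in range(4000):               # integrate both copies to t = 20
    a, b = lorenz_step(a), lorenz_step(b)

print(np.linalg.norm(a - b))        # the tiny difference has grown by many
                                    # orders of magnitude
```

Note that this shows divergence of individual trajectories, not of climate statistics; models with the same physics can disagree on a trajectory while agreeing on long-run averages.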

“Given that the source to NASA GISS Model E is online, you could check for yourself, for that particular model.”

Why bother? The comment is clearly indicated as a guess, with all the standing that entails or doesn’t. I am content. It is based on my experience with other environmental systems codes and the comparative level of complexity. I guessed, and told you that I did. What’s your beef?

dhogaza wrote:

“Since NASA GISS’s Model E project has a nice web site attached to it, including a list of papers in the academic press which describe specifications and results, other references to academic papers whose results are incorporated in the model, and links to other documentation, it seems you could spare us at least some of the questions you’re posting if you just did a little research.”

I have been there, and my comment reflects what I found. Again, judging by what is done for other custom and commercial environmental codes when used in a regulatory and litigative environment, posted journal articles are generally inadequate documentation: a good summary, yes, but documentation, no. [Journals have constraints on article size, and besides, QA of codes is not their primary function.] In particular, validation and verification (V&V), or some functional equivalent, is a sufficient concern to merit its own documentation.

Read the Hansen quote and compare with what you are doing here.

1.) What does Hansen’s quote have to do with the charged language you so often choose to use?
2.) Yes, I read the quote. You are on your own tangent. Note however that Hansen’s “I think you would have to treat it [model output] with a great deal of skepticism” is exactly what I do. This is consistent with my earlier statement:

“For me–and the basis for ‘yes’–is that too much emphasis has been put on model results for the policy decision process. That is they are about as useful as a hockey stick, i.e., they paint a picture through a blurry lens and due to time constraints that is what we have to go on”

1.) “Commercial software is utterly different from climate software. Utterly.”

I have worked* with both commercial, national lab and house-custom code on projects with the USDOE, USNRC, USEPA, and USACE. This includes writing code from scratch, extending others’ codes, using commercial codes and using government codes. Utterly, indeed.
—-
*To avoid confusion: both as on-site worker and as off-site consultant working for beltway bandits.

2.) “You are neglecting the sheer amount of review and reuse that goes on with the actual climate model code. To be very, very clear, it’s cooperatively shared. Different eyes look at it all the time to make sure it works right, and then share their findings.”

No. I am familiar with those processes in other disciplines. I just do not share the same faith in them that you do — not when the stakes are much higher than those I have seen in smaller environmental projects where a much more demanding/detailed QA effort is reasonably expected.

I’ve shared my code with the opposition’s hired guns in a contentious, highly public government environment. I think that qualifies, but like your comment it is irrelevant here.

“I’m not cognizant of the details for the requirements for climate models. That is unrelated to whether they work and should not be discussed/mentioned in the same context as software quality.”

Here ”whether they work” is an ambiguous term. It is tempting to dismiss the comment out-of-hand, but I will respond:

If, when working on a model for a physical system, you put in the wrong model for a particular process, i.e., you implement a poor conceptual model or perhaps an inaccurate approximation for that process, then the code can execute till the cows come home and quality will still be an issue. So I will assume that by ‘work’ you mean something more than ‘execute’, and that this something includes selecting the correct conceptual model or the correct level of approximation for the process. But that is very much a part of the requirements or specifications for the model, i.e., it is still very much a matter of quality assurance.

and back to

”Cell phones have large, complex protocol stacks. It takes hundreds of programmers to write this stuff, and it’s pretty scary to also try and make the phones. So most companies buy this source code from a common source for less than it would cost to write it themselves, then concentrate on making the phones.”

I think that to an extent there is a similar pattern in the government agencies. Our projects handled the QA of in-house and out-of-house code differently. Separate procedures were in place within the quality assurance scheme.

————
It is just opinion but I think that climate folks should really see how the geohydrologists have approached model QA in a contentious regulatory environment. It may be enlightening and sobering at the same time.

too much emphasis has been put on model results for the policy decision process.

As I said – correctly – this is both false and classic denialist rhetoric (concern tr0lling). If you don’t like being challenged then avoid making false statements and parroting denialist rhetoric – especially if you aren’t of that camp.

It is simply a statement of my opinion based on my experience working with environmental models.

See ATTPs response to your earlier comment above. Productive conversation arises when the party in error admits the error, which you chose not to do.

1.) What does Hansen’s quote have to do with the charged language you so often choose to use?

Tone tr0lling. See previous comment.

2.) Yes, I read the quote. You are on your own tangent.

Hansen:

Oh, yeah, that’s intentional. The contrarians, the deniers who prefer to continue business as usual, easily recognize that the computer models are our weak point. So they jump all over them and they try to make the people, the public, believe that that’s the source of our knowledge. But, in fact, it’s supplementary. It’s not the basic source of knowledge.

You (incorrectly):

too much emphasis has been put on model results for the policy decision process.

QED

* * *

It is just opinion but I think that climate folks should really see how the geohydrologists have approached model QA in a contentious regulatory environment. It may be enlightening and sobering at the same time.

And I think that the constant insinuation that there are ‘problems’ with climate science which needs to learn from other fields is generally rather risible.

And I think that the constant insinuation that there are ‘problems’ with climate science which needs to learn from other fields is generally rather risible.

I tend to agree, but I’ll make a further comment. Most forms of modelling in the physical sciences are simply a way of trying to gain understanding of a physical system. How does it respond to changes? What happens if…? Etc. The goal isn’t to produce some kind of result for a client. It’s true, however, that climate science is in a bit of a grey area, in which climate models are used to try and answer fundamental questions but are also used to provide information for policy. So there is a case to be made for maybe dealing with climate modelling in a way that is slightly different to how we deal with other forms of modelling in the physical sciences. However, we already do use the same models for weather prediction, which is providing a service, so it’s not clear that this isn’t already the case, to a certain extent. Additionally, we’re trying to provide projections for the coming decades, so how do we actually validate these models? As I think Annan & Hargreaves (or Hargreaves & Annan) said, we could wait 100 years to find out, but that would probably be too late if what they suggest now is broadly correct.

2.) Yes, I read the quote. You are on your own tangent. Note however that Hansen’s “I think you would have to treat it [model output] with a great deal of skepticism, “ is what I always do. This is consistent with my earlier statement:

“For me–and the basis for ‘yes’–is that too much emphasis has been put on model results for the policy decision process. That is they are about as useful as a hockey stick, i.e., they paint a picture through a blurry lens and due to time constraints that is what we have to go on”

Perhaps ‘imperfect’ would have been better than ‘blurry’.

DUE TO TIME CONSTRAINTS THAT IS WHAT WE HAVE TO GO ON

Can you read? I am not going to return to the topic or respond to your feigned injury, BBD.

aTTP raises good points, and these, as it happens, allow me to press the point that there are possibly some useful things to be learned.

Most forms of modelling in the physical sciences are simply ways of trying to gain understanding of a physical system. How does it respond to changes? What happens if…? Etc. The goal isn’t to produce some kind of result for a client. It’s true, however, that climate science is in a bit of a grey area in which climate models are used to try and answer fundamental questions, but are also used to provide information for policy.

Yes. The same holds for many other environmental models, e.g., groundwater contaminant models. These models share other traits with the climate models: numerical implementations, under-characterized model domains, fluid flow, chemical interactions, difficulty defining the best initial and boundary conditions, and stochastic elements, to name a few.

So there is a case to be made for maybe dealing with climate modeling in a way that is slightly different to how we deal with other forms of modeling in the physical sciences. … Additionally, we’re trying to provide projections for the coming decades, so how do we actually validate these models?

There is a lot in common as I already mentioned above in the first part of this response. In addition to those consider this: a big concern with groundwater models is validation, particularly in light of the fact that impacts well into the future have to be predicted—years to thousands of years. So you see, these people have some very similar problems to take on.

“We could wait for 100 years to find out, but that would probably be too late if what they suggest now is broadly correct.”

This just emphasizes the similarities again.

aTTP, my comments are not in any way directed at throwing the models out, just keeping them in perspective within a necessary decision process consistent with Hansen’s “I think you would have to treat it with a great deal of skepticism” comment, and suggesting another view on QA in what is notably a contentious arena.

BBD: mwgrant is insinuating that there are problems with climate models while providing zero evidence that this is the case. I tend to label that behavior trolling.

Given mwgrant’s preference for being extremely precise in his arguments, it’s safe to say that he, like us, understands that there are no concerns over climate models. It’s an easy conclusion, since he has no evidence that there is a problem in the first place.

mwgrant: You do know that the models go back and forth to the real world don’t you? They’ve taken weird results analyzed them further, sent them back to hunt for more real world information, which was later found. You do know that right?

mwgrant: You do know that the models go back and forth to the real world don’t you? They’ve taken weird results analyzed them further, sent them back to hunt for more real world information, which was later found. You do know that right?

You respond with a snarky devoid-of-content comment evoking the real world with which you are so familiar, given that I’ve mentioned V&V a couple of times? You can do better. Or can you? You want to BS your creds but keep opening your mouth and getting in your own way. Let’s face it, I know jack of your creds in the climate model context and vice versa. I’ll sleep at night. I tire of your irrelevant bluster. (That makes me think you did more ‘managing’ than ‘doing’, but that is probably an availability bias on my part.)

Eli, OK, on second thought that seems like an interesting idea. It will take some time, but what the heck, let’s do it. Are you available for questions? Shall we in due time take it up on your blog so as not to burden aTTP? (I have one but it is not up and running.) Retired analyst with background in fate and transport modeling in the environmental arena takes a look at CESM. No agenda from either of us and no snark. And interaction as I go through it. Hell, I live in Montgomery County; I’ll even meet you for lunch. Interested? I need a hobby.

anoilman — I’ve just taken stock of how far afield from my original comment this thread has gotten–the audacity of your nonexpert assessment of the best of all software. … Really a foolish statement, because the burden of proof is on you. Not going to waste time on you any more.

perhaps mw should go find out about how the community earth system model has been put together.

CESM 1.2 User Manual writes:

CESM Validation

Although CESM can be run out-of-the-box for a variety of resolutions, component combinations, and machines, MOST combinations of component sets, resolutions, and machines have not undergone rigorous scientific climate validation. … Users should carry out their own validations on any platform prior to doing scientific runs or scientific analysis and documentation.

Well, that was short and sweet. Looking at CESM will not be that useful from a QA perspective; that is not why it has been made available anyway. (Eli doesn’t seem tuned to this thread. Guess that was the case.) The Model E documentation is better. Really not a surprise.

Wow, a lot of interesting ideas on this thread. I hope ATTP doesn’t shut it down despite an expressed inclination to do so.

Let’s start with the nuts and bolts issue.

On the quality of GCMs I am interested in engaging with mwgrant and frankly less so with anoilman who appears to me to be talking out of his hat. Although I am often cast as a zealot by some in the naysayer camp, I will not appear as a defender of the state of the art in climate modeling. I think some wrong turns have been taken in the last 15 years, or at the very least, that some promising avenues have been unreasonably ignored.

I am interested in the idea of undertaking a serious code review of a GCM. If you choose CCSM, I am already not an enthusiast. I’d be more interested in GISS or the Hadley model, myself.

The bit you quote about validation is interesting, though I’m not sure what conclusions you take from it. I find it alarming that there is a “validation” phase in porting CCSM to any given platform. The idea that anybody outside NCAR knows how to do this appears to be a polite fiction. Their budget having been cut, their support for outside users is remarkably thin and getting the thing to even run at scale, never mind validate it, on a non-NCAR platform can be immensely frustrating as I can personally attest.

That said, I remain reasonably confident that NCAR internally do have a verification/validation protocol, and that the results of the model on validated platforms are tested to reasonable standards, in the sense that would satisfy any software engineer or follower of Feynman’s dictum “the easiest person to fool is yourself”.

That said, in their defense, there is little that can be done about this if we accept the underlying thrust of ever increasing complexity of GCMs. Supercomputing is a sort of perpetual 1965, where every single machine has its peculiarities; code that runs under putatively the “same” operating system and the “same” compiler on the “same” message passing infrastructure will fail on a new machine simply because the ways in which processors are allocated to a job vary from one machine to another. Most programmers have long since forgotten the overhead of JCL (“job control language”); supercomputer end users (never mind programmers) spend a lot of time mucking with shell scripts that essentially perform the functions of JCL at a much higher level of complexity and with much weaker documentation.

Finally, these models are not for practical purposes Open Source even if they do literally satisfy the criteria (of which I’m uncertain: http://opensource.org/definition). Publication of the source is not sufficient. In practice, Open Source implies a welcoming of input and exchange with any interested and motivated party. Under budgetary stresses, NCAR in particular doesn’t act remotely in an open-sourcey sort of way. And in practice, every run on non-NCAR platforms constitutes a fork. University researcher modifications are to my knowledge hardly ever vetted and folded back into the source tree. Mostly they are done by immensely patient grad students who are adept in some aspect of science but have little or no software engineering training.

To critics of climate science I would add first of all that none of this is unique to climatology – many other sciences are struggling along in this way, in a way that seems antediluvian to someone who has some sense of modern commercial software development.

Secondly, I would add that uncertainty is not your friend. To state that the models are lousy and therefore we should act as if the sensitivity is zero is about as unreasonable an argument as we see in these unreasonable times. It’s like saying you haven’t looked at your bank statements and so therefore you can write as many checks as you want.

Also, I would like to be intellectually honest and point to a key weakness in my complaints, which is that, somewhat to my surprise, the different models DO seem to be converging on some specific regional predictions, e.g., the drying of the American southwest. So all the effort that has been put in for fifteen years has NOT been entirely fruitless. That said, I think it’s long past time for a simple, readable AGCM implemented in a modern programming language (ideally, a language whose name is homonymous with a large snake) over a couple of well-tested libraries for the fluid dynamics and radiation code. I’ve been arguing for this for a decade, and I think it’s more true than ever.

I’m no longer waiting for my phone to ring nor submitting vain proposals to funding agencies. But maybe it’s time for an unfunded effort.

Paragraph by paragraph I find your comment on target…well, Python needs to be thought about a bit (speed?), and IMO there is nothing wrong with modern Fortran.

I hope to respond later in more detail. However for now I’ll make some quick comments:

1.) I too have an expectation that internal documentation in some form/condition can be found at UCAR. I think that because the point of the site is to make the code available to a wider community of users, a pragmatic decision was made on the amount of material made available–do not overwhelm the user. Most people probably want to open the box and run the QA. UCAR is quite clear and correct in its note: the user is responsible for installs on his/her machine(s). This is both a reasonable and practical burden to put on the user, and would be a part of their project’s overarching QA plan.

2.) I agree considering GISS and Hadley is much more meaningful.

3.) How to proceed is an important topic. That is a whole big topic by itself. Making it a productive effort in the current environment could prove tricky.

4.) While a modernized, updated code (open source paradigm?) could be an attractive tool and contributor, one should not lose track of the important goal: assuring adequate confidence in the QA of the existing codes. But we definitely need a gnuClimate :O)

I believe that the idea that there is much need for better climate modeling to inform the mitigation process is wrongheaded.

It would be good if we could inform the adaptation process; so far progress has been limited, and I think it will remain so until a new software approach is taken. And if anybody asked me, I honestly think I can connect the right communities to make a good stab at it.

But on the mitigation side I will continue to maintain that this is a red herring. We have more than enough information to know that our recent and foreseeable emissions trajectory is insane.

I’m having a pleasant evening with the family (even though South Africa lost to Argentina for the first time in a rugby test) so don’t have time to write a lengthy comment. MT has made some very interesting points. Maybe we can aim to use those as a motivation for being constructive, rather than confrontational.

Once upon a time, a friend of mine accidentally took over thousands of computers. He had found a vulnerability in a piece of software and started playing with it. In the process, he figured out how to get total administration access over a network. He put it in a script, and ran it to see what would happen, then went to bed for about four hours. Next morning on the way to work he checked on it, and discovered he was now lord and master of about 50,000 computers. After nearly vomiting in fear he killed the whole thing and deleted all the files associated with it. In the end he said he threw the hard drive into a bonfire. I can’t tell you who he is because he doesn’t want to go to Federal prison, which is what could have happened if he’d told anyone that could do anything about the bug he’d found. Did that bug get fixed? Probably eventually, but not by my friend. This story isn’t extraordinary at all. Spend much time in the hacker and security scene, you’ll hear stories like this and worse.

It’s hard to explain to regular people how much technology barely works, how much the infrastructure of our lives is held together by the IT equivalent of baling wire.

Even formal specification can’t provide much safety against exploits, unless it goes from programs to language all the way up to hardware.

***

> Quote it ol’ buddy.

There’s this:

Again the important question is are they good enough for the intended application? Frankly in my opinion the answer is ‘yes’. The big problem lies with what is the intended use? For me–and the basis for ‘yes’–is that too much emphasis has been put on model results for the policy decision process.

The emphasized bit presumes that models have been used for the policy decision process, a claim which, as MT suspects, might very well be a red herring.

In other words, dear mwgrant, unless you can clarify where you’re going with this “important question,” your huffing and puffing against Oily One is turning into playing the ref, and playing the ref is against the site policy.

mwgrant,
Although I do have some sympathy for those who get flack on climate blogs, I would argue that if you think there’s too much chaff here, there probably isn’t a climate blog that would be suitable.

MT,
What you said here seems similar to what I was trying to get at here. I agree that using inadequacies in climate models as an argument against mitigation is a red herring.

“I thought this was common knowledge that everything was broken” etc., etc.

Everything being broken is never a point of contention. The point of contention is “…Climate models are the best and most thoroughly reviewed software out there. There is nothing like it”—in particular its broad and deep reach. The basis for this assertion has been given as Easterbrook (TED), which cites Pipitone and Easterbrook 2012. The discussion in the TED talk is only a small part of the twenty-one minute presentation, but it clearly indicates the quality as measured by defect density:

As Easterbrook notes, this defect density is low. Fine; I had not disputed this anyway. I have simply questioned the superlative comparison in anoilman’s evaluation. Initially this was based on common sense—there are just too many codes out there to have been considered.

My intuition seems to have been corroborated in the article. At the beginning of the section on Future Work, the Pipitone and Easterbrook reference cited on the debug slide in the latter’s TED talk states [my bold]:

Many of the limitations to the present study could be overcome with more detailed and controlled replications. Mostly significantly, a larger sample size both of climate models and comparator projects would lend to the credibility of our defect density and fault analysis results.

A little later in the same section the authors quote ‘Hatton (1995)’:

“There is no shortage of things to measure, but there is a dire shortage of case histories which provide useful correlations. What is reasonably well established, however, is that there is no single metric which is continuously and monotonically related to various useful measures of software quality…”

Enough said. BTW the article is interesting and informative on the question of ascertaining climate software quality. It is a good initial effort, but some here have perhaps forgotten the limitations mentioned by the authors.
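For concreteness, the metric under discussion, defect density, is just reported defects normalized per thousand lines of code (KLOC), which is exactly why the sample-size and reporting-practice caveats the authors raise matter so much. A minimal sketch in Python, with entirely made-up project names and numbers (these are not figures from Pipitone & Easterbrook):

```python
# Defect density = reported defects per thousand lines of code (KLOC).
# The projects and counts below are hypothetical, for illustration only.

def defect_density(defect_count, lines_of_code):
    """Return reported defects per thousand lines of code (KLOC)."""
    return defect_count / (lines_of_code / 1000.0)

projects = {
    "climate_model_A": (25, 400_000),   # 25 reported defects, 400 KLOC
    "comparator_B":    (300, 150_000),  # 300 reported defects, 150 KLOC
}

for name, (defects, loc) in projects.items():
    print(f"{name}: {defect_density(defects, loc):.4f} defects/KLOC")
```

Note that the numerator depends entirely on how diligently defects are recorded, which is one reason (per the Hatton quote) no single metric tracks software quality monotonically.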

***

In response to my statement

anoilman – where have I stated ” that there is something wrong.” Quote it ol’ buddy.

W offers:

There’s this:

”Again the important question is are they good enough for the intended application? Frankly in my opinion the answer is ‘yes’. The big problem lies with what is the intended use? For me–and the basis for ‘yes’–is that too much emphasis has been put on model results for the policy decision process.”

W. misread the quoted text [block]. There is nothing wrong here. There is only a problem to be solved—specifying the intended use. Existence of a problem does not necessarily mean that there is something wrong.

I read mt’s comments, ‘red herring’ and the rest, differently than you. The two broad alternatives beyond a base case no-action are mitigation and adaptation. I saw mt’s comments as further observations on the current state of things and on how modeling might inform for each alternative. I agree with what he wrote and indicated this to him—outside of this blog.

Playing the ref? If the ref intentionally or unintentionally gets in the game, then the ref is in the game. The timing of his caution flags suggests to me that the ref entered the game. So he got bumped a little. The ref may think otherwise, but there probably isn’t a climate blog where that doesn’t happen.

The idea that mitigation and adaptation should be considered as alternatives is fundamentally wrong in my opinion. Such a dichotomy certainly should not be read into my comments.

There is no meaningful bound to adaptation cost in the absence of mitigation. This should be obvious to any one who is paying reasonably close attention.

Climate models are only one among several streams of evidence pointing to a high enough sensitivity that a large proportion of our current fossil fuel reserves cannot be used (at least in the absence of an enormous sequestration effort) and that further discoveries are counterproductive. We simply cannot adapt to the amount of CO2 that individuals could profitably emit in the absence of regulation. I don’t think this is plausibly in doubt.

My point was that if there is a policy use to continuing efforts in climate modeling, it is to inform such adaptation as will inevitably become necessary, even in the most rigorous mitigation scenario.

Mr. Grant’s comments make it very clear how very badly we have been doing in communicating this fact to the public and the policy sector. He’s obviously not stupid and does have some relevant skills. He apparently just doesn’t get it.

Without significant mitigation, adaptation will fail. With vigorous mitigation, expensive adaptation will still be necessary. They are not alternatives.

mt, I think it is fair to say that without models we cannot place either an upper or a lower bound on the cost of adaptation. That is, without climate models we cannot be sure that the costs of climate change will require significant adaptation to maximize standards of living. However, that is irrelevant, because without climate models we also cannot be sure that climate change impacts will not exceed the bounds of all reasonable adaptation measures, such that standards of living would be reduced to levels not seen globally in 100, or even a thousand, years; nor, without models, can we be sure that the low cost possibilities are also low probability possibilities. Not mitigating (ignoring the “evidence” of models) is like playing Russian roulette with five rounds loaded when we are uncertain whether it is a five or a six chamber cylinder. We might come out ahead, but no sane person bets on it. With the “evidence” of models, we at least know the number of chambers.
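The roulette analogy above can be made concrete with a trivial expected-value calculation. The 50/50 prior on the cylinder size below is my own arbitrary assumption for illustration, not anything stated in the comment:

```python
# Five rounds are loaded; we don't know whether the cylinder has five
# or six chambers. Assume (arbitrarily) a 50/50 prior on cylinder size.

p_six_chambers = 0.5            # assumed probability of a 6-chamber cylinder
survive_if_six = 1.0 / 6.0      # one empty chamber out of six
survive_if_five = 0.0           # five rounds, five chambers: no empty chamber

# Law of total probability over the two possible cylinders
p_survive = (p_six_chambers * survive_if_six
             + (1 - p_six_chambers) * survive_if_five)
print(f"chance of surviving one pull: {p_survive:.4f}")
```

The point of the analogy survives the arithmetic: knowing the number of chambers (what models provide) changes the survival odds from an uncertain mixture to a known, and possibly zero, quantity.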

“The idea that mitigation and adaptation should be considered as alternatives is fundamentally wrong in my opinion. Such a dichotomy certainly should not be read into my comments. ”

Thanks for that clarification on that detail; it was my misread. Your point of view is duly noted. Please understand that my perspective is to view a set of plausibly executable alternatives and the manifold of possible scenario-dependent outcomes. You should note that there is nothing that precludes your variant of adaptation from being an alternative in the risk approach. It is simple: develop a manageable number of alternatives, assess the risk (good and bad), and crank it out. (Yes, it is complicated and bumpy.)

A couple of points (I’m trying to be brief):
1.) I expect that if such a structured, comprehensive process is undertaken objectively, it has the capability to order the alternatives relatively or semi-quantitatively. If either entrenched side in this ‘debate’ has faith in their position, they should also have no fear of laying it on the line in a fair process. So I have no ‘sympathy’ for mt, who has in some manner worked it out in his head or elsewhere. However, the same applies to all others who have gone through similar evaluations. Until given good reason, I would not deviate from that approach.
2.) The estimation of risk obviously involves multiple points of contention, maybe to the point that some may be deemed incalculable. Even if that is the case, it is important to have it taken up by all participating parties in a transparent manner. Even some sharper delineation of areas of disagreement may help.
3.) mt, if you have some concrete suggestions on what you think I could read to facilitate my better understanding of your ‘we ain’t got no options’ perspective, point them out and I’ll be happy to spend some time with it–or am I safe for now just going with your third paragraph as a readers’ digest version?

“Mr. Grant’s comments make it very clear how very badly we have been doing in communicating this fact to the public and the policy sector. He’s obviously not stupid and does have some relevant skills. He apparently just doesn’t get it.”

This is a good comment. Whether I do not get it, or the efforts at communication have been bad, or both, we are all better off if this is addressed. (Perhaps the nature of blogs, as suggested by aTTP above, means that they are not very helpful in the process and we need to modify them or move to other venues.)

My view in regard to the contrarian matrix. (Thanks for your observation and the link. My distaste for classification and labels in regard to people is pretty strong but temptation of self-assessment was stronger.)

There is no such thing as a global average temperature. It is a metric. That entails uses and limitations. I think that the former outweigh the latter.

That most of the W since 1950 is from A means little Disagree. Thought never even occurred to me.

The GW has been less than predicted or overestimated by the models. Sure if one looks at the graphs, but the importance of that has also been overestimated in the context of decision-making. Regardless of what the eventual outcome will be, time is paramount in decision-making.

Model projections are unfalsifiable. That concern is an irrelevant waste of time. Popper was a philosopher, not a god.

Paucity of data prevails and its climate signal is almost indistinguishable from noise. Disagree, although I think that the statement is too open to interpretation.

We don’t know what adjustments were made to these records. No we do not. But it is what we have.

We need … It would be nice, but we must work with what we have, improving what we can. Again, time is a constraint. To me the most important item here is probably the V&V. Independent? Yes, that would be nice, but more important is transparency in a highly contentious environment. Find a way to get all of the internal documentation out into the public. That is a relatively easy first step and may go a long way.

General comment: I view global warming as a risk problem in the context of policy. Jumping to mitigation or entrenching in ’n-action’ are too restrictive . A reasonable set of alternatives and outcomes need to be characterized to inform the decision. Time is paramount not because an urgency about gloom and doom, but because many of the aspects of the problem are conditioned by time and depend on time-ordering. Bottomline anoilman, here less than perfect decision making trumps less than perfect science.

BTW, I think that such a matrix is limited in utility without counterparts for camps with opposing view—all in neutral language.

mt, my view is that the medium influences the content, e.g., anonymity, no visual feedback, etc. can impact expression which is bound into content. Even on the other end expression influences perception. We can not escape our wiring.

I don’t view “exaggeration encourages suspicion” and “gloom and doom” as neutral. They both belong to the “Do not panic” level, i.e. the CAGW meme. One does not simply conflate calmness of tone and neutrality to whine about flaming in Mordor.

There are ways to understate one’s claim that can be heard loud and clear by the Internet dogs, e.g:

I think that such a matrix is limited in utility without counterparts for camps with opposing view—all in neutral language.

In that sentence, the word “limited” is quite splendid, and the concept of “opposing view” creates another dichotomy that would deserve due diligence.

The idea that we can not escape our wiring may be related to the one according to which we cannot escape our writing.

mwgrant: The total lack of actual experts who produce and consume the data would strike me as a critical concern for a global warming blog. Especially in this case, the notion of producing and consuming different data has been brought up by people who don’t do it. Hence… I think Anders is right.

Willard’s site is simply the usual drivel we see all the time. “Stop the press! Something might be wrong!”, is a textbook meme as far as everyone here is concerned.

You could have just said what you said in that last post up front. You were all over the place. “To me the most important item here is probably the V&V. Independent? Yes that would be nice, but more important is transparency in a highly contentious environment. Find a way to get all of the internal documentation out in to the public. That is a relatively easy first step and may go a long way. ”

I’m not sure I agree with your ideas of public transparency. That’s not a no… but I just don’t see what you’re going to get. This isn’t stuff you can just pick up and read. You’d probably need a graduate level course in climate science before you could begin; various aspects of it will likely require a PhD in a narrow subject. Joe Public can’t understand the difference between passing the pointer to an automatic and a crappy comment. (I was once asked why we can’t just use GPS at the bit for drilling.)

On the other hand actual qualified and competitive groups are reading and verifying the code/results, so I don’t believe they are biased. Quite the opposite, we have groups on different continents, different organizations all competing cooperatively. The final stage of review is peer reviewed papers on the subject. (It has been a very long time since I saw a paper on the competition results.)

If you think you are really on to something, why don’t you start up a public, and open group to start reviewing and testing the code that we all know you have access to? If you do well, and gain traction, then maybe you would have earned the street cred with the other organizations to look at their code.

OK, I exaggerate. I’ve been a fan of Marshall McLuhan since the bronze age. The medium is of course the message. But I’d say that’s true only in part. Also, in compensation for some of the drawbacks, though, blog conversation is scalable and referenceable (presuming the URL and web service stays intact anyway).

It depends very much on who is running it and who the audience is and what the moderation policy is. But I think there are some real advantages along with the disadvantages.

The key problem is identifying who knows what they are talking about and who is posing or hopelessly overconfident. That may vary by subject matter as well as person.

MT,
I think you make some very good points about the value of blogs. I, for example, don’t see this as some site where I get to broadcast my brilliance to the world, it’s simply a site where I can express my views – some of which are more informed than others – and learn from those who comment (and regularly do). As frustrating, and difficult, as moderation can be, I wouldn’t be able to run a blog without comments. It would just seem pointless and the lack of any feedback would seem bizarre.

You’re right that identifying who knows what they’re talking about and who doesn’t is crucial, but maybe not all that hard. What I find most disappointing are those who clearly do have valuable contributions to make, but choose to do so in a way that just makes it not really worth interacting with them.

Blogs are broadly useless, IMO. What I think people should do is do their best to be as informed as they can be. Normally that would involve interacting with real experts.

On your blog, I’ve read posts from: Lacis, Dessler, Way, Pielke, and probably some others I’ve missed, who all have published climate papers. So on your blog, there has been interaction with ‘real experts’.

Similarly, Curry is an expert who runs a blog and Lacis, Pielke, and others have posted there.

And of course, Isaac Held’s blog is full of gems and he goes out of his way to respond to posts.

Now, verifiable data and ideas are the important part, not expert authority (Pascal overturned the ‘method of authority’ by taking the barometer up the mountain, though it’s ironic to cite Pascal). But if you want experts, you seem to have them.

“Blogs are broadly useless, IMO. What I think people should do is do their best to be as informed as they can be. Normally that would involve interacting with real experts.”

I would agree. What readers curious about science should gravitate toward is reading a site that contains independent research, such as http://ContextEarth.com, or forums that feature collaborative work; Azimuth and moyhu are examples.

The fact that it owes its existence to blogworld would put my priors squarely in the category of Stadium Handwave in my estimation. (I’ve talked climate in person with John Baez. I like the guy, but he doesn’t really understand climate. Physicists tend to expect more simplicity than the world affords.)

In the end, though, it is a very different sort of beast than the Stadium thing. Its trouble is not that it applies tests that are far too broad to inform us like the Stadium paper does. Quite to the contrary – it performs tests that it passes far better than would reasonably be expected. It looks more than a bit too good. And it is frank about it:

The ground of ENSO modeling with simple systems is well-trodden. This is very different than what the literature says.

Does the author use the claimed nearly invincible method to predict the future, or just the past? A couple of seasons of on the money predictions would certainly cause people to sit up and take notice. (Of course, there’s the “difficulty of predicting significant regime changes that can violate the stationarity requirement of the model” dodge to fall back upon.)

What’s more the physics doesn’t sit right with me.

You’re a SciPy guy if you’re the real WHT. I presume you can point us at Pukite’s code?

“Apart from the difficulty of predicting significant regime changes that can violate the stationarity requirement of the model, both hindcasting for evaluating paleoclimate ENSO data [14] and forecasting may have potential applications for this model. The model is simple enough in its formulation that others can readily improve in its fidelity without resorting to the complexity of a full blown global circulation model (GCM), and in contrast to those depending on erratic [19] or stochastic inputs which have less predictive power.”

In other words, the atmosphere doesn’t enter into it (except for the stratospheric QBO).

Nick Stokes’ blog; he has been contributing to the science outside his blog, and reports so on his blog. I don’t know if GISS has adopted his rewrite/replication of their FORTRAN code for GISTemp with his Python version, but serious conversations in that direction were happening a couple of years ago.

> It contains a threaded forum for long-term discussion where you can add charts and equation mark-up to your heart’s content. That’s where all the El Nino discussion takes place.

Moyhu is a blog:

> Barely, it is really an engineering notebook which contains interactive charts and code for analyzing data. See also WoodForTrees and KlimateExplorer and SkS also has something similar. Just as ContextEarth connects to a complementary interactive semantic web server http://entroplet.com
What were you saying regarding blogs again, Web?

> Something obviously does not sit well

Probably got a bee up your butt.

Only the scientists with the keys to the castle are allowed to play in this field, eh?

BBD: “So what does ‘jumping to mitigation’ mean, exactly? I really don’t follow what you are trying to say here.”

mt: “As for “jumping” to mitigation, we have been standing nervously at the end of the diving board holding up the line for, what, twenty years now?”

aTTP: “Technically a diving board that’s been getting higher.”

Here is the full quote with the phrase. ([o] is a typo correction.)

General comment: I view global warming as a risk problem in the context of policy. Jumping to mitigation or entrenching in ’n[o]-action’ are too restrictive. A reasonable set of alternatives and outcomes need to be characterized to inform the decision. Time is paramount not because of an urgency about gloom and doom, but because many of the aspects of the problem are conditioned by time and depend on time-ordering. Bottom line, anoilman: here less than perfect decision making trumps less than perfect science.

Mitigation and no-action are decision alternatives. The meaning of ‘jumping to mitigation’ is implementing the mitigation without adequate characterization of the decision (including the goal of the policy, the metric(s) defining quality of outcome, alternative selection criteria, clarity, etc.) or consideration of other reasonable alternatives. The meaning of ‘entrenching in n[o] action’ is opting to continue in the present mode, taking no additional actions, again without adequate characterization of the decision or consideration of other reasonable alternatives. That is, these two alternatives are representative of the sort of default policy that we may well inherit as a result of lack of due diligence in the decision making phase.

I certainly agree that we have been in that phase for two decades and as time passes conditions change. This is an incredibly stupid, preventable circumstance that has evolved. At this stage there are no innocents and IMO assignment of blame is a useless, nay, hindering exercise. To me uncertainty in outcome under different alternative approaches can be manipulated to provide both major camps in the ‘debate’ with pro and con arguments. This is all the more reason to immediately move toward adopting an open, inclusive, structured approach to the decision making. Handling the science is not the stumbling block—making decisions is, i.e., less than perfect decision making trumps less than perfect science. But, hey, if you insist on wanting to be perfect… just wait and you might be.

> Only the scientists with the keys to the castle are allowed to play in this field, eh?

When a ClimateBall player such as Web makes things personal, it shows weakness. Therefore, one only has to read what has not been addressed. Here’s what has not been addressed:

Does the author use the claimed nearly invincible method to predict the future, or just the past? A couple of seasons of on-the-money predictions would certainly cause people to sit up and take notice. (Of course, there’s the “difficulty of predicting significant regime changes that can violate the stationarity requirement of the model” dodge to fall back upon.)

There’s also the quote, which I believe Web can find back if he’s interested in more than using AT’s for his usual drive-bys. [Mod : redacted the last part of this comment.]

Mitigation and no-action are decision alternatives. The meaning of ‘jumping to mitigation’ is implementing the mitigation without adequate characterization of the decision (including the goal of the policy, the metric(s) defining quality of outcome, alternative selection criteria, clarity, etc.) or consideration of other reasonable alternatives.

The goal of the policy is to avoid dangerous warming by reducing CO2 emissions. This also defines the quality of the outcome. What ‘other reasonable alternatives’ are there to decarbonisation of the energy supply?

What is the policy downside from simply getting on with this?

I don’t really understand what you are arguing for. My impression is that it distils down to more talking about uncertainty and decision making under uncertainty and further delays in getting on with the necessary infrastructural changes.

It’s not like I am doing anything different than what the experts on El Nino and ENSO are attempting. I am just solving the second-order differential equation that Allan Clarke at FSU has documented. This is a simple formulation of a wave equation that can obviously model the standing wave pattern that has been known to occur across the equatorial Pacific thermocline for thousands of years.

The question is whether the periodic forcing inputs that determine the sloshing standing wave can be deduced. I think my paper, and a very recent paper by Astudillo et al [1], are showing that this may well be deterministic behavior (read: not chaotic or stochastic).
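For readers who want to see the shape of the claim, the model being described is a damped second-order system driven by a periodic input. A minimal sketch, assuming nothing about the actual paper: every constant below is illustrative, not taken from Clarke's formulation or Pukite's fit.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative damped, periodically forced second-order system:
#   y'' + 2*zeta*w0*y' + w0^2 * y = A*sin(wf*t)
# All parameter values are made up for demonstration; none comes from
# the paper under discussion.
w0 = 2 * np.pi / 4.0    # natural period of ~4 years (illustrative)
zeta = 0.05             # light damping (illustrative)
wf = 2 * np.pi / 2.33   # QBO-like 2.33-year forcing period
A = 1.0                 # forcing amplitude (illustrative)

def rhs(t, state):
    y, v = state
    return [v, A * np.sin(wf * t) - 2 * zeta * w0 * v - w0**2 * y]

sol = solve_ivp(rhs, (0.0, 100.0), [0.0, 0.0], dense_output=True, max_step=0.05)
t = np.linspace(0.0, 100.0, 2001)
y = sol.sol(t)[0]   # bounded quasi-periodic response locked to the forcing
```

The only point of the sketch is that such a system produces a bounded quasi-periodic response tied to the forcing period; whether that is a faithful ENSO model is exactly what is in dispute in this thread.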

At this stage there are no innocents and IMO assignment of blame is a useless, nay, hindering exercise.

[…]

But, hey, if you insist on wanting to be perfect… just wait and you might be.

What does this mean? It sounds as though you are in fact apportioning more blame to one ‘camp’ – the one that has acknowledged the urgent need for decarbonisation.

This is all the more reason to immediately move toward adopting an open, inclusive, structured approach to the decision making.

Given that the other ‘camp’ is very strongly attached to doing nothing at all, I cannot see how what you appear to suggest will result in anything remotely productive. But again, perhaps I have missed something here.

BBD: It’s terribly important to argue ad nauseam over whether we have Super Really Very High Confidence or merely High Confidence as we attempt to contemplate looking at any data whatsoever let alone even consider making a decision.

Perhaps we should wait for a sign?

“Dr. Egon Spengler: Vinz, you said before you were waiting for a sign. What sign are you waiting for?

Louis: Gozer the Traveler. He will come in one of the pre-chosen forms. During the rectification of the Vuldrini, the traveler came as a large and moving Torg! Then, during the third reconciliation of the last of the McKetrick supplicants, they chose a new form for him: that of a giant Slor! Many Shuvs and Zuuls knew what it was to be roasted in the depths of the Slor that day, I can tell you!”

No, for me you are much too vague to bet the farm. Here are some questions I had when I quickly assessed your characterizations in your first paragraph. Please do not be put off by them. It reflects the essential process of trying to achieve clarity in a decision characterization. These questions are as much an exercise for me as for you.

The goal of the policy is to avoid dangerous warming by reducing CO2 emissions.

Warming where?

Reducing CO2 emissions where?

What level of warming is dangerous?
What does being ‘dangerous’ entail?

CO2 emissions where?

Do you have a metric for warming, e.g., average global temperature? What is it?

…

This also defines the quality of the outcome.

If you stay with the qualitative representation of quality, are they nominal categories of quality and how are they defined?

Wouldn’t some specified quantitative level of reduction in emissions be a more appropriate measure of quality?

Should there be more than a qualitative relationship between the quality of the emissions reduction and the amount of warming avoided?
…

What ‘other reasonable alternatives’ are there to decarbonisation of the energy supply?

Where is the list of alternative actions that have been examined, quantified, and documented?
Who is the targeted or client decision maker for that documentation?

Personally I would have no objection to starting actions for other reasons and in consideration of a risk from global warming. However, I expect its effectiveness to be continually monitored and that there is/are quantitative measure(s) for that. I expect costs to be an integral component of effectiveness.

I don’t really understand what you are arguing for.

I can see that and consider it to be a legitimate comment on your part. If I cannot resolve that, then you of course can/will dismiss it or turn it over every once in a while. Certainly it has been productive for me and I appreciate it.

My impression is that it distils down to more talking about uncertainty and decision making under uncertainty and further delays in getting on with the necessary infrastructural changes.

It would be easy to dismiss that as an impression, i.e., subjective. However, there is some very legitimate concern here. Such a process does not have to go off the rails and become an impediment, but we both know that they do. I cannot fix that and the risk is not insignificant. However, the risk from politically muddling our way through the problem is also not to be neglected. As mt and aTTP point out, a 20 year delay has complicated matters.

You showed up saying “What readers curious about science should gravitate toward is reading a site that contains independent research such as at http://ContextEarth.com“, but did not mention that you are its author. So you can forgive my surprise that you are merely expressing enthusiasm for your own work.

You have left yourself complete freedom to determine a forcing function.

So you conclude that with **a given forcing** an arbitrary model with the right time constants can be driven close to an observed record. As far as I can tell you obtained that forcing from trying to match the observations with your presumed model.

Effectively you have a very large number of degrees of freedom. So you’ve told us nothing that can’t be gleaned from first principles.

A car whose wheels are badly out of alignment can be driven up a twisty road if you steer carefully enough.

Further, your concluding claim that your model is “in contrast to those depending on erratic or stochastic inputs which have less predictive power” is unjustified, as your forcing function is exactly that.
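The degrees-of-freedom objection can be illustrated directly: if the forcing is left free, the forcing that makes a linear second-order model reproduce any chosen target can simply be read off from the target itself. A toy demonstration; the constants and the "observed record" are arbitrary inventions, not ENSO data:

```python
import numpy as np
from scipy.integrate import solve_ivp

# With a free forcing function, a linear second-order model can reproduce
# essentially ANY target series: define f := y'' + b*y' + k*y from the
# target itself. Constants and the target below are arbitrary.
dt, b, k = 0.05, 0.2, 1.5
t = np.arange(0.0, 50.0, dt)
target = np.sin(0.3 * t) + 0.5 * np.sin(1.1 * t + 0.4)  # arbitrary "record"

d1 = np.gradient(target, dt)          # numerical y'
d2 = np.gradient(d1, dt)              # numerical y''
forcing = d2 + b * d1 + k * target    # forcing recovered FROM the target

# Independent check: integrate the model under that forcing and compare.
def rhs(tt, s):
    f = np.interp(tt, t, forcing)
    return [s[1], f - b * s[1] - k * s[0]]

sol = solve_ivp(rhs, (t[0], t[-1]), [target[0], d1[0]],
                t_eval=t, rtol=1e-8, atol=1e-10)
err = np.max(np.abs(sol.y[0] - target))  # small by construction
```

By construction the mismatch is tiny, which is the sense in which a freely chosen forcing function tells us nothing the target record did not already contain.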

“Note that I really don’t care that the math has only been previously applied to tanks built on a human scale. Just because we are dealing with a “tank” the size of the Pacific Ocean doesn’t change the underlying math and physics”

is not true, because of the Coriolis effect.

Perturbations in the thermocline can propagate westward far more easily than eastward – for a given stratification there is only one eastbound Kelvin mode. So the idea that “sloshing” is the correct model without accounting for the actual propagation of trapped balanced equatorial modes is incorrect.

MWG re “CO2 emissions where?” can you possibly be serious? The answer, obviously, is “on earth”.

CO2’s lifetime is much longer than the mixing time constants of the atmosphere. CO2 is well mixed; tropospheric CO2 concentrations are globally near enough uniform that the Mauna Loa time series is good enough for policy purposes.

If you are really interested in this problem and you didn’t understand this, you clearly should hang around where the people who are talking know what they are talking about.

Yes I am serious. Think of all of my questions, and in particular those relating to who is the decision maker or decision-makers. A big cause of poor decisions is not formulating the decision correctly. My approach is from the perspective of the clarity test. You missed bigtime in your response, mt–in a number of ways.

mwgrant,
I’m equally incredulous. Here’s something that maybe people don’t realise and is – I think – what MT is getting at. We’re clearly changing our climate, we’re doing it quite rapidly relative to most – if not all – previous epochs of climate change, and the change could be substantial (globally the difference between a glacial and an inter-glacial is about 5°C). Is it going to be bad? That depends on what we actually do, but if we carry on as we are, then I think the consequences will be severe. However, maybe they won’t be. Maybe we’ll be lucky, and the changes will all just end up somehow working in a way that’s a net benefit (I don’t think they will, but let’s – for argument’s sake – say they could be). However, what we’re doing is irreversible on human timescales. If the majority of scientists are right, and the minority of naysayers are not, we can’t go back in time and make a different set of decisions.

Just Asking Questions is boring, more so when it has been predicted on an earlier thread. Their relevance is begged, and the commitment is shifted onto the interlocutor. Take these:

What level of warming is dangerous?
What does being ‘dangerous’ entail?

The first one presumes that unless we can discover the very exact threshold between dangerous and not-dangerous, we can’t decide to stop dumping CO2 into the atmosphere like there’s no tomorrow. The second one presumes that we need to give a definite description of the word “dangerous,” when such words don’t work that way and when what matters are the risks associated with the impacts of AGW.

I don’t see any reason to believe that we need to identify a specific level of dangerous warming, nor do I see any reason to characterize dangerosity more precisely than some risk of adverse or harmful impacts. RTFR.

***

For memory’s sake, here’s a previous episode of leading questions at Judy’s:

> This implies that the mainstream messages are not alarmist.

Not at all. It only implies that those who make claims have the onus to show their evidence for it.

If the claim is that mainstream climate science is alarmist, then the onus is on the claimant to show that it is indeed alarmist.

The philosophical burden of proof or onus (probandi) is the obligation on a party in an epistemic dispute to provide sufficient warrant for their position.

Leading questions have a tendency to push the limits of justified disingenuousness. Unless mwg can come forward and claim things on his own instead of offering clarity tests, I would not bet the farm on this other ClimateBall episode.

MWG: “As mt and aTTP point out a 20 year delay has complicated matters.”

Actually a 20 year delay has made matters **more difficult** but it hasn’t made them more complex. It has **greatly simplified matters**. We must reduce net emissions to near or below zero as quickly as is practicable without major social disruption.

We’ve long since missed the point where there is much to put in the other side of the balance.

Also I don’t understand where I missed a “clarity test” above. I may have been a bit impatient but I don’t see how I was unclear.

“Please do not be put off by them. It reflects the essential process of trying to achieve clarity in a decision characterization. ”

Clarity is a device used in decision analysis to assist in defining the elements of a decision. Look at the clarity test.

The twist here is that that which has followed my comment only reinforces the lack of clarity [in the decision sense] in your own characterizations. Perhaps this has contributed to the difficulty in your communication efforts over the years.

Your responses–unfortunately not a surprise–are duly noted… both in content and tone.

My apologies for being short there. The point is I have presented a perspective. It is clearly material with which you [all] are apparently not very familiar. I am not trying to convince anyone of adopting that perspective. I could really care less. If that has not been clear before I hope it is now. So people can go poke around on the internet or they can forget it. You guys are incredulous :O)

mwgrant,
Yes, I think we realise that you’ve been presenting a perspective. The responses have been based on the perspective you’ve presented. You may have a perspective, but the scientific evidence suggests that continuing to increase our emissions will have severe consequences and that these will likely be irreversible on human timescales. That, in itself, doesn’t tell us what to do, but does suggest that continuing to wait before we make concrete steps to reduce our emissions and – as MT points out – eventually getting them to zero, may be a strategy that carries a great deal of risk.

Your behavior is standard troll. You delight in JAQing (Just Asking Questions) off all over the place. You BS around, avoid anything vaguely meaningful, argue every point till you’re blue in the face. Really… you have yet to say anything meaningful after two days. We’re all quite used to that too.

anoilman, as far as Easterbrook goes and your assertion based on Easterbrook,

“…Climate models are the best and most thoroughly reviewed software out there. There is nothing like it.”

is undercut, in terms of the number of codes considered and in terms of the metric, by the following quotes from Easterbrook:

Many of the limitations to the present study could be overcome with more detailed and controlled replications. Most significantly, a larger sample size both of climate models and comparator projects would lend to the credibility of our defect density and fault analysis results.

A little later in the same section the authors quote ‘Hatton (1995)’:

“There is no shortage of things to measure, but there is a dire shortage of case histories which provide useful correlations. What is reasonably well established, however, is that there is no single metric which is continuously and monotonically related to various useful measures of software quality…”
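For concreteness, the headline number in such defect studies is defect density: defects found per thousand lines of code (KLOC). A sketch with invented figures; these are not the numbers from the study quoted above:

```python
# Hypothetical projects: (name, defects found, lines of code).
# All figures are invented for illustration; they are NOT taken from
# the defect-density study quoted in this thread.
projects = [
    ("model_a", 12, 400_000),
    ("model_b", 90, 150_000),
    ("model_c", 7, 25_000),
]

# Defect density = defects per KLOC (thousand lines of code)
densities = {name: defects / (loc / 1000) for name, defects, loc in projects}
```

Hatton’s caveat applies directly: a low density can reflect excellent code or merely weak defect detection, which is why no single such number tracks software quality monotonically.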

As far as the subthread evolving from around the 8/11/15 4:19pm post, neither of the two links that you provide has any bearing on the topic. The link, Willard’s climbitbull contrarian matrix, has nothing to do with anything, either the GCMs or the decision-making aspects touched on. If my comments are those of a troll then I guess you folks over here could use more trolls, or maybe get a different operating manual… something slipped thru on the QA.

If my comments are those of a troll then I guess you folks over here could use more trolls or maybe get a different operating manual…something slipped thru on the QA.

You and AoM have been going at each other a bit, so I decided to leave AoM’s troll remark. Maybe I shouldn’t have, but I did. I don’t particularly like people being called trolls, as I don’t hugely like it when it happens to me. Maybe we can all just tone this down.

> What is reasonably well established, however, is that there is no single metric which is continuously and monotonically related to various useful measures of software quality…”

Cue to more clarity testing from our guest wizard regarding which single metric we should have to evaluate code quality. Because that Howard dude, cited by the relevant Wiki entry, is just the formal guy we need:

> The link, Willard’s, climbitbull contrarian matrix has nothing to do with any thing either the GCMs or decision making aspects touched on.

It actually had something to do with it when you tried to make it about me, mwg. Do you want me to trace back how the ClimateBall exchange evolved, or are you able to recall the concerns you’re raising from one comment to the next?

It would be suboptimal to conflate ClimateBall and the Contrarian Matrix in the middle of clarity testing, BTW.

And you made it about me. Willard–the contrarian matrix pertains to people and not to the other content–which you are free to dispute. Let it rest.

Clarity test. Why the single metric reference? Read the thread, W. It had been asserted earlier in the thread that defect density is the metric used for code quality. Here the authors comment that there is no single satisfactory metric. I shot the beast twice–out in the open. Pretty good, huh?

the beast shot — how about the first quote: “…Climate models are the best and most thoroughly reviewed software out there. There is nothing like it.” Don’t worry about the Strawman, Liebchen, it was only a bad dream you were having. Nothing here has been damaged! Woo-hoo!

I find his views on V&V a bit silly. But then we have different standards coming from different communities.

“The Computation of Vulnerable Area Tool (COVART) model predicts the ballistic vulnerability of vehicles (fixed-wing, rotary-wing, and ground targets), given ballistic penetrator impact. Each penetrator is evaluated along each shotline (line-of-sight path through the target). Whenever a critical component is struck by the penetrator, the probability that the component is defeated is computed using user-defined conditional probability of component dysfunction given a hit (Pcd/h) data. COVART evaluates the vulnerable areas of components, sets of components, systems, and the total vehicle. In its simplest form, vulnerable area is the product of the presented area of the component and the Pcd/h data. The total target vulnerable area is determined from the combined component vulnerable areas based upon various target damage definitions.

This model is really cool. One problem (back in the day) is that we had NO DATA to calibrate it. One time we took a $50 million plane into the desert to shoot rounds at it and then try to see if that data was “consistent with” the model. Validation was hard, but you just did the best you could and documented it.

Engineers, program managers, government buyers all knew that it was just a tool for evaluating two planes that had not been built. It was the best tool we had. We all submitted to the process. There was no crying about bad models. If you wanted to, you worked on improving the standard and then submitted your changes.

“ALARM is a generic digital computer simulation designed to evaluate the performance of a ground-based radar system attempting to detect low-altitude aircraft. The purpose of ALARM is to provide a radar analyst with a software simulation tool to evaluate the detection performance of a ground-based radar system against the target of interest in a realistic environment. The model can simulate pulsed/Moving Target Indicator (MTI), and Pulse Doppler (PD) type radar systems and has a limited capability to model Continuous Wave (CW) radar. Radar detection calculations are based on the Signal-to-Noise (S/N) radar range equations commonly used in radar analysis. ALARM has four simulation modes: Flight Path Analysis (FPA) mode, Horizontal Detection Contour (HDC) mode, Vertical Coverage Envelope (VCE) mode, and Vertical Detection Contour (VDC) mode. – See more at: https://www.dsiac.org/resources/models_and_tools#sthash.pm6IDGIA.dpuf”

Tactical Air Combat Simulation BRAWLER simulates air-to-air combat between multiple flights of aircraft in both the visual and beyond-visual-range (BVR) arenas. This simulation of flight-vs.-flight air combat is considered to render realistic behaviors by Air Force pilots. BRAWLER incorporates value-driven and information-oriented principles in its structure to provide a Monte Carlo, event-driven simulation of air combat between multiple flights of aircraft with real-world stochastic features. – See more at: https://www.dsiac.org/resources/models_and_tools#sthash.pm6IDGIA.dpuf

TAC BRAWLER wasn’t very good. So there was a process of improving it.

We could submit improvements. If they passed muster then they got included in the standard model.

Of course BRAWLER was never “validated” because to do that you’d have to fight a war and shoot down real planes and people. The closest we came was comparing BRAWLER to man-in-the-loop flight simulation wars. All very messy, all very notional.

But there was a process.
And there was agreement to use the tool.

Research guys could go off and do anything they wanted to come up with better models. But if you were going to tell a “decider” or buyer some results, those results had to come from the standard model.

You should probably look at how MODTRAN is maintained and used. That would be another example.

The point is that you had a very messy and very uncertain problem space. That was addressed by having a standard model. Everyone knew the limitations of the beast. And people would try to get their improvements included in the standard model. Or people tried to get their models accepted.

> how about the first quote: “…Climate models are the best and most thoroughly reviewed software out there.

That there ain’t no ultimate metric does not substantiate that point. There’s no need to review SteveE’s work to [see] that Oily One relied on an hyperbole: look for “hyperbole” on this page. Recalling upfront your own experience with code standards of practice within your own community may have been more expedient than sealioning Oily One.

***

> But then we have different standards coming from different communities.

Exactly. The next time Gavin has climate models that run the risk of causing a nuclear winter, it is to be hoped that he’ll at least validate for safety.

Since GCMs are mostly used for climate projections, I’m not sure exactly which formal properties need to be specified. Models that approximate physical laws may be quite complex. If I read you right, the question of the GCMs’ intended use is more important, and seems more like a verification requirement anyway. Seen under that light, SteveE’s work might be invoked to substantiate the claim that they’re more than good enough for the job. It’s possible to agree to disagree on such matter. The argument oscillates between pragmatic and formal considerations, which makes the whole discussion hard to arbitrate.

In any case, V&V for GCMs implies we decide to invest even more money than we already do in that field, an investment decision which may require its own clarity testing.

No. There was and is no need for the task at hand. I didn’t criticize the code. At one point I even indicated that

“…the important question is are they good enough for the intended application? Frankly in my opinion the answer is ‘yes’. The big problem lies with what is the intended use? …
I did read enough in your referenced work alone to evaluate your ‘hyperbole’, as Willard generously calls it. (I do not.)

I also commented on how QA is handled elsewhere and suggested that there might be benefits to that. This is not a criticism–frankly it is a reasonable idea looking at other approaches in contentious environments. I am continuously amazed at the insecurity some of you folks have about the codes used.

MWGrant: You are trying to have a discussion with people (a limited few, not all posting on aTTP) who have no capacity to listen and consider. The psychological defect has the same taproot of teabaggers who hate Mexicans and want to arrest/deport/fence in all brown people. Anyone presenting any hint of “the other” is to be attacked. When commentators spew the “T” word, they are looking in the mirror. In one sense, you are playing the fool for their idle amusement, so you have earned this abortion of a thread. Welcome to climateball.

The question if teh modulz are good enough for the intended application can only be settled by looking into the code. The same obviously applies to SQA issues. ISO standards ain’t cheap.

Speaking of “good enough”:

In Winnicott’s writing, the “False Self” is a defence, a kind of mask of behaviour that complies with others’ expectations. Winnicott thought that in health, a False Self was what allowed one to present a “polite and mannered attitude” in public.

But he saw more serious emotional problems in patients who seemed unable to feel spontaneous, alive or real to themselves anywhere, in any part of their lives, yet managed to put on a successful “show of being real.” Such patients suffered inwardly from a sense of being empty, dead or “phoney.”

Winnicott thought that this more extreme kind of False Self began to develop in infancy, as a defence against an environment that felt unsafe or overwhelming because of a lack of reasonably attuned caregiving. He thought that parents did not need to be perfectly attuned, but just “ordinarily devoted” or “good enough” to protect the baby from often experiencing overwhelming extremes of discomfort and distress, emotional or physical. But babies who lack this kind of external protection, Winnicott thought, had to do their best with their own crude defences.

In fairness, we must admit that Donald has not presented a concept of good enough that would satisfy a clairvoyant who’d know the future and be able to tell if an artifact or an agent is good enough in an operational manner.

Howard (Comment #137916)
July 30th, 2015 at 5:28 pm
Joshua is the best at getting people to pick nits out of scabs. I saw it on the SkS top secret blog that they send out these type of disruption bloggers to lukewarmist blogs to tie up the time and effort of skeptics thereby preventing real investigation of the commie plot to tax carbon in ice cream… childrens ice cream.

Let’s just say this was a test of myself in a different environment. I went in with eyes open with hip-waders on and am satisfied. There was some good pushing in there. To me there is no harm in being their fool because climatebawl is their game. It is not mine. I can stay or I can walk away. As a friend once told me, they can’t chop me up and eat me. :O)

But hey, it is an away game … I guess.

BTW I knew where I likely was after your comment on conceptual models.

> You have left yourself complete freedom to determine a forcing function.

Obviously you have not read much of what I have written. There is no flexibility in applying the QBO as a forcing function. This has a long-term mean period of 2.33 years and it cannot be changed.

Everyone knows that QBO and ENSO are closely tied together.

So I applied that as a forcing function in Allan Clarke’s differential equation [1].

I hope that you can see that I am not making things up.

All I am doing is applying the components that climate scientists have shown to be factors.

Why they do not follow through on their own suggestions, I haven’t a clue.

Why you assert that what I am doing is wrong, I can easily guess. Probably because you have set yourself up as a gatekeeper. “You ignored Coriolis!” “You ignored east-west asymmetry!” You are convincing yourself that it can’t be right and using “just so” stories to seed doubt in other people’s minds that what I am doing just has to be wrong.

“I suspect many people use hyperbole here from time to time to telegraph a thought or concept efficiently. That is how I took ‘kill the economy’ and do not take Steven to task for that.”

1.) Context and usage, Willard, also impact how any of us interpret writing or speech. One case uses a very common idiom, e.g., “It’s not going to kill you to eat your broccoli, Willard.” The other case is not hyperbole: “…Climate models are the best and most thoroughly reviewed software out there. There is nothing like it.” Walk it back to TE’s question, Willard. Hyperbole in AoM’s answer does not make sense.

Something else you missed too. aTTP came by, poured oil on the water, and AoM and I moved on. Yeah, he visited, but why not? You stirred the pot. It’s your problem. AoM makes comments, does some good pushing, same for BBD, we go on to other things. Not you. You perpetually chase blog dustbunnies.

Keep looking, though, and I am sure you can find someplace where I am inconsistent or hypocritical, but it does not matter. You’re the Vulcan wannabe I’ve seen here. The rest of us have realistic expectations of others and ourselves.

anoilman. Again no need to, but why do you ask? I went and looked at CESM, but that site material does not seem applicable for a QA look. I’ll definitely look further, and I did sign up. Here is the reply I wrote to Eli regarding CESM

Eli Rabett wrote:

perhaps mw should go find out about how the community earth system model has been put together.

CESM 1.2 User Manual writes:

CESM Validation

Although CESM can be run out-of-the-box for a variety of resolutions, component combinations, and machines, MOST combinations of component sets, resolutions, and machines have not undergone rigorous scientific climate validation. … Users should carry out their own validations on any platform prior to doing scientific runs or scientific analysis and documentation. [CESM ucase]

Well, that was short and sweet. Looking at CESM will not be that useful from a QA perspective. That is not why it has been made available anyway. (Eli doesn’t seem tuned to this thread. Guess that was the case.) Model E documentation is better. Really not a surprise.

“The question of whether teh modulz are good enough for the intended application can only be settled by looking into the code. The same obviously applies to SQA issues. ISO standards ain’t cheap.”

Not true.

The classic case would be RCS code. To look at RCS code you needed a level 4 TS/SAR security clearance, but the people who decided it was GOOD ENOUGH never had to look at the code. What they compared was the code output and real live range test data.

Of course it does. Nobody in his right mind would presume that Oily One looked at every single piece of code on the planet, evaluated them all on a standardized benchmark, and then concluded that climate code came on top. To interpret Oily One as saying that teh modulz are quite good enough at what they do is way more charitable.

Here’s Turbulent One’s question, BTW:

How much of that is a reflection of natural variability and how much is artificial variance introduced by lines of FORTRAN?

Pray tell more about how a conceptual model would help answer Turbulent’s question without overseeing the code, mgw. Share the love.

***

> What they compared was the code output and real live range test data.

Fair enough, but that’s just one of the Vs, and it’s not the one that would answer Turbulent One’s question.

One does not simply validate formal properties by running the code in Mordor.

AoM was not looking at any code. He was doing something reasonable: he was recalling Easterbrook’s video/paper.

As for TE’s question, the only relevant aspect here is that it was a specific question about the code. Details of that content are secondary. It was a legitimate, serious question, and in the response – good answer or bad answer – hyperbole in that context would not make sense.

I let it go. You, sir, may now have the field. Throw daisies and run under them to your heart’s content.

In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.

Perhaps mgw’s experience in managing projects could help here in bringing more anecdata.

If these numbers are correct, and assuming that modulz are created without spending any money either on docs or SQA, we’re looking at a substantial investment, an investment we might need to “clarity test” beforehand.

Of course he wasn’t, mgw. However, that’s what he does for a living, and my guess is that he wants to know if you’re ready to scratch your own itch. That question might also help him recognize if the concern you made comes from a coder or from a project manager.

***

> hyperbole in that context would not make sense.

Let’s quote Oily One’s remark again:

Turbulent Eddie: Nice try. Its not the FORTRAN. Climate models are the best and most thoroughly reviewed software out there. There is nothing like it.

There are lots of expressions with the word “best” that work as hyperbole: you’re simply the best, that’s the best damn wine I have ever tasted, Judy’s the best ClimateBall blog there is these days, etc. The idiom “there is nothing like it” appears mostly in hyperbolic stances, for the obvious reason that one does not simply compare everything together before inserting a meliorative into one sentence on a blog. The same applies to the “most thoroughly reviewed” bit, since the word “most” cannot realistically be interpreted as a universal and objective claim in the context of comparing code quality. Moreover, the expression “most thoroughly” can be read as another way to say “quite thoroughly.”

On the one hand, it’s not obvious to me that it’s possible to write a spec for a GCM the way one would do for an online store or a bank.

On the other, in principle it should be feasible to write tests for each of the smaller modules. Indeed, going back to ATTP’s original point way back when, there are lots of physical constraints at play and these can be treated very much like software predicates. (This is greatly complicated by the imprecision of floating point numbers. One would really need a higher level of coding to do this effectively. It’s something I’ve thought about.) To my knowledge this sort of thing is not systematically done, and there is no formal test suite of the sort common in transaction processing, web services, etc.
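A minimal sketch of what treating a physical constraint as a software predicate could look like, assuming a toy 1-D conservative update (not any GCM's actual code); the tolerance handling is the point, given the floating-point imprecision mentioned above:

```python
import math

def advect_mass(density, flux, dt):
    """One conservative finite-volume update on a 1-D periodic grid.
    flux[i] is the flux across the interface between cell i-1 and cell i,
    so each cell loses exactly what its neighbour gains."""
    n = len(density)
    return [density[i] - dt * (flux[(i + 1) % n] - flux[i]) for i in range(n)]

def total_mass(density):
    # fsum keeps round-off in the diagnostic itself to a minimum
    return math.fsum(density)

def conserves_mass(density, flux, dt, rel_tol=1e-12):
    """The physical predicate: a conservative step must leave total mass
    unchanged -- but only up to floating-point round-off, hence the
    tolerance rather than an exact equality check."""
    return math.isclose(total_mass(density),
                        total_mass(advect_mass(density, flux, dt)),
                        rel_tol=rel_tol, abs_tol=1e-12)
```

A test suite for a module would then assert `conserves_mass(...)` over a battery of inputs, and an update that leaks mass, however slightly, would fail the predicate.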

I should add the caveat that I’m not entirely privy to the process at CCSM and am merely speculating about all the others. I could be wrong. It’s not without precedent.

I think the validation phase is sufficiently robust for the fluid dynamics core and the radiation core that they can be excused from this sort of test. But all the other details? There are probably still small bugs lurking there that a more formal development process could expose.

However, there is a certain open-sourciness to it in that researchers do go over the bits of code closest to their own work and so the many-eyeballs approach does operate.

I think it’s amazing that the models work as well as they do. On the other hand, one could argue that, as a sort of curve-fitting argument would have it, the more complicated they get, the less amazing it is. I continue to maintain that there’s a space of dynamic models of intermediate complexity that ought to be more fully explored.

You want an external/public review of climate models. (For reasons unknown, other than that you think it will generate some sort of results you consider relevant.)
Said external/public audit will generate new metrics. (… of your as yet undisclosed choosing.)
You haven’t been able to find what you consider to be design documents. I’m not disagreeing, but you may have to ask. (I’ll wager you’ll be disappointed. My bet is that the gritty bits will still require familiarity with a branch of science and/or the relevant scientific papers.)
You haven’t contacted anyone who’s reviewed or looked at the code.

The model space I am interested in is the minimally parameterized GCM, possibly but not necessarily run at low resolution. This is between the EMIC and the cutting-edge GCM discussed in AR4. We could profitably do something like what WHT has pointlessly done with ENSO, because unlike his, our models have actual physics in them and actual predictive success. But the more free parameters there are, the less meaningfully we can tune them, and simultaneously the more likely we are to fall into the trap of overtuning.

But I didn’t run my career well enough to get to attempt this. Somebody else ought to try it.

As per your previous comment, “However, there is a certain open-sourciness to it in that researchers do go over the bits of code closest to their own work and so the many-eyeballs approach does operate.”

”
We could profitably do something like what WHT has pointlessly done with ENSO, because unlike his, our models have actual physics in them and actual predictive success.
“

More physics in my simplified model than anything else.

ENSO is more geophysics than it is climate science. What is happening with the sloshing model is a first-order approximation of a body of fluid’s response to angular momentum changes in the forcing.

As far as predictive success, predictions are not the defining aspect of physics. Physics is about understanding first. There are any number of ways that one can use the historical data to test this understanding.

As an example, take the case of tides. Is it that important that we need predictive success to establish that the basic theory of tides is correct? Or can we go back in time and look at historical data to establish what is happening?

Eventually, the cycles of ENSO will be understood to the extent that they are as well accepted as tidal cycles.

BTW, ENSO predictions currently are very short-term. If I were to actually work on predictions, all I would need to show is that mine are better than the others. So if the current predictions are only good to 6 months out, and I had something that could predict 2 years out, my model would win. The reason that I am concentrating on understanding over the whole range of ENSO data available is that I am doing science first, not attempting to start a climate forecasting business 🙂 In other science disciplines, such as materials science and condensed matter physics, we refer to this process as characterization.

Read the text if you don’t understand the issue (it’s related to leapfrog).

basically the kind of thing that mwgrant and I are asking for would be this.

A) a STANDARD set of diagnostics that have to be performed (figure 9 shows you one)
B) a pass/no-pass criterion
C) publication of the diagnostics

Let me give you an example from AR4.

If you read AR4 on attribution (I think it was in the supplementary material, so you will have to look), you will find that NOT ALL models were used in attribution.

Model results were submitted and then a subset of models were selected. Those with a low drift
in control runs.

Well, what you actually want to do is specify an acceptable drift BEFORE model results can be submitted. In short, if a model had a 5% drift it could be used in any of the other charts, but for attribution they realized that too much drift could look like a “natural” trend.

The point being, the standard of low drift should be applied across the board. Otherwise you end up mixing shitty models with good models. If you are doing forecasts you don’t want drift in a control run.

So some basics. You specify allowable drift in control runs. You preclude submissions that violate that. People are driven to improve that aspect. You get better science. You publish the spec and you publish a scorecard. That builds discipline and trust.

The spec doesn’t have to be hard to meet! And over time you add elements.
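The gate being described is mechanically simple; a sketch, where the 0.05-per-century threshold is an invented illustrative number, not any actual submission criterion:

```python
def drift_per_century(series, dt_years=1.0):
    """Least-squares linear trend of a control-run series, expressed
    per 100 years.  Plain stdlib; no external fitting library needed."""
    n = len(series)
    t = [i * dt_years for i in range(n)]
    tbar = sum(t) / n
    ybar = sum(series) / n
    slope = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, series))
             / sum((ti - tbar) ** 2 for ti in t))
    return slope * 100.0

def passes_spec(series, max_abs_drift=0.05):
    """Pass/no-pass gate: absolute control-run drift must not exceed the
    published spec (0.05 per century here is arbitrary, for illustration)."""
    return abs(drift_per_century(series)) <= max_abs_drift
```

Publishing both `max_abs_drift` and each submission's `drift_per_century` value would be the spec plus the scorecard in miniature.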

1. There is no evidence of difficulties with licences.
2. If there are, it is easily handled by journals and the IPCC. If you want to submit results, you have to have a GNU licence (or pick Creative Commons).
3. It is common even for companies to be forced to open their code in order to do business with competitors. Your code is basically given to everyone else. If stupid business people can figure this out, it can’t be rocket science. Stop underestimating scientists.

“Lastly, the results are competitively reviewed all over the place.”

1. Actually, no.
2. A standard set of benchmarks is lacking. The best I’ve seen is Taylor diagrams on a couple of parameters.
3. Competition, real competition, should reduce the spread. I’m not seeing evidence of that.
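For what it's worth, the statistics a Taylor diagram summarizes are cheap to compute; a sketch (equal-weight points are assumed here, whereas real gridded comparisons would area-weight):

```python
import math

def taylor_stats(model, obs):
    """The three quantities a Taylor diagram summarizes for a model
    series against observations: correlation, ratio of standard
    deviations, and centered RMS difference."""
    n = len(model)
    mbar = sum(model) / n
    obar = sum(obs) / n
    ma = [m - mbar for m in model]   # model anomalies
    oa = [o - obar for o in obs]     # observed anomalies
    sm = math.sqrt(sum(x * x for x in ma) / n)
    so = math.sqrt(sum(x * x for x in oa) / n)
    corr = sum(x * y for x, y in zip(ma, oa)) / (n * sm * so)
    crmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, oa)) / n)
    return {"corr": corr, "std_ratio": sm / so, "crmse": crmse}
```

A standard benchmark suite would publish these numbers per model per field, which is exactly the kind of scorecard being asked for.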

MT,
If I understand what you’re suggesting, then I largely agree. I, personally, am always much more interested in models with physics that requires little tuning and in which one can understand the results in terms of the physics that was included. When a model becomes extremely complex, it is much more difficult to associate the results with the physics you know you’ve included, and over-tuning does indeed become an issue.

Steven,
I don’t know enough about the details to have a strong view, but what you suggest seems reasonable. It’s certainly pretty standard to have at least passed some standard tests before using a model to solve a certain class of problems.

FWIW, ESMs and GCMs are not production software, but constantly evolving experimental testbeds.

Yes, a good point. These aren’t pieces of production software that we validate in some way and then fix. They really are – as you say – experimental testbeds that are used to try to probe how our climate responds under different circumstances.

GCMs are ALSO community service platforms with a variety of use cases. They don’t have to be as informal as they are on the developer side or as painful as they are on the end user side.

I think where our friends the lukewarmers and naysayers of various stripes come in is to say that this is applied science, not pure science, so everything has to be multiply documented and so on like medical research. This wrong idea of the role of climate models is not entirely their fault – press offices and to some extent program managers are guilty of promoting it, without really understanding how much more difficult it would make life if that were real.

If at some point geoengineering becomes a real prospect, the pressures on climate modeling will be immense, though. Let’s hope it doesn’t come to that, for more important reasons, but really, would it hurt to do a ground-up design using as many principles from the commercial software world as possible?

The culture gaps are immense – scientists don’t know how vast the skill set engineers have to offer is, and vice versa. Both accordingly tend to arrogance, which we see constantly on the blogs.

What’s more, I don’t know how it is overseas, but interdisciplinary collaborations in America tend to fall woefully flat even within the sciences. I can’t imagine how it would work with [engineering researchers as well as science researchers].

A climate model is very different even from most other scientific software (since it is seeking to characterize the phase space of the system and not its trajectory). Its distance from what commercial software houses normally do is even more vast. So there’s no cookbook approach. Teams will need to combine deep understanding of what the software principles are and what the models do. That’s not trivial by any means.

But engineering software is progressing in spectacular ways, and climate modeling is still plodding. If there’s really benefit to be had from significant progress in GCMs, it may be time to start with a nearly blank slate (aside from the well-tested low-level math libraries, BLAS etc. and maybe the radiative transfer code) insofar as the code base goes.

I suggest that the FLASH effort in astrophysics refutes Oreskes et al. 1993’s claim that “In its application to models of natural systems, the term verification is highly misleading. It suggests a demonstration of proof that is simply not accessible.”

MT,
I’m not sure I quite agree. I think they’re verifying the numerical scheme by doing some standard tests. I don’t think they’re verifying it in terms of its application to a specific physical problem. FLASH is one of the standard schemes in astrophysics, but is used for a wide range of different applications. I assume that GCMs have at least tested their numerical schemes using standard test problems. I think that’s different to verifying that the code correctly represents the complex system that it’s trying to model. It is late (I’ve been trying to watch the Perseids – largely unsuccessfully) so maybe I’m confused.

“I think where our friends the lukewarmers and naysayers of various stripes come in is to say that this is applied science, not pure science, so everything has to be multiply documented and so on like medical research. This wrong idea of the role of climate models is not entirely their fault – press offices and to some extent program managers are guilty of promoting it, without really understanding how much more difficult it would make life if that were real.”

Again agreement.

my modest proposal is this.

For policy I would suggest that the US pick one model. Actually issue an RFP for people to submit their model along with a proposed budget for documentation, tests, etc., bringing it under control much as programs like MODTRAN are under control.

That model would be standard.

Research folks can continue their research and pure science.

If they find or develop something cool, they can submit proposals to improve the standard.

We don’t really need a model “for policy” except insofar as regional models are needed for adaptation policy, which in the US would not occur at the federal level.

The mitigation horse has long since left the barn. “Zero net emissions as fast as feasible” is the only sane course; all we need to be discussing is how fast is feasible. We don’t need GCMs to tell us anything about mitigation.

I’ve often found it valuable to ask someone who knows the right stuff. They often have insight that I may not. Just because you think it might be a concern doesn’t mean it actually is a concern.

While many companies may have resolved the sharing issues, I’d say many, many more haven’t. Many companies I’ve been at have walked away from IP-sharing agreements. Part of the problem for climate models may be incomplete ownership. Furthermore, open source now supports closed binary blobs (no source shared) to satisfy businesses that don’t want to share.

I have to say that I think it should be sharable, but then you may not know what was put in way back. Some of this stuff is really, really old, and practices may have been different back then. Retrofitting software licenses can be an expensive proposition. It’s potentially a full rewrite for open source. (Luxrender is undergoing that now; it’s cost them years. But the new license will finally allow the render engine to be included in 3rd-party applications.)

The mitigation horse has long since left the barn. “Zero net emissions as fast as feasible” is the only sane course; all we need to be discussing is how fast is feasible. We don’t need GCMs to tell us anything about mitigation.

This has been repeated by mt several times and was part of ATTP’s original response to mwg. It was also my original point. This is not about the bloody models.

So the endless back-and-forth about ‘issues’ with the models serves only one purpose – to generate a miasma of doubt and uncertainty tied to the incorrect (and never withdrawn) claim by mwg that ‘the models’ have an undue influence on policy.

This is no barrier to developing a standard model
***

Money and time and slots. These sorts of things can only be done as line items in a national science agency budget. There simply is no way to get grants to do this, as the response will be “nothing new here, don’t fund”, and even at the agencies/labs it is not clear whose mission fits.

Moreover, the update cycle will kill any such effort as new methods/information comes in.

I’ve dealt with (and created) validated models and software that has on occasion been subjected to qualification tests by examinations and against standards (FDA and pharmaceutical review, for example).

But there’s a problem with those software platforms – validation and verification are expensive, and once that expense has been incurred changes are not welcome. Meaning that as the science develops, the ‘standard model’ won’t include it at anything like the rate of scientific development, simply because of the cost of V&V.

I cannot count how many times I’ve had the following conversation:

“We need a bug fix for problem XXXX – can you patch it for us?”
“XXXX was fixed two releases ago, the patch is to update to the latest version.”
“NOOOoooooo – we would have to recertify the new version! Whine whine whine… I guess we’ll live with it.”

Quite frankly, insisting on a ‘standard model’ means insisting on an outdated model, several iterations of knowledge behind the state of the art. Not a good idea at all.

1. Production software?
2. Bringing Research Codes into a Production Mode?
3. experts in GCMs?
4. Experts in bringing GCMs under production-software rules?

Experts in WHAT?

I consider this to be a problem that would require input from #2: those with experience bringing research codes into production mode. “Fork it” is not a deep concept. The question isn’t how you do it. The first newsworthy contract we won from the Air Force was to bring research simulation code into a production mode. And yes, we keep a research fork alive and thriving.

As ATTP notes, there is nothing strange, unique or unreasonable about what I propose.

The only cogent objection is mt’s objection: that GCMs are not needed for policy, and so creating a standard model would just be busy work.

Ask yourself why mt is able to come up with the best counterarguments.

A) he knows more about code than you do.
B) he knows more about GCMs than you do.
C) he is past personalizing our disagreements.

I’ll suggest you pick up mt’s argument and spend brain cycles on that.

“Quite frankly, insisting on a ‘standard model’ means insisting on an outdated model, several iterations of knowledge behind the state of the art. Not a good idea at all.”

Gosh this is so simple to solve.

The same argument is made repeatedly about standardizing models.

The procedure is simple.

You have a standard set of tests and metrics. Your standard model has a documented performance on these metrics. If someone’s research code outperforms that, any decision maker can take that into consideration. And of course you threaten the company that maintains the standard that they can be replaced.
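The decision logic being described is trivial to mechanize; a sketch, where the metric names and the higher/lower-is-better directions are invented purely for illustration:

```python
def scorecard(standard, challenger, higher_is_better):
    """Compare a challenger research code against the published standard
    model, metric by metric.  Returns which metrics the challenger wins,
    plus a simple majority verdict (one possible decision rule among many)."""
    wins = {name: (challenger[name] > std) if higher_is_better[name]
                  else (challenger[name] < std)
            for name, std in standard.items()}
    return wins, sum(wins.values()) > len(wins) / 2
```

The point is not the particular rule but that, once the metrics are published, any decision maker can run the comparison themselves.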

As a developer of research codes your dream is to get your stuff into the standard.

But let’s take your argument and flip it on its head! Let’s do a little judo on your butt.

Steven Mosher: Thanks for those questions, but it sounds like JAQing to me.

What experts?

I’d start with the people working on the models and work around from there. Does that sound reasonable? It might help you figure out if you need to find someone else to talk to, for instance. You’re the one with an opinion; who do you think would find it interesting? You pointed to drift in one model as a potential concern. Oh ma gosh! I bet the model builders know a lot more about that than you. Just saying. I mean, they measured it for a reason. Right?

It’s strange, unreasonable and unusual to present your claims/concerns to a group of people wholly unrelated to the subject at hand, while simultaneously you’ve never contacted the one group of people who might have something to say about it. Seem reasonable?

Just so you know, I’ve contacted a key scientist about material/concerns that I was interested in. I was enlightened, as was the scientist in question.

I’ve long since been warming to what mt has said, but I haven’t seen anything I consider a concern.

“With over a 30 year heritage, MODTRAN® has been extensively validated, and it serves as the community standard atmospheric band model.”

Yes, this has been validated – by use. Not by qualifying against a panel of pre-defined standards on each iteration (although I expect that’s part of their internal release process), mind you, but by demonstration of validity over time leading to broad acceptance. Community tested, not committee tested.

I’ll say it again – I’ve had the validation conversation over and over, and groups without bottomless funding _will_ be resistant to change due to the cost and time of re-validating against a standards array every time you update the physics. And all of your military examples are of models used to replicate conditions, avoiding much of the cost and risk of physical tests – they are not models used to investigate phenomena or physics what-if questions.

GCMs are constantly being validated in the same fashion as MODTRAN – tested against observations and by evaluating the fidelity of the replication of various physical processes. However, what you _appear_ to be suggesting – an external panel of V&V tests to qualify as meeting a ‘standard’ – seems IMO to be both limited (which parts of the physics need to be in the panel of tests? All of them?) and limiting (as our observations gain detail and our physical understanding changes, the standards themselves may turn out to be in error). Not to mention that different models will have different strengths, for example in the replication of ENSO dynamics – is your ‘standard’ model going to require a super-set of capabilities?

“You have a standard set of tests and metrics. your standard model has a documented performance on these metrics.” – This only works if you’ve pre-defined the results possible from the GCM – or if you’re not testing the portions that can reveal new relationships. In which case you’re not going to learn much new from standardized models. The kind of testing you’re proposing would certainly prevent ‘doh’ errors, but it wouldn’t and couldn’t test how well the models generate new information.

Far better, IMO, to put the GCM results out in as much detail as possible, as in the CMIP runs, and evaluate model veracity accordingly.

No. Climate modeling is about science, science which is not relevant to mitigation policy. It is potentially relevant to adaptation policy and to remediation (geoengineering) policy, but it remains speculative whether such an advance is possible.

“Anything that could POSSIBLY cause a bump in the road to policy has to be objected to.”

That at least is a good question, especially given how successfully bumps have been wilfully constructed over the past two decades. I think, though, it is fair to say that a given question is not relevant to mitigation policy. Attacks on climate modeling are a red herring. We should strictly separate the very interesting and scientifically important questions which are of interest to a scientifically curious audience from the crucial policy questions which affect everyone and which any citizen should be aware of. The relevance of models to the global mitigation question is arguably negligible within the policy time scale window. I believe that, and do so argue. It’s a constructed speed bump, not a real issue.

“No other principles need apply. ever. the planet is stake.”

I make no such claim. I frequently remind people that there are multiple existential threats and we have to respond effectively to all of them. Sacrificing principle to solve one problem could easily prevent progress on another. So putting these words in my mouth is unacceptable to me.

“1. So we defend data hiding”

As far as I know, the only instance of “data hiding” that arguably comes from real scientists was an informal communication, the cover of a WMO report, in which an inconvenient part of Briffa’s data was willfully truncated. I have never defended this. But I object to any characterization of habitual data hiding, or of myself as defending it.

“2. So we defend opaque processes.”

Not me.

I am an enthusiast for restructuring science, open science, reproducibility, and finding some way to filter nonsense other than credentialism. I’m absolutely and adamantly opposed to the current for-profit journal-as-gatekeeper system.

Climatology is no exception among the sciences in having this problem. (Computer science has long been taking the lead in overcoming it.) Attacks on climate science in particular under this rubric make enemies for the open science movement.

“3. so we defend inferior models”

In my defense I offer pretty much everything I’ve written on this thread, so no.

“4. so we refuse to compromise on anything”

Well, I’m not sure what that’s about. Regarding the topics of this thread, where we aren’t discussing policy but merely the science that informs policy, I’ll admit that I won’t compromise what I see to be true with half-truth.

” oh ya we dont have the money to do it right”

Well, after decades of overstated and sometimes baseless attacks, largely driven by policy and not science, climate science is not especially popular in congress.

Steven,
I must admit you did seem to imply that people here who had made arguments about the policy relevance of GCMs were also doing the various things that you mention in your comment. I don’t think that that is true. I’ll add a comment about defending data hiding, opaque processes, etc. I’m with MT in this regard; everything should be as open as is possible (there are reasonable exceptions). However, that doesn’t mean that we should all be condemning and demonising all those who may not have lived up to these ideals in the past.

Mt, not everyone engages in every one of the “look the other way” types of behaviors.

but I think in general, if one cares about the policy, there will come a time and place where you bend other principles. Some people make excuses on transparency, some people make excuses on model testing, maybe some people abandon their normal forgiving natures.

some people are just silent whereas they normally would object.

It’s those points of friction that I find really interesting. I find it interesting when people talk about giving ammunition to skeptics.

For me the friction is with my libertarian politics.

I can’t imagine anyone who doesn’t or hasn’t experienced some ethical friction.

BBD says I shouldn’t take the bait. But I’m obviously foolish that way.

Yes. Of course the tension between precision and solidarity is a real factor in the life of anyone competently attempting to link complex evidence to complex policy. Stipulated. I agreed with Gaius Publius on that on Twitter recently. I agree with your 11:13 pm. I think it’s a good point.

(Steve Schneider’s reputation endured all sorts of hits for saying that rather hamhandedly, you’ll recall. http://climatesight.org/2009/04/12/the-schneider-quote/ Once you know your stuff, you’re constantly trading off being pedantically correct against being close to correct but more effective. It’s just a fact of life.)

But Mosher, that doesn’t justify your outburst of 4:28 pm ATTP time today and your peculiar attempt to suggest you never meant it as a provocation. Nor that the post wasn’t in part about me despite you naming me specifically. I can’t even tell if you’re serious about that.

“As mt and you argue it is about the policy.

Anything that could POSSIBLY cause a bump in the road to policy has to be objected to.
No other principles need apply. ever. the planet is stake.”

but

“mt i know you are the exception”

Please. What’s the punchline? Because you must be joking.

Really, I’d like you a lot better if you found it in your heart once in a while to apologize rather than sneakily try to walk your literal claims back while leaving the dog whistles in place, like a backbench politician. Own up to saying things, and own up to changing your mind when you do, please.

One may have to bite one’s tongue on occasion but there’s no need to waffle and prevaricate.

““Anything that could POSSIBLY cause a bump in the road to policy has to be objected to.”

That at least is a good question, especially given how successfully bumps have been wilfully constructed over the past two decades. I think, though, it is fair to say that a given question is not relevant to mitigation policy. Attacks on climate modeling are a red herring.”

Attacks on climate modeling would continue even if the models were delivered by God on stone tablets on a mountain top marked with a burning bush.

So, yes, red herring. And Mosher would be one of those continuing to spread FUD.

I don’t think that it’s a matter of your lack of imagination. I think that basically everyone, depending on context, recognizes those friction points of which you speak. Certainly, we can point to many situations where libertarians and non-libertarians alike respond to risk in similar fashion.

IMO, what we see with issues like climate change is that the polarized context stimulates cultural cognition, which alters the risk analysis we would otherwise see in non-polarized contexts. The problem is that people view their response as a kind of ID tag (“See, I’m a libertarian, and my refusal to go along with a statist, authoritarian cabal that wants to impose climate change mitigation so they can tax me and destroy capitalism is proof of my identity. So I wear this ‘skeptic’ badge.”)

Additionally, of course, there are characteristics of risk assessment, even in non-polarized contexts, where the risk is dramatic but perhaps improbable, or plays out on long time horizons, that affect the more typical “friction point” responses.

Steven Mosher: Actually I’ve been attempting to elicit a reasonable business case. Change for change’s sake is not a successful policy for anything. So far I haven’t heard a reasonable business case.

Someone running around saying something might be wrong isn’t unusual, even in business. The gauntlet that you have to run is to make a business case:
“What is the concern?” (Not entirely sure…)
“What do others think about it?” (Never asked anyone who would be affected… )
“How will you solve the concern?” (You hope for public reviews? Install more metrics?)
“What will be the result of solution?” (Not sure…)

To top it off, you have another guy on the other side of the table saying…
“I’ve run a public peer reviewed review of the models showing no concerns.” The bar is set pretty high for you.

I’m not saying you’re wrong, but to my eyes… meh. You got nothing but hand waving.

Reduce the Navier-Stokes equations to the wave equation, apply the QBO angular momentum changes as a near-periodic forcing on the equatorial Pacific thermocline, and voila, you have a very decent model of ENSO behavior over multiple decades and centuries.

This essentially explains a significant fraction of the world’s temperature anomaly.
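For illustration only, here is a minimal sketch of that kind of reduction, not the actual model: the thermocline displacement treated as a single damped oscillator driven by a near-periodic “QBO-like” forcing. Every parameter below (natural period, damping, forcing frequencies) is invented for the example.

```python
import math

def enso_sketch(n_years=100, dt=0.01):
    """Thermocline displacement x(t) as a single damped oscillator,
    driven by a near-periodic 'QBO-like' forcing.
    All parameters are hypothetical, chosen only for illustration."""
    w0 = 2 * math.pi / 4.0       # natural period ~4 years (assumed)
    gamma = 0.2                  # damping rate (assumed)
    wf = 2 * math.pi / 2.37      # forcing period ~28 months, QBO-like (assumed)
    x, v, t = 0.0, 0.0, 0.0
    out = []
    for _ in range(round(n_years / dt)):
        # near-periodic forcing: two incommensurate frequencies
        f = math.sin(wf * t) + 0.3 * math.sin(0.71 * wf * t)
        a = f - 2 * gamma * v - w0 ** 2 * x
        v += a * dt              # semi-implicit Euler update
        x += v * dt
        t += dt
        out.append(x)
    return out

x = enso_sketch()
```

The response settles into a bounded, multi-decadal oscillation whose amplitude depends on how close the forcing frequencies sit to the oscillator’s natural frequency, which is the flavour of behaviour being claimed.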

> cavitation
I suspect that’s a brilliant observation the economists will utterly ignore
————-
On that model:
> Only when the integration reaches 1981-1982 does a real discrepancy
> occur. See Figure 3 ….

So I’m metamodeling the slosh model as a simple pendulum, a swing occupied by a kid who is sitting without wiggling too much.

Until 1981-1982, when the kid learns how to pump a swing by timing when to extend and when to pull in …. that is, the point at which the warming signal emerges.
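The swing analogy can be made concrete with a toy calculation (a sketch with invented numbers, not the slosh model itself): a linearized pendulum whose length is modulated at twice the natural frequency – the “pumping” – only after a switch-on time. The amplitude stays flat before the switch and grows after it, which is the regime change being described.

```python
import math

def pumped_swing(t_end=200.0, t_switch=100.0, dt=0.001, eps=0.1):
    """Linearized pendulum, theta'' = -(g / l(t)) * theta, scaled so
    g = l0 = 1. Before t_switch the length is fixed; after it, the
    length is modulated at twice the natural frequency (the pumping).
    All numbers are invented for illustration."""
    theta, omega, t = 0.01, 0.0, 0.0
    before = after = 0.0
    while t < t_end:
        pump = eps * math.cos(2.0 * t) if t >= t_switch else 0.0
        acc = -theta / (1.0 + pump)   # g / l(t), with l(t) = 1 + pump
        omega += acc * dt             # semi-implicit Euler update
        theta += omega * dt
        t += dt
        if t_switch - 20.0 <= t < t_switch:
            before = max(before, abs(theta))   # amplitude pre-switch
        elif t >= t_end - 20.0:
            after = max(after, abs(theta))     # amplitude post-switch
    return before, after

before, after = pumped_swing()
```

Run it and the post-switch amplitude comes out several times the pre-switch amplitude: parametric resonance, the kid learning to pump.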

What this equation is essentially saying is that the density, ρ, in a particular volume cannot change unless there is a net flux, ρu, into that volume.

The density can change, even in the case of zero net flux, if either of the two thermodynamically independent variables in the Equation of State (EoS) for the material changes. Two thermodynamically independent variables that are frequently used in an EoS are pressure and temperature. Energy addition through the walls of a closed container, for example, will cause a density change.

It also follows that in the case of zero net flux the density can change if the energy content of the fluxing material is different from the initial energy content of the material to which the equation is applied.

The statement quoted above actually doesn’t have consistent units; the density and the flux have different units.

What the equation is actually saying is that the time-rate-of-change of the density is determined by the divergence of the flux field. This does not refer to the density itself, but instead to the time-rate-of-change of the density. Consider the case of an incompressible fluid, for which the general formulation holds, and the density is constant. The equation then reduces to the statement that the velocity field is divergence free.
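A quick numerical check of that last statement (a sketch, with an arbitrarily chosen flow): for solid-body rotation, a classic incompressible flow, a central-difference estimate of the divergence comes out as zero, while a uniformly expanding flow does not.

```python
def divergence(u, v, x, y, h=1e-5):
    """Central-difference estimate of du/dx + dv/dy at the point (x, y)."""
    dudx = (u(x + h, y) - u(x - h, y)) / (2 * h)
    dvdy = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return dudx + dvdy

# Solid-body rotation: u = -y, v = x; an incompressible flow.
d_rot = divergence(lambda x, y: -y, lambda x, y: x, 0.3, 0.7)

# A uniformly expanding flow, u = x, v = y, for contrast: divergence 2.
d_exp = divergence(lambda x, y: x, lambda x, y: y, 0.3, 0.7)
```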

Mass is conserved, not density. An equation for mass conservation is obtained by integrating the presented equation over a volume that contains the material of interest. The volume integral of the divergence of the flux is transformed, by application of the Gauss theorem, into the surface integral of the flux over the surface bounding the volume. The mass M, with time-rate-of-change dM/dt [=] kg/sec, can change only if there is a net non-zero mass flux, rho*u*A = W [=] kg/sec, through the surface bounding the volume. Where [=] means “has units of”.
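That book-keeping can be verified numerically. A sketch (1-D, first-order upwind, all numbers arbitrary): in a finite-volume discretization of the continuity equation, the total mass in the domain changes by exactly the net mass crossing the two boundary faces, to rounding error, regardless of what the density does locally.

```python
def upwind_step(rho, rho_in, u, dx, dt):
    """One first-order upwind step of d(rho)/dt + d(rho*u)/dx = 0, with
    uniform velocity u > 0 and a fixed inflow density rho_in. Returns
    updated cell densities plus the mass (per unit face area) that
    entered and left through the two boundary faces during the step."""
    n = len(rho)
    faces = [u * rho_in] + [u * r for r in rho]   # flux at each cell face
    new = [rho[i] - dt / dx * (faces[i + 1] - faces[i]) for i in range(n)]
    return new, faces[0] * dt, faces[n] * dt

dx, dt, u = 0.1, 0.01, 1.0                        # arbitrary grid and speed
rho = [1.0 + 0.5 * (i % 3) for i in range(50)]    # arbitrary initial density
m0 = sum(rho) * dx                                # initial mass (per unit area)
net = 0.0
for _ in range(200):
    rho, m_in, m_out = upwind_step(rho, 2.0, u, dx, dt)
    net += m_in - m_out                           # running boundary tally
m1 = sum(rho) * dx                                # final mass
```

The change in total mass, m1 − m0, matches the accumulated boundary flux `net` to machine precision: the cell-to-cell fluxes telescope away, and only the bounding surface matters, exactly as the Gauss-theorem argument says.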

Neutron,
I think you’re thinking Lagrangian, not Eulerian. In the equations I used, the assumption is that the volume remains unchanged; hence the density can only change if there is an addition of mass. In a Lagrangian framework you follow the fluid, rather than working on a fixed grid, so the local density can change, but that’s because you’re no longer assuming that the simulation volume remains unchanged.

The statement quoted above actually doesn’t have consistent units; the density and the flux have different units.

Yes, it does have consistent units. The time derivative of the density has units of density over time. The divergence of the flux is density times velocity divided by a length, which also has units of density over time.
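A toy dimensional-analysis check makes the point explicit (units written as exponent tuples over kg, m, s; the code is illustrative only): both sides of the continuity equation come out as kg m⁻³ s⁻¹.

```python
# Units as exponent tuples over (kg, m, s); multiplying quantities adds
# exponents, dividing subtracts them. Illustrative bookkeeping only.
def mul(a, b):
    return tuple(p + q for p, q in zip(a, b))

def div(a, b):
    return tuple(p - q for p, q in zip(a, b))

DENSITY  = (1, -3, 0)    # kg m^-3
VELOCITY = (0, 1, -1)    # m s^-1
LENGTH   = (0, 1, 0)     # m
TIME     = (0, 0, 1)     # s

lhs = div(DENSITY, TIME)                    # d(rho)/dt
rhs = div(mul(DENSITY, VELOCITY), LENGTH)   # divergence of rho*u
```

Both `lhs` and `rhs` evaluate to (1, -3, -1), i.e. kg m⁻³ s⁻¹, so the equation is dimensionally consistent.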

Energy addition through the walls of a closed container, for example, will cause density change.

Density is simply mass divided by volume. If neither the mass nor the volume change, then the density cannot change. Adding energy will change the temperature and – consequently – the pressure, but can’t change the density if the mass and volume are fixed.
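An ideal-gas example makes the point concrete (a sketch; the masses, volumes and temperatures are round values chosen for illustration): heat a fixed mass of gas in a rigid, closed box and the temperature and pressure rise, while the density stays exactly the same.

```python
R = 8.314        # J/(mol K), universal gas constant
M_AIR = 0.029    # kg/mol, approximate molar mass of air

def gas_state(mass, volume, temperature):
    """Pressure and density of an ideal gas of fixed mass in a rigid box."""
    moles = mass / M_AIR
    pressure = moles * R * temperature / volume   # p = nRT/V
    density = mass / volume                       # rho = m/V, independent of T
    return pressure, density

p1, rho1 = gas_state(1.0, 1.0, 300.0)   # before adding energy
p2, rho2 = gas_state(1.0, 1.0, 400.0)   # after heating: T is higher
```

The pressure goes up with the temperature, but with mass and volume fixed the density is unchanged.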