After reading Easterbrook’s blog post about “climate model validation”, and some discussions of this topic elsewhere, I noticed that there is some “computer terminology” floating around that disguises itself as plain English! This has led to some confusion, so I’d like to explain some of it here.

Technobabble: The Quest for Cooperation

Climate change may be the first problem in the history of humankind that has to be tackled on a global scale, by people all over the world working together. Of course, a prerequisite of working together is a shared understanding and a shared language. Unfortunately, every single one of the many professions that scientists and engineers engage in has created its own dialect. And most experts are proud of it!

When I read about the confusion that “validation” versus “verification” of climate models has caused, I was reminded of the phrase “technobabble”, which screenwriters for the TV series Star Trek used whenever they had to write dialogue involving the engineers on the Starship Enterprise. Something like this:

“Captain, we have to send an inverse tachyon beam through the main deflector dish!”

“Ok, make it so!”

Fortunately, neither Captain Picard nor the audience had to understand what was really going on.

It’s a bit different in the real world, where not everyone may have the luxury of staying on the sidelines while the trustworthy crew members in the Enterprise’s engine room solve all the problems. We can start today by explaining some software engineering technobabble that came up in the context of climate models. But why would software engineers bother in the first place?

Short Review of Climate Models

Climate models come in a hierarchy of complexity. The simplest ones only try to simulate the energy balance of the planet earth. These are called energy balance models. They don’t take into account the spherical shape of the earth, for example.
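The simplest member of this hierarchy fits in a few lines of code. Here is a minimal sketch of a zero-dimensional energy balance model (my own illustration, using standard textbook values for the solar constant and albedo, not numbers from the post): the planet’s effective temperature follows from balancing absorbed sunlight against emitted blackbody radiation.

```python
# Zero-dimensional energy balance model: absorbed solar radiation equals
# emitted blackbody radiation (Stefan-Boltzmann law). The constants are
# standard textbook values, used here purely as an illustration.

SOLAR_CONSTANT = 1361.0   # W/m^2, solar irradiance at Earth's orbit
ALBEDO = 0.3              # fraction of sunlight reflected back to space
SIGMA = 5.670374419e-8    # Stefan-Boltzmann constant, W/(m^2 K^4)

def equilibrium_temperature(solar_constant=SOLAR_CONSTANT, albedo=ALBEDO):
    """Effective temperature (K) at which absorbed and emitted power balance."""
    absorbed = solar_constant * (1.0 - albedo) / 4.0  # /4 averages over the sphere
    return (absorbed / SIGMA) ** 0.25

print(round(equilibrium_temperature(), 1))  # about 255 K, roughly -18 °C
```

The well-known gap between this effective temperature and the observed surface average of about 288 K is the greenhouse effect; the more complex models in the hierarchy exist to resolve exactly such discrepancies.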

At the opposite extreme, the most complex ones try to simulate the material and heat flow of the atmosphere and the oceans on a topographical model of the spinning earth. These are called general circulation models, or GCMs for short. GCMs are big programs, sometimes comprising more than a million lines of code.

A line of code is basically one instruction for the computer to carry out, like:

add 1/2 and 1/6 and store the result in a variable called e

print e on the console
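In a real programming language these two instructions look almost the same. A sketch in Python (using exact fractions, so that 1/2 + 1/6 comes out as 2/3 rather than a rounded decimal):

```python
from fractions import Fraction

e = Fraction(1, 2) + Fraction(1, 6)  # add 1/2 and 1/6 and store the result in e
print(e)                             # print e on the console: shows 2/3
```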

In order to understand what a computer program does, one would, in theory, have to read and understand every single line of code. And most programs use a lot of other programs, so in theory one would have to understand those, too. This is of course not possible for a single person!

We hope that taking into account a lot of effects, which results in a lot of lines of code, makes the models more accurate. But it certainly means that they are complex enough to be interesting for software engineers.

In the case of software that is used to run an internet shop, a million lines of code isn’t much. But it is already too big for one single person to handle. This, basically, is where all the problems that software engineering seeks to solve begin.

When more than one person works on a software project things often get complicated.

(From the manual of CVS, the “Concurrent Versions System”.)

Software Design Circle

The job of a software engineer is in some ways similar to the work of an architect. The differences are mainly due to the abstract nature of software. Everybody can see if a building is finished or if it isn’t, but that’s not possible with software. Nevertheless every software project does come to an end, and people have to decide whether or not the product, the software, is finished and does what it should. But since software is so abstract, people have come up with special ideas about how the software “production process” should work and how to tell if the software is correct. I would like to explain these a little bit further.

Stakeholders and Shifts in Stakeholder Analysis

There are many different people working in an office building with different interests: cleaning squads, janitors, plant security, and so on. When you design a new office building, you need to identify and take into account all the different interests of all these groups. Most software projects are similar, and the process just mentioned is usually called stakeholder analysis.

Of course, if you take into account only the groups already mentioned, you’ll build an office building without any offices, because that would obviously be the simplest one to monitor and to keep working. Such an office building wouldn’t make much sense! This is because we made a fatal mistake in our stakeholder analysis: we failed to take into account the most important stakeholders, the people who will actually use the offices. These are the key stakeholders of the office building project.

After all, the primary purpose of an office building is to provide offices. And in the end, if we have an office building without offices, we’ll notice that no one will pay us for our efforts.

Gathering Requirements

While it may be obvious what most people want from an office building, the situation is usually much more abstract, hence much more complicated, for software projects.

This is why software people carry out a requirements analysis, where they ask the stakeholders what they would like the software to do. A requirement for an office building might be, for example, “we need a railway station nearby, because most of the people who will work in the building don’t have cars.” A requirement for a software project might be, for example, “we need the system to send email notifications to our clients on a specific schedule”.

In an ideal world, the requirements analysis would result in a document, usually called something like a system specification, that contains both the requirements and descriptions of the test cases needed to check whether the finished system meets them. For example:

“Employee A lives in an apartment 29 km away from the office building and does not have a car. She gets to work within 30 minutes by using public transportation.”

Verification versus Validation

When we have finished the office building (or the software system), we’ll have to do some acceptance testing, in order to convince our customer that she should pay us (or simply use the system, if it is free). When you buy a car, your “acceptance test” is driving away with it—if that does not work, you know that there is something wrong with your car! But for complicated software—or office buildings—we need to agree on what we will do to test whether the system is finished. That’s what we need the test cases for.

If we are lucky, the relevant test cases will already be described in the system specification, as noted above. But that is not the whole story.

Every scientific community that has its own identity invents its own new language, often borrowing words from everyday language and defining new, surprising, special meanings for them. Software engineers are no different. There are, for example, two very different aspects to testing a system:

• Did we do everything according to the system specification?

and:

• Now that the system is there, and our key stakeholders can see it for themselves, did we get the system specification right: is our product useful to them?

The first is called verification, the second validation. As you can see, software engineers took two almost synonymous words from everyday language and gave them quite different meanings!

For example, if you wrote in the specification for an online book seller:

“we calculate the book price by multiplying the ISBN number by pi”

and the final software system does just that, then the system is verified. But if the book seller would like to stay in business, I bet that he won’t say the system has been validated.
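The distinction is easy to see in code. Here is a sketch (the ISBN is just an example number, and the function is my own illustration): an implementation that faithfully follows the absurd specification passes verification, while validation, the question of whether the result is useful, clearly fails.

```python
import math

def book_price(isbn: int) -> float:
    """Implements the specification literally: price = ISBN times pi."""
    return isbn * math.pi

example_isbn = 9780262033848  # an arbitrary ISBN-13, used only for illustration

# Verification: does the implementation match the specification? It does.
assert book_price(example_isbn) == example_isbn * math.pi

# Validation: is this useful to the key stakeholders? A book priced at
# roughly 31 trillion currency units says the specification itself is
# wrong, even though the code implements it faithfully.
print(f"{book_price(example_isbn):,.2f}")
```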

Stakeholders of Climate Models

So, for business applications, it’s not quite right to ask “is the software correct?” The really important question is: “is the software as useful for the key stakeholders as it should be?”

But in Mathematics Everything is Either True or False!

One may wonder if this “true versus useful” stuff above makes any sense when we think about a piece of software that calculates, for example, a known mathematical function like a “modified Bessel function of the first kind”. After all, it is defined precisely in mathematics what these functions look like.

If we are talking about creating a program that can evaluate these functions, there are a lot of technical choices that need to be specified. Here is a random example (if you don’t understand it, don’t worry, that is not necessary to get the point):

• Current computers know data types with a finite value range and finite precision only, so we need to agree on which such data type we want as a model of the real or complex numbers. For example, we might want to use the “double precision floating-point format”, which is an international standard.

Another aspect is, for example, “how long may the function take to return a value?” This is an example of a non-functional requirement (see Wikipedia). These requirements will play a role in the implementation too, of course.

However, apart from these technical choices, there is no ambiguity as to what the function should do, so there is no need to distinguish verification and validation. Thank god that mathematics is eternal! A Bessel function will always be the same, for all of eternity.
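To illustrate the kind of technical choices mentioned above, here is a sketch (my own example, not from the post) that evaluates the modified Bessel function of the first kind, order zero, by truncating its power series, with all arithmetic in the double-precision format discussed earlier:

```python
def bessel_i0(x: float, terms: int = 30) -> float:
    """Modified Bessel function of the first kind, order zero, via the
    power series I0(x) = sum over k of (x^2/4)^k / (k!)^2.

    'terms' is an illustrative truncation choice, adequate for small |x|;
    all arithmetic happens in IEEE 754 double precision, the finite data
    type mentioned in the text."""
    term = 1.0    # the k = 0 term of the series
    total = 1.0
    for k in range(1, terms):
        term *= (x * x / 4.0) / (k * k)  # ratio of term k to term k-1
        total += term
    return total

print(round(bessel_i0(1.0), 7))  # 1.2660659, matching tabulated I0(1)
```

The truncation is exactly the kind of decision a specification has to pin down: a different number of terms, or a different data type, yields a slightly different program for the “same” eternal function.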

Unfortunately, this is no longer true when a computer program computes something that we would like to compare to the real world. Like, for example, a weather forecast. In this case the computer model will, like all models, include some aspects of the real world. Or rather, some specific implementations of a mathematical model of a part of the real world.

Verification will still mean the same thing, if we understand it to be the stage where we test whether the individual pieces of the program (the parts that do things that can be defined in a mathematically precise way) compute what they are supposed to. But validation will be a whole different step if understood in the sense of “is the model useful?”

But Everybody Knows What Weather Is!

But still, does this apply to climate models at all? I mean, everybody knows what “climate” is, and “climate models” should simulate just that, right?

As it turns out, it is not so easy, because climate models serve very different purposes:

• Climate scientists want to test their understanding of basic climate processes, just as physicists calculate a lot of solutions to their favorite theories to gain a better understanding of what these theories can and do model.

• Climate models are also used to analyse observational data, to supplement such data and/or to correct them. Climate models have had success in detecting misconfiguration and biases in observational instruments.

• Finally, climate models are also used for global and/or local predictions of climate change.

The question “is my climate model right?” therefore translates to the question “is my climate model useful?” This question has to refer to a specific use of the model, or rather: to the viewpoint of the key stakeholders.

The Shift of Stakeholders

One problem of the discussions of the past seems to be due to a shift of the key stakeholders. For example: some climate models have been developed as a tool for climate scientists to play around with certain aspects of the climate. When the scientists published papers, including insights gained from these models, they usually did not publish anything about the implementation details. Mostly, they did not publish anything about the model at all.

This is nothing unusual. After all, a physicist or mathematician will routinely publish her results and conclusions—maybe with proofs. But she is not required to publish every single thought she had to think to produce her results.

But after the results of climate science became a topic in international politics, a change of the key stakeholders occurred: a lot of people outside the climate science community developed an interest in the models. This is a good thing. There is a legitimate need of researchers to limit participation in the review process, of course. But when the results of a scientific community become the basis of far-reaching political decisions, there is a legitimate public interest in the details of the ongoing research process, too. The problem in this case is that the requirements of the new key stakeholders, such as interested software engineers outside the climate research community, are quite different from the requirements of the former key stakeholders, climate scientists.

For example, if you write a program for your own eyes only, there is hardly any need to write detailed documentation for it. If you write it for others to understand, then, as a rule of thumb, you’ll have to produce at least as much documentation as code.

Back to the Start: Farms, Fields and Forests

As an example of a rather prominent critic of climate models, let’s quote the physicist Freeman Dyson:

The models solve the equations of fluid dynamics and do a very good job of describing the fluid motions of the atmosphere and the oceans.

They do a very poor job of describing the clouds, the dust, the chemistry and the biology of fields, farms and forests. They are full of fudge factors so the models more or less agree with the observed data. But there is no reason to believe the same fudge factors would give the right behaviour in a world with different chemistry, for example in a world with increased CO2.

Let’s assume that Dyson is talking here about GCMs, with all their parametrizations of unresolved processes (which he calls “fudge factors”). Then the first question that comes to my mind is “why would a climate model need to describe fields, farms and forests in more detail?”

I’m quite sure that the answer will depend on what aspects of the climate the model should represent, in what regions and over what timescale.

And that certainly depends on the answer to the question “what will we use our model for?” Dyson seems to assume that the answer to this question is obvious, but I don’t think that this is true. So, maybe we should start with “stakeholder analysis” first.


32 Responses to Your Model Is Verified, But Not Valid! Huh?

I guess one of the big things I don’t really understand is how stakeholder analysis applies to “negative desires” such as “all the neglected elements in the model should have negligible effects on the simulation results (in the ways this stakeholder cares about)”. In a way that’s a restatement of your point about fields and farms: in order not to spend scarce programmer resources on farms and fields, how can we show that they won’t affect the result significantly, without actually having a simulation that includes them for comparison? It’s always more difficult to provide convincing evidence for a negative assertion than a positive one, and that seems to be one of the major complaints people have.

…in order not to spend scarce programmer resources on farms and fields, how can we show that they won’t affect the result significantly, without actually having a simulation that includes them for comparison?

That’s a very important point: How do we know that the models include all the relevant processes? I’d like to learn more about this and maybe write about it, too. For example, all climate models need to model cloud formation by heuristic parameterization, because computers can’t handle a resolution that would allow a model based on microphysical effects. These parameterizations can be validated by comparing their results with weather forecast models that have a better resolution.

However, my main point here is that some critics seem to assume that models should somehow include all processes, or that the omission of a process proves that the model has to be invalid, which is not true. This depends critically on the timescale, for example. Some processes will not be relevant for a 100 year simulation, but become relevant on a 1000 year timescale.

I guess one of the big things I don’t really understand is how stakeholder analysis applies to “negative desires”…

In practice there is a chapter in the system specification with “general requirements” which will list no-go requirements by the key stakeholders, like “the GUI has to be a web GUI, because we cannot distribute a native client”. For every specification there is a formal review process, where key stakeholders can veto anything that violates such a requirement.

I have spent nearly a quarter of a century working with travel demand models. During that period I have noted that transport models have increased in complexity but travel demand forecasts have not improved in accuracy. The key issues are:

1) Can I represent all the actions (including their consequences) that might be taken now to influence future travel demand? (For example, if I think I can improve the road system by changing signal settings, a model must represent the effects of traffic signals.)

2) Can I represent all the measurements that I might make in order to evaluate the level of travel demand? (For example, am I interested in the way flows vary with time of day, or do I just want annual average weekday traffic?)

3) Can I analyse sufficiently many different scenarios to obtain a robust forecast before I need to act?

Note that (3) requires leaving out as much irrelevant detail as possible, but we are not sure which details are relevant. Nevertheless, it is actually very important to leave processes out: it is no good taking two hours to forecast the traffic conditions an hour from now.

In my opinion, similar considerations apply to climate models. We need an estimate of the future course of the climate that is available well in advance of the situation we are predicting. If we included every detail of the world, producing the forecast would take at least as long (and probably much longer) as waiting for the events to occur in real life.

Like Tim, I agree that increasing climate model resolution really does help in prediction. What people really care about is regional/local prediction and if the region you’re interested in is smaller than a grid cell, then resolution matters. But it would probably also help a lot to go to cloud resolving models (currently impractical). On the other hand, as Tim points out, increasing resolution doesn’t always help without careful reparameterization.

But there is a wider debate about the utility of the hierarchy of models. Do you focus on making the model more complex in the hopes of reducing bias, but forgoing any estimate of variance? Or do you use a simpler model that you can afford to run enough times to quantify its uncertainties? What happens when your model becomes too complex to understand? Simpler models are useful here also to test competing physical hypotheses. And they can help to tell you which details it’s okay to leave out.

We need an estimate of the future course of the climate that is available well in advance of the situation we are predicting. If we included every detail of the world, producing the forecast would take at least as long (and probably much longer) as waiting for the events to occur in real life.

Computing time is one aspect, of course. No one will run a simulation on a supercomputer that takes more than, say, two weeks. But there is little doubt that a higher grid resolution for the simulation of the atmosphere, for example, would lead to better results, IMHO. Current resolutions are at ca. 10 km to 50 km, which means that there is one grid point every 10 km or 50 km. That’s a little bit coarse, especially when one considers cloud formation. There are definitely processes of cloud formation that are not resolved at this resolution.

Of course there is the general problem that higher resolution and the inclusion of more processes can make the model actually worse. This happens when the higher resolution takes into account one process but not another one that counteracts the effects of the first one. Or, with respect to clouds, a parameterization that works for a grid spacing of 50 km won’t necessarily work for a 10 km spacing, because at 10 km the model will actually resolve more effects, which will then be in the model twice. And it is a nontrivial task to find a good parameterization; it may take some time to develop one that does a good job at 10 km, and in the meantime the model produces better results with the 50 km spacing.
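For a sense of scale, the computational price of refining the grid can be estimated in a few lines (my own back-of-the-envelope sketch, not a number from the comment):

```python
import math

EARTH_RADIUS_KM = 6371.0
SURFACE_AREA_KM2 = 4.0 * math.pi * EARTH_RADIUS_KM ** 2

def horizontal_grid_points(spacing_km: float) -> int:
    """Rough number of horizontal grid cells covering the sphere."""
    return round(SURFACE_AREA_KM2 / spacing_km ** 2)

for spacing in (50.0, 10.0):
    print(f"{spacing:4.0f} km spacing: about {horizontal_grid_points(spacing):,} cells")

# Refining from 50 km to 10 km multiplies the horizontal cell count by 25;
# on top of that, stability (CFL) conditions typically force a time step
# about 5 times shorter, so the total cost grows by a factor of roughly
# 125, before any of the reparameterization work described above.
```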

So, I think a fair question would be “do the models do a very good job of describing the fluid motions of the atmosphere and the oceans?” I don’t think so, because the resolution makes it necessary to include parameterizations, which, I assume, are what Dyson calls “fudge factors”.

I don’t know why Dyson thinks that the fluid motion modelling isn’t a problem; I would like to see an explanation. It is certainly not obviously true.

“Do the models do a very good job of describing the fluid motions of the atmosphere and the oceans?” is kind of a meaningless question without defining what “a very good job” means (which, of course, is application dependent).

Nor is it very useful to conclude the answer is “no” simply because models include parameterizations. Models will always include parameterizations of sub-grid scale physics no matter how fine their resolution. What is considered “sub-grid scale” will change with grid scale, of course, but that’s beside the point.

“Do the models do a very good job of describing the fluid motions of the atmosphere and the oceans?” is kind of a meaningless question without defining what “a very good job” means (which, of course, is application dependent).

That’s precisely my point. (I was a little bit sloppy not to explain this in my comment, but the main point of the main blog post is just that.)

I don’t know why Dyson said that the fluid dynamics part is solved so well. Maybe because the underlying physics is understood very well, as are the methods of numerical approximation. Maybe this is true for the models used in the latest IPCC reports, which are used for predictions of the global temperature over the next 50 or 100 years. I don’t have an opinion about this, because I don’t know enough about these models.

But there certainly are applications and models where the fluid dynamics part is not completely understood and solved. I had a particular example in mind, explained by Isaac Held on his blog (which we should add to the blogroll of Azimuth, by the way):

As Held explains, there are some aspects of storm formation that he would like to model, which have not worked out as desired – yet.

Of course, for a temperature prediction for the next 100 years it is probably enough to get the heat transfer right, which can work out even if, say, some kinds of storms never occur in the model. The model can nevertheless get the global-scale heat transfer right, which is due to convection and mainly caused by turbulence.

Isn’t the main problem with using models for predicting natural systems that the natural systems only have temporarily stable states of organization?

That becomes the dominant feature of natural systems when you spend time observing them, anyway. They are essentially “learning processes” that are continually discovering new ways to work. That’s what distinguishes them from machines. They are rather continually developing new kinds of organization, and models don’t do that. Models don’t even tell you when or where some new state of organization in the observed system has already developed, is presently developing, or will soon develop a new way of working.

They are rather continually developing new kinds of organization, and models don’t do that. Models don’t even tell you when or where some new state of organization in the observed system has already developed, is presently developing, or will soon develop a new way of working.

Agreed, but there are a lot of processes that I don’t expect to change during the next 100 or even a million years, like

• the main flow processes in the atmosphere and the oceans,

• the radiative properties and processes of the atmosphere

etc. What a climate model of the young earth (taking the above aspects into account) would never tell us is, for example, what life is, how it came into being, and why there was a sudden explosion of “life forms” on the planet. This in turn had a profound influence on the atmosphere (O2!), the albedo and other factors, which in turn should have had a profound effect on the processes that the model does include.

(This aspect is not very important for climate models that cover a rather short time period like 100 years. Unless humans do something drastic, of course, like destroying half of earth’s forests.)

But such a model would still have a very important use: It would tell us that it is wrong. Having a model that turns out to be wrong is a better start than saying that we cannot have any knowledge at all, and stand in wonder why incomprehensible events unfold before our eyes :-)

Yes of course, but science has yet to give proper attention to which is which, how to tell the difference between processes you might anticipate will change tomorrow and those unlikely to ever change.

I wrote a paper on the technique needed, on how to identify processes you can anticipate will change form for natural structural causes, called “Models Learning Change” (http://www.cosmosandhistory.org/index.php/journal/article/view/176/295), which comes down to two things. What it offers is a greatly enhanced way to anticipate when a model will become “wrong”, rather than waiting for system failure, as we have done with economics.

One is that you need to ask the question, as failing to do that is remarkably clearly demonstrated in our global culture’s firm commitment to a model for limitless acceleration of change in its own complexity and energy use. The other is to use indications of seemingly constant proportional change as a signal to locate and identify the feedback network doing it and what will destabilize it.

Your model may be completely wrong and still be very useful to you or your peers in terms of publications, grants, careers, money, success, political influence etc.

“Let’s assume that Dyson is talking here about GCMs, with all their parametrizations of unresolved processes (which he calls “fudge factors”). Then the first question that comes to my mind is “why would a climate model need to describe fields, farms and forests in more detail?””

Because they both influence the global climate and are in turn influenced by it. That’s why any model that is to be taken seriously has to either include their contribution in a satisfactory way or prove conclusively that it is insignificant (if that’s even possible).

Your model may be completely wrong and still be very useful to you or your peers in terms of publications, grants, careers, money, success, political influence etc.

We cannot discuss the hidden agenda of people that we don’t know (the agenda, and at least in my case, also the people), in a rational way, on this blog.

But I can tell you this of myself: If I were interested primarily in money, I’d be writing iPad and iPhone apps right now instead of looking into climate models and posting about them here.

That’s why any model that is to be taken seriously has to either include their contribution in a satisfactory way or prove conclusively that it is insignificant (if that’s even possible).

I agree with you that what climate models do, and where the most important problems are, needs to be explained to the public in much more detail. Which is why I may write more about that topic here.

However: A model is and can be useful in a specific realm of application only. Demanding that a “climate model” (I assume you mean GCMs designed to predict local temperature and precipitation changes over the next century) has to take into account all possible factors means “throwing out the child with the bath”.

(“Taking into account” means either include into the model or “prove” that this is not necessary.)

As a physicist and engineer, I’d say that a lot of models are hopelessly inaccurate but turn out to be very useful nevertheless, because it is still possible to learn a lot from them. The climate may seem to be very chaotic and inaccessible to a reductionistic approach, but I’m fairly sure that a virus would have the same impression of what we model as ideal laminar flows, like blood, or water flowing along pipes :-)

Just because we puny humans are too small compared to the processes that govern the atmosphere and the oceans doesn’t mean that statistical physics cannot be applied.

Tim van Beek: “We cannot discuss the hidden agenda of people that we don’t know (the agenda, and at least in my case, also the people), in a rational way, on this blog.”

That doesn’t mean we can simply assume it’s not there, as the quote I referred to does.

Tim van Beek: “However: A model is and can be useful in a specific realm of application only. Demanding that a “climate model” (I assume you mean GCMs designed to predict local temperature and precipitation changes over the next century) has to take into account all possible factors means “throwing out the child with the bath”…”

The realm of application is clear – the models I am talking about are the ones on which predictions of global average temperature during the next century are based.

I don’t see what argument you are trying to make with the “throwing out the child with the bath” hyperbole here. That a flawed model is better than no model? This is certainly not true; a flawed model is much worse than no model, and if we know that it is flawed we should certainly throw it out.

Whether an incomplete or broken model can teach anyone anything is completely irrelevant to the larger issue. What matters is that many climatologists are professing doom and calling for major changes in public life and economy and all they have to support their position are such questionable climate models.

This is why reliability of those models is critically important. But unfortunately climate science is not like physics or engineering you mentioned in that we cannot perform experimental tests to verify that predictions of those models are correct.

All we can do to get some feeling for their quality is to analyze them to see if they at least correctly capture all the known and verified interactions on the qualitative level. If they “do a very poor job of describing the clouds, the dust, the chemistry and the biology of fields, farms and forests” then they fail even this most basic sanity check and no rational person should take their predictions seriously.

Winsberg makes the point that to understand whether a model is valid for the use to which it is put, you have to look at its history. He gives some examples of techniques used in climate modeling where the model deviates deliberately from the real-world physics, and points out that this is used for “falsification experiments” in the Popperian sense.

There’s a slightly more skeptical overview of the history of climate modeling in:

Applying the Maximum Entropy Principle seems to be the antidote to this problem of “reliability without truth”, especially in regards to dispersed statistical data. It assumes no extra information beyond the known values of moments.

He gives some examples of techniques used in climate modeling where the model deviates deliberately from the real-world physics, and points out that this is used for “falsification experiments” in the Popperian sense.

I understand Winsberg a little bit differently:

He introduces the term “falsification” to denote a property of a physical system that we know to be wrong, but that we introduce into the model to improve it nevertheless. He notes (footnote 2) that this is a completely different meaning than “falsification” has for Popper. I would have used a different term then, but, well,…

The concept of “artificial viscosity” isn’t special to climate science, but is used in computational fluid dynamics in general. In fact, I doubt that it is used in climate science at all, because “shock waves” aren’t important for a large scale model of the atmosphere (you can in fact neglect that air is compressible for applications in meteorology). Please correct me if I am wrong.

Engineering applications of fluid dynamics have drastically different requirements than large scale models of the atmosphere and the oceans. E.g. in engineering, fast flows and complex geometries rule, and effects like cavitation are important.

Last but not least, I don’t think that artificial viscosity is a good example of what Winsberg has in mind. It is very difficult to devise numerical approximation schemes for the Navier-Stokes equations that capture important effects, like, for example, shock waves. Therefore, in the process of devising a map from the Navier-Stokes equations to a discrete implementable version, one introduces some auxiliary elements.

It is up to you whether you interpret some of those in physical terms, like “artificial viscosity”: You can say “this part of the numerical scheme could be interpreted in physical terms, like a localized high viscosity”. But you don’t have to. Most people don’t try to interpret the grid spacing in similar terms, for example, but you could, of course, like “every grid point corresponds to a little measuring device that takes notes of velocity and pressure”.

The difference between “artificial viscosity” and “grid spacing” is this: In the limit where the grid spacing goes to zero, it is possible to prove that the numerical solution converges to the exact solution (provided that it exists) with respect to some metric on some function space (with the addition of some more technical assumptions, aka math technobabble). This is not possible with “artificial viscosity”. In this sense the “grid spacing” is a “consistent discrete approximation” (this, too, is math technobabble), while the “artificial viscosity” isn’t.

This is more like the difference between a convergent and an asymptotic series.
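To illustrate “consistent discrete approximation” with the simplest possible toy (my own example, not from climate modeling): a central difference quotient converges to the exact derivative as the spacing h goes to zero, here at second order, so halving h cuts the error by roughly a factor of four.

```python
import math

def central_diff(f, x, h):
    """Consistent second-order approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

exact = math.cos(1.0)  # exact derivative of sin at x = 1
errors = []
for h in [0.1, 0.05, 0.025]:
    err = abs(central_diff(math.sin, 1.0, h) - exact)
    errors.append(err)
    print(f"h = {h:5.3f}   error = {err:.2e}")
```

Nothing comparable can be said for an artificial-viscosity term: there is no limit in which it disappears while the scheme still works, which is exactly the asymmetry described above.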

The concept of “artificial viscosity” isn’t special to climate science, but is used in computational fluid dynamics in general. In fact, I doubt that it is used in climate science at all, because “shock waves” aren’t important for a large scale model of the atmosphere

“Artificial viscosity”, meaning an “unphysically large value of viscosity”, is indeed used in climate science. Climate modelers call it “hyperviscosity”, which I once mentioned to you here.

My educated guess is that hyperviscosity is not quite the same as artificial viscosity, because they address different problems: Artificial viscosity is added to “smooth out” shock waves. Hyperviscosity is added to model the energy dissipation due to turbulence at scales that the grid cannot resolve.

It is difficult to discuss this without any deeper knowledge (on my part), but naively I’d say that these are different concepts. And that “shock waves” are negligible for climate modelling.
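To make the damping mechanism concrete, here is a toy 1D sketch (entirely my own construction, not taken from any climate model): an explicit fourth-order “hyperviscosity” term wipes out grid-scale noise almost completely while leaving a well-resolved wave nearly untouched.

```python
import math

def hyperviscosity_step(u, nu4, dt):
    """One explicit Euler step of du/dt = -nu4 * d4u/dx4 on a periodic grid
    (the grid spacing is absorbed into nu4). The fourth derivative is the
    standard 5-point stencil; the minus sign makes the term dissipative."""
    n = len(u)
    d4 = [u[(i - 2) % n] - 4 * u[(i - 1) % n] + 6 * u[i]
          - 4 * u[(i + 1) % n] + u[(i + 2) % n] for i in range(n)]
    return [ui - nu4 * dt * di for ui, di in zip(u, d4)]

n = 64
smooth = [math.sin(2 * math.pi * i / n) for i in range(n)]  # well resolved wave
noise = [(-1) ** i * 0.1 for i in range(n)]                 # grid-scale wiggle
u = [s + z for s, z in zip(smooth, noise)]
for _ in range(200):
    u = hyperviscosity_step(u, nu4=0.01, dt=1.0)
# after 200 steps the grid-scale component has decayed by many orders of
# magnitude, while the smooth wave has lost only a tiny fraction of its amplitude
```

The reason is visible in Fourier space: the fourth-order operator damps the shortest resolvable wavelength about 170,000 times more strongly per step than the long wave, which is why modelers reach for high-order viscosities rather than ordinary ones.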

Warning, dangerous half-knowledge:
For computational fluid dynamics, the fact that air is compressible becomes relevant at velocities around Mach 0.5. No important process of the atmosphere that has non-negligible influence on global climate comes close to this velocity. Therefore climate models don’t have to include any related effects. Unless there is a considerable fleet of big spaceships flying around at these speeds, like in “Independence Day”.
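To put rough numbers on that claim (textbook-level values that I plugged in myself, so treat this as part of the dangerous half-knowledge): even peak jet-stream winds stay well below Mach 0.5.

```python
import math

# Speed of sound a = sqrt(gamma * R * T) for dry air treated as an ideal gas;
# jet stream peak winds are on the order of 100 m/s.
gamma, R = 1.4, 287.0        # heat-capacity ratio; specific gas constant [J/(kg K)]
T = 220.0                    # rough temperature at jet-stream altitude [K]
a = math.sqrt(gamma * R * T) # speed of sound, about 297 m/s
mach = 100.0 / a
print(f"speed of sound ~ {a:.0f} m/s, jet stream Mach ~ {mach:.2f}")
```

So the fastest large-scale flow in the atmosphere sits around Mach 0.3, comfortably in the regime where compressibility can be neglected.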

That’s a really interesting article! There are a lot of points worthy of discussion, but I’ll concentrate on one point only: Verification and validation.

The concepts as I have defined them here, have no epistemic content whatsoever. Both verification and validation are well defined as long as you know what your key stakeholders need (or rather: what they should think that they need). These are the concepts that you need to understand to become a certified (software) test manager or test analyst by the standards of the International Software Testing Qualifications Board.

Müller uses a completely different definition motivated by epistemology according to Karl Popper. I don’t think that this definition is of much use here, because, on a philosophical level, there isn’t much difference between climate science and climate models on the one hand and particle physics on the other, for example; much less than Müller describes.

For example, an experiment isn’t “reproducible” in particle physics, strictly speaking, from many different viewpoints. You need a lot of computer models and computing power to “compute data” that you could compare to your theory. The models used to “compute data” in particle physics are precisely the models that you try to test with these data. Etc.

Anyway, the really important point is to keep in mind that there are at least two very different definitions of verification and validation around, and one should always state clearly which one is used.

I like the article by Müller: he brings up the concept of “trustworthiness”, the use of ensembles, and more. I think this has a lot of merit. (Every time I read a climate paper now, the models have bifurcations and chaotic parts.) To quote:

“In a much referenced article, Oreskes et al. take the view that computer models in environmental sciences can neither be verified nor validated, for the above logical reasons and for the fact that environmental systems are never closed and can never be known completely due to their fractal structure.”

Why doesn’t Freeman Dyson spend his retirement on that instead? :-) Thanks, Steve, for the references.

Paul Edwards’ “A Vast Machine” is on Azimuth’s recommended reading list. I learned a lot from it about the history of the subject.

I really liked the illustration of Richardson’s “forecast factory”, which you can see e.g. here. The idea is to create a spherical building representing a grid on the surface of earth, with one human calculator on every grid point, solving the modelling equations by hand and communicating the results to his neighbours. It is like a grid model today, but with humans instead of executable code units :-)

The discussion in chapter 13, “Models versus Data: Validation, Verification or Evaluation” was another reason why I wrote this blog post.

Richardson’s forecast factory is a good illustration of the reason why I don’t think that computer models pose a new challenge to epistemology: It’s no more than an advanced version of pen and paper. And I haven’t seen philosophers discussing that the existence of pen and paper poses a fundamental challenge to the way humans acquire knowledge.

The words “verification and validation” show the difficulty with abstraction. It’s much clearer (to me) to talk about code bugs and spec bugs, but V&V is used beyond software, so that won’t do it for everyone. Personally what I’d like out of “green mathematics” is a toolkit for non-mathematicians of concepts, diagrams, design patterns and actual software for working with abstractions up and (especially) down a hierarchy.

I wasn’t aware, before composing this comment, that climate modeling is a pretty social activity. There are events and organizations dedicated to testing different models against each other (“intercomparison”), and it’s common to glue models together into useful hybrids (“integration”). Both of these require formalizing the ways different models relate to each other, probably via more abstractions.

Here’s the rub. Making abstractions is a little bit interesting, but progress occurs when whatever is learned is made concrete again. Comparing abstracted values from two models might show that one is faulty, but the work and the reward are to find and fix the problem. Similarly when two models interact dynamically, incoming insights need to be merged into each model’s concrete state in a self-consistent way.

It may be that navigating back down an abstraction is intrinsically hard and specific to each new problem. But it’s my impression that while we engineers cobble something together by hand from scratch each time, category theorists know about some reusable machinery. Hopefully there’s low hanging fruit.

For example though I’m still struggling to relate them to my field, the “constellation” diagrams of opetopes by Joachim Kock et al are intriguing for their simplicity. Contrast these to UML, the graphical language used in software engineering, which gives an ad-hoc grab bag of diagrams specialized for a few common abstractions, but not very useful for others.

As they sometimes say, Verification and Validation answer the following questions:

Validation: Are we building the right system?
Verification: Are we building the system right?

The Verification problem nowadays can be solved automatically for practical purposes, by generating test cases that (try to) cover the whole behavior of the software, and automatically simulating the system for each case. There are pretty well developed metrics to address the question of how much is good enough, in terms of model coverage.
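A toy sketch of the coverage idea (my own construction; real coverage tools instrument the code rather than counting outcomes, but the principle is the same): generate or pick test cases, run the system under test, and measure what fraction of its branches the cases exercised.

```python
def classify(temp_anomaly):
    """Toy system under test with three branches."""
    if temp_anomaly < 0:
        return "cooling"
    if temp_anomaly < 1.5:
        return "moderate warming"
    return "strong warming"

# "Generated" test cases (here simply chosen by hand), plus a crude
# branch-coverage metric: which of the three outcomes did we exercise?
cases = [-0.3, 0.8, 2.1]
hit = {classify(c) for c in cases}
coverage = len(hit) / 3
print(f"branch coverage: {coverage:.0%}")  # prints "branch coverage: 100%"
```

Dropping any one of the three cases would leave a branch untested, and the metric would report it, which is exactly the “how much is good enough” question the coverage numbers are meant to answer.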

After hanging out in a few NASA-organized IV&V conferences, I somehow also got the gist (but don’t hold me to that, I am not a V&V engineer) that if one writes the requirements in a formal language (e.g. UML or FSMs) then one can use the same verification techniques to solve the validation problem.

This seems to be especially convenient when you use code generation techniques to automatically generate lower level code from higher level description, as it is now routine in both Aerospace and Automotive industries. Then, at the end of the chain, static verification techniques also play a part in proving that there are no runtime errors (e.g. divisions by zero, and stuff like that) in the lower level code.

The Verification problem nowadays can be solved automatically for practical purposes, by generating test cases that (try to) cover the whole behavior of the software, and automatically simulating the system for each case. There are pretty well developed metrics to address the question of how much is good enough, in terms of model coverage.

For some purposes, certainly, but for all? I assume you are talking about unit tests and C0 and C1 code coverage as a metric. For the object-oriented systems that I know of, it is not possible to generate unit tests, because there is no formalized interface description. You have to write them by hand. To be honest, usually you have to refactor the code to make it amenable to unit testing in the first place…
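A minimal sketch of the kind of refactoring I mean (the names are purely illustrative): a computation welded to a concrete data source cannot be unit tested without the real database; injecting the source as a dependency lets the test supply a stub instead.

```python
class MeanTemperature:
    """After refactoring: the data source is injected, not hard-wired."""

    def __init__(self, read_temps):
        # read_temps: any zero-argument callable returning a list of floats
        # (in production this would wrap the real database access)
        self.read_temps = read_temps

    def compute(self):
        temps = self.read_temps()
        return sum(temps) / len(temps)

# A unit test can now pass a hand-written stub in place of the real source:
calc = MeanTemperature(lambda: [10.0, 14.0, 12.0])
print(calc.compute())  # prints 12.0
```

The refactoring itself is usually the expensive part; once the seams exist, writing the tests by hand is comparatively cheap.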

After hanging out in a few NASA-organized IV&V conferences, I somehow also got the gist (but don’t hold me to that, I am not a V&V engineer) that if one writes the requirements in a formal language (e.g. UML or FSMs) then one can use the same verification techniques to solve the validation problem.

I don’t know about NASA, but in my world the UML diagrams are usually far from being a formal description of the system. For a system that is developed for 10 years without any change in plans, this may work. Like NASA’s Mars missions. For other projects the requirements may change every month, week or even day. So even if you happened to be able to write a complete system specification in a highly formal way in UML, it would certainly be obsolete before you could even begin to prove that it is “correct”.

This seems to be especially convenient when you use code generation techniques to automatically generate lower level code from higher level description, as it is now routine in both Aerospace and Automotive industries. Then, at the end of the chain, static verification techniques also play a part in proving that there are no runtime errors (e.g. divisions by zero, and stuff like that) in the lower level code.

It’s correct that nobody (AFAIK) in the automotive industry writes low-level code for embedded microprocessors, for example; this code is generated. But in practice there are severe limitations to MDD (model driven design) and other abstraction techniques; I know many software projects that cannot go to a higher abstraction level than 4th generation programming languages, for good reasons.

I don’t know about NASA, but in my world the UML diagrams are usually far from being a formal description of the system.

Maybe on a very high level they could be, but anyway, in practice people do not (and will never, IMHO) write specifications in UML; they use Word or DOORS, and then the requirements are traced back to blocks in diagrams (manually) and to the generated code (automatically). But then again I am mostly talking about model-based design, which I know best.

I don’t think that it is possible to specify the interactions of system components in UML or any other graphical language I know with sufficient precision.
In my experience, UML is used in practice in these ways:

* entity-relationship diagrams are used to model the layout of relational database systems, up to the generation of DDL scripts (DDL = data definition language),

* class diagrams are used to specify the interfaces of components and are used to generate code stubs,

* interaction diagrams are used to specify specific complex processes, although Germans would typically use “ereignisgesteuerte Prozessketten” (“event driven process chains”, ARIS) for this purpose,

* use case diagrams are used as an illustration for, well, use cases.

The artefacts that are generated from UML are very limited; it works best for ER diagrams -> DDL scripts.
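To illustrate the “class diagrams -> code stubs” point with a toy (the spec format and all names here are made up by me; real tools consume actual UML models, not dictionaries):

```python
# A minimal, hand-rolled "model" standing in for a UML class diagram:
spec = {"class": "HeatFlux", "methods": ["compute", "reset"]}

def generate_stub(spec):
    """Turn the toy spec into a Python class stub with empty methods."""
    lines = [f"class {spec['class']}:"]
    for m in spec["methods"]:
        lines.append(f"    def {m}(self):")
        lines.append("        raise NotImplementedError")
    return "\n".join(lines)

stub = generate_stub(spec)
print(stub)

namespace = {}
exec(stub, namespace)  # the generated stub is valid, importable Python
```

This is roughly all that code generation from class diagrams buys you: the skeleton is produced mechanically, but the method bodies, i.e. the actual behavior, still have to be written by hand.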

I don’t think that it is possible to specify the interactions of system components in UML or any other graphical language I know with sufficient precision.

Well, it’s possible in Simulink (and it’s actually very convenient, because the blocks do not represent states but subsystems, so the diagram does not represent the behavior but the structure of the system). I would say that the fact that you can generate directly executable code also proves that the level of precision is sufficient. But I don’t think this is a good environment to write high level requirements (and also, if the interactions between subsystems are always two-way, then this complicates things a little).

For diagrams that represent the behavior of the system, like FSMs (and therefore UML), I think that representing the interactions of the systems is exceedingly hard and inconvenient, but probably theoretically possible (since anything you can do with code you can also do with an FSM, that would be my reasoning).
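A minimal sketch of what “behavior as an FSM” looks like in code (a toy traffic light of my own, nothing to do with climate models): a transition table plus a driver loop stand in for the state diagram.

```python
# States and events are plain strings; the diagram becomes a lookup table.
transitions = {
    ("red", "timer"): "green",
    ("green", "timer"): "yellow",
    ("yellow", "timer"): "red",
}

def run(state, events):
    """Feed a sequence of events through the machine, returning the final state."""
    for e in events:
        state = transitions[(state, e)]
    return state

print(run("red", ["timer", "timer", "timer"]))  # prints "red"
```

For three states and one event this is trivially readable; the difficulty mentioned above is that the table (or diagram) explodes combinatorially once several interacting components each contribute their own states.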

In any case, I don’t think using a graphical language for climate modeling is a good idea.
