How climate scientists test, test again, and use their simulation tools.

Figure: Model simulation showing average ocean current velocities and sea surface temperatures near Japan. (Credit: IPCC)

Talk to someone who rejects the conclusions of climate science and you’ll likely hear some variation of the following: “That’s all based on models, and you can make a model say anything you want.” Often, they’ll suggest the models don’t even have a solid foundation of data to work with—garbage in, garbage out, as the old programming adage goes. But how many of us (anywhere on the opinion spectrum) really know enough about what goes into a climate model to judge what comes out?

Climate models are used to generate projections showing the consequences of various courses of action, so they are relevant to discussions about public policy. Of course, being relevant to public policy also makes a thing vulnerable to the indiscriminate cannons on the foul battlefield of politics.

Skepticism is certainly not an unreasonable response when first exposed to the concept of a climate model. But skepticism means examining the evidence before making up one’s mind. If anyone has scrutinized the workings of climate models, it’s climate scientists—and they are confident that, just as in other fields, their models are useful scientific tools.

It’s a model, just not the fierce kind

Climate models are, at heart, giant bundles of equations—mathematical representations of everything we’ve learned about the climate system. Equations for the physics of absorbing energy from the Sun’s radiation. Equations for atmospheric and oceanic circulation. Equations for chemical cycles. Equations for the growth of vegetation. Some of these equations are simple physical laws, but some are empirical approximations of processes that occur at a scale too small to be simulated directly.

Cloud droplets, for example, might be a couple hundredths of a millimeter in diameter, while the smallest grid cells that are considered in a model may be more like a couple hundred kilometers across. Instead of trying to model individual droplets, scientists approximate their bulk behavior within each grid cell. These approximations are called “parameterizations.”
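
To make the idea of a parameterization concrete, here is a minimal sketch in Python. It estimates how much of a grid cell is cloudy from nothing but the cell's average relative humidity. The function name, the square-root form, and the 0.8 threshold are assumptions chosen for illustration; they are not taken from any of the models discussed in this article.

```python
import math

def cloud_fraction(rel_humidity: float, rh_crit: float = 0.8) -> float:
    """Estimate the cloudy fraction of a grid cell from its mean relative humidity.

    rel_humidity and rh_crit are fractions between 0 and 1 (rh_crit < 1).
    rh_crit is a hypothetical tunable threshold: below it the cell is
    treated as cloud-free, and at saturation it is fully overcast.
    """
    if rel_humidity <= rh_crit:
        return 0.0
    rh = min(rel_humidity, 1.0)  # cap at saturation
    return 1.0 - math.sqrt((1.0 - rh) / (1.0 - rh_crit))

# A cell that is on average 95% humid comes out roughly half cloudy.
print(round(cloud_fraction(0.95), 3))
```

The single number rh_crit stands in for all the droplet-scale physics the model cannot resolve, which is why values like it are the kind of thing that gets tuned against observations.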

Connect all those equations together and the model operates like a virtual, rudimentary Earth. So long as the models behave realistically, they allow scientists to test hypotheses as well as make predictions testable by new observations.

Some components of the climate system are connected in a fairly direct manner, but some processes are too complicated to think through intuitively, and climate models can help us explore the complexity. For example, it’s possible that shrinking sea ice in the Arctic could increase snowfall over Siberia, pushing the jet stream southward, creating summer high pressures in Europe that allow India’s monsoon rains to linger, and on it goes… It’s hard to examine those connections in the real world, but it’s much easier to see how things play out in a climate model. Twiddle some knobs, run the model. Twiddle again, see what changes. You get to design your own experiment, a rare luxury in some of the Earth sciences.

Figure: Diagram of software architecture for the Community Earth System Model. Coupled models use interacting components simulating different parts of the climate system. Bubble size represents the number of lines of code in each component of this particular model. (Credit: Kaitlin Alexander, Steve Easterbrook)

In order to gain useful insights, we need climate models that behave realistically. Climate modelers are always working to develop an ever more faithful representation of the planet’s climate system. At every step along the way, the models are compared to as much real-world data as possible. They’re never perfect, but these comparisons give us a sense for what the model can do well and where it veers off track. That knowledge guides the use of the model, in that it tells us which results are robust and which are too uncertain to be relied upon.

Andrew Weaver, a researcher at the University of Victoria, uses climate models to study many aspects of the climate system and anthropogenic climate change. Weaver described the model evaluation process as including three general phases. First, you see how the model simulates a stable climate with characteristics like the modern day. “You basically take a very long run, a so-called ‘control run,’” Weaver told Ars. “You just do perpetual present-day type conditions. And you look at the statistics of the system and say, ‘Does this model give me a good representation of El Niño? Does it give me a good representation of Arctic Oscillation? Do I see seasonal cycles in here? Do trees grow where they should grow? Is the carbon cycle balanced?’ ”

Next, the model is run in changing conditions, simulating the last couple centuries using our best estimates of the climate “forcings” (or drivers of change) at work over that time period. Those forcings include solar activity, volcanic eruptions, changing greenhouse gas concentrations, and human modifications of the landscape. “What has happened, of course, is that people have cut down trees and created pasture, so you actually have to artificially come in and cut down trees and turn it into pasture, and you have to account for this human effect on the climate system,” Weaver said.

The results are compared to observations of things like changing global temperatures, local temperatures, and precipitation patterns. Did the model capture the big picture? How about the fine details? Which fine details did it simulate poorly, and why might that be?

Figure: Comparison of observed (top) and simulated (bottom) average annual precipitation between 1980 and 1999. (Credit: IPCC)

At this point, the model is set loose on interesting climatic periods in the past. Here, the observations are fuzzier. Proxy records of climate, like those derived from ice cores and ocean sediment cores, track the big-picture changes well but can’t provide the same level of local detail we have for the past century. Still, you can see if the model captures the unique characteristics of that period and whatever regional patterns we’ve been able to identify.

This is what models go through before researchers start using them to investigate questions or provide estimates for summary reports like those produced for the Intergovernmental Panel on Climate Change (IPCC).

Coding the climate

Some voices in the public debate over climate science have been critical of the fact that climate models have no standardized, independent testing protocol comparable to those used for commercial and engineering applications. Climate scientists have responded that climate models differ so much from that kind of software that an “independent verification and validation” process is a poor fit.

Steve Easterbrook, a professor of computer science at the University of Toronto, has been studying climate models for several years. “I’d done a lot of research in the past studying the development of commercial and open source software systems, including four years with NASA studying the verification and validation processes used on their spacecraft flight control software,” he told Ars.

When Easterbrook started looking into the processes followed by climate modeling groups, he was surprised by what he found.

“I expected to see a messy process, dominated by quick fixes and muddling through, as that’s the typical practice in much small-scale scientific software. What I found instead was a community that takes very seriously the importance of rigorous testing, and which is already using most of the tools a modern software development company would use (version control, automated testing, bug tracking systems, a planned release cycle, etc.).”

“I was blown away by the testing process that every proposed change to the model has to go through,” Easterbrook wrote.

“Basically, each change is set up like a scientific experiment, with a hypothesis describing the expected improvement in the simulation results. The old and new versions of the code are then treated as the two experimental conditions. They are run on the same simulations, and the results are compared in detail to see if the hypothesis was correct. Only after convincing each other that the change really does offer an improvement is it accepted into the model baseline.”
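
The workflow Easterbrook describes can be sketched in a few lines of Python. This is only an illustration of the logic, not the tooling any modeling center actually uses: the function names are hypothetical, the numbers are made up, and a real evaluation compares many fields in detail rather than a single error score.

```python
import math

def rmse(simulated, observed):
    """Root-mean-square error between a simulated and an observed series."""
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return math.sqrt(sum(squared) / len(squared))

def change_improves(old_run, new_run, observed, min_gain=0.0):
    """Hypothesis test for a code change: does the new version reduce the
    error against observations by more than min_gain?"""
    return rmse(old_run, observed) - rmse(new_run, observed) > min_gain

# Made-up illustrative values: both versions run on the same simulation
# and are scored against the same observations.
observed = [14.0, 14.1, 14.3, 14.2, 14.5]
old_run = [13.6, 13.9, 14.6, 14.5, 14.9]
new_run = [13.9, 14.0, 14.4, 14.3, 14.6]

print(change_improves(old_run, new_run, observed))
```

The old and new code play the role of the two experimental conditions; the change is accepted only if the hypothesized improvement actually shows up.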

Easterbrook spent two months at the UK Met Office Hadley Centre, observing and describing the operations of the climate modeling group (which is about 200 scientists strong). He looked at everything from code efficiency to debugging to the development process. He couldn’t find much to critique, concluding that “it is hard to identify potential for radical improvements in the efficiency of what is a ‘grand challenge’ science and software engineering problem.”

Easterbrook has argued against the idea that an independent verification and validation protocol could usefully be applied to climate models. One problem he sees is that climate models are living scientific tools that are constantly evolving rather than pieces of software built to achieve a certain goal. There is, for the most part, no final product to ship out the door. There’s no absolute standard to compare it against either.

To give one example, adding more realistic physics or chemistry to some component of a model sometimes makes simulations fit some observations less well. Whether you add it or not then depends on what you’re trying to achieve. Is the primary test of the model to match certain observations or to provide the most realistic possible representation of the processes that drive the climate system? And which observations are the most important to match? Patterns of cloud cover? Sea surface temperature?

As more features have been added, current models have become much more sophisticated than models were 20 years ago, so the standards by which they’re judged have tightened. It’s entirely possible that earlier models would have failed testing that today’s models would pass. But that doesn’t mean that the older models were useless; they may have just gotten fewer physical processes right or had a much lower resolution.

If, as Easterbrook argues, the models are essentially manifestations of the scientific community’s best available knowledge, there’s already a process in place to evaluate them: science. Experiments are replicated by other groups using their own models. Individual peer-reviewed studies are considered in the context of the accumulated knowledge of climate science. Climate models are not so different from other methods of inquiry that a new scientific method must be invented especially for them.

Firing up the wayback machine

The individual researchers who are part of these modeling efforts work on very different aspects of the model, and each requires a slightly different way of doing things. Bette Otto-Bliesner works on the Community Earth System Model at the National Center for Atmospheric Research (which recently opened a new supercomputing center). Her research focuses on using climate models to understand past climate, working out the mechanisms that drove the events recorded in things like ocean sediment cores. “My research goal is to understand the uncertainties in the climate and Earth system responses to forcings using past time periods to provide more confidence in our projections of future change,” Otto-Bliesner told Ars.

Proxy records of climate from cores of ice or ocean sediments are limited to providing information about the geographic area from which they were collected, so climate models can help fill in the rest of the global picture. A model simulation of actual events—say, an immense ice-dammed lake draining into the North Atlantic and disrupting ocean circulation—can be compared to a network of proxy records to see if the simulated climate impact is consistent with what the proxies show. If the match is poor, then perhaps the observed change in climate was caused by something else.

Otto-Bliesner’s group is working to take this comparison one step further by having the model simulate the processes that create the proxy records as well. Instead of comparing the model to the interpretation of the proxy record data (such as temperature changes inferred from shifting isotope ratios), that data could be compared directly to a virtual version of the isotopes themselves, one produced by the model.

These paleoclimate simulations can serve to evaluate a model as well. The model can be run for interesting time periods, like the end of the last ice age, to see how well it simulates changes in temperature and ocean circulation. “We want to keep our paleo-simulations [separate] as an independent test of our models to changed forcings, so they are not included in the development process,” Otto-Bliesner told Ars. Since the climate was very different at times in the past, these tests help illuminate a model’s strengths and weaknesses.

Figure: Snapshot from an experiment simulating the last 22,000 years. In the graph at the bottom, the dark line represents simulated surface temperature over Greenland and the lighter line shows data from a Greenland ice core. (Credit: National Center for Atmospheric Research/University Corporation for Atmospheric Research)

Setting the bar

Gavin Schmidt, a climate researcher at the NASA Goddard Institute for Space Studies, is more involved in the development itself. “I explore issues like how one evaluates [climate] models, how comparisons between models and observations should be done, and how one builds credibility in predictions,” he told Ars.

Improving the model means better simulating physical processes, Schmidt says, which doesn’t necessarily improve the large-scale match with every set of observations. “There are always observational datasets that show a mismatch to the model—either regionally or in time,” Schmidt explained. “Some of these mismatches are persistent (i.e., we haven’t found any way to alleviate them); some are related to issues/parameters that we have more of a handle on, and so they can be reduced in the next iteration. One problem is that in fixing one problem one often makes something else worse. Therefore, it is a balancing act that each model center does a little differently.”

One surprisingly common misconception about climate models is that they’re just exercises in curve-fitting: the global average temperature record is fed into the model, which matches that trend and spits out a simulation just like it. In this view, a model that compares well with reality is a necessary outcome of the process, so the match wouldn’t demonstrate that climate models can be trusted to usefully project future trends. This line of thinking is mistaken for several reasons.

There’s obviously more to a climate model than a graph of global average temperature. Some parameterizations—those stand-ins for processes that occur at scales finer than a grid cell—are tuned to match observations. After all, they are attempts to describe a process in terms of its large-scale results. But successful parameterizations aren’t used as a gauge of how well the model is reproducing reality. “Obviously, since these factors are tuned for, they don’t count as a model success. However, the model evaluations span a much wider and deeper set of observations, and when you do historical or paleoclimate simulations, none of the data you are interested in has been tuned for,” Schmidt told Ars.

Why so cirrus?

Many of the most important parameterizations involve the complex behavior of clouds. Representing these processes effectively in a climate model is a key challenge, not just because they happen at scales far smaller than grid cells but because clouds play such a big role in the climate system. Storm patterns affect regional climate in many ways, and the way clouds respond to a warming climate could either enhance or partially offset the temperature change.

Tony Del Genio, another researcher at the NASA Goddard Institute for Space Studies, works on improving the way models simulate clouds. “The real world is more complicated than any model of it,” Del Genio told Ars. “Given the limited computing and human resources, we have to prioritize. We try to anticipate which processes that are missing from the model might be most important to include in the next-generation version (not everything that happens in the atmosphere is important to climate).”

“Once we identify a physical process we want to add or improve, we start with whatever fundamental understanding of the process that we have, and then we try to develop a way to approximately represent it in terms of the variables in the model (temperature, humidity, etc.) and write computer code to represent that,” Del Genio said. “We then run the model with the new process in it and we look for two things: whether the process as we have portrayed it behaves the way it does in the real world and whether or not it makes some aspect of the model’s climate more realistic. We do this by comparison to observations, either field experiment, satellite, or surface remote sensing observations, or by comparing to fine-scale models that simulate individual cloud systems.”

Del Genio says that while modelers used to focus more on whether the model simulations looked like the average conditions for an area, they’ve learned that other types of behavior, like large-scale weather patterns, are better indicators of the usefulness of a model for projecting into the future. “A good example of that is something called the Madden-Julian Oscillation (MJO for short), which most people in the US have probably never heard of,” Del Genio said. “The MJO causes alternating periods of very rainy and then mostly clear weather over periods of a month or so over the Indian Ocean and in southeast Asia and is very important to people in that part of the world. It also affects winter rainfall in the western US. It turns out that whether a model simulates the MJO or not depends strongly on how one represents the clouds that develop into thunderstorms in the model, so we observe it closely and try hard to get it right.”

Del Genio also gets to apply his knowledge and skills to other planets. Using the extremely limited information we have about the atmospheres of other planets, models can help work out how they behave. “For other planets, we are still asking basic questions about how a given planet’s atmosphere works—how fast do its winds blow and why, does it have storms like those on Earth, are those storms made of water clouds like on Earth, and why one planet differs from another,” Del Genio said.

Ice, on the rocks

While Tony Del Genio has his head in the clouds and outward into the Solar System beyond, Penn State glaciologist Richard Alley stands on ice sheets miles thick, thinking about what’s going on beneath his feet. Instead of trying to model the whole climate system, he’s focused on the behavior of valley glaciers and ice sheets. “An ice sheet is a two-mile-thick, one-continent-wide pile of old snow squeezed to ice under the weight of more snow and spreading under its own weight,” Alley told Ars. “The impetus for flow is essentially the excess pressure inside the ice compared to outside, and it’s usually quantified as being the product of the ice density, gravitational acceleration, thickness of ice above the point you’re talking about, and surface slope.”
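
Alley's description of the impetus for flow translates directly into a one-line calculation. The Python sketch below multiplies the four quantities he lists. The function name, the default ice density of 917 kilograms per cubic meter, and the example thickness and slope are assumptions added for illustration.

```python
def driving_stress(thickness_m: float, surface_slope: float,
                   ice_density: float = 917.0, gravity: float = 9.81) -> float:
    """Driving stress in pascals: ice density (kg/m^3) times gravitational
    acceleration (m/s^2) times thickness of overlying ice (m) times
    surface slope (dimensionless rise over run)."""
    return ice_density * gravity * thickness_m * surface_slope

# Hypothetical example: 3 km of ice with a surface that drops 1 m per km.
stress_pa = driving_stress(thickness_m=3000.0, surface_slope=0.001)
print(f"{stress_pa / 1000:.1f} kPa")
```

Thicker ice or a steeper surface both raise the stress, which is why the shape of an ice sheet matters as much as its size.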

Ice sheet models use the equations that describe that flow of ice to simulate how the ice sheet changes over time in response to outside factors. The size of an ice sheet, like a bank account, is determined by the balance of gains and losses. Increase the amount of melting going on at the edges of the ice sheet and it will shrink. Increase the amount of snowfall over the cold, central region of the ice sheet and it will grow. Lubricate the base of the ice sheet with liquid water, and it may flow faster to the sea, causing an overall loss of ice.
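
The bank-account analogy can be written out as a simple bookkeeping loop. This Python sketch is a toy, not an ice sheet model: it tracks a single total mass, the function names are hypothetical, and the yearly gains and losses (in gigatonnes) are made-up round numbers that stay fixed over time.

```python
def step_ice_mass(mass_gt: float, snowfall_gt: float,
                  melt_gt: float, discharge_gt: float) -> float:
    """Advance the ice sheet's mass by one year: deposits (snowfall)
    minus withdrawals (edge melting and ice flowing to the sea)."""
    return mass_gt + snowfall_gt - melt_gt - discharge_gt

def run_years(mass_gt: float, snowfall_gt: float, melt_gt: float,
              discharge_gt: float, years: int) -> float:
    """Apply the same yearly balance repeatedly."""
    for _ in range(years):
        mass_gt = step_ice_mass(mass_gt, snowfall_gt, melt_gt, discharge_gt)
    return mass_gt

# Made-up numbers: losses exceed gains by 10 Gt per year, so the sheet shrinks.
print(run_years(1000.0, snowfall_gt=50.0, melt_gt=30.0, discharge_gt=30.0, years=10))
```

A real ice sheet model computes each of these terms from the flow equations and the climate conditions at every grid point, and they change as the ice sheet itself changes shape.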

These models are complex and detailed enough that they’re usually run on their own rather than within a climate model that is already busy trying to handle the rest of the planet. Depending on the experiment being run with the model, climate conditions simulated by another model might be imported or a simpler, pre-determined scenario might suffice.

Like global climate models, ice sheet models can also be evaluated against what we know about the past. “Does the model put ice in places that ice was known to have been and not in places where ice was absent?” Alley said. “Are the fluctuations of ice in response to orbital forcing in the past configuration consistent with the reconstructed changes in sea level based on coastal indicators or isotopic composition of the ocean as inferred from ratios in particular shells in sediment cores?”

All this work eventually contributes to our understanding of how the ice sheet is likely to behave in the future. “For these projections to be reliable, we want to see similar behavior in a range of models, from simple to complex, run by different groups, and to understand physically why the models are producing the results they do; we’re especially confident if the paleoclimatic record shows a similar response to similar forcings in the past, and if we see the projected behavior emerging now in response to the recent human and natural forcings,” Alley said. “With all four—physical understanding, agreement in a range of models, observed in paleo and emerging now—we’re pretty confident; with fewer, less so.”

Along with providing better estimates of how ice sheets will contribute to sea level rise, ice sheet models also help generate research questions. By revealing the biggest sources of uncertainty, models can point to the types of measurements and research that will yield the greatest bang for the buck.

Figure: Simulation of ice sheet elevation at the peak of the last ice age using the Parallel Ice Sheet Model and the ECHAM5 climate model. (Credit: Florian Ziemen, Christian Rodehacke, Uwe Mikolajewicz, Max Planck Institute for Meteorology)

Community service

There’s another way in which these climate models are probed—by comparing them with each other. Since there are so many groups of researchers independently building their own models to approximate the climate system, the similarities and differences of their simulations can be illuminating.

Observational data is necessarily limited, but every single thing in a model can be examined. That makes model-to-model comparison more of an apples-to-apples affair when they’re run using the same inputs (like greenhouse gas emissions scenarios). The cause of a poor match between some portion of a model and reality isn’t always obvious, whereas it could jump out when the results are compared to those produced by another model.

There are many such “model intercomparison projects,” including ones focused on atmospheric models, paleoclimate simulations, or geoengineering research. The largest is the Coupled Model Intercomparison Project (CMIP), which has become an important resource for the Intergovernmental Panel on Climate Change reports. What started in 1995 as a simple project blossomed into an enormously useful organizing force for an abundance of research.

Each phase of the project includes a set of experiments chosen by the modeling community. In the latest round, for example, the models have been investigating short-term, decadal predictions, the way clouds change in a warming climate, and a new technique for making comparisons between model results and atmospheric data from satellites.

Apart from helping research groups improve their models, CMIP also makes climate simulations from all the models involved accessible to other researchers. Interested in the future behavior of Himalayan glaciers? Or the economic impact of changes in precipitation over the US? Simulations from a variety of models for a range of emissions scenarios are conveniently available in one place and in standardized formats. In a way, that coordination also increases the value of the studies that use this data. If three different studies on species migration caused by climate change each used arbitrarily different scenarios for the future, comparing their results could be more difficult.

The most visible product of CMIP has probably been its contribution to the IPCC reports. When the reports show model ensembles (many simulations averaged together), they’re pulling from the CMIP collection. Rather than choosing a preferred model, the IPCC essentially works from the average of all of them, while the range of their results is used as an indicator of uncertainty. In this way, the work of independent modeling groups around the world is aggregated to help inform policy makers.
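
The basic idea of an ensemble summary, an average plus a spread, fits in a few lines of Python. The model names and warming values below are invented for illustration, and the function is hypothetical; the actual processing behind the IPCC figures is considerably more involved.

```python
from statistics import mean

def summarize_ensemble(projections: dict) -> dict:
    """Reduce one projected value per model to an ensemble mean and a
    low-to-high range that serves as a rough indicator of uncertainty."""
    values = list(projections.values())
    return {"mean": mean(values), "low": min(values), "high": max(values)}

# Made-up projected warming (degrees C) from four hypothetical models
# run under the same emissions scenario.
projections = {"model_a": 2.1, "model_b": 2.8, "model_c": 3.4, "model_d": 2.5}

summary = summarize_ensemble(projections)
print(f"{summary['mean']:.2f} C (range {summary['low']} to {summary['high']})")
```

No single model is singled out as the right one; the spread between the lowest and highest result is itself part of the answer.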

No crystal ball—but no magic 8 ball, either

If you only tune in to public arguments about climate change or read about the latest study that uses climate models, it’s easy to lose sight of the truly extraordinary achievement those models represent. As Andrew Weaver told Ars, “What is so remarkable about these climate models is that it really shows how much we know about the physics and chemistry of the atmosphere, because they’re ultimately driven by one thing—that is, the Sun. So you start with these equations, and you start these equations with a world that has no moisture in the atmosphere that just has seeds on land but has no trees anywhere, that has an ocean that has a constant temperature and a constant amount of salt in it, and it has no sea ice, and all you do is turn it on. [Flick on] the Sun, and you see this model predict a system that looks so much like the real world. It predicts storm tracks where they should be, it predicts ocean circulation where it should be, it grows trees where it should, it grows a carbon cycle—it really is remarkable.”

But climate scientists know models are just scientific tools—nothing more. In studying the practices of climate modeling groups, Steve Easterbrook saw this firsthand. “One of the most common uses of the models is to look for surprises—places where the model does something unexpected, primarily as a way of probing the boundaries of what we know and what we can simulate,” he said. “The models are perfectly suited for this. They get the basic physical processes right but often throw up surprises in the complex interactions between different parts of the Earth system. It is in these areas where the scientific knowledge is weakest. So the models help guide the scientific process.”

“So I have tremendous respect for what the models are able to do (actually, I’d say it’s mind-blowing), but that’s a long way from saying that any one model can give accurate forecasts of climate change in the future on any timescale,” Easterbrook continued. “I’m particularly impressed by how much this problem is actively acknowledged and discussed in the climate modeling community and how cautious the modelers are in working to avoid any possible over-interpretation of model results.”

“One of the biggest sources of confidence in the models is that they give results that are broadly consistent with one another (despite some very different scientific choices in different models), and they give results that are consistent with the available data and current theory,” Easterbrook said. And while they’re being developed, the rest of the broad field of climate science is hard at work gathering more data and developing our theoretical understanding of the climate system—information that will inform the next generation of models.

The guiding principle in modeling of any kind was summarized by George E.P. Box when he wrote that “all models are wrong, but some are useful.” Climate scientists work hard to ensure that their models are useful, whether to understand what happened in the past or what could happen in the future.

Every projection showing multiple scenarios for future greenhouse gas emissions illustrates the present moment as a constantly shifting crossroads—the point where all future paths diverge, with their course determined using climate models. Armed with that map, we get to decide which of the possible paths we are going to make reality. The more we understand about the climate system and the more realistically climate models behave, the more detailed that map becomes. There’s always more to work out, but we’ve already advanced well past the stage where we need to ask for directions.