This blog provides updated forecasts and comments on current weather or other topics

Saturday, April 7, 2012

Lack of Computer Power Undermines U.S. Numerical Weather Prediction (Revised)

In my last blog on this subject, I provided objective evidence of how U.S. numerical weather prediction (NWP), and particularly our global prediction skill, lags behind major international centers, such as the European Centre for Medium-Range Weather Forecasts (ECMWF), the UKMET office, and the Canadian Meteorological Centre (CMC). I mentioned briefly how the problem extends to high-resolution weather prediction over the U.S. and the use of ensemble (many model runs) weather prediction, both globally and over the U.S. Our nation is clearly number one in meteorological research and we certainly have the knowledge base to lead the world in numerical weather prediction, but for a number of reasons we are not. The cost of inferior weather prediction is huge: in lives lost, injuries sustained, and economic impacts unmitigated. Truly, a national embarrassment. And one we must change.

In this blog, I will describe in some detail one major roadblock in giving the U.S. state-of-the-art weather prediction: inadequate computer resources. This situation should clearly have been addressed years ago by leadership in the National Weather Service, NOAA, and the Department of Commerce, but it has not been, and I am convinced it will not be without outside pressure. It is time for the user community and our congressional representatives to intervene. To quote Samuel L. Jackson, enough is enough.

Enough is Enough! The U.S. Needs Better NWP

In the U.S. we are trying to use fewer computer resources to do more tasks than the global leaders in numerical weather prediction. (Note: U.S. NWP is done by the Environmental Modeling Center (EMC) of the National Centers for Environmental Prediction (NCEP).) This chart tells the story:

Courtesy of Bill Lapenta, EMC.

ECMWF does global high resolution and ensemble forecasts, and seasonal climate forecasts. UKMET office also does regional NWP (England is not a big country!) and regional air quality. NCEP does all of this plus much, much more (high resolution rapid update modeling, hurricane modeling, etc.). And NCEP has to deal with prediction over a continental-size country.

If you expect the U.S. to have a lot more computer power to balance all these responsibilities and tasks, you would be very wrong. Right now the U.S. NWS has two IBM supercomputers, each with 4992 processors (IBM Power6 processors). One computer does the operational work; the other is for backup (research and testing runs are done on the backup). Each machine delivers about 70 teraflops (trillion floating-point operations per second).

NCEP (U.S.) Computer

The European Centre has a newer IBM machine with 8192 much faster processors that delivers 182 teraflops (yes, over twice as fast and with far fewer tasks to do).

The UKMET office, serving a far, far smaller country, has two newer IBM machines, each with 7680 processors for 175 teraflops per machine.
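To put the machine figures above side by side, here is a small sketch that computes each center's speed relative to one NCEP machine and the rough throughput per processor. The numbers are just the per-machine values quoted in this post (the NCEP figure is "about 70"), and the dictionary layout and function names are my own for illustration.

```python
# Per-machine figures quoted in the post; the NCEP value is approximate.
machines = {
    "NCEP (U.S.)": {"processors": 4992, "teraflops": 70.0},
    "ECMWF": {"processors": 8192, "teraflops": 182.0},
    "UKMET office": {"processors": 7680, "teraflops": 175.0},
}


def relative_speed(name, baseline="NCEP (U.S.)"):
    """Teraflops of one machine divided by teraflops of the baseline machine."""
    return machines[name]["teraflops"] / machines[baseline]["teraflops"]


def gigaflops_per_processor(name):
    """Average throughput per processor, in gigaflops (1 teraflop = 1000 gigaflops)."""
    m = machines[name]
    return m["teraflops"] * 1000.0 / m["processors"]


for name in machines:
    print(
        f"{name}: {relative_speed(name):.1f}x one NCEP machine, "
        f"{gigaflops_per_processor(name):.1f} gigaflops per processor"
    )
```

With these inputs, the European and UKMET machines each come out at roughly two and a half times one NCEP machine, and their processors are individually faster as well.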

Here is a figure, produced at NCEP, that compares the relative computer power of NCEP's machine with the European Centre's. The shading indicates computational activity and the x-axis for each represents a 24-h period. The relative heights allow you to compare computer resources. Not only does the ECMWF have much more computer power, but they are more efficient in using it...packing useful computations into every available minute.

Courtesy of Bill Lapenta, EMC

Recently, NCEP had a request for proposals for a replacement computer system. You may not believe this, but the specifications were ONLY for a system at least equal to the one that they have. A report in a computer magazine suggests that perhaps this new system (IBM got the contract) might be slightly less powerful (around 150 teraflops) than one of the UKMET office systems...but that is not known at this point.

The Canadians? They have TWO machines like the European Centre's!

So what kind of system does NCEP require to serve the nation in a reasonable way?

To start, we need to double the resolution of our global model to bring it into line with ECMWF (they are now 15 km global). Such resolution allows the global model to model regional features (such as our mountains). Doubling horizontal resolution requires 8 times more computer power (twice as many grid points in each horizontal direction, plus a time step half as long). We need to use better physics (description of things like cloud processes and radiation). Double again. And we need better data assimilation (better use of observations to provide an improved starting point for the model). Double once more. So we need 32 times more computer power for the high-resolution global runs to allow us to catch up with ECMWF. Furthermore, we must do the same thing for the ensembles (running many lower resolution global simulations to get probabilistic information). 32 times more computer resources for that (we can use some of the gaps in the schedule of the high resolution runs to fit some of this in...that is what ECMWF does). There are some potential ways NCEP can work more efficiently as well. Right now NCEP runs our global model out to 384 hours four times a day (every six hours). To many of us this seems excessive; perhaps the longest periods (180 hr plus) could be done twice a day. So let's begin with a computer 32 times faster than the current one.
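The arithmetic behind that factor of 32 can be written out as a short sketch. The cubic scaling for resolution is an assumption on my part (grid points grow in both horizontal directions and the time step shrinks in proportion, with vertical levels unchanged), and the schedule comparison assumes every forecast hour costs the same, which is a simplification. All names here are illustrative.

```python
def resolution_cost_factor(refinement):
    """Cost multiplier for refining horizontal grid spacing by `refinement`.

    Assumption: cost scales as refinement**3 (two horizontal directions
    plus a proportionally shorter time step); vertical levels are unchanged.
    """
    return refinement ** 3


# The three upgrades named in the text and their cost multipliers.
upgrades = {
    "double horizontal resolution": resolution_cost_factor(2),
    "better physics": 2,
    "better data assimilation": 2,
}

total_factor = 1
for name, factor in upgrades.items():
    total_factor *= factor
    print(f"{name}: x{factor} (running total x{total_factor})")

# Schedule idea from the text: run the 180-384 h portion only twice a day.
# Assumption: cost is proportional to forecast hours integrated per day.
current_hours = 4 * 384
proposed_hours = 2 * 384 + 2 * 180
print(f"forecast hours per day: {current_hours} now, {proposed_hours} proposed")
print(f"fraction of current load: {proposed_hours / current_hours:.2f}")
```

Under these assumptions the multipliers compound to 32, and trimming the long-range runs to twice a day would cut the integrated forecast hours by roughly a quarter.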

Many workshops and meteorological meetings (such as one on improvements in model physics that was held at NCEP last summer, which I chaired) have made a very strong case that the U.S. requires an ensemble prediction system that runs at 4-km horizontal resolution. The current national ensemble system has a horizontal resolution of about 32 km...and NWS plans to get to about 20 km in a few years...both are inadequate. Here is an example of the ensemble output (mean of the ensemble members) for the NWS and UW (4 km) ensemble systems: the difference is huge--the NWS system does not even get close to modeling the impacts of the mountains. It is similarly unable to simulate large convective systems.

Current NWS (NCEP) "high resolution" ensembles (32 km)

4 km ensemble mean from UW system

Let me make one thing clear. Probabilistic prediction based on ensemble forecasts and reforecasting (running models back for years to get statistics of performance) is the future of weather prediction. The days of giving a single number for, say, temperature at day 5 are over. We need to let people know about uncertainty and probabilities. The NWS needs a massive increase of computer power to do this. It lacks this computer power now and does not seem destined to get it soon.

A real champion within NOAA of the need for more computer power is Tom Hamill, an expert on data assimilation and model post-processing. He and colleagues have put together a compelling case for more NWS computer resources for NWP. Read it here.

Back-of-the-envelope calculations indicate that a good first step (4-km national ensembles) would require about 20,000 processors to run in a timely manner, but it would revolutionize weather prediction in the U.S., including forecasting convection and forecasting in mountainous areas. This high-resolution ensemble effort would meld with data assimilation over the long term.

And then there is running super-high resolution numerical weather prediction to get fine-scale details right. Here in the NW my group runs a 1.3 km horizontal resolution forecast out twice a day for 48h. Such capability is needed for the entire country. It does not exist now due to inadequate computer resources.

The bottom line is that the NWS numerical modeling effort needs a huge increase of computer power to serve the needs of the country--and the potential impacts would be transformative. We could go from having a third-place effort, which is slipping back into the pack, to a world leader. Furthermore, the added computer power will finally allow NOAA to complete Observing System Simulation Experiments (OSSEs) and Observing System Experiments (OSEs) to make rational decisions about acquisitions of very expensive satellite systems. The fact that this is barely done today is really amazing and a potential waste of hundreds of millions of dollars on unnecessary satellite systems.

But to do so will require a major jump in computational power, a jump our nation can easily afford. I would suggest that NWS's EMC should begin by securing at least a 100,000-processor machine, and down the road something considerably larger. Keep in mind my department has about 1000 processors in our computational clusters, so this is not as large as you think.

For a country with several billion-dollar weather disasters a year, investment in reasonable computer resources for NWP is obvious.

The cost? Well, I asked Art Mann of Silicon Mechanics (a really wonderful local vendor of computer clusters) to give me a rough quote: using fast AMD chips, you could have such a 100K core machine for 11 million dollars (and this is without any discount!). OK, this is the U.S. government and they like expensive, heavy metal machines....let's go for 25 million dollars. The National Center for Atmospheric Research (NCAR) is getting a new machine with around 75,000 processors and the cost will be around 25-35 million dollars. NCEP will want two machines, so let's budget 60 million dollars. We spend this much money on a single jet fighter, but we can't invest this amount to greatly improve forecasts and public safety in the U.S.? We have machines far larger than this for breaking codes, doing simulations of thermonuclear explosions, and simulating climate change.
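For those who like to check the numbers, here is a quick sketch of the cost per core implied by the figures above. It treats "processors" and "cores" as interchangeable, which is a simplification on my part, and the function and variable names are illustrative.

```python
def cost_per_core(total_dollars, cores):
    """Dollars per core for a machine of the given size and price."""
    return total_dollars / cores


# Figures quoted in the post.
vendor_quote = cost_per_core(11e6, 100_000)   # Silicon Mechanics rough quote
padded_quote = cost_per_core(25e6, 100_000)   # the "heavy metal" allowance
ncar_low = cost_per_core(25e6, 75_000)        # NCAR machine, low estimate
ncar_high = cost_per_core(35e6, 75_000)       # NCAR machine, high estimate

print(f"vendor quote:  ${vendor_quote:,.0f} per core")
print(f"padded quote:  ${padded_quote:,.0f} per core")
print(f"NCAR estimate: ${ncar_low:,.0f} to ${ncar_high:,.0f} per core")

# Two machines (operational plus backup) at the budgeted total.
budget_total = 60e6
print(f"budget per machine: ${budget_total / 2 / 1e6:,.0f} million")
```

The padded figure sits between the vendor quote and the NCAR range, so the 60 million dollar budget for two machines is in line with what comparable systems are said to cost.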

Yes, a lot of money, but I suspect the cost of the machine would be paid back in a few months from improved forecasts. Last year we had quite a few (over ten) billion-dollar storms....imagine the benefits of forecasting even a few of them better. Or the benefits to the wind energy and utility industries, or U.S. aviation, of even modestly improved forecasts. And there is no doubt such computer resources would improve weather prediction. The list of benefits is nearly endless. Recent estimates suggest that normal weather events cost the U.S. economy nearly 1/2 trillion dollars a year. Add to that hurricanes, tornadoes, floods, and other extreme weather. The business case is there.
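A rough sense of the payback arithmetic, using the 60 million dollar budget and the 1/2 trillion dollar annual cost figure quoted in this post. The three-month payback window is purely my illustrative reading of "a few months", and the variable names are hypothetical.

```python
machine_budget = 60e6          # two machines, from the budget in the post
annual_weather_cost = 0.5e12   # "nearly 1/2 trillion dollars a year"

# Share of one year's normal weather cost that equals the machine budget.
budget_share = machine_budget / annual_weather_cost
print(f"budget as share of annual weather cost: {budget_share:.4%}")

# Assumption: "a few months" taken as 3 months, for illustration only.
payback_months = 3
required_annual_savings = machine_budget / (payback_months / 12)
required_share = required_annual_savings / annual_weather_cost
print(f"savings needed for a {payback_months}-month payback: "
      f"${required_annual_savings / 1e6:,.0f} million per year "
      f"({required_share:.3%} of annual weather cost)")
```

Under these assumptions, better forecasts would only need to trim a tiny fraction of a percent off the annual weather bill to cover the machines.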

As someone with an insider's view of the process, it is clear to me that the current players are not going to move effectively without some external pressure. In fact, the budgetary pressure on the NWS is very intense right now and they are cutting away muscle and bone at this point (like reducing IT staff in the forecast offices by over 120 people and cutting back on extramural research). I believe it is time for weather-sensitive industries and local government, together with the general public, to let NOAA management and our congressional representatives know that this acute problem needs to be addressed and addressed soon. We are acquiring huge computer resources for climate simulations, but only a small fraction of that for weather prediction...which can clearly save lives and help the economy. Enough is enough.

That Wired article was clearly written by somebody who doesn't understand orders of magnitude very well, but it's true there are probably hundreds of thousands of cores around in the spook community to be borrowed.

Are weather models too "synchronous" a computation to be distributed as a grid task? Cliff - I am sure there are people at UW computer science to help you put your model into Boinc, if it makes sense.

In theory, the gov't should be able to do something similar with its existing server farms. But I can certainly see that maybe the task is better suited for a traditional supercomputer.

Do the types of computations required for weather prediction lend themselves to a distributed computing system? A distributed computing model similar to Folding@Home (6 petaFLOPS) and SETI@Home (500 TFLOPS) might be the answer to this problem.

dkitch and others asked whether such computations could be distributed...as per SETI. This is very difficult because the problem really needs to be in one machine...a lot of communication between cpus/nodes is needed.....cliff
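To illustrate the communication point, here is a toy sketch, not an actual weather model: a one-dimensional diffusion step on a periodic grid, computed once on a single grid and once with the grid split into chunks standing in for hypothetical nodes. Every chunk needs the edge values of its neighbors at every time step, which is the kind of traffic that makes a SETI-style setup hard. All names and parameters are illustrative.

```python
import math


def step_global(u, alpha=0.1):
    """One explicit diffusion step on a periodic 1-D grid held in one place."""
    n = len(u)
    return [u[j] + alpha * (u[j - 1] - 2 * u[j] + u[(j + 1) % n])
            for j in range(n)]


def step_decomposed(chunks, alpha=0.1):
    """Same step with the grid split into chunks (one per hypothetical node).

    Returns the updated chunks and the number of edge values that had to be
    passed between neighboring chunks during this single step.
    """
    n = len(chunks)
    new_chunks, messages = [], 0
    for i, c in enumerate(chunks):
        left_halo = chunks[i - 1][-1]          # edge value from left neighbor
        right_halo = chunks[(i + 1) % n][0]    # edge value from right neighbor
        messages += 2
        padded = [left_halo] + c + [right_halo]
        new_chunks.append([
            padded[k + 1]
            + alpha * (padded[k] - 2 * padded[k + 1] + padded[k + 2])
            for k in range(len(c))
        ])
    return new_chunks, messages


u = [math.sin(2 * math.pi * j / 64) for j in range(64)]
chunks = [u[i:i + 16] for i in range(0, 64, 16)]   # 4 chunks of 16 points

for _ in range(10):
    u = step_global(u)
    chunks, messages = step_decomposed(chunks)

flat = [x for c in chunks for x in c]
matches = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(u, flat))
print("decomposed result matches single-grid result:", matches)
print("edge values exchanged per step:", messages)
```

The split version gives the same answer only because neighbors trade edge values before every step. A real model does this in three dimensions for many variables, many times per simulated minute, so the nodes need a very fast interconnect rather than home internet links.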

It's very disappointing to find out that the NWS is so underpowered. This seemed to start during the Reagan administration, with the same people who wanted to eliminate the USGS and NCAR completely, but for goodness' sake, we are home to both IBM and Cray, and given the cost of assembling a 24 core AMD machine that's about 3 MFLOP per core, why can't we have a gigantic cellular computer that could handle 1,000,000 nodes per CPU and have 128K CPUs or something? If anything is parallelizable it's a discrete weather model!

You mention that they all need to run on a single machine, but I'm curious if you could stitch high resolution regional forecasts into a single national model? The UW does Washington at a high resolution. What if each state had a single university do the model for their state, and the results were then stitched together for a national forecast? I'm probably oversimplifying, not knowing what it takes.

As for comparing the cost to a jet fighter, that is rather disingenuous. Improving the forecasts will not reduce the need for jet fighters. A better comparison is to the costs of the weather disasters you mention and the payback from being able to mitigate that damage in the future. ROI is based on how much you can earn or save based on your investment. That is the argument that should be made.

"What I'm not sure of is whether an investment in NOAA is better than letting private companies fill the void. Government rarely spends the money wisely"

...and this is the type of silly, anti-government attitude that agencies like NOAA & the NWS run into when they ask for more resources. The fact is, if you ask the private sector about these issues, they don't want any part of running high resolution NWP models, since it's waaay too expensive an enterprise for them.

----------------------------------

"As for comparing the cost to a jet fighter, that is rather disingenuous. Improving the forecasts will not reduce the need for jet fighters."

The fact is that the USA has more than twice as many fixed-wing, manned combat aircraft as any of our potential rivals or enemies in the world. Mr. Mass' analogy is quite correct indeed!