Podcast: DoE Awards $258 Million for Exascale to U.S. HPC Vendors

Today U.S. Secretary of Energy Rick Perry announced that six leading U.S. technology companies will receive funding from the Department of Energy’s Exascale Computing Project (ECP) as part of its new PathForward program, accelerating the research necessary to deploy the nation’s first exascale supercomputers.

“Continued U.S. leadership in high performance computing is essential to our security, prosperity, and economic competitiveness as a nation,” said Secretary Perry. “These awards will enable leading U.S. technology firms to marshal their formidable skills, expertise, and resources in the global race for the next stage in supercomputing—exascale-capable systems.”

The awardees will receive funding for research and development to maximize the energy efficiency and overall performance of future large-scale supercomputers, which are critical for U.S. leadership in areas such as national security, manufacturing, industrial competitiveness, and energy and earth sciences. The $258 million in funding will be allocated over a three-year contract period, with companies providing additional funding amounting to at least 40 percent of their total project cost, bringing the total investment to at least $430 million.
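The funding figures above can be checked with a little arithmetic. The sketch below assumes the DOE's $258 million covers exactly the remaining 60 percent of total project cost, which is the minimum-contribution case; the function and variable names are illustrative and do not come from the announcement.

```python
def total_investment(doe_funding_musd: float, vendor_share: float) -> float:
    """Return total project cost (in $M) when DOE funding covers
    the fraction (1 - vendor_share) of that total."""
    if not 0 <= vendor_share < 1:
        raise ValueError("vendor_share must be in [0, 1)")
    return doe_funding_musd / (1 - vendor_share)


doe = 258.0   # DOE contribution in $M, from the announcement
share = 0.40  # minimum vendor share of total project cost (assumed to be exact)

total = total_investment(doe, share)
vendor = total - doe

print(f"Total investment: ${total:.0f}M")
print(f"Vendor contribution: ${vendor:.0f}M")
```

Under that assumption the companies contribute about $172 million, which matches the stated total of at least $430 million. A larger vendor share would push the total higher.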

Paul Messina, Director, Exascale Computing Project

“The PathForward program is critical to the ECP’s co-design process, which brings together expertise from diverse sources to address the four key challenges: parallelism, memory and storage, reliability and energy consumption,” ECP Director Paul Messina said. “The work funded by PathForward will include development of innovative memory architectures, higher-speed interconnects, improved reliability systems, and approaches for increasing computing power without prohibitive increases in energy demand. It is essential that private industry play a role in this work going forward: advances in computer hardware and architecture will contribute to meeting all four challenges.”

The Department’s funding for this program is supporting R&D in three areas—hardware technology, software technology, and application development—with the intention of delivering at least one exascale-capable system by 2021.

Exascale systems will be at least 50 times faster than the nation’s most powerful computers today, and global competition for this technological dominance is fierce. While the U.S. has five of the 10 fastest computers in the world, its most powerful — the Titan system at Oak Ridge National Laboratory — ranks third behind two systems in China. However, the U.S. retains global leadership in the actual application of high performance computing to national security, industry, and science.

Transcript of Q&A Session:

Mike Bernhardt: All right. At this point, we have several questions that I’ll pass along now for Paul. This one comes from Rick Merritt at EE Times. And the question is, “How would you characterize supercomputer leadership of the US versus China? Specifically, does the Exascale project now aim to deliver a single system in 2021 at 30 megawatts to catch up with China?”

Paul Messina: Thank you Mike. Well on the last part, our current plan is to have delivery of at least one, not necessarily one system delivered in 2021. I would not characterize it as to catch up with China. We do know of course that China has indicated they plan to have at least one Exascale system in 2020. But we for example do not know whether that system in China will be a peak exaFLOP system versus what we’re planning to deliver through the project and associated facilities activities is what we call Exascale capability, which is measured in actual applications performance as opposed to peak speed. But I guess the concise answer to your question is at least one system in 2021 and another one, if not in 2021, in 2022.

Mike Bernhardt: So we have one from Pat Thibodeaux. And Pat asks, “What will the firms be required to deliver at the end of three years? So, what is the time frame for a fully functioning Exascale system? The plan is to build two? And how much money will you need to reach the final project goals?”

Paul Messina: So the firms will be required to deliver final reports on the outcomes of their research. But I think it’s very important to note that as indicated, this is a co-design effort with other activities and we will be having frequent formally scheduled intermediate reviews every few months of the activities. The funding for each of the vendors is based on specific work packages. And so as each work package is delivered, which would be an investigation on the particular aspect of their research, as it’s delivered, it will be evaluated. In between times, we will have progress reports and applications developers, system software developers, software library developers for example will participate in those evaluations. So it isn’t that we send the money, wait three years, and get an answer. And that’s very important because through co-design, the firms involved will get insight into our needs and can adjust to the requirements as much as feasible and at the same time, the applications developers and the software developers in the project will have early information about potential aspects of Exascale systems and be thinking about those.

Now for the second question, the time frame for a fully functional Exascale system, as I indicated in the answer to the first question, we, the Department of Energy plans to deliver at least two systems. At least one in 2021, possibly two, and others perhaps later. And by fully functioning, I would say, if it’s delivered in 2021, either one or both, historically, it takes a few months for first of a kind innovative systems to be fully functioning. So if by that you mean production, it would probably be 2022. If by fully functioning, you mean that the whole system is put together and it can run applications, but it’s not in full production mode, that could be as early as 2021 itself.

Now the money to reach the final project goals, well of course that depends partly on the results from this Path Forward funding. As you heard in several of the vendors’ statements, bringing down the cost is part of the goal. So the systems will be purchased, not by the project, but by the facilities that normally house the leading computing resources at the Department of Energy labs such as Oak Ridge, Argonne, Livermore, Los Alamos, Berkeley. And it would be those facilities that would be purchasing the systems. And how much it will cost, well, hopefully it’d be reduced a bit by this Path Forward funding. So I can’t provide a number because it’s a moving target for the entire project, including the purchase of the systems.

Mike Bernhardt: Okay. We have another one from Rick Merritt. “Can you give total expected government and industry investments over the life of the program?”

Paul Messina: The life of the program being the Path Forward program I presume. So the total investment is estimated at $430 million, which is the 258 million that the DOE is contributing and the rest of it is what the companies are contributing, through what is a fairly common arrangement that DOE has for contracts of this type.

Addison Snell: Can anyone talk specifically to what work is being done on software to enable applications or modeling and simulation at scale? Co-design did come up a couple of times, but I’d like to get a sense of, is there work on programming models or basically what work is being done for the software system?

Paul Messina: Addison, are you referring to work done as part of the Path Forward awards or as part of the Exascale Computing Project or the other parts–?

Addison Snell: I was thinking about the Path Forward awards in particular.

Paul Messina: So the majority of the work under the Path Forward is not on the software, but I anticipate that some software will be developed that will be needed to get a handle on the things like the programmability and the efficiency of actually using such systems. But the project itself, the Exascale Computing Project is investing heavily in software technology. And as I mentioned, through co-design, certainly, there will be a lot of interaction with the companies on that, on the software aspect in the context of Path Forward.

Addison Snell: Thank you.

Mike Bernhardt: Now we’ve had one question come in asking if it’s possible to get more specifics on the projects that each vendor is undertaking. I think that would best be handled off of this call and dealing directly with each vendor. You can talk at that level about what they are specifically working on and what they’re willing to share.

Let’s see, we have a question from insideHPC, which says, “What is the status of the Aurora system that was supposed to come to Argonne?”

Paul Messina: I believe that the Aurora system contract is being reviewed for potential changes that would result in a subsequent system in a different time frame from the original Aurora system. But since that’s just early negotiations, I don’t think we can be any more specific on that.

Mike Bernhardt: Okay. So Pat Thibodeaux raises the question, “The labs will order the systems, so is that a part of or separate from this funding?”

Paul Messina: Yes. Yeah, our funding situation can be a little bit difficult to describe. So there’s the Exascale Computing Project of which Path Forward is definitely a part because Path Forward is a program within the Exascale Computing Project. However, the funding for the acquisition and the siting of the systems will – as is usually the case – be within the budgets of the facilities that I mentioned earlier as examples. So those facilities normally buy systems every few years. They – through their budgets – do the site preparation usually. They also have nonrecurring engineering contracts, and then there’s a build contract for the system. So the facilities will be doing that with the facilities’ budgets.

Part of that, you can think of us in the context of Exascale as part of the Exascale Computing initiative, which is broader than the Exascale Computing Project. So yeah, so the facilities buy the systems and house them and that is separate from the Exascale Computing Project. The Path Forward program is a substantial part of the Exascale Computing Project, but as you may be aware, other parts are system software technology, which involves the software from system software, all the way up through libraries, programming models and so on. And in particular, applications development, which has a very large part of the budget as is appropriate to develop applications that address a lot of critical mission needs.

Mike Bernhardt: Here’s one from Tiffany Trader at HPCwire who asked, “Are you planning a second Path Forward RFP for the 2021 systems”?

Paul Messina: At present, we are not, Tiffany.

Mike Bernhardt: All right. Well, we’re right up about to the 2:45 time here. So Paul, did you want to make any closing remarks?

Paul Messina: I don’t have anything too much to say other than I appreciate so many of the community that reports on these activities to be present and for the questions. And certainly, looking very much forward to working with six leading companies on getting us to Exascale in the best possible way. Thank you.
