The Coral Project: How History Will Guide Us to Exascale

In this Industry Perspective, Tom Wilkie from Scientific Computing World looks at what history has to teach us about reaching the daunting goals of exascale computing.

Tom Wilkie, Scientific Computing World

Prompted by discussions at the International Supercomputing Conference (ISC’14), held in Leipzig at the end of June, Tom Wilkie considers how some of the ideas for getting us to exascale have a long history.

Later this year, the US Government will announce three of the most significant procurement contracts for supercomputing in recent times. The technical objective will be to deliver machines able to perform up to 200 petaflops peak performance — about 10 times faster than today’s most powerful high-performance computing systems — and to demonstrate technological solutions that will open the way to exascale computing.

But the so-called ‘Coral’ project is also a classic example of how the US Government can push technological development in the direction it desires by means of its procurement policy. As discussed in previous articles in this series, there are (at least) three ways in which governments are forcing the pace of technological development. One is international research cooperation – usually on projects that do not have an immediate commercial product as their end-goal. A second is funding commercial companies to conduct technological research – and thus subsidising, at taxpayers’ expense, the creation or strengthening of technical expertise within those companies. The third is subsidy by the back door, through military and civil procurement contracts.

The Coral project has elements of the latter two strategies. Significantly, the project includes a ‘deliberate and strategic’ investment plan that will pay for ‘non-recurring engineering’ (NRE) research and development as part of the contracts that will be awarded.

Coral is an acronym for the ‘Collaboration of Oak Ridge, Argonne and Livermore’ – US National Laboratories that have civil science (Argonne and Oak Ridge) and nuclear weapons work (Livermore) as their mission. It is intended that the new systems will start operating around 2017 or 2018 and will be used to support the research programmes at the respective institutions.

At the Lawrence Livermore National Laboratory, for example, the system will be used for advanced simulation of how nuclear warheads in the stockpile are ageing. The US has a so-called ‘stewardship’ programme, intended to monitor the deterioration of weapons in the stockpile, to prevent them from exploding prematurely, and to ensure that they will explode if they are ever used in warfare.

However, Coral is also an important step in the development of technology for exascale systems. The three laboratories issued a joint Request for Proposals for the Coral procurement on 6 January this year, with responses submitted by mid-February. The intention is that the Coral partners will select two different vendors to procure three systems, two from one vendor and one from the other.

Livermore’s system, to be called Sierra, will be tailored to support applications critical to stockpile stewardship. Oak Ridge and Argonne will employ systems that meet the needs of their civilian science missions, as part of the Advanced Scientific Computing Research (ASCR) programme financed by the US Department of Energy (DoE).

According to Bronis de Supinski, Chief Technology Officer for Livermore Computing, the ‘non-recurring engineering’ (NRE) contracts will allow for earlier optimization of the applications that will run on the new systems and thus “enhance what we would otherwise get. Vendors work with our application teams, transferring knowledge of system architecture to our applications.”

The R&D contracts also help address the technological challenges of developing new systems, such as containing power requirements; ensuring memory bandwidth is sufficient to give scientists the full benefit of the machine’s computing power; and making the system reliable and resilient despite its many components. The selected vendors will build small prototype systems that will inform the final decision on building the full systems.

Use of procurement policies to push technology development in a particular direction has been a consistent – and very successful – strand in US Government policy since the end of the Second World War. Nearly two decades ago in his book Knowing Machines, Donald MacKenzie, the distinguished sociologist of science based at Edinburgh University, showed how the very idea of a supercomputer had been shaped by US Government policy.

He concluded that: “Without the [US National] weapons laboratories there would have been significantly less emphasis on floating-point-arithmetic speed as a criterion of computer performance.” Had it been left solely to the market, vendors would have been more interested in catering to the requirements of business users (and other US agencies, such as the cryptanalysts at the US National Security Agency) who were much less interested in sheer speed as measured by flops, and this would have led to a ‘subtly different’ definition of a supercomputer, he pointed out.

The massive purchasing power of the laboratories was critical, he argued, in shaping the direction of supercomputer development: “There were other people – particularly weather forecasters and some engineers and academic scientists – for whom floating point speed was key, but they lacked the sheer concentrated purchasing clout.”

However, MacKenzie also identified limits to the clout of the weapons laboratories. On a technical level, the laboratories had two different types of computational task, and it was not viable to develop specialised machines for each. Seymour Cray, on the other hand, developed his more generalised – and successful – line of supercomputers by listening to other potential customers as well as the weapons laboratories. According to MacKenzie, ultimately it was not possible “to satisfy the particular needs of Los Alamos and Livermore if those needs conflicted with the needs of other users and if significant costs were involved in meeting them. The anonymous logic of the market has come to shape supercomputing.”

A similar recognition of the limits of Government intervention came in a remark by Bill Harrod, program manager for Advanced Scientific Computing Research at the DoE, to the international cooperation session at ISC’14 at the end of June: “Our strategy is to invest in vendors to complete technology development projects towards exascale.” But, he added: “We will not have control over what the companies want to ship.”

Donald MacKenzie, ‘Nuclear Weapons Laboratories and the Development of Supercomputing’, in Knowing Machines (The MIT Press, 1996).
