Exascale Computing: The View from Argonne
HPCwire, Thu, 21 Jun 2012

The US Department of Energy (DOE) will be the most likely recipient of the country’s initial crop of exascale supercomputers. That would certainly come as no surprise, since according to the latest TOP500 rankings, the top three US machines all live at DOE labs: Sequoia at Lawrence Livermore, Mira at Argonne, and Jaguar at Oak Ridge.

These exascale machines will be 100 times as powerful as the top systems today, but will have to be something beyond a mere multiplication of today’s technology. While the first exascale supercomputers are still several years away, much thought has already gone into how they are to be designed and used. As a result of the dissolution of DARPA’s UHPC program, the driving force behind exascale research in the US now resides with the Department of Energy, which has embarked upon a program to help develop this technology.

To get a lab-centric view of the path to exascale, HPCwire asked three of the top directors at Argonne National Laboratory (Rick Stevens, Michael Papka, and Marc Snir) to provide some context for the challenges and benefits of developing these extreme-scale systems. Rick Stevens is Argonne’s Associate Laboratory Director of the Computing, Environment, and Life Sciences Directorate; Michael Papka is the Deputy Associate Laboratory Director of the Computing, Environment, and Life Sciences Directorate and Director of the Argonne Leadership Computing Facility (ALCF); and Marc Snir is the Director of the Mathematics and Computer Science (MCS) Division at Argonne. Here’s what they had to say:

HPCwire: What does the prospect of having exascale supercomputing mean for Argonne? What kinds of applications, or what application fidelity, will it enable that cannot be achieved with today’s petascale machines?

Rick Stevens: The series of DOE-sponsored workshops on exascale challenges has identified many science problems that need an exascale or beyond computing capability to solve. For example, we want to use first principles to design new materials that will enable a 500-mile electric car battery pack. We want to build end-to-end simulations of advanced nuclear reactors that are modular, safe and affordable. We want to add full atmospheric chemistry and microbial processes to climate models and to increase the resolution of climate models to get at detailed regional impacts. We want to model controls for an electric grid that has 30 percent renewable generation and smart consumers. In basic science we would like to study dark matter and dark energy by building high-resolution cosmological simulations to interpret next generation observations. All of these require machines that have more than a hundred times the processing power of current supercomputers.

Michael Papka: For Argonne, having an exascale machine means the next progression in computing resources at the lab. We have successfully housed and managed a steady succession of first-generation and otherwise groundbreaking resources over the years, and we hope this tradition continues.

As for the kinds of applications exascale would enable, expect to see more multiscale codes and dramatic increases in both the spatial and temporal dimensions. Biologists could model cells and organisms and study their evolution at a meaningful scale. Climate scientists could run highly accurate predictive models of droughts at local and regional scales. Examples like this exist in nearly every scientific field.

HPCwire: The first exascale systems will certainly be expensive to buy and, given the power target of 20 or so megawatts, even more expensive to run over the machine’s lifetime, almost certainly more expensive than the petascale systems of today. How is the DOE going to rationalize spending increasing amounts of money to fund the work for essentially a handful of applications? Do you think it will mean fewer top systems across the DOE than in the past?

Marc Snir: There is a clear need to have open science systems as well as NNSA systems. And though power is more expensive and the purchase price may be higher, amortization is spread across more years as Moore’s Law slows down. We already went from doubling processor complexity every two years to doubling it every three. This may also enable better options for mid-life upgrades. A supercomputer is still cheap compared to a major experimental facility, and yields a broader range of scientific discoveries.
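
A rough sketch of the amortization point, under assumptions that are not from the interview: suppose performance per dollar doubles every `doubling_period_years`, and a system is retired once new hardware at the same price is a hypothetical `obsolescence_factor` times faster. Stretching the doubling period from two years to three then stretches the useful life, and the straight-line capital cost per year falls accordingly. The purchase price of 100 million dollars is an illustrative number only.

```python
import math


def useful_life_years(doubling_period_years: float, obsolescence_factor: float) -> float:
    """Years until same-price hardware is `obsolescence_factor` times faster,
    assuming performance per dollar doubles every `doubling_period_years`."""
    return doubling_period_years * math.log2(obsolescence_factor)


def annual_capital_cost(purchase_price: float, life_years: float) -> float:
    """Straight-line amortization of the purchase price over the useful life."""
    return purchase_price / life_years


# Hypothetical retirement rule: replace the system once same-cost hardware is 4x faster.
for period in (2.0, 3.0):
    life = useful_life_years(period, 4.0)
    cost = annual_capital_cost(100e6, life)
    print(f"doubling every {period:.0f} years -> life {life:.0f} years, "
          f"capital cost {cost / 1e6:.1f}M dollars per year")
```

With these made-up numbers the useful life grows from 4 to 6 years and the annual capital cost drops from 25 to about 16.7 million dollars; power costs, which the question notes are rising, are deliberately left out of the sketch.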

Stevens: DOE will need a mix of capability systems (exascale and beyond) as well as many capacity systems to serve the needs of DOE science and engineering. DOE will also need systems to handle increasing amounts of data and the more sophisticated data analysis methods under development. The total cost, acquisition plus operation, will be bounded by the investments DOE is allowed to make in science and national defense. The push towards exascale systems will make all computers more power efficient and therefore more affordable.

Papka: The outcome of the science is the important component. Research being done on DOE open science supercomputers today could lead to everything from more environmentally friendly concrete to safer nuclear reactor designs. There is no real way to predict or quantify the advancements that any specific scientific discovery will bring. An algorithm developed today may enable a piece of code that runs a simulation that leads to a cure for cancer. The investment has to be made.

HPCwire: So does anyone at Argonne, or the DOE in general, believe money would be better spent on more petascale systems and fewer exascale systems because of escalating power costs and perhaps an anticipated dearth of applications that can make use of such systems?

Snir: It is always possible to partition a larger machine; however, it is impossible to assemble an exascale machine by hooking together many petascale machines.

The multiple DOE studies on exascale applications in 2008 and 2009 have clearly shown that progress in many application domains depends on the availability of exascale systems. While a jump by a factor of 1,000 in performance may seem huge, it is actually quite modest from the viewpoint of applications. In a 3D mesh code, such as those used to represent the atmosphere in a climate simulation, this increase in performance enables refining the mesh by a factor of less than 6 ($\sqrt[4]{1000} \approx 5.6$), since the time scale needs to be equally refined. This assumes no other changes. In fact, many other changes are needed as precision increases, for example to better represent clouds, or to do ensemble runs in order to quantify uncertainty.
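
The "less than 6" figure can be checked with a few lines. Assuming, as the paragraph does, that the three spatial dimensions and the time step are all refined by the same factor r, the work grows as r^4, so a 1,000x speedup buys r = 1000^(1/4). The function name is illustrative.

```python
def refinement_factor(speedup: float, spatial_dims: int = 3) -> float:
    """Per-dimension refinement affordable with `speedup` times more compute,
    when `spatial_dims` space dimensions and the time step are refined equally.
    Work scales as r ** (spatial_dims + 1), so r = speedup ** (1 / (spatial_dims + 1))."""
    return speedup ** (1.0 / (spatial_dims + 1))


r = refinement_factor(1000.0)
print(f"mesh refinement factor for a 1,000x speedup: {r:.2f}")
```

The printed factor is about 5.62: a thousandfold increase in compute refines each dimension by well under an order of magnitude.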

It is sometimes claimed that many petascale systems may be used more efficiently than one exascale system since ensemble runs are “embarrassingly parallel” and can be executed on distinct systems. However, this is a very inefficient way of running ensembles. One would input all the initialization data many times, and one would not take advantage of more efficient methods for sampling the probability space.

Another commonly heard claim is that “big data” will replace “big computation.” Nothing could be further from the truth. As we collect increasingly large amounts of data through better telescopes, better satellite imagery, and better experimental facilities, we need increasingly powerful simulation capabilities. You are surely familiar with the aphorism: “All science is either physics or stamp collecting.” What I think Ernest Rutherford meant by that is that scientific progress requires the matching of deductions made from scientific hypotheses to experimental evidence. A scientific pursuit that only involves observation is “stamp collection.”

As we study increasingly complex systems, this matching of hypothesis to evidence requires increasingly complex simulations. Consider, for example, climate evolution. A climate model may include tens of equations and a detailed description of initial conditions. We validate the model by matching its predictions to past observations. This match requires detailed simulations.

The complexity of these simulations increases rapidly as we refine our models and increase resolution. More detailed observations are useful only to the extent they enable better calibration of the climate models; this, in turn, requires a more detailed model, hence a more expensive simulation. The same phenomenon occurs in one discipline after another.

It is also important to remember that research on exascale will be hugely beneficial to petascale computing. If an exascale system consumes 20 megawatts, then a petascale system will consume less than 20 kilowatts and become available at the departmental level. If good software solutions for resilience are developed as part of exascale research, then it becomes possible to build petascale computers out of less reliable and much cheaper components.
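
The 20-kilowatt figure follows from holding energy efficiency fixed. A minimal sketch of that arithmetic, assuming a petascale system simply matches the flop/s per watt of a 20 MW exascale system (the function name is illustrative):

```python
def watts_at_same_efficiency(target_flops: float, ref_flops: float, ref_watts: float) -> float:
    """Power needed for `target_flops` if the system matches the energy
    efficiency (flop/s per watt) of a reference system."""
    flops_per_watt = ref_flops / ref_watts
    return target_flops / flops_per_watt


EXAFLOPS, PETAFLOPS = 1e18, 1e15
EXA_WATTS = 20e6  # the 20 MW exascale power target discussed in the interview

print(f"efficiency: {EXAFLOPS / EXA_WATTS / 1e9:.0f} Gflop/s per watt")
print(f"petascale power: {watts_at_same_efficiency(PETAFLOPS, EXAFLOPS, EXA_WATTS) / 1e3:.0f} kW")
```

A 20 MW exaflop machine implies 50 Gflop/s per watt, and at that efficiency 10^15 flop/s needs 20,000 W.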

Papka: As we transition to the exascale era the hierarchy of systems will largely remain intact, so the advances needed for exascale will influence petascale resources and so on down through the computing space. Exascale resources will be required to tackle the next generation of computational problems.

HPCwire: How is the lab preparing for these future systems? And given the hardware architecture and programming models have not been fully fleshed out, how deeply can this preparation go?

Snir: Exascale systems will be deployed, at best, a decade from now – later if funding is not provided for the required research and development activities. Therefore, exascale is, at this stage, a research problem. The lab is heavily involved in exascale research, from architecture, through operating systems, runtime, storage, languages and libraries, to algorithms and application codes.

This research is focused in Argonne’s Mathematics and Computer Science division, which works closely with technical and research staff at the Argonne Leadership Computing Facility. Both belong to the directorate headed by Rick Stevens. Technology developed in MCS is now being deployed on Mira, our Blue Gene/Q platform. The same will most likely be repeated in the exascale timeframe.

The strong involvement of Argonne in exascale research increases our ability to predict the likely technology evolution and prepare for it. It increases our confidence that exascale is a reachable target a decade from now. Preparations will become more concrete 4 to 6 years from now, as research moves to development, and as exascale becomes the next procurement target.

Stevens: While the precise programming models are yet to be determined, we do know that data motion is the thing we have to reduce to enable lower power consumption, and that data locality (both vertically in the memory hierarchy and horizontally in the internode sense) will need to be carefully managed and improved.
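
Loop tiling (cache blocking) is one standard way to manage vertical locality; the interview does not prescribe any particular technique, so treat this as an illustration. The sketch multiplies two matrices in small blocks so that each block is reused many times while it sits in a fast level of the memory hierarchy, instead of streaming whole rows and columns from main memory. The `tile=32` default is an arbitrary choice, and in pure Python interpreter overhead masks any cache benefit; the point is the loop structure, which compilers and tuned kernels apply in compiled code.

```python
def matmul_tiled(a, b, n, tile=32):
    """Blocked product of two n x n matrices stored as flat row-major lists.

    The three outer loops walk over tiles; the inner loops work entirely within
    one tile of a, b, and c, so those elements are reused while they are still
    close to the processor rather than being fetched again from far memory.
    """
    c = [0.0] * (n * n)
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_c = i * n
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = a[row_c + k]  # reused across the whole j-tile
                        row_b = k * n
                        for j in range(jj, min(jj + tile, n)):
                            c[row_c + j] += a_ik * b[row_b + j]
    return c


# Usage: a tile size that does not divide n exercises the edge tiles.
n = 6
a = [float(i % 7) for i in range(n * n)]
b = [float((3 * i) % 5) for i in range(n * n)]
reference = [sum(a[i * n + k] * b[k * n + j] for k in range(n))
             for i in range(n) for j in range(n)]
assert matmul_tiled(a, b, n, tile=4) == reference
```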

Thus we can start today to think about new algorithms that will be “exascale ready” and we can build co-design teams that bring together computer scientists, mathematicians and scientific domain experts to begin the process of thinking together how to solve these problems. We can also work with existing applications communities to help them make smart choices about rewriting their codes for near term opportunities such that they will not have to throw out their codes and start again for exascale systems.

Papka: We learn from each system we use, and we continue to collaborate with our research colleagues in industry. Argonne, along with Lawrence Livermore National Laboratory, partnered with IBM in the design of Blue Gene/P and Blue Gene/Q. Argonne has partnerships with other leading HPC vendors too, and I’m confident that these relationships with industry will grow as we move toward exascale.

The key is to stay connected and move forward with an open mind. The ALCF has developed a suite of micro kernels and mini- and full-science DOE and HPC applications that allow us to study performance on both physical and virtual future-generation hardware.

To address future programming model uncertainty, Argonne is actively involved in defining future standards. We are, of course, very involved in the MPI Forum, as well as in the OpenMP forum for CPUs and accelerators. We have been developing benchmarks to study performance and to measure the characteristics of programming runtime systems and of advanced and experimental features of modern HPC architectures.

HPCwire: What type of architecture is Argonne expecting for its first exascale system — a homogeneous Blue Gene-like system, a heterogeneous CPU+accelerator-based machine, or something else entirely?

Snir: It is, of course, hard to predict how a top supercomputer will look ten years from now. There is a general expectation that future high-end systems will use multiple core types that are specialized for different types of computation. One could have, for example, cores that can handle asynchronous events efficiently, such as OS or runtime requests, and cores that are optimized for deep floating point pipelines. One could have more types of cores, with only a subset of the cores active at any time, as proposed by Andrew Chien and others.

There is also a general assumption that these cores will be tightly coupled in one multichip module with shared-memory-style communication across cores, rather than having an accelerator on an I/O bus. Intel, AMD and NVIDIA all have or have announced products of this type. Both heterogeneity and tight coupling at the node level seem to be necessary in order to reduce power consumption. The tighter integration will facilitate finer-grain tasking across heterogeneous cores. Therefore, one will be able to handle core heterogeneity largely at the compiler and runtime level, rather than at the application level.

The execution model of an exascale machine should be at a higher level, with dynamic tasking across cores and nodes, at which the specific architecture of the different cores is largely hidden, in the same way that the specific architecture of a core (for example, x86 versus Power) is largely hidden today from the execution model seen by programmers and most software layers. Therefore, we expect that the current dichotomy between monolithic systems and CPU-plus-accelerator-based systems will not be meaningful ten years from now.
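
A toy illustration of this execution model, with everything in it hypothetical: two thread pools stand in for two kinds of cores, and a small runtime decides where each task runs. Application code submits tasks with a work estimate and never names a core type, which is the sense in which heterogeneity is handled by the runtime rather than the application. The placement rule (tasks estimated at 10,000 units or more go to the "throughput" pool) is an arbitrary assumption, and the sketch covers only cores within one node, not tasking across nodes.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable


class HeterogeneousRuntime:
    """Toy runtime: two worker pools stand in for two hypothetical core types.
    Application code only submits tasks; the runtime chooses the pool."""

    def __init__(self) -> None:
        # Hypothetical stand-ins: a few "latency-optimized" workers for event-like
        # tasks and more "throughput-optimized" workers for bulk numeric work.
        self._pools = {
            "latency": ThreadPoolExecutor(max_workers=2, thread_name_prefix="latency"),
            "throughput": ThreadPoolExecutor(max_workers=8, thread_name_prefix="throughput"),
        }
        self.placements = []  # records which pool each task was sent to

    def _choose_pool(self, work_estimate: int) -> str:
        # The placement policy lives in the runtime, not in the application.
        return "throughput" if work_estimate >= 10_000 else "latency"

    def submit(self, fn: Callable[..., Any], *args: Any, work_estimate: int = 0) -> Future:
        pool = self._choose_pool(work_estimate)
        self.placements.append(pool)
        return self._pools[pool].submit(fn, *args)

    def shutdown(self) -> None:
        for pool in self._pools.values():
            pool.shutdown(wait=True)


# Usage: the application describes the work, not the hardware.
rt = HeterogeneousRuntime()
handle_event = rt.submit(lambda msg: msg.upper(), "checkpoint", work_estimate=10)
dot = rt.submit(lambda m: sum(i * i for i in range(m)), 100_000, work_estimate=100_000)
print(handle_event.result(), dot.result(), rt.placements)
rt.shutdown()
```

Swapping in a different placement policy, or adding a third pool, would not require any change to the code that submits tasks.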

Stevens: To add to Marc’s comments, we believe there will be additional capabilities that some systems might have in the next ten years. One strategy for reducing power is to move compute elements closer to the memory. This could mean that new memory designs will have programmable logic close to the memory such that many types of operations could be offloaded from the traditional cores to the new “smart memory” systems.

Similar ideas might apply to the storage systems, where operations that now require moving data from disk to RAM to CPU and back again might be carried out in “smart storage.”

Finally, while current large-scale systems have occasionally put logic into the interconnection network to enable things like global reductions to be executed without using the CPU functional units, we could imagine that future systems might have a fair amount of computing capability in the network fabric again to try to reduce the need to move data more than necessary.
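
To see why reductions are a natural fit for logic in the network, here is a minimal simulation of a binary-tree sum over P nodes (a generic technique, not a description of any specific machine's hardware). Each level halves the number of active partial sums, so the reduction finishes in ceil(log2 P) combining steps; combining logic in the network could perform those additions as data flows through, leaving the CPU functional units free.

```python
def tree_reduce_sum(values):
    """Simulate a binary-tree sum across len(values) nodes (at least one).

    At each level, node i absorbs the partial sum held by node i + stride.
    Returns (total, number_of_combining_steps).
    """
    partial = list(values)
    p = len(partial)
    stride, steps = 1, 0
    while stride < p:
        for i in range(0, p - stride, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        steps += 1
    return partial[0], steps


total, steps = tree_reduce_sum(range(1, 9))
print(total, steps)
```

For eight nodes the sum of 1 through 8 completes in three steps; for a million (2^20) nodes it would take 20.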

I think we have learned that tightly integrated systems like Blue Gene have certain advantages: fewer types of parts, the lowest power consumption in their class, and very high values of metrics such as bisection bandwidth relative to compute performance, which let them perform extremely well on benchmarks like the Graph 500 and Green500. They are also highly reliable. The challenge will be to see whether in the future we can get systems that combine the strengths needed to be affordable, reliable, programmable, and low in power consumption.

HPCwire: How about the programming model? Will it be MPI+X, something more exotic, or both?

Snir: Both. It will be necessary to run current codes on a future exascale machine – too many lines of code would be wasted, otherwise. Of course, the execution model of MPI+X may be quite different in ten years than it is now: MPI processes could be much lighter-weight and migratable, the MPI library could be compiled and/or accelerated with suitable hardware, etc.

On the other hand, it is not clear that we have an X that can scale to thousands of threads, nor do we know how an MPI process can support such heavy multithreading. It is clear, however, that running many MPI processes on each node is wasteful. It is also still unclear how current programming models can provide resilience and help reduce energy consumption. We do know that using two or three programming models simultaneously is hard.
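
One frequently cited overhead of running many MPI ranks per node (offered here as an illustration, not as Snir's stated reasoning) is that each rank keeps its own halo copy of its neighbors' boundary data, so ghost-cell storage and intra-node copying grow with the rank count. The sketch assumes a cubic n^3 grid per node, split into equal cubic subdomains, each padded by a one-cell halo; the function name and grid size are hypothetical.

```python
def ghost_cells_per_node(n: int, ranks_per_node: int, halo: int = 1) -> int:
    """Ghost (halo) cells stored on one node whose n**3 grid is split into
    `ranks_per_node` equal cubic subdomains, each padded by `halo` layers."""
    k = round(ranks_per_node ** (1 / 3))  # subdomains along each axis
    if k ** 3 != ranks_per_node or n % k != 0:
        raise ValueError("ranks_per_node must be a perfect cube that divides the grid evenly")
    s = n // k  # edge length of one subdomain
    return ranks_per_node * ((s + 2 * halo) ** 3 - s ** 3)


for ranks in (1, 8, 64):
    print(f"{ranks:>2} ranks per node: {ghost_cells_per_node(128, ranks):,} ghost cells")
```

Going from one rank per node (with threads sharing the grid) to 64 ranks raises the ghost-cell count from 99,848 to 418,304, roughly a fourfold increase, before counting the message traffic needed to keep those copies current.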

Research on new programming models, and on mechanisms that facilitate the porting of existing code to new programming models is essential. Such research, if pursued diligently, can have a significant impact ten years from now.

Our research focus in this area is to provide a deeper stack of programming models, from DSLs to low-level programming models, thus enabling different programmers to work at different levels of abstraction; to support automatic translation of code from one level to the next lower level, while ensuring that a programmer can interact with the translator to guide its decisions; to provide programming models that largely hide heterogeneity, both the distinction between different types of cores and the distinction between different communication mechanisms (that is, shared memory versus message passing); to provide programming notations that facilitate error isolation and thus enable local recovery from failures; and to provide a runtime that is much more dynamic than what is currently available, in order to cope with hardware that changes continuously due to power management and frequent failures.

Stevens: An interesting question in programming models is whether we will get an X, or perhaps a Y, that integrates “data” into the programming model (so we have MPI + X for simulation and MPI + Y for data-intensive work) such that we can move smoothly to a new set of programming models that retain continuity with existing MPI codes, and can treat them as a subset, while providing fundamentally more power to developers targeting future machines.

Ideally, of course, we would have one programming notation that is expressive for the applications, or a good target to compile domain-specific languages to, and that at the same time can be effectively mapped onto a high-performance execution model and ultimately real hardware. The simpler we can make the X’s or Y’s, the better for the community.

A big concern is that some in the community might be assuming that GPUs are the future and waste considerable time trying to develop GPU-specific codes which might be useful in the near-term but probably not in the long-term for the reasons already articulated. That would suggest that X is probably not something like CUDA or OpenCL.

HPCwire: The DOE exascale effort appears to have settled on co-design as the focus of the development approach. Why was this approach undertaken and what do you think its prospects are for developing workable exascale systems?

Papka: It’s extremely important that the delivered exascale resources meet the needs of the domain scientists and their applications; therefore, effective collaboration with system vendors is crucial. The collaboration between Argonne, Livermore, and IBM that produced the Blue Gene series of machines is a great example of co-design.

In addition to discussing our system needs, we as the end users know the types of DOE-relevant applications that both labs would be running on the resource. Co-design works, but requires lots more communication and continued refinement of ideas among a larger-than-normal group of stakeholders.

Snir: The current structure of the software and hardware stack of supercomputers is more due to historical accidents than to principled design. For example, the use of a full-bodied OS on each node is due to the fact that current supercomputers evolved from server farms and clusters. A clean sheet design would never have mapped tightly coupled applications atop a loosely coupled, distributed OS.

The incremental, ad-hoc evolution of supercomputing technology may have reduced the incremental development cost of each successive generation, but has also created systems that are increasingly inefficient in their use of power and transistor budgets and increasingly complex and error-prone. Many of us believe that “business as usual” is reaching the end of its useful life.

The challenges of exascale will require significant changes both in the underlying hardware architecture and in the many layers of software above it. “Local optimizations,” whereby one layer is changed with no interaction with the other layers, are not likely to lead to a globally optimal solution. This means that one needs to consider jointly the many layers that define the architecture of current supercomputers. This is the essence of co-design.

While current co-design centers are focused on one aspect of co-design, namely the co-evolution of hardware and applications, co-design is likely to become increasingly prevalent at all levels, for example, the co-design of hardware, runtime, and compilers. This is not a new idea: the “RISC revolution” entailed hardware and compiler co-design. Whenever one needs to effect a significant change in the capabilities of a system, it becomes necessary to reconsider the functionality of its components and their relations.

The supercomputer industry is also going through a “co-design” stage, as shown by the sale by Cray to Intel of interconnect technology. The division of labor between various technology providers and integrators ten years from now could be quite different than it is now. Consequently, the definition of the subsystems that compose a supercomputer and of the interfaces across subsystem boundaries could change quite significantly.

Stevens: I believe that we will not reach exascale in the near term without an aggressive co-design process that makes visible to the whole team the costs and benefits of each set of decisions on the architecture, software stack, and algorithms. In the past it was typically the case that architects could use rules of thumb from broad classes of applications or benchmarks to resolve design choices.

However, many of the tradeoffs in exascale design are likely to be so dramatic that they need to be accompanied by an explicit agreement among the parties that they can work within the resulting design space, so as to avoid producing machines that might technically meet some exascale objective but be effectively useless to real applications.

HPC and the Spirit of St. Louis
HPCwire, Wed, 20 Jun 2012

Every year, as the International Supercomputing Conference in Germany approaches, our good friends here at HPCwire invite me to reflect on the trends of the past 12 months, not so much to provide a potentially tedious list of specific events, product deliveries, and TOP500 mantras but rather to convey a personal sense of what it all adds up to and possibly means for the future of HPC.

This year, doing so is both easy and difficult. As a field, we are at an inflection point marked by significant progress and innovation in form and method, while at the same time we are confronted by uncertainty at a level that is at least uncomfortable for our system providers and possibly disruptive. There is certainly more contention about the degree and direction of product and research investment, both within the US and internationally.

HPC has certainly entered a period of diversity far different from that of a decade ago, and that is not simply a function of the more than two orders of magnitude gain in Linpack performance during a time of Pax MPI. The easy part is to recite the buzzwords of the year: “GPU,” “big data,” “clouds,” and “exascale.” If you are on one of these trains, then according to popular belief, you are on the fast track.

The mad dash to flops by cramming as many ALUs as possible into the dense (and successful) form factor of a GPU, sometimes referred to as an “accelerator,” has pushed past the tipping point, with many, but not all, major new installations incorporating these flops multipliers into their arsenals on the field of big iron.

Both NVIDIA and AMD are providing the punch in heterogeneity, although with very different architectures. It’s actually interesting to watch NVIDIA, on the wrong side of the PCI bus, move ARM into its modules. AMD is moving its accelerator module into, or at least closer to, its multicore array. Choose your own benchmark (you actually should), but for some, the AMD strategy appears to be working, while the NVIDIA offering is clearly in the lead.

In the US, new big systems like Titan at Oak Ridge and Blue Waters at the University of Illinois are betting on this, and preparing to break through 10 petaflops based on Cray’s latest supercomputing offerings. Both China, with Tianhe-1A, and Japan, with TSUBAME 2, have taken a similar path, each with their own novel contributions.

But there are exceptions, even at the top end. Kei (K) in Kobe, the fastest machine at the time of this writing, has a more tightly integrated architecture provided by Fujitsu and delivers about 10 petaflops. An array of IBM Blue Gene systems is banking on millions of lighter-weight cores in a homogeneous system architecture to deliver an easier-to-program, and therefore more general, class of computer with lower power. (Note that GPUs are good on power as well.)

But programming remains a challenge, and if that is not hard enough, portability is even trickier, especially performance and scalability portability. There are on the order of 50,000-plus CUDA programmers, but that does not mean the programming of large scalable systems incorporating GPUs is solved. OpenCL, a community-wide effort to provide an open programming methodology that addresses the problem somewhat more broadly, is in progress and is attracting a growing body of users. OpenACC is an inchoate programming formalism with broader goals and an OpenMP-like look and feel.

Many assert that we are looking at the system/programming family of the future. Others (and I’m among them) think it is a transitory phase that will evolve into something as yet undefined. At least one heavy hitter, Intel, is betting on something altogether different: its early MIC chip, which defines a new manycore socket exhibiting homogeneity, reduced power, and generality. Clearly, the HPC community is not of a uniform opinion.

A very constructive movement that has gained momentum in HPC over the last year is dubbed “big data.” In science and engineering, more and more problems are challenged by the management, processing, and communication of potentially enormous amounts of associated data, whether observed by sensors or derived through simulation. The world’s largest telescopes, LIGO (Laser Interferometer Gravitational-Wave Observatory), and of course the LHC (Large Hadron Collider at CERN) are all examples of ongoing experiments that generate constant streams of data that have to be dealt with. But biology and medical science also create an ever-growing body of data where cross-correlation and data mining become an increasing challenge.

Storage capacity is only the beginning of the daunting problems confronting big data science. Communication bandwidth, latency, and reliability for data integrity, as well as power and cost, already dominate big data science and will do so at an increasing pace. Fortunately, unlike some other aspects of mostly flops-intensive scientific computing, help will come from industry. This is because big data may generate big profits.

The needs of science in this realm are also manifest in the commercial space from large relational databases, through inventory and sales management, to social networks and search engines. These and other markets will drive technology advancement by the vendors that should have substantial impact on the science domain as well. But over-exuberance in our field is abundant and there are some well-intentioned practitioners in the big data arena who assert that this is THE problem in scientific computing. My message to them is: there are enough problems in HPC to go around.

Of course, according to some, the answer to the question of where to put all that data, or for that matter, where to process it (or any other kind of computing one might need to do) is obvious: it’s the cloud! Well maybe.

The value of clouds or “The Cloud” (I don’t know which) is real, permitting shared environments, data sources, services, etc. among multiple people or communities and among the multiple platforms of a single individual. This is a rapidly moving capability and interface, the full societal impact of which is probably unpredictable even to the most visionary among us but can be anticipated to be enormous and far-reaching.

But for HPC, the utility of clouds in the future is, well, foggy. There are some sweet spots. Storage of data sets larger than a modest department can easily manage, but smaller than some horrific size, is a likely one. The problem with ultra-large data sets is that they have to be moved. If they accrue slowly and are only lightly sampled, this can work. But if the entire data set has to be processed by local computing resources, then the intervening bandwidth provided by the internet simply may not be adequate.
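
A back-of-the-envelope sketch of the bandwidth problem, assuming a 1 PB data set and sustained wide-area links of 1, 10, and 100 Gb/s (illustrative numbers, not from the article):

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Days to move `size_bytes` over a link of nominal rate `link_bits_per_s`,
    where `efficiency` is the (hypothetical) fraction of that rate actually sustained."""
    seconds = size_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / SECONDS_PER_DAY


PETABYTE = 1e15
for gbps in (1, 10, 100):
    print(f"1 PB over {gbps:>3} Gb/s: {transfer_days(PETABYTE, gbps * 1e9):.1f} days")
```

Even at the full nominal rate, a petabyte takes roughly 93 days at 1 Gb/s, about 9 days at 10 Gb/s, and just under a day at 100 Gb/s; real transfers rarely sustain the nominal rate, which the `efficiency` parameter lets you model.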

On the computing side, there is an attraction to amortizing the cost and administration of a large array of computing resources across many users. Indeed, the accessibility of a system of very large scale that could not be acquired by any but a few institutions is a potential breakthrough in operational modality. But HPC reflects different forms of usage. The clouds can supply “throughput computing,” and a significant percentage of the HPC workload is of this kind. Indeed, pools of resources including workstation farms across academic campuses and elsewhere have been widely employed for decades.

But HPC has many computational challenges, single programs, that are tightly coupled and for which much of the programming challenge is performance tuning. Latencies have to be low, overheads even lower, and cost of information flow understood and stable. Clouds provide none of this in very large configurations. In some sense, this is their strength; successive requests are serviced by different configurations of available resources on demand. But for very large complex problems, they are not suitable, or at least less than optimal. Success of the cloud will require that we benefit from its advantages but not over-hype it and ultimately become disappointed.

People love milestones to mark progress, and not just HPC people. In the last century, two such milestones captured the imagination of the world. One that I lived through was getting to the Moon with the “one small step” taken by Neil Armstrong in 1969. Another was the non-stop flight from New York to Paris by Charles Lindbergh in 1927 to claim the Raymond Orteig Prize. Today, the HPC community has self-defined its next milestone as exascale.

Over the last year, this objective has been codified by the US and internationally through meetings, plans, and programs. One international forum, the International Exascale Software Project (IESP), was completed after more than two years with the last of its eight meetings in Kobe, Japan. The European Exascale Software Initiative (EESI) was also completed and is now succeeded by EESI-2. Plans are being considered in China, Japan, and Russia for their own path to exascale computing. In the US, the Department of Energy has launched at least three programs to develop a sufficient understanding and capability not just to get to exaflops, but to derive the right kind of exascale systems (hardware and software) and programming methodologies.

The Predictive Science Academic Alliance Program (PSAAP II) has just accepted proposals for exascale application development and system software. The co-design centers are also focused on the development of application algorithms and the systems upon which they are to run. The Modeling of Execution Model projects are exploring and quantifying the very principles upon which future exascale systems will be designed and operated. And the X-stack Program has just selected the teams that will develop next-generation system software and programming environments that will lead to exascale computing while providing nearer term utility as well.

But there is a difference between the milestones of 1927 and 1969 on the one hand, and that of the exascale, on the other. As extraordinary as Lindbergh’s historic accomplishment was, it was an end in itself. That cannot be the case for exascale computing. While our field has been guilty of stunt machines in the past, the cost and importance of achieving useful exascale capability, capacity, and application is too great to invest in merely claiming the first HPL exaflops Rmax run.

And if some institution, agency, or nation does force such an artificial solution for a short-lived sense of glory, then surely the serious HPC community should mark this act with disdain. The future of HPC is the future of exascale but not merely such systems or benchmarks in and of themselves, but rather the scientific, medical, societal, and commercial breakthroughs that these systems will enable.

The Spirit of St. Louis flew from New York to Paris, but it took a ship back to the US. It wouldn’t have made it if it had tried to fly in reverse: the prevailing winds that helped it fly east would have been head winds impeding its progress west. The Spirit of St. Louis now lives in the Smithsonian’s National Air and Space Museum, the world’s most popular museum.

When viewing it from the 2nd floor, the discerning eye will notice a very peculiar thing: there is no forward-facing window. Lindbergh could not see where he was going (although he did have a small periscope). From the side windows he could see where he was and guess what was coming next, but he could not see ahead.

HPC cannot afford to fly blind. We cannot just use our current position to assume we will make the right incremental progress towards our future destination. And we can’t just build an exascale computer to sit in a museum, even if it does run a benchmark. HPC is a tool for humanity to solve problems of importance when faced with so many critical challenges. No more stunt machines, please.

TOP500 Gets Dressed Up with New Blue Genes

June 19, 2012

At 16 petaflops, Sequoia recaptures the number one spot for the US.

The 39th TOP500 list was released today at the International Supercomputing Conference in Hamburg, Germany, with a new machine at the top. Sequoia, an IBM Blue Gene/Q machine, delivered a world record 16 petaflops on Linpack, knocking RIKEN’s 10-petaflop K Computer into second place. The Japanese K system had held the TOP500 title for a year.

Sequoia, which is housed at Lawrence Livermore National Lab, will provide the NNSA its most advanced simulation platform for maintaining the nuclear weapons stockpile of the US. In its spare time, it will also run unclassified codes for open science research.

The 96-rack Sequoia houses 1.6 million cores, another TOP500 record, and 1.6 petabytes of memory. Peak performance is a whopping 20.1 petaflops. The machine is one of six Blue Gene/Q systems of a petaflop or more deployed over the last six months.
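The rounded Sequoia figures can be turned into a few per-unit ratios. This is a back-of-the-envelope sketch: it assumes "1.6 petabytes" means decimal petabytes and pairs the 20.1-petaflop peak with the 16.3-petaflop Linpack result from the top 10 list, so the derived ratios are only as precise as the rounded inputs.

```python
# Rounded figures quoted for Sequoia in the article.
RACKS = 96
CORES = 1.6e6
MEMORY_BYTES = 1.6e15   # 1.6 PB, assumed to be decimal petabytes
RPEAK_PF = 20.1         # theoretical peak, petaflops
RMAX_PF = 16.3          # Linpack result, petaflops

cores_per_rack = CORES / RACKS
memory_per_core_gb = MEMORY_BYTES / CORES / 1e9
linpack_efficiency = RMAX_PF / RPEAK_PF   # fraction of peak delivered on Linpack

print(f"cores per rack:     {cores_per_rack:,.0f}")
print(f"memory per core:    {memory_per_core_gb:.1f} GB")
print(f"Linpack efficiency: {linpack_efficiency:.0%}")
```

With these inputs the machine works out to roughly 1 GB of memory per core and a Linpack yield of about 81 percent of peak.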

Compared to the November 2011 list, when there was no turnover in the top 10, this time around, there are six brand new machines, plus one, Jaguar, that has benefitted from an upgrade to faster processors. Besides four new Blue Gene/Q’s (Sequoia, Mira, Fermi, and JuQUEEN), there is SuperMUC, an IBM iDataPlex cluster at Leibniz Rechenzentrum in Germany, and Curie, a Bull supercomputer installed at the French Atomic Energy Commission (CEA).

The new top 10 looks like this:

1. 16.3 petaflops, Sequoia, United States
2. 10.5 petaflops, K computer, Japan
3. 8.2 petaflops, Mira, United States
4. 2.9 petaflops, SuperMUC, Germany
5. 2.6 petaflops, Tianhe-1A, China
6. 1.9 petaflops, Jaguar, United States
7. 1.7 petaflops, Fermi, Italy
8. 1.4 petaflops, JuQUEEN, Germany
9. 1.4 petaflops, Curie, France
10. 1.3 petaflops, Nebulae, China
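The top 10 can be tallied by country to see how the geographic spread compares with the spread of flops. The data below is transcribed from the list in this article; grouping by country is simply one way to slice it.

```python
from collections import Counter

# June 2012 top 10 as listed in the article: (Linpack petaflops, system, country).
TOP10 = [
    (16.3, "Sequoia", "United States"),
    (10.5, "K computer", "Japan"),
    (8.2, "Mira", "United States"),
    (2.9, "SuperMUC", "Germany"),
    (2.6, "Tianhe-1A", "China"),
    (1.9, "Jaguar", "United States"),
    (1.7, "Fermi", "Italy"),
    (1.4, "JuQUEEN", "Germany"),
    (1.4, "Curie", "France"),
    (1.3, "Nebulae", "China"),
]

systems_by_country = Counter(country for _, _, country in TOP10)
pf_by_country = Counter()
for pf, _, country in TOP10:
    pf_by_country[country] += pf   # accumulate Linpack petaflops per country

total_pf = sum(pf_by_country.values())
for country, count in systems_by_country.most_common():
    share = pf_by_country[country] / total_pf
    print(f"{country:<14} {count} system(s), {share:.0%} of top-10 Linpack petaflops")
```

By this count the three US machines hold about 55 percent of the top 10's combined Linpack petaflops, even though they are fewer than a third of the systems.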

Although the US has regained the TOP500 title (the first time it has been in the top spot since 2009), just three of the top 10 are now based in the States, down from five six months ago, continuing a trend toward more geographical parity. China, Japan, Germany, France and Italy all have supercomputers at the top of the list now.

Taking all 500 supercomputers into account, the US is still the dominant player with 252 systems, but that’s down from 263 six months ago. China is in second place with 68 systems, but it too has lost ground, shedding six since November. Japan (35 systems), the UK (25 systems), France (22 systems) and Germany (20 systems) are the only other nations with more than 10 machines on the list.

With each passing year, the TOP500 becomes a more exclusive club. The least performant machine (the 500th system) is now over 60 teraflops, a Linpack mark that would have earned it the top spot in 2004. Turnover was about average, with the list shedding 170 systems.

Meanwhile, aggregate performance continues its upward climb and now stands at 123.4 petaflops, up about two-thirds from the 74.2 petaflops recorded on the November list. A sizeable chunk of the added flops was contributed by new machines that came in at a petaflop or better. Overall, the petaflop club doubled its membership over the last six months, growing from 10 to 20 systems.
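The six-month growth figures can be checked directly from the numbers reported for the two lists. A minimal sketch; the only inputs are the aggregate petaflops and petaflop-class system counts quoted in this article.

```python
# Figures for the November 2011 and June 2012 lists, as reported in the article.
AGGREGATE_PF = {"November 2011": 74.2, "June 2012": 123.4}
PETAFLOP_SYSTEMS = {"November 2011": 10, "June 2012": 20}

def growth(old: float, new: float) -> float:
    """Fractional change from old to new (0.5 means +50%)."""
    return new / old - 1

agg = growth(AGGREGATE_PF["November 2011"], AGGREGATE_PF["June 2012"])
club = growth(PETAFLOP_SYSTEMS["November 2011"], PETAFLOP_SYSTEMS["June 2012"])
print(f"aggregate performance:  {agg:+.0%}")
print(f"petaflop-class systems: {club:+.0%}")
```

The aggregate rises by roughly 66 percent, while the petaflop club exactly doubles.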

From a vendor perspective, IBM cleaned up. The company is responsible for nearly half of the machines on the list, with 213. The next most popular vendor is HP, with 138 systems. Cray (26), Appro (19), Bull (16), SGI (16), and Dell (12) round out the computer makers with more than 10 systems. Everyone else is in single digits.

It’s even more skewed at the top, where IBM claims five of the top 10. As mentioned before, that’s mainly the result of the new Blue Gene/Q installations. No other vendor has more than a single system in this upper tier.

The only area where IBM didn’t dominate the field is in processor architecture. Here Intel is king, claiming a 78 percent share overall, split between its various Xeon generations. The latest E5 Xeons, despite being in production only three months, already claim a nine percent share.

GPUs and other accelerators are now installed in 58 systems, up from 39 six months ago. The vast majority of them (53) use NVIDIA parts. AMD’s ATI GPUs and IBM’s PowerXCell 8i are installed in two systems apiece, while Intel’s MIC coprocessor made its TOP500 debut in an experimental cluster with pre-production Knights Corner chips.

On the interconnect front, InfiniBand now reigns as the most popular technology, with 209 systems, finally beating out Ethernet, which is installed on 207 machines. The remaining 84 systems use some flavor of non-standard interconnect (custom, proprietary, Cray, etc.). Although small in number, these specialized networks are installed in systems that represent more than half (55 percent) of the TOP500’s aggregate performance.
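The interconnect counts can be converted into shares of the list to make the contrast with the 55 percent performance figure explicit. In this sketch, the label "Custom/proprietary" is just a name chosen here for the article's non-standard group.

```python
# Interconnect counts from the June 2012 list, as reported in the article.
INTERCONNECTS = {
    "InfiniBand": 209,
    "Ethernet": 207,
    "Custom/proprietary": 84,  # the article's non-standard group (custom, proprietary, Cray, etc.)
}

total = sum(INTERCONNECTS.values())
assert total == 500, "the three groups should cover the whole list"

for name, count in INTERCONNECTS.items():
    print(f"{name:<20} {count:>3} systems  {count / total:.1%} of the list")
```

The non-standard group is under 17 percent of the systems yet, per the article, more than half of the aggregate performance.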

The next Linpack rankings in November should see many of these trends continue. The top of the list, as always, should be quite interesting, especially since at least three new double-digit petaflop machines, powered by the latest accelerators, are scheduled to make their appearance. The Stampede system at TACC will be powered by Intel’s first Knights Corner coprocessor, while the Titan and Blue Waters supercomputers at ORNL and NCSA, respectively, will get the new NVIDIA Kepler parts. If these deployments go as planned, we could, once again, see some major realignments in the top 10.

Intel Will Ship Knights Corner Chip in 2012

June 18, 2012

Intel’s first Many Integrated Core (MIC) microprocessor is now just months away from its commercial debut. On Monday at the International Supercomputing Conference (ISC’12) in Hamburg, Intel announced that Knights Corner, the company’s first manycore product, would be in production before the end of 2012. The company also released a few more details about the upcoming product line, including the creation of a new Xeon brand for the architecture, some performance updates on pre-production silicon, and Cray’s adoption of MIC as part of its future Cascade supercomputer.

This was not a Knights Corner launch, however. With the plans now set for the chip to go into production before the end of the year, more than likely that means Intel will debut the product, in all its manycore glory, at SC12 in November. NVIDIA’s big Kepler GPU, the K20, is also expected to launch around this time, setting the stage for an MIC-GPU shootout in Q4.

This fall, TACC is slated to get a boatload of the first MIC coprocessors — 8 petaflops worth — as part of the center’s 10-petaflop Stampede supercomputer, which will be built by Dell. Other Knights Corner systems are also in the works for a handful of large HPC centers, including Jülich Supercomputing Centre, the University of Tokyo, Leibniz Supercomputing Centre (LRZ), Oak Ridge National Laboratory, the Korea Institute of Science and Technology Information (KISTI) and CERN. Depending upon the actual installation schedules and availability of the MIC parts, some or all of these systems may be up and running by November, in time perhaps to log Linpack runs.

But we won’t have to wait for November to hear about Linpack running on MIC machines. According to Rajeeb Hazra, GM of Intel’s Technical Computing group, the company has been running the High Performance Linpack (HPL) benchmark on pre-production parts and has been able to achieve one teraflop on a single node equipped with a Knights Corner chip. That teraflop, by the way, is provided by the Knights Corner card plus the two Xeon E5 host CPUs, so the MIC chip itself is likely delivering something in the neighborhood of 700 to 800 gigaflops.

Intel has also put together a Xeon E5-MIC experimental cluster with pre-production Knights Corner parts that delivers 118.60 Linpack teraflops. That’s enough to place it at number 150 on the new TOP500 list released earlier today.

The peak performance for the Intel MIC cluster is 180.99 teraflops, which means the Linpack yield is only 65 percent. That’s pretty anemic compared to a CPU-only cluster, which typically hits 75 to 95 percent of peak. But compared to the 50 percent or so yield on the current crop of GPU-accelerated clusters, MIC’s Linpack extraction looks to be significantly better. NVIDIA’s latest Kepler GPU and GPUDirect technology may help to close that gap, but we’ll have to wait and see on that.
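Linpack yield is simply Rmax divided by Rpeak. The sketch below applies that ratio to the figures reported for Intel's experimental cluster and compares the result with the rough CPU-only and GPU ranges quoted above. Treating "50 percent or so" as a single 0.50 threshold is a simplification made here.

```python
def linpack_yield(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak actually delivered on Linpack (Rmax / Rpeak)."""
    return rmax_tflops / rpeak_tflops

# Intel's pre-production Xeon E5 + Knights Corner cluster, per the article.
mic = linpack_yield(118.60, 180.99)

# Rough yield figures quoted in the article for comparison.
CPU_ONLY = (0.75, 0.95)   # typical range for CPU-only clusters
GPU_ACCELERATED = 0.50    # "50 percent or so", treated as a single value here

print(f"MIC cluster yield:       {mic:.1%}")
print(f"below CPU-only range:    {mic < CPU_ONLY[0]}")
print(f"above typical GPU yield: {mic > GPU_ACCELERATED}")
```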

Since Intel is not doing the Knights Corner launch at this point, they’re not releasing much more information about the upcoming product here at ISC. All the previous specs — 50-plus cores on 22nm process technology — are still in effect.

Intel, however, did talk about the on-board memory for the first time, saying that the Knights Corner PCIe cards will include at least 8 GB of GDDR5 memory (which, by the way, may have contributed to the better Linpack yield). The current Fermi-based Tesla modules from NVIDIA top out at 6 GB of GDDR5, but the upcoming K20 module is likely to get more than that. Intel is still mum about ECC support for Knights Corner’s on-board memory, but as we’ve said before, such support seems like a foregone conclusion.

On the marketing front, the product line is getting a rebrand makeover. The architecture will still be called MIC, but the official product family will now be known as Xeon Phi. The idea here was to leverage the well-established Xeon brand, which defines the leading edge of Intel’s x86 line-up. At the same time, it drives home the point that MIC is an x86-based architecture, rather than some exotic design that Intel cooked up only for bleeding-edge techies.

Although the MIC instruction set, which Intel made public last week, does not match that of the latest Xeon CPUs, bit for bit (mainly diverging in the vector instruction area), the company is quick to point out that its C and Fortran compilers, libraries and other development tools will support the new architecture seamlessly. Plus, we’re reminded, developers are free to program them with the HPC standard parallel frameworks, namely MPI and OpenMP, as well as Intel’s own frameworks like TBB and Cilk Plus. Basically, if an app runs on a Xeon, it should run on a Xeon Phi.

In fact, Hazra made a point of talking up the ability of the Phi chips to run entire applications, rather than just accelerated kernels as is the case for GPUs and FPGAs. According to him, you will be able to run complete apps on the coprocessors, which can be treated as a virtual network node. That belies MIC’s natural role as a coprocessor, but opens up some unique ways to use the chip, as well as helping ease application porting and development.

Intel has to be careful here. Many, if not most, HPC applications are likely to run slower if they are entirely confined to a MIC coprocessor, in part because single-threaded performance on MIC will be inferior to that of a Xeon CPU. Plus, even at 8 GB, local memory capacity on the Phi card is just a fraction of what a CPU can access.

And Intel still promotes its beloved Xeon CPUs as the center of the high performance computing universe, with Hazra referring to them as “the foundation of HPC” for general-purpose technical computing workloads. The Xeon Phi chips, he says, are suited for those applications that are highly parallel in nature. But the two categories overlap heavily, so talk of using the coprocessor as a CPU seems to send somewhat of a mixed message to HPC’ers.

In any case, OEMs are jumping on the MIC bandwagon. Most of the HPC system vendors in the x86 clusters business today will be offering Xeon Phi-equipped systems, presumably as soon as the first Knights Corner chips start rolling out, or soon thereafter. All the major server makers have signed up, including IBM, HP, Dell, Bull, SGI, and Fujitsu, as well as smaller HPC outfits like Appro, T-Platforms, and Penguin Computing.

Cray, too, will be introducing MIC supercomputing in its “Cascade” product line in 2013, a system that will glue Xeon CPUs to Phi coprocessors. Cascade is the result of the DARPA HPCS program, whose goal was to produce productive architectures for multi-petaflop computing. The addition of the MIC chips to Cascade should come as no surprise, given that the system was designed to be based on Intel parts from the get-go.

“This is the next big step in our adaptive supercomputing vision,” said Cray CEO Peter Ungaro. According to him, they’ve already begun taking orders for such Phi-accelerated systems, including one from HLRS at the University of Stuttgart in Germany and another from Kyoto University in Japan.

Although the Xeon Phi product will be initially aimed at traditional HPC science codes, Intel believes that other applications that require high levels of parallelism, especially data parallelism, would also be good candidates. Big data analytics, in particular, appears to be an area ripe for these manycore processors with lots of memory bandwidth, and both the Xeon Phi and NVIDIA GPUs are likely to be jockeying for a chunk of this market.

The idea of using the MIC platform as the basis for big data machines has piqued Cray’s interest too. “We actually see Phi as a very viable candidate even within that [big data] environment,” said Ungaro. uRiKA, Cray’s big data appliance, which it offers under its YarcData division, is currently based on the company’s own custom Threadstorm processor.

Being able to sell these manycore chips into multiple markets beyond HPC would certainly be appealing to Intel and is likely to affect the Xeon Phi roadmap going forward. In the meantime, users will have to wait for the Knights Corner launch, which finally appears to be just around the corner.

Mellanox Cracks 100 Gbps with New InfiniBand Adapters

June 18, 2012

Interconnect maker Mellanox has developed a new architecture for high performance InfiniBand. Known as Connect-IB, this is the company’s fourth major InfiniBand adapter redesign, following in the footsteps of its InfiniHost, InfiniHost III and ConnectX lines. The new adapters double the throughput of the company’s FDR InfiniBand gear, supporting speeds beyond 100 Gbps.

Over the past 10 years, CPU compute power has increased roughly 100-fold, but interconnect bandwidth has lagged, creating communications bottlenecks in servers. At the same time, clusters are getting larger, further compounding the problem. This is certainly happening in HPC, but also in the commercial realm of cloud computing and, now, big data.

In all cases, the trend is toward larger and larger clusters with CPUs whose core counts are increasing at a Moore’s Law pace. With Connect-IB, Mellanox is attempting to re-sync the interconnect with the performance curve, with the goal of providing a balanced ratio of computational power and network bandwidth.

Connect-IB was designed as a foundational technology for future exascale systems and ultra-scale datacenters. Gilad Shainer, vice president of marketing development at Mellanox, claims the redesign offers unlimited interconnect scalability via its new Dynamic Connected Transport technology. “If you build something, you need it to handle tens of thousands and even hundreds of thousands [of nodes] if you want that architecture to last for the next couple of years,” he told HPCwire.

Connect-IB increases performance for both MPI- and PGAS-based applications. The architecture also features the latest GPUDirect RDMA technology, known as GPUDirect v3. This allows direct GPU-to-GPU communication, bypassing the OS and CPU. Overall, the new adapters can process 130 million messages per second. The current-generation ConnectX/VPI adapters, which handle both InfiniBand and Ethernet, deliver just 33 million messages per second, or roughly a quarter of Connect-IB’s rate.

Latency on the new adapters is 0.7 microseconds, equal to that of the latest ConnectX hardware for FDR InfiniBand. That’s pretty much tops in the commodity interconnect space today. Ethernet RDMA (RoCE), for example, comes in behind at 1.3 microseconds.

When asked about the latency numbers, Shainer said the technology is approaching its physical limits and that further improvements would be minimal. “We’re getting very close to what you can cut,” he noted. “Right now the bigger portion of the latency is on the server side. It will be reduced moving to the future, but it’s not going to be a huge reduction.”

Connect-IB’s throughput marks the architecture’s greatest advantage. The highest-end part, which needs a PCI Express 3.0 interface, can break 100 Gbps. The increased bandwidth is welcome among a variety of applications and Shainer explained one hypothetical case involving SSD storage.

He noted that a server loaded with 24 SATA III SSDs could support a theoretical data throughput of 12 GB/second. To achieve that level of I/O without bottlenecks, the server’s interconnect would have to deliver 96 Gbps. This would require the equivalent of 15 8-Gbps Fibre Channel (FC) cards, 10 10GbE cards, or a single Connect-IB card with dual FDR InfiniBand (56 Gbps) ports. Of course, there are no standard servers with more than a handful of I/O slots, so an FC or Ethernet solution for a heavily loaded SSD configuration is essentially out of the question.

“If you want to go the Fibre Channel way, you would have to put 15 cards in that box,” explained Shainer. “There is no way you’re going to do it. You create storage density, but from the other side you can’t take it out, so you lose the ability to do storage density.”
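Shainer's sizing arithmetic can be reproduced in a few lines. The 500 MB/s per drive is implied by his 12 GB/s figure for 24 drives; the usable rate per adapter is an assumption made here: about 800 MB/s for an 8 Gbps Fibre Channel card (which reproduces his count of 15), the nominal 1,250 MB/s line rate for 10GbE, and two nominal 56 Gbps FDR ports for Connect-IB. Protocol overhead is ignored.

```python
# Work in integer MB/s so the card counts are not affected by float rounding.
NUM_SSDS = 24
SSD_MB_S = 500                        # assumed per-drive rate (12 GB/s / 24 drives)
required_mb_s = NUM_SSDS * SSD_MB_S   # aggregate SSD throughput
required_gbps = required_mb_s * 8 / 1000

# Assumed usable throughput per adapter card, in MB/s.
CARDS = {
    "8 Gbps Fibre Channel": 800,
    "10 Gigabit Ethernet": 1_250,
    "Connect-IB, dual FDR ports": 2 * 56_000 // 8,   # 2 x 56 Gbps = 14,000 MB/s
}

def cards_needed(per_card_mb_s: int) -> int:
    """Smallest number of cards whose combined rate covers the SSD throughput."""
    return -(-required_mb_s // per_card_mb_s)   # ceiling division on integers

print(f"required: {required_mb_s:,} MB/s = {required_gbps:.0f} Gbps")
for name, rate in CARDS.items():
    print(f"{name:<28} {cards_needed(rate):>2} card(s)")
```

Under these assumptions the counts come out at 15 FC cards, 10 Ethernet cards and one Connect-IB card, matching the figures quoted above.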

Mellanox will initially be releasing five InfiniBand adapters using the Connect-IB technology. The first unit will support PCIe 2.0 x16 with one port of 56 Gbps connectivity, which for the first time delivers FDR speeds to AMD-based servers. Two adapters have also been developed with a PCIe 3.0 x8 interface. With a maximum throughput of 56 Gbps, these adapters can be ordered in one- or two-port configurations.

The last pair of adapters use a full PCIe 3.0 x16 interface. The maximum Connect-IB bandwidth of 112 Gbps is achieved with the dual-FDR-port adapter. In this case, multiple cables would be required between the adapter and the next hop. Mellanox is also offering a single-port PCIe 3.0 x16 adapter, providing 56 Gbps. Since maximum throughput from each port is the same as that of FDR InfiniBand, the new adapters are compatible with current switches.
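Why the top bandwidth needs a full PCIe 3.0 x16 slot can be seen from raw link rates. The per-lane rates and encodings below are standard PCI Express figures rather than numbers from the article (PCIe 2.0: 5 GT/s with 8b/10b encoding; PCIe 3.0: 8 GT/s with 128b/130b encoding). The sketch ignores packet and protocol overhead, so real throughput is somewhat lower than these numbers.

```python
# Per-lane signaling rate (GT/s) and line-encoding efficiency by PCIe generation.
PCIE = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}
FDR_PORT_GBPS = 56.0           # nominal FDR InfiniBand port speed

def pcie_gbps(gen: str, lanes: int) -> float:
    """Raw one-direction slot bandwidth in Gbps after line encoding."""
    rate, efficiency = PCIE[gen]
    return rate * efficiency * lanes

# The five Connect-IB configurations described in the article: (PCIe gen, lanes, ports).
CONFIGS = [("2.0", 16, 1), ("3.0", 8, 1), ("3.0", 8, 2), ("3.0", 16, 1), ("3.0", 16, 2)]

for gen, lanes, ports in CONFIGS:
    slot = pcie_gbps(gen, lanes)
    ports_total = ports * FDR_PORT_GBPS
    bottleneck = "slot" if slot < ports_total else "ports"
    print(f"PCIe {gen} x{lanes}, {ports} port(s): slot ~{slot:.0f} Gbps, "
          f"ports {ports_total:.0f} Gbps, limited by {bottleneck}")
```

Only the PCIe 3.0 x16 slot has headroom for two full FDR ports. On an x8 slot, the raw link (about 63 Gbps) is barely above a single port's 56 Gbps, which is consistent with the x8 adapters topping out at 56 Gbps even in the two-port configuration.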

The current ConnectX/VPI adapter line is not going away as a result of the Connect-IB introduction. In fact, the company plans to incorporate the more performant architecture in the fourth generation of ConnectX adapters, which support both InfiniBand and Ethernet.

A number of organizations across HPC, Web 2.0, cloud and storage have been lining up for the new Connect-IB products, according to Shainer. “We might see deployments this year, but definitely early next year,” he said. “Right now it’s too early to expose the names, but yes, we have customers.”

Prototypes are currently working at Mellanox labs and samples will be sent to customers in Q3, with general availability expected in early Q4. Mellanox will be running a lab demonstration of Connect-IB at ISC’12 this week in Hamburg, Germany.

HPC Lists We’d Like to See

June 15, 2012

Since the release of the first TOP500 list in June of 1993, the HPC community has been motivated by the competition to place high on that list. We’re now approaching the twentieth anniversary of the TOP500. In recent years, two additional lists have gained traction: the Green500 and the Graph 500. Would a few more lists be useful? Let’s take a look at some options.

Two Decades of Lists

In June, at the International Supercomputing Conference in Hamburg, the TOP500 list will celebrate its twentieth anniversary. The first list was published in June of 1993 at the Supercomputing 93 Conference in Mannheim. The top 10 entries on that list are displayed below.

The peak performance of the top machine was just under 60 gigaflops. Eight of the top 10 machines were manufactured and sited in the United States, five of those by a now-defunct vendor.

Nostalgia aside, it is interesting to reflect on the impact that the TOP500 list has had, and continues to have, on HPC around the world. While some might argue that the list is too simplistic and fails to capture the true complexity of high performance computers, its beauty also lies in that simplicity. It provides a simple linear ranking of machines by their number-crunching performance, as measured by the LINPACK benchmark in FLOPS (floating point operations per second). It has provided a race that everyone wants to win. Consequently, high placement on the TOP500 list has become a driver of HPC procurements and provides bragging rights and enhanced recruitment power to those at the top.

Green500 & Graph 500 Lists

As was discussed in a previous HPCwire article, the TOP500 List has also spawned at least a couple of additional lists: the Green500, introduced in November of 2007, and Graph 500, introduced in November of 2010. The previously unconstrained race to the top is being complemented by a new form of competition – one constrained by electrical power – and captured in the Green500 list. Here, the measure is energy efficiency and the metric is MFLOPS/watt. Also, data crunching has soared in importance and visibility and is arguably on par with number crunching. Data crunching performance, as measured by an evolving set of kernels from graph algorithms and a metric called TEPS (traversed edges per second), is documented in the Graph 500 list.
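Both metrics are simple ratios. The sketch below computes MFLOPS/watt from hypothetical inputs and measures TEPS with a plain breadth-first search. It is a toy illustration only: the real Graph 500 benchmark specifies its own graph generator, kernels, validation and timing rules, and counting the undirected edges in the component reached from the root is a simplification made here.

```python
import time
from collections import deque

def mflops_per_watt(sustained_flops: float, watts: float) -> float:
    """Green500-style energy efficiency: millions of FLOP/s per watt."""
    return sustained_flops / 1e6 / watts

def bfs_teps(adj: dict[int, list[int]], root: int) -> tuple[int, float]:
    """Run a BFS from root; return (edges in the reached component, edges per second).

    adj is an undirected adjacency list: every edge appears in both endpoints' lists.
    """
    start = time.perf_counter()
    visited = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in visited:
                visited.add(v)
                queue.append(v)
    elapsed = time.perf_counter() - start
    # Each undirected edge is stored twice, once per endpoint.
    edges = sum(len(adj[u]) for u in visited) // 2
    teps = edges / elapsed if elapsed > 0 else float("inf")
    return edges, teps

# Hypothetical inputs: a 1-petaflop machine drawing 1 MW, and a small graph
# whose vertex 4 is isolated and therefore never reached from vertex 0.
print(f"{mflops_per_watt(1e15, 1e6):.0f} MFLOPS/W")
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}
edges, teps = bfs_teps(graph, 0)
print(f"{edges} edges traversed, {teps:.3g} TEPS")
```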

These new lists help expand our understanding of machine performance by adding a couple of additional dimensions. At the same time, each list preserves the beauty of the TOP500 list by providing a simple linear ranking. Each gives us a competition to be won. In this same spirit, perhaps we should consider putting a few more HPC dimensions into competitive play. To start the conversation, we’ll propose a few possibilities.

Footprint500

The most capable computers are very large. Housing them is expensive. So, we might want to create a “footprint” measure (actually system volume, but footprint sounds better). The footprint metric could be FLOPS/m³. Given some agreement on what constitutes the measurable volume of a system, such a Footprint500 list should be relatively straightforward to construct.

For example, let’s make some rough estimates for the RIKEN K Computer, currently at the top of the TOP500 List. The entire K Computer system and all of its supporting equipment are spread across three floors and a basement in its home at the RIKEN Advanced Institute for Computational Science. If we consider only the room in which the computer cabinets are located, it has a floor area of about 3,000 m². The K Computer’s 864 cabinets sit on an areal footprint of 1,600 m². As the cabinets are a bit over 2 m tall, this yields a volume of about 3,296 m³. The K Computer’s Linpack performance is 10,510 teraflops. Thus, it delivers roughly 3.19 teraflops/m³. How does your favorite machine compare?
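The Footprint500 estimate for the K Computer amounts to a short calculation. The 2.06 m cabinet height is an assumption back-solved from the 3,296 m³ volume ("a bit over 2 m tall"); the floor area and the 10,510-teraflop figure are the ones used in the K Computer example.

```python
# Rough K Computer figures from the Footprint500 example.
cabinet_floor_area_m2 = 1_600
cabinet_height_m = 2.06     # assumed: "a bit over 2 m", chosen to reproduce 3,296 m^3
linpack_tflops = 10_510     # the K Computer's Linpack (Rmax) result, in teraflops

volume_m3 = cabinet_floor_area_m2 * cabinet_height_m
density_tflops_per_m3 = linpack_tflops / volume_m3

print(f"volume:  {volume_m3:,.0f} m^3")
print(f"density: {density_tflops_per_m3:.2f} TFLOPS/m^3")
```

Swapping in another machine's cabinet area, height and Linpack result gives a directly comparable FLOPS/m³ figure, provided everyone agrees on what volume to count.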

Faultless500

The reliability of high-end HPC machines is a significant issue. There are several ways to measure system reliability and this is an active topic of research. A common metric is the Mean Time Between Interrupt (MTBI). So, we’ll use this as a placeholder for whatever more precise metric the community of experts may converge on. MTBI can be as low as days or even just hours for our most capable machines. This is not a good situation and it is expected to get worse as systems grow into the exascale range. In order to highlight the issue and provide positive reinforcement to those who make strides in addressing it, we might create a Faultless500 list. To be fair to the larger, more capable machines, the MTBI metric could be replaced with something like Mean FLOPS Between Interrupt (MFBI).
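A minimal sketch of how an MFBI-style ranking could favor larger machines. The definition used here (sustained FLOP/s multiplied by MTBI in seconds, i.e. the average amount of computation completed between interrupts) is one plausible reading of the proposal, not an established metric, and both machines are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    sustained_flops: float  # sustained FLOP/s, e.g. a Linpack Rmax
    mtbi_hours: float       # mean time between interrupts

    @property
    def mfbi(self) -> float:
        """Mean FLOPs between interrupts: average work completed before an interrupt."""
        return self.sustained_flops * self.mtbi_hours * 3600

# Two hypothetical machines: the bigger one is interrupted three times as often.
machines = [
    Machine("hypothetical A", sustained_flops=10e15, mtbi_hours=24),
    Machine("hypothetical B", sustained_flops=1e15, mtbi_hours=72),
]

ranked = sorted(machines, key=lambda m: m.mfbi, reverse=True)
for rank, m in enumerate(ranked, start=1):
    print(f"{rank}. {m.name}: MTBI {m.mtbi_hours} h, MFBI {m.mfbi:.2e} FLOPs")
```

Machine A has the worse MTBI but ranks first on MFBI, because it completes more work between interrupts.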

Motion500

Crunching numbers is fast and cheap, while moving data is slow and expensive. This is the mantra one hears these days. So, maybe we need a data motion metric and a Motion500 list. For example, something like bits/second/distance, where distance represents some set of predetermined traverses of a computing system’s memory space. If the traverses look significantly different for number and data crunching applications, then perhaps there should be two lists. Another approach might be that described by Allan Snavely, associate director of the San Diego Supercomputer Center, as data motion capacity: “Take the capacity of each level of the memory hierarchy in bytes and divide by the access time in cycles and sum this up.”
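Snavely's data motion capacity, as quoted, is a sum over memory-hierarchy levels of capacity divided by access time. The hierarchy below is entirely hypothetical (sizes and latencies picked to look like a generic node), so only the shape of the calculation carries over.

```python
def data_motion_capacity(levels: list[tuple[int, int]]) -> float:
    """Data motion capacity as described by Snavely.

    levels holds (capacity_bytes, access_time_cycles) for each level of the
    memory hierarchy; the result is the sum of capacity / access time.
    """
    return sum(capacity / cycles for capacity, cycles in levels)

# Hypothetical single-node hierarchy: (capacity in bytes, access latency in cycles).
HYPOTHETICAL_NODE = [
    (32 * 1024, 4),          # L1 cache
    (256 * 1024, 12),        # L2 cache
    (20 * 1024**2, 40),      # L3 cache
    (32 * 1024**3, 200),     # DRAM
]

total = data_motion_capacity(HYPOTHETICAL_NODE)
for capacity, cycles in HYPOTHETICAL_NODE:
    print(f"{capacity:>14,} B / {cycles:>3} cycles -> {capacity / cycles:,.0f} B/cycle")
print(f"total: {total:,.0f} bytes per cycle")
```

With these made-up numbers the DRAM term dominates the sum, so the choice of which levels to include would weigh heavily on any list built from this metric.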

Satisfaction500

From the end user’s perspective, the time to job completion is very important. Obviously, not all application jobs look alike. For example, we’ve already observed that number and data crunching are claimed to be substantially different. Within each of these categories there is further significant differentiation. So while time to completion may be a good metric for end user satisfaction, there is no obvious simple measure, like LINPACK, to apply. Nonetheless, test suites composed of complete codes representative of the applications consuming the lion’s share of HPC resources can be assembled and used to measure time to completion for the Satisfaction500 list.
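Ranking by a suite of full applications still requires combining several completion times into one score. The sketch below assumes one common convention, also used by SPEC-style suites: the geometric mean of each code’s reference time divided by its measured time. The application names, machines, and timings are all hypothetical.

```python
import math


def satisfaction_score(measured_s, reference_s):
    """Geometric mean of reference time / measured time over a suite of full applications.
    A score above 1.0 means the machine finishes the suite faster than the reference."""
    ratios = [reference_s[app] / measured_s[app] for app in reference_s]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def rank_machines(results, reference_s):
    """Order machines by satisfaction score, best first."""
    return sorted(results, key=lambda m: satisfaction_score(results[m], reference_s), reverse=True)


# Hypothetical time-to-completion data (seconds) for a three-code suite.
reference = {"climate": 3600.0, "md": 1800.0, "cfd": 7200.0}
results = {
    "machine_a": {"climate": 1800.0, "md": 1800.0, "cfd": 3600.0},
    "machine_b": {"climate": 3600.0, "md": 600.0, "cfd": 7200.0},
}
print(rank_machines(results, reference))
```

The geometric mean keeps one spectacular speedup on a single code from outweighing mediocre results on the rest, which suits a list meant to reflect overall user satisfaction.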

About a decade ago, the Department of Energy’s Office of Advanced Scientific Computing Research commissioned a first attempt at something one might see as similar in intent to a Satisfaction500 list. It was called the Applications Performance Matrix (see screenshot below). Its purpose was to provide “a rich source of information on the performance of high-performance computers applied to real science and engineering problems.”

It may have been an idea ahead of its time, as it seems to have disappeared from the web, surviving only in presentations like this one given by Bill Buzbee at the 2004 Salishan Conference.

While the Applications Performance Matrix used real applications codes to measure computer performance, a complementary approach has been taken by the HPC Challenge benchmark. This benchmark attempts to get a more holistic view of computer performance by using a suite of seven tests, presumably abstracted from the requirements of real applications.

What we suggest here is “biting the bullet” and running a suite of “full up” applications codes to completion, under pre-specified conditions, on a large collection of computers. With the emergence of a nascent OpenScience movement (see, for example, the Open Chemistry Project) and the imperatives for openness imposed by the centrality of science to public policy formation, perhaps a common set of real applications captured in open codes, to serve as candidates for such a suite, is now within reach.

List of Lists

If each of the above suggestions were implemented, we’d have seven distinct “500” lists available for analysis and decision making. While each list would contain its own collection of machines, presumably there would be reasonable overlap. In our earlier study of the TOP500, Green500 and Graph 500 lists we found this to be the case and, given an incentive to make the measurements, the intersection of the various lists would surely grow.

So, the ultimate list would be a list of all the lists. Given clear identification of machines, so that they could be unambiguously tracked across lists, the ListofLists500 might help us to better understand the character of various high performance computers, while preserving the linear rankings and competitions inherent in the individual lists. Also, if the ListofLists500 were refined enough and diligently maintained, it could serve as the basis for an HPC “configurator.”

Are any of these lists worth a try? Do you have suggestions for other lists? Let us know what you think.

About the author

Gary M. Johnson is the founder of Computational Science Solutions, LLC, whose mission is to develop, advocate, and implement solutions for the global computational science and engineering community.

Dr. Johnson specializes in management of high performance computing, applied mathematics, and computational science research activities; advocacy, development, and management of high performance computing centers; development of national science and technology policy; and creation of education and research programs in computational engineering and science.

He has worked in academia, industry and government. He has held full professorships at Colorado State University and George Mason University, been a researcher at United Technologies Research Center, and worked for the Department of Defense, NASA, and the Department of Energy.

He is a graduate of the U.S. Air Force Academy; holds advanced degrees from Caltech and the von Karman Institute; and has a Ph.D. in applied sciences from the University of Brussels.

Student Cluster Challenge Makes ISC Debut

Wed, 13 Jun 2012

For the first time in its history, ISC will host an international Student Cluster Challenge (SCC), heretofore a mainstay of the US-based SC conference in November. Now the competition is moving far beyond its “fan favorite” origins to take center stage this June in an international venue.

International spotlight shines on global competitiveness and the future of computing

Spring in Germany and autumn in the United States draw global leaders from across vast fields of research, development, academia, government and industry to two distinct conferences that are anything but standard symposium fare.

Ongoing for over two decades, the International Supercomputing Conference (ISC) and the SC conferences are built upon the rich heritage of supercomputing. Today both programs focus on a range of disciplines within high performance computing. Attendees are constantly pushing boundaries, discovering the unimaginable, proving the improbable, achieving the impossible or doing what for others may be altogether inconceivable. Together, the conferences and their community maintain a deeply rooted commitment to stewarding the future and the generations in line to inherit this tremendous legacy.

On the event horizon is the Hamburg-based ISC, followed in November by SC12 in Salt Lake City. They are symbolic cornerstones, marking the beginning and completion of the academic calendar, underscoring their core focus on fostering education, expertise and community. Complementing the impressive attendee rosters and exhaustive program agendas are the student contingents, who are as integral to the conference fabric as the programs are critical to the students’ success and to our relentless pursuit of understanding.

Student engagement has always been a central theme. Both conferences offer extensive student-focused initiatives designed to support both academic and professional pursuits. One of those programs is the Student Challenge, a rigorous technical competition designed to augment undergraduate studies in the design and use of system architectures and HPC applications.

Drummed up years ago by SC notables like Ricky Kendall and company as part of the stateside conference, the competition began as an “SC friendly.” Today the SC Student Cluster Challenge (SCC) is moving far beyond its “fan favorite” origins, with a few ardent devotees, to take center stage this June as an international competition.

This year, ISC will host an international student competition for the first time in its history. Through generous sponsorship from Airbus, the launch of the ISC student challenge sets another milestone in conference history and is the result of tremendous vision and commitment from an exceptional team of people.

With a long history in student-oriented development programs, Gilad Shainer, whose affectionate title of “the hardest working man in HPC” is well earned, initiated the effort. With the full backing of Mellanox Technologies Inc. and support from colleague Pak Lui and the HPC Advisory Council, he worked with the ISC chairs, conference organizers and numerous others to bring the live competition to Germany this month. With teams drawn from across Europe, the United States and China, the result is nothing less than a profound achievement of epic proportion.

With the US focused directly on STEM and COMPETES initiatives, and comparable efforts under way globally, the student challenges are triggering national investment and interest. Representation from countries outside the EU and US, including Russia, Costa Rica and beyond, increases year over year, offering numerous new examples of the growing importance of this one small program. And each year and competition brings an equal share of heartwarming stories of heroes and champions.

The bigger back-story leading to this year’s Hamburg challenge is the heated competition among teams vying to represent China, whose approach best showcases a nation’s dependence on developing future generations of researchers and innovators. For the rest of the world, the message is clear: China has thrown down the gauntlet and comes to the competition as the nation to beat.

Thirty teams vied for only six in-play finalist spots in the advance competition. While that alone should be enough to make competing nations take heed, the ceremony included full endorsements from high-ranking officials and drew national media coverage throughout. Capturing both national and international attention is as much a testament to the talents of the competing and winning teams as it is recognition of the program’s vision for developing technology expertise. It is also an international nod to the competition founders, teams and supporters alike who have championed the importance of this small program.

Among the SCC’s champions and advocates, Dan Olds of the Gabriel Consulting Group is both patron and guardian of the competition. Enduring the same week-long schedule as the competing teams, he has given the competition not just a broad, international spotlight but a life and personality of its own. Olds has established a presence that the competition and the kids so richly deserve and that the conferences needed to help fuel its broad appeal.

Awareness of the competition has grown tremendously, thanks to everything from formal coverage throughout the conference week, complete with live competition footage and postings, to this year’s virtual betting pool. This unique addition to the ISC program will allow fans to back their odds-on favorites.

No challenge goes without its set of challenges. What the founders, team leads and chairs, like the Doug(s), and a host of others have managed to accomplish is awe-inspiring. They have been at the forefront, leading, advocating and championing the importance and rewards of these competitions, breaking down barriers and moving impenetrable forces to achieve the unimaginable, challenge the improbable, deliver the impossible and accomplish what for others was altogether inconceivable.

What these champions – students, supporters and advocates alike – understand is the absolute need for and commitment to excellence. The future depends upon it.

Acceptance into the competition is no easy endeavor for any undergraduate whether backed by an entire nation or by a single academic advisor. The bar is intentionally set high.

As an educational program element for the students, the competition begins from the outset. Anywhere from six to twelve months ahead of a conference, teams bootstrap their way through the entire process: from forming an undergraduate team complete with an advisor, securing mentors and endorsement from their home institution, architecting a competitive platform and garnering advance commitments from the technology community, all the way through to submitting a team-authored proposal that must be unique and differentiated enough to be selected by a juried review body.

Looking at it from the academic calendar, and from the students’ vantage point, preparation for selected teams generally coincides with summer and winter semesters and break periods. So while their classmates are on break, student teams can be found heads down in their labs, as they are encouraged to do as much advance preparation as possible, all the way through hands-on testing prior to the competition.

Upon arrival at the conference, the competition begins in earnest. Each team builds a complete system bound by rigorous rules, including limits on advisor involvement and challenging parameters such as power capping, while pushing to achieve maximum performance. Students can be found, night and day, tuning their systems and optimizing their software to achieve the best performance for each application and deliver winning results across the entire competition.

Just a few short years ago, achieving teraflop (TF) performance was a huge leap in the SCC. The University of Texas team was the first to break that threshold. Since then, three teams achieved the TF performance milestone at SC10 in New Orleans, and just a year later, at SC11 in Seattle, six out of eight teams hit that mark. Russia’s team led with 1.92 TF, followed by China at 1.84, Taiwan at 1.83 and Texas at 1.37. These are phenomenal numbers, both in base performance and in the number of teams breaking the TF threshold, particularly since the teams are limited to only 26 amps of total power.
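The point about the 26-amp cap can be made quantitative. The sketch below divides each quoted SC11 result by the current limit. The 120 V supply voltage used to convert amps to watts is a hypothetical assumption, since the article does not state it. Treating every team as drawing the full 26 A means the per-watt figures are lower bounds.

```python
CURRENT_LIMIT_A = 26.0     # power cap quoted in the text
ASSUMED_VOLTAGE_V = 120.0  # hypothetical supply voltage; not stated in the article

# SC11 LINPACK results quoted above, in teraflops.
sc11_results_tf = {"Russia": 1.92, "China": 1.84, "Taiwan": 1.83, "Texas": 1.37}


def gflops_per_amp(tflops, amps=CURRENT_LIMIT_A):
    """Performance per amp of the current budget."""
    return tflops * 1000.0 / amps


def mflops_per_watt(tflops, amps=CURRENT_LIMIT_A, volts=ASSUMED_VOLTAGE_V):
    """Lower-bound efficiency, assuming the full current budget was drawn at `volts`."""
    return tflops * 1e6 / (amps * volts)


for team, tf in sorted(sc11_results_tf.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{team}: {gflops_per_amp(tf):.1f} GF/A, >= {mflops_per_watt(tf):.0f} MFLOPS/W")
```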

In addition to unique system architectures and winning performance, teams earn competitive points for everything from team and individual presentations of their efforts and results, to peer and juried reviews, live interviews and more. At the close of the competition the students are as exhausted as they are exhilarated. At the end of the conference, the students participate in a formal award ceremony for this winner-take-all competition.

With the inauguration of the ISC competition, China’s in-nation competition and the November SC12 looming on the horizon, what started as a friendly is hotting up. Will other nations follow suit? Will the student challenges go on to become a conference favorite and leading draw?

It seems likely that the original stewards, annual committee, long-time attendees and newcomers alike will be as inspired by the vitality of youth, not yet encumbered by the notion of impossibility, as the younger set is by direct access to the expertise and experience of the pioneers in this broad field of possibilities.

The TOP500 Celebrates 20th Anniversary, Will it Survive 20 More?

Tue, 12 Jun 2012

With the upcoming release of the TOP500 next week, the latest rankings are usually a hot topic of discussion this time of year. Over the past 20 years, the list has proven to be a useful and popular compilation of supercomputers for the HPC community. In this exclusive interview, Professor Hans Meuer, considered by many to be the driving force behind the project, offers his thoughts on the TOP500: its past, present, and future.

The TOP500 has ranked systems consistently for two decades, giving the high-performance computing community a way to compare systems and establishing targets for vendors to deliver increased capabilities to the most challenging applications.

Over the past 20 years, the TOP500 has proven to be a useful and popular benchmark. To a degree, it is a corner point in performance focused on dense linear algebra (compute-intensive floating point), which is highly correlated with many applications in computational science and engineering.

In recent years, new data-intensive problems have come to light that stress the memory subsystem with irregular data accesses. Complementary benchmarks are emerging, such as the Graph 500, which evaluates how well a machine performs on data-intensive analytics applications, and the Green500, which provides a ranking of the most energy-efficient supercomputers in the world.

With the upcoming release of the most current rankings, the TOP500 is usually a hot topic of discussion this time of year. I recently caught up with Professor Hans Meuer, considered by many to be the driving force behind the project, to learn more about his thoughts on the TOP500: its past, present, and future.

Tom Tabor: Hans, how timely is this topic at ISC’12 this year?

Hans Meuer: The 39th list will be published on Monday, June 18, during the opening session. That leaves just one more list to compile this year, the November list, which will be released at SC12, to complete 20 years since the founding of the TOP500. As we count down to the 20th anniversary celebration, Erich (Strohmaier), Jack (Dongarra), Horst (Simon), and I will be guests on HPCwire’s Soundbite live from Hamburg, and our ISC Think Tank Series topic will be “The TOP500 – Twenty Years Later.” At SC’12, we’ll also host a TOP500 history booth to demonstrate the 20 years of development of the project. So this will be a very exciting year for us.

Tabor: Take us back to the beginning. How did you, Jack, Horst and Erich meet and did you meet with the intent of starting a ranking?

Meuer: We have all known each other for a long time. Erich joined my staff at the University of Mannheim in 1990, and has thus been involved in the TOP500 project from the very beginning. I invited Jack to talk at the second Mannheim Supercomputer Seminar in 1987, and Horst has attended our HPC conferences regularly since 1990. Ironically, we didn’t hold any special meeting when we launched the project in the spring of 1993. Currently, we meet each year at ISC, and in the U.S., at the SC Conference, to discuss the project.

Tabor: How did the idea for the TOP500 germinate?

Meuer: The Mannheim Supercomputer Statistics merely contained the names of the manufacturers and thus became superfluous right at the beginning of the 90s. New statistics were needed that reflected the diversification of supercomputers, the enormous performance difference between low-end and high-end models, the increasing availability of massively parallel processing (MPP) systems, and the strong increase in computing power of the high-end models of workstation suppliers (SMP).

To provide for this new statistical foundation, in 1993, Erich and I began to assemble and maintain a list of the 500 most powerful computer systems. We also decided right at the beginning to use the best LINPACK performance, Rmax, to rank the systems in our list. The first list was compiled in June of that year. Since then, with the help of HPC sites and manufacturers, it has been compiled twice a year.

Erich and I are the TOP500 founding authors, Jack is the father of LINPACK and came aboard in 1993, and Horst embarked on the journey in 2000.
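The ranking rule Meuer describes (order systems by their best LINPACK result, Rmax, and keep 500) can be sketched in a few lines. The `System` class and helper names are illustrative, not TOP500 tooling; tie-breaking rules are not modeled. The CM-5/1024 figures are the ones quoted in this interview (Rmax 59.7 gigaflops, peak 131 gigaflops), and the other two entries are hypothetical. The sketch also reports efficiency, Rmax divided by theoretical peak (Rpeak).

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    rmax_gflops: float   # best measured LINPACK performance
    rpeak_gflops: float  # theoretical peak performance

    @property
    def efficiency(self) -> float:
        """Fraction of theoretical peak achieved on LINPACK."""
        return self.rmax_gflops / self.rpeak_gflops


def rank_by_rmax(systems, size=500):
    """Sort by Rmax, highest first, and keep the top `size` entries."""
    return sorted(systems, key=lambda s: s.rmax_gflops, reverse=True)[:size]


submissions = [
    System("System B (hypothetical)", 30.0, 40.0),
    System("CM-5/1024", 59.7, 131.0),  # figures quoted in this interview
    System("System C (hypothetical)", 15.0, 20.0),
]
for rank, s in enumerate(rank_by_rmax(submissions), start=1):
    print(f"{rank}. {s.name}: Rmax {s.rmax_gflops} GF, efficiency {s.efficiency:.1%}")
```

Note that the hypothetical systems run at 75% efficiency yet rank below the CM-5, because only Rmax determines position.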

Tabor: Whose idea was it to call it the TOP500?

Meuer: It was my idea and the underlying reasons are two-fold. The first is that when we completed the Mannheim Supercomputer Statistics project, we were left with 530 systems and I considered it logical to begin where we had stopped. The other reason is sentimental. The Forbes 500 list, which points to the world’s richest and most successful people and corporations, has always fascinated me. So, here we are… focusing on the world’s 500 most powerful systems!

Tabor: Did you ever envision the list becoming so mainstream?

Meuer: No.

Tabor: What was your first instance of the notoriety of the list?

Meuer: Sometime in the late 90s, during one of the sessions at the SC conference, a speaker referred in his presentation simply to “the list,” as a matter of course, rather than to the TOP500 list.

Tabor: On the first list, who was number one and what was the system’s peak performance?

Meuer: This was the Thinking Machines CM-5/1024 at the Los Alamos National Lab, with a best LINPACK performance of 59.7 gigaflops and a peak performance of 131 gigaflops. By the way, the TOP500 app, which is available as a free download from Apple’s App Store, contains information on all the past lists.

Tabor: What do you believe are the most important aspects of the TOP500 that have led it to be a widely referenced benchmark?

Meuer: We have been criticized for choosing LINPACK from the very beginning, but now in the 20th year, I believe that it was this particular choice that has contributed to the success of TOP500. Back then and also now, there simply isn’t an appropriate alternative to LINPACK. Any other benchmark would appear similarly specific, but would not be so readily available for all systems in question. One of LINPACK’s advantages is its scalability, in the sense that it has allowed us for the past 19 years to benchmark systems that cover a performance range of more than 11 orders of magnitude. Another significant advantage is that we can foster competition between manufacturers, countries and sites.

The TOP500 list’s success lies in the compilation and analysis of data over time. We have been able to correctly identify and track nearly all HPC developments over 19 years, covering manufacturers and users of HPC systems, architectures, interconnects, processors, operating systems and more. Above all else, the TOP500’s strength is that it has proved to be an exceptionally reliable tool for forecasting developments in performance.

Tabor: If there were no precedent to follow, would you propose ranking supercomputers on the basis of LINPACK measurements today?

Meuer: Yes, because LINPACK remains a useful, valid and substantive benchmark even in the years to come. And there is currently no alternative to replace it.

Tabor: What do you like and dislike about the LINPACK benchmark?

Meuer: The pros of LINPACK as a yardstick of performance are as follows: it offers one figure of merit, which is simple to define and rank; it allows the problem size to change with the machine and over time; and it allows for competition. The cons are that it emphasizes only “peak” CPU speed and number of CPUs. It does not stress local bandwidth, the memory system or the network, and no single figure of merit can reflect the overall performance of an HPC system. To rely solely on LINPACK today and in the years to come is definitely not enough. We need additional benchmarks to keep track of new HPC systems.

Tabor: Can you please discuss in a bit more detail the current alternative benchmarks?

Meuer: For the purpose of discussion, let’s focus on three alternative benchmarks.

The HPC Challenge Benchmark (HPC CB) from Jack Dongarra basically consists of seven different benchmarks, each stressing a different part of a system. Of course, High Performance LINPACK (HPL) is represented and stands for the CPU. Ultimately, however, we don’t have a single figure of merit, but seven numbers represented in a much more complex way by so-called Kiviat graphs.

For some people, this is too complex to understand, especially for journalists reporting on new systems entering the HPC arena. System specialists, however, can interpret the results well, and for that reason the HPC CB has become something of a standard for selecting an HPC system for an institution.
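Each of the seven numbers becomes one spoke of a Kiviat (radar) diagram. To make the spokes comparable, a common approach (assumed here, not taken from the HPC Challenge documentation) is to divide each result by the best value on that axis, inverting metrics where lower is better, such as latency. The metric names and both systems below are hypothetical and cover only four kinds of measurement.

```python
LOWER_IS_BETTER = {"latency_us"}  # assumption: every other metric is higher-is-better


def kiviat_scores(results):
    """Normalize each metric to the best system on that axis, giving scores in (0, 1]."""
    metrics = list(next(iter(results.values())))
    best = {}
    for m in metrics:
        values = [r[m] for r in results.values()]
        best[m] = min(values) if m in LOWER_IS_BETTER else max(values)
    return {
        name: {
            m: best[m] / r[m] if m in LOWER_IS_BETTER else r[m] / best[m]
            for m in metrics
        }
        for name, r in results.items()
    }


# Hypothetical results for two systems on four kinds of HPC Challenge-style measurements.
results = {
    "system_a": {"hpl_tflops": 100.0, "stream_tbps": 50.0, "random_access_gups": 10.0, "latency_us": 2.0},
    "system_b": {"hpl_tflops": 50.0, "stream_tbps": 100.0, "random_access_gups": 20.0, "latency_us": 4.0},
}
for name, axes in kiviat_scores(results).items():
    print(name, axes)
```

A system that is best on every axis traces the full outer polygon. The two hypothetical systems here each win on different spokes, which is exactly the information a single LINPACK number cannot show.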

The Green500 List, overseen by Wu-chun Feng and Kirk W. Cameron of Virginia Tech, is another complementary approach to ranking supercomputers. The inaugural Green500 list was announced at SC08 as a complement to the TOP500, to provide a ranking of the most energy-efficient supercomputers in the world, so that supercomputers can now be compared by performance-per-watt. At SC11, the latest Green500 list was published with 500 entries. The number one system in the TOP500, Fujitsu’s K computer, reached a remarkable number 32 on the Green500 list, despite having the largest power consumption, more than 12.5 MW, as observed in the TOP500 list.
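Performance-per-watt is a simple ratio, and the unit conversions happen to cancel neatly. This sketch uses the K computer’s Rmax of 10,510 teraflops and the “more than 12.5 MW” figure from the text. Because 12.5 MW is only a lower bound on power, the result is an upper bound on efficiency, not the official Green500 value. The function name is illustrative.

```python
def mflops_per_watt(rmax_tflops: float, power_mw: float) -> float:
    """Green500-style energy efficiency in MFLOPS per watt.

    1 TFLOPS = 1e6 MFLOPS and 1 MW = 1e6 W, so the unit factors cancel.
    """
    return (rmax_tflops * 1e6) / (power_mw * 1e6)


# K computer: Rmax of 10,510 TFLOPS and "more than 12.5 MW" (from the text).
# Since 12.5 MW is a lower bound on power, this is an upper bound on efficiency.
upper_bound = mflops_per_watt(10_510, 12.5)
print(f"K computer: at most ~{upper_bound:.0f} MFLOPS/W")
```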

The Graph 500, led by Richard C. Murphy from Sandia National Laboratories, is a highly important project that addresses the increasingly dominant class of data-intensive supercomputer applications. As current benchmarks don’t provide useful information on the suitability of supercomputing systems for data-intensive applications, a new set of benchmarks is needed to guide the design of hardware/software systems intended to support such “big data” applications. While the TOP500 addresses number crunching, the Graph 500 addresses data crunching applications. Graph algorithms are a core part of many analytics workloads. Backed by a steering committee of 50 international experts, Graph 500 will establish a set of large-scale benchmarks for these applications.

The Graph 500 project includes three major application kernels: concurrent search, optimization (single source shortest path), and edge-oriented (maximal independent set). It addresses five graph-related business areas: cyber security, medical informatics, data enrichment, social networks, and symbolic networks. The Graph 500 was announced at ISC’10, and the first list appeared at SC’10 (9 systems ranked). Further results were published at ISC’11 (29 systems) and SC’11 (49 systems), with the next list slated for release at ISC’12.
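The “concurrent search” kernel is a breadth-first search, and Graph 500 reports results in traversed edges per second (TEPS). The serial sketch below illustrates the idea only. It assumes a small undirected graph stored as adjacency lists and is not the Graph 500 reference code, which runs on very large generated graphs and validates each search. The function names are hypothetical.

```python
import time
from collections import deque


def bfs_parents(adj, root):
    """Breadth-first search returning a parent map (the BFS tree); root is its own parent."""
    parent = {root: root}
    frontier = deque([root])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                frontier.append(v)
    return parent


def teps(adj, root):
    """Run one BFS and return (parent map, traversed edges per second).

    Edges are counted once per undirected edge inside the component reached from root.
    """
    start = time.perf_counter()
    parent = bfs_parents(adj, root)
    elapsed = time.perf_counter() - start
    edges = sum(len(adj[u]) for u in parent) // 2
    return parent, edges / elapsed if elapsed > 0 else float("inf")


# Small undirected example graph; vertex 4 is unreachable from 0.
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1], 4: []}
tree, rate = teps(graph, 0)
print(tree, f"{rate:.3e} TEPS")
```

Unlike LINPACK, almost no arithmetic happens here. The cost is dominated by irregular memory accesses while the search follows edges, which is why the Graph 500 ranks machines differently.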

Tabor: Hans, in your opinion, how much of the reason we use the TOP500 is due to the legacy and how much is because it provides good guidance on how fast a computer really is?

Meuer: I have to admit that the TOP500, with LINPACK, is not the best tool for ranking supercomputers but it’s the only one available. The TOP500, with LINPACK, doesn’t tell you how fast a computer is on useful applications. The TOP500 ranks computers only by their ability to solve a set of linear equations, Ax=b, using a dense random matrix A and nothing else. The misinterpretation of the TOP500 results has surely led to a negative attitude towards LINPACK. Politicians, for example, consider a system’s TOP500 rank as a general rank that is valid for all applications, which of course is not true.
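To make concrete what LINPACK measures, here is a toy, pure-Python version of the same task: solve Ax = b for a dense random matrix A by Gaussian elimination with partial pivoting, then divide an operation count by the elapsed time. This is an illustration only, not HPL. It is serial and unblocked, and, as a simplifying assumption, it counts only the dominant 2n³/3 term of the operation count. The function names are hypothetical.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry in column k to row k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def toy_linpack(n, seed=0):
    """Return (flop rate, max residual) for one dense random solve of size n."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    flops = 2.0 * n**3 / 3.0  # dominant term only
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed, residual


rate, residual = toy_linpack(100)
print(f"{rate / 1e6:.1f} Mflop/s, max residual {residual:.1e}")
```

Meuer’s point shows up directly in the code: the score rewards fast dense arithmetic on one specific problem and says nothing about how the same machine would handle irregular or communication-heavy work.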

Tabor: Do you think the TOP500 should consider replacing its ranking of systems by flops with flops-per-joule?

Meuer: No.

Tabor: What are your thoughts about expanding the TOP500 to include the price paid for the supercomputer so that one can easily see the price-performance trends?

Meuer: That is a good question. We had thought about this right at the very beginning, but decided not to include any prices. What is the price of a supercomputer? Is it the list price? Is it the negotiated price? That’s a highly vague area, and we were afraid to waste our time with a fly-by-night approach.

Tabor: Do you envision the TOP500 also ranking the performance of cloud computers?

Meuer: We haven’t thought about this yet. When we gain a deeper understanding of cloud computers, we might consider this.

Tabor: Is there any intention to compile all the lists in a book?

Meuer: Yes, we’ve been discussing this since the 15th year of TOP500. We are all more or less very busy, but now that you have reminded me, I’ll start pushing for a discussion in conjunction with our 20th anniversary.

Tabor: Finally Hans, do you believe the TOP500 will still provide a useful measure for ranking systems another 20 years from now?

Meuer: Yes, but I can’t tell you what yardstick we’ll be using 20 years from now.

Tabor: Hans, thank you for taking the time to share this important bit of HPC history with us.

Meuer: With great pleasure Tom… see you in Hamburg.

—–

About the Author

Tom Tabor is CEO and Founder of Tabor Communications, Inc. (TCI), a leading international media, advertising, and communications organization. An industry pioneer, Tom has over 30 years of experience in business-to-business publishing, with the last 24+ years focused primarily on high performance and data-intensive computing technologies.

Aircraft Simulations Push Computing to the Cutting Edge

Thu, 26 Jan 2012

Designing an aircraft is one of the more expensive endeavors in the manufacturing business. Complex engineering, strict safety regulations, and high levels of quality control all conspire to make such development time consuming and labor intensive. It’s no surprise that large manufacturers like Boeing and Airbus have turned to computing, and especially high performance computing, to streamline the effort.

To get a sense of the current state of the art, we asked Guus Dekkers, CIO of EADS and Airbus, to shed some light on the computational challenges involved. In the interview that follows, Dekkers, who will be delivering the opening keynote on this subject at ISC’12 in Hamburg, Germany, explains how HPC is being applied to aircraft simulation today and what the future might bring.

HPCwire: Before coming to Airbus and EADS, you worked in the automotive industry. How do these industries differ in their need for, and use of, high performance computing?

Guus Dekkers: Due to the complexity of both the product and the development process, the aeronautics industry needs to pre-load and virtualize its development process far more than is the case today in the automotive industry. Whereas the number of prototypes built in the automotive environment has been substantially reduced during the last decade, a new car model will nevertheless still see a substantial number of physical models being built. This compares to a handful of extremely expensive prototypes in the aeronautics industry, with only a few (and costly!) opportunities to make corrections if needed.

The number of engineering domains in which advanced simulation is used is also far larger than in automotive. This is partly because the aeronautics industry needs to address advanced technical challenges unknown to the automotive industry (e.g., lightning strikes, ice accretion prediction, calculation of dynamic loads during different flight phases), and partly because in the automotive industry building a physical mock-up is at times the easier and more efficient way to make design decisions.

HPCwire: Is the use of HPC for aircraft simulations actually enabling engineers to come up with better, more complex designs or is its main benefit cost reduction, via the replacement of physical prototyping and testing?

Dekkers: I would say it is both. Today engineers no longer limit themselves to simulating an aircraft’s behavior as a static model, but use the availability of vast high performance computing power to calculate the optimal scenario under different, partially dynamic situations. This allows them to optimize important safety, environmental and performance criteria like fuel burn, noise, aerodynamic optimization and performance prediction for multiple scenarios, which until recently was impossible at the current level of precision. This clearly allows us to design better aircraft.

HPCwire: What is the biggest challenge in performing aircraft simulations today? And how is it being addressed?

Dekkers: The challenges are manifold. The first and most basic is the compatibility of the simulation software with the hardware architecture. This is why most companies prefer to have multiple types of architectures to deal with multiple requirements.

Calibrating the simulation algorithm, its results, and its predictions against real-world behavior is also a challenge, especially for newer materials such as carbon fiber. Here we ultimately have no option but to validate through physical mock-ups.

Last but not least, linking both the input and output of such a simulation cycle to the “right” aircraft configuration is not straightforward: how do I make sure the calculations are based on the right digital mock-up configuration, and how can I ensure that the results remain reproducible over a very long time frame?

HPCwire: Are there aspects of aircraft design that simulations are particularly good at optimizing?

Dekkers: Traditionally, over three-quarters of our HPC capacity has been used for aerodynamic optimization, which I believe will not surprise anyone. However, we currently see a clear shift in usage toward multidisciplinary design and optimization, aero-acoustics, and system integration. This does not mean that HPC usage in the traditional area is shrinking; the other use cases are simply growing faster.

HPCwire: Can you tell us a little about Airbus’ FuSim program: what it is about and what the expectations are?

Dekkers: FuSim stands for Future Simulation concept. It is a strategic research & technology program launched in 2006 to drastically change the aerodynamic development process.

FuSim’s objectives are to develop innovative computer-based simulation systems that increase the capability of fluid mechanics design processes by up to a million times, leading to significantly shorter product development lead times, as well as enhanced product optimization through investigation of breakthrough technologies such as flow control. Needless to say, this requires enormous computing capacity.

Progress achieved during the first phase of FuSim, from 2006 to 2011, demonstrated an overall 10^3 improvement in computational fluid dynamics (CFD) efficiency versus its 2005 basis.

The next big step is Megasim, planned for 2015, which targets another 10^3 improvement in CFD efficiency versus today’s basis, that is, a 10^6 improvement in comparison to 2005.
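The two targets compose multiplicatively, which is why two 10^3 steps add up to the "million times" goal. A short worked equation makes this explicit, where E_y is hypothetical shorthand for CFD efficiency in year y, and where "today’s basis" is assumed here to be the end of FuSim’s first phase in 2011 (the interview does not pin down the exact baseline year):

```latex
\frac{E_{2015}}{E_{2005}}
  = \underbrace{\frac{E_{2011}}{E_{2005}}}_{\approx 10^{3}\ \text{(FuSim phase 1)}}
    \times
    \underbrace{\frac{E_{2015}}{E_{2011}}}_{\approx 10^{3}\ \text{(Megasim target)}}
  \approx 10^{3+3} = 10^{6}
```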

HPCwire: How important are government and academic partnerships to Airbus and EADS?

Dekkers: Particularly in flight physics, we have long-standing partnerships with academic institutes and programs. In this area, I would specifically like to mention C2A2S2E in Germany, Mosart in France, CFMS in the UK and DOVRES in Spain.

Our work with academia typically focuses on research methods: how to improve aerodynamic analysis and method implementation, and how best to apply them.

In addition to these initiatives, we are looking at PRACE, an EU-funded project that is federating HPC research infrastructure in Europe, to see how the aerospace industry can benefit from European petaflops computing capacity and eventually gain access to exaflops for the most challenging unsteady aerodynamics and multiphysics simulations.

HPCwire: Which new or upcoming HPC technologies and developments do you think will be most significant for the aerospace industry?

Dekkers: In the area of HPC environments, we will have to deal with strong growth in I/O and storage requirements. Between 2008 and 2013, I/O volumes are growing from 5 GB per calculation to 5,000 GB per calculation, all of which must be transferred, stored and displayed. Visualizing such data volumes is also a real challenge, not only because of their sheer size but also because the meaningful data must be condensed to fit the available display sizes.
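To put the quoted I/O figures in perspective: a 1,000-fold increase over five years implies roughly a fourfold increase every year. The minimal Python sketch below computes that implied yearly factor, assuming steady compound growth between the two endpoints (an assumption for illustration; the interview gives only the 2008 and 2013 values). The function name `annual_growth_factor` is hypothetical.

```python
def annual_growth_factor(start, end, years):
    """Constant yearly multiplier that turns `start` into `end` over `years` years."""
    return (end / start) ** (1.0 / years)

# Figures quoted in the interview: 5 GB per calculation in 2008, 5,000 GB in 2013.
start_gb, end_gb = 5.0, 5000.0
years = 2013 - 2008

factor = annual_growth_factor(start_gb, end_gb, years)
print(f"Overall growth: {end_gb / start_gb:.0f}x over {years} years")
print(f"Implied yearly growth: {factor:.2f}x per year")

# Year-by-year volumes under the (assumed) constant-growth model.
for i in range(years + 1):
    print(2008 + i, f"{start_gb * factor ** i:,.0f} GB")
```

Compound growth is used here because it is the simplest model that reproduces both endpoints; actual yearly volumes could have grown unevenly.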

Handling the physical characteristics of such HPC environments is also becoming more and more challenging. Our 200-teraflops container solutions consume several hundred kilowatts in just a couple of cubic meters of space and need to be cooled in an environmentally friendly way. Here we will certainly need even newer technologies than we have today.

Last but not least, I believe that HPC efficiency will depend at least as much on the exponentially improving efficiency of the algorithms used, which I expect to contribute gains of the same order of magnitude as hardware innovations. Code clearly must be parallelized further to benefit from the new architectures (we still have a lot of “old-fashioned” code on our systems today), and it needs to be continuously adapted to take maximum advantage of the newest processor technologies.