Virtualized Supercomputer Operating System

January 29, 2010

Research shows progress on opening up supercomputers to a broader range of applications and users

New work on the Sandia National Laboratories Red Storm supercomputer, the 17th fastest in the world, is helping to make supercomputers more accessible. Sandia researchers, working hand in hand with researchers from Northwestern University and the University of New Mexico, socialized 4,096 of Red Storm's total 12,960 compute nodes into accepting a virtual external operating system, a leap of at least two orders of magnitude over previous such efforts.

"The goal is to create a more flexible environment for all users," said Sandia researcher Kevin Pedretti, who led Sandia researchers in adapting and optimizing a Northwestern program called Palacios, an open source virtual machine monitor framework, for the Red Storm environment. Sandia researchers directed the testing effort.

Peter Dinda, professor of electrical engineering and computer science at Northwestern's McCormick School of Engineering, said, "If we can virtualize supercomputers without performance compromises, we will make them easier to use and easier to manage, generally increasing the utility of these very large national infrastructure investments." Dinda led the development of Palacios with his student Jack Lange.

Because of the complex nature of the classified work performed on Red Storm, its operating system is functionally restrictive compared with a general-purpose operating system.

Enter virtualization.

A virtual machine in effect separates the hardware of a computer from its operating system. "Our observation is that no single operating system will satisfy the needs of all potential users," said Pedretti, "so we are attempting to leverage the virtualization hardware in modern processors to allow users to select the operating system best for them to use at run-time."

This could permit one machine to simultaneously run multiple operating systems, with the possibility of migrating these systems from one computer to another. To achieve this trick on Red Storm, a receptor operating system called "Kitten" has been developed primarily at Sandia, while the virtual machine monitoring program Palacios was developed at Northwestern. Operating through the filter of this programming translation, a program not native to Red Storm can run on nodes of the machine.

A virtual machine monitor (VMM) works by separating a computer's operating system from its hardware. This indirection brings a range of benefits. For example, a VMM allows an operating system from one machine to be run on another, for instance when it needs more memory. It can also allow one machine to run multiple operating systems simultaneously, and it makes it possible to migrate running operating systems from one computer to another.

In the case of supercomputing, the VMM also acts as a translator between a user's software and the highly specialized hardware and software environments of the system, which could potentially allow more researchers to use supercomputers to solve complex problems.

With more than 38,000 processors, Red Storm is a massively parallel processing supercomputer that was uniquely designed to support modeling and simulation of complex problems in nuclear weapons stockpile stewardship. It is currently the 17th fastest supercomputer in the world, with a theoretical peak performance of 284 trillion floating point operations per second in a relatively compact 3,500 square foot footprint.

The virtualized software was only 5 percent less efficient than Red Storm's native, fixed software environment. That overhead represents the additional expense in time and efficiency of running a program in a virtualized environment. "We believe the results show that the benefits of virtualization can be brought to even the largest computers in the world without performance compromises," said Pedretti.
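
As a rough illustration of what a 5 percent overhead means, the sketch below computes relative overhead from two wall-clock times. The function name and the sample timings are hypothetical and are not measurements from Red Storm; the article does not say exactly how the figure was measured, so this is only one plausible reading.

```python
def virtualization_overhead(native_seconds: float, virtualized_seconds: float) -> float:
    """Return the fractional slowdown of a virtualized run relative to a native run."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (virtualized_seconds - native_seconds) / native_seconds


# Hypothetical timings, chosen only to illustrate a 5 percent overhead.
native = 100.0
virtualized = 105.0
print(f"overhead: {virtualization_overhead(native, virtualized):.1%}")
```

Under this reading, a job that takes 100 seconds natively would take about 105 seconds when run through the virtual machine monitor.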

This would mean that researchers around the world should one day be able to run their own simulations on huge machines at remote sites without having to reconfigure their software to the machine's specific hardware and software environment.

The work was funded for Sandia by its Laboratory Directed Research and Development program. Northwestern and UNM work was funded by the National Science Foundation.
