Cloud HPC Makes the Rounds at SC10

The Science Clouds blog has posted a nice roundup of HPC Cloud developments from SC10 last week:

Two representatives from Platform Computing presented a large-scale cloud deployment being tested at the CERN laboratory in “Building the World’s Largest HPC Cloud.” CERN is testing Platform ISF to run scientific jobs in a virtualized environment. Results included reports of launching several thousand VMs and a comparison of image distribution techniques.

In “Virtualization for HPC”, members of the academic (Ohio State University, ORNL) and industrial (VMware, Univa UD, Deopli) communities shared their vision of a future for virtualization technologies in HPC. Topics discussed included pro-active fault tolerance using migration, virtualized access to high-performance interconnects, and new hypervisors technologies designed for exascale computing.

In “Low Latency, High Throughput, RDMA and the Cloud in Between” representatives from Mellanox, Dell, and AMD discussed the advantages of cloud computing and highlighted the importance of reducing latency and increasing throughput for scientific communities. RDMA over Converged Ethernet (RoCE) was emphasized as a specific effort toward reducing latency in virtualized environments.

The work in “Elastic Cloud Caches for Accelerating Service-Oriented Computations” demonstrated a dynamic and fast memory-based cache using IaaS resources, specifically for a geoinformatics cyberinfrastructure. The system responds to changes in demand by dynamically adding or removing IaaS nodes from the cache.

Purdue demoed Springboard, a “hub” to work with NSF’s TeraGrid infrastructure. The hub provides a central point for researchers to collaborate and removes the need for researchers to rely strictly on the command line when interacting with the TeraGrid’s resources. Springboard also interfaces with the TeraGrid’s first cloud resource, Wispy, at Purdue.

The National Center for Atmospheric Research (NCAR) and the University of Colorado at Boulder used 150 Amazon EC2 instances for the Linux Cluster Construction tutorial. The virtual machines were launched on-demand the morning of the tutorial. They provided participants with a realistic software environment for configuring and deploying a Linux cluster using a variety of open source tools such as OpenMPI, Torque, and Ganglia.

Authors Paul Marshall And Pierre Riteau go on to say that it would be cool to see an HPC challenge category for cloud computing, perhaps running on Amazon’s cluster compute resource that premiered at number 231 on the TOP500.

Resource Links:

Latest Video

Industry Perspectives

In the second in this series of articles on ways to save energy in computing, Robert Roe from Scientific Computing World reports on how Europe is looking to both hardware and software solutions. [Read More...]