High performance computing versus high throughput computing

Xserve G5 supercomputer. Image credit: Christopher Bowns, Flickr

Two approaches to scientific computing

The terms “high performance computing” (HPC) and “high throughput computing” (HTC) might sound interchangeable to those not familiar with scientific computing, but they denote two very different approaches to computing. I’m going to describe the difference below (with the caveat that I have only a layman’s understanding of this field).

High throughput computing is for many smaller tasks

HTC is a computing approach that aims to make a large number of computers available to quickly accomplish tasks that can easily be broken up into smaller, independent components. For example, if you have to process 100 video clips and each one takes ~1 hr, you would need ~100 hrs of computing time on your laptop.

However, if you had 100 laptops, you could in theory do the task in 1 hr, assuming you could instantly command each one to begin processing (in reality, of course, you’d have to run around setting up the task on each computer, which could take longer than the compute itself). The point is this: each video-processing task is independent of the others.

It is these types of tasks that HTC aims to address. An HTC cluster provides many hundreds or thousands of networked CPUs, plus a software layer (a DRM, or distributed resource manager) that can easily and automatically track and distribute hundreds of tasks. An HTC user can submit a job such as the video-processing example above and have it automatically farmed out to 100 compute nodes (in the HTC world this is called a “pleasantly parallel” problem). Once each node completes, the data are copied back to the user’s home folder, and it appears to the user that they have just used one extremely fast computer, when in fact they have used 100 computers working simultaneously.

High performance computing is for difficult computational problems

Now, however, consider a computational task whose subunits are not independent of one another. One that I am intimately familiar with is Molecular Dynamics (MD) simulation of protein structure and dynamics. In an MD simulation, an algorithm simulates the atomic motions of a protein molecule immersed in a box of water molecules over a very short timescale (somewhere on the order of a microsecond). Even at that short timescale, this is a very compute-intensive task. And because each atom in the protein interacts with many other atoms in the system, the task can’t be neatly broken down into independent components the way video processing can: you can’t simply give each atom to a separate compute node. In effect, an MD simulation is a single, extremely resource-intensive computation.
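The coupling is easy to see in a toy sketch (this is not a real MD engine, and the inverse-square “potential” is just an illustrative stand-in for real interatomic forces): the force on each atom depends on the positions of all the other atoms, so no atom’s update can be computed in isolation.

```python
def pairwise_forces(positions):
    # Toy 1-D illustration: every atom's force sums contributions
    # from every other atom, so the computation is all-to-all.
    n = len(positions)
    forces = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                r = positions[j] - positions[i]
                # Simple inverse-square attraction as a stand-in for
                # real potentials (Lennard-Jones, Coulomb, ...).
                forces[i] += r / (abs(r) ** 3)
    return forces

forces = pairwise_forces([0.0, 1.0, 2.5, 4.0])
```

Note that internal forces cancel pairwise (Newton’s third law), but each individual force still requires knowledge of every other position — that global dependency is what separates this problem from the video-clip example.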

Enter high performance computing. In HPC (also called supercomputing), the aim is to build hardware and software focused on peak computing capability (i.e., speed) and extremely fast interconnects, rather than on the number of simultaneous tasks that can be accomplished. The “high performance” part of HPC comes from the technological focus on networking the compute nodes together with extremely fast connections, so that passing data and messages back and forth does not become a significant bottleneck to completing a large-scale computation.

On the software side of HPC, code libraries like MPI (the Message Passing Interface) have been developed that allow a simulation to be “parallelized,” i.e., broken down into smaller pieces (a partitioning called “domain decomposition” in MD simulation). The pieces are farmed out to the compute nodes of an HPC supercomputer, and the nodes exchange data in real time so that each part of the simulation “knows” about the results from every other part. In this way, the velocities and positions of certain atoms of a protein can be influenced by those of all the other atoms, even when they are being simulated on different compute nodes.
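The idea can be sketched without real MPI. Below, a toy 1-D diffusion stencil (an assumed stand-in for real force calculations — nothing here is actual MD or MPI code) is split between two “ranks” that exchange their boundary values each step, the role MPI messages play on a real supercomputer. Because of that exchange, the decomposed run reproduces the serial run exactly.

```python
def step(cells, left_ghost, right_ghost):
    # One update of a simple averaging stencil: each cell needs its
    # two neighbours, with "ghost" values supplied at the edges.
    padded = [left_ghost] + cells + [right_ghost]
    return [(padded[i - 1] + padded[i + 1]) / 2.0
            for i in range(1, len(padded) - 1)]

def simulate_serial(field, steps):
    # The whole domain on one node (fixed 0.0 boundaries).
    for _ in range(steps):
        field = step(field, 0.0, 0.0)
    return field

def simulate_decomposed(field, steps):
    # Domain decomposition: split the field between two "ranks".
    half = len(field) // 2
    a, b = field[:half], field[half:]
    for _ in range(steps):
        # "Halo exchange": each rank sends its edge cell to its
        # neighbour *before* updating -- this stands in for the
        # MPI messages on a real HPC interconnect.
        a_edge, b_edge = a[-1], b[0]
        a = step(a, 0.0, b_edge)
        b = step(b, a_edge, 0.0)
    return a + b

field = [0.0, 1.0, 4.0, 2.0, 3.0, 0.0]
print(simulate_decomposed(field, 3) == simulate_serial(field, 3))  # True
```

The catch is that the exchange happens every step: if the network is slow, the ranks spend their time waiting on boundary data rather than computing — which is exactly why HPC machines invest so heavily in fast interconnects.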