High-performance computing on gamer PCs, Part 1: Hardware

It is hard to imagine performing research without the help of scientific computing. The days of scientists working only at a lab bench or poring over equations are rapidly fading. Today, experiments can be planned based on output from computer simulations, and experimental results are confirmed using computational methods.

For example, the Materials Genome Project is currently plowing through the periodic table looking for structures and chemistries that may lead to enhanced materials for energy applications. By allowing a computer to perform most of the work, researchers can concentrate their valuable time on synthesizing and characterizing a small subset of interesting compounds identified by the search algorithm.

As the scope of scientific research has grown more complex, so have the computational methods and hardware required to answer scientific questions. This increasing complexity results in expensive, highly specialized scientific computing equipment that must be shared across multiple departments and research units, and the queue to access it can be unacceptably long. For smaller labs, it can be nearly impossible to get adequate, timely access to critically important computing resources. Sure, there are national user facilities and pay-per-use services, but they can involve extraordinarily long waits or be prohibitively expensive for prolonged projects. In short, high-performance scientific computing is largely restricted to large, wealthy research labs.

With these issues in mind, a research team in the Laboratoire de Chimie de la Matière Condensée de Paris (LCMCP) at Chimie ParisTech, led by research engineer Yann Le Du and graduate student Mariem El Afrit, has been building a high performance computational cluster using only commercially available, "gamer" grade hardware. In a series of three articles, Ars will take an in-depth look at the GPU-based cluster being built at the LCMCP. This article will discuss the benefits of GPU-based processing, as well as hardware selection and benchmarking of the cluster. Two future articles will focus on software choices/performance and the parallel processing/neural network algorithms used on the system.

In Part 1 of our series, we detailed the hardware choices and benchmarked the various GPUs and CPUs used in the HPU4Science scientific computation cluster. The cluster is built in a master-worker configuration: the master dispatches jobs to the workers, compiles and processes the results, and handles data storage. The master is equipped with dual Intel Xeon processors, a four-SSD RAID array for short-term storage, and an array of five 2TB hard drives for archival storage. Networking is simple Gigabit Ethernet.
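The master-worker pattern itself is simple to sketch. The following is an illustrative toy in Python, not the team's actual code: the `run_job` task and the job dictionaries are hypothetical stand-ins, and where this sketch uses local processes, the real cluster ships jobs over Gigabit Ethernet to machines that offload the heavy lifting to their GPUs.

```python
from multiprocessing import Pool

def run_job(job):
    # Hypothetical stand-in for the GPU-bound work a worker performs;
    # in the real cluster this computation runs on the worker's GPUs.
    return job["id"], sum(x * x for x in job["data"])

# The master's role: prepare jobs, dispatch them, gather the results.
jobs = [{"id": i, "data": range(i, i + 4)} for i in range(3)]

if __name__ == "__main__":
    with Pool(processes=3) as pool:     # one local process per "worker"
        results = dict(pool.map(run_job, jobs))
    print(results)
```

The key property this sketch shares with the cluster is that workers are independent: the master only sees finished results, so adding a fourth or fifth worker is a matter of dispatching to one more node.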

Currently, there are three workers in the cluster, running Intel i7 or Core 2 Quad processors and using GPUs for highly parallelized computation. As of the previous article, the third and newest worker had four GTX 580s delivering four TFLOPS of measured peak computational performance (equivalent to six TFLOPS of theoretical performance, the measure used for the Top500 supercomputers list). The hardware for a fourth worker with the same configuration as the third has just arrived, so the cluster will soon comprise four workers with eight GTX 580s, three GTX 480s, three GTX 285s, a Tesla C1060 GPU, and a dual-GPU GTX 295. The estimated computational power of the whole system is 20 TFLOPS in theory and 12.5 TFLOPS in practice. Brand-new GTX 590s are currently on order for a fifth worker, so the total computational power is still growing.
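The theoretical figure for worker three is easy to sanity-check. Assuming NVIDIA's quoted single-precision peak of roughly 1.58 TFLOPS per GTX 580 (an approximate vendor number, not a figure from the team):

```python
# Approximate single-precision peak per GTX 580, per NVIDIA's specs.
GTX580_PEAK_TFLOPS = 1.58

# Four cards in the third worker.
theoretical = 4 * GTX580_PEAK_TFLOPS

# Measured vs. theoretical, per the figures quoted above (4 vs. ~6 TFLOPS).
sustained_fraction = 4.0 / 6.0

print(round(theoretical, 1))        # about 6.3 TFLOPS, matching "six TFLOPS"
print(round(sustained_fraction, 2)) # the worker sustains roughly 2/3 of peak
```

That same roughly two-thirds efficiency is consistent with the whole-cluster estimate of 12.5 TFLOPS in practice against 20 TFLOPS in theory.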

Obviously, a cluster of this scale requires careful software selection to maximize the performance of the hardware. In this article, we detail the software choices for the HPU4Science cluster and discuss the areas where software and performance collide.

Running high-performance neural networks on a "gamer" GPU

A recent project here at the Laboratoire de Chimie de la Matière Condensée de Paris (LCMCP) aims to make high-performance scientific computing cheaper by finding new ways to squeeze performance out of consumer-grade "gamer" hardware. The idea is nothing less than building the equivalent of a $400,000 custom high-performance computing setup for only $40,000.

The cluster, known as HPU4Science, is up and running, and the team behind it is tackling difficult scientific problems by developing novel computational methods that make good use of HPUs—Hybrid Processing Units—like CPUs and GPUs. The current cluster is a group of six desktop-type computers powered by Intel i7 or Core 2 Quad processors, together with GPUs that range from the GTX 285 to the GTX 590.

In two previous articles, Ars outlined the hardware and software used in the cluster. For our last look at HPU4Science, we discuss specific applications running on the cluster, execution-speed optimization techniques using Python and Cython, and the neural network algorithm used by the system.

The HPU4Science project began on paper in 2009. After a long period of weighing the best choices for our budget, we began buying components in April 2010, reaching 80 percent completion in May 2011. As of May 2011, we are still working on the first application (EPR imaging), and no peer-reviewed scientific results have been published yet.