Putting the PS3’s brain to work

Scientific simulations are becoming more advanced as the processors available …

One major trend in the last few decades of computer simulation has been the transition from ever-faster single processors to larger multiprocessor systems. Large-scale multiprocessor systems present a wide range of options to scientific users. They range from collections of highly specialized processors designed to carry out a specific computation (RIKEN-BNL for quantum chromodynamics computations, or GRAPE-DR for N-body problems) to a Beowulf cluster made up of different types of old, discarded desktops. An article in the current issue of Physical Review E examines the performance of Cell processors when they're used in computer simulations. The authors compare two Cell setups: a Sony PlayStation 3 (best performance per dollar) and a QS20/QS21 IBM blade server (best in terms of raw performance).

Cell processors can execute orders of magnitude more floating-point operations per second (flops) than traditional single-CPU systems. However, their lack of intrinsic double-precision data types (double-precision math can be handled, but doing so results in a large performance hit) and the fact that some IEEE 754 binary floating-point arithmetic functions are not implemented make using the Cell in scientific computing a questionable proposition.

To test the various Cell-based systems, the group used previously developed code that models fluid flow with a two-dimensional lattice-Boltzmann (LB) method. In this method, space is discretized into a number of lattice sites, and equations describe what is happening at each site. The first test simply determined whether using single-precision representations of real numbers resulted in a loss of precision in the simulations.
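To make the method concrete, here is a minimal sketch of one lattice-Boltzmann update in Python. It assumes the common D2Q9 lattice (nine velocities per site), a single-relaxation-time (BGK) collision, and periodic boundaries. The article does not say which variant the group's code uses, and the names `equilibrium`, `lb_step`, `make_grid`, `total_mass`, and `total_momentum` are illustrative rather than taken from the paper.

```python
# D2Q9 lattice: nine discrete velocities per site and their standard weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Equilibrium populations for density rho and velocity (ux, uy)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def make_grid(nx, ny, rho=1.0):
    """Fluid at rest with uniform density; f[y][x] holds 9 populations."""
    return [[equilibrium(rho, 0.0, 0.0) for _ in range(nx)] for _ in range(ny)]


def lb_step(f, tau):
    """One update: relax every site toward equilibrium, then stream."""
    ny, nx = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            pops = f[y][x]
            rho = sum(pops)
            ux = sum(p * c[0] for p, c in zip(pops, C)) / rho
            uy = sum(p * c[1] for p, c in zip(pops, C)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(C):
                relaxed = pops[i] - (pops[i] - feq[i]) / tau
                # Streaming: each population moves one site along its velocity.
                new[(y + cy) % ny][(x + cx) % nx][i] = relaxed
    return new


def total_mass(f):
    return sum(sum(site) for row in f for site in row)


def total_momentum(f):
    px = sum(p * c[0] for row in f for site in row for p, c in zip(site, C))
    py = sum(p * c[1] for row in f for site in row for p, c in zip(site, C))
    return px, py


# Usage: a small density bump on an 8 x 8 lattice, advanced 20 steps.
grid = make_grid(8, 8)
grid[4][4] = equilibrium(1.1, 0.0, 0.0)
mass_before = total_mass(grid)
for _ in range(20):
    grid = lb_step(grid, tau=0.8)
print(mass_before, total_mass(grid))
```

Every lattice site is touched once per step, which is why the article later reports performance in lattice site updates per second. In Python's double-precision arithmetic the two printed masses agree to within rounding error.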

Since the equations underlying the LB method conserve mass and momentum, any loss in precision could result in the equivalent of a physical loss of mass or momentum, violating the underlying laws that govern the physical system. The researchers found that a small loss, around one percent, in mass or momentum could occur over the course of a simulation due to the Cell's inability to accurately represent floating-point numbers.
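The general mechanism, rounding error that builds up over many updates, can be illustrated with a short Python sketch. This is not the paper's measurement and it does not model the Cell's specific arithmetic. It simply adds the same small increment a million times, once in double precision and once with every intermediate result rounded to IEEE single precision via the standard `struct` module. The helper names `to_f32`, `accumulate`, and `relative_error` are hypothetical.

```python
import struct

_F32 = struct.Struct("f")


def to_f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return _F32.unpack(_F32.pack(x))[0]


def accumulate(increment, n, single_precision):
    """Add `increment` to a running total n times, optionally rounding to float32."""
    if single_precision:
        increment = to_f32(increment)
    total = 0.0
    for _ in range(n):
        total += increment
        if single_precision:
            total = to_f32(total)  # emulate storing the total in 32 bits
    return total


def relative_error(value, exact):
    return abs(value - exact) / exact


n = 1_000_000
exact = n / 10  # the mathematically exact sum of n copies of 0.1
err32 = relative_error(accumulate(0.1, n, single_precision=True), exact)
err64 = relative_error(accumulate(0.1, n, single_precision=False), exact)
print(f"single precision: {err32:.2e}   double precision: {err64:.2e}")
```

The single-precision total drifts visibly from the exact value while the double-precision one does not, which is the kind of slow leak that shows up as lost mass or momentum in a conserving scheme.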

By carrying out simulations of various flow geometries (flow around a circular object, wave relaxation, and Poiseuille flow), the researchers found that simulations on the Cell processor could be carried out seven to 21 times faster than those on an Intel Core 2 running at 2.4 GHz. When the results were reported in terms of lattice site updates per second (LUPS), the code running on a PS3 utilizing a single SPE core ran at about 4.5 megaLUPS. Running full out, the IBM QS20/QS21 blade server produced approximately 73 megaLUPS; the PC reference simulation ran at 3.3 megaLUPS.
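For readers unfamiliar with the metric, LUPS is just the number of lattice sites times the number of time steps, divided by the wall-clock time. The sketch below shows that arithmetic in Python. The 512 x 512 run is a made-up example, not a case from the paper, and the function names are illustrative; the three rates are the ones quoted above.

```python
def lups(nx, ny, steps, seconds):
    """Lattice site updates per second for an nx-by-ny grid."""
    return nx * ny * steps / seconds


def speedup(rate, reference_rate):
    """How many times faster `rate` is than `reference_rate`."""
    return rate / reference_rate


# Hypothetical run: 512 x 512 lattice, 1,000 steps, 60 seconds of wall time.
example_rate = lups(512, 512, 1000, 60.0)
print(f"{example_rate / 1e6:.2f} megaLUPS")

# Rates quoted in the article, in LUPS.
PS3_SINGLE_SPE = 4.5e6
BLADE = 73e6
PC_REFERENCE = 3.3e6

print(f"blade vs. PC:      {speedup(BLADE, PC_REFERENCE):.1f}x")
print(f"single SPE vs. PC: {speedup(PS3_SINGLE_SPE, PC_REFERENCE):.2f}x")
```

These ratios come only from the three quoted rates, so they need not line up exactly with the seven-to-21-times range, which covers several flow geometries.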

Since most academic research groups are not overly flush with cash, the authors put these results in terms that someone holding the purse strings would understand. In terms of computing power per cost, the PS3 delivers 50,000 LUPS/dollar, the high-performance IBM QS20/QS21 runs at 3,500 LUPS/dollar, and a quad-core desktop machine is capable of putting out 17,000 LUPS/dollar. The researchers point out that LB simulations take a large amount of RAM and that, when moving to a three-dimensional simulation, the amount of RAM will become very important. Since the PS3 has only 256 MB of RAM, even moderately sized 3D grids could end up being written to and read from swap memory, which would be a significant performance bottleneck.
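A back-of-the-envelope estimate shows how quickly a 3D lattice fills 256 MB. The Python sketch below rests on several assumptions that are not in the article: a D3Q19 lattice (19 values per site), single-precision storage (4 bytes per value), two copies of the lattice for streaming, "256 MB" read as 256 MiB, and no memory set aside for the operating system or the program itself. The function names are hypothetical.

```python
def lattice_bytes(n, velocities=19, bytes_per_value=4, copies=2):
    """Memory needed for an n x n x n lattice under the stated assumptions."""
    return n ** 3 * velocities * bytes_per_value * copies


def largest_cubic_grid(ram_bytes, **kwargs):
    """Largest n such that an n x n x n lattice still fits in ram_bytes."""
    n = 0
    while lattice_bytes(n + 1, **kwargs) <= ram_bytes:
        n += 1
    return n


PS3_RAM = 256 * 2 ** 20  # 256 MiB (assumed reading of "256 MB")

print(largest_cubic_grid(PS3_RAM))                     # 120
print(largest_cubic_grid(PS3_RAM, bytes_per_value=8))  # 95 with double precision
```

Under these assumptions a cube of roughly 120 sites per side is the ceiling, and the memory actually available to a simulation would be smaller still.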

The authors conclude that, even without intrinsic support for double-precision numerics and despite its low IEEE compliance for even single-precision computations, the Cell processor is becoming a powerhouse in the next generation of scientific high-performance computing. Since the next-generation Cell processor (PowerXCell 8i) is going to have much faster double-precision computing power (it will be half the speed of single-precision arithmetic), it could be a serious choice for new computing clusters. The authors close by pointing out that the Roadrunner petaflop supercomputer being built at Los Alamos National Lab will be powered by 12,960 PowerXCell 8i processors.

Matt Ford / Matt is a contributing writer at Ars Technica, focusing on physics, astronomy, chemistry, mathematics, and engineering. When he's not writing, he works on realtime models of large-scale engineering systems.