31,000 PS3s are pushing over three times the TFLOPS of 163,000 Windows boxes, a feat that has many wondering about the hows and whys of the new console's apparent F@H dominance. Let's take a closer look at the phenomenon, starting with the numbers above.

We can begin to get a handle on what's happening by organizing the main contestants by their TFLOPS/CPU ratio. In other words, this metric will give us the average number of TFLOPS per active CPU, which yields a quick-and-dirty view of which systems are bringing the most per-processor power to the project.

Processor        TFLOPS/CPU
GPU              0.0594
Cell (PS3)       0.0159
CPU (Windows)    0.00095
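As a rough sanity check, the per-processor ratios can be multiplied back out against the rounded client counts quoted at the top of this article (31,000 PS3s and 163,000 Windows boxes). The sketch below is only back-of-the-envelope arithmetic; the dictionary and function names are illustrative and not anything taken from the F@H project.

```python
# Figures quoted in this article; the counts are rounded, so the totals are approximate.
active_cpus = {"Cell (PS3)": 31_000, "CPU (Windows)": 163_000}
tflops_per_cpu = {"Cell (PS3)": 0.0159, "CPU (Windows)": 0.00095}


def aggregate_tflops(platform):
    """Approximate total TFLOPS: active processors times average TFLOPS per processor."""
    return active_cpus[platform] * tflops_per_cpu[platform]


ps3_total = aggregate_tflops("Cell (PS3)")
windows_total = aggregate_tflops("CPU (Windows)")
ratio = ps3_total / windows_total

print(f"PS3 total:     {ps3_total:.0f} TFLOPS")
print(f"Windows total: {windows_total:.0f} TFLOPS")
print(f"PS3 / Windows: {ratio:.1f}x")
```

The totals come out to roughly 493 and 155 TFLOPS, a ratio of about 3.2, which squares with the "over three times" figure.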

Clearly, the PS3 owns the Windows machines in terms of TFLOPS/CPU, but the GPU has even the PS3 outclassed. There are two main reasons behind the relative TFLOPS/CPU rankings here, both of which become apparent after a read through the F@H documentation.

First, most Ars readers are aware that, as a general rule, the TFLOPS metric is a pretty poor way to judge performance. Different instruction set architectures (ISAs) will render the same calculation using different numbers of floating-point operations, sometimes skewing the results significantly when you compare TFLOPS ratings across ISAs. So the TFLOPS ratings given by F@H are fun, but the project itself acknowledges that there are major problems with this metric. In fact, the F@H group's initial numbers for the PS3's TFLOPS rating were wrong, and were revised downward by 50% last night.

Even if we give the TFLOPS rating the benefit of the doubt, and assume that the TFLOPS/CPU ranking gets us somewhere in the ballpark of what we're looking for, there's another major factor to consider: not all work units (WUs) are created equal. The F@H PS3 FAQ introduces the issue fairly clearly:

What type of calculations the PS3 client is capable of running? The PS3 right now runs what are called implicit solvation calculations, including some simple ones (sigmodal dependent dielectric) and some more sophisticated ones (AGBNP, a type of Generalized Born method from Prof. Ron Levy's group at Rutgers). In this respect, the PS3 client is much like our GPU client. However, the PS3 client is more flexible, in that it can also run explicit solvent calculations as well, although not at the same speed increase relative to PC's. We are working to increase the speed of explicit solvent on the PS3 and would then run these calculations on the PS3 as well. In a nutshell, the PS3 takes the middle ground between GPU's (extreme speed, but at limited types of WU's) and CPU's (less speed, but more flexibility in types of WU's) [emphasis added].

I'll also point you to another section of the FAQ:

The GPU client is still the fastest, but it is the least flexible and can only run a very, very limited set of WU's. Thus, its points are not linearly proportional to the speed increase. The PS3 takes the middle ground between GPU's (extreme speed, but at limited types of WU's) and CPU's (less speed, but more flexibility in types of WU's). We have picked the PS3 as the natural benchmark machine for PS3 calculations and set its points per day to 900 to reflect this middle ground between speed (faster than CPU, but slower than GPU) and flexibility (more flexible than GPU, less than CPU).

As the last line in each quote above indicates, the PS3 sits halfway between the GPU and the general-purpose CPU in terms of the flexibility vs. performance tradeoff. So the relative positions in the TFLOPS/CPU list given earlier are about what we'd expect: the GPU is extremely good at the limited number of WU types that it can do, the PS3 is very good at a slightly larger number of types, and the general-purpose CPU turns in a range of performance numbers across all the types of WUs, which averages out to a result that puts it at the bottom of the pack.

Here's a simple analogy to illustrate the logic behind the rankings. The GRE exam has three sections: verbal reasoning, quantitative reasoning, and analytical writing. Let's say that three different students took the exam, and they were then ranked relative to each other based on the average of their three scores.

Because student G is a mathematical genius and an autistic savant who can't do anything but math, he was asked to take only the quantitative exam, an exam that he's insanely good at. So his GRE average reflects his performance on that single exam. Student P excels at both math and vocabulary, so he took only the quantitative and the verbal reasoning exams. Thus P's GRE average reflects the average of only these two exams. Student C is good at vocabulary, average at math, and a solid writer. He took all three exams, and all three scores contributed to his GRE average.

If we were to rank the average scores of all three students, student G would outclass the other two by a wide margin, student P would come in second place, and student C would be stuck at a far, far distant third place. This is because G (the GPU in the analogy) took only the test that he was insanely good at, P (the PS3) took the two tests at which he excelled, and C (the general-purpose CPU) had to take all three tests.
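The same logic can be written out in a few lines. Note that the scores below are hypothetical percentile values invented purely for illustration; they are not real GRE results, and the names are made up for this sketch. The point is only that each student's average is taken over the sections he actually sat.

```python
from statistics import mean

# Hypothetical percentile scores (0-100), invented for illustration only.
# None means the student did not sit that section.
scores = {
    "G (GPU)": {"quantitative": 99, "verbal": None, "writing": None},
    "P (PS3)": {"quantitative": 90, "verbal": 88, "writing": None},
    "C (CPU)": {"quantitative": 50, "verbal": 75, "writing": 70},
}


def average_of_sections_taken(section_scores):
    """Average only over the sections the student actually took."""
    taken = [s for s in section_scores.values() if s is not None]
    return mean(taken)


averages = {name: average_of_sections_taken(s) for name, s in scores.items()}

# Rank students from highest to lowest average.
ranking = sorted(averages, key=averages.get, reverse=True)

for name in ranking:
    print(f"{name}: {averages[name]:.1f}")
```

With these made-up numbers the ranking is G, then P, then C, even though C is the only one who was measured on everything.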

Ultimately, the TFLOPS/CPU rankings given above align pretty much exactly with the degree of specialization of each type of processor. The GPU is far and away the most specialized of the three, so it sits comfortably atop the rankings. The PS3 has a lower degree of specialization than the GPU, but a significantly higher degree than the general-purpose CPU. Indeed, you could almost use each processor's TFLOPS/CPU score as a sort of "degree of hardware specialization" rating.
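To put rough numbers on that idea, here's a small sketch that expresses each processor's TFLOPS/CPU figure as a multiple of the Windows CPU average. Treating the Windows figure as the baseline is my own choice for illustration, not something F@H does, and the variable names are likewise made up.

```python
# TFLOPS/CPU figures from the table earlier in this article.
tflops_per_cpu = {"GPU": 0.0594, "Cell (PS3)": 0.0159, "CPU (Windows)": 0.00095}

# Assumption for this sketch: use the general-purpose CPU as the baseline.
baseline = tflops_per_cpu["CPU (Windows)"]
relative = {name: value / baseline for name, value in tflops_per_cpu.items()}

# Print from most to least specialized, by this crude measure.
for name, factor in sorted(relative.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {factor:5.1f}x the Windows CPU average")
```

By this crude measure the GPU comes out at a bit over 60 times the Windows average and the PS3 at a bit under 17 times, subject to all of the caveats about the TFLOPS metric discussed above.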

The final thing that's worth noting is that the pool of CPUs that make up the "Windows" portion of the client list varies widely, from older Pentium 4 models to brand new Core 2 Duos and everything in between. The GPUs are much more uniform in terms of hardware types (all ATI), and the PS3 is the most uniform of them all. So the PS3-to-PC comparison isn't just apples-to-oranges. It's more like apples-to-citrus.

The fact that the metrics by which the rankings are decided are totally stacked against the general-purpose CPU shouldn't necessarily detract from the PS3's feat. There's a reason why IBM is pushing blades based on the same Cell processor that powers the PS3 in the high-performance computing (HPC) market: Cell can offer dramatic speedups vs. a general-purpose CPU on certain types of "embarrassingly parallel" workloads. One such "embarrassingly parallel" workload happens to be the F@H client, at least when it's running certain types of work units.