Archives

In my 2007-05-23 blog item, I asked, "When will they be published?" I also posed my questions by email to John Buscemi in IBM Media Relations. I did not receive a direct response, but corrections to the press release have since been made, apparently as an indirect response to my inquiry.

"They" are the POWER6 SPEC results. The tests were conducted in May, but not published until June.

In particular, POWER6 is now the fastest microprocessor in the world on SPECfp_base2006, delivering a speedup of 1.08 when compared with the Dual-Core Intel Itanium 2 9050.

I thought it might be fun to do a quantitative comparison to a Sun Blade 150 Workstation with one 550 MHz UltraSPARC IIi microprocessor and 512 MB of main memory (no longer orderable). Using a 24041-test-case workload on the Sun Blade 150 Workstation and a 61390-test-case workload on the Sun Fire V210 Server, the cycles/second speedup of FSS was 1.98. But a clock frequency of 1.34 GHz is 2.44 times a clock frequency of 550 MHz. Furthermore, the UltraSPARC IIIi microprocessor employs a superior microarchitecture compared with the UltraSPARC IIi (e.g. much larger L1 and L2 caches). Clearly, other factors limited the overall speedup.

It is noteworthy that, using a 24041-test-case workload on both machines, the speedup was 2.46. Here, I am comparing the fastest FSS performance ever observed on each computer.
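As a quick check of the arithmetic, the two FSS speedups quoted above can be set against the clock frequency quotient. This is only a sketch: the numbers come from the text, while the variable names and the "scaling fraction" label are assumptions introduced here for illustration.

```python
# Values quoted in the text; names are illustrative, not from FSS itself.
clock_blade150_hz = 550e6      # UltraSPARC IIi in the Sun Blade 150
clock_v210_hz = 1.34e9         # UltraSPARC IIIi clock quoted above
speedup_mixed_workloads = 1.98 # 24041 vs. 61390 test cases
speedup_same_workload = 2.46   # 24041 test cases on both machines

# How much faster the clock alone is.
clock_quotient = clock_v210_hz / clock_blade150_hz

# Fraction of the clock quotient that each observed speedup achieves.
fraction_mixed = speedup_mixed_workloads / clock_quotient
fraction_same = speedup_same_workload / clock_quotient

print(f"clock quotient:            {clock_quotient:.2f}")
print(f"mixed workloads, fraction: {fraction_mixed:.2f}")
print(f"same workload, fraction:   {fraction_same:.2f}")
```

A fraction below 1 means the speedup fell short of what the clock alone would suggest; a fraction at or slightly above 1 means the speedup kept pace with the clock.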

That's enough about logic simulation. A passion of mine is computer arithmetic, particularly floating point. Using a floating point benchmark I wrote that predates Cosmic Horizon, I made some FLoating point Operations Per Second (FLOPS) measurements. The C++ benchmark, called FLOPS3, performs a large number of operations on operands of the long double fundamental type. The output from each of the two Sun products I have been comparing follows.

Sun Blade 150 Workstation:
1.91797e+06 FLOPS
sizeof(long double) = 16

Sun Fire V210 Server:
3.73043e+06 FLOPS
sizeof(long double) = 16

Here, we have a speedup of 1.94, which is less than the clock frequency quotient of 2.44.
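The same comparison can be expressed per clock cycle. The sketch below takes the two FLOPS3 figures printed above and assumes the clock frequencies quoted earlier (550 MHz and 1.34 GHz) apply to these runs; the variable names are hypothetical and are not part of FLOPS3.

```python
# FLOPS3 output quoted in the text.
flops_blade150 = 1.91797e6
flops_v210 = 3.73043e6

# Assumption: clock frequencies are the ones quoted earlier in the text.
clock_blade150_hz = 550e6
clock_v210_hz = 1.34e9

# Overall speedup in long double throughput.
flops_speedup = flops_v210 / flops_blade150

# Long double operations completed per clock cycle on each machine.
flops_per_cycle_blade150 = flops_blade150 / clock_blade150_hz
flops_per_cycle_v210 = flops_v210 / clock_v210_hz

print(f"FLOPS speedup:         {flops_speedup:.2f}")
print(f"per cycle (Blade 150): {flops_per_cycle_blade150:.5f}")
print(f"per cycle (V210):      {flops_per_cycle_v210:.5f}")
```

Under these assumptions the faster machine completes fewer long double operations per cycle, which is another way of saying that the speedup trails the clock frequency quotient.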

Some new SPEC CPU2006 results were published last month that might cause some of you to take another look at SPARC. Whether we call it Fujitsu SPARC Enterprise M8000 or Sun SPARC Enterprise M8000, its SPECfp_base2006 result of 11.1 puts SPARC back in the vicinity of PowerPC (e.g. POWER5+). The SPARC Enterprise M8000 server showcases Fujitsu's SPARC64 VI microprocessor, an implementation of the SPARC-V9 architecture. This result shows SPARC64 VI as 2.75 times faster than UltraSPARC IIIi.
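For readers who want the baseline behind the 2.75 figure, it can be back-computed from the two numbers in the paragraph above. The result is an implied value derived here by division, not a published SPEC result, and the variable names are illustrative.

```python
# Figures quoted in the text.
specfp_base2006_m8000 = 11.1   # SPARC Enterprise M8000 (SPARC64 VI)
speedup_over_iiii = 2.75       # stated ratio versus UltraSPARC IIIi

# Implied UltraSPARC IIIi score: derived by division, not a published number.
implied_specfp_iiii = specfp_base2006_m8000 / speedup_over_iiii

print(f"implied UltraSPARC IIIi SPECfp_base2006: {implied_specfp_iiii:.2f}")
```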

Take a look at the throughput performance vs. single thread performance graph here. I really like that kind of graph and I want to see more progress to the right (e.g. Rock). Computer manufacturers tend to talk more about throughput when their single thread performance lags the competition. Single thread performance is important too. Not all applications are, or can be, massively multithreaded. And sometimes, what we care most about is how long we're going to have to wait for the computer to complete a single task.

It's good to see this kind of success with SPARC. Here's a measure of architectural efficiency that I devised: 1/(R&D dollars * execution time). I have a hunch that the SPARC-V9 architecture, because of the relative ease of implementing microprocessor designs based on it and because of performance-enhancing architectural features (e.g. register windows), outperforms the legacy IA-32 architecture (with its expansion to support 64 bits) on that efficiency metric. SPARC R&D is money well spent.
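To make the efficiency metric concrete, here is a minimal sketch of 1/(R&D dollars * execution time) as a function. The function name, the units (dollars and seconds), and both sets of inputs are hypothetical and chosen only for illustration; no real R&D budgets or run times are implied.

```python
def architectural_efficiency(rd_dollars: float, execution_time_s: float) -> float:
    """Return 1 / (R&D dollars * execution time); larger is better."""
    if rd_dollars <= 0 or execution_time_s <= 0:
        raise ValueError("both inputs must be positive")
    return 1.0 / (rd_dollars * execution_time_s)


# Purely hypothetical inputs, not measurements of any real architecture.
efficiency_a = architectural_efficiency(rd_dollars=1.0e9, execution_time_s=100.0)
efficiency_b = architectural_efficiency(rd_dollars=4.0e9, execution_time_s=50.0)

# B runs twice as fast but costs four times as much, so A scores higher.
print(efficiency_a > efficiency_b)
```

Because the two inputs are multiplied, the metric rewards a design that halves execution time only if it does so for less than double the R&D spend.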