Microwulf: Performance

Supercomputer performance is typically measured in flops --
the number of floating point operations the
supercomputer can perform each second.
Early supercomputer performance was measured in
megaflops (Mflops: 10^6 flops).
Hardware advances increased subsequent supercomputers'
performance to gigaflops (Gflops: 10^9 flops).
Today's massively parallel supercomputers
are measured in teraflops (Tflops: 10^12 flops),
and tomorrow's systems will be measured in petaflops
(Pflops: 10^15 flops).

When discussing supercomputer performance, you must also
distinguish between

peak performance -- the theoretical maximum
performance a given computer could possibly achieve; and

measured performance -- the maximum performance
a given computer actually achieves on a benchmark
or other performance-measurement program.

Computer manufacturers often list a computer's performance using
its peak performance, resulting in inflated performance claims.
In actual usage, you are doing well if your
computer's measured performance is 50-60% of its peak performance.
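
The relationship between peak and measured performance can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual rule of thumb that peak performance is the product of node count, cores per node, clock rate, and floating point operations per core per cycle. The function names and every hardware number below are hypothetical; they do not describe Microwulf.

```python
def peak_gflops(nodes: int, cores_per_node: int,
                clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in Gflops, assuming every core completes its
    maximum number of floating point operations on every clock cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak: float) -> float:
    """Fraction of the theoretical peak that a benchmark actually achieved."""
    return measured_gflops / peak


# Hypothetical cluster (illustrative values only, not Microwulf's hardware).
peak = peak_gflops(nodes=8, cores_per_node=4, clock_ghz=2.5, flops_per_cycle=4)
measured = 176.0  # hypothetical benchmark result, in Gflops

print(f"peak = {peak:.1f} Gflops")
print(f"efficiency = {efficiency(measured, peak):.0%}")
```

With these made-up numbers the efficiency falls inside the 50-60% range described above, which is why a vendor's peak figure alone says little about what a machine will deliver.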

One final factor in measuring performance is the precision
of the floating point operations.
Most high performance computations use double-precision
operations.
These can be much more time-consuming than single-precision operations,
so be careful not to compare a single-precision result against a
double-precision one -- if you do, you're comparing apples to oranges.

The standard benchmark for measuring supercomputer performance
(the one used by the top500.org supercomputer list)
is High-Performance Linpack (HPL),
a program that exercises and reports a supercomputer's
double-precision floating point performance.
To install and run HPL, you must first install a version of the
Basic Linear Algebra Subprograms (BLAS) libraries,
since HPL depends on them.

In March 2007, we benchmarked Microwulf using HPL and
Goto BLAS.
After compiling and installing each package,
we ran the standard, double-precision version of HPL,
varying its parameter values as follows:
PxQ over {1x8, 2x4};
NB over {100, 120, 140, 160, 180, 200};
and N over increasing values, starting with 1,000.
For the following parameter values:

PxQ = 2x4; NB = 160; N = 30,000

HPL reported 26.25 Gflops on its WR00R2R4 operation.
Microwulf also exceeded 26 Gflops on other operations,
but 26.25 Gflops was our maximum.
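
The parameter sweep described above can be enumerated with a short Python sketch. The process grids and NB values come from the text. The list of N values is an assumption, since the text gives only the starting value (1,000) and the best-performing value (30,000). The helper names are hypothetical, and the memory estimate assumes HPL's dense N x N matrix of 8-byte double-precision values, ignoring workspace.

```python
from itertools import product

GRIDS = [(1, 8), (2, 4)]                       # P x Q process grids (from the text)
BLOCK_SIZES = [100, 120, 140, 160, 180, 200]   # NB values (from the text)
PROBLEM_SIZES = [1_000, 10_000, 30_000]        # N values (assumed sample)


def sweep(grids, block_sizes, problem_sizes):
    """Yield one parameter set per HPL run in the sweep."""
    for n, nb, (p, q) in product(problem_sizes, block_sizes, grids):
        yield {"N": n, "NB": nb, "P": p, "Q": q}


def matrix_gib(n: int) -> float:
    """Approximate size in GiB of an n x n matrix of 8-byte doubles."""
    return 8 * n * n / 2**30


runs = list(sweep(GRIDS, BLOCK_SIZES, PROBLEM_SIZES))
print(f"{len(runs)} runs in the sweep")
print(f"N = 30,000 needs roughly {matrix_gib(30_000):.1f} GiB for the matrix")
```

Both grids use 8 processes, so the sweep changes only how those processes are arranged, not how many there are. The memory estimate shows why N cannot grow without bound: the matrix must fit in the cluster's combined RAM.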

This is significant computational power.
For example, according to the
top500 list,
a 1996 Cray T3D-256 provided just 25.3 Gflops
of measured performance.
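
To put these figures in perspective, the following Python sketch estimates how much arithmetic the N = 30,000 run involved and how the result compares with the Cray figure. It assumes the operation count commonly quoted for HPL, 2/3 N^3 + 3/2 N^2; treat the low-order term as an assumption. The run time it derives is implied by the reported rate, not a measurement from the benchmark, and the function names are hypothetical.

```python
def hpl_flop_count(n: int) -> float:
    """Approximate floating point operations to solve a dense n x n system,
    using the count commonly quoted for HPL (assumed here)."""
    return (2.0 / 3.0) * n**3 + 1.5 * n**2


def implied_seconds(n: int, gflops: float) -> float:
    """Wall-clock time implied by a problem size and a sustained Gflops rate."""
    return hpl_flop_count(n) / (gflops * 1e9)


MICROWULF_GFLOPS = 26.25   # measured result reported in the text
CRAY_T3D_GFLOPS = 25.3     # figure cited in the text from the top500 list

seconds = implied_seconds(30_000, MICROWULF_GFLOPS)
ratio = MICROWULF_GFLOPS / CRAY_T3D_GFLOPS

print(f"operations: {hpl_flop_count(30_000):.3e}")
print(f"implied run time: {seconds / 60:.1f} minutes")
print(f"Microwulf / Cray T3D-256: {ratio:.3f}")
```

Under these assumptions the run performs about 1.8 x 10^13 operations, which at 26.25 Gflops corresponds to a run time on the order of eleven minutes.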