pi_css5

Calculating π... FAST!

pi_css5 was written by Takuya Ooura
and is copyrighted by him, but may be freely distributed and
redistributed. In addition to calculating lots of digits of
π fast (although other programs are faster), it is extremely
portable, and thus an interesting
benchmark for different compilers and hardware, or even
languages, since I've now converted it into C# and Java.
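For anyone who wants to use it as a benchmark, here is a minimal timing sketch in Python. It is not part of pi_css5; the helper name time_command is made up for illustration, and it simply measures wall-clock time for whatever command you hand it. The example at the bottom times the Python interpreter starting up, only so the block runs anywhere.

```python
import subprocess
import sys
import time


def time_command(argv, runs=3, stdin_text=None):
    """Run argv `runs` times; return each run's wall-clock time in seconds.

    time_command is a hypothetical helper, not something shipped with pi_css5.
    stdin_text, if given, is fed to the program on standard input.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            input=stdin_text,
            text=True,
            stdout=subprocess.DEVNULL,  # discard output so printing is not timed by the terminal
            check=True,                 # raise if the program exits with an error
        )
        timings.append(time.perf_counter() - start)
    return timings


# Example: time the Python interpreter itself starting up and exiting.
startup = time_command([sys.executable, "-c", "pass"], runs=2)
print("best of %d runs: %.3f s" % (len(startup), min(startup)))
```

To compare builds, pass the path of each downloaded binary as argv and compare the best time of several runs. How a given binary expects to be told the number of digits is not covered here, so supply that through argv or stdin_text as appropriate.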

5/1/06 Updated the Windows binaries with an
SSE2 version as well. That one's about 20% faster than the
generic one (but requires a Pentium 4/Athlon 64 or newer).

3/18/06 All the binaries have been updated.
Most should be fast, and slightly more user friendly (don't
have to run them from the terminal). In particular the MacOSX
version runs on any Mac running MacOSX 10.1 or newer, and is
optimized for G4, G5 and Apple's new Intel machines! I've
no access to Windows right now, so there's no optimized
Windows version (the generic one will work fine). For Linux on
x86 machines with SSE2 (read: Pentium 4/Athlon 64 or newer),
things should be a lot faster.

Results

Comparative performance of pi_css5 and other programs is
here in chart form.

Apple MacOS: Requires MacOS 7.5 or
later. The Carbon version is for MacOS 8.6 or later. The FAT
version supports 68K Macs but requires a floating point
unit. The programs must be copied out of the disk image
before they can be used. pi_css5_fatmacos.smi.bin

Java: requires JDK 1.2 or better to
compile, and JRE 1.4 or compatible to run the included
bytecode. This code is a manual translation from the C code.
pi_cs5_java.tgz
(~53k)

C#.NET: can be compiled with any C#
compiler. Should be compatible with any runtime. This code is
a manual translation from the Java code. pi_css5_csharp.tgz
(~28k)

Compilation Notes

Microsoft Windows:
pi_css5_sse2 was built with Intel C++ 9.0 using -xN
optimizations. Generic version was built with mingw32 and GCC
4.1, using -O3 -funroll-loops -fomit-frame-pointer
-mcpu=i686.

Apple MacOS X: Universal binary. Intel
version built with Intel C++ 9.1 using -fast optimizations.
PowerPC version built with GCC 4.0.1, using -O3
-funroll-loops -fomit-frame-pointer -ffast-math
-fprefetch-loop-arrays optimizations for all targets.
Additionally the G4 version uses -mcpu=G4 -maltivec
-faltivec and the G5 version uses -mcpu=G5 -fast.

Apple MacOS (Classic): pi_css5 (FAT)
was built with MrC 5.0 and Symantec C 8.9 using MPW 3.5 and
all optimizations enabled. pi_css5 (Carbon) was built with
Metrowerks Codewarrior Pro 8.2 and level 4
optimizations.

Linux on x86: pi_css5.sse2 was built
with Intel C++ 9.0 using -xN optimizations, and so requires
a machine with SSE2 support (Pentium 4, Athlon 64 or
newer). Generic version built with GCC 4.2 using the -O3
-funroll-loops -ffast-math -fomit-frame-pointer -mcpu=i686
and -static flags. Both built with dietlibc to reduce
size.
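If you're not sure whether a Linux x86 machine has SSE2, the kernel lists CPU features on the "flags" line of /proc/cpuinfo. The sketch below (my own illustration, with a made-up helper name has_sse2 and made-up sample text) parses that kind of text and reports whether sse2 is among the flags. It assumes the usual x86 layout of /proc/cpuinfo; other architectures label things differently.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text (first CPU only)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def has_sse2(cpuinfo_text):
    """True if the CPU advertises SSE2, i.e. the pi_css5.sse2 build should run."""
    return "sse2" in cpu_flags(cpuinfo_text)


# Made-up sample text in the usual x86 /proc/cpuinfo layout.
sample = "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu vme mmx sse sse2\n"
print(has_sse2(sample))
```

On a real machine, pass in the contents of /proc/cpuinfo, for example has_sse2(open("/proc/cpuinfo").read()). If it comes back False, use the generic version instead.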

Linux on PowerPC: Built with GCC 3.4.4
using the -O3 -funroll-loops -fomit-frame-pointer -mcpu=G3
and -static flags.

Linux on Alpha: Built with GCC 3.4.3
using the -O3 -funroll-loops -fomit-frame-pointer
-mcpu=ev67 and -static flags.

Linux on IA64: Built with Intel C++ 9.0
using the -O2 -static flags.

HP-UX: Built with HP aC++ A.03.055 using
-fast +Odataprefetch and -Wl,-aarchive and +DA1.1 or +DA2.0
for the PA versions.