(→Some simple operations on large bignums: If you want to benchmark Gambit, it helps not to use a version of gcc that is known not to optimize Gambit-generated code well)

(2 intermediate revisions not shown)

Line 9:

Line 9:

==Some simple operations on large bignums==

==Some simple operations on large bignums==

-

In a talk Paul Zimmerman defined <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> to be <tt>(expt 3 2095903)</tt>, <tt>(expt 7 1183294)</tt>, and <tt>(expt 11 1920505)</tt>, respectively, (presumably because <tt>a</tt> and <tt>b</tt> have roughly a million decimal digits, and <tt>c</tt> has roughly 2 million digits) and timed <tt>(* a b)</tt>, <tt>(quotient c a)</tt>, and <tt>(integer-sqrt c)</tt>, and <tt>(gcd a b)</tt>. With

+

In a talk Paul Zimmermann defined <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> to be <tt>(expt 3 2095903)</tt>, <tt>(expt 7 1183294)</tt>, and <tt>(expt 11 1920505)</tt>, respectively, (presumably because <tt>a</tt> and <tt>b</tt> have roughly a million decimal digits, and <tt>c</tt> has roughly 2 million digits) and timed <tt>(* a b)</tt>, <tt>(quotient c a)</tt>, and <tt>(integer-sqrt c)</tt>, and <tt>(gcd a b)</tt>. With

<pre>

<pre>

heine:~/Desktop> gsi -v

heine:~/Desktop> gsi -v

Line 18:

Line 18:

Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz

Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz

</pre>

</pre>

-

on Ubuntu 9.04 with gcc-4.3.3, we find the times

+

on Ubuntu 9.04 with FSF gcc-4.2.4 (and adding back -fmove-loop-invariants), we find the times

<pre>

<pre>

> (define d (time (* a b)))

> (define d (time (* a b)))

(time (* a b))

(time (* a b))

-

183 ms real time

+

157 ms real time

-

180 ms cpu time (172 user, 8 system)

+

156 ms cpu time (144 user, 12 system)

3 collections accounting for 2 ms real time (0 user, 0 system)

3 collections accounting for 2 ms real time (0 user, 0 system)

26078656 bytes allocated

26078656 bytes allocated

-

4313 minor faults

+

4314 minor faults

no major faults

no major faults

> (define e (time (quotient c a)))

> (define e (time (quotient c a)))

(time (quotient c a))

(time (quotient c a))

-

808 ms real time

+

710 ms real time

-

808 ms cpu time (756 user, 52 system)

+

704 ms cpu time (668 user, 36 system)

-

9 collections accounting for 11 ms real time (4 user, 8 system)

+

9 collections accounting for 11 ms real time (8 user, 0 system)

130659840 bytes allocated

130659840 bytes allocated

-

19657 minor faults

+

19660 minor faults

no major faults

no major faults

> (define f (time (integer-sqrt c)))

> (define f (time (integer-sqrt c)))

(time (integer-sqrt c))

(time (integer-sqrt c))

-

805 ms real time

+

698 ms real time

-

804 ms cpu time (796 user, 8 system)

+

692 ms cpu time (656 user, 36 system)

-

21 collections accounting for 12 ms real time (12 user, 4 system)

+

21 collections accounting for 12 ms real time (4 user, 8 system)

-

160630136 bytes allocated

+

160630392 bytes allocated

-

6131 minor faults

+

6332 minor faults

no major faults

no major faults

> (define g (time (gcd a b)))

> (define g (time (gcd a b)))

(time (gcd a b))

(time (gcd a b))

-

11353 ms real time

+

10167 ms real time

-

11353 ms cpu time (11325 user, 28 system)

+

10137 ms cpu time (10105 user, 32 system)

-

810 collections accounting for 304 ms real time (284 user, 4 system)

+

810 collections accounting for 302 ms real time (320 user, 8 system)

5055362984 bytes allocated

5055362984 bytes allocated

-

10674 minor faults

+

10450 minor faults

no major faults

no major faults

</pre>

</pre>

-

With gmp-4.3.1 these same operations take 20ms, 88ms, 68ms, and 524ms, respectively.

+

With GMP 4.2.4 (compiled with the default system compiler gcc-4.3.3), these same operations take 44ms, 220ms, 168ms, and 6052ms, respectively; with GMP 4.3.1 they take 20ms, 88ms, 68ms, and 524ms, respectively.

When <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> are replaced by <tt>(expt 3 20959032)</tt>, <tt>(expt 7 11832946)</tt>, and

When <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> are replaced by <tt>(expt 3 20959032)</tt>, <tt>(expt 7 11832946)</tt>, and

<tt>(expt 11 19205051)</tt>, respectively, so <tt>a</tt> and <tt>b</tt> have about 10,000,000 decimal digits and <tt>c</tt> has about 20 million decimal digits, then the Gambit times are

<tt>(expt 11 19205051)</tt>, respectively, so <tt>a</tt> and <tt>b</tt> have about 10,000,000 decimal digits and <tt>c</tt> has about 20 million decimal digits, then the Gambit times are

and the GMP 4.3.1 times are 26ms, 1544ms, 1268ms, and 9992ms, respectively.

+

The GMP 4.2.4 times are 680ms, 3852ms, 3180ms, and 1008696ms, respectively; the GMP 4.3.1 times are 260ms, 1544ms, 1268ms, and 9992ms, respectively.

So, after the release of GMP 4.3.0 in April, 2009, GMP is decisively faster than Gambit (and Gambit is no longer a reproof to the permanent claim on GMP's home page that GMP is the "fastest bignum library on the planet!"). Some of the changes that went into GMP 4.3.0 are

So, after the release of GMP 4.3.0 in April, 2009, GMP is decisively faster than Gambit (and Gambit is no longer a reproof to the permanent claim on GMP's home page that GMP is the "fastest bignum library on the planet!"). Some of the changes that went into GMP 4.3.0 are

Latest revision as of 18:03, 20 June 2009

Gambit benchmarks

Marc Feeley has put together a collection of benchmarks, called the Gambit benchmarks. The performance of Gambit is compared to a number of other Scheme implementations on that page.

The "Programming Language Shootout" benchmarks

The Computer Language Benchmarks Game compares various computer languages by benchmarking similar algorithms to a number of small problems in a number of language implementations. Here are a number of Gambit implementations of the programs in the programming language shootout. Please feel free to propose improvements before they're submitted.

Some simple operations on large bignums

In a talk Paul Zimmermann defined a, b, and c to be (expt 3 2095903), (expt 7 1183294), and (expt 11 1920505), respectively, (presumably because a and b have roughly a million decimal digits, and c has roughly 2 million digits) and timed (* a b), (quotient c a), and (integer-sqrt c), and (gcd a b). With

With GMP 4.2.4 (compiled with the default system compiler gcc-4.3.3), these same operations take 44ms, 220ms, 168ms, and 6052ms, respectively; with GMP 4.3.1 they take 20ms, 88ms, 68ms, and 524ms, respectively.

When a, b, and c are replaced by (expt 3 20959032), (expt 7 11832946), and
(expt 11 19205051), respectively, so a and b have about 10,000,000 decimal digits and c has about 20 million decimal digits, then the Gambit times are

The GMP 4.2.4 times are 680ms, 3852ms, 3180ms, and 1008696ms, respectively; the GMP 4.3.1 times are 260ms, 1544ms, 1268ms, and 9992ms, respectively.

So, after the release of GMP 4.3.0 in April, 2009, GMP is decisively faster than Gambit (and Gambit is no longer a reproof to the permanent claim on GMP's home page that GMP is the "fastest bignum library on the planet!"). Some of the changes that went into GMP 4.3.0 are

Several new algorithms for division and mod by single limbs, giving many-fold speedups.

Improved nth root computations.

The mpz_nextprime function uses sieving and is much faster.

Countless minor tweaks.

Niels Möller introduced the new gcd algorithm; it was based on an algorithm of Schönhage that was never published but was reverse engineered by Brad Lucier for Gambit (after some hints from Schönhage in e-mail) in January 2004. In 2005 Lucier told Möller (who was working on a Schönhage-type gcd algorithm for GMP) about Schönhage's algorithm, and Möller integrated it into GMP's HGCD framework for gcd calculations (and got two publications out of it, too!). Now that it is actually part of released GMP, GMP's bignum code beats Gambit's bignum code for all relevant bignum operations.