Note that examples and benchmarks made before september 2009 on this page use
the older, less general, definition, with F(0) = 1.
September 2009 UPDATE:
UPDATE IN PROGRESS: Five years since I wrote this page, I've decided to rewrite
all examples to be more useful, the subroutines will be as general as
possible (no more 2x2 matrix multiplication), all code will be commented,
each example will now take its arguments from command line (where possible)
April 2014 UPDATE:
This effort was abandoned somewhere in 2010.

ALGORITHMS AND BENCHMARKS SUMMARY

1. EXPONENTIAL COMPLEXITY

1A - naive binary recursionexecution time: T(n)=to*φn

asmjavaC

to=2.8 nsto=3.0 nsto=3.1 ns

2. LINEAR COMPLEXITY

1B - cached binary recursion / memoization

pending 2009/2010 update

2A - cached linear recursion / infinite lazy-evaluated list

pending 2009/2010 update

2B - linear recursion with accumulator

pending 2009/2010 update

2C - imperative loop with mutable variables

pending 2009/2010 update

3. LOGARITHMIC COMPLEXITY

3A - matrix multiplication

pending 2009/2010 update

3B - fast recursion

pending 2009/2010 update

3C - Binet's formula with rounding

pending 2009/2010 update

EXAMPLES

Languages

Algorithms

1A

1B

2A

2B

2C

3A

3B

3C

concatenative languages
(program is a composition of functions applied to stack of arguments)

Please remember that these benchmarks do not show strength or weakness of any
programming language. All they show is how efficient specific language
translators are at implementing the featured algorithms. It is currently
(as of 2009) in the transitional stage from OLD to NEW benchmarks using modern
hardware and better written examples.

Benchmarked on Intel Pentium Core i7 2.66GHz running Linux 2.6.31 with 12GB RAM, in x86_64 mode
OLD benchmarks: Intel Pentium 4 2.20GHz running Linux 2.5.75 with 640M RAMNEW RAW BENCHMARK DATA available if you want to see them, as well as
OLD RAW BENCHMARK DATA.

The number of additions grows exponentially with n, and each recursive call
requires memorization of state (since neither call is tail-recursive), which
makes space complexity also exponential.
This algorithm becomes uselessly slow for F(n) with n>50 and all measured
results easily fit in 64-bit integer data type. With the duration of
the integer arithmetic operations independent of n, the time to execute these
programs is almost perfectly t0φn,
which is why the plots of the logarithm of execution time vs. n are
parallel straight lines.
This algorithm was implemented directly in all languages featured on this web
page, since all of them allow recursion. Only Fortran did not allow recursion
in the past, but it is supported since the 1990 standard.
Curiously, Java outperforms C++ on this test, but it's one of the very few
artifical tests
where it does so

Language

Translator [*]

T0

T(46), seconds

Assembly

GNU as 2.19.1 [c]

2.8 ns

11.63

log(Execution time, seconds) vs. Fibonacci number calculated

Java

Sun Java SE 1.6.0_16-b01 / HotSpot 64-Bit VM 14.2-b01 [b]

3.0 ns

12.34

C

GNU gcc 4.4.1 [c]

3.1 ns

12.57

C++

GNU g++ 4.4.1 [c]

3.1 ns

12.61

OCaml

INRIA OCaml 3.11.1 [c]

3.4 ns

13.98

Fortran

GNU gfortran 4.4.1 [i]

4.3 ns

17.62

Forth

GNU gforth 0.7.0 [i]

15.3 ns

61.91

Scheme

MIT Scheme 7.7.90 20090107 [b]

22.6 ns

98.04

Haskell

GHC 6.8.2 [c]

31.3 ns

129.1

Joy

Joy1 by Manfred von Thun, 07-May-03 [i]

199 ns

813.5

PostScript

GPL Ghostscript 8.70 [i]

284 ns

1147

awk

GNU Awk 3.1.7 [i]

417 ns

1729

MUF

FuzzBall MUCK 6.09 [b]

716 ns

2939

perl

perl 5.8.8 [i]

755 ns

3101

J

Jsoftware J64 6.02 [i]

1.08 μs

4386

Hope

Hope by Ross Paterson, 08-Dec-03 [i]

1.52 μs

6035

Prolog

SWI Prolog 5.6.64 [b]

2.17 μs

8900

Tcl

Tcl 8.5.7 [i]

3.37 μs

13700

sh

bash 4.0.33(2) [i]

101 μs

415000

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1) except for MUF, where an internal time function was called from the program.
T(46) was measured directly only when it was less than 2000 seconds, otherwise it was extrapolated from execution times for smaller inputs.
In all cases I tried to pick the fastest one of the language translators that are readily available and
offer unlimited precision integer arithmetics (it becomes important in further tests).

ALGORITHM 1B: CACHED BINARY RECURSION / MEMOIZATION

If n equals 0, return 0

If n equals 1, return 1

Read F(n-1) from cache or recursively calculate if not cached yet

Read F(n-2) from cache or recursively calculate if not cached yet

Return the sum of the results from steps 2 and 3 and save it in the cache
Complexity (unlimited precision): O(n2) speed, O(exp n) space

The common optimization technique applied to binary recursions is memoization:
for any given set of arguments, the recursive function is calculated only once,
and the result is stored in memory. Further calls to the function with the same
arguments return the previously calculated result.
Since no function result is calculated twice, the number of additions
required to calculate F(n) is reduced to n. The number of cache lookups
is instead exponential but, assuming it is implemented as a suitable
random-access data structure (array or hash table), the time of each
cache lookup is O(1).
Space requirements for this algorithm are still exponential because of
binary recursion.
In the programming languages that have built-in support for memoization
(such as J or Factor), algorithm 1A can be turned into 1B with just one
keyword. For other languages, I manually implement a suitably generic data
structure.

Language

Translator [*]

t0

T(1,000,000), seconds

sqrt(Execution time, seconds) vs. Fibonacci number calculated

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.

ALGORITHM 2A: CACHED LINEAR RECURSION

When previously calculated Fibonacci numbers are stored in a data structure,
there is no need for binary recursion, the entire structure can be filled
with only n iterations.

infinite lazy evaluated list.

Construct an infinite list F with F[0] equal 0 and F[1] equal 1

Define F[i] as the sum of F[n-1] and F[n-2].

Return the nth element of the list

In the languages that support lazy evaluation, either implicitly (like Hope
or Haskell) or explicitly (like OCaml or Scheme), it is possible to
construct the data structure containing all Fibonacci numbers and simply
return the nth element of the structure.

random-access container.

Create a container F with the elements F[0] equal 0 and F[1] equal 1

Recursively (or iteratively) calculate the elements F[2..n]
according to the formula F[i]=F[i-1]+F[i-2]

Return F[n]

In the languages that only support eager evaluation, the entire data
structure must be filled with an explicit loop (or a tail-recursive function).
If the container permits random access with O(1) complexity, the total
execution complexity is O(n), assuming fixed precision. For large precision
numbers, the complexity of addition grows with n as well.

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.

These algorithms execute n additions to calculate F(n), and store the intermediate
results in the recursive function's additional arguments, which serve as
accumulators. The time to execute is, therefore n * time to execute
one addition and one recursive call. The addition of large
numbers if O(n), and recursive calls are typically constant time and never greater
than O(n). The time to execute the algorithm is T(n)=t0n2.
which is a straight line in the plot of sqrt(T) vs. n.
This recursive functions is easily written as tail recursion, which means
it does not contribute to space complexity, but space is needed to store
the accumulated sum, which grows as O(n). In the languages that do not support
tail recursion, such as C/C++, space grows as O(n2), and the
algorithm quickly becomes useless. Such languages are more suited for 2C (iteration) anyway.

Language

Translator [*]

t0

T(1,000,000), seconds

Haskell

GHC 6.8.2 [c]

12.4 fs

12.42

sqrt(Execution time, seconds) vs. Fibonacci number calculated

Prolog

SWI Prolog 5.7.15 [b] with GMP 4.3.1

21.3 fs

22.22

OCaml

INRIA OCaml 3.11.1 [c]

32.2 fs

32.14

Java

Sun Java SE 1.6.0_16-b01 / HotSpot 64-Bit VM 14.2-b01 [b]

34.1 fs

31.27

Scheme

MIT Scheme 7.7.90 20090107 [b]

48.8 fs

47.88

Forth

GNU gforth 0.7.0 [i]

-

-

C++

GNU g++ 4.4.1 [c] with GMP 4.3.1

-

-

J

Jsoftware J64 6.02 [i]

-

-

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
J reports stack overflow when trying to calculate numbers over ~3000,
and I could not find a way to increase available stack size in jconsole
C++ linked with GMP consumes memory at a very large pace, exceeding 20Gb when calculating
500,000th number
Forth (Gforth linked with GMP) also consumes memory at a very large pace, running into tens
of gigabytes about the 500,000th number

ALGORITHM 2C: IMPERATIVE LOOP WITH MUTABLE VARIABLES

Assign value 0 to variable n1 and value 1 to variable n2

Assign the value of n1+n2 to variable n2 and assign the old value of n2 to n1

These algorithms execute n additions to calculate F(n), and store the intermediate
results in mutable variables. Time to execute is, therefore n * time to execute
one addition and two variable read/store operations. The addition of large
numbers if O(n), and store/retrieve is typically constant time and never greater
than O(n). The time to execute the algorithm is T(n)=t0n2.
which is a straight line in the plot of sqrt(T) vs. n.

Language

Translator [*]

t0

T(1,000,000), seconds

Forth

GNU gforth 0.7.0 [i] with GMP 4.3.1

11.1 fs

11.13

sqrt(Execution time, seconds) vs. Fibonacci number calculated

C++

GNU g++ 4.4.1 [c] with GMP 4.3.1

23.2 fs

22.62

Java

Sun Java SE 1.6.0_16-b01 / HotSpot 64-Bit VM 14.2-b01 [b]

31.6 fs

29.52

OCaml

INRIA OCaml 3.11.1 [c]

35.9 fs

34.52

Scheme

MIT Scheme 7.7.90 20090107 [b]

50.8 fs

48.46

J

Jsoftware J64 6.02 [i]

413 fs

403.6

Prolog

SWI Prolog 5.7.15 [b] with GMP 4.3.1

73.8 ns

-

Haskell

GHC 6.8.2 [c]

-

-

Hope

Hope by Ross Paterson, 08-Dec-03 [i]

-

-

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.
Prolog usually does not have mutable variables, but some implementations do.
Haskell does not have the concept of mutable variables.
Hope does not have the concept of mutable variables.

ALGORITHM 3A: MATRIX MULTIPLICATION

This is the most mathematically elegant way to calculate a Fibonacci number.
Complexity (fixed precision): O(log n) speed, O(1) space
Complexity (unlimited precision): O(n log2 n log log n) speed, O(n) space
Raising a matrix to the power of n requires only O(log n) matrix
multiplications, if n is a natural number and the repeated squaring algorithm
is used. Space requirements for repeated squaring include the product
accumulator and the last calculated square, which makes it O(1) (in fixed-precision case)
unless non-tail-recursive implementation is used.
In case of unlimited precision, execution time of each 2x2 matrix multiplication
is proportionate to the execution time of one integer multiplication,
which has the complexity of O(n log n log log n) and space requirements to store
one 2x2 matrix grow as O(n).

The repeated squaring algorithm for calculation of xn:

start with product accumulator a=1 and last square accumulator s=x

if n is odd, increase the accumulated product, a=a*s

calculate the next square of s, s=s2

divide n by 2, n = n div 2

repeat until n=0, at which time the final value of a is the answer

In languages providing bit operations, this algorithm can examine the bits
in the binary representation of n, proceeding from least-significant
to most-significat bit, instead of repeating division by 2. The example in J
makes good use of this approach.

Language

Translator [*]

t0

T(1,000,000), seconds

sqrt(Execution time, seconds) vs. Fibonacci number calculated

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.

ALGORITHM 3B: FAST RECURSION

This algorithm can be derived either from from the fact that, from definition,
F(n+m)=F(n)F(m) + F(n-1)F(m-1), or by rewriting the matrix equation using the
fact that An+m=AnAm:
This algorithm is faster than than 3A because it does not repeat calculations
of the same numbers (top right and bottom left corners of the matrix in 3A),
but it cannot be easily implemented as a tail-recursive function, resulting
in O(log n) space complexity.
I implement it this way:

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.

ALGORITHM 3C: BINET'S FORMULA WITH ROUNDING

Eigendecomposition of the matrix equation from 1A gives a formula,
published by Abraham de Moivre in 1730, Euler in 1765, and by Jacques Philippe
Marie Binet in 1843
Where φ is the golden ratio, (1+sqrt(5))/2.
Because (1-phin) / sqrt(5) is less than 1/2 for all positive n, this
formula can be simplified to nearest integer to φn/sqrt(5) or,
using the floor function,
Complexity (fixed precision): O(log n) speed, O(1) space
Complexity (unlimited precision): O(n log2 n log log n) speed, O(n) space

Evaluation of this formula, just as evaluation of the formula in 3A, is thus
reduced to taking an integer power of a number, only this number is irrational
and must be represented as a floating-point value with sufficient precision to
produce the correct fibonacci number. Not many languages support arbitrary
precision floating point values, I use GNU MP library where possible.
The computational and space complexity of this algorithm are the same as 1A,
and are the same as the complexity of the repeated squaring algorithm with one
addition: time to calculate the square root of five with sufficient precision.
Where not provided by the compiler/library, I iteratively calculate inverse
square root via y(n+1) = 0.5*y(n)*[3 - a*y(n)^2] because this recurrent
expression has no division.

Language

Translator [*]

t0

T(1,000,000), seconds

sqrt(Execution time, seconds) vs. Fibonacci number calculated

[*] - [c] for Compiler, [b] for Bytecode intepreter, [i] for Interpreter
Execution time is user time as reported by time(1), the test programs were
modified to avoid conversion of the entire result to string, since that
operation may be slow for large numbers.