Speed comparison of various number crunching packages (version 2)

Speed of execution is an important criterion when choosing
data analysis software. Since it can vary by a factor of 10 or more
on the same computer, it can make the difference between
a responsive package and one that
seems to take hours to calculate!

This is the second version of our benchmark tests,
derived from Stephan
Steinhaus' benchmark v. 2. You can find a (quite outdated) test with
our first version here. The tests in our
first version were scaled in such a way that each of them ran in about 1
second on the test machine (a Celeron 500 MHz with 256 MB RAM under
Windows 2000 Professional) with our reference software: Matlab 6.0
(R12). For this second version, we decided to change the reference
software to a freely available package. This way, everybody can
download it and use it as a reference on their own computer. We chose
R version 1.6.2 with a standard, non-processor-optimized ATLAS (Rblas.dll)
library as our new reference. All tests are scaled to run
in 1 +/- 0.1 sec on our new test computer: a Pentium IV 1.6 GHz with 1
GB RAM under Windows XP Professional. The other changes from the original
Steinhaus benchmark are the same as in version 1: (1) we kept only tests that run on all the software checked,
(2) we arranged them
in two categories ("matrix calculation" versus
"matrix functions"), (3) we added a "programming" category
to evaluate how fast the software executes scripts, (4) we adapted or
optimized tests for recent versions of the software, and (5) we considered
only trimmed geometric means (worst and best results eliminated) within each
category and for the overall index. Note that Stephan Steinhaus' report
also evaluates the "richness" of the packages (which functions
are present and which are absent). Here, we only compare software
for speed!
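The trimmed geometric mean mentioned in point (5) can be sketched as follows. This is a minimal illustration, not the script used for the benchmark: it assumes that exactly one best and one worst timing are dropped per category, and the function name is hypothetical. The sample values are the R 1.9.0 timings for tests I.A to I.E from the results table.

```python
import math

def trimmed_geometric_mean(times):
    """Geometric mean after dropping one best and one worst timing (assumed trimming rule)."""
    if len(times) < 3:
        raise ValueError("need at least three timings to trim both ends")
    kept = sorted(times)[1:-1]  # drop the single smallest and the single largest value
    # Averaging the logarithms and exponentiating equals the n-th root of the product.
    return math.exp(sum(math.log(t) for t in kept) / len(kept))

# Category I timings for R 1.9.0 (tests I.A to I.E), copied from the results table.
r_matrix_calculation = [1.49, 0.43, 0.87, 0.26, 0.26]
score = trimmed_geometric_mean(r_matrix_calculation)
print(f"{score:.2f}")  # prints 0.46
```

Under this assumption, the result agrees with the category I score listed for R 1.9.0 in the table. A geometric mean is used rather than an arithmetic one so that a single very slow test does not dominate the index.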

We have compared:

R 1.9.0,
the latest version of our reference software, a rich and powerful free 'S language dialect' (R
benchmark 2.3 script; text file, 13 Kb). Here, we use the Pentium
IV-optimized ATLAS library (provided on CRAN), which gives slightly better results in
some tests. We have not tested other optimized libraries, such as
Goto's.

III.C: greatest common
divisors of 70,000 pairs. Tests the performance of
recursive functions.
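As an illustration of what test III.C exercises, here is a minimal sketch of a recursive greatest common divisor applied to 70,000 pairs. The value range, the random seed and the function names are assumptions made for this example; the actual benchmark scripts may generate their pairs differently.

```python
import random

def gcd_recursive(a, b):
    """Euclid's algorithm written as a recursive function."""
    if b == 0:
        return a
    return gcd_recursive(b, a % b)

def gcd_of_pairs(n_pairs, upper=1000, seed=0):
    """Compute the GCD of n_pairs random integer pairs (upper and seed are illustrative)."""
    rng = random.Random(seed)
    pairs = [(rng.randint(1, upper), rng.randint(1, upper)) for _ in range(n_pairs)]
    return [gcd_recursive(a, b) for a, b in pairs]

results = gcd_of_pairs(70_000)
```

Each call recurses only a few levels deep, so the test mostly measures the overhead of function calls rather than arithmetic.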

III.D: creation of a
220x220 Toeplitz matrix. Checks the speed of execution of
loops.

III.E: Escoufier's method
on a 37x37 random matrix. Tests various aspects of
programming combined in a single test.

Note that tests III.A-E do not use the most optimized
algorithms for each package, but they do test similar features in all of
them. For instance, a matrix algorithm for test III.D is often much more
efficient, as is a possibly preprogrammed toeplitz() function.
Yet, we keep the loop algorithm in all cases, precisely in order to test
the speed of loop execution in scripts!
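The loop algorithm of test III.D might look like the sketch below. The entry formula abs(j - k) + 1 is an assumption chosen because it yields a symmetric Toeplitz matrix; the function name is hypothetical and the real benchmark scripts may fill the matrix differently.

```python
def toeplitz_by_loops(n):
    """Fill an n x n matrix element by element with two nested loops."""
    matrix = [[0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            # The entry depends only on |j - k|, so every diagonal is constant.
            matrix[k][j] = abs(j - k) + 1
    return matrix

m = toeplitz_by_loops(220)
```

The double loop performs 220 x 220 = 48,400 element assignments, which is exactly the kind of interpreted work that a vectorized or built-in alternative would avoid.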

Results

The tests were run three times on a Pentium IV 1.6 GHz
computer with 1 GB of memory under Windows XP Professional, and the
mean value was recorded. The following table presents the results:
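The measurement procedure (three runs, mean recorded) can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical, and the workload is a stand-in rather than one of the 15 benchmark tests.

```python
import time
from statistics import mean

def mean_runtime(task, repeats=3):
    """Run task() `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return mean(timings)

def example_task():
    # Hypothetical stand-in workload, not one of the benchmark tests.
    return sum(i * i for i in range(100_000))

elapsed = mean_runtime(example_task)
print(f"mean of 3 runs: {elapsed:.4f} s")
```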

Test (sec)                    R  S-PLUS  Matlab  O-Matrix  O-Matrix  Octave  Scilab    Ox
                          1.9.0     6.1     6.0       5.6       5.6  2.1.42     2.7  3.30
                                                  Ml mode    native

I. Matrix calculation
I.A                        1.49    3.03    0.48      0.69      0.58    2.01    1.19  0.74
I.B                        0.43    1.37    0.42      0.53      0.62    1.22    0.70  0.94
I.C                        0.87    2.38    0.89      0.98      0.98    7.77    2.00  1.97
I.D                        0.26    0.72    0.73      0.19      0.30    0.35    8.58  0.45
I.E                        0.26    1.33    0.24      0.17      0.14    0.78    2.11  1.04
Score                      0.46    1.63    0.53      0.41      0.48    1.24    1.71  0.90

II. Matrix functions
II.A                       1.01    1.62    0.48      0.99      1.05    0.96    1.78  3.06
II.B                       1.25    0.96    0.86      0.41      0.49    2.30    2.44  1.78
II.C                       0.30    0.41    0.27      0.13      0.14    1.02    2.27  0.71
II.D                       0.24    1.92    0.33      0.11      0.12    0.21    1.96  0.36
II.E                       0.14    1.48    0.23      0.07      0.06    0.47    1.67  0.35
Score                      0.42    1.35    0.35      0.18      0.20    0.77    2.00  0.77

III. Programming
III.A                      0.83    1.68    2.11      0.31      1.84    2.06    0.72  0.69
III.B                      1.33    1.14    0.84      0.51      0.64    0.73    0.91  0.79
III.C                      0.56    0.71    0.91      0.14      0.17    0.42    1.52  0.72
III.D                      0.67    6.62    0.38      0.10      0.10    4.39    1.45  0.05
III.E                      0.89   15.10    1.92      0.60      0.56    3.08    3.97  0.31
Score                      0.79    3.15    1.14      0.28      0.39    1.67    1.26  0.54

Total                     10.52   40.47   11.12      5.93      7.83   27.76   33.27 13.97
Overall                    0.53    1.71    0.60      0.27      0.34    1.17    1.63  0.72

Comments

The higher the result (in seconds), the slower the test
executes. Lower values thus mean higher performance. Results lower than 0.50 (more
than twice as fast as the reference) are in green;
results larger than 2.00 (more than twice as slow as the reference) are
in violet. We immediately see the
progress made in R since version 1.6.2 (about 30% faster, but as much as
four to seven times faster for some operations using the optimized libraries).

S-PLUS is a well-recognized standard in
statistics, and it is the commercial counterpart of R. As we see here, it
is much slower than R under Windows (it takes about four times longer to complete
all tests)! S-PLUS is well known for its
versatility and for the ease of exploring statistical models in its
environment. It excels in almost all fields of statistics. However, its limits
are reached when working with huge datasets. In this case, SAS (not evaluated
here) is considered to be faster, and thus more efficient, especially in loop programming, where
S-PLUS is desperately slow (test III.E)! However, S-PLUS offers alternatives: the For() function for optimized loops, and the
apply() family of functions that "vectorize" loops. With medium-sized
matrices, as in the current test, it is easily outperformed by almost all the
other software evaluated here.
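The "four times" figure can be checked against the Total row of the results table. The sketch below simply divides each package's total time by that of R 1.9.0; the dictionary keys are labels chosen for this example.

```python
# Total time in seconds for all 15 tests, copied from the "Total" row of the results table.
totals = {
    "R 1.9.0": 10.52,
    "S-PLUS 6.1": 40.47,
    "Matlab 6.0": 11.12,
    "O-Matrix 5.6 (Ml mode)": 5.93,
    "O-Matrix 5.6 (native)": 7.83,
    "Octave 2.1.42": 27.76,
    "Scilab 2.7": 33.27,
    "Ox 3.30": 13.97,
}

reference = totals["R 1.9.0"]
ratios = {name: total / reference for name, total in totals.items()}

# List packages from fastest to slowest, relative to R 1.9.0.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<24} {ratio:.2f}")
```

For S-PLUS 6.1 the ratio is 40.47 / 10.52, or about 3.85, which is consistent with the statement above.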

Since R offers features similar to those of S-PLUS, a larger
number of additional libraries (more than 300!), and is totally free, it is
clearly an excellent choice for statistical analyses. This benchmark also
shows that it is quite good for "number crunching". Moreover, it
runs on almost all platforms (Windows, Macintosh,
Unix/Linux) and it does not have the "loop problem" of S-PLUS (yet it also
provides apply() and the like to accelerate loops).
However, it does not (yet) offer the same nice user interface with menus and
dialog boxes (GUI) as S-PLUS 6.1 does (though many professionals do not
care about that because they prefer to use scripts and the command line for
finer control over their calculations). R gets better and better with
successive releases. It is maintained and enriched by a very active community
of developers. These are the reasons why we decided to promote it as the
reference in our benchmark tests.

Matlab 6 is a commercial standard in pure matrix
calculation. It is significantly poorer in statistical models than S-PLUS
or R, but it offers a wide range of high-quality toolboxes for specific
applications (although they increase the cost of this already very expensive
software!). Concerning speed, it is about as fast as R 1.9.0. However, we did
not test the latest version, 6.5.1, which seems to provide a substantial
increase in speed. Being
one of the fastest, the richest and the most commonly used packages, and having one of the
best user interfaces, Matlab 6 deserves its status as the leading product in matrix
programming.

Matlab has several
contenders that offer a similar matrix language for a lower price (O-Matrix, Octave,
Scilab). Among them, only one also competes with Matlab 6.0 on
performance:
O-Matrix. Overall, O-Matrix is the fastest matrix computation package we
have tested. It is much less expensive than Matlab, and it provides
reasonable compatibility. However, O-Matrix does not offer the same range of
specialized toolboxes and it runs only on Windows.

The two other "Matlab clones" (Octave & Scilab) are
free open source
software. Their performance is somewhat lower than that of Matlab 6.0 and
compares better with Matlab 5.3 (see version 1 of the test). Octave aims to be fully
compatible with the base version of Matlab 4.2. One should note that
Octave runs under the cygwin emulation of Unix in Windows, and this
probably has some negative impact on its raw performance. The native Unix/Linux
version should run comparatively faster. Scilab offers many more functions than Octave, but it is not 100%
compatible with the Matlab language, and it is the slowest package in this
comparison if we set aside the "loop problem" of S-PLUS.

Ox stands a little apart. It is the only package that does not
claim compatibility with one of the two standards previously cited: Matlab or
S-PLUS. However,
it is partly compatible with Gauss, another high-quality commercial matrix
calculation package regarded as a standard in econometrics (not evaluated here,
but you will find detailed tests in Stephan
Steinhaus' report). It is one of the four packages (with R 1.9.0, Matlab and
O-Matrix) that are faster than R 1.6.2, that is, our reference software and
version for this benchmark. It is particularly good at executing scripts (tests III). As
it is a lightweight console application that can easily run scripts in batch
mode, Ox is an excellent choice for calling matrix calculation scripts from various
kinds of applications. O-Matrix is even faster, but it is restricted to
Windows systems.

Conclusions

Choosing data analysis software is a difficult task. "Matrix languages" (like all the software we
evaluated here) are very flexible because they are programmable and they are
able to work very efficiently with matrices (by definition!), which are widely
used in data analysis. However, they
differ from each other in terms of price, richness (the number of functions
provided), usability (including the quality of their user interface, their
status as an established standard or not, the quality of their support, their availability on different platforms like
Windows, Macintosh, Unix or Linux), and finally, in terms of their raw
performance. We evaluated the latter here by using a benchmark
suite of 15 tests. Considering the results obtained with our
benchmark (but beware of its limits: only a few features
were tested, and solely on a Windows platform!), one can conclude:

R is one of the fastest open source data analysis
packages. Since it is free and provides many additional
packages for all kinds of statistics, we warmly recommend it.

S-PLUS is slower and much more expensive, but it still offers a
better graphical user interface.

Matlab is about as fast as R and rich in features, and it offers a well-designed user
interface, but it is also expensive.

O-Matrix is the fastest matrix language we have tested on Windows.

Currently, no free "clone" of
Matlab is as fast as Matlab 6.0 itself.

Octave is language-compatible with Matlab, but not a top
performer on Windows.

Scilab is a free alternative to Matlab for
"richness" more than for performance.

Ox is a very efficient matrix language, especially for batch processing
of scripts.