Tue, 05 Jul 2011

Linear regression models are a major component of every applied researcher's
toolbox. Obtaining results more quickly is therefore of central importance,
particularly when many such models have to be fit. Common examples in this
context are Monte Carlo simulation or bootstrapping.

My talks introducing High Performance Computing with R (see e.g.
these
slides from a five-hour workshop at the ISM in Tokyo) frequently feature an example
of how to extend R with dedicated
compiled code for linear regressions. Romain and I also frequently use this
a motivating examples with our
Rcpp package for
seamless R and C++ integration. In fact, the examples directory for Rcpp still
contains an earlier version of a benchmark for fastLm(), a faster
alternative for R's lm() and lm.fit() functions. We have also
extended this with the
RcppArmadillo
package which brings Conrad Sanderson's excellent
Armadillo library with templated C++ code
for linear algebra to R, as well as a simple integration to the GNU GSL via our
RcppGSL
package. The Rcpp
section on my blog contains several posts about fastLm benchmarks.

Doug Bates has been a key Rcpp contributor, helping particularly with the
initial Armadillo integration. His research, however, also requires highly
performing sparse matrix operations which Armadillo does not yet offer. So
Doug has started to explore the
Eigen project---a free C++ template
math library mainly focused on vectors, matrices, and linear algebra (note
that we will refer to the Eigen, Eigen2 and Eigen3 APIs as just 'Eigen' here,
focusing on the latest version, Eigen3). Better
still, Doug went to work and pretty much single-handedly wrote a new package
RcppEigen which
integrates the templated C++ library
Eigen with
R using
Rcpp.

RcppEigen also provides a fastLm implementation and benchmark
script. In fact, it contains a full six
different implementations as Doug is keenly interested in rank-revealing
decompositions which can guard against ill-conditioned model matrices. Some more
background information on this is also available in Doug's article on Least
Squares Calculations in R in
R News 4(1).

Doug's implementation also uses an elegant design. It comprises a base class with common
functionality, and six subclasses which specialize accordingly for these
six different decompositions approaches:

a column-pivoting QR,

a standard QR,

an LL,

an LDL,

an SVD, and

a standard symmetric one.

On my server, the result of running the included benchmark script
lmBenchmark is as follows:

From this first set of results, the preferred method may be 'PivQR', the
pivoted QR. Strictly-speaking, it is the only one we can compare to
lm.fit() which also uses a pivoting scheme. In the case of a degenerated model matrix,
all the other methods, including the four fastest approaches, are
susceptible to producing incorrect estimates. Doug plans to
make SVD and SymmEig rank-revealing too.

As for pure speed, the LL and LDL decomposition have almost identical
performance, and are clearly faster than the other approaches. Compared to
lm.fit(), which is the best one could do with just R, we see an
improvement by a factor of eight which is quite impressive (albeit
not robust to rank-deficient model matrices). Apart from the SVD, all
approaches using Eigen are faster than the one using Armadillo, which itself
is still faster than R's lm.fit(). Doug and I were very surprised
by the poor performance of the GNU GSL (which also uses SVD) via RcppGSL.

Now, Eigen uses its own code for all linear algebra operations, bypassing
BLAS and LAPACK libraries. The results above were achieved with the current
Atlas package in Ubuntu. If we take advantage of the BLAS / LAPACK plug-in
architecture offered on Debian / Ubuntu systems (see the vignette in my
gcbd package for more)
and use Goto BLAS which provide tuning as well as parallelism on
multi-core machines, the results are as follow:

We see that the BLAS-using Armadillo approach improves a little and moves
just slightly ahead of the pivoted QR. On the other hand, lm.fit(), which also
uses a pivoting scheme and hence only level 1 BLAS operations, changes less.
GSL performs even worse (and it is unclear why).
Doug's post announcing RcppEigen on the Eigen list has a few more sets of results.

This post has illustrated some of the performance gains that can be obtained
from using Eigen via RcppEigen. When not
using rank-revealing methods, computing time can be reduced by up to eight
times relative to lm.fit(). Rank-revealing method can still improve
by almost a factor of two. The main disadvantage of Eigen may be one of the
reasons behind its impressive performance: its heavily templated code does
not use BLAS, and the resulting object code (as e.g. in RcppEigen) becomes
enormous (when compiling with debugging symbols). As one illustration, the
shared library for RcppEigen on my Ubuntu 64-bit system has a size of 24.6 mb
whereas RcppArmadillo comes in at a mere 0.78 mb; without debugging symbols it is a
more reasonable 0.52 mb.

The performance of Eigen is certainly intriguiging, and its API is rather
complete. It seems safe to say that we may see more R projects going to make
use of Eigen thanks to the RcppEigen wrapper.

Update: Clarified statement about large
object size which was entirely due to building with debugging support.