How the MKL speeds up Revolution R Open

Last week we announced the availability of Revolution R Open, an enhanced distribution of R. One of the enhancements is the inclusion of high performance linear algebra libraries, specifically the Intel MKL. This library significantly speeds up many statistical calculations, e.g. the matrix algebra that forms the basis of many statistical algorithms.

Several years ago, David Smith wrote a blog post about multithreaded R, where he explored the benefits of the MKL, in particular on Windows machines.

In this post I explore whether anything has changed.

What is the MKL?

To make the best use of today's machines, Revolution R Open is installed by default with the Intel Math Kernel Library (MKL), which provides the BLAS and LAPACK library functions used by R. With the MKL, many common R operations can use all of the available processing power.
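To see which BLAS and LAPACK libraries an R session is actually linked against, you can inspect the session information. The sketch below assumes R 3.4.0 or later, which is newer than the R 3.1.1 used in this post. From that version on, `sessionInfo()` reports the BLAS and LAPACK in use. On older versions those fields are absent, and the sketch prints "not reported".

```r
# Session details; since R 3.4.0 these include the BLAS and LAPACK in use
si <- sessionInfo()

# Paths of the linked libraries (fields are NULL on R versions before 3.4.0)
cat("BLAS:  ", if (is.null(si$BLAS)) "not reported" else si$BLAS, "\n")
cat("LAPACK:", if (is.null(si$LAPACK)) "not reported" else si$LAPACK, "\n")

# Version of the LAPACK routines this R build is using
cat("LAPACK version:", La_version(), "\n")
```

With the MKL in place, the reported library paths should point at MKL files rather than R's bundled reference BLAS.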

The MKL's default behavior is to use as many parallel threads as there are available cores. There’s nothing you need to do to benefit from this performance improvement — not a single change to your R script is required.

However, you can still control or restrict the number of threads using the setMKLthreads() function from the Revobase package delivered with Revolution R Open. For example, you might want to limit the number of threads to reserve some processing capacity for other activities, or because you’re doing explicit parallel programming with the ParallelR suite or other parallel programming tools.

You can set the maximum number of threads as follows:

setMKLthreads(<value>)

where <value> is the maximum number of parallel threads, not to exceed the number of available cores.
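Here is a minimal sketch of that advice. It assumes you want to leave one core free for other work, which is only an illustrative policy. `parallel::detectCores()` ships with base R. `setMKLthreads()` is only present in Revolution R Open, so the call is guarded and the script also runs on a standard R installation.

```r
# Number of logical cores reported by the operating system (may be NA)
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Illustrative policy: keep one core free, but always use at least one thread
n_threads <- max(1L, n_cores - 1L)

# setMKLthreads() exists only when Revolution R Open's MKL support is loaded
if (exists("setMKLthreads", mode = "function")) {
  setMKLthreads(n_threads)
  message("MKL limited to ", n_threads, " thread(s)")
} else {
  message("setMKLthreads() not found: this R is not using the RRO MKL")
}
```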

Testing the MKL on matrix operations

Here are the results of 5 tests on matrix operations, run on a Samsung laptop with an Intel i7 4-core CPU. From the graphic you can see that a matrix multiplication runs 27 times faster with the MKL than without, and linear discriminant analysis is 3.6 times faster.

You can replicate the same tests by using this code:

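The sketch below is a stand-in, not the script behind the results above. It times five common linear algebra operations with `system.time()`. The choice of operations, the matrix size `n` and the random data are all assumptions. To compare, run it once under Revolution R Open and once under standard R on the same machine.

```r
library(MASS)  # recommended package shipped with R; provides lda()

set.seed(42)
n <- 1000  # assumed matrix size; increase it to make differences more visible

a   <- matrix(rnorm(n * n), nrow = n)
spd <- crossprod(a) + diag(n)                    # symmetric positive definite, for chol()
grp <- factor(rep(c("a", "b"), length.out = n))  # two arbitrary classes for LDA

# Elapsed (wall-clock) seconds needed to evaluate an expression
elapsed <- function(expr) system.time(expr)[["elapsed"]]

timings <- c(
  matrix_multiply = elapsed(a %*% a),
  cholesky        = elapsed(chol(spd)),
  qr_lapack       = elapsed(qr(a, LAPACK = TRUE)),  # LAPACK = TRUE: use LAPACK, not LINPACK
  svd             = elapsed(svd(a)),
  lda             = elapsed(lda(a[, 1:50], grouping = grp))
)
print(round(timings, 2))
```

Note that `qr()` uses LINPACK code by default. `LAPACK = TRUE` is set here so that the decomposition goes through LAPACK, which is the library the MKL replaces.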

Simon Urbanek's benchmark

Another famous benchmark was published by Simon Urbanek, one of the members of R-core. You can find his code at Simon's benchmark page. His benchmark consists of three different classes of test:

Matrix calculation

This includes tests for creation of vectors, sorting, computing the cross product and linear regression.

In this category, the MKL substantially speeds up calculation of the cross product (~26x) and linear regression (~20x).
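Both operations reduce to dense matrix products and factorizations that R hands to BLAS and LAPACK, which is where the MKL helps. The sketch below is not Urbanek's code; its simulated data and sizes are arbitrary. It fits a linear regression through the normal equations built from cross products, then checks the result against `lm.fit()`.

```r
set.seed(7)
n <- 500  # observations (arbitrary)
p <- 20   # coefficients, including the intercept (arbitrary)

X <- cbind(1, matrix(rnorm(n * (p - 1)), nrow = n))
beta <- seq_len(p) / p
y <- drop(X %*% beta + rnorm(n, sd = 0.1))

# Cross products X'X and X'y: matrix products handled by the BLAS
XtX <- crossprod(X)
Xty <- crossprod(X, y)

# Linear regression via the normal equations; solve() calls LAPACK
beta_hat <- drop(solve(XtX, Xty))

# Cross-check against lm.fit(), which uses a QR decomposition instead
beta_qr <- unname(lm.fit(X, y)$coefficients)
print(all.equal(beta_hat, beta_qr))  # TRUE
```

The normal equations are used here because they make the cross product explicit. For badly conditioned data, a QR-based fit such as `lm.fit()` is numerically safer.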

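Matrix functions

This includes tests for the Fast Fourier Transform, eigenvalues, determinant, Cholesky decomposition and matrix inverse.

In this category, the MKL speeds up calculation of the determinant (~11x), Cholesky decomposition (~17x) and matrix inverse (~10x), while the Fast Fourier Transform is unaffected.

Programmation

This includes tests for vector and matrix calculation, recursion, loops and mixed control flow. These tests exercise R's interpreter rather than linear algebra routines, so the MKL makes essentially no difference.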

I compared the total execution time of the benchmark script in RRO (with MKL) and R. Using Revolution R Open, the benchmark tests completed in 47.7 seconds. This compared to ~176 seconds using R-3.1.1 on the same machine.

To replicate these results, you can use the following script, which runs (sources) his code directly from the URL and captures the total execution time:

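Here is a sketch of such a script. The benchmark URL from Simon's benchmark page is left as a placeholder below rather than guessed. The helper name `time_script()` is hypothetical: it wraps `source()` in `system.time()` and returns the elapsed seconds. A throwaway local script stands in for the benchmark so the snippet runs on its own.

```r
# Time how long it takes to source() an R script from a local path or a URL
time_script <- function(path_or_url) {
  # local = new.env() keeps the script's variables out of the global workspace
  system.time(source(path_or_url, local = new.env()))[["elapsed"]]
}

# Self-contained demo with a throwaway local script
demo_file <- tempfile(fileext = ".R")
writeLines("x <- crossprod(matrix(rnorm(250000), nrow = 500))", demo_file)
demo_time <- time_script(demo_file)
cat("Demo script ran in", round(demo_time, 2), "seconds\n")

# For the benchmark itself, pass the script URL from Simon's benchmark page:
# total <- time_script("<URL of the benchmark script>")
```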

Detailed results

Here is a summary of each of the individual tests (times in seconds):

Test                                  R-3.1.1 (s)   RRO (s)  Performance gain
I. Matrix calculation
Create, transpose and deform matrix          1.01      1.01               0.0
Matrix computation                           0.40      0.40               0.0
Sort random values                           0.72      0.74               0.0
Cross product                               11.50      0.42              26.4
Linear regression                            5.56      0.25              20.9
II. Matrix functions
Fast Fourier Transform                       0.45      0.47               0.0
Compute eigenvalues                          0.74      0.39               0.9
Calculate determinant                        2.87      0.24              10.8
Cholesky decomposition                       4.50      0.25              16.8
Matrix inverse                               2.71      0.25               9.9
III. Programmation
Vector calculation                           0.67      0.67               0.0
Matrix calculation                           0.26      0.26               0.0
Recursion                                    0.95      1.06              -0.1
Loops                                        0.43      0.43               0.0
Mixed control flow                           0.41      0.37               0.1
Total test time                            165.60     47.72               2.5
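The performance gain column is consistent with (R-3.1.1 time / RRO time) - 1, that is, the speed-up ratio minus one. The sketch below recomputes a few rows from the published timings to show the arithmetic. Rows with very short RRO times can differ from the table in the last digit, presumably because the listed times are themselves rounded.

```r
# Selected timings (seconds) copied from the summary of individual tests
timings <- data.frame(
  test = c("Cross product", "Compute eigenvalues", "Recursion", "Total test time"),
  r    = c(11.50, 0.74, 0.95, 165.60),
  rro  = c(0.42, 0.39, 1.06, 47.72)
)

# Performance gain: speed-up ratio (R time / RRO time) minus one
timings$gain <- timings$r / timings$rro - 1
print(transform(timings, gain = round(gain, 1)))
```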

Conclusion and caveats

The Intel MKL makes a notable difference for many matrix computations. When running the Urbanek benchmark using the MKL on Windows, the total test time fell from 165.6 to 47.7 seconds, a performance gain of ~2.5 (about 3.5 times as fast).

The caveat is that the standard R distributions for different operating systems use different math libraries. For example, R on Mac OS X uses the ATLAS BLAS, which gives you performance comparable to the MKL.