How slow is R really?

One thing you always hear about R is how slow it is, especially when the code is not well vectorized or includes loops. But R is an interpreted language, and its strong suit really isn't speed; its comparative advantage is the 4,284 packages on CRAN. We accept the slower speed for the time saved by not having to re-invent the wheel every time we want to do something new.

But that doesn’t mean it isn’t worth occasionally wondering how slow R is relative to other languages, especially with new tools like pandas in Python. I happened to be working on a Project Euler problem whose objective was to calculate the first 10,001 prime numbers, so I decided to see how R performed relative to my other primary languages, Python and C. I also wanted to see how R’s performance changed when I used apply() and the new(ish) compiler package.

I took the same basic approach in each language by writing two functions. The first determines whether a number is prime or composite by trial division against the set {2, 3, 5, …, round(sqrt(number))}, stopping as soon as a trial division has remainder 0 or all possible divisors are exhausted. The second function steps through the odd numbers, counting the primes, and returns the prime at the supplied index. I wrote versions of this code in C, Python, and R (the R version both with and without sapply()).
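The original source for the three languages isn’t reproduced here, but the approach can be sketched in Python (the function names is_prime and nth_prime are mine, not the author’s):

```python
from math import isqrt

def is_prime(n):
    """Trial division against 2, 3, ..., floor(sqrt(n)).

    Stops as soon as a divisor with remainder 0 is found, or when
    all candidate divisors are exhausted.
    """
    if n < 2:
        return False
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return False
    return True

def nth_prime(index):
    """Walk the odd numbers, counting primes, and return the prime
    at the supplied (1-based) index. 2 is seeded by hand since it is
    the only even prime."""
    if index == 1:
        return 2
    count = 1        # 2 is the first prime
    candidate = 1
    while count < index:
        candidate += 2
        if is_prime(candidate):
            count += 1
    return candidate

print(nth_prime(10001))  # prints 104743, the Project Euler answer
```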

C, the only compiled language, was really fast. It was nearly 16 times faster than Python and over 270 times faster than R. Relative to R, Python delivered a 17-fold performance increase. To paraphrase the SAT, C is to Python as Python is to R (for this problem).

What about using sapply() and taking advantage of R’s functional programming? That was dreadful. Relative to the loops, using functional programming and sapply() actually increased runtime to 10.470 seconds.
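sapply() maps a function over a vector and collects the results; the rough Python analogue of that style, as a sketch rather than the author’s code, is to map the primality test over a block of odd candidates and count the hits:

```python
from math import isqrt

def is_prime(n):
    # Trial division up to floor(sqrt(n)), as in the loop version.
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def count_primes(limit):
    # Functional style: map is_prime over the odd candidates below
    # limit and sum the True results; the +1 accounts for 2.
    return 1 + sum(map(is_prime, range(3, limit, 2)))
```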

R isn’t looking so hot here. The CRAN packages are still worth it, but Python’s relative performance advantage and its growing analytical support are compelling, even if that support is still largely confined to programmers who do stats. There is some hope, though, in the byte code compiler for R: we get a massive performance increase in this case when we compile the functions before using them. Using cmpfun() reduced runtime to 2.408 seconds from the previous 7.058 (loop) and 10.470 (sapply()) seconds. While still much slower than Python or C, this represents a significant performance improvement for R relative to its state just a year ago.

Maybe we won’t have to depend on the incredible packages on CRAN for our comparative advantage forever.