speed of R, C, &tc.

My Paris colleague (and fellow-runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6, if you count scilab and octave as distinct from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, using the EM (Baum-Welch) algorithm. His conclusions are that

matlab is a lot faster than R and python, especially when vectorization is important: this is why the difference is spectacular on filtering/smoothing, but much smaller on the creation of the sample;

octave is a good matlab emulator, as long as no special attention is paid to execution speed…;

scilab appears as a credible, efficient alternative to matlab;

still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.

(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed in your favourite language!
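For readers unfamiliar with the filtering step being compared, here is a minimal NumPy sketch of a normalized HMM forward pass (written for this note, not taken from the posted codes). It illustrates why the loop matters: the recursion over time is irreducibly sequential, so each language pays its per-iteration overhead, while the within-step update is a small vectorizable matrix-vector product.

```python
import numpy as np

def forward_filter(obs, A, pi, lik):
    """Normalized forward pass for a discrete-state HMM.

    obs : (T,) observations
    A   : (K, K) transition matrix, A[i, j] = P(x_t = j | x_{t-1} = i)
    pi  : (K,) initial state distribution
    lik : function mapping one observation to a (K,) likelihood vector
    Returns the (T, K) filtered probabilities and the log-likelihood.
    """
    T, K = len(obs), len(pi)
    alpha = np.empty((T, K))
    p = pi * lik(obs[0])
    c = p.sum()                       # normalizing constant at t = 0
    alpha[0] = p / c
    loglik = np.log(c)
    for t in range(1, T):             # sequential in t: this loop cannot be removed,
        p = (alpha[t-1] @ A) * lik(obs[t])   # but each step is one vectorized update
        c = p.sum()
        alpha[t] = p / c
        loglik += np.log(c)
    return alpha, loglik
```

As a usage example, a two-state model with Gaussian emissions centred at -1 and 1 only needs `lik = lambda y: np.exp(-0.5 * (y - np.array([-1.0, 1.0]))**2)`.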

11 Responses to “speed of R, C, &tc.”

Why has comparing tools designed for entirely different purposes become so popular?

Isn’t it obvious that many scientists prefer to explore data (or a subset of it) using the scripting language they know best, and that, once the research is completed, they can have their scripts recompiled in Fortran (which is sometimes faster than C) or pure C, leaving the full datasets as work for the technicians?

You’ll get more speed from Armadillo if you disable the run-time bounds checks (for instance by defining ARMA_NO_DEBUG before including the header). Armadillo has debugging turned on by default, to catch mistakes in user algorithms. The reasoning is “first get the algorithm right, then optimise”.

I don’t think benchmarking by example is getting us beyond the general rule of thumb (“R is slowest, C is fastest”).
Some applications use algebra intensively, others use nested loops or make use of iterative procedures.
I propose that we develop a multidimensional benchmark scale of synthetic tasks focusing on algebra, iterative calculation, and so on. Otherwise we will not understand where the specific strengths and weaknesses are.
The poor performance of R in the Gibbs example is not due to slow algebra: in fact, swapping the BLAS library for an optimized version has a close-to-zero effect on most MCMC tasks.
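The distinction the comment draws can be made concrete with two deliberately artificial micro-tasks, one BLAS-bound and one interpreter-bound (a hypothetical sketch, not part of the proposed benchmark suite): swapping in a faster BLAS speeds up only the first kind.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n = 300
M = rng.random((n, n))
v = rng.random(n)

def algebra_task():
    # BLAS-bound: two dense matrix products dominate the cost
    return M @ M @ v

def loop_task():
    # interpreter-bound: a scalar recursion no library call can absorb
    s = 0.0
    for _ in range(n * n):
        s = 0.5 * s + 1.0
    return s

def best_of(f, reps=5):
    # best of several runs, to reduce timer noise
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        f()
        best = min(best, time.perf_counter() - t0)
    return best

print(f"algebra-heavy: {best_of(algebra_task):.4f}s, "
      f"loop-heavy: {best_of(loop_task):.4f}s")
```

A benchmark suite built from such axes would report a profile per language rather than a single ranking.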

sure, but how much?
2 times, 10 times, 100 times?
I think it is important for researchers who spend most of their time prototyping new algorithms to have the right order of magnitude in mind. And even for prototypes, one sometimes needs fast computations: is it still worth learning C today, or can we rely on optimized byte-code?

Other questions I wanted to address: can/should researchers use free software instead of matlab? The interest in python (and sage) in the community is growing: should we use it for teaching at university?

I’ve often found Matlab to be much closer to, and sometimes faster than, naive C code (not calling an optimized BLAS). But in those cases my bottlenecks have been much larger matrix operations than 2×2 transition tables. The sequences of dependent cheap operations in this demo would also be annoying to run on a GPU.

Because none of the codes take that long, we’d really only bother to rewrite code if we were doing many runs. If we could put these runs side by side and run them at the same time, it may be possible to put much more of the computation time inside heavily-optimized matrix libraries. We’d need to write functions that work on multiple observation sequences at once.
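A sketch of what such a batched function could look like, in NumPy, under the assumption that all N sequences have equal length (this is my illustration, not part of the posted codes): the time loop survives, but each iteration becomes a single (N, K) by (K, K) matrix product instead of N separate K-vector updates.

```python
import numpy as np

def batched_forward(obs_lik, A, pi):
    """Normalized forward pass run on N equal-length sequences at once.

    obs_lik : (N, T, K) per-observation state likelihoods, one row per sequence
    A       : (K, K) transition matrix
    pi      : (K,) initial state distribution
    Returns (N, T, K) filtered probabilities and (N,) log-likelihoods.
    """
    N, T, K = obs_lik.shape
    alpha = np.empty((N, T, K))
    p = pi * obs_lik[:, 0]                    # (N, K), broadcast over sequences
    c = p.sum(axis=1, keepdims=True)
    alpha[:, 0] = p / c
    loglik = np.log(c[:, 0])
    for t in range(1, T):
        # one matrix product advances all N filters simultaneously
        p = (alpha[:, t-1] @ A) * obs_lik[:, t]
        c = p.sum(axis=1, keepdims=True)
        alpha[:, t] = p / c
        loglik += np.log(c[:, 0])
    return alpha, loglik
```

The larger the batch, the more of the work lands inside the optimized matrix library rather than the interpreter.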

If a “parallel” version could be made to work well, one could also try splitting up the really long observation sequences and pretending they were shorter independent sequences, at least during the early stages of learning.
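The splitting trick itself is a one-liner; here is a hypothetical helper (names are mine) that cuts one long sequence into equal-length chunks to be fed to a batched forward pass during early EM iterations.

```python
import numpy as np

def split_sequence(obs, n_chunks):
    """Cut one long sequence into n_chunks shorter pseudo-independent ones.

    Drops the trailing remainder so all chunks have equal length; the
    dependence broken at each cut point is the approximation being made,
    which is why this is only suggested for the early stages of learning.
    """
    T = len(obs) // n_chunks * n_chunks
    return obs[:T].reshape(n_chunks, -1)
```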

This might go in the direction of using a C++ translation of the Matlab code for the cases where many loops are needed.

Perhaps your colleagues could include another case: simulating the samples in C++ via “the coder”, while leaving the filtering and smoothing in pure Matlab, taking advantage of the vectorization tools.