Comparison of application of Rcpp and rJava in R

Introduction

I implemented a simple algorithm that computes the distances between all pairs of points in a given set of n-dimensional points. The algorithm is implemented in both C++ and Java. To communicate with the C++ code, I use R's Rcpp package. To communicate with Java, I use two methods. The first is a simple approach in which the input and output data are exchanged through temporary disk files and the Java program is called with the system() function. In the second approach, the communication goes through the rJava package.
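For readers who want a concrete picture of the computation, here is a minimal sketch of the all-pairs distance step in plain C++. It is not the code from the packages: the function name pairwise_distances is hypothetical, the points are held in std::vector rather than in R objects, and I am assuming the distance is Euclidean.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the original packages.
// points: n rows, each holding the coordinates of one point.
// Returns an n x n symmetric matrix of Euclidean distances (assumed metric).
std::vector<std::vector<double>> pairwise_distances(
    const std::vector<std::vector<double>>& points) {
    const std::size_t n = points.size();
    std::vector<std::vector<double>> dist(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        // Only the upper triangle is computed; the result is mirrored.
        for (std::size_t j = i + 1; j < n; ++j) {
            assert(points[i].size() == points[j].size());
            double sum = 0.0;
            for (std::size_t k = 0; k < points[i].size(); ++k) {
                const double diff = points[i][k] - points[j][k];
                sum += diff * diff;
            }
            dist[i][j] = dist[j][i] = std::sqrt(sum);
        }
    }
    return dist;
}
```

The work grows quadratically with the number of points, which is why the cost of moving data in and out of R matters most for the smaller inputs.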

Test

To test these methods, I ran the program on sets of 10-dimensional points drawn randomly from a normal distribution. The number of points in a single set varied from 1 to 3000. Each measurement was repeated 5 times and the times were averaged to obtain more reliable results. The results (number of points vs. time taken by the method) are shown in the following plot. The error bars show the standard deviation over the repeated runs.

Comparison of different methods computing the distances between points: java – implementation in Java with the help of rJava, cpp – implementation in C++ with the help of Rcpp, java_file – implementation in Java with the use of temporary disk files.
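The repeat-and-average procedure can be sketched as follows. This is only an illustration in C++ (the actual measurements were driven from R); the names Summary, summarize and time_runs are hypothetical, and I am assuming the sample standard deviation with an n - 1 denominator, which is what R's sd() computes.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: mean and sample standard deviation of timings.
struct Summary {
    double mean;
    double sd;  // sample standard deviation (n - 1 denominator)
};

// Reduces a non-empty list of measurements to mean and standard deviation.
Summary summarize(const std::vector<double>& xs) {
    assert(!xs.empty());
    double total = 0.0;
    for (double x : xs) total += x;
    const double mean = total / static_cast<double>(xs.size());
    double ss = 0.0;
    for (double x : xs) ss += (x - mean) * (x - mean);
    const double sd =
        xs.size() > 1 ? std::sqrt(ss / static_cast<double>(xs.size() - 1)) : 0.0;
    return Summary{mean, sd};
}

// Runs task() `repeats` times and summarizes the wall-clock times in seconds.
template <typename F>
Summary time_runs(F&& task, std::size_t repeats) {
    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (std::size_t r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        task();
        const auto stop = std::chrono::steady_clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return summarize(seconds);
}
```

Calling time_runs with 5 repetitions for each set size would give one point and one error bar per method in a plot like the one above.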

Analysis of the test results

It can be seen that the method that uses C++ is the fastest one. This was expected, since C++ programs are generally more efficient than Java ones, but that's not the whole story. The Rcpp package also makes it possible to write more efficient code for the communication between the native language and R, because in Rcpp we do not copy whole objects when passing them to the native code (as we do in rJava). Rcpp allows and encourages the use of its thin, efficient wrappers around R objects, which avoid unnecessary copying of their contents.
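The difference between wrapping and copying can be illustrated with a small analogy in plain C++. This is not Rcpp's or rJava's API: VectorView, wrap_as_view, wrap_by_copy and sum_view are hypothetical names, and std::vector stands in for memory owned by R. The sketch only shows why a thin wrapper costs a constant amount of work while a copy costs time and memory proportional to the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning wrapper: stores a pointer and a length only,
// so creating it is O(1) regardless of how large the data is.
struct VectorView {
    const double* data;
    std::size_t size;
};

// Thin wrapper: refers to the caller's buffer without duplicating it.
VectorView wrap_as_view(const std::vector<double>& src) {
    return VectorView{src.data(), src.size()};
}

// Copying hand-off: duplicates every element, O(n) time and memory.
std::vector<double> wrap_by_copy(const std::vector<double>& src) {
    return std::vector<double>(src.begin(), src.end());
}

// Reads through the view; no elements are copied.
double sum_view(const VectorView& v) {
    assert(v.data != nullptr || v.size == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < v.size; ++i) total += v.data[i];
    return total;
}
```

A view stays valid only while the underlying buffer is alive and not reallocated, which is the usual price of avoiding the copy.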

On the other hand, the Java code is more readable for a layman, since it doesn’t use any nonstandard packages to implement the functionality. This is not the case with the C++ version where we use the Rcpp.h header to access R-specific constructs.

What was unexpected in this comparison is that the method based on temporary files is faster than the rJava method for smaller data sets. This is probably due to expensive but hidden bookkeeping inside the rJava package.

Source code

The source code of these methods, along with a makefile that builds the appropriate packages and installs them in the system and the R code that generates the plot, is available here:

The program was tested with R 2.12.1 on an Ubuntu 11.04 system. The code is written in such a way that it should be fairly easy to use as a template for the R connection code when implementing other algorithms in Java or C++.
