
June 22, 2012

The latest update to open-source R, R 2.15.1, was released this morning. (You can grab sources now, and binary versions will hit the CRAN mirrors over the next couple of days.) In addition to several new features and bug fixes (including the new globalVariables function, which will be a boon to package developers), this update also includes some significant performance improvements inspired by the dataframe package.
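For package developers, here is a minimal sketch of how globalVariables is typically used. The column names price and qty and the function total_cost are hypothetical, chosen only for illustration. The call declares names that the code-checking tools would otherwise report as undefined global variables:

```r
# Declare names that R CMD check would otherwise flag as
# "no visible binding for global variable".
# The version guard keeps the file loadable on R older than 2.15.1.
if (getRversion() >= "2.15.1") {
  utils::globalVariables(c("price", "qty"))  # hypothetical column names
}

# Hypothetical package function: price and qty are columns of df,
# looked up by with(), not real global variables.
total_cost <- function(df) {
  with(df, sum(price * qty))
}
```

In a real package the globalVariables call would normally sit at the top level of one of the files in the R/ directory.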

At the useR! 2012 conference last week, Google's Tim Hesterberg introduced the dataframe package (available now on CRAN), which has been in use for the last three years amongst Google's 500+ R users. (You can download Tim's PDF slides here.) The package makes no functional changes to R; instead, it improves the implementation of data frames to reduce the number of temporary copies made of data. Tim reported that using the dataframe package with R 2.15.0 improved performance by 21% for creation and column subscripting, and by 14% for row subscripting.
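If you want a rough feel for these three operations on your own machine, a sketch along the following lines may help. This is not Tim's benchmark: the data size, the repetition count and the variable names are all arbitrary assumptions, and the timings will vary with hardware and R version.

```r
# Rough micro-benchmark sketch; n and reps are arbitrary choices.
n <- 1e5
reps <- 20

# Data frame creation, repeated reps times.
t_create <- system.time(
  for (i in seq_len(reps)) df <- data.frame(a = seq_len(n), b = rnorm(n))
)[["elapsed"]]

# Column subscripting: read a column and assign it back.
t_col <- system.time(
  for (i in seq_len(reps)) df$b <- df$b + 1
)[["elapsed"]]

# Row subscripting: extract every other row.
rows <- seq(1, n, by = 2)
t_row <- system.time(
  for (i in seq_len(reps)) sub <- df[rows, ]
)[["elapsed"]]

# Elapsed seconds for each operation.
timings <- c(create = t_create, column = t_col, row = t_row)
print(timings)
```

Running the same script under R 2.15.0 and R 2.15.1 would be one way to compare the two releases.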

Tim mentioned during the talk that R-core member Luke Tierney was in the process of incorporating performance improvements from the dataframe package into base R, and indeed several such improvements are noted in the NEWS file. All of them reduce the number of times R makes temporary internal copies of data, which improves both the speed and the memory usage of R. And because these are low-level changes at R's core, they will affect many R functions, not just those related to data frames.
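To see where R copies an object, you can use base R's tracemem function, as in the sketch below. It assumes your R build has memory profiling enabled, which the code checks with capabilities("profmem"). The data frame x is a made-up example, and the number of copies reported depends on the R version.

```r
# A small, made-up data frame to experiment with.
x <- data.frame(a = 1:5, b = letters[1:5])

# tracemem is only available when R was built with memory profiling.
tracing <- capabilities("profmem")
if (tracing) tracemem(x)

# Modify one element; with tracing on, R prints a line
# each time it duplicates the traced object.
x$a[2] <- 10L

if (tracing) untracemem(x)
```

Fewer lines of tracemem output for the same assignment means fewer temporary copies.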

If you've built R 2.15.1 already, have you noticed performance improvements? Let us know in the comments.

Comments


David, that is great news and you explained it so clearly to us.

I think the continuous improvement of R performance is critical to R's future. I have some code which uses a k-d tree and recursion to do kernel density estimation. It runs slowly, and porting it to C or C++ would probably make it orders of magnitude faster. But after 15 years of programming in R, I have lost the ability to program in anything else. I am holding out hope that R's speed will continue to improve. Indeed, Luke's byte-code compiler alone makes it almost twice as fast.