R Resources

Using C in R

I looked for a long time for a small, self-contained example of code that maintains a persistent structure across R invocations. My primary goal is to handle data sets that far exceed the memory on my computer but that allow me to read and process observations sequentially. (Examples are moments and regressions.)

Eventually, I pieced together a good example that implements AS75 (Applied Statistics algorithm 75, weighted least-squares regression, by W. M. Gentleman). It works, although it may have bugs.

If interested, look at the two files in AS75-in-C. To use them, make sure you have gcc installed, then compile and run them.

The code is reasonably short and reasonably documented, so it should not be difficult to figure out what it does.

However, if you want to use it for its regression aspect, you should realize that Thomas Lumley's biglm on CRAN does this, too, and probably a lot better.

June 2013

R Benchmarks

Common CPU benchmarks at sites such as anandtech or cnet are often frustrating for R users. R users care less about frame rates in games or single-precision floating-point speed. They care more about the double-precision speed that is accessible to the standard R implementation. I was interested in what is and is not fast for R purposes.

This site provides a (linux) perl script that runs Simon Urbanek's R-benchmark 25, records some basic information about the computer and installation on which the benchmark was run, and sends it off to my (public) website.

None of the transmitted information is sensitive. The information is publicly accessible by anyone over the web. (For now, I just use it to display the below table nicely.) I hope this will be a collaborative site that will contain many different timings. There should be almost no effort involved in contributing:

Open a terminal and run "perl R-benchmark-client.pl". It will prompt you before sending off the results.

You may want to be paranoid and examine my script before you run it. It is short and sweet. Count on the script to take about 1-5 minutes on 2012 Intel hardware. Please be forgiving if there are bugs in my code; this is an early version of the script. If the script bombs, please email me with some basic information about where and how.

Current Results — Sortable

For a pure hardware table, i.e., without atlas results, click here. The number next to the atlas version is the speedup relative to the ordinary unoptimized blas library.

* Note: Jeroen mentioned that the AMD math library (ACML) may have NaN issues, as documented in the R manuals: "R relies on ISO/IEC 60559 compliance of an external BLAS. This can be broken if for example the code assumes that terms with a zero factor are always zero and do not need to be computed, whereas x*0 can be NaN." ACML does not propagate NaN according to R's expectations; OpenBLAS, another BLAS library, does. Note that MKL is Intel's own optimized BLAS implementation for its processors, comparable to Atlas. Interestingly, the Intel MKL also performs better than the ACML.

The "total" and "avg" columns are the total time and the average time on the benchmarks. The average is not arithmetic, but trimmed. For more detail, refer to Simon's benchmark...or just run the benchmark for yourself on your own computer. Of course, Simon's benchmarks are not representative of a lot of other tasks. They are good representations only of typical R calculations. For other R benchmarks, see
Revolution R Benchmarks. Of course, Revolution R does not support linux, nor do they benchmark different processors or setups.

Some obvious observations from the table:

The biggest difference is not between processors, but between installations that have added a better BLAS library and the standard stupid linux distribution library (yikes!). I have sent an email to the "debian scientific computing team" to beg them to change the default library. The current situation is especially bad, because most R novices on linux won't realize the problem and the easy cure. Under ubuntu linux, it requires an "apt-get install libatlas3-base" and, voilà, the computer becomes three times faster. The experts know how to fix this anyhow.

Vendor-specific math libraries (Intel MKL, AMD ACML) improved performance by another 25% on some processors: this was the case for the Haswell 4770K, but not for the Xeon E3.

Native recompilation can make a difference, but it is modest in comparison.

It appears as if the leap from the i7-950 to the i7-2600K was larger than the leap from the i7-2600K to the Haswell i7-4770K, even though these are not perfect comparisons: the ubuntu installations and Atlas libraries differ. However, if you look at the non-Atlas benchmarks, it is remarkable that the Haswell i7-4770K (June 2013) shows only a 19% improvement over the i7-950 (June 2009), of which 14% is due to the 4770K's higher clock speed.

The AMD 8350 Vishera is not bad. It is perhaps one generation behind Intel, comparable to the i7-3* series in absolute performance, although it achieves this through higher clock frequency and more cores than the i7.

MKL libraries can produce a kick up, but for the most part, higher frequencies and more cores translate almost directly into performance. The year-to-year architecture changes have been very modest.

All submitted results go into the results directory. If you run the script on a novel setup, please email me a note, too, to alert me. Make sure to tell me which BLAS/LAPACK library you are using; it may be in the submitted info, but it is hard to tease out. If you do, I will try to include your results in this table. Also, if your results are very different from what this table claims they should be, please send me an email and let me know why you think this might be. In any email, please speculate about the reason and explain a little more about your setup. If you do not want your name mentioned with your results, please say so in your email (I will need your IP address and time of submission to locate them). If there is something unusual in your installation that I should note, again please let me know.

I am particularly interested in some of the more exotic CPU examples that I don't have here: things like the Intel Phi, ARMs, snow clusters (if it makes a difference), overclocked CPUs, older CPUs, Kaveri CPUs, etc.; or Intel MKL-compiled versions; or different BLAS/LAPACK libraries; or values that are just very different from those already in the table. (The point of this table is not to showcase changes in coding, but to see how plain R programs perform, unchanged, on different platforms. Please stick to the unmodified Urbanek R-benchmark.) If you can increase performance on this test by a significant factor through odd hardware or simple recompiles (e.g., through GPU recoding), please email me with some more explanation, too.

Does anyone have an Intel Phi processor board to try this out?

Does anyone have an AMD Kaveri?

More Details

The client script collects the following pieces of information and sends them back to the server (for now, R.ivo-welch.info):

Your remote IP address as sent to the server (every website you visit has your IP).

Again, none of this should be security-sensitive information. The information is stored in the results/ directory, where everyone can read it and process it. If I get a couple of hundred submissions, I will write a nicer display screen that collects the information.

The server collector with which the client communicates is a simple perl script, fbhole, which you can think of as a "file blackhole server": you can send text to it, but it does not do anything else. It has simple configuration and logging, which minimizes the probability of a security breach. It's a nice, simple, unsophisticated, and easy complement to any web server.