Sunday, May 19, 2013

Proof of Concept for a ClojureCLR-to-R Bridge Library

Based on published experience with porting Clojure libraries from the JVM to the CLR, namely Rob Rowe's blog, I figured I might be successful with a project of my own, so I decided to give it a try.

1. Motivation for a Clojure-to-R Bridge

Joel Boehland has already developed a ClojureJVM-to-R bridge, which he published as Rincanter on GitHub. Although Rincanter does not seem to be widely used, I like the concept for the following reasons:

R has become the lingua franca of academic statistics today. It dominates published source code, especially in those cases where the data is also publicly available.

A Clojure-to-R bridge is a very useful tool for supporting native Clojure implementations of statistical procedures, because it allows regression tests against R to be set up in a natural way.

I like having the option to use a remote R server for number crunching while controlling it from my laptop in the language of my choice.

2. Using the R server mode

"Rserve is a TCP/IP server which allows other programs to use facilities of R (see www.r-project.org) from various languages without the need to initialize R or link against R library. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++, PHP and Java. Rserve supports remote connection, authentication and file transfer. Typical use is to integrate R backend for computation of statistical models, plots etc. in other applications."

I installed the Rserve package on a virtual machine (Oracle VirtualBox) running Ubuntu 12.04 with R 3.0.0 on a Windows 8 host machine. To make the R server accept connections from the Windows host, I configured the following in /etc/Rserv.conf on the Ubuntu server:
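A minimal sketch of such a configuration, assuming that enabling remote connections is all that is needed. `remote enable` is a documented Rserve directive, but the exact settings used on this machine may have differed:

```
remote enable
```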

Note that this is just an ad hoc configuration without any security precautions.

While Rserve provides a mature Java client that Rincanter could build on, it does not ship with a C# client out of the box. The main functions of an R client are:

Manage the TCP/IP connection with the R server using Rserve's own binary protocol, named QAP1 (optional password authentication can use DES-based Unix crypt).

Manage the desired R code as R expressions (called Sexp internally).

Manage the data flow to and from the R server, which includes the proper handling of the necessary data type conversions.
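To make the first of these tasks more concrete: according to the Rserve protocol documentation, the server opens every connection by sending a 32-byte ID string made of eight 4-character fields, beginning with the signature `Rsrv`, a version such as `0103`, and the protocol name `QAP1`. The following is a minimal sketch in portable Clojure, assuming the 32 bytes have already been read from the socket and decoded as ASCII. The names `parse-id-string`, `qap1-server?` and `auth-attributes` are hypothetical and are not part of RserveCLI2 or of the library described here, and the authentication markers should be checked against the documentation of the server version in use.

```clojure
(ns rserve-id-string-sketch)

;; Authentication markers as described in the Rserve protocol documentation.
;; Assumption: verify this table against the server version you run.
(def auth-attributes
  {"ARpt" :plain-text
   "ARuc" :unix-crypt})

(defn parse-id-string
  "Splits the 32-character ID string that Rserve sends after connecting
  into its eight 4-character fields."
  [id]
  (when-not (= 32 (count id))
    (throw (ex-info "Rserve ID string must be 32 characters long"
                    {:length (count id)})))
  (let [[signature version protocol & attributes]
        (map #(apply str %) (partition 4 id))]
    {:signature  signature
     :version    version
     :protocol   protocol
     ;; nil when the server does not announce password authentication
     :auth       (some auth-attributes attributes)
     :attributes (vec attributes)}))

(defn qap1-server?
  "True if the parsed ID string announces an Rserve server speaking QAP1."
  [{:keys [signature protocol]}]
  (and (= "Rsrv" signature) (= "QAP1" protocol)))

;; Illustrative ID string of a server that requires no authentication.
(def example-id
  (parse-id-string "Rsrv0103QAP1\r\n\r\n--------------\r\n"))
```

A client would call `qap1-server?` on the parsed map before sending any command and refuse to continue otherwise. The sketch uses no host interop, so it should behave the same on ClojureCLR and ClojureJVM.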

While it seems to be straightforward to implement a native client in Clojure (both CLR and JVM), I chose to use the C# client RserveCLI2 that Suraj Gupta published on GitHub. It is forked from an implementation by Oliver M. Haynold, which is hosted on CodePlex.

4. The ClojureCLR Implementation

I published my ClojureCLR implementation as a proof of concept on GitHub. While the full round trip of sending data and R commands to the server and retrieving selected results and graphics (PDF files) back to the client is already provided, the following work remains:

Clean up the API

Encapsulate the R-expressions in the library (use the status map *rserve-connection* for this)

Provide comprehensive tests.

Nevertheless, the current state already serves my needs. If I need more sophistication, I will consider a fully native implementation.

To give an idea of how the Clojure-to-R bridge is used, I reproduce here the test code of the Clojure client: