Docker

At Control F1, we’re always evaluating the latest technologies to see if and how they’ll fit with our clients’ needs. One of our core strengths is in .NET development, so we’ve recently been looking at the newly released Visual Studio 2017 and .NET Core 1.1, and combining these with our ongoing use of Docker to create microservices. We like all our projects to have continuous integration to ensure a consistent and repeatable build process – in our case, we use a TeamCity instance running in AWS for this. However, actually getting everything to build in TeamCity wasn’t quite as easy as we would have hoped due to a few minor niggles, so I’ve put together this blog post to capture everything that we needed to do.

Control F1 Lead Architect Phil Kendall gives some advice on performing R calculations in microservices.

Back in January this year, Control F1 started work as the lead member of the i-Motors consortium, a UK Government and industry funded* project working towards viable, commercially sustainable Smart Mobility applications for connected and autonomous vehicles. One of the key elements we will be delivering as part of the project is the capability to add predictive and contextual intelligence to connected vehicles, allowing all individual drivers, fleet managers and infrastructure providers to make better decisions about transport in the UK. At a coding level, this means we need to get some data science / machine learning / AI code written and deployed. This post gives a quick run through of the technology choices we made, why we made them and how we implemented it all.

Why R?

There are effectively two choices for doing “small scale” (i.e. fits into the memory on one machine) data science: R and Python (with scikit-learn). It just so happens that I’m much more an R guy than a Python guy, and the algorithms we wanted to deploy here were written in R.

Why Docker?

For i-Motors, we’ve gone down the microservices route for a lot of the common reasons, including the ability to independently improve the various components of our system without needing to do high risk “Big Bang” deployments where we have to change every critical part of the system at once. There are obviously alternatives to Docker for running microservices – while this post is Docker-specific, it shouldn’t be too hard to adapt what’s here to another container platform.

Why spray-can?

This is where it gets a bit more complicated! Leaving aside Docker for Windows Server 2016, which is still very much out on the cutting edge, running Docker means running Linux. At Control F1 we’re mostly a .NET house on the server side, so a number of the i-Motors components have been written in .NET Core and very happily deploy themselves on Docker. However, the .NET to R bridge hasn’t yet been ported to .NET Core, so there’s no simple way for a .NET Core application to talk to R at the moment. I investigated a couple of other options for bridging to R, including using node.js and the rstats package. Unfortunately, the official release of rstats doesn’t work with the latest versions of node, and while there are forks out there which fix the issue, basing a long-term project on a package without official releases didn’t seem like the wisest solution. The one option which did present itself was JRI, the Java/R Interface, which I’d made some use of before when running on the JVM.

When it comes to JVM languages, I’m a big fan of Scala and the spray.io toolkit – again, the solution here isn’t particularly tied to Scala and spray.io and should be relatively easy to adapt to any other JVM language and/or web API framework.

Implementation

All the code for this blog post is available from Bitbucket. I’ll give a brief overview of the code here.

Startup

The web API is set up in RSprayCanDockerApp and RSprayCanDockerActor. This is pretty much a straight copy of the spray-can “Getting Started” app, with the notable exception that we bind the listener to 0.0.0.0 rather than localhost – this is important as the requests will be coming from an unknown source when deployed in Docker.
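To make the binding point concrete, here is a minimal sketch. It deliberately swaps spray-can for the JDK’s built-in com.sun.net.httpserver.HttpServer so that it runs with no extra dependencies; the WildcardBind object and the /ping path are hypothetical names used only for illustration and are not part of the project’s code.

```scala
import java.net.InetSocketAddress
import com.sun.net.httpserver.HttpServer

// Hypothetical stand-in for the spray-can listener, using the JDK's HTTP server.
object WildcardBind {
  // Binding to 0.0.0.0 accepts connections on every network interface,
  // which is what a container needs when requests arrive from outside it.
  // Passing port 0 asks the operating system for any free port.
  def start(port: Int): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress("0.0.0.0", port), 0)
    server.createContext("/ping", exchange => {
      val body = "pong".getBytes("UTF-8")
      exchange.sendResponseHeaders(200, body.length.toLong)
      exchange.getResponseBody.write(body)
      exchange.close()
    })
    server.start()
    server
  }
}
```

Had the address been localhost instead, the server would only be reachable from inside the container itself, and a published port would appear to accept connections that never get answered.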

R integration

The guts of the R integration are in the SynchronizedRengine class and its associated companion object. There are two non-trivial bits of behaviour here:

The guts of R are inherently a singleton object – there is one and only one underlying R engine per JVM. SynchronizedRengine.performCalculation() has a simple lock around the call into the R engine so that we have one and only one thread accessing the R engine.

The error handling is “a bit quirky”. If the R engine encounters an error, it calls the rWriteConsole() function in the RMainLoopCallbacks interface. The natural thing to do here would be to throw an exception, but unfortunately the native code between the Rengine.eval() call and the callback silently swallows the exception, so we can’t do that; instead we stash the exception away in a variable. If the evaluation failed (indicated by it returning null), we then retrieve the stashed away exception. In Scala, we wrap this into a Try object, but in a less functional language you could just re-throw the exception at this point.
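The two behaviours above can be sketched roughly as follows. This is not the code from the repository and it does not use JRI: Engine, RError, SynchronizedEngine and FakeEngine are hypothetical stand-ins, with the assumption (taken from the description above) that the engine signals failure by returning null and reports the error text through a callback.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the native engine: eval returns null on failure
// and reports the error text through the callback it was constructed with.
trait Engine {
  def eval(expression: String): String
}

final class RError(message: String) extends RuntimeException(message)

// Hypothetical sketch of the pattern; not the real SynchronizedRengine.
final class SynchronizedEngine(makeEngine: (String => Unit) => Engine) {
  private val lock = new Object
  // Written by the error callback, read once eval has returned.
  private var stashed: Option[Throwable] = None
  // The callback cannot usefully throw, so it stashes the error instead.
  private val engine: Engine =
    makeEngine { message => stashed = Some(new RError(message)) }

  // One thread at a time may call into the engine.
  def performCalculation(expression: String): Try[String] = lock.synchronized {
    stashed = None
    val result = engine.eval(expression)
    if (result != null) Success(result)
    else Failure(stashed.getOrElse(new RError("evaluation failed without a message")))
  }
}

// Fake engine for the demo: evaluates "a+b" and fails on anything else.
object FakeEngine {
  def apply(onError: String => Unit): Engine = new Engine {
    def eval(expression: String): String = {
      val sum = expression.split('+').toList match {
        case List(a, b) => Try(a.trim.toDouble + b.trim.toDouble).toOption
        case _          => None
      }
      sum match {
        case Some(value) => value.toString
        case None =>
          onError(s"cannot evaluate '$expression'")
          null
      }
    }
  }
}
```

With this in place, new SynchronizedEngine(onError => FakeEngine(onError)).performCalculation("oops") comes back as a Failure carrying the stashed error rather than a bare null, which is the shape the caller wants to pattern match on.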

Docker integration

The Docker integration is done via SBT Native Packager and is pretty vanilla; three things to note:

The Docker image is based on our “OpenJRE with R” image – this is the standard OpenJDK image but with R version 3.3 installed, and the JRI bridge library installed in /opt/lib. The minimal source for this image is also on Bitbucket.

We pass the relevant option to the JVM so that it can find the JRI bridge library: -Djava.library.path=/opt/lib

We set the appropriate environment variable so that the JRI bridge library can find R itself: R_HOME=/usr/lib/R
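For readers who haven’t used SBT Native Packager before, the three points above might look something like this in build.sbt. Treat it as a sketch under assumptions rather than the project’s actual build file: the base image name is a hypothetical placeholder, the exposed port is taken from the docker command used later in this post, and the slash syntax assumes sbt 1.x.

```scala
// build.sbt fragment (sketch only; the real build file is in the repository)
import com.typesafe.sbt.packager.docker.Cmd

enablePlugins(JavaAppPackaging, DockerPlugin)

// Hypothetical image name standing in for the "OpenJRE with R" base image.
dockerBaseImage := "example/openjre-with-r"

// Assumed port, matching the 8080 used when running the container.
dockerExposedPorts := Seq(8080)

// Tell the JVM where to find the JRI bridge library.
Universal / javaOptions += "-Djava.library.path=/opt/lib"

// Tell the JRI bridge library where to find R itself.
dockerCommands += Cmd("ENV", "R_HOME", "/usr/lib/R")
```

Running sbt docker:publishLocal with settings along these lines should produce a local image with both the JVM option and the environment variable baked in.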

If you just want a play with the finished Docker container, it’s available from Docker Hub; just run it up as “docker run -p 8080:8080 controlf1/r-spraycan-docker”.

Putting it all together

For this demo, the actual maths I’m getting R to do is very simple: just adding two numbers. Obviously, we don’t need R to do that but in the real world you should be able to substitute your own algorithms easily – we’ve already deployed four separate machine learning algorithms into i-Motors based on this pattern. But as demos are always good:

$ curl http://localhost:8080/add/1.2/3.4

4.6

Where next?

What we’ll be working on in the near future is investigating how this solution scales with the load on the system – a single instance of the microservice will obviously be limited by the single-threaded nature of R, but we should be able to bring up multiple instances of the microservice (“scale out” rather than “scale up”) to handle the level of requests we expect i-Motors to produce. I’m not foreseeing any problems with this approach, but we’ll certainly be keeping an eye on the performance numbers of our “intelligence services” as we increase the number of vehicles in the system.

* i-Motors is jointly funded by government and industry. The government’s £100m Intelligent Mobility fund is administered by the Centre for Connected and Autonomous Vehicles (CCAV) and delivered by the UK’s innovation agency, Innovate UK.