As I wrote in a previous blog post,
there are good reasons to be paranoid with Ruby gems:
they may have been hacked and “enhanced” with malicious code.
It would be great if we could check every gem that we want to install,
including their dependencies.
You may think “this is not practical at all”, and you are probably right.
But still, I wanted to give this idea a try and learn about the challenges
that people will face if they want to review their gems before installation.

Let’s consider a company whose business is about making web applications.
The tech team is divided in two:

a development team that writes the company software, leveraging Ruby gems

a security team that focuses on security issues

The security team is in charge of reviewing all the gems needed to run the company applications.
This policy could bring a lot of tension between the two teams,
so I hope that members of both teams enjoy having coffee breaks together.

A check point

The security team wants to ensure that all the gem dependencies used by the company software are safe.
So they set up a kind of check point:
a process in which all gems needed by the development team will be reviewed and the unsafe ones filtered out.

Since the company does not trust the rubygems.org source anymore,
the development team is not allowed to download gems directly from there.
If they do so, they have to use a sandbox environment, which could be a self-contained virtual machine.

Once they have been successfully reviewed,
the gems will be available for both development and production environments.
Over time, the development team will have access to more and more safe gems to work with.

The workflow to validate a new gem could look like this:

development team thinks they need new gems that have not been reviewed yet

the sandbox environment is used to experiment with those gems

the exact names and versions of each gem to review are identified

the gems and all their dependencies are reviewed by the security team

if deemed safe, the gems are made available on the internal gem server

In other words, the security team acts as a middleman for rubygems.org.
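To make the check point idea concrete, here is a minimal sketch of how a bundle could be compared against the list of reviewed gems. Everything in it is an assumption made for illustration: the `APPROVED` allowlist, the helper names, the sample lockfile, and every version number other than sinatra 1.4.3. The lockfile parsing is deliberately naive and only looks at indentation.

```ruby
require "set"

# Hypothetical allowlist kept by the security team, as "name version" strings.
# Apart from sinatra 1.4.3, the versions are illustrative.
APPROVED = Set[
  "sinatra 1.4.3",
  "rack 1.5.2",
  "rack-protection 1.5.0",
].freeze

# Extract "name version" strings from the text of a Gemfile.lock.
# Simplification: resolved gems are the lines indented by exactly four
# spaces; their dependencies are indented further and are skipped.
def locked_gems(lockfile_text)
  lockfile_text.scan(/^ {4}(\S+) \(([^)\s]+)\)$/).map do |name, version|
    "#{name} #{version}"
  end
end

# Gems present in the lockfile but absent from the allowlist.
def unreviewed_gems(lockfile_text, approved = APPROVED)
  locked_gems(lockfile_text).reject { |gem| approved.include?(gem) }
end

# Illustrative lockfile content, not taken from a real project.
SAMPLE_LOCK = <<~LOCK
  GEM
    remote: http://checkpoint:8088/
    specs:
      rack (1.5.2)
      rack-protection (1.5.0)
        rack
      sinatra (1.4.3)
        rack (~> 1.4)
        rack-protection (~> 1.4)
        tilt (~> 1.3, >= 1.3.4)
      tilt (1.4.1)
LOCK

puts unreviewed_gems(SAMPLE_LOCK)
```

With this sample data, only the tilt entry is reported, because it is the one locked gem missing from the allowlist.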

The development team should be able to do its job the normal way,
using bundler and the gem command-line tool with minimal annoyances.

The built-in gem server

To serve Ruby gems,
the security team first considers using the
gem server command.
It comes with rubygems itself so there’s nothing special to install.

The gem server command serves all the gems installed on the machine;
so it’s very easy to add a new gem and its dependencies with gem install.

The gem server runs on webrick and can only serve one client at a time.
It does no caching and is not compatible with Rack.
So it will not run under more powerful application servers such as puma
or unicorn.
Fortunately, most rubygems clients cache the data on their side.

Let’s grab a debian-compatible server and try to run the gem server under its own user account:

All these settings are stored in the .gemrc configuration file,
making it very easy to share them with team mates.
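As an illustration of the kind of setting involved, a `.gemrc` that lists the check point as the only gem source could be generated as sketched below. The `:sources:` key is the standard rubygems setting; the URL is the check point address used in the Gemfile later in this post, and the helper name `checkpoint_gemrc` is hypothetical.

```ruby
require "yaml"

# Build the text of a .gemrc whose only gem source is the check point.
# Assumption: the check point listens at http://checkpoint:8088.
def checkpoint_gemrc(source_url = "http://checkpoint:8088")
  { sources: [source_url] }.to_yaml
end

# Print the YAML; redirect it to ~/.gemrc on a fresh account to use it.
puts checkpoint_gemrc
```

Rubygems expects the top-level keys of this file to be Ruby symbols, which is why the hash key is `:sources` rather than a string.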

Everything seems OK.
Remember that most rubygems clients maintain a cache of downloaded gems,
and this cache may already contain plenty of code that we do not trust.
So it’s best to move to a new setup.

Add trusted gems

We are now done with the setup.
Now, the development team wants to build a new web application using sinatra version 1.4.3.
The security team should fetch, unpack and review the gem.
If deemed safe, the gem can then be installed and shared using gem server.

But something is wrong:
gem install will install sinatra along with its dependencies,
yet we have checked none of those!

The standard gem fetch command does not fetch the dependencies,
so I have written a small rubygems plugin called
rubygems-deep_fetch.
gem deep_fetch will ignore the dependencies that are already in your cache.

Equipped with gem deep_fetch, the security team goes back to hard work.
They can ignore the packages that are already in the cache
as they have already been checked.
They just want to fetch and review the missing ones.
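The idea behind a deep fetch can be sketched as a walk over the dependency graph that skips whatever has already been reviewed. This is not the code of rubygems-deep_fetch: the `DEPENDENCIES` map and the `gems_to_review` helper are hypothetical, version constraints are ignored, and the sketch assumes that a reviewed gem had its own dependencies reviewed at the same time.

```ruby
require "set"

# Hypothetical dependency map: gem name => names of its runtime dependencies.
# In practice this information comes from each gem's specification.
DEPENDENCIES = {
  "sinatra"         => ["rack", "rack-protection", "tilt"],
  "rack-protection" => ["rack"],
  "rack"            => [],
  "tilt"            => [],
}.freeze

# Return the gems that still need a review: the requested gem plus its
# transitive dependencies, minus the ones that were already reviewed.
def gems_to_review(root, reviewed, deps = DEPENDENCIES)
  pending = [root]
  found = Set.new
  until pending.empty?
    name = pending.shift
    # Skip gems that are already trusted or already queued for review.
    next if reviewed.include?(name) || found.include?(name)

    found << name
    pending.concat(deps.fetch(name, []))
  end
  found.to_a
end

# rack was reviewed earlier, so only the three other gems are listed.
puts gems_to_review("sinatra", Set["rack"])
```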

Fortunately, rack, sinatra and tilt are all small gems,
so the security team was able to review all of them within a reasonable time.
That would be different for complex gems like rails, obviously.

Playing with bundler

So far, we have only played with the rubygems client;
however, the development team is more likely to use bundler instead.
They have updated the Gemfile for this new sinatra-based web application they are working on:

source "http://checkpoint:8088"
gem "sinatra"

Bundler is very good at caching,
so to avoid cache effects every developer was asked to clean up his/her .bundler directory beforehand.

$ bundle config path ~/.bundler
Settings for `path` in order of priority. The top value will be used
Set for the current user (/home/fabien/.bundle/config): "/home/fabien/.bundler"
$ rm -rf /home/fabien/.bundler

Bundler first tries to query the dependency API but it is unsuccessful
since the feature is not available in the standard gem server.
As a consequence, bundler falls back to retrieving the full index to resolve the gem dependencies on the client side.

By the way, the Rubygems 2 client also knows about this new dependency API,
but Rubygems 1.8 does not.
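The fallback described above can be pictured with the following sketch. It is a simplification, not bundler's actual code: `fetch` is a hypothetical callable standing in for an HTTP client, and both fake servers are made up for the example. The two paths are the ones used by rubygems servers: `/api/v1/dependencies` for the dependency API, which answers with Marshal data, and `/specs.4.8.gz` for the full index.

```ruby
# `fetch` is a hypothetical callable: given a path, it returns the response
# body as a String, or nil when the server answers 404.
def resolve_source(gem_names, fetch)
  body = fetch.call("/api/v1/dependencies?gems=#{gem_names.join(',')}")
  if body
    # The dependency API answers with a Marshal-dumped array of hashes.
    # Only unmarshal data that comes from a server you trust.
    [:dependency_api, Marshal.load(body)]
  else
    # No dependency API: the client has to download the full index instead.
    [:full_index, fetch.call("/specs.4.8.gz")]
  end
end

# Fake server that implements the dependency API (illustrative payload).
API_SERVER = lambda do |path|
  if path.start_with?("/api/v1/dependencies")
    Marshal.dump([{ name: "sinatra", number: "1.4.3", platform: "ruby",
                    dependencies: [["rack", "~> 1.4"]] }])
  else
    "full index bytes"
  end
end

# Fake server without the dependency API, like the built-in gem server.
PLAIN_SERVER = lambda do |path|
  path.start_with?("/api/v1/dependencies") ? nil : "full index bytes"
end

puts resolve_source(["sinatra"], API_SERVER).first
puts resolve_source(["sinatra"], PLAIN_SERVER).first
```

The first call is answered by the dependency API with a small payload for the requested gems only; the second has to fall back to the whole index, which grows with the number of gems on the server.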

We also notice that the server was not queried on our second bundle run.
That means that bundler is smart enough to cache the dependency resolution.
No network connection is required when nothing has changed in the bundle. Very nice.

The gem server can work with bundler, but it will quickly hit its limits as the security team adds more gems to the trusted gems database.
Do you remember how slow bundler felt before version 1.1? You got it.

Better gem serving with geminabox

geminabox makes it very easy to serve your own gems.
It can be installed as a gem and has two main features:

a sinatra-based web application to host your gems

a plugin to add a new command to the gem tool

Once geminabox is installed, Rubygems is enhanced with a new gem inabox command.
It expects *.gem arguments and behaves like the gem push command
(that publishes to the official rubygems.org repository).

The geminabox gem server is more efficient than gem server
because it implements the dependency API.
It is also compatible with Rack so it’s possible to run it using a modern web server
(which can serve a lot faster than webrick).
Rack compatibility makes it very easy to add SSL protection and HTTP authentication using middleware.
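To show what such a middleware looks like, here is a dependency-free sketch of HTTP Basic authentication in front of a Rack application. Rack itself provides `Rack::Auth::Basic` for this purpose, so the `BasicAuthGate` class below is only an illustration of the mechanism; its name, the credentials and the realm are all assumptions.

```ruby
# Minimal Rack-style middleware: let a request through only when it carries
# the expected HTTP Basic credentials, otherwise answer 401.
class BasicAuthGate
  def initialize(app, username, password)
    @app = app
    @expected = [username, password]
  end

  def call(env)
    # Plain comparison keeps the sketch short; it is not constant-time.
    return @app.call(env) if credentials(env) == @expected

    [401,
     { "content-type" => "text/plain",
       "www-authenticate" => 'Basic realm="gems"' },
     ["Unauthorized\n"]]
  end

  private

  # Decode "Basic <base64 of user:password>" from the Authorization header.
  def credentials(env)
    scheme, encoded = env.fetch("HTTP_AUTHORIZATION", "").split(" ", 2)
    return nil unless scheme&.casecmp?("basic") && encoded

    encoded.unpack1("m").split(":", 2)
  end
end

# In a config.ru this could be wired in front of geminabox as:
#   use BasicAuthGate, "security", "s3cret"
#   run Geminabox::Server
```

Because the gate only relies on the Rack calling convention, it can be tried out with any callable that returns a status, headers and body triple.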

The security team is running the geminabox server under a dedicated user, using puma as application server:

Using geminabox rather than the standard gem server won’t break anything on the client side; however, it may feel faster with bundler and rubygems 2 clients
thanks to its support of the bundler dependency API.

The performance gain is noticeable, even if sometimes difficult to measure on small gem sets.
Starting from an empty bundler cache, using geminabox on the server side decreases our installation time from about 1000 ms to roughly 800 ms.

The security team now has to publish the approved gems using gem inabox followed by the package filename.
They cannot install dependencies automatically using gem install,
so they could really use some kind of “deep fetch”,
like in our rubygems-deep_fetch plugin.

Geminabox also provides an administration web interface, so that the security team can unpublish the gems they don’t need or trust anymore.

The development team also gains a server to publish their own private gems.
After all, this is what geminabox has been designed for.

How about a proxy?

I’ve experimented with gem server and geminabox to implement our check point.
Along the way, it gave me a better understanding of
the relationship between a gem server and its clients (i.e. rubygems and bundler)
and it was a good reminder of the dependency API introduced with bundler 1.1.

Using similar techniques, it’s also possible to set up a proxy for rubygems.org, and even work off-line.
gem mirror is a rubygems plugin that aims to do so.
But so far there is no open source project to set up an intelligent proxy in front of rubygems.org
that could anticipate the upcoming needs of the clients.
geminabox may evolve to become such a cache, as mentioned in a recent forum discussion, but this is just a guess.
We’re still missing something as smart as apt-cacher-ng.
