Why is Rubygems Slow?

“Why exactly is Rubygems slow?” is a question that more than one developer has asked, but few have bothered to do anything about. Recently @mfazekas took up the task of profiling an especially slow case using dtrace. This resulted in several high-profile pull requests to improve performance and cut memory allocations. These in turn led me to ask: just what is Rubygems doing that takes so long? The short answer is, way more than you ever thought; for the long answer, keep reading.

Rubygems Basics

Rubygems comes stock with modern versions of Ruby. Its goal is to make installing and using external pieces of code (or libraries) easier. To do this, it takes over Kernel#require so that after you’ve run:

$ gem install wicked

You can later require 'wicked' and the right code gets loaded.

Load Path and Resolutions

When you require wicked you don’t specify a version to load, so how does Rubygems know which one to use? It could take the latest version, but then when you $ gem install a more recent version for a different project, your project might break. Rubygems tries to make an intelligent decision about the dependencies you need based on other dependencies. When a gem is published to rubygems.org (the server), it contains a .gemspec that declares information about the gem as well as its dependencies. For example, in wicked.gemspec:
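The contents of that file aren’t reproduced here; a minimal sketch of the relevant declaration might look like this (the version number is hypothetical, but the railties constraint matches the one discussed below):

```ruby
# Sketch of a gemspec: metadata about the gem plus its dependencies.
spec = Gem::Specification.new do |s|
  s.name    = "wicked"
  s.version = "1.0.0" # hypothetical version, for illustration only

  # Declares that wicked can only be activated alongside
  # railties 3.0.7 or newer.
  s.add_runtime_dependency "railties", ">= 3.0.7"
end
```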

If you have this version of wicked, it can’t be required unless you’re also using a version of railties at 3.0.7 or greater. If Rubygems can’t find one, it will raise an exception. If it finds a valid version, it adds the required file to $LOADED_FEATURES.

Using another library, threaded, we can see $LOADED_FEATURES in action; here it is before the gem is loaded:
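Since threaded may not be installed on every machine, this sketch demonstrates the same bookkeeping with the stdlib’s date instead:

```ruby
# $LOADED_FEATURES records the absolute path of every file required so
# far. Before a library is required, its files are absent from the list;
# requiring it appends them.
require "date"

loaded = $LOADED_FEATURES.select { |path| path.end_with?("/date.rb") }
puts loaded.any? # => true: date.rb is now recorded, so a second
                 # require "date" becomes a cheap no-op
```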

If you’re worried that Rubygems might pick the wrong version of a gem, you’ll need to be using bundler, which most Ruby developers already are. Bundler does gem resolution the first time you run bundle install. Bundler uses explicit specifications from the Gemfile alongside information found in an individual gem’s gemspec. Once resolved, bundler generates a Gemfile.lock. Developers check this file into their source control to ensure that every version they install is the same as their coworkers’. Otherwise, you end up with the “it works on my machine” subtle bugs that exist between minor version differences in software. Bundler works with Rubygems by adding all gems in the Gemfile.lock explicitly to $LOAD_PATH.
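A minimal sketch of what bundler effectively does for each locked gem (the directory below is hypothetical):

```ruby
# Put the exact pinned version's lib directory on the load path.
$LOAD_PATH.unshift "/gems/threaded-0.0.4/lib"

# With that entry in place, `require "threaded"` would load
# /gems/threaded-0.0.4/lib/threaded.rb directly, and Rubygems never
# has to guess which version you meant.
```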

Here you can see that threaded version 0.0.4 is already on the load path. Now, when you try to require threaded, that explicit version will be found and Rubygems doesn’t have to attempt resolution.

It might look like using a Gemfile.lock with bundler eliminates the need for Rubygems to resolve dependencies. This is mostly true, but there are still plenty of times when you need require to work correctly without a Gemfile.lock. A good example is running rails new foo. This command will actually generate a Gemfile and a Gemfile.lock for you. This is exactly what @mfazekas was doing when the problem was identified.

Rubygems Internals

What really happens when you require a gem in your Ruby code? When you call require "wicked", the code will first try to find wicked in the standard library. If it can’t, then Rubygems will see if it has been loaded previously. If wicked has not been previously required, Rubygems must work to find it and calls:

found_specs = Gem::Specification.find_in_unresolved path

In Rubygems, we “activate” a gem once we know exactly what version is needed and present on the system. Once activated, we cannot activate a different version of that gem. Rubygems works to resolve dependency collisions when possible, and will error out when it can’t.

Inside of Rubygems, an “unresolved” gem specification is a gem that has been referenced by another gem but has not been loaded yet. For example, if we have a gem a that requires a gem called b we will want both gems loaded when we require a. In order for this to work Gem::Specification#activate is called on the a gem. If these were the only two dependencies needed, then we could “activate” a version of b that a needs. However if there are other gems that the system is aware of that have not been activated that reference b, it can’t be resolved and loaded yet. When this happens the dependency requirements are stored in memory:
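A simplified sketch of that bookkeeping (the real storage lives inside Gem::Specification; this hash-by-name shape is an approximation):

```ruby
# Keep a table of dependencies that have been referenced but not yet
# resolved, keyed by gem name.
unresolved = {}

dep = Gem::Dependency.new("b", ">= 1.0")
unresolved[dep.name] = dep

# Later, a require can consult this table to see whether an
# already-seen gem was waiting on the file being required.
```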

Later, when a gem is explicitly require-d, we use Gem::Specification.find_in_unresolved to check whether it was previously stored in this list.

If we are trying to require 'c' while a is activated and references b (which is “unresolved”), then Rubygems does not know where to find c yet. To find it, Rubygems needs to walk through all unresolved dependencies and see which of them reference c. To do this it calls:

found_specs = Gem::Specification.find_in_unresolved_tree path

which must traverse all possible gems.

Gem Traverse

The traverse method inside of Rubygems is an expensive one with no affiliation with Chevy.

(groan)

When trying to find a specification in the unresolved tree, the unresolved dependencies must be looked up and each of their dependencies need to be inspected to see if they contain the gem we are looking for.

This method takes all Gem::Specification-s that will automatically be activated at runtime by the target gem; these are known as “runtime dependencies”. For each of these, it grabs their dependencies and calls traverse on each one. Each time this happens, the current Gem::Specification is appended to the trail variable.
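The pattern can be sketched with a toy Spec struct in place of the real Gem::Specification (names and shapes here are simplified; the real method carries more detail):

```ruby
# A self-contained sketch of the recursive traversal described above.
Spec = Struct.new(:name, :runtime_dependencies) do
  def traverse(trail = [], &block)
    trail = trail + [self] # allocates fresh arrays on every call
    runtime_dependencies.each do |dep|
      block.call(self, dep, trail)
      dep.traverse(trail, &block)
    end
  end
end

b = Spec.new("b", [])
a = Spec.new("a", [b])

visited = []
a.traverse { |spec, dep, trail| visited << [spec.name, dep.name] }
# visited now holds one entry, ["a", "b"]: a was inspected and its
# dependency b was found.
```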

This is an expensive method, as it allocates two arrays each time it is called:

trail = trail + [self]

In addition to the [self] array, the process of adding an array to an already existing array actually allocates a new array even if you use the same variable:

Once we do the assignment, array = array + [4], the object_id of the array variable changes: a new array has been allocated. The Ruby interpreter could optimize away this allocation, but doing so is very difficult, since something else could hold a reference to the original array, including eval-ed code.
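The allocation is easy to observe directly:

```ruby
array = [1, 2, 3]
id_before = array.object_id

array = array + [4] # builds a brand-new array, then rebinds the variable
id_after = array.object_id

puts id_before == id_after # => false: a second array was allocated
```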

List to the Rescue?

An array contains quite a few features that aren’t needed in this case. All we’re trying to do is to keep track of the specs that have been traversed. Aaron submitted a relatively popular patch to help with this problem:

The Gem::List is a linked list data structure. This structure is like an array, in that it holds sequential data. In a linked list, elements can be added and removed very efficiently compared to an array, since the entire data structure does not need to be reorganized. A linked list is made up of many nodes that each have a value and a reference to the “next” or “tail” node. You can iterate over an entire linked list, as this method from the Gem::Node source shows:

def each
  n = self
  while n
    yield n.value
    n = n.tail
  end
end

In the pull request, this Gem::Node is used to track the traversal process. Take a look at the source:
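The full source is in the pull request itself; a minimal sketch of a node class like the one described might look like this (names simplified, not the actual Rubygems code):

```ruby
class Node
  include Enumerable
  attr_reader :value, :tail

  def initialize(value, tail = nil)
    @value = value
    @tail  = tail
  end

  # Prepending allocates exactly one small node object, instead of
  # copying an entire array the way `trail + [self]` does.
  def prepend(value)
    Node.new(value, self)
  end

  def each
    n = self
    while n
      yield n.value
      n = n.tail
    end
  end
end

trail = Node.new(:a).prepend(:b).prepend(:c)
trail.to_a # => [:c, :b, :a] -- most recently added first
```

Because each node only points at its tail, extending the trail never touches the existing nodes, which is exactly the property the traversal needs.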

The logic is essentially the same as the original traverse: recursively go through all gem specifications and look through each of their dependencies. While the account of “millions of allocations” was due to a bug in the PR, the point still stands: reducing allocations on each invocation is an improvement.

Effects

The test that was written highlights the worst case scenario. We have many gems that are known to the system but have not been “activated”, and they also reference many other gems. We are forcing a full traversal by searching for a file that doesn’t exist.

What does this mean for your app? I modified Rubygems to print out the number of times Gem::Specification.find_in_unresolved_tree path is called while booting my codetriage.com app. That method gets called twice which results in 30 total calls to the traverse method.

You can reduce this to 0 calls to either method by booting the app with bundle exec. When bundle exec is used, the scope of gems that can be searched through is reduced, as only gems found in the Gemfile.lock will be present in the load path.

What Does It All Mean?

This is one of my favorite recent pull requests for a number of reasons. First, it had a bug in it that no one caught in the tests. What does this mean? We can’t always rely on existing test suites to handle every possible scenario. There’s a reason such pull requests are made in public. The human element of reviewing and gauging validity should not be underestimated.

My second favorite thing about this pull request is that the benchmarks used to deduce the savings were included in the description. Not only does it mean we can all learn from the PR, but we can all test it independently. This openness allowed anyone to run the code, apply the patch, and separately verify the savings. This is what attracted me to investigate the Rubygems behavior in the first place. This post is born out of the raw notes I took while evaluating the pull request. Finally, I really enjoy that this performance work came from finding and reporting a performance bug. Why don’t we as Rubyists spend half as much time benchmarking and finding performance bugs as we do complaining about slowness?

It’s important for people outside of a project’s core developers to read over and make sense of popular pull requests. It’s a great way for you to learn about new tools and techniques, but it’s also a good time to learn about how different internal parts of projects work. I encourage you to go find a PR you don’t totally understand: take some notes, dig in, and learn something. The best part of open source contributions isn’t just that we benefit from others’ work: we get to see how that work came to be, and we get a chance to make ourselves better.

If you like open source and Ruby, follow @schneems. If you want to dig into open source issues and pull requests subscribe to codetriage.com.