Fetching Source Index for http://rubygems.org/

Like you, I’ve sat at my terminal watching Bundler emit
this post’s title and do nothing for quite a while. Imagine what we could be
doing instead of waiting for dependencies to resolve! I’m out of ideas already;
I love resolving dependencies.

It’s actually not Bundler that is slow… it’s RubyGems itself. To understand why
this process takes so long, you need a bit of a history lesson on how
RubyGems handles its index of gems. There are three indexes available:

Latest index (newest versions for a given gem on a given platform)

Big index (all versions for all gems on all platforms)

Prerelease index (only prerelease versions, for all gems on all platforms)

Usually we just need to request the “latest” index when you gem install
something. However, Bundler needs the big index, and there’s a serious size
difference between the two.
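To make the relationship between the two concrete, here is a minimal Ruby sketch. It assumes each index entry is a [name, version, platform] tuple that is Marshal’d and then gzipped; the sample gems and the helper names pack, unpack, and latest are hypothetical. It derives a latest-style index from a big-style one, which shows why the latest index can only ever be a subset of the big one.

```ruby
require "rubygems"
require "zlib"

# Hypothetical sample of the big index: every [name, version, platform] tuple.
BIG_INDEX = [
  ["rack",     Gem::Version.new("1.1.0"),   "ruby"],
  ["rack",     Gem::Version.new("1.2.1"),   "ruby"],
  ["rails",    Gem::Version.new("2.3.8"),   "ruby"],
  ["rails",    Gem::Version.new("3.0.0"),   "ruby"],
  ["nokogiri", Gem::Version.new("1.4.3.1"), "ruby"],
  ["nokogiri", Gem::Version.new("1.4.3.1"), "x86-mswin32"]
].freeze

# Serialize an index the way it is assumed to be served: Marshal'd, then gzipped.
def pack(index)
  Zlib.gzip(Marshal.dump(index))
end

# Reverse of pack: gunzip, then Marshal.load.
def unpack(blob)
  Marshal.load(Zlib.gunzip(blob))
end

# Keep only the newest version of each gem on each platform.
def latest(index)
  index.group_by { |name, _version, platform| [name, platform] }
       .map { |_key, entries| entries.max_by { |_name, version, _platform| version } }
end

big = unpack(pack(BIG_INDEX))
newest = latest(big)
puts "big: #{big.size} entries, latest: #{newest.size} entries"
```

With real data the gap is far wider than in this toy sample, since the big index carries every version of every gem.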

Bundler also needs a given gem’s dependencies. If you haven’t noticed already,
those dependencies aren’t in the index at all. They’re in the gemspecs, which
are stored individually at a completely different location, also compressed and
Marshal’d.
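Here is a sketch of what reading one of those gemspecs involves. It assumes the per-gem files live at a path like /quick/Marshal.4.8/NAME-VERSION.gemspec.rz and are compressed with zlib deflate; treat both the path and the compression as assumptions, and quick_spec_path, encode_spec, and decode_spec as hypothetical helpers. To stay self-contained it round-trips a Gem::Specification in memory instead of hitting the network, then pulls the dependencies back out.

```ruby
require "rubygems"
require "zlib"

# Assumed location of one gem version's spec; illustrative only.
def quick_spec_path(name, version)
  "/quick/Marshal.4.8/#{name}-#{version}.gemspec.rz"
end

# Server side (assumed): Marshal the spec, then zlib-deflate it.
def encode_spec(spec)
  Zlib::Deflate.deflate(Marshal.dump(spec))
end

# Client side: inflate the downloaded bytes, then Marshal.load the spec.
def decode_spec(blob)
  Marshal.load(Zlib::Inflate.inflate(blob))
end

spec = Gem::Specification.new("rails", "3.0.0") do |s|
  s.add_dependency "activesupport", "= 3.0.0"
  s.add_dependency "bundler", "~> 1.0.0"
end

loaded = decode_spec(encode_spec(spec))
deps = loaded.runtime_dependencies.map { |dep| [dep.name, dep.requirement.to_s] }
puts "#{quick_spec_path(loaded.name, loaded.version)} -> #{deps.inspect}"
```

Every dependency name found this way points at yet another gemspec file that has to be fetched the same way.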

So that’s basically how RubyGems figures out dependencies N levels deep: it
has to make a separate request for each gemspec and keep following them until
all possibilities are exhausted. Next time you gem install a gem, add -V and
you’ll see all of these requests happening.
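Here is a minimal sketch of that crawl against an in-memory stand-in for the server. The dependency graph, GemspecFetcher, and resolve_one_by_one are hypothetical, and versions are ignored entirely; a real resolver also has to consider several versions of each gem, so real request counts are higher.

```ruby
require "set"

# Hypothetical dependency graph standing in for the remote gemspecs:
# gem name => names of its runtime dependencies (versions ignored).
SPECS = {
  "rails"         => %w[actionpack activerecord activesupport bundler],
  "actionpack"    => %w[activesupport rack],
  "activerecord"  => %w[activesupport arel],
  "activesupport" => [],
  "bundler"       => [],
  "rack"          => [],
  "arel"          => %w[activesupport]
}.freeze

# Hypothetical stand-in for the network: each fetch is one HTTP round trip
# that downloads and decodes a single gemspec.
class GemspecFetcher
  attr_reader :requests

  def initialize(specs)
    @specs = specs
    @requests = 0
  end

  def fetch(name)
    @requests += 1
    @specs.fetch(name)
  end
end

# Follow dependencies one gemspec at a time until nothing new turns up.
def resolve_one_by_one(roots, fetcher)
  seen = Set.new
  queue = roots.dup
  until queue.empty?
    name = queue.shift
    next unless seen.add?(name) # skip gems that were already fetched
    queue.concat(fetcher.fetch(name))
  end
  seen
end

fetcher = GemspecFetcher.new(SPECS)
gems = resolve_one_by_one(["rails"], fetcher)
puts "#{gems.size} gems, #{fetcher.requests} requests"
```

Every gem in the tree costs its own round trip, so this prints "7 gems, 7 requests".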

Those requests obviously take a lot of time, no matter how good Bundler’s
resolver algorithm gets. I think we’ve pushed this system to its limits, and the
fact that it completes resolves in a reasonable amount of time at all is
impressive.

From the RubyGems side, I think we’ve done a good thing by sending the large
downloads out to CloudFront, so big gems get a CDN boost. However, every request
still goes to the Gemcutter server at Rackspace before being redirected to
S3/CloudFront, so the network latency of that first hop doesn’t help those
outside the US get their gems faster.

At Cape Code, Matt and I worked on a new resolver endpoint for Bundler. The
idea was that Bundler could make a request to this new API that would return one
level of dependencies for a given set of gems. We can’t move the entire Bundler
resolver algorithm to the server side, but this could cut down the number of
gemspec requests it needs to make.
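A rough sketch of why a one-level-at-a-time endpoint helps, using a hypothetical DependencyEndpoint stub (not the real API shape) and a hypothetical resolve_by_level helper over a toy dependency graph with versions ignored. Each call answers for a whole batch of gems at once, so the number of round trips tracks the depth of the tree rather than the number of gems in it.

```ruby
require "set"

# Hypothetical dependency graph: gem name => names of its dependencies.
SPECS = {
  "rails"         => %w[actionpack activerecord activesupport bundler],
  "actionpack"    => %w[activesupport rack],
  "activerecord"  => %w[activesupport arel],
  "activesupport" => [],
  "bundler"       => [],
  "rack"          => [],
  "arel"          => %w[activesupport]
}.freeze

# Hypothetical server endpoint: one round trip returns one level of
# dependencies for every gem in the request.
class DependencyEndpoint
  attr_reader :requests

  def initialize(specs)
    @specs = specs
    @requests = 0
  end

  def dependencies(names)
    @requests += 1
    names.map { |name| [name, @specs.fetch(name)] }.to_h
  end
end

# Ask for a whole level at once, then move on to the newly discovered gems.
def resolve_by_level(roots, endpoint)
  seen = Set.new
  frontier = roots.uniq
  until frontier.empty?
    frontier.each { |name| seen << name }
    deps = endpoint.dependencies(frontier)
    frontier = deps.values.flatten.uniq.reject { |name| seen.include?(name) }
  end
  seen
end

endpoint = DependencyEndpoint.new(SPECS)
gems = resolve_by_level(["rails"], endpoint)
puts "#{gems.size} gems, #{endpoint.requests} requests"
```

For this graph that prints "7 gems, 3 requests", where fetching one gemspec per request would need seven; the saving grows with wide trees.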

This will speed things up a bit, but it doesn’t solve the root problem here.

RubyGems definitely needs a better indexing scheme, but that’s difficult, since
making the client support it is going to be rough (and we have to worry about
backwards compatibility!).

Thankfully, our server is now in Ruby (one of the first goals of the Gemcutter
project), so we can iterate rapidly and drop the changes into a gem plugin (think
gem fast_install rails). I’ve been talking to some fellow robots here about
some possibilities (differential indices, for one), but we need to bang some code
out soon.
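Differential indices are only an idea at this point, but the core of it is easy to sketch: instead of re-downloading the whole big index, a client that already has an older copy fetches just the entries added or removed since then. The index_diff and apply_diff helpers and the sample snapshots below are hypothetical, and versions are plain strings for brevity.

```ruby
require "set"

# Hypothetical: what changed between two snapshots of the big index.
# Entries are [name, version, platform] tuples.
def index_diff(old_index, new_index)
  old_set = old_index.to_set
  new_set = new_index.to_set
  {
    added:   (new_set - old_set).to_a,
    removed: (old_set - new_set).to_a # e.g. a version yanked in between
  }
end

# Client side: patch the cached copy instead of downloading everything again.
def apply_diff(cached_index, diff)
  (cached_index - diff[:removed]) + diff[:added]
end

# Hypothetical snapshots; pretend rails 2.3.8 was yanked in between.
old_index = [
  ["rack",  "1.1.0", "ruby"],
  ["rails", "2.3.8", "ruby"]
]
new_index = [
  ["rack",  "1.1.0", "ruby"],
  ["rack",  "1.2.1", "ruby"],
  ["rails", "3.0.0", "ruby"]
]

diff = index_diff(old_index, new_index)
patched = apply_diff(old_index, diff)
puts "added #{diff[:added].size}, removed #{diff[:removed].size}"
```

The payload a client downloads then scales with how much changed since its last update, not with the size of the whole index.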

I’m looking into getting a mirroring system set up, but as always, we need
contributors to help. My first stop has been MirrorBrain, but I’m open to
anything that works and will be easy to set up. My only real requirement is that
it takes less than 1 minute to get a gem distributed. Perhaps we need BitTorrent?
The gem files are small (most are way under 1MB), so I can’t see that being hard
to accomplish.