A place for me to write about interesting technology topics.

Precompiling Assets Locally for Capistrano Deployment

Apr 14th, 2012

I’ve had a goal of fast Capistrano deployments to my VPS for a while now, but I’ve constantly been plagued by asset precompilation taking anywhere from 4 to 15 minutes on my little server (I’m using Rackspace’s smallest offering, a VPS with 256MB RAM). When I precompile assets locally, it always finishes in under a minute, so I came up with an approach that uses my local machine for precompilation and uploads the assets to the server. I’ve also avoided any shenanigans with adding assets to my Git repository (Ew! Don’t do that!).


Local Asset Precompilation

I started out using Capistrano’s built-in asset precompilation, which is as simple as uncommenting the line below in the Capfile:


# Uncomment if you are using Rails' asset pipeline
load 'deploy/assets'

Capistrano was successful at precompiling my assets on the server…it just took a long time to complete, sometimes a very long time. I figured the first step in getting assets to precompile locally would be commenting the deploy/assets line in the Capfile back out and reading over the Capistrano assets.rb source to know exactly what I needed to re-implement. Go check out the Capistrano source if you haven’t already. The Capistrano code, in general, is very easy to read and well documented, and the assets methods are especially simple. The real magic in the Capistrano assets code is the symlink method, which needs to execute before deploy:finalize_update. I didn’t include symlinking in my first test deployment and it didn’t work well.

I discuss the finer points of the conditional logic on whether or not to precompile below, so I’ll skip over that for now and explain the process of precompiling and uploading first. The run_locally method is courtesy of Capistrano and allows us to run commands on the local machine. In order to keep things tidy, I run assets:clean before running assets:precompile. Next, a .tar.bz2 of the assets folder is created (I’ll explain why I went with bz2 below), and Capistrano’s top.upload method is invoked to secure copy assets.tar.bz2 to the shared directory on my server. I tried leaving off :via and using the default sftp behavior but kept running into: Net::SFTP::StatusException(4, "failure"). Instead of debugging that issue, I tried scp and it worked perfectly. Once the file is on the server, it is extracted, and then the archive is deleted. Lastly, I delete assets.tar.bz2 from my local machine and run assets:clean again, leaving my public directory nice and clean. Remember, it’s not best practice to store assets in a code repository.
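The sequence described above can be sketched as plain Ruby data. This is a minimal sketch rather than the original recipe: the helper name local_precompile_steps, the bundle exec rake invocation, the public/assets.tar.bz2 location, and the example shared path are all assumptions made for illustration. In a Capistrano 2 recipe, the :local steps would be handed to run_locally, the :remote step to run, and the :upload step to top.upload with :via => :scp.

```ruby
# Hypothetical helper: lists the steps in the order described above.
# Each entry is [kind, *args]; nothing is executed here.
ARCHIVE = "assets.tar.bz2"

def local_precompile_steps(shared_path)
  [
    [:local,  "bundle exec rake assets:clean"],
    [:local,  "bundle exec rake assets:precompile"],
    # -j selects bzip2 compression
    [:local,  "cd public && tar -jcf #{ARCHIVE} assets"],
    # local file, then remote destination (uploaded over scp)
    [:upload, "public/#{ARCHIVE}", shared_path],
    # unpack on the server, then remove the uploaded archive
    [:remote, "cd #{shared_path} && tar -jxf #{ARCHIVE} && rm #{ARCHIVE}"],
    [:local,  "rm public/#{ARCHIVE}"],
    [:local,  "bundle exec rake assets:clean"],
  ]
end

# "/var/www/app/shared" is a made-up shared_path.
local_precompile_steps("/var/www/app/shared").each do |kind, *args|
  puts "#{kind}: #{args.join(' -> ')}"
end
```

Keeping the steps as data makes the order easy to review before wiring each one to the matching Capistrano call.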

As for the symlink method, I took that directly from the Capistrano symlink method. No need to change anything in that behavior.
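For reference, the symlink step boils down to a short chain of shell commands. The sketch below is modeled on what Capistrano 2’s built-in assets symlink task is understood to run, so compare it against the assets.rb in your installed version; the helper name asset_symlink_command and the example paths are hypothetical.

```ruby
# Hypothetical helper: builds the shell command for the symlink step.
# Assumes the extracted assets live in <shared_path>/assets.
def asset_symlink_command(latest_release, shared_path)
  [
    # drop any assets directory that came along with the checkout
    "rm -rf #{latest_release}/public/assets",
    "mkdir -p #{latest_release}/public",
    "mkdir -p #{shared_path}/assets",
    # point the release at the shared, already-compiled assets
    "ln -s #{shared_path}/assets #{latest_release}/public/assets",
  ].join(" && ")
end

# Both paths are made up for the example.
puts asset_symlink_command("/var/www/app/releases/20120414000000",
                           "/var/www/app/shared")
```

Because each release links to the same shared directory, the uploaded assets survive from one deploy to the next.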

So, why did I use bz2 instead of gz when bz2 takes longer to compress? My goal is fast deployments, and unfortunately, I can’t always be on a lightning-fast connection when I work remotely, so I’d rather spend a little more time compressing if that means faster uploads.

I ended up writing a quick and dirty performance test to see if bz2 was worthwhile. Here’s my shell script:

I ran the performance test five times, and here are the average compression times:

tar: <1s

gz: <1s

bz2: 1.6s

In my opinion, the space savings are worth the wait, considering I might be uploading a change while tethered to my phone or connected to public wifi!
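A comparable timing test can be sketched in Ruby. This is not the shell script mentioned above, just one way to take the same kind of measurement; the names VARIANTS, archive_command, and average_times are hypothetical, and the runner argument exists only so the command execution can be swapped out.

```ruby
require "benchmark"

# Archive variants to compare: label => [tar flags, file extension].
VARIANTS = {
  "tar" => ["-cf",  "tar"],
  "gz"  => ["-zcf", "tar.gz"],
  "bz2" => ["-jcf", "tar.bz2"],
}.freeze

# Builds the tar command for one variant.
def archive_command(flags, ext, dir)
  "tar #{flags} assets.#{ext} #{dir}"
end

# Runs each variant `runs` times and returns label => average seconds.
# `runner` executes a command string; by default it shells out via system.
def average_times(dir, runs = 5, runner = ->(cmd) { system(cmd) })
  VARIANTS.each_with_object({}) do |(label, (flags, ext)), averages|
    cmd = archive_command(flags, ext, dir)
    total = Benchmark.realtime { runs.times { runner.call(cmd) } }
    averages[label] = total / runs
  end
end

# Example, run from a Rails root after assets:precompile:
#   average_times("public/assets").each do |label, secs|
#     puts format("%-4s %.2fs", label, secs)
#   end
```

Comparing the sizes of the resulting archives (for example with File.size) alongside the timings shows the trade-off between compression time and upload size.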

Ben’s Approach to Selective Asset Precompilation

As I was writing my new Capistrano recipe, I stumbled upon a post by Ben Curtis called Skipping Asset Compilation with Capistrano. It seems he was also looking for a way to speed up Capistrano deployments and approached it from the angle of reducing how often precompilation is done. By default, Capistrano precompiles on every deploy, but if no assets have changed, there’s no need to run it. Per Ben’s post:

The trick, then, is to check the list of files that were changed in the range of commits that are being deployed, and compile the assets only if assets show up in that list.

Ben’s solution builds on Capistrano’s pending:default method, and limits the scm log to the assets folders. Here’s the piece we’re interested in from Ben’s code:

Let’s break this down a bit, since not everyone is familiar with the inner workings of Capistrano. I’ll explain it in the context of Git, my scm of choice; other scms may have different implementations.

The source variable is set to Capistrano::Deploy::SCM.new(scm, self) (in my deployments scm is set to Git). Capistrano’s SCM code defines the next_revision method, which looks like:


# Returns the revision number immediately following revision, if at
# all possible. A block should always be passed to this method, which
# accepts a command to invoke and returns the result, although a
# particular SCM's implementation is not required to invoke the block.
#
# By default, this method simply returns the revision itself. If a
# particular SCM is able to determine a subsequent revision given a
# revision identifier, it should override this method.
def next_revision(revision)
  revision
end

Since I’m using Git, I can look in the Capistrano Git class and see that next_revision is not being overridden, so it will simply return the revision passed to the method.

The current_revision variable is set to the commit hash stored in #{current_path}/REVISION.

Putting all these pieces together, we can see that from is set to the commit hash of the release currently deployed at /path/to/app/current.
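The pass-through behaviour is easy to see in isolation. In the sketch below, BaseScm and GitScm are hypothetical stand-ins for Capistrano’s SCM classes, and the commit hash is made up.

```ruby
# Hypothetical stand-in for Capistrano's SCM base class.
class BaseScm
  # Default behaviour: hand back the revision unchanged.
  def next_revision(revision)
    revision
  end
end

# Hypothetical stand-in for the Git class, which does not override
# next_revision and so inherits the default.
class GitScm < BaseScm
end

current_revision = "abc123" # made-up contents of current/REVISION
from = GitScm.new.next_revision(current_revision)
puts from
```

So for Git, from and current_revision end up being the same commit hash.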

The next line uses some Git magic to find out if there have been any changes to the assets folders. The log method in the Git class corresponds to the $ git log command.


# Returns a log of changes between the two revisions (inclusive).
def log(from, to=nil)
  scm :log, "#{from}..#{to}"
end

This gets evaluated to:


git log <commit hash of /path/to/app/current>..

The .. is indicative of the <since>..<until> options, which specify a range of commits. If you don’t specify an upper bound when filtering git log with the <since> option, the <until> option defaults to HEAD. So if you have commits on your machine that are newer than the current_path commit hash, running this command will list all of the more recent commits. Ben trims down this result even further by using the <path> option. This Git option lets you specify any number of directories or files to filter the commits on, meaning that commits whose changes all fall outside your specified <path> won’t be output when you run git log. The output from git log can then be piped to the Unix command wc -l, which prints the number of lines. In this case, if the line count is greater than zero, there are new commits with modified assets! Easy!

The only tweak I made to Ben’s code was adding the lib/assets directory to the <path> option.
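The check described above can be sketched in plain Ruby. This is a minimal sketch, not Ben’s code: the names ASSET_PATHS, asset_log_command, and assets_changed? are hypothetical, the vendor/assets and app/assets entries are assumptions about which folders are filtered, and the -- separator is added here so Git never mistakes a path for a revision.

```ruby
require "shellwords"

# Asset directories to filter on; lib/assets is the addition mentioned
# above, the other two entries are assumed.
ASSET_PATHS = %w[vendor/assets lib/assets app/assets].freeze

# Builds the git log command. With `to` left as nil, the range string
# interpolates to "<from>..", which Git reads as "<from>..HEAD".
def asset_log_command(from, to = nil, paths = ASSET_PATHS)
  Shellwords.join(["git", "log", "#{from}..#{to}", "--", *paths])
end

# True when the log output has at least one line, mirroring the
# "wc -l is greater than zero" test.
def assets_changed?(log_output)
  log_output.lines.count > 0
end

# "abc123" is a made-up commit hash.
puts asset_log_command("abc123")

# Inside a Git checkout, the check could then be run as:
#   assets_changed?(`#{asset_log_command("abc123")}`)
```

An empty log means no commit in the range touched the listed directories, so precompilation can be skipped for that deploy.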

Wrapping It Up

There might be some corner cases where this setup won’t work, but I’ve yet to encounter them. I’ve probably done somewhere in the neighborhood of 100 deployments (and counting) with this code in place. When I change assets, they get precompiled correctly, and my worst experience uploading the bz2 was a seven-minute upload over a slow connection. Overall I’m satisfied with these changes, but I’ll probably never stop looking for ways to improve my deploy process.