A Jekyll CDN with CloudFront

I recently added a CloudFront CDN for all static assets on this site. Adding a CDN to Jekyll isn’t terribly difficult, but there are a few steps. I found this post, which was very helpful. It’s primarily for Rails sites, so I’m posting some more detail for Jekyll sites here.

The goal is to serve all static assets from Amazon's content distribution network for stability and consistent load times worldwide. The process outlined below uses the pull method, which means I never have to upload anything: when an asset is requested, the CDN checks its cache and, if it doesn't already have the file, fetches it from your website. More about the final results at the end.

While much of this article applies only to Jekyll, the basic setup can be ported to just about any website, allowing for differences in plugins and templating style. If you're interested in brewing your own CDN, this should help you get started.

CloudFront Setup

First, you need an Amazon Web Services (AWS) account. If you already have an Amazon account, this should be easy. Go to the AWS Management Console and open CloudFront. You don't need to create a distribution yet; we'll do that with a script:

Add a cdn_url key to your Jekyll installation’s _config.yml and set it to the distribution url. Make sure there’s no trailing slash. I’ll get back to this in a second.

Cachebusting

To be able to expire cached objects on CloudFront easily, you'll want to implement a versioning system for your assets. I wrote about the process in more detail recently, but here are the basics.

Add a version key in _config.yml and set it to a starting number (1?). Once implemented in your templates, busting the cache for the whole site is just a matter of incrementing the number. You can do this manually or automatically with a Rake task, as detailed in my previous post on the subject.
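The Rake task from that earlier post isn't reproduced here, but the heart of any such task is rewriting a single line of _config.yml. Here's a minimal Ruby sketch of that step. The method name bump_version is hypothetical, and it assumes the key is written as a bare integer on its own, unindented line (version: 41).

```ruby
# Hypothetical helper: increments the integer `version:` key in the text of
# _config.yml. Assumes a top-level line like "version: 41"; other keys such
# as "cdn_version:" are left alone because the match is anchored at line start.
def bump_version(config_text)
  bumped = false
  result = config_text.sub(/^version:[ \t]*(\d+)[ \t]*$/) do
    bumped = true
    "version: #{Regexp.last_match(1).to_i + 1}"
  end
  # Fail loudly rather than silently deploying with a stale version.
  raise ArgumentError, "no numeric version: key found" unless bumped
  result
end
```

A Rake task could wrap this with File.read and File.write on _config.yml. Raising when the key is missing keeps a typo from quietly skipping the bump.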

Use the version in your templates with {{ site.version }}. Just insert it into any filename:

<link rel="stylesheet" href="/style.{{ site.version }}.css">

In .htaccess, set it up to serve original files when versioned filenames are requested.

This allows you to request style.1234.css; when the CDN receives that request and fetches the file from your site, your site serves style.css. Incrementing the version number makes the CDN grab a fresh copy without your having to maintain locally versioned files.
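The actual mapping is done by an Apache rewrite rule, but the logic is easy to see in plain Ruby. This sketch is illustrative only, assuming the version is a run of digits placed immediately before the file extension and that only the listed extensions are versioned; the constant and method names are hypothetical.

```ruby
# Matches e.g. "/style.1234.css": base path, numeric version, then extension.
VERSIONED_ASSET = /\A(?<base>.+)\.(?<version>\d+)\.(?<ext>css|js|png|jpe?g|gif|svg)\z/

# Returns the path of the original file that should be served for a
# versioned request, or the path unchanged if it carries no version.
def unversioned_path(path)
  m = VERSIONED_ASSET.match(path)
  return path unless m # not a versioned asset; serve it as-is
  "#{m[:base]}.#{m[:ext]}"
end
```

One caveat with any pattern like this: a file whose real name already ends in a numeric segment (say, jquery-1.7.2.js) would also be rewritten, so it's worth keeping version numbers out of the original filenames.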

If you load assets via JavaScript, add window-level JS variables in your template (before your scripts load) that you can use when loading other assets:
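If you'd rather emit those variables from a plugin than write them by hand in the template, the output is just a short inline script. This is a hedged sketch: the method name and the window variable names (cdnUrl, assetVersion) are placeholders, not anything Jekyll defines, and it assumes a config hash with cdn_url and version keys.

```ruby
require "json"

# Hypothetical helper: renders an inline <script> exposing the CDN prefix and
# asset version as window-level variables. Values are JSON-encoded so they
# arrive as valid JavaScript string literals; a missing cdn_url becomes "".
def asset_globals_script(config)
  cdn = config["cdn_url"].to_s
  version = config["version"].to_s
  "<script>window.cdnUrl = #{cdn.to_json}; " \
    "window.assetVersion = #{version.to_json};</script>"
end
```

Scripts loaded afterward can then build URLs such as window.cdnUrl + "/javascripts/extra." + window.assetVersion + ".js".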

Apache configuration

I added some headers to my httpd.conf file to improve caching. I'm still experimenting with these settings, and there are some points I'm not certain about. If you have access to your server's config, you can try these suggestions out, but use your own judgement.

You can turn on keep-alive if you haven't already. This requires the "headers" Apache module, which should be present in any default installation:

It might be a good idea to strip cookies from incoming requests. I use a couple of cookies on my site, and setting this sitewide hasn't broken anything, but it has improved caching:

<IfModule mod_headers.c>
  RequestHeader unset Cookie
</IfModule>

Some of these can also be set in .htaccess, and there are some additional caching and optimization suggestions in the HTML5 Boilerplate htaccess. I use a lot of what’s in there, and there may be some settings I already had that help this whole thing work without me realizing it.

Templates

Now we'll use the cdn_url key in _config.yml. If this is left empty, root-relative links (paths without a protocol or hostname) still work and are served by your normal server. Use root-relative paths such as /javascripts/asset.js in your templates instead of full urls like http://yoursite.com/javascripts/asset.js.

Optionally, add a production key in _config.yml. In my setup, this is set to true when generating for a deploy and false when developing or previewing. Templates can check this value before inserting cdn_url, so you don't have to increment your version number just to sidestep caching in development.
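The decision a template or plugin makes before prefixing a path is small enough to sketch in Ruby. This assumes a config hash with the cdn_url and production keys described above; the method name asset_url is hypothetical.

```ruby
# Returns the URL to use for a root-relative asset path. Falls back to the
# bare path (served by your own server) when cdn_url is blank or when
# production is false, which is what makes the CDN trivially switchable.
def asset_url(path, config)
  cdn = config["cdn_url"].to_s.strip
  return path if cdn.empty? || !config["production"]
  cdn.chomp("/") + path # tolerate an accidental trailing slash on cdn_url
end
```

Because a blank cdn_url short-circuits to the original path, the same template code works whether the CDN is on or off.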

Running rake prod with no argument will set it to true, and you can use rake prod[true] and rake prod[false] to toggle it. My system also automatically changes it to false when generating any preview (anything with a post count limit or without --no-future), and switches it to true any time a full generate or gen_deploy is run.

Plugins

You can modify any of your plugins to insert the cdn_url value before the urls they generate. Any plugin you modify needs to check for your website url and remove it before adding the CDN url. Check for your full url (or site.url) so that you don't mess up external links. As I mentioned, if you use absolute paths starting with a forward slash, this isn't an issue.
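Here's a hedged Ruby sketch of that check, the kind of helper a modified plugin might call. The method name cdnify is hypothetical; it assumes site_url is your full site url (as in site.url) and leaves anything that isn't local untouched.

```ruby
# Rewrites local URLs to CDN URLs:
# - full URLs on your own site are first reduced to root-relative paths;
# - root-relative paths ("/images/x.png") get the CDN prefix;
# - external links, protocol-relative "//host" urls and relative paths
#   are returned unchanged.
def cdnify(url, site_url, cdn_url)
  return url if cdn_url.to_s.empty? # CDN switched off
  site = site_url.to_s.chomp("/")
  path = if !site.empty? && url.start_with?(site + "/")
           url.delete_prefix(site)
         else
           url
         end
  return url unless path.start_with?("/") && !path.start_with?("//")
  cdn_url.chomp("/") + path
end
```

Stripping the site url first means full links to your own assets and root-relative paths end up on the CDN the same way.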

I use Liquid image tags in my posts, which means that with a little modification to the image plugin, Jekyll can automatically change my image paths to CDN paths. I hacked up the image_tag plugin provided by Brandon Mathis, Felix Schäfer, and Frederic Hemberger, and you can find my version in this gist. It contains some elements specific to my site (including lazy loading image replacement), so you'll just want to extract the relevant parts. The section starting at line 71 is where the cdn_url is implemented and contains some clues.

My homebrew download management system uses a similar technique, checking if the download url is local and serving it from the CDN if it is. My most popular downloads (like nvALT) were already served from S3, but now almost everything is, and I never have to deploy files to a remote server manually.

You can use the “production” key as a condition and access the cdn_url in plugins through the site object. In a generator plugin, use site.config['production'] and site.config['cdn_url']. In tag plugins, you’ll need to use the context.registers object: context.registers[:site].config["cdn_url"].
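To show the tag-plugin lookup without pulling in Jekyll or Liquid, this sketch uses two stand-in structs. FakeSite and FakeContext are hypothetical; in a real tag, context is the Liquid context passed to render, context.registers[:site] is the Jekyll site, and its config is a hash with string keys.

```ruby
# Stand-ins so the sketch runs on its own; only the lookup chain mirrors
# what a real tag plugin would do.
FakeSite = Struct.new(:config)
FakeContext = Struct.new(:registers)

# Returns the CDN prefix a tag should prepend, or "" when it should not
# (not a production build, or cdn_url blank).
def cdn_prefix_for(context)
  config = context.registers[:site].config
  return "" unless config["production"]
  config["cdn_url"].to_s
end
```

In a generator plugin the first line would simply read site.config, since the site object is handed to generate directly.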

The results

The first day I was only seeing an average 8% decrease in page load times, though non-US locations were faring better. After a day, though, my page load times (testing from US locations) have gone from 1.2-1.5s to 600-700ms. Tests from an Amsterdam server are showing 1.1-1.4s, down from an average of 3 seconds. I also pass more YSlow tests now.¹

The fact that I can turn the CDN off at any time just by blanking out the cdn_url key — combined with not having to implement any S3 upload tasks — means that there’s really no risk in my trying it out, and I don’t have to undo anything if I change my mind. I can leave all the template changes in place; they won’t do anything as long as cdn_url is blank, and if I try another CDN in the future, I just have to update that value.

I set this all up pretty late at night², so I might be forgetting a step (or two). This is all the basic information you need to get rolling, though. Leave a comment if you notice any glaring omissions or blatant misinformation. Or typos. I need to hire a proofreader if I'm going to keep blogging at 4 in the morning.

¹ He tests well, but he doesn't get along with the other kids. I think we need to look into a private school.