Edge Caching With Play 2, Heroku, and CloudFront

Web applications primarily consist of data, services, and the User Interface (UI). The UI consists of HTML, CSS, images, and probably JavaScript. In the traditional web architecture all of the UI assets are static files except the HTML, which is dynamically generated by the server. In the modern web architecture the entire UI is made of static files that consume RESTful / JSON services. The static files for the UI must be downloaded to the client, so the less time they take to download, the better the overall performance of the application.

Bits are moved around the world through beams of light (fiber optics). Unfortunately the speed of light just isn’t fast enough when it comes to transferring data: it takes 134ms for light to travel once around the world, even in a vacuum. Think of all the light and bits that had to move from the Eastern United States (where this article is being served from) to you. Some of the time you spent waiting for the content to load was spent purely moving bits over long distances. That is why physically moving bits closer to the consumer has massive performance benefits.
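The 134ms figure is just Earth’s circumference divided by the speed of light in a vacuum. The sketch below reproduces that arithmetic and, as an assumption for illustration, also estimates the delay through glass fiber using a refractive index of about 1.47 (light in fiber travels at roughly two-thirds of its vacuum speed). The `Propagation` object and its names are hypothetical.

```scala
object Propagation {
  // Speed of light in a vacuum, in kilometres per second.
  val SpeedOfLightKmPerS: Double = 299792.458

  // Approximate equatorial circumference of the Earth, in kilometres.
  val EarthCircumferenceKm: Double = 40075.0

  // Assumed refractive index of optical fiber (typical values are roughly 1.44 to 1.47).
  val FiberRefractiveIndex: Double = 1.47

  // One-way propagation delay in milliseconds for a distance at a given signal speed.
  def delayMs(distanceKm: Double, speedKmPerS: Double): Double =
    distanceKm / speedKmPerS * 1000.0

  // Delay in a vacuum: the theoretical lower bound.
  def vacuumDelayMs(distanceKm: Double): Double =
    delayMs(distanceKm, SpeedOfLightKmPerS)

  // Delay through fiber, where light travels at c / n.
  def fiberDelayMs(distanceKm: Double): Double =
    delayMs(distanceKm, SpeedOfLightKmPerS / FiberRefractiveIndex)
}
```

`Propagation.vacuumDelayMs(Propagation.EarthCircumferenceKm)` comes out at about 133.7ms, matching the figure above, while the fiber estimate is about 196.5ms. This is before any routing detours, switching, or round trips for TCP and TLS handshakes are added.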

It would be amazing if a web application could just live right down the street from every user of the application. But doing so would require many copies of the data, services, and UI for an application. The data usually needs to be consistent across all of the users of the application, and maintaining a geo-distributed, consistent data set is very hard to do without massive data synchronization overhead. The services are just the gateway to the data, so they need to be near the data. But the UI, all of those static assets, can easily live in many places, ideally near the consumers.

This is exactly what a Content Delivery Network (CDN) does. A CDN, which performs what is also known as “edge caching”, replicates copies of static files to servers around the world so that whenever someone downloads a static file the bits don’t have to travel across large distances.

It has often been a big hassle to geo-distribute / edge cache the static assets in web applications. The typical setups for utilizing a CDN involve complex deployment procedures and brittle architectures. But the value of edge caching the static assets is immense for a web application of any size. The most effective way to improve the overall performance of just about every web application is to use a CDN.

Let’s walk through how you can use the Amazon CloudFront CDN, Heroku, and Play 2 to transparently edge cache static assets. Feel free to follow along.

For this example we are going to load jQuery from a WebJar, so you can remove the play2-cloudfront/public/javascripts/jquery-1.7.1.min.js file. Now update the play2-cloudfront/project/Build.scala file to include a dependency on the jQuery WebJar and the WebJars repository:

Play has a simple static asset controller that serves files from the classpath (any Jar dependency or source directory). However, the Assets controller doesn’t provide a mechanism in its URL resolver (Play 2’s reverse routing) to change the URL of the asset. We will need this functionality later, because loading assets from CloudFront requires absolute URLs on a different domain name. To solve this we will create a new RemoteAssets controller that wraps the Assets controller and optionally adds a domain prefix in front of the resolved URLs.
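At its core, the prefixing described here is a string transformation: take the relative URL produced by reverse routing and, if a content URL is configured, put that domain in front of it. Here is a minimal sketch in plain Scala, independent of Play. The `AssetUrls.prefixed` name and the slash handling are assumptions for illustration, not the RemoteAssets controller’s actual code.

```scala
object AssetUrls {
  // Hypothetical helper: prefix a reverse-routed relative URL (e.g. "/assets/main.css")
  // with an optional content domain such as "http://example.cloudfront.net".
  def prefixed(contentUrl: Option[String], relativeUrl: String): String =
    contentUrl.map(_.trim).filter(_.nonEmpty) match {
      // Strip a trailing "/" from the domain so the result never contains "//" after the host.
      case Some(domain) => domain.stripSuffix("/") + ensureLeadingSlash(relativeUrl)
      // No content URL configured: keep the relative URL so assets load from the app itself.
      case None => relativeUrl
    }

  private def ensureLeadingSlash(path: String): String =
    if (path.startsWith("/")) path else "/" + path
}
```

Leaving the URL relative when nothing is configured means local development keeps working unchanged. Only the deployed app, with a content URL set, points its pages at the CDN.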

Create a new file named app/controllers/RemoteAssets.scala that contains:

This Scala object has a getAsset method that takes path and file parameters, returns the actual asset in the response, and adds a Date header to the response headers. The getUrl method takes a file parameter and returns a URL to the file. That URL will be prefixed by a contentUrl if one is provided in the application’s configuration. To set up the configuration so that a contentUrl can be optionally provided, add the following to the conf/application.conf file:

contenturl=${?CONTENT_URL}

If an environment variable named CONTENT_URL is set, the contenturl configuration parameter takes its value. If it is not set, contenturl is simply left undefined.
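The getAsset method is a wrapper: it delegates to the existing asset-serving logic and adds one header on the way out. The sketch below shows the same shape without Play. The `AssetResponse` type and `DateHeaderWrapper` object are hypothetical, and a plain function stands in for the Assets controller. The Date value is formatted with java.time’s `DateTimeFormatter.RFC_1123_DATE_TIME`, the format HTTP dates use.

```scala
import java.time.{Clock, ZoneOffset, ZonedDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for an HTTP response; not a Play type.
final case class AssetResponse(status: Int, headers: Map[String, String], body: Array[Byte])

object DateHeaderWrapper {
  // Formats the current instant as an HTTP date, e.g. "Tue, 3 Jun 2008 11:05:30 GMT".
  def httpDate(clock: Clock): String =
    DateTimeFormatter.RFC_1123_DATE_TIME.format(
      ZonedDateTime.now(clock).withZoneSameInstant(ZoneOffset.UTC))

  // Wraps an existing (path, file) => response function and adds a Date header,
  // mirroring how a RemoteAssets-style controller delegates to the built-in one.
  def withDateHeader(
      serve: (String, String) => AssetResponse,
      clock: Clock = Clock.systemUTC()
  ): (String, String) => AssetResponse =
    (path, file) => {
      val response = serve(path, file)
      response.copy(headers = response.headers + ("Date" -> httpDate(clock)))
    }
}
```

The clock is a parameter so the header can be checked deterministically. In real use the default system clock applies.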

Now let’s create and use some static content. First let’s write a little CoffeeScript that will use jQuery to fade an image in. This will help to illustrate how even compiled and minified assets can be loaded from the CDN. Create a new file named app/assets/javascripts/index.coffee containing:

$ ->
  $("img").fadeIn()

This script fades in all of the images on the page once the page has loaded.

Also update the public/stylesheets/main.css file to give the web page a new background color:

body {background-color:#ddddff;}

Now create a new server-side template that will load the stylesheet, jQuery (from the WebJar), the index.coffee script, and the public/images/favicon.png image. Create a new file named app/views/index.scala.html containing:

Notice how the getUrl method in the RemoteAssets controller is used to get a URL for each asset. Now we need a simple controller that will render the index template. Create a new file named app/controllers/Application.java containing:

If this is your first time using the Heroku Toolbelt then you will be led through the steps to associate an SSH key with your Heroku account. This SSH key will be used to authenticate your uploads via Git.

From the command line in your play2-cloudfront directory, create a new Git repository, add your files to it, and commit them:

git init
git add app conf project public
git commit -m init

From the command line in your play2-cloudfront directory, provision a new application on Heroku:

heroku create

This will create a new application with corresponding HTTP and Git endpoints (in my case peaceful-retreat-3158.herokuapp.com and git@heroku.com:peaceful-retreat-3158.git) and add a Git remote named heroku to your repository. Now deploy the application:

git push heroku master

This will push the master branch of your Git repository to the Git remote named heroku. When Heroku receives the files it will run the project build (SBT for Play 2 projects), then deploy and run the application. When the application is running you can access it in your browser:

heroku open

This time all five requests (the index page and the four static assets) go to Heroku.

The requests take quite a bit longer than locally, in part because the bits have a much larger distance to travel. All of the requests except the index page (because it’s dynamic) can be served from a CDN. Now let’s set up CloudFront to serve the static assets.

Serve Static Assets with CloudFront

CloudFront has a very simple way to load static assets into its CDN. When a request comes into CloudFront and the asset is not on the CDN or has expired, CloudFront can get the asset from an “origin server”. The application you just deployed on Heroku will now be the origin server for the static assets. To set up a new CloudFront “Distribution”:

In the Origin Domain Name field enter the domain name for your application on Heroku. In my case it is: peaceful-retreat-3158.herokuapp.com

Keep the other default values as-is and select Continue.

Do the same for the next two steps (keep the defaults).

Select Create Distribution.

It will now take about ten minutes for AWS to create the CloudFront distribution. You can monitor the status in the AWS Console. While you wait, take note of the domain name provided for your distribution. Mine is: d7471vfo50fqt.cloudfront.net

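Once the distribution is deployed, you can request any of your static assets through the distribution’s domain name instead of the Heroku one. The first time such a request goes through, CloudFront will make a request back to the app on Heroku and then load the asset into the CDN. If you examine the HTTP response headers on that request you will see: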

X-Cache: Miss from cloudfront

That indicates that the resource was not on the CDN. The response to a subsequent request should contain the following header:

X-Cache: Hit from cloudfront

That indicates that the resource was served from the CDN and there was no need to go back to the origin server.
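The Miss/Hit behaviour described here is a pull-through cache. The edge looks up the asset, and only on a miss (or an expired entry) fetches it from the origin and stores it. Below is a minimal sketch of that logic. It assumes an in-memory map, a single fixed TTL, and a caller-supplied clock. The `EdgeCache` class is hypothetical, and real CloudFront behaviour (honouring Cache-Control, many edge locations, invalidations) is considerably richer.

```scala
import scala.collection.mutable

// Hypothetical pull-through cache: origin fetch on a miss, edge copy on a hit.
final class EdgeCache(fetchFromOrigin: String => String, ttlMillis: Long, now: () => Long) {
  private final case class Entry(body: String, storedAt: Long)
  private val entries = mutable.Map.empty[String, Entry]

  // Returns the body plus the X-Cache value a client would see.
  def get(path: String): (String, String) = {
    val current = now()
    entries.get(path) match {
      case Some(entry) if current - entry.storedAt < ttlMillis =>
        (entry.body, "Hit from cloudfront")
      case _ =>
        // Not cached yet, or expired: go back to the origin server and refresh the entry.
        val body = fetchFromOrigin(path)
        entries.update(path, Entry(body, current))
        (body, "Miss from cloudfront")
    }
  }
}
```

The origin is contacted once per asset per TTL window, which is exactly why the first request is slow and the following ones are fast.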

Now that the static assets are loadable via CloudFront, let’s tell the app on Heroku and the RemoteAssets controller to point to them. Just set the CONTENT_URL environment variable on your application by running the following from the command line (make sure you replace the URL value with the one for the distribution you just created):

heroku config:add CONTENT_URL="http://d7471vfo50fqt.cloudfront.net"

Now test out your application on Heroku in your browser:

heroku open

You should now see all four static asset requests going to CloudFront.

But the assets won’t load very quickly on this first visit, because the first request for each asset is a Miss from cloudfront. Reload the page (clear your browser cache first to avoid 304s) and you should see much faster responses.

And now your static assets are being edge cached!
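The 304s mentioned earlier come from conditional requests. The browser sends back the validator it received last time, for example an ETag in an If-None-Match header. If nothing has changed, the server answers 304 Not Modified with no body, so a warm browser cache hides the CDN’s real response times. The sketch below shows that check in simplified form. The `ConditionalGet` object is hypothetical. It compares ETags as exact strings and ignores weak (`W/`) validators and Last-Modified / If-Modified-Since.

```scala
object ConditionalGet {
  // Returns 304 when the client's If-None-Match header matches the current ETag, else 200.
  // Simplification: ETags are compared as exact strings; "W/" weak validators are not handled.
  def status(currentEtag: String, ifNoneMatch: Option[String]): Int =
    ifNoneMatch match {
      case Some(header) =>
        val candidates = header.split(",").map(_.trim).filter(_.nonEmpty)
        if (candidates.contains("*") || candidates.contains(currentEtag)) 304 else 200
      case None => 200
    }
}
```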

Learn More

Using a CDN is the first step toward significantly speeding up your web applications, but there is certainly more that you can do. By default Play sets the expiration time of static assets to 1 hour (via the Cache-Control response header):

Cache-Control: max-age=3600
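With max-age=3600, a cache treats its copy as fresh for 3600 seconds after the response was generated. The response’s Date header marks that starting point. The sketch below shows the freshness check in simplified form. It assumes the Date and Cache-Control headers are the only inputs, whereas the full HTTP caching rules also account for the Age header, response delays, and other directives. The `Freshness` object is hypothetical.

```scala
import java.time.{Duration, Instant, ZonedDateTime}
import java.time.format.DateTimeFormatter

object Freshness {
  // Matches a single "max-age=<seconds>" directive (regex extractors match the whole string).
  private val MaxAge = """max-age=(\d+)""".r

  // Extracts the max-age directive, in seconds, from a Cache-Control header if present.
  def maxAgeSeconds(cacheControl: String): Option[Long] =
    cacheControl.split(",").map(_.trim.toLowerCase).collectFirst {
      case MaxAge(seconds) => seconds.toLong
    }

  // True if a response with these headers is still fresh at `now`.
  def isFresh(dateHeader: String, cacheControl: String, now: Instant): Boolean = {
    val generated =
      ZonedDateTime.parse(dateHeader, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant
    maxAgeSeconds(cacheControl).exists { maxAge =>
      Duration.between(generated, now).getSeconds < maxAge
    }
  }
}
```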

You can change that default max-age in the application.conf file. Often you will also want to use far-future expiration times, together with a file naming convention that forces clients to fetch a new version of a static asset when it changes.
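One common naming convention is to embed a hash of the file’s contents in its name. The file can then be served with a far-future expiration, and any change to its contents produces a new URL that clients have to fetch. Here is a minimal sketch using `java.security.MessageDigest`. The `name.<hash>.ext` layout, the use of MD5, and the 10-character truncation are illustrative assumptions. The `Fingerprint` object is hypothetical.

```scala
import java.security.MessageDigest

object Fingerprint {
  // Hex-encoded MD5 digest of the contents (adequate for cache busting, not for security).
  def md5Hex(contents: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(contents).map(b => f"${b & 0xff}%02x").mkString

  // Inserts a shortened content hash before the extension: "main.css" -> "main.<hash>.css".
  def fingerprintedName(fileName: String, contents: Array[Byte], hashLength: Int = 10): String = {
    val hash = md5Hex(contents).take(hashLength)
    fileName.lastIndexOf('.') match {
      case -1  => s"$fileName.$hash"
      case dot => s"${fileName.substring(0, dot)}.$hash${fileName.substring(dot)}"
    }
  }
}
```

Because the name changes whenever the bytes change, neither CloudFront nor the browser ever needs to revalidate an old name. The HTML that references the file is the only thing that has to be fresh, and that page is dynamic anyway.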

A bit off topic, but if I were to set up RemoteAssets as a class and not an object (and inject dependencies via Guice or whatever), how do I call it in the template? (i.e. @RemoteAssets.getUrl won’t work)

http://www.jamesward.com James Ward

You can define it as a class and then have an object that extends that class for the getUrl method.

Patrick Simon

Cheers James, I thought I could magically reference the instantiated object, but it looks like I need to import it at the top of the template in the normal way.
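James’s suggestion in the reply above could look roughly like this. A class holds the logic and its dependencies, and an object extends that class so templates still have a stable name to import and call. The `ContentUrlSource` trait, `RemoteAssetUrlResolver` class, and `RemoteAssetUrls` object are hypothetical names. The URL logic is simplified and independent of Play’s reverse routing.

```scala
// Hypothetical dependency that a DI framework such as Guice could supply.
trait ContentUrlSource {
  def contentUrl: Option[String]
}

// Default source: read the same CONTENT_URL environment variable used in the article.
object EnvContentUrlSource extends ContentUrlSource {
  def contentUrl: Option[String] = sys.env.get("CONTENT_URL").filter(_.nonEmpty)
}

// The logic lives in a class so its dependency can be passed in (or injected) and tested.
class RemoteAssetUrlResolver(source: ContentUrlSource) {
  def getUrl(relativeUrl: String): String =
    source.contentUrl match {
      case Some(domain) => domain.stripSuffix("/") + relativeUrl
      case None         => relativeUrl
    }
}

// A singleton extending the class gives templates a stable name to import and call.
object RemoteAssetUrls extends RemoteAssetUrlResolver(EnvContentUrlSource)
```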