Details

If your web site is flourishing, you've probably realized that success is both a blessing and a curse. On the one hand, a growing audience and increased traffic further your goal, be it communication, commerce, or community. On the other hand, the additional page views and the attendant demands on bandwidth, processor cycles, and storage may seem daunting. Ironically, mushrooming traffic can torpedo a burgeoning site.

But before you run off to lease more bandwidth or purchase additional rack-mount servers, consider delegating some of the busy work your servers currently contend with to Amazon Simple Storage Service (Amazon S3) and the Amazon CloudFront content-delivery network. In particular, Amazon S3 and Amazon CloudFront can unburden your servers from the chore of serving static content. Delivering assets from a specialized network reserves compute power for computation.

Moreover, you can serve that content more quickly. Typically, the more distant the server (as measured by network topology), the greater the latency, or the time required to establish a connection to a server. Amazon CloudFront both reduces latency and hastens each download. Amazon CloudFront edge locations are distributed worldwide and are likely closer to your visitor than your own server farm. Further, Amazon CloudFront has been architected and optimized to deliver content such as images and stylesheets.

As you'll see, integrating your application with Amazon S3 and Amazon CloudFront is a snap. With a bit of work, you can do more with the hardware and software you already have deployed.

Amazon-ian Compute Power

Figures 1 and 2 depict the difference between an entirely self-hosted application and one that delegates asset trafficking to the Amazon CloudFront content-delivery network.

In the self-hosted solution, captured in Figure 1, example.com hosts its site on servers in Austin, Texas. The system handles all requests: dynamic web pages, static web pages, and static assets such as images, movies, Cascading Style Sheet (CSS) files, and JavaScript files. Every site visitor, independent of location (here, Maui, Hawaii; Los Angeles, California; Lincoln, Nebraska; New York, New York; and Miami, Florida) connects to the server in Austin. For the resident of Maui, the latency to download an image is the time required to traverse the Internet between the two endpoints.

Figure 1. A centralized, stand-alone server farm must process all incoming requests. The capacity of the servers is diluted by busy work.

In the alternate solution shown in Figure 2, combining example.com's existing infrastructure with Amazon S3 and Amazon CloudFront, requests for dynamic content (shown as dash-dotted lines) are fulfilled by the Austin server. However, requests for static content (the dashed lines) are sent instead to the closest Amazon CloudFront edge location (each edge location is depicted as a star). If the edge location does not yet have the requested asset, it pulls a copy from Amazon S3, caches the asset locally, and ultimately fulfills the request. From then on and until the asset is expired from the edge location's cache, all subsequent requests for the asset can be served proximately and expeditiously.

Figure 2. Amazon CloudFront, a content-delivery network, delegates delivery of static content to a specialized, distributed, and proximate network of servers. (The maps shown here are illustrative only and do not necessarily represent the topology of Amazon S3 and Amazon CloudFront.)

Thus, assets that are oft-requested tend to remain in the edge location cache, translating to lower latency and improved download times for those nearby.

The division of labor between the example.com server and the edge locations is transparent to the web surfer. Rather than emit Hypertext Markup Language (HTML) that points to an image on example.com, the application server emits a URL that points to the Amazon CloudFront network. The browser rolls merrily along unperturbed.

There is one wrinkle in this scheme: an edge location can serve stale data. If the edge location has an asset in its cache, it need not return to the Amazon S3 bucket to fetch the asset again. Hence, if you change the asset on Amazon S3 (crop a photo, for instance), what is available in the remote cache differs from the canonical source. You can mitigate this issue by setting short expiries on Amazon CloudFront, but this defeats the purpose and real advantages of a long-lived cache.
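The stale-cache behavior can be illustrated with a toy model. The sketch below is not how Amazon CloudFront is implemented; the EdgeLocation class, the hash standing in for the Amazon S3 bucket, and the one-hour expiry are all assumptions made for illustration.

```ruby
# Toy model of an edge location: a pull-through cache with a fixed expiry.
# All names here are hypothetical and exist only for this illustration.
class EdgeLocation
  def initialize(origin, ttl_seconds)
    @origin = origin      # a Hash standing in for the Amazon S3 bucket
    @ttl = ttl_seconds    # how long a cached copy is considered fresh
    @cache = {}           # key => [body, time the copy was fetched]
  end

  # Return the asset, going back to the origin only on a miss or after expiry.
  def get(key, now)
    body, fetched_at = @cache[key]
    if body.nil? || now - fetched_at >= @ttl
      body = @origin.fetch(key)
      @cache[key] = [body, now]
    end
    body
  end
end

origin = { 'logo.png' => 'version 1' }
edge = EdgeLocation.new(origin, 3600)

edge.get('logo.png', 0)             # miss: pulled from the origin and cached
origin['logo.png'] = 'version 2'    # the asset changes at the origin
puts edge.get('logo.png', 60)       # still "version 1": the cached copy is stale
puts edge.get('logo.png', 3600)     # "version 2": the cached copy has expired
```

Between the change at the origin and the expiry, every visitor served by this edge location receives the old bytes.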

The best solution is to uniquely name each and every version of each asset and modify your application to request a specific version. For example, instead of generating <img src="logo.png" />, the application might generate <img src="logo_200903222305.png" />, where logo_200903222305.png is the latest version and 200903222305 reflects the last-modified time of the file, or March 22, 2009, at 11:05 p.m. You can concoct any number of unique naming schemes, but choose one and retrofit your code and build system to suit.
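As a minimal sketch of this naming scheme, the helpers below derive a versioned file name from a base name and a last-modified time. The method names versioned_file_name and versioned_image_tag are hypothetical; any scheme that yields a unique name per version serves the same purpose.

```ruby
# Build a versioned file name such as logo_200903222305.png from a file
# name and the file's last-modified time (hypothetical helper).
def versioned_file_name(file_name, modified_at)
  extension = File.extname(file_name)
  base = File.basename(file_name, extension)
  "#{base}_#{modified_at.strftime('%Y%m%d%H%M')}#{extension}"
end

# Emit an image tag that requests one specific version of the asset.
def versioned_image_tag(file_name, modified_at)
  %(<img src="#{versioned_file_name(file_name, modified_at)}" />)
end

# March 22, 2009, at 11:05 p.m.
puts versioned_image_tag('logo.png', Time.local(2009, 3, 22, 23, 5))
# prints <img src="logo_200903222305.png" />
```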

The next section presents a complete example. If you want a fresh computer to develop and test this example, launch an instance on Amazon Elastic Compute Cloud (Amazon EC2). Several images suffice, including the Rails on Ubuntu Intrepid image, AMI ID ami-e1937488. The example is constructed with Ruby on Rails, but you can construct the equivalent in other popular programming languages and other platforms.

Getting Started with Rails

Let's create a small Ruby on Rails application and integrate it with Amazon S3 and Amazon CloudFront to serve static content from the network of edge locations. The application is a simple photo manager. You can upload and caption images; once uploaded, all the images are served by Amazon S3 and Amazon CloudFront.

If you're using the recommended Amazon EC2 instance, all the required software is already installed. If your computer does not have Ruby, Rails, SQLite, or ImageMagick, packages are widely available in binary form for many platforms. If you use Linux, all the software is likely available via your package manager, such as Aptitude.

You will also need two additional Ruby gems: the RightScale Amazon Web Services gem and Paperclip. The former provides a robust, fast, and secure interface to Amazon Web Services (AWS); the latter processes file uploads and associates each attachment with a Rails model. Paperclip can also store attachments directly on Amazon S3, which makes it ideal for this demonstration.

After you install Ruby, Ruby on Rails, SQLite, and ImageMagick, install the two gems with the gem command: gem install right_aws paperclip

Paperclip provides has_attached_file, which performs all the heavy lifting required to save an uploaded file and associate it with a record. The configuration for Picture also creates two thumbnail styles: thumb and medium (one small and one large, respectively). The rest of the options configure access to Amazon S3:

:storage => :s3 saves attachments to Amazon S3. If you omit this element, Paperclip saves attachments to the local file system within the application's public folder.

:bucket => BUCKET selects a bucket. You should edit the constant BUCKET and replace cloudfront-demo with the name of one of your own buckets.

:path describes how each file should be stored within a bucket. In addition to literals, you can use placeholders that are filled in when each attachment is actually saved. The :attachment element is replaced with the pluralized name of the attachment (here, photos); :id stands in for the ID of the model instance; :styles is replaced with the name of the image style (here, originals for the full-sized attachment, and either thumbs or mediums for the thumbnail images); and :extension is the attachment's file extension.

:s3_credentials grants access to your bucket. You must replace the placeholder credentials, YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY, with your own access key ID and secret access key. (You can generate this pair of keys from the AWS site.)

As the owner of the bucket, you have full access implicitly. The s3_permissions setting, public-read, grants everyone else read access to the uploaded files.
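The options described above can be gathered into a single hash, sketched below in plain Ruby. This is a reconstruction under assumptions, not the original listing: the constant name PICTURE_ATTACHMENT_OPTIONS, the attachment name photo, and the two thumbnail geometries are hypothetical, and the path is inferred from the URLs shown in this article.

```ruby
# Name of your own Amazon S3 bucket; cloudfront-demo is the placeholder
# used throughout this article.
BUCKET = 'cloudfront-demo'

# Options for Paperclip's has_attached_file, collected in one hash.
PICTURE_ATTACHMENT_OPTIONS = {
  # Thumbnail styles; the geometry strings are assumptions.
  :styles => { :thumb => '100x100>', :medium => '300x300>' },
  # Store attachments on Amazon S3 rather than the local file system.
  :storage => :s3,
  :bucket => BUCKET,
  # Literal "uploads" followed by placeholders filled in at save time.
  :path => 'uploads/:attachment/:id/:styles.:extension',
  # Replace both placeholders with your own keys.
  :s3_credentials => {
    :access_key_id => 'YOUR_ACCESS_KEY_ID',
    :secret_access_key => 'YOUR_SECRET_ACCESS_KEY'
  },
  # Let everyone else read the uploaded files.
  :s3_permissions => 'public-read'
}

# In app/models/picture.rb the hash would be used like this:
#
#   class Picture < ActiveRecord::Base
#     has_attached_file :photo, PICTURE_ATTACHMENT_OPTIONS
#   end
```

Keeping real credentials out of the model file, for example in a configuration file excluded from version control, is generally a safer choice than embedding them as shown here.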

Figure 3 shows what an Amazon S3 bucket looks like after three additions to the picture catalog. (The application shown is the Firefox S3 Organizer plug-in, which can navigate and manage Amazon S3 files, access control lists [ACLs], and more. You use it later in this article to configure Amazon CloudFront, as well.) Look at the path shown in the text field. Seem familiar? The path components of /cloudfront-demo/uploads/photos/4/ are the bucket name, the literal "uploads," the pluralized name of the attachment, and the ID of the model instance, respectively.

Figure 3. Paperclip can save attachments in your Amazon S3 bucket.

The next step is to add a Rails migration to modify the Picture model and add new columns to keep track of the attachment's file name, size, content type, and date saved. Create the migration with ruby script/generate.
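Paperclip looks for four columns named after the attachment. The sketch below lists them in plain Ruby, assuming the attachment is named photo and the table is named pictures; both names, and the constants ATTACHMENT and PAPERCLIP_COLUMNS, are assumptions made for illustration.

```ruby
# Name of the attachment declared in the model (assumed to be photo).
ATTACHMENT = :photo

# Column name => column type for the four attributes Paperclip tracks.
PAPERCLIP_COLUMNS = {
  "#{ATTACHMENT}_file_name"    => :string,    # original file name
  "#{ATTACHMENT}_content_type" => :string,    # MIME type, such as image/png
  "#{ATTACHMENT}_file_size"    => :integer,   # size in bytes
  "#{ATTACHMENT}_updated_at"   => :datetime   # date the attachment was saved
}

# Inside the generated migration, the columns could be added like this:
#
#   def self.up
#     PAPERCLIP_COLUMNS.each { |name, type| add_column :pictures, name, type }
#   end
#
#   def self.down
#     PAPERCLIP_COLUMNS.each_key { |name| remove_column :pictures, name }
#   end

PAPERCLIP_COLUMNS.each { |name, type| puts "#{name}: #{type}" }
```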

The last step in writing the application is to create the views to create, edit, view, and delete pictures. The generated controllers work as they are. The listings below are index.html.erb, show.html.erb, new.html.erb, and edit.html.erb, respectively.

Now, run the application. Type ./script/server and point your browser to http://localhost:3000. Walk through the interface, and upload a handful of files. Eventually, your index page should resemble Figure 4.

Figure 4. A snapshot of the image manager application.

If you examine your Amazon S3 bucket, it should resemble Figure 3. And if you view the HTML source of the application's index page, the URL for each image resembles https://s3.amazonaws.com/cloudfront-demo/uploads/photos/1/thumbs.png. With very little work, you've already separated the work of the application from the busy work of serving assets. The next step is to serve the assets even more quickly by referencing an edge location in closer proximity to your user.

Tying Rails to Amazon CloudFront

To distribute content from Amazon CloudFront, you create a distribution. A distribution connects an Amazon S3 bucket to a special domain name such as http://d138nkhrob277s.cloudfront.net. To serve content from the distribution, simply replace the Amazon S3 domain name and the name of the bucket in each URL with the domain name of the distribution.

For example, if you serve an image from your Amazon S3 bucket via https://s3.amazonaws.com/cloudfront-demo/uploads/photos/1/thumbs.png, you can serve the very same image via Amazon CloudFront by changing the URL to http://d138nkhrob277s.cloudfront.net/uploads/photos/1/thumbs.png. It's that easy!

Amazon offers a number of ways to create and manage distributions. The most convenient option is the Firefox S3 Organizer plug-in. Figure 5 shows the Firefox S3 Organizer plug-in and a pair of buckets created for this example. To create a distribution, right-click a bucket from the list, then click Manage Distributions.

Now that the distribution is ready, you can change the Rails code to use Amazon CloudFront. The changes are simple: In Picture, wrap the existing Paperclip url() method and replace the leading portion of the URL with the Amazon CloudFront URL, then update the views to use the new method.
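The prefix replacement can be sketched as a small plain-Ruby module. The module name CloudFrontUrl, its constants, and the attachment name photo in the commented model method are assumptions; substitute your own bucket name and distribution domain.

```ruby
# Swap the Amazon S3 prefix of a URL for the Amazon CloudFront domain.
# Module and constant names are hypothetical.
module CloudFrontUrl
  # Matches the Amazon S3 host and bucket at the start of the URL,
  # over either http or https.
  S3_PREFIX = %r{\Ahttps?://s3\.amazonaws\.com/cloudfront-demo}

  # Domain of the distribution used as the example in this article.
  DISTRIBUTION = 'http://d138nkhrob277s.cloudfront.net'

  def self.rewrite(s3_url)
    s3_url.sub(S3_PREFIX, DISTRIBUTION)
  end
end

# In the Picture model, the wrapper around Paperclip's url() might read:
#
#   def cloudfront_url(style = :original)
#     CloudFrontUrl.rewrite(photo.url(style))
#   end

puts CloudFrontUrl.rewrite(
  'https://s3.amazonaws.com/cloudfront-demo/uploads/photos/1/thumbs.png'
)
# prints http://d138nkhrob277s.cloudfront.net/uploads/photos/1/thumbs.png
```

A URL that does not begin with the Amazon S3 prefix passes through unchanged, so the method is safe to call on any attachment URL.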

And here is the one-line change to index.html.erb to use Amazon CloudFront. You can make similar changes to the other view templates, too.

<%= image_tag picture.cloudfront_url(:thumb) %>

Managing Versions of Files

The last task is to ensure that Amazon CloudFront doesn't serve stale static assets, such as data files and images.

To reiterate, Amazon CloudFront caches your assets in edge locations to accelerate delivery to nearby visitors. Assets do expire from the cache, but the expiry of a particular asset may be days or even weeks in the future, if your application has so deemed. (An expiry can be set by your application via a Cache-Control header in a response.) In the event you want to replace the asset with a new version of the asset, you must do something to effectively obsolete the older incarnation.
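As a rough illustration of such a header, the helper below builds a Cache-Control value from a lifetime in days. The method name cache_control_header is hypothetical, and how the header is attached to each object when it is uploaded to Amazon S3 depends on your upload library and is not shown.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Build a Cache-Control header that lets caches keep an asset for the
# given number of days (hypothetical helper).
def cache_control_header(days)
  { 'Cache-Control' => "max-age=#{days * SECONDS_PER_DAY}" }
end

puts cache_control_header(7)['Cache-Control']
# prints max-age=604800
```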

An easy approach is to simply assign the new version of the asset a new name. For example, if you change your company logo (an asset that typically changes very rarely and hence can be cached for long periods), you can update all uses of logo.png to logo_version2.png. Essentially, the older file is now obsolete and will be purged from each edge location cache in due course. However, as soon as the application references logo_version2.png, the edge locations will rush to the origin to cache the new file.

While the notion is simple (just create a new file for each new version), the mechanics of implementing the scheme are a bit more complicated. If you use a version control system to manage your assets, for instance, you probably want the file name to stay the same from revision to revision so that its history stays intact, too. And, given how many assets an application typically has, you don't want to search and replace asset references in every source code file every time you make a change.

What to do? Automate, of course, using rake.

To prepare, march through your code and replace all explicit references to an asset with a constant. For example, if you refer to logo.png, replace it with LOGO_PNG. Next, move all the assets in public to a new directory, app/assets. Ideally, app/assets, like the rest of your Rails application, is maintained by a version-control system such as Subversion or Git.

Next, create a rake task to append the last-modified time of each file (or, if you prefer, the MD5 hash of the contents of the file) to the asset's file name. This gives each version of a file a unique name, because a modified file carries a new timestamp (and changed contents yield a new hash). Finally, map the latest revision of the file to its constant and emit a list of constants to include in the application.

Here is the new rake task, which should be installed in lib/tasks/constants.rake in your Rails application.

There are two new tasks: constants:clean and constants:generate. The former task removes your public directory and the file config/initializers/constants.rb. Both are regenerated by the other new task, constants:generate.
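The core of a task like constants:generate could be sketched in plain Ruby as follows. This is an illustration under assumptions rather than the original task: the helper names are hypothetical, the timestamp format is inferred from the file name shown later in this article, and two assets with the same base name in different folders would collide on one constant.

```ruby
require 'fileutils'
require 'pathname'

# Turn a path such as images/rails.png into the constant name RAILS_PNG.
def constant_name(path)
  File.basename(path).upcase.gsub(/[^A-Z0-9]+/, '_')
end

# Append the file's last-modified time to its base name,
# for example rails_03232009160913.png.
def versioned_name(path)
  extension = File.extname(path)
  stamp = File.mtime(path).strftime('%m%d%Y%H%M%S')
  "#{File.basename(path, extension)}_#{stamp}#{extension}"
end

# Copy every asset under source_dir into the same relative folder under
# public_dir using its versioned name, and return the Ruby source that
# defines one constant per asset.
def generate_constants(source_dir, public_dir)
  source_root = Pathname.new(source_dir)
  public_root = Pathname.new(public_dir)
  assets = Dir.glob(File.join(source_dir, '**', '*')).select { |f| File.file?(f) }.sort

  lines = assets.map do |asset|
    relative_dir = Pathname.new(asset).relative_path_from(source_root).dirname
    target_dir = (public_root + relative_dir).to_s
    FileUtils.mkdir_p(target_dir)
    name = versioned_name(asset)
    FileUtils.cp(asset, File.join(target_dir, name))
    "#{constant_name(asset)} = '#{name}'"
  end
  lines.join("\n") + "\n"
end

# Inside the rake task, the result would be written to the initializer:
#
#   source = generate_constants('app/assets', 'public')
#   File.open('config/initializers/constants.rb', 'w') { |f| f.write(source) }
```

Because the copies live in public and the originals keep their names in app/assets, the version-control history of each asset stays intact.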

Let's try it. Open the index.html.erb template and add the Rails logo to the head of the page with the line:

<%= image_tag RAILS_PNG %>

Next, copy the contents of the public folder into app/assets, run the two new rake tasks, and start the server.

If you look at the HTML source of your index page, you should see <img alt="Rails_03232009160913" src="/images/rails_03232009160913.png" />. That's good.

The last step is to move the image to your Amazon S3 bucket and point the <img ...> tag to the Amazon CloudFront domain.

Use the Firefox S3 Organizer to create a folder named images in your bucket and upload the Rails PNG file. Make sure to make both the folder and the image world-readable.

To point your image tag URLs to Amazon CloudFront, set the Rails asset_host to the domain of your distribution. The asset_host, if set, prefaces URLs for images and other static assets with the domain you specify. Open environment.rb, and add the following line:
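A line of roughly the following shape, placed inside the Rails::Initializer.run block of environment.rb, sets the asset host. The domain shown is the example distribution used earlier in this article and is an assumption here; substitute the domain of your own distribution.

```ruby
# config/environment.rb (fragment; runs only inside a Rails application)
Rails::Initializer.run do |config|
  # Prefix image, stylesheet, and JavaScript URLs with the distribution domain.
  config.action_controller.asset_host = 'http://d138nkhrob277s.cloudfront.net'
end
```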

Stop and restart the server, and revisit your index page. The URL for the Rails logo should now point to your Amazon CloudFront distribution.

Amazon CloudFront: A Silver Lining

With only a modicum of coding and some work to establish an initial Amazon CloudFront presence, the sample Rails application now delegates a good amount of overhead to Amazon CloudFront's edge locations. The cost? Negligible. The potential reward? Considerable.