February 2, 2011


Using Amazon’s Cloudfront to host static files

The major players online use CDNs (Content Delivery Networks) to serve their static files, such as images, CSS, and JavaScript files. CDNs have edge locations scattered around the world, so that when a user in Japan (for example) visits your page, he’ll be served all the static content from a server in Tokyo, even though your main web server may be located in the US.

In recent years, a number of CDNs have emerged that can affordably serve content for web sites that don’t have million dollar budgets. Amazon’s Cloudfront service is probably the cheapest, most accessible such CDN.

I began testing out Cloudfront for serving GoToQuiz files last December. I was curious whether doing so would have a positive effect on page load speed. Yahoo’s YSlow tool pushes CDN usage as an effective way to give your site a speed boost, so I figured I’d see some positive results. Via Google’s Webmaster Tools (Labs > Site performance), it seems there was a slight improvement in page load speed. Cloudfront may have shaved off a bit less than half a second in average load time. Nothing dramatic, but every bit helps.

January was the first month in which I had the majority (~90%) of the static files on Cloudfront. As a result, Cloudfront handled about 8 million requests for me. That was 8 million requests my server did not have to handle. Unburdening my server is important. In fact, my main concern in trying out a CDN is not page speed but the ability to withstand traffic spikes.

GoToQuiz has had traffic spikes in the past which put significant strain on the server, on one occasion rendering it completely unresponsive. When I dug into the logs and looked at the Apache processes, I determined that the server was dedicating a tremendous amount of resources not to database queries or serving the actual pages, but to handling the dozens of requests for static files used on each page.

Consider: if your typical page utilizes 20 static files, it means each page view results in 21 requests to your server. If you move all 20 of those files onto a CDN, each page view will burden your server with only one request–not 21.
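The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name and the rounding of partially offloaded files are my own assumptions, and the 20-file page is the hypothetical from the paragraph above.

```python
def origin_requests_per_page_view(static_files: int, fraction_on_cdn: float) -> int:
    """Requests that still reach your own server for a single page view.

    One request is always needed for the page itself; every static file
    that has NOT been moved to the CDN costs one more.
    """
    if static_files < 0:
        raise ValueError("static_files must be non-negative")
    if not 0.0 <= fraction_on_cdn <= 1.0:
        raise ValueError("fraction_on_cdn must be between 0 and 1")
    offloaded = round(static_files * fraction_on_cdn)
    return 1 + (static_files - offloaded)


if __name__ == "__main__":
    for fraction in (0.0, 0.9, 1.0):
        hits = origin_requests_per_page_view(20, fraction)
        print(f"{fraction:.0%} of files on the CDN: {hits} requests to your server")
```

With nothing on the CDN the server sees 21 requests per page view, with 90% of the files moved it sees 3, and with everything moved it sees 1.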

So with this in mind, I believe using Cloudfront has made GoToQuiz better able to withstand traffic spikes. I should also mention that you’d do well to reduce your Apache KeepAliveTimeout setting. This setting tells Apache how long to wait for additional requests on the same connection. The default of 15 seconds is quite a long time to tie up each of your Apache processes, especially if your server is getting hammered. If you’ve shifted all or most of your static content onto a CDN, there’s really no sense in having a long KeepAliveTimeout. I’ve reduced this setting to 5 seconds.
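To get a feel for why the timeout matters, here is a rough back-of-the-envelope sketch. It assumes one Apache process per connection (as with the prefork MPM), assumes clients never close the connection early, and uses a made-up rate of 20 new connections per second, so treat the output as a worst-case estimate rather than a measurement.

```python
def idle_keepalive_processes(new_connections_per_second: float,
                             keepalive_timeout_s: float) -> float:
    """Worst-case number of processes parked in keep-alive wait.

    By Little's law, (arrival rate) x (time each connection lingers)
    gives the average number of connections lingering at once. Here
    every connection is assumed to idle for the full timeout.
    """
    return new_connections_per_second * keepalive_timeout_s


if __name__ == "__main__":
    rate = 20  # hypothetical new connections per second during a spike
    for timeout in (15, 5):
        parked = idle_keepalive_processes(rate, timeout)
        print(f"KeepAliveTimeout {timeout}s: up to {parked:.0f} processes waiting idle")
```

Under those assumptions, dropping the timeout from 15 seconds to 5 cuts the idle processes from roughly 300 to roughly 100 at the same traffic level.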

So how happy am I with Cloudfront? With its pay-as-you-go approach (no upfront fees or commitments), it is the most compelling option for experimentation. It is frustrating to use, however. Amazon’s web interface is slow and lacking in functionality. You must use a third-party tool to get anything done. I searched for a non-crippled free tool to interface with Cloudfront, and I found a decent one: CloudBerry S3 Explorer. I give this tool a high recommendation.

Part of the frustration with Cloudfront is that you must use it with Amazon’s S3 service. That is, you must first upload your files to S3 buckets before Cloudfront can serve them. Other, better CDNs will pull the files right from your web server. After you’ve uploaded your files to S3, you must remember to change their permission settings to allow the world to view them. If you forget this step, Cloudfront will serve an error and will cache the error for all subsequent requests. You must send an invalidation request for each errored file, and then wait for your request to be honored (~10 minutes in my experience)–thankfully the CloudBerry tool makes this relatively painless.
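If you would rather script these two chores than click through a tool, something along these lines should work. This is a sketch, not what I actually ran: it assumes the third-party boto3 library (Amazon's Python SDK) with credentials already configured, a bucket that permits public ACLs, and placeholder bucket and distribution IDs. The helper names are hypothetical.

```python
import time


def build_invalidation_batch(keys, caller_reference=None):
    """Build the InvalidationBatch payload for a Cloudfront invalidation.

    Paths must start with "/", and Quantity must match the number of
    items, so duplicates are dropped before counting.
    """
    paths = []
    for key in keys:
        path = key if key.startswith("/") else "/" + key
        if path not in paths:
            paths.append(path)
    if caller_reference is None:
        # Any string unique to this request will do; a timestamp is an assumption.
        caller_reference = f"invalidate-{int(time.time())}"
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        "CallerReference": caller_reference,
    }


def make_public(bucket, key):
    """Let the world read one S3 object (the step that is easy to forget)."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies
    boto3.client("s3").put_object_acl(Bucket=bucket, Key=key, ACL="public-read")


def invalidate(distribution_id, keys):
    """Ask Cloudfront to drop its cached copies (including cached errors)."""
    import boto3
    return boto3.client("cloudfront").create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(keys),
    )
```

A typical recovery would be `make_public("my-bucket", "css/site.css")` followed by `invalidate("MY_DISTRIBUTION_ID", ["css/site.css"])`, where both identifiers are placeholders for your own.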

Another problem with Cloudfront is that it does not send cache control headers by default. You must manually set either the Expires or Cache-Control header on your uploaded files, or else visitors to your site will be requesting the same files again and again. Repeated requests for the same files are not only inefficient; they will also drive up your Cloudfront costs. In my case, GoToQuiz would’ve generated far more than 8 million Cloudfront requests had I not set a cache control header. I set the Cache-Control header to “max-age=31536000” (one year, in seconds), which does the job. Mercifully, the CloudBerry tool allows you to set this on multiple files at once. Other tools (such as S3 Browser) require you to upgrade to a paid version for this functionality, which renders the free version completely unusable.
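For anyone uploading from a script instead, the header can be set at upload time. The sketch below again assumes the third-party boto3 library and uses hypothetical helper names; it is one way to do it, not the way CloudBerry does it internally.

```python
import mimetypes

# 365 days x 24 hours x 60 minutes x 60 seconds = 31536000
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def static_upload_args(filename, max_age=ONE_YEAR_SECONDS):
    """Metadata to attach to a static file so browsers cache it."""
    args = {
        "ACL": "public-read",                  # world-readable, so Cloudfront can serve it
        "CacheControl": f"max-age={max_age}",  # how long browsers may reuse their copy
    }
    content_type, _ = mimetypes.guess_type(filename)
    if content_type:
        args["ContentType"] = content_type
    return args


def upload_static_file(local_path, bucket, key):
    """Upload one file to S3 with the permissions and headers set in one go."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies
    boto3.client("s3").upload_file(
        local_path, bucket, key, ExtraArgs=static_upload_args(key)
    )
```

Because the files are cached for a year, a changed file needs a new name (for example, a version number in the filename), or returning visitors will keep using their old copy.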

So in conclusion, I recommend Cloudfront with caveats. Amazon’s web interface to S3 and Cloudfront is buggy and slow. Cloudfront’s inability to pull files from your web server is inconvenient, as is the need to manually set permissions and headers for your files (a global default would be nice). Cloudfront also does not support gzipping your CSS and JavaScript unless you resort to a very clumsy workaround. However, the price is right, and there are no contracts or commitments. Amazon has ten edge locations in the US, four in Europe, and three in Asia, giving you a global presence for a bargain. I’m interested in investigating other CDN services, though. MaxCDN looks relatively affordable, and I believe it does not have some of Cloudfront’s shortcomings. I will be sticking with Cloudfront in the near term, however.

Thank you for the sensible critique. My neighbor and I were just preparing to do a little research about this. We got a book from our local library, but I think I learned more from this post. I am very glad to see such magnificent information being shared freely out there.

We’ve been using S3 to handle all our file archives for SetStrike (print a file and the PDF copy goes to Amazon). We decided to keep our JavaScript files and CSS on a public bucket on S3.
That was pretty slow, so we tried CloudFront… even worse.

Finally, one day, we made a frustrated decision and just gave up and put them back on our own servers.

However… now that you pointed out that there are no cache control headers on CloudFront, I’m smacking myself. So, they may go back up.

Thanks,

I’d also like to give a big thumbs up to Amazon’s S3 service. Using that to manage hundreds of thousands of archive files has been great.

“Amazon CloudFront is optimized to work with other Amazon Web Services, like Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2). Amazon CloudFront also works seamlessly with any origin server, which stores the original, definitive versions of your files.”