some practical caching notes

Cache-Control header

public

private

max-age

must-revalidate: "If the response includes the 'must-revalidate' cache-control directive, the cache MAY use that response in replying to a subsequent request. But if the response is stale, all caches MUST first revalidate it with the origin server, using the request-headers from the new request to allow the origin server to authenticate the new request" (HTTP 1.1 spec)

no-cache: when this directive is present, "...a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests" (HTTP 1.1 spec)

"...in practice, IE and Firefox have started treating the no-cache directive as if it instructs the browser not to even cache the page. We started observing this behavior about a year ago. We suspect that this change was prompted by the widespread (and incorrect) use of this directive to prevent caching." Cache Control Directives Demystified

McAfee Web Gateway was not revalidating responses sent with public, max-age=31557600, no-cache, so corporate clients behind it did not see changes. It seems the McAfee cache was honoring max-age and ignoring no-cache. We changed the header to public, max-age=0, must-revalidate. That way we continue to get caching in Firefox and IE, while intermediary caches see that the response is stale and that they must revalidate it.
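To make the difference between the two header values concrete, here is a minimal Ruby sketch of how a cache might decide between serving a stored response and revalidating it. The method names and the honor_no_cache flag are hypothetical; the flag only models the behavior we suspected in the gateway and is not based on any McAfee documentation. The sketch also simplifies the spec: a missing max-age is treated as 0, and a stale response is always revalidated.

```ruby
# Parse a Cache-Control value into a hash of directives, e.g.
# "public, max-age=0" becomes {"public" => true, "max-age" => "0"}.
def parse_cache_control(value)
  value.split(",").each_with_object({}) do |part, directives|
    name, arg = part.strip.split("=", 2)
    next if name.nil? || name.empty?
    directives[name.downcase] = arg.nil? ? true : arg.delete('"')
  end
end

# Decide what a cache does with a stored response.
# age is the number of seconds since the response was stored.
# honor_no_cache: false models a cache that ignores no-cache
# (an assumption about the misbehaving gateway, not a documented fact).
def cache_action(cache_control, age, honor_no_cache: true)
  directives = parse_cache_control(cache_control)
  return :revalidate if honor_no_cache && directives.key?("no-cache")

  # Simplification: no max-age is treated as max-age=0.
  max_age = directives["max-age"].to_s.to_i
  age < max_age ? :serve_from_cache : :revalidate
end

old_header = "public, max-age=31557600, no-cache"
new_header = "public, max-age=0, must-revalidate"

puts cache_action(old_header, 60)                        # well-behaved cache
puts cache_action(old_header, 60, honor_no_cache: false) # cache ignoring no-cache
puts cache_action(new_header, 60, honor_no_cache: false) # same cache, new header
```

With the old header, the outcome depends on whether the cache honors no-cache. With the new header, both kinds of cache find the response stale and revalidate.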

checking cache settings in Chrome

Pressing return in the address bar = refresh = revalidate using the validators from the previous response (If-Modified-Since, If-None-Match); the response may be new content or 304 Not Modified

Shift + refresh = send the request without If-Modified-Since or If-None-Match and with no-cache headers, forcing the server to send the content again

Only navigation via clicks seems to use the local cache without revalidating
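The two kinds of refresh differ only in the request headers. The Ruby sketch below builds both requests with Net::HTTP without sending them. The method names, URL, and validator values are hypothetical, and the exact headers a browser adds on a forced reload (Cache-Control and Pragma here) are an assumption based on the behavior described above.

```ruby
require "net/http"
require "uri"

# Plain refresh: a conditional request carrying the validators
# (ETag, Last-Modified) saved from the previously cached response.
# The server may answer 304 Not Modified with no body.
def revalidation_request(uri, etag, last_modified)
  request = Net::HTTP::Get.new(uri)
  request["If-None-Match"] = etag
  request["If-Modified-Since"] = last_modified
  request
end

# Shift + refresh: no validators, plus no-cache request headers,
# so the server has to send the full content again.
def forced_reload_request(uri)
  request = Net::HTTP::Get.new(uri)
  request["Cache-Control"] = "no-cache"
  request["Pragma"] = "no-cache"
  request
end

# Hypothetical URL and validator values, for illustration only.
uri = URI("https://example.com/styles.css")
conditional = revalidation_request(uri, '"abc123"', "Tue, 01 Jan 2013 00:00:00 GMT")
forced = forced_reload_request(uri)

puts conditional["If-None-Match"]
puts forced["Cache-Control"]
```

To actually send one of these, pass it to Net::HTTP#request on an open connection and check whether the response code is 200 or 304.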

reverse proxy caches (web accelerators)

A reverse proxy cache sits between the internet and the origin server. It receives incoming requests and forwards to the origin server only those it cannot fulfill itself (based on the headers the origin server sent when it returned each object).

As an added benefit, the reverse proxy cache can terminate SSL to reduce load on the origin server.

Varnish

run your own Varnish instances (there is also Fastly, which is globally distributed Varnish as a service, including VCL support)

instantly purge items from Varnish

health-check the back end and shield it from traffic when it is down

grace period: when an object has expired but is requested again within its grace period, Varnish can fulfill the request immediately with the expired version instead of making the requester wait for the fresh one (it still fetches the fresh resource, it just does not make the requester wait for it); make sure max-age > 0 or this grace period can be confusing
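The grace-period decision can be sketched in a few lines of Ruby. This is only an illustration of the logic described above, not Varnish's implementation; the CachedObject struct, the method name, and the result symbols are all hypothetical, and times are plain numbers of seconds.

```ruby
# Hypothetical cached object: when it was stored, its TTL (max-age),
# and the length of its grace window, all in seconds.
CachedObject = Struct.new(:stored_at, :ttl, :grace)

# Decide how to answer a request arriving at time `now`.
def grace_decision(object, now)
  age = now - object.stored_at
  if age < object.ttl
    :serve_fresh                           # still within max-age
  elsif age < object.ttl + object.grace
    :serve_stale_and_refresh_in_background # expired, but within grace
  else
    :fetch_and_wait                        # too old, requester waits
  end
end

object = CachedObject.new(0, 60, 30)
puts grace_decision(object, 10)
puts grace_decision(object, 70)
puts grace_decision(object, 100)

# With a TTL of 0 the object is never fresh, so any request inside the
# grace window is answered with the stale copy.
zero_ttl = CachedObject.new(0, 0, 30)
puts grace_decision(zero_ttl, 1)
```

The last case is one plausible reading of why max-age=0 is confusing with grace enabled: a request that arrives right after a change can still be answered with the old copy.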

caching Paperclip files in CloudFront

Since CloudFront can use an S3 bucket as the origin of a distribution, this is trivial.

This is only suitable for assets that never change (or for a few other specific use cases) because invalidating CloudFront objects takes on the order of 10 minutes.

Note our experience with videos

caching the asset pipeline in CloudFront

The Rails asset pipeline is a perfect candidate for CloudFront because the precompiled assets have file names fingerprinted from their contents. For this to work, you must adhere strictly to the asset pipeline when including assets; that is, always use asset_path & co. for images, etc. The approach below involves precompiling assets locally (or at least not on the production server) and moving them to S3:

precompile_and_deploy bash script (since we are always using the fingerprinted file names, we can cut the precompilation time in half by doing rake assets:precompile:primary, which only computes the hashed version of the files)

since S3 does not perform content negotiation (it cannot choose a Content-Encoding per request), you can either serve everything gzipped or back CloudFront with a server that handles content negotiation, in which case CloudFront will properly deliver subsequent requests based on the Content-Encoding header; the example VCL file shows how to do this using Varnish

in order for asset pipeline helpers to generate URLs pointing at CloudFront, add the following lines to config/environments/production.rb:

    # Enable serving of images, stylesheets, and JavaScripts from an asset server
    # make asset_path generate // format URLs in web pages so they take on the protocol of the page
    config.action_controller.asset_host = "//d3l5bx7ow11yzt.cloudfront.net"
    # action_mailer line needs a protocol because email clients don't have a protocol to inherit
    config.action_mailer.asset_host = "https://d3l5bx7ow11yzt.cloudfront.net"

WARNING: in your VCL you must delete the Set-Cookie header from asset responses. Otherwise the cache can store one user's cookie and hand it to other visitors, who can then suddenly be signed in as that user simply by visiting your site.
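The notes apply this rule in VCL, which is not shown here. To illustrate the logic only, the same rule is sketched below in Ruby as a hypothetical Rack-style middleware. The class name and the "/assets/" prefix are assumptions, and the sketch needs no gems because a Rack app is just an object that responds to call.

```ruby
# Hypothetical Rack-style middleware: removes Set-Cookie from any
# response whose path starts with the asset prefix, so a shared cache
# never stores a user's cookie alongside an asset.
class StripAssetCookies
  def initialize(app, prefix: "/assets/")
    @app = app
    @prefix = prefix
  end

  def call(env)
    status, headers, body = @app.call(env)
    if env["PATH_INFO"].to_s.start_with?(@prefix)
      # Compare case-insensitively; header casing varies between servers.
      headers = headers.reject { |name, _| name.to_s.casecmp?("set-cookie") }
    end
    [status, headers, body]
  end
end

# Stand-in application that sets a cookie on every response.
app = lambda do |_env|
  [200, { "Content-Type" => "text/css", "Set-Cookie" => "session=abc" }, ["body"]]
end

stack = StripAssetCookies.new(app)
_, asset_headers, _ = stack.call("PATH_INFO" => "/assets/application.css")
_, page_headers, _ = stack.call("PATH_INFO" => "/account")

puts asset_headers.key?("Set-Cookie")
puts page_headers.key?("Set-Cookie")
```

Asset responses lose the cookie while ordinary pages keep it. Stripping it at the cache layer, as the warning describes, remains the safer place for the rule, because it also covers responses that never pass through the application.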