Chapter 16. HTTP Headers for Optimal Performance

Header composition is often neglected in the CGI world. Dynamic
content is dynamic, after all, so why would anybody care about HTTP
headers? Because pages are generated dynamically, one might assume
that it is fine to omit the Last-Modified header and to ignore an
If-Modified-Since header in the client's request. This laissez-faire
attitude becomes a liability when the server is driven entirely by
dynamic components and the number of hits is significant.

If the number of hits on your server is not significant and is never
going to be, then it is safe to skip this chapter. But if keeping up
with the number of requests is important, learning what
cache-friendliness means and how to cooperate with caches to increase
the performance of the site can provide significant benefits. If
Squid or mod_proxy is used in httpd accelerator
mode (as discussed in Chapter 12), it is crucial to
learn how best to cooperate with it.

In this chapter, when we refer to a section in the HTTP standard, we
are using HTTP standard 1.1, which is documented in RFC 2616. The
HTTP standard describes many headers. In this chapter, we discuss
only the headers most relevant to caching. We divide them into three
sets: date headers, content headers, and the special
Vary header.

16.1. Date-Related Headers

The various headers related to when a document was created, when it
was last modified, and when it should be considered stale are
discussed in the following sections.

16.1.1. Date Header

Section 14.18 of the HTTP standard
deals with the circumstances under which
we must or must not send a Date header. For almost
everything a normal mod_perl user does, a Date
header needs to be generated. But the mod_perl programmer
doesn't have to worry about this header, since the
Apache server guarantees that it is always sent.

In http_protocol.c, the Date
header is set according to $r->request_time. A
mod_perl script can read, but not change,
$r->request_time.

16.1.2. Last-Modified Header

Section 14.29 of the HTTP standard
covers the
Last-Modified header, which is mostly used as a
weak validator. Here is an excerpt from the HTTP
specification:

A validator that does not always change when the resource changes is a "weak
validator."
One can think of a strong validator as one that changes whenever the bits of an
entity changes, while a weak value changes whenever the meaning of an entity changes.

What this means is that we must decide for ourselves when a page has
changed enough to warrant the Last-Modified header
being updated. Suppose, for example, that we have a page that contains
text with a white background. If we change the background to light
gray, then clearly the page has changed, but if the text remains the
same we would consider the semantics (meaning) of the page to be
unchanged. On the other hand, if we changed the text, the semantics
may well be changed. For some pages it is not quite so
straightforward to decide whether the semantics have changed or not.
This may be because each page comprises several components, or it
might be because the page itself allows interaction that affects how
it appears. In all cases, we must determine the moment in time when
the semantics changed and use that moment for the
Last-Modified header.

Consider, for example, a page that provides a text-to-GIF renderer that
takes as input a font to use, background and foreground colors, and a
string to render. The images embedded in the resultant page are
generated on the fly, but the structure of the page is constant.
Should the page be considered unchanged so long as the underlying
script is unchanged, or should the page be considered to have changed
with each new request?

Actually, a few more things are relevant: the semantics also change a
little when we update one of the fonts that may be used or when we
update the ImageMagick or equivalent
image-generating program. All the factors that affect the output
should be considered if we want to get it right.

In the case of a page composed of several components, we must check
when the semantics of each component last changed. Then we pick the
most recent of these times. Of course, the determination of the
moment of change for each component may be easy or it may be subtle.
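The "pick the most recent" rule can be sketched in a few lines. This is only an illustration: the component names and timestamps are hypothetical, and page_mtime( ) is a helper invented for this example, not part of mod_perl.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical components of one page, each mapped to the time (seconds
# since the epoch) at which its semantics last changed. In practice the
# values might come from stat(), a database column, or a version tag.
my %component_mtime = (
    template => 1_052_516_063,
    sidebar  => 1_052_600_000,
    article  => 1_052_430_000,
);

# The page as a whole changed when its most recently changed part did.
sub page_mtime {
    my (%mtimes) = @_;
    return max(values %mtimes);
}

my $last_modified = page_mtime(%component_mtime);
print "page last modified at epoch $last_modified\n";
```

The hard part is usually not the maximum itself but deciding, for each component, which moment counts as a change in meaning.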

mod_perl provides two convenient methods to deal with this header:
update_mtime( ) and set_last_modified( ). These methods and several
others are unavailable in the standard mod_perl environment but are
silently imported when we use Apache::File. Refer to the
Apache::File manpage for more information.

The update_mtime( ) function takes a Unix timestamp (seconds since
the epoch, as returned by time(2) or by Perl's time( ) function) as
its argument and sets finfo.st_mtime in Apache's request structure to
this value. It does so only when the argument is greater than the
previously stored finfo.st_mtime.

The set_last_modified( ) function sets the
outgoing Last-Modified header to the string that
corresponds to the stored finfo.st_mtime. When a Unix timestamp is
passed to set_last_modified( ), mod_perl calls
update_mtime( ) with this argument first.
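To make the interplay of the two methods concrete, here is a toy model of the behavior just described. ToyRequest is a hypothetical stand-in written for this illustration; it is not the Apache request object and does not reflect how Apache::File is implemented. It only mirrors the two rules above: the stored time can be raised but never lowered, and set_last_modified( ) folds in its argument before emitting the header.

```perl
use strict;
use warnings;

package ToyRequest;   # hypothetical stand-in, NOT the real Apache API

my @DAY = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MON = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

sub new {
    my ($class, $mtime) = @_;
    return bless { mtime => $mtime, headers => {} }, $class;
}

# Raise the stored modification time; an older value is ignored.
sub update_mtime {
    my ($self, $time) = @_;
    $self->{mtime} = $time if $time > $self->{mtime};
    return $self->{mtime};
}

# Fold in an optional timestamp first, then emit the header from the
# stored modification time.
sub set_last_modified {
    my ($self, $time) = @_;
    $self->update_mtime($time) if defined $time;
    $self->{headers}{'Last-Modified'} = _http_date($self->{mtime});
    return;
}

# Format epoch seconds as an HTTP date without depending on the locale.
sub _http_date {
    my ($s, $m, $h, $mday, $mon, $year, $wday) = gmtime shift;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAY[$wday], $mday, $MON[$mon], $year + 1900, $h, $m, $s;
}

package main;

my $r = ToyRequest->new(784_111_777);      # Sun, 06 Nov 1994 08:49:37 GMT
$r->update_mtime(784_000_000);             # older than stored: ignored
$r->set_last_modified(784_111_777 + 60);   # newer: stored, then emitted
print $r->{headers}{'Last-Modified'}, "\n";
```

In a real handler the calls have the same shape, for example $r->update_mtime($t) for each dependency followed by $r->set_last_modified, with $r being the request object and Apache::File loaded.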

The following code is an example of setting the
Last-Modified header by retrieving the
last-modified time from a Revision Control System (RCS)-style date
tag.
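A minimal sketch of this approach follows. The date inside the tag is an example value, rcs_mtime( ) is a helper name invented here, and passing 'GMT' assumes the revision control system records its timestamps in UTC.

```perl
use strict;
use warnings;
use Date::Parse ();

# The revision control system expands this keyword on check-in.
# The date shown is only an example value.
my $rcs_date = '$Date: 2003/05/09 21:34:23 $';

# Hypothetical helper: pull the timestamp out of the tag and convert it
# to epoch seconds. Returns undef if the tag has not been expanded.
sub rcs_mtime {
    my ($tag) = @_;
    my ($stamp) = $tag =~ m{(\d{4}/\d\d/\d\d \d\d:\d\d:\d\d)}
        or return;
    return Date::Parse::str2time($stamp, 'GMT');
}

my $mtime = rcs_mtime($rcs_date);
print "last modified at epoch $mtime\n" if defined $mtime;

# Inside a mod_perl 1.x handler, with Apache::File loaded:
#     $r->set_last_modified($mtime) if defined $mtime;
```

Because the tag changes only when the file is checked in, the result can be computed once per child process and cached in a package variable rather than parsed on every request.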

Normally we would use the Apache::Util::parsedate
function, but since it doesn't parse the RCS format,
we have used the Date::Parse module instead.

16.1.3. Expires and Cache-Control Headers

Section 14.21 of the HTTP standard deals with the Expires
header. The purpose of the Expires header is to
specify a point in time after which the document should be
considered out of date (stale). Don't confuse this
with the very different meaning of the
Last-Modified header. The
Expires header is useful to avoid unnecessary
validation from now until the document expires, and it helps the
recipients to clean up their stored documents.
Here's an excerpt from the HTTP standard:

The presence of an Expires field does not imply that the original resource will
change or cease to exist at, before, or after that time.

Think carefully before setting up a time when a resource should be
regarded as stale. Most of the time we can determine an expected
lifetime from "now" (that is, the
time of the request). We do not recommend hardcoding the expiration
date, because when we forget that we did it, and the date arrives, we
will serve already expired documents that cannot be cached. If a
resource really will never expire, make sure to follow the advice
given by the HTTP specification:

To mark a response as "never expires," an origin server sends an Expires date
approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD
NOT send Expires dates more than one year in the future.

For example, to expire a document half a year from now, use the
following code:
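One way this might look is sketched below, with two interchangeable methods. The use of HTTP::Date (a CPAN module) for the first method is an assumption of this sketch, and the helper names and the HALF_YEAR constant are hypothetical. The second method relies on Apache::Util::ht_time( ), which is assumed here to produce an HTTP-style date in GMT when called with only a timestamp.

```perl
use strict;
use warnings;
use HTTP::Date ();    # CPAN module; works in any Perl environment

# 180 days expressed in seconds.
use constant HALF_YEAR => 180 * 24 * 60 * 60;

# Method 1: portable, usable from plain CGI as well as mod_perl.
sub expires_portable {
    return HTTP::Date::time2str(time + HALF_YEAR);
}

# Method 2: mod_perl 1.x only. Apache::Util is loaded lazily so that
# this file still compiles outside the server.
sub expires_mod_perl {
    require Apache::Util;
    return Apache::Util::ht_time(time + HALF_YEAR);
}

print 'Expires: ', expires_portable(), "\n";

# Inside a mod_perl 1.x handler:
#     $r->header_out('Expires', expires_mod_perl());
```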

The latter method should be faster, but it's
available only under mod_perl.

A very handy alternative to this computation is available in the
HTTP/1.1 cache-control mechanism. Instead of setting the
Expires header, we can specify a delta value in a
Cache-Control header. For example:

$r->header_out('Cache-Control', "max-age=" . 180*24*60*60);

This is much more processor-economical than the previous example:
no date has to be computed and formatted for each request, and Perl
evaluates the product only once, at compile time, optimizing it into
a constant.

As this alternative is available only in HTTP/1.1 and old cache
servers may not understand this header, it may be advisable to send
both headers. In this case the Cache-Control
header takes precedence, so the Expires header is
ignored by HTTP/1.1-compliant clients. Or we could use an
if...else clause:
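A sketch of such a clause follows. The helper freshness_header( ) is hypothetical, as is the HALF_YEAR constant, and HTTP::Date is again an assumed CPAN dependency. The decision is based on the protocol string of the request, which $r->protocol returns in a handler. Note that this reflects only what the immediate client announced, so an older cache further along the chain may still see only the Cache-Control header.

```perl
use strict;
use warnings;
use HTTP::Date ();

# 180 days expressed in seconds.
use constant HALF_YEAR => 180 * 24 * 60 * 60;

# Hypothetical helper: return the (name, value) pair of the single
# freshness header to send for a protocol string such as "HTTP/1.1".
sub freshness_header {
    my ($protocol) = @_;
    if ($protocol =~ m{^HTTP/(\d+)\.(\d+)$}
        && ($1 > 1 || ($1 == 1 && $2 >= 1)))
    {
        # HTTP/1.1 or later: a relative lifetime is enough.
        return ('Cache-Control', 'max-age=' . HALF_YEAR);
    }
    else {
        # Older or unrecognized protocol: fall back to an absolute date.
        return ('Expires', HTTP::Date::time2str(time + HALF_YEAR));
    }
}

my ($name, $value) = freshness_header('HTTP/1.0');
print "$name: $value\n";

# Inside a mod_perl 1.x handler:
#     $r->header_out(freshness_header($r->protocol));
```

Sending both headers unconditionally is simpler and costs only a few extra bytes per response, so the conditional form is mainly worthwhile when the date computation itself is a concern.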