Caching With Apache's mod_cache On Debian Lenny

This article explains how you can cache your web site contents with Apache's mod_cache on Debian Lenny. If you have a high-traffic dynamic web site that generates lots of database queries on each request, you can decrease the server load dramatically by caching your content for a few minutes or more (that depends on how often you update your content).

I do not issue any guarantee that this will work for you!

1 Preliminary Note

I'm assuming that you have a working Apache2 setup (Apache 2.2.x - prior to that version, mod_cache is considered experimental) from the Debian repositories - the Apache version in the Debian Lenny repositories is 2.2.9 so you should be good to go.

I'm using the document root /var/www here for my test vhost - you must adjust this if your document root differs.

2 Enabling mod_cache

mod_cache has two submodules that manage the cache storage: mod_disk_cache (for storing contents on the hard drive) and mod_mem_cache (for storing contents in memory, which is faster than disk caching). Decide which one you want to use and continue with either chapter 2.1 (mod_disk_cache) or 2.2 (mod_mem_cache).

2.1 mod_disk_cache

The mod_disk_cache configuration is stored in /etc/apache2/mods-available/disk_cache.conf, so let's edit that one:

vi /etc/apache2/mods-available/disk_cache.conf

Make sure you uncomment the CacheEnable disk / line, so that the minimal configuration looks as follows:

<IfModule mod_disk_cache.c>
# cache cleaning is done by htcacheclean, which can be configured in
# /etc/default/apache2
#
# For further information, see the comments in that file,
# /usr/share/doc/apache2.2-common/README.Debian, and the htcacheclean(8)
# man page.
# This path must be the same as the one in /etc/default/apache2
CacheRoot /var/cache/apache2/mod_disk_cache
# This will also cache local documents. It usually makes more sense to
# put this into the configuration for just one virtual host.
CacheEnable disk /
CacheDirLevels 5
CacheDirLength 3
</IfModule>

To make sure that our cache directory /var/cache/apache2/mod_disk_cache doesn't fill up over time, we have to clean it with the htcacheclean command. That command is part of the apache2-utils package, which we install as follows:

apt-get install apache2-utils

If you choose mod_mem_cache instead, you don't have to clean up any cache directories.

3 Testing

Unfortunately, mod_cache doesn't provide any logging functionality, which is bad if you want to know whether caching is working. Therefore I create a small PHP test file, /var/www/cachetest.php, that sends out HTTP headers telling mod_cache to cache the file for 300 seconds, and that simply prints the timestamp:
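A minimal sketch of such a test file, assuming the two headers recommended in chapter 4, could look like this (the exact markup around the timestamp is an assumption and does not matter for the test):

```php
<?php
// Ask caches (mod_cache and the browser) to keep this response for 300 seconds.
header("Cache-Control: must-revalidate, max-age=300");
// Let the cache store separate copies for compressed and uncompressed delivery.
header("Vary: Accept-Encoding");

// Print the current Unix timestamp; a cached copy keeps showing the old value.
echo time() . "<br>";
```

The header() calls have to come before the echo, because PHP cannot send headers once output has started.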

Now call that file in a browser - it should display the current time stamp. Then click in the browser's address bar and press ENTER so that the page gets loaded again (don't press F5 or the reload button - this will always fetch a fresh copy from the server instead of the cache!) - if all goes well, you should still see the old, cached timestamp. If you wait 300 seconds, you should get a fresh copy from the server instead of the cache.
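If you would rather rule out the browser's own cache, the same check can be sketched as a small PHP command-line script. This is only an illustration: the URL http://localhost/cachetest.php is an assumption that you must adjust to your vhost, and the script relies on allow_url_fopen being enabled.

```php
<?php
// Hypothetical URL of the test file; adjust host and path to your setup.
$url = 'http://localhost/cachetest.php';

// Request the page twice, two seconds apart. No browser cache is involved,
// so identical bodies suggest the second response was served by mod_cache.
$first = file_get_contents($url);
sleep(2);
$second = file_get_contents($url);

if ($first === false || $second === false) {
    echo "Request failed\n";
    exit(1);
}

if ($first === $second) {
    echo "Same timestamp twice: the page appears to be cached\n";
} else {
    echo "Timestamps differ: the page was generated again\n";
}
```

Run it with the PHP command-line interpreter. Because the test file prints a timestamp with one-second resolution, an uncached page would show two different values.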

4 HTTP Headers

Caching doesn't work out-of-the-box - you must modify your web application so that caching can work (it is possible that your web application already supports caching - please consult the documentation of your application to find out). mod_cache will cache web pages only if the HTTP headers sent out by your web application tell it to do so.

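So if you want mod_cache to cache your pages, modify your application so that it does not send out headers that prevent caching, such as an Expires header with a date in the past or a Set-Cookie header.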

If you want mod_cache to cache your pages, you can set an Expires header with a date in the future, but the recommended way is to use max-age:

"Cache-Control: must-revalidate, max-age=300"

This tells mod_cache to cache the page for 300 seconds (max-age) - unfortunately mod_cache doesn't know the s-maxage option (see http://www.mnot.net/cache_docs/#CACHE-CONTROL), that's why we must use the max-age option (which also tells your browser to cache - please keep this in mind if you get unexpected results!). If mod_cache knew the s-maxage option, we could use "Cache-Control: must-revalidate, max-age=0, s-maxage=300" which would tell mod_cache, but not the browser, to cache the page.

Of course, this header is useless if you send out one of the non-caching headers (Expires in the past, Set-Cookie, etc.) from above at the same time!

Another very important header for caching is this one:

"Vary: Accept-Encoding"

This makes mod_cache keep two copies of each cached page, one compressed (gzip) and one uncompressed so that it can deliver the right version depending on the capabilities of the user-agent/browser. Some user-agents don't understand gzip compression, so they should get the uncompressed version.

So here's the summary: use the following two headers if you want mod_cache to cache:

"Cache-Control: must-revalidate, max-age=300"
"Vary: Accept-Encoding"

and make sure that no Expires with a date in the past, cookies, etc. are sent.

If your application is written in PHP, you can use PHP's header() function to send out HTTP headers, e.g. like this:
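A minimal sketch, assuming you want the 300-second lifetime used throughout this article (the variable name $cacheSeconds is only illustrative):

```php
<?php
// Hypothetical setting: how long caches may keep the page, in seconds.
$cacheSeconds = 300;

// header() must be called before any output is sent to the client.
header("Cache-Control: must-revalidate, max-age=" . $cacheSeconds);
header("Vary: Accept-Encoding");

// ... generate and print the page as usual ...
```

Place these calls at the very top of the script, before any HTML or whitespace is printed, and make sure the rest of the application does not send headers that prevent caching.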

Falko Timme is an experienced Linux administrator and founder of Timme Hosting, a leading nginx business hosting company in Germany. He is one of the most active authors on HowtoForge since 2005 and one of the core developers of ISPConfig since 2000. He has also contributed to the O'Reilly book "Linux System Administration".