Using Lighttpd as a static file server for Drupal

Building websites that can handle high amounts of traffic involves finding points of scalability in the network architecture. There is a lot of discussion about database replication and redundant web servers, but very little discussion has taken place about serving static files from a different server than the one which executes PHP. This article shows how you can configure Drupal to serve static files from a separate server, potentially on a separate machine. There is even a solution for those of you who are using the imagecache module.

Static vs Dynamic content

A webpage in your browser usually consists of HTML plus JavaScript, images, CSS, and perhaps some Flash. The typical order of events is that the browser requests the HTML, parses it, and then begins to request the additional .js, .css, .png, .gif, and .flv files. This sequence is well diagrammed on the Yahoo! Developer Network. For Drupal sites, the initial request that returns the HTML is a dynamic request, meaning PHP code and a database are required to generate the HTML. The rest of the requests, however, reference static files. These files require neither PHP nor a database and can be returned to the browser by the simplest and most lightweight web servers available. This is the fundamental difference between a dynamic request (one that requires a scripting language like PHP) and a static request (one that returns a simple file from the file system).

For a web server like Apache to serve a dynamic Drupal page, it must load extra software (mod_php) in order to be able to execute PHP. This extra software increases the memory footprint of the server and reduces the total number of requests that it can handle before the machine's physical memory is exhausted. Even more memory intensive is the act of executing PHP. A Drupal site with lots of modules installed that handles a lot of data from the database can easily require 64M of memory per thread. This is a huge expenditure of memory compared to the 1-2M it takes to serve a static file. Since Apache recycles its worker threads, you end up in a situation where the same 64M monster that created the Drupal HTML is also used for serving a .jpg file. This is a huge waste of resources.
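To make the memory argument concrete, here is a rough back-of-the-envelope calculation. The 2048 MB of RAM is an assumed figure chosen only for illustration; the 64 MB and 2 MB per-worker costs are the ones cited above. The estimate ignores the operating system and the database, so real numbers will be lower.

```shell
#!/bin/sh
# Rough capacity estimate: how many concurrent workers fit in memory.
# ram_mb is a hypothetical figure; the per-worker sizes come from the text.
workers_that_fit() {
    ram_mb=$1
    per_worker_mb=$2
    echo $(( ram_mb / per_worker_mb ))
}

echo "PHP-capable workers: $(workers_that_fit 2048 64)"
echo "Static-only workers: $(workers_that_fit 2048 2)"
```

Under these assumptions the same machine holds roughly thirty times as many static-only workers, which is the gap a dedicated static server is meant to exploit.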

Adding a static file server to your network thus brings the following advantages:

Static files are served from a server optimized for the task

Better utilization of "heavy" PHP server resources

A new point for scalability; you can add more machines to run static file servers if needed using typical load balancing techniques

Sharing files

Where exactly are the static files in a Drupal site? Here's a list of the typical places:

files/: Files uploaded by the application

misc/: Drupal's JavaScript files and some images

modules/: Any module might have extra static files, such as .css, images, .js and so forth

themes/: Most themes introduce .css and images

sites/all/: More modules and themes can be found here

With Drupal's static files scattered throughout a directory structure that also contains all of the PHP files needed for Drupal execution, the idea of collecting them separately and putting them on a separate static server is impractical. The solution is to make the entire directory structure available to the static file server and disallow that server from serving requests for the PHP files.

How the files become available to the static file server is another question. One approach is to host the files on an NFS server which all web servers and the static file server mount. Another approach is to use rsync to keep redundant copies of the entire directory structure available to every server. There are other options as well.
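As a minimal sketch of the rsync approach, the function below mirrors one document root to another location. Both paths are hypothetical, and how often to run it (cron, a deploy hook, and so on) is left open; this is one possible arrangement, not a recommendation from the original setup.

```shell
#!/bin/sh
# Mirror the Drupal document root to another location with rsync.
# Both arguments are hypothetical paths; the destination may also be a
# remote target such as user@static1:/var/www/drupal.
sync_docroot() {
    src=$1
    dest=$2
    # -a preserves permissions and timestamps; --delete removes files
    # from the copy that no longer exist in the source.
    # The trailing slash on the source copies its contents rather than
    # the directory itself.
    rsync -a --delete "${src%/}/" "${dest%/}/"
}
```

A call such as `sync_docroot /var/www/drupal static1:/var/www/drupal` would then keep the copy on the (assumed) host static1 in step with the main server.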

It is even possible to run the static file server on the same machine as the dynamic web server and have the two share a document root. This is the approach I take in this article as it demonstrates the principle adequately.

Routing requests

The next issue is how requests should be routed. One approach would be to have a proxy server which routes requests for static files to a separate server. This leaves the application blissfully unaware of the concerns of the static file server. If you have experience with this approach please discuss it in the comments.

A second approach, which I take in this article, is to adjust the application to write the URLs to static resources differently. In Drupal this turns out to be a very simple task because all URLs are generated by a small number of functions. A minor tweak to these functions is sufficient to send all static file requests to the appropriate server.

Here is a survey of the changes that I needed to make to Drupal 5.5 and the Garland theme in order to serve all static files from a separate server. A patch with the complete set of changes is attached below.

Add a variable to $conf in settings.php:

$conf = array(
'static_url' => 'http://static.example.com/'
);

In every function where static files get included in the HTML, update the logic to use the static_url variable. This includes:

includes/common.inc: drupal_get_css(), drupal_get_js()

includes/file.inc: file_create_url()

includes/theme.inc: theme_get_setting(), theme_image()

// Use either the URL to the static server (if set) or base_path().
$base = variable_get('static_url', base_path());
// Anywhere a resource is being included, use $base.
$output .= ''. "\n";

For the theme, I added a variable to all templates called static_base.

The static_base variable can then be used where files are directly linked in the theme. For example, in Garland's page.tpl.php:

The static file server

I chose to use Lighttpd (aka Lighty) to be the static file server based on its reputation for being lightweight and fast, and because I had never used it before. There are many web servers that can be optimized for the task, however.

I installed Lighttpd on Mac OS X (Leopard) using MacPorts. After the package was installed I made the following changes to the lighttpd.conf file:

## This is the same document root as is used by the Apache server for Drupal
server.document-root = "/Users/robert/public_html/"
## Make sure that directory listings don't work.
index-file.names = ( )
## For the Mac OS X users
server.event-handler = "freebsd-kqueue"
## This plays a similar function to the .htaccess directive that hides certain file extensions.
url.access-deny = ( "~", ".engine", ".inc", ".info", ".install", ".module", ".profile", ".po", ".sh", ".sql", ".theme", ".tpl.php", ".xtmpl" )
## I want Apache to run on 80 so this needs to be something else
server.port = 81
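The url.access-deny directive matches on the end of the request path. The sketch below mirrors the suffix list from the configuration above so you can preview which files in the shared document root would be refused; the function name is hypothetical. Note that plain .php is not in the list. Whether such files are refused depends on other settings (the stock lighttpd.conf typically handles this with static-file.exclude-extensions), so that is worth checking separately.

```shell
#!/bin/sh
# Mirrors the suffixes from url.access-deny in the lighttpd.conf above.
# Returns 0 (true) if a request path ends in one of the denied suffixes.
is_denied() {
    case $1 in
        *~|*.engine|*.inc|*.info|*.install|*.module|*.profile|*.po|*.sh|*.sql|*.theme|*.tpl.php|*.xtmpl)
            return 0 ;;
    esac
    return 1
}

# A few paths from a Drupal checkout, used here only as examples.
for path in misc/drupal.js modules/node/node.module themes/garland/page.tpl.php; do
    if is_denied "$path"; then
        echo "denied: $path"
    else
        echo "served: $path"
    fi
done
```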

I also added this to my .bash_profile so that I could start lighttpd from the command line easily:

PATH=$PATH:/opt/local/sbin
export PATH

You may have to take the additional steps of adjusting your firewall to allow a process to bind to port 81, and some of the directories referenced in the lighttpd.conf file may need to be created.

Once you've finished with the above steps you can test Lighty's configuration with the following command:

sudo lighttpd -t -f /opt/local/etc/lighttpd/lighttpd.conf

You can start the server with this command:

sudo lighttpd -D -f /opt/local/etc/lighttpd/lighttpd.conf

A production instance of lighttpd will require some further configuration. Most notably, you'll want to use mod_expire to set expiry dates in the future and mod_compress to compress textual content for faster transfer over the wire.

Turn off KeepAlive

One of the big gains that can be had by using a static file server is the freedom for your dynamic server to close the connection to the client immediately after serving the initial HTML. In your main web server's configuration you can now turn off the KeepAlive directive. For my setup, using Apache 2 (via MAMP), this involved adding the following line near the top of httpd.conf:

KeepAlive Off

A restart of Apache is necessary.
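One way to confirm the change took effect is to look at the response headers: with KeepAlive off, Apache announces that it will close the connection. The helper below is a small sketch (its name is made up) that reads headers on standard input and succeeds when a "Connection: close" header is present.

```shell
#!/bin/sh
# Reads HTTP response headers on stdin. Succeeds if the server announced
# that it will close the connection after the response.
closes_connection() {
    # Strip carriage returns so the end-of-line anchor matches.
    tr -d '\r' | grep -qi '^connection:[[:space:]]*close$'
}

# Usage, assuming the dynamic server answers on localhost:
#   curl -sI http://localhost/ | closes_connection && echo "KeepAlive is off"
```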

Using /etc/hosts

Your static files should always come from a different hostname than your dynamic HTML. This allows the browser to make more efficient use of its connections. On your local machine you can simulate this by editing /etc/hosts:

127.0.0.1 localhost static

This adds a hostname static that also resolves to the local server. Your $conf in settings.php will then look like this:

$conf = array(
'static_url' => 'http://static:81/'
);
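Before pointing Drupal at the new hostname, it can help to confirm that the name really is mapped. The following sketch checks a hosts file for an entry; the function name is hypothetical, and the file defaults to /etc/hosts.

```shell
#!/bin/sh
# Succeeds if the given hosts file maps the name to some address.
# Comment lines and trailing comments are ignored.
hosts_has_name() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }   # strip comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Usage:
#   hosts_has_name static && echo "static resolves locally"
```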

Testing it out

With Drupal patched and Lighttpd up and running, you should have a Drupal site that gets its HTML from Apache and its static files from the static file server. Please describe any problems (and their solutions) that you run into in the comments below and I'll update the article accordingly.
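A quick way to check the result is to fetch a page and list any static resources that still point at the dynamic server. The sketch below does this with grep, sed, and awk. The function names and the list of extensions are assumptions, and the pattern only covers double-quoted src and href attributes, so treat it as a rough check rather than a full HTML parser.

```shell
#!/bin/sh
# Reads HTML on stdin and prints every URL found in a src="..." or
# href="..." attribute that ends in a common static file extension
# (optionally followed by a query string).
list_static_urls() {
    grep -oE '(src|href)="[^"]+\.(css|js|png|gif|jpg)(\?[^"]*)?"' |
        sed -E 's/^(src|href)="//; s/"$//'
}

# Prints only the static URLs that do NOT begin with the given base URL.
not_on_static_host() {
    base=$1
    list_static_urls | awk -v base="$base" 'index($0, base) != 1'
}
```

For example, `curl -s http://localhost/ | not_on_static_host http://static:81/` should print nothing once every relevant function has been patched; the hostnames are the ones assumed in this article's local setup.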

Imagecache

The above techniques will work well with any Drupal site that doesn't use imagecache. The imagecache module presents a special challenge because it plays sneaky games with Drupal's 404 error handling. When Drupal receives a request for a resource that isn't on the file system and isn't a valid Drupal path, the Drupal application serves a 404 Not Found page, resulting in a full Drupal bootstrap. Imagecache takes advantage of this and generates image derivatives during this process. This means that imagecache requires requests for static images to come to Drupal, at least when they would otherwise be 404 Not Found.

To sidestep this problem we want Lighttpd to redirect any 404 requests to the Drupal server. In your lighttpd.conf file, change the following directives so that we can run a small Perl script to do the redirect.

Now you must add a script to the scripts directory of your Drupal installation and make it executable.

Save this to scripts/redirect.pl

#!/usr/bin/perl
# Here localhost is the hostname for the Drupal server. Update so that your domain or hostname
# is used instead.
print "Location: http://localhost$ENV{REQUEST_URI}\n\n";
exit;

Update the URL in the script to use your hostname or domain instead of localhost, if necessary. The file must be executable by the user running the Lighty webserver. Now, when Lighty encounters a 404 request, it will be forwarded to the Drupal web server where imagecache will be able to make the derivative image. After that, Lighty will be able to serve requests for that image.
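The redirect script can be exercised from the command line before involving Lighty at all, by setting REQUEST_URI the way a web server would for a CGI program. This is a sketch; the function name and the image path in the usage line are made up for illustration.

```shell
#!/bin/sh
# Run a CGI redirect script with REQUEST_URI set in its environment and
# print only the target of the Location header it emits.
redirect_target() {
    script=$1
    uri=$2
    REQUEST_URI=$uri perl "$script" | sed -n 's/^Location: //p'
}

# Usage (hypothetical image path):
#   redirect_target scripts/redirect.pl /files/imagecache/thumb/photo.jpg
```

If the printed URL points at the Drupal server and ends with the path you passed in, the script is doing its part; any remaining problems are then likely in the Lighty configuration or file permissions.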

Please note that imagecache 2.0 is said not to need this workaround.

Conclusion

Setting up a static file server to handle all non-dynamic requests is a moderately simple task that is well worth the while for sites that need to get the best performance and handle the most visitors. It provides a new point of scalability, manages existing server resources better, and can lead to overall faster page loads.