I'm a web developer in Norfolk. This is my blog...

19th September 2015 7:42 pm

A Quick and Easy Varnish Primer

As I mentioned in an earlier post, I recently had the occasion to use Varnish to improve the performance of a website that otherwise would have been unreliable and unusably slow due to WordPress making an excessive number of queries. The difference it made was nothing short of staggering, and I’m not exaggerating when I say it saved the day. I now use Ansible for provisioning new WordPress sites, and Varnish is now a standard part of my WordPress site setup playbook.

However, Varnish can be quite fiddly to configure, and it was something of a baptism of fire for me to learn how to configure it appropriately for this use case. I did make a few mistakes that caused problems down the line, so I thought I’d share the details of how I got it working for that particular site.

What is Varnish?

Varnish Cache is a web application accelerator also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast. It typically speeds up delivery with a factor of 300 - 1000x, depending on your architecture.

In other words, you run Varnish on the usual HTTP port, move your usual web server to a different port, and configure Varnish to cache web pages so they can be served more quickly to subsequent visitors. Note that Varnish 4.0 does not handle HTTPS itself, so encrypted traffic has to be terminated by something else.

Be warned - Varnish is not something where you can generally stick with the default settings. The default behaviour does make a lot of sense, but in practice almost no-one will be able to get away with leaving the configuration unchanged.

Installing Varnish

If you’re using Debian or a derivative such as Ubuntu, Varnish is available via apt-get:

$ sudo apt-get install varnish

You may also want to install the documentation:

$ sudo apt-get install varnish-doc

If you’re using Apache, I’d also recommend installing libapache2-mod-rpaf and enabling it with sudo a2enmod rpaf - without this, Apache will log every incoming request as coming from Varnish’s own address rather than the visitor’s.

I’m assuming you already have a normal web server installed. I’ll assume you’re using Apache, but it shouldn’t be hard to adapt these instructions to work with Nginx. I’m also assuming that the site you want to use Varnish for is a WordPress site with WooCommerce and W3 Total Cache installed. However, this is only for example purposes. If you want to use Varnish for a different web app, you’ll need to plan your caching strategy around that web app yourself.

Please also note that this is using Varnish 4.0, which is the version available with Debian Jessie. If you’re using an older operating system, you may have Varnish 3.0 in the repositories - be warned, the configuration language changed in Varnish 4.0, so the examples here will not work with older versions of Varnish.

By default, Varnish runs on port 6081, which is fine for testing it out, but once you want to go live it’s not what you want. When it’s time to go live, you’ll need to open up /etc/default/varnish and edit the value of DAEMON_OPTS to something like this:

Next, we need to move our web server to a different port. We’ll use port 8080. Replace the contents of /etc/apache2/ports.conf with this:

# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default
# This is also true if you have upgraded from before 2.2.9-3 (i.e. from
# Debian etch). See /usr/share/doc/apache2.2-common/NEWS.Debian.gz and
# README.Debian.gz
NameVirtualHost *:8080
Listen 8080
<IfModule mod_ssl.c>
    # If you add NameVirtualHost *:443 here, you will also have to change
    # the VirtualHost statement in /etc/apache2/sites-available/default-ssl
    # to <VirtualHost *:443>
    # Server Name Indication for SSL named virtual hosts is currently not
    # supported by MSIE on Windows XP.
    Listen 443
</IfModule>
<IfModule mod_gnutls.c>
    Listen 443
</IfModule>

You’ll also need to change the ports for the individual site files under /etc/apache2/sites-available, as in this example:

Writing our VCL file

Next, we come to our Varnish configuration proper, which resides at /etc/varnish/default.vcl. The vcl stands for Varnish Configuration Language, and it has a syntax somewhat reminiscent of C.

The default behaviour for Varnish is as follows:

It does not cache requests that contain cookie or authorization headers

It does not cache requests which the backend HTTP server indicates should not be cached

It will only cache GET and HEAD requests

This behaviour is unlikely to meet your needs. We’ll therefore work through the Varnish config file I wrote for this WordPress site in the hope that it will teach you enough to adapt it to your own needs.

Here we define that we’re using version 4.0 of VCL, and that the host to use as a back end is port 8080 on the same server. If your normal HTTP server is running on a different port, you will need to set it here. Also, note that you can use a different host as the backend.
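The original listing for this step is not reproduced here. A minimal sketch of the declaration described above, assuming Apache is listening on port 8080 on the same machine, might look like this:

```vcl
# Declare the VCL syntax version (required from Varnish 4.0 onwards).
vcl 4.0;

# The backend is the web server Varnish fetches uncached content from.
# Assumption: Apache on the same host, moved to port 8080 as above.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}
```

To use a backend on another machine, .host would be changed to that machine's hostname or IP address.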

acl purge {
    "127.0.0.1";
    "localhost";
}

We also set which hosts can trigger a purge of the cache, namely localhost and 127.0.0.1. The web app hosted on the server can then make an HTTP PURGE request to a given path, which will clear that path from the cache. In our case, W3 Total Cache supports this - if it’s a custom web app, you’ll need to implement this functionality yourself to clear the cache when new content is added.

Next, we start the vcl_recv subroutine. This is where we define our rules for deciding whether or not to serve content from the cache. Let’s look at our first rule:

Here, we declare that we should never cache any PUT, PATCH, DELETE or POST requests, on the basis that these change the state of the application. This ensures that things like contact forms will work as expected.

Note that we’re getting the value of req.method to determine the HTTP verb used. The req object has many other properties we’ll see being used.
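The rule described above can be sketched roughly as follows. This is an illustration rather than the exact listing from the original configuration. It is written as a self-contained vcl_recv block, which works because VCL concatenates multiple definitions of the same built-in subroutine in the order they appear.

```vcl
sub vcl_recv {
    # Requests that change application state are never served from
    # the cache; they are handed straight to the backend.
    if (req.method == "PUT" ||
        req.method == "PATCH" ||
        req.method == "DELETE" ||
        req.method == "POST") {
        return (pass);
    }
}
```

Here return (pass) tells Varnish to send the request to the backend and not to cache the response.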

Next, we define a series of regular expressions, and if the URL (represented by req.url) matches that regex, then the request is passed straight through to Apache without Varnish getting involved. In this case, we never want to cache the following sections:

The shopping cart, checkout, addons page or account page

The Add to cart button

The WordPress admin and login screen, and cron requests

The WooCommerce API

You’ll need to consider which parts of your site must always serve the latest content and which don’t need everything to be fully up to date. Typically admin areas and anything interactive must not be cached, while the front page is usually fine.
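A sketch of URL-based rules along the lines listed above is shown here. The URL patterns are assumptions based on common WordPress and WooCommerce paths (for example /cart and /my-account), so they would need adjusting to match the slugs a given site actually uses.

```vcl
sub vcl_recv {
    # Cart, checkout, addons and account pages (assumed slugs).
    if (req.url ~ "^/(cart|checkout|addons|my-account)") {
        return (pass);
    }

    # The "Add to cart" button, which adds a query string parameter.
    if (req.url ~ "\?add-to-cart=") {
        return (pass);
    }

    # WordPress admin, login screen and cron requests.
    if (req.url ~ "wp-(admin|login|cron)") {
        return (pass);
    }

    # The WooCommerce API (assumed to be served under wc-api).
    if (req.url ~ "wc-api") {
        return (pass);
    }
}
```

Each condition is a regular expression match against req.url, so a rule for another uncacheable section can be added by following the same pattern.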

Cookies, even ones set on the client side such as those for Google Analytics, can prevent content from being cached. To prevent this, you need to configure Varnish to strip these cookies from the request before passing it on to Apache. In this case, we want to strip Google Analytics and various WordPress cookies.
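One way to remove such cookies is with regsuball, which replaces every match of a regular expression in a string. The sketch below is an illustration under assumptions: the cookie names (_ga, _gat, __utm*, has_js, wp-settings-*, wordpress_test_cookie) are common examples, not a list taken from the original configuration.

```vcl
sub vcl_recv {
    if (req.http.Cookie) {
        # Remove Google Analytics cookies (assumed names).
        set req.http.Cookie = regsuball(req.http.Cookie,
            "(^|;\s*)(_ga|_gat|__utm[a-z]+)=[^;]*", "");

        # Remove WordPress cookies that do not affect page content
        # for anonymous visitors (assumed names).
        set req.http.Cookie = regsuball(req.http.Cookie,
            "(^|;\s*)(has_js|wp-settings-[0-9]+|wp-settings-time-[0-9]+|wordpress_test_cookie)=[^;]*", "");

        # Tidy up a leading separator left behind by the removals.
        set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

        # If nothing is left, drop the header so the request is cacheable.
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}
```

Unsetting the empty header matters because Varnish's default behaviour is to skip the cache for any request that still carries a Cookie header.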

Here we check for remaining WordPress-specific cookies. These would indicate that a user is signed in, in which case we may want to serve them all the latest content rather than displaying content from the cache.
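A check of this kind might look like the sketch below. The cookie name prefixes are assumptions based on the cookies WordPress usually sets for signed-in users and commenters.

```vcl
sub vcl_recv {
    # Signed-in users and recent commenters bypass the cache so they
    # always see up-to-date content (assumed cookie prefixes).
    if (req.http.Cookie ~ "wordpress_logged_in_|wordpress_sec_|comment_author_") {
        return (pass);
    }
}
```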

Remember where we allowed the local server to clear the cache? This section actually carries out the purge when it receives a request from an authorised client.
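A minimal sketch of purge handling in Varnish 4.0, using the purge ACL defined earlier, could look like this. The 405 status and its message are illustrative choices, not taken from the original configuration.

```vcl
sub vcl_recv {
    if (req.method == "PURGE") {
        # Refuse purge requests from clients outside the purge ACL.
        if (!client.ip ~ purge) {
            return (synth(405, "Not allowed."));
        }
        # Remove the cached object for this request from the cache.
        return (purge);
    }
}
```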

# Bypass the cache if the client sends a no-cache request
if (req.http.Cache-Control ~ "no-cache") {
    return (pass);
}

Here we check to see if the Cache-Control HTTP header is set to no-cache. If so, we pass it straight through to Apache.

# Try a cache-lookup
return (hash);
}

This is the last rule under vcl_recv, and a request only reaches it if it has got past all the other rules. It tries to fetch the page from the cache. If the page is not in the cache, Varnish fetches it from Apache and caches the response.

sub vcl_backend_response {
    set beresp.grace = 5m;
}

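This sets the grace period: how long Varnish may keep serving a cached object after it has expired, for example while a fresh copy is fetched from Apache. Here we’ve set it to 5 minutes. The cache lifetime itself is a separate setting, beresp.ttl.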
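To control the cache lifetime explicitly, one option is to set beresp.ttl alongside the grace period. A minimal sketch, assuming a five-minute lifetime is wanted (the value is an assumption, not taken from the original configuration):

```vcl
sub vcl_backend_response {
    # How long an object is considered fresh. This overrides the
    # lifetime Varnish would otherwise derive from the backend's
    # caching headers.
    set beresp.ttl = 5m;

    # How much longer an expired object may still be served.
    set beresp.grace = 5m;
}
```

Without an explicit TTL, Varnish derives the lifetime from the backend's caching headers and falls back to its default of 120 seconds.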

With that done, we should be ready to restart Varnish and Apache. If you are using an operating system with systemd, then the following commands should restart Apache and Varnish:

If you then visit your site and inspect the HTTP headers using your browser’s dev tools, you’ll notice the new HTTP header X-Varnish in the response. This tells you that Varnish is up and running. If you make sure you’re logged out, you should hopefully see that if you load a page, and then load it again, the second response is noticeably quicker.

Installing and configuring Varnish is a relatively quick and easy way of helping your website scale to be able to serve many more users, and if the site becomes popular all of a sudden, it can make a huge difference as to whether the site can stand up to the load or not. If you need more information on how to configure Varnish for your own needs, I recommend consulting the excellent documentation.