
Monitor Varnish like a PRO in CentOS 7

by Danila Vershinin, September 12, 2018

A silly mistake

At some point I started seeing strange behavior from my Varnish instance: it returned unexplained “backend fetch failed” errors. Only when I looked at syslog (for an entirely different task) did I spot that Varnish was panicking quite often:

My immediate reaction was to try downgrading, among other things. All in vain: the actual error was my own misconfiguration. Cache segmentation was configured so that both the static files backend and the page cache backend pointed at the same file:

There is no need to segment the cache if all of it is going to be stored on the same filesystem.

Surely this was an easy fix. But the frustrating part was not knowing that something was wrong with the Varnish configuration until I spotted the panic messages in syslog, purely by accident. How can we do better here?

How Varnish runs

The Varnish architecture is built on two main processes: the master process and the child process.

The child process is the one that actually caches content, and it panics if something goes wrong. The master process is responsible for watching over the child (cache) process and restarting it as needed.
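To see the two processes on a live system, one option is to list the varnishd processes together with their parent PIDs. The sketch below assumes a Linux ps (procps) and uses a hypothetical helper named label_varnishd, which is not part of Varnish. It labels a process as the child when its parent is also in the list, and as the master otherwise.

```shell
#!/bin/sh
# label_varnishd (hypothetical helper, not shipped with Varnish):
# reads "PID PPID" pairs on stdin and prints "PID master" or "PID child".
# Assumption: the child (cache) process is the one whose parent PID
# belongs to another varnishd process in the same list.
label_varnishd() {
    awk '
        { pid[NR] = $1; ppid[NR] = $2; seen[$1] = 1 }
        END {
            for (i = 1; i <= NR; i++)
                print pid[i], ((ppid[i] in seen) ? "child" : "master")
        }'
}

# On a host running Varnish (procps ps assumed):
#   ps -o pid=,ppid= -C varnishd | label_varnishd
```

If the child has been restarted after a panic, its PID will differ from the one seen earlier, while the master PID stays the same.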

Improving monitoring, and with it reliability, raises two questions:

How can we easily spot Varnish panics and be alerted about them?

Who is watching over the watcher (the master)?

Notification for Varnish panics

It’s easy to find out whether your running Varnish instance has had a panic, using the following command:

varnishadm panic.show

If a panic has happened, you will see its details. But how do we know to check in the first place? It would be nice to be notified. This is where a simple Monit check comes in. For example, place the following in /etc/monit.d/varnish.mon:

check program varnishpanic with path "/bin/varnishadm panic.show"
if status != 1 then alert

The trick here is knowing that varnishadm panic.show exits with code 0 if a panic exists and 1 otherwise. This simple check ensures that you get an alert should any panic occur, so you can act on it early.
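The same exit-code logic can be tried by hand before relying on Monit. This is a minimal sketch: interpret_panic_status is a hypothetical helper name, and treating any code other than 0 or 1 as "unknown" is an assumption rather than documented varnishadm behavior.

```shell
#!/bin/sh
# interpret_panic_status (hypothetical helper): runs the given command and
# maps its exit code the same way the Monit check relies on.
interpret_panic_status() {
    "$@" >/dev/null 2>&1
    case $? in
        0) echo "panic recorded" ;;
        1) echo "no panic recorded" ;;
        *) echo "unknown" ;;   # assumption: the check itself failed to run
    esac
}

# Real usage on a Varnish host:
#   interpret_panic_status varnishadm panic.show
```

Once a panic has been investigated, varnishadm panic.clear resets the record, so the check stops reporting the old panic.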

Watch over master

The master Varnish process is quite reliable and is the least likely thing to crash. But why not add a bit of monitoring if we can?

There are basically two options here: you can use Monit to ensure that the main Varnish process is running, or you can rely on a systemd feature.

systemd

With the arrival of systemd in CentOS 7, you no longer have to worry about keeping Varnish up yourself should it crash completely.