Improving Network Reliability with Keepalived

Redundancy is one of the key ways you can increase the reliability of your
network. As the concept of RAID (redundant arrays of inexpensive disks) has
shown, it can be much more cost effective to group a number of inexpensive
components together than to spend much more money on one high-priced item. You
can apply the same idea to your network: instead of investing in one very
expensive proprietary router, why not install several redundant Linux routers
made out of commodity parts and free software? This article shows how easy it
is to do just that with Keepalived on Linux.

Concepts

The problem with routing is that most client computers do it in the simplest
way possible: by using a default route. Any network traffic not destined for
the local network is handed to the gateway router, on the assumption that the
gateway knows how to send it along appropriately.
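
To make the default-route decision concrete, here is a minimal Python sketch
of how a client picks a next hop. The function name next_hop, the /24 network,
and the addresses are illustrative assumptions; a real client makes this
decision in its kernel routing table.

```python
from ipaddress import ip_address, ip_network


def next_hop(dest: str,
             local_net: str = "192.168.1.0/24",
             gateway: str = "192.168.1.1") -> str:
    """Return the address a client would hand a packet to.

    Destinations on the local network are reached directly; everything
    else goes to the single default gateway.
    """
    if ip_address(dest) in ip_network(local_net):
        return dest      # on-link: deliver directly
    return gateway       # off-link: rely on the default route


if __name__ == "__main__":
    for dest in ("192.168.1.42", "203.0.113.7"):
        print(dest, "->", next_hop(dest))
```

Every off-network destination resolves to the same gateway address, which is
why that one address is worth protecting.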

This makes the gateway a single point of failure for your network. If it
goes down, none of your client machines can communicate with the outside
world.

One answer to this is to have your client machines run a routing protocol
such as RIP or OSPF. This is rarely done, because of the increased complexity
and overhead. Thus the only practical way to make routing more robust is to
fool the clients into thinking they are always communicating with one gateway
router. You can use VRRP (Virtual Router Redundancy Protocol) to do this; the
Keepalived program provides one implementation of VRRP for Linux.

VRRP is an IETF protocol that allows two or more routers to act as one virtual
router. According to the VRRP specification, the routers present a virtual IP
address (VIP) that corresponds to a virtual MAC address (VMAC). Each router
also has its own real hardware and IP addresses. Initially the master router
handles the virtual IP and MAC addresses. If the master router fails, the
backup takes over the virtual addresses. The master announces that it is alive
with regular multicast advertisements, at a default rate of one per second,
and the backup listens for them.
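
How long a backup waits before declaring the master dead follows from that
advertisement rate. In the VRRP version 2 specification (RFC 3768), a backup
waits three advertisement intervals plus a small priority-dependent skew. The
Python sketch below only illustrates that arithmetic; the function name is an
assumption, and the code is not taken from Keepalived's source.

```python
def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds a backup waits without advertisements before taking over.

    Follows the VRRPv2 formulas:
        Skew_Time            = (256 - Priority) / 256
        Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
    """
    if not 1 <= priority <= 254:
        raise ValueError("backup priority must be between 1 and 254")
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


if __name__ == "__main__":
    for prio in (50, 100, 200):
        print(f"priority {prio}: {master_down_interval(prio):.3f} s")
```

Because the skew shrinks as priority grows, a higher-priority backup times out
slightly sooner than a lower-priority one.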

Because Linux does not currently support VMACs, Keepalived implements only
VIPs. In practice, this works fine on modern networks, although you should be
aware that it can cause problems for older hardware that does not handle
gratuitous ARP requests.

In normal operation, the backup VRRP server continually listens for multicast
advertisements from the master. If the master disappears, the backup sends a
gratuitous ARP message out on the network, which says, in effect, "The IP
address that the master previously owned is now at my hardware address." This
causes all other systems on the network to start using the backup VRRP server
as their gateway, and it continues to serve until the master reappears. The key
point here is that no reconfiguration is necessary on the client machines; it
all happens on the servers.
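
To show what such a gratuitous ARP message contains, the following Python
sketch builds the raw bytes of one. Because Keepalived moves only the IP
address, the message effectively binds the virtual IP to the backup's own
hardware address. The helper names and the MAC address are hypothetical, and
the code only constructs the frame; actually sending it would need a raw
socket and root privileges, and this is not how Keepalived itself is written.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800
ARP_REQUEST = 1


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into six raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(vip: str, mac: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    In a gratuitous ARP the sender and target IP are the same address
    (here the virtual IP), and the frame is broadcast so that every host
    refreshes its ARP cache entry for that address.
    """
    hw = mac_to_bytes(mac)
    ip = socket.inet_aton(vip)
    ethernet = BROADCAST_MAC + hw + struct.pack("!H", ETHERTYPE_ARP)
    arp = (
        struct.pack("!HHBBH", 1, ETHERTYPE_IPV4, 6, 4, ARP_REQUEST)
        + hw            # sender hardware address: the new owner
        + ip            # sender protocol address: the virtual IP
        + b"\x00" * 6   # target hardware address: unused
        + ip            # target protocol address: the virtual IP again
    )
    return ethernet + arp


if __name__ == "__main__":
    frame = build_gratuitous_arp("192.168.1.1", "02:00:00:00:00:fe")
    print(len(frame), frame.hex())
```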

While this article covers the simple case of one master and one backup
server, in reality there can be multiple backup servers for increased
reliability. VRRP works on an election process: a failing master causes an
election to happen, and the highest-priority backup takes over. If that backup
fails, the next takes over, and so on.
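
The outcome of that election can be sketched in a few lines of Python. This is
a simplified model under stated assumptions: the Router class and elect_master
function are hypothetical names, the priorities are made up, and ties are
broken by the higher IP address as the VRRP specification describes. Real VRRP
routers reach the same result without a central coordinator, each one acting
on its own timers.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Iterable, Optional


@dataclass(frozen=True)
class Router:
    name: str
    address: IPv4Address
    priority: int
    alive: bool = True


def elect_master(routers: Iterable[Router]) -> Optional[Router]:
    """Pick the live router with the highest priority.

    Equal priorities are resolved in favor of the higher IP address.
    Returns None if no router is alive.
    """
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, r.address))


if __name__ == "__main__":
    group = [
        Router("master", IPv4Address("192.168.1.253"), 150, alive=False),
        Router("backup1", IPv4Address("192.168.1.254"), 100),
        Router("backup2", IPv4Address("192.168.1.252"), 50),
    ]
    winner = elect_master(group)
    print(winner.name if winner else "no router available")
```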

Installing the Software

Keepalived may come preinstalled on your Linux server, depending on which
distribution you use. (It doesn't come with Fedora, at least up to FC3.)
There have been many updates to Keepalived in the past few months, so you are
probably best off downloading the latest version directly from the Keepalived
web site.

Keepalived builds and installs in the standard Unix way: just unpack the
tarball and follow the instructions in the INSTALL file. Make sure
you put the Keepalived init file in your init directory; for example,
/etc/rc.d/rc3.d/S99keepalived.

Remember that you need to install Keepalived on both the master and backup
routers. The installation is the same on all servers; only the configuration
file differs.

Configuration

The Keepalived configuration uses a single file,
/etc/keepalived/keepalived.conf. This file can be intimidating,
because there are many configuration options and the documentation is a bit
scattered. Keepalived includes several other health-check mechanisms, and the
documentation focuses on configurations such as web server farms. Luckily, you
can ignore most of those configuration options if you are just configuring
VRRP. One of my main goals in writing this article is to make people aware
that Keepalived works perfectly fine as just a VRRP server.

Assume that your master router is at address 192.168.1.253 and your backup
is at 192.168.1.254. Traditionally, the gateway on a network is on the .1
address, so set the VRRP virtual address to 192.168.1.1. That way if your
existing client configurations use a default gateway on 192.168.1.1, you won't
have to change the configuration on each client machine.

Edit keepalived.conf on the master so that it contains just
the following:
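
A minimal VRRP-only configuration along these lines might look like the
sketch below. The instance name, state, interface, and virtual address follow
the text of this article; the virtual_router_id and priority values are
illustrative assumptions rather than values taken from the article.

```
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    # Assumed value: any ID from 1 to 255, identical on master and backup.
    virtual_router_id 51
    # Assumed value: must be higher than the backup's priority.
    priority 150
    virtual_ipaddress {
        192.168.1.1
    }
}
```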

The first line defines a new VRRP instance and calls it VI_1. If you want to
run VRRP on multiple interfaces on a router, give each one a different
instance name. Typical names are VI_1, VI_2, and so on, but you can name them
anything you want.

The next line defines the state VRRP will be in when Keepalived starts.
Because this is the master, VRRP should start in the master state so that it
will control the virtual IP address.

The interface line defines which network interface this VRRP instance will
operate on, so typically this is eth0 or eth1 or something similar. Keepalived
works just fine with VLANs, so you can use VLAN interfaces such as eth0.2
as well.