I'm new to both ucarp and keepalived and am unsure which is the most "supported". I see that the latest ucarp release was in 2010 while the continued development of keepalived seems to be more active.
In any case, all I'm looking for is a simple way to move a floating IP address alias among 3 LAN servers. I think keepalived supports specifying custom scripts to perform "health checks". Can ucarp do the same? How does ucarp determine if a node has failed? Does it do more than an ICMP check?

Before I dig into each package's configuration files, does anyone have a quick comparison of the two solutions?
Any other suggestions (except Heartbeat because I think it does too much for what I actually need)?

I'm not able to properly compare the two solutions because I don't know enough about keepalived. However, I can comment on ucarp as it is something I use in production. I encountered ucarp when the need arose to migrate two firewalls running in an active/backup configuration from FreeBSD to Linux. I chose ucarp for two reasons. Firstly, its implementation is straightforward. Secondly, it is the closest approximation of CARP - as implemented in BSD. The protocol itself is similar to VRRP (and to Cisco's proprietary HSRP). It works by sending out periodic advertisement messages via multicast.

Setting up ucarp is quite easy. To employ ucarp usefully, it should be spawned with parameters that describe:

The VHID (a unique integer for the virtual server)

The real IP address of the interface to which ucarp must bind

The (shared) virtual IP address

A pre-shared key (password)

The location of an 'up' script to execute upon being promoted to master

The location of a 'down' script to execute upon being demoted to backup

These parameters should be common between the participating hosts except for the real IP - obviously.
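To make the list above concrete, here is a rough sketch of how those parameters map onto ucarp's long options. The interface name, addresses, password and script paths are placeholders I made up for illustration, and ucarp_cmdline is a hypothetical helper that only prints the command line rather than running it. Note that the interface is also passed (--interface), since ucarp hands it to the up and down scripts.

```sh
#!/bin/sh
# All values below are placeholders for illustration; adjust them to your network.
IFACE=eth0                        # interface ucarp binds to
VHID=1                            # virtual host ID, same on every node
VIP=192.0.2.100                   # shared virtual IP
PASS=changeme                     # pre-shared key, same on every node
UP=/usr/local/sbin/vip-up.sh      # run on promotion to master
DOWN=/usr/local/sbin/vip-down.sh  # run on demotion to backup

# Print the ucarp command line for the host whose real IP is $1.
# Only --srcip differs between the participating hosts.
ucarp_cmdline() {
    printf '%s\n' "ucarp --interface=$IFACE --srcip=$1 --vhid=$VHID --pass=$PASS --addr=$VIP --upscript=$UP --downscript=$DOWN"
}
```

Calling ucarp_cmdline 192.0.2.11 on the first host and ucarp_cmdline 192.0.2.12 on the second prints two command lines that are identical apart from --srcip. The printed line would then be run as root.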

When ucarp decides that the other host is down, it sends a gratuitous ARP so as to expedite the updating of ARP tables maintained elsewhere (such as your ethernet switch), then it executes the up script. This is an interesting characteristic of ucarp because it is the admin's responsibility to create a script that adds and removes the virtual IP. To put that into perspective, a minimal up script would look like this (using iproute2):

Code:

#!/bin/sh
# ucarp passes the interface as the first arg, and the virtual IP as the second arg
ip addr add "$2" dev "$1"

Conversely, a minimal down script would look like this:

Code:

#!/bin/sh
ip addr del "$2" dev "$1"

This is different from CARP, which manages the interfaces directly. I actually find ucarp to be more powerful because one can easily script whatever actions are necessary to ensure a smooth failover process. For example, I need to have conntrackd synchronize the netfilter connection tracking table. I also need to have the OpenVPN daemon start on the active node, and stop on the backup node. Whatever one may need to do, it can be carried out in these scripts. There is no domain-specific language or specialized configuration syntax to learn. For that matter, they needn't even be shell scripts.

In Gentoo, there is no initscript provided with the ucarp package. If one is required, I would suggest taking a look at the package in Alpine Linux. I just use postup() hooks and predown() hooks in /etc/conf.d/net to spawn and terminate ucarp as necessary.

The greatest difficulty I had in solving my problem with ucarp is in emulating BSD's net.inet.carp.preempt sysctl. My firewalls are multi-homed and I have a need to run multiple ucarp instances which, in turn, handle multiple interfaces/addresses. If any one instance decides to transition to a 'backup' state, then all instances must follow suit to prevent routing breakage.

In the end, I managed to do this by executing pkill -USR2 ucarp in the down script, which instructs all instances to transition to a backup state. I used atomic locking to prevent race conditions in the down script. Here's a skeletal example based on my production script:

Code:

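#!/bin/sh
# ucarp passes the interface as $1 and the virtual IP as $2
# open the lock file on fd 9; the lock is released when this script exits
exec 9>/tmp/vip-down.lock

ip addr del "$2" dev "$1"

# only the first down script to take the lock signals the other instances
if flock -n 9; then
    # signal all ucarp instances to hand over
    pkill -USR2 ucarp

    # if any other actions are needed then insert them here
    # ...?

    # sleep for just under 1 second (my advertisement period)
    sleep 0.9
fi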

Check the manpage and you'll see that there are some parameters to tune ucarp behaviour but you cannot intrinsically alter the method used to determine unavailability because it is defined by the CARP protocol. It's the very reason for ucarp to exist as a tool. Still, you can force a ucarp instance to demote itself by dispatching SIGUSR2 (as previously mentioned). This signal could be sent from any process that has the appropriate privileges.

The docs talk about "advertisements" but I can't seem to understand how they are generated. I mean, how does a master advertise that it's still a master? I suppose that it's simply by generating the advertisements and that if it doesn't then the backup takes over because it means that the master either died (machine freezes or ucarp process dies for some reason) or a custom script "forced a ucarp instance to demote itself by dispatching SIGUSR2". The latter is quite interesting because most of my master node troubles aren't due to machine freezing or anything as easily detectable as that. So I would create my own scripts to do some sanity checks and decide whether to demote or not.

Vieri wrote:

I suppose that it's simply by generating the advertisements and that if it doesn't then the backup takes over because it means that the master either died (machine freezes or ucarp process dies for some reason) ...

The master node will send out periodic advertisements and respond to ARP requests for the virtual IP. The backup node(s) listen for these advertisements and will keep track of the interval between them. So far so good.

When a backup node detects that the interval between the master's advertisements is longer than its own configured interval, it takes action and sends its own advertisement. All nodes will consider these advertisements, and whichever node advertises most frequently (i.e. with the shortest interval) will be considered the winner. This process can be described as an election.

Given a host running ucarp which is active as a master, if the host freezes or ucarp is terminated then an election will take place (because it is no longer sending advertisements). Further, if said host is unable to dispatch advertisements with sufficient frequency - perhaps due to connectivity issues or packet loss - an election will take place.

The advertisement interval for a node is defined with the --advbase parameter. I don't think there is any limit on the number of participating nodes. Hence, multicast is used by default for reasons of efficiency. Nodes can be favoured through adjustment of the --advskew parameter. Those with a higher value are less likely to win in an election, all other things being equal.
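For the three-server case in the original question, one way to express a preference order is to give each host the same --advbase and a progressively larger --advskew. The helper below is only a sketch: advert_flags is a hypothetical name, and the base of 1 second and the step of 50 per rank are arbitrary choices of mine (advskew must stay within 0 to 255).

```sh
#!/bin/sh
# Print the advertisement flags for a node of the given rank.
# Rank 0 is the preferred master; higher ranks get a higher skew
# and are therefore less likely to win an election.
advert_flags() {
    rank=$1
    printf '%s\n' "--advbase=1 --advskew=$((rank * 50))"
}
```

The output of advert_flags 0, advert_flags 1 and advert_flags 2 would be appended to the ucarp command line on the first, second and third host respectively.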

The time to elapse before a node considers itself as master can be calculated as 3 * (advbase + (advskew / 255)).
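As a worked example, the expression above can be evaluated with awk. This simply computes the formula as quoted in this post; I have not checked the constants against the ucarp source, so treat the results as an approximation. master_timeout is a hypothetical helper name.

```sh
#!/bin/sh
# Evaluate 3 * (advbase + advskew / 255) as quoted above.
# $1 = advbase in seconds, $2 = advskew (0-255); prints seconds.
master_timeout() {
    LC_ALL=C awk -v base="$1" -v skew="$2" \
        'BEGIN { printf "%.2f\n", 3 * (base + skew / 255) }'
}

master_timeout 1 0      # prints 3.00
master_timeout 1 100    # prints 4.18
```

So with an advertisement base of 1 second, a node with no skew would take over after about 3 seconds, and a node with a skew of 100 a little over a second later.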

Vieri wrote:

... or a custom script "forced a ucarp instance to demote itself by dispatching SIGUSR2".

Upon receiving a SIGUSR2, ucarp will immediately demote itself from master to backup (if applicable), pause for 3 seconds, then start listening for advertisements again. It will then decide what to do based on the aforementioned election rules i.e. it will promote itself to master again if it is necessary.
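A custom sanity check of the kind described could be wired up roughly as follows. This is a minimal sketch, not something taken from my production setup: demote_if_unhealthy and DEMOTE_CMD are hypothetical names, and the check command is whatever suits your service.

```sh
#!/bin/sh
# Command used to demote the local ucarp instances.
# Override DEMOTE_CMD (e.g. with "echo demoted") for a dry run.
DEMOTE_CMD="${DEMOTE_CMD:-pkill -USR2 ucarp}"

# Run the health check given as arguments; if it fails, ask every
# local ucarp instance to step down to backup.
demote_if_unhealthy() {
    if "$@"; then
        return 0    # check passed, nothing to do
    fi
    $DEMOTE_CMD     # intentionally unquoted so the command is split into words
}
```

It could be called from cron or from a loop, for instance as demote_if_unhealthy curl -fsS -o /dev/null http://127.0.0.1/ (the URL being a placeholder). Bear in mind the behaviour described above: after the pause the node takes part in elections again, so if no other node is advertising it will promote itself once more even though the check still fails.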