The HA (High Availability) solution in Wazo makes it possible to maintain basic telephony
functions whether your main Wazo server is running or not. When running a Wazo HA cluster, users are
guaranteed never to experience more than 5 minutes of downtime of their basic telephony service.

The HA solution in Wazo is based on a two-node “master and slave” architecture. In the normal
situation, both the master and slave nodes are running in parallel, the slave acting as a “hot
standby”, and all the telephony services are provided by the master node. If the master fails or
must be shut down for maintenance, the telephony devices automatically communicate with the
slave node instead of the master. Once the master is up again, the telephony devices fail back to
the master node. Both the failover and the failback operations are done automatically, i.e. without
any user intervention, although an administrator might want to run some manual operations after
failback, for example to make sure that any voicemail messages left on the slave are copied
back to the master.

When you upgrade one node of your cluster, you must also upgrade the other so that
both run the same version of Wazo. Otherwise, the replication might not work
properly.

You must configure the HA in the Web interface
(Configuration ‣ Management ‣ High Availability page).

You can configure the master and slave in whatever order you want.

You must also run xivo-sync -i on the master to set up file synchronization. Running xivo-sync -i
creates a passwordless SSH key on the master, stored under the /root/.ssh directory,
and adds it to the /root/.ssh/authorized_keys file on the slave. The following directories
will then be rsync’ed every hour:

/etc/asterisk/extensions_extra.d

/etc/xivo/asterisk

/var/lib/asterisk/agi-bin

/var/lib/asterisk/moh

/var/lib/xivo/certificates

/var/lib/xivo/sounds/acd

/var/lib/xivo/sounds/playback
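
As an illustration, the hourly synchronization described above could be approximated by one rsync invocation per directory. The sketch below only builds the command lines and never runs them. The slave address, the `-a` and `--delete` options, and the use of the root account over SSH are assumptions made for the example; the options actually used by xivo-sync may differ.

```python
# Directories listed above as being synchronized from the master to the slave.
SYNC_DIRS = [
    "/etc/asterisk/extensions_extra.d",
    "/etc/xivo/asterisk",
    "/var/lib/asterisk/agi-bin",
    "/var/lib/asterisk/moh",
    "/var/lib/xivo/certificates",
    "/var/lib/xivo/sounds/acd",
    "/var/lib/xivo/sounds/playback",
]


def build_rsync_commands(slave_host, dirs=SYNC_DIRS):
    """Return one rsync argument list per directory (nothing is executed).

    Assumption: archive mode with deletion, over SSH as root. This is a
    hypothetical helper, not part of Wazo.
    """
    commands = []
    for path in dirs:
        base = path.rstrip("/")
        source = base + "/"  # trailing slash: copy the directory contents
        target = "root@{}:{}/".format(slave_host, base)
        commands.append(["rsync", "-a", "--delete", "-e", "ssh", source, target])
    return commands


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only example address for the slave.
    for command in build_rsync_commands("192.0.2.10"):
        print(" ".join(command))
```

Building the argument lists separately from running them makes it easy to review what would be copied before anything touches the slave.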

Warning

When the HA is configured, some changes will be automatically
made to the configuration of Wazo.

SIP expiry value on master and slave will be automatically updated:

min: 3 minutes

max: 5 minutes

default: 4 minutes

Services ‣ IPBX ‣ General Settings ‣ SIP Protocol

The provisioning server configuration will be automatically updated so that
phones can switch to the other node when a Wazo server fails, for example after a power failure.

Configuration ‣ Provisioning ‣ Template Line ‣ Edit default

Warning

Do not change these values when the HA is configured, as this may cause problems.
These values will be reset to blank when the HA is disabled.
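
To make the SIP expiry values above concrete, the sketch below models how a registrar might apply them to the expiry a phone requests: the default is used when nothing is requested, and other values are clamped between the minimum and the maximum. This is a simplified model under stated assumptions, not Wazo or Asterisk code, and the function name is hypothetical. With a maximum of 5 minutes, a phone has to re-register at least every 5 minutes, which presumably bounds how long it can stay attached to a failed master.

```python
# Values applied automatically when the HA is configured (in seconds).
MIN_EXPIRY = 3 * 60
MAX_EXPIRY = 5 * 60
DEFAULT_EXPIRY = 4 * 60


def effective_expiry(requested=None):
    """Return the registration lifetime granted for a requested expiry.

    Assumed model: None means the phone did not ask for a value, 0 means
    "unregister", anything else is clamped to [MIN_EXPIRY, MAX_EXPIRY].
    """
    if requested is None:
        return DEFAULT_EXPIRY
    if requested == 0:
        return 0
    return max(MIN_EXPIRY, min(MAX_EXPIRY, requested))


if __name__ == "__main__":
    for value in (None, 60, 200, 3600):
        print(value, "->", effective_expiry(value))
```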

Important

For the telephony devices to take the new proxy/registrar settings
into account, you must resynchronize the devices
or restart them manually.

When the master node is down, some features are not available and some behave a bit
differently. This includes:

Call history / call records are not recorded.

Voicemail messages saved on the master node are not available.

Custom voicemail greetings recorded on the master node are not available.

Phone provisioning is disabled, i.e. a phone will always keep the same configuration, even after
restarting it.

The phone remote directory is not accessible, because the provisioned IP address points to the master.

Note that, on failover and on failback:

The statuses of DND, call forwards, call filtering and similar features may be lost if they were changed recently.

If you are connected as an agent, then you might need to reconnect as an agent
when the master goes down. Since it’s hard to know when the master goes down,
if your CTI client disconnects and you can’t reconnect it, then it’s a sign
the master might be down.

Additionally, only on failback:

Voicemail messages are not copied from the slave to the master, i.e. if someone
left a message on your voicemail when the master was down, you won’t be able to
consult it once the master is up again.

More generally, custom sounds are not copied back. This includes recordings.

Here is the list of limitations that are more relevant from an administrator’s standpoint:

The master status is either up or down; there is no intermediate status. This means that if Asterisk
has crashed but the Wazo server is still up, the failover will NOT happen.
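
The limitation above can be restated as a small decision function. This is a toy model, not Wazo's monitoring code, and the function and parameter names are hypothetical. Its only point is that the state of Asterisk plays no part in the failover decision.

```python
def target_node(master_host_up, asterisk_running_on_master):
    """Return the node that the telephony devices are expected to use.

    The second argument is deliberately ignored: per the limitation above,
    only the up/down status of the master server triggers a failover.
    """
    del asterisk_running_on_master  # not part of the decision
    return "master" if master_host_up else "slave"


if __name__ == "__main__":
    # Asterisk crashed while the server is still up: no failover happens.
    print(target_node(master_host_up=True, asterisk_running_on_master=False))
    # The whole master server is down: devices use the slave.
    print(target_node(master_host_up=False, asterisk_running_on_master=False))
```

In the first case the devices keep pointing at the master even though it cannot handle calls, so an administrator has to notice and fix the Asterisk crash manually.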