Database: Percona XtraDB Cluster. Percona XtraDB is a higher-performance, backwards-compatible, drop-in replacement for MySQL's InnoDB storage engine. Percona XtraDB Cluster is Percona Server with the Galera multi-master clustering libraries built in.

Load balancer: HAProxy. HAProxy is a very high performance TCP/HTTP load balancer.

We will be building three database nodes (192.168.10.11, 192.168.10.12 and 192.168.10.13), connected to one load balancer (192.168.10.10).

The steps:

Set up the machines

Install Percona XtraDB Cluster on the database machines

Configure the bootstrap node

Configure the rest of the cluster

Test the cluster

Daemonise the clustercheck script on the database machines with xinetd

Set up the load balancer with HAProxy
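
Install Percona XtraDB Cluster on the database machines

On each database node, install the cluster packages. A sketch, assuming Ubuntu with the Percona apt repository already set up; the exact package name depends on the PXC version you want (percona-xtradb-cluster-56 here is just an example):

$ apt-get update
$ apt-get install percona-xtradb-cluster-56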

The install will ask for a MySQL root password on each node. I kept it the same across the nodes.

Configure the bootstrap node

Choose a machine that will create the initial cluster. Forming an initial cluster is called bootstrapping the cluster.

Stop MySQL on that machine:

$ service mysql stop

Create (if necessary) and edit /etc/mysql/my.cnf on the bootstrap node.

Add the following:

[mysqld]
datadir=/var/lib/mysql
user=mysql
# Path to Galera library
wsrep_provider=/usr/lib/libgalera_smm.so
# Empty gcomm address is being used when cluster is getting bootstrapped
wsrep_cluster_address=gcomm://
# Cluster connection URL contains the IPs of node#1, node#2 and node#3
#wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13
# In order for Galera to work correctly binlog format should be ROW
binlog_format=ROW
# MyISAM storage engine has only experimental support
default_storage_engine=InnoDB
# This is a recommended tuning variable for performance
innodb_locks_unsafe_for_binlog=1
# This changes how InnoDB autoincrement locks are
# managed and is a requirement for Galera
innodb_autoinc_lock_mode=2
# Node #1 address
wsrep_node_address=192.168.10.11
# SST method
wsrep_sst_method=xtrabackup
# Cluster name
wsrep_cluster_name=my_ubuntu_cluster
# Authentication for SST method
wsrep_sst_auth="sstuser:s3cretPass"

wsrep stands for WriteSet REPlication, BTW. wsrep is a project that aims to develop a generic database replication plugin interface, and Galera is a specific wsrep provider.

Notice the wsrep_cluster_address=gcomm:// line, where the cluster address is empty. An empty gcomm:// address tells this node to bootstrap itself (create a new cluster) on startup. We will comment that line out and uncomment the full cluster address line later, when the rest of the cluster is ready. You’ll notice that the full line specifies all the other node addresses in the cluster – strictly speaking it is only necessary to specify one other node, but listing all of them is noted as a best practice in the Percona documentation.

Some notes about some of the settings:

wsrep_cluster_name is the unique name of your cluster. It should be identical on all nodes that form part of the same cluster.

wsrep_sst_method specifies the way a State Snapshot Transfer (SST) is taken. Cluster nodes are kept in sync through SSTs and ISTs (Incremental State Transfers). An SST is a full data copy from a donor node to a joining node; an IST is a way for a joining node to catch up through a transfer of only the writesets it missed, served from the donor node’s writeset cache.

A few SST methods exist – mysqldump, rsync and xtrabackup. We will use:

xtrabackup – uses Percona XtraBackup to take an SST. It’s useful because it doesn’t need to lock the donor database for the duration of the SST (only briefly, to take a consistent snapshot). We need to set up an SST user for xtrabackup, hence the values in wsrep_sst_auth.
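
With the configuration saved, start MySQL on the bootstrap node again:

$ service mysql start

We also need to create the SST user referenced in wsrep_sst_auth. A minimal sketch, using the privileges Percona’s documentation suggests for xtrabackup SSTs:

mysql> CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 's3cretPass';
mysql> GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
mysql> FLUSH PRIVILEGES;

Configure the rest of the cluster

Give the other two nodes the same /etc/mysql/my.cnf, with two differences: use the full cluster address (wsrep_cluster_address=gcomm://192.168.10.11,192.168.10.12,192.168.10.13) instead of the empty one, and set wsrep_node_address to each node’s own IP. Then start MySQL on each of them in turn, keeping an eye on the MySQL error log as the node joins.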

While a node joins you should see an SST in progress line in its error log – it tells you that the node connected to a donor and is receiving an SST.

Once all the nodes are connected you can stop the bootstrap node’s MySQL service, comment out the empty wsrep_cluster_address=gcomm:// line in its my.cnf, and uncomment the full cluster address line. Restart the MySQL service on it after saving the file. If you don’t fix the wsrep_cluster_address directive on the bootstrap node, it will re-bootstrap itself (form a new cluster of one) on every restart.

Test the cluster

All three nodes should now be running happily connected to each other. Let’s test some things.

Try creating a database on one node, and running SHOW DATABASES; on another node. Should feel pretty cool to see the live replication 😉
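
For example (percona_test is a hypothetical name, just for the test):

mysql> CREATE DATABASE percona_test;  -- on node 1
mysql> SHOW DATABASES;                -- on node 2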

Percona installs a script called clustercheck in /usr/bin/clustercheck. It’s a simple script that checks a node’s wsrep_local_state status variable and outputs an HTTP 200 response if the node is synced, and an HTTP 503 if it isn’t (or if it couldn’t connect).
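
Running it by hand on a synced node should produce something like this (reconstructed from a stock copy of clustercheck; your output may differ slightly):

$ /usr/bin/clustercheck
HTTP/1.1 200 OK
Content-Type: text/plain
Connection: close
Content-Length: 40

Percona XtraDB Cluster Node is synced.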

It needs a MySQL user called clustercheckuser with password clustercheckpassword!. You can change this user, BTW – look for the following lines in /usr/bin/clustercheck:
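
MYSQL_USERNAME="${1-clustercheckuser}"
MYSQL_PASSWORD="${2-clustercheckpassword!}"

(Those are the defaults from a stock copy of the script – check your own.) Create the user on one of the nodes; a sketch, with PROCESS being a commonly granted privilege for this check:

mysql> GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost' IDENTIFIED BY 'clustercheckpassword!';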

You don’t have to repeat user addition on the other nodes as it will propagate through the cluster by itself. Just remember to edit the clustercheck scripts on the other nodes if you are not using the default username and password.

Daemonise the clustercheck script

We are going to set up the load balancer soon. But we need to find a way to make the load balancer aware that a node has gone out of sync. That’s why the clustercheck script exists.

To expose the clustercheck script we will be using xinetd. This is a daemon that listens on a port you specify, runs a specific command when something connects to that port, and returns the command’s output to the connecting client.

Percona, nice people that they are, install an xinetd config file for clustercheck as part of the server install. It should be located at /etc/xinetd.d/mysqlchk, and the contents probably look something like:
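
# default: on
# description: mysqlchk
service mysqlchk
{
        disable         = no
        flags           = REUSE
        socket_type     = stream
        port            = 9200
        wait            = no
        user            = nobody
        server          = /usr/bin/clustercheck
        log_on_failure  += USERID
        only_from       = 0.0.0.0/0
        per_source      = UNLIMITED
}

(A sketch of the stock file – verify against your own copy. The important bits are the port and the server command.)

Set up the load balancer

Install HAProxy on the load balancer machine:

$ apt-get install haproxy

Then give it a configuration. A sketch of /etc/haproxy/haproxy.cfg, reusing the node IPs from above – treat the timeouts and check intervals as starting points rather than gospel:

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 4096
        user haproxy
        group haproxy
        debug
        #quiet
        #daemon

defaults
        log     global
        mode    http
        option  tcplog
        option  dontlognull
        retries 3
        option  redispatch
        maxconn 2000
        contimeout      5000
        clitimeout      50000
        srvtimeout      50000

listen mysql-cluster 0.0.0.0:3306
        mode    tcp
        balance roundrobin
        option  tcpka
        option  httpchk
        server  db01 192.168.10.11:3306 check port 9200 inter 12000 rise 3 fall 3
        server  db02 192.168.10.12:3306 check port 9200 inter 12000 rise 3 fall 3
        server  db03 192.168.10.13:3306 check port 9200 inter 12000 rise 3 fall 3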

The interesting details to look at are the option httpchk directive and the check port 9200 arguments on the server lines, where HAProxy’s httpchk functionality is used to check for an HTTP 200 status on port 9200 of each cluster node – thus running the clustercheck script via xinetd, and taking a node out of circulation while it is not synced. It’s a better way to manage nodes than just checking if port 3306 is open.

The HAProxy service won’t start by default – as it has no sane default configuration – but now that we have provided a sane configuration we can set it to start. Edit /etc/default/haproxy, setting ENABLED to 1:
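
# in /etc/default/haproxy
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1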

You will probably get a warning along the lines of <debug> mode incompatible with <quiet> and <daemon>. Keeping <debug> only. You can remove it by editing /etc/haproxy/haproxy.cfg, removing the debug option from the global section, and enabling the quiet and daemon options instead. I found it useful to mess around in debug mode at first, though.

Unexpected results – Host Blocked Errors
Unexpected results – Host Blocked Errors
I had a problem the first time I set this up: after a while I got Error Code: 1129 Host ‘192.168.10.10’ is blocked because of many connection errors; unblock with ‘mysqladmin flush-hosts’ when connected to my cluster. I’m still not sure why it happened, or that my actions cured it; I do know it hasn’t happened again since adding the option tcpka directive (which enables TCP keepalive) to the haproxy.cfg file. Not understanding it feels a bit like magic, so hopefully someone more clued up can weigh in with some ideas. The MySQL manual notes that the error appears after a certain number of interrupted connection requests, hence my trying to keep the connections alive. Since the limit does little for security anyway, it may also make sense to raise the max_connect_errors value in your MySQL configuration to some insanely high value.
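
Raising the limit would look something like this in each node’s my.cnf (the value is arbitrary):

[mysqld]
max_connect_errors = 1000000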

Now that HAProxy is running (and you are hopefully not dealing with unexpected results) you should be able to connect through MySQL to your load balancer, and it should be round-robining the requests to your cluster nodes.

Congrats! Have fun.

PS: Possible Tweaks

Security can be improved a bit – I would suggest at least locking down xinetd to only allow connections from your load balancer. Look at the only_from directive in /etc/xinetd.d/mysqlchk; it also accepts += as an operator, so you can list multiple IPs below each other, as shown below.
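
For example, assuming the load balancer lives at 192.168.10.10 as above (the second IP is a hypothetical extra machine):

        only_from       = 192.168.10.10
        only_from      += 192.168.10.20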

You can also close ports on the cluster nodes to only allow MySQL and SSH access. The cluster needs 3306 (MySQL clients), 4567 (Galera group communication), 4568 (IST) and 4444 (SST) open, and SSH typically runs on port 22.
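
With ufw (an assumption – use whichever firewall you prefer), that could look like:

$ ufw allow 22
$ ufw allow 3306
$ ufw allow 4444
$ ufw allow 4567
$ ufw allow 4568
$ ufw enable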

You can also enable HAProxy’s built-in statistics page by adding something like the following sketch (the port and the credentials are placeholders) to the /etc/haproxy/haproxy.cfg file:
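
listen stats 0.0.0.0:8080
        mode    http
        stats   enable
        stats   uri /
        stats   auth Username:Password

Restart HAProxy, and browse port 8080 on your load balancer’s address. Remember to change Username and Password to values that you like for HTTP authentication.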

HAProxy Stats – very colourful

Round-robining between the cluster nodes is an OK, but not great, strategy for the load balancer. If two nodes receive conflicting writes, rollbacks can occur. It would be better to write to only one node at a time, failing over to another node only when the active node goes offline.
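
HAProxy can express this with its backup keyword: a server marked backup only receives traffic once the non-backup servers are down. A sketch, modifying the mysql-cluster listener from earlier:

listen mysql-cluster 0.0.0.0:3306
        mode    tcp
        balance roundrobin
        option  tcpka
        option  httpchk
        server  db01 192.168.10.11:3306 check port 9200 inter 12000 rise 3 fall 3
        server  db02 192.168.10.12:3306 check port 9200 inter 12000 rise 3 fall 3 backup
        server  db03 192.168.10.13:3306 check port 9200 inter 12000 rise 3 fall 3 backup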