Running a MySQL Galera cluster on Microsoft Azure

A few weeks ago, I was looking into running a MySQL Galera Cluster for a customer with a large Linux IaaS deployment on Azure.

Why that? There’s ClearDB, a Microsoft partner that offers MySQL on Azure as SaaS (Software as a Service), so you can go to https://www.cleardb.com/store/azure and pick your size. Or, if you want to run it on your own, you can pick an Ubuntu Linux gallery image, type “apt-get install mysql-server”, and that’s it, right? Well, not so fast…

ClearDB is a great offering for most customers that need a MySQL backend, but in this case, even the largest ClearDB offer was not sufficient.

So the customer took the second path: he created an IaaS VM (actually several VMs, each running an independent database server for different purposes) and configured his services to use these databases via the servers’ internal IP addresses. But there’s one problem with this approach: occasionally, Azure needs to deploy patches to the host systems running these VMs. And occasionally, the Linux VMs themselves need patches that require a restart of the database server or a reboot of the machine. Whenever this happened, the customer’s site would be down for a few minutes.

To avoid this occasional downtime, I teamed up with Oli Sennhauser, CTO at FromDual and my colleague Christian Geuer-Pollmann to set up a MySQL Galera Cluster on Azure.

Such a cluster consists of three MySQL VMs. Database connections can be handled by all three machines, so the (read) load is distributed as well. As long as two machines are up, the database service is available. Galera achieves this by replicating database write transactions to all nodes. More information can be found at http://galeracluster.com/ and at https://launchpad.net/galera/

So, here’s the tl;dr version of what we did:

- Set up three Ubuntu 14.04 LTS IaaS VMs with fixed internal IP addresses
- Set up an Azure internal load balancer so that database clients have a single IP they connect to
- Installed mysql-server-wsrep-5.6.16-25.5 and galera-25.3.5 plus a few dependencies
- Configured Galera on these three machines
- Added a bit of iptables magic, courtesy of FromDual, to the VMs to block access to the MySQL port while a database server is recovering. The internal load balancer then moves the clients to the other servers of the cluster in case one is down.
- And in order to keep this all neat and clean, we used PowerShell to automate the Azure setup part.

0. Prerequisites

The fixed internal IP and the internal load balancer make use of features that were only added to Azure virtual networks quite recently. Chances are that if you configured an Azure virtual network a while ago, these functions may not be available there. So just configure a new virtual network for this.

Currently, some of these features can only be configured via PowerShell. So you need a (Windows) machine to run PowerShell on; if you don’t have one handy, just create a small (A1) Windows Server machine in the Azure portal and use RDP to connect to it. Then install the Azure PowerShell, see here.

And you should do a bit of planning ahead for your new virtual network. It should have sufficient IP addresses to host all your database clients, the three servers of the cluster, and the input IP address of the load balancer. In this case, we used the 10.0.0.0/8 default setting but placed all the database servers in the 10.11.0.0/16 subnet.

1. Creating the machines and the internal load balancer

As said before, we scripted all this in PowerShell. And in order to keep the configuration apart from the actual commands, we set a bunch of variables in the header of our script that contain the actual settings. So when you see $servicename in the examples below, that is something we’re setting in this header.
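Such a header might look roughly like this (a sketch; all names and values are illustrative assumptions, only the IP addresses are the ones used later in this post):

```powershell
# Settings header (names are examples, not the customer's actual values)
$servicename = "galera-demo-svc"     # cloud service that hosts the DB VMs
$vnetname    = "galera-vnet"         # the (new) virtual network
$subnetname  = "db-subnet"           # subnet inside 10.11.0.0/16
$ilbname     = "galera-ilb"          # internal load balancer name
$ilbip       = "10.11.0.10"          # ILB input IP the clients connect to
$dbips       = "10.11.0.11", "10.11.0.12", "10.11.0.13"   # the three DB nodes
```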

When creating the internal load balancer, we found that the cloud service already needs to contain a deployment. So in order to ensure this, we first created a small toy IaaS VM, then created the load balancer and the database VMs, and then removed the toy VM again.
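With the classic Azure PowerShell module of that era, creating the internal load balancer looked roughly like this (a sketch; $servicename is the variable from the script header, the load balancer and subnet names are assumptions):

```powershell
# Sketch: add an internal load balancer with a fixed input IP to the
# cloud service (fails if the service contains no deployment yet)
Add-AzureInternalLoadBalancer -ServiceName $servicename `
    -InternalLoadBalancerName "galera-ilb" `
    -SubnetName "db-subnet" `
    -StaticVNetIPAddress "10.11.0.10"
```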

To configure a VM to use the internal load balancer, we add an endpoint to the VM configuration:
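A sketch of what that looks like with the classic Azure PowerShell cmdlets (the VM, endpoint, and load-balancer-set names are assumptions):

```powershell
# Sketch: add a load-balanced MySQL endpoint (port 3306) to one database VM
# and attach it to the internal load balancer via a TCP probe
Get-AzureVM -ServiceName $servicename -Name "galera-node-1" |
    Add-AzureEndpoint -Name "mysql" -Protocol tcp `
        -LocalPort 3306 -PublicPort 3306 `
        -LBSetName "mysql-lbset" `
        -ProbePort 3306 -ProbeProtocol tcp `
        -InternalLoadBalancerName "galera-ilb" |
    Update-AzureVM
```

The TCP probe is what later lets the load balancer notice a blocked port and route clients to another node.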

In these packages, we found a few minor glitches that collided with the Ubuntu 14.04 LTS we installed them on:

- mysql-server-wsrep-5.6.16-25.5-amd64.deb declares a dependency on mysql-client. Ubuntu sees this satisfied by the mysql-client-5.5 package it uses as default, but this creates a version conflict. So I downloaded the .deb and modified its dependency to point to mysql-client-5.6 by following http://ubuntuincident.wordpress.com/2010/10/27/modify-dependencies-of-a-deb-file/.
- The default my.cnf contains the log path /var/log/mysql/error.log, which does not exist on Ubuntu. This created the strange situation that the server process would not start but just left two mysterious entries in syslog. Running strace on the server process showed the path it was trying to access, and once I created it, everything worked fine.
- The package was missing an upstart script for mysql; instead it had just a classic /etc/init.d shell script, which confused upstart. So I took the one from a standard mysql-server-5.6 package and everything worked out well.

So we created the missing log directory and put the standard upstart script from mysql-server-5.6 into the upstart config directory.

The next part was to configure the Galera cluster function. As you can see in the script above, we created three machines with the internal IP addresses 10.11.0.11, 10.11.0.12 and 10.11.0.13. For this, we need to set a few things in the default my.cnf.
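The required settings are roughly the following (a sketch: the provider path matches the galera-25.3.x Debian package, while the cluster name and SST method are assumptions you should replace with your own):

```ini
# my.cnf — Galera prerequisites, identical on all three machines
[mysqld]
binlog_format            = ROW      # Galera only supports row-based replication
default_storage_engine   = InnoDB   # only InnoDB tables are replicated
innodb_autoinc_lock_mode = 2        # interleaved lock mode, required by Galera
query_cache_size         = 0        # query cache must be disabled

wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name       = "azure_galera"   # assumed name, pick your own
wsrep_sst_method         = rsync            # method for full state transfers
```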

These settings are the same on all three machines. On each of the machines, we can now set a human-readable node name, e.g.

wsrep_node_name='Node A'

In the next step, we configured the actual clustering, i.e., we told each machine where to find the replication partners.

On machine 10.11.0.11, we set the following line in my.cnf:

wsrep_cluster_address="gcomm://"

This allows this database node to come up even if there is no replication partner.

Then we started the server on 10.11.0.11.

Then we set the following line in my.cnf on 10.11.0.12:

wsrep_cluster_address="gcomm://10.11.0.11,10.11.0.13"

and started the server on 10.11.0.12

Then we set the following line in my.cnf on 10.11.0.13:

wsrep_cluster_address="gcomm://10.11.0.11,10.11.0.12"

and started the server on 10.11.0.13.

Now we went back to 10.11.0.11 and changed the line to:

wsrep_cluster_address="gcomm://10.11.0.12,10.11.0.13"

and restarted the server. Now the galera cluster was configured.

Instead of changing the configuration of the initial node twice, one can also start the server process directly and pass the setting on the command line, e.g. mysqld_safe --wsrep_cluster_address="gcomm://". This is a good workaround if, for whatever reason, the cluster was fully shut down and needs to be brought up manually again.

Since the internal load balancer was already configured earlier, we can now use the ILB input IP address to connect to the cluster: the clients use 10.11.0.10:3306. And with each new TCP connection, the load balancer chooses one of the running nodes and connects the client to it.

There is one additional issue that may confuse clients in one specific situation. Imagine that one of the nodes just failed and is about to start up again. In this state, the database server can already be accessed but does not yet have the data replicated from the other nodes, so although clients can connect, all database commands will fail. If clients aren’t prepared to handle this, it may show up as database errors in applications. But there’s a solution: FromDual has implemented a small shell script that uses the Linux iptables firewall to deny access to the server while it is in this state. The load balancer then finds that it cannot reach the TCP port and routes the clients to another running cluster node.
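The mechanism behind such a script can be sketched as follows: Galera invokes the configured notification command with, among others, a --status argument on every state change, and the script opens or blocks the MySQL port accordingly. This is a simplified illustration of the idea, not FromDual’s actual script; the iptables rules in the comments are assumptions:

```shell
# Simplified sketch of a Galera notification handler: parse the --status
# argument that Galera passes and decide whether the MySQL port should be
# reachable. The real script would run the iptables commands shown in the
# comments (rule details are assumptions).
galera_firewall() {
    status=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --status) status="$2"; shift ;;
        esac
        shift
    done
    if [ "$status" = "synced" ]; then
        # node is fully replicated again, allow traffic:
        # iptables -D INPUT -p tcp --dport 3306 -j REJECT
        echo "open"
    else
        # node is joining/donor/etc., block traffic so the ILB fails over:
        # iptables -I INPUT -p tcp --dport 3306 -j REJECT
        echo "block"
    fi
}

galera_firewall --status synced
galera_firewall --status joiner
```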

To run the script whenever a replication state change occurs, another line is added to my.cnf:
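The option in question is wsrep_notify_cmd, which Galera invokes on every cluster membership or node status change (the script path below is an assumption):

```ini
[mysqld]
# invoked by Galera on every node state change; path is an example
wsrep_notify_cmd = /usr/local/bin/galera_notify.sh
```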

If you originally set up the cluster with 5 nodes, you could use the autoscaling feature built into Azure to keep a minimum of 3 nodes running all the time and scale up to 5 nodes with demand. Then, if the cluster comes under higher load, there is easy scope for autoscaling from the outset.

You can do that but I would recommend against this. The scaling of a database cluster is a complex operation and adding an additional node is essentially quite a burden to the remaining nodes since the database needs to be replicated to the new node. So while “scaling up” you actually reduce the database performance of your application until the new node is online. In addition, there is no simple performance value that the autoscaler could base its decision on. For instance, setting a CPU utilization threshold will probably not be enough because the typical database node is usually I/O bound before the CPU even feels a slight load. Essentially, the scaling of a database should be a conscious decision that I wouldn’t leave to the autoscaler.

However, you can prepare a couple of additional database nodes beforehand (like you would do with the autoscaler) and then start them manually or on a fixed schedule if you know that the load will be high. For instance, you could run three nodes overnight but many more nodes during the day for a business website. You could run many nodes before Christmas but few nodes for the rest of the year for an e-commerce website. And you could start a few extra nodes if you are a German soccer club and have an important game coming up. You can use the built-in scheduling function of the autoscaler to orchestrate this.

One caveat though: although horizontal scaling is great for read performance, write performance does not benefit as much, since in a cluster like Galera every node receives all the writes via replication and has to persist everything to its own storage.