Docker II: Replicated MySQL with HAProxy Load Balancing

In the previous part we used the official Cassandra images from Docker Hub to start containers and have them form a cluster. In this post we will see how to create our own Docker images to facilitate the deployment of a Master-Slave replicated MySQL cluster. We will also use a HAProxy container to load-balance our MySQL instances.

The Custom MySQL Image

MySQL has an official Docker image on Docker Hub. It is a simple image that starts a MySQL instance with the database name, user and password given in the container's environment variables. But unlike Cassandra, MySQL does not have an official or default clustering mechanism, so the image does not provide one out of the box.

Master-Slave replication is one way of scaling out instances, and it is usually configured manually. I have written a bash script, which I explained in a previous post, to automate this process. In this section we will extend the official MySQL image, add a few replication configurations on top of it, and enable the use of my automation script inside the containers.

Understanding the Official Image

Since we will be extending this image, let's first study how it works. To understand an image we must first look at its Dockerfile, because it is the file containing the instructions on how to build the image; in a sense it is the definition of the image. It is usually referenced on the Docker Hub page. The Dockerfile and resources for the latest Docker image can be found in the official docker GitHub repo.

The first line of the Dockerfile contains `FROM debian:jessie`, which means that the image is an extension of the `debian:jessie` image: they take an image containing a Debian distribution and install MySQL on top of it. The following lines are mostly `ENV` and `RUN` commands, which set default environment variables and run Linux and Debian commands to install MySQL packages, change permissions, move files, etc. These are all the commands that actually install MySQL in the image.

At the bottom of the Dockerfile we can see a few important things:

`COPY docker-entrypoint.sh /usr/local/bin/`: copies the script which sits next to the Dockerfile into the image's filesystem.

`ENTRYPOINT ["docker-entrypoint.sh"]`: when a container is started from this image, this script runs as its main process.

`CMD ["mysqld"]`: passes "mysqld" as the command argument to the entrypoint script.

So when you run a container using this image, the container already contains Debian with all the MySQL binaries, and it runs the `docker-entrypoint.sh` script. In the same repo, next to the Dockerfile, open the script and take a look at what it contains. Most commands in this file read the environment variables passed during container startup and prepare the database accordingly (database and user creation, privileges, etc.). The last line of the script is `exec "$@"`, which runs `mysqld`, automatically passed as the argument to the script because it was the `CMD` value in the Dockerfile.
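The structure of such an entrypoint can be boiled down to a small sketch (illustrative only; the real `docker-entrypoint.sh` does much more, and `echo` stands in for `mysqld` here):

```shell
# Write a minimal sketch of the entrypoint pattern to a file.
cat > /tmp/entrypoint.sh <<'EOF'
#!/bin/bash
set -e

# Initialization would happen here when the command is "mysqld"
if [ "$1" = 'mysqld' ]; then
    echo "initializing database from MYSQL_* environment variables..."
fi

# Replace the shell with the CMD, making it PID 1 of the container
exec "$@"
EOF
chmod +x /tmp/entrypoint.sh

# The entrypoint hands control over to whatever command it is given:
/tmp/entrypoint.sh echo "pretending to be mysqld"
```

The `exec "$@"` line is the key: the script process is replaced by the command, so the database server, not the shell, receives signals sent to the container.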

Now let's go through the Dockerfile of our custom image. The first instruction, `FROM`, shows that we extend the official MySQL image, version 5.7.

The second instruction, `RUN`, runs shell commands; in this case it installs a few APT packages on top of the base image. The `openssh-client` package lets us use the SCP protocol when we transfer the MySQL dump from the master to the slave(s). The `vim` and `net-tools` packages are not necessary, but can be useful for debugging inside the container.

After that, the `COPY` instruction copies files into the image's filesystem. Here I decided to copy all sources to the `/app/` directory.

The `WORKDIR` instruction defines the default directory from which relative paths are resolved in other Dockerfile instructions. It is also the default directory when running commands in the container: for example, when running `docker exec -ti <container> bash` you will end up in the WORKDIR.

The `ENV` instructions define default environment variables which will be used in the container. These can be overridden with the `-e` option when starting a container.

Finally, we define the `ENTRYPOINT`, which overrides the entrypoint script from the base image.
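Putting these instructions together, the custom Dockerfile might look roughly like this (a sketch: the base image, package list and entrypoint name come from the text above, but the exact file layout and environment variable names are assumptions):

```dockerfile
# Sketch of the custom image's Dockerfile (layout assumed from the text)
FROM mysql:5.7

# Extra packages: SCP transfers (openssh-client) plus debugging tools
RUN apt-get update && \
    apt-get install -y openssh-client vim net-tools && \
    rm -rf /var/lib/apt/lists/*

# Copy all sources (entrypoint, templates, replication script) to /app/
COPY . /app/
WORKDIR /app/

# Hypothetical default replication settings, overridable with -e
ENV REPLICATION_USER=repl \
    REPLICATION_PASSWORD=repl_password

ENTRYPOINT ["/app/new-entrypoint.sh"]
CMD ["mysqld"]
```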

The Entrypoint Script

The new entrypoint for this image, `new-entrypoint.sh`, starts by creating the configuration files needed for MySQL replication. It substitutes the environment variables in `custom.cnf.template` to produce `/etc/mysql/conf.d/custom.cnf`. It does the same thing with `credentials.cnf.template` to store the user/password used for remote commands later.
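This substitution step can be sketched in plain shell (the template content and the `SERVER_ID` variable are illustrative; the real template holds the actual replication settings):

```shell
# Illustrative template substitution; paths and variable names assumed.
mkdir -p /tmp/conf.d
cat > /tmp/custom.cnf.template <<'EOF'
[mysqld]
server-id = ${SERVER_ID}
log-bin = mysql-bin
EOF

# Replace the ${SERVER_ID} placeholder with the environment value.
SERVER_ID=2
sed "s/\${SERVER_ID}/${SERVER_ID}/" /tmp/custom.cnf.template > /tmp/conf.d/custom.cnf
cat /tmp/conf.d/custom.cnf
```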

Then, if the container being created is destined to be a slave, and the master's IP address was defined in the `AUTO_INIT_MASTER_IP` environment variable to activate replication automatically, the replication script is started in the background. The script waits until MySQL is started before activating replication.
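The "wait until MySQL is started" part typically boils down to a retry loop like this one (a sketch; in the real container the probed command would be something like `mysqladmin ping`, and `true` is used here only so the example runs anywhere):

```shell
# Retry a command until it succeeds, up to a maximum number of attempts.
wait_for() {
    local tries=0
    until "$@"; do
        tries=$((tries + 1))
        if [ "$tries" -ge 30 ]; then
            return 1
        fi
        sleep 1
    done
}

# In the container this would be: wait_for mysqladmin ping --silent
wait_for true && echo "mysql is up"
```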

Finally, to start MySQL, we simply call the old entrypoint script!

The Replication Script

This script is very similar to the one I explained in a previous post about MySQL replication automation, with a few small changes:

The MASTER and SLAVES variables are now script parameters: the first argument is the IP of the master, and subsequent arguments are slave IPs.

The remote mysql shell commands are authenticated by the `credentials.cnf` file instead of the `-u` and `-p` parameters.

In the case of slave auto-initialization, it will wait for MySQL to start.

In the case of slave auto-initialization, sending the dump via SCP is skipped, because it is already on the slave.
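The parameter change can be sketched like this (the MASTER and SLAVES variable names come from the text; the parsing itself and the IP values are illustrative):

```shell
# Illustrative argument handling: first argument is the master IP,
# the rest are slave IPs.
parse_args() {
    MASTER="$1"
    shift
    SLAVES=("$@")
}

parse_args 192.168.0.201 192.168.0.202 192.168.0.203 192.168.0.204
echo "master: $MASTER"
echo "slaves: ${SLAVES[*]}"
```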

Building the Image

Considering that you are in the directory where the Dockerfile is located:

```
dock@ubuntu0$ docker build -t rep_mysql .
dock@ubuntu0$ docker tag rep_mysql nicomak/rep_mysql:5.7
dock@ubuntu0$ docker push nicomak/rep_mysql:5.7
```

The first command builds the image and calls it rep_mysql. After this command you can already `docker run` the image locally, because it exists in your local repo.

The second command tags the image with a Docker Hub account prefix and a tag suffix, which here is the version, 5.7. The third command pushes the image to Docker Hub, so that it can later be downloaded by any host connected to the internet. Note that for these 2 steps you need to create a Docker Hub account.

Running the MySQL containers with Replication

The image I have created is essentially a MySQL image but with extra configurations, and a script which can automatically initialize replication. This script can either be run anytime after the containers have started, or during a slave startup by passing it the IP address of its master in `AUTO_INIT_MASTER_IP`.

To illustrate its usage, let's first create containers on 4 of my nodes: ubuntu1, ubuntu2, ubuntu3 and ubuntu4:
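The startup commands would look something like this (hypothetical: the container names, network mode and environment values are assumptions; `AUTO_INIT_MASTER_IP`, the djangodb database and the image name come from this post):

```
# On ubuntu1 (master) -- the -e values are illustrative:
dock@ubuntu1$ docker run -d --name mysql --net host \
    -e MYSQL_ROOT_PASSWORD=rootpass \
    -e MYSQL_DATABASE=djangodb \
    nicomak/rep_mysql:5.7

# On ubuntu2 (a slave), pointing at the master for auto-initialization:
dock@ubuntu2$ docker run -d --name mysql --net host \
    -e MYSQL_ROOT_PASSWORD=rootpass \
    -e MYSQL_DATABASE=djangodb \
    -e AUTO_INIT_MASTER_IP=192.168.0.201 \
    nicomak/rep_mysql:5.7
```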

We can now create a table and insert rows in the master mysql instance on ubuntu1, and the changes will be automatically replicated to the slave nodes ubuntu[2-4].

Deploying a HAProxy container for Load-Balancing

We now have 1 Master which replicates any changes to the djangodb database to the 3 Slaves.

To make good use of their replicated data and achieve collective performance and availability, we can use a Load Balancer. Luckily, HAProxy has an official image that we can use straight out of the box. We won't need to create our own image, but we will need to configure a few things first.

Create a User for Status Checks

The first thing we need to do is create a password-less MySQL user for HAProxy to perform its status checks.

If we create the user on the master ubuntu1, while using our replicated database djangodb, the user creation will be replicated to all slave nodes, which is convenient:
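The statement could look along these lines (hypothetical: only the `haproxy_check` user name and the djangodb database are given by this post; the exact statement and client options are assumptions):

```
# Hypothetical: create a password-less user for HAProxy status checks,
# executed against the master so it replicates to the slaves.
dock@ubuntu0$ mysql -h ubuntu1 -u root -p -e \
    "USE djangodb; CREATE USER 'haproxy_check'@'%'; FLUSH PRIVILEGES;"
```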

Create the HAProxy configuration

The next step is to write a `haproxy_mysql.cfg` configuration file which will define how HAProxy must behave. Here is an example I wrote:


```
global
    log 127.0.0.1 local0 notice
    user root
    group root

defaults
    log global
    retries 2
    timeout connect 3000
    timeout server 5000
    timeout client 5000

listen write_nodes
    bind 0.0.0.0:3306
    mode tcp
    option mysql-check user haproxy_check
    server mysql1 192.168.0.201:3306 check inter 2000 fall 3 rise 99999999
    server mysql2 192.168.0.202:3306 check backup

listen read_nodes
    bind 0.0.0.0:3307
    mode tcp
    option mysql-check user haproxy_check
    balance roundrobin
    server mysql2 192.168.0.202:3306 check
    server mysql3 192.168.0.203:3306 check
    server mysql4 192.168.0.204:3306 check

listen stats_page
    bind 0.0.0.0:8080
    mode http
    stats enable
    stats uri /stats
    stats realm Strictly\ Private
    stats auth admin:password
```

❗ Host IP addresses must be used in the configuration file instead of host names (ubuntu1, ubuntu2, etc.).

This is because the file will be used inside the container, which is not aware of those host names. As explained in the previous part, when using the default bridge network, a container does not inherit its host's `/etc/hosts` file. You can use host names if you redefine the `/etc/hosts` file in your containers, or if you use the host network stack on your container.
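One standard way to inject such entries is Docker's `--add-host` option (the host/IP mappings below are the ones from this cluster; the image name is a placeholder):

```
# Each --add-host adds a line to the container's /etc/hosts.
docker run -d \
    --add-host ubuntu1:192.168.0.201 \
    --add-host ubuntu2:192.168.0.202 \
    --add-host ubuntu3:192.168.0.203 \
    --add-host ubuntu4:192.168.0.204 \
    some_image
```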

There are 3 listen groups:

The write_nodes interface on port 3306

We will use this only for write instructions (Create, Insert, Update, etc.).

It uses ubuntu1 as the only active server: we want all requests to go to our Master so that it can replicate changes to its slaves.
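The HAProxy container can then be started from the official image with something like this (hypothetical: the container name, host path and image tag are assumptions; the mount target is the official image's default configuration path):

```
dock@ubuntu0$ docker run -d --name haproxy \
    -v /home/dock/haproxy_mysql.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
    -p 3306:3306 -p 3307:3307 -p 8080:8080 \
    haproxy
```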

The `-v` option mounts our `haproxy_mysql.cfg` into the container to be used as its main configuration file. The `-p` options expose the ports we need, which are the ports of the interfaces defined in our configuration file.

We can now test the 3306 and 3307 ports by checking on which instance the requests end up:
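A simple way to see which instance answered is to query MySQL's `server_id` through each HAProxy port (the `SELECT @@server_id` query is standard MySQL; the host and credentials are those of my setup):

```
# Port 3306 should always answer with the master's server_id;
# repeated queries on port 3307 should cycle through the slaves.
dock@ubuntu0$ mysql -h ubuntu0 -P 3306 -u root -p -e "SELECT @@server_id;"
dock@ubuntu0$ mysql -h ubuntu0 -P 3307 -u root -p -e "SELECT @@server_id;"
```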

Failover

So what we deployed in this part is a small MySQL Master-Slave replication cluster. The HAProxy is the gateway to the cluster resources:

When called on port 3306, the requests are forwarded to the Master only.

When called on port 3307, the requests are forwarded to the Slaves only.

This way we have an instance dedicated to write operations, and load-balanced slave instances to scale out read operations.

We also have a backup instance to take over write operations if the master dies. If that happens, however, manual intervention (or an extra automated process) is required. The backup will have to be quickly declared as the new master of all slaves, otherwise the slaves will stay master-less and keep serving outdated data. Ways to do this are:

Then manually configure the master/slaves, or call my `replication-start.sh` script.

Create a cron job or daemon script to monitor the status of servers.

It can call the stats URI at `http://ubuntu0:8080/stats;csv` to get the results in CSV format, which is easier to parse.

Use the `replication-start.sh` script to reconfigure replication with the new master.
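Parsing the CSV stats to spot a dead server takes only a few lines of shell (the CSV sample below is fabricated for illustration; in practice the data would be fetched from the stats URI, and the real output has many more columns):

```shell
# Fabricated sample of HAProxy CSV stats (normally fetched with curl
# from the stats page; column layout simplified for the example).
cat > /tmp/stats.csv <<'EOF'
# pxname,svname,status
write_nodes,mysql1,DOWN
write_nodes,mysql2,UP
read_nodes,mysql3,UP
EOF

# Print every server currently reported as DOWN.
awk -F, '$3 == "DOWN" { print $1 "/" $2 }' /tmp/stats.csv
```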

In the next part we will create our Django web application. Based on what we did here, the web apps will have to communicate with the HAProxy server, and know whether to use port 3306 or 3307 based on the type of MySQL request.