Sunday, May 28, 2017

A few weeks ago I started experimenting with MySQL InnoDB cluster. As part of the testing, I tried to kill a node to see what happens to the cluster.

The good news is that the cluster is resilient. When the primary node goes missing, the cluster replaces it immediately, and operations continue. This is one of the features of an High Availability system, but this feature alone does not define the usefulness or the robustness of the system. In one of my previous jobs, I worked at testing a commercial HA system and I've learned a few things about what makes a reliable system.

Armed with this knowledge, I did some more experiments with InnoDB Cluster. The attempt from my previous article had no other expectation than seeing operations continue with ease (primary node replacement.) In this article, I examine a few more features of an HA system:

Making sure that a failed primary node does not try to force itself back into the cluster;

To explore the above features (or lack of) we are going to simulate some mundane occurrences. We start with the same cluster seen in the previous article, using Docker InnoDB Cluster. The initial state is

As before, the cluster detects that a node is missing, and excludes it from the cluster. Since it was the primary node, another one becomes primary.

However, this time the node does not come back in the cluster. Checking the cluster status again after several minutes, node 1 is still reported missing. This is not a bug. This is a feature of well behaved HA systems: a primary node that has been already replaced should not come back to the cluster automatically.

Also this experiment was good. Now, for the interesting part, let's see the Split-Brain situation.

At this moment, there are two parts of the cluster, and each one sees it in a different way. The view from the current primary node is the one reported above and what we would expect: node 1 is not available. But if we ask the cluster status to node 1, we get a different situation:

Node 1 thinks it's the primary, and two nodes are missing. Node 2 and three think that node 1 is missing.

In a sane system, the logical way to operate is to admit the failed node back into the cluster, after checking that it is safe to do so. In the InnoDB cluster management there is a rejoinInstance method that allows us to get an instance back:

$ docker exec -it mysqlgr2 mysqlsh --uri root@mysqlgr2:3306 -p$(cat secretpassword.txt)
mysql-js> cluster = dba.getCluster()
<Cluster:testcluster>
mysql-js> cluster.rejoinInstance('mysqlgr1:3306')
Rejoining the instance to the InnoDB cluster. Depending on the original
problem that made the instance unavailable, the rejoin operation might not be
successful and further manual steps will be needed to fix the underlying
problem.
Please monitor the output of the rejoin operation and take necessary action if
the instance cannot rejoin.
Please provide the password for 'root@mysqlgr1:3306':
Rejoining instance to the cluster ...
The instance 'root@mysqlgr1:3306' was successfully rejoined on the cluster.
The instance 'mysqlgr1:3306' was successfully added to the MySQL Cluster.

Sounds good, eh? Apparently, we have node 1 back in the fold. Let's check:

Nope. Node 1 is still missing. And if we try to rescan the cluster, we see that the rejoin call was not effective:

mysql-js> cluster.rescan()
Rescanning the cluster...
Result of the rescanning operation:
{
"defaultReplicaSet": {
"name": "default",
"newlyDiscoveredInstances": [],
"unavailableInstances": [
{
"host": "mysqlgr1:3306",
"label": "mysqlgr1:3306",
"member_id": "6bd04911-4374-11e7-b780-0242ac170002"
}
]
}
}
The instance 'mysqlgr1:3306' is no longer part of the HA setup. It is either offline or left the HA group.
You can try to add it to the cluster again with the cluster.rejoinInstance('mysqlgr1:3306') command or you can remove it from the cluster configuration.
Would you like to remove it from the cluster metadata? [Y|n]: n

It's curious (and frustrating) that we get a recommendation to run the very same function that we've attempted a minute ago.

But, just as a devilish thought, let's try the same experiment from the invalid cluster.

Now this was definitely not supposed to happen. The former failed node has invited a healthy node into its minority cluster and the operation succeeded!

The horrible part? This illegal operation succeeded into reconciling the views from node 1 and node2. Now also node 2 thinks that node1 is again the primary node, and node 3 (which was minding its own business and never had any accidents) is considered missing:

In short, while we were attempting to fix a split brain, we ended up with a different split brain, and an unexpected node promotion. This is clearly a bug, and I hope the MySQL team can make the system more robust.

Monday, May 08, 2017

InnoDB Cluster was released as GA a few weeks ago. I remember the initial announcement of the product at OOW 2016, promising a seamless solution for replication and high availability with great ease of use. I was a bit disappointed to see that, at GA release time, the InnoDB Cluster is a patchwork of three separate products (Group Replication, MySQL Router, MySQL Shell) which the users have to collect and install separately.

Given this situation, I was very pleased when Matthew Lord published Docker-InnoDB-Cluster, an image for Docker that contains everything you need to get the system up and running. The associated scripts make the experience even easier: not only we don't have to hunt for components, but the cluster deployment procedure is completely automated.

Installation

The process is painless. After cloning the repository the start script takes care of everything. It will create a network, deploy three database nodes, and fire up the router.

Most of the configuration (which has been simplified thanks to the usage of MySQL shell to add nodes) is handled inside the container initialization script. Just a few details are needed in the cluster deployment script to get the result.

The deployment script will also invoke the mysql shell in one of the nodes to show the status of the cluster:

The above status is the result of dba.getCluster().status(), which is a convenient way of collecting a bunch of information about the cluster and then present them in a compact JSON structure. If you enable the general log prior to running this command, you would see something like this:

We see that the primary has now ID 200, and the R/O node is 300 (the only other node that has survived.)

Summarizing

The good

I can see that some of the ease of use promised in San Francisco is already available. We can create a cluster with little effort.

The recovery from the master failure is transparent.

The cluster status gives clear information about the system.

The bad

MySQL shell is difficult to use. The command line help is insufficient: some options require trial and error to work correctly. It also does not use an options file like other MySQL clients.

Adding a node after the primary has become unavailable is harder than it should be, and the manual does not contemplate this case. It only mentions a server that can be restarted.

Restarting the router after the primary died is impossible with the current configuration.

The metadata for replication is now in three different schemas: mysql, performance_schema, and mysql_innodb_cluster_metadata. I understand the reasons, but I believe that a simplification would be possible.

The bottom line: quite good to start a cluster, but not enough to deal effectively with simple HA cases. Possibly released too early.