In my previous blog post, I showed how to use bash scripts and move virtual IPs with Orchestrator. As in that post, I assume you already have Orchestrator working. If not, you can find the installation steps here.

In the case of a failover, Orchestrator changes the MySQL topology and promotes a new master. But who lets the application know about this change? This is where ProxySQL helps us.

ProxySQL

You can find the ProxySQL install steps here. In our test, we use the following topology:

For this topology, we need the following rules in ProxySQL:


INSERT INTO mysql_servers (hostname, hostgroup_id, port, weight, max_replication_lag) VALUES ('192.168.56.107', 601, 3306, 1000, 10);
INSERT INTO mysql_servers (hostname, hostgroup_id, port, weight, max_replication_lag) VALUES ('192.168.56.106', 601, 3306, 1000, 10);
INSERT INTO mysql_servers (hostname, hostgroup_id, port, weight, max_replication_lag) VALUES ('192.168.56.105', 601, 3306, 1000, 0);
INSERT INTO mysql_servers (hostname, hostgroup_id, port, weight, max_replication_lag) VALUES ('192.168.56.105', 600, 3306, 1000, 0);
INSERT INTO mysql_replication_hostgroups VALUES (600, 601, '');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
INSERT INTO mysql_query_rules (username, destination_hostgroup, active) VALUES ('testuser_w', 600, 1);
INSERT INTO mysql_query_rules (username, destination_hostgroup, active) VALUES ('testuser_r', 601, 1);
INSERT INTO mysql_query_rules (username, destination_hostgroup, active, retries, match_digest) VALUES ('testuser_rw', 601, 1, 3, '^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
INSERT INTO mysql_users (username, password, active, default_hostgroup, default_schema, transaction_persistent) VALUES ('testuser_w', 'Testpass1.', 1, 600, 'test', 1);
INSERT INTO mysql_users (username, password, active, default_hostgroup, default_schema, transaction_persistent) VALUES ('testuser_r', 'Testpass1.', 1, 601, 'test', 1);
INSERT INTO mysql_users (username, password, active, default_hostgroup, default_schema, transaction_persistent) VALUES ('testuser_rw', 'Testpass1.', 1, 600, 'test', 1);
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

This shows us “192.168.56.105” is in hostgroup 600, which means that server is the master.

How does ProxySQL decide who the new master is?

ProxySQL does not know what the topology looks like, which is really important. ProxySQL monitors the read_only variable on the MySQL servers, and the server where read_only=OFF is going to get the writes. If the old master goes down and we change the topology, we have to change the read_only variable on the new master. Of course, applications like MHA or Orchestrator can do that for us.
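The routing decision above can be illustrated with a toy sketch (this is not ProxySQL's code, just the rule it applies; hostgroups 600 and 601 match the rules used in this post):

```shell
# Toy illustration of how ProxySQL's monitor classifies a backend:
# read_only=OFF -> writer hostgroup (600), otherwise reader hostgroup (601).
hostgroup_for() {
    read_only="$1"
    if [ "$read_only" = "OFF" ] || [ "$read_only" = "0" ]; then
        echo 600    # read_only=OFF: this server receives the writes
    else
        echo 601    # read_only=ON: reads only
    fi
}

hostgroup_for OFF   # prints 600
hostgroup_for ON    # prints 601
```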

We have two possibilities here: the master went down, or we want to promote a new master.

Master is down

If the master goes down, Orchestrator is going to change the topology and set read_only=OFF on the promoted master. ProxySQL is going to realize the master went down and send the write traffic to the server where read_only=OFF.

Let’s do a test. After we stopped MySQL on “192.168.56.105”, Orchestrator promoted “192.168.56.106” as the new master. ProxySQL is using it now as a master:
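We can verify which server ProxySQL currently treats as the master by querying its admin interface. A minimal sketch, assuming the default admin port 6032 and admin/admin credentials (adjust both to your setup):

```shell
# The server listed under hostgroup 600 is the one ProxySQL writes to.
CHECK_SQL="SELECT hostgroup_id, hostname, status FROM runtime_mysql_servers ORDER BY hostgroup_id;"

# Run it only if a mysql client is available; 6032 and admin/admin are
# ProxySQL's defaults and are assumptions here.
if command -v mysql >/dev/null 2>&1; then
    mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "$CHECK_SQL" \
        || echo "could not reach the ProxySQL admin interface on 127.0.0.1:6032"
fi
```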

This happens quickly and does not require any application, VIP or DNS modification.

Promoting a new Master

When we perform a graceful-master-takeover with Orchestrator, it promotes a slave as the new master, removes the old master from the replicaset and sets read_only=ON on it.
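A planned takeover can be triggered from the command line. A sketch, where the cluster alias "testcluster" is a placeholder for your own cluster alias:

```shell
# Ask Orchestrator for a planned (graceful) master promotion.
TAKEOVER="orchestrator-client -c graceful-master-takeover -alias testcluster"

if command -v orchestrator-client >/dev/null 2>&1; then
    $TAKEOVER
else
    echo "orchestrator-client not found; would run: $TAKEOVER"
fi
```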

From Orchestrator’s point of view, this is great. It promoted a slave as the new master, and the old master is not part of the replicaset anymore. But as I mentioned earlier, ProxySQL does not know what the replicaset looks like.

It only knows we changed the read_only variables on some servers. It is going to send reads to the old master, but it does not have up-to-date data anymore. This is not good at all.

We have two options to avoid this.

Remove master from read hostgroup

If the master is not part of the read hostgroup, ProxySQL won’t send any traffic there after we promote a new master. But in this case, if we lose the slaves, ProxySQL cannot redirect the reads to the master. If we have a lot of slaves, and replication stopped on the slaves because of an error or mistake, the master probably won’t be able to handle all the read traffic. But if we only have a few slaves, it would be good if the master could also handle reads when there is an issue on the slaves.
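With this option, the only change compared to the rules at the top of the post is that the master never gets a reader-hostgroup entry. A sketch of removing it from hostgroup 601 on our test topology (admin port and credentials are the ProxySQL defaults, an assumption):

```shell
# Drop the master's entry from the reader hostgroup (601); its writer
# entry in hostgroup 600 stays untouched.
REMOVE_SQL="
DELETE FROM mysql_servers WHERE hostname='192.168.56.105' AND hostgroup_id=601;
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
"

if command -v mysql >/dev/null 2>&1; then
    mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "$REMOVE_SQL" \
        || echo "could not reach the ProxySQL admin interface"
fi
```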

Using Scheduler

In this great blog post from Marco Tusa, we can see that ProxySQL can use “Schedulers”. We can use the same idea here as well. I wrote a script based on Marco’s that can recognize if the old master is no longer a part of the replicaset.
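Registering such a script looks roughly like this; the interval and the script path are assumptions, point the filename at wherever you saved the script:

```shell
# Register the check script in ProxySQL's scheduler table so it runs
# every 5 seconds (interval_ms=5000).
SCHED_SQL="
INSERT INTO scheduler (id, active, interval_ms, filename)
VALUES (1, 1, 5000, '/var/lib/proxysql/check_master.sh');
LOAD SCHEDULER TO RUNTIME; SAVE SCHEDULER TO DISK;
"

if command -v mysql >/dev/null 2>&1; then
    mysql -u admin -padmin -h 127.0.0.1 -P 6032 -e "$SCHED_SQL" \
        || echo "could not reach the ProxySQL admin interface"
fi
```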

The script checks the following:

read_only=ON – the server is read-only (on the slave servers, this has to be ON)

repl_lag is NULL – on the master, this should be NULL (if seconds_behind_master is not defined, ProxySQL reports repl_lag as NULL)

If read_only=ON, it means the server is not the master at the moment. But if repl_lag is NULL, it means the server is not replicating from anywhere, and it probably was a master. It has to be removed from the hostgroup.

The script changes the “hostgroup_id” of the old master to 9601. This is what we wanted, so the old master won’t get any more traffic.
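The decision the script makes can be boiled down to a tiny predicate (a toy sketch of the logic, not the actual script; treat hostgroup 9601 as the convention this post's script uses for parked servers):

```shell
# A server reporting read_only=ON but repl_lag NULL is not replicating
# from anywhere, so it is most likely an old master: demote it to a
# hostgroup (9601) that no query rule routes traffic to.
is_orphaned_master() {
    read_only="$1"   # ON or OFF
    repl_lag="$2"    # seconds behind master, or NULL
    [ "$read_only" = "ON" ] && [ "$repl_lag" = "NULL" ]
}

if is_orphaned_master ON NULL; then echo "demote to 9601"; fi   # old master
if is_orphaned_master ON 0;    then echo "demote to 9601"; fi   # healthy slave: no output
if is_orphaned_master OFF NULL; then echo "demote to 9601"; fi  # current master: no output
```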

Conclusion

Because ProxySQL redirects the traffic based on the read_only variables, it is important to start the servers with read_only=ON (even on the master). In that case, we can avoid getting writes on many servers at the same time.
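A minimal my.cnf fragment for this:

```ini
# /etc/my.cnf -- every server (including the master) starts read-only;
# the failover tool flips read_only=OFF only on the elected master.
[mysqld]
read_only = ON
```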

If we want to use graceful-master-takeover with Orchestrator, we have to use a scheduler that can remove the old master from the read hostgroup.


Author

Tibi joined Percona in 2015 as a Consultant. Before joining Percona, among many other things, he worked at the world’s largest car hire booking service as a Senior Database Engineer.
He enjoys trying and working with the latest technologies and applications which can help or work with MySQL together.
In his spare time he likes to spend time with his friends, travel around the world and play ultimate frisbee.


Comments (3)

The approach of choosing the master node based on reading the read_only value will work in most cases: orchestrator, or another tool, would remove that property from the demoted master, set it on the new master and we’re happy.
Or, you’d set read_only=1 in /etc/my.cnf for all your servers, such that a server that panics and restarts always starts up as read_only.

However the approach does not work in the event of network partitioning of the master: to the world it would appear to be truly dead. But no one is able to set read_only=0 on that master. If it suddenly recovers from the network partitioning, during or after new master promotion, you end up with two different servers, both claiming to be read_only=0.

To mitigate this you’d need to be able to shoot the failing node (e.g. if it’s AWS you can halt/restart it) through the orchestrator failover scripts.

Otherwise it would be best to find a more holistic approach to deciding which is the true master. A service discovery (consul/zk) would be a good candidate for that. Orchestrator would be able to tell consul: oh hey, I just demoted _that_ master and promoted _that_ one; ProxySQL would periodically consult with consul as to the identity of the master and route write queries based on that info.

Thanks for your great comment. Yes, I have seen that discussion after the post was published and I am already testing/working on an Orchestrator+Consul+ProxySQL setup.

I am also thinking about that: if it is just a traditional replicaset with normal master-slave replication (not Galera), ProxySQL should perhaps disable writes if there are two servers with read_only=OFF in the same hostgroup. It might be better not to have writes for a short period than to write to two nodes and possibly corrupt your data (of course this depends on the application and the use cases). I am going to ask Rene what his opinion is about this.

I think even a scheduler could do this, but in that case writes could go to two nodes until the scheduler runs and changes the hostgroups, etc.