Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I know this might not be a very good question to be asked in this site, but I really need some helps/hints to be directed to the right path.

I want to have failover strategy for 2 MySQL servers which are in Master-Master Replication mode. I have done the replication, and it's working just fine. Yet, when in comes to the failover strategy, I'm kind of clueless...

I've been reading on the internet over the past couple of days (I know this is a very complex procedure, and I could not master it in a week or so), but the more I read the more confuse I get. Basically, as I understood, I need to have a software to monitor the network and report any failure and another one to switch between two systems. Heartbeat apparently is the best in doing so, but in all tutorials I read, they are using many other software ( like MON for monitoring, or other software for dumping or getting backups) as well. Is it necessary to have them? Can one have fail-over strategy only using MySQL and HA tool like Heartbeat?

Each website is introducing different tool and ways, which they have their own pros and cons, and of course it depends on how I want my "cluster" to work, and how much high availability is important to my system. I am just considering pointing it out to my customers, and it's not going to the production phase any soon, since my customers running small businesses and don't care about loosing data that much, however, I wanted to dig into the concept by setting up a simple master-master replication where in case of failure of any servers, clients automatically commit changes to the standby server.

1 Answer
1

Given the fact that you mentioned you have Master-Master replication mode, I would not recommend any automatic failover unless you properly account for Replication Lag. After all, MySQL Replication is asynchronous.

It is theoretically possible to have the following:

DBServer1 as Master to DBServer2

DBServer2 as Master to DBServer1

DBVIP pointing at DBServer1

DBServer2 is 180 seconds behind

DBServer1 goes down

Automatic Failover moves DBVIP to DBServer2

With this scenario, DBServer2 could have auto increment keys that do not exist yet. Upon failover, the DBVIP will allow WebServers to connect to DBServer2 and ask for data that does not exist yet.

This would therefore require background processes running on each DBServer.

For the above scenario:

DBVIP is on DBServer1

DBServer1 runs HeartBeat

DBServer2 runs HeartBeat

Background Process on DBServer1 to monitor

a) Data Mount Availability

b) Data Mount Writeability

c) MySQL Connectivity

Once a,b, or c fail, kill HeartBeat

Background Process on DBServer2 to make sure DBVIP is pingable

What should killing HeartBeat do? Trigger the startup script defined for it.

What should the startup script on DBServer2 look for?

Loop until DBVIP is unreachable via ping

Connect to MySQL and

Run SHOW SLAVE STATUS\G in a Loop until Seconds_Behind_Master is NULL

RUn SHOW SLAVE STATUS\G in a Loop until `Exec_Master_Log_Pos stops changing

Assign DBVIP to DBServer2 via ip addr add

This is essentailly the algorithm for failing over safely to a Passive Master in a Master/Master Replication Cluster.

ALTERNATIVE

If ALL your data is InnoDB, I recommend something with less rigor. Perhaps you should look into using DRBD and HeartBeat. Here is why:

This is a lot more straightforward because only one side is Active. The Passive side (DRBD Secondary) is a Synchronous Disk Copy of the Active Side (DRBD Primary).

CAVEAT

If all or most of the working set data is MyISAM, do not touch DRBD. Crash scnearios quickly result in MyISAM tables being marked crashed and need auto-repair (which can be paintfully slow to wait for).

Thank you for your complete answer, I'm using InnoDB, so I guess I would go for DRBD... they have lots of documentation on their website as well. But about the Replication lag in Master-Master mode which you've mentioned... would it be very obvious if you have both your nodes in the same geographic place, and have them connected through cable?
–
mrzDec 29 '12 at 6:15

Having DRBD nodes connected via cable on a dedicated NIC is a must. Doing DRBD via a switch introduces network latency even on DRBD's best day. While it is possible to do geographic DRBD, that's just begging for trouble. If you want a Cluster across data centers, I would suggest a DRBD pair in each data center and semisynchrounous replication between them. I will update my answer with additional posts I have on DRBD.
–
RolandoMySQLDBADec 29 '12 at 12:48

I wish I could have up voted your answer twice :) thank you again once again.
–
mrzDec 29 '12 at 13:18

I looked into DRBD today, and correct me if I'm wrong, bit it seems it is a very complex approach, and recovering data after failover with DRBD is scary... I was wondering aren't any simpler way of doing this? Like configuring pacemaker and corosync to take care of fail-over, and have mysql to take of replication? (I'm just trying to show a demo to some customers and see what is their reaction, in case I get positive feed back I can assign more resource on it, and do the best approach. I'm just looking for some easy-to-setup approach, rather busy with other stuff than lazy to set DRDB up)
–
mrzDec 29 '12 at 14:16

I have used DRBD for the past 5 years. For an all-InnoDB database, DRBD is exactly what is needed. DRBD will handle synchronized writes to disk. InnoDB takes care of all writes that were written to the double write buffer and transaction logs. As a DBA, my main concern is the data integrity. Failover is supposed to accommodate data integrity. This is why whether you pick Linux HeartBeat, HAProxy, PaceMaker, or ucarp, the failover paradigm must handle the mounting and unmounting of DRBD. You could still setup failover paradigm on Master/Master Replication and it would be simpler.
–
RolandoMySQLDBADec 29 '12 at 16:18