I have been researching high availability (HA) solutions for MySQL between data centers.

For servers located in the same physical environment, I have preferred dual master with Heartbeat (floating VIP) using an active/passive approach. The heartbeat runs over both a serial connection and an ethernet connection.

Ultimately, my goal is to maintain this same level of availability but between data centers. I want to dynamically failover between both data centers without manual intervention and still maintain data integrity.

There would be BGP on top. Web clusters in both locations, which would have the potential to route to the databases between both sides. If the Internet connection went down on site 1, clients would route through site 2, to the Web cluster, and then to the database in site 1 if the link between both sites is still up.

With this scenario, due to the lack of a physical link (serial), there is a greater chance of split brain. If the WAN went down between both sites, the VIP would end up at both sites, where a variety of unpleasant scenarios could introduce desync.

Another potential issue I see is difficulty scaling this infrastructure to a third data center in the future.

The network layer is not a focus. The architecture is flexible at this stage. Again, my focus is a solution for maintaining data integrity as well as automatic failover with the MySQL databases. I would likely design the rest around this.

Can you recommend a proven solution for MySQL HA between two physically diverse sites?

Thank you for taking the time to read this. I look forward to reading your recommendations.

Hi - have you determined an approach yet ? It would be interesting to hear what you have decided to do. We have the same problem.
– Martin Mar 8 '10 at 14:23

I appreciate all the responses and everyone's time. Unfortunately, none of these answers truly addresses the root of the question, which is how people have successfully solved this problem in a production environment. When I come to a conclusion here, I will be certain to share my final thoughts. Thus far, this appears to be a severe limitation of MySQL's ability to scale out.
– Warner Mar 16 '10 at 13:10

Maybe you're not getting the right solution because you're asking the wrong question? What data do you need to replicate, and why? When you start asking these questions, you'll be able to find out why you needed replication in the first place. Split brain is not just a MySQL problem; it's a cluster concept.
– The Unix Janitor Mar 19 '10 at 9:24

An answer I provided here includes some additional info: serverfault.com/questions/142683/… I will also provide follow-up when the final production implementation is in place.
– Warner Jul 26 '10 at 15:26

9 Answers

You will face the "CAP" theorem problem. You cannot have consistency, availability and partition-tolerance at the same time.

DRBD / MySQL HA relies on synchronous replication at the block device level. This is fine while both nodes are available, or if one suffers a temporary fault, is rebooted etc, then comes back. The problems start when you get a network partition.

Network partitions are extremely likely when you're running across two datacentres. Essentially, neither party can distinguish a partition from the other node failing. The secondary node doesn't know whether it should take over (the primary has failed) or not (the link is gone).

While your machines are in the same location, you can add a secondary channel of communication (typically a serial cable, or crossover ethernet) to get around this problem - so the secondary knows when the primary is GENUINELY down, and it's not a network partition.

The next problem is performance. While DRBD can give decent** performance when your machines have a low-latency connection (e.g. gigabit ethernet - but some people use dedicated high speed networks), the more latency the network has, the longer it takes to commit a transaction***. This is because it needs to wait for the secondary server (when it's online) to acknowledge all the writes before saying "OK" to the app to ensure durability of writes.

If you do this in different datacentres, you typically have several more milliseconds latency, even if they are close by.

** Still much slower than a decent local IO controller

*** You cannot use MyISAM for a high-availability DRBD system because it doesn't recover properly/automatically from an unclean shutdown, which is required during a failover.
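To put rough numbers on that latency cost, here is a back-of-the-envelope sketch. The RTT figures are illustrative assumptions, not measurements of any particular link; the point is only that a synchronous commit must wait at least one network round trip, so RTT alone caps the serialized commit rate.

```python
# Back-of-the-envelope: commit-rate ceiling imposed by synchronous
# replication (DRBD protocol C-style), for a single serialized writer.
# RTT and local-commit figures below are assumptions for illustration.

def max_serial_commits_per_sec(rtt_ms, local_commit_ms=0.5):
    """Upper bound on commits/sec for one serialized writer:
    each commit pays the local write plus one network round trip."""
    return 1000.0 / (local_commit_ms + rtt_ms)

for label, rtt in [("same rack", 0.1), ("metro DC pair", 2.0), ("regional WAN", 20.0)]:
    rate = max_serial_commits_per_sec(rtt)
    print(f"{label:>14}: RTT {rtt:>5.1f} ms -> at most {rate:,.0f} commits/s")
```

Real workloads commit concurrently, so aggregate throughput can be higher, but per-transaction latency still grows linearly with the RTT between sites.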

I appreciate your time and thoughts. You described some of the issues I'm attempting to avoid very well. Ideally, I'd like to keep the advantages of active/passive dual master for maintenance and quick failover while minimizing the risk of data corruption. I'd think someone out there has found an acceptable solution.
– Warner Mar 9 '10 at 14:37

Indeed. Data does not want to be two places at once.
– Matt Simmons Mar 9 '10 at 15:00

That was my first thought. Introducing layer 2 to such a degree would require a top down approach between both sites. Other server roles that have redundancy using LinuxHA would have to have similar implementations, such as the firewalls. Otherwise there would be routing issues. Ultimately, even with multiple WAN uplinks between both sites, my comfort level is substantially lower with that than it is with both serial and ethernet uplinks. That is more risk than I can tolerate. Moreover, it seems that there should be a more ideal solution.
– Warner Mar 2 '10 at 3:56

Your first stage should be to upgrade your current HA solution to one that uses OpenAIS as the cluster membership layer: this will give you a lot of flexibility, and given low-latency links between sites, might be able to reach across them. Pacemaker and RHEL Clustering support this.

For automatic data center failover, you really need a third site to act as a tie-breaker; otherwise, your sites will not be able to distinguish between inter-site routing problems and remote site failure. Microsoft has some surprisingly good webcasts covering the area.
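The tie-breaker logic the third site provides can be sketched as simple majority voting. The node names and topology below are hypothetical, not a real cluster API; production stacks such as Pacemaker implement this as quorum.

```python
# Minimal sketch of why a third site breaks the tie: a node may only
# take over the service if it can see a strict majority of all voters.
# With only two sites, a WAN cut splits the vote 1/1 and neither side
# can safely decide -- that is the split-brain scenario.

VOTERS = {"site1-db", "site2-db", "site3-witness"}  # hypothetical names

def may_take_over(reachable_voters):
    """A node (counting itself) may claim the service only when it
    can reach a strict majority of the configured voters."""
    return len(reachable_voters) > len(VOTERS) / 2

# WAN cut isolates site1; site2 still sees the witness at site3:
print(may_take_over({"site2-db", "site3-witness"}))  # True: 2 of 3
print(may_take_over({"site1-db"}))                   # False: isolated node stays down
```

The key property is that at most one partition can ever hold a strict majority, so the VIP can never be claimed on both sides at once.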

I've been going back and forth on that. First, I wrote it off entirely as too risky. Now, I'm reconsidering. Realistically, data corruption risk with even two fully diversified paths is pretty high. It's on my short list right now.
– Warner Mar 11 '10 at 2:41

+1 for a dose of reality. You can use a well-managed DNS service like UltraDNS and its site monitoring service "SiteBacker" to get you most of the way there.
– Martin Mar 2 '10 at 13:30

We already have BGP in place. This is outside of the scope of my question.
– Warner Mar 2 '10 at 15:03

No, the smallest routable block is /24. Actually, no.. The smallest physically routable block is /28, but you're likely to get ignored by everyone. The smallest prefix that will get listened to is /24.
– Tom O'Connor♦ Mar 9 '10 at 15:17

Giving a correct answer might be hard depending on the amount of data you have, the number of servers you want to fit this into, etc. That being said, my answer might not be one, or at least the one you are looking for.

There is no proven solution for multiple sites with MySQL, but there are solutions that work.
As some pointed out, yes, DRBD does work fine but has its limits or possible issues depending on your setup.

Will you ever need a third site (another datacenter)? If so, how much time and money will you have to do this?

Considering that each time you add a master, slave, DNS server, backups, etc., you add another server to manage, what is your management capacity in terms of number of servers? If you can define this number, you may have to throw away some possible solutions and work towards those that fit your numbers, so that management does not become a bottleneck.

Considering that datacenters don't go down often, and that multiple sites mean load balancing and some DNS hacking, is all of that going to live in the same datacenter? If so, and that datacenter goes down for whatever reason, you will run into issues, because a good part of your DNS and load balancing will be in it.

So you may have to plan for that split-brain situation. For almost every possible setup, the way to resolve a split-brain situation is different, and each solution takes X amount of time.
It may also be far easier to plan to use three datacenters from the start. I'm no MySQL expert, but I've heard that in production it is easier to have three masters than two if you ever run into issues.

One thing that may help you is the load balancing service offered by some networking vendors, like Zeus. There are probably many more offering this kind of service. I am sure it comes at a price, but it sometimes lets you cut down on some other things.

Data is relatively small, all things considered. A couple hundred gigabytes for the sake of discussion. Third site, probably. If necessary, I'm willing to compromise the architecture for a better solution now and revisit later for a third. "Management bottleneck" or other administrative concerns are outside of the scope of the question. Redundancy will be in place for all production technologies. The focus here is MySQL.
– Warner Mar 15 '10 at 14:42

DRBD is not a recommended solution for remote data centers, since it requires bandwidth that can affect the speed of your database and replication.
The recommended solution is master-master replication. The only issue with this is that your auto-increment fields need to be staggered.
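The staggering this answer refers to is usually done with MySQL's standard `auto_increment_increment` and `auto_increment_offset` server variables. A sketch of the per-server settings, assuming a two-master layout (the hostnames and layout are illustrative):

```ini
# my.cnf on master 1 -- generates IDs 1, 3, 5, ...
[mysqld]
auto_increment_increment = 2
auto_increment_offset    = 1

# my.cnf on master 2 -- generates IDs 2, 4, 6, ...
# [mysqld]
# auto_increment_increment = 2
# auto_increment_offset    = 2
```

Setting the increment to the total number of masters and giving each master a distinct offset guarantees the two servers can never generate colliding auto-increment values, even while replication is lagging or cut.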

If you require a truly HA solution for MySQL, you would have to go with MySQL Cluster, because DRBD cannot give you data integrity in case of failures.

Overcoming the lack of a serial cable is actually really easy: you use a thing from the dark ages called a modem. You have one at each end and then run Heartbeat over the PPP link. You can also use frame relay. Both methods will fix any worries you have about layer 1/2 redundant paths.

However, that being said, DRBD running over any link with much more than about 300 µs (note: that's 0.3 ms) of latency becomes ridiculous very quickly.

You would be better served by using standard MySQL replication, and LinuxHA over PPP and ethernet to do the failovers.

Interesting idea. I've used dial-up as failover on a PtP link before. While I don't think it would completely eliminate the CAP theorem issue, I do believe it could be supplemental in making split brain less likely to occur. It's difficult, though, to create the same level of confidence as a several-foot direct physical connection provides.
– Warner Aug 14 '10 at 0:40