5 Answers
5

At a very simplistic, Kindergartner level, he's right -- you can't just stick a layer 4 load balancer in front of a bunch of MySQL servers and expect it to work, like you would with a bunch of web servers.

However, there are techniques for spreading the load amongst a number of MySQL servers, such as sharding and multimaster replication, and in that sense the guy you were speaking to is dead wrong.

As far as "these downtimes", I'm not sure what you mean -- load balancing isn't (necessarily) high-availability, and it's important to understand the relationship and differences between the two if you want to use these techniques properly.

In general, you handle the downtime of your storage layer by presenting an apologetic sorry page to users and then screaming for help behind the scenes from the on-call sysadmin.

Well you can load-balance any type of server, you just might not get the result you want.

For instance if the DB app is a pure read-only query system then why not, you could benefit enormously by LB'ing a bunch but the problem you run into is with writes as they won't cluster well/properly - it's just designed for that kind of thing unless you go seriously out of your way to design your system top-to-bottom to work that way.

Unlike MS SQL's enterprise versions, MySQL does not have built-in load balancing provisions in the traditional sense. It's likely your friend works with layer-4 and similar LBing solutions, and in that sense, this is true.

However, MySQL sharding can provide for high-availability under heavy load, which I assume is what you're looking for.

Load-balancing a MySQL database environment can be done, but it is not done by putting a LB device between your app servers and your database servers. This is exactly the wrong place to do it.

The problem with doing it at the MySQL connection level is one of knowledge. It is the very nature of SQL to target a single piece of storage, the 'database'. The raw SQL does not have enough information about the relationship of different types of data, nor the required validity of data. Instead, it has to make guesses about why the unknown data is being accessed by seeing how it is being accessed. This is how MySQL Proxy works. And it still requires additional configuration of the database instances themselves.

True cluster-aware load balancing requires the database instances to be able to talk to each other to sort out serialization of access and knowledge about comparative loads on the database engine and on the storage level. I haven't worked on them, but as far as I know, this is the type of thing DB2 and Oracle do in a clustered setup. MySQL does not - it doesn't have the architecture designed to do it.

So you have to load balancing higher up: in the application, before the SQL is sent out. This is a much better place to do it because that is where all the knowledge about data relevancy actually is. This is where the code can know it needs to look in one of the databases over there for this table, and one of the databases over here for this other table. It can know that it can read this database for this table, because it's a replicant, but it must write to that one over there. It can even make decisions about how old the data can be before it's not useful anymore, permitting intelligent caching strategies. It also means many many database instances. This is what LiveJournal figured out over several years.

The downside to this approach is what you lose. No (or few) stored procedures, constraints are difficult, reporting across disparate databases is difficult. It makes your sophisticated SQL engines simple storage. But they're good at that! It also means either your DBA has to become a DB programmer or your DB programmer has to become a DBA. Been there. Done that. I did this scalability in the object layer. It worked really really well. It was just hugely unconventional.