KISS KISS KISS

When I visit customers quite often they tell me about number of creative techniques they heard on the conferences, read on the blogs, forums and Internet articles and they ask me if they should use them. My advice is frequently – do not. It is fun to be creative but creative solutions also means unproven and people who had to become creative with their system often did that because they had no choice. Of course when they came to the bunch of conferences and told their story which resonated across the Internet sticking to the people mind as a good practice.

There are 2 things you should ask yourself. First is the scale comparable – the recipes from Facebook, YouTube, Yahoo, are not good for like 99.9% of the applications because they are not even remotely close in size and so capacity requirements. Second if this “smart thing” was truly thought out architecture choice in beginning or it was the choice within code base constrains they had, and so you might not have.

Let me look into couple of most typical reservations.

Sharding – This is perhaps the technology people get obsessed with most regularly. Sometimes it looks like a homepage running on 100K database visited by 100 people in a month is attempted to be sharded. Remember as commodity hardware is advancing the size of the application when you really need to shard moves further and further away. I remember LiveJournal with 4GB of memory per box doing sharding 5 years ago…. well now you can get a box with 128G of RAM within $15K. Keeping working set in memory is not the only reason for sharding but one of the most frequent ones. The examples I like to use is YouTube – they did not shard until after Google bought them (though they were in pain) and 37Signals

When doing Performance Audit we tend to look at the required capacity and data size within current horizon. In many cases even with super optimistic assumptions application will do just fine with single “cluster” even on the current hardware for several years.

Replication Optimization People often get scared with the fact replication is single thread and often becomes bottleneck so they are using various optimizations including tricky prefetch approaches suggested by YouTube. Interesting enough this often happens even when system is far from reaching its replication capacity.

I would suggest measure and monitor your replication capacity (how long will it take a slave to catch up 1 hour lag of peak traffic ?) and act appropriately. Also focus on simple optimizations first, if you need to get to prefetch you’re quite likely beyond reasonable use of single master and should have done sharding functional partitioning or something else.

Complex Replication There are impressive numbers out there on how many slaves people run and how complex replication topologies with multiple tiers filtering and writes to some intermediary slaves people use. For me simple is best. Complex architectures are more error prone harder to maintain (upgrades etc) and troubleshoot. Remember for every single “role” in such setup you need to understand what to do with it if any other “role” in the system fails, which escalates complexity. You may need something more advanced than master and one slave but any complication needs to be justified. I also should note slaves are not overly efficient beasts – they not only store the copy of data on the disk, wasting resources but their caches are also highly redundant defeating the fact you may have a lot of total memory on the slave farm.

Reading from the Slaves The story heard is typically – Web applications often have significantly prevailing reads so to scale we better have many slaves which we can use to handle most of our read traffic right ? Sure. Unless you’re using memcache or other caching option. Successful memcache implementations often report 90% cache hit ratio meaning 10 to 1 read ratio drop backs to one to one. This means you may not need a lot of slaves if your application allows use of efficient caching.

Now lets look at the simplified case – you got pair of servers replicating as Master-Master which you typically want for high availability and online schema changes. How far you want to go making your application being capable to read from the slave ? Remember as you’re doing this for high availability and online schema changes you’re planning to operate without slave every so often, meaning one server should be able to handle all traffic from capacity planning standpoint anyway. At the same time slave can be perfect to be used for non production impacting things like analytics.

High Availability The trick with high availability is the more complex architectures and processes you use for high availability prevention the more likely it is for them to fail. Unless you’re Google scale with failures happening daily you can’t really be sure you’re handling “wild” failures, not the test ones well. Furthermore you always have to look at failures caused by other things – wrong code pushed to production, hacker break in, data center power failure etc.

Google guys tells us single MySQL server on a good hardware has MTBF somewhere between 1000 and 2000 days. This is a lot of time which means for most of applications having a pair of slaves (even though second slave is available for failover only 99% of the time) is more than enough.

I would say more. In my experience the availability of the application is only related to the MySQL redundancy for very high quality/high scale applications. I’ve seen applications having no downtime running for years on single MySQL server (which just does not crash) as well as complex no single point of failure database backend with application constantly going down because of bad code or something unpredicted.

Summary: So am I denying all MySQL industry practices (which we also covered in a great depth in our book) ? Not really. I’m just suggesting do not just grab advice from the Internet or friends tip and do not complicate beyond the need. You may start with couple of replicated nodes for high availability and maintenance if you’re in serious business (and just one server and good point in time backup if you’re on the budget) and assess any need for any complications. It may be boring but boring systems often have highest uptime

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

This is great advice for startups. The key is to evolve from simplicity to complexity based on proven rather than imagined needs. Few startups warrant complex distributed database systems; this engineering time is better spent building something their customers want.

Aside from advice on exotic MySQL configurations, there seems to be a new column-oriented or distributed key-value database project launching every month. I’m afraid this will be the exotic configuration of the next few years, where startups spend lots of time building systems that will scale to the billions of records they’ll never have to store.

Thanks for this post; having a top notch expert writing we can point to will help fighting those “overdesigning hubris” moments we all have, for sure

“Master-Master which you typically want for high availability and online schema changes.”

Ahh, thanks for this tip, I never thought to address this problem that way. I guess you mean something like, along the lines: remove one master from being accessed by applications, stop slave on the remaining, busy one, make schema change on the non busy server, when everything is done, exchange the roles of the servers (ex. non-busy become worker) and wait for the other server to catch up schema changes. Is it this?

By the way, something that makes planning such a solution very useful with MySQL, despite complexity (and even when the schema change is expected to be fast and resources friendly, without large index creations) is the problem with InnoDB schema changes being not ACID compliant. This dangerous lack of feature is often overlooked in my small experience; I remember a Zarafa (a MySQL based Exchange like) upgrade which, after quite a few schema changes bundled in a theoretical transaction, did failed to had a constraint (because of database content) and desperately tried to rollback the whole changeset (bummer! all corporate emails made inaccessible!). So for now we have to circumvent this problem with some infrastructure complexity. Having rollbackable ALTERs, like PostgreSQL already supports, would be quite an improvement for MySQL/InnoDB. Is this totally impossible? are there plans like this for XtraDB (wouldn’t that be a killer feature)?

Complexity often causes trouble. But simplicity should not be confused with “lack of good design” – and I’m not saying “proper design” as dogma rarely helps. It’s very important to learn why some “rules” (guidelines) exist, so you know when to break them! This is what we teach in our classes, and it’s oh so useful.

Another thing to steer clear of is marketing hype… a company plugging the latest and greatest fancy technology or tool.

Right. You can do MySQL/hardware/OS upgrades this way, schema changes or table maintainance virtually without downtime. Transactional DDL would be nice though so far it can’t be solved on just storage engine lawyer (hence can’t be added just to XtraDB).

@Matthew I’m not sure to understand the worklog implicit consequences very well, so unsure about the functionality coverage; but it seems to be a very useful proposal even for what it explicitly states.

The worklog’s description wording focuses on addressing serializability and isolation problems with concurrent accesses while performing DDL, through proper locking. Is deferring the release of metadata locks up to an explicit commit/rollback enough to get rid of DDL auto commits (ie. does this implicitly means that the schema changes can actually be rolled back, or is it just a first step)?

This is great advice, although perhaps bad for me. Overly complex deployments create more interesting problems to solve. One other point that relates to scalability is that many (most?) systems have a lot of spare capacity waiting to be discovered as soon as they deploy the user_stats patch from Percona/Google. With that in place it is common to find a lot of unexpected load and then query logging or SHOW PROCESSLIST sampling can be used to find the problem queries.