How do you put a Database in the Cloud?

By Jonathan Bryce | October 28, 2009 10:45 am

There’s a lot of buzz in the cloud world around Amazon’s new Relational Database Service[1]. With this move, Amazon inches up one level from pure infrastructure to also owning the operating system and base server software (you can’t SSH into an RDS EC2 instance). More interesting than the announcement itself is the discussion it has generated, with one frequent question being, “Is it really cloud?” TechCrunch[2] provides coverage, and there is an intelligent discussion in the comments.

RightScale[3] makes the point that RDS instances are basically MySQL appliances: at the core, just EC2 instances running MySQL. This is a capability RightScale has offered for years on top of the same infrastructure. The RDS instances then have some valuable, automated services layered on top to back up and scale the resources available to that EC2 instance. This is similar to the value-added services RightScale has offered, and to the snapshot backups and in-place scaling that Cloud Servers[4] offers for all server types. As a side note, this step will obviously worry some of the Amazon partners who are building businesses on top of Amazon’s infrastructure services.

Database != Cloud?

Back to the original question: How do you do databases in the cloud? This is a question we’ve been consumed with for years. We are running thousands and thousands of applications, most of which are backed by a MySQL or Microsoft SQL Server database. At Rackspace, we have a few basic philosophies that influence how we approach our product offerings, from managed hosting to cloud to email.

First, we want to give users a variety of options that start very low in the stack and go all the way up to software services. Customers can pick where in the stack they want to work to match their needs for customization, ease of use, and required technical skill.

Second, we want to try to make the transition from one type of service to another as smooth as possible. I applaud Amazon for implementing this in a way that preserves the standard MySQL protocol. We’ve taken the same approach with our Cloud Sites database capabilities.

The third goal, though, is the hardest: smooth scalability. One of the primary promises of the cloud is elasticity. For something like our web application servers, we run custom versions of web server software that allow us to reach a level of scale that will meet practically any need. Relational databases, though, are much more difficult to scale infinitely. Amazon’s approach is to give their users the building blocks and ask them to handle the scaling. We’ve taken a different tack, trying to handle scaling as seamlessly as possible.

Along the way, we’ve learned many lessons about scaling, especially databases. The first lesson is that there’s still a limit to how far you can stretch today’s relational database software. We’ve been able to create a MySQL offering for Cloud Sites that has elastically handled massive volume, but we’ve also reached upper bounds and had to help a small number of customers deploy in other configurations. You can throw bigger, beefier hardware at the problem, but eventually you can’t go any farther vertically. You can scale horizontally using projects like mysql-proxy[5] to load-balance queries, but then you run into other problems, such as maintaining consistency across all your nodes. For the vast majority of database usage out there, these problems never appear on the horizon, and the work we’ve already done on MySQL is elastic enough. For those cases where the database needs to do more, we’ve been working with two interesting new technologies.
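
To make the horizontal-scaling trade-off concrete, here is a minimal sketch in Python of the kind of query load-balancing a proxy layer performs: writes go to a single primary, and reads are spread round-robin across replicas. This is not mysql-proxy's API or how Cloud Sites works; the QueryRouter class, the server names, and the "first keyword is SELECT" rule are all assumptions made for illustration.

```python
import itertools


class QueryRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        # Endless round-robin iterator over replicas, or None if there are none.
        self._next_replica = itertools.cycle(self.replicas) if self.replicas else None

    def route(self, sql):
        """Return the name of the server that should run this statement."""
        words = sql.split(None, 1)
        keyword = words[0].upper() if words else ""
        # Assumption: only plain SELECT statements are safe to send to a replica.
        if keyword == "SELECT" and self._next_replica is not None:
            return next(self._next_replica)
        return self.primary


# Hypothetical server names, used only for this example.
router = QueryRouter("primary", ["replica-1", "replica-2"])
targets = [
    router.route("INSERT INTO posts (title) VALUES ('hello')"),
    router.route("SELECT title FROM posts"),
    router.route("SELECT title FROM posts"),
]
```

The consistency problem shows up right away in a setup like this: the SELECT that follows the INSERT lands on a replica, and if that replica has not yet applied the write, the application reads stale data. Adding more replicas spreads the read load further but does nothing for write capacity.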

Drizzle

Drizzle[6] is a project that is one step removed from MySQL. It’s primarily worked on by developers who also worked on MySQL, with the goal of modularizing every component of MySQL and building a scalable, cloud-friendly version of the world’s most popular RDBMS. Drizzle is still in the early stages of development, but it shows a lot of promise. Bringing real horizontal scaling to MySQL while maintaining compatibility and relational database functionality will be a huge step forward. And if it doesn’t require a complete rework of the decades of development effort already spent on RDBMS-backed applications, that is a big bonus.

Cassandra

Beyond Drizzle, we are actively contributing to and working on the Cassandra[7] distributed database system. Cassandra goes beyond trying to scale a traditional relational database: it drops many of the traditional relational concepts and places a priority on scaling. When you are dealing with billions of writes and terabytes of data, you’ve moved into a realm of technology needs that requires you to adopt some new concepts. Truly web-scale applications, like Digg, have reached this point[8] and started to make the shift. The possibility of reaching hundreds of millions of users worldwide with online applications creates scale problems like never before. We see distributed databases as a key component in solving the next wave of scaling problems, and that’s why we are investing heavily in them. If you put in a little bit of extra development effort upfront, they offer the potential for truly elastic, cloud database services.
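
One of the new concepts behind distributed databases of this kind is spreading keys across many nodes with consistent hashing, so that adding a node moves only a fraction of the data. The Python sketch below illustrates the general technique only. It is not Cassandra's implementation or API; the HashRing class, the node names, and the choice of MD5 with 64 virtual nodes per server are assumptions made for the example.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes, vnodes=64):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node owns several points ("virtual nodes") on the ring,
        # which evens out how much of the key space each node receives.
        self._ring = sorted(
            (self._token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(key):
        # Assumption: MD5 is used only as a well-mixed hash, not for security.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

    def node_for(self, key):
        """Return the node owning the first ring point after the key's token."""
        idx = bisect.bisect_right(self._tokens, self._token(key))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical node names and keys, used only for this example.
keys = [f"user:{n}" for n in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
```

Every key in moved now belongs to node-d and nothing shuffles between the three original nodes, which is the property that lets a cluster grow without rewriting most of its data. The extra development effort lands on the application side: data has to be modeled around key lookups rather than joins across tables.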

If the idea of creating infinitely scalable database technology consumes your thoughts, send us your resume – jobs@rackspacecloud.com[9]. We are always looking to add skilled engineers and developers, and will provide an opportunity to work on some of the largest infrastructure systems in the world, handling billions of transactions every month.