Service Isolation By Virtualization

As recommended by Ezra Zygmuntowicz, I’ve divided all layers of our web application into separate virtual machines using Xen. At first glance, running a virtual machine for every service sounds like quite a bit of overhead. Isn’t it much simpler to just install the whole stack on one box and let it run? Why go through the hassle of installing and maintaining an additional virtualization layer? Shouldn’t we keep everything simple and straightforward? After running this setup for nearly a year and scaling it from hundreds of thousands of clicks per month to a couple of million, I want to share the ways in which virtualization really pays off for us.

Ramp-up Costs

If you’ve never used any virtualization technology on the server side, you might need to invest some time to learn the specifics of the product of your choice. Xen itself is quite complex, and learning to both use and tune it takes a real commitment. If you run your own hardware, and therefore need to set up the virtual infrastructure yourself, this might be an investment you’re prepared to make. You can avoid the hassle of maintaining your own virtual infrastructure by using a hosting company that offers virtual servers, like Joyent Accelerators or Slicehost. But if you’re stuck with your own root servers (as we are, since we couldn’t find a decent offering of virtual servers in the European Union), you’ll have to set up the virtual infrastructure yourself.
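For orientation, a domU definition under Xen’s classic xm toolstack looks roughly like this (names, paths and sizes here are illustrative, not our actual config):

```
# /etc/xen/memcached.cfg -- hypothetical domU definition (xm toolstack syntax)
name   = "memcached"
kernel = "/boot/vmlinuz-2.6-xen"
memory = 256                               # in MB; start small, grow as needed
vcpus  = 1
disk   = ['phy:/dev/vg0/memcached,xvda1,w']
vif    = ['bridge=xenbr0']
root   = "/dev/xvda1 ro"
```

`xm create /etc/xen/memcached.cfg` boots the VM; one such file per service keeps the whole stack declarative.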

Resource Management

Running each service in a separate virtual machine lets you manage its resources very strictly. You can allocate memory, disk space and even the number of CPU cores. We initially decided to run every VM with the bare resource minimums. This has the advantage of surfacing bottlenecks much earlier than running with all the hardware’s resources available. If you only have a couple of users, even dedicating a small server to your application is overkill (or at least it should be, if you plan to scale). Giving your application many times the resources it should need for such a small number of users will hide upcoming performance bottlenecks until it’s too late. But if you run your application in a scaled-down environment with just enough resources available, you’ll see the bottlenecks and be able to decide whether you have to fix your application (e.g. because it has memory leaks or is lacking database indices) or invest in more hardware.
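With the xm toolstack those minimums can also be adjusted at runtime, so tightening or loosening a VM’s budget is a one-liner (the domU name here is hypothetical):

```
# Adjust a running domU's resources with Xen's classic xm tool
xm mem-set appserver 512   # cap the app server's memory at 512 MB
xm vcpu-set appserver 1    # restrict it to a single virtual CPU
xm list                    # show each domU's current memory/VCPU allocation
```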

Monitoring

Since every service runs in a separate virtual machine, you can apply basic monitoring to each of them. That enables you to monitor every aspect of your web application stack separately (e.g. basic Nagios checks for memory, I/O wait, load, etc.), which makes it easy to identify performance bottlenecks. Because you avoid tight coupling of services, your monitoring solution will give you a very clear and precise picture of what is really happening, enabling you to tackle problems in isolation.
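A minimal sketch of the threshold logic behind such a check (the Nagios plugin convention is to exit 0/1/2 for OK/WARNING/CRITICAL; the thresholds and the percentage input are assumptions, and a real plugin would derive the percentage from /proc/meminfo inside each VM):

```shell
# check_mem_pct <used_pct> <warn_pct> <crit_pct>
# Prints a Nagios-style status line and returns the matching exit code.
check_mem_pct() {
    used=$1; warn=$2; crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - memory at ${used}%"; return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - memory at ${used}%"; return 1
    fi
    echo "OK - memory at ${used}%"; return 0
}
```

Wired into Nagios per VM, a WARNING on exactly one box immediately tells you which layer of the stack is under pressure.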

Encapsulation Of Problems

Using one VM per service has the advantage of containing most problems within a single VM instead of letting them spread through your whole stack and cause cascading failures. A runaway process eating all the available memory would kill your whole server if you didn’t use virtualization. But if you’ve carefully allocated memory resources, only that VM should become unresponsive, allowing you to log in to the host and restart it without affecting any other service (as long as you have every service running in a fail-over configuration; this, too, is much easier to accomplish with VMs, as all you have to do is clone them).
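Cloning for fail-over mostly comes down to copying the disk volume and rewriting the identity in the domU config; a sketch of the config half (the sed-based rename and the file layout are illustrative):

```shell
# clone_domu_cfg <src.cfg> <dst.cfg> <new_name>
# Copy a domU config file, giving the clone its own name.
clone_domu_cfg() {
    src=$1; dst=$2; new=$3
    sed "s/^name[[:space:]]*=.*/name = \"$new\"/" "$src" > "$dst"
}
```

After snapshotting or copying the disk image as well, `xm create` on the new file brings the stand-by instance up.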

Another advantage of encapsulating services in VMs is that most changes you apply to a given VM do not affect any other service. It’s easy to upgrade to a newer version of a core library needed by one of your services if you don’t have to test for negative side effects on other services. If, for example, memcached needs a newer version of libevent than another service requires, you can simply upgrade your memcached VM. Try doing that on a single machine where all the services run side by side!
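On a Debian-style VM, that upgrade stays confined to a single command against a single box (host and package names are illustrative):

```
# Upgrade memcached (pulling in its newer libevent) inside the memcached VM only;
# every other VM keeps its existing libevent untouched.
ssh memcached-vm 'apt-get update && apt-get install memcached'
```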

Summary

Even with the higher setup and maintenance costs of running our own virtual infrastructure, we gain so many advantages from separating every service of our web application stack into its own virtual machine that we’re very happy to pay that price.

If you can enlist a hosting provider that offers virtual machines to you directly, you avoid this extra effort and still get all the advantages! Now that’s what I really call “win-win”!

3 thoughts on “Service Isolation By Virtualization”

Nice article Matthias,
I’ve been calling this functional siloing and tiering recently. You still end up with your multiple tiers (N-Tier) but then silo across the tiers. I’ve been drawing it like this: http://www.gliffy.com/publish/1502705/
It’s great because each silo and tier can usually be scaled independently of the others. That’s not to say there aren’t consequences in other tiers when you scale a given silo’s tier. For example, in a config like that one, people sometimes forget that the DB layer can only take so many connections from so many application servers. But the dependencies are usually more manageable this way.
Having a well written application sure helps.
It is definitely virtualization that makes this possible from a total cost of ownership perspective.
-Kent

I like your siloed approach. It’s the logical next step in decoupling various parts of an application. I’ve thought about it a lot but did not yet set it up like you show it.

Separating static requests from dynamic ones has the advantage that you can use a CDN for your static content. Not having that separation but still using a CDN made us duplicate our whole site: once under the CDN URL (assets.domain.com) and once under http://www.domain.com. Google did not like that 😉 Before we can start using a CDN again, we’ll have to create such a static silo.

In addition to the silos you mention, I’m planning to set up a silo for our API. Generating all the XML is quite resource-intensive on the app server and needs to scale independently of all the other parts of the application.