Scalability Testing with Virtual Machines

Making a service based, distributed application is a good way to help your scalability goals. If you can compartmentalize the different aspects of your application into separate processes that can host services through either web services or plain sockets, you’ll gain the ability to make asynchronous and parallel calls between services.

When you’re developing an application you generally use a single development machine. You can run into problems when trying to do integration tests between multiple services due to the limits placed on listening to ports and how some technology stacks can act differently on the same machine in comparison to using multiple machines. While your application might be distributed onto many servers, your development environment might not have the same number of servers to access, often due to budget constraints. This is especially true if you’re just hacking something up in your spare time at home or if you’re a self-employed programmer. However, machine resources can be limited in many companies due to the cost of those servers.

Good news is you probably don’t need additional physical servers just for your own testing. Most computers today are so powerful that you don’t use all of the resources of it very often. You can harness those additional resources by running virtual machines and using those as your additional servers. A really good place to acquire a high-quality, free virtual machine package is from Virtual Box. It’s an open source virtual machine started by Sun and now owned by Oracle that’s quite fast, multiplatform, and of course free.

The main idea will be to run multiple virtual machines, each one representing a proxy of a real server you would run your application on. You can then distribute your services onto these VMs and test how your interactions work. Each VM will have less resources than a real machine, but still enough for most testing you’ll need to do. In fact you can use VMs on your real servers if you find your resource utilization on those servers are low, but you need more machines for your application design.

This is also useful if you’re attempting to build a fault-tolerant distributed application where a single service might be replicated onto many machines. Any distributed application needs to deal with node failure, which is difficult to simulate on a single box. However, if you run your service on multiple virtual machines, you can see the effect of shutting one down during execution to see if your clients can handle the service outage and redirect to a running service. This can also be used to test the reintegration of a previously down server back into a distributed service pool.

Of course this isn’t new stuff at all. Many people have been using VMs for years for a variety of purposes. I’d like to mention however that if you’ve avoided VMs because you don’t see how they can apply to your work, if you think your work is too small for it, you should reconsider. Even a non-developer can benefit from using Virtual Machines. Sometimes you might want to run a piece of software that doesn’t work in your current OS, but you’d like to gain its services. You can fire up a VM of the exact OS that software supports and run it inside of a virtual machine. Some VMs can be fairly light-weight, taking up only a minor amount of ram, disk space, and cpu usage.

Another developer usage for VMs is to make a completely clean build environment. If you want to make sure your package can build, run, and execute with many Linux distributions you can create automations to create new virtual machines, install the OS, import your code, compile, test, run your application, export the results, then remove the virtual machine. You can use VMs as part of an extended systems test for your application. This way you can find out if your application relies on some odd configuration that might cause difficulties for your users. It also can help you refine your build process and configurations for multiple distributions or Operating Systems.

In the current development climate, processors have nearly reached their speed limits. It is said that the next big push is for more parallel processing among multiple cores. Beyond that is distributed computing. When needs reach beyond the multiple processors of a single machine, we’ll look at the resources of multiple processes of multiple machines. This work is already being done at Yahoo!, Google, Microsoft, Amazon, Ebay, Facebook, and all of those other large tech corporations. However in the future I suspect that it will become fairly common to utilize the resources of many devices for a single application. Virtual Machines can give you a cheap way to explore the exciting realm of distributed applications from the comfort of your single (hopefully mutli-core, multi-processor, or both) machine.