Testing

How Much Development and Test Infrastructure Capacity Do You Really Need?

By Navin R. Thadani, June 10, 2014

With good metrics and queueing theory, it's possible to accurately compute your needs. Fulfilling the requirements then becomes the tricky part.

We can quantify this new efficient frontier by re-running the queueing simulation (Figure 9). The optimal point has now shifted to 5 smoke testing environments, 8 integration testing environments, 1 system test environment, and 3 manual QA environments. The incremental cost of the additional test environments is offset by the productivity increase in the development team, since there is no waiting even when the commit rate peaks at 27 commits per hour.
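The article's simulation itself isn't shown, but a closed-form check gives the flavor of the result. The sketch below uses the Erlang C formula (an M/M/c queueing model, which is an assumption on my part, not necessarily the model the author used) to estimate how often a commit would have to wait for one of the 5 smoke testing environments at the peak rate of 27 commits per hour, with each smoke test taking 5 minutes:

```python
from math import factorial

def erlang_c_wait_probability(arrival_rate, service_rate, servers):
    """Probability an arriving job must queue, per the Erlang C formula (M/M/c)."""
    a = arrival_rate / service_rate          # offered load in Erlangs
    rho = a / servers                        # utilization; must be < 1 for a stable queue
    top = a**servers / factorial(servers)
    bottom = (1 - rho) * sum(a**k / factorial(k) for k in range(servers)) + top
    return top / bottom

# Peak load from the article: 27 commits/hour into smoke testing;
# a 5-minute smoke test means each environment serves 12 tests/hour.
p_wait = erlang_c_wait_probability(arrival_rate=27, service_rate=12, servers=5)
print(f"P(commit waits for a smoke environment) ~ {p_wait:.1%}")
```

Under these assumptions the wait probability comes out under 10%, which is consistent with the article's claim that 5 smoke environments essentially eliminate queueing at peak load.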

Figure 9.

A Difficult Situation: Could the Cloud Be the Answer?

As we have illustrated, during the days just before a milestone or sprint, we need many more test environments. But during normal days, a few test environments suffice. If you had to pick, economically speaking, it would make sense to build out a larger lab and plan for the peak commit load. However, sometimes this isn't possible due to budget constraints and the difficulty in financially justifying higher capex and opex.

This situation is ideal for the public cloud. One can theoretically get infinite resources on demand, do whatever testing is required, and then shut the environments down, paying only for what you use. This model changes our efficient frontier substantially (Figure 10). Since 5,000 commits are expected in this project, every commit can spin up an instance in the cloud and shut it down after the test is done. Each commit needs to go through smoke testing (5 minutes), then integration testing (12 minutes), and then system testing (4 hours) during the nightly build. That's a total of 5,000 x 1 hr + 5,000 x 1 hr + 200 x 4 hr = 10,800 hours of cloud usage (even if a test takes only 5 minutes, it is billed as 1 hour on some clouds). Assuming each test environment in the cloud costs $1 per hour, that's approximately $11,000 for the whole project (or, worst case, $15,000 assuming some tests fail and need to be re-run). This is a great savings compared with the $250,000 total needed to build out our lab internally for peak capacity.
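The arithmetic above can be sketched directly; this assumes per-started-hour billing (as the article notes for some clouds) and the article's $1/hour rate:

```python
import math

HOURLY_RATE = 1.00  # article's assumption: $1/hour per cloud test environment

def billed_hours(minutes):
    # Per-started-hour billing: a 5-minute test is billed as a full hour.
    return max(1, math.ceil(minutes / 60))

commits = 5000
nightly_builds = 200

total_hours = (
    commits * billed_hours(5)                # smoke: 5 min, billed as 1 hr
    + commits * billed_hours(12)             # integration: 12 min, billed as 1 hr
    + nightly_builds * billed_hours(4 * 60)  # system: 4 hr per nightly build
)
cost = total_hours * HOURLY_RATE
print(f"{total_hours} billed hours -> ${cost:,.0f}")  # 10800 hours -> $10,800
```

With per-second or per-minute billing, which many clouds now offer, the smoke and integration tests would cost far less than a full hour each, making the cloud option even cheaper.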

Figure 10.

Public Cloud = Slam Dunk? Not So Fast…

So $250,000 (internal data center capacity) versus $15,000 (public cloud capacity on demand) seems like a slam dunk for the public cloud. However, it is critical to note that the public cloud is a completely different environment. It takes time and effort to "migrate" the application environment over to the cloud. Differences in the virtualization infrastructure (usually VMware internally versus Xen in AWS), differences in networking (static IPs, DNS, DHCP, L2 networking appliances internally, versus ephemeral IPs, elastic IPs, different DNS method, and a whole different networking paradigm in AWS), and differences in storage topology (EBS, S3 in AWS) all mean that it could take a while to migrate your application environment to the cloud.

In addition, simple migration to the cloud is not enough. To be able to realize this JIT-test-environment nirvana, you need to spend engineering resources to automate your application provisioning in the public cloud. In this way, you can click one button or make a simple API call, and you have an environment running in the cloud. But setting up the automation can also take substantial effort depending on the complexity of the application.

In fact, 6 to 12+ month "cloud migration projects" are not unheard of in the industry. And even after all that, the environment that you get in the public cloud is still going to be different from what you have on premises for production. This discrepancy could be material for complex enterprise applications, but negligible for simpler Web applications.
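Folding a one-time migration and automation cost into the comparison makes the "not so fast" point concrete. The migration figure below is purely an assumed placeholder for illustration; the article gives a duration range (6 to 12+ months), not a dollar amount:

```python
lab_peak_cost = 250_000     # article: build out the internal lab for peak load
cloud_usage_cost = 15_000   # article: worst-case on-demand usage for the project
migration_cost = 120_000    # ASSUMED placeholder for one-time migration + automation

cloud_total = cloud_usage_cost + migration_cost
print(f"peak lab: ${lab_peak_cost:,} vs cloud fully loaded: ${cloud_total:,}")
```

Even with a six-figure migration effort the cloud option can still come out ahead for a single project, and the migration cost amortizes across future projects, but the 17x headline savings shrinks considerably once that effort is counted.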

The Tradeoff: Average Lab versus Peak Lab versus Public Cloud

The table below summarizes the tradeoff:

Option 1: Design lab for average commit rate
Test environments: 2 smoke test, 2 integration test, 1 system test, 2 manual QA
Pros: Costs less in terms of capex and opex ($100K for the lab); equipment utilization is higher
Cons: Developers waste time waiting for access to test environments; sometimes they get frustrated and don't test as much as they should

Option 2: Design lab for peak commit rate
Test environments: 5 smoke test, 8 integration test, 1 system test, 3 manual QA
Pros: No time wasted waiting for access to test environments
Cons: Significantly higher cost for the development lab (>$250K); average lab equipment utilization <1%

Option 3: Public cloud + migration + automation
Test environments: on-demand/just-in-time (JIT)
Pros: Lower cost for test environments (~$15,000 total for the project); no waiting time, hence high efficiency
Cons: Incurs cloud migration cost; incurs additional automation cost; environment differs from production
Conclusion: The Key Ingredients of Agility Are Automation and Capacity

To be as agile as possible, developers and test engineers need the ability to spin up on demand as many test environments as they require. Ideally, these environments should be replicas of the production environment. To achieve this objective, you need automation, not just at the application level but also at the infrastructure level, and you need a large amount of capacity. If your application runs in production in your on-premises data center, meeting this objective can be challenging: automation of infrastructure components is not easy, and capacity is expensive. Furthermore, the development and test cycle is characterized by fluctuating demand for test environments. This means you either spend a lot of money only to have environments sitting idle most of the time, or you provide a substandard lab environment to your development and QA team, which leads to a tremendous amount of inefficiency.

The cloud can be an interesting alternative due to its inherent on-demand nature and effectively infinite capacity. However, moving your development and test workloads to the cloud takes a lot of effort in terms of application migration and automation, and after all that investment, you still end up with an environment that doesn't look like your production environment. The root cause of this problem is that each public cloud is built on entirely different infrastructure from your data center. One approach is to choose a public cloud that is identical to your private data center: for example, if you run VMware internally, you can choose a VMware-based public cloud and work around the networking and automation differences. Alternatively, you could look to emerging technologies that take a new approach by abstracting the application from the underlying infrastructure across private and public clouds, such as nested virtualization and cloud hypervisors for existing data center applications, or containers for new applications. Such advances represent an important way to streamline enterprise application development and test processes.

Navin Thadani is senior vice president of products at Ravello Systems (an overlay cloud service that is powered by high-performance nested virtualization and software-defined networking).

