Our Experiences with Chef: Using Vagrant and Chef to Enable Quality Assurance

By: Clinton Wolfe 18 Apr '13

This is part 3 in a series exploring our experiences using Chef to deploy multiple architectures with different technology stacks and business requirements, as needed by our customers. You may also want to read Part 1 and Part 2.

Intended Audience

In previous entries, we examined best practices in developing Chef cookbooks. In this article, we shift our focus to a powerful but often overlooked application of Chef: its use as a fixture generation system for application quality assurance testing. Only basic Chef knowledge is needed, though some familiarity with the software development lifecycle is helpful. We’ll be showing how we integrate git, Chef-solo, Vagrant, rspec, jUnit and Jenkins into one QA provisioning stack. To be clear, this article is not about testing Chef cookbooks - there are several good tools and tutorials available for those looking for that.

Repeatability: Not just for Scale-Out

Chef’s main user base is composed of operations engineers. They generally see it as a tool that brings massive automation to their provisioning tasks. If you can configure one machine with Chef, you can easily configure a thousand. Repeatability is key; you don’t want the servers to vary, except in very specific ways that you know about and control. So, Chef lets you define trees of build parameters (called attributes) that you can vary in numerous ways.
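As an illustrative sketch (the attribute names here are hypothetical), a cookbook's attributes file defines such a tree, and a role or node can override any branch of it without touching the recipe:

```ruby
# attributes/default.rb -- a hypothetical tree of build parameters.
# Any of these can be overridden per-role or per-node.
default['myapp']['listen_port'] = 80
default['myapp']['db']['host']  = 'localhost'
default['myapp']['db']['name']  = 'myapp_production'
```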

Away from Ops, QA practitioners are dealing with similar problems. Instead of scaling out for production load reasons, the QA team wants to build and discard many systems, testing permutations of build, operating environment, seed data, test set and so on. The QA workload is also highly elastic - in the final days of testing a release, the ability to run many variations simultaneously is crucial. This gets us away from having “the QA server” that is in a precious state - we can always spin up more QA environments.

This assumes that systems and application deployment are integrated. Chef offers many avenues for this; you can use application deployment tools like Capistrano inside Chef, or use Chef-specific tools like the “artifact” cookbook to deploy applications from tarballs or other distribution media. For shops that do not use an application deployment tool, Chef itself can be a very effective tool to deploy applications (performing code checkouts, loading SQL into databases, starting webservers, etc).
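As a rough sketch of that last approach (the repository URL, database name and service name are all hypothetical), a deployment recipe can combine core Chef resources like this:

```ruby
# Hypothetical deployment recipe using only core Chef resources:
# check out the code, load seed data once, and ensure the webserver runs.
git '/opt/myapp' do
  repository 'git://example.com/myapp.git'
  revision   'master'
end

execute 'load-seed-data' do
  command 'mysql myapp < /opt/myapp/db/seed.sql'
  not_if  'mysql -e "SELECT 1 FROM myapp.schema_info LIMIT 1"'
end

service 'httpd' do
  action [:enable, :start]
end
```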

Choosing a Platform for QA Provisioning

Choice of platform is largely dictated by the needs of both the developers and the operations team. The QA group will want to be able to allocate and deallocate environments on-demand to respond to testing needs. If self-hosting, this may require tooling or coordination with the operations group. Two additional methods that might be more appealing include cloud-based hosting (which makes a great deal of sense if the production application is cloud-based anyway) and the desktop virtualization harness, Vagrant.

Vagrant allows you to use existing hypervisors like VirtualBox, while executing full Chef runs and performing testing against the VM. VMs can also be brought up in clusters, and have private networking between them, enabling testing of distributed systems in a more realistic fashion. The most recent versions of Vagrant include support for multiple hypervisor platforms, including cloud-based providers such as AWS. Vagrant also allows you to run an environment locally (ideal for developers). We found it to be an ideal platform, but missing several key pieces.

Vagrant Cookbook Fetcher

The first piece that we found missing was a way of managing which cookbooks, roles and data bags were needed for a project. Berkshelf and Librarian both fill this role to a certain degree, but we found their learning curve to be rather steep - a “too big” tool that didn’t entirely meet our needs. We wanted to be able to share our roles, databags, handlers and other auxiliary Chef configuration across projects - all of the cookbook distribution methods out there seem to only focus on sharing the cookbook itself.

We developed a Vagrant 1.0 plugin, vagrant-cookbook-fetcher, which reads a CSV-formatted file (specified by a URL) listing each checkout to perform. Because Chef allows multiple cookbook paths, but only one role path, the plugin creates a combined directory into which it symlinks the various roles. The plugin configures Vagrant’s Chef-solo provisioner to find the roles, etc., in the new combined location. A similar approach is used for databags, handlers, tests and other shared features. Finally, hooks are added to the ‘up’ and ‘provision’ Vagrant commands to trigger the checkout at provision time. The tool is designed to work identically to the Chef-solo-helper tool that we use when bootstrapping production VMs.
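In a Vagrantfile, wiring this up amounts to pointing the plugin at the checkout list. The following is a sketch only: the URL is a placeholder, and the exact configuration key may differ between plugin versions.

```ruby
# Vagrant 1.0-style Vagrantfile fragment (URL and box name are placeholders).
Vagrant::Config.run do |config|
  config.vm.box = "centos-6-base"
  # Point the plugin at the CSV list of cookbook/role/databag checkouts
  config.cookbook_fetcher.url = "http://example.com/checkouts.csv"
end
```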

Using vagrant-cookbook-fetcher, we can easily recreate any environment, even if the Chef configuration is spread over several repos that may have different access policies.

Vagrant + rspec

The next step is to be able to write tests against the VM that we have created. In the Ruby world, there are several test platforms (minitest, rspec, and cucumber) covering a continuum from unit testing through explicit behavior-driven testing to natural-language behavior-driven testing. Of the three, rspec is the happy medium; you can use several testing paradigms in rspec, and support for it is broad.

Again, we turned to creating a Vagrant 1.0 plugin, vagrant-rspec-ci. Here, our goal was to be able to write specfiles in a project repo that express tests against a VM, run the tests easily, and produce test results in a format that could be consumed by our continuous integration server, Jenkins.

The vagrant-rspec-ci tool lets us run rspec tests, then saves the output of those tests in jUnit format (using ci_reporter) in a directory. There are several Vagrantfile configuration variables you can use to control its output and behavior; the most interesting are listed below.

config.rspec.enable_ci_reporter

Defaults to true - whether to produce jUnit reports

config.rspec.tests = [ ‘*_spec.rb’ ]

Array of globs that specify which test files to run.

config.rspec.rspec_bin_path

Defaults to finding the rspec binary in Vagrant’s gemset, then falls back to plain ‘rspec’.
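Putting those options together, a Vagrantfile fragment might look like the following (the box name and spec glob are illustrative):

```ruby
# Vagrant 1.0-style Vagrantfile fragment configuring vagrant-rspec-ci.
Vagrant::Config.run do |config|
  config.vm.box = "centos-6-base"              # hypothetical base box
  config.rspec.enable_ci_reporter = true       # emit jUnit XML for Jenkins
  config.rspec.tests = ['spec/*_spec.rb']      # globs of specfiles to run
end
```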

All of the tests are executed outside the VM. We chose this approach for several reasons:

we did not want to alter the system build to include testing tools, like rspec

we want to test the behavior of the application server as a unit - generally, a blackbox

vagrant offers several avenues to probe inside the VM if needed, without actually moving the entire test execution there

Probing A Vagrant VM

When testing the behavior of a VM, there are several tasks that come up repeatedly. Some are test predicates we would want to be able to use repeatedly; some are simply informational queries against Vagrant or VirtualBox.

We have captured several of these tasks into a set of Ruby classes to represent a Vagrant VM as a test subject. vagrant-test-subject is typically used as follows:

you@somewhere $ cat combined/spec_ext/cdn_services_spec.rb
require "spec_helper"

describe "TrafficServer Service" do
  before(:all) do
    @vm = VagrantTestSubject::VM.attach()
  end

  it "should appear as a healthy service" do
    @vm.should have_running_service("trafficserver")
  end

  it "should be listening on localhost:80" do
    @vm.should be_listening_on_localhost(80)
  end

  it "should be listening on external_ip:80" do
    @vm.should be_listening_on_external_ip(80)
  end

  it "should be the right process name on port 80" do
    process = @vm.process_name_listening('127.0.0.1', 80)
    process.should_not be_nil
    process.should match(/\/opt\/ts\/bin\/traffic_manager/)
  end

  it "should respond with HTTP 404 to / on port 80" do
    @vm.http_get('/').should be_http_not_found
  end
end

The first interesting line is the attach() method call.

@vm = VagrantTestSubject::VM.attach()

This looks for the Vagrant VM running in the current directory, and extracts some key information:

VirtualBox VM GUID

VirtualBox OS type

ssh options

port redirection map

The call to attach() is actually a factory method, and will return an OS-specific subclass of VagrantTestSubject::VM.
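The dispatch inside attach() can be sketched roughly as follows. The subclass names and the OS-detection shortcut here are hypothetical simplifications; the real gem inspects VirtualBox metadata.

```ruby
# Simplified sketch of a factory method returning an OS-specific subclass.
# Class names and detection logic are hypothetical.
module VagrantTestSubject
  class VM
    def self.attach(os_type = detect_os_type)
      case os_type
      when /redhat|centos/i  then RedHat.new
      when /solaris|omnios/i then Solaris.new
      else VM.new
      end
    end

    def self.detect_os_type
      # The real implementation reads the guest OS type from VirtualBox
      'RedHat_64'
    end
  end

  class RedHat  < VM; end
  class Solaris < VM; end
end
```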

Next, we define several tests (or “examples”, as they are called in rspec). Each examines the VM, using various methods defined on the object. Some highlights are below.

@vm.map_port(80)

Returns the port on the VM host that is being redirected to the given port on the VM guest, if any.

@vm.has_running_service?("httpd")

Using OS-specific means, determines whether the named service is running and healthy. Facilities are provided to allow OS-specific aliases for service names.

@vm.process_name_listening('127.0.0.1', 80)

If possible, determines the command portion of the process listening on a particular IP address and port.

@vm.http_get(url_path)

Returns a Net::HTTPResponse that contains the result of GETing the URL path on port 80.

Together with rspec extensions (like rspec-http), vagrant-test-subject allows the tester to write fluent tests against the VM. vagrant-test-subject is a very young package, and is evolving rapidly.

Parameterizing The Build

As discussed in part 2, we try to decompose our Chef roles to be orthogonal along certain dimensions, such as OS choice, datacenter location, hypervisor, etc. This means that we can vary the Vagrant build by simply changing which of these roles is used in the Vagrantfile. Because a Vagrantfile is Ruby, we have many choices for such a mechanism; but the simplest working approach would be to select each role based on an environment variable.
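A minimal sketch of that mechanism follows; the role names are hypothetical, and a Vagrantfile would consume the result inside its chef-solo block.

```ruby
# Sketch: derive the Chef role list from environment variables.
# Role names are hypothetical; a Vagrantfile would call this inside its
# chef-solo block, e.g.  roles_for_build.each { |r| chef.add_role(r) }
def roles_for_build(env = ENV)
  os = env.fetch('VAGRANT_HINT_OS', 'centos')
  ['base', "os-#{os}", 'app-cdn']
end
```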

The use of environment variables as the mechanism for varying the build has an additional benefit - our CI server, Jenkins, can parameterize build jobs using the same mechanism. So we simply have a Jenkins job that exposes VAGRANT_HINT_OS with two different values, ‘centos’ and ‘omnios’.

Fixtures as Recipes and Roles

The last piece of the puzzle needed for testing is some way of initializing the test subject with state; i.e., test fixtures. Types of state in fixtures include filesystem assets, database records, cache entries, queue state and other project-specific needs. Because the fixture types vary by project, and the process by which to install them varies as well, we need a generic way of running a variety of tasks within the VM to converge it to a known state. Obviously, Chef recipes are a good fit here.

Following the attribute definition practices we outlined in part 2, we implemented a recipe for each fixture type. Each recipe expects to find details about the fixture in the node’s attributes. Together, that allows us to define a specific set of fixtures using Chef roles. A simple example will help illustrate.

The Fontdeck project’s CDN nodes store fonts on the filesystem. When testing the font serving system, we’d like to be able to load a single known font into the test subject VM representing the CDN, then try to access the font as both an authorized and unauthorized user.
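A sketch of such a fixture recipe follows; the attribute names, paths and ownership are all hypothetical.

```ruby
# Hypothetical fixture recipe: install each font named in the node's
# attributes into the CDN's font directory.
node['fixtures']['fonts'].each do |font_file|
  cookbook_file "#{node['fixtures']['font_dir']}/#{font_file}" do
    source font_file
    owner  'trafficserver'
    mode   '0644'
  end
end
```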

We then define a companion recipe that purges the fixture. While not entirely needed (we could just destroy the VM and start over, or use VirtualBox’s snapshotting facility) this approach is simple and fast. There are safeties in place to ensure that the purge will not run unless specifically authorized.
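The companion purge recipe can be sketched like this; the guard attribute is the kind of safety described above, and the names are hypothetical.

```ruby
# Hypothetical purge recipe: refuses to run unless explicitly authorized.
if node['fixtures']['allow_purge']
  directory node['fixtures']['font_dir'] do
    recursive true
    action :delete
  end
else
  log 'fixture purge skipped: node[fixtures][allow_purge] is not set'
end
```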

Now that we can install and remove font assets, we define a specific fixture as a role.
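Such a role, in Chef's Ruby role DSL, might look like the following; all names and paths are invented for illustration.

```ruby
# Hypothetical role: one known font, installed via the fixture recipe.
name "fixture-single-font"
description "Loads a single known font for CDN serving tests"
run_list "recipe[fixtures::fonts]"
default_attributes(
  "fixtures" => {
    "font_dir" => "/var/fonts",
    "fonts"    => ["example-sans.woff"]
  }
)
```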

Conclusion

The worlds of Chef, Vagrant, and Jenkins are experiencing rapid changes. Together, they can be used to solve serious problems in QA, allowing us to view entire systems - or groups of systems - as test subjects. To reach that goal, a lot of glue code is still required; and some of the best tools have yet to emerge. Even developing tools in-house, we’ve had some payoff from the process - we halted the release of a trafficserver module because it failed a test in QA. A month earlier, that module had been considered “only testable in production”. We look forward to rolling out this approach to more projects, and reaping more benefits as we go along.