Ansible for Infrastructure Testing

At $JOB we often find ourselves at customer sites where we see the
same set of basic problems that we have previously encountered
elsewhere (“your clocks aren’t in sync” or “your filesystem is full”
or “you haven’t installed a critical update”, etc.). We would like a
simple tool that could be run either by the customer or by our own
engineers to test for and report on these common issues.
Fundamentally, we want something that acts like a typical code test
suite, but for infrastructure.

It turns out that Ansible is almost the right tool for the job:

It’s easy to write simple tests.

It works well in distributed environments.

It’s easy to extend with custom modules and plugins.

The only real problem is that Ansible has, by default, “fail fast”
behavior: once a task fails on a host, no more tasks will run on that
host. That’s great if you’re actually making configuration changes,
but for our purposes we are running a set of read-only independent
checks, and we want to know the success or failure of all of those
checks in a single operation (and in many situations we may not have
the option of correcting the underlying problem ourselves).

In this post, I would like to discuss a few Ansible extensions I’ve
put together to make it more useful as an infrastructure testing tool.

The ansible-assertive project

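The ansible-assertive project provides a replacement for Ansible’s
assert plugin, along with the assertive callback plugin, which
modifies the output of assert tasks and collects and reports results.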

The idea is that you write all of your tests using the assert
plugin, which means you can run your playbooks in a stock environment
and see the standard Ansible fail-fast behavior, or you can activate
the assert plugin from the ansible-assertive project and get
behavior more useful for infrastructure testing.

A simple example

Ansible’s native assert plugin will trigger a task failure when an
assertion evaluates to false. Consider the following example:
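A minimal playbook of this kind, reconstructed from the test results shown later in this post, might look like the following. The two assertions and the failure message come from those results; the contents of the fruits list and the task names are assumptions made for illustration.

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    # Assumed contents: any list with lemons but no apples would do.
    fruits:
      - oranges
      - lemons
  tasks:
    # Fails, because 'apples' is not in the list.
    - name: check for apples
      assert:
        that: "'apples' in fruits"
        msg: you have no apples

    # Passes, but stock Ansible never gets this far.
    - name: check for lemons
      assert:
        that: "'lemons' in fruits"
```

With stock Ansible the first task fails and, because of the fail-fast behavior described above, the second task never runs on that host.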

While the output with the replacement assert plugin active doesn’t
look like much of a change, there are two things of interest going on
here. The first is that the plugin provides detailed information about
the assertions specified in the task: if we were to register the
result of the failed assertion and display it in a debug task, we
would find an assertions key and an ansible_stats key in the result
dictionary.

The assertions key in the result dictionary contains a list of
tests and their results. The ansible_stats key contains metadata
that will be consumed by the custom statistics support in recent
versions of Ansible. If you have Ansible 2.3.0.0 or later, setting
show_custom_stats = yes in the defaults section of your ansible.cfg
will display these statistics at the end of the playbook run.

A callback plugin for better output

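The assertive callback plugin provided by the ansible-assertive
project produces more useful output for failed assertions. We
activate it by setting stdout_callback = assertive in the defaults
section of our ansible.cfg (the callback_plugins setting may also need
to point at the directory containing the plugin).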

Machine readable statistics

The above is nice but is still primarily human-consumable. What if we
want to collect test statistics for machine processing (maybe we want
to produce a nicely formatted report of some kind, or maybe we want to
aggregate information from multiple test runs, or maybe we want to
trigger some action in the event there are failed tests, or…)? You
can ask the assertive plugin to write a YAML format document with
this information by adding the following to your ansible.cfg:

[assertive]
results = testresult.yml

After running our playbook, this file would contain:

groups:
- hosts:
    localhost:
      stats:
        assertions: 2
        assertions_failed: 1
        assertions_passed: 1
        assertions_skipped: 0
      tests:
      - assertions:
        - test: '''apples'' in fruits'
          testresult: failed
        msg: you have no apples
        testresult: failed
        testtime: '2017-08-04T21:20:58.624789'
      - assertions:
        - test: '''lemons'' in fruits'
          testresult: passed
        msg: All assertions passed
        testresult: passed
        testtime: '2017-08-04T21:20:58.669144'
  name: localhost
  stats:
    assertions: 2
    assertions_failed: 1
    assertions_passed: 1
    assertions_skipped: 0
stats:
  assertions: 2
  assertions_failed: 1
  assertions_passed: 1
  assertions_skipped: 0
timing:
  test_finished_at: '2017-08-04T21:20:58.670802'
  test_started_at: '2017-08-04T21:20:57.918412'
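As one way of triggering an action when tests fail, here is a rough sketch of a second playbook that reads the results file and fails if the top-level stats report any failed assertions. The file name check-results.yml is hypothetical, and I am assuming testresult.yml sits somewhere the file lookup can find it (for example, next to this playbook).

```yaml
# check-results.yml (hypothetical file name)
- hosts: localhost
  gather_facts: false
  vars:
    # Assumption: testresult.yml is where the file lookup will find it.
    results: "{{ lookup('file', 'testresult.yml') | from_yaml }}"
  tasks:
    - name: summarize the test run
      debug:
        msg: >-
          {{ results.stats.assertions_passed }} passed,
          {{ results.stats.assertions_failed }} failed,
          {{ results.stats.assertions_skipped }} skipped

    - name: fail if any assertion failed
      fail:
        msg: "{{ results.stats.assertions_failed }} assertion(s) failed"
      when: results.stats.assertions_failed > 0
```

This would be run as a separate ansible-playbook invocation after the test run. It is probably safest to use a configuration that does not write to the same results file, since I would expect the file to be rewritten at the end of each run.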

With these tools it becomes much easier to design playbooks for
testing your infrastructure.
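As an illustration, here is a minimal sketch of one of the checks mentioned at the start of this post (“your filesystem is full”), written as an ordinary assert task. The 10% threshold, the choice of the root filesystem, and the hosts pattern are all assumptions; the size_available and size_total values come from the standard ansible_mounts fact.

```yaml
- hosts: all
  gather_facts: true  # ansible_mounts is populated by fact gathering
  tasks:
    - name: root filesystem has at least 10% free space
      vars:
        # Assumption: every target host has a filesystem mounted at "/".
        rootfs: "{{ ansible_mounts | selectattr('mount', 'equalto', '/') | list | first }}"
      assert:
        that:
          - rootfs.size_available > rootfs.size_total * 0.1
        msg: root filesystem is more than 90% full
```

Because this uses nothing but assert, it should behave the same way as the fruit example: fail-fast in a stock environment, and collected and reported alongside the other checks when the ansible-assertive plugins are active.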