Verifying TDD Scenarios

Now that Ansible has done all the information gathering for us, it’s time to finally make use of it. In this post I will show how to use Ansible to run traceroutes to and from the hosts defined in a test scenario and verify the results of those tests. Should any of those tests fail, Ansible will provide a meaningful description of what exactly failed and why. Along the way I’ll introduce a couple of new Ansible features like conditional looping and interactive prompts.

TDD Playbook

In order to run and verify tests I will create a separate playbook. It makes sense to separate it from the previous playbook simply because this one will be run many times, while the information gathering playbook only needs to be run once. The new playbook will have to accomplish the following tasks:

Select which scenario to test

Run tests as specified in that scenario

Parse test results

Verify that test results conform to the specification

Selecting test scenario

Our scenarios/all.txt file contains multiple test scenarios, each identified by a name. Each test scenario represents a certain state of the network, e.g. scenario #1 tests how the network behaves in a normal state with no outages or link failures, while scenario #2 tests how traffic should be rerouted in the event of a primary link failure. Inside each scenario there are one or more test steps, each testing the behaviour of a particular traffic flow, e.g. traffic from router R1 to router R4 should traverse R2 followed by R3. Each step contains the keywords From, To and Via, which identify the source, destination and transit routers. This is what a typical scenario file looks like.

In the previous post I showed how to parse these scenarios and store them as a YAML dictionary in the group_vars/all.yml file, which makes this information automatically available to any future playbooks. So in the new playbook ~/tdd_ansible/cisco_tdd.yml all we need to do is let the user decide which scenario to test:

This playbook contains a standard header followed by a vars_prompt section, which prompts the user to select a particular scenario number and stores the selection in the scenario_num variable. The first task in the playbook extracts the scenario name and steps from the scenarios dictionary stored in the group_vars/all.yml file and stores them in their respective variables. Of course this task is optional and it’s possible to reference the same data using the full notation, however I prefer things to be more readable even if it leads to some inefficient memory use.
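To make the extraction step more concrete, here is a minimal Python sketch of the same lookup. The layout of the scenarios dictionary (scenario number mapped to a two-element list of name and steps) and the sample names are assumptions made for illustration only, as is the select_scenario helper; the real structure is whatever the previous playbook wrote to group_vars/all.yml.

```python
# Hypothetical stand-in for the `scenarios` dictionary from group_vars/all.yml.
# Assumed layout: scenario number -> [name, {source: {destination: [via, ...]}}]
scenarios = {
    "1": ["Testing of Primary Link",
          {"R1": {"R4": ["R2", "R3"]}, "R2": {"R4": ["R3"]}}],
    "2": ["Testing of Backup Link",
          {"R1": {"R2": ["R4", "R3"]}}],
}


def select_scenario(scenarios, scenario_num):
    """Return (name, steps) for the scenario number entered by the user."""
    key = str(scenario_num)  # prompt input arrives as a string
    if key not in scenarios:
        raise KeyError(f"Scenario {key} is not defined")
    name, steps = scenarios[key]
    return name, steps


scenario_name, scenario_steps = select_scenario(scenarios, "1")
print(scenario_name)
for source, destinations in scenario_steps.items():
    for destination, via in destinations.items():
        print(f"From {source} to {destination} via {', '.join(via)}")
```

Keeping the name and the steps in two short variables mirrors the readability argument above: later steps can refer to scenario_steps directly instead of repeating the full lookup.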

Run test specified in scenario steps

Now it’s time to run traceroutes to see how the packets flow in the network. As we did in one of the previous posts, we’ll use the raw module to run traceroutes. However this time, instead of running full-mesh any-to-any traceroutes, we’ll only run them if they were defined in one of the test steps. Indeed, why would we run a traceroute between devices if we’re not going to verify it? Ansible’s conditionals will help us with that. For each of the hosts in the cisco-devices group we’ll look into the scenario_steps dictionary to see if any tests were defined and, if there were, we’ll run a traceroute to each of the destination hosts.

When both a loop (with_dict) and a conditional (when) are defined in a task, Ansible does the looping first and evaluates the conditional for each item. That’s why, if no test steps are defined for a particular host (e.g. R3), the loop expression fails on the undefined value before the conditional is ever checked, and execution of the playbook stops. To overcome that we can use Ansible (Jinja) templates inside the with_dict loop. Appending |default({}) instructs Ansible to substitute an empty dictionary in case scenario_steps[inventory_hostname] does not exist, so the loop has nothing to iterate over and the host is skipped altogether.
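The effect of the default filter can be sketched in plain Python, where dict.get with a fallback plays the role of |default({}). The scenario_steps contents, the planned_traceroutes helper and the bare "traceroute" command string are placeholders assumed for this illustration, not the playbook’s actual task.

```python
# Hypothetical steps for one scenario: R3 and R4 have no tests defined.
scenario_steps = {"R1": {"R4": ["R2", "R3"]}, "R2": {"R4": ["R3"]}}
inventory = ["R1", "R2", "R3", "R4"]


def planned_traceroutes(host, steps):
    """List the commands a host would run, or nothing if it has no tests."""
    # steps[host] would raise KeyError for R3, much like the undefined
    # variable in the loop; .get(host, {}) mimics |default({}) instead.
    destinations = steps.get(host, {})
    return [f"traceroute {destination}" for destination in destinations]


plan = {host: planned_traceroutes(host, scenario_steps) for host in inventory}
for host, commands in plan.items():
    print(host, commands if commands else "skipped")
```

An empty dictionary simply produces zero loop iterations, which is why a host without test steps is skipped rather than failing.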

Parse test results

There’s no silver bullet when it comes to parsing the output of the traceroute command. We’ll have to use Python to traverse the textual output line by line, looking for lines containing msec and storing every IP address found in a list.
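A minimal sketch of that line-by-line approach is shown below. The sample output and its addresses are made up for illustration, and the regular expression and the parse_traceroute name are my own assumptions rather than the module’s actual code. Hops that time out carry no msec marker and are simply ignored here.

```python
import re

# Loose IPv4 matcher; good enough for picking hop addresses out of a line.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def parse_traceroute(output):
    """Return the hop IP addresses found in traceroute output, in order."""
    hops = []
    for line in output.splitlines():
        if "msec" not in line:
            continue  # headers and timed-out hops carry no msec marker
        match = IPV4.search(line)
        if match:
            hops.append(match.group(0))
    return hops


# Made-up output in the style of a numeric, single-probe IOS traceroute.
sample = """\
Type escape sequence to abort.
Tracing the route to 10.0.0.4
  1 12.12.12.2 1 msec
  2 23.23.23.3 0 msec
  3 34.34.34.4 1 msec
"""

print(parse_traceroute(sample))
```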

The Ansible module contains a class with a single public method, compare. The first thing it does is convert the list of IP addresses of transit devices into a list of hostnames. That’s where the IP-to-Hostname dictionary created in the previous playbook is first used. The IP address is used as a lookup key and the hostname is extracted from the first element of the returned list (the second element, the interface name, is currently unused). The private method __validatepath is used to confirm that the devices listed after Via in a test scenario are present in the traceroute path in the specified order. If this verification fails, the whole module fails and the error message is passed back to the Ansible playbook.
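The verification logic could look roughly like the sketch below. The class and method names, the error message wording and the sample IP-to-Hostname entries are all assumptions made for illustration. It also assumes that "in the specified order" allows other hops in between the Via devices, and that unknown addresses are kept as raw IPs.

```python
class PathVerifier:
    """Hypothetical stand-in for the module's comparison class."""

    def __init__(self, ip_to_host):
        # Assumed layout: {ip_address: [hostname, interface_name]}
        self.ip_to_host = ip_to_host

    def compare(self, source, destination, via, hop_ips):
        """Return (ok, message) for one test step."""
        # Only the hostname (first element) is used; unknown IPs stay as-is.
        path = [self.ip_to_host.get(ip, [ip])[0] for ip in hop_ips]
        if self._validate_path(via, path):
            return True, ""
        return False, (
            f"Traffic from {source} to {destination} expected via "
            f"{via}, actual path {path}"
        )

    @staticmethod
    def _validate_path(via, path):
        # Ordered subsequence check: each expected hop must appear
        # somewhere after the previously matched one.
        remaining = iter(path)
        return all(hop in remaining for hop in via)


ip_to_host = {
    "12.12.12.2": ["R2", "Ethernet0/0"],
    "23.23.23.3": ["R3", "Ethernet0/1"],
    "34.34.34.4": ["R4", "Ethernet0/0"],
}
verifier = PathVerifier(ip_to_host)
hops = ["12.12.12.2", "23.23.23.3", "34.34.34.4"]
print(verifier.compare("R1", "R4", ["R2", "R3"], hops))
print(verifier.compare("R1", "R4", ["R3", "R2"], hops))
```

In a real Ansible module the failing branch would end in a module failure carrying the message, which is how the text reaches the playbook output.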

TDD in action

So let’s finally see the whole thing in action. First let’s modify the 4-router topology so that traffic from R1 to R4 is routed via R2 and R3 (a simple delay 9999 on Ethernet0/1 will do). Now let’s run the first scenario and verify that no errors are displayed.

Nothing much really, which is good: it means all test steps in the scenario were verified successfully. Now let’s see how it fails. The easiest way is to run the tests from the second scenario, the one that assumes that the link between R1 and R2 has failed and all the traffic is routed via R4.

Here all 3 test steps within the scenario failed. Ansible displayed the error messages passed down by our module, specifying the expected and the actual path.
Now if we simply shut down Ethernet0/0 on R1 to simulate a link failure and re-run the same scenario, all tests will succeed again.

So there it is, a working network TDD framework in action. I still haven’t covered a lot of corner cases (e.g. when a traceroute times out) and deployment scenarios (e.g. devices with VRFs), but it should still work in many situations and can easily be extended to cover those corner cases.