Behaviour-Driven End-to-End Testing

Towards a living documentation

To successfully build complex software,
it is important to reach a shared understanding regarding
the product and its desired capabilities amongst
developers, product owners and end users.
This shared understanding outlines the required external quality of
a piece of software and makes its progress
measurable.

Alignment of developers’ and stakeholders’ mental models can be achieved
through conversation, by employing and evolving a ubiquitous language:
a common language that correctly,
consistently and unambiguously expresses the business domain.

This language is continuously leveraged throughout the development process:
when creating and discussing user stories and requirements,
when developing the software using domain objects
and when confirming the requested feature works as expected
through testing — acceptance or otherwise.

Specification by example is a way of doing behaviour-driven development (BDD),
identifying and exploring business rules by discovering, discussing and
challenging examples for each conceived scenario.

Gherkin

A common way of capturing business requirements in BDD
is the Gherkin language,
a human-readable language containing a few formulas for writing clear specifications.

In Gherkin, each feature is subdivided into scenarios, which are described
using a title and a combination of statements such as:

Given <situation>

When <action>

Then <consequence>

Every statement is a high-level description of what step
is to be taken: implementation details have no place here.
Instead of “I type Alex in the name field and click on the submit button”,
it would be “I fill in my name as Alex and send the form.”

Examples can be given through multiple scenarios or by
defining variables in scenario outlines and
supplying data for these variables through a defined table.

A feature can be supplied with background information, usually
in the style of a user story:

As a <Role>,

I want <Desired Behaviour>,

in order to <Goal>

Cucumber

The benefit of this formulaic yet human-readable language is
that it can be automated. Cucumber is a tool,
implemented in several programming languages, that does just that.
Gherkin steps are made executable by implementing step definitions:
Cucumber parses the feature files and runs the step definition that matches each step.
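To make the link between a Gherkin step and its step definition concrete, here is a toy sketch of how a runner could match a step's text against registered patterns and pass the captured values to the matching function. It is not Cucumber's implementation: `defineStep` and `runStep` are hypothetical names, and only the `{string}` placeholder is handled, by turning it into a regular expression capture group. The statuses `undefined` and `ambiguous` mirror the names Cucumber uses for steps with zero or several matching definitions.

```javascript
// Toy step registry: NOT Cucumber's implementation, only an illustration.
const definitions = [];

// Hypothetical helper: registers a step pattern such as 'I search for {string}'.
function defineStep(expression, fn) {
  // Escape regex metacharacters, then turn each {string} into a capture group
  // for a double-quoted value.
  const escaped = expression.replace(/[.*+?^$()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\{string\}/g, '"([^"]*)"');
  definitions.push({ regex: new RegExp(`^${source}$`), fn });
}

// Hypothetical helper: finds the single matching definition and calls it
// with the captured arguments.
function runStep(text) {
  const matches = definitions.filter((d) => d.regex.test(text));
  if (matches.length === 0) return { status: "undefined" };
  if (matches.length > 1) return { status: "ambiguous" };
  const args = text.match(matches[0].regex).slice(1);
  return { status: "matched", value: matches[0].fn(...args) };
}

// Usage: one definition serves every search term.
defineStep("I search for {string}", (term) => `searching for ${term}`);
console.log(runStep('I search for "Socrates"')); // { status: 'matched', value: 'searching for Socrates' }
console.log(runStep("I open the settings"));     // { status: 'undefined' }
```

Because the quoted value is captured rather than hard-coded, a single definition covers every search term used in a feature.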

BDD and executable specifications allow for
outside-in test-driven development,
starting from the higher-level requirements and ending in lower-level
implementation details.

Because these feature specifications are readable by software,
they serve as a testable documentation of the application’s behaviours,
integrated into the entire development process.
This increases software quality and developer confidence, as
the correctness of features (at least from a behavioural standpoint)
is continuously verified by developers, stakeholders and,
through the power of continuous integration, computers.

This integrated solution forms the basis for a
living documentation:
documentation that is updated and evaluated regularly,
which makes documenting features
less painful than writing and updating it by hand.
For extra context,
it is often a good idea to add more information
to these features, even if these details are not executable.
Explaining the behaviour of the system and the entities in it can
help readers understand the system and aid the onboarding of developers.

End-to-end testing

End-to-end testing is a form of testing that covers all layers
of an application: from one end to another. An end-to-end test
drives the software as if it were a user and verifies expectations
and requirements a user may have regarding the software’s behaviour.

This makes end-to-end tests suitable for asserting high-level
requirements without a lot of detail, such as basic happy-path
use cases and critical failure scenarios.

However, end-to-end tests touch the entire application,
including external interactions with databases and web APIs.
As a result, they are usually slower to run than other tests.
To add insult to injury, end-to-end tests are more brittle than other
tests due to their heavily integrated nature:
any change regarding a feature in any layer could
affect the outcome of the test.

Agile Test Pyramid

A common guide in Agile circles
for formulating a testing strategy is
the test pyramid.
There should be a strong foundation in unit tests,
component integration should be tested using integration tests and
use cases should be covered in end-to-end tests.

Unit tests are typically relatively easy to write,
isolated, flexible and fast to run.
They cover a single module, object or a group of objects (a component),
and verify their public API works as expected. This includes
happy paths, failure cases and edge cases.

Integration tests cover the integration of software components,
between layers or services. This makes them more difficult to write,
slightly slower and more brittle than unit tests.

At the top of the pyramid are end-to-end, acceptance and contract tests.
These verify the exposed behaviour of an entire application or service.
Because these are slow to run and difficult to maintain,
it is often more prudent to spend more time on unit and integration tests.
Nevertheless, they have their place in verifying high-level acceptance criteria
and asserting the entire application works as intended.

Example: Wikipedia

To get started and put this theory into practice, we are going
to perform end-to-end tests for a web application you probably know:
Wikipedia. First, create a directory for the project and make
sure the directory structure test/e2e/features exists.

This is where the features for our end-to-end tests will reside.

If you want to check out the code and CI pipeline right away,
go ahead and have a look at them on GitLab.

The user story

As a User,

I want to find information by a relevant keyword,

in order to easily find entries without having to browse each page by hand

The feature

We discuss the feature with the product owner and developers
and come up with the following scenarios during a brainstorm session.

Feature: Searching on Wikipedia
  As a User,
  I want to find information by a relevant keyword,
  In order to easily find entries without having to browse each page by hand

  Scenario: A perfectly matching entry is found
    Given I am on the English Wikipedia main page
    When I search for "Socrates"
    Then I should see the entry with the title "Socrates"

  Scenario: No possible matching entry is found
    Given I am on the English Wikipedia main page
    When I search for "Socratasty"
    Then I should see the search results page
    And I should not see a list of search results

  Scenario Outline: Multiple possible matching entries are found
    Given I am on the English Wikipedia main page
    When I search for "<search term>"
    Then I should see the search results page
    And the search results should contain "<entry>"

    Examples:
      | search term       | entry                    |
      | Socratez          | Socrates                 |
      | Pllato            | Plato                    |
      | Greece Philosophy | Ancient Greek philosophy |
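When Cucumber runs the scenario outline above, it produces one scenario per row of the Examples table, with each `<placeholder>` replaced by that row's value. The substitution can be sketched as follows, assuming the table has already been parsed into a header and rows; `expandOutline` is a hypothetical helper, not part of Cucumber's API.

```javascript
// Hypothetical helper: expands outline steps into one concrete step list per Examples row.
function expandOutline(steps, header, rows) {
  return rows.map((row) =>
    steps.map((step) =>
      // Replace every <column name> with the value from the current row.
      header.reduce((text, column, i) => text.split(`<${column}>`).join(row[i]), step)
    )
  );
}

// The outline and Examples table from the feature above.
const steps = [
  'When I search for "<search term>"',
  'And the search results should contain "<entry>"',
];
const header = ["search term", "entry"];
const rows = [
  ["Socratez", "Socrates"],
  ["Pllato", "Plato"],
  ["Greece Philosophy", "Ancient Greek philosophy"],
];

// One concrete scenario per row; scenarios[1] holds the steps for the "Pllato" row.
const scenarios = expandOutline(steps, header, rows);
scenarios.forEach((scenario) => console.log(scenario.join(" / ")));
```

With three Examples rows plus the two regular scenarios, the feature therefore runs five scenarios in total.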

Cucumber.js is used to make our (Gherkin-based) specification executable.

The steps in our Gherkin need to be translated into actions
using step definitions. Definitions can be written in any way
and can use any library we like. Because we are going to do end-to-end testing on
a web application, we should use a tool for automating a browser.

Because Gherkin is merely an abstraction layer for describing our tests,
Cucumber could also be used for other types of tests (higher or lower level)
or for testing on other platforms.
However, it seems to work best for highly abstract test cases
like the ones found in end-to-end functional tests.

We are going to use Nightmare.
Nightmare is an Electron-based (headless) browser automation tool,
which can be used for UI testing.
It’s a bit like the popular testing tool
Selenium, but quicker, easier to use
and without the need for Java.
Keep in mind that Electron uses the Chromium engine,
so passing tests do not guarantee the same behaviour in other browsers.

In order to easily assert certain expectations and requirements,
we can use an assertion library, such as Chai.

The end-to-end feature we just wrote should reside in the test/e2e/features directory,
relative to our project root.

NPM scripts

In order to run testing scripts, we can add them to package.json, under scripts:

The --fail-fast flag makes Cucumber stop running tests after the first failure.
This allows for a quick feedback loop and lower build machine utilisation.
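Cucumber.js implements this flag itself; purely to illustrate the semantics, here is a toy runner (the name `runScenarios` is hypothetical, and this is not Cucumber's code) that runs scenarios in order and, when `failFast` is set, marks everything after the first failure as skipped instead of running it.

```javascript
// Toy runner illustrating fail-fast semantics (hypothetical, not Cucumber's implementation).
async function runScenarios(scenarios, { failFast = false } = {}) {
  const results = [];
  let failed = false;
  for (const { name, run } of scenarios) {
    if (failed && failFast) {
      // After the first failure, remaining scenarios are skipped, not executed.
      results.push({ name, status: "skipped" });
      continue;
    }
    try {
      await run();
      results.push({ name, status: "passed" });
    } catch (error) {
      failed = true;
      results.push({ name, status: "failed", error: error.message });
    }
  }
  return results;
}

// Usage: the second scenario fails, so with failFast the third one never runs.
const scenarios = [
  { name: "exact match", run: async () => {} },
  { name: "no match", run: async () => { throw new Error("boom"); } },
  { name: "multiple matches", run: async () => {} },
];

runScenarios(scenarios, { failFast: true }).then((results) =>
  console.log(results.map((r) => `${r.name}: ${r.status}`).join("\n"))
);
```

Skipping instead of running is where the saved build time comes from: nothing after the first failure touches the browser.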

Let’s try and run the end-to-end tests:

npm run e2e

You should see the following:

UUUUUUUUUUUUUUUUUUU

This means Cucumber has read our feature file,
but only finds undefined steps (U). Furthermore,
Cucumber’s warnings include snippets that show how to
implement the missing steps.

The step definitions

In our test/e2e/features directory, create a
directory called step-definitions. This
is where – you guessed it – our step definitions
will reside. Let’s create a file for our Wikipedia-related definitions:
wikipedia.js.

Because our tests are going to be simple, we are not going
to define a context or “World” for our steps.
A World is an isolated context for each scenario,
exposed to the hooks and steps as this.
Because Nightmare can hold that context for us,
we are going to reference Nightmare directly
using a mutable variable called browser and
make sure a fresh instance is started before each scenario
using Cucumber’s Before hook.

You can configure Nightmare
to show its browser window, so that you can see what is happening,
by adding the show: true configuration.
But you probably want to turn this off when running
it during CI.

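const { Before, After, Given, When, Then } = require("cucumber");
const Nightmare = require("nightmare");
const { expect } = require("chai");

let browser;

// Start a fresh, visible browser before every scenario
Before(function () {
  browser = Nightmare({ show: true });
});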

It’s time to define our first step: going to the English Wikipedia
main page. We represent the URL as a constant. Alternatively,
you could create a step that references a Wikipedia edition by its language code
and use a variable for that.

const ENGLISH_WIKIPEDIA = "https://en.wikipedia.org";

Given("I am on the English Wikipedia main page", function () {
  // Returning the Nightmare chain lets Cucumber wait for the navigation
  return browser.goto(ENGLISH_WIKIPEDIA);
});

We could make this extremely terse by using arrow syntax.
However, if we were to introduce a custom World,
it would be exposed to our steps through this.
Arrow functions do not bind their own this, so to avoid future pain
we are going to type a few more characters.
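The difference is easy to observe outside of Cucumber. Cucumber calls step definitions with the World bound to `this`; the sketch below imitates that with `Function.prototype.call` and a hypothetical `world` object. A regular function sees the World, while an arrow function keeps the `this` of the scope it was defined in.

```javascript
// Hypothetical World object, standing in for the per-scenario context Cucumber creates.
const world = { browserName: "Nightmare" };

// A regular function receives `this` from its caller...
const regularStep = function () {
  return this && this.browserName;
};

// ...while an arrow function keeps the `this` of its surrounding (module) scope.
const arrowStep = () => this && this.browserName;

// Imitate a runner invoking each step with the World bound to `this`.
console.log(regularStep.call(world)); // Nightmare
console.log(arrowStep.call(world));   // undefined
```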

Run npm run e2e again. It goes to the English Wikipedia main page!
However, our Cucumber test is still running while the browser is open
(until it times out). In order to make sure the browser is closed after
each scenario, we can end the browser session in an After hook:

After(function () {
  // End the session so the Electron process does not keep the run alive
  return browser.end();
});

Stopping and starting the browser every time may take too long.
If this is the case, you might want to share a single browser
session and take the necessary manual clean-up measures in the
Before and/or After hooks. In our case,
replacing Before and After with
BeforeAll and AfterAll cuts the execution time in half.
Don’t forget to change the required functions as well!

Let’s add the remaining steps to the end of our file, in
logical chronological order.
We are using a variable string
in When I search for "Socrates"; Cucumber can easily
deal with this using the {string} parameter.
We fill in the search input box
and click the search button, referencing both by their CSS selectors.

Now it’s time to evaluate our result and check if the
entry title is as we expect. Nightmare offers the
evaluate method to execute client-side JavaScript
on the page and return the result as a promise.
So we get the element with the ID “firstHeading”,
get its text content and trim it.
Then, back in our (Node) test context, we
use Chai’s expect API to verify that the title is correct.

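Then("I should see the entry with the title {string}", function (expectedTitle) {
  return browser
    // Runs in the page: read the trimmed text of the entry title
    .evaluate(() => document.getElementById("firstHeading").textContent.trim())
    // Runs in Node: compare it with the title from the feature file
    .then((titleText) => expect(titleText).to.equal(expectedTitle));
});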

And our first scenario is defined!
And, as you may have noticed, we have defined a lot of steps
for the other scenarios as well: reusability. 🙌

Determining whether we are on the search results page is implemented
in the same way as verifying an entry page. As these pages are
likely to evolve differently over time
and the criteria for verification might change independently,
this duplication gives us flexibility in the future.

Let’s also extract the search results title into a constant,
maintained in the same place as our previous constant:

Then("I should see the search results page", function () {
  return browser
    .evaluate(() => document.getElementById("firstHeading").textContent.trim())
    // SEARCH_RESULTS_TITLE is declared next to ENGLISH_WIKIPEDIA
    .then((titleText) => expect(titleText).to.equal(SEARCH_RESULTS_TITLE));
});

In order to determine whether there are search results,
we can check for the presence or absence of elements with the class mw-search-results
by looking at the length of the matching collection.

Then("I should not see a list of search results", function () {
  return browser
    // Runs in the page: is there at least one element with the results class?
    .evaluate(() => document.getElementsByClassName("mw-search-results").length > 0)
    .then((hasSearchResults) => {
      expect(hasSearchResults).to.be.false;
    });
});

Lastly, we need to verify that the list of search results
contains the desired entry. For this we select the matches
by class name, convert the HTMLCollection to an array,
get the text content of each element and determine
whether the expected title is among the shown titles.

Note that we inject the variable from the Node context into
the front-end context through the evaluate function.
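The membership check itself is plain JavaScript. The sketch below isolates it in a hypothetical `containsEntry` function that accepts any array-like collection, so it can be tried in Node with plain objects standing in for an `HTMLCollection`. The assumption for real use is that a self-contained function like this is passed to `browser.evaluate` together with the expected entry as an extra argument: Nightmare serializes the function and runs it in the page, so it cannot see variables from the surrounding Node scope.

```javascript
// Hypothetical helper: does a collection of result elements contain the expected entry title?
// Works on an HTMLCollection in the browser, or on any array-like object.
function containsEntry(resultElements, expectedEntry) {
  return Array.from(resultElements)                 // HTMLCollection -> Array
    .map((element) => element.textContent.trim())  // element -> visible title
    .includes(expectedEntry);
}

// Stand-in for a DOM query result in Node: an array-like of fake elements.
const fakeResults = {
  length: 2,
  0: { textContent: " Socrates " },
  1: { textContent: "Socratic method" },
};

console.log(containsEntry(fakeResults, "Socrates")); // true
console.log(containsEntry(fakeResults, "Plato"));    // false
```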

Optimizing

In our When I search for {string} definition,
we use the .type method to type in the search input.
This fires keyboard events (e.g. keydown) for every character and is slower
than using the .insert method. Strangely, when using .insert,
we need to add a slight .wait
before continuing to the next step, because inserting is so fast.
250 ms works for me,
but you might need a different value or,
instead of waiting for a fixed amount of time,
wait for a certain element to appear.
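Waiting for a condition instead of a fixed delay comes down to polling with a deadline. Nightmare's `.wait` also accepts a CSS selector and does this for us; purely to show the mechanism, here is a generic, hypothetical `waitFor` helper that resolves as soon as a predicate returns true and rejects once a timeout has passed.

```javascript
// Hypothetical polling helper: resolve once `predicate()` is truthy, reject after `timeoutMs`.
function waitFor(predicate, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      if (predicate()) return resolve();
      if (Date.now() >= deadline) {
        return reject(new Error(`condition not met within ${timeoutMs} ms`));
      }
      setTimeout(check, intervalMs); // try again shortly
    };
    check();
  });
}

// Usage: a flag that flips after 200 ms, standing in for "the element has appeared".
let appeared = false;
setTimeout(() => { appeared = true; }, 200);

waitFor(() => appeared, { timeoutMs: 1000 })
  .then(() => console.log("element appeared"))
  .catch((error) => console.error(error.message));
```

Compared to a fixed 250 ms pause, polling continues as soon as the condition holds and only waits longer when the page is genuinely slow.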

Prettier output

OK, so far so good. But what are these dots all about?
It would be nice if we could see our scenarios in text
while they’re running, especially if we are going to run our
tests in a headless environment such as a CI pipeline.

Sadly, although Cucumber offers some formatting options,
this specific one is not included by default.

Continuous integration

Now that we have our features running locally,
it should be easy to incorporate this in a
continuous integration pipeline.
In practice, you would want to run these tests
in a test or staging environment.

Basic setup in GitLab CI

If we are using a (private) GitLab repository,
we can enable GitLab’s CI feature by
adding a .gitlab-ci.yml file to our project and
setting up some configuration in there.
Basically, we just want a Node container in which we
install our dependencies and run our tests.

As an aside, keep in mind that you could also create a
prepared image so even more time can be shaved off the
tests as installation steps can be skipped.

✖ Given I am on the English Wikipedia main page # test/e2e/features/step-definitions/wikipedia.js:15
Error: function timed out after 5000 milliseconds
at Timeout._onTimeout (/builds/arothuis-blog-experiments/wikipedia-bdd/node_modules/cucumber/lib/user_code_runner.js:93:22)
at ontimeout (timers.js:475:11)
at tryOnTimeout (timers.js:310:5)
at Timer.listOnTimeout (timers.js:270:5)

It looks like Cucumber times out on the execution of the step:
the step did not finish within Cucumber’s default limit of 5000 milliseconds.
This means something went wrong inside the step definition,
either in the network traffic between the test runner and the web
or in the way Nightmare works.
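The error message suggests a race between the step and a timer: if the promise returned by the step has not settled after 5000 ms, the timer wins. The sketch below reproduces that behaviour with `Promise.race`; `withTimeout` is a hypothetical helper, not Cucumber's code. For steps that are merely slow, Cucumber.js offers `setDefaultTimeout` to raise the limit, but a longer timeout does not help a step that never settles.

```javascript
// Hypothetical helper mimicking a step timeout: whichever promise settles first wins.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`function timed out after ${ms} milliseconds`)),
      ms
    );
  });
  // Clear the timer either way so it does not keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a "step" that never settles, like a navigation waiting on a browser
// process that never started.
const stuckStep = new Promise(() => {});
withTimeout(stuckStep, 100).catch((error) => console.log(error.message));
// function timed out after 100 milliseconds
```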

Fixing Nightmare CI issues

After debugging for a while,
I found out the problem was in Nightmare by
running it in debug mode, which is enabled by setting the
environment variable DEBUG=nightmare:

nightmare electron child process exited with code 127: command not found - you may not have electron installed correctly

Turns out, it is pretty difficult to run a headless browser
inside of a GUI-less CI runner. 😉

To mitigate this,
we can use X virtual framebuffer (Xvfb), an X11 display server
that performs all graphical operations in memory without showing any screen
output. To use this, we need to install xvfb and its dependencies
in our node container and run the npm run e2e command
through xvfb.

This fixes the problem. Splendid. 🧐
As a trade-off, it makes our build
a tad slower because of the required extra setup.
It might be beneficial to introduce a prebuilt image
to shave off some more time.

Alternatively, you could split up your tests into multiple jobs
and run them in parallel in the CI pipeline.
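One way to split the suite is to give each parallel CI job its own subset of feature files. As an assumption about the CI setup: GitLab's `parallel` keyword starts several copies of a job and exposes `CI_NODE_INDEX` (1-based) and `CI_NODE_TOTAL` to each of them, so a small script could pick its share of files with a hypothetical helper like `shardFiles` and pass them to Cucumber. The sketch covers only the selection; the file names are made up.

```javascript
// Hypothetical helper: round-robin assignment of feature files to parallel jobs.
// nodeIndex is 1-based (like GitLab's CI_NODE_INDEX); nodeTotal is the number of jobs.
function shardFiles(files, nodeIndex, nodeTotal) {
  if (!(nodeIndex >= 1 && nodeIndex <= nodeTotal)) {
    throw new RangeError(`node index ${nodeIndex} is outside 1..${nodeTotal}`);
  }
  // Sort first so every job sees the same order and the shards never overlap.
  return [...files].sort().filter((_, i) => i % nodeTotal === nodeIndex - 1);
}

// Usage with made-up file names; in CI the numbers would come from the environment.
const features = [
  "test/e2e/features/search.feature",
  "test/e2e/features/login.feature",
  "test/e2e/features/history.feature",
];
const index = Number(process.env.CI_NODE_INDEX || 1);
const total = Number(process.env.CI_NODE_TOTAL || 1);
console.log(shardFiles(features, index, total));
```

Sorting makes the assignment deterministic, so no file is run twice or skipped across jobs.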

In conclusion

The code (and CI pipeline) for this tutorial can be found on
GitLab.
End-to-end testing does not need to be scary, difficult or slow,
and the resulting code is nice and readable.
The features, written in an understandable, common language,
can even be embedded in the development process in the
form of specification by example.
In turn, using continuous integration, this can serve
as a basis for a living documentation.

However, end-to-end tests are not a
be-all and end-all testing solution.
In order to keep feedback frequent and
quick and make the test suite less brittle,
a testing strategy should be applied that
leverages unit tests, integration tests, end-to-end tests
and other tests fit for verifying that the conditions of
satisfaction are met. The Agile test pyramid can be
a rule-of-thumb for drafting such a strategy.