
ActiveRecord validations are a very powerful tool. They allow you to clearly and concisely declare rules for your model that must be met in order for any instance of that model to be considered “valid”. And using them gives you all sorts of nifty error handling and reporting out of the box.

Take a uniqueness validation on the email attribute of an Account model, for example. This validation does exactly what it says. It won’t consider an instance of Account valid unless the email address associated with it is unique across all accounts.

Or will it?

This particular validation will perform a SELECT to see if there are any other accounts with the same email address as the one being validated. If there are, then the validation will trip and an error will be recorded. If not, you will be allowed to save the record to the database.

But, here is the problem. Somewhat obviously, ActiveRecord validations are run within the application. There is a gap in time from when ActiveRecord checks to see if an account exists with the same email address, and when it saves that account to the database. This gap is small, but it most certainly exists.

Let’s say we have two processes running our application, serving requests concurrently. At roughly the same time, both processes get a request to create an account with the email address “john@johnpwood.net”. Consider the following series of events:

Process 1: Receive request
Process 2: Receive request
Process 1: Verify no account exists with the email “john@johnpwood.net”
Process 2: Verify no account exists with the email “john@johnpwood.net”
Process 1: Save the record to the database
Process 2: Save the record to the database => whoops!

At the end of this series of events, we will have two accounts in the database with an email of “john@johnpwood.net”.

Relational database constraints exist for this exact reason. Unlike the application processes, the database knows exactly what it contains when it processes each request. It is the only one that can reliably determine if it is about to create a duplicate record.
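The series of events above can be replayed in a few lines of plain Ruby. This is only a sketch: AccountsTable is a hypothetical in-memory stand-in for the accounts table (it is not ActiveRecord), and the unique_email flag plays the role of a unique index. Both "processes" run their check before either one saves, exactly as in the interleaving described above.

```ruby
# Hypothetical in-memory stand-in for an accounts table; not ActiveRecord.
class AccountsTable
  class DuplicateEmail < StandardError; end

  attr_reader :rows

  def initialize(unique_email: false)
    @unique_email = unique_email
    @rows = []
  end

  # What the validation's SELECT answers: is this email already taken?
  def email_taken?(email)
    @rows.include?(email)
  end

  # The INSERT. With the constraint enabled, the table itself rejects duplicates.
  def insert(email)
    raise DuplicateEmail, email if @unique_email && @rows.include?(email)

    @rows << email
  end
end

# Replays the interleaving: both processes validate before either one saves.
def replay_race(table, email)
  process_1_valid = !table.email_taken?(email)
  process_2_valid = !table.email_taken?(email)

  [process_1_valid, process_2_valid].map do |valid|
    next :invalid unless valid

    begin
      table.insert(email)
      :saved
    rescue AccountsTable::DuplicateEmail
      :rejected_by_database
    end
  end
end

validation_only = AccountsTable.new
with_constraint = AccountsTable.new(unique_email: true)

validation_only_outcomes = replay_race(validation_only, "john@johnpwood.net")
with_constraint_outcomes = replay_race(with_constraint, "john@johnpwood.net")

puts "validation only: #{validation_only_outcomes.inspect}, rows: #{validation_only.rows.size}"
puts "with constraint: #{with_constraint_outcomes.inspect}, rows: #{with_constraint.rows.size}"
```

Without the constraint, both saves succeed and the table holds two rows. With it, the second save is rejected by the table itself. In a Rails application that rejection surfaces as an ActiveRecord::RecordNotUnique exception, which the application can rescue and report.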

Some Rails developers feel that database constraints, including null constraints, foreign keys, and unique indexes, are not necessary because of the validations performed by ActiveRecord. As you can see by the example above, this is certainly not the case. It is almost a certainty that non-null columns without a non-null constraint will eventually contain null data, foreign key columns without a foreign key constraint will eventually contain ids to records that do not exist, and unique columns without a unique index will eventually contain non-unique data. I’ve seen many projects that solely relied on ActiveRecord validations to ensure the integrity of their data. Every single one of those projects had junk data in the database.

A database with inconsistent or bad data can be incredibly difficult to work with. Relational databases have many tried and true tools that can be used to ensure that the integrity of your data remains intact. All of them can easily be set up in a Rails migration. So, be sure to use them when creating or altering tables or columns.

Our deployment process at UrbanBound has matured considerably over the past year. In this blog post, I’d like to describe how we moved from prescribed deployment windows with downtime, to a deploy-when-ready process that could be executed at any point in time.

The Early Days

About a year ago, UrbanBound was in the middle of transitioning from the “get it up and running quickly to validate the idea” stage to the “the idea works, let’s make sure we can continue to push it forward reliably” stage. Up until that point, we had accrued a significant amount of technical debt, and we didn’t have much in the way of a test suite. As a result, deploys were unpredictable. Sometimes new code would deploy cleanly, sometimes not. Sometimes we would introduce regressions in other areas of the application, sometimes not. Sometimes deploys would interfere with users currently using the app, sometimes not. Our deployment process was simply not reliable.

Stopping the Bleeding

The first order of business was to stop the bleeding. Before we could focus on improving the process, first we needed to stop it from being a problem. We accomplished this with some process changes.

First, we decided to limit the number of releases we did. We would deploy only at the end of each two-week sprint, or to push out critical bug fixes. That’s it. We made some changes to our branching strategy in git to support this workflow, which looked something like this:

All feature branches would be based off of an integration branch. When features were completed, reviewed, and had passed QA, they would be merged into this integration branch.

At the end of every two-week sprint, we would cut a new release branch off of the integration branch. Our QA team would spend the next few days regression testing the release branch to make sure everything looked good. From this point on, any changes made to the code being released, a fix for a bug QA found for example, would be made on this release branch, and then cherry picked over to the integration branch.

When QA was finished testing, they would merge the release branch into master, and deploy master to production.

Emergency hotfixes would be done on a branch off of master, and then merged into master and deployed when ready. This change would then have to be merged upstream into the integration branch, and possibly a release branch if one was in progress.

This process change helped us keep development moving forward while ensuring that we were releasing changes that would not break production. But, it did introduce a significant amount of overhead. Managing all of the branches proved challenging. And, it was not a great use of QA’s time to spend 2-3 days regression testing the release branch when they had already tested each of the features individually before they were merged into the integration branch.

Automated Acceptance Testing

Manually testing for regressions is a crappy business to be in. But, at the time, there was no other way for us to make sure that what we were shipping would work. We knew that we had to get in front of this. So, we worked to identify the critical paths through the application…the minimum amount of functionality that we would want covered by automated tests in order to feel comfortable cutting out the manual regression testing step of the deployment process.

Once we had identified the critical paths through the application, we started writing Capybara tests to cover those paths. This step took a fair amount of time, because we had to do this while continuing to test new features and performing regression testing for new releases every two weeks. We also had to flesh out how we wanted to do integration tests, as integration testing was not a large part of our testing strategy at this point in time.

Eventually, we had enough tests in place, and passing, that we felt comfortable ditching the manual regression testing effort. Now, after QA had passed a feature, all we needed to see was a successful build in our continuous integration environment to deem the code ready for deployment.

Zero Downtime Deploys

We deploy the UrbanBound application to Heroku. Personally, I love Heroku as a deployment platform. It is a great solution for those applications that can work within the limitations of the platform. However, one thing that is annoying with Heroku is that, by default, your application becomes totally unresponsive while it reboots after a deploy. The amount of time it is down depends on the application, and how long it takes to boot. But, this window was large enough for us that we felt it would be disruptive to our users if we were deploying multiple times per day.

Thankfully, Heroku offers a rolling reboot feature called preboot. Instead of stopping the web dynos and then starting the new ones, preboot changes the order so that it first starts the new web dynos, and makes sure they have started successfully and are receiving traffic before shutting down the old dynos. This means that the application stays responsive during the deploy.

However, preboot adds a fair amount of complexity to the deployment process. With preboot, you will have the old version of the application running side-by-side with the new version of the application, worker dynos, and the newly migrated database, for at least a few minutes. If any of your changes are not backwards compatible with the older version of the application (a deleted or renamed column in the database, for example), the old version of the application will begin experiencing problems during the deploy. There are also a few potential gotchas with some of the add-ons.

In our case, the backwards compatibility issue can be worked around fairly easily. When we have changes that are not backwards compatible, we simply deploy these changes off hours with the preboot feature disabled. The challenge then becomes recognizing when this is necessary (when there are backwards incompatible changes going out). We place the responsibility for identifying this on the author of the change and the person who performs the code review. Both of these people should be familiar enough with the change to know if it will be backwards compatible with the version of the application currently running in production.

The End Result

With the automated acceptance testing and zero downtime deploys in place, we were finally ready to move to a true “deploy when ready” process. Today, we deploy several times a day, all without the application missing a step. No more big integration efforts, or massive releases. We keep the deploys small, because doing so makes it much easier to diagnose problems when they happen. This deployment process also allows us to be much more responsive to the needs of the business. In the past, it could be up to two weeks before a minor change made it into production. Today, we can deploy that change as soon as it is done, and that’s the way it should be.

Feature specs are great for making sure that a web application works from end to end. But feature specs are code, and like all other code, the specs must be readable and maintainable if you are to get a return on your investment. Those who have worked with large Capybara test suites know that they can quickly become a stew of CSS selectors. It can make the tests difficult to read and maintain if CSS selectors are sprinkled around through the suite. This is even more problematic if the name of the CSS selector doesn’t accurately reflect what the element actually is or does.

Introducing Page Objects

A page object wraps an HTML page, or fragment, with an application-specific API, allowing you to manipulate page elements without digging around in the HTML.

Page Objects are great for describing the page structure and elements in one place, a class, which can then be used by your tests to interact with the page. This makes your tests easier to read and understand, which in turn makes them easier to maintain. If used correctly, changing the id or class of an element that is used heavily in your feature specs should only require updating the page object, and nothing else.

Let’s look at an example. Here is a basic test that uses the Capybara DSL to interact with a page:

Let’s take a look at that same spec rewritten to use SitePrism, an outstanding implementation of Page Objects in Ruby:

In the SitePrism version of the spec, the majority of the CSS selectors have moved from the test into the CompanyOfficePage class. This not only makes the selectors easier to change, but it also makes the test easier to read now that the spec is simply interacting with an object. In addition, it makes it easier for other specs to work with this same page since they no longer need knowledge about the structure of the page or the CSS selectors on the page.
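The underlying idea can be sketched in plain Ruby, without SitePrism or a browser. Everything here is an assumption made for illustration: RecordingSession is a hypothetical stand-in for a browser session, the selectors are made up, and the add_office method is not part of SitePrism's API. The point is only that the selectors live in one class, and the "spec" talks to that class.

```ruby
# Hypothetical stand-in for a browser session; it records what it is asked to do.
class RecordingSession
  attr_reader :actions

  def initialize
    @actions = []
  end

  def fill_in(selector, with:)
    @actions << [:fill_in, selector, with]
  end

  def click(selector)
    @actions << [:click, selector]
  end
end

# A hand-rolled page object. The selectors below are invented for this sketch.
class CompanyOfficePage
  # The only place in the suite that knows these selectors.
  NAME_FIELD = "#office_name"
  SAVE_BUTTON = ".js-save-office"

  def initialize(session)
    @session = session
  end

  # An application-specific action, expressed without any CSS in the caller.
  def add_office(name)
    @session.fill_in(NAME_FIELD, with: name)
    @session.click(SAVE_BUTTON)
  end
end

session = RecordingSession.new
CompanyOfficePage.new(session).add_office("Chicago")
session.actions.each { |action| p action }
```

If the save button's class changes, only SAVE_BUTTON needs to be updated. Callers of add_office are untouched.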

While SitePrism’s ability to neatly abstract away most knowledge about CSS selectors from the spec is pretty handy, that is only the beginning.

Repeating Elements, Sections, and Repeating Sections

Page Objects really shine when you completely model the page elements that your spec needs to interact with. Not all pages are as simple as the one in the above example. Some have repeating elements, some have sections of related elements, and some have repeating sections of related elements. SitePrism contains functionality for modeling all of these. Defining the proper repeating elements, sections, and repeating sections in your page object allows you to interact with those elements and sections via SitePrism APIs that make your specs much easier to write, read, and understand.

Testing for Existence and Visibility

SitePrism provides a series of methods for each element defined in the page object. Amongst the most useful of these methods are easy ways to test for the existence and visibility of an element.

Using the Capybara DSL, the best way to check for the existence of an element is to expect that a CSS selector that uniquely identifies the element can be found on the page:

This becomes a bit easier with SitePrism, as we no longer have to use the CSS selector if our page object has an element defined:

Visibility checks are also simpler. Asserting that an element exists in the DOM, and is visible, can be a bit convoluted using the Capybara DSL:

Using SitePrism, this simply becomes:

The above code will wait until #blah is added to the DOM and becomes visible. If that does not happen in the time specified by Capybara.default_wait_time, then an exception will be raised and the test will fail.

Waiting for Elements

Using the above visibility check is a great way to ensure that it is safe to proceed with a portion of a test that requires an element be in the DOM and visible, like a test that would click on a given element. Waiting for an element to be not only in the DOM, but visible, is a great way to eliminate race conditions in tests where the click target is dynamically added to the DOM.

Using Capybara Implicit Waits

By default, SitePrism will not implicitly wait for an element to appear in the DOM when it is referenced in the test. This can lead to flaky tests. Because of this, we highly recommend telling SitePrism to use implicit waits in your project.

Summary

SitePrism is a fantastic tool. If you write feature specs, I highly recommend you check it out. It will make your specs easier to write and maintain.

In a previous post, I wrote about how the proper use of Capybara’s APIs can dramatically cut back on the number of flaky/slow tests in your test suite. But, there are several other things you can do to further reduce the number of flaky/slow tests, and also debug flaky tests when you encounter them.

Use have_field(“.some-element”, with: “X”) to check text field contents

Your test may need to ensure that a text field contains a particular value. Such an expectation can be written as:

This can be a flaky expectation, especially if the contents of #some-text-field are loaded via an AJAX request. The problem here is that this expectation will check the value of the text field as soon as it hits this line. If the AJAX request has not yet come back and populated the value of the field, this test will fail.

A better way to write this expectation would be to use have_field(“.some-element”, with: “X”):

This expectation will wait Capybara.default_wait_time for the text field to contain the specified content, giving time for any asynchronous responses to complete, and change the DOM accordingly.

Disable animations while running the tests

Animations can be a constant source of frustration for an automated test suite. Sometimes you can get around them with proper use of Capybara’s find functionality by waiting for the end state of the animation to appear, but sometimes they can continue to be a thorn in your side.

At UrbanBound, we disable all animations when running our automated test suite. We have found that disabling animations has stabilized our test suite and made our tests clearer, as we no longer have to write code to wait for animations to complete before proceeding with the test. It also speeds up the suite a bit, as the tests no longer need to wait for animations to complete.

Use has_no_X instead of !have_X

Testing to make sure that an element does not have a class, or that a page does not contain an element, is a common test to perform. Such a test will sometimes be implemented as:

Here, have_css will wait for the element to appear on the page. When it does not, and the expression returns false, it will be negated to true, allowing the expectation to pass. However, there is a big problem with the above code. have_css will wait Capybara.default_wait_time for the element to appear on the page. So, with the default settings, this expectation will take 2 whole seconds to run!

A better way to check for the non-existence of an element or a class is to use the has_no_X matcher:

Using to_not will also behave as expected, without waiting unnecessarily:
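The cost of negating a positive check can be seen in a small simulation. This is a sketch in plain Ruby, not Capybara's implementation: the "DOM" is just an array of selector strings, has_element? and has_no_element? are hypothetical helpers that poll it, and the timeout is shortened to 0.2 seconds to keep the example quick.

```ruby
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Polls until the block returns true (=> true) or the timeout passes (=> false).
def wait_for(timeout:, interval: 0.01)
  deadline = monotonic_now + timeout
  loop do
    return true if yield
    return false if monotonic_now >= deadline

    sleep interval
  end
end

# Positive check: waits for the element to appear.
def has_element?(dom, selector, timeout:)
  wait_for(timeout: timeout) { dom.include?(selector) }
end

# Negative check: waits for the element to be absent.
def has_no_element?(dom, selector, timeout:)
  wait_for(timeout: timeout) { !dom.include?(selector) }
end

# Runs the block and returns its result along with the elapsed seconds.
def timed
  started = monotonic_now
  result = yield
  [result, monotonic_now - started]
end

dom = ["#header"] # "#modal" is never on the page

negated, negated_seconds = timed { !has_element?(dom, "#modal", timeout: 0.2) }
direct, direct_seconds = timed { has_no_element?(dom, "#modal", timeout: 0.2) }

puts format("negated positive check: %s after %.2fs", negated, negated_seconds)
puts format("negative check:         %s after %.2fs", direct, direct_seconds)
```

Both checks arrive at the same answer. The negated positive check only gets there after exhausting the full timeout, while the negative check is satisfied on its first look at the page.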

No sleep for the fast feature test

Calls to sleep are often used to get around race conditions. But, they can considerably increase the amount of time it takes to run your suite. In almost all cases, the sleep can be replaced by waiting for some element or some content to exist on the page. Waiting for an element or some content using Capybara’s built in wait functionality is faster, because you only need to wait the amount of time it takes for that element/content to appear. With a sleep, your test will wait that full amount of time, regardless.

So, really scrutinize any use of sleep in a feature test. There are a very small number of cases, for one reason or another, where we have not been able to replace a call to sleep with something better. However, these cases are the exception, and not the rule. Most of the time, using Capybara’s wait functionality is a much better option.

Reproduce race conditions

Flaky tests are usually caused by race conditions. If you suspect a race condition, one way to reproduce the race condition is to slow down the server response time.

We used the following filter in our Rails application’s ApplicationController to slow all requests down by .25 seconds:

Intentionally slowing down all requests by .25 seconds flushed out a number of race conditions in our test suite, which we were then able to reliably reproduce, and fix.
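A similar delay can also be introduced outside the controller. The sketch below is a Rack-style middleware written in plain Ruby. It is an alternative to the controller filter described above, not the code we used, and the SlowDown class name and its delay parameter are hypothetical. The demonstration uses a 0.05 second delay and a stub app so that it runs quickly on its own.

```ruby
# Rack-style middleware: any object that responds to call(env) and wraps another app.
class SlowDown
  def initialize(app, delay: 0.25)
    @app = app
    @delay = delay
  end

  # Pause before handing the request to the wrapped app.
  def call(env)
    sleep @delay
    @app.call(env)
  end
end

# Stub app standing in for the real application.
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
slowed = SlowDown.new(app, delay: 0.05)

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
status, _headers, body = slowed.call({})
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("status %d, body %s, took %.2fs", status, body.inspect, elapsed)
```

In a Rails application, a middleware like this could be registered with config.middleware.use in the test environment only, so that production requests are never slowed down.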

Capture screenshots on failure

A picture is worth a thousand words, especially when you have no friggin idea why your test is failing. We use the capybara-screenshot gem to automatically capture screenshots when a Capybara test fails. This is especially useful when running on CI, when we don’t have an easy way to actually watch the test run. The screenshots will often provide clues as to why the test is failing, and at a minimum, give us some ideas as to what might be happening.

Write fewer, less granular tests

When writing unit tests, it is considered best practice to make the tests as small and as granular as possible. It makes the tests much easier to understand if each test only tests a specific condition. That way, if the test fails, there is little doubt as to why it failed.

In a perfect world, this would be the case for feature tests too. However, feature tests are incredibly expensive (slow) to setup and run. Because of this, we will frequently test many different conditions in the same feature test. This allows us to do the expensive stuff, like loading the page, only once. Once that page is loaded, we’ll perform as many tests as we can. This approach lets us increase the number of tests we perform, without dramatically blowing out the run time of our test suite.

Sharing is caring

Have any tips or tricks you’d like to share? We’d love to hear them in the comments!

A good suite of reliable feature/acceptance tests is a very valuable thing to have. It can also be incredibly difficult to create. Test suites that are driven by tools like Selenium or Poltergeist are usually known for being slow and flaky. And, flaky/erratic tests can cause a team to lose confidence in their test suite, and question the value of the specs as a whole. However, much of this slowness and flakiness is due to test authors not making use of the proper Capybara APIs in their tests, or overusing calls to sleep to get around race conditions.

The Common Problem

In most cases flaky tests are caused by race conditions, when the test expects an element or some content to appear on the page, but that element or content has not yet been added to the DOM. This problem is very common in applications that use JavaScript on the front end to manipulate the page by sending an AJAX request to the server, and changing the DOM based on the response it receives. The time that it takes to respond to a request and process the response can vary. Unless you write your tests to account for this variability, you could end up with a race condition. If the response just happens to come back quick enough and there is time to manipulate the DOM, then your test will pass. But, should the response come a little later, or the rendering take a little longer, your test could end up failing.

Take the following code for example, which clicks a link with the id "foo", and checks to make sure that the message "Loaded successfully" displays in the proper spot on the page.

There are a few potential problems here. Let’s talk about them below.

Capybara’s Finders, Matchers, and Actions

Capybara provides several tools for working with asynchronous requests.

Finders

Capybara provides a number of finder methods that can be used to find elements on a page. These finder methods will wait up to the amount of time specified in Capybara.default_wait_time (defaults to 2 seconds) for the element to appear on the page before raising an error that the element could not be found. This functionality provides a buffer, giving time for the AJAX request to complete and for the response to be processed before proceeding with the test, and helps eliminate race conditions if used properly. It will also only wait the amount of time it needs to, proceeding with the test as soon as the element has been found.
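The waiting behavior described above can be sketched as a simple polling loop. This is a simulation in plain Ruby, not Capybara's implementation: find_element and ElementNotFound are hypothetical names, the "DOM" is an array of selector strings, and a background thread stands in for an AJAX response that adds .message to the page 50 milliseconds after the click.

```ruby
class ElementNotFound < StandardError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Polls the "DOM" until the selector shows up; raises once max_wait has passed.
def find_element(dom, selector, max_wait: 2, interval: 0.01)
  deadline = monotonic_now + max_wait
  loop do
    return selector if dom.include?(selector)
    raise ElementNotFound, "no element matching #{selector}" if monotonic_now >= deadline

    sleep interval
  end
end

dom = []

# Simulates an AJAX response adding .message to the page shortly after a click.
ajax = Thread.new do
  sleep 0.05
  dom << ".message"
end

started = monotonic_now
found = find_element(dom, ".message")
elapsed = monotonic_now - started
ajax.join

# An element that never appears causes an error once the wait time runs out.
missing_error = begin
  find_element(dom, ".never-rendered", max_wait: 0.1)
  nil
rescue ElementNotFound => e
  e
end

puts format("found %s after %.2fs", found, elapsed)
puts "missing element: #{missing_error.message}"
```

The successful lookup returns shortly after the element appears, well before the 2 second limit, while the lookup for the missing element fails only after its full wait time has been used up.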

In the example above, it should be noted that Capybara’s first API will not wait for .message to appear in the DOM. So if it isn’t already there, the test will fail. Using find addresses this issue.

The test will now wait for an element with the class .message to appear on the page before checking to see if it contains "Loaded successfully". But, what if .message already exists on the page? It is still possible that this test will fail because it is not giving enough time for the value of .message to be updated. This is where the matchers come in.

Matchers

Capybara provides a series of Test::Unit / Minitest matchers, along with a corresponding set of RSpec matchers, to simplify writing test assertions. However, these matchers are more than syntactical sugar. They have built in wait functionality. For example, if has_text does not find the specified text on the page, it will wait up to Capybara.default_wait_time for it to appear before failing the test. This makes them incredibly useful for testing asynchronous behavior. Using matchers will dramatically cut back on the number of race conditions you will have to deal with.

Looking at the example above, we can see that the test is simply checking to see if the value of the element with the class .message equals "Loaded successfully". But, the test will perform this check right away. This causes a race condition, because the app may not have had time to receive the response and update the DOM by the time the assertion is run. A much better assertion would be:

This assertion will wait Capybara.default_wait_time for the message text to equal "Loaded successfully", giving our app time to process the request, and respond.

Actions

The final item we’ll look at is Capybara’s Actions. Actions provide a much nicer way to interact with elements on the page. They also take into account a few different edge cases that you could run into for some of the different input types. But in general, they provide a shortened way of interacting with the page elements, as the action will take care of performing the find.

Looking at the example above, we can re-write the test as such:

click_link will not just look for something on the page with the id of #foo, it will restrict its search to a link. It will perform a find, and then call click on the element that find returns.

Summary

If you write feature/acceptance tests using Capybara, then you should spend some time getting familiar with Capybara’s Finders, Matchers, and Actions. Learning how to use these APIs effectively will help you steer clear of flaky tests, saving you a whole lot of time and aggravation.