
This is the seventh Testing Tuesday episode. Every week we will share our insights and opinions on the software testing space. Drop by every Tuesday to learn more! Last week we talked about the top 5 Cucumber best practices.

Generating and maintaining test data

When you write software tests, you usually need to get your application into a certain state by creating test data. This test data is the basis your tests run on. One way to create it is to write an SQL script. A better way is to write fixtures. But all static test data has a downside:

Maintainability

Generating test data with factory_girl

There are tools that make generating and maintaining test data easy. In this screencast I show you why using a test data generation tool makes sense. I introduce my favorite tool, factory_girl. It is written in Ruby, but there are libraries inspired by factory_girl for Python, PHP, Scala and JavaScript as well. You can find them below in the “Further readings” section.

In next week’s Testing Tuesday #8 we’ll talk about integration and unit testing and how to use them in behavior-driven development. We will meet our old friend Cucumber again and also introduce our new friend RSpec.

Transcript

Ahoi and welcome! My name is still Clemens Helm and you’re watching Codeship Testing Tuesday #7. As I promised you last week, today we’ll take a look at managing test data. By test data I mean data used in automated tests. Integration tests in particular usually require a specific set of test data to run on. By the way, what’s an integration test? Integration testing means that you test multiple components of your application together. For example, you test the user interface, the whole underlying web application and the database in one go. Unit tests, in contrast, test just single components like models or controllers. We will cover this difference in next week’s episode.

For now let’s focus on how to create test data for our tests. One possibility is to simply insert it into the database before running the tests using SQL like this:
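A minimal sketch of such a seed script, assuming a hypothetical users table with name and email columns (neither is named in the original). The SQL is held in a Ruby heredoc so it could be handed to whichever database client the project uses:

```ruby
# Hypothetical seed data: the users table and its columns are assumptions
# made for illustration only.
SEED_SQL = <<~SQL
  INSERT INTO users (name, email) VALUES ('Bob Dylan', 'bob@example.com');
  INSERT INTO users (name, email) VALUES ('Janis Joplin', 'janis@example.com');
SQL

# Print the statements; a real setup would send them to the database instead.
puts SEED_SQL
```

Statements like these tend to be tied to one database's SQL dialect and get hard to scan once there are many rows.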

A better approach is to describe the test data in a structured format that is independent of the database. These structured datasets are called test fixtures. They are more readable than SQL, and you can easily parse the data and insert it into the database. This way you also do not depend on a specific database type and can easily migrate your tests to something like MongoDB later on.
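As a rough sketch, fixtures could be kept as YAML and parsed with Ruby's standard library. YAML is an assumption here (the original does not name a format), and the record names and attributes are hypothetical:

```ruby
require "yaml"

# Hypothetical fixture data: the record names and attributes are assumptions.
FIXTURE_YAML = <<~YAML
  bob:
    name: Bob Dylan
    email: bob@example.com
  janis:
    name: Janis Joplin
    email: janis@example.com
YAML

# Parse the fixtures into plain Ruby hashes, independent of any database.
fixtures = YAML.safe_load(FIXTURE_YAML)

fixtures.each do |key, attributes|
  # A real setup would insert each record into the database here.
  puts "#{key}: #{attributes['name']} <#{attributes['email']}>"
end
```

Because the parsed result is just hashes, the same fixture file can feed a relational database today and a document store later.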

So what we can do now is insert all our test data into the database first and then run our tests on it. Right?

Wrong. Most of the time we modify test data during the tests. That means, the next test has to deal with modified test data. That’s not what we want. We want each test to run on fresh, unmodified data.

So we can simply re-generate all test data before each test. Right?

Well, you could do that, but then you generate the data for all tests before every single test. As your test suite grows linearly, the total time for setting up your test data grows quadratically, and that’s definitely not what you want.
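A rough back-of-the-envelope calculation makes the cost visible. It assumes, purely for illustration, that every test needs 5 records of its own; the method names are hypothetical:

```ruby
RECORDS_PER_TEST = 5 # assumed number of records each test needs

# Regenerating the data of every test before each test.
def inserts_regenerating_everything(test_count)
  all_records = test_count * RECORDS_PER_TEST
  test_count * all_records
end

# Creating only the records the current test needs.
def inserts_creating_relevant_data(test_count)
  test_count * RECORDS_PER_TEST
end

[10, 100, 1000].each do |test_count|
  puts "#{test_count} tests: " \
       "#{inserts_regenerating_everything(test_count)} vs " \
       "#{inserts_creating_relevant_data(test_count)} inserts"
end
```

Under these assumptions, ten times as many tests means a hundred times as many inserts in the first case, but only ten times as many in the second.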

Instead, we actually only want test data that’s relevant in a test. So in your tests you could do something like

bob = load_fixture("Bob")
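One way such a helper might work, as a minimal sketch: load_fixture is not a method of any particular library here, and the FIXTURES store with its contents is hypothetical. It returns a fresh copy on every call, so changes made in one test do not leak into the next:

```ruby
# Hypothetical fixture store; names and attributes are assumptions.
FIXTURES = {
  "Bob"   => { "name" => "Bob Dylan",    "email" => "bob@example.com" },
  "Janis" => { "name" => "Janis Joplin", "email" => "janis@example.com" }
}.freeze

# Return a deep copy of one fixture, so a test can modify it freely
# without affecting what the next test loads.
def load_fixture(name)
  Marshal.load(Marshal.dump(FIXTURES.fetch(name)))
end

bob = load_fixture("Bob")
bob["name"] = "Robert Zimmerman" # modify the copy inside a test

puts bob["name"]
puts load_fixture("Bob")["name"] # a later test still sees the original
```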

But then, why do we need fixtures anyway? We could just create test data in each test! Fixtures are much more maintainable. Let’s say you’ve got 200 tests using users and then you add a required attribute “secret_wish” to the user. Then you need to correct your test data in all 200 places. If you use a small number of fixtures everywhere, then you just need to correct the fixtures.

Unfortunately, as your project grows, you usually need a large number of fixtures. You may need old and young, female and male, dead and alive users and combinations thereof. Also maybe you have to define 20 attributes for each user so it is a valid record, but you only need one per test.

This way you will end up with a huge amount of – mostly duplicate – fixture data. This will of course lead to the same problem: Test data becomes hard to maintain.

There are a number of tools that solve this problem. My favorite one is factory_girl (https://github.com/thoughtbot/factory_girl). factory_girl is a fixture replacement written in Ruby, but there are also similar implementations in Python, PHP, Scala and JavaScript. Check out the further readings section to find out more.

In factory girl we can define fixtures as “factories” like this:

FactoryGirl.define do
  factory :user do
    name "Bob Dylan"
    secret_question "How many roads must a man walk down before he can call him a man?"
    secret_answer "Seven."
  end
end

You create a new user in the database like this:

user = FactoryGirl.create(:user)

But you can also decide to build a user in your application without saving it:

user = FactoryGirl.build(:user)

You can override the attributes defined in a factory when you build a user:

user = FactoryGirl.build(:user, name: "Janis Joplin")
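The override behaves much like merging a hash of custom values over a hash of defaults. A plain-Ruby sketch of that idea, which is not factory_girl's implementation (USER_DEFAULTS and build_user_attributes are hypothetical names):

```ruby
# Default attributes, standing in for what a factory would define.
USER_DEFAULTS = {
  name: "Bob Dylan",
  secret_answer: "Seven."
}.freeze

# Merge per-test overrides over the defaults; untouched defaults are kept.
def build_user_attributes(overrides = {})
  USER_DEFAULTS.merge(overrides)
end

default_user = build_user_attributes
janis = build_user_attributes(name: "Janis Joplin")

puts default_user[:name]
puts janis[:name]
puts janis[:secret_answer] # still the default
```

A test only spells out the attribute it cares about, and everything else comes from one central definition.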

We can also tell a factory to generate a different attribute every time. Let’s say we only allow users to sign up once per email address:

FactoryGirl.define do
  factory :user do
    name "Bob Dylan"
    secret_question "How many roads must a man walk down before he can call him a man?"
    secret_answer "Seven."
    sequence(:email) { |number| "user#{number}@example.com" }
  end
end

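Each user created from this factory gets the next number in the sequence, so the first two users receive these email addresses:

user1@example.com

user2@example.com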
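Conceptually, a sequence is little more than a counter that is passed to a block. The following plain-Ruby sketch illustrates the idea; the Sequence class and next_value are hypothetical names and not factory_girl's implementation:

```ruby
# A minimal counter-backed sequence.
class Sequence
  def initialize(&block)
    @number = 0
    @block = block
  end

  # Increment the counter and pass it to the block.
  def next_value
    @number += 1
    @block.call(@number)
  end
end

emails = Sequence.new { |number| "user#{number}@example.com" }

first = emails.next_value
second = emails.next_value

puts first
puts second
```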

One of the most beneficial features of factory girl is that factories can inherit from other factories. So one good strategy is to keep all required attributes in a factory and let all other factories inherit from it. So you just need to define required attributes once instead of for every single factory:
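As a rough illustration of the idea in plain Ruby (this is not factory_girl's DSL or implementation; FACTORIES and resolve_attributes are hypothetical names, as is the admin attribute), a child factory can be modeled as its parent's attributes plus a few additions or overrides:

```ruby
# Each hypothetical factory lists only its own attributes and, optionally,
# a parent to inherit from.
FACTORIES = {
  user:  { attributes: { name: "Bob Dylan", secret_answer: "Seven." } },
  admin: { parent: :user, attributes: { admin: true } },
  janis: { parent: :user, attributes: { name: "Janis Joplin" } }
}.freeze

# Resolve a factory by starting from its parent's attributes (if any)
# and merging its own on top.
def resolve_attributes(factory_name)
  factory = FACTORIES.fetch(factory_name)
  inherited = factory[:parent] ? resolve_attributes(factory[:parent]) : {}
  inherited.merge(factory[:attributes])
end

puts resolve_attributes(:admin).inspect
puts resolve_attributes(:janis).inspect
```

If a new required attribute is added later, it only has to be added to the base factory, and every factory that inherits from it picks it up.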
