Why Use Seed Data?

To give a set of realistic data to develop against, including ad hoc testing and user interface work.

To speed up automated test runs by pre-populating some data rather than continually recreating it.

A commonly used approach is to work with copies of production data, often anonymised to at least some extent. However, this is extremely risky because it is very easy to leak PII (personally identifiable information), particularly if laptops are taken home. Do not keep copies of production data on your development machine, even anonymised ones!

Instead of using production-based data, seed data should be generated to give a small but sufficient data set for use in development and test scenarios. Using the same data for both allows developers to become familiar with the available data.

Generating Seed Data

The recommended approach for generating seed data is to use FactoryGirl, which has many powerful methods for creating Active Record objects, and to call these factories from the db:seed rake task. This means that the data is generated via the models, which reduces the chance of it getting stale or out of sync.

A basic db/seeds.rb file for a service with users might look something like this:

    FactoryGirl.find_definitions
    FactoryGirl.create_list :user, 10

When creating seed data it’s important not to use random data, as the data must be the same every time. Gems like Faker can be great for runtime test data, but you shouldn’t use them here. FactoryGirl’s sequences let you generate unique but consistent test data.
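To see why a sequence gives repeatable seed data, the sketch below mimics the idea in plain Ruby rather than using FactoryGirl itself. The `SeedSequence` class, the `seeded_emails` helper and the e-mail format are hypothetical, introduced only for illustration: a counter that starts from 1 on every run produces the same values each time, which a random generator would not.

```ruby
# Hypothetical stand-in for a factory sequence: a counter that always
# starts at 1, so every run of the seed script yields the same values.
class SeedSequence
  def initialize(&formatter)
    @formatter = formatter
    @counter = 0
  end

  # Returns the next value, built from the incremented counter.
  def next_value
    @counter += 1
    @formatter.call(@counter)
  end
end

# Builds n e-mail addresses from a fresh sequence (the format is an assumption).
def seeded_emails(n)
  emails = SeedSequence.new { |i| "user#{i}@example.com" }
  Array.new(n) { emails.next_value }
end

first_run  = seeded_emails(3)
second_run = seeded_emails(3)

puts first_run.inspect
puts first_run == second_run                    # the two runs are identical
puts first_run.uniq.length == first_run.length  # and every value is unique
```

Because the values depend only on the counter, anyone who runs the seed task ends up with the same records, which is what lets developers become familiar with the data.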

Considerations

How Much Data?

How much seed data you should generate is a balance between necessity and speed. The less data you generate the faster your builds will be, but if you don’t have enough then you won’t be able to test things like paging. The correct answer then is as little as possible, but not too little.

As an example, you’ll need enough restaurants to fill at least one, but perhaps two or three, listing pages in a single zone. You’ll also want a small number of restaurants in a handful of other cities and countries so that you can test localisation and any location-specific variants of the code.
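As a rough way to size a seed set for paging, the arithmetic can be written down directly. This sketch assumes a hypothetical page size of 20 listings; the helper name and the numbers are illustrative and not taken from any real service.

```ruby
# Smallest number of records needed for a listing to reach page `pages`,
# given `per_page` records on each page.
def minimum_records_for_pages(pages, per_page)
  raise ArgumentError, "pages must be at least 1" if pages < 1
  raise ArgumentError, "per_page must be at least 1" if per_page < 1

  (pages - 1) * per_page + 1
end

PER_PAGE = 20 # assumed page size

puts minimum_records_for_pages(1, PER_PAGE) # 1 record: a single, partly filled page
puts minimum_records_for_pages(2, PER_PAGE) # 21 records: just enough for a second page
puts minimum_records_for_pages(3, PER_PAGE) # 41 records: just enough for a third page
```

Seeding only a few records beyond such a threshold keeps builds fast while still exercising the paging code.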

Performance Testing

Because your local dataset won’t be nearly the size of the production one, it is very difficult to test performance on the local system. However, if you stick to good engineering practices (particularly around Active Record) then you should rarely have a problem.

If the code does have performance problems, they should be found on a staging or pre-production environment, which should be using a larger anonymised dataset.

Migration Testing

Testing migrations can also be tricky, as the data in real databases sometimes isn’t as clean as the idealised seed data on your machine. These problems will surface when the migrations are run in a staging or pre-production environment, and the problematic data can be explored there.

Keeping migrations atomic or idempotent ensures that, if they hit problems on the full dataset, they can be re-run once the problem is resolved.
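Idempotence is easiest to see with a small data fix-up. The sketch below is plain Ruby, not an Active Record migration: the row hashes and the `normalise_emails!` helper are hypothetical. The point is that the step only touches rows that still need changing, so running it again after a failure leaves already-fixed rows alone.

```ruby
# Hypothetical rows standing in for a table with messy real-world data.
rows = [
  { id: 1, email: "Alice@Example.com " },
  { id: 2, email: "bob@example.com" },
  { id: 3, email: nil }
]

# Lower-cases and strips e-mail addresses. Rows that are already clean
# (or have no address) are skipped, so repeated runs change nothing more.
# Returns the number of rows changed in this run.
def normalise_emails!(rows)
  changed = 0
  rows.each do |row|
    email = row[:email]
    next if email.nil?

    cleaned = email.strip.downcase
    next if cleaned == email

    row[:email] = cleaned
    changed += 1
  end
  changed
end

puts normalise_emails!(rows) # first run fixes the one messy row
puts normalise_emails!(rows) # second run finds nothing left to do
```

A real migration would apply the same principle to its schema and data changes, checking the current state before altering it.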

Creating Factories

Return valid records. Every Time!

When creating factories, set the minimum required columns to return a valid record.

If FactoryGirl.create(:user) raises ActiveRecord::RecordInvalid: Validation failed: Surname can't be blank, add a reasonable default for the surname field.

    # Good because FactoryGirl.create(:user) returns a valid record.
    FactoryGirl.define do
      factory :user do
        name "John"
        surname "Lennon"
      end
    end

If calling FactoryGirl.create(:user) multiple times raises ActiveRecord::RecordInvalid: Validation failed: Name 'John' is taken, use a sequence for the name field (or use Faker) so that every record has a unique name.

    # Good because 2.times { FactoryGirl.create(:user) } doesn't raise an exception.
    FactoryGirl.define do
      factory :user do
        # The block receives the sequence counter as n.
        sequence(:name) { |n| "John the #{n}" }
      end
    end

The same applies to associated records: FactoryGirl.create(:user) shouldn’t rely on existing data but should instead create a company record, using the company factory, if necessary.

Return unique records.

Make sure that calling FactoryGirl.create multiple times will return unique records.

Example

    # Bad - every created user will have the same id, email address etc.
    FactoryGirl.define do
      factory :user do
        initialize_with { User.find_by_email("bob@example.com") }
      end
    end

Not only does this factory rely on a user having been created beforehand, but every time we call FactoryGirl.create(:user) we also get the same user, which is not the expected behaviour for a factory.

Please make sure that calling create on a factory always returns a unique record.