Blog Posts -

Production bugs are painful and can severely impact a dev team’s velocity. My team at Outbrain has succeeded in implementing a work process that enables us to send new features to production free of bugs, a process that incorporates automated functions with team discipline.

Why should I even care?

Bugs happen all the time – and they will be found locally or in production. But the main difference between preventing and finding the bug in a pre-production environment is the cost: according to IBM’s research, fixing a bug in production can cost X5 times more than discovering it in pre-production environments (during the design, local development, or test phase).

Let’s describe one of the scenarios happen once a bug reaches production:

A customer finds the bug and alerts customer service.

The bug is logged by the production team.

The developer gets the description of the bug, opens the spec, and spends time reading it over.

The developer then will spend time recreating the bug.

The developer must then reacquaint him/herself with the code to debug it.

Next, the fix must undergo tests.

The fix is then built and deployed in other environments.

Finally, the fix goes through QA testing (requiring QA resources).

How to stop bugs from reaching production

To catch and fix bugs at the most time-and-cost efficient stage, we follow these steps, adhering to the several core principles:

Stage 1 – Local Environment and CI

Step 1: Design well. Keep it simple.

Create the design before coding: try to divide difficult problems into smaller parts/steps/modules that you can tackle one by one, thinking of objects with well-defined responsibilities. Share the plans with your teammates at design-review meetings. Good design is a key to reducing bugs and improving code quality.

Step 2: Start Coding

The code should be readable and simple. Design and development principles are your best friends. Use SOLID, DRY, YAGNI, KISS and Polymorphism to implement your code.
Unit tests are part of the development process. We use them to test individual code units and ensure that the unit is logically correct.
Unit tests are written and executed by developers. Most of the time we use JUnit as our testing framework.

Step 3: Use code analysis tools

To help ensure and maintain the quality of our code, we use several automated code-analysis tools:FindBugs – A static code analysis tool that detects possible bugs in Java programs, helping us to improve the correctness of our code.Checkstyle – Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard. It automates the process of checking Java code.

Step 4: Perform code reviews

We all know that code reviews are important. There are many best practices online (see 7 Ways to Up-Level Your Code Review Skills, Best Practices for Peer Code Review, and Effective Code Reviews), so let’s focus on the tools we use. All of our code commits are populated to ReviewBoard, and developers can review the committed code, see at any point in time the latest developments, and share input.
For the more crucial teams, we have a build that makes sure every commit has passed a code review – in the case that a review has not be done, the build will alert the team that there was an unreviewed change.
Regardless of whether you are performing a post-commit, a pull request, or a pre-commit review, you should always aim to check and review what’s being inserted into your codebase.

Step 5: CI

This is where all code is being integrated. We use TeamCity to enforce our code standards and correctness by running unit tests, FindBugs validations Checkstyle rules and other types of policies.

Stage 2 – Testing Environment

Step 1: Run integration tests

Check if the system as a whole work. Integration testing is also done by developers, but rather than testing individual components, it aims to test across components. A system consists of many separate components like code, database, web servers, etc.
Integration tests are able to spot issues like wiring of components, network access, database issues, etc. We use Jenkins and TeamCity to run CI tests.

Step 2: Run functional tests

Check that each feature is implemented correctly by comparing the results for a given input with the specification. Typically, this is not done at the development level.
Test cases are written based on the specification, and the actual results are compared with the expected results. We run functional tests using Selenium and Protractor for UI testing and Junit for API testing.

Stage 3 – Staging Environment

This environment is often referred to as a pre-production sandbox, a system testing area, or simply a staging area. Its purpose is to provide an environment that simulates your actual production environment as closely as possible so you can test your application in conjunction with other applications.
Move a small percentage of real production requests to the staging environment where QA tests the features.

Stage 4 – Production Environment

Step 1: Deploy gradually

Deployment is a process that delivers our code into production machines. If some errors occurred during deployment, our Continuous Delivery system will pause the deployment, preventing the problematic version to reach all the machines, and allow us to roll back quickly.

Step 2: Incorporate feature flags

All our new components are released with feature flags, which basically serve to control the full lifecycle of our features. Feature flags allow us to manage components and compartmentalize risk.

Step 3: Release gradually

There are two ways to make our release gradual:

We test new features on a small set of users before releasing to everyone.

Open the feature initially to, say, 10% of our customers, then 30%, then 50%, and then 100%.

Both methods allow us to monitor and track problematic scenarios in our systems.

Step 4: Monitor and Alerts

We use the ELK stack consisting of Elasticsearch, Logstash, and Kibana to manage our logs and events data.
For Time Series Data we use Prometheus as the metric storage and alerting engine.
Each developer can set up his own metrics and build grafana dashboards.
Setting the alerts is also part of the developer’s work and it is his responsibility to tune the threshold for triggering the PagerDuty alert.
PagerDuty is an automated call, texting, and email service, which escalates notifications between responsible parties to ensure the issues are addressed by the right people at the right time.

Tests are crucial in systems that rely on CI/CD as part of their release cycle. One of the challenges is to write stable tests that work for you without spending a lot of time on maintaining bad tests.

Tests are Hard

They’re hard to write, hard to maintain and it’s even harder to stabilize a flaky test. At Outbrain, we take special pride in our ability (for the most part) to deliver new features to production and doing so with the confidence that only reliable tests can give you. These tests play a crucial role in our ability to deliver fast, good and stable code making sure no regression bugs were introduced in the process. It is crucial then, to not only maintain good test suites (unit tests, integration, and e2e) but also to fix any test that misbehaves (flaky tests).

We have a special environment to facilitate integration and e2e tests called simulation environment (it is only one of the set of tools we have for that purpose). This is a dedicated set of servers which we use to simulate our production environment. We deploy every new version of our services to that environment before we deploy to production, and run tests that check new flows of code, regression, and interoperability to other services.

In order to write an effective test for a new feature, we sometimes need to set up the environment with entities that are required for the feature we’re testing. If, for example, our new feature is to register a car to an owner (a Person entity). Before running the tests we need the required entities, a Car and a Person in our database. We’re not trying to test a flow for creating a new car, or a new person in this scenario. Therefore there is no need in creating the car and/or the person entities explicitly in the test before the actual test scenario happens. And in order to make our tests as clear and succinct as possible — we don’t want to be creating this data explicitly in each and every test.

Bad Practices

So, it was a common practice (albeit a bad one) to have pre-existing data on which we would rely on to run tests (for the whole simulation environment!). This led to two big (interconnected) problems:

No test isolation – a test mistakenly deleting some or all of the pre-existing data, for example, would do so for all the tests that run in that environment

Flaky tests – tests running concurrently are creating, deleting and generally changing data that affects others, which in turn would fail tests for no good reason — which makes it really hard to analyze and fix a failing test

We’ve tackled this problem by creating the needed data before the tests in a test class and deleting it after the test run. Which mitigated the problem somewhat — not only the tests in the same class were interconnected but also added boilerplate to the test class. Now, a test class looked something like (assuming these are entities autogenerated by Scalike for the relevant tables):

Looking at this, we were presented with a challenge. First, the data is created for all the tests that run in a class, which must be deleted only after all tests have finished running — this means that the tests are not isolated one from another and potentially may become flaky. Second, we wanted an elegant way of creating and deleting the needed entities seamlessly in order to minimize the boilerplate for each test class.

Note: It is possible however, in Specs2, to make a better solution by using the ‘Scope’ trait like so:

It’s a good solution, for a simpler problem than we faced. We needed the tests running in a single transaction, with a supplied session and a configurable db name (indicating a set of Scalike connection parameters).

Enter Loan Pattern

We first encountered this pattern when using ScalaTest and quickly moved to using it also in Specs2 (as most of our tests are written in Specs2). From ScalaTest documentation for Sharing fixtures:

“A test fixture is composed of the objects and other artifacts (files, sockets, database connections, etc.) tests use to do their work. When multiple tests need to work with the same fixtures, it is important to try and avoid duplicating the fixture code across those tests.”
“If you need to both pass a fixture object into a test and perform cleanup at the end of the test, you’ll need to use the loan pattern”

Which means, we can use fixtures to set up ‘artifacts’ for the tests to use, promoting the DRY principle by minimizing code duplication. It is also a good way to reduce boilerplate when writing tests. So, we wrote this one simple trait:

Let’s go over what’s happening in this trait. We’re mixing in a custom trait called ‘DefaultGenerator’ which gives us the ‘DefaultObjects’ which are the entities we need to be pre-created for our tests to run. We have two private methods. One that calls ‘create’ on ‘DefaultObjects’ with a custom name to generate the needed entities. The other calls ‘cleanup’ on the test data to clean the environment after the test has finished running. And the star of this trait, the method (or fixture if you will) ‘withTestData’ which gets the test function as a parameter, calls the private method ‘createTestData’, calls the test and passing it the data we just generated and finally cleans up the generated data after the test finishes.

‘testData’ is the data generated in our ‘withTestData’ method (a car and a person in our case).

The Specs2 version of the Loan Pattern is a bit more complex, as we’ve added some more bells and whistles to make it easier for us to create those entities in our domain. We’re using Scalike to create the entities in MySQL database, and we need a somewhat more refined control over the session we’re using, DB name etc’.

It’s very similar to the ScalaTest flavor, but with several changes we needed to make to better facilitate our needs in the Specs2 tests. We have a mechanism to initialize a named DB connection, with a named connection pool and an explicit session. Besides these additions, it’s pretty similar to ScalaTest — generate the test data, run the test and clean the generated data.

Summary

We tackled several issues our team faced on a day to day basis, which made our simulation environment unstable, hard to maintain and generally very frustrating to work on. By extracting data generation and cleanup to an external trait and using a clever mechanism to reduce boilerplate, we managed to clean and simplify the test class, reduce code duplication and generally made our lives easier. Tests are still hard, but a bit easier to write and nicer to read. What do you think?