The Conversation launched over six years ago and our continuous integration infrastructure has evolved over that time.

For the first few years we happily ran Jenkins on a single physical server. Our feature set grew rapidly and by early 2013 the build time for our core CMS code base peaked at an unhelpful 28 minutes. We explored options for speeding up our feedback loop and eventually ended up with a 6-minute parallel build running on three physical build servers, coordinated by Buildkite.

It feels like no development team is ever happy with their build time, but we found 6 minutes a pragmatic average that avoided most long waits for a result.

The challenge from there was avoiding the inevitable build time increase as new tests were added over time, and that required a way to monitor the long term trends.

We use Librato for monitoring and it felt like a good fit for this situation too. Buildkite provides webhooks, and we have a Lita-powered Slack bot that is well suited to hosting glue code like this.
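
The glue code itself is small. As a rough sketch (in Python for brevity, not the Lita handler we actually run), the core step is turning a "build finished" webhook payload into a duration that can be submitted as a metric. The payload shape, field names, timestamp format and metric name below are assumptions for illustration; check the Buildkite webhook documentation for the real structure.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    # Assumes ISO 8601 timestamps with a trailing "Z" for UTC.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def build_measurement(payload: dict) -> tuple:
    """Return (metric_name, duration_in_seconds) for a finished build."""
    build = payload["build"]  # assumed field name
    started = parse_timestamp(build["started_at"])
    finished = parse_timestamp(build["finished_at"])
    # Hypothetical metric naming scheme: one metric per pipeline.
    metric = "ci.build_time." + payload["pipeline"]["slug"]
    return metric, (finished - started).total_seconds()


# A made-up payload for a build that took 6 minutes 12 seconds.
example_payload = {
    "event": "build.finished",
    "build": {
        "started_at": "2017-05-01T10:00:00Z",
        "finished_at": "2017-05-01T10:06:12Z",
    },
    "pipeline": {"slug": "cms"},
}

metric_name, seconds = build_measurement(example_payload)
```

The resulting name and value are what would then be posted to the monitoring service as a single data point per build.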

The solution we settled on provides a Librato dashboard displaying the four-week build time trend for each of our codebases. Each chart includes the raw data, which fluctuates a bit, plus a smoothed line based on the average of the last ten builds.
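
To make the smoothing concrete, here is a minimal Python sketch of a trailing average over the last ten builds. It illustrates the idea only; it is not how Librato computes the line, and the function name and sample durations are made up.

```python
from collections import deque


def trailing_average(durations, window=10):
    """Average each build with up to `window - 1` builds before it."""
    recent = deque(maxlen=window)  # old values fall off automatically
    smoothed = []
    for duration in durations:
        recent.append(duration)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# Made-up build times in minutes; a window of 2 keeps the example short.
raw = [6, 6, 8]
trend = trailing_average(raw, window=2)
```

A single slow build barely moves the smoothed line, while a sustained slowdown pulls it up within a few builds, which is what makes the trend easier to read than the raw data.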

In May 2017 we upgraded our standard operating environment from Ubuntu 14.04 to Ubuntu 16.04. When the upgrade was applied to our build servers we inadvertently changed a PostgreSQL setting that added over a minute to the build. Within a few days the increase became obvious on the trend line, and we identified and then resolved the issue.

Total build time for our core CMS over a four week period. The period of slower builds was due to build server misconfiguration after an upgrade to the host operating system. CC BY-NC-ND

As an extra safety net we have Librato configured to alert us via Slack if the average build time passes a threshold.

When the average build time for our project exceeds a threshold, we’re alerted via Slack. CC BY-NC-ND
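
The alert rule is configured in the monitoring service rather than written by us, but the logic behind it can be sketched in a few lines of Python. This hypothetical version fires only when the average moves from at or below the threshold to above it, so a sustained slow period produces one alert rather than one per build. The function name, threshold and sample values are assumptions for illustration.

```python
def alert_points(averages, threshold):
    """Return the indexes where the average first rises above threshold."""
    alerts = []
    above = False  # whether the previous value was already over the limit
    for index, average in enumerate(averages):
        if average > threshold and not above:
            alerts.append(index)
        above = average > threshold
    return alerts


# Made-up averages in minutes, with a 7 minute threshold.
averages = [5.9, 6.2, 7.4, 7.6, 6.1, 7.2]
alerts = alert_points(averages, threshold=7.0)
```

Here the slow stretch at 7.4 and 7.6 triggers a single alert, and a second alert fires only after the average has recovered and then climbed again.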

We’ve had this setup in place for about a year. In that time, configuration or spec changes have significantly slowed the build on at least three occasions, and each change was reversed once discovered.

With a bit of luck, the bad old days of 28-minute builds will remain a bad memory.