Getting ready to cast my vote in the great CI election

Last year we started using Swift for any new features that we were developing at Memrise. Swift has a lot of great language features; however, the tooling support still leaves a lot to be desired, and by introducing Swift into our existing Objective-C project we saw a significant increase in our build times. A quick search on Google brought up a lot of articles and talks about workarounds to reduce compilation time (an especially good talk is by Uber). We implemented a lot of the recommendations and did see an improvement; however, the compilation times were still longer than they had been for the Objective-C-only project.

Just to give you an idea of the size of the project, when this experiment began it consisted of:

The real bottleneck for us was our in-house CI, where build times jumped from about 10-12 minutes per build to ~20 minutes. This additional 8-10 minutes of build time was exacerbated by the iOS ecosystem, which is hostile to running builds in parallel on the same machine due to limitations around running multiple instances of the simulator (it looks like improvements in Xcode 9 actually solve this). Ideally in development we want our feedback loops to be as quick as possible, as this often results in code changes being cheaper to implement. So this increase in build time resulted in an actual cost increase in our development process. This was especially acute during our regression testing days, where there are often a number of very small bug fix PRs that require fast turnaround to ensure that we meet our submission targets.

With this increase in build time, and having exhausted both settings and code optimisations, we decided to throw additional processing power at it - we went from one Mac mini to 3 running Jenkins. This helped a lot by allowing us to run builds in parallel (on different machines) but also created additional work for both the iOS and DevOps teams in trying to maintain these 3 machines and keep them in sync. This often resulted in only 1-2 of those Jenkins machines actually running. The increased maintenance costs coupled with the increased build times meant that we decided to look at what other options we had for CI.

In this post, I want to detail the steps that we took when looking at these different CI solutions, how we assessed those CI solutions against each other and finally which CI solution we ended up settling on.

Is the grass greener?

Before we could look for alternatives, we needed to come up with a list of what a managed CI solution needed to support:

Mirror development environment

Good support and documentation

Support for concurrent builds

Environment configuration per branch

Performant

During our search, support for Mirror development environment was actually the biggest constraint. A number of promising CI solutions that we found didn't include a macOS option. In the end we were left with 2 credible CI solutions that we felt covered all of my requirements:

Travis CI

CircleCI

And of course our existing CI solution:

Jenkins

Getting to know the candidates 🕵️‍♀️

Travis CI

Travis CI is a cloud-based system that is administered as part of a SaaS package. Configuration is controlled via a YAML file in your project/repo with additional settings available in the Travis CI web interface for that repo. Frequent readers (thanks for coming back 😉) of this blog will know that I actually already use Travis CI on most of my open source projects. I find it to be a reliable system that works well with the macOS/iOS ecosystem and is widely supported, i.e. lots of questions and answers on Stack Overflow. To say that I'm a fan of the support that Travis CI offers the open source community would be an understatement - I think they do a fantastic job that really promotes good practice and helps to drive up code quality.

However, I didn't know how good Travis CI was with private projects and at satisfying all of my requirements listed above. One interesting side note is that Travis CI offers two landing pages, with .org being for open source projects and .com being for private projects.

CircleCI

CircleCI is very similar to Travis CI in that it's also a cloud-based solution that is administered as part of a SaaS package. Configuration is also controlled via a YAML file in your project/repo with additional settings available in the CircleCI web interface for that repo. I was aware of CircleCI mainly via its support (advertisements) of various email aggregators such as iOS Dev Weekly but hadn't had any actual experience of integrating with it. Also, as CircleCI doesn't have a free open source tier, the number of questions/answers, posts, etc. that exist on the internet was significantly smaller when compared to Travis CI.

Jenkins

Jenkins is the elder statesman of CI solutions, having made its first appearance as Hudson in 2005 before donning its current guise of Jenkins in 2011. Jenkins is a self-contained Java app that has almost endless configuration options via its wide support of plugins. I've personally used Jenkins as a CI solution with various companies since 2011 and have found it to be a reliable CI solution - even with its slightly dated UI.

Initial predictions on the outcome 🤔

For a team of our size I was looking at the Small Business/Growth plans, which cost $249 (at time of experiment).

With Jenkins already being up and running, the costs were related to the necessary support required from the DevOps team to maintain and extend. As such these costs were trickier to calculate/understand.

Due to the breadth of posts, questions, unlimited build minutes, examples of open source projects using it and because of my own personal prior experience - Travis CI started as my favourite.

But do keep reading to see if it finished in that position 😋.

On the campaign trail

It's important to note that during testing, Jenkins would still be running and acting as our primary source of CI information. An important difference between Jenkins and Travis CI/CircleCI is that Jenkins is based on the idea of jobs, where different jobs can potentially point at the same branch but have different configurations, whereas Travis CI/CircleCI is based on having one super job, i.e. the repo itself, and using individual branches to adjust the configuration of how the project is built.

In order to support this different approach we had to add a number of virtual branches that only existed as code merge branches to allow for these different configuration options. Individual developers won't be merging to or branching off of them. So for Jenkins we had the following branches:

feature_branch - development branches used for developing one feature or fixing one bug.

develop - main working branch that each feature_branch is merged into or branched out of.

release - regression testing branch, only bug fixes are merged into or branched out of.

master - approved app versions.

And for Travis CI/CircleCI we added the following virtual branches:

alpha - used to trigger our nightly/alpha builds.

beta - used to trigger our TestFlight builds.

production - used to trigger our App Store submission builds.

I detail the branches above so that when we come to see the implementation details of the fastfile files, the structure and different tasks will make more sense. As the project already supported Fastlane we decided to create individual fastfile files for each CI to speed up experimentation and ensure that one solution didn't interfere with the other.

(When the individual CIs came to run, the first step would be to rename the relevant fastfile so that Fastlane would use it)
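The renaming step can be pictured with a few lines of plain Ruby. This is only a sketch under assumptions: the per-CI file names (such as `Fastfile.travis`) and the `activate_fastfile` helper are hypothetical, and the only fixed point is that Fastlane looks for a file called `Fastfile` inside the `fastlane` directory.

```ruby
require "fileutils"

# Hypothetical layout: one fastfile per CI service, e.g. fastlane/Fastfile.travis.
# Fastlane only reads fastlane/Fastfile, so the chosen file is renamed into place.
def activate_fastfile(ci_name, fastlane_dir: "fastlane")
  source = File.join(fastlane_dir, "Fastfile.#{ci_name}")
  target = File.join(fastlane_dir, "Fastfile")
  raise ArgumentError, "no fastfile for #{ci_name}" unless File.exist?(source)

  FileUtils.mv(source, target)
  target
end
```

On a throwaway CI machine the rename is harmless because the checkout is discarded after the build; locally you would want to copy rather than move.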

Travis CI

The first thing to understand is that in order to use Travis CI you need to create a configuration file, .travis.yml - it provides a jumping-off point for running a build on Travis CI.

(If you create the file and then wonder where it went in Finder - it's hidden. Files whose names start with . don't show up by default; you can change this setting in Finder or open the file from the terminal.)

Quite a short config file. Split into broad sections, this file configures the machine/image, builds the project and handles what to do with what's been built. Let's delve deeper into each of these sections:

While Travis CI builds Swift-based projects, we can't use swift as the language.

Next, to try to speed up building, we attempt to cache the output of bundler and cocoapods. If you haven't used bundler before - it allows you to specify a working environment that can be shared among your development team. As we will see later on, it does this with one command and a config file.

cache:
- bundler
- cocoapods

Next we get down to specifying the image of the machine that we want to use:

osx_image: xcode8.2

(At the time of the experiment Xcode 8.2 was the most up-to-date version).

Here is where we can begin to see the power of a managed CI solution, in that we can "spin up" a fresh, pre-configured machine with Xcode 8.2 already installed (among other things) with only one statement. This means that when we came to migrate to a new version of Xcode (or any of the tools we used), we could change this value on a branch and test it out. If we discovered a compatibility issue with this change that couldn't be overcome, rolling back became a case of editing the YAML back - no need to uninstall and reinstall anything.

The first two lines are responsible for renaming the fastfile to what Fastlane is expecting. Sadly we had some unit tests with hardcoded timezone expectations, so rather than actually fixing them (🤷) I changed the timezone on the machine to be what the tests expect. The final line installs the tools the project requires; these tools/gems are specified in a Gemfile which for the sake of completeness is:

Compared to the configure section, the build section can initially seem underwhelming:

script:
- bundle exec fastlane build

but this is because all the "magic" is happening in the fastfile.

I won't go into the actual details of the fastfile as I think this is outside of the scope of this post, other than to note that, like most CI environments, Travis CI comes with some preconfigured environment variables that you can use to alter the executed path for your branch, so the build lane consisted of:

As you can see, there are multiple private, more focused lanes that are called from this lane. I could have included this logic in the .travis.yml and directly called the appropriate lane but instead I chose to have the fastfile contain this logic. I wanted to keep the .travis.yml file focused on defining the build environment rather than concerned with the details of actually building.

The above should seem familiar as it uses the same structure as the build lane with multiple private lanes being called depending on the branch that is being executed.
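To make the branch-based dispatch concrete, here is a minimal sketch in plain Ruby rather than the Fastlane DSL. It is not the project's fastfile: the lane names are hypothetical, and the mapping simply follows the virtual branches listed earlier. `TRAVIS_BRANCH` and `TRAVIS_PULL_REQUEST` are environment variables that Travis CI sets for every build.

```ruby
# Picks a (hypothetical) lane name from Travis CI's environment variables.
# `env` can be ENV itself or any Hash with the same string keys.
def lane_for(env)
  # TRAVIS_PULL_REQUEST holds the PR number, or "false" for non-PR builds.
  return :pull_request unless env.fetch("TRAVIS_PULL_REQUEST", "false") == "false"

  case env["TRAVIS_BRANCH"]
  when "alpha" then :nightly
  when "beta" then :testflight
  when "production" then :app_store
  else :feature
  end
end

puts lane_for({ "TRAVIS_BRANCH" => "beta", "TRAVIS_PULL_REQUEST" => "false" })
```

Checking for a pull request first matters because on PR builds `TRAVIS_BRANCH` holds the target branch, not the feature branch.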

And that's pretty much it for the v1 of our Travis CI configuration. It configures a machine, tests, builds and finally sends an IPA to either HockeyApp/AppStore/TestFlight. Spurred on by my success in getting everything up and running, I decided to dig deeper into the Travis CI documentation (remember one of the requirements of a CI solution was: Good support and documentation) and discovered that Travis CI supported pipelining in the build process with the ability to run multiple steps in parallel.

Travis CI - take 2

Travis CI allows for multiple parallel steps in its build process via a Build Matrix. Using Build Matrix, I was able to run both the build and testing steps in parallel. The updated .travis.yml consisted of:

This is where I describe the parallel steps, build and test, which results in the script section being called twice (once for each MODE value). The MODE value of each execution is then passed into the build lane, which runs either the internal build or test lanes for that branch. With this small change I was able to achieve better build times (see the conclusion for build times).
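As a rough illustration of how a MODE value could steer the build lane, here is a plain Ruby sketch. The two MODE values come from the description above; the lane names and the `lane_for_mode` helper are hypothetical.

```ruby
# Each entry mirrors one row of the build matrix: same script, different MODE.
MODES = %w[build test].freeze

# Maps a MODE value to a (hypothetical) private lane name.
def lane_for_mode(mode)
  case mode
  when "build" then :build_ipa
  when "test" then :run_unit_tests
  else raise ArgumentError, "unknown MODE: #{mode.inspect}"
  end
end

# On the CI each MODE runs on its own executor; here the rows are just listed.
MODES.each do |mode|
  puts "MODE=#{mode} -> #{lane_for_mode(mode)}"
end
```

Failing loudly on an unknown MODE is a deliberate choice in this sketch: a typo in the matrix then breaks the build immediately instead of silently skipping a step.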

CircleCI

Like Travis CI, CircleCI has its own configuration file, circle.yml - this also provides the jumping-off point for running a build on CircleCI.

I found that on CircleCI I experienced an increase in Status 65 errors when running unit tests. After having trawled through the CircleCI forum it looked like this was caused by the simulator not launching quickly enough for Xcode to use. The above is an attempt to solve this by launching the simulator before it's needed - this actually saved my experimentation with CircleCI as without this fix, the quantity of Status 65 errors would have been too great.

Again this is very similar to what we can already see in the Travis CI config file. Here we perform the filename dance, then run the tests and finally build the actual project. Please also notice that in CircleCI you can specify a distinct test section; I disabled it and instead chose to run my unit tests as part of the compile/build step. This was useful as it allowed for exiting early when we had failures in our unit tests. With iOS, in order to run unit tests you need to build for that task - you can't take a pre-built project and run against it. This means that to both produce an IPA and run unit tests you need to build twice. Having an IPA produced first, only to then discover that you had a failed test or encountered a Status 65 error, resulted in us having to throw away the IPA, so building that IPA was a waste of time.

Let's have a quick peek into the fastfile to see what the lanes consist of:
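The value of that ordering can be shown with a small fail-fast runner in plain Ruby. This is a sketch under assumptions, not the CircleCI configuration: the step names are hypothetical and the lambdas stand in for the real test and build commands.

```ruby
# Runs named steps in order and stops at the first one that fails.
# Each step is a callable returning true (success) or false (failure).
def run_pipeline(steps)
  completed = []
  steps.each do |name, step|
    return { ok: false, failed: name, completed: completed } unless step.call
    completed << name
  end
  { ok: true, failed: nil, completed: completed }
end

# Stand-ins for the real work: the tests fail, so the IPA build never starts.
steps = {
  unit_tests: -> { false },
  build_ipa: -> { true }
}
result = run_pipeline(steps)
puts "failed step: #{result[:failed]}, completed: #{result[:completed].inspect}"
```

With the order reversed, the same failure would only surface after the IPA had been built, which is the wasted work described above.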

Slightly different from the build lane found in the Travis CI fastfile, in that run_tests isn't called from the build lane. This is to allow for the simulator version to be passed directly into the test lane (so that the same simulator that was manually started is the same that is actually used in the tests) - I could have passed the simulator version into the build lane and made the config file simpler by removing that step but I felt the above solution was cleaner as it would have resulted in passing information that only one branch (if ENV["CI_PULL_REQUEST"]) actually used.

The above should seem familiar as it uses a similar structure to the build lane with multiple private lanes being called depending on the branch that is being executed.

Well, that's pretty much it for our CircleCI configuration. It configures a machine, tests, builds and finally sends an IPA to either HockeyApp/AppStore/TestFlight. Just like with Travis CI, having got my initial configuration working I looked through the documentation for ways to improve the performance of my build, but sadly, unlike Travis CI, CircleCI does not support parallel build steps on their macOS images so I had to settle for a serial build pipeline.

Jenkins

With Jenkins you don't have a YAML config file; rather, each job is configured in the UI. This means that we don't need to use if...else statements as shown in both the Travis CI and CircleCI fastfiles. Instead the Jenkins job acts as an implicit if...else statement, resulting in a much smaller configuration. The below is an example from our AppStore submission job:

As you can see, it's much smaller with only one section (compared to the 3 with other examples). This is because we can directly call the specific lane from the job and have that lane handle more than just building the ipa - it's building, testing and uploading.

Counting the votes 🗳️

So having got all 3 options set up and working, we decided to run them in parallel, reporting their results back to GitHub as pass/fail checks in our PRs. This allowed for direct comparison between each CI solution over an extended period (3 weeks).

Let's recap what the requirements I had for a viable CI solution were:

Mirror development environment

Good support and documentation

Support for concurrent builds

Environment configuration per branch

Performant

Mirror development environment

All 3 options allowed us to mirror our development environment with various degrees of ease. Travis CI and CircleCI supported this through a combination of pre-built images and customisation by installing tools via bundler; Jenkins supported this by giving us direct access to the Mac mini itself.

+1 Travis CI

+1 CircleCI

+1 Jenkins

Good support and documentation

Again all 3 options had good support and documentation. Travis CI and CircleCI, being managed services, both offer support as part of your subscription, with documentation coming via their dedicated documentation, their forums (which are staffed) or general website articles (such as this one here which your lovely self is reading). Jenkins, having been around since 2005, has a much wider base of articles and how-tos to choose from but this needs to be treated with caution as some of those articles are no longer valid.

+1 Travis CI

+1 CircleCI

+0.5 Jenkins

Support for concurrent builds

So depending on how loose/generous you want to be with the term "concurrent", all 3 options satisfy it. With Jenkins we were not able to successfully customise our instance to run more than one job at a time on the same machine due to constraints around launching more than one instance of the iOS simulator. This meant that to add more concurrent builds we needed to add more Mac minis to our network. These additional machines would then be set up as slaves. However, from cold, hard experience we discovered that it wasn't as easy as this with Jenkins, as you also had to set up an additional system to ensure that each machine was provisioned the same way. With both Travis CI and CircleCI, because you configure the machine via the YAML file, adding more concurrency became a matter of buying a bigger plan with more executors.

+1 Travis CI

+1 CircleCI

+0.5 Jenkins

Environment configuration per branch

Both Travis CI and CircleCI treat per-branch environment configuration as their whole raison d'être. While it's technically possible on Jenkins, we found it to be a lot harder in practice.

+1 Travis CI

+1 CircleCI

+0 Jenkins

Performant

So we measured performance in 2 ways (each scored separately):

Build time

Stability

Thanks to generous trials (and extensions of those trials) we were able to directly compare all 3 options over that 3 week period, with each running the same builds. It's important to note that I only tracked the times on feature branches which produced an IPA and ran our unit tests. (We actually found that running unit tests increased our build times by ~40% vs just build-only times).

Travis CI (serial): 46.34 minutes

Travis CI (parallel): 37.06 minutes

CircleCI: 26.23 minutes

Jenkins: 17.18 minutes

(I detail both serial and parallel build times for Travis CI, as running in parallel means that each build actually uses two executors from your pool rather than one so reduces your ability to run concurrent builds).

In raw performance terms Jenkins is far quicker and, as you can see, switching to a managed CI solution was actually going to result in an increase in build times of between roughly 53% and 170% depending on which solution was chosen (26.23 / 17.18 ≈ 1.53 for CircleCI, 46.34 / 17.18 ≈ 2.70 for serial Travis CI). But there is a caveat here: this build time was only true when only one build was executing. Once multiple builds were requested in a short period of time, both Travis CI and CircleCI performed better than Jenkins, as these builds did not have to queue up as frequently.

While CircleCI is ~30% faster than Travis CI (parallel), the perception at the time was that Travis CI was the more stable, with fewer of the dreaded Status 65 errors that occasionally plagued otherwise healthy builds on CircleCI, forcing the build to be re-executed (so actually resulting in CircleCI having a longer build time than shown above).
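To put rough numbers on this trade-off, the sketch below computes the gap between CircleCI and parallel Travis CI from the measured times, and then shows what spurious re-runs would do to the effective build time. The 10% flake rate is purely hypothetical (no failure rate was recorded), and the model assumes a failed run costs as much as a successful one.

```ruby
# Build times in minutes, taken from the measurements above.
CIRCLECI_MINUTES = 26.23
TRAVIS_PARALLEL_MINUTES = 37.06

# Expected minutes per green build if a fraction `flake_rate` of runs fail
# spuriously and are re-run from scratch (simplifying assumption).
def expected_minutes(build_minutes, flake_rate)
  build_minutes / (1.0 - flake_rate)
end

# Relative speed advantage of CircleCI over parallel Travis CI.
advantage = (TRAVIS_PARALLEL_MINUTES - CIRCLECI_MINUTES) / TRAVIS_PARALLEL_MINUTES

hypothetical_flake_rate = 0.10 # assumption, not a measured value
with_reruns = expected_minutes(CIRCLECI_MINUTES, hypothetical_flake_rate)

puts format("CircleCI advantage: %.1f%%", advantage * 100)
puts format("CircleCI with 10%% re-runs: %.2f minutes", with_reruns)
```

Under this simple model the flake rate would have to approach the size of the speed advantage itself before CircleCI lost its lead, assuming Travis CI needed no re-runs at all.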

+0.5, +1 Travis CI

+1, +0.5 CircleCI

+0.5, +1 Jenkins

The results are in 🏅

For those of you who are not sleep reading your way through this post, you will already know the winner but for the rest of us I've listed it below:

Travis CI - 5.5/6

CircleCI - 5.5/6

Jenkins - 3.5/6

So it's a dead heat 🔥 between Travis CI and CircleCI with both scoring 5.5 - sorry Jenkins but you've lost this fight 😢.
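The totals can be re-derived from the per-requirement scores awarded above. The snippet below is just that arithmetic in Ruby, with the scores listed in the order the requirements were assessed.

```ruby
# Scores per requirement, in the order they were awarded above:
# mirror environment, support/docs, concurrency, per-branch config,
# build time, stability.
SCORES = {
  "Travis CI" => [1, 1, 1, 1, 0.5, 1],
  "CircleCI" => [1, 1, 1, 1, 1, 0.5],
  "Jenkins" => [1, 0.5, 0.5, 0, 0.5, 1]
}.freeze

totals = SCORES.transform_values(&:sum)
totals.each { |name, total| puts "#{name} - #{total}/6" }
```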

Oh no, we have no clear winner!

Both companies offer very similar plans for the same price ($249) but with slight differences: Travis CI offers unlimited build minutes but fewer executors (5), while CircleCI offers limited build minutes (5000 per month) but more executors (7). I had to work out how big a factor that constraint on build minutes was going to be.

Taking the average build, I determined that we could run ~8 builds a day which was tight but do-able without having to adjust our development processes to fit CircleCI. This meant that the number of executors became the deciding factor which I know to be especially important on those busy, stressful regression testing days. As getting the best performance from Travis CI meant running some steps in parallel which meant using two executors, it was effectively 2.5 executors vs 7 executors.
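The arithmetic behind those figures can be sketched as follows. Two inputs are assumptions rather than figures from the comparison: that "the average build" is the 26.23-minute CircleCI average measured earlier, and that a month has 22 working days.

```ruby
MONTHLY_BUILD_MINUTES = 5000   # CircleCI plan limit quoted above
AVERAGE_BUILD_MINUTES = 26.23  # assumption: the CircleCI average measured earlier
WORKING_DAYS_PER_MONTH = 22    # assumption: not stated in the comparison

builds_per_month = (MONTHLY_BUILD_MINUTES / AVERAGE_BUILD_MINUTES).floor
builds_per_day = builds_per_month / WORKING_DAYS_PER_MONTH # integer division

# Travis CI's parallel configuration occupies two executors per build.
travis_effective_executors = 5 / 2.0
circleci_executors = 7

puts "builds per month: #{builds_per_month}"
puts "builds per working day: #{builds_per_day}"
puts "effective executors: #{travis_effective_executors} vs #{circleci_executors}"
```

Changing the working-day assumption shifts the daily figure slightly, but the executor comparison is unaffected.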

CircleCI takes the win!

So that's it, after 3 weeks of testing we decided to go with CircleCI over Travis CI and Jenkins - which was a real 😮 moment for me and overturned the idea I had at the start about which solution we would go with.

One interesting caveat to the above is that once we actually started paying CircleCI, we noticed that build times increased - not significantly, but there was a jump of ~1 minute, which leads me to believe that CircleCI may be more generous with computing resources when users are on trial than once they start paying.

Living with the choice

Since we decided to go with CircleCI we have continued to look for ways to further reduce the build times and currently have it down to <17 minutes through a combination of further code improvements (helping the compiler know the type of an object rather than having to always infer it), reducing external dependencies (pods), newer versions of Xcode (thank you WMO), speed improvements in the tools that we use and (I'm guessing here) CircleCI improvements. This is while we have increased the number of tests in our project and introduced monitoring tools like Codecov into our process which have actually increased build times. At present our circle.yml file looks like: