Autoscaling build servers with GitLab CI

I’ve been using GitLab CI for a while now, and up to a certain point it worked really well. We had three build servers (GitLab runners) in the beginning, and as the number of teammates and build steps, and therefore commits and build jobs, increased, I’d just add one more server to handle the extra load and feel that the problem was solved.

Not for long. When the number of servers climbed past ten, it became obvious that simply adding them one by one doesn’t work anymore. Keeping all of them running all the time was expensive, yet still not enough to handle occasional spikes of commits. Not to mention that during nights and weekends those servers were doing absolutely nothing.

The whole thing needs to be dynamic, and fortunately GitLab CI supports autoscaling out of the box. The documentation is a little bit confusing, but in reality it’s very easy to get started. So here’s the plan: let’s try it!

A word of warning, though: I’ll skip introductions to GitLab, GitLab Runner and even Docker, as they’ve been covered in previous posts.

How GitLab CI’s autoscaling works

The idea is very simple. We already had GitLab runners that compiled a TypeScript project directly on the hosts they were installed on, using shell executors. However, we also could’ve used the docker executor, which would put the code into a Docker container and compile it there. And once builds run in Docker, it’s just a tiny step to the docker-machine utility, which can spin up a new VM with Docker installed on it and perform the build remotely. When the build is done, docker-machine can safely remove that temporary host and wait for another build to come. GitLab knows how to do all of that automatically and has the docker+machine executor for exactly this.
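Conceptually, the lifecycle that the docker+machine executor automates looks roughly like this sketch (the machine name and the driver choice here are illustrative, not what GitLab actually generates):

```
# Rough sketch of what the docker+machine executor does for a build;
# "ci-build-1" is a made-up name, GitLab uses its own naming scheme.
docker-machine create --driver virtualbox ci-build-1   # spin up a fresh VM with Docker on it
eval "$(docker-machine env ci-build-1)"                # point the local Docker CLI at that VM
docker run --rm microsoft/dotnet:2-sdk dotnet --info   # run the build job in a container there
docker-machine rm -y ci-build-1                        # throw the VM away once the job is done
```

The key point is that nothing persists between builds: every job gets a clean machine, and the cleanup is part of the executor’s job, not ours.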

Tooling

GitLab, as well as Docker and docker-machine, can be installed virtually anywhere, but today I’ll use my good old Mac with Vagrant, docker-machine and VirtualBox on it. We’re also going to need a demo project to pass through the build pipeline, and I think .NET Core’s default “Hello World” console app is perfect for that.

Setting up GitLab and dev environment

I’m going to rush through this part, because GitLab installation was already covered in my previous post. That time we hosted the GitLab server in a Docker container, but today we’re going to promote it to its own virtual machine.

Configure Virtual Machine

This relatively simple Vagrantfile with a slightly less simple provision.sh should (and will) create a new VM with GitLab, the .NET Core 2.0 SDK and Docker on it:
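I won’t reproduce my exact files here, but a minimal Vagrantfile for this setup could look something like the following (the box name, IP and memory size are my assumptions; provision.sh is the script mentioned above):

```ruby
# Minimal sketch of a Vagrantfile for the GitLab VM (values are illustrative)
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/xenial64"                         # Ubuntu 16.04, matching ubuntu-xenial below
  config.vm.network "private_network", ip: "192.168.33.10"  # the IP we'll push the repository to later
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096                                        # GitLab is hungry, so give it plenty of RAM
  end
  config.vm.provision "shell", path: "provision.sh"         # installs GitLab, .NET Core SDK and Docker
end
```

After `vagrant up`, GitLab’s web UI becomes reachable at that private IP from the host.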

Creating .NET Core console app

That’s going to be simple. Since we installed the .NET Core SDK inside our VM, we can get in there and use that SDK to create a ready-to-use “Hello World” app.

```shell
vagrant ssh
# inside of ubuntu@ubuntu-xenial
mkdir console-app
cd console-app/
dotnet new console
dotnet run
# "Hello World!"
```

After that we’ll simply git init . the folder, commit it, add our newly created GitLab server as origin and happily push the code there:

Set origin and push

```shell
git remote add origin http://192.168.33.10/root/console-app.git
git push --set-upstream origin master
```

Configure build steps

For build steps we’ll have something simple: compiling the project in Debug and Release configurations. As usual, the build step definitions go into a .gitlab-ci.yml file.

.gitlab-ci.yml

```yaml
Build:
  tags:
    - dotnetcore-2-sdk
  script:
    - dotnet build -c Debug

Build in Release:
  tags:
    - dotnetcore-2-sdk
  script:
    - dotnet build -c Release
```

If we did everything right and pushed .gitlab-ci.yml to origin, the project’s “Pipelines” page will show a pending build, which will remain pending until we add some runners to build it.

Configuring Docker runner

The simplest way to get autoscaling “docker+machine” runner is to start with “docker” runner instead. If we find the right Docker image to build our project and confirm that it works locally, there’s no reason why it won’t work remotely on dynamic VMs.

This image from Microsoft, microsoft/dotnet:2-sdk, should be capable of building our console app. I copy-pasted the script that installed the “shell” runner before, changed a few lines, and voilà: this is the thing that will build the project in Docker containers:
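I won’t paste the whole script, but the essence of what it registers can be seen in the [[runners]] entry that registration produces in config.toml. A sketch, not the literal output (the token is assigned by GitLab at registration time):

```toml
# Sketch of the config.toml entry for a plain "docker" runner (values illustrative)
[[runners]]
  name = "docker runner"
  url = "http://192.168.33.10/ci"
  token = "..."                        # issued by GitLab when the runner registers
  executor = "docker"
  [runners.docker]
    image = "microsoft/dotnet:2-sdk"   # default image for jobs on this runner
```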

And as soon as the script finishes, we can go back to the page with the pending build, click on it and see this beauty in action:

I even launched watch -n 1 sudo docker ps to see whether a new container really gets created, and yes, that’s for real.

It takes some time for the build to finish (after all, the microsoft/dotnet:2-sdk image is 1.6 GB in size), but subsequent builds are much faster.

So we’ve confirmed that the “docker” executor works. Let’s disable it for now (Settings -> CI/CD -> Runner settings) and create a truly scalable runner.

Configuring docker-machine runner

The previous “docker” runner could’ve been installed in the same VM as GitLab, or even on the host machine, since GitLab’s IP is public anyway. However, the “docker-machine” runner will create new VMs, which, when happening inside an existing VM, might lead to The Matrix. For the sake of simplicity, and to save humanity, let’s create this new runner on the host machine, which in my case is a Mac.

Installing gitlab-runner on a Mac is tricky. Maybe I didn’t do it right, but the commands that worked very similarly on Linux and even on Windows don’t perform that well here. For example, registration never put the runner’s configuration file into the correct directory, so I had to copy it over, and the runner never worked for me as a service, so I run it in user mode instead (gitlab-runner run).
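If you hit the same issue, the workaround boils down to copying the config into the location user mode actually reads and running the runner in the foreground. A sketch, assuming registration wrote the config under /etc/gitlab-runner (your paths may differ):

```
# gitlab-runner in user mode reads ~/.gitlab-runner/config.toml by default
mkdir -p ~/.gitlab-runner
cp /etc/gitlab-runner/config.toml ~/.gitlab-runner/config.toml
gitlab-runner run    # runs in the foreground under the current user
```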

As I promised, the “docker+machine” runner configuration is almost identical to the plain “docker” one:

Register docker+machine runner

```shell
sudo gitlab-runner register \
  -u http://192.168.33.10/ci \
  -r psu73HL3bCXbGj4dXcay \
  -n \
  --executor docker+machine \
  --docker-image "microsoft/dotnet:2-sdk" \
  --machine-machine-driver "virtualbox" \
  --machine-machine-name "%s" \
  --tag-list "dotnetcore-2-sdk" \
  --name "docker-machine runner"
```

Isn’t that cool? It’s just two more settings, and none of them says “recompile everything” or “create your own cloud”.

And now, the moment of truth. Hit the “Retry” button next to one of the already finished builds in GitLab and watch this magic happening in the VirtualBox Manager window:

It created a new virtual machine specifically for this build! As soon as the build finishes, that machine will be gone as well. What’s interesting is that the build output looks exactly as if it was produced by a regular “docker” runner.

For this runner we used the VirtualBox driver, but docker-machine supports many others: AWS, Google Compute Engine, Azure, you name it. And it doesn’t have to be just one machine at a time. We can create hundreds of them in parallel, keep some VMs ready in advance, reuse already created ones; it’s insanely flexible.
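“Keep some VMs in advance” and “reuse already created VMs” are just settings in the runner’s config.toml. A sketch of the relevant knobs (the field names are real GitLab Runner options, the numbers are made up for illustration):

```toml
# Autoscaling knobs in the runner's config.toml (values are illustrative)
concurrent = 100              # global cap on concurrent builds across all runners
[[runners]]
  limit = 80                  # at most 80 machines for this particular runner
  [runners.machine]
    IdleCount = 5             # keep 5 warm VMs waiting for builds
    IdleTime = 1800           # remove an idle VM after 30 minutes
    MaxBuilds = 10            # recycle a VM after it has served 10 builds
    MachineDriver = "virtualbox"
    MachineName = "auto-scale-%s"
```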

Conclusion

I’ve been looking at this autoscaling feature since the first day I started using GitLab CI, but somehow its documentation made it look extremely complicated, so I never tried it. Maybe they’ve rewritten the docs since then, but recently it all started to make sense, and the feature itself is not hard to enable after all. You saw it: it was just a few more parameters to the register command.

To be honest, I think it’s going to be a little bit harder in production. For example, it’s not uncommon for my CI to run 80 or so concurrent builds. When all of them get their own VMs, computing power will stop being a bottleneck for sure, but the network and GitLab itself might become one. Pulling Docker images can be mitigated by a local Docker registry, build caches can go to S3 or Google Cloud Storage, but the repository itself and build artifacts will still have to travel between GitLab and the VMs. And I can tell you, it’s uncomfortable to imagine 80 build VMs pulling a 5 GB repository from a tiny GitLab server simultaneously. Especially when it’s sitting in another network.
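Some napkin math shows why. Assuming the GitLab server sits behind a single 1 Gbit/s link (my assumption, not a measurement), serving those clones alone would take the better part of an hour:

```shell
# 80 VMs x 5 GB x 8 bits/byte = 3200 Gbit to push through a 1 Gbit/s link
echo "$(( 80 * 5 * 8 / 1 / 60 )) minutes"   # roughly 53 minutes, ignoring protocol overhead
```

And that’s the optimistic case where nothing else competes for that link.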