My team at Red Hat depends heavily on GitLab CI and we build containers
often to run all kinds of tests. Fortunately, GitLab offers up CI to build
containers and a container registry in every repository to hold the
containers we build.

This is really handy because it keeps everything together in one place: your
container build scripts, your container build infrastructure, and the
registry that holds your containers. Better yet, you can put multiple types
of containers underneath a single git repository if you need to build
containers based on different Linux distributions.

Building with Docker in GitLab CI

By default, GitLab offers up a Docker builder that works just fine. The CI
system clones your repository, builds your containers and pushes them
wherever you want. There’s even a simple CI YAML file that does everything
end-to-end for you.

However, I have two issues with the Docker builder:

Larger images: The Docker image layering is handy, but the images end up
being a bit larger, especially if you don’t do a little cleanup in each
stage.

Additional service: It requires an additional service inside the CI
runner for the dind (“Docker in Docker”) builder. This has caused some CI
delays for me several times.

Building with buildah in GitLab CI

On my local workstation, I use podman and buildah all the time to build,
run, and test containers. These tools are handy because I don’t need to
remember to start the Docker daemon each time I want to mess with a
container. I also don’t need sudo.

All of my containers are stored beneath my home directory. That’s good for
keeping disk space in check, but it’s especially helpful on shared servers
since each user has their own unique storage. My container pulls and builds
won’t disrupt anyone else’s work on the server and their work won’t disrupt
mine.

Finally, buildah offers some nice options out of the box. First, when you
build a container with buildah bud, you end up with only three layers by
default:

Original OS layer (example: fedora:30)

Everything you added on top of the OS layer

Tiny bit of metadata

This is incredibly helpful if you use package managers like dnf, apt, and
yum that download a bunch of metadata before installing packages. You would
normally have to carefully clean up that metadata in each build step so that
your container wouldn't grow in size. Buildah takes care of that by squashing
everything you add into one layer.

Of course, if you want to be more aggressive, buildah offers the --squash
option which squashes the whole image down into one layer. This can be
helpful if disk space is at a premium and you change the layers often.
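Here's a rough sketch of both builds (the Containerfile path and image name
are placeholders):

# Default build: base layer, one layer of your changes, and a little metadata
buildah bud -f Containerfile -t myimage .

# Aggressive build: squash the whole image down into a single layer
buildah bud --squash -f Containerfile -t myimage .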

Getting started

I have a repository called os-containers in GitLab that maintains fully
updated containers for Fedora 29 and 30. The .gitlab-ci.yml file calls
build.sh for two containers: fedora29 and fedora30. Open the build.sh
file and follow along here:

# Use vfs with buildah. Docker offers overlayfs as a default, but buildah
# cannot stack overlayfs on top of another overlayfs filesystem.
export STORAGE_DRIVER=vfs

First off, we need to tell buildah to use the vfs storage driver. Docker uses
overlayfs by default and stacking overlay filesystems will definitely lead to
problems. Buildah won’t let you try it.

# Write all image metadata in the docker format, not the standard OCI format.
# Newer versions of docker can handle the OCI format, but older versions, like
# the one shipped with Fedora 30, cannot handle the format.
export BUILDAH_FORMAT=docker

By default, buildah uses the oci container format. This sometimes causes
issues with older versions of Docker that don’t understand how to parse that
type of metadata. By setting the format to docker, we’re using a format
that almost all container runtimes can understand.

Here we set a path for the auth.json file that contains the credentials for
talking to the container registry. We also use buildah to authenticate to
GitLab's built-in container registry. GitLab automatically exports these
variables for us (and hides them in the job output), so we can use them here.
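That part of build.sh looks roughly like this (a sketch; the auth.json path
is my choice, and the CI_* variables are the ones GitLab exports):

# Keep the registry credentials in a job-local auth.json
export REGISTRY_AUTH_FILE=${CI_PROJECT_DIR}/auth.json

# Authenticate to GitLab's built-in registry
buildah login -u "${CI_REGISTRY_USER}" -p "${CI_REGISTRY_PASSWORD}" "${CI_REGISTRY}"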

buildah bud -f builds/${IMAGE_NAME} -t ${IMAGE_NAME} .

We’re now building the container and storing it temporarily as the bare image
name, such as fedora30. This is roughly equivalent to docker build.

Now we are making a reference to our container with buildah from and using
that reference to squash that container down into a single layer. This keeps
the container as small as possible.

The commit step also tags the resulting image using our fully qualified
image name (in this case, it’s
registry.gitlab.com/majorhayden/os-containers/fedora30:latest).
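Those two steps look roughly like this in build.sh (a sketch; the variable
names are assumptions):

# Make a working container reference from the image we just built
container=$(buildah from ${IMAGE_NAME})

# Squash everything into a single layer and tag the fully qualified name
buildah commit --squash ${container} ${FQ_IMAGE_NAME}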

buildah push ${FQ_IMAGE_NAME}

This is the same as docker push. There’s not much special to see here.

Maintaining containers

GitLab allows you to take things to the next level with CI schedules. In my
repository, there is a schedule to build my containers once a day to catch
the latest updates. I use these containers a lot and they need to be up to
date before I can run tests.

If the container build fails for some reason, GitLab will send me an email to
let me know.

My team at Red Hat builds a lot of kernels in OpenShift pods as part of our
work with the Continuous Kernel Integration (CKI) project. We have lots of
different pod sizes depending on the type of work we are doing and our GitLab
runners spawn these pods based on the tags in our GitLab CI pipeline.

Compiling with make

When you compile a large software project, such as the Linux kernel, you can
use multiple CPU cores to speed up the build. GNU’s make does this with the
-j argument. Running make with -j10 means that you want to run 10 jobs
while compiling. This would keep 10 CPU cores busy.

Setting the number too high causes more CPU contention and can reduce
performance. Setting the number too low means that you spend more time
compiling than you would if you used all of your CPU cores.

Every once in a while, we adjusted our runners to use a different number of
CPUs or a different amount of memory, and then we had to adjust our pipeline
to reflect the new CPU count. This was time consuming and error prone.

Many people just use nproc to determine the CPU core count. It works well
with make:

make -j$(nproc)

Problems with containers

The handy nproc doesn’t work well for OpenShift. If you start a pod on
OpenShift and limit it to a single CPU core, nproc tells you something very
wrong:

$ nproc
32

We applied the single CPU limit with OpenShift, so what’s the problem? The
issue is how nproc looks for CPUs. Here’s a snippet of strace output:
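If you want to capture that snippet yourself, trace nproc directly (the
exact output formatting varies by strace version):

# Capture just the affinity syscall while nproc runs
strace -e trace=sched_getaffinity nproc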

The sched_getaffinity syscall looks to see which CPUs are allowed to run
the process and returns a count of those. OpenShift doesn’t prevent us from
seeing the CPUs of the underlying system (the VM or bare metal host
underneath our containers), but it uses cgroups to limit how much CPU time we
can use.

Reading cgroups

Getting cgroup data is easy! Just change into the /sys/fs/cgroup/ directory
and look around:

cpu.cfs_quota_us: the total available run-time within a period (in microseconds)
cpu.cfs_period_us: the length of a period (in microseconds)
cpu.stat: exports throttling statistics [explained further below]

The default values are:
cpu.cfs_period_us=100ms
cpu.cfs_quota_us=-1

A value of -1 for cpu.cfs_quota_us indicates that the group does not have any
bandwidth restriction in place; such a group is described as an unconstrained
bandwidth group. This represents the traditional work-conserving behavior for
CFS.

Writing any (valid) positive value(s) will enact the specified bandwidth limit.
The minimum quota allowed for the quota or period is 1ms. There is also an
upper bound on the period length of 1s. Additional restrictions exist when
bandwidth limits are used in a hierarchical fashion; these are explained in
more detail in the kernel's scheduler documentation.

Writing any negative value to cpu.cfs_quota_us will remove the bandwidth limit
and return the group to an unconstrained state once more.

Any updates to a group’s bandwidth specification will result in it becoming
unthrottled if it is in a constrained state.

Let’s see if inspecting cpu.cfs_quota_us can help us:

$ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
10000

Now we’re getting somewhere. But what does 10000 mean here? OpenShift
operates on the concept of millicores of CPU time, or 1⁄1000 of a CPU. 500
millicores is half a CPU and 1000 millicores is a whole CPU.

The pod in this example is assigned 100 millicores. Now we know that we can
take the output of /sys/fs/cgroup/cpu/cpu.cfs_quota_us, divide by 100, and
get our millicores.

The script checks the value of the quota and divides by 100,000 to get
the number of cores. If the quota is set to something less than 100,000, then
a core count of 1 is assigned. (Pro tip: make does not like being told to
compile with zero jobs.)
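A minimal sketch of that logic (the variable names here are mine):

# Read the CPU quota for this cgroup (cgroup v1 path)
quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)

# Divide by the 100,000-microsecond period to get cores; never go below 1
if [ "${quota}" -lt 100000 ]; then
  cores=1
else
  cores=$((quota / 100000))
fi

make -j"${cores}"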

Reading memory limits

There are other limits you can read and inspect in a pod, including the
available RAM. As we found with nproc, free is not very helpful: it reports
the memory of the underlying host, not the pod's actual limit.
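To see the pod's real limit, read the cgroup directly, just as we did for
the CPU quota (assuming a cgroup v1 mount):

# The memory limit for this cgroup, in bytes
cat /sys/fs/cgroup/memory/memory.limit_in_bytes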

If you run Java applications in a container, like Jenkins (or Jenkins
slaves), be sure to use the -XX:+UseCGroupMemoryLimitForHeap option. That
will cause Java to look at the cgroups to determine its heap size.
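On JDK 8, that option is experimental and must be paired with the unlock
flag; a sketch (app.jar is a placeholder):

# Let the JVM size its heap from the cgroup memory limit
java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -jar app.jar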

My work at Red Hat involves testing lots and lots of kernels from various
sources and we use GitLab CE to manage many of our repositories and run our
CI jobs. Those jobs run in thousands of OpenShift containers that we
spawn every day.

OpenShift has some handy security features that we like. First, each
container is mounted read-only with some writable temporary space (and any
volumes that you mount). Also, OpenShift uses arbitrarily assigned user IDs for each container.

Constantly changing UIDs provide some good protection against container
engine vulnerabilities, but they can be a pain if you have a script or
application that depends on being able to resolve a UID or GID back to a real
user or group account.

Ansible and UIDs

If you run an Ansible playbook within OpenShift, you will likely run into a
problem during the fact gathering process:
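The root cause is easy to demonstrate: the pod's arbitrarily assigned UID has
no matching passwd entry, so UID lookups fail. A quick way to see it (a
sketch, not Ansible's exact output):

# Raises KeyError when the current UID has no passwd entry
python3 -c 'import os, pwd; print(pwd.getpwuid(os.getuid()))'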

After writing my last post on my IPv6 woes with my Pixel 3, some readers
asked how I’m handling IPv6 on my router lately. I wrote about this
previously when Spectrum was Time Warner Cable and I was using Mikrotik
network devices.

We have two Google Pixel phones in our house: a Pixel 2 and a Pixel 3. Both
of them drop off our home wireless network regularly. It causes lots of
problems with various applications on the phones, especially casting video
via Chromecast.

At the time when I first noticed the drops, I was using a pair of wireless
access points (APs) from Engenius:

At this point, I felt strongly that the APs had nothing to do with it. I
ordered a new NetGear Orbi mesh router and satellite anyway. The Pixels
still dropped off the wireless network even with the new Orbi APs.

Reading logs

I started reading logs from every source I could find:

dhcpd logs from my router

syslogs from my APs (which forwarded into the router)

output from tcpdump on my router

Several things became apparent after reading the logs:

The Wi-Fi drops usually occurred every 30-60 seconds

The DHCP server received requests for a new IP address after every drop

None of the network traffic from the phones was being blocked at the router

The logs from the APs showed the phone disconnecting itself from the
network; the APs were not forcing the phones off the network

All of the wireless and routing systems in my house seemed to point to a
problem in the phones themselves. They were voluntarily dropping from the
network without being bumped off by APs or the router.

Getting logs from the phone

It was time to get some logs from the phone itself. That would require
connecting the phone via USB to a computer and enabling USB debugging on the
phone.

First, I downloaded the Android SDK. The full studio release isn’t needed
– scroll down and find the Command line tools only section. Unzip the
download and find the tools/bin/sdkmanager executable. Run it like this:
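# Install platform-tools, which provides the adb binary used below
# (the exact packages installed in the original post may have differed)
tools/bin/sdkmanager platform-tools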

The line with CTRL-EVENT-AVOID-FREQ isn’t relevant because it’s simply a
hint to the wireless drivers to avoid certain frequencies not used in the
USA. The CTRL-EVENT-DISCONNECTED shows where wpa_supplicant received the
disconnection message. The last line with ConnectivityService was very
interesting. Something in the phone believes there is a network connectivity
issue. That could be why the Pixel is hopping off the wireless network.

From there, I decided to examine only the ConnectivityService logs:

sudo platform-tools/adb logcat 'ConnectivityService:* *:S'

This logcat line tells adb that I want all logs from all log levels about the
ConnectivityService, but all of the other logs should be silenced. I
started seeing some interesting details:

Wait, what is this "validation failed" message? The Pixel was making network
connections successfully the entire time, as shown by tcpdump. This is part of
Android's network connectivity checks for various networks.

The last few connections just before the disconnect were to
connectivitycheck.gstatic.com (based on tcpdump logs) and that’s Google’s
way of verifying that the wireless network is usable and that there are no
captive portals. I connected to it from my desktop on IPv4 and IPv6 to verify:
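You can run the same check by hand; generate_204 returns HTTP 204 when the
network is usable and there's no captive portal:

# Expect "204" over both IPv4 and IPv6 if connectivity is good
curl -4 -s -o /dev/null -w "%{http_code}\n" http://connectivitycheck.gstatic.com/generate_204
curl -6 -s -o /dev/null -w "%{http_code}\n" http://connectivitycheck.gstatic.com/generate_204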

Heading to Google

After a bunch of searching on Google, I kept finding posts talking about
disabling IPv6 to fix the Wi-Fi drop issues. I shrugged it off and kept
searching. Finally, I decided to disable IPv6 and see if that helped.

I stopped radvd on the router, disabled Wi-Fi on the phone, and then
re-enabled it. As I watched, the phone stayed on the wireless network for two
minutes. Three minutes. Ten minutes. There were no drops.

At this point, this is still an unsolved mystery for me. Disabling IPv6 is a
terrible idea, but it keeps my phones online. I plan to put the phones on
their own VLAN without IPv6 so I can still keep IPv6 addresses for my other
computers, but this is not a good long term fix. If anyone has any input on
why this helps and how I can get IPv6 re-enabled, please let me know!

Update 2019-03-18

Several readers wanted to see what was happening just before the Wi-Fi drop,
so here’s a small snippet from tcpdump: