Analyzing Docker Container Performance With Native Tools

2017-10-16, by Richa Karn

Containerization is changing how organizations deploy and use software. You can now deploy almost any software reliably with just the docker run command. And with orchestration platforms like Kubernetes and DC/OS, even production deployments are easy to set up.

You may have already experimented with Docker, and have maybe run a few containers. But one thing you might not have much experience with is understanding how Docker containers behave under different loads.

Because Docker containers, from the outside, can look a lot like black boxes, it's not obvious to a lot of people how to go about getting runtime metrics and doing analysis.

In this post, we will set up a small CrateDB cluster with Docker and then go through some useful Docker commands that let us take a look at performance.

Let us start with a quick intro to CrateDB.

Set Up a CrateDB Docker Cluster

CrateDB is an open source, distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real time. It is horizontally scalable, highly available, and runs in fault-tolerant clusters that work very well in virtualized and containerized environments.

You might already have a CrateDB Docker cluster that you can use, or indeed running Docker containers of any kind.

Docker Metrics

The main parameters of container performance analysis we're interested in for this post are CPU, memory, block I/O, and network I/O.

Docker provides multiple options to get these metrics:

Use the docker stats command

Use the REST API exposed by Docker daemon

Read the cgroups pseudo files

However, metrics coverage across these three mechanisms is uneven.

For example, docker stats provides a top-level picture of these resources, which is enough for many users, while the cgroup pseudo files provide detailed metrics that come in handy for a deep analysis of container performance.
I will discuss all three options.

Let's start with the docker stats command.

docker stats

The docker stats command displays a live data stream with CPU, memory usage, memory limit, block I/O, and network I/O metrics for all the running containers.

Note that if you specify a stopped container, the command succeeds but there is no output.

To limit data to one or more specific containers, you can specify a list of container names or IDs, separated by a space.
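If you want these numbers in a script rather than on screen, one option is to ask docker stats for a single sample in JSON form. The sketch below assumes the docker CLI is on your PATH and supports the --no-stream and --format flags; the function names are made up for illustration.

```python
import json
import subprocess


def parse_stats_output(text):
    """Turn the output of `docker stats --format '{{json .}}'` into dicts.

    Each non-empty line is expected to hold one JSON object per container.
    """
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def snapshot(*containers):
    """Take one non-streaming stats sample for the given containers.

    With no arguments, Docker reports on all running containers.
    """
    cmd = ["docker", "stats", "--no-stream", "--format", "{{json .}}", *containers]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_stats_output(result.stdout)
```

Calling snapshot("crate1", "crate2") (hypothetical container names) would return one dictionary per container, with the same columns the command prints.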

The CPU % column reports CPU utilization as a share of the host's capacity. For example, if you have two containers, each allocated the same CPU shares by Docker, and each using max CPU, the docker stats command for each container would report 50% CPU utilization. From the container's perspective, though, its CPU resources would be fully utilized.

The MEM USAGE / LIMIT and MEM % columns display the amount of memory used by the container, along with the container memory limit, and the corresponding container utilization percentage. If there is no explicit memory limit set for the container, the memory usage limit will be the memory limit of the host machine. Note that like the CPU % column, these columns report on host utilization.

The NET I/O column displays the total bytes received and transmitted over the network by the corresponding container. For example, a NET I/O value of 21.7 MB / 8.51 MB for container 2f2697df4b79 means that it received 21.7 MB and sent 8.51 MB of data.

The BLOCK I/O column displays the total bytes read from and written to the container file system.

The PIDS column displays the number of processes or threads running inside the corresponding container.
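The NET I/O and BLOCK I/O columns are printed as human-readable pairs such as 21.7 MB / 8.51 MB, which is awkward to compare or graph. Here is a small sketch that converts such a pair back to bytes. It assumes that MB means 10^6 bytes and MiB means 2^20 bytes, and the helper names are hypothetical.

```python
import re

# Decimal (kB, MB, ...) and binary (KiB, MiB, ...) multipliers.
_UNITS = {
    "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9, "tb": 10**12,
    "kib": 2**10, "mib": 2**20, "gib": 2**30, "tib": 2**40,
}
_SIZE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*$")


def to_bytes(text):
    """Convert a string like '21.7 MB' or '1.5GiB' to a whole number of bytes."""
    match = _SIZE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    if unit.lower() not in _UNITS:
        raise ValueError(f"unrecognized unit: {unit!r}")
    return round(float(number) * _UNITS[unit.lower()])


def split_pair(column):
    """Split a 'first / second' column value into two byte counts."""
    first, second = column.split("/")
    return to_bytes(first), to_bytes(second)
```

For the NET I/O column the first number is bytes received and the second is bytes sent. The values are rounded by Docker before printing, so treat the result as an approximation.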

Next, let's take a look at the REST APIs exposed by Docker daemon.

REST API

The Docker daemon listens on unix:///var/run/docker.sock, which only allows local connections by the root user. When you launch Docker, however, you can bind it to a different port or socket.

Like docker stats, the REST API continuously reports a live stream of CPU, memory, and I/O data. However, the API returns much more detail: each chunk of the live stream is a longer JSON document with metrics about the container.
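To get a feel for the API, you can query the socket directly. The following is a minimal sketch using only the Python standard library. It assumes the default socket path mentioned above, sufficient permissions to read it, and the container stats endpoint with stream=false so that a single sample is returned. The class and function names are my own.

```python
import http.client
import json
import socket
from urllib.parse import quote


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def stats_path(container, stream=False):
    """Build the request path for one container's stats."""
    flag = "true" if stream else "false"
    return f"/containers/{quote(container, safe='')}/stats?stream={flag}"


def fetch_stats(container, socket_path="/var/run/docker.sock"):
    """Fetch a single stats sample for a container name or ID."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", stats_path(container))
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"Docker API returned {response.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()
```

fetch_stats("crate1") (a hypothetical container name) would return the parsed JSON document as a dictionary.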

Per core CPU usage in nanoseconds. A sum total of all the usage stats in this object.

usage_in_kernelmode

System CPU usage in nanoseconds.

usage_in_usermode

User CPU usage in nanoseconds.

Next up is system_cpu_usage. This value represents the host's cumulative CPU usage in nanoseconds. This includes user, system, and idle.

The online_cpus value represents the number of CPU cores on the host machine.

CPU utilization is one of the key factors needed to judge the overall load on the system. As you can see above, the Docker daemon REST API provides comprehensive CPU usage stats, so you can monitor and adjust your deployment as needed.
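Because these counters are cumulative, a percentage only makes sense as a difference between two samples. The sketch below shows one common way to derive a CPU percentage from a stats document. It assumes the document contains both a cpu_stats object and a precpu_stats object (the previous sample), each with cpu_usage.total_usage and system_cpu_usage; check that against what your Docker version returns.

```python
def cpu_percent(stats):
    """Estimate CPU utilization (in percent) between two consecutive samples."""
    cpu = stats.get("cpu_stats", {})
    pre = stats.get("precpu_stats", {})

    # Nanoseconds of CPU time used by the container since the previous sample.
    cpu_delta = (cpu.get("cpu_usage", {}).get("total_usage", 0)
                 - pre.get("cpu_usage", {}).get("total_usage", 0))
    # Nanoseconds of CPU time accumulated by the whole host in the same window.
    system_delta = cpu.get("system_cpu_usage", 0) - pre.get("system_cpu_usage", 0)

    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0

    # Fall back to the length of percpu_usage, then to 1, if online_cpus is absent.
    online = (cpu.get("online_cpus")
              or len(cpu.get("cpu_usage", {}).get("percpu_usage") or [])
              or 1)
    return cpu_delta / system_delta * online * 100.0
```

With this formula, a container that keeps one of four cores fully busy comes out at roughly 100%, so values above 100% are possible on multi-core hosts.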

There's a lot of data here, and we don't need to know what all of it means.

Here are the most important bits for getting started:

The cache value is the memory being used by the container that can be directly mapped to block devices. In simpler terms, this is a measure of file operations (open, read, write, and so on) being performed against the container file system.

The rss value is memory that doesn't correspond to anything mapped to the container file system. That includes stacks, heaps, and anonymous memory maps.

The mapped_file value is the memory mapped by the processes inside the container. Files are sometimes mapped to a segment of virtual memory to improve I/O performance.

The swap value is the amount of swap currently used by processes inside the container. Swap, as you may know, is file system based memory that is used when the physical memory (RAM) has run out.
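To make these values easier to work with, here is a small sketch that pulls them out of the memory section of a stats document. The layout is an assumption: it expects usage and limit at the top level and the cache, rss, mapped_file, and swap values in a nested stats object. Fields that are missing are reported as 0.

```python
def memory_summary(memory_stats):
    """Summarize the memory section of a stats document (all values in bytes)."""
    detail = memory_stats.get("stats") or {}
    usage = memory_stats.get("usage", 0)
    limit = memory_stats.get("limit", 0)
    cache = detail.get("cache", 0)

    # Page cache can be reclaimed, so it is often subtracted from raw usage.
    non_cache = usage - cache
    return {
        "usage": usage,
        "limit": limit,
        "cache": cache,
        "rss": detail.get("rss", 0),
        "mapped_file": detail.get("mapped_file", 0),
        "swap": detail.get("swap", 0),
        "non_cache": non_cache,
        "percent_of_limit": non_cache / limit * 100.0 if limit else None,
    }
```

Subtracting cache from usage is a design choice rather than the only option. It gives a number closer to what the processes themselves need, since file cache is released under memory pressure.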

This object displays block I/O operations performed inside the container.

The io_service_bytes_recursive section contains a number of objects representing the bytes transferred to and from the container file system by the container, grouped by operation type.

Within each object, the first two fields specify the major and minor number of the device, the third field specifies the operation type (read, write, sync, or async), and the fourth field specifies the number of bytes.
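As a rough illustration, the entries can be added up per operation type across all devices. This sketch assumes each entry is an object with the keys major, minor, op, and value, matching the four fields just described, and that the list may be empty or null.

```python
from collections import defaultdict


def bytes_by_operation(blkio_stats):
    """Sum io_service_bytes_recursive values per operation type across devices."""
    totals = defaultdict(int)
    for entry in blkio_stats.get("io_service_bytes_recursive") or []:
        # Normalize the operation name so "Read" and "read" are counted together.
        totals[entry["op"].lower()] += entry["value"]
    return dict(totals)
```

The result is a plain dictionary such as {"read": ..., "write": ...}, which is usually all you need to chart disk throughput for a container.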

This file shows us CPU usage, accumulated by the processes of the container. This is broken down into user and system time.

User time (user) corresponds to time during which the processes were in direct control of the CPU, i.e. executing process code, whereas system time (system) corresponds to the time during which the CPU was executing system calls on behalf of those processes.
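A file like this is easy to parse. The sketch below assumes the content consists of lines such as "user 46409" and "system 22162" (made-up numbers), with values counted in clock ticks. It defaults to 100 ticks per second, which is common on Linux; you can pass os.sysconf("SC_CLK_TCK") to use the real value for your host.

```python
def parse_cpuacct_stat(text, ticks_per_second=100):
    """Parse 'name ticks' lines and return CPU time in seconds per name."""
    seconds = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, ticks = parts
        seconds[name] = int(ticks) / ticks_per_second
    return seconds
```

Reading the file is then just a matter of opening it from the container's cgroup directory and passing its contents to this function.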

Next, let's explore the I/O stats in cgroup files.

I/O metrics are available in the blkio.throttle.io_service_bytes and blkio.throttle.io_serviced files in this directory:

This shows the total bytes transferred during all the I/O operations performed by the container.
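Here is a sketch of a parser for the contents of blkio.throttle.io_service_bytes. It assumes the usual layout of one "major:minor Operation bytes" line per device and operation, followed by a final "Total bytes" line. The function name is hypothetical.

```python
def parse_io_service_bytes(text):
    """Parse blkio.throttle.io_service_bytes content.

    Returns (per_device, grand_total), where per_device maps a 'major:minor'
    device string to a dict of lowercase operation name -> bytes.
    """
    per_device = {}
    grand_total = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            device, operation, value = parts
            per_device.setdefault(device, {})[operation.lower()] = int(value)
        elif len(parts) == 2 and parts[0] == "Total":
            grand_total = int(parts[1])
    return per_device, grand_total
```

The same function should work for blkio.throttle.io_serviced, in which case the numbers are operation counts rather than bytes.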

Finally, let's look at how to extract network metrics from pseudo files. This is important as network metrics are not directly exposed by control groups. Instead, Docker provides per-interface metrics.

Since each container has a virtual ethernet interface, Docker lets you directly check the TX (transmit) and RX (receive) counters for this interface from inside the container.

This shows you the data transfer details for the container's virtual interface eth0.
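One place these counters show up on Linux is /proc/net/dev, read from inside the container. The sketch below parses that file's text and is only one possible approach. It assumes the standard layout of two header lines followed by one line per interface with 16 numeric columns, where the first is received bytes and the ninth is transmitted bytes.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into per-interface RX/TX counters."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a complete interface line
        counters[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_packets": int(fields[1]),
            "tx_bytes": int(fields[8]),
            "tx_packets": int(fields[9]),
        }
    return counters
```

Looking up the "eth0" key in the result gives the counters for the container's virtual interface.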

Wrap Up

In this post we took a look at the docker stats command, the Docker REST API, and cgroups pseudo files.

We learnt that there are multiple ways to get statistics from a Docker container. Which method you use will depend on your setup.

The docker stats command is good for small scale use, with a few containers running on a single host.

The Docker REST API is good when you have multiple containers running on multiple hosts, and you'd like to retrieve the stats remotely.

The cgroups pseudo files are the fastest and most efficient way to get stats, and are suitable for large setups where performance is important.

While all these options are useful if you're planning to build your own tooling around Docker monitoring, there are several pre-built solutions, including Prometheus, cAdvisor, Scout, and DataDog. We'll take a closer look at Docker health monitoring tools in the future.
