Containers & Docker: How Secure Are They?

This post reviews the various security implications of using Docker to run applications within containers, and how to address them.

There are three major areas to consider:

the intrinsic security of containers, as implemented by namespaces and cgroups;

the specific attack surface of the Docker daemon itself;

the “hardening” security features of the kernel and how they interact with containers.

We will also discuss how Docker security features compare with other systems.

image source: US PATENT US6877440 B1

Kernel Namespaces

Docker containers are essentially LXC containers, and they come with the same security features. When you start a container with docker run, behind the scenes, it uses lxc-start to execute the Docker container. This creates a set of namespaces and control groups for the container. Those namespaces and control groups are not created by Docker itself, but by lxc-start. This means that as the LXC userland tools evolve (and provide additional namespaces and isolation features), Docker will automatically make use of them.

Namespaces provide the first, and most straightforward, form of isolation: processes running within a container cannot see, much less affect, processes running in another container, or in the host system.
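To make the idea concrete, here is a minimal Python sketch that checks whether two processes share a namespace. It assumes a Linux host where /proc/<pid>/ns exposes symbolic links of the form "pid:[4026531836]" (true on reasonably recent kernels); the function names are ours, not part of Docker or LXC.

```python
import os
import re

# Namespace links under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError("not a namespace link: %r" % (target,))
    return match.group("kind"), int(match.group("inode"))


def same_namespace(link_a, link_b):
    """Two processes share a namespace when kind and inode both match."""
    return parse_ns_link(link_a) == parse_ns_link(link_b)


def read_namespaces(pid="self"):
    """Return {entry name: inode} for a process; needs Linux and access to /proc."""
    ns_dir = "/proc/%s/ns" % pid
    result = {}
    for name in os.listdir(ns_dir):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result
```

Comparing read_namespaces() for a process on the host and for a process inside a container should show different inode numbers for each namespace the container does not share with the host.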

Each container also gets its own network stack, meaning that a container doesn’t get privileged access to the sockets or interfaces of another container. Of course, if the host system is set up accordingly, containers can interact with each other through their respective network interfaces, just like they can interact with external hosts. By default, IP traffic is allowed between containers, so they can ping each other, send/receive UDP packets, and establish TCP connections; but that can be restricted if necessary. From a network architecture point of view, all containers on a given Docker host are sitting on a bridge interface. This means that they are just like physical machines connected through a common Ethernet switch; no more, no less.
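As a small illustration of the "common Ethernet switch" picture, the sketch below checks whether two container addresses fall within the subnet of the same bridge. The subnet 172.17.0.0/16 used in the usage note is only an assumed example value; check the actual bridge configuration on your host.

```python
import ipaddress


def on_same_bridge(addr_a, addr_b, bridge_cidr):
    """True when both addresses belong to the bridge's subnet."""
    network = ipaddress.ip_network(bridge_cidr)
    return (ipaddress.ip_address(addr_a) in network
            and ipaddress.ip_address(addr_b) in network)
```

For instance, on_same_bridge("172.17.0.2", "172.17.0.3", "172.17.0.0/16") holds, so those two containers can reach each other directly unless traffic between them is filtered on the host.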

We often get the question: “is this code mature?”, and the answer is “yes, pretty mature”. Kernel namespaces were introduced between kernel versions 2.6.15 and 2.6.26. This means that since July 2008 (date of the 2.6.26 release, now 5 years ago), namespace code has been exercised and scrutinized on a large number of production systems. And there is more: the design and inspiration for the namespaces code are even older. Namespaces are actually an effort to reimplement the features of OpenVZ in such a way that they could be merged within the mainstream kernel. And OpenVZ was initially released in 2005… So yes, both the design and the implementation are pretty mature.

Control Groups

Control Groups are the other key component of Linux Containers. They implement resource accounting and limiting. They provide a lot of very useful metrics, but they also help to ensure that each container gets its fair share of memory, CPU, and disk I/O; and, more importantly, that a single container cannot bring the system down by exhausting one of those resources.

So while they do not play a role in preventing one container from accessing or affecting the data and processes of another container, they are essential to fend off some denial-of-service attacks. They are particularly important on multi-tenant platforms, like public and private PaaS, to guarantee a consistent uptime (and performance) even when some applications start to misbehave.
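To see which control groups a process belongs to, you can read /proc/<pid>/cgroup, where each line has the form "hierarchy-ID:controller-list:cgroup-path". The following is a minimal sketch of a parser for that format; the /lxc/web1 path in the usage note is a made-up example, not something Docker guarantees.

```python
def parse_cgroup_file(text):
    """Map each controller name to the cgroup path the process belongs to."""
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the path itself may contain colons.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            membership[controller] = path
    return membership
```

Feeding it the text "4:memory:/lxc/web1" plus "3:cpu,cpuacct:/lxc/web1" on a second line yields one entry each for memory, cpu, and cpuacct, all pointing at /lxc/web1. The memory and CPU limits for that container are then enforced on that group as a whole.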

Control Groups have been around for a while as well: the code was started in 2006, and initially merged in kernel 2.6.24.

Specific Attack Surface of the Docker Daemon

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

First of all, only trusted users should be allowed to control your Docker daemon. This is a direct consequence of some powerful Docker features. Specifically, Docker allows you to share a directory between the Docker host and a guest container; and it allows you to do so without limiting the access rights of the container. This means that you can start a container where the /host directory will be the / directory on your host; and the container will be able to alter your host filesystem without any restriction. Does this sound crazy? Well, you have to know that all virtualization systems allowing filesystem resource sharing behave the same way. Nothing prevents you from sharing your root filesystem (or even your root block device) with a virtual machine.

This has a strong security implication: if you drive Docker from, e.g., a web server to provision containers through an API, you should be even more careful than usual with parameter checking, to make sure that a malicious user cannot pass crafted parameters causing Docker to create arbitrary containers.
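As one example of such parameter checking, the sketch below refuses any requested host directory that does not sit under an explicit whitelist. The function name and the /srv/containers root are hypothetical; this is an illustration of the idea, not a complete validation layer.

```python
import posixpath


def is_allowed_bind_source(requested, allowed_roots):
    """Accept a host path only if it lies under one of the allowed roots."""
    if not requested.startswith("/"):
        return False  # refuse relative paths outright
    # normpath collapses "..", "." and repeated slashes; it does NOT resolve
    # symlinks, so a real check should also call os.path.realpath on the host.
    candidate = posixpath.normpath(requested)
    for root in allowed_roots:
        root = posixpath.normpath(root)
        if candidate == root or candidate.startswith(root.rstrip("/") + "/"):
            return True
    return False
```

With allowed_roots set to ["/srv/containers"], a request for "/srv/containers/app1/data" is accepted, while "/" and "/srv/containers/../../etc" are rejected before they ever reach Docker.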

For this reason, the REST API endpoint (used by the Docker CLI to communicate with the Docker daemon) changed in Docker 0.5.2, and now uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the latter being prone to cross-site request forgery attacks if you happen to run Docker directly on your local machine, outside of a VM). You can then use traditional UNIX permission checks to limit access to the control socket.
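A quick way to audit those permission checks is to verify that the control socket grants nothing to "other" users. The sketch below works on the file mode returned by os.stat; the path of the socket is left to the caller, since it depends on your installation.

```python
import os
import stat


def socket_mode_is_restricted(mode):
    """True for a socket that grants no access at all to 'other' users."""
    return stat.S_ISSOCK(mode) and (mode & stat.S_IRWXO) == 0


def check_control_socket(path):
    """Inspect a socket on disk; the path is supplied by the caller."""
    return socket_mode_is_restricted(os.stat(path).st_mode)
```

A socket with mode 0660 passes this check, and access is then limited to its owner and to members of its group; a socket with mode 0666 fails it.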

You can also expose the REST API over HTTP if you explicitly decide to do so. However, if you do that, being aware of the above-mentioned security implication, you should make sure that it will be reachable only from a trusted network or VPN; or protected with e.g. stunnel and client SSL certificates.
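To show what "client SSL certificates" means in practice, here is a sketch of the TLS settings a terminating proxy would apply, written with Python's standard ssl module rather than stunnel itself. The function and its parameters are hypothetical; the certificate file paths are whatever your own PKI provides.

```python
import ssl


def make_client_cert_context(server_cert=None, server_key=None, client_ca=None):
    """Build a server-side TLS context that rejects clients without a certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.verify_mode = ssl.CERT_REQUIRED  # handshake fails without a client cert
    if server_cert is not None:
        context.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if client_ca is not None:
        # Only certificates signed by this CA will be accepted.
        context.load_verify_locations(cafile=client_ca)
    return context
```

The important setting is verify_mode: with ssl.CERT_REQUIRED, a client that cannot present a certificate signed by the trusted CA never gets as far as sending an API request.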

Recent improvements in Linux namespaces will soon make it possible to run full-featured containers without root privileges, thanks to the new user namespace. This is covered in detail here. Moreover, this will solve the problem caused by sharing filesystems between host and guest, since the user namespace allows users within containers (including the root user) to be mapped to other users in the host system.

The end goal for Docker is therefore to implement two additional security improvements:

map the root user of a container to a non-root user of the Docker host, to mitigate the effects of a container-to-host privilege escalation;

allow the Docker daemon to run without root privileges, and delegate operations requiring those privileges to well-audited sub-processes, each with its own (very limited) scope: virtual network setup, filesystem management, etc.
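The mapping performed by user namespaces can be illustrated with the format of /proc/<pid>/uid_map, where each line reads "start inside the namespace, start on the host, length of the range". The sketch below translates a container UID to a host UID; the range 100000 to 165535 in the usage note is an arbitrary example, and Docker does not use user namespaces yet.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map lines: 'inside_start outside_start length'."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        ranges.append(tuple(int(field) for field in fields))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an ID seen inside the container to the ID seen on the host."""
    for inside_start, outside_start, length in ranges:
        if inside_start <= container_id < inside_start + length:
            return outside_start + (container_id - inside_start)
    return None  # unmapped IDs have no host equivalent
```

With the single mapping line "0 100000 65536", root inside the container (UID 0) is UID 100000 on the host, which is an ordinary unprivileged user there.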

Finally, if you run Docker on a server, it is recommended to run only Docker on that server, and to move all other services into containers controlled by Docker. Of course, it is fine to keep your favorite admin tools (probably at least an SSH server), as well as existing monitoring/supervision processes (e.g. NRPE, collectd, etc).

Linux Kernel Capabilities

By default, Docker starts containers with a very restricted set of capabilities. What does that mean?

Capabilities turn the binary “root/non-root” dichotomy into a fine-grained access control system. Processes (like web servers) that just need to bind on a port below 1024 do not have to run as root: they can just be granted the net_bind_service capability instead. And there are many other capabilities, for almost all the specific areas where root privileges are usually needed.
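Capabilities are stored as a bitmask, which the kernel exposes as the CapEff line of /proc/<pid>/status. The sketch below parses that line and tests one bit; the bit number 10 for net_bind_service comes from the kernel's linux/capability.h header, while the function names are ours.

```python
CAP_NET_BIND_SERVICE = 10  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, cap_number):
    """True when the given capability bit is set in the mask."""
    return bool(mask & (1 << cap_number))
```

A process whose status shows "CapEff: 0000000000000400" holds net_bind_service and nothing else: it can bind to port 80, but it has none of the other powers of root.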

This means a lot for container security; let’s see why!

Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include SSH, cron, and syslogd; hardware management tools (to e.g. load modules); network configuration tools (to handle e.g. DHCP, WPA, or VPNs); and much more. A container is very different, because almost all of those tasks are handled by the infrastructure around the container:

SSH access will typically be managed by a single server running in the Docker host;

cron, when necessary, should run as a user process, dedicated and tailored for the app that needs its scheduling service, rather than as a platform-wide facility;

log management will also typically be handed to Docker, or to third-party services like Loggly or Splunk;

hardware management is irrelevant, meaning that you never need to run udevd or equivalent daemons within containers;

network management happens outside of the containers, enforcing separation of concerns as much as possible, meaning that a container should never need to perform ifconfig, route, or ip commands (except when a container is specifically engineered to behave like a router or firewall, of course).

This means that in most cases, containers will not need “real” root privileges at all. And therefore, containers can run with a reduced capability set, meaning that “root” within a container has far fewer privileges than the real “root”. For instance, it is possible to:

deny all “mount” operations;

deny access to raw sockets (to prevent packet spoofing);

deny access to some filesystem operations, like creating new device nodes, changing the owner of files, or altering attributes (including the immutable flag);

deny module loading;

and many others.
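The restrictions listed above can be read as clearing individual bits in the capability mask. The sketch below uses bit numbers from the kernel's linux/capability.h header; the pairing of each restriction with a capability is our own reading of the list, not Docker's actual configuration. Note in particular that CAP_SYS_ADMIN covers much more than mount operations.

```python
# Bit numbers from <linux/capability.h>; the mapping of each restriction to a
# capability is our reading of the list, not Docker's actual configuration.
DENIED = {
    "mount operations": 21,         # CAP_SYS_ADMIN
    "raw sockets": 13,              # CAP_NET_RAW
    "creating device nodes": 27,    # CAP_MKNOD
    "changing file owners": 0,      # CAP_CHOWN
    "immutable/append flags": 9,    # CAP_LINUX_IMMUTABLE
    "module loading": 16,           # CAP_SYS_MODULE
}


def drop(mask, cap_numbers):
    """Clear the given capability bits from a capability mask."""
    for number in cap_numbers:
        mask &= ~(1 << number)
    return mask


FULL_ROOT = (1 << 32) - 1  # capabilities 0..31 all set
reduced = drop(FULL_ROOT, DENIED.values())
```

In the resulting mask, the six denied capabilities are gone while unrelated ones, such as net_bind_service (bit 10), remain available to “root” inside the container.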

This means that even if an intruder manages to escalate to root within a container, it will be much harder to do serious damage, or to escalate to the host.

Of course, you can always enable extra capabilities if you really need them (for instance, if you want to use a FUSE-based filesystem), but by default, Docker containers will be locked down to ensure maximum safety.

Other Kernel Security Features

Capabilities are just one of the many security features provided by modern Linux kernels. It is also possible to leverage existing, well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with Docker.

While Docker currently only enables capabilities, it doesn’t interfere with the other systems. This means that there are many different ways to harden a Docker host. Here are a few examples.

You can run a kernel with GRSEC and PAX. This will add many safety checks, both at compile-time and run-time; it will also defeat many exploits, thanks to techniques like address randomization. It doesn’t require Docker-specific configuration, since those security features apply system-wide, independently of containers.

If your distribution comes with security model templates for LXC containers, you can use them out of the box. For instance, Ubuntu comes with AppArmor templates for LXC, and those templates provide an extra safety net (even though it overlaps greatly with capabilities).

You can define your own policies using your favorite access control mechanism. Since Docker containers are standard LXC containers, there is nothing “magic” or specific to Docker.

Just like there are many third-party tools to augment Docker containers with e.g. special network topologies or shared filesystems, you can expect to see tools to harden existing Docker containers without affecting Docker’s core.

Comparison With Virtual Machines

Traditional virtualization techniques (as implemented by Xen, VMware, KVM, etc.) are deemed to be more secure than containers, since they provide an extra level of isolation. A container can issue syscalls to the host kernel, while a full VM can only issue hypercalls to the host hypervisor, which will generally have a much smaller attack surface.

But the real reason why full VMs would be considered more secure than containers is that they have had more exposure in production, and more scrutiny. There are many providers out there selling virtual machines to the public, while those selling containers are only a handful (mainly public PaaS providers). Since containers are much more resource-efficient and easier to manage, you can expect this situation to reverse over the next few years; and you can rely on the responsiveness of the Linux kernel development community to patch security holes extremely quickly when they surface.

However, it has been pointed out that if a kernel vulnerability allows arbitrary code execution, it will probably make it possible to break out of a container, but not out of a virtual machine. No exploit has been crafted yet to demonstrate this, but it will certainly happen in the future (especially with more and more containers in production: they will become a more “interesting” target for a malicious user). Does that mean that containers are really less secure? We think not. First, hypervisors are not exempt from vulnerabilities. Second, critical kernel issues tend to be fixed very quickly when they’re discovered (since they potentially affect not only container-based systems, but all Linux systems out there).

There is another side to the coin: when an exploit or security hole is found in the kernel, you have to upgrade the kernel and reboot. Sometimes, you can use a system like Ksplice, which allows “reboot-less” upgrades. However, you still need to deploy the new kernel, and update your VM images. Things are easier with containers: since the kernel is outside of the scope of the container image, you don’t have to change all your container images when you upgrade the kernel. Even if you do something quite drastic like moving from AppArmor to SELinux or vice versa, you will make changes outside of your containers, but you won’t have to update the containers themselves. This clear and clean separation of concerns is a major advantage over VMs. Also, the availability of systems like CRIU means that you can do container live migration, i.e. move a container from one machine to another without killing processes. This means that it will be possible to achieve uninterrupted operation during kernel upgrades.

Virtual machines might be more secure today, but containers are definitely catching up; and containers are already easier to manage, and therefore it’s easier to make sure that they are up to date from a security standpoint.

Comparison With Other Containerization Systems

We can sort other containerization systems into three categories.

LXC-based systems: those systems will provide exactly the same level of security as Docker itself. They might claim extra security features, but those features will not be provided by the system itself; they will merely be enabled by it. In other words, if another containerization software advertises “role-based authorization and enhanced security”, it means that it merely enables and configures some existing features like SELinux or SMACK, and that it should be fairly easy to add similar features to Docker, either in the core, or as a third-party add-on (as explained earlier).

Linux systems not based on LXC: that would be OpenVZ. OpenVZ is great, and it has been around for longer than LXC, so some people consider it to be more stable and secure. However, one has to keep in mind that LXC and OpenVZ share many developers, and that LXC is nothing more than “OpenVZ redesigned to be able to be merged into the mainline kernel”. Therefore, OpenVZ will eventually sunset, to be fully replaced by LXC.

Non-Linux containerization systems: some of those systems are plain awesome (e.g. Solaris Zones); however, to the best of our knowledge, none of them will let you run existing Linux processes as efficiently and as reliably as a “true” Linux system. This might or might not be a problem for you (after all, many people run e.g. Node and Mongo stacks on Solaris without any problem whatsoever). Note, however, that even if there are some big deployments of FreeBSD Jails and Solaris Zones out there, it’s just a drop of water in the big ocean of Linux-based “VPS” offerings out there. This means that Linux (and that includes VServer, OpenVZ, LXC) got much more exposure. That doesn’t make it intrinsically more secure, but that helps a lot.

Finally, it’s worth mentioning that Docker 1.0 won’t be LXC-specific. It will be able to support other runtimes through a plug-in mechanism.

Conclusions

VMs are considered more secure than containers, but the difference blurs away if you abide by the previous advice, i.e. run processes as non-privileged users (which is sometimes impractical with VMs, but easy with containers).

You can add an extra layer of safety by enabling AppArmor, SELinux, GRSEC, or your favorite hardening solution.

Last but not least, if you see interesting security features in other containerization systems, you will be able to implement them as well with Docker, since everything is provided by the kernel anyway.

Note: the paragraphs about Containers/VM security, and about other isolation systems, have been updated following feedback on HackerNews. Thanks guys!

About Jérôme Petazzoni

Jérôme is a senior engineer at dotCloud, where he rotates between Ops, Support and Evangelist duties and has earned the nickname of “master Yoda”. In a previous life he built and operated large scale Xen hosting back when EC2 was just the name of a plane, supervised the deployment of fiber interconnects through the French subway, built a specialized GIS to visualize fiber infrastructure, specialized in commando deployments of large-scale computer systems in bandwidth-constrained environments such as conference centers, and various other feats of technical wizardry. He cares for the servers powering dotCloud, helps our users feel at home on the platform, and documents the many ways to use dotCloud in articles, tutorials and sample applications. He’s also an avid dotCloud power user who has deployed just about anything on dotCloud. Look for one of his many custom services on our GitHub repository.



21 Responses to “Containers & Docker: How Secure Are They?”

Engineer

Since these are very important issues, and there are several recommendations here, it might be good to have a security guide/howto for docker.

Specifically, for me, I’m curious about how the networking is set up and how to make sure my networking is set up correctly on my hosts so that containers can talk to other containers on the VPN while also talking to the outside world on a normal interface.

The information in this blog post (and many other useful security details) will soon be integrated in the Docker main documentation as well.

Regarding network setup, the general approach of Docker is to provide each container with a private interface (eth0), but allow additional interfaces to be added dynamically. Those interfaces can be private networks, physical NICs on the host machine, VLANs, VPNs…

The Gentoo warning applies to older kernels, especially when retaining all capabilities. A recent (3.8+) kernel, running containers dropping most kernel capabilities, is much safer than those articles describe.

The feature described in the Ubuntu article (user namespaces) is in the kernel, but is not used yet by Docker. It will be used either when LXC tools support it (there is experimental code to do that already), or if Docker switches to a different toolset.

Well, unfortunately, this is by design 🙂
The host will always see the containers.
If, for some reason, you need regular users to be able to log on the host, you can:
– assign different UIDs to the containers;
– wait for user namespaces to be supported.

The emulator is very speedy: back in the day I used to play the Linux binaries of “Return to Castle Wolfenstein” on a FreeBSD system using NVidia drivers.

… but if you are concerned about security, keep in mind that those systems haven’t had as much exposure, either; and their source code isn’t always available for peer review and auditing.

Jails have been around ‘officially’ since FreeBSD 4.0 (2000), and FreeBSD is obviously open source. Containers have been available since Solaris 10 (2005), and OpenSolaris (now illumos) is also open source.

Joyent has their “SmartOS” product, which is based on illumos, as the core to their cloud computing virtualization infrastructure. There are companies that rent you a FreeBSD VPS (with full root access).

I know of two past vulnerabilities in FreeBSD that allow leakage outside of a jail (and one DoS), and no vulnerabilities in Solaris’ zones that allowed leakage (but one DoS). The proviso about not running privileged (read: root) processes inside them is not needed, as the isolation is strong enough that it does not matter.

If you’re talking about open source alternatives to LXC, then there are technologies out there that have been around longer, have a proven track record, and appear to be more ‘sealed’ than LXC.

In case you’re wondering: I work in IT, and have run FreeBSD and Solaris systems at past companies. I’m currently in a Linux shop where we use KVM extensively as part of our virtualization infrastructure. We have another group that runs VMware too (with some Hyper-V for their test lab).

I hope this helps your readers get an idea as to what else is out there.

Yes, I know that FreeBSD and Solaris can run Linux processes; but they are far from perfect, at least from my experience. Specifically:
– The “lx” brand (for zones) officially supports only RedHat 3.X (which means that officially, it doesn’t support anything useful); it is supposed to work with more recent releases, but trying to get it to work with a Debian install was (in my personal experience) pretty complex. The syscall emulation layer is outdated, meaning that recent libc will behave erratically (especially with threads), and I don’t know what 64 bits support is like (the last time I “played” with the lx brand, I was only able to run it in 32 bits).
– On FreeBSD, the syscall translation is more up-to-date, and it’s quite impressive to be able to run “heavy” things like WINE, for sure. But things like gdb or strace (basically anything using the ptrace syscall) will get you in trouble.
It’s great to run some legacy software; but I wouldn’t call that a “generic” way to run arbitrary processes blindly with 100% confidence that it will work exactly as “the real thing”.
I realize that you can hit glitches with containers or virtual machines, but in my experience, containers and VMs were a few orders of magnitude closer to the “real thing” than zones or jails (as far as Linux process execution is concerned). Hence my remark in the blog post. I might update that section to expand it a little bit, indeed!

I also know that you can find companies renting zones and jails. But off the top of my head, I can only think about Joyent (for zones) and I don’t even have an example for jails; while I could easily list 10 providers for Linux “VPS”, without Googling. That’s what I meant when I mentioned different “exposure”. Nonetheless, I updated that section of the blog post, because it was very misleading, since people could think that jails or zones weren’t open source (I was thinking about proprietary Unices when I wrote that).

Robert

(The thread you posted is two years old; while it gives a very good description of capabilities, and some of their limitations, including the famous “full root equivalence”, it doesn’t apply to containers, at least not in the way described by this article. I would love it if you could expand on your point!)

I could not read the entire article so my comment is all surface. Depending on the use-case a container is no more or less secure than any other OS installation; virtual or physical. Once you invite random software into your domain (friend or not) you are at risk.

As pointed out in the article, a container can be more secure, because stuff that “traditionally” has to run as root can run as a non-privileged user. I invite you to read the article in full (sorry that it’s so long; I do realize that for people with a lot of expertise in that area, there are probably a lot of things that you already know and would like to skip!)


Jesús

Part of my hardening procedure for hosts/VMs is to use a file integrity checker. I use Samhain or OSSEC. I would like to check the integrity of some configuration files or binaries inside the container. What is the best approach to do this?