I’ve mentioned user namespaces here before, and shown how to play a bit with them. When a task is cloned into a new user namespace, the uids in the namespace can be mapped (1-1, in blocks) to uids on the host – for instance uid 0 in the container could be uid 100000 on the host. The uids are translated at the kernel-userspace boundary (e.g. in stat), and capabilities for a namespaced task are only valid against objects owned by that namespace. The result is that root in a container is unprivileged on the host.
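The block-wise mapping can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not kernel code: the (inside, outside, count) triple follows the layout of /proc/<pid>/uid_map, while the function names and the block length of 65536 are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Each entry: (first id inside the namespace, first id on the host, block length),
# the same triple layout the kernel exposes in /proc/<pid>/uid_map.
IdMap = List[Tuple[int, int, int]]

# Assumed example: container uids 0..65535 map to host uids 100000..165535.
EXAMPLE_MAP: IdMap = [(0, 100000, 65536)]


def to_host(ns_id: int, id_map: IdMap) -> Optional[int]:
    """Translate an id seen inside the namespace to the host id (None if unmapped)."""
    for inside, outside, count in id_map:
        if inside <= ns_id < inside + count:
            return outside + (ns_id - inside)
    return None


def to_namespace(host_id: int, id_map: IdMap) -> Optional[int]:
    """Translate a host id back to the id seen inside the namespace (None if unmapped)."""
    for inside, outside, count in id_map:
        if outside <= host_id < outside + count:
            return inside + (host_id - outside)
    return None
```

With this example map, to_host(0, EXAMPLE_MAP) gives 100000 and to_host(1000, EXAMPLE_MAP) gives 101000, while a host uid outside the block, such as 0, has no counterpart inside the namespace.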

Eric has been making great progress in moving the kernel functionality upstream. With the newest 3.7-based Ubuntu kernel, plus a few of his not-yet-merged patches, a milestone has been reached – it’s now possible to run a full Ubuntu container in a user namespace!

First start up a fresh, up-to-date quantal VM or instance. Install my user namespace PPA, install the kernel and nsexec packages from there, create a container, and convert it to be namespaced.

The ‘container-userns-convert’ script just shifts the user and group ids of file owners in the container rootfs, and adds two lines to the container configuration file to tell lxc to clone the new user namespace and set up the uid/gid mappings.
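The ownership shift can be sketched roughly as below. This is not the ‘container-userns-convert’ script itself, whose contents are not shown here; the function names, the dry_run flag and the single offset applied to both uids and gids are assumptions for illustration. Actually changing owners requires root, so the sketch defaults to a dry run.

```python
import os
from typing import Iterator, Tuple


def shifted_owners(rootfs: str, offset: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (path, new_uid, new_gid) for rootfs and everything below it.

    Nothing is modified; symlinks are inspected themselves, not followed.
    """
    paths = [rootfs]
    for dirpath, dirnames, filenames in os.walk(rootfs):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        st = os.lstat(path)
        yield path, st.st_uid + offset, st.st_gid + offset


def shift_rootfs(rootfs: str, offset: int, dry_run: bool = True) -> int:
    """Shift every owner under rootfs by offset; return the number of entries visited."""
    count = 0
    for path, uid, gid in shifted_owners(rootfs, offset):
        if not dry_run:
            os.lchown(path, uid, gid)  # requires root on the host
        count += 1
    return count
```

A real tool would also have to deal with details this sketch ignores, such as setuid bits that chown can clear, and it would still need to write the matching mapping into the container configuration.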

Now you can start the container:

sudo lxc-start -n q1 -d
sudo lxc-console -n q1

Look around the container (try sudo bash); notice that it looks like a normal system, with ubuntu as uid 1000 and root as uid 0. But look from the host, and you see that root tasks in the container are actually running as uid 100000, and ubuntu ones as uid 101000.
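One way to check this from the host is to read the Uid and Gid lines of /proc/<pid>/status for a task belonging to the container. The helper below is a small sketch with made-up function names; it relies only on the documented layout of that file, where the first number on each line is the real id.

```python
from typing import Dict


def parse_status_ids(status_text: str) -> Dict[str, int]:
    """Extract the real uid and gid from the text of a /proc/<pid>/status file."""
    ids: Dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("Uid", "Gid"):
            # Fields are: real, effective, saved set, filesystem. Keep the real id.
            ids[key.lower()] = int(rest.split()[0])
    return ids


def host_ids_of(pid: int) -> Dict[str, int]:
    """Read the ids of a running task, as seen from the namespace of the reader."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status_ids(f.read())
```

Run on the host against the pid of the container's init, host_ids_of should report the shifted ids rather than 0.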

There are a few oddities (you can sudo on ttys 1-4, but sometimes it fails on /dev/console, and shutdown in the container does not kill init); the lxc package needs a few more changes (the cgroup setup needs to be moved to the container parent); and plenty of things are not yet allowed by the kernel (such as mounting an ext4 filesystem).

But this is a full Ubuntu image, confined by a private user namespace!

After working out some kinks, we’ll next want to look into container startup by unprivileged users.

9 Responses to Full Ubuntu container confined in a user namespace

This is awesome^infinity! I hope issues with saucy/XFS-or-whatever-blocking-now are resolved so user namespaces are usable in 13.10. It’s been difficult to keep up-to-date with the kernel team, but hopefully no patched kernel will be needed in saucy out-of-the-box. Great work, very exciting!

Dwight Engen has gotten the xfs patches accepted into the xfs tree. Now we just need the xfs tree to be merged into Linus’ tree. It won’t be enabled in saucy, as that kernel has been chosen, but at the next cycle.

\o/ at reboot Q&A :)
anyway… So I get word that lxc now sounds like it’s tremendously close to production quality… Great work. How is the network SETUID in terms of security contra the usual (host) network stack? If equally strong/weak to attacks, then it [lxc] is ready for production servers?

Hi, thanks. I’m not quite sure whether you mean something a bit different by ‘contra the network stack’, but I would say,

Containers at this point, using apparmor/selinux, seccomp, cgroups, and user namespaces, are as secure as containers will get. The remaining attack surface – any unknown-to-us attacks against syscalls – can’t be further mitigated so long as we share a kernel. Note seccomp is not configured by default, but if you know your workload I’d heavily recommend using a seccomp blacklist (v2 policy) to further reduce the attack surface.

To give a strong answer: providing root in an unprivileged container is no more dangerous than providing a regular user shell account (by definition, since any unprivileged user can create a new user namespace in which he is root). And, if you are sufficiently paranoid, any network facing service has a chance of an exploit allowing escape to at least an unprivileged user shell, which again is equivalent to an unprivileged container.

If I were going to wait for anything, I’d simply give it some time for any more of the (implementation and design) bugs to be shaken out. We’ve already run into some, and the interactions are complex and subtle enough that we shouldn’t be too surprised to learn about more.

Ye.. it’s brilliant, Serge. I dropped by last year some time, commending how much legwork you guys have put in, and said I was looking forward to what is obviously today. And in all honesty, I was actually thinking it would be closer to end of 2014, so I am very happy and impressed with how much and how far you guys have come on this. I am now delving into the LXC world, preparing to go ‘live’ as you say, with perhaps some more bugs being hammered out and so forth. Thank you for the tips. Policies will be some work. Also need to test more on container-systemd potential oddities/bugs. (Am principally an Arch user since unity made me say bye bye to Ubu :))

I did grab a dev release of Ubuntu now though to make sure all the patches and so on were in place. I re-compiled the Arch kernel (which didn’t have user namespaces set) but think some of the patches to login and/or shadow might be lacking. Will find out soon.

Still, huge applause for the work you guys have been throwing at this! :)