Linux Sandboxing

Chromium uses a multiprocess model, which allows to give different privileges and restrictions to different parts of the browser. For instance, we want renderers to run with a limited set of privileges since they process untrusted input and are likely to be compromised. Renderers will use an IPC mechanism to request access to resource from a more privileged (browser process). You can find more about this general design here.

We use different sandboxing techniques on Linux and Chrome OS, in combination, to achieve a good level of sandboxing. You can see which sandboxes are currently engaged by looking at chrome://sandbox (renderer processes) and chrome://gpu (gpu process).

We have a two layers approach:

Layer-1 (also called the “semantics” layer) prevents access to most resources from a process where it's engaged. The setuid sandbox is used for this.

Layer-2 (also called “attack surface reduction” layer) restricts access from a process to the attack surface of the kernel. Seccomp-BPF is used for this.

You can disable all sandboxing (for testing) with --no-sandbox.

Layered approach

One notable difficulty with seccomp-bpf is that filtering at the system call interface provides difficult to understand semantics. One crucial aspect is that if a process A runs under seccomp-bpf, we need to guarantee that it cannot affect the integrity of process B running under a different seccomp-bpf policy (which would be a sandbox escape). Besides the obvious system calls such as ptrace() or process_vm_writev(), there are multiple subtle issues, such as using open() on /proc entries.

Our layer-1 guarantees the integrity of processes running under different seccomp-bpf policies. In addition, it allows restricting access to the network, something that is difficult to perform at the layer-2.

User namespaces sandbox

The namespace sandbox aims to replace the setuid sandbox. It has the advantage of not requiring a setuid binary. It's based on (unprivileged) user namespaces in the Linux kernel. It generally requires a kernel >= 3.10, although it may work with 3.8 if certain patches are backported.

Starting with M-43, if the kernel supports it, unprivileged namespaces are used instead of the setuid sandbox. Starting with M-44, certain processes run in their own PID namespace, which isolates them better.

The seccomp-bpf sandbox

Also called seccomp-filters sandbox.

Our main layer-2 sandbox, designed to shelter the kernel from malicious code executing in userland.

Also used as layer-1 in the GPU process. A BPF compiler will compile a process-specific program to filter system calls and send it to the kernel. The kernel will interpret this program for each system call and allow or disallow the call.

To help with sandboxing of existing code, the kernel can also synchronously raise a SIGSYS signal. This allows user-land to perform actions such as “log and return errno”, emulate the system call or broker-out the system call (perform a remote system call via IPC). Implementing this requires a low-level async-signal safe IPC facility.

seccomp-bpf is supported since Linux 3.5, but is also back-ported on Ubuntu 12.04 and is always available on Chrome OS. See this page for more information.