Google has released gVisor, a new kind of sandbox that provides secure isolation for containers while being less resource intensive than a full virtual machine (VM). At its core, gVisor is an open source user-space kernel that implements a substantial portion of the Linux system surface. It is written in Go and designed with different trade-offs than existing container technology. The project includes an Open Container Initiative (OCI) runtime called "runsc" that integrates with Docker and Kubernetes.

The gVisor project GitHub README states that the core of gVisor is a kernel that runs as a normal, unprivileged process and supports most Linux system calls. Just like within a VM, an application running in a gVisor sandbox gets its own kernel and set of virtualized devices, distinct from the host and other sandboxes. gVisor provides a strong isolation boundary by intercepting application system calls and acting as the guest kernel, and can be thought of as an extremely paravirtualized operating system with a "flexible resource footprint and lower fixed cost than a full VM". However, this flexibility comes with trade-offs in performance and compatibility. gVisor may perform poorly for system call heavy workloads, and although it implements a large part of the Linux system API (currently 200 system calls), several system calls and arguments are not supported, and neither are some parts of the /proc and /sys filesystems. As a result, not all applications will run inside gVisor.

The Google Cloud Platform (GCP) blog announcement for gVisor notes that containers have revolutionised how organisations develop, package, and deploy applications, but states that the system surface exposed to containers is broad enough that many security experts "don't recommend them for running untrusted or potentially malicious applications". The blog post cites an opensource.com article, "Are Docker containers really secure?", to support this claim, although it is worth noting that this article was published in 2014, and much has changed in the container security landscape since then, particularly in relation to Docker.

There are, however, still widely acknowledged security challenges with current container technology, as we have catalogued in a previously published InfoQ article "Docker and High Security Microservices: A Summary of Aaron Grattafiori's DockerCon 2016 Talk". One of the primary issues is that the efficiency and performance gains from using a single, shared kernel also mean that container escape is possible with a single vulnerability. Accordingly, Google posits that a growing desire to run more heterogeneous and less trusted workloads has created new interest in sandboxed containers, "containers that help provide a secure isolation boundary between the host OS and the application running inside the container".

gVisor limits the host kernel surface accessible to the application while still giving the application access to all the features it expects. Unlike most kernels, gVisor does not assume or require a fixed set of physical resources; instead, it leverages existing host kernel functionality and runs as a normal user-space process. gVisor intercepts all system calls made by the application, and does the necessary work to service them. A key distinction from other container technology is that gVisor does not simply redirect application system calls through to the host kernel. Instead, it implements most kernel primitives (signals, file systems, futexes, pipes, mm, etc.) and has complete system call handlers built on top of these primitives.

In order to provide defense-in-depth and limit the host system surface, the gVisor runtime is split into two separate processes. First, the Sentry process includes the kernel and is responsible for executing user code and handling system calls. Second, file system operations that extend beyond the sandbox (not internal proc or tmp files, pipes, etc.) are sent to a proxy, called a Gofer, via a 9P connection.

The Sentry requires a platform to implement basic context switching and memory mapping functionality. Today, gVisor supports two platforms: the Ptrace platform uses SYSEMU functionality to execute user code without executing host system calls; and the KVM platform (experimental) allows the Sentry to act as both guest OS and Virtual Machine Monitor (VMM), switching back and forth between the two worlds seamlessly.

The gVisor runtime integrates with Docker and Kubernetes via "runsc" (short for "run Sandboxed Container"), which conforms to the OCI runtime API. The runsc runtime is interchangeable with runc, which is Docker's default container runtime. In Kubernetes, most resource isolation occurs at the pod level, making the pod a natural fit for a gVisor sandbox boundary. The Kubernetes community is currently formalizing the sandbox pod API, but experimental support is available today. The runsc runtime can run sandboxed pods in a Kubernetes cluster through the use of either the cri-o or cri-containerd projects, which convert messages from the Kubelet into OCI runtime commands.

With regard to related projects, Kata Containers is an open-source project that uses "extremely lightweight" VMs to keep the resource footprint minimal for container isolation. Like gVisor, Kata contains an OCI runtime that is compatible with Docker and Kubernetes. There has been much discussion about the trade-offs between the technologies on Hacker News, with one user "jsolson" suggesting that "the tradeoffs between [these differing sandbox technologies] are mostly with respect to compatibility, robustness of the security boundaries, and performance".

gVisor is written in Golang (Go), which was chosen for its memory- and type-safety. It is worth noting that gVisor can currently only build and run on x86_64 Linux 3.17+ and only supports x86_64 binaries inside the sandbox (i.e., it cannot run 32-bit binaries).
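
These platform constraints can be expressed as a small pre-flight check. The following Go sketch is illustrative only and is not part of gVisor: the function names are hypothetical, and it assumes the kernel release string has the usual `major.minor...` shape (as reported by `uname -r`). It treats Go's `amd64` architecture name as equivalent to x86_64.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelVersion extracts the major and minor numbers from a release
// string such as "4.15.0-20-generic" or "3.17-rc1".
func parseKernelVersion(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unrecognized kernel release %q", release)
	}
	major, err = strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	// The minor field may carry a suffix such as "17-rc1"; keep leading digits.
	digits := parts[1]
	notDigit := func(r rune) bool { return r < '0' || r > '9' }
	if i := strings.IndexFunc(digits, notDigit); i >= 0 {
		digits = digits[:i]
	}
	minor, err = strconv.Atoi(digits)
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// MeetsRequirements reports whether a host matches the constraints described
// above: Linux, x86_64 ("amd64" in Go terms), kernel 3.17 or newer.
func MeetsRequirements(goos, goarch, release string) bool {
	if goos != "linux" || goarch != "amd64" {
		return false
	}
	major, minor, err := parseKernelVersion(release)
	if err != nil {
		return false
	}
	return major > 3 || (major == 3 && minor >= 17)
}

func main() {
	hosts := []struct{ goos, goarch, release string }{
		{"linux", "amd64", "4.15.0-20-generic"},
		{"linux", "amd64", "3.16.0"},
		{"linux", "arm64", "4.15.0"},
	}
	for _, h := range hosts {
		ok := MeetsRequirements(h.goos, h.goarch, h.release)
		fmt.Printf("%s/%s kernel %s: supported=%v\n", h.goos, h.goarch, h.release, ok)
	}
}
```

On a real host, the first two arguments could come from `runtime.GOOS` and `runtime.GOARCH`, with the release string read from the operating system.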
