Checkpoint, Restore, Live Migration and beyond

Checkpoint/restore is a feature that freezes a set of running processes and saves their complete state to disk. This state can later be restored, so that the processes resume exactly as they were running before. The feature opens up a whole set of possibilities, such as live migration, fast startup of huge applications, or kernel updates without service interruption. While such functionality exists in, for example, the OpenVZ kernel, many attempts to merge it upstream (i.e. into the vanilla Linux kernel) have failed, mostly because of code complexity.

We found a way to overcome this by implementing most of the required pieces in userspace, using existing kernel APIs where possible and extending them where necessary. This is what the Checkpoint and Restore in Userspace (CRIU) project is about.

The talk outlines the current state of the project, including:

recent CRIU-related changes merged into the upstream kernel

some implementation details

current capabilities of the CRIU userspace tool

plans for the future

The talk is of interest to system and distro developers, advanced users, and anyone interested in containers, virtualization, HA, and HPC.