Dealing with NUMA machines is becoming increasingly common, and this trend is likely to continue. Running typical virtualization workloads on such systems is particularly challenging, as virtual machines (VMs) are typically long-lived processes with large memory footprints. This means performance can suffer badly if the specific characteristics of the platform are not properly accounted for. Ideally, a VM would always run on the CPUs of the node that hosts its memory, or at least as close to it as possible. Unfortunately, that is anything but easy, and involves reconsidering the current approaches to scheduling and memory allocation.

Extensive benchmarks have been performed, running memory-intensive workloads inside Linux VMs hosted on NUMA hardware of various kinds and sizes. The results have driven the design and development of a suite of new VM placement, scheduling and memory allocation policies for the Xen hypervisor and its toolstack. The implementation of these changes has been benchmarked against the baseline performance and proved effective in yielding improvements, which will be illustrated during the talk. Although some of the work is hypervisor specific, it covers issues of interest to the whole Linux virtualization community; whether and how to export NUMA topology information to guests is just one example. We believe that the solutions we are working on, the ideas behind them and the performance evaluation we conducted are something the community would enjoy hearing and talking about.

Topic Lead: Dario Faggioli
Dario interacted with the Linux kernel community in the domain of scheduling during his PhD on real-time systems. He now works for Citrix on the Xen open source project. He has spent the last few months investigating and trying to improve the performance of virtualization workloads on NUMA systems.

The Linux kernel is already very NUMA aware and NUMA capable in terms of hard NUMA bindings. For example, it is relatively easy to bind an application or a small virtual machine to a single NUMA node. However, hard bindings are not set up automatically by the kernel. The administrator might not know enough to set them up, might set them up in a non-optimal way, or might fail to redo them when the system workload changes significantly. Incorrect NUMA bindings (or none at all) can result in a significant performance hit.
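To make the notion of a hard binding concrete, here is a minimal userspace sketch (illustrative only, not part of the proposal): on Linux, the CPUs belonging to a node are listed in sysfs, and a process can pin itself to them with the scheduler affinity syscall. The helper names below are assumptions for the example.

```python
import os

def parse_cpulist(cpulist):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in cpulist.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return cpus

def bind_to_node(node):
    """Pin the calling process to the CPUs of one NUMA node (Linux only).

    This is the CPU half of a hard binding; memory placement would
    additionally need a memory policy (e.g. via numactl or libnuma).
    """
    with open("/sys/devices/system/node/node%d/cpulist" % node) as f:
        cpus = parse_cpulist(f.read())
    os.sched_setaffinity(0, cpus)  # 0 = the current process
```

Note that nothing re-applies such a binding when the workload changes, which is exactly the administrative burden described above.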

Userland NUMA job managers exist, numad for example, and they can be used to automatically move processes to different NUMA nodes. However, these job managers are limited in how well they can optimize certain workloads. Such difficult workloads include processes that span more than one NUMA node, or many virtual machines that overcommit the host's CPU resources.

The objective of AutoNUMA is to avoid the need for the administrator to set up hard bindings and to solve the more difficult problems faced by the userland NUMA job managers, while achieving optimal performance on NUMA systems for all types of workloads. With AutoNUMA enabled, the Linux kernel is capable of reaching NUMA convergence dynamically and incrementally based on the current load, handling NUMA placement automatically.

Andrea Arcangeli joined Qumranet, and then Red Hat in 2008, because of his interest in working on the KVM virtualization project, with a special interest in virtual machine memory management. Before joining Qumranet, he worked for Novell/SUSE for nine years. He has worked on many parts of the Linux kernel, but specialized in the virtual memory subsystem. Andrea started working with Linux in his spare time shortly after first connecting to the Internet back in 1996, while studying at Bologna University. He enjoys spending most of his time solving software problems, "big and small", and promoting the adoption of Linux and Open Source software everywhere.

During Google Summer of Code 2010 (migration from memory ballooning to memory hotplug in Xen) it was discovered that the mainline Linux kernel contains three balloon driver implementations for three virtualization platforms (KVM, Xen, VMware). It quickly became apparent that they are almost identical, yet each exposes different controls and a different API/ABI. Compared with, for example, the memory hotplug driver, which has a generic base not tied to a specific hardware/software solution, this situation is not acceptable. The goal of this project is a generic balloon driver that could live in the MM subsystem and be linked with as little platform-specific code as possible (placed, for example, in the relevant arch directory). This solution would give a unified ABI (which could ease administration) and a unified API for developers (i.e. easier integration with tmem, memory hotplug, etc.). Additionally, balloon driver behavior would be almost identical on all platforms.
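For readers unfamiliar with ballooning, the following toy model sketches the core idea (names and structure are illustrative, not the proposed kernel API): the host sets a target, and the driver inflates by handing guest pages back to the host or deflates by reclaiming them, until the target is met.

```python
class Balloon:
    """Toy model of memory ballooning; the real driver allocates and
    returns guest page frames inside the kernel, one platform hook
    per hypervisor."""

    def __init__(self, guest_pages):
        self.guest_pages = guest_pages   # total pages the guest owns
        self.inflated = 0                # pages currently returned to the host

    def set_target(self, target):
        """Inflate or deflate until `target` pages are returned to the host."""
        target = max(0, min(target, self.guest_pages))
        while self.inflated < target:
            self.inflated += 1           # allocate a guest page, give it to the host
        while self.inflated > target:
            self.inflated -= 1           # take a page back from the host
        return self.inflated

balloon = Balloon(guest_pages=1024)
balloon.set_target(256)   # host asked for 256 pages back
```

In a generic driver, only the two commented inflate/deflate steps would be platform-specific; the target handling and ABI could be shared across KVM, Xen and VMware.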

Discussion should outline the goals and key solutions for such a driver.

Topic Lead: Daniel Kiper
Daniel was a Google Summer of Code 2010 (memory hotplug/balloon driver) and Google Summer of Code 2011 (kexec/kdump) student. He has been involved in *NIX administration and development since 1994. Currently his work and interests focus on the kexec/kdump implementation for Xen.