Kernel configuration

When configuring your kernel for cgroups with util-vserver you must make sure CONFIG_CGROUP_NS (CGroup Namespaces) is unset for the time being.

CGroup Namespaces are a different approach to namespaces than that used by Linux-VServer, and are not currently supported.
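As a quick check before building or booting a kernel, something like the following sketch can report whether CONFIG_CGROUP_NS is enabled. The helper name cgroup_ns_enabled and the /boot/config-$(uname -r) location are assumptions for illustration; your distribution may keep the kernel config elsewhere.

```shell
# Hypothetical helper: succeeds if CONFIG_CGROUP_NS is enabled (=y)
# in the kernel config file given as $1.
cgroup_ns_enabled() {
    grep -q '^CONFIG_CGROUP_NS=y' "$1"
}

# Assumed location of the running kernel's config; adjust as needed.
config="/boot/config-$(uname -r)"
if [ -r "$config" ] && cgroup_ns_enabled "$config"; then
    echo "CONFIG_CGROUP_NS is set in $config: rebuild the kernel without it"
fi
```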

Prerequisites

To use util-vserver's Control Groups (cgroups) support, you need to have /dev/cgroup mounted.

Recent versions of util-vserver sort this out for you by including the appropriate mount command in the util-vserver init (i.e. runlevel) script included in the util-vserver distribution. However, this apparently only works for the sysvinit script, and not the Debian or Gentoo ones.

To mount the cgroup (Control Groups) filesystem manually, you would use something like:

# mkdir /dev/cgroup

# mount -t cgroup -o <subsystems> vserver /dev/cgroup

Where <subsystems> is something like cpuset,memory.
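To confirm the mount afterwards, a sketch along these lines reads a /proc/mounts-style file and prints the options of the cgroup filesystem at a given mount point. The helper name cgroup_mount_opts is made up for this example.

```shell
# Hypothetical helper: print the mount options of the cgroup filesystem
# mounted at mount point $2, reading the /proc/mounts-style file $1.
cgroup_mount_opts() {
    awk -v mp="$2" '$2 == mp && $3 == "cgroup" { print $4 }' "$1"
}

if [ -r /proc/mounts ]; then
    opts=$(cgroup_mount_opts /proc/mounts /dev/cgroup)
    if [ -n "$opts" ]; then
        echo "/dev/cgroup is mounted with: $opts"
    else
        echo "/dev/cgroup is not mounted as a cgroup filesystem"
    fi
fi
```

The subsystems you passed with -o should appear in the printed option list.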

To avoid the need for manual configuration after reboot, on Gentoo you may wish to add the cgroup mount to /etc/fstab. For Debian see the live examples section at the bottom of this page.

This limit is a hard limit; think of it as a ceiling on the resources used by the cgroup.

If you set both a CPU share and a hard limit, the two work together, but the hard limit takes priority over CPU share scheduling: shares still divide CPU time between cgroups, yet each cgroup has an upper bound that it cannot cross even if its CPU share would entitle it to more.

The hard limit feature adds 2 cgroup files for the CFS group scheduler:

cfs_runtime_us: Hard limit for the group in microseconds.

cfs_period_us: Time period in microseconds within which the hard limit is enforced.
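As a rough illustration, and assuming the ceiling is simply the ratio of runtime to period, the share of one CPU that a pair of values allows can be worked out like this. The helper name hard_limit_percent is hypothetical, and the integer division rounds down.

```shell
# Hypothetical helper: percentage of CPU time allowed by a hard limit,
# assuming the limit is the ratio runtime / period.
# $1 = cfs_runtime_us, $2 = cfs_period_us (both in microseconds)
hard_limit_percent() {
    echo $(( $1 * 100 / $2 ))
}

# 250 ms of runtime in every 500 ms period
hard_limit_percent 250000 500000
```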

Using cgroup to enforce memory limits

Memory limiting by cgroup was introduced in linux-vserver patch version vs2.3.0.36.29. To use it you need the following config lines in your kernel build (in addition to the others mentioned for cgroup CPU limits):

CONFIG_RESOURCE_COUNTERS=y

CONFIG_CGROUP_MEM_RES_CTLR=y

CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y

Make sure /dev/cgroup is mounted with -o...,memory to be able to use this feature. The following files let you adjust the memory limits of a running vserver (create them in /etc/vservers/-vserver-name-/cgroup/ to make them permanent):

memory.memsw.limit_in_bytes: the combined memory and swap limit of your cgroup context

memory.limit_in_bytes: the memory limit, not counting swap

Values are stored in bytes. When writing to these files you can use the suffixes K, M and G.
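The suffixes are powers of two (K = 1024, M = 1024 * 1024, G = 1024 * 1024 * 1024), so the value you read back is larger than the decimal figure might suggest. A small sketch of the conversion, using a hypothetical helper called to_bytes:

```shell
# Hypothetical helper: convert a value such as 512M to bytes using
# power-of-two multipliers (K = 1024, M = 1024^2, G = 1024^3).
to_bytes() {
    num=${1%[KMG]}          # strip a trailing K, M or G if present
    case $1 in
        *K) echo $(( $num * 1024 )) ;;
        *M) echo $(( $num * 1024 * 1024 )) ;;
        *G) echo $(( $num * 1024 * 1024 * 1024 )) ;;
        *)  echo "$num" ;;
    esac
}

to_bytes 512M
to_bytes 1G
```

Writing 512M to memory.limit_in_bytes should therefore read back as 536870912.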

Note: cgroup memory limits are to replace rss.soft and rss.hard some time in the future.
If you want the guests to see only their limited memory pool, be sure to include VIRT_MEM in your cflags config file.

For a deeper understanding check out Documentation/cgroups/memory.txt of your kernel source tree.

Real-world examples of scheduling

This section is for working and tested examples you have put in place.

Please add the following information for each example you put here (use vserver-info).

Base kernel version

vServer version

Other kernel patches in use (grsec, etc.)

util-vserver release

Ben's install on Debian Lenny

I used the kernels from [2], described at [3]. I've done this on a few versions: it works for 2.6.31.7 with patch vs2.3.0.36.27 on amd64, and also for 2.6.31.11 with patch vs2.3.0.36.28. I used the stock Lenny util-vserver, patched as described below. The kernel config is critically important, as specific cgroup options are needed to get cgroups working in this way. Check the configs for the [4] kernels to see which ones I used.

Getting Lenny Ready

There's a very old version of util-vserver on Lenny; it needs this patch applied before it will set the cgroups properly. The patch basically only adds one line:

I then started the guests. When the system was loaded (I used one instance of cpuburn on each server, which is not advised but makes a useful test), each guest should have got the following percentage of CPU.

Guest Name   cpu.shares given   percentage of cpu
fivetime     512                10%
fourtime     1024               20%
threetime    1024               20%
twotime      1536               30%
onetime      1024               20%

This didn't quite happen, as each process could migrate to other CPUs. When I fixed every guest to use only one of the available CPUs (see below how I did this), the percentage of processing time allotted to each guest was pretty much exact! Each process was given exactly its designated percentage of time according to vtop.
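The expected percentages are each guest's shares divided by the sum of all shares (here 5120). A sketch of that arithmetic, with a hypothetical helper named share_percentages that reads "name shares" pairs on standard input:

```shell
# Hypothetical helper: read "name shares" pairs on stdin and print each
# guest's expected percentage of CPU time when all guests are busy.
share_percentages() {
    awk '{ name[NR] = $1; share[NR] = $2; total += $2 }
         END { for (i = 1; i <= NR; i++)
                   printf "%s %d%%\n", name[i], share[i] * 100 / total }'
}

share_percentages <<'EOF'
fivetime 512
fourtime 1024
threetime 1024
twotime 1536
onetime 1024
EOF
```

Percentages are truncated to whole numbers, which is exact for the values used here.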

Dishing out different processors sets to different guest servers

The "cpuset" for each guest is the subset of CPUs which it is permitted to use. I found out the number of CPUs available on my system by doing this:

$ cat /dev/cgroup/cpuset.cpus

This gave me the result 0-1, meaning that the overall set for my cgroups consists of CPUs 0 and 1 (for a quad core system one would expect the result 0-3, or for quad core with HT, 0-7). I stopped my guests, then specified a cpuset containing only CPU 0 for each of them:

On restarting the guests, I could see (using vtop) that they were only using CPU 0 (the column "Last used cpu (SMP)" needs to be on in vtop in order to see this). This setup isn't particularly useful, but it did allow me to check that the cpu.shares I specified for my guests were working as expected.

Doing this to servers live

The parameters in the last two sections can be set when the servers are running. For example to move the guest "threetime" so that it could use both CPUs I did this:

$ echo "0-1" > /dev/cgroup/threetime/cpuset.cpus

The processes running on threetime were instantly allocated cycles on both CPUs. Then:

$ echo "1" > /dev/cgroup/threetime/cpuset.cpus

Shifts them all to CPU 1. One can change where cycles are allocated with impunity. The same with CPU shares:

$ echo "4096" > /dev/cgroup/threetime/cpu.shares

Gave threetime a much bigger slice of the processors when it was under load.

NOTE: The range "0-1" is not the only way of specifying a set of CPUs; I could have used "0,1". On bigger systems, with say 8 CPUs, one could use "0-2,4,5", which would be the same as "0,1,2,4,5" or "0-2,4-5".
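To see which CPUs a given list actually covers, the notation can be expanded with a short sketch like this one. The helper name expand_cpu_list is hypothetical, and it only handles well-formed lists of numbers and ranges.

```shell
# Hypothetical helper: expand a cpuset list such as "0-2,4,5" into the
# individual CPU numbers, separated by spaces.
expand_cpu_list() {
    echo "$1" | awk -F, '{
        out = ""
        for (i = 1; i <= NF; i++) {
            n = split($i, r, "-")              # "0-2" -> r[1]=0, r[2]=2
            last = (n == 2) ? r[2] : r[1]      # a single number is its own range
            for (c = r[1] + 0; c <= last + 0; c++)
                out = out (out == "" ? "" : " ") c
        }
        print out
    }'
}

expand_cpu_list "0-2,4,5"
expand_cpu_list "0-2,4-5"
```

Both calls list the same five CPUs, matching the equivalence described in the note.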

Making sure all of this gets set up after a reboot

This process will make sure /dev/cgroup is present at boot and correctly mounted:

patch util-vserver (see above)

mkdir /etc/vservers/.defaults/cgroup

mkdir /lib/udev/devices/cgroup (this will mean that the /dev/cgroup is created early in the boot process)

add the following line to /etc/fstab

vserver /dev/cgroup cgroup cpu,cpuset,memory 0 0

Ben's install on Debian Squeeze/Sid

Squeeze is due to ship with the 2.6.32 kernel. Currently the package linux-image-2.6.32-5-vserver-amd64 works well for cgroup scheduling. The following steps are the simplest way to set it up:

mkdir /etc/vservers/.defaults/cgroup

mkdir /lib/udev/devices/cgroup (this will mean that the /dev/cgroup is created early in the boot process)

Instructions for setting particular parameters are the same as for Lenny.

The reason for specifying the cgroup subsystems is that if the namespace subsystem "ns" is included, Linux-VServer will not work. The /etc/fstab line above mounts /dev/cgroup with all the available subsystems excluding "ns".
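If you maintain the subsystem list by hand, a small guard such as the following sketch can catch an accidental "ns" entry before it reaches /etc/fstab. The helper name includes_ns is made up for this example.

```shell
# Hypothetical helper: succeeds if the comma-separated subsystem list
# given as $1 contains the "ns" subsystem as a whole entry.
includes_ns() {
    case ",$1," in
        *,ns,*) return 0 ;;
        *)      return 1 ;;
    esac
}

if includes_ns "cpu,cpuset,memory"; then
    echo "ns is listed: Linux-VServer will not work with this mount"
else
    echo "ns is not listed"
fi
```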