Per the title, it appears that docker 1.10.2 isn't respecting the --cpuset-cpus argument. We have a number of containers for applications that size their thread pools based on the number of cores available. Since updating to 1.10.2 (from a mix of versions going back to the 1.3.x series), the thread counts on our docker hosts are through the roof. [Edit: this wasn't actually linked to the update; rather, we'd deployed a few new containers running on mono at around the same time. This is still an issue, however.]

On the surface this issue looks similar to what's described in Ubuntu bug 1435571, though I can see how this behaviour might manifest from some other root cause. In the Ubuntu case it may have been a kernel bug, as they've fixed it with these two kernel patches.

Before seeing your comment I fired up a fresh install of CentOS 7 and made sure it was up to date. I then installed docker according to the official installation instructions. This issue does not occur in that configuration.

I will run the check-config script on both machines and compare the output.

If it turns out that this is a case of the kernel not supporting the feature, I'd suggest converting that script into runtime checks within docker itself, so that the docker CLI can fail with an appropriate error message when asked to create a container that relies on unsupported kernel features.

Actually, I think the patch review is unnecessary, as this issue occurs on a different docker host in our prod environment which is already running 3.10.0-327.10.1, and the latest userspace, CentOS 7.2.1511. To avoid (or inadvertently create) confusion, I refer to this host as host-with-latest-userspace-and-kernel below.

The output of check-config.sh run on this host is identical to that from my test VM.

This also suggests that the exact CentOS version may not matter much, as both my test VM and host-with-latest-userspace-and-kernel run CentOS 7.2.1511, while the machine from which I originally reported this runs CentOS 7.1.1503.

Just for completeness, below is the same info requested in the issue template, but for host-with-latest-userspace-and-kernel.

To see if I could spot a pattern of some sort, I've tested for this issue on the 10 docker hosts to which I have access. The only machine on which I have not observed it is the clean VM I set up specifically to test this issue. Below are the configurations of the machines in question (the hosts discussed above are included).

Except for the test VM, which is excluded from the machine counts in the table below, all machines tested are bare metal.

On the off chance that there's some difference in behaviour between --cpuset and --cpuset-cpus, I also tested --cpuset on one of the 4 machines running the el7 build of Docker 1.8.2. No change in behaviour.

Argh... forget everything I said about the test VM working correctly. It turns out I'd forgotten that I'd only provisioned one vCPU for the VM. Now that I've switched it to 4 vCPUs, the problem occurs there, too.

I don't fully understand the patches I linked in my first comment, but I have verified that nothing like them has been applied to the CentOS kernel. In fact, there is no effective_cpus member in the cpuset struct in kernel 3.10.0.

So it's looking like --cpuset-cpus does assign processor affinity correctly; however, code that inspects the machine configuration still thinks it has access to the machine's full core count.

To determine this I created two containers, one with --cpuset-cpus=0 and the other with no --cpuset-cpus argument. In the container console I then backgrounded 4 bash while true loops, and checked process affinity with ps -o pid,cpuid,comm. On the container which had the --cpuset-cpus=0 arg, all cpuid values were 0, while on the other container multiple cpuid values were listed.
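
For anyone who wants to double-check the affinity side from inside the container without ps, here's a minimal C sketch (not the exact method described above) that prints the size of the scheduler's affinity mask:

```c
/* affinity.c - print how many CPUs the scheduler will actually run us on.
 * Inside a container started with --cpuset-cpus=0 this should print 1. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }
    printf("CPUs in affinity mask: %d\n", CPU_COUNT(&mask));
    return 0;
}
```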

Question: Is solving this issue in scope for docker, or is this a kernel-level problem?

benjamincburns changed the title from "--cpuset-cpus argument appears to be ignored on 1.10.2 under CentOS 7.1.1503" to "When --cpuset-cpus argument is used, processes inspecting CPU configuration in the container see all cores" on Mar 1, 2016.

Eh, that might be a red herring. I've tried doing this manually to no effect. Also it appears that cgroup.clone_children is only defaulting to 1 on my Ubuntu boxes. On my CentOS hosts /sys/fs/cgroup/cpuset/docker/cgroup.clone_children was already set to 0.

That command works correctly, which is good news, as for the applications we control we can inspect this file ourselves. However, for applications running on runtime VMs like mono, this will present some pain. It'd be much simpler overall if the process didn't need to be aware that it was running within a cgroup.
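
For the applications we control, the check is cheap enough. Assuming the file in question is the container's cpuset.cpus under the cgroup v1 mount (the exact path is my assumption here and may differ by distro/setup), something along these lines would do:

```c
/* cpuset.c - read the container's own cpuset from the cgroup filesystem.
 * Assumes a cgroup v1 layout where the cpuset hierarchy is visible at
 * /sys/fs/cgroup/cpuset inside the container; the path may vary. */
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/sys/fs/cgroup/cpuset/cpuset.cpus", "r");
    char buf[256];

    if (f == NULL || fgets(buf, sizeof(buf), f) == NULL) {
        perror("cpuset.cpus");
        return 1;
    }
    printf("allowed cpus: %s", buf);  /* e.g. "0" or "0-3,6" */
    fclose(f);
    return 0;
}
```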

To add a bit of supporting info to my last statement, I quickly grepped mono's source and found that on systems with a proper glibc, mono detects the core count via sysconf(_SC_NPROCESSORS_ONLN). So, I wrote a quick and dirty C program to call this and print the result, copied it into a container started with --cpuset-cpus=0, and it returns the core count of the full machine.
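
The test program was nothing fancier than this (a sketch along the same lines, not the exact code):

```c
/* nprocs.c - print the core count glibc reports; the same call mono uses
 * to size its thread pools. Run inside a --cpuset-cpus=0 container this
 * still prints the host's full core count. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("%ld\n", sysconf(_SC_NPROCESSORS_ONLN));
    return 0;
}
```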

Yes, it certainly does. Digging into mono's source a bit further, I see it's also parsing /proc/stat in places.
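
That path has the same problem: /proc isn't cgroup-aware, so counting the per-CPU lines in /proc/stat from inside a --cpuset-cpus=0 container still yields the host's core count. A quick sketch (mono's actual parsing is more involved):

```c
/* procstat.c - count the per-CPU "cpuN" lines in /proc/stat. Inside a
 * --cpuset-cpus=0 container this still matches the host's core count,
 * since /proc is not filtered by the cpuset cgroup. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];
    int cpus = 0;

    if (f == NULL) {
        perror("/proc/stat");
        return 1;
    }
    while (fgets(line, sizeof(line), f) != NULL) {
        /* Per-CPU lines look like "cpu0 ..."; the aggregate line is "cpu ". */
        if (strncmp(line, "cpu", 3) == 0 && isdigit((unsigned char)line[3]))
            cpus++;
    }
    fclose(f);
    printf("cpus listed in /proc/stat: %d\n", cpus);
    return 0;
}
```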

I'll likely open an issue with mono to make the VM cgroup aware, however I agree with @thechile's last comment on #20688 that the container community ought to be working with kernel maintainers to sort out a solution to this problem.

Linus has a pretty famous rule that the kernel shouldn't break userspace. I'd think that the container shouldn't break userspace, either. You might argue that it's not the container, it's cgroups, but if the choice to use cgroups forces containerized processes to become cgroup aware, then from the perspective of the user it's the same result.

It's pain enough for native processes where I control thread pooling and resource allocation, but when you've got a full platform stack that you're trying to drop into a container, it gets quite expensive quite quickly.