A thread’s CPU affinity mask determines the set of CPUs
on which it is eligible to run. On a multiprocessor system,
setting the CPU affinity mask can be used to obtain
performance benefits. For example, by dedicating one CPU to
a particular thread (i.e., setting the affinity mask of that
thread to specify a single CPU, and setting the affinity
mask of all other threads to exclude that CPU), it is
possible to ensure maximum execution speed for that thread.
Restricting a thread to run on a single CPU also avoids the
performance cost caused by the cache invalidation that
occurs when a thread ceases to execute on one CPU and then
recommences execution on a different CPU.

A CPU affinity
mask is represented by the cpu_set_t structure, a
"CPU set", pointed to by mask. A set of
macros for manipulating CPU sets is described in
CPU_SET(3).
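
As a minimal sketch of how these macros fit together, the helper below
builds a CPU set from an array of CPU numbers. The name cpus_to_set()
is hypothetical and is not part of glibc; only CPU_ZERO(), CPU_SET(),
CPU_COUNT(), and CPU_SETSIZE come from <sched.h>.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: fill *set with the CPUs listed in cpus[0..n-1].
   Returns the number of CPUs in the resulting set. */
int cpus_to_set(const int *cpus, int n, cpu_set_t *set)
{
    CPU_ZERO(set);                      /* start from an empty set */
    for (int i = 0; i < n; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            continue;                   /* not representable in cpu_set_t */
        CPU_SET(cpus[i], set);
    }
    return CPU_COUNT(set);
}
```

Membership in the resulting set can then be checked with CPU_ISSET().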

sched_setaffinity()
sets the CPU affinity mask of the thread whose ID is
pid to the value specified by mask. If
pid is zero, then the calling thread is used. The
argument cpusetsize is the length (in bytes) of the
data pointed to by mask. Normally this argument would
be specified as sizeof(cpu_set_t).
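
The call described above can be sketched as follows. The helper name
pin_self_to_cpu() is hypothetical; it passes a pid of zero so that the
calling thread is affected, and sizeof(cpu_set_t) as cpusetsize.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
   Returns 0 on success, or -1 with errno set by sched_setaffinity(). */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(cpu_set_t), &mask);
}
```

The call fails if the requested CPU is not available to the thread, so
the return value should always be checked.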

If the thread
specified by pid is not currently running on one of
the CPUs specified in mask, then that thread is
migrated to one of the CPUs specified in mask.

sched_getaffinity()
writes the affinity mask of the thread whose ID is
pid into the cpu_set_t structure pointed to by
mask. The cpusetsize argument specifies the
size (in bytes) of mask. If pid is zero, then
the mask of the calling thread is returned.
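
As an illustration, the sketch below reads the calling thread's mask
and lists the CPUs it contains. The function name print_allowed_cpus()
is hypothetical, and the sketch assumes that the kernel mask fits in a
fixed-size cpu_set_t.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: print the CPUs on which the calling thread may
   run. Returns the number of such CPUs, or -1 on error. */
int print_allowed_cpus(void)
{
    cpu_set_t mask;

    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1) {
        perror("sched_getaffinity");
        return -1;
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &mask))
            printf("CPU %d\n", cpu);
    return CPU_COUNT(&mask);
}
```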

EINVAL

The affinity bit mask mask contains no processors
that are currently physically on the system and permitted to
the thread according to any restrictions that may be imposed
by cpuset cgroups or the "cpuset" mechanism
described in cpuset(7).

EINVAL

(sched_getaffinity() and, in kernels before
2.6.9, sched_setaffinity()) cpusetsize is
smaller than the size of the affinity mask used by the
kernel.

EPERM

(sched_setaffinity()) The calling thread does not
have appropriate privileges. The caller needs an effective
user ID equal to the real user ID or effective user ID of
the thread identified by pid, or it must possess the
CAP_SYS_NICE capability in the user namespace of the
thread pid.
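
The case of a mask that contains no usable processors can be
reproduced with an empty set, for which the call is expected to fail
with EINVAL. The sketch below uses a hypothetical helper name,
errno_for_empty_mask(), and returns the error rather than printing it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: try to install an empty affinity mask for the
   calling thread. Returns the resulting errno value, or 0 if the call
   unexpectedly succeeds. */
int errno_for_empty_mask(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);                    /* no CPUs at all */
    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1)
        return errno;
    return 0;
}
```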

The CPU
affinity system calls were introduced in Linux kernel 2.5.8.
The system call wrappers were introduced in glibc 2.3.
Initially, the glibc interfaces included a cpusetsize
argument, typed as unsigned int. In glibc 2.3.3, the
cpusetsize argument was removed, but was then
restored in glibc 2.3.4, with type size_t.

After a call to
sched_setaffinity(), the set of CPUs on which the
thread will actually run is the intersection of the set
specified in the mask argument and the set of CPUs
actually present on the system. The system may further
restrict the set of CPUs on which the thread runs if the
"cpuset" mechanism described in cpuset(7)
is being used. These restrictions on the actual set of CPUs
on which the thread will run are silently imposed by the
kernel.
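
One way to observe this silent restriction is to request every CPU
that a cpu_set_t can represent and then read the mask back. The
following is a sketch only: effective_cpu_count() is a hypothetical
name, the call widens the calling thread's affinity as a side effect,
and it assumes the kernel mask fits in a fixed-size cpu_set_t.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: request all CPUs representable in cpu_set_t,
   then return the number of CPUs the kernel reports back (or -1 on
   error). */
int effective_cpu_count(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        CPU_SET(cpu, &mask);            /* ask for everything */

    if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) == -1)
        return -1;
    if (sched_getaffinity(0, sizeof(cpu_set_t), &mask) == -1)
        return -1;
    return CPU_COUNT(&mask);            /* what was actually granted */
}
```

On a typical system the returned count is far smaller than
CPU_SETSIZE, even though the request itself succeeded.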

There are
various ways of determining the number of CPUs available on
the system, including: inspecting the contents of
/proc/cpuinfo; using sysconf(3) to obtain the
values of the _SC_NPROCESSORS_CONF and
_SC_NPROCESSORS_ONLN parameters; and inspecting the
list of CPU directories under
/sys/devices/system/cpu/.
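
The sysconf(3) approach can be sketched as follows. The helper name
cpu_counts() is hypothetical; the two parameter names are those given
above.

```c
#include <unistd.h>

/* Hypothetical helper: store the number of configured and online CPUs.
   Returns 0 on success, or -1 if either value is unavailable. */
int cpu_counts(long *configured, long *online)
{
    *configured = sysconf(_SC_NPROCESSORS_CONF);
    *online = sysconf(_SC_NPROCESSORS_ONLN);
    return (*configured == -1 || *online == -1) ? -1 : 0;
}
```

Note that these values describe the system as a whole; the calling
thread's affinity mask may permit fewer CPUs.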

The affinity
mask is a per-thread attribute that can be adjusted
independently for each of the threads in a thread group. The
value returned from a call to gettid(2) can be passed
in the argument pid. Specifying pid as 0 will
set the attribute for the calling thread, and passing the
value returned from a call to getpid(2) will set the
attribute for the main thread of the thread group. (If you
are using the POSIX threads API, then use
pthread_setaffinity_np(3) instead of
sched_setaffinity().)
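
The sketch below illustrates that a thread ID and a pid of zero name
the same thread when used by the caller. It obtains the thread ID with
syscall(SYS_gettid), on the assumption that a gettid() wrapper may not
be available in the C library in use; tid_and_zero_agree() is a
hypothetical name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the mask fetched using the caller's
   thread ID equals the mask fetched using pid 0, 0 if they differ, and
   -1 on error. */
int tid_and_zero_agree(void)
{
    cpu_set_t by_tid, by_zero;
    pid_t tid = (pid_t) syscall(SYS_gettid);

    if (sched_getaffinity(tid, sizeof(cpu_set_t), &by_tid) == -1)
        return -1;
    if (sched_getaffinity(0, sizeof(cpu_set_t), &by_zero) == -1)
        return -1;
    return CPU_EQUAL(&by_tid, &by_zero) ? 1 : 0;
}
```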

The
isolcpus boot option can be used to isolate one or
more CPUs at boot time, so that no processes are scheduled
onto those CPUs. Following the use of this boot option, the
only way to schedule processes onto the isolated CPUs is via
sched_setaffinity() or the cpuset(7)
mechanism. For further information, see the kernel source
file Documentation/admin-guide/kernel-parameters.txt.
As noted in that file, isolcpus is the preferred
mechanism of isolating CPUs (versus the alternative of
manually setting the CPU affinity of all processes on the
system).

A child created
via fork(2) inherits its parent’s CPU affinity
mask. The affinity mask is preserved across an
execve(2).
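
Inheritance across fork(2) can be checked with a sketch such as the
following, in which the child compares its own mask with the copy of
the parent's mask that it received. The name child_inherits_mask() and
the exit-status convention are illustrative assumptions.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a forked child sees the same
   affinity mask as its parent, 0 if not, and -1 on error. */
int child_inherits_mask(void)
{
    cpu_set_t parent_mask, child_mask;
    int status;

    if (sched_getaffinity(0, sizeof(cpu_set_t), &parent_mask) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {                   /* child: compare and report */
        if (sched_getaffinity(0, sizeof(cpu_set_t), &child_mask) == -1)
            _exit(2);
        _exit(CPU_EQUAL(&parent_mask, &child_mask) ? 0 : 1);
    }

    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```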

C library/kernel differences
This manual page describes the glibc interface for the CPU
affinity calls. The actual system call interface is slightly
different, with the mask being typed as unsigned
long *, reflecting the fact that the underlying
implementation of CPU sets is a simple bit mask. On success,
the raw sched_getaffinity() system call returns the
size (in bytes) of the cpumask_t data type that is
used internally by the kernel to represent the CPU set bit
mask.
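
The difference in return value can be seen by invoking the system call
directly through syscall(2). The sketch below is illustrative:
raw_getaffinity_size() is a hypothetical name, and the 128-byte buffer
is an assumption that may be too small on systems with very large
kernel masks, in which case the call fails.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: invoke the raw system call for the calling
   thread. On success, returns a positive size in bytes rather than 0;
   on failure, returns -1 with errno set. */
long raw_getaffinity_size(void)
{
    /* Room for 1024 bits, held as an array of unsigned long. */
    unsigned long bits[1024 / (8 * sizeof(unsigned long))] = {0};

    return syscall(SYS_sched_getaffinity, 0, sizeof(bits), bits);
}
```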

Handling systems with large CPU affinity masks
The underlying system calls (which represent CPU masks as
bit masks of type unsigned long *) impose no
restriction on the size of the CPU mask. However, the
cpu_set_t data type used by glibc has a fixed size of
128 bytes, meaning that the maximum CPU number that can be
represented is 1023. If the kernel CPU affinity mask is
larger than 1024 bits, then calls of the form:

sched_getaffinity(pid, sizeof(cpu_set_t), &mask);

fail with the
error EINVAL, the error produced by the underlying
system call for the case where the mask size
specified in cpusetsize is smaller than the size of
the affinity mask used by the kernel. (Depending on the
system CPU topology, the kernel affinity mask can be
substantially larger than the number of active CPUs in the
system.)

When working on
systems with large kernel CPU affinity masks, one must
dynamically allocate the mask argument (see
CPU_ALLOC(3)). Currently, the only way to do this is
by probing for the size of the required mask using
sched_getaffinity() calls with increasing mask sizes
(until the call does not fail with the error
EINVAL).
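
The probing technique described above might look like the following
sketch. The function name get_affinity_dynamic(), the starting size,
and the upper bound MAX_PROBE_CPUS are assumptions made for
illustration; the caller releases the result with CPU_FREE().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stddef.h>
#include <sys/types.h>

#define MAX_PROBE_CPUS (1 << 20)        /* arbitrary illustrative limit */

/* Hypothetical helper: return a dynamically allocated CPU set holding
   the affinity mask of pid, storing its size in bytes in *size_out.
   Returns NULL on error. */
cpu_set_t *get_affinity_dynamic(pid_t pid, size_t *size_out)
{
    for (int ncpus = CPU_SETSIZE; ncpus <= MAX_PROBE_CPUS; ncpus *= 2) {
        cpu_set_t *set = CPU_ALLOC(ncpus);
        if (set == NULL)
            return NULL;

        size_t size = CPU_ALLOC_SIZE(ncpus);
        CPU_ZERO_S(size, set);
        if (sched_getaffinity(pid, size, set) == 0) {
            *size_out = size;
            return set;                 /* mask was large enough */
        }

        int err = errno;
        CPU_FREE(set);
        if (err != EINVAL) {            /* a real error: stop probing */
            errno = err;
            return NULL;
        }
    }
    errno = EINVAL;
    return NULL;
}
```

The returned set must be examined with the _S variants of the macros,
such as CPU_ISSET_S() and CPU_COUNT_S(), passing the size stored in
*size_out.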

Be aware that
CPU_ALLOC(3) may allocate a slightly larger CPU set
than requested (because CPU sets are implemented as bit
masks allocated in units of sizeof(long)).
Consequently, sched_getaffinity() can set bits beyond
the requested allocation size, because the kernel sees a few
additional bits. Therefore, the caller should iterate over
the bits in the returned set, counting those which are set,
and stop upon reaching the value returned by
CPU_COUNT(3) (rather than iterating over the number
of bits requested to be allocated).
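
A sketch of this iteration strategy follows. The helper name
list_cpus() and its output-array interface are hypothetical; the loop
stops once it has seen as many set bits as CPU_COUNT_S() reports.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper: store the CPU numbers found in set (of setsize
   bytes) into out[], writing at most max_out entries. Returns the
   total number of CPUs in the set. */
int list_cpus(const cpu_set_t *set, size_t setsize, int *out, int max_out)
{
    int total = CPU_COUNT_S(setsize, set);
    int found = 0;

    /* Stop as soon as all set bits have been seen. */
    for (int cpu = 0; found < total && (size_t) cpu < 8 * setsize; cpu++) {
        if (CPU_ISSET_S(cpu, setsize, set)) {
            if (found < max_out)
                out[found] = cpu;
            found++;
        }
    }
    return found;
}
```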

The program
below creates a child process. The parent and child then
each assign themselves to a specified CPU and execute
identical loops that consume some CPU time. Before
terminating, the parent waits for the child to complete. The
program takes three command-line arguments: the CPU number
for the parent, the CPU number for the child, and the number
of loop iterations that both processes should perform.
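
The behavior just described can be approximated by the sketch below.
It is not the original program source: the function names
pin_and_spin() and run_pinned_pair() are hypothetical, argument
parsing is omitted, and the three values are passed as parameters
instead of being read from the command line.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pin the calling process to one CPU, then burn some CPU time. */
static int pin_and_spin(int cpu, long nloops)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(cpu_set_t), &set) == -1)
        return -1;

    for (volatile long j = 0; j < nloops; j++)
        ;                               /* consume CPU time */
    return 0;
}

/* Hypothetical driver: the parent and the child each pin themselves
   and run the same loop; the parent then waits for the child.
   Returns 0 if both succeeded, -1 otherwise. */
int run_pinned_pair(int parent_cpu, int child_cpu, long nloops)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(pin_and_spin(child_cpu, nloops) == 0 ? EXIT_SUCCESS
                                                   : EXIT_FAILURE);

    int rc = pin_and_spin(parent_cpu, nloops);
    if (waitpid(child, &status, 0) == -1)
        return -1;
    if (rc == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != EXIT_SUCCESS)
        return -1;
    return 0;
}
```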

As the sample
runs below demonstrate, the amount of real and CPU time
consumed when running the program will depend on intra-core
caching effects and whether the processes are using the same
CPU.

We first employ
lscpu(1) to determine that this (x86) system has two
cores, each with two CPUs:

$ lscpu | grep -E -i 'core.*:|socket'

Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1

We then time
the operation of the example program for three cases: both
processes running on the same CPU; both processes running on
different CPUs on the same core; and both processes running
on different CPUs on different cores.

This page is
part of release 4.16 of the Linux man-pages project.
A description of the project, information about reporting
bugs, and the latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.