CPU Priority Management is an infrastructure also introduced by Gregory
Haskins to make task migration decisions efficient. This code tracks the
priority of every CPU in the system. Every CPU can be in any one of the
following states: INVALID, IDLE, NORMAL, RT1, ... RT99.

CPUs in the INVALID state are not eligible for task routing. The system
maintains this state with a two-dimensional bitmap: one dimension for the different
priority levels and the second for the CPUs in that priority level
(priority of a CPU is equivalent to the rq->rt.highest_prio). This is
implemented using three arrays, as shown in
Listing 3.

The pri_active bitmap tracks those
priority levels that contain one or more CPUs. For example, if there
is a CPU at priority 49, pri_active[49+2]=1
(real-time task priorities are mapped to 2–102 internally in order
to account for priorities INVALID and IDLE). Finding the first set bit
of this array yields the lowest priority that any of the CPUs in
a given cpuset is in.
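
The bitmap lookup described above can be illustrated with a small
user-space sketch. This is not kernel code: the names active_map,
rt_to_level, level_set, level_clear and first_active_level are
hypothetical, the level count of 102 is an assumption, and the offset of
2 simply follows the example in the text.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: 102 internal levels, as suggested by the text. */
#define MAP_NR_LEVELS 102
#define MAP_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define MAP_NR_WORDS  ((MAP_NR_LEVELS + MAP_WORD_BITS - 1) / MAP_WORD_BITS)

/* Hypothetical stand-in for pri_active: one bit per priority level. */
struct active_map {
    unsigned long bits[MAP_NR_WORDS];
};

/* Map a real-time priority to an internal level, following the
 * "49+2" example in the text (illustrative only). */
static int rt_to_level(int rt_prio)
{
    assert(rt_prio >= 1 && rt_prio <= 99);
    return rt_prio + 2;
}

/* Mark a level as containing at least one CPU. */
static void level_set(struct active_map *m, int level)
{
    assert(level >= 0 && level < MAP_NR_LEVELS);
    m->bits[(size_t)level / MAP_WORD_BITS] |=
        1UL << ((size_t)level % MAP_WORD_BITS);
}

/* Mark a level as empty. */
static void level_clear(struct active_map *m, int level)
{
    assert(level >= 0 && level < MAP_NR_LEVELS);
    m->bits[(size_t)level / MAP_WORD_BITS] &=
        ~(1UL << ((size_t)level % MAP_WORD_BITS));
}

/* Return the lowest level with its bit set, or -1 if none is set. */
static int first_active_level(const struct active_map *m)
{
    for (int level = 0; level < MAP_NR_LEVELS; level++) {
        size_t word = (size_t)level / MAP_WORD_BITS;
        size_t bit = (size_t)level % MAP_WORD_BITS;

        if (m->bits[word] & (1UL << bit))
            return level;
    }
    return -1;
}
```

With a zero-initialized map, setting rt_to_level(49) makes
first_active_level() return 51. A production implementation would
normally use a word-at-a-time find-first-bit primitive rather than this
bit-by-bit loop.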

The field cpu_to_pri indicates the priority of
a CPU.

The field pri_to_cpu yields information about all
the CPUs of a cpuset that are in a particular priority level. This is
encapsulated in struct cpupri_vec, as shown in Listing 4.
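
To show how the three arrays relate, here is a simplified user-space
sketch. Only the field names pri_active, cpu_to_pri and pri_to_cpu come
from the text. The layout, the sizes (8 CPUs, 102 levels), the type
names and the helpers sketch_init and sketch_set are assumptions made
for illustration, and the sketch omits all locking.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_CPUS   8      /* assumption: small fixed CPU count */
#define SKETCH_NR_LEVELS 102    /* assumption: number of priority levels */
#define SKETCH_INVALID   (-1)   /* CPU not eligible for task routing */

/* Hypothetical per-level record: which CPUs sit at this level. */
struct cpupri_vec_sketch {
    int      count;             /* number of CPUs at this level */
    uint64_t mask;              /* bit n set: CPU n is at this level */
};

struct cpupri_sketch {
    struct cpupri_vec_sketch pri_to_cpu[SKETCH_NR_LEVELS];
    uint64_t pri_active[2];     /* one bit per level, 128 >= 102 */
    int      cpu_to_pri[SKETCH_NR_CPUS];
};

/* Start with every CPU in the INVALID state and every level empty. */
static void sketch_init(struct cpupri_sketch *cp)
{
    memset(cp, 0, sizeof(*cp));
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        cp->cpu_to_pri[cpu] = SKETCH_INVALID;
}

/* Move a CPU to a new level, keeping all three arrays consistent. */
static void sketch_set(struct cpupri_sketch *cp, int cpu, int newpri)
{
    int oldpri;

    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    assert(newpri >= SKETCH_INVALID && newpri < SKETCH_NR_LEVELS);

    oldpri = cp->cpu_to_pri[cpu];
    if (newpri == oldpri)
        return;

    if (newpri != SKETCH_INVALID) {
        struct cpupri_vec_sketch *vec = &cp->pri_to_cpu[newpri];

        vec->mask |= UINT64_C(1) << cpu;
        if (vec->count++ == 0)  /* level just became non-empty */
            cp->pri_active[newpri / 64] |= UINT64_C(1) << (newpri % 64);
    }
    if (oldpri != SKETCH_INVALID) {
        struct cpupri_vec_sketch *vec = &cp->pri_to_cpu[oldpri];

        vec->mask &= ~(UINT64_C(1) << cpu);
        if (--vec->count == 0)  /* level just became empty */
            cp->pri_active[oldpri / 64] &= ~(UINT64_C(1) << (oldpri % 64));
    }
    cp->cpu_to_pri[cpu] = newpri;
}
```

After sketch_set(&cp, 3, 51), CPU 3 appears in pri_to_cpu[51].mask, bit
51 of pri_active is set, and cpu_to_pri[3] is 51. Moving the same CPU to
another level clears bit 51 again because the level becomes empty.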

Like rt_overload, cpupri is also scoped at the root domain level. Every
exclusive cpuset that comprises a root domain contains a cpupri
data value.

If a priority level is non-empty and lower than the priority of the task
being pushed, the lowest_mask is set to the mask corresponding to the
priority level selected. This mask is then used by the push algorithm
to compute the best CPU to which to push the task, based on affinity, topology
and cache characteristics.
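
The selection of lowest_mask can be sketched roughly as follows. This
is a simplified illustration, not the kernel's implementation: the names
level_vec and find_lowest_mask are hypothetical, levels are scanned from
lowest to highest, and the intersection with an allowed-CPU mask is an
assumption about how affinity would be taken into account.

```c
#include <assert.h>
#include <stdint.h>

#define FIND_NR_LEVELS 102      /* assumption: number of priority levels */

/* Hypothetical per-level record: CPUs currently at this level. */
struct level_vec {
    int      count;             /* number of CPUs at this level */
    uint64_t mask;              /* bit n set: CPU n is at this level */
};

/*
 * Scan the levels below task_level, lowest first. The first non-empty
 * level that shares a CPU with the allowed mask supplies lowest_mask.
 * Returns 1 if a mask was found, 0 otherwise.
 */
static int find_lowest_mask(const struct level_vec levels[FIND_NR_LEVELS],
                            int task_level, uint64_t allowed,
                            uint64_t *lowest_mask)
{
    assert(task_level >= 0 && task_level <= FIND_NR_LEVELS);

    for (int lvl = 0; lvl < task_level; lvl++) {
        uint64_t candidates;

        if (levels[lvl].count == 0)
            continue;           /* empty level: nothing to preempt */

        candidates = levels[lvl].mask & allowed;
        if (candidates == 0)
            continue;           /* no CPU here that the task may use */

        *lowest_mask = candidates;
        return 1;
    }
    return 0;
}
```

For example, with CPUs 0 and 1 at level 1 and CPU 2 at level 51, a task
at level 60 that may run anywhere gets the mask for CPUs 0 and 1. If the
same task is restricted to CPU 2, the scan skips level 1 and returns the
mask for CPU 2 instead.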

Details of the Push Scheduling Algorithm

As discussed before, in order to ensure SWSRPS, when a low-priority
real-time task gets preempted by a higher one or when a task is woken
up on a runqueue that already has a higher-priority task running on it,
the scheduler needs to search for a suitable target runqueue for the
task. This operation of searching a runqueue and transferring one of its
tasks to another runqueue is called pushing a task.

The push_rt_task() algorithm looks at the
highest-priority non-running runnable real-time task on the runqueue and
considers all the runqueues to find a CPU where it can run.
It searches for a runqueue that is of lower priority—that is, one where the
currently running task can be preempted by the task that is being pushed.
As explained previously, the CPU Priority
Management infrastructure is used to find a mask of CPUs that have
the lowest-priority runqueues. It is important to select only
the best CPU from among all the candidates. The algorithm gives the
highest priority to the CPU on which the task last executed, as it is
likely to be cache-hot in that location. If that is not possible, the
sched_domain map is considered to find a CPU
that is logically closest to last_cpu. If
this too fails, a CPU is selected at random from the mask.
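
The three-step preference order above can be sketched as follows. This
is an illustration under stated assumptions, not the kernel's code:
pick_target_cpu and its helpers are hypothetical names, CPUs are
limited to 64 so a mask fits in one integer, and the sched_domain
hierarchy is modeled as an array of CPU masks around last_cpu, ordered
from the closest domain to the widest.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Count the CPUs present in a mask. */
static int count_cpus(uint64_t mask)
{
    int n = 0;

    for (; mask != 0; mask &= mask - 1)
        n++;
    return n;
}

/* Return the CPU number of the n-th set bit (0-based), or -1. */
static int nth_cpu(uint64_t mask, int n)
{
    for (int cpu = 0; cpu < 64; cpu++) {
        if (!(mask & (UINT64_C(1) << cpu)))
            continue;
        if (n-- == 0)
            return cpu;
    }
    return -1;
}

/*
 * Choose a CPU from lowest_mask:
 *   1. last_cpu itself, since the task is likely cache-hot there;
 *   2. otherwise a CPU in the closest domain that overlaps the mask;
 *   3. otherwise a randomly chosen CPU from the mask.
 * Returns -1 if the mask is empty.
 */
static int pick_target_cpu(uint64_t lowest_mask, int last_cpu,
                           const uint64_t *domain_spans, int nr_domains)
{
    assert(last_cpu >= 0 && last_cpu < 64);

    if (lowest_mask == 0)
        return -1;

    if (lowest_mask & (UINT64_C(1) << last_cpu))
        return last_cpu;

    for (int i = 0; i < nr_domains; i++) {
        uint64_t near = lowest_mask & domain_spans[i];

        if (near != 0)
            return nth_cpu(near, 0);
    }

    return nth_cpu(lowest_mask, rand() % count_cpus(lowest_mask));
}
```

Suppose last_cpu is 0, the closest domain spans CPUs 0 and 1, and the
next one spans CPUs 0 to 3. A mask containing CPUs 1 and 3 then yields
CPU 1, a mask containing CPUs 2 and 3 yields CPU 2, and a mask
containing only CPUs 4 and 5 falls through to the random choice.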

The push operation is performed until a real-time task fails to be
migrated or there are no more tasks to be pushed. Because the algorithm
always selects the highest-priority non-running task for pushing, the
assumption is that, if it cannot migrate that task, then most likely the
lower-priority real-time tasks cannot be migrated either, and the search
is aborted. No lock is taken when scanning for the lowest-priority
runqueue. When the target runqueue is found, only the lock of that
runqueue is taken, after which a check is made to verify whether it is
still a valid target for the task (as the target runqueue might have
been modified by a parallel scheduling operation on another CPU). If
not, the search is repeated for a maximum
of three tries, after which it is aborted.
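
The scan, lock, recheck and retry pattern can be sketched in a
single-threaded model. Everything here is hypothetical and simplified:
rq_sketch, scan_lowest and find_lock_target are illustrative names, the
locked flag stands in for a real runqueue lock, larger numbers mean
higher priority, and a hook function simulates another CPU changing the
chosen runqueue between the unlocked scan and the lock.

```c
#include <assert.h>
#include <stddef.h>

#define PUSH_MAX_TRIES 3        /* retry limit mentioned in the text */

struct rq_sketch {
    int curr_prio;              /* priority of the running task */
    int locked;                 /* stand-in for a runqueue lock */
};

/* Called between scan and lock to simulate a parallel operation. */
typedef void (*race_hook_t)(struct rq_sketch *rqs, int target, int attempt);

/* Unlocked scan: lowest-priority runqueue below task_prio, or -1. */
static int scan_lowest(const struct rq_sketch *rqs, int nr, int task_prio)
{
    int best = -1;

    for (int i = 0; i < nr; i++) {
        if (rqs[i].curr_prio >= task_prio)
            continue;
        if (best < 0 || rqs[i].curr_prio < rqs[best].curr_prio)
            best = i;
    }
    return best;
}

/* Returns the index of a locked, still-valid target, or -1. */
static int find_lock_target(struct rq_sketch *rqs, int nr, int task_prio,
                            race_hook_t hook)
{
    for (int attempt = 0; attempt < PUSH_MAX_TRIES; attempt++) {
        int target = scan_lowest(rqs, nr, task_prio);

        if (target < 0)
            return -1;          /* nothing the task could preempt */
        if (hook != NULL)
            hook(rqs, target, attempt);

        rqs[target].locked = 1; /* take only the target's lock */
        if (rqs[target].curr_prio < task_prio)
            return target;      /* still valid: return it locked */
        rqs[target].locked = 0; /* stale result: unlock and rescan */
    }
    return -1;                  /* give up after three tries */
}

/* Example hooks: another CPU makes the chosen runqueue busier. */
static void race_first_try(struct rq_sketch *rqs, int target, int attempt)
{
    if (attempt == 0)
        rqs[target].curr_prio = 99;
}

static void race_every_try(struct rq_sketch *rqs, int target, int attempt)
{
    (void)attempt;
    rqs[target].curr_prio = 99;
}
```

With race_first_try, the first candidate is invalidated after the scan,
so the second attempt locks the next-lowest runqueue. With
race_every_try, all three attempts fail and the search is abandoned even
if another eligible runqueue remains.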