PREEMPT_RT replaces most spinlock_t instances with a preemptible real-time lock that supports priority inheritance. An uncontended (fastpath) acquisition of this lock has no more overhead than its non-rt spinlock_t counterpart. However, the contended case has considerably more overhead so that the lock can maintain proper priority-queue order and support pi-boosting of the lock owner, while remaining fully preemptible.

Instrumentation shows that, under most workloads, the majority of acquisitions fall either into the fastpath category or into the adaptive-spin category within the slowpath. The need to pi-boost a lock owner should be sufficiently rare, yet the slowpath blindly incurs this overhead in 100% of contentions.

Therefore, this patch capitalizes on this observation to reduce overhead and improve acquisition throughput. It is important to note that real-time latency is still treated as a higher-order constraint than throughput, so the full pi-protocol is observed, using new carefully constructed rules around the old concepts.

1) We check the priority of the owner relative to the waiter on each spin of the lock (if we are not boosted already). If the owner's effective priority is logically lower than the waiter's priority, we must boost it.

2) We check our own priority against our current queue position on the waiters list (if we are not boosted already). If our priority has changed, we need to requeue ourselves to update our position.

3) We break out of the adaptive spin if either of the conditions in (1) or (2) changes, so that we can re-evaluate the lock conditions.

4) We must enter pi-boost mode if, at any time, we decide to voluntarily preempt since we are losing our ability to dynamically process the conditions above.

Note: We still fully support priority inheritance with this protocol, even if we defer the low-level calls to adjust priority. The difference is really in terms of a proactive protocol (boost on entry) versus a reactive protocol (boost when necessary). The upside to the latter is that we don't pay the pi penalty when it is not necessary (which is most of the time). The downside is that we technically leave the owner exposed to getting preempted (should it get asynchronously deprioritized), even if our waiter is the highest-priority task in the system. When this happens, the owner would be immediately boosted (because we would hit the "oncpu" condition and subsequently follow the voluntary-preempt path, which boosts the owner). Therefore, inversion is correctly prevented, but we incur the extra latency of the preempt/boost/wakeup that could have been avoided in the proactive model.

However, the design of the algorithm described above constrains the probability of this phenomenon occurring to setscheduler() operations. Since rt-locks do not support being interrupted by signals or timeouts, waiters only depart via the acquisition path. And while acquisitions do deboost the owner, ownership also changes simultaneously, rendering the deboost moot relative to the other waiters.

What this all means is that the downside to this implementation is that a high-priority waiter *may* see extra latency (equivalent to roughly two wake-ups) if the owner has its priority reduced via setscheduler() while it holds the lock. The penalty is deterministic, arguably small enough, and sufficiently rare that I do not believe it should be an issue.

Note: If other exit paths are ever introduced in the future, simply adapting the condition to look at owner->normal_prio instead of owner->prio should once again constrain the limitation to setscheduler().

Special thanks to Peter Morreale for suggesting the optimization to only consider skipping the boost if the owner's priority is >= current's.

+	orig_owner = rt_mutex_owner(lock);
+
 	/*
 	 * waiter.task is NULL the first time we come here and
 	 * when we have been woken up by the previous owner
 	 * but the lock got stolen by an higher prio task.
 	 */
-	if (!waiter.task) {
-		add_waiter(lock, &waiter, &flags);
+	if (!waiter.task)
+		_add_waiter(lock, &waiter);
+
+	/*
+	 * We only need to pi-boost the owner if they are lower
+	 * priority than us.  We don't care if this is racy
+	 * against priority changes as we will break out of
+	 * the adaptive spin anytime any priority changes occur
+	 * without boosting enabled.
+	 */
+	if (!waiter.pi.boosted && current->prio < orig_owner->prio) {
+		boost_lock(lock, &waiter);
+		boosted = 1;
+
+		spin_unlock_irqrestore(&lock->wait_lock, flags);
+		task_pi_update(current, 0);
+		spin_lock_irqsave(&lock->wait_lock, flags);
+		/* Wakeup during boost ? */
 		if (unlikely(!waiter.task))
 			continue;
 	}