SMP and Embedded Real Time

As embedded real-time applications start to run on SMP systems, kernel issues emerge.

Priority Inversion

Priority inversion is illustrated by Figure 9. A low-priority process
P2 holds a lock, but is preempted by medium-priority processes. When
high-priority process P1 tries to acquire the lock, it must wait, because
P2 holds it. But P2 cannot release it until it runs again, which will not
happen while the medium-priority processes are running. So, in effect,
the medium-priority processes have blocked a high-priority process:
in short, priority inversion.

Figure 9. Priority Inversion

One way to prevent priority inversion is to disable preemption during
critical sections, as is done in CONFIG_PREEMPT builds of the Linux
kernel. However, this preemption disabling can result in excessive
latencies.

The -rt patchset therefore uses priority inheritance instead, so that
P1 donates its priority to P2, but only for as long as P2 continues
to hold the lock, as shown in Figure 10. Because P2 is now running at
high priority, it preempts the medium-priority processes, completing
its critical section quickly and then handing the lock off to P1.

Figure 10. Priority Inheritance

So priority inheritance works well for exclusive locks, where only one
thread can hold the lock at a given time. But there are also reader-writer
locks, which can be held by one writer on the one hand or by an unlimited
number of readers on the other. The fact that a reader-writer lock can be
held by an unlimited number of readers can be a real problem for priority
inheritance, as illustrated in Figure 11. Here, several low-priority
processes are read-holding lock L1, but are preempted by medium-priority
processes. Each low-priority process might also be blocked write-acquiring
other locks, which might be read-held by even more low-priority processes,
all of which are also preempted by the medium-priority processes.

Figure 11. Reader-Writer Lock Priority Inversion

Priority inheritance can solve this problem, but the cure is worse
than the disease. For example, the arbitrarily bushy tree of preempted
processes requires complex and slow bookkeeping. But even worse, before
the high-priority writer can proceed, all of the low-priority processes
must complete their critical sections, which will result in arbitrarily
long delays.

Such delays are not what we want in a real-time system. This situation
resulted in numerous “spirited” discussions on the Linux-kernel mailing
list, which Ingo Molnar closed down with the following proposal:

Only one task at a time may read-acquire a given reader-writer lock.

If #1 results in performance or scalability problems, the
problematic lock will be replaced with RCU (read-copy update).

RCU can be thought of as a reader-writer lock where readers never block;
in fact, readers execute a deterministic number of instructions. Writers
have much higher overhead, as they must retain old versions of the
data structure that readers might still be referencing. RCU provides
special primitives to allow writers to determine when all readers
have completed, so that the writer can determine when it is safe to
free old versions. RCU works best for read-mostly data structures, or
for data structures with hard real-time readers. (More detail may be
found at en.wikipedia.org/wiki/RCU, and even more detail may be
found at www.rdrop.com/users/paulmck/RCU. Although user-level
RCU implementations have been produced for experimental purposes,
for example, www.cs.toronto.edu/~tomhart/perflab/ipdps06.tgz,
production-quality RCU implementations are currently found only in
kernels. Fixing this is on my to-do list.)

A key property of RCU is that writers never block readers, and,
conversely, readers do not block writers from modifying a data
structure. Therefore, RCU cannot cause priority inversion. This is
illustrated in Figure 12. Here, the low-priority processes are in
RCU read-side critical sections and are preempted by medium-priority
processes, but because the locks are used only to coordinate updates,
the high-priority process P1 immediately can acquire the lock and carry
out the update by creating a new version. Freeing the old version does
have to wait for readers to complete, but this freeing can be deferred
to avoid degrading real-time latencies.

Figure 12. RCU Prevents Priority Inversion

This combination of priority inheritance and RCU permits the -rt patchset
to provide real-time latencies on mid-range multiprocessors. But priority
inheritance is not a panacea. For example, one could imagine applying some
form of priority inheritance to real-live users who might be blocking
high-priority processes, as shown in Figure 13. However, I would rather
we did not.

Comments

Comment viewing options

I have a question about that "interrupt" discribed in figure 6-8.
Could you tell me if this kind of interrupt happens on one CPU, from cpu catch a INTn do tophalf instructions to deal with the blue rectangle(maybe a softirq() of bottomhalf),do all of these was executed by one CPU?
waiting for your explanation!
thank you!

There is a small portion of code that happens in the "top half", or hard irq context. On a non-PREEMPT_RT system he actual interrupt handler code would also execute in hard irq context. However, in PREEMPT_RT, the handler instead executes at process level in a kernel thread executing at real-time priority.

If this handler uses a bottom half, or softirq, then the softirq will be scheduled as another kernel thread, also executing at real-time priority.

The softirq interface is such that the softirq handler executes on the same CPU where the raise_softirq() request ran, Normally the system would be configured so that the hard irq and irq handler ran on the same CPU as well. (I believe that it can be configured otherwise, but I don't know of a good reason to do so.)

There are a number of open-source audio projects. Two that come to mind immediately are Jack and Pulse audio, both of which were enthusiastic about testing out the -rt patchset. The only RT SIP servers that I am aware of are proprietary, ditto with object brokers.

There has been some effect of -rt on networking, but many real-time applications use lower-level protocols (such as UDP) or special transports (such as Infiniband) in order to retain greater control over latency. That said, there are special real-time protocols, such as the DDS suite.

Usually, real-time operating systems are designed for responsiveness, and usually give up throughput performance in favor of responsiveness. For one look at this issue, see my recent OLS paper on real time vs. real fast.