How PTS benefits embedded systems designs

In Part 1, we investigated Preemption-Threshold Scheduling (PTS)—a trademarked technology from Express Logic—and explained what it is and how it works. In Part 2, we will focus on schedulability, and how PTS enables real-time systems to be schedulable, even under worst case conditions, while still improving efficiency over traditional approaches. Other benefits of PTS also will be identified and discussed.

Real-time analysis for PTS

Analysis basics
Real-time analysis techniques allow us to determine if a set of tasks is schedulable, meaning all of its tasks will meet their deadlines in all possible scheduling situations. These analysis techniques depend on knowing how often each task i can run (minimum execution period Ti), how long the task takes to run (worst-case execution time Ci), and the deadline (Di) for that task. If tasks share mutually exclusive resources through critical sections, we also need to know the maximum time ξj,k spent in critical section k of task j.

Priority-driven scheduling policies are typically used by real-time kernels. A priority-driven scheduling policy ensures that contention for resources (e.g., processor time) is always resolved in favor of the highest priority job which is ready to run.

Scheduling policy can be static or dynamic. A fixed-priority scheduling policy does not change priorities at run time, while a dynamic-priority policy does. Classical examples of fixed-priority scheduling algorithms are Rate Monotonic (RM), Deadline Monotonic (DM), and Audsley’s optimal priority assignment algorithms.

Unlike fixed-priority scheduling, a dynamic-priority scheme can assign different priorities to different jobs belonging to the same task. The most popular dynamic-priority scheduling policy is the earliest deadline first (EDF) algorithm. In EDF scheduling, each job's priority is assigned dynamically according to its absolute deadline: the earlier the deadline, the higher the priority. Below we discuss some basic schedulability tests for both schemes.

Fixed-priority schemes can be analyzed using level-i busy period analysis: a bound is computed on the response times of all jobs of a task by essentially simulating the worst-case scenario those jobs can experience. This bound is referred to as the worst-case response time (WCRT) for the task and denoted by Wi. If the WCRT of each task is no larger than its respective deadline, the entire workload is said to be schedulable.

The WCRT of a task can be obtained by considering the response time of a single job released under this worst-case scenario. The response time has two components: equal- or higher-priority tasks create interference, and lower-priority tasks can cause blocking. The WCRT can be computed using recursive equations that account for both effects.
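The recursive computation can be sketched as a simple fixed-point iteration. This is a minimal illustration, not Express Logic's implementation; it assumes tasks are given as (C, T) pairs sorted by priority (index 0 = highest) with deadlines no larger than periods.

```python
# Worst-case response time (WCRT) analysis for fixed-priority tasks:
# a minimal sketch under the assumptions stated above.
from math import ceil

def wcrt(tasks, i, blocking=0):
    """Iterate W = C_i + B_i + sum(ceil(W/T_j) * C_j) over the
    higher-priority tasks j until W converges, or give up once it
    exceeds the period (the task set is then unschedulable)."""
    C_i, T_i = tasks[i]
    W = C_i + blocking
    while True:
        interference = sum(ceil(W / T_j) * C_j for (C_j, T_j) in tasks[:i])
        W_next = C_i + blocking + interference
        if W_next == W:        # fixed point reached: this is the WCRT
            return W
        if W_next > T_i:       # response exceeds the period: unschedulable
            return None
        W = W_next

# Hypothetical example task set: three (C, T) pairs, highest priority first.
tasks = [(1, 4), (2, 6), (3, 12)]
print([wcrt(tasks, i) for i in range(3)])   # [1, 3, 10]
```

Each WCRT is at most the task's period here, so this example workload is schedulable.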

In a dynamic-priority scheme such as EDF, with deadlines equal to periods, we merely need to ensure that the total processor utilization is no more than 1.
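That utilization test is a one-liner. A minimal sketch, again assuming (C, T) pairs with deadlines equal to periods:

```python
# EDF schedulability via total utilization: sum of C/T must not exceed 1.
def edf_schedulable(tasks):
    return sum(C / T for (C, T) in tasks) <= 1.0

print(edf_schedulable([(1, 4), (2, 6), (3, 12)]))  # True: utilization ~0.83
print(edf_schedulable([(3, 4), (2, 6)]))           # False: utilization ~1.08
```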

Analysis for PTS
In a fixed-priority scheme, using PTS leads to a different type of worst-case scenario. In addition to interference from higher-priority tasks, a task Ti may also be blocked by a lower-priority task Tj if πi ≤ γj, that is, if Tj's preemption-threshold is at least Ti's priority. In this case, the WCRT of Ti occurs when the task with the longest WCET among those with lower priority but sufficiently high preemption-threshold is released one clock cycle before Ti. This blocking affects the task's worst-case start time, worst-case finish time, and worst-case response time.
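The extra blocking term this scenario introduces can be sketched as follows. This is an illustrative calculation only, with hypothetical task records; larger numbers mean higher priority, 'pi' is the priority and 'gamma' the preemption-threshold.

```python
# Blocking a task can suffer under PTS: the longest WCET among
# lower-priority tasks whose preemption-threshold is at least the
# task's own priority (once running, task i cannot preempt them).
def pts_blocking(tasks, i):
    t_i = tasks[i]
    candidates = [t['C'] for t in tasks
                  if t['pi'] < t_i['pi'] and t['gamma'] >= t_i['pi']]
    return max(candidates, default=0)

tasks = [
    {'C': 1, 'pi': 3, 'gamma': 3},  # highest priority
    {'C': 2, 'pi': 2, 'gamma': 3},  # threshold raised to 3: can block task 0
    {'C': 4, 'pi': 1, 'gamma': 1},  # no threshold boost
]
print([pts_blocking(tasks, i) for i in range(3)])  # [2, 0, 0]
```

This blocking term would then be fed into the same recursive WCRT equations used in the fully preemptive case.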

To use PTS with a dynamic-priority scheme, the system's preemption relations need to be known at design time (i.e., which tasks/jobs can preempt which tasks/jobs). To this end, Gai et al. showed that the concept of preemption levels introduced earlier by Baker (1991) can be used. Preemption levels provide a static mapping that enables offline analysis of dynamic-priority scheduling schemes. A preemption level λi is assigned to each task Ti statically at design time, much like the task's priority in a fixed-priority scheme (i.e., the preemption levels of all jobs belonging to a task are the same). As explained by Baker, the preemption level mapping is arbitrary as long as it satisfies one property: a job of Ti is not allowed to preempt a job of Tj unless it has higher priority and λi > λj. Under the EDF scheduling policy, this property is satisfied if preemption levels are assigned inversely proportional to the tasks' periods.

After preemption levels are assigned, the system can be analyzed statically as in the fixed-priority case and preemption-thresholds can be assigned. Hence, through the use of preemption levels, one can transform a dynamic-priority system into an equivalent static fixed-priority scheme and analyze it accordingly.
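A minimal sketch of that level assignment, with hypothetical helper names: levels are ranked inversely by period (shortest period gets the highest level), and the preemption test under EDF combines the dynamic deadline comparison with the static level comparison.

```python
# Assign preemption levels inversely proportional to periods:
# shortest period -> highest level. All jobs of a task share its level.
def assign_levels(periods):
    order = sorted(range(len(periods)), key=lambda i: -periods[i])
    levels = [0] * len(periods)
    for level, i in enumerate(order, start=1):
        levels[i] = level
    return levels

def may_preempt(deadline_i, deadline_j, levels, i, j):
    """Under EDF with preemption levels, a job of task i may preempt a
    job of task j only if it has an earlier absolute deadline AND a
    strictly higher preemption level."""
    return deadline_i < deadline_j and levels[i] > levels[j]

levels = assign_levels([4, 6, 12])
print(levels)                                 # [3, 2, 1]
print(may_preempt(3, 10, levels, 0, 2))       # True
```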

Resource sharing and PTS
Often there are resources (e.g., global variables, buffers, hardware devices) that tasks must access in a mutually exclusive manner. In fully preemptive and semi-preemptive schemes like PTS, the mechanisms that enforce mutual exclusion can block a task until another task releases the resource. In fully non-preemptive schemes, mutual exclusion is not a problem, since tasks cannot be preempted while accessing a resource.

Many resource sharing protocols have been proposed, including the priority inheritance protocol (PIP), the priority ceiling protocol (PCP), and the stack resource policy (SRP). The PIP prescribes that if a higher-priority job becomes blocked by a lower-priority one due to a shared resource, the job causing the blocking should execute with a priority that is the maximum of its own nominal priority and the priorities of the job(s) it is blocking; that is, it should inherit the priority of the highest-priority blocked job. However, the PIP does not prevent deadlocks, and a job can be blocked multiple times under PIP.

The PCP solves these PIP problems by assigning each shared resource ρk a priority ceiling, denoted ceil(ρk), which is an upper bound on the priority of any job that may lock the resource. A job may enter a critical section only if its priority is higher than the priority ceilings of all resources currently locked by other jobs. This prevents deadlocks and guarantees that a job can be blocked at most once.

Similar to the PCP, the Stack Resource Policy (SRP) was developed by Baker to enable real-time tasks to share mutually exclusive resources while preventing deadlocks. The SRP can be used with dynamic-priority as well as static-priority scheduling. The SRP ensures that once a task starts execution, it cannot be blocked until completion; a task can only be preempted by higher-priority tasks. However, the execution of a task Ti with the highest priority in the system could be delayed by a lower-priority task that is locking some resource and has raised the system ceiling to a value greater than or equal to the preemption level λi. As in the PCP, this delay is called the resource blocking time and is denoted by Bri.
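The SRP admission rule can be sketched in a few lines. This is an illustrative model, not an RTOS implementation; it assumes larger numbers mean higher priority and preemption level, and that the system ceiling is the maximum ceiling over currently locked resources.

```python
# SRP admission test: a ready job may start only if its priority beats
# the running job AND its preemption level exceeds the system ceiling.
def srp_may_start(job_priority, job_level, running_priority, locked_ceilings):
    system_ceiling = max(locked_ceilings, default=0)
    return job_priority > running_priority and job_level > system_ceiling

# A high-priority job is held off while a lower-priority job holds a
# resource whose ceiling matches the high-priority job's level...
print(srp_may_start(job_priority=5, job_level=3,
                    running_priority=1, locked_ceilings=[3]))  # False
# ...and may start once that resource is released.
print(srp_may_start(job_priority=5, job_level=3,
                    running_priority=1, locked_ceilings=[]))   # True
```

Note that the delay happens before the job starts, never after: this is what makes a shared run-time stack safe.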

The blocking time due to shared resources, Bri, can now be used to refine the schedulability conditions. The maximum blocking a job can experience due to a shared resource is the longest critical section ξj,k of any task with a lower preemption level (equivalent to priority in the case of fixed-priority scheduling policies) but with a ceiling greater than or equal to the preemption level of Ti. The blocking term in the schedulability equations is then refined to be the maximum of the blocking due to lower-preemption-level tasks with higher preemption-thresholds and the blocking due to shared resources.
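The refined blocking term can be sketched as follows, with hypothetical data structures: each lower-level task carries its critical-section lengths paired with the ceilings of the resources they lock.

```python
# Resource blocking Br_i: the longest critical section of any task with
# a lower preemption level whose resource ceiling >= task i's level.
def resource_blocking(level_i, lower_tasks):
    """lower_tasks: list of (task_level, [(cs_length, cs_ceiling), ...])."""
    worst = 0
    for level_j, sections in lower_tasks:
        if level_j < level_i:
            for length, ceiling in sections:
                if ceiling >= level_i:
                    worst = max(worst, length)
    return worst

# Total blocking: the max of resource blocking and the blocking caused
# by lower-level tasks with higher preemption-thresholds.
def total_blocking(level_i, lower_tasks, threshold_blocking):
    return max(resource_blocking(level_i, lower_tasks), threshold_blocking)

lower = [(1, [(5, 3), (2, 1)]), (2, [(4, 2)])]
print(resource_blocking(3, lower))    # 5
print(total_blocking(3, lower, 6))    # 6
```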

Gai et al. extended the Stack Resource Policy to support PTS, resulting in the stack resource policy with preemption-thresholds (SRPT). They showed how this enables the use of PTS with shared resources while preventing unbounded priority inversion and deadlock and enabling task synchronization. From an implementation point of view, the SRPT also allows tasks to share a single stack: a task is never blocked by a shared resource after it starts; it simply cannot start executing if its preemption level is not high enough.

Blocking due to self-suspension is not an issue either, since tasks in the same non-preemptive group share the same stack space. If a task delays itself and later resumes execution, it will not corrupt its stack space, because it only shares that stack space with tasks that cannot preempt it. Tasks that are allowed to preempt the suspended task, on the other hand, are located in different areas of memory and will not cause any stack corruption.

Additional benefits of PTS
PTS’s ability to reduce preemptions and stack space requirements opens the doors to other system optimizations. Here are two examples from a large field of research.

Increasing effective size of SPM using PTS
By ensuring that tasks within a non-preemptive group cannot preempt each other, we enable those tasks to share stack space for automatic variables. One benefit is to reduce stack memory requirements, as described above. Another is to fit more data into an existing fast memory. This is the goal of Data Allocation with Real-Time Scheduling (DARTS), as described in Ghattas, Parsons & Dean, 2007 and Kang & Dean, 2010.

Non-real-time systems may use caches to make external memory look fast. However, caches may occasionally deliver slow performance due to misses. Predicting when these misses will occur is challenging and requires sophisticated analysis. We care about how many cache misses will occur because they affect a task’s worst-case execution time (WCET), which is needed to calculate if a real-time workload is schedulable or not. An alternative to caches is to use a small fast local memory (Scratchpad memory, or SPM) which is managed by software. All memory accesses to the SPM are completely predictable, which simplifies the calculation of WCET.


PTS may be used in order to disable preemptions among real-time tasks where possible, allowing them to share SPM space. As more variables are moved into SPM, tasks run faster, potentially enabling even more preemptions to be disabled and allowing the cycle to repeat. We have developed an automated toolchain to perform the necessary analyses and code transformations, as shown in Figure 7. This toolchain targets the ARM7 architecture, and we have evaluated it on real hardware with sample code (Kang & Dean, 2010).


Figure 8 shows how DARTS uses PTS and SPM to reduce memory access times and eliminate preemptions, resulting in lower processor utilization. Allocating all data to external SRAM leads to processor utilization of about 53%. An SPM large enough to hold all data (1 KB) would reduce processor utilization to about 45%. The DARTS technique can operate at several levels of granularity (task, procedure, or variable). For example, at the finest granularity (variable), the toolchain splits stack frames to place the most frequently used automatic variables in SPM while leaving those rarely accessed in slower SRAM. DARTS makes a small SPM (64 or 128 bytes) work almost as well as a much larger SPM (512 or 1024 bytes). It is also important to note that this memory performance is easily calculated and fully predictable, unlike with a cache memory.

Saving energy using PTS
Saving energy and reducing power are critical issues for many systems these days. Batteries have limited energy, and high-performance processors have limited power-dissipation capabilities. The key technique used is dynamic voltage and frequency scaling, often shortened to DVS. If a processor running an application has idle time, then power and energy can be reduced in two ways. First, when it is idle, the processor can be turned off or put into a low-power sleep mode. Second, the processor clock rate can be slowed down to reduce or eliminate the idle time. Reducing the clock rate allows the processor supply voltage to be reduced as well. There is a quadratic relationship between voltage and power: dividing the voltage by a factor of two divides the power by a factor of four. For over a decade, high-performance processors (server, desktop, and embedded) have used DVS to meet their thermal and energy budgets.
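The quadratic relationship follows from the common CMOS dynamic-power model, P ∝ C·V²·f. A minimal arithmetic sketch, assuming switched capacitance stays constant:

```python
# Dynamic power relative to nominal after scaling supply voltage and
# clock frequency, under the model P ∝ C * V^2 * f.
def relative_power(v_scale, f_scale):
    return (v_scale ** 2) * f_scale

# Halving the voltage alone divides power by four, as noted above:
print(relative_power(v_scale=0.5, f_scale=1.0))   # 0.25
# Halving voltage and frequency together divides power by eight:
print(relative_power(v_scale=0.5, f_scale=0.5))   # 0.125
```

Since a lower frequency permits a lower voltage, slowing the clock yields better-than-linear power savings, which is what makes DVS attractive.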

Researchers have developed methods to apply DVS to hard real-time systems. One approach is to determine the optimal speed (or slowdown factor) for each task and use the scheduler to adjust the processor voltage and speed at each context switch. Further research builds on this by adjusting the operating speed as the system runs. The main benefit is that slack time (resulting from tasks finishing before the predicted worst-case time) can be reclaimed through sleep or additional processor slowdown.

Jejurikar and Gupta investigated how to improve energy efficiency for real-time systems with PTS and DVS (Jejurikar & Gupta, 2004). Reducing preemptions can save energy. As expected, some benefits come from a shorter active time for the system due to fewer context switches, allowing the system to sleep or run at a lower speed and frequency.

However, reducing preemptions brings other important benefits. Preemptions pollute the reference locality on which caches, branch target buffers, and virtual-memory translation look-aside buffers rely. Misses in these structures reduce execution speed. They also increase total system energy, since servicing them requires accesses to main memory, which is much less energy-efficient. Applying DVS to a multithreaded system will increase the context switch overhead, due to the added delays (e.g., 20 ms) of switching the processor operating voltage and frequency. And preempted tasks may have been using peripherals that remain powered during the preemption, wasting energy.


Jejurikar and Gupta developed methods to combine processor slowdown (DVS) and PTS for real-time energy-efficient systems. They also developed slack-reclamation schemes that work with PTS to adaptively save energy based on actual execution characteristics. They evaluated their techniques using simulations and synthetic workloads, and found that using PTS reduced energy by up to 21%, as shown in Figure 9.

Summary
Preemption-threshold scheduling enables system designers to reduce preemptions while still meeting real-time deadlines. There is a well-developed body of theoretical work showing how to use PTS and still meet all system deadlines.

Reducing preemptions makes a system run faster. Less time is wasted on context switches. Caches work more effectively due to less pollution resulting from preemption. Less stack space may be required, enabling systems to share faster memory. Power optimization becomes easier with fewer preemptions.

Alexander G. Dean, PhD, is associate professor in the Department of Electrical and Computer Engineering at North Carolina State University and a member of its Center for Efficient, Scalable and Reliable Computing. He received his BS EE from the University of Wisconsin and MS and PhD ECE degrees at Carnegie Mellon. His research focus in embedded systems includes compiling for concurrency and performance, energy-efficient use of COTS processors, memory allocation, and benchmarking for real-time systems and robust embedded system design.