A Survey of Task Schedulers

There are many different kinds of task schedulers available to developers of embedded and real-time software. They range from a simple cyclic executive that you can build at home, through full-featured priority-based preemptive schedulers that are available commercially, and on to "futuristic"-sounding deadline schedulers. Depending on the nature of your application and its I/O requirements, you can choose an appropriate scheduler from the wide spectrum described here. The table below lists the kinds of task schedulers presented in this paper. As noted, some of them are limited in the kinds of hardware device input and output (I/O) operations they can support.

TASK SCHEDULER TYPE                      TASK EXECUTION                              DEVICE I/O METHOD(S)
Endless Loop                             No tasks.                                   Polled I/O Only.
Basic Cyclic Executive                   As often as possible.                       Polled I/O Only.
Time-Driven Cyclic Executive             Single frequency.                           Polled I/O Only.
Multi-rate Cyclic Executive              Multiple frequencies.                       Polled I/O Only.
Multi-rate Executive for Periodic Tasks  Multiple frequencies, at higher precision.  Polled I/O Only.
Multi-rate Executive with Interrupts     Multiple frequencies, at higher precision.  Polled and Interrupt-driven I/O.
Priority-based Preemptive Scheduler      Periodic and Non-periodic tasks.            Polled and Interrupt-driven I/O.
Deadline Scheduler                       Periodic and Non-periodic tasks, plus       Polled and Interrupt-driven I/O.
                                         better handling of real-time requirements.

THE ENDLESS LOOP

For very simple embedded systems, the most basic way to write application software is as an endless loop. The activities programmed within the loop are executed in sequence. Branches and nested loops are fine, as long as the code loops back to the beginning for another go-round when it is done executing. The endless loop is the easiest way to get an embedded system to keep on working for you.

Embedded systems software typically does not just run once from start to finish. Its objective is not simply to collect some inputs and then process them to produce some outputs in one-shot fashion. Instead, most embedded systems need to go through many cycles of input collection, processing and output production, quickly, automatically and repeatedly. For example, a medical monitoring device is not expected to analyze a single patient heartbeat and then shut down; rather, it is expected to keep analyzing heartbeats, beat after beat, for a long period of time. The same is true of an automotive engine monitoring system or an air traffic control system. Embedded systems are often expected to repeat these software cycles for days or weeks at a time without human intervention, and sometimes for months or even years. Putting the processing for these repeating cycles within an endless loop is a simple way of achieving this.

Example (in pseudocode, "structured English"):

    DO FOREVER
        Request Input Device to make a Measurement
        Wait for the Measurement to be ready
        Fetch the Value of the Measurement
        Process the Value of the Measurement
        IF Value is Reasonable THEN
            Prepare new Result using Value
        ELSE
            Report an Error
        END-IF
        Request Output Device to deliver the Result
        Wait for the Result to be output
        Confirm that output is OK
    END DO

In some simple embedded systems this style of programming works well, especially if the software can complete the sequence of code and loop around quickly enough. In other embedded systems, however, it results in performance that is too slow. Keep in mind also that interrupts delivering data from hardware devices cannot be handled in this style of programming. [They'd screw things up, in ways that we'll see later on.] If I/O devices are to interact with the software in the loop, they can only be polled.

BASIC CYCLIC EXECUTIVE

In somewhat more complex embedded systems, the idea of the endless loop can be extended. Such systems may contain hundreds or thousands of lines of code, so software designers like to organize the code into separate units referred to as tasks. These tasks (sometimes also called processes) should be as independent of one another as possible, so that each deals with a separate application issue and interacts as little as possible with the other tasks. In a software design using a basic cyclic executive, these tasks execute in a fixed sequence within an infinitely repeating loop. This is much like the endless loop design, but now dealing with larger chunks of software that we are calling tasks. This style is sometimes called round-robin scheduling. It is illustrated in Figure 1 below, where each rectangle depicts a separate task.

Figure 1: Basic Cyclic Executive, showing Round-Robin Scheduling

These tasks can pass information to one another easily, simply by writing and reading shared data. That's because every task always runs to completion before the next task begins running, so there's no danger of a task getting incomplete data from a previous task. Here too, interrupts from hardware devices cannot be handled in this style of programming; I/O devices must be polled if they are to interact with tasks in the loop. In some sense this can be thought of as real-time task scheduling, if all of the software in the loop executes quickly and the loop repeats at a very rapid rate.

TIME-DRIVEN CYCLIC EXECUTIVE

For some applications, the view of real time taken by a basic cyclic executive is not precise enough. A basic cyclic executive tries to run its tasks as quickly and as often as possible, but in more sophisticated applications precision of timing is often more important than raw speed. A time-driven cyclic executive begins to address the requirement for more precise timing. In this scheme, a single hardware timer interrupt triggers the execution of all tasks. The tasks execute one after another, each one running to completion before the next one begins. For a time-driven cyclic executive to work correctly, the final task in the chain must complete its execution before the next timer interrupt arrives. The rate of hardware timer interrupts is the rate at which all tasks must execute. See Figure 2.

Figure 2: Time-Driven Cyclic Executive

Although there is a hardware timer interrupt involved here, tasks can still pass information to one another easily, simply by writing and reading shared data, because every task runs to completion before another task begins running. Interrupts from hardware devices (other than the timer) cannot be handled in this style of programming; I/O devices must be polled if they are to interact with tasks.

MULTI-RATE CYCLIC EXECUTIVE

The time-driven cyclic executive above assumes that all tasks need to run at the same rate of repetition. But in most real-world applications, different tasks may need to run at different rates.

A modified time-driven cyclic executive, called the multi-rate cyclic executive, can handle this need reasonably well when each higher rate is an integer multiple of the base rate. The idea is simple: the base-rate tasks run once per timer interrupt, and a higher-rate task runs several times per timer interrupt, that number being the integer multiple of the base rate. The repeated executions of the higher-rate task should be spaced as evenly as possible within the sequence of tasks that follows each timer interrupt. The base-rate period is often called the major cycle, and the periods of the higher rates are called minor cycles.

The example illustrated in Figure 3 shows a system of 10 tasks that execute at the base rate (e.g., 10 Hz, if the timer delivers 10 interrupts per second). In addition, there is an eleventh task, marked by a star, which should execute at 40 Hz, 4 times the base rate. This is done by having the starred task appear 4 times in the chain of task executions that follows each timer interrupt.

LIMITATIONS OF CYCLIC EXECUTIVES

Cyclic executives have been shown to be effective in scheduling application software tasks while remaining fairly simple to implement. With the help of a hardware timer interrupt, they can run tasks at a regular rate, or even run different tasks at different rates. Tasks can communicate with one another through shared data without special concern about data integrity. Hardware devices (other than the timer) are polled rather than interrupt-driven.

The restriction that hardware devices must be polled when using a cyclic executive is often a serious drawback. If a device is not polled frequently enough, important transient events might be missed. If it is polled too frequently, much of the processor's power may be wasted executing the device polling software. For these reasons, interrupt-driven peripheral devices are usually preferable for I/O.

Another objection to cyclic executives is that the timing of execution of a task cannot be controlled precisely. Even when a hardware timer triggers the execution of a chain of tasks, only the first task in the chain has its start time determined precisely by hardware. The second task starts whenever the first ends, and so on. If these tasks contain code with varying processor load, such as data-dependent loops, every later task in the chain will start at a time influenced by the loads of the tasks before it. So the timing of tasks further along the chain cannot be finely controlled.
Even if no task contains code with varying processor load, the timing of individual tasks is only approximate. This can be seen in the illustrated example of the multi-rate cyclic executive. In that diagram, the starred task is required to execute at a rate of 40 Hz. In other words, there should be precisely 25 milliseconds (25,000 microseconds) between successive execution starts of the starred task. If the diagram is viewed as a circle whose full circumference represents one 10 Hz base period, then the starred task should start at angles of precisely 0, 90, 180 and 270 degrees. But it does not necessarily do so. Sometimes it starts a tad early, sometimes a tad late. It all depends on when the previous task finished executing its code, and how long the following task takes. Remember, each task must run to completion and cannot be interrupted by another task in mid-execution.

Some software designers have in the past tried to solve these timing problems by counting the machine cycles of the instructions executed by each task, in order to figure out precisely how long each task will take. The designer then had to determine exactly how much of a certain task could execute before a precisely timed task needed to run. That part of the task would be allowed to run, then the precisely timed task would be inserted for execution, and the remainder of the first task would run some time later. Effectively, the task would be cut in two, as shown in Figure 4.

Figure 4: Multi-Rate Cyclic Executive, with Black Task cut in two to allow Precisely-Timed Starred Task to Run

But this solution itself gives rise to several new problems:

(a) If the tasks involved in a mid-task switch share some data structures, those data could be left in an inconsistent state by the mid-task switch. This could result in numeric errors in the outputs of either of the tasks involved.

(b) Every time software maintenance changes or adds code in the tasks that run before the mid-task switch, machine cycles need to be re-counted and task timings recalculated. A task might need to be cut apart differently for the mid-task switch in the new code.

In other words, this solution is itself an error-prone and excruciatingly tedious method of building software. So rather than a solution, it should be seen as an example of an attempt to use a cyclic executive beyond its realm of usefulness. Cyclic executives should not be used in situations where timing requirements are so precise and critical that there is thought of surgically cutting a task into two sections.

MULTI-RATE EXECUTIVE FOR PERIODIC TASKS

If all tasks are periodic but required to run at differing rates, then a multi-rate (non-cyclic) executive can often be better than a cyclic executive. In such a scheduler, timer interrupts must be delivered at a rate equal to the least common multiple of the rates of all the tasks. At each timer interrupt (or tick), some of the tasks are made to execute. For example, if tasks need to execute at 50 Hz, 60 Hz and 100 Hz, then timer interrupts must be delivered at a rate of 300 Hz. Since 300/100 = 3, 300/60 = 5 and 300/50 = 6, the 100 Hz task will be made to execute on every third tick, the 60 Hz task on every fifth tick, and the 50 Hz task on every sixth tick.
If the tasks do not need to be time-synchronized with each other, they can be executed on ticks that are offset from one another. For example, the 3 tasks described here need not all run at tick number zero: perhaps the 100 Hz task would first run at tick #0, the 60 Hz task at tick #1, and the 50 Hz task at tick #2. A simpler example is shown in Figure 5. Here there are only 2 rates, with the higher rate being four times the lower rate, so that the minor cycle is one quarter of the major cycle.

Figure 5: Multi-Rate (Non-Cyclic) Executive, with Minor Cycle = 1/4 * Major Cycle

Every task must run to completion before another task begins running. As with cyclic executives, tasks can pass information to one another easily, simply by writing and reading shared data. All hardware devices (other than the timer) must be polled.

CAUTION: ADDING INTERRUPTS

The restriction to polled hardware devices in all the previous types of schedulers is a serious limitation. Modern hardware I/O devices are typically interrupt-driven. But interrupt-driven devices can cause problems of their own if they are not handled properly in software. Very often, if an Interrupt Service Routine (ISR) tries to pass data to the very same task that it is interrupting, the task may not receive and handle the passed data properly. For example, a task may begin processing some data, then an interrupt service routine might interrupt it and update the data, after which the task reads some more data for further processing. The net result: part of the processing in the task is done on old data values and another part on new data values, possibly producing inconsistent outputs from the task. In extreme cases, the outputs might be nonsensical.

Another example is a task and an interrupt service routine communicating through a shared data table, as shown in Figure 6. If an interrupt occurs and is serviced while the task is in the midst of reading from the table, the task might well read old data from part of the table and new data from other parts of it. This combination of old and new data might well be inconsistent, and might well lead to erroneous results.

Figure 6: Errors Can Result, if an Interrupt Service Routine ("ISR") and a Task Share a Data Area

For instance, the Task shown in the upper right corner of Figure 6 might be interrupted when it is in the middle of reading from the shared data area. If the ISR writes into the same shared data area, the Task will read a combination of some newly updated data plus some data that had not yet been updated. [New Yorkers would call this a mish-mash.] The updated and old data together may be incorrect in combination, or may not even make sense.

Let's assume this is a shared data area containing a temperature measurement, whose current contents are the value "99 F". A task begins reading from the shared data area, character after character. The task is interrupted in the middle of reading by an ISR that receives the value "10 C" from the temperature sensor hardware, and the ISR writes this new value into the shared data area. When the ISR is done writing, the task continues reading from where it had left off. The net result is that the task could read a value like "90 C" or "99 C" (rather than "99 F" or "10 C"), depending on precisely when the interrupt happened and the ISR ran. The partially updated values are clearly incorrect, and they are caused by delicate timing coincidences that are very hard to debug or to reproduce consistently.

Hence, the integration of interrupt-driven software with task schedulers requires special care, particularly in the exchange of information between ISRs and tasks. Some software designers have gone so far as to disable interrupts whenever a task prepares to read or write a data area it shares with an ISR. But this is quite a "brute-force" approach that temporarily severs the connection between software and hardware, effectively un-embedding the embedded system and risking the loss of data that need to go to or from the hardware.
[Later in this paper we'll see some gentler, less intrusive ways of ensuring reliable information exchange between ISRs and tasks.]

MULTI-RATE EXECUTIVE WITH INTERRUPTS

One clever idea for avoiding the pitfalls we have seen when ISRs and tasks interact is to have the ISRs write their input data into one set of buffers, and have the tasks use data from a completely separate set of buffers. At every clock tick (of the multi-rate executive for periodic tasks, for example), interrupts are turned off and input data are copied from the ISR buffers to the task buffers. Then interrupts are turned back on, and the tasks scheduled for that tick are allowed to execute. This is illustrated in Figure 7, below. If clock ticks occur once per millisecond or once per 10 milliseconds, for example, this approach is much less intrusive to software-hardware connections than disabling interrupts every time a task accesses a shared data area. In this way, data can be transferred from ISRs to tasks without danger of inconsistency (since interrupts are disabled during the transfer), while interrupts remain enabled and active whenever the application tasks themselves are running. This technique works as long as all scheduled tasks finish running before the next clock tick.

Figure 7: Multi-Rate Executive with Interrupts

This kind of scheduler can become quite complex, and can no longer be written as a weekend garage project. With this scheduler, as with the previous ones, every task runs to completion before another task begins running, and tasks can pass information to one another easily, simply by writing and reading shared data. However, hardware devices are no longer restricted to polled I/O: they can be interrupt-driven. But it is important to note that information delivered by an interrupt and acquired into software by an ISR is not immediately passed on to a task for further processing. ISR data are transferred to task buffers only at the next timer interrupt. In some applications, this could be an unacceptable delay or complication.

GETTING FASTER RESPONSE: PREEMPTIVE SCHEDULING

The schedulers surveyed so far are called non-preemptive, because switching between tasks takes place only when one task has fully completed its execution and another task is to start its execution (from the beginning of its code). Faster response can often be obtained by moving to a preemptive scheduler. With a preemptive scheduler, switching between tasks can take place at any point within the execution of a task, even when the task isn't yet done executing its code. "Preemption" really means that the scheduler is allowed to stop any task at any point in its execution (even in the middle of a high-level language statement). For example, when an interrupt occurs, its ISR might want to say something like "I don't care which task was executing before my interrupt, and I don't want to wait for the next timer tick. I would like Task #67 to begin executing right now!" A preemptive scheduler can do this, as shown in Figure 8.

Figure 8: Preemptive Scheduling Example (Purple Task is Preempted)

However, a preemptive scheduler is orders of magnitude more complex to build than any of the non-preemptive schedulers we've looked at so far, and it gobbles up lots of RAM for storage of task contexts and other task status information. Such a scheduler could not be written in a month of Sundays as a garage project. Most commercially available real-time operating systems have preemptive schedulers as their underlying task scheduling engines. When you begin to think about using a preemptive scheduler for your embedded project, after rejecting all of the non-preemptive "home-made" schedulers surveyed in the first two-thirds of this paper, that's when you need to start thinking about getting an off-the-shelf Real-Time Operating System ("RTOS").

If you feel that the embedded system you want to develop cannot be built with restrictions such as:

* Only Periodic Execution of Tasks, or...
* Limited to 1 Frequency of Task Execution, or a Small Number of (Simply-Related) Frequencies, or...
* Large Time Jitter in Execution of Tasks, or...
* Polled I/O Only, or Interrupt I/O with Major Delays in Response to Interrupts,

... then you really need to look at using an off-the-shelf RTOS. Attempting to write your own preemptive scheduler would be a major project in itself, one that would seriously delay the development of the embedded system in which you want to use it.

With a preemptive scheduler, hardware devices can be polled or interrupt-driven, or both. Information delivered by an interrupt and acquired into software by an ISR can immediately be passed on to a task for further processing, to achieve very rapid response.

CAUTION: PREEMPTIVE SCHEDULERS BRING UP NEW ISSUES

Preemptive schedulers offer the software developer many benefits beyond what can be achieved with simpler home-made schedulers. They allow embedded application software to be scheduled with very rapid responsiveness. But their sophistication brings up new issues of which a software developer must be aware.

The first issue a software developer must consider is: which tasks should a given task be allowed to preempt if they are running, and which tasks should it not be allowed to preempt? This question is answered by assigning a priority number to each task. Tasks of higher priority can preempt tasks of lower priority, but tasks of lower priority cannot preempt tasks of higher priority. A preemptive scheduler needs to be told the priority of each task that it schedules.

A second issue a software developer must consider is that tasks which can be preempted, and which can preempt others, cannot safely pass information to one another simply by writing and reading shared data. The simple methods for passing information between tasks that worked for non-preemptive schedulers no longer work for preemptive schedulers. The problem is very similar to the one we saw between ISRs and tasks: if one task preempts another while the other task is in the middle of reading from a shared data table, and the preempting task writes new data into the table, then the preempted task might well read old data from part of the table and new data from another part. This combination of old and new data might well be inconsistent, and might well lead to erroneous results. This is illustrated in Figure 9.

Figure 9: Errors Can Result, if Preempting Tasks Share a Data Area

This is in fact the same problem that occurs if an ISR and a task try to communicate by writing and reading shared data, as shown earlier in this paper. To help the software developer prevent these problems when tasks share information, an operating system with a preemptive scheduler also needs to provide mechanisms for passing information between tasks (and to and from ISRs). In fact, that's the primary difference between "just" a task scheduler and a "full-fledged" RTOS: the RTOS provides both a task scheduler (usually priority-based and preemptive) and one or more mechanisms for reliably passing information between tasks. The specific mechanisms vary from one RTOS to another. Most real-time operating systems provide messaging via queues, plus some form of semaphores. Different operating systems may provide additional mechanisms such as event flags, pipes, mutexes, and/or asynchronous signals. [For more information about inter-task communication and synchronization mechanisms in popular real-time operating systems, attend one of our introductory courses "Introduction to Real-Time Operating Systems" or "Introduction to Embedded Systems and Software".]

Figure 10 shows a task passing a message through a message queue for another task that it has preempted. The message passing services of the real-time operating system guarantee the integrity of the message passed from task to task (or from ISR to task, or from task to ISR).

These mechanisms provided by the operating system must be used every time information is passed between tasks, in order to ensure reliable delivery of that information in a preemptible environment.

DEADLINE SCHEDULING

Users of off-the-shelf priority-based preemptive schedulers for "hard" real-time systems sometimes have the following objection: "Where do I tell the scheduler what the deadlines for my tasks are, so that the scheduler will make sure they're met?" The fact of the matter is that you can't tell these schedulers about task deadlines. They have no way of dealing with that kind of information. All they want to be told is each task's priority, and then they do their task scheduling based on the priority numbers.

The mapping between deadlines and priorities is often not straightforward. In fact, it's often downright impossible for a software designer to be 100% sure that tasks will always meet their deadlines if a priority-based preemptive scheduler with fixed task priorities is used.

An alternative kind of preemptive task scheduler is called a Deadline Scheduler. This kind of scheduler tries to give execution time to the task that is most quickly approaching its deadline. This is typically done by the scheduler changing the priorities of tasks on-the-fly as they approach their individual deadlines. Figure 11 illustrates a deadline scheduling scenario.

Figure 11: Deadline Scheduling Scenario Timeline

In this timeline, Task B normally has higher priority than Task A. However, Task A's deadline is closer than Task B's, and Task A's deadline would be missed if Task B ran first. So a deadline scheduler, aware of the approaching deadline, would temporarily raise the priority of Task A above that of Task B. This allows Task A to complete its execution before its deadline. In the example illustrated, Task B would then also complete its execution before its own deadline.

At the time of writing, popular commercially available off-the-shelf real-time operating systems don't offer deadline scheduling, but some off-the-shelf operating systems may offer it in the future. [For more information about deadline scheduling, attend our advanced course "Architectural Design of Real-Time Software".]

PARTITION SCHEDULING

Some users of off-the-shelf priority-based preemptive schedulers for safety-critical real-time systems may have another objection: "Where do I tell the scheduler that certain critical tasks should never be starved, so that the scheduler will make sure they'll always get their chance to run on time?" The fact of the matter is that you can't tell priority-based preemptive schedulers about processor time availability requirements. They have no way of dealing with that kind of information. All they want to be told is each task's priority, and then they do their task scheduling based solely on the priority numbers. They have no way to override the priority of a task if a critical task is being starved of CPU time.

An example of a system where we want to ensure that certain tasks will always get their chance to run on time is a control system for a nuclear reactor. Nowadays, with powerful embedded processors available, a single processor might be given a number of major responsibilities, such as moving the graphite control rods of the nuclear reactor (based on real-time sensor data), accumulating a database of reactor information, and managing an elegant graphical user interface for the human operator of the control system. We'd like to make sure that the control rods are adjusted in a timely fashion, even if some database or user interface software is asking to run at higher priority. [The database or user interface software might be assigned higher priority by its designer, or as a result of a software error.]

An elegant way to ensure that a critical task always has the processing time it needs is to use a Partition Scheduler. Such a scheduler requires tasks to be organized into groups, each group containing the tasks that deal with a certain aspect of the system. These groups might be called blocks, processes or partitions, depending on the specific RTOS. Each group can contain a large number of related tasks. See Figure 12.

Figure 12: Partition Scheduling - Organization of Groups and Tasks

In our nuclear reactor controller example we may, for instance, have a Rod Control Process, a Database Process, and a User Interface Process, each containing several tasks. The Rod Control Process contains tasks named Sensor Reader Task, Motion Calculator Task, Rod Mover Task, Sensor Recalibrator Task, etc.

A partition scheduler runs processes according to a sequence of time windows specified by the system designer. The tasks in each process run only during the time windows for that process; no task in any other process is allowed to run during those windows. Within the active process, tasks are typically scheduled using priority-based preemption. Thus, when a process's time window is active, the high-priority tasks in that process will be certain to run, since no task of any other process could possibly run in that window.

Continuing with our nuclear reactor controller example, the three high-priority tasks within the Rod Control Process (Sensor Reader Task, Motion Calculator Task, and Rod Mover Task) would be assured of running time during the regularly recurring time windows of the Rod Control Process. No task in the Database Process or the User Interface Process could ever steal processing time from these Rod Control Process tasks.

Such a partition scheduler is described in a standards document called ARINC Specification 653, for use in critical avionics software. [For more information, visit: http://www.arinc.com/aeec/projects/apex/index.html ] Several commercially available RTOSs specializing in avionics applications offer ARINC 653-style partition schedulers.