The Real-Time Scheduler

Real-time scheduling constraints are necessary to manage data acquisition or process control hardware.
The real-time environment requires that a process be able to react to external
events in a bounded amount of time. Such constraints can exceed the capabilities
of a kernel that is designed to provide a fair distribution of the
processing resources to a set of time-sharing processes.

This section describes the SunOS real-time scheduler, its priority queue, and how to
use system calls and utilities that control scheduling.

Dispatch Latency

The most significant element in scheduling behavior for real-time applications is the provision
of a real-time scheduling class. The standard time-sharing scheduling class is not suitable
for real-time applications because this scheduling class treats every process equally. The standard
time-sharing scheduling class has a limited notion of priority. Real-time applications require a
scheduling class in which process priorities are taken as absolute. Real-time applications also require
a scheduling class in which process priorities are changed only by explicit application
operations.

The term dispatch latency describes the amount of time a system takes to respond
to a request for a process to begin operation. With a scheduler that
is written specifically to honor application priorities, real-time applications can be developed with
a bounded dispatch latency.

The following figure illustrates the amount of time an application takes to respond
to a request from an external event.

Figure 11-2 Application Response Time

The overall application response time consists of the interrupt response time, the dispatch
latency, and the application's response time.

The interrupt response time for an application includes both the interrupt latency of
the system and the device driver's own interrupt processing time. The interrupt latency
is determined by the longest interval that the system must run with interrupts
disabled. This time is minimized in SunOS using synchronization primitives that do not
commonly require a raised processor interrupt level.

During interrupt processing, the driver's interrupt routine wakes the high-priority process and returns
when finished. The system detects that a process with higher priority than the
interrupted process is now ready to dispatch and dispatches the process. The time
to switch context from a lower-priority process to a higher-priority process is included
in the dispatch latency time.

Figure 11-3 illustrates the internal dispatch latency and application response time of a system.
The response time is defined in terms of the amount of time a
system takes to respond to an internal event. The dispatch latency of an
internal event represents the amount of time that a process needs to wake
up a higher priority process. The dispatch latency also includes the time that
the system takes to dispatch the higher priority process.

The application response time is the amount of time that a driver
takes to: wake up a higher-priority process, release resources from a low-priority process, reschedule
the higher-priority task, calculate the response, and dispatch the task.

Interrupts can arrive and be processed during the dispatch latency interval. This processing
increases the application response time, but is not attributed to the dispatch latency
measurement. Therefore, this processing is not bounded by the dispatch latency guarantee.

Figure 11-3 Internal Dispatch Latency

With the new scheduling techniques provided with real-time SunOS, the system dispatch latency
time is within specified bounds. As you can see in the following table,
dispatch latency improves with a bounded number of processes.

Table 11-1 Real-time System Dispatch Latency

Workstation       Bounded Number of Processes                  Arbitrary Number of Processes

SPARCstation 2    <0.5 milliseconds in a system with fewer     1.0 milliseconds
                  than 16 active processes

SPARCstation 5    <0.3 milliseconds                            0.3 milliseconds

Scheduling Classes

The SunOS kernel dispatches processes by priority. The scheduler or dispatcher supports the
concept of scheduling classes. Classes are defined as real-time (RT), system (SYS), and time-sharing
(TS). Each class has a unique scheduling policy for dispatching processes within its
class.

The kernel dispatches the highest-priority processes first. By default, real-time processes have precedence
over SYS and TS processes. Administrators can configure systems so that the priorities
for TS processes and RT processes overlap.

The following figure illustrates the concept of classes as viewed by the SunOS
kernel.

Figure 11-4 Dispatch Priorities for Scheduling Classes

Hardware interrupts, which cannot be controlled by software, have the highest priority. The
routines that process interrupts are dispatched directly and immediately from interrupts, without regard
to the priority of the current process.

Real-time processes have the highest default software priority. Processes in the RT class
have a priority and time quantum value. RT processes are scheduled strictly on the basis
of these parameters. As long as an RT process is ready to
run, no SYS or TS process can run. Fixed-priority scheduling enables critical processes
to run in a predetermined order until completion. These priorities never change unless they
are changed by an application.

An RT class process inherits the parent's time quantum, whether finite or infinite.
A process with a finite time quantum runs until the time quantum expires.
A process with a finite time quantum also stops running if the process
blocks while waiting for an I/O event or is preempted by a higher-priority
runnable real-time process. A process with an infinite time quantum ceases execution only
when the process terminates, blocks, or is preempted.

The SYS class exists to schedule the execution of special system processes, such
as paging, STREAMS, and the swapper. You cannot change the class of a
process to the SYS class. The SYS class of processes has fixed priorities
established by the kernel when the processes are started.

The time-sharing (TS) processes have the lowest priority. TS class processes are
scheduled dynamically, with a few hundred milliseconds for each time slice. The TS
scheduler switches context in round-robin fashion often enough to give every process an
equal opportunity to run, depending upon:

The time slice value

The process history, which records when the process was last put to sleep

A child process inherits the scheduling class and attributes of the parent process
through fork(2). A process's scheduling class and attributes are unchanged by exec(2).

Different algorithms dispatch each scheduling class. Class-dependent routines are called by the kernel
to make decisions about CPU process scheduling. The kernel is class-independent, and takes the
highest priority process off its queue. Each class is responsible for calculating a
process's priority value for its class. This value is placed into the dispatch
priority variable of that process.

As the following figure illustrates, each class algorithm has its own method of
nominating the highest priority process to place on the global run queue.

Figure 11-5 Kernel Dispatch Queue

Each class has a set of priority levels that apply to processes in
that class. A class-specific mapping maps these priorities into a set of global
priorities. The set of global scheduling priorities that a class maps to is not
required to start at zero or to be contiguous.

By default, the global priority values for time-sharing (TS) processes range from -20
to +20. These global priority values are mapped into the kernel from 0 to 40,
with temporary assignments as high as 99. The default priorities for real-time (RT)
processes range from 0 to 59, and are mapped into the kernel from 100 to
159. The kernel's class-independent code runs the process with the highest global priority
on the queue.

Dispatch Queue

The dispatch queue is a linear-linked list of processes with the same global
priority. Each process has class-specific information attached to it upon invocation. A process
is dispatched from the kernel dispatch table in an order that is
based on the process's global priority.

Dispatching Processes

When a process is dispatched, the context of the process is mapped
into memory along with its memory management information, its registers, and its stack. Execution
begins after the context mapping is done. Memory management information is in the
form of hardware registers that contain the data that is needed to
perform virtual memory translations for the currently running process.

Process Preemption

When a higher priority process becomes dispatchable, the kernel interrupts its computation and
forces the context switch, preempting the currently running process. A process can be
preempted at any time if the kernel finds that a higher-priority process is
now dispatchable.

For example, suppose that process A performs a read from a peripheral device.
Process A is put into the sleep state by the kernel. The
kernel then finds that a lower-priority process B is runnable. Process B is
dispatched and begins execution. Eventually, the peripheral device sends an interrupt, and the driver
of the device is entered. The device driver makes process A runnable and
returns. Rather than returning to the interrupted process B, the kernel now preempts
B from processing, resuming execution of the awakened process A.

Another interesting situation occurs when several processes contend for kernel resources. A high-priority
real-time process might be waiting for a resource held by a low-priority process.
When the low-priority process releases the resource, the kernel preempts that process to
resume execution of the higher-priority process.

Kernel Priority Inversion

Priority inversion occurs when a higher-priority process is blocked by one or more
lower-priority processes for a long time. The use of synchronization primitives such as
mutual-exclusion locks in the SunOS kernel can lead to priority inversion.

A process is blocked when the process must wait for one or more
processes to relinquish resources. Prolonged blocking can lead to missed deadlines, even for
low levels of utilization.

The problem of priority inversion has been addressed for mutual-exclusion locks for the
SunOS kernel by implementing a basic priority inheritance policy. The policy states that
a lower-priority process inherits the priority of a higher-priority process when the lower-priority
process blocks the execution of the higher-priority process. This inheritance places an upper
bound on the amount of time a process can remain blocked. The policy
is a property of the kernel's behavior, not a solution that a programmer
institutes through system calls or interface execution. User-level processes can still exhibit priority inversion,
however.

User Priority Inversion

Interface Calls That Control Scheduling

The following interface calls control process scheduling.

Using priocntl

Control over scheduling of active classes is done with priocntl(2). Class attributes
are inherited through fork(2) and exec(2), along with scheduling parameters and permissions required for
priority control. This inheritance happens with both the RT and the TS classes.

priocntl(2) is the interface for specifying a real-time process, a set of processes,
or a class to which the system call applies. priocntlset(2) also provides the
more general interface for specifying an entire set of processes to which the
system call applies.

The command argument of priocntl(2) can be one of: PC_GETCID, PC_GETCLINFO, PC_GETPARMS, or
PC_SETPARMS. The real or effective user ID of the calling process must match the
real or effective user ID of the affected processes, or the calling process must have superuser privilege.

PC_GETCID

This command takes the name field of a structure that contains a recognizable class name. The class ID and an array of class attribute data are returned.

PC_GETCLINFO

This command takes the ID field of a structure that contains a recognizable class identifier. The class name and an array of class attribute data are returned.

PC_GETPARMS

This command returns the scheduling class identifier or the class-specific scheduling parameters of one of the specified processes. Even though idtype and id might specify a large set, PC_GETPARMS returns the parameters of only one process. The class selects the process.

PC_SETPARMS

This command sets the scheduling class or the class-specific scheduling parameters of the specified process or processes.

Other interface calls

sched_get_priority_max

Returns the maximum priority value for the specified policy.

sched_get_priority_min

Returns the minimum priority value for the specified policy. For more information, see the sched_get_priority_max(3R) man page.

sched_rr_get_interval

Updates the specified timespec structure to the current execution time limit.

sched_setparam, sched_getparam

Sets or gets the scheduling parameters of the specified process.

sched_yield

Blocks the calling process until the calling process returns to the head of the process list.

Utilities That Control Scheduling

The administrative utilities that control process scheduling are dispadmin(1M) and priocntl(1). Both
of these utilities support the priocntl(2) system call with compatible options and loadable modules.
These utilities provide system administration functions that control real-time process scheduling during runtime.

priocntl(1)

The priocntl(1) command sets and retrieves scheduler parameters for processes.

dispadmin(1M)

When invoked with the -l command-line option, the dispadmin(1M) utility displays all
process scheduling classes that are currently configured. Process scheduling can also be changed
for the class specified after the -c option, using RT as the argument for the real-time
class.

A class-specific file that contains the dispatch parameters can also be loaded during
runtime. Use this file to establish a new set of priorities that replace
the default values that were established during boot time. This class-specific file must
assert the arguments in the format used by the -g option. Parameters
for the RT class are found in the rt_dptbl(4), and are listed in Example 11-1.

To add an RT class file to the system, the following modules must
be present:

Load the class-specific module with the following command, where module_name is the class-specific module.

# modload /kernel/sched/module_name

Invoke the dispadmin command.

# dispadmin -c RT -s file_name

The file must describe a table with the same number of entries as the table that is being overwritten.

Configuring Scheduling

Each of the two scheduling classes, RT and TS, has an associated parameter table: rt_dptbl(4)
and ts_dptbl(4), respectively. These tables are configurable by using a loadable module at boot time, or
with dispadmin(1M) during runtime.

Dispatcher Parameter Table

The in-core table for real-time establishes the properties for RT scheduling. The rt_dptbl(4)
structure consists of an array of parameter structures of type rtdpent_t, one for each of the
n priority levels. The properties of a given priority level are specified by
the ith parameter structure in the array, rt_dptbl[i].

A parameter structure consists of the following members, which are also described in
the /usr/include/sys/rt.h header file.

rt_globpri

The global scheduling priority associated with this priority level. The rt_globpri values cannot be changed with dispadmin(1M).

rt_quantum

The length of the time quantum allocated to processes at this level in ticks. For more information, see Timestamp Interfaces. The time quantum value is only a default or starting value for processes at a particular level. The time quantum of a real-time process can be changed by using the priocntl(1) command or the priocntl(2) system call.

Reconfiguring config_rt_dptbl

A real-time administrator can change the behavior of the real-time portion of the
scheduler by reconfiguring the config_rt_dptbl at any time. One method is described in
the rt_dptbl(4) man page, in the section titled “Replacing the rt_dptbl Loadable Module.”

A second method for examining or modifying the real-time parameter table on a
running system is through the dispadmin(1M) command. Invoking dispadmin(1M) for the real-time class
enables retrieval of the current rt_quantum values in the current config_rt_dptbl configuration from
the kernel's in-core table. When overwriting the current in-core table, the configuration file used
for input to dispadmin(1M) must conform to the specific format described in the
rt_dptbl(4) man page.

The following is an example of rtdpent_t priority-level entries, with their associated time
quantum values, as the entries might appear in config_rt_dptbl[].