TTK4147: Real-time Systems

Introduction

Real-time computing

Optimized for predictability, while general-purpose computing is optimized for speed and throughput

Not really possible to do with general-purpose operating systems

Not a fancy way of saying fast computing

Not irrelevant even if hardware is fast enough

Constraints

Hard real-time: Missed deadline can cause total system failure

Firm real-time: Missed deadline means that the result has no value

Soft real-time: Missed deadline means that the result is less valuable

Task triggers

Periodic: Task performed at a regular time interval

Sporadic: Task performed when event occurs, limits on how often the event can occur

Aperiodic: Task performed when event occurs, no limitations on when it can occur

FreeRTOS

Minimalistic operating system for microcontrollers, which often run without an operating system

Runs on many platforms

VxWorks and QNX

Proprietary operating systems for embedded and real-time systems

Stable and proven platforms

Expensive licenses

Real-time variants of Linux

Various methods for improving the real-time capabilities of Linux

RT-Linux, RTAI, Xenomai, preempt-RT

From book

Real-time system definition

Any system in which the time at which output is produced is significant.
This is usually because the input corresponds to some movement in the physical world, and the output has to relate to that same movement.
The lag from input time to output time must be sufficiently small for acceptable timeliness.

Cohesion is concerned with how well a module holds together – its internal strength.

Coupling, by comparison, is a measure of the interdependence of program modules. If two modules pass control information between them, they are said to possess high (or tight) coupling.

Within all design methods, a good decomposition is one that has strong cohesion and loose coupling. This principle is equally true in sequential and concurrent programming domains.

From book

Processor: Controls the operation of the computer and performs its data processing functions. Where there is only one processor, it is often referred to as the central processing unit (CPU)

One of the processor’s functions is to exchange data with memory. Makes use of two internal registers: a memory address register (MAR), which specifies the address in memory for the next read or write; and a memory buffer register (MBR), which contains the data to be written into memory or which receives the data read from memory.

Main memory: Stores data and programs. This memory is typically volatile.

I/O modules: Move data between the computer and its external environment.

System bus: Provides communication among processors, main memory, and I/O modules.

Microprocessors are now multiprocessors; each chip (called a socket) contains multiple processors (called cores), each with multiple levels of large memory caches, and multiple logical processors sharing the execution units of each core.

The classic microprocessor is giving way to the system on a chip (SoC), where not just the CPUs and caches are on the same chip, but also many of the other components of the system, such as DSPs, GPUs, I/O devices (such as radios and codecs), and main memory.

The replacement algorithm chooses, within the constraints of the mapping function, which block to replace when a new block is to be loaded into the cache and the cache already has all slots filled with other blocks

An OS is a program that controls the execution of application programs and acts as an interface between applications and the computer hardware. It has three objectives

Convenience – an OS makes a computer more convenient to use

Efficiency – An OS allows the computer system resources to be used in an efficient manner

Ability to evolve – An OS should be constructed in such a way as to permit the effective development, testing, and introduction of new system functions without interfering with service

A monolithic kernel is implemented as a single process, with all elements sharing the same address space

A microkernel architecture assigns only a few essential functions to the kernel, including address spaces, IPC, and basic scheduling. Other OS services are provided by processes, sometimes called servers, that run in user mode and are treated like any other application by the microkernel

Multiprogramming and memory

Multiprogramming:

Maximize processor utilization, providing reasonable response times, let different tasks use different resources, allow tasks to communicate with each other

Process

A process is an entity that can be assigned to and executed by a processor

Each process has its own set of data

Dispatcher – the OS component that selects which process to run next

5 state process model:

Running – process currently executing in the CPU

Ready – Process is in a queue waiting to be executed

Blocked – running process has to wait for an event; it remains blocked until the event occurs. There is a separate queue for the blocked processes

New – Process has been created but not yet added to the pool of executable processes

Exit – Process that is removed from the pool of executable processes, but still exists in memory

Swapping – when there is no room for a process in main memory, it is swapped to disk
Seven-state process model includes suspend states for ready and blocked

Process image

User data – the modifiable part of the user space. May include program data, a user stack area, and programs that may be modified

User program – the program to be executed

Stack – Each process has one or more LIFO stacks associated with it. A stack is used to store parameters and calling addresses for procedure and system calls

Process switching may occur any time the OS has gained control from the currently running process, caused by events like a clock interrupt, I/O interrupt, or memory fault

In a multiprocessor system, more than one process can be in the running state

Two main characteristics of a process:

Resource ownership – process or task – virtual address space to hold the process image. A process may from time to time be allocated control or ownership of resources (main memory, I/O or files). The OS performs a protection function to prevent unwanted interference between processes with respect to resources.

Scheduling/Execution – lightweight process – follows an execution path through one or more programs. May be interleaved with other processes. Thus, a process has an execution state and a dispatching priority and is the entity that is scheduled and dispatched by the OS.

Threads within the same process can communicate with each other without invoking the kernel, because they share memory and files

Any alteration of a resource by one thread affects the environment of the other threads in the same process. It is therefore necessary to synchronize the activities of the various threads so that they do not interfere with each other or corrupt data structures.

User-level threads – all of the work of thread management is done by the application and the kernel is not aware of the existence of threads (only sees the process).

Advantages of user-level threads instead of kernel-level threads(KLT)

Thread switching does not require kernel mode privileges because all of the thread management data structures are within the user address space of a single process. Saves overhead of mode switches (user to kernel and back)

Scheduling can be application specific. One application may benefit most from RR scheduling while another from priority based.

ULTs can run on any OS. No changes are required to the underlying kernel to support ULTs.

Disadvantages of ULTs compared to KLTs

In a typical OS, many system calls are blocking. As a result, when a ULT executes a system call, not only is that thread blocked, but also all of the threads within the process are blocked

In a pure ULT strategy, a multithreaded application cannot take advantage of multiprocessing. The kernel assigns one process to only one processor at a time.

KLT – are threads within a process that are maintained by the kernel. Because they are recognized by the kernel, multiple threads within one process can execute in parallel on a multiprocessor and the blocking of one thread does not block the entire process. However, a mode switch is required to switch from one thread to another.

Multithreaded native applications are characterized by having a small number of highly threaded processes

Multiprocess applications are characterized by the presence of many single-threaded processes

Process relate to resource ownership, thread relates to program execution

Memory management terms

Frame – a fixed length block of main memory

Page – a fixed length block of data that resides in secondary memory. A page of data may temporarily be copied into a frame of main memory

Segment – a variable length block of data that resides in secondary memory. An entire segment may temporarily be copied into an available region of main memory (segmentation) or the segment may be divided into pages which can be individually copied into memory (combined segmentation and paging)

Internal fragmentation – wasted space internal to a partition due to the fact that the block of data loaded is smaller than the partition

External fragmentation – memory that is external to all partitions becomes increasingly fragmented. Lots of small holes in memory and memory utilization declines

Memory management techniques

Fixed partitioning – main memory is divided into a number of static partitions at system generation time. A process may be loaded into a partition of equal or greater size. Simple, but inefficient use of memory due to internal fragmentation

Dynamic partitioning – Partitions are created dynamically, so that each process is loaded into a partition of exactly the same size as that process. No internal fragmentation and more efficient use of main memory, but inefficient use of processor due to the need for compaction to counter external fragmentation

Simple Paging – Main memory is divided into a number of equal-size frames. Each process is divided into a number of equal-size pages of same length as frames. A process is loaded by loading all of its pages into available, not necessarily contiguous frames. No external fragmentation, but a small amount of internal fragmentation.

Simple segmentation – each process is divided into a number of segments. A process is loaded by loading all of its segments into dynamic partitions that need not be contiguous. No internal fragmentation, improved memory utilization and reduced overhead compared to dynamic partitioning, but external fragmentation

Virtual memory paging – as with simple paging, except that it is not necessary to load all of the pages of a process. Nonresident pages that are needed are brought in later automatically. No external fragmentation, higher degree of multiprogramming, large virtual address space, but overhead of complex memory management

Virtual memory segmentation – as with simple segmentation, except it is not necessary to load all of the segments of a process. Nonresident segments that are needed are brought in later automatically. No internal fragmentation, higher degree of multiprogramming, large virtual address space, protection and sharing support, but overhead of complex memory management

Paging – main memory is divided into many small equal-size frames. Each process is divided into frame-size pages. Smaller process requires fewer pages; larger processes require more. When a process is brought in, all of its pages are loaded into available frames, and a page table is set up. This approach solves many of the problems with partitioning.

Segmentation – a process is divided into a number of segments that need not be of equal size. When a process is brought in, all of its segments are loaded into available regions of memory, and a segment table is set up

Virtual memory terminology

Virtual memory – a storage allocation scheme in which secondary memory can be addressed as though it were part of main memory. The addresses a program may use to reference a memory are distinguished from the addresses the memory system uses to identify physical storage sites, and program-generated addresses are translated automatically to the corresponding machine addresses. The size of virtual storage is limited by the addressing scheme of the computer system and by the amount of secondary memory available and not by the actual number of main storage locations.

Virtual address – the address assigned to a location in virtual memory to allow that location to be accessed as though it were part of main memory

Virtual address space – the virtual storage assigned to a process

Address space – the range of memory addresses available to a process

Real address – the address of a storage location in main memory

Virtual memory paging

Not all pages of a process need be in main memory frames for the process to run. Pages may be read in as needed. Reading a page into main memory may require writing a page out to secondary memory

Virtual memory segmentation

Not all segments of a process need be in main memory for the process to run. Segments may be read in as needed. Reading a segment into main memory may require writing one or more segments to secondary memory

Thrashing – throwing out a piece of memory just before needing it and having to fetch it back almost immediately – swapping pieces rather than executing instructions

Uniprocessor scheduling, inter-process communication and FreeRTOS

Long-term scheduling – decides which processes are allowed to be added to the pool of processes being executed

Medium-term scheduling – decides which processes are fully or partially swapped into main memory

Short-term scheduling – decides which of the currently ready processes the CPU should execute – called the dispatcher – main objective is to allocate processor time to optimize certain aspects of system behavior

Scheduling criteria:

Turnaround time – interval between submission of a process and its completion

Response time – time from the submission of a request until the start of response

Deadlines – when process completion deadlines can be specified, as many deadlines as possible should be met

Predictability – a given job should run in about the same amount of time and at about the same cost regardless of the load on the system

Throughput – maximize number of processes completed per unit of time

Processor utilization – Percentage of time the processor is busy

Fairness – processes are treated the same and no process suffers starvation

Enforcing priorities – favor higher-priority processes

Balancing resources – if a resource is overused, processes not needing it should be favored

Priority based scheduling
– each priority gets its own queue, the dispatcher will check higher priority queues first

Scheduling methods

First come first serve – dispatcher chooses the process that has been in the ready queue the longest

Round robin – each process is given a time slice

Shortest process next – shortest executing time

Shortest remaining time – preemptive version of SPN

Highest response ratio next – dispatcher chooses the process with the highest response ratio, computed from service time and time waited in queue: R = (w+s)/s

R = response ratio

w = time waited in queue

s = service time

Feedback – decrease priority each time process is preempted, does not require knowledge of execution time

FreeRTOS

Minimalistic operating system for microcontrollers, which often run without an operating system

Runs on many platforms

Very small OS

Tasks – regular processes with their own stacks; scheduling is preemptive

Co-Routines – intended for use on small processors that have severe RAM constraints, share a single stack

Concurrency encompasses a host of design issues, including communication among processes, sharing and competing for resources, synchronization of activities of multiple processes, and allocation of processor time to processes

Atomic operations – a function or action implemented as a sequence of one or more instructions that appears to be indivisible. The sequence of instructions is guaranteed to execute as a group, or not execute at all, having no visible effect on system state. Atomicity guarantees isolation from concurrent processes.

Critical section – a section of code within a process that requires access to shared resources and that must not be executed while another process is in a corresponding section of code

Deadlock – two or more processes are unable to proceed because each is waiting for one of the others to do something

Livelock – two or more processes continuously change their states in response to changes in the other process(es) without doing any useful work

Mutual exclusion – the requirement that when one process is in a critical section that accesses shared resources, no other process may be in a critical section that accesses any of those shared resources

Race condition – multiple threads or processes read and write a shared data item and the final result depends on the relative timing of their execution

Starvation – a runnable process is overlooked indefinitely by the scheduler

To guarantee mutual exclusion on a uniprocessor, it is sufficient to prevent a process from being interrupted. This capability can be provided in the form of primitives defined by the OS kernel for disabling and enabling interrupts

Concurrency mechanisms

Semaphore – an integer value used for signaling among processes. Only three operations may be performed on a semaphore, all of which are atomic: initialize, decrement, increment.

Binary semaphore – takes only values 0 and 1

Mutex – binary semaphore with the concept of ownership. This means that only the process/thread that locked the mutex can unlock it.

Condition variable – a data type that is used to block a process or thread until a particular condition is true

Event flags – a memory word used as a synchronization mechanism.

Messages – two processes exchange information and may be used for synchronization

Spinlocks – mutual exclusion mechanism in which a process executes in an infinite loop waiting for the value of a lock variable to indicate availability

In message passing both sender and receiver can be blocking or non-blocking

Non-blocking solves the problem where a message might be lost or a process fails before it sends an anticipated message. Prevents the receiving process from being blocked indefinitely

Indirect addressing – messages are sent to a shared data structure consisting of queues that can temporarily hold messages. The strength of the use of indirect addressing is that by decoupling the sender and receiver it allows for greater flexibility in the use of messages

Linux and real-time variants

Linux kernel – monolithic kernel with loadable modules

Concurrency

Atomic operations – guaranteed to perform without interruptions or interference

Spinlocks – A thread that spins instead of sleeping until the mutex it is waiting for is released

Semaphores – Semaphores for kernel space

Barriers – Enforce order in which instructions are executed

Modular monolithic kernel

– Includes virtually all of the OS functionality in one large block of code that runs as a single process with a single address space. All the functional components of the kernel have access to all of its internal data structures and routines. Linux is structured as a collection of modules

Loadable modules – A module is an object file whose code can be linked to and unlinked from the kernel at run-time. A module is executed in kernel mode on behalf of the current process.

Dynamic linking and stackable modules

Linux page replacement

Least frequently used policy – each page in main memory has a counter, which is incremented each time it is accessed. Periodically decremented. The lower the counter, the less frequently it is used and will be preferred for replacement.

A buddy algorithm is used so memory for the kernel can be allocated and deallocated in units of one or more pages

Page allocator alone would be inefficient because the kernel requires small short-term memory chunks in odd sizes

Slab allocation – used by Linux to accommodate small chunks

Linux scheduling

Three classes – FIFO, round-robin, non-real-time threads

The Linux kernel was not originally intended to be preemptive, and it cannot be made fully preemptive regardless of kernel configuration

Linux thread model

Originally there were no threads in Linux, only processes. Threads could be implemented in user space inside the processes. One blocked thread would block the whole process

Now, processes and threads are almost the same; both run as separate “kernel threads”. Threads share the same virtual memory. Context switches between threads are faster than between processes. The kernel schedules each kernel thread individually.

User space

– Use System Calls to request kernel services. The kernel can notify a user-space program with Signals

Kernel space

Reasons for developing in kernel space: User-space programs can only access the services already provided by the kernel. May need to access an unsupported device or a special feature of the CPU or the kernel. Functionality can be implemented in kernel space to reduce the number of “context switches”, which reduce performance.

Disadvantages with kernel space: The normal user-space libraries can’t be used in kernel space

Root is the highest level of privilege you can have in user-space. Not the same as kernel space.

Real-time

Linux is not a real-time system and the kernel is not preemptive.

Soft real-time capabilities can be improved. Configure the kernel to be as preemptive as possible

How to get harder real-time

Make the Linux kernel fully preemptive, thus more predictable – effect is uncertain

Real-time device model (RTDM) – if a Xenomai program uses a Linux driver, e.g. for some hardware device, the driver will execute in Linux context, not Xenomai – not hard real-time, and inefficient due to many switches between Linux and Xenomai

From book

Most UNIX kernels are monolithic. Monolithic kernels include virtually all the OS functionality in one large block of code that runs as a single process with a single address space.

Linux is structured as a collection of modules, a number of which can be automatically loaded and unloaded on demand.

A module is an object file whose code can be linked to and unlinked from the kernel at runtime.

Although Linux may be considered monolithic, its modular structure overcomes some of the difficulties in developing and evolving the kernel.

Dynamic linking – a kernel module can be loaded and linked into the kernel while the kernel is already in memory and executing. A module can also be unlinked and removed from memory at any time

Stackable modules – the modules are arranged in a hierarchy. Individual modules serve as libraries when they are referenced by client modules higher up in the hierarchy, and as clients when they reference modules further down.

Kernel components:

Signals – kernel uses signals to call into a process. Used to notify a process of certain faults.

System calls – the system call is the means by which a process requests a specific kernel service. Six categories – file system, process, scheduling, IPC, socket and miscellaneous

Process and scheduler – manages and schedules processes

Virtual memory – allocates and manages virtual memory for processes

File system – provide a global hierarchical namespace for files, directories, and other file-related objects and provide file system functions

Network protocols – support the sockets interface to users for the TCP/IP protocol suite

Character device drivers – manage devices that require the kernel to send or receive data one byte at a time, such as terminals, modems and printers

Block device drivers – manage devices that read and write data in blocks, such as various forms of secondary memory

Traditional UNIX systems support a single thread of execution per process, modern UNIX systems typically provide support for multiple kernel-level threads per process

A new process is created by copying the attributes of the current process. A new process can be cloned so that it shares resources, such as files, signal handlers and virtual memory. When the two processes share the same virtual memory, they function as threads within a single process.

Although cloned processes that are part of the same process group can share the same memory space, they cannot share the same user stacks. Thus, the clone() call creates separate stack spaces for each process

Linux namespace

A namespace enables a process (or multiple processes that share the same namespace) to have a different view of the system than other processes that have other associated namespaces. One of the overall goals of namespaces is to support the implementation of control groups, a tool for lightweight virtualization that provides a process or group of processes with the illusion that they are the only processes in the system

Memory management

Three-level page table structure

Page directory – an active process has a single page directory that is the size of one page. Each entry in the page directory points to one page of the page middle directory. The page directory must be in main memory for an active process.

Page middle directory – may span multiple pages. Each entry in the page middle directory points to one page in the page table

Page table – may span multiple pages. Each page table entry refers to one virtual page in the process

Each time a page is accessed, the age variable is incremented. Linux periodically sweeps through the global page pool and decrements the age variable for each page as it rotates through all the pages in main memory. A page with the age 0 is the best candidate for replacement.

Real-time Scheduling

Fixed-Priority Scheduling (FPS)

If preemptive (which it probably should be), then the highest priority available task will always run.

Earliest deadline first (EDF)

The priority of each task will change dynamically based on the current deadlines. EDF is able to schedule any task set as long as it takes less time to execute than what is available – optimal.

EDF is more difficult to implement than fixed priority and has more overhead. It is difficult to incorporate tasks without deadlines in EDF. FPS is more predictable if the system gets overloaded.

Scheduling tests

A sufficient test means that if the test is passed, the tasks are certain to meet their deadlines. The utilization-based test is sufficient. Periods that are multiples of each other are considered to be in the same family.

Necessary test means that if the test is failed, the tasks will miss their deadlines

Exact tests are both necessary and sufficient. Response-time analysis – a method for calculating the worst-case response time of a task when there are higher-priority tasks that interfere.


Example:

All tasks are in the same family (periods: 20, 40, 80)

Not necessary to use the 3-task limit (78.0%); instead use the 1-task limit (100%).

Schedulability for EDF can also be tested with utilization, but with another formula than for FPS

This means that EDF is able to schedule any task set as long as it takes less time to execute than what is available.

In this sense we can say that EDF is optimal.

Utilization test cannot be used on deadline monotonic!

Have to do graphical analysis or response time analysis

Priority inversion

– when a high-priority task has to wait for a lower-priority task. Okay if the high-priority task is blocked by the low-priority one due to a shared resource. Not okay if a medium-priority task prevents the low-priority task from completing and returning the mutex.

Priority Ceiling protocols

Immediate ceiling priority protocol (ICPP) – each shared resource is assigned the priority of the highest priority task that uses it. When a task is allowed to access a shared resource, it will inherit its priority. This means that when a task starts using a resource, it is guaranteed to be uninterrupted until it is done with the resource. Minimize the time a resource is locked.

Original ceiling priority protocol (OCPP) – a task inherits the priority of a resource ONLY if it blocks a higher priority task. A task can only access a shared resource if its priority is higher than the priority of all locked resources. Does not need to know the priority of the tasks in advance.

ICPP is easier to implement than OCPP, which has to keep track of what is blocking what. ICPP leads to fewer context switches, as blocking occurs prior to first execution. ICPP requires more priority movements, as this happens with all resource usage. OCPP changes priority only if an actual block has occurred.

Deadline monotonic priority

– rate monotonic is not suitable when deadline is shorter than period (D < T). Not the same as EDF, as priorities are still static. Same as rate monotonic when D = T.

Scheduling aperiodic tasks

In a rate/deadline monotonic scheme, an aperiodic task could be given the lowest priority – but it may then suffer from starvation

For better response for aperiodic tasks, we use Execution-time servers which maintain a budget for clients

Periodic/Polling server – each period the computation time budget is available for any pending clients – if a client needs more, it will continue next period. If one or more clients are pending, they will be scheduled within the server. The budget has to be used when the server is running or it is wasted.

Deferrable server – only loses budget when used. Better response time for aperiodic tasks. Drawback: back-to-back effect, where one budget is used just after another.

Sporadic server – budget is replenished one time period after it has started to be used.

Finding worst case execution time

– highest theoretical value, actual value (impossible to get), worst execution time found when testing x number of times

From book

No actual tasks exist at run-time; each minor cycle is just a sequence of procedure calls

Task based scheduling – tasks exist at run time

Earliest deadline first – the absolute deadlines are computed at run time and hence the scheme is described as dynamic

Tests

Sufficient – pass the test will meet deadlines

Necessary – fail the test will miss deadlines

Exact – necessary and sufficient

Simple task model

Fixed set of tasks

Periodic, with known periods

Independent

Overheads, context-switching times are ignored

Deadline equal to period

Fixed worst-case execution time

Sporadic tasks have a minimum inter-arrival time – require D<T

Aperiodic tasks – do not have minimum inter-arrival times

Can run aperiodic tasks at a priority below the priorities assigned to hard processes; therefore, they cannot steal, in preemptive systems, resources from the hard processes – but they often miss deadlines → use a server

Fixed-Priority Scheduling(FPS) vs. Earliest deadline first (EDF)

FPS is easier to implement as priorities are static

EDF is dynamic and requires a more complex runtime system which will have higher overhead

It is easier to incorporate tasks without deadlines into FPS; arbitrary deadline is more artificial

Easier to incorporate other factors into the notion of priority rather than deadline

FPS more predictable during overload situations, but EDF gets more out of the processor

QNX

QNX microkernel

A microkernel is an operating system kernel that only contains what has to run in kernel mode (scheduling, memory management and inter-process communication (IPC)).

There is no “kernel process”; the CPU executes code in the kernel only when an explicit kernel call is made, an exception is raised, or in response to a hardware interrupt. Everything is based around IPC.

Fundamental services in QNX

Thread services via POSIX thread-creation primitives

Signal services via POSIX signal primitives

Message-passing services – the microkernel handles the routing of all messages between all threads throughout the entire system

Synchronization services via POSIX thread-synchronization

Scheduling services – the microkernel schedules threads for execution using the various POSIX real-time scheduling policies.

Timer services – the microkernel provides the rich set of POSIX timer services

Scheduling

The running thread is blocked because it has to wait for some event. When unblocked, it is placed at the end of its ready queue

The running thread is preempted because a higher-priority thread becomes ready. The preempted thread is placed at the front of its ready queue

The running thread voluntarily yields the processor and is placed at the end of its ready queue. The scheduler selects the next thread, which may be the one that just yielded

Scheduling methods – FIFO, round robin, sporadic server

Sporadic server

In QNX, implemented as a thread that keeps its normal priority for as long as it is within its execution budget. When the budget is exceeded, the priority drops to a low level until the budget is replenished
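The budget mechanism can be modelled with a toy simulation (a sketch only – the priority levels, budget and replenishment period below are made-up parameters, not the real QNX API):

```python
# Toy model of sporadic-server scheduling: the thread runs at its normal
# priority while within its budget; once the budget is exhausted it drops
# to a background priority until the budget is replenished.
class SporadicServer:
    def __init__(self, normal_prio, low_prio, budget, replenish_period):
        self.normal_prio = normal_prio
        self.low_prio = low_prio
        self.capacity = budget
        self.budget = budget
        self.replenish_period = replenish_period
        self.next_replenish = replenish_period

    def priority(self):
        return self.normal_prio if self.budget > 0 else self.low_prio

    def tick(self, running):
        if running and self.budget > 0:
            self.budget -= 1          # consume budget while executing
        self.next_replenish -= 1
        if self.next_replenish == 0:  # periodic replenishment
            self.budget = self.capacity
            self.next_replenish = self.replenish_period

srv = SporadicServer(normal_prio=20, low_prio=1, budget=2, replenish_period=5)
prios = []
for _ in range(6):
    prios.append(srv.priority())
    srv.tick(running=True)
print(prios)  # [20, 20, 1, 1, 1, 20]
```

After two ticks the budget is spent and the thread is demoted to priority 1; at tick 5 the budget is replenished and it regains priority 20.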

Interrupts

Low latencies, because interrupts are enabled almost all of the time (only certain critical code sections require interrupts to be disabled). QNX supports nested interrupts.

Message passing

Things that typically are implemented in the kernel are instead implemented as user-space processes.

The primary inter-process communication mechanism in QNX is message passing

One process creates a channel. When another process attaches to the channel, they can exchange messages

Server communication – the server calls MsgReceive() when it is ready to serve clients and becomes RECEIVE-blocked, unless a client is already SEND-blocked; it becomes READY when a client calls MsgSend(); it answers with MsgReply(), which does not block, since the client is already synchronized
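The blocking-state choreography above can be walked through step by step in a simulation (this models the state transitions only – it is not the real MsgSend/MsgReceive/MsgReply API):

```python
# Walkthrough of QNX-style synchronous message passing: the server blocks
# in RECEIVE until a client sends; the client blocks waiting for the reply;
# MsgReply never blocks because the client is already waiting.
def msg_passing_trace():
    states = {"server": "READY", "client": "READY"}
    trace = []

    def log(event):
        trace.append((event, states["client"], states["server"]))

    # Server calls MsgReceive() with no client waiting -> RECEIVE-blocked.
    states["server"] = "RECEIVE-blocked"
    log("server MsgReceive()")
    # Client calls MsgSend() -> server unblocks, client waits for the reply.
    states["server"] = "READY"
    states["client"] = "REPLY-blocked"
    log("client MsgSend()")
    # Server calls MsgReply() -> does not block; client unblocks.
    states["client"] = "READY"
    log("server MsgReply()")
    return trace

for event, client, server in msg_passing_trace():
    print(f"{event}: client={client}, server={server}")
```

The key property shown is the rendezvous: at every point exactly one side is blocked waiting on the other, which is what makes the final reply non-blocking.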

Message passing has priority inheritance

Server receives a message from a high priority client – server inherits the (higher) priority when the client sends the message

Server receives a message from a low priority client – server inherits the (lower) priority when the server receives the message

Memory management

Provides memory protection, unlike many other RTOSes (which choose not to because of the overhead)

The overhead is less of a problem than the advantages it provides – it adds robustness: a thread that tries to access memory it is not allowed to access can be killed. This mitigates problems caused by accidentally overwriting some memory location, which are typically very difficult to debug

QNX Qnet

Used for accessing remote file systems, scaling applications, and extending applications from a single machine to several machines, distributing processes among those CPUs

The Qnet file system is shared between Qnet nodes; resources and IPC mechanisms published in the file system become available to the other nodes. This includes message passing, named semaphores, signals and message queues.

VxWorks

Features like – microkernel, fast response time, broad support, scalable and deterministic

Board support package (BSP) – supports different CPUs, like x86, ARM and Power PC

Memory management

Originally no memory protection – all tasks share memory. Convenient for communication between tasks, but one task can overwrite the memory of another

Newest version has memory management

Newer versions also have real-time processes (RTPs) – similar to processes, while a task is similar to a thread. Like QNX, all tasks are scheduled independently by the kernel

Non-overlapped memory model – all virtual addresses in the system are unique – allows the same system to work with the CPU's memory management unit (MMU) enabled or disabled

Kernel

Earlier versions ran on top of 3rd-party kernels (VRTX or pSOS) before Wind River Systems developed their own “Wind kernel”

Older versions were running in “kernel mode” and had access to all memory

In newer versions, there is both user and kernel mode, and memory protection (more microkernel in the right sense of the term)

Task model

Consists of states ready, suspended, delayed, pended. Tasks can be in more than one state at a time

Single-core scheduling

VxWorks native scheduler is preemptive, highest priority task will always run, same priority tasks can either be FCFS or round robin (same as QNX)

Multi-core scheduling

All tasks can run on any CPU core by default

A task can be given a CPU affinity – a task will only run on that CPU core (same memory cache – less data moved)

A task can reserve a CPU core – only this task can run on that CPU core

Foreground and background RTPs

Foreground RTPs are assigned to a time partition

Background RTPs run within a time partition when all foreground RTPs have completed

Inter-process communication

Semaphores, mutex, message queues

Also has a task lock – can disable and re-enable preemption. Faster than semaphores and useful for frequent, short protected accesses, but it prevents all other tasks from running, not just the tasks contending for the resource
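The trade-off can be illustrated with a toy scheduler model (a simulation of the idea, not the real VxWorks taskLock()/taskUnlock() API):

```python
# Toy illustration of a preemption lock: while it is held, the scheduler
# refuses to preempt the running task - so ALL other tasks are kept from
# running, not just the ones contending for the protected resource.
class Scheduler:
    def __init__(self):
        self.preemption_locked = False

    def task_lock(self):
        self.preemption_locked = True

    def task_unlock(self):
        self.preemption_locked = False

    def try_preempt(self, higher_prio_ready):
        # A higher-priority ready task preempts only if preemption is enabled.
        return higher_prio_ready and not self.preemption_locked

s = Scheduler()
s.task_lock()
print(s.try_preempt(higher_prio_ready=True))   # False: preemption disabled
s.task_unlock()
print(s.try_preempt(higher_prio_ready=True))   # True
```

With a semaphore, only tasks contending for the same resource would block; here even an unrelated high-priority task is denied the CPU – which is why the lock must be held only briefly.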

Wind River Linux

A Linux-based system as an alternative to a VxWorks-based system. Provided with both PREEMPT-RT and co-kernel options

VxWorks for hard real-time and Wind River Linux for soft/hard-ish real-time systems

Embedded Linux and Multiprocessor scheduling

Embedded development

One or more targets and one host computer (usually)

Host computer – program and compile code (more powerful)

Target computer – embedded computer that runs the code

Embedded Linux

Advantages – customizable and configurable, runs on a large number of architectures, large and active user community, free

Embedded Linux build systems

Best is to use whatever tool that offers best support for your hardware

uClibc

Used as an alternative C library for embedded Linux, instead of glibc

No ABI compatibility between versions. Not everything in glibc is supported in uClibc

Embedded Linux vs QNX/VxWorks

When using commercial systems such as QNX or VxWorks, the setup of the embedded system and the development environment is provided (cross-toolchain, tools to configure and build a file-system image)

Freeness of Linux comes at a cost

Multiprocessor scheduling

Multicore computers have multiple CPU cores on the same physical CPU chip; the cores share memory and usually the operating system

Assignment of processes to cores:

Static assignment to a core – each process is permanently assigned to a core, one queue of processes for each core, minimal scheduling overhead, may result in one core being idle while another one is very busy

Global queue – each process is added to a global queue of processes. When a processor core is idle it will select the next process and start running it. No core is idle as long as there is available work. Preempted or blocked processes are likely to run on a different core next, which makes caching less effective

Dynamic load balancing – each process is assigned to a core – may be moved, depending on load
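The difference between static assignment and a global queue can be shown with a toy makespan comparison (the workload numbers are made up; the global queue is modelled as a greedy earliest-free-core assignment):

```python
# Static assignment: each core has its own fixed set of jobs, so one core
# can sit idle while the other is overloaded.
def makespan_static(jobs_core0, jobs_core1):
    return max(sum(jobs_core0), sum(jobs_core1))

# Global queue: whenever a core is idle it takes the next job, so no core
# idles while work remains.
def makespan_global(jobs, cores=2):
    finish = [0] * cores
    for job in sorted(jobs, reverse=True):
        i = finish.index(min(finish))  # earliest-free core takes the job
        finish[i] += job
    return max(finish)

print(makespan_static([8, 8], [1]))   # 16: core 1 is idle after t=1
print(makespan_global([8, 8, 1]))     # 9: both cores stay busy
```

The global queue wins on utilization here, but as noted above it moves jobs between cores, which makes caching less effective – the simulation ignores that cost.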

How to assign processes:

Master/slave architecture – kernel scheduling functions always run on the same master core; slave cores send requests to the master. Simple and requires few changes from uniprocessor scheduling, but the master can become a bottleneck and a single point of failure

Peer architecture – kernel scheduling functions can run on any core. Each core schedules itself from the available processes. Must ensure that no two cores run the same process.

A process can get full access to a processor core of its own, like VxWorks

Real-time on multicore

Difficult since another potential uncertainty is introduced

Moving a thread from one core to another will cause a longer and more unpredictable delay than normal scheduling

From book

Real-time computing may be defined as that type of computing in which the correctness of the system depends not only on the logical results of the computation but also on the time at which the results are produced.

Windows

Kernel and executive

Typical kernel-level tasks are divided into the Kernel and the Executive

Kernel controls how the processors are used – schedules threads, handles exceptions and interrupts, and synchronizes between multiprocessors

The kernel is in many ways similar to a microkernel, but since the Executive (and many other components) also runs in kernel mode, Windows is not a microkernel system

Other kernel-mode components – the hardware abstraction layer sits between the kernel and the hardware, so the kernel does not need to consider the actual hardware; device drivers extend the functionality of the Executive and are typically used for interacting with specific I/O devices

User-mode processes

Special system processes – user-mode services that manage the system, session manager and authentication subsystem

Service processes – “server” programs. Only way to run a non-system background program

Client/server model

Use remote procedure calls, which is a type of inter-process communication

The caller/client causes a procedure to run in another address space belonging to a server

Could be another computer or the same computer but in different virtual address space

Windows objects

Windows is written in C but has an object-based design

Some of the internal system components of Windows exist as objects. Things like processes, threads etc. are objects. The object manager keeps track of all the objects in the system

Objects have both data (attributes) and procedures (services). Attributes are only accessible through services, thus it is possible to protect from unauthorized control

Windows processes

Implemented as a process object

Virtual address space and other attributes

Contains one or more threads that execute code and can be scheduled by the Kernel

Has a handle table with references to the different objects known to the process

Windows threads

Threads are similar to other systems – scheduled by the Kernel, share address space with other threads in the same process. A thread pool is a collection of worker threads that is available to perform asynchronous callbacks in the background

Fibers – one level below threads; one thread can schedule multiple fibers within itself. A fiber runs in the context of the thread that schedules it

User-mode scheduling (UMS) – a lightweight mechanism for applications to schedule their own threads. Differs from fibers in that a UMS thread has its own context

Both fibers and UMS have potential problems, such as one blocked fiber/thread blocking all the others

States – same as the 5-state model except for the extra transition state, which means that a thread is ready to run but not all of its resources are yet available

Windows scheduling

Real-time priority class (16 to 31), fixed priorities that never change. Round-robin between threads with same priority

Variable priority class (0 to 15), vary depending on use, can get boosted to 15 if starved. Will never become higher than 15
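The boost-and-clamp behaviour of the variable class can be sketched as follows (the priority numbers are from the notes; the boost policy itself is simplified):

```python
# Windows priority classes: real-time priorities (16-31) are fixed; variable
# priorities (0-15) can be boosted, e.g. when a thread is starved, but are
# never raised above 15 - so they can never enter the real-time range.
def boosted_priority(base, boost):
    assert 0 <= base <= 15, "only variable-class threads are boosted"
    return min(15, base + boost)

print(boosted_priority(7, 3))    # 10
print(boosted_priority(12, 10))  # 15: clamped at the class ceiling
```

The clamp is what preserves the guarantee that a real-time thread can never be delayed by a boosted variable-class thread.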

Multiprocessor scheduling

Any thread can run on any processor

By default, soft affinity is used, which means that the Kernel will try to assign a thread to its previous processor so that cache can be reused

Can also be assigned hard affinity, which means that the thread will only run on certain processors

Windows dispatcher objects are used for synchronization

Concurrency mechanisms

Critical section protects critical code – similar to mutex

Slim reader/writer (SRW) lock – like a critical section, but allows shared access for readers

Condition variables

Lock-free synchronization – atomic access to a resource without using a lock/mutex, so no thread ever holds a lock on the resource while preempted
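The usual building block for lock-free synchronization is compare-and-swap (CAS). Python has no user-level CAS, so it is simulated here; on Windows the analogous primitives are the Interlocked* functions:

```python
# Sketch of a lock-free increment built on compare-and-swap (CAS).
class AtomicInt:
    def __init__(self, value=0):
        self.value = value

    def compare_and_swap(self, expected, new):
        # Pretend-atomic: real CAS is a single hardware instruction.
        if self.value == expected:
            self.value = new
            return True
        return False

def lock_free_increment(atom):
    while True:  # retry loop: no lock is ever held, so a preempted
        old = atom.value          # thread can never block the others
        if atom.compare_and_swap(old, old + 1):
            return old + 1

counter = AtomicInt(0)
lock_free_increment(counter)
lock_free_increment(counter)
print(counter.value)  # 2
```

If another thread changes the value between the read and the CAS, the CAS fails and the loop simply retries – progress is made without anyone waiting on a lock.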

Kernel-mode components

Kernel – controls execution of processors. The Kernel manages thread scheduling, process switching, exception and interrupt handling, and multiprocessor synchronization. Unlike the rest of the Executive and the user level, the Kernel’s own code does not run in threads

Hardware abstraction layer (HAL) – maps between generic hardware commands and responses and those unique to a specific platform. It isolates the OS from platform-specific hardware differences. The HAL makes each computer’s system bus, DMA controller, interrupt controller, system timers, and memory controller look the same to the Executive and Kernel components. It also delivers the support needed for SMP

Device drivers – dynamic libraries that extend the functionality of the Executive. These include hardware device drivers that translate user I/O function calls into specific hardware device I/O requests and software components for implementing file systems, network protocols, and any other system extensions that need to run in kernel mode

Windowing and graphics system – implements the GUI functions, such as dealing with windows, user interface controls and drawing

User-mode processes

special system processes, service processes, environment subsystem, user applications

The Windows OS services, the environment subsystems, and the applications are structured using the client/server computing model

Threads and symmetric multiprocessing (SMP)

OS routines can run on any available processor, and different routines can execute simultaneously on different processors

Multiple threads within the same process may execute on different processors simultaneously

Server processes may use multiple threads to process requests from more than one client simultaneously

Processes share data and resources with IPC

Windows objects

Used in cases where data are intended for user mode access or when data access is shared or restricted

Encapsulation – an object consists of one or more items of data, called attributes, and one or more procedures that may be performed on those data, called services

Object class and instance – an object class is a template that lists the attributes and services of an object and defines certain object characteristics. The OS can create specific instances of an object class as needed

Inheritance – the Executive uses inheritance to extend object classes by adding new features. Every Executive class is based on a base class which specifies virtual methods that support creating, naming, securing and deleting objects. Dispatcher objects are Executive objects that inherit the properties of an event object, so they can use common synchronization methods

Polymorphism – API functions are used to manipulate objects of any type

Process and thread management

An application consists of one or more processes.

A process provides the necessary resources to execute a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, environment variables, a priority class, minimum and maximum working set sizes, and at least one thread of execution. Each process is started with a single thread, but can create additional threads from any of its threads

A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains exception handlers, a scheduling priority, thread local storage, a unique thread identifier, and a set of structures the system will use to save the thread context until it is scheduled. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors on the computer

A job object allows groups of processes to be managed as a unit. Job objects are nameable, securable, shareable objects that control attributes of the processes associated with them

A thread pool is a collection of worker threads that efficiently execute asynchronous callbacks on behalf of the application. Used to reduce the number of application threads and provide memory management of the worker threads

A fiber is a unit of execution that must be manually scheduled by the application. A fiber has state information about the thread's stack, a subset of its registers, and the fiber data provided during fiber creation

User-mode scheduling (UMS) is a lightweight mechanism that applications can use to schedule their own threads

Characteristics of Windows processes

Implemented as objects

Processes can be created as a new process or as a copy of an existing process

An executable process may contain one or more threads

Both process and thread objects have built-in synchronization capabilities

Memory management

The Windows virtual memory manager controls how memory is allocated and how paging is performed. The memory manager is designed to operate over a variety of platforms and to use page sizes ranging from 4 KB to 64 KB

Virtual address map – on 32-bit platforms, each Windows user process sees a separate 32-bit address space, allowing 4 GB of virtual memory per process. By default, half of this memory is reserved for the OS, so each user process actually has 2 GB of available virtual address space, and all processes share most of the upper 2 GB of system space when running in kernel mode

Swapping – Paging will hold items that haven’t been accessed in a long time, whereas swapping holds items that were recently taken out of memory

Scheduling

Windows implements a preemptive scheduler with a flexible system of priority levels that includes RR scheduling within each level and, for some levels, dynamic priority variation on the basis of their current thread activity. Threads are the unit of scheduling in Windows rather than processes

Priorities in Windows are organized into two classes: real time and variable, each consists of 16 priority levels. Each level has a FIFO queue.

Soft affinity – dispatcher tries to assign a ready thread to the same processor it last ran on

Hard affinity – thread executes only on certain processors
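The dispatcher described above (two classes, 16 levels each, a FIFO queue per level) can be sketched directly; the soft-affinity hint is omitted here and the thread names are illustrative:

```python
from collections import deque

# 32 priority levels (0-31), each with its own FIFO queue; the dispatcher
# always runs a thread from the highest non-empty level.
class Dispatcher:
    def __init__(self):
        self.queues = [deque() for _ in range(32)]  # level 31 is highest

    def make_ready(self, thread, priority):
        self.queues[priority].append(thread)

    def next_thread(self):
        for level in range(31, -1, -1):  # scan from highest priority down
            if self.queues[level]:
                return self.queues[level].popleft()
        return None  # nothing ready: idle

d = Dispatcher()
d.make_ready("worker", 8)
d.make_ready("audio", 24)   # real-time class (16-31)
d.make_ready("logger", 8)
print(d.next_thread())  # "audio": real-time beats variable class
print(d.next_thread())  # "worker": FIFO order within level 8
```

Round-robin within a level falls out of the FIFO queues: a preempted thread that re-enters at the tail of its level's queue runs after its equal-priority peers.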

Android

Based on the Linux kernel, but with some changes: instead of glibc it uses its own standard C library, Bionic, and it does not use the X Window System that most Linux graphical user interfaces use

Regular Linux programs and libraries are not directly compatible with Android

Uses ahead-of-time compilation instead of just-in-time

Android applications

Consists of one or more instances of four types of components:

Activities – A screen visible as a user-interface. An application can have multiple activities, each for a separate task. Activities can be external, thus it can be started by another application

Services – a background task. Can continue to run even when another application is used.

Content providers – An interface to application data. Data can be stored in a file, SQLite database etc.

Broadcast receivers – Responds to system wide broadcast announcements. Events from other applications or the system can have an effect on the application

Inter-process communication

Does not use the typical Linux IPC mechanisms

Has its own addition to the Linux kernel, called Binder, which implements a lightweight remote procedure call (RPC) capability. Allows communication between processes running in two different virtual machines

Android and real-time

Android is probably not the first choice for a real-time system, but could be useful in some situations: a mixed-criticality system with a touch-screen interface, or a personal (always carried) device needing to do something in real time

From book

Standard OS for IoT

Defined as a software stack that includes the OS kernel, middleware, and key applications

Complete software stack – not just OS. Android is a form of embedded Linux

All applications that the user interacts with directly are part of the application layer

Open-source android architecture

Every Android application runs in its own process, with its own instance of the Dalvik virtual machine

Linux kernel

The OS kernel for Android is similar to, but not identical with, the standard Linux kernel distribution. One noteworthy change is that the Android kernel lacks drivers not applicable in mobile environments, making the kernel smaller

Relies on its Linux kernel for core system services such as security, memory management, process management, network stack and driver model

The kernel acts as an abstraction layer between the hardware and the rest of the software stack and enables Android to use the wide range of hardware drivers that Linux supports

Process and thread management

Four types of application components

Activities – correspond to a single screen visible as a user interface

Services – perform background operations that take a considerable amount of time to finish

Content providers – act as an interface to application data that can be used by the application

Broadcast receivers – respond to system-wide broadcast announcements

The default allocation is a single process with a single thread. To avoid slowing down the user interface when slow and/or blocking operations occur in a component, the developer can create multiple threads within a process and/or multiple processes within an application

IPC

Does not use pipes, shared memory, messages, sockets, semaphores or signals

Binder – provides a lightweight remote procedure call (RPC) capability that is efficient in terms of both memory and processing requirements, and is well suited to the requirements of an embedded system

The Binder is used to mediate all interaction between two processes. A component in one process (the client) issues a call. This call is directed to the Binder in the kernel, which passes the call on to the destination process (the service). The return goes through the Binder and is delivered to the calling process.

Comparison of OS

No OS

When to use

Simple applications

When system is only doing one (or very few) things simultaneously

Advantages

Full control

Can be deterministic

No CPU time or memory is spent on an OS – important on low-powered microcontrollers

Some low-level operations are much simpler than with an OS

Disadvantages

Tricky to do multiple things

Often need to read CPU datasheet to use features

Often not scalable

FreeRTOS

When to use

Simple applications

When you need multitasking etc. on a simple microcontroller

Advantages

The advantages of bare metal combined with simple operating-system capabilities

Operating system for hardware that can’t run other operating systems

Disadvantages

For better or worse, a different type of system than the others we have covered

Android

When to use

On hand-held devices

Graphical systems on ARM CPUs

Advantages

Wide adoption

Many apps, libraries etc. available

Linux kernel customized and minimized for hardware

Every app runs within its own virtual machine

Disadvantages

Not intended for real-time system or control systems

Linux

When to use

Soft real-time

Developers have Linux experience

When many different programs and libraries are needed/useful

Advantages

Support for many platforms and system types

Large user community

Scalable and configurable

Many free and open source tools, programs, libraries etc.

Disadvantages

Not hard real-time

Unpredictable latencies during periods when the kernel is non-preemptible

Costly to spend time setting up the system instead of buying something that just works

Real-time Linux

When to use

Linux with hard real-time

Mixed criticality systems

Advantages

A fully functional Linux operating system with the addition of hard real-time

Separation between real-time tasks and regular Linux processes

Disadvantages

Real-time tasks cannot fully use the Linux system

Difficult to set up; experience with Linux is not directly applicable for development with real-time tasks

Windows

When to use

Developers have Windows experience

Existing Windows code

Advantages

Embedded compact has suitable capabilities for hard real-time

Many tools, programs, libraries etc.

Disadvantages

Cost

Consumer versions are not suitable for embedded/real-time

QNX

When to use

When a proven, commercial real-time system is required

Distributed control systems

Advantages

Hard real-time

Specialized development tools

Microkernel

Customizable

Disadvantages

Cost

Environment is unfamiliar for many

VxWorks

When to use

When a proven, commercial real-time system is required

Arguably lighter-weight than QNX

Advantages

Hard real-time

Specialized development tools

Microkernel

Customizable

Same vendor provides commercial Linux alternative

Disadvantages

Cost

Environment is unfamiliar for many

Miscellaneous

Microkernel vs monolithic kernel
A microkernel differs from a monolithic kernel in that only the most basic
functions are available as system calls; the rest of the kernel is broken down
into separate processes, known as servers. This structure is used in many
real-time operating systems. The practical purpose of a microkernel is to
sacrifice some performance for reliability. In a monolithic structure, a
service is obtained by a single system call. In a microkernel structure, a
service is obtained through inter-process communication (IPC), which causes
overhead due to the required context switches. However, since the microkernel
is divided into different servers, if one fails, the other servers keep
working. In critical applications, as control systems often are, this is an
extremely important feature.