Embedded Systems/Threading and Synchronization

One of the most useful developments in the history of computing is multitasking and multithreading. This technique isn't always available to an embedded system engineer, but some embedded systems and RTOS have multithreading (MT) capability. The chapters in this section will talk about some of the uses of MT, and will discuss some of the common pitfalls associated with MT programming. This page is only going to serve as a brief reference to multi-threaded programming.

When the first multi-tasking systems were established, they did not have a central controller. Multi-tasking was established by having programs voluntarily give up control to the system, and the system would then give control to another process. This system worked reasonably well, except that any program that was misbehaving would slow down the entire system. For instance, if a program got stuck in an infinite loop, it would never give up control, and the system would freeze.

The solution to this problem is preemptive multithreading. In a preemptive environment, control could be moved from one process to another process at any given time. The process that was "preempted" would not even know that anything had happened, except maybe there would be a larger than average delay between 2 instructions. Preemptive multithreading allows for programs that do not voluntarily give up control, and it also allows a computer to continue functioning when a single process hangs.

There are a number of problems associated with preemptive multithreading that all stem from the fact that control is taken away from one process when it is not necessarily prepared to give up control. For instance, if one process were writing to a memory location, and was preempted, the next process would see half-written data, or even corrupted data in that memory location. Or, if a task was reading in data from an input port, and it was preempted, the timing would be wrong, and data would be missed from the line. Clearly, this is unacceptable.

The solution to this new problem then is the idea of synchronization. Synchronization is a series of tools provided by the preemptive multithreaded operating system to ensure that these problems are avoided. Synchronization tools can include timers, "critical sections," and locks. Timers can ensure that a given process may be preempted, but only for a certain time. Critical sections are commands in the code that prevent the system from switching control for a certain time. Locks are commands that prevent interference in atomic operations. These topics will all be discussed in the following chapters.

The term Mutex is short for "Mutual Exclusion", and is a type of mechanism used in a preemptive environment that can prevent unauthorized access to resources that are currently in use. Mutexes follow several rules:

Mutexes are system wide objects, that are maintained by the kernel.

Mutexes can only be owned by one process at a time

Mutexes can be acquired by asking the kernel to allocate that mutex to the current task

If a Mutex is already allocated, the request function will block until the mutex is available.

In general, it is considered good programming practice to release mutexes as quickly as possible. Some problems with mutexes will be discussed in the chapter on deadlocks.

Spin locks is a quick form of synchronization methods. It is named after its behavior - spin in the loop while the condition is false. To implement spin lock system should support test-and-set idiom or give exclusive access to a locking thread by any means (masking interrupts, locking bus).

An advantage of spin locks is that they are very simple. A disadvantage is that they waste CPU cycles in loop waiting. Most common use of spin locks is to synchronize quick access to objects. It is not advisable to do a long computations while spin locked a section.

A critical section is a sequence of computer instructions that may malfunction if interrupted. An atomic operation is a sequence of computer instructions that cannot be interrupted and function correctly. In practice, these two subtly different definitions are often combined. Operating systems provide synchronization objects to meet these requirements, and some actually call these objects as "critical sections," "atomic operations" or "monitors."

An example of a critical section is code that removes data from a queue that is filled by an interrupt. If the critical section is not protected, the interrupt can occur while the dequeuing function is changing pointers, and corrupt the queue pointers. An example of an atomic operation is an I/O read where the process must read all the data at a particular rate, and cannot be preempted while reading.

A generally good programming practice is to have programs exit their critical sections as quickly as possible, because holding a critical section for too long will cause other processes on the system not to get any time, and will cause a major performance decrease. Critical sections should be used with care.

Many RTOS have a mechanism to distinguish the relative priorities of different tasks. High-priority tasks are executed more frequently than the low priority tasks. Each implementation of priority scheduling will be a little different, however.

Perhaps the most common concurrency algorithm used in embedded systems is the "read the counter twice and compare" optimistic concurrency control pattern.

Many hardware timers and hardware counters are connected to the CPU over an 8-bit bus. If the low byte of the timer happens to be 0xFF when we start to read the timer, if the timer increments between two byte reads, a simple read of each byte separately will get a corrupted value.[1] We get a slightly different corrupted value if we read the low byte first vs. the low byte last, but it's corrupted either way.

longatomic_read_counter(long*counter){do{longcounter_old=*counter;// alas, not an atomic operation when the timer is connected to the CPU over an 8-bit bus.longcounter_new=*counter;}while(counter_old!=counter_new);returncounter_new;}

When the read-twice-and-compare algorithm is used to synchronize several processes (rather than, as above, a hardware counter and a process that reads the counter), the process that changes the counter needs to make that change in a way that appears atomic to the other process.

Many architectures have a single instruction that can atomically increment a variable. Alas, the exact details vary from one architecture to another, and from one C compiler to another. Some relatively portable ways to tell the C compiler to atomically increment a variable involve the std::atomic template class from the standard Atomic operations library, part of the C11 standard for the C programming language;[15] or the tbb::atomic template class; or the boost::atomic template class; etc.

// rough draft -- untested code__interrupt_handlerh(){// ...// inside interrupt handler, interrupts are already turned off, so it's safe to increment the counterraw_counter++;// ...}// using CAS to update the counter is safe even if interrupts are turned on.// inspired by example_atomic_inc() in// https://www.kernel.org/doc/Documentation/atomic_ops.txtvoidincrement_counter(long*counter){do{longold=*counter;longnew=old+1;// requires #include <stdatomic.h>longret=atomic_compare_exchange_strong(counter,old,new);}while(ret!=old)returnret;}// only use if interrupts are turned on, and it's not a multiprocessor machine:voidincrement_counter(long*counter){disable_interupts();(*counter)++;enable_interrupts();}

The main problem with the double-read algorithm is the ABA problem.[16] To avoid the ABA problem, we use counters that only count up,[17] and we give the counter enough bits that, for any reasonable amount of time a thread could possibly be delayed between reading the first byte of the first read and the last byte of the second read (say, maximum 1 second), the counter is long enough that at the worst-case (highest-frequency) count rate it takes much longer than that -- say, the minimum time to overflow is once every 10 seconds.

For example, up-down counters can be synthesized from 2 counters, one that only counts up, and the other that only counts down.

↑A few timers, such as the Atmel AVR timers, fix this with hardware. Those times have special additional hardware so that, when the program reads the low byte, all the remaining bytes of the timer are simultaneously latched into a hardware temporary register. Later the CPU can read each byte from the temporary latch (while the timer itself continues to count) without corruption. Atmel. "AVR072: Accessing 16-bit I/O Registers". 2002. Unfortunately, sometimes the main application needs to read one 16-bit register that uses a hardware temporary register, and an interrupt routine needs to read a (the same or some other) 16-bit register that overwrites the same temporary register. Then the main application will occasionally get a corrupted value. "AVR Libc FAQ: Why do some 16-bit timer registers sometimes get trashed?"

↑Milan Verle. "Architecture and programming of 8051 MCU's". Chapter 2 : "8051 Microcontroller Architecture". section: 2.6 "Counters and Timers". subsection: "How to 'read' a timer?". says: "... The lower byte is correctly read (255), but at the moment the program counter was about to read the higher byte TH0, an overflow occurred and the contents of both registers have been changed (TH0: 14→15, TL0: 255→0). This problem has a simple solution. The higher byte should be read first, then the lower byte and once again the higher byte. If the number stored in the higher byte is different then this sequence should be repeated. It's about a short loop consisting of only 3 instructions in the program. ... another solution ... is ... turn the timer off while reading ... and turn it on again after reading is finished."

↑James Rogers. "The 8051 Timers". section "Reading the Timers on the Fly". "read the high byte, then read the low byte, then read the high byte again. If the two readings of the [high byte] are not the same, repeat the procedure."

↑Craig Steiner. "The 8051/8052 Microcontroller: Architecture, Assembly Language, and Hardware Interfacing". p. 41. Section 8.2.6.1 "Reading the value of a timer". says "The program reads the high byte of the timer, the reads the low byte, then reads the high byte again. If the high byte read the second time is not the same as the high byte read the first time, the cycle is repeated. In code, this would be written as [3 assembly-language instructions]".

↑Abhishek Yadav. "Microprocessor 8085, 8086". p. 463. says "You read the high byte of the timer, then read the low byte, then read the high byte again. If the high byte read the second time is not the same as the high byte read the first time, you repeat the cycle. In code, this would appear as [3 assembly-language instructions]".

↑Alka Kalra, Sanjeev Kumar Kalra. "Architecture and Programming of 8051 Microcontroller". 2010. Section 5.5.8: "Reading the value of a timer". p. 163. "You read the high byte of the timer, then read the low byte, then read the high byte again. If the high byte read the second time is not the same as the high byte read the first time, you repeat the cycle. In code, this would appear as [3 assembly-language instructions]".