Embedded Systems

Trampolines for Embedded Systems

Source Code Accompanies This Article. Download It Now.

Trampolines are short snippets of code that start up other snippets of code to, among other things, minimize the latency of interrupt handlers.

Sep01: Trampolines for Embedded Systems

Joseph holds a degree in electrical engineering from Northern Illinois University and works as a developer for MCS at Argonne National Laboratory. He can be contacted at joelink@joelink.net.

While designing an operating system for an embedded application running on Motorola's MC68360 microcontroller, I decided that the entire structure of the OS needed to be determined by how it handled interrupts. Among other things, interrupt handling determines whether it's easy to write drivers for a system, whether a system can be used as an RTOS, and whether a system is suitable for multitasking. In this article, I show how to use "trampolines"  short snippets of code that start up one another, usually after setting or modifying parameters. Trampolines can minimize the latency of interrupt handlers, allow you to write all of your interrupt handlers in a high-level language (namely C), and provide efficient task switching in multitasking environments. In this article, trampolines will be used as interrupt handler wrappers, and will be registered in a processor's interrupt vector table in place of real handlers.

The Simplest Trampoline

When an interrupt handler is written in assembler code, it is important to save any registers that your code might modify. Why? Because the interrupted instruction stream isn't allowed to prepare for your code, and it cannot run like it would have if it called your handler as a function in C. If these registers are not saved, data corruption will likely occur once the interrupt handler returns. The necessary preparation depends upon the standard calling conventions for the specific processor and/or compiler you're using. In general, there may exist a set of registers that every C function assumes has been saved by the caller. These registers are referred to as "disposable registers."

Figure 1 illustrates a simple trampoline that solves this problem. This trampoline simply wraps an interrupt handler with the calling conventions that a C function assumes. You use a trampoline like this by registering it as the handler for the intended interrupt, instead of the actual interrupt handler. Since you have gone to the trouble of writing a wrapper in assembler, you can add another feature. Again, there is no preparation prior to an interrupt handler's execution; this includes the passing of arguments to the handler. You can now let the interrupt handlers accept arguments, which lets more general handlers be written.

To summarize the operation of the simple trampoline in Figure 1: An interrupt occurs. The processor goes through its process, then begins executing the trampoline. Upon entering the trampoline, you immediately save the disposable registers and set up any predetermined arguments, which are passed to the handler in the manner specified by the standard calling convention. Next, the actual handler is called, and can work in an environment where it has access to arguments and can cause no harm to the interrupted instruction stream. After the handler completes its work and returns, the trampoline cleans up any mess it made in passing arguments to the handler and restores the saved disposable registers. The interrupted execution stream then continues (if there are no pending interrupts) from where it was interrupted.

Asynchronous System Traps

Another issue involving interrupt handlers is the length of time they take to execute. To maximize I/O throughput, interrupt latency needs to be kept to a minimum. An interrupt handler is usually required to respond to the hardware quickly, but other work can be deferred, for example, when an Ethernet device interrupts with data ready. First, the data has to be removed from the device's buffers to let more data be received. Second, the data needs to be processed, such as by TCP/IP stack software. It is this second, time-consuming process that does not need to occur within the interrupt handler and can be deferred.

To defer this process, my implementation was derived from the 4.4 BSD UNIX's interpretation of the Asynchronous System Trap (AST), which originally appeared on VAX systems. An AST simply lets an interrupt handler queue a function for execution at a later time.

For instance, take the case of using timer interrupts to create "ticks." When a timer is used to provide time sharing in a multitasking environment, it usually requires extensive modification of multiple lists, such as those for ready tasks and sleeping tasks. Modification of these highly volatile lists involves a critical section that usually requires interrupts to be completely disabled to prevent corruption of these lists. Disabling interrupts is not conducive to keeping latency at a minimum and I/O devices as busy as possible. The solution is to have the handler for the timer register an AST for itself instead of doing all of the work immediately. This AST can then be run at a later time, specifically, while interrupts are enabled.

If you have a system that is only interrupt driven and has no tasks, you can simply operate in a loop that does nothing but check for ASTs that need to be run. Consequently, if all interrupt handlers registered ASTs instead of doing the work themselves, there should never be a need to disable interrupts. Since the handler for the registered AST is not reentrant, it does not need to disable interrupts to prevent data corruption.

Task Switching

Multitasking involves management of multiple lists, minimally a list of tasks ready to run and a list of tasks waiting for an operation to complete (such as an I/O operation). Usually, operations that require a task to wait until it has completed maintains its own list of waiting tasks. For example, suppose a task calls the sleep() function. This would likely cause the timer-specific routines to remove this task from the ready-task list, placing it into its own wait list. Then, a new task would be set up to run. After the specified time has elapsed, the waiting task would be removed from the timer's wait list and placed back onto the ready list.

When a running task has been removed from the ready list (or a new task inserted), it is necessary to switch processor execution to the new task. A context switch does this. The context of a task includes its stack pointers, program counters, working registers, and status register. A context switch involves saving the context for the currently running task and restoring the context of another task that has been scheduled to run.

Bringing It all Together

It should be obvious that the only time a context switch is necessary is after an interrupt, as there is no other way for the ready-tasks lists to be modified. You can use this knowledge to create a trampoline like that in Figure 2, which greatly simplifies the core of the OS. It not only calls its respective interrupt handler as the simple one in Figure 1 does, but it also incorporates AST handling and task switching.

To better understand this trampoline, break it into two parts  the top and bottom halves. As you can see, the top half is similar to the simple trampoline in Figure 1. It simply wraps a C function designated as the interrupt handler with the standard calling convention, which lets the handler receive arguments and perform its work in a nondestructive manner.

The bottom half can be further divided into two sections  AST handling and task switching. Interrupts are disabled before entering either section. This is necessary because you need to prevent more ASTs from being registered if you are going to take the direct path to the task switch (where they won't be handled). If there are ASTs registered, interrupts are enabled and the AST handler is called. Upon returning from the AST handler, interrupts are again disabled as you check to see if any more ASTs have been registered while you were processing the last set of ASTs. This loops until no ASTs have been registered. When this condition is met, you enter the task-switching section with interrupts disabled.

The AST handler may or may not have modified the ready-tasks list. Check this list to see if the task at the top of the list is different from the task that is currently running. If different, a context switch is required to start the new task running; otherwise, skip over the context switch. Again, the context switch is accomplished by saving the state of the currently running task, restoring the state of the task at the top of the ready-tasks list, and then setting the currently running task to this new task. Finally, you return from interrupt handling.

Why are two conditions  that is, those separating the top half from the bottom half  needed? For my purposes, there were two types of interrupts  initial and nested interrupts. Initial interrupts are those that occur while a regular task is executing. Nested interrupts are those that occur while a previous interrupt is being serviced. This assumes, of course, that your processor allows nested interrupts to reduce the response time in handling interrupts. However, the AST handler is not reentrant. You need to ensure that only one interrupt can cause the AST handler to be executed at any given moment.

Use whatever method your processor allows for determining whether you are in a nested interrupt. What follows, if in a nested interrupt, is equivalent to the simple trampoline in Figure 1. That is, no AST handling or task switching will occur in the trampoline for a nested interrupt. If a nested interrupt has registered an AST, it is handled when all nested interrupts have completed and the only interrupt remaining is the initial one.

In the AST handling section of the trampoline, interrupts are completely enabled prior to calling the AST handler. For some processors, this makes it difficult to determine if you are in a nested interrupt should an interrupt occur while processing ASTs. Consequently, you need to add a variable that indicates if ASTs are currently being handled. If this variable has been set, you take the same route as though you were in a nested interrupt (because you really are) and return from this interrupt. Otherwise, it is concluded that no AST handling is currently running and we set this variable prior to beginning AST handling. After the bottom half has completed, clear this variable to allow the next initial interrupt to handle ASTs.

Implementing Simple Trampolines

The simple trampoline in Figure 1 requires both a C function to register a trampoline as an interrupt handler and the assembler code that makes up the trampoline itself; see Listing One. The trampoline allows two long word arguments to be passed. (This example is based on a Motorola 68K-series processor with the CPU32+ instruction set. Assembler code is presented in MIT syntax, as opposed to Motorola syntax  for instance, movel a7,a0@ instead of MOVE.L A7,(A0)  because I am using the GNU tools in their default configuration.)

The trampoline's operation is straightforward. Upon entering the trampoline, you save the 68K's disposable registers, push two arguments onto the stack, and then call the handler. When the handler returns, you pop the two arguments off of the stack, and then restore the disposable registers before returning from the interrupt.

Since this trampoline has to be replicated and customized for every handler that uses it (and you have chosen to create them dynamically), you need to make available to C a pointer to the beginning of the trampoline (Trampoline) and the total size of the trampoline (TRAMP_SIZE). Although not shown here, the definition of three offsets for PARM1, PARM2, and FUNC are also required. These are the actual byte offsets from Trampoline to the placeholders for the parms and function pointer in the assembled binary output.

I use a C function named IrqRegisterTramp to handle this replication and write the actual arguments that will be passed and the actual handler function that will be called. The quick explanation of this function is to obtain memory for the copy of the trampoline, copy the trampoline to this new memory, assign the arguments and function handler by using ugly pointer/offset manipulation, and finally, set the trampoline as the interrupt handler at the specified interrupt vector.

I have assumed that you have access to dynamic memory and find it beneficial to be able to dynamically register these trampolines. If you do not on either note, you can simply hardcode multiple occurrences of the trampoline and manually register them as interrupt handlers.

For my purposes, it is required that all handlers have the C prototype: void handler(void *p, int i);. This is completely arbitrary. If you would like fewer arguments, simply reduce what you push onto the stack in the trampoline. Likewise, if you require more, push more onto the stack. Also, none of my handlers are allowed to return anything. It is, however, possible to modify the trampoline to handle a return value. For the 68K family, any return value will appear in register d0 after the function call.

Coding ASTs

Now, I'll examine how you might implement a simple AST handling system. You need a way to register ASTs and a method of handling ASTs. Listing Two shows how you use a globally defined unsigned long integer for the simplest possible AST register, and how a handler for a PIT timer might register an AST.

For the CPU32+ instruction set, an unsigned long is guaranteed to be 32 bits long. This assumption will be put to use later, in the full-featured trampoline code. Also, note the register_ast(mask) macro defined. This has been defined with a compiler-specific extension that lets you write assembler code within a C file. This was necessary to ensure that the astRegister was modified directly and that compiler optimizations would not cause problems.

The method of handling ASTs depends on the overall system. If multitasking is not going to be implemented, then you simply need a loop running as the sole task that continually tests the astRegister; otherwise, the ASTs can be handled in a function that is called when necessary. In either case, the core of the loop or the handler function is similar to Listing Three.

If you use Listing Three in a loop, you must create a critical section to retrieve a copy of the astRegister, and then to clear it. This is necessary because there is no way in C to atomically test and clear the specific bit corresponding to the AST we're about to handle. If the code is to be used with the full-featured trampoline, a copy of the astRegister is passed to the AST handler function.

The Complete Package

Within the status register of the 68K (along with the condition codes and other bits) are 3 bits that specify the current interrupt mask (IM). It is this interrupt mask that specifies which interrupts are allowed to occur. The 68K has seven levels of interrupts. Every interrupting device must specify an interrupt level for itself. The processor determines whether an interrupt will be serviced by comparing the interrupt request level to the level specified in the IM. An interrupt only occurs if the request level is greater than the IM, except for level 7 requests, which are nonmaskable. If the interrupt is to occur, the IM is updated to reflect the requested interrupt's level.

Listing Four has a few requirements. First, any task must run with the interrupt mask at level 0. Second, all interrupt handlers must be registered within a trampoline, and finally, no interrupt at level 7 is allowed. The penalty for violating either of the latter two conditions is, however, small. The only situation that may occur is that a registered AST may not be handled as quickly as possible, waiting until the next initial interrupt.

When the 68K has decided to execute an interrupt handler, it first saves some state information onto the stack frame in what is called the "exception frame." This exception frame includes the status register and the current program counter (PC register) of the instruction stream that was interrupted. Since the status register contains the IM prior to the interrupt that you are processing, you can use it to determine whether you are processing a nested interrupt. If the IM of the status register in the frame is not zero, you are nested.

Tracing the code, you see that upon entering the trampoline, you save your disposable registers, push arguments onto the stack for the handler, and call the handler. After the handler returns, the arguments are removed from the stack. Then, the status register stored in the exception frame on the stack is tested for an IM greater than zero. In the case of a nested interrupt, this evaluates True and a branch occurs to where the disposable registers are restored before returning from interrupt. As you can see, this is a similar execution path to that of Listing One.

Now, if the interrupt is not nested (IM in stack is equal to 0), you need to check whether you are already handling ASTs (since you set the IM to 0 before calling the AST handler). If you are, the doingASTs variable will evaluate True and a branch occurs to where the disposable registers are restored before returning from interrupt.

If doingASTs evaluates False, you set doingASTs to True and enter the AST handling code. To ensure that you do not miss an AST being registered while you determine your execution path, interrupts are disabled. Recall that the astRegister is guaranteed to be a 32-bit (long) word; this lets a single test determine if any ASTs have been registered. If there are ASTs registered, you push a copy of the astRegister onto the stack for the AST handler, clear it to allow more ASTs to be registered while you process the current batch, and finally, enable interrupts (IM to 0). The AST handler function, whose body should be similar to Listing Three, is then called. Upon return, loop back to the disabling of interrupts and testing of the astRegister. Once all ASTs have been processed and a test of the astRegister returns zero, proceed to the context switch section. Interrupts are left disabled to prevent an AST from being registered when it will not be handled.

To prepare for a task switch, the currently running task is compared with the task at the top of the ready-tasks list. If these tasks are equal, there is no need to perform the switch, so simply branch to restore the disposable registers before returning from interrupt. If they are not equal, then a task switch is necessary.

For this operating system, every task has its own dedicated stack; this stack contains everything a task needs to run, including its state information and any working variables a C routine within the task may have stored there. The task is represented by a C struct that has a pointer to the task's stack. So, to save the context for this system, you simply push the rest of the state information of the current task onto its stack (next to its already saved disposable registers), and save the stack pointer in the current task's struct. Then, the stack pointer for the new task to be run is placed into the stack pointer register. From this, all of the new task's state information is restored from its own stack before returning from interrupt.

This trampoline is registered using the same function as in Listing One. However, only the handler-specific region of the entire trampoline is replicated for each handler. This position-independent portion of the code simply jumps to the common section of the trampoline after the handler is called.

A word of caution: In my operating system, there is only one stack pointer. I do not make use of the 68K's separate user stack pointer. Consequently, the entire trampoline operates in the context of the current task. Specifically, the actual interrupt handler function and the AST handler both use the current task's stack. This requires that every task have a stack allocated to it that is large enough to contain up to seven of the largest exception frames (for the worst case of nested interrupts), plus the context of any interrupt handler function, as well as the context of the AST handling function.

Conclusion

We have managed to design a rather simple method of allowing interrupt handlers to be written in C, reducing overall interrupt latency, and providing for simple and efficient task switching in a multitasking environment. While the code presented here made use of some processor-specific concepts, the trampolines shown should be implementable on any system.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!