Detecting and avoiding stack overflow in embedded systems

One of the toughest (and unfortunately common) problems in embedded systems is stack overflow and the collateral corruption or crash that it can cause.

Often, the consequences of a stack overflow manifest themselves far removed from the cause of the overflow itself, making the cause that much more difficult to identify and fix. Developers using ThreadX have an array of tools at their disposal to detect and even avoid stack overflow problems. These tools and techniques not only help developers avoid stack overflow due to inadequate stack memory allocation, they also help minimize RAM wasted by allocating excessive memory for thread stacks "just to be safe". The following tools and techniques are discussed in this white paper:

Manual stack inspection

Kernel awareness and ThreadX stack analysis

ThreadX run-time stack analysis

IAR Embedded Workbench stack usage analysis

TraceX stack analysis

Overview

In the C programming language, the stack—a region of memory in which local variables are located and function arguments are passed—is allocated by the programmer, with the amount of memory allocated based on factors such as machine architecture, OS, application design, and amount of memory available. If the program should require more memory for its stack than has been allocated, the stack overflows—without warning in most cases—which can corrupt other memory areas and often results in a program malfunction or even a crash. Such problems are very difficult to trace back to the stack overflow, causing programmers to expend considerable time and energy to find the underlying cause of the problem that the application exhibits. As a result, they tend to over-allocate stack memory as a precaution, “just to be safe.”

Traditionally, deciding how much memory to allocate for the stack has been a trial-and-error process. As the widely respected industry commentator and consultant Jack Ganssle has observed:

"With experience, one learns the standard, scientific way to compute the proper size for a stack: Pick a size at random and hope." -- Jack Ganssle, “The Art of Designing Embedded Systems,” Elsevier, 1999.

In an RTOS, there is a separate stack for each thread, and each thread might have drastically different stack size needs. Making things even more challenging, stack overflows often affect a somewhat unrelated memory area – global variables, allocated memory, or another thread’s stack – and thus the subsequent problem does not manifest itself until much later than when the overflow occurred.

Manual Stack Inspection

The most obvious and basic technique for catching stack overflows is to manually inspect the stack memory region and stack pointers for potential overflow. To facilitate this, ThreadX places a 0xEF data pattern throughout each thread’s stack (note that IAR Embedded Workbench has a stack plugin that does something similar but uses the pattern 0xCD). The idea here is to run the thread through its validation tests and then review all of the thread stacks. The non-0xEF byte closest to the start of the stack represents the high-water mark of that thread’s stack usage. Of course, if no 0xEF bytes remain in a thread’s stack, there is a high probability that a stack overflow has occurred. The following figure shows an example thread stack with the 0xEF data pattern:

In addition to detection of stack overflow, manual stack inspection can be used to tune the stack size. For example, if a large area of unused stack space is found, it may indicate that the size for that thread’s stack is excessive. Of course, this analysis assumes that the test suite is exercising the worst case call tree depth for each thread. Note also that every thread stack must always have a minimum amount of unused memory in order to save its context if an interrupt occurs at the highest point of stack used. The exact amount needed varies by architecture, but is defined in each ThreadX port’s readme_threadx.txt file.
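The inspection described above can be sketched in C. This is a minimal sketch, assuming a stack that grows downward from its highest address (as on ARM Cortex-M), so that untouched 0xEF bytes remain at the low-address end. The names `STACK_FILL_BYTE`, `stack_fill`, and `stack_high_water_bytes` are hypothetical and are not part of the ThreadX API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xEFu /* fill pattern named in the text */

/* Fill a stack region with the pattern, mimicking the fill the text
 * describes (hypothetical helper, used here only to set up a test stack). */
static void stack_fill(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Return how many bytes of the stack occupying [stack, stack + size)
 * have been written at least once. Assumes a downward-growing stack,
 * so unused fill bytes sit at the lowest addresses. */
static size_t stack_high_water_bytes(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;

    assert(stack != NULL);

    /* Count fill bytes from the low-address end until the first byte
     * that no longer holds the pattern. */
    while (untouched < size && stack[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return size - untouched;
}
```

A result equal to the full stack size means no fill bytes remain, which the text above treats as a probable overflow. Note that a byte the application legitimately wrote as 0xEF at the boundary is indistinguishable from the fill, so the result can slightly under-report usage.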

ThreadX Kernel Awareness and Stack Analysis

IAR Embedded Workbench provides ThreadX kernel awareness, which gives a single-click, system-level view of ThreadX resources. The ThreadX-aware debugger also provides thread stack analysis, which effectively automates the manual inspection technique described above. The following screenshot shows an example of IAR Embedded Workbench for ARM, with its ThreadX kernel awareness for an ARM Cortex-M device. Specifically, this illustration shows the information related to the “thread” object:

The key column is the Stack Usage column. The difference between the Stack Size and Stack Usage columns yields the remaining stack size. In the example shown, thread 5 has a stack size of 512 bytes, while its current usage is 136 bytes. This means that there are 376 free bytes on thread 5’s stack.

This information is obtained by the debugger automatically examining the thread’s stack memory for the 0xEF data pattern. The stack memory for thread 5 ranges from address 0x20001a58 through 0x20001c57. The figure below shows the memory dump of this stack area:

Manual inspection of the thread’s stack memory area shows that the lowest address not having the 0xEF data pattern is 0x20001bd0. Subtracting this from the ending stack address of 0x20001c57 and adding one (both addresses are inclusive) yields the reported used stack size of 136 bytes. Of course, the C-SPY debugger in IAR Embedded Workbench takes care of all this manual work, providing this valuable information via a simple mouse click.
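The arithmetic for thread 5 can be checked with a few lines of C. This sketch hard-codes the addresses quoted in the text; the constant and function names are hypothetical. Because the start, end, and lowest-used addresses all refer to bytes that belong to the range, each span is the address difference plus one.

```c
#include <assert.h>
#include <stdint.h>

/* Thread 5 addresses quoted in the text. */
static const uint32_t STACK_START = 0x20001a58u; /* lowest stack address  */
static const uint32_t STACK_END   = 0x20001c57u; /* highest stack address */
static const uint32_t LOWEST_USED = 0x20001bd0u; /* lowest non-0xEF byte  */

/* Number of bytes in the inclusive address range [first, last]. */
static uint32_t inclusive_span(uint32_t first, uint32_t last)
{
    assert(last >= first);
    return last - first + 1u;
}

static uint32_t stack_size_bytes(void)
{
    return inclusive_span(STACK_START, STACK_END);
}

static uint32_t stack_used_bytes(void)
{
    return inclusive_span(LOWEST_USED, STACK_END);
}

static uint32_t stack_free_bytes(void)
{
    return stack_size_bytes() - stack_used_bytes();
}
```

With these inputs the functions return 512, 136, and 376 bytes respectively, matching the Stack Size, Stack Usage, and free-space figures discussed for thread 5.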

IAR Embedded Workbench Stack Usage Analysis

Another useful tool for determining proper stack memory use and allocation is the Stack Usage Analysis in IAR Embedded Workbench. Under the right circumstances, the linker in IAR Embedded Workbench can accurately calculate the maximum stack usage for each call graph, starting from each root, that is, each function that is not called from another function (the program start, interrupt functions, tasks, and so on). In general, the compiler generates this information for each C function, but if there are indirect calls (calls using function pointers) in your application, you must supply a list of possible functions that can be called from each calling function. If you use a stack usage control file, you can also supply stack usage information for functions in modules that do not have stack usage information.
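To illustrate why indirect calls need extra information, consider the following sketch. The handler names and the dispatch table are hypothetical. At the call site in `dispatch`, the callee is chosen at run time, so an analysis that sees only `handlers[cmd](arg)` cannot tell whether the small or the large handler will run, and the set of possible callees has to be supplied separately.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*command_handler)(int arg);

/* Hypothetical handler with almost no stack needs. */
static int handle_small(int arg)
{
    return arg + 1;
}

/* Hypothetical handler with a large local buffer on the stack. */
static int handle_large(int arg)
{
    volatile unsigned char scratch[256];
    size_t i;

    for (i = 0; i < sizeof scratch; i++) {
        scratch[i] = (unsigned char)arg;
    }
    return scratch[0] + scratch[255];
}

static const command_handler handlers[] = { handle_small, handle_large };

/* Indirect call through a function pointer: the worst-case stack depth
 * of dispatch() depends on which handlers the table can contain. */
static int dispatch(size_t cmd, int arg)
{
    assert(cmd < sizeof handlers / sizeof handlers[0]);
    return handlers[cmd](arg);
}
```

In this sketch the worst case for `dispatch` is governed by `handle_large`, which is exactly the kind of fact that must be stated explicitly when the call goes through a pointer.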

Result of an Analysis—The Map File Contents

This is an example of what the stack usage chapter in the map file might look like:

The summary contains the depth of the deepest call chain in each category as well as the sum of the depths of the deepest call chains in that category. Each call graph root belongs to a call graph root category to enable convenient calculations in "check that" directives.

Call Graph Log

To help you interpret the results of the stack usage analysis, there is a log output option that produces a simple text representation of the call graph (--log call_graph).

Example output:

Each line consists of the following information:

The stack usage at the point of call of the function

The name of the function, or a single '-' to indicate usage in a function at a point with no function call (typically in a leaf function)

The stack usage along the deepest call chain from that point. If no such value could be calculated, "[---]" is output instead. "***" marks functions that have already been shown.
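The "stack usage along the deepest call chain" can be understood as a simple recursion over the call graph: a function's worst case is its own usage plus the largest worst case among its callees. The sketch below illustrates that idea on a hypothetical four-function graph. It is not IAR's implementation; it assumes there is no recursion, and it simplifies by charging each function's whole frame rather than the usage at each individual call point.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

struct func_node {
    const char *name;
    unsigned frame_bytes;     /* stack used by this function itself */
    int callees[MAX_CALLEES]; /* indices into graph[], -1 terminates */
};

/* Hypothetical call graph: main -> {parse, report}, parse -> {checksum}. */
static const struct func_node graph[] = {
    { "main",     16, {  1,  2, -1, -1 } },
    { "parse",    48, {  3, -1, -1, -1 } },
    { "report",   96, { -1, -1, -1, -1 } },
    { "checksum", 64, { -1, -1, -1, -1 } },
};

/* Worst-case stack bytes from graph[index] down its deepest call chain. */
static unsigned deepest_chain_bytes(int index)
{
    const struct func_node *f;
    unsigned worst_callee = 0;
    size_t i;

    assert(index >= 0 && (size_t)index < sizeof graph / sizeof graph[0]);
    f = &graph[index];

    for (i = 0; i < MAX_CALLEES && f->callees[i] >= 0; i++) {
        unsigned depth = deepest_chain_bytes(f->callees[i]);
        if (depth > worst_callee) {
            worst_callee = depth;
        }
    }
    return f->frame_bytes + worst_callee;
}
```

For this graph the deepest chain from `main` runs through `parse` and `checksum` (16 + 48 + 64 = 128 bytes), which exceeds the chain through `report` (16 + 96 = 112 bytes).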

TraceX Stack Analysis

Another stack analysis tool available to ThreadX users is TraceX. Although the main purpose of TraceX is to provide a system level, graphical view of what the application is doing, TraceX also analyzes the stack usage for each thread represented in the trace buffer. TraceX does not provide a worst-case stack size for the entire thread execution, but only the worst case stack usage within the captured trace. For example, the following trace shows thread 5’s execution in the trace buffer:

Event number 184 is thread 5’s call to tx_event_flags_get, which in turn suspends the thread, as shown by event 185 and the subsequent execution of other threads. The stack analysis of this trace buffer, selected by View -> Thread Stack Usage, is shown below:

The TraceX view of thread stack usage shows that thread 5 has a minimum available stack of 416 bytes (or a used stack of 96 bytes). The reason TraceX shows less stack used than the other methods is that the stack sampled in the trace buffer does not include the stack required to save the thread’s context. However, it still provides a useful cross-check of the stack usage for thread execution captured within the trace buffer.

Summary

ThreadX users must still guard against stack overflow and determine the minimum amount of stack space required for each thread. However, they have unparalleled stack analysis tools at their disposal, eliminating much of the guesswork and hope!