Tracking down the tainted data in your embedded app with static analysis

It is inherently risky to assume that software inputs will always be well-formed and within reasonable ranges. In the worst case, this assumption can lead to serious security vulnerabilities and crashes. Systems built by combining components from different sources are at particular risk: research has shown that security vulnerabilities proliferate at the boundaries between code components, often because of innocent differences in how each side interprets the interface specification.

In the parlance of secure programming, unchecked input values are said to be tainted. Tainted data should always concern developers: it can cause unexpected behavior, lead to program crashes, or even provide an avenue for attack.

An important consequence for the embedded domain is that any software that reads input from any type of sensor should treat all values from the sensor as potentially dangerous. The sensor may malfunction and report an anomalous value, or it may accurately report circumstances that had not been foreseen by the software author. It may even be possible for an attacker to gain access to the sensor’s communication channel and send values of their own choosing. Opportunities for these attacks have grown along with the increasing degree of network connectivity in embedded applications; where attackers would once have needed physical access to a device, they may now be able to attack over the network.
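As a minimal sketch of that discipline, the hypothetical function below treats a raw sensor reading as tainted and passes it on only if it falls within a plausible range. The function name, the units (tenths of a degree Celsius), and the limits are assumptions for illustration; real limits would come from the sensor's datasheet and the application's requirements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plausibility limits, in tenths of a degree Celsius (assumed values). */
#define TEMP_MIN_DECI_C (-400)   /* -40.0 C */
#define TEMP_MAX_DECI_C 1250     /* 125.0 C */

/* Treat the raw reading as tainted: store it in *out only after it passes the check.
   Returns false, leaving *out unchanged, for out-of-range or unusable input. */
bool accept_temperature(int32_t raw, int32_t *out)
{
    if (out == NULL) {
        return false;
    }
    if (raw < TEMP_MIN_DECI_C || raw > TEMP_MAX_DECI_C) {
        return false;            /* anomalous, unforeseen, or forged value: reject it */
    }
    *out = raw;                  /* validated: safe to use downstream */
    return true;
}
```

Rejecting the value is only one possible policy; depending on the application, a caller might instead keep the last known-good reading or raise a fault.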

This article will describe some of the ways in which tainted data can cause problems, then explain how taint analysis capabilities in modern static analysis tools can help developers and users identify and eliminate these problems.

The dangers of tainted data

The riskiest input channels are those an attacker can control. Programmers can defend against the resulting defects by treating inputs from potentially risky channels as hazardous until the data has been validated.

The biggest risk of using unchecked values read from an unverified channel is that an attacker can use the channel to trigger security vulnerabilities or crash the program. Many classes of defect can be triggered by tainted data, including buffer overruns, SQL injection, command injection, cross-site scripting, and path traversal. (For more details on these and other classes of defect, see MITRE's Common Weakness Enumeration.) Many of the most damaging cyber-attacks of the last two decades have exploited the infamous buffer overrun (Figure 1). Because this vulnerability is so pervasive, and because it illustrates the importance of taint analysis, it is worth explaining in some detail.

There are several ways in which a buffer overrun can be exploited by an attacker, but here we describe the classic "stack smashing" attack, in which an attacker hijacks the process and forces it to run arbitrary code. Consider the following code:
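A minimal sketch of such code is shown here. It assumes the value is copied with strcpy() into a 100-character stack buffer named buf, declared alongside a local variable named count; the function name and what it does with the data (counting comma-separated fields) are hypothetical. The code is deliberately unsafe.

```c
#include <stdlib.h>
#include <string.h>

/* DELIBERATELY UNSAFE: illustrates the stack-smashing pattern; do not use in real code.
   Returns the number of comma-separated fields in CONFIG, or -1 if it is unset. */
int count_config_fields(void)
{
    int count = 1;                          /* local that may lie next to buf on the stack */
    char buf[100];                          /* fixed-size automatic (stack) buffer */
    const char *config = getenv("CONFIG");  /* tainted: whoever sets the environment controls it */

    if (config == NULL) {
        return -1;
    }
    /* BUG: no length check. Any value of 100 or more characters overruns buf,
       because strcpy() also writes the terminating '\0'. */
    strcpy(buf, config);

    /* Hypothetical use of the data: count comma-separated fields. */
    for (const char *p = buf; *p != '\0'; p++) {
        if (*p == ',') {
            count++;
        }
    }
    return count;
}
```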

In this example, the input from the outside world arrives through a call to getenv(), which retrieves the value of the environment variable named “CONFIG”.

The programmer who wrote this code was expecting the value of the environment variable to fit in the 100 characters of buf, but there is actually no guarantee that this will be the case. If the attacker has sufficient system access, he or she can cause a buffer overrun by assigning CONFIG a value whose length exceeds 100.

Because buf is an automatic variable that will be placed on the stack as part of the activation record for the procedure, any characters after the first 100 will be written to the parts of the program stack beyond the boundaries of buf. The variable named count may be overwritten (depending on how the compiler chose to allocate space on the stack). If so, then the value of that variable is under the control of the attacker.

This is bad enough, but the real prize for the attacker is the return address: the stack also holds the address to which the program will jump once the procedure finishes. To exploit this vulnerability, the attacker can set the value of CONFIG to a specially crafted string that overwrites this return address with an address of their choosing. When the function returns, control passes to that address instead of back to the function’s caller.

If the code is executed in a sufficiently secure context, it may be impossible for an attacker to exploit this vulnerability. Nevertheless, the code is clearly risky and remains a liability if left unfixed. A programmer might also be tempted to reuse the code in a different program that does not run under the same degree of external security.
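One way to remove the defect, sketched under the same assumptions as the unsafe version (hypothetical function name, 100-character buffer, field counting), is to treat the environment string as tainted and check its length before copying. Oversized values are rejected rather than silently truncated; that is a design choice, not the only reasonable one.

```c
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE 100

/* Safer variant: validate the tainted string's length before it reaches the buffer.
   Returns the number of comma-separated fields, -1 if CONFIG is unset,
   or -2 if the value is too long to fit. */
int count_config_fields_checked(void)
{
    char buf[BUF_SIZE];
    const char *config = getenv("CONFIG");   /* still tainted at this point */

    if (config == NULL) {
        return -1;
    }
    size_t len = strlen(config);             /* getenv() returns a NUL-terminated string */
    if (len >= sizeof buf) {                 /* leave room for the terminating '\0' */
        return -2;                           /* reject oversized input instead of overrunning buf */
    }
    memcpy(buf, config, len + 1);            /* known to fit, including the '\0' */

    int count = 1;
    for (const char *p = buf; *p != '\0'; p++) {
        if (*p == ',') {
            count++;
        }
    }
    return count;
}
```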

While this example takes its input from the environment, the code would be just as risky if the string were read from another input source, such as the file system or a network channel.
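The same check carries over to other channels. For a line read from a file, pipe, or socket stream, the following sketch uses the standard fgets() and rejects lines that do not fit rather than processing a truncated prefix; the function name and the reject-and-discard policy are assumptions for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from a stream into buf (newline stripped).
   Returns true only if the whole line fit; an over-long line is discarded and rejected. */
bool read_line_checked(FILE *in, char *buf, size_t size)
{
    if (in == NULL || buf == NULL || size < 2 || size > INT_MAX) {
        return false;
    }
    if (fgets(buf, (int)size, in) == NULL) {
        return false;                        /* end of input or read error */
    }
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';                 /* complete line: strip the newline */
        return true;
    }
    /* No newline: the line either ended at EOF or exactly filled buf, or it is too long. */
    int c = fgetc(in);
    if (c == '\n') {
        return true;                         /* line exactly filled buf */
    }
    if (c == EOF) {
        return !ferror(in);                  /* last line without a newline, unless I/O failed */
    }
    while (c != EOF && c != '\n') {          /* discard the rest of the over-long line */
        c = fgetc(in);
    }
    return false;
}
```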

It is also important to note that unexpected inputs do not necessarily originate from attackers. For example, a trusted user may accidentally supply a problematic value, or malfunctioning equipment may generate one. Whatever the origin of tainted input, the same analysis techniques can be applied to detect it and track its influence, and the same defenses apply.