.NET

Detecting Endian Issues with Static Analysis Tools

By Carl Ek, July 20, 2010

Best practices for using static analysis tools and enforcing correct programming in C/C++

Detecting Errors Related to Endianness: Syntax and Semantics

Outside of exhaustive unit testing and code inspection, it would be nice if compilers could tell programmers when endian issues are present. But any warnings that a compiler could give on any potential problems would be so noisy that they would most certainly be ignored.

All detection methods require some knowledge of the endianness of the data being processed before any warning can be given. Since compilers are not "smart enough" to know the contents of variables at run time, their ability to detect endian related errors is limited. Detecting endian issues at a code inspection is easier if naming conventions are used to identify big endian or little endian data, but compilers are (usually) not privy to the knowledge of such coding standards.

This is where static analysis tools can be used to detect issues which are applicable to a specific environment. Static analysis tools do not have to behave like compilers which must conform to a language standard, they can report on issues which violate such specific things as potential endian errors.

As a simple example, it is possible to develop a static analysis rule to warn on this situation:

union {
short number;
char view[2];
} my_number;

But if the appropriate coding standard was in place, it could be tailored to give no warning for this: (if the tool knew that regex(BigEndian.*) names are big-endian):

union {
short number;
char BigEndian_view[2];
} my_number;

Detecting Errors Related to Endianness: Protocol and Paths

The above example with a union is a case of detecting errors by examining syntax and semantics. There are other errors which can be detected when programming protocols are used to correctly handle endianness processing.

In these schemes, proper byte swapping is necessary, and protocols must be followed in order that numeric values are correctly interpreted.

A simple, but typical, example of a protocol: reading from and writing to a network. In this example, the length values must be passed into the function-style macros ntohl() and htonl() before they are used in the calls to data_alloc() and net_write(). Proper byte-swapping is done for input and output information:

A static analysis tool detector could be developed to keep track of a specific functions whose parameters must be processed by the byte-swapping routines. If the byte-swapping routine has not been called for each parameter (or the incorrect one has been called!) an error or warning message could be issued.

The types and complexities of the protocols which can be implemented depend a great deal upon the customization features and abilities of the particular static analysis tool, the usability, and other restrictions which are common across all static analysis tools.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!