Learning When Values are Changed by Implicit Integer Casts

C and C++ perform implicit casts when, for example, you pass an integer-typed variable to a function that expects a different type. When the target type is wider, there’s no problem, but when the target type is narrower or when it is the same size and the other signedness, integer values may silently change when the type changes. For example, this program:

prints 4294967293. Like unsigned integer wraparound (and unlike signed integer overflow), these changes of value are not undefined behavior, but they may be unintentional and may also be bugs. As of recently, Clang contains support for dynamically detecting value changes and either providing a diagnostic or else terminating the program. Some fine-grained flags are available for controlling these diagnostics, but you can enable all of them (plus some others, such as signed overflow and unsigned wraparound checks) using -fsanitize=integer.

To suppress the diagnostic, we can make the conversion explicit using a cast:

int main() {
  int x = 300;
  /* The explicit cast marks the truncation as intentional, so it is not reported. */
  unsigned char c = (unsigned char)x;
}

Different parts of this functionality landed in Clang before and after the 7.0.0 release. To get everything, you will want to build Clang+LLVM from sources later than 1 Nov 2018 (or else wait for the 8.0.0 release in a few months).

How would you use these checks? Generally, they should be part of a testing campaign, perhaps in support of a code audit. If you enable them on a non-trivial code base, you will run into diagnostics that do not correspond to bugs, just because real C and C++ programs mix integer types so freely. I suggest starting with -fsanitize=implicit-integer-truncation. Let’s look at doing this to Clang+LLVM itself and then compiling a hello world program: here’s the output.

I’d be happy to hear about any interesting bugs located using these checks, if anyone wants to share.

Finally, see this tweet by Roman Lebedev (the author of these checks) showing that the runtime impact of -fsanitize-trap=implicit-conversion is very low.

The way the choice between signed and unsigned types interacts with the size of int to determine whether wrapping behavior is defined should be viewed as a historical accident. Consider the effect of “x *= x;” when x holds the largest value of its type, for various types. Code written for systems with different sizes of “int” in the days before the Standard had different expectations as to which values would behave as signed and which as unsigned. Because the Standard wanted to minimize its impact on such code, it ends up mandating that some systems process the difference between two uint16_t values as an unsigned type, and that other systems process it as a signed type.

Adding some new fixed-size types whose semantics would be independent of the size of “int” would allow implementations to tailor their optimizations and diagnostics far more usefully than is possible with the existing types. If a struct member is supposed to represent a count of things from 0 to 65535, an attempt to add one when it holds 65535 should be considered a mistake that a diagnostic build should report. If a struct member is supposed to represent a 16-bit checksum, an attempt to add one when it holds 0xFFFF should be considered normal. If the first purpose were served by a “16-bit whole number” type and the second by a “16-bit wrapping algebraic ring” type, a compiler could easily and usefully generate a diagnostic in the first case but not the second. If the same “uint16_t” type is used in both cases, however, it would seem difficult to ensure that diagnostics from the erroneous actions get noticed without someone having to wade through lots of useless diagnostics from valid actions.