Like this article? We recommend

Like this article? We recommend

In part 1 of this series, we looked at some of the problems that C causes with novice programmers, related to error checking and variable initialization. This week we'll dig a bit deeper and see how C handles integer values—an unexpected cause of security holes—and look at how we can build some less error-prone memory management routines on top of C.

Integer Issues

If you're used to high-level languages, C's support for numerical values may seem distressingly primitive. The same might be true even if you come from assembly language, since C also doesn't expose any of the condition codes found in a modern CPU to the programmer.

In a high-level language, numbers typically are arbitrary-precision, where the precision is defined by your need. C has quite a selection of integer types. The most common of these is int, which is expected to correspond to a machine word. On the machines where I learned C, this was 16 bits. Now it's typically 32 bits, even on architectures where the machine word is 64 bits, since a lot of code has been written now assuming that it's always 32.

One of the most common causes of strange behavior is trying to store a pointer value in an int. On common 32-bit platforms, this strategy works fine. On some 64-bit platforms it works, but on others it doesn't. The C99 standard defines a new integer type, intptr_t, which is guaranteed to be big enough to store a pointer. If you're planning to store a pointer in an integer, you should always use intptr_t.

I say pointer, and this is one of the other slightly confusing corners. C defines two kind of pointers, one that points to code and one that points to data. A pointer to a function isn't guaranteed to be the same size as one that points to data. On a lot of embedded platforms, it's common to use 16-bit pointers for code and 32-bit pointers for data. Casting a void* to a function pointer results in some data being discarded in this case.

The other core kinds of integer in C are char, short, and long. C99 also introduced a long long. None of these have fixed sizes, but they all have minimum sizes. (Technically, they have minimum ranges of values they can store, and make no guarantees as to their internal layout.) A short must be at least 16 bits, a long at least 32, and a long long at least 64. If you need a minimum amount of precision, pick one of these, and you may get a bit more space than you requested, depending on the architecture.

I haven't mentioned chars yet because they differ subtly from the other types. All of the other basic integer types are signed unless explicitly declared as unsigned. This isn't always the case with char. Sadly, it isn't never the case with chars, either—whether a char is signed or not depends entirely on the implementation. If you're using chars, always declare them as signed or unsigned explicitly; if you don't, you might be surprised later.

C has some quite surprising rules for implicit conversions between these types in operations. It's a common assumption that the precision of an operation will be specified by how you use it. For example, suppose you do this:

a = b + c;

Because you're storing the result in a, you might assume that the calculation would be performed with whatever the type of a is. In fact, it will be performed with the type of b or c. This makes sense when you consider that the alternative is to have the value of (b + c) dependent on something other than b and c themselves. You might assume that the type of this expression will be the type of b. The C99 standard defines a number of rules for determining the type, but in general it will be whichever is the larger type, b or c—for example, if a is a char and b is an int, the type of the expression will be an int. A fairly common bug comes from doing something like this:

long long a;
long b;// = something
long c;// = something
a = b * c;

a is at least 64 bits, b and c at least 32 bits. Since you can only guarantee that b and c are 32 bits, you shouldn't have put anything bigger than that in them, even if your compiler or platform has 64-bit longs. When you multiply them, you get a result that fits in a 64-bit integer, and you store it in something that fits a 64-bit integer. Sounds sensible? It is, except for the fact that the result is truncated to 32 bits just before the assignment. Here's the correct way of doing this:

a = (long long)b * c;

This sign-extends b to 64 bits (or more, depending on the implementation). The type promotion rules then kick in, ensuring that c has a type with as much space as b, so it's also sign-extended. The multiplication then occurs as a multiplication of a pair of long longs, with 32 or more leading zeros, and the result (a long long) is stored in a long long variable.

In general, you can avoid surprises by explicitly converting the types to something with enough precision for the operation first. Make sure that the signedness matches on both. This is a very common cause of errors, since you can accidentally lose the top bit when converting a big unsigned value to its signed equivalent.

A more subtle bug comes from the integer overflow. This is especially common with malloc, where a common pattern is to write malloc(i * sizeof(x)). If an attacker has any influence over i, he can try to make this overflow. For very large values of i, this will give a result much smaller than i, which is a problem. The call to malloc will succeed, but when you attempt to use the resulting array, only the first few values will be valid. An attacker could cause you to overwrite other data.

A simple way of avoiding this kind of hole is to use calloc() instead of malloc(). (Of course, you hope that your system's implementation of calloc will perform bounds-checking itself and not simply malloc() and memset() count*size bytes.)

realloc() is more of a problem. No standard way exists of doing this, so you need to do it yourself. Fortunately, OpenSSH includes xrealloc(), which is a bounds-checking version of realloc(). This includes a number of other checks, but if you don't need all of them you can implement a simplified version:

This test is quite simple. SIZE_T_MAX is the largest value that a size_t can have. When you divide by the number of elements requested, you get the maximum size that each element can be without an overflow. If this size is smaller than the size requested, you have an overflow, and so you return NULL.

realloc returns NULL in case of an error, so you should always be checking that the return value from realloc is valid. Unfortunately, this is a fairly common cause of memory leaks (which, in turn, can become denial-of-service attacks). If realloc() returns NULL, the original pointer is still valid. It's common for developers to forget this principle and just do something like the following:

ptr = realloc(ptr, newsize);

When realloc() return NULL, you've lost your reference to the old memory location. FreeBSD provides a convenience function, reallocf(), which is the equivalent of this: