Various notes on Programming, Mathematics, and similar stuff

PSA: How to safely check buffer sizes in C

A common task in C is checking whether some data fits in a buffer. Unfortunately, doing it correctly can be quite tricky, as http://lwn.net/Articles/278137/ shows. Going by the standard, the problem is that in C, creating a pointer beyond the end of an array is undefined behaviour, which means the compiler can do whatever it wants, including sending your customers’ credit card numbers to Russia (actually that’s quite a probable outcome: as the link shows, the compiler may introduce a security bug, which will be exploited and result in the credit card numbers being sent).

First, I’d like to point out that C does allow you to create a pointer to exactly one past the end of an array: if A is an array of length 5, you can form a pointer to the (non-existent) A[5], as long as you don’t dereference it. Anything further out, such as A+6, is undefined even if you never dereference it.
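
As a minimal sketch of the one-past-the-end rule, the function below (sum_ints is a made-up name, not something from the standard or from this post) forms the end pointer a + n and only ever compares against it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums the n ints starting at a.
 * `end` points one past the last element; it is compared against
 * but never dereferenced, which the standard permits. */
static int sum_ints(const int *a, size_t n)
{
    assert(a != NULL);
    const int *end = a + n;   /* valid as long as n is the array length or less */
    int total = 0;
    for (const int *p = a; p != end; ++p)
        total += *p;          /* p is always strictly before end here */
    return total;
}
```

Called as sum_ints(A, 5) on a 5-element array, end is the address of the non-existent A[5]; passing n = 6 would make a + n undefined before the loop even starts.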

The code in red tests for overflow by first forming a pointer past the end of the buffer, which is already undefined behaviour, and only then checks the length. The check comes too late: the program’s behaviour is already undefined, and the compiler, knowing so, may remove the check.
The code in orange is slightly less wrong. The check itself is well-defined, because unsigned integer addition wraps modulo SIZE_MAX+1 (that is, 256^sizeof(size_t) on platforms with 8-bit bytes). However, it checks the wrong thing: if an attacker sets, say, len=(size_t)-sizeof(unsigned int), then len+sizeof(unsigned int) = -4+4 = 0, so the check passes and the code copies a huge amount of data off the end of the buffer, causing problems. I repeat, the red and orange codes are evil: use the blue or turquoise code unless you are introducing a backdoor.
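
To make the wrap-around concrete, here is a rough sketch of the two integer checks. It is not necessarily the post's coloured code: HDR, fits_wrapping, fits_safe, demo_checks and the avail parameter (bytes remaining in the buffer, including the length field) are all names I made up for illustration. The safe variant subtracts on the side that cannot underflow instead of adding on the side the attacker controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of the length field that precedes the payload (hypothetical name). */
#define HDR (sizeof(unsigned int))

/* Pointer-based pattern (do NOT use): forming `p + len` is already undefined
 * when len exceeds the space left, so the compiler may delete the test:
 *     if (p + len > end || p + len < p) reject();
 */

/* Wrapping check: well-defined, but len + HDR can wrap to a small value. */
static bool fits_wrapping(size_t len, size_t avail)
{
    return len + HDR <= avail;
}

/* Subtraction-based check: avail - HDR cannot underflow once avail >= HDR,
 * and len is never added to anything, so nothing can wrap. */
static bool fits_safe(size_t len, size_t avail)
{
    if (avail < HDR)
        return false;
    return len <= avail - HDR;
}

/* Usage sketch with the attacker-chosen length from the text. */
static void demo_checks(void)
{
    size_t avail = 64;
    size_t evil = (size_t)0 - HDR;        /* SIZE_MAX - HDR + 1 */
    assert(fits_wrapping(evil, avail));   /* wrongly accepted: evil + HDR wraps to 0 */
    assert(!fits_safe(evil, avail));      /* correctly rejected */
    assert(fits_safe(avail - HDR, avail));/* the largest payload that fits */
}
```

The design choice is simply to keep every intermediate value inside the range that is known to be valid: avail comes from the program, len comes from the attacker, so only avail is allowed to take part in arithmetic.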

Note that in these examples, read_uint is assumed to be a macro/function that reads sizeof(unsigned int)=4 bytes from a char* as an unsigned integer (this isn’t the same as *(int*)packet, because of alignment and endianness issues). There does not seem to be any standard function for this. Assuming data is big-endian (as in most network protocols), it can be defined as: