In the C language, integer types can represent only a finite range of numbers. If the result of an arithmetic operation falls outside the type's range (e.g., the largest representable value plus one), the value overflows or underflows. This becomes a problem when the programmer did not anticipate it, e.g., when the size parameter of a memory allocator function becomes smaller than intended because of the overflow.
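To make the allocator case concrete, here is a minimal sketch of a size computation that wraps around. It assumes 32 bit unsigned arithmetic (the way size_t behaves on a 32 bit arch); the alloc_size32 helper is made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: number of bytes needed for `count` elements of
 * `elem_size` bytes each, computed in 32 bit unsigned arithmetic. */
uint32_t alloc_size32(uint32_t count, uint32_t elem_size)
{
    /* Unsigned multiplication wraps modulo 2^32, so a huge count can
     * produce a tiny byte count. */
    return count * elem_size;
}
```

With count = 0x40000001 and elem_size = 4 the mathematical result is 0x100000004, but the function returns 4, so an allocator would hand back a buffer far smaller than the caller expects.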

The plugin is based on spender's idea, the intoverflow_t type found in older PaX versions. This was a 64 bit wide integer type on 32 bit archs and a 128 bit wide integer type on 64 bit archs. There were wrapper macros for the important memory allocator functions (e.g., kmalloc) where the value to be put into the size argument (of size_t type) could be checked against overflow. For example:
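A minimal userland sketch of such a wrapper, not the original PaX macro: it assumes a 32 bit size type so that a 64 bit integer can stand in for intoverflow_t, and the names size32_t, alloc32, alloc32_checked and checked_alloc32 are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t size32_t;      /* hypothetical stand-in for a 32 bit size_t */
typedef uint64_t intoverflow_t; /* double wide type on a 32 bit arch */

/* Hypothetical allocator with a 32 bit size argument. */
void *alloc32(size32_t size)
{
    return malloc(size);
}

/* Refuse sizes that do not fit into the allocator's size type. */
void *alloc32_checked(intoverflow_t wide_size)
{
    if (wide_size > UINT32_MAX)
        return NULL;
    return alloc32((size32_t)wide_size);
}

/* The argument is deliberately not parenthesized: the cast binds to the
 * leftmost operand of the expression, which widens the last operation. */
#define checked_alloc32(size_expr) alloc32_checked((intoverflow_t)size_expr)
```

Calling checked_alloc32(n * 4u) expands to a multiplication carried out in 64 bits, so a product above UINT32_MAX is rejected instead of being silently truncated.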

This solution had two problems. First, the size argument is usually the result of a longer computation that consists of several expressions. The intoverflow_t cast based check could only verify the last expression that was used as the argument to the allocator function, and even then it only helped if the type cast of the leftmost operand affected the other operands as well. Therefore, if an integer overflow occurred during the evaluation of the other expressions, the remaining computation would use the overflowed value, which the intoverflow_t cast cannot detect. Second, only a few basic allocator functions had wrapper macros because wrapping every function with a size argument would have been a big job and would have resulted in an unmaintainable patch.
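The first limitation can be sketched as follows, again assuming 32 bit sizes and a 64 bit intoverflow_t; the cast_check_flags function and its parameters are hypothetical.

```c
#include <stdint.h>

typedef uint64_t intoverflow_t; /* double wide type on a 32 bit arch */

/* Returns 1 if a cast based check on the last expression would flag the
 * size, 0 otherwise. */
int cast_check_flags(uint32_t count, uint32_t elem_size, uint32_t header)
{
    /* Earlier expression: evaluated in 32 bits, may wrap unnoticed. */
    uint32_t payload = count * elem_size;

    /* Last expression: only this addition is widened by the cast. */
    intoverflow_t total = (intoverflow_t)payload + header;

    return total > UINT32_MAX;
}
```

For count = 0x40000001, elem_size = 4 and header = 8 the multiplication has already wrapped to 4, so the widened sum is 12 and the check stays silent although the real size does not fit into 32 bits.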

In contrast, the size_overflow plugin recomputes all subexpressions of the expression with a double wide integer type in order to detect overflows during the evaluation of the expression.
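Written by hand, the effect of that recomputation is roughly the following. This is only a sketch of the idea, assuming 32 bit sizes with a 64 bit double wide type; the plugin does this on gcc's internal representation, not on C source, and size_overflows is a made-up name.

```c
#include <stdint.h>

/* Returns 1 if computing count * elem_size + header overflows 32 bits. */
int size_overflows(uint32_t count, uint32_t elem_size, uint32_t header)
{
    /* Every subexpression is duplicated in the double wide type, so the
     * wide chain carries the mathematically correct value. */
    uint64_t payload_wide = (uint64_t)count * elem_size;
    uint64_t total_wide = payload_wide + header;

    /* One range check against the original type's maximum is enough here. */
    return total_wide > UINT32_MAX;
}
```

Unlike a cast applied to the last expression only, this catches the case where the multiplication is the step that overflows.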

Internals of the size_overflow plugin

The compilation process is divided into passes in between or in place of which a plugin can insert its own. Each pass has a specific task (e.g., optimization, transformation, analysis) and they run in a specific order on a translation unit (some optimization passes may be skipped depending on the optimization level).

The plugin has a GIMPLE and a regular IPA pass. The GIMPLE pass (insert_size_overflow_asm) marks the parameters (marked with a size_overflow attribute) with an asm statement. This is needed because the IPA pass runs after inlining and all functions of interest get inlined, thus losing the size_overflow attribute. The IPA pass collects all decls of interest (generate_summary), prints out the missing functions (execute) and duplicates the necessary statements (transform). The plugin also supports LTO with gcc-4.9 (not public yet).

Before I describe the plugin in more detail, let's look at some gcc terms.

The gimple structure in gcc represents the statements (stmt) of the high level language. For example this is what a function call (gimple_code: GIMPLE_CALL) looks like:

This stmt has 3 operands, one lhs (left hand side) and two rhs (right hand side) ones. Each variable is of type "tree" and has a name (SSA_NAME) and version number (SSA_NAME_VERSION) while we are in SSA (static single assignment) mode.
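To relate these terms to source code, here is a small C function containing a call with one lhs and two rhs operands. The add and caller functions are hypothetical, and the dump line in the comment only approximates what an SSA dump shows; the exact names and version numbers depend on the gcc version and the surrounding code.

```c
/* Hypothetical function, used only to have a call with two arguments. */
int add(int a, int b)
{
    return a + b;
}

int caller(int a, int b)
{
    /* In an SSA dump this call appears roughly as
     *     sum_3 = add (a_1(D), b_2(D));
     * sum_3 is the lhs, a_1 and b_2 are the two rhs operands, and the
     * numeric suffix is the SSA version of each name. */
    int sum = add(a, b);
    return sum;
}
```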

In userland there is only a hash table (e.g., openssl). The present description covers the kernel.

The attributes

Plugins can define new attributes. This plugin defines two new attributes:

The __size_overflow attribute is used to mark the size parameters of interesting functions so that they can be tracked backwards. This is what the attribute looks like: __attribute__((size_overflow(1))) where the parameter (1) refers to the function argument (they are numbered from 1) that we want to check for overflow. In the kernel there is a #define for this attribute similarly to other attributes: __size_overflow(...). For example:
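A sketch of how a function could be marked. The coolmalloc allocator is only an illustration, and the SIZE_OVERFLOW_PLUGIN guard is an assumption made here so that the snippet also compiles when the plugin is not loaded (a compiler without the plugin does not know the attribute).

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: SIZE_OVERFLOW_PLUGIN is defined only when the plugin is
 * loaded; otherwise the macro expands to nothing. */
#ifdef SIZE_OVERFLOW_PLUGIN
#define __size_overflow(...) __attribute__((size_overflow(__VA_ARGS__)))
#else
#define __size_overflow(...)
#endif

/* Argument 1 (size) is the one to be tracked backwards. */
void *coolmalloc(size_t size) __size_overflow(1);

void *coolmalloc(size_t size)
{
    return malloc(size);
}
```

For a function with several size-like arguments the macro takes a list, e.g. __size_overflow(1, 2).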

Originally we only had the attribute, similarly to the constify plugin, but in order to reduce the kernel patch size all functions except for the base ones are now stored in a hash table.

Size_overflow hash table

The hash table is generated by the tools/gcc/size_overflow_plugin/generate_size_overflow_hash.sh script from tools/gcc/size_overflow_plugin/size_overflow_hash.data into tools/gcc/size_overflow_plugin/size_overflow_hash.h. The hash table stores functions, function pointers, struct fields and variable declarations. A hash table entry is described by the size_overflow_hash structure whose fields are the following:

next: the hash chain pointer to the next entry

name: name of the declaration

param: an integer with bits set corresponding to the size parameters; PARAM0 means the function return value

context: needed to improve the hash and also to differentiate the decl types as follows:

fields: the name of the encompassing struct

function pointers: the name of the encompassing struct if any

functions: "fndecl"

variables: "vardecl" and for static globals also the file name

For example this is what the hash entry of the include/linux/slub_def.h:kmalloc function looks like:
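As a rough sketch (not the generated header), the structure and a kmalloc entry could look like the following. The field types, the PARAM1 bit layout and the chain_lookup helper are assumptions made for this illustration; the real entries also depend on the hash values computed by the generator script.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: bit N of param marks argument N, bit 0 the return value. */
#define PARAM0 (1U << 0)
#define PARAM1 (1U << 1)

struct size_overflow_hash {
    const struct size_overflow_hash *next; /* hash chain pointer */
    const char *name;                      /* name of the declaration */
    unsigned int param;                    /* bitmask of size parameters */
    const char *context;                   /* "fndecl", "vardecl", struct name */
};

/* Illustrative entry: kmalloc is a function whose first argument is a size. */
const struct size_overflow_hash kmalloc_entry = {
    .next = NULL,
    .name = "kmalloc",
    .param = PARAM1,
    .context = "fndecl",
};

/* Walk one hash chain and return the entry matching name and context. */
const struct size_overflow_hash *
chain_lookup(const struct size_overflow_hash *head,
             const char *name, const char *context)
{
    for (; head != NULL; head = head->next) {
        if (strcmp(head->name, name) == 0 &&
            strcmp(head->context, context) == 0)
            return head;
    }
    return NULL;
}
```

Comparing the context as well as the name is what keeps, say, a struct field and a function with the same name apart.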

A second hash table, disable_size_overflow_hash, is generated by the tools/gcc/size_overflow_plugin/generate_size_overflow_hash.sh script from tools/gcc/size_overflow_plugin/disable_size_overflow_hash.data into tools/gcc/size_overflow_plugin/disable_size_overflow_hash.h. This hash table stores functions that are of no interest for the size_overflow checks. For now these are functions returning an error code, which are discovered in LTO mode.

global variables: if the data flow reaches a global variable then all writes to the variable are traced back as well. Not fully implemented -> TODO

structure fields: if the data flow reaches a structure field then all writes to the field are traced back as well. Mostly implemented -> TODO

When the plugin finds a marked decl then it traces back the use-def chain of the parameter(s) defined by the function attribute. The stmts found recursively are duplicated using variables of double wide integer types.

In some cases duplication is not the right strategy. In these cases the plugin takes the lhs of the original stmt and casts it to the double wide type:

function calls (GIMPLE_CALL): they cannot be duplicated because they may have side effects. However the computation of the function return value will be duplicated if PARAM0 is set in the hash table for the given function.

inline asm (GIMPLE_ASM): it may have side effects too.

division (RDIV_EXPR, etc.): special case for the kernel because it doesn't support division with double wide types.

If the marked decl's parameter can be traced back to a decl then the plugin checks if the caller is already in the hash table (or it is marked with the attribute). If it isn't then the plugin prints the following message:

When the plugin finds one of the above cases then it will insert a range check against the double wide variable value (TYPE_MIN, TYPE_MAX of the original variable type). This guarantees that at runtime the value fits into the original variable's type range.

If the runtime check detects an overflow then the report_size_overflow function will be called instead of executing the following stmt. The marked function's parameter is replaced with a variable cast down from its double wide clone so that gcc can potentially optimize out the stmts computing the original variable.
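Put together, the instrumented code behaves roughly like the hand-written C below. This is a sketch under assumptions: a 32 bit size type with a 64 bit double wide clone, a simplified report_size_overflow signature and message, and the made-up functions coolmalloc and alloc_records. The plugin generates the equivalent on GIMPLE, not as C source.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Userland stand-in with a simplified, assumed signature and message. */
_Noreturn void report_size_overflow(const char *file, unsigned int line,
                                    const char *func)
{
    fprintf(stderr, "size overflow detected in %s:%u %s\n", file, line, func);
    abort();
}

/* Hypothetical marked function: its first argument is the tracked size. */
void *coolmalloc(uint32_t size)
{
    return malloc(size);
}

void *alloc_records(uint32_t count, uint32_t record_size)
{
    /* Double wide duplicate of the original stmt
     *     bytes = count * record_size; */
    uint64_t bytes_wide = (uint64_t)count * record_size;

    /* Range check against the maximum of the original 32 bit type
     * (its minimum is 0 for an unsigned type). */
    if (bytes_wide > UINT32_MAX)
        report_size_overflow(__FILE__, __LINE__, __func__);

    /* The argument is cast down from the double wide clone, so the
     * original 32 bit computation is no longer needed. */
    return coolmalloc((uint32_t)bytes_wide);
}
```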

If we uncomment the print_the_code_insertions function call in the insert_check_size_overflow function then the plugin will print out this message during compilation: "Integer size_overflow check applied here." This message isn't too useful because later passes in gcc will optimize out about 6 out of 10 insertions. If anyone is interested in the insertion count after optimizations then try this command (on the kernel):

The plugin creates the report_size_overflow declaration in the start_unit_callback, but the definition is always in the current program. The plugin inserts only the report_size_overflow calls. This is a no-return function.

This function prints out the file name, the function name and the line number of the detected overflow. If the stmt's line number is not available in gcc then it prints out the caller's start line number. The last three strings are only debug information. The report_size_overflow function's message looks like this:

In the kernel the report_size_overflow function is in fs/exec.c. The overflow message is sent to dmesg along with a stack backtrace and then a SIGKILL is sent to the process that triggered the overflow. In openssl the report_size_overflow function is in crypto/mem.c. The overflow message is sent to syslog and the triggering process is sent a SIGSEGV.

Each dumpable gcc pass is dumped by -fdump-tree-all and -fdump-ipa-all. This blog post focuses on the ssa and the size_overflow passes. The marked function is coolmalloc, the traced parameter is _9. The main function's ssa representation is below, just before executing the size_overflow pass (test.c.*.size_overflow_functions*, before transform):

gcc intentional overflow: gcc can produce unsigned overflows while transforming expressions, e.g., it can rewrite constants so that the correct result is produced through unsigned overflow on the given type (e.g., a-1 -> a+4294967295). The plugin used to detect this (false positive) overflow at runtime. The solution is to not duplicate such stmts that contain constants. Instead, the plugin inserts an overflow check for the non-constant rhs before that stmt and uses its lhs (cast to the double wide type) in later duplication. For example on 32 bit:
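The effect can be reproduced in plain C. The sketch below assumes a 32 bit unsigned type with a 64 bit double wide type; the three function names are made up, and the code only mimics what the transformed stmt and its duplicates compute.

```c
#include <stdint.h>

/* What gcc may emit for a - 1: the addition wraps modulo 2^32 and still
 * yields the correct 32 bit result. */
uint32_t minus_one_as_add(uint32_t a)
{
    return a + 4294967295u;
}

/* Naive double wide duplicate of the transformed stmt: returns 1 if the
 * range check would fire. The wide sum does not wrap, so the check fires
 * although nothing is wrong (false positive). */
int naive_duplicate_flags(uint32_t a)
{
    uint64_t wide = (uint64_t)a + 4294967295u;
    return wide > UINT32_MAX;
}

/* The fix: keep the stmt in 32 bits and continue the wide chain from its
 * lhs cast to the double wide type. */
uint64_t lhs_cast_to_wide(uint32_t a)
{
    uint32_t lhs = a + 4294967295u;
    return (uint64_t)lhs;
}
```

For a = 10 the 32 bit stmt gives 9, the naive duplicate gives 4294967305 and trips the check, while the cast of the lhs carries 9 into the rest of the duplicated computation.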