Program Execution Details

Program Startup

Upon startup, the executable file is loaded into main memory. Several executable file formats exist: Linux uses the Executable and Linkable Format (ELF), current versions of Microsoft Windows rely on the Portable Executable (PE) file format, and Mac OS X uses a format called Mach-O. As Linux is the primary target of this tutorial, the remaining startup description focuses on the execution of ELF files on Linux-based operating systems.

The ELF header records the address of the entry point _start, so the operating system knows where to begin execution. C developers might wonder why the execution entry point is called _start and not main(), which is what they are used to. Although the runtime environment of a C program is minimal in comparison to other languages, it still has to be set up, and this setup is part of each application's own code. To save C programmers the effort of writing setup routines by hand in every program, the toolchain links in ready-made code that takes care of these tasks. This predefined code is called crt0 and fills the functional gap between the raw execution entry point _start and the C entry point main()1)2).

It follows that command line arguments and environment variables are among the first values placed on the stack of an application3). Compared to variables that appear later in the control flow of the application, their offsets are therefore relatively easy to calculate, or at least to estimate. Keep this fact in mind: it becomes important when calculating stack addresses and using these locations for exploitation.

Function Calls

Another important aspect of program execution is the way that functions are called. Parameter passing details depend on the applied calling convention. On x86 systems the cdecl calling convention4), which is used by default by GCC, requires the parameters to be put on the stack in reverse order5). When a call instruction is encountered, the address of the instruction executed directly after the function call is pushed to the stack. Execution is then continued with the code of the function.

The following steps show what a function conforming to the cdecl calling convention does on a Linux-based x86 operating system. Usually, the function first prepares the stack: it saves the base pointer on the stack and then overwrites it with the current stack pointer.

push ebp
mov ebp, esp

Remember that the stack grows from high memory addresses to low memory addresses. Allocating memory thus decreases the address of the top of the stack. By decreasing the value of the stack pointer ESP as shown below, n bytes of memory for local variables are allocated.

sub esp, n

At this point the stack setup is done and the actual content of the function is next to be executed. The base pointer can now be used to reference function parameters and local variables. While parameters have a positive offset, variables are referenced by a negative offset from EBP. After the function execution is finished the stack pointer and base pointer are restored to their original values.

mov esp, ebp
pop ebp

After these instructions, the local data of the function lies below the stack pointer and is therefore no longer part of the active stack.

Lastly, the return address is read from the stack and written to the instruction pointer register EIP by executing the ret instruction. It is not possible to directly assign a value to the instruction pointer register via a mov instruction.

ret

Execution continues with the code at the saved address. According to the cdecl calling convention the return value is placed in the EAX register. It is then the task of the caller to clean up the stack and remove the passed parameters6).

The information on the stack belonging to a particular function invocation is called the stack frame of the function7). A visualization of the stack frame of a single function is shown below.

Keep in mind that function calls are nested in every non-trivial program. Thus there are several stack frames located on the stack.

Calling Functions in Shared Libraries

To avoid compiling the same code into every binary again and again, common functions are packaged into shared libraries. These shared libraries are loaded at runtime and their functions are made available to the binary. On Linux these libraries are called shared objects and have the suffix ".so" in their filename, whereas on Windows they are called Dynamic-Link Libraries (DLLs) and have a ".dll" suffix.

At link time, the linker records which shared libraries and symbols the binary depends on, but it cannot fix the final addresses of the library functions. Libraries change over time, which moves the functions inside them to different addresses. Additionally, mechanisms like ASLR randomize the offsets as a protection mechanism. To cope with this, an indirection is introduced so that function addresses are resolved at runtime8).

For demonstration purposes a minimal shared library and a header describing its function are created and compiled as a shared object for Linux. Compilation instructions are listed in the first line of the source file.
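A library of this kind could look like the following sketch. It is a hypothetical stand-in, not the original listing: the file name shared.c, the library name libshared.so and the function names shared_message() and shared_print() are assumptions chosen for illustration. As described above, the compilation command is given in the first line.

```c
/* gcc -shared -fPIC -o libshared.so shared.c */
#include <stdio.h>

/* Hypothetical library functions. Their prototypes would normally be
 * declared in a matching header, for example shared.h. */

/* Returns the text the library prints. */
const char *shared_message(void)
{
    return "shared";
}

/* Prints the message followed by a newline. */
void shared_print(void)
{
    puts(shared_message());
}
```

Under the same assumptions, a program that declares void shared_print(void); and calls it from main() could be built with gcc main.c -L. -lshared, which produces a.out linked against the shared object.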

As the shared library is resolved at runtime, it must be located within one of the configured search paths. To keep the global configuration untouched, the user may extend these paths by setting the LD_LIBRARY_PATH environment variable9). The shared library is located in the current working directory, which is not part of the search paths by default. Setting the variable accordingly is sufficient to make the application run.

$ LD_LIBRARY_PATH=. ./a.out
shared

Now the application runs and terminates correctly.

Although the call to the function in the shared library is syntactically equivalent to a normal function call, the two differ in the way the function code is located. GDB is used to analyze the call in detail.

The call instruction does not target the library function directly. The code it jumps to is located in the Procedure Linkage Table (PLT), which is the first level of indirection. The underlying concept is more complex, but the PLT can informally be considered a jump table for functions in dynamically linked libraries. The description does not go into more detail at this point, as it is not relevant for the topics covered by this tutorial. Remember that there is an indirection when calling a function located in a shared library and that the PLT contains jumps to these functions.