
In this essay, I am going to show you how the most commonly used C++ compilers (MSVC and Borland) use the stack. Beginners will learn how the stack is used, what the function stack frame is, what the stack frame pointer is, and how to use this information to get a function stack trace. I hope that the more advanced readers will find some interesting information, too.

As an example, a simple class named StackDumper is described. The class StackDumper has a method that walks the thread's stack and lets you save to a text file the names (or, in the worst case, the addresses) of all functions on the call path that leads to a given function foo. As a side effect of this functionality, you will be able to ask the StackDumper whether foo was called from foo1.

The task was complicated somewhat by my intention to create a class that is not dependent on any particular compiler. If you download the article demo, you will find three projects: for MSVC++ 6, for Borland C++ 5.02, and for Borland C++ Builder 6.
If you are using the VC++ compiler, there is an easy and already described way to implement this functionality. John Panzer published his essay "Automatic Code Instrumentation" in the C/C++ Users Journal, January 1999. In short, he uses the /Gh compiler switch to force the compiler to generate a call to a _penter function at the start of each client function. From inside _penter, the address of the caller is retrieved and stored in a parallel stack, together with the function entry time. Afterwards, the original return address of the caller is replaced by the address of a user function. This function is used for profiling purposes. When it is called, it records the function exit time and restores the original return address of the caller.

This is the main idea. You can download the essay from here. Obviously, this approach uses a Microsoft-specific compiler switch. Due to the parallel stack support, special attention should be paid to the case when an exception occurs in one of the profiled functions.

The compiler generates a hidden call to the _penter function in all user functions; this is not very flexible because you may not want to collect information about every function.

The same functionality can be implemented by the well-known concept of stack backtrace. The idea is to use the stack frame that each function builds on the stack to trace the calling sequence. This approach is compiler independent (at least it works with both MSVC and Borland compilers), but it also imposes some limitations. Most compilers provide options to compile a function without prolog and epilog code. For example, functions declared with the __declspec(naked) attribute will be compiled by MSVC++ and Borland C++ compilers without prolog and epilog code. The same effect can be reached by specifying some compiler options for optimizations. For example, the /Oy switch of the Microsoft compiler suppresses the creation of frame pointers on the call stack.

In this essay, I take for granted that the program is compiled without any optimisations or "special" compiler switches.

1. How the Stack Is Used

1.1. Stack Frame and Frame Pointer

Here is a brief explanation for those of you who are not familiar with the way the stack is used during function calls.
The more advanced readers can skip this section. When a function is called, its arguments are pushed first on the stack, depending on the calling convention.

Then, the call instruction is executed. It pushes the return address on the stack. The first instruction of the function is push ebp, which saves the caller's base pointer. The stack pointer is then copied into ebp, and esp is decremented to make room on the stack for the local variables of the function. So, for every called function, the following information is built on the stack:

Figure 1.

This information is called the "stack frame." The register ebp has a special meaning. It is called the "frame pointer." The frame pointer is initialized at the function start by the standard prolog code and stays unchanged during function execution.

The value of the previous function frame pointer is restored when the current function exits (epilog). How is the frame pointer used? The compiler uses the frame pointer to refer to local variables and parameters of a function (if any).

*(ebp) is the value of the frame pointer of the caller;

*(ebp+4) is the return address (the place in the caller body where execution will continue after the callee returns);
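To make this layout concrete, here is a minimal sketch of a toy 32-bit stack machine that replays the call sequence described above. It is a simulation and not code from the article's source: ToyCpu and all of its members are hypothetical names, and the stack is modelled as a plain array of 32-bit slots.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit stack that grows toward lower addresses.
// ToyCpu and its members are hypothetical names used for illustration only.
struct ToyCpu {
    std::vector<std::uint32_t> mem;  // one 32-bit slot per 4 bytes of stack
    std::uint32_t esp;               // stack pointer (byte address)
    std::uint32_t ebp;               // frame pointer (byte address)

    explicit ToyCpu(std::uint32_t stackBytes)
        : mem(stackBytes / 4, 0), esp(stackBytes), ebp(0) {}

    std::uint32_t load(std::uint32_t addr) const {
        assert(addr % 4 == 0 && addr / 4 < mem.size());
        return mem[addr / 4];
    }
    void push(std::uint32_t value) {
        assert(esp >= 4);
        esp -= 4;
        mem[esp / 4] = value;
    }
    std::uint32_t pop() {
        std::uint32_t value = load(esp);
        esp += 4;
        return value;
    }

    // call: pushes the return address (the jump itself is not modelled)
    void call(std::uint32_t returnAddress) { push(returnAddress); }

    // prolog: push ebp / mov ebp, esp / make room for the locals
    void prolog(std::uint32_t localBytes) {
        push(ebp);
        ebp = esp;
        esp -= localBytes;
    }

    // epilog: mov esp, ebp / pop ebp / ret; returns the popped return address
    std::uint32_t epilogAndRet() {
        esp = ebp;
        ebp = pop();
        return pop();
    }
};
```

After the arguments are pushed, the call is made, and the prolog has run, load(ebp) yields the caller's frame pointer, load(ebp + 4) the return address, and load(ebp + 8) the first argument under the __cdecl convention.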

The Borland C++ 5.02 compiler generates the following assembly code (__cdecl calling convention: the arguments are pushed on the stack from right to left, and the caller cleans the stack. More information about calling conventions can be found here):

Disassembly of main:

00401110  55              push  ebp       ; store the ebp register on the stack
00401111  8B EC           mov   ebp, esp  ; current stack pointer in ebp
00401113  6A 01           push  1         ; the arguments of func are pushed on the stack from right to left
00401115  6A 00           push  0
00401117  E8 EC FF FF FF  call  func      ; this call pushes the return address (0040111C) on the stack
0040111C  83 C4 08        add   esp, 8    ; __cdecl calling convention; caller has to clean the space that
                                          ; arguments used on the stack. Every push decrements the stack
                                          ; pointer and every pop increments it by the size of the operand.
                                          ; Two ints are pushed on the stack => the stack pointer has to be
                                          ; increased by 8.
0040111F  B8 01 00 00 00  mov   eax, 1    ; the return value of main goes in eax
00401124  5D              pop   ebp       ; ebp is restored
00401125  C3              ret

Disassembly of func:

00401108  55        push  ebp              ; | prolog
00401109  8B EC     mov   ebp, esp         ; |
0040110B  83 C4 F4  add   esp, -0x0c       ; make room for 3*4 bytes for the local variable n on the stack
0040110E  8B 45 08  mov   eax, [ebp+0x08]  ; move the return value (nArg1) in eax
00401111  8B E5     mov   esp, ebp         ; | epilog. ebp contains the caller frame ptr
00401113  5D        pop   ebp              ; |
00401114  C3        ret
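The call instruction in main (E8 EC FF FF FF at 00401117) encodes its target as a displacement relative to the next instruction, that is, relative to the return address. A minimal sketch of that arithmetic follows; nearCallTarget is a hypothetical helper name and handles only the 5-byte E8 form of call.

```cpp
#include <cassert>
#include <cstdint>

// Decode the target of a 5-byte near call: opcode E8 followed by a
// little-endian 32-bit displacement. The displacement is relative to the
// address of the NEXT instruction, which is also the pushed return address.
// nearCallTarget is a hypothetical helper, not part of the article's source.
std::uint32_t nearCallTarget(std::uint32_t callAddress, const unsigned char* code) {
    assert(code[0] == 0xE8);
    const std::uint32_t displacement =
          static_cast<std::uint32_t>(code[1])
        | static_cast<std::uint32_t>(code[2]) << 8
        | static_cast<std::uint32_t>(code[3]) << 16
        | static_cast<std::uint32_t>(code[4]) << 24;
    const std::uint32_t returnAddress = callAddress + 5;
    // Unsigned wrap-around yields the same result as adding a signed value.
    return returnAddress + displacement;
}
```

For the listing above, the return address is 00401117 + 5 = 0040111C and the displacement FFFFFFEC is -0x14, so the target is 00401108, the first byte of func.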

1.2. Callstack

It is not very hard to see how the stack frame of a function can be used to get the caller address of this function. As I have mentioned before, *(ebp+4) is the return address of the function. This address is inside the body of the caller.

1.2.1. How to Get the Starting Address of the Caller

Approach 1: Prolog searching

We assumed that all functions are compiled with a prolog and an epilog. Given an address inside the function, we just have to search backwards for the byte sequence 55 8B EC. As you can see from the disassembly above, these are the opcodes of the prolog. Let's call them the "prolog signature."

Unfortunately, there is a problem. The same sequence of bytes could appear in an instruction encoding. For example, the instruction mov eax, EC8B55h has the following instruction encoding: B8 55 8B EC 00.

Obviously, when we are searching the prolog signature byte by byte we will find the signature somewhere inside the mov instruction.
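A minimal sketch of the byte-by-byte search, and of the false match just described, could look like this. It works on a byte buffer rather than on live code, and findPrologBackwards and kNotFound are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Returned when no prolog signature is found (hypothetical sentinel).
const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Scan backwards from offset `from` for the prolog signature 55 8B EC
// (push ebp / mov ebp, esp) and return the offset of the first match.
std::size_t findPrologBackwards(const unsigned char* code, std::size_t size,
                                std::size_t from) {
    assert(from < size);
    for (std::size_t i = from + 1; i-- > 0; ) {
        if (i + 2 < size &&
            code[i] == 0x55 && code[i + 1] == 0x8B && code[i + 2] == 0xEC) {
            return i;
        }
    }
    return kNotFound;
}
```

For the buffer 55 8B EC B8 55 8B EC 00 5D C3, a search that starts at the pop ebp instruction (offset 8) stops at offset 4, inside the immediate operand of the mov, instead of at the real prolog at offset 0.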

Note: In my experience, the method of searching the prolog signature in most cases works fine. To keep things simple, you can skip the next section.

Well, I don't have an elegant solution to the above described problem. For that reason, I am going to kill a mosquito with a nuclear bomb.

Approach 2: Backwards disassembling

Given an address inside the body of a function, it is quite enough to find the address of the previous instruction. This task cannot be solved without knowing the instruction format. We need a disassembler to do this. You can find in the source a function called FindAddressOfPrevInstruction:

I think this function is straightforward. Having this function, it is easy to find the prolog signature in a more precise way—you just have to disassemble backwards until you find the prolog signature.
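The idea can be sketched as follows, under strong assumptions. This is not the FindAddressOfPrevInstruction from the source: toyInstrLength is a hypothetical length decoder that knows only the few opcodes appearing in the listings above, whereas a real implementation needs a full x86 decoder. The search tries each candidate start before the known instruction boundary and accepts the first one whose decoded length ends exactly at that boundary.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kNone = static_cast<std::size_t>(-1);  // hypothetical sentinel

// Toy length decoder: handles ONLY the opcodes used in the listings above.
// Returns 0 for anything it does not know.
std::size_t toyInstrLength(const unsigned char* p, std::size_t avail) {
    if (avail == 0) return 0;
    switch (p[0]) {
        case 0x55: case 0x5D: case 0xC3:          // push ebp / pop ebp / ret
            return 1;
        case 0x6A:                                // push imm8
            return avail >= 2 ? 2 : 0;
        case 0xB8: case 0xE8:                     // mov eax, imm32 / call rel32
            return avail >= 5 ? 5 : 0;
        case 0x8B:                                // mov r32, r/m32
            if (avail < 2) return 0;
            if ((p[1] & 0xC0) == 0xC0) return 2;  // register to register
            if ((p[1] & 0xC7) == 0x45) return avail >= 3 ? 3 : 0;  // [ebp+disp8]
            return 0;
        case 0x83:                                // op r32, imm8 (register form)
            return (avail >= 3 && (p[1] & 0xC0) == 0xC0) ? 3 : 0;
        default:
            return 0;
    }
}

// Find the start of the instruction that ends exactly at offset `at`.
std::size_t findPrevInstruction(const unsigned char* code, std::size_t size,
                                std::size_t at) {
    const std::size_t kMaxLen = 15;  // longest possible x86 instruction
    for (std::size_t k = 1; k <= kMaxLen && k <= at; ++k) {
        if (toyInstrLength(code + (at - k), size - (at - k)) == k) return at - k;
    }
    return kNone;
}

// Step back one instruction at a time until an instruction STARTS with the
// prolog signature 55 8B EC. `at` must be a known instruction boundary,
// for example a return address taken from the stack.
std::size_t findPrologByDisassembly(const unsigned char* code, std::size_t size,
                                    std::size_t at) {
    assert(at <= size);
    for (std::size_t cur = at; cur != kNone;
         cur = findPrevInstruction(code, size, cur)) {
        if (cur + 3 <= size &&
            code[cur] == 0x55 && code[cur + 1] == 0x8B && code[cur + 2] == 0xEC) {
            return cur;
        }
    }
    return kNone;
}
```

Because the signature is tested only at instruction starts, the bytes 55 8B EC inside the immediate of mov eax, EC8B55h are skipped. Note that backward decoding of a variable-length instruction set can itself be ambiguous; the toy decoder hides this because it rejects most byte values.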

1.2.2. Tracing the Stack

As you can see from Figure 1, every function stack frame contains a pointer to the caller's frame that contains a pointer to its caller frame, and so on. In fact, we have a list of stack frames which can be used to find the callstack of every called function. But still there is an important question: When do we have to stop stack browsing, or, in other words, when does this list begin?

I think we can find the answer if we take a look at the process and thread starting routines, which reside in kernel32.dll. When Windows creates a process and its main thread, it performs an internal call to the CreateProcess API. CreateProcess, on the other hand, invokes an internal routine in kernel32.dll named BaseProcessStart. Here is the disassembly (under the condition that you have kernel32.pdb):

KERNEL32!BaseProcessStartThunk:
77e8d2e4  xor   ebp,ebp              ; Look here!
77e8d2e6  push  eax
77e8d2e7  push  0x0                  ; This can be interpreted as a return address

KERNEL32!BaseProcessStart:
77e8d2e9  push  ebp                  ; This is a normal stack frame. But the previous frame ptr is zeroed a few lines above
77e8d2ea  mov   ebp,esp
77e8d2ec  push  0xff
...snip...
77e8d323  call  dword ptr [ebp+0x8]  ; call the entry point of our process (for example, mainCRTStartup)
77e8d326  jmp   KERNEL32!BaseProcessStart+0x3d (77eb6624)

Similar things are happening in each call to the CreateThread API:

KERNEL32!BaseThreadStartThunk:
77e964cb              xor   ebp,ebp
77e964cd              push  ebx
77e964ce              push  eax
77e964cf              push  0x0

KERNEL32!BaseThreadStart:
77e964d1  55          push  ebp
77e964d2  8bec        mov   ebp,esp
77e964d4  6aff        push  0xff
...snip...
77e9651d  ff750c      push  dword ptr [ebp+0xc]  ; push the argument to ThreadFunc
77e96520  ff5508      call  dword ptr [ebp+0x8]  ; DWORD WINAPI ThreadFunc( LPVOID );
77e96523  50          push  eax
77e96524  e805000000  call  KERNEL32!ExitThread (77e9652e)
77e96529  e923f10000  jmp   KERNEL32!BaseThreadStart+0x81 (77ea5651)

Judging from the examples above, I think we can conclude that stack frame tracing can stop when either the return address or the saved ebp is zero.
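The frame walk with this stop condition can be sketched on simulated memory. The sketch below does not read a real thread stack: ToyMemory is a hypothetical address-to-word map standing in for it, walkFrames is a hypothetical name, and the maxDepth guard is an extra safety measure that the text does not mention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated 32-bit memory: address -> 32-bit word (hypothetical stand-in
// for reading the real stack of the current thread).
typedef std::map<std::uint32_t, std::uint32_t> ToyMemory;

// Follow the chain of saved frame pointers and collect the return addresses.
// Stops when the saved ebp or the return address is zero, when a frame is
// not readable, or when maxDepth frames have been collected.
std::vector<std::uint32_t> walkFrames(const ToyMemory& mem, std::uint32_t ebp,
                                      std::size_t maxDepth = 64) {
    assert(ebp % 4 == 0);  // frames are 4-byte aligned in this model
    std::vector<std::uint32_t> returnAddresses;
    while (ebp != 0 && returnAddresses.size() < maxDepth) {
        ToyMemory::const_iterator savedEbp = mem.find(ebp);      // *(ebp)
        ToyMemory::const_iterator retAddr = mem.find(ebp + 4);   // *(ebp+4)
        if (savedEbp == mem.end() || retAddr == mem.end()) break;
        if (retAddr->second == 0) break;  // frame built by the start-up thunk
        returnAddresses.push_back(retAddr->second);
        ebp = savedEbp->second;           // step to the caller's frame
    }
    return returnAddresses;
}
```

Each collected value is a return address inside a caller, so it still has to be mapped to the start of that caller with one of the approaches from section 1.2.1.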

1.3. "Called from" Functionality

Up to now, we know how to create a callstack list (a list of addresses of functions). I would like to say a few words about the following question: Is it possible to implement a method that will allow the following C++ functionality?

1.3.1 Do We Need Such a Functionality?

This functionality could be considered as a new type of C++ runtime information. Because it is not implemented in the C++ standard, it is probably useless.... In my opinion, this is a theoretical question and it should be the subject of a separate discussion.

1.3.2. How to Implement This Functionality

Because we know the call stack of foo, it may seem trivial to browse the caller addresses searching for the address of functionX. Unfortunately, things are not so simple. The main difficulty is how to get the address of functionX. functionX can be a class member (virtual or nonvirtual) function.

There is no way to get the "real" address of a function from inside a C++ program. By "real address," I mean the relative virtual address where the compiler placed the function body. In most cases, when you get a function address from inside a C++ program, it does not appear to be a real address, but rather an address in some virtual or thunking table.

On the other hand, it is definitely clear that callstack addresses are real addresses. So, what can we do in this case? The only simple solution I have found is to use function names instead of their addresses. This is because there are relatively easy ways to find the address of a function given its name. This will allow us to implement the above function as follows:

Now the question is how to find the address of a function given its name. There are two common ways to do this:

1. The development environment provides libraries that allow working with symbolic information.

For example, Microsoft provides the debug help library named DbgHelp (prior to Windows 2000, the library was known as the Image Help Library). The library contains functions for working with symbolic information; for example SymGetSymFromName, SymFromName, and so forth. As an example of using the DbgHelp library, here is the function getFuncInfo you can find in the source. As far as I know, Borland provides a similar library named Borland Debug Hook Library, that can be used to extract information from Borland debug symbol (.tds) files. Unfortunately, there is not much information on the Net about how to use this library. You can download it from here.

2. Working with .map files

Both the Microsoft VC++ and Borland C++ compilers are able to generate .map files. Generally speaking, .map files are text files that contain information about functions (and variables) in a module and their addresses. For example, here is a snippet of a .map file generated by the Borland C++ 5.02 compiler:

The important thing here is that the addresses are logical addresses. For example, 0001:00000639 means that the destructor StackDumper::~StackDumper() resides in the first section in the PE file at offset 0x639 in that section. In the source, you can find a simple function (written by Matt Pietrek) that converts linear addresses to logical addresses. Using this function, you can convert the addresses from the stack trace into logical addresses that can be found in the .map file. Having the logical address you just have to parse the .map file to find the function name.
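The conversion can be sketched as follows. This is not Matt Pietrek's function from the source but a simplified illustration: ToySection is a hypothetical stand-in for a PE section header, the image base and section layout are supplied by the caller, and parseLogical assumes that the address field of a .map line has exactly the form section:offset in hexadecimal, as in 0001:00000639.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical, simplified view of one PE section header.
struct ToySection {
    std::uint32_t rva;   // start of the section relative to the image base
    std::uint32_t size;  // size of the section in memory
};

// Logical address as written in a .map file: section:offset.
struct LogicalAddress {
    unsigned section;      // 1-based section index, 0 means "not found"
    std::uint32_t offset;  // offset inside that section
};

// Convert a linear (run-time) address into section:offset form.
LogicalAddress toLogical(std::uint32_t linear, std::uint32_t imageBase,
                         const std::vector<ToySection>& sections) {
    assert(!sections.empty());
    LogicalAddress result = {0, 0};
    if (linear < imageBase) return result;
    const std::uint32_t rva = linear - imageBase;
    for (std::size_t i = 0; i < sections.size(); ++i) {
        if (rva >= sections[i].rva && rva - sections[i].rva < sections[i].size) {
            result.section = static_cast<unsigned>(i + 1);
            result.offset = rva - sections[i].rva;
            break;
        }
    }
    return result;
}

// Parse the address field of a .map line, e.g. "0001:00000639".
bool parseLogical(const std::string& text, LogicalAddress& out) {
    const std::string::size_type colon = text.find(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == text.size())
        return false;
    char* end = 0;
    const unsigned long section = std::strtoul(text.c_str(), &end, 16);
    if (end != text.c_str() + colon) return false;
    const unsigned long offset = std::strtoul(text.c_str() + colon + 1, &end, 16);
    if (*end != '\0') return false;
    out.section = static_cast<unsigned>(section);
    out.offset = static_cast<std::uint32_t>(offset);
    return true;
}
```

With an assumed image base of 00400000 and a first section at RVA 1000, the linear address 00401639 converts to 0001:00000639, which can then be compared with the parsed entries of the .map file.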

In conclusion, I have to say that I don't have any idea about how to implement the IsCalledFrom functionality in your release builds, when you have neither debug information nor a .map file generated (or you don't want to distribute such information with your program).
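For builds where symbol information is available, the name-based check from section 1.3.2 can be sketched like this. The sketch assumes that the symbols have already been loaded into a table that maps function start addresses to names (from DbgHelp or from a .map file) and that the return addresses come from a stack walk; SymbolTable and this isCalledFrom signature are hypothetical and not taken from the source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Function start address -> function name (hypothetical representation of
// the information obtained from DbgHelp or a .map file).
typedef std::map<std::uint32_t, std::string> SymbolTable;

// True if any return address on the call stack lies inside the function
// called functionName. A return address is attributed to the symbol with
// the greatest start address that is not above it.
bool isCalledFrom(const std::string& functionName,
                  const std::vector<std::uint32_t>& returnAddresses,
                  const SymbolTable& symbols) {
    assert(!functionName.empty());
    for (std::size_t i = 0; i < returnAddresses.size(); ++i) {
        SymbolTable::const_iterator it = symbols.upper_bound(returnAddresses[i]);
        if (it == symbols.begin()) continue;  // below every known symbol
        --it;
        if (it->second == functionName) return true;
    }
    return false;
}
```

With func at 00401108 and main at 00401110, as in the listings of section 1.1, the return address 0040111C resolves to main, so isCalledFrom("main", ...) is true for a trace taken inside func.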

2. Exit Thunks

In this section, I will describe how to use the stack frame information in order to implement exit thunks. An exit thunk is a function that is invoked immediately after the ret instruction of the function for which the thunk is installed. Exit thunks can be used, for example, in profiling applications. Here they are implemented just as another example of how to use the stack frame information. The idea is very simple: to install an exit thunk, we just have to declare a local variable, StackDumper varName(true) (true means "use exit thunk"), in the body of the function for which we want to install the thunk. The destructor StackDumper::~StackDumper() first saves, in a static local variable in StackDumper, the original return address of the function (for example, foo) where the StackDumper is declared.

Afterwards, it replaces the return address of foo with the address of the ExitThunk. This causes the ret instruction of foo to pass the control to the beginning of the ExitThunk. (Note that the destructor is the right place to perform this replacement. If we replace the function return address in the constructor (for instance), subsequent calls to the DumpStack function will generate erroneous stack trace information.) This is not a common function call—the ret instruction has popped the return address from the stack (which is now the address of ExitThunk) and a jump is performed. So, if ExitThunk builds a standard stack frame, this frame will not contain a return address.
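The return address swap can be sketched on a simulated stack. Nothing here touches a real stack or uses inline assembler: ToyStack, ExitThunkState, kExitThunkAddress, and the three functions are hypothetical names, and the thunk address is an arbitrary made-up value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Simulated stack: address -> 32-bit word (hypothetical stand-in).
typedef std::map<std::uint32_t, std::uint32_t> ToyStack;

// Made-up address at which ExitThunk is assumed to live.
const std::uint32_t kExitThunkAddress = 0x00402000u;

// Plays the role of the static variable that keeps the real return address.
struct ExitThunkState {
    std::uint32_t savedReturnAddress;
    ExitThunkState() : savedReturnAddress(0) {}
};

// What the destructor does: remember the real return address of foo and
// overwrite the slot at *(ebp+4) of foo's frame with the thunk address.
void installExitThunk(ToyStack& stack, std::uint32_t ebpOfFoo,
                      ExitThunkState& state) {
    ToyStack::iterator slot = stack.find(ebpOfFoo + 4);
    assert(slot != stack.end());
    state.savedReturnAddress = slot->second;
    slot->second = kExitThunkAddress;
}

// What the ret of foo does: continue at whatever the return slot holds.
std::uint32_t simulateRet(const ToyStack& stack, std::uint32_t ebpOfFoo) {
    return stack.at(ebpOfFoo + 4);
}

// What the thunk does when it is finished: jump to the saved address.
std::uint32_t simulateExitThunk(const ExitThunkState& state) {
    return state.savedReturnAddress;
}
```

A single saved slot, as in this sketch, holds only one return address at a time, so nested or recursive installations would need a stack of saved addresses instead.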

Another problem is that ExitThunk has to be "invisible": it should not touch the registers, especially eax, where foo has placed its return value (if any). If ExitThunk is a standard function, it will have something like this at the beginning (MSVC, Debug):

You see that ExitThunk cannot be a normal function. This is why it must be declared naked. From MSDN: "For functions declared with the naked attribute, the compiler generates code without prolog and epilog code. You can use this feature to write your own prolog/epilog code using inline assembler code." Fortunately, the three compilers I have tested the examples with (BC++ 5.02, BCB6, MSVC6) support naked function calls. (For Borland C++ 5.02 users, this is probably a surprise. This was not documented. ;-))).

This is the solution of the above-mentioned problems. So, the implementation of ExitThunk could be the following:

Acknowledgements

I would like to thank mamaich for his help on disassembling issues!

Note about the source compilation. To compile inline assembler source code with the Borland C++ 5.02, you will need the Turbo Assembler (tasm32.exe). The tasm32.exe is not included in Borland C++ 5.02 distribution. If you have BC++ Builder, you will find the tasm32.exe in the bin folder.


Comments

Release build without pdbs (Symbol file) it is not working

Posted by Sang_kd2
on 10/23/2009 05:24pm

Hi Dimitrov,
Thanks for getting this article here.Its really gr8!.
When I try to build my application in release build where in pdbs are not generated - it doesnt work. IS their any alternative to this? bcos we can/should not ship our product with PDBs.

How debuggers do it?

Your article pointed out a very interesting thing that I never thought about before -- viz, stack frame does not directly give start address of the caller, only the return address inside the caller.

Debuggers (like in VC++) show "call stack". Do you think these use a similar technique -- backtracking using disassembly? Does VC++ use _penter for the purpose?

In case of instruction backtracking, I wonder how "GOTO" and jumps will affect the code. Also, VC++ debugger allows "Set Next Instruction" which essentially changes the instruction pointer, and can make backtracking mechanism fail.

I have not seen any such problems with VC++ debugger (though "code" can crash if a jump is made to a different function using "Set Next Instruction").
