Introduction

Every reasonably sized computer program contains bugs or shows unforeseen behaviour because of unexpected user input. The most annoying thing that can happen is that your program crashes (e.g., access violation, divide by zero, ...) or throws an unexpected (untested) exception while it runs outside your control (at the customer's site, with no debugger available). The next thing, of course, is that you need to find the cause of the problem.

In most Windows-based desktop applications, if the exception isn't caught in code, the OS catches it and presents the familiar popup window telling you that there was a problem and that Microsoft can be informed about it. If you set everything up correctly, you can obtain a crash dump for post mortem analysis.

Sometimes, for several reasons, people implement their own "catch and log" mechanism to track down problems. So did I, as I will explain next.

Background

I write software for embedded applications, quite often without direct user interaction: no screen, mouse, or keyboard. In these situations, you don't want code to run that displays a message on a screen and requires a user to press a button. I will present a mechanism that I developed for such situations and that helps you catch, trace, and analyze the problem.

The sample code is specifically targeted to the Microsoft C++ compiler and the x86 CPU architecture. The sample code runs both on desktop Windows and Windows Embedded CE. Windows Mobile is based on Windows Embedded CE, but as Windows Mobile is mainly targeted at the ARM architecture, the source code samples do not apply there, although with the proper ARM compiler knowledge the same mechanism can be used.

You might also wonder why I didn't use the Debug Help APIs. Well, they are not available for Windows Embedded CE...

The problem

I want to have full control over any exception that might happen. Therefore, every thread that I write is protected at the highest level with a try { } catch () block that needs to catch any exception.

In C++, you can use the try { } catch () statement to catch exceptions being thrown by your code. However, by default, it will not catch Structured Exception Handling (SEH) - or in other words, Windows OS specific - exceptions like an access violation or divide by zero.

Moreover, in situations where you throw your own exceptions and you haven't carefully designed your own classes derived from std::exception, or if you use third party libraries, you have no idea where the exception occurred in the code if you have no debugger attached. It would be quite helpful if at least you could trace the callstack from the moment the exception occurred until the moment you catch it and save it to a log file.
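As a minimal sketch of the top-level guard described in this section, the following portable C++ wraps a thread body in try { } catch. The names RunGuarded and ThreadBody are made up for this illustration and are not part of the sample code. Keep in mind that the catch (...) branch only sees SEH exceptions when the code is built with the Microsoft compiler and the /EHa option discussed later; on any other setup, it only catches C++ exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical type for illustration: the function a thread executes.
typedef void (*ThreadBody)();

// Hypothetical helper: runs a thread body under a top-level guard and
// returns a short description of what was caught ("" if nothing was thrown).
std::string RunGuarded(ThreadBody body)
{
    assert(body != NULL);
    try
    {
        body();
        return "";
    }
    catch (const std::exception& e)
    {
        // Standard C++ exceptions carry a description.
        return std::string("std::exception: ") + e.what();
    }
    catch (...)
    {
        // Everything else: non-standard C++ exceptions and, with MSVC and
        // /EHa only, SEH exceptions such as an access violation.
        return "unknown exception";
    }
}
```

In a real thread function, you would write the returned description to your log file instead of handing it back to the caller.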

The solution

For the first problem, the Microsoft compiler can help us. Although not popular with most programmers, I specify the Microsoft C++ compiler option /EHa instead of the default /EHsc. With /EHa, a catch (...) handler also catches SEH exceptions. In C++ terminology, this means that you can use try { } catch (...) statements to catch all exceptions, be it SEH or standard C++. It also eliminates the need for the non-standard __try { } __except() { } or __try { } __finally { } statements. Moreover, if you use the _set_se_translator() API carefully, you can even translate SEH exceptions into more meaningful std::exception based exceptions.

I do have to note, however, that mixing modules (libraries/DLLs) compiled with /EHa and modules compiled with /EHsc can give unexpected behaviour. So, always use one of the two mechanisms, not both at the same time.
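To make the _set_se_translator() remark a bit more concrete, here is a small sketch of what such a translator could look like. SehException, DescribeSehCode, TranslateSeh, and InstallSehTranslator are names I invented for this illustration; only _set_se_translator() and the numeric exception codes come from Windows. The Microsoft specific part is guarded by _MSC_VER and only has an effect when the module is compiled with /EHa.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#ifdef _MSC_VER
#include <eh.h>   // _set_se_translator
#endif

// Maps a few well-known SEH exception codes to readable text.
std::string DescribeSehCode(unsigned int code)
{
    switch (code)
    {
    case 0xC0000005u: return "access violation";        // EXCEPTION_ACCESS_VIOLATION
    case 0xC0000094u: return "integer divide by zero";  // EXCEPTION_INT_DIVIDE_BY_ZERO
    case 0xC00000FDu: return "stack overflow";          // EXCEPTION_STACK_OVERFLOW
    default:          return "unknown SEH code";
    }
}

// Hypothetical exception type that carries the original SEH code.
class SehException : public std::runtime_error
{
public:
    explicit SehException(unsigned int code)
        : std::runtime_error("SEH exception: " + DescribeSehCode(code)),
          m_code(code)
    {
    }

    unsigned int Code() const { return m_code; }

private:
    unsigned int m_code;
};

#ifdef _MSC_VER
// Called by the C++ runtime for every SEH exception (requires /EHa).
void __cdecl TranslateSeh(unsigned int code, struct _EXCEPTION_POINTERS*)
{
    throw SehException(code);
}

// The translator is kept per thread, so call this at the start of each thread.
void InstallSehTranslator()
{
    _set_se_translator(TranslateSeh);
}
#endif
```

With the translator installed, a catch (const std::exception&) handler at the top of the thread would receive an access violation as an SehException with a readable what() text.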

The solution for the second problem is more complex. If you throw your own std::exception objects, you can always provide a meaningful description of why the exception occurred and where in the code this happened (using predefined macros like __LINE__ and __FILE__). But what if you are not in control, or the exception description is something like "some stupid error occurred"? Or what if you catch the exception with a catch (...) statement?
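For the exceptions you do throw yourself, recording the location could look like the sketch below. MakeLocatedError and THROW_LOCATED are hypothetical names chosen for this example, and the "file(line): text" layout is just one possible format.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: builds an exception whose text starts with the
// source location, formatted as "file(line): description".
std::runtime_error MakeLocatedError(const char* file, int line,
                                    const std::string& what)
{
    assert(file != NULL);
    std::ostringstream text;
    text << file << "(" << line << "): " << what;
    return std::runtime_error(text.str());
}

// Hypothetical macro: throws an exception tagged with the current location.
#define THROW_LOCATED(what) throw MakeLocatedError(__FILE__, __LINE__, (what))
```

This only helps for code you own. It does nothing for exceptions thrown by third party libraries, which is where the call stack trace comes in.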

The explanation that follows will give you a mechanism to trace back the callstack to the point in code where the exception occurred.

Trace back the callstack

Although a try { } catch() statement and its corresponding behaviour (like automatic stack unwinding and object destruction) is a standard C++ mechanism, the C++ standard does not specify how this mechanism should be implemented by the compiler. The implementation therefore is compiler specific. What follows is Microsoft specific, and only tested with Visual Studio 2008, although most likely it will work for older versions as well.

The sample code implements four classes: C1, C2, C3, and C4 that each implement one method. C4::Test4() calls C3::Test3() that calls C2::Test2() that calls C1::Test1(). C1::Test1() causes either an access violation:

char* c = NULL;
*c = 'A';

or throws a std::exception():

throw std::exception("It failed");

depending on what you define in the source code. At the top level, C4::Test4() catches any exception with a catch (...) block.
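As a rough, portable approximation of this call chain (not the actual sample code), the sketch below uses the same class and method names. The return values are my own invention, and C1::Test1() only shows the throw variant, using std::runtime_error because the std::exception constructor that takes a string is a Microsoft extension.

```cpp
#include <cassert>
#include <stdexcept>

class C1
{
public:
    // Fails on purpose; the sample code can cause an access violation instead.
    long Test1() { throw std::runtime_error("It failed"); }
};

class C2
{
public:
    long Test2() { C1 c1; return c1.Test1() + 1; }
};

class C3
{
public:
    long Test3() { C2 c2; return c2.Test2() + 1; }
};

class C4
{
public:
    // Returns -1 (an arbitrary marker) when an exception reached the top level.
    long Test4()
    {
        try
        {
            C3 c3;
            return c3.Test3() + 1;
        }
        catch (...)
        {
            // This is the place where the call stack is traced and logged.
            return -1;
        }
    }
};
```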

Step by step

We will explain step by step what happens inside the catch(...) block:

First of all, we need to understand a bit about how the Microsoft compiler implements the stack unwinding mechanism. The secret is revealed in Slava Oks's weblog. When an exception occurs, the EBP CPU register is adjusted to a frame containing the catch block, but the ESP CPU register is not touched.

So, the first thing to do when we arrive in the catch block is to store the EBP and ESP registers:

BYTE* pStack;
BYTE* pFrame;
__asm
{
    mov pStack, esp
    mov pFrame, ebp
}

This already gives us an idea of the stack region that we have to search for the callstack information.

Next, if we look at the generated assembler code of a C++ function call, we see that the first instructions the compiler emits start a new stack frame: push ebp saves the caller's frame pointer on the stack, and mov ebp, esp makes EBP point to the new frame.

All function arguments and local variables are now accessed relative to EBP. We should also know that an assembler call instruction always implicitly pushes the return address onto the stack (pointed to by ESP).

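Summarized, our stack layout looks as follows, from high to low addresses: the function arguments, the return address at [ebp+4], the caller's saved EBP at [ebp], and below that the local variables.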

Putting all this knowledge together, we can use the EBP register that points to the current stack frame to walk and record the current call stack. [ebp+4] is the caller’s address, [[ebp]+4] is the caller’s caller, [[[ebp]]+4] is the caller’s caller’s caller, and so on.

If you look carefully at what we said before, we can find the return addresses, which point just past the call instructions, by taking the 4-byte values (8-byte on a 64-bit CPU) just above the frame pointers stored on the stack (the stack always grows from high to low addresses on the x86 architecture).
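The frame walk described above can be tried out without touching real registers. The sketch below runs on a simulated 32-bit stack, so it compiles anywhere; SimulatedStack and WalkFrames are names I made up, and the real sample code of course reads actual memory starting from the stored EBP value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simulated stack memory: words[i] is the 4-byte value at address base + 4*i.
struct SimulatedStack
{
    std::uint32_t base;                  // address of words[0] (lowest address)
    std::vector<std::uint32_t> words;

    bool Contains(std::uint32_t address) const
    {
        return address >= base &&
               (address - base) % 4 == 0 &&
               (address - base) / 4 < words.size();
    }

    std::uint32_t Read(std::uint32_t address) const
    {
        assert(Contains(address));
        return words[(address - base) / 4];
    }
};

// Follows the chain of saved EBP values, starting at ebp, and collects the
// return address stored at [frame + 4] of every frame.
std::vector<std::uint32_t> WalkFrames(const SimulatedStack& stack,
                                      std::uint32_t ebp)
{
    std::vector<std::uint32_t> returnAddresses;
    std::uint32_t frame = ebp;
    while (stack.Contains(frame) && stack.Contains(frame + 4))
    {
        returnAddresses.push_back(stack.Read(frame + 4));  // [ebp+4]
        const std::uint32_t caller = stack.Read(frame);    // [ebp]
        if (caller <= frame)
            break;  // callers live at higher addresses; the chain ends here
        frame = caller;
    }
    return returnAddresses;
}
```

The check that every saved frame pointer is higher than the current one guarantees that the loop terminates, even when the stack contents are garbage.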

Cool, isn't it? We have traced the complete callstack from the moment the exception occurred until the catch statement.

But there is more. If we could somehow find a way to translate these raw address values to meaningful C++ labels, this would make our debugging task even easier.

Before we delve into how we can accomplish this, I need to elaborate a bit more on how the C++ compiler works. Sometimes a call instruction is followed by a jmp instruction, especially when class methods are involved. We are not interested in the call instruction address, but rather in the jmp instruction address as the latter is related to our C++ code. This is why you see the code searching for both call and jmp instructions.
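One plausible way to read this step: take a return address, check whether the five bytes in front of it form a direct call (opcode E8 followed by a 32-bit relative offset), and if the call target itself is a jmp (opcode E9 with a 32-bit relative offset), follow that jump to the real function. The sketch below does this on a byte buffer instead of live code. CodeImage, DecodeRel32, and ResolveCallTarget are hypothetical names, and only these two instruction encodings are handled; an indirect call, for instance, is not recognized, and a byte that happens to be E8 can be mistaken for a call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A block of code bytes: bytes[i] is the byte at address base + i.
struct CodeImage
{
    std::uint32_t base;
    std::vector<std::uint8_t> bytes;

    bool Has(std::uint32_t address, std::size_t count) const
    {
        return address >= base &&
               static_cast<std::size_t>(address - base) + count <= bytes.size();
    }

    std::uint8_t At(std::uint32_t address) const
    {
        assert(Has(address, 1));
        return bytes[address - base];
    }
};

// Decodes "opcode rel32" at address. On success, target receives the
// destination, which is relative to the end of the 5-byte instruction.
bool DecodeRel32(const CodeImage& code, std::uint32_t address,
                 std::uint8_t opcode, std::uint32_t& target)
{
    if (!code.Has(address, 5) || code.At(address) != opcode)
        return false;
    std::uint32_t rel = 0;
    for (int i = 3; i >= 0; --i)            // little-endian 32-bit offset
        rel = (rel << 8) | code.At(address + 1 + i);
    target = address + 5 + rel;             // wraps modulo 2^32, like the CPU
    return true;
}

// Finds the function reached by the call that precedes returnAddress,
// following one jmp thunk if the call target starts with one.
bool ResolveCallTarget(const CodeImage& code, std::uint32_t returnAddress,
                       std::uint32_t& target)
{
    if (returnAddress < 5 ||
        !DecodeRel32(code, returnAddress - 5, 0xE8, target))  // call rel32
        return false;
    std::uint32_t jumpTarget = 0;
    if (DecodeRel32(code, target, 0xE9, jumpTarget))          // jmp rel32
        target = jumpTarget;
    return true;
}
```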

Last but not least, we output all this information to a log file called 'dump.txt'. Note that we use a specific format so that later on it will be easier to parse the log file and complete the raw addresses with more meaningful symbol names. We also add some information about all the modules (DLLs) that were loaded into memory when our program started. We need this information when we try to match a code address to a virtual address stored in the PDB file. Every virtual address assigned to a C++ function is relative to the module's base address (= where the DLL is loaded in memory).
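The conversion from an absolute code address to a module-relative address is simple arithmetic once the module list is known. The sketch below assumes a Module record with a name, base address, and size; this layout and the function name ToModuleOffset are my own and do not reflect the actual 'dump.txt' format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one loaded module (exe or DLL).
struct Module
{
    std::string name;
    std::uint32_t base;   // address where the module is loaded
    std::uint32_t size;   // size of the loaded image in bytes
};

// Finds the module that contains address and computes the offset of the
// address relative to that module's base. Returns false if no module matches.
bool ToModuleOffset(const std::vector<Module>& modules, std::uint32_t address,
                    std::string& moduleName, std::uint32_t& offset)
{
    for (std::size_t i = 0; i < modules.size(); ++i)
    {
        const Module& m = modules[i];
        if (address >= m.base && address - m.base < m.size)
        {
            moduleName = m.name;
            offset = address - m.base;
            return true;
        }
    }
    return false;
}
```

For example, with a module loaded at 0x00400000, the address 0x00401234 becomes offset 0x1234 in that module, and that offset is what gets compared against the addresses in the PDB file.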

PDB files and the DIA SDK

PDB files contain debug information about your executable code (exe or DLL). Everything the debugger needs during a debug session is stored in the PDB file (symbol names, address information, ...). As the format and contents of a PDB file might change between compiler versions, Microsoft created the Debug Information Access (DIA) SDK. This SDK offers a unified interface, stable across versions, to iterate over the PDB file information. I will not go into details; you can find other CodeProject articles about its use.

The DIA SDK itself is a COM DLL that you can find as part of the installation of Visual Studio. By default, it is not registered, so you need to register it yourself.

I created a C# console program - named DiaTool (also included in the sample code) - that needs:

a list of PDB files used in your program, stored in DiaInput.txt

at least one 'dump.txt' file to analyze

The DiaTool will try to match all reported function addresses with the virtual addresses stored in the different PDB files. The result will be saved in 'dump.txt.anl'.
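The matching step boils down to "find the last symbol that starts at or before this address". The DiaTool itself is written in C# and gets its symbols from the DIA SDK; the C++ sketch below only illustrates the lookup on an in-memory table. Symbol and FindSymbol are hypothetical names, and since the table stores no symbol lengths, an address far beyond the last symbol still matches that symbol.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record: a function name and its module-relative start address.
struct Symbol
{
    std::uint32_t rva;
    std::string name;
};

// Returns the symbol with the highest start address that is <= rva, or
// nullptr if rva lies before the first symbol. The table must be sorted by
// ascending rva.
const Symbol* FindSymbol(const std::vector<Symbol>& sortedSymbols,
                         std::uint32_t rva)
{
    auto byRva = [](const Symbol& a, const Symbol& b) { return a.rva < b.rva; };
    assert(std::is_sorted(sortedSymbols.begin(), sortedSymbols.end(), byRva));

    // First symbol that starts after rva; the one before it contains rva.
    auto startsAfter = [](std::uint32_t value, const Symbol& s) {
        return value < s.rva;
    };
    auto next = std::upper_bound(sortedSymbols.begin(), sortedSymbols.end(),
                                 rva, startsAfter);
    if (next == sortedSymbols.begin())
        return nullptr;
    return &*(next - 1);
}
```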

When you compile your own executable, the corresponding PDB file that is created for you contains all the symbols of your program. However, there is more than your own code. A C++ program calls into the CRT runtime library. When the CRT library is referenced as a DLL, you need the PDB file for the CRT runtime as well. Also, during the exception, the generated code calls into kernel code, so you need the PDB files for kernel32.dll and ntdll.dll. If you use other DLLs (COM, system DLLs), you need the corresponding PDB files too. You can download the Microsoft specific PDBs from the Microsoft website here. Or, specify the following information in Visual Studio | Options | Debug, and run your program once with the debugger attached.

When you debug your program, with the Symbol file locations filled in, Visual Studio will obtain the PDB files from the Microsoft symbol server for you automatically and store them in the 'D:\VisualStudioCache' folder.

Using the samples

Callstack.exe

You can run the sample code in several flavours. You can specify the following defines in stdafx.h:

CATCH_LOWER_LEVELS: You can choose to include try { } catch () statements in the lower levels in C1::Test1(), C2::Test2(), and C3::Test3(), and experience the difference, if any.

DONT_SWALLOW: If defined, the exception will not be 'swallowed' at the highest catch level in C4::Test4(); instead, it will be thrown outside the current thread context. As such, the OS or Visual Studio Debugger will deal with it.

Here is an example of what will be output to 'dump.txt'.

The 'CallStack' source code includes more functionality than I have described here. I leave it up to you to explore it further and experiment with it.

DiaTool.exe

The DiaTool accepts a few commandline parameters. Run 'DiaTool.exe /?' to output a list of all options, or check the source code. A 'DiaSample.bat' file is included to show how it can be used. After processing 'dump.txt', it will output 'dump.txt.anl' as shown in the following picture:

'_RtlDispatchException@8' is where the exception code starts, so the line logged above this line is where the exception occurred. This is according to the sample in 'callstacktest.exe : public: long __thiscall C1::Test1(void)'. Indeed, this is where we caused - on purpose - an access violation.
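If you want to automate this last observation, a small helper can pick the entry logged just before the exception dispatcher. The sketch below works on a list of symbol names that were already extracted from the analyzed dump, in the order they were logged. FindExceptionOrigin is a hypothetical name, and the parsing of 'dump.txt.anl' itself is left out because it depends on the file format.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the index of the entry logged just before the first entry whose
// name contains "RtlDispatchException", or -1 if there is no such entry.
int FindExceptionOrigin(const std::vector<std::string>& symbolNames)
{
    for (std::size_t i = 0; i < symbolNames.size(); ++i)
    {
        if (symbolNames[i].find("RtlDispatchException") != std::string::npos)
            return i == 0 ? -1 : static_cast<int>(i) - 1;
    }
    return -1;
}
```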

There is even more information to be found in the PDB files. Explore it yourself!

Good to know

It is possible that the compiler, depending on the optimization options you have selected, eliminates stack frames. When you compile for speed (typically in a 'Release' build), the compiler might decide that it can save instructions by using the EBP register for other purposes. Don't be surprised if you see 'missing' call stack frames. They are not really missing; the compiler simply optimized them away. Check out the '/Oy' (frame pointer omission) compiler option for more information. See also the references at the end of this article.

Conclusion

I presented code that can reconstruct the C++ callstack from the point where an exception occurred in your code until the point where it is caught by your code. In fact, it can even replace part of the debugger functionality at runtime.

The source code is written for and tested on 32 bit desktop Windows (XP and Vista) and 32 bit Windows Embedded CE 6.0. However, with a few changes, it can compile and work for 64 bit code too.

I decided to put the DIA functionality in a separate DiaTool that needs to be run afterwards on the dumped information, because I use this code for my embedded C++ applications running on Windows CE x86 hardware, and the DIA COM DLL does not run under Windows CE. If you target only desktop Win32 x86 applications, you can integrate the DIA functionality directly in your C++ analyzing code. This eliminates the need for a separate post mortem DiaTool to parse 'dump.txt' into 'dump.txt.anl'.

ARM is a completely different processor architecture than Intel x86. The mechanism described in the article only applies to x86, therefore it can never work as-is for ARM. That said, this doesn't mean you can't do something similar (walking the callstack) for ARM too, but this goes beyond my knowledge.
You can however try to figure out how the ARM compiler keeps track of the callstack by looking at the GetThreadCallStack() API under Windows CE. It is available for all processor architectures and can be used to obtain the callstack (if frame pointers are available on the stack) of a running thread. If you have PlatformBuilder (CE5 or CE6) installed, you can find the source code of this API in the WINCExxx source directory and see how they (MS) do it. Note however that this API cannot work when you are in the process of an exception trace; the WINCExxx source code explicitly states that it removes all exception callstack info.
Another good starting point can be http://msdn.microsoft.com/en-us/library/ms254220(VS.80).aspx. According to what is mentioned there, you need to look at what is done with the R11 register during function prolog and epilog code. This is what I did for x86. R11 is the frame pointer and can be compared to what EBP is for x86.
Also a good starting point is http://carbidehelp.nokia.com/help/topic/com.nokia.carbide.cpp.debug.crashdebugger/html/DebuggingInformation/CrashDebuggerCallStack.guide05.html