This is Part 1 of a series covering techniques for debugging applications on Windows, focused on application crashes.

Introduction

This article explains, in a simple way, how to debug application crashes in Windows applications.
The scope is limited to user-mode debugging, and it covers only basic debugging with WinDbg and procdump.

Pre-Requisites

Background

While using or working on Windows applications, we have all seen an application stop working for no apparent reason.
The dialog that appears, which we have all seen, is somewhat similar to this.

When we see this, we generally select the option "Close Program" and then try to launch the application again.
If the crash repeats and it is a third-party application, we report the issue and wait for a solution.

Now, we will move to the other side of the coin: the team that has to analyze this issue and deliver a solution
as soon as possible, because the crash has stopped production at the customer site.
Let's go step by step through why exactly the application crashed and how we can solve it.

Definition

An application crash is an unexpected situation which stops the normal functioning of the program. Let's consider the following source code for example:

When we execute this sample, we get the same application crash dialog as shown above.
The reason for this crash is the statement "*p=10", which assigns a value through a pointer that was never given valid memory; in other words, it writes through a NULL pointer.
We can say this because we have the code and it is small enough to spot the source of the problem. Identifying such an issue in millions of lines of code is not easy, and fixing it is harder still. So we need a technique that takes us to the precise root cause of the issue (or at least close to it) without digging through the entire code.
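
As a minimal sketch of the faulty pattern described here (the function name StoreTen is hypothetical and is not taken from the original sample), the problem comes down to a write through a pointer that nobody checked:

```cpp
// Hypothetical helper illustrating the faulty pattern: it writes through
// whatever pointer it receives without checking that the pointer is valid.
void StoreTen(int* p)
{
    // If p is NULL, this write is invalid. On Windows it typically ends in
    // an access violation (exception code 0xC0000005).
    *p = 10;
}
```

Calling StoreTen with a NULL pointer from main would be expected to reproduce the kind of crash discussed in this article, while calling it with the address of a real int simply stores the value.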

Debugging Techniques

There are many different techniques used to identify why an app crashes, but some things remain common across different techniques.

Step 1: Identify the Faulty Module

The faulty module can be identified using the Event Viewer. Consider our current example, AppCrash.exe: once it has crashed, it will have generated an event in the Event Viewer. Go to "Run" and type "eventvwr":

Have a look at the text in the General tab. There are two interesting points in it:

Faulting Application Name: Indicates the application which is faulty. In this case, it is AppCrash.exe.

Faulting Module Name: Indicates which module in this application or executable has misbehaved. In this case (again), it is AppCrash.exe.

This makes it clear that the issue resides in AppCrash.exe. If the faulting module had been, for example, "AppCrashLib.DLL", then that would have been the culprit and we would have had to debug that.

Another important point is the Exception Code, which explains what exactly the error means. In the current case, the exception code is 0xC0000005, which means Access Violation: the application tried to access an invalid memory location. To get the list of all the exception codes, please refer to the link below:
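
To make the idea of reading an exception code concrete, here is a small sketch that maps a few well-known codes to readable names. The function name DescribeExceptionCode and the selection of codes are illustrative assumptions, not a complete list:

```cpp
#include <cstdint>
#include <string>

// Returns a readable name for a few common Windows exception codes, such as
// the one shown in the Event Viewer entry. Illustrative only: the function
// name is hypothetical and the table is far from complete.
std::string DescribeExceptionCode(std::uint32_t code)
{
    switch (code) {
    case 0xC0000005u: return "Access violation";
    case 0xC00000FDu: return "Stack overflow";
    case 0xC0000094u: return "Integer divide by zero";
    case 0xC000001Du: return "Illegal instruction";
    case 0x80000003u: return "Breakpoint";
    default:          return "Unknown (look up the status code)";
    }
}
```

For the crash in this article, DescribeExceptionCode(0xC0000005) returns "Access violation".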

Step 2: Collect the Crash Dump

A crash dump contains the working state of the program at the moment it terminated abnormally. A full dump also captures the complete memory of the process, which can be used for analyzing the problem. The simplest way to take a crash dump is procdump. procdump has to be set up before the application crashes:

procdump -ma -x c:\dumps "E:\Study\Windows Internals\Training\Sample Code\AppCrash\x64\Release\AppCrash.exe"

This is one of the most basic examples of procdump; more options can be explored. Here, -x launches the process and names the dump folder, and -ma requests a full memory dump, so the dump is saved to c:\dumps when the application crashes. If no dump is produced on the crash, procdump's -e option (write a dump on an unhandled exception) is worth adding.

Step 3: Analyze the Crash Dump

Now that we have the dump, we need to analyze it. The best tool for analyzing the dump is WinDbg.
WinDbg is the father of all the debugging tools available on Windows (as of the writing of this article). We will not get into the intricacies of WinDbg, as that is out of the scope of this article; we will concentrate only on how to analyze dumps with it.
To start analyzing the dump, we need the pdb files corresponding to the version of the executable that crashed.
A pdb (program database) file contains all the debugging information required for debugging an application.
The only constraint is that the pdb and the executable must come from the same build, with matching timestamps (the debugger checks a signature stamped into both files, not just the file names); otherwise the symbols do not match and we cannot analyze the dump.

In the next step, we launch WinDbg and configure the path to the pdb files (File > Symbol File Path, or the .sympath command) as shown below:

It will show the below screen after the dump file has been loaded successfully:

Just go to the command window and type "!analyze -v" like below:

After typing the above command, we get the below output:

Now, we need to concentrate on different parameters to identify the issue.
If we look at the stack trace, it says the crash happened in AppCrash.exe, in function main at an offset of 0x39. On its own, this does not give us the exact source line which may have caused the problem.

Let's check what the statement below says: AppCrash!main+39 [e:\study\windows internals\training\sample code\appcrash\appcrash\source.cpp @ 9]. This gives us the source file and line where the crash happened, and the lines below it give more details:
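
To show how such a frame line breaks down into its parts (module, function, offset, source file and line), here is a small parsing sketch. The type FrameInfo and the function ParseFrame are hypothetical names, and the sketch assumes the frame has exactly the shape quoted above, with a hexadecimal offset and no spaces or '+' in the function name:

```cpp
#include <regex>
#include <string>

// Pieces of one stack frame as printed when private symbols are loaded.
struct FrameInfo {
    std::string module;      // e.g. "AppCrash"
    std::string function;    // e.g. "main"
    unsigned long offset;    // byte offset into the function (printed in hex)
    std::string sourceFile;  // source path recorded in the pdb
    int line;                // source line number
};

// Parses text of the form  Module!Function+Offset [path @ line].
// Returns false if the text does not have that shape.
bool ParseFrame(const std::string& text, FrameInfo& out)
{
    static const std::regex pattern(
        R"(^([^! ]+)!([^+ ]+)\+(?:0x)?([0-9A-Fa-f]+) \[(.+) @ (\d+)\]$)");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) {
        return false;
    }
    out.module = m[1].str();
    out.function = m[2].str();
    out.offset = std::stoul(m[3].str(), nullptr, 16); // offset is hexadecimal
    out.sourceFile = m[4].str();
    out.line = std::stoi(m[5].str());
    return true;
}
```

For the frame quoted above, this yields module "AppCrash", function "main", offset 0x39, and line 9 in source.cpp.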

In the above analysis, the crash actually happened at line number 8, but WinDbg points to line number 9. This is due to the optimizations enabled during compilation. So the exact line with the issue is line number 8.
Since a value is being assigned through the NULL pointer, the program tried to write to a location which does not exist.

Step 4: Fix the Issue and Release

Since we know the issue, we can now allocate the memory for the pointer and then assign the value. So the new code would be:
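
A minimal sketch of such a fix, assuming the sample simply wrote 10 through the pointer. The function name StoreTenFixed is hypothetical, and std::unique_ptr is used here so the memory is released automatically; a plain new/delete pair would illustrate the same point:

```cpp
#include <memory>

// Allocates storage for the int first, then writes to it.
// Returns the stored value so the caller can confirm the write succeeded.
int StoreTenFixed()
{
    std::unique_ptr<int> p(new int(0)); // p now points at valid memory
    *p = 10;                            // the write that previously crashed
    return *p;
}
```

The important change is only the order of operations: the pointer refers to allocated memory before anything is written through it.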

Optimizations

We discussed that due to optimizations being set, we were not able to get the exact point where the crash is happening. Let's discuss optimizations some more.

The optimization setting tells the compiler how aggressively to optimize. As we move up the levels, for example to "Full Optimization", the binary gets smaller or faster, but the generated instructions map less exactly onto source lines, so the debugging information in the pdb file is less precise. As we move down the levels, for example to "Disable Optimization", we get more precise debugging information and a larger binary and pdb. Similarly, if we build the binary in debug mode, we get more debugging information and a larger binary.

We see that, overall, there are four options available to be configured (in Visual Studio: Disabled /Od, Minimize Size /O1, Maximize Speed /O2, and Full Optimization /Ox). Normally, the option selected in most projects is "Maximize Speed," which is enough for debugging the crashes reported by customers. In the above example, if we disable the optimizations, we get the following result.

So here, we see that it points exactly to the position of the problem, i.e., *p=10.
This happens because the debugging information is sufficient to identify the root cause of the issue.
So, as a rule of thumb, when we make a release, we should keep the pdb files
so that they can be used to analyze crash dumps from the customer site.

If the issue can be reproduced locally, it is recommended to disable optimization, rebuild the EXE,
collect fresh dumps, and analyze those to make life easier.
Debug mode is not advisable, since there are a lot of issues which do not occur in debug mode.

pdb Files

For any unmanaged code that is built, pdb files are created along with the EXE files. These pdb files contain the debugging information necessary for debugging any issue. Such a file is also known as a symbol file. A symbol file contains different symbols which are useful for debugging, such as local variables, global variables, function names, and source line numbers. Each such piece of information is known as a symbol. There are two types of symbols: public symbols and private symbols.

Public symbols contain much less information than private symbols. They contain only the information that is visible across different files, which means that local variables are not part of public symbols. Most function names in public symbols are also decorated.

Debugging with private symbols even gives the line number where the problem is (as in the example above); this is not the case with public symbols.

Many companies maintain two symbol servers: a private one for internal use and a public one for external distribution.

By default, a Visual Studio build generates private symbols. To also produce a pdb containing only public symbols, add the /PDBSTRIPPED option under the linker section. Follow this link for more details.

Summary

This was a very simple and straightforward case to debug. Real issues are usually much more complicated.
Such complications include multiple modules, multiple threads, and misleading stack traces which need to be analyzed carefully.
We have covered only a very basic scenario; there is a lot more to explore.


About the Author

I have been doing software development for 8 years. I am crazy about coding, designing, and architecting systems. I love programming in C/C++; that's my hot favourite! In my free time I like to travel a lot, see new places, meet new people, and explore the world.

Check whether the symbols are loaded correctly using the command !lmi <modulename>. For example, if the module name is app.exe, the command would be !lmi app.exe. This displays the path from which the symbol file was loaded.

If you feel it is not the correct symbol path, change the path from the WinDbg UI and run the command .reload -f -i <modulename>, e.g., .reload -f -i app.exe.

After that, analyze the dump using !analyze -v and see the details. If you do not find the root cause, do send me the output of WinDbg.

On which version of Windows are you seeing this? The dialog is displayed by the OS and is entirely under OS control. On Windows 7 the dialog is a little different from Windows 8; the one in the article is from Windows 8. The absence of the Debug button in the dialog is not related to pdb files.

Great article series, but could you perhaps insert forward/backward links to the other parts, or just a list of links to all parts in some place? I realize this is tricky when you just release the first article, but you could add them later on. [Edit: forget it, just spotted it!]

GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)

As I wrote, I spotted the forward link in the summary afterwards. But having all the links at the start is even better. Nice work!

I just did take the time to actually read part 1, too. We've only recently introduced crash reports in our application, and I still need to wrap my head around it.

Good explanations, my 5. I really can't think of anything to improve (for now).

Going to read the follow-ups now ...
