Wednesday, April 8, 2009

Finding the Hard to Reproduce Errors

I used to do a ton of contract work. One of the most common reasons I was called out to a company was that their users were experiencing an exception, usually an access violation, which they could not reproduce in the development environment.

I have been fairly busy and really don't have time for side work. But I had a request to share what I do in this situation, so the company could go about fixing the problem without me. Instead of just replying by email with my methods of madness, I decided to share them with everyone.

First off, the only way you're going to get useful information is with good logging. There are a variety of tools out there to help, but I have found that a good old text file is typically the easiest method. Products like SmartInspect and CodeSite offer quite a bit more and may be worth investing in; I will let you evaluate them and decide.

The problem with logging is that the information is only as good as what you supply. You don't know where the error is going to occur, and you really don't need a log file endlessly filling up with useless information. So how do you know where to log? My favorite place to log is on screen changes. That way I know the order in which the screens were visited, and sometimes the key values each screen needs. I also add verbose logging on a temporary basis for hard-to-find problems.
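To make the idea concrete, here is a minimal sketch of a plain-text log that always records screen changes (with any key values) and only writes detail lines when a verbose flag is switched on. It is written in Python for brevity; the names `ScreenLog`, `screen_changed`, `detail` and `verbose` are hypothetical and not from any logging product.

```python
import datetime
import io


class ScreenLog:
    """Plain-text log: screen changes always, detail lines only when verbose."""

    def __init__(self, stream, verbose=False):
        self.stream = stream      # any writable text stream, e.g. an open file
        self.verbose = verbose    # switch on temporarily while hunting a problem

    def _write(self, kind, message):
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        self.stream.write(f"{stamp} [{kind}] {message}\n")
        self.stream.flush()       # flush so the last lines survive a crash

    def screen_changed(self, screen, **key_values):
        # Record which screen was opened plus any key values it depends on.
        details = " ".join(f"{k}={v}" for k, v in key_values.items())
        self._write("SCREEN", f"{screen} {details}".rstrip())

    def detail(self, message):
        # Verbose-only entries: cheap to leave in the code, silent by default.
        if self.verbose:
            self._write("DETAIL", message)


# Usage: an in-memory stream here; in practice something like open("app.log", "a").
buffer = io.StringIO()
log = ScreenLog(buffer)
log.screen_changed("CustomerList")
log.screen_changed("CustomerEdit", customer_id=42)
log.detail("not written unless verbose=True")
```

Reading such a log top to bottom gives you the path the user took through the application right up to the failure, without the noise of logging every method call.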

But hey! The odds are I have told you nothing that you did not know already.

I have found that many people are not familiar with logging the call stack when an exception occurs. In .NET, getting the exception's call stack is easy: just read the StackTrace property of the exception. Once you have it, write it out to your log file using your method of choice.
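For comparison, here is the same pattern sketched in Python rather than .NET: the standard `traceback` module plays the role of the `StackTrace` property. The function names `log_exception` and `parse_order` are hypothetical, used only to produce a small call stack.

```python
import io
import traceback


def log_exception(stream, exc):
    # format_exception returns the "Traceback ..." text as a list of strings,
    # including the file name, line number and function name of every frame.
    text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    stream.write("EXCEPTION " + text)


def parse_order(raw):
    return int(raw)  # raises ValueError for input that is not a number


log = io.StringIO()
try:
    parse_order("abc")
except ValueError as exc:
    log_exception(log, exc)
```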

With native code, however, it takes quite a bit more work, because method names and line numbers are not part of the compiled code. In Delphi, that information is in the MAP file produced by the linker. The key is getting at this information at the time of the exception and logging the details you need.
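To see why the MAP file matters: at the moment of the exception all you really have is a code address. Turning it into a name means finding the last procedure that starts at or below that address, and the last line-number entry at or below it. The Python sketch below models that lookup with a binary search over a tiny hand-written table. The table contents are made up, and this is a simplified model, not the real MAP file format or the JCL's parser.

```python
import bisect

# Hypothetical, already-parsed MAP data: (start offset, procedure), sorted by offset.
PROCEDURES = [
    (0x1000, "Main.TMainForm.FormCreate"),
    (0x1080, "Main.TMainForm.SaveClick"),
    (0x1140, "Orders.TOrder.Save"),
]
# Hypothetical line-number entries: (offset, source line), sorted by offset.
LINES = [(0x1080, 120), (0x1094, 121), (0x10B0, 123), (0x1140, 40), (0x1158, 44)]


def _floor(table, address):
    # Index of the last entry whose offset is <= address, or -1 if there is none.
    offsets = [entry[0] for entry in table]
    return bisect.bisect_right(offsets, address) - 1


def resolve(address):
    proc = _floor(PROCEDURES, address)
    if proc < 0:
        return f"${address:X} (unknown)"
    name = PROCEDURES[proc][1]
    line = _floor(LINES, address)
    # Only report a line number if that entry belongs to the same procedure.
    if line >= 0 and LINES[line][0] >= PROCEDURES[proc][0]:
        return f"{name} (Line {LINES[line][1]})"
    return name


print(resolve(0x10A2))  # Main.TMainForm.SaveClick (Line 121)
```

In a running process the raw address would first have to be adjusted for where the module was loaded in memory; the sketch skips that step and works with offsets directly.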

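There are two commercial products out there that do just that and more. There is also a free, open-source option, the JEDI Code Library (JCL), whose JclDebug unit provides call stacks with unit, method and line information; it is the one I use and describe below.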

After installing the JCL, open and look at the demo applications in the "jcl\examples\jclDebugExamples.bdsgroup" project group. They offer a fairly good look at the possibilities.

Ultimately the question is: how do I plug JCL Debug into my application and make it useful?

Here are the simple steps to do this.

Download and install the JCL.

Open "jcl\experts\debug\tools\MakeJclDbg.dpr" and compile it. It may already have been compiled during installation.

Open "jcl\experts\debug\dialog\CreateStdDialogs.dpr".

Look at the "Params." settings around line 80 and set the parameters you want.

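Run the application. ExceptDlg.pas/dfm and ExceptDlgMail.pas/dfm will be generated.

Add the generated ExceptDlg unit to your project.

In your project's linker options, set the MAP file to Detailed and build the project.

Run MakeJclDbg on the resulting MAP file to insert the JCL debug data into your executable.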

You are good to go! Unhandled exceptions in your application will now be reported by this new dialog. Since the dialog's source code is in your source directory, you can change it to meet your specific needs or layout.
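The point of the generated dialog is that unhandled exceptions get reported with their call stack instead of a bare error message. As a loose analogy only (this is not how ExceptDlg is implemented), Python exposes a global hook for unhandled exceptions, `sys.excepthook`; the sketch below installs one that writes the full stack to a log before falling back to the normal report. `install_crash_logger` is a hypothetical name.

```python
import io
import sys
import traceback


def install_crash_logger(stream):
    """Log every unhandled exception with its stack, then keep the default report."""
    previous = sys.excepthook

    def hook(exc_type, exc, tb):
        stream.write("UNHANDLED " + "".join(traceback.format_exception(exc_type, exc, tb)))
        stream.flush()
        previous(exc_type, exc, tb)  # still show the normal report as well

    sys.excepthook = hook
    return previous  # lets the caller restore the original hook later


crash_log = io.StringIO()
original_hook = install_crash_logger(crash_log)
```

Chaining to the previous hook keeps whatever reporting was already in place, which mirrors the idea of replacing only the presentation of the error while still capturing everything you need.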

In the last few steps, I showed you my preferred method of handling symbol information, but the JCL supports several others.

For example, there is an IDE expert you can install that does the same thing without dropping to the command prompt. Since I never deploy an application compiled in the IDE and build my releases with FinalBuilder instead, calling MakeJclDbg as a build step seems second nature.

One key point is that symbol information can be stored in several different ways, and the JCL searches for it in the following order:

JCL Debug data in the executable file (what MakeJclDbg and the expert produce)

JDBG file

Borland TD32 symbols

MAP file

Library or Borland package exports
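That search order is essentially a first-match fallback chain. The Python sketch below models it: each source is a lookup that returns a symbol name or None, and the first source that knows the address wins. The sources and their contents are hypothetical stand-ins, not JCL code, and they use exact-address dictionaries to stay short where a real lookup would find the enclosing procedure.

```python
from typing import Callable, Dict, List, Optional, Tuple

Lookup = Callable[[int], Optional[str]]


def make_source(table: Dict[int, str]) -> Lookup:
    # Hypothetical stand-in for one symbol source (embedded data, .jdbg file, ...).
    return table.get


def locate(address: int, sources: List[Tuple[str, Lookup]]) -> Optional[str]:
    # Try each source in priority order and return the first name found.
    for source_name, lookup in sources:
        symbol = lookup(address)
        if symbol is not None:
            return f"{symbol} [{source_name}]"
    return None


# Same priority order as the list above; the contents are made up.
SOURCES = [
    ("embedded JCL debug data", make_source({})),
    ("JDBG file", make_source({0x1080: "Main.TMainForm.SaveClick"})),
    ("TD32 symbols", make_source({})),
    ("MAP file", make_source({0x1080: "SaveClick", 0x2000: "Util.Trim"})),
    ("exports", make_source({0x3000: "ExportedProc"})),
]
```

Because the richer sources come first, an address known to both the JDBG file and the MAP file is reported from the JDBG file.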

There is a good document on these options in "\jcl\experts\debug\HowTo.txt".

Now, the most surprising question I have heard over and over after setting this up is: what do I do with this information?

Well, you now have the call stack from the moment the exception occurred, and you can also see the line the exception occurred on. Often, looking at the line of code where the error occurred shows that the method was called incorrectly, and the call stack shows you the line of code that called it. This typically gives you enough of a clue to resolve the problem, or at the very least tells you where to add more logging to figure it out.
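If the stack ends up in your log as text, the first two frames are usually the ones to read first: the top frame is where the exception surfaced and the next one is its caller. Here is a small Python sketch that pulls those out. The frame format in the sample (most recent frame first, with an optional line number and unit name) is an assumption for illustration, so adjust the regular expression to whatever your log actually contains.

```python
import re
from typing import NamedTuple, Optional

# Assumed frame format: [ADDRESS] Unit.Class.Method (Line N, "Unit.pas")
FRAME = re.compile(
    r'\[(?P<addr>[0-9A-F]+)\]\s+(?P<proc>[\w.]+)'
    r'(?:\s+\(Line (?P<line>\d+), "(?P<unit>[^"]+)"\))?'
)


class Frame(NamedTuple):
    proc: str
    unit: Optional[str]
    line: Optional[int]


def parse_frames(text):
    # One Frame per recognizable line; lines without debug info get None fields.
    frames = []
    for raw in text.splitlines():
        m = FRAME.search(raw)
        if m:
            line = int(m["line"]) if m["line"] else None
            frames.append(Frame(m["proc"], m["unit"], line))
    return frames


def fault_and_caller(text):
    # Assumes the most recent frame is listed first.
    frames = parse_frames(text)
    fault = frames[0] if frames else None
    caller = frames[1] if len(frames) > 1 else None
    return fault, caller


STACK = """\
[00451A2C] Orders.TOrder.Save (Line 44, "Orders.pas")
[0045229B] Main.TMainForm.SaveClick (Line 121, "Main.pas")
[0042F0E1] Controls.TControl.Click
"""
fault, caller = fault_and_caller(STACK)
```

With this sample, `fault` points at line 44 of Orders.pas and `caller` at line 121 of Main.pas: the two places to look first, and the first candidates for extra logging.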