Debugging techniques

There are a thousand different debugging techniques. The following are the ones I have gotten the most mileage out of over the years:

Dump Routines

When you define a data structure, take a few minutes to write a routine that dumps the contents of the structure in human-readable form to stdout or stderr (or, when debugging Windows® software, to OutputDebugString(), as described below). This may seem like a waste of time, but it makes tracking down dynamic structure allocation problems much easier.
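As an illustration, here is a minimal dump routine in C for a hypothetical growable integer array (struct int_vec and int_vec_dump() are invented for this sketch, not part of any library). It takes a FILE * so the same routine can write to stdout, stderr, or a trace file, and it flags an obviously inconsistent structure instead of blindly walking off the end of it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical structure for illustration: a growable array of ints. */
struct int_vec {
    int    *items;  /* heap block holding `cap` ints */
    size_t  len;    /* elements in use */
    size_t  cap;    /* elements allocated */
};

/* Print `v` in human-readable form to `out` (stdout, stderr, a log file...).
   Flags a broken len <= cap invariant and never reads past `cap` elements. */
void int_vec_dump(FILE *out, const char *label, const struct int_vec *v)
{
    size_t i, shown;

    assert(out != NULL && label != NULL);
    if (v == NULL) {
        fprintf(out, "%s: (null)\n", label);
        return;
    }
    fprintf(out, "%s: items=%p len=%zu cap=%zu%s\n", label, (void *)v->items,
            v->len, v->cap, v->len > v->cap ? "  <-- CORRUPT: len > cap" : "");
    if (v->items == NULL)
        return;
    shown = v->len < v->cap ? v->len : v->cap;
    for (i = 0; i < shown; i++)
        fprintf(out, "  [%zu] = %d\n", i, v->items[i]);
}
```

For example, calling int_vec_dump(stderr, "after insert", &v) after each suspicious operation shows at a glance whether the pointer, the counts, and the contents are what you expect.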

Logging Tools

Once your program is released, your users will inevitably report bugs. Unless you can visit each user's site, you need some way to get debugging output from your program while it is running on the user's system.

This is where logging and tracing earn their keep. Build your program from the ground up with a trace/logging facility.

Create a set of multi-level logging routines that provide trace output from your program, and design your program so that a command line argument, environment variable, or menu option enables that output. The option to save trace output to a file gives you a way to get program traces from your users. Assign consistent levels to the various functions: the higher the trace level, the more information gets dumped.

Use whatever level scheme suits you, but apply it consistently across all modules.
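A minimal sketch of such a facility in C follows. The level names, the functions trace(), trace_set() and trace_init(), and the environment variables MYAPP_TRACE and MYAPP_TRACE_FILE are all hypothetical; substitute your own scheme. A message is printed only when its level is at or below the level the user asked for, so raising the level dumps more detail.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical level scheme: higher numbers mean more detail. */
enum { TRACE_OFF = 0, TRACE_ERROR = 1, TRACE_INFO = 2, TRACE_DEBUG = 3, TRACE_DUMP = 4 };

static int   current_level = TRACE_OFF;  /* tracing disabled by default */
static FILE *trace_out     = NULL;       /* NULL means "use stderr" */

/* Set the level and destination directly (e.g. from a menu option). */
void trace_set(int level, FILE *out)
{
    current_level = level;
    trace_out = out;
}

/* Read settings from the hypothetical environment variables MYAPP_TRACE
   (a level number) and MYAPP_TRACE_FILE (a path to append traces to). */
void trace_init(void)
{
    const char *level = getenv("MYAPP_TRACE");
    const char *path  = getenv("MYAPP_TRACE_FILE");
    FILE *f = NULL;

    if (path != NULL)
        f = fopen(path, "a");  /* stays NULL (stderr) if the file can't be opened */
    trace_set(level != NULL ? atoi(level) : TRACE_OFF, f);
}

/* printf-style trace output. Returns the number of characters written,
   0 if the message was filtered out, or -1 on an output error. */
int trace(int level, const char *fmt, ...)
{
    FILE *out = trace_out != NULL ? trace_out : stderr;
    va_list ap;
    int head, body;

    assert(fmt != NULL);
    if (level <= TRACE_OFF || level > current_level)
        return 0;
    head = fprintf(out, "[%d] ", level);
    va_start(ap, fmt);
    body = vfprintf(out, fmt, ap);
    va_end(ap);
    if (head < 0 || body < 0 || fputc('\n', out) == EOF)
        return -1;
    fflush(out);  /* hand the line to the OS now so it survives a later crash */
    return head + body + 1;
}
```

A program would call trace_init() once at startup and then sprinkle calls such as trace(TRACE_DEBUG, "opened %s", path) through each module. A user who hits a bug can rerun with the level raised and the output going to a file, then send you that file.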

Windows® Debugging

Use OutputDebugString() to send debug messages to the debugger. Combined with a multi-level logging capability, it gives you a powerful tool for tracking down software weirdness. Several third-party tools let you capture the output of this routine, and most of them can save that output to a file.
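OutputDebugString() takes a single, already-formatted string, so it is convenient to wrap it in a printf-style helper. The sketch below uses a hypothetical name, debug_printf(); it calls the ANSI variant OutputDebugStringA() on Windows and falls back to stderr elsewhere, so the same calls compile on every platform. Messages longer than the fixed buffer are truncated.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical printf-style wrapper around OutputDebugStringA().
   Returns the length the full message would have had (as vsnprintf() does),
   or a negative value on a formatting error. */
int debug_printf(const char *fmt, ...)
{
    char buf[1024];
    va_list ap;
    int n;

    assert(fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);  /* truncates, always NUL-terminates */
    va_end(ap);
    if (n < 0)
        return n;
#ifdef _WIN32
    OutputDebugStringA(buf);  /* shows up in the debugger or a capture tool */
#else
    fputs(buf, stderr);       /* non-Windows fallback */
#endif
    return n;
}
```

Routing the multi-level trace routines through a helper like this gives you one switchable stream on Windows and a plain stderr stream everywhere else.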

Look at your malloc(), calloc(), realloc(), and free() calls. For each and every malloc() or calloc() call, can you easily point to its matching free() call? If not, get suspicious quickly! Avoid realloc() unless you really understand it (and how it behaves on your operating system). When possible, use calloc() instead of malloc(): calloc() initializes every allocated byte to zero.
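Here is a minimal sketch of the calloc()/realloc() advice, using a hypothetical helper named grow_ints(). It shows the most common realloc() mistake (overwriting the only pointer to the block with realloc()'s return value) and the fact that realloc(), unlike calloc(), leaves newly added memory uninitialized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow an int array from old_count to new_count elements, zeroing the new
   elements. Returns the (possibly moved) block, or NULL on failure; on
   failure `old` is untouched and the caller still owns (and must free) it. */
int *grow_ints(int *old, size_t old_count, size_t new_count)
{
    int *p;
    size_t i;

    assert(new_count > 0 && new_count >= old_count);
    if (new_count > SIZE_MAX / sizeof *p)
        return NULL;  /* the byte count would overflow */

    /* Never write `old = realloc(old, ...)`: realloc() returns NULL on failure
       without freeing `old`, so that assignment would leak the block. */
    p = realloc(old, new_count * sizeof *p);
    if (p == NULL)
        return NULL;

    /* Unlike calloc(), realloc() does not initialize the added memory. */
    for (i = old_count; i < new_count; i++)
        p[i] = 0;
    return p;
}
```

A caller assigns the result to a temporary, replaces its own pointer only when the result is non-NULL, and keeps exactly one free() for the block that calloc() originally returned, whichever address it ends up at.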

Further Reading

No Bugs!, by David Thielen, published by Addison-Wesley, ISBN 0-201-60890-1

a. & n. from Debug, v. (Hah, missed that one, Webster 1913! What? The word didn't exist in 1913? You win this one, Webster 1913, but you'd better watch your back from now on...)
Put simply, it is the act of finding (and hopefully fixing) the bugs (i.e. errors) in your code. Often infuriating, but usually rewarding when you finally find and crush the little bugger. Debugging normally starts when your program exhibits some abnormal behaviour. If you're lucky, you will be able to reproduce the problem easily and get to work. If you are unlucky, the program will behave fine every time you try. You are, of course, neglecting the influence of the alignment of the planets on your code.

Your first task is to gradually narrow down where in your code the problem is occurring. You may already have a fairly good idea of where the problem is thanks to things such as core dumps, StdLog files (generated by MacsBug on versions of the Mac OS prior to Mac OS X), crash logs, etc., which provide information on the state of the program when it actually crashes. Here's one from a program I've been working on:

It gives me a stack trace for each of my threads, which basically tells me which function each thread was executing (and how it got there) when my program crashed. As you might expect, this narrows down the problem significantly.
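If your platform does not produce crash logs for you, some C libraries let a program print its own stack trace. The sketch below assumes glibc (Linux), whose execinfo.h header provides backtrace() and backtrace_symbols_fd(); these are GNU extensions, not standard C, and print_stack_trace() is a hypothetical helper name. Function names only appear in the output if the program is linked with -rdynamic; otherwise you get raw addresses, which a tool such as addr2line can map back to source lines.

```c
#include <assert.h>
#include <execinfo.h>  /* glibc extension: backtrace(), backtrace_symbols_fd() */
#include <unistd.h>    /* STDERR_FILENO */

#define MAX_FRAMES 64

/* Write the calling thread's stack, innermost frame first, to stderr.
   Returns the number of frames captured (at most MAX_FRAMES). */
int print_stack_trace(void)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    assert(n > 0);
    /* Writes straight to the file descriptor without calling malloc(), so it
       can still work when the heap itself has been corrupted. */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
}
```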

If the bug you are hunting doesn't actually cause a crash, then your best bet is to follow the program and see what it is doing.
If you are lucky enough to have a debugger (which you almost certainly will these days), you will usually put breakpoints at various places in your code. When execution reaches one of these points, the debugger steps in and lets you examine the contents of memory and variables, and step through your code line by line. If a crash occurs, the debugger will often show you which line caused it. If your code is doing the wrong thing, you may see it happen. If you don't have a debugger, you will have to add statements to your code that output the data you are interested in. This is, of course, less flexible and may also interfere with the problem you are trying to fix. Depending on your operating system, you may have other tools at your disposal, for example environment variables that cause system libraries to print extra information about what they are doing (or sometimes separate "debug" versions of these libraries).
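When you do fall back on output statements, a macro keeps them cheap to add and easy to remove. In this sketch, DEBUG_PRINT and find_index() are hypothetical names. The macro tags each message with __FILE__ and __LINE__, and it expands to nothing when NDEBUG is defined (the same switch that disables assert()), so release builds carry no debug output.

```c
#include <assert.h>
#include <stdio.h>

/* Print "file:line: message" to stderr in debug builds only.
   Needs at least a format string: DEBUG_PRINT("x=%d", x). */
#ifndef NDEBUG
#define DEBUG_PRINT(...)                                   \
    do {                                                   \
        fprintf(stderr, "%s:%d: ", __FILE__, __LINE__);    \
        fprintf(stderr, __VA_ARGS__);                      \
        fputc('\n', stderr);                               \
    } while (0)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Example use: trace each step of a linear search. */
int find_index(const int *a, int n, int key)
{
    int i;

    assert(a != NULL || n == 0);
    for (i = 0; i < n; i++) {
        DEBUG_PRINT("find_index: i=%d a[i]=%d key=%d", i, a[i], key);
        if (a[i] == key)
            return i;
    }
    return -1;
}
```

Keep in mind the caveat above: the extra output itself changes timing and I/O, so a bug that disappears when DEBUG_PRINT is enabled is a clue in its own right.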

Debugging your program may alter the way it works. For example, if the problem is caused by two threads trying to access the same piece of data or resource at the same time (a common problem known as a race condition), then interrupting the execution may stop the simultaneous access from happening. Even something as innocent as adding a printf() statement can change the timing of your program enough to hide the problem.
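The sketch below (POSIX threads; counter, add_racy(), add_locked() and run_two() are hypothetical names) shows the classic lost-update race. counter++ is a read, an add, and a write, so two threads can both read the same old value and one increment disappears. How often that happens depends entirely on timing, which is exactly why stopping at a breakpoint or adding a printf() can make the bug vanish. Strictly speaking, the racy version is undefined behaviour in C, so an optimizing compiler may not even show the lost updates; the mutex version is the fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000L

static long counter = 0;  /* shared by both threads */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* BUGGY: unsynchronized read-modify-write; increments can be lost. */
void *add_racy(void *arg)
{
    long i;
    (void)arg;
    for (i = 0; i < ITERATIONS; i++)
        counter++;
    return NULL;
}

/* FIXED: the mutex gives each increment exclusive access to `counter`. */
void *add_locked(void *arg)
{
    long i;
    (void)arg;
    for (i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Run `worker` on two threads at once and return the final counter,
   or -1 if a thread could not be created. */
long run_two(void *(*worker)(void *))
{
    pthread_t a, b;

    assert(worker != NULL);
    counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return counter;
}
```

run_two(add_locked) always returns 2 * ITERATIONS; run_two(add_racy) may return less, and may return a different value on every run.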

Often the problem is simply the final result of an earlier problem. Part of your program may trash some data that another part relies on. A crash may happen when the second part executes, but this may give you very little information about where the actual problem lies. Even worse is when your program is trashing the stack or the heap, which will usually cause it to crash at seemingly random points.

Yet another type of problem is what is known as a deadlock. When this happens you don't actually get a crash; the program just locks up. This happens when part A of the program is waiting for part B to complete while part B is waiting for part A to complete. As you can see, when this happens you will wait forever.
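One standard way to rule out this kind of deadlock, when code must hold two locks at once, is to always acquire them in the same global order. In this POSIX threads sketch (struct account, struct transfer_job, transfer() and transfer_worker() are hypothetical), a naive transfer() that locked "from" and then "to" could deadlock when one thread moves money from A to B while another moves it from B to A: each grabs its first lock and waits forever for the other's. Ordering the locks by address breaks that cycle.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical account protected by its own mutex. */
struct account {
    pthread_mutex_t lock;
    long balance;
};

/* Move `amount` between accounts while holding both locks. Locks are always
   taken lowest address first, so two opposite transfers can never each hold
   one lock while waiting for the other. */
void transfer(struct account *from, struct account *to, long amount)
{
    struct account *first  = from;
    struct account *second = to;

    assert(from != to);  /* relocking a non-recursive mutex is an error */
    if ((uintptr_t)second < (uintptr_t)first) {
        first  = to;
        second = from;
    }
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    from->balance -= amount;
    to->balance   += amount;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

/* Work item for a thread: transfer 1 unit `times` times. */
struct transfer_job {
    struct account *from, *to;
    int times;
};

void *transfer_worker(void *arg)
{
    struct transfer_job *job = arg;
    int i;

    for (i = 0; i < job->times; i++)
        transfer(job->from, job->to, 1);
    return NULL;
}
```

Running one worker that moves A to B alongside another that moves B to A now always finishes, and the total balance is unchanged. The same rule applies to any set of locks: pick one acquisition order and stick to it everywhere.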

All these previous types of bugs are what I might call implementation bugs: you had the right idea when you were writing your code, you just messed up when you converted your ideas into code. Equally insidious is what I call a logical bug, i.e. a bug caused by a fault in your logic or design. You can step through code till you're blue in the face; it won't help much until you realise you were thinking about the task your program is doing in the wrong way. And even then you have to come up with the right way of doing it, which may involve rewriting significant amounts of code.

At some point you will probably end up looking through hundreds or thousands of lines of code trying to work out what is happening, cup of coffee in one hand, mouse in the other. You may make random changes and keep your fingers crossed while you run the program or send it off to testers. Oh, the sinking feeling when you get an email with the subject "Bug not fixed"!

But in the end it's all worth it: the feeling of satisfaction you get when you have sent the little bugger into the other world keeps you going until the next bug is found.

I hope some of the non-developers out there have gained a brief insight into what we are actually doing, staring at our screens at 2 am. I would like to finish with a few words of advice for when you submit a bug report:

If you're thinking "Hey, I'm just a user; you're the developer, so it's up to you to fix all that!", then bear in mind that the better the bug report, the easier it will be to find and fix the bug, and the sooner you will have a better product.