Search & Destroy Bugs

Like death and taxes, bugs are inevitable. It probably isn't possible to prove in a scientific sense that any non-trivial program contains bugs, but it's good practice to assume bugs exist and to program accordingly. Doing so enables you to protect against them more easily and more completely.

In this article, I'll walk you through the process of designing your apps to minimize bugs, testing for them, and, finally, eradicating them. Debugging is a three-part process. First, prevent bugs by understanding the code as thoroughly as possible: write clear, uncomplicated code with good comments and no sneaky tricks. Second, write code that looks for bugs aggressively. Use Debug.Assert and #If DEBUG to look for bugs in debug builds without affecting performance in release builds, and use an abundance of tests so you detect bugs as close as possible to the code that caused them. Finally, fix the bugs you find carefully. Consider the consequences of your actions before you start ripping the code apart and slapping on patches, and test the code after you make changes. Then learn from the bugs you find by adding new offensive code to catch that type of bug in the future. Every programmer creates bugs once in a while; good programmers don't let the same bug bite them again and again.

You might be unable to find and remove every bug in a large (or even not so large) application, but the approaches I'll lay out can make your code robust enough that users might never learn the remaining bugs exist. Unfortunately, many developers take the approach that their programs work correctly until proven otherwise. This sounds crazy when stated so bluntly, but it's an easy trap to fall into. When you write a small piece of code, it's natural to assume it contains no bugs. After all, if you knew there was a problem in the code, you would have fixed it. So why waste time looking for bugs that you "know" aren't there when you could be adding functionality that makes your application better?

A bug represents a misunderstanding: a mismatch between what the programmer wants the program to do and what it actually does. This can be as simple as initializing a variable incorrectly or as complex as writing an incorrect scheduling algorithm, but it's not really the program's fault. Except in rare cases such as hardware floating-point errors, the computer does exactly what it is told to do. The problem is nearly always the gap between what the developer thinks the code will do and what it actually does.

It might also be the case that a programmer doesn't understand what the program is supposed to do or how its pieces fit together. All these problems boil down to the same thing: the programmer doesn't completely understand how the program and its environment interact.

At a high level, the solution to these problems is simple: Give the programmer a better understanding. In this article, I don't discuss requirements gathering and validation or architectural design; I assume the programmer has a decent understanding of what the code is supposed to do. Instead, I focus on techniques for better understanding the code.

Comment Often and Well
The first, most essential thing to keep in mind when you write code is that you're writing code for people, not just for the computer (see the sidebar, "Top 10 Tips for Writing for People, Not Computers").

Top 10 Tips for Writing for People, Not Computers

A computer does exactly what the code tells it to do, and it doesn't care what the code looks like. It doesn't care about alignment, comments, or documentation. In fact, it doesn't care whether you're writing in Visual Basic, C#, Java, or assembly. At some level, all of the code gets interpreted or compiled into machine code, and that's what the computer executes. If you were writing code for the computer, you would write in machine code.

It's easy to forget that you write code for people, not computers. Ultimately, you are satisfying the requirements of a user somewhere, not just writing text that will execute on a computer. It's also easy to forget that you aren't writing code in a vacuum. In a development environment of any size, other people will review or maintain the code you write, so you need to make your intentions clear. Poorly commented code can make it enormously hard for other people to figure out what your app is supposed to do, much less how it goes about doing it.

Following these ten tips and keeping in mind that you're writing code for people, not computers, can significantly reduce the number of bugs in your code.

Use an abundance of comments to explain what the code is doing and why. The reader should never need to guess what the code is doing. Debugging code should be a simple matter of comparing the code's purpose to its behavior, not an IQ test.

Use good variable names. Use Hungarian notation if required, but make the rest of the name meaningful. Good names basically give you "free" comments.

Use constants or enumerated types instead of magic numbers and hard-coded strings. The best part about this approach: It's another way for you to incorporate free comments.
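Here is a minimal Python sketch of this tip. The article's own examples are in Visual Basic, where you would use Const and Enum. The `Sizes` enumeration, the price constants, and `price_for` are hypothetical. The comparison reads as `Sizes.LARGE` rather than a bare `3`, and the surcharge has a name instead of appearing as an unexplained `2.00`.

```python
from enum import IntEnum


class Sizes(IntEnum):
    """Hypothetical shirt sizes; in Visual Basic this would be an Enum."""
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


# Named constants instead of magic numbers scattered through the code.
BASE_PRICE = 15.00
LARGE_SURCHARGE = 2.00


def price_for(size: Sizes) -> float:
    """Return the price of one shirt of the given size."""
    # Magic-number version, for contrast: if size == 3: return 15.00 + 2.00
    if size == Sizes.LARGE:
        return BASE_PRICE + LARGE_SURCHARGE
    return BASE_PRICE
```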

Plan before you code. Many developers just sit down and start typing, but it's always better to at least sketch out a design first. Designing and then programming makes you think about the code in two different ways, giving you two perspectives on it.

Make sure code lines up nicely to show its structure. The IDE helps, but it can still do some weird things, particularly on continued lines.

Avoid sneaky tricks. If you need to use a complicated piece of code, comment it thoroughly. For example, some developers try so hard to maximize performance that they rely on undocumented features of the OS. That kind of code is often the most troublesome to maintain, because a super-fast algorithm can also be extremely fragile. Sometimes things are undocumented for a reason, and when the OS changes, your code will blow up. Extremely complicated code is also more fragile and error-prone by nature. If you must use such code, document it well enough to give whoever maintains the app a chance to fix it if it fails later, for whatever reason.

Make routines small and with a well-defined purpose. If you can't describe a routine's purpose in one or two sentences, break it up into smaller pieces. Keeping routines compact and discrete makes it easier to return to the code later and make adjustments.

Keep lines of code small enough to read without scrolling far to the right. If a line is too long, continue it on the next line.

Limit variable scope. The more limited a variable's scope, the easier it is to see what it is doing.

Minimize side effects. Routines that modify their parameters can be extremely confusing because the change is not obvious from the outside. Modify parameters as seldom as possible, and use comments to make it obvious whenever you do so.
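Python has no ByRef keyword, but a routine that changes a list it receives has the same surprising effect on the caller. This minimal sketch uses two hypothetical functions. The first returns a new value and leaves its argument alone. The second modifies its parameter, so its name and docstring say so.

```python
from typing import List


def sorted_copy(values: List[int]) -> List[int]:
    """Return a sorted copy of values; the caller's list is not changed."""
    return sorted(values)


def sort_in_place(values: List[int]) -> None:
    """MODIFIES ITS PARAMETER: sorts the caller's list in place."""
    values.sort()
```

Prefer the first style. When the second is genuinely needed (for example, to avoid copying a huge list), make the side effect impossible to miss.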

You can make your code easier to understand by adding insightful comments. These not only help others understand your code, but they also help you clarify your ideas as you write it. They can also help you understand the code later if you need to modify or fix it.

It is well known that the longer a bug remains in the code, the harder it is to fix. One reason for this: The longer it's been since you wrote the code, the less you remember how it works. To fix the bug without breaking anything else, you need to regain the understanding you had when you originally wrote the code. Comments can make that easier.

I've heard developers argue that any comment that isn't essential is distracting and should therefore be removed. However, comments matter most not when you're reading the code to learn what it's supposed to do, but when something is wrong and you need to compare in detail what the code is supposed to do against what it actually does. In that case, explicit, detailed comments are critical. It's easier to add extra comments and ignore them when you don't need them than to skimp on comments and then try to figure out exactly where the code is going wrong without them.

Consider the approach taken on one project I worked on. We commented the code as usual, but we also added extra comments way off to the right, where they wouldn't distract you during normal reading, explaining what every non-trivial line of code did. You couldn't even see those comments unless you made the screen really big or scrolled to the right. They were there if you needed them for debugging, however.

After development, we transferred the project to a maintenance group that had the philosophy that any comment that was not absolutely necessary was not allowed, and they removed all the extra comments. About six months later, they discovered that the code was not maintainable because they couldn't figure out how it worked. They'd make a change and the code would break. I'll give you three guesses as to why.

The relatively new XML comments and attributes such as Description and Category also act as comments for other developers, documenting how your code is meant to be used.

So far, I've covered ways to make your code more understandable to other developers. Working in teams is also one of the best ways to reduce the number of bugs in your code. For example, in pair programming two developers work on the same code together. Two sets of eyes are more likely to find trouble spots, and the partners help each other understand the code better.

Code reviews perform a similar function, ensuring that more than one person reads the code. It's fairly easy to breeze through a routine and see what you think it does, instead of what's actually there. It's much harder to explain the code to someone else if it doesn't do what you think it does. In fact, just preparing for a code review makes a programmer think about the code differently, and this process often exposes potential bugs.

Detect Bugs More Easily
Suppose you've followed good design practices, made a plan before coding, had design and code reviews, and you've written code for humans, rather than the computer. The bugs are still there. They are fewer in number than if you hadn't followed these practices, but you can bet there are a few still lurking about, waiting to pounce under the right circumstances.

Now your goal is to find as many of those bugs as possible, as quickly as possible, and as close as possible to the code that's causing them. All too often, the first indication that a problem exists with an application comes in the form of a bug report from a user or customer. By the time a bug is detected, thousands of lines of code might have fiddled with the data, so the source of the problem can be hard to pin down.

Some of the techniques you can use to find bugs as quickly as possible include offline testing and offensive coding.

You should test new code as soon as you write it. You should also test it as thoroughly as possible before you release it for integration with the rest of the application. It is surprising how many developers release their code with little or no testing. Sometimes this is partly due to management or cultural issues. It's natural to want to move on to write new code and achieve more milestones, but it's better to spend a little extra time now, so you don't have to spend a lot of time debugging later.

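There are several different kinds of tests you can use to flush out bugs in your code. In exhaustive testing, you feed every possible input to a routine and verify the outputs. Unfortunately, the number of legal inputs for most real applications is so huge that this isn't practical. For example, a routine that takes just two 32-bit integers has 2^64 (about 1.8 × 10^19) possible input pairs. Even at a billion tests per second, checking them all would take more than 500 years.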

In black box testing, you pretend you know nothing about how the code works. You dump a bunch of inputs into the routine and verify the outputs. To ensure that the tests find as many bugs as possible, you need to use a wide variety of inputs. Try a variety of valid and invalid inputs to make sure the code behaves as expected. Look for "natural trouble points" such as empty arrays, null references, and missing values.
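This is a minimal black box sketch in Python. The `average` routine is a hypothetical stand-in for the code under test, and the tests look only at its inputs and outputs. They mix ordinary values with the natural trouble points just mentioned: an empty array, a null (`None`) reference, and a missing value inside the data.

```python
from typing import Optional, Sequence


def average(values: Optional[Sequence[float]]) -> Optional[float]:
    """Hypothetical routine under test: the mean of values, or None if there are none."""
    if not values:
        return None
    return sum(values) / len(values)


def black_box_tests() -> None:
    """Exercise average() through its inputs and outputs only."""
    # Ordinary valid inputs.
    assert average([2, 4, 6]) == 4
    assert average([5]) == 5
    assert average([-1, 1]) == 0
    # Natural trouble points: an empty array and a null reference.
    assert average([]) is None
    assert average(None) is None
    # A missing value inside the data should fail loudly, not be skipped.
    try:
        average([1, None])
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for a missing value")


if __name__ == "__main__":
    black_box_tests()
```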

In white box testing, you are allowed to peek at the code inside the routine and design test inputs to try to break the code. If you know that the code uses a particular data structure, try to pick inputs that won't fit into it easily.
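This is a minimal white box sketch; the `batch` routine and its `CAPACITY` limit are hypothetical. The tester can see that the code collects items in a buffer of `CAPACITY` elements. So the tests deliberately probe sizes just below, exactly at, and just past that limit, where off-by-one mistakes tend to hide.

```python
from typing import List

CAPACITY = 4  # Hypothetical buffer size, visible to a white box tester.


def batch(items: List[int]) -> List[List[int]]:
    """Split items into batches of at most CAPACITY elements."""
    batches: List[List[int]] = []
    buffer: List[int] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == CAPACITY:
            batches.append(buffer)
            buffer = []
    if buffer:  # Flush the partly filled last batch.
        batches.append(buffer)
    return batches


def white_box_tests() -> None:
    # Probe just below, at, and just past the buffer's limit.
    for n in (0, CAPACITY - 1, CAPACITY, CAPACITY + 1, 2 * CAPACITY):
        items = list(range(n))
        result = batch(items)
        # No item lost or duplicated, and no batch overflows the buffer.
        assert [x for b in result for x in b] == items
        assert all(1 <= len(b) <= CAPACITY for b in result)


if __name__ == "__main__":
    white_box_tests()
```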

Finally, try a bunch of randomly generated inputs if possible. Sometimes it can be hard to verify the results from random inputs (for example, verifying a face recognition algorithm might require you to review images manually). But if you can, write a program to generate random inputs and verify the results. Then you can run thousands of tests.
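This is a minimal Python sketch of random testing, assuming a hypothetical `insertion_sort` routine as the code under test. The checker never needs the "right answer" in advance. It only verifies that each output is in order and contains exactly the same items as the input. A fixed seed makes any failure reproducible.

```python
import random
from collections import Counter
from typing import List


def insertion_sort(values: List[int]) -> List[int]:
    """Hypothetical routine under test: return a sorted copy of values."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def random_tests(trials: int = 1000, seed: int = 42) -> int:
    """Run `trials` random tests and return how many ran."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        output = insertion_sort(data)
        # Verify properties of the result instead of a precomputed answer:
        # it must be in order and contain exactly the same items.
        assert all(a <= b for a, b in zip(output, output[1:])), data
        assert Counter(output) == Counter(data), data
    return trials
```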

These different testing methods work best against different kinds of bugs, so it makes sense to use all of them. Write test modules that parallel code modules and keep them around for later use in case you need to retest the code.
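In Python, for example, the standard `unittest` module supports this style. In the minimal sketch below, the `discounted_price` routine and the file names in the comments are hypothetical. The tests would live in `test_pricing.py` next to `pricing.py`, ready to rerun whenever the code changes.

```python
import unittest


# --- Code module (hypothetically pricing.py) ---
def discounted_price(price_cents: int, percent: int) -> int:
    """Apply a whole-number percentage discount, rounding down to whole cents."""
    if not 0 <= percent <= 100:
        raise ValueError(f"discount percent out of range: {percent}")
    return price_cents * (100 - percent) // 100


# --- Parallel test module (hypothetically test_pricing.py) ---
class TestDiscountedPrice(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(discounted_price(2000, 25), 1500)

    def test_boundaries(self):
        self.assertEqual(discounted_price(2000, 0), 2000)
        self.assertEqual(discounted_price(2000, 100), 0)

    def test_rounds_down(self):
        self.assertEqual(discounted_price(999, 10), 899)

    def test_rejects_bad_percent(self):
        with self.assertRaises(ValueError):
            discounted_price(2000, 101)
```

Running `python -m unittest` from the project directory discovers test modules whose names match `test*.py` and reruns them.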

Try Offensive Coding
Defensive coding has been around for at least 30 years. The idea is to make the code rely on as few assumptions about its data as possible, so the program doesn't crash if the data is wrong.

For example, assume the variable shirt_size should have one of the enumerated values Sizes.Small, Sizes.Medium, or Sizes.Large. A defensive Select Case statement that examines the variable would check explicitly for Small and Medium in its first two Case statements and let the Case Else statement handle Large, with a comment noting that fact.
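Here is a rough Python analogue of that defensive Select Case. An if/elif/else chain stands in for Select Case, with the final else playing the role of Case Else, and the `shirt_price` routine and its prices are hypothetical. The enumeration already includes ExtraLarge, a value the pricing code was never updated to handle, to show the problem discussed next.

```python
from enum import Enum


class Sizes(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3
    EXTRA_LARGE = 4  # Added later; shirt_price below was never updated.


def shirt_price(shirt_size: Sizes) -> float:
    # Defensive style: the final else quietly handles Large
    # and also swallows any value nobody planned for.
    if shirt_size == Sizes.SMALL:
        return 10.00
    elif shirt_size == Sizes.MEDIUM:
        return 12.00
    else:  # Large
        return 14.00
```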

The Case Else section also handles any unexpected values that might sneak into the data. If you add the new value ExtraLarge to the enumerated values but forget to update the Select Case statement, at least the program won't crash. Similarly, if a bug in some other routine sets shirt_size to a numeric value outside of the enumeration's bounds, the program still won't crash.

This code does something with unexpected values, but it doesn't necessarily do the right thing with them. The program won't crash, but it might do something stupid, such as shipping a Large shirt when the customer ordered ExtraLarge or charging the Large price for an ExtraLarge shirt.

Worse, the program gives you no hint that anything is wrong. The error could go unfixed for a long time, during which time you are undercharging customers and sending them the wrong sizes. When a customer does complain, and you realize that there's a problem somewhere, you have no indication that this Select Case statement is causing trouble. You'll have to start a long debugging session, stepping through the code to figure out where things are going wrong.

A better approach is what I call offensive programming. Instead of making the code depend as little as possible on the data, it depends as much as possible on the data. It goes out of its way to look for trouble, trying to find data that doesn't fit its assumptions. If it finds a value that doesn't make sense, the code immediately brings it to the developer's attention. Instead of sweeping the problem under the rug as defensive programming does, offensive programming shines a spotlight on it.

An offensive version of the previous Select Case statement checks explicitly for every expected value, including Large, and throws an error in its Case Else section. If you add ExtraLarge to the Sizes enumeration, this version throws an error the first time it sees the new value. That tells developers much more precisely where the problem first appeared, so tracking down the bug is much easier.
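Here is the same hypothetical `shirt_price` routine rewritten in the offensive style as a Python sketch. Each expected size gets its own branch. The final else raises `ValueError`, Python's counterpart of throwing an exception in Case Else, instead of quietly assuming Large.

```python
from enum import Enum


class Sizes(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3
    EXTRA_LARGE = 4  # Added later; shirt_price below was never updated.


def shirt_price(shirt_size: Sizes) -> float:
    # Offensive style: every expected value is checked explicitly...
    if shirt_size == Sizes.SMALL:
        return 10.00
    elif shirt_size == Sizes.MEDIUM:
        return 12.00
    elif shirt_size == Sizes.LARGE:
        return 14.00
    else:
        # ...and anything else is reported immediately, where it is first noticed.
        raise ValueError(f"Unexpected shirt size: {shirt_size!r}")
```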

Defensive programming does have the advantage of not allowing the program to crash, however, and that's still a worthwhile goal. Throwing an exception is useful for developers, but not so useful to end-users.

A compromise is to throw the exception only in debug builds. Instead of always throwing an exception when it encounters an unexpected value, the Case Else section can use an #If DEBUG directive to check the DEBUG compilation constant, which tells it whether it is running in a debug or release build. In a debug build, the code throws the exception. In a release build, the code takes pity on the poor user and performs some default action. That action should include notifying the developers of the bug somehow, perhaps by e-mailing a bug report, and possibly telling the user that the requested size is not currently available.

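DEBUG is a conditional compilation constant, not an environment variable, and the build configuration determines whether it is defined. To switch configurations, open Visual Studio's Build menu and select Configuration Manager. In the "Active solution configuration" dropdown, select Debug (which defines DEBUG by default) or Release.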
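Python has no separate debug and release builds. Running the interpreter with the `-O` flag, however, sets the built-in constant `__debug__` to False and removes code guarded by `if __debug__:`, which makes it a reasonable stand-in for #If DEBUG. Here is a minimal sketch of the debug/release compromise with hypothetical `shirt_price` and `report_bug` routines. `report_bug` just logs here; a real one might e-mail a bug report.

```python
import logging
from enum import Enum
from typing import Optional


class Sizes(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3
    EXTRA_LARGE = 4  # Added later; shirt_price below was never updated.


log = logging.getLogger("orders")


def report_bug(message: str) -> None:
    """Hypothetical hook for notifying developers (e-mail, bug tracker, and so on)."""
    log.error(message)


def shirt_price(shirt_size: Sizes) -> Optional[float]:
    """Return the price, or None if the size can't be sold right now."""
    if shirt_size == Sizes.SMALL:
        return 10.00
    if shirt_size == Sizes.MEDIUM:
        return 12.00
    if shirt_size == Sizes.LARGE:
        return 14.00
    message = f"Unexpected shirt size: {shirt_size!r}"
    if __debug__:
        # Normal ("debug") run: fail loudly so developers see the problem at once.
        raise ValueError(message)
    # Optimized ("release") run under python -O: report the bug, then degrade gracefully.
    report_bug(message)
    return None  # The caller can tell the user this size isn't currently available.
```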

Verify Data
Even offensive coding doesn't always find bugs as quickly as possible. In the previous example, the variable shirt_size might have been set incorrectly much earlier in the program and only detected when the program reached the Select Case statement. In the worst case, shirt_size might have been set incorrectly by another routine a long time ago in an unrelated code module or library. In that case, it could take you a while to trace the bad data back to its source.

You can localize bugs further by adding extra data checking code instead of waiting until the program needs to use the data. You could place this code just about anywhere, but you should focus on places where the data is likely to be modified.
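A property setter is one natural place where data gets modified. This minimal Python sketch assumes a hypothetical `Order` class whose setter validates `shirt_size` at the moment it is assigned. Bad data is caught where it enters, rather than much later where some Select Case finally trips over it.

```python
from enum import Enum


class Sizes(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


class Order:
    """Hypothetical order record that checks shirt_size whenever it changes."""

    def __init__(self, shirt_size: Sizes) -> None:
        self.shirt_size = shirt_size  # Goes through the validating setter below.

    @property
    def shirt_size(self) -> Sizes:
        return self._shirt_size

    @shirt_size.setter
    def shirt_size(self, value: Sizes) -> None:
        # Catch bad data where it is written, not later where it is used.
        if not isinstance(value, Sizes):
            raise ValueError(f"invalid shirt size: {value!r}")
        self._shirt_size = value
```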

One natural place to put data checking code is at routine entry and exit points. For example, when a subroutine starts, it can validate its input parameters, any other data that it will need to use, and any assumptions it needs to make.
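This minimal Python sketch of entry and exit checks uses a hypothetical `merge_sorted` routine. On entry it verifies the assumption it depends on: both inputs are already sorted. On exit it verifies its own result before returning it. Here both checks raise exceptions unconditionally; the next paragraphs discuss making such checks behave differently in debug and release builds.

```python
from typing import List


def _is_sorted(values: List[int]) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


def merge_sorted(left: List[int], right: List[int]) -> List[int]:
    """Merge two already-sorted lists into one sorted list."""
    # Entry check: validate the assumption the merge relies on.
    if not _is_sorted(left) or not _is_sorted(right):
        raise ValueError("merge_sorted requires sorted inputs")

    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])

    # Exit check: make sure the result is right before anyone else uses it.
    if len(merged) != len(left) + len(right) or not _is_sorted(merged):
        raise RuntimeError("merge_sorted produced an invalid result")
    return merged
```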

As is the case with offensive programming, you might want the program to react differently at debug time and release time. For example, you might want it to throw an error if it sees suspicious data while you are developing the program. However, you might want the program to take some default action and continue anyway when you release the program to customers.

To make the program behave differently in debug and release versions, you can use #If DEBUG statements, as before. For simple validations, you can also use the Debug class's Assert method. Debug.Assert checks a Boolean condition and, if the condition is false, reports an assertion failure. In a classic .NET Framework application, it does this by displaying a dialog that lets you abort, break into the debugger, or ignore the failure. For example, a Visual Basic routine might begin by calling Debug.Assert to verify that its shirt_size parameter is between Sizes.Small and Sizes.Large.

In debug builds, Debug.Assert performs its check normally. Calls to Debug.Assert are omitted from release builds, so they won't stop the program when end-users run it, don't increase the program's size, and don't affect the program's performance.
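Python's assert statement is a close analogue. It is checked in a normal run and removed entirely when Python runs with the `-O` flag, much as calls to Debug.Assert disappear from release builds. A failed Python assert raises `AssertionError`. Here is a minimal sketch of the shirt_size range check, using a hypothetical `size_label` routine:

```python
from enum import IntEnum


class Sizes(IntEnum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


def size_label(shirt_size: int) -> str:
    """Return a display label such as "Medium" for a shirt size."""
    # Like Debug.Assert: checked in a normal run, compiled away under python -O.
    assert Sizes.SMALL <= shirt_size <= Sizes.LARGE, f"shirt_size out of range: {shirt_size}"
    return Sizes(shirt_size).name.title()
```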

You can still use compiler directives such as #If DEBUG to manage more complicated tests. For instance, if you want the code to verify that the items in a large array are sorted, you can surround the code with an #If DEBUG statement so that validation is skipped in release builds.
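This minimal Python sketch uses a hypothetical `find_index` routine. The binary search itself takes O(log n) time, but verifying that the array is sorted takes O(n). So the check sits inside an `if __debug__:` block, Python's rough counterpart of #If DEBUG, which the interpreter removes when run with `-O`.

```python
from typing import List


def find_index(sorted_ids: List[int], target: int) -> int:
    """Binary search: return the index of target, or -1. Requires sorted input."""
    if __debug__:
        # Expensive O(n) validation, analogous to code inside #If DEBUG ... #End If.
        for k in range(1, len(sorted_ids)):
            if sorted_ids[k - 1] > sorted_ids[k]:
                raise ValueError(f"ids out of order at index {k}")
    lo, hi = 0, len(sorted_ids) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_ids[mid] == target:
            return mid
        if sorted_ids[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```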

Developers often skip this kind of exhaustive validation, claiming that it hurts performance. By using Debug.Assert and conditional compilation directives, you can have your tests and your performance, too.

It's also easy to claim that these checks are unnecessary because you tested the code before using it (assuming that's even true). If all of the code is tested properly, why should you keep verifying that the results are correct? The answer is that bugs are inevitable. Even if the code was correct at one time, there's no guarantee that it will remain correct. The more often you validate the data, the faster you will catch and fix bugs when they arise.

Eradicate Bugs
Once you know a bug has occurred, how do you figure out exactly what caused it? If you're lucky, offensive coding and data verification detected the problem close to its source. You can backtrack from there, and the most recent statement that modified the data is usually the culprit.

If you're less lucky, you might have to step through a lot of code looking for the bug. Usually this happens when your offensive coding and data verification tests aren't specific enough and miss the corrupt data. For example, you might have verified that a group of order numbers was sorted correctly but not noticed that they were all the same.
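Here is a small Python illustration of that order-number example. The function names are hypothetical, and it assumes order numbers are supposed to be unique. The weak check happily accepts a list in which every order has the same number, while a slightly sharper check rejects it.

```python
from typing import List


def is_sorted(order_numbers: List[int]) -> bool:
    """Too-weak check: also passes when every number is the same."""
    return all(a <= b for a, b in zip(order_numbers, order_numbers[1:]))


def is_strictly_increasing(order_numbers: List[int]) -> bool:
    """Sharper check: the numbers must be in order AND unique."""
    return all(a < b for a, b in zip(order_numbers, order_numbers[1:]))


# Corrupt data: a bug gave every order the same number.
corrupt = [1007, 1007, 1007, 1007]
```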

In the short term, you can step through the code, examining the data until you stumble across the code that changed it incorrectly. Visual Studio includes terrific tools for stepping through code, examining data (DataTips even let you drill down into complex data structures by hovering over a variable), changing variable values, calling routines, setting properties, and generally nosing around, looking for trouble.

Learning how to use these tools is largely a matter of finding out what they can do and practicing, so I won't cover them in depth in this article. To learn more, check out the Visual Studio online help and experiment with the IDE, particularly the commands in the Debug menu. Two tools in particular are significantly underused by many developers: watch windows and breakpoints.

Watch windows let you display the values in key variables as you step through the code. Simply right-click on a variable and select Add Watch. Often, stepping through the code while watching a key variable can show you exactly where the problem is. The Locals and Autos windows (look in the Debug menu's Windows submenu) also let you view variable values easily as you step through the code.

No doubt, you know how to set a breakpoint on a line of code. However, you might not realize that you can place conditions on a breakpoint so it stops execution only under certain circumstances.

Let's give this a try. Set a breakpoint as usual (for example, click on the margin to the left of a line of code). Next, right-click on the breakpoint symbol to see options for modifying the breakpoint. Use the Condition command to make the breakpoint depend on some condition. For example, the breakpoint might stop execution at a particular line only if a variable holds a certain value or if its value has changed. If you design conditions properly, breakpoints can pinpoint the bug for you.

Use the Hit Count command to make the breakpoint stop execution only when the line has been reached a certain number of times or a multiple of some number of times (for example, every fifth time). This command also lets you reset the breakpoint's hit count. The When Hit command lets you make the breakpoint display a message or run a macro when it fires instead of simply stopping execution.

Plan Your Bug Fixes
After you have located a bug, don't blithely change the code to fix it. First, make a plan and think about the consequences of your changes. Make sure you understand how the code is supposed to work, how it actually works, and how it will work after you make the change. Remember that bugs are caused by not understanding the code. Make sure you understand the situation before you act. Changes to old code are far more likely to introduce new bugs than writing new code is, so think before you make matters worse.

After you fix the bug, test all the code! It's amazing how many programmers change code to fix a bug, possibly verify that the particular error condition is now accounted for, but don't test other cases to make sure nothing else is broken.

Rerun the tests that you wrote for this code originally. Then think about the bug you fixed and how you might detect it in the future. Add new tests to the code using either Debug.Assert or #If DEBUG to uncover that particular type of bug should it arise again. If you found a breakpoint useful in finding the bug, consider making it permanent with more tests that use either Debug.Assert or #If DEBUG. Add similar test code to other pieces of code where similar bugs might arise. It's bad enough to spend time fixing a bug once, but it's far worse to have to waste time finding the same kind of bug over and over again.
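Here is a minimal sketch of the idea in Python, with assert standing in for Debug.Assert. The `page_count` routine and its bug are hypothetical. The fix is accompanied by a permanent check for the same kind of mistake and by a regression test that reproduces the original failure.

```python
def page_count(item_count: int, per_page: int) -> int:
    """Return the number of pages needed to show item_count items, per_page at a time."""
    assert item_count >= 0 and per_page > 0, "invalid arguments"
    # Bug fix: this used to be item_count // per_page, which silently
    # dropped the last, partly filled page. Ceiling division keeps it.
    pages = -(-item_count // per_page)
    # Permanent check for this kind of bug: the pages hold every item,
    # and one page fewer would not be enough.
    assert (pages - 1) * per_page < item_count <= pages * per_page
    return pages


def regression_tests() -> None:
    """Rerun the case from the original bug report plus its neighbors."""
    assert page_count(11, 5) == 3  # the reported failure
    assert page_count(10, 5) == 2  # exact multiple
    assert page_count(1, 5) == 1
    assert page_count(0, 5) == 0
```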

Taken as a whole, the steps I've laid out in this article can help you reduce the number of bugs that make their way into your application. More importantly, they can help you catch and eradicate the bugs that do get introduced more easily. Proper commenting practices also matter when someone needs to overhaul your application two or three years from now, or even later, at a time when you might not be available to answer questions. Designing your app properly can make a significant difference in how long it can continue to fulfill its role after you've released it into the wild (well, to IT, at any rate).