Monthly Archives: May 2006

Post navigation

At each development stage, an application has some new or updated features that will need to be tested thoroughly, beyond a quick execution of the program. Certainly, the code should be pretty solid after having passed through some of these tools, but there is still no guarantee that the results produced are actually correct, except to the extent that they are manually checked.

It is very important that you test your application to make sure that it withstands unusual input and produces correct results, or fails gracefully, especially if your software can be used for mission critical operation. This will often involve checking more input and output than a team of testers can conveniently generate, so this is where automated testing tools can help you with quality assurance.

One type of automated testing tool interacts directly with your source code and automatically generates special code, known as a “test harness”, which deliberately throws unusual parameter values at routines and monitors the results to make certain that the routines handle unexpected values reasonably. These tools have a number of different configuration options, but their general nature prevents them from having specific knowledge about a particular program.

Another type of automated tool interacts with the interface of a program, essentially providing a somewhat more sophisticated approach to what we use to call “keyboard testing,” which was just banging randomly and rapidly on the keyboard in an (often successful) attempt to crash or confuse the program. This type of testing is more appropriate for some types of applications than others. We have never investigated using this approach for testing our games, though a young child is a good substitute.

Developers can, and should, provide this type of glass box testing for their own products. You can write test harnesses that explicitly call routines with certain parameters and check for valid results. One excellent method for doing this, especially during optimization, is to have two separate routines that use different techniques for generating the desired results, and then run both routines, comparing results. This also allows you to profile both routines under the same conditions and ultimately use the better one.

For interface testing, you can use a standard macro recorder, software that records and can replay keyboard and mouse input into a program. Although this does not allow for random actions, it does allow a test sequence to be developed and verified on a regular basis. Also, testing an application with a macro recorder makes it possible to reproduce bugs simply by using the macro.

The most powerful programs for glass box testing include source code analysis, runtime checking, and automated testing tools. These are not generally included in compiler packages, so they need to be obtained separately, and can often be somewhat expensive.

Source code analysis tools, better known as “lint” tools in C and C++ development, are utilities that examine your source code and produce warnings for potential problems. The output is similar to that from a compiler, except that the tool performs deeper checks, even emulated code walkthroughs, and has a larger and more specific set of issues to check.

A decent source code analysis tool would likely be your best investment of any glass box testing tool. Unlike a compiler, which merely needs to produce object code for a specific platform, a lint tool can check for a whole range of problems, from portability to standards compliance, and some coding guidelines. The details of potential problems can even help a programmer to better understand nuances of the language.

Lint tools produce many more warnings and errors than a compiler, but they also provide great flexibility to disable individual warnings, even for specific lines of code. It is unlikely that a non-trivial program could pass through such a tool at the highest level without warnings (and sometimes thousands of them), but each issue or type of warning identifies a pitfall that can be considered and resolved.

When developing, I run source code analysis on a regular basis to catch potential errors that the compiler missed. In this way, I can remain confident that my code is relatively free of silly errors, so I can instead concentrate on the logic of the overall code, not individual mistakes. Also, anywhere that my code does something unusual, there is, by necessity, a comment indicating a suppressed lint warning.

Another way of performing some rudimentary source code analysis, especially for a cross-platform project, is to compile the source code under two different development environments. It is somewhat inconvenient, particularly during the initial setup, but if code can build and work correctly from two different compilers, chances are pretty good that the code is solid.

Runtime checking tools include a variety of programs that automatically monitor the behavior of the program as it executes. Often, these tools check memory or resource usage, but they can also watch for invalid pointers and range errors, verify API parameters and return values, and report on code coverage. The most common benefit of these tools is to identify memory and resource leaks.

A comprehensive runtime checking tool serves as an ideal supplement to a source code analysis tool. While the latter catches potential problems with the code itself, the runtime checker highlights problems with the logic of the application during execution. Some tools can insert extra code and information during the build, in a process known as “instrumentation”, and this improves the runtime testing even more.

One issue with runtime checking is that it tends to slow program execution significantly, so it is definitely not intended for a release version, nor for every debugging build. Nevertheless, like other testing techniques, it is best to use the available tools early and often. The earlier a bug is detected and identified, the easier and less costly it will be to fix.

In my development process, I use my source code analysis tools after writing or modifying no more than a couple of routines. I use my runtime checking tools, at the highest detection level, after every major feature update, or before every delivery to a client. This glass box testing takes place in the background while I do black box testing of the application and, especially, new or updated features. If any problems appear, I address those problems right away before considering the feature to be done.

Most development environments include a debugger, which is an essential tool for producing quality software. However, the function of a debugger goes well beyond merely helping to find bugs. Some programmers do not regularly use a debugger, or only use one to help locate “that tough bug.” If you fall into this category, I strongly urge you to familiarize yourself with a debugger and integrate it into your standard development process.

Using a debugger for code assurance is another form of glass box testing. It is most powerful when used to perform live walkthroughs of program code. You can manually step through your code, examining variables, and make sure that it is performing as expected. There is no better way to assure yourself that the program is performing correctly than to actually watch it. It also helps identify situations where an errant value could cause problems.

To put this capability to work for you, set a breakpoint at the beginning of each new routine. When the breakpoint triggers, step through the code line by line, confirming that the variables are correct and that the process produces the desired results. Some authorities recommend setting a breakpoint at every single code path, only removing a breakpoint when the path has been thoroughly tested. I must admit that I find this to be overkill in some situations, such as where the function is simply returning an error code, but I do this for all significant branches.

Another glass box testing tool that is often provided with common development environments is a profiler. A profiler is a tool that takes time measurements of a running application and then provides performance statistics for modules or specific functions. This is useful for identifying performance bottlenecks and functional irregularities in a program.

There are two important metrics provided by most profilers, function time and execution count. The function time shows how much overall time was spent in a function (or module), which gives an indication of where any performance delays may be. The execution count shows how many times a function was called, and occasionally this highlights an unexpected problem if a routine is being called too often.

Together, the time and count metrics help show where a program can benefit from optimization, and it is useful to have this information. However, unless there is a serious problem, it is best to wait until all program functionality is complete before attempting to optimize. There is a term in the industry for unnecessarily modifying code for performance before having functionality: “premature optimization”.

There are more powerful profilers and debuggers available from third-party suppliers, but I recommend getting comfortable with the capabilities and features, as well as drawbacks, of the tools provided by your compiler vendor before evaluating expensive alternatives. The quality improvement to be gained by using any debugger far outweighs the incremental benefit of switching to a more powerful tool.

The best place to start a discussion of practical testing is with the development environment and the tools that you are already using. Rather than a comprehensive discussion of programming practices, which would be a book by itself, I will concentrate on using these tools to facilitate the testing work.

Generally, a development environment for a compiled language consists of a text editor, a compiler, and a linker, plus other tools useful during development, and often these are all part of a single IDE (Integrated Development Environment). This compiled development environment is assumed for this discussion, though there are analogous approaches in other environments.

The first step in producing quality code is to understand your development environment. Although there seems to be a growing trend towards hiding or automating functionality, it is nevertheless important to know what the make or project files are doing, and what options are available and being used. You need to how things work in the case that, “It Just Works,” fails.

Assuming you understand how the development environment works, you can begin actually programming. Once a certain amount of code is written, you try to build (i.e., compile and link) the executable. This is, in fact, the most basic form of glass box testing. If there are problems in the source code, or the build environment is not correct, then warnings or errors will be generated.

To make the best use of this functionality, modify the settings in the make or project files to set the highest warning level available. This may produce lots of extras warnings on some projects, but compiler warnings exist for good reasons. A warning almost always indicates that there is an immediate problem or, at least, a lack of clarity within the code that reduces the ease of maintenance and could cause future problems.

Many compilers include an option to treat any warnings as errors, and I recommend enabling this option. Warnings should never be ignored, and this prevents that from happening. Instead, source code should be corrected to eliminate warnings. This may seem like obvious advice to some readers, but my experience working with code from other programmers shows that many programmers routinely ignore warning messages during compilation, a dangerous practice that is contrary to quality development.

Taking this checking one step further, build the program frequently. This allows you to catch the warnings as they are introduced, rather than having a collection of problems at the end of a large coding session. Some warnings indicate problems that may need to be addressed by technical design changes, and it is good to find these problems early. Personally, I rarely write more than a short function between builds.

Black box testing should also be used in the early stages of development, even when features are incomplete. Running the executable regularly helps make sure that errors are not introduced or, when they are, catches them at an early stage. For incomplete features, you can hardcode values for testing, or just assure that the program behaves as expected, considering the missing code.

[This article was originally published in the December 2002 issue of ASPects.]

“Come, give us a taste of your quality.”

When Shakespeare penned these words for Hamlet, he was referring to the calling of those to whom the comment was directed. In this context, quality is literally ones profession. As software developers who, for the most part, can either succeed or fail based on the quality of our work, this interpretation still holds true.

In the first part of this article, I discussed planning for quality software development, classification and tracking of bugs and unimplemented features, and some basic quality assurance concepts and terms. In particular, I explained black box and glass box testing, some methods for which we will explore in this part. As a simple reminder, black box testing is, essentially, user testing of the program itself, while glass box testing is developer testing of both source and object code.

You have been waiting long enough, so it is time to put these ideas into practice.

To this point, we have chosen a project, determined what we want to accomplish with the end product, and proved in the design document that we can meet our goals with it. We have recommitted to developing a quality product and we have devised our own system for tracking the features and the bug reports that will inevitably be received, and some of us have even populated the software with features from the design docs.
In short, we have established a strong foundation for quality software development.

Now, all of the real programmers among us are jumping up and down yelling, “Let’s get to the coding already!” With this foundation, we are in a great position to get started with the actual programming. Unfortunately, the discussion of specific glass box and black box techniques, including beta testing methods, will have to wait for the next installment.

I have some coding to do.

Gregg Seelhoff is an independent game developer who serve[d] on the ASP Board of Directors. He can be reached at seelhoff@sophsoft.com.

Classification of discovered and reported bugs can be beneficial in determining priorities for development resources. High priority bugs need to be addressed as soon as reasonably possible, while suggestions should probably wait. In my experience, most companies, including Microsoft, use four or five classifications, similar to the following:

1 – Severe error – This includes program crashes, errors which damage data or interfere with the operating system, or anything that prevents further testing. These types of bugs are known as “showstoppers” and, though hopefully infrequent, are urgent issues.

2 – Functionality impaired – This includes any type of bug in which the program does not perform as expected and has a detrimental impact on the ability to use the program.

3 – Minor issue – This includes bugs which are cosmetic, such as spelling errors, or those that do not significantly affect the ability to use the program efficiently.

4 – Suggestion – This includes any suggestions for unplanned and non-essential features. These should generally be implemented only later in development, and changes to address these issues should be reflected in updated design documentation.

5 – Postponed – This includes any suggestions or, in some cases, problematic bug fixes that are not likely to be in the current release, but that should nevertheless be retained for future consideration.

In parallel to the above classification of reported bugs, we also use a similar scale for prioritizing product features:

1 – Essential – These are features without which the program cannot ship.

2 – Important – These are features which should be in the program.

3 – Desired – These are features that we want to be implemented, if possible.

4 – Extra – These are features that would be nice to incorporate, if there is time.

5 – Wish list – These are features that will probably have to wait for a later version.

Despite the fact that the two classification systems are similar, we generally prioritize bug fixes (levels 1-3) before implementation of new features. The only times in which a bug fix is postponed temporarily are when a feature implementation is incomplete and needs to be finished first, when an upcoming feature will obviate the bug, or when the bug is so minor that delaying the fix has no discernible adverse impact on the product.

As you begin development, it is important that you devise a method for tracking and managing both feature development and bug reports. Throughout the process, priorities invariably change and bugs will be uncovered through testing, and if you do development work for somebody else, new feature requests and change orders need to be handled. A good system will help keep the development in perspective and prevent issues from getting forgotten.

For tracking bugs and features, there are a number of software options, some of which are very expensive. Some ASP members produce reasonable software for tracking product features and bugs, so check the download site for some software designed for that purpose. I personally prefer a simple solution, so my feature and bug tracking is done with WordPad and one physical file folder. Another intriguing approach, which I learned from the member newsgroups, is to use a deck of index cards, one for each issue, and remove items physically when they have been resolved.

As soon as program development commences, you can, and should, begin testing the software in various ways. Once you have adopted the attitude that producing a quality product is paramount, incorporating the testing process into development is the single biggest step that you can take to achieve the desired result.

At this point, it will be very useful to define some terms. The one known to most software users is “bug”, which refers to an unexpected problem with software (as opposed to a design decision or “feature”). The term comes from the early days of electronics, but discovery of the first computer bug is attributed to Grace Hopper, who, in 1945, extracted a moth from a relay that had apparently beaten it to death. She pasted it into a logbook, and now this original bug resides at the Smithsonian Institution, still attached to that same page.

In searching for bugs, there are, essentially, two different kinds of testing, known as “black box” and “glass box” (or “white box”) testing. Black box testing, as the name implies, is testing which treats the software as a black box, checking only the output based on the provided input. Glass box testing, on the other hand, is testing which makes use of special knowledge or access to the inner workings of the software.

Black box testing is the most common form of software testing, and is what most people mean when they use the word, “testing”. One simply runs the software to see if it does what it is supposed to do. If the program does something incorrect or looks wrong, then there is probably a bug, and with some luck, the developer will get a defect report and be able to fix the problem.

Glass box testing methods can significantly decrease the number of software defects and improve software quality. This is especially true in development environments in which glass box testing has not been used previously. There are a number of glass box techniques that can be utilized as development is underway to help identify and eliminate bugs before they reach the black box stage.

Another couple of terms that are commonly used with regard to testing are “internal” and “external”. Internal testing is any form of testing in which the process is conducted and controlled by the developer, without distribution of the product in any form. External testing is any form of testing in which the product is distributed, whether widely or in a limited way, for testing purposes.

The quality process begins with the initial product concept. From its inception, the development of a product must be planned. Basically, if you do not know where you are going, it is unlikely you will get there. If you do not know the route, you may get lost along the way and probably will not get to your destination efficiently. Neither situation is conducive to quality software.

The tools for helping software development maintain the correct direction throughout a project are design documents. Design documents should do exactly what they say: document the design of a program. The actual contents and approach vary greatly depending on the type of software and the author(s) of the document, but they should provide a written roadmap for the project.

One very important aspect of design documents, and a reason why even lone developers should use them, is that they provide the opportunity to discover flaws in the planning before time is wasted on unproductive activities or projects. When one is forced to document the design, one must think through all aspects of the project. If the project is not viable, for whatever reason, it is better to find out early, rather than too late.

From a quality perspective, the most important aspect of a design document is what is known as the technical specification, which describes in detail the technical aspects of the project, including the program interface and any complex procedures. During the documentation of these items, potential pitfalls can be identified and either corrected by a design change or noted for future consideration.

An interesting approach to interface design specification is to write the interface portion of the help file first, before the programming begins. Again, this forces a developer to consider any issues with the design and usability before time is spent on implementation. Simply, if a procedure is too complex to be easily described in a help file for the end user, then the interface is probably overly complicated to use.

The main goal of design documentation is to determine what the end product should be. This is a “living document”, which means that it should be revisited regularly and, if necessary, changed to reflect any new direction. The design document can help keep the project on track and avoid feature creep or inconsistent functionality. Ultimately, it should be the basis of your software testing plan.

[This article was originally published in the November 2002 issue of ASPects.]

It can be known as Quality Assurance (QA), Quality Control (QC), ISO9000, or simply “testing”. Regardless of the name used and the particulars of the implementation, it is essential to have a process for making sure that your products exhibit the highest possible degree of quality.

In my previous article, I mentioned the need for testing three times in the same sentence; such is the importance of this topic. The creation of a quality product begins and ends with testing, so it is worthwhile to discuss the different types of testing that can be incorporated into the development process. Although my focus is on development of computer software, much of the process applies to services and other kinds of projects.

The quality process can range from having a fully documented and certified procedure for assuring quality, complete with personnel devoted specifically to maintenance of the process and rigorous testing of the product, down to the lone developer who compiles an executable, runs it once, and then distributes it. It is easy to tend towards the lower end of this scale, through either ignorance or laziness. As independent developers with limited resources, it is probably unrealistic to be at the absolute top of the scale, either. However, we must strive to find a comfort level as close to the top as possible.

When I wrote my first line of code, more than two decades ago, nobody told me about quality, nor even about bugs. The only test was whether the program did what I wanted when the user did what I had anticipated. This changed rather quickly, though, when I got to sit in a computer lab and watch high school students deliberately try to break our games in order to ridicule our capabilities. The challenge became finding ways to make our programs more resilient to these attacks. Though the motivations may be different, today our software is similarly used and abused by our potential customers.