This was a debugging code to print log message that was meant for tracing what was going wrong in the program. This program crashed with a NullPointerException because the code did not check for null before accessing dataType or getCategory()! In other words, the code that was meant for debugging introduced a bug, which prompted me to write about this topic in this post.

It is a good practice to save debug information and trace information in log files. When the application crashes in customer site, if log files contain all relevant information, it is possible to trace the cause of the problem by just analyzing the log files.

A major problem in logging/tracing messages is that it generates huge data (sometimes, in the order of few GBs!) and it is easy to get lost in details. A practical approach is to introduce multiple trace levels which is useful for trouble shooting and debugging. In case of multiple threads, there needs to be some way of matching trace messages originating from a given thread of control (and process as well). Otherwise, if no such identification and time stamping of trace messages is available, it will be difficult to make use of an exceptionally large trace file. For this reason, some projects develop custom tools for processing the log files and report the problems! Also note that log files need to be removed periodically, or if the size grows more than the “allowed” size. I know about an application which used to crash often because the log files became so huge that no more data could be written to it.

Sometimes, when debugging code is added to a working system to understand how it is working, the debug code can introduce new bugs. For example, we can add test probes for getting intermediate values at fixed locations in code for diagnostic purposes. This process of introducing test probes can introduce subtle timing errors, particularly in embedded systems code where response time is critical. In other words, the process of examining the system can alter the system!

Debug code can also introduce security issues. In 1988, when Internet was in its early stages, a worm affected around 5% of computers connected to the Internet. The worm affected only Sun and VAX machines. It collected host, network and user information and then based on that information, it broke into other machines. The affected machines were overloaded with unknown processes and killing the processes did not help. Rebooting also didn’t solve the problems. It was later found that the worm exploited three different vulnerabilities in Unix systems: buffer overrun vulnerability in fingerd, debug mode of sendmail program, and accounts with weak (or no) passwords. Our interest here is about the attack on sendmail program by exploiting debug code. The worm would send a DEBUG command to sendmail program and fork a shell (sh); it would use that shell to download and compile new worms and thus spread the worm. Why did sendmail allow DEBUG code? It was provided to allow testers to verify that mail is arriving at a particular site without the need to invoke the address resolution routines! For more details about this worm, check this article online: “Crisis and aftermath” (E. H. Spafford, Communications of the ACM, June 1989).