Bug Prevention with Code Generation: A J2EE Case Study

Imagine you are in a renowned restaurant. You find a bug in your dinner. You ask for the restaurant manager. As politely as you can, you explain that you'll never set foot in that restaurant again.

Imagine you're in another renowned restaurant. You don't find any bug. Actually, there are bugs in the kitchen, but you don't know it, the restaurant manager doesn't know it, and the kitchen staff don't know it. Does this second scenario make you feel any more comfortable about your dinner?

What if you were a restaurant manager and a patron told you that there are bugs hiding in the kitchen? The customer claims he can both prove the existence of these bugs and remove them. You may be skeptical, but you are curious enough to let the customer go ahead. When the customer provides evidence that there are bugs in the kitchen, would you quickly dismiss him, or invite him to explain how his bug prevention remedy works? Similarly, in software engineering, we should be careful not to ignore or reject new tools and techniques that can help us build better software with fewer bugs.

This article describes how the alpha version of a new tool discovered four harmless bugs in a leading J2EE sample application by means of code generation, a technique that has only recently started to receive the attention it deserves from the software engineering community.

Code Generation

Traditionally, "code generation" referred to the compiler stage in which source code is actually translated into assembly language. The current meaning broadens the scope of the term to encompass the production of files containing what is normally considered source code.

Once upon a time, assembler was the source code. Then compilers turned it into generatable code. The new wave of code generation reapplies, in the large, that same conceptual jump: Java, C#, PHP, COBOL, Pascal, XML, HTML, JSP, ASP, Fortran, CORBA IDL, assembler, Perl, Python, Ruby, and so forth can all be considered generatable languages, provided that you have appropriate tools to produce them. This is conceptually similar to compilers that enabled the automatic generation of assembly code.

The best site on the Internet to learn about this new wave of tools is the Code Generation Network (CGN), whose editor is Jack Herrington, author of the book Code Generation in Action (CGiA). There you can find a database of available generators, a number of interviews with code generation experts, a list of recommended books on this fascinating subject, and more. Among other articles, Jack Herrington also published "Code-Generation Techniques for Java" on ONJava.com. With CGiA, you can also learn to write your own generators, if you want to. That's a powerful technique that, once mastered, comes to mind very often as a way to solve repetitive problems. Quoting from CGN, pragmatic engineers can get higher quality, consistency, productivity, and abstraction using code generation.

Most software engineers have at some time in their careers handcrafted their own specialized code generators, even without calling their tools "generators." What is new is that these tools are leaving their childhood behind and growing up to become serious professional tools. There are even pioneers who deliver no-frills custom generators as a service.

From the Code Generator Building Lab: Prototype of a J2EE 1.4 DAO Generator

Although the concepts in this article apply broadly, they are based on the author's experience with code generation at Somusar. For us, J2EE appeared to be the most challenging field in which to begin applying the new tools and skills, in particular because J2EE 1.4 was maturing. Setting up a J2EE lab using Sun Microsystems' SDK is easy, as the SDK's software and documentation are excellent. Note that the target of code generators is source-but-no-longer-source code. You don't need huge machines to produce that code. After all, it's all text files. So off we go, starting from the database layer and choosing the Data Access Object (DAO) pattern as the first target.

Why DAO? Because it provided a good working model from which to cast the code generator mold. "What is a software mold?" Have you ever wondered how all of the different plastic and metal objects that you use hundreds of times each day are produced? A lot of them are pressed using so-called molding machines. "But you can't mold software!" Right: you can't mold all software. But you can certainly mold a good portion of it. After all, the mechanical properties of software are much softer than those of metal or plastic.

Writing a code generator basically requires three things:

A sample of what you intend to generate: This should be a carefully hand-written sample of code that has been tested to work in your deployment environment. Generators don't invent code: they can only learn from you how to write the code that you can write by hand. When you choose the target code, try, if you can, to select a sample written by the engineers known to be the best at that particular type of target.

A definition of the input format to the generator: This should ideally be concise and self-explanatory. You don't want to spend more time programming the input to the generator than you would spend programming your software by hand; otherwise, where is the return on investment?

A careful testing strategy: Not only do you need to test what happens when you change the input to the generator, but you also need to test what happens when you change the code generator. Fortunately, this is comparatively easy, given a sample of the target code: you run the generator and compare (using diff) the generated code against the target code. Then you make a copy of your generator, a copy of what it produced, and take one more step in augmenting or refining the generator. Then you again check the newly produced code against the target code and against the code generated previously, to ensure that it got better and didn't get worse. And so forth.
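The three ingredients above can be illustrated with a minimal sketch in Java. Everything in it is an assumption made for illustration: the class name Flight, the field types, the generate method, and the field-list input format are hypothetical and are not taken from the Somusar tool or from the sample application. An in-memory string comparison stands in for running diff on files.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DaoGeneratorSketch {

    // Hypothetical generator: turns a concise field list (name -> type)
    // into a small value-class fragment.
    static String generate(String className, Map<String, String> fields) {
        StringBuilder out = new StringBuilder();
        out.append("public class ").append(className).append(" {\n");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            out.append("    private ").append(field.getValue())
               .append(' ').append(field.getKey()).append(";\n");
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Ingredient 2: the input to the generator (assumed format).
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("departureTime", "String");
        fields.put("arrivalTime", "String");

        // Ingredient 1: the hand-written target sample (hypothetical).
        String target =
            "public class Flight {\n" +
            "    private String departureTime;\n" +
            "    private String arrivalTime;\n" +
            "}\n";

        // Ingredient 3: compare the generated code against the target.
        String generated = generate("Flight", fields);
        System.out.println(generated.equals(target) ? "MATCH" : "DIFFERENT");
    }
}
```

In a real lab the target would be a file from the sample application and the comparison would be done with diff, so that every difference is reported rather than a single yes or no.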

Figure 2 shows two examples of the generation scheme for the DAO code generator:

Figure 2. The process of DAO code generation

Lab Report: Four Dormant Bugs Found. Hand-Carved.

The steps that you take when you extend your code generator must be small. With every step, you should have your code generator produce a new, small fragment of code. This simplifies the comparison between that new fragment and the corresponding fragment in the model. Obviously, in most cases, you will find errors in your generator. The target code model is right. It must be right.
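Comparing one small fragment at a time can be sketched as follows. This is a naive line-by-line positional comparison, not the algorithm used by the diff tool, and the two fragments are invented for illustration; they do not reproduce the code of the sample application. The name diffLines is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentDiffSketch {

    // Returns one message for each line position at which the two fragments differ.
    static List<String> diffLines(String model, String generated) {
        String[] modelLines = model.split("\n", -1);
        String[] generatedLines = generated.split("\n", -1);
        List<String> diffs = new ArrayList<>();
        int max = Math.max(modelLines.length, generatedLines.length);
        for (int i = 0; i < max; i++) {
            String left = i < modelLines.length ? modelLines[i] : "<missing>";
            String right = i < generatedLines.length ? generatedLines[i] : "<missing>";
            if (!left.equals(right)) {
                diffs.add("line " + (i + 1) + ": model=[" + left
                        + "] generated=[" + right + "]");
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Hypothetical fragments: the same two assignments in a different order.
        String model =
            "this.departureTime = departureTime;\n" +
            "this.arrivalTime = arrivalTime;";
        String generated =
            "this.arrivalTime = arrivalTime;\n" +
            "this.departureTime = departureTime;";

        for (String diff : diffLines(model, generated)) {
            System.out.println(diff);
        }
    }
}
```

Because each step adds only a few lines, any reported difference points directly at the fragment that was just added.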

Now, wait. What is that? Where does that difference come from? Wasn't the member sequence in the constructor departureTime followed by arrivalTime? Can you please check that? Also, check the sequence in the database table. That's funny. Smells like a bug. It can't be. Not in this source from that company. They surely haven't published the code without intensively testing it.

Hey, there it is again! The SELECT fetches departureTime followed by arrivalTime, but the assignments are performed in the opposite order. Can you please run and test the application? As unbelievable as it seems, you should get departure times in place of arrival times and vice versa. Check the on-screen results against the data in the DB. No? Your on-screen data are correct? That's a relief, on one hand. I was sure that that company wouldn't publish untested code. On the other hand, it's weird. The sequence in the code is wrong. Let me check the diff again. There are more differences. Let's debug it on paper. Let's print out the original code.