Bug Prevention with Code Generation: A J2EE Case Study

Imagine you are in a renowned restaurant. You find a bug in your dinner. You ask for the restaurant manager. As politely as you can, you explain that you'll never show up again in that restaurant.

Imagine you're in another renowned restaurant. You don't find any bug. Actually, there are bugs in the kitchen, but you don't know it, the restaurant manager doesn't know it, the kitchen personnel doesn't know it. Does this second scenario make you feel more comfortable, with respect to your dinner?

What if you were a restaurant manager and a patron told you that there were bugs hiding in the kitchen? The customer claims he can both prove the existence of these bugs and remove them. You may be skeptical, but you are curious enough to let the customer go ahead. When the customer provides evidence that there are bugs in the kitchen, would you quickly dismiss him, or invite him to explain how his bug prevention remedy works? Similarly, in software engineering, we should be careful not to ignore or reject new tools and techniques that can help us build better software with fewer bugs.

This article describes how the alpha version of a new tool discovered four harmless bugs in a leading J2EE sample application, by means of a technique that has only recently started to receive the attention it deserves from the software engineering community: code generation.

Code Generation

Traditionally, "code generation" referred to the compiler stage in which source code is actually translated into assembly language. The current meaning broadens the scope of the term to encompass the production of files containing what is normally considered source code.

Once upon a time, assembler was the source code. Then compilers turned it into generatable code. The new wave of code generation reapplies, in the large, that same conceptual jump: Java, C#, PHP, COBOL, Pascal, XML, HTML, JSP, ASP, Fortran, CORBA IDL, assembler, Perl, Python, Ruby, and so forth can all be considered generatable languages, provided that you have appropriate tools to produce them. This is conceptually similar to compilers that enabled the automatic generation of assembly code.

The best site on the Internet to learn about this new wave of tools is the Code Generation Network (CGN), whose editor is Jack Herrington, author of the book Code Generation in Action (CGiA). There you can find a database of available generators, a number of interviews with code generation experts, a list of recommended books on this fascinating subject, and more. Among other articles, Jack Herrington also published "Code-Generation Techniques for Java" on ONJava.com. With CGiA, you can also learn to write your own generators, if you want to. That's a powerful technique that, once mastered, pops up in your mind very often to solve repetitive problems. Quoting from CGN, pragmatic engineers can get higher quality, consistency, productivity, and abstraction using code generation.

Most software engineers have at some time in their careers handcrafted their own specialized code generators, even without naming their tools "generators." What is new is that these tools are leaving their childhood behind and growing up to become serious professional tools. There are even pioneers who deliver no-frills custom generators as a service.

From the Code Generator Building Lab: Prototype of a J2EE 1.4 DAO Generator

Although the concepts in this article apply broadly, they are based on the author's experience with code generation at Somusar. For us, J2EE appeared to be the most challenging field to begin applying the new tools and skills, in particular, because J2EE 1.4 was getting ripe. Setting up a J2EE lab using Sun Microsystems' SDK is easy, as the SDK's software and documentation are really great. Note that the target of code generators is source-but-no-longer-source code. You don't need huge machines to produce that code. After all, it's all text files. So off we go, starting from the database layer and choosing the Data Access Object (DAO) pattern as the first target.

Why DAO? Because it provided a good functioning model to cast the code generator mold. "What is a software mold?" Have you ever wondered how all of the different plastic and metal objects that you use hundreds of times each day are produced? A lot of them are pressed using so-called molding machines. "But you can't mold software!" Right: you can't mold all software. But you can certainly mold a good portion of it. All in all, the mechanical properties of software are much softer than those of metal or plastic.

Writing a code generator basically requires three things:

A sample of what you intend to generate: This should be a carefully hand-written sample of code that has been tested to work in your deployment environment. Generators don't invent code: they can only learn from you how to write the code that you can write by hand. When you choose the code target, try, if you can, to select a sample written by engineers who are known to be the best at that particular type of target.

A definition of the input format to the generator: This should ideally be concise and self-explaining. You don't want to spend more time programming the input to the generator than you would do programming your software by hand; otherwise, where is the return on investment?

A careful testing strategy: Not only do you need to test what happens when you change the input to the generator, but you also need to test what happens when you change the code generator. Fortunately, this is comparatively easy, given a sample of the target code: you run the generator and compare (using diff) the generated code against the target code. Then you make a copy of your generator, a copy of what it produced, and take one more step in augmenting or refining the generator. Then you again check the newly produced code against the target code and against the code generated previously, to ensure that it got better and didn't get worse. And so forth.
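The compare step of that testing strategy can be sketched in a few lines of Java. This is only a minimal illustration, not the tooling used in the lab: the class name, the method name, and the sample lines are hypothetical, and the comparison is a naive line-by-line one rather than a true diff, which would also realign after inserted or deleted lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GeneratedCodeCheck {

    /** Returns one message for every line position where target and generated differ. */
    static List<String> differences(List<String> target, List<String> generated) {
        List<String> diffs = new ArrayList<>();
        int max = Math.max(target.size(), generated.size());
        for (int i = 0; i < max; i++) {
            // A shorter file is reported as missing lines rather than silently ignored.
            String expected = i < target.size() ? target.get(i) : "<missing>";
            String actual = i < generated.size() ? generated.get(i) : "<missing>";
            if (!expected.equals(actual)) {
                diffs.add("line " + (i + 1) + ": expected [" + expected
                        + "] but got [" + actual + "]");
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Hypothetical hand-written target and generator output.
        List<String> target = Arrays.asList(
                "String departureTime = rs.getString(1);",
                "String arrivalTime = rs.getString(2);");
        List<String> generated = Arrays.asList(
                "String departureTime = rs.getString(1);",
                "String arrivalTime = rs.getString(3);");
        differences(target, generated).forEach(System.out::println);
    }
}
```

An empty result means the generator reproduces the target exactly. Running the same check against the output of the previous generator version shows whether the latest step made things better or worse.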

Figure 2 shows two examples of the generation scheme for the DAO code generator:

Figure 2. The process of DAO code generation

Lab Report: Four Dormant Bugs Found. Hand-Carved.

The steps that you take when you extend your code generator must be small. With every step, you should have your code generator produce a new, small fragment of code. This simplifies the comparison between that new fragment and the corresponding fragment in the model. Obviously, in most cases, you will find errors in your generator. The target code model is right. It must be right.

Now, wait. What is that? Where does that difference come from? Wasn't the member sequence in the constructor departureTime followed by arrivalTime? Can you please check that? Also, check the sequence in the database table. That's funny. Smells like a bug. It can't be. Not in this source from that company. They surely haven't published the code without intensively testing it.

Hey, there it is again! The SELECT fetches departureTime followed by arrivalTime, but the assignments are performed in the opposite order. Can you please run and test the application? As unbelievable as it seems, you should get departure times in place of arrival times and vice versa. Check the on-screen results against the data in the DB. No? Your on-screen data are correct? That's relieving, on one hand. I was sure that that company wouldn't publish untested code. On the other hand, it's weird. The sequence in the code is wrong. Let me check the diff again. There are more differences. Let's debug it on paper. Let's print out the original code.

Bug Analysis: "They're Not Poisonous."

SELECT_TRANSPORTATION_QUERY_STR fetches departuretime, followed by arrivaltime. The fetched values get assigned to the temporary strings arrivalTime and departureTime in the wrong order. This is the first bug. Then these strings are passed on to the Transportation constructor as arrivalTime and departureTime, but the constructor expects departureTime and arrivalTime. This is the second bug. Of course, as they are both strings, the compiler can't detect the error. Later on in the code, exactly the same bugs are replicated. That makes four bugs. But the application behaves wonderfully. How can it be?

Wait. Sure. I got it. One bug neutralizes the other. The departure time from the DB gets assigned to arrivalTime, but then arrivalTime is passed as a parameter where departureTime is expected. So the constructor receives the fetched departure time where it expects it, and the application works fine.
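The way the two swaps cancel out can be reproduced in a small, self-contained Java sketch. It is not the JABD source: the Transportation class here is a hypothetical two-field stand-in, and a string array plays the role of the row fetched by the query.

```java
public class SwapDemo {

    /** Hypothetical stand-in for the real class; only the two time fields are modeled. */
    static final class Transportation {
        final String departureTime;
        final String arrivalTime;

        // The constructor expects departureTime first, then arrivalTime.
        Transportation(String departureTime, String arrivalTime) {
            this.departureTime = departureTime;
            this.arrivalTime = arrivalTime;
        }
    }

    /** row[0] is the departure time and row[1] the arrival time, as in the SELECT. */
    static Transportation load(String[] row) {
        // Bug 1: the departure time from the first column lands in arrivalTime.
        String arrivalTime = row[0];
        String departureTime = row[1];
        // Bug 2: the arguments are passed in the wrong order, which undoes bug 1.
        return new Transportation(arrivalTime, departureTime);
    }

    public static void main(String[] args) {
        Transportation t = load(new String[] {"08:00", "11:30"});
        // Prints: departure=08:00 arrival=11:30
        System.out.println("departure=" + t.departureTime + " arrival=" + t.arrivalTime);
    }
}
```

Because both values are of type String, the compiler accepts either order. Fixing only one of the two swaps would make the bug visible on screen.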

Figure 3 shows the two method execution flows. Seen from the outside, the two flows appear identical. Inside, things are different:

Figure 3. Hidden bugs contribute to method execution

Hand-Coding J2EE Applications Is Hard. Even Where the Sun Shines.

The sample code for the DAO code generator comes from the best possible J2EE 1.4 source. It is part of the J2EE 1.4 Developer Release published by Sun Microsystems on java.sun.com on November 24, 2003. The DAO samples are taken from the new blueprint application called "Java[tm] Adventure Builder Demo 1.0 Early Access 4" (JABD). More specifically, the source file containing the dormant bugs is PointbaseCatalogDAO.java.

The purpose of the new DAO code generator is to enable automatic generation of all Java files from the JABD application that contain database access statements. The rationale for this is that the code in these files is conceptually simple, although structurally complex. In other words, mapping database data to Java properties and vice versa is not rocket science, but it does require constant attention, due to the number of software layers involved. In many cases, the code scans the list of database columns or the list of class properties, and applies some very simple micropattern of code to each item in those lists.
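The idea of applying a micropattern to each item in a list can be illustrated with a toy generator. Everything in this sketch is hypothetical (class, method, and table names) and it is not the Somusar generator. It only shows that when a single ordered list of properties drives the SELECT, the column reads, and the constructor call, the three fragments cannot drift out of order.

```java
import java.util.Arrays;
import java.util.List;

public class MicropatternDemo {

    /** Emits the query text, listing the columns in property order. */
    static String generateSelect(String table, List<String> properties) {
        return "SELECT " + String.join(", ", properties) + " FROM " + table;
    }

    /** Emits one read per property, in the same order as the SELECT list. */
    static String generateReads(List<String> properties) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < properties.size(); i++) {
            // JDBC column indexes are 1-based.
            out.append("String ").append(properties.get(i))
               .append(" = rs.getString(").append(i + 1).append(");\n");
        }
        return out.toString();
    }

    /** Emits the constructor call, again in property order. */
    static String generateConstructorCall(String className, List<String> properties) {
        return "return new " + className + "(" + String.join(", ", properties) + ");";
    }

    public static void main(String[] args) {
        // The single source of truth for ordering (assumed input format).
        List<String> properties = Arrays.asList("departureTime", "arrivalTime");
        System.out.println(generateSelect("transportation", properties));
        System.out.print(generateReads(properties));
        System.out.println(generateConstructorCall("Transportation", properties));
    }
}
```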

There are two ways to provide this constant attention on repetitive code fragments: one way is to have J2EE developers focus on these tedious details and write that code by hand. As this case study demonstrates, it is difficult to avoid bugs when hand-writing this type of code, although good testing can verify the proper behavior of the application. The other way is to have powerful and flexible automatic tools to build the plumbing parts of J2EE applications, and free developers to perform more challenging development tasks. There are plenty of challenging tasks when developing J2EE applications. The help provided by these tools becomes particularly relevant when you extend, restructure, or maintain your applications; for instance, when you add a new property to your class model. Then you have to add that property in a number of code spots. Modern code generation tools can reliably do that job in seconds.

The JABD application is a comparatively simple application, at least from a size perspective. It clearly demonstrates and documents how to use J2EE 1.4 technologies. In particular, it contains three Pointbase*DAO.java files. The DAO code generator currently produces 95% of this Java code (including comments and blank lines) starting from a concise declarative textual input. The remaining 5% is one business method with real logic that the code generator cannot invent. That code must be written by hand, and the DAO code generator provides the means to insulate and preserve hand-written code across successive re-generation runs.
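The article does not say how the generator preserves hand-written code. One common technique, shown here purely as an assumption, is to fence the hand-written part between marker comments and to carry it over on each run. The marker strings and all names in this sketch are hypothetical, and it handles a single protected region only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProtectedRegionDemo {

    // Hypothetical marker comments delimiting the hand-written region.
    static final String BEGIN = "// BEGIN HAND-WRITTEN";
    static final String END = "// END HAND-WRITTEN";

    /** Collects the lines between the markers in the previous version of the file. */
    static List<String> extractHandWritten(List<String> previous) {
        List<String> kept = new ArrayList<>();
        boolean inside = false;
        for (String line : previous) {
            if (line.trim().equals(END)) {
                inside = false;
            }
            if (inside) {
                kept.add(line);
            }
            if (line.trim().equals(BEGIN)) {
                inside = true;
            }
        }
        return kept;
    }

    /** Copies the regenerated lines, replacing the region body with the preserved one. */
    static List<String> merge(List<String> regenerated, List<String> handWritten) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : regenerated) {
            if (line.trim().equals(END)) {
                inside = false;
            }
            if (!inside) {
                out.add(line);
            }
            if (line.trim().equals(BEGIN)) {
                out.addAll(handWritten);
                inside = true;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> previous = Arrays.asList(
                "class CatalogDAO {", BEGIN, "  doRealWork();", END, "}");
        List<String> regenerated = Arrays.asList(
                "class CatalogDAO {", "  // new generated member", BEGIN, "  // TODO", END, "}");
        merge(regenerated, extractHandWritten(previous)).forEach(System.out::println);
    }
}
```

With this scheme, the generator is free to rewrite everything outside the markers, while the business method survives each run untouched.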

Where Can I Get that J2EE Generator?

The prototype DAO code generator that helped discover the bugs is the first in a set of J2EE generators modeled on the JABD application. Somusar's approach with respect to code generation is to derive each new generator from existing code samples, and reproduce those samples with the new code generator starting from a concise declarative description of the desired code. Users of the code generator can then provide their specific concise input and have the generator produce and re-produce that particular type of code for their new software entities. This is true not only for Java code, but for any type of code that provides a significant degree of redundancy, and for documentation, as well.

As an example, a multi-tier code generator can produce SQL create table scripts, XML metadata descriptors, JSP files, multi-class Java implementations of application components, HTML documentation, and more. In particular, Somusar's code generators for large applications can produce all of these files for each software entity starting from one compact multi-tier entity description file. If your application model contains more than 15 or 20 entities, you might want to do the math and consider how many files you could generate for your J2EE implementation of that model.

The JABD application provides a complete set of samples, so expect a number of J2EE code generators to be announced in the near future. In a way, you can think of these specialized code generators as if they were passionate plumbing code writers, maniacal web designers, or pedantic technical writers. They love repetitive work.

To view the code generated by the DAO generator prototype and to try that generator on your computer, please refer to Code Generation Somusar Style (Chapter 3, "J2EE Generators"). While you are there, you can also view the POJO and database schema SQL files produced automatically by the corresponding generators.

Conclusion

This article discussed how repetitive work can lead to sneaky coding errors, even in the best J2EE lab in the world. This is true for a carefully hand-written sample application, and is even more true for large real-world applications. Smart code generators can help avoid those coding errors, and greatly increase development teams' productivity by automating repetitive work and thus freeing developers' valuable time for more complex tasks.

The larger the application where you apply these code generators, the higher the amount of benefit and relief that you can derive from them. Think of a J2EE application with a few hundred entities; let's say 300 entities, with an average of five properties per entity, an average of four Java classes per entity, plus one database table, and maybe three user-interface dialogs. Do the math: 300 * (4 + 1 + 3) * 5 = 12,000 code fragments that manipulate those data items. Actually, many more than that, as each class will manipulate them more than once. In most cases, those fragments will move data across application tiers. That's roughly comparable to drilling holes when you build a house. If your task consisted of drilling 12,000 holes, would you prefer a manual drill, or would you rather use its automated equivalent?

Francesco Aliverti-Piuri has more than 20 years of varied software experience on several platforms, technologies, and architectures.