Code-Generation Techniques for Java

Working in Java means either writing a little bit of complex code or writing
a lot of gruntwork code. J2EE is a prime example; implementing the persistence
for a single database table takes five classes and two interfaces using EJBs,
and almost all of the classes are clerical work. We have to write them, but we
don't have to do it by hand. Code-generation techniques can make building high-quality EJB code a breeze.

Will code generation revolutionize computing and change the way we develop
forever? Yes, but it will take a while. Software engineering has always
concentrated on increasing our level of abstraction. In the beginning, we hand-wrote machine code; then we created assemblers and macro assemblers. After that,
we created Fortran and compiled our code into assembler. Then came structured
programming, and after that, object-oriented programming. With each step, we
have increased our level of abstraction and, thus, our ability to create higher
quality applications with more functionality, more quickly.

What is Code Generation?

What is this panacea for developers called code generation? Code
generation is the technique of writing and using programs that build
application and system code. To understand code generation, you need to
understand what goes in and what comes out. What goes in is the design for the
code in a declarative form: "I need two tables named book and
author with these fields." What comes out is one or more target
files. It could be Java code, deployment descriptors, SQL, documentation, or
any type of controlled output.

Figure 1 shows the basic form of today's code generators:

Figure 1. The process of code generation

The components can change slightly between the different models, but the song
remains the same. The code generator reads in the design, then uses a set of
templates to build output code that implements the design. The separation
between code generation logic in the generator and output formatting in the
templates is akin to the separation between business logic and user interfaces
in web applications.
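
As a rough illustration of that split, here is a minimal sketch in Java. It is
not taken from any particular generator: the class name TemplateGeneratorSketch,
the ${...} placeholder syntax, and the field-name-to-type map used as the
"design" are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a declarative table design goes in, Java source comes out.
class TemplateGeneratorSketch {

    // Output formatting lives in the templates, apart from the generator logic.
    static final String CLASS_TEMPLATE = "public class ${className} {\n${fields}}\n";
    static final String FIELD_TEMPLATE = "    private ${type} ${name};\n";

    // Replace every ${key} placeholder in the template with its value.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            result = result.replace("${" + e.getKey() + "}", e.getValue());
        }
        return result;
    }

    // Turn a table name such as "book" into a class name such as "Book".
    static String capitalize(String s) {
        return Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Generator logic: walk the design (field name -> Java type) and apply the templates.
    static String generate(String tableName, Map<String, String> fields) {
        StringBuilder fieldSource = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            fieldSource.append(fill(FIELD_TEMPLATE,
                    Map.of("type", f.getValue(), "name", f.getKey())));
        }
        return fill(CLASS_TEMPLATE,
                Map.of("className", capitalize(tableName), "fields", fieldSource.toString()));
    }

    public static void main(String[] args) {
        // The design: a "book" table with two fields, kept in declaration order.
        Map<String, String> bookFields = new LinkedHashMap<>();
        bookFields.put("title", "String");
        bookFields.put("authorId", "int");
        System.out.print(generate("book", bookFields));
    }
}
```

Running the sketch prints a Book class with two private fields. Changing the
layout of the output means editing only the two template strings; changing what
gets generated means editing only the design passed to generate.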

Code generators are not wizards. Wizards are passive generators. They write
code once, and then it's up to you to maintain the code forever. Code generators
are active. They continually maintain code over multiple generation cycles. As
the designs change, the input to the generator changes, and new code is created
to match the design. This is a key advantage: when have you been on a
project where the requirements don't change?

What Are the Benefits?

Before we get into specific examples of code generators for Java, let's make
sure we have the end goals firmly in mind. One way to approach this is to think about the qualities we want in an optimal generator.

Quality: We want the output code to be at least as good as
what we would have written by hand. Thankfully, the template-based approach of
today's generators builds code that is easy to read and debug. Because of the
active nature of the generator, bugs found in the output code can be fixed in
the template. Code can then be re-generated to fix that bug across the
board.

Consistency: The code should use consistent class, method,
and argument names. This is also an area where generators excel because, after
all, this is a program writing your code.

Productivity: It should be faster to generate the code than to
write it by hand. This is the first benefit that most people think of when it
comes to generation. Strangely, you may not achieve this on the first
generation cycle. Thankfully, the real productivity value comes later, as you
re-generate the code base to match changing requirements; at this point you
will blow the hand-coding process out of the water in terms of
productivity.

Abstraction: We should be able to specify the design in an
abstract form, free of implementation details. That way we can re-target the
generator at a later date if we want to move to another technology
platform.
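
To make the re-targeting idea concrete, here is a small sketch in which one
abstract design feeds two different targets. Everything in it is an assumption
made for illustration: the abstract type names "string" and "integer", the
Target interface, and the two type mappings are hypothetical, not part of any
real generator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one abstract design, two interchangeable generation targets.
class RetargetSketch {

    // A target turns the same abstract design into one kind of output.
    interface Target {
        String render(String table, Map<String, String> fields);
    }

    // Look up the concrete type for an abstract one, failing loudly if it is unknown.
    static String concrete(Map<String, String> types, String abstractType) {
        String t = types.get(abstractType);
        if (t == null) {
            throw new IllegalArgumentException("Unknown abstract type: " + abstractType);
        }
        return t;
    }

    // Target 1: SQL DDL for the table.
    static final Target SQL = (table, fields) -> {
        Map<String, String> types = Map.of("string", "VARCHAR(255)", "integer", "INTEGER");
        StringBuilder cols = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (cols.length() > 0) cols.append(", ");
            cols.append(f.getKey()).append(' ').append(concrete(types, f.getValue()));
        }
        return "CREATE TABLE " + table + " (" + cols + ");";
    };

    // Target 2: a bare Java class with one field per column.
    static final Target JAVA = (table, fields) -> {
        Map<String, String> types = Map.of("string", "String", "integer", "int");
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            body.append(' ').append(concrete(types, f.getValue()))
                .append(' ').append(f.getKey()).append(';');
        }
        String className = Character.toUpperCase(table.charAt(0)) + table.substring(1);
        return "class " + className + " {" + body + " }";
    };

    public static void main(String[] args) {
        // The design names no platform-specific types at all.
        Map<String, String> design = new LinkedHashMap<>();
        design.put("title", "string");
        design.put("pages", "integer");
        System.out.println(SQL.render("book", design));
        System.out.println(JAVA.render("book", design));
    }
}
```

Because the design only mentions abstract types, moving to another platform in
this sketch would mean writing a third Target rather than touching the design.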

Now that we understand the benefits that we want, and how those are
addressed by code generation techniques in general, we should understand what we
expect to use code generation for in the Java context.

What We Expect the Generator to Handle

The output files of a generator are called the target files. There are
several generation targets within the Java enterprise application stack.
Figure 2 shows the stack:

Figure 2. J2EE generation targets

All four of these elements of the stack are potential generation targets,
but some are more common than others. From the bottom to the top:

Database: Given Java's object-persistence approaches to
database work, there isn't much call for direct generation of SQL for database
code or stored procedures. However, if this is your architecture, you can use
the custom approaches listed below to generate the required code.

Persistence: Database persistence code is the most common
generation target in the Java environment. All of the generators I refer to in
the sections that follow build persistence code. Why? It's generally redundant
grunt code. Generated database-persistence code also is an excellent foundation
for a solid application, because it is consistent and relatively bug-free.

Business Logic and User Interfaces: Only Model-Driven Architecture (MDA) and
custom generators build production business logic and user interfaces. The
critical factor in generating this code is building on top of a stable,
predictable platform, ideally a generated persistence layer.

It's obvious that code generation is powerful and can build useful code, but
does it have drawbacks?

What to Look Out For

Code generation is not without pitfalls and detractors. One of the most
common complaints is that code that was once active is now being hand-modified
and thus cannot be re-generated. One trick is never to check the generated
source into the code base. This ensures that engineers will always be required
to use the generator as part of the compilation process. This keeps the
generator alive and keeps engineers from modifying the output code.
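
One way to support that discipline is to have the generator write its output
into a directory that exists only as a build product, and to stamp every file
with a warning. The sketch below assumes a hypothetical build/generated
directory that is excluded from version control; the header wording and the
class name GeneratedOutputSketch are likewise made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: write generated source into a build-only directory.
class GeneratedOutputSketch {

    // Stamped on every file so nobody mistakes it for hand-written code.
    static final String HEADER =
        "// GENERATED FILE - DO NOT EDIT. Change the design or template and re-generate.\n";

    // Write (or overwrite) one generated file under outputDir and return its path.
    static Path write(Path outputDir, String fileName, String source) throws IOException {
        Files.createDirectories(outputDir);
        Path target = outputDir.resolve(fileName);
        // writeString replaces any existing file, so hand edits do not survive a rebuild.
        Files.writeString(target, HEADER + source);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: build/generated is ignored by version control and compiled by the build.
        Path out = write(Path.of("build", "generated"), "Book.java", "public class Book {}\n");
        System.out.println("Wrote " + out);
    }
}
```

Since each run overwrites the file, any hand modification is lost on the next
build, which pushes changes back to the design or the templates where they
belong.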

Another problem is that engineers who have been around since the
early 1990s liken code generators to Computer-Aided Software Engineering (CASE)
tools. The comparison is mistaken because code generators are developed bottom-up by engineers for engineers. CASE tools were developed as a top-down
replacement for programming languages and for engineers.

There are more reasons that engineers are skeptical about generation. Some
issues are technical and others are cultural. Sometimes it comes down to
simple job preservation. These tend to be situation-specific and boil down to
simple issues: trust, teamwork, and education. In order to successfully deploy
a generator, the team must trust the tool. They must feel that they have some
control over the tool and its implementation. They also need to know how the
tool is used both at a basic level (e.g., How do I run it?), and at a
specific level (e.g., How do I specify when I need a table with a compound
primary key?).

Perhaps the biggest drawback of code generation is that it falls to the
implementer of the tool to ensure successful adoption within the team. If you
put a copy of the code generator on the server and expect that people will
immediately understand its use and the compelling value, then you are sure to
fail. Education and empathy are key.

Given an understanding of which Java application components we can generate
and what we have to look out for, let's talk about the generators that build
them.