Instrumenting Bytecode

Rather than instrumenting source code and then compiling it, I could
compile the original source code and then instrument the bytecode
that is produced. Depending on the exact transformation required,
this can be either much easier or much more difficult than source code
instrumentation. The main advantage of bytecode instrumentation is
that the code can be modified at run time, without having a compiler
available.

Although Java's bytecode format is relatively simple, I will certainly
want to use a Java library to parse and generate the bytecode (e.g., to
insulate myself from future changes in the Java class file format). I
chose to use Jakarta's Byte Code Engineering Library (BCEL), but I could
just as easily have picked CGLIB, ASM, or SERP.

Since I will be instrumenting bytecode in a number of different ways,
I'll begin by declaring a generic interface for instrumentation.
This will act as a simple framework for doing annotation-based
instrumentation. This framework will support the transformation of
classes and methods, based on annotations, so the interface will look
something like this:

ClassGen and MethodGen are BCEL classes
that implement the Builder pattern. That is, they provide methods
for mutating an otherwise immutable object, and for transforming
between the mutable and non-mutable representations.
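To make the mutable/immutable round trip concrete, here is a toy sketch that uses only the JDK. ClassSnapshot and ClassSnapshotGen are hypothetical names invented for this illustration; they are not BCEL types, and they only mimic the relationship between BCEL's JavaClass and ClassGen (where the round trip is new ClassGen(javaClass) followed by getJavaClass()).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Immutable snapshot; a toy stand-in for the role JavaClass plays in BCEL.
final class ClassSnapshot {
    private final String name;
    private final List<String> methodNames;

    ClassSnapshot(String name, List<String> methodNames) {
        this.name = name;
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.methodNames = Collections.unmodifiableList(new ArrayList<>(methodNames));
    }

    String name() { return name; }
    List<String> methodNames() { return methodNames; }
}

// Mutable counterpart; a toy stand-in for the role ClassGen plays in BCEL.
final class ClassSnapshotGen {
    private final String name;
    private final List<String> methodNames;

    // Start from an existing immutable snapshot.
    ClassSnapshotGen(ClassSnapshot original) {
        this.name = original.name();
        this.methodNames = new ArrayList<>(original.methodNames());
    }

    // Mutate the working copy; the original snapshot is untouched.
    ClassSnapshotGen addMethod(String methodName) {
        methodNames.add(methodName);
        return this;
    }

    // Convert back to the immutable representation.
    ClassSnapshot toSnapshot() {
        return new ClassSnapshot(name, methodNames);
    }
}
```

An instrumentor written against this shape would receive the mutable form, apply its changes, and let the framework convert the result back to the immutable form.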

Now I will need to write an implementation for this interface that
replaces @Status annotations with the appropriate calls
to StatusManager. As described previously, I want to
wrap these calls in a try-finally block. Note that for
this to work, the annotations must be declared with
@Retention(RetentionPolicy.CLASS), which instructs the
Java compiler to record them in the class file instead of discarding
them. Since I declared @Status as
@Retention(RetentionPolicy.SOURCE) earlier, I need to upgrade it.
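A minimal sketch of the upgraded declaration might look like the following. The single String element and the METHOD target are assumptions made for illustration, since the real element list of @Status is defined earlier in the series. RuntimeStatus and RetentionDemo are hypothetical names that exist only to contrast CLASS retention with RUNTIME retention.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Assumed shape of @Status: one String message, kept in the class file.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface Status {
    String value();
}

// Hypothetical comparison annotation, kept in the class file AND visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface RuntimeStatus {
    String value();
}

class RetentionDemo {
    @Status("Connecting to database")
    @RuntimeStatus("Connecting to database")
    static void connect() { }

    // Reports whether reflection can see the given annotation type on connect().
    static boolean visibleViaReflection(Class<? extends Annotation> type) {
        try {
            return RetentionDemo.class.getDeclaredMethod("connect").isAnnotationPresent(type);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("@Status visible: " + visibleViaReflection(Status.class));
        System.out.println("@RuntimeStatus visible: " + visibleViaReflection(RuntimeStatus.class));
    }
}
```

With CLASS retention the annotation survives compilation but is invisible to reflection, so a tool has to read it from the class file itself.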

As it turns out, in this case, instrumenting bytecode is significantly
more difficult than instrumenting source code. The reason is that
try-finally is a concept that exists in source code only!
The Java compiler transforms try-finally blocks into a
series of try-catch blocks and inserts calls to the
finally block before every return. Thus, I will need to
do something similar in order to add a try-finally block
to existing bytecode.

This is the bytecode that represents an ordinary method call, flanked
by StatusManager updates.

As you can see, I need to duplicate some instructions and add several
jumps and exception table records just to implement a single
try-finally. Luckily, BCEL's
InstructionList class makes this fairly easy.
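The duplication described above can be pictured at the source level. The following is a sketch in plain Java, not a bytecode listing: desugared() approximates what a current compiler emits for withFinally(), with the cleanup copied onto the normal path and onto a catch-all handler that rethrows. The enter() and leave() methods are hypothetical stand-ins for the StatusManager calls; they only record what happened.

```java
import java.util.ArrayList;
import java.util.List;

class TryFinallyDesugar {
    // Hypothetical stand-in for StatusManager: records calls in order.
    static final List<String> LOG = new ArrayList<>();
    static void enter(String message) { LOG.add("enter " + message); }
    static void leave() { LOG.add("leave"); }

    // What the programmer writes.
    static int withFinally(Runnable body) {
        enter("working");
        try {
            body.run();
            return 1;
        } finally {
            leave();
        }
    }

    // Roughly what the compiled code does: the cleanup appears once per exit path.
    static int desugared(Runnable body) {
        enter("working");
        try {
            body.run();
        } catch (Throwable t) {   // corresponds to a catch-all exception table entry
            leave();              // copy of the finally body on the exceptional path
            throw t;              // rethrow the original exception unchanged
        }
        leave();                  // copy of the finally body before the normal return
        return 1;
    }
}
```

Both methods produce the same sequence of enter and leave calls whether the body returns normally or throws, which is the property the bytecode instrumentation has to preserve.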

Instrumenting Bytecode at Compile Time

Now that I have an interface for modifying classes based on
annotations and a concrete implementation of this interface, the last
step is to write the actual framework that will call it. I'm actually
going to write a few of these frameworks, starting with one that
instruments all classes at compile time. Since this is going to
happen as part of my build process, I've decided to define an Ant task
for it. The declaration of the instrumentation target in my build.xml file
should look something like this:

To provide an implementation of this task, I need to define a
class that extends the org.apache.tools.ant.Task
class. Attributes and sub-elements of the task are passed in
through set and add method calls. The execute method is called to
perform the work of the task: in this case, to instrument the class
files given in the specified <fileset>.

The one problem with using BCEL for this purpose is that as of version
5.1, it does not support parsing annotations. I could load the
classes that I'm instrumenting and use reflection to view the
annotations. However, I would then have to use
RetentionPolicy.RUNTIME instead of
RetentionPolicy.CLASS. I would also be executing any static
initializers in those classes, which may load native libraries or
introduce other dependencies. Luckily, BCEL provides a plugin
mechanism that allows clients to parse bytecode attributes that it
does not natively support. I've written my own implementation of the
AttributeReader interface that knows how to parse the
RuntimeVisibleAnnotations and
RuntimeInvisibleAnnotations attributes that are inserted
into bytecode when annotations are present. Future versions of BCEL
should include this functionality without the need for a plugin.
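To illustrate that annotation data can be found without loading a class, here is a JDK-only sketch. It is not the BCEL AttributeReader plugin described above; it is a simpler swapped-in technique that walks the constant pool of a class file and checks whether either annotation attribute name occurs there. ConstantPoolScan is a hypothetical name, and the check is only a heuristic, because an ordinary string constant with the same text would also match.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

class ConstantPoolScan {
    // Returns every CONSTANT_Utf8 string found in the class file's constant pool.
    static Set<String> utf8Constants(byte[] classFile) {
        Set<String> result = new HashSet<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();              // minor_version
            in.readUnsignedShort();              // major_version
            int count = in.readUnsignedShort();  // entries are numbered 1..count-1
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: result.add(in.readUTF()); break;              // Utf8
                    case 3: case 4: in.skipBytes(4); break;               // Integer, Float
                    case 5: case 6: in.skipBytes(8); i++; break;          // Long, Double use two slots
                    case 7: case 8: case 16: case 19: case 20:
                        in.skipBytes(2); break;                           // Class, String, MethodType, Module, Package
                    case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;                           // member refs, NameAndType, (Invoke)Dynamic
                    case 15: in.skipBytes(3); break;                      // MethodHandle
                    default: throw new IllegalArgumentException("unknown constant tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Heuristic: true if the pool contains either annotation attribute name.
    static boolean mentionsAnnotations(byte[] classFile) {
        Set<String> names = utf8Constants(classFile);
        return names.contains("RuntimeVisibleAnnotations")
            || names.contains("RuntimeInvisibleAnnotations");
    }
}
```

The bytes could come from java.nio.file.Files.readAllBytes on a compiled .class file. Because the class is never loaded, no static initializer runs, and annotations with RetentionPolicy.CLASS are still detected.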

This compile-time bytecode instrumentation approach is shown in the code/02_compiletime directory of the sample code.

There are a number of disadvantages to this approach, however. For
one thing, I had to add an additional step to my build process. I
also cannot turn the instrumentation on or off based on command-line
settings or other information that is not available at compile time.
If both instrumented and non-instrumented code needs to be run in
a production setting, two separate .jars will need to be built and the
decision of which to use must be all or nothing.

Instrumenting Bytecode at Class Loading Time

A better solution would be to delay the instrumentation of the
bytecode until after it is loaded from the disk. This way, the
instrumented bytecode does not need to be stored. The start-up
performance of the application may suffer, but the trade-off is that
I can control what happens based on system properties or other
runtime configuration data.

Prior to Java 1.5, it was possible to do this type of class-file
manipulation with a custom class loader. However, the new
java.lang.instrument package added in Java 1.5 provides a
few additional tools. In particular, it defines the concept of a
ClassFileTransformer, which can be used to instrument a class
during the standard loading process.

To register our ClassFileTransformer at the appropriate time
(before any of our classes are loaded), I'll need to define a
premain method. Java will call this before the main
class is even loaded, and it is passed a reference to an
Instrumentation object. I will also need to add a
-javaagent option to the command line to tell Java about
the premain method. This argument takes the
full name of the agent class (which contains the
premain method) and an arbitrary string argument. In
this case, I'll pass the full name of my Instrumentor
class as the argument (this should all be on one line):

This approach of instrumenting bytecode at startup is shown in the
example code's /code/03_startup directory.
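Independent of the sample code, the following is a minimal sketch of how a premain method and a ClassFileTransformer fit together using only java.lang.instrument. StatusAgent, PrefixTransformer, and the status.instrument system property are hypothetical names. Unlike the setup described above, this sketch treats the agent argument as a package prefix rather than as an Instrumentor class name, and the rewriting step is an identity placeholder where a BCEL-based instrumentor would go.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.function.UnaryOperator;

// Rewrites only classes whose internal name starts with the given prefix.
class PrefixTransformer implements ClassFileTransformer {
    private final String prefix;                   // internal form, e.g. "com/example/"
    private final UnaryOperator<byte[]> rewriter;  // placeholder for the real instrumentation

    PrefixTransformer(String prefix, UnaryOperator<byte[]> rewriter) {
        this.prefix = prefix;
        this.rewriter = rewriter;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // Returning null tells the JVM to keep the original bytes.
        if (className == null || !className.startsWith(prefix)) {
            return null;
        }
        return rewriter.apply(classfileBuffer);
    }
}

// Hypothetical agent class; a packaged agent would normally declare it public
// in its own source file.
class StatusAgent {
    // Called by the JVM before the application's main class is loaded.
    public static void premain(String agentArgs, Instrumentation inst) {
        // Runtime switch: instrument only when -Dstatus.instrument=true is set.
        if (!Boolean.getBoolean("status.instrument")) {
            return;
        }
        if (agentArgs == null || agentArgs.isEmpty()) {
            return;  // no package prefix given, nothing to instrument
        }
        String prefix = agentArgs.replace('.', '/');
        // Identity rewriter: the bytes pass through unchanged in this sketch.
        inst.addTransformer(new PrefixTransformer(prefix, bytes -> bytes));
    }
}
```

Keeping the prefix check in the transformer leaves JDK and third-party classes untouched, and the system property check is one way to turn instrumentation on or off without rebuilding.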

Conclusion

In this article, I have replaced a hard-coded solution with one
that uses metaprogramming based on annotations and instrumentation.
Although I've eliminated the need for any extra steps in our build
process, my solution still has a number of limitations. In the next
installment, I will explore a completely different
implementation that uses thread sampling, and then combine
these two techniques to create a solution that gives the best
features of each. I will also discuss a number of additional
requirements, including a progress bar and dynamic status messages.