Living in the Matrix with Bytecode Manipulation

You are probably all too familiar with the following sequence: you feed a .java file into a Java compiler (likely javac, directly or through a build tool such as Ant, Maven, or Gradle), the compiler grinds away, and finally it emits one or more .class files.

Figure 1: What is Java bytecode?

If you run the build from the command line with verbose output enabled, you can watch the compiler parse your file until it finally writes your .class file.

javac -verbose src/com/example/spring2gx/BankTransactions.java

The generated .class file contains the bytecode, essentially the instruction set for the Java virtual machine (JVM), and is what gets loaded by the Java runtime class loader when a program executes.
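To make the idea of "bytes in a .class file" a little more concrete, here is a minimal sketch that reads the first few fields of a class file using only the JDK. The class name ClassFileHeader and its read helper are hypothetical names chosen for illustration; the layout it relies on (a four-byte magic number followed by the minor and major version) comes from the JVM specification.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: reads the fixed header at the start of any .class file.
class ClassFileHeader {
    final int magic;   // always 0xCAFEBABE for a valid class file
    final int minor;   // minor class-file version
    final int major;   // major class-file version (52 corresponds to Java 8)

    ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    // Locates the .class resource for the given type and reads its header.
    static ClassFileHeader read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + resource);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassFileHeader header = read(String.class);
        System.out.printf("magic=0x%08X major=%d minor=%d%n",
                header.magic, header.major, header.minor);
    }
}
```

Everything after this header (the constant pool, fields, methods, and their bytecode) is what the frameworks discussed below parse and rewrite for you.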

In this article we will investigate Java bytecode and how to manipulate it, and why anyone would ever want to do so.

Bytecode-manipulation frameworks

Some of the more popular frameworks for manipulating bytecode include Javassist and ASM, the two covered in this article.

Why should you care about manipulating bytecode?

Many common Java libraries such as Spring and Hibernate, as well as most JVM languages and even your IDEs, use bytecode-manipulation frameworks. For that reason, and because it’s really quite fun, you might find bytecode manipulation a valuable skillset to have. You can use bytecode manipulation to perform many tasks that would be difficult or impossible to do otherwise, and once you learn it, the sky's the limit.

One important use case is program analysis. For example, the popular FindBugs bug-locator tool uses ASM under the hood to analyze your bytecode and locate bug patterns. Some software shops have code-complexity rules such as a maximum number of if/else statements in a method or a maximum method size. Static analysis tools analyze your bytecode to determine the code complexity.

Another common use is class generation. For example, ORM frameworks typically use proxies based on your class definitions. Or consider security applications that provide syntax for adding authorization annotations. Such use cases lend themselves nicely to bytecode manipulation.

JVM languages such as Scala and Groovy, and frameworks built on them such as Grails, all rely on a bytecode-manipulation framework.

Consider a situation where you need to transform library classes without having the source code, a task routinely performed by Java profilers. For example, at New Relic, bytecode instrumentation is used to time method executions.

With bytecode manipulation, you can optimize or obfuscate your code, or you can introduce functionality such as adding strategic logging to an application. This article will focus on a logging example, which will provide the basic tools for using these bytecode manipulation frameworks.

Our example

Sue is in charge of ATM coding for a bank. She has a new requirement: add key data to the logs for some designated important actions.

Here is a simplified bank-transactions class. It allows a user to log in with a username and password, does some processing, withdraws a sum of money, and then prints out “transactions completed.” The important actions are the login and withdrawal.

To simplify the coding, Sue would like to create an @ImportantLog annotation for those method calls, containing input parameters that represent the indexes of the method arguments she wants to record. With that, she can annotate her login and withdraw methods.

/**
 * A method annotation which should be used to indicate
 * important methods whose invocations should be logged.
 */
public @interface ImportantLog {
    /**
     * The method parameter indexes whose values should be logged.
     * For example, if we have the method
     * hello(int paramA, int paramB, int paramC), and we
     * wanted to log the values of paramA and paramC, then fields
     * would be ["0","2"]. If we only want to log the value of
     * paramB, then fields would be ["1"].
     */
    String[] fields();
}
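As a rough sketch of how the bank-transactions class might look once annotated: the annotation is repeated so that the block compiles on its own, and the method signatures, the explicit retention and target settings, and the balance bookkeeping are all assumptions made for illustration rather than details of the original class.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Repeated here so the sketch is self-contained. CLASS retention keeps the
// annotation in the .class file without exposing it to reflection at run time.
@Retention(RetentionPolicy.CLASS)
@Target(ElementType.METHOD)
@interface ImportantLog {
    String[] fields();
}

class BankTransactions {
    // Hypothetical starting balance, only here to give withdraw something to do.
    private double balance = 100.0;

    // Log arguments 1 and 2 (account ID and user name), never argument 0 (the password).
    @ImportantLog(fields = {"1", "2"})
    boolean login(String password, String accountId, String userName) {
        return password != null && !password.isEmpty();
    }

    // Log arguments 0 and 1 (account ID and the amount to remove).
    @ImportantLog(fields = {"0", "1"})
    double withdraw(String accountId, double amount) {
        balance -= amount;
        return balance;
    }

    public static void main(String[] args) {
        BankTransactions bank = new BankTransactions();
        if (bank.login("secret", "acct-42", "sue")) {
            bank.withdraw("acct-42", 30.0);
        }
        System.out.println("transactions completed.");
    }
}
```

Note that the indexes are zero-based positions in the argument list, which is why the password at position 0 of login is simply left out of the fields array.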

For login, Sue wants to record the account ID and the username, so her fields will be set to “1” and “2” (she doesn’t want to display the password!). For the withdraw method, her fields are “0” and “1” because she wants to output the first two arguments: the account ID and the amount of money to remove. Her audit log ideally will contain something like this:

To hook this up, Sue is going to use a Java agent. Introduced in JDK 1.5, Java agents allow you to modify the bytes that comprise the classes in a running JVM, without requiring any source code.

Without an agent, the normal execution flow of Sue’s program is:

Run Java on a main class, which is then loaded by a class loader.

Call the class’s main method, which executes the defined process.

Print “transactions completed.”

When you introduce a Java agent, a few more things happen — but let’s first see what’s required to create an agent. An agent must contain a class with a premain method. It must be packaged as a JAR file with a properly constructed manifest that contains a Premain-Class entry. There is a switch that must be set on launch to point to the JAR path, which makes the JVM aware of the agent.

java -javaagent:/to/agent.jar com/example/spring2gx/BankTransactions

Inside premain, register a Transformer that captures the bytes of every class as it is loaded, makes any desired modifications, and returns the modified bytes. In Sue’s example, the Transformer captures BankTransactions, which is where she makes her modifications and returns the modified bytes. Those are the bytes that the class loader loads and that the main method executes, performing its original functionality in addition to Sue’s augmented logging.

When the agent class is loaded, its premain method is invoked before the application main method.

Figure 2: Process with Java agent.

It’s best to look at an example.

The Agent class doesn’t implement any interface, but it must contain a premain method, as follows:

The Transformer class contains a transform method, whose signature accepts a ClassLoader, class name, Class object of the class being redefined, ProtectionDomain defining permissions, and the original bytes of the class. Returning null from the transform method tells the runtime that no changes have been made to that class.
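The two pieces described above can be sketched with nothing but the JDK's java.lang.instrument package. This is a minimal skeleton under stated assumptions: the class names Agent and LoggingTransformer are placeholders, the target class name is taken from the javac command earlier in the article, and the rewrite method is a stub standing in for the Javassist or ASM work described later.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical transformer: only touches the one class we care about.
class LoggingTransformer implements ClassFileTransformer {
    // Class names arrive in internal form, with slashes instead of dots.
    static final String TARGET = "com/example/spring2gx/BankTransactions";

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (!TARGET.equals(className)) {
            return null; // null tells the runtime "no changes to this class"
        }
        return rewrite(classfileBuffer);
    }

    // Placeholder: a real agent would hand the bytes to a bytecode framework
    // here and return the modified class. This stub returns them unchanged.
    private byte[] rewrite(byte[] original) {
        return original.clone();
    }
}

// Package-private so the sketch compiles in any file; in a real agent JAR,
// declare this class public in its own source file and name it in the
// Premain-Class manifest entry.
class Agent {
    // Called by the JVM before the application's main method.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoggingTransformer());
    }
}
```

Filtering on the class name first keeps the agent cheap, since the transformer is offered every class the JVM loads.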

Javassist

A subproject of JBoss, Javassist (short for “Java Programming Assistant”) consists of a high-level object-based API and a lower-level one that is closer to the bytecode. The more object-based one enjoys more community activity and is the focus of this article. For a complete tutorial, refer to the Javassist website.

In Javassist, the basic unit of class representation is the CtClass (“compile time class”). The classes that comprise your program are stored in a ClassPool, essentially a container for CtClass instances.

The ClassPool implementation uses a HashMap, in which the key is the name of the class and the value is the corresponding CtClass object.

A normal Java class contains fields, constructors, and methods. The corresponding CtClass represents those as CtField, CtConstructor, and CtMethod. To locate a CtClass, you can grab it by name from the ClassPool, then grab any method from the CtClass and apply your modifications.

Figure 3.

CtMethod contains lines of code for the associated method. We can insert code at the beginning of the method using the insertBefore command. The great thing about Javassist is that you write pure Java, albeit with one caveat: the Java must be implemented as quoted strings. But most people would agree that’s much better than having to deal with bytecode! (Although, if you happen to like coding directly in bytecode, stay tuned for the ASM section.) The JVM includes a bytecode verifier to guard against invalid bytecode. If your Javassist-coded Java is not valid, the bytecode verifier will reject it at runtime.

Similar to insertBefore, there's an insertAfter to insert code at the end of a method. You can also insert code in the middle of a method by using insertAt or add a catch statement with addCatch.

Let's kick off your IDE and code your logging feature. We start with an Agent (containing premain) and our ClassTransformer.

To add audit logging, first implement transform to convert the bytes of the class to a CtClass object. Then, you can iterate over its methods and capture the ones that carry the @ImportantLog annotation, grab the input parameter indexes to log, and insert that code at the beginning of the method.

Javassist annotations can be declared as “invisible” or “visible”. Invisible annotations, which are only visible at class loading time and compile time, are declared by passing the RetentionPolicy.CLASS argument to the annotation's @Retention. Visible annotations (RetentionPolicy.RUNTIME) are loaded and visible at run time. For this example, you only need the attributes while the class is being compiled and loaded, not at run time, so make them invisible.

The getAnnotation method scans for your @ImportantLog annotation and returns null if it doesn’t find the annotation.

With the annotation in hand, you can retrieve the parameter indexes. Using Javassist’s ArrayMemberValue, the member value fields are returned as a String array, which you can iterate to obtain the field indexes you had embedded in the annotation.

Your implementation creates a StringBuilder, appending some preamble followed by the required method name and class name. One thing to note is that if you're inserting multiple Java statements, you need to surround them with curly braces.

(Braces are not required for just a single statement.)
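The string-building step might look roughly like the following. This is a sketch, not the original implementation: LogStatementBuilder and its build method are hypothetical names, and the exact log wording is made up. What it does rely on is Javassist's convention that $1, $2, and so on refer to the method's parameters (one-based, with $0 reserved for this), so a zero-based index from the annotation has to be shifted by one.

```java
// Hypothetical helper that produces the source text handed to insertBefore.
class LogStatementBuilder {

    /**
     * @param className    simple name of the class being instrumented
     * @param methodName   name of the annotated method
     * @param paramIndexes zero-based parameter indexes from @ImportantLog
     * @return a brace-wrapped block of Java statements as a single string
     */
    static String build(String className, String methodName, String[] paramIndexes) {
        StringBuilder sb = new StringBuilder();
        sb.append("{ "); // braces are needed because several statements follow
        sb.append("System.out.println(\"Call to ")
          .append(methodName)
          .append(" on ")
          .append(className)
          .append("\"); ");
        for (String index : paramIndexes) {
            int position = Integer.parseInt(index) + 1; // $1 is the first parameter
            sb.append("System.out.println($").append(position).append("); ");
        }
        sb.append("}");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("BankTransactions", "login", new String[] {"1", "2"}));
    }
}
```

For the login method, the generated block refers to $2 and $3 only, so the password in the first position never reaches the log.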

That pretty much covers the code for adding audit logging using Javassist. In retrospect, the positives are:

Because it uses familiar Java syntax, there’s no bytecode to learn.

There wasn't too much programming to do.

Good documentation on Javassist exists.

The negatives are:

Not using bytecode limits capabilities.

Javassist is slower than other bytecode-manipulation frameworks.

ASM

ASM began life as a Ph.D. project and was open-sourced in 2002. It is actively maintained and has supported Java 8 since version 5.x. ASM consists of an event-based library and an object-based one, similar in behavior to SAX and DOM XML parsers, respectively. This article will focus on the event-based library. Complete documentation is available on the ASM project website.

A Java class contains many components, including a superclass, interfaces, attributes, fields, and methods. With ASM, you can think of each of these as events; you parse the class by providing a ClassVisitor implementation, and as the parser encounters each of those components, a corresponding “visitor” event-handler method is called on the ClassVisitor (always in this sequence).

Then pass the output bytes to a no-op ClassWriter, which puts the parsed bytes back together in a byte array, producing a rehydrated BankTransactions that, as expected, is virtually identical to the original class.

Now let’s modify our ClassWriter to do something a little more useful by adding a ClassVisitor (named LogMethodClassVisitor) to call our event handler methods, such as visitField or visitMethod, as the corresponding components are encountered during parsing.

For your logging requirement, you want to check each method for the indicative annotation and add any specified logging. You only need to override ClassVisitor.visitMethod to return a MethodVisitor that supplies your implementation. Just as there are several components of a class, there are several components of a method, corresponding to the method attributes, annotations, and compiled code. ASM’s MethodVisitor provides hooks for visiting every opcode of the method, so you can get pretty granular in your modifications.

Again, the event handlers are always called in the same predefined sequence, so you always know all of the attributes and annotations on the method before you have to actually visit the code. (Incidentally, you can chain together multiple instances of MethodVisitor, just like you can chain multiple instances of ClassVisitor.) So in your visitMethod, you’re going to hook in the PrintMessageMethodVisitor, overriding visitAnnotation to capture your annotations and insert any required logging code.

Your PrintMessageMethodVisitor overrides two methods. First comes visitAnnotation, so you can check the method for your @ImportantLog annotation. If it is present, you need to extract the parameter indexes from the annotation’s fields property. By the time visitCode executes, the presence of the annotation has already been determined, so it can add the specified logging. The visitAnnotation code hooks in an AnnotationVisitor that exposes the fields argument of the @ImportantLog annotation.

This is the scary part of ASM — you actually have to write bytecode, so that’s something new to learn. You have to know about the stack, local variables, etc. It’s a fairly simple language, but if you just want to hack around, you can actually get the existing bytecode pretty easily with javap:

javap -c com/example/spring2gx/mains/PrintMessage

I recommend writing the code you need in a Java test class, compiling that, and running it through javap -c to see the exact bytecode. In the code sample above, everything in blue is actually the bytecode. On each line, you get a one-byte opcode followed by zero or more arguments. You will need to determine those arguments for the target code, and they can usually be extracted by running javap -c -v on the original class (-v for verbose, which displays the constant pool).

I encourage you to look at the JVM spec, which defines every opcode. There are operations like load and store (which move data between your operand stack and your local variables), with a separate variant for each type. For example, ILOAD pushes an int value from a local variable onto the operand stack, whereas LLOAD does the same for a long value; ISTORE and LSTORE go the other way, popping the top of the stack into a local variable.
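A quick way to get a feel for the load and store instructions is to compile a few trivial methods and inspect them with javap -c. The sketch below uses a hypothetical class, LoadStoreDemo; the comments show the instruction sequences javac typically emits for these static methods, where parameters occupy local variable slots starting at 0 and each long takes two slots.

```java
// Hypothetical demo class: compile it and run "javap -c LoadStoreDemo" to compare.
class LoadStoreDemo {

    // iload_0, iload_1, iadd, ireturn
    // Both ints are pushed from local slots 0 and 1, added, and returned.
    static int add(int a, int b) {
        return a + b;
    }

    // lload_0, lload_2, ladd, lreturn
    // A long uses two local slots, so the second parameter starts at slot 2.
    static long add(long a, long b) {
        return a + b;
    }

    // iload_0, iload_1, iadd, istore_2, iload_2, ireturn
    // The intermediate result is stored into local slot 2, then loaded again.
    static int addViaLocal(int a, int b) {
        int sum = a + b;
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
        System.out.println(add(2L, 3L));
        System.out.println(addViaLocal(2, 3));
    }
}
```

Reading these listings side by side makes the direction of each instruction clear: loads push onto the operand stack, stores pop from it.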

There are also operations like invokevirtual, invokespecial, invokestatic, and the more recently added invokedynamic, for invoking standard instance methods, constructors, static methods, and dynamically linked methods (as used by dynamically typed JVM languages), respectively. There are also operations for creating new objects (new) and for duplicating the top operand on the stack (dup).
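To connect those opcodes to everyday source code, here is a small sketch with a hypothetical class, InvokeDemo. The comments name the invocation instruction that javac (Java 8 or later) typically emits for each call, which you can confirm with javap -c.

```java
import java.util.function.Supplier;

// Hypothetical demo class showing which invoke instruction each call becomes.
class InvokeDemo {
    private final String name;

    InvokeDemo(String name) {
        this.name = name;
    }

    String greet() {
        return "hello " + name;
    }

    static String shout(String s) {
        return s.toUpperCase();
    }

    static String run() {
        // new + dup + invokespecial InvokeDemo.<init>: allocate, duplicate the
        // reference, then call the constructor on one copy.
        InvokeDemo demo = new InvokeDemo("sue");

        // invokevirtual InvokeDemo.greet: ordinary instance method call.
        String greeting = demo.greet();

        // invokestatic InvokeDemo.shout: static method call, no receiver object.
        String loud = shout(greeting);

        // invokedynamic: the lambda is linked at run time by a bootstrap method.
        Supplier<String> supplier = () -> loud;

        // invokeinterface Supplier.get: call through an interface type.
        return supplier.get();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The new/dup/invokespecial triple is worth memorizing, since you have to reproduce it by hand whenever injected bytecode creates an object.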

In sum, the positives of ASM are:

It has a small memory footprint.

It’s typically pretty quick.

It’s well documented on the web.

All of the opcodes are available, so you can really do a lot with it.

There’s lots of community support.

There is really only one negative, but it’s a big one: you’re writing bytecode, so you have to understand what's going on under the hood, and as a result developers tend to take some time to ramp up.

Lessons learned

When you're dealing with bytecode manipulation, it's important to take small steps. Don't write lots of bytecode and expect it to immediately pass verification and work. Write one line at a time, think about what's in your stack, think about your local variables, and then write another line. If it's not passing the verifier, change one thing at a time; otherwise you'll never get it to work. Also keep in mind that besides the JVM verifier, ASM maintains a separate bytecode verifier, so it's good to run both and verify that your bytecode passes both of them.

It's important to think about class loading when you're modifying classes. When you use a Java agent, its transformer will touch every class as it is loaded into the JVM, no matter which class loader is loading it. So you need to make sure that any class your injected code refers to is also visible to that class loader. Otherwise, you're going to run into trouble.

If you're using Javassist and an application server that has multiple class loaders, you have to be concerned about your class pool being able to see your class objects. You might have to register a new classpath with your class pool to get it to see your class objects. You can chain your class pools like Java chains class loaders, so if a pool doesn't find the CtClass object itself, it can go look in its parents.

Finally, it’s important to note that the JDK has its own capability to retransform classes that are already loaded, and some limitations apply to any such class: you can modify the implementation of methods but, unlike transformations applied at load time, retransformations are not permitted to change the class structure, for example by adding new fields or methods or by modifying signatures.
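For completeness, retransforming an already-loaded class through the JDK's Instrumentation interface could look something like this. It is a sketch under assumptions: RetransformSketch and its retransform method are hypothetical names, and the agent JAR would also need Can-Retransform-Classes set to true in its manifest for the capability check to pass.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;

// Hypothetical helper around the JDK's retransformation API.
class RetransformSketch {

    /**
     * Asks the JVM to run the transformer over a class that is already loaded.
     * Returns false when retransformation is unavailable for this class.
     */
    static boolean retransform(Instrumentation inst,
                               ClassFileTransformer transformer,
                               Class<?> target) {
        if (!inst.isRetransformClassesSupported() || !inst.isModifiableClass(target)) {
            return false;
        }
        // The second argument marks the transformer as retransformation-capable.
        inst.addTransformer(transformer, true);
        try {
            // The transformer may change method bodies only: no new fields,
            // no new methods, no changed signatures.
            inst.retransformClasses(target);
            return true;
        } catch (UnmodifiableClassException e) {
            return false;
        } finally {
            inst.removeTransformer(transformer);
        }
    }
}
```

Removing the transformer in the finally block keeps it from also being applied to every class loaded afterwards; drop that line if you want both behaviors.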

Bytecode manipulation can make life easier. You can find bugs, add logging (as discussed), obfuscate source code, perform preprocessing like Spring or Hibernate, or even write your own language compiler. You can restrict your API calls, analyze code to see if multiple threads are accessing a collection, lazy-load data from the database, and find differences between JARs by inspecting them.

So I encourage you to make a bytecode-manipulation framework your friend. Someday, one might save your job.

This article was extracted from a presentation by New Relic’s Ashley Puls at Spring One. The full presentation can be viewed and downloaded here.

About the Author

Victor Grazi is the Java queue lead at InfoQ. Inducted as an Oracle Java Champion in 2012, Victor works at Nomura Securities on core platform tools, and as a technical consultant and Java evangelist. He is also a frequent presenter at technical conferences. Victor hosts the "Java Concurrent Animated" and "Bytecode Explorer" open source projects.