Secrets of the Bytecode Ninjas

The Java language is defined by the Java Language Specification (JLS). The executable bytecode of the Java Virtual Machine, however, is defined by a separate standard, the Java Virtual Machine Specification (usually referred to as the VMSpec).

JVM bytecode is produced by javac from Java source code files, and the bytecode is significantly different from the language. For example, some familiar high-level Java language features have been compiled away and don’t appear in the bytecode at all.

One of the most obvious examples of this would be Java’s loop keywords (for, while, etc.), which are compiled away and replaced with bytecode branch instructions. This means that bytecode’s flow control inside a method consists only of conditional branch instructions (the if family) and jumps (for looping).
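
To make this concrete, here is a small sketch of a counted loop together with the kind of bytecode javac typically produces for it. The class name LoopDemo is a hypothetical example, and the listing in the comment is approximate: labels stand in for bytecode offsets, and the exact output may differ between compiler versions.

```java
// Hypothetical demo class; the name LoopDemo is illustrative only.
class LoopDemo {

    // A plain counted loop in Java source.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // javac typically compiles the loop above to something close to this
    // (labels replace real offsets; details may vary by compiler version):
    //
    //          iconst_0
    //          istore_1          // sum = 0   (slot 0 holds the argument n)
    //          iconst_0
    //          istore_2          // i = 0
    //   top:   iload_2
    //          iload_0
    //          if_icmpge end     // conditional branch: exit once i >= n
    //          iload_1
    //          iload_2
    //          iadd
    //          istore_1          // sum += i
    //          iinc 2, 1         // i++
    //          goto top          // unconditional jump back: the "loop"
    //   end:   iload_1
    //          ireturn

    public static void main(String[] args) {
        System.out.println(sumTo(5)); // 0 + 1 + 2 + 3 + 4, prints 10
    }
}
```

There is no loop instruction anywhere in the listing: the for keyword has become one conditional branch and one goto.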

In this article, we will assume that the reader has a grounding in bytecode. If some background is required, see The Well-Grounded Java Developer (Evans & Verburg, Manning 2012) or this report from RebelLabs (signup required for PDF).

Let’s look at an example that often puzzles developers who are new to JVM bytecode. It uses the javap tool, which ships with the JDK, and which is effectively a Java bytecode disassembler. In our example, we will discuss a simple class that implements the Callable interface:
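
A minimal class of this kind might look like the sketch below. The names PiCallable and BridgeMethodDemo are illustrative assumptions, and the reflective loop is only a stand-in for javap: it reports the methods that are actually present in the compiled class.

```java
import java.lang.reflect.Method;
import java.util.concurrent.Callable;

// Hypothetical example class; any Callable<T> with a concrete T behaves the same way.
class PiCallable implements Callable<Double> {
    @Override
    public Double call() {
        return 3.1415;
    }
}

class BridgeMethodDemo {

    // Counts the methods named "call" that the compiled class really declares.
    static int countCallMethods(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("call")) {
                count++;
            }
        }
        return count;
    }

    // Counts how many of those were added by the compiler as bridge methods.
    static int countBridgeCallMethods(Class<?> type) {
        int count = 0;
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals("call") && m.isBridge()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Print every declared method with its return type, flagging bridges.
        for (Method m : PiCallable.class.getDeclaredMethods()) {
            System.out.println(m.getReturnType().getSimpleName() + " " + m.getName() + "()"
                    + (m.isBridge() ? "  [bridge method added by javac]" : ""));
        }
    }
}
```

Against a class compiled by javac, this reports two methods named call: the one we wrote, returning Double, and a second one returning Object that javac added as a bridge method. javap lists the same pair.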

This disassembly looks wrong - after all, we wrote one call method not two; even if we had tried to write it as such, javac would have complained that there are two methods with the same name and signature that differ only in return type, and so the code would not have compiled. Nevertheless, this class was generated from the real, valid Java source file shown above.

This clearly shows that Java’s familiar ambiguous return type restriction is a Java language constraint, rather than a JVM bytecode requirement. If the thought of javac inserting code that you didn’t write into your class files is troubling, it shouldn’t be; we see it every day! One of the first lessons a Java programmer learns is that “if you don’t provide a constructor, the compiler adds a simple one for you”. In the output from javap you can even see the constructor that has been provided even though we didn’t write it.
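
The compiler-supplied constructor can also be observed from inside a running program. In the sketch below (both class names are hypothetical), the source declares no constructor, yet reflection finds exactly one in the compiled class.

```java
// Hypothetical class whose source contains no constructor at all.
class NoExplicitConstructor {
    int value = 42;
}

class DefaultConstructorDemo {

    // Number of constructors present in the compiled class.
    static int constructorCount(Class<?> type) {
        return type.getDeclaredConstructors().length;
    }

    // Number of parameters taken by the first (here: only) constructor.
    static int firstConstructorParameterCount(Class<?> type) {
        return type.getDeclaredConstructors()[0].getParameterCount();
    }

    public static void main(String[] args) {
        Class<?> type = NoExplicitConstructor.class;
        System.out.println("constructors: " + constructorCount(type));
        System.out.println("parameters:   " + firstConstructorParameterCount(type));
    }
}
```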

These additional methods provide an example of the requirements of the language spec being stricter than the details of the VM spec. There are a number of "impossible" things that can be done if we write bytecode directly - legal bytecode that no Java compiler will ever emit.

For example, we can create classes with genuinely no constructor. The Java language spec requires that every class has at least one constructor, and javac will insert a default no-argument constructor automatically if we fail to provide one. If we write bytecode directly, however, we are free to omit one. Such a class could not be instantiated, even via reflection.

Our final example is one that almost works, but not quite. In bytecode, we can write a method that attempts to call a private method belonging to another class. This is valid bytecode, but it will fail to link correctly if any program attempts to load it. This is because the access control restrictions on the call will be detected by the classloader’s verifier and the illegal access will be rejected.

Introduction to ASM

If we want to create code that can implement some of these non-Java behaviours, then we will need to produce a class file from scratch. As the class file format is binary, it makes sense to use a library that enables us to manipulate an abstract data structure, then convert it to bytecode and stream it out to disc.

There are several such libraries to choose from, but in this article we will focus on ASM. This is a very common library that appears (in slightly modified form) in the Java 8 distribution as an internal API. For user code, we want to use the general open-source library instead of the JDK’s version, as we should not rely upon internal APIs.

The core focus of ASM is to provide an API that, while somewhat arcane (and occasionally crufty), corresponds to the bytecode data structures in a fairly direct way.

The Java runtime is the result of design decisions made over a number of years, and the resulting accretion can clearly be seen in successive versions of the class file format.

ASM seeks to model the class file fairly closely, so the basic API breaks down into a number of fairly simple groups of methods (although ones that model binary concerns).

The programmer who wishes to create a class file from scratch needs to understand the overall structure of a class file, and this does change over time. Fortunately, ASM handles the slight differences in class file format that are seen between Java versions, and the strong compatibility requirements of the Java platform also help.

In order, a class file contains:

Magic number (in the traditional Unix sense - Java’s magic number is the rather dated and sexist 0xCAFEBABE)

Version of the class file format in use

Constant pool

Access flags (public, abstract, etc.)

This class name

Superclass name

Interfaces

Fields that this class possesses (over and above those of superclasses)

Methods that this class possesses (over and above those of superclasses)

Attributes (Class-level annotations)
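
The start of this layout can be read back with nothing more than a DataInputStream, because a class file is big-endian binary data that begins with the magic number, followed by the format version and the size of the constant pool. The following is a minimal sketch, not a full parser. ClassFileHeaderDemo and readHeader are hypothetical names, and the sketch assumes that the class file for java.lang.String can be read as a resource through the system class loader.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper; reads only the fixed-size header fields of a class file.
class ClassFileHeaderDemo {

    // Returns { magic, minorVersion, majorVersion, constantPoolCount }.
    static int[] readHeader(String resourceName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found: " + resourceName);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();                     // u4 magic number
            int minor = data.readUnsignedShort();           // u2 minor version
            int major = data.readUnsignedShort();           // u2 major version
            int constantPoolCount = data.readUnsignedShort(); // u2 entries + 1
            return new int[] { magic, minor, major, constantPoolCount };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/String.class");
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                header[0], header[2], header[1], header[3]);
    }
}
```

The version and constant pool size printed depend on the JDK in use; only the magic number is the same for every class file.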

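The main sections of a JVM class file can be recalled using the following mnemonic: My Very Cute Animal Turns Savage In Full Moon Areas (Magic, Version, Constant pool, Access flags, This class, Superclass, Interfaces, Fields, Methods, Attributes).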

ASM offers two APIs, and the easier of the two to use relies heavily upon the Visitor pattern. In a common formulation, ASM starts from a blank slate with the ClassWriter. When getting used to working with ASM and direct bytecode manipulation, many developers find the CheckClassAdapter a useful starting point: this is a ClassVisitor that checks that its methods are used correctly, in a similar manner to the verifier that appears in Java’s classloading subsystem.

Let’s look at some simple class generation examples that follow a common pattern:

To drive the classes we’ll generate, we use a harness called Main. This provides a simple classloader and a reflective way of calling back onto the methods of the generated class. For simplicity, we also write our generated classes into the right place in the Maven target directory, so that they are picked up on the IDE’s classpath:

Note how the methods are generated with the ACC_STATIC flag set, and how the method arguments are first in the local variable list (as implied by the ILOAD 0 pattern - in an instance method, this would be ILOAD 1, as the "this" reference would be stored at the 0 offset in the local variable table).
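
These points can be tried out without any dependencies. The sketch below deliberately does not use ASM: as a stand-in technique, it writes the class file bytes by hand with a DataOutputStream, then loads the result with a tiny classloader and calls it reflectively. The names NoCtor, BytesLoader and HandAssembledClassDemo are hypothetical, and the generated class contains a single static method add(int, int) and no constructor.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class HandAssembledClassDemo {

    // Minimal classloader that turns a byte array into a Class.
    static final class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] classBytes) {
            return defineClass(name, classBytes, 0, classBytes.length);
        }
    }

    private static void utf8(DataOutputStream out, String s) throws IOException {
        out.writeByte(1);   // CONSTANT_Utf8 tag
        out.writeUTF(s);    // u2 length + modified UTF-8, as the format requires
    }

    private static void classRef(DataOutputStream out, int nameIndex) throws IOException {
        out.writeByte(7);   // CONSTANT_Class tag
        out.writeShort(nameIndex);
    }

    // Emits: public class NoCtor { public static int add(int, int) }, with no <init>.
    static byte[] buildNoCtorClass() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(0xCAFEBABE);       // magic number
            out.writeShort(0);              // minor version
            out.writeShort(52);             // major version (52 = Java 8)
            out.writeShort(8);              // constant pool count = entries + 1
            utf8(out, "NoCtor");            // #1
            classRef(out, 1);               // #2 this class
            utf8(out, "java/lang/Object");  // #3
            classRef(out, 3);               // #4 superclass
            utf8(out, "add");               // #5 method name
            utf8(out, "(II)I");             // #6 method descriptor
            utf8(out, "Code");              // #7 attribute name
            out.writeShort(0x0021);         // class flags: ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);              // this class -> #2
            out.writeShort(4);              // superclass -> #4
            out.writeShort(0);              // no interfaces
            out.writeShort(0);              // no fields
            out.writeShort(1);              // one method, and it is not a constructor
            out.writeShort(0x0009);         // method flags: ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);              // name -> #5
            out.writeShort(6);              // descriptor -> #6
            out.writeShort(1);              // one method attribute: Code
            out.writeShort(7);              // attribute name -> #7
            out.writeInt(16);               // attribute length in bytes
            out.writeShort(2);              // max stack
            out.writeShort(2);              // max locals
            out.writeInt(4);                // code length
            out.writeByte(0x1A);            // iload_0: first argument (static, so no "this")
            out.writeByte(0x1B);            // iload_1: second argument
            out.writeByte(0x60);            // iadd
            out.writeByte(0xAC);            // ireturn
            out.writeShort(0);              // empty exception table
            out.writeShort(0);              // no attributes on Code
            out.writeShort(0);              // no class attributes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reflectively invokes the static add method on the generated class.
    static int callAdd(Class<?> type, int a, int b) {
        try {
            return (Integer) type.getMethod("add", int.class, int.class).invoke(null, a, b);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> type = new BytesLoader().define("NoCtor", buildNoCtorClass());
        System.out.println("add(2, 3) = " + callAdd(type, 2, 3));
        System.out.println("constructors: " + type.getDeclaredConstructors().length);
    }
}
```

Because the method is static, its two int arguments sit in local variable slots 0 and 1. With ASM, the same structure would be described through ClassWriter and MethodVisitor calls rather than raw writes, and the library would take care of the constant pool and the length fields.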

Using javap, we can confirm that this class genuinely has no constructor:

Working With Generated Classes

Up until now, we have worked reflectively with the classes we’ve generated via ASM. This helps to keep the examples self-contained, but in many cases we want to use the generated code with regular Java files. This is easy enough to do. The examples helpfully place the generated classes into the Maven target directory, so simply:

This won’t compile in the IDE (as the GetterSetter class is not on the classpath). However, if we drop down to the command line and supply the appropriate dependency on the classpath, everything works fine:

Conclusion

In this article we’ve looked at the basics of generating class files from scratch, using the simple API from the ASM library. We’ve shown some of the differences between Java language requirements and bytecode requirements, and that some of the rules of Java are actually just conventions of the language that are not enforced by the runtime. We’ve also shown that a correctly written class file can be used directly from the language, just as though it had been produced by javac. This is the basis of Java’s interoperability with non-Java languages, such as Groovy or Scala.

There are a number of much more advanced techniques available, but this article should provide a good place to get started with deeper investigations of the JVM runtime and how it operates.

About the Author

Ben Evans is the CEO of jClarity, a Java/JVM performance analysis startup. In his spare time he is one of the leaders of the London Java Community and holds a seat on the Java Community Process Executive Committee. His previous projects include performance testing the Google IPO, financial trading systems, writing award-winning websites for some of the biggest films of the 90s, and others.