Java Language Integrity & Security: Fine Tuning Bytecodes

This series, The Object-Oriented Thought Process, is intended for someone just learning an object-oriented language and who wants to understand the basic concepts before jumping into the code, or someone who wants to understand the infrastructure behind an object-oriented language he or she is already using. These concepts are part of the foundation that any programmer will need to make the paradigm shift from procedural programming to object-oriented programming.

In keeping with the code examples used in the previous articles, Java will be the language used to implement the concepts in code. One of the reasons that I like to use Java is that you can download the Java compiler for personal use at the Sun Microsystems Web site http://java.sun.com/. You can download the standard edition, J2SE 5.0, at http://java.sun.com/j2se/1.5.0/download.jsp to compile and execute these applications. I often reference the Java J2SE 5.0 API documentation and I recommend that you explore the Java API further. Code listings are provided for all examples in this article as well as figures and output (when appropriate). See the first article in this series for detailed descriptions for compiling and running all the code examples.

The code examples in this series are meant to be a hands-on experience. There are many code listings and figures of the output produced from these code examples. Please boot up your computer and run these exercises as you read through the text.

Last month, you began an examination of the structure of bytecodes. You explored how a class file is designed and how you can disassemble it. Although there may be very few instances when you would need to disassemble code, it is often a very good mechanism for instructional purposes. Understanding what goes on under the hood of an application is a beneficial process. In this article, you continue the discussion of the structure of bytecodes by exploring how you can process them to address performance, security, intellectual property protection, and readability.

Inspecting Classes

One topic that is quite interesting to investigate is the relationship between source code and the bytecodes that the compiler produces. In fact, you can explore this relationship from various perspectives: performance, security, intellectual property, and even readability. In many cases, one or more of these topics intersect. For example, fine-tuning bytecodes for increased performance can also lead to more secure code. The same goes for intellectual property concerns, which can go hand-in-hand with code performance. It is important, and quite interesting, to understand the effect that fine-tuning bytecodes can have.

When you talk about fine-tuning bytecodes, you are actually changing the bytecodes themselves. This can be problematic because it is normally not a good idea to actually change output from the compiler. Figure 1 illustrates the process by which bytecodes are created.

Note that the bytecodes produced by the compiler are fed directly into the virtual machine. This implies that it is possible to alter the bytecode file (the class file) to various ends. In fact, this defines a security threat that the Java virtual machine takes pains to avoid. You certainly do not want any malicious code introduced to your already compiled bytecodes, malicious code that the virtual machine is unaware of. However, this situation allows you to exploit this technique for legitimate and productive purposes.
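To make the idea of inspecting a class file before trusting it a little more concrete, here is a minimal sketch of my own. The class name ClassFileCheck and its method are hypothetical and are not part of any Java API. The code tests only whether a byte array begins with the 0xCAFEBABE magic number that opens every class file. The virtual machine's real verification goes far beyond this single check, so treat the code as an illustration of the principle, not as a description of how the verifier works.

```java
public class ClassFileCheck {

    // Every valid class file begins with these four bytes.
    private static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

    /** Returns true if the array is at least four bytes long and starts with the magic number. */
    static boolean startsWithMagic(byte[] classBytes) {
        if (classBytes == null || classBytes.length < 4) {
            return false;
        }
        // Assemble the first four bytes into one big-endian int.
        int firstFour = ((classBytes[0] & 0xFF) << 24)
                      | ((classBytes[1] & 0xFF) << 16)
                      | ((classBytes[2] & 0xFF) << 8)
                      |  (classBytes[3] & 0xFF);
        return firstFour == CLASS_FILE_MAGIC;
    }

    public static void main(String[] args) {
        // Hand-built sample data, not a real class file:
        // magic number, minor version 0, major version 49 (J2SE 5.0).
        byte[] plausible = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49};

        // Simulate an alteration by overwriting the first byte.
        byte[] altered = plausible.clone();
        altered[0] = 0x00;

        System.out.println("plausible: " + startsWithMagic(plausible));
        System.out.println("altered:   " + startsWithMagic(altered));
    }
}
```

The sample reports true for the first array and false for the second. An alteration that leaves the header intact would pass this test, which is exactly why the virtual machine checks much more than the first four bytes.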

To understand how you might fine-tune bytecodes, explore how you could do the same thing by hand. This is an interesting approach. The implication is that if you were to fine-tune bytecodes, the same fine-tuning could have (perhaps should have) been done at the source code level.

There is some truth to this, but not in all cases. There are certainly times when fine-tuning at the bytecode level is directed at undesirable source code, which you might call poorly written source code. Yet, as already mentioned, there are many reasons for fine-tuning bytecodes that have nothing at all to do with poorly written source code. Rather, the updates to the bytecodes result from a desire to improve performance, security, intellectual property protection, and readability.

The one thing that must be remembered is that humans look at code totally differently than the computer does. This may seem like an obvious statement; however, high-level languages were developed for just this reason. Thus, source code and bytecodes are written for completely different audiences. This means that what might be good for one is not necessarily good for the other.

To illustrate what I mean by this, consider the code formatting styles that have been adopted by the software development industry. These rules are in place to make source code easier to develop, whether it is writing, reading or maintaining code. In short, the rules are in place to help humans make sense of the process. Certain developments in source code creation provide absolutely no specific benefit to the way the machine interprets the source code.

The adoption of coding standards is a perfect case in point. Take a look at the code in Listing 1. Just by looking at this code, can you tell what it is doing? There are some clues, but it is not necessarily obvious.

Now, look at the code in Listing 2. Even though both applications behave the exact same way and produce the exact same output, my bet is that you agree Listing 2 provides better-documented code. The only thing that is different is the naming conventions.
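As a rough illustration of the same contrast, consider the following sketch. It is a hypothetical pair of classes of my own and not a reproduction of Listing 1 or Listing 2; the class and method names are invented, and only the attribute names companyID and balance are borrowed from this article. Both inner classes hold the same data and run the same logic, so only the naming differs.

```java
// Hypothetical illustration only; not the article's Listing 1 or Listing 2.
public class NamingDemo {

    // Terse style: legal Java, but the names reveal nothing about intent.
    static class A {
        private int x;
        private double y;

        A(int a, double b) {
            x = a;
            y = b;
        }

        String p() {
            return x + ": " + y;
        }
    }

    // Descriptive style: same fields and logic, self-documenting names.
    static class Account {
        private int companyID;
        private double balance;

        Account(int companyID, double balance) {
            this.companyID = companyID;
            this.balance = balance;
        }

        String describe() {
            return companyID + ": " + balance;
        }
    }

    public static void main(String[] args) {
        // Both lines print the same text.
        System.out.println(new A(1001, 250.0).p());
        System.out.println(new Account(1001, 250.0).describe());
    }
}
```

The two classes are interchangeable as far as the output is concerned, yet only the second tells a reader what the data represents.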

There are any number of similar examples that can be presented to illustrate this point, including the use of whitespace and comments. While none of this is exactly rocket science, much of the way we write code today deals with readability issues. Often, these issues make no difference to the machine itself. Still, it is interesting to look at things from the perspective of the machine.

Now, pursue the angle of code performance. There are times when code readability and performance issues do not line up. As demonstrated in Listings 1 and 2, making attribute names meaningful is an important way to make your code easier to develop and maintain. However, is this beneficial from a performance perspective?

Note that the names were selected to provide some meaning to the reader of the code. It is obvious what the attributes companyID and balance are meant to be, at least at a high level. In other words, these names are more descriptive than if you had named them x and y. However, might there be a situation when naming the attributes x and y would be preferable? The answer is yes, at least at the bytecode level.
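One way to see why short names might be preferable at the bytecode level is to look at how names are stored. Field names are kept as text in the class file's constant pool, each in a CONSTANT_Utf8 entry made up of a one-byte tag, a two-byte length, and the encoded characters. The following sketch assumes that class file size is the concern, although there may be other benefits as well, and estimates the size of that entry for each name. NameCost and utf8EntrySize are hypothetical names of my own. The sketch leans on DataOutputStream.writeUTF, which writes the same two-byte length followed by modified UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class NameCost {

    /** Estimates the bytes one name occupies as a CONSTANT_Utf8 constant pool entry. */
    static int utf8EntrySize(String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        try {
            // writeUTF emits a 2-byte length followed by modified UTF-8 bytes.
            out.writeUTF(name);
            out.flush();
        } catch (IOException e) {
            // Only reachable if the encoded name exceeds 65535 bytes.
            throw new IllegalArgumentException("name too long: " + e.getMessage());
        }
        // Add one byte for the entry's tag.
        return 1 + buffer.size();
    }

    public static void main(String[] args) {
        String[] names = {"companyID", "balance", "x", "y"};
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + " -> " + utf8EntrySize(names[i]) + " bytes");
        }
    }
}
```

Run as written, this reports 12 bytes for companyID and 10 for balance, against 4 bytes each for x and y. The saving per name is small, but every field, method, and class name in a class file is stored this way, so it adds up across a large application.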