Java Language Integrity & Security: Uncovering Bytecodes

Obviously, there is a Java language specification. Although the Microsoft version of java.exe will run only on a Windows platform, the java compiler on UNIX and other platforms must abide by the same Java language specification. Despite the fact that each individual platform has its own non-portable java compiler, they all, theoretically at least, behave in consistent ways. Thus, a Java program on a Windows platform should run unchanged on a UNIX platform—even though the tools themselves were written on different platforms by different developers.

As already stated, the Microsoft javac.exe file contains Microsoft Windows-specific machine code. It is interesting to try and open an executable file with a text editor. This is actually a meaningless operation in this context, except that you will be able to compare it to a file of bytecodes and it provides a baseline for this discussion. When you open the javac.exe file in Notepad, you get the results seen in Figure 3.

Obviously, this exercise provides us no useful information. However, I always find it interesting to take a look at this type of output. It also provides students a way to differentiate between character and binary files. Because Notepad is a text editor, the characters displayed are the ACSII representation of the file. Perhaps the most interesting thing about the display of this file is that there are no recognizable words—at least none that I can determine. That is not the case when you look at a file of bytecodes in the same way.

Dynamically Linked Executables

The bytecode model uses a different approach. In this case, if you don't want the kitchen sink, you won't get it. The corresponding definition of dynamically linked is:

Definitions of dynamically linked on the Web:

Linked in name only, so that the executable file contains only the information needed to locate the code of a procedure—the name of the module that contains it and the name of the entry point. When the executable program is loaded, the module is also loaded, and the linkage between them is fixed in memory only.

Figure 4 shows the life-cycle of source code in the bytecode model. In this model, instead of creating machine dependent object modules, bytecode is produced. Although there are drawbacks to this model, which you will explore shortly, a primary advantage is that the bytecodes are, again theoretically, platform independent.

Under The Hood

Perhaps the best way to explain the bytecode model is to look at it directly. You'll design a small Java application for this illustration. In this case, you will create a simple application called Performance presented in Listing 1 and use a class called Employee presented in Listing 2.

Although I do most of my development with an Integrated Development Environment (IDE), I normally will use batch files like these so that I know my CLASSPATH information is correct. This helps in the instruction phase of programming, and it also assists in the testing of the various versions of the development kits. For example, as was mentioned earlier, a web developer must allow for various platforms while developing and testing. In the same manner, a developer must allow for various versions of a development kit. If Java is the development platform, what version of the SDK should be used? The answer is that all reasonable versions must be tested. This means that multiple versions of the development kit may be installed on a machine at the same time. Therefore,, keeping track of the CLASSPATH is problematic.

To deal with this, I like to use batch files to insure that I am using the version of the development kit that I intend to use. Granted, there are much more sophisticated methods of doing this and there are many development tools available to the professional developer; however, in an academic environment, using a more simple, and inexpensive solution is often desirable.

When this application is compiled, there are two separate class files produced, Performance.class and Employee.class, as seen in Figure 5.

Take another look at Figure 3, when you opened the statically linked javac.exe file with Notepad. Open the employee.class file and see what you get. The results, using Notepad, can be seen in Figure 6.

Again, this exercise provides no real benefit from a technical perspective; however, it does provide a window into the structure of the bytecodes. Primarily, you can see that there are some textual components of the file that are recognizable. The word Employee is clearly identifiable in at least a couple of locations. The reason why this is important is because it hints at the possibility of decoding this file and providing much more information about it. Could you potentially even re-create the original source code?