A Gcc-based Java Implementation

Short abstract

While the portability of Java bytecodes is a major factor
in Java's success, we believe Java cannot become a mainstream
programming language without mainstream implementation
techniques, specifically an optimizing ahead-of-time
compiler. This allows much better optimization, and much
faster application start-up times than with JIT translators.
Cygnus is writing a Java front-end for the GNU compiler
(gcc) to translate Java bytecodes to machine code. This uses
proven and widely used technology. The meta-data (such as
the Class objects and lists of fields) will be laid
out by the compiler in static data memory, saving more
startup time. We will enhance and use the GNU linker (ld)
to link compiled class files into standard shared or static
libraries. For the run-time environment, we are enhancing
the existing Kaffe free Java VM to make it full-strength
and to support linking with pre-compiled class libraries.
Kaffe is a JIT system, which means that methods that have been
dynamically loaded and compiled use the same calling
conventions as pre-compiled methods. We will enhance the
GNU debugger (gdb) to understand Java, which will provide
a familiar and multi-language debugging environment (you
can use the same interface to debug Java and native methods).

Extended abstract

Java has taken off because it is a decent programming language,
is buzzword-compliant (object-oriented and web-enabled), and
because it is implemented by compiling to portable bytecodes.
However, interpreting bytecodes makes Java programs many times slower
than comparable C or C++ programs. One approach to improving this
situation is "Just-In-Time" (JIT) compilers. These dynamically
translate bytecodes to machine code just before a method is executed.
This can provide substantial speed-up, but it is still slower
than C or C++. There are two main problems with the JIT approach
compared to conventional compilers: (1) The compilation is done
every time the application is executed, which increases start-up
times substantially, and (2) the JIT compiler has to run fast,
and therefore cannot do any substantial optimization.

While JIT compilers have an important place in a Java system,
for frequently used applications it is better to use a more
traditional "ahead-of-time" or batch compiler. While Java has
been primarily touted as an internet/web language, many people
are interested in using Java as an alternative to traditional
languages such as C++, if the performance can be made adequate.
For embedded applications it makes much more sense to pre-compile
the Java program, especially if the program is to be in ROM.

So Cygnus is building a Java programming environment that is
based on a conventional compiler, linker, and debugger, using
Java-enhanced versions of the existing GNU programming tools.

The core tool is of course the compiler. This is "cc1java,"
a new gcc front-end. It has a structure similar to that of existing
front-ends, and shares most of its code with them.
The most unusual aspect of cc1java is that its "parser" reads
*either* Java source files or Java bytecode files. (The first
release will directly support only bytecodes; parsing
Java source will be done by invoking Sun's javac. A future
version will provide an integrated Java parser, mainly for
the sake of compilation speed.) In any case, it is important
that cc1java can read bytecodes, for at least three reasons: (1) it
is the natural way to get declarations of external classes (in
this respect a Java bytecode file is like a C++ pre-compiled
header file); (2) it is needed so we can support code produced
from other tools that produce Java bytecodes (such as the Kawa
Scheme-to-Java-bytecode compiler); and (3) some libraries are
(unfortunately) distributed as Java bytecodes without source.

To "parse" a Java bytecode file involves first parsing the
meta-data in the file. Each bytecode file defines one Java
class, and defines the superclass, fields, and methods of
the class. We use this information to build corresponding
declarations and type nodes using mostly-standard gcc "tree"
nodes. This information will also be used to generate the
run-time meta-information (such as the Class data structure):
The compiler generates initialized static data that have the
same layout as the run-time data structures used by the Java VM.
Thus startup is fast, and does not require allocating any data.
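
As an illustration of the first step of "parsing" a bytecode file, the
sketch below reads only the fixed-size prefix of a class file: the magic
number, the version numbers, and the constant pool count. The class name
ClassFileHeader and its fields are hypothetical and are not taken from
cc1java; a real reader would go on to decode the constant pool, the
superclass, the fields, and the methods.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of cc1java.
final class ClassFileHeader {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int minor, int major, int cpCount) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    // Class files are big-endian, which is ByteBuffer's default byte order.
    static ClassFileHeader read(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int magic = in.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a Java class file");
        }
        int minor = in.getShort() & 0xFFFF;   // u2 minor_version
        int major = in.getShort() & 0xFFFF;   // u2 major_version
        int cpCount = in.getShort() & 0xFFFF; // u2 constant_pool_count
        return new ClassFileHeader(minor, major, cpCount);
    }
}
```

Everything after this prefix is variable-length, which is why the
meta-data has to be decoded in full before any declarations can be built.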

The executable content of a bytecode file contains a vector
of bytecode instructions for each (non-native) method.
Code generation means converting the stack-oriented
bytecodes into gcc expression nodes. The first problem
is that we must know for each instruction the types of
each operand (stack and local variable slots) in the
Java virtual machine state. This is done with a process
very similar to a Java bytecode verifier. Transforming
postfix stack operations to expression nodes involves
a compile-time stack of expression nodes. When necessary,
we also map stack slots and local variables into gcc
pseudo-registers.
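
The compile-time stack can be sketched as follows. This is a minimal
illustration under simplifying assumptions, not the cc1java code: the
class StackToTree is hypothetical, only a handful of integer
instructions in straight-line code are handled, operand types are not
tracked, and strings stand in for gcc expression nodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration; strings stand in for gcc "tree" nodes.
final class StackToTree {
    // Translates straight-line code written as "mnemonic [operand]".
    static String translate(String[] code) {
        Deque<String> stack = new ArrayDeque<>(); // compile-time operand stack
        StringBuilder out = new StringBuilder();  // emitted statements
        for (String insn : code) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload":  // push the value of a local variable
                    stack.push("local" + p[1]);
                    break;
                case "bipush": // push an integer constant
                    stack.push(p[1]);
                    break;
                case "iadd":
                case "imul": {
                    // The right operand is on top of the stack.
                    String right = stack.pop();
                    String left = stack.pop();
                    String op = p[0].equals("iadd") ? " + " : " * ";
                    stack.push("(" + left + op + right + ")");
                    break;
                }
                case "istore": // pop into a local: becomes an assignment
                    out.append("local").append(p[1]).append(" = ")
                       .append(stack.pop()).append(";\n");
                    break;
                default:
                    throw new IllegalArgumentException("unsupported: " + insn);
            }
        }
        return out.toString();
    }
}
```

For the sequence iload 1, iload 2, bipush 3, imul, iadd, istore 0, the
sketch produces the single statement local0 = (local1 + (local2 * 3));
no run-time stack traffic remains in the result.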

Linking a set of compiled Java binaries into a library or executable
will use the standard linker (GNU ld). However, some enhancements
are necessary or at least desirable. The linker must provide a way
to build a table mapping class names to Class objects. This can
be done using the same mechanism used for running C++ static
initializers. Linker help is also desirable to combine multiple
copies of the same literal.
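
To make the purpose of the class-name table concrete, here is a small
model of what the run-time sees once the table exists. It is only a
sketch: the class ClassTable and its methods are hypothetical, and the
explicit register calls stand in for entries that the linker would
collect, much as it collects C++ static initializers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a table mapping class names to Class objects.
final class ClassTable {
    private final Map<String, Class<?>> byName = new HashMap<>();

    // Stands in for one linker-collected entry per compiled class.
    void register(Class<?> cls) {
        Class<?> previous = byName.putIfAbsent(cls.getName(), cls);
        if (previous != null) {
            throw new IllegalStateException("duplicate class: " + cls.getName());
        }
    }

    // Returns the pre-compiled class, or null if the name is unknown,
    // in which case a VM could fall back to loading bytecodes.
    Class<?> lookup(String name) {
        return byName.get(name);
    }
}
```

A lookup that fails here is the point at which dynamic loading of
bytecode classes would take over.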

Running a compiled Java program will need a suitable Java run-time
environment. This contains support for threads, garbage collection,
and all the primitive Java methods. Complete Java support also means
being able to dynamically load new bytecode classes. Hence the
appropriate Java environment is basically a Java Virtual Machine.
We are using the Kaffe free Java VM (written by Tim Wilkinson),
but enhancing and modifying it to be more suitable for pre-compiled
code. (For example, we are simplifying the data structures.)
Kaffe includes a JIT compiler, which solves the problem of calling
between pre-compiled and dynamically loaded methods (since both use
the same calling convention).

We plan to enhance gdb (the GNU debugger) so it can understand
Java-compiled code. This may involve accessing Java meta-data
from the Java executable. We may also enhance gdb to understand
dynamically-loaded bytecodes, but the need for that is reduced
if we instead provide a hook so gdb knows about JIT-compiled code.