To perform meaningful experiments in optimizing compilation and
run-time system design, researchers usually rely on a suite of
benchmark programs relevant to the optimization technique under
consideration. Programs are described as numeric, memory-intensive,
concurrent, or object-oriented on the basis of a qualitative
appraisal, in some cases with little justification. We
believe it is beneficial to quantify the behavior of programs with a
concise and precisely defined set of metrics, in order to make these
intuitive notions of program behavior more concrete and subject to
experimental validation. We therefore define a set of unambiguous,
dynamic, robust and architecture-independent metrics that can be used
to categorize programs according to their dynamic behavior in five
areas: size, data structure, memory use, concurrency, and
polymorphism. A framework computing some of these metrics for Java
programs is presented along with specific results.
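
To make the polymorphism category concrete, here is a minimal sketch of one possible dynamic, architecture-independent metric: the fraction of executed call sites that dispatched to more than one receiver class. The metric definition, the class name `PolymorphismMetric`, and the trace format (a call-site identifier paired with a receiver class name) are all hypothetical illustrations. They are not the metric definitions or the framework described in this paper.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a dynamic polymorphism metric: the fraction of
 * executed call sites that saw two or more receiver classes at run time.
 * The (callSiteId, receiverClass) event format is an assumption.
 */
public final class PolymorphismMetric {
    // Distinct receiver classes observed at each executed call site.
    private final Map<String, Set<String>> receiversBySite = new HashMap<>();

    /** Records one executed virtual call. */
    public void record(String callSiteId, String receiverClass) {
        receiversBySite.computeIfAbsent(callSiteId, k -> new HashSet<>()).add(receiverClass);
    }

    /** Number of distinct call sites executed at least once. */
    public int executedSites() {
        return receiversBySite.size();
    }

    /** Fraction of executed call sites that dispatched to more than one class. */
    public double polymorphicSiteFraction() {
        if (receiversBySite.isEmpty()) {
            return 0.0;
        }
        long polymorphic = receiversBySite.values().stream()
                .filter(classes -> classes.size() > 1)
                .count();
        return (double) polymorphic / receiversBySite.size();
    }

    public static void main(String[] args) {
        PolymorphismMetric m = new PolymorphismMetric();
        // Hypothetical trace: site A is monomorphic, site B is polymorphic.
        m.record("A", "ArrayList");
        m.record("A", "ArrayList");
        m.record("B", "Circle");
        m.record("B", "Square");
        System.out.println(m.polymorphicSiteFraction()); // prints 0.5
    }
}
```

Because the value depends only on which classes the program dispatches to, and not on cache sizes or clock rates, a metric of this shape stays the same across hardware platforms.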

14:00 - 14:30

How Java Programs Interact with Virtual Machines at the Microarchitectural Level

Java workloads are becoming increasingly prominent on various
platforms, ranging from embedded systems through general-purpose
computers to high-end servers. Understanding the implications of every
aspect of running Java workloads is thus extremely important when
designing a system that will run such workloads, if it is to meet its
design goals. In other words, understanding the interaction between
the Java application, its input, and the virtual machine it runs on is
key to a successful design. The goal of this paper is to study this
complex interaction at the microarchitectural level, e.g., by
analyzing branch behavior, cache behavior, etc. This is done by
measuring a large number of performance
characteristics using performance counters on an AMD K7 Duron
microprocessor. These performance characteristics are measured for
seven virtual machine configurations, and a collection of Java
benchmarks with corresponding inputs coming from the SPECjvm98
benchmark suite, the SPECjbb2000 benchmark suite, the Java Grande
Forum benchmark suite, and an open-source raytracer called Raja with
19 scene descriptions. This large amount of data is further analyzed
using statistical data analysis techniques, namely principal
components analysis and cluster analysis. These techniques provide
useful insights in an understandable way.
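
As an illustration of the first step of principal components analysis on such data, the sketch below treats each row as one (benchmark, input, virtual machine) configuration and each column as one measured characteristic. It standardizes the columns, forms their correlation matrix, finds the dominant eigenvector by power iteration, and projects each configuration onto it. The class name, the use of power iteration instead of a full eigen-decomposition, and the example values are assumptions made for brevity. Cluster analysis would then operate on the resulting scores; it is not shown here.

```java
import java.util.Arrays;

/** Hypothetical sketch: scores on the first principal component of a data matrix. */
public final class FirstPrincipalComponent {

    /** Standardizes each column to zero mean and unit (population) variance. */
    static double[][] standardize(double[][] x) {
        int n = x.length, p = x[0].length;
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0;
            for (double[] row : x) mean += row[j];
            mean /= n;
            double var = 0;
            for (double[] row : x) var += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(var / n);
            for (int i = 0; i < n; i++) {
                // A constant column carries no information; map it to zero.
                z[i][j] = sd == 0 ? 0 : (x[i][j] - mean) / sd;
            }
        }
        return z;
    }

    /** Correlation matrix of standardized data: (Z^T Z) / n. */
    static double[][] correlation(double[][] z) {
        int n = z.length, p = z[0].length;
        double[][] c = new double[p][p];
        for (int a = 0; a < p; a++) {
            for (int b = 0; b < p; b++) {
                double s = 0;
                for (double[] row : z) s += row[a] * row[b];
                c[a][b] = s / n;
            }
        }
        return c;
    }

    /** Dominant eigenvector of a symmetric matrix, found by power iteration. */
    static double[] dominantEigenvector(double[][] c, int iterations) {
        int p = c.length;
        double[] v = new double[p];
        Arrays.fill(v, 1.0 / Math.sqrt(p));
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[p];
            for (int a = 0; a < p; a++)
                for (int b = 0; b < p; b++) w[a] += c[a][b] * v[b];
            double norm = 0;
            for (double wi : w) norm += wi * wi;
            norm = Math.sqrt(norm);
            if (norm == 0) return v; // degenerate all-zero matrix
            for (int a = 0; a < p; a++) v[a] = w[a] / norm;
        }
        return v;
    }

    /** Projects each standardized row onto the first principal component. */
    static double[] firstComponentScores(double[][] x) {
        double[][] z = standardize(x);
        double[] v = dominantEigenvector(correlation(z), 200);
        double[] scores = new double[z.length];
        for (int i = 0; i < z.length; i++)
            for (int j = 0; j < v.length; j++) scores[i] += z[i][j] * v[j];
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical illustrative values, not measurements from the paper:
        // columns = (branch mispredictions, cache misses) per 1000 instructions.
        double[][] x = { {2.0, 10.0}, {4.0, 20.0}, {6.0, 30.0}, {8.0, 40.0} };
        System.out.println(Arrays.toString(firstComponentScores(x)));
    }
}
```

Standardizing first keeps characteristics with large raw magnitudes (such as miss counts) from dominating the components. A production analysis would use a numerical library's eigen-solver to obtain all components rather than only the first.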

From our experiments, we conclude that (i) the behavior observed
at the microarchitectural level is primarily determined by the
virtual machine for small input sets, e.g., the SPECjvm98 s1 input
set; (ii) the behavior can be quite different for various input sets,
e.g., short-running versus long-running benchmarks; (iii) for
long-running benchmarks with few hot spots, the behavior can be
primarily determined by the Java program and not the virtual machine,
i.e., all the virtual machines optimize the hot spots to similarly
behaving native code; (iv) in general, the behavior of a Java
application running on one virtual machine can be significantly
different from its behavior on another virtual machine. These
conclusions warn researchers working on Java workloads to be careful
when using a limited number of Java benchmarks or virtual machines,
since this might lead to biased conclusions.

14:30 - 15:00

Effectiveness of Cross-Platform Optimizations for a Java Just-In-Time Compiler

We present an overview of our Java JIT compiler, which has been the
basis for the latest production version of the IBM Java virtual
machine, supporting a diverse set of processor architectures,
including 32-bit and 64-bit modes and CISC, RISC, and VLIW
architectures. In particular, we focus on the design and evaluation
of the cross-platform optimizations that are common across different
architectures. We study the effectiveness of each optimization by
selectively disabling it in our JIT compiler on three different
platforms: IA32, IA64, and PowerPC. Based on detailed statistics, we
classify our optimizations and identify a small set of the most
cost-effective ones, treating performance improvement as the benefit
and compilation time as the cost. In summary, we demonstrate that,
with a selected set of optimizations, we can achieve 90% of the peak
performance for SPECjvm98 at the cost of only 33% of the total
compilation time, compared with the configuration in which all
optimizations are enabled.
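
The cost/benefit classification can be pictured with a small sketch. For each optimization, a leave-one-out run yields the performance lost and the compilation time saved when that optimization alone is disabled. Ranking by their ratio puts the most cost-effective optimizations first. The class and field names, the ratio-based ranking, and the placeholder values are hypothetical. They are not the paper's actual classification procedure or data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: rank optimizations by benefit (performance) per cost (compile time). */
public final class OptimizationRanking {

    /** Leave-one-out measurements for one optimization, both expressed as fractions. */
    record Measurement(String name, double perfLossWhenDisabled, double compileTimeSavedWhenDisabled) {
        double benefitPerCost() {
            // Guard against division by zero for optimizations with negligible compile cost.
            return perfLossWhenDisabled / Math.max(compileTimeSavedWhenDisabled, 1e-9);
        }
    }

    /** Returns optimization names ordered from most to least cost-effective. */
    static List<String> rank(List<Measurement> ms) {
        List<Measurement> sorted = new ArrayList<>(ms);
        sorted.sort(Comparator.comparingDouble(Measurement::benefitPerCost).reversed());
        List<String> names = new ArrayList<>();
        for (Measurement m : sorted) names.add(m.name());
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical placeholder values, not results from the paper.
        List<Measurement> ms = List.of(
                new Measurement("optA", 0.20, 0.05),
                new Measurement("optB", 0.02, 0.10),
                new Measurement("optC", 0.10, 0.02));
        System.out.println(rank(ms)); // prints [optC, optA, optB]
    }
}
```

Leave-one-out measurements ignore interactions between optimizations. One optimization may only pay off when another is enabled. Therefore, a selected subset chosen this way should be validated by running it as a whole, as the abstract's 90%/33% comparison does.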