First we present some results for the computational kernel of the
multigrid code, namely the unaccelerated red-black relaxation algorithm
of Figure 1.
Figure 6 gives our results
for this kernel on a 512 by 512 matrix. The results are encouraging.
The HPJava version scales well, and eventually comes quite
close to the HPF code (absolute megaflop performances are modest,
but this feature was observed for all our codes,
and seems to be a property of the hardware).

The flat lines at the bottom of the graph give the sequential Java
and Fortran performances, for orientation. We did not use any
automatic parallelization feature here.
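
For orientation, the kind of kernel being timed can be sketched in plain sequential Java. This is only an illustration under stated assumptions, not the HPJava code of Figure 1: the class name, the method names, and the choice of a constant boundary value are all hypothetical, and only the 512 by 512 problem size is taken from the text.

```java
// Sequential red-black relaxation sketch (plain Java, not HPJava).
// Class name, method names, and boundary values are hypothetical.
final class RedBlackRelaxation {

    // Build an n-by-n grid: boundary cells hold `boundary`, interior cells start at 0.
    static double[][] newGrid(int n, double boundary) {
        double[][] u = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                boolean edge = i == 0 || j == 0 || i == n - 1 || j == n - 1;
                u[i][j] = edge ? boundary : 0.0;
            }
        }
        return u;
    }

    // One full sweep: update all "red" interior points ((i + j) even),
    // then all "black" ones ((i + j) odd), each from its four neighbours.
    static void sweep(double[][] u) {
        int n = u.length;
        for (int parity = 0; parity < 2; parity++) {
            for (int i = 1; i < n - 1; i++) {
                // First interior column j >= 1 with (i + j) % 2 == parity.
                int jStart = 1 + ((i + 1 + parity) % 2);
                for (int j = jStart; j < n - 1; j += 2) {
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1]);
                }
            }
        }
    }

    // Largest absolute residual of the discrete Laplace equation over interior points.
    static double maxResidual(double[][] u) {
        int n = u.length;
        double worst = 0.0;
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                double avg = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                   + u[i][j - 1] + u[i][j + 1]);
                worst = Math.max(worst, Math.abs(u[i][j] - avg));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        double[][] u = newGrid(512, 1.0);   // problem size used in the text
        for (int k = 0; k < 100; k++) {
            sweep(u);
        }
        System.out.println("max residual after 100 sweeps: " + maxResidual(u));
    }
}
```

Points of one colour depend only on points of the other colour, so each half-sweep can be updated in any order, which is what makes the scheme attractive for data-parallel languages.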

Figure 6:
Red-black relaxation of the two-dimensional Laplace equation
with size of 512 by 512.

Corresponding results for the complete multigrid code are given in
Figure 7.
The results here are not as good as for simple red-black relaxation: both
the HPJava speed relative
to HPF and the parallel speedup of HPF and HPJava are less satisfactory.

The poor performance of HPJava relative to Fortran
in this case can be attributed largely to the naive nature of the translation
scheme used by the current HPJava system. The overheads
are especially significant when there are many very tight overall
constructs (with short bodies). We saw several of these in section
3. Experiments done elsewhere
[13] lead us to believe
these overheads can be reduced by straightforward optimization
strategies which, however, are not yet incorporated in our
source-to-source translator.

The modest parallel speedup of both HPJava and HPF is due to communication
overheads. The fact that HPJava and HPF have similar scaling
behavior, while the absolute performance of HPJava is lower, suggests that
the communication
library of HPJava is slower than the communication layer of the native SP3
HPF (otherwise the performance gap would close for larger
numbers of processors).
This is not too surprising because Adlib is built on top of a portability
layer called mpjdev, which is in turn layered on MPI. We assume
the SP3 HPF is more carefully optimized for the hardware.
Of course the lower layers of Adlib could be ported
to exploit low-level features of the hardware (we already did some
experiments in this direction, interfacing Java to LAPI [14]).
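
The scaling argument above can be made explicit with a simple cost model. This is only a sketch, and every symbol is an assumption introduced for illustration rather than a measured quantity: $W$ is the total work, $p$ the number of processors, $c_J$ and $c_F$ the per-unit compute costs of HPJava and HPF, and $k_J(p)$ and $k_F(p)$ their communication costs.

```latex
T_J(p) \approx \frac{c_J W}{p} + k_J(p), \qquad
T_F(p) \approx \frac{c_F W}{p} + k_F(p)

\frac{T_J(p)}{T_F(p)}
  = \frac{c_J W / p + k_J(p)}{c_F W / p + k_F(p)}
  \approx \frac{k_J(p)}{k_F(p)}
  \quad \text{when } \frac{c_J W}{p} \ll k_J(p)
  \text{ and } \frac{c_F W}{p} \ll k_F(p)
```

Under this model, if the two systems had equal communication costs, the ratio would approach 1 as $p$ grows for fixed $W$. A gap that persists at larger processor counts is therefore consistent with $k_J(p) > k_F(p)$.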