This section presents performance results
for a Java Linpack benchmark [10]
that solves a system of linear equations (i.e., factorization followed
by forward and back substitution).
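
The two phases named above can be sketched in a few lines of Java. This is a minimal illustration and not the code of the benchmark [10]: the class name `LinpackSketch`, the row-major `double[][]` layout, and the choice to interchange whole rows are assumptions made for readability.

```java
// Hypothetical sketch: Gaussian elimination with partial pivoting (factorization),
// followed by forward and back substitution.
class LinpackSketch {
    /** Factors a in place so that P*A = L*U; ipvt[k] records the row swapped with row k. */
    static void factor(double[][] a, int[] ipvt) {
        int n = a.length;
        for (int k = 0; k < n - 1; k++) {
            // Pivot search: row with the largest magnitude in column k.
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
            }
            ipvt[k] = p;
            if (a[p][k] == 0.0) throw new ArithmeticException("zero pivot in column " + k);
            double[] tmp = a[p]; a[p] = a[k]; a[k] = tmp;   // interchange whole rows

            // Elimination: subtract multiples of row k from the rows below it.
            for (int i = k + 1; i < n; i++) {
                double m = a[i][k] / a[k][k];
                a[i][k] = m;                                // multiplier stored in place of the zero
                for (int j = k + 1; j < n; j++) a[i][j] -= m * a[k][j];
            }
        }
        ipvt[n - 1] = n - 1;
    }

    /** Overwrites b with the solution x of A*x = b, using the output of factor. */
    static void solve(double[][] a, int[] ipvt, double[] b) {
        int n = a.length;
        // Apply the recorded row interchanges to the right-hand side.
        for (int k = 0; k < n - 1; k++) {
            double t = b[ipvt[k]]; b[ipvt[k]] = b[k]; b[k] = t;
        }
        // Forward substitution with the unit lower triangular factor L.
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < i; j++) b[i] -= a[i][j] * b[j];
        }
        // Back substitution with the upper triangular factor U.
        for (int i = n - 1; i >= 0; i--) {
            for (int j = i + 1; j < n; j++) b[i] -= a[i][j] * b[j];
            b[i] /= a[i][i];
        }
    }
}
```

Calling `factor(a, ipvt)` once and then `solve(a, ipvt, b)` leaves the solution in `b`; the factorization dominates the cost, which is why it is the natural target for optimization.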

Table 1: Linpack benchmark on the IBM

Version                                                 n  Mflops  Time (s)  Norm Res
---------------------------------------------------  ----  ------  --------  --------
JDK1.0.2                                              500     3.3      25.8      5.28
JDK1.0.2                                             1000     3.2     210.9      9.61
JDK1.0.2 + native Level 1 BLAS                        500     8.1      10.3      4.50
JDK1.0.2 + native Level 1 BLAS                       1000     7.6      88.5     11.13
JDK1.0.2 + native Level 1 BLAS (parallel factorize)   500     9.7       8.6      4.50
JDK1.0.2 + native Level 1 BLAS (parallel factorize)  1000    15.6      43.0     11.13
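
The Mflops and Time entries of Table 1 can be cross-checked against each other. The sketch below assumes that the benchmark uses the nominal Linpack operation count of 2n^3/3 + 2n^2 floating-point operations for factorization plus solve; the source does not state the count, and the class and method names are illustrative only.

```java
// Hypothetical helper: converts a measured solve time into a Linpack-style Mflops rate.
class LinpackRate {
    /** Nominal operation count (assumed): 2/3 n^3 for the factorization, 2 n^2 for the solve. */
    static double flops(int n) {
        double d = n;
        return (2.0 * d * d * d) / 3.0 + 2.0 * d * d;
    }

    /** Rate in millions of floating-point operations per second. */
    static double mflops(int n, double seconds) {
        return flops(n) / (seconds * 1.0e6);
    }

    public static void main(String[] args) {
        // Timings taken from Table 1.
        System.out.println("n=500,  pure Java:   " + mflops(500, 25.8));
        System.out.println("n=1000, native BLAS: " + mflops(1000, 88.5));
        System.out.println("n=1000, parallel:    " + mflops(1000, 43.0));
    }
}
```

Under this assumption the computed rates agree with the tabulated ones to within the rounding of the reported times (for example, roughly 3.25 Mflops for n = 500 in 25.8 s, against the 3.3 listed).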

Tables 1-3 show
the results on the IBM, Sun,
and SGI, respectively,
for a pure Java implementation of this benchmark and an implementation
that uses native implementations of the primitives
DDOT, DAXPY, DSCAL, and IDAMAX from
Level 1 BLAS. For the IBM, we also present the performance
of a version in which loop parallelization has been applied
to the elimination step in the factorization.
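
For reference, the four primitives and the loop that parallelization targets can be sketched in pure Java as follows. This is an illustration under stated assumptions, not the code that was measured: the class name `Blas1Sketch`, the offset-based signatures, the column-wise `a[j]` storage, and the use of one plain `Thread` per block of columns are all hypothetical, and the source does not say how the parallel loop was implemented.

```java
// Hypothetical pure-Java Level 1 BLAS kernels and a factorization that uses them.
class Blas1Sketch {
    /** DAXPY: y[yOff+i] += alpha * x[xOff+i] for 0 <= i < n. */
    static void daxpy(int n, double alpha, double[] x, int xOff, double[] y, int yOff) {
        for (int i = 0; i < n; i++) y[yOff + i] += alpha * x[xOff + i];
    }

    /** DDOT: dot product of two length-n slices (not needed by factor below). */
    static double ddot(int n, double[] x, int xOff, double[] y, int yOff) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) sum += x[xOff + i] * y[yOff + i];
        return sum;
    }

    /** DSCAL: x[xOff+i] *= alpha for 0 <= i < n. */
    static void dscal(int n, double alpha, double[] x, int xOff) {
        for (int i = 0; i < n; i++) x[xOff + i] *= alpha;
    }

    /** IDAMAX: 0-based offset of the first entry of largest magnitude, or -1 if n < 1. */
    static int idamax(int n, double[] x, int xOff) {
        if (n < 1) return -1;
        int best = 0;
        for (int i = 1; i < n; i++) {
            if (Math.abs(x[xOff + i]) > Math.abs(x[xOff + best])) best = i;
        }
        return best;
    }

    /** Elimination step k applied to columns jLo..jHi-1; a[j] is column j, p the pivot row. */
    static void updateColumns(double[][] a, int k, int p, int jLo, int jHi) {
        int n = a.length;
        for (int j = jLo; j < jHi; j++) {
            double t = a[j][p];
            a[j][p] = a[j][k];                    // row interchange within column j
            a[j][k] = t;
            daxpy(n - k - 1, t, a[k], k + 1, a[j], k + 1);
        }
    }

    /** In-place factorization; the independent column updates are split over nThreads threads. */
    static void factor(double[][] a, int[] ipvt, int nThreads) {
        int n = a.length;
        for (int k = 0; k < n - 1; k++) {
            final int col = k;
            final int p = k + idamax(n - k, a[k], k);
            ipvt[k] = p;
            if (a[k][p] == 0.0) throw new ArithmeticException("zero pivot in column " + k);
            double t = a[k][p]; a[k][p] = a[k][k]; a[k][k] = t;
            dscal(n - k - 1, -1.0 / a[k][k], a[k], k + 1);   // negated multipliers

            int cols = n - k - 1;                            // columns still to be updated
            Thread[] workers = new Thread[nThreads];
            for (int w = 0; w < nThreads; w++) {
                final int lo = k + 1 + (int) ((long) cols * w / nThreads);
                final int hi = k + 1 + (int) ((long) cols * (w + 1) / nThreads);
                workers[w] = new Thread(() -> updateColumns(a, col, p, lo, hi));
                workers[w].start();
            }
            for (Thread w : workers) {
                try {
                    w.join();                                // barrier before the next step
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during elimination", e);
                }
            }
        }
        ipvt[n - 1] = n - 1;
    }
}
```

Each column update touches only its own column `a[j]` and reads the already scaled pivot column `a[k]`, so the updates within one step are independent and need no locking. Starting fresh threads at every step keeps the sketch short; a practical version would reuse a fixed set of workers.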

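Again, it is clear that providing native Level 1 BLAS
can improve the performance substantially: on the IBM, for example, the rate for n = 500
rises from 3.3 to 8.1 Mflops (Table 1). Note, however, that
the mapping between Java and C data types caused
a slight change in the precision of the computed result (compare the Norm Res values in Table 1).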
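
To put the size of this change in perspective, it helps to recall how a Linpack-style normalized residual is usually defined. The formula below is an assumption about what the Norm Res column reports, since the source does not define it: x is the computed solution, ε the machine epsilon of double precision, and ‖A‖ a magnitude measure of the matrix (commonly its largest entry in absolute value).

```latex
\[
  \text{Norm Res}
  \;=\;
  \frac{\max_i \,\lvert (Ax - b)_i \rvert}
       {n \,\lVert A \rVert \,\max_i \lvert x_i \rvert \;\varepsilon}
\]
```

Under this definition, values of order 1 to 10 indicate a residual within a small multiple of rounding error, so a difference such as 5.28 versus 4.50 reflects a different accumulation of rounding errors and not a loss of accuracy.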