This JSR proposes extensions to the Java Programming Language
and Java Virtual Machine that support more efficient execution of
floating point code.

Section 1. Identification

Submitting Member: International Business Machines Corporation

Name of Contact Person: Marc Snir

E-Mail Address: snir@us.ibm.com

Telephone Number: 1-914-945-3204

Fax Number: 1-914-945-4425

Specification Lead: Marc Snir, IBM Corporation

E-Mail Address: snir@us.ibm.com

Telephone Number: 1-914-945-3204

Fax Number: 1-914-945-4425

Initial Expert Group Membership:

International Business Machines Corporation

Sun Microsystems, Inc.

Endorsers of this JSR include individuals from:

National Institute of Standards and Technology

The Mathworks, Inc.

Section 2: Request

2.1 Please describe the proposed Specification:

Version 1.1 of the Java platform specification gave strict
rules for floating point semantics, using the IEEE 754 Standard for
Binary Floating-Point Arithmetic. These rules enforced bit-by-bit
reproducibility of floating point results across implementations.
As a result, in some cases, the rules also significantly impaired floating point
performance by effectively prohibiting certain code generation optimizations
and the use of certain native operations on some processors.
Version 1.2 of the Java platform specification permitted a
relaxation of the rules for floating point semantics.
These new rules allow an exponent range larger than that
specified by earlier versions of Java to be
used in certain situations. This improved the achievable
performance of Java platform implementations for certain popular
microprocessors at the cost of bit-by-bit reproducibility of floating
point calculations on those processors. However, the relaxed rules still impair performance
in many important cases. In particular:

The current Java platform specifications prevent the use
of hardware features such as the Fused Multiply Add (FMA) operation on
systems such as Intel IA64, PowerPC, PA-RISC 2.0, and MIPS IV. The FMA operation
computes a*b+c, where a, b and c are values representable in the IEEE 754 formats
double (or float); the
result is within 0.5 ulp (unit in the last place) of the exact answer. The FMA operation does not
round the intermediate product before performing the addition; therefore,
the result may be (slightly) different from that obtained by computing
the product, rounding it back to an IEEE 754 double (resp., float), and then performing
the addition. On processors with FMA instructions, an FMA typically offers twice the throughput of a
multiplication followed by an addition.

The current Java platform specifications prohibit common
code generation optimizations. Such optimizations transform programs using field
axioms that hold for real arithmetic but that hold only approximately for
floating-point arithmetic.
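
The difference between a fused and an unfused multiply-add can be observed directly. The following is a minimal sketch, not part of the proposal: it uses Math.fma, a library method added to Java long after this JSR was written, purely as a stand-in for a hardware FMA. The class and method names are illustrative only.

```java
public class FmaRoundingDemo {
    // Two roundings: a*b is rounded to double, then the sum is rounded again.
    public static double unfused(double a, double b, double c) {
        return a * b + c;
    }

    // One rounding: the exact value of a*b+c is formed, then rounded once.
    // Math.fma (Java 9+) stands in here for a hardware FMA instruction.
    public static double fused(double a, double b, double c) {
        return Math.fma(a, b, c);
    }

    public static void main(String[] args) {
        double a = 1.0 + 0x1p-27;   // 1 + 2^-27
        double b = 1.0 - 0x1p-27;   // 1 - 2^-27
        double c = -1.0;
        // The exact product is 1 - 2^-54, which lies halfway between two
        // doubles and rounds to 1.0, so the unfused result is 0.0.
        System.out.println("unfused: " + unfused(a, b, c));
        // The fused form keeps the exact product, so the result is -2^-54.
        System.out.println("fused:   " + fused(a, b, c));
    }
}
```

Both answers are correctly rounded for the operations actually performed; they differ only because the unfused form rounds the intermediate product.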

While it is sometimes desirable to maintain
bit-by-bit reproducibility of floating-point operations, such strictness
is not always required. Floating-point arithmetic is an approximation
to real arithmetic, and rounding errors are unavoidable. From a numeric
viewpoint, in order to improve performance and/or accuracy, it is often
acceptable to replace a computation with another.
This relaxation of floating-point rules can be adopted as an option by the
Java platform, provided that strict reproducibility can be enforced
when needed and that suitable restrictions are set on implementations
exploiting the relaxed rules.

The Java Grande Numerics Work Group has
discussed several proposals to fix this problem, focusing on proposals
that:

Introduce only limited platform dependence
and do so only for codes that allow the relaxation of floating-point rules.

Have limited impact on
accuracy.

Observe certain consistency and
reproducibility constraints.

Require modest implementation effort,
and negligible implementation effort on platforms that are not affected
by the current Java floating point restrictions.

Add at most one additional keyword.

This JSR is also guided by the above constraints.

We propose to add an FP-fast floating point
mode. This mode will be associated with methods declared using the fastfp modifier,
and with the methods of classes and interfaces declared using the fastfp
modifier. It is thus superficially analogous to the FP-strict floating point mode
and strictfp modifier introduced in Java platform version 1.2.

The fastfp modifier can be used as a
modifier in a method declaration, in which case it applies to the method
itself. It can also be used as a modifier in a class or interface
declaration. It is a compile time error for a given method,
class, or interface declaration to contain both the strictfp and
fastfp modifiers.
The detailed specification will define if and when method modifiers can override
class modifiers.

A compiler for the Java programming language will recognize the fastfp
modifier and will accordingly set an ACC_FAST bit flag in the
method_info structure for each method within a declaration
bearing the fastfp modifier. (That is, declaring a class to
be fastfp causes all of its methods to be FP-fast; declaring
an interface to be fastfp causes its static initializer to be
FP-fast.) The ACC_FAST bit indicates to
the Java virtual machine that the associated method can be executed so
as to take best advantage of the underlying floating point
hardware and advanced code generation
optimizations, while respecting the constraints of the FP-fast mode.
The detailed specification will define those constraints on the behavior of
FP-fast methods, so that different implementations will
produce the same results up to acceptable rounding errors.

A method that does not have its ACC_FAST
bit set will be interpreted as having the default floating point mode or,
if its ACC_STRICT flag is set, having the floating point mode
FP-strict.
Thus, the behavior of methods not declared using the fastfp modifier
(and the behavior of preexisting binary classes) is not changed by this proposal.
A VerifyError will be thrown at the verification phase of class
linking if a method has both ACC_FAST and ACC_STRICT
set.
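
The flag interpretation described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the specified behavior: the proposal names ACC_FAST but assigns it no bit value, so the mask below is a hypothetical placeholder, as are the class, enum, and method names. Modifier.STRICT is the existing constant for ACC_STRICT.

```java
import java.lang.reflect.Modifier;

public class FpModeSketch {
    public enum FpMode { DEFAULT, STRICT, FAST }

    // HYPOTHETICAL placeholder: the proposal does not define the value of ACC_FAST.
    public static final int ACC_FAST_ASSUMED = 0x4000;

    // Maps a method's access flags to its floating point mode.
    public static FpMode resolve(int accessFlags) {
        boolean strict = Modifier.isStrict(accessFlags);          // ACC_STRICT set?
        boolean fast = (accessFlags & ACC_FAST_ASSUMED) != 0;     // assumed ACC_FAST set?
        if (strict && fast) {
            // The proposal calls for a VerifyError during the verification phase.
            throw new VerifyError("method has both ACC_FAST and ACC_STRICT set");
        }
        if (fast) {
            return FpMode.FAST;
        }
        return strict ? FpMode.STRICT : FpMode.DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(resolve(0));                 // no floating point flag
        System.out.println(resolve(Modifier.STRICT));   // ACC_STRICT only
        System.out.println(resolve(ACC_FAST_ASSUMED));  // assumed ACC_FAST only
    }
}
```

Because a method without the assumed ACC_FAST bit falls through to the existing two cases, preexisting class files resolve exactly as they do today.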

The FP-strict and the default floating point mode
are each a valid implementation of the FP-fast mode; thus, any Java
virtual machine implementation can trivially provide support for FP-fast
methods.

The class java.lang.reflect.Modifier,
which currently supports querying whether a method is FP-strict,
will be amended to permit querying whether a method is FP-fast.
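
For reference, the existing FP-strict query looks like the sketch below; an FP-fast query would presumably follow the same pattern, although the proposal does not name the new method. The class and helper names here are illustrative only. Whether a compiled method actually carries the strict flag depends on the compiler and class file version, so the output of main is not fixed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class StrictQueryDemo {
    // Describes a method by name and modifier bits using the existing
    // Modifier.isStrict query.
    public static String describe(String name, int modifiers) {
        boolean strict = Modifier.isStrict(modifiers);
        return name + ": " + (strict ? "FP-strict" : "not flagged FP-strict");
    }

    public static void main(String[] args) {
        // Inspect the methods declared by this class itself.
        for (Method m : StrictQueryDemo.class.getDeclaredMethods()) {
            System.out.println(describe(m.getName(), m.getModifiers()));
        }
        // Modifier.toString renders the strict bit as the source keyword.
        System.out.println(Modifier.toString(Modifier.STRICT));
    }
}
```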

The precise meaning of fastfp -- i.e.,
the extent to which the behavior of FP-fast methods can deviate from
the behavior of methods of the existing floating point modes -- will
be elaborated by the expert group. The following two options
were specifically proposed by the Java Grande Forum:

An expression of the form a*b+c can be replaced
by an FMA.

Floating point operations may be reordered,
assuming floating-point arithmetic to be associative.
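
The effect of the second option can be shown with a three-term sum. The following is a minimal sketch with illustrative names; the inputs are chosen so that regrouping changes the result, which is the kind of deviation the FP-fast mode would have to permit and bound.

```java
public class ReassociationDemo {
    // Evaluation order required by the current rules: (a + b) + c.
    public static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    // Order a compiler could choose if addition were treated as associative.
    public static double regrouped(double a, double b, double c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        double a = -0x1p53;  // -2^53
        double b = 0x1p53;   //  2^53
        double c = 1.0;
        // (a + b) is exactly 0, so the sum is 1.0.
        System.out.println("left to right: " + leftToRight(a, b, c));
        // 2^53 + 1 is not representable and rounds to 2^53, so the sum is 0.0.
        System.out.println("regrouped:     " + regrouped(a, b, c));
    }
}
```

For most inputs the two orders agree to within a rounding error; cases with heavy cancellation, like this one, are where the difference becomes visible.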

Other proposals for possible optimizations in the fastfp mode
will be examined by the expert group.
The preference will be for nonprocedural specifications of the
allowed optimizations, as procedural specifications tend to become obsolete
sooner and to hamper future technological enhancements.
(An example of this situation is the procedural definition of the
behavior of trigonometric functions in Java.)
The specification for the use of FMA is an exception to this rule, given
its fundamental importance to linear algebra codes and its availability as a hardware primitive with an axiomatic definition.
The associative rule is an axiomatic specification, and thus nonprocedural.
Of particular interest to this group are optimizations that also
improve the accuracy of floating-point computations.
(We define a computation as being more accurate if the final result it
produces is closer to the exact result.)
These optimizations can be valuable to some classes of applications
(in a sense, they make floating-point arithmetic behave more like real
arithmetic) and can be used when, and only if, appropriate for the
particular computation.

It is important to note that the fastfp specification will provide
a list (at least conceptually) of which optimizations are valid in this mode.
Implementations are free to implement a subset of this list.
However, any implementation of an optimization must respect some fundamental principles.
One of these principles is that optimizations should be temporally and
spatially consistent.
That is, during a particular execution of a Java program, the same instance of
a construct must always produce the same result, and two different
instances of the same construct must also produce the same result.
(This reproducibility, of course, only holds for the same set of inputs
on constructs that are deterministic under the strictfp mode
of operation.)
Another fundamental principle is that the list of possible (not necessarily performed)
optimizations that can be applied to a piece of code must be defined by the
syntax of the code only.
That is, the application developer must be able to tell, from the source code alone, which optimizations
a particular implementation can or cannot perform on that code.

Implementations will provide information on the
transformations enabled in fastfp mode using (read-only) java.util.Properties
entries for each distinct optimization, e.g. java.fastfp.fma. An implementation can potentially choose, on
startup, which optimizations it will exploit, and set the corresponding properties.
Implementations can provide means of controlling which optimizations are
exploited for a particular run and which property entries are set, e.g.
with command line flags.
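
An application could consult such a property as sketched below. The key java.fastfp.fma is the example given above; the "true"/"false" value format, the default of false when the entry is absent, and the class and method names are assumptions made for illustration, since the proposal leaves these details to the expert group.

```java
import java.util.Properties;

public class FastFpProperties {
    // Example key taken from the text; the set of keys is not yet specified.
    public static final String FMA_KEY = "java.fastfp.fma";

    // ASSUMPTION: the entry holds "true" or "false"; a missing entry means
    // the optimization is not exploited.
    public static boolean fmaEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(FMA_KEY, "false"));
    }

    public static void main(String[] args) {
        // On current virtual machines the entry is absent, so this prints false
        // unless the property is supplied, e.g. with -Djava.fastfp.fma=true.
        System.out.println("FMA exploited: " + fmaEnabled(System.getProperties()));
    }
}
```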

2.3 What need of the Java community will be addressed by the proposed specification?

The proposal will enable Java programs to achieve
competitive floating point performance by taking advantage of the
hardware of various widespread microprocessors and of more aggressive
restructuring by code generators.

2.4 Why isn't this need met by existing specifications?

Current specifications preclude using FMAs
and disable many common code generation optimizations.

2.5 Please give a short description of the underlying technology or technologies:

See 2.1.

2.6 Is there a proposed package name for the API Specification? (i.e., javapi.something, org.something, etc.)

Not relevant.

2.7 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?

The proposal can be implemented on any system
that currently supports Java, with minimal changes in compilers and Java
virtual machine implementations.
(The minimal change required in Java
virtual machine implementations is support for the fastfp
modifier in the reflection API.)
The effect of the ACC_FAST flag is system dependent.
(Note: Changes to a Java
virtual machine implementation are necessary in order to exploit the additional
optimizations enabled by the fastfp mode. However, since the effect
of ACC_FAST is system-dependent, Java virtual machine implementations
are not forced to perform any of these optimizations.)

2.8 Are there any security issues that cannot be addressed by the current security model?

No.

2.9 Are there any internationalization or localization issues?

No.

2.10 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

This JSR proposes extensions to the current Java
programming language and virtual machine specifications.
It also proposes a minor extension to the
java.lang.reflect.Modifier API.

2.11 Please describe the anticipated schedule for the development of this specification.

12/2000: First specification of fastfp extensions.

06/2001: Reference implementation of a Java compiler and
Java Virtual Machine that are fastfp-aware.

06/2001 - 08/2001: Open review of the fastfp
extension.

12/2001: Final specification of the fastfp
extension ready.

03/2002: Reference implementation and test suite
for final specification ready.

Section 3: Contributions

3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.