Marc Daumas presented on "past and future formalizations of the
IEEE 754, 854, and 754R standards." The presentation closely
followed a
slide handout;
since the guest rule was in effect, there was no grilling until
after the presentation was complete.

Kahan began the questioning with the following statement:
"I approve of what you're doing. I regard it as
necessary. My hope is that continuing in that direction will
ultimately lead to specifications and implementations which
are not merely verifiable, but also maintainable." He
then stated that language support is a greater problem than
hardware correctness, and that formal verification can
supplement, but not supplant, testing. Kahan supported his
statement with several examples:

On June 4, 1996, the Ariane 5 rocket went out of control
shortly after takeoff, turning cartwheels over French
Guiana until the onboard safety systems caused it to
self-destruct. In their presentation
of the inquiry
report, the inquiry board concluded that the
development program "did not include adequate
analysis and testing."

The Ariane 5 control software was written in Ada, where
overflow is treated as a fatal exception which causes a
computation to abort to a handler. Because the Ariane 5
rockets were more powerful than those of the Ariane 4, the
software to recalibrate the gyros, which was copied from
the Ariane 4 system, suffered an overflow. Since there
was no defined exception handler, the system wrote
debugging information to memory, overwriting information
needed by other parts of the system, and transferred
control to the second computer. The backup computer
suffered the same exception and transferred control back
to the first computer.

Had Ada supported the IEEE standard, the overflow would
have resulted in an exceptional value and a raised flag,
but no disastrous change in control flow.
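
For contrast, here is a minimal C sketch (hypothetical, not
the Ariane code) of the IEEE 754 default behavior alluded to
above: the overflowing operation delivers infinity, raises
the overflow flag, and execution simply continues.

    #include <stdio.h>
    #include <float.h>
    #include <fenv.h>

    int main(void)
    {
        volatile double big = DBL_MAX;  /* volatile keeps the overflow at run time */
        feclearexcept(FE_ALL_EXCEPT);
        double x = big * 2.0;           /* overflows */
        if (fetestexcept(FE_OVERFLOW))
            printf("overflow flag raised; x = %g\n", x);  /* x is +inf */
        return 0;                       /* no abort, no transfer of control */
    }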

Software is never fully tested if it is at all
interesting. It's unfair to say we didn't know what the
software should do in the exceptional circumstance; in any
interesting piece of software, there are cases like that.
We simply don't want the software to die messily in such
circumstances.

T800 verification:

In his paper, "Formal
Methods Applied to a Floating Point Number
System," Geoff Barrett verified that an Occam
implementation intended to mimic the Inmos T800
Transputer's floating point hardware did indeed meet a Z
specification intended to capture the requirements of IEEE
754. There were two mistakes. The first mistake was a
transcription error of the type Daumas mentioned in his
presentation. "The most negative binary
integer" was uttered instead of "-0"
somewhere in the specification, so that the routing in
add-subtract went the wrong way when signed zeros were
added. The second mistake was more vicious. Because the
Transputer was designed to do very rapid context switching,
the engineers wanted to minimize the amount of system
state. Since they regarded overflow, divide by
zero, and invalid exceptions as "mistakes," they
replaced three distinct flag bits with a single bit. So
the T800 did not conform to the standard.

Stack overflow on the 8087:

In Palmer and Morris's book on the 8087, they acknowledged
that one could hardly write software that handled stack
overflow naturally and without contortions. Worse,
software that would have worked on the 8087 would not work
on the 80287, and software for the 80287 would not work on
the 80387. Microsoft decided to simplify their lives by
enabling the invalid operation trap. On stack
over/underflow, the system will jump to a handler to tell
you to rewrite your program. The same trap occurs on
invalid operations like 0/0; the handler deals with this
situation by re-executing the instruction and then
clearing the flag so that the invalid trap will not occur
again immediately when control is returned to the program.
So even though the IEEE flags are supported
in the 80x87 hardware, they are not accessible from
software under Windows.

Pentium FDIV bug:

The algorithm on which the infamous buggy Pentium FDIV
instruction was based had been proved correct.

The proof was incorrect, and the incorrect proof had
already been published in the literature. The author of
the original paper on the algorithm introduced a cryptic
footnote which, to the knowledgeable and observant reader,
meant "you shouldn't take this too literally."
The FDIV implementors followed the proof as presented in
the paper and claimed their hardware design was now proven
correct. In fact, the design was correct; but the proof
wasn't.

The trouble was not in the design, but in its
transcription to hardware. The table specified in the
design was not correctly transferred to the PLA; some of
the entries in the table were incorrectly identified as
don't-cares. Unfortunately, the mistake affected table
entries which were infrequently exercised, and the test
battery designed for an older implementation failed to
expose the problem. The designers, having
"proved" the correctness of the design, believed
they did not need a new test battery.

Moral: Even after an algorithm is proved formally correct,
it must be tested. Testability is a design feature; for
instance, the algorithms for the 8087 were designed to be
tested. Some 14,000 test arguments were needed to check
the design for correctness. On the 8087, all the transcendentals were
tested with these test vectors, and nobody ever found a
bug not induced by some erroneous optimization introduced
later.

Kahan expressed additional concerns about the feasibility of
verifying some algorithms. There is a tradeoff between cost
and ease of proof; algorithms which can be proven using
pessimistic inequalities are often overkill. For example,
Kahan noted that the algorithms in fdlibm use tricks to modify
the coefficients of a Chebyshev approximation in order to
compensate for error due to the rounding of the stored
coefficients. If you use such a trick, the error due to
coefficient rounding becomes negligible compared to other
error sources, but proving the algorithm gets good results
becomes trickier.

Daumas mentioned that for small arguments, the problem of
checking correctness is largely solved. Kahan replied that
the only time such techniques work is when you can afford
large enough tables. For quad precision, for example, it is
unlikely that large tables of low-order approximants will
suffice.

Daumas noted that HOL and Coq are systems that check proofs; a
human must generate the proof. He compared it to a colleague
you can call any time who will be picky about your proofs, and
who will not tell you to shut up because he is tired of
listening. Kahan agreed that, if they are used to verify a
human-derived proof, automatic tools would enhance confidence
in the proof. However, he warned, such verification should be
considered not as a guarantee, but rather as corroborating
evidence of correctness. Kahan also noted that proofs
involving integrals, integral inequalities, and limits seemed
likely to be difficult to express; Daumas responded that Coq
does have some support for limits. Kahan gave an example
of a non-transcendental problem that he thought might be difficult
to verify; Daumas subsequently provided a solution, which
he sent to the mailing list.

Kahan and Daumas also briefly debated the applicability of
automated verification techniques to simulated extra
precision. Kahan thought Demmel and Hida's paper on
accurate summations might provide difficult examples.
Daumas noted that he had done some work on applications to
online arithmetic. Kahan responded that he could not
afford to spend that long.

Zuras also asked several questions. His first question was
whether automated tools could be used to determine whether a
set of test vectors exercised all the hypotheses of a proof.
The question as posed was never answered, but it did provoke
further discussion of testability and the design of the 8087.
Zuras noted that Daumas had mentioned a growing set of proved
facts about 754, and asked if any of those facts involved FMA
operations. Daumas responded that there were no such verified
facts yet, but that it should be simple to do the
verifications based on current work. Finally, Zuras noted
that Daumas works with Muller, and asked whether they had made
any progress on proving and verifying correct rounding.
Daumas thought that it would likely be a difficult task.

Matula asked Kahan: would it be worthwhile to have a flag to
signal when massive cancellation occurs? He noted that
something like this could be used to monitor inner product
accumulation to tell when more precision is needed, for
instance.

Kahan replied that the IBM 360 and the Sigma 7 both had
variants of such flags. Nobody could figure out how to use
them. Besides, he said, cancellation is not the bad news, but
only the bearer of bad news. As to extra precision in inner
products, that takes us to the XBLAS. When we are using
extended precision tricks, we don't want to avoid
cancellation; we want to court it!
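
As an illustration of courting cancellation (my example,
not one given at the meeting), the classic TwoSum
error-free transformation relies on exact cancellation to
recover the rounding error of an addition; building blocks
like this underlie XBLAS-style extended-precision
accumulation.

    #include <stdio.h>

    /* Knuth's TwoSum: s is the rounded sum of a and b, and e is exactly
     * the rounding error, recovered by subtractions that cancel exactly. */
    static void two_sum(double a, double b, double *s, double *e)
    {
        *s = a + b;
        double bb = *s - a;                /* the part of b that made it into s */
        *e = (a - (*s - bb)) + (b - bb);   /* what was lost to rounding */
    }

    int main(void)
    {
        double s, e;
        two_sum(1.0, 1e-20, &s, &e);
        printf("s = %g, e = %g\n", s, e);  /* s = 1, e = 1e-20: nothing was lost */
        return 0;
    }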

The sensible thing to do, said Kahan, is to re-do uncertain
computations in higher precision and see how many figures
agree. The technique is not perfect, but the probability of
success is high, and it is cheap compared to error analysis.
Programming languages that made such recomputations easy, he
said, would contribute far more to the safety of typical
programs than any attempts to promulgate proof techniques
(which will be used by specialists).
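
A minimal sketch (mine, with a made-up computation) of the
recompute-and-compare check Kahan describes:

    #include <stdio.h>
    #include <math.h>

    /* The same (hypothetical) computation carried out in float and in double. */
    static float  sum_float(int n)  { float  s = 0; for (int k = 1; k <= n; k++) s += 1.0f / k; return s; }
    static double sum_double(int n) { double s = 0; for (int k = 1; k <= n; k++) s += 1.0  / k; return s; }

    int main(void)
    {
        float  lo = sum_float(1000000);
        double hi = sum_double(1000000);
        double rel = fabs(hi - lo) / fabs(hi);
        /* Roughly -log10(rel) leading decimal digits of the float result can be trusted. */
        printf("float %.9g vs double %.17g: about %.0f digits agree\n",
               lo, hi, rel > 0 ? -log10(rel) : 16.0);
        return 0;
    }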

We began with our monthly discussion of table 4. At this
point, we specify that the first six predicates shall be
provided, the next five should be provided, and the remainder
are included in the document for informational purposes,
though they can be implemented by standard logical
combinations of the first eleven. Our attempted guiding
rationale in discussing the predicates is to say what they
are, not what they aren't. Jason suggested separating out
the "not" cases, possibly to the extent of putting them in
a separate table, with all the verbiage about predicate
negation there. We discussed Jim
Thomas's two-part proposal from a couple months ago; the first
part was approved last time, and the second part was
withdrawn, so the discussion was brief.
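
For instance (my illustration, not text from the draft),
the negated predicates are not redundant with the ordinary
relations once NaNs are involved, which is why the negation
verbiage needs a home:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = NAN, y = 1.0;
        printf("x <  y   : %d\n", x < y);      /* 0: the operands are unordered */
        printf("x >= y   : %d\n", x >= y);     /* 0: also unordered */
        printf("!(x < y) : %d\n", !(x < y));   /* 1: "not less than" includes unordered */
        return 0;
    }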

Alex Liu pointed out that there is no reference to signaling
NaN in this section, even in the original text. For library
implementors who want things to be quiet even in the face of
signaling NaNs, this is a problem. Hough observed that there
is also no way to make adds and subtracts quiet.

At this point, we again turned to the troubles of signaling
NaNs. Hough asked whether we still thought we might use
signaling NaNs to indicate unknowns; if so, he said, then we
ought to revisit some of our discussions on negating signaling
NaNs, for instance. Kahan pointed out that the origin of
signaling NaNs was a political compromise. There were members
on the original committee who thought infinity was too drastic
for overflow and zero too drastic for underflow, and so the
signaling NaN was proposed as a kludge to allow them to
implement whatever they wanted. Other people found different
uses, but not many. The biggest use is to represent
uninitialized data.

Zuras asked whether we could just use quiet NaNs for
uninitialized data. The only operations which are not NaN
preserving are those which would ignore the operand anyhow.
Hough argued that uninitialized data is still a compelling
reason to maintain signaling NaNs. Kahan thought the right
idea, even if the implementation is difficult, is that when
a signaling NaN is used as an arithmetic operand, it is
first replaced either with a quiet NaN or with some
quantity determined by the programmer. Kahan then
repeated Hough's point: if it is to be used for uninitialized
data, signaling NaNs should generate a signal other than
"invalid operation."

Thomas observed that the situation sounds like an extreme case
of the tail wagging the dog, and pointed out that we do not
want to make changes without a compelling reason.

Hough began with the statement that he does not particularly
believe in counting mode. If most of the computations are in
double, and extended is available, why would counting mode
warrant hardware support? Riedy mentioned determinants, and
Zuras objected that the LU factorization is more expensive
than the product in that case; Riedy said "Sparse," and Zuras
said "Oh." Kahan confirmed that scaling time could be
dominant for sparse determinants, and mentioned that when
computing Clebsch-Gordan coefficients, scaling also takes up
much of the time.
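
As a sketch of what counting mode buys (my example, not one
from the meeting): without it, a sparse determinant has to
be accumulated as a fraction plus an explicitly maintained
exponent, renormalizing after every pivot.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        /* Hypothetical pivot values whose product would overflow or underflow. */
        double pivots[] = { 3e200, 4e-180, 2e190, 5e-210 };
        double frac = 1.0;    /* fractional part, kept in [0.5, 1) */
        long   exp2 = 0;      /* running binary exponent, i.e. what counting mode would count */

        for (int i = 0; i < 4; i++) {
            int e;
            frac *= pivots[i];
            frac  = frexp(frac, &e);   /* renormalize and set the exponent aside */
            exp2 += e;
        }
        printf("det = %g * 2^%ld\n", frac, exp2);
        return 0;
    }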

Hough asserted that the original point of a split definition
of underflow (underflow when any subnormal is generated vs
when accuracy is lost due to subnormalization) is to support
counting mode. Kahan disagreed. The reason for trapping on
any subnormal, he replied, is that there are people who never
want to see a subnormal number in their programs. They want
to flush to zero or to some threshold. Speed is one reason,
as owners of older Alphas knew; another reason is
computational convenience. For example, a speaker at LBL
recently talked about computing fast spherical harmonics. As
part of his computation, he generates Legendre functions by a
backward recurrence. The numbers generated become very tiny;
when they do, their exact value matters little, so long as
they are nonzero. This gentleman had a legitimate reason for
avoiding gradual underflow.
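
The split definition can be seen directly in a short C
sketch (mine): an exact subnormal result raises no
underflow flag under the 754 default, while an inexact one
does.

    #include <stdio.h>
    #include <float.h>
    #include <fenv.h>

    int main(void)
    {
        volatile double dmin = DBL_MIN;   /* volatile keeps the divisions at run time */

        feclearexcept(FE_ALL_EXCEPT);
        double a = dmin / 2.0;            /* subnormal but exact */
        printf("%g: underflow flag = %d\n", a, !!fetestexcept(FE_UNDERFLOW));  /* 0 */

        feclearexcept(FE_ALL_EXCEPT);
        double b = dmin / 3.0;            /* subnormal and inexact */
        printf("%g: underflow flag = %d\n", b, !!fetestexcept(FE_UNDERFLOW));  /* 1 */
        return 0;
    }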

Hough thought that it might be worthwhile to allow two user
exceptions: underflow and exact subnormal. He thought it
would be a mistake to combine the two. Scott agreed with the
general opinion that the current multi-definition of underflow
causes a lot of confusion to a lot of people. And now,
continued Hough, since we are proposing a menu, the logical
solution seems to be to provide two separate extensions. We
have already accepted the suggestion that exceptions will be
defined from the perspective of the user program. Kahan
asked: if people who want flush-to-zero set both traps, and
people who want the default disable both traps, what would
someone who wanted counting mode, wrapping, etc want? Would
they turn on the first exception and not the second? Hough
replied that he thought it depended on who it is that wants
counting mode, and whether they wanted subnormals or not.
Several noted that flush-to-zero is generally for performance
anyhow.

Schwarz stated that the fewer alternatives there were, the
happier he would be. He only wants one behavior for
non-trapped mode. Designers at IBM are used to subnormals, he
said, and can implement them at high speed. Hough asked how
they cope with gradual underflow, and whether it would be
possible to post something to the mailing list describing the
alternatives. Schwarz thought it would be worthy of a paper
which had not yet been written, but agreed that he would look
through his library and provide a bibliography.

Schwarz's summary
to the mailing list is far more comprehensible than my
notes on the brief discussion of FMA implementation that
followed, and I refer the interested reader there. As I
understand it, the IBM machines take advantage of FMA data
flow and tags in the register file. The Athlon machines
internally represent all numbers in a normalized format, and
invoke microarchitectural traps when producing subnormal
results and when writing subnormals to memory. The penalty
for the microarchitectural traps on the Athlon is a modest
number of cycles (14 instead of 4 cycles to produce subnormal
outputs; 8-10 instead of 2 cycles to store a subnormal); but
the real penalty is that the pipeline is flushed.

Kahan stated that he would like to see implementations where
the pipeline is not flushed, but merely stalled. Das Sarma
replied that it is possible, but difficult, and that denormals
are so extremely rare in benchmarks that it is not worth
penalizing the common case. Riedy asked if they would run a
benchmark that penalized slow gradual underflow, if such a
benchmark existed. Das Sarma replied that they would.

Hough's reason for bringing up underflow in the first place was
his work on a proposed facility to statically modify an operation
at compile time. He suggested that we might write, for instance

x * y
invalid: substitution(expression)
substitutionmag()
goto label

The point is that this special, static way of dealing with
the exception appears right there at the operation. That's
the essence of the proposal, and it leads to other things
-- like what the names of the
exceptions are, why signaling NaN should not be among them,
and why we should have a different name for an exact subnormal
result. More details are available from postings to the
mailing list.
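
Dynamically, and far less directly than the static facility
Hough has in mind, roughly the same effect can be sketched
in C (my approximation, not the proposal itself):

    #include <stdio.h>
    #include <fenv.h>
    #include <math.h>

    /* Multiply x by y; if the product is an invalid operation (e.g. 0 * inf),
     * substitute a programmer-chosen value instead of propagating the NaN. */
    static double mul_or_substitute(double x, double y, double substitute)
    {
        feclearexcept(FE_INVALID);
        double z = x * y;
        if (fetestexcept(FE_INVALID))
            z = substitute;
        return z;
    }

    int main(void)
    {
        printf("%g\n", mul_or_substitute(0.0, INFINITY, 0.0));   /* prints 0 */
        return 0;
    }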