On Friday, October 25, 2002, at 10:24 AM, Steven T. Hatton wrote:
> On Friday 25 October 2002 09:34 am, you wrote:
>
> This is not a high priority for me, but it would be nice to know how to
> identify the source of such discrepancies.
>
In a program as large, complex, and (dare I say it?) closed source as
Mathematica, that would be nearly impossible. Without access to the
actual instruction stream sent to the processor and knowledge of the
processor state, there is no way to determine the exact result of
anything but the simplest floating point expression. In principle you
could attach a debugger to the executing code, but good luck
determining which machine instructions correspond to the expression of
interest in a program as large as Mathematica.
Using some small C or Fortran programs in a debugger may help you
probe your machine's and compiler's idiosyncrasies, but that's not
much help in determining how Mathematica operates.
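For example, a small C probe along these lines (just a sketch; the
file name is made up, and the C99 FLT_EVAL_METHOD macro may not be
defined by older compilers) reports whether your compiler evaluates
double expressions in a wider type:

    /* fpprobe.c -- report how this compiler evaluates floating point.
       FLT_EVAL_METHOD (C99): 0 = expressions evaluated in their own
       type, 2 = everything done in long double (typical of x87 code),
       -1 = indeterminable. */
    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
    #ifdef FLT_EVAL_METHOD
        printf("FLT_EVAL_METHOD      = %d\n", (int) FLT_EVAL_METHOD);
    #else
        printf("FLT_EVAL_METHOD not defined (pre-C99 float.h)\n");
    #endif
        printf("sizeof (double)      = %u\n", (unsigned) sizeof (double));
        printf("sizeof (long double) = %u\n", (unsigned) sizeof (long double));
        printf("DBL_DIG = %d, LDBL_DIG = %d\n", DBL_DIG, LDBL_DIG);
        return 0;
    }

Stepping through something that small in a debugger, or just comparing
its output under different compilers and optimization flags, is far
more tractable than doing the same to the Mathematica kernel.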
> I *believe* Java tries to create a platform-neutral computing
> environment which ensures results will be uniform across platforms.
> IIRC, I read something about this kind of thing in the Mathematica
> documentation. There are ways of manipulating the content of atoms
> which can be used to optimize performance, but they are discouraged
> because they can lead to the kinds of discrepancies we are
> discussing.... Indeed, see A.1.4 of the Mathematica Book: (4.2, Help
> Browser)
>
The information in A.1.4 seems to indicate that you have access to
the underlying bit pattern, in hexadecimal form, of a float, complex,
or other atomic type. It's not obvious to me that you can use this to
manipulate floating point computations, though, since the Raw
representation is just that, a representation, and not a pointer to
the actual data. In any case the raw byte patterns of the data still
don't give you enough information. Returning to the example of Intel
x86 versus most other architectures: if the processor state is set
accordingly, multiplying two IEEE 754 floating point doubles (64-bit)
produces an intermediate 80-bit IEEE extended precision value (that's
why the x87 registers are 80 bits wide), which is then converted back
to an IEEE 754 double when stored to memory. That 64-to-80-bit
extension is the source of many differences between identical
numerical code on an x86 chip and on other architectures.
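A concrete illustration (a sketch only; the behaviour depends on the
compiler, its flags, and whether intermediates stay in the FPU's
80-bit registers): a product that overflows a 64-bit double can
survive in an 80-bit register, so the same source line can print two
different answers on two machines.

    /* overflow.c -- does an intermediate overflow get rescued by
       80-bit registers?  1e308 * 10 overflows a 64-bit double but
       fits comfortably in the x87 extended format. */
    #include <stdio.h>

    int main(void)
    {
        volatile double big = 1e308;
        volatile double ten = 10.0;
        double r = big * ten / ten;
        printf("%g\n", r);   /* "1e+308" with 80-bit intermediates,
                                "inf" if each step is rounded to 64 bits */
        return 0;
    }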
Java gets around this by disabling the 80-bit intermediates. I have
tried as much as possible to stay away from the details of x86 machine
code in my career, so I don't know the details of the process, but I
believe it is possible to turn off the 64-to-80-bit extension on x86
chips. However, many JVMs now do JIT compilation, which converts large
sections of Java code to optimized machine code for execution.
Depending on the JIT compiler and hardware you use, it is possible
that a complex expression could be evaluated in different orders,
depending on the peculiarities of the processor's pipeline structure.
These guys (http://www.naturalbridge.com/floatingpoint/) seem to agree
with me about the JIT's effect on Java's reproducibility problems.
I'm not sure what the Java standard says about this, but from page 41
of this paper
(http://java.sun.com/people/darcy/JavaOne/2001/1789darcy.pdf) it seems
there is a flag that makes a JVM produce equivalent results on
different architectures. There is probably a large performance penalty
for that, though.
Finally, section 3.1.6 of the Mathematica Book seems to indicate that
Mathematica will use whatever native floating point capabilities exist
on a particular hardware platform. Again, this suggests to me that we
can expect machine precision calculations to involve 80-bit
intermediate values on x86 (I'm not sure about IA-64) hardware. It is
also probably safe to assume that the Mathematica kernel is compiled
to take advantage of the pipeline structure of whatever hardware it
runs on, and that machine precision expressions will be executed in
the most efficient order possible on a particular platform. The only
way to test this is to create an expression whose result depends
strongly on the order of operations implied by the parentheses, then
evaluate it with different parenthesizations (is that a word?) to see
whether Mathematica respects the parentheses or internally reorders
the expression into its most efficient form. I'm trying to think of an
example right now; I'll post the results when available.
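One classic candidate (a sketch; these particular values are my own
choice and I have not run them through Mathematica) is a sum whose
first two terms cancel: 1.0 is far below half a unit in the last place
of 10^20, so with machine doubles (a + b) + c and a + (b + c) give
different answers when a = 10^20, b = -10^20, c = 1.

    /* reassoc.c -- floating point addition is not associative. */
    #include <stdio.h>

    int main(void)
    {
        double a = 1.0e20, b = -1.0e20, c = 1.0;
        printf("(a + b) + c = %g\n", (a + b) + c);   /* prints 1 */
        printf("a + (b + c) = %g\n", a + (b + c));   /* prints 0 */
        return 0;
    }

Typing the same three machine numbers into Mathematica with the two
groupings should show whether the written parentheses are respected or
the sum is silently reassociated.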
> I do seem to recall a lost probe not too long ago. Seems someone
> forgot to convert from miles to kilometers, or something like that.
> To my mind that was just plain stupidity. They should never have
> been using imperial units in the first place.
>
There is also the story of the Ariane rocket
(http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html),
which was lost because an out-of-range error raised while converting
floating point values to integers went unchecked. No process is
perfect, including auditing the code.
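The Ariane code was Ada rather than C, but the missing guard is easy
to sketch (hypothetical names; the point is only that the range check
has to exist somewhere):

    /* Convert a double to a 16-bit signed integer, refusing values
       that do not fit instead of letting the conversion misbehave. */
    #include <stdio.h>
    #include <limits.h>

    static int to_int16(double x, short *out)
    {
        if (x < SHRT_MIN || x > SHRT_MAX)
            return -1;              /* out of range: caller must cope */
        *out = (short) x;
        return 0;
    }

    int main(void)
    {
        short h;
        if (to_int16(65536.0, &h) != 0)
            printf("value out of range for a 16-bit conversion\n");
        return 0;
    }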
> Nonetheless, this demonstrates the kinds of things which can creep
> into your calculations. Suppose the Auditors come in and bless
> everything off, and the next week you get a brand new computer. Not
> realizing that the hardware will influence the outcome of your
> calculation, you copy everything over to the new system, give the
> nice lady at WRI a call to get your new password, and run the
> calculations 10 times faster, but based on assumptions which are
> inconsistent with your current environment. Ooops, "Houston, we've
> got a problem."
>
The possibility of having to make that phone call is why I am glad I
don't work for NASA ;-).
At any rate, if you were considering writing satellite launch or
control code with Mathematica, I suggest you examine the second
paragraph of the Limited Warranty in your license agreement, in
particular "WRI does not recommend the use of the software for
applications in which errors or omissions could threaten life, injury
or significant loss."
Regards,
Ssezi