On 1 April 2010 03:15, David Cournapeau <david@silveregg.co.jp> wrote:
> Anne Archibald wrote:
>>>>> Particularly given the comments in the boost source code, I'm leery of
>> this fix; who knows what an optimizing compiler will do with it?
>> But the current code *is* wrong: it is not true that u == 1 implies u -
> 1 == 0 (and that (u-1) != 0 -> u != 1), because the spacing between two
> consecutive floats is much bigger at 1 than at 0. And the current code
> relies on this wrong assumption: at least with the correction, we test
> for what we care about.
I don't think this is true for IEEE floats, at least in the case we're
interested in where u is approximately 1. After all, the issue is
representation: if you take an x very close to zero and add 1 to it,
you get exactly 1 (which is what you're pointing out); but if you then
subtract 1 again, you get exactly zero. After all, u==1 means that the
bit patterns for u and 1 are identical, so if you subtract 1 from u
you get exactly zero. If u is not equal to 1 (and we're talking about
IEEE floats) and you subtract 1, you will get something that is not
zero.
The problem is that what we're dealing with is not IEEE
floating-point: it resembles IEEE floating-point, but arithmetic is
often done with extra digits, which are discarded or not at the whim
of the compiler. If the extra digits were kept we'd still have
(u-1)==0 iff u==1, but if the compiler discards the extra digits in
one expression but not the other - which is a question about FPU
register scheduling - funny things can happen.
> So I would say the code correction I suggest is at least better in that
> respect.
>>> Now the
>> "correction" is completely bogus.
>> I am not sure to understand why ? It is at least not worse than the
> current code.
Well, it would be hard to be worse than inappropriate NaNs. But the
point of the correction is that, through some deviousness I don't
fully understand, the roundoff error involved in the log is
compensated for by the roundoff error in the calculation of (1+x)-1.
If the compiler uses a different number of digits to calculate the log
than it uses to caiculate (1+x)-1, the "correction" will make a hash
of things.
>> Since this is a subtle issue, I vote for delegating it (and log10_1p)
>> to log1p. If *that* has trouble, well, at least it'll only be on
>> non-C99 machines, and we can try compiler gymnastics.
>> Yes, we could do that. I note that on glibc, the function called is an
> intrinsic for log1p (FYL2XP1) if x is sufficiently small.
Even if we end up having to implement this function by hand, we should
at least implement it only once - log1p - and not three times (log1p,
log10_1p, log2_1p).
>> Clearly the optimizing compiler is inserting the DRDB (drain David's
>> battery) opcode.
>> :)
That was tongue-in-cheek, but there was a real point: optimizing
compilers do all sorts of peculiar things, particularly with
floating-point. For one thing, many Intel CPUs have two
(logically-)separate units for doing double-precision calculations -
the FPU, which keeps extra digits, and SSE2, which doesn't. Who knows
which one gets used for which operations? -ffloat-store is supposed to
force the constant discarding of extra digits, but we certainly don't
want to apply that to all of numpy. So if we have to get log1p working
ourselves it'll be an exercise in suppressing specific optimizations
in all optimizing compilers that might compile numpy.
Anne