Created attachment 6381[details]
test case and example workaround code
glibc's 387 asm for sqrt() simply uses the fsqrt instruction, yielding results that may not be correctly rounded, which violates the IEEE requirements for the sqrt operation. The cause is double-rounding due to the fact that the fsqrt operation is performed in extended 80-bit precision (and gives correctly-rounded results at this precision), then rounded again when it's converted to 64-bit double precision. Naturally this is very problematic for any algorithm that depends on the result of sqrt being correctly rounded.
Until recently, the bug was almost always masked by the corresponding bug in gcc (which generates fsqrt for __builtin_sqrt), gcc bug #52593:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52593
But since -fexcess-precision=standard was added (and made default for -std=c99 and -std=c11) somewhere in the gcc 4.5 series, the library function will be called when correct standards-conformance is desired, only to repeat the exact same mistake in the library code.
Using the C sqrt code would fix the problem, but I think that's highly undesirable. I have an algorithm we're using in musl that uses fsqrt and only fixes up the result in cases where the double-rounding could have an effect on the result; it uses the floating point status word to determine the direction of the rounding that takes place and biases the 80-bit value so that the result will be correctly rounded to double precision after store and load:
http://git.etalabs.net/cgi-bin/gitweb.cgi?p=musl;a=blob;f=src/math/i386/sqrt.s
Attached is test code by Szabolcs Nagy (nsz) which demonstrates incorrectly rounded results and includes a C function d_sqrt() which patches up the results of an incorrectly-rounded underlying sqrt.
Finally, note that sqrtf() is not affected. It can be shown (left as an exercise for the reader) that the correctly-rounded 80-bit result of fsqrt subsequently rounded to 32-bit single precision is also the correctly-rounded single-precision square root.