First -- a tiny introduction on IEEE floating point formats. On most chips, IEEE floats
come in three flavors: 32-bit, 64-bit, and 80-bit, called "single-", "double-"
and "extended-" precision, respectively. The increase in precision from one to the next
is better than it appears: 32-bit floats only give you 24 bits of mantissa, while 64-bit
ones give you 53 bits. Extended gives you 64 bits. (Unfortunately, you can't use
extended precision under Windows NT without special drivers. This is ostensibly
for compatibility with other chips.) Other CPUs, like the PowerPC, have even larger formats, up to 128 bits.

Second -- to dispel a myth. Typing "float" rather than "double" changes the memory representation of your data,
but doesn't really change the way the chip uses it. In fact, ANSI C only guarantees you at least
single-precision accuracy; intermediate results are free to carry more. Because optimized code keeps data in registers longer,
it will sometimes be more precise than debug code, which flushes to the stack (and thus rounds to the declared type) often.

Again: typing "float" does not change anything that goes on in the FPU internally. If you don't actively
change the chip's precision, you're probably doing double- (or even extended-) precision arithmetic
while an operand is being processed.

The x86 FPU does have precision control, which stops a calculation once it has
computed enough bits for the selected precision. But you can't change FPU precision from ANSI C.

What runs faster in single precision?

Speedwise, single precision affects exactly two operations: divides and sqrts. It won't make transcendentals any
faster (sin, acos, log, etc. all run the same no matter what: 100+ cycles.) If your program does lots of single-precision
divides and sqrts, then you need to know about this.

Single precision will at least double the speed of divides and sqrts. Divides take 17 cycles
and sqrts take about 25 cycles. In double precision, they're at least twice that.

On x86, precision control is adjusted with the assembly instruction fldcw ("load control word"). Microsoft has a
nice wrapper called _controlfp that's easier to use. If you're using Linux,
I recommend getting the Intel Instruction Set Reference and writing the inline assembly.
(Send me code so I can post it!)

To set the FPU to single precision:

_controlfp( _PC_24, MCW_PC );

To set it back to default (double) precision:

_controlfp( _CW_DEFAULT, 0xfffff );

I use a C++ class that sets the precision while it's in scope and then drops back to the previous rounding and precision
mode as it goes out -- very convenient, so you don't forget to reset the precision, and you can handle error cases properly.

Lots of people have been talking about how bad Intel chips are at converting floating point to
integer. (Intel, NVIDIA, and hordes of people on Usenet.)

I thought it would be fun to do the definitive, complete version of this discussion.
It gets a little long-winded at the end, but try to hang on -- good stuff ahead.

So, if you ever do this:

inline int Convert(float x)
{
    int i = (int) x;
    return i;
}

or you call floor() or ceil() or anything like that, you probably shouldn't.

The problem is that there is no dedicated x86 instruction for the "ANSI C" conversion from floating point to integer,
which truncates toward zero. There is an instruction on the chip that does a conversion (fistp), but it respects the chip's current
rounding mode (set with fldcw, like the precision flags above.) And the default rounding mode is
round-to-nearest, not truncation.

To implement a "correct" conversion, compiler writers have had to
switch the rounding mode, do the conversion with fistp, and switch back.
Each switch requires a complete flush of the floating
point state, and takes about 25 cycles.

This means that this function, even inlined, takes upwards of 80 (EIGHTY) cycles!

Let me put it this way: this isn't something to scoff at and say, "Machines are getting faster." I've seen quite
well-written code get 8 TIMES FASTER after fixing conversion problems. Fix yours! And here's how...

Sree's Real2Int

There are quite a few ways to fix this. One of the best general conversion utilities I have is one that Sree Kotay
wrote. It's only safe to use when your FPU is in double-precision mode, so you have to be careful with it if
you're switching precision as described above. It'll basically return '0' all the time if you're in single precision.

(I've recently discovered that this discussion bears a lot of resemblance
to Chris Hecker's
article on the topic, so be sure to read that if you can't figure out
what I'm saying.)

This function is basically an ANSI C compliant conversion -- it always chops, always does the right thing,
for positive or negative inputs.

fistp

In either precision mode, you can call fistp
(the chip's native conversion) directly. However, this means you need to know the current FPU rounding
mode. In the default mode, the conversion rounds to the nearest
integer rather than truncating toward zero, so that's what I call it:

Direct conversions

I originally learned this one (and this way of thinking about the FPU) from
Dan Wallach when we were both at Microsoft
in the summer of 1995. He showed me how to implement a fast lookup table by direct manipulation
of the mantissa of a floating point number in a certain range.

First, you have to know that a single-precision float has three components:

sign[1 bit] exp[8 bits] mantissa[23 bits]

The value of the number is computed as:
(-1)^sign * 1.mantissa * 2^(exp - 127)

So what does that mean? Well, it has a nice effect:

Between any power of two and the next (e.g., [2, 4) ), the exponent is constant.
For certain cases, you can just mask out the mantissa, and use that as a fixed-point number.
Let's say you know your input is between 0 and 1, not including 1. You can do this:

Adding 1.0f makes 'y' lie between 1 and 2, where the exponent is a constant 127. Reading the
last 23 bits then gives an exact 23-bit fixed-point representation of the number between 0 and 1. Fast and easy.

Timing: fast conversions

Sree and I have benchmarked and spent a lot of time on this problem. I tend to use fistp a lot more than his conversion, because I write a lot
of code that needs single precision. His tends to be faster, so I need to remember to use it more.

In specific cases, the direct conversions can be very useful, but they're about as fast as Sree's code, so generally
there's not too much advantage. You can save a shift, maybe, and it does work in single precision mode.

Each of the above functions takes about 6 cycles by itself. However, the Real2Int version
is much better at pairing with floating point code (it doesn't lock the FPU for 6 cycles like fistp does),
so in really well-pipelined code his function can be close to free.

Hacking VC's Runtime for Fun and Debugging

Finally, here's a function that overrides Microsoft's _ftol (the root of all evil.) One of the comp.lang.asm.x86 guys
ripped this out of Microsoft's library, then Terje Mathisen did the "OR" optimization, and then I
hacked it to munge the stack manually (note: no frame pointer?) like you see below. But don't do this at home -- it's evil too.

It also may have some bugs -- I'm not sure it works when there are floating point exceptions. Anyway, it's about 10-15
cycles faster than Microsoft's implementation.

Also, it will show up in a profile, which is a really great thing.
You can put a breakpoint in it and see who's calling slow conversions, then go fix them. Overall,
I don't recommend shipping code with it, but definitely use it for debugging.

Also, there's a "fast" mode that does simple rounding instead, but that doesn't get inlined, and it's
generally a pain, since you can't get "ANSI correct" behavior when you want it.