Quick question on double precision


I am just messing around on some pre-existing project, and I noticed that if I use something like:

Code:

double a = 1.354;
double b = 5617963.0 + a;

Stepping through the debugger, I'm getting:

a = 1.3540000000000001
b = 5617964.5000000000

As if the addition of two doubles is only using single precision?

Yet if I create a dummy test project and copy in those exact same lines I get:

a = 1.3540000000000001
b = 5617964.3540000003

Which seems much more like how a double should perform. Both projects use Visual Studio 2008, so my question is: why is the former losing so much precision, and is there an option in VS that would cause it to do that?

Edit:
Even making sure both numbers are doubles, I still get the same results.

The addition of two double variables does not use single precision. What you are describing is a fundamental property of all floating point representations. Precision is finite, so floating point types only represent a discrete set of values.

In base 10, for example, it is not possible to represent the value 1/3 in a finite number of decimal places (the decimal representation is 0.3333 <recurring forever>). If you are required to represent that fraction in a finite number of decimal places, you will truncate/round at some point.

The same phenomenon occurs with all numeric bases. The catch is that the values with an infinite representation depend on the base. Floating point generally works with a base 2 (binary) representation of the mantissa. The value 0.1 (decimal) has an infinitely recurring representation in base 2.

When you add two values of reasonably different magnitude, some truncation also occurs. Again, to use an example in decimal, imagine you are limited to 4 significant figures. The value 123.0 + 0.456 is actually 123.456 but, with truncation to 4 significant figures, you will get the value 123.5 (with rounding up). Again, the same sort of effect occurs with any base.

If I seem grumpy or unhelpful in reply to you, or tell you you need to demonstrate more effort before you can expect help, it is likely you deserve it. Suck it up, Buttercup, and read this, this, and this before posting again.

I put these values in a Python 3.1 program to see their exact double values (Python 3.1 is one of those languages capable of printing all digits):

The value of a is 1.35400000000000009237055564881302416324615478515625, and the value of c is 5617964.35400000028312206268310546875. You can see both match their "true" values up to about 16 or 17 digits, which is about the precision of a double.

The value for b you say you see in the debugger makes no sense. Even if it were rounded to one decimal place, it still wouldn't be 5617964.5.

Except that floating point values are not rounded in decimal terms. The rounding is of binary digits. Things are a little further complicated as floating point types support a mantissa and an exponent (the value is the mantissa times the base raised to the exponent, where the base is 2 for binary floating point) - but that's another story.

The net effect, however, is that floating point types support a set of discrete values. Let's say x1 and x2 can both be represented exactly in your floating point type, with no other representable values lying between them (that's what discrete means). If you try to enter a value between x1 and x2, the result is either x1 or x2 (the choice depends on how rounding is implemented).

There is no guarantee that the difference between x1 and x2 is less than 0.1. Practically, for "large" values, it can be greater than 0.1.

As to why things are different in the debugger versus the executable... lots of reasons. Debuggers can use different floating point representations internally (eg a software emulation of greater precision). There is also the question of how the values are printed. Which brings us to your next comments.

The number of digits output is typically independent of the precision of the underlying data type.

Originally Posted by DoctorBinary

I put these values in a Python 3.1 program to see their exact double values (Python 3.1 is one of those languages capable of printing all digits):

The value of a is 1.35400000000000009237055564881302416324615478515625, and the value of c is 5617964.35400000028312206268310546875. You can see both match their "true" values up to about 16 or 17 digits, which is about the precision of a double.

Sorry, but this is incorrect.

That long string of trailing digits means that (1) python is using an extended precision type and/or (2) that digits have been output until some stopping criterion is met (i.e. the I/O function gives up at some point, to avoid getting into an infinite loop).

The exact value of many decimal fractions (0.1, 0.2, 0.354) still has an infinite representation in binary (values of 0.5 and powers of it are exceptions to that). It is also a mathematical fact that some values that can be represented to a finite number of binary places (eg some of the values that may be stored in a floating point variable) have an infinite representation in decimal. If the value output by Python was a "true" value, it would be looping forever on some values.


Originally Posted by grumpy

That long string of trailing digits means that (1) python is using an extended precision type and/or (2) that digits have been output until some stopping criterion is met (i.e. the I/O function gives up at some point, to avoid getting into an infinite loop).

Those values are correct. They are the exact decimal representations of the binary values in their respective doubles. There is no chance of an infinite loop (see my next response).

Originally Posted by grumpy

It is also a mathematical fact that some values that can be represented to a finite number of binary places (eg some of the values that may be stored in a floating point variable) have an infinite representation in decimal. If the value output by Python was a "true" value, it would be looping forever on some values.

Not true. Every finite binary fraction terminates in decimal.

Originally Posted by grumpy

Describing that output as a "true" value is a fallacy.

I didn't say the output values (the double values) are the "true" values. This is what I said:

Originally Posted by DoctorBinary

You can see both match their "true" values up to about 16 or 17 digits....

The "true" values being the decimal values: 1.354 and 5617964.354

Last edited by DoctorBinary; 02-17-2010 at 08:53 AM.
Reason: Added one more point, about the "true" values

Can you look at the contents of memory for these variables -- b in particular? If so, can you post it? (Just post the 8 byte hex value and we can decode it.) That is the surefire way to see what's in your variable vs. what is being displayed.

Turns out iMalc was correct... Somewhere in the third-party libs something MUST be calling '_controlfp', although I cannot find it...

When I checked the precision, it turned out it had been set to '_PC_24' (24 bits). So I just put in a call to set the precision back to 53 bits (normal precision for a double).

Code:

unsigned int current;
// Restore the x87 precision-control field to 53 bits (normal double precision)
_controlfp_s(&current, _PC_53, _MCW_PC);

Once I set this it seems to stay set, and the code works fine now, so whatever was calling it must call it only once at startup. I can understand wanting certain sections of code to run faster, but doesn't it seem a little silly to silently change this and not set it back after the 'faster' code has completed?

Now the rounding you see makes sense. 5617964.354 in pure binary is 10101011011100100101100.01011... With 24 bits of precision, the 23 bits before the radix point leave only one bit after it. The '.0' is rounded up to '.1' - that is, 0.5 in decimal - since the discarded bits (bit 25 and beyond) total more than 1/4, which is half of the last kept bit's value.

It's not unheard of for some libraries to change it and ignore anything else that might be affected, or even to not change it back when deinitialised.
Heck, I do it myself in my software 3D renderer project, though it isn't packaged as any kind of reusable library at the moment. (If it were, I would certainly make users aware that I leave it in that state.)

This 'fudge factor' for floats and doubles is defined as FLT_EPSILON and DBL_EPSILON respectively in MSVC.

FLT_EPSILON and DBL_EPSILON are defined by the C standard as "the difference between 1 and the least value greater than 1 that is representable in the given floating point type".

The specification of those values does not mean that all consecutive representable values in a floating point type differ by those _EPSILON values.

Originally Posted by McFury

I don't really think comments like this:

[grumpy]
"You need to find a basic text on numerical analysis."

are really needed. Thanks for your input, though.

The comment was a simple statement of fact. All basic texts on numerical analysis describe a number of the phenomena you have described, and the reason for them. In fact, a number of the phenomena you describe are the reason numerical analysis exists. If you take offense at being pointed to detailed discussions relevant to your question (or parts of it: the _controlfp() concern is another aspect) that is your problem.
