Showing What Software Can Do

Rounding Errors And Arbitrary Precision Arithmetic

The computer computes. Plus, minus, times and sometimes even divide. The results are always correct. Really? Computers usually use floating point numbers, which are very flexible, but can still cause quite substantial rounding errors – at least when you are doing complex calculations with many steps. These small inaccuracies can accumulate and become a creeping poison. Arbitrary precision arithmetic offers a way out.

“If you drink much from a bottle marked ‘poison’ it is certain to disagree with you sooner or later.” – Lewis Carroll

Numerical Inaccuracies in Practice

OK, let’s start up the Windows Calculator and switch to “scientific mode”:

√4 – 2 = -8.1648465955514287168521180122928e-39

0 would have been correct, but this is close enough, and square roots are a bit problematic anyway. Next try:

10⁻³⁷ + 1 = 1

Not quite, although the rounded result certainly looks a lot nicer than the correct one.

In programming languages we encounter the same kinds of effects when using floating point numbers:

Java

float intermediateResult = (3.0f / 5000.0f) * 5000.0f;
System.out.println("Intermediate result: " + intermediateResult);
float wrongResult = 1.0f / (intermediateResult - 3.0f);
System.out.println("Final result: " + wrongResult);

… which will print an intermediate result of 3.0000002 and a final result of 4194304.0.

When using the data type double (64-bit double precision instead of 32-bit single precision) the result does not inspire confidence either: the intermediate result in this case is 2.9999999999999996 and the final result is a very large negative number: -2251799813685248. This is not an error in floating point arithmetic (like the FDIV bug), but its normal behaviour as specified in the standard IEEE 754.
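The double variant can be reproduced with the following sketch – the same calculation as above, only with the data type changed to double (the expected values in the comments are those reported in the text):

```java
public class DoublePrecisionDemo {
    public static void main(String[] args) {
        // Same calculation as before, but with 64-bit doubles
        double intermediateResult = (3.0 / 5000.0) * 5000.0;
        System.out.println("Intermediate result: " + intermediateResult);
        // prints 2.9999999999999996

        // The tiny rounding error is amplified enormously by the division
        double wrongResult = 1.0 / (intermediateResult - 3.0);
        System.out.println("Final result: " + wrongResult);
        // prints -2.251799813685248E15, i.e. -2251799813685248
    }
}
```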

The errors usually start very small, but can accumulate during complex calculations to quite substantial errors. A few examples:

Solving systems of linear equations can be numerically very unstable. One such example involving Hilbert matrices can be found on the JLinAlg homepage (more about this open source library later).

Algorithms to approximate mathematical constants like the number π no longer converge to the correct result when one uses floating point numbers. This problem can often be remedied to some extent by a suitable mathematical transformation of the problem.

Failure at Dhahran: On February 25, 1991 an Iraqi Scud missile hit a barracks of the US Army in Dhahran (Saudi Arabia), although a Patriot missile defense system was in place. 28 soldiers died because a calculation within the radar system based on floating point numbers had been getting more and more imprecise the longer the system was in operation. After 100 hours the radar system was unable to track the incoming missile properly and dismissed it as a measurement error. Details can be found in the official report.

The radar system used for Patriot missiles
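The accumulation effect behind the Dhahran failure can be illustrated with a small sketch. This is a simplified illustration, not the actual Patriot code (which used 24-bit fixed point arithmetic): it adds a tenth of a second one million times – roughly 28 hours of clock ticks – and the error grows with every addition, because 0.1 is not exactly representable in binary:

```java
public class DriftDemo {
    public static void main(String[] args) {
        // Accumulate one million tenths of a second in single precision.
        // Each addition introduces a small rounding error.
        float clock = 0.0f;
        for (int i = 0; i < 1_000_000; i++) {
            clock += 0.1f;
        }
        // Mathematically the sum is exactly 100000.0, but the float
        // result has drifted far away from it.
        System.out.println("Expected: 100000.0, got: " + clock);
    }
}
```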

Floating Point Numbers

Floating point numbers are designed to perform numerical calculations as exactly as possible, despite having only a limited number of bits available. In contrast to fixed point numbers, like for example 0.0000042, floating point numbers do not waste bits on leading or trailing zeros. Instead numbers are normalized in a way resembling scientific notation, which turns 0.0000042 into 4.2 · 10⁻⁶.
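This normalized representation can be made visible in code. The following sketch (variable names are my own) decomposes a float into the sign, exponent and mantissa fields defined by IEEE 754:

```java
public class FloatAnatomy {
    public static void main(String[] args) {
        float value = 4.2e-6f;
        int bits = Float.floatToIntBits(value);

        int sign = bits >>> 31;                    // 1 sign bit
        int biasedExponent = (bits >>> 23) & 0xFF; // 8 exponent bits, bias 127
        int mantissa = bits & 0x7FFFFF;            // 23 mantissa bits,
                                                   // implicit leading 1

        System.out.println("sign     = " + sign);
        System.out.println("exponent = " + (biasedExponent - 127));
        System.out.println("mantissa = " + Integer.toBinaryString(mantissa));
    }
}
```

For 4.2 · 10⁻⁶ this yields the exponent -18, since 2⁻¹⁸ ≤ 4.2 · 10⁻⁶ < 2⁻¹⁷.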

Using scientific notation one can write down very small and very large numbers in a compact form. Fixed point numbers often need a lot more space for the same information. Two examples:

Planck time is the smallest time interval in which the laws of physics as we understand them today are valid. This constant is 5.391 · 10⁻⁴⁴ seconds. As a fixed point number this would be: 0.000,000,000,000,000,000,000,000,000,000,000,000,000,000,053,91 seconds

The mass of the earth is approximately 5.974 · 10²⁴ kg or as a fixed point number: 5,974,000,000,000,000,000,000,000 kg

Moreover, floating point numbers are displayed like 4.2e-6 for 4.2 · 10⁻⁶, and in floating point arithmetic they are rounded and normalized (i.e. the exponent is recalculated) after each arithmetic operation.

Despite their great flexibility, floating point numbers are often imprecise because many numbers cannot be represented with a finite number of binary digits. The decimal number 0.0625, which is 2⁻⁴, poses no problem for the binary system, but the decimal number 0.1 has infinitely many binary places (just like 1/7 = 0.142857… has infinitely many decimal places in the decimal system). The best approximation of 0.1 using single precision floating point numbers is – transformed back into a decimal number – exactly 0.100000001490116119384765625.
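This exact decimal expansion can be computed in Java itself: the BigDecimal(double) constructor converts the binary value exactly, without any further rounding, and a float is widened to double without loss:

```java
import java.math.BigDecimal;

public class ExactFloatValue {
    public static void main(String[] args) {
        // 0.1f is widened to double losslessly; BigDecimal(double)
        // then shows the exact binary value as a decimal number.
        System.out.println(new BigDecimal(0.1f));
        // prints 0.100000001490116119384765625
    }
}
```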

Conclusion

It is important to be aware of rounding errors when using floating point numbers and to avoid them where possible. There are cases in which it is worth sacrificing some computational performance and using arbitrary precision arithmetic.
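As a sketch of what arbitrary precision arithmetic looks like in practice, Java's built-in BigDecimal class can be used (libraries such as JLinAlg offer comparable types; this example uses only the standard library):

```java
import java.math.BigDecimal;

public class ArbitraryPrecisionDemo {
    public static void main(String[] args) {
        // Floating point: 0.1 + 0.2 is not exactly 0.3
        System.out.println(0.1 + 0.2);  // prints 0.30000000000000004

        // Arbitrary precision: constructed from strings, the decimal
        // values are represented exactly and so is their sum
        BigDecimal a = new BigDecimal("0.1");
        BigDecimal b = new BigDecimal("0.2");
        System.out.println(a.add(b));   // prints 0.3
    }
}
```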


About the Author

Andreas Keilhauer is a freelance software engineer and IT consultant. He has been fascinated by complex problems and solving them with software for many years, and is also committed to the sensible use of open source software.