How simple calculations can be a matter of life and death

The errors we've seen so far have concerned floating point numbers, where accuracy is lost when there aren't enough bits to store the mantissa. OK, those errors can add up, but essentially they're just rounding errors, and running out of bits for the exponent is comparatively unlikely given that the maximum values it can represent are absolutely huge.

When integers are involved, the effect can actually be far more serious. A signed 64-bit integer can store a maximum positive value of 9,223,372,036,854,775,807. If you try adding 1 to an integer variable that already equals this maximum value, you don't just lose that extra value. Instead, the integer overflows and wraps around to the most negative value it can hold.

In other words, as far as a computer working in 64-bit integer arithmetic is concerned, 9,223,372,036,854,775,807 + 1 = -9,223,372,036,854,775,808 (note the minus sign). Something very similar happened on board the European Space Agency's Ariane V rocket on its maiden flight.
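The wraparound is easy to reproduce. Python's own integers never overflow, so the sketch below simulates a signed 64-bit register, assuming the usual two's-complement behaviour. The helper name `wrap_int64` is made up for this illustration.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1

def wrap_int64(n: int) -> int:
    """Reduce an exact integer to what a signed 64-bit two's-complement register would hold."""
    # Shift into the range 0..2**64-1, wrap with modulo, then shift back.
    return (n - INT64_MIN) % 2**64 + INT64_MIN

overflowed = wrap_int64(INT64_MAX + 1)
print(f"{INT64_MAX:,} + 1 -> {overflowed:,}")
# prints: 9,223,372,036,854,775,807 + 1 -> -9,223,372,036,854,775,808
```

Values that already fit are left alone; only results outside the 64-bit range wrap around.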

EXPENSIVE MISTAKE: Programmers call it an overflow; in reality it was bad maths that caused this $370 million spacecraft to explode

In fact, the arithmetic operation in question – if you can call it that – was even simpler than adding 1. Instead, it just involved copying one number that had been stored in floating point format to another location that was defined as an integer – and a 16-bit integer at that (maximum positive value of 32,767).

Unfortunately, the number was already too large to fit in the integer location, and as a result it overflowed. The exact sequence of events that followed is pretty complex but, to cut a long story short, the end result was that the Ariane V became one of the most expensive fireworks in history.

Guarding against cock-ups

This run-through of some of computing's most astonishing mathematical cock-ups may have come as something of an eye-opener to you. If so, you're probably wondering whether tomorrow's computers can avoid making such elementary mistakes.

Surprisingly, perhaps – and with the exception of the Pentium floating point error, which was caused by a hardware glitch – all of the errors we've mentioned here could have been prevented. In that sense, they can all be thought of as software errors.

As an example, let's take that integer overflow on the Ariane V rocket. That an integer can overflow isn't an error on the part of the processor because it's the way it's supposed to work. But whenever an integer does overflow, the processor sets something called a flag that the program can interrogate.
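In a high-level language, the equivalent safeguard is usually an explicit range check before the value is narrowed. The following is a minimal sketch of that idea and not a reconstruction of any real flight software: the function names, the sample reading and the choice of clamping as the corrective action are all assumptions made for illustration.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing values that do not fit."""
    truncated = int(value)  # discards the fractional part (rounds towards zero)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated

def to_int16_saturating(value: float) -> int:
    """One possible corrective action: clamp to the nearest representable value."""
    try:
        return to_int16_checked(value)
    except OverflowError:
        return INT16_MAX if value > 0 else INT16_MIN

reading = 40000.0  # hypothetical value, too large for 16 bits
print(to_int16_saturating(reading))  # prints 32767
```

Whether clamping, reporting an error or switching to a backup value is the right response depends entirely on the application. The point is only that the overflow is detected instead of being silently ignored.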

In the case of the Ariane software, the program didn't check for an overflow; if it had done, corrective action could have been taken. Of course, there will always be a limit to how large an integer can be and how much precision a floating point number can have – and this depends on the processor. But all of today's computers are universal computing machines, which means that, given enough time and memory, they can carry out any calculation that can be spelled out as a step-by-step procedure.

So if a processor's internal instructions can't operate on large enough integers or on floating point numbers with sufficient precision, it's always possible for the programmer to implement arithmetic routines that will.
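As a rough illustration of what such a routine looks like, here is schoolbook addition carried out on numbers stored as strings of decimal digits, so the size of the operands is limited only by memory. The function name `add_decimal_strings` is hypothetical, and the sketch handles non-negative whole numbers only. Python's built-in integers already do this kind of thing internally, so the code deliberately works one digit at a time.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers given as decimal digit strings of any length."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with leading zeros
    digits = []
    carry = 0
    # Work from the least significant digit, carrying into the next column.
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_decimal_strings("9223372036854775807", "1"))
# prints 9223372036854775808
```

The loop runs once per digit, whereas a hardware add handles a whole 64-bit value in a single instruction.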

There will be a trade-off against speed, though, which is why this isn't usually done. And however clever the software or however much memory you use to store a floating point number, the result of some divisions will never be exact.

We've seen how 1 divided by 10 is an infinite string in binary, and, in the general case, a move to decimal arithmetic wouldn't help either: 1 divided by 10 can be stored accurately in decimal, but 1 divided by 3 equals 0.3333333… ad infinitum.
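Both halves of that claim can be checked with Python's standard `decimal` module, which does its arithmetic in base 10. This is a small sketch that assumes the module's default precision of 28 significant digits; the variable names are illustrative only.

```python
from decimal import Decimal

# Binary floating point cannot hold 1/10 exactly, so the error surfaces in sums.
binary_sum_exact = (0.1 + 0.2 == 0.3)

# Decimal arithmetic stores 1/10 exactly...
tenth = Decimal(1) / Decimal(10)
decimal_sum_exact = (tenth + Decimal("0.2") == Decimal("0.3"))

# ...but 1/3 is cut off at the working precision, so multiplying back misses 1.
third = Decimal(1) / Decimal(3)
third_round_trip_exact = (third * 3 == Decimal(1))

print(binary_sum_exact, decimal_sum_exact, third_round_trip_exact)
# prints: False True False
```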

The bottom line is that whatever number base you choose, some divisions will produce results that can never be stored accurately as a finite number of digits. Even this isn't a show-stopper, though.

Remember how 1.0 - 0.9 - 0.1 often yields an inaccurate answer because of rounding errors even though we know, immediately, that the answer is 0? Well, it's quite possible to write software to store the result of a division as a rational number.

In other words, you don't actually do the division – you just store the two numbers. In subsequent arithmetic operations you handle the values as fractions, just as you were taught in school, and the result will be exact.
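Python's standard library happens to include this approach in its `fractions` module, which makes for a convenient demonstration. The sketch below compares the floating point calculation with the same sum carried out on numerator and denominator pairs; the variable names are illustrative only.

```python
from fractions import Fraction

# Floating point: each operand is rounded to binary, so the result is not quite zero.
float_result = 1.0 - 0.9 - 0.1

# Rational arithmetic: every value is kept as an exact numerator/denominator pair.
exact_result = Fraction(1) - Fraction(9, 10) - Fraction(1, 10)

# 1/3 is never divided out, so multiplying by 3 gives exactly 1 again.
third = Fraction(1, 3)

print(float_result == 0)   # prints False
print(exact_result == 0)   # prints True
print(third * 3 == 1)      # prints True
```

The cost is that numerators and denominators can grow very large over a long calculation, which is the same speed-versus-accuracy trade-off in another guise.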

So computers might suck at maths, but there's always a solution available to circumvent their inherent weaknesses. And in that case, it's probably more accurate to say that computer programmers suck at maths – or at least some of them do.