
Arbitrary-precision arithmetic

On a computer, arbitrary-precision arithmetic, also called bignum arithmetic, is a technique whereby computer programs perform calculations on integers or rational numbers (including floating-point numbers) with an arbitrary number of digits of precision, typically limited only by the available memory of the host system. Using many digits of precision, as opposed to the approximately 6–16 decimal digits available in most hardware arithmetic, is important for a number of applications as described below; the most widespread usage is probably for cryptography used in every modern web browser.

It is often implemented by storing a number as a variable-length array of digits in some base such as 10, 10000, 256 or 65536, in contrast to most computer arithmetic, which uses a fixed number of bits related to the size of the processor registers. Numbers can be stored in a fixed-point format, or in a floating-point format as a significand multiplied by an arbitrary exponent. However, division almost immediately introduces infinitely repeating sequences of digits (such as 4/7 in decimal), so when this possibility arises either the representation must be truncated at some satisfactory size or rational numbers must be used instead: a large integer for the numerator and another for the denominator, with their greatest common divisor divided out. Unfortunately, arithmetic with rational numbers can become unwieldy very swiftly: 1/99 - 1/100 = 1/9900, and if 1/101 is then added the result is 10001/999900.
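Python's standard `fractions.Fraction` type follows this scheme (an arbitrary-precision numerator and denominator, reduced by their greatest common divisor after each operation), so it can reproduce the example above. This is a sketch using one language's library, not the only way to represent rationals; the harmonic-sum line is our own illustration of how quickly denominators grow.

```python
from fractions import Fraction

# Each Fraction stores an arbitrary-precision numerator and denominator,
# reduced by their greatest common divisor after every operation.
a = Fraction(1, 99) - Fraction(1, 100)
print(a)  # 1/9900

b = a + Fraction(1, 101)
print(b)  # 10001/999900

# Denominators grow quickly: the exact sum 1/1 + 1/2 + ... + 1/30
# already has a denominator too large for a 32-bit integer.
h = sum(Fraction(1, k) for k in range(1, 31))
print(h.denominator > 2**32)  # True
```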

An early widespread implementation was available via the IBM 1620 of 1959-1970. It was a decimal-digit machine which, despite using discrete transistors, had hardware that performed integer or floating-point arithmetic (via lookup tables) on digit strings whose length could be anything from two digits to whatever memory was available, although the mantissa of floating-point numbers was restricted to 100 digits or fewer and the exponent to only two digits; the largest memory supplied offered sixty thousand digits. Compilers for the IBM 1620 (Fortran), however, settled on some fixed size, such as ten digits, which could be specified on a control card if the default was not satisfactory. IBM's first business computer, the IBM 702, a vacuum-tube machine, implemented integer arithmetic entirely in hardware on digit strings of any length from one to 511 digits. The earliest widespread software implementation of arbitrary-precision arithmetic was probably that in Maclisp. Later, around 1980, the VAX/VMS and VM/CMS operating systems offered bignum facilities, in the first case as a collection of string functions and in the second through the EXEC 2 and REXX languages. Today, arbitrary-precision libraries are available for most modern programming languages (see below). Almost all computer algebra systems implement arbitrary-precision arithmetic.

Arbitrary-precision arithmetic is sometimes called infinite-precision arithmetic, which is something of a misnomer: the number of digits of precision always remains finite (and is bounded in practice), although it can grow very large. Aside from the question of the total storage available, the variables used by the software to index the digit strings are themselves limited in size.

Arbitrary-precision arithmetic should not be confused with symbolic computation, as provided by computer algebra systems. The latter represent numbers by symbolic expressions such as π·sin(3), or even by computer programs, and in this way can symbolically represent any computable number (limited by available memory). Numeric results can still only be provided to arbitrary (finite) precision in general, however, by evaluating the symbolic expression using arbitrary-precision arithmetic.

Applications

Arbitrary-precision arithmetic is considerably slower than arithmetic using numbers that fit entirely within processor registers, since the latter are usually implemented in hardware whereas the former must in most cases be implemented in software. (Certain "variable word length" machines of the 1950s and 1960s, notably the IBM 1401 and the Honeywell "Liberator" series, were exceptions: they could manipulate numbers bounded only by available storage, with an extra bit that delimited the value.) Even when a computer lacks hardware for certain operations (such as integer division, or all floating-point operations) and software is provided instead, that software will use number sizes closely related to the available hardware registers: one or two words only, and certainly not N words. Consequently, arbitrary precision is used in applications where the speed of arithmetic is not a limiting factor, or where precise results or exact integer arithmetic with very large numbers is required. It is also useful for checking the results of fixed-precision calculations, and for determining the best possible values of coefficients needed in formulae, such as the √⅓ that appears in Gaussian integration.
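Both uses can be sketched with Python's standard `decimal` module: compute √⅓ (the two-point Gauss-Legendre node) to 50 significant digits, then use that value to check a double-precision result. The precision of 50 digits and the 1e-15 tolerance are arbitrary choices for illustration.

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 50  # 50 significant decimal digits for every operation

# The two-point Gauss-Legendre nodes are at +/- sqrt(1/3).
node = (Decimal(1) / Decimal(3)).sqrt()
print(node)  # 0.5773502691896257645... (50 significant digits)

# Checking a fixed-precision result: the double-precision value should
# agree with the high-precision reference to roughly 15-16 digits.
double_node = math.sqrt(1 / 3)
print(abs(Decimal(double_node) - node) < Decimal("1e-15"))  # True
```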

A common application is public-key cryptography, whose algorithms commonly employ arithmetic with integers of hundreds or thousands of digits; another is in human-centric applications where artificial limits and overflows would be inappropriate.
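Public-key algorithms such as RSA rest on modular exponentiation of very large integers. The following sketch shows the textbook square-and-multiply method on Python's built-in arbitrary-precision `int`. It is for illustration only (real cryptographic code also needs constant-time operations and vetted libraries), the function name `mod_pow` is our own, and the use of the Mersenne prime 2^521 - 1 as a test modulus is our choice.

```python
def mod_pow(base: int, exponent: int, modulus: int) -> int:
    """Right-to-left square-and-multiply: base**exponent % modulus (modulus > 1).

    Every intermediate product stays below modulus**2, however large the exponent."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                   # this bit of the exponent is set
            result = result * base % modulus
        base = base * base % modulus       # square for the next bit
        exponent >>= 1
    return result


p = 2**521 - 1  # a Mersenne prime with 157 decimal digits
print(mod_pow(3, p - 1, p))                      # 1, as Fermat's little theorem predicts
print(mod_pow(3, p - 1, p) == pow(3, p - 1, p))  # True: agrees with the built-in pow
```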

Arbitrary-precision arithmetic is also used to compute fundamental mathematical constants such as π to millions or more digits and to analyze the properties of the digit strings, or more generally to investigate the precise behaviour of functions such as the Riemann zeta function where answers via analytical methods are difficult to obtain.
Another example is in rendering fractal images with an extremely high magnification, such as those found in the Mandelbrot set.
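One classical way to compute π with nothing but large integers is Machin's formula, π/4 = 4 arctan(1/5) - arctan(1/239), with each arctangent summed as a Taylor series in fixed-point integer arithmetic (every value is scaled by a power of ten). This is a small sketch for illustration, not how record computations are done (they use faster-converging series and FFT-based multiplication); the function names and the choice of 10 guard digits to absorb truncation error are our own.

```python
def arctan_inv(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, via 1/x - 1/(3x^3) + 1/(5x^5) - ... in integers."""
    total = term = unity // x
    x_squared = x * x
    n, sign = 3, -1
    while term:
        term //= x_squared            # next odd power of 1/x
        total += sign * (term // n)
        sign, n = -sign, n + 2
    return total


def pi_digits(digits: int) -> str:
    """Return "314159..." : the integer part of pi * 10**digits."""
    guard = 10                        # extra digits absorb the truncation errors
    unity = 10 ** (digits + guard)
    pi = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi // 10**guard)


print(pi_digits(20))  # 314159265358979323846
```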

Arbitrary-precision arithmetic can also be used to avoid overflow, which is an inherent limitation of fixed-precision arithmetic. Just like a 4-digit odometer which rolls around from 9999 to 0000, a fixed-precision integer can exhibit wraparound if numbers grow too large to represent at the fixed level of precision. Some processors can deal with overflow by saturation, which means that if a result would be unrepresentable, it is replaced with the nearest representable value. (With 16-bit unsigned saturation, adding 1 to 65535 yields 65535; see saturation arithmetic.) Some processors can generate an exception if an arithmetic result exceeds the available precision. Where necessary, the exception can be caught and the operation can be restarted in software with arbitrary-precision operands.
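The three overflow behaviours can be simulated on top of Python's arbitrary-precision `int`. The function names below are hypothetical, and masking or clamping in software only models what the hardware would do.

```python
U16_MAX = 0xFFFF  # 65535, the largest 16-bit unsigned value


def wrapping_add_u16(a: int, b: int) -> int:
    """Add like a 16-bit register that wraps around (arithmetic modulo 2**16)."""
    return (a + b) & U16_MAX


def saturating_add_u16(a: int, b: int) -> int:
    """Add with 16-bit unsigned saturation: clamp to the largest value."""
    return min(a + b, U16_MAX)


def checked_add_u16(a: int, b: int) -> int:
    """Raise instead of wrapping, like a processor that traps on overflow."""
    result = a + b
    if result > U16_MAX:
        raise OverflowError(f"{a} + {b} does not fit in 16 bits")
    return result


def add_with_fallback(a: int, b: int) -> int:
    """Catch the overflow and redo the operation with arbitrary precision."""
    try:
        return checked_add_u16(a, b)
    except OverflowError:
        return a + b  # Python's int never overflows


print(wrapping_add_u16(65535, 1))    # 0, like the odometer rolling over
print(saturating_add_u16(65535, 1))  # 65535
print(add_with_fallback(65535, 1))   # 65536
```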

Since many computers now routinely use 32-bit or even 64-bit integers, it can often be guaranteed that the integer numbers in a specific application will never grow large enough to cause an overflow. However, as time passes the exact nature of the constraint can be forgotten, as in implementations of the binary search method that compute the midpoint as (L + R)/2: for correct functioning, the sum of L and R, not merely each variable individually, must fit in sixteen bits (or thirty-two, etc.). Some programming languages, such as Scheme, Lisp, Rexx, Python, Perl and Ruby, use, or have an option to use, arbitrary-precision numbers for all integer arithmetic. Although this reduces performance, it eliminates the possibility of incorrect results (or exceptions) due to simple overflow, and makes it possible to guarantee that arithmetic results will be the same on all machines, regardless of any particular machine's word size. The exclusive use of arbitrary-precision numbers in a programming language also simplifies the language, because "a number is a number" and there is no need for the multiplicity of types needed to represent different levels of precision.
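The binary-search pitfall can be reproduced by simulating 32-bit signed arithmetic in Python (the helper `wrap_i32` is our own). The rearranged form L + (R - L)/2 never forms the oversized sum.

```python
def wrap_i32(x: int) -> int:
    """Reduce x to the range of a 32-bit two's-complement signed integer."""
    return (x + 2**31) % 2**32 - 2**31


low, high = 2**30, 2**31 - 2  # both indices fit comfortably in 32 bits

naive_mid = wrap_i32(low + high) // 2  # the sum overflows and wraps negative
safe_mid = low + (high - low) // 2     # no intermediate value exceeds high

print(naive_mid)  # -536870913, a nonsensical index
print(safe_mid)   # 1610612735
```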

Algorithms

Numerous algorithms have been developed to efficiently perform arithmetic operations on numbers stored with arbitrary precision. In particular, supposing that N digits are employed, algorithms have been designed to minimize the asymptotic complexity for large N.

The simplest algorithms are for addition and subtraction, where one simply adds or subtracts the digits in sequence, carrying as necessary, which yields an O(N) algorithm (see big O notation).
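A minimal sketch of these schoolbook algorithms, assuming numbers are stored as little-endian Python lists (lowest-order digit first) in base 10,000; the base and all function names are illustrative choices, not a standard API.

```python
BASE = 10_000  # each list element holds four decimal digits (an assumed choice)


def to_digits(n: int, base: int = BASE) -> list[int]:
    """Split a non-negative int into little-endian digits (lowest first)."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            return digits


def from_digits(digits: list[int], base: int = BASE) -> int:
    """Rebuild an int from little-endian digits."""
    n = 0
    for d in reversed(digits):
        n = n * base + d
    return n


def add_digits(a: list[int], b: list[int], base: int = BASE) -> list[int]:
    """Schoolbook addition: one pass over the digits with a carry, O(N)."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        s = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(s % base)
        carry = s // base  # always 0 or 1 for addition
    if carry:
        result.append(carry)
    return result


def sub_digits(a: list[int], b: list[int], base: int = BASE) -> list[int]:
    """Schoolbook subtraction a - b with a borrow, assuming a >= b."""
    result = []
    borrow = 0
    for i in range(len(a)):
        s = a[i] - borrow - (b[i] if i < len(b) else 0)
        borrow = 1 if s < 0 else 0
        result.append(s + base * borrow)
    while len(result) > 1 and result[-1] == 0:  # drop leading zero digits
        result.pop()
    return result


print(add_digits([9999, 9999], [1]))  # [0, 0, 1], i.e. 99999999 + 1 = 100000000
```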

For multiplication, the most straightforward algorithms, those used for multiplying numbers by hand (as taught in primary school), require O(N²) operations, but multiplication algorithms that achieve O(N log N log log N) complexity have been devised, such as the Schönhage-Strassen algorithm, based on fast Fourier transforms. There are also algorithms with slightly worse asymptotic complexity but sometimes superior real-world performance for smaller N.
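The sketch below contrasts the O(N²) schoolbook method on little-endian digit lists with Karatsuba multiplication, one well-known example of an algorithm whose asymptotic complexity, about O(N^1.585), is worse than Schönhage-Strassen's but which pays off at moderate sizes. Splitting by bits on Python ints and the 64-bit cutoff are our own illustrative choices.

```python
def mul_digits(a: list[int], b: list[int], base: int = 10) -> list[int]:
    """Schoolbook multiplication of little-endian digit lists: O(len(a) * len(b))."""
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = result[i + j] + ai * bj + carry  # at most base**2 - 1
            result[i + j] = t % base
            carry = t // base
        result[i + len(b)] += carry  # this position is still 0 at this point
    while len(result) > 1 and result[-1] == 0:  # drop leading zero digits
        result.pop()
    return result


def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Karatsuba multiplication of non-negative ints: three half-size
    products instead of four at each level of recursion."""
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y  # small operands: use the built-in multiply
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    z0 = karatsuba(x_lo, y_lo, cutoff_bits)
    z2 = karatsuba(x_hi, y_hi, cutoff_bits)
    # (x_lo + x_hi)(y_lo + y_hi) - z0 - z2 equals the two cross terms
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi, cutoff_bits) - z0 - z2
    return (z2 << (2 * half)) + (z1 << half) + z0


print(mul_digits([3, 2, 1], [6, 5, 4]))  # [8, 8, 0, 6, 5], i.e. 123 * 456 = 56088
```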

Example

The calculation of factorials produces very large numbers very swiftly. This is not a problem for their use in many formulae (such as Taylor series) because they appear together with other terms, so that, given careful attention to the order of evaluation, the net result of the calculation remains manageable. If approximate values of factorials are desired, Stirling's approximation gives good results. If exact values are of interest, however, the integer limit is soon exceeded. Even floating-point approximations soon exceed the maximum representable floating-point value, to the degree that the calculations should be recast to use the logarithm of the number.
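These limits can be seen with Python's standard `math` module: the exact 200! is an ordinary `int`, converting it to a double overflows, and working with logarithms (`math.lgamma(n + 1)` returns ln n!) avoids the problem. The choice n = 200 is arbitrary; Stirling's formula is evaluated in log form for the same reason.

```python
import math

n = 200
exact = math.factorial(n)  # exact value as an arbitrary-precision int

# A double tops out near 1.8e308, so converting 200! (about 7.9e374) fails.
try:
    float(exact)
    fits_in_double = True
except OverflowError:
    fits_in_double = False
print(fits_in_double)  # False

# Recast in logarithms: lgamma(n + 1) gives ln(n!) without forming n!.
log10_fact = math.lgamma(n + 1) / math.log(10)
print(math.floor(log10_fact) + 1, len(str(exact)))  # 375 375: the digit count of 200!

# Stirling's approximation n! ~ sqrt(2*pi*n) * (n/e)**n, also in log form.
log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
ratio = math.exp(math.lgamma(n + 1) - log_stirling)
print(ratio)  # about 1.000417, close to the next correction factor 1 + 1/(12n)
```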

But if exact values for large factorials are desired, then special software is required, somewhat as in the pseudocode that follows, which implements the classic primary-school algorithm to calculate the successive factorial numbers 1, 1×2, 1×2×3, 1×2×3×4, etc.

Constant Limit = 1000;              %Sufficient digits.
Constant Base = 10;                 %The base of the simulated arithmetic.
Array digit[1:Limit] of integer;    %The big number.
Integer carry,d;                    %Assistants during multiplication.
Integer last,i;                     %Indices to the big number's digits.
Integer n;                          %The number whose factorial is being formed.
Array text[1:Limit] of character;   %Scratchpad for the output.
Constant tdigit[0:9] of character = ["0","1","2","3","4","5","6","7","8","9"];

BEGIN
  digit:=0;                         %Clear the whole array.
  digit[1]:=1;                      %The big number starts with 1,
  last:=1;                          %Its highest-order digit is number 1.
  for n:=1 to 365 do
    carry:=0;                       %Start a multiply.
    for i:=1 to last do             %Step along every digit.
      d:=digit[i]*n + carry;        %The classic multiply.
      digit[i]:=d mod Base;         %The low-order digit of the result.
      carry:=d div Base;            %The carry to the next digit.
    next i;
    while carry > 0                 %Store the carry in the big number.
      if last >= Limit then croak('Overflow!');  %If possible!
      last:=last + 1;               %One more digit.
      digit[last]:=carry mod Base;  %Placed.
      carry:=carry div Base;        %The carry reduced.
    Wend                            %With n > Base, maybe > 1 digit extra.
    text:=" ";                      %Now prepare the output.
    for i:=1 to last do             %Translate from digit values to characters.
      text[Limit - i + 1]:=tdigit[digit[i]];  %Reversing the order.
    next i;                         %Arabic numerals put the low order last.
    Print text," = ",n,"!";
  next n;
END;
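For readers who want to run the algorithm, here is a sketch of the same schoolbook method translated into Python. It is a translation, not the original program: Python lists start at index zero, so `digit[0]` holds the lowest-order digit; an `OverflowError` replaces `croak`; and the output string is built without leading spaces. Python's own `int` could of course do all of this directly; the point is to show the digit-by-digit method.

```python
LIMIT = 1000  # sufficient digits: 365! has fewer than 800
BASE = 10     # the base of the simulated arithmetic (the output code assumes 10)


def factorial_table(n_max: int) -> list[str]:
    """Return the lines "1 = 1!", "2 = 2!", ... up to n_max, using a digit array."""
    digit = [0] * LIMIT  # the big number, lowest-order digit first
    digit[0] = 1         # it starts as 1
    last = 1             # how many digits are currently in use
    lines = []
    for n in range(1, n_max + 1):
        carry = 0                      # start a multiply
        for i in range(last):          # step along every digit
            d = digit[i] * n + carry   # the classic multiply
            digit[i] = d % BASE        # low-order digit of the result
            carry = d // BASE          # carry to the next digit
        while carry > 0:               # with n > BASE the carry may span several digits
            if last >= LIMIT:
                raise OverflowError("the digit array is full")
            digit[last] = carry % BASE
            carry //= BASE
            last += 1
        # Emit the highest-order digit first, as Arabic numerals are written.
        text = "".join(str(digit[i]) for i in reversed(range(last)))
        lines.append(f"{text} = {n}!")
    return lines


for line in factorial_table(10):
    print(line)
```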

With the example in view, a number of details can be described. The most important is the choice of representation for the big number. In this case, only integer values are required for factorials, so a fixed-point scheme is adequate. The powers of the base are zero and upwards, so it is convenient to have successive elements of the array represent higher powers. The computer language may not allow a convenient choice of array bounds (for example, the lower bound might always have to be one, or always zero), and the bounds a calculation naturally calls for might not be permitted, so this example uses an array starting from one, not zero, to demonstrate the simple bookkeeping issues involved. The fact that each index into the digit array corresponds to a certain power of the base is not directly used by the method.

The second most important decision is the choice of the base of arithmetic, here ten. There are many considerations. The scratchpad variable d must be able to hold the result of a single-digit multiply plus the carry from the previous digit's multiply. In base ten, a sixteen-bit integer is certainly adequate, as it allows values up to 32767. However, this example cheats, in that the value of n is not itself limited to a single base-ten digit. As a consequence d can approach 10n, so the method will fail for n > 3200 or so, which is not a pressing limit in this example. In general, n would also be a multi-digit big number. A second consequence of the shortcut is that after the multi-digit multiply has been completed, the final carry may hold several digits rather than the single digit that would otherwise be normal, so it must be propagated into higher-order digits beyond the previous upper limit of the number.

Flowing from the choice of base comes the issue of presenting the big number's value. Because the base is ten, the result could be shown simply by printing the successive digits of array digit, but they would appear with the highest-order digit last (so that one hundred and twenty-three would appear as "321") because of the chosen representation. The tradition for Arabic numerals is the other way around, so the digits could be printed in reverse order. But that would present the number with leading zeroes ("00000...000123"), which may not be appreciated, so the final decision is to build the representation in a text variable and then print that. The first few results (with many leading spaces removed) are:

Reach of computer numbers: each annotation marks the largest factorial that fits in an unsigned integer of the stated width.

1 = 1!
2 = 2!
6 = 3!
24 = 4!
120 = 5!    8-bit unsigned
720 = 6!
5040 = 7!
40320 = 8!    16-bit unsigned
362880 = 9!
3628800 = 10!
39916800 = 11!
479001600 = 12!    32-bit unsigned
6227020800 = 13!
87178291200 = 14!
1307674368000 = 15!
20922789888000 = 16!
355687428096000 = 17!
6402373705728000 = 18!
121645100408832000 = 19!
2432902008176640000 = 20!    64-bit unsigned
51090942171709440000 = 21!
1124000727777607680000 = 22!
25852016738884976640000 = 23!
620448401733239439360000 = 24!
15511210043330985984000000 = 25!
403291461126605635584000000 = 26!
10888869450418352160768000000 = 27!
304888344611713860501504000000 = 28!
8841761993739701954543616000000 = 29!
265252859812191058636308480000000 = 30!
8222838654177922817725562880000000 = 31!
263130836933693530167218012160000000 = 32!
8683317618811886495518194401280000000 = 33!
295232799039604140847618609643520000000 = 34!    128-bit unsigned
10333147966386144929666651337523200000000 = 35!
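The width annotations in the list of results can be checked with Python's built-in arbitrary-precision integers. The sketch below (the function name is our own) finds, for each width, the largest n whose factorial is at most 2^bits - 1.

```python
def largest_factorial_fitting(bits: int) -> int:
    """Largest n such that n! fits in an unsigned integer of the given width."""
    limit = 2**bits - 1  # largest unsigned value with that many bits
    n, fact = 1, 1
    while fact * (n + 1) <= limit:
        n += 1
        fact *= n
    return n


for bits in (8, 16, 32, 64, 128):
    print(bits, largest_factorial_fitting(bits))  # 5, 8, 12, 20 and 34 respectively
```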

More serious attempts would try to use the available arithmetic of the computer more efficiently. A simple escalation would be to base 100 (with corresponding changes to the translation process for output), or bigger computer variables (such as 32-bit integers) could be used to enable larger bases, such as 10,000. Conversion from non-decimal bases to a decimal base for output is a significant computation. Nevertheless, working in bases closer to the computer's built-in integer operations offers advantages. An operation on an integer holding a value such as six takes just as long as the same operation on an integer holding a larger value, so there are large gains in packing as much of a bignumber into each element of the digit array as possible. The computer may also offer facilities for splitting a product into a digit and a carry without requiring the two operations of mod and div as in the example. For instance, the IBM 1130 integer multiply of 16-bit integers (actually of a 32-bit accumulator and extension register pair, with a nominated 16-bit word) produced a 32-bit result which could be treated as two separate 16-bit words; thus if the bignumber base was 65536, the carry would be in the high-order sixteen bits and the digit in the low-order sixteen bits, and no mod and div operations would be required to separate them.
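The base-65536 idea can be sketched in Python, where a right shift and a mask stand in for treating a 32-bit product as two 16-bit words. The function names `mul_small` and `to_decimal` are our own, and converting to decimal by repeated short division by 10,000 is one straightforward choice among several.

```python
LIMB_BITS = 16
LIMB_BASE = 1 << LIMB_BITS  # 65536
LIMB_MASK = LIMB_BASE - 1   # 0xFFFF


def mul_small(limbs: list[int], m: int) -> list[int]:
    """Multiply a little-endian base-65536 number by an integer 0 <= m < 65536.

    The high 16 bits of each product are the carry and the low 16 bits the
    digit, so a shift and a mask replace div and mod."""
    result = []
    carry = 0
    for limb in limbs:
        product = limb * m + carry  # at most 65535 * 65535 + 65535: fits in 32 bits
        result.append(product & LIMB_MASK)
        carry = product >> LIMB_BITS
    if carry:
        result.append(carry)
    return result


def to_decimal(limbs: list[int]) -> str:
    """Convert base-65536 limbs to a decimal string by repeated division by 10**4."""
    limbs = limbs[:]  # work on a copy
    chunks = []       # base-10000 digits, lowest first
    while any(limbs):
        remainder = 0
        for i in range(len(limbs) - 1, -1, -1):  # divide from the top limb down
            value = (remainder << LIMB_BITS) | limbs[i]
            limbs[i], remainder = divmod(value, 10_000)
        chunks.append(remainder)
    if not chunks:
        return "0"
    # Every chunk except the most significant one is zero-padded to four digits.
    return str(chunks[-1]) + "".join(f"{c:04d}" for c in reversed(chunks[:-1]))


number = [1]
for n in range(1, 31):
    number = mul_small(number, n)
print(to_decimal(number))  # 265252859812191058636308480000000, i.e. 30!
```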

This sort of detail is the grist of machine-code programmers, and a suitable bignumber routine written in machine code could run orders of magnitude faster than compiled high-level-language code, since high-level languages do not offer similar facilities. Even so, it may be possible to juggle 16-bit and 32-bit variables in cunning ways, but the tricks (essentially, arranging that a 32-bit variable overlays the same storage as two 16-bit variables) are frowned upon by computer language purists. Thus the EQUIVALENCE statement of Fortran and the OVERLAY statement of PL/I are deprecated.

For a single-digit multiply, the working variables must be able to hold the value (base - 1)² + carry, where the maximum value of the carry is (base - 1). Notice that the IBM 1130 offered a working register of 32 bits for 16-bit arithmetic, so that many calculations whose intermediate results exceeded the 16-bit limit nevertheless worked. In a high-level language, if the bignumber's digit array were of unsigned 16-bit integers and the base 65536, the maximum result of a digit multiply would be 65535² + 65535 = 4,294,901,760, which exceeds the capacity of a 32-bit signed integer, 2³¹ - 1 = 2,147,483,647. The high-level language may not offer a 32-bit unsigned integer as a variable (limit 2³² - 1 = 4,294,967,295), even if the computer's internal arithmetic register allows this or is bigger still. Or, if 32-bit unsigned integers are available and are used as digits with base 2³², what then of the 64-bit unsigned integers that their products would require?
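The bound can be tabulated for a few candidate bases. This small Python sketch computes (base - 1)² + (base - 1) and reports the narrowest of six conventional integer types that can hold it; the names `int16` to `uint64` are only labels for the usual two's-complement limits, not a particular language's types.

```python
INT_LIMITS = {  # ordered by capacity; dicts keep insertion order
    "int16": 2**15 - 1,
    "uint16": 2**16 - 1,
    "int32": 2**31 - 1,
    "uint32": 2**32 - 1,
    "int64": 2**63 - 1,
    "uint64": 2**64 - 1,
}


def worst_case_digit_product(base: int) -> int:
    """Largest value of one digit multiply plus carry: (base-1)**2 + (base-1)."""
    return (base - 1) ** 2 + (base - 1)


def smallest_holding(value: int):
    """First type in INT_LIMITS that can hold value, or None beyond 64 bits."""
    for name, limit in INT_LIMITS.items():
        if value <= limit:
            return name
    return None


for base in (10, 256, 10_000, 65_536, 2**32):
    worst = worst_case_digit_product(base)
    print(base, worst, smallest_holding(worst))
```

With base 65536 the worst case is 4,294,901,760, which needs `uint32`; with base 2³² it needs `uint64`.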

Choosing instead a base of 256 has the advantage of simplicity, and moreover, it is quite possible to check for the correct function of the basic arithmetic operations on all possible digit combinations. Errors in computer hardware are not unknown. Since the prime purpose for slow but exact or at least high-precision computation is to obtain definitive results, some sort of assurance is helpful.
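As an illustration of such a check, the sketch below models a base-256 digit step carried out in a 16-bit unsigned register (simulated by masking with 0xFFFF, a hypothetical stand-in for real hardware) and compares it with exact arithmetic for every pair of digits, using both the smallest and the largest possible carry. Covering every carry value as well is simply a third loop of 256, about 16.7 million cases, which is easy in a compiled language but slow in Python.

```python
def digit_multiply_u16(a: int, b: int, carry: int) -> tuple[int, int]:
    """Hypothetical base-256 digit step in a simulated 16-bit unsigned register.

    Returns (digit, new_carry). The mask mimics a register that would silently
    wrap if the result did not fit; the maximum, 255*255 + 255 = 65280, does fit."""
    register = (a * b + carry) & 0xFFFF
    return register & 0xFF, register >> 8


def check_all_digits() -> int:
    """Compare with exact arithmetic for every pair of base-256 digits."""
    checked = 0
    for a in range(256):
        for b in range(256):
            for carry in (0, 255):  # smallest and largest possible carry
                digit, new_carry = digit_multiply_u16(a, b, carry)
                assert new_carry * 256 + digit == a * b + carry
                assert new_carry < 256
                checked += 1
    return checked


print(check_all_digits())  # 131072 cases checked
```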

Arbitrary-precision software

Arbitrary-precision arithmetic in most computer software is implemented by calling an external library that provides data types and subroutines to store numbers with the requested precision and to perform computations.