Efficient and accurate computations

[Warning: You need to enable Javascript, lest the math be all garbled and hurt your eyes even more.]

Programmers know how to improve the speed and robustness of their codes. However, when a program does a lot of calculations, new problems arise and new solutions are needed. Both are fairly language-independent: the same issues, with the same fixes, will exist in C++, Java, Fortran or Javascript.

Consequently, I will focus on the numerical aspects, without paying too much attention to opportunities for language-specific optimizations. The code snippets are in C++; the necessary #include and using (e.g. using namespace std) are assumed to be there (to make the code leaner and the syntax less specifically C++). Also some elements will seem to be missing (such as classes without constructors whose instances are miraculously initialized just fine) so as to focus on what is relevant without diluting it in an ocean of boilerplate code.

Do it faster

Making a code run faster is a widespread concern among programmers (of compiled languages). And calculations, when they are very massive or very numerous (e.g. deep nested loops), can be a performance drag. This is where programming meets applied mathematics and computations.
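
Suppose, for instance, that we need to compare the lengths of 3D vectors. The snippet below is a sketch of the natural implementation (the Vector class is a hypothetical stand-in for whatever the real code uses):

```cpp
#include <cmath>

struct Vector { double x, y, z; };

double length(const Vector& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Which of two vectors is shorter? One sqrt per vector, per call.
bool isShorter(const Vector& a, const Vector& b) {
    return length(a) < length(b);
}
```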

This code is inefficient because there is in fact no need to calculate all these square roots. A quick benchmark on my laptop found that the square root calculation was an order of magnitude slower than calculating the three products and two sums of x*x + y*y + z*z.

A better alternative is thus:
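
A sketch of what this could look like (sqrLength() is the name used in the rest of the text; the Vector class is again a stand-in):

```cpp
#include <cmath>

struct Vector { double x, y, z; };

double sqrLength(const Vector& v) {
    return v.x * v.x + v.y * v.y + v.z * v.z;   // no square root
}

// The square root is an increasing function, so comparing squared
// lengths gives the same ordering as comparing lengths.
bool isShorter(const Vector& a, const Vector& b) {
    return sqrLength(a) < sqrLength(b);
}
```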

When you have to first calculate the square of what you are interested in (e.g. Euclidean distance, standard deviation), ask yourself whether this square root is really necessary. However, sqrLength() may run into trouble if x, y or z is very large, see below.

However, it is uselessly repetitive. We are naturally drawn to implementing the exact variables that appear in the mathematical equations: if the equations have \(a\), \(b\) and \(c\), so will the program (likewise with the distance in the previous example). But in this case, we do not need them individually:

The weakness of this algorithm is subtle. One must first notice that, with exponentials as with powers, \(e^{(a-b) \times i} = (e^{a-b})^i = e^{a-b} \times (e^{a-b})^{i-1}\). And since we use these powers in increasing order we can simply multiply in each iteration:
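
In code, the running product might look like this sketch (the quantity being summed is a made-up stand-in for whatever the real loop accumulates):

```cpp
#include <cmath>

// Sum of e^((a-b) * i) for i = 0 .. n-1, with a single call to exp().
double sumExp(double a, double b, int n) {
    const double factor = std::exp(a - b);
    double power = 1.0;     // e^((a-b) * 0)
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += power;
        power *= factor;    // e^((a-b) * i)  ->  e^((a-b) * (i+1))
    }
    return sum;
}
```

Each iteration now costs one multiplication instead of one call to exp().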

(For the sake of simplicity some functions are assumed to be implemented elsewhere.) What is wrong with this? About everything. The only good thing about it is that it is O(\(N\)), not O(\(N^2\)): the inner loop is not over all atoms but only over the neighbors of the first atom (we keep a list of them, with help from the NSA), so the inner loop has constant complexity.

The powers (6 and 12) are calculated multiple times (and pow() may or may not be efficient with simple integral exponents). The distance is obtained through a perfectly pointless square root. Am I finicky? Yes (otherwise I would code in Python). But when the loop above runs millions of times for millions of atoms, and when simulations take days or weeks, no gain is really negligible. The following implementation is better (3.5 times faster on my laptop) without being more complicated:
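
Here is a sketch of such an implementation (the Atom class and the constants are hypothetical stand-ins, and cube() is the helper mentioned just after):

```cpp
#include <cmath>

inline double cube(double x) { return x * x * x; }

struct Atom { double x, y, z, fx, fy, fz; };

// One Lennard-Jones pair interaction, accumulated on both atoms.
// Only the squared distance is needed: no sqrt(), no pow().
void addLJForce(Atom& a, Atom& b, double sigma2, double epsilon) {
    const double dx = b.x - a.x, dy = b.y - a.y, dz = b.z - a.z;
    const double r2 = dx * dx + dy * dy + dz * dz;   // squared distance
    const double s6 = cube(sigma2 / r2);             // (sigma/r)^6
    // (-dU/dr) / r = 24 epsilon (2 (sigma/r)^12 - (sigma/r)^6) / r^2
    const double f = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2;
    a.fx -= f * dx;  a.fy -= f * dy;  a.fz -= f * dz;
    b.fx += f * dx;  b.fy += f * dy;  b.fz += f * dz;
}
```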

One would make further modifications, such as having the class for atoms handle vectors directly instead of individual coordinates. Note that a pow6() function would be pointless in this case: the function taking \(x^2\) in and returning \(x^6\) is called cube().

Do it more accurately

A code may crash, freeze, return the wrong result, etc. These failures are often clear-cut: either black or white. When floating-point calculations are involved, there arises the grey zone of imprecise results — not completely wrong, but not quite right either.

Do not lose accuracy for no good reason

Imagine that we want to track the time evolution of some data (wittily called Data in the code snippet) and calculate the average change each time:
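
A sketch of what this could look like (the Data class and the numbers are made up; floats are used so that the damage is easy to see):

```cpp
#include <cmath>

// Hypothetical Data holder: a large value that drifts by a small step.
struct Data {
    float value = 10000.0f;
    void update() { value = 10000.1234f; }   // the new reading
};

// The change at this step, computed as "big number minus big number".
float change(Data& data) {
    float oldValue = data.value;
    data.update();
    return data.value - oldValue;            // should be 0.1234
}
```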

An eight-year-old knows that the answer is 0.1234. But if you attempt this operation with two floats, your code will not be as accurate as an eight-year-old. This problem is known as "big number minus big number".

The case of integers is obviously different. It is the worst of situations (adding up too many positive integers will dump you straight into negative territory — the mother of all inaccuracies). It is the best of situations (the risk is so blatant that only the greenest of tyros would fall for it).

Mathematically speaking, \(\sum_i b_i - \sum_i a_i = \sum_i (b_i - a_i)\). But computationally they may differ because of the finite accuracy of floating-point numbers. This is particularly relevant when \(b_i\) is of the form \(a_i + \varepsilon_i\), where the \(\{\varepsilon_i\}\) are small. This is because the accuracy of floating-point numbers is relative: a number on the order of 1 will be accurate to about \(10^{-15}\), but only to about \(10^{-5}\) if it is around \(10^{10}\).

This solves the problem by calculating the new value of myData off its previous value and dData, instead of calculating dData as the difference between data and oldData. Of course, this really works only if myData.change() does not simply return myData.update() - myData.
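
The fix might look like this sketch, with the same made-up numbers (change() stands in for whatever exact increment the application has access to):

```cpp
#include <cmath>

struct Data {
    float value = 10000.0f;
    float change() const { return 0.1234f; }   // the increment itself
    void update() { value += change(); }       // new value off the old one
};

// dData is now read off directly instead of being recovered by
// differencing two large values.
float dData(Data& data) {
    float d = data.change();
    data.update();
    return d;
}
```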

Implement a better (yet mathematically equivalent) formula

Example 1

Instead of rushing ahead and implementing your equation as it is written on paper, you may benefit from looking for a mathematically equivalent one which will run faster and (or) more accurately. For instance, if \(\varepsilon\) is small, \(\sqrt{x+\varepsilon} - \sqrt{x}\) is open to the problem of big number minus big number. This problem can easily be averted by remembering that \((a-b)(a+b) = a^2 - b^2\) still works if \(a\) and \(b\) are square roots:
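
A sketch of the two versions:

```cpp
#include <cmath>

// Naive version: big number minus big number.
double naiveDiff(double x, double eps) {
    return std::sqrt(x + eps) - std::sqrt(x);
}

// Mathematically equivalent version: multiply and divide by
// sqrt(x + eps) + sqrt(x), so that the subtraction disappears.
double stableDiff(double x, double eps) {
    return eps / (std::sqrt(x + eps) + std::sqrt(x));
}
```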

Example 2

A classic case is the variance, \(\mathrm{Var}(x) = \langle x^2 \rangle - \langle x \rangle^2\), i.e. the average of the squares minus the square of the average. However, this clever little formula, being open to the problem of big number minus big number, looks better on paper than in a code.
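
A quick demonstration with made-up data, pitting the one-pass formula against the straightforward two-pass computation (subtract the mean first, then average the squares):

```cpp
#include <cmath>
#include <vector>

// One-pass "clever" formula: <x^2> - <x>^2 (prone to cancellation).
double onePassVariance(const std::vector<double>& xs) {
    double sum = 0.0, sumSq = 0.0;
    for (double x : xs) { sum += x; sumSq += x * x; }
    const double mean = sum / xs.size();
    return sumSq / xs.size() - mean * mean;
}

// Two-pass formula: subtract the mean first, then average the squares.
double twoPassVariance(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) sum += x;
    const double mean = sum / xs.size();
    double sumSq = 0.0;
    for (double x : xs) sumSq += (x - mean) * (x - mean);
    return sumSq / xs.size();
}
```

With values around \(10^8\), the squares are around \(10^{16}\), where doubles are only accurate to a couple of units: the one-pass result is off by an amount on the order of 1, while the two-pass result is essentially exact.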

Examples 3 and 4

Another two (slightly more complicated) situations where a better mathematical formula makes a difference, a trigonometric function and the Verlet algorithm to discretize Newton's law, are given in an appendix.

Do not pay for accuracy you will not consume

I insisted that, on top of the usual trade-offs between speed, memory and readability, in codes carrying out a lot of calculations there is a trade-off with accuracy. A little cleverness can often give you more accurate results at very little cost (except using your brain and math). But you should also ask yourself whether it may not sometimes be a good idea to give up some precision you do not in fact need in exchange for using less memory or making the program run faster.

Use a smaller type for calculations

Say you want to create a class of fractions. You make it a template so that users can have fractions of 16-bit, 32-bit or 64-bit integers. But if you define 4/8 as a fraction of long longs and want to reduce it, it is probably a good idea to test whether numerator and denominator would not both fit in a smaller type and then call the reduction algorithm for that type:
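
A sketch of the dispatch (names and structure are mine, and reduction is done with std::gcd; the real class would be a template over the integer type):

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <numeric>

// Reduce num/den in the given integer type (both assumed nonzero).
template <typename Int>
void reduceIn(long long& num, long long& den) {
    const Int n = static_cast<Int>(num), d = static_cast<Int>(den);
    const Int g = std::gcd(n, d);
    num = n / g;
    den = d / g;
}

// Use the smaller type whenever both values fit into it.
void reduce(long long& num, long long& den) {
    constexpr long long m = std::numeric_limits<std::int32_t>::max();
    if (std::llabs(num) <= m && std::llabs(den) <= m)
        reduceIn<std::int32_t>(num, den);   // cheaper 32-bit divisions
    else
        reduceIn<long long>(num, den);
}
```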

(The code is obviously far from complete, but I think the trick is clear.) On my computer the speed-up is on the order of 2 or 3.

Using integers (?)

Imagine you need to deal with points, line segments, etc. in 2D (e.g. to describe a building blueprint). It is fairly natural to store such dimensions as doubles. But you will not need numbers both on the scale of \(10^{10}\) and of \(10^{-10}\): all coordinates will be similar in scale. So they could be stored as integers, giving distances in millimeters for instance. A 16-bit unsigned integral type can go up to 65 535 mm = 65.5 m, which should be enough to describe most buildings. You do not need to know building dimensions down to the millimeter, but using millimeters ensures that the round-off errors will remain smaller than the uncertainty on the empirical measurements (centimeters would be more prone to a build-up of errors).

This strategy may be useful when there are only simple calculations, e.g. monetary amounts can probably be advantageously stored as integers representing cents in accounting. Additions and subtractions will then be between integers, no problem. If there are more involved calculations, it may be necessary to use a different type to do the math: calculating the length of the diagonal of a rectangle requires squaring the lengths of its sides, and these intermediate values may be larger than 65 535. So, 16-bit integers may be used for storage and doubles for calculations. (Full disclosure: I had this idea after implementing financial calculations when it was too late to change the type, so this is more a hunch than a tested idea; but if you do try it, let me know how it went.)
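
A sketch of the idea in 2D (as said, a hunch rather than a tested design; the types and names are mine):

```cpp
#include <cmath>
#include <cstdint>

// Storage: 16-bit millimeter coordinates (up to 65 535 mm = 65.5 m).
struct Point { std::uint16_t x_mm, y_mm; };

// Calculations: squaring 16-bit values can exceed 65 535, so the
// intermediate math is done in doubles.
double distance_mm(Point a, Point b) {
    const double dx = double(a.x_mm) - double(b.x_mm);
    const double dy = double(a.y_mm) - double(b.y_mm);
    return std::sqrt(dx * dx + dy * dy);
}
```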

Do it at all

Inaccurate or slow calculations are bad. But it could be worse: some computations plainly crash.

Double-precision floats are typically limited to about \(10^{308}\) in absolute value. Try to go past this limit and the calculation will overflow. In some cases, this may be hard to avoid: your numbers are just huge. But in most cases the final result would be well within bounds; it is only the intermediate calculations that overflow.

Back to the lengths of vectors

Using sqrLength(), as I advised above, can for instance be an issue if x, y or z is very large, since it may overflow. For example, if x = y = z = \(10^{200}\) then the square of the length equals \(3 \times 10^{400}\), which is larger than the largest double the computer can handle (even though the length itself is not, as it is barely larger than x, y and z).

If this is a genuine risk in your application then it would be a good idea to bypass the squared length and compare lengths directly. Of course length() cannot involve the square of the length and should be calculated using something like:
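
For instance (a sketch; myMax, referred to just below, is the largest coordinate in absolute value):

```cpp
#include <algorithm>
#include <cmath>

// Overflow-proof length: factor out the largest coordinate so that
// the squared terms stay between 0 and 1.
double length(double x, double y, double z) {
    const double myMax = std::max({std::fabs(x), std::fabs(y), std::fabs(z)});
    if (myMax == 0.0) return 0.0;   // null vector: avoid dividing 0 by 0
    const double xr = x / myMax, yr = y / myMax, zr = z / myMax;
    return myMax * std::sqrt(xr * xr + yr * yr + zr * zr);
}
```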

x/myMax, etc. are all between 0 and 1 (at least one of them exactly 1), so we take the square root of a number between 1 and 3 — no risk of overflow.

Factorials

You do not deal with numbers larger than \(10^{308}\) every day. Well, you may. Factorials and exponentials for instance get pretty big pretty fast. An important case is that of combinations (and permutations):
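
A sketch consistent with the description that follows (i_min and ans named as in the text):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Number of combinations C(n, k): 32-bit inputs, 64-bit result,
// long double for the intermediate (possibly non-integral) products.
std::uint64_t combinations(std::uint32_t n, std::uint32_t k) {
    const std::uint32_t i_min = std::min(k, n - k);   // C(n,k) = C(n,n-k)
    long double ans = 1.0L;
    for (std::uint32_t i = 1; i <= i_min; ++i)
        ans = ans * (n - i_min + i) / i;   // may not divide evenly
    return static_cast<std::uint64_t>(std::llround(ans));
}
```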

Since \(\binom {n}{k} = \binom {n}{n-k}\), we use the variant of the two requiring the fewer computations (through i_min). Since dividing by i can create havoc if ans is not a multiple of it, I use a long double to carry out the calculations (one may consider using fractions; I have not tried it). The return type is an (unsigned) 64-bit integer, whereas everything else is 32 bits, because the combination is generally much larger than n and k (in generic programming, it would probably be necessary to use the same large type for all).

Mathematical appendices

Derivation of the force for the Lennard-Jones potential

The force is minus the gradient of the potential energy (i.e. the derivative with respect to position). However, the energy is known as a function of the distance \(r\), not of the positions themselves (we have easy access to \(-dU/dr\) rather than to \(-dU/dx\)). Since \(d\,r^2 = 2\,r\,d\,r\) and \(d\,r^2 / d\,x = d(x^2+y^2+z^2) / d\,x = 2\,x\), we can write:
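
Explicitly, the chain rule then gives:

\[ \frac{dr}{dx} = \frac{1}{2r}\,\frac{d r^2}{dx} = \frac{x}{r}, \qquad F_x = -\frac{dU}{dx} = -\frac{dU}{dr}\,\frac{dr}{dx} = -\frac{1}{r}\,\frac{dU}{dr}\,x. \]

For the Lennard-Jones potential, \(\frac{1}{r}\frac{dU}{dr}\) contains only even powers of \(r\), which is why the force loop can work with the squared distance alone.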

The Verlet algorithm

If \(\delta t = 10^{-3}\) then \(\delta t^2 = 10^{-6}\), but all it takes to get \(\delta t^3 = 10^{-6}\) is \(\delta t = 10^{-2}\). Concretely, running with a longer time step is like walking with a longer stride: much faster. This is routinely exploited in simulations such as molecular dynamics.