Hack the derivative

Follow along

The Python functions used in this article are available on PyPI and GitHub. These links point to the most current version as of publishing (September 2015), but the latest version (if any changes have been made in the meantime) is easily available from there. To follow along at home:

Introduction

The derivative (the mathematical flavor, not the financial) is one of the first things a young calculus student learns. By the time students are formally taught it, they have likely already encountered the idea: the slope of a line (rise over run!) is typically taught a couple of years earlier, and they have probably already thought about the relationship between position, speed, and acceleration in physics, even before it is explained to them directly. But the derivative is not only the first piece of calculus presented to young mathematicians; it is also one of the most important cornerstones of advanced mathematics, particularly in applied mathematics.

Computing the derivative (as opposed to calculating it by hand) is a critical procedure in applied fields like physics, finance, and statistics. Standard approaches for this calculation exist, but as is always the case with computational mathematics, there are limits and tradeoffs.

When taking derivatives, we are typically working with the real numbers. The real numbers have a number of fancy mathematical properties that are incompatible with finite representation, whether that be on a piece of paper or in a computer. In fact, single irrational numbers (say $\pi$ or $\sqrt{2}$) cannot even be represented exactly in a computer. Modern 64-bit computers can, however, do exact operations on the integers modulo $2^{64} = 18{,}446{,}744{,}073{,}709{,}551{,}616$, represented unsigned as $\{z \in \mathbb{Z} \mid 0 \le z \le 2^{64}-1\}$ or signed as $\{z \in \mathbb{Z} \mid -2^{63} \le z \le 2^{63}-1\}$.

The standard computational approximation of the real numbers is the floating point numbers, and they work quite nicely for a great deal of calculations. Yet they are just that: an approximation. This rears its ugly head in the most standard way of computing the derivative. To understand this, we will look into the derivative itself and the implementation of floating point numbers. Once we understand the issue at hand, we'll find an interesting solution in one of the most important results in modern mathematics, the Cauchy-Riemann equations. To tee this up, we'll have to understand a little bit about the imaginary number ($i$ in math, `j` in Python) and the complex numbers. Then we'll be ready to hack the derivative.

The derivative

Informally, the derivative is a mathematical method for determining the slope of a possibly nonlinear curve at a specific point, as shown in the following image from Wikipedia:

Formally, the derivative of $f(x)$, written $f'(x)$, is a new function which tells us the slope of the tangent line that intersects $f$ at any given $x$. Mathematically, we define this as

$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$

This is obviously just the tip of the derivative iceberg, and I encourage you to check out Wikipedia or a textbook if you feel uninitiated.

Finite difference

The most common approach for computationally estimating the derivative of a function is the finite difference method. In the definition of the derivative, instead of taking the limit as $h \to 0$, we simply plug in a small $h$:

$$f'(x) \approx \frac{f(x+h) - f(x)}{h}$$

It probably seems quite natural that the smaller the step we take, the closer the approximation is to the true value. A curious reader can verify this using a Taylor series expansion.

The finite difference method can be implemented in Python quite easily:
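The original listing is not preserved in this copy, so here is a minimal sketch; the names `f` (Example 1's $f(x) = x^2$) and `finite_difference` are assumptions based on the calls shown later in the article.

```python
def f(x):
    # Example 1 from the article: f(x) = x**2, whose true derivative is 2*x
    return x * x

def finite_difference(f, x, h):
    # Forward-difference approximation of f'(x) with step size h
    return (f(x + h) - f(x)) / h

print(finite_difference(f, 1.0, 0.00001))  # ~2.00001, close to the true value of 2
```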

(Remember from Example 1 that $f'(x) = 2x$ for $f(x) = x^2$, and so $f'(1) = 2$.) This is fairly accurate, but can we take it further? As discussed earlier, we are using floating point numbers, so in Python we can try this out with the smallest float.
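The experiment itself is missing from this copy; a sketch of what it likely showed, using the smallest positive float Python can represent (the subnormal `5e-324`) as the step:

```python
def f(x):
    return x * x

def finite_difference(f, x, h):
    return (f(x + h) - f(x)) / h

h = 5e-324  # the smallest positive float Python can represent
print(finite_difference(f, 1.0, h))  # 0.0, since 1.0 + h rounds back to 1.0
```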

This is clearly the wrong answer. Since h = 0.00001 was reasonably small and worked before, you may be tempted to just use that. Unfortunately, it can break too: consider the case where $x_0 = 10^{20}$.

```python
In [6]: finite_difference(f, 1.0e20, 0.00001)
Out[6]: 0.0
```

To understand what is happening here, we’ll take a closer look at the floating point numbers.

In essence, floating point numbers have three main components: the sign, the exponent (which gives us scale), and the fraction (which gives us precision). This is much like scientific notation, e.g., $5.916829373 \times 10^{23}$, but in base two. For the following examples, we'll use base ten scientific notation, since it's a bit easier to work with.

This format allows for two numbers to be multiplied and divided without loss of precision:
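The worked example that originally followed is missing here; with, say, four significant digits of precision, a product like the following can be computed exactly by handling the fractions and exponents separately:

$$(2.500 \times 10^{3}) \times (4.000 \times 10^{-7}) = (2.500 \times 4.000) \times 10^{3 + (-7)} = 1.000 \times 10^{-3}$$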

This works great for multiplication and division (due to associativity and commutativity), as we can rearrange the fractions and exponents and compute these operations separately. This is not the case, however, for addition and subtraction:
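The example here is also missing; addition first requires aligning the exponents, and that alignment can push digits past the available precision. With four significant digits again:

$$1.000 \times 10^{0} + 1.000 \times 10^{-5} = 1.00001 \times 10^{0} \approx 1.000 \times 10^{0}$$

The smaller addend is lost entirely.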

This may seem obvious, but it means that the floating point numbers have gaps. For each floating point number (except the largest one), there is a next one. More importantly, the gap between a floating point number and the next one grows as the numbers themselves get larger. We can see this in Python:
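The listing is not preserved here; a sketch using `math.ulp` (available in Python 3.9+), which reports the gap between a float and the next one up:

```python
import math

# The gap ("unit in the last place") between a float and the next float up
print(math.ulp(1.0))     # 2.220446049250313e-16
print(math.ulp(1.0e20))  # 16384.0
```

At $10^{20}$ the gap is 16384, which dwarfs a step of 0.00001; this is exactly why the finite difference at $x_0 = 10^{20}$ returned 0.0.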

At this point we have essentially minimized the total error, both from the finite difference method and from computational round-off. As such, it would be easy to think this is as good as it gets. Surprisingly, though, with the help of the complex numbers, we can take this even further.

Complex analysis

Have you ever tried to solve $x^2 + 1 = 0$? If you have, you'll know that it is impossible for all $x \in \mathbb{R}$. We would need $x^2 = -1$, but when we square a real number, it will stay positive if it's positive, become positive if it's negative, or stay $0$ if it's $0$. In order to solve this, we actually need to bring in a whole new set of numbers: the imaginary numbers. We define

$$i = \sqrt{-1}$$

and then, together with the real numbers, build the complex numbers as:

$$\mathbb{C} = \{x + iy \mid x, y \in \mathbb{R}\}$$

We can now solve our dreaded $x^2 + 1 = 0$ with $\pm i$. In fact, every polynomial of degree $n$ has exactly $n$ zeros (counted with multiplicity) in $\mathbb{C}$. There is plenty to explore in the realm of complex analysis, but we are going to focus on the derivative.

Functions can always be written in terms of their real and imaginary parts, e.g., $f(x+iy) = u(x,y) + iv(x,y)$. These can also be written as $\mathfrak{R}(f)$ and $\mathfrak{I}(f)$, respectively. Note that both $u$ and $v$ are real valued functions, and $x$ and $y$ are both real numbers. In our example, $f(z) = z^2$, we have

$$f(x+iy) = (x+iy)^2 = (x^2 - y^2) + i\,(2xy),$$

so $u(x,y) = x^2 - y^2$ and $v(x,y) = 2xy$.

Now, for $z \in \mathbb{C}$ we are only concerned with the subset $\bar{z} \in \mathbb{R}$. This tells us that for $\bar{z} = x + iy$, we know that $y = 0$. Plugging this in
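The derivation and listing that originally followed are not preserved in this copy. The standard complex-step result they build toward can be sketched as follows: the Cauchy-Riemann equation $\partial u/\partial x = \partial v/\partial y$, together with $v(x, 0) = 0$ for a function that is real on the real axis, gives $f'(x) = \partial v/\partial y\,(x, 0) \approx \mathfrak{I}(f(x + ih))/h$. There is no subtraction, hence no cancellation, so $h$ can be made tiny.

```python
def f(z):
    # Example 1 again; z*z extends naturally to complex arguments
    return z * z

def complex_step(f, x, h=1e-20):
    # Complex-step derivative: Im(f(x + ih)) / h, with no subtractive cancellation
    return f(x + h * 1j).imag / h

print(complex_step(f, 1.0))     # 2.0, exact to machine precision
print(complex_step(f, 1.0e20))  # ~2e+20, where finite differencing returned 0.0
```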

As you can see, the implementations of some complex functions have limitations. We also will not be able to use `abs` (its imaginary part is $0$ by definition) or a function with `<` or `>` in it (the complex numbers are not ordered).

The advantage of this method over finite differencing is quite stark (within the bounds where it works). Not only is it more accurate, but it also doesn't involve the extra search for the right step size. Exploring these advantages and comparing them not only to finite differencing but to all the other known methods is beyond the scope of an article such as this. But there is something quite interesting happening in the details here that offers a bit of insight into numerical computation. The "hack" that makes this work lies in the fact that $f(x,0) = u(x,0)$ and the substitution using the Cauchy-Riemann equations. But in general, $f(x,0) = u(x,0)$ is basically just the identity function. Yet somehow this gives us startlingly more accurate results.

The devil is in the details. The problems with the finite-differencing methods arise from the fact that basic mathematical truths, such as $x + \epsilon \ne x$ whenever $\epsilon \ne 0$, do not hold in floating point arithmetic. It is not actually that surprising that this is the case. Take the number $\pi$, for example. Even with every atom in the visible universe acting as a bit in some hypothetical "universe computer", you would not be able to represent the whole thing. It goes on forever. And there are uncountably many more numbers that are just the same way. In a lot of respects, it's quite amazing how accurate many calculations can be made with floating point numbers, from orbital mechanics to heat equations. It is important to remember that numerical computation is not abstract mathematics, and to have an awareness of what is happening under the hood. This hack gives us a fun way to take a peek, see where things break down, and use some other mathematics to patch it up.