Let’s say you want to write a short program to multiply two linear functions f(x) = ax+b and g(x) = cx+d and compute the coefficients of the resulting product:

(ax+b)(cx+d) = ac·x² + (ad+bc)·x + bd.

You might think it’ll take 4 multiplications (for ac, ad, bc and bd) and 1 addition (for ad+bc), but there’s a better way: calculate ac, bd and (a+b)(c+d) − ac − bd, which takes 3 multiplications and 4 additions/subtractions. This is better since additions and subtractions are generally much cheaper than multiplications, so the tradeoff is usually worth it.
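In code, the three-multiplication trick looks like this (a minimal Python sketch; the function name is my own):

```python
def mul_linear(a, b, c, d):
    """Multiply f(x) = a*x + b and g(x) = c*x + d using 3 multiplications.

    Returns (p, q, r), the coefficients of f(x)*g(x) = p*x^2 + q*x + r.
    """
    ac = a * c                            # multiplication 1
    bd = b * d                            # multiplication 2
    mid = (a + b) * (c + d) - ac - bd     # multiplication 3; equals ad + bc
    return ac, mid, bd
```

For instance, with f(x) = x + 2 and g(x) = 3x + 4, this returns the coefficients of 3x² + 10x + 8.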

The above method is known as the Karatsuba algorithm, named after Anatolii Alexeevitch Karatsuba, a Russian mathematician who found it when he was only a young student. These days, one is likely to be underwhelmed by Karatsuba’s method, but it caused quite a stir in the early 1960s when it disproved a conjecture by Andrey Kolmogorov.

The power of Karatsuba’s method is that the coefficients can belong to any algebraic structure as long as multiplication is distributive over addition, i.e. we don’t even need multiplication to be associative. In particular, a, b, c, d can be polynomials themselves. E.g. suppose we wish to multiply two cubic polynomials f(x) and g(x). Write

f(x) = F₁(x)·x² + F₀(x),   g(x) = G₁(x)·x² + G₀(x),

where F₁, F₀, G₁, G₀ are linear. One application of Karatsuba’s method to the “coefficients” F₁, F₀, G₁, G₀ reduces the product to the three multiplications F₁G₁, F₀G₀ and (F₁+F₀)(G₁+G₀),

followed by three applications of Karatsuba’s method to the three multiplications. This gives 9 multiplications instead of 16. Let’s do a more detailed count of the number of operations. Suppose f(n) (resp. g(n)) is the number of multiplications (resp. additions or subtractions) used by Karatsuba’s method to multiply two polynomials of degree n−1. Then:

f(2n) = 3f(n), which gives us f(2ᵐ) = 3ᵐ in general.

g(2n) ≤ n + 3g(n) + 2n, since n additions are required to compute a+b and c+d, and a further 2n subtractions are used for (a+b)(c+d) − ac − bd. Solving the recurrence g(2n) ≤ 3g(n) + 3n with g(1) = 0 gives:

g(2ᵐ) ≤ 3(3ᵐ − 2ᵐ).

Hence, even for the number of additions, Karatsuba’s method will eventually overtake the naive method.
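The recursive scheme can be sketched in Python as follows (a toy version, assuming both inputs are coefficient lists of the same power-of-2 length, lowest degree first; all names are my own):

```python
def karatsuba(f, g):
    """Multiply coefficient lists f, g of equal power-of-2 length n.

    f[i] is the coefficient of x^i; the result has length 2n - 1.
    """
    n = len(f)
    if n == 1:
        return [f[0] * g[0]]
    h = n // 2
    f0, f1 = f[:h], f[h:]                 # f = f0 + f1 * x^h
    g0, g1 = g[:h], g[h:]
    low = karatsuba(f0, g0)               # f0 * g0
    high = karatsuba(f1, g1)              # f1 * g1
    mid = karatsuba([x + y for x, y in zip(f0, f1)],
                    [x + y for x, y in zip(g0, g1)])
    mid = [m - l - u for m, l, u in zip(mid, low, high)]  # f0*g1 + f1*g0
    result = [0] * (2 * n - 1)
    for i, v in enumerate(low):           # contributes to x^0 .. x^(n-2)
        result[i] += v
    for i, v in enumerate(mid):           # shifted by x^h
        result[i + h] += v
    for i, v in enumerate(high):          # shifted by x^n
        result[i + n] += v
    return result
```

For cubics (length-4 lists) this performs exactly the 9 multiplications counted above.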

Fourier Transform

There’s a nice philosophical idea behind Karatsuba’s method. Back to the linear polynomials f(x) = ax+b and g(x) = cx+d: essentially, Karatsuba’s method evaluates the two functions at various points, then multiplies the results:

f(0) = b, g(0) = d which gives (fg)(0) = bd;

f(1) = a+b, g(1) = c+d, which gives (fg)(1) = (a+b)(c+d);

f(∞) = a∞, g(∞) = c∞, which gives (fg)(∞) = (ac)∞².

If the reader should balk at our wanton use of infinity, feel free to replace x with 1/x and f(x) with x∙f(1/x) and evaluate at x=0. The big idea is as follows:

Big Picture. To compute the polynomial product fg, evaluate f(x) and g(x) at various points xₘ, multiply f(xₘ)g(xₘ), then find the polynomial which passes through these points.

This is much harder than it sounds. For suppose deg(fg) ≤ d. If we naively evaluate f(xₘ) independently for each m, then we need about d evaluations, each of which requires about d/2 multiplications since deg(f), deg(g) ≈ d/2. That’s bad news since it gives rise to d²/2 multiplications, and g(xₘ) will give another d²/2, and the total number of multiplications stands at d², which is not any better than simple expansion.

[ And we haven’t even started finding the polynomial through a set of points – the standard method, Lagrange interpolation, is way too slow here. ]

Thus, we need a better plan.

That’s where the discrete Fourier transform comes in. Suppose we wish to multiply two polynomials whose product has degree at most 7. Thus, only 8 values of xₘ are needed. The values we’ll pick are the 8th roots of unity:

xₘ = ωᵐ, m = 0, …, 7, where ω = e^(2πi/8).

Thus ω⁸ = 1 and ω⁴ = −1, which allows us to evaluate all powers of ω using only the first 8 powers via ω^(j+8) = ωʲ. Now break up the polynomial evaluation as follows: for f(x) = a₀ + a₁x + a₂x² + … + a₇x⁷, evaluate

f(ωᵐ) = a₀ + a₁ωᵐ + a₂ω^(2m) + … + a₇ω^(7m),

for m = 0, …, 7 as follows:

Collecting all the even columns (red arrows) together and the odd columns (blue arrows) together, we divide the matrix into four blocks:

The blocks labeled ‘A’ are simply the discrete Fourier transforms of (a₀, a₂, a₄, a₆), since ω^(2jm) = (ω²)^(jm) and ω² is a primitive 4th root of unity. The block labeled ‘B’ is the discrete Fourier transform of (a₁, a₃, a₅, a₇) with the j-th term multiplied by ωʲ, for j = 0, 1, 2, 3. Finally, the ‘C’ block is simply the negative of the ‘B’ block since ω⁴ = −1.

Summary. To perform FFT for the case of 2n parameters, it suffices to perform FFT for two cases of n, with an additional 2n complex additions and n complex multiplications. This gives an overall complexity of O(n log n).
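The recursion in the summary can be sketched in Python like so (coefficient lists of power-of-2 length, lowest degree first; naming is my own, and a production FFT would be iterative and in-place):

```python
import cmath

def fft(a):
    """Evaluate the polynomial with coefficients a at the n-th roots of
    unity w^0, ..., w^(n-1), where n = len(a) is a power of 2."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])                   # the 'A' blocks: even coefficients
    odd = fft(a[1::2])                    # odd coefficients, before twiddling
    w = cmath.exp(2j * cmath.pi / n)
    half = n // 2
    out = [0] * n
    for m in range(half):
        t = w**m * odd[m]                 # the 'B' block: multiply by w^m
        out[m] = even[m] + t
        out[m + half] = even[m] - t       # the 'C' block: w^(n/2) = -1
    return out
```

E.g. fft([0, 1, 0, 0]) evaluates the polynomial x at 1, i, −1, −i.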

Inverse Fast Fourier Transform

Having found a method to perform FFT in time complexity O(n log n), it remains to find a method to invert FFT in comparable time complexity. But this turns out to be rather nice, for the inverse of the transformation from (a_j) to (f(ωᵐ)) is similar to the forward transformation:

a_j = (1/8)·(f(ω⁰) + f(ω¹)ω^(−j) + f(ω²)ω^(−2j) + … + f(ω⁷)ω^(−7j)), for j = 0, …, 7.

The reader can easily verify for himself that the above relations do indeed recover the a_j’s. Just use the following two facts:

If j is a multiple of 8, then ωʲ = 1, so 1 + ωʲ + ω^(2j) + … + ω^(7j) = 8.

If j is not a multiple of 8, then 1 + ωʲ + ω^(2j) + … + ω^(7j) = (ω^(8j) − 1)/(ωʲ − 1) = 0, since the numerator is 0 but the denominator isn’t.
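Both facts are easy to check numerically, e.g. in Python:

```python
import cmath

w = cmath.exp(2j * cmath.pi / 8)          # a primitive 8th root of unity
# j not a multiple of 8: the geometric sum vanishes
for j in range(1, 8):
    assert abs(sum(w**(j * m) for m in range(8))) < 1e-9
# j a multiple of 8 (here j = 0): every term is 1, so the sum is 8
assert abs(sum(w**(0 * m) for m in range(8)) - 8) < 1e-9
```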

Notice that this is almost identical to the forward FFT process, except that we have to reverse the order of the resulting a_j’s for j = 1, …, 7, and divide everything by 8. Thus, the inverse FFT process also has complexity O(n log n).
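In Python, the reverse-and-divide trick can be sketched as follows (a forward FFT is included so the snippet is self-contained; names are my own):

```python
import cmath

def fft(a):
    """Forward FFT: evaluate coefficients a at the n-th roots of unity."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2]), fft(a[1::2])
    w = cmath.exp(2j * cmath.pi / n)
    half = n // 2
    return ([even[m] + w**m * odd[m] for m in range(half)] +
            [even[m] - w**m * odd[m] for m in range(half)])

def inverse_fft(values):
    """Recover the coefficients a_j from the values f(w^m): run a forward
    FFT, reverse the order of entries 1 .. n-1, and divide everything by n."""
    n = len(values)
    b = fft(values)
    return [b[0] / n] + [x / n for x in b[:0:-1]]
```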

In short, to multiply two polynomials whose product has degree at most n-1, we do:

perform FFT on the first polynomial to obtain f(ω⁰), f(ω¹), …, f(ω^(n−1));

perform FFT on the second polynomial to obtain g(ω⁰), g(ω¹), …, g(ω^(n−1));

multiply the two sequences termwise to give (fg)(ωᵐ) = f(ωᵐ)·g(ωᵐ) for m = 0, …, n−1;

perform inverse FFT on the resulting sequence ((fg)(ωᵐ)) to recover the coefficients of fg.

All steps other than the third require a time complexity of O(n log n), and the third step has a time complexity of O(n). Thus, the overall time complexity is O(n log n).
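Putting the four steps together (a minimal Python sketch with my own naming; coefficients are listed lowest degree first, and integer inputs let us round away floating-point noise at the end):

```python
import cmath

def fft(a):
    """Evaluate coefficients a at the n-th roots of unity (n a power of 2)."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2]), fft(a[1::2])
    w = cmath.exp(2j * cmath.pi / n)
    half = n // 2
    return ([even[m] + w**m * odd[m] for m in range(half)] +
            [even[m] - w**m * odd[m] for m in range(half)])

def poly_mul(f, g):
    """Multiply two integer coefficient lists using the four FFT steps."""
    n = 1
    while n < len(f) + len(g) - 1:        # room for all deg(fg)+1 coefficients
        n *= 2
    F = fft(f + [0] * (n - len(f)))       # step 1: FFT of f
    G = fft(g + [0] * (n - len(g)))       # step 2: FFT of g
    H = [x * y for x, y in zip(F, G)]     # step 3: termwise product
    b = fft(H)                            # step 4: inverse FFT, done as a
    b = [b[0]] + b[:0:-1]                 # forward FFT with the order reversed
    return [round(x.real / n) for x in b[:len(f) + len(g) - 1]]
```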

Applications of FFT

There are multiple uses for the fast Fourier transform algorithm.

Signal Processing : Fourier transform is the process of breaking a signal into a sum of various harmonics. Since digital data is collected in discrete packets, FFT is a natural way to do that, and it makes it tractable to perform real-time Fourier transforms on millions of data points. [ The naive method, which has complexity O(n²), would make this impossible. ]

Image Compression : the bitmap standard JPEG is a lossy compression based on Discrete Cosine Transform (DCT), which one can think of as taking the real components of the Fourier transform. Roughly speaking, after FFT, the high-frequency components have to be down-sampled to attain a certain level of compression (this process is irreversible, hence the term lossy). Special allowances are often made for areas of the colour spectrum to which the human eye is particularly sensitive.

Multiprecision Computation : within a computer, a multi-precision integer (e.g. 1000-digit) is often stored in base 2³² or 2⁶⁴ since computers typically have 32- or 64-bit buses and registers. FFT provides a way of multi-precision multiplication: to multiply ab, write a and b as polynomials with coefficients in [0, 2³²−1] (say). Then use FFT to multiply the two polynomials quickly and substitute x = 2³² to get the product.
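A toy Python sketch of this idea (base 10 digits for readability, where a real implementation would use base 2³² or 2⁶⁴ and would replace the naive polynomial product below with FFT or Karatsuba; all names are my own):

```python
def to_digits(n, base):
    """Little-endian digits of the non-negative integer n in the given base."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits or [0]

def digits_mul(a, b, base=10):
    """Multiply a and b by treating their digit lists as polynomial
    coefficients, multiplying the polynomials, then evaluating at x = base."""
    da, db = to_digits(a, base), to_digits(b, base)
    prod = [0] * (len(da) + len(db) - 1)  # naive product; FFT goes here
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            prod[i + j] += x * y
    return sum(c * base**k for k, c in enumerate(prod))  # substitute x = base
```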

The last case is of interest: although FFT is asymptotically faster than naive multiplication, it’s only used for extremely large integers (e.g. millions of digits). The reason is that FFT involves computations with complex numbers to a certain level of precision (certainly not thousands, or even hundreds, of significant figures, but still slow compared to integer arithmetic). Hence, the constant factor hidden in the O(n log n) notation is rather large.

For example, the C library GMP switches between several multiplication algorithms, including Karatsuba and FFT-based methods, depending on operand size, in order to perform multi-precision multiplication quickly.