Tag: integral

A frequently asked question about numerical integration is: how can one speed up the computation of an integral for many values of a parameter? Running an explicit for loop over the parameter values seems slow… is there a way to vectorize this, performing integration on an array of functions at once (which in practice means pushing the loop down to a lower-level language)?

Usually, the answer is some combination of “no” and “you are worrying about the wrong thing”. Integration routines are typically adaptive, meaning they pick evaluation points based on the function being integrated. Different values of the parameter will require different evaluation points to achieve the desired precision, so vectorization is out of the question. This applies, for example, to the quad method of SciPy's integrate module.

Let’s suppose we insist on vectorization and are willing to accept a non-adaptive method. What are our options within SciPy/NumPy? I will compare the fixed-grid rules trapz, simps, and romb from scipy.integrate, applied to function values precomputed on a uniform grid.
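The original code did not survive this copy; here is a sketch under an assumed integrand family f_p(x) = (p + 1)x^p on [0, 1] (my stand-in, chosen because its exact integral is 1 for every p ≥ 0, and it is nonsmooth at 0 when p is small, consistent with the error pattern described below):

```python
import numpy as np
from scipy import integrate as si

# Assumed stand-in for the lost integrand family: f_p(x) = (p + 1) * x**p
# on [0, 1], with exact integral 1 for every p >= 0.
ps = np.linspace(0.01, 10, 1000)     # 1000 parameter values
n = 2**10                            # 2**k subintervals, as si.romb requires
x = np.linspace(0.0, 1.0, n + 1)
dx = x[1] - x[0]

# Broadcasting evaluates all 1000 functions on the whole grid at once.
values = (ps[:, None] + 1.0) * x[None, :] ** ps[:, None]

# The three fixed-grid rules compared below (trapz/simps were renamed
# trapezoid/simpson in recent SciPy versions).
trap = si.trapezoid(values, dx=dx)
simp = si.simpson(values, dx=dx)
romb = si.romb(values, dx=dx)
```

All three rules integrate along the last axis, so each returns an array of 1000 integrals in a single call.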

Evaluating the functions on the grid took 0.799 seconds on my AWS instance (t2.nano). Combining the sampled values into integrals takes less time:

Trapezoidal rule np.trapz(values, dx=dx) took 0.144 seconds and returned values ranging from 0.99955 to 1.00080.

Simpson’s rule si.simps(values, dx=dx) took 0.113 seconds and returned values ranging from 0.99970 to 1.0000005.

Romberg quadrature si.romb(values, dx=dx) was the fastest at 0.0414 seconds, and also the most accurate, with output between 0.99973 and 1.000000005.

Conclusions so far:

SciPy’s implementation of Romberg quadrature is surprisingly fast, considering that this algorithm is the result of repeated Richardson extrapolation of the trapezoidal rule (and Simpson’s rule is just the result of the first extrapolation step). It’d be nice to port whatever makes Romberg so effective over to Simpson’s rule.

The underestimation errors 0.999… are greatest when the parameter is near zero, where the integrand is very nonsmooth at the left endpoint of the interval. The lack of smoothness renders Richardson extrapolation ineffective, hence all three rules have about the same error here.

The overestimation errors are greatest when the parameter is large. The integrand is quite smooth then, so upgrading from the trapezoidal rule to Simpson to Romberg brings substantial improvements.

All of this is nice, but the fact remains that non-vectorized adaptive integration is both faster and much more accurate. The following loop with quad (which wraps the adaptive Gauss–Kronrod quadrature of QUADPACK) runs in 0.616 seconds (less than it took just to evaluate our functions on a grid) and returns values between 0.99999999995 and 1.00000000007. In exponential notation, the errors lie between −5·10⁻¹¹ and 7·10⁻¹¹.
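For reference, the loop might look like this (the original code was lost from this copy; the integrand is my assumed stand-in family again):

```python
import numpy as np
from scipy import integrate as si

# Assumed stand-in family: f_p(x) = (p + 1) * x**p on [0, 1], exact integral 1.
ps = np.linspace(0.01, 10, 1000)
results = np.array(
    [si.quad(lambda x, p=p: (p + 1) * x**p, 0.0, 1.0)[0] for p in ps]
)
```

Note the p=p default-argument idiom, which freezes the current parameter value inside each lambda; each quad call then adapts its evaluation points to its own integrand.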

An adaptive algorithm shines when the parameter is small, by placing more evaluation points near zero. It controls both over- and under-estimates equally well, continuing to add points until the required precision is reached.

The last hope of non-adaptive methods is Gaussian quadrature, which is implemented in SciPy as fixed_quad (“fixed” referring to the number of evaluation points used). With 300 evaluation points, the integration takes 0.263 seconds (excluding the computation of nodes and weights, which can be cached). This is twice as fast as the loop with quad, and more accurate for large parameter values — but sadly, not so accurate for small ones. As said before, a parameter near zero is really a showcase for adaptive methods.
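A sketch with fixed_quad (same assumed stand-in family; fixed_quad passes an array of Gauss–Legendre nodes to the integrand, so the evaluation is vectorized over both nodes and parameters):

```python
import numpy as np
from scipy import integrate as si

# Assumed stand-in family: f_p(x) = (p + 1) * x**p on [0, 1], exact integral 1.
ps = np.linspace(0.01, 10, 1000)

def f(x):
    # x arrives as an array of Gauss-Legendre nodes; broadcast over parameters.
    return (ps[:, None] + 1.0) * x[None, :] ** ps[:, None]

approx, _ = si.fixed_quad(f, 0.0, 1.0, n=300)
```

For the largest parameter values the integrand is a polynomial, which a 300-point Gaussian rule integrates essentially exactly; the small-parameter results are noticeably worse.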

Trying to approximate a generic continuous function on an interval with a trigonometric Fourier series is in general not very fruitful. Here’s such an approximation to exp(x), with the sum taken over the first several harmonics:

Poor approximation to exp(x)

It’s better to make a linear change of variable: rescale exp to the interval [0, π] and use the formula for the cosine series, which corresponds to an even periodic extension and avoids the jump at the endpoints. The resulting function is approximated by the partial sum of its cosine Fourier series as follows.

Better approximation after a change of variable

But one can do much better with a different, nonlinear change of variable x = cos θ. Consider exp(cos θ) on the interval [0, π], and again use the formula for the cosine series. The resulting function is approximated by the partial sum of its cosine Fourier series as follows.

Excellent approximation after nonlinear change of variable

Yes, I can’t see any difference either: the error is too small to be visible at this resolution.

The composition with cosine improves the approximation because exp(cos θ) is naturally a smooth periodic function, with no jumps or corners in its graph. Fourier series, which are periodic by nature, follow such functions more easily.
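A quick numeric illustration (my own, with assumed sample functions): compare the decay of the discrete cosine coefficients of exp(θ) on [0, π] (the linear change of variable, whose even extension has a corner) with those of the smooth periodic exp(cos θ):

```python
import numpy as np

n = 32
theta = np.pi * np.arange(n + 1) / n

def cosine_coeffs(g):
    # Even periodic extension of the samples, then FFT: the real part
    # gives (approximate) cosine coefficients a_0, a_1, ..., a_n.
    ext = np.concatenate([g, g[-2:0:-1]])
    return np.fft.fft(ext).real[: n + 1] / n

a_corner = cosine_coeffs(np.exp(theta))          # decays like 1/k**2
a_smooth = cosine_coeffs(np.exp(np.cos(theta)))  # decays superexponentially
print(abs(a_corner[8]), abs(a_smooth[8]))
```

The smooth composition's coefficients are already below 10⁻⁶ by k = 8, while the corner keeps the decay at the slow 1/k² rate.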

A practical implication of this approximation is the Clenshaw–Curtis integration method, which can be implemented in a few lines of code.

The integral of f over [−1, 1] is approximated by a weighted sum of the even-numbered cosine coefficients a_k of f(cos θ), with weights 2/(1 − k²). In the example with exp, using just three coefficients yields a value within 10⁻⁵ of the exact integral e − 1/e ≈ 2.3504.

At first this doesn’t look practical at all: after all, the Fourier coefficients are themselves found by integration. But since f(cos θ) is so close to a trigonometric polynomial, one can sample it at equally spaced points and apply the Fast Fourier Transform to the result, quickly obtaining many coefficients at once. This is what Clenshaw–Curtis quadrature does (at least in principle; a practical implementation may streamline these steps).
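Here is a self-contained sketch of that construction (a textbook FFT-based Clenshaw–Curtis, not SciPy's actual implementation):

```python
import numpy as np

def clenshaw_curtis(f, n):
    """Approximate the integral of f over [-1, 1] from n+1 Chebyshev samples.

    n should be even. Sample f at x_j = cos(pi*j/n), take the FFT of the
    even periodic extension to get cosine coefficients a_k, then sum the
    even-numbered ones with weights 2/(1 - k**2).
    """
    theta = np.pi * np.arange(n + 1) / n
    samples = f(np.cos(theta))
    ext = np.concatenate([samples, samples[-2:0:-1]])  # even extension, period 2n
    a = np.fft.fft(ext).real[: n + 1] / n              # cosine coefficients
    k = np.arange(0, n + 1, 2)                         # even harmonics only
    w = 2.0 / (1.0 - k.astype(float) ** 2)
    w[0] = 1.0          # the k = 0 coefficient enters with weight 1
    w[-1] /= 2.0        # the endpoint coefficient gets half weight
    return a[k] @ w

# Integral of exp over [-1, 1]: exact value is e - 1/e.
print(clenshaw_curtis(np.exp, 16))
```

With only 17 samples of exp, the result agrees with e − 1/e to machine precision, because the cosine coefficients of exp(cos θ) decay so quickly.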

The left endpoint rule, and its twin the right endpoint rule, are the ugly ducklings of integration methods. The left endpoint rule approximates the integral of f over [a, b] by the average of the values of f at the left endpoints of n equal subintervals, times the length of the interval:

∫_a^b f(x) dx ≈ ((b − a)/n) ∑_{i=0}^{n−1} f(a + i(b − a)/n)

Here is its graphical representation: the sample points are marked with vertical lines, the length of each line representing the weight given to that point. Every point has the same weight, actually.

Left endpoint rule

Primitive, ineffective, with error O(1/n) for n points used.

Simpson’s rule is more sophisticated, with error O(1/n⁴). It uses weights of three sizes:

Simpson’s rule

Gaussian quadrature uses specially designed (and difficult to compute) sample points and weights: more points toward the edges, larger weights in the middle.

Gaussian quadrature

Let’s compare these quadrature rules on a sample integral, using the same number of points for each. Here is the function:

Nice smooth function

The exact value of the integral is about −0.216.

Simpson’s rule gets within 0.0007 of the exact value. Well done!

Gaussian quadrature gets within 0.000000000000003 of the exact value. Amazing!

And the lame left endpoint rule outputs… a positive number, getting even the sign wrong! This is ridiculous. The error is more than 0.22.

Let’s try another integral, again using the same number of points. The function looks like this:

Another nice function

The integral can be evaluated exactly… sort of. In terms of elliptic integrals. And preferably not by hand:

Maple output, “simplified”

Simpson’s rule is within 0.00001 of the exact value, even better than the first time.

Gaussian quadrature is within 0.00000003, not as spectacular as in the first example.

And the stupid left endpoint rule is… accurate to within 0.00000000000000005. What?

The integral of a smooth periodic function over its period amounts to integration over a circle. When translated to the circle, the elaborate placement of Gaussian sample points is… simply illogical. There is no reason to bunch them up at any particular point: there is nothing special about (-1,0) or any other point of the circle.

Gaussian nodes on a circle

The only natural approach here is the simplest one: equally spaced points, equal weights. The left endpoint rule uses exactly that, and wins.
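The post's own integrand did not survive this copy, but the phenomenon is easy to reproduce with any smooth periodic function. Here is a sketch using exp(sin x) over its period [0, 2π], whose exact integral is 2π·I₀(1) (a modified Bessel function):

```python
import numpy as np
from scipy.special import i0

# Integrate the smooth periodic exp(sin x) over [0, 2*pi] with n points:
# left endpoint rule vs n-point Gauss-Legendre.
a, b, n = 0.0, 2.0 * np.pi, 11
f = lambda x: np.exp(np.sin(x))

x_left = a + (b - a) * np.arange(n) / n        # left endpoints, equal weights
left_rule = (b - a) / n * f(x_left).sum()

nodes, weights = np.polynomial.legendre.leggauss(n)
x_gauss = 0.5 * (b - a) * nodes + 0.5 * (a + b)
gauss = 0.5 * (b - a) * weights @ f(x_gauss)

exact = 2.0 * np.pi * i0(1.0)
print(abs(left_rule - exact), abs(gauss - exact))
```

With the same 11 evaluation points, the humble left endpoint rule beats Gaussian quadrature by several orders of magnitude, exactly as in the story above.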

The midpoint rule, ∫_a^b f(x) dx ≈ (b − a) f((a + b)/2), is approximate in general, but exact for linear functions (polynomials of degree at most one).

Midpoint Rule

With two sample points we can integrate any cubic polynomial exactly. The choice of sample points is not obvious: they are to be placed at distance (b − a)/(2√3) from the midpoint. The integral is then approximated by the sum of the two sampled values multiplied by (b − a)/2: one can say that each sample point has weight (b − a)/2 in this rule.

Two-point quadrature: areas of yellow rectangles add up to the integral

Three sample points are enough for polynomials of degrees up to and including five. This time, the weights attached to the sample points are not equal. The midpoint is used again, this time with weight 4(b − a)/9. Two other sample points lie at distance √(3/5)·(b − a)/2 from the midpoint, and their weights are 5(b − a)/18 each. This contraption exactly integrates polynomials of degrees up to five.

Three-point quadrature: rectangles of unequal width

Compare this with Simpson’s rule, which also uses three sample points but is exact only up to degree three.

The above are examples of Gaussian quadrature: for each positive integer n, one can integrate polynomials of degree up to 2n − 1 by taking n samples at the right places and weighing them appropriately.
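This is easy to check numerically with NumPy's Gauss–Legendre nodes (my own illustration):

```python
import numpy as np

# n-point Gauss-Legendre integrates x**d over [-1, 1] exactly for d <= 2n-1.
n = 3
nodes, weights = np.polynomial.legendre.leggauss(n)
for d in range(2 * n):                          # degrees 0 through 5
    approx = weights @ nodes**d
    exact = 2.0 / (d + 1) if d % 2 == 0 else 0.0
    assert abs(approx - exact) < 1e-13

# The 3-point rule from the text: nodes at 0 and +-sqrt(3/5), weights 8/9 and 5/9.
print(np.sort(nodes), weights)
```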

Let’s move from the real line to the complex plane. If one accepts that the analog of an interval is a disk in the plane, then quadrature becomes very simple: for any disk D and any complex polynomial p,

∫∫_D p(z) dA = π r² p(c)

where c is the center of the disk and r is its radius. One sample point is enough for all degrees! The proof is easy: rewrite p in terms of powers of (z − c) and integrate them in polar coordinates; every power except the zeroth integrates to zero. The same works for any holomorphic function, as long as it is integrable in D.

Disk: a quadrature domain with one node

But maybe the disk is the unique such shape? Not at all: there are other such quadrature domains. A simple family of examples is the Neumann ovals (Neumann as in “boundary condition”). Geometrically, they are ellipses inverted in a concentric circle. Analytically, they are (up to linear transformations) images of the unit disk under a rational map depending on a parameter c.

This image looks much like an ellipse when c is small:

Neumann oval with c=0.3

Then it becomes peanut-shaped:

Neumann oval with c=0.6

For c close to 1 it looks much like the union of two disks (but the boundary is smooth, contrary to what the plot suggests):

Neumann oval with c=0.95

In each of these images, the marked points are the quadrature nodes. Let’s find out what they are and how they work.

Suppose f is holomorphic and integrable in the Neumann oval. By a change of variables back to the unit disk,

Here one factor of the integrand is holomorphic, but the other is anti-holomorphic. We want to know what the anti-holomorphic factor does when integrated against something holomorphic. Power series to the rescue: expand the anti-holomorphic factor in powers of the conjugate variable, multiply by z^n, and integrate over the unit disk in polar coordinates. The result is zero if n is odd, and a constant multiple of the matching coefficient if n is even. So, integration against a power series picks out the even powers only. The process of dropping the odd powers amounts to taking the even part of the series.

The dreaded calculus torture device that works for exactly two integrals: ∫ e^{ax} sin(bx) dx and ∫ e^{ax} cos(bx) dx.

Actually, no. A version of it (with one integration by parts) works for a third integral:

hence, after solving for the integral,

Yes, this is more of a calculus joke. A more serious example comes from Fourier series.

The functions sin(nx), n = 1, 2, 3, …, are orthogonal on [0, π], in the sense that the integral of sin(nx) sin(mx) over [0, π] vanishes whenever n ≠ m.

This is usually proved using a trigonometric identity that converts the product into a sum. But double integration by parts gives a nicer proof, because no obscure identities are needed. No boundary terms appear because the sines vanish at both endpoints:

n² ∫ sin(nx) sin(mx) dx = −∫ (sin nx)″ sin(mx) dx = ∫ (sin nx)′ (sin mx)′ dx = −∫ sin(nx) (sin mx)″ dx = m² ∫ sin(nx) sin(mx) dx

All the integrals here must vanish because n² ≠ m². As a bonus, the middle expression gives the orthogonality of the cosines, ∫ cos(nx) cos(mx) dx = 0, with no additional effort.
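A numerical sanity check of these orthogonality relations (my own illustration), using a plain trapezoid sum on a fine grid:

```python
import numpy as np

# Inner products of sin(nx) on [0, pi]: pi/2 on the diagonal, 0 off it.
x = np.linspace(0.0, np.pi, 20001)
dx = x[1] - x[0]

def inner(u, v):
    w = u * v
    return (w[:-1] + w[1:]).sum() * dx / 2     # trapezoid rule

for n in range(1, 4):
    for m in range(1, 4):
        ip = inner(np.sin(n * x), np.sin(m * x))
        expected = np.pi / 2 if n == m else 0.0
        assert abs(ip - expected) < 1e-6
print("sin(nx) are orthogonal on [0, pi]")
```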

The double integration by parts is also a more conceptual proof, because it gets to the heart of the matter: eigenvectors of a symmetric matrix (operator) that correspond to different eigenvalues are orthogonal. The trigonometric form is incidental; the eigenfunction property is essential. Let’s try this one more time, for a mixed boundary value problem with f(a) = 0 and f′(b) = 0. Suppose that f and g satisfy the boundary conditions, f″ = λf, and g″ = μg with λ ≠ μ. Since fg′ and f′g vanish at both endpoints, we can pass the primes from one factor to the other easily:

A function can be much larger than its derivative. Take a large constant function, for example; or add a small wiggle to it, to make it nonconstant. But if one subtracts the average (mean) f_I from f, the difference is nicely estimated by the derivative:

(1/|I|) ∫_I |f − f_I| dx ≤ ∫_I |f′| dx    (1)

Here f_I is the mean of f on the interval I, namely f_I = (1/|I|) ∫_I f dx. Indeed, what’s the worst that could happen? Something like this:

Deviation from the mean

Here the deviation of f from its mean at any point is at most the integral of |f′|, so the shaded area is at most |I| ∫_I |f′| dx. This is what inequality (1) says.

An appealing feature of (1) is that it is scale-invariant. For example, if we shrink the interval by a factor of λ (changing the variable accordingly), both sides remain the same: the derivative becomes greater by the factor λ, but it is integrated over an interval that is shorter by the same factor. And on the left we have averages upon averages, which do not change under scaling.
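A numeric sanity check of (1) on an arbitrary smooth function (my own illustration):

```python
import numpy as np

# Check (1): the mean of |f - f_I| over I is at most the integral of |f'|.
a, b = 0.0, 2.0
x = np.linspace(a, b, 100001)
dx = x[1] - x[0]
f = np.sin(3 * x) + x**2                      # arbitrary smooth test function

mean = f.mean()                               # ~ the average f_I
lhs = np.abs(f - mean).mean()                 # mean deviation from the average
rhs = np.abs(np.gradient(f, dx)).sum() * dx   # ~ integral of |f'|
print(lhs, rhs)
assert lhs <= rhs
```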

What happens in higher dimensions? Let’s stick to two dimensions and consider a smooth function f(x, y). Instead of an interval we now have a square, denoted Q. It makes sense to denote squares by Q, because it’s natural to call a square a cube, and “Q” is the first letter of “cube”. Oh wait, it isn’t. Moving on…

The quantity |I| was the length of the interval of integration. Now we will use the area of Q, denoted |Q|. And f_Q is now the mean value of f on Q. At first glance one might conjecture the following version of (1):

(1/|Q|) ∫∫_Q |f − f_Q| dA ≤ ∫∫_Q |∇f| dA    (2)

But this can’t be true, because the two sides scale differently. The left side of (2) is scale-invariant as before. The right side is not: if we shrink the square by a factor of λ, the gradient goes up by a factor of λ, but the area goes down by λ². This suggests that the correct inequality should be

(1/|Q|) ∫∫_Q |f − f_Q| dA ≤ C (∫∫_Q |∇f|² dA)^{1/2}    (3)

We need the square root so that the right side of (3) scales correctly with f: to the first power, just like the left side.

And here is the proof. Let g(y) denote f averaged over a horizontal segment of Q, that is, g(y) = (1/h) ∫ f(x, y) dx, where h is the sidelength of Q. Applying (1) along every horizontal segment in Q and averaging over y, we obtain

(1/|Q|) ∫∫_Q |f − g| dA ≤ (1/h) ∫∫_Q |∂f/∂x| dA    (4)

Now work with g itself, using (1) along a vertical segment:

(1/h) ∫ |g − f_Q| dy ≤ ∫ |g′(y)| dy    (5)

Of course, the average of |g − f_Q| over Q is the same as its average over a vertical segment, since g does not depend on x. The derivative on the right of (5) can be estimated: the derivative of an average does not exceed the average of the absolute value of the derivative, so ∫ |g′(y)| dy ≤ (1/h) ∫∫_Q |∂f/∂y| dA. To keep the estimates clean, simply bound both partial derivatives by |∇f|. From (4) and (5) taken together it follows that

(1/|Q|) ∫∫_Q |f − f_Q| dA ≤ (2/h) ∫∫_Q |∇f| dA    (6)

This is an interesting result (a form of the Poincaré inequality), but in the present form it’s not scale-invariant. Remember that we expect the square of the gradient on the right. Cauchy–Schwarz to the rescue:

∫∫_Q |∇f| dA ≤ (∫∫_Q 1 dA)^{1/2} (∫∫_Q |∇f|² dA)^{1/2}

The first factor on the right is simply |Q|^{1/2} = h, which cancels the 1/h in (6), and we are done:

(1/|Q|) ∫∫_Q |f − f_Q| dA ≤ 2 (∫∫_Q |∇f|² dA)^{1/2}    (7)

In higher dimensions we would of course have ∫ |∇f|ⁿ instead of ∫ |∇f|². This is one of many reasons why analysis in two dimensions is special: Lᵖ is a Hilbert space only when p = 2.

The left side of (7) is the mean oscillation of f on the square Q. The integrability of |∇f|ⁿ in n dimensions thus ensures that f is a function of bounded mean oscillation, known as BMO. Actually, it is even in the smaller space VMO, because the right side of (7) tends to zero as the square shrinks. But f need not be continuous or even bounded: for f(x) = log log(1/|x|) the integral of |∇f|² converges in a neighborhood of the origin (just barely, thanks to the square of the logarithm in the denominator). This is unlike the one-dimensional situation, where the integrability of (f′)² guarantees that the function is bounded.