Wolfram Blog » Oleksandr Pavlyk
News, views, and ideas from the front lines at Wolfram Research.

New in the Wolfram Language: RandomPoint
Published Fri, 13 Nov 2015 by Oleksandr Pavlyk

Picking random points on the surface of a sphere so that the points are uniformly distributed is not as straightforward as you might think. Naively picking random spherical coordinates φ and θ will not give a uniform distribution of points. The problem is important enough to warrant a dedicated article in encyclopedias such as Wolfram MathWorld (see Sphere Point Picking). Uniform sampling from Sphere[] is now available in the Wolfram Language with the RandomPoint function:
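The Mathematica input itself is not reproduced in this feed. As a language-neutral sketch of why this works (all names below are ours, not from the original post), normalizing a standard 3D Gaussian vector gives a uniform point on the sphere, because the Gaussian density is rotation invariant; naive uniform sampling of the angles instead clusters points near the poles:

```python
import math
import random

def random_point_on_sphere(rng=random):
    """Uniform point on the unit sphere: normalize a standard 3D Gaussian.

    The Gaussian density is rotation-invariant, so the normalized vector
    is uniformly distributed, unlike naive uniform sampling of the
    spherical angles, which clusters points near the poles."""
    while True:
        x, y, z = (rng.gauss(0, 1) for _ in range(3))
        r = math.sqrt(x * x + y * y + z * z)
        if r > 1e-12:  # guard against the (practically impossible) zero vector
            return (x / r, y / r, z / r)

pts = [random_point_on_sphere() for _ in range(1000)]
```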

In fact, RandomPoint can be used to uniformly sample from any bounded geometric region, in any dimension. In 2D:

For example, you can use RandomPoint to mark uniformly distributed locations on a map of Africa:

Random points can be used to approximate geometric quantities. For instance, to estimate the maximum distance between two points in a regular pentagon inscribed in the unit circle, find the maximum distance over 1,000 pairs of random points on the boundary of the pentagon:

Or estimate the area of the symmetric difference (or any other Boolean combination) of regions, using RegionSymmetricDifference:

Visualize the point cloud:

Build a Nearest function to quickly test whether a point is within a given distance r of the point cloud:

Use the Monte Carlo method to estimate the area of the underlying region ℛ* from the set of sample points pts. This is done by sampling n points from the bounding rectangular region and counting the fraction of points that lie within distance r of the point cloud.
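The original code is not carried in this feed; the following Python sketch illustrates the same Monte Carlo scheme on a stand-in region (the unit disk, represented only by a cloud of sample points; all names are ours):

```python
import math
import random

random.seed(1)

# Stand-in for the blog's region R*: the unit disk, known to us only
# through a cloud of sample points (the analogue of pts).
cloud = []
while len(cloud) < 2000:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1:
        cloud.append((x, y))

def estimate_area(cloud, box=(-1, 1, -1, 1), r=0.05, n=5000):
    """Sample n points in the bounding box; the fraction landing within
    distance r of the cloud, times the box area, estimates the region's
    area (Nearest in the blog plays the role of this brute-force test)."""
    x0, x1, y0, y1 = box
    hits = 0
    for _ in range(n):
        px, py = random.uniform(x0, x1), random.uniform(y0, y1)
        if any((px - cx) ** 2 + (py - cy) ** 2 <= r * r for cx, cy in cloud):
            hits += 1
    return (x1 - x0) * (y1 - y0) * hits / n

area = estimate_area(cloud)
print(area)  # roughly pi, the area of the unit disk
```

The estimate is slightly biased upward, since testing against a radius-r neighborhood effectively dilates the region by r.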

Accumulate the estimate statistics:

Estimate the distribution density:

Compare the estimated value with the exact numerical value:

I will conclude with a one-liner, a RandomPoint-based, Styrofoam-style visualization of the 8₃ knot:

RandomPoint is part of both the geometric computation and the probability and statistics capabilities of the Wolfram Language. RandomPoint was first introduced in Version 10.2 of the Wolfram Language and has been extended to cover new methods with the release of 10.3.

Jacob Bernoulli's Legacy in Mathematica
Published Thu, 15 Jan 2015 by Oleksandr Pavlyk

Bernoulli's treatise Ars Conjectandi (i.e. The Art of Conjecturing) was published posthumously in 1713, eight years after his death, and was written in Latin, science's lingua franca of the time. It is considered a seminal work of mathematical probability. Its importance is witnessed, in part, by its translations into French by G. Le Roy in 1801 and, more recently, into English by E. D. Sylla in 2005.

The Art of Conjecturing comprises four parts. The first part reproduces Christiaan Huygens' De Ratiociniis in Ludo Aleae (On Reasoning in Games of Chance), with extensive commentary from Bernoulli, including detailed solutions of the five problems Huygens posed at the end of his work with answers but without derivations. In the first part, Bernoulli also derives the probability that at least m successes will occur in n independent trials with success probability p:
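The formula image is missing from this feed; the quantity is the binomial upper-tail sum, which a short Python sketch (function name ours) makes concrete:

```python
from fractions import Fraction
from math import comb

def prob_at_least(m, n, p):
    """P(at least m successes in n independent trials with success
    probability p): sum of C(n, k) p^k (1-p)^(n-k) for k = m .. n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m, n + 1))

# e.g. at least 5 successes in 10 fair trials:
print(prob_at_least(5, 10, Fraction(1, 2)))  # 319/512
```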

The second part, “The Doctrine of Permutations and Combinations,” is devoted to combinatorics and to the study of figurate numbers, i.e. numbers that can be represented by a regular geometrical arrangement of equally spaced points:

It is here that Bernoulli introduces the Bernoulli numbers. He starts by noting an identity among binomial coefficients.


Bernoulli knew that for a fixed m, the binomial coefficient is a polynomial in n. This identity allows him to solve for the sums of powers. He gives a table of results for 0 ≤ m ≤ 10.

To reproduce Bernoulli’s table, define a function to construct equations for the sum of powers:

Solving for the sum of powers:

Bernoulli writes, “[W]hoever has examined attentively the law of the progression herein can continue the Table further without these digressions for computations” by making the following educated guess:

He notes that coefficients Br+1 do not depend on n, and can be computed recursively by substituting n==1 into the above equation.
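Bernoulli's equations themselves are omitted from this feed. Substituting n == 1 as described is equivalent to the classical recurrence for the Bernoulli numbers, sketched here in Python (modern convention B_1 = -1/2; Bernoulli's own table has B_1 = +1/2):

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def bernoulli(m):
    """Bernoulli number B_m via the classical recurrence
    sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1, with B_0 = 1."""
    if m == 0:
        return Fraction(1)
    return Fraction(-1, m + 1) * sum(
        comb(m + 1, j) * bernoulli(j) for j in range(m)
    )

print([str(bernoulli(m)) for m in range(11)])
# B_0..B_10: 1, -1/2, 1/6, 0, -1/30, 0, 1/42, 0, -1/30, 0, 5/66
```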

These coefficients are the celebrated Bernoulli numbers, which have found their way into many areas of mathematics [e.g. see mathoverflow.net].

In the second part of his book, Bernoulli counts the number of permutations, the number of permutations of sets with repetitions, the number of ways of choosing objects from a set, etc., which he later applies to compute probabilities as the ratio of the number of configurations of interest to the total number of configurations.

In part three, Bernoulli applies results from the first two parts to solve 24 problems related to games of chance. A recurring theme in these problems is a sequence of independent 0-or-1 outcomes, which now bears the name Bernoulli trial, or Bernoulli process. I thought the anniversary of Jacob Bernoulli's birth an apt occasion to explore his problems with Mathematica.

For example, problem 9 asks for the expected payout in a three-player game. Players alternately draw cards without replacement from a pack of twenty cards, of which ten are face cards. When the cards are exhausted, the winnings are divided among all those who hold the highest number of face cards.

With c1, c2, and c3 denoting the number of face cards each player has, the payout of the first player is:

After the pack of twenty has been distributed, the first and second players each hold seven cards, while the third holds only six. The tally vector of face cards received by each player follows MultivariateHypergeometricDistribution:
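The accompanying Mathematica code is not included in the feed. Under the assumption that each player stakes one unit (so the pot is 3) and ties split the pot, the exact expectation can be sketched in Python via the multivariate hypergeometric probabilities:

```python
from fractions import Fraction
from math import comb

# Hands of 7, 7, 6 cards from a 20-card pack containing 10 face cards.
# The face-card tallies (c1, c2, c3) are multivariate hypergeometric:
# P = C(7, c1) C(7, c2) C(6, c3) / C(20, 10).
TOTAL = comb(20, 10)  # ways to place the 10 face cards among 20 slots

def expected_payouts():
    exp = [Fraction(0)] * 3
    for c1 in range(8):
        for c2 in range(8):
            c3 = 10 - c1 - c2
            if not 0 <= c3 <= 6:
                continue
            p = Fraction(comb(7, c1) * comb(7, c2) * comb(6, c3), TOTAL)
            counts = (c1, c2, c3)
            best = max(counts)
            winners = counts.count(best)
            for i, c in enumerate(counts):
                if c == best:
                    exp[i] += p * Fraction(3, winners)  # winners split the pot
    return exp

payouts = expected_payouts()
print(payouts)
```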

This and other problems are stated and solved in the accompanying Mathematica notebook.

The concluding part four of Ars Conjectandi discusses uses of probability in civil, moral, and economic matters. Here Bernoulli argues that the probability reflects our incomplete knowledge of the state of the world, and unlike in a game of chance, where probabilities can be determined by finding the proportion that configurations of interest take in the whole set of possible configurations, the probabilities here cannot be a priori established. Bernoulli argues that these unknown probabilities can be inferred from past outcomes.

He proves the weak law of large numbers, asserting that the observed frequency of successes in n independent trials with success probability p converges to p as the number of trials grows. Thus you can estimate p arbitrarily accurately by running a sufficient number of trials. Specifically, for any ε > 0 and δ > 0, there exists a large enough sample size n such that:
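The inequality itself is not carried in this feed; with S_n denoting the number of successes in n trials, it can be written as:

```latex
P\left( \left| \frac{S_n}{n} - p \right| < \varepsilon \right) > 1 - \delta
\qquad \text{for all sufficiently large } n .
```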

A "Trivial" Probability Problem
Published Mon, 03 Jun 2013 by Oleksandr Pavlyk

I am a junkie for a good math problem. Over the weekend, I encountered such a problem on a favorite subject of mine: probability. It's the last problem from the article "A Mathematical Trivium" by V. I. Arnol'd, Russian Mathematical Surveys 46(1), 1991, 271-278.

It’s short enough to reproduce in its entirety: “Find the mathematical expectation of the area of the projection of a cube with edge of length 1 onto a plane with an isotropically distributed random direction of projection.” In other words, what is the average area of a cube’s shadow over all possible orientations?

This blog post explores the use of Mathematica to understand and ultimately solve the problem. It recreates how I approached the problem.

Projection of a square on a line

Before tackling the case of the cuboid, I started with a unit square, randomly rotated around its center of mass, with the intent to find the average length of its projection on a horizontal axis.

For the sake of simplicity, I placed the center of the mass of the square at the origin.

I found the left and right boundaries of the projection as the smallest and largest x coordinates of the vertices of the square, rotated around the origin through an angle of α degrees:

I then combined these functions within the Manipulate to be able to dynamically change the angle of rotation:


Here’s the plot of the length of the projection as the function of the rotation angle:

Assuming the rotation angle α is uniformly distributed in the interval 0 ≤ α < 360, I readily found the expected length of the projection, that is, the average length of the square’s shadow:
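The expected length works out to 4/π ≈ 1.273, the square's perimeter divided by π. The original Mathematica input is not reproduced here; a quick numerical check in Python, using the explicit shadow length |cos α| + |sin α|:

```python
import math

def shadow_length(alpha):
    """Length of the projection of a unit square (rotated by alpha about
    its center) onto the x axis: |cos a| + |sin a|."""
    return abs(math.cos(alpha)) + abs(math.sin(alpha))

# Average over a uniformly distributed rotation angle (midpoint rule)
n = 100000
avg = sum(shadow_length(2 * math.pi * (k + 0.5) / n) for k in range(n)) / n
print(avg, 4 / math.pi)  # both about 1.2732
```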

Change the frame of reference of the rectangle

That was easy enough, but in order to take the victory to 3D, I needed to change the point of view. Instead of rotating the square, I rotated the line we project on.

Perspective of the wire frame:


I found this change of perspective illuminating, as it made me think that the length of the projection is the sum of the lengths of the projections of the two sides of the square visible from the plane.

The length of the projection of a segment of length ℓ with unit normal vector n1 onto a line with unit normal vector n2 equals ℓ Dot[n1, n2].

The projection of each side of the square only contributes if Dot[n1,n2] is positive (i.e. the side is visible); otherwise it is hidden behind other sides. The length of the shadow is thus the sum of the contributions of the east, west, top, and bottom sides of the square:

Thus the length of the projection is the sum of the absolute values of coordinate components of the normal vector nvec. I implemented this way of computing the length of the shadow in a function:

And, of course, it agrees with the earlier way of computing the projection length:

Naturally, the expectation is the same:

The case of the cube

Guided by the insight gained by considering the square, I adopted the reference frame of the cuboid, whose center of mass is situated at the origin. The cuboid casts its shadow on a plane whose orientation is parametrized by its perpendicular (i.e. normal) vector nvec.

I started by straightforwardly building projections of each face of the cuboid and drawing them together. First I defined this helper function to project a vector xvec onto a plane with normal nvec:

The following function gives vertex coordinates of a face in the (i,j) plane with the other coordinate being z0. Here i and j range over {1,2,3}, which designates the x-, y-, and z- directions, and z0 ranges over {-1/2,1/2}:

I then defined a function to project each face onto a plane with normal vector nvec and produce the corresponding 3D polygon:

The area of a polygon that happens to be a quadrangle projected onto a plane with normal nvec is computed as the sum of the areas of two triangles that the quadrangle is split into by a diagonal:

With these utilities in place, I was ready to visualize the projection of the cuboid with edge of length one, whose center of mass is situated at the origin and whose sides are aligned with the coordinate axis. I parametrized the unit normal vector to the projection plane using spherical coordinates θ and φ: {sin(θ) sin(φ),sin(θ) cos(φ),cos(θ)}.


Since at any one time only three of the cuboid's faces are visible, and since the area contributed by the invisible faces is the same (just imagine a parallel plane on the other side of the cuboid), I sum over all six faces and divide the result by two. For a particular orientation nvec == {-1, 2, 3}/√14 of the plane, the area of the projection is therefore computed as follows:

Applying the insight learned from solving the 2D case, I checked if the area of the projection was again the sum of the absolute values of dot products of the normal vector nvec with axis vectors:

Bingo! This makes sense, I thought. Each term corresponds to the area of one of at most three visible faces. Indeed, consider a patch of area S on a plane with unit normal vector n1. The area of the projection of the patch onto another plane with unit normal vector n2 equals S Abs[Dot[n1,n2]].

Here is the spherical plot of the area of the shadow cast by the cuboid onto a plane with unit normal vector {sin(θ) sin(φ),sin(θ) cos(φ),cos(θ)} as a function of the spherical angles θ and φ:

The minimum of the area function corresponds to the area of a single face, which equals 1, and the maximum of √3 corresponds to the projection onto the plane whose normal vector aligns with the cuboid's diagonal:

I was then almost ready to tackle the expected value of the area. For the normal unit vector {nx,ny,nz}, the expected projection area is:

The surface area of an infinitesimal patch (θ,θ+ ⅆθ)×(φ,φ+ⅆφ) of the unit sphere is well known to be sin(θ) ⅆθ ⅆφ. Dividing the infinitesimal area by the total surface area of the unit sphere, I obtain the infinitesimal probability measure, corresponding to the uniform distribution on the unit sphere: ⅆ S(θ,φ)==1/(4 π)sin(θ) ⅆθ ⅆφ.

And, finally, the expected projection area equals:

Of course, I could simplify the computation, using the symmetry argument:

It is a well-known fact (also see this relevant question on math.stackexchange.com) that each individual component of a random vector, uniformly distributed on the unit sphere, follows a uniform distribution on the interval (-1,1). With this insight, the answer can be worked out in one’s head, explaining why this problem was deemed by Arnol’d a “trivium”:
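Indeed, each component of the normal is uniform on (-1,1) with E|n_i| = 1/2, so the expected shadow area |n_x| + |n_y| + |n_z| averages to 3/2. As a consistency check on this "in one's head" answer, here is a Monte Carlo sketch in Python (names ours):

```python
import math
import random

random.seed(2)

def shadow_area(n):
    """Area of the unit cube's shadow on a plane with unit normal n:
    the sum of |n . e_i| over the three axis directions."""
    return abs(n[0]) + abs(n[1]) + abs(n[2])

def random_unit_vector():
    """Uniform direction: normalized 3D Gaussian vector."""
    while True:
        v = [random.gauss(0, 1) for _ in range(3)]
        r = math.sqrt(sum(x * x for x in v))
        if r > 1e-12:
            return [x / r for x in v]

n_samples = 100000
mc = sum(shadow_area(random_unit_vector()) for _ in range(n_samples)) / n_samples
print(mc)  # about 1.5
```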

Going to higher dimensions

The insight I gained allowed me to easily construct the answer for the case of the d-dimensional hypercube, projected onto a randomly oriented hyperplane:

I simply needed to know the distribution of a component of the d-dimensional random unit vector.

The computations are simple, and are based on hyperspherical coordinates (e.g. see Wikipedia: n-sphere). The infinitesimal hyperspherical area factors as (Sin[θ1]^(d-2) ⅆθ1)(Sin[θ2]^(d-3) ⅆθ2)⋯(Sin[θ(d-2)] ⅆθ(d-2)) ⅆθ(d-1). Since n_d == Cos[θ1], I needed to find the normalization constant for the quasi-density Sin[θ1]^(d-2):

Therefore the expected area of the projection of d-hypercube is:
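The expectation has a closed form, d · E|n_1| = d · Γ(d/2) / (√π · Γ((d+1)/2)); this expression is our own re-derivation from the component density above, not copied from the post. In Python:

```python
import math

def expected_shadow_area(d):
    """Expected projection area of the unit d-cube onto a random
    hyperplane: d * E|n_1| for a uniform unit vector n, i.e.
    d * Gamma(d/2) / (sqrt(pi) * Gamma((d+1)/2)).
    (math.gamma overflows for d beyond a few hundred.)"""
    return d * math.gamma(d / 2) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))

for d in (2, 3, 4, 10, 100):
    print(d, expected_shadow_area(d), math.sqrt(2 * d / math.pi))
```

For d = 2 this recovers the square's 4/π, for d = 3 the cube's 3/2, and for large d it approaches √(2d/π).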

Alternatively, I could use the closed-form result for the probability density function (PDF) of the distribution from here:

I concluded this rewarding exploration by reproducing the results obtained earlier and tabulating results for higher dimensions:

The average area of the shadow grows boundlessly with space dimension, as the number of faces that contribute to its area also increases:

In high-dimensional space, the area scales as the square root of the dimension d:

This concluded my exploration of Arnol'd's trivium problem with Mathematica. The use of Mathematica led to many "Aha!" moments, and enabled me to readily answer various "What if…" questions. It is my hope that I was able to convey the excitement of a discovery process greatly accelerated by the use of Mathematica, and to inspire you to begin explorations of your own.

Centennial of Markov Chains
Published Mon, 04 Feb 2013 by Oleksandr Pavlyk

On January 23, 1913 of the Julian calendar, Andrey A. Markov presented to the Royal Academy of Sciences in St. Petersburg his analysis of Pushkin's Eugene Onegin. He found that the sequence of consonants and vowels in the text could be well described as a random sequence, where the likely category of a letter depended only on the categories of the preceding one or two letters.

At the time, the Russian Empire was using the Julian calendar, so the 100th anniversary of the celebrated presentation falls on February 5, 2013 in the Gregorian calendar now in use.

To perform his analysis, Markov invented what are now known as “Markov chains,” which can be represented as probabilistic state diagrams where the transitions between states are labeled with the probabilities of their occurrences.

Alice in Wonderland: A Case Study

Here we repeat, on Alice's Adventures in Wonderland by Lewis Carroll, the analysis that Markov applied to Pushkin's text. To this end, let's define a function that computes the frequencies of elements of a list, returning the results as rules.

First, extract words from the text, making them lowercase:

Split the text into a sequence of letters:

Then classify them as vowels or consonants:

And compute the frequencies of vowels and consonants in the text:

Therefore, if we treat the text as a random sequence of either a vowel or a consonant, the probability of a vowel turning up is p = 0.3866.

Following Markov, let’s look at the frequencies of pairs of consecutive symbols:

Then we find the probabilities of a vowel or a consonant given by what precedes it:
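The text-processing code is not reproduced in the feed; the following Python sketch computes the same conditional frequencies on a short sample string (in practice the full text would be used; names are ours):

```python
from collections import Counter

def transition_probs(text):
    """Empirical P(next class | current class) over the text reduced to
    'v' (vowel) / 'c' (consonant); non-letters are ignored."""
    vowels = set("aeiou")
    seq = ["v" if ch in vowels else "c" for ch in text.lower() if ch.isalpha()]
    pairs = Counter(zip(seq, seq[1:]))
    probs = {}
    for a in "vc":
        total = sum(pairs[(a, b)] for b in "vc")
        for b in "vc":
            probs[(a, b)] = pairs[(a, b)] / total if total else 0.0
    return probs

sample = "alice was beginning to get very tired of sitting by her sister"
probs = transition_probs(sample)
print(probs)
```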

In his paper, Markov observed that the sequence of vowels and consonants agreed much better with the model where the probability of a vowel depended on the preceding characters than with the model where it did not. Moreover, he found that the model where the probability depended on the two preceding characters agreed yet better.

Empowered with Mathematica, we can continue this investigation. Markov found that pairs of consecutive vowel-consonants carry more information than the sequence of vowel-consonants itself. One measure of the information stored in the data is the Entropy. The greater the entropy, the more information the data contains. Let’s compute it for the sequences of k-tuples of vowel-consonants (known as k-grams) for different values of k.

The plot confirms Markov's findings. Curiously, it also shows that 25-grams carry little more information than 20-grams.

The probabilistic 2-gram model describing the sequence is now known as a Markov chain process.

The Markov chain process describes the evolution of a probability distribution πn on a state space at step n. Assuming that the state space is finite (in this example it consists of two elements {“vowel”,”consonant”}), the probability distribution can be thought of as a vector, and the conditional transition probabilities can be arranged in a matrix P:

In the case at hand, the transition matrix is:

Assuming the initial state is a vowel (encoded as 1), the 2-gram model is defined in Mathematica as follows:

With it, we can ask about the distribution of the distance between vowels in the text and compare the result with the data:

Thus the Markov model accurately predicts that the average distance between vowels in the text is about 2.586 characters. The distributions of distances predicted by the model also agree well with those actually found in the text:
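The predicted mean gap has a neat closed form: in an ergodic chain the mean recurrence time of a state is the reciprocal of its stationary probability, and here the stationary vowel probability is simply the vowel frequency p ≈ 0.3866 computed earlier:

```python
p_vowel = 0.3866        # vowel frequency found above
mean_gap = 1 / p_vowel  # mean recurrence time of the vowel state
print(round(mean_gap, 3))  # about 2.587
```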

Text Generation

The text of Alice in Wonderland only uses 1,484 unique words:

Repeating the same analysis with words, rather than vowel-consonants, we find that 4-grams carry essential information:

With the 4-gram model, the frequency of {w1, w2, w3, w4} encodes the probability of w4 given the three most recent words w1, w2, w3.

Now we define a function that assembles the sequence of k-grams produced by walking the graph into a text.

We can now simulate the resulting Markov chain to generate a random 100-word text:
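The simulation code itself is absent from the feed; here is a generic Python sketch of k-gram text generation (the tiny corpus and all names are ours):

```python
import random
from collections import defaultdict

random.seed(3)

def markov_text(words, k=3, length=30):
    """Generate text from a (k+1)-gram model: given the k most recent
    words, pick the next word with its empirical conditional frequency."""
    successors = defaultdict(list)
    for i in range(len(words) - k):
        successors[tuple(words[i:i + k])].append(words[i + k])
    state = tuple(words[:k])
    out = list(state)
    for _ in range(length - k):
        nxt = successors.get(state)
        if not nxt:  # dead end: restart from the beginning of the corpus
            state = tuple(words[:k])
            nxt = successors[state]
        out.append(random.choice(nxt))
        state = tuple(out[-k:])
    return " ".join(out)

corpus = ("alice was beginning to get very tired of sitting by her sister "
          "on the bank and of having nothing to do").split()
print(markov_text(corpus, k=2, length=15))
```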


Linguistic Analysis

The graph associated with the 4-grams has visibly long linear subgraphs, seen as long threads of vertexes. Words occurring along these threads will always appear in combination in the randomly generated text. It is interesting to examine the lengths of such unchanged sequences.

These long subgraphs can be singled out by removing from the original graph all the vertexes that share more than two edges.

We use WeaklyConnectedComponents to extract vertexes of these lines. After sorting them in the order in which they appeared in the text, we recreate the longest six such sequences. Each is a passage from the text (minus punctuation) in which every four-word sequence occurs uniquely in the text:

As you can see, the six sequences are actually quite long. In fact, they are exceptionally long. A median length of such fragments is eight words. One could continue this analysis on other pieces of literature and compare the results, but we’ll leave that for another time.

Mathematica and The American Mathematical Monthly's "Problems and Solutions" Section
Published Wed, 28 Oct 2009 by Oleksandr Pavlyk

The "Problems and Solutions" section of The American Mathematical Monthly has always been a source of interesting problems to keep me entertained. Their solutions often require ingenuity. The problems in the October issue were no exception.

I always analyze and explore these problems in Mathematica. Being a kernel developer, I check whether Mathematica can indeed find a solution. The latest issue had challenging problems, and it was particularly gratifying to observe that Mathematica could solve them right out of the box. So here are my solutions to three of the problems, paraphrased:

Problem 11457, by M. L. Glasser:

Solution in Mathematica: Relaxing assumptions on a and b to avoid degenerate cases:

Problem 11456, by R. Mortini:

Solution in Mathematica:

Problem 11449, by M. Bataille:

Solution in Mathematica: As the expression is left unchanged by a rescaling of the variables, and as c is positive, let us set c==1.

Hence the minimal value of 9/8 is attained for a==b==c.

Because the expression is left invariant by interchanging any of the variables, the maximum value of 2 is attained for a==b==c/2 or a==c==b/2 or b==c==a/2.

Mathematica can help solve practically any computation problem. Its thousands of functions are detailed in the online Documentation Center, with many tutorials and examples.
Today We Broke the Bernoulli Record: From the Analytical Engine to Mathematica
Published Tue, 29 Apr 2008 by Oleksandr Pavlyk

In Mathematica, a core principle is that everything should be scalable. So in my job of creating algorithms for Mathematica, I have to make sure that everything I produce is scalable.

Last week I decided to test this on one particular example. The problem I chose happens to be a classic. In fact, the very first nontrivial computer program ever written—by Ada Lovelace in 1842—was solving the same problem.

Bernoulli numbers have a long history, dating back at least to Jakob Bernoulli’s 1713 Ars Conjectandi.
Bernoulli's specific problem was to find formulas for sums of powers like 1^n + 2^n + ⋯ + m^n.
Before Bernoulli, people had just made tables of results for specific n and m. But in a Mathematica-like way, Bernoulli pointed out that there was an algorithm that could automate this.
For any given n, the answer is a polynomial in m, and the coefficients are constants that Bernoulli showed could be computed by a simple recurrence formula.
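Bernoulli's algorithm can be sketched in Python (the code and names are ours): compute the Bernoulli numbers by the simple recurrence, then assemble the sum-of-powers polynomial, evaluated here at integer m as a check:

```python
from fractions import Fraction
from math import comb

def bernoulli_list(n):
    """B_0..B_n by the classical recurrence (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def sum_of_powers(m, n):
    """1^n + 2^n + ... + m^n as a polynomial in m whose coefficients
    are built from the Bernoulli numbers (Faulhaber's formula)."""
    B = bernoulli_list(n)
    return sum(
        Fraction((-1) ** j * comb(n + 1, j), n + 1) * B[j] * m ** (n + 1 - j)
        for j in range(n + 1)
    )

print(sum_of_powers(10, 3))  # 3025 == (1 + 2 + ... + 10)^2
```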

It could have been that Bernoulli numbers would be useful only for solving this particular problem. But in fact over the past 300 years they have found their way into a remarkable range of areas of mathematics. They appear in the formula for the Riemann zeta function at even integers. They are in the coefficients in the series expansion for tan(x). They appear in the Euler-Maclaurin formula for approximating integrals by sums and in the Stirling series for the gamma function. They even relate to Fermat’s last theorem—which was first proved by Ernst Kummer for “regular primes” characterized by Bernoulli numbers.

In 1713 Bernoulli was rather proud of being able to compute the first ten Bernoulli numbers in “a quarter of an hour”.

Of course, in Mathematica, it’s now instantaneous:

But just how far can we go—295 years after Bernoulli, and now with Mathematica?

I decided to try scaling up Bernoulli’s computation—by a factor of a million—and computing the 10 millionth Bernoulli number.

Until recently, doing this would have been impractical, even in Mathematica. Because Mathematica used essentially the same classic recurrence relation for computing Bernoulli numbers that Jakob Bernoulli himself used, and that Ada Lovelace described programming on Charles Babbage’s Analytical Engine.

This algorithm has the feature (already recognized by Lovelace) that it takes about n^2 steps to compute the nth Bernoulli number. So even if one could compute the 10th Bernoulli number in a millisecond, it’d take several thousand years to compute the 10 millionth Bernoulli number.

But a few years ago I programmed a quite different algorithm into Mathematica. Instead of directly computing the Bernoulli numbers using a recurrence relation, I instead used a trick recently suggested by Bernd Kellner: computing Bernoulli numbers by computing the Riemann zeta function.

It’s the integrated nature of Mathematica that makes things like this practical. Without Mathematica, one has to use the simplest building blocks to make efficient algorithms. But with Mathematica, one can take for granted access to efficient very-high-level operations—like computing Riemann zeta functions.

To get the numerator, one then just has to use the relation to the zeta function. The right-hand side is, then, evaluated approximately and multiplied by the already-known denominator of the Bernoulli number. Provided the approximation is done with enough significant digits, the numerator of the Bernoulli number is a mere integer part of the product.
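In Python, and at low precision (so only for small n; the real computation needs pi and zeta to tens of millions of digits), the scheme looks like this: the von Staudt-Clausen theorem gives the exact denominator, and a floating-point zeta value then pins down the integer numerator. The code is our sketch, not the Mathematica implementation:

```python
from fractions import Fraction
from math import factorial, pi, prod

def bernoulli_via_zeta(n):
    """|B_n| for even n >= 2: exact denominator from von Staudt-Clausen
    (product of primes p with (p-1) | n), numerator recovered by rounding
    2 n! zeta(n) / (2 pi)^n times the denominator."""
    assert n >= 2 and n % 2 == 0
    denom = prod(p for p in range(2, n + 2)
                 if n % (p - 1) == 0 and all(p % q for q in range(2, p)))
    zeta = sum(1 / k ** n for k in range(1, 100))  # crude partial sum
    approx = 2 * factorial(n) * zeta / (2 * pi) ** n  # approximates |B_n|
    return Fraction(round(approx * denom), denom)

print(bernoulli_via_zeta(10))  # 5/66
```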

But there’s still a problem: to get all the digits in a Bernoulli number, one has to compute powers of pi to extreme precision.

Of course, Mathematica can do that—to billions of digits if necessary.

A week ago, I took our latest development version of Mathematica, and I typed BernoulliB[10^7].

But the numerator is not so simple; in fact it’s 57,675,292 digits long. It took less than 1 gigabyte of memory to compute, and required computing pi to about 66 million digits.

The numerator is negative, it begins with -47845869850733698144899338333210878162030638218660 and ends with 57164275665935124168181176013725629647185402960697.

Its digits seem almost random. I counted occurrences of all possible k-digit subsequences in the numerator of the 10 millionth Bernoulli number for k=1, 2, 3, and 4. The ratio of the standard deviation to the mean stays low, though grows with k.

So how can we tell if it’s correct?

Bernoulli numbers have lots of interesting properties. One that’s particularly revealing was discovered by Kummer in 1843, and goes by the name of p-adic continuity. In particular, for two integers n and m, and a prime number p such that neither n nor m is divisible by p-1, and a natural number r, such that:

the p-adic continuity asserts that:

This property is completely independent of the way we computed the Bernoulli number.

So let’s start checking. Obviously n=10^7. I checked the p-adic continuity for p=43, r=3, m=59776; p=59, r=2, m=916; p=7919, r=1, m=7484; and p=27449, r=1, m=8928. Every one of these congruences is 0, as it should be.
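Such checks can be replicated at small scale in Python with exact rationals (this is the r = 1 form of Kummer's congruence; the particular p, m, n below are our own small examples, not those from the 10-millionth-number verification):

```python
from fractions import Fraction
from math import comb

def bernoulli(n):
    """B_n by the classical recurrence (convention B_1 = -1/2)."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B[n]

def kummer_ok(p, m, n):
    """Kummer congruence, r = 1 case: if m = n (mod p-1) and neither is
    divisible by p-1, then B_m/m = B_n/n (mod p)."""
    d = bernoulli(m) / m - bernoulli(n) / n
    return d.numerator % p == 0 and d.denominator % p != 0

print(kummer_ok(7, 2, 8))  # True: B_2/2 - B_8/8 = 7/80, divisible by 7
```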

We’ve successfully found the 10 millionth Bernoulli number.

We’ve done what Ada Lovelace believed should be possible: we’ve mechanized the computation of Bernoulli numbers—so well, in fact, that 295 years after Jakob Bernoulli, it’s taken us only 500 times longer to compute a million-times-as-large Bernoulli number.

And all with a single line of Mathematica input.

Mathematica and the Fundamental Theorem of Calculus
Published Sat, 19 Jan 2008 by Oleksandr Pavlyk

Most calculus students might think that if one could compute indefinite integrals, it would always be easy to compute definite ones. After all, the Fundamental Theorem of Calculus says that one just has to subtract the values of the indefinite integral at the endpoints to get the definite integral.

So how come, inside Mathematica, there are thousands of pages of code devoted to working out definite integrals, beyond just subtracting values of indefinite ones?

The answer, as is often the case, is that in the real world of mathematical computation, things are more complicated than one learns in basic mathematics courses. And to get the correct answer one needs to be considerably more sophisticated.

In a simple case, subtracting indefinite integrals works just fine.

Consider computing the area under a sine curve, which equals

We work out the indefinite integral:

Then we can just subtract its value at each end point, and correctly find a definite integral such as

But consider a more complicated case:

We can verify that this indefinite integral at least formally differentiates correctly:

Now let’s compute the definite integral by subtracting values of our indefinite one:

But this cannot be correct. After all, if we plot the integrand, we can see that it is positive throughout the range 0 to 2π:

So what went wrong with subtracting the end points? The issue is that the Fundamental Theorem of Calculus isn’t directly applicable here. Because, when you state it fully, the theorem requires that the antiderivative that is going to be subtracted be continuous throughout the interval.

But the antiderivative we have here looks like this:

It has a discontinuity right in the middle of the interval.

So how does Mathematica get its answer? It has to be more careful. Sometimes it works by detecting discontinuities in the antiderivative, and then breaking up the integration region into parts, and carefully taking directional limits at the discontinuity points:
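The post's integrand is not reproduced in this feed. As a stand-in with exactly the same pathology, take f(x) = 1/(2 + cos x) on [0, 2π]: it is positive everywhere, yet its textbook antiderivative jumps at x = π, so naive endpoint subtraction gives 0. Splitting at the discontinuity and taking one-sided limits, as described above, repairs the computation. A Python sketch:

```python
import math

# Hypothetical example (not the blog's own): f(x) = 1/(2 + cos x).
# Its textbook antiderivative F(x) = (2/sqrt(3)) atan(tan(x/2)/sqrt(3))
# is discontinuous at x = pi.
def f(x):
    return 1 / (2 + math.cos(x))

def F(x):
    return 2 / math.sqrt(3) * math.atan(math.tan(x / 2) / math.sqrt(3))

naive = F(2 * math.pi) - F(0)  # 0.0, wrong: f > 0 on the whole interval

# Correct: split at the discontinuity and take one-sided limits of F
eps = 1e-8
split = (F(math.pi - eps) - F(0)) + (F(2 * math.pi) - F(math.pi + eps))
exact = 2 * math.pi / math.sqrt(3)  # known closed form, about 3.6276

print(naive, split, exact)
```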

Another thing it can do is to make use of ambiguity in the antiderivative. Every calculus student knows that antiderivatives can contain an arbitrary additive constant. But in fact, there’s more arbitrariness than that: one can add different constants on different parts of the interval.

Often it’s not obvious from the algebraic form that one has added a piecewise constant like this. For example, consider:

Differentiating this shows that it is indeed an antiderivative of the same function as above.

It doesn’t happen to be the antiderivative that Mathematica generates by default. But it is a perfectly valid one. And it turns out to be continuous over the region of integration:

So if one uses it, one can now directly apply the Fundamental Theorem of Calculus and get the correct result:

Even though their algebraic forms look different, one can verify that the antiderivatives differ by a piecewise constant:

So how come Mathematica doesn’t always generate the “better” antiderivative, so that the Fundamental Theorem of Calculus applies directly?

The problem is that there’s actually no way to produce an antiderivative that has this property for all definite integrals one might want to compute. Here’s the formal situation.

The Fundamental Theorem of Calculus states that an antiderivative continuous along a chosen path always exists, defined by integrating the function along that path. Its existence is of theoretical importance, though in practice it cannot always be expressed in terms of any predetermined set of elementary and special functions.
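In standard notation (the post's own formula was an inline image and does not survive here), the antiderivative the theorem guarantees is the path integral

```latex
F(x) = \int_{x_0}^{x} f(t)\, dt ,
```

taken along the chosen path from a fixed base point; continuity of this F along that path is exactly what the theorem provides.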

Moreover, if a meromorphic integrand has simple poles in the complex plane, it is impossible to choose a single antiderivative that is continuous along every imaginable path in the complex plane, because of branch cuts in the antiderivative.

Our integrand has simple poles in the complex plane:
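For the stand-in integrand 1/(2 + cos x) (an illustrative assumption, since the post's actual integrand and pole locations are in images), the poles are the roots of 2 + cos z = 0, which sit off the real axis at z = π + 2πk ± i·arccosh 2. A quick numerical check:

```python
import numpy as np

# Poles of 1/(2 + cos z) are roots of 2 + cos z = 0.
# Since cos(pi + i*t) = -cosh(t), they occur at t = arccosh(2).
t = np.arccosh(2.0)                 # = log(2 + sqrt(3))
z = np.pi + 1j * t

print(abs(2 + np.cos(z)))           # ~0: z is a root
print(t, np.log(2 + np.sqrt(3.0)))  # the two expressions agree

# The poles are simple: the derivative -sin(z) is nonzero there,
# with |sin(z)| = sqrt(3).
print(abs(np.sin(z)))
```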

But now consider how our two antiderivatives behave in the complex plane. Here are the real parts of these functions:

Looking along the real axis, the original antiderivative is not continuous, so the Fundamental Theorem cannot be applied to it directly. But the alternative antiderivative is continuous there, so for it the Fundamental Theorem works.

However, now look along the vertical line indicated in the picture in black. Along this line, the original antiderivative is continuous, so the Fundamental Theorem works fine for it. But the alternative antiderivative is not continuous there, so for it the Fundamental Theorem fails:

And indeed one can show that there is no single choice of antiderivative for which the Fundamental Theorem will always work.

So Mathematica has to go to more effort to get the correct answer for the definite integral.

This may seem subtle, but actually it is just the tip of the iceberg of the issues that crop up in doing definite integration correctly in Mathematica, and in mathematics. It's the job of our group at Wolfram Research to understand all these issues and figure out good algorithms for handling them. It's a fascinating exercise not only in algorithm development but also in mathematics itself.

Our First Russian Student Competition
http://blog.wolfram.com/2007/06/05/our-first-russian-student-competition/
Tue, 05 Jun 2007 16:35:59 +0000, by Oleksandr Pavlyk

In April, we invited high school and college students in Russia to participate in a Mathematica competition. We gave the students a week to answer a set of seven Mathematica questions. The response was great, with submissions coming in from all across the country.

The first-place winner was Vladimir Dudchenko, an undergraduate student at the Moscow Institute of Physics and Technology (MIPT). He correctly solved all seven competition problems and displayed remarkable ingenuity and skill in his use of Mathematica. In addition to a student copy of Mathematica 6, Vladimir won a new MacBook Pro, a top-of-the-line machine donated by Apple and DPI Computers (Apple’s partner in Russia).

The runner-up winners were Yulia Kalugina, Konstantin Kanishev, Il’ja Lysenkov, and Aleksey Valabuev. Each won a student copy of Mathematica and other prizes from Wolfram Research.

And special congratulations go to 14-year-old Askar Safin, the youngest participant, who correctly solved two of the problems, a significant achievement for someone his age.

One of the problems, which turned out to be the most challenging, asked competitors to compute an inertia tensor of the spikey polyhedron, a cumulated icosahedron. You can now visualize the spikey polyhedron (which Michael Trott discussed in a recent blog post) instantly using the new PolyhedronData function in Mathematica.

We thought the problem’s most elegant solution was the one submitted by Aleksey Valabuev, a student at MIPT. He remarked that the spikey’s inertia tensor with respect to its center of mass must be spherical, i.e. I==I1==I2==I3. Aleksey further noted that is manifestly invariant under rotations. He pointed out that it suffices to integrate over a tetrahedron with one vertex at the spikey’s center of mass and a base being one of the spikey’s faces, instead of integrating over the whole spikey. This observation follows because the spikey’s symmetry group acting on this tetrahedron would generate the whole spikey!

In order to carry out the integration over a tetrahedron with base vertices {a, b, c}, he represented a point inside the tetrahedron as

for
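Aleksey's reduction can be sketched outside Mathematica as well. Assuming the standard parametrization of a tetrahedron with one vertex at the origin and base vertices a, b, c, namely p = u·a + v·b + w·c with u, v, w ≥ 0 and u + v + w ≤ 1 (the concrete vertices below are stand-ins, not the spikey's actual geometry), the inertia integral becomes an iterated integral with Jacobian |det[a b c]|:

```python
import sympy as sp

u, v, w = sp.symbols('u v w', nonnegative=True)

# Stand-in tetrahedron: origin plus the three standard basis vectors.
a = sp.Matrix([1, 0, 0])
b = sp.Matrix([0, 1, 0])
c = sp.Matrix([0, 0, 1])

p = u * a + v * b + w * c                      # point inside the tetrahedron
J = sp.Abs(sp.Matrix.hstack(a, b, c).det())    # Jacobian of (u,v,w) -> p

def tet_integrate(expr):
    """Integrate expr over u,v,w >= 0 with u+v+w <= 1, times the Jacobian."""
    return sp.integrate(expr * J,
                        (w, 0, 1 - u - v), (v, 0, 1 - u), (u, 0, 1))

volume = tet_integrate(sp.Integer(1))
print(volume)   # 1/6, i.e. |det|/6

# Inertia tensor (unit density) about the origin vertex:
r2 = (p.T * p)[0]
inertia = sp.Matrix(3, 3, lambda i, j:
                    tet_integrate(r2 * sp.eye(3)[i, j] - p[i] * p[j]))
print(inertia)  # diagonal entries 1/30, off-diagonal entries -1/120
```

For the spikey itself one would place the apex at the center of mass, use the symmetry group to account for all faces, and read off the single scalar moment, as Aleksey did.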

Of course, Mathematica now lets one look up the inertia tensor of a spikey immediately using PolyhedronData:

We were impressed by the variety of answers we received and congratulate everyone who participated. We’d like to hold other student competitions soon… but more on that later.