Archive for the ‘mathematics’ Category

From the four numbers [6, 6, 5, 2], using only the binary operations [+, -, *, /], form the number 17.

When he tweeted the first time, I thought about it a little bit (while walking from my desk to the restroom or something like that), but forgot about it pretty soon and didn’t give it much further thought. When he posted again, I gave it another serious try, failed, and so gave up and wrote a computer program.

This is what I thought this time.

Idea

Any expression is formed as a binary tree. For example, 28 = 6 + (2 * (5 + 6)) is formed as this binary tree (TODO make a proper diagram with DOT or something):

+
├── 6
└── *
    ├── 2
    └── +
        ├── 5
        └── 6

And 8 = (2 + 6) / (6 - 5) is this binary tree:

/
├── +
│   ├── 2
│   └── 6
└── -
    ├── 6
    └── 5

Alternatively, any expression is built up from the 4 given numbers [a, b, c, d] as follows:
Take any two of the numbers and perform any operation on them, and replace the two numbers with the result. Then repeat, until you have only one number, which is the final result.

So my idea was to generate all possible such expressions out of [6, 6, 5, 2], and see if 17 was one of them. (I suspected it may be possible by doing divisions and going via non-integers, but couldn’t see how.)
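The "take any two numbers, combine them, repeat" description translates almost directly into a recursive search. Here is a minimal sketch in Python, assuming exact rational arithmetic (fractions.Fraction) so that non-integer intermediate results are handled without rounding. The function name reachable and its return format are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def reachable(numbers):
    """Map every value obtainable from `numbers` to one expression producing it."""
    results = {}

    def search(items):
        # items is a list of (value, expression-string) pairs still available.
        if len(items) == 1:
            value, expr = items[0]
            results.setdefault(value, expr)
            return
        for i, j in combinations(range(len(items)), 2):
            (a, ea), (b, eb) = items[i], items[j]
            rest = [items[k] for k in range(len(items)) if k not in (i, j)]
            candidates = [
                (a + b, f"({ea} + {eb})"),
                (a * b, f"({ea} * {eb})"),
                (a - b, f"({ea} - {eb})"),
                (b - a, f"({eb} - {ea})"),
            ]
            # Division is only defined for a nonzero divisor.
            if b != 0:
                candidates.append((a / b, f"({ea} / {eb})"))
            if a != 0:
                candidates.append((b / a, f"({eb} / {ea})"))
            # Replace the two chosen numbers with the result and recurse.
            for value, expr in candidates:
                search(rest + [(value, expr)])

    search([(Fraction(n), str(n)) for n in numbers])
    return results


found = reachable([6, 6, 5, 2])
print(17 in found)   # is 17 reachable at all?
print(found.get(17)) # one expression that evaluates to 17, if any
```

Running it shows that 17 is indeed reachable through a non-integer intermediate; one such expression is 6 * (2 + 5/6).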

(In hindsight it seems odd that my first attempt was to answer whether 17 could be generated, rather than how: I guess at this point, despite the author’s assurance that there are no underhanded tricks involved, I still wanted to test whether 17 could be generated in this usual way, if only to ensure that my understanding of the puzzle was correct.)

This is a very hard question. Understanding is an individual and internal matter that is hard to be fully aware of, hard to understand and often hard to communicate. We can only touch on it lightly here.

People have very different ways of understanding particular pieces of mathematics. To illustrate this, it is best to take an example that practicing mathematicians understand in multiple ways, but that we see our students struggling with. The derivative of a function fits well. The derivative can be thought of as:

Infinitesimal: the ratio of the infinitesimal change in the value of a function to the infinitesimal change in a function.

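Symbolic: the derivative of $x^n$ is $nx^{n-1}$, the derivative of $\sin(x)$ is $\cos(x)$, the derivative of $f \circ g$ is $f' \circ g * g'$, etc.

Logical: $f'(x) = d$ if and only if for every $\epsilon$ there is a $\delta$ such that when $0 < |\Delta x| < \delta$, $\left|\frac{f(x + \Delta x) - f(x)}{\Delta x} - d\right| < \epsilon$.

Geometric: the derivative is the slope of a line tangent to the graph of the function, if the graph has a tangent.

Rate: the instantaneous speed of $f(t)$, when $t$ is time.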

Approximation: The derivative of a function is the best linear approximation to the function near a point.

Microscopic: The derivative of a function is the limit of what you get by looking at it under a microscope of higher and higher power.

This is a list of different ways of thinking about or conceiving of the derivative, rather than a list of different logical definitions. Unless great efforts are made to maintain the tone and flavor of the original human insights, the differences start to evaporate as soon as the mental concepts are translated into precise, formal and explicit definitions.

I can remember absorbing each of these concepts as something new and interesting, and spending a good deal of mental time and effort digesting and practicing with each, reconciling it with the others. I also remember coming back to revisit these different concepts later with added meaning and understanding.

The list continues; there is no reason for it ever to stop. A sample entry further down the list may help illustrate this. We may think we know all there is to say about a certain subject, but new insights are around the corner. Furthermore, one person’s clear mental image is another person’s intimidation:

The derivative of a real-valued function $f$ in a domain $D$ is the Lagrangian section of the cotangent bundle $T^*(D)$ that gives the connection form for the unique flat connection on the trivial $\mathbf{R}$-bundle $D \times \mathbf{R}$ for which the graph of $f$ is parallel.

These differences are not just a curiosity. Human thinking and understanding do not work on a single track, like a computer with a single central processing unit. Our brains and minds seem to be organized into a variety of separate, powerful facilities. These facilities work together loosely, “talking” to each other at high levels rather than at low levels of organization.

This has been extended on the MathOverflow question Different ways of thinking about the derivative, where you can find even more ways of thinking about the derivative. Two of the interesting pointers are to this discussion on the n-Category Café, and to the book Calculus Unlimited by Marsden and Weinstein, which does calculus using a “method of exhaustion” that does not involve limits. (Its definition of the derivative is also mentioned at the earlier link, as that notion of the derivative closest to [the idea of Eudoxus and Archimedes] of “the tangent line touches the curve, and in the space between the line and the curve, no other straight line can be interposed”, or “the line which touches the curve only once”; this counts as another important way of thinking about the derivative.)

In his wonderful article “On proof and progress in mathematics”, Bill Thurston describes (among many other topics) how one’s understanding of a given concept in mathematics (such as that of the derivative) can be vastly enriched by viewing it simultaneously from many subtly different perspectives; in the case of the derivative, he gives seven standard such perspectives (infinitesimal, symbolic, logical, geometric, rate, approximation, microscopic) and then mentions a much later perspective in the sequence (as describing a flat connection for a graph).

One can of course do something similar for many other fundamental notions in mathematics. For instance, the notion of a group can be thought of in a number of (closely related) ways, such as the following:

Motivating examples: A group is an abstraction of the operations of addition/subtraction or multiplication/division in arithmetic or linear algebra, or of composition/inversion of transformations.

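Universal algebraic: A group is a set $G$ with an identity element $e$, a unary inverse operation $\cdot^{-1}: G \to G$, and a binary multiplication operation $\cdot: G \times G \to G$ obeying the relations (or axioms) $e \cdot x = x \cdot e = x$, $x \cdot x^{-1} = x^{-1} \cdot x = e$, $(x \cdot y) \cdot z = x \cdot (y \cdot z)$ for all $x, y, z \in G$.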

Symmetric: A group is all the ways in which one can transform a space to itself while preserving some object or structure on this space.

Representation theoretic: A group is identifiable with a collection of transformations on a space which is closed under composition and inverse, and contains the identity transformation.

Presentation theoretic: A group can be generated by a collection of generators subject to some number of relations.

Topological: A group is the fundamental group $\pi_1(X)$ of a connected topological space $X$.

Dynamic: A group represents the passage of time (or of some other variable(s) of motion or action) on a (reversible) dynamical system.

Category theoretic: A group is a category with one object, in which all morphisms have inverses.

Quantum: A group is the classical limit of a quantum group.

etc.
One can view a large part of group theory (and related subjects, such as representation theory) as exploring the interconnections between various of these perspectives. As one’s understanding of the subject matures, many of these formerly distinct perspectives slowly merge into a single unified perspective.

From a recent talk by Ezra Getzler, I learned a more sophisticated perspective on a group, somewhat analogous to Thurston’s example of a sophisticated perspective on a derivative (and coincidentally, flat connections play a central role in both):

Sheaf theoretic: A group is identifiable with a (set-valued) sheaf on the category of simplicial complexes such that the morphisms associated to collapses of -simplices are bijective for (and merely surjective for ).

The rest of the post elaborates on this understanding.

Again in a Google Buzz post on Jun 9, 2010, Tao posted the following:

Bill Thurston’s “On proof and progress in mathematics” has many nice observations about the nature and practice of modern mathematics. One of them is that for any fundamental concept in mathematics, there is usually no “best” way to define or think about that concept, but instead there is often a family of interrelated and overlapping, but distinct, perspectives on that concept, each of which conveying its own useful intuition and generalisations; often, the combination of all of these perspectives is far greater than the sum of the parts. Thurston illustrates this with the concept of differentiation, to which he lists seven basic perspectives and one more advanced perspective, and hints at dozens more.

But even the most basic of mathematical concepts admit this multiplicity of interpretation and perspective. Consider for instance the operation of addition, that takes two numbers x and y and forms their sum x+y. There are many such ways to interpret this operation:

1. (Disjoint union) x+y is the “size” of the disjoint union X u Y of an object X of size x, and an object Y of size y. (Size is, of course, another concept with many different interpretations: cardinality, volume, mass, length, measure, etc.)

2. (Concatenation) x+y is the size of the object formed by concatenating an object X of size x with an object Y of size y (or by appending Y to X).

3. (Iteration) x+y is formed from x by incrementing it y times.

4. (Superposition) x+y is the “strength” of the superposition of a force (or field, intensity, etc.) of strength x with a force of strength y.

5. (Translation action) x+y is the translation of x by y.

5a. (Translation representation) x+y is the amount of translation or displacement incurred by composing a translation by x with a translation by y.

6. (Algebraic) + is a binary operation on numbers that give it the structure of an additive group (or monoid), with 0 being the additive identity and 1 being the generator of the natural numbers or integers.

7. (Logical) +, when combined with the other basic arithmetic operations, are a family of structures on numbers that obey a set of axioms such as the Peano axioms.

8. (Algorithmic) x+y is the output of the long addition algorithm that takes x and y as input.

9. etc.

These perspectives are all closely related to each other; this is why we are willing to give them all the common name of “addition”, and the common symbol of “+”. Nevertheless there are some slight differences between each perspective. For instance, addition of cardinals is based on perspective 1, while addition of ordinals is based on perspective 2. This distinction becomes apparent once one considers infinite cardinals or ordinals: for instance, in cardinal arithmetic, aleph_0 = 1+ aleph_0 = aleph_0 + 1 = aleph_0 + aleph_0, whereas in ordinal arithmetic, omega = 1+omega < omega+1 < omega + omega.

Transitioning from one perspective to another is often a necessary first conceptual step when the time comes to generalise the concept. As a child, addition of natural numbers is usually taught initially by using perspective 1 or 3, but to generalise to addition of integers, one must first switch to a perspective such as 4, 5, or 5a; similar conceptual shifts are needed when one then turns to addition of rationals, real numbers, complex numbers, residue classes, functions, matrices, elements of abstract additive groups, nonstandard number systems, etc. Eventually, one internalises all of the perspectives (and their inter-relationships) simultaneously, and then becomes comfortable with the addition concept in a very broad set of contexts; but it can be more of a struggle to do so when one has grasped only a subset of the possible ways of thinking about addition.

In many situations, the various perspectives of a concept are either completely equivalent to each other, or close enough to equivalent that one can safely “abuse notation” by identifying them together. But occasionally, one of the equivalences breaks down, and then it becomes useful to maintain a careful distinction between two perspectives that are almost, but not quite, compatible. Consider for instance the following ways of interpreting the operation of exponentiation x^y of two numbers x, y:

1. (Combinatorial) x^y is the number of ways to make y independent choices, each of which chooses from x alternatives.

2. (Set theoretic) x^y is the size of the space of functions from a set Y of size y to a set X of size x.

9. (Computational) x^y is whatever my calculator or computer outputs when it is asked to evaluate x^y.

10. etc.

Again, these interpretations are usually compatible with each other, but there are some key exceptions. For instance, the quantity 0^0 would be equal to zero [ed: I think this should be one —S] using some of these interpretations, but would be undefined in others. The quantity 4^{1/2} would be equal to 2 in some interpretations, be undefined in others, and be equal to the multivalued expression +-2 (or to depend on a choice of branch) in yet further interpretations. And quantities such as i^i are sufficiently problematic that it is usually best to try to avoid exponentiation of one arbitrary complex number by another arbitrary complex number unless one knows exactly what one is doing. In such situations, it is best not to think about a single, one-size-fits-all notion of a concept such as exponentiation, but instead be aware of the context one is in (e.g. is one raising a complex number to an integer power? A positive real to a complex power? A complex number to a fractional power? etc.) and to know which interpretations are most natural for that context, as this will help protect against making errors when manipulating expressions involving exponentiation.
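Interpretations 2 and 9 are easy to probe directly. The sketch below is only an illustration: count_functions is a made-up helper that counts functions by brute force, and the remaining lines simply show what one particular "calculator" (Python) answers, which says nothing about which interpretation is the right one in a given context.

```python
import cmath
import math
from itertools import product


def count_functions(x, y):
    """Interpretation 2: the number of functions from a y-element set to an x-element set."""
    # A function is a choice of one of x values for each of the y inputs.
    return sum(1 for _ in product(range(x), repeat=y))


# Set-theoretic view: there is exactly one function from the empty set to itself.
print(count_functions(0, 0))
print(count_functions(2, 3))

# Computational view: whatever the language happens to return.
print(0 ** 0, math.pow(0.0, 0.0))    # integer and floating-point 0^0
print(4 ** 0.5)                      # a single real value
print(cmath.sqrt(4), -cmath.sqrt(4)) # both square roots, via the principal branch
print((-4) ** 0.5)                   # a complex result (with rounding error)
print(1j ** 1j)                      # principal value of i^i
```

Python follows the combinatorial convention for 0^0 and silently picks the principal branch for complex powers, which is exactly the kind of context-dependent choice described above.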

It is also quite instructive to build one’s own list of interpretations for various basic concepts, analogously to those above (or Thurston’s example). Some good examples of concepts to try this on include “multiplication”, “integration”, “function”, “measure”, “solution”, “space”, “size”, “distance”, “curvature”, “number”, “convergence”, “probability” or “smoothness”. See also my blog post below in which the concept of a “group” is considered.

I plan to collect more such “different ways of thinking about the same (mathematical) thing” in this post, as I encounter them.

The series

$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \dots$$

was found by Leibniz in 1673, while he was trying to find the area (“quadrature”) of a circle, and he had as prior work the ideas of Pascal on infinitesimal triangles, and that of Mercator on the area of the hyperbola with its infinite series for $\log(1+x)$. This was Leibniz’s first big mathematical work, before his more general ideas on calculus.

Leibniz did not know that this series had already been discovered earlier in 1671 by the short-lived mathematician James Gregory in Scotland. Gregory too had encountered Mercator’s infinite series $\log(1+x) = x - x^2/2 + x^3/3 - \dots$, and was working on different goals: he was trying to invert logarithmic and trigonometric functions.

Neither of them knew that the series had already been found two centuries earlier by Mādhava (1340–1425) in India (as known through the quotations of Nīlakaṇṭha c.1500), working in a completely different mathematical culture whose goals and practices were very different. The logarithm function doesn’t seem to have been known, let alone an infinite series for it, though a calculus of finite differences for interpolation for trigonometric functions seems to have been ahead of Europe by centuries (starting all the way back with Āryabhaṭa in c. 500 and more clearly stated by Bhāskara II in 1150). Using a different approach (based on the arc of a circle) and geometric series and sums-of-powers, Mādhava (or the mathematicians of the Kerala tradition) arrived at the same formula.

This startling universality of mathematics across different cultures is what David Mumford remarks on, in Why I am a Platonist:

As Littlewood said to Hardy, the Greek mathematicians spoke a language modern mathematicians can understand, they were not clever schoolboys but were “fellows of a different college”. They were working and thinking the same way as Hardy and Littlewood. There is nothing whatsoever that needs to be adjusted to compensate for their living in a different time and place, in a different culture, with a different language and education from us. We are all understanding the same abstract mathematical set of ideas and seeing the same relationships.

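Suppose we have an alphabet $\mathcal{A}$ of size $m$. Its generating function (using the variable $z$ to mark length) is simply $A(z) = mz$, as $\mathcal{A}$ contains $m$ elements of length $1$ each.

2. Words

Let $\mathcal{W}$ denote the class of all words over the alphabet $\mathcal{A}$. There are many ways to find the generating function $W(z)$ for $\mathcal{W}$.

2.1.

We have

$$\mathcal{W} = \{\epsilon\} + \mathcal{A} + \mathcal{A}^2 + \mathcal{A}^3 + \dots$$

so its generating function is

$$W(z) = 1 + mz + (mz)^2 + (mz)^3 + \dots = \frac{1}{1 - mz}.$$

2.2.

To put it differently, in the symbolic framework, we have $\mathcal{W} = \mathrm{SEQ}(\mathcal{A})$, so the generating function for $\mathcal{W}$ is

$$W(z) = \frac{1}{1 - A(z)} = \frac{1}{1 - mz}.$$

2.3.

We could have arrived at this with direct counting: the number of words of length $n$ is $m^n$ as there are $m$ choices for each of the $n$ letters, so the generating function is

$$W(z) = \sum_{n \ge 0} m^n z^n = \frac{1}{1 - mz}.$$

3. Smirnov words

Next, let $\mathcal{S}$ denote the class of Smirnov words over the alphabet $\mathcal{A}$, defined as words in which no two consecutive letters are identical. (That is, words $w_1 w_2 \dots w_n$ in which $w_i \in \mathcal{A}$ for all $i$, and $w_i \neq w_{i+1}$ for any $1 \le i < n$.) Again, we can find the generating function $S(z)$ for $\mathcal{S}$ in different ways.

3.1.

For any word in $\mathcal{W}$, by “collapsing” all runs of each letter, we get a Smirnov word. To put it differently, any word in $\mathcal{W}$ can be obtained from a Smirnov word by “expanding” each letter into a nonempty sequence of that letter. This observation (see Analytic Combinatorics, pp. 204–205) lets us relate the generating functions of $\mathcal{W}$ and $\mathcal{S}$ as

$$W(z) = S\left(\frac{z}{1-z}\right)$$

which implicitly gives the generating function $S(z)$: we have

$$S(z) = W\left(\frac{z}{1+z}\right) = \frac{1}{1 - \frac{mz}{1+z}} = \frac{1+z}{1 - (m-1)z}.$$

3.2.

Alternatively, consider in an arbitrary word the first occurrence of a pair of repeated letters. Either this doesn’t happen at all (the word is a Smirnov word), or else, if it happens at position $i$ so that $w_i = w_{i+1}$, then the part of the word up to position $i$ is a nonempty Smirnov word, the letter at position $i+1$ is the same as the previous letter, and everything after $i+1$ is an arbitrary word. Writing $\mathcal{Z}$ for a single letter position (an atom of size $1$), this gives

$$\mathcal{W} = \mathcal{S} + (\mathcal{S} \setminus \{\epsilon\}) \times \mathcal{Z} \times \mathcal{W}$$

or in terms of generating functions

$$W(z) = S(z) + (S(z) - 1)\, z\, W(z)$$

giving

$$S(z) = \frac{W(z)(1 + z)}{1 + zW(z)} = \frac{1+z}{1 - (m-1)z}.$$

3.3.

A minor variant is to again pick an arbitrary word and consider its first pair of repeated letters, happening (if it does) at positions $i$ and $i+1$, but this time consider the prefix up to $i-1$: either it is empty, or the pair of letters is different from the last letter of the prefix, giving us the decomposition

$$\mathcal{W} = \mathcal{S} + m\mathcal{Z}^2 \times \mathcal{W} + (\mathcal{S} \setminus \{\epsilon\}) \times (m-1)\mathcal{Z}^2 \times \mathcal{W}$$

and corresponding generating function

$$W(z) = S(z) + m z^2 W(z) + (S(z) - 1)(m-1) z^2 W(z)$$

so

$$S(z) = \frac{W(z)(1 - z^2)}{1 + (m-1) z^2 W(z)} = \frac{(1-z)(1+z)}{(1-z)(1 - (m-1)z)}$$

which is the same as before after we cancel the $(1-z)$ factors.

3.4.

We could have arrived at this result with direct counting. For $n \ge 1$, for a Smirnov word of length $n$, we have $m$ choices for the first letter, and for each of the other $n-1$ letters, as they must not be the same as the previous letter, we have $m-1$ choices. This gives the number of Smirnov words of length $n$ as $m(m-1)^{n-1}$ for $n \ge 1$, and so the generating function for Smirnov words is

$$S(z) = 1 + \sum_{n \ge 1} m(m-1)^{n-1} z^n = 1 + \frac{mz}{1 - (m-1)z}$$

again giving

$$S(z) = \frac{1+z}{1 - (m-1)z}.$$

4. Words with bounded runs

We can now generalize. Let $\mathcal{W}_k$ denote the class of words in which no letter occurs more than $k$ times consecutively. ($\mathcal{W}_1 = \mathcal{S}$.) We can find the generating function $W_k(z)$ for $\mathcal{W}_k$.

4.1.

To get a word in $\mathcal{W}_k$ we can take a Smirnov word and replace each letter with a nonempty sequence of up to $k$ occurrences of that letter. This gives:

$$W_k(z) = S(z + z^2 + \dots + z^k) = S\left(\frac{z(1 - z^k)}{1 - z}\right)$$

so

$$W_k(z) = \frac{1 - z^{k+1}}{1 - mz + (m-1)z^{k+1}}.$$

4.2.

Pick any arbitrary word, and consider its first occurrence of a run of $k+1$ letters. Either such a run does not exist (which means the word we picked is in $\mathcal{W}_k$), or it occurs right at the beginning ($m$ possibilities, one for each letter in the alphabet), or, if it occurs starting at position $i+1$, then the part of the word up to position $i$ (the “prefix”) is a nonempty word in $\mathcal{W}_k$, positions $i+1$ to $i+k+1$ are $k+1$ occurrences of any of the $m-1$ letters other than the last letter of the prefix, and what follows is an arbitrary word. This gives

$$\mathcal{W} = \mathcal{W}_k + m\mathcal{Z}^{k+1} \times \mathcal{W} + (\mathcal{W}_k \setminus \{\epsilon\}) \times (m-1)\mathcal{Z}^{k+1} \times \mathcal{W}$$

or in terms of generating functions

$$W(z) = W_k(z) + m z^{k+1} W(z) + (W_k(z) - 1)(m-1) z^{k+1} W(z)$$

so

$$W_k(z)\left(1 + (m-1) z^{k+1} W(z)\right) = W(z)\left(1 - z^{k+1}\right)$$

giving

$$W_k(z) = \frac{W(z)(1 - z^{k+1})}{1 + (m-1) z^{k+1} W(z)} = \frac{1 - z^{k+1}}{1 - mz + (m-1)z^{k+1}}.$$

4.3.

Arriving at this via direct counting seems hard.

5. Words that stop at a long run

Now consider words in which we “stop” as soon as we see $k$ consecutive identical letters. Let the class of such words be denoted $\mathcal{X}$ (not writing $\mathcal{X}_k$ to keep the notation simple). As before, we can find its generating function $X(z)$ in multiple ways.

5.1.

We get any word in $\mathcal{X}$ by either immediately seeing a run of length $k$ and stopping, or by starting with a nonempty prefix in $\mathcal{W}_{k-1}$, and then stopping with a run of $k$ identical letters different from the last letter of the prefix. Thus we have

$$\mathcal{X} = m\mathcal{Z}^k + (\mathcal{W}_{k-1} \setminus \{\epsilon\}) \times (m-1)\mathcal{Z}^k$$

and

$$X(z) = m z^k + (W_{k-1}(z) - 1)(m-1) z^k$$

which gives

$$X(z) = \frac{m z^k (1 - z)}{1 - mz + (m-1) z^k}.$$

5.2.

Alternatively, we can decompose any word by looking for its first run of $k$ identical letters. Either it doesn’t occur at all (the word we picked is in $\mathcal{W}_{k-1}$), or the part of the word until the end of the run belongs to $\mathcal{X}$ and the rest is an arbitrary word, so

$$\mathcal{W} = \mathcal{W}_{k-1} + \mathcal{X} \times \mathcal{W}$$

and

$$W(z) = W_{k-1}(z) + X(z) W(z)$$

so

$$X(z) = 1 - \frac{W_{k-1}(z)}{W(z)} = \frac{m z^k (1 - z)}{1 - mz + (m-1) z^k}.$$

6. Probability

Finally we arrive at the motivation: suppose we keep appending a random letter from the alphabet, until we encounter the same letter $k$ times consecutively. What can we say about the length of the word thus generated? As all sequences of letters are equally likely, the probability of seeing any string of length $n$ is $1/m^n$. So in the above generating function $X(z) = \sum_n x_n z^n$, the probability of our word having length $n$ is $x_n / m^n$, and the probability generating function is therefore $P(z) = \sum_n \frac{x_n}{m^n} z^n$. This can be got by replacing $z$ with $z/m$ in the expression for $X(z)$: we have

$$P(z) = X\left(\frac{z}{m}\right) = \frac{z^k (m - z)}{m^k (1 - z) + (m-1) z^k}.$$

In principle, this probability generating function tells us everything about the distribution of the length of the word. For example, its expected length is

$$P'(1) = \frac{m^k - 1}{m - 1}.$$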
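Closed forms like these are easy to sanity-check by brute force for small cases. The sketch below assumes an alphabet of size m and a run length k (my names for the two parameters). It takes the generating function for words with no run longer than k to be $(1 - z^{k+1})/(1 - mz + (m-1)z^{k+1})$, and the one for words that stop at their first run of length k to be $m z^k (1 - z)/(1 - mz + (m-1)z^k)$, then compares their coefficients against direct enumeration. All helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import groupby, product


def series(num, den, terms):
    """First `terms` power-series coefficients of num(z)/den(z); needs den[0] == 1."""
    coeffs = []
    for n in range(terms):
        c = num[n] if n < len(num) else 0
        for j in range(1, min(n, len(den) - 1) + 1):
            c -= den[j] * coeffs[n - j]
        coeffs.append(c)
    return coeffs


def bounded_gf(m, k):
    """(numerator, denominator) coefficient lists for words with runs of length <= k."""
    num = [0] * (k + 2); num[0] += 1; num[k + 1] -= 1
    den = [0] * (k + 2); den[0] += 1; den[1] -= m; den[k + 1] += m - 1
    return num, den


def stopping_gf(m, k):
    """(numerator, denominator) for words that end at their first run of length k."""
    num = [0] * (k + 2); num[k] += m; num[k + 1] -= m
    den = [0] * (k + 1); den[0] += 1; den[1] -= m; den[k] += m - 1
    return num, den


def longest_run(word):
    return max((len(list(g)) for _, g in groupby(word)), default=0)


def count_bounded(m, k, n):
    """Brute force: words of length n over m letters with no run longer than k."""
    return sum(1 for w in product(range(m), repeat=n) if longest_run(w) <= k)


def count_stopping(m, k, n):
    """Brute force: words of length n whose first run of k equal letters ends at position n."""
    return sum(
        1 for w in product(range(m), repeat=n)
        if n >= k and len(set(w[-k:])) == 1 and longest_run(w[:-1]) < k
    )


def expected_length(m, k, terms=400):
    """Truncated sum of n * P(length = n), using the series coefficients."""
    coeffs = series(*stopping_gf(m, k), terms)
    return sum(Fraction(n * c, m ** n) for n, c in enumerate(coeffs))


m, k, terms = 3, 2, 7
print(series(*bounded_gf(m, k), terms), [count_bounded(m, k, n) for n in range(terms)])
print(series(*stopping_gf(m, k), terms), [count_stopping(m, k, n) for n in range(terms)])
print(float(expected_length(m, k)), (m ** k - 1) / (m - 1))
```

Each printed pair should agree, and the truncated expectation should be very close to $(m^k - 1)/(m - 1)$, which is what differentiating the probability generating function at $z = 1$ gives under these assumptions.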

A long time ago, Diophantus (sort of) discussed integer solutions to the equation

$$x^2 + y^2 = z^2$$

(solutions to this equation are called Pythagorean triples).

Centuries later, in 1637, Fermat made a conjecture (now called Fermat’s Last Theorem, not because he uttered it in his dying breath, but because it was the last one to be proved, in ~1995) that

$$x^n + y^n = z^n$$

has no positive integer solutions for $n \ge 3$. In other words, his conjecture was that none of the following equations has a solution:

$$x^3 + y^3 = z^3$$

$$x^4 + y^4 = z^4$$

$$x^5 + y^5 = z^5$$

… and so on. An nth power cannot be partitioned into two nth powers.

About a century later, Euler proved the $n = 3$ case of Fermat’s conjecture, but generalized it in a different direction: he conjectured in 1769 that an nth power cannot be partitioned into fewer than n nth powers, namely

$$z^n = x_1^n + x_2^n + \dots + x_k^n$$

has no positive integer solutions with $1 < k < n$. So his conjecture was that (among others) none of the following equations has a solution:

$$z^3 = a^3 + b^3$$

$$z^4 = a^4 + b^4 + c^4$$

$$z^5 = a^5 + b^5 + c^5 + d^5$$

… and so on.

This conjecture stood for about two centuries, until abruptly it was found to be false, by Lander and Parkin who in 1966 simply did a direct search on the fastest (super)computer at the time, and found this counterexample:

$$27^5 + 84^5 + 110^5 + 133^5 = 144^5$$

(It is still one of only three examples known, according to Wikipedia.)

to which, if you add (say) print four_fifths(150) and run it, it returns the correct answer fairly quickly: in about 47 seconds on my laptop.

The if cc_sum in fifths: line inside the loop is an $O(n)$ cost each time it’s run (a membership test on a list is a linear scan), so with a simple improvement to the code (using a set instead) and rewriting it a bit, we can write the following full program:
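A minimal sketch of such a bounded search follows. It is not necessarily the program this post originally showed: the body of find_counterexample, and its optional power and terms parameters, are assumptions made for illustration. It uses a dict keyed by fifth powers so that the membership test is a hash lookup rather than a scan.

```python
from itertools import combinations_with_replacement


def find_counterexample(n, power=5, terms=4):
    """Return (a, ..., e) with a^p + ... = e^p and all left-hand terms below n, or None."""
    powers = [i ** power for i in range(n)]
    # A sum of `terms` powers below n**power is below (terms * n)**power,
    # so roots up to terms * n are enough (a deliberately loose bound).
    root_of = {i ** power: i for i in range(1, terms * n)}
    for combo in combinations_with_replacement(range(1, n), terms):
        total = sum(powers[x] for x in combo)
        if total in root_of:  # hash lookup instead of a linear scan
            return combo + (root_of[total],)
    return None


# The Lander and Parkin identity itself is cheap to confirm directly.
print(27**5 + 84**5 + 110**5 + 133**5 == 144**5)
```

Calling find_counterexample(150) runs the full search over all non-decreasing 4-tuples below 150, which takes a while in pure Python.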

But there’s something unsatisfying about this solution, which is that it assumes there’s a solution with all four numbers on the LHS less than 150. After all, changing the function invocation to find_counterexample(145) makes it run a second faster even, but how could we know to do that without already knowing the solution? Besides, we don’t have a fixed 8- or 10-second budget; what we’d really like is a program that keeps searching till it finds a solution or we abort it (or it runs out of memory or something), with no other fixed termination condition.

The above program used the given “n” as an upper bound to generate the combinations of 4 numbers; is there a way to generate all combinations when we don’t know an upper bound on them?

Yes! One of the things I learned from Knuth volume 4 is that if you simply write down each combination in descending order and order them lexicographically, the combinations you get for each upper bound are a prefix of the list of the next bigger one, i.e., for any upper bound, all the combinations form a prefix of the same infinite list, which starts as follows (line breaks for clarity):

There doesn’t seem to be a library function in Python to generate these though, so we can write our own. If we stare at the above list, we can figure out how to generate the next combination from a given one:

Walk backwards from the end, till you reach the beginning or find an element that’s less than the previous one.

Increase that element, set all the following elements to 1s, and continue.
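The two steps above can be sketched as a Python generator. This is my reading of the rule, assuming combinations of positive integers with repetition allowed, each written in non-increasing order; the name nonincreasing_tuples is made up for illustration.

```python
from itertools import islice


def nonincreasing_tuples(k):
    """Yield every k-tuple of positive integers in non-increasing order, forever."""
    current = [1] * k
    while True:
        yield tuple(current)
        # Walk backwards until we reach the beginning or find an element
        # that is less than the one before it.
        i = k - 1
        while i > 0 and current[i] >= current[i - 1]:
            i -= 1
        # Increase that element and reset everything after it to 1.
        current[i] += 1
        for j in range(i + 1, k):
            current[j] = 1


for combo in islice(nonincreasing_tuples(4), 6):
    print(combo)
# (1, 1, 1, 1)
# (2, 1, 1, 1)
# (2, 2, 1, 1)
# (2, 2, 2, 1)
# (2, 2, 2, 2)
# (3, 1, 1, 1)
```

Because the first element never decreases, all tuples with entries at most n come out before any tuple containing n + 1, which is the prefix property that makes an unbounded search possible.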

Calling itertools.combinations_with_replacement is by far the fastest, taking about 2.7 seconds. It turns out that it’s written in C, so this would be hard to beat. (Still, writing it in a try block is seriously bad.)

The “equivalent” Python code from the itertools documentation (benchmark_itertools_combinations_with_replacment) is about 50x slower.

Gets slightly better when specialized to numbers.

Simply generating all combinations without an upper bound is actually faster.

It can be made even faster by writing it in a more C-like way.

The tuples version with the loop unrolled manually is rather fast when seen in this light, less than 4x slower than the library version.

Found via G+, a new physical experiment that approximates $\pi$, like Buffon’s needle problem: The Pi Machine.

Roughly, the amazing discovery of Gregory Galperin is this: When a ball of mass $M$ collides with one of mass $m$, propelling it towards a wall, the number of collisions (assuming standard physics idealisms) is $\lfloor \pi \sqrt{M/m} \rfloor$, so by taking $M/m = 10^{2n}$, we can get the first $n+1$ digits of $\pi$. Note that this number of collisions is an entirely deterministic quantity; there’s no probability involved!
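The collision count is simple to reproduce numerically, since only the velocities matter. Here is a minimal sketch, assuming perfectly elastic collisions, a wall to the left of a light ball of unit mass initially at rest, and a heavy ball approaching from the right. The function name count_collisions is made up, and exact fractions are used to avoid any rounding concerns.

```python
from fractions import Fraction


def count_collisions(mass_ratio):
    """Count ball-ball and ball-wall collisions for a heavy/light mass ratio."""
    M, m = mass_ratio, 1
    V, v = Fraction(-1), Fraction(0)  # heavy ball moves toward the wall, light ball rests
    count = 0
    while True:
        if V < v:
            # The gap between the balls is closing: elastic ball-ball collision.
            V, v = (((M - m) * V + 2 * m * v) / (M + m),
                    ((m - M) * v + 2 * M * V) / (M + m))
            count += 1
        elif v < 0:
            # The light ball is heading for the wall: it bounces back.
            v = -v
            count += 1
        else:
            # Both balls move away from the wall and the gap is not closing.
            return count


for ratio in (1, 100, 10_000):
    print(ratio, count_collisions(ratio))
# 1 3
# 100 31
# 10000 314
```

Events necessarily alternate between ball-ball and ball-wall collisions, which is why checking the two conditions in this order is enough and positions never need to be tracked.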

Here’s a video demonstrating the fact for (the blue ball is the heavier one):

The NYT post says how this discovery came about:

Dr. Galperin’s approach was also geometric but very different (using an unfolding geodesic), building on prior related insights. Dr. Galperin, who studied under well-known Russian mathematician Andrei Kolmogorov, had recently written (with Yakov Sinai) extensively on ball collisions, realized just before a talk in 1995 that a plot of the ball positions of a pair of colliding balls could be used to determine pi. (When he mentioned this insight in the talk, no one in the audience believed him.) This finding was ultimately published as “Playing Pool With Pi” in a 2003 issue of Regular and Chaotic Dynamics.

Though many ſtones doe beare greate price,
The whetſtone is for exerſice
As neadefull, and in woorke as ſtraunge:
Dulle thinges and harde it will ſo chaunge,
And make them ſharpe, to right good vſe:
All arteſmen knowe, thei can not chuſe,
But uſe his helpe: yet as men ſee,
Noe ſharpeneſſe ſemeth in it to bee.

The grounde of artes did brede this ſtone:
His vſe is greate, and moare then one.
Here if you lift your wittes to whette,
Moche ſharpeneſſe thereby ſhall you gette.
Dulle wittes hereby doe greately mende,
Sharpe wittes are fined to their fulle ende.
Now proue, and praiſe, as you doe finde,
And to your ſelf be not vnkinde.

Modern spelling

Though many stones do bear great price,
The whetstone is for exercise
As needful, and in work as strange:
Dull things and hard it will so change
And make them sharp, to right good use:
All artsmen know they cannot choose
But use his help; yet as men see,
No sharpness seemeth in it to be.

The ground of arts did breed this stone;
His use is great, and more than one.
Here if you lift your wits to whet,
Much sharpness thereby shall you get.
Dull wits hereby do greatly mend,
Sharp wits are fined to their full end.
Now prove and praise as you do find,
And to yourself be not unkind.

Apparently the full title contains a pun (see http://www.pballew.net/arithm17.html): “the cossike practise” in the title refers to algebra, as the Latin cosa apparently meaning “a thing” was used to stand for an unknown, abbreviated to cos — but the Latin word cos itself means a grindstone.

The author again reminds readers not to blame his book, at the end of his preface:

To the curiouſe ſcanner.

If you ought finde, as ſome men maie,
That you can mende, I ſhall you praie,
To take ſome paine ſo grace maie ſende,
This worke to growe to perfecte ende.

But if you mende not that you blame,
I winne the praiſe, and you the ſhame.
Therfore be wiſe, and learne before,
Sith ſlaunder hurtes it ſelf moſte ſore.

Authors are either anxious about how their book is received, or make sure to be pointedly uncaring.

If it were not for the hillocks
You’d think little of the hills;
The rivers would seem tiny
If it were not for the rills.
If you never saw the brushwood
You would under-rate the trees;
And so you see the purpose
Of such little rhymes as these.