31 January 2008

What We're Doing Wrong: Programming in Theory Classes, from Michael Mitzenmacher's blog My Biased Coin. Mitzenmacher claims that the standard introductory classes in algorithms don't include programming assignments; furthermore, he thinks this is bad for theoretical computer science, because non-theoretical computer scientists get the idea that theory isn't actually good for anything.

I think the theoretical computer scientists are doing it right! But I think this from a selfish point of view -- I'm taking an algorithms class that nominally requires that I have certain classes which involve programming as a prerequisite, and I don't feel lost for not having had those classes. From a less facetious point of view, I agree with Mitzenmacher, even though it would hurt me. If someone learning something new has no idea how it connects to anything they've ever learned, it's not going to stick in their brain. The human brain is not a computer -- and even if it were, that doesn't mean that throwing a bunch of random facts in it is the best way to do things.

The same statement holds, I would think, for any course in any field that covers the "theory" behind some other material. Within mathematics, for example, it's possible to teach people what a category is without giving examples of all the familiar structures they know that are in fact categories; it's possible to teach real analysis without pointing out the connections to the calculus students already know; it's possible to teach rigorous measure-theoretic probability without appealing to (often flawed, but still useful) intuition that people have about flipping coins and rolling dice and distributing objects among boxes. I'm sure my readers can provide other examples.

30 January 2008

Here’s an evolutionist’s dream: 10,000 planet Earths, starting from the same point at the same time, and left to their own devices for four and a half billion years. What would happen? Could you go on safari from one planet to the next seeing an endless procession of wildly different organisms? Or would many of the planets be home to life forms that are broadly similar?

This is the sort of question that's hard to answer a priori. Basically, evolution is made up of a ridiculously large number of random decisions, each with a very small effect. There are a lot of classes of combinatorial structures for which we can generate members of the class uniformly at random (or according to some other probability distribution; the details don't matter here) and they'll all basically look the same. Why shouldn't evolution be like that? The details will be different every time; but in broad outline one can imagine that a "law of large numbers" and "central limit theorem" could apply to evolution -- if we consider some numerical measure of some evolutionary trait, then if we average that numerical measure over many independent "runs" of evolution we should approach some limit, and the deviations from that average might even be spread out according to a normal distribution.

Of course, this isn't something that has to be true -- the many events that make up a single evolutionary process aren't exactly independent, some of them can only happen if others happen, and so on. And whatever numerical measure I was talking about in the previous paragraph might only exist in some runs and not in others. That would seem to argue against my hypothesis. But on the other hand, evolution isn't just a random walk. There are selection pressures which are the more standard explanation for what's known as "convergent evolution", which is the independent evolution of similar traits in evolutionarily distinct populations.

By the way, on the topic of convergent evolution: the eye has evolved something like forty times. This suggests that eyes are very likely to arise via the evolutionary process; things that have only evolved once among all life, like language (although that's open for debate), are, given the state of our current knowledge, less likely to arise. One might be able to compute something like the "probability" that eyes, language, or some other complex trait evolves by looking at how many times it has arisen independently. But this is the sort of probability that is very hard to interpret -- what would it mean to let evolution happen more than once?

In a comment to that post, John Armstrong has informed me of this post from the Volokh Conspiracy that shows that yes, they are. At least in the case of certain prediction markets dealing with Major League Baseball, that is. The standard contract here pays $10 if a team wins a particular game. A plot is provided which aggregates all the trades for contracts on MLB games in each ten-cent interval.

Now, let's say the Phillies and the Qankees are playing each other. (Longtime readers may recall that the reason for the Qankees' existence is because their name starts with Q, which is the letter after P. I haven't talked about them for a while, because it's not baseball season.) And let's say that I think the Phillies have a probability of 62.5% of beating the Qankees. Then I will be willing to pay up to $6.25 for the aforementioned contract, since that's the expected payout.

Now, what does it mean that this probability is 62.5%? Well, it means that if lots and lots of games like that one were played, then the Phillies would win about 62.5% of them. (The meaning of "lots and lots" and "about" can be made precise via the law of large numbers.) But that particular game will never be played again, so we can't check if my intuition is right. But we can do the next best thing -- look at all the games where people paid $6.25 for a contract for some team to win $10, and ask if that team won 62.5% of the time.

It turns out that, basically, they do. That's the point of the chart over at Volokh, which is due to Michael Abramowicz, author of the book Predictocracy: Market Mechanisms for Public and Private Decision Making. It's nice to see some evidence that at least in the world of sports -- which seems to be a good test bed for a lot of statistical and economic work, because it's possible to collect basically all the relevant data -- these things appear to actually be measuring probabilities.

About once a week, I buy a hoagie from Wawa for lunch. (Apparently people not from the Philadelphia area think "Wawa" is a very funny word. Here's an explanation; basically Wawa the store is named after Wawa the town, which in turn is the name for the Canada Goose in the language of the people who lived there before Europeans did.) Today was such a day.

Now, when you go in and buy a hoagie, you order on a touch screen, and the touch screen prints out a receipt with a number on it. These numbers are assigned sequentially -- but for various reasons, people's orders don't get filled in the same order that people put in their orders. This is immensely frustrating, and gives one a visceral sense of why permutations with a lot of inversions are kind of annoying. (A permutation is just a reordering of some totally ordered set, say {1, 2, ..., n}; an inversion, informally, is what we have when a larger number comes before a smaller number. For example, 1 5 3 2 4 is a permutation of {1, 2, 3, 4, 5}; it has four inversions, namely the pairs (5, 3), (5, 2), (5, 4), and (3, 2). I know that some people define inversions a bit differently, looking at the positions in which the offending numbers appear instead of the offending numbers themselves.) The reasons why orders don't come out in the same order they come in have to do with the fact that there are multiple people preparing orders in parallel (at least during the lunch rush; the dinner rush is much less intense, both because there are fewer people buying dinner than lunch on a university campus and because the times at which people eat dinner are more spread out than the times at which they eat lunch) and that some orders take longer to prepare than others. To a first approximation, there seem to be three classes of order: food which is already prepared, cold made-to-order things, and hot made-to-order things.
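For anyone who wants to check the example, here is a minimal sketch in Python that lists inversions as (larger, smaller) pairs of values, which is the convention used above. The function name is my own, and the quadratic-time double loop is chosen for clarity rather than speed.

```python
def inversions(perm):
    """Return the inversions of perm as (larger, smaller) value pairs.

    A pair is an inversion when the larger value appears earlier
    in the sequence than the smaller one.
    """
    return [
        (perm[i], perm[j])
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    ]


example = [1, 5, 3, 2, 4]
print(inversions(example))       # [(5, 3), (5, 2), (5, 4), (3, 2)]
print(len(inversions(example)))  # 4
```

Feeding in the order in which receipt numbers are actually called out would give the inversion count for a lunch rush.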

The wait was long today, so I got to thinking -- basically this procedure is generating a permutation of the integers. But it's a permutation of the integers with only Θ(1) inversions per element; that is, for any given customer, the number of people who came in after them and yet get served before them is on average some constant number. (I have no idea what this number is.) If we consider, say, the permutation that's generated in this way over the course of an hour during the lunch rush, it might be a permutation of [100] with a few hundred inversions. A "typical" permutation of [100] has about 2,500 inversions -- as n gets large, the number of inversions of a permutation of [n] selected uniformly at random is approximately normally distributed with mean n(n-1)/4 and standard deviation on the order of n^{3/2}/6. Efficiently generating a random permutation with, say, fewer than 10n inversions (for large n) is not just a matter of generating random permutations (which is easy) and throwing out the ones which you don't like; there are sampling algorithms which work this way, to sample uniformly at random from some subset of a set which it's easy to sample u. a. r. from -- but as far as I know they don't throw away all but a vanishingly small proportion of all the elements in the large set! I don't know if anybody's thought about this problem.
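The numbers for n = 100 are easy to check. The sketch below uses the mean n(n-1)/4 from above together with the standard exact variance n(n-1)(2n+5)/72 for the number of inversions of a uniform random permutation; that variance formula is not stated in the post, and the function names are mine.

```python
import math


def inversion_mean(n):
    """Mean number of inversions of a uniform random permutation of [n]."""
    return n * (n - 1) / 4


def inversion_variance(n):
    """Exact variance of the inversion count: n(n-1)(2n+5)/72."""
    return n * (n - 1) * (2 * n + 5) / 72


n = 100
print(inversion_mean(n))                 # 2475.0
print(math.sqrt(inversion_variance(n)))  # exact standard deviation
print(n ** 1.5 / 6)                      # the n^{3/2}/6 approximation
```

A lunch-rush permutation with a few hundred inversions therefore sits more than ten standard deviations below the mean, which is why rejection sampling is hopeless here.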

28 January 2008

Somebody seems to have mistaken Dummit and Foote's Abstract Algebra -- a standard undergraduate abstract algebra text -- for a middle school algebra text, and written a bad review of it, saying things like:

As we see from these excerpts from the text, Dummit and Foote are disciples of "new math," a doctrine discredited in the 70's. Too often, strange symbols and jargon take the place of clear English prose. Extraneous concepts like "sets"--much less "finite nilpotent groups" or "invariant factor decompositions" or "symmetric multilinear maps"--are merely obstacles to a student's understanding of algebra. Sadly, the authors, holed up in their ivory towers, have not yet learned these vital educational lessons.

Well, of course! It's not good as a text for middle schoolers, because it was never supposed to be! (And it probably comes with a preface saying "this is a text for juniors and seniors in college majoring in math", or something like that; can anybody who has a copy of the book confirm this?)

I think it might be a parody. I hope it's a parody, of what someone who expected a middle school algebra text and got an abstract algebra text would say. And I think every mathematician has had that moment where they told someone they're taking "algebra" and people say "but didn't you learn that years ago?"

And the author makes a point, perhaps inadvertently -- there is a time and a place for the precise language of higher-level mathematics, and middle school isn't it.

It appears to not be widely known, at least among some people who would like to know it, that primality testing can be done in polynomial time. The "naive" methods for determining whether a number n is prime -- trial division and the like -- take time which is polynomial in the input n itself (the polynomial may be n^{1/2}, but it's still a polynomial), which is considered "large" because it's exponential in the number of digits or bits in the input.
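To make the "naive" method concrete, here is a minimal sketch of trial division in Python (the function name is mine). It tries every candidate divisor up to the square root of n, so a k-bit input can cost on the order of 2^{k/2} divisions.

```python
import math


def is_prime_trial_division(n):
    """Return True if n is prime, testing divisors 2..floor(sqrt(n))."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False  # found a nontrivial divisor
    return True


print([p for p in range(30) if is_prime_trial_division(p)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

This is perfectly fine for small numbers; the trouble only starts when n has hundreds of digits.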

The fact that primality testing can be done in polynomial time -- and without recourse to a randomized algorithm -- I've heard about here and there; not surprisingly, the exponent is large. Namely, to test n for primality takes time O~(log^6 n), where O~(t(n)) = O(t(n) * P(log t(n))) and P is a polynomial. (If you don't like the O~ notation, you can say O(log^{6+ε} n) for any positive real ε.) It's conjectured that "6" can be replaced by "3". The proof is based on a generalization of Fermat's Little Theorem; of course it's nothing like the "naive" method of trial division, but the paper is surprisingly accessible given the strength of the result.

Edited, 7:21 pm: It's been pointed out to me that this result is widely known among people who work in the appropriate fields, and that's true. But a lot of people who are just looking for some sort of color when they're giving an example of a hard algorithmic problem, and whose specialty is not in that area, either don't know this or don't remember it. I'm claiming that the result deserves to be more well-known among mathematicians in general.

27 January 2008

18.098/6.099. Street Fighting Mathematics, a course currently being offered during MIT's Independent Activities Period, by Sanjoy Mahajan. (I got a call from MIT asking me for money tonight; I donated some; that got me thinking about the Institute so I poked around the web site a bit.)

Personally, I always thought the epsilons and deltas were harming me. The text (a draft version of which can be found on the course web page) stresses the idea that approximate answers, heuristics, etc. are more valuable than they are often claimed to be. Mahajan also took on this question in his PhD thesis, which combines a version of such a textbook with some extended examples on what one might call "research-level" problems; one of these is a probabilistic model of the primes, which it is too late at night for me to read seriously.

From a quick poke around the web page, it looks like Mahajan also offered a similar, but more physics-oriented course in IAP 2006, as well as TAing a couple more substantial courses in the same vein at Caltech called "Order-of-Magnitude Physics". (The MIT IAP course meets three hours a week for four weeks and carries one-quarter the credit that a "normal" course at MIT would carry; the Caltech courses appear to have met three hours a week for ten weeks. As such, they have more problem sets. But they're also more physics-y, which may be good or bad depending on how you feel about physics.)

Up until the late 1990s, it was commonly thought that 3rd Dimensia was only a disorder for patients dealing with 2-to-3-dimensional crossover. But today, scientists and doctors know better. Be warned: 3rd Dimensia does not discriminate. It can strike anyone at anytime.

One of the diagnostic questions is "Do you settle for just jumping over objects and projectiles?" -- I suspect that part of the reason video game characters can jump so high is because in two dimensions there is only one horizontal dimension, so the more realistic option of swerving around an oncoming enemy simply isn't available to them. If this is so, then I'd suspect characters in 3-D games can't jump as high (relative to their body height) as those in 2-D games; at some point the designers should have realized that real people don't jump that high, and designers probably feel more constrained by real physics in 3-D games than in 2-D games.

Try as I might, though, I couldn't find a reference on that site to my favorite fact about two-dimensional life -- namely that their digestion must work differently from ours, because topologically they are unable to have a digestive tract. Presumably they'd absorb nutrients through the skin, like unicellular life forms in our world do. I suspect larger 2-D life forms would have fractal surfaces, to get a large surface-area-to-volume ratio, similarly to how we have branching networks of blood vessels so that oxygen can get to all our tissues. This is one thing that Edwin Abbott Abbott got wrong.

25 January 2008

The first problem considered in Algorithm Design, by Jon Kleinberg and Éva Tardos, is the stable marriage problem. (I think this is a fairly standard first problem in an algorithms text, but I may be wrong.) The problem is as follows: let there be n men and n women. Let us say that each of the men can rank the women in order of how much he would like to be married to them, and the women have a similar ordering for the men. We would like to pair these people off in marriage in such a way that we can't find a man and a woman who both prefer each other to the person that they're currently married to. (It's a bit unclear who "we" are, at least in this version of the problem; most of the real-world examples I can think of where everybody gets matched at the same time are more like what the Wikipedia article calls the "college admissions" or "hospitals/residents" problem.) Anyway, the procedure works as follows:

All individuals start out in the state "free". The state "engaged" is also possible.

Pick a free man arbitrarily. He proposes to the woman who is highest on his preference list that he has not already proposed to. If she is free, or if she prefers the man who just proposed to the man she's currently with, she accepts the proposal and they become "engaged". (If she just broke off an engagement, then the man in question becomes "free" again.) Repeat until everybody's married.

Now, it turns out that which men we choose arbitrarily at each juncture when we're allowed to do so doesn't matter. Regardless of the choices, each man ends up with the best woman that he possibly could among all stable matchings; each woman ends up with the worst.
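The procedure above can be sketched in a few lines of Python. This is my own minimal version, not the book's pseudocode: the names are hypothetical, preference lists are assumed complete, and the "arbitrary" choice of the next man is made with a simple queue.

```python
from collections import deque


def stable_marriage(men_prefs, women_prefs):
    """Men-proposing procedure; returns a dict mapping each man to his wife.

    men_prefs[m] and women_prefs[w] are complete lists, best first.
    """
    # rank[w][m] = position of m on w's list (smaller is better)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # next woman each man will ask
    fiance = {}                              # woman -> man she is engaged to
    free_men = deque(men_prefs)              # one arbitrary choice rule: a queue

    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in fiance:
            fiance[w] = m                    # she was free: accept
        elif rank[w][m] < rank[w][fiance[w]]:
            free_men.append(fiance[w])       # she trades up; old fiance is free
            fiance[w] = m
        else:
            free_men.append(m)               # rejected: he stays free

    return {m: w for w, m in fiance.items()}


# Hypothetical three-couple example
men_prefs = {"A": ["X", "Y", "Z"], "B": ["Y", "X", "Z"], "C": ["X", "Y", "Z"]}
women_prefs = {"X": ["B", "A", "C"], "Y": ["A", "B", "C"], "Z": ["A", "B", "C"]}
print(stable_marriage(men_prefs, women_prefs))
# {'A': 'X', 'B': 'Y', 'C': 'Z'}
```

Swapping the queue for a stack, or for a random choice among the free men, should give the same matching; that is the order-independence claim above.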

Although this is clearly a toy version of the real-life marriage problem, this has an interesting implication; historically, men did the proposing. Now, women can propose as well (although anecdotal evidence suggests men do most of the proposing). This is probably good for women, bad for men.

There's also an interesting combinatorial question hidden here, which I haven't thought about -- there are (n!)^{2n} possible sets of preference lists when we have n men and n women. Some of these sets of preferences will have more stable matchings than others; what is the largest possible number of stable matchings? How is it distributed? Is the number of stable matchings corresponding to some set of preference lists even something one could easily count?
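For very small n one can at least count by brute force. The sketch below is only an illustration (the names are mine, and it runs in time proportional to n! per set of lists, so it says nothing about whether the count is easy in general): it tries every perfect matching and keeps the ones with no blocking pair.

```python
from itertools import permutations
from math import factorial


def is_stable(matching, men_prefs, women_prefs):
    """matching[m] = w, with men and women numbered 0..n-1.

    Preference lists are lists of indices, best first.
    """
    husband = {w: m for m, w in enumerate(matching)}
    for m in range(len(matching)):
        for w in men_prefs[m]:
            if w == matching[m]:
                break  # every later woman is one he likes less than his wife
            # m prefers w to his wife; does w prefer m to her husband?
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False  # (m, w) is a blocking pair
    return True


def count_stable_matchings(men_prefs, women_prefs):
    n = len(men_prefs)
    return sum(is_stable(p, men_prefs, women_prefs)
               for p in permutations(range(n)))


def num_preference_profiles(n):
    """Each of the 2n people picks one of n! orderings: (n!)^(2n)."""
    return factorial(n) ** (2 * n)


# Two couples whose tastes are exactly opposed: both matchings are stable.
print(count_stable_matchings([[0, 1], [1, 0]], [[1, 0], [0, 1]]))  # 2
# Everybody agrees on the ranking: only one stable matching.
print(count_stable_matchings([[0, 1], [0, 1]], [[0, 1], [0, 1]]))  # 1
print(num_preference_profiles(2))                                  # 16
```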

One final comment: I've seen the stable marriage problem a few times before, and usually a mention is worked in of something called the "same-sex stable marriage problem" -- often called the "stable roommates problem" -- in which everybody has a preference list for everybody else. This could just as well be called the bisexual stable marriage problem, but this usage doesn't seem to be as common.

Edit (Saturday, 12:15 pm): stable matchings aren't unique. But as I recently learned from Alexander Holroyd, Robin Pemantle, Yuval Peres, Oded Schramm. "Poisson Matching" (arXiv: 0712.1867), they are unique when the preferences are symmetric in a certain sense. One example of symmetric preferences -- the one considered in that paper -- is the case when the men and women are associated with points in Euclidean space, and each prefers to be matched with the people of the opposite gender who are closest in distance. (When they first claimed that stable matchings were unique, I was tempted to just stop reading -- but I didn't, and I'm glad.)

24 January 2008

Take the lazy way (from Agence France-Presse, via the (Toronto) Globe and Mail): it is better to wait for the bus than to walk along the route of the bus and get on the bus when it catches up to you. This is intuitively obvious: whether you start walking or not, you end up arriving at your destination on the same bus (and therefore at the same time), and if you walk you risk that the bus will pass you between stops. The result is due to Justin Chen, Scott Kominers, and Robert Sinnott.

This was actually in the arXiv a couple weeks ago (0801.02979v2), but I didn't see it. Surprisingly enough, a newspaper article about mathematics in the mainstream news actually linked to the original research it was referring to! (This is surprisingly rare, and not just for articles about mathematics, but for just about anything.)

On a personal note, I walk to school and back each day even though there is public transit paralleling my usual route. This is contrary to the solution explained in the paper, but there are things that the paper doesn't include. First, by walking (a mile and a half each way) I get exercise, so I don't have to go to a gym. Second, I get frustrated if I wait for the trolley and it doesn't come. Third, I get a lot of my best thinking done while walking; the trolley is noisy enough and bumpy enough that I can't think well on it. Fourth, I'm cheap. Fifth, bus schedules around here are works of fiction anyway, and solving this problem correctly would require a more sophisticated probabilistic model, which I'm not going to go to the trouble of doing.

Most of these are extra-mathematical fallacies. For example, in mathematics we don't often see a relation "X causes Y", although it is quite common in ordinary discourse. Even in probability and statistics, we don't see causation nearly as often as correlation. However, they may be familiar to mathematicians as "false methods of proof". For example, what this list calls appeal to force is the well-known "proof by intimidation" -- the Important Person at the front of the room says it's true, so it must be!

A lot of these fallacies are essentially statistical in nature, as any reasoning about the real world must be. In mathematics we either know things or we do not; we don't attach probabilities to our knowledge. (However, we can attach probabilities to other people's knowledge -- or to our own extra-mathematical knowledge -- and then reason about those probabilities. This is the basis of Bayesian reasoning.) Many others are fallacies that exploit the ambiguity of natural language. Perhaps the power of mathematics is that it allows us to know things surely, which can never happen in the Real World. But on the flip side, mathematicians know fewer things than other people, because we insist on such certainty in our knowledge.

23 January 2008

I don't know, but it's certainly an interesting question. From what I've heard, the question of tactical voting in general elections is well-studied. But in primary elections, where one often has the choice of voting in either of two elections, there's the added complexity that you don't necessarily know which election to vote in. Some people claim this happened in the recent New Hampshire primary: independents tended to prefer either Obama (in the Democratic primary) or McCain (in the Republican primary). Polls shortly before the primary showed both of those candidates leading, with Obama's lead comfortable and McCain's less so. Thus a lot of people who preferred McCain over any other Republican candidate and Obama over any other Democratic candidate chose to vote in the Republican primary, for McCain... and so Obama lost. (There are of course other interpretations of the New Hampshire primary results; the two main ones are that Clinton won because she cried, which is silly, and that the polls were wrong because statistics work that way.)

And that doesn't even touch on the issue of "electability" -- the person you would most like to be President isn't necessarily the right one to vote for, because a candidate that in your mind is inferior may be better able to defeat the candidate of the other party. It wouldn't surprise me to learn that in certain pathological cases, someone who would always want a Democratic president should vote in the Republican primary, and vote for the "least electable" Republican candidate.

22 January 2008

J. Michael Steele, a professor of statistics here at Penn, has a page of humorous quotes, some of which refer to probability and statistics. He also has some "rants", which include the non-ranty Advice for Graduate Students in Statistics. I can't vouch for how good this advice is -- it's hard for me to judge anything that purports to advise someone going through the process that I am currently a part of -- but it at least sounds good, and I've found myself thinking of little pieces of this page from time to time over the last few months. (I took a class from Steele last semester, and probably discovered this back in September; looking for information that advises me on what to do as a graduate student is one of the less damaging ways in which I procrastinate. It helps that these pages rarely say "don't procrastinate!", although they do say things like "work consistently", which is similar. I characterize this as "less damaging" because I suspect that I may have internalized some of the advice I've seen, and will remember it when I am seriously immersing myself in research. That date is not all that far in the future.)

This is something of an embarrassment of riches; I want one for my class tomorrow, because we're talking about regular polyhedra, and to talk about that without mentioning Euler's formula would be remiss.

I'll probably give "Proof #8: Sum of angles" from the page, which sums up the various angles involved in a drawing of the graph corresponding to a polyhedron, or the proof that Wikipedia attributes to Cauchy, which triangulates that graph (not changing V-E+F) and then removes edges one or two at a time (still not changing V-E+F). These seem the most accessible. (They're also the ones I managed to struggle in the general direction of while walking to the coffee house a couple hours ago.)

If I wanted to scare the hell out of my students (this class has no prerequisites), I'd give the "binary homology" proof.

20 January 2008

The story goes that Hamilton figured out the definition of quaternions while walking across Broom Bridge in Dublin.

What I didn't know is that there's a plaque there commemorating this. The text of the plaque says: "Here as he walked by on the 16th of October 1843, Sir William Rowan Hamilton in a flash of genius discovered the fundamental formula for quaternion multiplication i^2 = j^2 = k^2 = ijk = -1 carved (?) on a stone of this bridge."

There's also a sign commemorating ENIAC, the "first computer", across the street from my office. I didn't know it was there until about a year after I came to Penn, because it was obscured by construction. It says "ENIAC, the Electronic Numerical Integrator and Computer, was invented by J. Presper Eckert and John Mauchly. It was built here at the University of Pennsylvania in 1946. The invention of this first all-purpose digital computer signaled the birth of the Information Age." Mark Jason Dominus pointed me to a picture in the Wikipedia article.

What other signs do you know of that say, roughly, "math happened here"?

19 January 2008

And this doesn't even mention integrals like that of exp(-x^2) over the whole real line, for which there's a "trick", or real integrals that are best done by integrating over a contour in the complex plane -- the focus here is solely on integrals where there is a definite integral but something weird happens, the sort of thing where you think you know what you're doing but you really don't. (This is the sort of thing that teachers who have a reputation for being a bit sadistic pepper their tests with.)

(Feynman supposedly got a reputation for being really good at integration because he knew some contour integration tricks that a lot of other people didn't. He didn't know a lot of the tricks that they knew, but they only came to him after they had already banged their head against it. The moral of this story: people think you're smart if you know things that they don't. Edited: Efrique points out in a comment that I have this backwards -- but the idea still stands.)

18 January 2008

Nassim Taleb (author of The Black Swan and Fooled By Randomness, both of which I enjoyed very much) takes on Edge's question for 2008: "What have you changed your mind about?"

His short answer is "The irrelevance of 'probability'". Naturally, I took offense.

It turns out that Taleb is not saying that the theory of probabilities is worthless for real-world situations -- although this would be the sort of thing he would say, and as he has large piles of money which have been made in the markets, perhaps he knows something I don't. It could be the case that probability theory, like any other axiomatic theory, just doesn't reflect what happens in real life. Compare Newtonian mechanics or Euclidean geometry; who will be probability's Einstein or Riemann?

Rather, Taleb is saying that we need to take into account expected value -- the product of probability and payoff. And as I understand it, the funds he manages lose a small amount of money most days -- but they make very large amounts of money on the days they make money, so they come out in the black. (Psychologically, this is hard for human beings to take, because basically we feel sad if we lose money and happy if we make money, and we're not that sensitive to the amounts lost or gained; thus on most days someone with this sort of scheme feels sad. The trick, apparently, is to not look at your portfolio that often; most days you lose, but most months (say) you win.)
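A toy calculation shows how "most days you lose, but most months you win" can happen. The numbers below are entirely made up (they are not Taleb's): each day the fund loses 1 unit with probability 95% and gains 30 units with probability 5%, days are assumed independent, and a month is taken to be 21 trading days.

```python
from fractions import Fraction

# Hypothetical daily payoff distribution (illustrative numbers only)
p_win = Fraction(5, 100)   # probability of a winning day
gain = 30                  # payoff on a winning day
loss = -1                  # payoff on a losing day

# Expected value = sum over outcomes of probability * payoff
expected_daily = p_win * gain + (1 - p_win) * loss
print(expected_daily)      # 11/20, positive even though 95% of days lose

# With these numbers a single winning day (+30) outweighs the other
# 20 losing days (-20), so a month is a winner exactly when it
# contains at least one winning day.
days = 21
p_winning_month = 1 - float(1 - p_win) ** days
print(round(p_winning_month, 3))
```

So someone who checks this portfolio daily is sad nineteen days out of twenty, while someone who checks it monthly is happy roughly two months out of three.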

More generally, depending on the problem, knowledge of the whole distribution of some random variable can be useful. Of course we can't cut a whole distribution down to a single number. Or even two numbers. Not every distribution is normal.

(Read the rest of the responses to this question, too. A hundred smart people have things to say.)

17 January 2008

What can be said about the average number of iterations needed to run the Euclidean algorithm on (x, y), where x and y are, say, selected u. a. r.¹ from {1, 2, ..., N}? It clearly can't grow any faster than logarithmically, since the Euclidean algorithm run on (x, y) takes at most log_φ max(x,y) + O(1) steps (the worst case being when its inputs are consecutive Fibonacci numbers). It turns out that the mean number of iterations required on inputs which are at most N is in fact C log N for some constant C; this is apparently a classical (1969) result, and a special case of a more general result of Viviane Baladi and Brigitte Vallee in their paper Euclidean algorithms are Gaussian (arXiv:0307062v4). So the Fibonacci numbers, which are the worst possible input in this algorithm, are only a constant factor slower than the average. Baladi and Vallee also show that the number of iterations is in fact asymptotically Gaussian with variance also of order log N.

1 "u. a. r." stands for "uniformly at random". This seems to be a semi-standard abbreviation.
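It's easy to experiment with this. The following minimal sketch (function names mine) counts division steps, shows the Fibonacci worst case, and averages the step count over all pairs up to N by brute force, which takes about N^2 runs and so is only sensible for small N.

```python
def euclid_steps(x, y):
    """Number of division steps the Euclidean algorithm takes on (x, y)."""
    steps = 0
    while y:
        x, y = y, x % y
        steps += 1
    return steps


def mean_steps(N):
    """Average step count over all pairs (x, y) with 1 <= x, y <= N."""
    total = sum(euclid_steps(x, y)
                for x in range(1, N + 1)
                for y in range(1, N + 1))
    return total / N ** 2


# Consecutive Fibonacci numbers: every quotient is 1, so progress is slowest.
print(euclid_steps(13, 8))   # 5
print(euclid_steps(21, 13))  # 6

# The average grows slowly with N, roughly like a constant times log N.
for N in (10, 100, 1000):
    print(N, mean_steps(N))
```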

Studies have shown that students leave introductory physics courses almost universally less excited about the topic than when they came in. This article details an experiment to address this problem: a course weblog or "blog" which discusses real-world applications of physics and engages students in discussion and thinking outside of class. Students who read, commented on, and were involved with the blog maintained their initially positive attitudes towards physics in contrast to the typical deterioration in attitude seen in students who did not participate in the blog study.

However, there's one huge flaw in this research -- reading and commenting to the course blog (which was written by the instructors) was optional for the students; those who did so received extra credit. At least in my experience, usually the people doing things for extra credit are the ones who don't need the extra credit anyway; they do well in the class because they actually like the material. So the students who pay attention to the blog are probably the ones who weren't going to get disillusioned about physics no matter what.

This blog isn't a "course blog", and I don't know of any in mathematics at the introductory college level. (Or really at any other level -- Terence Tao has been posting lecture notes for his graduate course, but that's not the same thing at all.) It would be interesting to see if something like that could work, especially in a class where there are real-world applications of the material. (The "Ideas in Mathematics" course I mentioned that I'm TAing is not such a course, because of the particular preferences of the instructor; other courses offered under the same title could be.) But I suspect that one really has to make the blog an integral part of the course in order for it to have the desired effect, and even then it's not something students are used to so there's the distinct possibility that they'll dismiss it as something that was "added on" to the course.

16 January 2008

I'll refrain from commenting -- obviously I have thoughts, but I have things I really should be doing instead of trying to articulate them (like preparing for the class I'm teaching tomorrow morning!) -- and just let Hardy's work, one of the classic answers to questions such as "What is mathematics?" and "Why do mathematics?", stand on its own.

David Speyer asks: what is total variation distance? Total variation distance is defined as follows: given two measures f and g on a space X, of total mass 1, the total variation distance is the maximum (strictly speaking, the supremum) of f(A) - g(A) over all subsets A of X. But this is only really enlightening if you're one of those people who buys the whole "probability theory is the study of measures of total mass 1" (I first heard this from Robin Pemantle but I don't know if it's original to him), which makes it sound like probability is strictly a subset of measure theory. It's not, because when you restrict to measures of total mass 1 a whole host of probabilistic intuition is valuable. Speyer gives an interpretation of the total variation distance in terms of gambling.
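On a finite space the definition can be checked directly. Here is a minimal sketch (the function names and the loaded-die example are my own invention, not Speyer's): it takes the maximum of f(A) - g(A) over every subset A by brute force, and compares that with the usual shortcut of half the sum of |f(x) - g(x)|.

```python
from fractions import Fraction
from itertools import combinations

def tv_brute_force(f, g):
    """Maximum of f(A) - g(A) over every subset A of a finite space."""
    points = list(f)
    best = Fraction(0)
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            best = max(best, sum(f[x] - g[x] for x in subset))
    return best

def tv_half_l1(f, g):
    """Shortcut: half the sum of the absolute differences."""
    return sum(abs(f[x] - g[x]) for x in f) / 2

# Hypothetical example: a fair die against a die loaded towards 5 and 6.
fair = {face: Fraction(1, 6) for face in range(1, 7)}
loaded = {1: Fraction(1, 12), 2: Fraction(1, 12), 3: Fraction(1, 6),
          4: Fraction(1, 6), 5: Fraction(1, 4), 6: Fraction(1, 4)}

print(tv_brute_force(loaded, fair), tv_half_l1(loaded, fair))  # 1/6 1/6
```

The maximizing set A is just the set of points where f exceeds g (here, the faces 5 and 6), which is why the shortcut works.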

Speaking of gambling, you should shuffle a deck of cards seven times before using it; this comes from the famous paper of Bayer and Diaconis (Dave Bayer and Persi Diaconis, "Trailing the Dovetail Shuffle to its Lair". Ann. Appl. Probab. Volume 2, Number 2 (1992), 294-313), which Speyer was reading. (This was in the January 9, 1990 New York Times. Had this blog existed eighteen years ago, I surely would have mentioned this article. But there were no blogs back then, and I was not old enough to be reading mathematics journals.) I had incorrectly believed that this means that seven shuffles gave a uniform distribution over all possible arrangements of cards, but in fact the distribution isn't uniform; it's just "close enough", in the sense that the total variation distance is small.

I suspect I confused this with the result that "if a deck is perfectly shuffled eight times, the cards will be in the same order as they were before the shuffling"; thus you should stop at seven shuffles because on the eighth shuffle all your work will be for nought! But nobody shuffles perfectly.
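The eight-shuffle claim is easy to check by machine. A small sketch, assuming "perfect shuffle" means cutting the deck exactly in half and interleaving with the top card staying on top (an "out-shuffle"); the function names are mine:

```python
def out_shuffle(deck):
    """Perfect riffle: cut exactly in half and interleave, top card stays on top."""
    half = len(deck) // 2
    shuffled = []
    for a, b in zip(deck[:half], deck[half:]):
        shuffled.extend([a, b])
    return shuffled

def in_shuffle(deck):
    """Same, but the top card of the bottom half ends up on top."""
    half = len(deck) // 2
    shuffled = []
    for a, b in zip(deck[:half], deck[half:]):
        shuffled.extend([b, a])
    return shuffled

def shuffles_to_restore(n_cards, shuffle):
    """Count how many repetitions of `shuffle` bring the deck back to its start."""
    original = list(range(n_cards))
    deck = shuffle(original)
    count = 1
    while deck != original:
        deck = shuffle(deck)
        count += 1
    return count

print(shuffles_to_restore(52, out_shuffle))  # 8
print(shuffles_to_restore(52, in_shuffle))   # 52
```

So the "eight" depends on which perfect shuffle you mean; with the other interleaving it takes 52 shuffles to get back where you started.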

15 January 2008

I'm reading Random sampling of plane partitions, by Olivier Bodini, Eric Fusy, and Carine Pivoteau. (arXiv:0712.0111v1). (Note: if you follow posts where I mention new things I've come across in the arXiv, you'll find that I'm currently reading papers I come across there at about a six-week lag.)

The authors give a way to sample from plane partitions of size n uniformly at random. The method is not as efficient as one might like -- it begins by generating a plane partition of size approximately n, where "approximately" means something like "within o(n) of", from a distribution which is uniform when restricted to partitions of exactly n, by a method known as "Boltzmann sampling", and then throws out those partitions which are not of size n. Still, a plane partition of size n can be chosen uniformly at random in time O(n^(4/3)). (Note that uniformity is important here; if we didn't care about the distribution, we could just write down a bunch of numbers that sum up to n and be done! More seriously, uniform sampling of this sort of combinatorial object with a bunch of highly interdependent parts tends to be tricky.)

But the idea I really like here is that of Boltzmann sampling, the inspiration for which I assume comes from physics. Namely, given a combinatorial class, we give each object a weight x^n, where n is its size and x is a parameter between 0 and 1; then we pick objects with probabilities proportional to their weights. It turns out to be routine to give a Boltzmann sampler -- that is, an algorithm which picks a member of the class according to this distribution -- for any combinatorial class we can specify. (This is according to the papers of Duchon et al. and Flajolet et al. I've listed below, which I haven't actually read yet.) It reminds me of the partition function of statistical mechanics (the fact that this has "partition" in the name is a coincidence, as one could do this for any combinatorial class). Say a certain sort of physical system can occupy states labeled 1, 2, ..., N. Let E_j be the energy of state j. Then give each state the weight exp(-β E_j), where β = 1/(kT) is the "inverse temperature"; the probabilities that the system occupies various states are proportional to these weights. Replacing energy by size and exp(-β) by x gives the combinatorial Boltzmann sampling. So the parameter x is some sort of temperature.
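To make the idea concrete, here is a toy sketch for a much simpler class than plane partitions: binary words, where the size of a word is its length. This is my own illustration, not the sampler from any of the papers; the function names and the choice of x are assumptions. There are 2^n words of size n, so the total weight is 1/(1 - 2x), which forces x below 1/2 for this class. The second function is the "throw out everything not of size n" step.

```python
import random

def boltzmann_binary_word(x, rng):
    """Draw a binary word with probability proportional to x**len(word).

    The 2**n words of size n have total weight (2*x)**n, so the size is
    geometric: keep going with probability 2*x, stop otherwise.
    """
    if not 0 < x < 0.5:
        raise ValueError("x must lie strictly between 0 and 1/2 for this class")
    word = []
    while rng.random() < 2 * x:
        word.append(rng.choice("01"))
    return "".join(word)

def sample_exact_size(n, x, rng):
    """Rejection step: redraw until the word has size exactly n."""
    while True:
        word = boltzmann_binary_word(x, rng)
        if len(word) == n:
            return word

n = 5
x = n / (2 * (n + 1))      # tuned so that the expected size is n
rng = random.Random(0)     # fixed seed, only so that runs are repeatable
print(sample_exact_size(n, x, rng))
```

Conditioned on the size being n, every word of size n has the same weight x^n, so the accepted word is uniform among words of that size. Tuning x moves the typical size around, which is the sense in which x plays the role of a temperature.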

14 January 2008

Basically, it appears to be more likely that we are some sort of naked brain living in an illusion of a world than that we live in the actual world we perceive. Roughly speaking, this occurs if we assume that the universe is infinite -- and thus everything that can occur does occur -- because a naked brain is supposedly much more likely to form by chance than the reality we think surrounds us does.

The obvious rebuttal, if one is wedded to this particular model of cosmology, is an evolutionary one -- maybe naked brains aren't so likely after all, because brains are produced (or so we think) by evolutionary processes, so is one really so likely to find a brain just sitting there without the biology in which it evolved? Overbye's article only mentions physicists; I wonder what (if anything) the biologists have to say. And I don't think our probabilistic understanding of evolution is quite to the point where the first sentence of this paragraph can be made rigorous. (On this point, I'd love to be told I'm wrong!)

edit: Sean at Cosmic Variance has written about this much more insightfully than I, and with links to a lot of the relevant research.

Are Jews Smarter? (Jennifer Senior, New York magazine, October 16, 2005 -- but somehow I just came across it.) You may be familiar with the idea that Jews are smarter than the general population because they have been subject to different selection pressures; namely, because they have historically been forced out of the places where they were living, the ones smart enough to perform some intellectual work -- which is portable -- were the ones who reproduced more. I'm a bit suspicious of this argument, but it's food for thought. Also, apparently there's an interesting analogy: certain diseases that Ashkenazi Jews are prone to, for example Tay-Sachs disease, perhaps bear the same relationship to intelligence as sickle-cell anemia (which people of African descent are prone to) bears to resistance to malaria. That is, people who carry one copy of a certain allele are likely to be smarter, or to be resistant to malaria; people who carry two copies have Tay-Sachs or a related disease, or are anemic.

It's basically impossible to deny that Jewish people are more common in certain intellectual fields than in the population as a whole. Mathematics is one of those fields. But is this due to genetics, or to environment? The other common explanation for the prevalence of Jewish people in academia is that the Jewish culture has historically valued the study of the Torah and this has carried over to secular scholarship. (See, for example, some of the comments to Stanley Fish's The Uses of the Humanities, Part Two, from Stanley Fish's blog at the New York Times.) It wouldn't surprise me to learn that both of these effects play a significant role.

Fish is talking about what the humanities are good for, but it's never really clear what the humanities are in opposition to. (I think things like engineering or business, where the things college students learn are very clearly linked to the jobs they expect to have after college.) But which side of that line is mathematics on? Sure, mathematical ability is useful for its own sake -- but one often hears that employers want to hire people with mathematical training not to do mathematics, but because they know how to think rigorously and abstractly. And I can hear echoes of this in Fish's claims about what the humanities are good for.

Not so. Intrade users were willing to pay $3.90 for a contract that will pay them $10 if Obama wins the Democratic nomination, and $5.70 for the same contract with Clinton. Most (more than 57 percent) of these individuals probably feel that Clinton is more likely to win than Obama, and if they had to bet on a single candidate would pick Clinton. What the prediction market says is that people believe there is a 57% probability that Clinton will get the nomination. What this means is another issue; does it mean that if we reran the election 100 times, Clinton would get the nomination 57 times? No, because if we reran the election 100 times, people wouldn't show up to the polls. But it does seem to mean that if someone accepted 100 such contracts on different events, paying $5.70 for each of them, they'd expect 57 of them to pay, at $10 each. If they expected less than 57 contracts to pay, then they wouldn't take the bet. (I'm assuming here that expected value is a good way to measure these things. This is probably reasonable here, because neither Clinton winning nor Clinton losing is a rare event.)

In short, the prediction market says that Clinton has a 57% probability of getting the nomination, which is different than saying that 57% of people think Clinton is the most likely to be nominated.
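The arithmetic behind reading a price as a probability is short enough to write down. A sketch, assuming (as above) that expected value is the right yardstick; the function names are made up:

```python
def implied_probability(price, payout=10.0):
    """Probability at which paying `price` for a `payout` contract breaks even."""
    return price / payout

def expected_profit(price, believed_probability, payout=10.0, contracts=100):
    """Expected profit from buying `contracts` independent contracts at `price`."""
    return contracts * (believed_probability * payout - price)

print(implied_probability(5.70))       # Clinton contract: about 0.57
print(implied_probability(3.90))       # Obama contract: about 0.39
print(expected_profit(5.70, 0.50))     # a buyer who believes 50% expects to lose money
```

A buyer who believes the probability is exactly 57% expects to break even at $5.70; anyone who believes it is lower expects a loss and shouldn't take the bet.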

In his honor, I hope to take a class in the analysis of algorithms this term. (Okay, I was planning to anyway.) In fact, just now I got it approved that I could take this class. It's not in my department, so doing so required a couple e-mails -- but the analysis of algorithms is a legitimate field of mathematics. I'm not going to explain why here, because in doing so I would probably just say lots of foolish things that I don't fully understand. This is why I'm taking the class -- I am interested in the analysis of algorithms but I don't know much about it.

I hope that one day I have enough money that I feel like I can give $2.56 to everybody who catches a mistake that I make and not be bankrupted by this. (Knuth gives this amount -- "one hexadecimal dollar" -- to anyone who finds an error in one of his books.) I would be willing to adopt such a scheme right now if all other authors adopted it as well, though; since I read much more than I write I'm reasonably sure I'd come out ahead -- so long as I was the only one eligible to receive such money. (Is there anybody who reads less than they write? That seems like it would be a very strange state of affairs.) However, if all other authors adopted such a scheme they would probably also be more diligent in proofreading their work, and I'd have competition for finding the errors; in the end some sort of equilibrium would be reached among all people who read and/or write, and I'm not sure whether I would end up paying out more in such bounties than I take in.

I think that even sighted mathematicians will get something from this, because the main issues for a visually impaired mathematician are that they cannot read or write in the usual way, and many of us do work in a situation where reading or writing is not available to us. Much of my best work gets done while walking to or from school, which is why I refuse to take SEPTA even though it would be faster. Plus, I get exercise that way. I've often taken to calling my own cell phone and dictating the solution to a problem into my voice mail. But this clearly isn't the same thing, because in the end I write things up in the traditional way.

Not surprisingly, Raman seems to find that the largest difficulties come in trying to communicate with other mathematicians, although this is becoming less of an issue as mathematics moves online, especially with the proliferation of TeX. (But this raises a question for me: often I write TeX that isn't strictly correct, but compiles anyway, and gives the right output on the page. How do systems like Raman's AsTeR (Audio System for TEchnical Readings, his Ph.D. thesis) handle this?)

08 January 2008

Yesterday and today I read John Allen Paulos' Irreligion: A Mathematician Explains Why the Arguments for God Just Don't Add Up. A quite good book overall, although Paulos doesn't really cover any new ground here. (Paulos is a mathematician but the book is mostly devoid of mathematical content.) However, the book is a lot more lighthearted than some of the snarky anti-religion books that have been out there lately (say, Dawkins' The God Delusion, or Hitchens' God Is Not Great). Worth reading, although I'm not sure if it's worth paying $20 for, since it's quite short.

But I just wanted to share the following:

Although an atheist, Erdos often referred to an imaginary book in which God has inscribed all the most beautiful mathematical proofs. Whenever he thought that a proof or argument led to a particularly exquisite epiphany, he'd say, "This one's from the book." (Alas, none of the arguments for the existence of God are even close to being in God's book.)

I suspect that some of the people who came up with such arguments thought they were proofs from the book, though. Indeed, are there any long arguments for the existence of God, or are they all the sort of thing that can be written in half a page or so?

(And although Erdos' book is imaginary, Martin Aigner and Gunter Ziegler's Proofs from THE BOOK is real.)

He starts out by squaring some two-digit numbers... this didn't impress me much, because I could almost keep up with him. (And 37 squared is especially easy for me. One of my favorite coffeehouses in Cambridge was the 1369 Coffee House, and at some point I noticed that that was 37 squared. So I'll always remember that one.) Squaring two-digit numbers is just a feat of memory. Three- and four-digit numbers, though... that's a bit more impressive. And of course I'm harder to impress in this area than the average person.

One trick whose workings might not be obvious: he asks four people to each compute 8649x (the 8649 was 93 squared, from the number-squaring part of the show) for some three-digit integer x, and give him six of the seven digits in any order; he says which digit is left out. How does this work? 8649 is divisible by 9. So the sum of the digits of 8649x must be divisible by 9. So, for example, say he gets handed 2, 2, 2, 7, 9, 3; these add up to 25, so the missing digit must be 2, to make 27. (How could he tell apart a missing zero and a missing nine? I suspect there's a workaround but I don't know what it is; the number 93 was given by someone in the audience, so I don't think it's just memory.)
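The digit-sum argument fits in a few lines. A sketch (the function name and the multiplier 317 are arbitrary choices of mine); note that it reports both candidates in the ambiguous zero-or-nine case rather than pretending to resolve it:

```python
def missing_digit(known_digits):
    """Candidates for the withheld digit of a multiple of 9."""
    remainder = (-sum(known_digits)) % 9
    # A remainder of 0 is ambiguous: the hidden digit could be 0 or 9.
    return [0, 9] if remainder == 0 else [remainder]

# The example from the talk: digits 2, 2, 2, 7, 9, 3 are handed over.
print(missing_digit([2, 2, 2, 7, 9, 3]))   # [2]

# A made-up audience member multiplies 8649 by 317 and withholds one digit.
digits = [int(d) for d in str(8649 * 317)]
hidden = digits[0]
print(hidden in missing_digit(digits[1:]))  # True
```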

He also asks people for the year, month, and day on which they were born and gives the day of the week; I found myself trying to play along but I can't do the Doomsday algorithm quite that fast... and I suspect he uses something similar. (I noticed that he asked three separate questions: first the year, then the month, then the day. This gives some extra time. I know this trick well; when a student asks a question I haven't previously thought about, I repeat it. I suspect I'm not the only one.)
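For the curious, here is roughly what the Doomsday algorithm looks like when a computer does it instead of a person. This is a sketch of Conway's rule for Gregorian dates as I understand it, not a claim about what the performer actually does in his head; the names are mine.

```python
import datetime

DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]

def is_leap(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def doomsday_weekday(year, month, day):
    """Day of the week (0 = Sunday) for a Gregorian date, via the Doomsday rule."""
    # Century anchor: Tuesday for 2000-2099, Wednesday for 1900-1999, and so on.
    anchor = (5 * ((year // 100) % 4) + 2) % 7
    y = year % 100
    doomsday = (anchor + y + y // 4) % 7
    # One easy-to-remember date in each month that falls on the year's doomsday:
    # 4/4, 6/6, 8/8, 10/10, 12/12, 5/9, 9/5, 7/11, 11/7, the last day of
    # February, March 14, and January 3 (January 4 in leap years).
    memorable = [4 if is_leap(year) else 3,
                 29 if is_leap(year) else 28,
                 14, 4, 9, 6, 11, 8, 5, 10, 7, 12]
    return (doomsday + day - memorable[month - 1]) % 7

# Cross-check one arbitrary date against the standard library.
d = datetime.date(2008, 1, 8)
print(DAY_NAMES[doomsday_weekday(d.year, d.month, d.day)], d.strftime("%A"))
```

Asking for the year first makes sense under this scheme: the year is where most of the arithmetic is, and the month and day are a quick adjustment at the end.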

The impressive part, for me, is not the ability to do mental arithmetic -- I suspect most mathematicians could, if they practiced -- but the ability to keep up an engaging stage show at the same time.

(The video is on ted.com, which shows talks from an annual conference entitled "Technology, Entertainment, and Design"; there look to be quite a few other interesting videos on there as well.)

which I am calling unfamiliar not because it's a particularly deep result, but because it's just not something one usually writes out. It could be proven by induction from the more "standard" product rule (uv)' = u'v + uv'.

But why don't we teach this to freshmen? Sure, the notation might be a bit of a barrier; I get the sense that a lot of my students learn the Σ notation for the first time when we teach them about infinite sequences and series, at the end of the second-semester calculus course; of course they learn about derivatives, including multiple derivatives, sometime in the first semester. (If it is true that they are seeing the Σ notation for the first time then, it doesn't seem quite fair, because then we're asking them to assimilate this weird notation and some real understanding of the concept of infinity at the same time. Indeed, at Penn we often tell them not to worry so much about the notation.) But ignoring the notational difficulties, fix a value of k -- say, 4 -- and get

(uv)'''' = u''''v + 4u'''v' + 6u''v'' + 4u'v''' + uv''''

so basically we notice two things: there are four primes in each term, and the coefficients are the binomial coefficients, which are familiar to most students.
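The pattern for the fourth derivative of a product, with binomial coefficients 1, 4, 6, 4, 1, can be checked mechanically on polynomials. A sketch using plain coefficient lists (all function names and the two test polynomials are my own choices):

```python
from math import comb

def poly_derivative(p, times=1):
    """Differentiate a polynomial given as coefficients [c0, c1, c2, ...]."""
    for _ in range(times):
        p = [k * c for k, c in enumerate(p)][1:] or [0]
    return p

def poly_multiply(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def leibniz(u, v, k):
    """Sum over j of C(k, j) * (j-th derivative of u) * ((k-j)-th derivative of v)."""
    total = [0]
    for j in range(k + 1):
        term = poly_multiply(poly_derivative(u, j), poly_derivative(v, k - j))
        total = poly_add(total, [comb(k, j) * c for c in term])
    return trim(total)

u = [1, 2, 0, 3, 1]      # 1 + 2x + 3x^3 + x^4
v = [5, 0, 1, 0, 0, 2]   # 5 + x^2 + 2x^5
direct = trim(poly_derivative(poly_multiply(u, v), 4))
print(leibniz(u, v, 4) == direct)  # True
```

This is a check on one pair of polynomials, not a proof; the proof is the induction from the ordinary product rule.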

One doesn't take the fourth derivative of a product that often; but even knowing that might be preferable to

Also, one can expand a rule like this to products of more than two terms; we have

Again, this doesn't come up that often, and I don't want to try to write it for derivatives of products of an arbitrary number of factors. Still, the idea is fairly natural, but how many freshmen would even know that

(uvw)' = u'vw + uv'w + uvw'

?

I really don't know the answer to this -- but products of three factors are not incredibly rare, and the rule here is quite simple -- just take the derivative of each factor in turn, and sum up the results. There's even a nice "physical" interpretation of it -- how much does the volume of a three-dimensional box change as we change its various dimensions?

The coefficients seem kind of arbitrary, though; the point of Hardy's paper is that if things get recast in terms of partial derivatives they go away, both here and in Faa di Bruno's formula for the derivative of a composition of functions. One way to think of this is to imagine that, in the product rule, we have a "prime 1", a "prime 2", and so on up to a "prime k" if we're taking kth derivatives; we attach these primes to the variables in all possible ways, sum up the results, and then "collapse" them by reading all the primes to just mean ordinary differentiation.

06 January 2008

A fictional "presidential candidate" on The Simpsons said something about how the top fifth of Americans consume sixty percent of the resources and the bottom two fifths consume only one eighth of the resources, leading to his slogan "end quintile disparity".

So I usually don't talk about politics here. And for the moment, this blog will refrain from endorsing a presidential candidate. This is mainly because I haven't thought too hard about the presidential elections, because the Pennsylvania primary isn't until April and the nominations will probably be decided by then; the election that really matters for me is the general election, and I don't want to get too attached to a particular candidate right now since they may not be in the general election. (Here's an interesting interview with William Poundstone on different methods of voting, via Slashdot.)

But it occurs to me that one thing that we should be against in a presidential candidate is pigheadedness of the sort that George W. "Stay The Course" Bush has shown. New information becomes available, and this is something that any presidential candidate -- well, really any president -- should take into account. They should not become wedded to their positions if new information becomes available. (By the way, I'm not saying that I want a president who decides what to do based on opinion polling that tells them whether they'll be able to keep their jobs. I want someone with principles -- but these principles should include a willingness to change their mind.) I don't want to go so far as to say "path-dependence is the scourge of history", but I'll say it inside quotes.

05 January 2008

One of Chu-Carroll's big points here is that mathematical notation should be used as a tool for communication, not just a way to make the author look smarter.

Scott Aaronson had a recent post Ten Signs a Claimed Mathematical Breakthrough is Wrong which is also of interest. His first criterion for identifying a flawed "breakthrough" -- "The authors don’t use TeX." -- seems in itself quite strong. Of course, this is not because TeX forces the author to be a good mathematician (wouldn't it be nice if such a program existed?). Rather, the use of TeX is something of a shibboleth; people who haven't bothered to learn TeX are, at the present time, probably outside the mainstream of the mathematical community. I'm not saying that this automatically means the author is wrong -- there's no reason why advances can't come out of left field -- but that's a strike against such an author that they have to overcome. Using nonstandard nomenclature or notation -- which Aaronson doesn't mention -- is in the same general area.

04 January 2008

I didn't know that the original meaning of the Latin word norma had to do with right angles (which is probably the most common technical meaning, though there are lots of others); the meaning of "following a rule" is a metaphorical extension of that, coming from the square that was used to make right angles.

Dr. Sarah's Futurama Math, from Sarah Greenwald. Apparently a new Futurama DVD was just recently released, if you care about that sort of thing. (Personally, I like the show but not enough to go out of my way to watch it.) The DVD includes a lecture on the math of Futurama. I didn't know that a lot of the writers of the show had serious mathematical training, but it doesn't surprise me at all.

Also, simpsonsmath.com from Greenwald and Andrew Nestler. I like this one more, because I can get The Simpsons but not Futurama on my dirt-cheap cable package, so the Simpsons references are more current to me. I linked to this one a long time ago, but you probably weren't reading this blog then, because at the time I had maybe one percent of the readers I have now.

(In a not-all-that-strange coincidence, I'm reading William Poundstone's biography of Carl Sagan. Sagan was born in 1934, and often cited the "real" Futurama, a pavilion at the 1939 New York World's Fair, as one of the first things that pushed him towards being a scientist.)

03 January 2008

The average price for an apartment reached $1.4 million in the last quarter of 2007.... The reports noted, however, that average prices were being pushed to record levels because of the increasing number of apartments selling at the top end of the market, above $10 million... Still, real estate brokers and economists point out that the overall market has performed just as well as the luxury market. In fact, half of the sales in Manhattan involved apartments that sold for less than $828,0000 [sic], according to data tracked by Halstead Property and Brown Harris Stevens.

Assuming that that "$828,0000" is actually $828,000, this makes sense; the distribution of housing prices is skewed, just like the distribution of incomes. It's rare, though, to see an article that mentions both the mean and the median of a distribution.

A photographer that I know went out and took some beautiful pictures of downtown Philadelphia overnight; then he complained that he picked the "coldest day of the year" to do it. Of course, he meant the coldest day of the year so far.

But I suspect it won't be the coldest day of the year, for the following reasons:

The coldest weather in Philadelphia usually comes in late January. (But this is a bit specious; if this morning's low had been 5°F (-15°C) I wouldn't be saying that.)

In the last eleven years, it has been 19 degrees or below on some day after January 3 every year. (There's more complete climate data available out there, but I don't feel like looking at it.)

But that raises an interesting question: what can we expect the coldest temperature all year to be?

The normal low temperature in Philadelphia on any given night in late January is 25 degrees -- that's when the normal low is at its lowest -- but only an idiot would say that this means the average lowest temperature all year is 25 degrees. (In fact, even without looking at the records, I'd be shocked to learn that there has been a year in recorded history in which Philadelphia didn't go below 25 at some point.) One tempting thing to do is to make the following assumptions:

the coldest day of the year will always fall between, say, January 5 and February 10, a period of 37 days. Call this number t;

the low temperature on any day in that interval is normally distributed with mean 25 degrees and standard deviation σ

thus the annual minimum temperature should be at the 1/(t+1) quantile of that normal distribution.

That turns out to be roughly 2 standard deviations below the mean. Unfortunately, I don't know what the standard deviation is! And there's a much bigger problem here -- I've implicitly assumed that the annual minimum always falls in a certain cold period, and that the temperature on each day is independent of each other day! The second assumption is spectacularly bad. If it's colder than average today, it'll probably be colder than average tomorrow.
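The "roughly 2 standard deviations" figure comes straight out of the normal quantile function. A sketch under the same (bad) assumptions as above; the value of sigma used at the end is a placeholder I made up purely for illustration, since I don't know the real one:

```python
from statistics import NormalDist

def z_for_annual_minimum(t):
    """The 1/(t+1) quantile of a standard normal, in standard deviations."""
    return NormalDist().inv_cdf(1 / (t + 1))

def rough_annual_minimum(mean_low, sigma, t):
    """Crude estimate of the year's coldest low, pretending days are independent."""
    return mean_low + z_for_annual_minimum(t) * sigma

t = 37                                # days from January 5 to February 10
print(z_for_annual_minimum(t))        # a bit above -2
print(rough_annual_minimum(25, 8, t)) # sigma = 8 degrees is a made-up value
```

Because of the independence problem, this should be read as what the naive model says, not as a forecast.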

Also, I've assumed that the low temperatures on a given day of the year are normally distributed, which probably isn't true...

However, one could use a method like this to estimate, say, the mean number of days below, say, fifteen degrees in a given year. If we know the distribution then we can compute, say, that the probability of it being below fifteen degrees on the night of January 3 is ten percent; adding up these probabilities for every night of the year gives an expected mean number of cold nights; call this μ. But the variance is important here, as well. All I can say instinctively is that the variance is probably greater than that of a Poisson distribution with mean μ (which is what you'd usually use to model "rare events"), since if it's cold today it'll probably be cold tomorrow, and it probably doesn't tell us much about whether next week will be cold; in particular I suspect that within the winter season, there aren't pairs of days for which being cold is inversely correlated.
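One way to see the variance claim without any data is a toy model in which each night is either "cold" or "not cold" and cold nights tend to follow cold nights. The sketch below is only an illustration: the two-state Markov chain, the 10% chance of a cold night, and the 60% chance that a cold night follows a cold night are all assumptions of mine. It computes the exact distribution of the number of cold nights and compares the variance with the mean (which is what the variance of a Poisson would be).

```python
def count_distribution(n_days, p_cold, stay_cold):
    """Exact distribution of the number of cold nights in a two-state Markov chain."""
    # Chosen so that p_cold is the stationary probability of a cold night.
    become_cold = p_cold * (1 - stay_cold) / (1 - p_cold)
    # state[(k, cold)]: probability of k cold nights so far, with tonight cold or not
    state = {(1, True): p_cold, (0, False): 1 - p_cold}
    for _ in range(n_days - 1):
        nxt = {}
        for (k, cold), prob in state.items():
            to_cold = stay_cold if cold else become_cold
            nxt[(k + 1, True)] = nxt.get((k + 1, True), 0.0) + prob * to_cold
            nxt[(k, False)] = nxt.get((k, False), 0.0) + prob * (1 - to_cold)
        state = nxt
    dist = {}
    for (k, _), prob in state.items():
        dist[k] = dist.get(k, 0.0) + prob
    return dist

def mean_and_variance(dist):
    mean = sum(k * p for k, p in dist.items())
    var = sum((k - mean) ** 2 * p for k, p in dist.items())
    return mean, var

# Persistent weather: a cold night is followed by another 60% of the time.
print(mean_and_variance(count_distribution(37, 0.1, 0.6)))
# Independent nights: "stay cold" is just the base rate.
print(mean_and_variance(count_distribution(37, 0.1, 0.1)))
```

Both cases have the same mean number of cold nights, but with persistence the variance comes out well above the mean, while with independent nights it comes out slightly below.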

02 January 2008

What surprised me is how, at least in this particular game of Tetris, each individual piece got broken up pretty quickly, with only one or two pieces out of the original four remaining on the board. (The four people who made up each dropping piece were wearing the same color T-shirt, and there were enough different colors that for the most part one could assume that people wearing the same color and next to each other were part of the same piece.) This isn't obvious if you're used to, say, the NES version of Tetris (which is the one I played the most); the color scheme is different there. If you stop to think about it for a moment, though, it makes perfect sense that this should happen; it's very rare that four squares which fall at the same time all get eliminated at the same time.

Incidentally, the people writing that post say "Now, we're not mathologists". What is a mathologist? I would argue that mathologists are to mathematicians as musicologists (i. e. people who study music in a scholarly fashion, as opposed to those who produce it) are to musicians. However, there don't seem to be many mathologists in this sense; it's quite difficult to get the sort of deep appreciation for mathematics that one needs without actually doing mathematics, or so it seems to me.

01 January 2008

Here in the United States, the Writers Guild of America -- who write a lot of the television shows -- are on strike, and have been for two months. As a result, a lot of TV shows are in reruns, and the networks are starting to move into "reality" television which doesn't need writers.

Anyway, zap2it.com, a web page with TV listings, currently has a banner ad reading: "WRITERS' STRIKE DAY -307: Find out how the Hollywood writers' strike will affect you." The strike actually started on November 5, 2007; this is day 58.

Apparently whoever wrote the code which automatically generates these banners didn't consider the possibility that the strike might run into 2008. And 308 days from today (the putative "Day 1" if the count increments by one each day) is November 4, not November 5... 2008 is a leap year! If I had to guess, I'd say that the code incorporates the fact that the strike started on the 309th day of the year... which is November 5 in an ordinary year, but November 4 in a leap year.
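My guess at the bug is easy to reproduce. The sketch below is speculation about code I have never seen (the function names are invented): it compares a correct day count with one that only looks at day-of-year numbers.

```python
from datetime import date, timedelta

STRIKE_START = date(2007, 11, 5)

def strike_day(today):
    """Correct count: November 5, 2007 is day 1."""
    return (today - STRIKE_START).days + 1

def buggy_strike_day(today):
    """Guess at the bug: subtract day-of-year numbers and ignore the year."""
    start_day_of_year = STRIKE_START.timetuple().tm_yday   # 309
    return today.timetuple().tm_yday - start_day_of_year + 1

new_year = date(2008, 1, 1)
print(strike_day(new_year))               # 58
print(buggy_strike_day(new_year))         # -307
print(new_year + timedelta(days=308))     # 2008-11-04
```

Under this guess the banner would have read "DAY 1" on November 4, 2008, a day early, because of February 29.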

And do people remember how occasionally, around eight years ago, you'd see web sites referring to "19100" for 2000, "19101" for 2001, and so on, since the code which was automatically generating the dates hadn't been fixed for Y2K? This error reminds me of that, although there's no mathematical similarity. But they're both "stupid calendar tricks".
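For anyone who never saw it in the wild, one common way the "19100" came about was code that took a "years since 1900" field (C's tm_year is one) and glued the characters "19" onto the front. A sketch with invented function names, not taken from any particular site:

```python
def buggy_year_string(years_since_1900):
    """Pre-2000 habit: paste '19' in front of a value assumed to have two digits."""
    return "19" + str(years_since_1900)

def fixed_year_string(years_since_1900):
    """Add, don't concatenate."""
    return str(1900 + years_since_1900)

print(buggy_year_string(99), fixed_year_string(99))     # 1999 1999
print(buggy_year_string(100), fixed_year_string(100))   # 19100 2000
```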

I just came across this article: Florian Cajori, History of symbols for n-factorial. Isis, Vol. 3, No. 3 (Summer, 1921), pp. 414-418. Available from JSTOR, if you have access. Cajori was the author of A History of Mathematical Notations, which is the canonical source on the subject of the history of mathematical notations; I will confess I have never seen a copy of his book.

I didn't realize how many historical notations there have been for the factorial. n! is of course the most common one these days. Γ(n+1) is seen sometimes, although I personally find it a bit perverse to use this notation if you know that n is a positive integer.

Supposedly Gauss used Π(n). Someone named Henry Warburton used 1^{n|1}, a special case of a^{n|1} = a(a+1)...(a+(n-1)). (This is a variant of the Pochhammer symbol. It's not clear to me what the 1 in the superscript means.) Other notations include a bar over the number and writing the number inside a half-box (with lines on the left and below). Augustus de Morgan is mildly famous for not using a symbol, and once said: "Among the worst of barbarisms is that of introducing symbols which are quite new in mathematical, but perfectly understood in common, language. Writers have borrowed from the Germans the abbreviation n! to signify 1.2.3.(n - 1).n, which gives their pages the appearance of expressing surprise and admiration that 2, 3, 4, &c. should be found in mathematical results." (I'm copying this from Earliest Uses of Symbols in Mathematics, although I learned it from somebody's office door. I found the Cajori article while looking for this quote just now.)
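Warburton's notation is just a rising product, which makes the "special case" remark easy to check. A sketch (the function name is mine):

```python
from math import factorial

def rising_factorial(a, n):
    """a(a+1)...(a+(n-1)), with the empty product equal to 1."""
    result = 1
    for i in range(n):
        result *= a + i
    return result

print(rising_factorial(3, 4))                    # 3*4*5*6 = 360
print(rising_factorial(1, 6) == factorial(6))    # True: starting at a = 1 gives n!
```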

Apparently adopting any sort of symbol was resisted by some people, at least in their more elementary writings (textbooks for undergraduates and the like), because they didn't want to overload their students with symbols. I'm not sure if I agree with this for the factorial. A little thought experiment, though -- why don't we have a symbol for 1 + 2 + ... + n? (Although a bit of reflection convinces me that the reason is because we have an explicit formula for this, namely n(n + 1)/2.) But n! probably arises more often.

While I'm on the subject, you should read Knuth's Two notes on notation (Amer. Math. Monthly 99 (1992), no. 5, 403--422; arXiv:math/9205211v1), which suggests the notation [P] for "1 if P is true, 0 if P is false"; this turns out to be a quite useful generalization of the Kronecker delta. It also suggests notation for the Stirling cycle and subset numbers (those are, um, the Stirling numbers of the first and second kinds, respectively? or the second and first kinds? See, those names are better.)
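The bracket notation translates almost literally into code, which may be part of its appeal. A small sketch (the function names are mine, and the examples are not from Knuth's paper):

```python
def iverson(p):
    """[P]: 1 if P is true, 0 if P is false."""
    return 1 if p else 0

def kronecker_delta(i, j):
    """The Kronecker delta is the special case [i = j]."""
    return iverson(i == j)

# Sum of k over 1 <= k <= 10 with k odd, written as an unrestricted sum of k*[k odd].
odd_sum = sum(k * iverson(k % 2 == 1) for k in range(1, 11))
print(kronecker_delta(2, 2), kronecker_delta(2, 3), odd_sum)   # 1 0 25
```

The point of the notation is visible in the last sum: the condition moves out of the summation limits and into the summand, where it can be manipulated like any other factor.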