Devlin's Angle

November 2008

Polling, Polling, Polling

One nice thing about waking up on November 5 will be that we'll be spared the daily media bombardment of election opinion polls we've been subjected to for weeks now. Or maybe not. With an entire industry of pollsters, and news media eager for something to report on, a more likely scenario is that only the question will change. "Who will you vote for?" will be replaced by "How do you feel about the result?" Or some such.

Truth is, I find myself sucked in as much as anyone else. I could simply ignore the polls but I don't. The reason is that they actually do tell us something. Not how the election will turn out, of course - no one can do that; rather, they tell us how our fellow citizens say (at the time of the poll) they intend to vote. To the degree that declared intentions indicate subsequent action, and barring unusual circumstances, the latter can be inferred from the former. If this were not the case, surely none of us would pay the opinion polls much, if any, attention.

The fact is, whether we like it or not, opinion polling is now a major part of our life, and has been for many decades. We take for granted the fact that by asking a tiny fraction of the population - perhaps as few as 1,000 Americans - we can obtain a fairly reliable indication of how an entire state will vote. Yet if you stop and think about it for a moment, that is a remarkable fact.

Even more remarkable, to my mind, is the very notion that we can make any prediction about a future event such as an election, or how a roll of two dice will come out, or what might happen to the stocks in our retirement fund. What's that you say? What is remarkable about that? After all, you say, no one is claiming that we can ever know for sure what tomorrow will bring. Rather what polling - and other predictive techniques - do is put numerical values on the various likelihoods of future events. What's the big deal about that? Surely, anyone with even the most basic mathematical training will accept that you can assign probabilities to future events. Right?

True - today. But that's a fairly recent state of affairs. Aristotle - who certainly was no slouch when it came to math - believed, and wrote, that one realm where mathematics could not be applied was the future. The future was unpredictable to man, known only to the gods.

And so everyone believed until 1654, when the great French mathematician Pierre de Fermat solved the problem of the Unfinished Game, a topic I touched on briefly in last month's column.

The Unfinished Game

The problem of the unfinished game, also known as the problem of the points, was described in a book on arithmetic and geometry written by the Italian mathematician Luca Pacioli in 1494, though it is known to predate that mention. It asks how the pot should be fairly divided when a multi-round tournament has to be abandoned before it is finished. For instance, suppose two players are rolling a pair of dice and agree to play a best of five rounds tournament. Three rounds are played, leaving one player ahead 2 to 1, at which point they must abandon the game. How should they divide the pot?

Pacioli was unable to solve this problem. So too were a number of other mathematicians (and gamblers) who tried, including Girolamo Cardano, Niccolo Tartaglia, and Lorenzo Forestani. The consensus was that the problem could not be solved.

Then, early in 1654, a gambler by the name of Antoine Gombaud, more often referred to in modern history books by his French nobleman's title of the Chevalier de Mere, put the problem to his friend the mathematician Blaise Pascal. Pascal produced a complicated argument that can be made to work, but was not happy with it, so at a friend's urging he wrote to Fermat about it. Fermat quickly found a simple solution.

There are two rounds left unplayed, argued Fermat. In each round, either player can win, so there are in all four different ways the game could continue to its five-round completion. The player who has won one round to the other's two must win both those final rounds in order to win the contest; in the other three possible endings, the player who is ahead after three rounds will win. Therefore, said Fermat, the player who is ahead when the game is abandoned should take 3/4 of the pot, with the other player taking 1/4.
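Fermat's count is small enough to check by brute force. The following Python sketch lists every way the unplayed rounds could go and counts how many of them each player would win. It assumes, as the argument above implicitly does, that each player is equally likely to win any round; the function name `fermat_split` and the labels "A" and "B" are made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def fermat_split(wins_a, wins_b, total_rounds=5):
    """Split the pot by counting equally likely ways to finish the tournament.

    Assumes every remaining round is a fair coin flip between A and B,
    and that all remaining rounds are played out (Fermat's "best-of" view).
    """
    remaining = total_rounds - wins_a - wins_b
    needed = total_rounds // 2 + 1  # rounds required to win overall

    # Every possible continuation, e.g. ('A', 'B') means A wins the next
    # round and B wins the one after that.
    continuations = list(product("AB", repeat=remaining))

    a_victories = sum(
        1 for outcome in continuations
        if wins_a + outcome.count("A") >= needed
    )
    share_a = Fraction(a_victories, len(continuations))
    return share_a, 1 - share_a


# Player A leads 2 to 1 when the game is abandoned.
leader, trailer = fermat_split(2, 1)
print(leader, trailer)  # 3/4 1/4
```

The four continuations are AA, AB, BA and BB; only BB hands the tournament to the trailing player, which is where the 3 to 1 division comes from.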

To anyone who sees this solution today, it seems simple enough. (The solution assumes the tournament is thought of as a "best-of-five" rounds, as opposed to a "first-to-three". You need a slightly more complicated argument in the latter case, but the answer is the same, a 3 to 1 division of the pot.) But no one before Fermat saw it, including Cardano, who did work out all of the basic rules we use today to combine probabilities. Moreover, when he did see Fermat's solution, Pascal could not accept it, and nor could various of his colleagues he showed it to. What was their problem?

Since the computation is trivial, indeed no different from the calculation of the odds in any game of chance (and actually much simpler than many), the only thing that could be holding everyone back was the fact that what Fermat was counting were "possible futures." Something that two thousand years of received wisdom said was not possible.

Once word got out about Fermat's breakthrough, however - presumably through the highly mobile network of gambling European noblemen - it did not take long for others to jump into the "future prediction" act. Within a single lifespan, modern future prediction and risk management were in place.

The modern world begins

The speed of developments that followed the solution to the problem of the unfinished game is staggering.

1657. Christiaan Huygens writes a 16-page paper that lays out pretty well all of modern probability theory, including the notion of expectation, which he introduces.

1662. John Graunt, an English haberdasher, publishes an analysis of the London mortality tables, and in so doing establishes the beginnings of modern statistical inference.

1709. Nikolaus Bernoulli writes a book describing applications of the new methods in the law. One problem he shows how to solve is how much time must elapse after an individual goes missing before the court can declare him dead and allow his estate to be divided among his heirs.

1713. Jakob Bernoulli's book showing how the new probability theory can be used to predict the future in the everyday world is published, eight years after his death. This is the first time the word "probability" is used in the precise, mathematical sense in which we use it today. He also proves the law of large numbers, of which more in a moment.

1732. The first American insurance company begins in Charleston, S.C., restricted to fire insurance.

1732. Edward Lloyd starts the precursor of what in 1734 becomes Lloyd's List, and eventually gives birth to the insurance company Lloyd's of London.

1733. Abraham de Moivre discovers the bell curve, the icon of modern data collection.

1738. Daniel Bernoulli introduces the concept of utility to try to get a better handle on human decision making under uncertainty.

1760s. The first life insurance companies begin.

Then came opinion polling.

Enter the pollsters

The mathematical basis for opinion polling is Jakob Bernoulli's law of large numbers. Roughly speaking, this says that if you take a sufficiently large random subcollection of a population, it will be representative of the entire population. The more numerous the random subcollection, the more it will reflect the entire population.
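A quick simulation gives a feel for the law. The Python sketch below draws random samples of increasing size from an imaginary electorate and prints how far each sample's share strays from the true one. The 52% figure, the seed, and the function name `sample_share` are all invented for the illustration, and each sampled voter is modeled as an independent draw, which is a simplification of how a real poll is run.

```python
import random


def sample_share(population_share, sample_size, rng):
    """Fraction of a random sample that supports candidate A.

    Each sampled voter independently supports A with probability
    `population_share` (a simplifying assumption).
    """
    supporters = sum(
        rng.random() < population_share for _ in range(sample_size)
    )
    return supporters / sample_size


rng = random.Random(2008)  # fixed seed so the run is repeatable
true_share = 0.52          # hypothetical: 52% of the electorate backs A

for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = sample_share(true_share, n, rng)
    error = abs(estimate - true_share)
    print(f"n={n:>7}: sample share {estimate:.3f}, error {error:.3f}")
```

Individual runs wobble, but the errors for the larger samples tend to be far smaller than those for the smaller ones, which is the behavior the law describes.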

The first known opinion poll was in 1824, when the Harrisburg Pennsylvanian newspaper conducted a local poll that showed, incorrectly as it turned out, that Andrew Jackson was leading John Quincy Adams in the presidential race. (Jackson became president next time round.)

The first national poll was in 1916, when the Literary Digest predicted - correctly - that Woodrow Wilson would be elected. Their approach was to mail out millions of postcards and count the returns. This is now recognized as a woefully unreliable method, but the magazine managed to correctly predict the following four presidential elections this way before getting it badly wrong in 1936, when it erroneously predicted that Alf Landon would beat Franklin D. Roosevelt.

The difficult part - or rather, one difficult part - of conducting a reliable poll is making sure that the sample is random. The math requires this. Using a non-random sample was a large part of the reason why the 1936 Digest poll came unstuck. That same year, George Gallup conducted a much smaller poll based on a properly representative sample and got the right answer.

Another famous case when the pollsters got it wrong was in 1948, when major polls, including Gallup, indicated that Thomas Dewey would defeat Harry S. Truman in the presidential election in a landslide victory. As we know, Truman came out on top, and there is a famous photograph of a smiling Truman holding up a first-edition Chicago newspaper that had a big headline saying "Dewey Defeats Truman."

The problem that time was that the pollsters relied on telephone interviews, and in those days only wealthier people had phones, and so the sample was heavily biased toward Dewey supporters.

Most people are surprised by how small a random sample can be and yet still yield a reliable result. If you do the math, you find that (provided the sample polled is truly random) 1,000 people will give you a prediction accurate to within a margin of error of about 3 percentage points. You could get the error down to about 1 point if you polled 10,000, but the more people you poll the more expensive it gets, of course. 1,000 is a number typically used these days.
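For readers who want to do the math themselves, here is a minimal Python sketch using the standard textbook approximation for the margin of error of a proportion from a simple random sample, z * sqrt(p(1 - p)/n). The column does not say which formula or confidence level lies behind its figures; the sketch assumes the conventional 95% level (z = 1.96) and the worst case p = 0.5, and the function name `margin_of_error` is just a label for this example.

```python
import math


def margin_of_error(sample_size, share=0.5, z=1.96):
    """Approximate margin of error for a polled proportion.

    Assumes a simple random sample, a 95% confidence level (z = 1.96),
    and by default the worst case of an evenly split population.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


for n in (1_000, 10_000):
    points = 100 * margin_of_error(n)
    print(f"n={n}: +/- {points:.1f} percentage points")
# n=1000: +/- 3.1 percentage points
# n=10000: +/- 1.0 percentage points
```

Because the sample size sits under a square root, cutting the error by a factor of three takes roughly ten times as many interviews, which is why pollsters rarely go much beyond 1,000.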

In recent elections, with phones no longer restricted to the more wealthy, phone interviews seem to have done pretty well. But they do leave out people whose only phone is a mobile phone, and as more and more young people get to voting age, that could become a significant factor.

The kinds of polls you see on news organization web sites that ask people to vote on an issue are extremely unreliable, because the population sampled is self-selected. Polling results are reliable only when the sample is chosen in a truly random fashion.

Finally, while on the topic of applying mathematics to predicting elections, I can't pass up the opportunity to point you to what is surely the most cerebral political ad video in this year's campaign. It's titled "The Theorem". I point you to it with no intended or implied endorsement, etc. etc.