Let's say that a national presidential election is held tomorrow. A poll asks 1100 people which of the two candidates, A or B, they will vote for. 750 say they will vote for A, and 250 say they will vote for B. What are the chances that each will win the election? I think this problem is not the same as choosing a random voter and letting him or her decide. Therefore, I assume that the chance that A wins the election is higher than 0.75, and closer to 1.

Update: Since the election generated a lot of fuss, let me rephrase the question: you have a box containing $N$ white and black balls. From the box you draw $N' \le N$ balls at random and count $W$ white balls and $B$ black balls ($W + B = N'$). What is the probability that the box initially contained more white balls than black balls?

much much much much much higher. I would simply say 1.
– Gil Kalai, Dec 5 '09 at 21:40

Real case: in Romania the president will be elected tomorrow. Polls say 54% of voters will choose one candidate and 46% the other. I wouldn't simply say that the first candidate will win.
– Alexandru Moșoi, Dec 5 '09 at 21:49

Without some details about how you model the chances of people lying to the pollster and/or changing their minds after the poll, this has no answer.
– Mariano Suárez-Alvarez♦, Dec 5 '09 at 21:57

Assume that the people won't lie to the pollster and they won't change their mind.
– Alexandru Moșoi, Dec 5 '09 at 22:01

Or even if they do vote as they told the pollster, it's not clear that the sampling was probabilistic.
– José Figueroa-O'Farrill, Dec 5 '09 at 22:01

6 Answers

Even with your updated version, there is not a well-defined answer (which does not make this a bad question!). You need to have a prior distribution on how many white and black balls the box contained to start with.

Let's do some small examples. Suppose that you are given a box with three balls. You sample one ball, and it is white. What should your estimate be that the majority of the balls are white? As we will see, we can't answer this question without some prior beliefs about how the balls were placed in the box.

Suppose, at first, that you know the three balls were chosen at random from a large collection with equally many white and black balls. So, before sampling, your probability estimates were $1/8$ for BBB, $3/8$ for BBW, $3/8$ for BWW, and $1/8$ for WWW. The probability that the initial distribution was BBW, and that you would sample a white, is $(3/8)(1/3)=1/8$. The corresponding probabilities for BWW and WWW are $(3/8)(2/3)=1/4$ and $(1/8)(1)=1/8$. So, conditioned on the fact that you sampled a white, the probability that the balls are BBW is $(1/8)/(1/8 + 1/4 + 1/8) = 1/4$, and we see that the probability that the majority are white is $3/4$.

But now suppose that white balls are very rare. For example, say your box was packed from a collection which had 9 black balls for every 1 white ball. Then the corresponding probabilities before sampling are $(9/10)^3$ for BBB, $3\cdot(1/10)\cdot(9/10)^2$ for BBW, $3\cdot(1/10)^2\cdot(9/10)$ for BWW and $(1/10)^3$ for WWW. The probabilities that these distributions would hold and that you would sample a white ball are $(1/10)\cdot(9/10)^2$ for BBW, $2\cdot(1/10)^2\cdot(9/10)$ for BWW and $(1/10)^3$ for WWW. So, conditioned on you sampling a white ball, the probability that the box holds BBW is
$$\frac{(1/10)\cdot(9/10)^2}{(1/10)\cdot(9/10)^2+ 2\cdot(1/10)^2\cdot(9/10)+(1/10)^3},$$
and so forth. I'm too lazy to calculate the probability that the box is majority white in this case, but it should be much less than $3/4$.
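
For anyone who wants to check the two three-ball calculations, here is a minimal sketch in Python. It assumes, as in the examples, that each ball in the box is independently white with some probability `p_white` (a binomial prior) and that the sample is drawn without replacement. The function name and parameters are made up for illustration.

```python
from fractions import Fraction
from math import comb

def posterior_majority_white(n, p_white, w, b):
    """P(box holds more white than black | sample of w white and b black).

    Assumed model: each of the n balls is independently white with
    probability p_white; the sample is drawn without replacement.
    """
    p = Fraction(p_white)
    total = Fraction(0)
    majority = Fraction(0)
    for k in range(n + 1):  # k = hypothetical number of white balls in the box
        prior = comb(n, k) * p**k * (1 - p)**(n - k)
        # Hypergeometric probability of drawing exactly w white and b black
        likelihood = Fraction(comb(k, w) * comb(n - k, b), comb(n, w + b))
        joint = prior * likelihood
        total += joint
        if 2 * k > n:
            majority += joint
    return majority / total

print(posterior_majority_white(3, Fraction(1, 2), 1, 0))   # 3/4
print(posterior_majority_white(3, Fraction(1, 10), 1, 0))  # 19/100
```

With the 9-to-1 collection the answer comes out as $19/100$, which is indeed much less than $3/4$.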

This is why polls always report their margin of error as a $p$-value. When a pollster says "75% of Americans will vote for Kodos, with a margin of error of $\pm 10\%$", this means "the probability that I would have obtained these poll results, conditioned on the assumption that either fewer than 65% or more than 85% of Americans will vote for Kodos, is $<.05$." (The use of $.05$ as a threshold is traditional, and has no deep significance.) If you have prior reason to know that the majority is very likely to vote for Kang then, even after you see the poll, the odds of Kodos winning might still be slim.
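
To give a feel for the size of such margins, here is a rough sketch using one common convention: the normal approximation to a simple random sample, with $z = 1.96$ for the traditional $.05$ threshold. This convention is an assumption, since nothing above says how a given pollster computes the margin, and the function names are made up for illustration.

```python
from math import ceil, sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

def sample_size_for_margin(p_hat, margin, z=1.96):
    """Smallest n whose normal-approximation margin is at most `margin`."""
    return ceil(z**2 * p_hat * (1 - p_hat) / margin**2)

print(round(margin_of_error(0.75, 1000), 4))   # 0.0268
print(sample_size_for_margin(0.75, 0.10))      # 73
```

So under this approximation a 750/250 split among 1000 respondents corresponds to a margin of roughly $\pm 2.7\%$, while a margin as wide as $\pm 10\%$ would go with a sample of only about 73 people.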

Your question about balls is still not well-defined. You would be able to solve it if you had some a priori probability on the set of possible distributions of voters' preferences. Then your data would change this distribution to an a posteriori distribution $P\{\text{number of votes for A is } i\} = P_i$, and it would be possible to find the probability that A wins as $\sum_{i>N/2} P_i$.

An example why more data is necessary: suppose the political tradition in the country is as follows:

There are exactly two parties

During each election each party supports exactly one candidate, possibly the same one

Exactly half of the people belong to each party and vote for the candidate their party supports

Then the survey data rule out the possibility that the parties support the same candidate so there must be a draw with probability 1.

While it's strange to have such a large amount of a priori information (though who are we to judge the ways of those people?), and it's unlikely that the survey will split 750/250 if the real probabilities are 50/50, this is logically consistent and possible with your model.

So it's clear there can be no general method for finding the probability you requested from the data you have.

A possible a priori distribution would be uniform over the number of voters voting for A, though a more realistic one would be that each person votes for A with probability P. Either way, the probability that A will win, given your data, is close to 1.
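
As a sketch of the uniform case, the following Python computes the a posteriori probabilities $P_i$ and sums them over $i > N/2$. It assumes a uniform prior on the number of A-voters and a poll drawn without replacement. The population size of 5,000 is purely illustrative (the question does not fix $N$), and the function name is made up.

```python
from fractions import Fraction
from math import comb

def posterior_majority_uniform(n, w, b):
    """P(more than half of n voters back A | poll with w for A and b for B).

    Assumed model: uniform prior on the number of A-voters (0..n),
    poll drawn without replacement.
    """
    total = 0
    majority = 0
    for k in range(w, n - b + 1):  # k = hypothetical number of A-voters
        # Hypergeometric likelihood, up to the constant 1 / comb(n, w + b)
        weight = comb(k, w) * comb(n - k, b)
        total += weight
        if 2 * k > n:
            majority += weight
    return Fraction(majority, total)

# Small sanity check: 3 balls, one white ball sampled
print(posterior_majority_uniform(3, 1, 0))  # 5/6

# Illustrative numbers: 5,000 voters, 750 for A and 250 for B in the poll
p = posterior_majority_uniform(5_000, 750, 250)
print(float(p))
```

For the poll numbers the result is extremely close to 1, in line with the claim above. The three-ball check gives $5/6$ rather than $3/4$, which shows again that the answer depends on the prior.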

I suppose one needs to assume that every potential voter had a nonzero probability of being polled. Then one needs to estimate that probability so that the results from the poll can be weighted accordingly. Otherwise, you really cannot say anything.

Depends on where you've held the poll. If you took that poll in the home town of A, and asked mostly his relatives then the poll is extremely unreliable.

What counts is demography and the number of people who will do the actual voting. This poll would be more reliable if only 10,000 people were going to vote than if 1 billion people were. You could calculate an error factor for your poll, which would be related to the ratio of the number of people polled to the actual number of people who will vote. The closer you get to the real number of voters, the more reliable the poll will be.

With 110 million people, chances will still be about 50-50, not 75-25, since both candidates still have a reasonably equal chance. With 11.0000 people, it would be closer to e.g. 60-40 in favour of A, but I haven't done the exact math for this.

Btw, if you take this poll in a shopping centre, you're likely to get a different result than when you would do an online poll. You'd be polling a different audience. This is the demographic influence.

Btw, 750 + 250 equals 1000. Are the last 100 unsure or didn't they have any preference?

To make the notation more meaningful, let $N' = W' + B'$ and let $B$ and $W$ denote the true number of black balls and white balls in the box, respectively.

We need to count the number of ways in which there can be more white balls than black balls out of the $N$ balls where there are at least $W'$ white balls and $B'$ black balls. We then divide this by the number of ways to put black balls and white balls in the box where there are at least $W'$ white balls and $B'$ black balls.

To find the first value, we have to choose the colours of the remaining $N-N'=N-(W'+B')$ balls so that there are more white balls than black balls in the end. So the condition we require is that $W \geq \textrm{floor}(N/2)+1$. Since the box holds at least $B'$ black balls, we also need $W \leq N-B'$. Supposing that $W' \leq \textrm{floor}(N/2)+1$ (otherwise every admissible value of $W$ already gives a white majority), there are $(N-B')-\textrm{floor}(N/2)$ ways to do this, or none if this number is negative.

Now we count the number of ways to put black and white balls in the box, with at least $W'$ of the balls being white and at least $B'$ of the balls being black. Suppose we've already put in these $N'$ balls; then we can put between $0$ and $N-N'$ more white balls in and fill the rest with black balls. So there are $N-N' +1$ ways to do this.

This gives us the probability:
$$
\frac{(N-B')-\textrm{floor}(N/2)}{N-N'+1}
$$