18 October 2007

will the best team win?

Basically, there's a very good chance that the best team does not win in the Major League Baseball playoffs.

On a related note, sportswriters often make predictions of the form "[team] in [number of games]" before a playoff series. In a best-of-(2n-1) series, if we know the probability p that a team (let's call them the Phillies) wins a game against some other team (as I've done before, let's call them the Qankees), then we can compute the probability that they'll win the series. Let q = 1 - p be the probability that the Qankees win a single game. Then the probability that the Phillies win in k games, where k is between n and 2n-1, can be obtained as follows. There are ${k-1 \choose n-1}$ arrangements of wins and losses in a k-game series that allow the Phillies to win in k games -- they must win the kth game and n-1 out of the first k-1 games. Each of these occurs with probability $p^n q^{k-n}$. So the probability that the Phillies win in k games is

$$P_{n,k}(p) = {k-1 \choose n-1} p^n (1-p)^{k-n}$$

and likewise the probability that the Qankees win in k games is

$$Q_{n,k}(p) = {k-1 \choose n-1} (1-p)^n p^{k-n}$$

(The number n is the number of games needed to win the series; in a best-of-seven series, n = 4.) For example, $P_{4,6}(0.6)$ is the probability that the Phillies win a best-of-seven series in six games, given that they have a 0.6 probability of winning each game; it's ${5 \choose 3} (0.6)^4 (0.4)^2 = 0.20736$.
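Here's a minimal sketch of these two formulas in Python, in case you want to play with the numbers. The function names `win_in_k` and `lose_in_k` are my own, and the sketch assumes, as the model does, that games are independent with the same p each time.

```python
from math import comb


def win_in_k(n: int, k: int, p: float) -> float:
    """Probability that a team with per-game win probability p wins a
    first-to-n series in exactly k games (n <= k <= 2n-1)."""
    if not n <= k <= 2 * n - 1:
        raise ValueError("k must be between n and 2n-1")
    # The winner takes game k plus n-1 of the first k-1 games.
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


def lose_in_k(n: int, k: int, p: float) -> float:
    """Probability that the opponent wins the series in exactly k games."""
    # Same formula with the roles of p and 1-p swapped.
    return win_in_k(n, k, 1 - p)


if __name__ == "__main__":
    # Phillies in six, best-of-seven, p = 0.6: about 0.20736
    print(win_in_k(4, 6, 0.6))
```

As a sanity check, the win and loss probabilities over all k from n to 2n-1 should add up to 1, since every series ends in exactly one of those ways.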

A prediction of the winner of a series and the number of games they win in amounts to a prediction of p. If we assume that the predictor simply predicts the most likely outcome of the series given what they believe p to be, then we want to find the largest of

$P_{n,n}(p), P_{n,n+1}(p), \ldots, P_{n,2n-1}(p), Q_{n,n}(p), Q_{n,n+1}(p), \ldots, Q_{n,2n-1}(p)$.

To do this, we start by finding the ratio $P_{n,k+1}(p)/P_{n,k}(p)$; since ${k \choose n-1} / {k-1 \choose n-1} = k/(k-n+1)$, this is k(1-p)/(k-n+1). If this is greater than 1, it means a win in k+1 games is more likely than a win in k games; it becomes greater than 1 once p drops below (n-1)/k. So, for example, as we decrease p, a five-game win in a best-of-seven series becomes more likely than a sweep when p = 3/4 (we have n=4 and k=4); a six-game win in a best-of-seven series becomes more likely than a five-game win when p = 3/5; a seven-game win becomes more likely than a six-game win when p = 3/6 = 1/2.

And in general, as we decrease p, a win in (2n-1) games becomes more likely than a win in (2n-2) games when p = (n-1)/(2n-2) = 1/2.
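The crossover points can be checked exactly with rational arithmetic. This is just a sketch; `crossover` and `crossovers` are names I made up, and the only thing assumed is the formula for the probability of winning in k games given earlier.

```python
from fractions import Fraction
from math import comb


def win_in_k(n, k, p):
    """Probability of winning a first-to-n series in exactly k games."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


def crossover(n, k):
    """Value of p at which a win in k+1 games ties a win in k games."""
    return Fraction(n - 1, k)


def crossovers(n):
    """Crossover points for k = n, n+1, ..., 2n-2, as exact fractions."""
    return [crossover(n, k) for k in range(n, 2 * n - 1)]


if __name__ == "__main__":
    for k, p in zip(range(4, 7), crossovers(4)):
        # At the crossover the two probabilities agree exactly.
        print(k, p, win_in_k(4, k, p) == win_in_k(4, k + 1, p))
```

For a best-of-seven series this gives 3/4, 3/5 and 1/2, and whatever n is, the last crossover is always 1/2.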

But when p dips below 1/2, that's also when losses should become more likely than wins!

In particular, the ratio between the Phillies' probability of winning in 2n-1 games and that of losing in 2n-1 games is $P_{n,2n-1}(p)/Q_{n,2n-1}(p) = p/(1-p)$; if p < 1-p then winning in 2n-1 games can't be the most common outcome. At best, winning in 2n-1 games is as probable as winning in 2n-2 games... when p = 1/2, and at that moment losing in 2n-1 and losing in 2n-2 have the same probability.

Concretely, in a best-of-seven series you should predict that the Phillies:

win in four, if p > 3/4;

win in five, if 3/5 < p < 3/4;

win in six, if 1/2 < p < 3/5;

lose in six, five, or four in cases symmetric to the three above.
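The rule above can be sketched as a brute-force search over all eight outcomes. The function name `most_likely_outcome` is my own invention, and games are assumed independent. Note that exactly at a boundary value of p two or more outcomes tie, and this sketch simply returns whichever tied outcome it sees first.

```python
from math import comb


def most_likely_outcome(p, n=4):
    """Return ("win" or "lose", number of games) for the single most likely
    outcome of a first-to-n series, given per-game win probability p."""
    outcomes = {}
    for k in range(n, 2 * n):
        ways = comb(k - 1, n - 1)
        outcomes[("win", k)] = ways * p**n * (1 - p) ** (k - n)
        outcomes[("lose", k)] = ways * (1 - p) ** n * p ** (k - n)
    # Ties (which only happen at boundary values of p) go to the first key seen.
    return max(outcomes, key=outcomes.get)


if __name__ == "__main__":
    for p in (0.8, 0.7, 0.55, 0.45, 0.2):
        print(p, most_likely_outcome(p))
```

Away from the boundaries this never returns a seven-game outcome, which matches the thresholds in the list.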

If p = 1/2, then the probabilities of a win in six, a win in seven, a loss in six, and a loss in seven are all the same, 5/32 each.
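The four-way tie at p = 1/2 is easy to confirm with exact fractions. Again this is only a sketch, and `outcome_prob` is a name I've picked for illustration.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)


def outcome_prob(n, k, p):
    """Probability of winning a first-to-n series in exactly k games."""
    return comb(k - 1, n - 1) * p**n * (1 - p) ** (k - n)


# At p = 1/2 the win and loss probabilities coincide, so one table covers both.
table = {k: outcome_prob(4, k, half) for k in range(4, 8)}
for k, prob in table.items():
    print(k, prob)
```

The six- and seven-game entries both come out to 5/32, and the four entries add up to 1/2, with the other half belonging to the losses.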

The point here is that either type of seven-game series is never the sole most likely outcome in this model (although it may be in reality, because games aren't independent -- home-field advantage, who's starting that day, and so on enter into the picture), and that it almost never makes sense to predict a sweep (playoff teams will be evenly enough matched that the worse one should be able to beat the better one more than one-quarter of the time).

Yet four- and seven-game series happen. I'm not saying that these are ridiculously rare events, just that it doesn't make sense to predict them a priori. It's a bit surprising, though -- if you actually played all seven games, 4-3 would be the most common outcome for series that are nearly evenly matched -- but enough of those come from the team already down 4-2 winning the last game that you don't see that in the best-of-seven format.
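To see the gap between the two formats, here's a small sketch comparing the chance of a 4-3 split when all seven games are played with the chance that a best-of-seven series actually reaches a seventh game. Both function names are hypothetical, and games are assumed independent.

```python
from fractions import Fraction
from math import comb


def four_three_if_all_played(p):
    """Chance of a 4-3 split (either way) when all seven games are played."""
    q = 1 - p
    return comb(7, 4) * (p**4 * q**3 + q**4 * p**3)


def goes_to_seven(p):
    """Chance that a best-of-seven series reaches a seventh game,
    i.e. the teams split the first six games 3-3."""
    q = 1 - p
    return comb(6, 3) * p**3 * q**3


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(four_three_if_all_played(half))  # 35/64
    print(goes_to_seven(half))             # 5/16
```

For evenly matched teams the difference, 15/64, is the share of 4-3 splits that would come from series already decided before the seventh game.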

Realistically, though, a prediction of "[team] in 7" is just a sportswriter's way of signaling "I think this team is slightly better than its opponent", which is all it should be taken as.