FiveThirtyEight Baseball Division Champs Puzzle

Update: I’ve added a link to the Perl program I used to do these simulations.

Oliver Roeder presents a weekly puzzler on FiveThirtyEight, and this week it was a baseball-themed puzzle. Assume a sport (say, “baseball”) in which each team plays 162 games in a season. Also assume a “division” (e.g. the “AL East”) containing 5 teams, each of exactly equal skill. In other words, each team has exactly a 50% chance of winning any given game. The puzzle is to compute the expected number of wins for the division winner.

Interestingly, the problem is open to interpretation, and the result I get depends on what assumptions I make. My initial assumption was to treat each game for each team as a simple coin-flip. I ran 100,000 simulated “seasons”, getting an average of 88.4 wins for the division leader. But games have two teams, and who the opponent is could matter to this problem. In an extreme situation, the “coin flip” model could result in winning the division with 0 wins, in the highly improbable event that each team lost every game.
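For reference, here is a minimal Python sketch of that first coin-flip model. It is not the Perl program mentioned above, and the function names are my own, for illustration only. Each team’s 162 games are independent fair coin flips, so a team’s win total is just the number of 1 bits in a random 162-bit integer.

```python
import random

GAMES, TEAMS = 162, 5

def team_wins(rng: random.Random) -> int:
    """Wins for one team whose 162 games are independent fair coin flips."""
    # getrandbits(162) returns 162 independent fair bits; counting the 1s
    # gives a Binomial(162, 0.5) draw without looping over individual games.
    return bin(rng.getrandbits(GAMES)).count("1")

def average_champion_wins(seasons: int, seed: int = 1) -> float:
    """Average win total of the division leader over simulated seasons."""
    rng = random.Random(seed)
    total = sum(max(team_wins(rng) for _ in range(TEAMS)) for _ in range(seasons))
    return total / seasons

print(f"{average_champion_wins(100_000):.2f}")
```

With 100,000 seasons the printed average should land near the 88.4 reported above; a different seed gives a slightly different figure.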

Since I happened to have the 2016 MLB schedule available, I used it to determine each team’s opponents. This adds the constraint that in games between two teams in the same division, one team winning implies its opponent must lose. Doing this, I got an average of 88.8 wins for the division winner.

The third variant I tested imposed the strictest constraint: I assumed the five teams played games only among themselves (162 games spread over four opponents, so at least 40 against each). Thus every win for one team always means a loss for a division rival. This gave me an average of 89.3 wins for the division winner.
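A Python sketch of this all-division variant follows, again only as an illustration. The exact schedule is an assumption: 162 games over four opponents cannot be split perfectly evenly, so here every pair meets 40 times and the five “neighboring” pairs (i, i+1 mod 5) meet once more. That gives each team 40 + 40 + 41 + 41 = 162 games and 405 division games in all. Each head-to-head game is a single coin flip credited to one team and charged as a loss to the other.

```python
import itertools
import random

TEAMS = 5

def round_robin_schedule() -> dict:
    """Assumed 405-game schedule: every pair meets 40 times, and the five
    "neighboring" pairs (i, i+1 mod 5) meet once more, so each team plays
    40 + 40 + 41 + 41 = 162 games, all inside the division."""
    schedule = {pair: 40 for pair in itertools.combinations(range(TEAMS), 2)}
    for i in range(TEAMS):
        schedule[tuple(sorted((i, (i + 1) % TEAMS)))] += 1
    return schedule

def simulate_season(schedule: dict, rng: random.Random) -> list:
    """Win totals for one season of head-to-head coin flips."""
    wins = [0] * TEAMS
    for (a, b), games in schedule.items():
        # One coin flip per game: every win for a is a loss for b.
        a_wins = bin(rng.getrandbits(games)).count("1")
        wins[a] += a_wins
        wins[b] += games - a_wins
    return wins

def average_champion_wins(seasons: int, seed: int = 1) -> float:
    rng = random.Random(seed)
    schedule = round_robin_schedule()
    return sum(max(simulate_season(schedule, rng)) for _ in range(seasons)) / seasons

print(f"{average_champion_wins(100_000):.2f}")
```

Because every game is internal, the five win totals always add up to 405, which is exactly the constraint that separates this variant from the pure coin-flip model.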

There are other corner cases I can envision: for example, if two or more teams tie for the lead, there might be a one-game playoff (or some larger sequence of playoffs) to break the tie, but those games would count as “regular season” games.

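So I modified my program to track ties for the championship, and also to let me specify the number of games against division opponents in the schedule. Rerunning with different numbers of division games, I got this table when simulating 100,000 seasons for each schedule type. Each game between two division rivals is counted once as a division game, and the total counts every game involving at least one division team, so the total is 810 minus the number of division games: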

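| Total Games | Div Games | Avg Champ Wins | Sole Winner | 2 Way Tie | 3 Way Tie | 4 Way Tie | 5 Way Tie |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 810 | 0 | 88.41 | 91313 | 8059 | 591 | 37 | 0 |
| 760 | 50 | 88.50 | 91315 | 8086 | 578 | 21 | 0 |
| 710 | 100 | 88.62 | 91421 | 7975 | 575 | 28 | 1 |
| 660 | 150 | 88.72 | 91565 | 7851 | 552 | 31 | 1 |
| 620* | 190 | 88.83 | 91688 | 7733 | 557 | 22 | 0 |
| 610 | 200 | 88.83 | 91730 | 7717 | 526 | 27 | 0 |
| 560 | 250 | 88.93 | 91726 | 7718 | 528 | 27 | 1 |
| 510 | 300 | 89.06 | 91884 | 7599 | 499 | 18 | 0 |
| 460 | 350 | 89.17 | 91795 | 7679 | 496 | 30 | 0 |
| 405 | 405 | 89.27 | 92131 | 7418 | 434 | 15 | 2 |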

* In this case, I used the actual 2016 MLB schedule to determine games.

So what can we see? Rounding to one decimal place for expected wins, this later run replicated the findings from my earlier runs for the 0 division games, all division games, and actual MLB schedule cases.

The constraint that a win for one team implies a loss for a division rival has a real effect, not just on the average wins of the division winner, but also on how likely a tie for the division lead is. Ties for the lead are more likely when teams play fewer games against their division rivals: the more often one team’s win is another team’s loss, the less likely it is that two teams finish with the same above-average win total.
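The tie counts can be reproduced in outline with a sketch like the one below. The names are hypothetical, and the evenly split schedule is an assumption (it is neither the MLB schedule nor necessarily the one my program used): every pair of division rivals meets `games_per_pair` times, each team’s remaining games are independent coin flips against outside teams, and each season records how many teams share the top win total.

```python
import itertools
import random
from collections import Counter

GAMES, TEAMS = 162, 5

def even_schedule(games_per_pair: int) -> dict:
    """Hypothetical schedule: every pair of division rivals meets
    `games_per_pair` times (10 pairs, so 10 * games_per_pair division games)."""
    return {pair: games_per_pair for pair in itertools.combinations(range(TEAMS), 2)}

def flips(rng: random.Random, n: int) -> int:
    """Heads in n fair coin flips (0 when n == 0)."""
    return bin(rng.getrandbits(n)).count("1") if n > 0 else 0

def simulate_season(schedule: dict, rng: random.Random) -> list:
    wins = [0] * TEAMS
    played = [0] * TEAMS
    for (a, b), games in schedule.items():
        a_wins = flips(rng, games)          # head-to-head: a's wins are b's losses
        wins[a] += a_wins
        wins[b] += games - a_wins
        played[a] += games
        played[b] += games
    for t in range(TEAMS):
        # Remaining games are against outside teams: independent coin flips.
        wins[t] += flips(rng, GAMES - played[t])
    return wins

def tie_profile(schedule: dict, seasons: int, seed: int = 1):
    """Average champion wins, plus a Counter mapping the number of teams
    tied for first (1 = sole winner) to how many seasons ended that way."""
    rng = random.Random(seed)
    total, ties = 0, Counter()
    for _ in range(seasons):
        wins = simulate_season(schedule, rng)
        top = max(wins)
        total += top
        ties[wins.count(top)] += 1
    return total / seasons, ties

for per_pair in (0, 10, 20, 30):
    avg, ties = tie_profile(even_schedule(per_pair), 20_000)
    print(f"{10 * per_pair:3d} division games: avg champ wins {avg:.2f}, "
          f"ties {[ties[k] for k in range(1, TEAMS + 1)]}")
```

Only 20,000 seasons per row are used here to keep the pure-Python loop quick; the tie counts are noisy relative to the differences between rows, so more seasons (the table above used 100,000) are needed to see the trend clearly.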

The 5-way tie is extremely unlikely in any event, as it requires all five teams to finish with identical records (necessarily 81-81 when they play only each other). But I would expect it to be more common in the all-division-games scenario than in the 0-division-games one, because in the latter case the division’s combined record will usually not be .500, while in the former case it always is by definition.
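For the 0-division-games case the 5-way tie probability can be computed directly, because the five win totals are then independent Binomial(162, 0.5) draws and a tie can happen at any common total, not only 81. The short Python check below relies on that independence assumption only: P(tie) is the sum over k of P(X = k)^5. One team finishes exactly 81-81 a bit over 6% of the time, and summing over every possible common total gives roughly 7 chances in a million, i.e. well under one expected 5-way tie per 100,000 seasons, which fits the 0 in that row of the table.

```python
from math import comb

GAMES, TEAMS = 162, 5

# pmf[k] = P(a team wins exactly k of 162 fair coin-flip games)
pmf = [comb(GAMES, k) / 2**GAMES for k in range(GAMES + 1)]

# Independent teams: all five tie whenever they share any common total k.
p_five_way = sum(q**TEAMS for q in pmf)
p_five_way_at_81 = pmf[81] ** TEAMS   # the single most likely tie point

print(f"P(one team 81-81)       = {pmf[81]:.4f}")
print(f"P(5-way tie, any total) = {p_five_way:.2e}")
print(f"P(5-way tie at 81-81)   = {p_five_way_at_81:.2e}")
print(f"Expected per 100,000 seasons = {p_five_way * 100_000:.2f}")
```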

All of this is done via a simulation program. If we take the 0 division games assumption (i.e. each team’s games are all its own coin flips, independent of the results of any other team), each team’s win total follows a binomial distribution with n = 162 and p = 0.5. So there is probably an exact way to arrive at the numbers I show above for that case: compute the expected value of the highest of five independent draws from that binomial distribution. Adjusting for schedule constraints complicates the problem further from an analytical perspective, and since I already have a simulation program now, that’s good enough for me!
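That exact computation turns out to be short. For a nonnegative integer variable M, E[M] is the sum over k ≥ 0 of P(M > k), and for the maximum of five independent teams P(M ≤ k) = F(k)^5, where F is the Binomial(162, 0.5) CDF. A Python sketch, valid only for the independent (0 division games) model:

```python
from math import comb

def binomial_cdf(n: int, p: float) -> list:
    """cdf[k] = P(X <= k) for X ~ Binomial(n, p)."""
    cdf, running = [], 0.0
    for k in range(n + 1):
        running += comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf.append(running)
    return cdf

def expected_max(n: int = 162, teams: int = 5, p: float = 0.5) -> float:
    """E[max of `teams` independent Binomial(n, p) win totals], using
    E[M] = sum_{k=0}^{n-1} P(M > k) with P(M > k) = 1 - F(k)**teams."""
    cdf = binomial_cdf(n, p)
    return sum(1 - cdf[k] ** teams for k in range(n))

print(f"{expected_max():.2f}")
```

As a sanity check, `expected_max(teams=1)` should return the single-team mean of 81, up to floating-point rounding.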

The key point here is that when two division members play each other, one random event produces two linked results (a win for one and a loss for the other), but when a division team plays an outside team, there’s still one random event, yet it produces only one relevant result in this model. So having more games against division teams leads to a somewhat higher expected win total for the division winner.
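One way to quantify this, under the coin-flip assumptions: each shared game adds a win to one team’s total and a loss to the other’s, so two rivals who meet g times have win totals with correlation -g/162. A normal approximation then links that correlation to the champion’s expected wins. For five exchangeable normal totals with mean 81, standard deviation σ = √40.5 and pairwise correlation ρ, E[max] = 81 + σ·√(1 - ρ)·1.16296, where 1.16296 is the expected maximum of five independent standard normals. The sketch below assumes the division games are split evenly over the 10 pairs of teams; it is an approximation, not what my simulation does.

```python
import math

GAMES, TEAMS = 162, 5
E_MAX_5_STD_NORMALS = 1.16296   # expected maximum of 5 independent N(0, 1) draws

def pair_correlation(head_to_head: float) -> float:
    """Correlation of two teams' win totals when they meet `head_to_head` times.
    Each shared coin flip X adds X to one total and 1 - X to the other,
    contributing Cov(X, 1 - X) = -1/4; each total has variance 162/4."""
    return -head_to_head / GAMES

def approx_champion_wins(division_games: float) -> float:
    """Normal approximation to the champion's expected wins, assuming the
    division games are split evenly over the 10 pairs of teams."""
    rho = pair_correlation(division_games / 10)
    sigma = math.sqrt(GAMES / 4)
    # Exchangeable normals with pairwise correlation rho >= -1/(TEAMS - 1):
    # E[max] = mean + sigma * sqrt(1 - rho) * E[max of TEAMS iid N(0, 1)].
    return GAMES / 2 + sigma * math.sqrt(1 - rho) * E_MAX_5_STD_NORMALS

for d in (0, 100, 200, 300, 405):
    print(f"{d:3d} division games: rho {pair_correlation(d / 10):+.3f}, "
          f"approx champ wins {approx_champion_wins(d):.2f}")
```

Negative correlation pushes the teams’ totals apart, so the leader’s expected total rises as division games increase. For the division-game counts in the first table, this approximation comes within a few hundredths of a win of each simulated average.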

Extension Idea

Oliver also asks readers to come up with an “extension” of the problem. One idea I had: assume 4 of the 5 teams remain equal in skill (e.g. each has a 50/50 chance to win any game against the other such teams). How high an expected winning percentage would we need to give the 5th team against the other 4 so that we’d expect it to win (or tie for) the division title at least 50% of the time? What if we wanted the 5th team to win outright at least half the time?

The schedule format affects these questions too, so I’ve done two runs: one assuming no intradivision games, and the other assuming all games are intradivision (the two extremes from above).
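A Python sketch of how such runs might be set up, covering both schedule formats, is below. The names are hypothetical, team 0 plays the role of the stronger team, and the all-division schedule is an assumed round robin in which every pair meets 40 times and the pairs (i, i+1 mod 5) meet once more. In the no-intradivision format the best team wins each of its 162 games with probability p; in the all-division format p is its chance of beating each of the other four, and every other pairing is a coin flip.

```python
import itertools
import random

GAMES, TEAMS, BEST = 162, 5, 0   # team 0 is the hypothetical stronger team

def wins_in(rng: random.Random, games: int, p: float) -> int:
    """Wins in `games` independent games, each won with probability p."""
    if p == 0.5:
        return bin(rng.getrandbits(games)).count("1")  # fast path for coin flips
    return sum(rng.random() < p for _ in range(games))

def season(p_best: float, all_division: bool, rng: random.Random) -> list:
    if not all_division:
        # No intradivision games: each team's 162 games are independent.
        return [wins_in(rng, GAMES, p_best if t == BEST else 0.5) for t in range(TEAMS)]
    # Assumed 405-game round robin: 40 games per pair, 41 for pairs (i, i+1 mod 5).
    wins = [0] * TEAMS
    for a, b in itertools.combinations(range(TEAMS), 2):
        games = 41 if (b - a) in (1, TEAMS - 1) else 40
        p = p_best if a == BEST else 0.5     # a < b, so only a can be team 0
        a_wins = wins_in(rng, games, p)
        wins[a] += a_wins
        wins[b] += games - a_wins
    return wins

def best_team_title_rates(p_best, all_division, seasons=10_000, seed=1):
    """Fractions of seasons in which the best team wins or ties for first,
    and in which it wins outright."""
    rng = random.Random(seed)
    titles = outright = 0
    for _ in range(seasons):
        wins = season(p_best, all_division, rng)
        top = max(wins)
        if wins[BEST] == top:
            titles += 1
            if wins.count(top) == 1:
                outright += 1
    return titles / seasons, outright / seasons

for p in (0.50, 0.54):
    for all_div in (False, True):
        title, sole = best_team_title_rates(p, all_div)
        print(f"p={p:.2f} all_division={all_div}: title {title:.3f}, outright {sole:.3f}")
```

Only 10,000 seasons per setting are used here to keep the pure-Python loop quick; the runs reported in this post used 100,000.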

First, with no intradivision games, simulating 100000 seasons with different winning percentages for the best team, I get:

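| Best Pct | Div Games | Avg Champ Wins | Best Titles | Sole Winner | 2 Way Tie | 3 Way Tie | 4 Way Tie | 5 Way Tie |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.50 | 0 | 88.39 | 22015 | 18352 | 3270 | 376 | 17 | 0 |
| 0.51 | 0 | 88.78 | 28870 | 24638 | 3798 | 394 | 38 | 2 |
| 0.52 | 0 | 89.26 | 35897 | 31150 | 4335 | 391 | 21 | 0 |
| 0.53 | 0 | 89.87 | 44527 | 39494 | 4609 | 400 | 23 | 1 |
| 0.54 | 0 | 90.60 | 52603 | 47363 | 4845 | 375 | 20 | 0 |
| 0.55 | 0 | 91.44 | 60788 | 55635 | 4799 | 334 | 19 | 1 |
| 0.56 | 0 | 92.49 | 68675 | 63931 | 4474 | 259 | 11 | 0 |
| 0.57 | 0 | 93.62 | 75497 | 71321 | 3930 | 235 | 11 | 0 |
| 0.58 | 0 | 94.87 | 81547 | 77978 | 3389 | 166 | 14 | 0 |
| 0.59 | 0 | 96.20 | 86772 | 83818 | 2821 | 132 | 1 | 0 |
| 0.60 | 0 | 97.59 | 90501 | 88183 | 2220 | 95 | 3 | 0 |
| 0.75 | 0 | 121.48 | 100000 | 99999 | 1 | 0 | 0 | 0 |
| 0.90 | 0 | 145.77 | 100000 | 100000 | 0 | 0 | 0 | 0 |
| 1.00 | 0 | 162.00 | 100000 | 100000 | 0 | 0 | 0 | 0 |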

In these tables I’m now only showing how many times the “best” team wins outright or ties for the division lead, but the average wins column still refers to whichever team(s) won the division. That column is comparable to the first table, and we see that when the best team’s winning percentage is very close to 0.50, the champion’s average win total is only slightly higher. I included 0.50 as a control; this is effectively another run of the test above, and I’m getting similar results. But it is interesting to see that just a 1% increase in the best team’s expected winning percentage results in that team winning the championship much more often: each 1% step adds roughly 7 to 9 percentage points to its title share up to about 0.56, and the gains shrink after that.

From the above table, the best team wins or ties for the title at least half the time at 0.54 or higher, and wins outright at least half the time at 0.55 or higher. At very high levels, the champion’s wins are always the best team’s wins, and they converge to 162 times the winning percentage I assign the best team. At lower levels, the division winner’s winning percentage is higher than the percentage assigned to the best team.
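For the no-intradivision format, the best team’s title chances can also be computed exactly, since all five win totals are independent: the best team’s wins follow Binomial(162, p) and each rival’s follow Binomial(162, 0.5). Then P(best wins or ties) is the sum over k of P(B = k)·F(k)^4, and P(best wins outright) is the sum over k of P(B = k)·F(k - 1)^4, where F is a rival’s CDF. A Python sketch (hypothetical function names) that can be used to check where the simulated percentages cross 50%:

```python
from math import comb

GAMES = 162

def pmf(n: int, p: float) -> list:
    """pmf[k] = P(k wins in n independent games, each won with probability p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def best_team_exact(p_best: float, rivals: int = 4, n: int = GAMES):
    """Exact P(best team wins or ties for first) and P(it wins outright) when
    every team's games are independent (no intradivision games)."""
    best = pmf(n, p_best)
    cdf, running = [], 0.0
    for q in pmf(n, 0.5):          # rivals' CDF: cdf[k] = P(rival wins <= k)
        running += q
        cdf.append(running)
    # Wins or ties: every rival finishes with at most k wins.
    title = sum(best[k] * cdf[k] ** rivals for k in range(n + 1))
    # Outright: every rival finishes with at most k - 1 wins.
    outright = sum(best[k] * cdf[k - 1] ** rivals for k in range(1, n + 1))
    return title, outright

for p in (0.53, 0.54, 0.55, 0.56):
    title, outright = best_team_exact(p)
    print(f"{p:.2f}: title {title:.3f}, outright {outright:.3f}")
```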

Now I assume all 162 games for each team are division games (405 division games in total):

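| Best Pct | Div Games | Avg Champ Wins | Best Titles | Sole Winner | 2 Way Tie | 3 Way Tie | 4 Way Tie | 5 Way Tie |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.50 | 405 | 89.29 | 21672 | 18299 | 3059 | 297 | 16 | 1 |
| 0.51 | 405 | 89.33 | 29290 | 25234 | 3697 | 341 | 17 | 1 |
| 0.52 | 405 | 89.57 | 37866 | 33521 | 4012 | 317 | 15 | 1 |
| 0.53 | 405 | 89.96 | 46855 | 42183 | 4353 | 305 | 13 | 1 |
| 0.54 | 405 | 90.54 | 56273 | 51683 | 4322 | 260 | 7 | 1 |
| 0.55 | 405 | 91.35 | 65316 | 60984 | 4083 | 244 | 4 | 1 |
| 0.56 | 405 | 92.27 | 73455 | 69539 | 3735 | 179 | 2 | 0 |
| 0.57 | 405 | 93.40 | 80680 | 77374 | 3151 | 151 | 4 | 0 |
| 0.58 | 405 | 94.65 | 86212 | 83570 | 2532 | 108 | 2 | 0 |
| 0.59 | 405 | 96.01 | 90892 | 88896 | 1924 | 71 | 1 | 0 |
| 0.60 | 405 | 97.46 | 93970 | 92497 | 1430 | 42 | 1 | 0 |
| 0.75 | 405 | 121.52 | 100000 | 100000 | 0 | 0 | 0 | 0 |
| 0.90 | 405 | 145.81 | 100000 | 100000 | 0 | 0 | 0 | 0 |
| 1.00 | 405 | 162.00 | 100000 | 100000 | 0 | 0 | 0 | 0 |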

As before, the 0.50 case is consistent with the earlier run. But now that the teams play only inside the division, the best team wins the division more often for a given winning percentage. At 0.54, it wins (or ties) more than half the time, and also wins outright more than half the time, for the first time among the levels I’ve tried. Yet even here, a team expected to win 60% of its games against its opponents still fails to win or tie for the division title about 6% of the time. Luck can have a bigger impact than we usually think!