See the original blog post for details and history. Here's the short story: in my Statistical and Thermal Physics class, we want to use Monte Carlo simulations to generate brackets for March Madness. There are at least two obvious ways to go about this:

1. Make some function that tells us the chance that team A beats team B, then flip coins for each matchup. That gets you one bracket. Repeat 100,000 times and collect statistics. This is the way Nate Silver's 538.com handles simulations for basketball, elections, etc., and I should probably implement it (note to self/motivated students: it's as easy as just generating 100,000 new brackets at a given temperature).

2. Generate one bracket, then do a Monte Carlo walk through bracket space. This is tougher. We have to figure out how to make a move in bracket space, which is part of the fun of Monte Carlo simulations in general. To see how this is done, check out the code in Bracket.swap and Brackets.simulate.
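To make the "walk through bracket space" idea concrete, here is a minimal Metropolis sketch on a toy problem. It is not the code in Bracket.swap or Brackets.simulate: the state is just a list of independent games, a move flips one game's outcome, and the energy of a game is assumed to be minus the winner's win probability. The names `metropolis_accept` and `walk` and the win probabilities are made up for illustration. A real bracket move also has to propagate a changed winner into later rounds, which this toy ignores.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Accept downhill moves always, uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def walk(win_pct, temperature, steps, seed=0):
    """Metropolis walk over the outcomes of independent games.

    win_pct[i] is a hypothetical chance that the favorite wins game i.
    state[i] is True when the favorite wins game i.
    Returns the fraction of steps in which each favorite was winning.
    """
    rng = random.Random(seed)
    state = [True] * len(win_pct)
    favorite_wins = [0] * len(win_pct)
    for _ in range(steps):
        i = rng.randrange(len(win_pct))
        # Assumed energy: minus the win probability of whoever wins the game.
        e_old = -win_pct[i] if state[i] else -(1 - win_pct[i])
        e_new = -(1 - win_pct[i]) if state[i] else -win_pct[i]
        if metropolis_accept(e_new - e_old, temperature, rng):
            state[i] = not state[i]
        # Record the current state after every step, accepted or not.
        for j, favorite_won in enumerate(state):
            favorite_wins[j] += favorite_won
    return [count / steps for count in favorite_wins]

fractions = walk([0.9, 0.6], temperature=0.5, steps=20000)
print(fractions)
```

The heavier favorite should end up winning a larger share of the sampled states than the slight favorite, and both shares move toward one half as the temperature is raised.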

As you can tell, we take option 2 above. I've made things a bit nicer from a user standpoint this year; here's a walkthrough. First, load up our standard IPython setup.

If we had chosen option 1 at the top, we'd just flip coins with a given probability of winning. Here (and this may be a questionable decision), we need to set an overall temperature for our simulation. Intuitively, the higher the temperature, the closer we come to a random outcome. The lower the temperature, the closer we come to a "best seed always wins" bracket. If we're going to make sense of temperature, we should pick a reasonable energy function.
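One way to see the role of temperature is to look at a single, isolated game. The sketch below assumes the game has two states with energies -win_pct and -(1 - win_pct), each weighted by exp(-E/T); that energy choice mirrors the "minus win percentage" energy function at the end of this post, but treating a game in isolation is a simplification, and `boltzmann_win_prob` is a made-up helper name.

```python
import math

def boltzmann_win_prob(win_pct, temperature):
    """Equilibrium chance that the favored team wins one isolated game.

    Assumes two states with energies -win_pct and -(1 - win_pct),
    each weighted by exp(-E / T). Intended for win_pct >= 0.5.
    """
    gap = (2.0 * win_pct - 1.0) / temperature
    return 1.0 / (1.0 + math.exp(-gap))

# A team with a hypothetical 70% raw win chance, at several temperatures.
for temperature in (0.01, 0.1, 1.0, 10.0):
    print(temperature, round(boltzmann_win_prob(0.7, temperature), 3))
```

At low temperature the favorite wins essentially every time, and at high temperature the result approaches a coin flip, which matches the intuition above.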

Now, what should our actual temperature be? Historically, we know that an 8 seed vs. a 9 seed should essentially be a tossup. So, as a proxy here, we could just look at the chance of an 8 seed winning over a range of temperatures, and pick the point where it's pretty close to 0.5.
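The temperature scan can be sketched as follows. This is only an illustration: it uses a closed-form two-state estimate for one isolated game instead of running the simulation, and the raw 8-over-9 win probability (0.55), the temperature grid, and the tolerance are all made-up values.

```python
import math

def eight_seed_win_prob(win_pct, temperature):
    """Two-state Boltzmann estimate for a single isolated 8 vs. 9 game."""
    gap = (2.0 * win_pct - 1.0) / temperature
    return 1.0 / (1.0 + math.exp(-gap))

def pick_temperature(win_pct, temperatures, tolerance=0.02):
    """Return the lowest temperature whose win chance is within tolerance of 0.5."""
    for temperature in sorted(temperatures):
        if abs(eight_seed_win_prob(win_pct, temperature) - 0.5) <= tolerance:
            return temperature
    return None  # nothing on the grid is close enough to a tossup

temperatures = [0.25 * k for k in range(1, 41)]  # 0.25, 0.5, ..., 10.0
chosen = pick_temperature(0.55, temperatures)
print(chosen, eight_seed_win_prob(0.55, chosen))
```

Picking the lowest temperature that passes the tossup test keeps as much of the ranking information as possible for the other matchups.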

So what does a bracket happen to look like? Well, in the original blog post, I mentioned that we came up with two different ways to run brackets. It can take a while to run a bracket, so a fast way (implemented as runbracket2) is to run separate brackets for each of the individual regions, then take the winner from each region's most common bracket to form a Final Four. Below, we do just that, running 10,000 trials for each of the regions, and 1000 trials for the Final Four.
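The region-by-region shortcut described above boils down to "find the most frequently sampled bracket in each region, then read off its champion". Here is a small sketch of that step using collections.Counter. It is not the runbracket2 implementation: the bracket representation (a tuple of winners with the champion last), the region names, and the sample data are hypothetical.

```python
from collections import Counter

def most_common_bracket(samples):
    """Return the bracket (a tuple of winners, champion last) seen most often."""
    bracket, _count = Counter(samples).most_common(1)[0]
    return bracket

# Hypothetical sampled brackets for two tiny regions.
samples_by_region = {
    "north": [("A", "C", "A"), ("A", "D", "A"), ("A", "C", "A"), ("B", "C", "C")],
    "south": [("E", "G", "G"), ("E", "G", "G"), ("F", "G", "F")],
}

# The champion of each region's most common bracket advances.
region_winners = {
    region: most_common_bracket(samples)[-1]
    for region, samples in samples_by_region.items()
}
print(region_winners)
```

The teams in `region_winners` would then be fed into a separate, much smaller simulation for the last rounds.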

Let's visualize the results, then make a table of results, as per 538.

In [10]:

# Basic visualization
for region in 'midwest west east south'.split():
    MMMC.showstats(results[region], description=region, newfig=True)

Here you can see the classic reason to do Monte Carlo sampling in the first place: we're absolutely nowhere near fully sampling bracket space. You can tell this from each of the "Unique brackets over the trajectory" plots, none of which has leveled out.
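The quantity behind a "unique brackets over the trajectory" curve is just a running count of distinct states. A minimal sketch, assuming each sampled bracket can be reduced to something hashable (the string labels below are stand-ins, and `unique_over_trajectory` is a made-up name, not a function from the MMMC module):

```python
def unique_over_trajectory(trajectory):
    """Return the running count of distinct brackets seen so far."""
    seen = set()
    counts = []
    for bracket in trajectory:
        seen.add(bracket)
        counts.append(len(seen))
    return counts

# Hypothetical trajectory; repeated labels mean the walk revisited a bracket.
trajectory = ["b1", "b2", "b1", "b3", "b3", "b4"]
counts = unique_over_trajectory(trajectory)
print(counts)
```

If the last entries of `counts` are still climbing, the walk is still finding new brackets, so the sample covers only a small part of the space.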

Further, you can see that the strength distribution in each of the regions is pretty different. Check out the energy distribution histograms for visual proof.

And here's our set of tabulated results. Note that things look a little funny for the Final Four: runbracket2 runs the Final Four separately from the rest of the tourney. So, only four teams are allowed to have Championship and Win percentages above zero (the Final Four percentages are taken from the first chunk of the tourney).

In [11]:

h = HTML(MMMC.maketable(results))
h

Out[11]:

| Team | Region | Rank | 2nd Round | 3rd Round | Sweet 16 | Elite 8 | Final 4 | Championship | Win |
|---|---|---|---|---|---|---|---|---|---|
| Kentucky | midwest | 1 | 100.0 | 65.74 | 49.18 | 36.16 | 25.29 | 55.3 | 34.5 |
| Wisconsin | west | 1 | 100.0 | 63.38 | 44.28 | 30.04 | 19.44 | 44.7 | 24.2 |
| Virginia | east | 2 | 100.0 | 64.6 | 41.15 | 27.68 | 17.86 | 51.3 | 22.4 |
| Utah | south | 5 | 100.0 | 58.29 | 35.86 | 20.96 | 12.19 | 48.7 | 18.9 |
| Arizona | west | 2 | 100.0 | 62.35 | 42.23 | 28.64 | 18.74 | 0.0 | 0.0 |
| Villanova | east | 1 | 100.0 | 65.73 | 44.05 | 29.84 | 18.61 | 0.0 | 0.0 |
| Duke | south | 1 | 100.0 | 66.03 | 41.64 | 25.2 | 15.05 | 0.0 | 0.0 |
| Gonzaga | south | 2 | 100.0 | 61.67 | 37.12 | 23.51 | 14.22 | 0.0 | 0.0 |
| Kansas | midwest | 2 | 100.0 | 58.6 | 33.71 | 19.16 | 10.31 | 0.0 | 0.0 |
| Oklahoma | east | 3 | 100.0 | 60.74 | 34.96 | 18.34 | 10.01 | 0.0 | 0.0 |
| Northern Iowa | east | 5 | 100.0 | 65.23 | 37.58 | 19.72 | 9.97 | 0.0 | 0.0 |
| Notre Dame | midwest | 3 | 100.0 | 57.37 | 32.47 | 18.57 | 9.65 | 0.0 | 0.0 |
| Iowa St. | south | 3 | 100.0 | 57.01 | 33.22 | 17.32 | 9.07 | 0.0 | 0.0 |
| Baylor | west | 3 | 100.0 | 56.3 | 33.23 | 16.77 | 8.79 | 0.0 | 0.0 |
| North Carolina | west | 4 | 100.0 | 61.15 | 35.63 | 17.73 | 8.7 | 0.0 | 0.0 |
| SMU | south | 6 | 100.0 | 53.2 | 28.16 | 14.61 | 7.62 | 0.0 | 0.0 |
| Wichita St. | midwest | 7 | 100.0 | 51.59 | 28.31 | 15.41 | 7.54 | 0.0 | 0.0 |
| Georgetown | south | 4 | 100.0 | 60.99 | 28.74 | 15.08 | 7.37 | 0.0 | 0.0 |
| Michigan St. | east | 7 | 100.0 | 55.17 | 25.73 | 14.43 | 7.17 | 0.0 | 0.0 |
| Iowa | south | 7 | 100.0 | 51.15 | 25.99 | 13.76 | 6.63 | 0.0 | 0.0 |
| Louisville | east | 4 | 100.0 | 56.62 | 29.3 | 14.05 | 6.58 | 0.0 | 0.0 |
| West Virginia | midwest | 5 | 100.0 | 55.77 | 31.54 | 13.3 | 6.47 | 0.0 | 0.0 |
| Providence | east | 6 | 100.0 | 53.03 | 27.3 | 12.58 | 6.19 | 0.0 | 0.0 |
| Butler | midwest | 6 | 100.0 | 53.61 | 27.03 | 13.43 | 6.05 | 0.0 | 0.0 |
| Texas | midwest | 11 | 100.0 | 46.39 | 23.9 | 12.46 | 5.99 | 0.0 | 0.0 |
| Xavier | west | 6 | 100.0 | 50.71 | 25.68 | 12.46 | 5.95 | 0.0 | 0.0 |
| Ohio St. | west | 10 | 100.0 | 51.62 | 22.26 | 11.19 | 5.29 | 0.0 | 0.0 |
| Arkansas | west | 5 | 100.0 | 57.67 | 28.22 | 11.83 | 5.14 | 0.0 | 0.0 |
| Maryland | midwest | 4 | 100.0 | 55.25 | 25.84 | 11.73 | 5.14 | 0.0 | 0.0 |
| VCU | west | 7 | 100.0 | 48.38 | 21.78 | 11.24 | 5.06 | 0.0 | 0.0 |
| Oklahoma St. | west | 9 | 100.0 | 50.19 | 21.58 | 11.31 | 4.84 | 0.0 | 0.0 |
| San Diego St. | south | 8 | 100.0 | 47.92 | 23.04 | 10.71 | 4.81 | 0.0 | 0.0 |
| Oregon | west | 8 | 100.0 | 49.81 | 20.61 | 10.42 | 4.73 | 0.0 | 0.0 |
| Cincinnati | midwest | 8 | 100.0 | 49.74 | 19.41 | 9.93 | 4.72 | 0.0 | 0.0 |
| Davidson | south | 10 | 100.0 | 48.85 | 23.46 | 10.7 | 4.54 | 0.0 | 0.0 |
| LSU | east | 9 | 100.0 | 53.31 | 24.25 | 11.58 | 4.53 | 0.0 | 0.0 |
| North Carolina St. | east | 8 | 100.0 | 46.69 | 21.93 | 10.18 | 4.35 | 0.0 | 0.0 |
| Stephen F. Austin | south | 12 | 100.0 | 41.71 | 21.31 | 9.08 | 4.3 | 0.0 | 0.0 |
| UCLA | south | 11 | 100.0 | 46.8 | 21.71 | 9.51 | 4.22 | 0.0 | 0.0 |
| St. John's | south | 9 | 100.0 | 52.08 | 22.83 | 9.84 | 4.17 | 0.0 | 0.0 |
| Purdue | midwest | 9 | 100.0 | 50.26 | 19.41 | 8.86 | 3.76 | 0.0 | 0.0 |
| Buffalo | midwest | 12 | 100.0 | 44.23 | 23.13 | 8.76 | 3.55 | 0.0 | 0.0 |
| Georgia | east | 10 | 100.0 | 44.83 | 19.41 | 8.76 | 3.53 | 0.0 | 0.0 |
| Dayton | east | 11 | 100.0 | 46.97 | 22.59 | 8.55 | 3.52 | 0.0 | 0.0 |
| Mississippi | west | 11 | 100.0 | 49.29 | 21.45 | 9.01 | 3.51 | 0.0 | 0.0 |
| Indiana | midwest | 10 | 100.0 | 48.41 | 22.29 | 8.57 | 3.14 | 0.0 | 0.0 |
| Valparaiso | midwest | 13 | 100.0 | 44.75 | 19.49 | 7.86 | 3.1 | 0.0 | 0.0 |
| Georgia St. | west | 14 | 100.0 | 43.7 | 19.64 | 6.96 | 2.67 | 0.0 | 0.0 |
| Harvard | west | 13 | 100.0 | 38.85 | 18.65 | 6.98 | 2.36 | 0.0 | 0.0 |
| Wofford | west | 12 | 100.0 | 42.33 | 17.5 | 6.84 | 2.33 | 0.0 | 0.0 |
| New Mexico St. | midwest | 15 | 100.0 | 41.4 | 15.69 | 6.3 | 2.26 | 0.0 | 0.0 |
| UC Irvine | east | 13 | 100.0 | 43.38 | 19.57 | 6.64 | 2.17 | 0.0 | 0.0 |
| Northeastern | midwest | 14 | 100.0 | 42.63 | 16.6 | 6.1 | 2.17 | 0.0 | 0.0 |
| UAB | south | 14 | 100.0 | 42.99 | 16.91 | 6.09 | 2.02 | 0.0 | 0.0 |
| Wyoming | east | 12 | 100.0 | 34.77 | 13.55 | 5.06 | 1.8 | 0.0 | 0.0 |
| Albany | east | 14 | 100.0 | 39.26 | 15.15 | 4.86 | 1.59 | 0.0 | 0.0 |
| Eastern Washington | south | 13 | 100.0 | 39.01 | 14.09 | 5.06 | 1.42 | 0.0 | 0.0 |
| Coastal Carolina | west | 16 | 100.0 | 36.62 | 13.53 | 4.85 | 1.42 | 0.0 | 0.0 |
| North Dakota St. | south | 15 | 100.0 | 38.33 | 13.43 | 4.5 | 1.35 | 0.0 | 0.0 |
| Belmont | east | 15 | 100.0 | 35.4 | 13.71 | 4.8 | 1.28 | 0.0 | 0.0 |
| Texas Southern | west | 15 | 100.0 | 37.65 | 13.73 | 3.73 | 1.03 | 0.0 | 0.0 |
| Robert Morris | south | 16 | 100.0 | 33.97 | 12.49 | 4.07 | 1.02 | 0.0 | 0.0 |
| Hampton | midwest | 16 | 100.0 | 34.26 | 12.0 | 3.4 | 0.86 | 0.0 | 0.0 |
| Lafayette | east | 16 | 100.0 | 34.27 | 9.77 | 2.93 | 0.84 | 0.0 | 0.0 |

Given those concerns, let's just run runbracket1 and look at the results.

Huh. Well, that's not exactly my pick for best bracket, but there you have it.

Exercises for the reader:

Make a better energy function, using maybe a weighted average of different rankings. I slurped in KenPom and Jeff Sagarin, but you could add your own.

Come up with a better "hometown wins" version. For example, explicitly check for KU and tweak the rankings.

In [46]:

from MarchMadnessMonteCarlo import RankingsAndStrength as RAS

strength = RAS.kenpom['Pyth']
jsstrength = RAS.sagarin['Rating']

def weighted_KU_energy_game(winner, loser):
    if winner == 'Kansas':
        win_pct = 0.99
    elif loser == 'Kansas':
        win_pct = 0.01
    else:
        A, B = strength[winner], strength[loser]
        # see http://207.56.97.150/articles/playoff2002.htm
        kenpom = (A - A*B)/(A + B - 2*A*B)
        A, B = jsstrength[winner]/100, jsstrength[loser]/100
        A, B = min(A, 0.9999), min(B, 0.9999)
        sagarin = (A - A*B)/(A + B - 2*A*B)
        win_pct = 0.70*kenpom + 0.30*sagarin
    return -win_pct
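The ratio that appears twice in the function above, (A - A*B)/(A + B - 2*A*B), has the same form as what is usually called the log5 formula. Pulling it into a helper makes its basic properties easy to check. The helper name `log5` is my own choice for this sketch, and it assumes both ratings are already scaled to lie strictly between 0 and 1.

```python
def log5(a, b):
    """Chance that a team rated a beats a team rated b, with 0 < a, b < 1."""
    return (a - a * b) / (a + b - 2 * a * b)

# Against an average (0.5) opponent, a team simply wins at its own rating.
print(log5(0.8, 0.5))
# Evenly rated teams are a tossup.
print(log5(0.6, 0.6))
# The two teams' chances add up to one.
print(log5(0.7, 0.4) + log5(0.4, 0.7))
```

The denominator is zero when both ratings equal 1, which is a good reason to clamp the rescaled ratings just below 1, as the function above does with 0.9999.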