Bash Shell Script: Building a Better March Madness Bracket

Last year, I wrote an article for Linux Journal titled "Building Your March
Madness Bracket."
My article arrived just in time for the "March Madness" college
basketball series. You see, I don't follow college basketball (or really, any
sports at all), but I do like to participate in office pools. And every year, it
seems my office likes to fill out the March Madness brackets to see who can
best predict the outcomes.

Since I don't follow college basketball, I am not a good judge of which teams
might perform better than others. But fortunately, the NCAA ranks the teams
for you, so I wrote a Bash script that filled out my March Madness bracket for
me. Since teams were ranked 1–16, I used a "D16" method borrowed from tabletop
gaming. I thought this was an elegant method to predict the outcomes.

But there's a bug in my script. Specifically, there's an error in a key
assumption for the D16 algorithm, so I'd like to correct that with an improved
March Madness script here.

Let's Review What Went Wrong

My Bash script predicted the outcome of a match by comparing the ranking of
each team. So, you can throw a D16 "die" to determine if team A wins and
another D16 "die" to determine if team B loses, or vice versa. If the two
throws agree, you know the outcome of the game: team A wins and team B loses,
or team A loses and team B wins.

I asserted that a #1 team should be a strong team, so I assumed the #1 team had
15 out of 16 "chances" to win, and one out of 16 "chances" to
lose. Without any other inputs, the #1 ranked team would win if its D16 throw is
two or greater, and the #1 team could lose only if the D16 value was one. With
that assumption, I wrote this function:

In the guesswinner function, each D16 roll generates a random number 1–16. If
the rank of team A is "rankA" and the rank of team B is "rankB," and the D16
roll for team A is "A" and the roll for team B is "B," the function tests two
D16 rolls like this:

If A greater than rankA (team A wins) and B less than or equal to rankB
(team B loses), then team A wins.

If A less than or equal to rankA (team A loses) and B greater than rankB (team B
wins), then team B wins.
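
The original listing is not reproduced here, so the following is only a minimal sketch of the two tests described above, not the script as published. The function name guesswinner_d16 and the choice to roll again whenever the two throws disagree are assumptions made for illustration.

```bash
#!/bin/bash
# Sketch of the D16 logic described in the text (assumed reconstruction).

# guesswinner_d16 rankA rankB: print the rank of the predicted winner.
guesswinner_d16() {
  local rankA=$1 rankB=$2 A B
  while true; do
    # Each D16 throw is a random integer from 1 to 16.
    A=$(( RANDOM % 16 + 1 ))
    B=$(( RANDOM % 16 + 1 ))
    if (( A > rankA && B <= rankB )); then
      echo "$rankA"   # team A wins and team B loses
      return
    elif (( A <= rankA && B > rankB )); then
      echo "$rankB"   # team A loses and team B wins
      return
    fi
    # Assumption: the throws disagree, so roll both dice again.
  done
}

guesswinner_d16 8 9
```

Calling guesswinner_d16 8 9 prints either 8 or 9, depending on the throws.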

But look at what happens if team A is ranked #1 and team B is ranked #16. Team
A will always win:

A roll 1–16 will have a 15 out of 16 chance to be greater than 1 (team A
wins), and a 1–16 roll will always be less than or equal to 16 (team B loses).

A roll 1–16 will have a 1 out of 16 chance to be less than or equal to 1
(team A loses) but a 1–16 roll will never be greater than 16 (team B wins).

There's no scenario in which a rank #16 team B can win over a rank #1 team A.
It's a foregone conclusion that in any match of a rank 1 team versus a rank 16
team, the rank 1 team will always win. That's not right. There should be a
slim chance for the rank 16 team to win over the rank 1 team.
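
One way to check this claim is to enumerate every possible pair of D16 throws rather than rely on random trials. The sketch below applies the two tests from the text to all 256 combinations for a rank 1 team against a rank 16 team. The counter names are illustrative.

```bash
#!/bin/bash
# Enumerate all 16 x 16 pairs of D16 throws for rank 1 versus rank 16.
rankA=1
rankB=16
winsA=0
winsB=0
undecided=0

for (( A = 1; A <= 16; A++ )); do
  for (( B = 1; B <= 16; B++ )); do
    if (( A > rankA && B <= rankB )); then
      winsA=$(( winsA + 1 ))          # team A wins and team B loses
    elif (( A <= rankA && B > rankB )); then
      winsB=$(( winsB + 1 ))          # team A loses and team B wins
    else
      undecided=$(( undecided + 1 ))  # the two throws disagree
    fi
  done
done

echo "team A wins: $winsA, team B wins: $winsB, undecided: $undecided"
# prints: team A wins: 240, team B wins: 0, undecided: 16
```

No combination of throws produces a win for team B, which confirms the bug.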

A Better Algorithm

Instead of a "static" D16 die, we need a custom "die" that has faces relative
to the chance of each team to win. Let's consider this simple algorithm to
generate a custom die:

Team A gets a=16-rankA+1 sides.

Team B gets b=16-rankB+1 sides.

Under this assumption, a rank 1 team versus a rank 16 team would generate a
die with a=16-1+1=16 "team A" sides and b=16-16+1=1 "team B" sides, resulting
in a 17-sided die. Similarly, a more even match, such as a rank 8 team versus a
rank 9 team, would create a die with a=16-8+1=9 "team A" sides and b=16-9+1=8
"team B" sides, resulting in another 17-sided die.

It's not always a 17-sided die, however. A rank 1 team against a rank 9 team
would generate a die with a=16-1+1=16 "team A" sides and b=16-9+1=8 "team B"
sides, or a 24-sided die.
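
The arithmetic for these three matchups can be checked with a few lines of shell. The helper name sides is hypothetical. It simply applies the 16-rank+1 formula given above.

```bash
#!/bin/bash
# sides rank: number of die faces for a team of the given rank (1 to 16).
sides() {
  echo $(( 16 - $1 + 1 ))
}

for pair in "1 16" "8 9" "1 9"; do
  read -r rankA rankB <<< "$pair"
  a=$(sides "$rankA")
  b=$(sides "$rankB")
  echo "rank $rankA vs rank $rankB: a=$a b=$b total=$(( a + b ))"
done
# prints:
# rank 1 vs rank 16: a=16 b=1 total=17
# rank 8 vs rank 9: a=9 b=8 total=17
# rank 1 vs rank 9: a=16 b=8 total=24
```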

In Bash, you can simulate a virtual custom "die" through a file. It's simple
enough to generate a file with the correct number of "team A" sides and "team
B" sides. If you already have calculated a and b as above, you can write a
file like this:
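
The original listing is missing from this copy, so here is one possible sketch. It assumes the die lives in a temporary file created with mktemp and that each face is recorded as the rank of the team it belongs to. Both choices are illustrative.

```bash
#!/bin/bash
# Build a custom die for a rank 1 team against a rank 9 team.
rankA=1
rankB=9
a=$(( 16 - rankA + 1 ))   # number of "team A" sides
b=$(( 16 - rankB + 1 ))   # number of "team B" sides

die=$(mktemp)             # assumed: a temporary file holds the die faces

{
  # One line per face, labeled with the rank of the team it belongs to.
  for (( i = 0; i < a; i++ )); do echo "$rankA"; done
  for (( i = 0; i < b; i++ )); do echo "$rankB"; done
} > "$die"

faces=$(wc -l < "$die")   # a + b lines in total
echo "faces on the die: $faces"
rm -f "$die"
```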

Picking a random value from this file is as easy as randomizing or "shuffling"
the file, then selecting the first line. On Linux systems, you can use the
shuf(1) program from GNU coreutils to generate a random permutation of lines
from a file. This randomizes whatever data you feed into shuf. Once shuffled,
you can easily select the first line of the randomized output using
head:
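
As a small illustration, assuming a die file with three "A" faces and one "B" face (the contents are made up for the example), the shuffle-and-pick step might look like this:

```bash
#!/bin/bash
# Create a tiny example die: three faces for A, one face for B.
die=$(mktemp)
printf '%s\n' A A A B > "$die"

# Shuffle the lines, then keep only the first line of the shuffled output.
winner=$(shuf "$die" | head -n 1)
echo "$winner"

rm -f "$die"
```

GNU shuf also accepts -n 1 to print a single randomly chosen line, which would make the head step unnecessary.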

That simple expression becomes the heart of the improved March Madness script.
It operates the way I want it to: a rank 1 team almost always (but not
always) will win over a rank 16 team, yet more closely matched games, such as a rank
8 team versus a rank 9 team or a rank 2 team against a rank 3 team, will
present more even odds.

Building a Better March Madness Script

The above can be wrapped into a new guesswinner function to predict a contest
between two teams, whose ranks are passed as arguments. The function generates
the virtual "die" and uses that to guess a winner:
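
The published function is not included in this copy. The sketch below shows one way it could be written under the rules described so far. As a simplification, it pipes the die faces straight into shuf instead of writing them to a file first.

```bash
#!/bin/bash
# guesswinner rankA rankB: print the rank of the predicted winner.
# Sketch only; requires shuf from GNU coreutils.
guesswinner() {
  local rankA=$1 rankB=$2
  local a=$(( 16 - rankA + 1 ))   # "team A" sides on the die
  local b=$(( 16 - rankB + 1 ))   # "team B" sides on the die
  local i

  {
    for (( i = 0; i < a; i++ )); do echo "$rankA"; done
    for (( i = 0; i < b; i++ )); do echo "$rankB"; done
  } | shuf | head -n 1            # roll the die: shuffle, take the first face
}

guesswinner 1 16
```

With a 17-sided die holding sixteen "1" faces and a single "16" face, the call above usually prints 1 but occasionally prints 16.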

Since the March Madness brackets are always played in order, you can write a
playbracket function to run through the different iterations of the bracket.
Winners from round one are carried into rounds two and three to select an
ultimate winner for the bracket in round four:
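
Here is a rough sketch of such a function, not the published listing. It assumes the usual first-round pairings within a region (1 vs 16, 8 vs 9, 5 vs 12, 4 vs 13, 6 vs 11, 3 vs 14, 7 vs 10, 2 vs 15) and repeats a guesswinner sketch so the block runs on its own. The output format is invented for the example.

```bash
#!/bin/bash
# guesswinner rankA rankB: roll a custom die and print the winning rank.
guesswinner() {
  local rankA=$1 rankB=$2 i
  local a=$(( 16 - rankA + 1 )) b=$(( 16 - rankB + 1 ))
  {
    for (( i = 0; i < a; i++ )); do echo "$rankA"; done
    for (( i = 0; i < b; i++ )); do echo "$rankB"; done
  } | shuf | head -n 1
}

# playbracket region: play four rounds and print the winners of each.
playbracket() {
  local region=$1
  # Assumed bracket order: adjacent entries play each other.
  local teams=(1 16 8 9 5 12 4 13 6 11 3 14 7 10 2 15)
  local round=1 winners i w

  echo "$region:"
  while (( ${#teams[@]} > 1 )); do
    winners=()
    for (( i = 0; i < ${#teams[@]}; i += 2 )); do
      w=$(guesswinner "${teams[i]}" "${teams[i+1]}")
      winners+=("$w")
    done
    echo "  round $round: ${winners[*]}"
    teams=("${winners[@]}")       # winners advance to the next round
    round=$(( round + 1 ))
  done
}

playbracket Midwest
```

Because adjacent winners meet in the next round, keeping the winners in order is enough to carry the bracket forward without tracking matchups separately.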

Finally, you need only call the playbracket function for each of the four
regions. You are left with the "Final Four" with the winners of each bracket,
but I'll leave the final determination of those contests for you to resolve on
your own:

In this sample run, my script selects team 1 in the Midwest, team 2 in the
East, team 8 in the West, and team 4 in the South. More important, note that
the rank 16 team won the first round against the rank 1 team in the East
bracket. This could not happen in the script I posted last year. My bug is
fixed!

The point of using a script to build your NCAA March Madness basketball bracket
isn't to take away the fun of the game. On the contrary, since I don't have
much familiarity with basketball, building my bracket programmatically allows
me to participate in the office basketball pool. It's entertaining without
requiring much familiarity with sports statistics. My script gives me a reason
to follow the games, but without the emotional investment if my bracket
doesn't perform well—and that's good enough for me.