The Question

What is the best way of ranking ultimate teams based on their
performance? This fundamental question can be asked either in the context of a whole
season or for a single tournament. While USA Ultimate's use of a season-based ranking
algorithm has led to quite some discussions,
in this blog post I will focus on the simpler case of ranking teams at
tournaments, where factors such as
home-field advantage and changing rosters over a season play a
smaller role.
One possibility currently used in
the Swissdraw
format is that teams earn "swiss points" at the completion of
every game. The number of swiss points awarded depends on the point
differential (also called margin) of the game. So far, we have been
using the following table to convert point differentials into swiss
points:

There are a number of advantages in using the innovative Swissdraw format
compared to the more common pool format. All teams can potentially
play each other. The Swissdraw format is designed so that teams of
similar strength match up quickly and within a few rounds, the ranking
of the teams represents their level of play. This guarantees
attractive games against different opponents of comparable strength.
While I am personally a big fan of the Swissdraw format, I will argue that
the currently used system to rank teams has the problem that teams are
awarded the same number of swiss points for a certain point
differential, independent of the strength of the opponent. This
drawback has particularly bad consequences in big divisions with a
wide spread in level of play, where in later rounds of Swissdraw, teams
can still make big jumps in the ranking by winning or losing by big
margins. In this post, I would like to suggest another method of
ranking the teams which will make the Swissdraw format work even
better.

Power Rankings

This method was suggested as early as 1976 by Leake in the
context of ranking American college football teams. A nice
mathematical explanation of it can be found in Chapter 4 of Ken Massey's
undergrad thesis.

The basic assumption is that every team can be assigned a numerical
value representing its strength (or power) so that the point
differential in a game is the difference in strength of the
participating teams. For example, if Team Alice wins against Team
Danny with a score of 15-10, this result could be explained by
assigning a strength of +2.5 to Team Alice and a strength of −2.5 to
Team Danny. (Of course, any two numbers with a difference of +5 would
work, but let us try to keep the numbers as small as possible in
absolute value.)

If there are more teams with many games played among them, it will
become more difficult to assign strengths to the teams, but we can
nevertheless try to optimize these numbers so that they fit the
outcomes as well as possible. In fact, this problem is well-studied in
the area of mathematical regression.
In more mathematical terms, we assume that the game outcome $y_{ij}$
between teams $i$ and $j$ is the difference of their respective strengths
$\beta_i - \beta_j$ plus some error term $\epsilon_{i,j}$, where the error terms are independent
and identically normally distributed across games. Expressed in
matrix form, we can write

$$y = X\beta + \epsilon$$

where $y$ is a
column vector containing all game margins and each row of the matrix $X$
corresponds to one game: it is all-zero except for a $+1$ in column $i$ and a $-1$ in
column $j$. The column vector $\beta$ denotes the strengths of
the teams and $\epsilon$ is the column vector of the
normally distributed error terms. There exist efficient
methods (such as the least-squares method)
to compute a strength vector $\beta$ that minimizes the sum of the squared
errors:

$$\|y - X\beta\|^2.$$
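To make the fitting step concrete, here is a minimal Python sketch of such a least-squares fit. It is only an illustration under stated assumptions: the names power_ratings and solve_linear_system and the (winner, loser, margin) game format are made up for this example, and the sketch assumes that all teams are connected through some chain of games. It solves the normal equations and picks the solution whose strengths sum to zero, which keeps the numbers small. A numerical library routine for least squares would serve the same purpose.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def power_ratings(games, n_teams):
    """Least-squares strengths from (winner, loser, margin) triples.

    Assumption: every team is linked to every other team through some
    chain of games; otherwise the system below is singular.
    """
    # Normal equations X^T X beta = X^T y. X^T X counts games per team
    # (diagonal) and games between pairs (off-diagonal, negated);
    # X^T y is each team's total point differential.
    A = [[0.0] * n_teams for _ in range(n_teams)]
    b = [0.0] * n_teams
    for i, j, margin in games:
        A[i][i] += 1.0
        A[j][j] += 1.0
        A[i][j] -= 1.0
        A[j][i] -= 1.0
        b[i] += margin
        b[j] -= margin
    # Strengths are only determined up to a common shift. Adding 1 to
    # every entry selects the solution whose strengths sum to zero.
    for i in range(n_teams):
        for j in range(n_teams):
            A[i][j] += 1.0
    return solve_linear_system(A, b)


# One game: team 0 (Alice) beats team 1 (Danny) 15-10, a margin of +5.
print(power_ratings([(0, 1, 5)], n_teams=2))
```

For the single Alice-Danny game this yields strengths of +2.5 and −2.5, the values used in the example above.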

Example

Let us consider a simple example based on a
made-up tournament with six teams. The game outcomes reflect the
imaginary fact that their level of play is about evenly spread out
among the top three teams and the bottom three teams where Team Alice is the best and Team Fred the worst team.
The results of the first round are as follows:

resulting in the following strength values and swiss points:

PowerRank denotes the team's rank according to their strength, whereas
SwissRank is the team's rank according to the amount of Swiss
points earned so far.

All game outcomes can be perfectly explained with those strengths.
After the first round (and assuming no prior knowledge of the strength
of these teams), it is impossible to compare Team Alice with Team Bob,
because there is no connection between them yet.

After a second round with the following results:

we can compute the following strength values, Swiss points and
corresponding ranks:

Notice that Team Charlie would be ranked first if sorted according to
swiss points, as it had the largest margin of +5 in the second round
among the winners of the first round. Analogously, Team Danny would be
ranked last. However, the new method re-evaluates all previous games
from the point of view of the latest results, with the (correct) outcome that
assigning the biggest strength to Alice gives the best explanation of
the results. In fact, the power ranks after only two rounds already reflect the order of
teams we had in mind when making up the results.

The seventh column (entitled "predicted margin") is the difference in
current strength of the teams involved in a particular game which can be
interpreted as the margin predicted by the strengths. The values in the
last column are the squared differences between the actually observed and
predicted margins. If such a value is high, the model could not predict
this game outcome well. Hence, big values stand for surprising game
outcomes. The least-squares procedure tries to find strength
values that minimize the sum of the surprise values in the last
column.
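As a small illustration of how the predicted margin and the surprise value relate, here is a Python sketch. The function name evaluate_games, the game format and the strength values are hypothetical and are not taken from the example tournament.

```python
def evaluate_games(games, strengths):
    """Return one row per game, most surprising game first.

    games: list of (team_i, team_j, observed_margin) with team indices
    strengths: list of strength values, indexed by team
    """
    rows = []
    for i, j, observed in games:
        predicted = strengths[i] - strengths[j]   # predicted margin
        surprise = (observed - predicted) ** 2    # squared difference
        rows.append((i, j, observed, predicted, surprise))
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Hypothetical strengths for three teams and two observed games.
strengths = [3.0, 1.0, -4.0]
games = [(0, 1, 5), (1, 2, 3)]
for row in evaluate_games(games, strengths):
    print(row)
```

In this made-up data the first game is the bigger surprise: the strengths predict a margin of 2 but the observed margin is 5.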

Playing one more round with results:

gives:

By now the teams are clearly separated in strength. However, notice
that the Swiss points still do not reflect the strengths of the teams
correctly. Sorting by number of wins as the first criterion (and Swiss points
as the second) would put Alice
in first place (she is the only one with three wins), but it would still place Team Eve ahead of Team Danny
(both have one win and two losses).

Let us examine the example graphically in the following
chart. Clicking on series in the legend will toggle its
visibility. Clicking on particular points in the chart will show
detailed explanations how that strength was obtained.

[Chart: strength of each team over Rounds 1 to 3. Displayed values: Team Alice 3 (1st), Team Bob 2.33 (2nd), Team Charlie 2.33 (2nd), Team Danny -2.33 (4th), Team Eve -2.33 (4th), Team Fred -3 (6th).]

For comparison, let us consider the graph of average Swiss points:

[Chart: average Swiss scores of each team over Rounds 1 to 3. Displayed values: Team Alice 18 (2nd), Team Bob 17 (3rd), Team Charlie 20 (1st), Team Danny 10 (6th), Team Eve 13 (4th), Team Fred 12 (5th).]

The Swiss score does not reflect the
correct order of the teams, neither after round 2 nor after round
3. In contrast, the power ranking "gets it right" already after two
rounds. The frequent crossings of the lines indicate that teams make
jumps in their placement, as illustrated here (click on series in the
legend to toggle visibility of the power ranks):

[Chart: rank of each team over Rounds 1 to 3, with one series per team for the Swiss rank and one for the power rank. Displayed Swiss ranks: Team Alice 2, Team Bob 3, Team Charlie 1, Team Danny 6, Team Eve 4, Team Fred 5.]

Here are the final evaluations of the games, based on the teams'
strengths after Round 3, sorted by the
upset/surprise value.

The first line can be read as follows: according to the strengths computed
from all game results, the biggest surprise of all games happened
in the round-2 game, where Team Charlie won against Team Danny with a
margin of +5 where the model predicted a margin of only +3.73.

Equipped with all this knowledge you can dive into the power scores for
Windmill Windup and Wisconsin Swiss provided in the Further Analysis section at the end.

Conclusion

There exists a wide variety
of sports
ranking systems. Most of these systems do not reveal the details
of their algorithms. The power-rating system presented here is a very
basic variant that I suggest using for ranking teams, e.g. in the
Swissdraw phase of an ultimate tournament. As outlined in Chapter 4 of
Massey's
thesis, the system could be extended in various ways to account for
things like home-field advantage, blow-out scores etc. As illustrated in the
example above, it converges faster to the real ranking of the teams than the swiss-point system.

Here is a list of pros and cons compared to the currently used
swiss-point system:

Pros:

Your strength depends on the performance of your opponents. A
large win against a strong opponent counts more than against a weak one.

Converges faster to the "real" ranking, smaller jumps in rankings
from one round to the next.

Strengths say more about teams than swiss points. For example, the
difference in strength of two teams directly predicts the game outcome and point
differential. This also allows us to get a sense of "surprising"
results, information that might be of interest to spectators at a tournament
or when reporting about it afterwards.

Cons:

Difficult to understand

Games of previous rounds are "re-evaluated", you never have a
certain amount of points for sure.

Your strength depends on the performance of your opponents.

I think that the first concern can be mitigated by providing
interactive graphs like the ones above to help explain to the teams how their
strength was computed. The other two disadvantages are inherent to the
system.

I am very curious to hear what you think about the suggested power-ratings in
Ultimate. Do you see more advantages and disadvantages? Please leave
your comments below.

Further Analysis

To see how these power rankings apply to events in the past, please take a look at some in-depth analysis for 5 popular tournaments:

Christian Schaffner has been managing the Swissdraw schedule and scores
since 2009 at Windmill Windup, Europe's largest grass tournament. Since
then, he has been involved in promoting and advancing the Swissdraw
format for Ultimate tournaments. In his everyday life, he is a
researcher in quantum cryptology at the University of Amsterdam.

This post is geared towards those of you who are curious about Swiss style tournaments, and want to learn more about Leaguevine's new algorithm. For even more information on the swiss format, Christian from the Windmill Windup has put together a great wiki page.

For the recent Madison Swiss Tournament, the tournament director Chris Olig decided to use Leaguevine exclusively for the score reporting. A swiss format is totally different from a regular Ultimate tournament, and thus the regular pools and bracket scheduling techniques do not apply. Nonetheless, we decided to build a swiss tournament generator into Leaguevine. We used an innovative approach to build this scheduler, so I thought I'd share it here.

If you do a simple Google search or look through one of several lists of swiss tournament scheduling programs out there, you'll find that there is no shortage in this arena due to the abundance of chess, go, and card tournaments using this format. Swiss scheduling is clearly a difficult problem and is one that a lot of people attempt to build programs for. My first instinct was thus to use an existing program and just manually add a bunch of games to Leaguevine each round during the tournament. Yeah, I know, keeping track of games on both Leaguevine and an external scheduler sounds awful. What makes this approach even worse is that, when I looked into these existing programs, they tended to do a very poor job of scheduling because they did not ensure that there were no rematches. Thus, we built our own.

Before I explain our solution, I should briefly explain how a swiss tournament works. Each round, all of the teams are matched up with another team based on a pairing method that the tournament director chooses. The winners of that round receive points, which are used to rank the teams. Typically, a team is awarded one point (we'll refer to this as a "swiss point") for a win, but there also exist variations based on point differential. If multiple teams have the same number of swiss points, the tiebreaker is usually how many swiss points each team's opponents have scored, which essentially rewards teams for their strength of schedule.
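A minimal Python sketch of this ranking rule, assuming the simplest variant (one swiss point per win, ties broken by the sum of the opponents' swiss points). The function name swiss_standings and the (winner, loser) result format are made up for illustration and are not Leaguevine's code.

```python
def swiss_standings(results):
    """Rank teams from a list of (winner, loser) game results.

    One swiss point per win; ties are broken by the total swiss
    points of a team's opponents (strength of schedule).
    """
    points = {}
    opponents = {}
    for winner, loser in results:
        points[winner] = points.get(winner, 0) + 1
        points.setdefault(loser, 0)
        opponents.setdefault(winner, []).append(loser)
        opponents.setdefault(loser, []).append(winner)

    def sort_key(team):
        opponent_points = sum(points[o] for o in opponents[team])
        return (points[team], opponent_points)

    return sorted(points, key=sort_key, reverse=True)


# Two rounds with four hypothetical teams.
results = [("A", "D"), ("B", "C"),   # round 1
           ("A", "B"), ("C", "D")]   # round 2
print(swiss_standings(results))
```

Here B and C both have one win, but B has faced stronger opponents (A and C) than C has (B and D), so B is ranked ahead of C.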

To schedule a round of matchups, the teams are first separated into groups based on either their win/loss record or swiss points. Unless point differential is taken into account, the groups will likely be the same for win/loss record and for swiss points. If point differential is taken into account, then the groups are formed based on win/loss record and the point differential (aka swiss points) determines the team's ranking within that group. The teams will then play the other teams in this group. If a group has an odd number of teams, a team must be added to or removed from that group. For deciding which team plays which other team within the group, there are four common group pairing methods.

The difficulty in scheduling occurs when we introduce two almost universally accepted constraints:

No team should play another team more than once during the tournament

No team should have more than one bye during the tournament

Because of these constraints, the matchups generated by a tournament scheduler using any of the group pairing techniques can violate one or both of them. Almost every piece of existing software I've seen simply points out which matchups violate the constraints, but leaves it entirely up to the tournament director to resolve the problems. There does appear to be one program out there that resolves these conflicts, but I didn't notice it until well after I finished coding the algorithm up. Luckily it only took a couple days!

I've seen many articles explaining how to resolve any conflicts with these constraints. They tend to list a sequence of steps for either changing the matchups until you find something that works or reordering the standings until you find something that works. However, at the end of these steps usually comes a line that says "this approach usually works but doesn't work all of the time." Another problem I see with these approaches is that it is very difficult for a person to manually change a matchup and have that change be the best possible change. There is just too much room for error.

Leaguevine's Algorithm

The approach taken by the Leaguevine scheduler is much more mathematically sound. First, the Leaguevine scheduler builds a set of "weights" that determines a cost penalty for each team playing against each other team for that given round. The penalty is zero for matchups between sets of teams that are supposed to play each other (according to whatever group pairing algorithm is used) and whose matchups are not rematches. The penalty increases for matchups that are not ideal according to the pairing algorithm.

Let's take a look at a sample vector of weights for a given team in a given round. Let's say there are currently 8 teams, and fold pairing (aka slaughter pairing) is used (FYI, slide pairing was used for Wisconsin Swiss). If the teams have an initial seeding of team 1 being ranked 1st through team 8 being ranked 8th, team 1's weight vector for this first round will simply be [100000000, 6, 5, 4, 3, 2, 1, 0]. We see that the first preference for team 1 is to play team 8, its second preference is team 7, its third is team 6, and so on. The penalty for team 1 playing itself is set outrageously high to ensure that this never happens. For this same round, team 2's weight vector will be [6, 100000000, 4, 3, 2, 1, 0, 1]. Since team 2 should ideally play team 7, the weight for that matchup is 0, meaning it is team 2's first preference.
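The two vectors above follow a simple rule: the weight is the distance, in seeds, between an opponent and the ideal opponent. Here is a short Python sketch of that rule for a single group under fold pairing. The function name fold_weight_vector is made up for this example, and the real scheduler may compute its weights differently.

```python
SELF_PENALTY = 100_000_000  # the "outrageously high" weight for playing yourself


def fold_weight_vector(rank, n_teams):
    """Weights for the team seeded `rank` (1-based) under fold pairing."""
    ideal = n_teams + 1 - rank  # seed 1 plays seed n, seed 2 plays seed n-1, ...
    weights = [abs(opponent - ideal) for opponent in range(1, n_teams + 1)]
    weights[rank - 1] = SELF_PENALTY
    return weights


print(fold_weight_vector(1, 8))
print(fold_weight_vector(2, 8))
```

For 8 teams this reproduces the two weight vectors quoted above for team 1 and team 2.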

After all of these weight vectors are created for a given round, each corresponding pair of weights is added to make the matrix of weights symmetrical. In other words, the weight for team 1 playing team 6 will be added to the weight for team 6 playing team 1 and vice versa. Finally, these weights are squared for added effect. For this first round, the matchups will come out to be 1 vs 8, 2 vs 7, 3 vs 6, and 4 vs 5.

Now that we have a symmetric matrix, this matrix represents an undirected graph where each node is a team and each edge carries a weight. Finding the ideal solution then boils down to finding the set of edges that splits this graph into pairs of nodes and has the smallest possible combined weight. Thankfully, this is a well studied problem in graph theory. Leaguevine uses a minimum weight maximum matching algorithm to solve this problem and arrive at the final matchups. Thanks to Abraham Flaxman, this algorithm has already been coded up in Python!
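To illustrate the whole pipeline for the first round, here is a self-contained Python sketch. It builds the fold-pairing weights, symmetrizes and squares them, and then finds the cheapest way to pair everyone up. Note the swap: instead of the matching algorithm that Leaguevine uses, this sketch tries every possible pairing, which is only practical for a small, even number of teams. All function names are made up for illustration.

```python
SELF_PENALTY = 100_000_000


def fold_cost_matrix(n_teams):
    """Symmetrized, squared fold-pairing weights (0-based seeds)."""
    # Raw preference of seed i for opponent j: distance from the ideal
    # opponent, which under fold pairing is seed n-1-i.
    raw = [[SELF_PENALTY if i == j else abs(j - (n_teams - 1 - i))
            for j in range(n_teams)]
           for i in range(n_teams)]
    # Add each pair of weights to make the matrix symmetric, then square.
    return [[(raw[i][j] + raw[j][i]) ** 2 for j in range(n_teams)]
            for i in range(n_teams)]


def min_weight_perfect_matching(cost, teams=None):
    """Brute-force search over all pairings; assumes an even team count."""
    if teams is None:
        teams = tuple(range(len(cost)))
    if not teams:
        return 0, []
    first, rest = teams[0], teams[1:]
    best_total, best_pairs = None, None
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        total, pairs = min_weight_perfect_matching(cost, remaining)
        total += cost[first][partner]
        if best_total is None or total < best_total:
            best_total, best_pairs = total, [(first, partner)] + pairs
    return best_total, best_pairs


total, pairs = min_weight_perfect_matching(fold_cost_matrix(8))
print([(a + 1, b + 1) for a, b in pairs])  # back to 1-based seeds
```

With 8 seeded teams and no games played yet, the cheapest pairing has total weight 0 and matches 1 vs 8, 2 vs 7, 3 vs 6, and 4 vs 5, as stated above.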

Okay, so now that we've generated the initial matchups for this 8 team tournament, how do we generate them for the second round? First, we should note there will be four 1-0 teams, and four 0-1 teams. We can assume that there is some tie breaker that ranks the teams within these two groups. The default tie breaker for Leaguevine is Victory Point scoring conceived by Christian Schaffner (edit: Christian tells me it was actually created by Michael Cummings and/or his Australian Ultimate buddy). Further, if there are still ties after sorting by the number of wins and the number of victory points, the next tie breaker is the team's strength of schedule determined by the sum of all previous opponents' victory points.

So getting back to scheduling matchups for these teams after the first round is over, we assume the teams are again ranked 1-8 and we will continue using the fold pairing method. The ideal matchups will be 1 vs 4, 2 vs 3, 5 vs 8, and 6 vs 7. The weight vector for team 1 will look like [100000000, 2, 1, 0, 23, 22, 21, 20]. The weights against teams 2-4 are self-explanatory. The weights against the teams in the 2nd group were calculated by determining what the weight would be if team 1 were moved down as the top seed in the 2nd group, and adding a penalty of 20 for each group it had to move down, which in this case was just one. As a second example, the weight vector for team 2 would be [2, 100000000, 0, 1, 23, 22, 21, 20]. After calculating all of these vectors, they are again run through the minimum weight maximum matching algorithm and the output is the ideal matchups for the next round.
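Here is a hedged Python sketch of how such grouped weight vectors could be computed. It covers only what the paragraph above describes: weights within a team's own group and weights against lower groups. How a team is weighted against a higher group is not described here, so the sketch leaves those entries unset (None). The names group_weight_vector and GROUP_PENALTY are made up for illustration.

```python
SELF_PENALTY = 100_000_000
GROUP_PENALTY = 20  # added once per group a team would have to move down


def group_weight_vector(team, groups):
    """Fold-pairing weights for `team` given ranked groups.

    groups: list of lists of 0-based team indices, best group first,
    each group sorted by rank. Entries for higher groups stay None.
    """
    n_teams = sum(len(group) for group in groups)
    weights = [None] * n_teams
    own = next(k for k, group in enumerate(groups) if team in group)
    for k, group in enumerate(groups):
        if k < own:
            continue  # moving up is not covered by this sketch
        if k == own:
            # Fold pairing within the group: top plays bottom, and so on.
            ideal = len(group) - 1 - group.index(team)
        else:
            # Treated as the top seed of the lower group, the team
            # would ideally play that group's bottom seed.
            ideal = len(group) - 1
        for position, opponent in enumerate(group):
            weights[opponent] = abs(position - ideal) + GROUP_PENALTY * (k - own)
    weights[team] = SELF_PENALTY
    return weights


groups = [[0, 1, 2, 3], [4, 5, 6, 7]]  # four 1-0 teams, four 0-1 teams
print(group_weight_vector(0, groups))
print(group_weight_vector(1, groups))
```

For the two groups of four above, this reproduces the weight vectors quoted for team 1 and team 2.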

There are a couple of other finer points of interest that make the lives of schedulers difficult if they are doing this by hand. First, if there is an odd number of teams, then some groups will be odd, and on top of that there is still a chance that teams will have to switch groups to avoid rematches or double byes. The Leaguevine algorithm makes these groups even by creating groups of the same record starting with the highest ranked group, and if a group size is odd, the highest ranked team outside that group is then added to the group. This continues until all groups have been created. To ensure that teams do not play the same team twice and do not have more than one bye in the tournament, a large number (100000) is added to the already calculated weight of any such matchup.
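The two rules in this paragraph can be sketched in Python as follows. This is only an illustration under assumptions: teams arrive already sorted by rank (best first, so records never increase down the list), byes are not modeled, and the function names are made up.

```python
REMATCH_PENALTY = 100_000


def make_even_groups(ranked_teams, wins):
    """Group teams by record, topping up odd groups from below.

    ranked_teams: team ids sorted best first
    wins: dict mapping team id to number of wins
    """
    groups = []
    remaining = list(ranked_teams)
    while remaining:
        record = wins[remaining[0]]
        size = 0
        while size < len(remaining) and wins[remaining[size]] == record:
            size += 1
        if size % 2 == 1 and size < len(remaining):
            size += 1  # pull in the highest ranked team outside the group
        groups.append(remaining[:size])
        remaining = remaining[size:]
    return groups


def add_rematch_penalties(cost, already_played):
    """Add a large penalty to the weight of every pairing already played."""
    for i, j in already_played:
        cost[i][j] += REMATCH_PENALTY
        cost[j][i] += REMATCH_PENALTY
    return cost


wins = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1, "F": 0}
print(make_even_groups(["A", "B", "C", "D", "E", "F"], wins))
```

In this hypothetical example, A is alone at 2 wins, so B is pulled up to join it. The remaining three 1-win teams then form an odd group, so F is pulled up to complete it.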

Determining optimal matchups as rigorously as this by hand is nearly impossible and certainly not practical, which gave me the motivation to build this algorithm. Because Leaguevine's internal framework had already been built, and a Python version of the maximum matching algorithm was available for free, the entire process of devising this algorithm and coding it up took only two days, which I am really happy about.

We were very pleased with how well it performed at Wisconsin Swiss, as all of the teams kept telling us how difficult and evenly matched all of their games were after the 2nd round. In fact, if you look at the results, 11 of the 15 games in the 5th round were "close" if we define close to mean 13-8 or closer. We were really happy with this considering the pool of teams ranged from Drag'n Thrust who finished 3rd at Nationals last year all the way to MUFA recreational pickup teams that were not playing in the competitive division.

One Final Note

While all of this scheduling might sound complicated, from a user's perspective it couldn't be easier. Since Leaguevine takes care of all of this programmatically, all a user has to do is click "create next round" and wait a few seconds.

Wisconsin Swiss in Madison, WI will be the first tournament to use Leaguevine's scheduler and score reporting to fully manage the tournament. Several tournaments have used it in the past as an aid, but Chris "Bus" Olig of Midwest Ultimate is the first to use its full functionality. By using this innovative score reporting system, absolutely everything will be taken care of using smartphones!

For players/captains/coaches, this means you will never have to walk back to tournament central to report scores. All of the score reporting will be done on your smartphone, and you will be expected to update the scores of your games multiple times per game. To update a score on your smartphone, visit m.leaguevine.com, create an account, and simply find and update your game. Leaguevine Mobile is a simple web application that runs in your browser and thus you do not need to download anything. As you and others update scores, you'll be able to see the live results coming in from other fields. Your schedule is posted on m.leaguevine.com as well. Because each round depends on the previous round's results, the schedule for the next round will be posted 10 minutes before game time. Leaguevine Mobile will work on almost any smartphone from mid 2009 or later. If no one on your team has a smartphone (iPhone, Android, Palm Pixie, Blackberry 6, Windows Mobile 7), please stop by tournament central before games start and we will find a solution for you.

For the tournament directors, this means they will never have to input any scores. Scheduling will be as easy as clicking "Add a round" once each round completes.

The Wisconsin Swiss tournament format uses 5 rounds of a Swiss format then has bracket play with the seedings for the brackets determined by the Swiss games. The format is inspired by the popular Windmill Windup Tournament and the specifics for the Wisconsin Swiss Saturday scheduling can be seen on the Standings page. Players/captains/coaches do not need to know how the scheduling works because we will generate it automatically for you, but the information is there if you are curious.

Here is the tournament format specific to Wisconsin Swiss:

All games to 13, win by 2.

Each team has one time out per half as well as a floater.

Games last 90 minutes, with 1 hour and 45 minutes between start times. Games will begin at 9:00, 10:45, 12:30, and 2:15 on both days.

The soft cap is at 80 minutes and the hard cap is at 90 minutes.

Schedules for each round of play will be posted 5 minutes after all teams have submitted their scores. Because generating schedules relies on teams entering scores, please enter your scores as often as possible (preferably each point) at m.leaguevine.com and once a game is done, check the "Final Score" box when you submit it.

There will be 5 rounds of Swiss play. Four will take place on Saturday and the final round of Swiss play will take place on Sunday.

The teams ranked 1-8 after Swiss play will move on to the Swiss Bracket (1st place).

The teams ranked 9-16 after Swiss play will move on to the Sharp Cheddar Bracket (9th place).

The teams ranked 17-24 after Swiss play will move on to the Mild Cheddar Bracket (17th place).

The teams ranked 25-32 after Swiss play will move on to the Aged Cheddar Bracket (25th place).

Winners of each bracket will receive a wheel of cheese. The top three teams in the tournament will receive monetary prizes.