The ATP Knows How to Seed

Next week, Tourneygeek will be at the Western and Southern Open, one of the more important tennis tournaments leading up to the U.S. Open. I’m planning to get to Cincinnati early enough to attend the “draw party”, so I’ll presumably be among the first to know how the luck of the draw affects the chances of the various participants.

In preparation for this glad event, I’ve been polishing up my tourney simulator so that I can again report the result of the draw as the difference between each player’s expectation before and after the draw is made.

It’s too soon to do that now. There will, in all likelihood, be a few late scratches that significantly change the environment, and the betting odds I use as an indicator of true skill need some time to settle down.

But perhaps it’s worth revisiting the distinctive way tennis tournaments are seeded, as set out in the Association of Tennis Professionals (ATP) Rulebook, and showing how this affects expectations.

Note: This post, and the next, appeared briefly with incorrect simulation analysis based on a couple of misconceptions – I had conflated the fifth and sixth seeding tiers, and failed to realize that seeding was based on a more current version of the ATP points list than the one used to determine the direct acceptances. I regret these errors.

Here’s the bracket structure for the Western and Southern: W&S. You can see the distinctive seeding pattern by looking at the seeding markers, in green.

The first thing to note is that only 16 of the 56 entrants are seeded at all. The top eight seeds get first-round byes. They’re arranged in four tiers: {1, 2, 3-4, 5-8}. The top two seeds go to specific lines, but the rest do not: each of the other tiers is drawn, in random order, onto the lines reserved for that tier. Thus, the three seed does not automatically go to a line in the same half of the draw as the two seed, but is just as likely to go to the line in the upper half of the draw normally reserved for the four seed. Likewise, seeds five through eight go to the lines customarily reserved for those seeds, but within those lines they can be in any order.
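To make the tier mechanism concrete, here is a minimal Python sketch. It assumes the 56-player draw can be modelled as a 64-line bracket with eight byes, and that the reserved seed lines come from a generic recursive bracket ordering rather than from the ATP Rulebook’s own line numbering; `bracket_order`, `tiered_lines` and `TOP_TIERS` are hypothetical names.

```python
import random

# Tiers for the eight bye-receiving seeds, as described above.
TOP_TIERS = [(1, 1), (2, 2), (3, 4), (5, 8)]


def bracket_order(size: int) -> list[int]:
    """Seed number on each line of a power-of-two bracket (generic ordering).

    Each doubling pairs seed s with its complement n + 1 - s,
    so seeds 1 and 2 can only meet in the final.
    """
    order = [1]
    while len(order) < size:
        n = 2 * len(order)
        order = [s for seed in order for s in (seed, n + 1 - seed)]
    return order


def tiered_lines(size: int, tiers, rng: random.Random) -> dict[int, int]:
    """Map seed -> 1-based line, drawing each tier in random order onto its lines."""
    standard = {seed: line for line, seed in enumerate(bracket_order(size), start=1)}
    placement = {}
    for lo, hi in tiers:
        seeds = list(range(lo, hi + 1))
        lines = [standard[s] for s in seeds]  # the lines reserved for this tier
        rng.shuffle(seeds)                    # random order within the tier
        placement.update(zip(seeds, lines))
    return placement


draw = tiered_lines(64, TOP_TIERS, random.Random(2013))
for seed in sorted(draw):
    print(f"seed {seed} -> line {draw[seed]}")
```

In this model seeds 1 and 2 always sit on lines 1 and 33, while seeds 3 and 4 split lines 17 and 49 (one in each half) at random, and seeds 5 to 8 are shuffled over lines 9, 25, 41 and 57.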

The next two tiers {9-12, 13-16} don’t get byes, but are spaced so as to get some protection from other seeds. No two seeded players will meet each other until the third round. But again, the fifth and sixth tiers are drawn in random order. And the other 40 players don’t get seeded at all.
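The third-round claim can be checked under the same hypothetical 64-line model. In a power-of-two bracket, two lines can first meet in the round given by the bit length of the XOR of their zero-based positions; this sketch applies that rule to every pair of the 16 seed lines. Because tiered seeding only permutes seeds among those same lines, the answer holds for any tiered draw.

```python
from itertools import combinations


def bracket_order(size: int) -> list[int]:
    """Seed number on each line of a power-of-two bracket (generic ordering)."""
    order = [1]
    while len(order) < size:
        n = 2 * len(order)
        order = [s for seed in order for s in (seed, n + 1 - seed)]
    return order


def meeting_round(line_a: int, line_b: int) -> int:
    """Earliest round in which two 0-based lines of a power-of-two bracket can meet."""
    return (line_a ^ line_b).bit_length()


# Assumption: the 56-player draw is 64 lines, with the lines of seeds 57-64 as byes.
order = bracket_order(64)
seed_lines = [line for line, seed in enumerate(order) if seed <= 16]
earliest = min(meeting_round(a, b) for a, b in combinations(seed_lines, 2))
print("earliest possible meeting between seeds: round", earliest)
```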

What effect does this have on the prospects for individual seeds? The graph below reports simulator results for 500,000 trials of each of three possible seeding patterns. It is a semi-log plot of each player’s money expectation against seed number, from 1 to 56. For the purposes of these simulations, I’m assuming that the seeding is perfect – that the number one seed really is the best player in the tourney, the two seed second, and so forth all the way down to the 56th player being the worst. This will not be the case in the real tourney, in which Novak Djokovic, a likely top pick of the punters, is seeded in the fifth tier, where he doesn’t get a bye. But we’ll show the effects of imperfect seeding in a subsequent post.
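For readers who want to experiment, here is a toy Monte Carlo sketch in the same spirit. It is not the simulator behind the graph: the 16-player field, the Elo-style win probability, the linear rating ladder (best player first, matching the perfect-seeding assumption) and the prize table are all hypothetical, and its "tiered" pattern seeds only {1, 2, 3-4}.

```python
import random

SIZE = 16                                                      # hypothetical small field
RATING = {p: 2000 - 25 * (p - 1) for p in range(1, SIZE + 1)}  # player p is the p-th best
PRIZE = [1, 2, 4, 8, 16]                                       # hypothetical payout by matches won


def bracket_order(size):
    """Seed number on each line of a power-of-two bracket (generic ordering)."""
    order = [1]
    while len(order) < size:
        n = 2 * len(order)
        order = [s for seed in order for s in (seed, n + 1 - seed)]
    return order


def win_prob(a, b):
    """Elo-style chance that player a beats player b (an assumed model)."""
    return 1 / (1 + 10 ** ((RATING[b] - RATING[a]) / 400))


def draw(pattern, rng):
    """Return the player on each line for 'blind', 'full' or 'tiered' seeding."""
    standard = bracket_order(SIZE)
    if pattern == "full":
        return standard
    players = list(range(1, SIZE + 1))
    if pattern == "blind":
        rng.shuffle(players)
        return players
    lines = [None] * SIZE                  # 'tiered': seed only {1}, {2}, {3, 4}
    for tier in ([1], [2], [3, 4]):
        slots = [i for i, s in enumerate(standard) if s in tier]
        members = tier[:]
        rng.shuffle(members)               # random order within the tier
        for i, p in zip(slots, members):
            lines[i] = p
    free = [i for i in range(SIZE) if lines[i] is None]
    rest = players[4:]                     # everyone else is unseeded
    rng.shuffle(rest)
    for i, p in zip(free, rest):
        lines[i] = p
    return lines


def play(lines, rng):
    """Play one knockout bracket; return {player: matches won}."""
    wins = {p: 0 for p in lines}
    alive = list(lines)
    while len(alive) > 1:
        winners = []
        for a, b in zip(alive[::2], alive[1::2]):
            w = a if rng.random() < win_prob(a, b) else b
            wins[w] += 1
            winners.append(w)
        alive = winners
    return wins


def expectations(pattern, trials=10_000, seed=0):
    """Average prize per player over many independent draws and tourneys."""
    rng = random.Random(seed)
    total = {p: 0.0 for p in range(1, SIZE + 1)}
    for _ in range(trials):
        for p, w in play(draw(pattern, rng), rng).items():
            total[p] += PRIZE[w]
    return {p: t / trials for p, t in total.items()}


for pattern in ("blind", "tiered", "full"):
    e = expectations(pattern)
    print(pattern, " ".join(f"{e[p]:.2f}" for p in range(1, SIZE + 1)))
```

Every simulated tourney pays out the same total purse, so the three rows differ only in how that money is shared out; raising `trials` tightens the estimates at the cost of run time.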

As we might expect, any form of seeding sends money to the higher-ranked players at the expense of the lower-ranked ones. Thus, fairness (C) is enhanced, and fairness (B) compromised. The fairness (C) figures (lower is better) for the three seeding patterns are:

25.75 for an unseeded (blind draw) tourney;
17.76 for a tourney run, as the W&S is, under ATP rules; and
17.28 for a tourney strictly seeded in all 56 places.
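As a quick arithmetic check on how close the ATP scheme comes to full seeding, the three figures can be compared directly. This sketch treats differences in fairness (C) as if they were on a linear scale, which is an assumption.

```python
# Fairness (C) figures quoted above for the three seeding patterns.
unseeded, atp_tiered, fully_seeded = 25.75, 17.76, 17.28

# Share of the full-seeding improvement (over a blind draw) captured by the
# ATP's tiered scheme, treating fairness (C) as a linear scale.
share = (unseeded - atp_tiered) / (unseeded - fully_seeded)
print(f"ATP tiers capture {share:.1%} of the gain from seeding all 56 places")
```

On these numbers the tiered scheme captures about 94.3% of the improvement that strict seeding of all 56 places would deliver.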

Plotting the money expectations on a semi-log scale shows that the blue line (unseeded, blind draw) is exceedingly regular. This is, intuitively, the right relationship between seeding and expectation, at least from a fairness (C) perspective: there’s a simple, positive relationship between expectation and skill.

In contrast, the green line (fully seeded) wobbles a good deal, showing seeding waves. It’s better to be seeded 18 or 19 than it is to be 17. It’s better to be anywhere from 34 to 44 than it is to be 32 or 33. The reason that these particular seeds are so undesirable is that they put the player in line for an early encounter with the 1 seed. And these are just the extreme cases for which the positional disadvantage overcomes a difference in skill levels.
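The positional penalty for 32 and 33 can be seen directly in a standard bracket. This sketch again assumes a generic recursive 64-line ordering (the 56 entrants plus eight byes opposite seeds 1-8) and reports the earliest round in which each seed could meet the 1 seed.

```python
def bracket_order(size: int) -> list[int]:
    """Seed number on each line of a power-of-two bracket (generic ordering)."""
    order = [1]
    while len(order) < size:
        n = 2 * len(order)
        order = [s for seed in order for s in (seed, n + 1 - seed)]
    return order


order = bracket_order(64)                                  # full seeding of 64 lines
line_of = {seed: line for line, seed in enumerate(order)}  # 0-based line per seed


def round_vs_top_seed(seed: int) -> int:
    """Earliest round in which `seed` could meet the 1 seed (round 6 is the final)."""
    return (line_of[seed] ^ line_of[1]).bit_length()


for seed in (16, 17, 18, 19, 32, 33):
    print(f"seed {seed}: could meet the 1 seed in round {round_vs_top_seed(seed)}")
```

Seeds 32 and 33 meet in the first round and the winner faces the 1 seed in round 2; seed 17 is in the 1 seed’s path by round 3, while 18 and 19 sit in the other half and could only meet the 1 seed in the final.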

The red line (tiered seeding, the actual ATP practice) is smooth except for marked jumps at the tier boundaries. It’s highly desirable to get into the next tier, but in all cases, it’s better to have a low seed number than a high one. That’s because, within each tier, the disadvantage of falling into the path of the tourney favorites is distributed at random. This is a way of realizing almost all of the fairness (C) benefit of seeding while avoiding its most perverse effects.
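The "distributed at random" point can be shown by exact enumeration. This sketch takes the sixth tier {13-16} and one exposed line, seed 16’s nominal line, which in a standard 64-line bracket can meet the 1 seed in round 3. It compares strict seeding with a random draw within the tier; `exposure` and `EXPOSED` are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations

TIER = (13, 14, 15, 16)   # the sixth seeding tier
EXPOSED = TIER.index(16)  # position of seed 16's nominal line within the tier;
                          # in a standard 64-line bracket it can meet the 1 seed in round 3


def exposure(player: int, tiered: bool = True) -> Fraction:
    """Probability that `player` is placed on the exposed line."""
    if not tiered:        # strict seeding: the line always goes to seed 16
        return Fraction(int(player == 16))
    draws = list(permutations(TIER))  # each ordering of the tier onto its lines, equally likely
    hits = sum(1 for d in draws if d[EXPOSED] == player)
    return Fraction(hits, len(draws))


for p in TIER:
    print(f"seed {p}: strict {exposure(p, tiered=False)}, tiered {exposure(p)}")
```

Under strict seeding the whole burden falls on seed 16; under the tiered draw each member of the tier carries a one-in-four share of it.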

Some form of seeding seems needful for this kind of competition. Many of us are secretly (or perhaps even openly) hoping to see Federer play Nadal at the Western and Southern, but we’d prefer to have that happen in the final, and we really don’t want it to happen in the first or second round.

From my point of view, this graph makes a strong case for the actual seeding practice at the Western and Southern (and in tennis generally). This teaching can be distilled into two points:

Don’t seed any deeper than you need to; and

Use tiered seeding to break up seeding waves in the part of the field you do seed.

If only other tournament-running organizations did as well. (Yes, I’m talking about you, NCAA, but there are plenty of other offenders.)