Posts Tagged ‘tennis’

Prime Federer’s feats are mind-numbing to those who understand the implications, including e.g. ten straight Grand-Slam finals with eight victories

Nadal has since won his 12th (!!!) French Open—and was at eleven at the time of writing. How do these feats compare?

This is a tricky question—and Nadal’s accomplishment undoubtedly is also one of the most amazing in tennis history.

Overall, I would give Federer a clear nod when it comes to “mind-numbing”, because he has so many other stats that complement the specific one mentioned. This includes semi- and quarter-finals “in a row” statistics that are arguably even more impressive.

When we look at these two specific feats, it is closer and the evaluation will likely be partially a matter of taste. Leaving probability theory out (in a first step), I would tend to favor Federer, because (a) he had a greater element of bad luck in that he ran into Nadal* on clay in the two finals that he lost, (b) he had to compete on different surfaces, which makes it a lot harder, (c) the clay competition (Nadal, himself, aside) has been much weaker than the hard-court competition, (d) he reached the finals in his misses while Nadal fell well short of the finals. In Nadal’s favor, he had to span at least** twelve years of high level play, while Federer only needed*** two-and-a-half.

*Nadal almost indisputably being the “clay-GOAT”, Federer likely being the number two clay player of the years in question, and the results possibly being misleading in the way that Mike Powell’s were in [2]. (Then again, some other complication might have arisen, e.g. had Federer played in another era.)

**Assuming a twelve-in-a-row. As is, he has missed thrice and therefore needed a span of fifteen years.

***But note that his longevity has been extraordinary.

From an idealized probabilities point-of-view, looking just at numbers and ignoring background information, we have to compare 8 out of 10 to 12 out of 15.* To get some idea, let us calculate the probability** of a tournament victory needed to have a 50 % chance of each of these feats. By the binomial formula, the chance of winning at least*** 8 out of 10 is p^10 + 10 * p^9 * (1 – p) + 45 * p^8 * (1 – p)^2, where p is the probability of winning a single tournament. Setting this expression equal to 0.5 gives a p of approximately 0.74, i.e. a 74 % chance of winning any given major. Similarly, at least 12 out of 15 amounts to p^15 + 15 * p^14 * (1 – p) + 105 * p^13 * (1 – p)^2 + 455 * p^12 * (1 – p)^3 and a p of roughly 0.76 or a 76 % chance of winning any given French Open. In other words, the probabilities are almost the same, with Nadal very slightly ahead. (But note both the simplifying assumptions per footnote and that this is a purely statistical calculation that does not consider the “real world” arguments of the previous paragraph.) From another point of view, both constellations amount to winning 80 %, implying that someone with p = 0.8 would have had an expectation value of respectively 8 out of 10 and 12 out of 15.

*The latter being Nadal’s record from his first win and participation in 2005 until the latest in 2019. In this comparison, I gloss over the fact that Nadal realistically only had one attempt, while Federer arguably had more than one. This especially because it would be very hard to determine the number of attempts for Federer, including questions like what years belonged to his prime (note that his statistic is a “prime effort” while Nadal’s is a “longevity effort”) and how “overlapping” attempts are to be handled. I also, this time to Federer’s disadvantage, gloss over the greater difficulty of reaching a final in a miss. (I.e. I treat a lost final as no better than even a first-round loss.) I am uncertain who is more favored by these simplifications.

**Unrealistically assumed to be constant over each of the tournaments during the time period in question. This incidentally illustrates Federer’s had-to-face-Nadal-on-clay problem: Two French Opens belong to both series and would then have had both Federer and Nadal at considerably better than a 50 % chance of winning… (Both were, obviously, won by Nadal.)

***Winning nine or ten out of ten is a greater feat, but must be considered here. If not, eight out of ten might seem even harder than it actually is. (Exactly eight out of ten corresponds to the third term, for those who must know.)

As a comparison, having a 74, 76, or 80 % (geometric average) chance of winning any individual match of a Grand-Slam tournament is quite good—and above we talk about the tournaments in their entirety.
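For those who want to re-check the 8-out-of-10 and 12-out-of-15 numbers, the following is a minimal Python sketch. It assumes, as the text does, a constant and independent per-tournament win probability; the helper names (at_least, p_for_even_odds) and the use of bisection are my own choices for illustration.

```python
from math import comb

def at_least(k, n, p):
    """Chance of at least k wins in n independent tournaments, each won with chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def p_for_even_odds(k, n, tol=1e-9):
    """Bisect for the p at which at_least(k, n, p) equals 0.5 (it increases with p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if at_least(k, n, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

federer_p = p_for_even_odds(8, 10)    # at least 8 wins in 10 majors
nadal_p = p_for_even_odds(12, 15)     # at least 12 wins in 15 French Opens
print(round(federer_p, 2), round(nadal_p, 2))
```

Rounded to two decimals, the two values should agree with the 0.74 and 0.76 quoted above.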

When I watched tennis in the mid-1980s, I was often puzzled by the way players would miss “simple” shots, e.g. a smash at the net—why not just hit the ball a little less hard and with more control?

I did understand issues like nerves and over-thinking even back then; however, I had yet to understand the impact of probabilities: Hitting a safety shot reduces the risk of giving the point away, but it also gives the opponent a greater chance to keep the ball in play. When making judgments about what shot to make, a good compromise between these two factors has to be found, and that is what a good player tries* to do. Moreover, the difference in points won is often so small that surprisingly large risks can be justified. Consider e.g. a scenario where player A wins 55 % of rallies over player B. Now assume that he has the opportunity to hit a risky shot with a 35 % risk of immediate loss and a 65 % chance of immediate victory,** and the alternative of keeping the ball in play at the “old” percentages. Clearly, he should normally take the risk, because his chance of winning the point just rose by ten percentage points… It is true that he might look like a fool, should he fail, but it is the actual points that count.

*I am not saying that the decision is always correct, a regard in which young me had a point, but there is more going on than just e.g. recklessness and over-confidence. The decision is also not necessarily conscious—much more often, I suspect, it is an unconscious or instinctual matter, based on many years of play and training.

**Glossing over cases where the ball remains in play. I also assume, for simplicity, that there are no middle roads, e.g. hitting a safe shot that still manages to increase the probability of a rally win. Looking more in detail, we then have questions like whether hitting the ball a little harder or softer, going for a point closer to or farther from this-or-that line, whatnot, will increase or decrease the overall likelihood of winning the point.

Similarly, I had trouble understanding the logic behind first and second serves: If a player’s First Serve* is “better” than his Second (which is what my grand-mother explained**), why not just use the same type of serve on the second serve? Vice versa, if his Second Serve actually was good enough to use on the second serve and safer than the First (again, per my grand-mother**), why is it not good enough for the first serve? Again, it is necessary to understand the involved probabilities (and the different circumstances of the first and second serve): A serve can have at least two relevant*** outcomes, namely a fault and a non-fault (which I will refer to as “successful” below). Successful serves, in turn, can be divided into those that ultimately lead to a point win (be it through an ace, a return error, or through later play) respectively a point loss. A fault leads to a second serve when faulting the first serve but a point loss (“double fault”) when faulting the second serve, which is the critical issue.

*To avoid confusion, I capitalize “first serve” and “second serve” (and variations) when speaking of the actual execution (as in e.g. “Federer has a great First Serve”) and leave it uncapitalized when speaking of the classification by rule (as in e.g. “if a player faults his first serve, he has a second chance on his second serve”). Thus, normally, a player would use his First Serve on the first serve, but might theoretically opt to use his Second Serve instead, etc.

**I am reasonably certain that these two explanations tapped out her own understanding: she was an adult and a tennis fan, but also far from a big thinker.

***A third, the “let”, is uninteresting for the math and outcomes, because it leads to a repeat with no penalty. I might forget some other special case.

If we designate the probability* of a first serve being successful as p1s and ditto second serve p2s, and further put the respective probability of a point win given that the serve is successful at p1w respectively p2w, we can now put the overall probability of a point win (on serve) at p1s * p1w + (1 – p1s) * p2s * p2w. If using the same Serve, be it First or Second, for both serves, the formula simplifies to p1s * p1w * (2 – p1s) (or, equivalently, p2s * p2w * (2 – p2s)). A first obvious observation is that keeping the serves different gives a further degree of freedom, which makes it likely (but not entirely certain, a priori) that this is the better strategy. Looking more in detail at the formula, it is clear that the ideal second serve maximizes p2s * p2w, while the ideal first serve maximizes the overall formula given a value for p2s * p2w. Notably, an increase in p2s will have two expected effects, namely the tautological increase of the first factor and a diminishing of the second (p2w), because the lower risk of missing the serve will (in a typical, realistic scenario) come at the price of giving the opponent an easier task. An increase of p1s, on the other hand, will have three effects, those analogous to the preceding and a diminishing of the (1 – p1s) factor, which makes the optimal value for p1s smaller than for p2s.** In other words, the first serve should be riskier than the second.

*Here simplifying (and unrealistic) assumptions are silently made, including that the probabilities are constant and that the player attempts the exact same serve on each occasion.

**Barring the degenerate case of p2s * p2w = 0. If this expression has already been maximized, then p1s * p1w must also be = 0—and so must the overall formula. Further, unless p1w reacts pathologically to changes in p1s, e.g. flips to 0 whenever p1s < p2s. In such cases, p1s = p2s might apply. (But not p1s > p2s, because p1s * p1w is no larger than p2s * p2w, by assumption of optimization, while (1 – p1s) would then be smaller than (1 – p2s), implying that an increase of p1s above p2s lowers the overall value.)
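To make the serve formula a little more tangible, here is a small Python sketch. The function name and, above all, the four numbers are made up for illustration (they are not measured tennis statistics); they merely describe a risky First Serve and a safe Second Serve.

```python
def point_win_on_serve(p1s, p1w, p2s, p2w):
    """Chance of winning the point on serve: p1s * p1w + (1 - p1s) * p2s * p2w."""
    return p1s * p1w + (1 - p1s) * p2s * p2w

# Illustrative assumptions only: (chance the serve is good, chance of winning the point if it is).
first = (0.60, 0.75)    # risky First Serve
second = (0.90, 0.55)   # safe Second Serve

mixed = point_win_on_serve(*first, *second)          # First Serve, then Second Serve
first_twice = point_win_on_serve(*first, *first)     # First Serve on both attempts
second_twice = point_win_on_serve(*second, *second)  # Second Serve on both attempts

# The same-Serve cases agree with the simplified form p_s * p_w * (2 - p_s).
assert abs(first_twice - 0.60 * 0.75 * (2 - 0.60)) < 1e-12
assert abs(second_twice - 0.90 * 0.55 * (2 - 0.90)) < 1e-12

print(mixed, first_twice, second_twice)
```

With these particular numbers, the mixed strategy comes out ahead of using either Serve twice, but other made-up numbers could give a different order. The sketch only shows how the formula is applied.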

A more in-depth investigation is hard without having a specific connection between the probabilities. To look at a very simplistic model, assume that we have a new variable r (“risk”) that runs from 0 to 1 and controls two functions ps(r) = 1 – r and pw(r) = r that correspond to the former p1s and p2s resp. p1w and p2w. (Note that the functions for “1” and “2” are the same, even if the old variables were kept separate.) We now want to choose an r1 and r2 for the first and second serve to maximize (1 – r1) * r1 + r1 * (1 – r2) * r2 (found by substitution in the original formula). The optimal value of r2 to maximize (1 – r2) * r2 can (regardless of r1) be found as 0.5, resulting in 0.25. The remaining expression in r1 is then (1 – r1) * r1 + 0.25 * r1 = 1.25 * r1 – r1^2, which maximizes for r1 = 0.625 with a value of 0.390625. In this specific case, the optimal first serve is, in some sense, one-and-a-quarter times as risky as the optimal second serve (r1 = 0.625 vs. r2 = 0.5). (But note that this specific number need not apply even remotely to real-life tennis: the functions were chosen to lead to easy calculations and illustration, not realism. This can be seen at the resulting chance of winning a point on one’s own serve being significantly smaller than 0.5…)
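The optimum of the toy model can also be checked by brute force. The following Python sketch simply tries all combinations of r1 and r2 on a grid; the function name and the grid resolution are my own choices, and the model itself is, as said, not meant to be realistic.

```python
def toy_point_win(r1, r2):
    """Point-win chance in the toy model with ps(r) = 1 - r and pw(r) = r."""
    return (1 - r1) * r1 + r1 * (1 - r2) * r2

steps = 200                                   # grid spacing of 0.005
grid = [i / steps for i in range(steps + 1)]

# max() over tuples compares the point-win chance first.
best_value, best_r1, best_r2 = max(
    (toy_point_win(r1, r2), r1, r2) for r1 in grid for r2 in grid
)
print(best_r1, best_r2, best_value)
```

The grid search should land on the same r1, r2, and maximum as the calculation by hand.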

There are a lot of debates on who is the GOAT, the Greatest Of All Time. While I will not try to settle that question,* I am greatly troubled by the many unsound arguments proposed, including an obsession with Grand-Slam tournaments (“majors”) won. This includes making claims like “20 > 17 > 15” (implying that Federer is greater than Nadal, who in turn is greater than Djokovic, based solely on their counts at the time of writing) and actually painting Serena** Williams (!) as the “she-GOAT”. The latter points to an additional problem, as might the original great acclaims for Sampras, namely a tendency to value “local heroes” more highly than foreigners.***

*But I state for the record that I would currently order the “Big Three” Federer > Djokovic > Nadal (for a motivation, see parts of the below); probably have Djokovic > Sampras > Nadal; and express great doubts about any GOAT discussion that ignores the likes of Borg, Laver, Gonzales, Tilden. I would also have at least Graf > Serena (see excursion), Court > Serena, Navratilova > Serena.

**To avoid confusion with her sister Venus (another highly successful tennis player), I will stick with “Serena” in the rest of this text.

***Relative the country of the evaluator and not limited to the U.S. The U.S. is particularly relevant, however, for the dual reason that authorship of English-language articles, forum posts, whatnot comes from U.S. citizens disproportionately often (measured against the world population) and that U.S. ideas have a considerable secondary influence on other countries.

The fragility of majors won is obvious e.g. from comparing Borg and Sampras. Looking at the Wikipedia entries for “career statistics” (especially, the heading “Singles performance timeline”) for Borg and Sampras, we can e.g. see that Borg won 11 majors by age 25, while largely ignoring the Australian Open, and then pretty much retired*; while Sampras was at roughly** 8 at this age and only reached his eventual 14 some six years later. To use Sampras’ 14 majors as the sole argument for him being greater is misleading, because Borg might very well have won another 3 merely by participating in the Australian Open—or by prolonging his serious career for a few years more.***

*His formal retirement situation is a little vague, especially with at least one failed comeback, but it is clear that he deliberately scaled back very considerably at this point.

**I have not checked exact time of birth vs. time of this-or-that tournament, because it is very secondary to my overall point. The same might apply to some other points in this text.

***There are, obviously, no guarantees. For instance, as it is claimed that Borg suffered from a burn-out, he might not have been able to perform as well for those “few years more” (and/or needed a year off to get his motivation back) and playing the Australian Open might have brought on the burn-out at an earlier stage. Then again, what if the burn-out had been postponed by someone telling Borg that “your status among the all-time greats will be determined by whether you have more or less than 14 majors”…

More generally, the Australian Open was considerably less prestigious than the other majors until at least the 1980s, and many others, e.g. Jimmy Connors, often chose to skip it. The 1970s saw other problems, including various boycotts and bans (Connors, e.g., missed a number of French Opens).

Before 1968, the beginning of the “open era”, we have other problems, including the split into amateur and professional tennis, which (a) led to many of the leading pros having lesser counts than they could have had (Gonzales 2!!!), (b) softened the field for the amateurs, leaving some (most notably Emerson) with a likely exaggerated count.

On the other end, we have to look at questions like length of career vs. number of majors, with an eye on why a certain length of career was reached. Federer, for instance, has reached considerable success at an age that would have been considered almost absurd in the mid-1980s, when I first watched tennis—players were considered over the hill at twenty-five and teens like Wilander, Becker, Chang were serious threats.* Is this difference because Federer is that much of a greater player, or is the reason to be found in e.g. better medicine or different circumstances of some other type? Without at least some attempt at answering that question, a comparison of e.g. Wilander and Nadal would be flawed**: Both won three majors in their respective best years (1988, 2010) around age 24. Wilander never won another and ended with 7; Nadal was a bit ahead at 9 already, but has since added another 8***!

*Interestingly, I do recall that there was some puzzlement as to why tennis was suddenly dominated by people so young, when it used to be an “old” man’s sport. Today, we have the opposite situation.

**From a “methodological” point of view. It is not a given that the eventual conclusion would be different, because it is possible to be right for the wrong reason. (Certainly, in this specific constellation, the question is not so much whether Wilander trails Nadal, as by what distance. Is 17–7 a fair quantification or would e.g. 17–13 be closer to the truth?)

***This is written shortly before the 2019 French Open final, which might see yet another added. If so, fully half (and counting…) of his tally came after the age when Wilander dropped out of sight.

Or how about the claimed “surface homogenization”, i.e. that the different surfaces (grass/hard court/clay) play more similarly to each other than in e.g. the 1990s? Is it possible that the Big Three would have been less able to rack up major* wins, with more diverse surfaces? Vice versa, should some of the tallies of old be discounted for being played on fewer surfaces? (Notably, grass was once clearly dominant.)

*Looking past the majors, we can also note the almost complete disappearance of carpet.

Then there is the question of competition faced. For instance, with an eye on the dominance of the Big Three, is Wilander–Nadal a reasonable comparison, or would e.g. Wilander–Murray or Wilander–Wawrinka be more reasonable? Who is to say that Wilander would have got past 3 majors or that Murray/Wawrinka would have been stuck at 3, had their respective competition been switched? What if the removal of just one of the Big Three had given the remaining two another five majors each? (While the removal of some past great would have given his main competitors two each?) The unknowns and the guesswork needed make the comparison next to impossible when two players were not contemporaries.

For that matter, below a certain number of majors won, the sheer involvement of chance makes the measure useless. Comparing Federer and Sampras might be somewhat justified, because they both have a sufficiently large number of wins that the effects of good and bad luck are somewhat neutralized (“you win some; you lose some”)—but why should Johansson (1 major) be considered greater than Rios (none)? (Note that Rios was briefly ranked number one, while Johansson was never even close to that achievement.) How many seriously consider Wawrinka the equal of Murray (both at 3)?

Many other measures are similarly flawed. So what if Nadal has more “masters” wins than Connors? Today, these tournaments are quasi-mandatory for the top players, while they were optional or even non-existent during Connors’ career. Many of the top players of the past simply had no reason (or opportunity) to play them sufficiently often to rack up a number that is competitive by today’s standards. (But, as a counter-point, those who did play them might have had an easier time than current players due to lesser competition.)

Tournament wins (in general) will tend to favor the players of the past unduly, because many tournaments were smaller and (so I am told) the less physical tennis of yore made it possible to play more often—and not having to compete in e.g. the masters allowed top players to gobble up easy wins in weaker competition.

Looking at single measures, I would consider world ranking the least weak, especially weeks at number one. (But I reject the arbitrary “year end” count as too dependent on luck and not comparable to e.g. winning a Formula One season or to the number-one-of-the-year designations preceding the weekly rankings.) However, even this measure is not perfect. For instance, Nadal trails Lendl in weeks at number one, but has a clear advantage in terms of weeks at number two, usually (always?) behind Federer or Djokovic. Should Lendl truly be given the nod? Borg often trailed Connors in the (computerized) world ranking while being considered the true number one by many experts; similarly, many saw Federer as the true number one over Nadal for stretches of 2017 and 2018 when Nadal was officially ahead. Go back sufficiently far (1973?) and there was no weekly ranking at all.

The best way to proceed is almost certainly to try to make a judgment over an aggregate of many different measures, including majors won, ranking achievements, perceived dominance, length of career, … (And, yes, the task is near impossible.) For instance, look at the Wikipedia page on open era records in men’s singles* and note how often Federer appears, how often he is the number one of a list, how often he is one of the top few, and how rarely his name does not appear in a significant list. That is a much stronger argument for his being the GOAT than “20 majors”. Similarly, it gives a decent argument for the Big Three being the top three of the open era; similarly, it explains** why I would tend to view Djokovic as ahead of Nadal, and why I see it as more likely that Djokovic overtakes Federer than that Nadal does (in my estimate, not necessarily in e.g. the “has more majors” sense).

*A page with all-time records is available. While it has the advantage of including older generations, the great time spans and changing circumstances make comparisons less reasonable.

**Another reason is Nadal’s relative lack of success outside of clay. He might well be the “clay-GOAT”, but he is not in the same league as some others when we look at other surfaces and he sinks back when we look at a “best major removed” comparison. For instance, if we subtract his French-Open victories, he “only” has 6 majors, while Federer (sans Wimbledon) still has 12 (!), Djokovic (sans Australian Open) has 8, and Sampras (sans Wimbledon) has 7.

Notes on sources:
For the above, I have drawn on (at least) two other Wikipedia pages, namely [1] and [2]. Note that the exact contents on Wikipedia, including page structure, can change over time, independent of future results. (That future results, e.g. a handful of major wins by Nadal, can make exact examples outdated is a given.)

Excursion on Serena vs. Graf:
Two common comparisons are Federer vs. Sampras and the roughly respective contemporaries Serena vs. Graf. If Federer is ahead of Sampras, then surely Serena is ahead of Graf? Hell no!

Firstly, if we look just at majors won (which is the typical criterion), we find that Graf hit 22 majors at age 29* and retired the same year, while Serena had 13 at a comparable age, hit 22 at age 34/35 and only reached her current (and final?) tally of 23 a year later. By all means, Serena’s longevity is to be praised, but pulling ahead by just one major over such a long time is not impressive. Had Graf taken a year off and returned, she would be very likely to have moved beyond both 22 and 23. In contrast, Federer reached (and exceeded) Sampras’ tally at a younger age than Sampras, and then used his longevity to extend his advantage.

*Not to mention 21 several years earlier, after which she had a few injury years.

Secondly, most other measures on the women’s open era records page put Graf ahead of Serena, including weeks at number one. This the more so, when we discount those measures where Serena’s longer career has allowed her to catch up with or only barely pass Graf.

Excursion on GOAT-but-one, GOAT-but-two, etc.:
While determining the GOAT is very hard, the situation might be even worse for the second (third, fourth, …) best of all time. A partial solution that I have played with is to determine the number one, remove his results from the record (leading to e.g. a new set of winners), re-determine the number one in this alternate world, declare him the overall number two, remove his results from the record, etc. For instance, Carl Lewis is the long-jump GOAT by a near unanimous estimate, but how does e.g. Mike Powell (arguably the number two of the Lewis era) compare to greats like Jesse Owens and Ralph Boston? Bump everyone who lost to Lewis in a competition by one spot in that competition, re-make the yearly rankings without Lewis, etc., and now re-compare. While I have not performed this in detail, a reasonable case could now be made for Mike Powell as the number two of all time.

Unfortunately, this is trickier in tennis than in e.g. the long jump, because of the “duel” character of the former. For instance, if we were to call Federer the GOAT and tried to bump individual players in a certain tournament won by him, would it really be fair to give the runner-up the first place? How do we know that the guy whom Federer beat in the semi-final would not have won the final? Etc. (A similar problem can occur in the long jump, e.g. in that someone who was knocked out during the U.S. Olympic trials in real life, might have done better than those who actually went, after the alternate-reality removal of a certain athlete. The problem is considerably smaller, however.)
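The remove-and-re-rank idea can be sketched in a few lines of Python for a long-jump-like setting, where each competition is simply a finishing order. Everything specific here is an assumption for illustration: the athletes "A" to "D" and their results are invented, and "most first places" is only a stand-in for whatever criterion is actually used to determine the number one.

```python
from collections import Counter

def rank_by_removal(competitions):
    """Repeatedly name a number one, then delete him from every finishing order."""
    remaining = [list(result) for result in competitions]
    order = []
    while any(remaining):
        wins = Counter(result[0] for result in remaining if result)
        # Stand-in criterion (an assumption): most first places, ties broken by name.
        best = min(wins, key=lambda name: (-wins[name], name))
        order.append(best)
        # Deleting him bumps everyone who finished behind him up by one spot.
        remaining = [[name for name in result if name != best] for result in remaining]
    return order

# Invented finishing orders, best first.
competitions = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["A", "B", "C"],
    ["D", "C", "B"],
    ["C", "D"],
]
print(rank_by_removal(competitions))
```

In this invented data, "B" never wins a competition and would trail both "C" and "D" on raw wins, yet ends up second once "A" is removed. This is the Powell-behind-Lewis situation in miniature.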

Preamble: This and a following text were intended as a single, not that long, piece. Because the length of the first part grew out of hand, I decided to split the text into (at least) two parts. Beware that a mixture of time constraints and the growing-out-of-hand left me lazy with the math: there might be errors through lack of checking that change the details (but not the principle), and there is a lack of explanation. (However, the math is not more advanced than what many high-schoolers encounter.) Note that I use the convention of ^ to indicate exponentiation, e.g. 2^3 = 2 * 2 * 2 = 8, and that “*” might be displayed oddly for technical reasons. (I normally use it only to indicate footnotes, and have not bothered to implement e.g. a math mode in my markup.)

With the latest French Open reaching its deciding phase, I have been reading a bit about tennis. A few resulting observations on tennis, numbers, and reasoning:

(Part I)

There is very little understanding of how probabilities play in when it comes to e.g. who-beats-whom, what is and is not impressive, whatnot. Notably, even many hard-core fans seem to jump to odd conclusions about superiority, inferiority, or who is too past his prime to be reckoned with based on a single* match. This is highly naive, even when we discount questions like surface preferences, off days, and whatnot.

*Note: “single”, not “singles”.

Consider a hypothetical match-up, where two players (A and B) are so close in abilities that the winner of each individual set is a 50–50 matter. Even in a best-of-five setting, this leaves player A with a one-in-eight chance of a straight-set victory, and ditto player B. In other words, there is a one-in-four chance that the match will be decided in only three sets, and who wins is a toss-up. Correspondingly, a single straight-set victory does not necessarily say anything about the involved players. In a best-of-three setting, half of the matches would be straight-set victories and who wins is, again, a toss-up.
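These shares can be verified by listing every possible sequence of set winners. The Python sketch below does so under the same 50–50 assumption; the function name is my own, and, for ease of counting, it "plays out" all sets, even those that would not take place in a real match (which does not change the shares).

```python
from fractions import Fraction
from itertools import product

def straight_sets_share(sets_to_win):
    """Share of 50-50 matches in which the loser does not take a single set."""
    max_sets = 2 * sets_to_win - 1
    hits = 0
    for sequence in product("AB", repeat=max_sets):   # all equally likely set sequences
        if len(set(sequence[:sets_to_win])) == 1:     # one player took the first sets_to_win sets
            hits += 1
    return Fraction(hits, 2 ** max_sets)

print(straight_sets_share(3))   # best-of-five
print(straight_sets_share(2))   # best-of-three
```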

What can be done is to look at “Bayesian probabilities”*, i.e. try to determine the probability of something based on observed events. Given that player A beat player B, we can suspect that his chance of winning is higher. Certainly, if the probabilities of a set win are shifted from 50–50 to 90–10, this would also normally result in player A winning, while a 10–90 shift would typically leave player B as the winner. (But note that even a 90–10 scenario can result in an upset, especially in best-of-three.) To get reliable information from such considerations, however, a fairly large data set can be needed, as in repeated meetings or a clear superiority in terms of games or points won in a single match (but not just the match itself or the sets of the match; of course, any single-match evaluation is prone to other weaknesses, like ignoring the possibility of a single “bad day”).

*Going into details would go past the high-school level and, frankly, I might need to refresh my own memory. The principle, however, is that (a) the probability of X and X-given-that-Y are not (necessarily) the same, (b) suitable choices allow us to e.g. calculate an expectation value for an unknown probability. For instance, the probability that the sum of two fair and six-sided dice exceeds seven is 5/12 a priori but 5/6 given that we already know that a particular one of the dice (say, the first) came up six. Similarly, if this sum exceeds seven at a different ratio than 5/12 over a great number of repetitions, we might conclude that one or both dice are not fair, and even attempt to estimate new probabilities for the individual sides of the dice. The “reasoning” used when it comes to some tennis “experts” could be seen as a highly naive misapplication of this, viz. that “A beat B; ergo, the probability of A beating B is 100 %; ergo, A will always beat B”.
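The dice numbers in the footnote are easy to confirm by counting all 36 outcomes. The following Python sketch does just that; the helper name (chance) and the variable names are my own.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 equally likely (first, second) results

def chance(event, given=lambda roll: True):
    """Probability of event among those rolls that satisfy the condition given."""
    pool = [roll for roll in rolls if given(roll)]
    return Fraction(sum(1 for roll in pool if event(roll)), len(pool))

def exceeds_seven(roll):
    return sum(roll) > 7

prior = chance(exceeds_seven)                                     # 5/12
first_is_six = chance(exceeds_seven, lambda roll: roll[0] == 6)   # 5/6
some_six = chance(exceeds_seven, lambda roll: 6 in roll)          # 9/11
print(prior, first_is_six, some_six)
```

Note that the 5/6 applies when a specific die is known to show six. If all that is known is that at least one of the two dice shows six, the same counting gives 9/11, which illustrates how much the exact condition matters.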

As a notable example, let us look at the one official meeting between Pete Sampras and Roger Federer:

According to an archived version of official statistics, Federer and Sampras won respectively 1 and 0 matches (100–0), 3 and 2 sets (60–40), 31 and 29 games* (51.67–48.33), and 190 and 180 points (51.35–48.65).

*Including a tie-break each. Subtracting tie-breaks, we have 30 vs 28 and virtually the same percentages. Note that the set–game difference is likely increased and the game–point difference diminished through alternating service games (as opposed to e.g. alternating serve after each point).

The overall match result tells us next to nothing. Indeed, had but one or two points gone differently, it might have been Sampras winning.* The games tell us a little more, but still nothing that could not easily be the product of chance. Only the points give us some truer indication (despite having the smallest relative difference), but even that could be a product of chance or, e.g., some difference** in playing style or point distribution that is of little import.

*At least one example is obvious without looking at the individual development: Federer won the first set tie-break 9–7. Switch two points around and Sampras would, all other things equal, have won the match 3–1 (a somewhat clear victory to the naive eye). Switch one around and he would have had a roughly 50 % chance of winning from 8–8, and there might have been some earlier point in the tie-break, where even a single point would have handed him e.g. a 7–5.

**Consider e.g. a scenario where a player who already is a break up prefers to not fight back on his opponent’s serve, in order to save himself for the next set. (Whether such factors applied in this specific match, I leave unstated.)
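How easily could a 190–180 split in points arise by pure chance? As a rough check, the Python sketch below treats each of the 370 points as an independent coin flip. This is a strong simplification (it ignores, among other things, who serves), and the function and variable names are my own.

```python
from math import comb

def tail(k, n, p=0.5):
    """Chance of at least k successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

points_federer, points_sampras = 190, 180
total = points_federer + points_sampras

one_sided = tail(points_federer, total)   # a given player takes 190 or more of 370 points
either = 2 * one_sided                    # either player does (both cannot at the same time)
print(round(one_sided, 2), round(either, 2))
```

Under this assumption, a given player reaches at least 190 points in roughly one match out of three, even when the two players are exactly equal.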

This was a genuinely close match and even just looking at the game score, this should be obvious. (Nevertheless, I have seen this match cited as proof that Federer was better* than Sampras, notwithstanding factors like that neither of them was in his prime.) Still, the margins on the point level are often fairly small and can still result in notable differences in overall results. For instance, imagine a 0.55 (i.e. 55 %) probability of winning any individual point**, and see how this scales. Winning a point is (tautologically) a 55–45 proposition and the result of a point played will tell us next to nothing (but the score over one hundred, two hundred, three hundred, …, points will be increasingly telling). If we assume that a game is played as best-of-five points,*** we now have a probability of 1 * 0.55^5 + 5 * 0.55^4 * 0.45^1 + 10 * 0.55^3 * 0.45^2 = .5931268750 or roughly 3/5 that player A wins an individual game (per the binomial formula). The difference in game-winning percentage is then almost doubled compared to the point-winning difference. If we now approximate a set as best-of-nine games****, the binomial formula gives roughly a .7189 chance of player A winning a set. Applying this to matches determined by best-of-three and best-of-five sets,***** we then have a match winning probability of roughly .8074 respectively .8610.

*This is another case of my disagreeing with the reasoning behind a claim—not necessarily the claim itself.

**Glossing over the complication that the probabilities will vary widely depending on who serves.

***This is not the case, nor is it necessarily a very realistic approximation. I considered making a more elaborate model, but deemed it too much work for a demonstration of principle. The best-of-five approximation is easy to calculate and requires no deeper modeling. To boot, it is likely to understate the difference that I try to show, which makes it more acceptable; moreover, the simplification of ignoring serves might be the larger error, had I intended to find more exact numbers (rather than demonstrate the principle); finally, any model of a tennis game that involves fixed probabilities for all points (ignoring e.g. their relative importance, tiredness, nerves, …) is inherently simplistic. (An approximation as best-of-six might have been better, but would have involved the possibility of a draw, while best-of-seven might have overstated the difference.)

****Similar remarks apply.

*****Here the modeling is exact, because matches are played as best-of-three and best-of-five sets.
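The scaling from points to games to sets to matches can be retraced with a few lines of Python. This is a minimal sketch under the same simplifications as above (fixed probabilities, no server advantage, games as best-of-five points, sets as best-of-nine games); the helper name `best_of` is my own and purely illustrative.

```python
from math import comb

def best_of(n, p):
    """Chance of winning a best-of-n contest (n odd) when each unit
    (point, game, or set) is won independently with probability p.
    Equivalent to winning a majority of n units if all n were played."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

point = 0.55                      # assumed chance of winning any single point
game = best_of(5, point)          # simplification: game = best-of-five points
set_ = best_of(9, game)           # simplification: set = best-of-nine games
match_bo3 = best_of(3, set_)      # exact format: best-of-three sets
match_bo5 = best_of(5, set_)      # exact format: best-of-five sets

for label, value in [("point", point), ("game", game), ("set", set_),
                     ("best-of-three match", match_bo3),
                     ("best-of-five match", match_bo5)]:
    print(f"{label:>20}: {value:.4f}")
```

Changing `point` to, say, 0.52 or 0.60 shows how quickly a small edge per point grows (or shrinks) on the match level.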

From another point of view, consider claims like “player A would not be able to take a game of player B”. Even when this applies to a typical match, it does not (or only very, very rarely) apply categorically over all matches played between them–again for statistical* reasons. Assume that player A is so much worse that he virtually never wins a point in his opponent’s service games and a mere 20 % of points in his own service games (making 15–60 a typical score for an own service game). This still gives him a chance of 1/5^4 or one in 625 to win any of his service games to love and .05792 or roughly 1/17 to win it at all by the above best-of-five model. This model might overstate the probability in this case, but if we say 1/30 as a rough guesstimate, and factor in that he would have at least three opportunities to serve per set, he would likely win a game roughly once every three best-of-five** or once every five best-of-three** matches. With a less disastrous difference, the odds improve correspondingly.

*Even discounting factors like player B gifting a game to be kind, player B having a sudden cramp, whatnot.

**Note that this translates to playing (three times) three resp. (five times) two sets under the assumptions made, because he would need absurd luck not to lose in straight sets.
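The numbers for the hopelessly outclassed player can be checked in the same manner. This is a rough sketch under the assumptions stated above (20 % of points on own serve, three service games per set, straight-set losses, and the 1/30 guesstimate per service game); all variable names are illustrative.

```python
from math import comb

def best_of(n, p):
    """Chance of winning a best-of-n contest (n odd) with per-unit chance p."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

p_point = 0.20                        # chance of a point in an own service game
p_love_game = p_point ** 4            # four points in a row: 1/625
p_game_model = best_of(5, p_point)    # best-of-five-points model: 0.05792

p_game_guess = 1 / 30                 # the more cautious guesstimate from the text
service_games_per_set = 3             # "at least three opportunities to serve per set"

def matches_per_game_won(sets_played):
    """Average number of matches needed per game won, assuming straight-set losses."""
    expected_games = sets_played * service_games_per_set * p_game_guess
    return 1 / expected_games

print(f"love game:        1 in {1 / p_love_game:.0f}")
print(f"any service game: {p_game_model:.5f} (about 1 in {1 / p_game_model:.0f})")
print(f"best-of-five:     one game every {matches_per_game_won(3):.1f} matches")
print(f"best-of-three:    one game every {matches_per_game_won(2):.1f} matches")
```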

This type of thinking demonstrates how unbelievable some of the exploits of the all-time greats are. For instance, to win forty straight matches requires an enormous superiority over the average opponent (and/or a ridiculous amount of luck). Prime Federer’s feats are mind-numbing to those who understand the implications, including e.g. ten straight Grand-Slam finals with eight victories—the full, mythical Grand Slam (i.e. all four tournaments won in the same year) is a considerably lesser accomplishment.

Excursion on other sports:
Some of the above applies equally to some or most other sports, e.g. the impressiveness of victories in a row. For instance, if an athlete or a team has a geometric average chance of 95* % of winning any individual competition (e.g. a tennis, boxing, or basket-ball match), the chance of winning ten in a row is 0.95^10 or roughly three in five, twenty in a row carries just a little more than a one in three chance, and forty in a row roughly one in eight. To have an at least 50 % chance at forty in a row, an individual probability of better than 98.28** % is required. Other parts do not apply, due to the unusual scoring (where e.g. a basket-ball game leaves the higher scorer the victor, while a tennis match might see the party with fewer points take the match).

*Note that this is a very high number, seeing that it must last for some time, is vulnerable to external conditions, must cover the risk of injury, etc. Moreover, the geometric average is more sensitive to outliers than the regular arithmetic average. For instance, playing seven opponents with an individual 99 % chance of victory and a single toss-up opponent gives a geometric average of less than 91 % but an arithmetic of 92.875 %.

**To understand how high this number is, note that it cuts the opponent’s chance of winning down to a little more than a third of what it is for 0.95—an already very high number.
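The streak numbers and the comparison of averages in this excursion can be reproduced as follows. This is only a sketch assuming independent competitions with the stated probabilities; the function and variable names are mine.

```python
from math import prod

def streak(p, n):
    """Chance of winning n independent competitions in a row, each with chance p."""
    return p ** n

for n in (10, 20, 40):
    print(f"{n} in a row at 95 %: {streak(0.95, n):.4f}")

# Per-competition chance needed for a 50 % shot at forty in a row.
required = 0.5 ** (1 / 40)
print(f"required for 50 % at forty in a row: {required:.4%}")

# How much the opponent's chance shrinks relative to the 95 % case.
opponent_ratio = (1 - required) / (1 - 0.95)
print(f"opponent's chance relative to the 95 % case: {opponent_ratio:.3f}")

# Seven opponents at 99 % and a single toss-up opponent.
chances = [0.99] * 7 + [0.5]
geometric = prod(chances) ** (1 / len(chances))
arithmetic = sum(chances) / len(chances)
print(f"geometric average:  {geometric:.4%}")
print(f"arithmetic average: {arithmetic:.4%}")
```

The geometric average is the relevant one here, because the chance of an unbroken streak is the product of the individual chances.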

Excursion on probabilities, upsets, and the oddities of score keeping:
It might seem paradoxical that the score keeping used in tennis increases the difference in score compared to a plain point counting, e.g. as with Federer–Sampras above, while also increasing the probability of upsets. This, however, is easy to understand by considering the games and sets as a division of smaller somewhat independent events into larger somewhat independent events. A reasonable analogy is a “plain” election system vs. a “first past the post” system.

This weakness to upsets is arguably a part of the charm of tennis, but it is a strong argument in favor of keeping important men’s matches at five sets and of introducing five-setters among the women too.

Over the weekend, I encountered two sports events that form an interesting contrast—Serena Williams levying another sexism charge and track-and-field athletes encountering true problems.

Let us start with Williams:

Williams played the U.S. Open final against Naomi Osaka. She clearly lost the first set. At the beginning of the second set, she received illegal coaching—and, in accordance with the rules, was given a formal warning by the umpire.

This warning, in and by itself, had no practical effect, except through opening the door for further penalties. Still, Williams reacted very negatively, appearing to take the warning as a personal insult and arguing with the umpire—the odder, as her own involvement was not necessary for the accusation to hold.*

*The infraction actually being committed by her coach, with her as an intended beneficiary. Indeed, in my understanding, her coach has since explicitly admitted to the infraction… We can discuss whether such punishment of the player is fair (in general), but that is a different matter entirely—as is the question of whether coaching should be banned in the first place.

As the events continued to favor Osaka, Williams smashed her racket in anger—and was now, again in accordance with the rules, given a point deduction* for a second infraction. Even this point deduction likely had no effect on the outcome, however, seeing that Osaka was clearly playing better and proceeded to win the next three points after the deduction became effective. Without it, she would have had a 40–0 lead and three game points; with it, she had the game.

*It might be more accurate to say that Osaka was awarded an extra point.

This ultimately resulted in Williams starting a major row and calling the umpire a “thief”… She, again in accordance with the rules, received further punishment. With this third infraction, she lost an entire game without play, allowing Osaka to secure the set and match two games later. Even this penalty, however, likely did not change the outcome of the match: In order for Williams to win (all other events equal, assuming no injuries, etc.), she would have had to actually take the game that was not played, win the rest of the set,* and proceed to win the third set. Assuming a rough probability of 1/2** for each of these three events, her chance at winning the match was 1/2 * 1/2 * 1/2 = 1/8.

*Assuming that she took the contested game, we probably would have had a 4–4 (instead of a 5–3). They each did win one of the two following games, leaving a counterfactual 5–5 (instead of 6–4) and no-one with a numerical advantage. I make some reservations for memory errors.

**Discounting the respective level of play and the server’s advantage, this would be true. However, the higher level of play of Osaka would have given her an advantage with all three, barring lack-of-experience issues. To boot, I believe that the contested game was on Osaka’s serve, giving her a natural additional advantage. All-in-all, an estimate of 1/2 is likely too kind to Williams.

Of course, there are the mental effects to consider, but the only infraction that was even disputable (cf. below) was the original warning—and someone with Williams’ experience should not have been thrown to such a degree by such a minor event. Indeed, my personal suspicion is that she was already out of balance due to losing badly, which gave the event more fertile ground than it otherwise would have. Still, this second set was “only” a 4–6 loss, while in the other three sets that they have played (at all?, this year?) Williams has done even worse—including the first set of this match. It might even be that Osaka was hurt more by the mental effects…

After this Williams raises accusations of sexism… To my understanding (which might be incorrect, considering the nonsensical nature of the accusation in context), she believes that she would not have received the original warning, had she been a man. However, even if we assume* this to be true, it would (a) not automatically be sexism (just a very mild form of sexual discrimination), (b) it would not have given her any disadvantage. To see the latter, note that she was playing another woman and that the main issue is to keep a level playing field between the actual competitors; that this was just a question of in-game opportunities (unlike e.g. issues of who receives what prize money); and that the difference caused is likely to have been very minor. To boot, if men and women were to play by exactly the same standards, her own career might have suffered—imagine playing her many majors under best-of-five instead of best-of-three, with the increased wear-and-tear, the greater disadvantage of carrying all that bulk, and the risk that someone else’s playing style would have been better suited.** Then again, even if the events had been motivated by sexism, she still only has herself (and her coach) to blame: Illegal coaching might often be tolerated, but it is still illegal. No-one forced*** her to smash her racket, and if she had not she would not have received that point deduction, even with the warning. No-one forced her to insult the umpire, and without doing so she would not have defaulted that game, even with the warning and the preceding point deduction.

*I am highly skeptical, but I cannot rule it out without having done considerably more research. Possibly, such warnings are given to men just as often as to women; possibly, Williams’ case was unusually blatant; possibly, Williams had a history of such problems making her risk of being warned higher; possibly, other aspects of her own behavior influenced the matter; possibly, it was more a matter of who the umpire was than what sex the players had; …

**It is also notable, if not relevant to the immediate topic, that Williams herself has received special treatments, including, over the last few months, artificial improvements to her seeding relative to what applied to everyone else. Her success during this time period gives these artificial improvements some justification, but others in a similar situation have (to my knowledge) not received them, and they are potentially contributors to this success—what if she had been up against Kerber in an earlier round of Wimbledon or Osaka in an earlier round of the U.S. Open?

***I am well aware of how anger can get past our deliberate control, and I am in no position to throw the first stone. However, (a) the rules were the same for both her and Osaka, so she had no objective reason to be angry, (b) the ability to control anger is a part of the game, just like the ability to re-group after losing a tie-break, the ability to keep fighting when two sets and a break down, and the ability to keep focused when winning easily.

On to track-and-field, specifically the IAAF Continental Cup, which gives yet another example of the incompetence and contempt for the athletes that the IAAF displays:

This event took place between four teams, roughly corresponding to continental groupings, with two competitors for each team in each discipline. Looking at e.g. the throws, the field was halved to include only the best competitor from each team after the first three throws. Then a one-throw (!) “semi-final” between the remaining four was held, discounting the old results. Then a one-throw (!) final was held between the top two from the semi-final, again discounting the old results. The resulting list of positions and throws in the discus* was

*The situation is similar or even worse in the other throws. The discus was my point of entry, because one of the victims of this idiotic system was my compatriot Daniel Ståhl, who ended up fifth, despite having the third longest throw—and who might very well have won the competition, had the regular six-throw format been used. (He is the number one in both season’s best and personal best.)

Position    Mark
1           67.97
2           63.99
3           66.95
4           63.49
5           64.84
6           63.49
7           54.03
8           27.15

Note how the third-placer actually had the second longest throw and would normally have come in second and how the fifth-placer had the third longest throw and would normally have been third—together bumping the second-placer down to fourth. Normally, the fourth and sixth-placers, both with a 63.49, would have been in the other order, seeing (cf. source) that the latter had the better second throw… This is looking only at artificial re-orderings through the flawed system, and discounting the improvements through giving everyone a more equal* and fairer number of throws, which could have led to a completely different order—especially with an eye on the cut based on best-of-team, instead of top-four. Indeed, the system is so absurd that when (“first three”, “semi-final”, “final”) a certain throw is made can have a great impact on the outcome—the (otherwise) exact same series of throws can lead to a radically different classification when in a different order. To boot, the ideal order of a given series will depend on how the other competitors throw…

*In most competitions, a group of eight would have been given six throws each. In a larger group, everyone would have received three throws and the top eight after these another three.
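The re-ordering described above can be made explicit by sorting the listed marks. This is a small sketch using only the official positions and marks from the list; the tie between the two 63.49 throwers is resolved in favor of the sixth-placer, per the remark about the better second throw, and is hard-coded as such (the second-best marks themselves are not given here).

```python
# (official position, listed mark in meters), as given in the list above
results = [(1, 67.97), (2, 63.99), (3, 66.95), (4, 63.49),
           (5, 64.84), (6, 63.49), (7, 54.03), (8, 27.15)]

# Assumption taken from the text: the sixth-placer wins the 63.49 tie
# on the better second throw. Lower value = ranked ahead within a tie.
tie_break = {6: 0, 4: 1}

conventional = sorted(results,
                      key=lambda r: (-r[1], tie_break.get(r[0], 0)))

for new_pos, (old_pos, mark) in enumerate(conventional, start=1):
    shift = old_pos - new_pos
    print(f"by mark: {new_pos}  official: {old_pos}  mark: {mark:.2f}  shift: {shift:+d}")

by_mark = [old_pos for old_pos, _ in conventional]
```

Note that this only re-orders the marks that were actually listed; it says nothing about what a regular six-throw competition would have produced.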

Another problem with this system, especially with the very chancy discus, is that the competitors have to find a middle-ground between throwing long and throwing safely in a manner that favors those who throw later. Consider the “final”: We now have two throwers, with one throw each. If the first thrower goes out too hard, he risks a foul; if too meekly, he risks giving up several meters of length. The second thrower can relax, look at what the first thrower achieved, and act accordingly—and this can range from just putting out an extreme 30m safety-throw to going all out in an attempt at breaking 70m. The advantage of throwing last is enormous.

Someone might argue that this was a team competition and that the individual placings were secondary. However, even discounting the risk that these idiocies are later implemented in non-team competitions, there are similarly weird effects on the teams. Notably, a team can have the two best throwers and end up with places four and five instead of one and two,* and going better than one and five is impossible. This is particularly bad, as the same does not apply to e.g. the track events, skewing the relative benefit of having good athletes in various events. Notably, it can make sense for a team to let its weaker thrower tank the competition.** Notably, once in the semi-final, the situation is as bad for the teams as for the individual athletes.

*Scenario: These two are first and second after the first three rounds. The leader joins the four “semi-finalists”, the second-placer is relegated to fifth place as the “best of the rest”. The leader does not get a valid throw in the semi-final, while the other three do; and is thereby relegated to place four. Note that this would apply even if the two went well over 70m on each of the first three throws and no-one else broke 60m during the entire competition.

**Scenario: The team has throwers with season’s bests of 70m resp. 65m, while all members of the other teams have a season’s best between these marks. The first round sees a foul from the former and a 63m by the latter, while all competitors land between 65m and 70m. The second round starts with a foul from the former, leaving him with one attempt to get a valid mark. As it would, with minor reservations for details, be better to have the 70m-thrower in the semi-final, the best strategy now is for the 65m-thrower to deliberately tank his second and third throws, so that the 70m-thrower can put a safety throw beyond 63m. If this succeeds, the better thrower is in the semi-final; if it fails, little harm is done. (On the outside, the 65m-thrower misses a chance to move past the lesser three members of the other teams; however, needing a new season’s best to do so, his chances would have been very poor to begin with.)
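The first scenario can be played through with a small model of the format as I have described it (best of each team after three throws, one-throw semi-final, one-throw final). This is a sketch of my understanding of the rules, not an official implementation; all athlete names and marks are invented for illustration, and a foul is represented as `None`.

```python
def best(marks):
    """Best valid mark; fouls are None and count as 0.0 if nothing is valid."""
    valid = [m for m in marks if m is not None]
    return max(valid) if valid else 0.0

def placings(first_three, semi, final):
    """Return athletes in finishing order under the one-throw knock-out format.
    first_three: {athlete: (team, [three marks])}; semi, final: {athlete: mark}."""
    by_team = {}
    for name, (team, marks) in first_three.items():
        by_team.setdefault(team, []).append((best(marks), name))
    # Only the best thrower of each team advances to the semi-final.
    semifinalists = [max(members)[1] for members in by_team.values()]
    # Everyone else is ranked fifth to eighth on the first three throws.
    rest = sorted((n for n in first_three if n not in semifinalists),
                  key=lambda n: best(first_three[n][1]), reverse=True)
    # Semi-final: a single throw each, earlier marks discarded.
    semi_order = sorted(semifinalists, key=lambda n: best([semi.get(n)]), reverse=True)
    finalists, semi_losers = semi_order[:2], semi_order[2:]
    # Final: a single throw each, earlier marks discarded again.
    final_order = sorted(finalists, key=lambda n: best([final.get(n)]), reverse=True)
    return final_order + semi_losers + rest

# Hypothetical marks: team A has the two clearly best throwers.
first_three = {
    "A1": ("A", [72.0, 71.0, 73.0]), "A2": ("A", [71.0, 70.5, 72.0]),
    "B1": ("B", [59.0, 58.0, 57.0]), "B2": ("B", [55.0, 56.0, 54.0]),
    "C1": ("C", [58.5, 57.0, 59.5]), "C2": ("C", [53.0, 52.0, 51.0]),
    "D1": ("D", [57.0, 56.0, 55.0]), "D2": ("D", [52.0, 51.0, 50.0]),
}
semi = {"A1": None, "B1": 58.0, "C1": 59.0, "D1": 57.0}   # A1 fouls
final = {"B1": 59.0, "C1": 58.0}

order = placings(first_three, semi, final)
for place, name in enumerate(order, start=1):
    print(place, name)
```

With these invented marks, the two throwers who went beyond 70m on every early attempt end up fourth and fifth, while no-one else broke 60m.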

There have been a few recent sports events that have been more interesting to me outside of sports than within:

Firstly, the European Championships in handball: During the time when I was the most interested in sports (late teens or so), Sweden was one of the world’s leading handball nations, often dueling it out with Russia. These days are long gone, and the world has changed sufficiently that Sweden’s smaller neighbor Denmark, an absolute nobody back then, is the reigning Olympic champion—something that teenage me would likely have considered an absurdity, even an insult, seeing that Sweden has racked up four silver medals without ever reaching the gold.

In the first game of the European Championships came the ultimate blow: A humiliating loss against dwarf country Iceland… I wrote off the rest of the Championship, reflecting on how sadly similar things had happened in tennis and table tennis, and noting how well this matched some of my thoughts on how short-lived traditions actually often are and how the world can change from what we know in our formative days. (Cf. my Christmas post.)

Today, Sweden played in the final of the same Championships against Spain, even having a half-time lead and an apparent good chance at victory. (Before, regrettably, losing badly in the second half. Still, a silver is far beyond what seemed possible after the Iceland game and a very positive sign for the future.) The road there was very odd, including the paradox of an extremely narrow semi-final win over Denmark, the aforementioned Olympic Champions, and another embarrassing and unnecessary loss against a smaller neighbor in Norway. Funny thing, sports.

Secondly, the immensity of Roger Federer’s 20th Grand-Slam title. A year ago, he and Nadal met up in the final of the Australian Open for what seemed like their last big hurrah—one of them was going to get a last title before age or injuries ended their competitive careers. Since Federer’s narrow win, we have seen another four Grand-Slam tournaments—with the winners Nadal, Federer, Nadal, and (with this year’s Australian Open) Federer. Indeed, where a year ago I was thrilled over the (presumed) last win, I was now slightly annoyed that Federer narrowly* missed going through the tournament without a loss of set. This is a very good illustration of how humans tend never to be satisfied, to ever want more or better**, and of how our baseline for comparisons can change.

*He entered the final without a lost set, won sets one and three, and only missed the second in a tie break. One or two points more and he would have had it. Such a result is extremely rare. (The oddity of 2017 notwithstanding, where it actually happened twice, making the year the more remarkable: Nadal in the French Open and Federer a few weeks later in Wimbledon.)

**Whether this is a good or bad thing will depend on the circumstances and on whether this tendency leaves us unhappy or not. At any rate, humanity would hardly have gotten to where it is without this drive.

An interesting lesson is the importance of adapting to new circumstances: Apparently, Federer has spent considerable time modifying his approach* to tennis in order to remain reasonably healthy and competitive even at his ancient-by-tennis-standards age of 36. Those who stand still fall behind (generally) and we all do well to adapt to counter aging (specifically).

*In a number of areas including style of play, racket size, and yearly schedule.

I make a post on Hope Solo and an outrageous and unjustified suspension, practically naming it the height of abuse, and what happens? More or less immediately a three-fold example that is equally bad pops up.

In one case (Benoit Paire), the crime, according to the article, consisted of “He reportedly sometimes stayed away from the village or came back late. He also made dismissive comments about the importance of the Games because there are no ranking points.”.

Here we have an adult man who “stays away” from a team location or sets his own hours—oh, the horror! Going by Wikipedia, he is 27 years old and has been a professional player for almost ten years—meaning that he is not only an adult, he is also used to traveling internationally, taking responsibility for his schedule, knows how to live his life when competing, … We are not talking a teenager leaving his home-town for the first time. If and in as far as he made any misjudgments (say, by getting too little sleep) that is his responsibility as an adult—not the French federation’s as a baby-sitter.

Now, if he had been a member of, say, a soccer or basketball team, I could possibly, on the very outside, have seen a point, because he just might have damaged the team’s coordination, training, spirit, whatnot. He was not a member of such a team: He is a tennis player, who competed in the singles (!) tournament.

As for “dismissive comments”: So what? Not only does he have a full right to free speech and opinion, but this opinion has considerable merit. By not awarding* ATP points, the Olympic Tournament is placed outside the normal world of professional tennis, and is diminished severely in value. Even when points were awarded, it was arguably only the fifth or sixth most important tournament of the year (behind the Grand Slams and, possibly, the Tour Finals)—and possibly not even reaching top-twenty over the entire Olympic cycle. Without points? I can understand very well how someone from within the tennis world would consider it a blip on the radar screen. This is not figure skating where the Olympics compares to the second best competition as France does to Luxembourg.

*I am not aware why this is so or who made the decision. However, since points are allocated by ATP (their tour and their points…), the IOC could be free of guilt.

The other two (doubles players Kristina Mladenovic and Caroline Garcia) apparently had the audacity to complain about incompetence on behalf of the federation—and appear* to have a good case to do so! This is one of the very, very worst signs of corruption: Trying to silence dissenters and sources of criticism through threats and sanctions, where, on the outside, solid arguments would have been used by a fair-minded organization. To boot, in my experience, the more prone someone is to censorship of criticism, threats against dissenters, etc., the more likely it is that the criticism is justified… The French federation does more to condemn itself than the two ever could.

*I do not know the details, but it seems clear that information that the two should have had was not communicated sufficiently early or sufficiently clearly. At worst, I would assume that they made their statements in good faith and in genuine disappointment and frustration, which might require an apology—not a suspension. At best, they are entirely right and the French federation tries to cover its own incompetence in an inexcusable manner.

Zvonareva rises from place 8 to place 4, while Clijsters … falls from place 3 to place 5—one place behind Zvonareva. (Assuming that I read the preliminary numbers provided by Wikipedia correctly.)

The explanation for this seeming (arguably, actual) absurdity is simple:

Clijsters was the defending champion. Her victory merely maintained her point score.

Zvonareva lost early in last year’s edition—and saw a significant increase in points. (Semi-finalist Venus Williams passes Clijsters for the same reason.)

Obviously, the scoring system is not based on one tournament, but on the performances over the last year, on several surfaces, in different weather conditions, and with varying other circumstances. Zvonareva’s overall performance in this year has simply gathered her a higher score—even though she stood no chance today. (Nor, looking at historical performances, does she have a comparable singles record. I rarely follow women’s tennis, but I do have the impression that Clijsters is one of the best there ever was—when she is not injured, retired, or otherwise off her game.)

The lessons: Be careful by what signs success is measured, be wary of both one-dimensional indicators and impressions from single instances, be prepared for life to be absurd or even unfair, and keep in mind that unfairness is often a matter of perspective.