The Olympic Figure skating controversy at Salt Lake City

Figure Skating is not a sport that gets much of my attention. It's
not that I don't appreciate the impressive turns and jumps:
Cloggies simply find the concept of skating an
oval at maximum speed a lot easier to grasp. Last Monday, NBC promised
both events during their Olympic
coverage, and as a result I caught a glimpse of the commotion
surrounding the pairs competition.

To summarize: the Russians Anton Sikharulidze and Elena
Berezhnaya skated a difficult routine with several mistakes, but
still received high marks from the judges. In the run that followed, the
Canadian pair David Pelletier and Jamie Sale skated a
nearly perfect (although arguably less difficult) routine. The Canadians,
supported by a huge crowd, were convinced they had won the gold. The
TV commentators were also convinced of a Canadian victory. However, to
everyone's astonishment, the Russians skated away with the first prize.

Let's step aside from the issue of subjectiveness in judged sports. For a moment, also forget
about the preference of the audience, Internet polls,
and commentators who watched last Monday's Olympic event. Let's take a
closer look at the voting of the individual judges in the pairs skating
event. After all, they decide who steps up to the medal platform. The
voting behavior of all the judges, and not just that of the French
judge in question, should stand up to scrutiny. In fact, the votes
should be evaluated separately from the performances of the
skaters, because they represent the objective, quantitative markers by
which the individual achievements are measured.

Olympic figure skating is evaluated by nine judges. Each judge
gives two marks on a scale from 0.0 to 6.0, with a 6.0 being perfect and
faultless. One mark is given for "technical merit", and one for
"presentation". The two marks are then added for a total score. The
following table compares the results of the Russian and Canadian pairs:

If we add up the total scores for the Russian pair and for the Canadian
pair, the score for the Canadians is actually slightly higher. But that
is not how the medals are decided. The marks are merely used to
determine ordinals. The ordinals are numbers that represent
how each judge ranks a pair relative to the other pairs. Each
individual judge assigns an ordinal to each pair. When a judge gives two
pairs the same total, the pair with the higher mark for "presentation"
receives the lower ordinal. For
the Russian and Canadian pair, this came out to:

Thus five judges determined that Russia won the gold, whereas four
judges favored the Canadians. The judges' preferences appear to be split
along traditional geopolitical borders, with the exception of the French
jury member voting for the Russian pair. Without a doubt, this has
raised suspicions about her voting behavior.
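
The ordinal mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pairs "A" and "B", the five-judge panel, and all the marks are hypothetical and are not the Salt Lake City marks, and the winner is decided simply by counting first-place ordinals, which is a simplification of the full rule book. The function names are made up for this sketch.

```python
from collections import Counter

def judge_ordinals(marks):
    """Turn one judge's marks into ordinals (1 = best).

    `marks` maps a pair name to (technical merit, presentation).
    Pairs are ordered by total mark; a tie on the total is broken in
    favor of the higher presentation mark. Totals are rounded to one
    decimal so floating-point noise cannot hide a genuine tie.
    """
    ranked = sorted(
        marks,
        key=lambda pair: (round(sum(marks[pair]), 1), marks[pair][1]),
        reverse=True,
    )
    return {pair: place for place, pair in enumerate(ranked, start=1)}

def first_place_counts(panel):
    """Count how many judges placed each pair first."""
    return Counter(
        pair
        for marks in panel
        for pair, ordinal in judge_ordinals(marks).items()
        if ordinal == 1
    )

def summed_totals(panel):
    """Add up every judge's total mark per pair (not used for medals)."""
    totals = Counter()
    for marks in panel:
        for pair, (technical, presentation) in marks.items():
            totals[pair] += technical + presentation
    return {pair: round(total, 1) for pair, total in totals.items()}

# Hypothetical marks for illustration only.
panel = [
    {"A": (5.8, 5.9), "B": (5.8, 5.8)},  # A ahead on total
    {"A": (5.7, 5.9), "B": (5.9, 5.8)},  # B ahead on total
    {"A": (5.7, 5.9), "B": (5.8, 5.8)},  # tied total, A ahead on presentation
    {"A": (5.7, 5.8), "B": (5.9, 5.9)},  # B ahead on total
    {"A": (5.8, 5.9), "B": (5.8, 5.8)},  # A ahead on total
]

counts = first_place_counts(panel)
totals = summed_totals(panel)
print(counts)  # first-place ordinals per pair
print(totals)  # summed marks per pair
```

With these made-up marks, pair B has the higher summed score, yet pair A collects more first-place ordinals, which is the same kind of outcome the text describes for the Canadian and Russian pairs.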

But the question is whether the French judge's voting was out of
line. Certainly, any favoritism should show up in her voting
statistics. First, let's look at the statistics for the Russian pair:
the mean (average) for the total score is equal to 11.63 ±
0.08 (99% Confidence Interval). Thus, if we assume that the judges
are unbiased and their votes are normally distributed around the mean, 99
out of 100 judges would give the Russian run a score between
11.55 and 11.71. All the judges scored the Russian performance within
the confidence interval except the Canadian judge, whose score
(11.5) was below the bottom confidence limit. Note that the French judge
did not significantly upvote the Russian pair, compared to the mean
scoring; in fact, three other judges gave identical scores.
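
For readers who want to check the arithmetic, here is a minimal sketch of how such an interval could be computed. It assumes the ± figure is a Student's t interval for the mean of nine scores; the text does not say which method was used. The list of totals is illustrative: it was chosen to agree with the figures quoted here (four identical top scores, one score of 11.5, a mean of 11.63) and is not copied from the official results. The function names and the hard-coded t value are likewise choices made for this sketch.

```python
import statistics

# Two-sided 99% critical value of Student's t for 8 degrees of freedom
# (nine judges), taken from a standard t-table. It is only valid for
# panels of exactly nine scores.
T_CRIT_99_DF8 = 3.355

def mean_confidence_interval(scores, t_crit=T_CRIT_99_DF8):
    """Return (mean, half_width) of the confidence interval for the mean."""
    mean = statistics.mean(scores)
    standard_error = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, t_crit * standard_error

def scores_outside_interval(scores, t_crit=T_CRIT_99_DF8):
    """List the individual scores that fall outside mean +/- half_width."""
    mean, half_width = mean_confidence_interval(scores, t_crit)
    return [score for score in scores if abs(score - mean) > half_width]

# Illustrative totals (assumption, see the paragraph above).
russian_totals = [11.7, 11.7, 11.7, 11.7, 11.6, 11.6, 11.6, 11.6, 11.5]

mean, half_width = mean_confidence_interval(russian_totals)
outliers = scores_outside_interval(russian_totals)
print(f"{mean:.2f} +/- {half_width:.2f}")
print(outliers)
```

Strictly speaking, an interval built this way bounds the mean score rather than the spread of individual scores, so comparing single judges against it is a rough yardstick and not a formal test.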

What does this mean? There are two possible explanations:
either the Canadian judge was biased and downvoted the Russian
pair to favor the Canadian skaters, or several biased
judges upvoted the Russian pair (an "Eastern-bloc" vote). The
Russian, Chinese, French, and
Ukrainian judges awarded identical scores to the Russians. It
would be unfair to pin the entire scandal on one French judge.

Now let us take a closer look at the statistics of the Canadian pair:
the mean for their total score is equal to 11.68 ± 0.09 (99%
Confidence Interval). Again, if we assume unbiased voting, 99 out of
100 scores for the Canadian performance would fall between 11.58 and
11.77. Two scores fall outside of the confidence limits: the scores of
the Canadian judge (11.8) and the German judge (11.8).

Why didn't the French judge downvote the Canadians? Surely, if there
was a deal with the Russians, this would have increased the
chances of a Russian victory. Perhaps that would have been too obvious.
But Canada's apparent upvote for its own team is perhaps a little too
obvious as well. And is Germany's vote for Canada a sign of "Western-bloc" voting?

The preceding analysis shows how difficult it is to prove
irregularities in the judging system, especially if the allegations are
based on one disputed vote. Perhaps the French judge was pressured to
vote for the Russian team, but the figures can't really tell us that
she was. They may indicate some other shady voting patterns, but most
likely we'll never find out the full story.

The bottom line of this whole affair is that the discipline of
Figure Skating is in a lose-lose situation. The incident has stirred
up too much commotion to simply ignore it. On the other hand, France
will most likely not act as a fall guy, and will probably stir up
stories of vote rigging in past (Olympic) events. The worst-case
scenario would be the conclusion that Figure Skating has been tainted by
East-West bloc voting ever since the Cold War; this would discredit
the discipline to the point of calling into question whether Figure
Skating is a true sport. It seems unlikely that the IOC will drop Figure
Skating as an Olympic sport, because it is a major source of revenue.
But the events in Salt Lake City may well be a turning point in the
sport's history.

The ISU, pressured by the IOC, concluded that the French judge's
voting was compromised, and awarded an additional gold medal to the
Canadian pair.

The Olympic jury panel always includes a 10th member who oversees the 9 scoring judges and also scores each performance (even though these marks are not used during the competition). It was decided not to use these marks to replace those of the French judge. Had they been used, the Canadian pair would have won the gold, and the Russian pair would have ended up with silver. However, the ISU rule book does not describe any procedure for dealing with tampered judging, and stripping Sikharulidze and Berezhnaya of their gold would have severely upset the Russian camp.

Although awarding two golds appeared to be the politically correct way out of the situation, it raised more questions about the whole process of sports judging in general, and figure skating in particular. There is hardly any mention of the other four judges who voted for the Russian pair as the winners. Was their judging equally unfair? Or was the French judge's voting in order, and was the second gold awarded solely because of external pressure?

The ISU's chairman Ottavio Cinquanta proposed several changes to the jury system that should prevent future mishaps. Based on the historical record and the way this situation was resolved, that appears to be an overly optimistic dream.

I only occasionally watch figure skating, but I do skate a little myself, which makes me really appreciate some of the stuff I see world-class skaters not only doing, but making look easy. Still, as someone with a mathematics degree, there are three things about the scoring system that have always seemed, well, whacko to me.

First, there's this whole business of the ordinals. If the point is to come out with a ranking at the end, why not just have the judges rank the competitors to start with and drop the 5.6...5.8...5.7...5.7... nonsense?

Second, there's the fiction that scores range from 0.0 to 6.0. When's the last time you saw anyone get less than a 5.5 in a serious competition? There's really very little room for discrimination among performances. If you're really basically ranking performers in 5 or 6 steps, then adopt a scoring scale that does that instead of pretending you really have 60 steps.

Finally, there's the whole idea that all the sports that do subjective judging do it differently. Skating does it one way, gymnastics another, and diving yet another. I have no idea how moguls and snowboarding do it. Couldn't representatives of all these sports get together and settle on one basic scheme?