
On the scale, 0 means no suspicion of doping – a rider with an extremely clean passport record. Scores between 6 and 10 apparently imply a high degree of circumstantial evidence because of large fluctuations in the passport data.

Not surprisingly, the reaction from the cycling world has been angry, with the UCI condemning the leak and the riders reacting angrily to both the leak and the fact that they’re being rated on some kind of suspicion scale.

But let’s cut through the expected emotive reactions and look at some of the issues behind the story.

The first point is that the leak and the fact that riders are being rated on a suspicion scale are two unrelated events. That is, the fact that there is a leak should be condemned, mostly because it’s a symptom of an organization that doesn’t have its house in order. And a leak has some far-reaching implications, in that it undermines the trust that the biological passport system relies on. So that is certainly a concern for those involved in the fight against doping.

But the fact that riders are being ranked on this scale is a separate issue. It may even be commended, depending on how it is done, but the leak has created confusion and speculation which shouldn’t be confused with the actual concept of rating riders (a practice that certainly can be debated).

The rating concept is an issue that has people asking some very pertinent questions: “How is the score calculated? What are the UCI doing with the scores? How can a rider be suspicious and not simply positive or negative? Are they targeting testing based on the ratings? Is this fair?”.

For the answers to these questions, let’s go back to go forward.

The biological passport concept

We’ve said a fair amount on the biological passport in 2011, most of it very positive. If you missed those pieces, I’d encourage you to read them (they are linked below).

Without wishing to rehash those (lengthy) posts, a summary is important because it sets the context for why a rating system is part of the control and policing of the passport system. First, the passport is a very stringently designed scientific tool. It has the backing of substantial research, some of which we discussed in that first post on the legal issues. It’s not perfect, and of course more research is needed, but it’s not a stab in the dark at catching dopers.

And so what has been done is to set a limit of probability that allows 99.9% confidence that a value (for reticulocytes, hemoglobin or off-score) lies outside the ‘normal’ range. All measurements are analysed using software to answer the question “what is the probability of finding these values in an undoped sample?” Only when a value lies outside this 99.9% confidence limit does it constitute a ‘strike’.
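To make the idea of a 99.9% limit concrete, here is a minimal sketch of a strike check. It is not the passport software: I am assuming, purely for illustration, that a rider's undoped values follow a normal distribution with a known baseline mean and standard deviation, and the function name and parameters are hypothetical.

```python
from statistics import NormalDist


def is_strike(value, baseline_mean, baseline_sd, confidence=0.999):
    """Return True if `value` falls outside the central `confidence` range.

    Assumption (for illustration only): the rider's undoped values are
    normally distributed around `baseline_mean` with spread `baseline_sd`.
    """
    dist = NormalDist(baseline_mean, baseline_sd)
    tail = (1 - confidence) / 2          # split the excluded area over both tails
    lower = dist.inv_cdf(tail)           # lower limit of the 'normal' range
    upper = dist.inv_cdf(1 - tail)       # upper limit of the 'normal' range
    return value < lower or value > upper


# A hemoglobin reading 2.8 standard deviations above a made-up baseline:
# outside the 99% range, but still inside the 99.9% range.
reading = 15.0 + 2.8 * 0.5
flag_99 = is_strike(reading, 15.0, 0.5, confidence=0.99)
flag_999 = is_strike(reading, 15.0, 0.5, confidence=0.999)
```

The same reading is flagged at 99% but not at 99.9%, which is exactly the kind of value that never becomes a ‘strike’ under the stricter limit.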

That’s the first line of defense against unfairly pursuing doping cases. The second is that a rider is not sanctioned on a once-off strike. A case is only opened if several different variables are beyond these boundaries on more than one occasion.

If this happens, then a third layer is that a team of experts evaluates and analyzes the values. If they feel that the profile is typical for a certain doping intervention, the athlete is contacted and questioned about potential reasons for his values. His justifications are again evaluated by the experts. Only if they are still convinced that the profile is typical of doping and is not caused by the explanations put forward by the athlete (as has happened for Pellizotti and Valjavec), do they suggest the opening of a procedure against the athlete.

Legal clout, but expanding the impact of the passport through ranking the “near strikes”

This strength and “legal clout” comes from the high probability limit of 99.9%, which means that false positives will happen only once in 1,000 samples. This is essential, because it must protect riders against false accusation. The “problem” is that it also sets the bar so high that a lot of samples that are doped will perhaps fall into a 99% confidence limit, or even a 95% confidence limit, and thus are not counted as strikes. The legal requirement is a “conservative system”, which doesn’t necessarily ensure maximum impact on doping.

In other words, at say 95%, a sample would be suspicious and indicative of doping, but not legally enforceable, because you could get a false positive once every 20 measurements. But that cyclist may still be doping and if they repeatedly produce these kinds of measurements, then these “near-strikes” also need to be accounted for. And this, I suspect, is how the rating system works.
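The arithmetic behind “once every 20 measurements” can be sketched as follows. This is a simplified illustration, assuming each test on a clean rider is independent and has the same chance of a false flag; the function names are my own.

```python
from math import comb


def expected_false_positives(n_samples, confidence):
    """Expected number of clean samples flagged purely by chance."""
    return n_samples * (1 - confidence)


def prob_at_least(k, n_tests, confidence):
    """Chance that a clean rider is flagged k or more times in n_tests.

    Assumes independent tests, each with false-flag rate (1 - confidence).
    """
    p = 1 - confidence
    return sum(
        comb(n_tests, i) * p**i * (1 - p) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


# At 95%, a clean rider tested 20 times has roughly a 64% chance of
# at least one chance flag, so a single flag proves nothing.
one_flag = prob_at_least(1, 20, 0.95)
# Five or more chance flags in the same 20 tests is far less likely.
five_flags = prob_at_least(5, 20, 0.95)
```

This is why a single value at the 95% level is not enforceable, while a rider who keeps producing such values becomes progressively harder to explain by chance alone.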

A justified process of control and one way of implementing the passport

Now, this is speculative, because we simply don’t know that this is the basis for those scores. However, for the sake of illustration, this is how I would use a rating system: Consider a rider whose measurements would produce a ‘strike’ at a 99% confidence level, but not 99.9%. The first such occurrence may be worth a score of 1. If that same rider repeatedly ‘strikes’ at this lower probability level, he accumulates more points. These points could be scaled afterwards by some correction factor to produce the 11-point scale (0 to 10) that was used, and there’s the value you see in the lists below.
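For illustration only, that speculative scheme could be sketched like this. Nothing here is the UCI's actual method: the use of z-scores, one point per near-strike, the scaling factor and the cap at 10 are all my assumptions.

```python
from statistics import NormalDist

# Two-sided thresholds, in standard deviations from a rider's baseline.
Z_99 = NormalDist().inv_cdf(1 - 0.01 / 2)    # about 2.58
Z_999 = NormalDist().inv_cdf(1 - 0.001 / 2)  # about 3.29


def count_near_strikes(z_scores):
    """Count values beyond the 99% limit but still inside the 99.9% limit."""
    return sum(1 for z in z_scores if Z_99 <= abs(z) < Z_999)


def suspicion_score(z_scores, scale=1.0):
    """Hypothetical 0-10 score: one point per near-strike, scaled and capped."""
    raw = count_near_strikes(z_scores) * scale
    return min(10, round(raw))


# Two of these four readings are near-strikes, so the score is 2.
example = suspicion_score([0.1, 2.7, -3.0, 1.0])
```

Note that a full strike (beyond 99.9%) is deliberately not counted here, because that would go through the formal case process rather than the suspicion score.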

Our rider never exceeds a 99.9% confidence level, and so quite rightly, a case can’t be opened, but he’s been identified as a suspicious rider based on a long sequence of unusual values that lie on the border of being strikes. The more often he is flagged with a near strike, the higher his score, which is reasonable. And I believe it is entirely justified to watch this rider more closely. Why? Because we know from testimony that riders are able to manipulate their blood values through masking with EPO and by micro-dosing, and so it’s quite conceivable that the precision of doping allows riders to stay beneath a 99.9% “radar” but not the 99%.

“Too normal” – sometimes variation is good

The other interesting possibility, mentioned in the L’Equipe report, is that some riders are suspicious precisely because they are “too normal”. That is, variation in blood markers is supposed to happen, and when it does not, then that too is a flag, and so is part of the designation of what constitutes suspicious. I’m not sure how this is incorporated into a score, but it’s hopefully understandable that if a rider presents with “perfect” values time after time, that’s as much a problem as wild fluctuations outside the limits.
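I don't know how “too normal” is actually detected, but one simple way to picture it is to compare the spread of a rider's values with the spread you would expect from natural variation. The sketch below assumes a known expected standard deviation and an arbitrary cut-off ratio; both are hypothetical.

```python
from statistics import stdev


def too_normal(values, expected_sd, min_ratio=0.5):
    """Flag a series whose spread is suspiciously small.

    `expected_sd` is the assumed natural within-rider variation and
    `min_ratio` is an arbitrary cut-off chosen for illustration.
    """
    if len(values) < 3:
        return False  # too few samples to judge variation
    return stdev(values) / expected_sd < min_ratio


# Made-up hemoglobin series (g/dL): one almost flat, one varying naturally.
flat_profile = too_normal([15.0, 15.0, 15.1, 15.0, 15.0], expected_sd=0.5)
varied_profile = too_normal([14.2, 15.6, 15.0, 14.5, 15.8], expected_sd=0.5)
```

The flat series is flagged and the varied one is not, which captures the idea that a lack of variation can itself be a warning sign.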

The fact that he is then rated on a scale, receiving a value of say 4 or 5, is simply part of the management of such a system. In my opinion, this is entirely justified and acceptable. It is part of the effective implementation of a passport system, which would lose much of its “bite” if there was not some avenue to monitor and target testing of riders who did not hit the 99.9% level.

The fact that these scores are now public knowledge, without necessarily explaining where the number comes from, that’s the problem, for the following reasons:

1. The possibility of subjective scoring and generalization

Honestly, in the immediate aftermath of seeing that list, did you look at some cyclists and say “Ha, I knew it! Doper!”, or did you look at some names and scores and say “No way is Rider X more suspicious than Rider Y!”? Because if you did, then you were, at least partly, comparing riders based on subjective perceptions of guilt and innocence!

The problem is that it’s quite possible that whoever developed those rankings did so objectively, but with some subjective moderation, and that would not be a good thing. If the rating is developed solely on the basis of biological passport measurements (as we are led to believe by the L’Equipe article), then I’d be satisfied. But if it is subjectively moderated, then that may be an issue.

An extension of this is that once public, the scores can further reinforce stereotypes about certain teams or riders. For example, L’Equipe took the step of using the UCI scores to calculate which teams and nationalities were most suspicious. In what some will describe as ‘vindication’, it turns out that Astana and RadioShack are the most suspicious teams, and Russia, Ukraine and Kazakhstan are the most suspicious nations. I’m reminded that sometimes, stereotypes exist because they’re true – sometimes, reputations are earned and deserved. But one can sympathize with the rider from these teams/countries who is NOT a doper, but is now implicated by association and generalization, which would be unfair.

For the public, the subjective perception will always feature, but what a leak does is allow confusion, and confusion allows more subjectivity and generalization.

2. Uncertainty over the relative rankings and scaling

How is a rating of 8 different from a rating of 4? How close to a doping sanction is a guy who scores 9? Is a score of 3 supposed to be interpreted as possible doping, likely innocence or neither? Does the score actually have any doping relevance? For one thing, it doesn’t mean that the rider who scores 8 is doping twice as much as the rider scoring 4. It doesn’t mean he is twice as likely to be doping either. The truth could well be the other way around, but it’s just that the one is smarter at avoiding larger changes in bio-passport data. But the problem is that by the time you get to 9 or 10, then that rider is so strongly suspected that a lot of people would be wondering why he isn’t just sanctioned. Or why the rider who scores 5 (Contador) is caught whereas riders who score 9 or 10 (Menchov, Popovych) are not caught? Hopefully, in my short summary above, and in this previous post, I’ve explained why bringing a legal case has a more stringent requirement than being suspected.

I’ve also given my illustration of how the bio-passport MIGHT be used to create these scores, by combining ‘near-strikes’ according to objective criteria with the number of times they occur. The cyclist scoring 8 or 9 might have produced 9 suspicious values at a probability limit of 99%, but not one at 99.9%. Of course, this is speculative, which is why a leak undermines the credibility of the process, because it invites speculation.

3. The witch-hunt phenomenon

The third point, very relevant, is that this kind of scoring (independent of it being leaked) will lead to a witch-hunt, where some riders will be targeted for testing. I actually don’t have a problem with this, provided that the classification of riders is done objectively using solid science (see point 1). I will concede that some cyclists may be targeted unfairly, perhaps based on their reputation or allegations. But honestly, I don’t even have a problem with this, as long as it’s not prejudicial to the rider’s participation (that is, as long as cases are not opened unfairly).

But that doesn’t seem to be the case – every case opened so far has been won. The expertise behind the passport is world class (even if the management and administration by the UCI is not – see the leak as Exhibit A), and so I’m confident that targeted testing is not a witch-hunt, but a more cost-effective and intelligent way to weed dopers out. It’s much like the sniper approach compared to the machine gun approach – if finances are limited, then rather spend the money where it may be most effective, and that is part of the value of the passport.

Incidentally, targeted testing has helped catch dope cheats before – Ricco was suspected for a long time and targeted for testing before eventually being caught, and so there is definitely merit. I guess the key question here is what you actually do with a rider who is targeted. By definition, the fact that he is suspicious and not positive means that you’ll never be able to bring a case against him unless he slips up. And perhaps more frequent testing for the higher rated cyclists will improve the chances of catching these slip-ups. That’s one way of focusing testing. This kind of focus saves money, and as long as the process is objective and fair, and not played out unnecessarily in public, then I think it’s a positive. But I accept that some may disagree.

The other thing that is really interesting to consider is that some cyclists may be better than others at disguising doping. Now, because of this leak, cheaters who are better at manipulating their values look squeaky clean, and have this rating as their defense, when in fact, all they are is more effective at doping! That’s part of the reason why linking names to scores may not be ideal, and certainly why a leak doesn’t help because we lack the context to fully understand how the score is derived.

As an aside, it is interesting to note that last year, during the Tour, a lot of the riders who were rated 8 or higher were NOT tested (as raised by an independent WADA Report) – this point is raised by Festinagirl in the comments section below, and it again opens the UCI to alleged favouritism as a result of lax testing. If the point of this rating system is to target testing, then testing had better reflect it. If it doesn’t, well, what’s the point of the scores? Too much unanswered.

Fascinating insight into the process, but have we moved backwards?

I’d be lying if I said the scores were not fascinating to see. I’ve often argued for full transparency in anti-doping efforts, that the bio-passport values should be made available (as some promised to do but never did). Seeing the rankings of riders is about as transparent as it gets, but the confusion and uncertainty over exactly what the scores mean and how they’re used means one step forward for transparency, one step backward for improved knowledge! And perhaps most worryingly of all, two steps back for trust and credibility of the whole anti-doping effort.

So perhaps the scores might have been made available without the associated names. Revealing that riders are rated, and telling us that say 50% of the riders have a score of 4 or higher would send the same message to the professional peloton that they are being watched as publishing names.

As mentioned at the outset, linking the names with the scores only undermines credibility of the system, and the system is so reliant on the trust of anonymous analysis of blood values. It also provides a convenient public defense to some riders who may not necessarily be innocent, and possibly disadvantages riders who may be innocent but now have high index scores.

The reality is that putting names to the biological passport should be done only in the very final step of the process – the leak has given us an insight into the process, but perhaps the same effect might have been achieved without the risk to the credibility and trust between rider and tester.

But the UCI’s reaction, combined with other UCI anti-doping actions, only provides more fodder for critics of the UCI. For one thing, the leak undermines the credibility of an organization that can’t seem to sort out its own internal processes. One of the key requirements of a successful anti-doping programme is trust – every party must trust the others to be fair and accurate in their actions, and a leak like this undermines that trust. The UCI already has this clandestine reputation, and this doesn’t help.

It also doesn’t help when the UCI can’t reach agreements over testing at races and ends up marginalizing independent organizations, as happened for the upcoming Tour of California. Rather than having independent and comprehensive testing from USADA, Bonnie Ford reports that the UCI will now handle the testing, which further increases the perception that the UCI are hiding something, giving preferential treatment to certain teams and riders, and generally having an enormous conflict of interest.

When the UCI deplore the leak and publication of their suspicion scores, and follow this up a day later by marginalizing independent testing authorities as a result of unwillingness to make available test results or allow targeted testing, it only reinforces the perception of a clandestine approach. Perhaps full transparency is the way to go – but not for the first time, it’s the media who have forced it onto the UCI, who continue to send mixed signals about anti-doping efforts.

Difficult times for cycling. And all the while, a bike race goes ahead in Italy, but it’s anti-doping on the back pages.
