The World Anti-Doping Agency (WADA) was created in 1999 to improve and harmonize the global fight against doping.

We stress that, in our opinion, this timely and commendable initiative relies too heavily on the questionable science of anti-doping laboratories.

This caveat also applies to the biological passport recently introduced by the UCI (see the presentation below). N.B. All these claims are substantiated in peer-reviewed articles.

Questionable science yields sub-optimal procedures and poorly defined decision rules. As a result, the risk of a false positive declaration is often unknown. Note, however, that WADA must currently also accept an unnecessarily high risk of a false negative declaration. Since the false negative problem is obviously difficult to assess in practice, the criticism is illustrated with the former problem, the false positive.

Presentation

For practical examples of potentially flawed science, see:

N.M. Faber
A think tank for anti-doping research, Play the Game 2009

What follows is the treatment of a real case that should embarrass even a lay person. Interestingly, in a 2008 interview the director of the laboratory concerned, Frans Delbeke, noted that he was proud ('fier') that his laboratory ranked second-best in the world after Cologne.

Briefly, the suspect sample is declared 'positive' because it sufficiently matches a reference sample for three (3) signals. Since measurement results cannot be obtained without uncertainty, a match range is calculated for the reference signals. The procedure is summarized in:

Calculation of match between reference and suspect sample.

N.B. As a first step, all signals for a sample (reference/suspect) are scaled to the largest one, the so-called base peak. The scaled signal for the base peak follows as 100 (%) for both samples - a fixed number without a match range. The laboratory documentation package gives a nonsensical range 85-115, but this is not repeated here.
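The scaling step described above can be sketched in a few lines of Python. This is only an illustration: the function name `scale_to_base_peak` and the raw signal values are hypothetical and are not taken from the laboratory documentation package.

```python
def scale_to_base_peak(signals):
    """Scale raw signals so that the largest one (the base peak) equals 100."""
    base_peak = max(signals)
    if base_peak <= 0:
        raise ValueError("the base peak must be positive")
    return [100.0 * s / base_peak for s in signals]


# Hypothetical raw signals in arbitrary units (NOT data from the real case).
raw_signals = [8000.0, 2600.0, 1200.0]

scaled = scale_to_base_peak(raw_signals)
print(scaled)  # [100.0, 32.5, 15.0]
```

By construction the base peak always comes out as exactly 100, which is why no match range applies to it.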

For the remaining signals, the match range depends on the size of the scaled signal and is given in a WADA technical document:

if scaled signal < 25: match range is ±10 (fixed size)

if 25 ≤ scaled signal ≤ 50: match range is ±25% of the scaled signal (relative size)

if scaled signal > 50: match range is ±15 (fixed size)
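The three rules above translate into a small piecewise function. The following Python sketch is an illustration only: the function names and the example values are hypothetical, and treating the range limits as inclusive is an assumption, since that detail is not stated here.

```python
def match_half_width(scaled_signal):
    """Half-width of the match range for a scaled reference signal."""
    if scaled_signal < 25:
        return 10.0                  # fixed size
    if scaled_signal <= 50:
        return 0.25 * scaled_signal  # relative size: 25% of the scaled signal
    return 15.0                      # fixed size


def match_range(scaled_signal):
    """Lower and upper limit of the match range around a reference signal."""
    half_width = match_half_width(scaled_signal)
    return (scaled_signal - half_width, scaled_signal + half_width)


def signals_match(reference, suspect):
    """True if every suspect signal lies inside the range of its reference signal.

    Both lists hold scaled signals WITHOUT the base peak, which is fixed at 100.
    Assumption: the range limits themselves count as a match.
    """
    return all(
        low <= s <= high
        for s, (low, high) in zip(suspect, map(match_range, reference))
    )


# Hypothetical scaled signals (NOT data from the real case).
reference = [32.5, 15.0]
print(match_range(32.5))                       # (24.375, 40.625)
print(signals_match(reference, [38.0, 12.0]))  # True
print(signals_match(reference, [41.0, 12.0]))  # False
```

The second suspect sample fails only because 41.0 exceeds the upper limit 40.625 of the relative-sized range.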

Note that the relative-sized match range for signal 2 is calculated after rounding the scaled signal to an integer, which is obviously the wrong order: rounding should be the last step.
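The effect of the rounding order can be made concrete with a short Python sketch. The scaled signal 30.25 is a hypothetical value chosen for illustration, not the value from the real case, and the function name is likewise an assumption.

```python
def relative_match_range(scaled_signal):
    """Match range of ±25% of the scaled signal (valid for 25 <= signal <= 50)."""
    if not 25 <= scaled_signal <= 50:
        raise ValueError("the relative rule applies only between 25 and 50")
    half_width = 0.25 * scaled_signal
    return (scaled_signal - half_width, scaled_signal + half_width)


scaled_signal = 30.25  # hypothetical scaled signal, NOT from the real case

# Incorrect order: round the scaled signal first, then compute the range.
range_round_first = relative_match_range(round(scaled_signal))

# Correct order: compute the range from the unrounded signal; round only
# when the final result is reported.
range_round_last = relative_match_range(scaled_signal)

print(range_round_first)  # (22.5, 37.5)
print(range_round_last)   # (22.6875, 37.8125)
```

Here the two limits shift by a few tenths of a unit. That is harmless in most cases, but it can change the decision for a suspect signal that lies close to a limit.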

Although the 'selected procedure' did not affect the decision in this particular case, it should be clear that it is not indicative of the sophistication expected in an anti-doping laboratory. In our opinion, it is nothing less than either thoughtless or careless.

Finally, it is noted that WADA's decision criteria lack a rigorous statistical underpinning, which can also be easily understood by a lay person:

The match range is the same size regardless of whether the results are obtained from peak heights or peak areas, i.e. different quantities with entirely different uncertainties.

The criteria do not provide a smooth transition for match range at scaled signal values 25 and 50. From a measurement science point of view, it does not make any sense at all to attach a different uncertainty to the indistinguishable signals 24.999 and 25.000 (say), namely 10 and 6.25 respectively.

The number of signals deemed sufficient for identification for all possible target compounds, namely three (3), is based on results obtained for a single model compound (diethylstilbestrol) and a single technique (EI-GC-MS), as stressed in:

It cannot be denied that procedures are carried out in either a thoughtless or careless manner.
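The lack of a smooth transition at the scaled signal values 25 and 50, noted above, can be checked numerically. The following Python sketch is an illustration only; the function name and the probe values 50.0 and 50.001 are assumptions added here to show the second transition.

```python
def match_half_width(scaled_signal):
    """Half-width of the match range for a scaled reference signal."""
    if scaled_signal < 25:
        return 10.0                  # fixed size
    if scaled_signal <= 50:
        return 0.25 * scaled_signal  # relative size: 25% of the scaled signal
    return 15.0                      # fixed size


# Probe values just below and at or above each transition.
probe_signals = [24.999, 25.0, 50.0, 50.001]
half_widths = [match_half_width(s) for s in probe_signals]

for signal, half_width in zip(probe_signals, half_widths):
    print(f"scaled signal {signal}: match range ±{half_width}")
```

The half-width drops from 10 to 6.25 at a scaled signal of 25 and jumps from 12.5 to 15 just above 50, although the signals on either side of each transition are indistinguishable in practice.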

General critique

Questionable science may affect critical stages of a doping test, for example:

The processing of the analytical data
Instruments deliver large amounts of data that need to be processed to arrive at an analytical finding. This task is commonly executed using commercial software that is not state-of-the-art from a scientific point of view. It is simply embarrassing to see that this software (e.g. as used for standard operations such as background correction) often lags behind by more than 20 years.

The assessment of an analytical finding to be 'adverse'
The analytical finding needs to be assessed as 'normal' or 'adverse'. Often, decision criteria are encountered that lack a rigorous statistical underpinning. N.B. For mass spectrometric detection, one of the workhorses of anti-doping research, the statistical solution has been available in the scientific literature for more than a decade, but continues to be ignored (see 'worked example' for an illustration).

The conclusion that an 'adverse analytical finding' is due to doping
Anti-doping researchers often compare elite athletes with 'normal controls', which, as has been pointed out by others, is a basic flaw. Elite athletes, e.g. because of a different genetic make-up, do not necessarily come out as 'normal' in a test even when they are 'clean'.

For written accounts of our scientific criticism, see:

N.M. Faber and R. Boqué
On the calculation of decision limits in doping control, Accreditation and Quality Assurance, 11 (2006) 536-538

N.M. Faber
The limit of detection is not the analyte level for deciding between "detected" and "not detected", Accreditation and Quality Assurance, 13 (2008) 277-278