Researcher Finds Doping Tests To Be Flawed

Cyclist Floyd Landis was stripped of his 2006 Tour de France title after he tested positive for synthetic testosterone. Donald Berry, a statistician at the University of Texas, doesn't buy it. Berry explains how drug testing could be more scientific.

You're listening to Talk of the Nation: Science Friday. I'm Ira Flatow. We've - program note, on Wednesday, Neal Conan is back at the Newseum in Washington, along with Ken Rudin, NPR's political junkie, and Ralph Nader. And if you're going to be in Washington, and you want tickets to join the audience for that live broadcast of Talk of the Nation, here's how you get them. You send an email to tickets@npr.org.

For the rest of the hour, a look at drug tests for professional and Olympic athletes. And with the start of the Olympic Games today, questions about drug testing and which athletes might be using performance-enhancing drugs will certainly be a topic of concern. Anti-doping agencies continue to test and disqualify athletes. According to USA Today, at least 37 athletes have been barred from the games since April. But what if these analyses are flawed, and some of the athletes testing positive are not actually abusing drugs?

A commentary published in this week's Nature claims that because of the problems with the way these tests are conducted and interpreted, it is possible that we're too quick to assume that a positive test means a guilty athlete. If errors in how athletes are tested make it nearly impossible to come to any conclusive decision, what needs to be done to make drug testing more scientific? Joining me now to talk about this is the author of the commentary, Donald Berry, head of the division of quantitative sciences, the chairman of the department of biostatistics, and the Frank T. McGraw Memorial chair of cancer research at the University of Texas M.D. Anderson Cancer Center in Houston. Welcome to the program, Dr. Berry.

Dr. BERRY: Well, I've been interested in doping - as, I guess, most sports fans are - for some time. I testified ten years ago in the Mary Decker Slaney case of testosterone testing, and came to be familiar with the Landis data...

FLATOW: Mm.

Dr. BERRY: From the fact that they had posted the - their defense, their so-called wiki defense, on the web, and so I was able to read about that.

FLATOW: The Floyd Landis case.

Dr. BERRY: The Floyd Landis case, the 2006 Tour de France issue.

FLATOW: Mm-hm.

Dr. BERRY: And so I've been discussing this with others, and a number of people besides me are concerned about the science or lack thereof in the testing process.

FLATOW: Well, I - yeah - let's...

Dr. BERRY: And so I took it upon myself to write.

FLATOW: Let's talk a bit about your concerns. You write that a positive drug test doesn't necessarily mean that the athlete was using drugs? How can that be?

Dr. BERRY: Well, how can it be?

FLATOW: Is there something wrong with the test? Is it flawed?

Dr. BERRY: It's - so, we don't know. What has to be done is to - you know, if somebody says the person tested positive, I want to know, if an individual is not a user, how likely is it that the test is going to show a positive result? I also want to know, if the individual is a user, how likely is it that the test is going to show a positive result? And we don't have that information. Those experiments have never been done, and so we don't really know the accuracy of the test.
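
The two quantities Dr. Berry asks for here (how often a non-user tests positive, and how often a user tests positive) are what Bayes' rule needs to turn a positive result into a probability that the athlete is actually a user. The sketch below is only an illustration. The function name and every number in it, including the assumed share of users among tested athletes, are hypothetical and do not come from the interview or from any real doping test.

```python
def prob_user_given_positive(sensitivity, false_positive_rate, prevalence):
    """Bayes' rule: probability that an athlete who tested positive is a user.

    sensitivity         -- P(positive | user)
    false_positive_rate -- P(positive | non-user)
    prevalence          -- assumed share of users among tested athletes
    """
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical values only: 90% sensitivity, 5% of tested athletes are users.
for fpr in (0.001, 0.01, 0.05):
    p = prob_user_given_positive(
        sensitivity=0.9, false_positive_rate=fpr, prevalence=0.05
    )
    print(f"false-positive rate {fpr:.3f} -> P(user | positive) = {p:.2f}")
```

Under these made-up inputs, the meaning of a positive result changes a great deal as the false-positive rate changes, which is why an unmeasured false-positive rate leaves the result hard to interpret.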

FLATOW: So, we're giving the - we're giving a test to which we don't know the accuracy of?

Dr. BERRY: Correct.

FLATOW: Wow. Let's - and let's focus in on the Floyd Landis case, because you focus in on that in your paper.

Dr. BERRY: OK.

FLATOW: You used that 26 - 2006 Tour de France case. The original winner, Floyd Landis, was stripped of his title after testing positive for drugs, and as an example of how the drug-testing process is flawed, and the logic under which it applies, you said that it makes it, though, impossible to make a conclusion about him either way. We should not have concluded that - given the testing that we have in the science of testing, you would say that he should not have been - there's not enough evidence to have stripped him of his medal. Would that be fair?

Dr. BERRY: Well, I mean, I would leave that to a hearing.

FLATOW: Yeah.

Dr. BERRY: But what I'm saying is that the testing is non-informative, because we don't know its accuracy.

FLATOW: Mm-hm.

Dr. BERRY: There's never been a test of the process, and indeed, exactly what the process is, is subject to some controversy. But there has never been a test of the process to decide what the false positive rate is. And this would, in modern science, be conducted in a blinded experiment, where the individuals involved don't know whether that...

FLATOW: Mm-hm.

Dr. BERRY: The person being tested is a user or not. And actually, hopefully even the person being tested doesn't know that there's a double-blind experiment.

FLATOW: Mm-hm.

Dr. BERRY: So, that we can assess the false-positive rate and the false-negative rate.

FLATOW: Can you walk us through that case a little bit to explain? Give us the process that was used to explain this process of drug testing.

Dr. BERRY: OK. So, in the old days, 10 years ago, the testosterone was tested, it was compared to something called epitestosterone in a ratio, and if the ratio was bigger than a number - and typically six - then you were declared a user. As science progressed and technology progressed, they were able to look at metabolites of testosterone, and they were especially concerned with the possibility of synthetic testosterone, that it would raise the metabolites more than would the natural testosterone, and this was done using mass spectroscopy to look at four metabolites.

So, in the Landis case, they've now dropped back to a ratio of four to one. So, if your ratio is more than four to one, that puts you in the suspicious range, and then they do the more sophisticated test using mass spec. And the four metabolites that they look at, they ask, do we see unusual results? And if they see unusual results, sufficiently unusual according to their criteria, and exactly what those are is not perfectly clear, which is a problem, then they call it a positive result.

Dr. BERRY: Well, there's an arbitrariness, but you know, there's always some arbitrariness. I mean, if you - think of a cut point, think of a continuum of results, and you put the cut point down, you say to the right means positive, to the left means negative, there's an arbitrariness in setting that. If you move it to the right, then you decrease the true positive rate. You also decrease the false-positive rate, but you increase the false-negative rate. And so, there's an arbitrariness associated with it. But my point is that you write down the procedure, then you have to validate it, then you have to address how well it performs in as nearly realistic a setting as you can test.
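
The cut-point trade-off Dr. Berry describes can be sketched numerically. The example below assumes, purely for illustration, that test scores for non-users and users each follow a normal distribution. The two distributions, their parameters, and the `rates_at` helper are hypothetical and are not drawn from any real test data.

```python
from statistics import NormalDist

# Hypothetical score distributions; these are NOT real doping-test data.
non_users = NormalDist(mu=1.0, sigma=1.0)
users = NormalDist(mu=5.0, sigma=2.0)


def rates_at(cut_point):
    """Return (true-positive, false-positive, false-negative) rates
    when every score above cut_point is called positive."""
    true_positive = 1.0 - users.cdf(cut_point)
    false_positive = 1.0 - non_users.cdf(cut_point)
    false_negative = users.cdf(cut_point)
    return true_positive, false_positive, false_negative


# Moving the cut point to the right (larger values) shifts all three rates.
for cut in (3.0, 4.0, 5.0, 6.0):
    tpr, fpr, fnr = rates_at(cut)
    print(f"cut {cut:.1f}: TPR={tpr:.3f}  FPR={fpr:.5f}  FNR={fnr:.3f}")
```

With any such pair of distributions, a higher cut point lowers both the true-positive and false-positive rates and raises the false-negative rate. Which cut point is acceptable depends on rates that, according to Dr. Berry, have not been measured.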

FLATOW: Yeah. And you're saying this hasn't been done.

Dr. BERRY: This has not been done.

FLATOW: So, the baseline stuff hasn't been done.

Dr. BERRY: That's correct.

FLATOW: And so, where you - we're giving a test that we really don't know what the baseline variances may be.

Dr. BERRY: Well, we don't know what the false-positive rate is...

FLATOW: Right.

Dr. BERRY: And we don't know what the false-negative rate is.

FLATOW: And if you don't know that, you could have an athlete who falls into one of those gray areas and then is labeled erroneously.

Dr. BERRY: That's correct.

FLATOW: Wow. Why haven't we had those done?

(Soundbite of laughter)

Dr. BERRY: Well, it's the - several reasons. One is that the - it hasn't been - this is a niche science, if I could call it that. I mean, it's not - it's a standard thing that we do in medicine, in diagnostic tests. I mean, you can't get something approved by the FDA for use as a diagnostic test unless you do these experiments. And I don't think that the science has been communicated sufficiently well to the anti-doping community. But there is a - they say, you know, we don't want to tell the athletes too much about what the test is...

FLATOW: Mm-hm.

Dr. BERRY: And about what the criteria are, or indeed the performance characteristics of the test, because, you know, it's the mouse in the mouse trap. We'd be telling the mouse about our mouse trap.

FLATOW: Mm-hm. Mm-hm.

Dr. BERRY: So, a problem, Ira, is that the anti-doping community is very closed. They don't want to communicate too much. They think they know the answers, and I want some demonstration that they know the answers. It's not conducive to science to have these things done in the small, dark communities.

FLATOW: Mm-hm.

Dr. BERRY: On the other hand, they say, we need to do that to protect our, you know, our methods and be able to use them, as they're suggesting now in the Olympics, to spring these new methods that they've come up with, presumably without the adequate testing.

SARA (Caller): Hi. OK. Concerning false positives, don't they do repeated testings? Like, if I get a positive, then they do another one to double check? And don't they even sometimes do a third? And then, my other question is, I've read that a person may have a super-high level of testosterone that could be naturally in his body, that may give us a false positive.

FLATOW: Well...

Dr. BERRY: Yes. Well, with respect to the latter, that's the issue - I mean, there's heterogeneity among individuals, and in order to understand, and presumably within an individual over time, depending on what he or she eats and various other things, potentially including exercise, so that's the kind of thing that can give rise to false positives. And you're right, they do do additional testing. So, for example, they take a sample A and a sample B, essentially splitting the urine sample into two pieces, and to allow for testing potentially from other labs. But the formal criterion is that they do the T over E, the testosterone over epitestosterone. And if it's suspicious, then they do the additional tests with the mass spec.

Dr. BERRY: It's just that we don't understand how good it is, and how good it is, is not only the variability in the biology, but also the variability in the handling...

FLATOW: Mm-hm.

Dr. BERRY: And the potential for contamination and for mishandling. And we - you know, that's part of the process, and we need to know what the error rate is there.

FLATOW: Mm-hm. OK, Sara?

SARA: Oh, OK, sure. Thanks.

FLATOW: Thanks for calling. So, you're saying that if you're going to do these tests, let's say, at least test the tests first.

Dr. BERRY: Exactly. Exactly.

FLATOW: Let's test the test and see how reliable they are.

Dr. BERRY: Yes.

FLATOW: It's almost - this almost - and this may be way off base, but it almost sounds to me like we're asking the same thing we do of fingerprints here. It had never been tested, you know? It was just taken...

Dr. BERRY: Yes. Well, it's - so, for example, DNA fingerprinting.

FLATOW: Yeah.

Dr. BERRY: The issue that have - has been much publicized in the O. J. Simpson case, et cetera, there were tests of how well that happened in the early days of DNA fingerprinting, where they actually did some blinded tests to see, could you match effectively?

FLATOW: Mm-hm.

Dr. BERRY: And in a very interesting case in the Minnesota Supreme Court, they ruled that they would disallow the DNA fingerprinting evidence, because these tests had shown that the false positive rate was really out of whack, despite the incredibly small probabilities that the DNA testing would come up with.

FLATOW: So, what does that say to you?

Dr. BERRY: That it's critical to test the test.

FLATOW: And do you think - have you heard back since your editorial, is that your commentary? About testing the test from...

Dr. BERRY: You mean, from the World...

FLATOW: Yeah.

Dr. BERRY: Anti-Doping Agency? No, I haven't. I know that they think that I'm a lunatic, but other than that...

(Soundbite of laughter)

Dr. BERRY: I've gotten really positive feedback from quite a number of people who had complained about the same thing, including to the doping community, and were put off. I mean, it's one of the - this is a very powerful organization as we've seen in the recent United States - you know President Bush signing an agreement that they essentially rule.

Dr. BERRY: Right. And one of the issues that may come up in the calls - I hope it does - is the many tests that are carried out in the Olympics, et cetera, and if there's a false-positive rate, that false-positive rate gets magnified by the number of tests that we do.

FLATOW: Explain that a bit more.

Dr. BERRY: So, if you - if the false-positive rate for a single test is, let's pick a number and say...

FLATOW: Right.

Dr. BERRY: One percent.

FLATOW: OK.

Dr. BERRY: And now we're going to do 100 tests of 100 different athletes. If there's - the probability that you find at least one positive among those 100 is then very large.

FLATOW: Right.

Dr. BERRY: In fact, you expect to find one out of the 100, and the probability of finding the one is, you know, at least one is substantial. And so, you raise the true positive rate by looking many times, either within an athlete, as they do in the Tour de France if you're among the winners, or across athletes, but you also increase the false-positive rate.
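
Dr. Berry's arithmetic can be checked with a short calculation. The sketch below uses his illustrative numbers (a 1 percent false-positive rate and 100 tests) and adds two assumptions that are not stated in the interview: that none of the athletes tested is a user, and that the tests are independent of one another. The function names are hypothetical.

```python
def expected_false_positives(false_positive_rate, n_tests):
    """Expected number of false positives among n_tests clean athletes."""
    return false_positive_rate * n_tests


def prob_at_least_one_false_positive(false_positive_rate, n_tests):
    """P(at least one false positive), assuming independent tests
    of athletes who are all non-users."""
    return 1.0 - (1.0 - false_positive_rate) ** n_tests


# Dr. Berry's illustrative numbers: 1 percent false-positive rate, 100 tests.
rate, n = 0.01, 100
print(f"expected false positives: {expected_false_positives(rate, n):.1f}")
print(f"P(at least one): {prob_at_least_one_false_positive(rate, n):.3f}")
```

Under those assumptions the expected count is one false positive, and the chance of seeing at least one is about 63 percent, which is the sense in which a small per-test error rate is magnified by the number of tests.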

FLATOW: So, we'll wait to see perhaps after the Olympics is over, I mean. But if anybody reacts, or now that you've run this up the flagpole one more time, if anybody wants to test the test - thank you, Dr. Berry.

Dr. BERRY: OK.

FLATOW: For taking time to be with us. And have a good weekend.

Dr. BERRY: All right. Thank you.

FLATOW: You're welcome. Donald Berry...

Dr. BERRY: Yeah.

FLATOW: Head of the division of quantitative sciences, chairman of the department of biostatistics. And he's also chair of the cancer research - the Frank T. McGraw Memorial chair of cancer research at the University of Texas M.D. Anderson Cancer Center in Houston.

NPR transcripts are created on a rush deadline by a contractor for NPR, and accuracy and availability may vary. This text may not be in its final form and may be updated or revised in the future. Please be aware that the authoritative record of NPR's programming is the audio.