A quick lesson on type-II errors (false negatives)

written by michaelgrayer, on 23 June, 2010 at 10:33 am

David Tredinnick, MP for Bosworth and staunch advocate of alternative therapies (such as homeopathy) in the House of Commons, is at it again. He has tabled three Early Day Motions proposing that the House welcome the findings of three separate trials of homeopathy that report “positive” results. One of them (a particularly nasty one, since it relates to breast cancer, a very serious and life-threatening disease) has already received a proper fisking. Another is so laughably easy to debunk right from the abstract that I’m going to do so here. (I haven’t read the third yet, but I would be surprised if it’s not similarly nonsensical.)

The authors conducted what is known as a non-inferiority trial. In other words, instead of trying to show that treatment X is superior to treatment Y (or placebo, if it’s a placebo controlled trial), which is the usual course of action, they try to show that treatment X isn’t worse than treatment Y, at least not by a pre-determined margin. These trials are only used when it is ethically difficult to conduct a regular trial, and have many weaknesses, which are detailed here. Funnily enough, this critical appraisal of non-inferiority trials is not cited in Adler et al’s paper. Whoops.

So what is the problem? Well, usually, you take a sample of people, randomise them into two groups, give one group the treatment you’re testing, give the other group the comparative treatment, get the results, and determine whether the results you got could have occurred by chance, or whether they are extreme enough to conclude that your treatment had a greater effect. In this type of study, a superiority trial, two types of error can be made:

you conclude that there is a difference when in fact there isn’t (a type-I error, or false positive)

you conclude that the two treatments are the same when in fact there is a difference, but your study contained too few subjects to actually detect it (a type-II error, or false negative)

However, since this is a non-inferiority trial, rather than a regular superiority trial, everything is reversed, because a “successful” trial is one where no significant difference is detected. So here, a type-I error will lead to a falsely negative conclusion, and a type-II error will lead to a falsely positive one.
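To make that reversal concrete, here’s a minimal simulation sketch. The numbers in it are my own assumptions for illustration, not figures from the paper: 45 patients per arm, a true difference of 3 MADRS points, and an SD of 10.

```python
import math
import random
import statistics

random.seed(42)

def ci_excludes_zero(a, b, z=1.96):
    """Approximate two-sample test: True if the 95% CI for the
    difference in means excludes zero (a 'significant' result)."""
    diff = statistics.mean(b) - statistics.mean(a)
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return abs(diff) > z * se

def error_rate(true_diff, n_per_arm=45, sd=10.0, sims=4000):
    """Type-I rate when true_diff == 0; type-II rate otherwise.
    n_per_arm, sd and true_diff are illustrative assumptions."""
    errors = 0
    for _ in range(sims):
        a = [random.gauss(0.0, sd) for _ in range(n_per_arm)]
        b = [random.gauss(true_diff, sd) for _ in range(n_per_arm)]
        sig = ci_excludes_zero(a, b)
        if true_diff == 0:
            errors += sig        # false positive
        else:
            errors += not sig    # false negative
    return errors / sims

print(f"type-I  (false positive) rate: {error_rate(0.0):.0%}")
print(f"type-II (false negative) rate: {error_rate(3.0):.0%}")
```

Under these assumptions the false-negative rate comes out around 70%, against roughly 5% for false positives. In other words, with a sample this small, failing to find a difference — which is exactly what “success” means in a non-inferiority trial — is the most likely outcome even when a real difference exists.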

So what’s happened here? Well, the main problem is that only 91 patients are included in the study. That’s a tiny number (though David Tredinnick appears to think otherwise). If this were a regular superiority trial, we would say that the study is underpowered: even if there is a real difference between the two groups, the error margins that you put around the statistics you get from the trial (more formally known as confidence intervals, or CIs) are so wide that it’s highly likely they will overlap, and you don’t get a significant difference. That same principle holds here in the non-inferiority trial, only this time the trial is erroneously deemed to be “successful” rather than unsuccessful.
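A rough back-of-envelope sketch of how slowly those error margins shrink (the SD of 10 MADRS points is again an assumed figure, purely for illustration):

```python
import math

def ci_half_width(n_per_arm, sigma, z=1.96):
    """Half-width of an approximate 95% CI for the difference
    between two group means, assuming equal arms and common SD."""
    return z * sigma * math.sqrt(2 / n_per_arm)

# sigma = 10 is an assumed, illustrative SD for MADRS change scores
for n in (45, 100, 400):
    print(f"n = {n:3d} per arm -> CI half-width ~ "
          f"{ci_half_width(n, 10.0):.2f} points")
```

The width only shrinks with the square root of the sample size, so with roughly 45 per arm the interval spans about ±4 points either side of the observed difference — comparable to the mean differences the abstract quotes — and overlap with zero is almost guaranteed.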

The numbers quoted in the abstract should cause alarm bells to ring. Here are the figures:

Non-inferiority of homeopathy was indicated because the upper limit of the confidence interval (CI) for mean difference in MADRS [the scale used in this study for measuring depression] change was less than the non-inferiority margin: mean differences (homeopathy–fluoxetine) were –3.04 (95% CI –6.95, 0.86) and –2.4 (95% CI –6.05, 0.77) at 4th and 8th week, respectively.

OK, so for there to be no “significant” difference, these confidence intervals should include zero (indicating no difference), which they do. But only just. Statistically speaking, the results are very much borderline, and given the tiny number of people involved in the study, it really is a leap of faith to conclude strongly that homeopathy is not inferior to fluoxetine. In fact, the mean differences appear to be quite large; with no discussion of the plausible range of MADRS scores, or what sort of difference in score constitutes an “improvement”, it’s very hard to tell. But if your error margins are as wide as the expected improvement, then of course you’re going to conclude that there’s no difference, simply because your error margins are too large to detect any.
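To spell the arithmetic out, here are the quoted figures run through the non-inferiority logic. Note that the abstract as quoted doesn’t state the margin itself, so the value below is a placeholder assumption purely to show how the check works:

```python
# Mean differences (homeopathy - fluoxetine) and 95% CIs from the abstract
results = {
    "week 4": (-3.04, -6.95, 0.86),
    "week 8": (-2.40, -6.05, 0.77),
}
MARGIN = 1.5  # placeholder -- the quoted abstract does not state the margin

for week, (diff, lo, hi) in results.items():
    print(f"{week}: diff = {diff}, 95% CI = ({lo}, {hi})")
    print(f"  CI includes zero (no significant difference): {lo < 0 < hi}")
    print(f"  upper limit below assumed margin of {MARGIN}:  {hi < MARGIN}")
```

Both intervals include zero only just (upper limits of 0.86 and 0.77), and both stretch nearly 7 points in fluoxetine’s favour at the lower end — exactly the situation where the error margins are as wide as any plausible improvement.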

In summary then, the article reads as follows: “we took a handful of people, gave one lot the standard drug, gave the other lot magic water, the standard drug seemed to work marginally better but because we took so few people, we can’t tell whether there’s any difference between the drug and the magic water, therefore the magic water is not inferior to the drug.” No shit.

5 comments to A quick lesson on type-II errors (false negatives)

On a quick scan of it, it looked sort of OK to me (other than n=91 and I wasn’t convinced by the pharmacist blinding), but it’s great to see someone with the knowledge and expertise demolish this nonsense.

Good demolition Michael and completely agree with you but one quick question:

Despite the lack of statistical significance of this study, is it possible that it hints at a truth, which is that both homeopathy and SSRIs are completely useless, or at least no more effective than a placebo?

Would be interested in your analysis of this article in the Times on SSRIs and Irving Kirsch (registration required) – is the evidence really so clear cut as it suggests?

I can only speak as a statistician here – someone with more expertise in the clinical arena can elaborate further on that side of things. I’m really not familiar with the wider literature on SSRIs, and I don’t want this discussion to get sidetracked into a debate about their usefulness or otherwise.

But, based on the statistical evidence presented in the paper I’ve commented on, I’d say it’s possible, but right now the more plausible explanation is that the study was underpowered, and to try to draw conclusions about SSRIs from this paper alone would be a mistake. If SSRIs aren’t particularly effective, then yes, that would increase the tendency to an erroneous conclusion about the “non-inferiority” of homeopathy as well; it would suggest that it’s “not inferior to a drug which isn’t all that great in the first place”. But to me there’s nothing in these results that suggests that fluoxetine is “completely useless”. There isn’t some magical p-value at which a drug trial switches from being “completely useless” to “completely effective”, though many papers are written as though that’s the case. I have discussed this point in another post: see http://www.nontoxic.org.uk/?p=128. Without any discussion as to what kind of improvement is clinically relevant (i.e. how much should the MADRS score improve by to constitute a clinical improvement) it’s very hard to say.

I’m afraid that I haven’t read the Times article, and because I’m not a fan on principle of the silly paywall they’ve erected around their archives, I’m not willing to register just so as I can read it. If you can find a version that isn’t paywalled then I’m happy to take a look, but I doubt that I can add anything relevant to that particular discussion because I’m a statistician and not a pharmacologist.