This is the third post in this series*; please see Part II for a review. Part II offered several arguments against the assertion that it is a good idea to perform efficacy trials of medical claims that have been refuted by basic science or by other, pre-trial evidence. This post will add to those arguments, continuing to identify the inadequacies of the tools of Evidence-Based Medicine (EBM) as applied to such claims.

Prof. Simon Replies

Prior to the posting of Part II, statistician Steve Simon, whose views had been the impetus for this series, posted another article on his blog, responding to Part I of this series. He agreed with some of what both Dr. Gorski and I had written:

The blog post by Dr. Atwood points out a critical distinction between “biologically implausible” and “no known mechanism of action” and I must concede this point. There are certain therapies in CAM that take the claim of biological plausibility to an extreme. It’s not as if those therapies are just implausible. It is that those therapies must posit a mechanism that “would necessarily violate scientific principles that rest on far more solid ground than any number of equivocal, bias-and-error-prone clinical trials could hope to overturn.” Examples of such therapies are homeopathy, energy medicine, chiropractic subluxations, craniosacral rhythms, and coffee enemas.

The Science Based Medicine site would argue that randomized trials for these therapies are never justified. And it bothers Dr. Atwood when a systematic review from the Cochrane Collaboration states that no conclusions can be drawn about homeopathy as a treatment for asthma because of a lack of evidence from well conducted clinical trials. There’s plenty of evidence from basic physics and chemistry that can allow you to draw strong conclusions about whether homeopathy is an effective treatment for asthma. So the Cochrane Collaboration is ignoring this evidence, and worse still, is implicitly (and sometimes explicitly) calling for more research in this area.

On the other hand:

There are a host of issues worth discussing here, but let me limit myself for now to one very basic issue. Is any research justified for a therapy like homeopathy when basic physics and chemistry will provide more than enough evidence by itself to suggest that such research is futile(?) Worse still, the randomized trial is subject to numerous biases that can lead to erroneous conclusions.

I disagree for a variety of reasons.

Prof. Simon offered five reasons, quoted here in part:

It’s good for business. I don’t want to sound shallow, but there’s money to be made by statisticians when research is done, and if I make a few bucks and in the process help to make the research more rigorous, that’s a win-win situation. I’ve not done much work with CAM, but I have helped on several projects at Cleveland Chiropractic College…

Everyone deserves their day in court. I believe that if someone is sincere in testing whether a therapy is effective or not, then they deserve my help…

CAM therapies represent an enormous expenditure of limited health care dollars, and if research can help limit the fraction of CAM expenditures that are inappropriate then that represents a good use of scarce research dollars…

We have to trust that the system can work. Randomized trials are indeed subject to many biases, and it is worth noting them. But are the biases so serious that they will lead to incorrect conclusions about CAM? [etc.]

Scientific testing is the norm for other claims that lack scientific plausibility. I am a regular (reader) of Skeptical Inquirer and Skeptic Magazine, and when someone makes a claim about ghosts, telekinesis, or reincarnation, they’ll point out all the existing knowledge that makes such claims unbelievable. But then they’ll still go to the haunted house or set up a spoon bending experiment or reinterview people who remember past lives. These claims have even less credibility than much of CAM research, but they are still being tested. So why not test CAM the same way?

I take the “I don’t want to sound shallow” remark at face value, although I’d remind Prof. Simon that without the ‘subluxation,’ whose fatal implausibility he appears to have conceded, chiropractic is left with very little. Reasons 1, 2, and 4 amount to the same assertions: that efficacy trials (“testing whether a therapy is effective or not”) of futile methods have something worthwhile to add to the already compelling evidence against those methods; that they will be performed safely and ethically; that they will dependably show that the methods are ineffective beyond ‘placebo effects’ (we’ve already agreed on this, no?); and that EBM referees, such as those at Cochrane, will subsequently judge such methods futile.

I’ve previously offered several counterexamples to those assertions, including the Cochrane homeopathy reviews quoted in Part I, and the Cochrane “Touch Therapies” review linked from Part II. I’ve also offered examples of methods that are not quite as implausible but are dangerous and have been sufficiently refuted by other means, including biology and even clinical tests, but that EBM experts have deemed worthy of further testing: Laetrile (discussed in Part I), Na2EDTA “chelation therapy” for atherosclerotic cardiovascular disease, and the “Gonzalez Regimen” for cancer of the pancreas (each discussed in Part II).

Can “Research Help Limit CAM Expenditures”?

I’ll discuss such issues more below, including a response to Prof. Simon’s point 5, but first let me briefly address his point 3. The assertion that there is societal value to studying implausible methods has been the usual justification for such trials, as I discussed at some length in Part II. It began as an untested presumption in itself, and it does not excuse endangering experimental subjects or siphoning scarce public funds away from promising research. Moreover, it would be, at best, redundant: if other facts are sufficient to refute a claim, there is no point in subjecting the claim to a trial. If people don’t understand that point, then it is the job of experts to explain it to them, not to devalue science by granting every preposterous notion a “day in court” that it has already had, or by issuing preposterous opinions such as “it is not possible to comment on the use of homeopathy in treating dementia.”

Regarding the presumption that “research can help limit the fraction of CAM expenditures that are inappropriate,” the evidence, such as it is, suggests otherwise. In the 1980s, Petr Skrabanek could accurately report that “numerous controlled trials have shown that acupuncture is nothing more than a placebo.” Yet even as additional, abundant, increasingly rigorous trials have relentlessly shown the same thing, acupuncture has steadily increased in popularity. The same is true for homeopathy.

Referring to slightly more plausible methods, Josephine Briggs, the Director of the NCCAM, reported that sales of echinacea, glucosamine-chondroitin sulfate, and ginkgo biloba had declined after disconfirming trials funded by her Center, but according to Steve Novella the decline was only temporary for echinacea. Perhaps some industrious reader can find data for the other two preparations; I don’t feel like shelling out $200+.

Such Trials Don’t Work

The final reason that efficacy trials of highly implausible claims are a bad idea is that they don’t work very well: they tend to yield, in the aggregate, equivocal rather than merely disconfirming results. Yes, the biases are so serious that they have led to incorrect conclusions about CAM, at least for a substantial period. This is something that most physicians and even many statisticians seem unaware of, although it was utterly predictable. I’ve discussed this at length, beginning here:

EBM and “CAM”

To many in this era of EBM it seems self-evident that all unproven methods, including homeopathy, should be subjected to such scrutiny. After all, the anecdotal impressions that are typically the bases for such claims are laden with the very biases that blinded RCTs were devised to overcome. This opinion, however, is naive. Some claims are so implausible that clinical trials tend to confuse, rather than clarify the issue. Human trials are messy. It is impossible to make them rigorous in ways that are comparable to laboratory experiments. Compared to laboratory investigations, clinical trials are necessarily less powered and more prone to numerous other sources of error: biases, whether conscious or not, causing or resulting from non-comparable experimental and control groups, cuing of subjects, post-hoc analyses, multiple testing artifacts, unrecognized confounding of data due to subjects’ own motivations, non-publication of results, inappropriate statistical analyses, conclusions that don’t follow from the data, inappropriate pooling of non-significant data from several, small studies to produce an aggregate that appears statistically significant, fraud, and more.

Most of those problems are not apparent in primary reports. Several have already been discussed or referenced elsewhere on this site: here, here, here and here, for example. Academics active in the EBM movement are aware of most of them and want to correct them, as a quick scan of the contents of almost any major medical journal will reveal.

It is clear that such biases are more likely to skew the results of studies that are funded or performed by advocates. This has been found in studies of trials funded by drug companies, for example, as referenced here. In the case of “CAM,” the charge is supported by the preponderance of favorable reports in advocacy journals (here, here, and here) and by examples of overwhelmingly favorable reports emanating from regions with strong political motivations.

For those reasons we can predict that RCTs of ineffective claims championed by impassioned advocates will demonstrate several characteristics. Small studies, those performed by advocates or reported in advocacy journals, and those judged to be of poor quality will tend to be “positive.” The larger the study and the better the design, the more likely it is to be “negative.” Over time, early “positive” trials and reviews will give way to negative ones, at least among those judged to be of high quality and reported in reputable journals. In the aggregate, trials of ineffective claims championed by impassioned advocates will appear to yield equivocal rather than merely “negative” outcomes. The inevitable, continual citations of dubious reports will lead some to judge that the aggregate data are “weakly positive” or that the treatment is “better than placebo.” An example is the claim that stimulation of the “pericardium 6” acupuncture point is effective in the prevention and treatment of post-operative nausea and vomiting—a purportedly proven “CAM” method.
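The pattern predicted above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not an analysis of any real literature: the treatment has no true effect, outcomes are normally distributed with unit variance, a trial counts as “positive” when its one-sided z statistic exceeds 1.645, and the hypothetical parameter `publish_null_prob` stands in for the chance that a non-positive trial is reported at all. The trial sizes and probabilities are invented for illustration.

```python
import random
import statistics


def simulate_trial(n_per_arm, rng):
    """Simulate one two-arm trial of an ineffective treatment.

    Both arms draw from the same distribution, so the true effect is zero.
    Returns (observed mean difference, z statistic).
    """
    treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (2.0 / n_per_arm) ** 0.5  # standard error with known unit variance
    return diff, diff / se


def published_effects(n_per_arm, n_trials, rng, publish_null_prob):
    """Apply a hypothetical publication filter (an assumption of this sketch).

    "Positive" trials are always reported; all others are reported only
    with probability publish_null_prob.
    """
    effects = []
    for _ in range(n_trials):
        diff, z = simulate_trial(n_per_arm, rng)
        if z > 1.645 or rng.random() < publish_null_prob:
            effects.append(diff)
    return effects


rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed: small trials are filtered heavily, large trials are mostly reported.
small = published_effects(20, 2000, rng, publish_null_prob=0.2)
large = published_effects(200, 2000, rng, publish_null_prob=0.9)

small_mean = statistics.fmean(small)
large_mean = statistics.fmean(large)

print(f"published small trials: {len(small)}, mean effect {small_mean:.3f}")
print(f"published large trials: {len(large)}, mean effect {large_mean:.3f}")
```

Under these assumptions the published small trials average a visibly positive effect although none exists, while the large, mostly reported trials average close to zero. Pooling the two sets without regard to size or quality would give the “weakly positive” impression described above.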

Homeopathic “Remedies” are Placebos

After 200 years and numerous studies, including many randomized, controlled trials (RCTs) and several meta-analyses and systematic reviews, homeopathy has performed exactly as described above. The best that proponents can offer is equivocal evidence of a weak effect compared to placebo. That is exactly what is expected if homeopathy is placebo.

Nevertheless, EBM advocates on the whole don’t see it that way. Those who want to see homeopathy vindicated, such as homeopath Wayne Jonas, the former director of the NIH Office of Alternative Medicine, point to the weakly positive evidence. Others, even those who find homeopathy implausible, are so convinced that EBM can answer the question (“Either homeopathy works or controlled trials don’t!”) that they call for more trials, with no end in sight. Such judgments expose a major weakness in EBM that is not apparent when the exercise is applied to plausible claims.

“CAM” Research and Parapsychology

That passage was a prelude to introducing the EBM “Levels of Evidence” scheme and the Cochrane abstracts later discussed in Part I of this series. It applies equally well to acupuncture, “energy medicine,” and other highly implausible claims that have been subjected to efficacy trials.

Here we come to Prof. Simon’s point 5, that “scientific testing is the norm for other claims that lack scientific plausibility,” such as ghosts and telekinesis. It is true that “psychic detectives” such as Randi and Joe Nickell and Ray Hyman and Richard Wiseman have tested such claims and continue to do so, but Prof. Simon ought to understand the differences between such tests and what’s at issue here. The former tend to be of the sort that I favored in Part II: simple (bias-resistant), inexpensive, performed by skeptics, with the onus of proof placed on the claimants. Such testing, moreover, is fun, which I believe is the main reason that Randi and others are drawn to it.

Typical “CAM” efficacy trials are altogether different, as I began to explain in Part II: they are expensive, messy, and bias-prone, and those who perform them are often enthusiasts or otherwise credulous. Such trials are akin to tests of telekinesis performed not by Randi or Wiseman, but by hopeful or true-believing parapsychologists—and that is exactly what is reflected in EBM-style reviews of their outcomes. Ironically, some “CAM” efficacy trials really are tests of telekinesis performed by true-believing parapsychologists, and much of “CAM” is nothing more than recycled psi claims now pitched to a naïve audience, as discussed here under “The Psi Myth.”

Thus homeopath David Reilly was correct, as I wrote here, when he asserted that “either homeopathy works or controlled trials don’t”:

…but not in the way that he supposed. If there is anything that the history of parapsychology can teach the biomedical world, it is the point just made: that human RCTs, as good as they are at minimizing bias or chance deviations from population parameters, cannot ever be expected to provide, by themselves, objective measures of truth. There is still ample room for erroneous conclusions. Without using broader knowledge (science) to guide our thinking, we will plunge headlong into a thicket of errors—exactly as happened in parapsychology for decades and is now being repeated by its offspring, “CAM” research.

Yes, “CAM” owes much to parapsychology, none of it good. Prof. Simon, a statistician, might consider that parapsychology has flirted with the barely positive side of the null effect for decades. Its apparent successes, modest and irreproducible though they’ve been, have rendered it an immortal field of fruitless inquiry: a pathological science.

History of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a “null field,” one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent that observed findings deviate from what is expected by chance alone would be simply a pure measure of the prevailing bias.
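The “null field” idea in the passage above can be made concrete with a small calculation. The sketch below assumes a field in which every true effect is zero and each trial is tested at a two-sided threshold of 0.05. With one pre-specified outcome, about 5% of trials should look “significant” by chance. If each trial instead measures several outcomes and reports whichever one crosses the threshold, the share of “significant” trials rises, and the excess over 5% is a rough measure of that bias. The number of outcomes (10) and their independence are assumptions chosen only for illustration.

```python
import random


def fraction_significant(n_trials, n_outcomes, rng, z_crit=1.96):
    """Fraction of null trials with at least one "significant" outcome.

    Each outcome's z statistic is drawn from N(0, 1), so the true effect
    is zero. Outcomes are treated as independent (a simplification).
    """
    hits = 0
    for _ in range(n_trials):
        if any(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(n_outcomes)):
            hits += 1
    return hits / n_trials


rng = random.Random(1)  # fixed seed so the run is repeatable
alpha = 0.05

# One pre-specified outcome per trial: chance alone.
unbiased = fraction_significant(20000, 1, rng)

# Ten outcomes per trial, any "hit" counted (hypothetical flexible analysis).
flexible = fraction_significant(20000, 10, rng)

# Expected share by chance when any of 10 independent outcomes may count.
expected_flexible = 1 - (1 - alpha) ** 10

# Excess over the nominal rate: attributable to the analysis, not the treatment.
excess = flexible - alpha

print(f"single outcome:  {unbiased:.3f} of trials 'significant'")
print(f"ten outcomes:    {flexible:.3f} (expected about {expected_flexible:.3f})")
print(f"excess over the nominal 5%: {excess:.3f}")
```

In this toy setting the treatment does nothing in either case. The whole difference between the two fractions comes from how the trials were analyzed, which is the sense in which deviations from chance in a null field measure bias.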

EBM, Eventually, Sort of Works

For fairness’ sake, let me mention that two veteran “CAM” researchers, Edzard Ernst and R. Barker Bausell (a statistician), eventually decided that most of what they had studied was bogus, and they seem to have arrived at this realization after examining the results of efficacy trials. Thus it is true that EBM, as it is currently practiced, can lead some researchers to rational conclusions.

The problem is far from solved, however: in the cases of the two researchers just mentioned, it took years for them to arrive at a truth that was always staring them in the face (as I discussed in Part I and elsewhere), during which time the waste, the unethical treatment of human subjects, and the false promise that is “CAM” research marched on. Because of the continued, inevitable, equivocal results of such research, moreover, the views of Ernst and Bausell are not shared by other “CAM” research enthusiasts, and probably won’t be anytime soon.

EBM Ignores External Evidence, but not Entirely: a Prelude to Part IV

To reiterate, the major problem with EBM, as it has been applied to implausible medical claims, is that it fails to give adequate weight to evidence from sources other than RCTs. Yet RCTs involving numerous experimental variables and outcomes, as have been typical for “CAM” efficacy trials, are prone to numerous errors and biases, whereas other sources of evidence can be definitive, as has been the case for homeopathy, Laetrile, Therapeutic Touch, chelation for atherosclerosis, and Craniosacral Therapy, for example.

For the first time in several years, motivated by this series, I’ve looked at a few complete Cochrane “CAM” Reviews. In the final posting I’ll discuss more external evidence that is missing from those reviews, but I’ll also report a couple of pleasant surprises. It turns out that Steve Simon was not entirely wrong when he asserted that “people within EBM (are) working both formally and informally to replace the rigid hierarchy with something that places each research study in context.” They have a long way to go, but there is at least a suggestion of change in that direction.

I’ll also address, I hope briefly, this statement by Prof. Simon:

Also how can we invoke scientific plausibility in a world where intelligent people differ strongly on what is plausible and what is not? Finally, is there a legitimate Bayesian way to incorporate information about scientific plausibility into a Cochrane Collaboration systematic overview(?)