replication revisited

One of the
traits of a cogent argument is that the evidence be sufficient to warrant
accepting the conclusion. In causal arguments, this generally
requires--among other things--that a finding of a significant correlation
between two variables, such as magnets and pain, be reproducible.
Replication of a significant correlation usually indicates that the finding
was not a fluke or due to methodological error. Yet, I am often sent copies
of articles regarding single studies and advised that it may be about time
for me to change my mind on some subject. For example, I recently heard from
Jouni Helminen that "It may be time to update the Skepdic website regarding
magnet therapy on fibromyalgia patients." Jouni referred me to an
article from the University of Virginia News. I state in my entry on
magnet therapy: "There is
almost no scientific evidence supporting magnet therapy." The article about
a study done on magnet therapy to reduce fibromyalgia pain did nothing to
change my mind. The study, conducted by University of Virginia (UV)
researchers, was published in the Journal of Alternative
and Complementary Medicine, which asserts that it "includes
observational and analytical reports on treatments outside the realm of
allopathic medicine...."

The only people who refer to conventional medicine as
allopathic are rabid opponents of conventional medicine, and
they may not be the most objective folks
in the world when it comes to evaluating anything "alternative." Be that as
it may, the study must stand or fall on its own merits, not on the biases of
those who publish it. Furthermore, the study must be distinguished from the
press release put out by UV. The headline of the UV article states that
"Magnet Therapy Shows Limited Potential for Pain Relief." The first
paragraph states that "the results of the study were inconclusive." Not very
promising. Even so, the researchers claimed that magnet therapy reduced
fibromyalgia pain intensity enough in one group of study participants to be
"clinically meaningful." I guess "limited potential" is the middle ground
between "inconclusive" and "clinically meaningful." This is somewhat
confusing.

The UV study involved 94 fibromyalgia patients who
were randomly assigned to one of four groups. One control group "received
sham pads containing magnets that had been demagnetized through heat
processing" and the other got nothing special. One treatment group got
"whole-body exposure to a low, uniformly static magnetic field of negative
polarity. The other...[got]...a low static magnetic field that varied
spatially and in polarity. The subjects were treated and tracked for six
months."

"Three measures of pain were used: functional status
reported by study participants on a standardized fibromyalgia questionnaire
used nationwide, number of tender points on the body, and pain intensity
ratings."

One of the investigators, Ann Gill Taylor, R.N.,
Ed.D., stated: "When we compared the groups, we did not find significant
statistical differences in most of the outcome measures." Taylor is a
professor of nursing and director of the Center for Study of Complementary
and Alternative Therapies at UV. "However, we did find a statistically
significant difference in pain intensity reduction for one of the active
magnet pad groups," said Taylor. The article doesn't mention how many
outcome measures were used.

The study's principal
investigator was Dr. Alan P. Alfano, assistant professor of physical
medicine and rehabilitation and medical director of the UV HealthSouth
Rehabilitation Hospital. Alfano claimed that "Finding any positive results
in the groups using the magnets was surprising, given how little we know
about how magnets work to reduce pain." Frankly, I find it surprising that
Alfano finds that surprising, since it is unlikely he would have conducted
the study if he didn't think there might be some pain relief benefit to
using magnets. His statement assumes they work to reduce pain and the task
is to figure out how. Alfano is also quoted as saying that "The results tell
us maybe this therapy works, and that maybe more research is justified. You
can't draw final conclusions from only one study." Certainly, his last claim
is correct. His double use of the weasel word "maybe" indicates that he
realizes that you can't even make a strong claim that more research ought to
be done based on the results of one study, especially if the results aren't
that impressive.

Not knowing how many outcome measures the
researchers used makes it difficult to assess the significance of finding
one or two outcomes that look promising. Given all the variables that go
into "pain" and measuring pain, and the variations in the individuals
suffering pain (even those diagnosed as having the same disorder), it should
be expected that if you measure enough outcomes you are going to find
something statistically significant. Whether that's meaningful or not is
another issue. A competent researcher would not want to make any strong
causal claims about magnets and pain on the basis of finding one or two
statistically significant outcomes in a study that found that most outcomes
showed nothing significant.
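
The point about measuring enough outcomes can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of the UV study: the number of outcome measures is hypothetical (the article does not report it), the outcomes are treated as independent, and each one is reduced to a z-score that is pure noise, so there is no real treatment effect at all. The function name chance_of_spurious_finding is made up for this illustration.

```python
import random


def chance_of_spurious_finding(num_outcomes, trials=5000, seed=0):
    """Estimate how often a study with NO real effect still reports at
    least one 'statistically significant' outcome (two-sided, 5% level).

    Assumptions (not from the article): outcomes are independent and each
    test statistic is a standard normal z-score under the null hypothesis.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    studies_with_a_hit = 0
    for _ in range(trials):
        # One simulated study: draw a noise-only z-score for each outcome
        # and check whether any of them crosses the usual 1.96 cutoff.
        if any(abs(rng.gauss(0.0, 1.0)) > 1.96 for _ in range(num_outcomes)):
            studies_with_a_hit += 1
    return studies_with_a_hit / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20):  # hypothetical numbers of outcome measures
        print(k, "outcomes:", round(chance_of_spurious_finding(k), 2))
```

With a single outcome the simulated false-alarm rate stays near 5%, but it climbs steeply as outcomes are added, which is why one or two significant results among many measures carry little weight on their own.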

But even if most of the outcomes
had been statistically significant in this study of 94 patients, that still
would not amount to strong scientific evidence in support of magnet therapy.
The experiment would need to be replicated. Given the variables mentioned
above, it would not be surprising if this study were replicated but found
different outcomes statistically significant. Several studies might find
several different outcomes statistically significant and some researcher
might then do a meta-study and claim that when one takes all the studies
together one gets one large study with very significant results. What you
would actually get is one misleading study.

If other
researchers repeat the UV study, looking only at the outcome that was
statistically significant in the original study, and they duplicate the
results of the UV study, then we should conclude that this looks promising.
But one replication shouldn't seal the deal on the causal connection between
magnets and pain relief. One lab might duplicate another lab's results but
both might be using faulty equipment manufactured by the same company. Or both
might be using the same faulty subjective measures to evaluate their data.
Several studies that showed nothing significant for magnets and pain might
be followed by several that find significant results, even if all the
studies are methodologically sound. Why? Because you are dealing with human
beings, very complex organisms who won't necessarily react the same way to
the same treatment. Even the same person won't necessarily react the same
way to the same treatment at different times.

So, a single
study on something like magnets and pain relief should rarely be taken by
anybody as significant scientific evidence of a causal connection between
the two. Likewise, a single study of this issue that finds nothing
significant should not be taken as proof that magnets are useless. However,
when dozens of studies find little support that magnets are effective in
warding off pain, then it seems reasonable to conclude that there is no good
reason to believe in magnet therapy. And I would not give up that belief on
the basis of what I read in the UV press release about their little study on
magnets and fibromyalgia.

reader comments

The University of Virginia study sounds
fairly typical of Alternative Medicine studies. Take a large number of
indicators that the treatment could be effective. After the study, hunt
around for one or two indicators that show effectiveness. Under the "normal"
curve, you can get a 2-standard-deviation effect by chance about 5% of the
time. So if you have 20 indicators, you can expect about one of them
to "show" that the treatment is effective by chance alone. If you have 40
indicators, you can expect about two. This procedure is
called "data dredging," and is a definite no-no. Instead, the proper
scientific procedure is to use all the data, not just those that support
your pet hypothesis. "Inconclusive" is often a euphemism for "didn't
work."
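
The arithmetic behind the 20-indicator example can be worked out directly. The sketch below assumes the indicators are statistically independent and that each has a 5% chance of a false positive. Both are simplifying assumptions, and the function names are invented for this illustration.

```python
def expected_false_positives(num_indicators, alpha=0.05):
    """Average number of indicators that look significant by chance alone."""
    return num_indicators * alpha


def prob_at_least_one(num_indicators, alpha=0.05):
    """Chance that at least one indicator looks significant by chance,
    assuming the indicators are independent of one another."""
    return 1.0 - (1.0 - alpha) ** num_indicators


if __name__ == "__main__":
    for n in (20, 40):
        print(
            n, "indicators:",
            "expected false positives =", round(expected_false_positives(n), 2),
            "| P(at least one) =", round(prob_at_least_one(n), 2),
        )
```

Under these assumptions, 20 indicators give an expected one false positive and roughly a 64% chance of at least one. With 40 indicators, the figures are two and roughly 87%.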