Tuller and Rehmeyer, Trial By Error, Continued: Did PACE Really Prove that GET Is Safe?

One of the most important and controversial claims from the PACE Trial was that graded exercise therapy is safe for patients with chronic fatigue syndrome (or ME/CFS, as U.S. government agencies now call it).

“If this treatment is done by skilled people in an appropriate way, it actually is safe and can stand a very good chance of benefiting [patients],” Michael Sharpe, one of the principal PACE investigators, told National Public Radio in 2011, shortly after The Lancet published the first results.

But to many in the ME/CFS community, this safety claim goes against the very essence of the disease. The hallmark of chronic fatigue syndrome, despite the name, is not actually fatigue but the body’s inability to tolerate too much exertion — a phenomenon that has been documented in exercise studies. All other symptoms, like sleep disorders, cognitive impairment, blood pressure regulation problems, and muscle pain, are exacerbated by physical or mental activity. An Institute of Medicine report this year even recommended that the illness be renamed to emphasize this central problem, choosing the name “systemic exertion intolerance disease,” or SEID.

Second, after the trial began, the researchers tightened their definition of harms, just as they had relaxed their methods of assessing improvement. In the protocol, for example, a steep drop in physical function since the previous assessment, or a slight decline in reported overall health, both qualified as a “serious deterioration.” However, as reported in The Lancet, the steep drop in physical function had to be sustained across two out of the trial’s three assessments rather than just since the previous one. And reported overall health had to be “much worse” or “very much worse,” not just slightly worse. The researchers also changed their protocol definition of a “serious adverse reaction,” making it more stringent.


The assessments were at 12, 24 and 52 weeks.

Therefore a steep drop in physical functioning that first appeared between the 24- and 52-week assessments was simply ignored, since it could not be sustained across two of the three assessments.

I find this worrying because I think it is likely that the amount of harm inflicted by a GET program is cumulative and would be highest at the end.

Furthermore, a doctor needed to determine that the event was directly caused by the treatment—a decision that was made after the doctor was told which arm of the trial the patient was in.


Considering that therapists are told GET is safe because patients have no underlying pathology, this isn't reassuring.

I threw in a comment about the absolutely absurd idea of an "objective" study that fails to measure what changes as a result of the trial.

The business about failing to measure total activity is not simply a minor issue concerning measures somebody might dream up. I feel like saying: Hello? This is a chronic disease characterized by fatigue. Fatigue leads to reduced activity, and this is not normal fatigue, which resolves within hours or days. Do you know if there was any change in the total activity required to participate in treatment? Do you know if there was a difference in activity between the different arms, so we could estimate the effort put into GET? Do you know if there was a change, positive or negative, in total activity following treatment? If you don't know these things, you don't have the claimed objective results; all you have are opinions -- the same ones you had going into the trial.

Every one of us knows exactly what I mean by displacing activity in order to participate in some desired activity. We know about the problem of offloading activities necessary for survival onto others. The researchers appear to be clueless. It is quite possible that patients participating in GET displaced or offloaded activity in order to participate. This could even result in a complete lack of change in total overall activity during GET. In that case it is hard to claim you tested anything except the patience of patients.

The objective evidence that CBT/GET is safe amounts to this: 641 patients entered the trial; one withdrew consent; 640 patients survived the trial. By this standard you could show that sentencing patients to prison was safe and effective treatment.


I agree with all of your post, with one minor quibble: did all 640 finish the GET arm?

Fantastic. Many thanks to Tom Kindlon for the huge amount of work he has put in on this despite his own health problems.


Yes. And great to have Tuller and Rehmeyer covering these concerns so effectively. It is difficult to explain these matters in an easy-to-understand way, and we're lucky to have two talents on the job (and now working together).

Three of them did, from Radboud University in Nijmegen, Netherlands. After CBT/GET, patients said they felt less fatigued or disabled, but actometers showed no increase in activity. That actometer data was omitted from the original studies and was instead published years later in a single follow-up paper.

At the very least it proves that subjective questionnaires cannot be used to reach conclusions about activity levels and symptoms following months of brainwashing targeting the perception of both. Though that was already obvious to anyone with a modicum of sense.

In case anyone cares about subtle reasoning, I'm going to add something that convinced me this bunch was not to be even minimally trusted. There was the problem with the six-minute walk test: fully a third of the patients counted as participants in the study did not complete it both before and after treatment. This naturally led me to wonder how they dealt with missing data.

After considerable effort I decided this involved an unstated assumption that no one got worse after therapy, which was itself a claimed result of the study. When I made fairly modest assumptions of a different kind, it was easy to reduce even the limited and clinically insignificant gains in the GET arm to nothing. (If you use Cohen's d, you can accomplish the same thing in a more academically acceptable quantitative analysis.) After this I was far from surprised when the step-test data revealed no change in physical condition.
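To see how sensitive a group result is to assumptions about missing data, here is a minimal sketch in Python. All numbers are invented for illustration (these are not PACE figures): a third of an arm never completed the walk test both before and after treatment, and the apparent mean gain depends almost entirely on what we assume those patients would have scored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Entirely invented numbers, not PACE data: 150 patients in an arm,
# of whom a third never completed the walk test both before and after.
n_total, n_missing = 150, 50
observed_gain = rng.normal(35, 60, n_total - n_missing)  # metres gained

# Assumption A (flattering): missing patients did as well, on average,
# as those who completed the test.
gain_a = np.concatenate([observed_gain,
                         np.full(n_missing, observed_gain.mean())])

# Assumption B (modest, different): missing patients declined slightly,
# e.g. because they were too unwell to repeat the test.
gain_b = np.concatenate([observed_gain, np.full(n_missing, -30.0)])

print(f"mean gain under assumption A: {gain_a.mean():+.1f} m")
print(f"mean gain under assumption B: {gain_b.mean():+.1f} m")
```

Nothing about the observed patients changes between the two runs; only the unstated assumption about the missing third does, and it is enough to shrink the apparent group gain substantially.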

Counting patients who did not provide objective performance data before and after treatment as participants made it clear these researchers didn't give a damn about having valid objective measures; they only wanted the label. Their later attacks on the validity of the walk-test data showing poor performance make this even clearer. Having actometers and failing to use them to judge changes in the total activity required to participate in therapy, or to measure alleged gains or possible setbacks resulting from therapy, is part and parcel of the same attitude.

If it were possible to dig up documentary evidence that they really did assume the desired conclusion when handling missing data I should think that would sink the whole enterprise, reputations and all.

The only way I can see that the results show efficacy is in delaying or denying payments or expensive treatments via insurance or national health care, which does nothing for patients but does benefit paid consultants.

Your question brings up the matter of selection effects. If there weren't so many blatant problems in other aspects, I think this would be a major point of contention. Permit me to rant a little on just how pervasive these were.

The original intake of patients referred for treatment for "CFS" by NHS doctors, who have long had material generated by the study authors to guide them, was 3158. Some of the same authors have argued that 30% of CFS diagnoses are in error, while implicitly assuming they are infallible, but we will overlook that. (Who said "we never make mistakes", the KGB perhaps?)

Out of that intake, the authors chose nearly 900 they wished to participate, but got only 641. The authors assume those people who declined simply had a perverse desire to remain ill. If we had some idea of the change in activity needed to participate we might be able to judge how many declined to participate because they lacked the marginal energy levels required. If such a reason had any influence on the process, it would mean that patients without the ability to offload or displace activity were underrepresented in the trial. They would probably have been near the lower bound for entry. We already know about selection problems near the upper bound, due to the change in entrance criteria.

We then divide the participants into the four arms of the study, yielding 150-160 in each. After that we simply omit the roughly one-third who don't supply complete objective data from which we might measure improvement or decline. This means the only remaining objective measure is based on something like 100-120 individuals in each arm. At this point the effect of about 3 individuals is enough to substantially shift mean values. In terms of the initial intake this is about 0.1%, and even pathologists admit to such an error rate in diagnoses.
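A quick simulation, again with invented numbers rather than trial data, shows how few individuals it takes to move a group mean at this sample size:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented numbers: ~110 patients per arm with complete walk-test data,
# change scores centred on zero with a wide spread.
scores = rng.normal(0, 40, 110)

# Suppose three of them are strong positive outliers, e.g. misdiagnosed
# patients who do not have the same illness.
with_outliers = scores.copy()
with_outliers[:3] = 250.0

shift = with_outliers.mean() - scores.mean()
print(f"shift in group mean from 3 individuals: {shift:+.1f}")
```

With roughly 110 per arm, three extreme scores shift the mean by several points, which is comparable to the standard error of the mean itself.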

If these samples are normally distributed you can predict the number of outliers you might see without falsifying the assumption of a normal distribution. This is where individual data, though not necessarily identifiable individuals, become important. If the distributions violate the assumption of normality then those reported measures of significance don't mean much.

It happens that the population distribution being sampled for comparison was far from normal, even being one-sided. The assumption that selection was independent enough to create normally-distributed sample groups is untested. As far as I can tell selection boiled down to the same people saying "we want this individual in the study" or "we don't want that individual in the study" before we get to the randomization in assigning them to particular arms of the study. If you weight all the dice you don't have to worry about any particular die.

A normal distribution is a stable distribution completely characterized by two parameters. All study groups including the control showed an increase in variance/standard deviation over time, which argues against stability. The idea that these group distributions were completely determined by only two parameters remains an untested and unproven assumption. Detailed anonymous data would make it possible to test this. Such has not been provided.
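One simple check that would be possible with detailed anonymous data is sample skewness, which must be near zero for a normal distribution. A sketch with simulated (invented) samples:

```python
import numpy as np

rng = np.random.default_rng(2)

def skewness(x):
    """Sample skewness: approximately zero for normally distributed data."""
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

# Invented samples: a one-sided (exponential) distribution of the kind a
# selected patient sample might show, versus a genuinely normal one.
one_sided = rng.exponential(scale=20, size=150)
gauss = rng.normal(loc=20, scale=20, size=150)

print(f"skewness of one-sided sample: {skewness(one_sided):.2f}")
print(f"skewness of normal sample:    {skewness(gauss):.2f}")
```

A strongly non-zero skewness in the real individual scores would undercut any significance measure that assumes normality.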

In arguing that this study is suitable for setting national policy the authors implicitly assumed that the same illness found in a small percentage of that intake of 3158 affected all of them. They also assumed all these patients had enough marginal activity available to participate in therapy, despite the real possibility around 1/3 of those they wanted in the study might not have met this unstated criterion.

If I can use the effect on 3 individuals in a study, who may or may not have the same disease as others, to set national policy I can accomplish all kinds of things of dubious scientific validity.

Whew! Even after letting off the steam above, I still find that I stopped short of the connection with the immediate topic: did they demonstrate that GET is safe for CFS patients?

At this point I will put out an appeal for anyone who understands what the PACE authors actually did w.r.t. serious adverse responses related to anything, not just GET. Frankly, I don't understand.

The idea that 640 patients of the type I know went through a year of treatment without some ending up in hospital would be absurd. I don't even remember how many were hospitalized. What then appears to have happened is that someone, and I'm not certain of the independence of these judgments, made the determination that these hospitalizations were unrelated to anything taking place in PACE. If these determinations were also made by the authors, who have already demonstrated a strong will to deny patient reports, then I think we have a clear conflict of interest and potential manipulation of reported data. Redefinition of criteria for adverse events and serious adverse events during the study, in such a way that these became harder to report, does not fill me with confidence.


Do you think that the real reason for refusing to publish data is that they manipulated the data about harms caused by GET?


No, I doubt that would be apparent in the performance data in contention. We simply don't have any objective data we could request that would reveal what patients who did not participate in the walk test might have done if they had been tested. Data from the step test doesn't show much of anything. No data with the right granularity to show any decrease in performance during a period of hours or days following an instance of GET were collected. It is quite possible that recovery over weekends reduced cumulative harm from GET.

We don't even know that GET actually resulted in increased total activity for patients. The increase seen in therapy may have been offset by decreases in outside activities. We don't know if there were increases or decreases in total activity as a result of therapy. How can you judge harms in that situation?

What concerns me w.r.t. serious harms are data about hospitalizations. What I read doesn't tell me much of anything. Is there anyone who can tell me rates of hospitalization in CFS patients in general as compared to this study?

In answering an objection on the virology blog I ran into a misunderstanding: the poster thought I was proposing new objective measures I had simply imagined the study needed.

The words in contention are "objective", "recovery" and "safe" or "safety". No critic put these words in the authors' mouths or compelled them to repeat the claims many times. The proposal that garnered funding included measurement of total patient activity via actometers (devices resembling a FitBit). This was to be done before and after therapy to show improvement resulting from treatment. It is less clear that there was intent to show an absence of "harms" via objective measures of performance, but that is what many who read the documents associated with the study seem to have expected.

Modifying methods is often necessary in research, based on experience gained during a trial, but dropping a whole series of measures while retaining claims of objective evidence should be highly suspect. Had the authors simply dropped claims of objective data, and concentrated on their preferred subjective measures, there wouldn't be much problem. Everybody could have their own opinion about what the study showed.

When it comes to harms, and showing their absence, I cannot see any way to produce good evidence via fundamentally subjective questionnaires. Anyone who has some technique I have not imagined should publish it. This is what brings us back to questions about objective measures of improvement or decline in individual patients. Showing that group means improved does not rule out a subset of patients suffering harm as a result of therapy, particularly when those improvements are as shaky as these appear.
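A toy example with invented numbers makes this point concrete: a group mean can improve while a substantial minority deteriorates.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented numbers: most patients gain a little, a minority deteriorates
# sharply; the group mean still comes out positive.
improvers = rng.normal(15, 10, 130)   # modest average gains
decliners = rng.normal(-40, 10, 20)   # substantial harm
group = np.concatenate([improvers, decliners])

print(f"group mean change:  {group.mean():+.1f}")
print(f"patients worse off: {(group < 0).sum()} of {group.size}")
```

A report giving only the group mean would call this an improvement; only individual-level data would reveal the harmed subset.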

Reading interviews and press reports based on the study you might be forgiven for thinking they demonstrated that all participants in favored therapies showed significantly improved performance, and no patient declined in performance. (Personally, I think this is absurd on the face of it.) You would never guess that the patients in the preferred group considered "recovered" exhibited performance in the range of patients with congestive heart failure after therapy.

I've already said that there are real questions about the significance of means and p values if the distributions are not normal. I'm confident that publication of individual scores, without revealing identities of patients, will show the distributions were far from normal. A particular consequence would be revealing the effect of outliers who may have been in the study as the result of misdiagnosis. Counting about three positive outliers, and ignoring negative ones who declined to perform a test, might account for the total objective evidence supporting efficacy.

Questions about safety (absence of harms) are more complicated. Even if mean values move up there is a real likelihood some number of patients declined in performance. Part of the reason is that there is more "headroom" in scores on the upside. Patients whose scores declined much would be unable to participate, and would likely be hospitalized. I've already mentioned my uncertainty about published material regarding hospitalization.

Even if only 5 or 6 patients in the GET group declined in performance, as a result of therapy, that would translate into, say, 3% of the group. Rolling this out to the entire population of 250,000 patients in the U.K. would harm thousands of patients. We don't even know the percentage or the size of any decline so we can't decide what would constitute significant harm. Arguments that we can't prove harms, based on published data, sound a lot like a defense attorney in a malpractice suit. This is not how respectable scientific research is conducted.
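The arithmetic is simple enough to lay out explicitly, using the rough figures from the discussion above (not trial data):

```python
# Back-of-envelope arithmetic using the rough figures from the text
# (not trial data): 5 harmed patients in a GET arm of about 160, scaled
# to an estimated 250,000 ME/CFS patients in the U.K.
harmed, arm_size = 5, 160
population = 250_000

rate = harmed / arm_size
print(f"harm rate in the arm: {rate:.1%}")
print(f"implied harms at population scale: {rate * population:,.0f} patients")
```

Even a harm rate of a few percent, invisible in group means, would scale to thousands of patients under a national policy.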

The way these questions are handled in drug trials is that, in addition to mean values, we get detailed information about the number of patients who benefited, plus a list of exceptions that clinicians should be alert to. No such information can be derived from the published PACE literature. Doctors who aren't in a position to make the kind of in-depth study of these results attempted here would be very likely to assume the kind of outcome I dismissed as an absurdity above. In my opinion this looks like a deliberate attempt to deceive professional peers (assuming these authors admit they even have any peers).