Jennie and Erica highlighted serious and sometimes insurmountable flaws with this Review, including:

The failure to be clear and specific about what disease was being studied.

The acceptance of 8 disparate ME or CFS definitions as equivalent in spite of dramatic differences in inclusion and exclusion criteria.

The bad science reflected in citing Oxford’s flaws and then using Oxford studies anyway.

The well-known problems with the PACE trial.

The flawed process that used non-experts in such a controversial and conflicted area.

Flawed search methods that focused on fatigue.

Outright errors in some of the basic information in the report and apparent inconsistencies in how inclusion criteria were applied.

Poorly designed and imprecise review questions.

Misinterpretation of cited literature.

In this post, I will describe several additional key problems with the AHRQ Evidence Review.

Keep in mind that comments must be submitted by October 20, 2014. Directions for doing so are at the end of this post.

We Don’t Need No Stinking Diagnostic Gold Standard

Best practices for diagnostic method reviews state that a diagnostic gold standard is required as the benchmark. But there is no agreed-upon diagnostic gold standard for this disease, and the Review acknowledges this. So what did the Evidence Review do? The Review allowed any of 8 disparate CFS or ME definitions to be used as the gold standard and then evaluated diagnostic methods against and across the 8 definitions. But when a definition does not accurately reflect the disease being studied, that definition cannot be used as the standard. And when the 8 disparate definitions do not describe the same disease, you cannot draw conclusions about diagnostic methods across them.

What makes this worse is that the reviewers recognized the importance of PEM but failed to consider the implications of Fukuda’s and Oxford’s failure to require it. The reviewers also excluded, ignored or downplayed substantial evidence demonstrating that some of these definitions could not be applied consistently, as CDC’s Dr. Reeves demonstrated about Fukuda.

Beyond this, some diagnostic studies were excluded because they did not use the “right” statistics or because the reviewer judged the studies to be “etiological” studies, not diagnostic methods studies. Was NK-Cell function eliminated because it was an etiological study? Was Dr. Snell’s study on the discriminative value of CPET excluded because it used the wrong statistics? And all studies before 1988 were excluded. These inclusion/exclusion choices shaped what evidence was considered and what conclusions were drawn.

Erica pointed out that the Review misinterpreted some of the papers describing harms associated with a diagnosis. The Review failed to acknowledge the relief and value of finally getting a diagnosis, particularly from a supportive doctor. The harm is not from receiving the diagnostic label, but rather from the subsequent reactions of most healthcare providers. At the same time, the Review did not consider other harms like Dr. Newton’s study of patients with other diseases being diagnosed with “CFS” or another study finding some MS patients were first misdiagnosed with CFS. The Review also failed to acknowledge the harm that patients face if they are given harmful treatments out of a belief that CFS is really a psychological or behavioral problem.

The Review is rife with problems: it failed to ask whether all definitions represent the same disease, it used any definition as the diagnostic gold standard against which to assess any diagnostic method, and it excluded some of the most important ME studies. It is no surprise, then, that the Review concluded that no definition had proven superior and that there are no accepted diagnostic methods.

But remarkably, the reviewers felt that there was sufficient evidence to state that those patients who meet CCC and ME-ICC criteria were not a separate group but rather a subgroup with more severe symptoms and functional limitations. Because the Review started from the assumption that all 8 definitions encompass the same disease, this characterization of CCC and ICC patients was a foregone conclusion.

But Don’t Worry, These Treatment Trials Look Fine

You would think that at this point in the process, someone would stand up and ask about the scientific validity of comparing treatments across these definitions. After all, the Review acknowledged that Oxford can include patients with other causes of the symptom of chronic fatigue. But no, the Evidence Review continued on to compare treatments across definitions regardless of the patient population selected. Would we ever evaluate treatments for cancer patients by first throwing in studies with fatigued patients? The assessment of treatments was flawed from the start.

But the problems were then compounded by how the Review was conducted. The Review focused on subjective measures like general function, quality of life and fatigue, not objective measures like physical performance or activity levels. In addition, the Review explicitly decided to focus on changes in the symptom of fatigue, not PEM, pain or any other symptom. Quality issues with individual studies were either not considered or ignored. Counseling and CBT studies were all lumped into one treatment group, without consideration of the dramatic difference in therapeutic intent of the two. Some important studies like Rituxan were not considered because the treatment duration was considered too short, regardless of whether it was therapeutically appropriate.

And finally, the Review never questioned whether the disease theories underlying these treatments were applicable across all definitions. Is it really reasonable to expect that a disease that responds to Rituxan or Ampligen is going to also respond to therapies that reverse the patient’s “false illness beliefs” and deconditioning? Of course not.

If their own conclusions on the diagnostic methods and the problems with the Oxford definition were not enough to make them stop, the vast differences in disease theories and therapeutic mechanisms of action should have made the reviewers step back and raise red flags.

At the Root of It All

This Review brings into sharp relief the widespread confusion on the nature of ME and the inappropriateness of having non-experts attempt to unravel a controversial and conflicting evidence base about which they know nothing.

But just as importantly, this Review speaks volumes about the paltry funding and institutional neglect of ME reflected in the fact that the study could find only 28 diagnostic studies and 9 medication studies to consider from the last 26 years. This Review speaks volumes about the institutional mishandling that fostered the proliferation of disparate and sometimes overly broad definitions, all branded with the same “CFS” label. The Review speaks volumes about the institutional bias that resulted in the biggest, most expensive and greatest number of treatment trials being those that studied behavioral and psychological pathology for a disease long proven to be the result of organic pathology.

This institutional neglect, mishandling and bias have brought us to where we are today. That the Evidence Review failed to recognize and acknowledge those issues is stunning.

Shout Out Your Protest!

This Evidence Review is due to be published in final format before the P2P workshop and it will affect our lives for years to come. Make your concerns known now.

Mary Dimmock and Jennie Spotila have written a very important post about a big problem with P2P. With their permission it is being reposted here in its entirety. (Thank you Mary and Jennie.)

P2P: The Question They Will Not Ask

The most important question about ME/CFS – the question that is the cornerstone for every aspect of ME/CFS science – is the question that the P2P Workshop will not ask:

How do ME and CFS differ? Do these illnesses lie along the same continuum of severity or are they entirely separate with common symptoms? What makes them different, what makes them the same? What is lacking in each case definition – do the non-overlapping elements of each case definition identify a subset of the illness or do they encompass the entirety of the population?

Boiled down to its essence, this set of questions is asking whether all the “ME/CFS” definitions represent the same disease or set of related diseases. The failure to ask this question puts the entire effort at risk.

This fundamental question was posed in the 2012 application for the Office of Disease Prevention to hold the P2P meeting (which I obtained through FOIA). It was posed in the 2013 contract between AHRQ and the Oregon Health & Science University for the systematic evidence review (which I obtained through FOIA). It was posed to the P2P Working Group at its January 2014 meeting to refine the questions for the evidence review and Workshop (according to Dr. Susan Maier at the January 2014 Institute of Medicine meeting).

And then the question disappeared.

The systematic evidence review protocol does not include it. Dr. Beth Collins-Sharp said at the June 2014 CFSAC meeting that the Evidence Practice Center is not considering the question because there is “not enough evidence” in the literature to answer the question. However, she said that the P2P Workshop could still consider the question.

Every section of the Workshop agenda lumps all the populations described by the multiple case definitions together, discussing prevalence, tools, subsets, outcomes, presentation, and diagnosis of this single entity.

A 20-minute presentation on “Case Definition Perspective” is the only lip service paid to this critical issue. This is completely inadequate, if for no other reason than that the presentation is isolated from discussions on the Workshop Key Questions and dependent topics like prevalence and natural history. As a result, it is unlikely to be thoroughly discussed unless one of the Panelists has a particular interest in it.

Why is this problematic? Because both the P2P Workshop and the evidence review are based on the assumption that the full set of “ME/CFS” case definitions describe the same disease. This assumption has been made without proof that it is correct and in the face of data that indicate otherwise, and therein lies the danger of failing to ask the question.

What if the case definitions do not actually describe a single disease? If there are disparate conditions like depression, deconditioning, non-specific chronic fatigue and a neuroimmune disease characterized by PEM encompassed by the full set of “ME/CFS” definitions, then lumping those together as one entity would be unscientific.

The most important part of designing scientific studies is to properly define the study subjects. One would not combine liver cancer and breast cancer patients into a single cohort to investigate cancer pathogenesis. The combination of those two groups would confound the results; such a study would be meaningful only if the two groups were separately defined and then compared to one another to identify similarities or differences. The same is true of the P2P evidence review of diagnostics and treatments: assuming that all “ME/CFS” definitions capture the same disease (or even a set of biologically related diseases) and attempting to compare studies on the combined patients will yield meaningless and confounded results if those definitions actually encompass disparate diseases.

There is a growing body of evidence that underscores the need to ask the fundamental question of whether “ME/CFS” definitions represent the same disease:

The P2P Workshop is focused on “extreme fatigue” as the defining characteristic of “ME/CFS,” yet fatigue is a common but ill-defined symptom across many diseases. Further, not all “ME/CFS” definitions require fatigue or define it in the same way. For instance, Oxford requires subjective fatigue, and specifically excludes patients with a physiological explanation for their fatigue. But the ME-ICC does not require fatigue; instead it requires PENE, which is defined to have a physiological basis.

When FDA asked CFS and ME patients to describe their disease, we did not say “fatigue.” Patients told FDA that post-exertional malaise was the most significant symptom: “complete exhaustion, inability to get out of bed to eat, intense physical pain (including muscle soreness), incoherency, blacking out and memory loss, and flu-like symptoms.”

Multiple studies by Jason, Brenu, Johnston and others have demonstrated significant differences in disease severity, functional impairment, levels of immunological markers and patient-reported symptoms among the different case definitions.

Multiple studies have demonstrated that patients with PEM have impairment in energy metabolism and lowered anaerobic threshold, and have shown that patients with depression, deconditioning and a number of other chronic illnesses do not have this kind of impairment.

Multiple studies have demonstrated differences in exercise-induced gene expression between Fukuda/CCC patients and both healthy and disease control groups.

The wide variance in prevalence estimates shines a light on the case definition problem. Prevalence estimates for Oxford and Empirical populations are roughly six times higher than the most commonly accepted estimate for Fukuda. Even Fukuda prevalence estimates vary widely, from 0.07% to 2.6%, underscoring the non-specificity of the criteria. Nacul, et al., found that the prevalence using CCC was only 58% of the Fukuda prevalence. Vincent, et al., reported that 36% of Fukuda patients had PEM, representing a smaller population that would be eligible for diagnosis under CCC.

The work of Dr. Jason highlights the danger of definitions that include patients with primary psychiatric illnesses, especially because such patients may respond very differently to treatments like CBT and GET.

By contrast, there have not been any published studies that demonstrate that the set of “ME/CFS” definitions being examined in P2P encompasses a single entity or biologically related set of entities. From Oxford to Fukuda to ME-ICC, there are significant differences in the inclusion and exclusion criteria, including differences in the exclusion of primary psychiatric illness. The magnitude of these differences makes the lack of such proof problematic.

Given that treating all “ME/CFS” definitions as a single entity is based on an unproven assumption of the clinical equivalence of these definitions, and given that there is ample proof that these definitions do not represent the same disease or patient population, it is essential that the P2P “ME/CFS” study start by asking this question:

Does the set of “ME/CFS” definitions encompass the same disease, a spectrum of diseases, or separate, discrete conditions and diseases?

The failure to tackle this cornerstone question up front in both the agenda and the evidence review puts the scientific validity of the entire P2P Workshop at risk. If this question is not explicitly posed, then the P2P Panel, whose members are not ME/CFS experts, will swallow the assumption of a single disorder without question, if for no other reason than that they do not know the literature well enough to recognize that it is an assumption and not established fact.