Reality-Based Evidence: Prospecting for the Elusive Gold Standard

By Anthony Rosner, PhD, LLD [Hon.], LLC

Sometimes when I'm seeking what one would imagine to be absolutes or gold standards, I'm shocked and frankly amused to find otherwise. There always seem to be extenuating circumstances. So it is with meta-analyses, randomized clinical trials, or even medical guidelines - a topic I've addressed at some length in this space previously1 with references to Damon Runyon's classic, A Story Goes with It.2 And so it is also with Bart Koes' recent and insightful review of clinical guidelines issued in 11 countries from 1994-2000 for the management of low back pain.

You would think that, given the fact that the available scientific evidence used to construct these guidelines is the same, the guidelines appearing should be similar, irrespective of their country of origin - right? Wrong. It turns out that the guidelines compared from the United States; United Kingdom; The Netherlands; Israel; New Zealand; Finland; Australia; Switzerland; Germany; Denmark; and Sweden were alike in six basic aspects: [i] diagnostic triage, history taking, and physical examination; [ii] their conclusion that radiographs were not useful for managing nonspecific low back pain; [iii] their recognition of the importance of psychosocial factors; [iv] their discouragement of bed rest; [v] certain stipulations regarding the prescription of medications; and [vi] their concluding that the vast majority of low back pain cases should be managed in a primary care setting. However, the guidelines differed significantly in their recommendations for exercise therapy, muscle relaxants, and patient information. While the guidelines from most countries sanctioned spinal manipulation for pain relief, those from Australia and Israel did not. The Dutch guidelines straddled the fence, somewhat, by recommending spinal manipulation for acute, but not chronic low back pain.3

This tells us that human and cultural values are very much at work here, dispersing what you would think would be logical and uniform conclusions to the four winds. Very much the same phenomenon has caused the conclusions of 25 different meta-analyses to deliver diametrically opposing results, depending upon whose values are employed,4 as discussed in this space previously.1 In an insightful review of common omissions of meta-analyses, Feinstein reveals that groups of patients of varying homogeneity are often tossed into only a single analysis (a mixed salad), whereas the clinician needs to know more about subgroups of patients bearing more clinical resemblance to the patient likely to walk into a doctor's office. Such real-world issues as severity of the illness, comorbidities, and pertinent co-therapies are often ignored in meta-analyses.5

Back in the realm of guidelines, there are more problems to iron out. For one, it has been shown that ratings for proper adherence to methodological standards in analyzing the evidence, formulating recommendations, and developing the guidelines themselves were significantly below 50 percent overall. Only a slight improvement in guideline quality has been evident when guidelines from 1997 are compared with those issued before 1990.6 For another, there is always the question of noncompliance to consider - although in this particular instance it may be predominantly the result of practitioner conscientiousness rather than laxity. Regarding the use of appropriate blood chemistry tests, for example, van Wijk has recently indicated that noncompliance with guidelines may be caused primarily by adding tests - by practitioners applying new medical insight before it is incorporated into a revision of that guideline.7

The foregoing discussion immediately leads to questions as to how often and how guidelines should be updated. In reviewing the "late great" AHCPR guidelines - all 17 that were issued before the politically driven orthopedic group called the Center for Patient Advocacy managed to kill that venerable body's desire to issue any additional guidelines8 - Shekelle concluded that 80 percent of the guidelines were still valid after a mean shelf-life of 4.4 years, and that 50 percent were still valid after an average of only 5.8 years. Interestingly, the well-known set of guidelines addressing low-back pain9 was considered to require only minor changes, primarily concerning the need for more refined estimates of effects, tempering the recommendations made with reference to the limited effectiveness of back schools, lumbar corsets, and epidural steroid injections.10 In this regard, the current efforts of both the Oregon Board of Chiropractic Examiners and the Chiropractic Council on Guidelines and Practice Parameters to update chiropractic guidelines are to be commended.

As for the "how" question of guideline development, it is essential to remember that the rowdy qualities of human bias, subjectivity, and disagreement extend well into the rarefied echelons of randomized clinical trials, meta-analyses, and actual clinical guidelines. Interestingly, what had been considered a lower stratum of evidence in quality (observational studies) compared to randomized clinical trials may not be so. It has recently been reported that, in terms of treatment effects, estimates from observational studies conducted since 1984 do not appear to be consistently larger than, or qualitatively different from, those obtained in more fastidiously constructed randomized clinical trials.11 Indeed, the entire pecking order of the quality of scientific clinical evidence, as it appears to have been envisioned by conventional wisdom, has been seriously questioned, such that observations taken in the doctor's office assume a far greater value in establishing an evidence base with external, as well as internal, validity.12

It gets even more complicated. Tonelli points out that there will always be a region (an epistemological zone)13 in which discrete differences between individuals cannot be made explicit and quantified. Horwitz goes one better by declaring that to assume that the entire range of clinical treatment in any modality has been successfully captured by the precision of existing analytical methods in the scientific literature "would be like saying that a medical librarian who has access to systematic reviews, meta-analyses, Medline, and practice guidelines provides the same quality of healthcare as an experienced physician."14 To me, this entire tale of complexities and the lack of easy answers conjures images of a parallel universe, masterfully captured by a cartoonist (whose name I have long forgotten) who envisioned the astonishing experience of a visitor to Euro-Disney. Instead of meeting the classical life-sized Mickey Mouse uttering such expressions as "Golly, gee - hello!" this hypothetical visitor would find a bearded version of Mickey sitting in a bar over a beer, slung over a copy of Sartre, and offering with a casual nod, "Life is complex, n'est-ce pas?"

What all this is saying, simply, is that the entire range of research and clinical experience needs to be called into play when making a clinical judgment. This would mean that the "lowly" case study takes a rightful place in the pantheon of accumulated evidence supporting a particular type of healthcare intervention. As David Eisenberg once remarked, Chinese medicine endured into the 20th century without documentation from clinical trials because it was substantiated by 3,000 years of case studies. Only 15 percent of what we regard as modern medicine appears to have been supported by any scientific evidence at all,15 and only one percent has been described as sound.16 Within this framework, therefore, making reasonable statements that engender public inquiry and research, while assessing the orientation of what both published scientific studies in the journals and careful clinical observations are telling us, is the most productive course to follow. FCER, in its promotion of the acquisition and dissemination of pertinent research data, appears to have successfully followed the dictates laid down over 90 years ago by no less an authority than D.D. Palmer:

"If the author of this and other schemes would put in as much time and energy in developing the science, art and philosophy of chiropractic as he does in enveloping them, he would advance instead of retard them."17

To sum up, whoever is seeking documentation of clinical practice needs to be critical enough to avoid the lure of the gold standard in assessing evidence, so as not to end up like the three prospectors in "The Treasure of the Sierra Madre" who, much to their great horror, find that they have come up with fool's gold.

References

1. Rosner A. Tales from the crypt: Fables of foibles, or RCTs that go bump in the night. Dynamic Chiropractic 2000;18(3).

2. Rosner A. A story goes with it: Otitis media and the sanctity of medical guidelines. Dynamic Chiropractic 2001;19(1).