The Nature of Research Evidence Used in Systematic Reviews and Guidelines

Introduction to the NCDDR Task Force Papers

The National Center for the Dissemination of Disability Research (NCDDR) has established three task forces to assist the project in analyzing, understanding, and commenting on features of the evidence production process within the disability and rehabilitation research context.

Task Force on Standards of Evidence and Methods

Task Force on Systematic Review and Guidelines

Task Force on Knowledge Translation/Knowledge Value Mapping

Each task force is composed of senior researchers with current or recent experience in conducting NIDILRR-sponsored research activities. Each task force is charged with developing positions and statements that are relevant in light of current circumstances.

The Task Force on Systematic Review and Guidelines developed When the Best is the Enemy of the Good. This task force paper explores critical issues related to the "gold standard" for research designs, the emergence of systematic reviews, and implications for evidence-based rehabilitation and clinical practice. This paper is one of two developed by the Task Force in 2008. The first manuscript, entitled "The value of 'traditional reviews' in the era of systematic reviewing," has been published in the American Journal of Physical Medicine & Rehabilitation (May 2009, Vol. 88, No. 5, pp. 423-430). In addition, the Task Force has conducted two webcast events. Both of these events are archived and available on the NCDDR web site:

Webcast 11: The value of "traditional" reviews in the era of systematic reviewing

Webcast 13: When the best is the enemy of the good - The nature of research evidence used in systematic reviews and guidelines

Disclosure:
For the Task Force on Systematic Review and Guidelines sponsored by the National Center for the Dissemination of Disability Research (NCDDR)

RECOMMENDED CITATION:
Dijkers, M. P. J. M. for the NCDDR Task Force on Systematic Review and Guidelines. (2009). When the best is the enemy of the good: The nature of research evidence used in systematic reviews and guidelines. Austin, TX: SEDL.

ABSTRACT
Evidence-based practice, according to authoritative statements by the founders of this approach to health care, involves using the "best available" evidence in addition to clinical expertise and patient preferences to make decisions on the care of patients. However, many systematic reviewers interpret "best available" as "best possible" and exclude from their reviews any evidence produced by research of a grade less than the highest possible (e.g., the randomized clinical trial [RCT] for interventions), even if that means making no recommendations at all. Voltaire's comment that "the best is the enemy of the good" is applicable here. Rehabilitation would be disadvantaged especially, as it can boast few RCTs, because of its nature. The myopic focus on the "strongest" research designs may also steer researchers away from asking, "What is the best design to answer this research question?" Lastly, rehabilitation and other clinicians need to know not just which interventions are effective, but also how these interventions need to be delivered; information relevant to this latter aspect of knowledge translation is typically produced using "weak" research designs.

Evidence-based practice (EBP) is an approach to health care professional practice that stresses "the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence-based medicine… means integrating individual clinical expertise with the best available external clinical evidence from systematic research" [emphasis added] (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996, p. 71). This is the "authoritative" definition, emulated in such later descriptions as, "Evidence-based practice [is] an approach to patient care that incorporates the use of best evidence from well-designed studies, a clinician's expertise, and patient values and preferences" [emphasis added] (Arizona State University, 2005). Other definitions refer to the research literature without the reference to "best": for instance, "The practice of medicine with treatment recommendations that have their origin in objective tests of efficacy published in the scientific literature rather than anecdotal observations" (Gerontology Research Group, 2007).

Since its emergence in 1991, evidence-based practice has swept first medicine, next other health care disciplines such as nursing and physical therapy, then other professional fields from education to criminal justice, worrying some that its popularity, at least in lip service, may beget its undoing (Feinstein & Horwitz, 1997). However, most rehabilitation practitioners and researchers, whether originally trained prior to the emergence of EBP or during the period in which it became a force, are more interested in trying to learn the basics of EBP than in considering its potential negative impact.

While there are variations in descriptions of EBP, most adherents would agree that it involves the following main steps:

1. Pose a clinical question.

2. Develop a strategy to find evidence relevant to the question.

3. Appraise the evidence, in terms of its relevance to the clinical question, and in terms of the strength of the research that produced it.

4. Synthesize the evidence.

5. Apply the evidence to practice, taking into account local circumstances and patient values.

Two approaches have developed within this framework. One might be called "bedside EBP," where a single practitioner, faced with a clinical problem, does a quick search on MEDLINE or another database; reviews the abstracts to rapidly identify the strongest and most relevant studies; retrieves copies of these papers; synthesizes their findings and recommendations; and integrates the synthesis with clinical expertise and the patient's circumstances, values, and preferences to answer the initial question. The process is quick, informal, and usually far from systematic. Most practitioners might take a shortcut to the end result by first talking with a trusted colleague, who may have broad clinical experience or extensive knowledge of the literature about the question. That may not be the best evidence available, but it is fast, presumably targeted, and inexpensive.

A second approach to EBP is that taken by groups of clinicians and researchers who join together to develop materials that are of benefit to clinicians in a particular area of health care and others who lack the time (and possibly the skills) to take steps 1 through 5 themselves in anything but a cursory manner. These teams evaluate individual papers for publication of EBP-focused digests in one of the many EBP journals that have sprung up (American College of Physicians Journal Club, Evidence-Based Nursing, Evidence-Based Communication Assessment and Intervention, etc.), create critically-appraised topics (CATs), perform systematic reviews, or even use systematic reviews to develop guidelines for practice.

Systematic reviews are systematic in that the evidence is searched for, evaluated, and synthesized in clearly defined steps, following a protocol that has been written before the review begins. Sometimes protocols are based on specific guidelines such as those of the Cochrane Collaboration (Higgins & Green, 2006) or the American Academy of Neurology (AAN) (Edlund, Gronseth, So, & Franklin, 2004). All systematic reviews use a hierarchy of research designs to sort stronger evidence from weaker, based on a positivist view of "evidence." Sackett (1989) created the first, simple hierarchy of evidence:

1. Large randomized trials with clear-cut results

2. Small randomized trials with uncertain results

3. Nonrandomized trials with concurrent or contemporaneous controls

4. Nonrandomized trials with historical controls

5. Case series with no controls

This hierarchy, with its ambiguous "large" versus "small" standard and other problems, now is a historical curiosity. Better hierarchies with 4 to 10 levels have been published for reviews addressing various types of clinical questions: therapy, screening and diagnosis, prognosis, costs. Some claim that even the best hierarchies published disregard developments in research methodology over the last 20 to 40 years. The NCDDR Task Force on Standards of Evidence and Methods is expected to publish shortly its recommendations for evidence grading, specifically grading of evidence in disability/rehabilitation research. The better hierarchies, those of AAN for example, take quality of the research implementation as well as basic research design into account in differentiating stronger from weaker research.

In drawing conclusions and making recommendations, the authors of systematic reviews take into account the quality, quantity, and consistency of the evidence from many papers and other sources. Again, there has been increasing sophistication over time in how this is done. Sackett (1989) distinguished three categories of recommendations, differentiated on the basis of a simple "nose count":

Supported by one or more level 1 studies

Supported by one or more level 2 studies

Supported only by level 3, 4 or 5 studies
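The "nose count" just listed can be written out as a short Python sketch. It assumes that the levels are numbered 1 through 5 in the order of the hierarchy given earlier, and that the three categories are labeled 1, 2, and 3 in the order listed. Both the function name and these numeric labels are illustrative assumptions, not Sackett's own terminology.

```python
from typing import Iterable, Optional


def nose_count_category(study_levels: Iterable[int]) -> Optional[int]:
    """Return the category (1, 2, or 3) implied by the supporting studies.

    study_levels holds the evidence level (1 = strongest, 5 = weakest) of each
    study that supports the recommendation. The category numbers are
    hypothetical labels for the three categories in the order listed above.
    """
    levels = set(study_levels)
    if not levels:
        return None  # no supporting studies, so nothing to categorize
    if not levels <= {1, 2, 3, 4, 5}:
        raise ValueError("levels must be integers from 1 to 5")
    if 1 in levels:
        return 1  # supported by one or more level 1 studies
    if 2 in levels:
        return 2  # supported by one or more level 2 studies
    return 3      # supported only by level 3, 4, or 5 studies
```

Note that a single level 1 study decides the category no matter how many weaker studies exist; the count says nothing about the quality or consistency of the evidence.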

The quality, consistency, number, and basic design of the studies may be used to qualify recommendations on a scale ranging from "should/should not be done" through "should/should not be considered" to "may/may not be considered" to "no recommendations."

Unfortunately, many systematic reviews and guidelines published in recent years have adopted an all-or-nothing approach to the evidence base. Cochrane review groups may be the most extreme; in many instances only evidence for therapeutic interventions resulting from randomized clinical trials (RCTs) is accepted. If that level of evidence is lacking, "more research" is recommended, and no recommendations for practice are made. Other groups follow a similar practice, although they may draw the line at a different level in the evidence hierarchy. For instance, AAN guidelines specify that no recommendation should be made if there is not at least one Class II study or two consistent Class III studies, and that the recommendation to be made when this minimum level of evidence is available is to be phrased in terms of "may be considered" or "may not be considered" as appropriate (Edlund et al., 2004).
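As an illustration only, the minimum-evidence rule described for the AAN guidelines can be sketched in Python. The sketch makes several assumptions that are not in the guidelines as quoted here: evidence classes are encoded as integers (1 for Class I, 2 for Class II, 3 for Class III), a study of a class stronger than II also satisfies the "at least one Class II study" condition, and "consistent" means that two studies point in the same direction. The Study type and the function name are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Study:
    evidence_class: int  # assumed encoding: 1 = Class I, 2 = Class II, 3 = Class III
    direction: str       # assumed encoding of the finding: "favors" or "against"


def meets_minimum_evidence(studies: Iterable[Study]) -> bool:
    """True if a recommendation may be made at all under the rule described above."""
    studies = list(studies)
    # Condition 1: at least one Class II study (a stronger class is assumed to count).
    if any(s.evidence_class <= 2 for s in studies):
        return True
    # Condition 2: two consistent Class III studies, with "consistent"
    # assumed to mean that both point in the same direction.
    directions = Counter(s.direction for s in studies if s.evidence_class == 3)
    return any(count >= 2 for count in directions.values())
```

Under this reading, two Class III studies with conflicting findings, or any number of studies weaker than Class III, lead to no recommendation at all.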

When a well-respected statistician-methodologist like Douglas Altman goes on record stating,

Only randomised trials allow valid inferences of cause and effect. Only randomised trials have the potential directly to affect patient care — occasionally as single trials but more often as the body of evidence from several trials, whether or not combined formally by meta-analysis (Altman, 1996, p. 570)

it is not surprising that the misunderstanding spreads in EBP circles that only RCTs can contribute information that is of use in clinical decision making. This is also reflected in the following: "Treatment decisions in clinical cardiology are directed by results from randomized clinical trials (RCTs)" (Hernandez, Boersma, Murray, Habbema, & Steyerberg, 2006, p. 257).

It would seem that Voltaire's comment that "the best is the enemy of the good" (Le mieux est l'ennemi du bien) is applicable here. Some systematic review panels or their parent guideline development organizations have raised the bar on the level of evidence required so high that in their reviews no appropriate evidence is discerned, resulting in no recommendation. However, that would appear to go against the grain of EBP as defined by some of its pioneers – as expressed in the quote from Sackett et al. (1996): "judicious use of current best evidence in making decisions." Similar sentiments can be found in other key EBP texts, such as the book by Straus, Richardson, Glasziou, and Haynes (2005, p. 1): "By best research evidence we mean valid and clinically relevant research, often from the basic sciences of medicine, but especially from patient-centered clinical research into the… efficacy and safety of therapeutic, rehabilitative and preventive regimens."

"Best" should be understood in the meaning of "best available," not as "best possible." By repudiating the benefit from whatever value there may be in "flawed" research, the EBP practitioners who refuse to consider anything below a certain evidence grade throw away research that may be informative for the clinical issue in question. Depending on the level of scrutiny applied, they may accept a poorly executed randomized trial over an exemplary case-control study. It would be too bold to state that a panel of reviewers carefully considering meager evidence is always more knowledgeable than the lone clinician who has only his or her own experience and possible uncritical reading of the literature on which to rely. In most instances, however, expert consensus supplemented by weak evidence from the research literature likely is preferable over the lone practitioner's intuition. Thus, systematic reviewers should consider all available research, and not disregard investigations of a quality level below an artificially drawn line.
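The difference between the two readings of "best" can be made concrete with a small Python sketch. It is a simplified illustration, not a review protocol: studies are represented as (design, level) pairs with hypothetical level numbers (lower means stronger), and both function names are invented for this example.

```python
from typing import List, Tuple

Study = Tuple[str, int]  # (design label, evidence level; lower number = stronger)


def screen_with_cutoff(studies: List[Study], cutoff_level: int) -> List[Study]:
    """'Best possible' reading: drop every study weaker than the cutoff level."""
    return [s for s in studies if s[1] <= cutoff_level]


def rank_all_available(studies: List[Study]) -> List[Study]:
    """'Best available' reading: keep every study, ordered strongest first."""
    return sorted(studies, key=lambda s: s[1])


# Hypothetical evidence base with no randomized trials at all.
evidence = [
    ("case series", 5),
    ("trial with historical controls", 4),
    ("trial with concurrent controls", 3),
]

rct_only = screen_with_cutoff(evidence, cutoff_level=2)  # nothing survives the cutoff
everything = rank_all_available(evidence)                # all three studies, graded
```

With the cutoff, the reviewers are left with an empty list and can only call for more research; with the ranking, they retain three studies whose strength can be stated explicitly alongside any recommendation.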

The disregard of "weaker" studies is especially damaging in rehabilitation, because there are so few clinical trials on which to rely (Johnston, 2003). This shortage is due in large part to the nature of rehabilitation: a coordinated treatment effort of many disciplines all using treatments and approaches individualized to the patient, and focusing on long-term outcomes that are affected by multiple personal and environmental factors that largely are not under control of the rehabilitation team. In addition, realistic placebos are not available for many interventions, and blinding (of providers, and sometimes even of patients) is not feasible. (See Johnston, Sherer, & Whyte, 2006 for additional issues justifying rehabilitation research's "low" evidence levels.) Rehabilitation research is not unique in this respect; behavioral medicine, health services research, and others share the problem that their treatments do not fit the mold of what often is the exemplar in EBP: the drug versus placebo short-term double-blinded RCT.

Another consideration is that in accepting evidence from studies weaker than the RCT it is not simply a matter of settling for second-best. The real question is not "What is the most rigorous research design?" but "At this time, what is the best research design for the research question or practical problem at issue?" "Rigorous" and "best" are not the same. Large RCTs can be premature and can take funds away from the needed development of new interventions. Traditional RCTs apply narrow selection criteria, and therefore their results do not generalize well to a wider universe of patients; "practical clinical trials" have been proposed as a way of producing evidence with more applicability to real life (Tunis, Stryer, & Clancy, 2003). RCTs are largely inapplicable to assistive technology and environmental modifications, which are core interventions in disability and rehabilitation. In some instances, RCTs are unnecessary, because strong evidence can be generated by means of a much weaker design. For instance, who would do an RCT to test whether wheelchairs work? Clearly, standards for "best research design" in disability and rehabilitation as in other health care and human services fields cannot be driven by an insistence on large RCTs or an uncritical application of standards promulgated by certain evidence-based medicine adherents.

A further issue related to the practice of restricting EBP reviews to RCTs is the wide variation in the interventions that may occur in research in areas such as rehabilitation, social services and education. In medical research this may not be a problem when the intervention involves a single active ingredient expressed in an easily measured dosage, such as a drug. But in other professional fields, the "intervention" may consist of much more difficult-to-measure entities such as parent training, job coaching, or self-advocacy training. When the process of synthesizing the body of evidence about these types of interventions is restricted to RCTs, much useful information that could guide practitioners may be lost. Reaching a judgment about the effectiveness of such interventions based on the overall body of evidence often requires selection of studies in which the intervention may have been implemented in many different ways or at many different levels of intensity. The "average effect size across many studies" on which the typical EBP systematic review judgment is based does not provide much guidance for practitioners about how, specifically, to apply the intervention to their own clients or students. In contrast, coupling meta-analysis of RCT studies about a particular intervention with other information gathered from, for example, meta-syntheses of qualitative studies could provide a rich source of guidance for practitioners (Sandelowski, Docherty, & Emden, 1997). If the end goal is the incorporation of best available research into decision making about practices, then for knowledge translation purposes the best that different research approaches have to offer should be included in the synthesis.

No one is likely to claim that RCTs are equal to other designs, at least for demonstrating internal validity, that is, the conclusive proof that a certain intervention has specific positive and negative consequences, compared to placebo or compared to another treatment. (The relative weakness of RCTs versus other designs when it comes to external validity, the generalizing of study findings to a group of which the study subjects are representative, has been discussed extensively in recent literature: Horn, DeJong, Ryser, Veazie, & Teraoka, 2005; Tunis et al., 2003.) RCTs are better, and if designed and executed well they offer a higher level of confidence that a particular treatment is better than or is not significantly different from another treatment or placebo. This level of confidence in a conclusion based on study data cannot be matched by other, observational, designs, however large the sample or sophisticated the measurement of outcomes. However, this gold standard is feasible only in limited circumstances. There are so many treatments and approaches in rehabilitation that deserve evaluation that application of RCTs to them all could exhaust the National Institutes of Health budget, let alone that of the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR). If we are to gain information on treatments that work in rehabilitation (for specific categories of patients, at a particular stage of their disability) we need to make creative use of research designs that are less restricted and less expensive than RCTs (Horn & Gassaway, 2007).

The argument is not that in all circumstances any level of evidence is better than nothing. If only one small study of questionable quality has been done, and its findings contradict common sense, there obviously is no reason to base recommendations on those findings. And if a number of very similar studies have been done, some of high quality and some of lesser strength, it is defensible to disregard the latter and base recommendations on the former only, although it should be noted that the filtering of studies based on quality is still a contentious issue (Cipriani, Malvini, Furukawa, & Barbui, 2007; Greenland & O'Rourke, 2001; Herbison, Hay-Smith, & Gillespie, 2006; Moher, Cook, Jadad, et al., 1999; van der Velde, van Tulder, Cote, et al., 2007; Verhagen, de Vet, de Bie, Boers, & van den Brandt, 2001; Whiting, Harbord, & Kleijnen, 2005). While it may take the EBP community some more time to determine under what circumstances quality filtering is or is not recommended, the issue addressed here is a tangentially related one: what to do in situations where there is no "embarrassment of riches" when it comes to evidence, but only a few studies are available, all of a design weaker than an RCT or equivalent.

If we are to offer guidance to clinicians as to what approaches may or likely will be most effective or efficient with their patients and clients, our systematic reviews need to be more catholic than allowed by the EBP purists, and sometimes accept, by necessity, all levels of evidence. It is never the case that in the absence of recommendations from a systematic review no rehabilitation services are delivered; given the need to help patients with their impairments and problems, rehabilitation clinicians almost always will try something. If that "something" is supported by weak evidence carefully considered by expert clinicians and researchers, it likely will be better than what a single clinician not guided by the literature will create.

As long as the strength of the evidence is carefully set forth and taken into account along with the quantity and consistency of the evidence, little harm is possible, and much benefit may result. Let's not make the best the enemy of the good.

Acknowledgments

Members of the NCDDR Task Force on Systematic Review and Guidelines at the time this analysis was prepared included Michael Boninger, MD, Tamara Bushnik, PhD, Peter Esselman, MD, Steven Gard, PhD, Wayne Gordon, PhD, Allen Heinemann, PhD, Mark Sherer, PhD, David Vandergoot, PhD, and Michael Wehmeyer, PhD, all of whom contributed to the development and critical review of the ideas presented in this communication. Careful review and suggestions by Mark Johnston, PhD, and Jean Ann Summers, PhD, contributed to this statement's final format. Joann Starks of the National Center for the Dissemination of Disability Research provided administrative support to the Task Force.

Marcel P.J.M. Dijkers, PhD, FACRM, is the Facilitator for the Task Force on Systematic Review and Guidelines. He is research professor in the Department of Rehabilitation Medicine, Mount Sinai School of Medicine (MSSM). Dr. Dijkers is project director for the Disability and Rehabilitation Research Project (DRRP) on Classification and Measurement of Medical Rehabilitation Interventions. He is also senior investigator in MSSM's NIDILRR-funded Rehabilitation Research and Training Center (RRTC) on Traumatic Brain Injury (TBI) Interventions, as well as the New York TBI and Spinal Cord Injury (SCI) Model Systems.

Michael L. Boninger, MD, is professor and interim chair of the Department of Physical Medicine and Rehabilitation, University of Pittsburgh, and is director of the NIDILRR-supported University of Pittsburgh Model Center on Spinal Cord Injury (UPMC-SCI). Dr. Boninger is also research physician and medical director for the Veterans Administration (VA) Rehabilitation Research and Development Center of Excellence in Wheelchairs and Associated Rehabilitation Engineering, VA Pittsburgh Healthcare System.

Tamara Bushnik, PhD, is director of the Rehabilitation Research Center at Santa Clara Valley Medical Center (SCVMC) and is principal investigator and co-director of the Northern California Traumatic Brain Injury Model System of Care. In August 2009 she will become the director of Rehabilitation Research at the Rusk Institute for Rehabilitation in New York City.

Peter C. Esselman, MD, is professor and chair of the Department of Rehabilitation Medicine, University of Washington. Dr. Esselman is co-principal investigator of the University of Washington Burn Injury Rehabilitation Model System.

Wayne A. Gordon, PhD, is the Jack Nash Professor of Rehabilitation Medicine and associate director of the Department of Rehabilitation Medicine at the Mount Sinai School of Medicine. He is the director of Research of the Department of Rehabilitation, and the director of Brain Injury Research. He is project director of MSSM's TBI Model System, the RRTC on TBI Interventions, and the Mount Sinai Injury Control Research Center.

Allen W. Heinemann, PhD, ABPP, FACRM is professor in Physical Medicine and Rehabilitation at the Feinberg School of Medicine, Northwestern University and director of the Center for Rehabilitation Outcomes Research, Rehabilitation Institute of Chicago. He is the principal investigator of the Advanced Rehabilitation Research Training project in Rehabilitation Services Research, the Health Services Research DRRP on Medical Rehabilitation, and the RRTC on Measuring Rehabilitation Outcomes and Effectiveness.

Mark Sherer, PhD, ABPP-Cn, FACRM is clinical professor of Physical Medicine and Rehabilitation, Baylor College of Medicine, and senior scientist and director of Research and Neuropsychology at TIRR–Memorial Hermann. Dr. Sherer is principal investigator for the Texas TBI Model System of TIRR.

Joann Starks is the NCDDR Liaison for the Task Force on Systematic Review and Guidelines. She is a program associate for SEDL's NCDDR and for the SEDL partnership with SUNY Buffalo's Center on Knowledge Translation for Technology Transfer.

David Vandergoot, PhD, is project co-director for the Employment Service Systems Research and Training Center (ESSRTC), a NIDILRR-funded RRTC. He is president of the Center for Essential Management Services (CEMS) where he manages all aspects of research, training and demonstration projects.

Michael L. Wehmeyer, PhD, is professor of special education; director of the Kansas University Center on Developmental Disabilities; and senior scientist and associate director, Beach Center on Families and Disability, at the University of Kansas. Dr. Wehmeyer is also principal investigator for two NIDILRR-funded DRRPs: Mental Retardation and Technology, and Impact of Interventions on Self-Determination and Adult Outcomes.

Alexander Libin, PhD, is a senior researcher at the Medstar Research Institute Division at the National Rehabilitation Hospital, and training director for the NIDILRR-funded RRTC on SCI: Promoting Health & Preventing Complications Through Exercise. Dr. Libin is a principal investigator for Department of Defense-funded projects on computerized assessment of executive functioning in neurologic populations, and assistant professor of Physical Medicine and Rehabilitation at the Georgetown University Medical Center in Washington, DC.

The Task Force on Systematic Review and Guidelines is sponsored by the National Center for the Dissemination of Disability Research (NCDDR). NCDDR's Task Force Papers are published by SEDL and the NCDDR under grant H133A060028 from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR) in the U.S. Department of Education's Office of Special Education and Rehabilitative Services (OSERS).

NCDDR's scope of work focuses on knowledge translation (KT) of NIDILRR-sponsored research and development results into evidence-based instruments and systematic reviews. NCDDR is developing systems for applying rigorous standards of evidence in describing, assessing, and disseminating research and development outcomes.

SEDL operates the NCDDR, which is funded 100% by NIDILRR at $750,000 per project year. However, these contents do not necessarily represent the policy of the U.S. Department of Education, and you should not assume endorsement by the federal government.

SEDL is an Equal Employment Opportunity/Affirmative Action Employer and is committed to affording equal employment opportunities for all individuals in all employment matters. Neither SEDL nor the NCDDR discriminates on the basis of age, sex, race, color, creed, religion, national origin, sexual orientation, marital or veteran status, or the presence of a disability. Available in alternate formats upon request.

The contents of this site were developed under grant number 90DPKT0001 from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR). NIDILRR is a Center within the Administration for Community Living (ACL), Department of Health and Human Services (HHS). The contents of this website do not necessarily represent the policy of NIDILRR, ACL, or HHS, and you should not assume endorsement by the Federal Government.