Evaluating the State of Quality-Improvement Science through Evidence Synthesis: Insights from the Closing the Quality Gap Series

Perm J 2013 Fall;17(4):52-61

https://doi.org/10.7812/TPP/13-010

Abstract

Context: The Closing the Quality Gap series from the Agency for Healthcare Research and Quality summarizes evidence for eight high-priority health care topics: outcomes used in disability research, bundled payment programs, public reporting initiatives, health care disparities, palliative care, the patient-centered medical home, prevention of health care-associated infections, and medication adherence.

Objective: To distill evidence from this series and provide insight into the "state of the science" of quality improvement (QI).

Methods: We provided common guidance for topic development and qualitatively synthesized evidence from the series topic reports to identify cross-topic themes, challenges, and evidence gaps as related to QI practice and science.

Results: Among topics that examined effectiveness of QI interventions, we found improvement in some outcomes but not others. Implementation context and potential harms from QI activities were not widely evaluated or reported, although market factors appeared important for incentive-based QI strategies. Patient-focused and systems-focused strategies were generally more effective than clinician-focused strategies, although the latter approach improved clinician adherence to infection prevention strategies. Audit and feedback appeared better for targeting professionals and organizations, but not patients. Topic reviewers observed heterogeneity in outcomes used for QI evaluations, weaknesses in study design, and incomplete reporting.

Conclusions: Synthesizing evidence across topics provided insight into the state of the QI field for practitioners and researchers. To facilitate future evidence synthesis, consensus is needed around a smaller set of outcomes for use in QI evaluations and a framework and lexicon to describe QI interventions more broadly, in alignment with needs of decision makers responsible for improving quality.

Introduction

The quality of health care in the US is widely recognized as needing improvement. Indeed, as many as 50% of all patients, on average, may receive suboptimal care.1-3 Yet quality is improvable, and efforts to make improvements are widespread.1,4,5

Just as medical science focuses on treating ailments and supporting the health of the human body through medical, surgical, pharmacologic, and preventive interventions, the science of quality improvement (QI) focuses on "treating" quality gaps and supporting optimal performance of the health care system through improvement interventions and quality monitoring. A key question for both medical and improvement science is how altering one part of a system—either the human body or the health care system—produces desired results. Additional questions relate to how interventions interact with the surrounding environment and circumstances (the context of change) and how delivery of the intervention (implementation of change) has an impact on effectiveness. Many of the tools of medical research that were tailored to answer such questions have also been applied to improvement science, including systematic reviews and meta-analyses.

In 2004, the Agency for Healthcare Research and Quality (AHRQ) launched a collection of systematic reviews on QI strategies related to high-priority chronic conditions (eg, diabetes, asthma, hypertension), practice areas (eg, prevention of health care-associated infections, antibiotic prescribing behavior), and processes (eg, care coordination) identified by the Institute of Medicine.6-12 AHRQ followed this collection with a new series of eight evidence reports—Closing the Quality Gap: Revisiting the State of the Science—to continue the focus on improving the quality of health care, including current efforts to reward high-quality care through measurement and reporting as well as key tenets of health care reform legislation passed under the Patient Protection and Affordable Care Act.13 In addition, through two cross-topic synthesis projects,14,15 the new series of reports also sought to illuminate broader lessons about the state of QI science by aggregating evidence in a qualitative way across the sample of topics included in the series.

This article builds on that synthesis, summarizing the "state of the science" for the effectiveness, implementation decision factors, and evidence base of the QI field on the basis of findings from the most recent Closing the Quality Gap series of topic reports.

Methods

Series Topics

The Closing the Quality Gap series included eight topics selected by leaders in AHRQ for their relevance to high-priority populations, settings, and processes,4 and to provisions of the Affordable Care Act (Table 1). Selected topics were also ripe for systematic review and expected to yield actionable evidence for patients, practitioners, health systems, and policy makers.

We mapped these topics to three core approaches ("3 Is") for achieving improvements, as noted by health care systems researcher Victor Fuchs,16 who said that real reform "requires changes in the organization and delivery of care that provide physicians with the information, infrastructure, and incentives they need to improve quality and control costs." In today's complex health care system, these leverage points for improvement apply beyond the physician to include other clinicians, systems managers, and patients themselves. The set of topics selected for the series addresses each of these three core approaches (see Table 1).

Topic Reviews

Each topic was reviewed by a team from an AHRQ Evidence-based Practice Center (EPC) using a standard methods guide.17 Complete details of review methods for each topic are available in the individual topic reports.18-25 A brief summary is presented here. In conjunction with topic-specific technical expert panels, team members of each EPC developed a set of key questions to guide their review. The EPC teams searched a wide variety of literature databases, including at a minimum MEDLINE, covering an average of 25 years of literature for each topic (range, 5 to 65 years). They identified relevant articles through multiple rounds of review and abstracted detailed information from each included study. All studies were evaluated for quality and potential bias using a standard protocol. Likewise, when reported and applicable, evaluations of strength of evidence across studies also followed standard methods.17

Cross-Topic Synthesis

Results presented in this article are based on the eight series topic review reports.18-25 We initially provided common guidance to each topic review team for the series to facilitate cross-topic synthesis. Then we reviewed the evidence presented in the reports, including tables and text, to identify cross-cutting themes, take-home lessons, common challenges, and evidence gaps as they relate to the science of QI. Thus, this synthesis is based on comparisons across the series topic reports rather than on primary studies reviewed in those reports. We did not perform quantitative meta-analyses, but instead focused on qualitative synthesis to provide insight into the field of QI. Additional discussion of topic-specific findings and implications for key stakeholder audiences may be found in the series summary report15 and an accompanying methods report.14

Key Questions

We developed a set of series key questions to guide evidence synthesis across series reports. These key questions focus on the "state of the science" for three core aspects of QI: effectiveness, implementation decision factors, and evidence. The key question areas are as follows:

1. What is the state of the evidence for the effectiveness of QI activities? What outcomes have been examined in evaluating effectiveness? What is known about the benefits and harms of particular types of QI strategies or targets?

2. What is the state of the science for factors of likely importance to those individuals and organizations deciding whether and how to implement QI interventions? What is known about the role of context and implementation approaches/challenges in QI activities? What is known about the impact of QI activities on disparities or vulnerable populations?

3. What is the state of QI and implementation science evidence? What gaps exist in the quality of evidence or in methods for evidence synthesis?

We summarized evidence of effectiveness—both benefits and potential harms—for the series topics (excluding disability outcomes, which focused on use of outcomes and did not address effectiveness) and considered the role of outcomes choice in effectiveness evaluations. We also examined evidence of effectiveness for QI strategies by type, using a taxonomy of improvement strategies developed for the first Closing the Quality Gap series.10 We grouped these strategies by the intervention target—patients, clinicians, or systems/organizations—to further analyze evidence of effectiveness.

To examine the state of the science regarding factors likely to inform implementation decisions, we summarized findings from each report that relate to the context of QI implementation or evaluation, implementation approaches and challenges, and the impact of QI efforts on health disparities or on vulnerable populations. Finally, we evaluated the state of the science on the basis of the entire evidence base, summarizing common challenges encountered by the EPCs and identifying gaps in the evidence and in systematic review methods applied to improvement and implementation science.

Results

Key Question 1: Effectiveness of Quality-Improvement Strategies

Table 2 summarizes key findings about the effectiveness of QI efforts for each of the seven series topics that evaluated interventions. Authors of all seven topics found mixed results, with evidence of benefit for some outcomes but not for others. For example, the bundled payment review found evidence that the impact of payment bundling on quality of care depended on the quality measure evaluated (Table 2). The medication adherence review authors found variability in how adherence was defined, and they noted that only a subset of studies reporting improved adherence also showed improvements in other outcomes.

Six reports sought information about potential harms associated with QI interventions (Table 2). Potential harms were evaluated most often for the incentive-based interventions (bundled payment, public reporting), whereas harms were rarely addressed in the literature reviewed for the infrastructure-focused intervention topics (disparities, patient-centered medical home, health care-associated infections, medication adherence). Although the potential for harm from public reporting was widely discussed, the review authors found only limited evidence examining whether harm actually occurred and concluded that evidence of no harm outweighed evidence of harm. The bundled payment review found consistent evidence that single-setting bundled payment programs resulted in care shifting to other settings, but few other potential harms were examined. The review authors noted that most current bundled payment programs are now administered across settings, which is expected to reduce incentives for care shifting.

The disability outcomes review identified 71 different outcomes measures used in evaluating health care for disabled populations. Many of these assessed similar concepts, including health, quality of life, functioning, and patient experience, but used different definitions, tools, and measurement scales. The review authors also noted that researchers' perspective—whether trained and practicing in medicine, rehabilitation, or social services—had a profound impact on the ways in which care and life goals were conceptualized for people with disabilities, influencing their choice of outcomes for evaluation.18

Across the series topics, most QI interventions were multifaceted, using more than one type of improvement strategy (Table 3). There was greater evidence of effectiveness for systems-focused strategies than for either clinician- or patient-focused strategies. However, most evidence of systems-focused strategies related to organizational change, which can encompass many different kinds of activities.10 For most topics examined, clinician-focused strategies were generally less effective than patient-focused strategies, with the exception of interventions aimed at improving clinician adherence to strategies to prevent health care-associated infections. Among the patient-focused strategies, patient education often showed benefit.

In contrast, evidence of effectiveness was mixed for patient and clinician reminder systems and for audit and feedback strategies (Table 3). The latter strategies can be patient-focused when aimed at influencing consumers' decisions about where to seek care, such as through public reporting of quality information. These strategies can be clinician-focused when aimed at motivating clinicians to make changes in their practice on the basis of their performance on quality measures. Alternatively, the strategies can be system-focused if intended to influence organizations' practices or motivate QI efforts. Four reports found evidence related to audit and feedback strategies, showing that they were not effective when targeting patients but were generally effective when targeting clinicians and organizations.
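The target-by-strategy comparison described above can be pictured as a small grouping exercise. The following Python sketch is illustrative only: the tuple layout and the function names `by_target` and `targets_with` are assumptions made for this example, and the entries restate only the qualitative findings worded in the preceding paragraphs, not the contents of Table 3.

```python
from collections import defaultdict

# Each entry: (strategy, intervention target, finding as worded in the text).
# Illustrative restatement of the narrative findings; not a copy of Table 3.
FINDINGS = [
    ("audit and feedback", "patients", "not effective"),
    ("audit and feedback", "clinicians", "generally effective"),
    ("audit and feedback", "systems/organizations", "generally effective"),
    ("patient education", "patients", "often showed benefit"),
]


def by_target(findings):
    """Group (strategy, finding) pairs under each intervention target."""
    grouped = defaultdict(list)
    for strategy, target, finding in findings:
        grouped[target].append((strategy, finding))
    return dict(grouped)


def targets_with(findings, strategy, finding):
    """Return the sorted targets for which a strategy had the given finding."""
    return sorted(t for s, t, f in findings if s == strategy and f == finding)


if __name__ == "__main__":
    for target, items in by_target(FINDINGS).items():
        print(target, items)
    print(targets_with(FINDINGS, "audit and feedback", "generally effective"))
```

Keeping the target as its own field, rather than folding it into the strategy name, is what allows the same strategy to be compared across patients, clinicians, and organizations.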

Differences in outcomes seen across topics may reflect topic-specific differences in the locus of control, contextual factors, variable adaptation of intervention components, interaction between intervention components, and underlying barriers to improved performance. Reviews typically found limited details about the presumed mechanism of an intervention for influencing behavior (sometimes referred to as the logic model), limiting synthesis-based insights about which interventions are effective and why.

Key Question 2: Quality-Improvement Implementation Decision Factors

Many of the reports examined three key drivers of QI implementation decisions: the role of context, implementation approaches and challenges, and the impacts of QI efforts on vulnerable populations or health care disparities (Table 4). In assessing contextual factors to determine reasons for amplification or dampening of the effect of an intervention, both the bundled payment and public reporting reviews found evidence that these incentive-based strategies were more effective when financial pressures were greater, such as in competitive markets (public reporting), and in for-profit or financially stressed hospitals (bundled payment). Other reports of contextual factors varied greatly in the type of factors examined and their use in the primary studies, ranging from economic considerations to patient characteristics (disease severity, age, insurance coverage, health needs) and organizational characteristics (leadership, change, resource availability). All five series reports that examined the role of context in some manner (Table 4) found that information on contextual factors was often lacking, incompletely described, or noted only anecdotally.

Four reports examined the impact of QI efforts or choice of evaluation outcomes on health disparities or vulnerable populations (Table 4). Although the available literature was limited, the disparities report found some promise for reducing disparities in health outcomes among racial minorities using collaborative care and targeted patient education interventions. Racial and ethnic minorities were the most widely studied vulnerable populations across the topics.

Key Question 3: State of Quality-Improvement Evidence

The EPC teams conducting the topic reviews encountered several common challenges that limited their ability to synthesize evidence across studies and to address their research questions. Many of these challenges stemmed from limitations in the primary studies. Members of the EPCs for all eight topics observed great heterogeneity in choice and definition of outcomes used for QI evaluations. They also noted study design weaknesses and incomplete reporting of key details such as intervention design and its theoretical basis, contextual factors and impact on outcomes, intervention components, and comparators.

Across the series, just a handful of conclusions were based on moderate or high strength of evidence (the confidence that a conclusion reflects a true effect). They were as follows: reducing the patient's out-of-pocket costs improved medication adherence (moderate strength of evidence), hospital-level public reporting decreased mortality rates (moderate strength of evidence), and public reporting stimulated improvement in competitive markets and among low performers (high strength of evidence). The strength of evidence for most other research questions addressed across the series topics was low or inconclusive.

These limitations in the primary studies created challenges in adapting systematic review methods to the QI literature. The heterogeneity in outcomes, coupled with the complexity of multifaceted, systems-level interventions typical of the QI literature, limited the ability of the EPC teams to quantitatively synthesize results across studies. They instead summarized evidence qualitatively, grouping evidence by particular disease groups, settings, outcomes, or intervention components. Ambiguity around use of key terms in the primary studies (eg, quality improvement itself, as well as some topic-specific terms such as medical home and palliative care) complicated development of search strategies. Other systematic review challenges included assessment of the body of evidence across heterogeneous studies and the lack of statistical or other approaches to synthesize across a diversity of study designs, intervention components, implementation factors, contextual factors, and outcomes.

Some challenges encountered may positively reflect characteristics of QI evidence. Whereas heterogeneity in QI strategies presented difficulties in synthesis and drawing conclusions, this also reflects the variety of strategies used in practice that are likely to be relevant to decision makers. Similarly, heterogeneity in outcomes offers many different lenses through which to view quality of care. Furthermore, despite challenges, the methodologic quality of the evidence base has improved, as noted by the authors of the report on health care-associated infections.21 All reports found a body of evidence to synthesize. Most reports included various study types to complement evidence from controlled trials, providing additional detail that improved the usefulness of the reports.

Discussion

This Closing the Quality Gap series systematically reviewed and synthesized evidence relating to eight QI topics. Although far from inclusive of all QI efforts, the eight topics included within this series represent a sample of the range of topics, populations, settings, strategies, and improvement targets within the broader universe of QI science; they cover three critical leverage points for improving care: information, incentives, and infrastructure.16

Individually, each of the series reviews offers detailed information that can help inform QI efforts and decisions related to its respective topic. Viewing the evidence together across series reports revealed broader insights. For example, the finding that both the incentive-based improvement topics (bundled payment and public reporting) were sensitive to the market context—competitiveness of the health care market, financial pressure on delivery organizations—suggests that particular attention should be paid to the market and financial context of any incentive-based improvement efforts. Context is likely also important to consider for information and infrastructure-based improvement efforts. The disability outcomes reviewers observed that the professional background of researchers influenced their conception of how to evaluate interventions for disabled populations, highlighting the relevance of the evaluation context, especially choice of outcomes.

Looking across topics, the series also found evidence supporting the effectiveness of broader types of intervention strategies, in particular organizational change. Although specific studies varied with respect to the kinds of organizational change implemented (eg, collaborative care, patient-centered medical home, case management) and ways in which organizational change was combined with other intervention strategies, these results suggest that this is likely an important component of many effective QI interventions.

Additional patterns of effectiveness became apparent when examining the target for improvement strategies. Public reporting, an example of an audit and feedback strategy, was generally effective in changing clinician and organizational behaviors, but not patients' behavior. Qualitative evidence included in the public reporting review supported this finding. Interventions that focused solely on clinicians as a target group tended to demonstrate less benefit, with the exception of the topic of health care-associated infections.

The teams from the EPCs also identified a gap in examination and reporting of potential harm from QI activities. Although examination of side effects of medical therapies is expected in the medical literature, the reviews revealed that few studies of QI efforts have addressed the potential for unintended negative consequences. Among the series topics, public reporting had received the most attention to potential harms, but even for this topic, the reviewers found that the potential for harm was discussed far more often than it was evaluated. This gap in QI evidence is ripe for development, and it may require guidelines for evaluating and reporting harms that may be far-reaching or that may occur well after the initial intervention.

In addition to these insights, synthesis of evidence across series topics also sheds light on the "state of the science" for the QI field itself. The common challenges experienced by the teams from the EPCs highlight areas where additional methods or conceptual development is needed. Inconsistencies in how interventions are described in the literature point to the need for an underlying framework and lexicon to describe QI interventions. Although a framework and terminology must be flexible enough to cover the diverse universe of QI strategies, consistent use of a common set of terms would help facilitate synthesis of results across studies, as was done in the cross-topic review presented in this article. Table 5 presents an example of a typology used to describe improvement interventions, adapted from the medication adherence review.24 As Table 3 demonstrated, combining one element of this typology—the intervention target—with the taxonomy of improvement strategies used in the original Closing the Quality Gap series10 provided insights that apply across topics and that were not readily apparent without this structure. Reaching consensus around a common framework and lexicon for QI science requires further development, but the approach demonstrated in this synthesis presents a useful starting place in that endeavor.

The evidence base is growing regarding the importance of context for quality and patient safety topics,26-28 yet all five series reports that examined the role of context found that implementation context was rarely described in the QI literature. The teams from the EPCs recommended that contextual factors be more frequently and robustly measured and reported. To accomplish this will require development of reliable and valid measures of such factors, but at this early stage of exploration, little is known about which contextual factors are important to measure, and how to do so. Thus, filling this knowledge gap will require iterative measure development, measurement, research, and refinement of the measures. Each of these steps will contribute valuable knowledge to the field. Table 5 includes a starter set of contextual factors, adapted from the patient safety field,26 that can help lend structure and a common language to future work around implementation context. These context factors also map well to the Consolidated Framework for Implementation Research.29,30

This "meta" review evinces the promise of scaling up knowledge across topics through a structured qualitative synthesis, in this case relying on a common conceptualization of different levers (information, infrastructure, and incentives) for influencing behavior change to improve clinical and economic outcomes, a typology of QI strategies and contexts (Table 5), and attention to potential harms and vulnerable populations. To foster useful description and synthesis, we also recommend extending a framework acronym that is commonly applied to systematic reviews of clinical interventions to the needs of QI evaluation. Thus, PICOTS (population, intervention, comparator, outcomes, timing, setting) becomes PLICCOTS, adding "L" for logic model, and "C" for context. The overarching question for QI studies is then: for a defined population, what is the logic argument for a complex intervention working better than its comparator in a given context to produce outcomes (of interest to QI) within a time period and setting?
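As an illustration only, the PLICCOTS elements can be written down as a simple record so that each QI study is described with the same fields. The sketch below is in Python; the class name `PliccotsQuestion`, its field names and methods, and the example values are hypothetical and are not drawn from the series reports.

```python
from dataclasses import dataclass, fields
from typing import List


# Hypothetical record for framing a QI study question with PLICCOTS.
# Field order follows the acronym: P, L, I, C, C, O, T, S.
@dataclass
class PliccotsQuestion:
    population: str
    logic_model: str   # added "L": presumed mechanism of the intervention
    intervention: str
    comparator: str
    context: str       # added "C": implementation context
    outcomes: str
    timing: str
    setting: str

    def acronym(self) -> str:
        """Build the acronym from the first letter of each field, in order."""
        return "".join(f.name[0].upper() for f in fields(self))

    def missing(self) -> List[str]:
        """List elements left blank, i.e. gaps in how the study is described."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Invented example values, for illustration only.
example = PliccotsQuestion(
    population="adults with a chronic condition",
    logic_model="lower cost removes a barrier to filling prescriptions",
    intervention="reduced out-of-pocket costs",
    comparator="usual cost sharing",
    context="",  # left blank to show how an unreported element is flagged
    outcomes="medication adherence",
    timing="12 months",
    setting="outpatient clinics",
)

if __name__ == "__main__":
    print(example.acronym())
    print(example.missing())
```

A reviewer could use a check such as `missing()` to flag studies that do not report a logic model or context, the two elements that the extended acronym adds.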

All eight reviews in this series were limited in their ability to synthesize the available evidence and draw conclusions across studies in part because of the extreme heterogeneity in the outcomes reported and ways in which those outcomes were measured, which highlights the need for more consistent outcomes measurement. Developing a set of consensus-based, clearly defined, and fully specified outcomes measures for use in QI research would help facilitate evidence synthesis by enhancing comparability across studies, although use of standardized measures must be balanced with the need to tailor choice of outcomes to the goals of particular QI efforts or research studies. Some efforts to develop core measure sets and harmonize quality measures are underway31-35 and hold promise for advancing the state of QI science if the resulting consensus-based measures are widely used. For example, the US Department of Health and Human Services (HHS), as a part of the National Strategy for Quality Improvement in Health Care, is developing measure selection processes with public input and the goal of aligning measures across new and existing HHS programs and focusing on patient outcomes and patient experience of care. Because of this effort, multiple measures for blood pressure control in use across HHS were identified, and in the future a consensus-developed set of measures will be used across all HHS programs.34

Evidence synthesis by the series EPCs was further hampered by limitations in applying study designs and systematic review methods that were developed for evaluating clinical interventions to the kinds of multifaceted, context-dependent, systems-level interventions and implementation approaches typical of the QI field. However, the topic teams did explore various approaches to improve the relevance of the reports for decision makers. For example, the public reporting review included qualitative research to complement the quantitative studies, and the patient-centered medical home review included a horizon scan to inform decision makers about ongoing research. Because of the context-dependent nature of QI interventions, other complementary methods may inform questions related to policy and practice, and may provide information for better decision making. These methods could potentially help address the diversity of intervention components, implementation factors, and context. Advances could include qualitative research synthesis techniques, exploration of methods to systematically identify and assess gray literature, and exploration of methods to assess and incorporate a variety of study designs. Further methodologic attention to meta-analytic approaches is also needed to achieve sufficient statistical power with relatively few intervention units (eg, hospitals, clinics, health systems) for organization-level interventions. Although it is beyond the scope here to describe specific methods, the choice of method will depend on the anticipated use of the review, the type of questions asked, underlying assumptions, and breadth and depth of the proposed review. 

Overall, the preponderance of low strength of evidence findings and limited information on additional considerations of interest to local decision makers (eg, context, implementation approaches/challenges, vulnerable population impact) found across the eight series reports speaks to the immaturity of the QI and implementation science fields. In these fields, decision-salient research questions and standards for robust and complementary study design continue to evolve.

Although synthesizing evidence across the series topics provided valuable insight into the state of QI science, the eight topics in the series represent just a sample of the QI field. Findings from this synthesis can help guide future QI efforts and suggest directions for future research but do not represent conclusive evidence of effectiveness or associations between particular strategies and other important factors. In addition, findings reported in this synthesis are presented in broad terms; much detail about the particular populations, settings, outcomes, and strategies included in the primary studies is omitted for the sake of highlighting conclusions that are applicable across major portions of the health care system. The individual topic reports provide much greater granularity in their findings and should be consulted to interpret particular topic-specific findings.

Conclusion

This series synthesis highlights the value in expanding our view from the level of individual improvement efforts to examine effectiveness of QI strategies across initiatives, topics, and targets. Limitations in the literature encountered by the EPCs point to areas in need of more rigorous standards for study design and reporting, methodologic weaknesses in need of further development, and research questions ripe for exploration. The findings also highlight common challenges limiting much of the QI literature, in particular, the lack of consensus around key outcomes important for evaluating QI effectiveness, gaps in analyzing other factors important to decisions about implementing a particular QI strategy, and weaknesses in study design and analytic methods. Using these challenges and methodologic weaknesses to generate practical and scientifically sound solutions can help guide future research efforts and the development of the QI field.

Disclosure Statement

During the writing of this article, CC was employed by the Agency for Healthcare Research and Quality (AHRQ), Rockville, MD. The author(s) have no conflicts of interest to disclose.

This work was supported by the AHRQ (Contract no. 290-2007-10062-I). AHRQ did not play any role in study design, data collection, analysis, or interpretation for the Closing the Quality Gap systematic reviews. The views expressed in this paper are those of the authors and do not necessarily represent the views of the US Department of Health and Human Services and the Agency for Healthcare Research and Quality.

Acknowledgments

The authors thank the author teams and AHRQ's task order officers from each of the Closing the Quality Gap reports in the series for undertaking these challenging topics and supporting efforts to develop common approaches to allow the synthesis reported here.

5. Committee on Quality of Health Care in America, Institute of Medicine. Crossing the quality chasm: a new health system for the 21st century. Washington, DC: The National Academies Press; 2001 Mar 1.