Evidence-based medicine


Evidence-based medicine is "the conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients."[1] Alternative definitions are "the process of systematically finding, appraising, and using contemporaneous research findings as the basis for clinical decisions"[2] or "evidence-based medicine (EBM) requires the integration of the best research evidence with our clinical expertise and our patient's unique values and circumstances."[3] Better known as EBM, evidence-based medicine has roots in clinical epidemiology and the scientific method. It emerged in the early 1990s, in response to discoveries about variations and deficiencies[4] in medical care, to help healthcare providers and policy makers evaluate the efficacy of different treatments.

Evidence-based practice is not restricted to medicine; dentistry, nursing and other allied health sciences are adopting "evidence-based medicine", as are alternative medical approaches such as acupuncture[5][6]. Evidence-based health care, or evidence-based practice, extends the concept of EBM to all health professions, including management[7][8] and policy[9][10][11].

Ask

To "ask" is to formulate a well-structured clinical question, i.e. one that is directly relevant to the identified problem and is constructed in a way that facilitates searching for an answer. The question should have four 'PICO' elements[16][17]: the patient or problem (P); the medical intervention or exposure (e.g., a cause for a disease) (I); the comparison intervention or exposure (C); and the clinical outcomes (O). The better focused the question is, the more relevant and specific the search for evidence will be.[18]
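The four PICO elements can be kept together as a small record. The following is only an illustrative sketch: the class name `PicoQuestion`, its methods, and the example question are hypothetical and are not taken from the cited sources. Joining the elements with AND is merely one simple starting point for a literature search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """The four PICO elements of a clinical question (hypothetical helper)."""
    patient: str        # P: the patient or problem
    intervention: str   # I: the intervention or exposure
    comparison: str     # C: the comparison intervention or exposure
    outcome: str        # O: the clinical outcome

    def as_sentence(self) -> str:
        """Render the question in plain language."""
        return (f"In {self.patient}, does {self.intervention}, "
                f"compared with {self.comparison}, affect {self.outcome}?")

    def as_search_terms(self) -> str:
        """Join the four elements with AND as a starting point for a search."""
        parts = (self.patient, self.intervention, self.comparison, self.outcome)
        return " AND ".join(f"({p})" for p in parts)


# Illustrative question only; the clinical content is an assumption.
question = PicoQuestion(
    patient="adults with hypertension",
    intervention="thiazide diuretics",
    comparison="placebo",
    outcome="stroke",
)
print(question.as_sentence())
print(question.as_search_terms())
```

Writing the elements down separately makes it obvious when one of them, often the comparison, has been left vague.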

The ability to "acquire" evidence, also called information retrieval, in a timely manner may improve healthcare.[25][26] Unfortunately, doctors may be led astray when acquiring information because of difficulties in selecting the best articles[27], or because individual trials may be flawed or their outcomes may not be fully representative.[28]

Research findings conflict on how well end-users are able to use MEDLINE. One study found that users were almost as likely to misinterpret the articles they found as to interpret them correctly.[28] A second, in vitro study found that searching for evidence was helpful.[29]

One proposed structure for a comprehensive evidence search is the 5S search strategy[15] (or the 6S strategy[30]), which starts with the search of "summaries" (textbooks).[31]

Appraisal of evidence

The U.S. Preventive Services Task Force (USPSTF) [32] grades its recommendations for treatments according to the strength of the evidence and the expected overall benefit (benefits minus harms). Its grades are:

C.— Fair evidence that the treatment can improve health outcomes but the balance of benefits and harms is too close to justify a general recommendation.

D.— Recommendation against routinely providing the treatment to asymptomatic patients. There is fair evidence that [the service] is ineffective or that harms outweigh benefits.

I.— The evidence is insufficient to recommend for or against routinely providing [the service]. Evidence that the treatment is effective is lacking, of poor quality, or conflicting and the balance of benefits and harms cannot be determined.

The USPSTF also grades the quality of the overall evidence as good, fair or poor:

Fair: Evidence is sufficient to determine effects on health outcomes, but its strength is limited by the number, quality, or consistency of the individual studies, generalizability to routine practice, or indirect nature of the evidence on health outcomes.

Poor: Evidence is insufficient to assess the effects on health outcomes because of limited number or power of studies, flaws in their design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes.

To "appraise" the quality of the evidence in a study is very important, as one third of the results of even the most visible medical research are eventually either attenuated or refuted.[19] There are many reasons for this[33]; two of the most common are publication bias[34] and conflict of interest[35] (see article on Medical ethics). These two problems interact, as conflict of interest often leads to publication bias.[36][34] Complicating the appraisal process further, many (if not all) studies contain some design flaws. Even when there are no clear methodological flaws, any outcome that is evaluated by a statistical test has a margin of error: this means that some positive outcomes will be "false positives".

Often, only the abstract of an article will be read,[37] but many abstracts contain errors.[38] These are usually errors of omission rather than contradiction, but abstracts often over-emphasise positive findings and neglect to mention limitations.

When abstracts are read, readers may be biased[39] and draw illogical conclusions[40].

The initial steps in reading an article are determining what the article's conclusion is and whether that conclusion, if valid, is important.[41]

Overall risk in the population in a study can be assessed by examining outcome or prevalence rates in the control groups.[42]

"Levels of evidence" are used for describing the strength of a research study.[44][45] The Equator network is a collection of standards to improve the reporting of health research. An example is the Consort statement for the reporting of randomized controlled trials.

Publication bias

Publication bias is "the influence of study results on the chances of publication [in academic journals] and the tendency of investigators, reviewers, and editors to submit or accept manuscripts for publication based on the direction or strength of the study findings. Publication bias has an impact on the interpretation of clinical trials and meta-analyses. Bias can be minimized by insistence by editors on high-quality research, thorough literature reviews, acknowledgment of conflicts of interest, modification of peer review practices, etc."[46]

The presence of a conflict of interest has many effects in medical publishing.

Statistical analysis
Common problems include small sample sizes in subgroup analyses[47], problems of "multiple comparisons" when several outcomes are being assessed, and biasing of study populations by selection criteria.
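The "multiple comparisons" problem can be made concrete with a short calculation. This is a minimal sketch that assumes the tests are independent and that no treatment has any real effect; the function names are hypothetical. The Bonferroni correction shown is one common, conservative remedy, not the only one.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive among n independent tests,
    each run at significance level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate <= alpha."""
    return alpha / n_tests


for n in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, n)
    print(f"{n:2d} outcomes: P(at least one false positive) = {rate:.2f}")
```

Under these assumptions, a study that tests ten outcomes at the 0.05 level has roughly a 40% chance of reporting at least one spurious "significant" finding.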

Statistical significance:
The statistical significance of the outcome of a study is often summarized by a "P-value". The P-value is the probability of observing a difference between treatment groups at least as large as the one found if there were truly no difference in treatment effect, so that the observed difference arose only from the chance outcome of random sampling; it is not the probability that the observed difference reflects a true treatment effect. Some have argued that focusing on P-values neglects other important sources of knowledge and information that should be used to assess the likely efficacy of a treatment.[51] In particular, some argue that the P-value should be interpreted in light of how plausible the hypothesis is, based on the totality of prior research and physiologic knowledge.[52][51][53] Bayesian inference formalizes this approach to statistical significance.
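The role of prior plausibility can be illustrated with Bayes' theorem applied to a "significant" result. This is a simplified sketch rather than the method of the cited authors: it treats the hypothesis as simply true or false and assumes a known prior probability, statistical power, and significance level. The function name is hypothetical.

```python
def probability_effect_is_real(prior: float, power: float, alpha: float) -> float:
    """Posterior probability that a statistically significant finding reflects
    a real effect, given the prior probability that the hypothesis is true,
    the power of the study, and its significance level alpha."""
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positives = power * prior            # real effects that reach significance
    false_positives = alpha * (1.0 - prior)   # null effects that reach significance
    return true_positives / (true_positives + false_positives)


# The same P < 0.05 result, under a plausible and an implausible hypothesis.
print(round(probability_effect_is_real(prior=0.5, power=0.8, alpha=0.05), 3))  # 0.941
print(round(probability_effect_is_real(prior=0.1, power=0.8, alpha=0.05), 3))  # 0.64
```

With identical power and significance level, the less plausible hypothesis is considerably less likely to be true after a positive result, which is the point made by those who ask that P-values be read against prior knowledge.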

Application

It is important to "apply" the best practices found to the correct situation. One common problem in applying evidence is that both patients and healthcare professionals often have difficulties with health numeracy and probabilistic reasoning.[54] Another problem is successful clinical reasoning at the bedside.[55] Successful reasoning is associated with the use of pattern matching [56][57] and recognizing clinical findings that are "pivot" or specific to certain diseases[57]. A third problem is to identify exactly which patients will benefit from the new practices. Extrapolating study results to the wrong patient populations (over-generalization)[58][59][60] and not applying study results to the correct population (under-utilization)[61][62] can both increase adverse outcomes.

The problem of over-generalizing study results may be more common among specialist physicians.[63] Two studies found specialists were more likely to adopt cyclooxygenase 2 inhibitor drugs before the drug rofecoxib was withdrawn by its manufacturer because of its unanticipated adverse effects.[64][65] One of the studies went on to state:

"using COX-2s as a model for physician adoption of new therapeutic agents, specialists were more likely to use these new medications for patients likely to benefit but were also significantly more likely to use them for patients without a clear indication".[65]

Similarly, orthopedists provide more expensive care for back pain, but without measurably increased benefit compared to other types of practitioners.[66] Part of the reason that subspecialists may be more likely to adopt inadequately studied innovations is that they read from a spectrum of journals that have less consistent quality.[67] Articles from specialty journals have been noted to persist in supporting claims in the medical literature that have been refuted.[68]

The problem of under-utilizing study results may be more common when physicians are practising outside their expertise. For example, specialist physicians are less likely to under-utilize specialty care[69][70], while primary care physicians are less likely to under-utilize preventive care[71][72].

Cost of Preventing an Event (COPE)[76]. For example, to prevent a major vascular event in a high-risk adult, the number needed to treat is 19, the number of years of treatment is 5, and the daily cost of the generic drug is 68 cents. The COPE is 19 * 5 * (365 * 0.68), which equals $23,579 in the United States.
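The COPE arithmetic above can be written as a short function. This is a minimal sketch using the figures from the example; the function and parameter names are hypothetical, and a 365-day year is assumed as in the text.

```python
def cost_of_preventing_event(number_needed_to_treat: float,
                             years_of_treatment: float,
                             daily_cost: float,
                             days_per_year: int = 365) -> float:
    """Cost of Preventing an Event: the cost of treating enough patients,
    for long enough, to prevent one event."""
    return number_needed_to_treat * years_of_treatment * days_per_year * daily_cost


cope = cost_of_preventing_event(number_needed_to_treat=19,
                                years_of_treatment=5,
                                daily_cost=0.68)
print(f"${cope:,.0f}")  # $23,579
```

Because the three inputs are multiplied together, halving the drug price or the number needed to treat halves the COPE.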

Years (or months or days) of life saved. "A gain in life expectancy of a month from a preventive intervention targeted at populations at average risk and a gain of a year from a preventive intervention targeted at populations at elevated risk can both be considered large."[77]

Original research studies: levels of evidence

'Levels of evidence' were first proposed in 1979 by the Canadian Task Force on the Periodic Health Examination[44] and then modified by the American College of Chest Physicians[78][79] to create a framework for judging the strength of research. An example of levels of evidence is, based on prior work[45]:

Discovery and explanation. In this view, the randomized controlled trial is not supreme and observational studies, subgroup analyses, and secondary analyses are important.

In practice, randomized controlled trials are available to support only 21%[82] to 53%[83] of principal therapeutic decisions.[84] Due to this, evidence-based medicine has evolved to accept lesser levels of evidence when randomized controlled trials are not available.[85]

The GRADE Working Group recognizes that assessing medical evidence involves attention to dimensions other than study design, and expanded upon the levels of evidence in 2004.[86][87] These dimensions include: study design; study quality; consistency of study results; and directness (e.g. how closely the research study mirrors clinical practice).

Summarizing the evidence

Assessing a body of research evidence

Several systems have been proposed to assess whether a body of research evidence establishes causality.

"The bacterium must be isolated from the diseased host and grown in pure culture."

"The specific disease must be reproduced when a pure culture of the bacterium is inoculated into a healthy susceptible host."

"The bacterium must be recoverable from the experimentally infected host."

In the 1880s Koch proposed postulates (Koch's postulates) that suggest association.[89] However, even though the postulates focus on establishing causality among infectious diseases, examples have emerged that violate his postulates.[88]

More explicit methods have been proposed by groups such as the U.S. Preventive Services Task Force (USPSTF) (see above), the American College of Chest Physicians[78][79] and the GRADE Working Group.[91] The GRADE Working Group combines the quality of research studies with their clinical relevance.[86] For example, this allows comparing a randomized controlled trial in a population that is not exactly like the population of interest with a cohort study that is from a relevant population. The concept that there are multiple dimensions in assessing evidence has been expanded to a proposal that assessing evidence is not a linear process, but is circular.[92]

Diamond has proposed a method using Bayesian beliefs for assessing medical evidence that categorizes beliefs similarly to how evidence is assessed in the legal system.[93]

The legal system

Systematic review

A systematic review is a summary of healthcare research that involves a thorough literature search and critical appraisal of individual studies to identify the valid and applicable evidence. It often, but not always, uses appropriate techniques (meta-analysis) to combine studies, and may grade the quality of the particular pieces of evidence according to the methodology used and the strengths or weaknesses of the study design. While many systematic reviews are based on an explicit quantitative meta-analysis of available data, there are also qualitative reviews, which nonetheless adhere to the standards for gathering, analyzing and reporting evidence.
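As an illustration of how a meta-analysis can combine studies, the sketch below pools effect estimates with inverse-variance weights under a fixed-effect model. This is one common technique among several (random-effects models are another), and it is not necessarily the method used in any review cited here. The function name and the study figures are hypothetical.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Combine study effect estimates with inverse-variance weights.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled estimate and its standard error.
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three trials.
pooled, pooled_se = pool_fixed_effect([-0.30, -0.10, -0.25], [0.20, 0.15, 0.30])
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled estimate {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The pooled standard error is smaller than that of any single study, which is why combining studies can detect effects that individual trials are too small to show.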

Clinical practice guidelines

Clinical practice guidelines are defined as "Directions or principles presenting current or future rules of policy for assisting health care practitioners in patient care decisions regarding diagnosis, therapy, or related clinical circumstances. The guidelines may be developed by government agencies at any level, institutions, professional societies, governing boards, or by the convening of expert panels. The guidelines form a basis for the evaluation of all aspects of health care and delivery."[94]

Incorporating evidence into clinical care

Medical informatics

Practicing clinicians cite the lack of time for keeping up with emerging medical evidence that may change clinical practice.[95] Medical informatics is an essential adjunct to EBM, and focuses on creating tools to access and apply the best evidence for making decisions about patient care.[3] Before practicing EBM, informaticians (or informationists) must be familiar with medical journals, literature databases, medical textbooks, practice guidelines, and the growing number of other evidence-based resources, like the Cochrane Database of Systematic Reviews and Clinical Evidence.[95] Similarly, for practicing medical informatics properly, it is essential to have an understanding of EBM, including the ability to phrase an answerable question, locate and retrieve the best evidence, and critically appraise and apply it.[96][97]

Health care quality assurance

Criticisms of EBM

Statue of David Hume. "Man is a reasonable being; and as such, receives from science his proper food and nourishment: But so narrow are the bounds of human understanding, that little satisfaction can be hoped for in this particular..." Hume recognised clearly the difficulties in gaining a general understanding merely by accumulating observations.

Excessive reliance on empiricism and deduction

EBM has been criticized as an attempt to define knowledge in medicine in the same way that was done unsuccessfully by the logical positivists in epistemology, "trying to establish a secure foundation for scientific knowledge based only on observed facts"[100] and not recognizing the fallible nature of knowledge in general.[101] The problem of relying on empiric evidence as a foundation for knowledge was recognized over 100 years ago and is known as the "Problem of Induction" or "Hume's Problem".[102] Alternative logic is induction and abduction.[103]

Inability to individualize for each patient

A general problem with EBM is that it seeks to make recommendations for treatment that (on balance) are likely to provide the best treatment for most patients. However, what is the best treatment for most patients is not necessarily the best treatment for a particular individual patient. The causes of disease and patients' responses to treatment all vary considerably, and are affected, for example, by the individual's genetic make-up, their particular history, and factors of individual lifestyle. To take these properly into account requires the clinical experience of the treating physician, and over-reliance upon recommendations based upon statistical outcomes of treatments given in a standardised way to large populations may not always lead to the best care for a particular individual.

Ulterior motives

An early criticism of EBM is that it will be a guise for rationing resources or other goals that are not in the interest of the patient.[104][105] In 1994, the American Medical Association helped introduce the "Patient Protection Act" in Congress to reduce the power of insurers to use guidelines to deny payment for medical services.[106] As a possible example, Milliman Care Guidelines state they produce "evidence-based clinical guidelines since 1990".[107] In 2000, an academic pediatrician sued Milliman for using his name as an author on practice guidelines that he stated were "dangerous".[108][109][110] A similar suit disputing the origin of care decisions at Kaiser has been filed.[111] The outcomes of both suits are not known.

EBM not recognizing the limits of clinical epidemiology

EBM is a set of techniques derived from clinical epidemiology, but a common criticism is that epidemiology can show association but not causation. While clinical epidemiology has its role in inspiring clinical decisions if it is complemented with testable hypotheses on disease,[114] many critics consider that EBM is a form of clinical epidemiology which became so common in health care systems, and imposed such an empiricist bias on medical research, that it has undermined the notion of causal inference in clinical practice.[115] It is argued that it has even become condemnable to use common sense,[116] as was cleverly illustrated in a systematic review of randomized controlled trials studying the effects of parachutes against gravitational challenges (free falls).[117]

Lorenz butterfly showing two attractors in a complex, chaotic system. See complexity science.