Category Archives: critical appraisal

Another Study Warns That Evidence From Observational Studies Provides Unreliable Results For Therapies

We have previously mentioned the enormous contributions made by John Ioannidis MD in the area of understanding the reliability of medical evidence. [Ioannidis, Delfini Blog, Giannakakis] We want to draw your attention to a recent publication dealing with the risks of relying on observational data for cause and effect conclusions. [Hemkens] In this recent study, Hemkens, Ioannidis and other colleagues assessed differences in mortality effect size reported in observational (routinely collected data [RCD]) studies as compared with results reported in RCTs.

Eligible RCD studies used propensity scores in an effort to address confounding bias in the observational studies. The authors compared the results of RCD and RCTs. The analysis included only RCD studies conducted before any RCT was published on the same topic. They assessed the risk of bias for RCD studies and randomized controlled trials (RCTs) using The Cochrane Collaboration risk of bias tools. The direction of treatment effects, confidence intervals and effect sizes (odds ratios) were compared between RCD studies and RCTs. The relative odds ratios were calculated across all pairs of RCD studies and trials.
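As a rough illustration of the last step, a relative odds ratio (ROR) for one RCD/RCT pair can be sketched as below. The function name, the orientation (RCD over RCT) and the numbers are our assumptions for illustration; the summary above does not say how the authors oriented or pooled the ratios.

```python
def relative_odds_ratio(or_rcd, or_rct):
    """Ratio of the observational (RCD) odds ratio to the trial (RCT) odds ratio."""
    return or_rcd / or_rct

# Hypothetical pair: the RCD study suggests a mortality benefit, the later RCT does not.
ror = relative_odds_ratio(or_rcd=0.70, or_rct=1.00)
print(f"ROR = {ror:.2f}")
```

An ROR of 1 would mean the two designs agree on the size of the mortality effect. The further the ratio is from 1, the more the observational estimate differs from the randomized one.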

These authors remind us yet again that if no randomized trials exist, clinicians and other decision-makers should not trust results from observational data from sources such as local or national databases, registries, cohort or case-control studies.

“Reading” a Clinical Trial Won’t Get You There—or Let’s Review (And Apply) Some Basics About Assessing The Validity of Medical Research Studies Claiming Superiority for Efficacy of Therapies

An obvious question raised by the title is, “Get you where?” Well, the answer is, “To where you know it is reasonable to think you can trust the results of the study you have just finished reading.” In this blog, our focus is on how to critically appraise medical research studies which claim superiority for efficacy of a therapy.

Because of Lack of Understanding Medical Science Basics, People May Be Injured or Die

Understanding basic requirements for valid medical science is very important. Numbers below are estimates, but are likely to be close or understated—

Over 63,000 people with heart disease died after taking encainide or flecainide because many doctors thought taking these drugs “made biological sense,” but did not understand the simple need for reliable clinical trial information to confirm what seemed to “make sense” [Echt 91].

An estimated 60,000 people in the United States died and another 140,000 experienced a heart attack resulting from the use of a nonsteroidal anti-inflammatory drug despite important benefit and safety information reported in the abstract of the pivotal trial used for FDA approval [Graham].

In another example, roughly 42,000 women with advanced breast cancer suffered excruciating side effects without any proof of benefit, many of them dying as a result, and at a cost of $3.4 billion [Mello].

At least 64 deaths out of 751 cases in nearly half the United States were linked to fungal meningitis thought to be caused by a contaminated treatment that is used for back and radicular pain—but there is no reliable scientific evidence of benefit from that treatment [CDC].

In the above instances, these were preventable deaths and harms—from common treatments—which patients might have avoided if their physicians had better understood the importance and methods of evaluating medical science.

Failures to Understand Medical Science Basics

Many health care professionals don’t know how to quickly assess a trial for reliability and clinical usefulness, and yet mastering the basics is not difficult. Over the years, we have given a pre-test of 3 simple questions to more than a thousand physicians, pharmacists and others who have attended our training programs. Approximately 70% fail, with “failure” defined as missing 2 or 3 of the questions.

One pre-test question is designed to see if people recognize the lack of a comparison group in a report of the “effectiveness” of a new treatment. Without a comparison group of people with similar prognostic characteristics who are treated exactly the same except for the intervention under study, you cannot discern cause and effect of an intervention because a difference between groups may explain or affect the results.

A second pre-test question deals with presenting results as relative risk reduction (RRR) without absolute risk reduction (ARR) or event rates in the study groups. A “relative” measure raises the question, “Relative to what?” Is the reported RRR in our test question 60 percent of 100 percent? Or 60 percent of 1 percent?
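A minimal sketch of the arithmetic behind that question follows. The function name is ours, and the two control-group risks are simply the 100 percent and 1 percent from the question above, chosen to show how far apart the absolute effects can be for the same RRR.

```python
def absolute_effect(control_risk, rrr):
    """Return (treated_risk, arr, nnt) for a control-group risk and a relative risk reduction."""
    treated_risk = control_risk * (1 - rrr)
    arr = control_risk - treated_risk   # absolute risk reduction
    nnt = 1 / arr                       # number needed to treat
    return treated_risk, arr, nnt

# The same 60 percent RRR applied to two very different control-group risks.
for control_risk in (1.00, 0.01):
    treated, arr, nnt = absolute_effect(control_risk, rrr=0.60)
    print(f"control {control_risk:.0%} -> treated {treated:.1%}, ARR {arr:.1%}, NNT {nnt:.1f}")
```

With a control-group risk of 100 percent the ARR is 60 percentage points (an NNT of about 2). With a control-group risk of 1 percent the ARR is 0.6 percentage points (an NNT of about 167). The RRR is 60 percent in both cases.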

The last of our pre-test questions assesses attendees’ basic understanding of only one of the two requirements to qualify as an Intention-to-Treat (ITT) analysis. The two requirements are that people should be analyzed as randomized and that all people should be included in the analysis whether they have discontinued, are missing or have crossed over to other treatment arms. The failure rate at knowing this last requirement is very high. (We will add that this last requirement means that a value has to be assigned if one is missing, and so one of the most important aspects of critically appraising an ITT analysis is the evaluation of the methods for “imputing” missing data.)
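To show why the imputation method matters, here is a small sketch for a dichotomous outcome. The trial counts are invented and the function and scenario names are hypothetical. It is one simple way to probe sensitivity to missing data, not a recommended imputation method.

```python
def itt_risk(events, missing, randomized, missing_as_event):
    """Event risk over ALL randomized subjects, assigning an outcome to each missing one."""
    imputed_events = events + (missing if missing_as_event else 0)
    return imputed_events / randomized

# Hypothetical trial with 100 subjects randomized per arm (numbers invented).
treat = dict(events=10, missing=15, randomized=100)
control = dict(events=20, missing=5, randomized=100)

# Each scenario: (impute events for missing treated?, impute events for missing controls?)
scenarios = {
    "all missing event-free": (False, False),
    "all missing had the event": (True, True),
    "worst case for treatment": (True, False),
}
for label, (t_flag, c_flag) in scenarios.items():
    arr = (itt_risk(**control, missing_as_event=c_flag)
           - itt_risk(**treat, missing_as_event=t_flag))
    print(f"{label}: ARR = {arr:+.2f}")
```

In this made-up example the apparent benefit (ARR of +0.10) disappears or reverses depending on the outcomes assigned to the missing subjects, which is why the imputation choice deserves scrutiny.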

By the end of our training programs, success rates have always markedly improved. Others have reported similar findings.

There is a Lot of Science + Much of It May Not Be Reliable

Each week more than 13,000 references are added to the world’s largest medical library, the National Library of Medicine (NLM). Unfortunately, many of these studies are seriously flawed. One large review of 60,352 studies reported that only 7 percent passed criteria of high quality methods and clinical relevancy [McKibbon]. We and others have estimated that up to (and maybe more than) 90% of the published medical information that health care professionals rely on is flawed [Freedman, Glasziou].

Bias Distorts Results

We cannot know if an intervention is likely to be effective and safe without critically appraising the evidence for validity and clinical usefulness. We need to evaluate the reliability of medical science prior to seriously considering the reported therapeutic results because biases such as lack of or inadequate randomization, lack of successful blinding or other threats to validity, which we will describe below, can distort reported results by up to 50 percent or more [see Risk of Bias References].

Patients Deserve Better

Patients cannot make informed choices regarding various interventions without being provided with quantified projections of benefits and harms from valid science.

Some Simple Steps To Critical Appraisal

Below is a short summary of our simplified approach to critically appraising a randomized superiority clinical trial. Our focus is on “internal validity” which means “closeness to truth” in the context of the study. “External validity” is about the likelihood of reaching truth outside of the study context and requires judgment about issues such as fit with individuals or populations in circumstances other than those in the trial.

Is reading this study worth my time? If the results are true, would they change my practice? Do they apply to my situation? What is the likely impact on my patients?

Can anything explain the results other than cause and effect? Evaluate the potential for results being distorted by bias (anything other than chance leading away from the truth) or random chance effects.

Is there any difference between groups other than what is being studied? This is automatically a bias.

If the study appears to be valid, but attrition is high, sometimes it is worth asking, what conditions would need to be present for attrition to distort the results? Attrition does not always distort results, but may obscure a true difference due to the reduction in sample size.

Evaluating Bias

There are four stages of a clinical trial, and you should ask several key questions when evaluating bias in each of them.

Subject Selection & Treatment Assignment—Evaluation of Selection Bias

Important considerations include how were subjects selected for study, were there enough subjects, how were they assigned to their study groups, and were the groups balanced in terms of prognostic variables?

Your critical appraisal to-do list includes—

a) Checking to see if the randomization sequence was generated in an acceptable manner. (Minimization may be an acceptable alternative.)

b) Determining whether the investigators adequately concealed the allocation of subjects to each study group. In other words, is the method for assigning treatment hidden so that an investigator cannot manipulate the assignment of a subject to a selected study group?

c) Examining the table of baseline characteristics to determine whether randomization was likely to have been successful, i.e., that the groups are balanced in terms of important prognostic variables (e.g., clinical and demographic variables).

The Intervention & Context—Evaluation of Performance Bias

What is being studied, and what is it being compared to? Was the intervention likely to have been executed successfully? Was blinding likely to have been successful? Was duration reasonable for treatment as well as for follow-up? Was adherence reasonable? What else happened to study subjects in the course of the study such as use of co-interventions? Were there any differences in how subjects in the groups were treated?

Your to-do list includes evaluating:

a) Adequacy of blinding of subjects and all working with subjects and their data—including likely success of blinding;

b) Subjects’ adherence to treatment;

c) Inter-group differences in treatment or care except for the intervention(s) being studied.

Data Collection & Loss of Data—Evaluation of Attrition Bias

What information was collected, and how was it collected? What data are missing and is it likely that missing data could meaningfully distort the study results?

b) Classification and quantification of missing data in each group (e.g., discontinuations due to ADEs, unrelated deaths, protocol violations, loss to follow-up, etc.)

c) Whether missing data are likely to distort the reported results. This is the area in which the evidence on the distorting risk of bias provides the least help. And so, again, it is often worthwhile asking, “What conditions would need to be present for attrition to distort the results?”

Results & Assessing The Differences In The Outcomes Of The Study Groups—Evaluating Assessment Bias

Were outcome measures reasonable, pre-specified and analyzed appropriately? Was reporting selective? How was safety assessed? Remember that models are not truth.

Your to-do list includes evaluating—

a) Whether assessors were blinded.

b) How the effect size was calculated (e.g., absolute risk reduction, relative risk, etc.). You especially want to know benefit or risk with and without treatment.

c) Were confidence intervals included? (You can calculate these yourself online, if you wish. See our web links at our website for suggestions.)

d) For dichotomous variables, was a proper intention-to-treat (ITT) analysis conducted with a reasonable choice for imputing values for missing data?

e) For time-to-event trials, were censoring rules unbiased? Was the number of censored subjects reported?
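For item c) in the list above, a confidence interval for an absolute risk reduction can also be computed by hand. The sketch below uses the Wald (normal-approximation) interval, which is only one of several available methods and is our choice for illustration. The event counts are invented.

```python
import math

def arr_with_ci(events_control, n_control, events_treated, n_treated, z=1.96):
    """Absolute risk reduction with a Wald (normal-approximation) confidence interval."""
    p_c = events_control / n_control
    p_t = events_treated / n_treated
    arr = p_c - p_t
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treated)
    return arr, arr - z * se, arr + z * se

# Hypothetical trial: 40/200 events with control, 24/200 events with treatment.
arr, low, high = arr_with_ci(events_control=40, n_control=200,
                             events_treated=24, n_treated=200)
print(f"ARR = {arr:.3f}, 95% CI {low:.3f} to {high:.3f}")
```

In this invented example the ARR is 8 percentage points, but the interval runs from just under 1 to about 15 percentage points, so the data are compatible with both a trivial and a substantial benefit.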

After you have evaluated a study for bias and chance and have determined that the study is valid, the study results should be evaluated for clinical meaningfulness (e.g., the amount of clinical benefit and the potential for harm). Clinical outcomes include morbidity; mortality; symptom relief; physical, mental and emotional functioning; and quality of life, or any surrogate outcomes that have been demonstrated in valid studies to affect a clinical outcome.

Final Comment

It is not difficult to learn how to critically appraise a clinical trial. Health care providers owe it to their patients to gain these skills. Health care professionals cannot rely on abstracts and authors’ conclusions; they must assess studies first for validity and second for clinical usefulness. Authors are often biased, even with the best of intentions. Remember that authors’ conclusions are opinions, not evidence. Authors frequently use misleading terms or draw misleading conclusions. Physicians and others who lack critical appraisal skills are often misled by authors’ conclusions and summary statements. Critical appraisal knowledge is required to evaluate the validity of a study, which must be done prior to seriously considering reported results.

For those who wish to go more deeply, we have books available and do training seminars. See our website at www.delfini.org.

Progression Free Survival (PFS) in Oncology Trials

Progression Free Survival (PFS) continues to be a frequently used endpoint in oncology trials. It is the time from randomization to the first of either objectively measured tumor progression or death from any cause. It is a surrogate outcome because it does not directly assess mortality, morbidity, quality of life, symptom relief or functioning. Even if a valid trial reports a statistically significant improvement in PFS and the reported effect size is large, PFS only provides information about biologic activity of the cancer and tumor burden or tumor response. Even though correlational analysis has shown associations between PFS and overall survival (OS) in some cancers, we believe that extreme caution should be exercised when drawing conclusions about efficacy of a new drug. In other words, PFS evidence alone is insufficient to establish a clinically meaningful benefit for patients or even a reasonable likelihood of net benefit. Many tumors do present a significant clinical burden for patients; however, clinicians frequently mistakenly believe that simply having a reduction in tumor burden equates with clinical benefit and that delaying the growth of a cancer is a clear benefit to patients.

PFS has a number of limitations which increase the risk of biased results and make it difficult for readers to interpret. Unlike OS, PFS does not “identify” the time of progression, since assessment occurs at scheduled visits, and it is likely to overestimate time to progression. Also, it is common to stop or add anti-cancer therapies in PFS studies (also a common problem in trials of OS) prior to documentation of tumor progression, which may confound outcomes. Further, measurement errors may occur because of complex issues in tumor assessment. Adequate blinding is required to reduce the risk of performance and assessment bias. Other methodological issues include complex calculations to adjust for missed assessments and the need for complete data on adverse events.
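The effect of scheduled assessments can be illustrated with a few lines of arithmetic. This is a simplified sketch: the function name, the week-13 progression time and the visit intervals are all invented, and real trials also have missed and off-schedule visits.

```python
import math

def detected_progression_time(true_time_weeks, visit_interval_weeks):
    """Progression is only recorded at the first scheduled visit on or after it occurs."""
    return math.ceil(true_time_weeks / visit_interval_weeks) * visit_interval_weeks

true_time = 13  # hypothetical true progression at week 13
for interval in (6, 12):
    detected = detected_progression_time(true_time, interval)
    print(f"visits every {interval} weeks: progression recorded at week {detected}")
```

The same tumor progression is recorded at week 18 with 6-week visits and at week 24 with 12-week visits. Recorded PFS is therefore never shorter than the true time to progression, and it depends on the assessment schedule as well as on the treatment.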

Attrition and assessment bias are made even more difficult to assess in oncology trials using time-to-event methodologies. The intention-to-treat principle requires that all randomly assigned patients be observed until they experience the end point or the study ends. Optimal follow-up in PFS trials is to follow each subject to both progression and death.

Delfini Comment

FDA approval based on PFS may result in acceptance of new therapies with greater harms than benefits. The limitations listed above, along with a concern that investigators may be less willing to conduct trials with OS as an endpoint once a drug has been approved, suggest that we should use great caution when considering evidence from studies using PFS as the primary endpoint. We believe that PFS should be thought of as any other surrogate marker—i.e., it represents extremely weak evidence (even in studies judged to be at low risk of bias) unless it is supported by acceptable evidence of improvements in quality of life and overall survival.

When assessing the quality of a trial using PFS, we suggest the following:

Remember that although in some cases PFS appears to be predictive of OS, in many cases it is not.

In many cases, improved PFS is accompanied by unacceptable toxicity and unacceptable changes in quality of life.

Improved PFS results of several months may be due to methodological flaws in the study.

As with any clinical trial, assess the trial reporting PFS for bias such as selection, performance, attrition and assessment bias.

Compare characteristics of losses (e.g., due to withdrawing consent, adverse events, loss to follow-up, protocol violations) between groups and, if possible, between completers and those initially randomized.

Pay special attention to censoring due to loss-to-follow-up. Administrative censoring (censoring of subjects who enter a study late and do not experience an event) may not result in significant bias, but non-administrative censoring (censoring because of loss-to-follow-up or discontinuing) is more likely to pose a threat to validity.

Network Meta-analyses—More Complex Than Traditional Meta-analyses

Meta-analyses are important tools for synthesizing evidence from relevant studies. One limitation of traditional meta-analyses is that they can compare only 2 treatments at a time in what is often termed pairwise or direct comparisons. An extension of traditional meta-analysis is the “network meta-analysis” which has been increasingly used—especially with the rise of the comparative effectiveness movement—as a method of assessing the comparative effects of more than two alternative interventions for the same condition that have not been studied in head-to-head trials.

A network meta-analysis synthesizes direct and indirect evidence over the entire network of interventions that have not been directly compared in clinical trials, but have one treatment in common.

Example
A clinical trial reports that for a given condition intervention A results in better outcomes than intervention B. Another trial reports that intervention B is better than intervention C. A network meta-analysis is likely to report that intervention A results in better outcomes than intervention C based on indirect evidence.
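One simple building block for the indirect step in this example is the adjusted indirect comparison through a common comparator (often attributed to Bucher). The sketch below is not a full network meta-analysis. It assumes two independent trials reporting odds ratios, where an odds ratio below 1 favors the first-named intervention, and all numbers are invented.

```python
import math

def indirect_comparison(log_or_ab, se_ab, log_or_bc, se_bc, z=1.96):
    """Indirect A-vs-C odds ratio through the common comparator B."""
    log_or_ac = log_or_ab + log_or_bc        # effects add on the log scale
    se_ac = math.sqrt(se_ab**2 + se_bc**2)   # variances add, so uncertainty grows
    ci = (math.exp(log_or_ac - z * se_ac), math.exp(log_or_ac + z * se_ac))
    return math.exp(log_or_ac), se_ac, ci

# Hypothetical inputs: A vs B odds ratio 0.80, B vs C odds ratio 0.90.
or_ac, se_ac, ci = indirect_comparison(math.log(0.80), 0.10, math.log(0.90), 0.12)
print(f"indirect OR(A vs C) = {or_ac:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

The indirect standard error (about 0.16) is larger than that of either contributing trial, so an indirect estimate is less precise than a direct comparison of similar size would be. It also rests on the assumption that the two trials are similar enough to be combined.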

Network meta-analyses, also known as “multiple-treatments meta-analyses” or “mixed-treatment comparisons meta-analyses” include both direct and indirect evidence. When both direct and indirect comparisons are used to estimate treatment effects, the comparison is referred to as a “mixed comparison.” The indirect evidence in network meta-analyses is derived from statistical inference which requires many assumptions and modeling. Therefore, critical appraisal of network meta-analyses is more complex than appraisal of traditional meta-analyses.

In all meta-analyses, clinical and methodological differences in studies are likely to be present. Investigators should only include valid trials. Plus they should provide sufficient detail so that readers can assess the quality of meta-analyses. These details include important variables such as PICOTS (population, intervention, comparator, outcomes, timing and study setting) and heterogeneity in any important study performance items or other contextual issues such as important biases, unique care experiences, adherence rates, etc. In addition, the effect sizes in direct comparisons should be compared to the effect sizes in indirect comparisons since indirect comparisons require statistical adjustments. Inconsistency between the direct and indirect comparisons may be due to chance, bias or heterogeneity. Remember, in direct comparisons the data come from the same trial. Indirect comparisons utilize data from separate randomized controlled trials which may vary in both clinical and methodological details.

Estimates of effect in a direct comparison trial may be lower than estimates of effect derived from indirect comparisons. Therefore, evidence from direct comparisons should be weighted more heavily than evidence from indirect comparisons in network meta-analyses. The combination of direct and indirect evidence in mixed treatment comparisons may be more likely to result in distorted estimates of effect size if there is inconsistency between effect sizes of direct and indirect comparisons.

Usually network meta-analyses rank different treatments according to the probability of being the best treatment. Readers should be aware that these rankings may be misleading because differences may be quite small or inaccurate if the quality of the meta-analysis is not high.

Delfini Comment
Network meta-analyses do provide more information about the relative effectiveness of interventions. At this time, we remain a bit cautious about the quality of many network meta-analyses because of the need for statistical adjustments. It should be emphasized that, as of this writing, methodological research has not established a preferred method for conducting network meta-analyses, assessing them for validity or assigning them an evidence grade.

Sounding the Alarm (Again) in Oncology

Five years ago Fojo and Grady sounded the alarm about value in many of the new oncology drugs [1]. They raised the following issues and challenged oncologists and others to get involved in addressing these issues:

There is a great deal of uncertainty and confusion about what constitutes a benefit in cancer therapy; and,

How much should cost factor into these deliberations?

The authors review a number of oncology drug studies reporting increased overall survival (OS) ranging from a median of a few days to a few months with total new drug costs ranging from $15,000 to $90,000 plus. In some cases, there is no increase in OS, but only progression free survival (PFS) which is a weaker outcome measure due to its being prone to tumor assessment biases and is frequently assessed in studies of short duration. Adverse events associated with the new drugs are many and include higher rates of febrile neutropenia, infusion-related reactions, diarrhea, skin toxicity, infections, hypertension and other adverse events.

Fojo and Grady point out that—

“Many Americans would likely not regard a 1.2-month survival advantage as ‘significant’ progress, the much revered P value notwithstanding. But would an individual patient agree? Although we lack the answer to this question, we would suggest that the death of a mother of four at age 37 years would be no less painful were it to occur at age 37 years and 1 month, nor would the passing of a 67-year-old who planned to travel after retiring be any less difficult for the spouse were it to have occurred 1 month later.”

In a recent article [2] (thanks to Dr. Richard Lehman for drawing our attention to this article in his wonderful BMJ blog) Fojo and colleagues again point out that—

Cancer is the number one cause of mortality worldwide, and cancer cases are projected to rise by 75% over the next 2 decades.

Of the 71 therapies for solid tumors receiving FDA approval from 2002 to 2014, only 30 of the 71 approvals (42%) met the American Society of Clinical Oncology Cancer Research Committee’s “low hurdle” criteria for clinically meaningful improvement. Further, the authors tallied results from all the studies and reported very modest collective median gains of 2.5 months for PFS and 2.1 months for OS. Numerous surveys have indicated that patients expect much more.

Expensive therapies are stifling progress by (1) encouraging enormous expenditures of time, money, and resources on marginal therapeutic indications; and, (2) promoting a me-too mentality that is stifling innovation and creativity.

The last bullet needs a little explaining. The authors provide a number of examples of “safe bets” and argue that revenue from such safe and profitable therapies rather than true need has been a driving force for new oncology drugs. The problem is compounded by regulations—e.g., rules which require Medicare to reimburse patients for any drug used in an “anti-cancer chemotherapeutic regimen”—regardless of its incremental benefit over other drugs—as long as the use is “for a medically accepted indication” (commonly interpreted as “approved by the FDA”). This provides guaranteed revenues for me-too drugs irrespective of their marginal benefits. The authors also point out that when prices for drugs of proven efficacy fall below a certain threshold, suppliers often stop producing the drug, causing severe shortages.

What can be done? The authors acknowledge several times in their commentary that the spiraling cost of cancer therapies has no single villain; academia, professional societies, scientific journals, practicing oncologists, regulators, patient advocacy groups and the biopharmaceutical industry—all bear some responsibility. [We would add to this list physicians, P&T committees and any others who are engaged in treatment decisions for patients. Patients are not on this list (yet) because they are unlikely to really know the evidence.] This is like many other situations when many are responsible—often the end result is that “no one” takes responsibility. Fojo et al. close by making several suggestions, among which are—

Academicians must avoid participating in the development of marginal therapies;

Professional societies and scientific journals must raise their standards and not spotlight marginal outcomes;

All of us must also insist on transparency and the sharing of all published data in a timely and enforceable manner;

Actual gains of benefit must be emphasized—not hazard ratios or other measures that force readers to work hard to determine actual outcomes and benefits and risks;

We need cooperative groups with adequate resources to provide leadership to ensure that trials are designed to deliver meaningful outcomes;

We must find a way to avoid paying premium prices for marginal benefits; and,

We must find a way [federal support?] to secure altruistic investment capital.

Delfini Comment
While the authors do not make a suggestion for specific responsibilities or actions on the part of the FDA, they do make a recommendation that an independent entity might create uniform measures of benefits for each FDA-approved drug—e.g., quality-adjusted life-years. We think the FDA could go a long way in improving this situation.

And so, as pointed out by Fojo et al., only small gains have been made in OS over the past 12 years, and costs of oncology drugs have skyrocketed. However, to make matters even worse than portrayed by Fojo et al., many of the oncology drug studies we see have major threats to validity (e.g., selection bias, lack of blinding and other performance biases, attrition and assessment bias, etc.) raising the question, “Does the approximate 2 month gain in median OS represent an overestimate?” Since bias tends to favor the new intervention in clinical trials, the PFS and OS reported in many of the recent oncology trials may be exaggerated or even absent or harms may outweigh benefits. On the other hand, if a study is valid, since a median is a midpoint in a range of results and a patient may achieve better results than indicated by the median, some patients may choose to accept a new therapy. The important thing is that patients are given information on benefits and harms in a way that allows them to have a reasonable understanding of all the issues and make the choices that are right for them.

Comparative Effectiveness Research (CER), “Big Data” & Causality

For a number of years now, we’ve been concerned that the CER movement and the growing love affair with “big data” will lead to many erroneous conclusions about cause and effect. We were pleased to see the following blog from Austin Frakt, an editor-in-chief of The Incidental Economist: Contemplating health care with a focus on research, an eye on reform:

Cochrane Risk Of Bias Tool For Non-Randomized Studies

Like many others, our position is that, with very few exceptions, cause and effect conclusions regarding therapeutic interventions can only be drawn when valid RCT data exists. However, there are uses for observational studies which may be used to answer additional questions, and non-randomized studies (NRS) are often included in systematic reviews.

In September 2014, Cochrane published a tool for assessing bias in NRS for systematic review authors [1]. It may be of interest to our colleagues. The tool is called ACROBAT-NRSI (“A Cochrane Risk Of Bias Assessment Tool for Non-Randomized Studies”) and is designed to assist with evaluating the risk of bias (RoB) in the results of NRS that compare the health effects of two or more interventions.

The tool focuses on internal validity. It covers seven domains through which bias might be introduced into an NRS. The domains provide a framework for considering any type of NRS and are summarized in the table below. Many of the biases are described in the full document, along with explanations of how they may cause bias. You can see our rough summary here: http://www.delfini.org/delfiniClick_Observations.htm#robtable

Response options for each bias include: low risk of bias; moderate risk of bias; serious risk of bias; critical risk of bias; and no information on which to base a judgment.

Details are available in the full document which can be downloaded at—https://sites.google.com/site/riskofbiastool/

Delfini Comment
We again point out that non-randomized studies often report seriously misleading results even when treated and control groups appear similar in prognostic variables, and we agree with Deeks that, for therapeutic interventions, “non-randomised studies should only be undertaken when RCTs are infeasible or unethical” [2], and even then, buyer beware. Studies do not get “validity grace” because of scientific or practical challenges.

Furthermore, we are uncertain that this tool is of great value when assessing NRS. Deeks [2] identified 194 tools that could be or had been used to assess NRS. Do we really need another one? While it’s a good document for background reading, we are more comfortable approaching the problem of observational data by pointing out that, when it comes to efficacy, high quality RCTs have a positive predictive value of about 85% whereas well-done observational trials have a positive predictive value of about 20% [3].

References

Sterne JAC, Higgins JPT, Reeves BC on behalf of the development group for ACROBAT-NRSI. A Cochrane Risk Of Bias Assessment Tool: for Non-Randomized Studies of Interventions (ACROBAT-NRSI), Version 1.0.0, 24 September 2014. Available from http://www.riskofbias.info [accessed 10/11/14].

Status

Many critical appraisers assess bias using tools such as the Cochrane risk of bias tool (Higgins 11) or tools freely available from us (http://www.delfini.org/delfiniTools.htm). Internal validity is assessed by evaluating important items such as generation of the randomization sequence, concealment of allocation, blinding, attrition and assessment of results.

Jefferson et al. recently compared the risk of bias in 14 oseltamivir trials using information from previous assessments based on the study publications and the newly acquired, more extensive clinical study reports (CSRs) obtained from the European Medicines Agency (EMA) and the manufacturer, Roche.

Key findings include the following:

Evaluations using more complete information from the CSRs resulted in no difference in the number of previous assessments of “high” risk of bias.

However, over half (55%, 34/62) of the previous “low” risk of bias ratings were reclassified as “high.”

Most of the previous “unclear” risk of bias ratings (67%, 28/42) were changed to “high” risk of bias ratings when CSRs were available.

The authors discuss the idea that the risk of bias tools are important because they facilitate the process of critical appraisal of medical evidence. They also call for greater availability of the CSRs as the basic unit available for critical appraisal.

Delfini Comment

We believe that both sponsors and researchers need to provide more study detail so that critical appraisers can provide more precise ratings of risk of bias. Study publications frequently lack information needed by critical appraisers.

We agree that CSRs should be made available so that critical appraisers can use them to improve their assessments of clinical trials. However, our experience has been the opposite of the authors’ experience. When companies have invited us to work with them to assess the reliability of their studies and made CSRs available to us, we have frequently found important information not otherwise available in the study publication. When this happens, studies that would otherwise have been rated at higher risk of bias have often been determined to be at low risk of bias and of high quality.

Status

This is a complex area, and we recommend downloading our freely available 1-page summary to help assess issues with equivalence and non-inferiority trials. Here is a short sampling of some of the problems in these designs: lack of sufficient evidence confirming efficacy of the referent treatment (“referent” refers to the comparator treatment); study not sufficiently similar to the referent study; inappropriate Deltas (meaning the margin established for equivalence or non-inferiority); or significant biases or analysis methods that would tend to diminish an effect size and “favor” no difference between groups (e.g., conservative application of ITT analysis or insufficient power), thus pushing toward non-inferiority or equivalence.

However, we do want to say a few more things about non-inferiority trials based on some recent questions and readings.

Is it acceptable to claim superiority in a non-inferiority trial? Yes. The Food and Drug Administration (FDA) and the European Medicines Agency (EMA), among others, including ourselves, all agree that declaring superiority in a non-inferiority trial is acceptable. What’s more, there is agreement that multiplicity adjusting does not need to be done when first testing for non-inferiority and then superiority.

See Delfini Recommended Reading: Included here is a nice article by Steve Snapinn. Snapinn even recommends that “…most, if not all, active-controlled clinical trial protocols should define a noninferiority margin and include a noninferiority hypothesis.” We agree. Clinical trials are expensive to do, take time, have opportunity costs, and, most importantly, affect the lives of the human subjects who engage in them. This is a smart procedure that costs nothing, especially as multiplicity adjusting is not needed.

What does matter is having an appropriate population for doing a superiority analysis. For superiority, in studies with dichotomous variables, the population should be Intention-to-Treat (ITT) with an appropriate imputation method that does not favor the intervention under study. In studies with time-to-event outcomes, the population should be based on the ITT principle (meaning all randomized patients should be used in the analysis by the group to which they were randomized) with unbiased censoring rules.

Confidence intervals (CIs) should be evaluated to determine superiority. Some evaluators seem to suggest that superiority can be declared only if the CIs are wholly above the Delta. Schumi et al. express their opinion that you can declare superiority if the confidence interval for the new treatment is above the line of no difference (i.e., is statistically significant). They state, “The calculated CI does not know whether its purpose is to judge superiority or non-inferiority. If it sits wholly above zero [or 1, depending upon the measure of outcome], then it has shown superiority.” EMA would seem to agree. We agree as well. If one wishes to take a more conservative approach, one method we recommend is to judge whether the Delta seems clinically reasonable (you should always do this) and, if not, establish your own through clinical judgment. Then determine if the entire CI meets or exceeds what you deem to be clinically meaningful. To us, this method satisfies both approaches and makes practical and clinical sense.
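To make the reasoning above concrete, here is a minimal sketch in Python of how a reader might classify a reported confidence interval. It is an illustration under stated assumptions, not a validated statistical procedure: the effect is assumed to be a difference (new treatment minus comparator) in which larger values favor the new treatment, the line of no difference is zero, and the Delta is given as a positive number. The function name `classify_ci` and its parameters are hypothetical and chosen only for this example.

```python
def classify_ci(lower, upper, margin, no_difference=0.0, clinical_threshold=None):
    """Classify a two-sided CI for (new - comparator); larger values favor the new treatment.

    margin: the non-inferiority Delta as a positive number (assumption: difference scale).
    clinical_threshold: optional reader-chosen benefit that the whole CI must meet or exceed.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if margin <= 0:
        raise ValueError("margin (Delta) must be positive")
    if clinical_threshold is not None and clinical_threshold < no_difference:
        raise ValueError("clinical_threshold must not fall below the line of no difference")

    # Conservative reading: the entire CI meets or exceeds a clinically meaningful benefit.
    if clinical_threshold is not None and lower >= clinical_threshold:
        return "superior (clinically meaningful)"
    # Reading described by Schumi et al.: the CI sits wholly above the line of no difference.
    if lower > no_difference:
        return "superior"
    # The CI excludes a loss as large as the Delta, but not "no difference".
    if lower > no_difference - margin:
        return "non-inferior"
    return "non-inferiority not shown"


# Hypothetical intervals, with a Delta of 2.0 on the difference scale.
print(classify_ci(0.5, 3.0, margin=2.0))
print(classify_ci(-1.0, 2.0, margin=2.0))
print(classify_ci(1.5, 3.0, margin=2.0, clinical_threshold=1.0))
```

Only the lower bound drives the decision here because larger values are assumed to favor the new treatment. For ratio measures such as odds ratios, where the line of no difference is 1, the margin would have to be expressed on the ratio scale and this sketch would need to be adapted.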

Is it acceptable to claim non-inferiority in a superiority trial? It depends. This area is controversial, with some saying no and some saying it depends. However, there is agreement amongst those on the “it depends” side that it generally should not be done due to validity issues as described above.