Abstract

Because of scepticism concerning study results when relying solely on relative effect estimates, the number needed to treat (NNT) has been used extensively to quantify the net clinical benefit of an intervention, and is reported increasingly in randomised trials and observational studies. The NNT is a simple measure representing the number of patients who would need to be treated to prevent one additional adverse event. However, like relative risk, the NNT is an inherently time-dependent measure. Thus, its calculation may lead to misleading interpretations, especially for studies involving varying follow-up times or recurrent outcomes. In addition to study duration and the efficacy of the therapy and the comparator, multiple other factors directly influence the NNT and should be taken into account when interpreting it and when comparing the effectiveness of therapies. Accurate estimation and interpretation of the NNT, together with an awareness of its limitations, are therefore crucial to avoid erroneous clinical and public health decisions. We discuss the calculation and the interpretation of risk reduction and the NNT in the context of the changing landscape of clinical trials in pulmonary arterial hypertension.

Abstract

Number needed to treat is widely used in clinical trials, but many factors influence its appropriate interpretation http://ow.ly/lMuS30jAwPJ

Introduction

Throughout the past 20 years, numerous specific pharmacological agents, including phosphodiesterase-5 inhibitors, endothelin receptor antagonists, prostaglandins, soluble guanylate cyclase stimulators and, more recently, selective prostacyclin receptor agonists, have emerged for the treatment of pulmonary arterial hypertension (PAH) [1]. During the same period, the landscape of PAH clinical trials changed dramatically. Early clinical trials were typically of short duration, comparing the effects of PAH-targeted therapies versus placebo and using exercise tolerance as the primary end-point. A meta-analysis of these trials documented a reduction in short-term mortality of ≈40% with monotherapy [2]. More recently, we have witnessed a progressive shift in PAH study designs towards longer event-driven trials comparing the effects of combination therapy on clinical worsening, which is perceived as a more clinically relevant outcome measure [3]. A meta-analysis demonstrated that combination therapy significantly reduced the risk of clinical worsening by ≈35% compared to monotherapy [4].

However, many physicians express scepticism when efficacy is presented only as odds ratios or risk ratios (RR) because of the danger of misinterpreting the importance of a therapy when relying solely on relative effect estimates. For example, a therapy reducing the risk of an event from 3% to 2% and one reducing this risk from 30% to 20% would both represent a relative risk reduction of one-third (≈33%), although the absolute risk reductions are 1% and 10%, respectively. In an attempt to provide information on the reduction in absolute risk, it is thus not surprising that the number needed to treat (NNT), a global measure representing the number of patients who would need to be treated to prevent one additional adverse event, has been embraced as an alternative way to express results. Since its description >25 years ago [5], this method has been used extensively to quantify the net clinical benefit of an intervention, as well as for cost comparison and cost-effectiveness analyses. There is indeed an “implicit belief by physicians that the NNT value adequately captures the overall worth of a treatment” [6]. However, the NNT and RR are inherently time-dependent measures that few medical students, and even few physicians, interpret correctly (figure 1) [7, 8].
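The contrast between relative and absolute effects in the example above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function name `risk_summary` is ours and purely illustrative, and the two risk pairs are the ones quoted in the text.

```python
def risk_summary(control_risk, treated_risk):
    """Return (risk ratio, relative risk reduction, absolute risk reduction, NNT)."""
    rr = treated_risk / control_risk      # risk ratio
    arr = control_risk - treated_risk     # absolute risk reduction
    return rr, 1 - rr, arr, 1 / arr       # NNT is the reciprocal of the ARR


# The two scenarios from the text: same relative effect, very different NNT.
for control_risk, treated_risk in [(0.03, 0.02), (0.30, 0.20)]:
    rr, rrr, arr, nnt = risk_summary(control_risk, treated_risk)
    print(
        f"{control_risk:.0%} -> {treated_risk:.0%}: "
        f"RR={rr:.2f}, RRR={rrr:.0%}, ARR={arr:.0%}, NNT={nnt:.0f}"
    )
```

Both scenarios share the same risk ratio, yet the NNT differs by a factor of ten because the baseline risk differs by a factor of ten.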

Figure 1. Risk ratio (RR) and number needed to treat (NNT) are time-dependent measures. a) When an intervention is associated with constant relative risk reduction over time, the NNT progressively decreases as the number of events increases; b) conversely, when the absolute risk reduction remains constant over time, the relative risk reduction progressively decreases (increased relative risk).

Indeed, multiple factors, in addition to the efficacy of the therapy and the comparator, may directly influence the NNT and RR and should be taken into account in their interpretation. As a result, the added value of the NNT over the absolute risk reduction has been questioned [9]. Herein, we expand on some of these issues raised by the interpretation of our recent meta-analyses on combination therapies in PAH [4, 10] and discuss the calculation and the interpretation of risk reduction and the NNT in the context of the changing landscape of clinical trials in pulmonary arterial hypertension.

Methods to calculate the NNT

The NNT is calculated by taking the reciprocal of the absolute risk reduction between two treatment options. In practice, it is generally measured by the difference in the cumulative incidence of the outcome over a fixed follow-up time period between two groups of patients as a proxy for treatment efficacy [11]. Using the data from the AMBITION study [12], in which 46 of the 253 and 77 of the 247 patients randomised to combination therapy and monotherapy, respectively, met the primary efficacy outcome, it would seem appealing to compute the NNT for preventing one event of clinical failure as:
NNT=1/(77/247–46/253)=1/(0.312–0.182)=7.7≈8
However, this calculation does not account for varying follow-up times in individual study arms. Indeed, the NNT has traditionally been derived from single trials measuring dichotomous outcomes with equal follow-up periods for all patients, and its computation must be performed with care in trials involving more complex study designs with varying follow-up periods, such as recent long-term event-driven PAH trials. For example, patients in the AMBITION trial were treated on average for 550 days in the combination-therapy group and 484 days in the pooled monotherapy group.
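The crude calculation described above, based only on the event counts quoted from the AMBITION study, can be sketched as follows. This is an illustration of the naive approach, not a recommended analysis; the function name `nnt_from_counts` is hypothetical, and the calculation deliberately ignores the unequal follow-up in the two arms.

```python
import math


def nnt_from_counts(events_treated, n_treated, events_control, n_control):
    """Crude NNT: reciprocal of the difference in cumulative incidence."""
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    absolute_risk_reduction = risk_control - risk_treated
    return 1 / absolute_risk_reduction


# Event counts quoted in the text for AMBITION (combination vs monotherapy).
nnt = nnt_from_counts(46, 253, 77, 247)
print(f"crude NNT = {nnt:.1f}, conventionally rounded up to {math.ceil(nnt)}")
```

Rounding up is the usual convention, because treating a fraction of a patient has no clinical meaning.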

Various methods to calculate the NNT have been proposed for studies with different follow-up times [13]. The incidence rates of the outcome may be used, the NNT being computed as the reciprocal of the between-group difference in incidence density rates (or person-time incidence rates) (table 1). This difference represents the incidence rate of prevented events per unit of person-time. In the example above, the incidence rates of the primary outcome would be 0.120 versus 0.235 events per person-year, yielding an NNT=1/(0.235–0.120)=8.7≈9 person-years. However, some authors have interpreted this NNT as the number of patients who need to be treated to prevent one outcome over a given time period. Interpreting it as the need to treat nine persons for 1 year to prevent one outcome would be inaccurate: because this NNT is the reciprocal of a difference in incidence rates, it is expressed in person-time rather than in persons. Moreover, this computation assumes that patient-time is intrinsically interchangeable, so that 1 year of follow-up in one patient is equivalent to 6 months of follow-up in two patients, which is frequently inaccurate. Doing the same calculation for the other recent long-term trials would suggest that the addition of selexipag might be associated with the lowest NNT among the evaluated therapies (table 1). Alternatively, the event probability in the active treatment group at a fixed time can be computed from the event probability in the control group at that time and the hazard ratio [2]. While the hazard ratio is an estimate of the instantaneous relative risk at that specific time rather than a cumulative measure over an entire study [18], both methods assume that the RR reduction with a therapy is relatively constant over time, which is generally inaccurate. More importantly, physicians often make the mistake of trying to interpolate, or even to extrapolate beyond the original study duration, to standardise comparisons between interventions.
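Both approaches described in this paragraph can be sketched in a few lines. The sketch below makes several assumptions: person-time is approximated as the number of patients multiplied by the mean treatment duration quoted for AMBITION, the hazard-ratio conversion uses the standard proportional-hazards relation (treated survival equals control survival raised to the power of the hazard ratio), and the control event probability of 0.30 and hazard ratio of 0.50 are hypothetical values chosen only for illustration. All function names are ours.

```python
DAYS_PER_YEAR = 365.25


def incidence_rate(events, n_patients, mean_followup_days):
    """Events per person-year, approximating person-time as n x mean follow-up."""
    person_years = n_patients * mean_followup_days / DAYS_PER_YEAR
    return events / person_years


def person_years_per_event_prevented(rate_control, rate_treated):
    """Reciprocal of the rate difference; the unit is person-years, not persons."""
    return 1 / (rate_control - rate_treated)


def nnt_from_hazard_ratio(control_event_prob, hazard_ratio):
    """NNT at a fixed time, assuming proportional hazards over that period."""
    control_survival = 1 - control_event_prob
    treated_survival = control_survival ** hazard_ratio
    return 1 / (treated_survival - control_survival)


# AMBITION figures quoted in the text.
rate_treated = incidence_rate(46, 253, 550)
rate_control = incidence_rate(77, 247, 484)
person_years = person_years_per_event_prevented(rate_control, rate_treated)
print(f"rates: {rate_treated:.3f} vs {rate_control:.3f} events per person-year")
print(f"person-years of treatment per event prevented: {person_years:.1f}")

# Hypothetical inputs, for illustration only.
print(f"NNT from hazard ratio: {nnt_from_hazard_ratio(0.30, 0.50):.1f}")
```

The first result is a quantity of person-time, which is why reading it as "nine patients treated for 1 year" is not justified without further assumptions.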

Finally, the NNT can be estimated with the Kaplan–Meier approach, using the difference in the cumulative incidence of the outcome between the two groups at a specific time of follow-up [16]. This method generally accounts for heterogeneous risk reduction over varying follow-up and represents the number of patients who need to be treated to prevent one patient from experiencing the outcome over the period of interest. However, the Kaplan–Meier approach is also subject to distorted NNT calculation, especially when outcomes of interest do not occur randomly over time. In PAH trials, clinical worsening is a composite end-point generally defined as a combination of death, hospitalisation, lung transplant, treatment escalation and symptomatic progression [3]. While death and PAH-related hospitalisation generally occur erratically, symptomatic progression and treatment escalation, which contributed to 40–80% of the first clinical-worsening events in recent combination trials [12, 14, 15, 17], probably cluster around study visits. This tends to result in an inhomogeneous decline in event-free survival on Kaplan–Meier curves (figure 2). Comparing the cumulative incidence of the outcome between the two groups at a specific time of follow-up may thus result in aberrant estimation of the NNT, as probably occurred when estimating the NNT at week 16 from the AMBITION study [12] before clinical worsening was confirmed in a required subsequent study visit (table 1).
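To illustrate how a Kaplan–Meier-based NNT can change abruptly around a study visit, the following sketch implements a basic product-limit estimator and evaluates the NNT just before and at a visit week. The follow-up data are entirely hypothetical (10 patients per arm, time in weeks, events concentrated at weeks 16, 24 and 36) and are not taken from any trial; the function names are ours.

```python
def kaplan_meier(times, events):
    """Product-limit estimate as [(event_time, survival)]; events: 1=event, 0=censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all observations sharing the same time point.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def survival_at(curve, t):
    """Event-free probability at time t from a step curve."""
    s = 1.0
    for step_time, step_survival in curve:
        if step_time > t:
            break
        s = step_survival
    return s


def km_nnt(treated, control, t):
    """NNT at time t: reciprocal of the difference in event-free survival."""
    diff = survival_at(kaplan_meier(*treated), t) - survival_at(kaplan_meier(*control), t)
    return 1 / diff


# Hypothetical data: (follow-up weeks, event indicators).
control = ([8, 16, 16, 16, 20, 24, 24, 30, 36, 36], [1, 1, 1, 0, 0, 1, 1, 0, 1, 0])
treated = ([12, 16, 24, 24, 28, 30, 36, 36, 36, 36], [0, 1, 0, 1, 0, 0, 1, 0, 0, 0])

for week in (15, 16, 24):
    print(f"week {week}: NNT = {km_nnt(treated, control, week):.1f}")
```

In this toy data set the estimate changes markedly between week 15 and week 16 simply because several events are recorded at the week 16 visit, which is the type of artefact discussed above.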

Figure 2. Kaplan–Meier curves for the probability of a first adjudicated primary end-point event in the AMBITION trial, suggesting that the primary outcome events frequently clustered around study visits. Unlike death, pulmonary arterial hypertension-related hospitalisation and transplantation, which generally occur erratically, symptomatic progression and treatment escalation, which contributed to 40–80% of the first clinical-worsening events in recent event-driven trials, are defined by the investigator and therefore cluster around study visits. This may contribute to the inhomogeneous decline in event-free survival on Kaplan–Meier curves (arrows) and to distorted number needed to treat calculations when estimated at a specific time of follow-up. Reproduced and modified from [12] with permission from the publisher.

The impact of study and treatment duration on NNT and RR: are long-term event-driven trials necessary in PAH?

The impact of time dependency on the NNT and RR is even more problematic when trials of various durations are compared inappropriately. Indeed, the NNT and RR are inherently time-dependent measures (figure 1): when the RR reduction is constant over time, increasing follow-up duration will progressively decrease the NNT as the absolute event rate increases, whereas the RR will progressively increase (thus decreasing the RR reduction) with longer follow-up when the absolute risk reduction is constant over time [4]. For example, clinical trials evaluating the addition of phosphodiesterase type 5 inhibitors resulted in a lower RR compared to endothelin receptor antagonists (RR 0.44, 95% CI 0.31–0.63 versus RR 0.76, 95% CI 0.64–0.90), suggesting that the addition of the former might have reduced clinical worsening more effectively. However, clinical trials evaluating phosphodiesterase type 5 inhibitors were of shorter duration (31±27 weeks versus 90±56 weeks). Interestingly, in our recent meta-analysis, the RR of clinical worsening correlated tightly with study duration in recent trials comparing combination therapy versus monotherapy, whereas the NNT did not [10]. This is consistent with long-term event-driven trials in which the treatment effect on clinical worsening was predominantly observed during the first 6–12 months of treatment. In these trials, the NNT progressively decreased until 52 weeks of follow-up, event-free survival curves being essentially parallel thereafter [12, 14, 15, 17], thus implying a relatively constant absolute risk reduction thereafter [10]. Thus, study duration alone may largely account for differences in RR reduction. This concept is important in the PAH field, which has witnessed a shift in study design and duration within the past two decades.
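The two patterns of time dependence described above can be reproduced with a small numerical sketch. The assumptions are ours and purely illustrative: the cumulative risk in the control group follows an exponential model with a hypothetical hazard of 0.25 events per patient-year, the constant risk ratio is set to 0.65, and the constant absolute risk reduction is set to 10%.

```python
import math


def control_risk(t_years, hazard=0.25):
    """Hypothetical cumulative event risk in the control group at time t."""
    return 1 - math.exp(-hazard * t_years)


def nnt_constant_rr(t_years, risk_ratio=0.65):
    """NNT over time when the risk ratio is held constant."""
    return 1 / ((1 - risk_ratio) * control_risk(t_years))


def rr_constant_arr(t_years, arr=0.10):
    """Risk ratio over time when the absolute risk reduction is held constant."""
    return 1 - arr / control_risk(t_years)


for t in (1, 2, 3, 4):
    print(
        f"year {t}: NNT with constant RR = {nnt_constant_rr(t):.1f}; "
        f"RR with constant ARR = {rr_constant_arr(t):.2f}"
    )
```

With a constant risk ratio the NNT falls as events accumulate, whereas with a constant absolute risk reduction the risk ratio drifts towards 1, so a longer trial appears less effective in relative terms even though the absolute benefit is unchanged.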

More pragmatically, this constant absolute risk reduction after 6–12 months of follow-up calls into question the requirement for long-term event-driven trials in PAH, especially since morbidity events occurring at 3, 6 and 12 months were recently shown to reliably predict subsequent deaths [19]. Indeed, most recent event-driven studies lasted 4–6 years, patients being exposed to the study drugs on average for ≈2 years [12, 14, 15, 17]. In the context of an orphan disease with limited recruitment and the rapidly changing treatment paradigm in PAH, the optimal duration of future trials should be revisited, balancing study power (longer follow-up will probably be associated with increased numbers of events) with the possibility for patients to contribute to subsequent trials and benefit from newer PAH-targeted therapies and treatment algorithms [1].

Baseline risk of events, comparison between interventions and external validity of the NNT and RRs

Even with similar relative efficacy of drug therapies, the NNT varies inversely with the baseline risk of experiencing an adverse event [20]. Contemporary trial participants generally have a lower baseline risk than those enrolled in earlier trials. Novel therapies may thus be associated with larger NNTs, even if they are similarly effective. Consequently, the calculated NNT is entirely specific to a single comparison in a particular study population and cannot be generalised to a particular subgroup of patients, therapy, drug class or treatment strategy. Therefore, comparisons of different therapies based on the NNT are generally misleading, unless the therapies were tested in study populations with the same stage of disease, comorbidities and background therapies, for their effects on the same outcomes, against the same comparator and over the same time frame. Even in these circumstances, confidence intervals for the NNT are commonly not reported, precluding valid comparison between various randomised controlled trials [9, 16]. Similarly, generalising an NNT from a particular trial to routine care may lead to erroneous conclusions because patients in routine care differ from selected trial participants, with more comorbidities, lower intensity of monitoring by physicians, lower levels of adherence and higher use of co-interventions.
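As an illustration of why an NNT reported without a confidence interval is hard to compare across trials, the sketch below derives an approximate 95% interval by inverting the limits of a Wald interval for the absolute risk reduction. This is one simple approach chosen by us for illustration, not necessarily the method used in the cited references. It is applied to the crude AMBITION counts quoted earlier and therefore also ignores differences in follow-up. The function name is hypothetical.

```python
import math


def nnt_with_wald_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Return (NNT, lower limit, upper limit) from a Wald interval on the ARR."""
    p_treated = events_treated / n_treated
    p_control = events_control / n_control
    arr = p_control - p_treated
    standard_error = math.sqrt(
        p_treated * (1 - p_treated) / n_treated
        + p_control * (1 - p_control) / n_control
    )
    arr_low, arr_high = arr - z * standard_error, arr + z * standard_error
    if arr_low <= 0:
        # The interval for the ARR includes zero, so the NNT interval is unbounded.
        raise ValueError("ARR interval includes zero; NNT interval is not a finite range")
    # A larger ARR corresponds to a smaller NNT, so the limits swap.
    return 1 / arr, 1 / arr_high, 1 / arr_low


nnt, lower, upper = nnt_with_wald_ci(46, 253, 77, 247)
print(f"NNT = {nnt:.1f} (approximate 95% CI {lower:.1f} to {upper:.1f})")
```

Even within a single trial, the interval is wide relative to the point estimate, which cautions against ranking therapies on small differences in NNT.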

Outcomes overlooked by the NNT and RRs: treatment impact or utility?

The NNT is generally expressed for binary primary outcomes (e.g. clinical worsening). However, most therapies affect more than one predefined outcome, and secondary or continuous outcome measures (e.g. exercise capacity) may still be relevant to PAH patients. Of note, a method for estimating the proportion of patients who benefit from a treatment and its NNT when the outcome is a continuous variable has been suggested [21]. Similarly, the NNT calculation using time-to-first-event analyses overlooks recurrent events that can occur more than once during patient follow-up, such as hospitalisation. Thus, the calculation of the NNT to prevent one clinical worsening event in the AMBITION trial [12] does not mean that only one out of between six and nine patients benefited from upfront combination therapy versus monotherapy.

Moreover, the NNT to “prevent” clinical worsening may be misleading, as the NNT best applies to acute conditions without long-term sequelae. In chronic conditions such as PAH, adverse outcomes are not permanently avoided, but are merely postponed [22]. Thus, it has been suggested that it may be more accurate to describe the potential impact of these chronic therapies in terms of the average duration of life without clinical worsening gained, rather than focusing on differential event-free survival at a single time point [23]. For these reasons, the NNT is an expression of the frequency of a specific outcome in a specific study population treated with a specific therapy and comparator, rather than a surrogate for the intervention's utility. Patients, physicians and eventually regulatory authorities vary their treatment decisions depending on cost, side-effect profile, the outcome prevented and personal/societal values. Thus, the NNT alone is insufficient to fully capture the global benefit of a therapy and to conclude whether a therapy should be used, and other considerations need to be incorporated into treatment decision making [24]. This is of particular importance in PAH, where drugs are generally expensive, burdensome to administer (e.g. parenteral prostaglandins) or associated with potential side-effects.
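One way to express a treatment effect as event-free time gained, rather than as a difference at a single time point, is to compare the areas under the two event-free survival curves up to a chosen horizon (often called the restricted mean event-free time). The sketch below is a minimal illustration under our own assumptions: the step curves are hypothetical, the 3-year horizon is arbitrary, and this is not necessarily the exact method proposed in the cited reference.

```python
def restricted_mean_event_free_time(steps, horizon):
    """Area under a step survival curve from 0 to horizon.

    steps: time-sorted list of (time, event-free probability after the drop).
    """
    area, prev_time, prev_survival = 0.0, 0.0, 1.0
    for time, survival in steps:
        if time >= horizon:
            break
        area += prev_survival * (time - prev_time)
        prev_time, prev_survival = time, survival
    return area + prev_survival * (horizon - prev_time)


# Hypothetical event-free survival curves (time in years).
control_curve = [(0.5, 0.85), (1.0, 0.70), (2.0, 0.60)]
treated_curve = [(0.5, 0.92), (1.0, 0.82), (2.0, 0.72)]

horizon_years = 3.0
gain = (
    restricted_mean_event_free_time(treated_curve, horizon_years)
    - restricted_mean_event_free_time(control_curve, horizon_years)
)
print(f"event-free time gained over {horizon_years:.0f} years: {gain * 12:.1f} months")
```

Unlike an NNT read off at one time point, this summary uses the whole curve up to the horizon and is therefore less sensitive to where events happen to cluster.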

Conclusions

The NNT is a simple, appealing and valuable measure of treatment impact. However, its computation and interpretation may be misleading, especially in trials with varying follow-up times. The NNT and RR reduction are time-dependent measures, both decreasing with longer follow-up when relative and absolute risk reductions are constant over time, respectively. This concept is important in the PAH field, which has witnessed a progressive shift in study design and duration. However, even with appropriate computation, the comparison of the NNT and RR between therapies is generally misleading unless the therapies were tested in similar study populations with the same disease severity, on the same outcomes, against the same comparator and over the same time frame. Even when this fundamental premise is respected, these comparisons are only indirect and subject to artefacts, and should therefore be interpreted with extreme caution in the absence of head-to-head clinical trials.

References

2015 ESC/ERS Guidelines for the diagnosis and treatment of pulmonary hypertension: the Joint Task Force for the Diagnosis and Treatment of Pulmonary Hypertension of the European Society of Cardiology (ESC) and the European Respiratory Society (ERS). Eur Respir J 2015; 46: 903–975.