Center outcomes are adjusted for recipient severity, donor age, procedure (single or bilateral lung transplantation [LT]), and the year the transplantation was performed. Centers with a hazard ratio greater than 1 have an increased risk of death after transplantation. Error bars indicate 95% confidence intervals.

Figure 4. Relative Hazards for Death Attributable to Variables Included in the Final Model

The relative hazards for death are plotted as functions of the percentiles of the covariates' values. The hazard corresponding to the median covariate value was used as the reference. Larger ranges of variation in the vertical direction imply greater effects of those variables on mortality after transplantation. Functional status of 1, 2, or 3 indicates that the patient performed activities of daily living with no, some, or total assistance, respectively. COPD indicates chronic obstructive pulmonary disease; CF, cystic fibrosis; LT, lung transplantation; PH, pulmonary hypertension; Y, yes.

Table 1. Main Characteristics of Patients According to Transplantation Period

Context Although case loads vary substantially among US lung transplant centers, the impact of center effects on patient outcomes following lung transplantation is unknown.

Objective To assess variability in long-term survival following lung transplantation among US lung transplant centers.

Design, Setting, and Patients Analysis of data from the United Network for Organ Sharing registry for 15 642 adult patients undergoing lung transplantation between 1987 and 2009 in 61 US transplantation centers still active in 2008.

Results In 2008, 19 centers (31.1%) performed between 1 and 10 lung transplantations; 18 centers (29.5%), from 11 to 25 transplantations; 20 centers (32.8%), from 26 to 50 transplantations; and 4 centers (6.6%), more than 50 transplantations. One-month, 1-year, 3-year, and 5-year survival rates among all 61 centers were 93.4% (95% confidence interval [CI], 93.0% to 93.8%), 79.7% (95% CI, 79.1% to 80.4%), 63.0% (95% CI, 62.2% to 63.8%), and 49.5% (95% CI, 48.6% to 50.5%), respectively. Characteristics of donors, recipients, and surgical techniques varied substantially among centers. After adjustment for these factors, marked variability remained among centers, with hazard ratios for death ranging from 0.70 (95% CI, 0.59 to 0.82) to 1.71 (95% CI, 1.36 to 2.14) for low- vs high-risk centers, for 5-year survival rates of 30.0% to 61.1%. Higher lung transplantation volumes were associated with improved long-term survival and accounted for 15% of among-center variability; however, variability in center performance remained significant after controlling for procedural volume (P < .001).

Conclusions Center-specific variation in survival following lung transplantation was only partly associated with procedural volume. However, other statistically significant sources of variability remain to be identified.

More than 2500 lung transplantations (LTs) are performed annually at more than 150 LT centers worldwide,1,2 and approximately 1500 of these are performed at the 61 LT centers currently active in the United States. Although LT provides the only option for improved survival for patients with many end-stage lung diseases,1 the complexity of this intervention—both in terms of perioperative approach and in long-term care—suggests that the center where the patient undergoes LT might influence outcome.

Single-center experiences with outcomes following LT suggest practice variability among these centers. For example, studies from several large centers report 3-year survival rates greater than 70%3 and even 75%,4 rates that far exceed the average 3-year survival of 64% reported in the 2009 report of the International Society for Heart and Lung Transplantation Registry.2 There are several possible explanations for these discrepancies, including differences in the selection of donors or recipients or surgical approach (ie, single LT vs bilateral LT) or true differences in the quality of care that centers provide. For example, centers may vary in proficiency with the surgical procedure itself, long-term monitoring and treatment, or both.

Disentangling these multiple potential explanations for center variability in outcomes could highlight ways to improve care and inform several aspects of lung allocation policy. We designed the present study to assess the magnitude of variation in long-term survival among centers performing LT in the United States and to explore potential explanations for this variability.

METHODS

Patients

All data were supplied by the United Network for Organ Sharing (UNOS) as a standard analysis and research file based on Organ Procurement and Transplantation Network data as of February 2009 that included a coded transplant center identifier. The registry contains data on all patients who have undergone LT in the United States since the registry's inception in 1987, with the first transplant entry on October 16, 1987. Adult patients were eligible if they had undergone deceased-donor single or bilateral LT for any indication between the inception of the registry and January 1, 2009; the dates of the LT and the last follow-up were known; and the vital status at the last follow-up was known. This study was classified as exempt from review by the Mayo Clinic institutional review board.

We restricted our analyses to patients undergoing LT at centers that were still active in 2008. Data were collected from the UNOS registry on donor, recipient, and surgery characteristics at the time of transplantation. We excluded variables for which data were sparse or that described rare characteristics.

Outcomes

The primary outcome was survival time following transplantation. Only patients undergoing a first transplantation were included in the primary analyses of outcome. Patients who were lost to follow-up or who underwent retransplantation were censored when these events occurred; additional analyses considered retransplantation as a failure.

Statistical Methods

All analyses were repeated for 2 different sets of patients: (1) all patients undergoing LT during the whole study period and (2) only patients undergoing LT after the implementation of the lung allocation score (LAS) in May 2005. Main analyses are reported for patients undergoing LT during the entire study period.

We used Cox proportional hazards regression models to assess survival following LT at individual centers, adjusting for covariates related to donors, recipients, and surgery. We used purposeful selection of covariates, as described by Hosmer and Lemeshow,5 to select the multivariate models. This is a manual method completely controlled by the data analyst. The first step was the inclusion of all variables significant at the 20% level on bivariate analysis, as well as all variables known or specifically hypothesized to be clinically important.6 The second step was to remove one by one variables that did not significantly contribute to the multivariate model on the basis of the P value of the Wald test (threshold of .05) and change in the coefficient of the remaining variables (threshold change of 20%). The final models included recipient age; functional status, indicating whether the patient performed activities of daily living with no, some, or total assistance; creatinine level; mechanical ventilation requirement; diagnosis; forced expiratory volume in the first second; donor age; and surgical procedure (single or bilateral LT). To account for improvement in results over time, we included the date of transplantation in all models and ran additional analyses using calendar date as the time scale. The linear combination of recipient variables from these models, centered on 0, was used as a score of recipient disease severity.
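The two screening rules of purposeful selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example coefficients, and the choice to treat a P value at or above .05 as nonsignificant are assumptions made for the example.

```python
# Sketch of the two screening rules; all names and inputs are hypothetical.

def passes_bivariate_screen(p_value, clinically_important=False, alpha=0.20):
    """Step 1: keep variables significant at the 20% level or judged clinically important."""
    return clinically_important or p_value < alpha


def can_be_removed(wald_p, coefs_with, coefs_without,
                   p_threshold=0.05, change_threshold=0.20):
    """Step 2: a variable may be dropped when its Wald P value is not below .05
    and dropping it changes no remaining coefficient by more than 20%."""
    if wald_p < p_threshold:
        return False
    for name, before in coefs_with.items():
        after = coefs_without[name]
        if before != 0 and abs(after - before) / abs(before) > change_threshold:
            return False
    return True


# Hypothetical log hazard ratios of the remaining variables,
# estimated with and without the candidate variable in the model.
with_candidate = {"recipient_age": 0.020, "donor_age": 0.006}
without_candidate = {"recipient_age": 0.021, "donor_age": 0.009}

print(passes_bivariate_screen(0.15))
print(can_be_removed(0.30, with_candidate, without_candidate))
```

In this made-up case the candidate is retained despite its nonsignificant Wald test, because removing it shifts the donor-age coefficient by 50%.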

We assessed the scale of the continuous covariates by analyzing martingale residuals and fitting regression splines.7,8 These analyses supported a linear relation between all continuous covariates and the log hazard for death. The proportional hazard assumption was tested graphically by plotting scaled Schoenfeld residuals.8,9 Inspection of residual plots suggested that the proportional hazard assumption did not hold for the effect of procedure (single vs bilateral LT) over time, so Cox models were stratified on this variable.

We modeled center as a random effect in the multivariate analyses (mixed-effect Cox model). These models assume that the center effect follows a given distribution with a mean of 0. The random effect for a given center represents the deviation of this center from the overall underlying baseline risk. The main analyses assumed a gaussian distribution for the random effect; additional analyses assumed a gamma distribution. The 95% confidence intervals (CIs) for the standard deviation of the random effect were based on the profile likelihood of the random parameter.8 The overall significance of the random effect (center) was determined using a likelihood ratio test constructed as twice the difference between the log partial likelihood of the model with the random effect and the model without the random effect. To test the proportional hazard assumption for the random effect, we fitted a fixed-effect Cox model, including all variables cited here and the random effect obtained from the mixed-effect Cox model.
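The likelihood ratio test for the center effect can be illustrated with a short sketch. The log partial likelihood values below are hypothetical, and the use of a chi-square distribution with 1 degree of freedom as the reference is an assumption of this example; the text does not state which reference distribution was used.

```python
import math


def likelihood_ratio_test(loglik_with, loglik_without):
    """Twice the difference in log partial likelihood between the model with
    and the model without the random effect, with a P value from a
    chi-square distribution with 1 df (assumed reference distribution)."""
    statistic = max(2.0 * (loglik_with - loglik_without), 0.0)
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log partial likelihoods, for illustration only.
stat, p = likelihood_ratio_test(-70000.0, -70010.0)
print(stat, p)
```

Because a variance cannot be negative, the null value lies on the boundary of the parameter space, so the 1-df chi-square reference used here is, if anything, conservative.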

To assess the association of center volume and outcome, we fitted both marginal and mixed-effect Cox models, taking into account center volume in 2 ways: (1) mean center volume, defined as the number of LTs divided by the number of years of center activity; and (2) yearly center volume, defined for a given year as the number of LTs performed by the center during the year preceding the LT. Thus, for a given center, yearly center volume varied each year, whereas mean center volume was fixed. The marginal model refers to a traditional Cox model in which the variance is estimated by a grouped jackknife procedure taking into account within-center correlations. We also tested whether the center's overall prior experience (number of LTs performed by that center prior to the current LT) was associated with outcome. In addition, the impact of transplantation region (n = 11) on outcome was evaluated. Additional details regarding model development are provided in the eAppendix.
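The two definitions of center volume can be made concrete with a small sketch. The data are hypothetical, and two details are assumptions of this example rather than statements from the text: years of activity are counted from the first to the last year with an LT, and "the year preceding the LT" is taken as the previous calendar year.

```python
from collections import Counter


def mean_center_volume(transplant_years):
    """Number of LTs divided by the number of years of center activity
    (assumed here: first to last year with an LT, inclusive)."""
    years_active = max(transplant_years) - min(transplant_years) + 1
    return len(transplant_years) / years_active


def yearly_center_volume(transplant_years, year_of_lt):
    """Number of LTs the center performed in the calendar year before the LT."""
    return Counter(transplant_years)[year_of_lt - 1]


# Hypothetical center: 3 LTs in 2005, 5 in 2006, and 4 in 2007.
years = [2005] * 3 + [2006] * 5 + [2007] * 4

print(mean_center_volume(years))          # fixed for the center
print(yearly_center_volume(years, 2007))  # varies with the year of the LT
```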

We used variations in the Akaike information criteria induced by the removal of each variable from the final model to estimate the contribution of each variable to mortality after transplantation. To better describe the relative importance of each covariate in the model, we also developed a rank-hazard plot in which the relative hazards were plotted against the rank of the variables included in the model.10
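The points of a rank-hazard plot for one continuous covariate can be computed as in the sketch below, using the median covariate value as the reference (as in Figure 4). The ages and the coefficient are hypothetical values chosen for illustration, not estimates from the study.

```python
import math
import statistics


def rank_hazard_points(values, beta):
    """Return (percentile, relative hazard) pairs for one covariate,
    with the hazard at the median covariate value as the reference."""
    ordered = sorted(values)
    reference = statistics.median(ordered)
    n = len(ordered)
    points = []
    for rank, x in enumerate(ordered, start=1):
        percentile = 100.0 * rank / n
        points.append((percentile, math.exp(beta * (x - reference))))
    return points


# Hypothetical recipient ages and log hazard ratio per year of age.
ages = [25, 40, 50, 55, 60, 65, 70]
points = rank_hazard_points(ages, beta=0.02)
for percentile, relative_hazard in points:
    print(round(percentile, 1), round(relative_hazard, 2))
```

Plotting several covariates on the same percentile axis is what allows their vertical ranges, and hence their relative importance, to be compared directly.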

Based on numerical simulations, we found that our study provided 90% power to detect a standard deviation of 0.05 and 99% power to detect a standard deviation of 0.1 for the center random effect. All tests were 2-sided, with P < .05 indicating statistical significance.

All analyses were performed using R version 2.9.2 (R Foundation for Statistical Computing, Vienna, Austria). Marginal models involved use of the coxph function (survival package) with the cluster option; random-effect models involved use of the coxme function (coxme package).

RESULTS

The database included data for 18 115 LTs. We excluded data for 312 LTs from living donors (n = 250) or non–heart-beating donors (n = 62), 777 LTs involving pediatric patients, 497 retransplantations, and 28 LTs with missing survival times. Of the 16 501 remaining patients, 859 LTs were performed in 51 centers no longer active in 2008 and were thus removed from the analyses. Thus, 15 642 first LTs performed in 61 centers were included in the primary analyses. During the study period, 2072 patients (13.2%) underwent LT in centers performing fewer than 10 LTs each year, 6145 (39.3%) in centers performing from 10 to 25 LTs, 6393 (40.9%) in centers performing from 25 to 50 LTs, and 1032 (6.6%) in centers performing more than 50 LTs.
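The cohort derivation above can be checked with simple arithmetic. The sketch below uses only the counts reported in this paragraph; the variable names are illustrative.

```python
# Counts reported in the text; names are illustrative.
total_lts = 18115
exclusions = {
    "living or non-heart-beating donors": 250 + 62,
    "pediatric patients": 777,
    "retransplantations": 497,
    "missing survival times": 28,
}
remaining = total_lts - sum(exclusions.values())

inactive_center_lts = 859
analyzed = remaining - inactive_center_lts

# Patients by volume category of the center, as reported.
volume_groups = {"<10": 2072, "10-25": 6145, "25-50": 6393, ">50": 1032}
shares = {k: round(100 * v / analyzed, 1) for k, v in volume_groups.items()}

print(remaining, analyzed)
print(shares)
```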

Figure 1 depicts the annual LT activity of these centers over time. On average, the activity of centers grew over time, and the activity of some centers varied over the study period. In 2008, the 61 centers performed a median of 17.0 LTs (interquartile range, 7.0-34.0). Nineteen centers (31.1%) performed between 1 and 10 LTs, 18 centers (29.5%) from 11 to 25 LTs, 20 centers (32.8%) from 26 to 50 LTs, and 4 centers (6.6%) more than 50 LTs.

Characteristics of Donors, Recipients, and Surgery

The main characteristics of donors, recipients, and procedures are in Table 1. These results are reported according to center activity in eTable 1. We created a severity score for each recipient based on recipient characteristics associated with expected survival following transplantation. Higher severity scores indicate lower expected survival after LT, and individual scores ranged from −0.58 to 1.55. The mean recipient scores among centers varied markedly, from −0.38 (95% CI, −0.41 to −0.35) to 0.18 (95% CI, 0.15 to 0.22) (P < .001), yielding hazard ratios (HRs) for mortality after transplantation from 0.68 (95% CI, 0.66 to 0.70) to 1.20 (95% CI, 1.16 to 1.25) (Figure 2). However, there was no clear relationship between a center's volume and its mean recipient severity score (Figure 2).
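Because the severity score is a linear combination of recipient variables on the log-hazard scale, centered on 0, exponentiating a score gives a hazard ratio relative to the average recipient. The sketch below applies this to the reported extremes of the center mean scores and their confidence limits; the function name is illustrative.

```python
import math


def score_to_hazard_ratio(score):
    """Hazard ratio relative to a recipient with a severity score of 0."""
    return math.exp(score)


# Lowest and highest center mean scores with their 95% CI limits, from the text.
for label, values in {"lowest": (-0.38, -0.41, -0.35),
                      "highest": (0.18, 0.15, 0.22)}.items():
    hr, lower, upper = (round(score_to_hazard_ratio(v), 2) for v in values)
    print(label, hr, lower, upper)
```

The results reproduce the reported hazard ratios of 0.68 (0.66 to 0.70) and 1.20 (1.16 to 1.25).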

Regarding the surgical approach, the proportion of single LTs varied markedly among centers, from 0% to 94.5%, with single LTs accounting for 51.0% of all LTs. The proportion of single LTs across centers ranged from 0% to 100% when restricting the analysis to patients with either chronic obstructive pulmonary disease or pulmonary fibrosis, diseases for which debate exists regarding the relative utility of single LT.11,12

Unadjusted Outcomes

The median follow-up was 2.2 years (range, 0-18.0 years). During follow-up, 7935 patients died (50.7%), 301 were lost to follow-up (1.9%), 496 underwent retransplantation (3.2%), and 6910 (44.2%) were still alive at the end of the study.

According to the mixed-effects Cox model, there was a statistically significant center effect (P < .001) that followed a normal distribution with a mean of 0 and an SD of 0.21 (95% CI, 0.16 to 0.27). This estimate remained unchanged when variables related to recipient, donor, and surgical procedure were added to the model (eTable 2). Table 2 reports the importance of each independent variable both in univariate and multivariate analyses of survival after transplantation, showing that the random center effect has a strong independent association with outcome. The individual prediction of the center effect on mortality, expressed as an HR, ranged from 0.70 (95% CI, 0.59 to 0.82) to 1.71 (95% CI, 1.36 to 2.14) across centers (Figure 3). Based on this model and the observed cohort survival, 1-month and 1-, 3-, and 5-year survival rates were expected to vary by center from 89.0% to 95.3%, 67.8% to 85.3%, 45.4% to 72.4%, and 30.0% to 61.1%, respectively. Figure 4 shows rank-hazard plots10 for continuous and categorical covariates, in which the relative hazards are plotted against the rank of the variables included in the model. These figures demonstrate that the association between center and survival is of a magnitude comparable with that of recipient age and functional status.
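The center-specific ranges given above are consistent with a simple proportional hazards calculation in which the pooled survival probability is raised to the power of the center's hazard ratio, S_center(t) = S_pooled(t)^HR. The sketch below is an illustration under that assumption, using the pooled survival rates from the abstract and the extreme center HRs; it is not the authors' code.

```python
def center_survival(pooled_survival, hazard_ratio):
    """Expected survival at a center under proportional hazards: S(t) ** HR."""
    return pooled_survival ** hazard_ratio


# Pooled survival across all 61 centers, from the abstract.
pooled = {"1 month": 0.934, "1 year": 0.797, "3 years": 0.630, "5 years": 0.495}
hr_best, hr_worst = 0.70, 1.71  # extreme center hazard ratios

expected = {
    horizon: (round(100 * center_survival(s, hr_worst), 1),
              round(100 * center_survival(s, hr_best), 1))
    for horizon, s in pooled.items()
}
print(expected)
```

Under this assumption the calculation returns 89.0% to 95.3%, 67.8% to 85.3%, 45.4% to 72.4%, and 30.0% to 61.1%, which indicates that the reported ranges are survival rates.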

With the analysis restricted to patients who underwent LT after the implementation of the LAS (n = 4807), the SD for the center effect was 0.34 (95% CI, 0.24 to 0.46), which could indicate an even greater degree of variation in survival after LT among the 61 centers in the modern era. eFigure 1 shows the distribution of the center effect on mortality for the whole period and after implementation of the LAS. After implementation of the LAS, 85.2% of patients underwent transplantation at centers with at least a 25% excess mortality compared with centers with the lowest mortality rates, 62.6% of patients underwent transplantation at centers with at least a 50% excess mortality, and 12.6% of patients underwent transplantation at centers with at least a 100% excess mortality, or twice the risk of death compared with centers with the lowest mortality. Residual plots suggested that the impact of center on survival peaks during the first postoperative year and diminishes with time. For example, as can be seen in eTable 3, the difference in mortality risks across centers was greater during the first year following LT than it was subsequently.
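The proportion of patients treated at centers with a given excess mortality can be computed as in the sketch below. The center hazard ratios and patient counts are hypothetical, and taking the single lowest-hazard center as the reference is an assumption of this example; the text refers to "centers with the lowest mortality" without defining the reference group precisely.

```python
def share_with_excess(centers, threshold):
    """centers: list of (hazard_ratio, n_patients) pairs.
    Returns the percentage of patients treated at centers whose hazard exceeds
    that of the lowest-hazard center by at least `threshold` (0.25 = 25%)."""
    lowest = min(hr for hr, _ in centers)
    total = sum(n for _, n in centers)
    flagged = sum(n for hr, n in centers if hr / lowest - 1.0 >= threshold)
    return 100.0 * flagged / total


# Hypothetical centers: (hazard ratio, number of patients).
centers = [(0.60, 100), (0.80, 300), (1.00, 400), (1.30, 200)]

for threshold in (0.25, 0.50, 1.00):
    print(threshold, share_with_excess(centers, threshold))
```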

Sensitivity Analyses

To explore the sensitivity of our findings to different assumptions about patient populations, we repeated the analyses across a range of model specifications (eTable 4). Our results were largely unchanged by time period and by restricting the analysis to patients with obstructive disorders or those receiving single or bilateral LT. Moreover, the results were not substantially altered by using different assumptions regarding the distribution of the random effects (gamma instead of gaussian distribution), including the LAS in the models, adjusting for a different set of covariates, using calendar time as the time scale, or considering retransplantation as a failure instead of a censoring event.

Among-Center Variability

We assessed the association of center volume and survival using different time frames, different definitions of volume (yearly or mean over the study period), and different statistical methods. All analytic frames showed a consistent and statistically significant positive association between center volume and survival (Table 3). Residual analyses suggested that the association between center volume and outcome was roughly linear up to 60 transplantations a year. Beyond this threshold there were too few centers (n = 3) to provide reliable estimates of this relationship (eFigure 2).

However, center volume contributed only a small amount to mortality after LT (Table 2). Including center volume in the model using data from 1987 to 2008 reduced among-center variance by 15%, and variability in center performances remained significant after controlling for procedural volume (P < .001) (eTable 2). Moreover, as can be seen in Figure 3, several low-volume centers achieved good outcomes, suggesting that volume alone does not determine performance.
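The share of among-center variance explained by volume is the relative drop in the variance of the random effect when volume enters the model. The sketch below shows the arithmetic. It assumes, for illustration only, that the baseline SD is the 0.21 reported earlier; the resulting SD of about 0.19 is implied by that assumption and is not a value reported in the text.

```python
import math


def variance_explained(sd_without, sd_with):
    """Fraction of random-effect variance removed by adding a covariate."""
    var_without = sd_without ** 2
    var_with = sd_with ** 2
    return (var_without - var_with) / var_without


def sd_after_reduction(sd_without, fraction):
    """SD of the random effect after its variance is reduced by `fraction`."""
    return sd_without * math.sqrt(1.0 - fraction)


implied_sd = sd_after_reduction(0.21, 0.15)  # assumed baseline SD of 0.21
print(round(implied_sd, 3))
print(round(variance_explained(0.21, implied_sd), 2))
```

A 15% reduction in variance corresponds to a reduction of only about 8% in SD, which is one way to see why substantial among-center variability remained after accounting for volume.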

We also evaluated the impact of a center's past activity (number of LTs performed by that center prior to the current LT). Although center past activity was associated with survival in an unadjusted model (P = .03), this association was no longer significant when current LT volume was included in the model.

To determine whether the static or dynamic nature of changes in procedure volume over time was associated with transplant outcomes, we evaluated the outcomes of patients undergoing LT after LAS implementation and included variables reflecting center activity during this period (2005-2008) and during the 4 prior years (2001-2004). Again, only current LT volume was associated with outcome. Finally, we tested whether the transplant region (n = 11) was associated with outcome and explained part of the among-center variance. The impact of the transplant region on outcome was not statistically significant (P = .64).

COMMENT

This study of all LTs performed in the United States reveals clinically and statistically significant variability among centers in survival after transplantation. Thus, the center where a patient undergoes LT may be a major determinant of survival rate. The observation that this variability among centers remains after controlling for differences in the selection of donors, recipients, or surgical approaches suggests that centers may exhibit true differences in the quality of care provided during or following transplantation.

We were particularly interested in whether the volume of procedures performed at a transplant center accounted for center effects because LT centers vary more than 10-fold in the number of procedures they perform annually.2 Indeed, we found that center effects are explained in part by differences in center activity, which is consistent with other volume-outcome relationships described in the medical and surgical literature.13-16 Our results are also consistent with a recent study among LT centers.17 However, our analyses were designed to account for temporal trends in LT and for potentially important confounders of a volume-outcome relationship. Overcoming these methodological limitations reveals that some low- or medium-volume LT centers achieve good outcomes and that significant variability in center performances remains after controlling for procedural volume.

Determining whether such residual performance variability among centers reflects true differences in quality of care—and if so, which aspects of care—is a methodologically challenging task, particularly for a service as multifaceted as LT. First, the observed variability may reflect unmeasured differences in case mix or center characteristics. Although our models accounted for recipient-, donor-, and surgery-related variables, it is possible that some degree of residual confounding may remain if prognostically important variables were not measured.

Second, a variety of decisions regarding the population under study, time frame for the analysis, and definition of center volume could have influenced the results. The robustness of this study's findings to multiple inclusion criteria and exposure definitions mitigates the possibility the results are spurious manifestations of such choices.

Third, our finding that most outcome variability occurs during the first year after LT suggests that differences in surgical expertise might contribute to center variability. Ideally, a “surgeon effect” would be distinguished from a center effect to provide more specific data on sources of variability. However, because only a single surgeon performs LTs at most centers, and because UNOS data do not include surgeon identities or practice characteristics that would be associated with their expertise, distinguishing surgeon effects from center effects was not possible.

There are several possible statistical approaches for testing associations between center volume and outcome. Importantly, our approaches account for clustering, or correlation of patient outcomes within centers. Patients in the same hospital are more likely to experience similar outcomes than patients treated in another hospital with the same volume because of differences in technique, skill, or supportive care. In statistical terms, observations within a center are correlated; those in different centers are independent. We adjusted for the correlated nature of outcomes within each center in 2 ways. First, we used a mixed-effect model, also known as a random-effect model, hierarchical model, or frailty model. This enabled quantification of among-center variability and of the amount of variability explained by the center volume. Second, using a marginal model to adjust CIs to account for the effect of clustering18,19 led to similar conclusions regarding the volume-outcome relationship.
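The grouped jackknife used for the marginal model can be illustrated in its simplest form: the estimate is recomputed with each center left out in turn, and the spread of these leave-one-center-out estimates gives the variance. The sketch below applies the idea to a plain mean as a stand-in for a Cox coefficient; the data, the function name, and the choice of estimator are assumptions made for illustration.

```python
from statistics import mean


def grouped_jackknife_variance(groups, estimator):
    """Delete-one-group jackknife variance.
    groups: dict mapping a cluster label (eg, a center) to its observations."""
    labels = list(groups)
    g = len(labels)
    leave_one_out = []
    for left_out in labels:
        pooled = [x for label in labels if label != left_out for x in groups[label]]
        leave_one_out.append(estimator(pooled))
    center = sum(leave_one_out) / g
    return (g - 1) / g * sum((t - center) ** 2 for t in leave_one_out)


# Hypothetical outcomes clustered within 3 centers.
groups = {"A": [1, 2], "B": [3, 4], "C": [5, 6]}
variance = grouped_jackknife_variance(groups, mean)
print(variance)
```

Because whole centers are removed at once, correlation among patients within a center widens the resulting confidence intervals instead of being ignored.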

That our central results remained unchanged through a series of sensitivity analyses testing these and other potential influences therefore strengthens considerably the conclusions that can be drawn. Specifically, our results suggest that the influence of center on survival after transplantation is large and, as can be seen in Figure 4, may be of comparable magnitude to the influence of recipient age. Indeed, after implementation of the LAS, more than half of LT recipients (62.6%) underwent transplantation at centers whose mortality risk was at least 50% higher than that of the centers with the lowest mortality. The finding that center performance was more heterogeneous during the first year after LT than subsequently suggests that there may be undue variability in centers' perioperative and early postoperative practices.

Taken together, these results yield several implications. First, our results suggest that further exploration of the causes of variation in center outcomes is warranted because presumed sources of variability such as donor and recipient selection and a center's procedural volume do not explain large proportions of this variability. Second, our results suggest that presenting patients and family members with center-specific outcome data might foster more informed decision making about being listed for LT at a given center. Such information transfer may be particularly valuable in producing more informed decisions for patients with diseases such as chronic obstructive pulmonary disease, for which the survival benefits associated with transplantation are modest. For such patients, the choice to be listed for transplantation or not could be sensitive to even moderate differences in the expected outcomes among local centers.20-22

Finally, our results suggest that it may be possible to identify specific practices at high-performing centers and to export these to lower-performing centers, for example, through programs like Organ Donation Breakthrough Collaboratives.23 The fact that some low-volume centers achieve good outcomes despite a strong volume-outcome relationship overall suggests that excellence in LT is not merely a “practice makes perfect” phenomenon. Additionally, the finding that center-specific outcomes are particularly discrepant in the first months to a year after transplantation suggests that examining differences in perioperative practices, rather than long-term maintenance care, may be most likely to identify further practice patterns that influence outcomes.

In summary, this study suggests that true variability exists in the quality of care provided across LT centers. There is a great need to explore practices at high-performing centers with the goal of exporting beneficial practices to lower-performing centers. If such efforts do not equalize outcomes for lung transplant recipients, consideration might be given to further regionalizing the LT system in the United States.

Funding/Support: Dr Thabut received support for his work from Assistance Publique–Hôpitaux de Paris, Collège des Enseignants de Pneumologie (CEP), and AstraZeneca during his fellowship. Dr Halpern was supported by grant 1K08 HS018406-01 from the Agency for Healthcare Research and Quality.

Role of the Sponsors: The funding agencies had no role in the design and conduct of the study; in the collection, analysis, and interpretation of the data; or in the preparation, review, or approval of the manuscript.

Disclaimer: The data reported here have been supplied by the United Network for Organ Sharing (UNOS) as the contractor for the Organ Procurement and Transplantation Network (OPTN). The interpretation and reporting of these data are the responsibility of the authors and in no way should be seen as an official policy of or interpretation by the OPTN or the US government.

Additional Information: Study protocol and statistical code are available from Dr Thabut. The data set is available on request from UNOS (http://www.unos.org).