I. SIGNIFICANCE/CONTEXT AND IMPORTANCE OF THE STUDY
This article attempted to determine whether lumbar discectomy outcomes are affected by the medical center at which surgery is performed. The authors performed a retrospective review of the surgical cohort of the Spine Patient Outcomes Research Trial (SPORT). Desai et al address the increasing need for evidence-based medicine in spine surgery, both to improve patient care and to justify costs. This is particularly appropriate because reimbursement is increasingly being tied to outcomes such as readmission and reoperation rates compared with national averages. In addition, this study attempts to address the generalizability of the SPORT trial to the average spine surgical practice performing discectomy: if no significant differences exist among thirteen different centers, a center outside the study could expect similar outcomes.

II. ORIGINALITY OF THE WORK
This study based its findings on the data of a well-designed, prospective, high-impact study (SPORT). Although a previous review addressed long-term outcomes after lumbar discectomy, the effect of treatment center on those outcomes was not directly assessed. 1 We agree that this paper is among the first to directly assess whether outcomes of lumbar discectomy surgery vary with the clinical center at which it is performed.

III. APPROPRIATENESS OF THE STUDY DESIGN OR EXPERIMENTAL APPROACH
This article is a retrospective review of prospectively gathered data. Therefore, although it avoids some of the pitfalls of a retrospective study, it shares the weakness of the original prospective study: difficulty in finding true clinical equipoise in the participating centers. In addition, it should be noted that the data used in this retrospective review come from a trial that was not originally designed to show a difference among centers. This is the basis of our main critique. As the authors noted, patient demographics vary significantly from center to center. More than half of the p-values in Table 1 are less than 0.05. How do we know, then, that any differences between centers are due to differing procedures or surgeons at the centers and not to an imbalance in patient characteristics? As an example, consider that the SF-36 bodily pain score is associated with one of the outcome measures. The mean baseline pain scores across the centers vary from 18 to 28 (p = 0.003). If we test for an effect of center on the outcome but do not control for baseline bodily pain, we are likely to find a significant difference among centers. But that difference may have nothing to do with the centers per se. The apparent differences among centers may be a result of the true differences in baseline bodily pain among the patients treated at those centers.
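To make this concern concrete, the following is a minimal simulation sketch, not a reanalysis of the SPORT data. Every number in it is an assumption chosen for illustration: three hypothetical centers with baseline pain means of 18, 23, and 28, and an invented outcome model in which the outcome depends on baseline pain but not on center. The function names are ours. The sketch compares the spread of center means before and after removing the part of the outcome explained by baseline pain.

```python
import random
import statistics

def simulate_centers(n_per_center=200, seed=0):
    """Simulate patients whose outcome depends on baseline pain but NOT on center."""
    rng = random.Random(seed)
    # Hypothetical center-level baseline means spanning the 18 to 28 range quoted above.
    baseline_means = {"A": 18.0, "B": 23.0, "C": 28.0}
    rows = []
    for center, mu in baseline_means.items():
        for _ in range(n_per_center):
            baseline = rng.gauss(mu, 5.0)
            # Assumed outcome model: baseline pain plus noise, with no center term.
            outcome = 20.0 + baseline + rng.gauss(0.0, 5.0)
            rows.append((center, baseline, outcome))
    return rows

def spread_of_center_means(rows, values):
    """Largest minus smallest center mean of `values` (aligned with `rows`)."""
    by_center = {}
    for (center, _, _), value in zip(rows, values):
        by_center.setdefault(center, []).append(value)
    means = [statistics.mean(v) for v in by_center.values()]
    return max(means) - min(means)

def residuals_after_baseline(rows):
    """Outcome residuals from a pooled least-squares line on baseline pain."""
    xs = [r[1] for r in rows]
    ys = [r[2] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rows = simulate_centers()
unadjusted = spread_of_center_means(rows, [r[2] for r in rows])
adjusted = spread_of_center_means(rows, residuals_after_baseline(rows))
print(f"unadjusted spread: {unadjusted:.1f}, adjusted spread: {adjusted:.1f}")
```

In this toy setting the unadjusted center means differ substantially even though center has no effect by construction, and the difference largely disappears once baseline pain is accounted for. A real analysis would fit center and the baseline covariates jointly (for example, in a regression model) rather than use this two-step shortcut.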

IV. ADEQUACY OF EXPERIMENTAL TECHNIQUES
In this study, the centers are the ‘treatments’ being compared, and we are trying to determine whether there are any differences among treatments. If this study were being designed de novo, we would ideally randomize the 792 patients to the various centers in order to minimize any imbalances in patient characteristics across treatment centers. In a clinical trial, participants generally are not allowed to select their own treatments because that might maximize imbalances. We have the same issue here in less dramatic form: the subjects have chosen their own treatment centers, we have ended up with treatment groups that are imbalanced on many variables, and we do not know whether those variables are associated with outcomes.
The ideal study for the investigational question would have randomized patients to centers so that when we tested for differences among centers, the potentially confounding variables would be spread roughly equally among the centers. The methods section makes no mention of controlling for the patient characteristic imbalances among centers. We are left with results about the effect of center on various outcomes, but we have no idea whether these effects are really due to differences among centers or due to preoperative differences in the patients.
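The balancing effect of randomization can be illustrated with a short sketch. It assumes 792 simulated patients with an invented baseline score distribution (mean 23, SD 8) and 13 centers. The "self-selected" allocation is a deliberately exaggerated worst case in which patients with similar baseline scores cluster at the same center; it is not a model of how SPORT patients actually chose centers. All names are hypothetical.

```python
import random
import statistics

def split_into_centers(scores, n_centers):
    """Split a list into n_centers contiguous groups of near-equal size."""
    size, extra = divmod(len(scores), n_centers)
    groups, start = {}, 0
    for c in range(n_centers):
        end = start + size + (1 if c < extra else 0)
        groups[c] = scores[start:end]
        start = end
    return groups

def self_selected(scores, n_centers):
    """Exaggerated self-selection: similar baseline scores end up at the same center."""
    return split_into_centers(sorted(scores), n_centers)

def randomized(scores, n_centers, rng):
    """Random allocation: shuffle patients, then deal them out to centers."""
    shuffled = list(scores)
    rng.shuffle(shuffled)
    return split_into_centers(shuffled, n_centers)

def baseline_spread(groups):
    """Largest minus smallest center mean of the baseline score."""
    means = [statistics.mean(g) for g in groups.values()]
    return max(means) - min(means)

rng = random.Random(1)
# Assumed baseline distribution, for illustration only.
patients = [rng.gauss(23.0, 8.0) for _ in range(792)]
print("self-selected spread:", round(baseline_spread(self_selected(patients, 13)), 1))
print("randomized spread:  ", round(baseline_spread(randomized(patients, 13, rng)), 1))
```

Under random allocation the center-level baseline means stay close to one another, so a later comparison of outcomes among centers is far less likely to be driven by who was treated where.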

V. SOUNDNESS OF CONCLUSIONS AND INTERPRETATION
The authors conclude that treatment center has no effect on long-term outcomes. However, there are clearly differences in long-term outcomes; they are just not statistically significant. Figure 2 shows obvious spread among the centers on three measures of functional outcome. For example, bodily pain at 48 months varies from about 35 to 55 (a 20-point difference among centers), with a p-value of 0.13. We do not know whether, with a larger population per center, a 20-point difference among centers would reach statistical significance. This is an issue of statistical power. Lack of statistical significance does not mean that there is no difference. There is always a difference of some kind. A p-value above 0.05 in this case just means that, given the sample size, the difference is not large enough for us to detect it with any certainty. Since a “lack of difference” is one of the main conclusions, there needs to be an explanation of the power necessary to detect a meaningful difference on these outcomes.
For short-term outcomes where a significant difference was found, clinical significance is difficult to determine. For example, the average OR time was 76.6 minutes. The longest center average was 112 minutes, a difference of about 35 minutes. Though this may add slightly to operative cost, it is unlikely to add significant risk to the patient. Blood loss ranged from 40 to 80 mL at most centers. The outlier was 191 mL (SD = 320 mL), which seems high but is unlikely to add risk to the patient or require more frequent transfusion. Even in center H, with the second highest rate of dural tear (9%), the average length of stay was only 0.25 days. This result shows that the reported complications may not be clinically significant to the patient or to the center.
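As a rough illustration of how power depends on sample size, the sketch below computes the approximate power of a two-sided z-test comparing just two centers. This is a simplification of the thirteen-center comparison in the paper, and the standard deviation of 25 points is our assumption, not a value taken from the manuscript. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_group_power(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in two means.

    delta: true difference in means; sd: common standard deviation (assumed known);
    n_per_group: patients per center.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    shift = delta / se
    # Probability of landing in either rejection region under the true difference.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Power to detect a 20-point difference for several hypothetical center sizes.
for n in (10, 20, 40, 60):
    print(n, round(two_group_power(delta=20.0, sd=25.0, n_per_group=n), 2))
```

Under these assumptions, power rises steeply with the number of patients per center, so a 20-point difference that is not significant in small centers could well be detectable in larger ones. A formal power statement for the actual analysis would need the observed variances and the multi-center test the authors used.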

VI. RELEVANCE OF DISCUSSION
The authors conclude the manuscript with an adequate discussion, recognizing some study limitations such as the inability to assess the effect of center volume on outcomes, and noting that a center’s volume may not correlate with the recruitment for this particular study. 2,3,4 However, this sizeable part of the discussion is devoted to a question different from the one this study tries to answer. This study attempted to correlate treatment center choice with outcomes, but surgical volume was never hypothesized to be the primary determining factor of what makes the centers different.

VII. CLARITY OF WRITING, STRENGTH, AND ORGANIZATION OF THE PAPER
The authors have delivered a well organized background and rationale for this study, succinctly described the original work being retrospectively reviewed, and provided a clear analysis of their outcomes and their ramifications. Though there were some deficiencies in the statistical analyses, the authors recognize that further work needs to be done to study the effect of center volume on outcomes.

VIII. ECONOMY OF WORDS
The manuscript is clearly and succinctly written. In fact, an additional paragraph addressing the study’s power to detect the various differences between centers could have been added.

IX. RELEVANCE, ACCURACY, AND COMPLETENESS OF BIBLIOGRAPHY
The authors provide an adequate bibliography and include important studies as examples of other types of surgeries whose outcomes were compared across surgical centers. The work by Khuri et al with the VA National Surgical Quality Improvement Program is a particularly important paper, which showed no correlation between outcomes and center volume for eight commonly performed operations. 5 Obviously, this study draws heavily on the original Spine Patient Outcomes Research Trial as the source of its study design and shares the same data.

X. NUMBER AND QUALITY OF FIGURES, TABLES, AND ILLUSTRATIONS
The authors provide several figures and tables to explain their findings. They were mostly easy to read. However, the last column in Table 3 appears to be shifted up one row, which makes the post-operative complications more difficult to assess. Table 2 was particularly useful in directly comparing the actual percentages of short-term outcomes among centers.

XI. WHAT FUTURE/NEXT STEPS DOES THIS PAPER LOGICALLY LEAD TO?
Lumbar discectomy is one of the most common operations performed by spine surgeons. Long-term outcomes of operative treatment, particularly the disability indices that were measured for this study, are of interest to the patients, surgeons, and payers involved in this procedure. This study begins to address the relationship between center choice and outcomes, specifically for spine surgery. We hope that further work controls for confounding patient characteristics and that future studies are designed specifically to answer this question. A logical next step would be to apply the methods used to compare other types of surgeries among treatment centers. 2,3,4,5 This study also supports the need for a neurosurgery registry so that similar analyses can be done with a much larger number of centers and with higher power to detect variability among different clinical centers. As noted above, future reimbursement will likely be tied to outcomes, and we look forward to further work strengthening this study’s conclusion that although short-term differences may exist, long-term outcomes are similar across centers.