Abstract

Purpose. Survival following biochemical failure is highly variable. Using a randomized trial dataset, we sought to define a risk stratification scheme in men with locally advanced prostate cancer (LAPC).
Methods. The TROG 96.01 trial randomized 802 men with LAPC to radiation ± neoadjuvant androgen suppression therapy (AST) between 1996 and 2000. Ten-year follow-up data were used to develop three-tier post-biochemical failure risk stratification schemes based on cutpoints of time to biochemical failure (TTBF) and PSA doubling time (PSADT). Schemes were evaluated in univariable, competing risk models for prostate cancer-specific mortality. Performance was assessed by c-indices and internally validated by the simple bootstrap method. Performance rankings were compared in sensitivity analyses using multivariable models and variations in PSADT calculation.
Results. 485 men developed biochemical failure. c-indices ranged between 0.630 and 0.730. The most discriminatory scheme had a high risk category defined by PSADT < 4 months or TTBF < 1 year and low risk category by PSADT > 9 months or TTBF > 3 years.
Conclusion. TTBF and PSADT can be combined to define risk stratification schemes after biochemical failure in men with LAPC treated with short-term AST and radiotherapy. External validation, particularly in long-term AST and radiotherapy datasets, is necessary.

1. Introduction

Biochemical failure is a very common problem in the treatment of prostate cancer. Klotz has estimated that approximately 40% of men treated curatively by prostatectomy or radiotherapy will develop biochemical failure [1]. In the United States, the figure is expected to be at least 60,000 per annum [2]. Outcomes following biochemical failure are known to be highly variable. Clinical signs of local or distant progression can follow within months but may take years to become evident, and five year prostate cancer-specific survival probability has been shown to vary between 35% and 100% [3]. An important breakthrough in the management of prostate cancer is the emergence of effective new options for the treatment of castrate-resistant prostate cancer (CRPC) [4]. Presently these options are routinely withheld until castration-resistant tumour growth develops; however, many clinicians now believe that earlier intervention may be beneficial. The identification of subgroups of men with unfavourable outcomes after biochemical failure, needing immediate treatment and/or new therapeutic agents, is therefore a high priority in clinical prostate cancer research.

Prognostic data can provide assistance to clinical management practices if presented in the form of a nomogram or risk stratification scheme. Nomograms are of value to individual patients and their clinicians when determining the need for interventions. Risk stratification schemes, however, are more valuable in identifying patient subgroups who would benefit from participation in trials of new therapeutic approaches and for stratification purposes in these trials.

With trials of the effective new CRPC agents in mind, we sought to develop a risk stratification scheme to predict the outcome following biochemical failure using data from the TROG 96.01 randomized controlled trial, which addressed the value of neoadjuvant androgen deprivation prior to and during radiotherapy for locally advanced prostate cancer. Earlier reports from the TROG 96.01 trial showed that PSA doubling time (PSADT) and time to biochemical failure (TTBF) were independent and highly prognostic variables after biochemical failure [16] and at various cutpoints were successful surrogate candidates for prostate cancer-specific mortality (PCSM) [17]. Other reports have also affirmed their prognostic value [8, 10, 13, 15, 18–31]. Using combinations of PSADT and TTBF, we therefore explored the predictive accuracy of different risk stratification schemes in our trial dataset, which has a minimum of 10 years of follow-up from randomization. So far as we are aware, this paper describes the first risk stratification scheme for men with locally advanced prostate cancer who experience biochemical failure after curative radiotherapy with or without neoadjuvant androgen suppression therapy.

Biochemical failure (BF) was defined according to the Phoenix definition (time from end of radiotherapy to a PSA rise of ≥2 μg/L above the post-treatment nadir value) [33]. PSADT estimates were based on PSA values from immediately prior and up to 6 months after biochemical failure and were derived from the slope of the regression line of best fit through the log transformed PSA values selected. Errors in estimating time to biochemical failure and PSADT in our dataset were described previously in Lancet Oncology 2008 [17]. In this dataset, TTBF and PSADT were correlated, but not strongly so. Both variables were independently prognostic for PCSM (data not shown).
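As an illustration only (this is not the code used in the trial analysis), the two derived variables described above can be sketched as follows. The function names, the use of natural logarithms, the running-minimum reading of the nadir, and the convention of returning infinity when PSA is not rising are all assumptions made for the example. Times are taken to be months from the end of radiotherapy and PSA values to be in μg/L.

```python
import math


def phoenix_failure_time(times, psa_values, rise=2.0):
    """Return the first time at which PSA is >= (nadir so far + rise), else None.

    Assumes the observations are ordered by time and that the nadir is the
    lowest value recorded before the current measurement.
    """
    nadir = math.inf
    for t, value in zip(times, psa_values):
        if value >= nadir + rise:
            return t
        nadir = min(nadir, value)
    return None


def psa_doubling_time(times, psa_values):
    """Estimate PSADT from the least-squares slope of ln(PSA) against time.

    The result is in the same time unit as `times`.
    """
    if len(times) != len(psa_values) or len(times) < 2:
        raise ValueError("need at least two paired (time, PSA) observations")
    logs = [math.log(v) for v in psa_values]
    n = len(logs)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all measurements share the same time point")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope <= 0:
        return math.inf  # assumption: a flat or falling PSA has no finite doubling time
    return math.log(2) / slope
```

For example, PSA values of 2, 4, and 8 μg/L measured at 0, 3, and 6 months lie on a straight line on the log scale and give a PSADT of 3 months.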

2.2. Endpoints

Endpoints used in this paper were the cumulative incidences of PCSM, distant progression, and secondary therapeutic intervention (STI) from BF. PCSM occurred at the time of death due to prostate cancer (attribution of cause validated in Lancet Oncology 2011 [34]). Distant progression occurred on the date when the first evidence (clinical, radiological, or isotopic bone scan) of metastatic disease in lymph nodes, skeleton, or other site outside of the prostatic region became available. STI occurred when the first type of secondary therapy commenced.

2.3. Statistical Methods

The analysis group consisted of 485 subjects who experienced BF prior to clinical failure or STI. Three-tier post-BF risk categorization (BFRC) schemes based on low, intermediate, and high risk groups were derived in two stages: (1) 12 “cut point range-finding” schemes were identified and evaluated using combinations of TTBF and PSADT cutpoints regularly cited in the literature as being predictive of outcome following BF and (2) new “candidate” BFRC schemes were derived based on the most prognostic ranges identified in the range-finding schemes. To ensure that they were unique and consisted of sizeable risk strata, candidate schemes had to satisfy three criteria: at least three months separation between the high and low risk PSADT cutpoints; at least one year separation between the high and low risk TTBF cutpoints; and a minimum of 20% of patients in each risk stratum. All BFRC schemes were evaluated in unadjusted regression models for PCSM from BF using the method of Fine and Gray [35]. The performance of each BFRC scheme was assessed by calculating Harrell’s concordance index (c-index) [36]. The c-index is a measure of predictive discrimination and is defined as the proportion of patient pairs in which predictions and outcomes are concordant. A c-index of 0.5 indicates no predictive ability and 1.0 indicates perfect predictive accuracy. Differences between c-indices were assessed using a paired Student’s t-test. The most predictive BFRC model (i.e., with the highest c-index) was identified and internally validated by the simple bootstrap method [36] using 200 replications with replacement to obtain an optimism-corrected performance estimate.
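The two performance measures described above can be illustrated with a minimal sketch; it is not the Stata code used for the analysis. The function names are hypothetical. The c-index shown treats every non-event as censored and therefore ignores competing risks, pairs with tied times are skipped, and tied risk scores count as half concordant. The bootstrap routine takes user-supplied `fit` and `score` callables (both assumptions) and subtracts the mean optimism from the apparent performance.

```python
import random


def harrell_c_index(times, events, risk_scores):
    """Proportion of usable patient pairs in which the higher risk score fails first."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has the event strictly before time j.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


def optimism_corrected(data, fit, score, n_boot=200, seed=0):
    """Simple bootstrap: apparent performance minus the mean optimism.

    `fit(sample)` returns a model; `score(model, dataset)` returns its performance.
    """
    rng = random.Random(seed)
    apparent = score(fit(data), data)
    optimism = 0.0
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        model = fit(sample)
        # Optimism: performance on the sample the model was built on,
        # minus its performance on the original data.
        optimism += score(model, sample) - score(model, data)
    return apparent - optimism / n_boot
```

With risk scores coded 0, 1, and 2 for low, intermediate, and high risk, `harrell_c_index` gives 1.0 when every higher-risk patient fails before every lower-risk patient and 0.5 when the scores carry no information.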

Sensitivity analyses were undertaken to compare the performance rankings of the candidate schemes in a range of adjusted PCSM models. The first model adjusted for trial arm (0 versus 3 versus 6 months maximal AST) as duration of androgen suppression could influence outcome after BF. The second model adjusted for baseline factors as well as trial arm because these could determine the aggression of the relapse process. Additional covariates included age at BF (continuous, years), pretreatment PSA (<10 versus ≥10 and <20 versus ≥20 μg/L), Gleason score (2–6 versus 7 versus 8–10), and tumour stage (T2b versus T2c versus T3 and T4). The third model also adjusted for STI as an ordinal time-dependent covariate (no STI versus STI without diagnosis of distant progression versus STI after diagnosis of distant progression) because STI practices could have changed over the follow-up period. Finally, to determine if the performance of the BFRC schemes was sensitive to the method of calculating PSADT, the schemes were reconstructed and tested in unadjusted PCSM models based on PSAs over 12 months post BF.

Competing risks methodology was used to calculate the cumulative incidences of distant progression, STI and PCSM. Competing risks were defined as follows: STI and death for distant progression; death for STI; and death due to other or unknown causes for PCSM. Univariable analyses were performed to determine the cumulative incidences of these endpoints in the three strata of the best BFRC, and the strata were compared using Gray’s test.
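For readers unfamiliar with competing risks methodology, the following sketch shows one standard nonparametric way of estimating a cumulative incidence function. It is illustrative only and is not the routine used in our analysis. The function name and the coding of causes (0 for censored, positive integers for event types) are assumptions, and neither Gray’s test nor Fine and Gray regression is implemented here.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Return [(time, cumulative incidence)] for one cause among competing causes.

    causes: 0 = censored, any other value = type of first event.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_interest = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            n_here += 1
            if cause != 0:
                d_any += 1
                if cause == cause_of_interest:
                    d_interest += 1
            i += 1
        if d_any:
            # Increment = P(event-free just before t) x cause-specific hazard at t.
            cif += event_free * d_interest / at_risk
            event_free *= 1 - d_any / at_risk
            curve.append((t, cif))
        at_risk -= n_here
    return curve
```

Unlike one minus a Kaplan-Meier estimate that censors competing events, the cumulative incidences of all event types estimated in this way sum to the overall probability of any event.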

All analyses involving trial arms were conducted on an “intention to treat” basis and two-sided probability levels below 0.05 were considered significant. Analyses were performed using Stata Version 11.2.

3. Results

As of 31 August 2010, 485 (60.5%) of the 802 eligible subjects had experienced biochemical failure (BF) prior to clinical failure or STI. Of these, 343 (71%) received STI, 150 (31%) died due to prostate cancer, and 69 (14%) died due to other causes. Median follow-up time after BF was 5.6 years (IQR 3.1–8.0).

3.1. Performance of Risk Strata Based on TTBF and PSADT

Table 1 summarises the performance of the 12 initial “range-finding” BFRC schemes in unadjusted models of PCSM after BF. The lowest c-indices were associated with schemes derived using cutpoints <9 months for PSADT or <3 years for TTBF to define the high risk categories, and >5 years for TTBF to define the low risk categories. Hence these cutpoints were not used to define the new “candidate” schemes. A total of 72 evaluable schemes were constructed according to our BFRC criteria using permutations of the following cutpoints: high risk PSADT (<3, <4, <5, <6 months), high risk TTBF (<1, <1.5, <2 years), low risk PSADT (>6, >9, >12, >18, >24 months), and low risk TTBF (>2, >3, >4 years). Table 2 summarises the characteristics of 24 BFRC schemes based on the best three and worst three performing schemes for each high risk PSADT cutpoint. c-indices ranged between 0.685 and 0.732. The best schemes were characterized by a high risk category using a PSADT cutpoint of <4 or <5 months and TTBF <1 year. These schemes had low risk categories with a TTBF cutpoint >3 years and variable cutpoints for PSADT. The most predictive BFRC, with a c-index of 0.732, was defined by high risk cutpoints PSADT <4 months and/or TTBF <1 year, and low risk cutpoints PSADT >9 months and/or TTBF >3 years. It divided subjects into three sizeable risk groups, with 246 (51%) categorised as low risk and 119 (25%) as high risk. Internal validation of the best BFRC model estimated the degree of overoptimism as 0.002; the optimism-corrected value of its performance was therefore 0.730.
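The construction of the schemes can be sketched as below. This is an illustration under stated assumptions rather than the procedure actually coded for the analysis: the function and constant names are hypothetical, and because the text does not say how a subject meeting both a high risk and a low risk criterion is handled, the sketch assumes that the high risk rule is checked first. The default cutpoints are those of the most predictive scheme.

```python
from itertools import product

HIGH_PSADT = (3, 4, 5, 6)        # months; high risk if PSADT is below the cutpoint
HIGH_TTBF = (1, 1.5, 2)          # years; high risk if TTBF is below the cutpoint
LOW_PSADT = (6, 9, 12, 18, 24)   # months; low risk if PSADT is above the cutpoint
LOW_TTBF = (2, 3, 4)             # years; low risk if TTBF is above the cutpoint


def classify_bf_risk(psadt_months, ttbf_years,
                     high_psadt=4, high_ttbf=1, low_psadt=9, low_ttbf=3):
    """Assign 'high', 'low', or 'intermediate' risk after biochemical failure."""
    # Assumption: the high risk rule takes precedence over the low risk rule.
    if psadt_months < high_psadt or ttbf_years < high_ttbf:
        return "high"
    if psadt_months > low_psadt or ttbf_years > low_ttbf:
        return "low"
    return "intermediate"


def candidate_cutpoints(min_psadt_gap=3, min_ttbf_gap=1):
    """List (high PSADT, high TTBF, low PSADT, low TTBF) sets meeting both separation criteria.

    The third criterion (at least 20% of patients in each stratum) needs
    patient-level data and is deliberately not applied here.
    """
    return [
        (hp, ht, lp, lt)
        for hp, ht, lp, lt in product(HIGH_PSADT, HIGH_TTBF, LOW_PSADT, LOW_TTBF)
        if lp - hp >= min_psadt_gap and lt - ht >= min_ttbf_gap
    ]
```

Under this reading of the two separation criteria, 119 cutpoint combinations remain before the minimum stratum size criterion is applied, so the sketch does not by itself reproduce the 72 evaluable schemes reported above.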

Sensitivity analyses using multivariable models confirmed that our findings were not influenced by potentially important confounding covariables (data not shown). The rankings remained stable across all models, with the best BFRC in the unadjusted model also being the most predictive scheme in the models adjusting for trial arm (c-index 0.747), as well as for prognostic factors known at time of BF (c-index 0.751). In addition, the best BFRC in the unadjusted model was also the most predictive (c-index 0.744) in schemes derived using 12 months of PSAs post-BF to calculate PSADT instead of 6 months of PSAs.

Cumulative incidences of distant progression and STI for the best BFRC scheme are presented in Figures 1(a) and 1(b). The scheme separated the three risk categories very effectively. These figures show that the majority of high risk subjects experienced distant progression and STI within the first 2 years after BF. In contrast, the cumulative incidences of distant progression and STI in low risk subjects 7 years after BF were approximately 20% and 60%, respectively. These findings were reflected in the cumulative incidences of PCSM shown in Figure 2(a). Pairwise comparisons of cumulative incidences (low versus intermediate risk; intermediate versus high risk) were significant for all endpoints analysed. Figure 2(b) shows that the cumulative incidence of death due to causes other than prostate cancer was nearly identical in all three risk categories, as would be hoped in an effective risk categorization scheme.

Table 3 presents pre- and postprimary treatment characteristics of the three risk categories of the best BFRC scheme. From this table, it can be seen that of the 59 subjects with tumours classified by the D’Amico stratification system [37] as intermediate risk, only 6.8% developed high risk BFs, whereas 76% developed low risk BFs. In contrast, 27% of the 426 subjects with D’Amico high risk tumours developed high risk BFs and 47% developed low risk BFs.

Table 3: Pre- and post-treatment characteristics of the 485 subjects who developed biochemical (Phoenix) failure before clinical failure according to the most predictive post-biochemical failure risk category (BFRC) stratification scheme.

4. Discussion

This study has confirmed that TTBF and PSADT can be used to identify sizeable risk categories of men with poor, intermediate, and highly favourable outcomes after BF. The main strength of our study is that its findings are based on prospectively collected 10-year follow-up data from a randomised controlled clinical trial. A further strength is the internal validation of the prognostic importance of the combination of the two variables. Our sensitivity analyses confirmed that the prognostic value of the combination was not altered in multivariable models adjusting for treatment arm and other factors which could affect outcome, or in models using either 6 or 12 months of PSAs to estimate PSADT.

The optimal cutpoint ranges found to identify men at very high risk of early PCSM were PSADTs in the range <4 to <5 months and TTBF <1 year. These cutpoints were substantially lower than those found to be successful candidate surrogate endpoints for PCSM [17]. This finding is to be expected as the surrogate endpoints were derived from all eligible men on the trial (including those without BF) and measured PCSM from randomisation, whereas this study evaluated the subgroup of men who experienced BF after failing primary treatment and measured PCSM from BF.

The best risk scheme identified in this study had a modestly predictive optimism-corrected c-index of 0.730 for PCSM after BF. The high risk category for this scheme comprised men with PSADT <4 months and/or TTBF <1 year. Within a year of biochemical failure, cumulative incidences of distant progression and STI for this category were 49% and 77%, respectively. In spite of the early introduction of STI, PCSM at 5 years after BF was 45%. These data suggest that many of these high risk men had microscopic metastases at the time of BF. Had modern imaging advances been available at the time, it is quite possible that some of these men would have had imaging evidence of macroscopic metastatic disease. In either event, men in the high risk category could have benefited from immediate inclusion in trials of the new agents effective against CRPC, had they been available.

Although not the specific intention of this study, the low risk stratum in our optimal risk scheme identified a sizeable subgroup of men who could be safely reassured that their prognosis is good enough to avoid STI for many years and possibly indefinitely. The cutpoints identified in this subgroup were PSADT >9 months and/or TTBF >3 years. In these men cumulative incidences of distant progression and PCSM at 5 years were only 17% and 4%, respectively. At this time point cumulative incidence of death due to causes other than prostate cancer was 10%. This is a potentially important finding because two randomized trials designed to determine the value of early STI after biochemical failure [38, 39] were discontinued recently due to poor accrual.

For comparative purposes we have presented prognostically significant variables at biochemical failure identified in the most recently updated studies published since 2000 [5, 7–15, 40] (Table 4). In seven of these studies, Gleason score at or before treatment was used in determining a high risk stratum. In five of these, Gleason scores >7 were prognostic. All eleven studies used PSADT as a prognostic variable: three of these used PSADT cutpoints ≤12 months and one study used a PSADT cutpoint <9 months. Five examined TTBF but prognostic value for this variable was identified in only three studies. We found in our dataset that prognostically important variables prior to primary treatment, such as Gleason score, were no longer prognostic at the time of BF [16]. Amongst the explanations advanced we speculated that Gleason score based on prostatectomy findings would be more reliable in the present context than sextant fine needle biopsies or transurethral resection material. Five of the six studies in Table 4 where Gleason score was prognostic included sizeable numbers of men undergoing prostatectomy. Table 5 shows the performance of the variables identified in these studies in predicting 5-year prostate cancer-specific mortality after biochemical failure in the TROG 96.01 dataset. The low prognostic value of pre-treatment Gleason score at biochemical failure in our dataset is illustrated by its failure to add prognostic value after substratification by PSADT or TTBF. The best stratification scheme from the studies presented in Table 5 had a c-index of 0.694 when modelled on TROG 96.01 data and was devised by Freedland et al. We attribute this to the use of PSADT >9 months and TTBF >3 years as cutpoints to define a low risk category group. It demonstrates that predictions based on prostatectomy data can be validated by a radiotherapy dataset based on men with locally advanced disease.

A few limitations of our study need to be acknowledged. Firstly, it is a secondary retrospective study not prespecified in the trial protocol. However, the disciplined prospective collection of data in the context of a randomized trial, as in this study, does avoid many of the unseen selection biases that exist in most retrospective clinical studies. Secondly, the radiation dose used in the trial (66 Gy) was low by modern standards [40]. Failure at the primary site would have been more frequent in the TROG 96.01 trial than it would be following the increased radiation doses used nowadays. In addition, distant progression as a result of metastasis from uncontrolled tumour at the primary site could also have been more common. It is quite probable that preventable local progressions (i.e., due to low radiation dosage) would have been associated with prolonged PSADTs (e.g., >10 months) and TTBFs (e.g., >3 years). If this is true then the major impact of the low doses used in the TROG 96.01 trial on our stratification scheme would be to increase the size of the low risk stratum. Thirdly, as pointed out earlier, modern imaging techniques could have indicated that some men in the high risk category already had macroscopic metastatic disease. They would therefore have M1 disease and arguably would not be considered high risk. They certainly would not be eligible for inclusion in trials where the first appearance of metastatic disease is the main trial endpoint. Fourthly, due to funding difficulties, there was no centralized review of histopathological material. However, although such a review could have increased the strong prognostic value of the assigned Gleason score at the time of randomization, we are very doubtful that it could have led to a large enough number of score reassignments to render this variable prognostic after BF. Finally, it has become a common practice in the clinic to commence STI shortly after BF. It is therefore possible that outcomes could be improved as a result of earlier intervention. However, in this dataset survival was shorter in men who received earlier STI [16] rendering this point moot.

Although we performed internal validation of our findings in this paper, external validation in a wider range of clinical scenarios is important. These datasets should comprise men with different initial risk profiles, and who have undergone a wider range of curative treatments than used in this study, for example, prostatectomy alone in earlier stage disease, and long-term AST and radiation in later stage disease. In recommending external validation, however, we need to caution that the proportion of men with low risk disease prior to primary treatment who develop “high risk” biochemical failures according to our definition is likely to be very small. In our dataset only 4 (6.8%) of 59 men with intermediate risk cancer who developed BFs were classified as high risk. In those with low risk disease treated by prostatectomy, other prognostic factors, such as Gleason score, margin status, seminal vesicle, or nodal involvement at prostatectomy, may assume greater prognostic importance and might need inclusion for a risk categorization scheme to be effective. We suspect therefore that men experiencing BF after radiation and long-term AST for LAPC will be most likely to derive benefits from a risk stratification based on PSADT and TTBF.

Finally, if risk stratification schemes based on TTBF and PSADT derived shortly after biochemical failure are to be reproducible, there is a need for international consensus on the most appropriate means of calculating PSADT within months of biochemical failure and ensuring that TTBF is measured accurately [17, 41]. The stratification schemes presented in this paper were produced with PSADTs calculated using the limited number of PSA values available within 6 months of BF in this dataset. Time will tell, however, whether the calculation of PSADT using at least four PSA values within 6 months of biochemical failure in the on-going RADAR trial run by our trials group [42] will produce more accurate estimates with improved prognostic precision.

5. Conclusions

This study has shown that time to biochemical failure and PSA doubling time can be combined to define risk stratification schemes after biochemical failure in men with locally advanced prostate cancer treated with short-term androgen suppression therapy and radiotherapy. External validation of these stratification schemes is necessary, particularly in datasets evaluating long-term androgen suppression therapy and radiotherapy.

Disclosure

A. Steigler received financial support from AstraZeneca to attend a meeting. D. Lamb received financial support from AstraZeneca to attend meetings. D. Joseph received honoraria associated with membership of AstraZeneca’s Breast Cancer Medical Advisory Board. N. Spry received honoraria associated with AstraZeneca and Schering Plough. K.-H. Tai received a FROGG-AstraZeneca education grant in 2003 and was supported by AstraZeneca to attend two Casodex Investigators Meetings in 2001 and 2004.

Conflict of Interests

The authors declare that they do not have any conflict of interests.

Acknowledgments

The authors acknowledge trial funding by the Australian Government National Health and Medical Research Council (Project Grant Applications 9936572, 209801, and 455520); Hunter Medical Research Institute (Newcastle, Australia); AstraZeneca Pty Ltd. (Sydney, Australia); and Schering-Plough Pty Ltd. (Sydney, Australia). Ms. Rosemary Bradford is thanked for her skillful preparation of the paper.