Introduction: Several authors have underscored a strong relation between the molecular subtypes and the axillary status of breast cancer patients. The aim of our work was to decipher the interaction between this classification and the probability of a positive sentinel node biopsy.

Materials and methods: Our dataset comprised 2654 early-stage breast cancer patients. We selected patients whose initial treatment was conservative breast surgery plus sentinel node biopsy. A multivariate logistic regression model was trained and validated. The interaction covariate between the ER and HER2 markers was a forced input of this model. The performance of the multivariate model in the training set and the two validation sets was analyzed in terms of discrimination and calibration. The probability of axillary metastasis was detailed for each molecular subtype.

Results: The interaction covariate between ER and HER2 status was a stronger predictor (p = 0.0031) of positive sentinel node biopsy than the ER status by itself (p = 0.016). A multivariate model to determine the probability of sentinel node positivity was defined with the following variables: tumour size, lympho-vascular invasion, molecular subtype and age at diagnosis. This model showed similar results in terms of discrimination (AUC = 0.72/0.73/0.72) and calibration (Hosmer-Lemeshow p = 0.28/0.05/0.11) in the training set and the two validation sets. The interaction between molecular subtype, tumour size and sentinel node status was approximated.

Discussion: We showed that biologically driven analyses can yield new models with higher performance in predicting the axillary status of breast cancer patients. The molecular subtype classification strongly interacts with the axillary and distant metastasis process.

pone-0020297-g003: Percentage of positive sentinel nodes. Percentage of positive sentinel nodes calculated for each 5 mm tumour size subclass from 0 to 40 mm. The number of patients in each tumour size subclass is printed. The training and two validation datasets were merged to produce these probability plots.
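
The tabulation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `positive_rate_by_size`, the input layout (pairs of tumour size in mm and a positive/negative flag) and the half-open bin convention [lower, lower + 5) are all assumptions, since the caption does not state how boundary values were assigned.

```python
from collections import defaultdict


def positive_rate_by_size(patients, width_mm=5, max_mm=40):
    """Tabulate sentinel node positivity per tumour size subclass.

    `patients` is an iterable of (tumour_size_mm, sentinel_positive) pairs.
    Bins are half-open, [lower, lower + width_mm); this boundary rule is an
    assumption, as is dropping sizes outside [0, max_mm).
    Returns {(lower, upper): (n_patients, percent_positive)}.
    """
    counts = defaultdict(lambda: [0, 0])  # lower bound -> [n_patients, n_positive]
    for size_mm, positive in patients:
        if not 0 <= size_mm < max_mm:
            continue  # outside the plotted range
        lower = int(size_mm // width_mm) * width_mm
        counts[lower][0] += 1
        counts[lower][1] += 1 if positive else 0
    return {
        (lower, lower + width_mm): (n, 100.0 * k / n)
        for lower, (n, k) in sorted(counts.items())
    }


# Made-up toy data, for illustration only (not study data).
toy = [(3, False), (4, True), (12, True), (14, True), (38, False), (45, True)]
table = positive_rate_by_size(toy)
```

Running the same function once per molecular subtype would give one curve per subgroup, with the patient count of each subclass available for printing alongside the percentage.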

Mentions:
Table 1 summarizes the training set (1543 patients) and the two validation sets (615 and 496 patients). These three populations differ significantly in terms of age at diagnosis, ER status, HER2 status, histological grade, lympho-vascular invasion, histological subtypes, number of sentinel nodes removed and number of positive sentinel node biopsies. These differences are of major interest in a validation process, because they test the robustness of a classification algorithm. The training set (Table 2) was composed of 516 patients with a positive sentinel node biopsy (33.4%) and 1027 patients with a negative sentinel node biopsy (66.6%). Patients with a positive sentinel node biopsy differed from those with a negative biopsy in terms of age at diagnosis, ER status, pathological tumour size, histological grade, mitotic index, lympho-vascular invasion and number of sentinel nodes removed. The proportion of patients with a positive HER2 status was not significantly different between the two groups [8.6% vs 7.6%, p = 0.58]. The interaction covariate between ER and HER2 status [ERneg HER2neg, ERpos HER2neg, ERpos HER2pos, ERneg HER2pos] was a stronger predictor (p = 0.0031) of positive sentinel node biopsy than the ER status by itself (p = 0.016). We designed a multivariate logistic regression model to determine the probability of having a positive sentinel node biopsy (Table 3). The initial input was based on the variables found significant in the univariate analysis. The final inputs into this model were tumour size, lympho-vascular invasion, age at diagnosis and the molecular subtype classification, as defined by the interaction covariate between the ER and HER2 status. Odds ratios, confidence intervals and p-values are summarized in Table 3. The logistic regression parameters indicate the relative degree to which each of these variables is associated with nodal metastasis.
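
To make the structure of such a model concrete, the sketch below encodes the ER × HER2 interaction covariate as a four-level categorical variable and evaluates a logistic model on the four final inputs. It is an illustration under stated assumptions: every coefficient is a hypothetical placeholder (the fitted values are those of Table 3, which is not reproduced here), and the names `molecular_subtype`, `predicted_probability` and `COEF` are introduced only for this example.

```python
import math

# The four levels of the ER x HER2 interaction covariate named in the text.
SUBTYPES = ("ERneg HER2neg", "ERpos HER2neg", "ERpos HER2pos", "ERneg HER2pos")


def molecular_subtype(er_positive, her2_positive):
    """Combine ER and HER2 status into a single four-level covariate."""
    er = "ERpos" if er_positive else "ERneg"
    her2 = "HER2pos" if her2_positive else "HER2neg"
    return f"{er} {her2}"


# HYPOTHETICAL coefficients on the log-odds scale, chosen only so that the
# example runs; they are NOT the fitted values reported in Table 3.
COEF = {
    "intercept": -2.0,
    "size_mm": 0.05,
    "lvi": 0.9,
    "age_years": -0.01,
    "subtype": {
        "ERneg HER2neg": 0.0,  # reference level (assumed)
        "ERpos HER2neg": 0.5,
        "ERpos HER2pos": 0.6,
        "ERneg HER2pos": 0.4,
    },
}


def predicted_probability(size_mm, lvi, age_years, er_positive, her2_positive, coef=COEF):
    """Logistic model: p = 1 / (1 + exp(-z)), with z the linear predictor."""
    z = (
        coef["intercept"]
        + coef["size_mm"] * size_mm
        + coef["lvi"] * (1.0 if lvi else 0.0)
        + coef["age_years"] * age_years
        + coef["subtype"][molecular_subtype(er_positive, her2_positive)]
    )
    return 1.0 / (1.0 + math.exp(-z))


# Example call with made-up patient characteristics.
p = predicted_probability(size_mm=18, lvi=True, age_years=55,
                          er_positive=True, her2_positive=False)
```

Treating the subtype as one categorical input, rather than entering ER and HER2 as two separate main effects, lets each of the four combinations carry its own coefficient, which is what allows the interaction to be tested as a single predictor.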
The performance of the multivariate model in the training set and the two validation sets was analyzed in terms of discrimination and calibration. ROC curves are plotted in Figure 1a. They showed very similar areas under the curve (AUC) for each population [training set AUC = 0.72 (95% CI, 0.69–0.75), Institut Curie validation set AUC = 0.73 (95% CI, 0.68–0.78), Hôpital Tenon validation set AUC = 0.72 (95% CI, 0.67–0.77)]. Calibration curves are plotted in Figure 1b. The logistic model was well calibrated, with no significant difference between the predicted and the observed probability in the training set and the two validation sets. The Hosmer-Lemeshow goodness-of-fit statistic showed similar results when applied to each dataset (Institut Curie training set p = 0.28, Institut Curie validation set p = 0.05, Hôpital Tenon validation set p = 0.11). Using the multivariate logistic regression model, a nomogram was built (Figure 2). Finally, we analyzed the correlation between tumour size and the probability of a positive sentinel node biopsy for each molecular subtype (Figure 3, Table 4). The correlation line was almost flat in the ER negative HER2 negative subgroup: the probability of axillary metastasis remained at roughly 20% whatever the tumour size. Both ER positive tumour subgroups (HER2 negative or HER2 positive) showed an intermediate slope, and the ER negative HER2 positive tumour subgroup showed the steepest slope. Tumour size was a major determinant of axillary metastasis development only in the HER2 positive or ER positive tumour subgroups. For breast cancers smaller than 30 mm, sentinel node biopsy was associated with an axillary metastasis rate of less than 30% in the ER negative HER2 negative subgroup and of more than 50% in the other three subgroups.
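
For readers unfamiliar with the two performance measures, the following is a minimal sketch of how discrimination (AUC) and calibration (the Hosmer-Lemeshow statistic) can be computed from observed outcomes and predicted probabilities. It is not the authors' implementation; the function names, the choice of ten equal-sized risk groups and the toy inputs are assumptions made for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_statistic(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over `groups` risk groups.

    Cases are sorted by predicted probability and cut into groups of nearly
    equal size; each group contributes (O - E)^2 / (n * pbar * (1 - pbar)),
    where O is the observed number of events, E the sum of predicted
    probabilities and pbar the mean predicted probability.
    """
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])  # stable sort on probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


# Made-up toy inputs, for illustration only (not study data).
toy_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The p-value is conventionally obtained by comparing the statistic with a chi-square distribution with `groups - 2` degrees of freedom; a large p-value means no significant difference was detected between predicted and observed probabilities.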
For each molecular subtype, as defined by a combination of the ER and HER2 immunohistochemistry markers, we summarized (Table 5) eight publications addressing the percentage of axillary metastases [9]–[16].
