Purpose: To investigate the potential of FDG-PET image-derived features for predicting treatment outcomes in head and neck cancer.

Methods: A cohort of 67 patients with histologically proven head and neck squamous cell carcinoma (HNSCC) was retrospectively evaluated. All patients underwent an FDG-PET scan before receiving radical radiotherapy (n=7) or chemoradiotherapy (n=60). The median follow-up was 30 months (range: 4-71). Treatment failure (TF), defined as tumor recurrence and/or distant metastases (DM), was reported for 11 patients, 8 of whom developed DM. Eleven features were extracted from the FDG-PET tumor region: 6 texture features (energy, entropy, homogeneity, contrast, correlation and variance), 2 SUV measures (SUVmax and % inactive volume) and 3 shape features (volume, solidity and eccentricity). Multivariable modeling was performed using ensembles of logistic regression (LR) classifiers, and classification performance was assessed with receiver operating characteristic (ROC) metrics under leave-one-out cross-validation (LOO-CV). To account for class imbalance, each LR ensemble replicated all TF/DM instances (n=11/8) in every one of an optimal number M of partitions, randomly distributed the non-TF/DM instances (n=56/59) across the M partitions (100 LOO-CV repetitions), and finally averaged the LR responses of the partitions.
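The partition-ensemble scheme can be illustrated with a minimal plain-Python sketch. It assumes a batch-gradient-descent logistic regression with a small L2 penalty, and it assumes that the repetitions re-draw the random partitions and average each held-out patient's score. The function names (`fit_logistic`, `fit_partition_ensemble`, `loo_ensemble_scores`, etc.), the optimizer and all hyperparameters are hypothetical choices for illustration, not the study's implementation; feature selection and feature standardization are omitted.

```python
import math
import random

# Hypothetical sketch: names, optimizer and hyperparameters are illustrative assumptions.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=300, l2=1e-3):
    """Fit logistic regression (weights w, intercept b) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        # Average the gradient; a small L2 penalty keeps weights finite on separable data.
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def fit_partition_ensemble(X, y, m, rng):
    """Train m LR models: each sees ALL minority (y=1) cases plus a
    random, disjoint share of roughly 1/m of the majority (y=0) cases."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    if m > len(neg):
        raise ValueError("m cannot exceed the number of majority cases")
    rng.shuffle(neg)  # neg is a fresh list, so the caller's data is untouched
    models = []
    for k in range(m):
        neg_k = neg[k::m]  # every m-th shuffled majority case goes to partition k
        models.append(fit_logistic(pos + neg_k, [1] * len(pos) + [0] * len(neg_k)))
    return models


def ensemble_score(models, x):
    """Average the predicted probabilities of the partition models."""
    return sum(sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
               for w, b in models) / len(models)


def loo_ensemble_scores(X, y, m, n_repeats=1, seed=0):
    """Leave-one-out scores, averaged over n_repeats random partitionings."""
    rng = random.Random(seed)
    totals = [0.0] * len(X)
    for _ in range(n_repeats):
        for i in range(len(X)):
            X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
            models = fit_partition_ensemble(X_train, y_train, m, rng)
            totals[i] += ensemble_score(models, X[i])
    return [t / n_repeats for t in totals]
```

With the study's TF counts, M=7 would split the 56 non-TF cases into partitions of 8, each paired with all 11 TF cases (one case fewer in the relevant class when that patient is held out). For DM, M=8 would split the 59 non-DM cases into partitions of 7 or 8, each paired with all 8 DM cases, so every LR model is trained on a roughly balanced subset.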

Results: For TF prediction with M=7, the feature subset yielding the highest area under the ROC curve (AUC) comprised entropy, variance, volume and solidity. This model reached an AUC of 0.73 (sensitivity 0.74, specificity 0.63). Similarly, an analogous model for DM prediction with M=8 (energy, variance, volume and solidity) reached an AUC of 0.77 (sensitivity 0.78, specificity 0.67).
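How an AUC and a sensitivity/specificity pair can be obtained from per-patient LOO-CV scores is shown in the plain-Python sketch below. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counted as one half). The abstract does not state how the reported operating point was chosen, so the sketch assumes Youden's J (sensitivity + specificity - 1); the function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and return (sensitivity, specificity)."""
    tp = sum(1 for s, t in zip(scores, labels) if t == 1 and s >= threshold)
    fn = sum(1 for s, t in zip(scores, labels) if t == 1 and s < threshold)
    tn = sum(1 for s, t in zip(scores, labels) if t == 0 and s < threshold)
    fp = sum(1 for s, t in zip(scores, labels) if t == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_operating_point(scores, labels):
    """Assumed rule: pick the threshold maximising J = sensitivity + specificity - 1.
    Returns (threshold, sensitivity, specificity)."""
    best = None
    for thr in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, thr)
        if best is None or sens + spec - 1 > best[0]:
            best = (sens + spec - 1, thr, sens, spec)
    return best[1:]
```

For instance, the toy scores [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1] with labels [1, 1, 1, 0, 0, 0, 0] (not study data) give an AUC of 11/12, since 11 of the 12 positive/negative pairs are ranked correctly, and the Youden threshold of 0.4 yields sensitivity 1.0 and specificity 0.75.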

Conclusion: Our results demonstrate the feasibility of using prognostic models that combine tumor shape and FDG-PET texture features to predict treatment outcomes in HNSCC. The ensemble methodology used in this study allowed imbalanced data to be modeled without compromising either the sensitivity or the specificity of the LR classifiers.