Abstract: Automatic classification of high-resolution mass spectrometry
proteomic data has increasing potential in the early diagnosis of
cancer. We present and discuss a new procedure for biomarker discovery
in serum protein profiles, based on:

(1) discrete wavelet transformation of the spectra,

(2) selection of discriminative wavelet coefficients by a statistical test,

(3) building and evaluating a support vector machine classifier by
double cross-validation with attention to the generalizability of the
results.
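
The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet family, the statistical test, the SVM kernel, or the fold counts, so the Haar wavelet, the two-class F-test (equivalent to a pooled-variance t-test), the linear kernel, the parameter grid, and the synthetic data are all placeholders chosen for illustration. The function names `haar_dwt` and `double_cross_validate` are hypothetical.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def haar_dwt(X, levels=3):
    """Multi-level Haar DWT of each row (assumed wavelet).

    The row length must be divisible by 2**levels.
    """
    approx = np.asarray(X, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)         # approximation coefficients
    # Coarsest approximation first, then details from coarse to fine.
    return np.hstack([approx] + details[::-1])


def double_cross_validate(X, y, seed=0):
    """Outer folds estimate performance; inner folds tune k and C.

    Coefficient selection is part of the pipeline, so it is refit on
    training data only and never sees the held-out spectra.
    """
    pipe = Pipeline([
        ("dwt", FunctionTransformer(haar_dwt)),
        ("select", SelectKBest(f_classif)),   # assumed statistical test
        ("svm", SVC(kernel="linear")),        # assumed kernel
    ])
    grid = {"select__k": [5, 20], "svm__C": [0.1, 1.0]}  # illustrative grid
    inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    search = GridSearchCV(pipe, grid, cv=inner)
    return cross_val_predict(search, X, y, cv=outer)


# Synthetic stand-in for spectra: Gaussian noise plus one artificial peak
# that is present only in class 1. These are NOT the data of the study.
rng = np.random.default_rng(0)
n_cancer, n_control, n_points = 66, 50, 256
X = rng.normal(size=(n_cancer + n_control, n_points))
y = np.array([1] * n_cancer + [0] * n_control)
X[y == 1, 100:108] += 2.0

y_pred = double_cross_validate(X, y)
recognition_rate = float(np.mean(y_pred == y))
print(f"recognition rate on synthetic data: {recognition_rate:.3f}")
```

Placing the selection step inside the pipeline is the design choice that matters here: if the discriminative coefficients were chosen once on all spectra, the outer-fold estimate would be optimistically biased.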

In addition to the evaluation results (total recognition rate,
sensitivity and specificity), the procedure provides the biomarker
patterns, i.e. the parts of spectra which discriminate cancer and
control individuals. The evaluation was performed on MALDI-TOF serum
protein profiles of 66 colorectal cancer patients and 50 controls. Our
procedure provided a high recognition rate (97.3%), sensitivity
(98.4%), and specificity (95.8%). The extracted biomarker patterns
mostly represent the peaks expressing mean differences between the
cancer and control spectra. However, we show that the discriminative
power of a peak is not simply expressed by its mean height and cannot
be derived from a comparison of the mean spectra. The obtained classifiers
have high generalization power, as measured by the number of support
vectors. This guards against overfitting and contributes to the
reproducibility of the results, which is required to find
biomarkers differentiating cancer patients from healthy individuals.
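
For reference, the three evaluation measures named above (total recognition rate, sensitivity, specificity) can be computed from true and predicted labels as in the sketch below. The helper name `evaluation_rates`, the label coding (1 for cancer, 0 for control), and the toy labels are assumptions made for illustration; they are not results from the study.

```python
def evaluation_rates(y_true, y_pred, positive=1):
    """Return recognition rate, sensitivity and specificity.

    `positive` is the label treated as the cancer class (assumed coding).
    Both classes must be present in `y_true`.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "recognition_rate": (tp + tn) / len(pairs),  # all correct / all cases
        "sensitivity": tp / (tp + fn),               # correct among cancer cases
        "specificity": tn / (tn + fp),               # correct among controls
    }


# Toy labels: four cancer cases (one missed) and two controls (both correct).
rates = evaluation_rates([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0])
print(rates)
```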