Boosting Feature Selection

Abstract

It is possible to reduce the error rate of a single classifier using a classifier ensemble. However, any gain in performance is undermined by the increased computation of performing classification several times. Here the AdaboostFS algorithm is proposed which builds on two popular areas of ensemble research: Adaboost and Ensemble Feature Selection (EFS). The aim of AdaboostFS is to reduce the number of features used by each base classifer and hence the overall computation required by the ensemble. To do this the algorithm combines a regularised version of Boosting AdaboostReg [1] with a floating feature search for each base classifier.

AdaboostFS is compared using four benchmark data sets to AdaboostAll, which uses all features and to AdaboostRSM, which uses a random selection of features. Performance is assessed based on error rate, ensemble error and diversity, and the total number of features used for classification. Results show that AdaboostFS achieves a lower error rate and higher diversity than AdaboostAll, and achieves a lower error rate and comparable diversity to AdaboostRSM. However, over the other methods AdaboostFS produces a significant reduction in the number of features required for classification in each base classifier and the entire ensemble.