AdaBoost [Freund-Schapire, 1997] is one of the best off-the-shelf supervised classification methods developed in the last fifteen years. Despite (or perhaps due to?) its simplicity and versatility, it is suprisingly under-represented in the family of open softwares. The goal of this submission is to fill this gap.

Our implementation is based on the AdaBoost.MH algorithm [Schapire-Singer, 1999]. It is an intrinsically multi-class classification method (unlike SVM for example), and it was easy to extend to multi-label or multi-task classification (when one item can belong to several classes). The program package can be divided into four modules that can be changed more-or-less independently depending on the application.

The strong learner. It tells you how to boost. The main boosting engine is AdaBoost.MH, but we have also implemented FilterBoost for a research project, and Arc-GV which is basically straightforward once you have the main engine. Other possible strong learners could be LogitBoost and ADTrees.

The base (or weak) learner. It tells you what features to boost. Right now we have two basic (feature-wise) base learners: decision stumps for real-valued features and indicators for nominal features. We have two meta base learners: trees and products. They can use any base learner and construct a generic complex base learner using a "classic" tree-structure (decision trees), or using the product of simple base learners (self advertisement: boosting products of stumps is the best reported no-domain-knowledge algorithm on MNIST after Hinton and Salakhutdinov's deep belief nets). We have also implemented Haar filters [Viola-Jones, 2004] for image classification, a meta base learner that uses stumps over a high dimensional feature space computed "on the fly". It is a nice example of a domain dependent base learner that works hand-in-hand with its appropriate data structure.

The data representation. The basic data structure is a matrix of observations with a vector of labels. We also have multi-label classification when the label data is also a full matrix. In addition, we have sparse data representation for both the observation matrix and the label matrix. In general, base learners are implemented to work with their own data representation (for example, sparse stumps work on sparse observation matrices, or Haar filters work on a integral image data representation).

Data parser. We can read in data in arff and svmlight formats.

The base learner/data structure combinations cover a large spectrum of possible applications, but the main advantage of the package is that it is easy (for the advanced user) to adapt MultiBoost to a specific (non-standard) application by implementing the base learner and data structure interfaces that work together.

The source code is available from the website multiboost.org. It can be compiled on Mac OS X, Linux, and Microsoft Windows. The interface is command line execution with switches.

A new fast (sublinear in the number of instances) stump algorithm is implemented. The gain in time is proportional to the sparsity of the features (it is significant when a lot of instances take the most frequent feature value). See Section B.2 in the documentation.

A parametrized early stopping option is added in --traintest mode. We stop if the (smoothed) test error does not improve for a certain number of iterations. See Section 4.1.3 in the documentation.