Recent research has considered the relationship between boosting and
more standard statistical methods, such as logistic regression,
concluding that AdaBoost is similar but somehow still very different
from statistical methods in that it minimizes a different loss
function. In
this paper we derive an equivalence between AdaBoost and the dual of a
convex optimization problem. In this setting, it is seen that the
only difference between minimizing the exponential loss used by
AdaBoost and maximum likelihood for exponential models is that the
latter requires the model to be normalized to form a conditional
probability distribution over labels; the two methods minimize the
same Kullback-Leibler divergence objective function subject to
identical feature constraints. In addition to establishing a simple
and easily understood connection between the two methods, this
framework enables us to derive new regularization procedures for
boosting that directly correspond to penalized maximum likelihood.
Experiments on UCI datasets, comparing exponential loss and maximum
likelihood for parallel and sequential update algorithms, confirm our
theoretical analysis, indicating that AdaBoost and maximum likelihood
typically yield identical results as the number of features increases
to allow the models to fit the training data.