Although Neural Networks (NNs) are an effective tool in many applications, a single NN may be inefficient for solving complex tasks. To tackle this problem, we may combine a set of NNs to construct an NN ensemble capable of solving the initial problem, providing an easier design solution and making the resulting machine easier to interpret. These reasons have increased interest in this research area in recent years. Among NN ensembles, boosting methods, and in particular AdaBoost, are attractive because of their simple conceptual principles and their good generalization performance.
In this Ph.D. Thesis, we start from the Real AdaBoost (RA) algorithm, whose emphasis function can be decomposed into the product of two factors: the first depends on the quadratic error of each sample, and the second is a function of the "proximity" of the sample to the classification border. This decomposition makes it possible to generalize the structure of the RA emphasis function by introducing an adjustable mixing parameter λ to control the trade-off between both emphasis terms; the algorithm resulting from this proposal is referred to as RA with weighted emphasis (RA-we). Experiments show that a significant improvement over the classical RA performance can be achieved if the mixing parameter λ is adequately selected. However, finding the optimal λ is not always easy, and Cross Validation selection methods do not fully exploit the potential of the mixed emphasis function.
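As an illustrative sketch of the idea (the exact functional form used in the Thesis is not reproduced here; the exponential form, the sign conventions, and the `lam` parameter below are assumptions for illustration only), a mixed emphasis weighting combining an error term and a border-proximity term could look like:

```python
import numpy as np

def mixed_emphasis(outputs, labels, lam):
    """Illustrative mixed emphasis weighting (assumed exponential form).

    outputs: real-valued learner outputs f(x_n), roughly in [-1, 1]
    labels:  true labels y_n in {-1, +1}
    lam:     mixing parameter in [0, 1] trading off the two emphasis terms
    """
    err_term = (outputs - labels) ** 2   # quadratic error of each sample
    prox_term = outputs ** 2             # small when the sample is near the border f(x) = 0
    # lam = 1 emphasizes erroneous samples; lam = 0 emphasizes border samples
    weights = np.exp(lam * err_term - (1 - lam) * prox_term)
    return weights / weights.sum()       # normalize to a distribution over samples

# Three samples with true label +1: one well classified, one near the
# border, one badly misclassified (toy values, for illustration only).
w = mixed_emphasis(np.array([0.9, -0.1, -0.8]), np.array([1, 1, 1]), lam=0.5)
```

With an intermediate `lam`, the misclassified sample receives the largest weight, while samples close to the classification border are weighted above confidently correct ones, which is the trade-off the mixing parameter is meant to control.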
Following this research line, this Dissertation also explores two alternatives for selecting the mixing parameter. Rather than trying to find the best value of λ, the first proposal combines the outputs of a number of RA-we networks trained with different values of λ; in this way, we take advantage of the diversity introduced by the mixing coefficient to build committees of RA-we networks. The second approach considers a generalized version of the learner edge defined by the RA algorithm (a weighted correlation between the learner's output and the true labels) as an indication of learner quality, and proposes to dynamically adjust the mixing parameter during the ensemble growth: at each round, we select the value that provides the learner with the largest generalized edge.
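The dynamic selection step can be sketched as follows (a toy illustration, not the Thesis's actual procedure: the candidate outputs, the λ grid, and the uniform starting weights are made-up values, and the edge is written in its basic weighted-correlation form):

```python
import numpy as np

def edge(weights, outputs, labels):
    # Edge as a weighted correlation between learner outputs and true labels
    return float(np.sum(weights * outputs * labels))

# Hypothetical outputs of candidate learners, each trained under a
# different value of the mixing parameter lam (toy numbers).
labels = np.array([1, -1, 1, 1])
candidates = {
    0.25: np.array([0.2, -0.9, 0.4, -0.1]),
    0.50: np.array([0.6, -0.5, 0.7, 0.4]),
    0.75: np.array([0.9, 0.1, 0.8, 0.5]),
}
weights = np.full(len(labels), 1.0 / len(labels))  # uniform emphasis at this round

# Dynamic selection: keep the lam whose learner attains the largest edge
best_lam = max(candidates, key=lambda lam: edge(weights, candidates[lam], labels))
```

In an actual boosting round the emphasis weights would come from the mixed emphasis function of the previous iteration rather than being uniform; the point of the sketch is only the selection rule: among the candidate values of λ, retain the one whose learner maximizes the generalized edge.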
The effectiveness of these two approaches is corroborated over several benchmark binary decision problems, showing the efficacy of the mixed emphasis approach, as well as the appropriateness of both schemes for selecting λ: (1) committees of RA-we networks, and (2) dynamic λ selection. Finally, we conclude that, in comparison with traditional RA algorithms, the algorithms described in this Thesis present interesting possibilities for building multi-net systems.