Abstract:

Random Forests are a successful ensemble prediction technique
that combines two sources of randomness to generate base
decision trees: bootstrapping the training instances for each tree
and considering a random subset of features at each node.
Breiman, in his introductory paper on Random Forests, claims that
they are more robust than boosting with respect to overfitting noise, and
are able to compete with boosting in terms of predictive
performance. Multiple recently published empirical studies
conducted in various application domains confirm these claims.
Random Forests use simple majority voting to combine the
predictions of the trees. However, each decision tree in a
Random Forest may contribute differently to the classification of a
given instance. In this paper, we demonstrate that the prediction
performance of Random Forests may still be improved in some
domains by replacing the combination function. Dynamic
integration, which is based on local performance estimates of base
predictors, can be used instead of majority voting. We conduct
experiments on a selection of classification datasets, analysing the
resulting accuracy, the margin and the bias and variance
components of error. The experiments demonstrate that dynamic
integration increases accuracy on some datasets. Even if the
accuracy remains the same, dynamic integration always increases
the margin. A bias/variance decomposition demonstrates that
dynamic integration decreases the error by significantly
decreasing the bias component, while leaving the variance
unchanged or increasing it only insignificantly. The experiments also
demonstrate that the intrinsic similarity measure of Random
Forests is better than the commonly used Heterogeneous
Euclidean/Overlap Metric at finding a neighbourhood for local
estimates in this context.
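The difference between the two combination functions can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dynamic-voting weight (1 minus the locally estimated error) and all numbers are assumptions introduced here for clarity.

```python
from collections import Counter

def majority_vote(tree_preds):
    """Standard Random Forest combiner: every tree gets one equal vote."""
    return Counter(tree_preds).most_common(1)[0][0]

def dynamic_voting(tree_preds, local_errors):
    """Dynamic-integration-style combiner (dynamic voting variant):
    each tree's vote is weighted by its estimated competence,
    1 - error, measured on a neighbourhood of the query instance."""
    weights = {}
    for pred, err in zip(tree_preds, local_errors):
        weights[pred] = weights.get(pred, 0.0) + (1.0 - err)
    return max(weights, key=weights.get)

# Three trees predict a class for one instance; only the first tree
# is accurate in this instance's neighbourhood (hypothetical values).
preds = ["A", "B", "B"]
local_err = [0.1, 0.6, 0.7]

print(majority_vote(preds))              # -> B
print(dynamic_voting(preds, local_err))  # -> A
```

The same per-tree predictions yield different outcomes: majority voting follows the two locally unreliable trees, while the locally weighted vote follows the tree that performs well near the instance.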