The combination of classifiers has long been proposed as a way to improve on the
accuracy that any single classifier achieves in isolation.
In contrast to well-explored methods such as boosting and bagging,
we are interested in ensemble methods that
combine heterogeneous sets of classifiers,
that is, classifiers built under different learning paradigms.
We focus on a theoretical and experimental comparison of five such combination methods:
majority vote, a Bayesian method, a Dempster-Shafer method,
behavior-knowledge space, and logistic regression.
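For concreteness, the sketch below shows the simplest of these rules, plurality (majority) vote over heterogeneously trained members. The scikit-learn learners and the Iris data are stand-ins chosen purely for illustration; they are not the combiners, learners, or data sets studied here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Three members built under different learning paradigms.
members = [DecisionTreeClassifier(random_state=0),
           GaussianNB(),
           KNeighborsClassifier(n_neighbors=5)]

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each member's predicted labels on the test set: shape (n_members, n_test).
votes = np.array([m.fit(X_tr, y_tr).predict(X_te) for m in members])

# Plurality vote: for each test point, the most frequent predicted label
# (ties broken toward the lowest label index, via argmax).
ensemble = np.array([np.bincount(col).argmax() for col in votes.T])

for m, p in zip(members, votes):
    print(type(m).__name__, (p == y_te).mean())
print("majority vote:", (ensemble == y_te).mean())
```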
We have developed an upper bound on the accuracy
obtainable by any of the five combination methods,
and show that this bound can be used to determine whether
an ensemble is likely to improve on the performance of its members.
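The bound itself is developed in the body of the paper and is not reproduced here. As a hedged illustration of the idea only, the sketch below computes one natural empirical ceiling of this flavor: no deterministic combiner that sees only the members' discrete outputs can exceed the accuracy of predicting, for each observed pattern of member outputs, the true label most often associated with that pattern. Treating this oracle-over-patterns ceiling as the relevant bound is an assumption of the sketch, not a claim about our result.

```python
from collections import Counter, defaultdict
import numpy as np

def combiner_upper_bound(votes, y_true):
    """Empirical upper bound on the accuracy of any deterministic
    combiner that sees only the members' discrete outputs.

    votes  : array of shape (n_members, n_samples), predicted labels
    y_true : array of shape (n_samples,), true labels

    For each distinct pattern of member outputs, no such combiner can
    do better than predicting the most frequent true label observed
    with that pattern; summing these best cases bounds the accuracy.
    (Assumption: this ceiling stands in for the paper's bound, which
    is not stated in the abstract.)
    """
    by_pattern = defaultdict(list)
    for pattern, label in zip(map(tuple, np.asarray(votes).T), y_true):
        by_pattern[pattern].append(label)
    correct = sum(Counter(labels).most_common(1)[0][1]
                  for labels in by_pattern.values())
    return correct / len(y_true)
```

Applied to the `votes` and `y_te` arrays from the previous sketch, `combiner_upper_bound(votes, y_te)` yields the ceiling against which each combiner's test accuracy can be compared: if the ceiling barely exceeds the best member's accuracy, no combination rule can help much.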
We have conducted a series of experiments using standard data sets and learning methods,
and compared the experimental results with these theoretical expectations.