Abstract

In this paper we consider the generalization
accuracy of classification methods based on the iterative use of
linear classifiers. The resulting classifiers, which we call threshold decision lists, act as follows. Some points of the data
set to be classified are given a particular classification
according to a linear threshold function (or hyperplane). These
are then removed from consideration, and the procedure is iterated
until all points are classified. Geometrically, we can imagine
that at each stage, points of the same classification are
successively chopped off from the data set by a hyperplane.
We analyse theoretically the generalization properties of data classification
techniques based on threshold decision lists and on the special subclass of
multilevel threshold functions. We present bounds on the generalization error
in a standard probabilistic learning framework. The primary focus of this paper
is on obtaining generalization error bounds that depend on the levels of
separation, or margins, achieved by the successive linear classifiers. We also
improve and extend previously published
theoretical bounds on the generalization ability of perceptron
decision trees.
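
As an illustration of the procedure described above, the following minimal sketch evaluates a threshold decision list: a point receives the label attached to the first hyperplane test it passes. The names `Stage`, `classify`, and `default`, the convention that a test passes when the weighted sum is at least the threshold, and the toy weights are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# One stage of the list: (weights, threshold, label).
# Hypothetical representation chosen for this sketch.
Stage = Tuple[Sequence[float], float, int]


def classify(stages: List[Stage], x: Sequence[float], default: int = 0) -> int:
    """Return the label of the first stage whose linear threshold test x passes.

    A stage's test passes when the weighted sum of x is at least the
    threshold (an assumed convention). If no test passes, return `default`.
    """
    for weights, threshold, label in stages:
        activation = sum(w * xi for w, xi in zip(weights, x))
        if activation >= threshold:
            return label
    return default


# Toy list in the plane (assumed values).
stages = [
    ([1.0, 0.0], 2.0, 1),  # points with first coordinate >= 2 get label 1
    ([0.0, 1.0], 1.0, 0),  # of the rest, points with second coordinate >= 1 get label 0
]

print(classify(stages, [3.0, 5.0]))
print(classify(stages, [0.0, 2.0]))
```

Each stage corresponds to one hyperplane that "chops off" a group of points with a common label; the order of the stages matters, because a point is labelled by the first test it satisfies.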