Object detection is a fundamental problem in computer vision. For
such applications as image indexing, simply knowing the presence or
absence of an object is useful. Detection of faces, in particular, is
a critical part of face recognition and is critical for systems
that interact with users visually.

Techniques for addressing the object detection problem include those
matching two- and three-dimensional geometric models to images, and
those using a collection of two-dimensional images of the object for
matching. This dissertation will show that the latter view-based
approach can be effectively implemented using artificial neural
networks, allowing the detection of upright, tilted, and non-frontal
faces in cluttered images. In developing a view-based object detector
using machine learning, three main subproblems arise. First, images
of objects such as faces vary considerably with lighting, occlusion,
pose, facial expression, and identity. When possible, the detection
algorithm should explicitly compensate for these sources of variation,
leaving as little unmodelled variation as possible to be learned.
Second, one or more neural networks must be trained to deal with all
remaining variation in distinguishing objects from non-objects.
Third, the outputs from multiple detectors must be combined into a
single decision about the presence of an object.

This thesis introduces some solutions to these subproblems for the
face detection domain. A neural network first estimates the
orientation of any potential face. The image is then rotated to an
upright orientation and preprocessed to improve contrast, reducing its
variability. Next, the image is fed to a frontal, half profile, or
full profile face detection network. Supervised training of these
networks requires examples of faces and nonfaces. Face examples are
generated by automatically aligning labelled face images to one
another. Nonfaces are collected by an active learning algorithm,
which adds false detections into the training set as training
progresses. Arbitration among multiple networks, together with
heuristics such as the fact that faces rarely overlap in images,
improves accuracy. Use of fast candidate face selection, skin color detection,
and change detection allows the upright and tilted detectors to run
fast enough for interactive demonstrations, at the cost of slightly
lower detection rates.
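
The heuristic that faces rarely overlap can be illustrated with a
small sketch. It is a greedy variant written for illustration, not
necessarily the procedure used in this thesis: the `Detection` record,
its square-window geometry, and the `score` field are all assumptions.
Candidate windows are visited from highest to lowest score, and a
window is kept only if it does not intersect one already kept.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical detection record: a square window plus a confidence."""
    x: int        # left edge of the window, in pixels
    y: int        # top edge of the window, in pixels
    size: int     # side length of the square window, in pixels
    score: float  # detector confidence (assumed: higher is better)


def overlaps(a: Detection, b: Detection) -> bool:
    """Return True if the two square windows share any area."""
    return (a.x < b.x + b.size and b.x < a.x + a.size and
            a.y < b.y + b.size and b.y < a.y + a.size)


def remove_overlaps(detections: List[Detection]) -> List[Detection]:
    """Greedily keep the highest-scoring windows that do not overlap."""
    kept: List[Detection] = []
    for det in sorted(detections, key=lambda d: d.score, reverse=True):
        if not any(overlaps(det, k) for k in kept):
            kept.append(det)
    return kept


if __name__ == "__main__":
    candidates = [
        Detection(0, 0, 20, 0.9),
        Detection(5, 5, 20, 0.6),    # overlaps the first, lower score
        Detection(40, 40, 20, 0.7),  # separate region
    ]
    for det in remove_overlaps(candidates):
        print(det)
```

In this sketch a weaker window is discarded whenever it intersects a
stronger one at all; a real system might instead tolerate a small
amount of overlap or merge nearby detections.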

The system has been evaluated on several large sets of grayscale test
images, which contain faces of different orientations against
cluttered backgrounds. On their respective test sets, the upright
frontal detector finds 86.0% of 507 faces, the tilted frontal
detector finds 85.7% of 223 faces, and the non-frontal detector finds
56.2% of 96 faces. The differing detection rates reflect the
relative difficulty of these problems. Comparisons with several other
state-of-the-art upright frontal face detection systems will be
presented, showing that our system has comparable accuracy. The
system has been used successfully in the Informedia video indexing and
retrieval system, the Minerva robotic museum tour-guide, the WebSeer
image search engine for the WWW, and the Magic Morphin' Mirror
interactive video system.
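
The detection rates quoted above can be turned into approximate face
counts with simple arithmetic. The sketch below only multiplies each
reported rate by its test-set size and rounds to the nearest integer;
the resulting counts are estimates implied by the rounded percentages,
not figures stated in the text, and the names `REPORTED` and
`approx_detected` are hypothetical.

```python
# Reported (detection rate in percent, number of faces) per detector.
REPORTED = {
    "upright frontal": (86.0, 507),
    "tilted frontal": (85.7, 223),
    "non-frontal": (56.2, 96),
}


def approx_detected(rate_percent: float, total: int) -> int:
    """Estimate the number of detected faces from a rounded percentage."""
    return round(rate_percent / 100 * total)


if __name__ == "__main__":
    for name, (rate, total) in REPORTED.items():
        found = approx_detected(rate, total)
        print(f"{name}: about {found} of {total} faces detected, "
              f"{total - found} missed")
```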