Caltech 10,000 Web Faces

The dataset contains images of people collected from the web by typing common
given names into Google Image Search. The coordinates of the eyes, the nose and
the center of the mouth of each frontal face are provided in a ground truth
file. This information can be used to align and crop the faces, or as ground
truth for evaluating a face detection algorithm. The dataset has 10,524 human
faces at various resolutions and in different settings, e.g. portrait images
and groups of people. Profile faces and very low-resolution faces are not
labeled.
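As an illustration of how the ground truth can be used to align and crop faces, here is a minimal Python sketch. It assumes that each line of the ground truth file holds an image filename followed by eight numbers: the x and y coordinates of the left eye, right eye, nose and mouth center, in that order. Check the actual file before relying on this layout. The names `FaceLandmarks`, `parse_ground_truth_line` and `crop_box`, and the choice of a square box whose side is twice the distance between the eyes, are hypothetical and not part of the dataset.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass
class FaceLandmarks:
    filename: str
    left_eye: Point
    right_eye: Point
    nose: Point
    mouth: Point


def parse_ground_truth_line(line: str) -> FaceLandmarks:
    # ASSUMED layout: filename, then x y pairs for left eye, right eye, nose, mouth.
    parts = line.split()
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}: {line!r}")
    v = [float(p) for p in parts[1:]]
    return FaceLandmarks(parts[0], (v[0], v[1]), (v[2], v[3]), (v[4], v[5]), (v[6], v[7]))


def interocular_distance(face: FaceLandmarks) -> float:
    # Euclidean distance between the two eye coordinates, in pixels.
    (x1, y1), (x2, y2) = face.left_eye, face.right_eye
    return math.hypot(x2 - x1, y2 - y1)


def crop_box(face: FaceLandmarks, scale: float = 2.0) -> Tuple[float, float, float, float]:
    """Square (left, top, right, bottom) box centred halfway between the
    eye midpoint and the mouth; its side is `scale` times the eye distance."""
    eye_cx = (face.left_eye[0] + face.right_eye[0]) / 2
    eye_cy = (face.left_eye[1] + face.right_eye[1]) / 2
    cx = (eye_cx + face.mouth[0]) / 2
    cy = (eye_cy + face.mouth[1]) / 2
    half = scale * interocular_distance(face) / 2
    return (cx - half, cy - half, cx + half, cy + half)
```

For the made-up line `example.jpg 100 100 140 100 120 120 120 140` the eyes are 40 pixels apart and `crop_box` returns `(80.0, 80.0, 160.0, 160.0)`. Scaling the box by the interocular distance keeps crops roughly comparable across faces of different resolutions.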

Before you download the data, please note: the pictures in the dataset were
harvested from the web for the purpose of carrying out not-for-profit
scientific experiments and are not Caltech property. Any use of the dataset
other than 'fair use' must be negotiated with the pictures' owners. Caltech is
responsible for neither the content nor the meaning of the images.

The data contains a total of 10,524 faces in 7,092 images. The average image
resolution across the dataset is 304x312 pixels. Here is a script which
displays image resolution statistics.
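A minimal Python sketch of how such statistics could be computed is shown below; it is not the script referred to above. It takes a list of `(width, height)` pairs, which could be collected with Pillow's `Image.open(path).size`. The function name `resolution_stats` and the particular summary fields are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List, Tuple


def resolution_stats(sizes: List[Tuple[int, int]]) -> Dict[str, object]:
    """Summarise image sizes given as (width, height) pairs in pixels.
    Hypothetical helper, not the dataset's own statistics script."""
    if not sizes:
        raise ValueError("no image sizes given")
    widths = [w for w, _ in sizes]
    heights = [h for _, h in sizes]
    return {
        "count": len(sizes),
        "mean_width": mean(widths),
        "mean_height": mean(heights),
        "median_width": median(widths),
        "median_height": median(heights),
        # Smallest and largest images by pixel area.
        "smallest": min(sizes, key=lambda s: s[0] * s[1]),
        "largest": max(sizes, key=lambda s: s[0] * s[1]),
    }
```

For the two sizes `(300, 300)` and `(308, 324)` it reports a mean resolution of 304x312 pixels; reporting medians alongside means helps when a few very large images skew the average.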

The statistics of the resolution of the faces in the dataset are presented
below. The MATLAB script used to generate them is here.
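One possible way to summarise face resolution, sketched below in Python, is to bin faces by the distance between their two eye coordinates from the ground truth file. Using the interocular distance as a proxy for face size is an assumption made here; the MATLAB script may measure face resolution differently. The helper names and the default 10-pixel bin width are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


def eye_distance(left_eye: Point, right_eye: Point) -> float:
    # Interocular distance in pixels (ASSUMED proxy for face resolution).
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])


def face_size_histogram(eye_pairs: Iterable[Tuple[Point, Point]],
                        bin_width: float = 10) -> Dict[int, int]:
    """Count faces per interocular-distance bin.
    Keys are bin lower edges: 0 covers [0, 10), 10 covers [10, 20), and so on."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts: Counter = Counter()
    for left, right in eye_pairs:
        d = eye_distance(left, right)
        counts[int(d // bin_width * bin_width)] += 1
    return dict(sorted(counts.items()))
```

Reading the resulting counts from the smallest bin upward shows how many faces fall into each size range, which is the kind of distribution the statistics above describe.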

The data contains a number of duplicate images. These duplicates were given to
different annotators when the ground truth was produced, so they can be used
to evaluate the reliability and precision of the manually generated ground
truth. Here is a list of the images which we believe to be duplicates, and
here is a script which identifies them.
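The Python sketch below illustrates both uses under stated assumptions; it is not the dataset's duplicate-finding script. `find_exact_duplicates` groups images whose bytes are identical by SHA-256 digest, so it only catches exact copies; the original script may use a different criterion, such as near-duplicate matching that tolerates re-encoding. `landmark_disagreement` measures how far two annotations of the same face differ, averaged over corresponding points. Both function names are hypothetical.

```python
import hashlib
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def find_exact_duplicates(blobs: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose file contents are byte-identical.
    Exact matching only (ASSUMED criterion); resized or re-encoded copies are missed."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def landmark_disagreement(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean Euclidean distance in pixels between two annotations of the same
    face, each a sequence of (x, y) points listed in the same order."""
    if not a or len(a) != len(b):
        raise ValueError("annotations must be non-empty and the same length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
```

To run `find_exact_duplicates` on real files, build the mapping with `{str(p): p.read_bytes() for p in pathlib.Path(folder).glob("*.jpg")}`. Averaging `landmark_disagreement` over all duplicate pairs gives a rough estimate, in pixels, of how consistent the manual annotation is.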