We present a method to automatically discover meaningful features in unlabeled
image collections.
Each image is decomposed into semi-local features that describe neighborhood
appearance and geometry. The goal is to determine for each image which of these
parts are most relevant, given the image content in the remainder of the
collection. Our method first computes an initial image-level grouping based on
feature correspondences, and then iteratively refines cluster assignments based
on the evolving intra-cluster pattern of local matches. As a result, the
significance attributed to each feature influences an image’s cluster
membership, while related images in a cluster affect the estimated significance
of their features. We show that this mutual reinforcement of object-level and
feature-level similarity improves unsupervised image clustering, and apply the
technique to automatically discover categories and foreground regions in images
from benchmark datasets.

System Overview

The images are grouped based on weighted semi-local feature matchings (a), and
then image-specific feature weights are adjusted based on their contribution to
the match relative
to all other intra-cluster images (b). These two processes are iterated (as
denoted by the block arrows in the center) to simultaneously determine
foreground features while improving cluster quality. Dotted arrows denote
images with updated cluster memberships.
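
To make the two alternating steps concrete, here is a minimal sketch of (a) an image-level affinity computed from weighted feature matches and (b) a weight update driven by intra-cluster matches. It is an illustration under stated assumptions, not the exact formulation used by the method: the `match[i][j][f]` similarity table, the symmetrized weighted sum, and the mean-over-cluster-peers update rule are all hypothetical choices, and the clustering step that would produce `labels` from the affinities is left out.

```python
def weighted_affinity(match, weights):
    """Image-level affinity from weighted feature matches (assumed form).

    match[i][j][f]: similarity of feature f of image i to its best match
                    in image j (hypothetical input; match[i][i] must exist
                    as a placeholder list of the right length).
    weights[i][f]:  current weight of feature f in image i.
    """
    n = len(match)
    aff = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s_ij = sum(w * m for w, m in zip(weights[i], match[i][j]))
            s_ji = sum(w * m for w, m in zip(weights[j], match[j][i]))
            aff[i][j] = 0.5 * (s_ij + s_ji)  # symmetrize the two directions
    return aff


def update_weights(match, labels):
    """Re-estimate feature weights from matches to same-cluster images.

    Assumed rule: a feature's weight is its mean match similarity to the
    other images in its cluster, normalized to sum to 1 per image.
    """
    n = len(match)
    new_weights = []
    for i in range(n):
        num_feats = len(match[i][i])
        peers = [j for j in range(n) if j != i and labels[j] == labels[i]]
        raw = [
            sum(match[i][j][f] for j in peers) / len(peers) if peers else 0.0
            for f in range(num_feats)
        ]
        total = sum(raw)
        if total == 0:
            # No intra-cluster evidence: fall back to uniform weights.
            new_weights.append([1.0 / num_feats] * num_feats)
        else:
            new_weights.append([r / total for r in raw])
    return new_weights


# Toy data: 3 images with 2 features each; images 0 and 1 share a cluster.
match = [
    [[0.0, 0.0], [0.9, 0.1], [0.1, 0.1]],
    [[0.8, 0.2], [0.0, 0.0], [0.0, 0.2]],
    [[0.1, 0.1], [0.1, 0.0], [0.0, 0.0]],
]
labels = [0, 0, 1]
uniform = [[0.5, 0.5] for _ in match]
refined = update_weights(match, labels)
print(weighted_affinity(match, uniform)[0][1],
      weighted_affinity(match, refined)[0][1])
```

In this toy case the feature that matches consistently within the cluster gains weight, which in turn raises the affinity between the two images that share it. That is the kind of mutual reinforcement the overview describes.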

Evaluation

We performed experiments both to analyze the mutual reinforcement of foreground
detection and clustering, and to compare against existing unsupervised methods.
We work with images from the Caltech-101 dataset, because it provides the object
segmentations that we need as ground truth to evaluate our foreground detection.
We formed a four-class set (Faces, Dalmatians, Hedgehogs, and Okapi) and a
10-class set (the previous four plus Leopards, Car side, Cougar face, Guitar,
Sunflower, and Wheelchair). For each class, we use the first 50 images.

If our algorithm correctly identifies the important features, we expect those
features to lie on the foreground objects, since these objects are what
primarily recurs in these datasets. To evaluate this, we compare the
feature weights computed by our method with the ground truth list of foreground
features. We quantify accuracy by the percentage of total feature weight in an
image that our method attributes to true foreground features.
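
The accuracy measure described above can be sketched in a few lines. The function name and the input layout (one weight and one boolean foreground flag per feature) are assumptions made for illustration. The flags would come from testing whether each feature falls inside the ground-truth object segmentation.

```python
def foreground_weight_percentage(weights, is_foreground):
    """Percentage of an image's total feature weight on true foreground features.

    weights:       per-feature weights for one image.
    is_foreground: per-feature booleans from the ground-truth segmentation.
    """
    if len(weights) != len(is_foreground):
        raise ValueError("weights and is_foreground must have the same length")
    total = sum(weights)
    if total == 0:
        return 0.0  # no weight assigned anywhere; define the score as 0
    fg = sum(w for w, flag in zip(weights, is_foreground) if flag)
    return 100.0 * fg / total


# Example: four features, the first two lie on the object.
score = foreground_weight_percentage([4.0, 3.0, 2.0, 1.0],
                                     [True, True, False, False])
print(score)  # 70.0
```

Uniform weights would score exactly the fraction of features that lie on the foreground, so that fraction serves as a natural baseline for this measure.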

Because our method weights foreground features more highly, we also expect a
positive effect on cluster quality. Since we know the true label of each image,
we can use the F-measure to quantify cluster homogeneity.
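
Several F-measure variants are used for clustering, and the text does not say which one is meant. As one possibility, the sketch below assumes the class-weighted variant: for each true class, take the best F-score over all clusters, then average those scores weighted by class size. The function name is hypothetical.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted clustering F-measure (one common variant, assumed here)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))

    score = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k  # fraction of cluster k that is class c
            recall = n_ck / n_c     # fraction of class c captured by cluster k
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_c / n) * best   # weight each class by its share of images
    return score


# A clustering that matches the classes exactly scores 1.0,
# regardless of how the cluster ids are named.
print(clustering_f_measure(["a", "a", "b", "b"], [1, 1, 0, 0]))  # 1.0
```

Under this variant the score depends only on how images are grouped, not on the cluster ids, so no explicit matching of clusters to classes is needed beforehand.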