Abstract

A successful representation of objects in the literature is a collection of patches,
or parts, with a certain appearance and position. The relative locations of the different
parts of an object are constrained by the geometry of the object. Going beyond a single
object, consider a collection of images of a particular scene category containing
multiple (recurring) objects. The parts belonging to different objects are not constrained
by such a geometry. However, the objects themselves, arguably due to their semantic
relationships, demonstrate a pattern in their relative locations. Hence, analyzing
the interactions among the parts across the collection of images can allow for extraction
of the foreground objects, and analyzing the interactions among these objects can
allow for a semantically meaningful grouping of these objects, which characterizes
the entire scene. These groupings are typically hierarchical. We introduce the hierarchical
semantics of objects (hSO), which captures this hierarchical grouping. We propose an
approach for the unsupervised learning of the hSO from a collection of images of a
particular scene. We also demonstrate the use of the hSO in providing context for
enhanced object localization in the presence of significant occlusions, and show its
superior performance over a fully connected graphical model for the same task.