Occlusion Reasoning for Object Detection under Arbitrary Viewpoint

People

Description

Occlusions are common in real-world scenes and are a major obstacle to robust object detection. Whereas previous approaches primarily modeled the local coherency of occlusions or attempted to learn the structure of occlusions from data, we propose to explicitly model occlusions by reasoning about the 3D interactions of objects. For a given environment, we compute physical statistics of objects in the scene and represent an occluder as a probability distribution over 3D blocks. These physical statistics need only be computed once for a particular environment and can be used to represent occlusions for many objects in the scene. By reasoning about occlusions in 3D, we effectively provide a unified occlusion model across different viewpoints of an object as well as across different objects in the scene. The main contributions of this work are (1) a concise model of occlusions under arbitrary viewpoint that requires no additional training data and (2) a method to capture global visibility relationships without combinatorial explosion.

Figure 1. Occlusion model. Given the object viewpoint, an occluder (red) is modeled by its projected width and projected height in the image.

Figure 2. Occlusion probabilities.
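The per-point occlusion probabilities in Figure 2 can be illustrated with a minimal Monte Carlo sketch. This is not the authors' implementation: it assumes a hypothetical one-dimensional occluder-height distribution (`sample_occluder_height`) and ignores projected width and viewpoint, reducing the idea to "a point is occluded when a sampled block in front of it is at least as tall as the point."

```python
import random

def occlusion_probability(point_height, sample_occluder_height,
                          n_samples=10000, seed=0):
    """Estimate P(point occluded) by sampling occluder heights.

    Illustrative sketch only: the full model reasons about 3D blocks
    and their projected width/height under the object viewpoint.
    `sample_occluder_height(rng)` draws one occluder height.
    """
    rng = random.Random(seed)
    # A point is counted as occluded when the sampled occluder
    # reaches at least the point's height above the ground plane.
    hits = sum(1 for _ in range(n_samples)
               if sample_occluder_height(rng) >= point_height)
    return hits / n_samples
```

For example, with occluder heights drawn uniformly from [0, 1], a point at height 0.5 comes out occluded roughly half the time, matching the intuition that lower points on an object are occluded more often than higher ones.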

We validate our approach by extending the LINE2D method, a current state-of-the-art system for instance detection under arbitrary viewpoint. Since current datasets for object detection under multiple viewpoints either contain objects on simple backgrounds or exhibit minimal occlusion, we collected our own dataset for evaluation in a more natural setting. Our dataset contains 1600 images of 8 objects in real, cluttered environments and is split evenly into two parts: 800 images with a single view of each object and 800 with multiple views. The single-view part contains ground-truth occlusion labels and roughly equal amounts of partial occlusion (1-35%) and heavy occlusion (35-80%), as defined by Dollár et al. Our results on this challenging dataset demonstrate that capturing global visibility relationships is more informative than the typical a priori probability of a point being occluded, and that our approach can significantly improve object detection performance.
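The occlusion bins above can be made concrete with a small helper. The bin boundaries (1-35% partial, 35-80% heavy) come from the dataset description; the function name and the handling of exact boundary values are our own illustrative choices.

```python
def occlusion_level(occluded_fraction):
    """Categorize an occluded fraction into the dataset's bins
    (following Dollar et al.): partial (1-35%), heavy (35-80%).

    Boundary handling (e.g. exactly 0.35 counted as heavy) is an
    illustrative choice, not specified by the dataset.
    """
    if occluded_fraction < 0.01:
        return "unoccluded"
    if occluded_fraction < 0.35:
        return "partial"
    if occluded_fraction <= 0.80:
        return "heavy"
    return "full"
```

In practice such a helper would be applied to each ground-truth label when reporting detection performance separately for the partial and heavy occlusion regimes.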

References

Funding

This material is based upon work partially supported by the National Science Foundation under Grant No. EEC-0540865.

Copyright notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.