Recognition from stereo processing, laser scanning, or ordinary image data provides the basic environmental information a robot needs to act autonomously. Results may include the identity and pose of known objects or object parts in possibly cluttered scenes, as well as grasp points or other generic affordances of objects from a known class.

Model-based object recognition or, more generally, scene interpretation may be conceptualized as a two-part process: one part generates a sequence of hypotheses about object identities and poses, and the other evaluates them against the object models. Viewed as an optimization problem, the former determines the search sequence and the latter the objective function.
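The two-part view above can be illustrated with a minimal sketch. It is not a specific published method: the names `Hypothesis`, `interpret`, and `toy_score`, the three-parameter pose, and the toy objective are all assumptions introduced here for illustration. The generator is any iterable of hypotheses (the search sequence), and the evaluator is a callable that scores each one (the objective function).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical container: a known object label plus pose parameters."""
    object_id: str
    pose: Tuple[float, float, float]


def interpret(
    hypotheses: Iterable[Hypothesis],
    score: Callable[[Hypothesis], float],
) -> Optional[Tuple[Hypothesis, float]]:
    """Return the best-scoring hypothesis and its score, or None if none were generated."""
    best: Optional[Tuple[Hypothesis, float]] = None
    for h in hypotheses:      # generator part: fixes the search sequence
        s = score(h)          # evaluator part: the objective function
        if best is None or s > best[1]:
            best = (h, s)
    return best


def toy_score(h: Hypothesis) -> float:
    """Toy objective (assumption): negative squared distance to a fixed target pose."""
    target = (1.0, 2.0, 0.0)
    return -sum((a - b) ** 2 for a, b in zip(h.pose, target))


candidates = [
    Hypothesis("chair", (0.0, 0.0, 0.0)),
    Hypothesis("chair", (1.0, 2.0, 0.0)),
    Hypothesis("table", (3.0, 2.0, 0.0)),
]
best = interpret(candidates, toy_score)
```

Keeping the two parts behind separate interfaces means the search strategy (exhaustive, sampled, coarse-to-fine) can be changed without touching the model-based scoring, and vice versa.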

Robust scene interpretation by means of machine vision is a key factor in various new applications in robotics. Part of this problem is the efficient recognition and classification of previously known three-dimensional (3D) shapes in arbitrary scenes. So far, existing solutions either rely on heavily constrained conditions or do not run in real time.

Suppose we constrain objects from a known set to stand in certain 'upright' poses on a supporting plane, for instance, chairs and tables on the floor, or glasses and bottles on a table. The orientation of such planes is often known, as for a mobile robot on flat terrain.
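One consequence of the upright constraint is that a rigid pose, which in general has six degrees of freedom, reduces to three: a position on the plane and a rotation about the plane normal. The sketch below illustrates this under stated assumptions: the world frame is chosen so that its z-axis is the known plane normal, the object's own z-axis is its upright axis, and the helper names `upright_pose` and `transform_point` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def upright_pose(x: float, y: float, yaw: float, plane_height: float = 0.0) -> Matrix:
    """Build a 4x4 homogeneous transform for an object standing upright on the plane.

    Assumes the supporting plane is z = plane_height in the world frame,
    so the only free parameters are (x, y) on the plane and the yaw angle.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, x],
        [s, c, 0.0, y],
        [0.0, 0.0, 1.0, plane_height],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(T: Matrix, p: Tuple[float, float, float]) -> Tuple[float, ...]:
    """Map a point from object coordinates to world coordinates."""
    px, py, pz = p
    return tuple(T[i][0] * px + T[i][1] * py + T[i][2] * pz + T[i][3] for i in range(3))


# A bottle on a table 0.5 units above the floor, rotated a quarter turn.
T = upright_pose(1.0, 2.0, math.pi / 2, plane_height=0.5)
top = transform_point(T, (0.0, 0.0, 1.0))    # a point one unit up the object's axis
front = transform_point(T, (1.0, 0.0, 0.0))  # a point one unit along the object's x-axis
```

If the plane normal is known but not aligned with the world z-axis, a fixed rotation taking z to that normal can be composed with this transform; the number of free pose parameters stays at three.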