Visual Scene Interpretation

Visual scene interpretation is an important ability for intelligent systems. We realize this ability as a closed loop: extracting bottom-up information from visual data, fusing this information with top-down knowledge, and improving the bottom-up processing by feeding interpretations back as relevance or context information.
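
To make the loop structure concrete, the following is a minimal, self-contained sketch in Python (NumPy only). The gradient-based saliency cue, the convex fusion rule, and the feedback step are illustrative stand-ins under simplified assumptions, not the actual components developed here.

```python
import numpy as np

def bottom_up_saliency(image, relevance=None):
    # Illustrative bottom-up cue: gradient magnitude, optionally
    # modulated by a relevance map fed back from the previous pass.
    gy, gx = np.gradient(image.astype(float))
    saliency = np.hypot(gx, gy)
    if relevance is not None:
        saliency = saliency * relevance
    return saliency / (saliency.max() + 1e-9)

def interpret(image, top_down_prior, iterations=3, alpha=0.5):
    """Toy closed loop: fuse bottom-up evidence with a top-down prior
    and feed the fused interpretation back as relevance."""
    relevance = None
    for _ in range(iterations):
        evidence = bottom_up_saliency(image, relevance)
        # Top-down fusion: convex combination of evidence and prior.
        interpretation = alpha * evidence + (1 - alpha) * top_down_prior
        relevance = interpretation  # feedback into the next pass
    return interpretation

# Usage: random image, prior favoring the image center.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
prior = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 12.0 ** 2))
print(interpret(img, prior).shape)
```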

Research Questions

The goal for artificial vision systems is to analyze the visually perceivable environment in order to build a rich scene representation that enables a robot to navigate and manipulate its environment. In particular, we focus on determining salient spatial regions and develop local and global features for detecting objects and supporting structures [1], learning scene categories [2], and analyzing functional properties. We work on representations that integrate top-down knowledge, such as existing models or information gathered in interaction with humans, e.g., spatial descriptions [3] or activity patterns [4].
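
As one concrete example of the saliency step, the sketch below implements spectral-residual saliency (Hou & Zhang, 2007), a common bottom-up baseline for finding salient regions. It is chosen for brevity and is not necessarily the method used in [1].

```python
import numpy as np

def spectral_residual_saliency(gray):
    """Spectral-residual saliency: regions whose log-amplitude spectrum
    deviates from its locally averaged version pop out."""
    f = np.fft.fft2(gray.astype(float))
    log_amp = np.log(np.abs(f) + 1e-9)
    phase = np.angle(f)
    # A 3x3 box filter approximates the "expected" log spectrum.
    h, w = log_amp.shape
    p = np.pad(log_amp, 1, mode="edge")
    avg = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    residual = log_amp - avg
    # Back-transform the residual spectrum; squared magnitude = saliency.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()

# Usage: a bright square on a noisy background should stand out.
rng = np.random.default_rng(1)
img = rng.normal(0.0, 0.05, (128, 128))
img[48:80, 48:80] += 1.0
sal = spectral_residual_saliency(img)
print("most salient pixel:", np.unravel_index(sal.argmax(), sal.shape))
```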

We aim at analyzing these representations to extract semantically meaningful information that enhances the bottom-up processing. For example, identifying the scene category, the support type, or the typical reference frames of a spatial structure improves the detection of objects. Observed scene activities can be used to determine object affordances, and knowing the layout of objects helps select the best candidate from a given set of grasping points.
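
The sketch below illustrates the first of these ideas: detector confidences are rescored with a scene-category prior using a simple product rule. The probability table and the fusion rule are hypothetical, for illustration only.

```python
# Hypothetical context table: P(object present | scene category).
# The numbers are made up for illustration.
CONTEXT_PRIOR = {
    "kitchen": {"cup": 0.6, "kettle": 0.4, "monitor": 0.05},
    "office":  {"cup": 0.3, "kettle": 0.05, "monitor": 0.7},
}

def rescore(detections, scene, priors=CONTEXT_PRIOR):
    """Fuse bottom-up detector confidences with a top-down scene prior.

    Treats detector score and prior as independent pieces of evidence
    and combines them with an (unnormalized) product rule; many other
    fusion schemes are possible."""
    prior = priors[scene]
    fused = {obj: score * prior.get(obj, 0.1)
             for obj, score in detections.items()}
    z = sum(fused.values())
    return {obj: v / z for obj, v in fused.items()}

# Usage: an ambiguous detector response is disambiguated by context.
dets = {"cup": 0.5, "kettle": 0.5, "monitor": 0.5}
print(rescore(dets, "kitchen"))  # cup and kettle now dominate
print(rescore(dets, "office"))   # monitor now dominates
```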

Robotic grasping and processing of highly deformable objects, such as laundry, requires very detailed object analysis. We support this task by developing algorithms for several steps of this process, including visual category recognition and grasp point detection for clothes.
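
As an illustration of the grasp point detection step, here is a minimal sketch of one widely used first-grasp heuristic for crumpled laundry: picking the locally highest point of the pile in a depth image. It is a simplified stand-in; the actual grasp point detectors for clothes are more involved.

```python
import numpy as np

def highest_point_grasp(depth, smooth=5):
    """Pick the locally highest point of a cloth pile from a depth map
    (smaller depth = closer to camera = higher). A classic first-grasp
    heuristic for crumpled laundry; grasp-point refinement would follow
    in a full pipeline."""
    # Smooth with a simple box filter so sensor noise does not win.
    h, w = depth.shape
    k = smooth
    p = np.pad(depth, k // 2, mode="edge")
    acc = sum(p[i:i + h, j:j + w] for i in range(k) for j in range(k)) / k**2
    return np.unravel_index(acc.argmin(), acc.shape)  # (row, col) of peak

# Usage: synthetic depth map with one bump (the cloth pile).
yy, xx = np.mgrid[0:100, 0:100]
depth = 1.0 - 0.3 * np.exp(-((yy - 60)**2 + (xx - 40)**2) / (2 * 10.0**2))
print("grasp pixel (row, col):", highest_point_grasp(depth))
```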