Abstract:

Recurring visual elements in videos commonly represent central content entities, such as main characters and dominant objects. The automated detection of such elements is crucial for applications ranging from compact video content summarization to the retrieval of videos sharing common visual entities. Recent approaches to content-based video analysis commonly require prior knowledge about the appearance of potential objects of interest or build upon a specific assumption, such as the presence of a particular camera view, object motion, or a reference set to estimate the appearance of an object. In this paper, we propose an unsupervised, model-based approach for the detection of recurring visual elements in a video sequence. Detected elements do not necessarily represent an object, yet they allow for visual and semantic interpretation. The experimental evaluation of detected models across different videos demonstrates the ability of the models to capture potentially high diversity in the visual appearance of the traced elements.