Abstract: Visual place recognition under difficult perceptual conditions remains a challenging
problem due to changes in weather, illumination and season.
Long-term visual navigation approaches for robot localization should be robust to
these dynamics of the environment. Existing methods typically leverage
feature descriptions of whole images or image regions from Deep Convolutional Neural
Networks. Some approaches also exploit sequential information to alleviate the problem
of spatially inconsistent and imperfect image matches. In this paper, we propose a novel
approach for learning a discriminative holistic image representation that exploits the
image content to create a dense and salient scene description.
These salient descriptions are learnt over a variety of datasets under large perceptual changes.
Such an approach enables us to precisely segment the regions of an image which are geometrically
stable over large time lags. We combine features from these salient regions and an off-the-shelf
holistic representation to form a more robust scene descriptor. We also introduce a semantically
labeled dataset which captures extreme perceptual and structural scene dynamics over
the course of 3 years. We evaluate our approach with extensive experiments on data collected
over several kilometers in Freiburg and show that our learnt image representation outperforms
both off-the-shelf features from deep networks and hand-crafted features.
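
The combination of salient-region features with an off-the-shelf holistic representation could be sketched roughly as follows. This is only an illustrative sketch, not the method evaluated in the paper: the saliency mask is assumed to be given, and the saliency-weighted average pooling, the concatenation of the two L2-normalised parts, and the cosine-similarity matching are all assumptions made for the example. The function names (`salient_descriptor`, `combined_descriptor`, `best_match`) are hypothetical.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit Euclidean length."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v) + eps)


def salient_descriptor(feature_map, saliency_mask):
    """Pool a (H, W, C) feature map into a C-vector, weighted by saliency.

    saliency_mask is a (H, W) array in [0, 1]; it is assumed to mark the
    regions considered stable over time. An all-zero mask yields zeros.
    """
    feature_map = np.asarray(feature_map, dtype=np.float64)
    saliency_mask = np.asarray(saliency_mask, dtype=np.float64)
    total = saliency_mask.sum()
    if total <= 0:
        return np.zeros(feature_map.shape[-1])
    weighted = feature_map * saliency_mask[..., None]
    return weighted.sum(axis=(0, 1)) / total


def combined_descriptor(feature_map, saliency_mask, holistic):
    """Concatenate the salient-region part with a holistic descriptor.

    Each part is normalised first so neither dominates purely by scale
    (an assumption of this sketch).
    """
    salient = l2_normalize(salient_descriptor(feature_map, saliency_mask))
    return l2_normalize(np.concatenate([salient, l2_normalize(holistic)]))


def best_match(query, database):
    """Return (index, cosine similarity) of the closest database descriptor.

    Rows of `database` and `query` are assumed to be unit length, so the
    dot product equals the cosine similarity.
    """
    sims = np.asarray(database) @ np.asarray(query)
    idx = int(np.argmax(sims))
    return idx, float(sims[idx])
```

In such a setup, each reference image would be encoded once with `combined_descriptor`, and a query image would be localised by calling `best_match` against the stacked reference descriptors.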