Towards Robust Data Association in Real-time Visual SLAM

Abstract

Recent years have seen the emergence of systems capable of tracking the 6-D pose of a moving camera in real time whilst simultaneously building a structural map of the surrounding environment. Such vision-based simultaneous localisation and mapping (SLAM) systems have great potential as a low-cost, flexible means of 3-D location sensing, able to operate with agile hand-held devices and in previously unseen environments. Applications are numerous, particularly in areas such as Augmented and Virtual Reality, in which positioning and tracking technologies play a key role.

However, realising this potential requires these systems to operate reliably and robustly in the presence of natural human motion, including rapid accelerations, erratic movement and sudden changes in viewpoint. Building in resistance to these real-world motion characteristics is the subject of this Thesis. Specifically, we investigate how to improve the data association stage of visual SLAM systems. Data association is the process of establishing correct feature correspondences between two images, and it is vital for stable operation. Previous approaches rely on simple but weakly discriminative matching, which leads to the selection of erroneous measurements, especially during fast or erratic motion.
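
To make "discriminative matching" concrete, the following is a minimal sketch, not the method of this Thesis, of descriptor-based data association. It swaps in a common heuristic, a nearest-neighbour search with a distance-ratio check: a correspondence is accepted only when the best candidate is clearly closer than the runner-up, so ambiguous measurements are rejected rather than risked. The function names `l2` and `match_feature` and the ratio threshold of 0.8 are hypothetical choices for illustration.

```python
import math
from typing import List, Optional, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_feature(query: Sequence[float],
                  candidates: List[Sequence[float]],
                  ratio: float = 0.8) -> Optional[int]:
    """Return the index of the candidate nearest to `query`, or None when the
    best match is not clearly better than the second best (ambiguous)."""
    if not candidates:
        return None
    # Sort (distance, index) pairs so the closest candidate comes first.
    ranked = sorted((l2(query, c), i) for i, c in enumerate(candidates))
    best_dist, best_idx = ranked[0]
    if len(ranked) > 1 and best_dist >= ratio * ranked[1][0]:
        return None  # two candidates look alike: reject rather than risk a wrong match
    return best_idx


# Usage: a distinctive match is accepted, an ambiguous one is rejected.
print(match_feature([1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]]))   # 0
print(match_feature([1.0, 0.0], [[0.9, 0.1], [0.9, -0.1]]))  # None
```

A weakly discriminative descriptor makes the best and second-best distances close together far more often, which is exactly the situation this check refuses to resolve by guessing.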

To address this, we propose matching with more distinctive image descriptors, which provide additional resilience and an improved ability to recover from tracking failure. The descriptors are based on a histogram-of-spatial-gradients representation, which provides a degree of invariance to camera viewpoint. Crucially, we couple this with scale prediction derived from the camera pose estimated by the SLAM system. This yields an efficient and robust data association mechanism that outperforms previous approaches.
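
The coupling of a gradient-histogram descriptor with pose-derived scale prediction can be sketched as follows, under stated assumptions. The descriptor concatenates magnitude-weighted gradient-orientation histograms over a small grid of cells, in the spirit of SIFT-style descriptors but not necessarily the exact layout used in this Thesis. The scale prediction assumes a pinhole camera, in which a feature's apparent size is inversely proportional to its depth, so a feature first seen at depth `init_depth` with scale `init_scale` is resampled at the predicted scale instead of being searched for across scales. The names `predict_scale`, `sample_patch` and `gradient_histogram_descriptor`, the 8x8 patch, the 2x2 cell grid and the 8 orientation bins are all hypothetical choices.

```python
import math
from typing import List

Image = List[List[float]]  # row-major grayscale intensities


def predict_scale(init_scale: float, init_depth: float, depth: float) -> float:
    """Assumed pinhole model: apparent size is inversely proportional to depth."""
    return init_scale * init_depth / depth


def sample_patch(img: Image, cx: float, cy: float, scale: float, size: int = 8) -> Image:
    """Nearest-neighbour resample of a window centred on (cx, cy), spanning
    size * scale pixels, onto a fixed size x size grid (clamped at borders)."""
    h, w = len(img), len(img[0])
    patch = []
    for i in range(size):
        row = []
        for j in range(size):
            x = cx + (j - (size - 1) / 2) * scale
            y = cy + (i - (size - 1) / 2) * scale
            xi = min(max(math.floor(x + 0.5), 0), w - 1)
            yi = min(max(math.floor(y + 0.5), 0), h - 1)
            row.append(img[yi][xi])
        patch.append(row)
    return patch


def gradient_histogram_descriptor(patch: Image, cells: int = 2, bins: int = 8) -> List[float]:
    """Concatenate magnitude-weighted gradient-orientation histograms over a
    cells x cells grid, then L2-normalise; differencing removes intensity
    offsets and normalisation removes uniform contrast scaling."""
    n = len(patch)
    step = n // cells
    desc: List[float] = []
    for cy in range(cells):
        for cx in range(cells):
            hist = [0.0] * bins
            for y in range(cy * step, (cy + 1) * step):
                for x in range(cx * step, (cx + 1) * step):
                    # Central differences, falling back to one-sided at the border.
                    gx = patch[y][min(x + 1, n - 1)] - patch[y][max(x - 1, 0)]
                    gy = patch[min(y + 1, n - 1)][x] - patch[max(y - 1, 0)][x]
                    mag = math.hypot(gx, gy)
                    if mag > 0:
                        angle = math.atan2(gy, gx) % (2 * math.pi)
                        hist[int(angle / (2 * math.pi) * bins) % bins] += mag
            desc.extend(hist)
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Usage on a synthetic horizontal intensity ramp.
ramp = [[float(x) for x in range(20)] for _ in range(20)]
s = predict_scale(init_scale=1.0, init_depth=2.0, depth=1.0)  # camera twice as close: s == 2.0
d = gradient_histogram_descriptor(sample_patch(ramp, 10.0, 10.0, s))
print(len(d))  # 32 = 2 x 2 cells x 8 orientation bins
```

Because the scale is predicted rather than searched for, only one descriptor needs computing per candidate, which is one plausible reason such a scheme can remain efficient.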

In addition, we introduce exemplar descriptors to provide further invariance to viewing direction. These are warped versions of the original descriptors, with the warps derived from a planar model of local image structure. They are also coupled with scale prediction and again make data association more reliable. We also demonstrate that the proposed descriptor approach provides an effective basis for a camera relocalisation mechanism, which efficiently matches descriptors in the map against those in frames captured while the camera is lost. The resulting visual SLAM system is efficient, running at real-time rates of 20-30 fps, and robust, providing resilience and recovery even during the highly erratic and rapid motions experienced in hand-held camera operation. The Thesis concludes with examples of the system in use in two Augmented Reality-style applications.
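
The idea behind exemplar descriptors can be illustrated with a minimal sketch that models the local structure as a fronto-parallel plane and synthesises how the patch would appear after the camera orbits it. It uses the standard plane-induced homography H = K (R + t n^T / d) K^-1, for the plane n^T X = d in the reference camera frame and the motion X' = R X + t. A descriptor would then be computed from each warped patch. The intrinsics K (principal point at the patch centre), the orbit about the camera's y axis, and the helper names `plane_homography`, `warp_patch` and `make_exemplars` are assumptions for illustration, not details taken from this Thesis.

```python
import math
from typing import List, Sequence

Mat3 = List[List[float]]
Image = List[List[float]]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def inv3(m: Mat3) -> Mat3:
    """Inverse of a 3x3 matrix via its adjugate (assumes det != 0)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    co0, co1, co2 = e * i - f * h, -(d * i - f * g), d * h - e * g
    det = a * co0 + b * co1 + c * co2
    adj = [[co0, -(b * i - c * h), b * f - c * e],
           [co1, a * i - c * g, -(a * f - c * d)],
           [co2, -(a * h - b * g), a * e - b * d]]
    return [[v / det for v in row] for row in adj]


def rot_y(theta: float) -> Mat3:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def plane_homography(K: Mat3, R: Mat3, t: Sequence[float], n: Sequence[float], d: float) -> Mat3:
    """H = K (R + t n^T / d) K^-1 for the plane n^T X = d (reference frame) and
    the motion X' = R X + t; maps reference pixels to new-view pixels."""
    M = [[R[r][col] + t[r] * n[col] / d for col in range(3)] for r in range(3)]
    return matmul(matmul(K, M), inv3(K))


def warp_patch(patch: Image, H: Mat3) -> Image:
    """Inverse-map each new-view pixel through H^-1 and copy the nearest
    reference pixel; samples landing outside the patch become 0.0."""
    Hi = inv3(H)
    n = len(patch)
    out = []
    for v in range(n):
        row = []
        for u in range(n):
            x, y, w = (Hi[r][0] * u + Hi[r][1] * v + Hi[r][2] for r in range(3))
            if w <= 0:
                row.append(0.0)  # point would lie behind the reference camera
                continue
            xi, yi = math.floor(x / w + 0.5), math.floor(y / w + 0.5)
            row.append(patch[yi][xi] if 0 <= xi < n and 0 <= yi < n else 0.0)
        out.append(row)
    return out


def make_exemplars(patch: Image, focal: float, depth: float, angles: Sequence[float]) -> List[Image]:
    """Hypothetical helper: views of a fronto-parallel patch (plane Z = depth)
    as the camera orbits its centre P = (0, 0, depth) about the y axis."""
    c = (len(patch) - 1) / 2  # principal point at the patch centre (assumption)
    K = [[focal, 0.0, c], [0.0, focal, c], [0.0, 0.0, 1.0]]
    normal = [0.0, 0.0, 1.0]
    views = []
    for theta in angles:
        R = rot_y(theta)
        # X' = R (X - P) + P keeps P fixed in the new view, so t = P - R P.
        t = [-R[0][2] * depth, -R[1][2] * depth, depth - R[2][2] * depth]
        views.append(warp_patch(patch, plane_homography(K, R, t, normal, depth)))
    return views


patch = [[float(7 * y + x) for x in range(7)] for y in range(7)]
views = make_exemplars(patch, focal=100.0, depth=1.0, angles=[-0.6, 0.0, 0.6])
print(views[1] == patch)  # True: zero rotation gives the identity warp
```

With a zero angle the homography reduces to the identity and the patch is returned unchanged, which is a convenient sanity check; the patch centre also stays fixed for every angle because the orbit is taken about it.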