Robust Generation of 3D Models from Video Footage of Urban Scenes

Abstract

A system capable of automatically reconstructing texture-mapped 3D models directly from video footage of a scene has a large number of potential applications. The structure from motion problem has been the focus of a great deal of research effort in recent years; however, the subsequent creation of surface meshes from the sparse 3D point clouds produced by structure from motion algorithms has received much less attention. If the goal of a fully automated model generation system is to be realised, a reliable method of fitting surface meshes to these point clouds must be found. This thesis presents work on the construction and improvement of surface meshes, using the data generated by structure from motion algorithms, with a view to creating a system capable of automatically generating 3D models directly from video footage of large-scale urban scenes.

A technique for robustly creating surface meshes from sparsely populated 3D point clouds is presented. Image-consistent triangulation is used within the framework of a simulated annealing algorithm to iteratively modify an initial naive mesh. This method copes well with data contaminated by outliers, produces a simplified mesh (particularly for urban scenes, where planar surfaces are prevalent), and is likely to converge successfully to a global minimum. The algorithm is shown to be capable of producing meshes that accurately represent the scene even in the presence of significant numbers of outliers.
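
As a rough illustration of the simulated annealing framework referred to above, the following sketch shows a generic annealing loop with the Metropolis acceptance rule. It is not the implementation used in this thesis: the function names, the cooling schedule and all parameter values are assumptions chosen for illustration, and the toy cost (a count of mismatches against a fixed target) merely stands in for an image-consistency measure, with a single bit flip standing in for a local mesh modification.

```python
import math
import random


def anneal(initial, cost, propose, t_start=1.0, t_end=1e-3,
           cooling=0.95, steps_per_temp=50, rng=None):
    """Generic simulated annealing loop (illustrative sketch only)."""
    rng = rng or random.Random(0)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = propose(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Metropolis rule: always accept an improvement; accept a
            # worse state with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_cost


# Toy stand-in problem: each bit is a hypothetical local choice in the
# mesh, and the cost counts disagreements with a fixed target.
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)


def toy_cost(state):
    return sum(a != b for a, b in zip(state, TARGET))


def toy_propose(state, rng):
    # Flip one randomly chosen bit, leaving the rest unchanged.
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


initial_state = (0,) * len(TARGET)
best_state, best_cost = anneal(initial_state, toy_cost, toy_propose)
```

Accepting occasional cost increases at high temperature is what allows such a search to leave poor local minima; as the temperature falls, the loop behaves increasingly like greedy descent.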

Because the meshing technique relies on the scene being sufficiently represented by points, a method for augmenting the initial point clouds is developed. By exploiting the presence of adjacent planar facets in the scene, new points are added to the point cloud in crucial areas such as along edges and on corners. This method proceeds by robustly identifying sets of coplanar points in the point cloud and fitting planes to them. Suitable candidate plane pairs are identified and points are created along their lines of intersection. A number of measures are necessary to ensure that only valid intersections are made and that the resulting points lie on the surface of the scene. Additionally, a new robust estimation algorithm is created to address a number of issues associated with identifying multiple populations of coplanar points where points may be shared between populations. It is shown that augmenting point clouds in this way prior to meshing can, depending on how well the initial point cloud represents the scene, result in substantially higher quality models.
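
To make the geometry of the plane-pair step concrete, the sketch below computes the line of intersection of two planes, each written as n . x = d. It covers only this textbook calculation; the validity checks and robust plane fitting described above are not reproduced, and the function names and the parallelism tolerance are assumptions introduced for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def plane_intersection(n1, d1, n2, d2, eps=1e-12):
    """Return (point, direction) for the line where the planes
    n1 . x = d1 and n2 . x = d2 meet, or None if the planes are
    (near-)parallel. eps is an illustrative tolerance."""
    u = cross(n1, n2)          # direction of the intersection line
    uu = dot(u, u)
    if uu < eps:
        return None            # parallel or coincident planes
    a = cross(n2, u)
    b = cross(u, n1)
    # Point on both planes: p = (d1 * (n2 x u) + d2 * (u x n1)) / |u|^2
    point = tuple((d1 * a[i] + d2 * b[i]) / uu for i in range(3))
    return point, u


def points_along_line(point, direction, ts):
    """Sample candidate points point + t * direction for each t in ts."""
    return [tuple(point[i] + t * direction[i] for i in range(3)) for t in ts]


# Example: the planes x = 1 and y = 2 meet in a line parallel to the z-axis.
line = plane_intersection((1.0, 0.0, 0.0), 1.0, (0.0, 1.0, 0.0), 2.0)
samples = points_along_line(line[0], line[1], [0.0, 0.5, 1.0])
```

In practice the sampled points would still need to be restricted to the part of the line that actually lies on the scene surface, which is where the additional measures mentioned above come in.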