One major goal of vision is to infer physical models of objects, surfaces, and their layout from sensors. In this paper, we aim to interpret indoor scenes from one RGBD image. Our representation encodes the layout of orthogonal walls and the extent of objects, modeled with CAD-like 3D shapes. We parse both the visible and occluded portions of the scene and all observable objects, producing a complete 3D parse. Such a scene interpretation is useful for robotics and visual reasoning, but difficult to produce due to the well-known challenge of segmentation, the high degree of occlusion, and the diversity of objects in indoor scenes. We take a data-driven approach, generating sets of potential object regions, matching to regions in training images, and transferring and aligning associated 3D models while encouraging fit to observations and spatial consistency. We use support inference to aid interpretation and propose a retrieval scheme that uses convolutional neural networks to classify regions and retrieve objects with similar shapes. We demonstrate the performance of our method on our newly annotated NYUd v2 dataset (Silberman et al., ECCV 2012, pp. 746–760) with detailed 3D shapes.
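The retrieval step described above lends itself to a compact illustration. Below is a minimal sketch, assuming CNN descriptors have already been computed for each region proposal: query regions are matched to annotated training regions by cosine similarity, and the 3D model attached to the best matches is transferred. All names here (TrainingRegion, retrieve_model, the 128-D descriptors) are hypothetical; the paper's actual networks and alignment stage are not reproduced.

```python
# Hypothetical sketch of CNN-feature-based region retrieval; not the
# authors' implementation.
import numpy as np

class TrainingRegion:
    def __init__(self, feature, category, model_id):
        self.feature = feature      # CNN descriptor of the annotated region
        self.category = category    # object class label
        self.model_id = model_id    # id of the associated CAD-like 3D shape

def retrieve_model(query_feature, training_regions, k=5):
    """Return the k training regions whose descriptors are most similar
    (cosine similarity) to the query region's descriptor."""
    q = query_feature / (np.linalg.norm(query_feature) + 1e-8)
    scores = []
    for region in training_regions:
        f = region.feature / (np.linalg.norm(region.feature) + 1e-8)
        scores.append(float(q @ f))
    order = np.argsort(scores)[::-1][:k]
    return [(training_regions[i], scores[i]) for i in order]

# Usage with random stand-in descriptors; real ones would come from a CNN
# applied to each region proposal.
rng = np.random.default_rng(0)
bank = [TrainingRegion(rng.normal(size=128), "chair", i) for i in range(100)]
query = rng.normal(size=128)
for region, score in retrieve_model(query, bank, k=3):
    print(region.category, region.model_id, round(score, 3))
```

The transferred models would then be aligned to the observed depth and scored for fit and spatial consistency, as the abstract describes.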

Various widely available applications such as Google Earth have made interactive 3D visualizations of spatial data popular. While several studies have focused on how users perform when interacting with these 3D visualizations, it has not been common to record their virtual movements in 3D environments or interactions with 3D maps. We therefore created and tested a new web-based research tool: a 3D Movement and Interaction Recorder (3DmoveR). Its design incorporates findings from the latest 3D visualization research, and is built upon an iterative requirements analysis. It is implemented using open web technologies such as PHP, JavaScript, and the X3DOM library. The main goal of the tool is to record camera position and orientation during a user’s movement within a virtual 3D scene, together with other aspects of their interaction. After building the tool, we performed an experiment to demonstrate its capabilities. This experiment revealed differences between laypersons and experts (cartographers) when working with interactive 3D maps. For example, experts achieved higher numbers of correct answers in some tasks, had shorter response times, followed shorter virtual trajectories, and moved through the environment more smoothly. We also explored interaction-based clustering as well as other ways of visualizing and qualitatively analyzing user interaction.
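Two of the reported measures, trajectory length and movement smoothness, can be derived directly from logged camera positions. The sketch below shows one plausible way to compute them from such a log; the input format and the turning-angle smoothness measure are assumptions, not 3DmoveR's actual schema or metrics.

```python
# Hypothetical post-processing of camera positions logged by a
# 3DmoveR-style recorder.
import numpy as np

def trajectory_metrics(positions):
    """positions: (N, 3) array of logged camera positions, in log order.
    Returns (path length, mean turning angle in degrees); a smaller mean
    turning angle indicates smoother movement through the scene."""
    p = np.asarray(positions, dtype=float)
    steps = np.diff(p, axis=0)                 # movement vectors
    lengths = np.linalg.norm(steps, axis=1)
    path_length = float(lengths.sum())
    valid = lengths > 1e-9                     # skip stationary samples
    v = steps[valid]
    if len(v) < 2:
        return path_length, 0.0
    cos = np.sum(v[:-1] * v[1:], axis=1) / (
        np.linalg.norm(v[:-1], axis=1) * np.linalg.norm(v[1:], axis=1))
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return path_length, float(np.degrees(angles).mean())

# Usage with a toy log: a gentle quarter-circle arc.
t = np.linspace(0, np.pi / 2, 20)
smooth = np.c_[np.cos(t), np.sin(t), np.zeros_like(t)]
print(trajectory_metrics(smooth))
```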

We address the issue of the semantic segmentation of large-scale 3D scenes by fusing 2D images and 3D point clouds. First, a Large-Scale and High-Resolution model (DVLSHR), based on DeepLab with a VGG16 (Visual Geometry Group) backbone, is created and fine-tuned by training seven deep convolutional neural networks on four benchmark datasets. On the CityScapes validation set, DVLSHR achieves a 74.98% mean Pixel Accuracy (mPA) and a 64.17% mean Intersection over Union (mIoU), and can be adapted to segment the captured images (image resolution 2832 × 4256 pixels). Second, the preliminary segmentation results from the 2D images are mapped to the 3D point clouds according to the coordinate relationships between the images and the point clouds. Third, based on the mapping results, fine features of buildings are further extracted directly from the 3D point clouds. Our experiments show that the proposed fusion method can segment local and global features efficiently and effectively.
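The second step, mapping 2D segmentation results onto the point cloud, is essentially a pinhole projection followed by a label lookup. Below is a minimal sketch assuming known intrinsics and a world-to-camera pose; the paper's actual coordinate relationships may differ, and all names are illustrative.

```python
# Hypothetical 2D-to-3D label transfer via pinhole projection.
import numpy as np

def transfer_labels(points, labels_2d, K, R, t):
    """points: (N, 3) world coordinates; labels_2d: (H, W) label image;
    K: (3, 3) intrinsics; R, t: world-to-camera rotation and translation.
    Returns one semantic label per point (-1 = not visible in the image)."""
    H, W = labels_2d.shape
    cam = points @ R.T + t                 # world -> camera frame
    labels = np.full(len(points), -1)
    in_front = cam[:, 2] > 1e-6            # keep points in front of camera
    uvw = cam[in_front] @ K.T
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = labels_2d[v[inside], u[inside]]
    return labels

# Usage with a toy setup: identity pose, one point landing on a labeled
# region, one projecting outside the image.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
pts = np.array([[0., 0., 2.], [10., 0., 2.]])
lab = np.zeros((480, 640), dtype=int)
lab[200:280, 280:360] = 7                  # a "building" segment
print(transfer_labels(pts, lab, K, np.eye(3), np.zeros(3)))  # [7, -1]
```

A production version would additionally handle occlusion (e.g., a z-buffer), since this nearest-pixel lookup can assign a foreground label to a hidden point.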

Precise localization in dense urban areas is a challenging task for both mobile mapping and driver assistance systems. This paper proposes a strategy that uses road markings as localization landmarks for vision-based systems. The first step consists of reconstructing a map of road markings. A mobile mapping system equipped with precise georeferencing devices is applied to scan the scene in 3D and to generate an ortho-image of the road surface. A Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampler coupled with simulated annealing is applied to detect occurrences of road marking templates instantiated from an extensible database of road mark patterns. The detected objects are reconstructed in 3D using the height information obtained from 3D points. A calibrated camera and a low-cost GPS receiver mounted on a vehicle are used as localization devices. Local bundle adjustment (LBA) is applied to estimate the trajectory of the vehicle. In order to reduce the drift of the trajectory, images are frequently matched against the reconstructed road marks. The matching is initialized with the poses estimated by LBA and refined by an MCMC algorithm. The matching provides ground control points that are integrated into the LBA in order to refine the pose parameters. The method is evaluated on a set of images acquired in a real urban area and compared against precise ground truth.
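The way matched road marks constrain the trajectory can be illustrated with a single-pose toy problem: ground control points obtained from the matching contribute reprojection residuals that are minimized to refine the pose. The sketch below uses scipy's general least_squares solver and is not the paper's local bundle adjustment; all variable names are illustrative.

```python
# Hypothetical single-pose refinement from road-mark ground control points.
import numpy as np
from scipy.optimize import least_squares

def project(points, K, rvec, t):
    """Pinhole projection with a Rodrigues rotation vector rvec."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * (Kx @ Kx)
    cam = points @ R.T + t
    uv = cam[:, :2] / cam[:, 2:3]
    return uv @ K[:2, :2].T + K[:2, 2]

def residuals(params, gcp_world, gcp_pixels, K):
    """Reprojection residuals of the ground control points."""
    rvec, t = params[:3], params[3:]
    return (project(gcp_world, K, rvec, t) - gcp_pixels).ravel()

# Usage: refine a drifted initial pose against 3D road-mark points whose
# image observations (ground control points) came from template matching.
K = np.array([[800., 0., 640.], [0., 800., 360.], [0., 0., 1.]])
rng = np.random.default_rng(1)
world = rng.uniform([-3, -1, 5], [3, 1, 15], size=(8, 3))   # marks ahead
true = np.array([0.02, -0.01, 0.0, 0.1, -0.05, 0.2])        # true pose
pixels = project(world, K, true[:3], true[3:])              # observations
init = np.zeros(6)                                          # drifted pose
fit = least_squares(residuals, init, args=(world, pixels, K))
print(np.round(fit.x - true, 4))                            # ~0 after refit
```

In the full pipeline these residuals would be added alongside the tie-point terms inside the LBA rather than solved in isolation.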