3D Shape Completion

This page presents a novel learning-based and weakly-supervised approach to 3D shape completion of point clouds. Specifically, a learned shape prior makes it possible to learn shape completion without access to ground truth shapes, which is relevant in many scenarios including autonomous driving, 3D scene understanding, and surface reconstruction. A more detailed abstract can be found below. Additionally, this page provides the full CVPR'18 paper as well as the source code and two new synthetic shape completion benchmarks.

Abstract

3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, they require full supervision, which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, i.e., learn, maximum likelihood fitting using deep neural networks, resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach generalizes to other object categories as well.

Download & Citing

The CVPR paper can be downloaded below; the supplementary material with additional experimental results and details on the setup is provided separately:

Our benchmark is derived from KITTI; it uses the ground truth 3D bounding boxes to extract observations from the LiDAR point clouds. It does not include ground truth shapes; however, we tried to generate an alternative by considering the same bounding boxes at different timesteps.
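The extraction of observations from the LiDAR scans can be sketched as follows. This is an illustrative helper, not the benchmark's actual code; it assumes boxes parameterized by center, extents, and a yaw angle about the vertical axis, as in the KITTI annotations:

```python
import numpy as np

def points_in_box(points, center, size, yaw):
    """Mask of LiDAR points falling inside a 3D bounding box.

    points: (N, 3) array of x, y, z coordinates; center: box center (3,);
    size: box extents (length, width, height); yaw: rotation about the z-axis.
    """
    c, s = np.cos(yaw), np.sin(yaw)
    # Rotate points into the box coordinate frame (inverse yaw rotation).
    R = np.array([[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]])
    local = (points - np.asarray(center)) @ R.T
    half = np.asarray(size) / 2.0
    return np.all(np.abs(local) <= half, axis=1)

# Points inside the box form the partial observation of the object.
scan = np.array([[0.5, 0.2, -0.1], [5.0, 0.0, 0.0]])
mask = points_in_box(scan, center=[0, 0, 0], size=[2, 2, 2], yaw=0.0)
observation = scan[mask]  # only the first point lies inside the box
```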

Pre-trained models of the proposed approach as well as the evaluated baselines can be downloaded below; documentation for running the pre-trained models is included in the main repository:

Method

We focus on the problem of inferring and completing 3D shapes based on sparse and noisy 3D point observations, as illustrated in Figure 1. Existing approaches to shape completion can be categorized into data-driven and learning-based methods. The former usually rely on learned shape priors and formulate shape completion as an optimization problem over the corresponding (lower-dimensional) latent space [][][]. Learning-based approaches, in contrast, assume a fully supervised setting in order to directly learn shape completion on synthetic data [][][][][][]. In the paper, we tackle the problems of both approaches -- the slow optimization of data-driven approaches and the required supervision of learning-based approaches -- in order to combine their strengths -- applicability to real data and efficient inference.
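To make the optimization step of data-driven approaches concrete, the toy sketch below performs maximum likelihood fitting over the latent space of a fixed linear "decoder" -- a stand-in for a learned shape prior. All names, dimensions, and the Gaussian observation model are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a learned shape prior: a fixed linear "decoder" mapping
# an 8-dimensional latent code to a flattened 100-voxel occupancy volume.
D = rng.standard_normal((100, 8))

def decode(z):
    return D @ z

# A partial observation: occupancy values for a subset of voxels only.
z_true = rng.standard_normal(8)
observed = rng.choice(100, size=30, replace=False)
y = decode(z_true)[observed]

# Data-driven completion: optimize the latent code by gradient descent so
# the decoded shape fits the sparse observation (maximum likelihood under a
# Gaussian observation model) -- this runs per observation at test time.
z = np.zeros(8)
for _ in range(2000):
    residual = decode(z)[observed] - y
    grad = D[observed].T @ residual    # gradient of 0.5 * ||residual||^2
    z -= 0.01 * grad

completed = decode(z)                  # full shape, including unseen voxels
```

Note that this inner optimization loop is exactly the per-observation cost that amortized inference removes.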

To this end, we propose an amortized maximum likelihood approach for 3D shape completion. More specifically, we first learn a shape model on synthetic data using a variational auto-encoder (Figure 1, step 1). Shape completion can then be formulated as a maximum likelihood problem -- in the spirit of []. Instead of maximizing the likelihood independently for distinct observations, however, we follow the idea of amortized inference and learn to predict the maximum likelihood solutions directly given the observations. Towards this goal, we train a new encoder which embeds the observations in the same latent space using an unsupervised maximum likelihood loss (Figure 1, step 2). This allows us to learn 3D shape completion in challenging real-world situations, e.g., on KITTI []. For experimental evaluation, we introduce two novel, synthetic shape completion benchmarks based on ShapeNet [] and ModelNet []. On KITTI, we further compare our approach to the work of Engelmann et al. [] -- the only related work which addresses shape completion on KITTI. Our experiments demonstrate that we obtain shape reconstructions which rival data-driven techniques while significantly reducing inference time.
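Amortizing the maximum likelihood problem means training an encoder that maps observations directly to latent codes, using the maximum likelihood objective itself as an unsupervised loss -- no complete shapes are needed. A toy sketch with a linear encoder and a fixed linear decoder (all names, dimensions, and the fixed observation pattern are simplifying assumptions; the paper uses deep networks and varying observations):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed decoder from a pre-trained shape prior (toy linear stand-in with
# orthonormal columns; in the paper this is the VAE's deep decoder).
D, _ = np.linalg.qr(rng.standard_normal((100, 8)))

# Unlabeled partial observations: every sample reveals the same voxel
# subset (a simplification; in practice the observed pattern varies).
observed = rng.choice(100, size=30, replace=False)
Z = rng.standard_normal((64, 8))   # latent codes of the underlying shapes
X = Z @ D[observed].T              # sparse observations (no full shapes!)

# Train a linear encoder E with the unsupervised maximum likelihood loss:
# the decoded prediction must match each observation on observed voxels.
E = np.zeros((8, 30))
for _ in range(20000):
    residual = (X @ E.T) @ D[observed].T - X      # decoded vs. observed
    grad_E = (residual @ D[observed]).T @ X / len(X)
    E -= 0.5 * grad_E

# At test time, completion is a single forward pass, not an optimization.
x_new = D[observed] @ rng.standard_normal(8)      # new partial observation
completed = D @ (E @ x_new)                       # full shape estimate
```

In this linear toy, the trained encoder recovers the (pseudo-)inverse of the restricted decoder, so a single matrix multiplication replaces the iterative fitting.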

Experiments

Figure 2 (click to enlarge): Qualitative results on ShapeNet and KITTI, comparing against [] as well as a maximum likelihood (ML) baseline and a fully-supervised baseline (Sup). The proposed approach, amortized maximum likelihood, is abbreviated as AML.

Figure 3 (click to enlarge): Qualitative results on ModelNet, where our AML model is trained on all ten classes. We compare against our supervised baseline (Sup) on occupancy grids only.

Figures 2 and 3 show qualitative results of the proposed amortized maximum likelihood (AML) approach to shape completion. We compare against a maximum likelihood (ML) baseline, [] and a supervised baseline (Sup) — see the paper for details. The qualitative results illustrate that our approach is competitive with fully-supervised approaches while only using a fraction of the supervision. Similarly, we show that AML is able to generalize across object categories and to correctly identify object categories based on sparse observations. Additionally, our approach is shown to outperform [] while being significantly faster.

Conclusion

We presented a weakly-supervised, learning-based approach to 3D shape completion. After using a variational auto-encoder to learn a shape prior on synthetic data, we formulated shape completion as a maximum likelihood problem. We fixed the learned generative model and trained a new, deterministic encoder to amortize the ML problem. This encoder can be trained in an unsupervised fashion. On ShapeNet and ModelNet, we demonstrated that our approach outperforms a state-of-the-art data-driven method [] (while significantly reducing runtime) and generalizes across object categories. We also showed that it is able to compete with the fully-supervised model both quantitatively and qualitatively while using 9% or less supervision. We further demonstrated the applicability to real data on KITTI. Overall, our experiments demonstrate the benefits of the proposed approach: reduced runtime compared to data-driven approaches and training on unlabeled, real data compared to learning-based approaches.