ARTICLE

My master's thesis, written at the Autonomous Vision Group of the Max Planck Institute for Intelligent Systems under the supervision of Prof. Andreas Geiger, addresses the problem of 3D shape completion of sparse point clouds under weak supervision. Specifically, given a learned shape prior, it is possible to learn 3D shape completion without access to ground truth shapes, as shown on KITTI. This article briefly introduces the problem and the main contributions and offers the thesis for download.

Introduction

Figure 1: Illustration of the shape completion problem on KITTI.

Shape perception is a long-standing and fundamental problem in both human [][] and computer vision []. In both disciplines, a large body of work focuses on 3D reconstruction: reconstructing objects or scenes from one or more views, an inherently ill-posed inverse problem because many configurations of shape, color, texture and lighting may give rise to the very same views []. In human vision, one of the fundamental problems is understanding how the human visual system accomplishes such tasks; in computer vision, in contrast, the goal is to develop 3D reconstruction systems. Results from human vision suggest that priors, as well as the ability to process the involved cues, are innate and not learned. In computer vision, too, cues and priors are commonly built into 3D reconstruction pipelines through explicit assumptions. Recently, however, researchers have started to learn shape models from data. Predominantly, generative models have been used to learn how to generate, manipulate and reason about shapes [][][][][][]. Learning such shape models offers many interesting possibilities for a wide variety of problems in 3D computer vision.

In this context, we focus on a specific problem in the realm of 3D reconstruction, namely shape completion from point clouds, as illustrated in Figure 1. This problem occurs when only a single view of an individual object is provided and large parts of the object are not observed or occluded. Motivated by the success of learning shape models, we intend to tackle shape completion using a learning-based approach where we make use of shape priors learned from large datasets of shapes such as ModelNet [] or ShapeNet []. This idea, i.e. learning-based shape completion, has recently gained traction through works such as [][][][][] or []. Most learning-based approaches require full supervision; this means that observations are either synthesized from known models, or datasets need to be annotated. On real data, e.g. on KITTI [][], shape completion without supervision can be posed as an energy minimization problem over a latent space of shapes [][][]. In this case, shape completion usually involves solving a complex minimization problem using iterative approaches. Deep learning-based approaches, in contrast, can complete shapes using a single forward pass of the learned network. We find that both problems, the required supervision on the one hand and the computationally expensive optimization problems on the other, constrain the applicability of these approaches to real data considerably.
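To make the latent-space formulation concrete, here is a minimal sketch of per-observation completion by energy minimization. It is not the method of any of the cited works: the linear decoder `W z + b` merely stands in for a learned shape model, and the names `energy`, `complete_by_optimization`, `mask`, `lam`, `lr` and `steps` are assumptions made for illustration.

```python
import numpy as np

def energy(z, y, mask, W, b, lam):
    """Squared error on observed entries plus a quadratic prior on the code z."""
    residual = mask * (W @ z + b - y)
    return float(residual @ residual + lam * (z @ z))

def complete_by_optimization(y, mask, W, b, lam=0.1, lr=0.05, steps=200):
    """Run gradient descent on energy(z); return the completed shape and its code."""
    z = np.zeros(W.shape[1])
    for _ in range(steps):
        residual = mask * (W @ z + b - y)
        z = z - lr * (2.0 * W.T @ residual + 2.0 * lam * z)
    return W @ z + b, z

# Toy example (hypothetical data): a 4-dimensional "shape" from a 2-dimensional code.
W = 0.5 * np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
b = np.zeros(4)
full_shape = W @ np.array([1.0, -2.0]) + b
mask = np.array([1.0, 0.0, 1.0, 1.0])  # the second entry is unobserved
completed, z = complete_by_optimization(mask * full_shape, mask, W, b)
```

Every new observation requires its own optimization loop here, which is the computational cost referred to above.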

Contributions

We propose two different probabilistic frameworks enabling us to learn shape completion with weak supervision, thereby mitigating both mentioned problems. In both cases, we first train a shape prior, specifically a variational auto-encoder. In the spirit of [], we can then formulate shape completion as a maximum likelihood problem over the learned latent space. Instead of maximizing the likelihood independently for distinct observations, however, we follow the idea of amortized inference [] and learn to predict the maximum likelihood solution directly given the corresponding observations. Specifically, we train an encoder, which embeds the observations in the same latent space, using an unsupervised, maximum likelihood loss between the observations and the corresponding shapes. This variant of amortized maximum likelihood allows us to learn shape completion under real conditions, e.g. on KITTI, and is able to compete with a fully-supervised baseline on a ShapeNet-based, synthetic dataset used for evaluation.
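The amortized idea can be sketched in a few lines of NumPy. This is a toy illustration under strong assumptions, not the thesis implementation: the frozen decoder and the new encoder are both linear, squared error on observed entries stands in for the negative log-likelihood, a quadratic penalty stands in for the latent prior, and all names (`decode`, `encode`, `train_encoder`, `demo`, `lam`, `lr`) are hypothetical.

```python
import numpy as np

def decode(z, W, b):
    """Frozen stand-in decoder mapping codes (N, K) to shapes (N, D)."""
    return z @ W.T + b

def encode(y, mask, A, c):
    """Linear encoder mapping masked observations (N, D) to codes (N, K)."""
    return (mask * y) @ A.T + c

def loss(A, c, W, b, y, mask, lam):
    """Squared error on observed entries only, plus a penalty on the codes."""
    z = encode(y, mask, A, c)
    residual = mask * (decode(z, W, b) - y)
    return float(np.mean(np.sum(residual**2, axis=1) + lam * np.sum(z**2, axis=1)))

def train_encoder(W, b, y, mask, lam=0.1, lr=0.01, steps=1000):
    """Fit the encoder by gradient descent; the decoder (W, b) is never updated."""
    n, d = y.shape
    A, c = np.zeros((W.shape[1], d)), np.zeros(W.shape[1])
    x_in = mask * y
    for _ in range(steps):
        z = x_in @ A.T + c
        residual = mask * (decode(z, W, b) - y)
        grad_z = 2.0 * residual @ W + 2.0 * lam * z  # gradient w.r.t. each code
        A -= lr * grad_z.T @ x_in / n
        c -= lr * grad_z.mean(axis=0)
    return A, c

def demo(seed=0):
    """Train on partial observations only and return loss before/after training."""
    rng = np.random.default_rng(seed)
    W, b = 0.5 * rng.standard_normal((8, 2)), np.zeros(8)
    shapes = decode(rng.standard_normal((256, 2)), W, b)  # full shapes, unseen by the encoder
    mask = (rng.random(shapes.shape) < 0.5).astype(float)
    y = mask * shapes
    before = loss(np.zeros((2, 8)), np.zeros(2), W, b, y, mask, 0.1)
    A, c = train_encoder(W, b, y, mask)
    after = loss(A, c, W, b, y, mask, 0.1)
    completed = decode(encode(y, mask, A, c), W, b)  # one forward pass per observation
    return before, after, completed
```

Once the encoder is trained, completing a new observation is a single call to `encode` followed by `decode`, with no per-observation optimization.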

As an alternative approach, we extend the general framework of latent space models, as implemented by variational auto-encoders, to specifically account for the observations. Applied to a pre-trained variational auto-encoder, which represents the required shape prior, we derive the evidence lower bound of this extended variational auto-encoder, which we then optimize in an unsupervised fashion, i.e. only given the observations. We also show that the underlying objective is closely related to our amortized maximum likelihood approach. On our synthetic, ShapeNet-based dataset, we experimentally demonstrate the applicability of the extended variational auto-encoder to shape completion. Overall, we present two approaches in support of our claim that shape priors allow learning shape completion in an unsupervised fashion, thereby also introducing many interesting directions for future research.
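For reference, the evidence lower bound of a standard variational auto-encoder is given below, with shape x, latent code z, prior p(z), decoder p(x|z) and approximate posterior q(z|x). This is only the textbook bound that the extended model builds on; the extended bound, which additionally accounts for the observations, is derived in the thesis and is not reproduced here.

```latex
\log p(x) \;\geq\; \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] \;-\; \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big)
```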

ABOUT THE AUTHOR

In September, I was honored to receive the MINT-Award IT 2018, sponsored by ZF and audimax, for my master's thesis on weakly-supervised shape completion. For CVPR 2019, however, I am working on a different topic: adversarial robustness and generalization of deep neural networks.
18th OCTOBER 2018, David Stutz
