
PointNet Auto-Encoder in Torch

Recently proposed neural network architectures, including PointNets and point set generation networks, allow deep learning on unordered point clouds. In this article, I present a Torch implementation of a PointNet auto-encoder, a network that reconstructs point clouds through a lower-dimensional bottleneck. As the training loss, I implemented a symmetric Chamfer distance in C/CUDA; the code is available on GitHub.

In several recent papers [][][], researchers proposed neural network architectures for learning on unordered point clouds. In [], shape classification and segmentation are performed by computing per-point features using a succession of multi-layer perceptrons; these features are finally max- or average-pooled into one large vector. This pooling layer ensures that the network is invariant to the order of the input points. In [], in contrast, a network predicting point clouds is proposed for single-image 3D reconstruction. Both approaches are illustrated in Figure 1.

(a) Illustration of the PointNet architecture for shape classification and segmentation as discussed in [].

(b) Illustration of the point set generation network as proposed in [].

Figure 1: Illustrations of the PointNet and point set generation networks.

As part of my master's thesis, I experimented with a PointNet auto-encoder that combines the encoder from [] and the decoder from [] to learn to reconstruct point clouds. While implementations from the authors are available, I wanted a custom implementation in Torch for experimentation. The implementation can be found on GitHub:

Note that the code has not been thoroughly tested and that, as discussed below, training is not necessarily stable.

Details

As indicated above, the main problem of deep learning on point clouds is their unordered nature. In [], Qi et al. propose a solution based on pooling. In particular, they argue that any function $f(\{x_1,\ldots,x_n\})$ on a set of points $\{x_1,\ldots,x_n\}$ can be approximated using a point-wise function $h$ and a symmetric function $g$:

$f(\{x_1,\ldots,x_n\}) \approx g(h(x_1),\ldots,h(x_n))$.

In practice, the point-wise function $h$ can be learned using a multi-layer perceptron and the symmetric function $g$ may be a symmetric pooling operation such as max- or average-pooling.
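
To make the role of $h$ and $g$ concrete, here is a minimal sketch in plain Python. It is not the Torch code from the repository: the single linear layer with ReLU used for $h$, the four feature channels, and all function names are assumptions chosen for illustration. The point is only that pooling over per-point features gives the same result for any ordering of the points.

```python
import random

def h(point, weights, biases):
    """Point-wise function: one linear layer followed by ReLU, applied to a single 3D point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, biases)]

def g_max(features):
    """Symmetric function: channel-wise max pooling over all points."""
    return [max(channel) for channel in zip(*features)]

def g_mean(features):
    """Symmetric function: channel-wise average pooling over all points."""
    return [sum(channel) / len(features) for channel in zip(*features)]

def encode(points, weights, biases, pool=g_max):
    """Approximates f({x_1, ..., x_n}) by g(h(x_1), ..., h(x_n))."""
    return pool([h(p, weights, biases) for p in points])

# Hypothetical toy setup: 10 random points, 4 feature channels.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
biases = [rng.uniform(-1, 1) for _ in range(4)]
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]

# The same points in a different order.
shuffled = points[:]
rng.shuffle(shuffled)

code_original = encode(points, weights, biases)
code_shuffled = encode(shuffled, weights, biases)
```

With max pooling, `code_original` and `code_shuffled` are identical; with `pool=g_mean` they agree up to floating-point rounding, since the summation order changes.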

Note that the point-wise function $h$ is implemented using several convolutional layers. With the correct kernel size, these layers act as a point-wise multi-layer perceptron whose weights are shared across points. To this end, the input point cloud is provided as an $N \times 3$ image; in the above example, $1000$ points are assumed to be provided. As the symmetric function $g$, an average pooling layer is used; a subsequent convolutional layer inflates the pooled features in order to predict a point cloud.
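
The following plain-Python sketch walks through the shapes of such an auto-encoder. It is an illustration under assumptions, not the architecture from the repository: I assume a single convolutional layer with $1 \times 3$ kernels (so each kernel covers exactly one point), $8$ feature channels, and a linear map standing in for the layer that inflates the pooled features; all names are hypothetical.

```python
import random

def pointwise_conv(image, kernels, biases):
    """Convolve an N x 3 single-channel image with C kernels of size 1 x 3, then apply ReLU.

    Each kernel spans one full row, i.e. one point, so every output row depends
    on a single point and all points share the same weights. Returns N x C features.
    """
    return [[max(0.0, sum(k * x for k, x in zip(kernel, row)) + b)
             for kernel, b in zip(kernels, biases)]
            for row in image]

def average_pool(features):
    """Average over the point dimension: N x C features become one C-vector."""
    n = len(features)
    return [sum(channel) / n for channel in zip(*features)]

def inflate(code, weights, biases):
    """Linear map from the pooled C-vector to a flat list of coordinates, regrouped into 3D points."""
    flat = [sum(w * c for w, c in zip(row, code)) + b for row, b in zip(weights, biases)]
    return [flat[i:i + 3] for i in range(0, len(flat), 3)]

def autoencode(points, kernels, kernel_biases, out_weights, out_biases):
    """Encoder (point-wise convolution and pooling) followed by the inflating decoder."""
    code = average_pool(pointwise_conv(points, kernels, kernel_biases))
    return inflate(code, out_weights, out_biases)

# Hypothetical sizes: 1000 points in and out, 8 feature channels, random untrained weights.
rng = random.Random(0)
n_points, n_channels = 1000, 8
kernels = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_channels)]
kernel_biases = [0.0] * n_channels
out_weights = [[rng.uniform(-1, 1) for _ in range(n_channels)] for _ in range(3 * n_points)]
out_biases = [0.0] * (3 * n_points)

cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_points)]
reconstruction = autoencode(cloud, kernels, kernel_biases, out_weights, out_biases)
```

Here `reconstruction` is again a list of $1000$ points with three coordinates each. Because the only interaction between points happens in the average pooling, shuffling `cloud` changes the output only by floating-point rounding.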

Given a network as shown above, the loss used for training also needs to be invariant to the order of the points. To this end, a symmetric nearest-point distance can be used; in [], Fan et al. refer to this approach as Chamfer distance, which can be computed as follows:
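
One common way to write a symmetric Chamfer distance between a predicted point set $X$ and a target point set $Y$ is $d(X, Y) = \frac{1}{|X|}\sum_{x \in X} \min_{y \in Y} \|x - y\|_2^2 + \frac{1}{|Y|}\sum_{y \in Y} \min_{x \in X} \|x - y\|_2^2$. The use of squared distances and the normalization by the set sizes are assumptions here; the C/CUDA implementation may use a different convention. A brute-force sketch in plain Python, with hypothetical function names and without the gradient computation needed for training:

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def one_sided_distance(source, target):
    """Average squared distance from each source point to its nearest target point."""
    return sum(min(squared_distance(p, q) for q in target) for p in source) / len(source)

def chamfer_distance(prediction, target):
    """Symmetric Chamfer distance: sum of both one-sided nearest-point distances."""
    return one_sided_distance(prediction, target) + one_sided_distance(target, prediction)

# Toy example with sets of different size and order.
prediction = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
target = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss = chamfer_distance(prediction, target)
```

In this toy example, both predicted points coincide with a target point, so the first term is zero; only the target point $(0, 1, 0)$ has no exact match, and its nearest prediction is at squared distance $1$, which gives a loss of $1/3$. The double loop costs $O(|X| \cdot |Y|)$ distance evaluations.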

Discussion

Figure 2: Qualitative results showing targets (left) and predictions (right). Note that we predicted $1000$ points; for the ground truth, however, we show $5000$ points.

Figure 2 shows some qualitative results on a dataset of rotated and slightly translated cuboids. In general, I found that these auto-encoders are difficult to train. For example, when adding a linear layer after pooling to compute a low-dimensional bottleneck, training becomes significantly more difficult, and the network often predicts a "mean point cloud". Similarly, the network architecture has a significant influence on training.

About the Author

In September, I was honored to receive the MINT-Award IT 2018, sponsored by ZF and audimax, for my master's thesis on weakly-supervised shape completion. For CVPR 2019, however, I am working on a different topic: adversarial robustness and generalization of deep neural networks.
18th October 2018, David Stutz
