Abstract

This paper proposes a 3D shape descriptor network, which is a deep convolutional energy-based model, for
modeling volumetric shape patterns. The maximum likelihood training of the model follows an “analysis by synthesis”
scheme and can be interpreted as a mode seeking and mode shifting process. The model can synthesize 3D shape
patterns by sampling from the probability distribution via MCMC such as Langevin dynamics. The model can be used
to train a 3D generator network via MCMC teaching. The conditional version of the 3D shape descriptor net can be
used for 3D object recovery and 3D object super-resolution. Experiments demonstrate that the proposed model can generate
realistic 3D shape patterns and can be useful for 3D shape analysis.

Experiment 1: Generating 3D Objects

Each row displays one experiment: columns 1-3 show some observed examples, columns 4-9 show six of the synthesized 3D objects, and columns 10-11 show the nearest neighbors retrieved from the training set for the last two synthesized objects.
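The synthesis above draws samples by Langevin dynamics. Below is a minimal numpy sketch of the sampler, not the paper's implementation: the quadratic toy energy, step size, and step count are illustrative assumptions standing in for the learned ConvNet energy f(Y; θ).

```python
import numpy as np

def langevin_sample(grad_energy, y0, step=0.1, n_steps=2000, rng=None):
    # Langevin update: Y <- Y - (step^2 / 2) * dE/dY + step * N(0, I)
    rng = np.random.default_rng(0) if rng is None else rng
    y = y0.copy()
    for _ in range(n_steps):
        y = y - 0.5 * step**2 * grad_energy(y) + step * rng.standard_normal(y.shape)
    return y

# Toy energy E(Y) = ||Y - mu||^2 / 2, so p(Y) is a unit Gaussian centered at mu;
# in the paper the energy is defined by a ConvNet over voxel grids.
mu = np.array([1.0, -2.0])
grad_E = lambda y: y - mu
samples = np.stack([
    langevin_sample(grad_E, np.zeros(2), rng=np.random.default_rng(i))
    for i in range(200)
])
```

With the toy energy, the sample mean concentrates near the mode mu, illustrating the "mode seeking" behavior of the dynamics.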

Experiment 2: 3D Object Recovery

We can recover occluded data by sampling from the conditional distribution p(YM | YV; θ), which is learned from fully observed training pairs {(YM, YV)}, where YM is the masked part of the data and YV is the visible part. The sampling is accomplished by Langevin dynamics, the same dynamics that samples from p(Y; θ), except that we fix the visible part YV and update only the masked part YM. For each experiment shown below, the first row displays some original 3D data as ground truth, the second row displays the corresponding corrupted data, and the third row displays the corresponding recovery results by our learned model.
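The masked update can be sketched in a few lines. This is a hypothetical numpy illustration, not the paper's code: the sharply peaked toy energy and the step parameters are assumptions, and the mask plays the role of the occlusion pattern.

```python
import numpy as np

def langevin_recover(grad_energy, y_corrupt, mask, step=0.1, n_steps=1000, rng=None):
    # Same Langevin update as unconditional sampling, but only the masked
    # part YM is updated; the visible part YV stays fixed throughout.
    rng = np.random.default_rng(0) if rng is None else rng
    y = y_corrupt.copy()
    for _ in range(n_steps):
        proposal = y - 0.5 * step**2 * grad_energy(y) + step * rng.standard_normal(y.shape)
        y = np.where(mask, proposal, y)
    return y

# Toy energy sharply peaked at a known "shape" mu; the last half of the
# entries is occluded (zeroed out) and must be recovered.
mu = np.linspace(-1.0, 1.0, 8)
grad_E = lambda y: 100.0 * (y - mu)
mask = np.arange(8) >= 4
y_corrupt = np.where(mask, 0.0, mu)
y_recovered = langevin_recover(grad_E, y_corrupt, mask)
```

The visible entries are returned unchanged, while the occluded entries are driven toward the energy's mode, mimicking the recovery behavior described above.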

Experiment 3: 3D Object Super Resolution

We can perform super-resolution on a low-resolution 3D object by sampling from p(Yhigh | Ylow; θ), which is learned from fully observed training pairs {(Yhigh, Ylow)}. In each iteration, we first up-scale Ylow by expanding each voxel into a d × d × d block (where d is the scaling ratio) of constant intensity to obtain an up-scaled version Y'high of Ylow, and then run Langevin dynamics starting from Y'high to obtain Yhigh. The first row displays some original 3D data as ground truth, the second row displays the corresponding low-resolution (16 × 16 × 16) 3D data, and the third row displays the corresponding super-resolution (64 × 64 × 64) results by our learned model.
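The block-expansion step has a direct numpy expression; this sketch (an assumption about the up-scaling, consistent with the description above, not the authors' code) builds the initialization Y'high from a 16³ grid with scaling ratio d = 4.

```python
import numpy as np

def upscale_voxels(y_low, d):
    # Expand each voxel into a d x d x d block of constant intensity,
    # giving the initialization Y'_high for the Langevin dynamics.
    return y_low.repeat(d, axis=0).repeat(d, axis=1).repeat(d, axis=2)

y_low = np.random.default_rng(0).random((16, 16, 16))
y_init = upscale_voxels(y_low, 4)  # scaling ratio d = 4: 16^3 -> 64^3
```

Each d × d × d block of the output holds the constant intensity of its source voxel, so the Langevin dynamics only needs to refine local detail.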

Experiment 4: Cooperative Training of 3D Generator

We evaluate a 3D generator trained by a 3D DescriptorNet via the cooperative training scheme on two experiments: latent space interpolation and 3D object arithmetic.

Exp 4.1: Interpolation

The following shows interpolation between the latent vectors of the 3D objects at the two ends. Our method learns a smooth 3D generator model that traces the manifold of the 3D data distribution.
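The interpolation itself is linear in latent space; a minimal sketch (the latent vectors here are illustrative stand-ins, and the decoding by the 3D generator G(z) is omitted):

```python
import numpy as np

def interpolate_latents(z1, z2, n=8):
    # Linear interpolation between two latent vectors; decoding each
    # interpolated z with the 3D generator traces a path between the shapes.
    ts = np.linspace(0.0, 1.0, n)
    return np.stack([(1.0 - t) * z1 + t * z2 for t in ts])

z1, z2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
zs = interpolate_latents(z1, z2, n=5)
```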

Exp 4.2: 3D Object Arithmetic

The following shows 3D object arithmetic performed by the 3D generator net, which encodes semantic knowledge of 3D shapes in its latent space.
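The arithmetic is performed on latent codes before decoding; a toy sketch with hypothetical latent vectors (the actual codes would be inferred for real shapes and decoded by the 3D generator):

```python
import numpy as np

def latent_arithmetic(z_a, z_b, z_c):
    # z_a - z_b + z_c in latent space; the result is decoded by the
    # 3D generator into a new shape combining the semantic attributes.
    return z_a - z_b + z_c

z_a = np.array([1.0, 2.0, 0.0])
z_b = np.array([0.5, 1.0, 0.0])
z_c = np.array([0.0, 0.0, 1.0])
z_new = latent_arithmetic(z_a, z_b, z_c)
```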

Experiment 5: 3D Object Classification

We first train a single model on all categories of the ModelNet10 training set in an unsupervised manner, and then use the trained model as a feature extractor. Based on the extracted feature vectors, we train a multinomial logistic regression classifier on labeled data. The following shows 3D object classification results on the ModelNet10 dataset; classification accuracy on the testing data is evaluated using the one-versus-all rule.
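The classification stage can be sketched as plain multinomial logistic regression on the extracted features. This numpy sketch is an assumption-laden stand-in: the Gaussian clusters below replace the real descriptor-net feature vectors, and the learning rate and iteration count are illustrative.

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.1, n_iter=500):
    # Multinomial logistic regression by gradient descent on
    # (feature, label) pairs; X already includes a bias column.
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                       # one-hot labels
    for _ in range(n_iter):
        logits = X @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
        W -= lr * X.T @ (p - Y) / len(X)           # gradient step
    return W

def predict(W, X):
    return (X @ W).argmax(axis=1)

# Toy stand-in for descriptor-net features: three Gaussian clusters.
rng = np.random.default_rng(0)
feats = np.concatenate([rng.normal(c, 0.3, size=(50, 4)) for c in (-1.0, 0.0, 1.0)])
X = np.hstack([feats, np.ones((len(feats), 1))])   # append bias column
y = np.repeat(np.arange(3), 50)
W = train_softmax(X, y, 3)
accuracy = (predict(W, X) == y).mean()
```

With well-separated features, the linear classifier reaches high accuracy, which is the premise behind using the unsupervised model as a feature extractor.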

Method                     Classification accuracy
Geometry Image             88.4%
PANORAMA-NN                91.1%
ECC                        90.0%
3D ShapeNets               83.5%
DeepPano                   85.5%
SPH                        79.8%
VConv-DAE                  80.5%
3D-GAN                     91.0%
3D DescriptorNet (ours)    92.4%

Acknowledgment

We thank Erik Nijkamp for his help on coding. We thank Siyuan Huang for helpful discussions.