What is 3D modified Fisher Vector (3DmFV) representation for 3D point clouds

September 8, 2018

Recently we published a paper about 3D point cloud classification (and segmentation) using our proposed 3D modified Fisher Vector (3DmFV) representation and convolutional neural networks (CNNs). The preprint is available on ArXiv and the final version is available in Robotics and Automation Letters (RA-L) journal.

I believe in making research accessible to everyone so I give here a brief explanation about the 3DmFV representation for point clouds.

Before reading this post it is important to have a solid understanding of Gaussian Mixture Models (GMMs) and Fisher Vectors (FVs). If you are a bit rusty on these subjects I posted two primers in the links below:

The 3D modified Fisher Vector (3DmFV) generalizes the Fisher Vector along two directions

GMM choice – 3DmFV uses a uniform Gaussian grid – The Gaussians parameters ( mean, standard deviation and weights) are not estimated from the data, they are predefined and position on a 3D grid, with equal weights and standard deviation.

Symmetric functions – adding minimum and maximum (in addition to the summation) operating on all points, for each Gaussian.

The Math

3DmFV is formally defined by :

The Intuition

It is much easier to understand the 3DmFV for points with some nice visualizations.

Let’s take a single 3D point in a single Gaussian and show its 3DmFV (here the grid is basically 1x1x1):

Now let’s see what happens when we move the point (hint 3DmFV changes)

Next, we can see what happens when we move the point all around (the .gif file might take a few seconds to load).

Finally, let’s see how the 3DmFV looks like when we take multiple points on a GMM of 2x2x2:

Reconstruction from 3DmFV

Some may argue that 3DmFV is simply another handcrafted feature. However, we argue that it is simply another form to represent the data and therefore, the process is reversible. It is possible to show analytically that for simple cases it is reversible (single point, single Gaussian, points on a plane in a Gaussian). It gets a bit more complex in the general case when more points and more Gaussians are present. Therefore, we trained a simple 3DmFV decoder that is able to take a 3DmFV representation as input and produce a 3D point cloud as output.