
Latest 15 blog updates

It is with pleasure that I share the successful completion of this Toyota Code Sprint in this final blog post. In this project, homography estimation based on multi-modal, multi-descriptor correspondence sets was explored, inspiring the introduction of the multi-descriptor voting approach (MDv). The proposed MDv approach achieved consistent accuracy in the 0.0X range, a level of consistency better than that of estimates based on single-type state-of-the-art descriptors, including SIFT. In the process, a framework for analyzing and evaluating single- and multi-descriptor performance was developed and employed to validate the robustness of MDv, as compared with homography estimations based on a single descriptor type, as well as those based on RANSAC registration of best-K multi-descriptor correspondence sets. The code and dataset for this project are hosted at https://github.com/mult-desc/md, with dependencies on both PCL 1.7 and OpenCV 2.4.6.

What follows is an in-depth report detailing the project’s accomplishments, as well as design and validation considerations:

In the previous blog post I described my attempts to find a good balance
between the contributions of the three terms (distance, normal, and color) to
edge weight computation. As often happens, as soon as I was done with the
evaluation and the blog post, I realized that there is another type of
information that could be considered: curvature. And indeed, it proved to have
a very positive effect on the performance of the random walker segmentation.

Let me begin by exposing a problem associated with edge weights computed using
the normal term alone (i.e. depending only on the angular distance between the
normals of the vertices). Consider the following scene (left):

Voxelized point cloud (left) and a close-up view of the graph edges in the region where the tall and round boxes touch (right). The edges are colored according to their weights (from dark blue for small weights to dark red for large weights).

Most of the edges in the boundary region (right) are dark blue; however, there
are a number of red edges with quite large weights. This sort of boundary is
often referred to as a “weak boundary” and, not surprisingly, has a negative
effect on the performance of many segmentation algorithms. You can imagine
that a boundary like this is a disaster for the flood-fill segmentation,
because the flood will happily propagate through it. Luckily, the random
walker algorithm is known for its robustness against weak boundaries:

Segmentations produced by the random walker algorithm

using three different choices of seeds (shown with red

squares)

In the first two cases one of the seeds is very close to the weak boundary,
whereas another one is far away. In the third case there are multiple green
seeds placed along the boundary; however, a single purple seed is able to
“resist” them from its remote corner.

This robustness has limits, of course. In the figure below the “table seed” is
placed in the rear of the table, far from the boundaries with the boxes. The
box segments managed to “spill” on the table through the weak boundaries:

Segmentation failure when a seed is placed too far from a weak boundary

One way to address this problem is to make the sigma of the normal term
smaller, therefore penalizing differences in normals’ orientations more.
Enabling the distance term might also help, because the edges that contribute
to the boundary weakness are often diagonal and therefore longer than the
average. The figure below (left) demonstrates the graph edges in the same
boundary region with new weights, computed using a decreased normal term sigma
(10% of the original one) and with the distance term enabled. (The overall
shift of the edge colors towards blue is due to the latter.)

Graph edges with weights computed with a smaller normal term sigma and enabled distance term. Close-up view of the region where the tall and round boxes touch (left) and top-down view of the rear of the table (right).

Still, there are several edges with relatively large weights in the boundary
region, but there are no large-weight paths connecting vertices on both sides
of the boundary anymore. We got rid of the weak boundary, but this came at a
price. Although the table itself is flat, the cloud that we get from the
Kinect is not, and the further from the camera, the wavier it is. The image
on the right shows a top-down view of the rear of the table, where the waves
are particularly large. The edges that belong to the cavities between the
“waves” were heavily penalized and virtually disappeared. Random walkers will
have a hard time traversing this part of the table on their way to the boxes!

Having considered all of this, I came to the conclusion that some additional
geometric feature is needed to improve the weighting function. Curvature was
the first candidate, especially since we get it for free anyway when
estimating voxel normals (via PCA of the covariance matrix of the voxel
neighborhood). I added one more exponential term to the weighting function:

$w(v_i, v_j) = \exp\left(-\frac{d_E}{\sigma_E}\right) \cdot \exp\left(-\frac{d_A}{\sigma_A}\right) \cdot \exp\left(-\frac{d_C}{\sigma_C}\right) \cdot \exp\left(-\frac{d_K}{\sigma_K}\right),$

where $d_K$ is simply the product of the voxel curvatures.
Similarly to how it is done in the normal term, the product is additionally
multiplied by a small constant if the angle between the voxels is convex (in
order not to penalize convex boundaries).
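To make the weighting function concrete, here is a minimal sketch of how such an extended weight could be computed for a pair of voxels. It is an illustration only, not the actual implementation: the voxel structure, the sigma parameters, and the convexity test are assumptions, and the exact criteria used in the real code may differ.

```cpp
#include <cmath>

// Hypothetical per-voxel properties used by the weighting function.
struct Voxel
{
  float x, y, z;        // centroid
  float nx, ny, nz;     // unit normal
  float r, g, b;        // mean color, normalized to [0, 1]
  float curvature;      // obtained for free from normal estimation (PCA)
};

float
edgeWeight (const Voxel& a, const Voxel& b,
            float sigma_e, float sigma_a, float sigma_c, float sigma_k,
            float convex_discount /* small constant applied to convex edges */)
{
  // Euclidean distance between voxel centroids.
  float d_e = std::sqrt ((a.x - b.x) * (a.x - b.x) +
                         (a.y - b.y) * (a.y - b.y) +
                         (a.z - b.z) * (a.z - b.z));
  // Angular distance between normals.
  float d_a = 1.0f - (a.nx * b.nx + a.ny * b.ny + a.nz * b.nz);
  // Color difference.
  float d_c = std::sqrt ((a.r - b.r) * (a.r - b.r) +
                         (a.g - b.g) * (a.g - b.g) +
                         (a.b - b.b) * (a.b - b.b));
  // New curvature term: product of the voxel curvatures.
  float d_k = a.curvature * b.curvature;

  // One possible convexity test: the boundary is convex if the normals
  // "open away" from each other along the line connecting the voxels.
  float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  bool convex = (dx * (a.nx - b.nx) + dy * (a.ny - b.ny) + dz * (a.nz - b.nz)) > 0.0f;
  if (convex)
  {
    d_a *= convex_discount; // do not penalize convex boundaries
    d_k *= convex_discount;
  }

  return std::exp (-d_e / sigma_e) * std::exp (-d_a / sigma_a) *
         std::exp (-d_c / sigma_c) * std::exp (-d_k / sigma_k);
}
```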

The figure below demonstrates the edge weights computed using the new term
alone:

Graph edges with weights computed using only the new curvature term. Close-up view of the region where the tall and round boxes touch (left) and top-down view of the rear of the table (right).

The boundary between the boxes is perfectly strong, whereas the weights in the
rear of the table are not penalized too much. The seeding that resulted in a
segmentation failure before no longer causes problems. I used the set of
random seedings described in the last post to find the best sigmas for the
extended weighting function. Then I generated 50 new seedings to test and
compare the performance of the old (without the curvature term) and the new
(with the curvature term) weighting functions. The figure below summarizes the
distributions of under-segmentation errors:

The performance significantly improved in terms of stability, quality, and the
number of failures (in fact, there are no failures at all).

The random walker segmentation algorithm requires that the data are modeled as
a weighted graph, and the choice of edge weighting function has a great impact
on the performance of the algorithm. In this blog post I will describe the
weighting function and parameters that I ended up using.

Before talking about the weights of the edges between vertices, let’s discuss
the vertices themselves. As mentioned in the previous blog posts, I have a
pre-processing step where the input cloud is voxelized using
OctreePointCloudAdjacency. Voxelization serves three purposes:

Data down-sampling. The number of voxels is smaller than the number of
points in the original cloud.

Data smoothing. The normal orientation and color of a voxel are averaged
over the points of the original cloud that belong to it.

Establishment of adjacency relations. The regular grid structure of the
octree naturally defines a 26-neighborhood for each voxel.

The voxels consequently become vertices of the graph, and each of them is
connected with its neighbors by an edge. Each voxel has several properties: 3D
position, normal orientation, and color, which may be used in edge weight
computation.
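OctreePointCloudAdjacency provides these neighbor lists directly, but conceptually the construction boils down to looking up the 26 surrounding cells of every occupied voxel on the integer grid. A small self-contained sketch of that idea (hypothetical helper, not the PCL code):

```cpp
#include <map>
#include <utility>
#include <vector>

// Integer grid coordinates of an occupied voxel.
struct VoxelKey
{
  int x, y, z;
  bool operator< (const VoxelKey& o) const
  {
    if (x != o.x) return x < o.x;
    if (y != o.y) return y < o.y;
    return z < o.z;
  }
};

// Given the grid coordinates of all occupied voxels, produce the graph edges
// implied by 26-connectivity (each pair of adjacent voxels emitted once).
std::vector<std::pair<int, int> >
buildAdjacencyEdges (const std::vector<VoxelKey>& voxels)
{
  std::map<VoxelKey, int> index;
  for (size_t i = 0; i < voxels.size (); ++i)
    index[voxels[i]] = static_cast<int> (i);

  std::vector<std::pair<int, int> > edges;
  for (size_t i = 0; i < voxels.size (); ++i)
    for (int dx = -1; dx <= 1; ++dx)
      for (int dy = -1; dy <= 1; ++dy)
        for (int dz = -1; dz <= 1; ++dz)
        {
          if (dx == 0 && dy == 0 && dz == 0)
            continue; // skip the voxel itself (no self-edges)
          VoxelKey neighbor = { voxels[i].x + dx, voxels[i].y + dy, voxels[i].z + dz };
          std::map<VoxelKey, int>::const_iterator it = index.find (neighbor);
          if (it != index.end () && it->second > static_cast<int> (i))
            edges.push_back (std::make_pair (static_cast<int> (i), it->second));
        }
  return edges;
}
```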

As mentioned in the very first blog post on random walker segmentation,
originally I used the edge weighting function from the following paper:

Later on I introduced several modifications. Now for a pair of vertices
$v_i$ and $v_j$ the weight is defined as:

$w(v_i, v_j) = \exp\left(-\frac{d_E}{\sigma_E}\right) \cdot \exp\left(-\frac{d_A}{\sigma_A}\right) \cdot \exp\left(-\frac{d_C}{\sigma_C}\right),$

where $d_E$, $d_A$, and $d_C$ are the
Euclidean, angular, and color differences between voxels, and the sigmas are
used to balance their contributions. Compared to the weighting function of
Lai et al., the color term was added, and scaling by the mean values was
removed.

I devised the following procedure in order to find appropriate values for the
sigmas. I took a scene with known ground truth segmentation and generated 50
random proper seedings. Here by a “proper seeding” I mean a set of seeds where
each seed belongs to a distinct ground truth segment, and each ground truth
segment has a single seed (see example in the figure below on the left). For
each of these seedings I ran random walker and computed the under-segmentation
error (that was defined in one of the earlier blog posts, see example in the
figure below on the right). Then I analyzed the distributions of errors
resulting from different sigma values.

Voxelized point cloud with one of the randomly generated proper seedings used in the experiments

Under-segmentation error of the segmentation produced by random walker from the given seeds (erroneous voxels painted red)

Note that the ground truth itself is not perfect, because it is often
impossible to tell apart the points at the boundary of two objects.
Consequently, the ground truth segmentation is somewhat random at the
boundaries, and we should not expect (or strive for) any segmentation algorithm to
produce exactly the same result. The under-segmentation error displayed above
has 1039 erroneous voxels, and this is pretty much the best performance we
could expect from a segmentation algorithm with this ground truth.

Let’s begin by examining the influence of the angular term. In this experiment
I set Euclidean sigma to the value of voxel resolution and color sigma to 0.1.
Below is a plot of under-segmentation error distributions for different
choices of angular sigma (note that larger sigmas correspond to less
influence):

Each distribution is visualized using a boxplot.
The three main features are the position of the red bar (median of the
distribution), size of the box (50% of the values fall inside the
box), and the number and positions of pluses (outliers). The first one gives
an idea of the average performance of the algorithm. The second one expresses
the segmentation stability with respect to the seed choice (with smaller box
meaning better stability). The third one indicates segmentation failures.
Indeed, a significant deviation of the under-segmentation error means that the
output segmentation has large mis-labeled regions, which may be deemed as a
failure.

Clearly, the median values of the distributions are almost the same. The
differences are very small and, due to the discussed properties of the
under-segmentation error, cannot be used to draw conclusions about which sigmas
are better. The box sizes, however, vary significantly. The sigmas from 0.1 to
1.1 yield the most stable performance. The number of failures is also smaller for
those sigmas. This evaluation does not provide enough information to choose
any particular sigma in this range, so for now I settled on 0.2 (it is the
second most stable, but yields fewer failures than 0.1).

In order to explore the influence of the color term, I set Euclidean sigma to
the value of voxel resolution again and angular sigma to 0.2:

Note the last column, which shows the error distribution when the color term
is removed completely. We see that sigmas from 0.1 to 0.15 provide slightly
more stable results. Unfortunately it is not visible in the plot, but the
number of pluses (outliers) is smaller for these sigmas. So I
chose 0.15 as a result of this evaluation.

Speaking about the distance sigma, it turned out to have a very small influence
on the results. In most cases the introduction of the distance term does not change
the output at all. Still, in a few cases it helps to avoid complete segmentation
failure. It turned out that setting this sigma to the voxel resolution value
gives the best results.

Finally, let me demonstrate the segmentations produced with the chosen sigmas.
Among the 50 random seedings only 5 resulted in segmentation failure:

5 failed segmentations

Clearly, failures happened when a seed was placed either exactly on the
boundary between two objects (#2), or on the outermost voxels of an object
(#1, #3, #4, #5).

The remaining 45 seedings yielded good segmentations. Below are 15 of them
(selected randomly):

In the past weeks I decided to put the “spectral thing” on hold and turned
back to the random walker segmentation. In this blog post I will talk about
refactoring of the random walker algorithm and my experiments with different
linear solvers. In the follow-up post I will explore how the segmentation
results depend on seed placement, as well as discuss edge weighting functions
and the choice of parameters for them.

First of all, I refactored and cleaned up my implementation. Recall that the
random walker algorithm can be applied to cluster any kind of data as long
as there is a way to model it using a weighted graph. I decided that it makes
sense to have a generic templated implementation of the algorithm which would
work with any weighted graph. An obvious choice to represent graphs in C++
is to use the primitives available in the Boost Graph Library (BGL).
This library is very generic, feature-rich, and flexible, though at the
expense of a rather steep learning curve. I took their implementation of the
Boykov-Kolmogorov max-flow algorithm as an example of how to design the interface
for a generic graph-based algorithm. In my case the public interface is just
one templated function:
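The exact signature is not reproduced here, but the interface could look roughly like the following sketch (the function and parameter names are illustrative, not the actual PCL code; the property maps follow the usual BGL conventions):

```cpp
// A rough sketch of a BGL-style interface for the generic algorithm.
template <class Graph, class EdgeWeightMap, class VertexColorMap>
bool
randomWalkerSegmentation (const Graph& graph,
                          EdgeWeightMap edge_weights,   // read-only: weight of every edge
                          VertexColorMap vertex_colors) // in: seed colors, out: final labels
{
  // 1. Collect the seed vertices (those with a non-default color).
  // 2. Assemble the sparse linear system from the edge weights.
  // 3. Solve it and write the winning color of every unlabeled vertex back
  //    into vertex_colors.
  return true; // false if the system is singular (e.g. a component without seeds)
}
```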

The user has to provide a graph, a property map that associates a weight to
each edge of the graph, and a property map that contains initial vertex
colors. (I adopted the term “colors” instead of “labels” because BGL has a
pre-defined vertex property type with this name.) The output of the algorithm
(i.e. label assignment) is written back to the color map. Internally the
function instantiates a class, which does all the boring work of constructing
and solving a system of linear equations, as well as interpreting its solution
as a label assignment.

While this generic graph segmentation function might be useful for someone,
the general audience will be interested in a class that implements a complete
point cloud segmentation pipeline. This class should take care of converting
an input cloud into a weighted graph, segmenting it, and turning the random
walker output into a labeled point cloud. At the moment it is not clear to me
how the first step should be designed. Indeed, there are multiple ways to
represent a point cloud as a weighted graph, both in terms of topology and
edge weights. Currently I voxelize the input cloud and use 26-neighborhood to
establish edges between nodes. Alternatively, one may work with a full point
cloud and use some fixed-radius neighborhood. One more option might be to
generate a mesh and work with it. Exploration of these possibilities remains
future work.

The second issue that I addressed recently was the performance of the
algorithm. The main computational effort is spent on solving a sparse system
of linear equations, where the number of equations is determined by the number
of unlabeled vertices (i.e. basically the size of the whole point cloud). For
example, the typical size of voxelized scenes from the OSD dataset
that I use in my experiments is about 30000 vertices. Originally, I used the
ConjugateGradient solver of Eigen, because it is “recommended for
large symmetric problems”. The time needed to segment a typical point cloud
with this solver is about 1 second on my three-year-old i5 laptop. I decided
to try other options
available in Eigen. In particular, I tested BiCGSTAB with Diagonal
and IncompleteLUT preconditioner, SimplicialLLT, SimplicialLDLT,
and SimplicialCholesky solvers. The figure below shows the runtime of
these solvers with respect to the problem size. (Only one of the
Simplicial*** solvers is plotted as they demonstrated very similar
performance.)

The computation time depends linearly on the problem size for all solvers,
however SimplicialLDLT has a much smaller growth rate. For a typical 30
thousand vertices problem it needs about 200 ms. What’s more, it can solve for
multiple right-hand sides at the same time, whereas ConjugateGradient and
BiCGSTAB can not. This means that as the number of labels (i.e. desired
segments) grows, the computational time does not increase.
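For reference, factorizing once and then solving for all labels at once with SimplicialLDLT boils down to something like this minimal Eigen sketch (the system matrix and the right-hand sides are assumed to be assembled elsewhere):

```cpp
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// A is the symmetric positive definite system matrix built from the edge
// weights; B holds one column of right-hand sides per label.
Eigen::MatrixXd
solveForAllLabels (const Eigen::SparseMatrix<double>& A, const Eigen::MatrixXd& B)
{
  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > solver;
  solver.compute (A);             // factorize once
  if (solver.info () != Eigen::Success)
    return Eigen::MatrixXd ();    // factorization failed
  return solver.solve (B);        // back-substitute for every label at once
}
```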

In fact, Eigen offers some more options such as CholmodSupernodalLLT,
which is a wrapper for the SuiteSparse package, and SparseLU, which uses
techniques from the SuperLU package. Unfortunately, the former
complained that the matrices that I provide are not positive definite (though
they actually are), and the latter is a very recent addition that is only
available in Eigen 3.2 (which I do not have at the moment).

Taking into account the evaluation results, I switched to the SimplicialLDLT
solver in my random walker implementation.

I am pleased to announce that I just finished the second Toyota Code Sprint. The topic we tackled this time was superquadrics applied to computer vision tasks such as object modeling and object detection.

The code we developed has not yet been integrated into PCL (it does use PCL for all the processing), but lives in a separate repository which you can find here: https://github.com/aichim/superquadrics .

An extensive report that presents some of the theory behind the concepts and algorithms we used in the project, as well as implementation details and results can be found at the end of this post.

At the end of this work, we present a set of ideas that can be used to extend the project in subsequent code sprints:

performance evaluation of the superquadric-based algorithms compared to other state-of-the-art object modeling and object detection approaches and integrating my code into PCL if the results are satisfactory

explore further possibilities of object modeling inside point clouds. This includes techniques different from superquadrics (see the report for references), or improving superquadric-based techniques (see supertoroids, deformable superquadrics)

in this code sprint we explored one approach for multipart object segmentation using superquadrics. This technique is valuable for point cloud compression, but is neither very efficient nor robust. More work in this direction can bring interesting results.

more robust and efficient object fitting using superquadrics - right now we use only the 3D location of the points, but the quality of the fitting can be improved by using normal and/or curvature information.

considering that most of the scans of objects will not cover the complete sphere of possible views, we should think about how to fit only partial superquadrics

In several previous posts I tried to provide some insight into how the
spectral clustering technique may be applied to the point cloud processing
domain. In particular, I have demonstrated different visualizations of
eigenvectors and also did a manual analysis of one particular scene. In this
blog post I will (finally) describe my algorithm that does automatic analysis
of eigenvectors, which leads to unsupervised supervoxel clustering.

The input of the clustering algorithm is $E = \{e_1, \dots, e_K\}$, the set of the first $K$
eigenvectors of the graph Laplacian. Each eigenvector has $N$
elements that correspond to the supervoxels in the original problem
space. The task of the algorithm is to determine the number of clusters that
the data points form in the subspace spanned by the eigenvectors and, of
course, assign points to these clusters.

The key insight drawn from previous examinations and discussions of the
eigenvectors is that the clusters are linearly separable in one-dimensional
subspaces spanned by the eigenvectors. In other words, for every pair of
clusters there exists at least one eigenvector so that in its subspace these
clusters are linearly separable. Based on this premise I built an algorithm
which is a pretty straightforward instance of the divisive hierarchical
clustering approach. It starts with a single cluster that contains all the data points
and recursively splits it in a greedy manner. The following pseudo-code
summarizes the algorithm:
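Since the helper functions are discussed below, here is only a compact sketch of the greedy divisive loop itself (findBestSplit and applySplit are hypothetical, undefined helpers, and the quality threshold that stops the recursion is an assumed parameter):

```cpp
#include <queue>
#include <vector>

// Indices of the data points that belong to one cluster.
typedef std::vector<int> Cluster;

struct Split
{
  int eigenvector;  // subspace in which the split occurs
  double threshold; // points below go to the first part, the rest to the second
  double quality;
};

// Hypothetical helpers: find the best split of a cluster over all 1-D
// subspaces, and apply it.
Split findBestSplit (const Cluster& cluster);
void  applySplit (const Cluster& cluster, const Split& split,
                  Cluster& first, Cluster& second);

std::vector<Cluster>
divisiveClustering (const Cluster& all_points, double min_quality)
{
  std::vector<Cluster> result;
  std::queue<Cluster> pending;
  pending.push (all_points);
  while (!pending.empty ())
  {
    Cluster current = pending.front ();
    pending.pop ();
    Split split = findBestSplit (current);
    if (split.quality < min_quality)
    {
      // No sufficiently good split exists, so this cluster is final.
      result.push_back (current);
      continue;
    }
    Cluster first, second;
    applySplit (current, split, first, second);
    pending.push (first);
    pending.push (second);
  }
  return result;
}
```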

The interesting parts are, of course, the FindBestSplit and SplitQuality
functions. But to get this straight, let me first define what a “split” is. A
split is a tuple $(e, t)$, where $e$ is the eigenvector
along whose subspace the split occurs, and $t$ is a threshold value. The
points that have their corresponding elements in the eigenvector less than $t$
go to the first cluster, and the remaining ones go to the second. For
example, the figure below shows with a red line a split in the subspace of one
eigenvector on the left and a split in the subspace of another on the right:

Example splits in the subspaces spanned by two different eigenvectors

Which of these two splits is better? Intuitively, the one on the left is more
promising than the one on the right. But how to define the split quality?
My initial approach was to use the difference between the points immediately
above and below the splitting line as the measure. This worked to some extent,
but was not good enough. Then I switched to a measure based on the relative
densities of the bands above and below the splitting line. Consider the figure
below:

Example splits with three bands highlighted. The “split” band is shown in red, the “top” and “bottom” bands are shown in yellow

The “split” band is the region of low density around the splitting line. The
“top” and “bottom” bands are the high density regions immediately above and
below the “split” band. I will omit the details of how these bands are
computed, because it is likely that I will modify the implementation in the future.
The quality of the split is defined in terms of the densities $\rho_{split}$,
$\rho_{top}$, and $\rho_{bottom}$ of the corresponding bands: the lower the
density of the “split” band relative to the densities of the other two, the
higher the quality.

With this quality measure at hand, the FindBestSplit function simply
iterates over all available one-dimensional subspaces (i.e. over all
eigenvectors) and finds the split with the highest quality.

The performance of the algorithm is excellent on simple scenes:

Spectral supervoxel clustering of simple table-top scenes

And is rather good (though definitely not perfect) on more cluttered ones:

Spectral supervoxel clustering of cluttered table-top scenes

For example, the green book in the first scene is split into two clusters. In
the second scene two small boxes in the center of the image are erroneously
merged into one cluster. I think these issues are closely related with the
split quality measure and the threshold associated with it. Definitely, there
are ways to improve these and I plan to work on it in the future.

Before exposing the clustering algorithm as promised in the last blog post, I
decided to motivate it by showing how the eigenvectors may be analyzed “by
hand”. Hopefully, this will also provide more intuition about what these
eigenvectors actually are and how they are related with the data.

Just as a reminder, the problem I am trying to solve is about segmenting a set of
supervoxels into meaningful components. Here a meaningful component means a
subset of supervoxels that are close to each other in the Euclidean sense, and are
separated from the rest by a sharp change in orientation. If we view each
supervoxel as a point, then the problem is about clustering points in a
$d$-dimensional space. (Currently $d = 6$, since supervoxels have 3
Euclidean coordinates plus 3 coordinates of the normal vector; however,
additional dimensions, e.g. color, may be added later.) The difficulty of the
problem comes from the fact that the components may have arbitrary irregular
shapes in these dimensions. Therefore I want to map these points to some other
space where the components will correspond to tight clusters, perhaps even
linearly separable ones. The current idea is to use a subspace spanned by the first
few eigenvectors of the graph Laplacian of the original data.

In the last blog post I provided a visualization of the eigenvectors in the
original problem space. For convenience, here are the first four eigenvectors
again:

The first 4 eigenvectors in the original problem space

I want to perform clustering in a subspace though, so it is helpful to develop
an intuition about what the data look like in it. The figure below demonstrates
the data points (that is, supervoxels), projected on each of the first four
eigenvectors. (Here and in what follows the data is whitened, i.e. de-meaned
and scaled to have unit variance. Additionally, the values are sorted in
increasing order.)

Data points in subspaces spanned by the first 4 eigenvectors

Obviously, in each of these subspaces (except for the second) the data is
linearly separable in two clusters. What is not obvious, however, is how many
clusters there will be in the combined subspace. The next figure shows data
points in subspaces spanned by two different pairs of eigenvectors:

Data points in subspaces spanned by the first and third (left) and the fourth and third (right) eigenvectors

Now it becomes evident that there are at least three clusters. Will there be
more if we consider the subspace spanned by all three of these eigenvectors? It
turns out there will be; see the point cloud below:

Unfortunately, we have just reached the limit in terms of how many
dimensions can be conveniently visualized. For this particular data
set, though, it is enough: no more clusters appear if we consider additional
eigenvectors. Summarizing, in the subspace spanned by the first four
eigenvectors the data points form four tight and well-separated (linearly)
clusters. And these clusters actually correspond to the three boxes and the
table in the original problem space.

Now the only thing left is to develop an algorithm which would do this kind
of analysis automatically!

Correspondence rejection classes implement methods that help eliminate correspondences based on specific criteria such as distance, median distance, normal similarity measure, or RANSAC, to name a few. A couple of additional filters I’ve experimented with include a uniqueness measure and Lowe’s ratio measure as in “Distinctive image features from scale invariant keypoints”, D.G. Lowe, 2004. I’ve also explored the tradeoffs of implementing the filters within CorrespondenceEstimation itself, or as external CorrespondenceRejection classes. The former is computationally more efficient if the rejection process is done in one pass, while the latter allows for scene-specific sequential filter banks.
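As an illustration of such a sequential filter bank, a minimal sketch chaining three of the available rejectors might look as follows (threshold values and the particular set of filters are scene-dependent assumptions, not recommendations):

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/correspondence.h>
#include <pcl/registration/correspondence_rejection_distance.h>
#include <pcl/registration/correspondence_rejection_one_to_one.h>
#include <pcl/registration/correspondence_rejection_sample_consensus.h>

// Keypoint clouds and an initial correspondence set are assumed to exist.
void
filterCorrespondences (const pcl::PointCloud<pcl::PointXYZ>::Ptr& source_keypoints,
                       const pcl::PointCloud<pcl::PointXYZ>::Ptr& target_keypoints,
                       const pcl::CorrespondencesPtr& initial,
                       pcl::Correspondences& final_correspondences)
{
  // 1. Reject correspondences whose distance exceeds a fixed threshold.
  pcl::Correspondences after_distance;
  pcl::registration::CorrespondenceRejectorDistance rej_dist;
  rej_dist.setMaximumDistance (0.05f); // scene-dependent
  rej_dist.getRemainingCorrespondences (*initial, after_distance);

  // 2. Keep only one-to-one matches (each target point matched at most once).
  pcl::Correspondences after_one_to_one;
  pcl::registration::CorrespondenceRejectorOneToOne rej_one;
  rej_one.getRemainingCorrespondences (after_distance, after_one_to_one);

  // 3. Geometric consistency check with RANSAC on the keypoint positions.
  pcl::registration::CorrespondenceRejectorSampleConsensus<pcl::PointXYZ> rej_ransac;
  rej_ransac.setInputSource (source_keypoints);
  rej_ransac.setInputTarget (target_keypoints);
  rej_ransac.setInlierThreshold (0.02); // 2 cm, scene-dependent
  rej_ransac.getRemainingCorrespondences (after_one_to_one, final_correspondences);
}
```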

What follows is a quick reference guide to the available correspondence rejection classes, with remarks extracted from the source code.

I concluded the last blog post by noting that it seems to be possible to
segment objects based on analysis of the eigenvectors of Laplacian constructed
from the point cloud. This time I will provide a visual interpretation of
eigenvectors and then describe the problem of their analysis.

Let me start with a quick note on eigenvector computation. As mentioned
before, they are obtained through eigendecomposition of the Laplacian that
represents the surface. In the beginning I used the SelfAdjointEigenSolver
of the Eigen library. It runs in $O(n^3)$ time
(where $n$ is the number of points), which obviously does not scale
well. Later I switched to SLEPc. It can
limit the computation to only a desired number of first eigenpairs and therefore
does the job much faster, but still seems to have polynomial running time. Therefore I
decided to execute supervoxel clustering as a pre-processing step and then
compute the distances over the supervoxel adjacency graph, which has a
dramatically smaller size than the original point cloud.

Now let’s turn to the eigenvectors themselves. The figure below demonstrates
the voxelized point cloud of a simple scene (left) and the supervoxel adjacency
graph (right), where the adjacency edges are colored according to their
weights (from dark blue for small weights to dark red for large weights):

Voxelized point cloud (voxel size 0.006 m)

Supervoxel adjacency graph (seed size 0.025 m), colored according to edge weight

Eigendecomposition of the Laplacian of this weighted adjacency graph yields a set
of pairs $(\lambda_i, e_i)$. Each eigenvector $e_i$
has as many elements as there are vertices in the
graph. Therefore it is possible to visualize an eigenvector by painting each
supervoxel according to its corresponding element in the vector. The figure
below shows the first 9 eigenvectors, which correspond to the smallest
eigenvalues:

First 9 eigenvectors of the graph Laplacian

The first eigenvector clearly separates the scene into two parts: the third
box and everything else. In the second eigenvector the table is covered with
gradient, but the first and third boxes have (different) uniform colors and,
therefore, stand out. The third eigenvector highlights the second box, and so
on.

I have examined quite a number of eigenvectors of different scenes, and I
think the following common pattern exists. The first few eigenvectors tend to
break the scene into several regions with uniform colors and sharp edges. In the
next eigenvectors gradients begin to emerge. Typically, a large part of the
scene would be covered with a gradient, and a smaller part (corresponding to a
distinct component of the graph) would have some uniform color.

The overall goal is to figure out the number of distinct components of the
graph (that is, objects in the scene) and segment them out. As I admitted
before, it is clear that the eigenvectors capture all the information needed
to do this, so the question is how to extract it. In fact, this problem has
already received a lot of attention from researchers under the name of
“Spectral Clustering”. (Yeah, I made quite a detour through all these distance
measures on 3D surfaces before I came to know about it.) The standard approach is
described in the following paper:

In a nutshell, the original problem space typically has many dimensions (in
our case the dimensions are Euclidean coordinates, normal orientations, point
colors, etc.). The clusters may have arbitrary irregular shapes, so they can
neither be separated linearly, nor with hyper-spheres, which renders standard
techniques like K-means inapplicable. The good news is that in the subspace
spanned by the first few eigenvectors of the Laplacian of the graph (constructed
from the original data) the data points form tight clusters, and thus K-means
can be used. This effect is evident in the eigenvectors that I demonstrated
earlier. Unfortunately, the number of clusters still needs to be known. There
exists literature that addresses automatic selection of the number of clusters,
however I have not seen any simple and reliable method so far.

In the next blog post I will describe a simple algorithm that I have developed
to analyze the eigenvectors and demonstrate the results.

With my current work on optimizing correspondence estimation across the uv/xyz domains, it is worth providing a topology of the available correspondence estimation classes in PCL. For a high-level treatment of the registration API, please refer to the registration tutorial.

Correspondence estimation attempts to match keypoints in a source cloud to keypoints in a target cloud based on some similarity measure, feature descriptors in our case. Although applying scene-relevant descriptor parameters and correspondence thresholds may reduce erroneous matches, outliers persist and impact pose estimation. This is due to the implied assumption that for each source keypoint a corresponding target keypoint exists. The difficulty of estimating model- or scene-specific descriptor parameters is another factor.
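A minimal sketch of descriptor-based correspondence estimation, using FPFH descriptors as an example (the descriptor type and the use of the reciprocal variant are assumptions made for illustration):

```cpp
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/correspondence.h>
#include <pcl/registration/correspondence_estimation.h>

// Match keypoints by comparing their FPFH descriptors (the descriptor clouds
// are assumed to be computed beforehand, one descriptor per keypoint).
void
matchDescriptors (const pcl::PointCloud<pcl::FPFHSignature33>::Ptr& source_features,
                  const pcl::PointCloud<pcl::FPFHSignature33>::Ptr& target_features,
                  pcl::Correspondences& correspondences)
{
  pcl::registration::CorrespondenceEstimation<pcl::FPFHSignature33, pcl::FPFHSignature33> est;
  est.setInputSource (source_features);
  est.setInputTarget (target_features);

  // For every source descriptor find the closest target descriptor. The
  // reciprocal variant keeps a match only if it is mutual, which removes a
  // fair share of the outliers caused by the "every keypoint has a match"
  // assumption discussed above.
  est.determineReciprocalCorrespondences (correspondences);
}
```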

What follows is a quick reference guide to the available correspondence estimation classes, with remarks extracted from the source code.

Last time I wrote about distance measures on 3D surfaces, though I did not
give any details about how they are computed. In this blog post I will give a
formal definition, followed by two important properties that simplify the
computation and provide insights that might help to solve the ultimate goal:
identification of distinct objects in a scene.

Given a mesh (or a point cloud) that represents a surface, a discretization
of the Laplace-Beltrami operator (LBO)
is constructed. This discretization is a sparse symmetric matrix of size
$n \times n$, where $n$ is the number of vertices (points) in the
surface. The non-zero entries of this matrix are the negated weights of the
edges between adjacent vertices (points) and also the vertex degrees. This matrix
is often referred to as the Laplacian. Eigendecomposition of the Laplacian consists
of pairs $(\lambda_i, e_i)$, where $\lambda_1 \leq \lambda_2 \leq \dots \leq \lambda_n$
are eigenvalues, and $e_1, \dots, e_n$
are the corresponding eigenvectors.

The diffusion distance is defined in terms of the eigenvalues and
eigenvectors of the Laplacian as follows:

$d_t^2(x, y) = \sum_i e^{-2 \lambda_i t} \left( e_i(x) - e_i(y) \right)^2$

The biharmonic distance bears a strong resemblance to it:

$d_B^2(x, y) = \sum_i \frac{\left( e_i(x) - e_i(y) \right)^2}{\lambda_i^2}$

Here $e_i(x)$ means the $x$-th element of eigenvector
$e_i$. Both distances have a similar structure: a sum over all
eigenpairs, where the summands are differences between corresponding elements of
eigenvectors scaled by some function of the eigenvalues. There are two properties
of these distances that I would like to stress.

Firstly, the summands form a decreasing sequence. The figure below illustrates
this point with the eigenvalues of the Laplacian of a typical point cloud:

In the left image the first hundred eigenvalues (except the first one, which is
always zero) are plotted. Note that the values are normalized. The magnitudes
of the eigenvalues increase rapidly. On the right the multipliers of both the
diffusion and biharmonic distances are plotted (also computed with normalized
eigenvalues). The biharmonic distance multiplier is plotted for several choices
of the parameter. Clearly, only the first few terms in the summation are
needed to approximate either of the distances. This has an important
consequence: there is no need to solve the eigenproblem completely; rather, it
suffices to find a limited number of eigenpairs with small eigenvalues.

Secondly, the distance between two points $x$ and $y$ depends on
the differences between their corresponding elements $e_i(x)$ and $e_i(y)$ in
the eigenvectors. The figure below demonstrates
the (sorted) elements of a typical eigenvector:

One may see that there are groups of elements with the same value. For
example, one of these groups contains about one hundred elements with nearly
identical values. The pair-wise distances between the points that correspond to
these elements will therefore be close to zero. In other words, plateaus in
eigenvector graphs correspond to sets of mutually close points, and such sets may be
interpreted as objects.
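To make the two observations concrete, here is a minimal Eigen-based sketch that evaluates the truncated sums from the first k eigenpairs, assuming the definitions reconstructed above (the eigenpairs are assumed to be computed elsewhere, with the zero eigenvalue excluded):

```cpp
#include <cmath>
#include <Eigen/Dense>

// `evals` holds eigenvalues sorted in increasing order (zero eigenvalue
// excluded) and `evecs` the corresponding eigenvectors, one per column.

double
diffusionDistance (const Eigen::VectorXd& evals, const Eigen::MatrixXd& evecs,
                   int x, int y, double t, int k)
{
  double sum = 0.0;
  for (int i = 0; i < k; ++i)
  {
    double d = evecs (x, i) - evecs (y, i);
    sum += std::exp (-2.0 * evals (i) * t) * d * d;
  }
  return std::sqrt (sum);
}

double
biharmonicDistance (const Eigen::VectorXd& evals, const Eigen::MatrixXd& evecs,
                    int x, int y, int k)
{
  double sum = 0.0;
  for (int i = 0; i < k; ++i)
  {
    double d = evecs (x, i) - evecs (y, i);
    sum += d * d / (evals (i) * evals (i));
  }
  return std::sqrt (sum);
}
```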

Summing up, it seems like it should be possible to identify distinct
objects in a point cloud by analyzing the eigenvectors of Laplacian (even
without explicitly computing any of the distance measures). Moreover, only the
first few eigenvectors are relevant, so it is not necessary to solve the
eigenproblem entirely.

In the recent weeks I have been developing an approach which would allow
automatic selection of seed points. I decided to proceed with the methods
which use a graph-based representation of point clouds and, as a matter of
fact, are closely related with random walks on those graphs. I still do not
have any solid results, though there are some interesting outputs that I would
like to share in this blog post.

In the domain of mesh segmentation, or more generally 3D shape analysis, there
is a fundamental problem of measuring distances between points on a surface.
The most trivial and intuitive is geodesic distance, which encodes the
length of the shortest path along the surface between two points. It has a
number of drawbacks, the most important being its sensitivity to perturbations
of the surface. For example, introducing a hole along the shortest path
between two points, or a small topological shortcut between them, may induce
arbitrarily large changes in the distance. This is an undesired property,
especially considering the noisy Kinect data that we work with.

A more sophisticated distance measure, the diffusion distance, is based on the
mathematical study of heat conduction and diffusion. Suppose we have a graph
that represents a shape (it may be constructed exactly the same way as for the
Random Walker segmentation). Imagine that a unit amount of heat is applied at
some vertex. The heat will flow across the edges and the speed of its
diffusion will depend on the edge weights. After time $t$ has passed,
the initial unit of heat will be somehow distributed among the other vertices.
The Heat Kernel $H_t$ encodes this distribution. More specifically,
$H_t(i, j)$ is the amount of heat accumulated after time $t$ at
vertex $j$ if the heat was applied at vertex $i$. Based on this
kernel the diffusion distance between each pair of points is defined.
Importantly, the distance depends on the time parameter and captures either
local or global shape properties.

Another distance measure, commute-time distance, is the average time it
takes a random walker to go from one vertex to the other and come back.
Finally, biharmonic distance was proposed most recently in:

This distance measure is non-parametric (does not depend on e.g. time) and is
claimed to capture both local and global properties of the shape.

The figure below demonstrates biharmonic, commute-time, and heat diffusion
distance maps computed with respect to the point marked with a red circle:

Left column: voxelized point cloud (top), biharmonic distance (middle), commute-time distance (bottom). Right column: heat diffusion distance for several choices of the time parameter.

In each image the points with the smallest distance are painted dark blue, and
the points with the largest distances dark red. The absolute values of the
distances are very different in all cases.

I think these distance maps could be used to infer the number of distinct
objects in the scene. Indeed, the points that belong to the same object tend
to be equidistant from the source point, so different objects correspond to
different blobs of homogeneous points. Finding objects thus is the same as
finding modes of the distribution of distances, which could be accomplished
with the Mean-Shift algorithm.

Speaking about the particular choice of distance measure, the biharmonic distance and
the heat diffusion distance with a large time parameter intuitively seem to be
better than the others; however, this is a subject for a more careful examination.

This is a follow-up post to the segmentation using random walks. It turned out
that my initial implementation had several bugs which significantly worsened
the performance. In this blog post I will describe them and show the outputs
produced by the fixed algorithm.

The segmentation results that I demonstrated last time expose two
(undesirable) features: vast regions with random label assignment and “zones
of influence” around seed points. My first intuitive explanation was that
since the edge weights are always less than 1, a random walker cannot get
arbitrarily far from its starting point, therefore if a seed is reasonably far,
the probability of getting there is very small. Surprisingly, I did not find
any mentions or discussions of this effect in the literature. Moreover, while
carefully re-reading “Random Walks for Image Segmentation” I noticed that
the algorithm outputs for each point and label pair not the probability that a
random walker started from that point will reach the label, but rather the
probability that it will first reach that label (i.e. earlier than other
labels).

I decided to visualize edge weights and the probabilities I get with my
implementation. In the following experiment I labeled three points, one on the
table, and the other two on the boxes (see the left figure):

Color point cloud with three labeled points

Edges between voxels

Probabilities that a random walker will first reach the top right label

The figure in the middle shows all edges along which random walkers move.
Each edge is visualized by its middle point colored according to edge weight
(using “jet” color map where 0 is dark blue and 1 is dark red). The weights do
make sense, as the edges in the planar regions are mostly reddish (i.e. weight
is close to 1), whereas on the surface boundaries they are bluish (i.e. weight
is close to 0). Moreover, it is clear that concave and convex boundaries have
different average weights.

The figure on the right shows all points colored according to the probability
that a random walker started from that point will first reach the labeled
point on the box in the back. The probability images for the other labels are
similar, thus for the majority of points the probability of first reaching
either label is close to zero. Clearly, this cannot be right, because
some label has to be reached first anyway, and therefore the probabilities at
each point should sum up to one. This observation triggered a long search for
bugs in my implementation, but in the end I discovered and fixed two issues.

As I mentioned before, the solution for random walker segmentation is obtained
by solving a system of linear equations, where coefficients matrix consists of
edge weights and their sums, arranged in a certain order. The system is huge
(there are as many equations as there are unlabeled points), but sparse (the
number of non-zero coefficients in a row depends on the number of edges
incident to a point). The first issue was due to a bug (or a feature?) of the
OctreePointCloudAdjacency class that I used to determine neighborhoods of
points. In a nutshell, it is a specialized octree, where leaves store pointers
to their direct neighbors (in terms of 26-connectivity). For some reason, a
leaf always stores a pointer to itself in the list of neighbors. This caused
bogus self-edges in my graph which themselves did not show up in the matrix
(because diagonal elements are occupied by sums of weights of incident edges),
however treacherously contributed to those sums, thus invalidating them.

The second issue was more subtle. For the system to have a solution it has to
be non-singular. In terms of the graph it means that it either has to be
connected, or should contain at least one seed in every connected component.
From the beginning I had a check to enforce this requirement, however I did
not take into account that the weighting function may assign zero weight to an
edge! In the pathological case all the edges incident to a point may happen to
be “weightless”, thus resulting in an all-zero row in the system.

Below are the results obtained using the random walker algorithm after I fixed
the described issues:

Random walk segmentation with overlaid seed points

Probabilities that a random walker will first reach each of the labels

As I did previously, I labeled a single point in each object. (Actually, a
single point in each disjoint part of each object. That is why there are
multiple labeled points on the table.) The scene is perfectly segmented. On
the right is an animated GIF with a sequence of frames which show
probabilities (potentials) for each label. In most cases the separation
between object and background is very sharp, but in two cases (the small box
on top of the book and standing box on the right) some non-zero probabilities
“spill” outside the object. Nevertheless, this does not affect the final
segmentation since the potentials for the correct labels are higher.

I am very happy with the results, however one should remember that they were
obtained using manual selection of labels. This code sprint is aimed at an
automatic segmentation, so next I plan to consider different strategies of
turning this into a fully autonomous method.

In the last few weeks I have been working on two things in parallel. On one
hand, I continued to improve supervoxel segmentation, and a description of
this is due in a later blog post. On the other hand, I started to look into an
alternative approach to point cloud segmentation which uses random walkers.
In this blog post I will discuss this approach and show my initial results.

The input is an image where several pixels are marked (by the user) with
different labels (one for each object to be segmented out) and the output is a
label assignment for all the remaining pixels. The idea behind the algorithm
is rather intuitive. Think of a graph where the nodes represent image pixels.
The neighboring pixels are connected with edges. Each edge is assigned a
weight which reflects the degree of similarity between pixels. As it was
mentioned before, some of the nodes are marked with labels. Now take an
unlabeled node and imagine that a random walker is released from it. The
walker randomly hops to one of the adjacent nodes with probability
proportional to the edge weight. In the process of this random walk it will
occasionally visit labeled nodes. The one that is most likely to be visited
first determines the label of the node where the walker started.

Luckily, one does not have to simulate random walks from each node to obtain
the probabilities of arriving at labeled nodes. The probabilities may be
calculated analytically by solving a system of linear equations. The matrix of
coefficients consists of edge weights and their sums, arranged in a certain
order. The author did a great job explaining why this is so and how exactly
the system should be constructed, so those who are interested are referred to
the original paper.
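For intuition, here is a minimal sketch of how such a system could be assembled and solved with Eigen for a graph given as an edge list (an illustration under simplified assumptions, not the actual implementation):

```cpp
#include <utility>
#include <vector>
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

struct Edge { int i, j; double w; };

// Returns an n x k matrix P, where P(v, s) is the probability that a walker
// released at vertex v first reaches a seed carrying label s (k labels total).
Eigen::MatrixXd
randomWalkerProbabilities (int n, int k, const std::vector<Edge>& edges,
                           const std::vector<std::pair<int, int> >& seeds)
{
  // Label of each vertex (-1 for unlabeled) and its index in the reduced system.
  std::vector<int> label (n, -1), index (n, -1);
  for (size_t s = 0; s < seeds.size (); ++s)
    label[seeds[s].first] = seeds[s].second;
  int m = 0;
  for (int v = 0; v < n; ++v)
    if (label[v] == -1)
      index[v] = m++;

  std::vector<Eigen::Triplet<double> > triplets;
  std::vector<double> degree (n, 0.0);
  Eigen::MatrixXd rhs = Eigen::MatrixXd::Zero (m, k);

  for (size_t e = 0; e < edges.size (); ++e)
  {
    const int i = edges[e].i, j = edges[e].j;
    const double w = edges[e].w;
    degree[i] += w;
    degree[j] += w;
    if (label[i] == -1 && label[j] == -1)
    {
      // Off-diagonal entries of the unlabeled block of the Laplacian.
      triplets.push_back (Eigen::Triplet<double> (index[i], index[j], -w));
      triplets.push_back (Eigen::Triplet<double> (index[j], index[i], -w));
    }
    else if (label[i] == -1)          // j is a seed
      rhs (index[i], label[j]) += w;
    else if (label[j] == -1)          // i is a seed
      rhs (index[j], label[i]) += w;
  }
  for (int v = 0; v < n; ++v)         // diagonal entries: vertex degrees
    if (label[v] == -1)
      triplets.push_back (Eigen::Triplet<double> (index[v], index[v], degree[v]));

  Eigen::SparseMatrix<double> L_u (m, m);
  L_u.setFromTriplets (triplets.begin (), triplets.end ());

  Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > solver;
  solver.compute (L_u);
  Eigen::MatrixXd x = solver.solve (rhs);

  Eigen::MatrixXd P = Eigen::MatrixXd::Zero (n, k);
  for (int v = 0; v < n; ++v)
    if (label[v] == -1)
      P.row (v) = x.row (index[v]);
    else
      P (v, label[v]) = 1.0;          // a seed reaches its own label immediately
  return P;
}
```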

Originally, the algorithm was applied to segment 2D images, however it could
be used for any other data as long as there is a way to model it using a
weighted graph. For us, of course, 3D point clouds are of a particular
interest. The following paper describes how random walker segmentation could
be applied for meshes or point clouds:

The authors construct the graph from a point cloud in the following way. Each
point in the cloud becomes a node in the graph, and a
normal vector is computed for it.

For a pair of nodes $v_i$ and $v_j$ two distances are defined:

Euclidean distance $d_E(v_i, v_j)$ between the points

Angular distance $d_A(v_i, v_j)$ between the normals

In the angular distance $\eta$ is a coefficient which depends on the
relative concavity between the points. For the convex case it is set to
a small value, effectively discounting the difference, whereas for the concave
case it takes a larger value, so that concave boundaries are penalized.

Using a K-nearest neighbors search the neighborhood of a node
is established. An edge is created between the node and each other node in the
neighborhood. The weight of the edge depends on the two distances and is
defined as:

$w(v_i, v_j) = \exp\left(-\frac{d_E(v_i, v_j)}{\sigma_E \, \bar{d}_E}\right) \cdot \exp\left(-\frac{d_A(v_i, v_j)}{\sigma_A \, \bar{d}_A}\right)$

$\sigma_E$ and $\sigma_A$ are used to balance the contributions of the
different distances. $\bar{d}_E$ and $\bar{d}_A$ in the
denominators stand for the mean values of the distances over the whole point
cloud.

In my implementation I decided to voxelize the point cloud like it is done in
SupervoxelSegmentation. I also reused the OctreePointCloudAdjacency
class contributed by Jeremie Papon to establish the neighborhood relations
between voxels. The weight is computed exactly as it is proposed by Lai et
al. Finally, I use the Sparse module of Eigen to solve the linear
system.

I have mentioned before, but did not stress, that this
segmentation approach is semi-automatic. This means that the user has to
provide a set of labeled points. My current implementation either accepts user
input or generates labeled points uniformly using the same approach as
SupervoxelSegmentation does for seed generation. There are smarter ways of
producing initial labeling and I plan to consider this issue later.

Here are some initial results that I got using the random walk segmentation
algorithm. The big red dots show the locations of the labeled points:

Voxelized point cloud (voxel size 0.006 m)

Random walk segmentation (with one manually labeled point per object)

Random walk segmentation (with multiple manually labeled points in each object)

Random walk segmentation (labeled points are uniformly distributed)

In the first case (top right) I manually labeled each object in the scene with
one point. Many of the object boundaries are correctly labeled, however there
are vast regions with totally random labels. This is especially evident in the
upper right corner, where there is just one seed. A circular region around it
is correctly labeled with single color, however the points outside of it are
all randomly colored. According to my understanding, this happens because the
edge weights are always less than 1, so a random walker can not get arbitrary
far from the starting point. Thus if there are no labeled points in the
vicinity of a point, then all the computed probabilities are nearly zero and
label assignment happens randomly (because of numerical imprecision).

In the second case (bottom left) I manually labeled each object in the scene
with multiple points guided by an intuition about how large the “zones of
influence” are around each labeled point. The resulting segmentation is rather
good.

In the third case (bottom right) the labeled points were selected uniformly
from a voxel grid with size 10 cm. As a result I got an over-segmentation that
resembles the ones produced by SupervoxelSegmentation.

I think the initial results are rather promising. In the future I plan to work
on the weighting function as it seems to be the key component for the random
walk segmentation. I would like to understand if and how it is possible to
vary the size of the “zone of influence” around labeled points.

I concluded the previous blog post with a claim that the improved refinement
procedure yields a better segmentation. That conclusion was based purely on a
visual inspection and rang a bell reminding that it is time to start thinking
about quantitative evaluation metrics and prepare a tool set to perform
evaluations. In this blog post I will describe my first steps in this
direction.

Let us start with a prerequisite for any evaluation, ground truth data. So far
I have been using scenes from Object Segmentation Database (OSD).
This dataset contains a ground truth annotation for each point cloud, however
I am not completely satisfied with its quality. Consider the following point
cloud (left) and the provided annotation (middle):

Color point cloud

Original ground truth annotation

Improved ground truth annotation

Some of the points in the close vicinity of the object boundaries are wrongly
annotated. For example, almost all points on the left edge of the standing box
are annotated as belonging to the table. I hypothesize that the annotation was
obtained automatically using some color image segmentation algorithm. (Thus it
is not really “wrong”, it is correct in terms of the color data.) As we are
working with 3D point clouds, the spatial relations derived from the depth
data should have larger weight than color and the annotation should be revised
to respect geometrical boundaries of the objects. Unfortunately, there is no
tool in PCL that would allow one to load a point cloud and edit the labels. It
cost me quite a few hours of work to put together a simple editor based on
PCLVisualizer that could do that. An example of refined annotation is
shown in the figure above (right).

The authors of SupervoxelSegmentation used two metrics to evaluate how
nicely the supervoxels adhere to the object boundaries: under-segmentation error
and boundary recall. These metrics were introduced in:

I decided to begin with the under-segmentation error. Formally, given a ground
truth segmentation into regions $G_1, \dots, G_M$ and a supervoxel
segmentation into supervoxels $S_1, \dots, S_K$, the under-segmentation
error for region $G_i$ is defined as:

$UE(G_i) = \frac{\left(\sum_{\{S_j \,\mid\, S_j \cap G_i \neq \emptyset\}} |S_j|\right) - |G_i|}{|G_i|}$

Simply put, it takes a union of all the supervoxels that overlap with a given
ground truth segment and measures how much larger its total size is than the
size of the segment. The authors of SupervoxelSegmentation use a slightly
modified version which summarizes the overall error in a single number:

$UE = \frac{1}{N} \left[ \sum_{i=1}^{M} \left( \sum_{\{S_j \,\mid\, S_j \cap G_i \neq \emptyset\}} |S_j| \right) - N \right],$

where $N$ is the total number of voxels in the scene. In practice, this
error sums up the sizes of all the supervoxels that cross the ground truth
boundaries.

In my opinion, this definition biases error with the average supervoxel size.
Consider the supervoxel segmentation (left):

Supervoxel segmentation (seed size 0.1 m)

Supervoxels that cross ground truth boundaries (red)

Erroneous voxels according to the proposed error definition (red)

Clearly, the segmentation is rather good in terms of the border adherence. The
only failure is in the right corner of the lying box, where the green
supervoxel “bleeds” on the table. The middle image shows which voxels will be
counted towards the error, i.e. those supervoxels that cross the ground truth
boundaries. In fact, every supervoxel in the close vicinity of an edge is
counted. The reason is simple: the edge between two surfaces in a typical
point cloud obtained with an RGB-D camera is ambiguous. The ground truth
segmentation is therefore random to some extent and the chance that
the supervoxel boundary will exactly coincide with the ground truth segment
boundary is close to zero. Consequently, the error always includes all the
supervoxels around the boundary, and the larger they are on average, the
larger the error is. The “real” error (the “bled” green supervoxel) is completely
hidden by it.

I decided to use a modified definition. Whenever a supervoxel crosses ground
truth boundary(ies), it is split in two (or more) parts. I assume that the
largest part is correctly segmented, and only count the smaller parts as
erroneous. The figure above (right) demonstrates which voxels would be counted
towards the error in this case. Still, there is some “noise” around
boundaries, but it does not hinder the “real” error. I also do not like that
the error in the original definition is normalized by the total point cloud
size. I think that the normalization should be related to the number of
objects or the amount (length) of object boundaries. I will consider these
options in the future, but for now I just count the number of erroneous voxels
and do not divide it by anything.
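Counting the erroneous voxels under this modified definition is straightforward once every voxel carries both a ground truth label and a supervoxel label; a minimal sketch (hypothetical helper, not the actual evaluation code):

```cpp
#include <map>
#include <vector>

// For every supervoxel that crosses a ground truth boundary, count all of its
// parts except the largest one as erroneous, and sum the counts.
size_t
modifiedUndersegmentationError (const std::vector<int>& ground_truth,
                                const std::vector<int>& supervoxels)
{
  // For every supervoxel, a histogram of the ground truth labels of its voxels.
  std::map<int, std::map<int, size_t> > histogram;
  for (size_t i = 0; i < supervoxels.size (); ++i)
    ++histogram[supervoxels[i]][ground_truth[i]];

  size_t error = 0;
  for (std::map<int, std::map<int, size_t> >::const_iterator sv = histogram.begin ();
       sv != histogram.end (); ++sv)
  {
    size_t total = 0, largest = 0;
    for (std::map<int, size_t>::const_iterator part = sv->second.begin ();
         part != sv->second.end (); ++part)
    {
      total += part->second;
      if (part->second > largest)
        largest = part->second;
    }
    error += total - largest; // only the smaller parts count as erroneous
  }
  return error;
}
```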

I would like to conclude with the results of evaluating simple and improved
supervoxel refinement procedures using the described error metric. The figure
below shows the voxels considered as erroneous (in red) in supervoxel
segmentations obtained with 2 rounds of simple refinement (left) and 2 rounds
of improved refinement (right):

Under-segmentation error (seed size 0.1 m, with simple refinement)

Under-segmentation error (seed size 0.1 m, with improved refinement)

The under-segmentation error without refinement is 15033. Simple refinement
reduces it to 11099 after the first iteration and 10755 after the second.
Improved refinement reduces it to 9032 and 8373 respectively. I think the
numbers do agree with the intuition, so I will continue using this metric in
the future.