
Superpixel Algorithms Compared using NVD3.js

The comparison of superpixel algorithms presented in my bachelor thesis “Superpixel Segmentation using Depth Information” as interactive plots using nvd3.

After completing my bachelor thesis on superpixel segmentation at the Computer Vision Group at RWTH Aachen University, I wanted to present the results as an interactive visualization. In the end, I used nvd3 to create the visualizations presented below.

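This paper by Shi et al. proposes Normalized Cuts (NC), a graph based image segmentation technique which can also be applied to a given superpixel segmentation. An image is interpreted as a weighted undirected graph $G = (V,E)$. If a superpixel segmentation is given, each superpixel forms a node; otherwise, each pixel represents a node. The goal is to bipartition the graph into two disjoint sets $A \subset V$ and $B \subset V$ with $A \cup B = V$ by minimizing the normalized cut $NCut(A, B)$ given by

$NCut(A, B) = \frac{cut(A, B)}{assoc(A, V)} + \frac{cut(A, B)}{assoc(B, V)}$ with $cut(A, B) = \sum_{u \in A, v \in B} w_{u,v}$ and $assoc(A, V) = \sum_{u \in A, v \in V} w_{u,v}$.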

As shown in the paper, this criterion can be approximately minimized by discretizing the eigenvector corresponding to the second smallest eigenvalue of the generalized eigenvalue problem

$(D - W)y = \lambda D y$.

Assuming the nodes to be consecutively numbered, the matrix $W$ holds the edge weights, $W_{u, v} = w_{u,v}$, and is symmetric because the graph is undirected. $D$ is a diagonal matrix with $D_{u,u} = \sum_{v \in V} w_{u,v}$. The edge weights $w_{u,v}$ can, for example, be defined as $w_{u,v} = \exp(-\| I(u) - I(v) \|_2^2/\sigma^2)$ if superpixels $u$ and $v$ share a common border, and $w_{u,v} = 0$ otherwise. Here, $I(u)$ denotes the color vector of the superpixel corresponding to node $u$ (for example, the mean color of the superpixel). The eigenvector corresponding to the second smallest eigenvalue is real-valued and needs to be discretized. To this end, Shi et al. choose the threshold value that minimizes the normalized cut. Further details can be found in the paper. Code is available online: the original MATLAB code and an implementation by Gori to generate superpixels.
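The bipartition step can be illustrated with the following Python sketch on a small, explicitly given weight matrix $W$. It is not the original MATLAB code. Assuming all node degrees are positive, it substitutes $y = D^{-1/2} z$, which turns $(D - W)y = \lambda D y$ into an ordinary eigenvalue problem for $D^{-1/2} W D^{-1/2}$ (eigenvalue $1 - \lambda$), and finds the relevant eigenvector by power iteration with deflation; this solver is our choice, made only to stay dependency-free. The threshold is then chosen by trying every splitting point and keeping the one with minimal $NCut(A, B) = cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V)$. All function names are hypothetical.

```python
import math


def ncut_value(W, A):
    """NCut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    nodes = range(len(W))
    B = [v for v in nodes if v not in A]
    cut = sum(W[u][v] for u in A for v in B)
    assoc_a = sum(W[u][v] for u in A for v in nodes)
    assoc_b = sum(W[u][v] for u in B for v in nodes)
    return cut / assoc_a + cut / assoc_b


def second_eigenvector(W, iterations=500):
    """Eigenvector y of (D - W) y = lambda D y for the second smallest lambda."""
    n = len(W)
    s = [math.sqrt(sum(row)) for row in W]  # diagonal of D^{1/2}
    norm_s = math.sqrt(sum(x * x for x in s))
    top = [x / norm_s for x in s]  # eigenvector of D^{-1/2} W D^{-1/2} for eigenvalue 1
    z = [float(i + 1) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        dot = sum(a * b for a, b in zip(z, top))
        z = [a - dot * b for a, b in zip(z, top)]  # deflate the trivial solution
        # Multiply by (D^{-1/2} W D^{-1/2} + I); the shift makes all eigenvalues >= 0.
        z = [sum(W[u][v] * z[v] / (s[u] * s[v]) for v in range(n)) + z[u]
             for u in range(n)]
        length = math.sqrt(sum(a * a for a in z))
        z = [a / length for a in z]
    return [a / b for a, b in zip(z, s)]  # undo the substitution y = D^{-1/2} z


def ncut_bipartition(W):
    """Discretize the eigenvector at the splitting point with minimal NCut."""
    y = second_eigenvector(W)
    best_value, best_set = math.inf, None
    for t in sorted(set(y))[:-1]:  # skip the largest value so that B is non-empty
        A = {v for v in range(len(W)) if y[v] <= t}
        value = ncut_value(W, A)
        if value < best_value:
            best_value, best_set = value, A
    return best_set


# Two triangles {0, 1, 2} and {3, 4, 5} joined by a weak edge (2, 3).
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(ncut_bipartition(W))  # one of the two triangles
```

Which of the two triangles is returned depends only on the arbitrary sign of the eigenvector.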

After introducing SEEDS [1] and SLIC [2], this article focuses on another approach to superpixel segmentation, proposed by Felzenszwalb & Huttenlocher. The procedure is summarized in algorithm 1 and based on the following definitions. Given an image $I$, $G = (V, E)$ is the graph with nodes corresponding to pixels and edges between adjacent pixels. Each edge $(n,m) \in E$ is assigned a weight $w_{n,m} = \|I(n) - I(m)\|_2$. Then, for subsets $A,B \subseteq V$ we define $MST(A)$ to be the minimum spanning tree of $A$ and

$Int(A) = \max_{(n,m) \in MST(A)} w_{n,m}$, $MInt(A,B) = \min\{Int(A) + \frac{\tau}{|A|}, Int(B) + \frac{\tau}{|B|}\}$

where $\tau$ is a threshold parameter and $MInt(A,B)$ is called the minimum internal difference between components $A$ and $B$. Starting from an initial superpixel segmentation where each pixel forms its own superpixel, the algorithm processes all edges in order of increasing edge weight. Whenever an edge $(n,m)$ connects two different superpixels $A \ni n$ and $B \ni m$, these are merged if $w_{n,m} \leq MInt(A,B)$, that is, if the edge weight is small compared to the minimum internal difference.
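A compact Python sketch of this merging procedure follows, assuming a grayscale image given as a list of rows and a 4-connected grid, so that $w_{n,m} = |I(n) - I(m)|$. It uses union-find and keeps, for every component $A$, its size and its internal difference $Int(A)$, the largest edge weight in $MST(A)$, and merges if $w_{n,m} \leq \min\{Int(A) + \tau/|A|, Int(B) + \tau/|B|\}$. Since edges are processed in increasing order, the weight of an edge that triggers a merge is the new largest MST edge of the merged component. The function name is hypothetical, and post-processing found in the original implementation (such as enforcing a minimum component size) is omitted.

```python
def felzenszwalb(image, tau):
    """Return a label image; pixels with equal labels form one superpixel."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # Int(A): largest MST edge weight of component A

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # 4-connected edges (weight, node, node) with nodes numbered row by row.
    edges = []
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    weight = abs(image[y][x] - image[yy][xx])
                    edges.append((weight, y * w + x, yy * w + xx))
    edges.sort()

    for weight, u, v in edges:
        a, b = find(u), find(v)
        if a == b:
            continue
        m_int = min(internal[a] + tau / size[a], internal[b] + tau / size[b])
        if weight <= m_int:
            if size[a] < size[b]:
                a, b = b, a  # attach the smaller tree to the larger one
            parent[b] = a
            size[a] += size[b]
            internal[a] = weight  # edges arrive in increasing order
    return [[find(y * w + x) for x in range(w)] for y in range(h)]


image = [[0, 0, 0, 10, 10, 10],
         [0, 0, 0, 10, 10, 10]]
print(felzenszwalb(image, tau=1.0))
```

With $\tau = 1$ the two halves stay separate because $10 > 1/6$; a much larger $\tau$ would merge them into a single superpixel.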

Proposed in 2008, Quick Shift is a segmentation algorithm based on mode seeking which can also be used to generate superpixels. In general, a mode seeking algorithm starts from a Parzen density estimate $p(x_n)$ for all pixels $x_n \in \{1,\ldots,W\} \times \{1,\ldots,H\}$ (here, $W$ and $H$ are the width and height of the image, respectively), and each pixel is assigned to a mode by following the density $p(x_n)$ upwards, that is, in the direction of the gradient. Therefore, Quick Shift may be categorized as a gradient ascent method. In particular, Quick Shift pre-computes $p(x_n)$ for all pixels using a Gaussian kernel. In practice, given an image $I$, the distance $d(x_n, x_m)$ used within the Gaussian kernel consists of a color term and a spatial term:

$d(x_n, x_m) = \alpha \| I(x_n) - I(x_m)\|_2 + \|x_n - x_m\|_2$

where $\alpha$ weights the influence of the color term. Subsequently, each pixel $x_n$ is assigned to the nearest pixel $x_m$ (with respect to $d$) in $N_R(x_n) = \{x_m : \|x_n - x_m\|_\infty \leq R/2\}$ such that $p(x_m) > p(x_n)$, or is left unassigned if no such pixel exists. These assignments form a forest: the unassigned pixels are the modes, and each tree, consisting of all pixels whose chain of assignments ends in the same mode, represents one superpixel. The algorithm is summarized in algorithm 1.
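The following Python sketch implements the two steps on a small grayscale image. It assumes that the Parzen estimate sums the kernel $\exp(-d(x_n, x_m)^2 / (2\sigma^2))$ over the same window $N_R(x_n)$ that is used for linking, and that $R$ is given in whole pixels; both are simplifications of ours. Each pixel is linked to the nearest pixel, with respect to $d$, of strictly higher density, and following these links yields the mode, whose coordinates serve as superpixel label. The function name and parameters are illustrative.

```python
import math


def quick_shift(image, alpha, sigma, r):
    """Return, for every pixel, the (row, column) of the mode it leads to."""
    h, w = len(image), len(image[0])
    half = r // 2  # N_R(x_n): Chebyshev distance at most R / 2

    def window(y, x):
        for yy in range(max(0, y - half), min(h, y + half + 1)):
            for xx in range(max(0, x - half), min(w, x + half + 1)):
                yield yy, xx

    def dist(a, b):
        color = abs(image[a[0]][a[1]] - image[b[0]][b[1]])
        return alpha * color + math.hypot(a[0] - b[0], a[1] - b[1])

    # Parzen density estimate with a Gaussian kernel (window is an assumption).
    density = {}
    for y in range(h):
        for x in range(w):
            density[(y, x)] = sum(math.exp(-dist((y, x), m) ** 2 / (2 * sigma ** 2))
                                  for m in window(y, x))

    # Link each pixel to the nearest pixel of strictly higher density, if any.
    parent = {}
    for y in range(h):
        for x in range(w):
            n = (y, x)
            higher = [m for m in window(y, x) if density[m] > density[n]]
            parent[n] = min(higher, key=lambda m: dist(n, m)) if higher else n

    def mode(n):
        while parent[n] != n:  # links always increase density, so no cycles
            n = parent[n]
        return n

    return [[mode((y, x)) for x in range(w)] for y in range(h)]


image = [[0, 0, 0, 100, 100, 100] for _ in range(3)]
print(quick_shift(image, alpha=1.0, sigma=2.0, r=4))
```

In this toy image, the large color difference between the two halves makes the kernel vanish across the border, so each half ends in its own mode at its center pixel.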

TurboPixels is one of the first algorithms designed specifically for superpixel segmentation: in contrast to Quick Shift [1] and the approach by Felzenszwalb and Huttenlocher [2], it was originally intended to generate superpixels. Inspired by active contours, superpixel centers are first placed on a regular grid, and the superpixels are then grown based on an evolving contour. The contour is implemented as the zero level set of the function

$\psi : \mathbb{R}^2 \times [0, \tau) \rightarrow \mathbb{R}$.

The evolution is formally defined by

$\psi_t = -v \|\nabla \psi\|_2$

where $\nabla\psi$ denotes the gradient of $\psi$ and $\psi_t$ is its temporal derivative. Here, the speed $v$ determines how fast the contour moves in the direction of its normal. In practice, $\psi$ is chosen as the signed Euclidean distance to the contour, and the evolution is carried out using a first-order discretization. The level set function in iteration $(T + 1)$ is given by

$\psi^{(T+1)} = \psi^{(T)} - v_I v_B \|\nabla \psi^{(T)}\|\Delta t$.

The speed $v$ is split up into two components: $v_I$ which depends on the image content and $v_B$ which ensures that superpixels do not overlap. Iteratively, the superpixels are grown by computing $v_I$ and $v_B$ and then applying the equation above, see the paper for details. The procedure is summarized in algorithm 1.
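The first-order update can be sketched in Python as follows. The speeds $v_I$ and $v_B$ are assumed to be precomputed per pixel (how to compute them from the image and the current superpixels is described in the paper), and $\|\nabla \psi\|_2$ is approximated with central differences, one-sided at the image border. This discretization is our simplification; level set implementations commonly use upwind differences and narrow-band updates for stability and efficiency. Function names are hypothetical.

```python
import math


def gradient_norm(psi, y, x):
    """Approximate ||grad psi||_2 at (y, x); assumes at least 2 rows and columns."""
    h, w = len(psi), len(psi[0])
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    gy = (psi[y1][x] - psi[y0][x]) / (y1 - y0)
    gx = (psi[y][x1] - psi[y][x0]) / (x1 - x0)
    return math.hypot(gx, gy)


def evolve(psi, v_i, v_b, dt):
    """One step of psi^(T+1) = psi^(T) - v_I v_B ||grad psi^(T)|| dt."""
    h, w = len(psi), len(psi[0])
    return [[psi[y][x] - v_i[y][x] * v_b[y][x] * gradient_norm(psi, y, x) * dt
             for x in range(w)] for y in range(h)]


# Signed distance to a circle of radius 2 around (5, 5): negative inside.
size = 11
psi = [[math.hypot(x - 5, y - 5) - 2 for x in range(size)] for y in range(size)]
ones = [[1.0] * size for _ in range(size)]
psi = evolve(psi, ones, ones, dt=0.5)
print(psi[5][7])  # -0.5: this former contour point now lies inside
```

With positive speed, $\psi$ decreases, so the negative region (the superpixel) grows; $v_B$ would be set to zero where two superpixels are about to touch.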

Figure 1: Superpixel segmentations with roughly $600$ superpixels generated by the original implementation of SLIC, which allows adjusting the compactness of the superpixels. The images are taken from the validation set of the Berkeley Segmentation Dataset [8]. From top to bottom: compactness set to $1$; compactness set to $10$; compactness set to $40$.

Veksler et al. propose a graph based superpixel algorithm; to be exact, the paper proposes two algorithms: Compact Superpixels and Constant Intensity Superpixels. In the following, we focus on Constant Intensity Superpixels, as this algorithm shows better performance in practice. Note that we assume the image $I$ to be a grayscale image; however, the description below can easily be extended to color images. Initially, the image is covered by overlapping squares such that each pixel is covered by multiple squares. Each square represents a superpixel, and each pixel can be assigned to one of the squares covering it. Then, the following energy is minimized using $\alpha$-expansion [1]:

Note that this is a simplified formulation: originally, instead of using the mean color $I(S_i)$ of the superpixel $S_i$, the color of the center pixel of the initial square is used (however, this requires an additional term forcing this pixel to belong to $S_i$). Further, $\psi_{n,m}$ is a Potts model:

$\psi_{n,m}(i, j) = 1$, if $i \neq j$, $0$ otherwise.
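As an illustration of the pairwise part of such an energy, the following Python sketch sums $w_{n,m} \psi_{n,m}(i, j)$ over all pairs of 4-connected neighboring pixels with labels $i$ and $j$, counting each pair once. The 4-connected neighborhood is an assumption, the weight function is passed in by the caller, and the data term as well as the $\alpha$-expansion optimization are not shown; all names are hypothetical.

```python
def potts(i, j):
    """Potts model: psi(i, j) = 1 if i != j, 0 otherwise."""
    return 1 if i != j else 0


def smoothness_energy(labels, weight):
    """Sum of w_{n,m} * psi(label_n, label_m) over 4-connected neighbor pairs."""
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0)):  # right and lower neighbor: each pair once
                yy, xx = y + dy, x + dx
                if yy < h and xx < w:
                    energy += weight((y, x), (yy, xx)) * potts(labels[y][x], labels[yy][xx])
    return energy


labels = [[0, 0, 1],
          [0, 1, 1]]
print(smoothness_energy(labels, lambda n, m: 1.0))  # 3.0: three label changes
```

Only pairs with different labels contribute, so the term penalizes superpixel boundaries, weighted by $w_{n,m}$.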

The weights $w_{n,m}$ of neighboring pixels $x_n$ and $x_m$ are defined as follows:

Another graph-based method for superpixel segmentation was proposed by Liu et al. Interpreting the image $I$ as a 4-connected graph $G = (V,E)$, they propose an objective function based on the entropy rate of a random walk on the graph $\hat{G} = (V,M)$ with $M \subseteq E$, which is optimized greedily as summarized in algorithm 1:

$E(\hat{G}) = H(\hat{G}) + \lambda B(\hat{G})$

where $H(\hat{G})$ refers to the entropy rate of the random walk, while $B(\hat{G})$ defines a balancing term. The objective is maximized subject to the constraint that the number of connected components in $\hat{G}$ is greater than or equal to the desired number of superpixels $K$. Given weights $w_{n,m}$ between pixels $x_n$ and $x_m$, defined using a Gaussian kernel based on the $L_1$ color distance, $H(\hat{G})$ is defined as:

where $S_i$ denotes the $i^\text{th}$ superpixel. Starting from an initial superpixel segmentation where each pixel forms its own superpixel, the algorithm greedily adds edges, each time choosing the edge with the largest gain in the objective, and thereby merges superpixels until the desired number of superpixels is reached, see algorithm 1.
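The greedy loop can be sketched in Python as follows. For the terms not spelled out here, the sketch assumes the formulation of Liu et al. as we understand it: the stationary distribution $\mu_i = w_i / w_T$ uses the total weight $w_i$ of all edges in $E$ incident to node $i$, the weight of unselected edges is moved to self loops, and $B(\hat{G}) = -\sum_i \frac{|S_i|}{|V|}\log\frac{|S_i|}{|V|} - N_{\hat{G}}$ with $N_{\hat{G}}$ connected components. For simplicity, it only considers edges joining two different superpixels and re-evaluates the full objective for every candidate, which is far slower than an efficient implementation. All names and the toy weights are hypothetical.

```python
import math
from collections import Counter


def components(selected, n):
    """Label each node by the root of its connected component in (V, M)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for u, v in selected:
        parent[find(u)] = find(v)
    return [find(a) for a in range(n)]


def entropy_rate(edges, selected, n):
    """H: entropy rate of a random walk whose unselected weight sits in self loops."""
    w = [0.0] * n
    for (u, v), weight in edges.items():
        w[u] += weight
        w[v] += weight
    total = sum(w)
    h = 0.0
    for i in range(n):
        probs = [edges[e] / w[i] for e in selected if i in e]
        probs.append(1.0 - sum(probs))  # self loop
        h -= w[i] / total * sum(p * math.log(p) for p in probs if p > 0)
    return h


def objective(edges, selected, n, lam):
    """H + lambda * B for the edge set `selected`."""
    sizes = Counter(components(selected, n)).values()
    balance = -sum(s / n * math.log(s / n) for s in sizes) - len(sizes)
    return entropy_rate(edges, selected, n) + lam * balance


def entropy_rate_superpixels(edges, n, k, lam=0.5):
    """Greedily add the edge with the largest gain until k components remain."""
    selected = []
    while len(set(components(selected, n))) > k:
        roots = components(selected, n)
        base = objective(edges, selected, n, lam)
        best_gain, best_edge = -math.inf, None
        for u, v in edges:
            if roots[u] == roots[v]:
                continue  # simplification: only edges that merge two superpixels
            gain = objective(edges, selected + [(u, v)], n, lam) - base
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            break  # G itself has more than k connected parts
        selected.append(best_edge)
    return components(selected, n)


# 2 x 3 pixel grid (nodes numbered row by row) with hypothetical edge weights.
edges = {(0, 1): 1.0, (1, 2): 0.1, (3, 4): 1.0, (4, 5): 0.1,
         (0, 3): 1.0, (1, 4): 1.0, (2, 5): 1.0}
print(entropy_rate_superpixels(edges, 6, k=2))
```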

Zhang et al. propose a graph-based superpixel algorithm. First, the image is covered by overlapping vertical and horizontal strips such that each pixel is covered by exactly two vertical and two horizontal strips. This way, considering only the horizontal strips, each pixel is labeled either 0 or 1, indicating which of its two horizontal strips it belongs to. With $N$ being the number of nodes (that is, pixels), an energy similar to the one used for Constant Intensity Superpixels [1] is used:

except that the data term $\theta_n$ is set to zero. The smoothing term $\psi_{n,m}$ is based on the following consideration. Numbering the horizontal strips such that $H_i \subseteq V$ is covered halfway by $H_{i + 1} \subseteq V$, where $V$ is the set of all nodes (that is, pixels), and considering neighboring pixels $x_n$ and $x_m$ such that $x_n$ lies above or at the same horizontal line as $x_m$, three cases are possible:

Including two possible labels per pixel, there are twelve cases to consider. These cases get assigned different weights $w_{n,m}$ which are computed using a Gaussian kernel and the $L_1$ color distance, see their paper. The energy is optimized using max-flow. The final superpixel segmentation can be derived from the vertical and horizontal labels.
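Only the last step, deriving superpixels from the labels, is sketched here in Python, under an assumed strip layout: strips of even size $s$, where strip $i$ covers the coordinates $[(i-1)s/2, (i+1)s/2)$, so that every coordinate lies in exactly two strips, and the label $0$ or $1$ selects the first or the second of them. A superpixel is then identified by the pair of selected horizontal and vertical strips. The layout and all names are hypothetical; the actual construction is described in the paper and in Zhang's implementation.

```python
def strip_index(coord, label, size):
    """Strip i covers [(i - 1) * size / 2, (i + 1) * size / 2), so coordinate c lies
    in strips 2c // size and 2c // size + 1; the 0/1 label selects one of them."""
    return (2 * coord) // size + label


def superpixels_from_labels(h_labels, v_labels, size):
    """Superpixel of a pixel = (selected horizontal strip, selected vertical strip)."""
    rows, cols = len(h_labels), len(h_labels[0])
    return [[(strip_index(y, h_labels[y][x], size), strip_index(x, v_labels[y][x], size))
             for x in range(cols)] for y in range(rows)]


zeros = [[0] * 4 for _ in range(4)]
print(superpixels_from_labels(zeros, zeros, size=4)[3][1])  # (1, 0)
```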

A C++ implementation is available at Zhang's webpage. Figure 1 shows superpixel segmentations obtained using the algorithm described above.

Tang et al. propose a superpixel algorithm which generates a regular grid of superpixels, that is, the superpixels can be arranged in an array where each superpixel has a consistent, ordered position. Given an edge map $p(x_n) \in [0, 1]$ defining the probability of an edge being present at pixel $x_n$, the algorithm proceeds in three steps. First, a set of pixels is chosen as initial grid positions. This is done on a regular grid with $R_v$ rows and $R_h$ columns (that is, vertical step size $H/R_v$ and horizontal step size $W/R_h$) given as

$R_v \approx \sqrt{\frac{K H}{W}}$, $R_h = \frac{K}{R_v}$

where $K$ is the desired number of superpixels. Let $\mu_1,\ldots,\mu_{K'}$ denote these positions (where $K'$ is the number of grid positions required to obtain $K$ superpixels). Second, the positions are moved towards maximum edge positions by choosing

$\mu_i' = \arg\max_{x_n \in N_R(\mu_i)} p(x_n)$

where $N_R(\mu_i)$ defines a local search region around the position $\mu_i$. Finally, these grid positions define an undirected graph based on their relative positions. Neighboring positions are connected by the shortest path calculated on the undirected, weighted graph with weights

$w_{n,m} = \frac{1}{p(x_n) + p(x_m)}$

for neighboring pixels $x_n$ and $x_m$. The shortest path is computed using Dijkstra’s algorithm. The superpixels are then given by the enclosed regions.
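The grid size and the shortest path step can be sketched in Python as follows, interpreting $R_v$ and $R_h$ as the number of grid rows and columns. Further assumptions: the edge map $p$ is a list of rows with values in $[0, 1]$, paths run on the 4-connected pixel grid, and a small constant `eps` is added to the denominator to avoid division by zero where $p(x_n) + p(x_m) = 0$. Function names are hypothetical, and building the graph of grid positions and extracting the enclosed regions is not shown.

```python
import heapq
import math


def grid_shape(k, width, height):
    """Number of grid rows R_v and columns R_h for roughly k superpixels."""
    rows = max(1, round(math.sqrt(k * height / width)))
    return rows, max(1, round(k / rows))


def shortest_path(p, start, goal, eps=1e-6):
    """Dijkstra on the pixel grid with w_{n,m} = 1 / (p(x_n) + p(x_m))."""
    h, w = len(p), len(p[0])
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if (y, x) == goal:
            break
        if d > dist[(y, x)]:
            continue  # stale queue entry
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + 1.0 / (p[y][x] + p[ny][nx] + eps)
                if nd < dist.get((ny, nx), math.inf):
                    dist[(ny, nx)], prev[(ny, nx)] = nd, (y, x)
                    heapq.heappush(heap, (nd, (ny, nx)))
    path = [goal]
    while path[-1] != start:  # walk back along the predecessors
        path.append(prev[path[-1]])
    return path[::-1]


print(grid_shape(600, 480, 320))  # (20, 30), i.e. step sizes of 16 pixels
edge_map = [[1.0 if x == 2 else 0.1 for x in range(5)] for _ in range(5)]
print(shortest_path(edge_map, (0, 2), (4, 2)))  # follows the edge in column 2
```

Because the weights are small where the edge probability is high, the shortest paths, and therefore the superpixel borders, snap to strong edges.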

This paper introduces Contour Relaxed Superpixels, a statistical approach to superpixel segmentation. In particular, the value $I_c(x_n)$ of pixel $x_n$ in channel $c$ is assumed to be the outcome of a stochastic process described by the parameters $\theta_{s(x_n), c}$ of the corresponding superpixel $S_{s(x_n)}$, where $s(x_n)$ denotes the superpixel $x_n$ belongs to. Using $\theta$ to denote the set of all such parameters, the superpixel segmentation $S$ maximizing $p(S, \theta | I)$ is searched for. Using Bayes' theorem and omitting the normalization factor, the energy to be maximized is given by

$p(S, \theta | I) \propto p(I | S, \theta)\, p(S, \theta)$

The parameters $\theta$ are considered deterministic parameters such that $p(S, \theta)$ can be simplified to $p(S, \theta) = \kappa p(S)$. An EM-style algorithm (e.g., see [1, p. 423ff]) is applied: the parameters $\theta$ are optimized using maximum likelihood while the superpixel segmentation is held constant, followed by optimizing $S$ while the parameters $\theta$ are held constant. Considering each pixel to be connected to its $8$ neighbors, $p(S)$ is modeled using a Gibbs Random Field and can be factorized into

$p(S) = \kappa' \exp(-N_e C_e - N_v C_v)$

where only the second factor depends on the label of the pixel $x_n$. Here, $N_e$ is the number of direct neighbors of $x_n$ having a different label and $N_v$ is the number of diagonal neighbors with a different label; $C_e$ and $C_v$ are the associated costs. Furthermore, the probability $p(I | S, \theta)$ can be factorized as

This approach can be categorized as a gradient ascent approach to superpixel segmentation and is summarized in algorithm 1. The first product in equation (1) runs over all superpixels $S_i$ to which pixel $x_n$ may be assigned.

function crs(
    $I$, // Color image.
    $K$, // Number of superpixels.
    $T$  // Number of iterations.
)
    // The step size R can be derived from the image size W × H and K:
    $R = \sqrt{WH/K}$
    initialize $S$ as regular grid with step size $R$
    initialize $\theta$ using sufficient statistics (e.g. Gaussian)
    for $t = 1$ to $T$
        // Originally, the image is traversed multiple times using different
        // directions to avoid a directional bias:
        for $n = 1$ to $N$ // N = WH is the number of pixels.
            if $x_n$ is a boundary pixel
                // This can be evaluated by taking θ as constant; Conrad et al.
                // suggest minimizing the negative logarithm of (1) instead:
                assign $x_n$ to the label maximizing equation (1)
        // EM-style: re-estimate the parameters for the updated segmentation.
        update $\theta$ using maximum likelihood given $S$
    return $S$
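The following Python sketch is a runnable, simplified variant of the pseudocode for grayscale images. Each superpixel is modeled by a single Gaussian whose mean and variance are re-estimated by maximum likelihood before every sweep; boundary pixels are assigned to the candidate label (own or 4-neighbor label) with the lowest negative log-likelihood plus $N_e C_e + N_v C_v$; and the image is traversed in a single direction only. The cost values, the variance floor `var_min` and all names are assumptions chosen for illustration.

```python
import math


def crs(image, step, iterations, c_e=0.5, c_v=0.25, var_min=1.0):
    h, w = len(image), len(image[0])
    cols = -(-w // step)  # ceil(w / step) grid cells per row
    labels = [[(y // step) * cols + x // step for x in range(w)] for y in range(h)]
    direct = ((-1, 0), (1, 0), (0, -1), (0, 1))
    diagonal = ((-1, -1), (-1, 1), (1, -1), (1, 1))

    def neighbor_labels(y, x, offsets):
        return [labels[y + dy][x + dx] for dy, dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < w]

    def estimate_theta():
        # Maximum likelihood mean and variance per superpixel, S held constant.
        sums = {}
        for y in range(h):
            for x in range(w):
                s = sums.setdefault(labels[y][x], [0, 0.0, 0.0])
                s[0] += 1
                s[1] += image[y][x]
                s[2] += image[y][x] ** 2
        theta = {}
        for k, (count, s1, s2) in sums.items():
            mean = s1 / count
            theta[k] = (mean, max(s2 / count - mean ** 2, var_min))
        return theta

    def cost(k, y, x, theta):
        # Negative log of the Gaussian likelihood times exp(-N_e C_e - N_v C_v).
        mean, var = theta[k]
        data = 0.5 * math.log(2 * math.pi * var) + (image[y][x] - mean) ** 2 / (2 * var)
        n_e = sum(label != k for label in neighbor_labels(y, x, direct))
        n_v = sum(label != k for label in neighbor_labels(y, x, diagonal))
        return data + n_e * c_e + n_v * c_v

    for _ in range(iterations):
        theta = estimate_theta()  # theta is held constant during the sweep
        for y in range(h):
            for x in range(w):
                near = neighbor_labels(y, x, direct)
                if all(label == labels[y][x] for label in near):
                    continue  # only boundary pixels may change their label
                candidates = sorted(set(near) | {labels[y][x]})
                labels[y][x] = min(candidates, key=lambda k: cost(k, y, x, theta))
    return labels


image = [[0.0] * 3 + [10.0] * 3 for _ in range(4)]
print(crs(image, step=4, iterations=2))  # the label border moves to the intensity edge
```

In this example, the initial grid border lies one column to the right of the intensity edge; the first sweep moves the misassigned column to the neighboring superpixel, and the second sweep leaves the result unchanged.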

Evaluation and Comparison - Interactive Visualization

Comparison is done on the Berkeley Segmentation Dataset [1] using Boundary Recall and Undersegmentation Error. Given a superpixel segmentation $S = \{S_j\}$ with $S_j \subseteq \{1,\ldots,H\} \times \{1,\ldots,W\}$, and a ground truth segmentation $G = \{G_i\}$, Boundary Recall is part of the Precision-Recall Framework [2] and defined as

$Rec(G, S) = \frac{|TP(G, S)|}{|TP(G, S)| + |FN(G, S)|}$

where $TP(G, S)$ contains all boundary pixels in $G$ for which there exists a boundary pixel in $S$ in range $r$ (that is, true positives), and $FN(G, S)$ contains all boundary pixels in $G$ for which there exists no boundary pixel in $S$ in range $r$ (that is, false negatives). Here, $r$ is a tolerance parameter. Therefore, Boundary Recall is the fraction of ground truth boundary pixels captured by the superpixel segmentation, and high Boundary Recall is desirable.
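A Python sketch of Boundary Recall for label images follows. It assumes that a pixel is a boundary pixel if its right or lower neighbor carries a different label, and that "in range $r$" means within a $(2r+1) \times (2r+1)$ window; both are common conventions but not necessarily the exact ones used for the results reported here. Names are hypothetical.

```python
def boundary_map(labels):
    """True where the right or lower neighbor has a different label."""
    h, w = len(labels), len(labels[0])
    return [[(x + 1 < w and labels[y][x + 1] != labels[y][x]) or
             (y + 1 < h and labels[y + 1][x] != labels[y][x])
             for x in range(w)] for y in range(h)]


def boundary_recall(gt_labels, sp_labels, r):
    """|TP| / (|TP| + |FN|), counted over the ground truth boundary pixels."""
    g, s = boundary_map(gt_labels), boundary_map(sp_labels)
    h, w = len(g), len(g[0])
    tp = fn = 0
    for y in range(h):
        for x in range(w):
            if not g[y][x]:
                continue
            hit = any(s[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1)))
            if hit:
                tp += 1
            else:
                fn += 1
    return tp / (tp + fn) if tp + fn else 1.0  # convention: no GT boundary, recall 1


gt = [[0, 0, 1, 1] for _ in range(4)]
sp = [[0, 0, 0, 1] for _ in range(4)]  # boundary shifted by one pixel
print(boundary_recall(gt, sp, 0), boundary_recall(gt, sp, 1))  # 0.0 1.0
```

The example shows the role of the tolerance $r$: a boundary that is off by one pixel is missed entirely for $r = 0$ but fully recovered for $r = 1$.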

where $N = HW$ is the number of pixels. Intuitively, Undersegmentation Error quantifies the leakage (or "bleeding") of superpixels with respect to the ground truth segmentation. Low Undersegmentation Error is preferable, as each superpixel is expected to cover at most one ground truth segment.

Note that oriSLIC is the original implementation of SLIC, oriSEEDS the original implementation of SEEDS and reSEEDS a revised implementation of SEEDS; reSEEDS* is a variant of SEEDS using an additional compactness term, see here for details. Boundary Recall, Undersegmentation Error and runtime are plotted against the number of superpixels. Runtime is given in seconds; due to the prohibitively long runtime of NC, the corresponding runtimes are excluded. All results have been obtained on the test set of the Berkeley Segmentation Dataset after individually optimizing parameters on the corresponding validation set using discrete grid search.

For links to the corresponding implementations, qualitative results and further quantitative results see SEEDS/Superpixels.


About the Author

In September, I was honored to receive the MINT-Award IT 2018, sponsored by ZF and audimax, for my master thesis on weakly-supervised shape completion. For CVPR 2019, however, I am working on a different topic: adversarial robustness and generalization of deep neural networks.
18th October 2018, David Stutz
