READING

Menze and Geiger propose a superpixel-based approach to object scene flow (see [1] or this reading for a quick introduction to scene flow) as well as a KITTI-based [2] dataset for scene flow. For scene flow estimation, a CRF model of the form

$E(s, o) = \sum_{i} \phi_i(s_i, o) + \sum_{i \sim j} \psi_{ij}(s_i, s_j) \quad (1)$

is used. Here, $s = \{s_i\}$ denotes a set of superpixels, $o = \{o_i\}$ denotes a set of objects and $\sim$ denotes the adjacency relation between superpixels. Each superpixel is modeled as a plane, parameterized by its normal $n_i$, and is assigned to an object through the association variable $k_i$. Each superpixel inherits its scene flow from the object it belongs to; note that $o_i$ denotes both an object and its scene flow. The association of superpixels to objects is estimated jointly with the corresponding scene flow.
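The structure of the model can be sketched in a few lines of Python. This is only an illustration of how a per-superpixel cost and a cost over adjacent superpixels add up to one energy; the names `RigidObject`, `Superpixel` and `energy` as well as the toy costs are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class RigidObject:
    # Hypothetical container for an object o_j and its motion.
    rotation: Tuple[Vec3, Vec3, Vec3]  # 3x3 matrix, row-major
    translation: Vec3


@dataclass
class Superpixel:
    normal: Vec3       # plane parameters n_i
    object_index: int  # association to an object (k_i)


def energy(
    superpixels: Sequence[Superpixel],
    objects: Sequence[RigidObject],
    adjacency: Sequence[Tuple[int, int]],
    unary: Callable[[Superpixel, RigidObject], float],
    pairwise: Callable[[Superpixel, Superpixel], float],
) -> float:
    """Sum a per-superpixel cost and a cost over adjacent superpixel pairs."""
    total = 0.0
    for sp in superpixels:
        # Each superpixel is scored with the motion of the object it belongs to.
        total += unary(sp, objects[sp.object_index])
    for i, j in adjacency:
        total += pairwise(superpixels[i], superpixels[j])
    return total


# Toy example with placeholder costs (assumptions, not the paper's terms).
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
objects: List[RigidObject] = [
    RigidObject(identity, (0.0, 0.0, 0.0)),  # static background
    RigidObject(identity, (1.0, 0.0, 0.0)),  # one moving object
]
superpixels = [
    Superpixel((0.0, 0.0, 1.0), 0),
    Superpixel((0.0, 0.0, 1.0), 1),
    Superpixel((0.0, 0.0, 1.0), 1),
]
adjacency = [(0, 1), (1, 2)]


def toy_unary(sp: Superpixel, obj: RigidObject) -> float:
    return abs(obj.translation[0])


def toy_pairwise(a: Superpixel, b: Superpixel) -> float:
    # Penalize neighbors assigned to different objects.
    return 0.0 if a.object_index == b.object_index else 1.0


total = energy(superpixels, objects, adjacency, toy_unary, toy_pairwise)
```

Passing the costs in as callables keeps the sketch independent of how the data and smoothness terms are actually defined.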

Given left and right frames for time $t$ and $(t - 1)$, the data term in Equation (1) describes the matching cost of each superpixel:

$\phi_i(s_i, o) = \sum_{j} [k_i = j] \sum_{\text{x}} \sum_{p \in \mathcal{R}_i} d(p, K(R_{\text{x}}(o_j) - t_{\text{x}}(o_j) n_i^T) K^{-1} p)$

where $[k_i = j]$ is $1$ if superpixel $i$ is assigned to object $j$ and $0$ otherwise, $\mathcal{R}_i$ denotes the pixels of superpixel $i$, $K$ the camera calibration matrix and $d(p, p')$ a dissimilarity/distance measure. For $\text{x} \in \{\text{stereo}, \text{flow}, \text{cross}\}$, $[R_{\text{x}}(o_j) | t_{\text{x}}(o_j)]$ denotes the rotation and translation that map a pixel in reference coordinates (Menze and Geiger use the left image at time $t$ as reference) to a pixel in one of the remaining coordinate systems according to the extrinsic camera parameters and the scene flow $o_j$. Menze and Geiger use both dense and sparse matching to define the dissimilarity $d$: for dense matching, the Hamming distance between $5 \times 5$ Census descriptors [4] is used, while sparse correspondences are computed beforehand using [5] and [6]. The data term is also illustrated in Figure 1.
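The dense part of the dissimilarity can be illustrated with a minimal Census sketch. It assumes grayscale images stored as nested lists, pixels given as `(row, col)`, a bit set to 1 when a neighbor is darker than the center, and clamping at the image border; these choices and the function names are assumptions for illustration, not details taken from [4] or the paper.

```python
def census_descriptor(image, row, col, radius=2):
    """Return the Census bits of the (2*radius+1)^2 window around (row, col)."""
    center = image[row][col]
    bits = []
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            if dr == 0 and dc == 0:
                continue  # the center pixel is not compared with itself
            # Clamp coordinates so border pixels still get a full descriptor.
            r = min(max(row + dr, 0), len(image) - 1)
            c = min(max(col + dc, 0), len(image[0]) - 1)
            bits.append(1 if image[r][c] < center else 0)
    return bits


def hamming_distance(bits_a, bits_b):
    """Count the positions at which two equally long bit lists differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))


def census_cost(image_a, pixel_a, image_b, pixel_b):
    """Dissimilarity of two pixels as Hamming distance of 5x5 Census descriptors."""
    return hamming_distance(
        census_descriptor(image_a, *pixel_a),
        census_descriptor(image_b, *pixel_b),
    )


# Toy images: `right` is `left` shifted by one column, `brighter` adds an offset.
left = [[(r * 5 + c * 3) % 7 for c in range(9)] for r in range(9)]
right = [[left[r][min(c + 1, 8)] for c in range(9)] for r in range(9)]
brighter = [[value + 50 for value in row] for row in left]

matching_cost = census_cost(left, (4, 4), right, (4, 3))
offset_cost = census_cost(left, (4, 4), brighter, (4, 4))
```

Because the descriptor only records intensity orderings, a constant brightness offset between the two images leaves the cost unchanged, which is one reason Census descriptors are popular for stereo and flow matching.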

Figure 1: Illustration of the data term for a specific superpixel in the reference frame (bottom-left).
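How a reference pixel is mapped into another view through the plane of its superpixel can be sketched as follows. The sketch assumes a simple pinhole camera with focal length `focal` and principal point `(cx, cy)`, a plane convention of $n^T X + 1 = 0$ for 3D points $X$ on the plane, and a motion $X' = R X + t$; under these assumptions the mapping is $K (R - t n^T) K^{-1} p$. The function name `warp_pixel` and the example values are hypothetical.

```python
def warp_pixel(pixel, normal, rotation, translation, focal, cx, cy):
    """Map a reference pixel into another view through a plane-induced homography."""
    u, v = pixel
    # Back-project the pixel to a viewing ray: K^{-1} p for a pinhole camera.
    ray = ((u - cx) / focal, (v - cy) / focal, 1.0)
    # Apply (R - t n^T) to the ray.
    n_dot_ray = sum(n * r for n, r in zip(normal, ray))
    moved = [
        sum(rotation[row][k] * ray[k] for k in range(3))
        - translation[row] * n_dot_ray
        for row in range(3)
    ]
    # Project back to pixel coordinates with K.
    return (focal * moved[0] / moved[2] + cx, focal * moved[1] / moved[2] + cy)


# Stereo example: no rotation, right camera shifted by the baseline along x.
identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
baseline = 0.5
depth = 10.0
fronto_parallel = (0.0, 0.0, -1.0 / depth)  # plane Z = depth under n^T X + 1 = 0

warped = warp_pixel(
    (60.0, 40.0), fronto_parallel, identity, (-baseline, 0.0, 0.0),
    focal=100.0, cx=50.0, cy=40.0,
)
```

For this fronto-parallel plane the pixel moves horizontally by the disparity `focal * baseline / depth`, which is 5 pixels in the example; the flow and cross cases only differ in the rotation and translation that are passed in.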

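The second term in Equation (1) encodes smoothness regarding depth, orientation and motion:

$\psi_{ij}(s_i, s_j) = \theta_1 \psi_{ij}^{\text{depth}}(n_i, n_j) + \theta_2 \psi_{ij}^{\text{orient}}(n_i, n_j) + \theta_3 \psi_{ij}^{\text{motion}}(s_i, s_j)$

with weights $\theta_1, \theta_2, \theta_3$. The depth term, for example, penalizes disparity differences along the shared boundary $\mathcal{B}_{ij}$ of two adjacent superpixels,

$\psi_{ij}^{\text{depth}}(n_i, n_j) = \sum_{p \in \mathcal{B}_{ij}} \rho(\text{disp}(n_i, p) - \text{disp}(n_j, p))$

where $\rho$ is a robust penalty and $\text{disp}(n_i, p)$ describes the disparity of plane $n_i$ at pixel $p$.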
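One way to read the depth smoothness is as a penalty on how much the disparities of two neighboring planes disagree at the pixels of their shared boundary. The sketch below makes two simplifying assumptions: each plane is given directly in disparity space as coefficients `(a, b, c)` with disparity `a * u + b * v + c` at pixel `(u, v)` (a 3D plane induces such an affine disparity function for a rectified stereo pair), and the robust penalty is a truncated absolute difference with a hypothetical threshold `tau`.

```python
def plane_disparity(plane, pixel):
    """Disparity of a plane, given as disparity-space coefficients (a, b, c)."""
    a, b, c = plane
    u, v = pixel
    return a * u + b * v + c


def truncated_l1(value, tau):
    """Robust penalty: absolute value, capped at tau."""
    return min(abs(value), tau)


def depth_smoothness(plane_i, plane_j, boundary_pixels, tau=3.0):
    """Sum the truncated disparity differences over the shared boundary pixels."""
    return sum(
        truncated_l1(plane_disparity(plane_i, p) - plane_disparity(plane_j, p), tau)
        for p in boundary_pixels
    )


# Boundary between two superpixels along the image column u = 10.
boundary = [(10.0, 0.0), (10.0, 1.0), (10.0, 2.0)]
flat = (0.0, 0.0, 10.0)       # constant disparity 10
slanted = (0.5, 0.0, 5.0)     # disparity 10 at u = 10, so it meets `flat` there
far_apart = (0.0, 0.0, 20.0)  # constant disparity 20

continuous_cost = depth_smoothness(flat, slanted, boundary)
jump_cost = depth_smoothness(flat, far_apart, boundary)
```

Two planes that meet at the boundary incur no cost even if their slopes differ, while a depth discontinuity is penalized only up to `tau` per boundary pixel, so true object boundaries are not smoothed away at any price.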

Overall, the model relies on superpixels computed using StereoSLIC [3], optical flow computed as in [5] and disparity maps computed using [6]. Because the association variables $k_i$ are discrete while the remaining variables are continuous, Equation (1) describes a discrete-continuous CRF; for inference, Menze and Geiger use particle belief propagation (see [7] or [8]). Details are given in the paper.
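The idea of exploring continuous variables through a small set of sampled candidates ("particles") can be illustrated with a much simpler scheme. The sketch below is not particle belief propagation as in [7] or [8]: it passes no messages and simply keeps, for each variable in turn, the best of a few Gaussian perturbations of the current value. All names, the scalar variables and the quadratic toy costs are assumptions for illustration.

```python
import random


def total_energy(values, neighbors, unary, pairwise):
    """Unary costs plus pairwise costs, counting each undirected edge once."""
    cost = sum(unary(i, x) for i, x in enumerate(values))
    for i, adjacent in enumerate(neighbors):
        cost += sum(pairwise(values[i], values[j]) for j in adjacent if i < j)
    return cost


def particle_local_search(values, neighbors, unary, pairwise,
                          num_particles=10, sigma=1.0, sweeps=20, seed=0):
    """Greedy search: per variable, keep the best of several sampled candidates."""
    rng = random.Random(seed)
    values = list(values)
    for _ in range(sweeps):
        for i in range(len(values)):
            # The current value stays a candidate, so the energy never increases.
            candidates = [values[i]] + [
                values[i] + rng.gauss(0.0, sigma) for _ in range(num_particles)
            ]

            def local_cost(x):
                return unary(i, x) + sum(pairwise(x, values[j]) for j in neighbors[i])

            values[i] = min(candidates, key=local_cost)
    return values


# Toy chain of three scalar variables with noisy observations.
observations = [0.0, 5.0, 1.0]
neighbors = [[1], [0, 2], [1]]


def toy_unary(i, x):
    return (x - observations[i]) ** 2


def toy_pairwise(x, y):
    return 0.5 * (x - y) ** 2


initial = [10.0, 10.0, 10.0]
result = particle_local_search(initial, neighbors, toy_unary, toy_pairwise)
initial_energy = total_energy(initial, neighbors, toy_unary, toy_pairwise)
final_energy = total_energy(result, neighbors, toy_unary, toy_pairwise)
```

Particle belief propagation additionally exchanges max-product messages between the particle sets of neighboring variables, which lets it coordinate changes across the graph instead of updating each variable greedily.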

For evaluation, Menze and Geiger present a KITTI-based dataset for object scene flow by fitting a selected set of CAD models to the point clouds of identified objects. An illustration of the results is shown in Figure 2.