Non-rigid point set registration: Coherent Point Drift

Transcription

1 Non-rigid point set registration: Coherent Point Drift Andriy Myronenko Xubo Song Miguel Á. Carreira-Perpiñán Department of Computer Science and Electrical Engineering OGI School of Science and Engineering Oregon Health and Science University Beaverton, OR, USA, 976 {myron, xubosong, Abstract We introduce Coherent Point Drift (CPD), a novel probabilistic method for nonrigid registration of point sets. The registration is treated as a Maximum Likelihood (ML) estimation problem with motion coherence constraint over the velocity field such that one point set moves coherently to align with the second set. We formulate the motion coherence constraint and derive a solution of regularized ML estimation through the variational approach, which leads to an elegant kernel form. We also derive the EM algorithm for the penalized ML optimization with deterministic annealing. The CPD method simultaneously finds both the non-rigid transformation and the correspondence between two point sets without making any prior assumption of the transformation model except that of motion coherence. This method can estimate complex non-linear non-rigid transformations, and is shown to be accurate on D and 3D examples and robust in the presence of outliers and missing points. Introduction Registration of point sets is an important issue for many computer vision applications such as robot navigation, image guided surgery, motion tracking, and face recognition. In fact, it is the key component in tasks such as object alignment, stereo matching, point set correspondence, image segmentation and shape/pattern matching. The registration problem is to find meaningful correspondence between two point sets and to recover the underlying transformation that maps one point set to the second. The points in the point set are features, most often the locations of interest points extracted from an image. Other common geometrical features include line segments, implicit and parametric curves and surfaces. Any geometrical feature can be represented as a point set; in this sense, the point locations is the most general of all features. Registration techniques can be rigid or non-rigid depending on the underlying transformation model. The key characteristic of a rigid transformation is that all distances are preserved. The simplest nonrigid transformation is affine, which also allows anisotropic scaling and skews. Effective algorithms exist for rigid and affine registration. However, the need for more general non-rigid registration occurs in many tasks, where complex non-linear transformation models are required. Non-linear non-rigid registration remains a challenge in computer vision. Many algorithms exist for point sets registration. A direct way of associating points of two arbitrary patterns is proposed in []. The algorithm exploits properties of singular value decomposition and works well with translation, shearing and scaling deformations. However, for a non-rigid transformation, the method performs poorly. Another popular method for point sets registration is the Iterative Closest Point (ICP) algorithm [], which iteratively assigns correspondence and finds the least squares transformation (usually rigid) relating these point sets. The algorithm then redetermines the closest point set and continues until it reaches the local minimum. Many variants of ICP

2 have been proposed that affect all phases of the algorithm from the selection and matching of points to the minimization strategy [3]. Nonetheless ICP requires that the initial pose of the two point sets be adequately close, which is not always possible, especially when transformation is non-rigid [3]. Several non-rigid registration methods are introduced [4, 5]. The Robust Point Matching (RPM) method [4] allows global to local search and soft assignment of correspondences between two point sets. In [5] it is further shown that the RPM algorithm is similar to Expectation Maximization (EM) algorithms for the mixture models, where one point set represents data points and the other represents centroids of mixture models. In both papers, the non-rigid transform is parameterized by Thin Plate Spline (TPS) [6], leading to the TPS-RPM algorithm [4]. According to regularization theory, the TPS parametrization is a solution of the interpolation problem in D that penalizes the second order derivatives of the transformation. In 3D the solution is not differentiable at point locations. In four or higher dimensions the generalization collapses completely [7]. The M-step in the EM algorithm in [5] is approximated for simplification. As a result, the approach is not truly probabilistic and does not lead, in general, to the true Maximum Likelihood solution. A correlation-based approach to point set registration is proposed in [8]. Two data sets are represented as probability densities, estimated using kernel density estimation. The registration is considered as the alignment between the two distributions that minimizes a similarity function defined by L norm. This approach is further extended in [9], where both densities are represented as Gaussian Mixture Models (GMM). Once again thin-plate spline is used to parameterize the smooth non-linear underlying transformation. In this paper we introduce a probabilistic method for point set registration that we call the Coherent Point Drift (CPD) method. Similar to [5], given two point sets, we fit a GMM to the first point set, whose Gaussian centroids are initialized from the points in the second set. However, unlike [4, 5, 9] which assumes a thin-plate spline transformation, we do not make any explicit assumption of the transformation model. Instead, we consider the process of adapting the Gaussian centroids from their initial positions to their final positions as a temporal motion process, and impose a motion coherence constraint over the velocity field. Velocity coherence is a particular way of imposing smoothness on the underlying transformation. The concept of motion coherence was proposed in the Motion Coherence Theory []. The intuition is that points close to one another tend to move coherently. This motion coherence constraint penalizes derivatives of all orders of the underlying velocity field (thin-plate spline only penalizes the second order derivative). Examples of velocity fields with different levels of motion coherence for different point correspondence are illustrated in Fig.. (a) (b) (c) (d) Figure : (a) Two given point sets. (b) A coherent velocity field. (c, d) Velocity fields that are less coherent for the given correspondences. We derive a solution for the velocity field through a variational approach by maximizing the likelihood of GMM penalized by motion coherence. We show that the final transformation has an elegant kernel form. We also derive an EM algorithm for the penalized ML optimization with deterministic annealing. Once we have the final positions of the GMM centroids, the correspondence between the two point sets can be easily inferred through the posterior probability of the Gaussian mixture components given the first point set. Our method is a true probabilistic approach and is shown to be accurate and robust in the presence of outliers and missing points, and is effective for estimation of complex non-linear non-rigid transformations. The rest of the paper is organized as follows. In Section we formulate the problem and derive the CPD algorithm. In Section 3 we present the results of CPD algorithm and compare its performance with that of RPM [4] and ICP []. In Section 4 we summarize the properties of CPD and discuss the results.

3 Method Assume two point sets are given, where the template point set Y = (y,..., y M ) T (expressed as a M D matrix) should be aligned with the reference point set X = (x,..., x N ) T (expressed as a N D matrix) and D is the dimension of the points. We consider the points in Y as the centroids of a Gaussian Mixture Model, and fit it to the data points X by maximizing the likelihood function. We denote Y as the initial centroid positions and define a continuous velocity function v for the template point set such that the current position of centroids is defined as Y = v(y ) + Y. Consider a Gaussian-mixture density p(x) = M M p(x m) with x m N (y m, σ I D ), where Y represents D-dimensional centroids of equally-weighted Gaussians with equal isotropic covariance matrices, and X set represents data points. In order to enforce a smooth motion constraint, we define the prior p(y λ) exp ( λ φ(y)), where λ is a weighting constant and φ(y) is a function that regularizes the motion to be smooth. Using Bayes theorem, we want to find the parameters Y by maximizing the posteriori probability, or equivalently by minimizing the following energy function: E(Y) = log e xn ym σ + λ φ(y) () We make the i.i.d. data assumption and ignore terms independent of Y. Equation has a similar form to that of Generalized Elastic Net (GEN) [], which has shown good performance in nonrigid image registration []; note that there we directly penalized Y, while here we penalize the transformation v. The φ function represents our prior knowledge about the motion, which should be smooth. Specifically, we want the velocity field v generated by template point set displacement to be smooth. According to [3], smoothness is a measure of the oscillatory behavior of a function. Within the class of differentiable functions, one function is said to be smoother than another if it oscillates less; in other words, if it has less energy at high frequency. The high frequency content of a function can be measured by first high-pass filtering the function, and then measuring the resulting power. This can be represented as φ(v) = R d ṽ(s) / G(s) ds, where ṽ indicates the Fourier transform of the velocity and G is some positive function that approaches zero as s. Here G represents a symmetric low-pass filter, so that its Fourier transform G is real and symmetric. Following this formulation, we rewrite the energy function as: E(ṽ) = log e xn ym σ + λ Rd ṽ(s) ds () G(s) It can be shown using a variational approach (see Appendix A for a sketch of the proof) that the function which minimizes the energy function in Eq. has the form of the radial basis function: v(z) = w m G(z y m ) (3) We choose a Gaussian kernel form for G (note it is not related to the Gaussian form of the distribution chosen for the mixture model). There are several motivations for such a Gaussian choice: First, it satisfies the required properties (symmetric, positive definite, and G approaches zero as s ). Second, a Gaussian low pass filter has the property of having the Gaussian form in both frequency and time domain without oscillations. By choosing an appropriately sized Gaussian filter we have the flexibility to control the range of filtered frequencies and thus the amount of spatial smoothness. Third, the choice of the Gaussian makes our regularization term equivalent to the one in Motion Coherence Theory (MCT) []. The regularization term R ṽ(s) / G(s) ds, with a Gaussian function for G, is equivalent to the sum of weighted squares of all d order derivatives of the velocity field β m R d m! (D m v) [, 3], where D is a derivative operator so that m D m v = m v and D m+ v = ( m v). The equivalence of the regularization term with that of the Motion Coherence Theory implies that we are imposing motion coherence among the points and thus we call our method the Coherent Point Drift (CPD) method. Detailed discussion of MCT can be found in []. Substituting the solution obtained in Eq. 3 back into Eq., we obtain

4 CPD algorithm: Initialize parameters λ, β, σ Construct G matrix, initialize Y = Y Deterministic annealing: EM optimization, until convergence: E-step: Compute P M-step: Solve for W from Eq. 7 Update Y = Y + GW Anneal σ = ασ Compute the velocity field: v(z) = G(z, )W Figure : Pseudo-code of CPD algorithm. E(W) = log e xn y m P M k= w k G(y k y m ) σ where G M M is a square symmetric Gram matrix with elements g ij = e W M D = (w,..., w M ) T is a matrix of the Gaussian kernel weights in Eq λ tr ( W T GW ) (4) y i y j β and Optimization. Following the EM algorithm derivation for clustering using Gaussian Mixture Model [4], we can find the upper bound of the function in Eq. 4 as (E-step): Q(W) = P old (m x n ) x n y m G(m, )W σ + λ tr ( W T GW ) (5) where P old denotes the posterior probabilities calculated using previous parameter values, and G(m, ) denotes the m th row of G. Minimizing the upper bound Q will lead to a decrease in the value of the energy function E in Eq. 4, unless it is already at local minimum. Taking the derivative of Eq. 5 with respect to W, and rewriting the equation in matrix form, we obtain (M-step) Q W = σ G(diag (P))(Y + GW) PX) + λgw = (6) yold m xn yold m xn where P is a matrix of posterior probabilities with p mn = e σ / M e σ. The diag ( ) notation indicates diagonal matrix and is a column vector of all ones. Multiplying Eq. 6 by σ G (which exists for a Gaussian kernel) we obtain a linear system of equations: (diag (P)) G + λσ I)W = PX diag (P) Y (7) Solving the system for W is the M-step of EM algorithm. The E step requires computation of the posterior probability matrix P. The EM algorithm is guaranteed to converge to a local optimum from almost any starting point. Eq. 7 can also be obtained directly by finding the derivative of Eq. 4 with respect to W and equating it to zero. This results in a system of nonlinear equations that can be iteratively solved using fixed point update, which is exactly the EM algorithm shown above. The computational complexity of each EM iteration is dominated by the linear system of Eq. 7, which takes O(M 3 ). If using a truncated Gaussian kernel and/or linear conjugate gradients, this can be reduced to O(M ). Robustness to Noise. The use of a probabilistic assignment of correspondences between point sets is innately more robust than the binary assignment used in ICP. However, the GMM requires that each data point be explained by the model. In order to account for outliers, we add an additional uniform pdf component to the mixture model. This new component changes posterior probability matrix P yold m xn yold m xn in Eq. 7, which now is defined as p mn = e σ /( (πσ ) D a + M e σ ), where a defines the support for the uniform pdf. The use of the uniform distribution greatly improves the noise. Free Parameters. There are three free parameters in the method: λ, β and σ. Parameter λ represents the trade off between data fitting and smoothness regularization. Parameter β reflects the strength

5 of interaction between points. Small values of β produce locally smooth transformation, while large values of β correspond to nearly pure translation transformation. The value of σ serves as a capture range for each Gaussian mixture component. Smaller σ indicates smaller and more localized capture range for each Gaussian component in the mixture model. We use deterministic annealing for σ, starting with a large value and gradually reducing it according to σ = ασ, where α is annealing rate (normally between [.9.98]), so that the annealing process is slow enough for the algorithm to be robust. The gradual reducing of σ leads to a coarse-to-fine match strategy. We summarize the CPD algorithm in Fig.. 3 Experimental Results We show the performance of CPD on artificial data with non-rigid deformations. The algorithm is implemented in Matlab, and tested on a Pentium4 CPU 3GHz with 4GB RAM. The code is available at myron/matlab/cpd. The initial value of λ and β are set to. in all experiments. The starting value of σ is 3. and gradually annealed with α =.97. The stopping condition for the iterative process is either when the current change in parameters drops below a threshold of 6 or the number of iterations reaches the maximum of 5. CPD algorithm RPM algorithm ICP algorithm Figure 3: Registration results for the CPD, RPM and ICP algorithms from top to bottom. The first column shows template ( ) and reference (+) point sets. The second column shows the registered position of the template set superimposed over the reference set. The third column represents the recovered underlying deformation. The last column shows the link between initial and final template point positions (only every second point s displacement is shown). On average the algorithm converges in few seconds and requires around 8 iterations. All point sets are preprocessed to have zero mean and unit variance (which normalizes translation and scaling). We compare our method on non-rigid point registration with RPM and ICP. The RPM and ICP implementations and the D point sets used for comparison are taken from the TPS-RPM Matlab package [4]. For the first experiment (Fig. 3) we use two clean point sets. Both CPD and RPM algorithms produce accurate results for non-rigid registration. The ICP algorithm is unable to escape a local minimum. We show the velocity field through the deformation of a regular grid. The deformation field for RPM corresponds to parameterized TPS transformation, while that for CPD represents a motion coherent non-linear deformation. For the second experiment (Fig. 4) we make the registration problem more challenging. The fish head in the reference point set is removed, and random noise is added. In the template point set the tail is removed. The CPD algorithm shows robustness even in the area of

6 missing points and corrupted data. RPM incorrectly wraps points to the middle of the figure. We have also tried different values of smoothness parameters for RPM without much success, and we only show the best result. ICP also shows poor performance and is stuck in a local minimum. For the 3D experiment (Fig. 5) we show the performance of CPD on 3D faces. The face surface is defined by the set of control points. We artificially deform the control point positions non-rigidly and use it as a template point set. The original control point positions are used as a reference point set. CPD is effective and accurate for this 3D non-rigid registration problem. CPD algorithm RPM algorithm ICP algorithm Figure 4: The reference point set is corrupted to make the registration task more challenging. Noise is added and the fish head is removed in the reference point set. The tail is also removed in the template point set. The first column shows template ( ) and reference (+) point sets. The second column shows the registered position of the template set superimposed over the reference set. The third column represents the recovered underlying deformation. The last column shows the link between the initial and final template point positions. 4 Discussion and Conclusion We intoduce Coherent Point Drift, a new probabilistic method for non-rigid registration of two point sets. The registration is considered as a Maximum Likelihood estimation problem, where one point set represents centroids of a GMM and the other represents the data. We regularize the velocity field over the points domain to enforce coherent motion and define the mathematical formulation of this constraint. We derive the solution for the penalized ML estimation through the variational approach, and show that the final transformation has an elegant kernel form. We also derive the EM optimization algorithm with deterministic annealing. The estimated velocity field represents the underlying non-rigid transformation. Once we have the final positions of the GMM centroids, the correspondence between the two point sets can be easily inferred through the posterior probability of

7 (a) (b) (c) (d) (e) (f) Figure 5: The results of CPD non-rigid registration on 3D point sets. (a, d) The reference face and its control point set. (b, e) The template face and its control point set. (c, f) Result obtained by registering the template point set onto the reference point set using CPD. the GMM components given the data. The computational complexity of CPD is O(M 3 ), where M is the number of points in template point set. It is worth mentioning that the components in the point vector are not limited to spatial coordinates. They can also represent the geometrical characteristic of an object (e.g., curvature, moments), or the features extracted from the intensity image (e.g., color, gradient). We compare the performance of the CPD algorithm on D and 3D data against ICP and RPM algorithms, and show how CPD outperforms both methods in the presence of noise and outliers. It should be noted that CPD does not work well for large in-plane rotation. Typically such transformation can be first compensated by other well known global registration techniques before CPD algorithm is carried out. The CPD method is most effective when estimating smooth non-rigid transformations. Appendix A E = log e xn ym σ + λ Rd ṽ(s) ds (8) G(s) Consider the function in Eq. 8, where y m = y m + v(y m ), and y m is the initial position of y m point. v is a continuous velocity function and v(y m ) = R d ṽ(s)e πi<ym,s> ds in terms of its Fourier transform ṽ. The following derivation follows [3]. Substituting v into equation Eq. 8 we obtain: E(ṽ) = log e xn y m R R d ṽ(s)eπi<y m,s> ds σ + λ Rd ṽ(s) ds (9) G(s) In order to find the minimum of this functional we take its functional derivatives with respect to ṽ, so that δe(ṽ) δṽ(t) =, t Rd : N δe(ṽ) δṽ(t) = λ R d M e xn ym σ δ ṽ(s) N δṽ(t) G(s) ds = δṽ(s) δṽ(t) eπi<ym,s> ds σ (x n y m ) R d + M e xn ym σ M e xn ym σ σ (x n y m )e πi<y,t> + M e xn ym σ λṽ( t) G(t) =

A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

Stock Option Pricing Using Bayes Filters Lin Liao liaolin@cs.washington.edu Abstract When using Black-Scholes formula to price options, the key is the estimation of the stochastic return variance. In this

1 2 The calibration problem was discussed in details during lecture 3. 3 Once the camera is calibrated (intrinsics are known) and the transformation from the world reference system to the camera reference

Least-Squares Intersection of Lines Johannes Traa - UIUC 2013 This write-up derives the least-squares solution for the intersection of lines. In the general case, a set of lines will not intersect at a

SOLVING LINEAR SYSTEMS Linear systems Ax = b occur widely in applied mathematics They occur as direct formulations of real world problems; but more often, they occur as a part of the numerical analysis

Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

SYMMETRIC EIGENFACES MILI I. SHAH Abstract. Over the years, mathematicians and computer scientists have produced an extensive body of work in the area of facial analysis. Several facial analysis algorithms

Clustering 15-381 Artificial Intelligence Henry Lin Modified from excellent slides of Eamonn Keogh, Ziv Bar-Joseph, and Andrew Moore What is Clustering? Organizing data into clusters such that there is

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field 77 3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field Overview: The antiderivative in one variable calculus is an important

Bayesian Statistics: Indian Buffet Process Ilker Yildirim Department of Brain and Cognitive Sciences University of Rochester Rochester, NY 14627 August 2012 Reference: Most of the material in this note

Interpolating and Approximating Implicit Surfaces from Polygon Soup 1. Briefly summarize the paper s contributions. Does it address a new problem? Does it present a new approach? Does it show new types

Object Recognition and Template Matching Template Matching A template is a small image (sub-image) The goal is to find occurrences of this template in a larger image That is, you want to find matches of

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

OBJECT TRACKING USING LOG-POLAR TRANSFORMATION A Thesis Submitted to the Gradual Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements

Optical Flow as a property of moving objects used for their registration Wolfgang Schulz Computer Vision Course Project York University Email:wschulz@cs.yorku.ca 1. Introduction A soccer game is a real

Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal

Solving Simultaneous Equations and Matrices The following represents a systematic investigation for the steps used to solve two simultaneous linear equations in two unknowns. The motivation for considering