Abstract

Spectral analysis-based dimensionality reduction algorithms, especially the local manifold learning methods, have become popular recently because their optimization does not involve local minima and they scale well to large, high-dimensional data sets. Despite their attractive properties, these algorithms are developed based on different geometric intuitions, and each method learns only partial information about the true geometric structure of the underlying manifold. To discover the underlying manifold structure more faithfully, we introduce in this chapter a novel method to fuse the geometric information learned by different local manifold learning algorithms. First, we employ local tangent coordinates to compute the local objects from different local algorithms. Then, we utilize the truncation function from differential geometry to connect the local objects with a global functional, and finally we develop an alternating optimization-based algorithm to discover the low-dimensional embedding. Experiments on synthetic as well as real data sets demonstrate the effectiveness of our proposed method.

Keywords: dimensionality reduction, manifold learning

1. Introduction

Nonlinear dimensionality reduction (NLDR) plays an important role in modern data analysis, since many objects in our world can only be represented electronically as high-dimensional data, such as images, videos, speech signals, and text documents. We usually need to analyze and process large amounts of such data; however, processing high-dimensional data directly is very complicated or even infeasible because of its high computational cost in both time and space. Over the past decade, numerous manifold learning methods have been proposed for nonlinear dimensionality reduction. Methodologically, these methods can be divided into two categories: global algorithms and local algorithms. Representative global algorithms include isometric mapping [1], maximum variance unfolding [2], and local coordinates alignment with global preservation [3]. Local methods mainly include Laplacian eigenmaps (LEM) [4], locally linear embedding (LLE) [5], Hessian eigenmaps (HLLE) [6], local tangent space alignment (LTSA) [7], local linear transformation embedding [8], stable local approaches [9], and maximal linear embedding [10].

Different local approaches learn different geometric information about the underlying manifold, since each was developed from the knowledge and experience of experts for their own purposes [11]. Therefore, each existing local manifold learning method learns only partial information about the true underlying manifold. To better discover the underlying manifold structure, it is thus essential to provide a common framework for synthesizing the geometric information extracted by different local methods. In this chapter, we propose a method to unify the local manifold learning algorithms (e.g., LEM, LLE, HLLE, and LTSA). Inspired by HLLE, which employs local tangent coordinates to compute the local Hessian, we propose to utilize local tangent coordinates to estimate the local objects defined in different local methods. Then, we employ the truncation function from differential geometry to connect the local objects with a global functional. Finally, we develop an alternating optimization-based algorithm to discover the global coordinate system of lower dimensionality.

2. Local tangent coordinates system

A manifold is a topological space that locally resembles Euclidean space near every point. That is, around each point, there is a neighborhood that is topologically the same as the open unit ball in $\mathbb{R}^d$, where $d$ is the dimension of the manifold. The simplest manifold is a linear manifold, usually called a hyperplane. There exists a tangent space at each point of a nonlinear manifold. The tangent space is a linear manifold which locally approximates the manifold. Suppose there are $N$ points $\{x_1, \ldots, x_N\}$ in $\mathbb{R}^D$ residing on a smooth manifold $M \subset \mathbb{R}^D$, which is the image of a coordinate space $Y \subset \mathbb{R}^d$ under a smooth mapping $\psi: Y \to \mathbb{R}^D$, where $d \ll D$. The mapping $\psi$ is assumed to be a locally isometric embedding. The aim of an NLDR algorithm is to acquire the corresponding low-dimensional representation $y_i \in Y$ of each $x_i \in M$ while preserving certain intrinsic structures of the data. Suppose $M$ is smooth such that the tangent space $T_x(M)$ is well defined at every point $x \in M$. We can regard the local tangent space as a $d$-dimensional affine subspace of $\mathbb{R}^D$ which is tangent to $M$ at $x$. Thus, the tangent space has the natural inner product induced by the embedding $M \subset \mathbb{R}^D$. Within some neighborhood of $x$, each point of $M$ has a unique closest point in $T_x(M)$, and therefore an orthonormal coordinate system on the tangent space can be associated with the corresponding local coordinates on $M$.

A manifold can be represented by its coordinates. While current research in differential geometry focuses on characterizing the global properties of manifolds, NLDR algorithms, which try to find coordinate representations of data, only need the local properties of manifolds. In this chapter, we use local coordinates associated with the tangent space to estimate the local objects over the manifold. To acquire the local tangent coordinates, we first perform principal component analysis (PCA) [12] on the points in $N(x_i) = \{x_i, x_{i_1}, \ldots, x_{i_k}\}$, the local patch built from the point $x_i$ and its $k$ nearest neighbors, and obtain the $d$ leading PCA eigenvectors $V^i = \{v_1^i, v_2^i, \ldots, v_d^i\}$, which form an orthogonal basis of $T_{x_i}(M)$ (the subspace spanned by this basis can be seen as a $d$-dimensional affine subspace of $\mathbb{R}^D$ which is tangent to $M$ at $x_i$). For high-dimensional data, we employ the trick presented by Turk and Pentland for EigenFaces [13]. Then, we obtain the local tangent coordinates $U^i = \{0, u_1^i, \ldots, u_k^i\}$ of the neighborhood $N(x_i)$ by projecting the local neighborhood onto this tangent subspace:

$$u_j^i = \langle V^i, x_{i_j} - x_i \rangle, \quad j = 1, \ldots, k. \tag{1}$$

An illustration of the local tangent space at $x_i$ and the corresponding tangent coordinate system (i.e., the local tangent coordinate of the point $x_{i_j}$ is $u_j^i$) is shown in Figure 1.

Figure 1.

Local tangent space and tangent coordinates system.
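The construction of the local tangent coordinates can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `local_tangent_coordinates`, the brute-force neighbor search, the centering of the patch at its mean before PCA, and the use of an SVD to obtain the leading directions are all assumptions made for the example.

```python
import numpy as np

def local_tangent_coordinates(X, i, k, d):
    """Return the indices of the k nearest neighbors of X[i] and their
    k x d local tangent coordinates (hypothetical helper, for illustration)."""
    # Squared Euclidean distances from x_i to every sample.
    dist2 = np.sum((X - X[i]) ** 2, axis=1)
    # Indices sorted by distance; drop x_i itself and keep the k nearest.
    order = np.argsort(dist2)
    nbrs = order[order != i][:k]
    # Local patch N(x_i) = {x_i, x_i1, ..., x_ik}.
    patch = X[np.concatenate(([i], nbrs))]
    # PCA on the patch via SVD of the mean-centered points (assumption:
    # centering at the patch mean); rows of Vt are principal directions.
    centered = patch - patch.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    V = Vt[:d].T                      # D x d basis of the tangent space
    # Project neighbor offsets x_ij - x_i onto the tangent basis.
    U = (X[nbrs] - X[i]) @ V          # k x d tangent coordinates
    return nbrs, U
```

By construction the tangent coordinate of $x_i$ itself is the zero vector, so only the $k$ neighbor coordinates are returned.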

3. Reformulations of LEM, LLE, HLLE and LTSA using local tangent coordinates

3.1. Reformulation of Laplacian eigenmaps

The LEM method was introduced by Belkin and Niyogi [4]. We can summarize the geometrical motivation of LEM as follows. Assume that we are searching for a smooth one-dimensional embedding $f: M \to \mathbb{R}$ from the manifold to the real line such that data points that are close together on the manifold are also mapped close together on the line. Consider two adjacent points $x, z \in M$, which are mapped to $f(x)$ and $f(z)$, respectively. We obtain that

where $\nabla_M f$ is the gradient vector field along the manifold. Thus, to first order, $\|\nabla_M f\|$ provides an estimate of how far apart $f$ maps nearby points. When we look for a map that best preserves locality on average, a natural choice is to find the $f$ that minimizes [4]:

$$\Phi_{lap}(f) = \int_M \|\nabla_M f\|^2, \tag{3}$$

where the integral is taken with respect to the standard measure over the manifold. Thus, the function $f$ that minimizes $\Phi_{lap}(f)$ has to be an eigenfunction of the Laplace-Beltrami operator $\Delta_M$, which is a key geometric object associated with a Riemannian manifold [14].

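Suppose that the tangent coordinate of $x \in N(x)$ is given by $u$. Then, the rule $g(u) = f(x) = f \circ \psi(u)$ defines a function $g: U \to \mathbb{R}$, where $U$ is a neighborhood of $u \in \mathbb{R}^d$. With the help of local tangent coordinates, we can reduce the computation of the gradient vector $\nabla_M f(x)$ on the manifold to the computation of the ordinary gradient vector in Euclidean space:

$$\nabla^{\mathrm{tan}} f(x) = \nabla g(u) = \left( \frac{\partial g(u)}{\partial u_1}, \ldots, \frac{\partial g(u)}{\partial u_d} \right)^T, \tag{4}$$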

where $u = (u_1, \ldots, u_d) \in \mathbb{R}^d$, and we keep the superscript tan in the notation to make clear that the gradient depends on the coordinate system in $T_x(M)$. Although the tangent gradient vector differs between local coordinate systems, the norm $\|\nabla^{\mathrm{tan}} f(x)\|$ is uniquely defined, so that Eq. (3) can be approximated by estimating the following functional:

$$\tilde{\Phi}_{lap}(f) = \int_M \|\nabla^{\mathrm{tan}} f(x)\|^2 \, dx. \tag{5}$$

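It is easy to show that the least-squares solution of the above objective function is $\alpha^i = (U^i)^\dagger f^i$, where $f^i = [f(x_{i_1}), \ldots, f(x_{i_k})] \in \mathbb{R}^k$, $U^i = [U_1^i; U_2^i; \ldots; U_k^i] \in \mathbb{R}^{k \times (1+d)}$, and $(U^i)^\dagger$ denotes the pseudo-inverse of $U^i$. If we define a local gradient operator $G^i \in \mathbb{R}^{d \times k}$ constructed from the last $d$ rows of $(U^i)^\dagger$, we have $\nabla^{\mathrm{tan}} f(x_i) = G^i f^i$. Furthermore, the local object $\|\nabla^{\mathrm{tan}} f(x_i)\|^2$ can be computed as:

$$\|\nabla^{\mathrm{tan}} f(x_i)\|^2 = \langle G^i f^i, G^i f^i \rangle = (f^i)^T (G^i)^T G^i f^i. \tag{9}$$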

What remains is how to connect the local object $\|\nabla^{\mathrm{tan}} f(x)\|^2$ with the global functional $\tilde{\Phi}_{lap}(f)$ in Eq. (5) and its discrete approximation. We discuss this issue in detail in Section 4.
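To make the local gradient operator concrete, the following NumPy sketch builds the design matrix whose rows are $[1, u_j^i]$, takes its pseudo-inverse, and keeps the last $d$ rows. The function name `local_gradient_matrix` is hypothetical, and the sketch assumes the tangent coordinates are supplied as a $k \times d$ array with $k > d$.

```python
import numpy as np

def local_gradient_matrix(U):
    """Given k x d tangent coordinates U, return the d x k local gradient
    operator G and the k x k local matrix G^T G (illustrative helper)."""
    k, d = U.shape
    # Design matrix with rows [1, u_j]; shape k x (1 + d).
    design = np.hstack([np.ones((k, 1)), U])
    # Pseudo-inverse maps function values f^i to least-squares coefficients.
    pinv = np.linalg.pinv(design)     # (1 + d) x k
    # The last d rows give the linear (gradient) coefficients.
    G = pinv[1:]
    return G, G.T @ G
```

For a function that is exactly affine in the tangent coordinates, `G @ f` recovers its slope vector, and constant functions are mapped to zero, which is the behavior expected of a gradient estimate.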

3.2. Reformulation of locally linear embedding

The LLE method was introduced by Roweis and Saul [5]. It is based on simple geometric intuitions, which can be described as follows. Globally, the data points are sampled from a nonlinear manifold, while locally each data point and its neighbors reside on or close to a linear patch of the manifold. Thus, under suitable conditions, the local geometric properties of the neighborhood of each data point in the high-dimensional space can be described by the linear coefficients that reconstruct the data point from its neighbors. LLE computes a low-dimensional embedding that is optimized to preserve these local configurations of the data. In each locally linear patch, the reconstruction error in the original LLE can be written as:

where $\{w_{ij}\}_{j=1}^k$ are the reconstruction weights, which encode the geometric information of the high-dimensional inputs and are constrained to satisfy $\sum_j w_{ij} = 1$.

Since the geometric structure of the local patch can be approximated by its projection on the tangent space $T_{x_i}(M)$, we utilize the local tangent coordinates to estimate the local objects over the manifold in our reformulation framework. We can write the reconstruction error of each local tangent coordinate as:

and then normalize the solution so that $\sum_j w_{ij} = 1$. Consider the problem of mapping the data points from the manifold to a line such that each data point on the line can be represented as a linear combination of its neighbors. Let $f(x_{i_1}), \ldots, f(x_{i_k})$ denote the images of $u_1^i, \ldots, u_k^i$, respectively. In the spirit of LLE, the neighborhood of $f(x_i)$ should share the same geometric information as the neighborhood of $u^i$, so we can define the following local object:
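As an illustration of the weight computation described in this subsection, the sketch below solves for reconstruction weights in tangent coordinates and evaluates the squared reconstruction residual of a function on the patch. It is a hedged example, not the chapter's exact formulation: the names `tangent_lle_weights` and `reconstruction_residual` are hypothetical, the ridge term `reg` is an assumption borrowed from common LLE practice (the local Gram matrix is singular whenever $k > d$), and the residual is taken between the value at the center point and the weighted neighbor values.

```python
import numpy as np

def tangent_lle_weights(U, reg=1e-3):
    """Reconstruction weights for the patch center from its k neighbors,
    computed in tangent coordinates U (k x d). Illustrative helper."""
    k = U.shape[0]
    # The center has tangent coordinate 0, so the offsets are the rows of U
    # and the local Gram matrix is U U^T.
    C = U @ U.T
    # Assumed ridge regularization, scaled by the average eigenvalue.
    C = C + reg * np.trace(C) / k * np.eye(k)
    w = np.linalg.solve(C, np.ones(k))
    # Normalize so that the weights sum to one.
    return w / w.sum()

def reconstruction_residual(w, f_center, f_neighbors):
    """Squared residual |f(x_i) - sum_j w_ij f(x_ij)|^2."""
    return float((f_center - w @ f_neighbors) ** 2)
```

With neighbors placed symmetrically around the center, the weights come out uniform, and any function that is affine in the tangent coordinates is reconstructed with zero residual.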

3.3. Reformulation of Hessian eigenmaps

The HLLE method was introduced by Donoho and Grimes [6]. In contrast to LLE, which obtains a linear embedding by minimizing the $\ell_2$ error in Eq. (10), HLLE achieves a linear embedding by minimizing the Hessian functional on the manifold where the data points reside. HLLE assumes that, if the manifold is locally isometric to an open connected subset of $\mathbb{R}^d$, the low-dimensional coordinates can be obtained from the $(d+1)$-dimensional null space of the functional $\mathcal{H}(f)$, which measures the average curviness of $f$ over the manifold. We can measure the functional $\mathcal{H}(f)$ by averaging the Frobenius norm of the Hessians on the manifold $M$ as [6]:

$$\mathcal{H}(f) = \int_M \|H_f^{\mathrm{tan}}(x)\|_F^2 \, dx,$$

where $H_f^{\mathrm{tan}}$ stands for the Hessian of $f$ in tangent coordinates. To estimate the local Hessian matrix, we first perform a second-order Taylor expansion at a fixed $x_i$ of the function values $\{f(x_{i_j})\}_{j=1}^k$, where $f: M \to \mathbb{R}$ is $C^2$ near $x_i$:

where $g: U \to \mathbb{R}$ uses the local tangent coordinates and satisfies the rule $g(u) = f(x) = f \circ \psi(u)$. In the second identity of Eq. (17), we have exploited the fact that $u_i^i = \langle V^i, x_i - x_i \rangle = 0$ [recall the computation of local tangent coordinates in Eq. (1)].

Over $U^i$, we develop the operator $\beta^i$ that approximates the function $g(u_j^i)$ by its projection on the basis $U_j^i = \{1, u_{j1}^i, \ldots, u_{jd}^i, (u_{j1}^i)^2, \ldots, (u_{jd}^i)^2, u_{j1}^i u_{j2}^i, \ldots, u_{j,d-1}^i u_{jd}^i\}$, and we have:

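The least-squares solution is $\beta^i = (U^i)^\dagger f^i$, where $f^i = [f(x_{i_1}), \ldots, f(x_{i_k})] \in \mathbb{R}^k$, $U^i = [U_1^i; U_2^i; \ldots; U_k^i] \in \mathbb{R}^{k \times (1+d+d(d+1)/2)}$, and $(U^i)^\dagger$ denotes the pseudo-inverse of $U^i$. Notice that $h^i$ is the vector form of the local Hessian matrix $H_f^i$, and the last $d(d+1)/2$ components of $\beta^i$ correspond to $h^i$. We can therefore construct the local Hessian operator $H^i \in \mathbb{R}^{(d(d+1)/2) \times k}$ from the last $d(d+1)/2$ rows of $(U^i)^\dagger$ and obtain $h^i = H^i f^i$. Thus, the local object $\|H_f^{\mathrm{tan}}(x_i)\|_F^2$ can be estimated with:

$$\|H_f^{\mathrm{tan}}(x_i)\|_F^2 = (h^i)^T h^i = (f^i)^T (H^i)^T H^i f^i. \tag{21}$$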
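The quadratic fit behind the local Hessian operator can be sketched as follows. The column order of the design matrix follows the basis listed above (constant, linear terms, squares, then cross products); the function name `local_hessian_matrix` is hypothetical, and the rows of the returned operator yield the fitted monomial coefficients without any rescaling, which is an assumption of this sketch.

```python
import numpy as np

def local_hessian_matrix(U):
    """Given k x d tangent coordinates U, return the local Hessian operator H
    of shape (d(d+1)/2) x k and the k x k local matrix H^T H.
    Requires k >= 1 + d + d(d+1)/2 for a well-posed fit. Illustrative helper."""
    k, d = U.shape
    squares = U ** 2                                  # (u_1)^2, ..., (u_d)^2
    pairs = [(a, b) for a in range(d) for b in range(a + 1, d)]
    if pairs:
        cross = np.column_stack([U[:, a] * U[:, b] for a, b in pairs])
    else:
        cross = np.empty((k, 0))                      # d = 1: no cross terms
    # Design matrix: [1, linear, squares, cross products].
    design = np.hstack([np.ones((k, 1)), U, squares, cross])
    n_hess = d * (d + 1) // 2
    # The last d(d+1)/2 rows of the pseudo-inverse pick the quadratic terms.
    H = np.linalg.pinv(design)[-n_hess:]
    return H, H.T @ H
```

Affine functions of the tangent coordinates are annihilated by `H`, which is consistent with the null-space characterization used by HLLE.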

3.4. Reformulation of local tangent space alignment

The LTSA method was introduced by Zhang and Zha [7]. LTSA is based on geometric intuitions similar to those of LLE. If the data set is sampled from a smooth manifold, the neighbors of each data point remain nearby and similarly co-located in the low-dimensional space. LLE constructs low-dimensional data so that the local linear relations of the original data are preserved, while LTSA constructs a locally linear patch to approximate the tangent space at the point. The coordinates provided by the tangent space give a low-dimensional representation of the patch. From Eq. (6), we can obtain:

The above equation shows that the global coordinate $f(x_{i_j})$ in the low-dimensional feature space is related to the local coordinate $u_j^i$, which represents the local geometry. The LTSA algorithm requires the global coordinates $f(x_{i_j})$ to respect the local geometry determined by the $u_j^i$:

where $f^i = [f(x_{i_1}), \ldots, f(x_{i_k})]^T$, $U^i = [u_1^i; u_2^i; \ldots; u_k^i]$, and $e$ is a $k$-dimensional column vector of all ones. Naturally, we seek the optimal mapping $f$ and local affine transformation $L^i$ that minimize the following global functional:
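One way to see what the LTSA alignment penalizes is to eliminate the local affine transformation in closed form. In the sketch below, the best affine fit of the patch values $f^i$ from the tangent coordinates is removed by an orthogonal projection, leaving a residual operator `W`. This follows the standard LTSA construction and is offered as an assumption about the form of the local matrix, not as the chapter's exact Eq. (28); the function name `local_alignment_matrix` is hypothetical.

```python
import numpy as np

def local_alignment_matrix(U):
    """Given k x d tangent coordinates U, return the k x k residual operator W
    such that W @ f is the part of f not explained by an affine function of
    the tangent coordinates, together with W^T W. Illustrative helper."""
    k = U.shape[0]
    # Columns span the affine functions of the tangent coordinates: [e, U].
    A = np.hstack([np.ones((k, 1)), U])
    # Orthogonal projector onto the complement of span{e, U}.
    W = np.eye(k) - A @ np.linalg.pinv(A)
    return W, W.T @ W
```

Because `W` is an orthogonal projector, `W.T @ W` equals `W` itself, so the local penalty vanishes exactly for functions that are affine in the tangent coordinates.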

4. Fusion of local manifold learning methods

So far we have discussed four basic local objects: $\|\nabla^{\mathrm{tan}} f(x)\|^2$, $|\sigma f(x)|^2$, $\|H_f^{\mathrm{tan}}(x_i)\|_F^2$, and $|\kappa f(x_i)|^2$. They depict the geometric information of the manifold from different perspectives. We would like to combine this geometric information to better reflect the geometric structure of the underlying manifold. Notice that we can estimate these local objects under the local tangent coordinate system according to Eqs. (9), (14), (21), and (28), respectively. Given the common structure of these equations, it is not hard to see that we can fuse these local objects within our proposed framework. Assuming that there are $M$ different local manifold learning algorithms, we can define the fused local object as follows:

where $\{c_j\}_{j=1}^M$ are the nonnegative balance parameters and $\{LO_j(x)\}_{j=1}^M$ are the local objects from different algorithms, such as $\|\nabla^{\mathrm{tan}} f(x)\|^2$, $|\sigma f(x)|^2$, $\|H_f^{\mathrm{tan}}(x_i)\|_F^2$, and $|\kappa f(x_i)|^2$. It is worth noting that other local manifold learning algorithms can also be reformulated and incorporated into our unified framework.

We employ the truncation function from differential geometry to connect the local objects with their corresponding global functional, so that we can obtain a consistent alignment of the local objects and discover a single global coordinate system of lower dimensionality. The truncation function is a crucial tool in differential geometry for building relationships between the global and local properties of a manifold. Assume that $U$ and $V$ are two nonempty subsets of a smooth manifold $M$, where $\bar{V}$ is compact and $\bar{V} \subset U$ ($\bar{V}$ is the closure of $V$). Accordingly, the truncation function [15] can be defined as a smooth function $s: M \to \mathbb{R}$ such that:

where $N_i = \{i_1, \ldots, i_k\}$ denotes the set of indices of the $k$ nearest neighbors of data point $x_i$. Let $f = [f(x_1), \ldots, f(x_N)] \in \mathbb{R}^N$ be a function defined on the whole data set sampled from the global manifold. Thus, the local mapping $f^i = [f(x_{i_1}), \ldots, f(x_{i_k})] \in \mathbb{R}^k$ can be expressed as $f^i = (S^i)^T f$. With the help of the selection matrix, we can discretely approximate the global functional $G(f)$ as follows:

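where $\{L_j^i\}_{j=1}^M$ are the local matrices such as $(G^i)^T G^i$, $(W^i)^T W^i$, $(H^i)^T H^i$, and $(W^i)^T W^i$, which are defined in Eqs. (9), (14), (21), and (28), and $P_j = \frac{1}{N} \sum_{i=1}^N S^i L_j^i (S^i)^T$ is the alignment matrix of the $j$-th local manifold learning method. The global embedding coordinates $Y = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{d \times N}$ can be obtained by minimizing the functional $G(f)$. Let $y = f = [f(x_1), \ldots, f(x_N)]$ be a row vector of $Y$. It is not hard to show that the global embedding coordinates and the nonnegative weights $c = [c_1, \ldots, c_M]$ can be obtained by minimizing the following objective function:

$$\min_{Y, c} \sum_{j=1}^M c_j^r \, \mathrm{Tr}(Y P_j Y^T) \quad \text{s.t.} \quad Y Y^T = I, \quad \sum_{j=1}^M c_j = 1, \quad c_j \ge 0, \tag{33}$$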

where the power parameter $r > 1$ prevents the degenerate solution in which $c_j = 1$ for the method with the smallest $\mathrm{Tr}(Y P_j Y^T)$ and $c_l = 0$ for all $l \neq j$. Such a solution would select a single method, whereas our aim is to utilize the complementary geometric information from different manifold learning methods.
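The alignment matrix $P_j$ can be assembled without forming the selection matrices explicitly, because multiplying by $S^i$ and $(S^i)^T$ simply scatters each local matrix into the rows and columns indexed by the neighborhood. The following sketch assumes each neighborhood is given as an array of distinct sample indices and each local matrix is square with matching size; the function name `alignment_matrix` is hypothetical.

```python
import numpy as np

def alignment_matrix(N, neighborhoods, local_matrices):
    """Assemble P = (1/N) * sum_i S^i L^i (S^i)^T for one local method.

    neighborhoods[i] holds the (distinct) sample indices of patch i and
    local_matrices[i] is the corresponding square local matrix."""
    P = np.zeros((N, N))
    for idx, L in zip(neighborhoods, local_matrices):
        # Scatter-add the local matrix into the rows/columns of the patch.
        P[np.ix_(idx, idx)] += L
    return P / N
```

Using index-based scatter-add keeps the cost proportional to the total size of the local matrices rather than to $N^2$ per patch.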

We propose to solve the objective function [Eq. (33)] with the alternating optimization method [16], which iteratively updates $Y$ and $c$ in an alternating fashion. First, we fix $c$ to update $Y$. The optimization problem in Eq. (33) is then equivalent to:

$$\min_{Y} \mathrm{Tr}(Y P Y^T) \quad \text{s.t.} \quad Y Y^T = I, \tag{34}$$

where $P = \sum_{j=1}^M c_j^r P_j$. When $c$ is fixed, we can solve the optimization problem [Eq. (34)] and obtain the globally optimal $Y$ from the eigenvectors of the matrix $P$ associated with its second to $(d+1)$-st smallest eigenvalues. Second, we fix $Y$ to update $c$. With $Y$ fixed, we can minimize the objective function [Eq. (33)] analytically by using a Lagrange multiplier to enforce the constraint $\sum_{j=1}^M c_j = 1$. The globally optimal $c$ can be obtained as:

$$c_j = \frac{\left( 1 / \mathrm{Tr}(Y P_j Y^T) \right)^{1/(r-1)}}{\sum_{l=1}^M \left( 1 / \mathrm{Tr}(Y P_l Y^T) \right)^{1/(r-1)}}, \quad j = 1, \ldots, M.$$
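The two alternating steps can be sketched as a short loop. This is a minimal illustration under stated assumptions rather than the authors' implementation: the names `update_weights` and `flm_alternating`, the uniform initialization of $c$, the fixed iteration count, the symmetrization of $P$ before the eigendecomposition, and the small floor on the traces are all choices made for the example. The weight update is the stationary point of $\sum_j c_j^r t_j$ under $\sum_j c_j = 1$, where $t_j = \mathrm{Tr}(Y P_j Y^T)$.

```python
import numpy as np

def update_weights(t, r):
    """Closed-form weights c_j proportional to (1 / t_j)^(1 / (r - 1))."""
    w = (1.0 / t) ** (1.0 / (r - 1.0))
    return w / w.sum()

def flm_alternating(P_list, d, r=2.0, n_iter=20):
    """Alternate between the eigenvector step for Y and the closed-form
    step for c. P_list holds the N x N alignment matrices P_j."""
    M = len(P_list)
    c = np.full(M, 1.0 / M)           # assumed uniform initialization
    for _ in range(n_iter):
        # Step 1: fix c, form P = sum_j c_j^r P_j, and take the eigenvectors
        # of the 2nd to (d+1)-st smallest eigenvalues as the rows of Y.
        P = sum(cj ** r * Pj for cj, Pj in zip(c, P_list))
        _, vecs = np.linalg.eigh((P + P.T) / 2.0)   # ascending eigenvalues
        Y = vecs[:, 1:d + 1].T                      # d x N embedding
        # Step 2: fix Y and update c from the per-method traces.
        t = np.array([np.trace(Y @ Pj @ Y.T) for Pj in P_list])
        c = update_weights(np.maximum(t, 1e-12), r)
    return Y, c
```

For example, with traces $t = (1, 4)$ and $r = 2$ the update gives $c = (0.8, 0.2)$: the method with the smaller trace receives the larger weight, but the other method is not discarded.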

5. Experimental results

In this section, we experiment on both synthetic and real-world data sets to evaluate the performance of our method, named fusion of local manifolds (FLM). We run LEM, LLE, HLLE, LTSA, and FLM on these data sets to obtain both visualizations and quantitative evaluations. We utilize the global smoothness and co-directional consistence (GSCD) criterion [17] to quantitatively compare the embedding quality of the different algorithms: the smaller the value of GSCD, the higher the global smoothness and the better the co-directional consistence. There are two adjustable parameters in FLM: the tuning parameter r and the number of nearest neighbors k. FLM works well when the values of r and k are neither too small nor too large. For r, the reason is that only one local method is chosen when r is too small, while the relative weights of the different methods tend to be close to each other when r is too large. As a general recommendation, we suggest using r∈[2,6] and k∈[0.7⌈log(N)⌉,2⌈log(N)⌉].

5.1. Synthetic data sets

We first apply FLM to synthetic data sets that have been commonly used by other researchers: S-Curve, Swiss Hole, Punctured Sphere, and Toroidal Helix. These data sets can be characterized as general, non-convex, nonuniform, and noisy, respectively. Each data set has a total of 1000 sample points, and the number of nearest neighbors is fixed to k=10 for all the algorithms. For the S-Curve and Swiss Hole, we empirically set r=2, and for the Punctured Sphere and Toroidal Helix data sets, we set r=3. Figures 2–5 show the embedding results of the above algorithms on the four synthetic data sets. The name of each manifold learning algorithm and the corresponding GSCD result are shown in the title of each subplot. We can evaluate the performance of these methods by comparing the coloring of the data points, the smoothness, and the shape of the projected coordinates with those of the original manifolds. Figures 2–5 reveal the following observations.

Figure 2.

Embeddings of the synthetic manifold S-curve. (a) Sample data. The titles of subplots (b)-(f) indicate the abbreviation of the manifold learning algorithm and the GSCD result.

Figure 3.

Embeddings of the synthetic manifold Swiss Hole. (a) Sample data. The titles of subplots (b)-(f) indicate the abbreviation of the manifold learning algorithm and the GSCD result.

Figure 4.

Embeddings of the synthetic manifold Punctured Sphere. (a) Sample data. The titles of subplots (b)-(f) indicate the abbreviation of the manifold learning algorithm and the GSCD result.

Figure 5.

Embeddings of the synthetic manifold Toroidal Helix. (a) Sample data. The titles of subplots (b)-(f) indicate the abbreviation of the manifold learning algorithm and the GSCD result.

On some particular data sets, the traditional local manifold learning methods perform well. For example, LEM works well on the Toroidal Helix; LLE works well on the Punctured Sphere; HLLE works well on the S‐Curve and Swiss Hole; and LTSA performs well on the S‐Curve, Swiss Hole, and Punctured Sphere.

Overall, FLM performs best on all four data sets.

This is because each traditional local manifold learning method learns only partial geometric information about the underlying manifold, while FLM respects the complementary geometric information learned by the different manifold learning algorithms.

5.2. Real‐world data set

We next conduct experiments on the isometric feature mapping face (ISOFACE) data set [1], which contains 698 images of a 3-D human head. The ISOFACE data set was collected under different poses and lighting directions. The resolution of each image is 64×64. The intrinsic degrees of freedom are the horizontal rotation, vertical rotation, and lighting direction. The 2-D embedding results of the different algorithms and the corresponding GSCD results are shown in Figure 6. In each embedding, we randomly mark about 8% of the points with red circles and attach their corresponding training images. In the experiment, we fix the number of nearest neighbors to k=12 for all the algorithms and empirically set r in FLM to 4. Figure 6 reveals the following observations.

Figure 6.

Embeddings of the ISOFACE data set. Subfigure (a) shows nine sample images, and subfigures (b)-(f) show the embedding results of different manifold learning algorithms. The title of each subplot indicates the abbreviation of the manifold learning algorithm and the GSCD result.

As we can observe from Figure 6b and c, the embedding results of LEM and LLE show that the orientations of the faces change smoothly from left to right along the horizontal direction and from down to up along the vertical direction. However, as we can see at the right-hand side of Figure 6b and c, the embedding results of both LEM and LLE are severely compressed, and the changes along the vertical direction are hard to discern.

As we can observe from Figure 6d and e, the horizontal rotation and the variations in the brightness of the faces are well revealed by the embedding results of HLLE and LTSA.

As we can observe from Figure 6f, orientations of the faces change smoothly from left to right along the horizontal direction, while the orientations of the faces change from down to up, and the light of the faces varies from bright to dark simultaneously along the vertical direction. These results illustrate that our FLM method successfully discovers the underlying manifold structure of the data set.

Our FLM performs the best on the ISOFACE data set, since our method makes full use of the complementary geometric information learned from different manifold learning methods. The corresponding GSCD results further verify the above visualization results in a quantitative way.

6. Conclusions

In this chapter, we introduce a method, named FLM, which provides a systematic framework to estimate the local objects and align them to reveal a single global low-dimensional coordinate space. Within the framework, we can easily and effectively fuse the geometric information learned by different local methods to better discover the underlying manifold structure. Experimental results on both synthetic and real-world data sets show that the proposed method leads to satisfactory results.

Acknowledgements

This work was supported by the Fundamental Research Funds for the Central Universities of China, Natural Science Fund of Heilongjiang Province of China, and Natural Science Foundation of China under Grant No. HEUCF160415, F2015033, and 61573114.