
Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering

Yoshua Bengio, Jean-François Paiement, Pascal Vincent, Olivier Delalleau, Nicolas Le Roux and Marie Ouimet
Département d'Informatique et Recherche Opérationnelle, Université de Montréal, Montréal, Québec, Canada, H3C 3J7

Abstract

Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.

1 Introduction

Many unsupervised learning algorithms have been recently proposed, all using an eigendecomposition for obtaining a lower-dimensional embedding of data lying on a non-linear manifold: Local Linear Embedding (LLE) (Roweis and Saul, 2000), Isomap (Tenenbaum, de Silva and Langford, 2000) and Laplacian Eigenmaps (Belkin and Niyogi, 2003). There are also many variants of Spectral Clustering (Weiss, 1999; Ng, Jordan and Weiss, 2002), in which such an embedding is an intermediate step before obtaining a clustering of the data that can capture flat, elongated and even curved clusters. The two tasks (manifold learning and clustering) are linked because the clusters found by spectral clustering can be arbitrary curved manifolds (as long as there is enough data to locally capture their curvature).

2 Common Framework

In this paper we consider five types of unsupervised learning algorithms that can be cast in the same framework, based on the computation of an embedding for the training points obtained from the principal eigenvectors of a symmetric matrix.

Algorithm 1
1. Start from a data set D = {x_1, ..., x_n} with n points in R^d. Construct an n×n neighborhood or similarity matrix M. Let us denote K_D(·,·) (or K for shorthand) the data-dependent function which produces M by M_ij = K_D(x_i, x_j).
2. Optionally transform M, yielding a normalized matrix M̃. Equivalently, this corresponds to generating M̃ from a K̃_D by M̃_ij = K̃_D(x_i, x_j).

3. Compute the m largest positive eigenvalues λ_k and eigenvectors v_k of M̃.
4. The embedding of each example x_i is the vector y_i with y_ik the i-th element of the k-th principal eigenvector v_k of M̃. Alternatively (MDS and Isomap), the embedding is e_i, with e_ik = √λ_k y_ik. If the first m eigenvalues are positive, then e_i · e_j is the best approximation of M̃_ij using only m coordinates, in the squared error sense.

In the following, we consider the specializations of Algorithm 1 for different unsupervised learning algorithms. Let S_i be the i-th row sum of the affinity matrix M:

    S_i = Σ_j M_ij.   (1)

We say that two points (a, b) are k-nearest-neighbors of each other if a is among the k nearest neighbors of b in D ∪ {a} or vice-versa. We denote by x_ij the j-th coordinate of the vector x_i.

2.1 Multi-Dimensional Scaling

Multi-Dimensional Scaling (MDS) starts from a notion of distance or affinity K that is computed between each pair of training examples. We consider here metric MDS (Cox and Cox, 1994). For the normalization step 2 in Algorithm 1, these distances are converted to equivalent dot products using the double-centering formula:

    M̃_ij = -1/2 ( M_ij - (1/n) S_i - (1/n) S_j + (1/n²) Σ_k S_k ).   (2)

The embedding e_ik of example x_i is given by √λ_k v_ki.

2.2 Spectral Clustering

Spectral clustering (Weiss, 1999) can yield impressively good results where traditional clustering looking for round blobs in the data, such as K-means, would fail miserably. It is based on two main steps: first embedding the data points in a space in which clusters are more obvious (using the eigenvectors of a Gram matrix), and then applying a classical clustering algorithm such as K-means, e.g. as in (Ng, Jordan and Weiss, 2002). The affinity matrix M is formed using a kernel such as the Gaussian kernel. Several normalization steps have been proposed. Among the most successful ones, as advocated in (Weiss, 1999; Ng, Jordan and Weiss, 2002), is the following:

    M̃_ij = M_ij / √(S_i S_j).   (3)

To obtain m clusters, the first m principal eigenvectors of M̃ are computed and K-means is applied on the unit-norm coordinates, obtained from the embedding y_ik = v_ki.
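As a concrete illustration of steps 2-4 of Algorithm 1 in the MDS case, the following sketch (in Python with NumPy; the function name and toy data are our own illustrations, not from the paper) applies the double-centering transformation of eq. (2) to a matrix of squared pairwise distances and reads the embedding off the top eigenvectors:

```python
import numpy as np

def mds_embedding(D2, m=2):
    """Metric MDS sketch: turn a matrix of squared pairwise distances D2
    into an m-dimensional embedding via double-centering (eq. 2) and the
    m largest positive eigenvalues/eigenvectors."""
    n = D2.shape[0]
    S = D2.sum(axis=1)                      # row sums S_i (eq. 1)
    total = S.sum()
    # Double-centering: Mt_ij = -1/2 (D2_ij - S_i/n - S_j/n + total/n^2)
    Mt = -0.5 * (D2 - S[:, None] / n - S[None, :] / n + total / n**2)
    eigvals, eigvecs = np.linalg.eigh(Mt)   # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:m]   # m largest eigenvalues first
    lam, v = eigvals[order], eigvecs[:, order]
    # MDS/Isomap-style coordinates: e_ik = sqrt(lambda_k) * v_ki
    return v * np.sqrt(np.maximum(lam, 0.0))

# Toy check: four collinear points in R^1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
D2 = (X - X.T) ** 2
E = mds_embedding(D2, m=1)
```

On Euclidean input the recovered coordinates reproduce the original pairwise distances, which is the sense in which e_i · e_j best approximates M̃_ij.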
2.3 Laplacian Eigenmaps

Laplacian Eigenmaps is a recently proposed dimensionality reduction procedure (Belkin and Niyogi, 2003) that has been proposed for semi-supervised learning. The authors use an approximation of the Laplacian operator such as the Gaussian kernel or the matrix whose element (i, j) is 1 if x_i and x_j are k-nearest-neighbors and 0 otherwise. Instead of solving an ordinary eigenproblem, the following generalized eigenproblem is solved:

    (S - M) v_j = λ_j S v_j   (4)

with eigenvalues λ_j, eigenvectors v_j and S the diagonal matrix with entries S_i given by eq. (1). The smallest eigenvalue is left out and the eigenvectors corresponding to the other small eigenvalues are used for the embedding. This is the same embedding that is computed with the spectral clustering algorithm from (Shi and Malik, 1997). As noted in (Weiss, 1999) (Normalization Lemma 1), an equivalent result (up to a componentwise scaling of the embedding) can be obtained by considering the principal eigenvectors of the normalized matrix defined in eq. (3).
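The generalized eigenproblem of eq. (4) can be solved directly with a standard routine. A minimal sketch, assuming a Gaussian affinity with bandwidth `sigma` (the helper name and parameter choices are illustrative assumptions, not from the paper):

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmaps(X, sigma=1.0, m=2):
    """Sketch of Laplacian Eigenmaps: build a Gaussian affinity matrix M,
    form the diagonal matrix S of row sums (eq. 1), and solve the
    generalized eigenproblem (S - M) v = lambda S v (eq. 4)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    M = np.exp(-sq / (2 * sigma**2))      # Gaussian kernel affinities
    S = np.diag(M.sum(axis=1))            # diagonal degree matrix
    L = S - M                             # unnormalized graph Laplacian
    # Symmetric-definite generalized eigenproblem, eigenvalues ascending
    eigvals, eigvecs = eigh(L, S)
    # Leave out the trivial smallest eigenvector (constant, eigenvalue ~0);
    # the next m eigenvectors give the embedding coordinates
    return eigvecs[:, 1:m + 1]
```

`scipy.linalg.eigh(A, B)` handles the B-weighted problem directly; S is positive definite here because Gaussian affinities give strictly positive row sums.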

2.4 Isomap

Isomap (Tenenbaum, de Silva and Langford, 2000) generalizes MDS to non-linear manifolds. It is based on replacing the Euclidean distance by an approximation of the geodesic distance on the manifold. We define the geodesic distance with respect to a data set D, a distance d(u, v) and a neighborhood k as follows:

    D(a, b) = min_p Σ_i d(p_i, p_{i+1})   (5)

where p is a sequence of points of length l ≥ 2 with p_1 = a, p_l = b, p_i ∈ D for i ∈ {2, ..., l-1} and (p_i, p_{i+1}) are k-nearest-neighbors. The length l is free in the minimization. The Isomap algorithm obtains the normalized matrix M̃ from which the embedding is derived by transforming the raw pairwise distances matrix as follows: first compute the matrix M_ij = D²(x_i, x_j) of squared geodesic distances with respect to the data D, then apply to this matrix the distance-to-dot-product transformation (eq. (2)), as for MDS. As in MDS, the embedding is e_ik = √λ_k v_ki rather than y_ik = v_ki.

2.5 LLE

The Local Linear Embedding (LLE) algorithm (Roweis and Saul, 2000) looks for an embedding that preserves the local geometry in the neighborhood of each data point. First, a sparse matrix of local predictive weights W_ij is computed, such that Σ_j W_ij = 1, W_ij = 0 if x_j is not a k-nearest-neighbor of x_i, and ||Σ_j W_ij x_j - x_i||² is minimized. Then the matrix

    M = (I - W)ᵀ (I - W)   (6)

is formed. The embedding is obtained from the lowest eigenvectors of M, except for the eigenvector with the smallest eigenvalue, which is uninteresting because it is (1, 1, ..., 1), with eigenvalue 0. To fit Algorithm 1, note that the lowest eigenvectors of M are the largest eigenvectors of M̃_μ = μI - M (the use of μ > 0 will be discussed in section 4.4). The embedding is given by y_ik = v_ki, and is constant with respect to μ.

3 From Eigenvectors to Eigenfunctions

To obtain an embedding for a new data point, we propose to use the Nyström formula (eq. 9) (Baker, 1977), which has been used successfully to speed up kernel methods computations by focussing the heavier computations (the eigendecomposition) on a subset of examples.
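The two LLE steps just described (fit sum-to-one reconstruction weights over each point's neighbors, then take the lowest eigenvectors of (I - W)ᵀ(I - W)) can be sketched as follows. The regularization term is a common stabilizer for the singular local covariance and is our assumption, not part of the paper's description:

```python
import numpy as np

def lle_weights(X, k=5, reg=1e-3):
    """LLE step 1: for each x_i, find weights over its k nearest
    neighbors that sum to 1 and minimize ||sum_j W_ij x_j - x_i||^2."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        d = ((X - X[i]) ** 2).sum(axis=1)
        nbrs = np.argsort(d)[1:k + 1]        # k nearest, excluding x_i
        Z = X[nbrs] - X[i]                   # shift neighborhood to origin
        C = Z @ Z.T                          # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)   # regularize (assumption)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()             # enforce sum-to-one
    return W

def lle_embedding(W, m=2):
    """LLE step 2: lowest eigenvectors of M = (I - W)^T (I - W) (eq. 6),
    skipping the constant eigenvector with eigenvalue 0."""
    n = W.shape[0]
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    eigvals, eigvecs = np.linalg.eigh(M)     # ascending eigenvalues
    return eigvecs[:, 1:m + 1]
```

The closed-form weight solve comes from minimizing the reconstruction error subject to the sum-to-one constraint via Lagrange multipliers.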
The use of this formula can be justified by considering the convergence of eigenvectors and eigenvalues, as the number of examples increases (Baker, 1977; Williams and Seeger, 2000; Koltchinskii and Giné, 2000; Shawe-Taylor and Williams, 2003). Intuitively, the extensions to obtain the embedding for a new example require specifying a new column of the Gram matrix M, through a training-set dependent kernel function K_D, in which one of the arguments may be required to be in the training set. If we start from a data set D, obtain an embedding for its elements, and add more and more data, the embedding for the points in D converges (for eigenvalues that are unique). (Shawe-Taylor and Williams, 2003) give bounds on the convergence error (in the case of kernel PCA). In the limit, we expect each eigenvector to converge to an eigenfunction for the linear operator defined below, in the sense that the i-th element of the k-th eigenvector converges to the application of the k-th eigenfunction to x_i (up to a normalization factor).

Consider a Hilbert space H_p of functions with inner product ⟨f, g⟩_p = ∫ f(x) g(x) p(x) dx, with a density function p(x). Associate with kernel K a linear operator K_p in H_p:

    (K_p f)(x) = ∫ K(x, y) f(y) p(y) dy.   (7)

We don't know the true density p but we can approximate the above inner product and linear operator (and its eigenfunctions) using the empirical distribution p̂. An empirical Hilbert space H_p̂ is thus defined using p̂ instead of p. Note that the proposition below can be
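The Nyström formula itself (eq. 9) lies outside this excerpt. As a hedged sketch, the standard form extends the k-th eigenvector to a new point x as y_k(x) = (1/λ_k) Σ_i v_ik K̃(x, x_i); the function name and the linear-kernel consistency check below are our own illustrations:

```python
import numpy as np

def nystrom_extend(K_new, eigvals, eigvecs):
    """Standard Nystrom out-of-sample extension sketch: given the kernel
    values K_new[i] = K(x, x_i) between a new point x and the n training
    points, and eigenpairs (lambda_k, v_k) of the n x n training Gram
    matrix, compute y_k(x) = (1/lambda_k) * sum_i v_ik * K_new[i].
    Applied to a training point, this reproduces its embedding."""
    return (K_new @ eigvecs) / eigvals

# Consistency check with a linear kernel: G v_k = lambda_k v_k implies
# extending training point 0 must give back row 0 of the eigenvectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
G = X @ X.T
eigvals, eigvecs = np.linalg.eigh(G)
pos = eigvals > 1e-8                  # keep only positive eigenvalues
lam, v = eigvals[pos], eigvecs[:, pos]
y0 = nystrom_extend(G[0], lam, v)     # treat x_0 as a "new" point
```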

Figure 1: Training set variability minus out-of-sample error, with respect to the proportion of training samples substituted. Top left: MDS. Top right: spectral clustering or Laplacian eigenmaps. Bottom left: Isomap. Bottom right: LLE. Error bars are 95% confidence intervals.

1. We choose F ⊂ D with m = |F| samples. The remaining n - m samples in D \ F are split into two equal-size subsets R_1 and R_2. We train (obtain the eigenvectors) over F ∪ R_1 and F ∪ R_2. When eigenvalues are close, the estimated eigenvectors are unstable and can rotate in the subspace they span. Thus we estimate an affine alignment between the two embeddings using the points in F, and we calculate the Euclidean distance between the aligned embeddings obtained for each s_i ∈ F.

2. For each sample s_i ∈ F, we also train over {F ∪ R_1} \ {s_i}. We apply the extension to out-of-sample points to find the predicted embedding of s_i and calculate the Euclidean distance between this embedding and the one obtained when training with F ∪ R_1, i.e. with s_i in the training set.

3. We calculate the mean difference (and its standard error, shown in the figure) between the distance obtained in step 1 and the one obtained in step 2 for each sample s_i ∈ F, and we repeat this experiment for various sizes of F.

The results obtained for MDS, Isomap, spectral clustering and LLE are shown in figure 1 for different values of m. Experiments are done over a database of 698 synthetic face images described by 4096 components that is available at . Qualitatively similar results have been obtained over other databases such as Ionosphere ( mlearn/mlsummary.html) and swissroll ( rowes/lle/). Each algorithm generates a two-dimensional embedding of the images, following the experiments reported for Isomap. The number of neighbors is 10 for Isomap and LLE, and a Gaussian kernel with a standard deviation of 0.01 is used for spectral clustering / Laplacian eigenmaps. 95% confidence
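The affine alignment in step 1 can be estimated by ordinary least squares on the shared points. A minimal sketch (the function name and the rotation-plus-shift test are illustrative assumptions, not from the paper):

```python
import numpy as np

def affine_align(A, B):
    """Fit an affine map y -> W y + b by least squares so that embedding
    A of the shared points best matches embedding B, and return the
    aligned version of A."""
    n = A.shape[0]
    A1 = np.hstack([A, np.ones((n, 1))])      # append bias column
    # Solve min ||A1 @ P - B||^2 for P stacking W and b
    P, *_ = np.linalg.lstsq(A1, B, rcond=None)
    return A1 @ P

# Toy check: B is an exactly rotated and shifted copy of A, so the
# least-squares alignment should recover B exactly.
rng = np.random.default_rng(0)
A = rng.normal(size=(15, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = A @ R.T + np.array([1.0, -2.0])
A_aligned = affine_align(A, B)
```

An affine (rather than purely orthogonal) fit absorbs the rotation, scaling, and sign ambiguities that arise when nearby eigenvalues let eigenvectors rotate within their span.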
