54 Pages·2008·16.08 MB·English

Nonlinear Dimensionality Reduction (a.k.a. Manifold Learning)
David Capel
346B IST Bldg
[email protected]

What is “nonlinear dimensionality reduction?”
[Figure: High-dimensional data -> NDR (Manifold Learning) -> Low-dimensional embedding]
• We often suspect that high-dim data may actually lie on or near a low-dim manifold (often much lower!)
• It would be useful if we could reparametrize the data in terms of this manifold, yielding a low-dim embedding
• BUT - we typically don’t know the form of this manifold

Background:Linear subspaces IsoMap Locally Linear Embedding

Non-monotonicity
Rank ordering of Euclidean distances is NOT preserved in “manifold learning”.
[Figure: points A, B, C before and after embedding; d(A,C) < d(A,B) in one panel and d(A,C) > d(A,B) in the other]

Why might this be useful?
• The variation observed in high-dimensional signals often has a much lower-dimensional explanation
[Figure: 64x64 pixel images parametrized by just 3 variables (pose and lighting direction)]
• Discovering these modes of variation helps us understand the underlying structure of the data and the process that generated it
- Visualization of high-dimensional data
- Machine learning and pattern recognition

[Paper page shown on slide: REPORTS, Science, Vol. 290, 22 December 2000, p. 2322, www.sciencemag.org]

Table 1. The Isomap algorithm takes as input the distances $d_X(i,j)$ between all pairs $i,j$ from $N$ data points in the high-dimensional input space $X$, measured either in the standard Euclidean metric (as in Fig. 1A) or in some domain-specific metric (as in Fig. 1B). The algorithm outputs coordinate vectors $y_i$ in a $d$-dimensional Euclidean space $Y$ that (according to Eq. 1) best represent the intrinsic geometry of the data. The only free parameter ($\epsilon$ or $K$) appears in Step 1.

Step 1. Construct neighborhood graph. Define the graph $G$ over all data points by connecting points $i$ and $j$ if [as measured by $d_X(i,j)$] they are closer than $\epsilon$ ($\epsilon$-Isomap), or if $i$ is one of the $K$ nearest neighbors of $j$ ($K$-Isomap). Set edge lengths equal to $d_X(i,j)$.

Step 2. Compute shortest paths. Initialize $d_G(i,j) = d_X(i,j)$ if $i,j$ are linked by an edge; $d_G(i,j) = \infty$ otherwise. Then for each value of $k = 1, 2, \ldots, N$ in turn, replace all entries $d_G(i,j)$ by $\min\{d_G(i,j),\, d_G(i,k) + d_G(k,j)\}$. The matrix of final values $D_G = \{d_G(i,j)\}$ will contain the shortest path distances between all pairs of points in $G$ (16, 19).

Step 3. Construct d-dimensional embedding. Let $\lambda_p$ be the $p$-th eigenvalue (in decreasing order) of the matrix $\tau(D_G)$ (17), and $v_p^i$ be the $i$-th component of the $p$-th eigenvector. Then set the $p$-th component of the $d$-dimensional coordinate vector $y_i$ equal to $\sqrt{\lambda_p}\, v_p^i$.

... manifolds, a guarantee of asymptotic convergence to the true structure; and the ability to discover manifolds of arbitrary dimensionality, rather than requiring a fixed d initialized from the beginning or computational resources that increase exponentially in d.

Here we have demonstrated Isomap’s performance on data sets chosen for their visually compelling structures, but the technique may be applied wherever nonlinear geometry complicates the use of PCA or MDS. Isomap complements, and may be combined with, linear extensions of PCA based on higher order statistics, such as independent component analysis (31, 32). It may also lead to a better understanding of how the brain comes to represent the dynamic appearance of objects, where psychophysical studies of apparent motion (33, 34) suggest a central role for geodesic transformations on nonlinear manifolds (35) much like those studied here.

Fig. 4. Interpolations along straight lines in the Isomap coordinate space (analogous to the blue line in Fig. 3C) implement perceptually natural but highly nonlinear “morphs” of the corresponding high-dimensional observations (43) by transforming them approximately along geodesic paths (analogous to the solid curve in Fig. 3A). (A) Interpolations in a three-dimensional embedding of face images (Fig. 1A). (B) Interpolations in a four-dimensional embedding of hand images (20) appear as natural hand movements when viewed in quick succession, even though no such motions occurred in the observed data. (C) Interpolations in a six-dimensional embedding of handwritten “2”s (Fig. 1B) preserve continuity not only in the visual features of loop and arch articulation, but also in the implied pen trajectories, which are the true degrees of freedom underlying those appearances.

References and Notes
1. M. P. Young, S. Yamane, Science 256, 1327 (1992).
2. R. N. Shepard, Science 210, 390 (1980).
3. M. Turk, A. Pentland, J. Cogn. Neurosci. 3, 71 (1991).
4. H. Murase, S. K. Nayar, Int. J. Comp. Vision 14, 5 (1995).
5. J. W. McClurkin, L. M. Optican, B. J. Richmond, T. J. Gawne, Science 253, 675 (1991).
6. J. L. Elman, D. Zipser, J. Acoust. Soc. Am. 83, 1615 (1988).
7. W. Klein, R. Plomp, L. C. W. Pols, J. Acoust. Soc. Am. 48, 999 (1970).
8. E. Bizzi, F. A. Mussa-Ivaldi, S. Giszter, Science 253, 287 (1991).
9. T. D. Sanger, Adv. Neural Info. Proc. Syst. 7, 1023 (1995).
10. J. W. Hurrell, Science 269, 676 (1995).
11. C. A. L. Bailer-Jones, M. Irwin, T. von Hippel, Mon. Not. R. Astron. Soc. 298, 361 (1997).
12. P. Menozzi, A. Piazza, L. Cavalli-Sforza, Science 201, 786 (1978).
13. K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis (Academic Press, London, 1979).
14. A. H. Monahan, J. Clim., in press.
15. The scale-invariant $K$ parameter is typically easier to set than $\epsilon$, but may yield misleading results when the local dimensionality varies across the data set. When available, additional constraints such as the temporal ordering of observations may also help to determine neighbors. In earlier work (36) we explored a more complex method (37), which required an order of magnitude more data and did not support the theoretical performance guarantees we provide here for $\epsilon$- and $K$-Isomap.
16. This procedure, known as Floyd’s algorithm, requires $O(N^3)$ operations. More efficient algorithms exploiting the sparse structure of the neighborhood graph can be found in (38).
17. The operator $\tau$ is defined by $\tau(D) = -HSH/2$, where $S$ is the matrix of squared distances $\{S_{ij} = D_{ij}^2\}$, and $H$ is the “centering matrix” $\{H_{ij} = \delta_{ij} - 1/N\}$ (13).
18. Our proof works by showing that for a sufficiently high density ($\alpha$) of data points, we can always choose a neighborhood size ($\epsilon$ or $K$) large enough that the graph will (with high probability) have a path not much longer than the true geodesic, but small enough to prevent edges that “short circuit” the true geometry of the manifold. More precisely, given arbitrarily small values of $\lambda_1$, $\lambda_2$, and $\mu$, we can guarantee that with probability at least $1 - \mu$, estimates of the form
$$(1 - \lambda_1)\, d_M(i,j) \le d_G(i,j) \le (1 + \lambda_2)\, d_M(i,j)$$
will hold uniformly over all pairs of data points $i,j$. For $\epsilon$-Isomap, we require
$$\epsilon \le (2/\pi)\, r_0 \sqrt{24\lambda_1}, \qquad \epsilon < s_0,$$
$$\alpha > \left[\log\!\left(V / \mu\,\eta_d (\lambda_2\epsilon/16)^d\right)\right] / \eta_d (\lambda_2\epsilon/8)^d$$
where $r_0$ is the minimal radius of curvature of the manifold $M$ as embedded in the input space $X$, $s_0$ is the minimal branch separation of $M$ in $X$, $V$ is the ($d$-dimensional) volume of $M$, and (ignoring boundary effects) $\eta_d$ is the volume of the unit ball in Euclidean $d$-space. For $K$-Isomap, we let $\epsilon$ be as above and fix the ratio $(K+1)/\alpha = \eta_d (\epsilon/2)^d / 2$. We then require
$$e^{-(K+1)/4} \le \mu\,\eta_d (\epsilon/4)^d / 4V,$$
$$(e/4)^{(K+1)/2} \le \mu\,\eta_d (\epsilon/8)^d / 16V,$$
$$\alpha > \left[4 \log\!\left(8V / \mu\,\eta_d (\lambda_2\epsilon/32\pi)^d\right)\right] / \eta_d (\lambda_2\epsilon/16\pi)^d$$
The exact content of these conditions (but not their general form) depends on the particular technical assumptions we adopt. For details and extensions to nonuniform densities, intrinsic curvature, and boundary effects, see http://isomap.stanford.edu.
19. In practice, for finite data sets, $d_G(i,j)$ may fail to approximate $d_M(i,j)$ for a small fraction of points that are disconnected from the giant component of the neighborhood graph $G$. These outliers are easily detected as having infinite graph distances from the majority of other points and can be deleted from further analysis.
20. The Isomap embedding of the hand images is available at Science Online at www.sciencemag.org/cgi/content/full/290/5500/2319/DC1. For additional material and computer code, see http://isomap.stanford.edu.
21. R. Basri, D. Roth, D. Jacobs, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (1998), pp. 414–420.
22. C. Bregler, S. M. Omohundro, Adv. Neural Info. Proc. Syst. 7, 973 (1995).
23. G. E. Hinton, M. Revow, P. Dayan, Adv. Neural Info. Proc. Syst. 7, 1015 (1995).
24. R. Durbin, D. Willshaw, Nature 326, 689 (1987).
25. T. Kohonen, Self-Organisation and Associative Memory (Springer-Verlag, Berlin, ed. 2, 1988), pp. 119–157.
26. T. Hastie, W. Stuetzle, J. Am. Stat. Assoc. 84, 502 (1989).
27. M. A. Kramer, AIChE J. 37, 233 (1991).
28. D. DeMers, G. Cottrell, Adv. Neural Info. Proc. Syst. 5, 580 (1993).
29. R. Hecht-Nielsen, Science 269, 1860 (1995).
30. C. M. Bishop, M. Svensén, C. K. I. Williams, Neural Comp. 10, 215 (1998).
31. P. Comon, Signal Proc. 36, 287 (1994).
32. A. J. Bell, T. J. Sejnowski, Neural Comp. 7, 1129 (1995).
33. R. N. Shepard, S. A. Judd, Science 191, 952 (1976).
34. M. Shiffrar, J. J. Freyd, Psychol. Science 1, 257 (1990).

Okay, so how do we learn the embedding?
• Given high-dim data sampled from an unknown low-dim manifold, how can we automatically recover a good embedding?
A Global Geometric Framework for Nonlinear Dimensionality Reduction
Tenenbaum, de Silva and Langford
Science (Vol. 290, Dec 2000, 2319-2323)
Nonlinear Dimensionality Reduction by Locally Linear Embedding
Roweis and Saul
Science (Vol. 290, Dec 2000, 2323-2326)

Outline
• Linear subspace embedding
- Principal Components Analysis (PCA)
- Metric Multidimensional Scaling (MDS)
• Non-linear manifold learning
- Isomap (Tenenbaum et al.)
- Locally Linear Embedding (Roweis et al.)
• Some examples

An excellent tutorial ...
Spectral Methods for Dimensionality Reduction
Prof. Lawrence Saul
Dept of Computer & Information Science, University of Pennsylvania
NIPS*05 Tutorial, December 5, 2005
Neural Information Processing Systems Conference
... from which I have borrowed liberally! Thanks Lawrence!

Background - Linear Subspace Embedding

Linear subspaces
• We may often assume that our high-dim data lies on/near a linear subspace
[Figure: two panels, D_high = 2 with D_low = 1, and D_high = 3 with D_low = 2]
• In this case, well-known, stable tools exist for determining the parameters of this subspace
- Principal Components Analysis
- Metric Multidimensional Scaling
• Among the most widely-used algorithms in engineering!
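
As a rough illustration of the linear-subspace case (D_high = 2, D_low = 1), the sketch below fits the first principal component to 2-D points lying near a line. It is a minimal example, not material from the slides: the function name `pca_first_component`, the toy data, and the closed-form 2x2 eigenvector formula are all assumptions chosen to keep it free of external libraries.

```python
import math

def pca_first_component(points):
    """Return (mean, unit direction, 1-D coordinates) for a list of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]

    # Entries of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum(x * x for x, _ in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    cyy = sum(y * y for _, y in centered) / n

    # The variance along angle theta is maximized when
    # 2 * theta = atan2(2 * cxy, cxx - cyy); that is the principal axis.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    direction = (math.cos(theta), math.sin(theta))

    # Low-dimensional coordinate: projection onto the principal axis.
    coords = [x * direction[0] + y * direction[1] for x, y in centered]
    return (mean_x, mean_y), direction, coords

# Hypothetical toy data: points near the line y = 2x with small alternating offsets.
points = [(i / 10.0, 2.0 * i / 10.0 + 0.01 * (-1) ** i) for i in range(21)]
mean, direction, coords = pca_first_component(points)
print("principal direction:", direction)
```

Each 2-D point is replaced by a single coordinate along the recovered direction, which is the reparametrization the slide describes for data near a linear subspace.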
Notation
• We have N D-dimensional data points x
• We seek to map x to a set of d-dimensional points y
• N is large and d << D
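
In this notation, the three steps of the Isomap table quoted earlier (neighborhood graph, shortest paths, embedding) can be sketched for the smallest interesting case, D = 2 and d = 1. This is an illustrative sketch under stated assumptions, not the authors' code: the name `isomap_1d` and the semicircle data are hypothetical, the graph is assumed connected, and the leading eigenpair is found by power iteration rather than a full eigendecomposition.

```python
import math

def isomap_1d(points, k):
    """K-Isomap to one dimension. Returns (embedding, graph distances)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: connect i and j if one is among the k nearest neighbors of
    # the other; pairs without an edge start at infinite distance.
    inf = float("inf")
    d_g = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        for j in others[:k]:
            d_g[i][j] = d_g[j][i] = dist[i][j]

    # Step 2: Floyd's algorithm, O(N^3), for all-pairs shortest paths.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d_g[i][m] + d_g[m][j] < d_g[i][j]:
                    d_g[i][j] = d_g[i][m] + d_g[m][j]

    # Step 3: tau(D) = -H S H / 2, with S the squared graph distances.
    # Applying H on both sides is the same as double centering S.
    s = [[d * d for d in row] for row in d_g]
    row_mean = [sum(row) / n for row in s]
    total_mean = sum(row_mean) / n
    tau = [[-0.5 * (s[i][j] - row_mean[i] - row_mean[j] + total_mean)
            for j in range(n)] for i in range(n)]

    # Leading eigenpair of tau by power iteration (assumes the top
    # eigenvalue dominates, which holds for this near-1-D example).
    v = [float(i + 1) for i in range(n)]
    lam = 0.0
    for _ in range(200):
        w = [sum(tau[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]

    # Coordinate of point i is sqrt(lambda_1) times the i-th eigenvector entry.
    return [math.sqrt(lam) * x for x in v], d_g

# Hypothetical data: N = 30 points on a unit semicircle (a 1-D manifold in 2-D).
n_points = 30
points = [(math.cos(math.pi * i / (n_points - 1)),
           math.sin(math.pi * i / (n_points - 1))) for i in range(n_points)]
embedding, graph_dist = isomap_1d(points, k=2)
print("Euclidean end-to-end distance:", math.dist(points[0], points[-1]))
print("graph end-to-end distance:", graph_dist[0][-1])
```

The two end points are 2 apart in the Euclidean metric, while the graph distance follows the arc and approaches its length of pi; the 1-D embedding orders the points along the arc.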
