Approximation of Positive Semidefinite Matrices Using the Nyström Method

A dissertation presented by Nicholas Arcolano to The School of Engineering and Applied Sciences in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the subject of Applied Mathematics.

Harvard University
Cambridge, Massachusetts
May 2011

© 2011 Nicholas Arcolano. All rights reserved.

Thesis advisor: Patrick J. Wolfe
Author: Nicholas Arcolano

Abstract

Positive semidefinite matrices arise in a variety of fields, including statistics, signal processing, and machine learning. Unfortunately, when these matrices are high-dimensional and/or must be operated upon many times, expensive calculations such as the spectral decomposition quickly become a computational bottleneck. A common alternative is to replace the original positive semidefinite matrices with low-rank approximations whose spectral decompositions can be more easily computed. In this thesis, we develop approaches based on the Nyström method, which approximates a positive semidefinite matrix using a data-dependent orthogonal projection. As the Nyström approximation is conditioned on a given principal submatrix of its argument, it essentially recasts low-rank approximation as a subset selection problem.

We begin by deriving the Nyström approximation and developing a number of fundamental results, including new characterizations of its spectral properties and approximation error. We then address the problem of subset selection through a study of randomized sampling algorithms. We provide new bounds for the approximation error under uniformly random sampling, as well as bounds for two new data-dependent sampling methods. We continue by extending these results to random positive definite matrices, deriving statistics for the approximation error of matrices with Wishart and beta distributions, as well as for a broader class of orthogonally invariant and residual independent matrices.

Once this theoretical foundation has been established, we turn to practical applications of Nyström methods. We explore new exact and approximate sampling methods for randomized subset selection, and develop greedy approaches for subset optimization. We conclude by developing the Nyström approximation as a low-rank covariance estimator that provides for computationally efficient spectral analysis while shrinking the eigenvalues of the sample covariance. After deriving expressions for its bias and mean squared error, we illustrate the effectiveness of the Nyström covariance estimator through empirical examples in adaptive beamforming and image denoising.
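As a concrete point of reference before the formal development in Chapter 3, the following is a minimal NumPy sketch of the Nyström approximation Ŝ(I) described above, built from the sampled columns S[:, I] and the principal submatrix S_I via the Moore-Penrose pseudoinverse. It is an editorial illustration, not code from the thesis; the function name nystrom_approx and the random test matrix are assumptions made here for demonstration.

```python
import numpy as np

def nystrom_approx(S, I):
    """Sketch of the Nystrom approximation of a PSD matrix S given an
    index subset I: S_hat(I) = C W^+ C^T, where C = S[:, I] and W = S_I."""
    C = S[:, I]                              # n x k block of sampled columns
    W = S[np.ix_(I, I)]                      # k x k principal submatrix S_I
    return C @ np.linalg.pinv(W) @ C.T       # W^+ is the pseudoinverse

# Illustrative usage on a random PSD matrix with a uniformly sampled subset
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
S = X @ X.T / 200                            # sample covariance (PSD)
I = rng.choice(50, size=10, replace=False)   # uniform subset selection
S_hat = nystrom_approx(S, I)
err = np.linalg.norm(S - S_hat, "fro")       # Frobenius approximation error
```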
Contents

1 Introduction
  1.1 The Low-Rank Approximation Problem
  1.2 Example Application: Approximate Spectral Clustering
  1.3 Contributions of This Dissertation

2 Review of Selected Topics
  2.1 Preliminaries
  2.2 Eigenvalues and Singular Values
  2.3 Special Classes of Matrices
  2.4 Special Functions of Matrices
  2.5 Unitarily Invariant Norms
  2.6 Matrix-Variate Distributions
  2.7 Summary

3 The Nyström Method
  3.1 Historical Background
  3.2 Nyström Method for Matrix Approximation
  3.3 Properties of the Nyström Approximation
  3.4 Summary

4 Randomized Methods for Nyström Subset Selection
  4.1 Uniform Sampling
  4.2 Data-Dependent Sampling
  4.3 Summary

5 Nyström Approximation of Random Matrices
  5.1 Wishart Random Matrices
  5.2 Beta Random Matrices
  5.3 Orthogonally Invariant and Residual Independent Random Matrices
  5.4 Summary

6 Algorithms for Nyström Approximation
  6.1 Existing Methods for Subset Selection
  6.2 Randomized Methods
  6.3 Optimization Methods
  6.4 Subset Oversampling
  6.5 Empirical Comparisons
  6.6 Summary

7 Low-Rank Covariance Estimation
  7.1 Review of Covariance Estimation
  7.2 The Nyström Covariance Estimator
  7.3 Example Application: Adaptive Beamforming
  7.4 Example Application: Image Denoising
  7.5 Summary

8 Conclusion
  8.1 Summary of Contributions
  8.2 Prominent Themes
  8.3 Future Directions

Bibliography

Acknowledgments

First and foremost, I would like to thank the United States Air Force* for their generous sponsorship of my research through the MIT Lincoln Scholars Program. Second, I would like to thank my thesis advisor, Patrick Wolfe. A good advisor must act as both advocate and critic, nurturing and mentoring his students while pushing them toward the limits of their potential. Professor Wolfe balanced these roles well, and my technical and professional development benefited greatly from his guidance. I also wish to thank the other members of my thesis committee, Edoardo Airoldi and Roger Brockett, for their helpful suggestions and comments.
In addition, I would like to acknowledge the many other students, colleagues, and supervisors who have assisted me during my time at Harvard, including the members of the Harvard Statistics and Information Sciences Laboratory (SISL), my former co-workers and supervisors from Lincoln Laboratory's Group 36 (BMDS Integration), and my new co-workers and supervisors from Group 102 (Embedded and High Performance Computing). In particular, I wish to thank my friend and colleague Daniel Rudoy for engaging me in many hours of helpful research discussions, as well as my Lincoln Scholars mentor Keh-Ping Dunn, who has provided me with a great deal of professional guidance over the years.

On a personal note, I would like to thank my friends and family for their support and kindness, especially my mother Cheryl. Throughout my life, many opportunities I encountered were possible only because of great effort and sacrifice on her part. I also wish to thank her for being a devoted grandparent to my beloved son Maxwell, always drawing from the same boundless reserves of patience and selflessness she used when raising me. Above all else, I am truly grateful to my wonderful wife Joy for her tireless encouragement and unrelenting affection. As a loving spouse and mother, a brilliant teacher, and a talented artist, she continues to be a source of personal and professional inspiration. All of my proudest achievements are owed in no small part to her influence.

In regards to the resources needed to support a community of scholars, my favorite author once wrote: "It takes forty men with their feet on the ground to keep one man with his head in the air." Once again, I extend my deepest thanks to all of the people whose help has allowed me to soar through the lofty heights of academia these past four years.

* This work is sponsored by the United States Air Force under contract FA8721-05-C-0002. Opinions, interpretations, recommendations, and conclusions are those of the author and are not necessarily endorsed by the United States Government.

For my family.

Notation

R             field of real numbers
C             field of complex numbers
x, y, z       (column) vectors
x_i, [x]_i    i-th element of vector x
R^n           set of all real-valued n-length column vectors
C^n           set of all complex-valued n-length column vectors
A, B, C       matrices
a_ij, [A]_ij  (i,j)-th element of matrix A
A^T           matrix transpose
A^H           matrix conjugate transpose (Hermitian transpose)
I_n           n×n identity matrix
rank(A)       rank of matrix A
R(A)          range space (column space) of matrix A
dim(S)        dimension of subspace S
span(X)       vector span of set of vectors X
A_IJ          submatrix of A specified by index sets I and J
A_I           principal submatrix of A specified by I; equivalent to A_II
x ≻ y         x majorizes y
x ≻_w y       x weakly majorizes y
σ_i(A)        i-th singular value of matrix A (in nonincreasing order)
λ_i(S)        i-th (real) eigenvalue of symmetric matrix S (in nonincreasing order)
A^+           Moore-Penrose pseudoinverse of matrix A
A/A_I         Schur complement of A_I in A
C_k(A)        k-th compound of a matrix A
∼             distributed as; used to relate a random variable to its distribution
=^D           equal in distribution
I(·)          indicator function
S_k^*         optimal k-th order approximation of S
Ŝ(I)          Nyström approximation of S given I

Chapter 1

Introduction

Data is an essential part of engineering and the applied sciences. Nearly every technical field involves the collection, processing, and analysis of data in some form, and as the availability and power of computational resources continue to improve, so do our capabilities for performing these tasks.
In many fields, however, our ability to acquire data far outpaces our capacity to operate on it. Today, it is common to find applications involving data sets of size and dimensionality ranging from several thousand to millions (or even larger), such as gene identification in bioinformatics [1,2], portfolio optimization in finance [3], and community detection in large-scale graphs and networks [4,5]. For these applications, performing even the most basic analyses requires a substantial amount of computation.

Fortunately, "data" is not always synonymous with "information", and we often believe (or at least hope) that a collection of data vectors can be described by a mechanism much simpler than might be suggested by its raw size and dimension. If we assume this mechanism to be linear, one of the most powerful notions of simplicity involves the linear-algebraic concept of rank.

Consider a p×n matrix X, whose columns are a collection of data vectors {x_1, ..., x_n} ⊂ R^p. The rank of X, denoted rank(X), is equal to the size of the largest linearly independent set of its columns. Consequently, rank(X) = k ≤ min(p,n) implies that there exists a set of k vectors {b_1, ..., b_k} ⊂ R^p such that

    x_j = \sum_{i=1}^{k} a_{ij} b_i,

where {a_ij} is a set of coefficients for i = 1, ..., k and j = 1, ..., n.
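As a quick numerical illustration of this definition (a sketch under stated assumptions, not code from the dissertation; NumPy is assumed), a matrix assembled from k basis vectors has rank k, and each column x_j is recovered exactly as the linear combination above:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, k = 40, 100, 5

# Build X with rank(X) = k by construction: the columns of B are the
# basis vectors b_1, ..., b_k, and A holds the coefficients a_ij.
B = rng.standard_normal((p, k))
A = rng.standard_normal((k, n))
X = B @ A                                # column j is sum_i a_ij * b_i

print(np.linalg.matrix_rank(X))          # prints 5
# Any column of X is reproduced from just the k basis vectors:
j = 7
assert np.allclose(X[:, j], B @ A[:, j])
```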
