University of Colorado, Boulder
CU Scholar
Applied Mathematics Graduate Theses & Dissertations
Spring 1-1-2012

Randomized Methods for Computing Low-Rank Approximations of Matrices
Nathan P. Halko, University of Colorado at Boulder, [email protected]

Recommended Citation
Halko, Nathan P., "Randomized Methods for Computing Low-Rank Approximations of Matrices" (2012). Applied Mathematics Graduate Theses & Dissertations. 26. https://scholar.colorado.edu/appm_gradetds/26

This Dissertation is brought to you for free and open access by Applied Mathematics at CU Scholar. It has been accepted for inclusion in Applied Mathematics Graduate Theses & Dissertations by an authorized administrator of CU Scholar. For more information, please contact [email protected].

Randomized methods for computing low-rank approximations of matrices

by Nathan P. Halko
B.S., University of California, Santa Barbara, 2007
M.S., University of Colorado, Boulder, 2010

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Department of Applied Mathematics, 2012.

This thesis entitled "Randomized methods for computing low-rank approximations of matrices," written by Nathan P. Halko, has been approved for the Department of Applied Mathematics by Per-Gunnar Martinsson, Keith Julien, David M. Bortz, and Francois G. Meyer. The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above mentioned discipline.

Halko, Nathan P. (Ph.D., Applied Mathematics)
Randomized methods for computing low-rank approximations of matrices
Thesis directed by Professor Per-Gunnar Martinsson

Randomized sampling techniques have recently proved capable of efficiently solving many standard problems in linear algebra, and of enabling computations at scales far larger than what was previously possible. The new algorithms are designed from the bottom up to perform well in modern computing environments where the expense of communication is the primary constraint. In extreme cases, the algorithms can even be made to work in a streaming environment where the matrix is not stored at all, and each element can be seen only once.

The dissertation describes a set of randomized techniques for rapidly constructing a low-rank approximation to a matrix. The algorithms are presented in a modular framework that first computes an approximation to the range of the matrix via randomized sampling. Second, the matrix is projected onto the approximate range, and a factorization (SVD, QR, LU, etc.) of the resulting low-rank matrix is computed via variations of classical deterministic methods. Theoretical performance bounds are provided.

Particular attention is given to very large scale computations where the matrix does not fit in RAM on a single workstation. Algorithms are developed for the case where the original matrix must be stored out-of-core but where the factors of the approximation fit in RAM. Numerical examples are provided that perform Principal Component Analysis of a data set that is so large that less than one hundredth of it can fit in the RAM of a standard laptop computer.

Furthermore, the dissertation presents a parallelized randomized scheme for computing a reduced rank Singular Value Decomposition. By parallelizing and distributing both the randomized sampling stage and the processing of the factors in the approximate factorization, the method requires an amount of memory per node which is independent of both dimensions of the input matrix.
Numerical experiments are performed on Hadoop clusters of computers in Amazon's Elastic Compute Cloud with up to 64 total cores. Finally, we directly compare the performance and accuracy of the randomized algorithm with the classical Lanczos method on extremely large, sparse matrices and substantiate the claim that randomized methods are superior in this environment.

Dedication

This thesis is dedicated to my Mom and Dad, who instilled in me the importance of education and have always shown me their love and support.

Acknowledgements

I would like to thank my co-authors Joel A. Tropp, Yoel Shkolnisky, and Mark Tygert. Their hard work accounts for much of this thesis already being published in top academic journals. Thank you to Ted Dunning and Dmitriy Lyubimov, who inducted the SSVD into Mahout and facilitated study of this algorithm at large scales. I am indebted to Dave Angulo, Lanny Ripple, and the SpotInfluence team for sharing their knowledge of large-scale computing and for being so gracious in the final months of this thesis. And finally, thank you to my advisor Gunnar Martinsson, whose wisdom, style, and knowledge have guided me these past years and enabled me to achieve this most epic goal. I am honored and grateful for all this support.

Contents

Chapter 1: Introduction
1.1 Approximation by low rank matrices
1.2 Existing methods for computing low-rank approximations
    1.2.1 Truncated Factorizations
    1.2.2 Direct Access
    1.2.3 Iterative methods
1.3 Changes in computer architecture, and the need for new algorithms
1.4 Computational framework and problem formulation
1.5 Randomized algorithms for approximating the range of a matrix
    1.5.1 Intuition
    1.5.2 Basic Algorithm
    1.5.3 Probabilistic error bounds
    1.5.4 Computational cost
1.6 Variations of the basic theme
    1.6.1 Structured Random Matrices
    1.6.2 Increasing accuracy
    1.6.3 Numerical Stability
    1.6.4 Adaptive methods
1.7 Numerical linear algebra in a distributed computing environment
    1.7.1 Distributed computing environment
    1.7.2 Distributed operations
    1.7.3 Stochastic Singular Value Decomposition in MapReduce
    1.7.4 Lanczos comparison
    1.7.5 Numerical Experiments
1.8 Survey of prior work on randomized algorithms in linear algebra
1.9 Structure of dissertation and overview of principal contributions

Chapter 2: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
2.1 Overview
    2.1.1 Approximation by low-rank matrices
    2.1.2 Matrix approximation framework
    2.1.3 Randomized algorithms
    2.1.4 A comparison between randomized and traditional techniques
    2.1.5 Performance analysis
    2.1.6 Example: Randomized SVD
    2.1.7 Outline of paper
2.2 Related work and historical context
    2.2.1 Randomized matrix approximation
    2.2.2 Origins
2.3 Linear algebraic preliminaries
    2.3.1 Basic definitions
    2.3.2 Standard matrix factorizations
    2.3.3 Techniques for computing standard factorizations
2.4 Stage A: Randomized schemes for approximating the range
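The two-stage scheme summarized in the abstract (randomized sampling to approximate the range, then a classical deterministic factorization of the projected matrix) can be sketched in a few lines of NumPy. The sketch below is an illustration under assumed conventions, not code from the dissertation itself; the function name `randomized_svd`, the oversampling parameter `p`, and the power-iteration count `q` are illustrative choices.

```python
import numpy as np

def randomized_svd(A, k, p=10, q=2):
    """Illustrative two-stage randomized SVD of rank k.

    Stage A: sample the range of A with a Gaussian test matrix,
    optionally refined by q power iterations.
    Stage B: project A onto the approximate range and factor the
    small projected matrix with a deterministic SVD.
    """
    m, n = A.shape
    rng = np.random.default_rng(0)
    # Stage A: draw k + p random samples of the range of A.
    Omega = rng.standard_normal((n, k + p))   # Gaussian test matrix
    Y = A @ Omega                             # sample matrix
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for range(Y)
    for _ in range(q):                        # power iterations for accuracy
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Stage B: project to the approximate range and factor the small matrix.
    B = Q.T @ A                               # (k + p) x n projected matrix
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Uhat                              # lift left factors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]
```

With q = 0 this reduces to the basic sampling scheme; increasing q sharpens the approximation when the singular values decay slowly, at the cost of extra passes over A.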