The Nonnegative Matrix Factorization: a tutorial

Barbara Ball, [email protected], C. of Charleston, Mathematics Dept.
Atina Brooks, [email protected], N.C. State U., Statistics Dept.
Amy Langville, [email protected], C. of Charleston, Mathematics Dept.

NISS NMF Workshop, February 23–24, 2007

Outline
• Two Factorizations:
  - Singular Value Decomposition
  - Nonnegative Matrix Factorization
• Why factor anyway?
• Computing the NMF
  - Early Algorithms
  - Recent Algorithms
• Extensions of NMF

Data Matrix
$A_{m \times n}$ with rank $r$

Examples
• term-by-document matrix
• feature-by-item matrix
• pixel intensity-by-image matrix
• user-by-purchase matrix
• gene-by-DNA microarray matrix
• terrorist-by-action matrix

SVD
$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$

What is the SVD?
[Figure labeled "decreasing importance"]

The SVD
[Figure]

Low Rank Approximation
use $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$ in place of $A$

SVD Rank Reduction
[Figure]

Why use Low Rank Approximation?
• Data Compression and Storage when $k \ll r$
• Remove noise and uncertainty
  ⇒ improved performance on data mining task of retrieval (e.g., find similar items)
  ⇒ improved performance on data mining task of clustering

Properties of SVD
• basis vectors $u_i$ and $v_i$ are orthogonal
• $u_{ij}$, $v_{ij}$ are mixed in sign
  $A_k$ (nonneg) $= U_k$ (mixed) $\Sigma_k$ (nonneg) $V_k^T$ (mixed)
• $U$, $V$ are dense
• uniqueness: while there are many SVD algorithms, they all create the same (truncated) factorization
• optimality: of all rank-$k$ approximations, $A_k$ is optimal
  $\|A - A_k\|_F = \min_{\operatorname{rank}(B) \le k} \|A - B\|_F$

Summary of Truncated SVD

Strengths
• using $A_k$ in place of $A$ gives improved performance
• noise reduction isolates essential components of matrix
• best rank-$k$ approximation
• $A_k$ is unique

Weaknesses
• storage: $U_k$ and $V_k$ are usually completely dense
• interpretation of basis vectors is difficult due to mixed signs
• good truncation point $k$ is hard to determine
• orthogonality restriction

[Figure: plot of sigma (vertical axis, 0 to 8) over a horizontal axis running from 0 to 120, with k=28 marked]
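
The truncated SVD properties listed above can be checked numerically. The following is a minimal sketch, not part of the original slides, assuming NumPy is available. The matrix `A` is a small hypothetical positive data matrix, the helper name `truncated_svd` is made up for illustration, and `B` is an arbitrary rank-$k$ competitor. It builds $A_k$, compares the Frobenius error with $\sqrt{\sum_{i>k} \sigma_i^2}$ (the standard Eckart-Young identity), and looks at the signs of the basis vectors.

```python
import numpy as np


def truncated_svd(A, k):
    """Return the leading k singular triplets (U_k, s_k, Vt_k) of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


rng = np.random.default_rng(0)

# Hypothetical strictly positive data matrix (8 features by 6 items).
A = rng.integers(1, 6, size=(8, 6)).astype(float)

k = 2
U_k, s_k, Vt_k = truncated_svd(A, k)

# Rank-k approximation A_k = U_k Sigma_k V_k^T.
A_k = U_k @ np.diag(s_k) @ Vt_k

# Optimality: the Frobenius error equals the root of the sum of the
# squared discarded singular values.
s_all = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k, "fro")
tail = np.sqrt(np.sum(s_all[k:] ** 2))

# An arbitrary rank-k competitor B cannot beat A_k.
B = rng.random((8, k)) @ rng.random((k, 6))
other = np.linalg.norm(A - B, "fro")

# Mixed signs: although A is positive, U_k holds entries of both signs.
has_mixed_signs = bool(U_k.min() < 0 and U_k.max() > 0)

print("error of A_k:        ", err)
print("sqrt of sigma tail:  ", tail)
print("error of competitor: ", other)
print("U_k has mixed signs: ", has_mixed_signs)
```

The mixed signs are unavoidable here: for a positive matrix the first singular vector can be chosen with all entries positive, so every later basis vector, being orthogonal to it, must contain both positive and negative entries. This is the interpretation weakness that motivates a nonnegative factorization.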