Nonnegative Matrix Factorization

Data Science Book Series

Editor-in-Chief
Ilse Ipsen, North Carolina State University

Editorial Board
Amy Braverman, Jet Propulsion Laboratory
Nicholas J. Higham, University of Manchester
Ali Rahimi, Google
Baoquan Chen, Shandong University
Michael Mahoney, UC Berkeley
David P. Woodruff, Carnegie Mellon University
Amr El-Bakry, ExxonMobil Upstream Research
Haesun Park, Georgia Institute of Technology
Hua Zhou, University of California, Los Angeles

Books in the series
Daniela Calvetti and Erkki Somersalo, Mathematics of Data Science: A Computational Approach to Clustering and Classification
Nicolas Gillis, Nonnegative Matrix Factorization

Nicolas Gillis
University of Mons
Mons, Belgium

Society for Industrial and Applied Mathematics
Philadelphia

Copyright © 2021 by the Society for Industrial and Applied Mathematics

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 Market Street, 6th Floor, Philadelphia, PA 19104-2688 USA.

No warranties, express or implied, are made by the publisher, authors, and their employers that the programs contained in this volume are free of error. They should not be relied on as the sole basis to solve a problem whose incorrect solution could result in injury to person or property. If the programs are employed in such a manner, it is at the user’s own risk and the publisher, authors, and their employers disclaim all liability for such misuse.
Trademarked names may be used in this book without the inclusion of a trademark symbol. These names are used in an editorial context only; no infringement of trademark is intended.

MATLAB is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7001, [email protected], www.mathworks.com.

Publications Director: Kivmars H. Bowling
Executive Editor: Elizabeth Greenspan
Developmental Editor: Mellisa Pascale
Managing Editor: Kelly Thomas
Production Editor: Lisa Briggeman
Copy Editor: Susan Fleshman
Production Manager: Donna Witzleben
Production Coordinator: Cally A. Shrader
Compositor: Cheryl Hufnagle
Graphic Designer: Doug Smock

Library of Congress Cataloging-in-Publication Data
Names: Gillis, Nicolas, author.
Title: Nonnegative matrix factorization / Nicolas Gillis, University of Mons, Mons, Belgium.
Description: Philadelphia : Society for Industrial and Applied Mathematics, [2021] | Series: Data science ; 2 | Includes bibliographical references and index. | Summary: “This book provides a comprehensive and up-to-date account of the NMF problem and its most significant features”-- Provided by publisher.
Identifiers: LCCN 2020042037 (print) | LCCN 2020042038 (ebook) | ISBN 9781611976403 (paperback) | ISBN 9781611976410 (ebook)
Subjects: LCSH: Non-negative matrices. | Factorization (Mathematics) | Computer algorithms. | Data mining.
Classification: LCC QA188 .G566 2021 (print) | LCC QA188 (ebook) | DDC 512.9/434--dc23
LC record available at https://lccn.loc.gov/2020042037
LC ebook record available at https://lccn.loc.gov/2020042038

SIAM is a registered trademark.
For Aline, Elinor, and Rose

Contents

Preface
Notation
List of Figures
List of Tables

1 Introduction
    1.1 Linear dimensionality reduction techniques for data analysis
    1.2 Problem definition
    1.3 Four applications of NMF in data analysis
    1.4 History
    1.5 Take-home messages

I Exact factorizations

2 Exact NMF
    2.1 Geometric interpretation
    2.2 Restricted Exact NMF
    2.3 Computational complexity of RE-NMF and Exact NMF
    2.4 Take-home messages

3 Nonnegative rank
    3.1 Some properties of the nonnegative rank
    3.2 The nonnegative rank under perturbations
    3.3 Generic values of the nonnegative rank
    3.4 Lower bounds on the nonnegative rank
    3.5 Upper bounds for the nonnegative rank
    3.6 Lower bounds on extended formulations via the nonnegative rank
    3.7 Link with communication complexity
    3.8 Other applications of the nonnegative rank
    3.9 Take-home messages

4 Identifiability
    4.1 Case rank(X) ≤ 2
    4.2 Exact NMF with r = rank(X)
    4.3 Regularized Exact NMF
    4.4 Take-home messages

II Approximate factorizations

5 NMF models
    5.1 Error measures
    5.2 Model-order selection
    5.3 Regularizations
    5.4 NMF variants
    5.5 Models related to NMF
    5.6 Take-home messages

6 Computational complexity of NMF
    6.1 Frobenius norm
    6.2 Kullback–Leibler divergence
    6.3 Infinity norm
    6.4 Weighted Frobenius norm
    6.5 Componentwise ℓ1 norm
    6.6 Other NMF models
    6.7 Take-home messages

7 Near-separable NMF
    7.1 Context and applications
    7.2 Preliminaries
    7.3 Idealized algorithm
    7.4 Greedy/sequential algorithms
    7.5 Heuristic algorithms
    7.6 Convex-optimization-based algorithms
    7.7 Summary of provably robust near-separable NMF algorithms
    7.8 Separable tri-symNMF
    7.9 Further readings
    7.10 Take-home messages

8 Iterative algorithms for NMF
    8.1 Preliminaries
    8.2 The multiplicative updates
    8.3 Algorithms for the Frobenius norm
    8.4 Number of inner iterations and acceleration
    8.5 Stopping criteria
    8.6 Initialization
    8.7 Alternative algorithmic approaches
    8.8 Further readings
    8.9 Online resources
    8.10 Take-home messages

9 Applications
    9.1 Beware of scaling ambiguity
    9.2 Should your data set be approximately of low rank?
    9.3 Self-modeling curve resolution
    9.4 Gene expression analysis
    9.5 Recommender systems and collaborative filtering
    9.6 Other applications
    9.7 Take-home messages

Bibliography
Index

Preface

Identifying the underlying structure of a data set and extracting meaningful information is a key problem in data analysis.
Simple and powerful methods to achieve this goal are linear dimensionality reduction (LDR) techniques, which are equivalent to low-rank matrix approximations (LRMA). Examples of LDR techniques are principal component analysis (PCA), independent component analysis, sparse PCA, robust PCA, low-rank matrix completion, and sparse component analysis. The reason for the success of these methods is that, although simple, they are applicable in a wide range of settings such as recommender systems, model-order reduction and system identification, clustering, image analysis, and blind source separation, to cite a few.

Among LRMA techniques, nonnegative matrix factorization (NMF) requires the factors of the low-rank approximation to be componentwise nonnegative. This makes it possible to interpret them meaningfully, for example when they correspond to nonnegative physical quantities. Applications of NMF include extracting parts of faces (such as eyes, noses, and lips) in a set of facial images, identifying topics in a set of documents, learning hidden Markov models, extracting materials and their abundances in hyperspectral images, separating audio sources from their mixture, detecting communities in large networks, analyzing medical images, and decomposing gene expression microarrays.

Aim of the book

The aim of this book is to provide a comprehensive account of the most important aspects of the NMF problem:

• Theoretical aspects: the nonnegative rank, the nonuniqueness/identifiability of NMF solutions, the geometric interpretation of NMF, and computational complexity issues.

• Models: choice of the objective function and regularizations, link with well-known techniques such as k-means, and use of additional constraints such as orthogonality or symmetry.

• Algorithms: heuristic algorithms using standard nonlinear optimization schemes such as block coordinate descent methods, and provably correct algorithms under appropriate assumptions.

• Applications: they include image analysis, document classification, hyperspectral unmixing, audio source separation, topic modeling, and community detection.

This book is accessible to a wide audience.
In particular, it is intended for people who want to know about the workings of NMF. It also aims to give more insights to practitioners so that they can use NMF meaningfully. To read this book, basic knowledge of linear algebra and optimization is needed.

Why is this book important?

Although NMF has been studied extensively for the last 20 years, there is currently only one book on the topic, by Cichocki et al. [98] (2009), which is already more than 10 years old. It focuses on iterative algorithms and applications, and many aspects of NMF are not covered in that book, especially since many important results have been obtained in the last 10 years.¹

¹ Such as the identifiability results based on the sufficiently scattered condition (Chapter 4), and the polynomial-time algorithms for separable NMF (Chapter 7).

The aim of this book is to fill this gap by providing more insights into the theoretical aspects of NMF. These are key to using NMF effectively and meaningfully in practice. This will allow the reader to make better use of NMF as a computational tool. This book is aimed at researchers who want to understand the NMF problem; for example,

• You do not know (much) about NMF and want to discover this problem, why and how it works, and what it can be used for. This book would be ideal, for example, for a master’s or Ph.D. student starting to work on NMF.

• You are using NMF for applications but you would like to better understand its subtly difficult aspects such as its computational complexity, its geometric interpretation, or its nonuniqueness/identifiability issues. Also, you would like to know about the state-of-the-art algorithms.

• You are already rather familiar with NMF but have not yet studied all of its aspects (for example, you would like to know more about the nonnegative rank, or the nonuniqueness of NMF solutions). This book will allow you to delve into different aspects of the NMF problem and will give you useful references.

Moreover, this book contains a few new results not present in the literature (as far as I know): bounds on the nonnegative rank under rank-one perturbations (Theorem 3.3), the study of the generic value of the nonnegative rank (Section 3.3.2), the identifiability of orthogonal NMF (Section 4.3.2), and a necessary condition for the sufficiently scattered condition, a crucial notion when studying the uniqueness of NMF solutions (see Theorem 4.28).

MATLAB code

All algorithms and numerical experiments presented in this book are available from bookstore.siam.org/di02/bonus. When we discuss an algorithm, or display results from a numerical experiment, the corresponding MATLAB file will be indicated using

[Matlab file: Name of file]

It can be found in the folder of the corresponding chapter. Hence the interested reader can easily find the corresponding MATLAB file. To provide a better view of all the NMF algorithms available with this book, there is an exception for NMF algorithms: they can all be found in the folder [algorithms]. For example, the separable NMF algorithms presented in Chapter 7 can be found in the folder [algorithms/separableNMF], although the numerical experiments presented in Chapter 7 can be found in the folder [Chapter7-SeparableNMF].

All tests in this book are performed using MATLAB R2019b on a laptop Intel Core i7-7500U [email protected], 24GB RAM.

How to use this book

The book is organized so that it is possible to read only subsets of the chapters depending on the reader’s interests. The book was written so that each chapter is