Interactive Algorithms for Unsupervised Machine Learning

Akshay Krishnamurthy
CMU-CS-15-116
June 2015

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Aarti Singh, Chair
Maria-Florina Balcan
Barnabás Póczos
Larry Wasserman
Sanjoy Dasgupta (UCSD)
John Langford (Microsoft Research)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2015 Akshay Krishnamurthy

This research was sponsored by the Air Force Office of Scientific Research under grant number FA95501010382 and the National Science Foundation under grant numbers IIS-1116458, IIS-1247658, IIS-1252412, DGE-0750271, and DGE-0750271. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Statistical Machine Learning, Interactive Learning, Unsupervised Learning, Matrix Completion, Subspace Learning, Hierarchical Clustering, Network Tomography.

To my parents.

Abstract

This thesis explores the power of interactivity in unsupervised machine learning problems. Interactive algorithms employ feedback-driven measurements to reduce data acquisition costs and consequently enable statistical analysis in otherwise intractable settings. Unsupervised learning methods are fundamental tools across a variety of domains, and interactive procedures promise to broaden the scope of statistical analysis. We develop interactive learning algorithms for three unsupervised problems: subspace learning, clustering, and tree metric learning. Our theoretical and empirical analysis shows that interactivity can bring both statistical and computational improvements over non-interactive approaches. An overarching thread of this thesis is that interactive learning is particularly powerful for non-uniform datasets, where non-uniformity is quantified differently in each setting.
We first study the subspace learning problem, where the goal is to recover or approximate the principal subspace of a collection of partially observed data points. We propose statistically and computationally appealing interactive algorithms for both the matrix completion problem, where the data points lie on a low-dimensional subspace, and the matrix approximation problem, where one must approximate the principal components of a collection of points. We measure uniformity with the notion of incoherence, and we show that our feedback-driven algorithms perform well under much milder incoherence assumptions.

We next consider clustering a dataset represented by a partially observed similarity matrix. We propose an interactive procedure for recovering a clustering from a small number of carefully selected similarity measurements. The algorithm exploits non-uniformity of cluster size, using few measurements to recover larger clusters and focusing measurements on the smaller structures. In addition to coming with strong statistical and computational guarantees, this algorithm performs well in practice.

We also consider a specific metric learning problem, where we compute a latent tree metric to approximate distances over a point set. This problem is motivated by applications in network tomography, where the goal is to approximate the network structure using only measurements between pairs of end hosts. Our algorithms use an interactively chosen subset of the pairwise distances to learn the latent tree metric while being robust to either additive noise or a small number of arbitrarily corrupted distances. As before, we leverage non-uniformity inherent in the tree metric structure to achieve low sample complexity.

Finally, we study a classical hypothesis testing problem, where we focus on showing fundamental limits for non-interactive approaches. Our main result is a precise characterization of the performance of non-interactive approaches, which shows that, on particular problems, all non-interactive approaches are statistically weaker than a simple interactive one.
These results bolster the theme that interactivity can bring about statistical improvements in unsupervised problems.

Acknowledgments

First and foremost, I would like to thank Aarti Singh, my advisor, who has played a central role in shaping my research interests, style, and ability. Aarti has a keen awareness for broad context and perspective of our research that I strive to develop. She has challenged me to think deeply about research problems, encouraged me to pursue my own research interests, and provided me the support and freedom to grow individually. Her support, guidance, wisdom, and encouragement all helped shape this thesis and much of my work, and they were all instrumental to my success in graduate school.

I am thankful to Larry Wasserman, whose instruction in courses and guidance in research are primary reasons for my interest in statistical machine learning. Larry's encyclopedic knowledge and grasp of statistics were extremely valuable resources for my research, but I also appreciate his wise advice on personal and career matters. My collaboration with Barnabás Póczos and Larry has been thought-provoking and fun, and I am thankful that they encouraged me to tackle new problems.

This thesis is a product of reading many papers on interactive learning by my committee members, Nina Balcan, Sanjoy Dasgupta, and John Langford. Nina's unbounded energy and her passion for machine learning are qualities that I strive for, while Sanjoy's comments during my proposal and defense have led me to many new ideas. I am inspired by John's deep understanding of both theory and practice and his ability to push both frontiers with unique and innovative ideas. I am truly excited to work closely with and continue to learn from John over the next year.
I am thankful for the many amazing collaborators that I have had the opportunity to work with: Sivaraman Balakrishnan for listening to my ideas and spending the time to think deeply about them, Min Xu for teaching me the importance of rigor, James Sharpnack for teaching me that simple problems can have deep and beautiful answers, and Martin Azizyan and Kirthevasan Kandasamy for always being eager to discuss research and brainstorm with me. You all have become wonderful friends, and I look forward to future meetings and collaborations.

I would like to thank many faculty and staff members at Carnegie Mellon University for conversations and interactions that I cherish. It has been fun to discuss statistics problems in the gym with Ryan Tibshirani, who has become a good friend. I had a wonderful TA experience with Venkat Guruswami, and he, along with Anupam Gupta, has encouraged me to learn more about Theoretical Computer Science. I am thankful for many conversations with Mor Harchol-Balter that helped convince me to pursue a career in academia. I am also grateful to all of the phenomenal staff members, but particularly Deb Cavlovich, Catherine Copetas, and Diane Stidle, who greatly enriched my life at CMU.

I am thankful to Zeeshan Syed and Eu-Jin Goh, who supported me during my internship at Google and helped develop my engineering ability. I am also thankful to Alekh Agarwal, Miro Dudík, Kai-Wei Chang, and many others at Microsoft Research NYC for a fun and productive internship. I am looking forward to spending another year at MSR and continuing to collaborate with and learn from everyone at the lab.
Many friends in Pittsburgh and elsewhere helped temper the challenges of graduate school, and I am truly fortunate to have such a supportive network: my officemates, Sarah Loos, Jeremiah Blocki, and Colin White; fellow computer science students Dana and Yair Movshovitz-Attias, Gabe and Cat Weisz, John Wright, David Witmer, Kevin Waugh, Erik Zawadzki, JP Dickerson, and Jamie Morgenstern; machine learning friends Matus Telgarsky, Aaditya Ramdas, Gautam Dasarathy, Mladen Kolar, and Willie Neiswanger; ultimate frisbee friends Jesse Kummer, Aaron Kane, Ben Clark, Nipunn Koorapati, Jeremy Kanter, Nick Audette, Andy Fassler, Lily Nguyen, and Carolyn Norwood; California friends Robbie Paolini, Ravi Raghavan, and Hassan Khan, who moved to Pittsburgh with me; and all of my close friends from college, high school, and beyond. Thank you all for the amazing memories!

Lastly, I would like to thank my family: my parents, my grandparents, and my brother, Jayant Krishnamurthy. It was a unique and wonderful experience to attend graduate school with Jayant, and his support was invaluable. I will cherish our many attempts at collaboration and our conversations about research and life. I am eternally thankful to my parents, who have both been wonderful role models. Thank you so much for your enduring love and support!

Contents

1 Introduction 1
  1.1 Overarching Themes 2
  1.2 Overview of Results 3
    1.2.1 Interactive Subspace Learning 3
    1.2.2 Interactive Hierarchical Clustering 4
    1.2.3 Interactive Latent Tree Metric Learning 5
    1.2.4 Passive and Interactive Sampling in Normal Means Inference 5
  1.3 Related Work 6

2 Interactive Matrix Completion 9
    2.0.1 Preliminaries 10
  2.1 Related Work 12
    2.1.1 Related work on Matrix and Tensor Completion 12
    2.1.2 Related work on Matrix Approximation 13
  2.2 Matrix and Tensor Completion 15
    2.2.1 Necessary conditions for non-interactive sampling 18
  2.3 Matrix Approximation 20
    2.3.1 Comparison with related results 22
  2.4 Proofs 23
    2.4.1 Proof of Theorem 2.1 and Corollary 2.2 23
    2.4.2 Proof of Theorem 2.3 30
    2.4.3 Proof of Theorem 2.4 31
    2.4.4 Proof of Theorem 2.5 and related propositions 33
  2.5 Empirical Results 36
  2.6 Conclusions 40

3 Interactive Hierarchical Clustering 41
  3.1 Related Work 43
  3.2 Main Results 44
    3.2.1 An Interactive Clustering Framework 45
    3.2.2 Interactive Spectral Clustering 48
    3.2.3 Active k-means clustering 49
    3.2.4 Fundamental Limits 50
  3.3 Experimental Results 52
    3.3.1 Practical Considerations 52
    3.3.2 Simulations 53
    3.3.3 Real World Experiments 54
  3.4 Proofs 56
    3.4.1 Proof of Theorem 3.1 56
    3.4.2 Proof of Theorem 3.2 61
    3.4.3 Proof of Theorem 3.3 64
    3.4.4 Proof of Proposition 3.4 65
    3.4.5 Proof of Theorem 3.5 66
  3.5 Discussion 67

4 Interactive Latent Tree Metric Learning 69
  4.1 Related Work 71