An Introduction to Matrix Concentration Inequalities

Joel A. Tropp

24 December 2014
FnTML Draft, Revised

arXiv:1501.01571v1 [math.PR] 7 Jan 2015

For Margot and Benjamin

Contents

Preface

1 Introduction
  1.1 Historical Origins
  1.2 The Modern Random Matrix
  1.3 Random Matrices for the People
  1.4 Basic Questions in Random Matrix Theory
  1.5 Random Matrices as Independent Sums
  1.6 Exponential Concentration Inequalities for Matrices
  1.7 The Arsenal of Results
  1.8 About This Monograph

2 Matrix Functions & Probability with Matrices
  2.1 Matrix Theory Background
  2.2 Probability with Matrices
  2.3 Notes

3 The Matrix Laplace Transform Method
  3.1 Matrix Moments and Cumulants
  3.2 The Matrix Laplace Transform Method
  3.3 The Failure of the Matrix Mgf
  3.4 A Theorem of Lieb
  3.5 Subadditivity of the Matrix Cgf
  3.6 Master Bounds for Sums of Independent Random Matrices
  3.7 Notes

4 Matrix Gaussian & Rademacher Series
  4.1 A Norm Bound for Random Series with Matrix Coefficients
  4.2 Example: Some Gaussian Matrices
  4.3 Example: Matrices with Randomly Signed Entries
  4.4 Example: Gaussian Toeplitz Matrices
  4.5 Application: Rounding for the MaxQP Relaxation
  4.6 Analysis of Matrix Gaussian & Rademacher Series
  4.7 Notes

5 A Sum of Random Positive-Semidefinite Matrices
  5.1 The Matrix Chernoff Inequalities
  5.2 Example: A Random Submatrix of a Fixed Matrix
  5.3 Application: When is an Erdős–Rényi Graph Connected?
  5.4 Proof of the Matrix Chernoff Inequalities
  5.5 Notes

6 A Sum of Bounded Random Matrices
  6.1 A Sum of Bounded Random Matrices
  6.2 Example: Matrix Approximation by Random Sampling
  6.3 Application: Randomized Sparsification of a Matrix
  6.4 Application: Randomized Matrix Multiplication
  6.5 Application: Random Features
  6.6 Proof of the Matrix Bernstein Inequality
  6.7 Notes

7 Results Involving the Intrinsic Dimension
  7.1 The Intrinsic Dimension of a Matrix
  7.2 Matrix Chernoff with Intrinsic Dimension
  7.3 Matrix Bernstein with Intrinsic Dimension
  7.4 Revisiting the Matrix Laplace Transform Bound
  7.5 The Intrinsic Dimension Lemma
  7.6 Proof of the Intrinsic Chernoff Bound
  7.7 Proof of the Intrinsic Bernstein Bounds
  7.8 Notes

8 A Proof of Lieb’s Theorem
  8.1 Lieb’s Theorem
  8.2 Analysis of the Relative Entropy for Vectors
  8.3 Elementary Trace Inequalities
  8.4 The Logarithm of a Matrix
  8.5 The Operator Jensen Inequality
  8.6 The Matrix Perspective Transformation
  8.7 The Kronecker Product
  8.8 The Matrix Relative Entropy is Convex
  8.9 Notes

Matrix Concentration: Resources

Bibliography

Preface

In recent years, random matrices have come to play a major role in computational mathematics, but most of the classical areas of random matrix theory remain the province of experts. Over the last decade, with the advent of matrix concentration inequalities, research has advanced to the point where we can conquer many (formerly) challenging problems with a page or two of arithmetic.

My aim is to describe the most successful methods from this area along with some interesting examples that these techniques can illuminate. I hope that the results in these pages will inspire future work on applications of random matrices as well as refinements of the matrix concentration inequalities discussed herein.

I have chosen to present a coherent body of results based on a generalization of the Laplace transform method for establishing scalar concentration inequalities. In the last two years, Lester Mackey and I, together with our coauthors, have developed an alternative approach to matrix concentration using exchangeable pairs and Markov chain couplings. With some regret, I have chosen to omit this theory because the ideas seem less accessible to a broad audience of researchers. The interested reader will find pointers to these articles in the annotated bibliography.

The work described in these notes reflects the influence of many researchers. These include Rudolf Ahlswede, Rajendra Bhatia, Eric Carlen, Sourav Chatterjee, Edward Effros, Elliott Lieb, Roberto Imbuzeiro Oliveira, Dénes Petz, Gilles Pisier, Mark Rudelson, Roman Vershynin, and Andreas Winter. I have also learned a great deal from other colleagues and friends along the way.

I would like to thank some people who have helped me improve this work. Several readers informed me about errors in the initial version of this manuscript; these include Serg Bogdanov, Peter Forrester, Nikos Karampatziakis, and Guido Lagos. The anonymous reviewers tendered many useful suggestions, and they pointed out a number of errors. Sid Barman gave me feedback on the final revisions to the monograph. Last, I want to thank Léon Nijensohn for his continuing encouragement.

I gratefully acknowledge financial support from the Office of Naval Research under awards N00014-08-1-0883 and N00014-11-1002, the Air Force Office of Strategic Research under award FA9550-09-1-0643, and an Alfred P. Sloan Fellowship. Some of this research was completed at the Institute of Pure and Applied Mathematics at UCLA. I would also like to thank the California Institute of Technology and the Moore Foundation.

Joel A. Tropp
Pasadena, CA
December 2012
Revised, March 2014 and December 2014

Chapter 1. Introduction

Random matrix theory has grown into a vital area of probability, and it has found applications in many other fields. To motivate the results in this monograph, we begin with an overview of the connections between random matrix theory and computational mathematics. We introduce the basic ideas underlying our approach, and we state one of our main results on the behavior of random matrices. As an application, we examine the properties of the sample covariance estimator, a random matrix that arises in statistics. Afterward, we summarize the other types of results that appear in these notes, and we assess the novelties in this presentation.

1.1 Historical Origins

Random matrix theory sprang from several different sources in the first half of the 20th century.

Geometry of Numbers. Peter Forrester [For10, p. v] traces the field of random matrix theory to work of Hurwitz, who defined the invariant integral over a Lie group. Specializing this analysis to the orthogonal group, we can reinterpret this integral as the expectation of a function of a uniformly random orthogonal matrix.

Multivariate Statistics. Another early example of a random matrix appeared in the work of John Wishart [Wis28]. Wishart was studying the behavior of the sample covariance estimator for the covariance matrix of a multivariate normal random vector. He showed that the estimator, which is a random matrix, has the distribution that now bears his name. Statisticians have often used random matrices as models for multivariate data [MKB79, Mui82].

Numerical Linear Algebra. In their remarkable work [vNG47, GvN51] on computational methods for solving systems of linear equations, von Neumann and Goldstine considered a random matrix model for the floating-point errors that arise from an LU decomposition.¹ They obtained a high-probability bound for the norm of the random matrix, which they took as an estimate for the error the procedure might typically incur. Curiously, in subsequent years, numerical linear algebraists became very suspicious of probabilistic techniques, and only in recent years have randomized algorithms reappeared in this field. See the surveys [Mah11, HMT11, Woo14] for more details and references.

¹ von Neumann and Goldstine invented and analyzed this algorithm before they had any digital computer on which to implement it! See [Grc11] for a historical account.

Nuclear Physics. In the early 1950s, physicists had reached the limits of deterministic analytical techniques for studying the energy spectra of heavy atoms undergoing slow nuclear reactions. Eugene Wigner was the first researcher to surmise that a random matrix with appropriate symmetries might serve as a suitable model for the Hamiltonian of the quantum mechanical system that describes the reaction. The eigenvalues of this random matrix model the possible energy levels of the system. See Mehta’s book [Meh04, §1.1] for an account of all this.

In each area, the motivation was quite different and led to distinct sets of questions. Later, random matrices began to percolate into other fields such as graph theory (the Erdős–Rényi model [ER60] for a random graph) and number theory (as a model for the spacing of zeros of the Riemann zeta function [Mon73]).

1.2 The Modern Random Matrix

By now, random matrices are ubiquitous. They arise throughout modern mathematics and statistics, as well as in many branches of science and engineering. Random matrices have several different purposes that we may wish to distinguish. They can be used within randomized computer algorithms; they serve as models for data and for physical phenomena; and they are subjects of mathematical inquiry. This section offers a taste of these applications. Note that the ideas and references here reflect the author’s interests, and they are far from comprehensive!

1.2.1 Algorithmic Applications

The striking mathematical properties of random matrices can be harnessed to develop algorithms for solving many different problems.

Computing Matrix Approximations. Random matrices can be used to develop fast algorithms for computing a truncated singular-value decomposition. In this application, we multiply a large input matrix by a smaller random matrix to extract information about the dominant singular vectors of the input matrix. The seed of this idea appears in [FKV98, DFK+99].
The survey [HMT11] explains how to implement this method in practice, while the two monographs [Mah11, Woo14] cover more theoretical aspects.

Sparsification. One way to accelerate spectral computations on large matrices is to replace the original matrix by a sparse proxy that has similar spectral properties. An elegant way to produce the sparse proxy is to zero out entries of the original matrix at random while rescaling the entries that remain. This approach was proposed in [AM01, AM07], and the papers [AKL13, KD14] contain recent innovations. Related ideas play an important role in Spielman and Teng’s work [ST04] on fast algorithms for solving linear systems.

Subsampling of Data. In large-scale machine learning, one may need to subsample data randomly to reduce the computational costs of fitting a model. For instance, we can combine random sampling with the Nyström decomposition to obtain a randomized approximation of a kernel matrix. This method was introduced by Williams & Seeger [WS01]. The paper [DM05] provides the first theoretical analysis, and the survey [GM14] contains more complete results.

Dimension Reduction. A basic template in the theory of algorithms invokes randomized projection to reduce the dimension of a computational problem. Many types of dimension reduction are based on properties of random matrices. The two papers [JL84, Bou85] established the mathematical foundations of this approach. The earliest applications in computer science appear in the work [LLR95]. Many contemporary variants depend on ideas from [AC09] and [CW13].

Combinatorial Optimization. One approach to solving a computationally difficult optimization problem is to relax (i.e., enlarge) the constraint set so the problem becomes tractable, to solve the relaxed problem, and then to use a randomized procedure to map the solution back to the original constraint set [BTN01, §4.3]. This technique is called relaxation and rounding. For hard optimization problems involving a matrix variable, the analysis of the rounding procedure often involves ideas from random matrix theory [So09, NRV13].

Compressed Sensing. When acquiring data about an object with relatively few degrees of freedom as compared with the ambient dimension, we may be able to sieve out the important information from the object by taking a small number of random measurements, where the number of measurements is comparable to the number of degrees of freedom [GGI+02, CRT06, Don06]. This observation is now referred to as compressed sensing. Random matrices play a central role in the design and analysis of measurement procedures. For example, see [FR13, CRPW12, ALMT14, Tro14].

1.2.2 Modeling

Random matrices also appear as models for multivariate data or multivariate phenomena. By studying the properties of these models, we may hope to understand the typical behavior of a data-analysis algorithm or a physical system.

Sparse Approximation for Random Signals. Sparse approximation has become an important problem in statistics, signal processing, machine learning and other areas. One model for a “typical” sparse signal poses the assumption that the nonzero coefficients that generate the signal are chosen at random. When analyzing methods for identifying the sparse set of coefficients, we must study the behavior of a random column submatrix drawn from the model matrix [Tro08a, Tro08b].

Demixing of Structured Signals. In data analysis, it is common to encounter a mixture of two structured signals, and the goal is to extract the two signals using prior information about the structures. A common model for this problem assumes that the signals are randomly oriented with respect to each other, which means that it is usually possible to discriminate the underlying structures. Random orthogonal matrices arise in the analysis of estimation techniques for this problem [MT14, ALMT14, MT13].

Stochastic Block Model. One probabilistic framework for describing community structure in a network assumes that each pair of individuals in the same community has a relationship with high probability, while each pair of individuals drawn from different communities has a relationship with lower probability.
This is referred to as the stochastic block model [HLL83]. It is quite common to analyze algorithms for extracting community structure from data by positing that this model holds. See [ABH14] for a recent contribution, as well as a summary of the extensive literature.

High-Dimensional Data Analysis. More generally, random models are pervasive in the analysis of statistical estimation procedures for high-dimensional data. Random matrix theory plays a key role in this field [MKB79, Mui82, Kol11, BvdG11].

Wireless Communication. Random matrices are commonly used as models for wireless channels. See the book of Tulino and Verdú for more information [TV04].

In these examples, it is important to recognize that random models may not coincide very well with reality, but they allow us to get a sense of what might be possible in some generic cases.

1.2.3 Theoretical Aspects

Random matrices are frequently studied for their intrinsic mathematical interest. In some fields, they provide examples of striking phenomena. In other areas, they furnish counterexamples to “intuitive” conjectures. Here are a few disparate problems where random matrices play a role.

Combinatorics. An expander graph has the property that every small set of vertices has edges linking it to a large proportion of the vertices. The expansion property is closely related to the spectral behavior of the adjacency matrix of the graph. The easiest construction of an expander involves a random matrix argument [AS00, §9.2].

Numerical Analysis. For worst-case examples, the Gaussian elimination method for solving a linear system is not numerically stable. In practice, however, stability problems rarely arise. One explanation for this phenomenon is that, with high probability, a small random perturbation of any fixed matrix is well conditioned. As a consequence, it can be shown that Gaussian elimination is stable for most matrices [SST06].

High-Dimensional Geometry. Dvoretzky’s Theorem states that, when N is large, the unit ball of each N-dimensional Banach space has a slice of dimension n ≈ log N that is close to a Euclidean ball with dimension n. It turns out that a random slice of dimension n realizes this property [Mil71].
This result can be framed as a statement about spectral properties of a random matrix [Gor85].

Quantum Information Theory. Random matrices appear as counterexamples for a number of conjectures in quantum information theory. Here is one instance. In classical information theory, the total amount of information that we can transmit through a pair of channels equals the sum of the information we can send through each channel separately. It was conjectured that the same property holds for quantum channels. In fact, a pair of quantum channels can have strictly larger capacity than the sum of the capacities of the individual channels. This result depends on a random matrix construction [Has09]. See [HW08] for related work.
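
As a small illustration of the sparsification idea from Section 1.2.1 (zero out entries at random while rescaling the entries that remain), here is a minimal sketch in plain Python. It is not taken from the monograph or from the cited papers: the function name `sparsify`, the parameter `keep_prob`, and the use of a single uniform keep probability for every entry are assumptions made for illustration, and the cited work may use more refined, entry-dependent sampling schemes.

```python
import random


def sparsify(matrix, keep_prob, rng=None):
    """Return a random sparse proxy for `matrix` (a list of rows).

    Hypothetical helper for illustration only. Each entry is kept
    independently with probability `keep_prob` and rescaled by
    1 / keep_prob; otherwise it is replaced by zero. The rescaling makes
    the expectation of each proxy entry equal to the original entry.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must lie in (0, 1]")
    if rng is None:
        rng = random.Random()
    return [
        [a / keep_prob if rng.random() < keep_prob else 0.0 for a in row]
        for row in matrix
    ]


# Example input (made up): keep roughly half of the entries and double
# the survivors so that the proxy is unbiased entry by entry.
A = [[4.0, -2.0, 0.5], [1.0, 3.0, -6.0]]
proxy = sparsify(A, keep_prob=0.5, rng=random.Random(0))
for row in proxy:
    print(row)
```

Each entry of the proxy has the form a·(δ/p), where δ is a Bernoulli(p) indicator, so its expectation is a. The difference between the proxy and the original is therefore a sum of independent, zero-mean random matrices, one per entry, which is the kind of object the matrix concentration inequalities in this monograph are designed to control.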
