Lecture Notes in Mathematics 2033

Editors:
J.-M. Morel, Cachan
B. Teissier, Paris

Subseries: École d'Été de Probabilités de Saint-Flour

For further volumes: http://www.springer.com/series/304

Saint-Flour Probability Summer School

The Saint-Flour volumes are reflections of the courses given at the Saint-Flour Probability Summer School. Founded in 1971, this school is organised every year by the Laboratoire de Mathématiques (CNRS and Université Blaise Pascal, Clermont-Ferrand, France). It is intended for PhD students, teachers and researchers who are interested in probability theory, statistics, and in their applications.

The duration of each school is 13 days (it was 17 days up to 2005), and up to 70 participants can attend it. The aim is to provide, in three high-level courses, a comprehensive study of some fields in probability theory or Statistics. The lecturers are chosen by an international scientific board. The participants themselves also have the opportunity to give short lectures about their research work.

Participants are lodged and work in the same building, a former seminary built in the 18th century in the city of Saint-Flour, at an altitude of 900 m. The pleasant surroundings facilitate scientific discussion and exchange.

The Saint-Flour Probability Summer School is supported by:
– Université Blaise Pascal
– Centre National de la Recherche Scientifique (C.N.R.S.)
– Ministère délégué à l'Enseignement supérieur et à la Recherche

For more information, see back pages of the book and http://math.univ-bpclermont.fr/stflour/

Jean Picard
Summer School Chairman
Laboratoire de Mathématiques
Université Blaise Pascal
63177 Aubière Cedex
France

Vladimir Koltchinskii

Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems

École d'Été de Probabilités de Saint-Flour XXXVIII-2008

Prof. Vladimir Koltchinskii
Georgia Institute of Technology
School of Mathematics
Atlanta
USA

ISBN 978-3-642-22146-0
e-ISBN 978-3-642-22147-7
DOI 10.1007/978-3-642-22147-7
Springer Heidelberg Dordrecht London New York

Lecture Notes in Mathematics
ISSN print edition: 0075-8434
ISSN electronic edition: 1617-9692

Library of Congress Control Number: 2011934366

Mathematics Subject Classification (2010): 62J99, 62H12, 60B20, 60G99

© Springer-Verlag Berlin Heidelberg 2011

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
Cover design: deblik, Berlin

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The purpose of these lecture notes is to provide an introduction to the general theory of empirical risk minimization with an emphasis on excess risk bounds and oracle inequalities in penalized problems. In recent years, there have been new developments in this area motivated by the study of new classes of methods in machine learning such as large margin classification methods (boosting, kernel machines). The main probabilistic tools involved in the analysis of these problems are concentration and deviation inequalities by Talagrand along with other methods of empirical processes theory (symmetrization inequalities, contraction inequality for Rademacher sums, entropy and generic chaining bounds). Sparse recovery based on ℓ1-type penalization and low rank matrix recovery based on the nuclear norm penalization are other active areas of research, where the main problems can be stated in the framework of penalized empirical risk minimization, and concentration inequalities and empirical processes tools have proved to be very useful.

My interest in empirical processes started in the late 1970s and early 1980s. It was largely influenced by the work of Vapnik and Chervonenkis on the Glivenko–Cantelli problem and on empirical risk minimization in pattern recognition, and, especially, by the results of Dudley on uniform central limit theorems. Talagrand's concentration inequality, proved in the 1990s, was a major result with deep consequences in the theory of empirical processes and related areas of statistics, and it inspired many new approaches in the analysis of empirical risk minimization problems.

Over the last years, the work of many people has had a profound impact on my own research and on my view of the subject of these notes. I was lucky to work together with several of them and to have numerous conversations and email exchanges with many others.
I am especially thankful to Peter Bartlett, Lucien Birgé, Gilles Blanchard, Stephane Boucheron, Olivier Bousquet, Richard Dudley, Sara van de Geer, Evarist Giné, Gabor Lugosi, Pascal Massart, David Mason, Shahar Mendelson, Dmitry Panchenko, Alexandre Tsybakov, Aad van der Vaart, Jon Wellner and Joel Zinn.

I am thankful to the School of Mathematics, Georgia Institute of Technology and to the Department of Mathematics and Statistics, University of New Mexico, where most of my work for the past years has taken place.

The research described in these notes has been supported in part by NSF grants DMS-0304861, MSPA-MPS-0624841, DMS-0906880 and CCF-0808863.

I was working on the initial draft while visiting the Isaac Newton Institute for Mathematical Sciences in Cambridge in 2008. I am thankful to the Institute for its hospitality.

I am also very thankful to Jean-Yves Audibert, Kai Ni and Jon Wellner, who provided me with lists of typos and inconsistencies in the first draft.

I am very grateful to all the students and colleagues who attended the lectures in 2008 and whose questions motivated many of the changes I have made since then.

I am especially thankful to Jean Picard for all his efforts that make the Saint-Flour School truly unique.

Atlanta, February 2011
Vladimir Koltchinskii

Contents

1 Introduction
  1.1 Abstract Empirical Risk Minimization
  1.2 Excess Risk: Distribution Dependent Bounds
  1.3 Rademacher Processes and Data Dependent Bounds on Excess Risk
  1.4 Penalized Empirical Risk Minimization and Oracle Inequalities
  1.5 Concrete Empirical Risk Minimization Problems
  1.6 Sparse Recovery Problems
  1.7 Recovering Low Rank Matrices
2 Empirical and Rademacher Processes
  2.1 Symmetrization Inequalities
  2.2 Comparison Inequalities for Rademacher Sums
  2.3 Concentration Inequalities
  2.4 Exponential Bounds for Sums of Independent Random Matrices
  2.5 Further Comments
3 Bounding Expected Sup-Norms of Empirical and Rademacher Processes
  3.1 Gaussian and Subgaussian Processes, Metric Entropies and Generic Chaining Complexities
  3.2 Finite Classes of Functions
  3.3 Shattering Numbers and VC-classes of Sets
  3.4 Upper Entropy Bounds
  3.5 Lower Entropy Bounds
  3.6 Generic Chaining Complexities and Bounding Empirical Processes Indexed by F^2
  3.7 Function Classes in Hilbert Spaces
  3.8 Further Comments
4 Excess Risk Bounds
  4.1 Distribution Dependent Bounds and Ratio Bounds for Excess Risk
  4.2 Rademacher Complexities and Data Dependent Bounds on Excess Risk
  4.3 Further Comments
5 Examples of Excess Risk Bounds in Prediction Problems
  5.1 Regression with Quadratic Loss
  5.2 Empirical Risk Minimization with Convex Loss
  5.3 Binary Classification Problems
  5.4 Further Comments
6 Penalized Empirical Risk Minimization and Model Selection Problems
  6.1 Penalization in Monotone Families F_k
  6.2 Penalization by Empirical Risk Minima
  6.3 Linking Excess Risk and Variance in Penalization
  6.4 Further Comments
7 Linear Programming in Sparse Recovery
  7.1 Sparse Recovery and Neighborliness of Convex Polytopes
  7.2 Geometric Properties of the Dictionary
    7.2.1 Cones of Dominant Coordinates
    7.2.2 Restricted Isometry Constants and Related Characteristics
    7.2.3 Alignment Coefficients
  7.3 Sparse Recovery in Noiseless Problems
  7.4 The Dantzig Selector
  7.5 Further Comments
8 Convex Penalization in Sparse Recovery
  8.1 General Aspects of Convex Penalization
  8.2 ℓ1-Penalization and Oracle Inequalities
  8.3 Entropy Penalization and Sparse Recovery in Convex Hulls: Random Error Bounds
  8.4 Approximation Error Bounds, Alignment and Oracle Inequalities
  8.5 Further Comments
9 Low Rank Matrix Recovery: Nuclear Norm Penalization
  9.1 Geometric Parameters of Low Rank Recovery and Other Preliminaries
  9.2 Matrix Regression with Fixed Design
  9.3 Matrix Regression with Subgaussian Design
  9.4 Other Types of Design in Matrix Regression
  9.5 Further Comments
A Auxiliary Material
  A.1 Orlicz Norms
  A.2 Classical Exponential Inequalities
  A.3 Properties of ♯- and ♭-Transforms
  A.4 Some Notations and Facts in Linear Algebra
References
Index
Programme of the school
List of participants