Boosting: Foundations and Algorithms

Adaptive Computation and Machine Learning
Thomas Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors

A complete list of the books published in this series may be found at the back of the book.

Robert E. Schapire
Yoav Freund

The MIT Press
Cambridge, Massachusetts
London, England

© 2012 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please contact the publisher.

This book was set in Times Roman by Westchester Book Composition.
Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Schapire, Robert E.
Boosting: foundations and algorithms / Robert E. Schapire and Yoav Freund.
p. cm. (Adaptive computation and machine learning series)
Includes bibliographical references and index.
ISBN 978-0-262-01718-3 (hardcover: alk. paper)
1. Boosting (Algorithms) 2. Supervised learning (Machine learning) I. Freund, Yoav. II. Title.
Q325.75.S33 2012
006.3'1-dc23
2011038972

To our families

Contents

Series Foreword xi
Preface xiii

1 Introduction and Overview 1
  1.1 Classification Problems and Machine Learning 2
  1.2 Boosting 4
  1.3 Resistance to Overfitting and the Margins Theory 14
  1.4 Foundations and Algorithms 17
  Summary 19
  Bibliographic Notes 19
  Exercises 20

I CORE ANALYSIS 21

2 Foundations of Machine Learning 23
  2.1 A Direct Approach to Machine Learning 24
  2.2 General Methods of Analysis 30
  2.3 A Foundation for the Study of Boosting Algorithms 43
  Summary 49
  Bibliographic Notes 49
  Exercises 50

3 Using AdaBoost to Minimize Training Error 53
  3.1 A Bound on AdaBoost’s Training Error 54
  3.2 A Sufficient Condition for Weak Learnability 56
  3.3 Relation to Chernoff Bounds 60
  3.4 Using and Designing Base Learning Algorithms 62
  Summary 70
  Bibliographic Notes 71
  Exercises 71

4 Direct Bounds on the Generalization Error 75
  4.1 Using VC Theory to Bound the Generalization Error 75
  4.2 Compression-Based Bounds 83
  4.3 The Equivalence of Strong and Weak Learnability 86
  Summary 88
  Bibliographic Notes 89
  Exercises 89

5 The Margins Explanation for Boosting’s Effectiveness 93
  5.1 Margin as a Measure of Confidence 94
  5.2 A Margins-Based Analysis of the Generalization Error 97
  5.3 Analysis Based on Rademacher Complexity 106
  5.4 The Effect of Boosting on Margin Distributions 111
  5.5 Bias, Variance, and Stability 117
  5.6 Relation to Support-Vector Machines 122
  5.7 Practical Applications of Margins 128
  Summary 132
  Bibliographic Notes 132
  Exercises 134

II FUNDAMENTAL PERSPECTIVES 139

6 Game Theory, Online Learning, and Boosting 141
  6.1 Game Theory 142
  6.2 Learning in Repeated Game Playing 145
  6.3 Online Prediction 153
  6.4 Boosting 157
  6.5 Application to a “Mind-Reading” Game 163
  Summary 169
  Bibliographic Notes 169
  Exercises 170

7 Loss Minimization and Generalizations of Boosting 175
  7.1 AdaBoost’s Loss Function 177
  7.2 Coordinate Descent 179
  7.3 Loss Minimization Cannot Explain Generalization 184
  7.4 Functional Gradient Descent 188
  7.5 Logistic Regression and Conditional Probabilities 194
  7.6 Regularization 202
  7.7 Applications to Data-Limited Learning 211
  Summary 219
  Bibliographic Notes 219
  Exercises 220

8 Boosting, Convex Optimization, and Information Geometry 227
  8.1 Iterative Projection Algorithms 228
  8.2 Proving the Convergence of AdaBoost 243
  8.3 Unification with Logistic Regression 252
  8.4 Application to Species Distribution Modeling 255
  Summary 260
  Bibliographic Notes 262
  Exercises 263

III ALGORITHMIC EXTENSIONS 269

9 Using Confidence-Rated Weak Predictions 271
  9.1 The Framework 273
  9.2 General Methods for Algorithm Design 275
  9.3 Learning Rule-Sets 287
  9.4 Alternating Decision Trees 290
  Summary 296
  Bibliographic Notes 297
  Exercises 297

10 Multiclass Classification Problems 303
  10.1 A Direct Extension to the Multiclass Case 305
  10.2 The One-against-All Reduction and Multi-label Classification 310
  10.3 Application to Semantic Classification 316
  10.4 General Reductions Using Output Codes 320
  Summary 333
  Bibliographic Notes 333
  Exercises 334

11 Learning to Rank 341
  11.1 A Formal Framework for Ranking Problems 342
  11.2 A Boosting Algorithm for the Ranking Task 345
  11.3 Methods for Improving Efficiency 351
  11.4 Multiclass, Multi-label Classification 361
  11.5 Applications 364
  Summary 367
  Bibliographic Notes 369
  Exercises 369