Learning Kernel Classifiers: Theory and Algorithms

371 Pages·2001·2.69 MB·English

Learning Kernel Classifiers

Adaptive Computation and Machine Learning
Thomas G. Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors

Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
Graphical Models for Machine Learning and Digital Communication, Brendan J. Frey
Learning in Graphical Models, Michael I. Jordan
Causation, Prediction, and Search, second edition, Peter Spirtes, Clark Glymour, and Richard Scheines
Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth
Bioinformatics: The Machine Learning Approach, second edition, Pierre Baldi and Søren Brunak
Learning Kernel Classifiers: Theory and Algorithms, Ralf Herbrich
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Schölkopf and Alexander J. Smola

Learning Kernel Classifiers
Theory and Algorithms
Ralf Herbrich

The MIT Press
Cambridge, Massachusetts
London, England

© 2002 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Times Roman by the author using the LaTeX document preparation system and was printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data
Herbrich, Ralf.
Learning kernel classifiers: theory and algorithms / Ralf Herbrich.
p. cm. – (Adaptive computation and machine learning)
Includes bibliographical references and index.
ISBN 0-262-08306-X (hc.: alk. paper)
1. Machine learning. 2. Algorithms. I. Title. II. Series.
Q325.5 .H48 2001
006.3′1–dc21 2001044445

To my wife, Jeannette

There are many branches of learning theory that have not yet been analyzed and that are important both for understanding the phenomenon of learning and for practical applications. They are waiting for their researchers.
– Vladimir Vapnik

Geometry is illuminating; probability theory is powerful.
– Pál Ruján

Contents

Series Foreword
Preface
1 Introduction
  1.1 The Learning Problem and (Statistical) Inference
    1.1.1 Supervised Learning
    1.1.2 Unsupervised Learning
    1.1.3 Reinforcement Learning
  1.2 Learning Kernel Classifiers
  1.3 The Purposes of Learning Theory
I LEARNING ALGORITHMS
2 Kernel Classifiers from a Machine Learning Perspective
  2.1 The Basic Setting
  2.2 Learning by Risk Minimization
    2.2.1 The (Primal) Perceptron Algorithm
    2.2.2 Regularized Risk Functionals
  2.3 Kernels and Linear Classifiers
    2.3.1 The Kernel Technique
    2.3.2 Kernel Families
    2.3.3 The Representer Theorem
  2.4 Support Vector Classification Learning
    2.4.1 Maximizing the Margin
    2.4.2 Soft Margins – Learning with Training Error
    2.4.3 Geometrical Viewpoints on Margin Maximization
    2.4.4 The ν–Trick and Other Variants
  2.5 Adaptive Margin Machines
    2.5.1 Assessment of Learning Algorithms
    2.5.2 Leave-One-Out Machines
    2.5.3 Pitfalls of Minimizing a Leave-One-Out Bound
    2.5.4 Adaptive Margin Machines
  2.6 Bibliographical Remarks
3 Kernel Classifiers from a Bayesian Perspective
  3.1 The Bayesian Framework
    3.1.1 The Power of Conditioning on Data
  3.2 Gaussian Processes
    3.2.1 Bayesian Linear Regression
    3.2.2 From Regression to Classification
  3.3 The Relevance Vector Machine
  3.4 Bayes Point Machines
    3.4.1 Estimating the Bayes Point
  3.5 Fisher Discriminants
  3.6 Bibliographical Remarks
II LEARNING THEORY
4 Mathematical Models of Learning
  4.1 Generative vs. Discriminative Models
  4.2 PAC and VC Frameworks
    4.2.1 Classical PAC and VC Analysis
    4.2.2 Growth Function and VC Dimension
    4.2.3 Structural Risk Minimization
  4.3 The Luckiness Framework
  4.4 PAC and VC Frameworks for Real-Valued Classifiers
    4.4.1 VC Dimensions for Real-Valued Function Classes
    4.4.2 The PAC Margin Bound
    4.4.3 Robust Margin Bounds
  4.5 Bibliographical Remarks
5 Bounds for Specific Algorithms
  5.1 The PAC-Bayesian Framework
    5.1.1 PAC-Bayesian Bounds for Bayesian Algorithms
    5.1.2 A PAC-Bayesian Margin Bound
  5.2 Compression Bounds
    5.2.1 Compression Schemes and Generalization Error
    5.2.2 On-line Learning and Compression Schemes
  5.3 Algorithmic Stability Bounds
    5.3.1 Algorithmic Stability for Regression
    5.3.2 Algorithmic Stability for Classification
  5.4 Bibliographical Remarks
III APPENDICES
A Theoretical Background and Basic Inequalities
  A.1 Notation
  A.2 Probability Theory
    A.2.1 Some Results for Random Variables
    A.2.2 Families of Probability Measures
  A.3 Functional Analysis and Linear Algebra
    A.3.1 Covering, Packing and Entropy Numbers
    A.3.2 Matrix Algebra
  A.4 Ill-Posed Problems
  A.5 Basic Inequalities
    A.5.1 General (In)equalities
    A.5.2 Large Deviation Bounds
B Proofs and Derivations – Part I
  B.1 Functions of Kernels
  B.2 Efficient Computation of String Kernels
    B.2.1 Efficient Computation of the Substring Kernel
    B.2.2 Efficient Computation of the Subsequence Kernel
  B.3 Representer Theorem
  B.4 Convergence of the Perceptron
