
Learning with kernels: support vector machines, regularization, optimization, and beyond PDF

645 pages · 2002 · 13.61 MB · English

Preview Learning with kernels: support vector machines, regularization, optimization, and beyond

Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond
Bernhard Schölkopf and Alexander J. Smola

Adaptive Computation and Machine Learning series. Thomas Dietterich, Editor; Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors.

Other titles in the series:
Bioinformatics: The Machine Learning Approach, Pierre Baldi and Søren Brunak
Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
Graphical Models for Machine Learning and Digital Communication, Brendan J. Frey
Learning in Graphical Models, Michael I. Jordan
Causation, Prediction, and Search, second edition, Peter Spirtes, Clark Glymour, and Richard Scheines
Principles of Data Mining, David Hand, Heikki Mannila, and Padhraic Smyth
Bioinformatics: The Machine Learning Approach, second edition, Pierre Baldi and Søren Brunak
Learning Kernel Classifiers: Theory and Algorithms, Ralf Herbrich
Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Bernhard Schölkopf and Alexander J. Smola

The MIT Press, Cambridge, Massachusetts; London, England. © 2002 Massachusetts Institute of Technology. All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. Typeset by the authors using LaTeX 2e. Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Data: Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond / by Bernhard Schölkopf, Alexander J. Smola. Includes bibliographical references and index. ISBN 0-262-19475-9 (alk. paper). 1. Machine learning. 2. Algorithms. 3. Kernel functions. I. Schölkopf, Bernhard. II. Smola, Alexander J.

To our parents

Contents

Series Foreword  xiii
Preface  xv

1  A Tutorial Introduction  1
   1.1  Data Representation and Similarity  1
   1.2  A Simple Pattern Recognition Algorithm  4
   1.3  Some Insights From Statistical Learning Theory  6
   1.4  Hyperplane Classifiers  11
   1.5  Support Vector Classification  15
   1.6  Support Vector Regression  17
   1.7  Kernel Principal Component Analysis  19
   1.8  Empirical Results and Implementations  21

I  CONCEPTS AND TOOLS  23

2  Kernels  25
   2.1  Product Features  26
   2.2  The Representation of Similarities in Linear Spaces  29
   2.3  Examples and Properties of Kernels  45
   2.4  The Representation of Dissimilarities in Linear Spaces  48
   2.5  Summary  55
   2.6  Problems  55

3  Risk and Loss Functions  61
   3.1  Loss Functions  62
   3.2  Test Error and Expected Risk  65
   3.3  A Statistical Perspective  68
   3.4  Robust Estimators  75
   3.5  Summary  83
   3.6  Problems  84

4  Regularization  87
   4.1  The Regularized Risk Functional  88
   4.2  The Representer Theorem  89
   4.3  Regularization Operators  92
   4.4  Translation Invariant Kernels  96
   4.5  Translation Invariant Kernels in Higher Dimensions  105
   4.6  Dot Product Kernels  110
   4.7  Multi-Output Regularization  113
   4.8  Semiparametric Regularization  115
   4.9  Coefficient Based Regularization  118
   4.10  Summary  121
   4.11  Problems  122

5  Elements of Statistical Learning Theory  125
   5.1  Introduction  125
   5.2  The Law of Large Numbers  128
   5.3  When Does Learning Work: the Question of Consistency  131
   5.4  Uniform Convergence and Consistency  131
   5.5  How to Derive a VC Bound  134
   5.6  A Model Selection Example  144
   5.7  Summary  146
   5.8  Problems  146

6  Optimization  149
   6.1  Convex Optimization  150
   6.2  Unconstrained Problems  154
   6.3  Constrained Problems  165
   6.4  Interior Point Methods  175
   6.5  Maximum Search Problems  179
   6.6  Summary  183
   6.7  Problems  184

II  SUPPORT VECTOR MACHINES  187

7  Pattern Recognition  189
   7.1  Separating Hyperplanes  189
   7.2  The Role of the Margin  192
   7.3  Optimal Margin Hyperplanes  196
   7.4  Nonlinear Support Vector Classifiers  200
   7.5  Soft Margin Hyperplanes  204
   7.6  Multi-Class Classification  211
   7.7  Variations on a Theme  214
   7.8  Experiments  215
   7.9  Summary  222
   7.10  Problems  222

8  Single-Class Problems: Quantile Estimation and Novelty Detection  227
   8.1  Introduction  228
   8.2  A Distribution's Support and Quantiles  229
   8.3  Algorithms  230
   8.4  Optimization  234
   8.5  Theory  236
   8.6  Discussion  241
   8.7  Experiments  243
   8.8  Summary  247
   8.9  Problems  248

9  Regression Estimation  251
   9.1  Linear Regression with Insensitive Loss Function  251
   9.2  Dual Problems  254
   9.3  ν-SV Regression  260
   9.4  Convex Combinations and ℓ1-Norms  266
   9.5  Parametric Insensitivity Models  269
   9.6  Applications  272
   9.7  Summary  273
   9.8  Problems  274

10  Implementation  279
   10.1  Tricks of the Trade  281
   10.2  Sparse Greedy Matrix Approximation  288
   10.3  Interior Point Algorithms  295
   10.4  Subset Selection Methods  300
   10.5  Sequential Minimal Optimization  305
   10.6  Iterative Methods  312
   10.7  Summary  327
   10.8  Problems  329

11  Incorporating Invariances  333
   11.1  Prior Knowledge  333
   11.2  Transformation Invariance  335
   11.3  The Virtual SV Method  337
   11.4  Constructing Invariance Kernels  343
   11.5  The Jittered SV Method  354
   11.6  Summary  356
   11.7  Problems  357

12  Learning Theory Revisited  359
   12.1  Concentration of Measure Inequalities  360
   12.2  Leave-One-Out Estimates  366
   12.3  PAC-Bayesian Bounds  381
   12.4  Operator-Theoretic Methods in Learning Theory  391

Description:
In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks.
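The kernel concept the description refers to (and which the book's Chapter 2, "Product Features," introduces) can be sketched with a toy example. The code below is an illustration, not code from the book: it shows that the polynomial kernel k(x, y) = (x · y)² equals an ordinary dot product between the explicit degree-2 product features of x and y, so the feature space never has to be built.

```python
# A minimal sketch of the "kernel trick": evaluating the polynomial
# kernel k(x, y) = (x . y)^2 directly in input space gives the same
# value as a dot product between explicit degree-2 product features.
import itertools


def poly_kernel(x, y):
    """Kernel evaluated directly in input space."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def product_features(x):
    """Explicit feature map: all ordered degree-2 monomials x_i * x_j."""
    return [a * b for a, b in itertools.product(x, x)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
print(poly_kernel(x, y))                              # 20.25
print(dot(product_features(x), product_features(y)))  # 20.25, same value
```

For inputs with many components, the explicit feature vector grows quadratically while the kernel evaluation stays linear in the input dimension, which is why kernels make such feature spaces computationally tractable.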
