Introduction to Machine Learning
Second Edition

Adaptive Computation and Machine Learning
Thomas Dietterich, Editor
Christopher Bishop, David Heckerman, Michael Jordan, and Michael Kearns, Associate Editors

A complete list of books published in The Adaptive Computation and Machine Learning series appears at the back of this book.

Ethem Alpaydın

The MIT Press
Cambridge, Massachusetts
London, England

© 2010 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

For information about special quantity discounts, please email [email protected].

Typeset in 10/13 Lucida Bright by the author using LaTeX2ε.
Printed and bound in the United States of America.

Library of Congress Cataloging-in-Publication Information

Alpaydin, Ethem.
Introduction to machine learning / Ethem Alpaydin. —2nd ed.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-262-01243-0 (hardcover : alk. paper)
1. Machine learning. I. Title
Q325.5.A46 2010
006.3’1—dc22
2009013169 CIP

10 9 8 7 6 5 4 3 2 1

Brief Contents

1 Introduction  1
2 Supervised Learning  21
3 Bayesian Decision Theory  47
4 Parametric Methods  61
5 Multivariate Methods  87
6 Dimensionality Reduction  109
7 Clustering  143
8 Nonparametric Methods  163
9 Decision Trees  185
10 Linear Discrimination  209
11 Multilayer Perceptrons  233
12 Local Models  279
13 Kernel Machines  309
14 Bayesian Estimation  341
15 Hidden Markov Models  363
16 Graphical Models  387
17 Combining Multiple Learners  419
18 Reinforcement Learning  447
19 Design and Analysis of Machine Learning Experiments  475
A Probability  517

Contents

Series Foreword  xvii
Figures  xix
Tables  xxix
Preface  xxxi
Acknowledgments  xxxiii
Notes for the Second Edition  xxxv
Notations  xxxix

1 Introduction  1
  1.1 What Is Machine Learning?  1
  1.2 Examples of Machine Learning Applications  4
    1.2.1 Learning Associations  4
    1.2.2 Classification  5
    1.2.3 Regression  9
    1.2.4 Unsupervised Learning  11
    1.2.5 Reinforcement Learning  13
  1.3 Notes  14
  1.4 Relevant Resources  16
  1.5 Exercises  18
  1.6 References  19

2 Supervised Learning  21
  2.1 Learning a Class from Examples  21
  2.2 Vapnik-Chervonenkis (VC) Dimension  27
  2.3 Probably Approximately Correct (PAC) Learning  29
  2.4 Noise  30
  2.5 Learning Multiple Classes  32
  2.6 Regression  34
  2.7 Model Selection and Generalization  37
  2.8 Dimensions of a Supervised Machine Learning Algorithm  41
  2.9 Notes  42
  2.10 Exercises  43
  2.11 References  44

3 Bayesian Decision Theory  47
  3.1 Introduction  47
  3.2 Classification  49
  3.3 Losses and Risks  51
  3.4 Discriminant Functions  53
  3.5 Utility Theory  54
  3.6 Association Rules  55
  3.7 Notes  58
  3.8 Exercises  58
  3.9 References  59

4 Parametric Methods  61
  4.1 Introduction  61
  4.2 Maximum Likelihood Estimation  62
    4.2.1 Bernoulli Density  63
    4.2.2 Multinomial Density  64
    4.2.3 Gaussian (Normal) Density  64
  4.3 Evaluating an Estimator: Bias and Variance  65
  4.4 The Bayes’ Estimator  66
  4.5 Parametric Classification  69
  4.6 Regression  73
  4.7 Tuning Model Complexity: Bias/Variance Dilemma  76
  4.8 Model Selection Procedures  80
  4.9 Notes  84
  4.10 Exercises  84
  4.11 References  85

5 Multivariate Methods  87
  5.1 Multivariate Data  87
  5.2 Parameter Estimation  88
  5.3 Estimation of Missing Values  89
  5.4 Multivariate Normal Distribution  90
  5.5 Multivariate Classification  94
  5.6 Tuning Complexity  99
  5.7 Discrete Features  102
  5.8 Multivariate Regression  103
  5.9 Notes  105
  5.10 Exercises  106
  5.11 References  107

6 Dimensionality Reduction  109
  6.1 Introduction  109
  6.2 Subset Selection  110
  6.3 Principal Components Analysis  113
  6.4 Factor Analysis  120
  6.5 Multidimensional Scaling  125
  6.6 Linear Discriminant Analysis  128
  6.7 Isomap  133
  6.8 Locally Linear Embedding  135
  6.9 Notes  138
  6.10 Exercises  139
  6.11 References  140

7 Clustering  143
  7.1 Introduction  143
  7.2 Mixture Densities  144
  7.3 k-Means Clustering  145
  7.4 Expectation-Maximization Algorithm  149
  7.5 Mixtures of Latent Variable Models  154
  7.6 Supervised Learning after Clustering  155
  7.7 Hierarchical Clustering  157
  7.8 Choosing the Number of Clusters  158
  7.9 Notes  160
  7.10 Exercises  160
  7.11 References  161

8 Nonparametric Methods  163
  8.1 Introduction  163
  8.2 Nonparametric Density Estimation  165
    8.2.1 Histogram Estimator  165
    8.2.2 Kernel Estimator  167
    8.2.3 k-Nearest Neighbor Estimator  168
  8.3 Generalization to Multivariate Data  170
  8.4 Nonparametric Classification  171
  8.5 Condensed Nearest Neighbor  172
  8.6 Nonparametric Regression: Smoothing Models  174
    8.6.1 Running Mean Smoother  175
    8.6.2 Kernel Smoother  176
    8.6.3 Running Line Smoother  177
  8.7 How to Choose the Smoothing Parameter  178
  8.8 Notes  180
  8.9 Exercises  181
  8.10 References  182

9 Decision Trees  185
  9.1 Introduction  185
  9.2 Univariate Trees  187
    9.2.1 Classification Trees  188
    9.2.2 Regression Trees  192
  9.3 Pruning  194
  9.4 Rule Extraction from Trees  197
  9.5 Learning Rules from Data  198
  9.6 Multivariate Trees  202
  9.7 Notes  204
  9.8 Exercises  207
  9.9 References  207

10 Linear Discrimination  209
  10.1 Introduction  209
  10.2 Generalizing the Linear Model  211
  10.3 Geometry of the Linear Discriminant  212
    10.3.1 Two Classes  212
    10.3.2 Multiple Classes  214
  10.4 Pairwise Separation  216
  10.5 Parametric Discrimination Revisited  217
  10.6 Gradient Descent  218
  10.7 Logistic Discrimination  220
    10.7.1 Two Classes  220
Description: