
Machine Learning Refined: Foundations, Algorithms, and Applications PDF

301 Pages · 2016 · 30.92 MB · English

Preview Machine Learning Refined: Foundations, Algorithms, and Applications

Machine Learning Refined

Providing a unique approach to machine learning, this text contains fresh and intuitive, yet rigorous, descriptions of all fundamental concepts necessary to conduct research, build products, tinker, and play. By prioritizing geometric intuition, algorithmic thinking, and practical real-world applications in disciplines including computer vision, natural language processing, economics, neuroscience, recommender systems, physics, and biology, this text provides readers with both a lucid understanding of foundational material as well as the practical tools needed to solve real-world problems. With in-depth Python and MATLAB/OCTAVE-based computational exercises and a complete treatment of cutting-edge numerical optimization techniques, this is an essential resource for students and an ideal reference for researchers and practitioners working in machine learning, computer science, electrical engineering, signal processing, and numerical optimization.

Key features:

• A presentation built on lucid geometric intuition
• A unique treatment of state-of-the-art numerical optimization techniques
• A fused introduction to logistic regression and support vector machines
• Inclusion of feature design and learning as major topics
• An unparalleled presentation of advanced topics through the lens of function approximation
• A refined description of deep neural networks and kernel methods

Jeremy Watt received his PhD in Computer Science and Electrical Engineering from Northwestern University. His research interests lie in machine learning and computer vision, as well as numerical optimization.

Reza Borhani received his PhD in Computer Science and Electrical Engineering from Northwestern University. His research interests lie in the design and analysis of algorithms for problems in machine learning and computer vision.

Aggelos K. Katsaggelos is a professor and holder of the Joseph Cummings chair in the Department of Electrical Engineering and Computer Science at Northwestern University, where he also heads the Image and Video Processing Laboratory.

Machine Learning Refined: Foundations, Algorithms, and Applications
JEREMY WATT, REZA BORHANI, AND AGGELOS K. KATSAGGELOS
Northwestern University

University Printing House, Cambridge CB2 8BS, United Kingdom
Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781107123526

© Cambridge University Press 2016

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2016
Printed in the United Kingdom by Clays, St Ives plc
A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Names: Watt, Jeremy, author. | Borhani, Reza. | Katsaggelos, Aggelos Konstantinos, 1956-
Title: Machine learning refined: foundations, algorithms, and applications / Jeremy Watt, Reza Borhani, Aggelos Katsaggelos.
Description: New York: Cambridge University Press, 2016.
Identifiers: LCCN 2015041122 | ISBN 9781107123526 (hardback)
Subjects: LCSH: Machine learning.
Classification: LCC Q325.5.W38 2016 | DDC 006.3/1–dc23
LC record available at http://lccn.loc.gov/2015041122

ISBN 978-1-107-12352-6 Hardback

Additional resources for this publication at www.cambridge.org/watt

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Preface

1 Introduction
  1.1 Teaching a computer to distinguish cats from dogs
    1.1.1 The pipeline of a typical machine learning problem
  1.2 Predictive learning problems
    1.2.1 Regression
    1.2.2 Classification
  1.3 Feature design
  1.4 Numerical optimization
  1.5 Summary

Part I Fundamental tools and concepts

2 Fundamentals of numerical optimization
  2.1 Calculus-defined optimality
    2.1.1 Taylor series approximations
    2.1.2 The first order condition for optimality
    2.1.3 The convenience of convexity
  2.2 Numerical methods for optimization
    2.2.1 The big picture
    2.2.2 Stopping condition
    2.2.3 Gradient descent
    2.2.4 Newton's method
  2.3 Summary
  2.4 Exercises

3 Regression
  3.1 The basics of linear regression
    3.1.1 Notation and modeling
    3.1.2 The Least Squares cost function for linear regression
    3.1.3 Minimization of the Least Squares cost function
    3.1.4 The efficacy of a learned model
    3.1.5 Predicting the value of new input data
  3.2 Knowledge-driven feature design for regression
    3.2.1 General conclusions
  3.3 Nonlinear regression and ℓ2 regularization
    3.3.1 Logistic regression
    3.3.2 Non-convex cost functions and ℓ2 regularization
  3.4 Summary
  3.5 Exercises

4 Classification
  4.1 The perceptron cost functions
    4.1.1 The basic perceptron model
    4.1.2 The softmax cost function
    4.1.3 The margin perceptron
    4.1.4 Differentiable approximations to the margin perceptron
    4.1.5 The accuracy of a learned classifier
    4.1.6 Predicting the value of new input data
    4.1.7 Which cost function produces the best results?
    4.1.8 The connection between the perceptron and counting costs
  4.2 The logistic regression perspective on the softmax cost
    4.2.1 Step functions and classification
    4.2.2 Convex logistic regression
  4.3 The support vector machine perspective on the margin perceptron
    4.3.1 A quest for the hyperplane with maximum margin
    4.3.2 The hard-margin SVM problem
    4.3.3 The soft-margin SVM problem
    4.3.4 Support vector machines and logistic regression
  4.4 Multiclass classification
    4.4.1 One-versus-all multiclass classification
    4.4.2 Multiclass softmax classification
    4.4.3 The accuracy of a learned multiclass classifier
    4.4.4 Which multiclass classification scheme works best?
  4.5 Knowledge-driven feature design for classification
    4.5.1 General conclusions
  4.6 Histogram features for real data types
    4.6.1 Histogram features for text data
    4.6.2 Histogram features for image data
    4.6.3 Histogram features for audio data
  4.7 Summary
  4.8 Exercises

Part II Tools for fully data-driven machine learning

5 Automatic feature design for regression
  5.1 Automatic feature design for the ideal regression scenario
    5.1.1 Vector approximation
    5.1.2 From vectors to continuous functions
    5.1.3 Continuous function approximation
    5.1.4 Common bases for continuous function approximation
    5.1.5 Recovering weights
    5.1.6 Graphical representation of a neural network
  5.2 Automatic feature design for the real regression scenario
    5.2.1 Approximation of discretized continuous functions
    5.2.2 The real regression scenario
  5.3 Cross-validation for regression
    5.3.1 Diagnosing the problem of overfitting/underfitting
    5.3.2 Holdout cross-validation
    5.3.3 Holdout calculations
    5.3.4 k-fold cross-validation
  5.4 Which basis works best?
    5.4.1 Understanding of the phenomenon underlying the data
    5.4.2 Practical considerations
    5.4.3 When the choice of basis is arbitrary
  5.5 Summary
  5.6 Exercises
  5.7 Notes on continuous function approximation

6 Automatic feature design for classification
  6.1 Automatic feature design for the ideal classification scenario
    6.1.1 Approximation of piecewise continuous functions
    6.1.2 The formal definition of an indicator function
    6.1.3 Indicator function approximation
    6.1.4 Recovering weights
  6.2 Automatic feature design for the real classification scenario
    6.2.1 Approximation of discretized indicator functions
    6.2.2 The real classification scenario
    6.2.3 Classifier accuracy and boundary definition
  6.3 Multiclass classification
    6.3.1 One-versus-all multiclass classification
    6.3.2 Multiclass softmax classification
  6.4 Cross-validation for classification
    6.4.1 Holdout cross-validation
    6.4.2 Holdout calculations
    6.4.3 k-fold cross-validation
    6.4.4 k-fold cross-validation for one-versus-all multiclass classification
  6.5 Which basis works best?
  6.6 Summary
  6.7 Exercises

7 Kernels, backpropagation, and regularized cross-validation
  7.1 Fixed feature kernels
    7.1.1 The fundamental theorem of linear algebra
    7.1.2 Kernelizing cost functions
    7.1.3 The value of kernelization
    7.1.4 Examples of kernels
    7.1.5 Kernels as similarity matrices
  7.2 The backpropagation algorithm
    7.2.1 Computing the gradient of a two layer network cost function
    7.2.2 Three layer neural network gradient calculations
    7.2.3 Gradient descent with momentum
  7.3 Cross-validation via ℓ2 regularization
    7.3.1 ℓ2 regularization and cross-validation
    7.3.2 Regularized k-fold cross-validation for regression
    7.3.3 Regularized cross-validation for classification
  7.4 Summary
  7.5 Further kernel calculations
    7.5.1 Kernelizing various cost functions
    7.5.2 Fourier kernel calculations – scalar input
    7.5.3 Fourier kernel calculations – vector input

Part III Methods for large scale machine learning

8 Advanced gradient schemes
  8.1 Fixed step length rules for gradient descent
    8.1.1 Gradient descent and simple quadratic surrogates
    8.1.2 Functions with bounded curvature and optimally conservative step length rules
    8.1.3 How to use the conservative fixed step length rule
  8.2 Adaptive step length rules for gradient descent
    8.2.1 Adaptive step length rule via backtracking line search
    8.2.2 How to use the adaptive step length rule
  8.3 Stochastic gradient descent
    8.3.1 Decomposing the gradient
    8.3.2 The stochastic gradient descent iteration
    8.3.3 The value of stochastic gradient descent
    8.3.4 Step length rules for stochastic gradient descent
    8.3.5 How to use the stochastic gradient method in practice
  8.4 Convergence proofs for gradient descent schemes
    8.4.1 Convergence of gradient descent with Lipschitz constant fixed step length
    8.4.2 Convergence of gradient descent with backtracking line search
    8.4.3 Convergence of the stochastic gradient method
    8.4.4 Convergence rate of gradient descent for convex functions with fixed step length
  8.5 Calculation of computable Lipschitz constants
  8.6 Summary
  8.7 Exercises

9 Dimension reduction techniques
  9.1 Techniques for data dimension reduction
    9.1.1 Random subsampling
    9.1.2 K-means clustering
    9.1.3 Optimization of the K-means problem
  9.2 Principal component analysis
    9.2.1 Optimization of the PCA problem
  9.3 Recommender systems
    9.3.1 Matrix completion setup
    9.3.2 Optimization of the matrix completion model
  9.4 Summary
  9.5 Exercises

Part IV Appendices

A Basic vector and matrix operations
  A.1 Vector operations
  A.2 Matrix operations

B Basics of vector calculus
  B.1 Basic definitions
  B.2 Commonly used rules for computing derivatives
  B.3 Examples of gradient and Hessian calculations

C Fundamental matrix factorizations and the pseudo-inverse
  C.1 Fundamental matrix factorizations
    C.1.1 The singular value decomposition
    C.1.2 Eigenvalue decomposition
    C.1.3 The pseudo-inverse

