D D The fundamental mathematical tools needed to understand machine E e I ise learning include linear algebra, analytic geometry, matrix decompositions, SE n N rit vector calculus, optimization, probability and statistics. These topics R MATHEMATICS h O et are traditionally taught in disparate courses, making it hard for data T FOR a H l. 9 science or computer science students, or professionals, to effi ciently learn E 7 T 8 the mathematics. This self-contained textbook bridges the gap between 1 A 1 0 mathematical and machine learning texts, introducing the mathematical L 8 . 4 5 concepts with a minimum of prerequisites. It uses these concepts to MACHINE LEAR NING 5 1 M 4 derive four central machine learning methods: linear regression, principal 5 C A o component analysis, Gaussian mixture models and support vector machines. v T e r For students and others with a mathematical background, these derivations H . C E M provide a starting point to machine learning texts. For those learning the M Y mathematics for the fi rst time, the methods help build intuition and practical A K T experience with applying mathematical concepts. Every chapter includes I C worked examples and exercises to test understanding. Programming S tutorials are offered on the book’s web site. F O R MARC PETER DEISENROTH is Senior Lecturer in Statistical Machine M Learning at the Department of Computing, Împerial College London. A C H A. ALDO FAISAL leads the Brain & Behaviour Lab at Imperial College I London, where he is also Reader in Neurotechnology at the Department of N E Bioengineering and the Department of Computing. L E A CHENG SOON ONG is Principal Research Scientist at the Machine Learning R Research Group, Data61, CSIRO. He is also Adjunct Associate Professor at N I Australian National University. N G Marc Peter Deisenroth A. Aldo Faisal Cheng Soon Ong Cover image courtesy of Daniel Bosma / Moment / Getty Images Cover design by Holly Johnson Contents Foreword 1 Part I Mathematical Foundations 9 1 IntroductionandMotivation 11 1.1 FindingWordsforIntuitions 12 1.2 TwoWaystoReadThisBook 13 1.3 ExercisesandFeedback 16 2 LinearAlgebra 17 2.1 SystemsofLinearEquations 19 2.2 Matrices 22 2.3 SolvingSystemsofLinearEquations 27 2.4 VectorSpaces 35 2.5 LinearIndependence 40 2.6 BasisandRank 44 2.7 LinearMappings 48 2.8 AffineSpaces 61 2.9 FurtherReading 63 Exercises 63 3 AnalyticGeometry 70 3.1 Norms 71 3.2 InnerProducts 72 3.3 LengthsandDistances 75 3.4 AnglesandOrthogonality 76 3.5 OrthonormalBasis 78 3.6 OrthogonalComplement 79 3.7 InnerProductofFunctions 80 3.8 OrthogonalProjections 81 3.9 Rotations 91 3.10 FurtherReading 94 Exercises 95 4 MatrixDecompositions 98 4.1 DeterminantandTrace 99 i Draft(October15,2019)of“MathematicsforMachineLearning”(cid:13)c2019byM.P.Deisenroth,A. A.Faisal,andC.S.Ong.TobepublishedbyCambridgeUniversityPress.https://mml-book.com. ii Contents 4.2 EigenvaluesandEigenvectors 105 4.3 CholeskyDecomposition 114 4.4 EigendecompositionandDiagonalization 115 4.5 SingularValueDecomposition 119 4.6 MatrixApproximation 129 4.7 MatrixPhylogeny 134 4.8 FurtherReading 135 Exercises 137 5 VectorCalculus 139 5.1 DifferentiationofUnivariateFunctions 141 5.2 PartialDifferentiationandGradients 146 5.3 GradientsofVector-ValuedFunctions 149 5.4 GradientsofMatrices 155 5.5 UsefulIdentitiesforComputingGradients 158 5.6 BackpropagationandAutomaticDifferentiation 159 5.7 Higher-OrderDerivatives 164 5.8 LinearizationandMultivariateTaylorSeries 165 5.9 FurtherReading 170 Exercises 170 6 ProbabilityandDistributions 172 6.1 ConstructionofaProbabilitySpace 172 6.2 DiscreteandContinuousProbabilities 178 6.3 SumRule,ProductRule,andBayes’Theorem 183 6.4 SummaryStatisticsandIndependence 186 6.5 GaussianDistribution 197 6.6 ConjugacyandtheExponentialFamily 205 6.7 ChangeofVariables/InverseTransform 214 6.8 FurtherReading 221 Exercises 222 7 ContinuousOptimization 225 7.1 OptimizationUsingGradientDescent 227 7.2 ConstrainedOptimizationandLagrangeMultipliers 233 7.3 ConvexOptimization 236 7.4 FurtherReading 246 Exercises 247 Part II Central Machine Learning Problems 249 8 WhenModelsMeetData 251 8.1 Data,Models,andLearning 251 8.2 EmpiricalRiskMinimization 258 8.3 ParameterEstimation 265 8.4 ProbabilisticModelingandInference 272 8.5 DirectedGraphicalModels 278 Draft(2019-10-15)of“MathematicsforMachineLearning”.Feedback:https://mml-book.com. Contents iii 8.6 ModelSelection 283 9 LinearRegression 289 9.1 ProblemFormulation 291 9.2 ParameterEstimation 292 9.3 BayesianLinearRegression 303 9.4 MaximumLikelihoodasOrthogonalProjection 313 9.5 FurtherReading 315 10 DimensionalityReductionwithPrincipalComponentAnalysis 317 10.1 ProblemSetting 318 10.2 MaximumVariancePerspective 320 10.3 ProjectionPerspective 325 10.4 EigenvectorComputationandLow-RankApproximations 333 10.5 PCAinHighDimensions 335 10.6 KeyStepsofPCAinPractice 336 10.7 LatentVariablePerspective 339 10.8 FurtherReading 343 11 DensityEstimationwithGaussianMixtureModels 348 11.1 GaussianMixtureModel 349 11.2 ParameterLearningviaMaximumLikelihood 350 11.3 EMAlgorithm 360 11.4 Latent-VariablePerspective 363 11.5 FurtherReading 368 12 ClassificationwithSupportVectorMachines 370 12.1 SeparatingHyperplanes 372 12.2 PrimalSupportVectorMachine 374 12.3 DualSupportVectorMachine 383 12.4 Kernels 388 12.5 NumericalSolution 390 12.6 FurtherReading 392 References 395 Index 407 (cid:13)c2019M.P.Deisenroth,A.A.Faisal,C.S.Ong.TobepublishedbyCambridgeUniversityPress. Foreword Machine learning is the latest in a long line of attempts to distill human knowledgeandreasoningintoaformthatissuitableforconstructingma- chinesandengineeringautomatedsystems.Asmachinelearningbecomes moreubiquitousanditssoftwarepackagesbecomeeasiertouse,itisnat- uralanddesirablethatthelow-leveltechnicaldetailsareabstractedaway and hidden from the practitioner. However, this brings with it the danger that a practitioner becomes unaware of the design decisions and, hence, thelimitsofmachinelearningalgorithms. The enthusiastic practitioner who is interested to learn more about the magic behind successful machine learning algorithms currently faces a dauntingsetofpre-requisiteknowledge: Programminglanguagesanddataanalysistools Large-scalecomputationandtheassociatedframeworks Mathematicsandstatisticsandhowmachinelearningbuildsonit At universities, introductory courses on machine learning tend to spend earlypartsofthecoursecoveringsomeofthesepre-requisites.Forhistori- calreasons,coursesinmachinelearningtendtobetaughtinthecomputer sciencedepartment,wherestudentsareoftentrainedinthefirsttwoareas ofknowledge,butnotsomuchinmathematicsandstatistics. Current machine learning textbooks primarily focus on machine learn- ing algorithms and methodologies and assume that the reader is com- petent in mathematics and statistics. Therefore, these books only spend one or two chapters of background mathematics, either at the beginning of the book or as appendices. We have found many people who want to delve into the foundations of basic machine learning methods who strug- glewiththemathematicalknowledgerequiredtoreadamachinelearning textbook.Havingtaughtundergraduateandgraduatecoursesatuniversi- ties,wefindthatthegapbetweenhighschoolmathematicsandthemath- ematicslevelrequiredtoreadastandardmachinelearningtextbookistoo bigformanypeople. Thisbookbringsthemathematicalfoundationsofbasicmachinelearn- ing concepts to the fore and collects the information in a single place so thatthisskillsgapisnarrowedorevenclosed. 1 Draft(October15,2019)of“MathematicsforMachineLearning”(cid:13)c2019byM.P.Deisenroth,A. A.Faisal,andC.S.Ong.TobepublishedbyCambridgeUniversityPress.https://mml-book.com. 2 Foreword Why Another Book on Machine Learning? Machine learning builds upon the language of mathematics to express concepts that seem intuitively obvious but that are surprisingly difficult toformalize.Onceformalizedproperly,wecangaininsightsintothetask we want to solve. One common complaint of students of mathematics around the globe is that the topics covered seem to have little relevance topracticalproblems.Webelievethatmachinelearningisanobviousand directmotivationforpeopletolearnmathematics. This book is intended to be a guidebook to the vast mathematical lit- “Mathislinkedin erature that forms the foundations of modern machine learning. We mo- thepopularmind tivate the need for mathematical concepts by directly pointing out their withphobiaand usefulness in the context of fundamental machine learning problems. In anxiety.You’dthink the interest of keeping the book short, many details and more advanced we’rediscussing spiders.”(Strogatz, concepts have been left out. Equipped with the basic concepts presented 2014,page281) here, and how they fit into the larger context of machine learning, the readercanfindnumerousresourcesforfurtherstudy,whichweprovideat theendoftherespectivechapters.Forreaderswithamathematicalback- ground,thisbookprovidesabriefbutpreciselystatedglimpseofmachine learning. In contrast to other books that focus on methods and models of machine learning (MacKay, 2003; Bishop, 2006; Alpaydin, 2010; Bar- ber, 2012; Murphy, 2012; Shalev-Shwartz and Ben-David, 2014; Rogers andGirolami,2016)orprogrammaticaspectsofmachinelearning(Mu¨ller andGuido,2016;RaschkaandMirjalili,2017;CholletandAllaire,2018), we provide only four representative examples of machine learning algo- rithms.Instead,wefocusonthemathematicalconceptsbehindthemodels themselves.Wehopethatreaderswillbeabletogainadeeperunderstand- ingofthebasicquestionsinmachinelearningandconnectpracticalques- tions arising from the use of machine learning with fundamental choices inthemathematicalmodel. We do not aim to write a classical machine learning book. Instead, our intentionistoprovidethemathematicalbackground,appliedtofourcen- tral machine learning problems, to make it easier to read other machine learningtextbooks. Who Is the Target Audience? As applications of machine learning become widespread in society, we believethateverybodyshouldhavesomeunderstandingofitsunderlying principles.Thisbookiswritteninanacademicmathematicalstyle,which enables us to be precise about the concepts behind machine learning. We encourage readers unfamiliar with this seemingly terse style to persevere and to keep the goals of each topic in mind. We sprinkle comments and remarks throughout the text, in the hope that it provides useful guidance withrespecttothebigpicture. The book assumes the reader to have mathematical knowledge commonly Draft(2019-10-15)of“MathematicsforMachineLearning”.Feedback:https://mml-book.com. Foreword 3 covered in high school mathematics and physics. For example, the reader should have seen derivatives and integrals before, and geometric vectors intwoorthreedimensions.Startingfromthere,wegeneralizethesecon- cepts. Therefore, the target audience of the book includes undergraduate university students, evening learners and learners participating in online machinelearningcourses. In analogy to music, there are three types of interaction that people havewithmachinelearning: AstuteListener Thedemocratizationofmachinelearningbythepro- vision of open-source software, online tutorials and cloud-based tools al- lowsuserstonotworryaboutthespecificsofpipelines.Userscanfocuson extracting insights from data using off-the-shelf tools. This enables non- tech-savvy domain experts to benefit from machine learning. This is sim- ilar to listening to music; the user is able to choose and discern between different types of machine learning, and benefits from it. More experi- enced users are like music critics, asking important questions about the applicationofmachinelearninginsocietysuchasethics,fairness,andpri- vacy of the individual. We hope that this book provides a foundation for thinkingaboutthecertificationandriskmanagementofmachinelearning systems, and allows them to use their domain expertise to build better machinelearningsystems. ExperiencedArtist Skilledpractitionersofmachinelearningcanplug andplaydifferenttoolsandlibrariesintoananalysispipeline.Thestereo- typicalpractitionerwouldbeadatascientistorengineerwhounderstands machine learning interfaces and their use cases, and is able to perform wonderfulfeatsofpredictionfromdata.Thisissimilartoavirtuosoplay- ing music, where highly skilled practitioners can bring existing instru- ments to life and bring enjoyment to their audience. Using the mathe- matics presented here as a primer, practitioners would be able to under- stand the benefits and limits of their favorite method, and to extend and generalize existing machine learning algorithms. We hope that this book provides the impetus for more rigorous and principled development of machinelearningmethods. FledglingComposer Asmachinelearningisappliedtonewdomains, developersofmachinelearningneedtodevelopnewmethodsandextend existing algorithms. They are often researchers who need to understand themathematicalbasisofmachinelearninganduncoverrelationshipsbe- tween different tasks. This is similar to composers of music who, within therulesandstructureofmusicaltheory,createnewandamazingpieces. Wehopethisbookprovidesahigh-leveloverviewofothertechnicalbooks for people who want to become composers of machine learning. There is a great need in society for new researchers who are able to propose and explore novel approaches for attacking the many challenges of learning fromdata. (cid:13)c2019M.P.Deisenroth,A.A.Faisal,C.S.Ong.TobepublishedbyCambridgeUniversityPress. 4 Foreword Acknowledgments Wearegratefultomanypeoplewholookedatearlydraftsofthebookand suffered through painful expositions of concepts. We tried to implement their ideas that we did not vehemently disagree with. We would like to especiallyacknowledgeChristfriedWebersforhiscarefulreadingofmany parts of the book, and his detailed suggestions on structure and presen- tation. Many friends and colleagues have also been kind enough to pro- vide their time and energy on different versions of each chapter. We have been lucky to benefit from the generosity of the online community, who have suggested improvements via github.com, which greatly improved thebook. Thefollowingpeoplehavefoundbugs,proposedclarificationsandsug- gestedrelevantliterature,eitherviagithub.comorpersonalcommunica- tion.Theirnamesaresortedalphabetically. Abdul-GaniyUsman EllenBroad AdamGaier FengkuangtianZhu AdeleJackson FionaCondon AdityaMenon GeorgiosTheodorou AlasdairTran HeXin AleksandarKrnjaic IreneRaissaKameni AlexanderMakrigiorgos JakubNabaglo AlfredoCanziani JamesHensman AliShafti JamieLiu AmrKhalifa JeanKaddour AndrewTanggara Jean-PaulEbejer AngusGruen JerryQiang AntalA.Buss JiteshSindhare AntoineToisoulLeCann JohnLloyd AregSarvazyan JonasNgnawe ArtemArtemev JonMartin ArtyomStepanov JustinHsi BillKromydas KaiArulkumaran BobWilliamson KamilDreczkowski BoonPingLim LilyWang ChaoQu LionelTondjiNgoupeyou ChengLi LydiaKnu¨fing ChrisSherlock MahmoudAslan ChristopherGray MarkHartenstein DanielMcNamara MarkvanderWilk DanielWood MarkusHegland DarrenSiegel MartinHewing DavidJohnston MatthewAlger DaweiChen MatthewLee Draft(2019-10-15)of“MathematicsforMachineLearning”.Feedback:https://mml-book.com.