Copyright © 2022 Andrew Wolf

MACHINE LEARNING SIMPLIFIED: A GENTLE INTRODUCTION TO SUPERVISED LEARNING

ANDREW WOLF

THEMLSBOOK.COM
GITHUB.COM/5X12/THEMLSBOOK

LICENSE 1.0.1
First release, January 2022

Contents

I  FUNDAMENTALS OF SUPERVISED LEARNING

1  Introduction
   1.1  Machine Learning
        1.1.1  Supervised Learning
        1.1.2  Unsupervised Learning
   1.2  Machine Learning Pipeline
        1.2.1  Data Science
        1.2.2  ML Operations
   1.3  Artificial Intelligence
        1.3.1  Information Processing
        1.3.2  Types of AI
   1.4  Overview of this Book

2  Overview of Supervised Learning
   2.1  ML Pipeline: Example
        2.1.1  Problem Representation
        2.1.2  Learning a Prediction Function
        2.1.3  How Good is our Prediction Function?
        2.1.4  Controlling Model Complexity
   2.2  ML Pipeline: General Form
        2.2.1  Data Extraction
        2.2.2  Data Preparation
        2.2.3  Model Building
        2.2.4  Model Deployment

3  Model Learning
   3.1  Linear Regression
        3.1.1  Linear Models
        3.1.2  Goodness-of-Fit
        3.1.3  Gradient Descent Algorithm
        3.1.4  Gradient Descent with More Parameters
   3.2  Gradient Descent in Other ML Models
        3.2.1  Getting Stuck in a Local Minimum
        3.2.2  Overshooting Global Minimum
        3.2.3  Non-differentiable Cost Functions

4  Basis Expansion and Regularization
   4.1  Basis Expansion
        4.1.1  Polynomial Basis Expansion
        4.1.2  Comparison of Model Weights
   4.2  Regularization
        4.2.1  Ridge Regression
        4.2.2  Choosing Regularization Strength λ
        4.2.3  Lasso Regression
        4.2.4  Comparison between L1 and L2 Regularization

5  Model Selection
   5.1  Bias-Variance Decomposition
        5.1.1  Mathematical Definition
        5.1.2  Diagnosing Bias and Variance Error Sources
   5.2  Validation Methods
        5.2.1  Hold-out Validation
        5.2.2  Cross Validation
   5.3  Unrepresentative Data

6  Feature Selection
   6.1  Introduction
   6.2  Filter Methods
        6.2.1  Univariate Selection
        6.2.2  Multivariate Selection
   6.3  Search Methods
   6.4  Embedded Methods
   6.5  Comparison

7  Data Preparation
   7.1  Data Cleaning
        7.1.1  Dirty Data
        7.1.2  Outliers
   7.2  Feature Transformation
        7.2.1  Feature Encoding
        7.2.2  Feature Scaling
   7.3  Feature Engineering
        7.3.1  Feature Binning
        7.3.2  Ratio Features
   7.4  Handling Class Label Imbalance
        7.4.1  Oversampling
        7.4.2  Synthetic Minority Oversampling Technique (SMOTE)

A  Appendix: Unsupervised Learning

B  Appendix: Non-differentiable Cost Functions
   B.0.1  Discontinuous Functions
   B.0.2  Continuous Non-differentiable Functions

PREFACE

It could be said that machine learning is my life. I am a machine learning engineer by day and an enthusiastic STEM tutor by night. I am consistently inspired by this infinitely exciting field, and it has become one of my greatest passions. My interest in machine learning dates back to 2012, when I came across an article describing a machine learning experiment conducted by the Google Brain team. The team, led by Andrew Ng and Jeff Dean, created a neural network that learned to recognize cats by watching images taken from frames of YouTube videos. I began to consider the possibilities, and I was hooked.
Why I Wrote This Book

I, for one, eagerly look forward to a future in which ML will blossom and reveal its full potential. However, in my conversations with friends and colleagues outside the ML field, I've observed that they are often perplexed by the seeming complexity of it. Many of them are intrigued by the field and want to learn more, but find a dearth of clear, reliable resources on the internet. Sources are either rife with academic trilogies filled with theorems designed for experienced researchers and professionals (I couldn't even get through half of one) or are sprinkled with fishy fairy tales about artificial intelligence, data-science magic, and jobs of the future.

This book is dedicated to them, and thousands more, who want to truly understand the methods and use cases of ML from both conceptual and mathematical points of view, but who may not have the luxury of time required to comb through thousands of hours of technical literature full of intimidating formulas and academic jargon.

What This Book Is About

My goal for this book is to help make machine learning available to as many people as possible, whether technical or not. It is easily accessible for a non-technical reader, but also contains enough mathematical detail to serve as an introduction to machine learning for a technical reader. Nevertheless, some prior knowledge of mathematics, statistics, and the Python programming language is recommended to get the most out of this book.

I've done my best to make this book both comprehensive and fun to read (mind you, that's no easy feat!). I've worked to combine mathematical rigor with simple, intuitive explanations based on examples from our everyday lives: for example, deciding what to do over the weekend, or guessing a friend's favorite color based on something like their height and weight (I am only half-kidding here). You will find answers to these questions and many more as you read this book.

How to Use This Book

This book is divided into two parts. Part I discusses the fundamentals of (supervised) machine learning, and Part II discusses more advanced machine learning algorithms. I divided the book in this way for a very important reason. One mistake many students make is to jump right into the algorithms (often after hearing one of their names, like Support Vector Machines) without a proper foundation. In doing so, they often fail to understand, or misunderstand, the algorithms. Some of these students get frustrated and quit after this experience. In writing this book, I assumed the chapters would be read sequentially. The book has a specific story line, and most explanations appear in the text only once to avoid redundancy.

I have also supplemented this book with a GitHub repository that contains Python implementations of the concepts explained in the book. For more information, scan the QR code located in the 'Try It Now' box at the end of each chapter, or just go directly to github.com/5x12/themlsbook.

Final Words

Hopefully this book persuades you that machine learning is not the intimidating technology that it initially appears to be. Whatever your background and aspirations, you will find this book a useful introduction to this fascinating field.

Should you have any questions or suggestions, feel free to reach out to me at awolf.io. I appreciate your feedback, and I hope that it will make future editions of this book even more valuable.

Good luck in your machine learning journey,
Your author

Part I

FUNDAMENTALS OF SUPERVISED LEARNING