Machine Learning Engineering PDF

310 Pages·2020·38.025 MB·English

Preview Machine Learning Engineering

Machine Learning Engineering
Andriy Burkov

Copyright © 2020 Andriy Burkov

All rights reserved. This book is distributed on the “read first, buy later” principle. The latter implies that anyone can obtain a copy of the book by any means available, read it and share it with anyone else. However, if you read the book, liked it or found it helpful or useful in any way, you have to buy it. For further information, [email protected].

Copyeditor: Aida E. Roig-Compton
Illustrators: Koen Van den Eeckhout, Andriy Burkov
Cover designer: Cristina Eleutério Alves Augusto
Publisher: True Positive Inc.
ISBN 978-1-7770054-5-0

To my parents: Tatiana and Valeriy
and to my family: daughters Catherine and Eva, and brother Dmitriy

“In theory, there is no difference between theory and practice. But in practice, there is.”
- Benjamin Brewster

“The perfect project plan is possible if one first documents a list of all the unknowns.”
- Bill Langley

“When you’re fundraising, it’s AI. When you’re hiring, it’s ML. When you’re implementing, it’s linear regression. When you’re debugging, it’s printf().”
- Baron Schwartz

Contents

Foreword
Preface
    Who This Book is For
    How to Use This Book
    Should You Buy This Book?
1 Introduction
    1.1 Notation and Definitions
        1.1.1 Data Structures
        1.1.2 Capital Sigma Notation
    1.2 What is Machine Learning
        1.2.1 Supervised Learning
        1.2.2 Unsupervised Learning
        1.2.3 Semi-Supervised Learning
        1.2.4 Reinforcement Learning
    1.3 Data and Machine Learning Terminology
        1.3.1 Data Used Directly and Indirectly
        1.3.2 Raw and Tidy Data
        1.3.3 Training and Holdout Sets
        1.3.4 Baseline
        1.3.5 Machine Learning Pipeline
        1.3.6 Parameters vs. Hyperparameters
        1.3.7 Classification vs. Regression
        1.3.8 Model-Based vs. Instance-Based Learning
        1.3.9 Shallow vs. Deep Learning
        1.3.10 Training vs. Scoring
    1.4 When to Use Machine Learning
        1.4.1 When the Problem Is Too Complex for Coding
        1.4.2 When the Problem Is Constantly Changing
        1.4.3 When It Is a Perceptive Problem
        1.4.4 When It Is an Unstudied Phenomenon
        1.4.5 When the Problem Has a Simple Objective
        1.4.6 When It Is Cost-Effective
    1.5 When Not to Use Machine Learning
    1.6 What is Machine Learning Engineering
    1.7 Machine Learning Project Life Cycle
    1.8 Summary
2 Before the Project Starts
    2.1 Prioritization of Machine Learning Projects
        2.1.1 Impact of Machine Learning
        2.1.2 Cost of Machine Learning
    2.2 Estimating Complexity of a Machine Learning Project
        2.2.1 The Unknowns
        2.2.2 Simplifying the Problem
        2.2.3 Nonlinear Progress
    2.3 Defining the Goal of a Machine Learning Project
        2.3.1 What a Model Can Do
        2.3.2 Properties of a Successful Model
    2.4 Structuring a Machine Learning Team
        2.4.1 Two Cultures
        2.4.2 Members of a Machine Learning Team
    2.5 Why Machine Learning Projects Fail
        2.5.1 Lack of Experienced Talent
        2.5.2 Lack of Support by the Leadership
        2.5.3 Missing Data Infrastructure
        2.5.4 Data Labeling Challenge
        2.5.5 Siloed Organizations and Lack of Collaboration
        2.5.6 Technically Infeasible Projects
        2.5.7 Lack of Alignment Between Technical and Business Teams
    2.6 Summary
3 Data Collection and Preparation
    3.1 Questions About the Data
        3.1.1 Is the Data Accessible?
        3.1.2 Is the Data Sizeable?
        3.1.3 Is the Data Useable?
        3.1.4 Is the Data Understandable?
