
Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies

853 Pages·2020·62.679 MB·English


Fundamentals of Machine Learning for Predictive Data Analytics
Algorithms, Worked Examples, and Case Studies
Second Edition

John D. Kelleher, Brian Mac Namee, and Aoife D’Arcy

The MIT Press
Cambridge, Massachusetts
London, England

© 2020 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in Times New Roman by the authors.

Library of Congress Cataloging-in-Publication Data
Names: Kelleher, John D., 1974- author. | Mac Namee, Brian, author. | D’Arcy, Aoife, 1978- author.
Title: Fundamentals of machine learning for predictive data analytics : algorithms, worked examples, and case studies / John D. Kelleher, Brian Mac Namee and Aoife D’Arcy.
Description: Second edition. | Cambridge, Massachusetts : The MIT Press, 2020. | Includes bibliographical references and index.
Identifiers: LCCN 2020002998 | ISBN 9780262044691 (hardcover)
Subjects: LCSH: Machine learning. | Data mining. | Prediction theory.
Classification: LCC Q325.5 .K455 2020 | DDC 519.2/870285631–dc23
LC record available at https://lccn.loc.gov/2020002998

To my wife and family, thank you for your love, support, and patience.
John

To my family.
Brian

To Grandad D’Arcy, for the inspiration.
Aoife

Contents

Preface
Notation
List of Figures
List of Tables

I INTRODUCTION TO MACHINE LEARNING AND DATA ANALYTICS

1 Machine Learning for Predictive Data Analytics
  1.1 What Is Predictive Data Analytics?
  1.2 What Is Machine Learning?
  1.3 How Does Machine Learning Work?
  1.4 Inductive Bias Versus Sample Bias
  1.5 What Can Go Wrong with Machine Learning?
  1.6 The Predictive Data Analytics Project Lifecycle: CRISP-DM
  1.7 Predictive Data Analytics Tools
  1.8 The Road Ahead
  1.9 Exercises

2 Data to Insights to Decisions
  2.1 Converting Business Problems into Analytics Solutions
    2.1.1 Case Study: Motor Insurance Fraud
  2.2 Assessing Feasibility
    2.2.1 Case Study: Motor Insurance Fraud
  2.3 Designing the Analytics Base Table
    2.3.1 Case Study: Motor Insurance Fraud
  2.4 Designing and Implementing Features
    2.4.1 Different Types of Data
    2.4.2 Different Types of Features
    2.4.3 Handling Time
    2.4.4 Legal Issues
    2.4.5 Implementing Features
    2.4.6 Case Study: Motor Insurance Fraud
  2.5 Summary
  2.6 Further Reading
  2.7 Exercises

3 Data Exploration
  3.1 The Data Quality Report
    3.1.1 Case Study: Motor Insurance Fraud
  3.2 Getting to Know the Data
    3.2.1 The Normal Distribution
    3.2.2 Case Study: Motor Insurance Fraud
  3.3 Identifying Data Quality Issues
    3.3.1 Missing Values
    3.3.2 Irregular Cardinality
    3.3.3 Outliers
    3.3.4 Case Study: Motor Insurance Fraud
  3.4 Handling Data Quality Issues
    3.4.1 Handling Missing Values
    3.4.2 Handling Outliers
    3.4.3 Case Study: Motor Insurance Fraud
  3.5 Advanced Data Exploration
    3.5.1 Visualizing Relationships between Features
    3.5.2 Measuring Covariance and Correlation
  3.6 Data Preparation
    3.6.1 Normalization
    3.6.2 Binning
    3.6.3 Sampling
  3.7 Summary
  3.8 Further Reading
  3.9 Exercises

II PREDICTIVE DATA ANALYTICS

4 Information-Based Learning
  4.1 Big Idea
  4.2 Fundamentals
    4.2.1 Decision Trees
    4.2.2 Shannon’s Entropy Model
    4.2.3 Information Gain
  4.3 Standard Approach: The ID3 Algorithm
    4.3.1 A Worked Example: Predicting Vegetation Distributions
  4.4 Extensions and Variations
    4.4.1 Alternative Feature Selection and Impurity Metrics
    4.4.2 Handling Continuous Descriptive Features
    4.4.3 Predicting Continuous Targets
    4.4.4 Tree Pruning
    4.4.5 Model Ensembles
  4.5 Summary
  4.6 Further Reading
  4.7 Exercises

5 Similarity-Based Learning
  5.1 Big Idea
  5.2 Fundamentals
    5.2.1 Feature Space
    5.2.2 Measuring Similarity Using Distance Metrics
  5.3 Standard Approach: The Nearest Neighbor Algorithm
    5.3.1 A Worked Example
  5.4 Extensions and Variations
    5.4.1 Handling Noisy Data
    5.4.2 Efficient Memory Search
    5.4.3 Data Normalization
    5.4.4 Predicting Continuous Targets
    5.4.5 Other Measures of Similarity
    5.4.6 Feature Selection
  5.5 Summary
  5.6 Further Reading
  5.7 Epilogue
  5.8 Exercises

6 Probability-Based Learning
  6.1 Big Idea
  6.2 Fundamentals
    6.2.1 Bayes’ Theorem
    6.2.2 Bayesian Prediction
    6.2.3 Conditional Independence and Factorization
