
Data Mining for Business Analytics PDF

577 Pages·2017·25.26 MB·English
by Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl, Jr.

Preview Data Mining for Business Analytics

DATA MINING FOR BUSINESS ANALYTICS
Concepts, Techniques, and Applications in R

Galit Shmueli
Peter C. Bruce
Inbal Yahav
Nitin R. Patel
Kenneth C. Lichtendahl, Jr.

This edition first published 2018
© 2018 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr. to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties; including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging-in-Publication Data applied for
Hardback: 9781118879368
Cover Design: Wiley
Cover Image: © Achim Mittler, Frankfurt am Main/Gettyimages
Set in 11.5/14.5pt Bembo Std by Aptara Inc., New Delhi, India
Printed in the United States of America.

The beginning of wisdom is this: Get wisdom, and whatever else you get, get insight.
– Proverbs 4:7

Contents

Foreword by Gareth James
Foreword by Ravi Bapna
Preface to the R Edition
Acknowledgments

PART I PRELIMINARIES

CHAPTER 1 Introduction
1.1 What Is Business Analytics?
1.2 What Is Data Mining?
1.3 Data Mining and Related Terms
1.4 Big Data
1.5 Data Science
1.6 Why Are There So Many Different Methods?
1.7 Terminology and Notation
1.8 Road Maps to This Book
    Order of Topics

CHAPTER 2 Overview of the Data Mining Process
2.1 Introduction
2.2 Core Ideas in Data Mining
    Classification
    Prediction
    Association Rules and Recommendation Systems
    Predictive Analytics
    Data Reduction and Dimension Reduction
    Data Exploration and Visualization
    Supervised and Unsupervised Learning
2.3 The Steps in Data Mining
2.4 Preliminary Steps
    Organization of Datasets
    Predicting Home Values in the West Roxbury Neighborhood
    Loading and Looking at the Data in R
    Sampling from a Database
    Oversampling Rare Events in Classification Tasks
    Preprocessing and Cleaning the Data
2.5 Predictive Power and Overfitting
    Overfitting
    Creation and Use of Data Partitions
2.6 Building a Predictive Model
    Modeling Process
2.7 Using R for Data Mining on a Local Machine
2.8 Automating Data Mining Solutions
    Data Mining Software: The State of the Market (by Herb Edelstein)
Problems

PART II DATA EXPLORATION AND DIMENSION REDUCTION

CHAPTER 3 Data Visualization
3.1 Uses of Data Visualization
    Base R or ggplot?
3.2 Data Examples
    Example 1: Boston Housing Data
    Example 2: Ridership on Amtrak Trains
3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots
    Distribution Plots: Boxplots and Histograms
    Heatmaps: Visualizing Correlations and Missing Values
3.4 Multidimensional Visualization
    Adding Variables: Color, Size, Shape, Multiple Panels, and Animation
    Manipulations: Rescaling, Aggregation and Hierarchies, Zooming, Filtering
    Reference: Trend Lines and Labels
    Scaling up to Large Datasets
    Multivariate Plot: Parallel Coordinates Plot
    Interactive Visualization
3.5 Specialized Visualizations
    Visualizing Networked Data
    Visualizing Hierarchical Data: Treemaps
    Visualizing Geographical Data: Map Charts
3.6 Summary: Major Visualizations and Operations, by Data Mining Goal
    Prediction
    Classification
    Time Series Forecasting
    Unsupervised Learning
Problems

CHAPTER 4 Dimension Reduction
4.1 Introduction
4.2 Curse of Dimensionality
