Use R!
Series Editors: Robert Gentleman, Kurt Hornik, Giovanni Parmigiani
For further volumes: http://www.springer.com/series/6991

Wolfgang Jank
Business Analytics for Managers

Wolfgang Jank
Department of Decision and Information Technologies
Robert H. Smith School of Business
University of Maryland
Van Munching Hall
College Park, MD 20742-1815
USA

Series Editors:

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue, N. M2-B876
Seattle, Washington 98109
USA

Kurt Hornik
Department of Statistik and Mathematik
Wirtschaftsuniversität Wien
Augasse 2-6
A-1090 Wien
Austria

Giovanni Parmigiani
The Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University
550 North Broadway
Baltimore, MD 21205-2011
USA

ISBN 978-1-4614-0405-7
e-ISBN 978-1-4614-0406-4
DOI 10.1007/978-1-4614-0406-4
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2011934258

© Springer Science+Business Media, LLC 2011
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my Family: Angel, Isabella, Alexander, Waltraud, Gerhard, and Sabina

Preface

This book is about analytics and data-driven decision making.
As such, it could easily be mistaken for a book on statistics or data mining. In fact, this book conveys ideas and concepts from both statistics and data mining, with the goal of extracting knowledge and actionable insight for managers. However, this is not a statistics book. There exist thousands of books on the topic of statistics. Most of these books are written by statisticians for statisticians. As a result, they often focus primarily on mathematics, formulas, and equations and not so much on the practical insight that can be derived from these equations. This book uses concepts and ideas from statistics (without ever getting bogged down in too much mathematical detail) in order to extract insight from real business data.

This is also not a book on data mining. There are many good data mining books, some of which are written for data miners and computer scientists, others for practitioners. However, most of these books focus on algorithms and computing. That is, they emphasize the many different algorithms that exist in order to extract similar information from the same set of data. This book does not emphasize algorithms. In fact, it acknowledges early on that while there may exist many different ways to tackle and solve a particular problem, the goal is to convey only the main principles of how to discover new knowledge from data and how to make data-driven decisions in a smart and informed way.

And finally, this is also not a book on software. While this book provides in its final chapter a quick start to one of the most powerful software solutions, emphasis is placed on conveying data-driven thinking (and not so much on implementation).
The ideas discussed in this book can be implemented using many different software solutions from many different vendors. In fact, this book purposefully steers clear of software implementation since it is our experience that books that do discuss software often place too much emphasis on implementation details, which confuses readers and distracts them from the main point. After all, the main point of this book is not to train new statisticians or data miners; there are better books that can accomplish that goal. The main point is to convey the use and value of data-driven decision making to managers. Managers hardly ever implement complex methods and models themselves; however, they frequently communicate with personnel who do. With that in mind, the main goals of this book are as follows:

• To excite managers and decision makers about the potential that resides in data and the value that data analytics can add to business processes.
• To provide managers with a basic understanding of the main concepts of data analytics and a common language to convey data-driven decision problems so they can better communicate with personnel specializing in data mining or statistics.

After all, we are living in an information-based society, and using that information smartly can benefit both the business and the consumer.

January 2011
Wolfgang Jank

Contents

1 Introduction
1.1 Analytics and Business
1.2 Goal of This Book
1.3 Who Should Read This Book?
1.4 What This Book Is Not
1.4.1 This Is Not a Statistics Book
1.4.2 This Is Not a Data Mining Book
1.5 What This Book Is
1.6 Structure of This Book
1.7 Using This Book in a Course
2 Exploring and Discovering Data
2.1 Basic Data Summaries and Visualizations: House Price Data
2.2 Data Transformations and Trellis Graphs: Direct Marketing Data
2.3 Time Series Graphs: Soft Drink Sales Data
2.4 Spatial Graphs: Online Purchase Preferences Data
2.5 Graphs for Categorical Responses: Consumer-to-Consumer Loan Data
2.6 Graphs for Panel Data: Customer Loyalty Data
3 Data Modeling I: Basics
3.1 Introduction: Why Do We Need Models?
3.2 Fitting and Interpreting a Regression Model: Least Squares Regression
3.2.1 The Idea of Least Squares Regression
3.2.2 Interpreting a First Simple Regression Model
3.2.3 Evaluating a Regression Model
3.2.4 Comparing Regression Models