
Data Mining Methods and Models PDF

340 Pages·2006·6.203 MB·English

Preview Data Mining Methods and Models

DATA MINING METHODS AND MODELS

DANIEL T. LAROSE
Department of Mathematical Sciences
Central Connecticut State University

A JOHN WILEY & SONS, INC PUBLICATION

Copyright © 2006 by John Wiley & Sons, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748–6011, fax (201) 748–6008 or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format. For more information about Wiley products, visit our website at www.wiley.com

Library of Congress Cataloging-in-Publication Data:

Larose, Daniel T.
Data mining methods and models / Daniel T. Larose.
p. cm.
Includes bibliographical references.
ISBN-13 978-0-471-66656-1
ISBN-10 0-471-66656-4 (cloth)
1. Data mining. I. Title.
QA76.9.D343 L378 2005
005.74–dc22 2005010801

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

DEDICATION

To those who have gone before,
including my parents, Ernest Larose (1920–1981)
and Irene Larose (1924–2005),
and my daughter, Ellyriane Soleil Larose (1997–1997);

For those who come after,
including my daughters, Chantal Danielle Larose (1988)
and Ravel Renaissance Larose (1999),
and my son, Tristan Spring Larose (1999).

CONTENTS

PREFACE

1 DIMENSION REDUCTION METHODS
  Need for Dimension Reduction in Data Mining
  Principal Components Analysis
  Applying Principal Components Analysis to the Houses Data Set
  How Many Components Should We Extract?
  Profiling the Principal Components
  Communalities
  Validation of the Principal Components
  Factor Analysis
  Applying Factor Analysis to the Adult Data Set
  Factor Rotation
  User-Defined Composites
  Example of a User-Defined Composite
  Summary
  References
  Exercises

2 REGRESSION MODELING
  Example of Simple Linear Regression
  Least-Squares Estimates
  Coefficient of Determination
  Standard Error of the Estimate
  Correlation Coefficient
  ANOVA Table
  Outliers, High Leverage Points, and Influential Observations
  Regression Model
  Inference in Regression
  t-Test for the Relationship Between x and y
  Confidence Interval for the Slope of the Regression Line
  Confidence Interval for the Mean Value of y Given x
  Prediction Interval for a Randomly Chosen Value of y Given x
  Verifying the Regression Assumptions
  Example: Baseball Data Set
  Example: California Data Set
  Transformations to Achieve Linearity
  Box–Cox Transformations
  Summary
  References
  Exercises

3 MULTIPLE REGRESSION AND MODEL BUILDING
  Example of Multiple Regression
  Multiple Regression Model
  Inference in Multiple Regression
  t-Test for the Relationship Between y and x_i
  F-Test for the Significance of the Overall Regression Model
  Confidence Interval for a Particular Coefficient
  Confidence Interval for the Mean Value of y Given x_1, x_2, ..., x_m
  Prediction Interval for a Randomly Chosen Value of y Given x_1, x_2, ..., x_m
  Regression with Categorical Predictors
  Adjusting R^2: Penalizing Models for Including Predictors That Are Not Useful
  Sequential Sums of Squares
  Multicollinearity
  Variable Selection Methods
  Partial F-Test
  Forward Selection Procedure
  Backward Elimination Procedure
  Stepwise Procedure
  Best Subsets Procedure
  All-Possible-Subsets Procedure
  Application of the Variable Selection Methods
  Forward Selection Procedure Applied to the Cereals Data Set
  Backward Elimination Procedure Applied to the Cereals Data Set
  Stepwise Selection Procedure Applied to the Cereals Data Set
  Best Subsets Procedure Applied to the Cereals Data Set
  Mallows' C_p Statistic
  Variable Selection Criteria
  Using the Principal Components as Predictors
  Summary
  References
  Exercises

4 LOGISTIC REGRESSION
  Simple Example of Logistic Regression
  Maximum Likelihood Estimation
  Interpreting Logistic Regression Output
  Inference: Are the Predictors Significant?
  Interpreting a Logistic Regression Model
  Interpreting a Model for a Dichotomous Predictor
  Interpreting a Model for a Polychotomous Predictor
  Interpreting a Model for a Continuous Predictor
  Assumption of Linearity
  Zero-Cell Problem
  Multiple Logistic Regression
  Introducing Higher-Order Terms to Handle Nonlinearity
  Validating the Logistic Regression Model
  WEKA: Hands-on Analysis Using Logistic Regression
  Summary
