
Discovering Knowledge in Data: An Introduction to Data Mining PDF

241 Pages·2004·5.91 MB·English

Preview Discovering Knowledge in Data: An Introduction to Data Mining

WY045-FM September 24, 2004 10:2

DISCOVERING KNOWLEDGE IN DATA
An Introduction to Data Mining

DANIEL T. LAROSE
Director of Data Mining
Central Connecticut State University

A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2005 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print, however, may not be available in electronic format.

Library of Congress Cataloging-in-Publication Data:
Larose, Daniel T.
Discovering knowledge in data: an introduction to data mining / Daniel T. Larose
p. cm.
Includes bibliographical references and index.
ISBN 0-471-66657-2 (cloth)
1. Data mining. I. Title.
QA76.9.D343L38 2005
006.3'12--dc22
2004003680

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

Dedication
To my parents,
And their parents,
And so on...
For my children,
And their children,
And so on...
2004 Chantal Larose

CONTENTS

PREFACE

1 INTRODUCTION TO DATA MINING
What Is Data Mining?
Why Data Mining?
Need for Human Direction of Data Mining
Cross-Industry Standard Process: CRISP–DM
Case Study 1: Analyzing Automobile Warranty Claims: Example of the CRISP–DM Industry Standard Process in Action
Fallacies of Data Mining
What Tasks Can Data Mining Accomplish?
Description
Estimation
Prediction
Classification
Clustering
Association
Case Study 2: Predicting Abnormal Stock Market Returns Using Neural Networks
Case Study 3: Mining Association Rules from Legal Databases
Case Study 4: Predicting Corporate Bankruptcies Using Decision Trees
Case Study 5: Profiling the Tourism Market Using k-Means Clustering Analysis
References
Exercises

2 DATA PREPROCESSING
Why Do We Need to Preprocess the Data?
Data Cleaning
Handling Missing Data
Identifying Misclassifications
Graphical Methods for Identifying Outliers
Data Transformation
Min–Max Normalization
Z-Score Standardization
Numerical Methods for Identifying Outliers
References
Exercises

Description:
There is a lot to like about this book, but it has some unfortunate flaws. Note that it is part of a Data Mining trilogy. The other two books are: Data Mining Methods and Models and Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage. My initial reaction was more negative a

