
Data Mining PDF

718 Pages·2012·13.11 MB·English

Preview Data Mining

HAN 01-fm-i-vi-9780123814791 2011/6/18 12:35 Page i #1

Data Mining
Third Edition

The Morgan Kaufmann Series in Data Management Systems (Selected Titles)

Joe Celko’s Data, Measurements, and Standards in SQL
  Joe Celko
Information Modeling and Relational Databases, 2nd Edition
  Terry Halpin, Tony Morgan
Joe Celko’s Thinking in Sets
  Joe Celko
Business Metadata
  Bill Inmon, Bonnie O’Neil, Lowell Fryman
Unleashing Web 2.0
  Gottfried Vossen, Stephan Hagemann
Enterprise Knowledge Management
  David Loshin
The Practitioner’s Guide to Data Quality Improvement
  David Loshin
Business Process Change, 2nd Edition
  Paul Harmon
IT Manager’s Handbook, 2nd Edition
  Bill Holtsnider, Brian Jaffe
Joe Celko’s Puzzles and Answers, 2nd Edition
  Joe Celko
Architecture and Patterns for IT Service Management, 2nd Edition, Resource Planning and Governance
  Charles Betz
Joe Celko’s Analytics and OLAP in SQL
  Joe Celko
Data Preparation for Data Mining Using SAS
  Mamdouh Refaat
Querying XML: XQuery, XPath, and SQL/XML in Context
  Jim Melton, Stephen Buxton
Data Mining: Concepts and Techniques, 3rd Edition
  Jiawei Han, Micheline Kamber, Jian Pei
Database Modeling and Design: Logical Design, 5th Edition
  Toby J. Teorey, Sam S. Lightstone, Thomas P. Nadeau, H. V. Jagadish
Foundations of Multidimensional and Metric Data Structures
  Hanan Samet
Joe Celko’s SQL for Smarties: Advanced SQL Programming, 4th Edition
  Joe Celko
Moving Objects Databases
  Ralf Hartmut Güting, Markus Schneider
Joe Celko’s SQL Programming Style
  Joe Celko
Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration
  Earl Cox
Data Modeling Essentials, 3rd Edition
  Graeme C. Simsion, Graham C. Witt
Developing High Quality Data Models
  Matthew West
Location-Based Services
  Jochen Schiller, Agnes Voisard
Managing Time in Relational Databases: How to Design, Update, and Query Temporal Data
  Tom Johnston, Randall Weis
Database Modeling with Microsoft® Visio for Enterprise Architects
  Terry Halpin, Ken Evans, Patrick Hallock, Bill Maclean
Designing Data-Intensive Web Applications
  Stephano Ceri, Piero Fraternali, Aldo Bongio, Marco Brambilla, Sara Comai, Maristella Matera
Mining the Web: Discovering Knowledge from Hypertext Data
  Soumen Chakrabarti
Advanced SQL:1999—Understanding Object-Relational and Other Advanced Features
  Jim Melton
Database Tuning: Principles, Experiments, and Troubleshooting Techniques
  Dennis Shasha, Philippe Bonnet
SQL:1999—Understanding Relational Language Components
  Jim Melton, Alan R. Simon
Information Visualization in Data Mining and Knowledge Discovery
  Edited by Usama Fayyad, Georges G. Grinstein, Andreas Wierse
Transactional Information Systems
  Gerhard Weikum, Gottfried Vossen
Spatial Databases
  Philippe Rigaux, Michel Scholl, and Agnes Voisard
Managing Reference Data in Enterprise Databases
  Malcolm Chisholm
Understanding SQL and Java Together
  Jim Melton, Andrew Eisenberg
Database: Principles, Programming, and Performance, 2nd Edition
  Patrick and Elizabeth O’Neil
The Object Data Standard
  Edited by R. G. G. Cattell, Douglas Barry
Data on the Web: From Relations to Semistructured Data and XML
  Serge Abiteboul, Peter Buneman, Dan Suciu
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 3rd Edition
  Ian Witten, Eibe Frank, Mark A. Hall
Joe Celko’s Data and Databases: Concepts in Practice
  Joe Celko
Developing Time-Oriented Database Applications in SQL
  Richard T. Snodgrass
Web Farming for the Data Warehouse
  Richard D. Hackathorn
Management of Heterogeneous and Autonomous Database Systems
  Edited by Ahmed Elmagarmid, Marek Rusinkiewicz, Amit Sheth
Object-Relational DBMSs, 2nd Edition
  Michael Stonebraker, Paul Brown, with Dorothy Moore
Universal Database Management: A Guide to Object/Relational Technology
  Cynthia Maro Saracco
Readings in Database Systems, 3rd Edition
  Edited by Michael Stonebraker, Joseph M. Hellerstein
Understanding SQL’s Stored Procedures: A Complete Guide to SQL/PSM
  Jim Melton
Principles of Multimedia Database Systems
  V. S. Subrahmanian
Principles of Database Query Processing for Advanced Applications
  Clement T. Yu, Weiyi Meng
Advanced Database Systems
  Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, Roberto Zicari
Principles of Transaction Processing, 2nd Edition
  Philip A. Bernstein, Eric Newcomer
Using the New DB2: IBM’s Object-Relational Database System
  Don Chamberlin
Distributed Algorithms
  Nancy A. Lynch
Active Database Systems: Triggers and Rules for Advanced Database Processing
  Edited by Jennifer Widom, Stefano Ceri
Migrating Legacy Systems: Gateways, Interfaces, and the Incremental Approach
  Michael L. Brodie, Michael Stonebraker
Atomic Transactions
  Nancy Lynch, Michael Merritt, William Weihl, Alan Fekete
Query Processing for Advanced Database Systems
  Edited by Johann Christoph Freytag, David Maier, Gottfried Vossen
Transaction Processing
  Jim Gray, Andreas Reuter
Database Transaction Models for Advanced Applications
  Edited by Ahmed K. Elmagarmid
A Guide to Developing Client/Server SQL Applications
  Setrag Khoshafian, Arvola Chan, Anna Wong, Harry K. T. Wong

Data Mining
Concepts and Techniques
Third Edition

Jiawei Han
University of Illinois at Urbana–Champaign
Micheline Kamber
Jian Pei
Simon Fraser University

AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier

Morgan Kaufmann Publishers is an imprint of Elsevier.
225 Wyman Street, Waltham, MA 02451, USA

© 2012 by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Han, Jiawei.
Data mining : concepts and techniques / Jiawei Han, Micheline Kamber, Jian Pei. – 3rd ed.
p. cm.
ISBN 978-0-12-381479-1
1. Data mining. I. Kamber, Micheline. II. Pei, Jian. III. Title.
QA76.9.D343 H36 2011
006.312–dc22 2011010635

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com or www.elsevierdirect.com

Printed in the United States of America
11 12 13 14 15 10 9 8 7 6 5 4 3 2 1

To Y. Dora and Lawrence for your love and encouragement
J.H.

To Erik, Kevan, Kian, and Mikael for your love and inspiration
M.K.

To my wife, Jennifer, and daughter, Jacqueline
J.P.

Foreword

Analyzing large amounts of data is a necessity. Even popular science books, like “super crunchers,” give compelling cases where large amounts of data yield discoveries and intuitions that surprise even experts. Every enterprise benefits from collecting and analyzing its data: Hospitals can spot trends and anomalies in their patient records, search engines can do better ranking and ad placement, and environmental and public health agencies can spot patterns and abnormalities in their data. The list continues, with cybersecurity and computer network intrusion detection; monitoring of the energy consumption of household appliances; pattern analysis in bioinformatics and pharmaceutical data; financial and business intelligence data; spotting trends in blogs, Twitter, and many more. Storage is inexpensive and getting even less so, as are data sensors. Thus, collecting and storing data is easier than ever before.

The problem then becomes how to analyze the data. This is exactly the focus of this Third Edition of the book. Jiawei, Micheline, and Jian give encyclopedic coverage of all the related methods, from the classic topics of clustering and classification, to database methods (e.g., association rules, data cubes) to more recent and advanced topics (e.g., SVD/PCA, wavelets, support vector machines).

The exposition is extremely accessible to beginners and advanced readers alike. The book gives the fundamental material first and the more advanced material in follow-up chapters. It also has numerous rhetorical questions, which I found extremely helpful for maintaining focus.

We have used the first two editions as textbooks in data mining courses at Carnegie Mellon and plan to continue to do so with this Third Edition. The new version has significant additions: Notably, it has more than 100 citations to works from 2006 onward, focusing on more recent material such as graphs and social networks, sensor networks, and outlier detection. This book has a new section for visualization, has expanded outlier detection into a whole chapter, and has separate chapters for advanced methods—for example, pattern mining with top-k patterns and more and clustering methods with biclustering and graph clustering.

Overall, it is an excellent book on classic and modern data mining methods, and it is ideal not only for teaching but also as a reference book.

Christos Faloutsos
Carnegie Mellon University

Foreword to Second Edition

We are deluged by data—scientific data, medical data, demographic data, financial data, and marketing data. People have no time to look at this data. Human attention has become the precious resource. So, we must find ways to automatically analyze the data, to automatically classify it, to automatically summarize it, to automatically discover and characterize trends in it, and to automatically flag anomalies. This is one of the most active and exciting areas of the database research community. Researchers in areas including statistics, visualization, artificial intelligence, and machine learning are contributing to this field. The breadth of the field makes it difficult to grasp the extraordinary progress over the last few decades.

Six years ago, Jiawei Han’s and Micheline Kamber’s seminal textbook organized and presented Data Mining. It heralded a golden age of innovation in the field. This revision of their book reflects that progress; more than half of the references and historical notes are to recent work. The field has matured with many new and improved algorithms, and has broadened to include many more data types: streams, sequences, graphs, time-series, geospatial, audio, images, and video. We are certainly not at the end of the golden age—indeed research and commercial interest in data mining continues to grow—but we are all fortunate to have this modern compendium.

The book gives quick introductions to database and data mining concepts with particular emphasis on data analysis. It then covers in a chapter-by-chapter tour the concepts and techniques that underlie classification, prediction, association, and clustering. These topics are presented with examples, a tour of the best algorithms for each problem class, and with pragmatic rules of thumb about when to apply each technique. The Socratic presentation style is both very readable and very informative. I certainly learned a lot from reading the first edition and got re-educated and updated in reading the second edition.

Jiawei Han and Micheline Kamber have been leading contributors to data mining research. This is the text they use with their students to bring them up to speed on
