
Graph mining: laws, tools, and case studies PDF

209 Pages·2012·4.764 MB·English

Preview Graph mining: laws, tools, and case studies

Series ISSN: 2151-0067

SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY
Morgan & Claypool Publishers

Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign; Lise Getoor, University of Maryland; Wei Wang, University of North Carolina, Chapel Hill; Johannes Gehrke, Cornell University; Robert Grossman, University of Illinois at Chicago

Graph Mining: Laws, Tools, and Case Studies
Deepayan Chakrabarti, Facebook Inc.
Christos Faloutsos, Carnegie Mellon University

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com

ISBN: 978-1-60845-115-9

Synthesis Lectures on Data Mining and Knowledge Discovery

Synthesis Lectures on Data Mining and Knowledge Discovery is edited by Jiawei Han, Lise Getoor, Wei Wang, Johannes Gehrke, and Robert Grossman. The series publishes 50- to 150-page publications on topics pertaining to data mining, web mining, text mining, and knowledge
discovery, including tutorials and case studies. The scope will largely follow the purview of premier computer science conferences, such as KDD. Potential topics include, but not limited to, data mining algorithms, innovative data mining applications, data mining systems, mining text, web and semi-structured data, high performance and parallel/distributed data mining, data mining standards, data mining and knowledge discovery framework and process, data mining foundations, mining data streams and sensor data, mining multi-media data, mining social networks and graph data, mining spatial and temporal data, pre-processing and post-processing in data mining, robust and scalable statistical methods, security, privacy, and adversarial data mining, visual data mining, visual analytics, and data visualization.

Titles in the series:

Graph Mining: Laws, Tools, and Case Studies. D. Chakrabarti and C. Faloutsos. 2012
Mining Heterogeneous Information Networks: Principles and Methodologies. Yizhou Sun and Jiawei Han. 2012
Privacy in Social Networks. Elena Zheleva, Evimaria Terzi, and Lise Getoor. 2012
Community Detection and Mining in Social Media. Lei Tang and Huan Liu. 2010
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions. Giovanni Seni and John F. Elder. 2010
Modeling and Data Mining in Blogosphere. Nitin Agarwal and Huan Liu. 2009

Copyright © 2012 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.
Graph Mining: Laws, Tools, and Case Studies
D. Chakrabarti and C. Faloutsos
www.morganclaypool.com

ISBN: 9781608451159 paperback
ISBN: 9781608451166 ebook
DOI 10.2200/S00449ED1V01Y201209DMK006

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY
Lecture #6

Series Editors: Jiawei Han, UIUC; Lise Getoor, University of Maryland; Wei Wang, University of North Carolina, Chapel Hill; Johannes Gehrke, Cornell University; Robert Grossman, University of Chicago

Series ISSN
Synthesis Lectures on Data Mining and Knowledge Discovery
Print 2151-0067 Electronic 2151-0075

ABSTRACT

What does the Web look like? How can we find patterns, communities, outliers, in a social network? Which are the most central nodes in a network? These are the questions that motivate this work. Networks and graphs appear in many diverse settings, for example in social networks, computer-communication networks (intrusion detection, traffic management), protein-protein interaction networks in biology, document-text bipartite graphs in text retrieval, person-account graphs in financial fraud detection, and others.

In this work, first we list several surprising patterns that real graphs tend to follow. Then we give a detailed list of generators that try to mirror these patterns. Generators are important, because they can help with “what if” scenarios, extrapolations, and anonymization. Then we provide a list of powerful tools for graph analysis, and specifically spectral methods (Singular Value Decomposition (SVD)), tensors, and case studies like the famous “pageRank” algorithm and the “HITS” algorithm for ranking web search results. Finally, we conclude with a survey of tools and observations from related fields like sociology, which provide complementary viewpoints.

KEYWORDS

data mining, social networks, power laws, graph generators, pagerank, singular value decomposition.
Christos Faloutsos: To Christina, for her patience, support, and down-to-earth questions; to Michalis and Petros, for the ’99 paper that started it all.

Deepayan Chakrabarti: To Purna and my parents, for their support and help, and for always being there when I needed them.
