
Individual and Collective Graph Mining: Principles, Algorithms, and Applications PDF

207 Pages·2018·3.108 MB·English

Preview Individual and Collective Graph Mining: Principles, Algorithms, and Applications

Individual and Collective Graph Mining: Principles, Algorithms, and Applications

Synthesis Lectures on Data Mining and Knowledge Discovery

Editors
Jiawei Han, University of Illinois at Urbana-Champaign
Lise Getoor, University of California, Santa Cruz
Wei Wang, University of California, Los Angeles
Johannes Gehrke, Cornell University
Robert Grossman, University of Chicago

Synthesis Lectures on Data Mining and Knowledge Discovery is edited by Jiawei Han, Lise Getoor, Wei Wang, Johannes Gehrke, and Robert Grossman. The series publishes 50- to 150-page publications on topics pertaining to data mining, web mining, text mining, and knowledge discovery, including tutorials and case studies. Potential topics include: data mining algorithms, innovative data mining applications, data mining systems, mining text, web and semi-structured data, high performance and parallel/distributed data mining, data mining standards, data mining and knowledge discovery framework and process, data mining foundations, mining data streams and sensor data, mining multi-media data, mining social networks and graph data, mining spatial and temporal data, pre-processing and post-processing in data mining, robust and scalable statistical methods, security, privacy, and adversarial data mining, visual data mining, visual analytics, and data visualization.
Individual and Collective Graph Mining: Principles, Algorithms, and Applications. Danai Koutra and Christos Faloutsos. 2017
Phrase Mining from Massive Text and Its Applications. Jialu Liu, Jingbo Shang, and Jiawei Han. 2017
Exploratory Causal Analysis with Time Series Data. James M. McCracken. 2016
Mining Human Mobility in Location-Based Social Networks. Huiji Gao and Huan Liu. 2015
Mining Latent Entity Structures. Chi Wang and Jiawei Han. 2015
Probabilistic Approaches to Recommendations. Nicola Barbieri, Giuseppe Manco, and Ettore Ritacco. 2014
Outlier Detection for Temporal Data. Manish Gupta, Jing Gao, Charu Aggarwal, and Jiawei Han. 2014
Provenance Data in Social Media. Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, and Huan Liu. 2013
Graph Mining: Laws, Tools, and Case Studies. D. Chakrabarti and C. Faloutsos. 2012
Mining Heterogeneous Information Networks: Principles and Methodologies. Yizhou Sun and Jiawei Han. 2012
Privacy in Social Networks. Elena Zheleva, Evimaria Terzi, and Lise Getoor. 2012
Community Detection and Mining in Social Media. Lei Tang and Huan Liu. 2010
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions. Giovanni Seni and John F. Elder. 2010
Modeling and Data Mining in Blogosphere. Nitin Agarwal and Huan Liu. 2009

Copyright © 2018 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.
Individual and Collective Graph Mining: Principles, Algorithms, and Applications
Danai Koutra and Christos Faloutsos
www.morganclaypool.com

ISBN: 9781681730394 paperback
ISBN: 9781681730400 ebook
DOI 10.2200/S00796ED1V01Y201708DMK014

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY
Lecture #14
Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign; Lise Getoor, University of California, Santa Cruz; Wei Wang, University of California, Los Angeles; Johannes Gehrke, Cornell University; Robert Grossman, University of Chicago
Series ISSN: Print 2151-0067, Electronic 2151-0075

Danai Koutra, University of Michigan, Ann Arbor
Christos Faloutsos, Carnegie Mellon University

Morgan & Claypool Publishers

ABSTRACT

Graphs naturally represent information ranging from links between web pages, to communication in email networks, to connections between neurons in our brains. These graphs often span billions of nodes and interactions between them. Within this deluge of interconnected data, how can we find the most important structures and summarize them? How can we efficiently visualize them? How can we detect anomalies that indicate critical events, such as an attack on a computer system, disease formation in the human brain, or the fall of a company?

This book presents scalable, principled discovery algorithms that combine globality with locality to make sense of one or more graphs. In addition to fast algorithmic methodologies, we also contribute graph-theoretical ideas and models, and real-world applications in two main areas.

• Individual Graph Mining: We show how to interpretably summarize a single graph by identifying its important graph structures. We complement summarization with inference, which leverages information about few entities (obtained via summarization or other methods) and the network structure to efficiently and effectively learn information about the unknown entities.

• Collective Graph Mining: We extend the idea of individual-graph summarization to time-evolving graphs, and show how to scalably discover temporal patterns. Apart from summarization, we claim that graph similarity is often the underlying problem in a host of applications where multiple graphs occur (e.g., temporal anomaly detection, discovery of behavioral patterns), and we present principled, scalable algorithms for aligning networks and measuring their similarity.

The methods that we present in this book leverage techniques from diverse areas, such as matrix algebra, graph theory, optimization, information theory, machine learning, finance, and social science, to solve real-world problems. We present applications of our exploration algorithms to massive datasets, including a Web graph of 6.6 billion edges, a Twitter graph of 1.8 billion edges, brain graphs with up to 90 million edges, collaboration, peer-to-peer networks, browser logs, all spanning millions of users and interactions.

KEYWORDS

data mining, graph mining and exploration, graph similarity, graph matching, network alignment, graph summarization, pattern mining, outlier detection, anomaly detection, scalability, fast algorithms, models, visualization, social networks, brain graphs, connectomes

Contents

Acknowledgments

1 Introduction
  1.1 Overview
  1.2 Organization of This Book
    1.2.1 Part I: Individual Graph Mining
    1.2.2 Part II: Collective Graph Mining
    1.2.3 Code and Supporting Materials on the Web
  1.3 Preliminaries
    1.3.1 Graph Definitions
    1.3.2 Graph-theoretic Data Structures
    1.3.3 Linear Algebra Concepts
    1.3.4 Select Graph Properties
  1.4 Common Symbols

PART I Individual Graph Mining

2 Summarization of Static Graphs
  2.1 Overview and Motivation
  2.2 Problem Formulation
    2.2.1 MDL for Graph Summarization
    2.2.2 Encoding the Model
    2.2.3 Encoding the Errors
  2.3 VoG: Vocabulary-based Summarization of Graphs
    2.3.1 Subgraph Generation
    2.3.2 Subgraph Labeling
    2.3.3 Summary Assembly
    2.3.4 Toy Example
    2.3.5 Time Complexity
  2.4 Empirical Results
    2.4.1 Quantitative Analysis
    2.4.2 Qualitative Analysis
    2.4.3 Scalability
  2.5 Discussion
  2.6 Related Work

3 Inference in a Graph
  3.1 Guilt-by-association Techniques
    3.1.1 Random Walk with Restarts (RWR)
    3.1.2 Semi-supervised Learning (SSL)
    3.1.3 Belief Propagation (BP)
    3.1.4 Summary
  3.2 FaBP: Fast Belief Propagation
    3.2.1 Derivation
    3.2.2 Analysis of Convergence
    3.2.3 Algorithm
  3.3 Extension to Multiple Classes
  3.4 Empirical Results
    3.4.1 Accuracy
    3.4.2 Convergence
    3.4.3 Robustness
    3.4.4 Scalability

PART II Collective Graph Mining

4 Summarization of Dynamic Graphs
  4.1 Problem Formulation
    4.1.1 MDL for Dynamic Graph Summarization
    4.1.2 Encoding the Model
    4.1.3 Encoding the Errors
  4.2 TimeCrunch: Vocabulary-based Summarization of Dynamic Graphs
    4.2.1 Generating Candidate Static Structures
    4.2.2 Labeling Candidate Static Structures
    4.2.3 Stitching Candidate Temporal Structures
    4.2.4 Composing the Summary
  4.3 Empirical Results
    4.3.1 Quantitative Analysis
    4.3.2 Qualitative Analysis
    4.3.3 Scalability
  4.4 Related Work

5 Graph Similarity
  5.1 Intuition
    5.1.1 Overview
    5.1.2 Measuring Node Affinities
    5.1.3 Leveraging Belief Propagation
    5.1.4 Desired Properties for Similarity Measures
  5.2 DeltaCon: “δ” Connectivity Change Detection
    5.2.1 Algorithm Description
    5.2.2 Faster Computation
    5.2.3 Desired Properties
  5.3 DeltaCon-Attr: Adding Node and Edge Attribution
    5.3.1 Algorithm Description
    5.3.2 Scalability Analysis
  5.4 Empirical Results
    5.4.1 Intuitiveness of DeltaCon
    5.4.2 Intuitiveness of DeltaCon-Attr
    5.4.3 Scalability
    5.4.4 Robustness
  5.5 Applications
    5.5.1 Enron
    5.5.2 Brain Connectivity Graph Clustering
    5.5.3 Recovery of Connectome Correspondences
  5.6 Related Work

6 Graph Alignment
  6.1 Problem Formulation
  6.2 BiG-Align: Bipartite Graph Alignment
    6.2.1 Mathematical Formulation
    6.2.2 Problem-specific Optimizations
    6.2.3 Algorithm Description
  6.3 Uni-Align: Extension to Unipartite Graph Alignment
