Individual and Collective
Graph Mining
Principles, Algorithms, and Applications
Synthesis Lectures on Data
Mining and Knowledge
Discovery
Editors
Jiawei Han, University of Illinois at Urbana-Champaign
Lise Getoor, University of California, Santa Cruz
Wei Wang, University of California, Los Angeles
Johannes Gehrke, Cornell University
Robert Grossman, University of Chicago
Synthesis Lectures on Data Mining and Knowledge Discovery is edited by Jiawei Han, Lise
Getoor, Wei Wang, Johannes Gehrke, and Robert Grossman. The series publishes 50- to 150-page
publications on topics pertaining to data mining, web mining, text mining, and knowledge
discovery, including tutorials and case studies. Potential topics include: data mining algorithms,
innovative data mining applications, data mining systems, mining text, web and semi-structured
data, high performance and parallel/distributed data mining, data mining standards, data mining
and knowledge discovery framework and process, data mining foundations, mining data streams
and sensor data, mining multi-media data, mining social networks and graph data, mining spatial
and temporal data, pre-processing and post-processing in data mining, robust and scalable
statistical methods, security, privacy, and adversarial data mining, visual data mining, visual
analytics, and data visualization.
Individual and Collective Graph Mining: Principles, Algorithms, and Applications
Danai Koutra and Christos Faloutsos
2017
Phrase Mining from Massive Text and Its Applications
Jialu Liu, Jingbo Shang, and Jiawei Han
2017
Exploratory Causal Analysis with Time Series Data
James M. McCracken
2016
Mining Human Mobility in Location-Based Social Networks
Huiji Gao and Huan Liu
2015
Mining Latent Entity Structures
Chi Wang and Jiawei Han
2015
Probabilistic Approaches to Recommendations
Nicola Barbieri, Giuseppe Manco, and Ettore Ritacco
2014
Outlier Detection for Temporal Data
Manish Gupta, Jing Gao, Charu Aggarwal, and Jiawei Han
2014
Provenance Data in Social Media
Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, and Huan Liu
2013
Graph Mining: Laws, Tools, and Case Studies
D. Chakrabarti and C. Faloutsos
2012
Mining Heterogeneous Information Networks: Principles and Methodologies
Yizhou Sun and Jiawei Han
2012
Privacy in Social Networks
Elena Zheleva, Evimaria Terzi, and Lise Getoor
2012
Community Detection and Mining in Social Media
Lei Tang and Huan Liu
2010
Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions
Giovanni Seni and John F. Elder
2010
Modeling and Data Mining in Blogosphere
Nitin Agarwal and Huan Liu
2009
Copyright © 2018 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means, electronic, mechanical, photocopy, recording, or any other, except for brief quotations
in printed reviews, without the prior permission of the publisher.
Individual and Collective Graph Mining: Principles, Algorithms, and Applications
Danai Koutra and Christos Faloutsos
www.morganclaypool.com
ISBN: 9781681730394 paperback
ISBN: 9781681730400 ebook
DOI 10.2200/S00796ED1V01Y201708DMK014
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE DISCOVERY
Lecture #14
Series Editors: Jiawei Han, University of Illinois at Urbana-Champaign
Lise Getoor, University of California, Santa Cruz
Wei Wang, University of California, Los Angeles
Johannes Gehrke, Cornell University
Robert Grossman, University of Chicago
Series ISSN
Print 2151-0067 Electronic 2151-0075
Individual and Collective
Graph Mining
Principles, Algorithms, and Applications
Danai Koutra
University of Michigan, Ann Arbor
Christos Faloutsos
Carnegie Mellon University
SYNTHESIS LECTURES ON DATA MINING AND KNOWLEDGE
DISCOVERY #14
Morgan & Claypool Publishers
ABSTRACT
Graphs naturally represent information ranging from links between web pages, to communication
in email networks, to connections between neurons in our brains. These graphs often span
billions of nodes and interactions between them. Within this deluge of interconnected data, how
can we find the most important structures and summarize them? How can we efficiently visualize
them? How can we detect anomalies that indicate critical events, such as an attack on a
computer system, disease formation in the human brain, or the fall of a company?
This book presents scalable, principled discovery algorithms that combine globality with
locality to make sense of one or more graphs. In addition to fast algorithmic methodologies, we
also contribute graph-theoretical ideas and models, and real-world applications in two main areas:
• Individual Graph Mining: We show how to interpretably summarize a single graph by
identifying its important graph structures. We complement summarization with inference,
which leverages information about a few entities (obtained via summarization or other
methods) and the network structure to efficiently and effectively learn information about
the unknown entities.
• Collective Graph Mining: We extend the idea of individual-graph summarization to
time-evolving graphs, and show how to scalably discover temporal patterns. Apart from
summarization, we claim that graph similarity is often the underlying problem in a host of
applications where multiple graphs occur (e.g., temporal anomaly detection, discovery of
behavioral patterns), and we present principled, scalable algorithms for aligning networks
and measuring their similarity.
The methods that we present in this book leverage techniques from diverse areas, such
as matrix algebra, graph theory, optimization, information theory, machine learning, finance,
and social science, to solve real-world problems. We present applications of our exploration
algorithms to massive datasets, including a Web graph of 6.6 billion edges, a Twitter graph of
1.8 billion edges, brain graphs with up to 90 million edges, collaboration and peer-to-peer networks,
and browser logs, all spanning millions of users and interactions.
KEYWORDS
data mining, graph mining and exploration, graph similarity, graph matching, network
alignment, graph summarization, pattern mining, outlier detection, anomaly
detection, scalability, fast algorithms, models, visualization, social networks, brain
graphs, connectomes
Contents
Acknowledgments
1 Introduction
1.1 Overview
1.2 Organization of This Book
1.2.1 Part I: Individual Graph Mining
1.2.2 Part II: Collective Graph Mining
1.2.3 Code and Supporting Materials on the Web
1.3 Preliminaries
1.3.1 Graph Definitions
1.3.2 Graph-theoretic Data Structures
1.3.3 Linear Algebra Concepts
1.3.4 Select Graph Properties
1.4 Common Symbols
PART I Individual Graph Mining
2 Summarization of Static Graphs
2.1 Overview and Motivation
2.2 Problem Formulation
2.2.1 MDL for Graph Summarization
2.2.2 Encoding the Model
2.2.3 Encoding the Errors
2.3 VoG: Vocabulary-based Summarization of Graphs
2.3.1 Subgraph Generation
2.3.2 Subgraph Labeling
2.3.3 Summary Assembly
2.3.4 Toy Example
2.3.5 Time Complexity
2.4 Empirical Results
2.4.1 Quantitative Analysis
2.4.2 Qualitative Analysis
2.4.3 Scalability
2.5 Discussion
2.6 Related Work
3 Inference in a Graph
3.1 Guilt-by-association Techniques
3.1.1 Random Walk with Restarts (RWR)
3.1.2 Semi-supervised Learning (SSL)
3.1.3 Belief Propagation (BP)
3.1.4 Summary
3.2 FaBP: Fast Belief Propagation
3.2.1 Derivation
3.2.2 Analysis of Convergence
3.2.3 Algorithm
3.3 Extension to Multiple Classes
3.4 Empirical Results
3.4.1 Accuracy
3.4.2 Convergence
3.4.3 Robustness
3.4.4 Scalability
PART II Collective Graph Mining
4 Summarization of Dynamic Graphs
4.1 Problem Formulation
4.1.1 MDL for Dynamic Graph Summarization
4.1.2 Encoding the Model
4.1.3 Encoding the Errors
4.2 TimeCrunch: Vocabulary-based Summarization of Dynamic Graphs
4.2.1 Generating Candidate Static Structures
4.2.2 Labeling Candidate Static Structures
4.2.3 Stitching Candidate Temporal Structures
4.2.4 Composing the Summary
4.3 Empirical Results
4.3.1 Quantitative Analysis
4.3.2 Qualitative Analysis
4.3.3 Scalability
4.4 Related Work
5 Graph Similarity
5.1 Intuition
5.1.1 Overview
5.1.2 Measuring Node Affinities
5.1.3 Leveraging Belief Propagation
5.1.4 Desired Properties for Similarity Measures
5.2 DeltaCon: “δ” Connectivity Change Detection
5.2.1 Algorithm Description
5.2.2 Faster Computation
5.2.3 Desired Properties
5.3 DeltaCon-Attr: Adding Node and Edge Attribution
5.3.1 Algorithm Description
5.3.2 Scalability Analysis
5.4 Empirical Results
5.4.1 Intuitiveness of DeltaCon
5.4.2 Intuitiveness of DeltaCon-Attr
5.4.3 Scalability
5.4.4 Robustness
5.5 Applications
5.5.1 Enron
5.5.2 Brain Connectivity Graph Clustering
5.5.3 Recovery of Connectome Correspondences
5.6 Related Work
6 Graph Alignment
6.1 Problem Formulation
6.2 BiG-Align: Bipartite Graph Alignment
6.2.1 Mathematical Formulation
6.2.2 Problem-specific Optimizations
6.2.3 Algorithm Description
6.3 Uni-Align: Extension to Unipartite Graph Alignment