
ZU064-05-FPR article 9 January 2015 1:22
Under consideration for publication in Network Science

Clustering attributed graphs: models, measures and methods∗

CECILE BOTHOREL, JUAN DAVID CRUZ
Department of Logics in Uses, Social Science and Information Science, UMR CNRS 3192 Lab-STICC, Télécom Bretagne, Institut Mines-Télécom, Brest, France
(e-mail: cecile.bothorel, [email protected])

MATTEO MAGNANI†
Computing Science Division, IT Department, Uppsala University, Sweden
(e-mail: [email protected])

BARBORA MICENKOVÁ
Data Intensive Systems, Department of Computer Science, Aarhus University, Denmark
(e-mail: [email protected])

arXiv:1501.01676v1 [cs.SI] 7 Jan 2015

Abstract
Clustering a graph, i.e., assigning its nodes to groups, is an important operation whose best known application is the discovery of communities in social networks. Graph clustering and community detection have traditionally focused on graphs without attributes, with the notable exception of edge weights. However, these models only provide a partial representation of real social systems, which are thus often described using node attributes, representing features of the actors, and edge attributes, representing different kinds of relationships among them. We refer to these models as attributed graphs. Consequently, existing graph clustering methods have been recently extended to deal with node and edge attributes. This article is a literature survey on this topic, organizing and presenting recent research results in a uniform way, characterizing the main existing clustering methods and highlighting their conceptual differences. We also cover the important topic of clustering evaluation and identify current open problems.
Contents
1 Introduction
1.1 Current trends in attributed graph analysis and mining
1.2 Clustering attributed graphs
2 Clustering edge-attributed graphs
2.1 Single-layer approaches
2.2 Extension of modularity
2.3 Clique-finding methods
2.4 Emerging clusters
3 Clustering node-attributed graphs
3.1 Data representation
3.2 Weight modification according to node attributes
3.3 Linear combination of attributes and structural dimensions
3.4 Walk-based approaches
3.5 Methods based on statistical inference
3.6 Subspace-based methods
3.7 Other methods
4 Practical aspects
4.1 Evaluation
4.2 Applicability
5 Open problems and discussion

∗ This version has been submitted to the Network Science journal and has been subsequently accepted for publication subject to minor revisions. It will appear in a revised form subsequent to peer review and/or editorial input by Cambridge University Press and/or the journal's proprietor (http://journals.cambridge.org/NWS).
† The author has been partly supported by the Italian Ministry of Education, Universities and Research FIRB grant RBFR107725.

1 Introduction
Graphs represent one of the main models to study human relationships. For example, structural properties of social systems can be measured by representing individuals and their relationships as graphs and computing the centrality or prestige of their nodes (Wasserman & Faust, 1994). Similarly, once a social graph is available, groups of strongly connected individuals (communities) can be identified using clustering algorithms. The application of graphs to the study of social systems motivated, and is now part of, a broader discipline called network science, focused on the modeling and analysis of relationships between generic entities. This discipline provides a set of tools (methodologies, methods and measures) to improve our understanding of complex systems, including social and technological environments, transport and communication networks and biological systems.
The wide applicability of network science largely relies on the adoption of graph-based models, which, thanks to their generality, can be applied to a diverse range of scenarios.

However, researchers in social network analysis (SNA) and social sciences have long been aware of the potential value in representing additional information on top of the social graph, and of the potential loss in accuracy when simple nodes and edges are used to represent complex social interactions. For example, according to Wasserman & Faust (1994), social networks contain at least three different dimensions: a structural dimension corresponding to the social graph, e.g., actors and their relationships; a compositional dimension describing the actors, e.g., their personal information; and an affiliation dimension indicating group memberships. The existence of multiple relationship types, e.g., working together, being friends or exchanging text messages, has also been studied for a long time, as recently reported by Borgatti et al. (2009). This last aspect has been referred to as multiplexity in the SNA tradition, and can be related to Goffman's concept of context, well exemplified by the metaphor of individuals acting on multiple stages depending on their audience (Goffman, 1974). As an example, Figure 1(b) highlights how an attributed graph may lead to a deeper understanding of social interactions compared to the corresponding graph without attributes in Figure 1(a).

Fig. 1. A graph (a) provides a simplified representation of a social system which can be easy to understand but may prevent a deep understanding of its structural and compositional dimensions (b).

1.1 Current trends in attributed graph analysis and mining
Attributed graphs have been used for decades to study social environments, and it has long been recognized that the structure of a social network may not be sufficient to identify its communities (Freeman, 1996; Hric et al., 2014). However, recent years have witnessed a renewed attention towards these models, partially motivated by the availability of real data from online sources. One interesting aspect of real attributed graphs is the observed dependency between who the actors are and how they interact, i.e., between the structural and compositional dimensions. For example, La Fond & Neville (2010) have observed the coexistence of social influence and homophily. Social influence states that people who are linked are likely to have similar attributes, thus node attribute values can be interpreted as a result of interactions with other nodes. At the same time, homophily implies that people with similar attributes are likely to build relationships. These two related phenomena have been observed in real networks by Kossinets & Watts (2006), and the dependency between attributes and connectivity has been studied mathematically (Kim & Leskovec, 2012).

With this in mind, researchers have focused on attributed graph generators. Artificially grown graphs are useful to experiment with algorithms and run simulations when real data are difficult to collect. They are relevant in testing what-if scenarios, providing forecasts on future evolutions, and can be used to design graph sampling algorithms when the size of original graphs would otherwise make the analysis impractical (Leskovec et al., 2005). Prior models, such as the well-known preferential attachment mechanism by Barabási & Albert (1999), have focused on the social structure. Now the challenge is to generate datasets as close as possible to real-world social graphs, as done by Zheleva et al. (2009), where affiliation information is also generated. This model captures previously studied properties (e.g., a power-law distribution for social degree) but also provides new interesting insights regarding the processes behind group formation. More recently, Gong et al. (2011) have proposed a generative social-attribute network model based on their empirical observations of Google+ growth. Here attributes describe user characteristics like the name of the school attended and group membership. Nan Du et al. (2010) and Magnani & Rossi (2013a) have instead focused on the generation of graphs with interdependent attributes on the edges.

The idea that attributes and connections are generated in an interdependent way has led to the development of specialized analysis methods. Several graph mining tasks have been extended to attributed graphs, like link prediction (Getoor & Diehl, 2005; Rossetti et al., 2011; Gong et al., 2011; Sun et al., 2012) or attribute inference (Li & Yeung, 2009; Gong et al., 2011; Yang et al., 2011). This survey is dedicated to one of the most relevant and studied operations on graphs and complex networks: graph clustering, often referred to as community detection when social graphs are involved. We believe that this is an important and timely effort to facilitate research in this still young area, in particular considering that the discussed approaches have been introduced in different disciplines, often unaware of each other.

1.2 Clustering attributed graphs
Although several surveys on graph clustering have been written (Schaeffer, 2007; Fortunato, 2010; Aggarwal & Wang, 2010; Coscia et al., 2011), most of the approaches to cluster attributed graphs are more recent and have not been included in these works. At the same time, there is a large literature on (multi-dimensional) clustering of tabular data (Moise et al., 2009; Han et al., 2011), but existing surveys in this area have not addressed extensions for graph data. Attributed graph clustering can be seen as the confluence of these two fields, the former focusing on the structural and the latter on the compositional aspects.
In this article we focus on recent works resulting from this promising combination.

The article is organized in three main parts: a review of methods for edge-attributed graphs, a review of methods for node-attributed graphs, and a section on practical issues including the evaluation of clusterings and the applicability of different approaches. We conclude by summarizing the status of the research and discussing the open problems that are most promising according to our view of the area. Attributed graph clustering has been independently studied in different disciplines, therefore it is important to know how different terms have been used in the literature. In Table 1 we list and briefly explain the terms used in this article.

2 Clustering edge-attributed graphs
One way to extend a graph model and to provide additional information to the clustering algorithm is to represent the different kinds of edges among individuals. As an example, in Figure 1(b) we can see that the relationship between the two left-most nodes consists of a friendship and a working edge.

Fig. 2. Two alternative representations of the different edge types in a multigraph.

Different models have been used to represent this scenario (Minor, 1983; Lazega & Pattison, 1999; Skvoretz & Agneessens, 2007; Kazienko et al., 2010; Berlingerio et al., 2011b), sometimes emphasizing the different roles played by individuals with respect to different networks (Magnani & Rossi, 2011), including different kinds of nodes (Cai et al., 2005), or providing a more general data model to mathematically represent a graph with attributes on both nodes and edges (Kivelä et al., 2014).

Table 1. Terminology used in this article and synonyms used in the literature

| Main term | Synonyms | Meaning |
|---|---|---|
| Node | Vertex, site, actor | Basic component of a graph. As an example, a node may indicate that a user has an account on the social media site whose social network is represented by that graph. |
| Edge | Link, arc, tie, connection, bond, relation(ship) | A relationship between two nodes, e.g., a following relationship between two Twitter accounts. When there is an edge between two nodes we say that they are directly connected. |
| Graph | Network, social network, layer | A graph without attributes, neither on nodes nor on edges, with the exception of an optional numerical weight on edges indicating the strength of the connection. Edges may be directed or undirected. |
| Edge-attributed graph | Multiplex network, multi-layer graph, multidimensional network, edge-labeled multi-graph | Attributes indicate connections of different kinds or inside different graphs. With this term we do not indicate the presence of weights, in which case we explicitly talk of weighted graphs/edges. |
| Node-attributed graph | Node-labeled graph, graph with feature vectors | A feature vector is associated with each node and contains information about it, e.g., age, nationality, language, income. |
| Attributed graph | Attribute graph, social and affiliation network, relational data, multidimensional network | An edge-attributed graph, a node-attributed graph, or both. |
| Layer | Aspect, dimension | Sometimes all the edges with the same attribute value in an edge-attributed graph are referred to as a layer, e.g., the Facebook friendship, spatial proximity, Twitter following, colleague or family layers in an attributed graph indicating different types of social relationships. |
| Clustering | Community structure | Assignment of each node to one or more groups of nodes, called clusters. Different criteria can be used to determine whether two nodes should belong to the same cluster. |
| Partition | Non-overlapping clustering | A clustering where each node is assigned to exactly one cluster. |
In Figure 2 we can see two alternative representations of the same data, as a multigraph (a) and as a set of interconnected graphs (b). The former, sometimes referred to as a multiplex network, focuses on a single set of nodes that may have complex relationships between them:

Definition 1 (Multi-relational edge-attributed graph)
Given a set of nodes $N$ and a set of labels $L$, an edge-attributed graph is a triple $G = (V, E, l)$ where $V \subseteq N$, $(V, E)$ is a multigraph and $l: E \rightarrow L$. Each edge $e \in E$ in the graph has an associated label $l(e)$.

The latter emphasizes how the same node can belong to multiple (social) graphs, also known as layers:

Definition 2 (Multi-layer edge-attributed graph)
Given a set of nodes $N$ and a set of labels $L$, an edge-attributed graph is defined as a set of graphs $G_i = (V_i, E_i)$ where $V_i \subseteq N$ and $E_i \subseteq V_i \times V_i$. Each graph $G_i$ has an associated unique name $l_i \in L$.

Although very similar, and in this specific example equivalent, these two representations emphasize different aspects of an edge-attributed graph. It is important to understand that the methods covered in the remainder of this section have been developed starting from specific models, which influenced their features. Researchers using the first model have mainly focused on the reduction of different edge types to single edges, while researchers using the second model have looked for clusters spanning different layers and for nodes belonging to multiple clusters depending on the edge type. With this difference in mind, in the following we will formally represent both scenarios using the second (more general) model, where a family of graphs possibly containing common nodes represents the different kinds of edges. A larger working example is shown in Figure 3(a).
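When the two representations are equivalent, one can be converted into the other mechanically. As a minimal sketch, assuming Definition 1 is stored as a list of (u, v, label) triples and Definition 2 as a dictionary from each name $l_i$ to a pair $(V_i, E_i)$ (both encodings, the node names and the function names are ours, chosen for illustration only):

```python
from collections import defaultdict

# Definition 1 (hypothetical data): a multigraph whose edges carry labels from L.
labeled_edges = [
    ("Ann", "Bob", "work"),
    ("Ann", "Bob", "friendship"),
    ("Bob", "Cid", "work"),
]


def to_layers(edges):
    """Definition 1 -> Definition 2: one graph G_i = (V_i, E_i) per label l_i."""
    layers = defaultdict(lambda: (set(), set()))
    for u, v, label in edges:
        nodes, pairs = layers[label]  # the two sets are mutated in place
        nodes.update((u, v))
        pairs.add((u, v))
    return dict(layers)


def to_labeled_edges(layers):
    """Definition 2 -> Definition 1: label every edge with the name of its graph."""
    return [(u, v, label) for label, (_, pairs) in layers.items() for u, v in pairs]
```

Converting back with `to_labeled_edges(to_layers(labeled_edges))` yields the original triples up to order. The round trip is lossless only if no two edges share both endpoints and label (a multigraph may repeat them, while $E_i \subseteq V_i \times V_i$ cannot) and if no graph contains nodes without edges.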
More general definitions have been provided in the literature, where one node in one graph can correspond to multiple nodes in another. This includes the case of online social media, where the same user can open multiple accounts on some services (Magnani & Rossi, 2011), and the case of non-social networks containing different kinds of nodes, such as a power grid and a control network, where one node in a network can be related to multiple nodes in another (Gao et al., 2011). Similarly, the model introduced by Kivelä et al. (2014) allows the presence of attributes both on nodes and edges. For the sake of simplicity we focus on the simpler definitions above, because they are the ones used by almost all works on clustering social networks to date. Also, notice that we focus on nominal attributes, e.g., work and friendship: the case where attributes are only numeric, that is, weighted graphs, has already been treated in depth in existing surveys. However, we will deal with numeric weights when these are used inside algorithms for nominal attributes.

2.1 Single-layer approaches
A basic approach to deal with edge-attributed graphs is to flatten them: to reconstruct a single weighted graph so that existing clustering methods can be indirectly applied. This approach, exemplified in Figure 3(b), is not restricted to clustering but can be applied to any operation defined on weighted graphs. Weights can be computed straightforwardly so that an edge between two nodes has a weight proportional to the number of graphs where the two nodes are directly connected.

Fig. 3. An edge-attributed graph, corresponding to a set of interconnected graphs defined on a common superset of individuals (a). An indirect way to process it is to reduce it to a single weighted graph, then apply classical clustering algorithms (b). A significantly different approach is to look at exclusive connections (c).

Definition 3 (Flattening)
A flattening of an edge-attributed graph $\{G_i\}$ is a weighted graph $(E_f, V_f, w_f)$ where $E_f = \bigcup_i E_i$, $V_f = \bigcup_i V_i$ and
$$w_f(u,v) = \frac{|\{i \mid (u,v) \in E_i\}|}{N},$$
where $N$ is the total number of graphs.

Berlingerio et al. (2011a) follow this approach. However, the same authors point out how this solution may discard relevant information, e.g., the fact that some attribute values (or graph layers) are more important than others to define a cluster. Tang et al. (2011) propose a more general framework where the information about the multiple edge types is considered during one of the four different components of the community detection process, network flattening being one of them. Nevertheless, the authors point out that this kind of integration requires that edges of different types share the same community structure. Therefore, it is not suitable for cases where the structures vary significantly across dimensions.

An antithetic approach, acknowledging the importance of edge-attributed models but still not considering clusters that can span several graphs, is introduced by Bonchi et al. (2012). While flattening tends to assign nodes directly connected on multiple graphs to the same group, because they get connected by a strong edge in the flattened graph, Bonchi et al. (2012) consider a set of nodes a good cluster if their relationships are as specific and homogeneous as possible, i.e., they are mainly connected through the same edge type. An example is presented in Figure 3(c), where the three nodes marked in black are connected with each other in the middle layer but only share one single edge on all other layers, representing a good cluster according to this approach¹.

¹ Please notice that this specific example is not compatible with the original model by Bonchi et al. (2012), where individuals are allowed to be directly connected only on one of the layers. However, it retains its underlying intuition. While this work was not originally intended to be applied to this domain, it still presents a worth-mentioning alternative point of view.

The next sections are devoted to methods aiming at identifying clusters spanning multiple layers.
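Before turning to those methods, the flattening of Definition 3 can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: each graph $G_i$ is given as a pair (node set, edge set) keyed by its name, edges are treated as undirected, and the function name `flatten` is hypothetical.

```python
from collections import Counter


def flatten(layers):
    """Definition 3. layers: dict name -> (V_i, E_i); edges treated as undirected.

    Returns (V_f, E_f, w_f), where w_f maps an unordered pair {u, v} to the
    fraction of the N graphs in which u and v are directly connected.
    """
    n_graphs = len(layers)  # N in Definition 3
    v_f = set().union(*(nodes for nodes, _ in layers.values()))
    counts = Counter()
    for _, edges in layers.values():
        # frozenset({u, v}) identifies an undirected pair; the set comprehension
        # makes each graph count a given pair at most once.
        counts.update({frozenset(e) for e in edges})
    w_f = {pair: k / n_graphs for pair, k in counts.items()}
    return v_f, set(w_f), w_f
```

For example, with three graphs in which A and B are connected in two of them, the pair {A, B} gets weight 2/3. Every graph contributes the same 1/N to a weight, so the sketch cannot express that some layers matter more than others, which is the limitation pointed out by Berlingerio et al. (2011a).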
Methods that identify clusters spanning multiple layers are mostly extensions of quality measures traditionally used in graph clustering, modularity and quasi-cliques being two prominent examples.

2.2 Extension of modularity
Modularity is a measure of how well the nodes in a graph can be separated into dense and independent components (Newman & Girvan, 2004). Figure 4 shows four graphs with their nodes assigned to two communities (black and white) and the modularities resulting from these assignments. These examples clearly show that assignments putting together highly interconnected nodes, and separating groups of nodes with only a few connections between them, get a higher value of modularity. It is worth noticing that modularity is not a method to find communities, but only a quality function. However, it can be directly optimized or used inside community detection methods to guide the clustering process.

Fig. 4. Modularity of four graph clusterings: nodes in each graph are assigned to two clusters (black and white); the modularity of each assignment is reported under the graph.

Although this measure suffers from some well-known pitfalls (Fortunato & Barthélemy, 2007; Lancichinetti & Fortunato, 2011), it has recently been at the basis of several graph clustering methods, and it has also been extended to deal with attributed graphs. Let us briefly introduce it², to later simplify the explanation of its extension. The modularity is expressed as

$$Q = \frac{1}{2m}\sum_{ij}\left(a_{ij} - \frac{k_i k_j}{2m}\right)\delta(g_i, g_j), \qquad (1)$$

where $\delta(g_i, g_j)$ is the Kronecker delta, which returns 1 when nodes $i$ and $j$ belong to the same cluster and 0 otherwise. Therefore, the sum is computed only for those pairs of nodes that are inside the same cluster. For each of these pairs, the presence of an edge between them improves the quality of the assignment: $a_{ij}$ equals 1 when there is an edge between $i$ and $j$, 0 otherwise. As we are dividing everything by $2m$ (where $m$ is the number of edges in the graph), edges between nodes belonging to different clusters negatively affect modularity, because they are not considered in the numerator (as $\delta(g_i, g_j) = 0$) but are counted in the denominator ($2m$). Finally, the formula considers the fact that two nodes with high degree would be more likely to end up in the same cluster by chance, therefore their contribution is reduced ($-k_i k_j/2m$, where $k_i$ and $k_j$ are the degrees of $i$ and $j$).

² Please notice that modifications of this formula have been proposed to make it more adaptable to different datasets. One typical addition is a resolution parameter, which we have omitted from the following equations because it is orthogonal to our discussion.

Now it should be easier to understand the extension of modularity proposed by Mucha et al. (2010) for edge-attributed graphs. Let us consider Figure 5: here we have emphasized how the same individual $i$ can be present in multiple graphs at the same time. For example, $i$ and $j$ are directly connected on graphs $r$ and $s$, where $r$ and $s$ represent two different edge types. Notice that in this example we have three graphs, i.e., three edge types, and that $j$ is assigned to two different clusters: gray in graph $r$ and white in graphs $s$ and $t$.

Fig. 5. An edge-attributed graph with three kinds of edges, represented as three interconnected graphs. Nodes have been assigned to three clusters (black, gray and white).

Thus, the extended version of the modularity can be expressed as

$$Q = \frac{1}{2\mu}\sum_{ijsr}\left[\left(a_{ijs} - \frac{k_{is}k_{js}}{2m_s}\right)\delta(s,r) + c_{jsr}\,\delta(i,j)\right]\delta(g_{i,s}, g_{j,r}). \qquad (2)$$

This extended quality function involves not just all pairs of nodes $(i,j)$ but also all pairs of graphs $(s,r)$.
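To make Equations (1) and (2) concrete, the following minimal Python sketch transcribes both sums literally, with quadratic loops meant for illustration rather than for clustering real networks. The assumptions are ours, not the article's: each graph is undirected and simple, given as a set of node pairs with every edge listed once; cluster assignments are plain dictionaries (node to cluster for Equation (1), (node, graph) to cluster for Equation (2)); $c_{jsr}$ is a single constant `c` for every node shared by two distinct graphs; and $2\mu$ is computed as $\sum_s 2m_s$ plus all coupling terms, which is the normalization used by Mucha et al. (2010). The function names are hypothetical.

```python
from itertools import product


def _layer_stats(edges):
    """Symmetric adjacency set and node degrees of one undirected simple graph."""
    adj = set(edges) | {(v, u) for u, v in edges}
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return adj, deg


def modularity(edges, cluster):
    """Equation (1). cluster: dict node -> cluster id (the g_i of the text)."""
    adj, deg = _layer_stats(edges)
    two_m = 2 * len(edges)
    q = 0.0
    for i, j in product(cluster, repeat=2):
        if cluster[i] == cluster[j]:  # delta(g_i, g_j)
            a_ij = 1 if (i, j) in adj else 0
            q += a_ij - deg.get(i, 0) * deg.get(j, 0) / two_m
    return q / two_m


def multilayer_modularity(layers, cluster, c):
    """Equation (2). layers: dict s -> edge set of graph s;
    cluster: dict (node, s) -> cluster id g_{i,s}, for every node present in s;
    c: constant used as c_jsr for every node shared by two distinct graphs."""
    total, two_mu = 0.0, 0.0
    for s, edges in layers.items():
        adj, deg = _layer_stats(edges)
        two_m_s = 2 * len(edges)
        two_mu += two_m_s
        nodes_s = [i for (i, layer) in cluster if layer == s]
        for i, j in product(nodes_s, repeat=2):  # delta(s, r): same graph only
            if two_m_s and cluster[(i, s)] == cluster[(j, s)]:
                a_ijs = 1 if (i, j) in adj else 0
                total += a_ijs - deg.get(i, 0) * deg.get(j, 0) / two_m_s
    for (j, s), (j2, r) in product(cluster, repeat=2):  # delta(i, j): same node only
        if j == j2 and s != r:
            two_mu += c  # the coupling also enters the normalization 2*mu
            if cluster[(j, s)] == cluster[(j, r)]:
                total += c
    return total / two_mu
```

On two triangles joined by a single edge, with each triangle as a cluster, `modularity` gives 5/14 ≈ 0.357. Duplicating that graph as two identical layers with consistent assignments, `multilayer_modularity` returns the same 5/14 when `c = 0` (each graph is scored independently) and 0.55 when `c = 1`, since the coupling terms then enter both the numerator and $2\mu$.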
$\mu$ and $\delta(g_{i,s}, g_{j,r})$ correspond respectively to $m$ and $\delta(g_i, g_j)$ in the modularity formula, where $\mu$ also considers the connections between different graphs: we say that there is a connection between two graphs $r$ and $s$ whenever they contain a common node $j$, which increases $\mu$ by $c_{jsr}$. $\delta(g_{i,s}, g_{j,r})$ allows the same node to be assigned to different clusters inside different graphs. The sum is now made of two components. The first is only computed when two nodes in the same graph are considered (because of $\delta(s,r)$), and corresponds to modularity. In fact, here $a_{ijs} = 1$ when $i$ and $j$ are directly connected in graph $s$, and $k_{is}$ is the degree of node $i$ in the same graph. The second component, $c_{jsr}$, is only computed when we are considering the same node $j$ inside two different graphs $r$ and $s$. This term increases the quality function by $c_{jsr}$ (typically, a constant value ranging from 0 to 1) whenever we assign the same individual to the same cluster on different graphs.

One practical problem in using this measure is to set the $c_{jsr}$ parameter. Setting it to 0 for all nodes and graphs, clusters are identified on each single graph independently of each other. If $c_{jsr}$ is high, e.g., 1, it becomes unlikely to assign the same individuals to different clusters on different graphs. Other practical aspects to consider are the fact that the part of the formula corresponding to traditional modularity can give a negative contribution, which is not true for the part taking care of inter-network relationships, and the fact that the contribution of inter-network relationships grows quadratically with the number of networks, while the modularity part only grows linearly. However, while the choice of appropriate parameters deserves more research, this extended definition of modularity can be directly used to find clusters by using any modularity-optimization heuristic, as done by Mucha et al. (2010), or paired with a concept of betweenness to extend the Girvan-Newman algorithm. The definition of betweenness for edge-attributed graphs follows directly from any definition of distance involving multiple graphs (Brodka et al., 2011; Magnani et al., 2013).

Figure 6 shows the values of modularity for four different multi-graphs and three different settings for the inter-graph parameter $c_{jsr}$ (which is kept constant for all nodes and graphs). The figure emphasizes the different components of this measure. On the top we can see two clusterings aligned with both the single-graph and the multi-graph structure. In particular, groups of nodes sharing several edges belong to the same cluster, and the same nodes on different graphs tend to belong to the same cluster. However, the top-right example shows that we can assign a node to different clusters in different graphs.

Modularities computed using different values of $c_{jsr}$ cannot be compared: increasing $c_{jsr}$ also increases the absolute value of modularity. However, we can see how the increase in the top-right figure (from .54 to .62) is proportionally lower than the one on the left (from .48 to .68). This is determined by the nodes assigned to multiple clusters. The two lower figures show examples of lower modularity, i.e., clusterings not following the structure of the graphs. The lower-left image has a low overall intra-graph modularity, which can be seen when $c_{jsr} = 0$ and thus inter-graph connections are not considered. When we also consider them ($c_{jsr} = .5$ and $c_{jsr} = 1$), we can see that modularity increases much more in the lower-left graph than in the lower-right one, where every node belongs to both clusters on different layers.

2.3 Clique-finding methods
Another concept used to discover clusters in graphs is the clique, i.e., a complete (sub)graph. Although this is one of the basic concepts in graph theory and is thus well known, we briefly recall it.

Definition 4 (Clique)
A clique is a set of nodes directly connected to all other nodes in the clique.

Definition 5 (Maximal clique)
A maximal clique is a clique that is not contained in a larger clique.

Figure 7(a) shows an example of a clique.
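Definitions 4 and 5 translate directly into two membership tests. A minimal sketch, assuming an undirected graph stored as a dictionary from each node to the set of its neighbors (a representation chosen here for illustration; the function names are hypothetical), with the complete graph on four nodes standing in for the clique of Figure 7(a):

```python
from itertools import combinations


def is_clique(adj, nodes):
    """Definition 4: every node in `nodes` is directly connected to every other one."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def is_maximal_clique(adj, nodes):
    """Definition 5: a clique that is not contained in a larger clique."""
    nodes = set(nodes)
    if not is_clique(adj, nodes):
        return False
    # The clique could be extended by any outside node adjacent to all its members.
    return not any(nodes <= adj[w] for w in adj if w not in nodes)


# Complete graph on four nodes (hypothetical stand-in for Figure 7(a)).
k4 = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3}}
```

Here `is_clique(k4, {1, 2, 3})` holds while `is_maximal_clique(k4, {1, 2, 3})` does not, because node 4 is adjacent to all three. For enumerating all maximal cliques rather than testing a given set, libraries such as NetworkX provide `find_cliques`, based on the Bron-Kerbosch algorithm.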
Any three nodes in Figure 7(a) still make a clique, but not a maximal one, because we can add the fourth node and still have a clique.

A (maximal) clique clearly corresponds to a cluster. However, large cliques are difficult to find in real data because it is sufficient for one edge not to be present to break the clique, and in social graphs edges can be missing for many reasons, e.g., because of unreported data or just because even in a tight group there can be two individuals that do not get well
