Table Of Content

Community detection in complex networks using Extremal Optimization Jordi Duch1 and Alex Arenas1 1Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain (Dated: February 2, 2008) 5 We propose a novel method to find the community structure in complex networks based on an 0 extremaloptimization ofthevalueofmodularity. Themethodoutperformstheoptimalmodularity 0 found by the existing algorithms in the literature. We present the results of the algorithm for 2 computer simulated and real networks and compare them with other approaches. The efficiency n andaccuracyofthemethodmakeitfeasibletobeusedfortheaccurateidentificationofcommunity a structurein large complex networks. J PACSnumbers: 6 1 The description of the structure of complex networks certain mesoscopic description of the graph in terms of ] n has been one of the focus of attention of the physicist’s communitiesismoreorlessaccurate. Thelargerthe val- n community in the recent years. The levels of descrip- ues of Q the most accurate a partition into communities - tion range from the microscopic (degree, clustering co- is. s i efficient, centrality measures, etc., of individual nodes) The search for the optimal (largest) modularity value d . to the macroscopic description in terms of statistical is a NP-hard problem due to the fact that the space of t a properties of the whole network (degree distribution, to- possible partitions grows faster than any power of the m talclusteringcoefficient,degree-degreecorrelations,etc.) system size. For this reason, a heuristic search strategy [1,2,3,4]. Betweenthesetwoextremesthereisa”meso- is mandatory to restrict the search space while preserv- - d scopic” description of networks that tries to explain its ing the optimization goal [14]. Indeed, it is possible to n community structure. The general notion of community relate the current optimization problem for Q with clas- o structureincomplexnetworkswasfirstpointedoutinthe sical problems in statistical physics, e.g. the spin glass c physics literature by Girvan and Newman [5], and refers problem of finding the ground state energy [15], where [ to the fact that nodes in many real networks appear to algorithms inspired in natural optimization processes as 1 group in subgraphs in which the density of internal con- simulatedannealing[16]andgeneticalgorithms[17]have v nections is larger than the connections with the rest of been successfully used. 8 nodes in the network. In this Letter, we propose a new divisive algorithm 6 3 The community structure has been empirically found thatoptimizesthemodularityQusinganheuristicsearch 1 inmanyrealtechnological,biologicalandsocialnetworks basedontheExtremalOptimization(EO)algorithmpro- 0 [6,7,8,9,10]anditsemergenceseemstobeattheheart posed by Boettcher and Percus [18, 19]. This algorithm 5 of the network formation process [11]. isinspiredinturnintheevolutionmodelofBak-Sneppen 0 The existing methods intended to devise the commu- [20], and basically operates optimizing a global variable / t nity structure in complex networks have been recently by improving extremal local variables that involve co- a reviewed in [10]. All these methods require a definition evolutionary avalanches. The performance of EO algo- m ofcommunitythatimposesthelimituptowhichagroup rithms have been shown to overcome the efficiency of - d shouldbeconsideredacommunity. However,theconcept classicalsimulatedannealingandgeneticalgorithmspro- n of community itself is qualitative: nodes must be more viding competitive accuracy [21]. o connectedwithinitscommunitythanwiththerestofthe In our case, the global variable to optimize is Q as c network,anditsquantificationisstillasubjectofdebate. defined in eq.(1). Thus, the definition of the local vari- : v Some quantitative definitions that came from sociology ables used in the extremal optimization problem should i have been used in recent studies [12], but in general,the be related to the contribution of individual nodes i to X physics community has widely accepted a recent mea- the summation in eq.(1) given a certain partition into ar sure for the community structure based on the concept communities ofmodularityQintroducedbyNewmanandGirvan[13]: q =κ −k a (2) Q=X(err−a2r) (1) i r(i) i r(i) r where κ is the number of links that a node i be- r(i) wheree arethefractionoflinksthatconnecttwonodes longing to a community r has with nodes into the same rr insidethecommunityr,a thefractionoflinksthathave community, and k is the degree of node i. Note that r i one or both vertices inside of the community r, and the Q = 1 q where i refers to all nodes in the network 2LPi i sum extends to all communities r in a given network. given a certain partition into communities and L is the Note that this measure provides a way to determine if a total number of links in the network. Eq.(2) provides a 2 measure that depends on the node degree, and its nor- malizationinvolveallthe links inthe networkafter summation. Re-scaling the local variable q by the degree of i node i we obtain a proper definition for the contribution of node i to the modularity, relative to its own degree and normalized in the interval [-1,1]. Pajek Pajek FIG. 1: Left: Random initialization of the Zachary network q κ λ = i = r(i) −a (3) intotwopartitions,redandgreen. Right: Fivedifferentcom- i ki ki r(i) munities identified as connected components in each partitions. Each color definesa different community. Keeping in mind this definition of λ we can compare i the relative contribution of individual nodes to the community structure. We will consider λ as the local vari- i again repeat the process until no changes could improve able involved in the extremal optimization process that it (see Fig. 2). characterizesanindividualnode,fromnowonwewillre- fertoλ asthefitnessofnodeiusingthecommonjargon The application of the algorithm to the Zachary net- i in extremal optimization problems. work provides the optimal modularity value after three The heuristic search we propose to find the optimal recursive iterations. The network is decomposed in four modularity value evolves as follows: communities and the value for the modularity is 0.419, greater than the value 0.381 reported by Newman [14], • Initially, we split the nodes of the whole graph in the value 0.406reportedby Reichardtet al. [23]and the two random partitions having the same number of value0.412reportedbyDonettietal. [24]usingdifferent nodes each one. This splitting creates an initial optimization methods. communities division, where communities are un- The extremal optimization (EO) approach presented derstood as connected components in each parti- here has several technical implementation details, that tion. are relevant for our purposes. In the original EO algo- • At each time step, the system self-organizes by rithm,thenodeselectedisalwaysthenodewiththeworst moving the node with the lower fitness (extremal) λj value. This is a deterministic and fast way to solve from one partition to the other. In principle, each the problem, but it presents some drawbacks: the final movementimpliestherecalculationofthefitnessof result strongly depends on the initialization and there is manynodesbecausetherighthandsideofequation no possibility to escape from local maxima. Instead, we (3) involves the pseudo-global magnitude a . use a probabilistic selection called τ-EO [18], in which r(i) the nodes are ranked according to their fitness values, • The process is repeated until an ”optimal state” and then the node of rank r is selected according to the withamaximumvalue ofQis reached. After that, we delete all the links between both partitions and proceed recursively with every resultant connected component. The process finishes when the modularity Q could not be improved [32]. Note that this process is not a bipartitioning of the Pajek First Cut SecoPnajekd Cut Third CutPajek graph as known in computer science [19], because: the Q = 0.3718 Q = 0.4020 Q = 0.4188 number of nodes in each partition is dependent on the evolutionprocessandnotrestrictedtobethesameatthe 0.4 end of the process;andmore importantly, eachpartition arity could contain different connected components (commu- ul d0.3 nities) that when the partitions are disconnected result Mo in several subgraphs. on Let us illustrate the above mentioned heuristics in a urati0.2 g simple case. We will apply it to the well-know Zachary nfi o karate club network [22]. Initially we split the nodes in C two random partitions (see Fig.1 left). Note that the 0.1 numberofinitialcommunities(connectedcomponentsin 0 50 100 150 200 250 300 eachpartition)inthis caseis five(see Fig.1right). After Algorithm Steps that, the self-organization process starts: the node with the ”worst fitness” is selected and moved from its parti- FIG. 2: Top: Network after edge removal at each recursive tion to the other partition, this movement provokes an cut. Bottom: Evolution of the Q value in the at each step avalanche of changes in the fitness of the rest of nodes. of the adaptation process. Separation bars indicate recursive We calculate the new value for the modularity Q, and divisions of thegraph performed at maximum Q. 3 following probability distribution: 1 y Girvan-Newman ctl0.9 Present e P(r)∝r−τ (4) orr d c0.8 aτfonrThdrahaasinlslbdoswoeoemlsunttniotoeuntenwsiecosdarlpkeaessrsoofsrfueonsnmidszietltoiNhvceeatltohompadttaiixmffaipemarplearnov.taaTlciunhheietstihaeoelxibzpstacoataniilnoeinenndgst ertices classifie00..67 um modularity000...468 v m τ ∼1+1/ln(N) [18]. The use of this technique also im- n of 0.5 Maxi0.2 pstleiepsstαheNdenteeerdmedinatotiodnecoidftehtehnautmthbeermoafxsiemlfu-omrgvaanliuzeathioans Fractio0.4 0Av. nu2mber in4ter-com6munit8y edge1s0 little chance to be improved. In practice, we keep track 0.3 ateachstepofthelastmaximumvalueobatinedforQ,if 0 2 4 6 8 10 Average number of inter-community edges per vertex this maximum is not improved in αN steps we stop the search. Usually α is empirically determined balancing FIG.3: Fractionofnodescorrectlyclassifiedusingcomputer- accuracy and efficiency in the algorithm, we use α = 1 generated graphs described in text. Each point is an average allowing as many steps as nodes to improve the current over 100 different networks. Inset: Average of the maximum maximum value of Q. The computational cost involved modularity obtained in each case. inthewholeprocessisO(N2ln2N)whereafactorNlnN is the cost associate to the ranking process, however it can be substantially reduced using heap data structures university e-mail network [11], the C.elegans metabolic [25] for the ranking selection process up to O(N). The network[28],anetworkofusersofthePGPalgorithmfor total cost of the algorithm can then be improved up to secureinformationtransactions[29],andfinally therela- O(N2lnN). tions between authors that shared a paper in cond-mat To test the performance of the algorithm we use first [30]. computer-generated graphs with a known community structure [5]. These graphs have 128 vertices grouped in Network Size QN #comsN QEO #comsEO fourcommunitiesof32vertices. Eachvertexhasonaver- Zachary 34 0.3810 2 0.4188 4 agez edgestoverticesinthesamecommunityandz in out edges to vertices in other communities, keeping an aver- Jazz 198 0.4379 4 0.4452 4 age degree z +z = 16. We generate several graphs C. elegans 453 0.4001 10 0.4342 12 in out using zout values between 0 and 10, and compare the E-mail 1133 0.4796 13 0.5738 15 results of our algorithm with those obtained using the PGP 10680 0.7329 80 0.8459 365 heuristics proposed by Newman [14]. This shows the ca- Cond-Mat 27519 0.6683 302 0.6790 647 pabilities of each algorithm identifying the communities whenthesearemorefuzzyinsidethewholenetwork. Us- TABLEI:Maximummodularityobtainedusingthealgorithm ing the Girvan-Newman algorithm, wich is the reference [14] QN and the extremal optimization algorithm QEO for algorithmforcommunityidentification,the communities differentcomplexnetworks. Itisalso includedthenumberof are well detected until values of z = 6. In contrast, communitiesfoundattheconfigurationwithmaximummod- out our algorithm detects the communities up to z = 8, ularity. out where the community structure still persist but is much moredifficulttoreveal,seeFig.3. Inthisparticularcase In Table I we present the results for the maximum 50 percent of the links are within the community and 50 modularity achieved by our algorithm compared to the percentarelinkswithnodesoutsidethecommunity. This modularity obtained using [14]. The difference in maxi- resultthatcouldseemcontradictoryisnot. Notethatthe mummodularityis upto 15%depending onthe network 50percentoflinkswithnodesoutsidethecommunityare considered. These differences result in a best determina- equally distributed among the rest of communities, and tion of the unknown community structure of the whole then its contribution to the definition of community is network. Thepartitionintocommunitiesisclearlydiffer- deprivedbythenumberofcommunitiesintherestofthe ent as shows the different number of communities found network,in our case three. For this reasonit is expected using both algorithms. to find community structure even in these cases. Note that sincethe coreofthe algorithmis stochastic, For values higher than 8, the average maximum mod- differentrunscouldyieldinprincipledifferentpartitions. ularity rapidly approach the limit Q = 0.208 (see inset We have performed 100 runs of the algorithm for the e- Fig.3), the expected modularity for a random network mail network and for a random network with the same with the same number of links and nodes, as it has been number of links and nodes to check the consistency of shown in [26]. the proposed method. In Fig. 4 we present the results Wehavealsoanalyzedthecommunitystructureofsev- ofthe fractionoftimes acouple ofnodes areclassifiedin eral real networks: the jazz musicians network [27], an the same partition. The community structure is clearly 4 revealedforthee-mailnetworkwhilefortherandomnet- work this structure is inexistent. Recently, Guimerà and Amaral have obtained similar results by applying simulated annealing to find the community structure in the context of metabolic networks [31]. Summarizing,wehavepresentedanextremaloptimiza- tion based algorithm that optimizes the modularity and allows an accurate identification of community structure incomplexnetworks. Theresultsoutperformallprevious algorithms existent in the literature. We thankM.Bogunã,L.Danon,A.Diaz-Guileraand R. Guimerà for helpful comments and suggestions. We FIG.4: Fractionofnodesclassifiedinthesamepartitionover also thank M.E.J. Newman for providing us the cond- 100 realizations of the algorithm. The color of the position (i,j) corresponds to the fraction of times that nodes i and j mat network. This work has been supported by DGES belong to thesame partition. of the Spanish Government Grant No. BFM-2003-08258 and EC-FET Open Project No. IST-2001-33555. [1] S.Strogatz, Nature410, 268 (2001). (2001). [2] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 [19] S. Boettcher and A. G. Percus, Phys Rev E 64, 026114 (2002). (2001). [3] S.Dorogovtsev and J. Mendes, Adv.Phys. 51 (2002). [20] P.BakandK.Sneppen,Phys.Rev.Lett.71,4083(1993). [4] M. E. J. Newman, SIAMReview 45, 167 (2003). [21] S.BoettcherandA.G.Percus,ArtificialIntelligence119, [5] M. Girvan and M. Newman, Proc. Natl. Acad. Sci. 99, 275 (2000). 7821 (2002). [22] W.W.Zachary,JournalofAnthropologicalResearch 33 [6] J.-P. Eckmann and E. Moses, Proc. Natl. Acad.Sci. 99, (1977). 5825 (2002). [23] J. Reichardt and S. Bornholdt, Phys. Rev. Lett 93, [7] K. Eriksen, I. Simonsen, S. Maslov, and K. Sneppen, 218701 (2004). Phys.Rev.Lett. 90, 148701 (2003). [24] L. Donetti and M. Munoz, J. Stat. Mech.: Theor. Exp. [8] P.Holme,M.Huss,andH.Jeong,Bioinformatics19,532 p. 10012 (2004). (2003). [25] A.V.Aho,J.D.Ullman,andJ.E.Hopcroft,DataStruc- [9] A.Arenas,L.Danon,A.Diaz-Guilera,P.M.Gleiser,and tures and Algorithms (Addison-Wesley,1983). R.Guimerà, Eur. Phys.J. B 38, 373 (2004). [26] R.Guimerá,M.Sales-Pardo,andL.A.N.Amaral,Phys. [10] M. E. J. Newman, Eur. Phys.J. B 38, 321 (2004). Rev. E70, 025101 (2004). [11] R. Guimerà, L. Danon, A. D´ıaz-Guilera, F. Giralt, and [27] P. Gleiser and L. Danon, Advancesin Complex Systems A.Arenas, Phys. Rev.E 68, 065103(R) (2003). 6, 565 (2003). [12] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and [28] H.Jeong, B.Tombor, R.Albert,Z.N.Oltvai,andA.-L. D.Parisi, Proc. Natl. Acad.Sci. 101, 2658 (2004). Barabási, Nature 407, 651 (2000). [13] M. E. J. Newman and M. Girvan, Phys. Rev. E 69, [29] X. Guardiola, R. Guimera, A. Arenas, A. Diaz-Guilera, 026113 (2004). and L. A.N. Amaral, cond-mat/0206240 (2002). [14] M. E. J. Newman, Phys.Rev.E 69, 066133 (2004). [30] M. E. J. Newman, Phys.Rev.E 64, 016131 (2001). [15] D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett. 35, [31] R. Guimera and L. Amaral, Nature (2005), in press. 1792 (1975). [32] The value of Q always refers to the whole network i.e. [16] S. Kirkpatrick, C. Gilatt, and M. Vecchi, Science 220 is the sum over all the communities. At a certain mo- (1983). ment more subdivisions into communities will necessar- [17] D. E. Goldberg, Genetic Algorithms in Search, Opti- ily decrease Q because the limit of decomposition is a mization, and Machine Learning (Addison-Wesley Pro- community per node whose value of Q is negative. fessional, 1989). [18] S.BoettcherandA.G.Percus,Phys.Rev.Lett.86,5211

Community detection in complex networks using Extremal Optimization PDF

0.18 MB·English

by J. Duch

#additional_collections #journals #arxiv

Checking for file health...

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Download Community detection in complex networks using Extremal Optimization PDF Free - Full Version

by J. Duch| 0.18| English

Download Community detection in complex networks using Extremal Optimization by J. Duch in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About Community detection in complex networks using Extremal Optimization

No description available for this book.

Detailed Information

Author:	J. Duch
Language:	English
File Size:	0.18
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free Community detection in complex networks using Extremal Optimization Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Community detection in complex networks using Extremal Optimization PDF?

Yes, on https://PDFdrive.to you can download Community detection in complex networks using Extremal Optimization by J. Duch completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read Community detection in complex networks using Extremal Optimization on my mobile device?

After downloading Community detection in complex networks using Extremal Optimization PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Community detection in complex networks using Extremal Optimization?

Yes, this is the complete PDF version of Community detection in complex networks using Extremal Optimization by J. Duch. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download Community detection in complex networks using Extremal Optimization PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.