Detecting Overlapping Link Communities by Finding Local Minima of a Cost Function with a Memetic Algorithm

Part 1: Problem and Method

Frank Havemann∗, Jochen Gläser†, Michael Heinz‡

arXiv:1501.05139v2 [cs.SI] 23 Jun 2016

∗ Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin, D 10099 Berlin, Dorotheenstr. 26 (Germany)
† Center for Technology and Society, TU Berlin (Germany)
‡ Institut für Bibliotheks- und Informationswissenschaft, Humboldt-Universität zu Berlin

Abstract

We propose an algorithm for detecting communities of links in networks which uses local information, is based on a new evaluation function, and allows for pervasive overlaps of communities. The complexity of the clustering task requires the application of a memetic algorithm that combines probabilistic evolutionary strategies with deterministic local searches. In Part 2 we will present results of experiments with citation networks.

1 Introduction

Communities in networks are commonly defined as cohesive subgraphs which are well separated from the rest of the network. This vague concept of communities is operationalised in a variety of ways (Fortunato 2010). The utility of algorithms for the detection of communities in networks partly depends on their 'conceptual fit', i.e. on the degree to which they match properties of the phenomenon that is represented (Hric, Darst, and Fortunato 2014). Achieving such a conceptual fit may require unusual combinations of ideas from network analysis, as is the case with the question and the algorithm presented in this paper.

Consider the following three properties of a network and the task of community detection. First, links between nodes contain better information about communities than the nodes that are to be clustered. In this case, link clustering appears to be the method of choice. Constructing communities by clustering links has been proposed by Evans and Lambiotte (2009) and by Ahn, Bagrow, and Lehmann (2010) as a method for the construction of overlapping communities of nodes. In addition, clustering links is likely to be advantageous whenever the information asymmetry described above occurs, i.e. whenever links rather than nodes have the real-world properties whose similarity shall be reflected by clusters.

Second, overlapping communities must be a possible outcome of the algorithm because the real-world phenomenon under investigation is known to have such a structure. For the same reason, pervasive overlaps must be possible, i.e. overlaps that extend to all nodes rather than just the boundary nodes of a community. The construction of overlapping communities is by now a well-known and frequently addressed problem of network analysis (Fortunato 2010; Xie, Kelley, and Szymanski 2013; Amelio and Pizzuti 2014).

Third, the phenomena to be represented by communities are local in that they emerge from local interactions represented by neighbouring nodes and links in the network. If this is the case, the use of local rather than global information may return better communities and a better community structure of the network (Clauset 2005; Lancichinetti, Fortunato, and Kertesz 2009; Havemann, Heinz, Struck, and Gläser 2011).

All three ideas have been developed in network analysis. However, as reviews of algorithms indicate (Fortunato 2010; Xie et al. 2013; Amelio and Pizzuti 2014), link clustering, pervasively overlapping communities and the use of local information have not yet been combined, possibly because the task for which this combination is necessary has not yet arisen.¹

¹ The only apparent exception is the work by Lei Pan et al., which, however, compromises in two respects: global information is used in the end, and there is no pervasive overlap because link clusters are disjoint (Pan, Wang, Xie, and Liu 2011; Pan, Wang, and Xie 2012). Furthermore, they differ from our approach because they propose an evaluation function for link clustering which is derived within the node clustering approach.

There is at least one task for which this combination of link-based approach, pervasive overlaps and local approach is necessary, namely the detection of thematic structures (topics) in networks of papers.

In networks of papers and their cited sources, citation links (links between a publication and the sources it cites) are thematically more homogeneous than nodes (papers), and thus provide better information for clustering than the papers themselves. While papers commonly belong to more than one scientific topic, many citation links can be assumed to be homogeneous in that the link between paper and source belongs to only one topic. If it belongs to more than one topic, these topics often are not very distant from each other.

Scientific topics are known to overlap pervasively, which means that their reconstruction as communities of papers must reflect this pervasive overlap. Topics are also locally emergent phenomena in that they represent coinciding and mutually referring perspectives of researchers (the authors of the papers).

In order to reconstruct scientific topics from networks of papers and their cited sources, then, we need an algorithm that clusters links, can construct pervasively overlapping communities, and uses mainly local information. In this paper, we present such an algorithm (in Part 1) and its application to citation networks (in Part 2). We propose a local cost function for the independent evaluation of each link community by relating its external to its total connectivity in the network. The cost function is almost completely based on local information; the only global information used is the number of links in the whole network. The independent evaluation of each subgraph with a local cost function means that communities can be constructed independently from each other, which enables pervasive overlaps.

The cost function we propose for subgraph evaluation is solely based on the network's topology and not on link similarity. Generally, clustering by optimising a (global or local) evaluation function needs no measure of similarity of clustered elements but results in clusters the elements of which are seen as similar in some sense. In contrast, the approach to link clustering proposed by Ahn et al. (2010) is based on link similarity. The authors estimate the similarity of two links by comparing their sets of neighbouring nodes. This is not very appropriate for citation links because we would estimate thematic similarity of thematically nearly homogeneous elements (citation links) with sets of very inhomogeneous elements (papers, cited sources). In the case of citation networks, it would be better to measure link similarity by using textual information from citing and cited documents.

The local construction of topics, their varying size and pervasive overlaps make it likely that topics form a poly-hierarchy, i.e. a hierarchy where a smaller topic can be a subtopic of two or more larger topics that have no hierarchical subtopic relation. This poly-hierarchy of topics should be reflected in a poly-hierarchy of communities.

Communities without sub-communities can be well separated and very cohesive, too, but inside larger communities there can exist well separated sub-communities which diminish the cohesion of their super-community.

Since the cost landscape of link communities has many local minima, purely deterministic search strategies are not efficient. This is why we designed a memetic search that combines an evolutionary algorithm with deterministic adjustments in the cost landscape. Evolutionary algorithms have already been used for identifying communities in networks (Fortunato 2010, p. 106). Some authors have even applied evolutionary algorithms to link clustering but all used global evaluation functions (Pizzuti 2009; Li, Zhang, Wang, Liu, and Zhang 2013; Shi, Cai, Fu, Dong, and Wu 2013). Memetic evolutionary algorithms have also been applied to reconstruct communities but only for node clustering and only with global evaluation functions (Gong, Fu, Jiao, and Du 2011; Pizzuti 2012; Gach and Hao 2012; Ma, Gong, Liu, Cai, and Jiao 2014).

2 Strategy

The strategy we apply in response to the three challenges described in the introduction consists of three main steps. We develop an evaluation function for link communities that uses local information. This evaluation function makes it possible to construct each community independently from all others, which in turn enables pervasive overlaps because inner links (links all of whose neighbours are community members) of one community can also be inner links of another community. We then design an algorithm that constructs local communities.

For the first step, we followed a suggestion by Evans and Lambiotte (2009) to obtain link clusters by clustering vertices in a network's line graph. We defined a local cost function $\Psi(L)$ in the line-graph approach which we call ratio node-cut. It can be used to identify link communities by finding local minima in the cost landscape. Since $\Psi(L)$ evaluates the boundary between a subgraph and the rest of the network, communities can be constructed independently of all other communities.

The cost landscape of $\Psi(L)$ is often very rough, i.e. it has many local minima that may correspond to very similar subgraphs. Therefore, the resolution of the algorithm must be defined by setting a minimum distance (number of links that differ) between subgraphs corresponding to different local minima. We define the range of a community as a distance in which no subgraph exists that has a lower $\Psi$-value.

Since the task of finding communities in large networks is always very complex, heuristics must be applied. This applies even more strongly to link clustering because networks contain many more links than nodes, and particularly to the rough $\Psi$-landscape. We chose an evolutionary algorithm but accelerate evolution by combining it with a deterministic local search in the cost landscape. This approach is called memetic (Neri, Cotta, and Moscato 2012). Memetic algorithms can also find local optima of a local cost function (Vitela and Castaños 2012).

In evolutionary algorithms, individuals occupy places in the cost (or fitness) landscape. In our local algorithm, populations are sets of different subgraphs. We start with a random initialisation of the population of some definite size. The genetic operators of crossover, mutation, and selection are repeatedly applied to move the population into optima. In memetic algorithms each crossover and each mutation is followed by a local search.

In large networks, exploring the cost landscape by adding or removing individual links is very time-consuming. We therefore begin the search with a coarse search phase that adds or removes groups of links by adding or removing nodes with all their links, and follow it with a fine search phase, namely link-wise memetic evolution or at least a link-wise local search.

3 The cost function: ratio node-cut

3.1 Node-induced and link-induced subgraphs

Traditionally, the boundary of a community is drawn between nodes and therefore cuts the links between nodes inside and outside the community. If we consider communities as clusters of links rather than nodes, the perspective must be reversed. While the boundary of a node community cuts links, the boundary of a link community cuts nodes.

A node community is a connected subgraph defined by a node set $C$. It contains all links existing between nodes in $C$. A link community is a connected link-induced subgraph. It contains all nodes attached to links of a given set $L$. There can be links existing between a link community's nodes which are not in $L$.

Cost functions of a subgraph can be defined by relating a measure of external to a measure of total connectivity. This ratio should be minimal for well separated and cohesive subgraphs, i.e. for communities.

Node communities can be defined as connected subgraphs corresponding to minima in cost landscapes where places correspond to node-induced subgraphs. Correspondingly, link communities can be defined as connected subgraphs corresponding to minima in cost landscapes where places correspond to link-induced subgraphs.

In the following, we only consider connected unweighted graphs $G = (V, E)$. The number of edges (or links) is $m = |E|$, the number of vertices (or nodes) is $n = |V|$. With $k_i$ we denote the degree of node $i$. The internal degree of node $i$, denoted by $k_i^{\mathrm{in}}(L)$, is the number of links attached to node $i$ which are in link set $L$. The external degree of node $i$ is $k_i^{\mathrm{out}}(L) = k_i - k_i^{\mathrm{in}}(L)$.

3.2 External connectivity

We first consider measures of external connectivity of a subgraph which are useful for constructing node or link communities. The simplest measure of external connectivity of a node-induced subgraph is the cut size, which equals the sum of weights of boundary links, i.e. the links connecting the subgraph with the rest of the graph (Fortunato 2010, p. 92). If link weights represent electrical conductance, cut size measures the total conductance of all boundary links. Cut size can be calculated as the sum of external degrees $k_i^{\mathrm{out}}(L)$ of boundary nodes (subgraph members with boundary links).

Applying these considerations to the external connectivity of a link-induced subgraph leads to a simple measure of external connectivity as the sum of $k_i^{\mathrm{out}} k_i^{\mathrm{in}} / k_i$ of boundary nodes:

$$\sigma(L) = \sum_{i=1}^{n} \frac{k_i^{\mathrm{out}}(L)\, k_i^{\mathrm{in}}(L)}{k_i}. \tag{1}$$

Only for boundary nodes of $L$ do we have $k_i^{\mathrm{out}} k_i^{\mathrm{in}} > 0$. That means we can restrict the sum in the formula to boundary nodes. In function $\sigma(L)$ the external degrees $k_i^{\mathrm{out}}$ are weighted with the subgraph membership-grade $k_i^{\mathrm{in}}/k_i$ of the boundary nodes. The function $\sigma(L)$ can be derived from the total conductance or cut size of link sets in the graph's line graph if the line graph's edges are weighted with $1/k_i$, a weighting proposed by Evans and Lambiotte (2009). The derivation can be found in Appendix A.

Each term of $\sigma(L)$ equals the conductance of a boundary node $i$, i.e. the total conductance for currents flowing out of the subgraph through this node. We call $\sigma(L)$ the node cut of a link-induced subgraph.

3.3 Internal and total connectivity

Now we discuss measures of internal and total connectivity of subgraphs induced by node and by link sets, respectively. In the case of node-induced subgraphs, $k_{\mathrm{in}}(C) = \sum_{i \in C} k_i^{\mathrm{in}}(C)$ is an appropriate measure of internal connectivity of node set $C$. Total connectivity of $C$ is then the sum of degrees of all nodes in $C$:

$$k_{\mathrm{total}}(C) = \sum_{i \in C} \left[ k_i^{\mathrm{in}}(C) + k_i^{\mathrm{out}}(C) \right] = \sum_{i \in C} k_i. \tag{2}$$

For a link-induced subgraph we can use the sum of internal degrees, weighted with their membership, as a measure of internal connectivity:

$$\tau(L) = \sum_{i=1}^{n} \frac{k_i^{\mathrm{in}}(L)\, k_i^{\mathrm{in}}(L)}{k_i}. \tag{3}$$

The sum is restricted to nodes attached to links in $L$ because other nodes have $k_i^{\mathrm{in}}(L) = 0$. Total connectivity of $L$ is then given by the sum $\sigma(L) + \tau(L) = \sum_{i=1}^{n} k_i^{\mathrm{in}}(L) = k_{\mathrm{in}}(L) = 2|L|$. The derivation can be found in Appendix A.

3.4 Cost function

Relating external to total connectivity leads us to cost functions whose minima correspond to well separated and cohesive subgraphs. In addition, we achieve a size normalisation when we divide external by total connectivity. This is welcome, because the boundary length (measured by external connectivity) tends to increase with size (here measured by total connectivity $k_{\mathrm{in}}(L) = 2|L|$), at least for not too large subgraphs in not too small networks. If a subgraph occupies more than one half of the network, its boundary tends to become shorter with increasing size. A simple size normalisation that accounts for the finite size of the network is achieved by adding to the external-total ratio of a subgraph the same ratio of its complement. For small subgraphs in a large network the second ratio is very small. For node-induced subgraphs this normalisation was introduced by Wei and Cheng (1989) and named ratio cut. For link-induced subgraphs we analogously define a cost function ratio node-cut as

$$\Psi(L) = \frac{\sigma(L)}{k_{\mathrm{in}}(L)} + \frac{\sigma(E \setminus L)}{k_{\mathrm{in}}(E \setminus L)} \tag{4}$$

$$\phantom{\Psi(L)} = \frac{\sigma(L)}{k_{\mathrm{in}}(L)\,\bigl(1 - k_{\mathrm{in}}(L)/2m\bigr)}. \tag{5}$$

The expression on the r.h.s. is obtained because $\sigma(E \setminus L) = \sigma(L)$ and $k_{\mathrm{in}}(E \setminus L) = 2m - k_{\mathrm{in}}(L)$. Ratio node-cut $\Psi$ is not strictly local but the only global information needed here is the total number of links $m$. In the limit of small subgraphs in large networks we achieve approximately strict locality because we have $k_{\mathrm{in}}(L) \ll 2m$ and we therefore obtain

$$\Psi(L) \approx \frac{\sigma(L)}{k_{\mathrm{in}}(L)}, \tag{6}$$

which equals a strictly local cost function for the construction of link communities introduced by us earlier (Havemann et al. 2012). Our cost function $\Psi(L)$ rewards separation of link community $L$ but not really its cohesion. Yang and Leskovec (2012) found that evaluating node subgraphs with conductance (a measure analogous to $\sigma(L)/k_{\mathrm{in}}(L)$ in the world of node communities) can also lead to communities with low cohesion.

That means that by using cost function $\Psi$ we emphasize separation and require only a minimal cohesion of subgraphs, which is expressed by the demand that subgraphs must be connected. Otherwise an unconnected subgraph with parts in very different regions of the network could be a community.

Function $\sigma(L)$ vanishes for the empty subgraph with $L = \emptyset$ and for the full graph with $L = E$. In both cases, the denominator of the cost function also vanishes and we obtain zero divided by zero, but it makes sense to define $\Psi(E) = \Psi(\emptyset) = 1$ because $\Psi$ of one link $(1, 2)$ with vanishing weight $w_{12}$ approximates 1:

$$\Psi((1,2)) = \frac{w_{12}\left[(k_1 - w_{12})/k_1 + (k_2 - w_{12})/k_2\right]}{2 w_{12}\,(1 - 2 w_{12}/2m)}. \tag{7}$$

Our cost function is symmetric: $\Psi(L) = \Psi(E \setminus L)$, i.e. the cost function is the same for a link-induced subgraph and the subgraph induced by the complementary link set $E \setminus L$.

3.5 The cost landscape

Each place in the cost landscape represents a link-induced subgraph. Two places in the landscape have a direct relation if and only if the corresponding subgraphs differ in one link. The height of each place is given by the value of the cost function $\Psi(L)$. The global minimum of the cost function is reached for a division of the set $E$ of all links that produces the two best link communities in terms of separation. As a simple example, we determined the $\Psi$-landscape of the bow-tie graph (Figure 1, for calculations see Appendix B). We expect a cut through the central node to be the best division in two link communities (the two triangles). Indeed, the landscape has two minima with $\Psi = 1/3$, which correspond to the two triangles. There are no further local minima.

[Figure 1: Bow-tie graph]

We do not restrict the search for link communities to finding only the global minimum but define a link community as a connected link-induced subgraph which corresponds to any local minimum in the $\Psi$-landscape. Since the $\Psi$-landscape of larger graphs contains many local minima, we need a filter to select the locally best link communities. For this reason, we restrict our search to those minima with a sufficiently large distance to any lower place in the cost landscape. Thus, we have to define the resolution of the search by defining this minimal distance in the landscape. The appropriate resolution depends on the research question about the phenomenon represented by the network. The extent to which two communities should differ in content (of links) to consider them as different depends on the question asked about communities.

Another place in the cost landscape is reached by adding links to and by removing links from the subgraph corresponding to the starting place. The distance between two places in the cost landscape equals the sum of the number of links we have to add and to exclude. In other words: the distance is the size of the symmetric difference between the two link sets. We define the range of a community as the minimal distance to a subgraph with lower cost. Within a community's range there is no better subgraph. The resolution of a search for communities can be defined as the minimal range of communities that are accepted as valid solutions. Depending on the network's real background, a relative resolution can be more appropriate. That means we demand that any valid community should have a range which is larger than a certain percentage of its size.

In order to determine the range of a community we would need to know its whole environment up to the distance to the nearest lower place in the cost landscape. Otherwise, a lower place only determines an upper bound of the community's range. However, searching the whole environment of a subgraph is practically impossible for large networks. A selective search is necessary, which is why we apply evolutionary and deterministic greedy algorithms. If these algorithms find an upper bound smaller than the set resolution, we can deselect the community. If they do not, the community is provisionally kept but can later be replaced by a better community within its minimal range defined by the set resolution. We assume, however, that better solutions found later differ only in some links.

4 Memetic search

Memetic algorithms combine random evolution with deterministic local search. In this section, we describe

1. the local search we apply, called adaptation for short,
2. our implementation of the evolutionary approach,
3. the genetic operators of mutation, crossover, and selection we employ in the evolutionary approach.

The memetic algorithm is applied in the search for link communities, which can be done by exploring the cost landscape of a network by adding or removing individual links. For large subgraphs this is very time-consuming. We therefore split the search into a coarse phase, in which we add or remove nodes with all their links to other nodes in the subgraph, and a finer link-wise search, which is applied after communities have been identified by a node-wise search. After communities with a minimal range defined by the set resolution are found in a node-wise memetic search, they are subjected to a link-wise memetic search or at least a link-wise local search.

4.1 Local search

The local search in the cost landscape applies a greedy algorithm for finding local cost minima that correspond to communities. The algorithm starts from the place occupied by the current subgraph and moves to subgraphs with lower $\Psi$-values. The algorithm is greedy because it always chooses the step that brings the biggest decrease or the smallest increase of $\Psi$. In node-wise local search, a step includes or excludes a node with all its links to the nodes already in the subgraph; in link-wise local search, a step includes or excludes an individual link.

A valid community can be made invalid and replaced by a better one if the better one is within its minimal range, which is set by the resolution parameter. Therefore, the local search need not find subgraphs with lower cost in each step but can go a number of steps by 'tunneling' through 'barriers' in the landscape (areas with higher $\Psi$) before reaching lower values which invalidate the community at the tunnel's entrance. Tunneling makes the algorithm more efficient. The maximum length of a tunnel through a barrier of higher $\Psi$-values is determined by the set resolution.

The local search can begin with a series of either inclusions or exclusions of nodes (links). When no further improvement can be achieved, the search switches from inclusion to exclusion or vice versa. Inclusion and exclusion are continued until no further improvement is possible.

If the exclusion of nodes fragments a subgraph, we proceed with the subgraph's main component. In the link-wise local search the greedy algorithm is allowed to go through intermediary states representing unconnected subgraphs. At the end of the link-wise local search we determine all components of the subgraph. If the subgraph is unconnected we repeat the procedure for each component until we obtain only connected subgraphs with minimal cost.

A greedy algorithm is efficient because the cost reduction for all possible cases of including a neighbour must be calculated only at the beginning of the local search. In the subsequent steps, we only calculate or recalculate cost reductions achieved by adding neighbours of the link (or node) included. Analogously, we proceed when excluding boundary nodes or links (Havemann et al. 2012, Appendix). Otherwise it would be more efficient to include or exclude just the first node (link) which reduces cost.

4.2 Evolution

The general implementation of the memetic algorithm is described by Algorithm 1.² The genetic operators of crossover, mutation, and selection (described below) are applied to each generation of communities. Subgraphs generated by crossover and mutation are adapted by a local search. If the starting subgraph is not connected we replace it by its main component. Evolution is terminated when no better best community is found for many generations.

² The notation is inspired by a pseudocode given by Merz (2012).
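As a concrete check of the ratio node-cut of Section 3.4, the following minimal Python sketch evaluates equations (1) and (5) for a link set of the bow-tie graph. The node names `a` to `e`, the link numbering and the function names are assumptions made for illustration (the numbering is chosen so that links 1 and 6 are the outer links, as in the examples of Appendix B); exact fractions are used so that results can be compared with the values quoted in the text.

```python
from fractions import Fraction

# Bow-tie graph: two triangles sharing the central node "c".
# Node names and link numbers are assumptions for illustration.
EDGES = {1: ("a", "b"), 2: ("b", "c"), 3: ("a", "c"),
         4: ("c", "d"), 5: ("c", "e"), 6: ("d", "e")}


def degrees(edges, link_set=None):
    """Degree of every node, counting only links in link_set (all if None)."""
    deg = {}
    for link, (u, v) in edges.items():
        if link_set is None or link in link_set:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
    return deg


def node_cut(edges, link_set):
    """sigma(L) = sum_i k_i^out(L) * k_i^in(L) / k_i  (equation 1)."""
    k = degrees(edges)
    k_in = degrees(edges, link_set)
    return sum(Fraction((k[i] - k_in[i]) * k_in[i], k[i]) for i in k_in)


def ratio_node_cut(edges, link_set):
    """Psi(L) = sigma(L) / (k_in(L) * (1 - k_in(L) / 2m))  (equation 5)."""
    m = len(edges)
    if len(link_set) in (0, m):
        return Fraction(1)          # Psi(empty) = Psi(E) = 1 by definition
    k_in_total = 2 * len(link_set)  # k_in(L) = 2|L|
    sigma = node_cut(edges, link_set)
    return sigma / (k_in_total * (1 - Fraction(k_in_total, 2 * m)))


print(ratio_node_cut(EDGES, {1, 2, 3}))  # 1/3
```

Under these assumptions the triangle {1, 2, 3} and its complement {4, 5, 6} both evaluate to 1/3, in line with the symmetry of the cost function stated in the text.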
Normally, evolu- tionary algorithms include some randomness in Mutation: Wemutateacommunitywithmu- the crossover, which in our case would mean to tation variance v < 1 by changing maximally a enlarge the intersection by some nodes or links proportion v of its links or nodes. In node-wise from the union. In contrast, our crossing pro- memeticevolutionswerandomlyexcludebound- cedure is deterministic because the boundary of ary nodes and then include the same number of the union of two good communities should also neighbouring nodes. In link-wise memetics we benottoobad. Thesameholdsfortheintersec- experiment with two other mutation operators: tion. Deterministic crossover should be (and is) weonlyexcludeor includelinksandconcentrate doneonlyoncewiththesameparents. Theonly changes around one randomly chosen boundary random element of our crossover is the random node. (Details can be found in Appendix of selection of parents. Part 2.) Selection: From the old population and the Crossover: From two parent subgraphs we resultsofmutationsandcrossoversweselectthe construct two new individuals by taking inter- communities with lowest Ψ-values, keeping the section and union of the subgraphs as starting population size constant. A new best commu- points for adaptive local searches. Of course, it nity is only included if it is inside the minimal has no effect to cross such parents where one of rangeofthebestcommunityoftheoriginalpop- ulation. Disregardingthebestcommunitiesout- side the minimal range assures that we do not Algorithm 1Pseudocodeofmemeticevolution lose communities which can have a range above for one adapted seed the minimum given by the resolution limit we apply. Deselected communities can be used as initialise population P by mutating the seeds for other memetic searches. 
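The deterministic crossover and the basic selection step of Section 4.3 can be sketched as follows. This is a hedged illustration only: the local search that would adapt the offspring, the mutation operators and the range condition for new best communities are omitted, and the bow-tie graph, node names, link numbering and function names are assumptions.

```python
from fractions import Fraction

# Bow-tie graph; node names and link numbers are assumptions for illustration.
EDGES = {1: ("a", "b"), 2: ("b", "c"), 3: ("a", "c"),
         4: ("c", "d"), 5: ("c", "e"), 6: ("d", "e")}


def psi(link_set):
    """Ratio node-cut of a link set, following equation (5)."""
    m = len(EDGES)
    if len(link_set) in (0, m):
        return Fraction(1)
    k, k_in = {}, {}
    for link, ends in EDGES.items():
        for node in ends:
            k[node] = k.get(node, 0) + 1
            if link in link_set:
                k_in[node] = k_in.get(node, 0) + 1
    sigma = sum(Fraction((k[i] - k_in[i]) * k_in[i], k[i]) for i in k_in)
    size = 2 * len(link_set)
    return sigma / (size * (1 - Fraction(size, 2 * m)))


def crossover(parent_a, parent_b):
    """Return (intersection, union) as starting points for local searches.

    Returns None when one parent contains the other, because crossing
    such parents would only reproduce the parents themselves.
    """
    a, b = frozenset(parent_a), frozenset(parent_b)
    if a <= b or b <= a:
        return None
    return a & b, a | b


def select(population, candidates, size):
    """Keep the `size` link sets with the lowest Psi (ties by link numbers)."""
    pool = {frozenset(s) for s in list(population) + list(candidates)}
    return sorted(pool, key=lambda s: (psi(s), sorted(s)))[:size]


offspring = crossover({1, 2}, {2, 3})
survivors = select([{1}, {2, 4}], offspring, size=2)
```

In this toy run the union of the parents {1, 2} and {2, 3} is already the triangle {1, 2, 3}, which then heads the selected population.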
adaptedseedwithhighvarianceseveraltimes and adapting mutants Renewal: Renewal means to mutate the best while the best community is not too old do community with high variance several times, to mutate the best community with low vari- adapt the mutants, and to apply a usual selec- ance and adapt the mutants tion procedure described above. if an adapted mutant is new and its cost is lower than highest cost then add it to population P 5 Concluding remarks end if cross the best community with some ran- In the forthcoming Part 2 of our paper we dis- domly chosen communities and adapt the cuss test results. offspring if adapted offspring is new and its cost is lower than highest cost then Acknowledgement add it to population P end if This work is part of a project in which we de- select the best communities so that the velop methods for measuring the diversity of re- population size remains constant search. The project was funded by the German if there is no better best community for Ministry for Education and Research (BMBF). somegenerationsandinnovationrateislow We would like to thank the members of the then project’sadvisorygroup3 andalsoalldevelopers renew the population by mutating the of R.4 We thank Alexander Struck for assisting best community with high variance and ourworkandfordiscussionsaboutthenewcost adapt mutants function. We discussed the issue of link similar- select the best communities so that the ity in citation networks with Rob Koopman. population size remains constant end if end while 3http://141.20.126.172/~div 4http://www.r-project.org 7 Appendix auxiliary bipartite graph. This becomes clear if we factorise the terms of the sum in equation 8: A Connectivity measures n E =(cid:88)√Bik √Bil . (9) kl k k i=1 i i In this section we derive the connectivity mea- suresforlinksetsσ(L)andkin(L)fromanalogue Then√wecanshortlywriteE =DTDwithDik = measures in the line graph. 
We closely follow B / k andverifytheEuclideannormalisation ik i the arguments given in our earlier paper (Have- of the n row vectors of D (for unweighted net- mann, Gl¨aser, Heinz, and Struck 2012). works for which we have B2 =B ): ik ik We here use i,j = 1,...,n to denote nodes and k,l = 1,...,m for links. With C(L) we (cid:88)m (cid:88)m B2 1 (cid:88)m D2 = ik = B =1. (10) name the set of nodes attached to links in the ik k k ik i i subgraph induced by link set L. If a link k be- k=1 k=1 k=1 longs to L its membership µk(L) = 1 and zero On the other hand, the projection of the nor- otherwise. malised bipartite graph described by affiliation To construct a network’s line graph we first matrixDbackonanetworkoftheoriginalnodes define an auxiliary bipartite graph obtained by is given by DDT. An element of adjacency ma- putting a node on each link of the original net- trix DDT is given by work. The affiliation matrix B of the bipartite m m graph—also called its incidence matrix—has a (cid:88)D D =(cid:88)BikBjk = Aij . (11) row for each of the n original nodes and a col- ik jk (cid:112)k k (cid:112)k k k=1 k=1 i j i j umn for each of the m original links. Each link column contains only two non-zero elements, Thus, Euclidean normalisation of B’s row vec- namely the elements in the rows of the nodes tors is equivalent to weighting each link in the i and j connected by the link. We can project original (unweighted) network with the geomet- the bipartite graph back onto the original net- ric mean of its nodes’ inverse degrees. The work with the product BBT which equals its weighted graph described by adjacency matrix adjacency matrix A (except for the main diago- E is not the line graph of the unweighted net- nal). workdescribedbyadjacencymatrixAbutofthe Weobtainthenetwork’slinegraphbytheop- network weighted according to equation 11. It posite projection BTB of the bipartite graph. 
dependsontherealrelationswemodelwiththe Evans and Lambiotte (2009) underline, that in network whether this is a realistic weighting. all cases of practical intererest the line graph Now we calculate internal connectivity τ(L) contains the same amount of information as the as the sum of internal degrees of vertices in the original network. Knowing BTB we can almost line graph: ever calculate BBT and thus also the network’s m adjacency matrix A. (cid:88) τ(L)= µ (L)E µ (L) k kl l Because each node of the original network is k,l=1 represented as a clique in the line graph Evans (12) m n and Lambiotte (2009) weighted the edges of the = (cid:88) µ (L)(cid:88)BikBilµ (L). k k l line graph with the inverse degree 1/k of the i i k,l=1 i=1 node i in the original network. They define the line graph’s adjacency matrix as In the same way, we can calculate external con- nectivity σ(L) as the sum of external degrees in n the line graph: E =(cid:88)BikBil. (8) kl k m i=1 i (cid:88) σ(L)= µ (L)E (1−µ (L)). k kl l k,l=1 Weighting the line graph’s edges with the in- m n verse degrees of nodes in the original network is = (cid:88) µ (L)(cid:88)BikBil(1−µ (L)). k k l equivalent to an Euclidean normalisation of the i k,l=1 i=1 nodes’ vectors in the affiliation matrix B of the (13) 8 0 ● . 1 8 . 0 6 . 0 Now we use the relations ) 2 4 1 m ● ● (cid:88) , µk(L)Bik =kiin(L) 1 ● 6 0 k=1 ( and 3 5 c m ● ● (cid:88) 4 (1−µl(L))Bil =kiout(L), . l=1 0which directly follow from the definition of the incidence matrix B. Thus, we get Figure 2: Bow-tie graph with numbered links (cid:88)n (kin(L))2 τ(L)= i We define the north pole as corresponding to k i i=1 the empty subgraph and the south pole as cor- and responding to the whole graph. The Ψ-globe of 2 (cid:88)n kin(L)kout(L) thebow-tiegraphhasfivecirclesoflatitudecor- σ(L)= i i . . 
ki responding to six subgraphs with one link, 15 0 i=1 with two, 20 with three, 15 with four, and six From this we easily derive total connectivity of with five links, respectively. a link-induced subgraph as the sum For the empty graph at the north pole σ = 0 and Ψ = 1 (by definition). The six single links n (cid:88) τ(L)+σ(L)= kin(L)=k (L). asthesmallestrealsubgraphsarelocatedatthe i in highest circle of latitude. The two outer links 1 i=1 and6haveσ =1·1/2+1·1/2=1andΨ=0.6, 0B Cost-landscape thefourinnerlinkshaveσ =1·1/2+1·3/4=5/4 . ● and Ψ=0.75. 0 of the bow-tie graph Therearetenconnectedandfiveunconnected subgraphs with two links: For the bow-tie graph we expect two link com- munities, namely the triangle {1,2,3} and its • four connected subgraphs with one outer complement{4,5,6},cf.Figure2andEvansand link and one inner link (e.g. link set {1,2}) 0.0 0.2 0.4 0.6 0.8 1.0 Lambiotte (2009). To describe the 2m different resulting in σ =1·1/2+1·3/4=5/4 and possible subgraphs it is advantageous to make Ψ≈0.469, use of the spherical topology of any landscape of subgraphs. Indeed, the cost-function land- • six connected subgraphs with two inner scape of a graph’s subgraphs can be seen as the links (e.g. link set {2,3}) and σ =1·1/2+ surface of a globe 2·2/4+1·c1/2(=02,a nd1Ψ)=0.75, • with the whole and the empty graph at the • four unconnected subgraphs with one outer poles, and one inner link (e.g. link set {1,4}) and σ =9/4 and Ψ≈0.844, • withallpossiblesubgraphsofthesamesize on each circle of latitude, and • one unconnected subgraph with two outer links ({1,6}) and σ =2 and Ψ=0.75. • with complementary subgraphs situated at antipodes. 
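The σ and Ψ values quoted for the bow-tie graph can be checked by brute force over all 2^6 link sets. The following Python sketch is only an illustration under stated assumptions: the node labels a to e and all function names are hypothetical, the link numbering is one assignment consistent with the examples in the text (links 1 and 6 outer, links 2 to 5 attached to the central node), and the cost function is assumed to be Ψ(L) = σ(L)·2m / (k_in(L)·(2m − k_in(L))) with Ψ = 1 at the poles. This form is not restated in this appendix; it is inferred here because it reproduces the quoted values.

```python
from fractions import Fraction
from itertools import combinations

# Bow-tie graph: two triangles sharing the central node "c".
# Node labels are hypothetical; links 1 and 6 are the outer links.
LINKS = {
    1: ("a", "b"),
    2: ("a", "c"),
    3: ("b", "c"),
    4: ("c", "d"),
    5: ("c", "e"),
    6: ("d", "e"),
}

# Node degrees k_i and their sum 2m.
DEGREE = {}
for u, v in LINKS.values():
    DEGREE[u] = DEGREE.get(u, 0) + 1
    DEGREE[v] = DEGREE.get(v, 0) + 1
TWO_M = sum(DEGREE.values())


def internal_degrees(link_set):
    """k_i^in(L): number of links of L attached to each node i."""
    k_in = dict.fromkeys(DEGREE, 0)
    for k in link_set:
        for node in LINKS[k]:
            k_in[node] += 1
    return k_in


def tau(link_set):
    """Internal connectivity: sum_i (k_i^in)^2 / k_i."""
    k_in = internal_degrees(link_set)
    return sum(Fraction(k_in[i] ** 2, DEGREE[i]) for i in DEGREE)


def sigma(link_set):
    """External connectivity: sum_i k_i^in * k_i^out / k_i."""
    k_in = internal_degrees(link_set)
    return sum(
        Fraction(k_in[i] * (DEGREE[i] - k_in[i]), DEGREE[i]) for i in DEGREE
    )


def psi(link_set):
    """ASSUMED cost function (inferred from the quoted values)."""
    k_total = sum(internal_degrees(link_set).values())
    if k_total == 0 or k_total == TWO_M:
        return Fraction(1)  # poles: Psi = 1 by definition
    return sigma(link_set) * TWO_M / (k_total * (TWO_M - k_total))


def local_minima():
    """Link sets whose Psi is strictly below that of every neighbour."""
    all_links = frozenset(LINKS)
    minima = []
    for size in range(len(all_links) + 1):
        for combo in combinations(sorted(all_links), size):
            subset = frozenset(combo)
            # Neighbours differ by exactly one added or removed link.
            neighbours = [subset ^ {k} for k in all_links]
            if all(psi(subset) < psi(nb) for nb in neighbours):
                minima.append(subset)
    return minima
```

Exact fractions are used instead of floats so that ties between neighbouring link sets (many of them share Ψ = 3/4) are compared without rounding error. Under the assumptions above, local_minima() yields the link sets {1,2,3} and {4,5,6}.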
The neighbours of a place in the landscape can be reached by adding an element to the set of nodes (for node-induced subgraphs) or of links (for link-induced subgraphs), respectively, or by deleting an element from this set. That means there are no direct relations between places on the same circle of latitude. Steps (adding or removing nodes or links) are moves between neighbouring circles of latitude.

On the equator of the Ψ-globe there are 20 triples of links which can be classified into four types:

• the triangle {1,2,3} and its complement {4,5,6} with σ = 2·2/4 = 1 and Ψ = 1/3,

• four triples of inner links (e.g. link set {2,3,4}) and their unconnected complements (e.g. link set {1,5,6}) with σ = 3/2 + 3/4 = 9/4 and Ψ = 0.75,

• eight subgraphs with one of the two outer links, one of the two attached inner links and one of the two inner links not attached to the outer link (e.g. link set {1,2,4}): they all have σ = 1·1/2 + 2·2/4 + 1·1/2 = 2 and Ψ = 2/3,

• the unconnected triple with one outer and two inner links (set {1,4,5}) and its unconnected complement (set {2,3,6}) with σ = 3 and Ψ = 1.

On the two circles of latitude on the southern hemisphere we find the complements of the subgraphs on the northern hemisphere with the same Ψ-values. Next to the equator we find 13 connected and two unconnected subgraphs with four links each:

• two unconnected quadruples with one triangle and the second outer link (e.g. link set {1,2,3,6}) which have σ = 2 and Ψ = 0.75,

• the central star with all four inner links {2,3,4,5} with σ = 4/2 = 2 and Ψ = 0.75,

• four subgraphs containing one of the two triangles plus one of the two inner links (e.g. link set {1,2,3,5}) which all have σ = 1·3/4 + 1·1/2 = 5/4 and Ψ ≈ 0.469,

• the four subgraphs with both outer links and two inner links connecting them (e.g. link set {1,2,5,6}) with σ = 2/2 + 4/4 = 2 and Ψ = 0.75,

• the four subgraphs with one outer link and three inner links (one of them attached to the outer link, e.g. link set {1,2,4,5}) with σ = 3/2 + 3/4 = 9/4 and Ψ ≈ 0.844.

All complements of the six single links, each containing the five other links, are connected and have the same Ψ-values as their single-link complements (cf. above). The full graph at the south pole with σ = 0 and Ψ = 1 is connected. The Ψ-landscape of links has two local minima: the two triangles have a locally and globally minimal Ψ = 1/3. There are no other local minima. Thus, we obtain the pair of complementary triangles as the only solution.

References

Ahn, Y., J. Bagrow, and S. Lehmann (2010). Link communities reveal multiscale complexity in networks. Nature 466(7307), 761–764.

Amelio, A. and C. Pizzuti (2014). Overlapping Community Discovery Methods: A Survey. Social Networks: Analysis and Case Studies, 105. http://arxiv.org/abs/1411.3935.

Clauset, A. (2005). Finding local community structure in networks. Physical Review E 72(2), 026132.

Evans, T. and R. Lambiotte (2009). Line graphs, link partitions, and overlapping communities. Physical Review E 80(1), 016105.

Fortunato, S. (2010). Community detection in graphs. Physics Reports 486(3-5), 75–174.

Gach, O. and J.-K. Hao (2012, January). A memetic algorithm for community detection in complex networks. In C. A. C. Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, and M. Pavone (Eds.), Parallel Problem Solving from Nature - PPSN XII, Number 7492 in Lecture Notes in Computer Science, pp. 327–336. Springer Berlin Heidelberg.

Gong, M., B. Fu, L. Jiao, and H. Du (2011, November). Memetic algorithm for community detection in networks. Physical Review E 84(5), 056101.

Havemann, F., J. Gläser, M. Heinz, and A. Struck (2012). Evaluating overlapping communities with the conductance of their boundary nodes. arXiv preprint arXiv:1206.3992.

Havemann, F., M. Heinz, A. Struck, and J. Gläser (2011). Identification of Overlapping Communities and their Hierarchy by Locally Calculating Community-Changing Resolution Levels. Journal of Statistical Mechanics: Theory and Experiment 2011, P01023. doi: 10.1088/1742-5468/2011/01/P01023, arXiv preprint arXiv:1008.1004.

Hric, D., R. K. Darst, and S. Fortunato (2014, December). Community detection in networks: Structural communities versus ground truth. Physical Review E 90(6), 062805.

Lancichinetti, A., S. Fortunato, and J. Kertesz (2009). Detecting the overlap-