Table Of ContentDetecting Overlapping Link Communities by Finding
Local Minima of a Cost Function with a Memetic Algorithm
Part 1: Problem and Method
Frank Havemann∗ Jochen Gl¨aser† Michael Heinz‡
Abstract
ing communities by clustering links has been
proposedbyEvansandLambiotte(2009)andby
We propose an algorithm for detecting commu- Ahn,Bagrow,andLehmann(2010)asamethod
nitiesoflinksinnetworkswhichuseslocalinfor- for the construction of overlapping communities
6
mation, is based on a new evaluation function, of nodes. In addition, clustering links is likely
1
0 and allows for pervasive overlaps of communi- to be advantageous whenever the information
2 ties. The complexity of the clustering task re- asymmetry described above occurs, i.e. when-
quires the application of a memetic algorithm ever links rather than nodes have the real-world
n
u that combines probabilistic evolutionary strate- properties whose similarity shall be reflected by
J gies with deterministic local searches. In Part 2 clusters.
3 we will present results of experiments with cita- Second, overlapping communities must be a
2 tion networks. possible outcome of the algorithm because the
real-world phenomenon under investigation is
]
I known to have such a structure. For the same
1 Introduction
S
reason, pervasive overlaps must be possible, i.e.
.
s overlapsthatextendtoallnodesratherthanjust
c Communitiesinnetworksarecommonlydefined
the boundary nodes of a community. The con-
[ as cohesive subgraphs which are well separated
struction of overlapping communities is by now
2 fromtherestofthenetwork. Thisvagueconcept
a well-known and frequently addressed problem
v of communities is operationalised in a variety
ofnetworkanalysis(Fortunato2010;Xie,Kelley,
9 of ways (Fortunato 2010). The utility of algo-
3 and Szymanski 2013; Amelio and Pizzuti 2014).
rithms for the detection of communities in net-
1 Third, the phenomena to be represented by
works partly depends on their ‘conceptual fit’,
5 communities are local in that they emerge from
i.e. on the degree to which they match proper-
0
local interactions represented by neighbouring
. tiesofthephenomenonthatisrepresented(Hric,
1 nodes and links in the network. If this is the
Darst, and Fortunato 2014). Achieving such a
0
case, the use of local rather than global in-
5 conceptualfitmayrequireunusualcombinations
formation may return better communities and
1 of ideas from network analysis, as is the case
: with the question and the algorithm presented a better community structure of the network
v
(Clauset 2005; Lancichinetti, Fortunato, and
i in this paper.
X Kertesz 2009; Havemann, Heinz, Struck, and
Consider the following three properties of a
r network and the task of community detection. Gl¨aser 2011).
a
All three ideas have been developed in net-
First, links between nodes contain better infor-
work analysis. However, as reviews of algo-
mation about communities than the nodes that
rithmsindicate(Fortunato2010;Xieetal. 2013;
are to be clustered. In this case, link clustering
Amelio and Pizzuti 2014), link clustering, per-
appears to be the method of choice. Construct-
vasively overlapping communities and use of lo-
∗Institut fu¨r Bibliotheks- und Informationswissen- cal information have not yet been combined all
schaft,Humboldt-Universita¨tzuBerlin,D10099Berlin, three,possiblybecausethetaskforwhichthisis
Dorotheenstr. 26(Germany)
necessary has not yet arisen.1
†CenterforTechnologyandSociety,TUBerlin(Ger-
many)
‡Institut fu¨r Bibliotheks- und Informationswissen- 1TheonlyapparentexceptionistheworkbyLeiPan
schaft,Humboldt-Universit¨atzuBerlin etal.which,however,compromisesintwoexpects,global
1
There is at least one task for which this com- mentsofwhichareseenassimilarinsomesense.
bination of link-based approach, pervasive over- In contrast, the approach to link clustering pro-
lapsandlocalapproachisnecessary,namelythe posedbyAhnetal. (2010)isbasedonlinksim-
detection of thematic structures (topics) in net- ilarity. The authors estimate the similarity of
works of papers. two links by comparing their sets of neighbour-
In networks of papers and their cited sources, ing nodes. This is not very appropiate for cita-
citation links (links between a publication and tion links because we would estimate thematic
the sources it cites) are thematically more ho- similarityofthematicallynearlyhomogenousel-
mogenous than nodes (papers), and thus pro- ements (citation links) with sets of very inho-
vide better information for clustering, than the mogenous elements (papers, cited sources). In
papers themselves. While papers commonly be- the case of citation networks, it would be bet-
long to more than one scientific topic, many ci- ter to measure link similarity by using textual
tation links can be assumed to be homogenous information from citing and cited documents.
in that the link between paper and source be- The local construction of topics, their vary-
longs to only one topic. If it belongs to more ing size and pervasive overlaps make it likely
than one topic these topics often are not very that topics form a poly-hierarchy i.e. a hierar-
distant from each other. chy where a smaller topic can be a subtopic of
Scientific topics are known to overlap perva- two or more larger topics that have no hierar-
sively, which means that their reconstruction chical subtopic relation. This poly-hierarchy of
as communities of papers must reflect this per- topics should be reflected in a poly-hierarchy of
vasive overlap. Topics are also locally emer- communities.
gent phenomena in that they represent coincid- Communities without sub-communities can
ing and mutually referring perspectives of re- be well separated and very cohesive, too, but
searchers (the authors of the papers). inside larger communities there can exist well
In order to reconstruct scientific topics from separated sub-communities which diminish the
networksofpapersandtheircitedsources,then, cohesion of their super-community.
we need an algorithm that clusters links, can Since the cost landscape of link communi-
construct pervasively overlapping communities, ties has many local minima, purely determin-
andusesmainlylocalinformation. Inthispaper, istic search strategies are not efficient. This is
wepresentsuchanalgorithm(inPart1)andits why we designed a memetic search that com-
applicationtocitationnetworks(inPart2). We bines an evolutionary algorithm with determin-
propose a local cost function for the indepen- istic adjustments in the cost landscape. Evolu-
dent evaluation of each link community by re- tionary algorithms have already been used for
latingitsexternaltoitstotalconnectivityinthe identifying communities in networks (Fortunato
network. Thecostfunctionisalmostcompletely 2010, p. 106). Some authors have even applied
basedonlocalinformation,theonlyglobalinfor- evolutionaryalgorithmstolinkclusteringbutall
mation used is the number of links in the whole used global evaluation functions (Pizzuti 2009;
network. The independent evaluation of each Li,Zhang,Wang,Liu,andZhang2013;Shi,Cai,
subgraph with a local cost function means that Fu,Dong,andWu2013). Memeticevolutionary
communities can be constructed independently algorithmshavealsobeenappliedtoreconstruct
from each other, which enables pervasive over- communities but only for node clustering and
laps. onlywithglobalevaluationfunctions(Gong,Fu,
The cost function we propose for subgraph Jiao, and Du 2011; Pizzuti 2012; Gach and Hao
evaluationissolelybasedonthenetwork’stopol- 2012; Ma, Gong, Liu, Cai, and Jiao 2014).
ogy and not on link similarity. Generally, clus-
tering by optimising a (global or local) evalua-
2 Strategy
tion function needs no measure of similarity of
clusteredelementsbutresultsinclusterstheele-
The strategy we apply in response to the three
informationusedintheendandnopervasiveoverlapbe- challengesdescribedintheintroductionconsists
cause link clusters are disjunct (Pan, Wang, Xie, and of three main steps. We develop an evaluation
Liu2011;Pan,Wang,andXie2012). Furthermore,they function for link communities that uses local in-
differfromourapproachbecausetheyproposeanevalu-
formation. This evaluation function makes it
ationfunctionforlinkclusteringwhichisderivedwithin
thenodeclusteringapproach. possible to construct each community indepen-
2
dentlyfromallothers,whichinturnenablesper- withalltheirlinks,andfollowitwithfinesearch
vasive overlaps because inner links (links all of phase,namelylink-wisememeticevolutionorat
whose neighbours are community members) of least a link-wise local search.
one community can also be inner links of an-
other community. We then design an algorithm
3 The cost function:
that constructs local communities.
For the first step, we followed a suggestion ratio node-cut
by Evans and Lambiotte (2009) to obtain link
clustersbyclusteringverticesinanetwork’sline 3.1 Node-induced and
graph. We defined a local cost function Ψ(L) link-induced subgraphs
in the line-graph approach which we call ratio
node-cut. Itcanbeusedtoidentifylinkcommu- Traditionally, the boundary of a community is
nities by finding local minima in the cost land- drawn between nodes and therefore cuts the
scape. Since Ψ(L) evaluates the boundary be- linksbetweennodesinsideandoutsidethecom-
tween a subgraph and the rest of the network, munity. If we consider communities as clusters
communities can be constructed independently of links rather than nodes, the perspective must
of all other communities. bereversed. Whiletheboundaryofanodecom-
munity cuts links, the boundary of a link com-
ThecostlandscapeofΨ(L)isoftenveryrough
munity cuts nodes.
i.e.hasmanylocalminimathatmaycorrespond
A node community is a connected subgraph
to very similar subgraphs. Therefore, the reso-
defined by a node set C. It contains all links
lution of the algorithm must be defined by set-
existing between nodes in C. A link community
ting a minimum distance (number of links that
is a connected link-induced subgraph. It con-
differ) between subgraphs corresponding to dif-
tains all nodes attached to links of a given set
ferent local minima. We define the range of a
L. There can be links existing between a link
community as a distance in which no subgraph
community’s nodes which are not in L.
exists that has a lower Ψ-value.
Cost functions of a subgraph can be defined
Since the task of finding communities in large
byrelatingameasureofexternaltoameasureof
networksisalwaysverycomplex,heuristicsmust
totalconnectivity. Thisratioshouldbeminimal
be applied. This applies even more strongly to
forwellseparatedandcohesivesubgraphsi.e.for
link clustering because networks contain many
communities.
more links than nodes, and particularly to the
Node communities can be defined as con-
rough Ψ-landscape. We chose an evolutionary
nected subgraphs corresponding to minima in
algorithm but accelerate evolution by combin-
cost landscapes where places correspond to
ing it with a deterministic local search in the
node-induced subgraphs. Correspondingly, link
costlandscape. Thisapproachiscalledmemetic
communities can be defined as connected sub-
(Neri,Cotta,andMoscato2012). Memeticalgo-
graphs corresponding to minima in cost land-
rithms can also find local optima of a local cost
scapes where places correspond to link-induced
function (Vitela and Castan˜os 2012).
subgraphs.
In evolutionary algorithms, individuals oc-
In the following, we only consider connected
cupy places in the cost (or fitness) landscape.
unweighted graphs G = (V,E). The number of
In our local algorithm, populations are sets of
edges (or links) is m = |E|, the number of ver-
different subgraphs. We start with a random
tices (or nodes) is n = |V|. With k we denote
i
initialisation of the population of some definite
thedegreeofnodei. Theinternaldegreeofnode
size. The genetic operators of crossover, mu- i, denoted by kin(L), is the number of links at-
i
tation, and selection are repeatedly applied to
tached to node i which are in link set L. The
move the population into optima. In memetic externaldegreeofnodeiiskout(L)=k −kin(L).
i i i
algorithms each crossover and each mutation is
followed by a local search.
3.2 External connectivity
Inlargenetworksexploringthecostlandscape
by adding or removing individual links is very We first consider measures of external connec-
time-consuming. We therefore begin the search tivity of a subgraph which are useful for con-
withacoarsesearchphasethataddsorremoves structing node or link communities. The sim-
groups of links by adding or removing nodes plestmeasureofexternalconnectivityofanode-
3
induced subgraph is the cut size that equals the The sum is restricted to nodes attached to links
sum of weights of boundary links i.e. the links in L because other nodes have kin(L) = 0. To-
i
connecting the subgraph with the rest of the tal connectivity of L is then given by the sum
graph (Fortunato 2010, p. 92). If link weights σ(L) + τ(L) = (cid:80)n kin(L) = k (L) = 2|L|.
i=1 i in
represent electrical conductance, cut size mea- The derivation can be found in Appendix A.
sures the total conductance of all boundary
links. Cut size can be calculated as the sum 3.4 Cost function
of external degrees kout(L) of boundary nodes
i
(subgraph members with boundary links). Relating external to total connectivity leads us
Applying these considerations to the external to cost functions whose minima correspond to
connectivityofalink-inducedsubgraphleadsto well separated and cohesive subgraphs. On the
asimplemeasureofexternalconnectivityasthe other hand, we also achieve a size normalisation
sum of koutkin/k of boundary nodes: when we divide external by total connectivity.
i i i
This is welcome, because the boundary length
(cid:88)n kout(L)kin(L) (measuredbyexternalconnectivity)tendstoin-
σ(L)= i i . (1) crease with size (here measured by total con-
k
i=1 i nectivity k (L) = 2|L|)—at least for not too
in
large subgraphs in not too small networks. If
OnlyforboundarynodesofLwehavekoutkin >
i i a subgraph occupies more than one half of the
0. That means, we can restrict the sum in the
network its boundary tends to become shorter
formula to boundary nodes. In function σ(L)
with increasing size. A simple size normalisa-
theexternaldegreeskout areweightedwithsub-
i tion that accounts for the finite size of the net-
graphmembership-gradekin/k oftheboundary
i i work is achieved by adding to the external-total
nodes. The function σ(L) can be derived from
ratioofasubgraphthesameratioofitscomple-
the total conductance or cut size of link sets in
ment. For small subgraphs in a large network
the graph’s line graph if the line graph’s edges
thesecondratioisverysmall. Fornode-induced
are weighted with 1/k —a weighting proposed
i subgraphs this normalisation was introduced by
by Evans and Lambiotte(2009). The derivation
WeiandCheng(1989)andnamedratio cut. For
can be found in Appendix A.
link-induced subgraphs we analogously define a
Each term of σ(L) equals the conductance of
cost function ratio node-cut as
a boundary node i i.e. the total conductance
σ(L) σ(E\L)
forcurrentsflowingoutofthesubgraphthrough Ψ(L) = + (4)
k (L) k (E\L)
this node. We call σ(L) the node cut of a link- in in
induced subgraph. σ(L)
= . (5)
k (L)(1−k (L)/2m)
in in
3.3 Internal and total connectivity Theexpressiononther.h.s. isobtainedbecause
σ(E\L) = σ(L) and k (E\L) = 2m−k (L).
in in
Now we discuss measures of internal and total Ratio node-cut Ψ is not strictly local but the
connectivity of subgraphs induced by node and only global information needed here is the to-
by link sets, respectively. In the case of node- tal number of links m. In the limit of small
induced subgraphs kin(C) = (cid:80)i∈Ckiin(C) is an subgraphsinlargenetworksweachieveapproxi-
appropriate measure of internal connectivity of mately strict locality because we have k (L)(cid:28)
in
node set C. Total connectivity of C is then the 2m and we therefore obtain
sum of degrees of all nodes in C:
σ(L)
Ψ(L)≈ , (6)
(cid:88) (cid:88) k (L)
k (C)= kin(C)+kout(C)= k . (2) in
total i i i
i∈C i∈C which equals a strictly local cost function for
the construction of link communities intro-
Foralink-inducedsubgraphwecanusethesum
duced by us earlier (Havemann et al. 2012).
ofinternaldegrees,weightedwiththeirmember-
Our cost function Ψ(L) rewards separation of
ship, as a measure of internal connectivity:
link community L but not really its cohesion.
Yang and Leskovec (2012) found that evaluat-
τ(L)=(cid:88)n kiin(L)kiin(L). (3) ing node subgraphs with conductance—a mea-
i=1 ki sure analogue to σ(L)/kin(L) in the world of
4
node communities—can also lead to communi- scape. Thus, we have to define the resolution of
ties with low cohesion. the search by defining this minimal distance in
Thatmeans,usingcostfunctionΨweempha- the landscape. The appropriate resolution de-
sizeseparationandrequireonlyaminimalcohe- pends on the research question about the phe-
sion of subgraphs, which is expressed by the de- nomenon represented by the network. The ex-
mand that subgraphs must be connected. Oth- tent to which two communities should differ in
erwise an unconnected subgraph with parts in content (of links) to consider them as different
very different regions of the network could be a depends on the question asked about communi-
community. ties.
Function σ(L) vanishes for the empty sub- Anotherplaceinthecostlandscapeisreached
graph with L = ∅ and for the full graph with by adding links to and by removing links from
L = E. In both cases, the denominator of the the subgraph corresponding to the starting
cost function also vanishes and we obtain zero place. The distance between two places in the
divided by zero but it makes sense to define cost landscape equals the sum of the number of
Ψ(E) = Ψ(∅) = 1 because Ψ of one link (1, links we have to add and to exclude. In other
2) with vanishing weight w approximates 1: words: the distance is the size of the symmetric
12
0
difference between the two link sets. We define
. Ψ((1,2))=w (k1−w12)/k1+(k2−w12)/k2. the range of a community as the minimal dis- ●
1 12 2w12(1−2w12/2m) tance to a subgraph with lower cost. Within a
(7) community’s range there is no better subgraph.
Our cost function is symmetric: Ψ(L) = The resolution of a search for communities can
Ψ(E\L), i.e. the cost function is the same for be defined as the minimal range of communities
a link-induced subgraph and the subgraph in- that are accepted as valid solutions. Depending
duced by the complementary link set E\L. onthenetworksrealbackground,arelativereso-
lutioncanbemoreappropriate. Thatmeans,we
3.5 The cost landscape demand that any valid community should have
arangewhichislargerthanacertainpercentage
Each place in the cost landscape represents a of its size.
8
link-induced subgraph. Two places in the land- In order to determine the range of a com-
. scape have a direct relation if and only if the munity we would need to know its whole en-
0 corresponding subgraphs differ in one link. The vironment up to the distance to the nearest
height of each place is given by the value of the lower place in the cost landscape. Otherwise, a
cost function Ψ(L). The global minimum of the lower place only determines an upper bound of
cost function is reached for a division of the set the community’s range. However, searching the
E of all links that produces the two best link whole environment of a subgraph is practically
communitiesintermsofseparation. Asasimple impossibleforlargenetworks. Aselectivesearch
example, we determined the Ψ-landscape of the isnecessary,whichiswhyweapplyevolutionary
bow-tiegraph(Figure1,forcalculationsseeAp- anddeterministicgreedyalgorithms. Iftheseal-
pendix B). We expect a cut through the central gorithms find an upper bound smaller than the
6 node to be the best division in two link commu- setresolution,wecandeselectthecommunity. If
nities(thetwotriangles). Indeed, thelandscape they don’t, the community is provisionally kept
.
hastwominimawithΨ=1/3,whichcorrespond
0
to the two triangles. There are no further local
) minima.
● ●
1
We do not restrict the search for link commu-
nities to finding only the global minimum but
, define a link community as a connected link-
●
0 induced subgraph which corresponds to any lo-
cal minimum in the Ψ-landscape. Since the Ψ-
(
c landscape of larger graphs contains many local ● ●
minima,weneedafiltertoselectthelocallybest
4
link communities. For this reason, we restrict
. our search to those minima with a sufficiently
0 largedistancetoanylowerplaceinthecostland- Figure 1: Bow-tie graph
5
2
.
0
0
●
.
0
0.0 0.2 0.4 0.6 0.8 1.0
c(0, 1)
butcanlaterbereplacedbyabettercommunity hasnottofindsubgraphswithlowercostineach
within its minimal range defined by the set res- stepbutcangoanumberofstepsby‘tunneling’
olution. We assume, however, that later found through ‘barriers’ in the landscape (areas with
better solutions differ only in some links. higherΨ)beforereachinglowervalueswhichin-
validatethecommunityatthetunnel’sentrance.
Tunneling makes the algorithm more efficient.
4 Memetic search
Themaximumlengthofatunnelthroughabar-
rier of higher Ψ-values is determined by the set
Memetic algorithms combine random evolution
resolution.
with deterministic local search. In this section,
Thelocalsearchcanbeginbyaseriesofeither
we describe
inclusions or exclusions of nodes (links). When
1. thelocalsearchweapply,calledadaptation no further improvement can be achieved, the
for short, search switches from inclusion to exclusion or
vice versa. Inclusion and exclusion are contin-
2. our implementation of the evolutionary ap-
ued until no further improvement is possible.
proach,
If the exclusion of nodes fragments a sub-
3. the genetic operators of mutation, cross- graph, we proceed with the subgraph’s main
over, and selection we employ in the evo- component. In the link-wise local search the
lutionary approach. greedy algorithm is allowed to go through in-
termediarystatesrepresentingunconnectedsub-
The memetic algorithm is applied in the
graphs. At the end of the link-wise local search
search for link communities, which can be done
wedetermineallcomponentsofthesubgraph. If
by exploring the cost landscape of a network
the subgraph is unconnected we repeat the pro-
by adding or removing individual links. For
cedure for each component until we obtain only
largesubgraphsthisisverytime-consuming. We
connected subgraphs with minimal cost.
therefore split the search in a coarse phase, in
A greedy algorithm is efficient because the
whichweaddorremovenodeswithalltheirlinks
cost reduction for all possible cases of includ-
toothernodesinthesubgraph, andafinerlink-
ing a neighbour must be calculated only at the
wise search, which is applied after communities
beginning of the local search. In the subsequent
have been identified by a node-wise search. Af-
steps, we only calculate or recalculate cost re-
ter communities with a minimal range defined
ductions achieved by adding neighbours of the
by the set resolution are found in a node-wise
link (or node) included. Analogously, we pro-
memetic search, they are subjected to a link-
ceed when excluding boundary nodes or links
wise memetic search or at least a link-wise local
(Havemann et al. 2012, Appendix). Otherwise
search.
it would be more efficient to include or exclude
just the first node (link) which reduces cost.
4.1 Local search
The local search in the cost landscape applies a
greedy algorithm for finding local cost minima 4.2 Evolution
that correspond to communities. The algorithm
starts from the place occupied by the current The general implementation of the memetic al-
subgraph and moves to subgraphs with lower gorithm is described by Algorithm 1.2 The ge-
Ψ-values. The algorithm is greedy because it neticoperatorsofcrossover,mutation,andselec-
always chooses the step that brings the biggest tion(describedbelow)areappliedtoeachgener-
decrease or the smallest increase of Ψ. A step ation of communities. Subgraphs generated by
includes or excludes a node with all their links crossover and mutation are adapted by a local
to the nodes already in a subgraph in node-wise search. Ifthestartingsubgraphisnotconnected
local search, and includes or excludes an indi- we replace it by its main component. Evolution
vidual link in link-wise local search. isterminatedwhennobetterbestcommunityis
A valid community can be made invalid and found for many generations.
replaced by a better one if the better one is
withinitsminimalrangewhichissetbytheres- 2The notation is inspired by a pseudocode given by
olution parameter. Therefore, the local search Merz(2012).
6
4.3 Genetic operators them is part of the other one. Normally, evolu-
tionary algorithms include some randomness in
Mutation: Wemutateacommunitywithmu-
the crossover, which in our case would mean to
tation variance v < 1 by changing maximally a
enlarge the intersection by some nodes or links
proportion v of its links or nodes. In node-wise
from the union. In contrast, our crossing pro-
memeticevolutionswerandomlyexcludebound-
cedure is deterministic because the boundary of
ary nodes and then include the same number of
the union of two good communities should also
neighbouring nodes. In link-wise memetics we
benottoobad. Thesameholdsfortheintersec-
experiment with two other mutation operators:
tion. Deterministic crossover should be (and is)
weonlyexcludeor includelinksandconcentrate
doneonlyoncewiththesameparents. Theonly
changes around one randomly chosen boundary
random element of our crossover is the random
node. (Details can be found in Appendix of
selection of parents.
Part 2.)
Selection: From the old population and the
Crossover: From two parent subgraphs we resultsofmutationsandcrossoversweselectthe
construct two new individuals by taking inter- communities with lowest Ψ-values, keeping the
section and union of the subgraphs as starting population size constant. A new best commu-
points for adaptive local searches. Of course, it nity is only included if it is inside the minimal
has no effect to cross such parents where one of rangeofthebestcommunityoftheoriginalpop-
ulation. Disregardingthebestcommunitiesout-
side the minimal range assures that we do not
Algorithm 1Pseudocodeofmemeticevolution lose communities which can have a range above
for one adapted seed the minimum given by the resolution limit we
apply. Deselected communities can be used as
initialise population P by mutating the
seeds for other memetic searches.
adaptedseedwithhighvarianceseveraltimes
and adapting mutants
Renewal: Renewal means to mutate the best
while the best community is not too old do
community with high variance several times, to
mutate the best community with low vari-
adapt the mutants, and to apply a usual selec-
ance and adapt the mutants
tion procedure described above.
if an adapted mutant is new and its cost is
lower than highest cost then
add it to population P 5 Concluding remarks
end if
cross the best community with some ran-
In the forthcoming Part 2 of our paper we dis-
domly chosen communities and adapt the
cuss test results.
offspring
if adapted offspring is new and its cost is
lower than highest cost then Acknowledgement
add it to population P
end if This work is part of a project in which we de-
select the best communities so that the velop methods for measuring the diversity of re-
population size remains constant search. The project was funded by the German
if there is no better best community for Ministry for Education and Research (BMBF).
somegenerationsandinnovationrateislow We would like to thank the members of the
then project’sadvisorygroup3 andalsoalldevelopers
renew the population by mutating the of R.4 We thank Alexander Struck for assisting
best community with high variance and ourworkandfordiscussionsaboutthenewcost
adapt mutants function. We discussed the issue of link similar-
select the best communities so that the ity in citation networks with Rob Koopman.
population size remains constant
end if
end while
3http://141.20.126.172/~div
4http://www.r-project.org
7
Appendix
auxiliary bipartite graph. This becomes clear if
we factorise the terms of the sum in equation 8:
A Connectivity measures n
E =(cid:88)√Bik √Bil . (9)
kl
k k
i=1 i i
In this section we derive the connectivity mea-
suresforlinksetsσ(L)andkin(L)fromanalogue Then√wecanshortlywriteE =DTDwithDik =
measures in the line graph. We closely follow B / k andverifytheEuclideannormalisation
ik i
the arguments given in our earlier paper (Have- of the n row vectors of D (for unweighted net-
mann, Gl¨aser, Heinz, and Struck 2012). works for which we have B2 =B ):
ik ik
We here use i,j = 1,...,n to denote nodes
and k,l = 1,...,m for links. With C(L) we (cid:88)m (cid:88)m B2 1 (cid:88)m
D2 = ik = B =1. (10)
name the set of nodes attached to links in the ik k k ik
i i
subgraph induced by link set L. If a link k be- k=1 k=1 k=1
longs to L its membership µk(L) = 1 and zero On the other hand, the projection of the nor-
otherwise. malised bipartite graph described by affiliation
To construct a network’s line graph we first matrixDbackonanetworkoftheoriginalnodes
define an auxiliary bipartite graph obtained by is given by DDT. An element of adjacency ma-
putting a node on each link of the original net- trix DDT is given by
work. The affiliation matrix B of the bipartite
m m
graph—also called its incidence matrix—has a (cid:88)D D =(cid:88)BikBjk = Aij . (11)
row for each of the n original nodes and a col- ik jk (cid:112)k k (cid:112)k k
k=1 k=1 i j i j
umn for each of the m original links. Each link
column contains only two non-zero elements, Thus, Euclidean normalisation of B’s row vec-
namely the elements in the rows of the nodes tors is equivalent to weighting each link in the
i and j connected by the link. We can project original (unweighted) network with the geomet-
the bipartite graph back onto the original net- ric mean of its nodes’ inverse degrees. The
work with the product BBT which equals its weighted graph described by adjacency matrix
adjacency matrix A (except for the main diago- E is not the line graph of the unweighted net-
nal). workdescribedbyadjacencymatrixAbutofthe
Weobtainthenetwork’slinegraphbytheop- network weighted according to equation 11. It
posite projection BTB of the bipartite graph. dependsontherealrelationswemodelwiththe
Evans and Lambiotte (2009) underline, that in network whether this is a realistic weighting.
all cases of practical intererest the line graph Now we calculate internal connectivity τ(L)
contains the same amount of information as the as the sum of internal degrees of vertices in the
original network. Knowing BTB we can almost line graph:
ever calculate BBT and thus also the network’s
m
adjacency matrix A. (cid:88)
τ(L)= µ (L)E µ (L)
k kl l
Because each node of the original network is
k,l=1
represented as a clique in the line graph Evans (12)
m n
and Lambiotte (2009) weighted the edges of the = (cid:88) µ (L)(cid:88)BikBilµ (L).
k k l
line graph with the inverse degree 1/k of the i
i k,l=1 i=1
node i in the original network. They define the
line graph’s adjacency matrix as In the same way, we can calculate external con-
nectivity σ(L) as the sum of external degrees in
n the line graph:
E =(cid:88)BikBil. (8)
kl k m
i=1 i (cid:88)
σ(L)= µ (L)E (1−µ (L)).
k kl l
k,l=1
Weighting the line graph’s edges with the in-
m n
verse degrees of nodes in the original network is = (cid:88) µ (L)(cid:88)BikBil(1−µ (L)).
k k l
equivalent to an Euclidean normalisation of the i
k,l=1 i=1
nodes’ vectors in the affiliation matrix B of the (13)
8
0
●
.
1
8
.
0
6
.
0
Now we use the relations
) 2 4
1 m ● ●
(cid:88)
, µk(L)Bik =kiin(L) 1 ● 6
0 k=1
( and 3 5
c m ● ●
(cid:88)
4 (1−µl(L))Bil =kiout(L),
. l=1
0which directly follow from the definition of the
incidence matrix B. Thus, we get Figure 2: Bow-tie graph with numbered links
(cid:88)n (kin(L))2
τ(L)= i
We define the north pole as corresponding to
k
i
i=1 the empty subgraph and the south pole as cor-
and responding to the whole graph. The Ψ-globe of
2 (cid:88)n kin(L)kout(L) thebow-tiegraphhasfivecirclesoflatitudecor-
σ(L)= i i .
. ki responding to six subgraphs with one link, 15
0 i=1 with two, 20 with three, 15 with four, and six
From this we easily derive total connectivity of with five links, respectively.
a link-induced subgraph as the sum For the empty graph at the north pole σ = 0
and Ψ = 1 (by definition). The six single links
n
(cid:88)
τ(L)+σ(L)= kin(L)=k (L). asthesmallestrealsubgraphsarelocatedatthe
i in
highest circle of latitude. The two outer links 1
i=1
and6haveσ =1·1/2+1·1/2=1andΨ=0.6,
0B Cost-landscape thefourinnerlinkshaveσ =1·1/2+1·3/4=5/4
. ● and Ψ=0.75.
0 of the bow-tie graph
Therearetenconnectedandfiveunconnected
subgraphs with two links:
For the bow-tie graph we expect two link com-
munities, namely the triangle {1,2,3} and its • four connected subgraphs with one outer
complement{4,5,6},cf.Figure2andEvansand link and one inner link (e.g. link set {1,2})
0.0 0.2 0.4 0.6 0.8 1.0
Lambiotte (2009). To describe the 2m different resulting in σ =1·1/2+1·3/4=5/4 and
possible subgraphs it is advantageous to make Ψ≈0.469,
use of the spherical topology of any landscape
of subgraphs. Indeed, the cost-function land- • six connected subgraphs with two inner
scape of a graph’s subgraphs can be seen as the links (e.g. link set {2,3}) and σ =1·1/2+
surface of a globe 2·2/4+1·c1/2(=02,a nd1Ψ)=0.75,
• with the whole and the empty graph at the • four unconnected subgraphs with one outer
poles, and one inner link (e.g. link set {1,4}) and
σ =9/4 and Ψ≈0.844,
• withallpossiblesubgraphsofthesamesize
on each circle of latitude, and • one unconnected subgraph with two outer
links ({1,6}) and σ =2 and Ψ=0.75.
• with complementary subgraphs situated at
antipodes. On the equator of the Ψ-globe there are 20
triples of links which can be classified into four
The neighbours of a place in the landscape can
types:
be reached by adding an element to the set of
nodes (for node-induced subgraphs) or of links
• the triangle {1,2,3} and its complement
(for link-induced subgraphs), respectively, or by
{4,5,6} with σ =2·2/4=1 and Ψ=1/3,
deleting an element from this set. That means,
there are no direct relations between places on • four triples of inner links (e.g. link set
the same circle of latitude. Steps (adding or re- {2,3,4}) and their unconnected comple-
movingnodesorlinks)aremovesbetweenneigh- ments (e.g. link set {1,5,6}) with σ =
bouring circles of latitude. 3/2+3/4=9/4 and Ψ=0.75,
9
• eight subgraphs with one of the two outer plexity in networks. Nature 466(7307),
links, one of the two attached inner links 761–764.
and one of the two inner links not attached
Amelio, A. and C. Pizzuti (2014). Overlap-
to the outer link (e.g. link set {1,2,4}):
ping Community Discovery Methods: A
theyallhaveσ =1·1/2+2·2/4+1·1/2=2
Survey. Social Networks: Analysis and
and Ψ=2/3,
Case Studies, 105. http://arxiv.org/
abs/1411.3935.
• the unconnected triple with one outer and
two inner links (set {1,4,5}) and its un- Clauset, A. (2005). Finding local commu-
connected complement (set {2,3,6}) with nity structure in networks. Physical Re-
σ =3 and Ψ=1. view E 72(2), 26132.
On the two circles of latitude on the south- Evans, T. and R. Lambiotte (2009). Line
ern hemisphere we find the complements of the graphs, link partitions, and overlapping
subgraphs on the northern hemisphere with the communities. Physical Review E 80(1),
same Ψ-values. Next to the equator we find 13 16105.
connected and two unconnected subgraphs with Fortunato,S.(2010).Communitydetectionin
four links each: graphs.Physics Reports 486(3-5),75–174.
• twounconnectedquadrupleswithonetrian- Gach, O. and J.-K. Hao (2012, January). A
gle and the second outer link (e.g. link set memetic algorithm for community detec-
{1,2,3,6})whichhaveσ =2andΨ=0.75, tion in complex networks. In C. A. C.
Coello, V. Cutello, K. Deb, S. Forrest,
• the central star with all four inner links
G. Nicosia, and M. Pavone (Eds.), Paral-
{2,3,4,5} with σ = 4/2 = 2 and Ψ=0.75,
lel Problem Solving from Nature - PPSN
XII, Number 7492 in Lecture Notes in
• four subgraphs containing one of the two
Computer Science, pp. 327–336. Springer
trianglesplusoneofthetwoinnerlinks(e.g.
Berlin Heidelberg.
link set {1,2,3,5}) which all have σ = 1·
3/4+1·1/2=5/4 and Ψ≈0.469, Gong, M., B. Fu, L. Jiao, and H. Du (2011,
November). Memetic algorithm for com-
• the four subgraphs with both outer links
munitydetectioninnetworks.PhysicalRe-
and two inner links connecting them (e.g.
view E 84(5), 056101.
link set {1,2,5,6}) with σ =2/2+4/4=2
Havemann, F., J. Gl¨aser, M. Heinz, and
and Ψ=0.75,
A. Struck (2012). Evaluating overlap-
• the four subgraphs with one outer link and ping communities with the conductance
three inner links (one of them attached to of their boundary nodes. arXiv preprint
the outer link, e.g. link set {1,2,4,5}) with arXiv:1206.3992.
σ =3/2+3/4=9/4 and Ψ≈0.844.
Havemann, F., M. Heinz, A. Struck, and
All complements of the six single links contain- J. Gl¨aser (2011). Identification of Over-
ing the five other links are connected and have lapping Communities and their Hierarchy
the same Ψ-values as their single-link comple- by Locally Calculating Community-
ments (cf. above). The full graph at the south Changing Resolution Levels. Journal of
pole with σ = 0 and Ψ = 1 is connected. The Statistical Mechanics: Theory and Exper-
Ψ-landscape of links has two local minima: the iment 2011, P01023. doi: 10.1088/1742-
two triangles have a locally and globally mini- 5468/2011/01/P01023, Arxiv preprint
mal Ψ=1/3. There are no other local minima. arXiv:1008.1004.
Thus, we obtain the pair of complementary tri- Hric,D.,R.K.Darst,andS.Fortunato(2014,
angles as the only solution. December). Community detection in net-
works: Structural communities versus
ground truth. Physical Review E 90(6),
References
062805.
Ahn, Y., J. Bagrow, and S. Lehmann (2010). Lancichinetti, A., S. Fortunato, and
Link communities reveal multiscale com- J. Kertesz (2009). Detecting the overlap-
10