
Latent Network Features and Overlapping Community Discovery via Boolean Intersection Representations

Son Hoang Dau, Member, and Olgica Milenkovic, Senior Member, IEEE

arXiv:1601.06343v4 [math.CO] 28 Jun 2016

Abstract: We propose a new latent Boolean feature model for complex networks that captures different types of node interactions and network communities. The model is based on a new concept in graph theory, termed the Boolean intersection representation of a graph, which generalizes the notion of an intersection representation. We mostly focus on one form of Boolean intersection, termed cointersection, and describe how to use this representation to deduce node feature sets and their communities. We derive several general bounds on the minimum number of features used in cointersection representations and discuss graph families for which exact cointersection characterizations are possible. Our results also include algorithms for finding optimal and approximate cointersection representations of a graph.

S. H. Dau and O. Milenkovic are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, 1308 W. Main Street, Urbana, IL 61801, USA. Emails: {hoangdau,milenkov}@illinois.edu.

I. INTRODUCTION

An important task in network analysis is to understand the mechanism behind the formation of a given complex network. Latent feature models for networks seek to explain the observed pairwise connections among the nodes in a network by associating to each node a set of features and by setting rules based on which pairs of nodes are connected according to their features. Inference of latent network features not only allows for the discovery of community structures in networks via association with features but also aids in predicting unobserved connections. As such, feature inference is invaluable in the study of social networks, protein complexes and gene regulatory modules.

Probabilistic latent feature models for networks are usually studied via machine learning techniques; known problems and analytic approaches include the Binary Matrix Factorization model [1], the Mixed-Membership Stochastic Block model [2], the Infinite Latent Feature/Attribute model [3], [4], the Multiplicative Attribute Graph model [5], the Attribute Graph Affiliation model [6], and the Cluster Affiliation model (or BIGCLAM) [7]. In contrast, almost nothing is known about deterministic, combinatorial latent feature models.

In the recent work of Tsourakakis [8], a probabilistic latent feature model for networks was proposed that implicitly uses the notion of intersection representations of graphs [9], [10], [11] and builds upon the overlapping community detection approach of Bonchi et al. [12]. More specifically, in this model one fixes the total number of features and tries to assign to each vertex a subset of features in a way that maximizes a certain score. Here, the score of a specific feature assignment is the number of unordered pairs of vertices $(u,v)$ that satisfy the so-called Intersection Condition, which states that $u$ and $v$ are adjacent if and only if they share at least one common feature. In particular, if one insists on a perfect score, i.e., a score equal to $\binom{n}{2}$, then the minimum number of features required reduces to the intersection number of the graph [9]. An assignment of sets of features to vertices that achieves the perfect score is known as an intersection representation of a graph (see Fig. 1)¹. If in the Intersection Condition one insisted on $u$ and $v$ sharing at least $p \ge 1$ common features, achieving a perfect score would require a minimum number of features equal to the $p$-intersection number of the graph [10], [11].

¹The intersection representation of graphs arises in numerous problems such as the keyword conflict problem, the traffic phasing problem, and the competition graphs from food webs, to name a few, and has been extensively studied in the literature (see, for instance [13], [14]).

[Fig. 1: five vertices with feature sets 1: $\{a_1\}$, 2: $\{a_1\}$, 3: $\{a_1, a_2\}$, 4: $\{a_2, a_3\}$, 5: $\{a_3\}$.]

Fig. 1. Illustration of an intersection representation of a graph from [8]. Vertices are assigned subsets from the feature set $\mathcal{A} = \{a_1, a_2, a_3\}$ so that two vertices are adjacent if and only if they share at least one common feature. In this case, the intersection number is three.

Intersection representations elucidate overlapping community structures via a simple generative principle: one feature, one community. As an illustrative example, each feature in Fig. 1 may describe one community; the triangle forms one community defined by feature $a_1$, and the remaining two edges are defined by features $a_2$ and $a_3$, respectively. Note that all communities are cliques, and that they may overlap (intersect).

We propose to extend the combinatorial variant of the model studied by Bonchi et al. [12] and by Tsourakakis [8] to a much more general setting by using Boolean functions of features that can express more complicated interactions among nodes (vertices). For instance, suppose that there are three different types of features, namely 'Family member', 'City', and 'Hobby'. The Boolean function $f(x_1, x_2, x_3) = x_1 \lor (x_2 \land x_3)$ can be used to express the connection rule that two people are Facebook friends if and only if either they are family members or they have lived in at least one common city and shared at least one common hobby. As such, it asserts that the 'Family' feature is more relevant than either of the 'City' or 'Hobby' features. More generally, we can use any Boolean function $f = f(x_1, \dots, x_r)$ together with a vector $\mathbf{p} = (p_1, \dots, p_r)$, $p_i \ge 1$, to describe a connectivity rule based on $r$ different types of features, in which the requirement 'sharing at least one common feature of type $\mathcal{A}_i$' is replaced by the requirement 'sharing at least $p_i$ common features of type $\mathcal{A}_i$'.

In the scope of this paper, we mostly focus on a basic building block of Boolean functions, namely the AND function of two variables, $f(x_1, x_2) = x_1 \land x_2$.
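The generalized connectivity rule described above can be made concrete with a short sketch. The Python below is an illustration under our own assumptions, not code from the paper: the helper names `boolean_feature_graph` and `condition_score`, the encoding of a node as a list of feature sets (one set per feature type), and the three-person "Facebook" data are hypothetical. For each pair of nodes, the sketch sets $x_i$ to True when the two nodes share at least $p_i$ features of type $i$, and declares an edge when $f(x_1, \dots, x_r)$ holds. It then checks the single-type assignment of Fig. 1 and the two-type (AND) assignment of Fig. 2 against the same five-vertex graph, which is a triangle on vertices 1, 2, 3 plus the edges (3, 4) and (4, 5).

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical encoding: a node's features are one set per feature type.
Features = Sequence[Set[str]]
Edge = Tuple[int, int]


def boolean_feature_graph(
    nodes: Dict[int, Features],
    f: Callable[..., bool],
    p: Sequence[int],
) -> Set[Edge]:
    """Edges induced by the rule: u ~ v iff f(x_1, ..., x_r), where x_i is
    True when u and v share at least p[i] features of type i."""
    edges: Set[Edge] = set()
    for u, v in combinations(sorted(nodes), 2):
        x = [len(fu & fv) >= pi for fu, fv, pi in zip(nodes[u], nodes[v], p)]
        if f(*x):
            edges.add((u, v))
    return edges


def condition_score(vertices: List[int], graph: Set[Edge], induced: Set[Edge]) -> int:
    """Number of unordered pairs on which 'adjacent in graph' agrees with
    'the feature rule holds'; a perfect score equals C(n, 2)."""
    return sum(((u, v) in graph) == ((u, v) in induced)
               for u, v in combinations(sorted(vertices), 2))


# The five-vertex graph of Figs. 1 and 2.
graph = {(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)}

# Fig. 1: one feature type, rule f(x) = x (the Intersection Condition).
fig1 = {1: [{"a1"}], 2: [{"a1"}], 3: [{"a1", "a2"}], 4: [{"a2", "a3"}], 5: [{"a3"}]}
fig1_edges = boolean_feature_graph(fig1, lambda x: x, p=[1])

# Fig. 2: hobbies | cities, rule f(x1, x2) = x1 AND x2 with p = (1, 1).
fig2 = {
    1: [{"a1"}, {"b1"}],
    2: [{"a1"}, {"b1"}],
    3: [{"a1", "a2"}, {"b1"}],
    4: [{"a2"}, {"b1", "b2"}],
    5: [{"a2"}, {"b2"}],
}
fig2_edges = boolean_feature_graph(fig2, lambda x1, x2: x1 and x2, p=[1, 1])

score1 = condition_score(list(fig1), graph, fig1_edges)
score2 = condition_score(list(fig2), graph, fig2_edges)

# Hypothetical toy data for f(x1, x2, x3) = x1 OR (x2 AND x3): family | city | hobby.
people = {
    1: [{"fam1"}, {"Hanoi"}, {"soccer"}],
    2: [{"fam1"}, {"Urbana"}, {"chess"}],
    3: [{"fam2"}, {"Hanoi"}, {"soccer"}],
}
friends = boolean_feature_graph(
    people, lambda x1, x2, x3: x1 or (x2 and x3), p=[1, 1, 1])

print(score1, score2)      # 10 10
print(sorted(friends))     # [(1, 2), (1, 3)]
```

Both assignments reach the perfect score $\binom{5}{2} = 10$. The brute-force loop over all $\binom{n}{2}$ pairs is chosen only for clarity; it is not meant as an efficient way to evaluate the rule on large networks.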
It is straightforward to see that the Boolean OR function leads to results identical to those obtained for the simple intersection problem, and results obtained for AND functions allow one to easily extend all the proposed approaches to the case of Boolean functions that include both AND and OR operations. For simplicity, we also consider $(p_1, p_2) = (1, 1)$.

To illustrate the latent feature model arising in this setup, we consider the example in Fig. 2. The network has five nodes, which represent five different people. Each person is assigned two distinct sets of features, one representing the hobbies that the person has and the other representing the cities that the person has lived in. For instance, let $\mathcal{A} = \{a_1, a_2\}$ be such that $a_1$ stands for fishing and $a_2$ stands for playing soccer, and let $\mathcal{B} = \{b_1, b_2\}$ be such that $b_1$ stands for Hanoi and $b_2$ stands for Champaign. Then Person 4 is assigned two sets of features, namely $\{a_2\}$ and $\{b_1, b_2\}$, which states that this person has soccer as a hobby and has lived in both Hanoi and Champaign (to avoid notational clutter, we use $\{a_2 \mid b_1, b_2\}$ to denote pairs of sets). Suppose that two people are connected if and only if they share at least one common hobby AND they have lived in at least one common city. For instance, Person 3 and Person 4 are connected because they have soccer as a common hobby and they both have lived in Hanoi. However, Person 3 and Person 5 are not connected, even though they both like playing soccer, because they have not lived in the same city.

[Fig. 2: five nodes with feature sets 1: $\{a_1 \mid b_1\}$, 2: $\{a_1 \mid b_1\}$, 3: $\{a_1, a_2 \mid b_1\}$, 4: $\{a_2 \mid b_1, b_2\}$, 5: $\{a_2 \mid b_2\}$.]

Fig. 2. Each node is assigned a set of features from $\mathcal{A} = \{a_1, a_2\}$ and a set of features from $\mathcal{B} = \{b_1, b_2\}$. Two nodes are connected by an edge if and only if they share at least one feature from $\mathcal{A}$ and at least one feature from $\mathcal{B}$.

Given the nodes' corresponding sets of features and the rules as to how to connect two nodes, it is clear how the graph emerges. The problem of interest is the opposite: under the assumption that the graph is given, that each node is assigned two subsets of features from $\mathcal{A}$ and $\mathcal{B}$, where $\mathcal{A}$ and $\mathcal{B}$ are two disjoint sets of features, and that two nodes are connected if and only if they share at least one feature from $\mathcal{A}$ and at least one feature from $\mathcal{B}$, how can we infer the latent features assigned to the nodes? Usually, the latent features are abstracted as elements from a discrete set, and the mapping between the elements and the real features is determined based on available data.

Our first aim is to determine the smallest possible number of features $\min(|\mathcal{A}| + |\mathcal{B}|)$ needed to explain a given graph. We refer to this quantity as the cointersection number of a graph. Note that the notions of cointersection number and cointersection representation of graphs have not been studied before in the literature. We then proceed to establish general lower and upper bounds on the cointersection number of a graph via its intersection number. In addition, we derive several explicit bounds for some particular families of graphs, including bipartite graphs, multipartite graphs, and graphs with bounded degrees (Section III). We also examine the tightness of these bounds (Section IV). In particular, we describe an interesting connection between the cointersection representations of certain complete multipartite graphs and affine planes. We provide an exact algorithm to find an optimal cointersection representation of a graph by using SAT solvers (Section V-B). We also develop a randomized algorithm to find an approximate cointersection representation of a graph in Section V-C. Finally, we extend the bounds on the cointersection number to the case when a general Boolean function is used instead of the AND function (Section VI).

As a parting remark, we point out that there exist many other applications of latent feature modeling which pertain to communication networks, spectrum allocation being one particular example of interest. We defer the discussion of these topics to a companion paper.

II. PRELIMINARIES

We start by formally introducing our new latent feature model and describing its relevant properties.

A. The cointersection Model

Definition 1. Let $\mathcal{A}$ and $\mathcal{B}$ be two disjoint nonempty subsets of features of cardinalities $\alpha$ and $\beta$, respectively. An $(\alpha|\beta)$-cointersection representation (CIR) for a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a family $R = \{(A_v \mid B_v) : v \in \mathcal{V}\}$, where $A_v \subseteq \mathcal{A}$ and $B_v \subseteq \mathcal{B}$, that satisfies the so-called cointersection Condition:
$$(u,v) \in \mathcal{E} \iff A_u \cap A_v \neq \emptyset \ \text{ and } \ B_u \cap B_v \neq \emptyset.$$
Let $\theta_c(\mathcal{G}) = \min_R (|\mathcal{A}| + |\mathcal{B}|)$, where the minimum is taken over all cointersection representations $R$ of $\mathcal{G}$. Then $\theta_c(\mathcal{G})$ is called the cointersection number of $\mathcal{G}$. A cointersection representation of $\mathcal{G}$ that uses exactly $\theta_c(\mathcal{G})$ features is called optimal.

It is clear that the cointersection number of a graph is precisely the smallest number of features used to describe the network in the Boolean AND model (see Section VI).

Fig. 2 depicts a $(2|2)$-CIR. We can easily verify that for this graph $\theta_c = 4$, and hence, this representation is optimal. If we refer to the set of nodes that have a particular common feature as a community, then the community structure induced by this representation is illustrated in Fig. 3. Note that in this setting communities are no longer restricted to be cliques, which is a more realistic modeling assumption. Furthermore, $u$ and $v$ are adjacent if and only if they belong to the intersection of one community of type $\mathcal{A}$ and another community of type $\mathcal{B}$. Note that communities may also be defined by pairs of features, in which case they form cliques and represent intersections of individual feature communities.

[Fig. 3: the graph of Fig. 2 with $\mathcal{A}$-communities $a_1$: $\{1, 2, 3\}$ and $a_2$: $\{3, 4, 5\}$, and $\mathcal{B}$-communities $b_1$: $\{1, 2, 3, 4\}$ and $b_2$: $\{4, 5\}$.]

Fig. 3. The community structure induced by the features in a cointersection representation of the graph. The vertices are grouped into different communities, each of which corresponds to an $\mathcal{A}$-feature (solid closed curve) or a $\mathcal{B}$-feature (dashed closed curve). The pair $(u,v)$ is an edge if and only if both $u$ and $v$ belong to a common $\mathcal{A}$-community and a common $\mathcal{B}$-community. In other words, every edge lies inside both a solid curve and a dashed curve.

In the next subsection, we review the concepts and some well-known results on the intersection number and its generalization, the $p$-intersection number.

B. The Intersection Number and the p-Intersection Number

Clearly, an $(\alpha|1)$-CIR of a graph is equivalent to an intersection representation of the same graph that uses $\alpha$ features [9]. An intersection representation of a graph is equivalent to an edge clique cover, i.e., a set of complete subgraphs (cliques) of a graph $\mathcal{G}$ that covers every edge at least once. The intersection number of a graph $\mathcal{G}$, denoted by $\theta_1(\mathcal{G})$, is the smallest number of features used in an intersection representation of the graph, or equivalently, the size of a smallest edge clique cover of that graph. The $p$-intersection number of a graph, denoted by $\theta_p(\mathcal{G})$, is the smallest possible number of features to assign to the vertices such that two vertices are adjacent if and only if they share at least $p$ common features (see, e.g., [10], [11], [15]). We list below a couple of well-known results on the intersection number and the $p$-intersection number of a graph.

Theorem 1 (Erdős, Goodman, and Pósa [9]). If $\mathcal{G}$ is any graph on $n$ vertices, then $\theta_1(\mathcal{G}) \le \lfloor n^2/4 \rfloor$.

Theorem 2 (Alon [16]). Let $\mathcal{H}$ be a graph on $n$ vertices with maximal degree at most $d$ and minimal degree at least one, and let $\mathcal{G} = \overline{\mathcal{H}}$ be its complement. Then $\theta_1(\mathcal{G}) \le 2e^2(d+1)^2 \log_e n$.

Theorem 3 (Eaton, Gould, and Rödl [15]). For $p \ge 2$ and any graph $\mathcal{G}$ on $n$ vertices, $\binom{\theta_p(\mathcal{G})}{p} \ge \theta_1(\mathcal{G})$.

Theorem 4 (Eaton, Gould, and Rödl [15]). Let $\mathcal{G}$ be a graph on $n$ vertices with maximum vertex degree $d$, and let $p > 1$ be an integer. Then $\theta_p(\mathcal{G}) \le 3ep\,d^2(d+1)^{1/p} n^{1/p}$.

III. LOWER AND UPPER BOUNDS ON THE COINTERSECTION NUMBERS OF GRAPHS

We now turn our attention to deriving upper bounds on the cointersection numbers $\theta_c$ of arbitrary graphs, and explicit bounds on $\theta_c$ for bipartite graphs, chordal graphs, and graphs with bounded vertex degrees.

Lemma 1. For any graph $\mathcal{G}$, one has $\theta_c(\mathcal{G}) \le 1 + \theta_1(\mathcal{G})$.

Proof: Given an optimal intersection representation of $\mathcal{G}$, which uses $\theta_1$ features, we may create a $(\theta_1|1)$-CIR of $\mathcal{G}$ as follows. If in the intersection representation of $\mathcal{G}$ the vertex $v$ is assigned the set of features $\{a_1, \dots, a_r\}$, we assign to $v$ in the corresponding cointersection representation the sets of features $\{a_1, \dots, a_r \mid b\}$, where $b$ is a new feature that does not belong to the $\theta_1(\mathcal{G})$ features of the intersection representation. It is easy to verify that this feature assignment is indeed a $(\theta_1|1)$-CIR of $\mathcal{G}$.

Lemma 1 immediately implies some explicit upper bounds on the cointersection number of graphs. For instance, the following upper bound for the complement of a sparse graph is an obvious corollary of Lemma 1 and [16, Theorem 1.4] (Theorem 2 above): if $\mathcal{G}$ is a graph on $n$ vertices with maximum degree at most $n-2$ and minimum degree at least $n-d$, then $\theta_c(\mathcal{G}) \le 1 + 2e^2(d+1)^2 \ln n$. Another immediate consequence of Lemma 1 and [17, Corollary 3.2] is that if $\mathcal{G}$ is a chordal graph on $n$ vertices with largest clique of size $r$, then $\theta_c(\mathcal{G}) \le 1 + \theta_1(\mathcal{G}) \le n - r + 2$.

Next, we show that the cointersection number of a bipartite graph is at most its order. Since the intersection number of a bipartite graph is equal to its size, the bound stated in Lemma 2 improves the bound stated in Lemma 1 when the graph has more edges than vertices.

Lemma 2. $\theta_c(\mathcal{G}) \le |\mathcal{V}|$ if $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a bipartite graph.

Proof: As $\mathcal{G}$ is a bipartite graph, we can partition the set of vertices into two parts, say $U = \{1, 2, \dots, n_1\}$ and $W = \{n_1+1, n_1+2, \dots, n\}$, for some $1 \le n_1 < n$, so that $\mathcal{E} \subseteq \{(u,w) : u \in U, w \in W\}$. Set $\mathcal{A} = \{a_u : u \in U\}$ and $\mathcal{B} = \{b_w : w \in W\}$. We assign to each $u \in U$ two sets of features, namely $A_u = \{a_u\}$ and $B_u = \{b_w : (u,w) \in \mathcal{E}\}$. Similarly, we assign to each $w \in W$ two sets of features, namely $A_w = \{a_u : (u,w) \in \mathcal{E}\}$ and $B_w = \{b_w\}$. Then it is straightforward to verify that $R = \{(A_v \mid B_v) : v \in \mathcal{V}\}$ is an $(n_1 \mid n - n_1)$-CIR of $\mathcal{G}$. As this cointersection representation uses $n$ features in total, the proof follows.

We prove next a lower bound on $\theta_c$ via $\theta_1$.

Lemma 3. If $R$ is an $(\alpha|\beta)$-CIR of $\mathcal{G}$, then $\alpha\beta \ge \theta_1(\mathcal{G})$. As a consequence, $\theta_c(\mathcal{G}) \ge \min_{\alpha\beta \ge \theta_1(\mathcal{G})} (\alpha + \beta)$.

Proof: Suppose we have a cointersection representation $R = \{(A_v \mid B_v) : v \in \mathcal{V}\}$ of $\mathcal{G}$ with two disjoint sets of features $\mathcal{A}$ and $\mathcal{B}$, where $|\mathcal{A}| = \alpha$ and $|\mathcal{B}| = \beta$. For each pair $(a,b) \in \mathcal{A} \times \mathcal{B}$, the set of vertices $C_{a,b} = \{v \in \mathcal{V} : a \in A_v, b \in B_v\}$ forms a clique of $\mathcal{G}$, since any two of its vertices share both $a$ and $b$. Moreover, every edge $(u,v)$ of $\mathcal{G}$ is covered by at least one such clique, namely by $C_{a,b}$ for any $a \in A_u \cap A_v$ and $b \in B_u \cap B_v$. Therefore, $\mathcal{C} = \{C_{a,b} : (a,b) \in \mathcal{A} \times \mathcal{B}\}$ is an edge clique cover of $\mathcal{G}$. As $\theta_1(\mathcal{G})$ is the number of cliques in a minimum edge clique cover of $\mathcal{G}$, we have
$$\alpha\beta = |\mathcal{A}||\mathcal{B}| \ge |\mathcal{C}| \ge \theta_1(\mathcal{G}).$$
Therefore, $\theta_c(\mathcal{G}) \ge \min_{\alpha\beta \ge \theta_1(\mathcal{G})} (\alpha + \beta)$.

The following is immediate from Lemma 1 and Lemma 3, using $\alpha + \beta \ge 2\sqrt{\alpha\beta}$.

Corollary 1. For any graph $\mathcal{G}$ we have
$$\left\lceil 2\sqrt{\theta_1(\mathcal{G})} \right\rceil \le \theta_c(\mathcal{G}) \le 1 + \theta_1(\mathcal{G}). \qquad (5)$$

Note again that both $\theta_c$ and $\theta_2$ (the 2-intersection number) have quite similar lower bounds in terms of $\theta_1$. Indeed, from the lower bound for $\theta_2$ given by Theorem 3, namely $\binom{\theta_2(\mathcal{G})}{2} \ge \theta_1(\mathcal{G})$, we obtain $\theta_2(\mathcal{G}) \ge \sqrt{2\theta_1(\mathcal{G})}$. Corollary 1 gives $\theta_c(\mathcal{G}) \ge 2\sqrt{\theta_1(\mathcal{G})}$. These two lower bounds differ from each other only by a multiplicative factor of $\sqrt{2}$.

We show next that a graph of bounded degree has a cointersection representation that uses $O(\sqrt{n})$ features. Our probabilistic proof is based on the analysis in [15, Theorem 11].

Theorem 5. Let $\mathcal{G}$ be a graph on $n$ vertices, with edge set $\mathcal{E}$ and maximum vertex degree $\Delta(\mathcal{G}) \le d$. Then $\theta_c(\mathcal{G}) \le 16 d^{5/2} \sqrt{n}$.

Proof: Let $\mathcal{A}$ and $\mathcal{B}$ be two disjoint sets of features of the same cardinality $\alpha = \beta = 8d^{5/2}n^{1/2}$. Our goal is to show the existence of an $(\alpha|\beta)$-CIR of $\mathcal{G}$.

We independently assign to every edge $e$ of $\mathcal{G}$ a pair of features $\{a(e) \mid b(e)\}$ chosen uniformly at random, where $a(e) \in \mathcal{A}$ and $b(e) \in \mathcal{B}$. For each vertex $v \in \mathcal{V}$, let
$$A_v = \{a(e) : e = (u,v) \in \mathcal{E}\}, \qquad (1)$$
$$B_v = \{b(e) : e = (u,v) \in \mathcal{E}\}. \qquad (2)$$
We aim to show that with a positive probability, the feature assignment $\{(A_v \mid B_v) : v \in \mathcal{V}\}$ co-represents $\mathcal{G}$. Clearly, if $e = (u,v) \in \mathcal{E}$, then by (1) and (2), we have $a(e) \in A_u \cap A_v$ and $b(e) \in B_u \cap B_v$. Therefore, $A_u \cap A_v \neq \emptyset$ and $B_u \cap B_v \neq \emptyset$. In order for the cointersection Condition to be satisfied, we need to show that with a positive probability, for every $(u,v) \notin \mathcal{E}$, either $A_u \cap A_v = \emptyset$ or $B_u \cap B_v = \emptyset$. To this end, we make use of the Lovász Local Lemma [18].

The classical Lovász Local Lemma may be stated as follows. Suppose that there are $m$ bad events $E_1, E_2, \dots, E_m$, each occurring with probability at most $P$. Moreover, each event is mutually independent of all but at most $D$ of the other events. If $PD \le 1/4$, then
$$\Pr\Big(\bigcap_{i=1}^{m} \overline{E_i}\Big) > 0.$$
In other words, with a positive probability, we can avoid all bad events simultaneously.

We define our set of bad events as follows. For each $(u,v) \notin \mathcal{E}$, we let $E_{u,v}$ denote the event that $A_u \cap A_v \neq \emptyset$ and $B_u \cap B_v \neq \emptyset$. For each event $E_{u,v}$, we need to find an upper bound on the probability that it happens and on the number of other events that it may depend on.

First, we estimate the probability that each $E_{u,v}$ occurs. Since $\Delta(\mathcal{G}) \le d$, each vertex $v \in \mathcal{V}$ is incident to at most $d$ edges. Therefore, by (1) and (2), $|A_v| \le d$ and $|B_v| \le d$ for every $v \in \mathcal{V}$. To obtain an upper bound on the probability that $A_u \cap A_v \neq \emptyset$, we may assume that $A_u$ and $A_v$ are as large as possible, i.e., $|A_u| = |A_v| = d$. Moreover, since $u$ and $v$ do not have any incident edges in common, their sets of $\mathcal{A}$-features are independent. Therefore, we can treat $A_u$ and $A_v$ as two arbitrary subsets of $[\alpha]$ of size $d$. Then we have
$$\Pr(A_u \cap A_v \neq \emptyset) \le \frac{d\binom{\alpha}{d-1}}{\binom{\alpha}{d}} = \frac{d^2}{\alpha - d + 1}.$$
Similarly,
$$\Pr(B_u \cap B_v \neq \emptyset) \le \frac{d\binom{\beta}{d-1}}{\binom{\beta}{d}} = \frac{d^2}{\beta - d + 1}.$$
Thus, we deduce that for $(u,v) \notin \mathcal{E}$,
$$\Pr(E_{u,v}) = \Pr(A_u \cap A_v \neq \emptyset) \times \Pr(B_u \cap B_v \neq \emptyset) \le P = \frac{d^4}{(\alpha - d + 1)(\beta - d + 1)}. \qquad (3)$$

Second, we bound the number of other events on which a given event $E_{u,v}$ may depend. If $(u,v) \notin \mathcal{E}$ and $(w,x) \notin \mathcal{E}$, then the two events $E_{u,v}$ and $E_{w,x}$ can be dependent only if either there exist $z \in \{u,v\}$ and $z' \in \{w,x\}$ such that $(z,z') \in \mathcal{E}$, or $|\{u,v,w,x\}| \le 3$. For each $(u,v) \notin \mathcal{E}$, there are at most $2dn$ pairs $\{w,x\}$ that meet the first criterion and at most $2n$ pairs that meet the second. Therefore, each event $E_{u,v}$ depends on at most $D = 2n(d+1)$ other events.

By the Lovász Local Lemma, it remains to prove that $PD \le 1/4$. Recall that we assumed that $\alpha = \beta = 8d^{5/2}n^{1/2}$, so that $PD = 2n(d+1)d^4/(\alpha - d + 1)^2$. Hence, we need to show that
$$\big(8d^{5/2}n^{1/2} - d + 1\big)^2 \ge 8d^4(d+1)n. \qquad (4)$$
This claim may be established as follows:
$$
\begin{aligned}
\big(8d^{5/2}n^{1/2} - d + 1\big)^2 &\ge \big(8d^{5/2}n^{1/2} - 2\sqrt{2}\,d\big)^2 \\
&= 8d^2\big(2\sqrt{2}\,d^{3/2}n^{1/2} - 1\big)^2 \\
&= 8d^2\big(8d^3n - 4\sqrt{2}\,d^{3/2}n^{1/2} + 1\big) \\
&\ge 8d^2\Big((d^3n + d^2n) + \big(7d^3n - d^2n - 4\sqrt{2}\,d^{3/2}n^{1/2}\big)\Big) \\
&= 8d^2\Big(d^2(d+1)n + \big((7d-1)d^{1/2}n^{1/2} - 4\sqrt{2}\big)d^{3/2}n^{1/2}\Big) \\
&> 8d^4(d+1)n.
\end{aligned}
$$
The last inequality is due to the fact that for $n \ge d \ge 1$, we have $(7d-1)d^{1/2}n^{1/2} \ge 6 > 4\sqrt{2}$. This completes the proof.

For triangle-free $d$-regular graphs on $n$ vertices, every edge is a maximal clique, so $\theta_1(\mathcal{G}) = |\mathcal{E}| = dn/2$, and by Corollary 1, $\theta_c(\mathcal{G}) \ge 2\sqrt{\theta_1(\mathcal{G})} = \sqrt{2d}\,\sqrt{n}$. Therefore, in this case, the upper bound given by Theorem 5 is optimal up to a constant factor depending on $d$.

Recall that $\theta_2(\mathcal{G})$ denotes the 2-intersection number of $\mathcal{G}$. As already pointed out, Eaton et al. [15] showed that $\theta_2(\mathcal{G}) \le 1 + \theta_1(\mathcal{G})$ for a general graph $\mathcal{G}$, and $\theta_2(\mathcal{G}) \le 6e\,d^2(d+1)^{1/2}\sqrt{n}$ for a graph of bounded degree $d$ (Theorem 4 with $p = 2$). The former bound is the same as the upper bound for $\theta_c(\mathcal{G})$ in Lemma 1, and the latter is essentially the same as the upper bound for $\theta_c(\mathcal{G})$ in Theorem 5. However, $\theta_c(\mathcal{G})$ and $\theta_2(\mathcal{G})$ can be vastly different for certain families of graphs. For instance, we establish in Proposition 3 in Section IV that for a complete balanced bipartite graph $\mathcal{G}$ with vertex set $\mathcal{V}$, $\theta_c(\mathcal{G}) = |\mathcal{V}|$, while $\theta_2(\mathcal{G})$ is quadratic in $|\mathcal{V}|$ (see Chung and West [11] for the latter claim).

IV. TIGHTNESS OF THE BOUNDS

We discuss next the tightness of the bounds on $\theta_c(\mathcal{G})$ for several families of graphs. In addition, we link the existence of cointersection representations of certain complete multipartite graphs that achieve the lower bound with the existence of specific affine planes.

The first result shows that for graphs with very small $\theta_1$, the upper bound $\theta_c(\mathcal{G}) \le 1 + \theta_1(\mathcal{G})$ is actually tight.

Proposition 1. The upper bound $\theta_c(\mathcal{G}) \le 1 + \theta_1(\mathcal{G})$ stated in Lemma 1 is tight when $\theta_1(\mathcal{G}) \le 3$.

Proof: It is easy to check that when $\theta_1(\mathcal{G}) \le 3$, the left-hand side and the right-hand side of (5) coincide: $\lceil 2\sqrt{1}\,\rceil = 2$, $\lceil 2\sqrt{2}\,\rceil = 3$, and $\lceil 2\sqrt{3}\,\rceil = 4$.

Next, we demonstrate that for some simple graphs, the lower bound $\alpha\beta \ge \theta_1(\mathcal{G})$ established in Lemma 3 is also sufficient for the existence of an $(\alpha|\beta)$-CIR. As $\theta_1$ is known for these graphs, $\theta_c$ can be determined explicitly.

Proposition 2. If $\alpha\beta \ge \theta_1(\mathcal{G})$, then there exists an $(\alpha|\beta)$-CIR of $\mathcal{G}$ when $\mathcal{G}$ is a star $\mathcal{S}_n$, a path $\mathcal{P}_n$, or a cycle $\mathcal{C}_n$.

Proof: Suppose that $\mathcal{G} \equiv \mathcal{S}_n$ is a star graph on $n$ vertices. Let $\mathcal{A}$ and $\mathcal{B}$ be two disjoint subsets of features of sizes $\alpha$ and $\beta$, respectively. Suppose that $\mathcal{S}_n$ has edges $(1,2), (1,3), \dots, (1,n)$. Since $|\mathcal{A}||\mathcal{B}| \ge n - 1 = \theta_1(\mathcal{S}_n)$, we can assign distinct pairs $(a,b) \in \mathcal{A} \times \mathcal{B}$ to the edges of $\mathcal{S}_n$. For each vertex $v \in \{2, \dots, n\}$, let $A_v = \{a_{1,v}\}$ and $B_v = \{b_{1,v}\}$, where $\{a_{1,v} \mid b_{1,v}\}$ is the pair of features assigned to the edge $(1,v)$. Also, we let $A_1 = \mathcal{A}$ and $B_1 = \mathcal{B}$. It is clear that this is an $(\alpha|\beta)$-CIR of $\mathcal{S}_n$.

Next, suppose that $\mathcal{G} \equiv \mathcal{P}_n$ is a path on $n$ vertices with edges $(v, v+1)$, $1 \le v < n$. Recall that $\theta_1(\mathcal{P}_n) = n - 1$. To simplify the notation, we assume that $\alpha\beta = \theta_1(\mathcal{P}_n) = n - 1$; the case of strict inequality can be proved in the same manner. Furthermore, let $\mathcal{A} = \{a_1, \dots, a_\alpha\}$ and $\mathcal{B} = \{b_1, \dots, b_\beta\}$.

We describe next an $(\alpha|\beta)$-CIR of $\mathcal{P}_n$. We first split the $n-1$ edges of $\mathcal{P}_n$, in their natural order, into $\alpha$ equal-sized groups, each consisting of $\beta$ consecutive edges. We then assign $\{a_1 \mid b_1\}, \{a_1 \mid b_2\}, \dots, \{a_1 \mid b_\beta\}$ as features to the first group of $\beta$ edges, in that order. For the next group of $\beta$ edges, we assign the sequence of features $\{a_2 \mid b_\beta\}, \{a_2 \mid b_{\beta-1}\}, \dots, \{a_2 \mid b_1\}$. For the third group of $\beta$ edges, we use the sequence $\{a_3 \mid b_1\}, \{a_3 \mid b_2\}, \dots, \{a_3 \mid b_\beta\}$. Note that we used an increasing order for the indices of the sequence $b_j$ in the first group, a decreasing order in the second group, and again an increasing order in the third group. We continue to assign features in this way until reaching the last group of edges. We illustrate this feature assignment for the edges of $\mathcal{P}_{13}$ below, where we set $\mathcal{A} = \{1, 2, 3\}$ and $\mathcal{B} = \{4, 5, 6, 7\}$.

[Figure: the edges $(1,2), (2,3), \dots, (12,13)$ of $\mathcal{P}_{13}$ receive, in order, the feature pairs $\{1|4\}, \{1|5\}, \{1|6\}, \{1|7\}, \{2|7\}, \{2|6\}, \{2|5\}, \{2|4\}, \{3|4\}, \{3|5\}, \{3|6\}, \{3|7\}$.]

We use $\{a(e) \mid b(e)\}$ to denote the pair of features assigned to an edge $e$. Then we assign to each vertex $v \in \mathcal{P}_n$ two feature sets $A_v = \{a(e) : e \text{ is incident to } v\}$ and $B_v = \{b(e) : e \text{ is incident to } v\}$. For example, the features of the vertices of $\mathcal{P}_{13}$ are given below.

[Figure: vertex feature sets of $\mathcal{P}_{13}$: 1: $\{1|4\}$, 2: $\{1|4,5\}$, 3: $\{1|5,6\}$, 4: $\{1|6,7\}$, 5: $\{1,2|7\}$, 6: $\{2|6,7\}$, 7: $\{2|5,6\}$, 8: $\{2|4,5\}$, 9: $\{2,3|4\}$, 10: $\{3|4,5\}$, 11: $\{3|5,6\}$, 12: $\{3|6,7\}$, 13: $\{3|7\}$.]

We can verify that this is an $(\alpha|\beta)$-CIR of $\mathcal{P}_n$. Due to the way we assign features to the vertices, each vertex has precisely the feature pairs $\{a \mid b\}$, where $a \in \mathcal{A}$ and $b \in \mathcal{B}$, assigned to the edges incident to that vertex. Moreover, different edges are assigned different feature pairs. Consequently, two distinct vertices share a common feature pair only if they share a common edge.

The proof for cycles proceeds along the same lines as the proof for paths, except for one added modification. Recall that $\theta_1(\mathcal{G}) = n$ if $\mathcal{G} \equiv \mathcal{C}_n$ is a cycle on $n \ge 4$ vertices. Suppose that $\alpha\beta = n$ (the case $\alpha\beta > n$ can be dealt with in the same manner). We split the $n$ edges of $\mathcal{C}_n$ into $\alpha$ equal-sized groups, each consisting of $\beta$ consecutive edges. As demonstrated for paths, the key idea is to assign features to edges so that different edges receive different pairs of features and, moreover, the set of feature pairs of each vertex consists precisely of the feature pairs assigned to its two incident edges. When $\alpha$ is even, we assign features to the $\alpha$ groups of edges of $\mathcal{C}_n$ and then deduce the set of features assigned to each vertex in the same way as for paths. When $\alpha$ is odd, this feature assignment may no longer work, because vertex 1 of the cycle would now be assigned the two sets of features $\{a_1, a_\alpha\}$ and $\{b_1, b_\beta\}$; as a result, it would have four instead of two feature pairs, namely $\{a_1 \mid b_1\}$, $\{a_1 \mid b_\beta\}$, $\{a_\alpha \mid b_1\}$, and $\{a_\alpha \mid b_\beta\}$. As a consequence, this vertex may share a common pair of features with some other vertices that are not adjacent to it. For instance, for $n = 9 = 3 \times 3$, the currently discussed feature assignment for $\mathcal{C}_9$, shown in Fig. 4, violates the cointersection Condition.

[Fig. 4: the edges $(1,2), \dots, (8,9), (9,1)$ of $\mathcal{C}_9$ receive, in order, $\{1|4\}, \{1|5\}, \{1|6\}, \{2|6\}, \{2|5\}, \{2|4\}, \{3|4\}, \{3|5\}, \{3|6\}$; vertex 1 receives $\{1,3|4,6\}$ and vertex 4 receives $\{1,2|6\}$, so the two share the violated pair $\{1|6\}$.]

Fig. 4. An example where the discussed feature assignment for paths does not apply to the case of a cycle, say $\mathcal{C}_9$. The two vertices 1 and 4 share a pair of common features $\{1|6\}$, even though they are not adjacent. Here we set $\mathcal{A} = \{1, 2, 3\}$ and $\mathcal{B} = \{4, 5, 6\}$.

We correct this issue as follows. Suppose that $\alpha \ge 3$ (the case $\alpha = 1$ and $\beta = n$ is trivial, due to Lemma 1). We assign features to the first $\alpha - 2$ groups of edges of $\mathcal{C}_n$ in the same way as for paths. For the $(\alpha-1)$th group, instead of assigning $\{a_{\alpha-1} \mid b_\beta\}, \dots, \{a_{\alpha-1} \mid b_1\}$, we assign $\{a_{\alpha-1} \mid b_\beta\}, \dots, \{a_{\alpha-1} \mid b_3\}, \{a_{\alpha-1} \mid b_1\}, \{a_{\alpha-1} \mid b_2\}$ to the edges, in this order. For the $\alpha$th group, instead of assigning $\{a_\alpha \mid b_1\}, \dots, \{a_\alpha \mid b_\beta\}$, we assign $\{a_\alpha \mid b_2\}, \{a_\alpha \mid b_3\}, \dots, \{a_\alpha \mid b_\beta\}, \{a_\alpha \mid b_1\}$ to the edges. In this way, we guarantee that vertex 1, like every other vertex, is assigned exactly two feature pairs, and hence, two vertices share a common feature pair if and only if they are incident to the same edge. We illustrate this feature assignment in Fig. 5.

[Fig. 5: the edges $(1,2), \dots, (8,9), (9,1)$ of $\mathcal{C}_9$ receive, in order, $\{1|4\}, \{1|5\}, \{1|6\}, \{2|6\}, \{2|4\}, \{2|5\}, \{3|5\}, \{3|6\}, \{3|4\}$; the vertex feature sets are 1: $\{1,3|4\}$, 2: $\{1|4,5\}$, 3: $\{1|5,6\}$, 4: $\{1,2|6\}$, 5: $\{2|4,6\}$, 6: $\{2|4,5\}$, 7: $\{2,3|5\}$, 8: $\{3|5,6\}$, 9: $\{3|4,6\}$.]

Fig. 5. An example of a $(3|3)$-cointersection representation of $\mathcal{C}_9$. Here we set $\mathcal{A} = \{1, 2, 3\}$ and $\mathcal{B} = \{4, 5, 6\}$.

Corollary 2. If $\mathcal{G}$ is a star, a path, or a cycle, then
$$\left\lceil 2\sqrt{\theta_1(\mathcal{G})} \right\rceil \le \theta_c(\mathcal{G}) \le 2\left\lceil \sqrt{\theta_1(\mathcal{G})} \right\rceil.$$

Proof: By Corollary 1, we have $\theta_c(\mathcal{G}) \ge \lceil 2\sqrt{\theta_1(\mathcal{G})}\,\rceil$. Moreover, by Proposition 2, if $\mathcal{G}$ is a star, a path, or a cycle, then $\mathcal{G}$ has a $(\lceil\sqrt{\theta_1(\mathcal{G})}\rceil \mid \lceil\sqrt{\theta_1(\mathcal{G})}\rceil)$-CIR, which uses $2\lceil\sqrt{\theta_1(\mathcal{G})}\rceil$ features in total. Hence, $\lceil 2\sqrt{\theta_1(\mathcal{G})}\,\rceil \le \theta_c(\mathcal{G}) \le 2\lceil\sqrt{\theta_1(\mathcal{G})}\rceil$, which establishes our assertion for stars, paths, and cycles.

Similar results also hold for complete multipartite graphs $\mathcal{K}_{n,\dots,n}$ with certain parameters, as shown in the subsequent results. Note that for a complete bipartite graph $\mathcal{K}_{n,n}$, we have $\theta_1(\mathcal{K}_{n,n}) = n^2$, which is precisely the number of edges. We henceforth denote the set $\{1, 2, \dots, m\}$ by $[m]$.

Proposition 3. If $n = ts$, then a $(t \mid ts^2)$-CIR exists for $\mathcal{K}_{n,n}$. As a consequence, $\theta_c(\mathcal{K}_{n,n}) = 2n = 2\sqrt{\theta_1(\mathcal{K}_{n,n})}$.

Proof: We first explain why the second assertion follows from the first. Let $t = n$ and $s = 1$. Then an $(n|n)$-CIR of $\mathcal{K}_{n,n}$ exists, which uses exactly $2n$ features. Combining this result with Corollary 1, we have
$$2n = 2\sqrt{\theta_1(\mathcal{K}_{n,n})} \le \theta_c(\mathcal{K}_{n,n}) \le 2n,$$
which implies that
$$\theta_c(\mathcal{K}_{n,n}) = 2n = 2\sqrt{\theta_1(\mathcal{K}_{n,n})}.$$
Note that this equality may also be deduced by combining Corollary 1 and Lemma 2.

We now prove the first assertion of the proposition. Let $\mathcal{A} = \{a_1, \dots, a_t\}$ and $\mathcal{B} = \{b_1, \dots, b_{ts^2}\}$. Let $R_1, \dots, R_s$ be disjoint subsets of $\mathcal{B}$ of size $ts$ that partition $\mathcal{B}$. Moreover, let $C_1, \dots, C_{ts}$ be disjoint subsets of $\mathcal{B}$ of size $s$ that partition $\mathcal{B}$. In addition, let $|R_i \cap C_j| = 1$ for every $i \in [s]$ and $j \in [ts]$. For instance, if we arrange the $ts^2$ elements of $\mathcal{B}$ in an $s \times (ts)$ matrix, then we can simply let $R_i$ be the set of $ts$ elements in the $i$th row and let $C_j$ be the set of $s$ elements in the $j$th column.

We assign feature sets to each vertex of $\mathcal{K}_{n,n}$ as follows. Suppose that $\mathcal{V}(\mathcal{K}_{n,n}) = \{1, \dots, n\} \cup \{n+1, \dots, 2n\}$, and let $\mathcal{E}(\mathcal{K}_{n,n}) = \{(i,j) : 1 \le i \le n,\ n+1 \le j \le 2n\}$. First, for a vertex $i \in \{1, \dots, n\}$, we write $i = (i_a - 1)s + i_b$, where $1 \le i_a \le t$ and $1 \le i_b \le s$. Then we assign $A_i = \{a_{i_a}\}$ and $B_i = R_{i_b}$. For a vertex $i \in \{n+1, \dots, 2n\}$, we assign $A_i = \mathcal{A} = \{a_1, \dots, a_t\}$ and $B_i = C_{i-n}$. Recall that $n = ts$, which is precisely the number of sets $C_j$ that we have. For example, when $n = 6$, $t = 2$, and $s = 3$, the sets $R_i$ and $C_j$ consist of the elements in the correspondingly indexed rows and columns, respectively, of the matrix given below.

          C1    C2    C3    C4    C5    C6
    R1    b1    b2    b3    b4    b5    b6
    R2    b7    b8    b9    b10   b11   b12
    R3    b13   b14   b15   b16   b17   b18

The resulting $(2|18)$-CIR of $\mathcal{K}_{6,6}$, constructed as described above, is illustrated in Fig. 6.

[Fig. 6: vertices 1 to 6 receive $\{a_1|R_1\}, \{a_1|R_2\}, \{a_1|R_3\}, \{a_2|R_1\}, \{a_2|R_2\}, \{a_2|R_3\}$; vertices 7 to 12 receive $\{a_1,a_2|C_1\}, \dots, \{a_1,a_2|C_6\}$.]

Fig. 6. A $(2|18)$-CIR of $\mathcal{K}_{6,6}$. The sets $R_1$, $R_2$, and $R_3$ are pairwise disjoint. The sets $C_1, \dots, C_6$ are also pairwise disjoint. Each pair of sets $R_i$ and $C_j$ has an intersection of size one. Both the $R_i$'s and the $C_j$'s are subsets of $\{b_1, \dots, b_{18}\}$.

We now proceed to verify that this feature assignment is indeed a cointersection representation of $\mathcal{K}_{n,n}$.

We first verify that the cointersection Condition holds for the non-edges of $\mathcal{K}_{n,n}$. For $1 \le i \neq i' \le n$, either $i_a \neq i'_a$ or $i_b \neq i'_b$. If $i_a \neq i'_a$, then $A_i \cap A_{i'} = \{a_{i_a}\} \cap \{a_{i'_a}\} = \emptyset$. If $i_b \neq i'_b$, then $B_i \cap B_{i'} = R_{i_b} \cap R_{i'_b} = \emptyset$, because the sets $R_i$ form a partition. In either case, we have $A_i \cap A_{i'} = \emptyset$ or $B_i \cap B_{i'} = \emptyset$. For $n+1 \le i \neq i' \le 2n$, we always have $B_i \cap B_{i'} = C_{i-n} \cap C_{i'-n} = \emptyset$, since the sets $C_j$ are pairwise disjoint.

Next, we verify that the cointersection Condition holds for the edges of $\mathcal{K}_{n,n}$. Indeed, for $1 \le i \le n$ and $n+1 \le j \le 2n$, we have $A_i \cap A_j = \{a_{i_a}\} \cap \mathcal{A} \neq \emptyset$, and moreover, $B_i \cap B_j = R_{i_b} \cap C_{j-n} \neq \emptyset$, because we assumed that $|R_i \cap C_j| = 1$ for every $i \in [s]$ and $j \in [ts]$. Thus, we have constructed a $(t \mid ts^2)$-CIR of $\mathcal{K}_{n,n}$.

Before proceeding with our discussion, we review a few definitions from the theory of combinatorial designs (see, e.g., [19, VI.40]). Let $n \ge k \ge 2$. A 2-$(n,k,1)$ packing is a pair $(\mathcal{X}, \mathcal{S})$, where $\mathcal{X}$ is a set of $n$ elements (points) and $\mathcal{S}$ is a collection of subsets of $\mathcal{X}$ of size $k$ (blocks), such that every pair of points occurs in at most one block of $\mathcal{S}$. A 2-$(n,k,1)$ packing $(\mathcal{X}, \mathcal{S})$ is resolvable if $\mathcal{S}$ can be partitioned into parallel classes, each comprising $n/k$ blocks that partition $\mathcal{X}$. We provide an example of a 2-$(9,3,1)$ resolvable packing below; each column forms one parallel class.

    {1,2,3}   {1,4,7}   {1,5,9}   {1,6,8}
    {4,5,6}   {2,5,8}   {2,6,7}   {2,4,9}
    {7,8,9}   {3,6,9}   {3,4,8}   {3,5,7}

Fig. 7. A 2-$(9,3,1)$ resolvable packing with four parallel classes.

The following simple lemma describes a property of a 2-$(k^2,k,1)$ resolvable packing that will be of importance in the proof of the upcoming Theorem 6.

Lemma 4. Let $(\mathcal{X}, \mathcal{S})$ be a 2-$(k^2,k,1)$ resolvable packing. If $S \in \mathcal{S}$ and $S' \in \mathcal{S}$ are two blocks from different parallel classes, then $|S \cap S'| = 1$.

Proof: By the definition of a packing, every pair of points is contained in at most one block. Therefore, any two different blocks have at most one point in common. Hence, $|S \cap S'| \le 1$. Suppose that $S$ and $S'$ belong to two different parallel classes $\mathcal{C}$ and $\mathcal{C}'$, respectively. Note that each parallel class consists of precisely $k^2/k = k$ disjoint blocks, and these $k$ blocks together partition the set $\mathcal{X}$. Therefore, since $S' \notin \mathcal{C}$, it must intersect each block in $\mathcal{C}$ in at least one point, for otherwise
$$|S'| = \Big|\bigcup_{S \in \mathcal{C}} (S' \cap S)\Big| = \sum_{S \in \mathcal{C}} |S' \cap S| < \sum_{S \in \mathcal{C}} 1 = k,$$
a contradiction. Hence, $|S \cap S'| \ge 1$. Thus, $|S \cap S'| = 1$.

Theorem 6. If there exists a 2-$(k^2,k,1)$ resolvable packing with at least $r \ge 2$ parallel classes, then $\theta_c(\mathcal{K}_{n[r]}) = 2n$, where $n = k^2$ and $\mathcal{K}_{n[r]}$ is the complete $r$-partite graph $\mathcal{K}_{n,\dots,n}$.

Proof: Note that for $r \ge 2$, $\mathcal{K}_{n,n}$ is an induced subgraph of $\mathcal{K}_{n[r]}$, and restricting a CIR of a graph to an induced subgraph yields a CIR of that subgraph. Therefore, by Proposition 3, we have
$$\theta_c(\mathcal{K}_{n[r]}) \ge \theta_c(\mathcal{K}_{n,n}) = 2n.$$
Hence, it remains to prove that we can co-represent $\mathcal{K}_{n[r]}$ using $2n$ features if a suitable resolvable packing exists.

Let us assume that a 2-$(n = k^2, k, 1)$ resolvable packing $(\mathcal{X}, \mathcal{S})$ with at least $r$ parallel classes, say $\mathcal{C}_1, \dots, \mathcal{C}_r$, exists. Let $\mathcal{A} = \{a_x : x \in \mathcal{X}\}$ and $\mathcal{B} = \{b_x : x \in \mathcal{X}\}$. Then $|\mathcal{A}| = |\mathcal{B}| = n$. We assign to the vertices of $\mathcal{K}_{n[r]}$ features from $\mathcal{A}$ and $\mathcal{B}$ as follows. Consider the $n$ vertices in the $\ell$th part $P_\ell$ of the graph ($\ell \in [r]$). We partition these $n = k^2$ vertices into $k$ groups, each of which consists of precisely $k$ vertices. Let $G^\ell_i = \{v^\ell_{i,j} : j \in [k]\}$ denote the $i$th vertex group of $P_\ell$, for $i \in [k]$ and $\ell \in [r]$. The vertices in $P_\ell$ are then assigned features according to the blocks in the $\ell$th parallel class $\mathcal{C}_\ell = \{S^\ell_1, \dots, S^\ell_k\}$ in the following way. The vertex $v^\ell_{i,j}$ in the $i$th group $G^\ell_i$ has feature sets $A_{v^\ell_{i,j}} = \{a_x : x \in S^\ell_i\}$ and $B_{v^\ell_{i,j}} = \{b_x : x \in S^\ell_j\}$.

We show next that the above feature assignment indeed satisfies the cointersection Condition.

First, we verify this condition for the non-edges of $\mathcal{K}_{n[r]}$. Consider each part $P_\ell$ of the graph. If $v^\ell_{i,j}$ and $v^\ell_{i,j'}$, where $j \neq j'$, are two distinct vertices that belong to the same group $G^\ell_i$, then
$$|B_{v^\ell_{i,j}} \cap B_{v^\ell_{i,j'}}| = |S^\ell_j \cap S^\ell_{j'}| = 0.$$
The reason is that when $j \neq j'$, $S^\ell_j$ and $S^\ell_{j'}$ are two distinct blocks in the same parallel class of the packing, and hence must be disjoint. If $v^\ell_{i,j}$ and $v^\ell_{i',j'}$ belong to different groups $G^\ell_i$ and $G^\ell_{i'}$, respectively, where $i \neq i'$, then
$$|A_{v^\ell_{i,j}} \cap A_{v^\ell_{i',j'}}| = |S^\ell_i \cap S^\ell_{i'}| = 0,$$
because $S^\ell_i$ and $S^\ell_{i'}$ are two distinct blocks in the same parallel class $\mathcal{C}_\ell$. Thus, every pair of vertices from the same part $P_\ell$ ($\ell \in [r]$) has either no $\mathcal{A}$-features or no $\mathcal{B}$-features in common.

Second, we verify the cointersection Condition for the edges of $\mathcal{K}_{n[r]}$ that connect vertices in different parts. Suppose that $v^\ell_{i,j} \in P_\ell$ and $v^{\ell'}_{i',j'} \in P_{\ell'}$, where $P_\ell$ and $P_{\ell'}$ are different parts of the complete $r$-partite graph. Then we have

assertion also holds because an affine plane of a prime power order always exists. The resolvable packing used in Example 1 is in fact an affine plane of order three.

In light of Corollary 3, it is apparently nontrivial to prove (theoretically or computationally) that $\theta_c(\mathcal{K}_{n[r]}) > 2n$, where $n = k^2$ and $r = k + 1$, when $k$ is not a prime power. Indeed, such a proof (if any) would imply that an affine plane of order $k$ does not exist. Note that whether an affine plane of an order which is not a prime power exists is still a widely open question in finite geometry. It is not even known whether an affine plane of order 12 or 15 exists (see, e.g., [19, VII.2.2]).

Corollary 4.
θc( ) = 2n for every n = k2, where k 2 A A = S(cid:96) S(cid:96)(cid:48) =1. Kn,n,n ≥ | vi(cid:96),j ∩ vi(cid:96)(cid:48)(cid:48),j(cid:48)| | i ∩ i(cid:48)| is not necessarily a prime power. The validity of the above claim follows from the observation Proof: By Theorem 6, it suffices to construct a 2-(k2,k,1) that for (cid:96) (cid:54)= (cid:96)(cid:48), the two blocks Si(cid:96) and Si(cid:96)(cid:48)(cid:48), which are from resolvable packing with three parallel classes for every k ≥ 2. different parallel classes of the packing, must intersect at one Let =[k2].Wecanarrangethesek2 pointsintoak kmatrix. X × point (according to Lemma 4). Similarly, we have Then the k blocks containing the points along the rows of this B B = S(cid:96) S(cid:96)(cid:48) =1. matrix form the first parallel class. The k blocks containing the | vi(cid:96),j ∩ vi(cid:96)(cid:48)(cid:48),j(cid:48)| | j ∩ j(cid:48)| points along the columns of this matrix form the second parallel Therefore, the cointersection Condition is satisfied for all edges class. The k blocks containing the points along the direction of of the graph. Thus, the assigned features form an (n,n)-CIR of themaindiagonalformthethirdparallelclass.Itiseasytoverify , which uses precisely 2n features, as desired. thattheseblocksandthethreeparallelclassesforma2-(k2,k,1) Kn[r] resolvable packing. Example 1. To illustrate the idea of Theorem 6, we consider 9,9,9,9 and the 2-(9,3,1) resolvable packing with four parallel 1,2,3,4 1,5,9,13 1,6,11,16 Kclasses 1, 2, 3, 4 given in Fig. 7. Note that by Theorem 6, {5,6,7,8} {2,6,10,14} {2,7,12,13} C C C C (cid:113) { } { } { } θc( )=θc( )=θc( )=2 θ ( )=18. 9,10,11,12 3,7,11,15 3,8,9,14 K9,9,9,9 K9,9,9 K9,9 1 K9,9 {13,14,15,16} {4,8,12,16} {4,5,10,15} { } { } { } We omit the edges of the graph and provide a (9,9)-CIR of Fig.9. A2-(16,4,1)resolvablepackingwiththreeparallelclasses. in Fig. 8. Note that in this figure, instead of a and b , 9,9,9,9 i j K we simply use i and j, respectively. 
For example, when k = 4, the three parallel classes of this {11,,22,,33| {14,,25,,36| {17,,28,,39| {41,,52,,63| {44,,55,,66| {47,,58,,69| {71,,82,,93| {74,,85,,96| {77,,88,,99| packing are given in Fig. 9. } } } } } } } } } Until this point, we have focused on providing several exam- 1,4,7 1,4,7 1,4,7 2,5,8 2,5,8 2,5,8 3,6,9 3,6,9 3,6,9 ples of graphs which meet the lower bound on θc established in {1,4,7| {2,5,8| {3,6,9| {1,4,7| {2,5,8| {3,6,9| {1,4,7| {2,5,8| {3,6,9| } } } } } } } } } Lemma 3. However, as we establish in subsequent propositions, the lower bound many not always be achievable. Note that by {11,,55,,99| {12,,56,,97| {13,,54,,98| {21,,65,,79| {22,,66,,77| {23,,64,,78| {31,,45,,89| {32,,46,,87| {33,,44,,88| Corollary 4, θc( n,n,n) = 2n for n = 4,9,16,... This is, in } } } } } } } } } K contrast, not true for n=2,3. 1,6,8 1,6,8 1,6,8 2,4,9 2,4,9 2,4,9 3,5,7 3,5,7 3,5,7 We first need to prove the following lemma, which states an {1,6,8| {2,4,9| {3,5,7| {1,6,8| {2,4,9| {3,5,7| {1,6,8| {2,4,9| {3,5,7| } } } } } } } } } important property of cointersection representations of triangle- free graphs (e.g. bipartite graphs) that meet the lower bound on Fig.8. Anoptimal(9,9)-CIRofK9,9,9,9 viaa2-(9,3,1)resolvablepacking θc inLemma3.Recallthatif =( , )isatriangle-freegraph, withfourclasses.Infact,thisisa2-(9,3,1)resolvabledesign,whichisalsoan then θ ( )= . G V E affineplaneoforder9. 1 G |E| Lemma5. Ifthereexistsan(α β)-CIRofatriangle-freegraph A 2-(n,k,1) resolvable design (see, e.g. [19, II.7]) is equiva- =( , ) where αβ = , the|n lenttoa2-(n,k,1)resolvablepackingdefinedearlier,exceptthat G V E |E| onerequiresthateverypairofpointsappearinexactlyoneblock. Av Bv =deg(v), | || | An affine plane of order k is a 2-(k2,k,1) resolvable design. So for every v V. Moreover, if (u,v) , then A A = far,onlyaffineplanesofordersthatareprimepowersareknown ∈ ∈ E | u ∩ v| B B =1. (see, e.g. [19, VII.2.2]). | u∩ v| Proof:Supposethat (A ,B ): v isan(α β)-CIRof Corollary 3. 
If there exists an affine plane of order k then { v v ∈V} | ,whereαβ = .Foreachedge(u,v) ,chooseanarbitrary θcoc(nKsenq[ru]e)nc=e,2thni,sfeoqruaelvietyryhorld≤s wkhe+n 1k,iswhaeprerimne=powk2e.r.As a Gfeature au,v ∈A|Eu|∩Av and an arbitrary∈feEature bu,v ∈Bu∩Bv and assign the pair a b to this edge. u,v u,v { | } Proof: It is well known that a 2-(k2,k,1) resolvable design We claim that different edges must have different pairs of has precisely k+1 parallel classes. As an affine plane of order features. Indeed, if (u,v) and (u(cid:48),v(cid:48)) are two different edges k is a 2-(k2,k,1) resolvable design, which is also a packing, by of such that a = a and b = b , then the four G u,v u(cid:48),v(cid:48) u,v u(cid:48),v(cid:48) Theorem 6, the first assertion of the corollary follows. The last vertices u,v,u(cid:48),v(cid:48) have a pair of features in common, namely a b . This implies that any three distinct vertices among When n = 2,3, the above lower bound on θc is attained. u,v u,v { | } these four must form a triangle in , which contradicts our Examples of (n 1,n)-CIRs of M when n = 2,3 are given G − Kn,n assumption that is triangle-free. Thus, different edges must be in Fig. 10. G assigneddifferentpairsoffeatures,asclaimed.Aconsequenceof M M K2,2 K3,3 this claim is that for every vertex v , the number of pairs of ∈V 1 2 1 3 1 3,4 2 3,4 1,2 5 features a b , where a A and b B , must be greater than { | } { | } { | } { | } { | } v v { | } ∈ ∈ or equal to the number of edges incident to v. In other words, A B deg(v), for every v . v v | || |≥ ∈V Moreover, by our assumption, the number of possible pairs of features a b , where a and b , is αβ, which is {1|3} {1|2} {2|3,5} {1|3,5} {1,2|4} { | } ∈ A ∈ B the same as the number of edges. Therefore, each such pair of Fig.10. A(1,2)-CIRofKM (left)anda(2,3)-CIRofKM (right). features must be used exactly once, as features of some edge. 
2,2 3,3 It is now clear that if (u,v) , then A A = 1 and u v ∈ E | ∩ | It remains to show that if n 1 is an odd prime then |Bu ∩Bv| = 1. For otherwise, we could replace the assigned θc( M )=2n 1. Suppose, by co−ntradiction, that θc( M )= feaa(cid:48)turbe(cid:48)s,{wauh,evre|ab(cid:48)u,v}Afuor (Auv,va)ndbyb(cid:48)a dBiffuerenBtvp.aBirutoafsfepartouvreeds 2nK−n1,n. T(cid:54)hen th−ere must exist an (n−1,n)-CIR of KKnM,nn,n. Let {earli|er,}a(cid:48) b(cid:48) mu∈stalre∩adyhavebee∈nused∩asapairoffeatures A = {a1,...,an−1} and B = {b1,...,bn}. Note that every ofsome{oth|ere}dge(u(cid:48),v(cid:48))=(u,v).Thatwouldimplyatriangle vertex of this graph has degree n−1. By Lemma 5, for every formed by some three dist(cid:54)inct vertices among u,v,u(cid:48), and v(cid:48), vertex v, A B =deg(v)=n 1. which, again, contradicts our assumption that is triangle-free. v v | || | − G Finally, suppose that A B > deg(v) for some v . | v|| v| ∈ V As n 1 is a prime number, we deduce that either Av =1 and Then there must be a pair of features a b , where a A − | | { | } ∈ v Bv = n 1 or Av = n 1 and Bv = 1. We consider the and b B , that is not assigned to any edge incident to v. | | − | | − | | ∈ v following three cases, distinguished by the number of vertices However, as shown earlier, this pair of features a b must be { | } that have only one -feature, and aim to obtain a contradiction used as features of some edge, say (u,w), that is not incident A in each case: to v. Then u, v, and w share the common features a and b and hence must form a triangle in , which is im∈poAssible. • Case 1. |Av| = n − 1 and |Bv| = 1 for all v ∈ V = ∈B G U V. Since A = for all i [n] and there are no Thus, |Av||Bv|=deg(v) for every v ∈V, as stated. 
edg∪es between thueise veArtices u ele∈ments, B B = ∅ gPirvoepnogsritaipohns,4t.heθlco(wKenr,nb,onu)n>dθ2cn≥fomrinnαβ=≥θ21,(3α.+Hβe)necset,abfolirshthede wHhoweneevveerr,aisu(cid:54)=1 ijs.aSdijmacielanrtltyo, vB2iv,i..∩.,Bvnvj,thwehseeunivee∩vretircueijs(cid:54)=mujst. in Lemma 3 is not tight. havethesame -featureasu .Wearriveatacontradiction. 1 B Proof: Since the graphs under consideration are small, • Case 2. There exists one vertex, say ui, satisfying |Aui| = 1, while other vertices in the same part have A = , one can determine their cointersection numbers by using the uj A j = i. By Lemma 5, B = n 1 and B = 1 for algorithm of Section V-B, resulting in θc(K2,2,2) = 5 and j (cid:54)= i. Moreover, as u| iusi|not adj−acent to |u ujf|or j = i, θc(K3,3,3)=8.Thisfactmayalsobeprovedtheoretically,based B(cid:54) B =∅.As i=nand B =n 1j,thisimp(cid:54) lies on the previously derived results for the induced subgraphs K2,2 thuati∩B uj= B |Bfo|r all j =|i.uSii|nce A− = for all and K3,3. The details of the proof are omitted due to lack of j = iuajs weBll,\theuicorrespondi(cid:54)ng elements uuj muAst be all space. (cid:54) j adjacent, which is not true. We arrive at a contradiction. Proposition 5. Let KnM,n be a bipartite matrix obtained from • Case 3. There exist two vertices, which we without loss of Kn,n by removing a maximum matching. Then generality label as ui and uj, that are in the same part of 2n−1≤θc(KnM,n)≤2n. tLheemgmraap5h,, aBnd w=hicBh sat=isfny |A1u.iS|in=ce|Anu>j|2=,B1. ThBen b=y pTrhiemleo,wtheernbθocu(ndMis)a=tta2inne.d when n = 2,3. If n−1 is an odd ∅. Therefo|reu,iA|ui |∩Auju|j = ∅−. Without loss ofugie∩nerauljit(cid:54)y, Kn,n let Aui = {ai} and Auj = {aj}. For any h ∈ [n]\{i,j}, V =P{rvo1o,f.:..L,evtnK}nMb,ent=wo(Vpa,rEts)oafnVd lseutchUth=at{u1,...,un} and ds{iaendic,ueacjev}hthisiastaAcsounbns=eecttoedf,bftooorthbaoAllthvhhu.=iAisa,nj|Ad. vTuhhj|e,∈nw{eB1,dned−=uc1e1},tahwnadet E ={(ui,vj): 1≤i(cid:54)=j ≤n}. 
Bvh ∩Bvk =v∅h foAr every h (cid:54)=(cid:54) k, h,k ∈ [n|]\vh{|i,j}. We can set B = b and B = b . As n 1 is an odd By Lemma 2 we have vh { h} vk { k} − prime, n 4. Therefore, we can choose h and k such that ≥ θc( M ) 2n. (6) h,k,i,j are distinct. Kn,n ≤ Since v and v are not adjacent to v , and moreover, since h k i Note that A = A = , we deduce that B b ,b = ∅. vh vk A vi ∩ { h k} θ1(KnM,n)=|E(KnM,n)|=n2−n=(n−1)n. dTehdeurecfeorteh,at|BBvi| ≤=n1−. S2i.mSilianrcley,|BBvh| ∈={11.,nW−e c1a}n, wseet | vi| | vj| Therefore, by Lemma 3, B = b and B = b . For any r = i,j, since u is vi { i} vj { j} (cid:54) r adjacent to v and v , the set b ,b is a subset of B . θc( M ) min (α+β)=(n 1)+n=2n 1. (7) i j { i j} ur Kn,n ≥αβ≥n(n−1) − − Therefore, |Bur| = n−1, and hence, |Aur| = 1, for all r [n]. By the pigeon hole principle, among the n vertices • The cycle 4 has a unique (2,2)-CIR, where the vertices ∈ C u ,...,u , there must be two distinct vertices, say u and from 1 to 4 are respectively assigned the following sets of 1 n r u , that satisfy A =A . Moreover, as B = B = features: a ,a b , a b ,b , a ,a b , a ns 1, we must huavre B us B =∅ as w|elul.rW| e o|btuasi|n a b ,b . { 1 2 | 1} { 1 | 1 2} { 1 2 | 2} { 2 | − ur ∩ us (cid:54) 1 2} contradiction,sincethecointersectionConditionisviolated. A graph may not have a unique cointersection representation, Thus,ifn 1isanoddprimethenθc( M )=2n 1.Combining even if we restrict ourselves to optimal (α,β) cointersection − Kn,n (cid:54) − this fact with (6) and (7), we conclude that θc( M ) = 2n in representations, where α and β are fixed, and α + β = θc. Kn,n this case. An example of two optimal (2,3)-CIRs of the path 7 that P An obvious corollary of Proposition 5 is that there exists are not equivalent is presented in Fig. 11. In fact, we prove in infinitely many bipartite graphs where the lower bound θc Corollary5thateverypath n,n 4,except 5,isnotuniquely ≥ P ≥ P min (α+β) established in Lemma 3 is not attained. cointersectable. 
A similar result also holds for cycles, but we αβ≥θ1 omit the proof due to lack of space. In fact, most paths have V. ALGORITHMSFORTHECOINTERSECTIONMODEL at least exponentially many nonequivalent optimal cointersection representations(Theorem7).Notethatapathoracycle,whichis In what follows, we develop two algorithms for finding (exact obviously diamond free, is always uniquely intersectable. These and approximate) cointersection representations of a graph. The results suggest that uniquely cointersectable graphs are even first algorithm is based on a transformation to instances of scarcer than uniquely intersectable ones. The problem of finding the Satisfiability Problem (SAT) and outputs an optimal coin- anecessaryand/orsufficientconditionforagraphtobeuniquely tersection representation, which uses exactly θc features. The cointersectable is also open. secondalgorithmisbasedonthewellknownsimulatedannealing approach, which produces an approximate cointersection repre- Theorem7. Everypath n withn 6hasatleast( √n 1 P ≥ (cid:100) − (cid:101)− sentation of a graph. More specifically, this algorithm inputs , 1)! nonequivalent optimal cointersection representations. G α, and β, and outputs feature assignments to all vertices of the Proof: The main idea behind the proof is to construct a list graph so as to maximize, as much as possible, the score of the ofatleast( √n 1 1)!optimalcointersectionrepresentations representation, i.e. the number of pairs (u,v) that satisfy the (cid:100) − (cid:101)− of ,andthenshowthatforeverypairofrepresentations,there n cointersection Condition. P exist two vertices whose sets of assigned features intersect in a nonequivalent manner. A. Uniqueness of Optimal Cointersection Representations Two nonequivalent optimal (2,3)-cointersection representa- Before presenting the two algorithms, we briefly discuss the tions of 7 are shown in Fig. 11. 
If we delete the last vertex and P question of uniqueness of an optimal cointersection representa- edge in the paths, we obtain two nonequivalent representations tion of a graph. Throughout our analysis, we tacitly assume that for 6. P α≤β for all (α,β)-CIRs. {a1|b1,b2} {a1,a2|b3} {a2|b1,b2} Twocointersectionrepresentationsareconsideredequivalentif 2 4 6 {a {a {a one can be obtained from the other by possibly swapping the set b1} |1b b3} |2b b2} |2b ofA-featuresandthesetofB-features(onlyif|A|=|B|),andby 1 a{1 | }23 a{1 | }35 a{2 | }17 permutingfeatureswithineachset.Agraphissaidtobeuniquely cointersectable if all of its optimal cointersection representations {a1|b1} {a1|b2,b3} {a2|b2,b3} {a2|b1} areequivalent.Theissueofuniquecointersectionrepresentations {a1|b1,b2} {a1,a2|b3} {a2|b1,b2} 2 4 6 isofimportanceinpracticalapplications,wheredifferentfeature {a {a {a assignmentalgorithmsmayconstructdiversesolutionsandwhere b1} |1b b3} |2b b1} |2b we would like to understand how many different solutions are 1 a{1 | }23 a{1 | }35 a{2 | }27 possible. The related concept of uniquely intersectable graphs {a1|b1} {a1|b2,b3} {a2|b1,b3} {a2|b1} was studied in [20], [21]. It was proved in [21, Thm. 3.2] that every diamond-free graph is uniquely intersectable (more Fig.11. Anillustrationoftwononequivalent,optimal(2,3)-CIRsofthepath precisely, uniquely intersectable with respect to a multifamily). P7. In the first (top) representation, vertex 1 and vertex 5 do not share any features,whileinthesecond(bottom)representation,theydoshareonefeature, Note that a diamond is obtained by removing one edge in . K4 b1. The problem of finding a necessary and sufficient condition for a graph to be uniquely intersectable is widely open. 
Now suppose that n 8 and that we have an optimal (α,β)- Some examples of uniquely cointersectable graphs include: CIRof .Ifβ α+2≥,then(α+1)(β 1)>αβ,andhenceby n P ≥ − • Cliques n, n 2, which have a unique (1,1)-CIR with Proposition2,thereisanotheroptimal(α+1,β 1)-CIRof n. K ≥ − P all vertices having features a b , We can repeat this argument to obtain an optimal representation 1 1 { | } • n e, n 2, where e=(u,v) is an arbitrary edge. This with α β α+1 (Note that this argument also reveals that K − ≥ ≤ ≤ graph has a unique (1,2)-CIR in which u is assigned the for paths, there always exists a balanced optimal cointersection pair of features a b , v is assigned a b , while all representation). By Lemma 3, α(α + 1) αβ θ ( ) = 1 1 1 2 1 n { | } { | } ≥ ≥ P other vertices (if any) are assigned the set a b ,b . n 1 7. Hence, β α 3. We also have β √n 1 . 1 1 2 { | } − ≥ ≥ ≥ ≥(cid:100) − (cid:101) • The path 5 has a unique (2,2)-CIR, where the vertices We describe next a list of (β 1)! (α,β)-cointersection rep- P − from 1 to 5 are respectively assigned the following sets of resentations of and proceed to prove that the representations n P features: a b , a b ,b , a ,a b , a b ,b , are pairwise nonequivalent. Each of these representations corre- 1 1 1 1 2 1 2 2 2 1 2 { | } { | } { | } { | } and a b , sponds to a particular permutation σ of the set 1,2,...,β 1 , 2 1 { | } { − } denoted by Rσ. Following the proof of Proposition 2 for paths, {a1|bz,bz+1} {a2|by,bz} wedegpesarteitaicohn,theexcseepttoffnor−p1oesdsgibelsyintthoeαlagsrtougprsooufp,βwcohnicsehcumtiavye inRσ {a1|bz} u{a1|bz+1} {a2|by} v {a2|bz} contain less than β edges if αβ > n 1. 
In all representations, {a1|bz,bz+1} {a2|by,bt} wtoethaessifigrnstβgrpoauiprs ooff βfeactounressecu{tai1v,e−b1e}d,g{eas1,ibn2}t,h.a.t.,o{rdae1r,.bβI}n inRσ′ {a1|bz} u{a1|bz+1} {a2|by} v {a2|bt} chooseufromthe1stgroup choosev fromthe2ndgroup the representation R , we continue to assign β pairs of fea- σ tures a ,b , a ,b , a ,b ,..., a ,b to the nex{t2groβu}p{of2 βσ(cβo−n1s)e}cu{tiv2e eσd(βg−es2)}in tha{t o1rdeσr(.1)S}imi- Fig.13. (Case2)Thefeaturesetsofuandv withrespecttoRσ andRσ(cid:48). larly, the third group of edges is assigned pairs of features (a3,bσ(1)),(a3,bσ(2)),..., in Rσ, and so forth. In general, the z or y > z + 1 then in Rσ the vertices u and v share one raunldeitsotoasassigsingnthdeifffeearetunrtefseabtjurienssauicthodaifwfearyentthgartotuhpesloafstedegdegse, fIefaytu=re,zna+m1e,lythbezn, winhiRleσi,ntRheσ(cid:48)v,etrhtiecyesdounaontdshvarsehaarneypfreeactuisreelsy. of one group is assigned the same bj as the first edge of the two features, namely bz and bz+1, while in Rσ(cid:48), they share only following group. This process is continued until all edges are one feature, namely b . z+1 assigned one pair of features each. Upon completion of this This completes the proof. procedure,eachvertexisassignedtheunionofthesetsoffeatures Corollary 5. None of the paths , n 4, except for , is n 5 assigned to its adjacent edges. According to the argument used P ≥ P uniquely cointersectable. in the proof of Proposition 2 for paths, each R represents an σ (α,β)-cointersection representation of Pn. Proof: By Proposition 2, P4 has a (1,3)-CIR as well as a It remains to prove that for two different permutations σ and (2,2)-CIR,bothofwhichareoptimal.Hence, 4 isnotuniquely P σ(cid:48) of 1,2,...,β 1 , there exist two distinct vertices u and v cointersectable. For n 6, according to Theorem 7, n has at { − } ≥ P whose sets of assigned features intersect differently in the two least 2 = ( √6 1 1)! nonequivalent optimal cointersection (cid:100) − (cid:101)− representations. 
More specifically, u lies within the first group representations, and is hence not uniquely cointersectable. of vertices and v lies within the second group of vertices. Let j [β 1] be the largest index satisfying z =(cid:52) σ(j)=t=(cid:52) σ(cid:48)(j). B. Feature Assignments via SAT Solvers ∈ − (cid:54) Then y =(cid:52) σ(j +1) = σ(cid:48)(j +1). Note that if j = β 1, one − For arbitrary α and β, it is an NP-complete problem to deter- may set y = β. Without loss of generality, let us also assume mine if an (α,β)-CIR exists; indeed, when α = 1, the problem that t > z. We select v (see Fig. 12 and Fig. 13) to be the becomes whether there exists an intersection representation that vertexadjacenttothetwoconsecutiveedgesinthesecondgroup usesβ features,whichisknowntobeNP-complete[22].Wedis- whichareassignedfeatures a ,b and a ,b inR .InR , { 2 y} { 2 z} σ σ(cid:48) cuss below a means of determining the cointersection number in v is adjacent to two edges with assigned features a ,b and { 2 y} a constructive manner, which also results in feature assignments a ,b .Asα 3,bothgroupshaveβ edgesandverticesuand { 2 t} ≥ for the vertices. The idea is to restate the cointersection problem v as described above always exist. as a Satisfiability Problem (SAT). Weconsidertwocaseswhichcorrespondtodifferentchoicesof Given α, β, and a graph on n vertices, we construct an u. It suffices to show that in both cases, u and v have a different G instance of a SAT problem that is satisfiable if and only if there number of common features in R and R . σ σ(cid:48) exists an (α,β)-CIR of . An optimal pair (α,β), therefore, can Case 1. t = z +1. We select u (see Fig. 12) as the vertex G be determined via a simple binary search. We use the variables adjacent to the two consecutive edges in the first group that are x and y , for u [n], a [α], b [β], where x =1 and u,a u,b u,a assigned features a ,b and a ,b in both R and R . 
∈ ∈ ∈ { 1 t} { 1 t+1} σ σ(cid:48) yu,b =1meanthatthevertexuisassignedafeaturea =[α] Note that t β 1, and hence t+1 β. Since y / z,t , we ∈A ≤ − ≤ ∈{ } and a feature b =[β], respectively. For each edge (u,v), we ∈B want the formula inRσ {a1|{bat}1|but{,bat1+|1}bt+1} {a2|{bay}2|bvy,{baz}2|bz} to be sa(cid:16)tis∨fiaa∈b[αle],(wxuh,iach∧ixsv,eaq)u(cid:17)iv∧al(cid:16)en∨tbt∈o[βt]h(eyur,ebq∧uiyrevm,b)e(cid:17)nt that(8u) inRσ′ {a1|{bat1}|but,{bta+11|}bt+1} {a2|b{ya}2|vby,{bat}2|bt} afaonnrddmavudlhdaaoivnneteosomamoceroencrjouemqnucmtiirovenemffeeonartmtuth,reawsteAain∈trAodauncde(xtbh∈evBa.rxiTaoblt)eu,Arwnuht,ihvc,ihas u,v,a u,a v,a chooseufromthe1stgroup choosev fromthe2ndgroup ↔ ∧ stands for Fig.12. (Case1)Thefeaturesetsofuandv withrespecttoRσ andRσ(cid:48). (Au,v,a∨xu,a)∧(Au,v,a∨xv,a)∧(Au,v,a∨xu,a∨xv,a). (9) Similarly, we include B (y y ), which stands for u,v,b u,b v,b considerthefollowingtwosub-cases.Ify <z ory >t+1,then ↔ ∧ (B y ) (B y ) (B y y ). (10) in Rσ the vertices u and v do not share any features, while in u,v,b∨ u,b ∧ u,v,b∨ v,b ∧ u,v,b∨ u,b∨ v,b R ,theydoshareonecommonfeature,namelyb .Ify =t+1, One may hence rewrite (8) as σ(cid:48) t then in R the vertices u and v share precisely one feature, σ ( A ) ( B ). (11) a∈[α] u,v,a b∈[β] u,v,b namely b , while in R , they share two features, b and b . ∨ ∧ ∨ t+1 σ(cid:48) t t+1 If(u,v)isnotanedge,weintroducethevariablesC andD Case 2. t > z +1. We select u (see Fig. 13) as the vertex u,v u,v adjacent to the two consecutive edges in the first group that are and the following clauses assigned a ,b and a ,b in both R and R . If y < C D , (12) { 1 z} { 1 z+1} σ σ(cid:48) u,v∨ u,v
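The edge constraints (8) to (11) are a standard Tseitin-style translation into CNF. The Python sketch below builds them for a small graph. It makes one assumption that should be flagged: the non-edge clauses after (12) are cut off in this excerpt, so the meaning given here to $C_{u,v}$ ("no shared $\mathcal{A}$-feature") and $D_{u,v}$ ("no shared $\mathcal{B}$-feature") is a guess, not necessarily the authors' exact clauses. All function names are hypothetical.

The sketch does not call a solver. Instead, it checks that a known feature assignment, extended to the auxiliary variables, satisfies every clause. It can also serialize the clauses in the DIMACS CNF format that common SAT solvers read.

```python
from itertools import combinations

# A literal is (variable_name, polarity); a clause is a list of literals (an OR).
# Names follow the text: ("x", u, a), ("y", u, b), ("A", u, v, a), ("B", u, v, b),
# plus ("C", u, v) and ("D", u, v) for non-edges.

def edge_clauses(u, v, alpha, beta):
    """Clauses (9)-(11): u and v must share some A-feature and some B-feature."""
    clauses = []
    for a in range(alpha):
        A, xu, xv = ("A", u, v, a), ("x", u, a), ("x", v, a)
        # (9): A_{u,v,a} <-> (x_{u,a} AND x_{v,a})
        clauses += [[(A, False), (xu, True)],
                    [(A, False), (xv, True)],
                    [(A, True), (xu, False), (xv, False)]]
    for b in range(beta):
        B, yu, yv = ("B", u, v, b), ("y", u, b), ("y", v, b)
        # (10): B_{u,v,b} <-> (y_{u,b} AND y_{v,b})
        clauses += [[(B, False), (yu, True)],
                    [(B, False), (yv, True)],
                    [(B, True), (yu, False), (yv, False)]]
    # (11): at least one shared A-feature and at least one shared B-feature
    clauses.append([(("A", u, v, a), True) for a in range(alpha)])
    clauses.append([(("B", u, v, b), True) for b in range(beta)])
    return clauses


def nonedge_clauses(u, v, alpha, beta):
    """ASSUMED completion of (12): C_{u,v} forbids a shared A-feature,
    D_{u,v} forbids a shared B-feature, and at least one of them must hold."""
    C, D = ("C", u, v), ("D", u, v)
    clauses = [[(C, True), (D, True)]]
    for a in range(alpha):
        clauses.append([(C, False), (("x", u, a), False), (("x", v, a), False)])
    for b in range(beta):
        clauses.append([(D, False), (("y", u, b), False), (("y", v, b), False)])
    return clauses


def cnf(n, edges, alpha, beta):
    """CNF intended to be satisfiable iff the graph on 0..n-1 has an (alpha, beta)-CIR."""
    edge_set = {frozenset(e) for e in edges}
    clauses = []
    for u, v in combinations(range(n), 2):
        if frozenset((u, v)) in edge_set:
            clauses += edge_clauses(u, v, alpha, beta)
        else:
            clauses += nonedge_clauses(u, v, alpha, beta)
    return clauses


def assignment_from_cir(feats, alpha, beta):
    """Truth assignment induced by feats[u] = (A-indices, B-indices), auxiliaries included."""
    val = {}
    for u, (Au, Bu) in enumerate(feats):
        for a in range(alpha):
            val[("x", u, a)] = a in Au
        for b in range(beta):
            val[("y", u, b)] = b in Bu
    for u, v in combinations(range(len(feats)), 2):
        (Au, Bu), (Av, Bv) = feats[u], feats[v]
        for a in range(alpha):
            val[("A", u, v, a)] = a in Au and a in Av
        for b in range(beta):
            val[("B", u, v, b)] = b in Bu and b in Bv
        val[("C", u, v)] = not (Au & Av)
        val[("D", u, v)] = not (Bu & Bv)
    return val


def satisfied(clauses, val):
    """True iff every clause contains at least one literal made true by val."""
    return all(any(val[name] == pol for name, pol in clause) for clause in clauses)


def to_dimacs(clauses):
    """Serialize to DIMACS CNF: header 'p cnf <#vars> <#clauses>', each clause ends in 0."""
    ids, lines = {}, []
    for clause in clauses:
        lits = []
        for name, pol in clause:
            idx = ids.setdefault(name, len(ids) + 1)
            lits.append(str(idx if pol else -idx))
        lines.append(" ".join(lits) + " 0")
    return "\n".join([f"p cnf {len(ids)} {len(clauses)}"] + lines)


# The (2,2)-CIR of the cycle C4 listed in Section V-A, with 0-based vertices and features.
C4_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]
C4_CIR = [({0, 1}, {0}), ({0}, {0, 1}), ({0, 1}, {1}), ({1}, {0, 1})]
```

With these definitions, `satisfied(cnf(4, C4_EDGES, 2, 2), assignment_from_cir(C4_CIR, 2, 2))` holds. Giving vertex 4 the same features as vertex 2 violates the non-edge clause (12) between those two vertices. Finding an optimal pair $(\alpha,\beta)$ would still require running an actual solver on the DIMACS output for each candidate pair.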

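The constructions in the proofs of Proposition 3 (first assertion), Theorem 6, and Corollary 4 are explicit enough to check by brute force. The Python sketch below does the following:

- builds the three parallel classes of Corollary 4 (rows, columns, and wrapped diagonals of a $k \times k$ grid);
- checks Lemma 4;
- assigns features as in Theorem 6 and builds the $(t, ts^2)$-CIR of $K_{n,n}$ from Proposition 3;
- tests the cointersection Condition for a complete multipartite graph: two vertices are adjacent exactly when they share an $\mathcal{A}$-feature and a $\mathcal{B}$-feature.

As a simplifying assumption, the features $a_x$ and $b_x$ are represented directly by the point $x$ (or the matrix cell) that indexes them. The helper names are hypothetical.

```python
from itertools import combinations

def diagonal_packing(k):
    """Corollary 4: rows, columns and cyclically wrapped diagonals of a k x k grid
    of the points 0..k*k-1 form three parallel classes of a 2-(k^2, k, 1) packing."""
    rows = [frozenset(i * k + j for j in range(k)) for i in range(k)]
    cols = [frozenset(i * k + j for i in range(k)) for j in range(k)]
    diags = [frozenset(i * k + (i + c) % k for i in range(k)) for c in range(k)]
    return [rows, cols, diags]


def lemma4_holds(classes):
    """Lemma 4: blocks from different parallel classes meet in exactly one point."""
    return all(len(S & T) == 1
               for c1, c2 in combinations(classes, 2) for S in c1 for T in c2)


def theorem6_cir(classes, r):
    """Theorem 6: vertex v^l_{i,j} of part l gets A = S^l_i and B = S^l_j, where a_x and
    b_x are identified with the point x. Returns a list of (part, A, B) triples."""
    return [(l, Si, Sj) for l in range(r) for Si in classes[l] for Sj in classes[l]]


def prop3_cir(t, s):
    """Proposition 3: a (t, t*s^2)-CIR of K_{n,n}, n = t*s. B-features are the cells
    (row, col) of an s x (t*s) matrix; R_i is row i and C_j is column j."""
    n = t * s
    R = [frozenset((i, c) for c in range(n)) for i in range(s)]
    C = [frozenset((i, c) for i in range(s)) for c in range(n)]
    # Vertex i = (i_a - 1)s + i_b gets {a_{i_a} | R_{i_b}}; the other side gets {A | C_j}.
    left = [(0, frozenset({ia}), R[ib]) for ia in range(t) for ib in range(s)]
    right = [(1, frozenset(range(t)), C[j]) for j in range(n)]
    return left + right


def is_cir(verts):
    """Cointersection Condition for a complete multipartite graph: two vertices are
    adjacent (lie in different parts) iff they share an A-feature and a B-feature."""
    return all((p != q) == (bool(A1 & A2) and bool(B1 & B2))
               for (p, A1, B1), (q, A2, B2) in combinations(verts, 2))


# Fig. 7: the 2-(9,3,1) resolvable packing with four parallel classes (points 1..9).
FIG7 = [[frozenset(b) for b in cls] for cls in (
    [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}],
    [{1, 4, 7}, {2, 5, 8}, {3, 6, 9}],
    [{1, 5, 9}, {2, 6, 7}, {3, 4, 8}],
    [{1, 6, 8}, {2, 4, 9}, {3, 5, 7}],
)]
```

For example, `is_cir(theorem6_cir(FIG7, 4))` confirms the (9,9)-CIR of $K_{9,9,9,9}$ shown in Fig. 8. Likewise, `is_cir(prop3_cir(2, 3))` confirms the (2,18)-CIR of $K_{6,6}$ shown in Fig. 6. Using the same class twice, as in `theorem6_cir([rows, rows], 2)`, fails the check. This shows why Theorem 6 needs blocks from different classes to meet in exactly one point.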