Info-Clustering: An Efficient Algorithm by Network Information Flow

Chung Chan, Ali Al-Bashabsheh and Qiaoqiao Zhou

Abstract—Motivated by the fact that entities in a social network or biological system often interact by exchanging information, we propose an efficient info-clustering algorithm that can group entities into communities using a parametric max-flow algorithm. This is a meaningful special case of the info-clustering paradigm where the dependency structure is graphical and can be learned readily from data.

I. INTRODUCTION

Info-clustering was proposed in [1] as an application of network information theory to the problem of clustering in machine learning. It regards each object as a piece of information, namely a random variable, and groups random variables with a sufficiently large amount of mutual information together. Clustering is often an important first step in studying a large biological system such as the human connectome and genome. It can also identify communities in a social network so that resources can be allocated efficiently based on the communities discovered. Since entities in a social or biological system often possess information and interact with each other by the transmission of information, clustering them by their mutual information intuitively gives meaningful results.

Using the multivariate mutual information (MMI) in [2] as the similarity measure, info-clustering defines a hierarchy of clusters of possibly different sizes at different levels of mutual information. The clustering solution is intimately related to the principal sequence of partitions (PSP) [3] of a submodular function, namely, that of the entropy function of the set of random variables to be clustered. From this, it follows that the clustering solution is unique and solvable using a polynomial number of oracle calls to evaluate the entropy function. However, in general, learning the entropy function of a finite set $V$ of random variables from data takes exponential time in the size of $V$ [4]. The computation of the PSP [3, 5], [1, Algorithm 3] without any approximation also takes $\Omega(|V|^2)$ calls to a submodular function minimization (SFM) algorithm, which in turn makes $\Omega(|V|^5)$ oracle calls to evaluate the submodular function. Hence, the practicality of the general info-clustering algorithm is limited by both the sample complexity and the computational complexity.

Fortunately, info-clustering reduces to faster algorithms under special statistical models that are also easier (compared to a general model) to learn from data. For instance, under the Markov tree model, info-clustering reduces to an edge-filtering procedure that runs in $O(|V|^2)$ time [6]. Furthermore, this procedure coincides with an existing functional genomic clustering method by mutual information relevance networks (MIRN) [7]. While rediscovering a simple clustering algorithm under the Markov tree simplification, the info-clustering paradigm provides a theoretical justification of the MIRN algorithm and helps discover how the algorithm may fail when the Markov tree assumption does not hold [1, Example 6].

In this work, we propose an efficient info-clustering algorithm under a different graphical model called the pairwise independent network (PIN) [8, 9]. Using the idea of the matroidal network link model in [10], the MMI has a concrete operational meaning as the maximum network broadcast throughput [11]. The info-clustering solution therefore identifies clusters with large intra-cluster communication rates, which naturally map to communities of closely related entities in a social network. Learning the PIN model simplifies to learning the weights of $O(|V|^2)$ edges in a graph on $V$. In a social network, the weight of each edge can simply be the amount/rate of communication between the edge's incident nodes.
As shown in [1, Proposition 9], the info-clustering solution for the PIN model can be obtained from the PSP of the cut function of a weighted graph. It is well known that faster SFM algorithms are possible for the cut function using min-cut or max-flow algorithms (e.g., see [12–14]). An algorithm was given in [15] that computes the PSP efficiently by reducing the problem to a parametric max-flow problem, where the capacities of the edges are certain monotonic functions of a parameter. The reduction is carefully done such that the parametric max-flow algorithm in [16] can compute the PSP in $O(|V|^3\sqrt{|E|})$ time, where $E$ is the set of edges with non-zero weight. We will adapt this algorithm to compute the info-clustering solution and modify it to improve the performance further.

C. Chan and Q. Zhou are with the Institute of Network Coding at the Chinese University of Hong Kong, the Shenzhen Key Laboratory of Network Coding Key Technology and Application, China, and the Shenzhen Research Institute of the Chinese University of Hong Kong (email: [email protected]). A. Al-Bashabsheh was with the Institute of Network Coding at the Chinese University of Hong Kong. He is now with the Big Data and Brain Computing (BDBC) center at Beihang University, Beijing, China (e-mail: [email protected]). The work described in this paper was supported by a grant from the University Grants Committee of the Hong Kong Special Administrative Region, China (Project No. AoE/E-02/08), and partially by a grant from the Shenzhen Science and Technology Innovation Committee (JSGG20160301170514984), the Chinese University of Hong Kong (Shenzhen), China. The work of C. Chan was supported in part by The Vice-Chancellor's One-off Discretionary Fund of The Chinese University of Hong Kong (Project Nos. VCF2014030 and VCF2015007), and a grant from the University Grants Committee of the Hong Kong Special Administrative Region, China (Project No. 14200714).

Fig. 1: Identifying the clusters of the PIN in (2.1a) by brute-force search over subsets satisfying the threshold constraint (2.9). (a) Weighted graph $G$ that represents the PIN defined in (2.1a). (b) Broadcast of 2 message bits $m_1$ and $m_2$ with capacities defined in (2.1c). (c) The info-clustering solution in (2.11).

II. PRELIMINARIES ON INFO-CLUSTERING

A. Formulation

Let $V$ be the finite index set of the objects we want to cluster. Without loss of generality, we assume

    $V = [|V|] := \{1, \dots, |V|\}$ and $|V| > 1$.

The idea of info-clustering is to treat each object $i \in V$ as a random variable $\mathsf{Z}_i$ (denoted in sans serif font) taking values from a finite set $Z_i$ (denoted in the usual math font), and to cluster the objects according to their mutual information. $P_{\mathsf{Z}_V}$ denotes the distribution of the entire vector of random variables $\mathsf{Z}_V := (\mathsf{Z}_i \mid i \in V)$.

We will illustrate the idea of info-clustering via a simple example shown in Fig. 1a, where $V = [3] = \{1,2,3\}$ and the corresponding random variables are defined as

    $\mathsf{Z}_1 := (\mathsf{X}_a, \mathsf{X}_c)$
    $\mathsf{Z}_2 := (\mathsf{X}_a, \mathsf{X}_b)$    (2.1a)
    $\mathsf{Z}_3 := (\mathsf{X}_b, \mathsf{X}_c)$,

with $\mathsf{X}_a$, $\mathsf{X}_b$ and $\mathsf{X}_c$ being independent uniformly random variables with entropies

    $H(\mathsf{X}_a) = H(\mathsf{X}_b) = 1$, $H(\mathsf{X}_c) = 5$.    (2.1b)

This is a PIN (Definition 2.2) with correlation represented by the weighted triangle $G$ shown in Fig. 1a, characterized by the weight function $c$ where

    $c(\{1,2\}) = c(\{2,3\}) := H(\mathsf{X}_a) = H(\mathsf{X}_b) = 1$
    $c(\{1,3\}) := H(\mathsf{X}_c) = 5$.    (2.1c)

The vertex set is $V := [3]$ and the edge set is

    $E := \operatorname{supp}(c) = \{\{1,2\}, \{2,3\}, \{1,3\}\}$.    (2.1d)

Note that, for ease of comparison, this is the same graph used as an example in [15] to illustrate the algorithm. A formal definition of the (hyper-)graphical source model is as follows:

Definition 2.1 (Definition 2.4 of [17]) $\mathsf{Z}_V$ is a hypergraphical source w.r.t. a hypergraph $(V, E, \xi)$ with edge functions $\xi : E \to 2^V \setminus \{\emptyset\}$ iff, for some independent (hyper)edge variables $\mathsf{X}_e$ for $e \in E$ with $H(\mathsf{X}_e) > 0$,

    $\mathsf{Z}_i := (\mathsf{X}_e \mid e \in E, i \in \xi(e))$, for $i \in V$.    (2.2)

The weight function $c : 2^V \setminus \{\emptyset\} \to \mathbb{R}$ of a hypergraphical source is defined as

    $c(B) := H(\mathsf{X}_e \mid e \in E, \xi(e) = B)$    (2.3a)

with support

    $\operatorname{supp}(c) := \{B \in 2^V \setminus \{\emptyset\} \mid c(B) > 0\}$.    (2.3b)

The PIN model [9] is an example, where the corresponding hypergraph is a graph.

Definition 2.2 ([9]) $\mathsf{Z}_V$ is a pairwise independent network (PIN) iff it is hypergraphical w.r.t. a graph $(V, E, \xi)$ with edge function $\xi : E \to V^2 \setminus \{(i,i) \mid i \in V\}$ (i.e., no self loops). ✷
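For a hypergraphical source, the entropy of any subvector has a simple closed form: since the edge variables are independent, $H(\mathsf{Z}_B)$ is the total weight of the edges incident to $B$. The following minimal Python sketch (ours, not from the paper) encodes the example PIN (2.1a)–(2.1c) by its weight function and evaluates subset entropies this way:

    # Sketch: the PIN of Fig. 1a encoded by its weight function c in (2.1c).
    # For a hypergraphical source, independence of the edge variables gives
    # H(Z_B) = sum of c(e) over edges e meeting B.
    V = {1, 2, 3}
    c = {frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({1, 3}): 5}

    def H(B):
        """Entropy H(Z_B) of the subvector indexed by the subset B of V."""
        return sum(w for e, w in c.items() if e & set(B))

    assert H({1}) == 6 and H({2}) == 2 and H(V) == 7   # e.g. H(Z_1) = H(X_a) + H(X_c)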
The mutual information among multiple random variables is measured by the multivariate mutual information (MMI) defined in [2] as

    $I(\mathsf{Z}_V) := \min_{P \in \Pi'(V)} I_P(\mathsf{Z}_V)$, with    (2.4a)

    $I_P(\mathsf{Z}_V) := \frac{1}{|P|-1} \Bigl[ \sum_{C \in P} H(\mathsf{Z}_C) - H(\mathsf{Z}_V) \Bigr]$,    (2.4b)

where the term in brackets equals the divergence $D\bigl(P_{\mathsf{Z}_V} \,\big\|\, \prod_{C \in P} P_{\mathsf{Z}_C}\bigr)$, and $\Pi'(V)$ is the set of partitions of $V$ into at least 2 non-empty disjoint subsets of $V$. We may also write $I_P(\mathsf{Z}_V)$ more explicitly as

    $I_P(\mathsf{Z}_V) = I(\mathsf{Z}_{C_1} \wedge \cdots \wedge \mathsf{Z}_{C_k})$    (2.5)

for $P = \{C_1, \dots, C_k\}$. Note that Shannon's mutual information $I(\mathsf{Z}_1 \wedge \mathsf{Z}_2)$ is the special case when $P$ is a bipartition. It is sometimes convenient to expand $I_P(\mathsf{Z}_V)$ using Shannon's mutual information [2, (5.18)] as follows:

    $I_P(\mathsf{Z}_V) = I(\mathsf{Z}_{C_1} \wedge \cdots \wedge \mathsf{Z}_{C_k}) = \frac{1}{k-1} \sum_{i=1}^{k-1} I\bigl(\mathsf{Z}_{C_i} \wedge \mathsf{Z}_{\bigcup_{j=i+1}^{k} C_j}\bigr)$.    (2.6)

For the example with the random vector defined in (2.1a),

    $I(\mathsf{Z}_{\{v,w\}}) = I(\mathsf{Z}_v \wedge \mathsf{Z}_w) = c(\{v,w\}) = \begin{cases} 1, & \{v,w\} \in \{\{1,2\}, \{2,3\}\} \\ 5, & \{v,w\} = \{1,3\} \end{cases}$    (2.7a)

which reduces to Shannon's mutual information, and

    $I(\mathsf{Z}_{[3]}) = \min \bigl\{ I_{\{\{2,3\},\{1\}\}}(\mathsf{Z}_{[3]}),\ I_{\{\{1,3\},\{2\}\}}(\mathsf{Z}_{[3]}),\ I_{\{\{1,2\},\{3\}\}}(\mathsf{Z}_{[3]}),\ I_{\{\{1\},\{2\},\{3\}\}}(\mathsf{Z}_{[3]}) \bigr\}$
        $= \min \Bigl\{ I(\mathsf{Z}_2, \mathsf{Z}_3 \wedge \mathsf{Z}_1),\ I(\mathsf{Z}_1, \mathsf{Z}_3 \wedge \mathsf{Z}_2),\ I(\mathsf{Z}_1, \mathsf{Z}_2 \wedge \mathsf{Z}_3),\ \frac{I(\mathsf{Z}_1, \mathsf{Z}_3 \wedge \mathsf{Z}_2) + I(\mathsf{Z}_1 \wedge \mathsf{Z}_3)}{2} \Bigr\}$    (2.7b)
        $= \min \Bigl\{ 6,\ 2,\ 6,\ \frac{2+5}{2} \Bigr\} = 2$,

where we have applied (2.6) to calculate $I_{\{\{1\},\{2\},\{3\}\}}(\mathsf{Z}_{[3]})$ for the partition into singletons as the average of the value $I(\mathsf{Z}_1, \mathsf{Z}_3 \wedge \mathsf{Z}_2)$ of the cut that separates node 2 from nodes 1 and 3, and the value $I(\mathsf{Z}_1 \wedge \mathsf{Z}_3)$ of the cut that further separates node 1 from node 3. Note that the sequence of two cuts effectively partitions the vertex set into singletons. From this expansion, it is clear that the partition into singletons cannot be optimal in this case, since the mutual information between nodes 1 and 3 is very large. Indeed, the optimal partition turns out to be a clustering of the random variables into correlated groups.
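For a toy example like this, the MMI (2.4) can be checked directly by enumerating all partitions of $V$ into at least two blocks. A brute-force sketch (ours, reusing the `c` and `H` helpers above; real instances should use the PSP machinery below instead):

    # Brute-force MMI (2.4) for the 3-node example by enumerating partitions.
    def partitions(items):
        """Yield every set partition of the list `items`."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for p in partitions(rest):
            for i in range(len(p)):
                yield p[:i] + [p[i] | {first}] + p[i+1:]
            yield [{first}] + p

    def I_P(P):   # (2.4b)
        return (sum(H(C) for C in P) - H(V)) / (len(P) - 1)

    P_star = min((P for P in partitions([1, 2, 3]) if len(P) >= 2), key=I_P)
    print(I_P(P_star), P_star)   # 2.0 and the fundamental partition {{1,3},{2}}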
In general, the set of optimal partitions to (2.4a), denoted as $\Pi^*(\mathsf{Z}_V)$, forms a semi-lattice w.r.t. the partial order $P \preceq P'$ for partitions $P$ and $P'$ when

    $\forall C \in P, \ \exists C' \in P' : C \subseteq C'$.    (2.8)

$P \prec P'$ denotes the strict inequality. There is a unique finest/minimum partition $P^*(\mathsf{Z}_V)$, referred to as the fundamental partition for $\mathsf{Z}_V$ [2, Theorem 5.2]. For a threshold $\gamma \in \mathbb{R}$, the set of clusters is defined as [1, Definition 1]

    $\mathcal{C}_\gamma(\mathsf{Z}_V) := \operatorname{maximal}\{B \subseteq V \mid |B| > 1, I(\mathsf{Z}_B) > \gamma\}$    (2.9)

where $\operatorname{maximal} \mathcal{F}$ denotes the inclusion-wise maximal elements of $\mathcal{F}$, i.e.,

    $\operatorname{maximal} \mathcal{F} := \{B \in \mathcal{F} \mid \nexists B' \supsetneq B, B' \in \mathcal{F}\}$.    (2.10)

Proposition 2.1 ([1, Theorem 5]) $\mathcal{C}_\gamma(\mathsf{Z}_V) = P^*(\mathsf{Z}_V) \setminus \{\{i\} \mid i \in V\}$ with $\gamma = I(\mathsf{Z}_V)$, i.e., the non-singleton subsets in the fundamental partition are the maximal subsets (clusters) with MMI larger than that of the entire set. ✷

For the example, applying the definition (2.9) of clusters with the MMI calculated in (2.7), the clustering solution is

    $\mathcal{C}_\gamma(\mathsf{Z}_{[3]}) = \begin{cases} \{[3]\}, & \gamma \in (-\infty, 2) \\ \{\{1,3\}\}, & \gamma \in [2, 5) \\ \emptyset, & \gamma \in [5, \infty). \end{cases}$    (2.11)

The info-clustering solution above consists of two clusters, shown in Fig. 1c for different intervals of the threshold $\gamma$. For $\gamma \geq 2$, the subset $\{1,3\}$ is the only feasible solution that satisfies the threshold constraint in (2.9). For $\gamma < 2$, the entire set $[3]$ also satisfies the threshold constraint and is maximal. Recall from (2.7b) that there is a unique optimal partition for $I(\mathsf{Z}_{[3]})$, which is therefore the finest optimal partition

    $P^*(\mathsf{Z}_{[3]}) = \{\{1,3\}, \{2\}\}$.    (2.12)

As expected from Proposition 2.1, the non-singleton element $\{1,3\}$ is the only possible subset with MMI larger than 2.

It turns out that the computation of the MMI, the fundamental partition, and the entire info-clustering solution can be done in strongly polynomial time from the principal sequence of partitions (PSP) of the entropy function

    $h : B \subseteq V \mapsto H(\mathsf{Z}_B)$.    (2.13)

The PSP is a more general mathematical structure [3] in combinatorial optimization defined for a submodular function. More precisely, a real-valued set function $g : 2^V \to \mathbb{R}$ is said to be submodular iff

    $g(B_1) + g(B_2) \geq g(B_1 \cap B_2) + g(B_1 \cup B_2)$    (2.14)

for all $B_1, B_2 \subseteq V$. The entropy function, in particular, is a submodular function [18]. The PSP of the submodular function $g$ is the characterization of the solutions to the following for all $\gamma \in \mathbb{R}$:

    $\hat{g}_\gamma(V) := \min_{P \in \Pi(V)} g_\gamma[P]$,    (2.15a)

referred to as the Dilworth truncation [19], where $\Pi(V)$ is the set of partitions of $V$ into one or more non-empty disjoint subsets,

    $g_\gamma[P] := \sum_{C \in P} g_\gamma(C)$    (2.15b)

    $g_\gamma(C) := g(C) - \gamma$.    (2.15c)

(N.b., $\Pi'(V)$ in (2.4a) is $\Pi(V)$ without the trivial partition $\{V\}$, i.e., $\Pi'(V) = \Pi(V) \setminus \{\{V\}\}$.) For every $\gamma$, submodularity of $g$ implies that there exists a unique finest/minimum (w.r.t. the partial order (2.8)) optimal partition to (2.15a), denoted as $P^*(\gamma)$. It can be characterized as

    $P^*(\gamma) = P_\ell \quad \forall \gamma \in [\gamma_\ell, \gamma_{\ell+1}), \ \ell \in \{0, \dots, N\}$    (2.16a)

for some integer $N > 0$, a sequence of critical values of $\gamma$

    $-\infty < \gamma_1 < \cdots < \gamma_N < \infty$    (2.16b)

with $\gamma_0 := -\infty$ and $\gamma_{N+1} := \infty$ for convenience, and a sequence of successively finer partitions

    $P_0 = \{V\} \succ P_1 \succ \cdots \succ P_N = \{\{i\} \mid i \in V\}$.    (2.16c)

The sequence of partitions (together with the corresponding critical values) is referred to as the PSP of $g$. The PSP of the entropy function (2.13) characterizes the info-clustering solution as follows:

Proposition 2.2 ([1, Corollary 2]) For a finite set $V$ with size $|V| > 1$ and a random vector $\mathsf{Z}_V$,

    $\mathcal{C}_\gamma(\mathsf{Z}_V) = \min\{P \in \Pi(V) \mid h_\gamma[P] = \hat{h}_\gamma(V)\} \setminus \{\{i\} \mid i \in V\}$,    (2.17)

namely, the non-singleton elements of the finest optimal partition to the Dilworth truncation (2.15a). ✷

III. INFORMATION FLOW INTERPRETATION

For a PIN, the MMI can be interpreted as the maximum broadcast throughput of a network [10, 11], and hence info-clustering reduces to clustering by network information flow. When applied to clustering a social network, it can identify communities naturally based on the amount of information flow.

More precisely, treating each edge as an undirected communication link with capacity $c$, at most a total of 1 bit can be communicated between nodes 1 and 2, and between nodes 2 and 3; and at most a total of 5 bits can be communicated between nodes 1 and 3. It can be seen that, for every pair of distinct nodes $v, w \in V$, the broadcast throughput between $v$ and $w$ is given by the MMI in (2.7a). Fig. 1b illustrates how 2 bits of information can be broadcast in the entire network, achieving the MMI of the entire set of random variables in (2.7b). With the interpretation of the MMI as information flow, a cluster at threshold $\gamma$ is therefore a maximal subnetwork with broadcast throughput larger than $\gamma$. For instance, the cluster $\{1,3\} \in \mathcal{C}_2(\mathsf{Z}_{\{1,2,3\}})$ is the only subset of nodes on which the induced subnetwork has a throughput exceeding 2.

Specializing to the (hyper-)graphical model, it was shown in [1, Proposition 8] that the clustering solutions can be obtained directly as the non-singleton subsets from the PSP of the incut function. More precisely, from the weighted graph $G$ with vertex set $V$ and capacity function $c$, define for every pair $(v,w)$ of vertices $v, w \in V$

    $c(v,w) := \begin{cases} c(\{v,w\}), & v < w \\ 0, & \text{otherwise}. \end{cases}$    (3.1)

This defines a weighted digraph $D$ with vertex set $V$, in which each arc $(v,w)$ with capacity $c(v,w) > 0$ points from the incident node with the smaller label to the incident node with the larger label. For example, Fig. 2a is the weighted digraph obtained by orienting the weighted graph in Fig. 1a according to (3.1). For convenience, we also write, for arbitrary subsets $B_1, B_2 \subseteq V$,

    $c(B_1, B_2) := \sum_{v \in B_1} c(v, B_2)$, where    (3.2a)
    $c(v, B_2) := \sum_{w \in B_2} c(v, w)$ for $v \in V$, and    (3.2b)
    $c(B_1, w) := \sum_{v \in B_1} c(v, w)$ for $w \in V$.    (3.2c)

The incut function of the weighted digraph is defined as

    $g(B) := c(V \setminus B, B)$.    (3.3)

The incut function for the digraph in Fig. 2a is shown in Fig. 2b and calculated below:

    $g(\{2\}) = c(\{1,3\}, \{2\}) = c(1,2) + c(3,2) = 1$
    $g(\{3\}) = c(\{1,2\}, \{3\}) = c(1,3) + c(2,3) = 6$
    $g(\{2,3\}) = c(\{1\}, \{2,3\}) = c(1,2) + c(1,3) = 6$    (3.4)
    $g(\{1,3\}) = c(\{2\}, \{1,3\}) = c(2,1) + c(2,3) = 1$
    $g(\{1\}) = g(\{1,2\}) = g(\{1,2,3\}) = 0$.
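The incut values in (3.4) are easy to verify mechanically. A sketch (ours, reusing `c` from above) that orients the example graph per (3.1) and evaluates (3.3):

    # Sketch: orient G as in (3.1) (arcs from smaller to larger label) and
    # evaluate the incut function (3.3), reproducing the values in (3.4).
    cap = {(v, w): c[frozenset({v, w})] for (v, w) in [(1, 2), (2, 3), (1, 3)]}

    def incut(B):
        """g(B) = c(V \\ B, B): total capacity of arcs entering B from V \\ B."""
        return sum(x for (v, w), x in cap.items() if v not in B and w in B)

    assert incut({2}) == 1 and incut({3}) == 6 and incut({2, 3}) == 6
    assert incut({1, 3}) == 1 and incut({1}) == incut({1, 2}) == incut({1, 2, 3}) == 0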
To compute the PSP of $g$, we first evaluate (2.15b) for different partitions as follows:

    $g_\gamma[\{\{1,2,3\}\}] = g_\gamma(\{1,2,3\}) = g([3]) - \gamma = -\gamma$
    $g_\gamma[\{\{1,3\},\{2\}\}] = g_\gamma(\{1,3\}) + g_\gamma(\{2\}) = g(\{1,3\}) + g(\{2\}) - 2\gamma = 2 - 2\gamma$    (3.5a)
    $g_\gamma[\{\{1\},\{2\},\{3\}\}] = g_\gamma(\{1\}) + g_\gamma(\{2\}) + g_\gamma(\{3\}) = 7 - 3\gamma$.

Similarly,

    $g_\gamma[\{\{1,2\},\{3\}\}] = 6 - 2\gamma > g_\gamma[\{\{1,3\},\{2\}\}]$
    $g_\gamma[\{\{1\},\{2,3\}\}] = 6 - 2\gamma > g_\gamma[\{\{1,3\},\{2\}\}]$.    (3.5b)

Fig. 2c plots the Dilworth truncation $\hat{g}_\gamma(V)$ in (2.15) against $\gamma$ as the minimum of $g_\gamma[P]$ over all partitions $P \in \Pi(V)$. It can be seen that, for a given $P$, $g_\gamma[P]$ is linear with integer slope $-|P| \in \{-|V|, \dots, -1\}$, and so $\hat{g}_\gamma(V)$ is piecewise linear consisting, in this example, of $|V| = 3$ line segments and $|V| - 1 = 2$ breakpoints. (In general, the number of line segments is at most $|V|$.) The finest optimal partition for each value of $\gamma$ is

    $P^*(\gamma) = \begin{cases} \{\{1,2,3\}\} = P_0, & \gamma \in (-\infty, 2) \\ \{\{1,3\},\{2\}\} = P_1, & \gamma \in [2, 5) \\ \{\{1\},\{2\},\{3\}\} = P_2, & \gamma \in [5, \infty) \end{cases}$    (3.6)

with the PSP and the corresponding critical values $\gamma_1 = 2$ and $\gamma_2 = 5$ annotated above and in Fig. 2c.

Fig. 2: Computing the clusters of the PIN model (2.1a) as the non-singleton elements of the PSP (2.16). (a) Weighted digraph $D$ obtained by orienting $G$ as in (3.1). (b) The non-zero values of the incut function $g$ in (3.4). (c) The PSP (3.6) of $g$ and the Dilworth truncation from (3.5).
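The PSP (3.6) can likewise be cross-checked by brute force before turning to the efficient algorithms of the next section. A sketch (ours, reusing `partitions` and `incut` from above; exhaustive enumeration is only viable for tiny $V$):

    # Brute-force Dilworth truncation (2.15): for a fixed gamma, minimize
    # g_gamma[P] over all partitions and return the finest optimal partition,
    # i.e., the optimal partition with the most blocks.
    def finest_min_partition(gamma):
        g_of = lambda P: sum(incut(C) - gamma for C in P)
        best = min(g_of(P) for P in partitions([1, 2, 3]))
        opt = [P for P in partitions([1, 2, 3]) if abs(g_of(P) - best) < 1e-9]
        return max(opt, key=len)

    for gamma in [1.9, 2.0, 4.9, 5.0]:
        print(gamma, finest_min_partition(gamma))
    # 1.9 -> {{1,2,3}};  2.0 and 4.9 -> {{1,3},{2}};  5.0 -> singletons, as in (3.6)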
IV. CLUSTERING USING PARAMETRIC MAX-FLOW

By [1, Proposition 9], the PSP of the incut function $B \mapsto c(V \setminus B, B)$ of the digraph $D$ coincides with the PSP of the cut function (divided by 2) of the corresponding undirected graph $G$, which was shown in [15] to be computable by running a parametric max-flow algorithm $O(|V|)$ times.

The parametric max-flow algorithm was introduced in [16]; it runs in $O(|V|^2 \sqrt{|E|})$ time using the well-known push-relabel/preflow algorithm [20, 21] implemented with the highest-level selection rule [22]. Hence, the info-clustering solution for the PIN model can be obtained in $O(|V|^3 \sqrt{|E|})$ time.

In this section, we will adapt and improve the algorithm in [15] to compute the desired PSP for the info-clustering solution. The algorithm will be illustrated using the same example as in the last section, which is chosen to be the same example as in [15] for ease of comparison.

We first give a procedure in Algorithm 1 for computing the minimum minimizer $P^*(\gamma)$ to (2.15) for all $\gamma \in \mathbb{R}$ and any submodular function $g$, assuming a parametric submodular function minimizer. This procedure can be specialized further to the PIN model where $g$ is chosen to be the incut function (3.3), so that the parametric max-flow algorithm can be applied instead.

Algorithm 1: Computing the PSP as a parametric SFM.
  Input: Submodular function $g : 2^V \to \mathbb{R}$ defined on a finite ground set $V$.
  Output: The function $P^*(\gamma)$ of $\gamma \in \mathbb{R}$ defined in (2.16a).
  1   $P^*(\gamma) \leftarrow \{\{1\}\}$, $B^*(\gamma) \leftarrow \{1\}$, $x_{\gamma,1} \leftarrow g_\gamma(\{1\})$, $\mu_i \leftarrow -\infty$ for all $i \in V$;
  2   for $j = 2$ to $|V|$ do
  3       set $B^*(\gamma)$ as the (inclusion-wise) minimum minimizer to
              $\min_{B \subseteq [j] : j \in B} g_\gamma(B) - x_\gamma(B \setminus \{j\})$,    (4.1)
          where $x_\gamma(C) := \sum_{i \in C} x_{\gamma,i}$ for convenience;
  4       remove every $C$ that intersects $B^*(\gamma)$ from $P^*(\gamma)$;
  5       add $B^*(\gamma)$ to $P^*(\gamma)$;
  6       if $j < |V|$ then
  7           for $i = 1$ to $j$ do
  8               $\mu_i \leftarrow \max\{\mu_i, \min\{\gamma \mid i \notin B^*(\gamma)\}\}$;
  9               $x_{\gamma,i} \leftarrow g_{\max\{\gamma,\mu_i\}}(\{i\})$;
  10          end
  11      end
  12  end

The procedure is said to be parametric since the solution $P^*$ is computed for all possible values of $\gamma \in \mathbb{R}$ rather than a particular value. To realize such a procedure, the characterization (2.16a) of $P^*(\gamma)$ through the PSP in (2.16c) and the corresponding critical values of $\gamma$ in (2.16b) is computed instead.

The minimization in (4.1) is a submodular function minimization (SFM) (over a lattice family). In contrast with [15], we perform (4.1) on a growing set $[j] = \{1, \dots, j\}$ instead of the entire set $V$ in every loop. The idea follows from the algorithm for computing the Dilworth truncation such as the one given in [19]. In contrast with [19], however, we follow [15] and consider the update rule in Lines 8–9, which guarantees $x_{\gamma,i}$ to be piecewise linear with at most one breakpoint, possibly at $\gamma = \mu_i$ if $\mu_i$ is finite. When $i = j$, the step in Line 8 gives $\mu_j = -\infty$ as $B^*(\gamma)$ always contains $j$ (see (4.1)), and so $x_{\gamma,j} = g_\gamma(\{j\})$, which is linear without any breakpoint. As pointed out in [15], the update rule is particularly useful for the parametric procedure since the complexity often grows with the number of breakpoints.

Consider the example with $g$ defined in (3.4) and illustrated in Fig. 2b, and with $g_\gamma$ defined in (2.15c). When $j = 2$, Line 1 initializes $x_{\gamma,1}$ as

    $x_{\gamma,1} = g(\{1\}) - \gamma = -\gamma$.    (4.2)

Then, (4.1) becomes

    $\min\{g_\gamma(\{1,2\}) - x_{\gamma,1}, \ g_\gamma(\{2\})\} = \min\{0, 1-\gamma\} = \begin{cases} 0, & \gamma < 1 \\ 1-\gamma, & \gamma \geq 1, \end{cases}$

which is a piecewise linear function plotted in Fig. 3a. The minimum minimizer is therefore given by

    $B^*(\gamma) = \begin{cases} \{1,2\} = B_0, & \gamma < 1 = \gamma_1' \\ \{2\} = B_1, & \gamma \geq 1. \end{cases}$    (4.3)

With $P^*(\gamma)$ initialized in Line 1 to $\{\{1\}\}$, Lines 4–5 update $P^*(\gamma)$ to

    $P^*(\gamma) = \begin{cases} \{\{1,2\}\}, & \gamma < 1 = \gamma_1' \\ \{\{1\},\{2\}\}, & \gamma \geq 1. \end{cases}$    (4.4)
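Algorithm 1 manipulates $x_{\gamma,i}$ and $B^*(\gamma)$ as functions of $\gamma$. For intuition, the sketch below fixes $\gamma$ and runs the classic, non-parametric Dilworth-truncation iteration from [19] on which Algorithm 1 is built: $x_j$ is set to the minimum value of (4.1) and the partition is updated as in Lines 4–5. What the sketch omits is precisely the $\mu_i$ bookkeeping of Lines 8–9 that keeps each $x_{\gamma,i}$ piecewise linear in $\gamma$. (Ours, reusing `incut`; the inner minimization is brute force rather than a min-cut.)

    # Fixed-gamma sketch of the Dilworth-truncation iteration underlying Alg. 1.
    from itertools import combinations

    def finest_optimal_partition(gamma, n=3):
        x, P = {}, []
        for j in range(1, n + 1):
            cands = [set(B) | {j} for r in range(j)
                     for B in combinations(range(1, j), r)]   # all B with j in B
            val = lambda B: incut(B) - gamma - sum(x[i] for i in B - {j})
            x[j] = min(map(val, cands))                       # min value of (4.1)
            Bstar = min((B for B in cands if val(B) == x[j]), key=len)
            P = [C for C in P if not (C & Bstar)] + [Bstar]   # Lines 4-5
        return P

    print(finest_optimal_partition(3.0))   # [{2}, {1, 3}], matching (3.6)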
Next, with $\mu_1$ and $\mu_2$ initialized to $-\infty$ in Line 1, the subsequent steps following Line 5 give

    $\mu_1 = \max\{-\infty, \gamma_1'\} = 1$    (4.5a)
    $x_{\gamma,1} = g_{\max\{\gamma,1\}}(\{1\}) = \begin{cases} -1, & \gamma < 1 \\ -\gamma, & \gamma \geq 1 \end{cases}$    (4.5b)
    $\mu_2 = \max\{-\infty, -\infty\} = -\infty$    (4.5c)
    $x_{\gamma,2} = g_{\max\{\gamma,-\infty\}}(\{2\}) = 1 - \gamma$.    (4.5d)

Similarly, when $j = 3$, the function of the minimum in (4.1) is plotted in Fig. 3b. It follows that the minimum minimizer is

    $B^*(\gamma) = \begin{cases} \{1,2,3\} = B_0, & \gamma \in (-\infty, 2) \\ \{1,3\} = B_1, & \gamma \in [2, 5) \\ \{3\} = B_2, & \gamma \in [5, \infty). \end{cases}$    (4.6)

With $P^*(\gamma)$ given by (4.4) and illustrated in Fig. 3a, Lines 4–5 update $P^*(\gamma)$ to the desired solution (3.6) characterized by the PSP.

Fig. 3: Illustration of the computation of the PSP in Algorithm 1. (a) For the loop with $j = 2$. (b) For the loop with $j = 3$.

Specializing to PIN models where $g$ is the incut function defined in (3.3), the parametric SFM in (4.1) can be solved as a parametric min-cut problem as shown in Algorithm 2.

Algorithm 2: Computing the parametric SFM in (4.1) as a parametric min-cut.
  Input: A weighted digraph $D$ on vertex set $V$ with capacity function $c : V^2 \to \mathbb{R}$ satisfying (3.1).
  Output: The minimum minimizer to (4.1) with $g$ defined as the incut function in (3.3).
  1   define a weighted digraph $D_j(\gamma)$ with vertex set $U \leftarrow \{s, 1, \dots, j\}$ (where $s$ is a new node outside $[j]$) and capacity function $c_\gamma : U^2 \to \mathbb{R}$ initialized to 0;
  2   for $v = 1$ to $j-1$ do
  3       $c_\gamma(s, v) \leftarrow \max\{0, -x_{\gamma,v}\}$;
  4       $c_\gamma(v, j) \leftarrow \max\{0, x_{\gamma,v}\} + c(v, j)$;
  5       $c_\gamma(v, w) \leftarrow c(v, w)$ for all $w \in [j-1] \setminus [v]$;
  6   end
  7   compute the minimum minimizer $B^*(\gamma)$ to
          $\min_{T \subseteq U \setminus \{s\} : j \in T} c_\gamma(U \setminus T, T)$,    (4.7)
      which is the minimum $s$–$j$ cut value of $D_j(\gamma)$.

For the current example, the digraph $D_j(\gamma)$ with $j = 2$ is shown in Fig. 4. The vertex set is $U = \{s, 1, 2\}$. The for-loop sets $v = 1$ and initializes the capacities

    $c_\gamma(s,1) = |-x_{\gamma,1}|^+$ and $c_\gamma(1,2) = |x_{\gamma,1}|^+ + 1$, where $|x|^+ := \max\{0, x\}$ for $x \in \mathbb{R}$.    (4.8)

Evaluating the capacities with the initial value of $x_{\gamma,1}$ in (4.2) gives the capacities $c_\gamma(s,1)$ and $c_\gamma(1,2)$ as the non-decreasing and non-increasing piecewise linear functions of $\gamma$, respectively, shown in the figure.

Fig. 4: Illustration of the parametric min-cut problem in Algorithm 2 for the for-loop with $j = 2$. (a) Weighted digraph $D_2(\gamma)$. (b) Construction of $D_2(\gamma)$ from $D$. (c) Minimum $s$–$2$ cut for $D_2(1)$.

Fig. 5: Illustration of the parametric min-cut problem in Algorithm 2 for the for-loop with $j = 3$. (a) Weighted digraph $D_3(\gamma)$. (b) Construction of $D_3(\gamma)$ from $D$. (c) $s$–$3$ max-flow and min-cut for $D_3(1)$.
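At a fixed $\gamma$, Lines 1–6 of Algorithm 2 are a mechanical graph construction, and (4.7) can be cross-checked against (4.1) by enumerating sink sides: the two objectives differ by an offset that does not depend on $B$, so the minimizers coincide. A sketch for $j = 2$ (ours, reusing `cap`; a real implementation solves (4.7) with max-flow rather than enumeration):

    # Sketch of Algorithm 2: build D_j(gamma) and find the minimum s-j cut by
    # enumerating sink-side sets T (j always in T, s never).
    from itertools import combinations

    def D_j(j, gamma, x):
        arcs = {}
        for v in range(1, j):
            arcs[('s', v)] = max(0, -x[v])                    # Line 3
            arcs[(v, j)] = max(0, x[v]) + cap.get((v, j), 0)  # Line 4
            for w in range(v + 1, j):                         # Line 5
                arcs[(v, w)] = cap.get((v, w), 0)
        return arcs

    def min_cut_sink_side(arcs, j, inner):
        cut = lambda T: sum(c_ for (u, v), c_ in arcs.items()
                            if u not in T and v in T)
        cands = [set(B) | {j} for r in range(len(inner) + 1)
                 for B in combinations(inner, r)]
        best = min(map(cut, cands))
        return min((T for T in cands if cut(T) == best), key=len)

    for gamma in [0.5, 1.5]:
        print(gamma, min_cut_sink_side(D_j(2, gamma, {1: -gamma}), 2, [1]))
    # 0.5 -> {1, 2} and 1.5 -> {2}, matching B*(gamma) in (4.3)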
Similarly, the digraph $D_j(\gamma)$ with $j = 3$ is shown in Fig. 5, with the vertex set now being $U = \{s, 1, 2, 3\}$ instead, and $x_{\gamma,1}$ and $x_{\gamma,2}$ updated to the functions in (4.5).

The weighted digraph $D_j(\gamma)$ can be viewed as the result of processing the weighted digraph $D$ as follows:

1. Augment the digraph $D$ with two new nodes $s$ and $t$ (both outside $V$) such that
   a. the capacity from $j$ to $t$ is set to infinity;
   b. for $v$ from 1 to $j-1$, if $x_{\gamma,v} < 0$, add an arc with capacity $-x_{\gamma,v}$ from $s$ to $v$; else add an arc with capacity $x_{\gamma,v}$ from $v$ to $t$.
2. Remove all outgoing arcs from node $j$ and contract node $t$ to node $j$.
3. Remove all incoming arcs of nodes $j+1, \dots, |V|$ and contract those nodes to node $s$.

This procedure is illustrated for $D_2(\gamma)$ and $D_3(\gamma)$ in Fig. 4b and Fig. 5b respectively, where the red dotted arcs are removed, and the nodes circled together by blue lines are contracted.

Step 2 implies the formula in Line 4 directly. Step 3 gives

    $c_\gamma(s, v) = \max\{0, -x_{\gamma,v}\} + \sum_{u \in V \setminus [j]} c(u, v)$,

but this reduces to the formula in Line 3 because the last summation is zero by the assumption (3.1) that all the arcs point from a node with a smaller label to a node with a larger label. For the same reason, in Line 5, we do not need to set $c_\gamma(v, w) = c(v, w)$ for $w \in [v-1]$, as these capacities are zero by default. In contrast with [15], the additional node contraction and edge removal in Steps 2 and 3 above reduce the number of vertices from $|V|$ to $|U| = j$ and therefore the complexity in solving (4.7).

For each $j \in V$, (4.7) can be solved by the parametric max-flow algorithm in [16] in $O(j^3 \sqrt{|E|})$ time by invoking $O(j)$ times the preflow algorithm [20, 21] implemented with the highest-level selection rule [22], which in turn runs in $O(j^2 \sqrt{|E|})$ time. The procedure is described in Algorithm 3, which returns the characterization of the solution $B^*(\gamma)$ to (4.7) as

    $B^*(\gamma) = B_\ell$ for $\gamma \in [\gamma_\ell', \gamma_{\ell+1}')$, $\ell \in \{0, \dots, N'\}$    (4.9)

for some integer $N' > 0$, where $\gamma_0' := -\infty$, $\gamma_{N'+1}' := +\infty$ and $B_0 := [j]$. Note also that $B^*(\gamma) = \{j\}$ for sufficiently large $\gamma$, and so $B_{N'} = \{j\}$.

To solve (4.7) for any fixed $j$, we assume the following subroutine

    $[f^*, T^*] = \text{MaxFlow}(\bar{c}, \bar{U}, \bar{s}, \bar{t}, \bar{f})$    (4.11)

which takes as arguments the capacity function $\bar{c}$ (fixed and not parametric), the vertex set $\bar{U}$ on which $\bar{c}$ is defined, the source node $\bar{s} \in \bar{U}$, the sink node $\bar{t} \in \bar{U}$, and a valid preflow $\bar{f}$ associated with the weighted digraph defined by the previous arguments. It returns the maximum $\bar{s}$–$\bar{t}$ flow $f^*$ and the inclusion-wise minimum set $T^*$ that solves

    $\min_{T \subseteq \bar{U} \setminus \{\bar{s}\} : \bar{t} \in T} \bar{c}(\bar{U} \setminus T, T)$,    (4.12)

which is referred to as the minimum $\bar{s}$–$\bar{t}$ cut.
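Off-the-shelf preflow implementations provide most of (4.11); the only non-standard part is returning the inclusion-wise minimum sink side $T^*$, which consists of the nodes that can still reach the sink along arcs with positive residual capacity. A sketch using networkx (an assumed dependency; note that networkx's `preflow_push` cannot be warm-started from the given preflow $\bar{f}$, which Algorithm 3 relies on for its running-time bound, so this sketch recomputes from scratch):

    # Sketch of the MaxFlow subroutine (4.11) on top of networkx's preflow-push.
    import networkx as nx
    from networkx.algorithms.flow import preflow_push

    def MaxFlow(arcs, s, t):
        G = nx.DiGraph()
        for (u, v), c_uv in arcs.items():
            if c_uv > 0:
                G.add_edge(u, v, capacity=c_uv)
        R = preflow_push(G, s, t)   # residual network; preflow-push of [20-22]
        # T* = nodes that can reach t through arcs with residual capacity > 0;
        # this is the inclusion-wise minimum sink side of a minimum s-t cut.
        resid = nx.DiGraph((u, v) for u, v, d in R.edges(data=True)
                           if d['capacity'] - d['flow'] > 0)
        Tstar = {t} | (nx.ancestors(resid, t) if resid.has_node(t) else set())
        return R.graph['flow_value'], Tstar

    # e.g. MaxFlow(D_j(3, 1, {1: -1, 2: 0}), 's', 3) -> (1, {1, 2, 3}),
    # matching the max-flow (4.13) and min-cut of Fig. 5c.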
Algorithm 3: Solving the parametric min-cut problem in (4.7) using the parametric max-flow algorithm [16].
  Input: The weighted digraph $D_j(\gamma)$ on vertex set $U$ with capacity function $c_\gamma$ created in Algorithm 2.
  Output: A list $L$ containing $(\gamma_\ell', B_\ell)$ for $\ell \in [N']$ that characterizes (4.7) as in (4.9).
  1   create empty lists $L$ and $PL$;
  2   $\gamma^+ \leftarrow \max_{v \in [j-1]} \{c([v-1], v) + c(v, [j] \setminus [v])\}$;
  3   $\gamma^- \leftarrow \min\{\max\{\mu_v, c(v, j)\} \mid v \in [j-1]\}$;
  4   if $\gamma^- = \gamma^+$ then
  5       $L \leftarrow [(\gamma^-, \{j\})]$ and return $L$;
  6   end
  7   set $f$ as the zero flow $0$ from $s$ to $j$;
  8   $[f^*, T^*] \leftarrow \text{MaxFlow}(c_{\gamma^-}, U, s, j, f)$;
  9   add $(\gamma^-, \gamma^+, f^*, \{s\}, \{j\})$ to $PL$;
  10  while $PL$ is not empty do
  11      withdraw any element $(\gamma^-, \gamma^+, f, S, T)$ from $PL$;
  12      compute $\bar\gamma \in [\gamma^-, \gamma^+]$ as the solution to
              $c_\gamma(S, U \setminus S) = c_\gamma(U \setminus T, T)$;    (4.10)
  13      define a weighted digraph $\bar{D}_j$ with vertex set $\bar{U} \leftarrow ([j] \setminus (S \cup T)) \cup \{s, j\}$ and capacity function $\bar{c} : \bar{U}^2 \to \mathbb{R}$ initialized to 0;
  14      for $v$ in $\bar{U} \setminus \{s, j\}$ do
  15          $\bar{c}(s, v) \leftarrow \sum_{u \in S} c_{\bar\gamma}(u, v)$;
  16          $\bar{c}(v, j) \leftarrow \sum_{w \in T} c_{\bar\gamma}(v, w)$;
  17          $\bar{c}(v, w) \leftarrow c(v, w)$ for all $w \in \bar{U} \setminus \{s, j, v\}$;
  18      end
  19      for $v$ in $\bar{U} \setminus \{j\}$ do
  20          $\bar{f}(v, j) \leftarrow \sum_{w \in T} \min\{f(v, w), \bar{c}(v, j)\}$ and $\bar{f}(j, v) \leftarrow -\bar{f}(v, j)$ to ensure anti-symmetry;
  21          $\bar{f}(v, w) \leftarrow f(v, w)$ for all $w \in \bar{U} \setminus \{j, v\}$;
  22      end
  23      $[f^*, T^*] \leftarrow \text{MaxFlow}(\bar{c}, \bar{U}, s, j, \bar{f})$;
  24      if $T^* = \{j\}$ then
  25          add $(\bar\gamma, T)$ to $L$;
  26      end
  27      add $(\gamma^-, \bar\gamma, f, S, T \cup T^*)$ and $(\bar\gamma, \gamma^+, f^*, S \cup (\bar{U} \setminus T^*), T)$ to $PL$;
  28  end

Roughly speaking, $\gamma^+$ in Line 2 is a value of $\gamma$ such that the solution to (4.7) (i.e., $B^*(\gamma)$) is constant for $\gamma \geq \gamma^+$. Similarly, $\gamma^-$ in Line 3 is a value such that (4.7) is constant for $\gamma \leq \gamma^-$. When $\gamma^- = \gamma^+$, there is only one critical value $\gamma_1'$ of $\gamma$, where $B^*(\gamma)$ changes from $B_0 = [j]$ to $B_1 = \{j\}$.

To illustrate the above, consider $j = 2$, i.e., with $D_2(\gamma)$ shown in Fig. 4a as the input to Algorithm 3. Then, Lines 2–3 give

    $\gamma^+ = c(1,2) = 1$
    $\gamma^- = \max\{\mu_1, c(1,2)\} = 1$,

where the last equality holds because $\mu_1$ is initialized to $-\infty$ by Line 1 of Algorithm 1. Since $\gamma^- = \gamma^+$ in this case, the algorithm returns at Line 5 the list $L = [(1, \{2\})]$, i.e., $\gamma_1' = 1$ and $B_1 = \{2\}$. This gives the desired $B^*(\gamma)$ in (4.3). Fig. 4c shows the digraph $D_2(\gamma)$ at $\gamma = 1$. It can be seen that both $\{1,2\}$ and $\{2\}$ are solutions to the minimization in (4.7).

If $\gamma^- \neq \gamma^+$ (or, more specifically, $\gamma^- < \gamma^+$), then the interval $(\gamma^-, \gamma^+)$ must contain other critical values of $\gamma$ where $B^*(\gamma)$ changes. The critical values are then computed iteratively by the preflow algorithm MaxFlow (4.11) (Lines 8 and 23) applied to the digraph $\bar{D}_j$ with capacities derived from those of $D_j(\gamma)$ (Lines 15–17), and with $\gamma$ evaluated at some value $\bar\gamma \in (\gamma^-, \gamma^+)$ satisfying (4.10). This either resolves $B^*(\gamma)$ for the entire interval (in which case the solution is updated in Line 25) or reduces the problem to two smaller subproblems for later processing (i.e., with the original interval $(\gamma^-, \gamma^+)$ replaced by the two smaller intervals $(\gamma^-, \bar\gamma)$ and $(\bar\gamma, \gamma^+)$ in Line 27).

To illustrate the procedure above, consider $j = 3$, i.e., with $D_3(\gamma)$ shown in Fig. 5a as the input to Algorithm 3. Then, Lines 2–3 give

    $\gamma^+ = \max\{c(1,2) + c(1,3), \ c(1,2) + c(2,3)\} = \max\{1+5, 1+1\} = 6$
    $\gamma^- = \min\{\max\{\mu_1, c(1,3)\}, \ \max\{\mu_2, c(2,3)\}\} = \min\{\max\{1,5\}, \max\{-\infty,1\}\} = 1$,

where we used the values $\mu_1 = 1$ and $\mu_2 = -\infty$ from (4.5). Since $\gamma^- < \gamma^+$ in this case, Line 5 is skipped. Line 8 invokes the preflow algorithm for the graph $D_3(\gamma^-) = D_3(1)$ shown in Fig. 5c. The min-cut is $T^* = [3] = U \setminus \{s\}$ (where $U = \{s,1,2,3\}$ is the vertex set of $D_3$) by the construction of $\gamma^-$. The max-flow is

    $f^*(s,1) = f^*(1,3) = 1$
    $f^*(1,s) = f^*(3,1) = -1$    (4.13)

and 0 otherwise. Note that the second line of equations ensures the anti-symmetry property of a flow function, i.e.,

    $f^*(w,v) = -f^*(v,w)$    (4.14)

for all pairs of distinct nodes $v$ and $w$. The flow along each arc is indicated in Fig. 5c by the parentheses next to the corresponding capacity of the arc.

The tuple $(\gamma^-, \gamma^+, f^*, \{s\}, \{j\})$ is then added to $PL$ in Line 9 and subsequently retrieved (and deleted from $PL$) inside the while-loop (Line 11). With $\gamma^- = 1$, $\gamma^+ = 6$, $S = \{s\}$ and $T = \{j\}$ in Line 11, the l.h.s. of (4.10) is given as (see Fig. 5a)

    $c_\gamma(S, U \setminus S) = c_\gamma(s, \{1,2,3\}) = \begin{cases} 1, & \gamma < 1 \\ \gamma, & \gamma \geq 1 \end{cases} + \begin{cases} 0, & \gamma < 1 \\ \gamma - 1, & \gamma \geq 1 \end{cases} = \begin{cases} 1, & \gamma < 1 \\ 2\gamma - 1, & \gamma \geq 1 \end{cases}$

and the r.h.s. of (4.10) is given as

    $c_\gamma(U \setminus T, T) = c_\gamma(\{s,1,2\}, 3) = 5 + \begin{cases} 2-\gamma, & \gamma < 1 \\ 1, & \gamma \geq 1 \end{cases} = \begin{cases} 7-\gamma, & \gamma < 1 \\ 6, & \gamma \geq 1. \end{cases}$

$\bar\gamma$ is computed as the solution to (4.10), namely $\bar\gamma = 3.5$.
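Both sides of (4.10) are piecewise linear in $\gamma$: the source-side capacities are non-decreasing and the sink-side capacities non-increasing, so their difference is monotone and the crossing $\bar\gamma$ can be found, e.g., by bisection (the paper instead intersects the two $O(j)$-piece linear functions directly, in $O(j)$ time). A numeric sketch of the first while-loop iteration ($S = \{s\}$, $T = \{3\}$), reusing `D_j` and the $x_{\gamma,i}$ of (4.5):

    # Sketch: solve (4.10) for gamma-bar by bisection on (gamma-, gamma+).
    def x_funcs(gamma):   # x_{gamma,1} and x_{gamma,2} from (4.5)
        return {1: -1 if gamma < 1 else -gamma, 2: 1 - gamma}

    def sides(gamma, S, T, U):
        arcs = D_j(3, gamma, x_funcs(gamma))
        cut = lambda A: sum(c_ for (u, v), c_ in arcs.items()
                            if u not in A and v in A)
        return cut(U - S), cut(T)   # l.h.s. and r.h.s. of (4.10)

    lo, hi = 1.0, 6.0               # (gamma-, gamma+) for j = 3
    for _ in range(50):
        mid = (lo + hi) / 2
        lhs, rhs = sides(mid, {'s'}, {3}, {'s', 1, 2, 3})
        lo, hi = (mid, hi) if lhs < rhs else (lo, mid)
    print(round(lo, 6))             # 3.5, the gamma-bar computed in the text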
In general, such a value must exist and is unique because $U \setminus S$ and $T$ are optimal solutions to (4.7) at $\gamma^-$ and $\gamma^+$ respectively. The computation takes $O(j)$ time since both sides of the equation are piecewise linear with at most $O(j)$ breakpoints.

The new weighted digraph $\bar{D}_j$ with the capacity function $\bar{c}$ assigned in the first for-loop (Lines 15–17) can be obtained from $D_j(\gamma)$ by

1) setting $\gamma = \bar\gamma$ and contracting $S$ to the source node $s$,
2) contracting $T$ to the sink node $j$, and then
3) removing the incoming arcs to $s$ and the outgoing arcs from $j$.

The second for-loop (Lines 19–22) turns $f$ into a valid preflow $\bar{f}$ of $\bar{D}_j$.

Recall that, for the current example, $\bar\gamma = 3.5$ in the last execution of the algorithm. Fig. 6a shows two digraphs, where the top one is the digraph $D_3(\gamma)$ at $\gamma = \bar\gamma = 3.5$ and the bottom one is the new weighted digraph $\bar{D}_3$. The sets $S$, $T$ and the flow $f$ indicated on the top digraph $D_3(3.5)$ satisfy (4.10), while $\bar{D}_3$ is annotated with the max-flow $f^*$ and min-cut $T^*$ computed by Line 23. Note that, since $T^* = \{1,3\} \neq \{j\}$, Line 25 will be skipped. Instead, Line 27 adds the following two tuples to the list $PL$, which becomes

    $PL = [(1, 3.5, f, \{s\}, \{1,3\}), (3.5, 6, f^*, \{s,2\}, \{3\})]$.    (4.15)

Fig. 6: Illustration of the parametric max-flow algorithm in Algorithm 3. (a) $\bar{D}_3$ from $D_3(3.5)$. (b) $\bar{D}_3$ from $D_3(2)$. (c) $\bar{D}_3$ from $D_3(5)$.

Repeating the while-loop with the first element retrieved from $PL$, it can be shown that (4.10) is solved by the value $\bar\gamma = 2$. Similar to Fig. 6a, Fig. 6b shows the digraph $D_3(2)$ at the top and the corresponding $\bar{D}_3$ at the bottom. It can be verified that $S$ and $T$ satisfy (4.10) for the top graph and that $T^*$ is the min-cut in the bottom graph. Since $T^* = \{3\}$ in this case, a new element $(2, \{1,3\})$ is added to $L$ in Line 25.

Finally, repeating the while-loop again with the last element retrieved from $PL$ in (4.15), it can be shown that $\bar\gamma = 5$. Fig. 6c again gives $D_3(5)$ and the min-cut $T^* = \{3\}$, in which case a new element $(5, \{3\})$ is added to $L$ again in Line 25. Since $PL$ is now empty, the algorithm terminates with

    $L = [(2, \{1,3\}), (5, \{3\})]$,

i.e., $\gamma_1' = 2$, $B_1 = \{1,3\}$, $\gamma_2' = 5$ and $B_2 = \{3\}$. This gives the desired $B^*(\gamma)$ in (4.6), which yields the desired PSP in (3.6) and therefore the info-clustering solution in (2.11) for the PIN model (2.1a).

V. CONCLUSION

We have adapted the parametric max-flow algorithm for computing the PSP into an info-clustering algorithm that clusters a graphical network based on the information flow over its edges. The overall running time is $O(|V|^3 \sqrt{|E|})$, where $|V|$ is the size of the network and $|E|$ is the number of edges or communication links. The algorithm simplifies the general info-clustering algorithm by a few orders of magnitude, and is applicable to systems, such as social networks, where similarity can be measured by mutual information.

To implement the algorithm in a large-scale social network, the preflow algorithm may be made distributed and adaptive: servers may be deployed in different parts of the network to measure and store the information exchange rates of different pairs of nodes. The push and relabel operations in the preflow algorithm can be done locally by the servers first and then communicated to other servers when necessary. The preflow of the network may be stored in conjunction with the clustering solution, so that the clusters can be updated incrementally over time based on the changes of information exchange rates. The allocation of the servers and other resources may also be adapted to the clustering solution. For instance, as intra-cluster communication is more frequent than inter-cluster communication, the nodes in a cluster with larger mutual information may be assigned to the same server so that changes in the network can be updated more frequently without much communication overhead among the servers.

REFERENCES

[1] C. Chan, A. Al-Bashabsheh, Q. Zhou, T. Kaced, and T. Liu, "Info-clustering: A mathematical theory for data clustering," IEEE Transactions on Molecular, Biological and Multi-Scale Communications, vol. 2, no. 1, pp. 64–91, June 2016.
[2] C. Chan, A. Al-Bashabsheh, J. Ebrahimi, T. Kaced, and T. Liu, "Multivariate mutual information inspired by secret-key agreement," Proceedings of the IEEE, vol. 103, no. 10, pp. 1883–1913, Oct 2015.
[3] H. Narayanan, "The principal lattice of partitions of a submodular function," Linear Algebra and its Applications, vol. 144, pp. 179–216, 1990.
[4] Y. Wu and P. Yang, "Minimax rates of entropy estimation on large alphabets via best polynomial approximation," IEEE Trans. Inf. Theory, vol. 62, no. 6, pp. 3702–3720, 2016.
[5] K. Nagano, Y. Kawahara, and S. Iwata, "Minimum average cost clustering," in NIPS, J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, Eds. Curran Associates, Inc., 2010, pp. 1759–1767.
[6] C. Chan and T. Liu, "Clustering of random variables by multivariate mutual information on Chow-Liu tree approximations," in Fifty-Third Annual Allerton Conference on Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, Sep. 2015.
[7] A. J. Butte and I. S. Kohane, "Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements," in Pac Symp Biocomput, vol. 5, 2000, pp. 418–429.
[8] S. Nitinawarat, C. Ye, A. Barg, P. Narayan, and A. Reznik, "Secret key generation for a pairwise independent network model," IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6482–6489, Dec 2010.
[9] S. Nitinawarat and P. Narayan, "Perfect omniscience, perfect secrecy, and Steiner tree packing," IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6490–6500, Dec. 2010.
[10] C. Chan, "Matroidal undirected network," in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, July 2012, pp. 1498–1502.
[11] C. Chan, "The hidden flow of information," in Proc. IEEE Int. Symp. on Inf. Theory, St. Petersburg, Russia, Jul. 2011.
[12] S. Fujishige and S. Iwata, "Minimizing a submodular function arising from a concave function," Discrete Applied Mathematics, vol. 92, no. 2, pp. 211–215, 1999.
[13] M. Queyranne, "Minimizing symmetric submodular functions," Mathematical Programming, vol. 82, no. 1-2, pp. 3–12, 1998.
[14] S. Jegelka, H. Lin, and J. A. Bilmes, "On fast approximate submodular minimization," in Advances in Neural Information Processing Systems, 2011, pp. 460–468.
[15] V. Kolmogorov, "A faster algorithm for computing the principal sequence of partitions of a graph," Algorithmica, vol. 56, no. 4, pp. 394–412, 2010.
[16] G. Gallo, M. D. Grigoriadis, and R. E. Tarjan, "A fast parametric maximum flow algorithm and applications," SIAM Journal on Computing, vol. 18, no. 1, pp. 30–55, 1989.
[17] C. Chan and L. Zheng, "Mutual dependence for secret key agreement," in Proceedings of 44th Annual Conference on Information Sciences and Systems, 2010.
[18] S. Fujishige, "Polymatroidal dependence structure of a set of random variables," Information and Control, vol. 39, no. 1, pp. 55–72, 1978.
[19] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency. Springer, 2002.
[20] A. V. Goldberg, "Efficient graph algorithms for sequential and parallel computers," Ph.D. dissertation, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1987.
[21] A. V. Goldberg and R. E. Tarjan, "A new approach to the maximum-flow problem," Journal of the ACM (JACM), vol. 35, no. 4, pp. 921–940, 1988.
[22] B. V. Cherkassky and A. V. Goldberg, "On implementing the push-relabel method for the maximum flow problem," Algorithmica, vol. 19, no. 4, pp. 390–410, 1997.
