
Spectral Detection in the Censored Block Model

Alaa Saade
Laboratoire de Physique Statistique, École Normale Supérieure, 24 Rue Lhomond, Paris 75005

Florent Krzakala
Sorbonne Universités, UPMC Univ. Paris 06, Laboratoire de Physique Statistique, CNRS UMR 8550, École Normale Supérieure, 24 Rue Lhomond, Paris

Marc Lelarge
INRIA and École Normale Supérieure, Paris, France

Lenka Zdeborová
Institut de Physique Théorique, CEA Saclay and URA 2306, CNRS, 91191 Gif-sur-Yvette, France.

arXiv:1502.00163v2 [cs.SI] 10 Jun 2015

Abstract: We consider the problem of partially recovering hidden binary variables from the observation of (few) censored edge weights, a problem with applications in community detection, correlation clustering and synchronization. We describe two spectral algorithms for this task based on the non-backtracking and the Bethe Hessian operators. These algorithms are shown to be asymptotically optimal for the partial recovery problem, in that they detect the hidden assignment as soon as it is information theoretically possible to do so.

A. Introduction

In many inference problems, the available data can be represented on a weighted graph. Given the knowledge of the edge weights, the task is to infer latent variables carried by the nodes. Here, we shall consider the problem of recovering binary node labels from censored edge measurements [1], [2]. Specifically, given an Erdős-Rényi random graph $G = (V,E) \in \mathcal{G}(n, \alpha/n)$ with $n$ nodes carrying latent variables $\sigma_i = \pm 1$, $1 \le i \le n$, we draw the edge labels $J_{ij} = \pm 1$, $(ij) \in E$, from the following distribution:

\[ P(J_{ij} \mid \sigma_i, \sigma_j) = (1-\epsilon)\, \mathbf{1}(J_{ij} = \sigma_i \sigma_j) + \epsilon\, \mathbf{1}(J_{ij} = -\sigma_i \sigma_j), \tag{1} \]

where $\epsilon$ is a noise parameter. In the noiseless case $\epsilon = 0$, we have $\sigma_i \sigma_j = J_{ij}$ and one can easily recover the communities in each connected component along a spanning tree. When $\epsilon = 1/2$, on the other hand, the graph does not contain any information about the latent variables $\sigma_i$, and recovery is impossible. What happens in between? The problem of exactly recovering the latent variables $\sigma_i$ has been studied in [1]. It turns out that, asymptotically in the large $n$ limit, exact recovery is possible if and only if

\[ \alpha > \alpha_{\rm exact} = \frac{2 \log n}{(1-2\epsilon)^2}, \tag{2} \]

where $\alpha$ is the average degree of the graph. Note that the variable of an isolated vertex cannot be recovered, so that the average degree has to grow at least like $\log n$, as in the coupon collector's problem, to ensure that the graph is connected.

We consider in this paper the case where the average degree $\alpha$ remains fixed as $n$ tends to infinity. In this setting, we cannot ask for exact recovery and we consider here a different question: is it possible to infer an assignment $\hat\sigma_i$ of the latent variables that is positively correlated with the planted variables $\sigma_i$? We call positively correlated an assignment $\hat\sigma_i$ such that the following quantity, called overlap, is strictly positive:

\[ 2 \left[ \max\left( \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(\hat\sigma_i = \sigma_i),\ \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(\hat\sigma_i = -\sigma_i) \right) - \frac{1}{2} \right]. \tag{3} \]

In the limit $n \to \infty$, this overlap vanishes for a random guess $\hat\sigma_i$, and is equal to unity if the recovery is exact. We will refer to the task of finding a positively correlated assignment $\hat\sigma_i$ as partial recovery. This task has been shown [3], [4] to be possible only if

\[ \alpha > \alpha_{\rm detect} = \frac{1}{(1-2\epsilon)^2}. \tag{4} \]

To the best of our knowledge, there is no rigorous proof that this bound is also sufficient. In [3], the authors also showed that belief propagation (BP) saturates this bound. However, there is no rigorous analysis of BP for this problem: the fact that condition (4) is necessary and sufficient was left as a conjecture in [3], and only the necessary part was proved in [4]. Moreover, from a practical point of view, BP requires the knowledge of the noise parameter $\epsilon$.

In this contribution, we describe two simple spectral algorithms and we show rigorously that they are optimal, in the sense that they can perform partial recovery as soon as $\alpha > \alpha_{\rm detect}$. Additionally, the output of these algorithms is shown numerically to have an overlap similar to that of BP, without requiring the knowledge of the noise parameter $\epsilon$. This closes the gap from [3], [4], where spectral methods are introduced that succeed only if the connectivity is significantly larger than the threshold (4). The resulting algorithms are thus fast, trivial to implement, and asymptotically optimal.

B. Motivation and Related work

There are various interpretations and models that connect to this problem, such as i) Community detection [2]: we try to recover the community membership of the nodes based on noisy (or censored) observations about their relationship; ii) Correlation clustering [5]: we try to cluster the graph $G$ by minimizing the number of "disagreeing edges" ($J_{ij} = -1$) in each cluster. These examples, and others such as synchronisation, are discussed in detail in [1].

The inspiration for the present contribution comes from recent developments in the problem of detecting communities in the (sparse) stochastic block model. The threshold for partial recovery in the stochastic block model was conjectured in [6] and proved in [7]–[9]. Optimal spectral methods, based on the same operators as the algorithms introduced here, were proposed in [10], [11]. These operators were in particular shown to be much better suited to very sparse graphs than the traditional adjacency or Laplacian operators.

Interestingly, this problem first appeared in statistical physics. Indeed, the posterior distribution corresponding to eq. (1) reads, using $\beta_0 = \frac{1}{2} \log \frac{1-\epsilon}{\epsilon}$,

\[ P(\sigma \mid J) = \frac{ e^{\beta_0 \sum_{(ij) \in E} J_{ij} \sigma_i \sigma_j} }{ Z_J }. \tag{5} \]

This is nothing but the spin glass [12] problem where the couplings $J_{ij}$ are correlated with the "planted" configuration $\sigma_i$ [2], [13]. Such problems can also be shown to be equivalent to spin glasses on the so-called Nishimori line [14], [15]. With these notations, the detection condition (4) corresponds to the well-known spin glass transition [16], [17] at $\sqrt{\alpha_{\rm detect}} \tanh \beta_0 = 1$. In this spin glass context, [18] already conjectured that a spectral algorithm based on the non-backtracking operator (see sec. I-A) was optimal.

C. Outline and main results

In section I, we describe two spectral algorithms that achieve the threshold (4). These algorithms are based on two linear operators: the non-backtracking operator introduced in [10], and the Bethe Hessian introduced in [11]. We further illustrate their properties by showing the results of numerical experiments. In section II, we list the spectral properties of the non-backtracking operator that are relevant to the present context. Finally, in section III, we discuss the properties of the Bethe Hessian, its relation with the non-backtracking operator, and its connection with the Bethe free energy.

I. SPECTRAL ALGORITHMS

A. The non-backtracking operator

The non-backtracking operator acts on the directed edges $i \to j$ of the graph as

\[ B_{i \to j,\, k \to \ell} = J_{k\ell}\, \mathbf{1}(j = k)\, \mathbf{1}(i \neq \ell). \tag{6} \]

It is therefore represented by a $2m \times 2m$ matrix, where $m$ is the number of edges in the graph. As discussed in [10], [18], the motivation for using this operator is that it corresponds to the linear approximation of belief propagation for this problem around the so-called uninformative fixed point of BP.

Similarly to [10], one can show (see Sec. III for details) that the eigenvalues of $B$ that are different from $\pm 1$ form the spectrum of the simpler $2n \times 2n$ matrix

\[ B' = \begin{pmatrix} 0 & D - \mathbb{1} \\ -\mathbb{1} & J \end{pmatrix}, \tag{7} \]

where $\mathbb{1}$ is the $n \times n$ identity matrix, $D$ is the diagonal matrix defined by $D_{ii} = d_i$, where $d_i$ is the degree of node $i$, and $J$ has entries equal to the edge weights $J_{ij}$. Furthermore, if $(\lambda \neq \pm 1, v \in \mathbb{R}^{2m})$ is an eigenpair of $B$, then $(\lambda, v' \in \mathbb{R}^{2n})$ is an eigenpair of $B'$ if

\[ v'_{n+i} = \sum_{j \in \partial i} v_{j \to i}, \quad \forall\, 1 \le i \le n, \tag{8} \]
\[ \lambda v'_i = (d_i - 1)\, v'_{n+i}, \tag{9} \]

where $\partial i$ and $d_i$ are the set of neighbors and the degree of node $i$. We will therefore favor using $B'$. The algorithm is then as follows: given a graph with edge weights $J_{ij}$,

Algorithm 1
1) build the matrix $B'$
2) compute its leading eigenvalue $\lambda_1$ (with largest magnitude), and its corresponding eigenvector $v' = \{v'_i\}$.
3) if $\lambda_1 \in \mathbb{R}$ and $\lambda_1 > \sqrt{\alpha}$, where $\alpha$ is the average degree of the graph, set $\hat\sigma_i = {\rm sign}(v'_{n+i})$. Otherwise, raise an error.

Theorem 1 ensures that whenever (4) holds, this algorithm outputs an assignment $\hat\sigma_i$ that is positively correlated with the planted latent variables $\sigma_i$.

B. The Bethe Hessian

Another operator closely related to the non-backtracking operator was introduced in [11]. This operator, called the Bethe Hessian, is an $n \times n$ real and symmetric matrix defined as

\[ H = (\alpha - 1)\, \mathbb{1} - \sqrt{\alpha}\, J + D, \tag{10} \]

where $D$ is the diagonal matrix of vertex degrees. Based on this operator, we propose the following algorithm: given a graph with edge weights $J_{ij}$,

Algorithm 2
1) build the Bethe Hessian $H$
2) compute its (algebraically) smallest eigenvalue $\lambda$, and its corresponding eigenvector $v$.
3) if $\lambda < 0$, set $\hat\sigma_i = {\rm sign}(v_i)$. Otherwise, raise an error.

Justifications for this second algorithm, and its relation with the first one, will be provided in section III. Compared to the first algorithm, this second one is based on a smaller, symmetric matrix, which leads to improved numerical performance and stability. Additionally, in the case of more general edge weights $J_{ij} \neq \pm 1$, the reduction of $B$ to a smaller matrix $B'$ fails, and one has to work with a $2m \times 2m$ matrix. The Bethe Hessian, on the other hand, generalizes easily to arbitrary weights without any loss in scalability [11].

C. Numerical results

Before turning to proofs, we show on figure 1 the numerical performance of our two algorithms, and compare them with the performance of belief propagation ([3], [19]), which is believed to be optimal on such locally tree-like graphs in the sense that it gives, arguably, the Bayes optimal value of the overlap asymptotically. As shown in section II, both algorithms 1 and 2 are able to achieve partial recovery as soon as $\alpha > \alpha_{\rm detect}$, and their overlap is similar to that of BP, though of course strictly smaller. Note again that BP requires the knowledge of $\epsilon$ while the two spectral algorithms described here do not. They are also trivial to implement, run faster, and avoid the potential non-convergence problem of belief propagation while remaining asymptotically optimal in detecting the hidden assignment. We also observe, empirically, that the overlap given by the Bethe Hessian seems to be always superior to the one provided by the non-backtracking operator.

Fig. 1. Overlap as a function of $\alpha$: comparison between algorithm 1 (based on the non-backtracking operator $B$), algorithm 2 (based on the Bethe Hessian $H$), and belief propagation (BP). The noise parameter $\epsilon$ is fixed to 0.25 (corresponding to $\alpha_{\rm detect} = 4$), and we vary $\alpha$. The overlap for $B$ and $H$ is averaged over 20 graphs of size $n = 10^5$. The overlap for BP is estimated asymptotically using the standard method of population dynamics (see for instance [20]), with a population of size $10^4$. All three methods output a positively correlated assignment as soon as $\alpha > \alpha_{\rm detect}$. Spectral algorithms 1 and 2 have an overlap similar to that of BP, with the same phase transition, while being simpler and not requiring the knowledge of the parameter $\epsilon$.

II. SPECTRAL PROPERTIES OF THE NON-BACKTRACKING OPERATOR

In this section, we state results concerning the spectrum of $B$ and show that algorithm 1 outputs an assignment $\hat\sigma_i$ that is positively correlated with the planted one, whenever (4) holds.

As already noticed in previous work for the case of an unweighted random graph [10], [21], the superior performance of the non-backtracking operator $B$ is due to the particular shape of its spectrum. In the case of the stochastic block model [22], it decomposes into a bulk of uninformative eigenvalues contained in a disk of radius $\sqrt{\alpha}$ in the complex plane, and a few real and informative eigenvalues outside of the disk. This observation was recently proven in [23], in the case of 2 communities.

The following theorem generalizes this previous result to the present setting and is the main result of this paper.

Theorem 1: Given an Erdős-Rényi random graph with average degree $\alpha$, variables $\sigma_i = \pm 1$ assigned to vertices uniformly at random independently from the graph, and edges carrying weights sampled from (1), we denote by $B$ the non-backtracking operator defined by (6), and by $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_{2m}|$ the eigenvalues of $B$ in order of decreasing magnitude. Then, with probability tending to 1 as $n \to \infty$, we have:
(i) if $\alpha < \alpha_{\rm detect}$, then $|\lambda_1| \le \sqrt{\alpha} + o(1)$.
(ii) if $\alpha > \alpha_{\rm detect}$, then $\lambda_1 \in \mathbb{R}$, $\lambda_1 = \alpha(1-2\epsilon) + o(1) > \sqrt{\alpha}$, and $|\lambda_2| \le \sqrt{\alpha} + o(1)$. Additionally, denoting $v$ the eigenvector associated with $\lambda_1$, the following assignment is positively correlated with the planted variables $\sigma_i$:

\[ \hat\sigma_i = {\rm sign}\Big( \sum_{j \in \partial i} v_{j \to i} \Big). \]

This theorem is illustrated on Fig. 2. It is then straightforward to show the following:

Corollary 1: The assignment output by Algo. 1 is positively correlated with the planted variables $\sigma_i$ if and only if

\[ \alpha > \alpha_{\rm detect}. \tag{11} \]

Fig. 2. Spectrum of the non-backtracking matrix in the complex plane for a problem generated with $\epsilon = 0.25$, $n = 2000$. We used $\alpha = 3$ (left side) and $\alpha = 8$ (right side), to be compared with $\alpha_{\rm detect} = 4$. Each point represents an eigenvalue. In both cases, the bulk of the spectrum is confined in a circle of radius $\sqrt{\alpha}$. However, when $\alpha > \alpha_{\rm detect}$, a single isolated eigenvalue appears out of the bulk at $(1-2\epsilon)\alpha$ (see the arrow on the right plot) and the corresponding eigenvector is correlated with the planted assignment.

We now give a brief sketch of proof for our Theorem 1. The proof relies heavily on the techniques developed in [23]. We try to use notation consistent with [23]: $\vec{E}$ is the set of oriented edges and for any $e = u \to v = (u,v) \in \vec{E}$, we set $e_1 = u$, $e_2 = v$ and $e^{-1} = (v,u)$. For a matrix $M$, its transpose is denoted by $M^*$. We start with a simple observation: if $t$ is the vector in $\mathbb{R}^{\vec{E}}$ defined by $t_e = \sigma_{e_2}$ and $\odot$ is the Hadamard product, i.e. $(t \odot x)_e = \sigma_{e_2} x_e$, then we have

\[ B x = \lambda x \iff \tilde{B}(t \odot x) = \lambda (t \odot x), \tag{12} \]

with $\tilde{B}$ defined by $\tilde{B}_{ef} = B_{ef}\, \sigma_{f_1} \sigma_{f_2}$. In particular, $B$ and $\tilde{B}$ have the same spectrum and there is a trivial relation between their eigenvectors. It will be easier to work with $\tilde{B}$, so to lighten the notation we will denote (in this section)

\[ B_{ef} = \mathbf{1}(e_2 = f_1)\, \mathbf{1}(e_1 \neq f_2)\, P_f, \]

where $P_f = \sigma_{f_1} J_f \sigma_{f_2}$. Note that the random variables $P_f$ are now i.i.d. with $P(P_f = 1) = 1 - P(P_f = -1) = 1 - \epsilon$. With this formulation, the problem is said in statistical physics to be "on the Nishimori line" [14], [15].

For the case $(1-2\epsilon)^2 \alpha < 1$, the proof is relatively easy. Indeed, from [4], we know that our setting is contiguous to the setting with $\epsilon = 1/2$. In this case, the random variables $P_{i,j}$ are centered and a version of the trace method gives an upper bound on the spectral radius of $B$. Note however that one needs to condition on the graph being $\ell$-tangle-free, i.e. such that every neighborhood of radius $\ell$ contains at most one cycle, in order to apply the first moment method.

We now consider the case $(1-2\epsilon)^2 \alpha > 1$ and denote by $P$ the linear mapping on $\mathbb{R}^{\vec{E}}$ defined by $(Px)_e = P_e\, x_{e^{-1}}$ (i.e. the matrix associated to $P$ is $P_{ef} = P_e\, \mathbf{1}(f = e^{-1})$). Note that $P^* = P$ and, since $P_e^2 = 1$, $P$ is an involution, so that $P$ is an orthogonal matrix. A simple computation shows that $B^k P = P B^{*k}$, hence $B^k P$ is a symmetric matrix. This symmetry corresponds to the oriented path symmetry in [23] and will be crucial to our analysis.

We also define $\tilde\alpha = (1-2\epsilon)\alpha$ and $\chi \in \mathbb{R}^{\vec{E}}$ with $\chi_e = 1$ for all $e \in \vec{E}$. The proof strategy is then similar to Section 5 in [23]. Consider a sequence $\ell \sim \kappa \log_{\tilde\alpha} n$ for some small positive $\kappa$. Let

\[ \varphi = \frac{B^\ell \chi}{\| B^\ell \chi \|}, \qquad \theta = \| B^\ell P \varphi \|, \qquad \zeta = \frac{B^\ell P \varphi}{\theta}. \]

If $R = B^\ell - \theta\, \zeta\, (P\varphi)^*$ and we can prove that $\|R\|$ is small in comparison with $\theta$, then we can use a theorem on perturbation of eigenvalues and eigenvectors adapted from the Bauer-Fike theorem (see Section 4 in [23]) saying that $B^\ell$ should have an eigenvalue close to $\theta$. More precisely, for $y \in \mathbb{R}^{\vec{E}}$ with $\|y\| = 1$, write $y = s P\varphi + x$ with $x \in (P\varphi)^\perp$ and $s \in \mathbb{R}$. Then, we find

\[ \| R y \| = \| B^\ell x + s (B^\ell P \varphi - \theta \zeta) \| \le \sup_{x :\, \langle x, P\varphi \rangle = 0,\ \|x\| = 1} \| B^\ell x \|. \]

This last quantity can be shown to be upper bounded by $(\log n)^c \alpha^{\ell/2}$, similarly as in Proposition 12 in [23]. Moreover, we can also show that with high probability (w.h.p.)

\[ \langle \zeta, P\varphi \rangle \ge c_0, \qquad c_0\, \tilde\alpha^\ell \le \theta \le c_1\, \tilde\alpha^\ell. \tag{13} \]

These bounds make it possible to show that $B$ has an eigenvalue $\lambda_1$ with $|\lambda_1 - \tilde\alpha| = O(1/\ell)$ and that $|\lambda_2| \le \sqrt{\alpha} + o(1)$.

Note that $\theta = \| B^\ell B^{*\ell} P \chi \| / \| B^\ell \chi \|$, so that we need to compute quantities of the type $\| B^\ell \chi \|$. We now explain the main ideas to compute these quantities. First note that $(B^\ell \chi)_e$ depends only on the ball of radius $\ell$ around the edge $e$. For $\ell$ not too large, this neighborhood can be coupled with a Galton-Watson branching process with offspring distribution ${\rm Poi}(\alpha)$. It is then natural to consider this Poisson Galton-Watson branching process with i.i.d. weights $P_{u,v} \in \{\pm 1\}$ on its edges with mean $1 - 2\epsilon$. For $u$ in the tree, we denote by $|u|$ its generation and by $Y(u) = \prod_{s=1}^{t-1} P_{\gamma_s, \gamma_{s+1}}$, where $\gamma = (\gamma_1, \dots, \gamma_t)$ is the unique path between the root $o = \gamma_1$ and $u = \gamma_t$. Then $(B^\ell \chi)_e$ is well approximated by

\[ Z_\ell = \sum_{|u| = \ell} Y(u). \]

It is easy to see that $X_t = Z_t / \tilde\alpha^t$ is a martingale (with respect to the natural filtration) with mean one. Moreover we have

\[ E\left[ Z_t^2 \right] = E\Big[ \sum_{u,v :\, |u| = |v| = t} Y(u) Y(v) \Big] = \sum_{i=0}^{t} \alpha^{t-i} (1-2\epsilon)^{2i} \alpha^{2i} = O\left( \tilde\alpha^{2t} \right), \]

where the last equality is valid only if $(1-2\epsilon)^2 \alpha > 1$. So in this case, we have $E\left[ X_t^2 \right] = O(1)$ and the martingale $X_t$ converges a.s. and in $L^2$ to a limiting random variable $X(\infty)$ with mean one. Following the argument as in [23], this reasoning leads to (13).

We now consider the eigenvector associated with $\lambda_1$. It follows from the Bauer-Fike theorem (see Section 4 in [23]) that the eigenvector $x$ associated to $\lambda_1$ is asymptotically aligned with $B^\ell B^{*\ell} P \chi / \| B^\ell B^{*\ell} P \chi \|$. Thanks to the coupling with the branching process, we can prove that $\| B^\ell B^{*\ell} P \chi \| \approx \tilde\alpha^{2\ell}$ and moreover, we have for $e \in \vec{E}$,

\[ \frac{ (B^\ell B^{*\ell} P \chi)_e }{ \tilde\alpha^{2\ell} } \approx \frac{ \tilde\alpha }{ \alpha (1-2\epsilon)^2 - 1 }\, X(\infty), \tag{14} \]

where $X(\infty)$ is the limit of the martingale defined above and has mean one. We can now translate this result to the eigenvector of the original non-backtracking operator thanks to (12): $v_e = \sigma_{e_2} x_e$, where $x_e$ is approximated by (14). In particular, we see that $\sum_{e,\, e_2 = v} v_e$ is correlated with $\sigma_v$.

III. FROM THE NON-BACKTRACKING OPERATOR TO THE BETHE HESSIAN

In this section, we relate the spectra of $H$, $B$ and $B'$ by generalizing some properties discussed in [10], [11]. $(\lambda \neq \pm 1, v \in \mathbb{R}^{2m})$ being an eigenpair of $B$, we define

\[ v_i = \sum_{j \in \partial i} v_{j \to i}, \quad \forall\, 1 \le i \le n. \tag{15} \]

Since $\lambda v_{i \to j} = \sum_{k \in \partial i \setminus j} J_{ki}\, v_{k \to i}$, it follows that $\lambda v_{i \to j} = v_i - J_{ij}\, v_{j \to i}$. Closing the equation on the single site elements $v_i$ thus leads to

\[ v_i \left( 1 + \sum_{k \in \partial i} \frac{J_{ik}^2}{\lambda^2 - J_{ik}^2} \right) - \lambda \sum_{k \in \partial i} \frac{J_{ik}}{\lambda^2 - J_{ik}^2}\, v_k = 0. \tag{16} \]

For convenience, we now define the matrix

\[ H(X) = (X^2 - 1)\, \mathbb{1} - X J + D. \tag{17} \]

Note in particular that the Bethe Hessian reads $H = H(\sqrt{\alpha})$. Given that the values of $J_{ij}$ are $\pm 1$, all eigenvalues of $B$ different from $\pm 1$ must thus satisfy the following generalization of the Ihara-Bass formula [24]:

\[ \det\left[ (\lambda^2 - 1)\, \mathbb{1} - \lambda J + D \right] = \det H(\lambda) = 0. \tag{18} \]

To solve (16) one needs to find an eigenvector $v$ of $H(\lambda)$ with a zero eigenvalue. This is a quadratic eigenproblem, which can be turned into a linear one by introducing the matrix $B'$ of Algo. 1. Indeed, if $\lambda \in \mathbb{R}$ is an eigenvalue of $B'$ with eigenvector $v'$, then it follows that $v := \{v'_i\}_{n+1 \le i \le 2n}$ is an eigenvector of $H(\lambda)$ with eigenvalue 0, so that $\lambda$ is an eigenvalue of $B$ as well (at least if $\lambda \neq \pm 1$), justifying eqs. (8) and (9). Note that since we are interested in values of $\lambda > 1$ (since $\lambda > \sqrt{\alpha}$ and we need $\alpha > 1$ from (4)), the limitation of looking at $\lambda \neq \pm 1$ is irrelevant.

Finally, following [11], we can relate the spectra of $B$ and $H$ by the following argument. For $X$ large enough, $H(X)$ is positive definite. Then as $X$ decreases, $H(X)$ will gain a new negative eigenvalue whenever $X$ becomes equal to an eigenvalue of $B$. This justifies the following corollary:

Corollary 2: If the conditions of Theorem 1 apply, then $H = H(\sqrt{\alpha})$ has a unique negative eigenvalue if $\alpha > \alpha_{\rm detect}$, and none otherwise.

Strictly speaking, if we denote by $\lambda_1$ the leading eigenvalue of $B$, we have only shown that the eigenvector with eigenvalue 0 of $H(\lambda_1)$ is positively correlated with the planted variables if $\alpha > \alpha_{\rm detect}$. However, we observe numerically (see figure 1) that the eigenvector with negative eigenvalue of $H$ is also positively correlated, and in fact gives a slightly better overlap. This point will have to be clarified in future work.

It is worth noting that the Bethe Hessian is also related to the belief propagation algorithm. [25] showed that the fixed points of the BP recursion are stationary points of the so-called Bethe free energy. Direct optimization of the Bethe free energy has then been proposed as an alternative to BP [26]. In this context, [11] showed that the so-called paramagnetic fixed point (corresponding to an uninformative assignment) is a local minimum of the Bethe free energy if and only if $H$ is positive definite. Algo. 2 can therefore be seen as a spectral relaxation of the direct optimization of the Bethe free energy. In the end, both approaches are indeed deeply related to BP.

IV. CONCLUSION

We have considered the problem of partially recovering binary variables from the observation of censored edge weights, and described two optimal spectral algorithms for this task that can provably perform partial recovery as soon as it is information theoretically possible to do so. Remarkably, these algorithms do not require the knowledge of the noise parameter $\epsilon$ and perform almost as well as belief propagation, which is expected (but not proved) to be Bayes optimal for this problem. This closes the gap from previous works, both algorithmically, by providing optimal spectral algorithms, and theoretically, by proving that the transition (4) is a necessary and sufficient condition for partial recovery.

ACKNOWLEDGMENT

This work has been supported by the ERC under the European Union's FP7 Grant Agreement 307087-SPARCS and by the French Agence Nationale de la Recherche under reference ANR-11-JS02-005-01 (GAP project).

REFERENCES

[1] E. Abbe, A. S. Bandeira, A. Bracher, and A. Singer, "Decoding binary node labels from censored edge measurements: Phase transition and efficient recovery," arXiv:1404.4749, 2014.
[2] E. Abbe and A. Montanari, "Conditional random fields, planted constraint satisfaction and entropy concentration," in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. Springer, 2013, pp. 332–346.
[3] S. Heimlicher, M. Lelarge, and L. Massoulié, "Community detection in the labelled stochastic block model," 09 2012. [Online]. Available: http://arxiv.org/abs/1209.2910
[4] M. Lelarge, L. Massoulié, and J. Xu, "Reconstruction in the labeled stochastic block model," in Information Theory Workshop (ITW), 2013 IEEE, Sept 2013, pp. 1–5.
[5] N. Bansal, A. Blum, and S. Chawla, "Correlation clustering," Machine Learning, vol. 56, no. 1-3, pp. 89–113, 2004.
[6] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications," Phys. Rev. E, vol. 84, no. 6, p. 066106, 2011.
[7] E. Mossel, J. Neeman, and A. Sly, "Stochastic block models and reconstruction," arXiv preprint arXiv:1202.1499, 2012.
[8] L. Massoulié, "Community detection thresholds and the weak Ramanujan property," arXiv preprint arXiv:1311.3085, 2013.
[9] E. Mossel, J. Neeman, and A. Sly, "A proof of the block model threshold conjecture," arXiv preprint arXiv:1311.4115, 2013.
[10] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences, vol. 110, no. 52, pp. 20935–20940, 2013.
[11] A. Saade, F. Krzakala, and L. Zdeborová, "Spectral clustering of graphs with the Bethe Hessian," in Advances in Neural Information Processing Systems, 2014, pp. 406–414.
[12] M. Mézard, M. A. Virasoro, and G. Parisi, Spin glass theory and beyond. World Scientific, 1987.
[13] F. Krzakala and L. Zdeborová, "Hiding quiet solutions in random constraint satisfaction problems," Phys. Rev. Lett., vol. 102, p. 238701, 2009.
[14] F. Krzakala, M.-C. Angelini, and F. Caltagirone, "Statistical physics of inference problems," Lecture notes, http://ipht.cea.fr/Docspht/articles/t14/045/public/notes.pdf, 2014.
[15] H. Nishimori, "Internal energy, specific heat and correlation function of the bond-random Ising model," Progress of Theoretical Physics, vol. 66, no. 4, pp. 1169–1181, 1981.
[16] L. Viana and A. J. Bray, "Phase diagrams for dilute spin glasses," J. Phys. C: Solid State Physics, vol. 18, no. 15, p. 3037, 1985.
[17] F. Guerra and F. L. Toninelli, "The high temperature region of the Viana–Bray diluted spin glass model," Journal of Statistical Physics, vol. 115, no. 1-2, pp. 531–555, 2004.
[18] P. Zhang, "Non-backtracking operator for Ising model and its application in attractor neural networks," arXiv:1409.3264, 2014.
[19] J. Pearl, "Reverend Bayes on inference engines: A distributed hierarchical approach," in AAAI, 1982, pp. 133–136.
[20] M. Mezard and A. Montanari, Information, physics, and computation. Oxford University Press, 2009.
[21] A. Saade, F. Krzakala, and L. Zdeborová, "Spectral density of the non-backtracking operator on random graphs," EPL, vol. 107, no. 5, p. 50005, 2014.
[22] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Social Networks, vol. 5, no. 2, p. 109, 1983.
[23] C. Bordenave, M. Lelarge, and L. Massoulié, "Non-backtracking spectrum of random graphs: community detection and non-regular Ramanujan graphs," arXiv, 2015.
[24] H. Bass, "The Ihara-Selberg zeta function of a tree lattice," International Journal of Mathematics, vol. 3, no. 06, pp. 717–797, 1992.
[25] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Bethe free energy, Kikuchi approximations, and belief propagation algorithms," Advances in Neural Information Processing Systems, vol. 13, 2001.
[26] M. Welling and Y. W. Teh, "Belief optimization for binary networks: A stable alternative to loopy belief propagation," in Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., 2001, pp. 554–561.
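ILLUSTRATIVE SKETCH OF ALGORITHMS 1 AND 2

As a rough illustration of Algorithm 1, Algorithm 2 and the overlap (3), the following NumPy sketch samples a small instance of the model (1) and runs both spectral methods on it. It is only a minimal sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, the matrices are dense (so it is only practical for small n), the average degree is estimated from the sampled graph, and sign(0) is mapped to +1.

```python
import numpy as np

def sample_censored_block_model(n, alpha, eps, rng):
    """Draw planted spins sigma and symmetric edge weights J following eq. (1)."""
    sigma = rng.choice([-1, 1], size=n)
    edges = np.triu(rng.random((n, n)) < alpha / n, k=1)   # Erdos-Renyi edges, i < j
    noise = np.where(rng.random((n, n)) < eps, -1, 1)      # flip a weight w.p. eps
    upper = edges * noise * np.outer(sigma, sigma)
    return sigma, (upper + upper.T).astype(float)

def overlap(sigma_hat, sigma):
    """Overlap of eq. (3)."""
    agree = np.mean(sigma_hat == sigma)
    disagree = np.mean(sigma_hat == -sigma)
    return 2.0 * (max(agree, disagree) - 0.5)

def to_spins(v):
    """Sign of each entry; zeros are mapped to +1 (an arbitrary choice)."""
    return np.where(v >= 0, 1, -1)

def bethe_hessian(J, x):
    """H(x) = (x^2 - 1) I - x J + D of eq. (17), D being the degree matrix."""
    deg = np.count_nonzero(J, axis=1)
    return (x ** 2 - 1.0) * np.eye(len(J)) - x * J + np.diag(deg)

def leading_eigenpair_b_prime(J):
    """Eigenpair of largest magnitude of the 2n x 2n matrix B' of eq. (7)."""
    n = len(J)
    eye = np.eye(n)
    deg = np.count_nonzero(J, axis=1)
    b_prime = np.block([[np.zeros((n, n)), np.diag(deg) - eye], [-eye, J]])
    evals, evecs = np.linalg.eig(b_prime)
    k = np.argmax(np.abs(evals))
    return evals[k], evecs[:, k]

def algorithm_1(J):
    """Algorithm 1: signs of the last n entries of the leading eigenvector of B'."""
    n = len(J)
    alpha = np.count_nonzero(J) / n                        # empirical average degree
    lam, v = leading_eigenpair_b_prime(J)
    if abs(lam.imag) > 1e-9 or lam.real <= np.sqrt(alpha):
        raise ValueError("leading eigenvalue is not real and above sqrt(alpha)")
    return to_spins(v[n:].real)

def algorithm_2(J):
    """Algorithm 2: signs of the eigenvector of H with smallest eigenvalue."""
    alpha = np.count_nonzero(J) / len(J)                   # empirical average degree
    evals, evecs = np.linalg.eigh(bethe_hessian(J, np.sqrt(alpha)))  # ascending order
    if evals[0] >= 0:
        raise ValueError("the Bethe Hessian has no negative eigenvalue")
    return to_spins(evecs[:, 0])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma, J = sample_censored_block_model(n=400, alpha=10.0, eps=0.05, rng=rng)
    print("overlap, Algorithm 1:", overlap(algorithm_1(J), sigma))
    print("overlap, Algorithm 2:", overlap(algorithm_2(J), sigma))
```

The demo parameters are chosen far above the threshold (4) so that both methods should find a negative (respectively isolated real) eigenvalue on such a small graph. For graphs of the size used in Fig. 1, a sparse representation and an iterative eigensolver (for instance `scipy.sparse.linalg.eigsh` for the symmetric Bethe Hessian) would presumably be needed instead of the dense routines used here.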
