ebook img

Clustering from Sparse Pairwise Measurements PDF

1.5 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Clustering from Sparse Pairwise Measurements

Clustering from Sparse Pairwise Measurements Alaa Saade Florent Krzakala Laboratoire de Physique Statistique Sorbonne Universite´s, UPMC Univ. Paris 06 E´cole Normale Supe´rieure, 24 Rue Lhomond Laboratoire de Physique Statistique, CNRS UMR 8550 Paris 75005 E´cole Normale Supe´rieure, 24 Rue Lhomond, Paris Marc Lelarge Lenka Zdeborova´ INRIA and E´cole Normale Supe´rieure Institut de Physique The´orique Paris, France CEA Saclay and CNRS, France. 6 1 0 Abstract—We consider the problem of grouping items into a belief propagation algorithm and two spectral algorithms 2 clusters based on few random pairwise comparisons between basedonthenon-backtrackingoperatorandtheBetheHessian. y the items. We introduce three closely related algorithms for this Although these three algorithms are intimately related, so far, a task: a belief propagation algorithm approximating the Bayes a sketch of rigorous analysis is available only for the spectral M optimal solution, and two spectral algorithms based on the properties of the non-backtracking matrix. From a practical non-backtracking and Bethe Hessian operators. For the case of perspectivehowever,beliefpropagationandtheBetheHessian two symmetric clusters, we conjecture that these algorithms are 9 aremuchsimplertoimplementandshowevenbetternumerical asymptoticallyoptimalinthattheydetectthe clusters assoonas 1 performance. it is information theoretically possible to do so. We substantiate this claim for one of the spectral approaches we introduce. ] To evaluate the performance of our proposed algorithms, I S we construct a model with n items in k predefined clusters of . I. INTRODUCTION same average size n/k, by assigning to each item i ∈ [n] s c A. Problem and model a cluster label ci ∈ [k] with uniform probability 1/k. We [ assume that the pairwise measurement between an item in Similarity-based clustering is a standard approach to label clusteraandanotheriteminclusterbisarandomvariablewith 2 v itemsinadatasetbasedonsomemeasureoftheirresemblance. density pa,b. We choose the observed pairwise measurements 3 In general, given a dataset {xi}i∈[n] ∈ Xn, and a symmetric uniformly at random, by generating an Erdo˝s-Re´nyi random 8 measurement function s : X2 → R quantifying the similarity graph G = (V = [n],E) ∈ G(n,α/n). The average degree α 6 between two items, the aim is to cluster the dataset from the corresponds to the sampling rate: pairwise measurements are 6 knowledge of the pairwise measurements s :=s(x ,x ), for observed only on the edges of G, and α therefore controls the ij i j 0 1≤i<j ≤n. This information is conveniently encoded in a difficulty of the problem. From the base graph G, we build a 1. similarity graph, which vertices represent items in the dataset, measurementgraphbyweightingeachedge(ij)∈E withthe 0 andtheweightededgescarrythepairwisesimilarities.Typical measurement sij, drawn from the probability density pci,cj. 6 choices for this similarity graph are the complete graph and The aim is to recover the cluster assignments ci for i ∈ [n] 1 the nearest neighbor graph (see e.g. [1] for a discussion in the from the measurement graph thus constructed. : context of spectral clustering). v We consider the sparse regime α = O(1), and the limit Xi Here however, we will not assume the measurement func- n→∞withfixednumberofclustersk.Withhighprobability, tion s to quantify the similarity between items, but more thegraphGisdisconnected,sothatexactrecoveryoftheclus- r a generally ask that the measurements be typically different ters, as considered e.g. in [3], [4], is impossible. In this paper, depending on the cluster memberships of the items, in a way we address instead the question of how many measurements that will be made quantitative in the following. For instance, are needed to partially recover the cluster assignments, i.e. to s could be a distance in an Euclidean space or could take infer cluster assignments cˆi such that the following quantity, values in a set of colors (i.e. s does not need to be real- called overlap, is strictly positive: valued). Additionally, we will not assume knowledge of the 1 (cid:80) 1(σ(cˆ)=c )− 1 measurements for all pairs of items in the dataset, but only max n i i i k , (1) for O(n) of them chosen uniformly at random. Sampling is a σ∈Sk 1− k1 well-known technique to speed up computations by reducing the number of non-zero entries [2]. The main challenge is to where Sk is the set of permutations of [k]. This quantity choosethelowestpossiblesamplingratewhilestillbeingable is monotonously increasing with the number of correctly to detect the signal of interest. In this paper, we compute classified items. In the limit n→∞, it vanishes for a random explicitly this fundamental limit for a simple probabilistic guess, and equals unity if the recovery is perfect. Finally, we modelandpresentthreealgorithmsallowingpartialrecoveryof note an important special case for which analytical results can the signal above this limit. Below the limit, in the case of two bederived,whichisthecaseofsymmetricclusters:∀a,b∈[k] clusters, no algorithm can give an output positively correlated p (s):=1(a=b)p (s)+1(a(cid:54)=b)p (s), (2) with the true clusters. Our three algorithms are respectively a,b in out where pin(s) (resp. pout(s)) is the probability density of 0.9 k=2, αc≈2.63 0.9 k=3, αc≈5.5 observing a measurement s between items of the same cluster 0.8 BP 0.8 BP (resp.differentclusters).Forthisparticularcase,weconjecture H H 0.7 0.7 B B that all of the three algorithms we propose achieve partial 0.6 0.6 p p recovery of the clusters whenever α>αc, where 0.5 rla 0.5 rla 1 1 (cid:90) (cid:0)p (s)−p (s)(cid:1)2 0.4 Ove 0.4 Ove in out 0.3 0.3 = ds , (3) α k p (s)+(k−1)p (s) 0.2 0.2 c K in out 0.1 0.1 where K is the support of the function p + (k − 1)p . in out 0.0 0.0 This expression corresponds to the threshold of a related 0 1 2 α3 4 5 6 3 4 5 6α7 8 9 10 reconstruction problem on trees [5]. In the following, we Fig. 1. Performance in clustering model-generated measurement graphs in substantiatethisclaimforthecaseofk=2symmetricclusters, thesymmetriccase(2).Theoverlapisaveragedover20realizationsofgraphs and discrete measurement distributions. Note that the model ofsizen=105,withk=2,3clusters,andGaussianpin,pout withmean we introduce is a special case of the labeled stochastic block respectively1.5and0,andunitvariance.Thetheoreticaltransition(3)isat model of [6]. In particular, for the case k=2, it was proven in αc≈2.63fork=2,andconjecturedtobeatαc≈5.5fork=3.Allthree methodsachievethetheoreticaltransition,althoughtheBetheHessian(H)and [7]thatpartialrecoveryisinformationtheoreticallyimpossible belief propagation (BP) achieve a higher overlap than the non-backtracking if α < αc. In this contribution, we argue that this bound operator(B). is tight, namely that partial recovery is possible whenever α > α , and that the algorithms we propose are optimal, in c that they achieve this threshold. Note also that the symmetric a belief propagation (BP) algorithm approximating the Bayes model (2) contains the censored block model of [8]. More optimal solution. The other two are spectral methods derived precisely, if p and p are discrete distributions on {±1} from BP. We show numerically that all three methods achieve in out with p (+1) = p (−1) = 1−(cid:15), then α = (1−2(cid:15))−2. In thethreshold(3).NextinSec.IIIwesubstantiatethisclaimfor in out c this case, the claimed result is known [9], and to the best of the spectral method based on the non-backtracking operator. ourknowledge,thisistheonlycasewhereourresultisknown. II. ALGORITHMS B. Motivation and related work A. Belief propagation The ability to cluster data from as few pairwise compar- We consider a measurement graph generated from the isons as possible is of broad practical interest [4]. First, there model of Section I-A. From Bayes’ rule, we have: are situations where all the pairwise comparisons are simply 1 (cid:89) notavailable.Thisisparticularlythecaseifacomparisonisthe P({c }|{s })= p (s ), (4) result of a human-based experiment. For instance, in crowd- i ij Z ci,cj ij (ij)∈E clustering [10], [11], people are asked to compare a subset of the items in a dataset, and the aim is to cluster the whole where Z is a normalization. The Bayes optimal assignment, datasetbasedonthesecomparisons.Clearly,foralargedataset maximizing the overlap (1), is cˆi = argmax Pi, the mode of of size n, we can’t expect to have all O(n2) measurements. the marginal of node i. We approximate this marginal using Second, even if these comparisons can be automated, the belief propagation (BP): typicalcostofcomputingallpairwisemeasurementsisO(n2d) k wheredisthedimensionofthedata.Forlargedatasetswithn 1 (cid:89) (cid:88) P (c )≈ p (s )P (c ), (5) in the millions or billions, or large dimensional data, like high i i Z ci,cl il l→i l i resolution images, this cost is often prohibitive. Storing all l∈∂icl=1 O(n2) measurements is also problematic. Our work supports where ∂i denotes the neighbors of node i in the measurement the idea that if the measurements between different classes of graphG,Zi isanormalization,andthePi→j(ci)arethefixed itemsaresufficientlydifferent,arandomsubsamplingofO(n) point of the recursion: measurements might be enough to accurately cluster the data. k 1 (cid:89) (cid:88) P (c )= p (s )P (c ). (6) This work is inspired by recent progress in the problem i→j i Z ci,cl il l→i l i→j of detecting communities in the sparse stochastic block model l∈∂i\jcl=1 (SBM) where partial recovery is possible only when the aver- Inpractice,startingfromarandominitialcondition,weiterate age degree α is larger than a threshold value, first conjectured (6)untilconvergence,andestimatethemarginalsfrom(5).On in [12], and proved in [13]–[15]. A belief propagation (BP) sparse tree-like random graphs generated by our model, BP is algorithm similar to the one presented here is introduced in widelybelievedtogiveasymptoticallyaccurateresults,though [12],andarguedtobeoptimalintheSBM.Spectralalgorithms a rigorous proof is still lacking. This algorithm is general and that match the performance of BP were later introduced in applies to any model parameters p . For now on, however, ab [16], [17]. The spectral algorithms presented here are based we restrict our theoretical discussion to the symmetric model on a generalization of the operators that they introduce. (2). Eq. (6) can be written in the compact form P = F(P), where P∈R2mk and m=|E| is the number of edges in G. C. Outline and main results ThefirststepinunderstandingthebehaviorofBPistonote In Sec. II, we describe three closely related algorithms to that in the case of symmetric clusters (2), there exists a trivial solvethepartialrecoveryproblemofSec.I-A.Thefirstoneis fixed point of the recursion (6), namely P (c )=1/k. This i→j i fixed point is uninformative, yielding a vanishing overlap. If C. The Bethe Hessian thisfixedpointisstable,thenstartingfromaninitialcondition The non-backtracking operator B of the last section is close to it will cause BP to fail to recover the clusters. We a large, non-symmetric matrix, making the implementation therefore investigate the linearization of (6) around this fixed of the previous algorithm numerically challenging. A much point, given by the Jacobian J . F smaller, closely related symmetric matrix can be defined that empirically performs as well in recovering the clusters, and in B. The non-backtracking operator fact slightly better than B. For a real parameter x≥1, define a matrix H(x)∈Rn×n with non-zero elements: A simple computation yields (cid:18) 1 (cid:19) H (x)=1+(cid:80)l∈∂i x2w−(wsi(ls)i2l)2 if i=j , (10) JF =B⊗ Ik− kUk , (7) ij − xw(sij) if (ij)∈E x2−w(sij)2 where I is the k×k identity matrix, U is the k×k matrix where ∂i denotes the set of neighbors of node i in the graph k k withallitsentriesequalto1,⊗denotesthetensorproduct,and G, and w is defined in (8). A simple computation, analogous B is a 2m×2m matrix called the non-backtracking operator, to [9], allows to show that (λ ≥ 1,v) is an eigenpair of B, actingonthedirectededgesofthegraphG,withelementsfor if and only H(λ)v = 0. This property justifies the following (ab) and (cd)∈E: picture [17]. For x large enough, H(x) is positive definite and has no negative eigenvalue. As we decrease x, H(x) gains a B =w(s )1(a=d)1(b(cid:54)=c), new negative eigenvalue whenever x becomes smaller than (a→b),(c→d) cd p (s)−p (s) (8) an eigenvalue of B. Finally, at x = 1, there is a one to ∀s,w(s):= in out . one correspondence between the negative eigenvalues of H(x) p (s)+(k−1)p (s) in out and the real eigenvalues of B that are larger than 1. We call Note that to be consistent with the analysis of BP, our Bethe Hessian the matrix H(1), and propose the following definition of the non-backtracking operator is the transpose spectral algorithm, by analogy with Sec. II-B. First, compute of the definition of [16]. This matrix generalizes the non- all the negative eigenvalues of H(1). Let r be their number. If backtracking operators of [9], [16] to arbitrary edge weights. r = 0, raise an error. Otherwise, denoting vi,...,vr ∈ Rn the More precisely, for the censored block model [8], we have correspondingeigenvectors,formthematrixX =[v1···vr]∈ s = ±1 and w(s) = (1−2(cid:15))s so that B is simply a scaled Rn×r by stacking them in columns. Finally, regarding each version of the matrix introduced in [9]. We also introduce an itemiasavectorinRr specifiedbythei-thlineofX,cluster operator C∈Rn×2m defined as the items, using e.g. the k-means algorithm. Inthecaseoftwosymmetricclusters,theresultsofthenext C =w(s )1(i=l). (9) i,j→l jl section imply that if α > α , denoting by λ > 1 the largest c 1 eigenvalue of B, the smallest eigenvalue of H(λ ) is 0, and Thisoperatorfollowsfromthelinearizationofeq.(5)forsmall 1 the corresponding eigenvector allows partial recovery of the P . Based on these operators, we propose the following l→i clusters.WhilethepresentalgorithmreplacesthematrixH(λ ) spectral algorithm. First, compute the real eigenvalues of B 1 by the matrix H(1) and is therefore beyond the scope of this with modulus greater than 1. Let r be their number, and denote by v ,...,v ∈ R2m the corresponding eigenvectors. theoreticalguarantee,wefindempiricallythattheeigenvectors 1 r withnegativeeigenvaluesofH(1)arealsopositivelycorrelated If r = 0, raise an error. Otherwise, form the matrix Y = [v ···v ] ∈ R2m×r by stacking the eigenvectors in columns, with the hidden clusters, and in fact allow better recovery an1d let Xr =CY ∈Rn×r. Finally, regarding each item i as a (see figure 1), without the need to build the non-backtracking vector in Rr specified by the i-th line of X, cluster the items, operator B and to compute its leading eigenvalue. using e.g. the k-means algorithm. This last algorithm also has an intuitive justification. It is well known [18] that BP tries to optimize the so-called Bethe Theoretical guarantees for the case of k = 2 clusters are free energy. In the same way B can be seen as a spectral sketched in the next section, stating that this simple algorithm relaxation of BP, H(1) can be seen as a spectral relaxation succeeds in partially recovering the true clusters all the way of the direct optimization of the Bethe free energy. In fact, it down to the transition (3). Intuitively, this algorithm can corresponds to the Hessian of the Bethe free energy around a be thought of as a spectral relaxation of belief propagation. trivial stationary point (see e.g. [17], [19]). Indeed, for the particular case of k = 2 symmetric clusters, we will argue that the spectral radius of B is larger than 1 if and only if α > α . As a simple consequence, whenever D. Numerical results c α < α , the trivial fixed point of BP is stable, and BP fails c Figure 1 shows the performance of all three algorithms to recover the clusters. On the other hand, when α > α , c on model-generated problems. We consider the symmetric a small perturbation of the trivial fixed point grows when problem defined by (2) with k = 2,3, fixed p and p , iteratingBP.Ourspectralalgorithmapproximatestheevolution in out chosen to be Gaussian with a strong overlap, and we vary α. of this perturbation by replacing the non-linear operator F All three algorithms achieve the theoretical threshold. by its Jacobian J . In practice, as shown on figure 1, the F non-linearity of the function F allows BP to achieve a better Whileallthealgorithmspresentedinthisworkassumethe overlap than the spectral method based on B, but a rigorous knowledge of the parameters of the model, namely the func- proof that BP is asymptotically optimal is still lacking. tions p for a,b ∈ [k], we argue that the belief propagation a,b Note that for the censored block model, our claim implies Theorem1in[9].Themainideawhichsubstantiatesourclaim is to introduce a new non-backtracking operator with spectral properties close to those of B and then apply the techniques developedin[20]toit.Wetrytousenotationsconsistentwith [20]: for an oriented edge e = u → v = (u,v) from node u to node v, we set e = u, e = v and e−1 = (v,u). For a 1 2 matrice M, its transpose is denoted by M∗. We also define σ =2c −3 for each i∈[n]. i i We start by a simple transformation: if t is the vector in RE(cid:126) defined by t =σ and (cid:12) is the Hadamard product, i.e. e e1 (t(cid:12)x) =σ x , then we have e e1 e B∗x=λx⇔BX(t(cid:12)x)=λ(t(cid:12)x), (11) with BX defined by BX = B σ σ . In particular, BX ef fe e1 e2 and B have the same spectrum and there is a trivial relation betweentheireigenvectors.WithX =σ w(s )σ ,wehave: e e1 e e2 BX =X 1(e =f )1(e (cid:54)=f ). ef e 2 1 1 2 Fig. 2. Clustering of toy datasets using belief propagation. Each dataset is composed of 20000 points, 200 of which come labeled and constitute the Moreover, note that the random variables (Af = σf1σf2)f∈E trainingset.WeusedtheEuclideandistanceasthemeasurementfunctions, are random signs with P(A = 1) = 1/2 and the random f and estimated the probability densities pab on the training set using kernel variables (X ) are such that densityestimation(middlerow).Althoughtheseestimatesareverynoisyand f f∈E oavrearnladpopmingm,ebaesluierefmpreonptaggraatpiohnGisaobflesmtoalalcahvieevraegaevdeergyreheigαha(tcocpurraocwy)u.sFinogr EX =EX2 = 1(cid:88)(pin(s)−pout(s))2 = 1 . comparison, we show in the third row the result of spectral clustering with f f 2 pin(s)+pout(s) αc s the normalized Laplacian, using a 3-nearest neighbors similarity graph (see e.g.[1])builtfromG,i.e.usingonlytheavailablemeasurements. We now define another non-backtracking operator BY. First, letting (cid:15)(s) = pout(s) ∈ [0,1], we define the sequence pin(s)+pout(s) of independent random variables {Y˜ } with P(Y˜ = algorithm is robust to large imprecisions on the estimation of e e∈E e these parameters. To support this claim, we show on figure +1|se) = 1 − P(Y˜e = −1|se) = 1 − (cid:15)(se), so that 2 the result of the belief propagation algorithm on standard E[Y˜e|se]=w(se). We define Ye =Y˜eσe1σe2 and finally toy datasets where the parameters were estimated on a small BY =Y 1(e =f )1(e (cid:54)=f ), fraction of labeled data. ef f 2 1 1 2 sothatE[BY|G,{s } ]=BX.Itturnsoutthattheanalysis e e∈E of the matrix BY can be done with the techniques developped III. PROPERTIESOFTHENON-BACKTRACKINGOPERATOR in [20]. More precisely, we define P the linear mapping We now state our claims concerning the spectrum of B. on RE(cid:126) defined by (Px) = Y x (i.e. in matrix form e e e−1 We restrict ourselves to the case where k = 2 and pin and Pef = Ye1(f = e−1)). Note that P∗ = P and since Ye2 = 1, pout are distributions on a finite alphabet. P isaninvolutionsothatP isanorthogonalmatrix.Asimple computationshowsthat(BY)kP =P(BY)∗k,hence(BY)kP Claim 1: Consider an Erdo˝s-Re´nyi random graph on n is a symmetric matrix. This symmetry corresponds to the vertices with average degree α, with variables assigned to oriented path symmetry in [20] and is crucial to our analysis. vertices c ∈ {1,2} uniformly at random independently from i If (τ ),1 ≤ j ≤ 2m are the eigenvalues of (BY)kP and the graph and measurements s between any two neigh- j,k ij (x ) is an orthonormal basis of eigenvectors, we deduce that boring vertices drawn according to the probability density: j,k pci,cj(s) = 1(ci = cj)pin(s) + 1(ci (cid:54)= cj)pout(s) for two (cid:88)2m fixed (i.e. independent of n) discrete distributions pin (cid:54)= pout (BY)k = τj,kxj,k(Pxj,k)∗. (12) on S. Let B be the matrix defined by (8) and denote by j=1 |λ | ≥ |λ | ≥ ··· ≥ |λ | the eigenvalues of B in order of 1 2 2m Since P is an orthogonal matrix (Px ),1 ≤ j ≤ 2m decreasing magnitude, where m is the number of edges in the j,k graph. Recall that α is defined by (3). Then, with probability is also an orthonormal basis of RE(cid:126). In particular, (12) c tending to 1 as n→∞: gives the singular value decomposition of (BY)k. Indeed, if t = |τ | and y = sign(τ )Px , then we get (i) If α<α , then |λ |≤(cid:113) α +o(1). (BYj,)kk = (cid:80)j2,mk t x j,yk∗ , which ijs,kprecijs,ekly the singular c 1 αc value decompjo=s1itijo,nk ojf,k(Bj,kY)k. As shown in [20], for large (ii) (cid:113)If α > αc, then λ1 = ααc + o(1) and |λ2| ≤ k,thedecomposition(12)carriesstructuralinformationonthe α +o(1). Additionally, denoting by v the eigen- graph. αc vector associated with λ , Cv is positively correlated Acrucialelementintheproofof[20]istheresultofKesten 1 withtheplantedvariables(c ) ,whereCisdefined and Stigum [21], [22] and we give now its extension required i i∈[n] in (9). here which can be seen as a version of Kesten and Stigum’s work in a random environment. We write N∗ = {1,2,...} ACKNOWLEDGMENT and U =∪ (N∗)n the set of finite sequences composed by N∗, where n(N≥0∗)0 contains the null sequence ∅. For u,v ∈ U, This work has been supported by the ERC under the European Union’s FP7 Grant Agreement 307087-SPARCS we note |u| = n for the lenght of u and uv for the and by the French Agence Nationale de la Recherche under sequence obtained by the juxtaposition of u and v. Suppose reference ANR-11-JS02-005-01 (GAP project). that {(N ,A ,A ,...)} is a sequence of i.i.d. random u u1 u2 u∈U variables with value in N×RN∗ such that N is a Poisson u randomvariablewithmeanαandtheA areindependenti.i.d. REFERENCES ui random signs with P(Au1 = 1) = P(Au1 = −1) = 21. We [1] U.Luxburg,“Atutorialonspectralclustering,”StatisticsandComput- then define the following random variables: first s such that ing,vol.17,no.4,p.395,2007. u P(s |A = 1) = p (s ) and P(s |A = −1) = p (s ), [2] D. Achlioptas and F. McSherry, “Fast computation of low rank ma- u u in u u u out u then X =A w(s ) and Y =A Y˜ where P(Y˜ =1|s )= trix approximations,” in Proceedings of the thirty-third annual ACM u u u u u u u u symposiumonTheoryofcomputing. ACM,2001,pp.611–618. 1 − P(Y = −1|s ) = 1 − (cid:15)(s ). We assume that for all u u u [3] V.JogandP.-L.Loh,“Information-theoreticboundsforexactrecovery u ∈ U and i > N , A = s = X = Y = 0. N u ui ui ui ui u inweightedstochasticblockmodelsusingtherenyidivergence,”arXiv will be the number of children of node u and the sequence preprintarXiv:1509.06418,2015. (s ,...,s ) the measurements on edges between u and u1 uNu [4] Y. Chen, C. Suh, and A. Goldsmith, “Information recovery from its children. We set for u=u1u2...un ∈U, pairwisemeasurements:Ashannon-theoreticapproach,”inInformation Theory,2015IEEEInternationalSymposiumon,2015,p.2336. PX =1, PX =X X ...X , ∅ u u1 u1u2 u1...un [5] M.Me´zardandA.Montanari,“Reconstructionontreesandspinglass PY =1, PY =Y Y ...Y . transition,”Journalofstatisticalphysics,vol.124,no.6,pp.1317–1350, ∅ u u1 u1u2 u1...un 2006. We define (with the convention 0 =0) , [6] S. Heimlicher, M. Lelarge, and L. Massoulie´, “Community detection 0 in the labelled stochastic block model,” 09 2012. [Online]. Available: (cid:88) PY http://arxiv.org/abs/1209.2910 M =1, M = u . (13) 0 t αtPX [7] M. Lelarge, L. Massoulie, and J. Xu, “Reconstruction in the labeled |u|=t u stochasticblockmodel,”inInformationTheoryWorkshop(ITW),2013 IEEE,Sept2013,pp.1–5. Then conditionnaly on the variables (s ) , M is a martin- u u∈U t [8] E.Abbe,A.S.Bandeira,A.Bracher,andA.Singer,“Decodingbinary gale converging almost surely and in L2 as soon as α > α . c node labels from censored edge measurements: Phase transition and The fact that this martingale is bounded in L2 follows from efficientrecovery,”arXiv:1404.4749,2014. an argument given in the proof of Theorem 3 in [6]. [9] A.Saade,M.Lelarge,F.Krzakala,andL.Zdeborova,“Spectraldetec- tioninthecensoredblockmodel,”inInformationTheory(ISIT),2015 In order to apply the technique of [20], we need to deal IEEEInternationalSymposiumon,June2015,pp.1184–1188. with the (cid:96)-th power of the non-backtracking operators. For [10] R.G.Gomes,P.Welinder,A.Krause,andP.Perona,“Crowdclustering,” (cid:96) not too large, the local struture of the graph (up to depth inAdvancesinneuralinformationprocessingsystems,2011,p.558. (cid:96)) can be coupled to a Poisson Galton-Watson branching [11] J. Yi, R. Jin, A. K. Jain, and S. Jain, “Crowdclustering with sparse process,sothatthecomputationsdoneforM aboveprovidea pairwiselabels:Amatrixcompletionapproach,”inAAAIWorkshopon (cid:96) goodapproximationofthe(cid:96)-thpowerofthenon-backtracking HumanComputation,vol.2,2012. operatorandwecanusethealgebraictoolsaboutperturbation [12] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborova´, “Asymptotic analysis of the stochastic block model for modular networks and its of eigenvalues and eigenvectors, see the Bauer-Fike theorem algorithmicapplications,”Phys.Rev.E,vol.84,no.6,p.066106,2011. in Section 4 in [20]. [13] E. Mossel, J. Neeman, and A. Sly, “Stochastic block models and reconstruction,”arXivpreprintarXiv:1202.1499,2012. IV. CONCLUSION [14] L.Massoulie,“Communitydetectionthresholdsandtheweakramanu- janproperty,”arXivpreprintarXiv:1311.3085,2013. We have considered the problem of clustering a dataset [15] E.Mossel,J.Neeman,andA.Sly,“Aproofoftheblockmodelthreshold from as few measurements as possible. On a reasonable conjecture,”arXivpreprintarXiv:1311.4115,2013. model, we have made a precise prediction on the number [16] F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborova´, of measurements needed to cluster better than chance, and andP.Zhang,“Spectralredemptioninclusteringsparsenetworks,”Proc. have substantiated this prediction on an interesting particular Natl.Acad.Sci.U.S.A.,vol.110,no.52,pp.20935–20940,2013. case. We have also introduced three efficient and optimal [17] A.Saade,F.Krzakala,andL.Zdeborova´,“Spectralclusteringofgraphs algorithms, based on belief propagation, to cluster model- withthebethehessian,”inAdvancesinNeuralInformationProcessing Systems,2014,pp.406–414. generateddatastartingfromthistransition.Ourresultssuggest that clustering can be significantly sped up by using a number [18] J.S.Yedidia,W.T.Freeman,andY.Weiss,“Bethefreeenergy,kikuchi approximations,andbeliefpropagationalgorithms,”Advancesinneural of measurements linear in the size of the dataset, instead informationprocessingsystems,vol.13,2001. of quadratic. These algorithms, however, require an estimate [19] A. Saade, F. Krzakala, and L. Zdeborova´, “Matrix completion from of the distribution of the measurements between objects de- fewerentries:Spectraldetectabilityandrankestimation,”inAdvances pending on their cluster membership. On toy datasets, we inNeuralInformationProcessingSystems,2015,pp.1261–1269. have demonstrated the robustness of the belief propagation [20] C.Bordenave,M.Lelarge,andL.Massoulie´,“Non-backtrackingspec- algorithm to large imprecisions on these estimates, paving trumofrandomgraphs:communitydetectionandnon-regularramanu- the way for broad applications in real world, large-scale data jangraphs,”arXiv,2015. analysis. A natural avenue for future work is the unsupervised [21] H. Kesten and B. P. Stigum, “A limit theorem for multidimen- sionalgalton-watsonprocesses,”TheAnnalsofMathematicalStatistics, estimation of these distributions, through e.g. an expectation- vol.37,no.5,pp.1211–1223,1966. maximization approach. [22] ——,“Additionallimittheoremsforindecomposablemultidimensional galton-watsonprocesses,”Ann.Math.Statist.,vol.37,no.6,pp.1463– 1481,121966.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.