Community Detection with Colored Edges Narae Ryu Sae-Young Chung School of EE, KAIST, Daejeon, Korea School of EE, KAIST, Daejeon, Korea Email: [email protected] Email: [email protected] Abstract In this paper, we prove a sharp limit on the community detection problem with colored edges. We assume two equal-sized communitiesandtherearemdifferenttypesofedges.Iftwoverticesareinthesamecommunity,thedistributionofedgesfollows pi = αilogn/n for 1 i m, otherwise the distribution of edges is qi = βilogn/n for 1 i m, where αi and βi are ≤ ≤ ≤ ≤ positive constants and n is the total number of vertices. Under these assumptions, a fundamental limit on community detection m 2 7 is characterized using the Hellinger distance between the two distributions. If Pi=1(√mαi−√βi) > 2,2then the community 1 detectionviamaximumlikelihood(ML)estimatorispossiblewithhighprobability.IfPi=1(√αi−√βi) <2,theprobability that the ML estimator fails to detect the communities does not go to zero. 0 2 n I. INTRODUCTION a J 2 In community detection, the community structure is detected from a given graph by observing the relations between the 1 vertices. Recently, the community detection problem is getting popular in many research fields, such as biology, computer science, and sociology [3]. ] I The limit on the community detection is proven in many cases. For instance, for the case of 2 communities and general k S communities on the stochastic block model (SBM), the limit is known when the probability is of order of logn/n [4], [5]. . s Even when the parameters of SBM are unknown, the limit is proven in [6]. Also, there are works about recovering a hidden c community. In [7], they provide nearly matching necessary and sufficient conditions for the recovery of densely subgraph [ when the distribution of edges follows Bernoulli and Gaussian. Our main theorems generalize the corresponding results in 1 [4]. Also, [2] proved the same results as this paper. However, our proofs are based on Cramer’s theorem and very simple. v 3 We consideragraphG=(V,E)whereV isthesetofn verticesandE isthesetofedgesthatconnectsthevertices.Here, 5 we focus on the case where the graph has two communities of equal size. Unlike most papers on SBM, we will deal with a 1 more general version of SBM that contains colored edges. In other words, two vertices are connected with edge with color 6 1,2,...,m.From nowon, an edgewith colori is denotedbyi-edgeforsimplicity.Therefore,the probabilitythattwo vertices 0 within the same community are connected with an i-edge is p and they are disconnected, i.e., no edge, with probability . i 2 1 m p . In a similar way, two vertices in different communities are connected with an i-edge with probability q and 0 di−sconni=ec1tedi with probability 1 m q . i 7 P − i=1 i 1 Furthermore, we assume that piPand qi are of order of logn/n. Hence, we set pi and qi as pi = αilogn/n and qi = : β logn/nwhereα andβ arepositiveconstants.Thereasonwhywechoosesuchaprobabilityistoguaranteetheconnectivity v i i i of the graph. According to [8], if α + β <2, then there would be an isolated vertex with high probability. i i i i i X In this paper, we prove a fundamental limit on the SBM with colored edges. As the maximum likelihood (ML) detector P P r is optimal in a sense of minimizing the probability of error, we first specify what the ML rule is in this model. In chapter a 3, we provide the limit on the detection of community via two theorems. Theorem 1 proves the sufficient condition for the detection. Theorem 2 proves the necessary part. Therefore, by combining these two theorems, we can get the sharp limit on the community detection with colored edges. II. MAXIMUMLIKELIHOOD ESTIMATORON SBM As mentioned before, we have 2 communities of size n/2 each. If all edges have the same color and p>q, then the ML rule is simply to find the community that has the minimum number of edges across two communities. However, if we have m different types of edges, the rule becomes slightly different. Suppose we fix a graph G and the number of j-edges inside the graph G is denoted by k . Then, we need to find the communitiesA and B of equal size, which maximizes Pr(GA,B) j | among the 2n possible community assignments. To calculate Pr(GA,B), let’s fix some communities A and B and define | l as the number of inner j-edges inside the communities A and B. Then, the number of j-edges across the communities is j k l , obviously. Therefore, Pr(GA,B) is, j j − | Thematerial inthispaperwaspresented inpartattheIEEEInternational SymposiumonInformationTheory(ISIT)2016. m 2(n2/2)−Pmi=1li m n42−Pmi=1(ki−li) Pr(GA,B) = pl1pl2 plm 1 p qk1−l1qk2−l2 qkm−lm 1 q | 1 2 ··· m − i! 1 2 ··· m − i! i=1 i=1 X X p l1 p l2 p lm 1 m q Pmi=1li = 1 2 m − i=1 i q q ··· q 1 m p (cid:18) 1(cid:19) (cid:18) 2(cid:19) (cid:18) m(cid:19) (cid:18) −Pi=1 i(cid:19) m 2(n2/P2) m n42−Pmi=1ki qk1qk2 qkm 1 p 1 q 1 2 ··· m − i! − i! i=1 i=1 X X (a) p1 l1 p2 l2 pm lm = C(1+o(1)) , q q ··· q (cid:18) 1(cid:19) (cid:18) 2(cid:19) (cid:18) m(cid:19) where (a) holds since 1−Pmi=1qi Pmi=1li convergesto 1 as n tends to , and the remaining terms are fixed if the graph G 1−Pmi=1pi ∞ is given.Therefore,M(cid:16)L rule finds(cid:17)two communitiesthat maximizes p1 l1 p2 l2 pm lm. In otherwords, it maximizes q1 q2 ··· qm the weighted sum of the number of inner edges, mi=1lilogpqii. (cid:16) (cid:17) (cid:16) (cid:17) (cid:16) (cid:17) Using this result, we will get a sufficient and necessary condition of the event that ML estimator fails to detect the P communities. For convenience, we define the following events for the true community assignment A and B. F = the maximum likelihood rule fails F ={ m E [v,A]logpi < m E [}v,B]log pi, v A FA ={ im=1Ei[v,A]logqpii > im=1Ei[v,B]logqpii,∃v∈ B} B {Pi=1 i qi Pi=1 i qi ∃ ∈ } where Ei[v,S] is the numberof i-edgesPbetween the vertex v aPnd the set S. First, let’s show F F F. Suppose F and F happen simultaneously. Then, there exist vertices v and v such A B A B A B ∩ ⇒ that, m m p p v A and E [v ,A]log i < E [v ,B]log i A i A i A ∈ q q i i i=1 i=1 X X m m p p v B and E [v ,A]log i > E [v ,B]log i. B i B i B ∈ q q i i i=1 i=1 X X Let’s fix such v and v . And we define a function l (S) as the number of inner edges of type i inside the set of nodes A B i S. Also, we define a new community assignment, A˜= (A v ) v and B˜ = (B v ) v . Then, we can compare the A B B A weighted sum of the number of inner edges of (A,B) and (\A˜,B˜)∪: \ ∪ m m m p p p l (A˜)+l (B˜) log i = ((l (A)+l (B))log i + (E [v ,A] E [v ,A])log i i i i i i B i A q q − q i i i Xi=1(cid:16) (cid:17) Xi=1 Xi=1 m p i + (E [v ,B] E [v ,B])log i A i B − q i i=1 X m p i > (l (A)+l (B))log . i i q i i=1 X Therefore, ML estimator does not choose the original community assignment A and B, resulting in the event F. On the other hand, let’s assume F happens. We define a function o (S ,S ) as the number of edges of type i across the i 1 2 set of nodes S and S . Then, we can get an upper bound on probability of F by the following lemma. 1 2 Lemma 1. Suppose F happens. Then, there exist k and sets A A and B B with A = B = k for 0 k n w ⊂ w ⊂ | w| | w| ≤ ≤ 4 such that m m p p i i [o (A ,B B )+o (B ,A A )]log [o (A ,A A )+o (B ,B B )]log i w w i w w i w w i w w \ \ q ≥ \ \ q i i i=1 i=1 X X Proof: This lemma can be proved in a similar way as lemma 5 in [4]. Assume F happens. Then, there exist Aˆ and Bˆ such that m m p p l (Aˆ)+l (Bˆ) log i (l (A)+l (B))log i, i i i i q ≥ q i i Xi=1(cid:16) (cid:17) Xi=1 where A Aˆ, B Bˆ n, without loss of generality. | ∩ | | ∩ |≥ 4 Then, we can spilt the number of inner edges inside each community as follows. l (A) = l (A Aˆ)+l (A Bˆ)+o (A Aˆ,A Bˆ) i i i i ∩ ∩ ∩ ∩ l (B) = l (B Aˆ)+l (B Bˆ)+o (B Aˆ,B Bˆ) i i i i ∩ ∩ ∩ ∩ l (Aˆ) = l (A Aˆ)+l (B Aˆ)+o (A Aˆ,B Aˆ) i i i i ∩ ∩ ∩ ∩ l (Bˆ) = l (A Bˆ)+l (B Bˆ)+o (A Bˆ,B Bˆ) i i i i ∩ ∩ ∩ ∩ Therefore, we get m m p p o (A Aˆ,B Aˆ)+o (A Bˆ,B Bˆ) log i o (A Aˆ,A Bˆ)+o (B Aˆ,B Bˆ) log i. i i i i ∩ ∩ ∩ ∩ q ≥ ∩ ∩ ∩ ∩ q i i Xi=1(cid:16) (cid:17) Xi=1(cid:16) (cid:17) Let A =A Bˆ and B =B Aˆ. Then, the above inequality can be rewritten as, w w ∩ ∩ m m p p i i [o (A ,B B )+o (B ,A A )]log [o (A ,A A )+o (B ,B B )]log . i w w i w w i w w i w w \ \ q ≥ \ \ q i i i=1 i=1 X X Now we define the probability P(k) as, n m m p p P(k) =Pr [o (A ,B B )+o (B ,A A )]log i [o (A ,A A )+o (B ,B B )]log i , n i w \ w i w \ w qi ≥ i w \ w i w \ w qi! i=1 i=1 X X where k is size of A and B . w w Finally by applying union bound, we get an upper bound on Pr(F), n/4 n/2 2 Pr(F) P(k) ≤ k n k=1(cid:18) (cid:19) X III. LIMIT ON THECOMMUNITY DETECTION In this section, we provide a fundamental limit of the community detection by proving two theorems. If p =q for all i, i i the ML rule fails in detecting the communities obviously. Therefore, we focus on the case where there exists at least one i such that p =q throughoutthe paper. i i 6 A. Achievability Proof Theorem 1. If m (√α √β )2 > 2, then the maximum likelihood estimator detects the communities exactly with high i=1 i− i probability. P Note that m (√α √β )2 is related to the Hellinger distance between the two probability distributions p and q as i=1 i− i m P logn 2 logn H2(p q)= (√α β ) +o , i i k n − n i=1 (cid:18) (cid:19) X p where H(pkq)= mi=+11(√pi−√qi)2 is the Hellingerdistance [9]. This distanceis the special case of the CH-divergence given in [5]. q P Before proving the theorem, we introduce some assumptions and definitions. For simplicity, we assume that A = 1,2,...,n/2 and B = n/2+ 1,...,n . Also, we define independent random variables W for 1 i,j n/2 or ij { } { } ≤ ≤ n/2<i,j n such that ≤ logpk w.p. p for k =1,2,...,m Wij = 0 qk w.p. 1k m p (cid:26) − k=1 k and Zij for 1 i n/2 and n/2+1 j n such that P ≤ ≤ ≤ ≤ logpk w.p. q for k =1,2,...,m Zij = 0 qk w.p. 1k m q (cid:26) − k=1 k Proof of Theorem 1: We can rewrite P(k) using W andPZ , n ij ij m m p p P(k) = Pr [o (A ,B B )+o (B ,A A )]log i [o (A ,A A )+o (B ,B B )]log i n i w \ w i w \ w qi ≥ i w \ w i w \ w qi! i=1 i=1 X X = Pr Z + Z W + W ij ij ij ij ≥ i∈AwX,j∈B\Bw i∈A\AXw,j∈Bw i∈AwX,j∈A\Aw i∈BwX,j∈B\Bw 2k(n−k) 2 = Pr (Zi′ Wi′) 0 , (1) − ≥ i′=1 X where Wi′ and Zi′ are i.i.d. random variables that have the same distribution with Wij and Zij, respectively. And we can get an upper bound on (1) by proving Lemma 1. Lemma 2. 2k(n2−k) n logn m 2 log2n log2n Pr iX′=1 (Zi′ −Wi′)≥0≤2exp"−2k(cid:16)2 −k(cid:17) n Xi=1(√αi−pβi) −C n2 −o(cid:18) n2 (cid:19)!#. for some constant C. Proof: By the proof of Cramer’s theorem in [10], for i.i.d. sequence X and any closed set F IR, i ⊆ n 1 Pr X F 2exp n inf I(x) , i n i=1 ∈ !≤ (cid:18)− x∈F (cid:19) X where I(a)=sup (θa logE[eθX]). The proof for this theorem is in the appendix. θ∈IR − We need to evaluate the right-hand side of the following: 2k(n−k) 2 n Pr (Zi′ Wi′) 0 2exp 2k k inf I(x) . (2) − ≥ ≤ − 2 − x≥0 iX′=1 (cid:18) (cid:16) (cid:17) (cid:19) By direct computation, we can get the moment generating function of (Zi′ Wi′), i.e., − logn m log2n log2n E[exp(θ(Zi′ −Wi′))]=exp"− n i=1(αi+βi−αθiβi1−θ −βiθα1i−θ)+D(θ) n2 +o(cid:18) n2 (cid:19)#, X where D(θ) is a function of θ, the exact form of D(θ) is in Appendix 5.2. Then, the right-hand side of (2) is, for any 2 j n/2, ≤ ≤ inf sup(θx logE[exp(θ(Zi′ Wi′))]) − x≥0(cid:20)θ∈IR − − (cid:21) logn m log2n log2n = inf sup θx+ (α +β αθβ1−θ βθα1−θ) D(θ) o −x≥0"θ∈IR n i=1 i i− i i − i i − n2 − (cid:18) n2 (cid:19)!# X (a) logn m 2 log2n log2n inf 0.5x+ (√α β ) C o ≤ −x≥0" n i=1 i− i − n2 − (cid:18) n2 (cid:19)# X p logn m 2 log2n log2n = (√α β ) +C +o , (3) − n i− i n2 n2 i=1 (cid:18) (cid:19) X p where (a) holds by taking θ =0.5 instead of taking the supremum and D(θ)=C. Finally, by combining (2) and (3), we get 2k(n2−k) n logn m 2 log2n log2n Pr iX′=1 (Zi′ −Wi′)≥0≤2exp"−2k(cid:16)2 −k(cid:17) n Xi=1(√αi−pβi) −C n2 −o(cid:18) n2 (cid:19)!#. Now, the following is similar with proof of theorem 2 in [4]. If we assume that m (√α √β )2 2+ǫ for ǫ>0, i=1 i− i ≥ n/4 2 P n/2 Pr(F) P(k) ≤ k n k=1(cid:18) (cid:19) X n/4 n/2 2 n logn m 2 log2n log2n 2 exp 2k k (√α β ) C o ≤ Xk=1(cid:18) k (cid:19) "− (cid:16)2 − (cid:17) n Xi=1 i−p i − n2 − (cid:18) n2 (cid:19)!# (a) n/4 n n logn m 2 log2n log2n 2 exp 2k log +1 2k k (√α β ) C o ≤ Xk=1 " (cid:16) 2k (cid:17)− (cid:16)2 − (cid:17) n Xi=1 i−p i − n2 − (cid:18) n2 (cid:19)!# (b) n/4 1 k log2n log2n 2 exp 2k logn log2k+1 (2+ǫ)logn C o ≤ − − 2 − n − n − n k=1 (cid:20) (cid:18) (cid:18) (cid:19)(cid:18) (cid:18) (cid:19)(cid:19)(cid:19)(cid:21) X n/4 2k 1 k 1 k log2n log2n 2 exp 2k log2k+1+ logn ǫ logn+ C +o ≤ − n − 2 − n 2 − n n n k=1 (cid:20) (cid:18) (cid:18) (cid:19) (cid:18) (cid:19)(cid:18) (cid:18) (cid:19)(cid:19)(cid:19)(cid:21) X n/4 (c) 2k ǫ 2 exp 2k log2k+1+ logn logn+o(1) ≤ − n − 4 k=1 (cid:20) (cid:18) (cid:19)(cid:21) X n/4 2k = 2 n−k2ǫexp 2k log2k+1+ logn+o(1) , − n k=1 (cid:20) (cid:18) (cid:19)(cid:21) X where (a) holds by Stirling’s formula, n (ne/k)k, (b) holds by the assumption, and (c) holds since k n/4. k ≤ ≤ For sufficiently large n and 1 k n, we have: ≤ ≤(cid:0)4(cid:1) 2 1 1 1 log2k o logn. 32k − 2k ≥ n (cid:18) (cid:19) In other words, log2k 2k logn o(1) 1log2k. By applying this inequality and n−k2ǫ n−21ǫ, − n − ≥ 3 ≤ n/4 1 Pr(F) 2n−12ǫ exp 2k 1 log2k ≤ − 3 k=1 (cid:20) (cid:18) (cid:19)(cid:21) X Since n/4 exp 2k 1 1log2k =O(1), this proves Theorem 1. k=1 − 3 P (cid:2) (cid:0) (cid:1)(cid:3) B. Converse Proof Theorem 2. If m (√α √β )2 < 2, then the probability that the maximum likelihood estimator fails to detect the i=1 i− i communities does not go to zero. P Proof of Theorem 2: We will prove this theorem via several lemmas. In this proof, we assume that there exists at least one i such that p >q . Even if there is no such i, we can prove the theorem in a similar manner. i i Lemma 3. For v H := 1,...,n/log3n and M =max logp /q , if k k k ∈ { } n n2 Mlogn Pr Z W >n−1log3nlog10, vi vi − ≥ loglogn i=Xn/2+1 i=loXgn3n+1 then, for sufficiently large n, Pr(F) 1. ≥ 3 Proof: This can be proved in a similar way as Lemma 3 in [4]. Therefore, we describe its steps briefly. We define the following events. ∆= v H, m E [v,H]logpi < Mlogn {∀ ∈ i=1 i qi loglogn} FH(v) ={v ∈HPsatisfies mi=1Ei[v,A\H]logpqii + lMogllooggnn ≤ mi=1Ei[v,B]logpqii} F = F(v) H {∪v∈H H } P P Then, the condition in the statement can be written as Pr(F(v))>n−1log3nlog10. H First, we show if Pr(FH(v)) > n−1log3nlog10, then Pr(FH) ≥ 190. Since Pr(FH) = 1−(1−Pr(FH(v)))logn3n, we can easily check that Pr(F ) 9 . H ≥ 10 Also, we know that (∆ F ) implies F . Therefore, to evaluate a lower bound on Pr(F ), we should evaluate a lower H A A bound on Pr(∆). Define a∩new event ∆ as ∆ = m E [v,H]logpi < Mlogn , then v v { i=1 i qi loglogn} P n v−1 log3n Mlogn Pr(∆c)=Pr W + W . v iv vj ≥ loglogn i=1 j=v+1 X X Now, we define new i.i.d. random variables X Bern logn α . Then the following inequality holds: j ∼ n i i (cid:16) (cid:17) P n n v−1 log3n Mlogn log3n logn Pr W + W Pr X , iv vj j ≥ loglogn≤ ≥ loglogn i=1 j=v+1 j=1 X X X By the multiplicative Chernoff bound, logn3n logn log3n −lolgolgongn Pr X . j ≥ loglogn≤ e α loglogn j=1 (cid:18) i i (cid:19) X P By using the union bound, we get n 1 Pr(∆) Pr(∆c) − ≤ log3n i n n log3n logn Pr X ≤ log3n j ≥ loglogn j=1 X n logn (logn)3 = exp log log " log3n − loglogn eloglogn iαi# = exp[ 2logn+o(logn)]. P − Therefore, for sufficiently large n, Pr(∆) 9 . This implies Pr(F ) 2, since (∆ F ) implies F . Finally, since ≥ 10 A ≥ 3 ∩ H A (F F ) implies F, we can conclude that if Pr(F ) 2, then Pr(F) 1. A∩ B A ≥ 3 ≥ 3 Lemma 4. For sufficiently large n and any positive constant M, n 2 Mlogn logn Pr (Z1(j+n2)−W1j)≥ loglogn +o loglogn j= n +1 (cid:18) (cid:19) lXog3n m logn 2 exp √α β o(logn) . i i ≥ "− 2 − !− # Xi=1(cid:16) p (cid:17) Proof: By the proof of Cramer’s theorem in [10], for i.i.d. sequence X and any open set U IR, i ⊆ n n 1 1 Pr X U Pr X (a ǫ,a+ǫ) i i n ∈ ! ≥ n ∈ − ! i=1 i=1 X X e−n(I(a)−ǫ|η∗|) 1 σ2 , (4) ≥ − nǫ2 (cid:18) (cid:19) where a and ǫ are constants satisfing (a ǫ,a+ǫ) U, η∗ is the constant that minimizes logE[eηXi], σ2 is the variance of X˜ and I(a)=sup (θa logE[eθX]−). The proo⊂f for this theorem is in the appendix. i θ∈IR − ∗ Here, X˜i is a random variable that follows eEη[exηp∗(Xx]) where p(x) is the distribution of Xi. Since η∗ is a constant and σ2 is of order of logn/n, if we take ǫ=log23 n/n, enǫ|η∗| and the last term in (4) is negligible. 2 Let l = n n for simplicity and take a= Mlogn + 1o logn + log3 n. Then, we have, 2 − log3n lloglogn l loglogn n (cid:16) (cid:17) n Pr 2 (Z1(j+n2)−W1j)≥ lMogllooggnn +o lolgolgongn ≥e−l(I(a)−ǫ|η∗|) 1− lσǫ22 . (5) j= n +1 (cid:18) (cid:19) (cid:18) (cid:19) loXg3n Therefore, we need to evaluate I(a) to evaluate the right-hand side of (5). I(a)= sup θa logE[eθ(Z1(j+n/2)−W1j)] − θ∈IR (cid:16) (cid:17) m logn logn = sup θa+ (α +β αθβ1−θ βθα1−θ) o θ∈IR n i=1 i i− i i − i i − (cid:18) n (cid:19)! X m (a) logn 2 logn = (√α β ) +o , i i n − n i=1 (cid:18) (cid:19) X p where(a)holdssincethefunctioninthesupremumisconcaveandthederivativeofthefunctionbecomeszerowhenθ =0.5+ǫ for 0 ǫ<1/logloglogn. Detailed explanation is in the Appendix. ≤ Lemma 5. For some constant K >0, n log3n 1 m α C i Pr Z1(j+n2) ≥ log2n βilog βi −K≥1− K2log2n. i=1 i=1 X X Proof: To prove this lemma, we will use Chebyshev’s inequality. Since the variance of Z1(j+n) is of order of logn/n, 2 σ2 Clogn/n for some C >0. Then, we get, Z ≤ n log3n 1 m α i Pr Z1(j+n2) ≥ log2n βilog βi −K i=1 i=1 X X n log3n n log3n = Pr Z1(j+n2) ≥ log3n E[Z1(1+n2)]− n K i=1 (cid:18) (cid:19) X C 1 . ≥ − K2log2n Now, we assume that m (√α √β )2 2 ǫ for ǫ>0. For simplicity, we define an event S such that i=1 i− i ≤ − P n log3n 1 m α i S = Z1(j+n2) ≥ log2n βilog βi −K. Xi=1 Xi=1 Then, we have, n n2 Mlogn Pr Z W 1i 1i − ≥ loglogn i=Xn/2+1 i=loXgn3n+1 n n2 Mlogn Pr Z W S Pr(S) 1i 1i ≥ − ≥ loglogn | i=Xn/2+1 i=loXgn3n+1 n2 Mlogn 1 m α i ≥ Pr j= n +1(Z1(j+n2)−W1j)≥ loglogn − log2n i=1βilog βi +K!Pr(S) loXg3n X m (a) logn 2 exp √α β o(logn) Pr(S) i i ≥ "− 2 − !− # Xi=1(cid:16) p (cid:17) (b) C n−1+2ǫ 1 ≥ − K2log2n (cid:18) (cid:19) (c) > n−1log3nlog10, (6) where (a) holds by Lemma 3, (b) holds by the assumption and Lemma 4, and (c) holds for sufficiently large n. Finally, by observing (6), we know that Pr(F) 1 by Lemma 2. This proves Theorem 2. ≥ 3 IV. APPENDIX A. Cramer’s Theorem Cramer’s theorem played a crucial role when we prove the main theorems of this paper. Therefore, in this section, we introducehow to provethis theorem [10]. The followingtheorem is a slightly modified version of Cramer’s theorem,but this is essentially the same as the original one. Theorem (Cramer’s Theorem). For i.i.d. sequence X and any closed set and open set F,U IR, i ⊆ n 1 Pr X F 2exp n inf I(x) , (7) i n i=1 ∈ !≤ (cid:18)− x∈F (cid:19) X and n n 1 1 Pr X U Pr X (a ǫ,a+ǫ) i i n ∈ ! ≥ n ∈ − ! i=1 i=1 X X e−n(I(a)+ǫ|η∗|) 1 σ2 , (8) ≥ − nǫ2 (cid:18) (cid:19) where I(a)=sup (θa logE[eθX]). θ∈IR − Proof:First,wewillshow(7).LetF beanon-emptyclosedset.Wecannoticethat(7)triviallyholdswheninf I(x)= x∈F 0. Hence, we assume inf I(x)>0 through the following proof. On the other hand, the following always holds for all x x∈F and θ 0, ≥ n n 1 1 Pr X x = E E exp nθ X x n i ≥ ! 1n1Pni=1Xi−x≥0 ≤ " n i− !!# Xi=1 h i Xi=1 n = e−nθx E eθXi =exp n θx logE eθX , (9) − − i=1 Y (cid:2) (cid:3) (cid:2) (cid:0) (cid:2) (cid:3)(cid:1)(cid:3) where the inequality holds by Chebycheff’s inequality. Now, we will show these two statements is true by proving Lemma 6. If x¯< , for every x>x¯,Pr 1 n X x e−nI(x) If x¯>∞ , for every x<x¯,Prn 1 i=n1 Xi ≥ x≤ e−nI(x) (cid:26) −∞ (cid:0) Pn i=1 i ≤(cid:1) ≤ Lemma 6. Assume x¯< . Then, for x x¯, I(x)=sup θx(cid:0) lPogE eθX (cid:1) ∞ ≥ θ≥0 − Proof: For all θ IR, we know that logE eθX E log(cid:2)eθX =θx¯(cid:2)by J(cid:3)e(cid:3)nsen’s inequality. ∈ ≥ If x¯ = , then logE eθX = for negative θ so that this lemma trivially holds. Therefore, let’s assume x¯ is finite. −∞ ∞ (cid:2) (cid:3) (cid:2) (cid:3) Then, for x x¯ and θ <0, ≥ (cid:2) (cid:3) θx logE eθX θx¯ logE eθX 0 − ≤ − ≤ This proves the lemma. (cid:2) (cid:3) (cid:2) (cid:3) By combining (9) and Lemma 6, we can prove the first statement. Also, the second statement can be proved in a similar manner. Now,sinceweareinterestedinthecaseoffinitex¯,weassumethatx¯isfinite.Then,I(x¯)=sup θx¯ logE eθx¯ =0 θ≥0 − implies x¯ Fc by the assumption. Let (x ,x ) be the union of all the open intervals (a,b) Fc that contains x¯. Since − + ∈ ∈ (cid:2) (cid:2) (cid:3)(cid:3) F is non-empty, either x or x must be finite. If x is finite, then x F, resulting in inf I(x) I(x ). Likewise, − + − − x∈F − ∈ ≤ inf I(x) I(x ) whenever x is finite. Finally, x∈F + + ≤ n n n 1 1 1 Pr X F Pr X x +Pr X x i i − i + n ∈ ! ≤ n ≤ ! n ≥ ! i=1 i=1 i=1 X X X e−nI(x−)+e−nI(x+) 2exp n inf I(x) ≤ ≤ − x∈F (cid:18) (cid:19) This proves the first statement of the theorem. To prove the second statement of the theorem, we need to show that for every ǫ>0, Pr 1 n Y ( ǫ,ǫ) exp n logE eη∗Y +ǫη∗ 1 σ2 , nXi=1 i ∈ − !≥ h− (cid:16)− h i | |(cid:17)i(cid:18) − nǫ2(cid:19) where Y =X a. i i − Since if we rewrite the above inequality using X , we get: i Pr 1 n X (a ǫ,a+ǫ) exp n logE eη∗(X−a) +ǫη∗ 1 σ2 nXi=1 i ∈ − ! ≥ h− (cid:16)− h i | |(cid:17)i(cid:18) − nǫ2(cid:19) = exp n η∗a logE eη∗X +ǫη∗ 1 σ2 − − | | − nǫ2 h (cid:16) h i (cid:17)i(cid:18) (cid:19) σ2 exp[ n(I(a)+ǫη∗ )] 1 . ≥ − | | − nǫ2 (cid:18) (cid:19) Now, as logE eλX is a continuous and differentiable function, there exists a finite η such that (cid:2) (cid:3) d logE eηX = inf logE eλX and logE eλX =0. λ∈IR dλ λ=η (cid:12) (cid:2) (cid:3) (cid:2) (cid:3) (cid:2) (cid:3)(cid:12) (cid:12) Using η, we define new random variables, X˜ eηxp(x). Then, the expected value of X˜ is, i ∼ E[eηX] 1 E[X˜] = x eηxip(x ) E[eηX] i i χ X 1 d = logE eλX =0. E[eηX]dλ λ=η (cid:12) (cid:2) (cid:3)(cid:12) (cid:12) Therefore, by the law of large numbers, Pr 1 n X˜ ( ǫ,ǫ) 1 σ2 . n i=1 i ∈ − ≥ − nǫ2 Finally, using the above results, (cid:16) (cid:17) (cid:16) (cid:17) P n 1 Pr X ( ǫ,ǫ) = p(x )p(x ) p(x ) i 1 2 n n ∈ − ! ··· Xi=1 |PXxi|<nǫ n e−nǫ|η| exp η x p(x )p(x ) p(x ) i 1 2 n ≥ ! ··· |PXxi|<nǫ Xi=1 n 1 = e−nǫ|η|exp nlogE eηX Pr X˜ ( ǫ,ǫ) i n ∈ − ! i=1 (cid:0) (cid:2) (cid:3)(cid:1) X σ2 exp n logE eηX +ǫη 1 . ≥ − − | | − nǫ2 (cid:18) (cid:19) (cid:2) (cid:0) (cid:2) (cid:3) (cid:1)(cid:3) This proves the second statement of the theorem. B. Detailed Explanation for the proof of Lemma 3 First, we should calculate the momentgenerating function of (Z W ). For simplicity, m p and m q are 1(j+n/2)− 1j i=1 i i=1 i denoted as p∗ and q∗ respectively. α∗ and β∗ are defined in a similar way. P P Then we get 0 w.p. m p q +(1 p∗)(1 q∗) logpi w.p. (1k=p1∗)kqk − − Z1(j+n/2)−W1j = lllooogggpppqqqq12ii21ippqqiiii www...ppp... Ppppiiiqq(−112−qffoo∗)irr ii6=6=12 . . . logpqmmpqii w.p. piqm for i6=m and m m θ p E eθ(Z1(j+n/2)−W1j) = piqi+(1 p∗)(1 q∗) + (1 p∗)qi i − − − q (cid:20)i=1 (cid:21) i=1 (cid:18) i(cid:19) (cid:2) (cid:3) X X m q θ p q θ + p (1 q∗) i + p q j i i i j − p q p i=1 (cid:18) i(cid:19) i6=j (cid:18) j i(cid:19) X X logn 2 m α∗logn β∗logn m α∗logn β logn α θ i i = α β + 1 1 + 1 i i n − n − n − n n β (cid:18) (cid:19) i=1 (cid:18) (cid:19)(cid:18) (cid:19) i=1(cid:18) (cid:19) (cid:18) i(cid:19) X X m α logn β∗logn β θ logn 2 α β θ i i j i + 1 + α β i j n − n α n β α i=1 (cid:18) (cid:19)(cid:18) i(cid:19) (cid:18) (cid:19) i6=j (cid:18) j i(cid:19) X X α∗logn β∗logn m α logn β θ β logn α θ i i i i = 1 + + − n − n n α n β i=1(cid:20) (cid:18) i(cid:19) (cid:18) i(cid:19) (cid:21) X logn 2 m m α θ β θ α β θ + α β +α∗β∗ α∗β i +β∗α i + α β j i i i i i i j (cid:18) n (cid:19) (cid:20)i=1 −i=1 (cid:18)βi(cid:19) (cid:18)αi(cid:19) ! i6=j (cid:18)βjαi(cid:19) (cid:21) X X X 2 logn logn = 1+C(θ) +D(θ) , n n (cid:18) (cid:19) θ θ where C(θ)= α∗ β∗+ m α βi +β αi − − i=1 i αi i βi and D(θ)= m α β +Pα∗β∗(cid:20) (cid:16)m (cid:17) α∗β (cid:16)αi(cid:17)θ(cid:21)+β∗α βi θ + α β αjβi θ. i=1 i i − i=1 i βi i αi i6=j i j βjαi (cid:18) (cid:16) (cid:17) (cid:16) (cid:17) (cid:19) (cid:16) (cid:17) P P P I(a) = sup θa logE[eθ(Z1(j+n/2)−W1j)] − θ∈IR (cid:16) (cid:17) 2 logn logn = sup θa log 1+C(θ) +D(θ) θ∈IR − n (cid:18) n (cid:19) !! m (a) logn 2 logn = (√α β ) +o , i i n − n i=1 (cid:18) (cid:19) X p Define a function f :IR IR such that 7→ 2 logn logn f(θ)=θa log 1+C(θ) +D(θ) . − n (cid:18) n (cid:19) ! Then the derivative and the second derivative of f are given by 2 C′(θ)logn +D′(θ) logn f′(θ)=a n n , − (cid:16) (cid:17) 2 1+C(θ)logn +D(θ) logn n n (cid:16) (cid:17) θ θ where C′(θ)= m α βi log βi +β αi logαi i=1 i αi αi i βi βi and D′(θ)=P m(cid:20) (cid:16)α∗β(cid:17) αi θ β∗α(cid:16) β(cid:17)i θ log(cid:21)αi + α β αjβi θlogαjβi − i=1 i βi − i αi βi i6=j i j βjαi βjαi Also, (cid:18) (cid:16) (cid:17) (cid:16) (cid:17) (cid:19) (cid:16) (cid:17) P P 2 2 2 2 1+C(θ)logn +D(θ) logn C′′(θ)logn +D′′(θ) logn C′(θ)logn +D′(θ) logn n n n n − n n f′′(θ) = (cid:20) (cid:16) (cid:17) (cid:21)(cid:20) (cid:16) (cid:17) (cid:21) (cid:20) (cid:16) (cid:17) (cid:21) − 2 2 1+C(θ)logn +D(θ) logn n n (cid:20) (cid:16) (cid:17) (cid:21) 2 2 1+C(θ)logn +D(θ) logn C′′(θ)+D′′(θ)logn logn C′(θ)+D′(θ)logn logn n n n − n n = (cid:20) (cid:16) (cid:17) (cid:21)(cid:20) (cid:21) (cid:20) (cid:21) , − n 2 2 1+C(θ)logn +D(θ) logn n n (cid:20) (cid:16) (cid:17) (cid:21) θ 2 θ 2 where C′′(θ)= m α βi log βi +β αi logαi . i=1 i αi αi i βi βi (cid:20) (cid:16) (cid:17) (cid:16) (cid:17) (cid:16) (cid:17) (cid:16) (cid:17) (cid:21) P