Cover’s Open Problem: “The Capacity of the Relay Channel”

Xiugang Wu, Leighton Pate Barnes and Ayfer Özgür

arXiv:1701.02043v1 [cs.IT] 9 Jan 2017

Abstract

Consider a memoryless relay channel, where the channel from the relay to the destination is an isolated bit pipe of capacity $C_0$. Let $C(C_0)$ denote the capacity of this channel as a function of $C_0$. What is the critical value of $C_0$ such that $C(C_0)$ first equals $C(\infty)$? This is a long-standing open problem posed by Cover and named “The Capacity of the Relay Channel,” in Open Problems in Communication and Computation, Springer-Verlag, 1987. In this paper, we answer this question in the Gaussian case and show that $C(C_0)$ cannot equal $C(\infty)$ unless $C_0 = \infty$, regardless of the SNR of the Gaussian channels, while the cut-set bound would suggest that $C(\infty)$ can be achieved at finite $C_0$. Our approach is geometric and relies on a strengthening of the isoperimetric inequality on the sphere by using the Riesz rearrangement inequality.

I. PROBLEM SETUP AND MAIN RESULT

In 1987, Thomas M. Cover asked the following question, which he called “The Capacity of the Relay Channel” [2].

A. The Capacity of the Relay Channel$^1$

Consider the following seemingly simple discrete memoryless relay channel:

[Figure: the relay channel, with a bit pipe of capacity $C_0$ from the relay to the destination.]

Here $Z$ and $Y$ are conditionally independent and conditionally identically distributed given $X$, that is, $p(z,y|x) = p(z|x)p(y|x)$. Also, the channel from $Z$ to $Y$ does not interfere with $Y$. A $(2^{nR}, n)$ code for this channel is a map $X^n : [1:2^{nR}] \to \mathcal{X}^n$, a relay function $f_n : \mathcal{Z}^n \to [1:2^{nC_0}]$ and a decoding function $g_n : \mathcal{Y}^n \times [1:2^{nC_0}] \to [1:2^{nR}]$. The probability of error is given by
\[ P_e^{(n)} = \Pr\big(g_n(Y^n, f_n(Z^n)) \neq M\big), \]
where the message $M$ is uniformly distributed over $[1:2^{nR}]$ and
\[ p(m, y^n, z^n) = 2^{-nR} \prod_{i=1}^{n} p(y_i | x_i(m)) \prod_{i=1}^{n} p(z_i | x_i(m)). \]
Let $C(C_0)$ be the supremum of achievable rates $R$ for a given $C_0$, that is, the supremum of the rates $R$ for which $P_e^{(n)}$ can be made to tend to zero. We note the following facts:
1. $C(0) = \sup_{p(x)} I(X;Y)$.
2. $C(\infty) = \sup_{p(x)} I(X;Y,Z)$.
3. $C(C_0)$ is a nondecreasing function of $C_0$.
What is the critical value of $C_0$ such that $C(C_0)$ first equals $C(\infty)$?

The work was supported in part by NSF award CCF-1514538 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370. This paper was presented in part at the 2015 Allerton Conference on Communication, Control, and Computing [1].
X. Wu, L. P. Barnes and A. Özgür are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA (e-mail: [email protected]; [email protected]; [email protected]).
$^1$ This subsection is taken verbatim from [2] with a few notation changes.

B. Main Result

In this paper, we answer this long-standing open question in the Gaussian case. In particular, consider the symmetric Gaussian relay channel as depicted in Fig. 1, where
\[ \begin{cases} Z = X + W_1 \\ Y = X + W_2 \end{cases} \]
with the transmitted signal being constrained to average power $P$, i.e.,$^2$
\[ \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \|X^n(m)\|^2 \le nP, \tag{1} \]
and $W_1, W_2 \sim \mathcal{N}(0,N)$ representing Gaussian noises that are independent of each other and of $X$.

Fig. 1. Symmetric Gaussian relay channel. [The source sends $X$; the relay observes $Z$ with noise $W_1 \sim \mathcal{N}(0,N)$, the destination observes $Y$ with noise $W_2 \sim \mathcal{N}(0,N)$, and the relay is connected to the destination by a link of capacity $C_0$.]

For this channel it is easy to observe that
\[ C(\infty) = \frac{1}{2} \log\left(1 + \frac{2P}{N}\right). \]
Let
\[ C_0^* := \inf\{C_0 : C(C_0) = C(\infty)\}. \tag{2} \]
The cut-set bound [8] yields the following lower bound on $C_0^*$:
\[ C_0^* \ge \frac{1}{2} \log\left(1 + \frac{2P}{N}\right) - \frac{1}{2} \log\left(1 + \frac{P}{N}\right), \]
which may lead one to suspect that $C(\infty)$ could be achieved at finite $C_0$. The main result of our paper is to show that $C_0^* = \infty$ regardless of the parameters of the problem.
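To make the quantities in this subsection concrete, the following minimal sketch evaluates $C(0)$, $C(\infty)$ and the cut-set lower bound on $C_0^*$ for a given SNR. It assumes base-2 logarithms (rates in bits per channel use) and takes $C(0) = \frac{1}{2}\log(1 + P/N)$, the usual AWGN capacity; the function names are illustrative and do not come from the paper.

```python
import math


def c_zero(P, N):
    """C(0): capacity without relay help, 0.5 * log2(1 + P/N) (assumed AWGN formula)."""
    return 0.5 * math.log2(1.0 + P / N)


def c_inf(P, N):
    """C(infinity) as stated in the text: 0.5 * log2(1 + 2P/N)."""
    return 0.5 * math.log2(1.0 + 2.0 * P / N)


def cutset_threshold(P, N):
    """Cut-set lower bound on C_0^*: C(infinity) - C(0)."""
    return c_inf(P, N) - c_zero(P, N)


if __name__ == "__main__":
    for snr_db in (-15, 0, 15):
        snr = 10.0 ** (snr_db / 10.0)  # P/N with N normalized to 1
        print(snr_db, c_zero(snr, 1.0), c_inf(snr, 1.0), cutset_threshold(snr, 1.0))
```

Since the threshold equals $\frac{1}{2}\log\frac{2P+N}{P+N}$, it never exceeds half a bit, which is why the cut-set bound alone might suggest that a small finite $C_0$ suffices.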
Theorem 1.1: For the symmetric Gaussian relay channel depicted in Fig. 1, $C_0^* = \infty$.

$^2$ This constraint is less stringent than requiring $\|X^n(m)\|^2 \le nP$, $\forall m \in [1:2^{nR}]$, which is a more standard way of expressing the average power constraint for the AWGN channel. Note that the capacity can only be larger under (1), and therefore the conclusions of Theorems 1.1 and 1.2 are also valid under the individual power constraint for the codewords, i.e., $\|X^n(m)\|^2 \le nP$, $\forall m \in [1:2^{nR}]$.

This theorem follows immediately from the following theorem, which establishes an upper bound on the capacity of this channel for any $C_0$.

Theorem 1.2: For the symmetric Gaussian relay channel depicted in Fig. 1, the capacity $C(C_0)$ satisfies both
\[ C(C_0) \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + C_0 + \log\sin\theta \tag{3} \]
and
\[ C(C_0) \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \min_{\omega \in \left(\frac{\pi}{2} - \theta, \frac{\pi}{2}\right]} h_\theta(\omega) \tag{4} \]
for some $\theta \in \left[\arcsin 2^{-C_0}, \frac{\pi}{2}\right]$, where
\[ h_\theta(\omega) = \frac{1}{2} \log\left( \frac{4\sin^2\frac{\omega}{2}\left(P + N - N\sin^2\frac{\omega}{2}\right)\sin^2\theta}{(P+N)(\sin^2\theta - \cos^2\omega)} \right). \]
In Fig. 2 we plot this upper bound (label: New bound) under three different values of the SNR $= P/N$ of the Gaussian channels together with the celebrated cut-set bound [8] and an upper bound on the capacity of this channel we have previously derived in [6] (label: Old bound). For reference, we also provide the rate achieved by a compress-and-forward relay strategy (label: C-F), which employs Gaussian input distribution at the source combined with Gaussian quantization and Wyner-Ziv binning at the relay.$^3$ Note that from these figures one can visually observe that the new upper bound reaches the value $C(\infty)$ only as $C_0 \to \infty$, which leads to the conclusion in Theorem 1.1. This is formally proved in the next section.
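The bound of Theorem 1.2 can be evaluated numerically. Because (3) and (4) hold simultaneously for some unknown $\theta$, one way to read the theorem is as the bound $C(C_0) \le \frac{1}{2}\log(1+P/N) + \max_\theta \min\{C_0 + \log\sin\theta,\ \min_\omega h_\theta(\omega)\}$. The sketch below implements this reading with a plain grid search; the base-2 logarithm, the grid resolution and the function names are assumptions made for illustration, and this is not the authors' plotting code.

```python
import numpy as np


def h_theta(omega, theta, P, N):
    """h_theta(omega) from Theorem 1.2, with logarithms taken to base 2."""
    s = np.sin(omega / 2.0) ** 2
    num = 4.0 * s * (P + N - N * s) * np.sin(theta) ** 2
    den = (P + N) * (np.sin(theta) ** 2 - np.cos(omega) ** 2)
    return 0.5 * np.log2(num / den)


def new_upper_bound(C0, P, N, grid=1000):
    """Grid-search evaluation of max over theta of min{(3), (4)}."""
    base = 0.5 * np.log2(1.0 + P / N)
    theta0 = np.arcsin(2.0 ** (-C0))
    best = -np.inf
    for theta in np.linspace(theta0, np.pi / 2.0, grid):
        # omega ranges over (pi/2 - theta, pi/2]; the left endpoint is
        # dropped because h_theta diverges there.
        omegas = np.linspace(np.pi / 2.0 - theta, np.pi / 2.0, grid + 1)[1:]
        bound3 = C0 + np.log2(np.sin(theta))
        bound4 = h_theta(omegas, theta, P, N).min()
        best = max(best, min(bound3, bound4))
    return float(base + best)
```

Because the $\omega$ grid always contains $\omega = \pi/2$, the value returned never exceeds the cut-set bound $\min\{C(\infty),\, C(0) + C_0\}$, up to floating-point error.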
Fig. 2. Upper bounds and achievable rates for the Gaussian relay channel. [Three panels, for SNR = -15 dB, 0 dB and 15 dB, each plotting the cut-set bound, C-F, the old bound and the new bound against $C_0$ (bit/channel use).]

$^3$ This is not the best state-of-the-art achievable rate, as this rate can be improved, for example, by using bursty transmissions and time-sharing at low SNR.

C. Approach

Our approach builds on the method we developed in our earlier work [3]–[7] for characterizing information tensions in a Markov chain by using high-dimensional geometry. The main idea is to study the geometry of the high-dimensional typical sets associated with the random variables in the Markov chain and then translate this high-dimensional geometry to information inequalities for the random variables. This idea applies equally well to single-letter and multi-letter random variables. The main geometric tool employed in our previous work [3]–[7] was the so-called blowing-up lemma. In the current paper, our main geometric ingredient is a strengthening of the isoperimetric inequality on a high-dimensional sphere, which we develop by building on the Riesz rearrangement inequality [15]. The classical isoperimetric inequality on the sphere states that among all sets on the sphere with a given volume the spherical cap has the smallest boundary or, more generally, the smallest volume of neighborhood [9]. In this paper, we show that the spherical cap is the extremal set not only in terms of minimizing the volume of its neighborhood, but roughly speaking also in terms of minimizing its total intersection volume with a ball drawn around a randomly chosen point on the sphere.

It may be a priori surprising that the isoperimetric inequality appears as the main technical ingredient in the solution of a network information theory problem. However, a converse can be thought of as characterizing the extremal configuration of the typical sets of the random variables associated with an information theory problem, i.e., the configuration that is induced by the (extremal) capacity-achieving strategy. In this sense, it is quite natural that a tool, such as the isoperimetric inequality, which characterizes extremal sets in a certain geometric sense, turns out to be useful.

Formulating the problem of determining the communication capacity of channels as a problem in high-dimensional geometry is one of Shannon’s most important insights that has led to the conception of information theory. His second paper [10], which appeared a couple of months after his classical paper “A Mathematical Theory of Communication” [11] but is cited in this first paper, develops a geometric representation of any point-to-point communication system. It then provides an elegant and intuitive geometric proof of the coding theorem for the AWGN channel, where the converse is based on a sphere-packing argument in high-dimensional space and achievability is proved by a geometric random coding argument. However, to the best of our knowledge such techniques have not been used effectively for solving network problems. Our approach is similar to Shannon’s approach in [10] in that the key step in our proof is a packing argument on a spherical cap.
However, it is also different from Shannon’s approach as we do not directly study the geometry of the codewords but rather use high-dimensional geometry to characterize information tensions in a Markov chain by a lifting step to a high-dimensional space. We believe this approach can be useful for solving other open problems in network information theory.

II. PROOFS OF THEOREMS 1.1 AND 1.2

The proofs of both theorems follow from the lemma below, which is the main technical focus of this paper. The proof idea for this lemma is outlined in Section II-C and a formal proof is given in Section IV. We now state this lemma and show how it leads to the conclusion in Theorem 1.2, which is then used to prove Theorem 1.1.

Lemma 2.1: Let $I_n$ be an integer random variable and $X^n$, $Y^n$ and $Z^n$ be $n$-length random vectors which form the Markov chain $I_n - Z^n - X^n - Y^n$. Assume moreover that $Z^n$ and $Y^n$ are i.i.d. white Gaussian vectors given $X^n$, i.e., $Z^n, Y^n \sim \mathcal{N}(X^n, N I_{n \times n})$, where $I_{n \times n}$ denotes the identity matrix, that $E[\|X^n\|^2] = nP$, and that $I_n = f_n(Z^n)$ is a deterministic mapping of $Z^n$ to a set of integers. Let $H(I_n|X^n)$ be denoted by $-n\log\sin\theta_n$, i.e., define$^4$
\[ \theta_n := \arcsin 2^{-\frac{1}{n} H(I_n|X^n)}. \tag{5} \]
Then the following inequality holds for any $n$:
\[ H(I_n|Y^n) \le n \cdot \min_{\omega \in \left(\frac{\pi}{2} - \theta_n, \frac{\pi}{2}\right]} \frac{1}{2} \log\left( \frac{4\sin^2\frac{\omega}{2}\left(P + N - N\sin^2\frac{\omega}{2}\right)}{(P+N)(\sin^2\theta_n - \cos^2\omega)} \right). \tag{6} \]

$^4$ Note that values of $\theta_n$ between 0 and $\pi/2$ span all possible values for $H(I_n|X^n)$ between 0 and $\infty$.

Note that the lemma provides an upper bound on $H(I_n|Y^n)$ in terms of $H(I_n|X^n)$ for a Markov chain that satisfies the conditions of the lemma.

A. Proof of Theorem 1.2

Suppose a rate $R$ is achievable. Then there exists a sequence of $(2^{nR}, n)$ codes such that the average probability of error $P_e^{(n)} \to 0$ as $n \to \infty$. Let the relay’s transmission be denoted by $I_n = f_n(Z^n)$. By standard information theoretic arguments, for this sequence of codes we have
\begin{align}
nR &= H(M) = I(M; Y^n, I_n) + H(M | Y^n, I_n) \nonumber \\
&\le I(X^n; Y^n, I_n) + n\mu \tag{7} \\
&= I(X^n; Y^n) + I(X^n; I_n | Y^n) + n\mu \nonumber \\
&= I(X^n; Y^n) + H(I_n | Y^n) - H(I_n | X^n) + n\mu \tag{8} \\
&\le n I(X_Q; Y_Q) + H(I_n | Y^n) - H(I_n | X^n) + n\mu \tag{9} \\
&\le \frac{n}{2} \log\left(1 + \frac{P}{N}\right) + H(I_n | Y^n) - H(I_n | X^n) + n\mu, \tag{10}
\end{align}
for any $\mu > 0$ and $n$ sufficiently large. In the above, (7) follows from applying the data processing inequality to the Markov chain $M - X^n - (Y^n, I_n)$ and Fano’s inequality, (8) uses the fact that $I_n - X^n - Y^n$ form a Markov chain and thus $H(I_n | X^n, Y^n) = H(I_n | X^n)$, (9) follows by defining the time sharing random variable $Q$ to be uniformly distributed over $[1:n]$, and (10) follows because
\[ E[X_Q^2] = \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \frac{1}{n} \sum_{i=1}^{n} X_i^2(m) = \frac{1}{n 2^{nR}} \sum_{m=1}^{2^{nR}} \|X^n(m)\|^2 \le P. \tag{11} \]

Given (10), the standard way to proceed would be to upper bound the first entropy term by $H(I_n|Y^n) \le H(I_n) \le nC_0$ and lower bound the second entropy term $H(I_n|X^n)$ simply by 0. This would lead to the so-called multiple-access bound in the well-known cut-set bound on the capacity of this channel [8]. However, as we already pointed out in our previous works [3]–[7], this leads to a loose bound since it does not capture the inherent tension between how large the first entropy term can be and how small the second one can be.

Instead, we can use Lemma 2.1 to more tightly upper bound the difference $H(I_n|Y^n) - H(I_n|X^n)$ in (10). We start by verifying that the random variables $I_n$, $X^n$, $Z^n$ and $Y^n$ associated with a code of blocklength $n$ satisfy the conditions in the lemma. It is trivial to observe that they satisfy the required Markov chain condition and that $Z^n$ and $Y^n$ are i.i.d. Gaussian given $X^n$ due to the channel structure. Note also that without loss of generality we can assume that the code satisfies the average power constraint in (1) with equality, i.e.,
\[ E[\|X^n\|^2] = \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \|X^n(m)\|^2 = nP. \]
This is because given a $(2^{nR}, n)$ code with average probability of error $P_e^{(n)}$ and $E[\|X^n\|^2] = nP' < nP$, we can always scale up the codewords by a factor of $\sqrt{nP/nP'}$ and achieve an average probability of error smaller than or equal to $P_e^{(n)}$.$^5$

$^5$ This can be done for example by adding additional independent Gaussian noise at the relay and the destination to emulate the transmission of the original codeword.

Therefore, applying Lemma 2.1 to the random variables associated with a code for the relay channel, we can bound the difference of the two entropy terms in (10) and conclude that for any achievable rate $R$,
\[ R \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \min_{\omega \in \left(\frac{\pi}{2} - \theta_n, \frac{\pi}{2}\right]} h_{\theta_n}(\omega) + \mu, \tag{12} \]
where $h_{\theta_n}(\omega)$ is defined as
\[ h_{\theta_n}(\omega) = \frac{1}{2} \log\left( \frac{4\sin^2\frac{\omega}{2}\left(P + N - N\sin^2\frac{\omega}{2}\right)\sin^2\theta_n}{(P+N)(\sin^2\theta_n - \cos^2\omega)} \right), \tag{13} \]
in which $\theta_n := \arcsin 2^{-\frac{1}{n} H(I_n|X^n)}$ satisfies
\[ \theta_0 := \arcsin 2^{-C_0} \le \arcsin 2^{-\frac{1}{n} H(I_n|X^n)} = \theta_n \le \frac{\pi}{2}. \tag{14} \]
At the same time, for any achievable rate $R$, we also have
\[ R \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + C_0 + \log\sin\theta_n + \mu, \tag{15} \]
which simply follows from (10) by upper bounding $H(I_n|Y^n)$ with $nC_0$ and plugging in the definition of $\theta_n$. Therefore, if a rate $R$ is achievable then for any $\mu > 0$ and $n$ sufficiently large it should simultaneously satisfy both (12) and (15) for some $\theta_n$ that satisfies the condition in (14). This concludes the proof of the theorem.

B. Proof of Theorem 1.1

In order to prove that Theorem 1.1 follows from Theorem 1.2, consider the second bound (4) on $C(C_0)$ in Theorem 1.2. Since the $\theta$ in (4) satisfies $\theta \ge \arcsin 2^{-C_0} =: \theta_0$, the minimization interval $\left(\frac{\pi}{2} - \theta_0, \frac{\pi}{2}\right]$ is contained in $\left(\frac{\pi}{2} - \theta, \frac{\pi}{2}\right]$, and we can upper bound the right-hand side of (4) to obtain
\[ C(C_0) \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \min_{\omega \in \left(\frac{\pi}{2} - \theta_0, \frac{\pi}{2}\right]} h_{\theta}(\omega). \]
Also because for any $\omega \in \left(\frac{\pi}{2} - \theta_0, \frac{\pi}{2}\right]$, $h_{\theta}(\omega) \le h_{\theta_0}(\omega)$, we further have
\[ C(C_0) \le \frac{1}{2} \log\left(1 + \frac{P}{N}\right) + \min_{\omega \in \left(\frac{\pi}{2} - \theta_0, \frac{\pi}{2}\right]} h_{\theta_0}(\omega). \tag{16} \]
The significance of the function $h_{\theta_0}(\omega)$ is that for any $\theta_0 > 0$,
\[ h_{\theta_0}\left(\frac{\pi}{2}\right) = \frac{1}{2} \log\left(\frac{2P+N}{P+N}\right), \tag{17} \]
and $h_{\theta_0}(\omega)$ is increasing at $\frac{\pi}{2}$, or more precisely,
\[ h'_{\theta_0}\left(\frac{\pi}{2}\right) = \frac{P}{(2P+N)\ln 2} > 0. \]
Therefore, as long as $\theta_0 > 0$, which is the case when $C_0$ is finite, the minimization of $h_{\theta_0}(\omega)$ with respect to $\omega$ in (16) yields a value strictly smaller than $h_{\theta_0}\left(\frac{\pi}{2}\right)$ in (17). This would allow us to conclude that the capacity $C(C_0)$ for any finite $C_0$ is strictly smaller than $\frac{1}{2}\log\left(1 + \frac{2P}{N}\right)$.

We now formalize the above argument. Using the definition of the derivative, one obtains
\[ h'_{\theta_0}\left(\frac{\pi}{2}\right) = \lim_{\Delta \to 0} \frac{h_{\theta_0}\left(\frac{\pi}{2}\right) - h_{\theta_0}\left(\frac{\pi}{2} - \Delta\right)}{\Delta}. \]
Therefore, there exists a sufficiently small $\Delta_1 > 0$ such that $0 < \Delta_1 < \theta_0$ and
\[ \left| \frac{h_{\theta_0}\left(\frac{\pi}{2}\right) - h_{\theta_0}\left(\frac{\pi}{2} - \Delta_1\right)}{\Delta_1} - h'_{\theta_0}\left(\frac{\pi}{2}\right) \right| \le \frac{h'_{\theta_0}\left(\frac{\pi}{2}\right)}{2}. \]
For such $\Delta_1$ we have
\begin{align}
h_{\theta_0}\left(\frac{\pi}{2} - \Delta_1\right) &\le h_{\theta_0}\left(\frac{\pi}{2}\right) - \frac{\Delta_1 h'_{\theta_0}\left(\frac{\pi}{2}\right)}{2} \nonumber \\
&= \frac{1}{2} \log\left(\frac{2P+N}{P+N}\right) - \frac{P\Delta_1}{2(2P+N)\ln 2}, \nonumber
\end{align}
which further implies that
\[ \min_{\omega \in \left(\frac{\pi}{2} - \theta_0, \frac{\pi}{2}\right]} h_{\theta_0}(\omega) \le \frac{1}{2} \log\left(\frac{2P+N}{P+N}\right) - \frac{P\Delta_1}{2(2P+N)\ln 2}. \tag{18} \]
Combining (16) and (18) we obtain that for any finite $C_0$, there exists some $\Delta_1 > 0$ such that
\[ C(C_0) \le \frac{1}{2} \log\left(1 + \frac{2P}{N}\right) - \frac{P\Delta_1}{2(2P+N)\ln 2}. \]
This proves Theorem 1.1.

C. Proof Outline for Lemma 2.1

Fig. 3. Jointly typical set with $X^n$. [The set of $Z^n$/$Y^n$ jointly typical with $X^n$, at distance $\sqrt{nN}$ from $X^n$.]

Recall that Lemma 2.1 bounds $H(I_n|Y^n)$ in terms of $H(I_n|X^n)$ in a Markov chain $I_n - Z^n - X^n - Y^n$, where $Z^n$ and $Y^n$ are i.i.d. Gaussian vectors given $X^n$, $E[\|X^n\|^2] = nP$ and $I_n = f_n(Z^n)$ is a deterministic mapping of $Z^n$ to a set of integers.
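To get a feel for the bound recalled here, the following minimal sketch evaluates the right-hand side of (6) per channel use as a function of the normalized entropy $a = \frac{1}{n}H(I_n|X^n)$, using $\theta_n = \arcsin 2^{-a}$ from (5). The base-2 logarithm, the grid search over $\omega$ and the function name are illustrative assumptions rather than anything prescribed by the paper.

```python
import numpy as np


def lemma_bound_per_symbol(a, P, N, grid=4000):
    """Right-hand side of (6) divided by n, for H(I_n|X^n)/n = a bits."""
    theta = np.arcsin(2.0 ** (-a))
    # omega ranges over (pi/2 - theta, pi/2]; drop the left endpoint,
    # where the denominator below vanishes.
    omega = np.linspace(np.pi / 2.0 - theta, np.pi / 2.0, grid + 1)[1:]
    s = np.sin(omega / 2.0) ** 2
    ratio = 4.0 * s * (P + N - N * s) / (
        (P + N) * (np.sin(theta) ** 2 - np.cos(omega) ** 2)
    )
    return float(0.5 * np.log2(ratio).min())
```

For $a = 0$ the value returned is essentially zero, so in that case the bound forces $H(I_n|Y^n)$ to be negligible as well, and it grows as $a$ grows.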
As a preliminary exercise to check that fixing $H(I_n|X^n)$ indeed induces an upper bound on $H(I_n|Y^n)$, one can verify that if $H(I_n|X^n) = 0$ then $H(I_n|Y^n) = 0$. One (heuristic) way to see this (that can be made precise) is as follows: $H(I_n|X^n) = 0$ implies that given the transmitted codeword $X^n$, there is no ambiguity about $I_n$, or equivalently all $Z^n$ sequences jointly typical with $X^n$ are mapped to the same $I_n$. See Figure 3. However, since $Y^n$ and $Z^n$ are statistically equivalent given $X^n$ (they share the same typical set given $X^n$), this would imply that $I_n$ can also be determined based on $Y^n$ and therefore $H(I_n|Y^n) = 0$.

Following a similar line of thought, if $H(I_n|X^n)$ is fixed to a certain non-zero value, say $H(I_n|X^n) = -n\log\sin\theta_n$, this roughly speaking implies that the typical $Z^n$’s surrounding an $X^n$ are now mapped to multiple $I_n$ values. This argument can be made precise as follows: Consider the following $B$-length i.i.d. sequence
\[ \{(X^n(b), Y^n(b), Z^n(b), I_n(b))\}_{b=1}^{B}, \tag{19} \]
where for any $b \in [1:B]$, $(X^n(b), Y^n(b), Z^n(b), I_n(b))$ has the same distribution as $(X^n, Y^n, Z^n, I_n)$. For notational convenience, in the sequel we write the $B$-length sequence $[X^n(1), X^n(2), \ldots, X^n(B)]$ as $\mathbf{X}$ and similarly define $\mathbf{Y}$, $\mathbf{Z}$ and $\mathbf{I}$; note that here we have $\mathbf{I} = [f_n(Z^n(1)), f_n(Z^n(2)), \ldots, f_n(Z^n(B))] =: f(\mathbf{Z})$. Now we can apply a standard typicality argument to say that for any typical $(\mathbf{x}, \mathbf{i})$ pair,$^6$
\[ p(\mathbf{i}|\mathbf{x}) = \Pr(f(\mathbf{Z}) = \mathbf{i} \,|\, \mathbf{x}) \doteq 2^{nB\log\sin\theta_n}. \tag{20} \]

$^6$ Following [12], we say $a_m \doteq b_m$ if $\lim_{m \to \infty} \frac{1}{m} \log\frac{a_m}{b_m} = 0$. Notations “$\dot{\ge}$” and “$\dot{\le}$” are similarly defined.

Fig. 4. A spherical cap with angle $\theta_n$. [The cap $\mathrm{Cap}(\mathbf{z}_0, \theta_n)$ with pole $\mathbf{z}_0$ on the shell of radius $\sqrt{nBN}$ centered at $\mathbf{x}$.]
This probabilistic statement can be translated into the following geometric picture: Given $\mathbf{x}$, typical $\mathbf{y}$ and $\mathbf{z}$ sequences will be approximately uniformly distributed on an $\epsilon$-thin spherical shell centered at $\mathbf{x}$ and of radius $\sqrt{nBN}$, denoted as
\[ \mathrm{Shell}\left(\mathbf{x}, \sqrt{nB(N-\epsilon)}, \sqrt{nB(N+\epsilon)}\right) := \left\{ \mathbf{a} \in \mathbb{R}^{nB} : \|\mathbf{a} - \mathbf{x}\| \in \left[\sqrt{nB(N-\epsilon)}, \sqrt{nB(N+\epsilon)}\right] \right\}, \]
where $\epsilon \to 0$ as $B \to \infty$. The relation (20) can then be used to argue that the set of $\mathbf{z}$’s jointly typical with $\mathbf{x}$ that are mapped to the given $\mathbf{i}$, denoted by
\[ A_{\mathbf{x}}(\mathbf{i}) = \left\{ \mathbf{z} \in \mathrm{Shell}\left(\mathbf{x}, \sqrt{nB(N-\epsilon)}, \sqrt{nB(N+\epsilon)}\right) : f(\mathbf{z}) = \mathbf{i} \right\}, \]
will occupy a volume
\[ |A_{\mathbf{x}}(\mathbf{i})| \doteq 2^{nB\left(\frac{1}{2}\log 2\pi e N \sin^2\theta_n\right)} \tag{21} \]
on this thin shell. This translation between probabilities and volumes of sets is immediate since $\mathbf{y}$ and $\mathbf{z}$ are distributed approximately uniformly on the shell.

Assume now that the set $A_{\mathbf{x}}(\mathbf{i})$ were a spherical cap as illustrated in Fig. 4. In general, a spherical cap on $\mathrm{Shell}\left(\mathbf{x}, \sqrt{nB(N-\epsilon)}, \sqrt{nB(N+\epsilon)}\right)$ can be defined as a ball in terms of the geodesic metric, or simply the angle
\[ \angle(\mathbf{y}, \mathbf{z}) = \arccos\left( \frac{\mathbf{y} \cdot \mathbf{z}}{\|\mathbf{y}\| \|\mathbf{z}\|} \right) \]
on the shell, i.e.,
\[ \mathrm{Cap}(\mathbf{z}_0, \phi) = \left\{ \mathbf{z} \in \mathrm{Shell}\left(\mathbf{x}, \sqrt{nB(N-\epsilon)}, \sqrt{nB(N+\epsilon)}\right) : \angle(\mathbf{z}_0, \mathbf{z}) \le \phi \right\}, \]
where we will refer to $\mathbf{z}_0$ as the pole and $\phi$ as the angle of the cap. Using the volume formula for the hyperspherical cap and characterizing the exponent of such a volume (cf. Appendix A), it can be shown that the volume in (21) would correspond to an angle of $\theta_n$ for the spherical cap as the thickness of the shell $\epsilon$ tends to zero. Now, a straightforward computation would yield the following result: Let $V_n = |\mathrm{Cap}(\mathbf{z}_0, \theta_n) \cap \mathrm{Cap}(\mathbf{y}_0, \omega_n)|$ where $\angle(\mathbf{z}_0, \mathbf{y}_0) = \pi/2$ and $\theta_n + \omega_n > \pi/2$. Then,
\[ \Pr\left( |A_{\mathbf{x}}(\mathbf{i}) \cap \mathrm{Cap}(\mathbf{Y}, \omega_n)| \,\dot{\ge}\, V_n \,\middle|\, \mathbf{x} \right) \to 1 \quad \text{as } B \to \infty. \tag{22} \]
In words, if we take a $\mathbf{y}$ uniformly at random on the shell and draw a spherical cap centered at $\mathbf{y}$ with angle $\omega_n > \pi/2 - \theta_n$, then with high probability the intersection volume of this cap with the cap $A_{\mathbf{x}}(\mathbf{i})$ will be approximately lower bounded by $V_n$. This statement follows from the (unthinkable in low dimensions) fact that in high dimensions most of the volume of the shell is concentrated around the equator (any equator), and in particular the equator at angle $\pi/2$ from the pole of $A_{\mathbf{x}}(\mathbf{i})$. Therefore, as the dimension $nB$ gets large, for almost all $\mathbf{y}$’s, the intersection volume of the two spherical caps will be approximately given by $V_n$ (see Fig. 5), which can be shown to be
\[ V_n \doteq 2^{nB\left(\frac{1}{2}\log 2\pi e N (\sin^2\theta_n - \cos^2\omega_n)\right)}, \]
by using the volume formula for the intersection of two hyperspherical caps and characterizing the exponent of this volume (cf. Appendix B).$^7$

$^7$ Precisely speaking, Appendix B provides the intersection area of two hyperspherical caps on a sphere rather than the intersection volume of two caps on a shell. Note however that as the dimension grows the intersection volume and area, computed on the shell and on the sphere respectively, approach the same value in the exponent.

Fig. 5. Intersection of two spherical caps. [The caps $\mathrm{Cap}(\mathbf{z}_0, \theta_n)$ and $\mathrm{Cap}(\mathbf{y}_0, \omega_n)$ and their intersection on the shell of radius $\sqrt{nBN}$ centered at $\mathbf{x}$.]

One of the main technical steps in our proof is to show that the statement (22) holds for any arbitrary set $A_{\mathbf{x}}(\mathbf{i})$ with volume given in (21), not only when $A_{\mathbf{x}}(\mathbf{i})$ is a spherical cap as we assumed above. Note that this can be regarded as an extension of the classical isoperimetric inequality on the sphere, which states that among all sets on the sphere with a given volume, the spherical cap has the smallest boundary, or more generally the smallest volume of neighborhood. Another way to interpret the classical isoperimetric inequality is the following: given an arbitrary set $A$ on the sphere, if we take a random point on the same sphere and draw a ball of certain radius around it, the probability that this ball touches $A$ is at least as large as the same probability when $A$ is a spherical cap of the same volume.
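The concentration of volume around the equator invoked above is easy to observe numerically. The sketch below is a Monte Carlo illustration only and is not part of the proof: it samples points uniformly on a unit sphere by normalizing i.i.d. Gaussian vectors and reports the fraction whose angle from a fixed pole lies within `eps` of $\pi/2$. The function name, sample size, seed and choice of pole are assumptions made for the example.

```python
import numpy as np


def fraction_near_equator(dim, eps, samples=2000, seed=0):
    """Fraction of uniform points on the unit sphere in R^dim whose angle
    from the pole e_1 lies within eps of pi/2."""
    rng = np.random.default_rng(seed)
    # Normalizing i.i.d. Gaussian vectors gives points uniform on the sphere.
    g = rng.standard_normal((samples, dim))
    cos_angle = g[:, 0] / np.linalg.norm(g, axis=1)
    angle = np.arccos(cos_angle)
    return float(np.mean(np.abs(angle - np.pi / 2.0) <= eps))


if __name__ == "__main__":
    for dim in (3, 30, 300, 1000):
        print(dim, fraction_near_equator(dim, eps=0.2))
```

In three dimensions only about a fifth of the sphere lies in such a band of half-width 0.2 radians, whereas in dimension 1000 essentially every sampled point does.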
Proving that (22) holds for any set amounts to saying that if we take a random point on the sphere and draw a ball of given radius, with high probability the intersection of the ball with the set $A$ would be at least as large as the intersection we would get if $A$ were a spherical cap. Roughly speaking, it identifies the spherical cap as the extremal set, not only for minimizing the volume of its neighborhood as done by the classical isoperimetric inequality, but also as the extremal set when one is interested in minimizing the total intersection volume with $A$ at a given distance. We provide a more detailed discussion of this technical step in Section III.

Fig. 6. Euclidean ball contains the cap. [The Euclidean ball around $\mathbf{Y}$ contains $\mathrm{Cap}(\mathbf{Y}, \omega_n)$.]

Fig. 7. $\mathbf{x}$ sphere and $\mathbf{y}/\mathbf{z}$ sphere. [The $\mathbf{x}$ sphere of radius $\sqrt{nBP}$ and the $\mathbf{y}/\mathbf{z}$ sphere of radius $\sqrt{nB(P+N)}$, both centered at the origin, with $\mathbf{y}$ and $\mathbf{z}$ at distance $\sqrt{nBN}$ from $\mathbf{x}$.]

The above statement allows us to reach the following conclusion regarding the random vectors $(\mathbf{I}, \mathbf{X}, \mathbf{Y})$ with high probability: if we take $\mathbf{Y}$ and draw a Euclidean ball of radius
\[ \sqrt{nBN \, 4\sin^2\frac{\omega_n}{2}} \tag{23} \]
around it, then, since a Euclidean ball of this radius includes the spherical cap of angle $\omega_n$ in (22) (see Fig. 6), the volume of the intersection of the set $A(\mathbf{I})$ with this ball is lower bounded by
\[ \left| A(\mathbf{I}) \cap \mathrm{Ball}\left(\mathbf{Y}, \sqrt{nBN \, 4\sin^2\frac{\omega_n}{2}}\right) \right| \,\dot{\ge}\, 2^{nB\left(\frac{1}{2}\log 2\pi e N (\sin^2\theta_n - \cos^2\omega_n)\right)}, \tag{24} \]
where $A(\mathbf{I})$ is defined as $A(\mathbf{I}) = \{\mathbf{z} \in \mathbb{R}^{nB} : f(\mathbf{z}) = \mathbf{I}\}$ and $\mathrm{Ball}(\mathbf{c}, r)$ denotes a ball centered at $\mathbf{c}$ with radius $r$. This follows from (22), since (22) says that this property holds with high probability conditioned on any $\mathbf{x}$ which is typical with $(\mathbf{I}, \mathbf{Y})$.
In words, if we take a typical realization $(\mathbf{i}, \mathbf{y})$ of $(\mathbf{I}, \mathbf{Y})$ and draw a ball of radius (23) around $\mathbf{y}$, the volume of the set of points that are mapped to $\mathbf{i}$ in this ball is lower bounded as in (24). This puts an upper limit on the number of possible values of $\mathbf{i}$ given $\mathbf{y}$. To get a tighter bound, we can incorporate the fact that most of the $\mathbf{x}$’s lie on a thin shell of radius $\sqrt{nBP}$, and $\mathbf{y}$ and $\mathbf{z}$ lie on a thin shell of radius $\sqrt{nB(P+N)}$. See Fig. 7. Therefore the number of possible values for $\mathbf{I}$ given $\mathbf{Y}$ can be bounded by the ratio of the spherical cap volume
\[ \left| \mathrm{Shell}\left(\mathbf{0}, \sqrt{nB(P+N-\epsilon)}, \sqrt{nB(P+N+\epsilon)}\right) \cap \mathrm{Ball}\left(\sqrt{nB(P+N)}\,\mathbf{e}, \sqrt{nBN \, 4\sin^2\frac{\omega_n}{2}}\right) \right|, \]
where $\mathbf{e}$ is an arbitrary unit vector, to the volume each possible $\mathbf{i}$ occupies from this cap,
\[ 2^{nB\left(\frac{1}{2}\log 2\pi e N (\sin^2\theta_n - \cos^2\omega_n)\right)}. \]
See Figure 8. This ratio can be shown to be
\[ \dot{\le}\; 2^{nB\left[\frac{1}{2}\log\frac{4\sin^2\frac{\omega_n}{2}\left(P + N - N\sin^2\frac{\omega_n}{2}\right)}{(P+N)(\sin^2\theta_n - \cos^2\omega_n)}\right]}, \]
which in turn imposes the following bound on $H(I_n|Y^n)$:
\[ H(I_n|Y^n) \le n \cdot \frac{1}{2} \log\left( \frac{4\sin^2\frac{\omega_n}{2}\left(P + N - N\sin^2\frac{\omega_n}{2}\right)}{(P+N)(\sin^2\theta_n - \cos^2\omega_n)} \right). \]
The upper bound (6) in Lemma 2.1 follows by noting that the above argument holds for any $\omega_n > \pi/2 - \theta_n$.
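The exponent of the ratio above can be checked with a little trigonometry. The sketch below assumes that the per-dimension cap exponent $\frac{1}{2}\log(2\pi e\, r^2 \sin^2\phi)$, the form appearing in (21) for $r^2 = N$, also applies on the sphere of squared radius $P+N$ per dimension; this is an assumption of the example, since the text only says that the ratio "can be shown". Under it, the ball of radius (23) centered on the $\mathbf{y}/\mathbf{z}$ sphere cuts a cap of angle $\phi$ with chord $2\sqrt{P+N}\,\sin\frac{\phi}{2} = 2\sqrt{N}\sin\frac{\omega_n}{2}$, and the difference of exponents reproduces the closed form in the final bound. Function names are illustrative, and logarithms are base 2.

```python
import math


def cap_exponent(radius_sq, angle):
    """Per-dimension exponent (bits) of a cap of the given angle on a sphere
    with per-dimension squared radius radius_sq (assumed form, as in (21))."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * radius_sq * math.sin(angle) ** 2)


def ratio_exponent(theta, omega, P, N):
    """Exponent of (cap cut out by the ball) / (volume occupied per index i)."""
    # Chord relation: 2*sqrt(P+N)*sin(phi/2) = 2*sqrt(N)*sin(omega/2).
    t = N * math.sin(omega / 2.0) ** 2 / (P + N)  # equals sin^2(phi/2)
    phi = 2.0 * math.asin(math.sqrt(t))
    numerator = cap_exponent(P + N, phi)
    denominator = 0.5 * math.log2(
        2.0 * math.pi * math.e * N * (math.sin(theta) ** 2 - math.cos(omega) ** 2)
    )
    return numerator - denominator


def closed_form(theta, omega, P, N):
    """The bracketed exponent stated in the text."""
    s = math.sin(omega / 2.0) ** 2
    return 0.5 * math.log2(
        4.0 * s * (P + N - N * s)
        / ((P + N) * (math.sin(theta) ** 2 - math.cos(omega) ** 2))
    )
```

For any admissible pair with $\theta + \omega > \pi/2$ and $\omega \le \pi/2$, the two functions agree up to floating-point error, which matches the identity $(P+N)\sin^2\phi = 4N\sin^2\frac{\omega}{2}\,\frac{P+N-N\sin^2\frac{\omega}{2}}{P+N}$.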

