Polar Codes: Characterization of Exponent, Bounds, and Constructions

Satish Babu Korada, Eren Şaşoğlu and Rüdiger Urbanke

arXiv:0901.0536v2 [cs.IT] 26 Jan 2009

Abstract—Polar codes were recently introduced by Arıkan. They achieve the capacity of arbitrary symmetric binary-input discrete memoryless channels under a low complexity successive cancellation decoding strategy. The original polar code construction is closely related to the recursive construction of Reed-Muller codes and is based on the 2×2 matrix [1 0; 1 1]. It was shown by Arıkan and Telatar that this construction achieves an error exponent of 1/2, i.e., that for sufficiently large blocklengths the error probability decays exponentially in the square root of the length. It was already mentioned by Arıkan that in principle larger matrices can be used to construct polar codes. A fundamental question then is to see whether there exist matrices with exponent exceeding 1/2. We first show that any ℓ×ℓ matrix none of whose column permutations is upper triangular polarizes symmetric channels. We then characterize the exponent of a given square matrix and derive upper and lower bounds on achievable exponents. Using these bounds we show that there are no matrices of size less than 15 with exponents exceeding 1/2. Further, we give a general construction based on BCH codes which for large ℓ achieves exponents arbitrarily close to 1 and which exceeds 1/2 for size 16.

I. INTRODUCTION

Polar codes, introduced by Arıkan in [1], are the first provably capacity achieving codes for arbitrary symmetric binary-input discrete memoryless channels (B-DMCs) with low encoding and decoding complexity. The polar code construction is based on the following observation: Let

G_2 = [1 0; 1 1].  (1)

Apply the transform G_2^{⊗n} (where "⊗n" denotes the nth Kronecker power) to a block of N = 2^n bits and transmit the output through independent copies of a B-DMC W (see Figure 1). As n grows large, the channels seen by individual bits (suitably defined in [1]) start polarizing: they approach either a noiseless channel or a pure-noise channel, where the fraction of channels becoming noiseless is close to the symmetric mutual information I(W).

Fig. 1. The transform G^{⊗n} is applied and the resulting vector is transmitted through the channel W.

It was conjectured in [1] that polarization is a general phenomenon, and is not restricted to the particular transformation G_2^{⊗n}. In this paper we first give a partial affirmation to this conjecture. In particular, we consider transformations of the form G^{⊗n} where G is an ℓ×ℓ matrix for ℓ ≥ 3 and provide necessary and sufficient conditions for such Gs to polarize symmetric B-DMCs.

For the matrix G_2 it was shown by Arıkan and Telatar [2] that the block error probability for polar coding and successive cancellation decoding is O(2^{−2^{nβ}}) for any fixed β < 1/2, where 2^n is the blocklength. In this case we say that G_2 has exponent 1/2. We show that this exponent can be improved by considering larger matrices. In fact, the exponent can be made arbitrarily close to 1 by increasing the size of the matrix G.

Finally, we give an explicit construction of a family of matrices, derived from BCH codes, with exponent approaching 1 for large ℓ. This construction results in a matrix whose exponent exceeds 1/2 for ℓ = 16.

II. PRELIMINARIES

In this paper we deal exclusively with symmetric channels:

Definition 1: A binary-input discrete memoryless channel (B-DMC) W : {0,1} → Y is said to be symmetric if there exists a permutation π : Y → Y such that π = π^{−1} and W(y|0) = W(π(y)|1) for all y ∈ Y.

Let W : {0,1} → Y be a symmetric binary-input discrete memoryless channel (B-DMC). Let I(W) ∈ [0,1] denote the mutual information between the input and output of W with uniform distribution on the inputs. Also, let Z(W) ∈ [0,1] denote the Bhattacharyya parameter of W, i.e., Z(W) = Σ_{y∈Y} √(W(y|0) W(y|1)).

Fix an ℓ ≥ 3 and an ℓ×ℓ invertible matrix G with entries in {0,1}. Consider a random ℓ-vector U_1^ℓ that is uniformly distributed over {0,1}^ℓ. Let X_1^ℓ = U_1^ℓ G, where the multiplication is performed over GF(2). Also, let Y_1^ℓ be the output of ℓ uses of W with the input X_1^ℓ. The channel between U_1^ℓ and Y_1^ℓ is defined by the transition probabilities

W_ℓ(y_1^ℓ | u_1^ℓ) = ∏_{i=1}^{ℓ} W(y_i | x_i) = ∏_{i=1}^{ℓ} W(y_i | (u_1^ℓ G)_i).  (2)

Define W^{(i)} : {0,1} → Y^ℓ × {0,1}^{i−1} as the channel with input u_i, output (y_1^ℓ, u_1^{i−1}) and transition probabilities

W^{(i)}(y_1^ℓ, u_1^{i−1} | u_i) = (1/2^{ℓ−1}) Σ_{u_{i+1}^ℓ} W_ℓ(y_1^ℓ | u_1^ℓ),  (3)

and let Z^{(i)} denote its Bhattacharyya parameter, i.e.,

Z^{(i)} = Σ_{y_1^ℓ, u_1^{i−1}} √( W^{(i)}(y_1^ℓ, u_1^{i−1} | 0) W^{(i)}(y_1^ℓ, u_1^{i−1} | 1) ).

For k ≥ 1 let W^k : {0,1} → Y^k denote the B-DMC with transition probabilities

W^k(y_1^k | x) = ∏_{j=1}^{k} W(y_j | x).

Also let W̃^{(i)} : {0,1} → Y^ℓ denote the B-DMC with transition probabilities

W̃^{(i)}(y_1^ℓ | u_i) = (1/2^{ℓ−i}) Σ_{u_{i+1}^ℓ} W_ℓ(y_1^ℓ | 0_1^{i−1}, u_i^ℓ).  (4)

Observation 2: Since W is symmetric, the channels W^{(i)} and W̃^{(i)} are equivalent in the sense that for any fixed u_1^{i−1} there exists a permutation π_{u_1^{i−1}} : Y^ℓ → Y^ℓ such that

W^{(i)}(y_1^ℓ, u_1^{i−1} | u_i) = (1/2^{i−1}) W̃^{(i)}(π_{u_1^{i−1}}(y_1^ℓ) | u_i).
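For small ℓ, the synthesized channels W^{(i)} and their Bhattacharyya parameters can be evaluated by brute force directly from equations (2) and (3). The sketch below is our own illustration (not part of the paper); the channel is passed as a dictionary of transition probabilities. For G_2 and a binary symmetric channel it recovers the well-known relations Z^{(2)} = Z(W)^2 and Z(W) ≤ Z^{(1)} ≤ 2Z(W) − Z(W)^2.

```python
import itertools, math

def synthesized_Z(G, W, Y):
    """Bhattacharyya parameters Z^(i), i = 1..l, of the channels W^(i)
    defined by eqs. (2)-(3), computed by brute-force enumeration.
    W: dict mapping (y, x) -> W(y|x); Y: output alphabet of W."""
    l = len(G)

    def W_l(y, u):
        # eq. (2): x = u G over GF(2), then l independent uses of W
        x = [sum(u[k] * G[k][j] for k in range(l)) % 2 for j in range(l)]
        p = 1.0
        for j in range(l):
            p *= W[(y[j], x[j])]
        return p

    Zs = []
    for i in range(1, l + 1):
        Z = 0.0
        for y in itertools.product(Y, repeat=l):
            for u_past in itertools.product((0, 1), repeat=i - 1):
                # eq. (3): sum out the future bits u_{i+1}^l
                p = [sum(W_l(y, list(u_past) + [ui] + list(u_fut))
                         for u_fut in itertools.product((0, 1), repeat=l - i))
                     / 2 ** (l - 1) for ui in (0, 1)]
                Z += math.sqrt(p[0] * p[1])
        Zs.append(Z)
    return Zs
```

For the BSC with crossover probability 0.1 we have Z(W) = 2√(0.1·0.9) = 0.6, and with G = G_2 the function returns Z^{(2)} = 0.36 = Z(W)^2 exactly.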
Finally, let I^{(i)} denote the mutual information between the input and output of channel W^{(i)}. Since G is invertible, it is easy to check that

Σ_{i=1}^{ℓ} I^{(i)} = ℓ I(W).

We will use C to denote a linear code and d_min(C) to denote its minimum distance. We let ⟨g_1,...,g_k⟩ denote the linear code generated by the vectors g_1,...,g_k. We let d_H(a,b) denote the Hamming distance between binary vectors a and b. We also let d_H(a,C) denote the minimum distance between a vector a and a code C, i.e., d_H(a,C) = min_{c∈C} d_H(a,c).

III. POLARIZATION

We say that G is a polarizing matrix if there exists an i ∈ {1,...,ℓ} for which

W̃^{(i)}(y_1^ℓ | u_i) = Q(y_{A^c}) ∏_{j∈A} W(y_j | u_i)  (5)

for some A ⊆ {1,...,ℓ} with |A| = k, k ≥ 2, and a probability distribution Q : Y^{|A^c|} → [0,1].

In words, a matrix G is polarizing if there exists a bit which "sees" a channel whose k outputs are equivalent to those of k independent realizations of the underlying channel, whereas the remaining ℓ−k outputs are independent of the input to the channel. The reason to call such a G "polarizing" is that, as we will see shortly, a repeated application of such a transformation polarizes the underlying channel.

Recall that by assumption W is symmetric. Hence, by Observation 2, equation (5) implies

W^{(i)}(y_1^ℓ, u_1^{i−1} | u_i) = (Q(y_{A^c}) / 2^{i−1}) ∏_{j∈A} W((π_{u_1^{i−1}}(y_1^ℓ))_j | u_i),  (6)

an equivalence we will denote by W^{(i)} ≡ W^k. Note that W^{(i)} ≡ W^k implies I^{(i)} = I(W^k) and Z^{(i)} = Z(W^k).

We start by claiming that any invertible {0,1} matrix G can be written as a (real) sum G = P + P′, where P is a permutation matrix and P′ is a {0,1} matrix. To see this, consider a bipartite graph on 2ℓ nodes. The ℓ left nodes correspond to the rows of the matrix and the ℓ right nodes correspond to the columns of the matrix. Connect left node i to right node j if G_{ij} = 1. The invertibility of G implies that for every subset R of rows, the number of columns which contain non-zero elements in these rows is at least |R|. By Hall's Theorem [3, Theorem 16.4] this guarantees that there is a matching between the left and the right nodes of the graph, and this matching represents a permutation. Therefore, for any invertible matrix G, there exists a column permutation so that all diagonal elements of the permuted matrix are 1. Since the transition probabilities defining W^{(i)} are invariant (up to a permutation of the outputs y_1^ℓ) under column permutations on G, for the remainder of this section, and without loss of generality, we assume that G has 1s on its diagonal.

The following lemma gives necessary and sufficient conditions for (5) to be satisfied.

Lemma 3 (Channel Transformation for Polarizing Matrices): Let W be a symmetric B-DMC.
(i) If G is not upper triangular, then there exists an i for which W^{(i)} ≡ W^k for some k ≥ 2.
(ii) If G is upper triangular, then W^{(i)} ≡ W for all 1 ≤ i ≤ ℓ.

Proof: Let the number of 1s in the last row of G be k. Clearly W^{(ℓ)} ≡ W^k. If k ≥ 2 then G is not upper triangular and the first claim of the lemma holds. If k = 1 then

G_{ℓk} = 0 for all 1 ≤ k < ℓ.  (7)

One can then write

W^{(ℓ−i)}(y_1^ℓ, u_1^{ℓ−i−1} | u_{ℓ−i})
  = (1/2^{ℓ−1}) Σ_{u_{ℓ−i+1}^ℓ} W_ℓ(y_1^ℓ | u_1^ℓ)
  = (1/2^{ℓ−1}) Σ_{u_{ℓ−i+1}^{ℓ−1}, u_ℓ} Pr[Y_1^{ℓ−1} = y_1^{ℓ−1} | U_1^ℓ = u_1^ℓ] · Pr[Y_ℓ = y_ℓ | Y_1^{ℓ−1} = y_1^{ℓ−1}, U_1^ℓ = u_1^ℓ]
  = (1/2^{ℓ−1}) Σ_{u_{ℓ−i+1}^{ℓ−1}} W_{ℓ−1}(y_1^{ℓ−1} | u_1^{ℓ−1}) Σ_{u_ℓ} Pr[Y_ℓ = y_ℓ | Y_1^{ℓ−1} = y_1^{ℓ−1}, U_1^ℓ = u_1^ℓ]
  = (1/2^{ℓ−1}) [W(y_ℓ|0) + W(y_ℓ|1)] Σ_{u_{ℓ−i+1}^{ℓ−1}} W_{ℓ−1}(y_1^{ℓ−1} | u_1^{ℓ−1}),

where the third equality uses (7). Therefore, Y_ℓ is independent of the inputs of the channels W^{(i)} for i = 1,...,ℓ−1. This is equivalent to saying that the channels W^{(1)},...,W^{(ℓ−1)} are defined by the matrix G^{(ℓ−1)}, where we define G^{(ℓ−i)} as the (ℓ−i)×(ℓ−i) matrix obtained from G by removing its last i rows and columns. Applying the same argument to G^{(ℓ−1)} and repeating, we see that if G is upper triangular, then we have W^{(i)} ≡ W for all i. On the other hand, if G is not upper triangular, then there exists an i for which G^{(ℓ−i)} has at least two 1s in the last row. This in turn implies that W^{(i)} ≡ W^k for some k ≥ 2. ∎

Consider the recursive channel combining operation given in [1], using a transformation G. Recall that n recursions of this construction is equivalent to applying the transformation A_n G^{⊗n} to U_1^{ℓ^n}, where A_n : {1,...,ℓ^n} → {1,...,ℓ^n} is a permutation defined analogously to the bit-reversal operation in [1].
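For concreteness, the Kronecker power G^{⊗n} used in the recursive construction can be computed in a few lines. The sketch below is our own illustration (pure Python; the bit-reversal-type permutation A_n is omitted, since it only reorders the rows):

```python
def kron(A, B):
    """Kronecker product of two 0/1 matrices; entries stay in {0,1}."""
    return [[A[i][j] * B[k][m] for j in range(len(A[0])) for m in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def kron_power(G, n):
    """G^{(tensor) n}: the n-fold Kronecker power of G (n = 0 gives [1])."""
    R = [[1]]
    for _ in range(n):
        R = kron(R, G)
    return R
```

For example, kron_power([[1, 0], [1, 1]], 2) returns the 4×4 matrix [[1,0,0,0],[1,1,0,0],[1,0,1,0],[1,1,1,1]].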
Theorem 4 (Polarization of Symmetric B-DMCs): Given a symmetric B-DMC W and an ℓ×ℓ transformation G, consider the channels W^{(i)}, i ∈ {1,...,ℓ^n}, defined by the transformation A_n G^{⊗n}.
(i) If G is polarizing, then for any δ > 0

lim_{n→∞} |{i ∈ {1,...,ℓ^n} : I(W^{(i)}) ∈ (δ, 1−δ)}| / ℓ^n = 0,  (8)
lim_{n→∞} |{i ∈ {1,...,ℓ^n} : Z(W^{(i)}) ∈ (δ, 1−δ)}| / ℓ^n = 0.  (9)

(ii) If G is not polarizing, then for all n and i ∈ {1,...,ℓ^n}

I(W^{(i)}) = I(W),  Z(W^{(i)}) = Z(W).

In [1, Section 6], Arıkan proves part (i) of Theorem 4 for G = G_2. His proof involves defining a random variable W_n that is uniformly distributed over the set {W^{(i)}}_{i=1}^{ℓ^n} (where ℓ = 2 for the case G = G_2), which implies

Pr[I(W_n) ∈ (a,b)] = |{i ∈ {1,...,ℓ^n} : I(W^{(i)}) ∈ (a,b)}| / ℓ^n,  (10)
Pr[Z(W_n) ∈ (a,b)] = |{i ∈ {1,...,ℓ^n} : Z(W^{(i)}) ∈ (a,b)}| / ℓ^n.  (11)

Following Arıkan, we define the random variables {W^{(i)}}_{i=1}^{ℓ^n} for our purpose through a tree process {W_n ; n ≥ 0} with

W_0 = W,
W_{n+1} = W_n^{(B_{n+1})},

where {B_n ; n ≥ 1} is a sequence of i.i.d. random variables defined on a probability space (Ω, F, µ), and where B_n is uniformly distributed over the set {1,...,ℓ}. Defining F_0 = {∅, Ω} and F_n = σ(B_1,...,B_n) for n ≥ 1, we augment the above process by the processes {I_n ; n ≥ 0} := {I(W_n) ; n ≥ 0} and {Z_n ; n ≥ 0} := {Z(W_n) ; n ≥ 0}. It is easy to verify that these processes satisfy (10) and (11).

Observation 5: {(I_n, F_n)} is a bounded martingale and therefore converges w.p. 1 and in L^1 to a random variable I_∞.

Lemma 6 (I_∞): If G is polarizing, then

I_∞ = 1 w.p. I(W),  I_∞ = 0 w.p. 1 − I(W).

Proof: For any polarizing transformation G, Lemma 3 implies that there exists an i ∈ {1,...,ℓ} and a k ≥ 2 for which

I^{(i)} = I(W^k).  (12)

This implies that for the tree process defined above, we have

I_{n+1} = I(W_n^k) with probability at least 1/ℓ,

for some k ≥ 2. Moreover, by the convergence in L^1 of I_n, we have E[|I_{n+1} − I_n|] → 0. This in turn implies

E[|I_{n+1} − I_n|] ≥ (1/ℓ) E[I(W_n^k) − I(W_n)] → 0.  (13)

It is shown in Lemma 33 in the Appendix that for any symmetric B-DMC W_n, if I(W_n) ∈ (δ, 1−δ) for some δ > 0, then there exists an η(δ) > 0 such that I(W_n^k) − I(W_n) > η(δ). Therefore, convergence in (13) implies I_∞ ∈ {0,1} w.p. 1. The claim on the probability distribution of I_∞ follows from the fact that {I_n} is a martingale, i.e., E[I_∞] = E[I_0] = I(W). ∎

Proof of Theorem 4: Note that for any n the fraction in (8) is equal to Pr[I_n ∈ (δ, 1−δ)]. Combined with Lemma 6, this implies (8). For any B-DMC Q, I(Q) and Z(Q) satisfy [1]

I(Q)^2 + Z(Q)^2 ≤ 1,
I(Q) + Z(Q) ≥ 1.

When I(Q) takes on the value 0 or 1, these two inequalities imply that Z(Q) takes on the value 1 or 0, respectively. From Lemma 6 we know that I_n converges to I_∞ w.p. 1 and I_∞ ∈ {0,1}. This implies that Z_n converges w.p. 1 to a random variable Z_∞ and

Z_∞ = 0 w.p. I(W),  Z_∞ = 1 w.p. 1 − I(W).

This proves the first part of the theorem. The second part follows from Lemma 3, (ii). ∎

Remark 7: Arıkan's proof for part (i) of Theorem 4 with G = G_2 proceeds by first showing the convergence of {Z_n}, instead of {I_n}. There this is accomplished by showing that for the matrix G_2 the resulting process {Z_n} is a supermartingale. Such a property is in general difficult to prove for arbitrary G. On the other hand, the process {I_n} is a martingale for any invertible matrix G, which is sufficient to ensure convergence.

Theorem 4 guarantees that repeated application of a polarizing matrix G polarizes the underlying channel W, i.e., the resulting channels W^{(i)}, i ∈ {1,...,ℓ^n}, tend towards either a noiseless or a completely noisy channel. Lemma 6 ensures that the fraction of noiseless channels is indeed I(W). This suggests to use the noiseless channels for transmitting information while transmitting no information over the noisy channels [1]. Let A ⊂ {1,...,ℓ^n} denote the set of channels W^{(i)} used for transmitting the information bits. Since Z^{(i)} upper bounds the error probability of decoding bit U_i with the knowledge of U_1^{i−1}, the block error probability of such a transmission scheme under successive cancellation decoding can be upper bounded as [1]

P_B ≤ Σ_{i∈A} Z^{(i)}.  (14)
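The polarization dynamics are easy to check numerically for the binary erasure channel, where the process {Z_n} under G_2 has a closed form: each step sends z to 2z − z^2 or z^2 with equal probability, and these recursions are exact for the BEC. The sketch below is our own illustration; it tracks all 2^n synthesized erasure probabilities deterministically, so both the martingale property and the drift of mass towards {0,1} can be observed.

```python
def bec_polarize(eps, n):
    """Exact erasure probabilities of the 2^n channels synthesized from
    BEC(eps) by n recursions of G_2 (Z^- = 2z - z^2, Z^+ = z^2)."""
    zs = [eps]
    for _ in range(n):
        zs = [w for z in zs for w in (2 * z - z * z, z * z)]
    return zs

zs = bec_polarize(0.5, 10)
mean = sum(zs) / len(zs)                              # martingale: stays at eps
mid = sum(0.05 < z < 0.95 for z in zs) / len(zs)      # unpolarized fraction
```

The mean stays at eps for every n, while the fraction of channels with Z in (δ, 1−δ) shrinks as n grows, as Theorem 4 asserts.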
Further, the block error probability can also be lower bounded in terms of the Z^{(i)}s: Consider a symmetric B-DMC with Bhattacharyya parameter Z, and let P_e denote the bit error probability of uncoded transmission over this channel. It is known that

P_e ≥ (1/2)(1 − √(1 − Z^2)).

A proof of this fact is provided in the Appendix. Under successive cancellation decoding, the block error probability is lower bounded by each of the bit error probabilities over the channels W^{(i)}. Therefore the former quantity can be lower bounded by

P_B ≥ max_{i∈A} (1/2)(1 − √(1 − (Z^{(i)})^2)).  (15)

Both the above upper and lower bounds to the block error probability look somewhat loose at a first look. However, as we shall see later, these bounds are sufficiently tight for our purposes. Therefore, it suffices to analyze the behavior of the Z^{(i)}s.

IV. RATE OF POLARIZATION

For the matrix G_2 Arıkan shows that, combined with successive cancellation decoding, these codes achieve a vanishing block error probability for any rate strictly less than I(W). Moreover, it is shown in [2] that when Z_n approaches 0 it does so at a sufficiently fast rate:

Theorem 8 ([2]): Given a B-DMC W, the matrix G_2 and any β < 1/2,

lim_{n→∞} Pr[Z_n ≤ 2^{−2^{nβ}}] = I(W).

A similar result for arbitrary G is given in the following theorem.

Theorem 9 (Universal Bound on Rate of Polarization): Given a symmetric B-DMC W, an ℓ×ℓ polarizing matrix G, and any β < (log_ℓ 2)/ℓ,

lim_{n→∞} Pr[Z_n ≤ 2^{−ℓ^{nβ}}] = I(W).

Proof Idea: For any polarizing matrix it can be shown that Z_{n+1} ≤ ℓ Z_n with probability 1 and that Z_{n+1} ≤ Z_n^2 with probability at least 1/ℓ. The proof then follows by adapting the proof of [2, Theorem 3]. ∎

The above estimation of the probability is universal and is independent of the exact structure of G. We are now interested in a more precise estimate of this probability. The results in this section are the natural generalization of those in [2].

Definition 10 (Rate of Polarization): For any B-DMC W with 0 < I(W) < 1, we will say that an ℓ×ℓ matrix G has rate of polarization E(G) if
(i) For any fixed β < E(G),

liminf_{n→∞} Pr[Z_n ≤ 2^{−ℓ^{nβ}}] = I(W).

(ii) For any fixed β > E(G),

liminf_{n→∞} Pr[Z_n ≥ 2^{−ℓ^{nβ}}] = 1.

For convenience, in the rest of the paper we refer to E(G) as the exponent of the matrix G.

The definition of exponent provides a meaningful performance measure of polar codes under successive cancellation decoding. This can be seen as follows: Consider a matrix G with exponent E(G). Fix 0 < R < I(W) and β < E(G). Definition 10 (i) implies that for n sufficiently large there exists a set A of size ℓ^n R such that Z^{(i)} ≤ 2^{−ℓ^{nβ}} for i ∈ A. Using A as the set of information bits, the block error probability under successive cancellation decoding P_B can be bounded using (14) as

P_B ≤ 2^{−ℓ^{nβ}}.

Conversely, consider R > 0 and β > E(G). Definition 10 (ii) implies that for n sufficiently large, any set A of size ℓ^n R will satisfy max_{i∈A} Z^{(i)} > 2^{−ℓ^{nβ}}. Using (15) the block error probability can be lower bounded as

P_B ≥ 2^{−ℓ^{nβ}}.

It turns out, and it will be shown later, that the exponent is independent of the channel W. Indeed, we will show in Theorem 14 that the exponent E(G) can be expressed as a function of the partial distances of G.

Definition 11 (Partial Distances): Given an ℓ×ℓ matrix G = [g_1^T,...,g_ℓ^T]^T, we define the partial distances D_i, i = 1,...,ℓ, as

D_i := d_H(g_i, ⟨g_{i+1},...,g_ℓ⟩),  i = 1,...,ℓ−1,
D_ℓ := d_H(g_ℓ, 0).

Example 12: The partial distances of the matrix

F = [1 0 0; 1 0 1; 1 1 1]

are D_1 = 1, D_2 = 1, D_3 = 3.

In order to establish the relationship between E(G) and the partial distances of G we consider the Bhattacharyya parameters Z^{(i)} of the channels W^{(i)}. These parameters depend on G as well as on W. The exact relationship with respect to W is difficult to compute in general. However, there are sufficiently tight upper and lower bounds on the Z^{(i)}s in terms of Z(W), the Bhattacharyya parameter of W.

Lemma 13 (Bhattacharyya Parameter and Partial Distance): For any symmetric B-DMC W and any ℓ×ℓ matrix G with partial distances {D_i}_{i=1}^{ℓ},

Z(W)^{D_i} ≤ Z^{(i)} ≤ 2^{ℓ−i} Z(W)^{D_i}.  (16)

Proof: To prove the upper bound we write

Z^{(i)} = Σ_{y_1^ℓ, u_1^{i−1}} √( W^{(i)}(y_1^ℓ, u_1^{i−1} | 0) W^{(i)}(y_1^ℓ, u_1^{i−1} | 1) )
  = (1/2^{ℓ−1}) Σ_{y_1^ℓ, u_1^{i−1}} √( Σ_{v_{i+1}^ℓ} W_ℓ(y_1^ℓ | u_1^{i−1}, 0, v_{i+1}^ℓ) · Σ_{w_{i+1}^ℓ} W_ℓ(y_1^ℓ | u_1^{i−1}, 1, w_{i+1}^ℓ) )
  ≤ (1/2^{ℓ−1}) Σ_{y_1^ℓ, u_1^{i−1}} Σ_{v_{i+1}^ℓ, w_{i+1}^ℓ} √( W_ℓ(y_1^ℓ | u_1^{i−1}, 0, v_{i+1}^ℓ) W_ℓ(y_1^ℓ | u_1^{i−1}, 1, w_{i+1}^ℓ) ),  (17)

where the first equality uses (3). Let c_0 = (u_1^{i−1}, 0, v_{i+1}^ℓ) G and c_1 = (u_1^{i−1}, 1, w_{i+1}^ℓ) G. Let S_0 (S_1) be the set of indices where both c_0 and c_1 are equal to 0 (1). Let S^c be the complement of S_0 ∪ S_1. We have

|S^c| = d_H(c_0, c_1) ≥ D_i.

Now, (17) can be rewritten as

Z^{(i)} ≤ (1/2^{ℓ−1}) Σ_{v_{i+1}^ℓ, w_{i+1}^ℓ} Σ_{y_1^ℓ, u_1^{i−1}} ∏_{j∈S_0} W(y_j|0) ∏_{j∈S_1} W(y_j|1) ∏_{j∈S^c} √(W(y_j|0) W(y_j|1))
  ≤ (1/2^{ℓ−1}) Σ_{v_{i+1}^ℓ, w_{i+1}^ℓ, u_1^{i−1}} Z^{D_i}
  = 2^{ℓ−i} Z^{D_i}.

For the lower bound on Z^{(i)}, first note that by Observation 2 we have Z(W^{(i)}) = Z(W̃^{(i)}). Therefore it suffices to show the claim for the channel W̃^{(i)}. Let G = [g_1^T,...,g_ℓ^T]^T. Then using (2), (3) and (4), W̃^{(i)} can be written as

W̃^{(i)}(y_1^ℓ | u_i) = (1/2^{ℓ−i}) Σ_{x_1^ℓ ∈ A(u_i)} ∏_{k=1}^{ℓ} W(y_k | x_k),  (18)

where x_1^ℓ ∈ A(u_i) ⊂ {0,1}^ℓ if and only if for some u_{i+1}^ℓ ∈ {0,1}^{ℓ−i}

x_1^ℓ = u_i g_i + Σ_{j=i+1}^{ℓ} u_j g_j.  (19)

Consider the code ⟨g_{i+1},...,g_ℓ⟩ and let Σ_{j=i+1}^{ℓ} α_j g_j be a codeword satisfying d_H(g_i, Σ_{j=i+1}^{ℓ} α_j g_j) = D_i. Due to the linearity of the code ⟨g_{i+1},...,g_ℓ⟩, one can equivalently say that x_1^ℓ ∈ A(u_i) if and only if

x_1^ℓ = u_i ( g_i + Σ_{j=i+1}^{ℓ} α_j g_j ) + Σ_{j=i+1}^{ℓ} u_j g_j.  (20)

Now let g′_i = g_i + Σ_{j=i+1}^{ℓ} α_j g_j and G′ = [g_1^T,...,g_{i−1}^T, g′_i^T, g_{i+1}^T,...,g_ℓ^T]^T. Equations (19) and (20) show that the channels W̃^{(i)} defined by the matrices G and G′ are equivalent. Note that G′ has the property that the Hamming weight of g′_i is equal to D_i.

We will now consider a channel W_g^{(i)} where a genie provides extra information to the decoder. Since W̃^{(i)} is degraded with respect to the genie-aided channel W_g^{(i)}, and since the ordering of the Bhattacharyya parameter is preserved under degradation, it suffices to find a genie-aided channel for which Z_g^{(i)} = Z(W)^{D_i}.

Consider a genie which reveals the bits u_{i+1}^ℓ to the decoder (Figure 2). With the knowledge of u_{i+1}^ℓ the decoder's task reduces to finding the value of any of the transmitted bits x_j for which g′_{ij} = 1. Since each bit x_j goes through an independent copy of W, and since the weight of g′_i is equal to D_i, the resulting channel W_g^{(i)} is equivalent to D_i independent copies of W. Hence, Z_g^{(i)} = Z(W)^{D_i}. ∎

Fig. 2. Genie-aided channel W_g^{(i)}.

Lemma 13 shows that the link between Z^{(i)} and Z(W) is given in terms of the partial distances of G. This link is sufficiently strong to completely characterize E(G).

Theorem 14 (Exponent from Partial Distances): For any symmetric B-DMC W and any ℓ×ℓ matrix G with partial distances {D_i}_{i=1}^{ℓ}, the rate of polarization E(G) is given by

E(G) = (1/ℓ) Σ_{i=1}^{ℓ} log_ℓ D_i.  (21)

Proof: The proof is similar to that of [2, Theorem 3]. We highlight the main idea and omit the details.

First note that by Lemma 13 we have Z_j ≥ Z_{j−1}^{D_{B_j}}. Let m_i := |{1 ≤ j ≤ n : B_j = i}|. We then obtain

Z_n ≥ Z^{∏_i D_i^{m_i}} = Z^{ℓ^{(Σ_i m_i log_ℓ D_i)}}.  (22)

The exponent of Z on the right-hand side of (22) can be rewritten as

Σ_i m_i log_ℓ D_i = n Σ_i (m_i/n) log_ℓ D_i.

By the law of large numbers, for any ε > 0,

|m_i/n − 1/ℓ| ≤ ε

with high probability for n sufficiently large. This proves part (ii) of the definition of E(G), i.e., for any β > (1/ℓ) Σ_i log_ℓ D_i,

lim_{n→∞} Pr[Z_n ≥ 2^{−ℓ^{nβ}}] = 1.

The proof for part (i) of the definition follows using similar arguments as above, and by noting that Z_j ≤ 2^{ℓ−B_j} Z_{j−1}^{D_{B_j}}. The constant 2^{ℓ−B_j} can be taken care of using the 'bootstrapping' argument of [2]. ∎

Example 15: For the matrix F considered in Example 12, we have

E(F) = (1/3)(log_3 1 + log_3 1 + log_3 3) = 1/3.

V. BOUNDS ON THE EXPONENT

For the matrix G_2, we have E(G_2) = 1/2.
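Definition 11 and formula (21) are straightforward to evaluate by brute force for small matrices. The sketch below is our own illustration (exponential in ℓ, so only for small sizes): it enumerates the subcode ⟨g_{i+1},...,g_ℓ⟩ to compute each D_i and then applies (21). It reproduces D = (1,1,3) and E(F) = 1/3 from Examples 12 and 15, as well as E(G_2) = 1/2.

```python
import itertools, math

def partial_distances(G):
    """D_i = d_H(g_i, <g_{i+1},...,g_l>) per Definition 11 (D_l = w(g_l))."""
    l = len(G)
    D = []
    for i in range(l):
        best = l + 1
        # enumerate all codewords of the subcode spanned by rows below row i
        for coeffs in itertools.product((0, 1), repeat=l - i - 1):
            c = [0] * l
            for a, row in zip(coeffs, G[i + 1:]):
                if a:
                    c = [(x + y) % 2 for x, y in zip(c, row)]
            best = min(best, sum(a != b for a, b in zip(G[i], c)))
        D.append(best)
    return D

def exponent(G):
    """E(G) = (1/l) * sum_i log_l D_i, per Theorem 14, eq. (21)."""
    l = len(G)
    return sum(math.log(d, l) for d in partial_distances(G)) / l
```

For invertible G every D_i is at least 1, so the logarithms are always defined.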
Note that for the 2 2 2 x for which g = 1. Since each bit x goes through an case of 2 2 matrices, the only polarizing matrix is G . In j ij j 2 × 6 ordertoaddressthequestionofwhethertherateofpolarization where the last equality follows from can be improved by considering large matrices, we define min d (g ,c+g ) min d (g ,c) H k k+1 H k E , max E(G). (23) c∈hgk+2,...,gℓi ≥c∈hgk+1,gk+2,...,gℓi ℓ G∈{0,1}ℓ×ℓ =Dk >Dk+1. eTxhperoersesmion14fofracEil(iGta)tesinthteercmosmpouftathtieonpaorftiEalℓ bdyistparnocveisdionfgGan. TThheeresefocoren,dDck′laDimk′+f1o≥lloDwskDfrok+m1,thwehiincehqpuraolviteysDth′efirstcDlaim>. k+1 ≥ k Lemmas 16 and 18 below provide further simplification for D =D′. k+1 k computing (23). Corollary 19: InthedefinitionofE (23),themaximization ℓ Lemma 16 (Gilbert-Varshamov Inequality for Linear Codes): can be restricted to the matrices G which satisfy D D 1 2 Let C be a binary linear code of length ℓ and dmin(C)=d . ... D . ≤ ≤ 1 ℓ Let g 0,1 ℓ and let d (g,C) = d . Let C′ be the linear ≤ H 2 code o∈bta{ined}by adding the vector g to C, i.e., C′ = g,C . A. Lower Bound Then dmin(C′)=min d ,d . h i Proof: SinceC′ is{a1line2a}rcode,itscodewordsareofthe ThefollowinglemmaprovidesalowerboundonEℓbyusing form c+αg where c C,α 0,1 . Therefore a Gilbert-Varshamov type construction. ∈ ∈{ } Lemma 20 (Gilbert-Varshamov Bound): dmin(C′)=mc∈iCn{min{dH(0,c),dH(0,c+g)}} E 1 ℓ log D˜ =min min dH(0,c) ,min dH(g,c) ℓ ≥ ℓ ℓ i {c∈C{ } c∈C{ }} i=1 X =min d1,d2 . where { } D−1 ℓ D˜ =max D : <2i . (24) Corollary 17: Givenasetofvectorsg1,...,gk withpartial i  j  distances Dj = dH(gj,hgj+1,...,gki), j = 1,...,k, the Proof: We will constructXj=a0m(cid:18)atr(cid:19)ix G =[gT,...,gT]T, mmiinniℓmumDdis.tance of the linear code hg1,...,gki is given by with partial distances Di=D˜i: Let S(c,d) deno1te the seℓt of j=1{ j} binary vectors with Hamming distance at most d from c Themaximizationproblemin(23)isnotfeasibleinpractice ∈ 0,1 ℓ, i.e., even for ℓ 10. 
The following lemma allows us to restrict this maximization to a smaller set of matrices. Even though the maximization problem still remains intractable, by working on this restricted set we obtain lower and upper bounds on E_ℓ.

Lemma 18 (Partial Distances Should Decrease): Let G = [g_1^T ... g_ℓ^T]^T. Fix k ∈ {1,...,ℓ−1} and let G′ = [g_1^T ... g_{k+1}^T g_k^T ... g_ℓ^T]^T be the matrix obtained from G by swapping g_k and g_{k+1}. Let {D_i}_{i=1}^{ℓ} and {D′_i}_{i=1}^{ℓ} denote the partial distances of G and G′ respectively. If D_k > D_{k+1}, then
(i) E(G′) ≥ E(G),
(ii) D′_{k+1} > D′_k.

Proof: Note first that D_i = D′_i if i ∉ {k, k+1}. Therefore, to prove the first claim, it suffices to show that D′_k D′_{k+1} ≥ D_k D_{k+1}. To that end, write

D′_k = d_H(g_{k+1}, ⟨g_k, g_{k+2},...,g_ℓ⟩),
D_k = d_H(g_k, ⟨g_{k+1},...,g_ℓ⟩),
D′_{k+1} = d_H(g_k, ⟨g_{k+2},...,g_ℓ⟩),
D_{k+1} = d_H(g_{k+1}, ⟨g_{k+2},...,g_ℓ⟩),

and observe that D′_{k+1} ≥ D_k since ⟨g_{k+2},...,g_ℓ⟩ is a subcode of ⟨g_{k+1},...,g_ℓ⟩. D′_k can be computed as

D′_k = min{ min_{c∈⟨g_{k+2},...,g_ℓ⟩} d_H(g_{k+1}, c), min_{c∈⟨g_{k+2},...,g_ℓ⟩} d_H(g_{k+1}, c + g_k) }
  = min{ D_{k+1}, min_{c∈⟨g_{k+2},...,g_ℓ⟩} d_H(g_k, c + g_{k+1}) }
  = D_{k+1},

where the last equality follows from

min_{c∈⟨g_{k+2},...,g_ℓ⟩} d_H(g_k, c + g_{k+1}) ≥ min_{c∈⟨g_{k+1},g_{k+2},...,g_ℓ⟩} d_H(g_k, c) = D_k > D_{k+1}.

Therefore, D′_k D′_{k+1} ≥ D_k D_{k+1}, which proves the first claim. The second claim follows from the inequality D′_{k+1} ≥ D_k > D_{k+1} = D′_k. ∎

Corollary 19: In the definition of E_ℓ (23), the maximization can be restricted to the matrices G which satisfy D_1 ≤ D_2 ≤ ... ≤ D_ℓ.

A. Lower Bound

The following lemma provides a lower bound on E_ℓ by using a Gilbert-Varshamov type construction.

Lemma 20 (Gilbert-Varshamov Bound):

E_ℓ ≥ (1/ℓ) Σ_{i=1}^{ℓ} log_ℓ D̃_i,

where

D̃_i = max{ D : Σ_{j=0}^{D−1} (ℓ choose j) < 2^i }.  (24)

Proof: We will construct a matrix G = [g_1^T,...,g_ℓ^T]^T with partial distances D_i = D̃_i. Let S(c,d) denote the set of binary vectors with Hamming distance at most d from c ∈ {0,1}^ℓ, i.e.,

S(c,d) = { x ∈ {0,1}^ℓ : d_H(x, c) ≤ d }.

To construct the ith row of G with partial distance D̃_i, we want to find a v ∈ {0,1}^ℓ satisfying d_H(v, ⟨g_{i+1},...,g_ℓ⟩) = D̃_i and set g_i = v. Such a v satisfies v ∉ S(c, D̃_i − 1) for all c ∈ ⟨g_{i+1},...,g_ℓ⟩ and exists if the sets S(c, D̃_i − 1), c ∈ ⟨g_{i+1},...,g_ℓ⟩, do not cover {0,1}^ℓ. The latter condition is satisfied if

|∪_{c∈⟨g_{i+1},...,g_ℓ⟩} S(c, D̃_i − 1)| ≤ Σ_{c∈⟨g_{i+1},...,g_ℓ⟩} |S(c, D̃_i − 1)| = 2^{ℓ−i} Σ_{j=0}^{D̃_i−1} (ℓ choose j) < 2^ℓ,

which is guaranteed by (24). ∎

The solid line in Figure 3 shows the lower bound of Lemma 20. The bound exceeds 1/2 for ℓ = 85, suggesting that the exponent can be improved by considering large matrices. In fact, the lower bound tends to 1 when ℓ tends to infinity:

Lemma 21 (Exponent 1 is Achievable): lim_{ℓ→∞} E_ℓ = 1.

Proof: Fix α ∈ (0, 1/2). Let {D̃_j} be defined as in Lemma 20. It is known from the asymptotics of the Gilbert-Varshamov bound that D̃_{⌈αℓ⌉} in (24) satisfies D̃_{⌈αℓ⌉} ≥ ℓ(h^{−1}(α) − o(1)) as ℓ → ∞, where h(·) is the binary entropy function. Therefore, there exists an ℓ_0(α) < ∞ such that for all ℓ ≥ ℓ_0(α) we have D̃_{⌈αℓ⌉} ≥ (1/2) ℓ h^{−1}(α). Hence, for ℓ ≥ ℓ_0(α) we can write

E_ℓ ≥ (1/ℓ) Σ_{i=⌈αℓ⌉}^{ℓ} log_ℓ D̃_i
Upper Bound P jSi D =t +s , (28) Corollary 19 says that for any ℓ, there exists a matrix with i i i oDb1ta≤in·u·p·p≤erDboℓutnhdastoanchEiev,eist sthueffiecxepsotnoebnotuEnℓd. Tthheereexfpooren,etnot where si , dH(giSi,hg(i+1)Si,...,gℓSii). By a similar rea- ℓ soning as in the proof of Lemma 18, it can be shown that achievable by this restricted class of matrices. The partial there exists a matrix G with distances of these matrices can be bounded easily as shown in the following lemma. si ≤dH(gjSi,hg(j+1)Si,...,gℓSii) ∀i<j, Lemma 22 (Upper Bound on Exponent): Let d(n,k) de- and note the largest possible minimum distance of a binary code E(G)=E . of length n and dimension k. Then, ℓ Therefore,for such a matrix G, we have (cf. proofof Lemma ℓ E 1 log d(ℓ,ℓ i+1). 22) ℓ ≤ ℓ ℓ − Proof: Let G beXia=n1ℓ ℓ matrix with partial distances si ≤d(|Si|,ℓ−i+1). (29) {Di}ℓi=1 such that E(G) =×Eℓ. Corollary 19 lets us assume Using the structure of the set Si, we can bound si further: twhietrheofuotreloosbstaoinf generality that Di ≤ Di+1 for all i. We LePmrmoaof:25 (BWouendwonillSufib-nddistaancelisn):easri ≤co⌊m|Sb2ii|n⌋a.tion of g ,...,g whose Hamming distance to g is at Di =mj≥ini Dj =dmin(hgi,...,gℓi)≤d(ℓ,ℓ−i+1), m{o(si+t1⌊)S|S2ii|⌋. ToℓSthi}is end define w = ℓj=i+1αjgjSiiS,iwhere where the second equality follows from Corollary 17. αj ∈ {0,1}. Also define wk = Pkj=i+1αjgjSi. Noting Lemma22allowsustouseexistingboundsontheminimum that the sets Tjs are disjoint with ∪ℓjP=i+1Tj = Si, we have distances of binary codes to bound E : d (g ,w)= ℓ d (g ,w ). ℓ H iSi j=i+1 H iTj Tj P 8 We now claim that choosing the α s in the order Example 27 (Chords for m=5): For m = 5 the list of j α ,...,α by chords is given by i+1 ℓ argmin d (g ,w +α g ), (30) = 0 , 1,2,4,8,16 , 3,6,12,17,24 , αj∈{0,1} H iTj j−1Tj j jTj C {{ } { } { } 5,9,10,18,20 , 7,14,19,25,28 , we obtain d (g ,w) |Si| . To see this, note that { } { } H iSi ≤ ⌊ 2 ⌋ 11,13,21,22,26 , 15,23,27,29,30 . 
by definition of the sets Tj we have wTj = wjTj. Also { } { }} observe that by the rule (30) for choosing αj, we have (cid:4) dH(giTj,wjTj)≤⌊|T2j|⌋. Thus, Let C denote the number of chords and assume that the chords are ordered according to their smallest element as in ℓ Example 27. Let µ(i) denote the minimal element of chord d (g ,w)= d (g ,w ) H iSi H iTj Tj i, 1 i C and let l(i) denote the number of elements in j=Xi+1 chord≤i. N≤ote that by this convention µ(i) is increasing. It is ℓ well known that 1 l(i) m and that l(i) must divide m. = d (g ,w ) H iTj jTj Example 28 (Cho≤rds fo≤r m=5): In Example 27 we have j=i+1 X C = 7, l(1) = 1, l(2) = = l(7) = 5 = m, µ(1) = 0, ℓ T S ··· j i µ(2)=1, µ(3)=3, µ(4)=5, µ(5)=7, µ(6)=11, µ(7)= | | | | . ≤ 2 ≤ 2 15. (cid:4) j=i+1(cid:22) (cid:23) (cid:22) (cid:23) X ConsideraBCHcodeoflengthℓanddimension C l(j)for j=k somek 1,...,C .Itiswell-knownthatthiscodehasmini- Combining (28), (29) and Lemma 25, and noting that the ∈{ } P mumdistanceatleastµ(k)+1.Further,thegeneratormatrixof invertibility of G implies t =ℓ, we obtain the following: i this code is obtained by concatenating the generator matrices Lemma 26 (Improved UPpper Bound): oftwo BCH codesofrespectivedimensions C l(j)and j=k+1 1 ℓ l(k). This being true for all k 1,...,C , it is easy to Eℓ ≤Pℓim=1atxi=ℓ ℓ i=1logℓ(ti+si) sBeCeHthcaotdthee,wgehnicehrawtoirllmbaettrhixeobfastih∈seoℓf{doiumrecnosniPso}tnruaclt(iio.en.,,hraasteth1e) X where propertythatitslast C l(j)rowsformthegeneratormatrix j=k ℓ ℓ ofaBCH codewith minimumdistanceatleastµ(k)+1.This 1 P s =min t ,d t ,ℓ i+1 . translates to the following lower bound on partial distances i j j 2 − The bound givenn(cid:4)in jth=Xei+a1bov(cid:5)e le(cid:0)mj=Xmi+a1is plotted in(cid:1)Foigure 3. o{Df tih}e:cColdeearglye,nDeraiteisdlbeyastthaeslalsatrgℓe ais+th1ermowinsimofumthedmistaatnricxe. It is seen that no matrix with exponent greater than 1 can be Therefore, if C l(j) ℓ i−+1 C l(j), then found for ℓ 10. 
2 j=k+1 ≤ − ≤ j=k In additio≤n to providing an upper bound to E , Lemma P Di µ(k)+1. P ℓ ≥ 26 narrows down the search for matrices which achieve Eℓ. The exponentE associated with these partial design distances In particular, it enables us to list all sets of possible partial can then be bounded as distances with exponents greater than 1. For 11 ℓ 14, 2 ≤ ≤ 1 C an exhaustive search for matrices with a “good” set of partial E l(i)log (µ(i)+1). (31) distances bounded by Lemma 26 (of which there are 285) ≥ 2m 1 2m−1 − i=1 shows that no matrix with exponent greater than 1 exists. X 2 Example 29 (BCH Construction for ℓ=31): From the list of chords computed in Example 27 we obtain VI. CONSTRUCTION USING BCH CODES 5 WewillnowshowhowtoconstructamatrixGofdimension E≥ 31log31(2·4·6·8·12·16)≈0.526433. ℓ=16with exponentexceeding 1.In fact,we willshowhow 2 Anexplicitcheckofthepartialdistancesrevealsthattheabove toconstructthebestsuchmatrix.Moregenerally,wewillshow inequality is in fact an equality. (cid:4) howBCHcodesgiveriseto“goodmatrices.”Ourconstruction For large m, the bound in (31) is not convenient to work ofGconsistsoftakinganℓ ℓbinarymatrixwhoseklastrows with. The asymptotic behavior of the exponent is however × form a generator matrix of a k-dimensional BCH code. The easy to assess by considering the following bound. Note partial distance D is then at least as large as the minimum k that no µ(i) (except for i = 1) can be an even number distance of this k-dimensional code. since otherwise µ(i)/2, being an integer, would be contained To describe the partial distances explicitly we make use in chord i, a contradiction. It follows that for the smallest of the spectral view of BCH codes as sub-field sub-codes exponentallchords(exceptchord1)mustbe oflengthm and of Reed-Solomon codes as described in [4]. We restrict our that µ(i)=2i+1. This gives rise to the bound discussion to BCH codes of length ℓ=2m 1, m N. Fix m N. 
Partition the set of integers {0, 1, ..., 2^m − 2} into a set of chords,

  C = ∪_{i=0}^{2^m−2} { 2^k i mod (2^m − 1) : k ∈ N }.

  E ≥ [1/((2^m − 1) log(2^m − 1))] ( Σ_{k=1}^a m log(2k) + (2^m − 2 − am) log(2a + 2) ),    (32)

where a = ⌊(2^m − 2)/m⌋. It is easy to see that as m → ∞ the above exponent tends to 1, the best exponent one can hope for (cf. Lemma 21). We have also seen in Example 29 that for m = 5 we achieve an exponent strictly above 1/2.

Binary BCH codes exist for lengths of the form 2^m − 1. To construct matrices of other lengths, we use shortening, a standard method to construct good codes of smaller lengths from an existing code, which we recall here: Given a code C, fix a symbol, say the first one, and divide the codewords into two sets of equal size depending on whether the first symbol is a 1 or a 0. Choose the set having zero in the first symbol and delete this symbol. The resulting codewords form a linear code with both the length and the dimension decreased by one. The minimum distance of the resulting code is at least as large as the initial distance. The generator matrix of the resulting code can be obtained from the original generator matrix by removing a generator vector having a one in the first symbol, adding this vector to all the remaining vectors starting with a one, and removing the first column.

Applied to the matrix of Example 31, one step of the shortening procedure yields the matrix

  1 0 0 0
  0 1 0 1
  0 0 1 1
  1 1 1 1

The partial distances of this matrix are {1, 2, 2, 4}. ▪

Example 32 (Construction of Code with ℓ = 16): Starting with the 31 × 31 BCH matrix and repeatedly applying the above procedure results in the exponents listed in Table I.

  ℓ   exponent    ℓ   exponent    ℓ   exponent    ℓ   exponent
  31  0.52643     27  0.50836     23  0.50071     19  0.48742
  30  0.52205     26  0.50470     22  0.49445     18  0.48968
  29  0.51710     25  0.50040     21  0.48705     17  0.49175
  28  0.51457     24  0.50445     20  0.49659     16  0.51828

  TABLE I: THE BEST EXPONENTS ACHIEVED BY SHORTENING THE BCH MATRIX OF LENGTH 31.

The 16 × 16 matrix having an exponent 0.51828 is given below.

Now consider an ℓ × ℓ matrix G_ℓ.
Find the column j with the longest run of zeros at the bottom, and let i be the last row with a 1 in this column. Then add the ith row to all the rows with a 1 in the jth column. Finally, remove the ith row and the jth column to obtain an (ℓ−1) × (ℓ−1) matrix G_{ℓ−1}. The matrix G_{ℓ−1} satisfies the following property.

Lemma 30 (Partial Distances after Shortening): Let the partial distances of G_ℓ be given by {D_1 ≤ ··· ≤ D_ℓ}. Let G_{ℓ−1} be the resulting matrix obtained by applying the above shortening procedure with the ith row and the jth column. Let the partial distances of G_{ℓ−1} be {D′_1, ..., D′_{ℓ−1}}. We have

  D′_k ≥ D_k,      1 ≤ k ≤ i − 1,    (33)
  D′_k = D_{k+1},  i ≤ k ≤ ℓ − 1.    (34)

Proof: Let G_ℓ = [g_1^T, ..., g_ℓ^T]^T and G_{ℓ−1} = [g′_1^T, ..., g′_{ℓ−1}^T]^T. For i ≤ k, g′_k is obtained by removing the jth column of g_{k+1}. Since all these rows have a zero in the jth position, their partial distances do not change, which in turn implies (34).

For k ≤ i, note that the minimum distance of the code C′ = ⟨g′_k, ..., g′_{ℓ−1}⟩ is obtained by shortening C = ⟨g_k, ..., g_ℓ⟩. Therefore, D′_k ≥ d_min(C′) ≥ d_min(C) = D_k. ▪

The 16 × 16 matrix of Example 32, with exponent 0.51828, is

  1 0 0 1 1 1 0 0 0 0 1 1 1 1 0 1
  0 1 0 0 1 0 0 1 0 1 1 1 0 0 1 1
  0 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0
  0 1 0 1 0 1 1 0 1 0 1 0 0 0 0 0
  1 1 1 1 0 0 0 0 0 0 1 0 1 1 0 1
  0 0 1 0 0 1 0 1 0 1 0 0 0 1 1 0
  0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0
  0 1 0 1 1 1 0 0 1 0 1 1 0 0 1 0
  1 1 1 0 0 1 1 0 1 0 0 1 0 1 0 0
  1 0 1 0 1 0 1 1 1 0 1 1 0 1 0 1
  1 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0
  1 0 0 1 1 0 0 0 0 1 0 1 1 0 1 1
  1 1 1 1 1 0 1 0 0 0 0 1 0 1 0 0
  1 0 1 0 1 1 1 1 0 1 0 0 0 0 0 1
  1 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0
  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The partial distances of this matrix are {16, 8, 8, 8, 8, 6, 6, 4, 4, 4, 4, 2, 2, 2, 2, 1}. Using Lemma 26 we observe that for the 16 × 16 case there are only 11 other possible sets of partial distances which have a better exponent than the above matrix. An exhaustive search for matrices with such sets of partial distances confirms that no such matrix exists. Hence, the above matrix achieves the best possible exponent among all 16 × 16 matrices. ▪
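One step of this matrix-shortening procedure can be sketched as follows. The code is not from the paper; it assumes the 5 × 5 matrix of Example 31 (reproduced here as G5) as input, and the function name is ours.

```python
def shorten(G):
    """One shortening step: pick the column with the longest run of zeros at
    the bottom, let i be the last row with a 1 in that column, add row i to
    every other row with a 1 in that column, then delete row i and column j."""
    l = len(G)
    def bottom_zero_run(j):
        r = 0
        for i in range(l - 1, -1, -1):
            if G[i][j]:
                break
            r += 1
        return r
    j = max(range(l), key=bottom_zero_run)
    i = max(r for r in range(l) if G[r][j])   # last row with a 1 in column j
    H = []
    for r in range(l):
        if r == i:
            continue
        row = G[r][:]
        if row[j]:
            row = [a ^ b for a, b in zip(row, G[i])]
        H.append([x for c, x in enumerate(row) if c != j])
    return H

G5 = [[1, 0, 1, 0, 1],
      [0, 0, 1, 0, 1],
      [0, 1, 0, 0, 1],
      [0, 0, 0, 1, 1],
      [1, 1, 0, 1, 1]]
print(shorten(G5))
```

On G5 the procedure picks column 3 (run of three bottom zeros), adds row 2 to row 1, and deletes row 2 and column 3, yielding the 4 × 4 matrix with partial distances {1, 2, 2, 4} obtained in Example 31.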
Example 31 (Shortening of Code): Consider the matrix

  1 0 1 0 1
  0 0 1 0 1
  0 1 0 0 1
  0 0 0 1 1
  1 1 0 1 1

The partial distances of this matrix are {1, 2, 2, 2, 4}. According to our procedure, we pick the 3rd column since it has a run of three zeros at the bottom (which is maximal). We then add the second row to the first row (since it also has a 1 in the third column). Finally, deleting column 3 and row 2, we obtain a 4 × 4 matrix whose partial distances are {1, 2, 2, 4}. ▪

ACKNOWLEDGMENT

We would like to thank E. Telatar for helpful discussions and his suggestions for an improved exposition. This work was partially supported by the National Competence Center in Research on Mobile Information and Communication Systems (NCCR-MICS), a center supported by the Swiss National Science Foundation under grant number 5005-67322.

APPENDIX

In this section we prove the following lemma, which is used in the proof of Lemma 6.

Lemma 33 (Mutual Information of W^k): Let W be a symmetric B-DMC and let W^k denote the channel

  W^k(y_1^k | x) = Π_{i=1}^k W(y_i | x).

If I(W) ∈ (δ, 1 − δ) for some δ > 0, then there exists an η(δ) > 0 such that I(W^k) − I(W) > η(δ).

The proof of Lemma 33 is in turn based on the following theorem.

Theorem 34 ([5], [6], Extremes of Information Combining): Let W_1, ..., W_k be k symmetric B-DMCs with capacities I_1, ..., I_k, respectively. Let W^(k) denote the channel with transition probabilities

  W^(k)(y_1^k | x) = Π_{i=1}^k W_i(y_i | x).

The following computation belongs to the proof of Lemma 36 below: any symmetric B-DMC W is equivalent to a convex combination of K BSCs, where the receiver has knowledge of the particular BSC being used. Let {ǫ_i}_{i=1}^K and {Z_i}_{i=1}^K denote the bit error probabilities and the Bhattacharyya parameters of the constituent BSCs. Then P_e(W) and Z(W) are given by

  P_e(W) = Σ_{i=1}^K α_i ǫ_i,   Z(W) = Σ_{i=1}^K α_i Z_i

for some α_i > 0, with Σ_{i=1}^K α_i = 1. Therefore,

  P_e(W) = Σ_{i=1}^K α_i (1/2)(1 − √(1 − Z_i²)) ≥ (1/2)(1 − √(1 − (Σ_{i=1}^K α_i Z_i)²))
  = (1/2)(1 − √(1 − Z(W)²)),

where the inequality follows from the convexity of the function x ↦ 1 − √(1 − x²) for x ∈ (0, 1).

Also let W^(k)_BSC denote the channel with transition probabilities

  W^(k)_BSC(y_1^k | x) = Π_{i=1}^k BSC(ǫ_i)(y_i | x),

where BSC(ǫ_i) denotes the binary symmetric channel (BSC) with crossover probability ǫ_i ∈ [0, 1/2], ǫ_i ≜ h^{−1}(1 − I_i), and h denotes the binary entropy function. Then, I(W^(k)) ≥ I(W^(k)_BSC).

Remark 35: Consider the transmission of a single bit X using k independent symmetric B-DMCs W_1, ..., W_k with capacities I_1, ..., I_k. Theorem 34 states that over the class of all symmetric channels with given mutual informations, the mutual information between the input and the output vector is minimized when each of the individual channels is a BSC.

REFERENCES

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," submitted to IEEE Trans. Inform. Theory, 2008.
[2] E. Arıkan and E. Telatar, "On the rate of channel polarization," July 2008, available from http://arxiv.org/pdf/0807.3917.
[3] J. A. Bondy and U. S. R. Murty, Graph Theory. Springer, 2008.
[4] R. E. Blahut, Theory and Practice of Error Control Codes. Addison-Wesley, 1983.
[5] I. Sutskover, S. Shamai, and J. Ziv, "Extremes of information combining," IEEE Trans. Inform. Theory, vol. 51, no. 4, pp. 1313–1325, Apr. 2005.
[6] I. Land, S. Huettinger, P. A. Hoeher, and J. B. Huber, "Bounds on information combining," IEEE Trans. Inform. Theory, vol. 51, no. 2, pp. 612–619, Feb. 2005.

Proof of Lemma 33: Let ǫ ∈ [0, 1/2] be the crossover probability of a BSC with capacity I(W), i.e., ǫ = h^{−1}(1 − I(W)). Note that for k ≥ 2,

  I(W^k) ≥ I(W²) ≥ I(W).

By Theorem 34, we have I(W²) ≥ I(W²_BSC(ǫ)). A simple computation shows that

  I(W²_BSC(ǫ)) = 1 + h(2ǫǭ) − 2h(ǫ).

We can then write

  I(W^k) − I(W) ≥ I(W²_BSC(ǫ)) − I(W)
               = I(W²_BSC(ǫ)) − I(W_BSC(ǫ))
               = h(2ǫǭ) − h(ǫ).    (35)
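The identity I(W²_BSC(ǫ)) = 1 + h(2ǫǭ) − 2h(ǫ) and the positivity of the gap in (35) can be checked numerically. The sketch below is ours, not the paper's; it computes I(X; Y₁Y₂) directly from the joint output distribution of two uses of a BSC with a uniform input bit.

```python
from math import log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def I_two_uses_bsc(eps):
    """I(X; Y1 Y2) for one uniform input bit sent over BSC(eps) twice."""
    pe = 1 - eps
    # Joint distribution of (y1, y2): (0,0)/(1,1) each p_same, (0,1)/(1,0) each p_diff.
    p_same = (eps * eps + pe * pe) / 2
    p_diff = eps * pe
    H_out = -2 * p_same * log2(p_same) - 2 * p_diff * log2(p_diff)
    return H_out - 2 * h(eps)   # H(Y1 Y2) - H(Y1 Y2 | X)

eps = 0.11
gap = I_two_uses_bsc(eps) - (1 - h(eps))   # I(W^2_BSC) - I(W_BSC)
print(gap, h(2 * eps * (1 - eps)) - h(eps))   # the two values agree and are > 0
```

The direct computation and the closed form h(2ǫǭ) − h(ǫ) coincide, and the gap is strictly positive for ǫ ∈ (0, 1/2).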
Note that I(W) ∈ (δ, 1 − δ) implies ǫ ∈ (φ(δ), 1/2 − φ(δ)) where φ(δ) > 0, which in turn implies h(2ǫǭ) − h(ǫ) > η(δ) for some η(δ) > 0. ▪

Lemma 36: Consider a symmetric B-DMC W. Let P_e(W) denote the bit error probability of uncoded transmission under MAP decoding. Then,

  P_e(W) ≥ (1/2)(1 − √(1 − Z(W)²)).

Proof: One can check that the inequality is satisfied with equality for the BSC. It is also known that any symmetric B-DMC W is equivalent to a convex combination of several, say K, BSCs where the receiver has knowledge of the particular BSC being used.
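The two facts used in this proof — equality for a single BSC, and the convexity bound for a mixture of BSCs — can be verified numerically. The sketch below is ours, with an arbitrary example mixture; it assumes the standard expression Z(BSC(ǫ)) = 2√(ǫ(1 − ǫ)), consistent with ǫ = (1/2)(1 − √(1 − Z²)) used in the proof.

```python
from math import sqrt

def bsc_params(eps):
    """Error probability and Bhattacharyya parameter of BSC(eps)."""
    return eps, 2 * sqrt(eps * (1 - eps))

# Equality for a single BSC: eps == (1 - sqrt(1 - Z^2)) / 2.
eps, Z = bsc_params(0.11)
print(eps, (1 - sqrt(1 - Z * Z)) / 2)   # both equal 0.11 up to rounding

# Mixture of BSCs: P_e(W) = sum alpha_i eps_i, Z(W) = sum alpha_i Z_i,
# and the lemma's bound P_e(W) >= (1 - sqrt(1 - Z(W)^2)) / 2.
alpha = [0.3, 0.5, 0.2]          # hypothetical mixture weights
eps_list = [0.05, 0.2, 0.4]      # hypothetical crossover probabilities
pairs = [bsc_params(e) for e in eps_list]
Pe = sum(a * e for a, (e, _) in zip(alpha, pairs))
Zw = sum(a * z for a, (_, z) in zip(alpha, pairs))
bound = (1 - sqrt(1 - Zw * Zw)) / 2
print(Pe, bound)   # Pe >= bound, by convexity of x -> 1 - sqrt(1 - x^2)
```

Any choice of weights α_i and crossover probabilities ǫ_i satisfies the bound, which is the content of Lemma 36.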
