Polar codes for q-ary channels, q = 2^r

Woomyoung Park and Alexander Barg
Department of ECE and Institute for Systems Research, University of Maryland, College Park, Maryland 20742
Email: [email protected], [email protected]

Abstract—We study polarization for nonbinary channels with input alphabet of size q = 2^r, r = 2,3,.... Using Arıkan's polarizing kernel H_2, we prove that the virtual channels that arise in the process of polarization converge to q-ary channels with capacity 1,2,...,r bits, and that the total transmission rate approaches the symmetric capacity of the channel. This leads to an explicit transmission scheme for q-ary channels. The error probability of decoding using successive cancellation behaves as exp(−N^α), where N is the code length and α is any constant less than 0.5.

I. INTRODUCTION

Polarization is a new concept in information theory discovered in the context of capacity-achieving families of codes for symmetric memoryless channels and later generalized to source coding, multi-user channels and other problems. Polarization was first described by Arıkan [1], who constructed binary codes that achieve the capacity of symmetric memoryless channels (and the "symmetric capacity" of general binary-input channels). The main idea of [1] is to combine the bits of the source sequence using repeated application of the "polarization kernel" H_2 = (1 0; 1 1). The resulting linear code of length N = 2^n has a generator matrix which forms a submatrix of G_N = B H_2^{⊗n}, where B is a permutation matrix. The choice of the rows of G_N is governed by the polarization of the virtual channels for individual bits that arise in the process of channel combining and splitting. Namely, the data bits are written in the coordinates that correspond to near-perfect channels, while the other bits are fixed to some values known to both the transmitter and the decoder.
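As a concrete illustration (added here; not part of the original text), the minimal Python sketch below builds the transform matrix H_2^{⊗n} by repeated Kronecker products and applies a bit-reversal row permutation; taking B to be the bit-reversal ordering follows the construction of [1], and the function name is ours.

```python
import numpy as np

def polar_transform_matrix(n):
    """Return G_N = B * H2^{(x)n} over F_2, with B the bit-reversal permutation.

    A sketch following Arikan's construction; B chosen as bit-reversal as in [1].
    """
    H2 = np.array([[1, 0], [1, 1]], dtype=np.uint8)
    G = np.array([[1]], dtype=np.uint8)
    for _ in range(n):
        G = np.kron(G, H2) % 2              # H2^{(x)n} via Kronecker products
    N = 1 << n
    perm = [int(format(i, f'0{n}b')[::-1], 2) for i in range(N)]  # bit-reversal
    return G[perm, :]

if __name__ == "__main__":
    print(polar_transform_matrix(3))        # the 8 x 8 transform
```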
It was shown later that polarization on binary channels can be achieved using a variety of other kernels: in particular, any m × m matrix whose columns cannot be arranged to form an upper triangular matrix achieves the desired polarization [2].

A study of polar codes for channels with nonbinary input was undertaken by Şaşoğlu et al. [3], [4] and Mori and Tanaka [5]. For prime q it suffices to take the kernel H_2, while for nonprime alphabets the kernel is time-varying and not explicit. Namely, for prime q, [3] showed that there exist permutations of the input alphabet such that the virtual channels for individual q-ary symbols become either fully noisy or perfect, and the proportion of perfect channels approaches the symmetric capacity, in analogy with the results for binary codes in [1]. At the same time, [3] remarks that the transmission scheme that uses the kernel H_2 with modulo-q addition for composite q does not necessarily lead to the polarization of the channels to the two extremes. Rather, they show that there exists a sequence of permutations of the input alphabet such that, when they are combined with H_2, the virtual channels for the transmitted symbols become either nearly perfect or nearly useless.

The authors of [3] suggest several alternatives to the kernel H_2 that rely on randomized permutations or, in the case of q = 2^r, on multilevel schemes that implement polar coding for each of the bits of the symbol independently, combining them in the decoding procedure; see esp. [4].

In this paper we study polarization for channels with input alphabet of size q = 2^r, r = 2,3,.... Suppose that the channel is given by a stochastic matrix W(y|x), where x ∈ X, y ∈ Y, X = {0,1,...,q−1}, and Y is a finite alphabet. Assuming that the channel combining is performed using the kernel H_2 with addition modulo q, we establish results about the polarization of channels for individual symbols. It turns out that the virtual channels for the transmitted symbols converge to one of r+1 extremal configurations in which j out of r bits are transmitted near-perfectly while the remaining r−j bits carry almost no information. Moreover, the good bits are always aligned to the right of the transmitted r-block, and no other situations arise in the limit. Thus, the extremal configurations for information rates that arise as a result of polarization are easily characterized: they form an upper-triangular matrix as described in Sect. II-B (see also Figs. 1, 2 in the final section of the paper). This characterization also constitutes the main difference of our results from the multilevel scheme in [4]: there, the set of extremal configurations can in principle have cardinality 2^r, which complicates the code construction.

Another related work is the paper by Abbe and Telatar [6]. In it, the authors observed multilevel polarization in a somewhat different context. The main result of their paper provides a characterization of the extremal points of the region of attainable rates when polar codes are used for each of the r users of a multiple-access channel. Namely, as shown in [6] (see also [7]), these points form a subset in the set of vertices of a matroid on the set of r users. [6] also remarks that these results translate directly to transmission over a q-ary DMC, showing that the rate polarizes to many levels. To explain the difference between [6] and our work, we note that transmission over the multiple-access channel in [6] is set up in such a way that, once applied to the DMC, it corresponds to encoding each bit of the q-ary symbol by its own polar code (we again assume that q = 2^r). In other words, the polarization kernel employed is a linear operator G = I_r ⊗ H_2. Thus, the group acting on X is F_2^r = Z_2 × ··· × Z_2 rather than the cyclic additive group of order q considered in this paper.
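To make the distinction concrete, the short sketch below (an illustration added here, not from the original text) combines two q-ary symbols in the two ways just discussed: addition modulo q, corresponding to the cyclic group used in this paper, and bitwise XOR of the r-bit representations, corresponding to the group F_2^r implicit in the per-bit scheme.

```python
q, r = 8, 3  # q = 2^r

def combine_cyclic(u1, u2):
    """Symbol combining used in this paper: addition in Z_q."""
    return (u1 + u2) % q

def combine_bitwise(u1, u2):
    """Per-bit combining (kernel I_r (x) H_2): XOR of the r-bit representations."""
    return u1 ^ u2

for u1, u2 in [(3, 6), (5, 7)]:
    print(u1, u2, combine_cyclic(u1, u2), combine_bitwise(u1, u2))
```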
This work began as an attempt to construct polar codes for the ordered symmetric channel, introduced in our earlier paper [8]. This channel provides an information-theoretic model related to the ordered distance on binary r-vectors, defined as follows:

    d_r(x, x') = max{ j : x_j ≠ x'_j },  where x, x' ∈ {0,1}^r.   (1)

Below wt_r(x) = d_r(x, 0) denotes the ordered weight of the symbol x. The ordered distance is an instance of a large class of metrics introduced in [9] following works of Niederreiter in numerical analysis [10]. It has subsequently appeared in a large number of works in algebraic combinatorics and coding theory; see, e.g., [11] and references therein. We find it quite interesting that it independently arises in the study of polar codes on channels with input of size q = 2^r. Examples of q-ary polar codes for ordered symmetric channels can be easily constructed and analyzed.

Last but not least, when this work was in its final stages, we became aware of the paper by Sahebi and Pradhan [12], who also observed the multilevel polarization phenomenon for q-ary channels. At the same time, [12] did not give a proof of polarization, which constitutes the main technical part of our work. The motivation of the approach of [12] relates to a detailed study of linear and group codes on q-ary channels, and is also different from our approach.

In the next section we state and prove the main result, the convergence of the channels to one of the r+1 extremal configurations, and deduce that polar codes achieve the symmetric capacity of the channel. Then we derive the rate of polarization, estimate the error probability of decoding, and give some examples.

II. POLARIZATION FOR q-ARY CHANNELS

We consider combining of the q-ary data under the action of the operator H_2, where q = 2^r, r ≥ 2. Let W : X → Y, |X| = q, be a discrete memoryless channel (DMC). The symmetric capacity of the channel W equals

    I(W) ≜ Σ_{x∈X} Σ_{y∈Y} (1/q) W(y|x) log [ W(y|x) / ( (1/q) Σ_{x'∈X} W(y|x') ) ],

where the base of the logarithm is 2. Define the combined channel W_2 and the channels W^− and W^+ by

    W_2(y_1, y_2 | u_1, u_2) = W(y_1 | u_1 + u_2) W(y_2 | u_2),
    W^−(y_1, y_2 | u_1) = (1/q) Σ_{u_2∈X} W_2(y_1, y_2 | u_1, u_2),   (2)
    W^+(y_1, y_2, u_1 | u_2) = (1/q) W_2(y_1, y_2 | u_1, u_2),   (3)

where u_1, u_2 ∈ X (identified with binary r-vectors), y_1, y_2 ∈ Y, and + is a modulo-q sum. This transformation can be applied recursively to the channels W^−, W^+, resulting in four channels of the form W^{b_1 b_2}, b_1, b_2 ∈ {+,−}. After n steps we obtain N = 2^n channels W^{(j)}, j = 1,...,N. For the case q = 2 it is shown in [1] that as n increases, the channels W_N^{(j)} become either almost perfect or almost completely noisy (polarize). In formal terms, for any ε > 0,

    lim_{n→∞} |{ b ∈ {+,−}^n : I(W^b) ∈ (ε, 1−ε) }| / 2^n = 0.   (4)

In this paper we extend this result to the case q = 2^r, r > 1.

As shown in [1], after n steps of the transformation (2)-(3) the channels W_N^{(i)} : X → Y^N × X^{i−1}, 1 ≤ i ≤ N, are given by

    W_N^{(i)}(y_1^N, u_1^{i−1} | u_i) = (1/q^{N−1}) Σ_{u_{i+1}^N ∈ X^{N−i}} W_N(y_1^N | u_1^N G_N),   (5)

where G_N = B H_2^{⊗n} and B is a permutation matrix. Here we use the shorthand notation for sequences of symbols: for instance, y_1^N ≜ (y_1, y_2, ..., y_N), etc.

A. Notation

For any pair of input symbols x, x' ∈ X, the Bhattacharyya distance between them is

    Z(W_{x,x'}) = Σ_{y∈Y} √( W(y|x) W(y|x') ),

where W_{x,x'} is the channel obtained by restricting the input alphabet of W to the subset {x, x'}.

Define the quantity Z_v(W) for v ∈ X \ {0}:

    Z_v(W) = (1/2^r) Σ_{x∈X} Z(W_{x,x+v}).

Introduce the ith average Bhattacharyya distance of the channel W by

    Z_i(W) = (1/2^{i−1}) Σ_{v∈X_i} Z_v(W),   (6)

where i = 1,2,...,r and X_i = { v ∈ X : wt_r(v) = i }. Then

    Z(W) := (1/(2^r(2^r−1))) Σ_{x≠x'} Z(W_{x,x'}) = (1/(2^r−1)) Σ_{i=1}^r 2^{i−1} Z_i(W).   (7)
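To make these definitions concrete, here is a minimal Python sketch (added for illustration; not part of the original paper) that evaluates Z_v(W), Z_i(W) and Z(W) for a channel given as a q × |Y| transition matrix. The identification of a symbol with its binary r-vector, with wt_r(v) taken as the position of the highest nonzero bit, is an assumed convention.

```python
import numpy as np

def Z_v(W, v):
    """Z_v(W) = (1/q) * sum_x sum_y sqrt(W(y|x) W(y|x+v)), addition mod q."""
    q = W.shape[0]
    return float(sum(np.sum(np.sqrt(W[x] * W[(x + v) % q])) for x in range(q)) / q)

def Z_i(W, i):
    """Average over the class X_i = {v : wt_r(v) = i}, Eq. (6); wt_r(v) = v.bit_length()."""
    q = W.shape[0]
    return sum(Z_v(W, v) for v in range(1, q) if v.bit_length() == i) / (1 << (i - 1))

def Z_total(W):
    """Z(W) of Eq. (7)."""
    q = W.shape[0]
    r = q.bit_length() - 1
    return sum((1 << (i - 1)) * Z_i(W, i) for i in range(1, r + 1)) / (q - 1)
```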
Recall the setting of [1] for the evolution of the channel parameters. On the set Ω = {+,−}^∞ of semi-infinite binary sequences define a σ-algebra F on Ω generated by the cylinder sets S(b_1,...,b_n) = { ω ∈ Ω : ω_1 = b_1, ..., ω_n = b_n } for all sequences (b_1,...,b_n) ∈ {+,−}^n and for all n ≥ 0. Consider the probability space (Ω, F, P), where P(S(b_1,...,b_n)) = 2^{−n}, n ≥ 0. Define a filtration F_0 ⊂ F_1 ⊂ ··· ⊂ F, where F_0 = {∅, Ω} and F_n, n ≥ 1, is generated by the cylinder sets S(b_1,...,b_i), i = 1,2,...,n.

Let B_i, i = 1,2,..., be i.i.d. {+,−}-valued random variables with Pr(B_i = +) = Pr(B_i = −) = 1/2. The random channel emerging at time n will be denoted by W^B, where B = (B_1, B_2, ..., B_n). Thus, P(W^B = W_N^{(i)}) = 2^{−n} for all i = 1,...,2^n. Let W_n = W^B, I_n = I(W^B), Z_{{x,x'},n} = Z(W^B_{x,x'}), Z_{v,n} = Z_v(W^B), and Z_{i,n} = Z_i(W^B). These random variables are adapted to the above filtration (meaning that I_n etc. are measurable w.r.t. F_n for every n ≥ 1).

B. Channel polarization

In this section we state a sequence of results that shows that q-ary polar codes based on the kernel H_2 can be used to transmit reliably over the channel W for all rates R < I(W).

Theorem 1: (a) Let n → ∞. The random variable I_n converges a.e. to a random variable I_∞ with E(I_∞) = I(W).
(b) For all i = 1,2,...,r,

    lim_{n→∞} Z_{i,n} = Z_{i,∞}  a.e.,

where the variables Z_{i,∞} take values 0 and 1. With probability one the vector (Z_{i,∞}, i = 1,...,r) takes one of the following values:

    (Z_{1,∞}=0, Z_{2,∞}=0, ..., Z_{r−1,∞}=0, Z_{r,∞}=0)
    (Z_{1,∞}=1, Z_{2,∞}=0, ..., Z_{r−1,∞}=0, Z_{r,∞}=0)
    (Z_{1,∞}=1, Z_{2,∞}=1, ..., Z_{r−1,∞}=0, Z_{r,∞}=0)
        ⋮                                                     (8)
    (Z_{1,∞}=1, Z_{2,∞}=1, ..., Z_{r−1,∞}=1, Z_{r,∞}=0)
    (Z_{1,∞}=1, Z_{2,∞}=1, ..., Z_{r−1,∞}=1, Z_{r,∞}=1).

Let us restate part (b) of this theorem for finite n.

Proposition 1: Let ε, δ > 0 be fixed. For k = 0,1,...,r define disjoint events

    B_{k,n}(ε) = { ω : (Z_{1,n}, Z_{2,n}, ..., Z_{r,n}) ∈ R_k },

where R_k = R_k(ε) = (Π_{i=1}^k D_1) × (Π_{i=k+1}^r D_0) and D_0 = [0, ε), D_1 = (1−ε, 1]. Then P(∪_{k=0}^r B_{k,n}(ε)) ≥ 1 − δ starting from some n = n(ε, δ).

The proofs of these statements are given in a later part of this section. We need the following lemma.

Lemma 1: For a DMC with q-ary input, I(W) and Z(W) are related by

    I(W) ≥ log [ q / ( 1 + Σ_{i=1}^r 2^{i−1} Z_i(W) ) ],   (9)
    I(W) ≤ Σ_{i=1}^r √( 1 − Z_i(W)² ).   (10)

For r = 1 these inequalities are proved in [1]. For r > 1, Eq. (9) is a restatement of [3, Prop. 3] using (7). The fact that (10) holds for all r > 1 is new, and is proved in the Appendix.

Inequalities (9)-(10) imply that if (Z_1,...,Z_r) ∈ R_k(ε), then |I(W) − (r−k)| ≤ δ, where δ ≥ max( k√ε, (2^{r−k}−1) ε log e ).

The following proposition is an immediate corollary of the above results.

Proposition 2: (a) The random variable I_∞ is supported on the set {0,1,...,r}.
(b) For every 0 ≤ k ≤ r and every δ > 0 there exists ε > 0 such that

    lim_{n→∞} P( {|I_n − (r−k)| ≤ δ} △ B_{k,n}(ε) ) = 0.

(c) E(|{ i : Z_{i,∞} = 0 }|) = I(W).

Proof: The first statement is obvious from (9)-(10). To prove the second statement we note that, with the appropriate choice of ε,

    {|I_n − (r−k)| ≤ δ} ⊃ B_{k,n}(ε)

for all n ≥ 0. At the same time, P({|I_n − (r−k)| ≤ δ} ∩ B_{k',n}(ε)) → 0 for all k' ≠ k, and P(∪_k B_{k,n}(ε)) → 1 for ε > 0. Together this implies (b). Finally, we have that E(I_∞) = I(W). Then use (a) and (b) to claim that E(|{ i : Z_{i,∞} = 0 }|) = Σ_{k=0}^r k P(I_∞ = k) = I(W). ∎

We can say a bit more about the nature of convergence established in this proposition. Let us fix k ∈ {0,1,...,r} and define the channel for the r−k rightmost bits of the transmitted symbol as follows:

    W^{[r−k]}(y|u) = (1/2^k) Σ_{x∈X : x_{k+1}^r = u} W(y|x),  u ∈ {0,1}^{r−k},

where x = (x_1, x_2, ..., x_r).
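The following sketch (an added illustration) computes the symmetric capacity I(W) and the marginal channel W^{[r−k]} for the r−k rightmost bits; the convention that x_j is the j-th least significant bit of the integer label of x is our assumption.

```python
import numpy as np

def symmetric_capacity(W):
    """I(W) of a q-ary DMC given as a (q x |Y|) row-stochastic matrix, in bits."""
    q = W.shape[0]
    Py = W.mean(axis=0)                               # output law under uniform input
    ratio = np.where(W > 0, W / np.maximum(Py, 1e-300), 1.0)
    return float(np.sum(W * np.log2(ratio)) / q)

def subchannel(W, k, r):
    """W^{[r-k]}(y|u) = 2^{-k} * sum over x with x_{k+1}...x_r = u of W(y|x)."""
    q = 1 << r
    out = np.zeros((1 << (r - k), W.shape[1]))
    for x in range(q):
        out[x >> k] += W[x]                           # x >> k keeps bits x_{k+1},...,x_r
    return out / (1 << k)
```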
Lemma 2: Let V : X → Ỹ be a DMC and let δ > 0. Suppose that (Z_1(V), Z_2(V), ..., Z_r(V)) ∈ R_k(ε) for some 0 ≤ k ≤ r. If ε is sufficiently small, then I(V^{[r−k]}) ≥ r − k − δ. In particular, it suffices to take ε ≤ 2^{−k} δ / (2^{r−k} − 1).

Proof: We may assume that 1 ≤ k ≤ r−1. Let u ∈ {0,1}^{r−k}, x = (x_1,...,x_k,u) ∈ X, x' = (x'_1,...,x'_k,u) ∈ X. Let v ∈ {0,1}^{r−k} \ {0} and consider

    Z(V^{[r−k]}_{u,u+v}) = Σ_y √( V^{[r−k]}(y|u) V^{[r−k]}(y|u+v) )
     = Σ_y √( (1/2^k) Σ_x V(y|x) · (1/2^k) Σ_{x'} V(y|x'+v') )
     ≤ (1/2^k) Σ_y Σ_x Σ_{x'} √( V(y|x) V(y|x'+v') )
     = (1/2^k) Σ_{x,x'} Z(V_{x,x'+v'})
     < 2^k ε,

where v' = 0^k v_1 v_2 ... v_{r−k}. The last inequality follows from the fact that Z_i(V) < ε for i = k+1,...,r. Since Z_i(V^{[r−k]}) is the average of the Z(V^{[r−k]}_{u,u+v}) over all v with wt_r(v) = i, we conclude that Z_i(V^{[r−k]}) < 2^k ε for all i = 1,...,r−k. Now the lemma follows from (9) in Lemma 1. ∎

It turns out that the channels for individual bits converge to either perfect or fully noisy channels. If the channel for bit j is perfect, then the channels for all bits i, r ≥ i > j, are perfect. If the channel for bit i is noisy, then the channels for all bits j, 1 ≤ j < i, are noisy. The total number of near-perfect bits approaches I(W). This is made formal in the next proposition.

Proposition 3: Let Ω_k = { ω : (Z_{1,∞}, Z_{2,∞}, ..., Z_{r,∞}) = 1^k 0^{r−k} }, k = 0,1,...,r. For every ω ∈ Ω_k,

    lim_{n→∞} | I_n − I(W_n^{[r−k]}) | = 0.

Proof: For every ω ∈ Ω_k we have that I_n(ω) → r−k. Combining this with the previous lemma and Proposition 2(b), we conclude that for such ω also I(W_n^{[r−k]}) → r−k. ∎

The concluding claim of this section describes the channel polarization and establishes that the total number of bits sent over almost noiseless channels approaches N·I(W).

Theorem 2: For any DMC W : X → Y the channels W_N^{(i)} polarize to one of the r+1 extremal configurations. Namely, let V_i = W_N^{(i)} and

    π_{k,N} = |{ i ∈ [N] : |I(V_i) − k| < δ ∧ |I(V_i^{[k]}) − k| < δ }| / N,

where δ > 0; then lim_{N→∞} π_{k,N} = P(I_∞ = k) for all k = 0,1,...,r. Consequently

    Σ_{k=1}^r k π_{k,N} → I(W).

This theorem follows directly from Theorem 1 and Propositions 2 and 3. Some examples of convergence to the extremal configurations described by this theorem are given in Sect. III below.

C. Transmission with polar codes

Let us describe a scheme of transmitting over the channel W with polar codes. Take ε > 0 and choose a sufficiently large n. Assume that the length of the code is N = 2^n. Proposition 1 implies that the set [N], apart from a small subset, is partitioned into r+1 subsets A_{k,n} such that for j ∈ A_{k,n} the vector (Z_1(W_N^{(j)}), Z_2(W_N^{(j)}), ..., Z_r(W_N^{(j)})) ∈ R_k(ε). Each j ∈ A_{k,n} refers to an r-bit symbol in which the r−k rightmost bits correspond to small values of Z_i(W_N^{(j)}). To transmit data over the channel, we write the data bits in these coordinates and encode them using the linear transformation G_N.

More specifically, let us order the coordinates j ∈ [N] by the increase of the quantity Σ_{i=1}^r 2^{i−1} Z_i(W_N^{(j)}) and use these numbers to locate the subsets A_{k,n}. We transmit data by encoding messages u_1^N = (u_1,...,u_N) in which, if j ∈ A_{k,n}, k = 0,...,r−1, then the symbol u_j is taken from the subset of symbols of X with the first k bits fixed and known to both the encoder and the decoder ([1] calls them frozen bits). In particular, the subset A_{r,n} is not used to transmit data. A polar codeword is computed as x_1^N = u_1^N G_N and sent over the channel.
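A sketch of the position-selection step (added here; not from the original text): given estimated vectors (Z_1(W_N^{(j)}),...,Z_r(W_N^{(j)})) — obtained, e.g., by Monte Carlo estimation, which is outside the scope of this sketch — each coordinate is assigned to the class A_{k,n} whose region R_k(ε) contains its vector; the r−k rightmost bits of such a coordinate then carry data.

```python
def partition_positions(Z_vectors, eps):
    """Assign position j to class k when (Z_1,...,Z_r) lies in R_k(eps).

    Z_vectors : list of per-position tuples (Z_1, ..., Z_r)
    eps       : threshold defining D_0 = [0, eps) and D_1 = (1 - eps, 1]
    Positions whose vector falls outside every R_k are returned under key None.
    """
    classes = {}
    for j, z in enumerate(Z_vectors):
        r = len(z)
        label = None
        for k in range(r + 1):
            if all(t > 1 - eps for t in z[:k]) and all(t < eps for t in z[k:]):
                label = k
                break
        classes.setdefault(label, []).append(j)
    return classes

# toy usage for three positions of an r = 2 construction
print(partition_positions([(0.99, 0.98), (0.97, 0.01), (0.02, 0.01)], eps=0.05))
```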
Decoding is performed using the "successive cancellation" procedure of [1] with the obvious constraints on the symbol values. Namely, for j = 1,...,N put

    û_j = u_j,                                              if j ∈ A_{r,n},
    û_j = argmax_x W_N^{(j)}(y_1^N, û_1^{j−1} | x),          if j ∈ ∪_{k≤r−1} A_{k,n},

where, if j ∈ A_{k,n}, k = 0,1,...,r−1, then the maximum is computed over the symbols x ∈ X with the fixed (known) values of the first k bits. The error probability of this decoding is estimated in Sect. II-E.

D. Proof of Theorem 1

Part (a) of Theorem 1 follows straightforwardly from [1], [3]. Namely, as shown in [1, Prop. 4], I(W^+) + I(W^−) = 2I(W). We note that the proof in [1] uses only the fact that u_1, u_2 are recoverable from x_1, x_2, which is true in our case. Hence the sequence I_n, n ≥ 1, forms a bounded martingale. By Doob's theorem [13, p.196], it converges a.e. and in L^1(Ω, F, P) to a random variable I_∞ with E(I_∞) = I(W).

To prove part (b) we show that each of the Z_{i,n}'s converges a.s. to a (0,1) Bernoulli random variable Z_{i,∞}. This convergence occurs in a concerted way in that the limit r.v.'s obey Z_{j,∞} ≥ Z_{i,∞} a.e. if j < i. This is shown by observing that for any fixed i = 1,...,r and for all v ∈ X_i, the Z_{v,n}(W) converge to identical copies of a Bernoulli random variable.

1) Convergence of Z_{v,n}, v ∈ X: In this section we shall prove that the Bhattacharyya parameters Z_{v,n} converge almost surely to Bernoulli random variables. The proof forms the main technical result of this paper and is accomplished in several steps.

Lemma 3: Let

    Z^{(j)}_{max}(W) = max_{v∈X_j} Z_v(W),  j = 1,...,r.

Then

    Z^{(r−j)}_{max}(W^+) = Z^{(r−j)}_{max}(W)²,  j = 0,...,r−1,   (11)
    Z^{(r)}_{max}(W^−) ≤ q Z^{(r)}_{max}(W),   (12)
    Z^{(r−1)}_{max}(W^−) ≤ (q/2) Z^{(r)}_{max}(W) + (q/2) Z^{(r−1)}_{max}(W),   (13)

and generally

    Z^{(r−j)}_{max}(W^−) ≤ (q/2) Z^{(r)}_{max}(W) + (q/4) Z^{(r−1)}_{max}(W) + ··· + (q/2^j) Z^{(r−j+1)}_{max}(W) + (q/2^j) Z^{(r−j)}_{max}(W).   (14)

Proof: In [3] it is shown that for all v ∈ X \ {0}

    Z_v(W^+) = Z_v(W)²,   (15)
    Z_v(W^−) ≤ 2 Z_v(W) + Σ_{δ ∈ X\{0,−v}} Z_δ(W) Z_{v+δ}(W).   (16)

The first of these two equations implies (11). Now take v ∈ X_r. Then in the sum on the right-hand side of (16) we have that either δ ∈ X_r or δ+v ∈ X_r, and therefore

    Z_v(W^−) ≤ 2 Z_v(W) + (q−2) Z^{(r)}_{max}(W),

implying (12). Now take v ∈ X_{r−j}, j ≥ 1. The sum on δ in (16) contains q/2 terms with δ ∈ X_r, q/4 terms with δ ∈ X_{r−1}, and so on, before reaching X_{r−j}. Finally, let δ ∈ ∪_{i=1}^{r−j} X_i \ {−v}. There are (q/2^j) − 2 possibilities, and for each of them either v+δ or δ is in X_{r−j}. This implies (14) and therefore also (13). ∎

In particular, take j = 0. Relations (11), (12) imply that

    Z^{(r)}_{max,n+1} = ( Z^{(r)}_{max,n} )²  if B_{n+1} = +,   (17)
    Z^{(r)}_{max,n+1} ≤ q Z^{(r)}_{max,n}   if B_{n+1} = −.   (18)

Iterated random maps of this kind were studied in [14], which contains general results on their convergence and stationary distributions.
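The sketch below (added illustration) simulates the extremal version of the recursion (17)-(18), taking the '−' step with equality Z ← min(1, qZ) — an assumption made only for illustration, since (18) is merely an upper bound — and reports the empirical fractions of trajectories that settle near 0 and near 1.

```python
import random

def simulate_extremal_map(z0, q, n_steps=60, trials=5000, seed=1):
    """Z <- Z^2 with prob. 1/2, else Z <- min(1, q*Z); returns fractions near 0 and 1."""
    rng = random.Random(seed)
    near_zero = near_one = 0
    for _ in range(trials):
        z = z0
        for _ in range(n_steps):
            z = z * z if rng.random() < 0.5 else min(1.0, q * z)
        near_zero += z < 1e-9
        near_one += z > 1 - 1e-9
    return near_zero / trials, near_one / trials

print(simulate_extremal_map(0.3, q=4))
```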
We need more detailed information about this process, established in the following lemma.

Lemma 4: Let U_n, n ≥ 0, be a sequence of random variables adapted to a filtration F_n with the following properties:
(i) U_n ∈ [0,1];
(ii) P(U_{n+1} = U_n² | F_n) ≥ 1/2;
(iii) U_{n+1} ≤ q U_n for some q ∈ Z_+.
Then there are events Ω_0, Ω_1 such that P(Ω_0 ∪ Ω_1) = 1 and U_n(ω) → i for ω ∈ Ω_i, i = 0,1.

Proof: (a) First let us rescale the process U_n so that in the neighborhood of zero it has a drift to zero. Let β ∈ (0,1) be such that

    q^β − 1 < 1/4.

Let X_n = U_n^β. Take τ(ω) to be the first time when X_n(ω) ≥ 1/2. Let Y_n = X_{min(n,τ)}. On the event Y_n ≥ 1/2 we have

    E(Y_{n+1} − Y_n | F_n) = 0,

while on the event Y_n < 1/2 we have

    E(Y_{n+1} − Y_n | F_n) ≤ (1/2)(Y_n² − Y_n) + (1/2)(q^β Y_n − Y_n) ≤ −(1/8) Y_n ≤ 0.

This implies that the sequence Y_n, n ≥ 0, forms a supermartingale which is bounded between 0 and 1. By the supermartingale convergence theorem, Y_n → Y_∞ a.e. and in L^1(Ω, F, P), where Y_∞ is a random variable supported on [0,1]. This implies that EY_0 ≥ EY_n ↓ EY_∞. Further, if X_0 ∈ [0,1/4] then (since EY_0 = EX_0)

    P(Y_∞ ≥ 1/2) ≤ 2 EY_0 ≤ 1/2.   (19)

(b) Now we shall prove that P(Y_∞ ∈ (δ, 1/2 − δ)) = 0 for any δ > 0. From (ii) it follows that P(X_{n+1} = X_n² | F_n) ≥ 1/2, which implies that

    P(Y_{n+1} = Y_n² | F_n) ≥ 1/2  on {Y_n < 1/2}   (20)

for all n ≥ 0. Suppose that Y_∞ takes values in (δ, 1/2 − δ) with probability α > 0. Let A_n = { ω : Y_n ∈ (δ, 1/2 − δ) }. Since Y_n → Y_∞ a.e., the Egorov theorem implies that there is a subset of probability arbitrarily close to P(A_n) on which this convergence is uniform, and thus P(A_n) ≥ α/2 for all sufficiently large n. Therefore

    P(|Y_{n+1} − Y_n| ≥ δ²/2) ≥ P(Y_{n+1} = Y_n², Y_n ∈ (δ, 1/2−δ)) ≥ α/4,

the last step by (20). This however contradicts the almost sure convergence of Y_n.

(c) This implies that P(Y_∞ < 1/2) = P(Y_n → 0) = P(U_n → 0). From (19),

    P(U_n → 0) ≥ 1/2  provided that U_0 ≤ (1/4)^{1/β}.   (21)

Moreover, if U_0 ≤ (1/2)^{1/β} then either Y_n → 0 or Y_n ≥ 1/2 for some n. This translates to

    P( (U_n → 0) or (U_n ≥ (1/2)^{1/β} for some n) ) = 1   (22)

provided that U_0 ≤ (1/2)^{1/β}.

(d) Let δ > 0 be such that q(1/2)^{1/β} < 1 − δ (depending on q this may require taking a sufficiently small β). Let L := [0, (1/4)^{1/β}] and R := [1−δ, 1]. Observe that the process U_n cannot move from L to R without visiting C := ((1/4)^{1/β}, 1−δ). Let σ_1 be the first time when U_n ∈ C, let η_1 be the first time after σ_1 when U_n ∈ L ∪ R, let σ_2 be the first time after η_1 when U_n ∈ C, etc., σ_1 < η_1 < σ_2 < η_2 < .... We shall prove that every sample path of the process eventually stays outside C, i.e., that for almost all ω there exists k = k(ω) < ∞ such that σ_k(ω) = ∞.

Assume the contrary, i.e., lim_{k→∞} P(σ_k < ∞) = α > 0 (since P(σ_{k+1} < ∞) ≤ P(σ_k < ∞), this limit exists). We have

    P(∃ k : σ_k = ∞) ≥ Σ_{j≥1} P(σ_j ≠ ∞; U_{η_j} ∈ L; σ_{j+1} = ∞)
     ≥ α Σ_{j≥1} P(U_{η_j} ∈ L; σ_{j+1} = ∞ | σ_j ≠ ∞).   (23)

Consider the process U'_n = U_{σ_k+n} on the event σ_k < ∞ (with the measure renormalized by P(σ_k < ∞)). This process has the same properties (i)-(iii) as U_n. Let J = ⌈ log_2( log_{1−δ}( (1/4)^{1/β} ) ) ⌉; then x^{2^J} ∈ L for any x ∈ C, by property (ii). Now consider the process U'_{J+n} on the event U'_J ∈ L. This process has properties (i)-(iii), so we can use (21) to conclude that

    P(U_{η_k} ∈ L; σ_{k+1} = ∞ | σ_k ≠ ∞) ≥ 2^{−(J+1)}

uniformly in k. But then the sum in (23) is equal to infinity, a contradiction.

(e) The proof is completed by showing that the probability of U_n staying in R^c = [0,1] \ R without converging to zero is zero. We know that almost all trajectories stay outside C, so suppose that the process starts in (0, (1/2)^{1/β}). Then the probability that it enters L in a finite number of steps is uniformly bounded from below (this is shown similarly to (23)), so the probability that it does not go to L is zero. Next assume that the process starts in L; then by (22) it either goes to zero or enters C with probability one. Together with part (d) this implies that the process that starts in L converges to zero or one with probability one. ∎

Lemma 5: Let V : X → Ỹ be a channel. Let v, v' ∈ X \ {0} be such that wt_r(v) ≥ wt_r(v'). For any δ' > 0 there exists δ > 0 such that Z_{v'}(V) ≥ 1 − δ' whenever Z_v(V) ≥ 1 − δ. In particular, we can take δ = δ' q^{−3}.

Proof: If wt_r(v) = 1 then v = 10...0, so the statement is trivial. Let wt_r(v) = i ≥ 2. If Z_v(V) ≥ 1 − δ, then for every pair x, x' = x+v we have Z(V_{x,x'}) ≥ 1 − ε, where ε = qδ. Consider the unit-length vectors z = (√(V(y|x)), y ∈ Ỹ), z' = (√(V(y|x')), y ∈ Ỹ), and let θ(z,z') be the angle between them. We have cos(θ(z,z')) = Z(V_{x,x'}) ≥ 1 − ε, and so ||z − z'||² = 2 − 2 cos(θ(z,z')) ≤ 2ε.

Now take a pair of symbols x_1, x_2 = x_1 + v', where v' ∈ X_s, s ≤ i. There exists a number t ≤ 2^{r−i+s} such that v' = tv. Define z_1 = (√(V(y|x_1)), y ∈ Ỹ) and z_2 = (√(V(y|x_2)), y ∈ Ỹ). Let w_j = (√(V(y|x_1+jv)), y ∈ Ỹ), j = 1,...,t−1. From the triangle inequality,

    ||z_1 − z_2|| ≤ ||z_1 − w_1|| + ||w_1 − w_2|| + ··· + ||w_{t−1} − z_2|| ≤ t √(2ε) ≤ q √(2ε).

We obtain

    Z(V_{x_1,x_2}) = cos(θ(z_1,z_2)) = 1 − (1/2) ||z_1 − z_2||² ≥ 1 − q²ε = 1 − q³δ.

Thus we obtain

    Z_{v'}(V) = (1/q) Σ_x Z(V_{x,x+v'}) ≥ 1 − q³δ. ∎
Thus, for 0 max,n ≥ − Remark : We can prove the previous lemma in a different n n there exists v , possibly depending on n, such 0 j ≥ ∈ X way by relating the Bhattacharyya distance to the ℓ1-distance thatZv,n(ω)≥1−δ.ThenLemma5impliesthatZv′,n(ω)≥ bδ′eqtw−3eecnanVb(ey|ixm1p)raonvdedVto(yδ|x=2)δ′[(125q]).−T2h.en the estimate δ = i1f−ωq3δΩf(jo)rtahlelnv′Z∈ X(ωj,)so Z0v,fno(rωa)ll→v 1. At. the same time, ∈ 0 v,n → ∈Xj Lemma 6: For all j =1,...,r Z(j) a.e. Z(j) . 2) Proof of Part (b) of Theorem 1: max,n −→ max,∞ Lemma 8: For any i = 1,...,r, the random variable where Zm(ja)x,∞ is a Bernoulli random variable supported on Zi,n converges a.e. to a (0,1)-valued random variable Zi,∞. 0,1 . Moreover, Z Z a.e. { } i,∞ ≥ i−1,∞ Proof: For a given channel V denote Proof: The first part follows because all the Z ,v v i ∈ X Z[s,r](V)=max(Z(s) (V),Z(s+1)(V),...,Z(r) (V)). converge to identical copies of the same random variable. max max max max Formally, Lemma 7 asserts that Z j for every v v,n i → ∈ X Eq. (15) gives us that and every ω Ω(i),j = 0,1. Hence taking the limit n ∈ j →∞ Zm[ra−xj,r](W+)=(Zm[ra−xj,r](W))2 in(6)weseethatZi,n →j onΩj(i) whereP(Ω0(i)∪Ω(1i))=1. Let us prove the second part. Suppose that Z 1 ε′, and (14) implies that i,n ≥ − then using (6) we see that Zv′,n 1 2i−1ε′ for all v′ i. ≥ − ∈X Zm[ra−xj,r](W−)≤qZm[ra−xj,r](W). Le,mwmt a(v5) i=mpil,ieasndthattheZrevf,onre≥Z1 − 23r1+i−12ε3′r+foi−r1aεn′.yTvhu∈s Hence by Lemma 4 the random variables Zm[ra−xj,,∞r] are well- XZ (ωr) 1 implies Z (ω) i,1nf≥or all−ω Ω (i) and all i,n i−1 1 defined and are Bernoulli 0-1 valued a.e. for all j = → → ∈ i. The second claim of the lemma now follows because Z i,∞ 0,1,...,r 1. are 0-1 valued for all i. We need−to prove the same for Z(r−j) . The proof is max,∞ by induction on j. We just established the needed claim for We obtain that Z is a (0,1) random variable a.e. and i,∞ Z(r) .Foreaseofunderstandingletusshowthatthisimplies for all i, and if Z = 1 then Z = 1 for all 1 j < i. max,n i,∞ j,∞ ≤ theconvergenceofZ(r−1).Indeed,Z[r−1,r] isaBernoulli0-1 Consider the events Ψ(j) = ω : Z = i ,i = 0,1;j = max,n max,∞ i { j,∞ } 1,...,r. We have arise with probability 0), and therefore has to be, and is, a stable pointofthe channelcombiningoperation.It is possible Ψ(1) Ψ(2) Ψ(r) 1 ⊃ 1 ⊃···⊃ 1 to reach capacity by transmitting the least significant bit of Ψ(1) Ψ(2) Ψ(r). every symbol. 0 ⊂ 0 ⊂···⊂ 0 Paper [3] went on to show that for every n 1 there We need to prove that with probability one, the vector ≥ exists a permutation π : such that the kernels n (Z ,i = 1,...,r) takes one of the values (8). With X → X i,∞ H (n):(u,v) (u+v,π (v))leadtochannelsthatpolarize 2 n probability one Z = 1 or 0. If it is equal to 1 then → r,∞ to perfect or fully noisy. While the result of [3] holds for necessarily Zr−1,∞ =···=Z1,∞ =1. Otherwise Zr,∞ =0. any q, in the case of q = 2r this means that configurations In this case it is possible that Z = 1 (in which case r−1,∞ 00...0and11...1arisewithprobability1 I(W)andI(W) Z = = Z = 1) or Z = 0. Of course − r−2,∞ 1,∞ r−1,∞ respectively,whilealltheotherconfigurationshaveprobability ··· P(Ψ(0r−1)∪Ψ(1r−1))=1, so in particular zero. P(Ψ(r) (Ψ(r−1) (Ψ(r−1) Ψ(r))))=0. E. Rate of polarization and error probability of decoding 0 \ 0 ∪ 1 \ 1 If Z =0 then the possibilities are Z =1 or 0, up The following theorem, due to Arıkan and Telatar [16], is r−1,∞ r−2,∞ to another event of probability 0, and so on. 
We need to prove that with probability one, the vector (Z_{i,∞}, i = 1,...,r) takes one of the values (8). With probability one Z_{r,∞} = 1 or 0. If it is equal to 1 then necessarily Z_{r−1,∞} = ··· = Z_{1,∞} = 1. Otherwise Z_{r,∞} = 0. In this case it is possible that Z_{r−1,∞} = 1 (in which case Z_{r−2,∞} = ··· = Z_{1,∞} = 1) or Z_{r−1,∞} = 0. Of course P(Ψ_0^{(r−1)} ∪ Ψ_1^{(r−1)}) = 1, so in particular

    P( Ψ_0^{(r)} \ ( Ψ_0^{(r−1)} ∪ ( Ψ_1^{(r−1)} \ Ψ_1^{(r)} ) ) ) = 0.

If Z_{r−1,∞} = 0 then the possibilities are Z_{r−2,∞} = 1 or 0, up to another event of probability 0, and so on. Thus, the union of the disjoint events given by (8) holds with probability one. Theorem 1 is proved. ∎

3) Proof of Prop. 1: The proof is analogous to the argument in the previous paragraph. The random variable Z_{r,n} → Z_{r,∞} a.e. By the Egorov theorem, for any γ > 0 there are disjoint subsets Ψ̃_0^{(r)} ⊂ Ψ_0^{(r)}, Ψ̃_1^{(r)} ⊂ Ψ_1^{(r)} with P(Ψ̃_0^{(r)} ∪ Ψ̃_1^{(r)}) ≥ 1 − γ on which this convergence is uniform. Take n_1^{(r)} such that Z_{r,n} > 1 − ε/2^{4r−1} for every ω ∈ Ψ̃_1^{(r)} and n ≥ n_1^{(r)}. By Lemma 5 and (6), for every such ω we have Z_{i,n} ≥ 1 − ε for all i = 1,...,r−1; n ≥ n_1^{(r)}. This gives rise to the event B_{r,n}. Otherwise let n_0^{(r)} be such that Z_{r,n}(ω) < ε for ω ∈ Ψ̃_0^{(r)} and n ≥ n_0^{(r)}. Consider the events Ψ̃_0^{(r−1)} ⊂ Ψ_0^{(r−1)}, Ψ̃_1^{(r−1)} ⊂ Ψ_1^{(r−1)} with P(Ψ̃_0^{(r−1)} ∪ Ψ̃_1^{(r−1)}) ≥ 1 − γ on which Z_{r−1,n} → Z_{r−1,∞} uniformly. Choose n_1^{(r−1)} such that Z_{r−1,n} > 1 − ε/2^{4r−2} for all n ≥ n_1^{(r−1)} and all ω ∈ Ψ̃_1^{(r−1)}. For every such ω we have Z_{i,n} ≥ 1 − ε for all i = 1,...,r−2; n ≥ n_1^{(r−1)}. Next,

    P( Ψ̃_0^{(r)} \ ( Ψ̃_0^{(r−1)} ∪ ( Ψ̃_1^{(r−1)} \ Ψ̃_1^{(r)} ) ) ) ≤ 2γ.

We continue in this manner until we construct all the r+1 events B_{k,n}. For this, n should be taken sufficiently large, n ≥ max_k max(n_0^{(k)}, n_1^{(k)}). By taking γ = δ/r we can ensure that P(∪_k B_{k,n}) ≥ 1 − δ. This concludes the proof. ∎

Remark: For binary-input channels, the transmitted bits in the limit are transmitted either perfectly or carry no information about the message. Şaşoğlu et al. [3] observed that q-ary codes constructed using Arıkan's kernel H_2 share this property for transmitted symbols only if q is prime. Otherwise, [3] notes, the symbols can polarize to states that carry partial information about the transmission. In particular, they give an example of a quaternary-input channel W : {0,1,2,3} → {0,1} with W(0|0) = W(0|2) = W(1|1) = W(1|3) = 1. This channel has capacity 1 bit. Computing the channels W^+ and W^−, we find that they are equivalent to the original channel W. The conclusion reached in [3] was that there are nonbinary channels that do not polarize under the action of H_2.

We observe that the above channel corresponds to the extremal configuration 10 in (8) (the other two configurations arise with probability 0), and therefore has to be, and is, a stable point of the channel combining operation. It is possible to reach capacity by transmitting the least significant bit of every symbol. Paper [3] went on to show that for every n ≥ 1 there exists a permutation π_n : X → X such that the kernels H_2(n) : (u,v) → (u+v, π_n(v)) lead to channels that polarize to perfect or fully noisy. While the result of [3] holds for any q, in the case of q = 2^r this means that the configurations 00...0 and 11...1 arise with probability 1−I(W) and I(W), respectively, while all the other configurations have probability zero.
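The following sketch (an added illustration, not from [3] or the original text) constructs this quaternary channel, applies one combining step (2)-(3), and confirms numerically that the parameters Z_v are unchanged, i.e., that the channel is a fixed point of the transform.

```python
import numpy as np

def Z_v(W, v):
    """Z_v(W) = (1/q) * sum_x sum_y sqrt(W(y|x) W(y|x+v))."""
    q = W.shape[0]
    return float(sum(np.sum(np.sqrt(W[x] * W[(x + v) % q])) for x in range(q)) / q)

q = 4
W = np.zeros((q, 2))
for x in range(q):
    W[x, x % 2] = 1.0            # the channel outputs the parity of its input

# one combining step, Eqs. (2)-(3); output tuples are flattened to single indices
Wm = np.zeros((q, 4))            # W^-: output (y1, y2)
Wp = np.zeros((q, 4 * q))        # W^+: output (y1, y2, u1)
for u1 in range(q):
    for u2 in range(q):
        for y1 in range(2):
            for y2 in range(2):
                p = W[(u1 + u2) % q, y1] * W[u2, y2]
                Wm[u1, 2 * y1 + y2] += p / q
                Wp[u2, (2 * y1 + y2) * q + u1] += p / q

for v in range(1, q):
    print(v, Z_v(W, v), Z_v(Wm, v), Z_v(Wp, v))   # the three columns coincide
```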
E. Rate of polarization and error probability of decoding

The following theorem, due to Arıkan and Telatar [16], is useful in quantifying the rate of convergence of the channels W_n to one of the extremal configurations (8).

Theorem 3 ([16]): Suppose that a random process U_n, n ≥ 0, satisfies the conditions (i)-(iii) of Lemma 4 and that (iv) U_n converges a.e. to a {0,1}-valued random variable U_∞ with P(U_∞ = 0) = p. Then for any α ∈ (0,1/2),

    lim_{n→∞} P(U_n < 2^{−2^{αn}}) = p.   (24)

If condition (iii) is replaced with (iii') U_n ≤ U_{n+1} and U_0 > 0, then for any α > 1/2,

    lim_{n→∞} P(U_n < 2^{−2^{αn}}) = 0.

Note that, as a consequence of Lemma 4, assumption (iv) in this theorem is superfluous in that it follows from (i)-(iii).

The processes Z^{(r)}_{max,n} and Z^{[r−j,r]}_{max,n}, j = 0,...,r−1, satisfy conditions (i)-(iii) of Lemma 4. Hence the above theorem gives the rate of convergence of each of them to zero. We argue that the convergence rate of Z^{(r−j)}_{max,n}, j ≥ 1, to zero is also governed by Theorem 3. Indeed, let Ω̃_i^{[r−j,r]} = { ω : Z^{[r−j,r]}_{max,n} → i }, Ω̃_i^{(r−j)} = { ω : Z^{(r−j)}_{max,n} → i }, i = 0,1. Then

    Ω̃_0^{(r−j)} ⊇ Ω̃_0^{[r−j,r]}  and  Ω̃_1^{(r−j)} = Ω̃_1^{[r−j,r]},   (25)

the last equality because, by Lemma 5, Z^{[r−j,r]}_{max,n} → 1 implies Z^{(r−j)}_{max,n} → 1 for every trajectory. As a consequence of (25) we have that P(Ω̃_0^{(r−j)} \ Ω̃_0^{[r−j,r]}) = 0. Hence P(Z^{(r−j)}_{max,∞} = 0) = P(Z^{[r−j,r]}_{max,∞} = 0). Denote this common value by p_j. The random variable Z^{[r−j,r]}_{max,n} satisfies a condition of the form (24) with p = p_j. We obtain that for any α ∈ (0,1/2),

    lim_{n→∞} P(Z^{(r−j)}_{max,n} < 2^{−2^{αn}}) = lim_{n→∞} P(Z^{[r−j,r]}_{max,n} < 2^{−2^{αn}}) = p_j.

Of course, if Z^{(r−j)}_{max,n} is small then so is every Z_{v,n} for v ∈ X_{r−j}. We conclude as follows.

Proposition 4: For any α ∈ (0,1/2) and any v ∈ X_j, j = 1,2,...,r,

    lim_{n→∞} P(Z_{v,n} < 2^{−2^{αn}}) = p_j.

This result enables us to estimate the probability of decoding error under successive cancellation decoding. To do this, we extend the argument of [1] to nonbinary alphabets.

The following statement follows directly from the previously established results, notably Proposition 2.

Theorem 4: Let 0 < α < 1/2. For any DMC W : X → Y with I(W) > 0 and any R < I(W) there exists a sequence of r-tuples of disjoint subsets A_{0,N}, ..., A_{r−1,N} of [N] such that Σ_k |A_{k,N}| (r−k) ≥ NR and Z_v(W_N^{(i)}) < 2^{−N^α} for all i ∈ A_{k,N}, all v ∈ ∪_{l=k+1}^r X_l, and all k = 0,1,...,r−1.

Let

    E ≜ { (u_1^N, y_1^N) ∈ X^N × Y^N : û_1^N ≠ u_1^N },
    B_i ≜ { (u_1^N, y_1^N) ∈ X^N × Y^N : û_1^{i−1} = u_1^{i−1}, û_i ≠ u_i }.

Then the block error probability of decoding is defined as

    P_e = P(E) = P( ∪_{i ∈ A_{0,N}∪···∪A_{r−1,N}} B_i ).

The next theorem is the main result of this section.

Theorem 5: Let 0 < α < 1/2 and let 0 < R < I(W), where W : X → Y is a DMC. The best achievable probability of block error under successive cancellation decoding at block length N = 2^n and rate R satisfies

    P_e = O(2^{−N^α}).

Proof: Let

    E_{i,v} ≜ { (u_1^N, y_1^N) ∈ X^N × Y^N : W_N^{(i)}(y_1^N, u_1^{i−1} | u_i) ≤ W_N^{(i)}(y_1^N, u_1^{i−1} | u_i + v) }.

For a fixed value of a_1^k = (a_1, a_2, ..., a_k) ∈ {0,1}^k let us define X(a_1^k) = { x ∈ X : x_1^k = a_1^k }. Notice that the decoder finds û_i, i ∈ A_{k,N}, by taking the maximum over the symbols x ∈ X(a_1^k). Then we obtain

    B_i ⊆ ∪_{v ∈ X(a_1^k)} E_{i,v}.

Using (5), we obtain

    P(B_i) ≤ Σ_{v∈X(a_1^k)} P(E_{i,v})
     = Σ_{v∈X(a_1^k)} Σ_{u_1^N, y_1^N} (1/q^N) W_N(y_1^N | u_1^N) 1_{E_{i,v}}(u_1^N, y_1^N)
     ≤ Σ_{v∈X(a_1^k)} Σ_{u_1^N, y_1^N} (1/q^N) W_N(y_1^N | u_1^N) √( W_N^{(i)}(y_1^N, u_1^{i−1} | u_i+v) / W_N^{(i)}(y_1^N, u_1^{i−1} | u_i) )
     = Σ_{v∈X(a_1^k)} (1/q) Σ_{u_i} Z( W_{N,{u_i, u_i+v}}^{(i)} )
     = Σ_{v∈X(a_1^k)} Z_v( W_N^{(i)} ).

Thus the decoding error is bounded by

    P(E) ≤ Σ_{i ∈ A_{0,N}∪···∪A_{r−1,N}} Σ_{v∈X(a_1^k)} Z_v( W_N^{(i)} ).

By Theorem 4, for any R < I(W) there exists a sequence of r-tuples of disjoint subsets A_{0,N}, ..., A_{r−1,N} with Σ_k |A_{k,N}| (r−k) ≥ NR such that

    Σ_{i ∈ A_{0,N}∪···∪A_{r−1,N}} Σ_{v∈X(a_1^k)} Z_v( W_N^{(i)} ) ≤ qN 2^{−N^α},

and thus we obtain that P(E) = O(2^{−N^α}). ∎
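A small sketch of this union bound (added illustration): given a table of the values Z_v(W_N^{(j)}) — whose estimation is assumed to be done elsewhere — it accumulates the bound of Theorem 5 over the data-carrying positions; the restriction of the competing differences v to those whose first k bits vanish follows the LSB-first bit convention assumed in the earlier sketches.

```python
def sc_error_bound(Zv, info_sets):
    """Union bound of Theorem 5: P(E) <= sum over data positions i of sum_v Z_v(W_N^(i)).

    Zv[j][v] : assumed table of Z_v(W_N^(j)) for v = 1,...,q-1 (index 0 unused)
    info_sets: dict mapping k -> list of positions in A_{k,N}
    """
    q = len(Zv[0])
    bound = 0.0
    for k, positions in info_sets.items():
        step = 1 << k                      # differences with the first k bits equal to 0
        for j in positions:
            bound += sum(Zv[j][v] for v in range(step, q, step))
    return bound
```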
The ordered symmetric v∈XX(ak1)uN1X,y1N channel mod|els tran0smission ov∈erXr parallel links such that, 1 W(i)(yN,ui−1 u +v) if in a given time slot a bit is received incorrectly, the bits ≤v∈XX(ak1)uN1X,y1N qNWN(y1N|uN1 )vuu WNN(i)(1y1N,1ui1−|1i|ui) wpriothpoisneddiciens[l1o9w]earsthananabitstarraecteioqnuiopfrotbraanbslem.iTsshioisnsiynstwemirewleasss 1 t fading environment. The capacity of the channel equals = Z(W(i) ) q N,{ui,ui+v} v∈XX(ak1)Xui I(W)=r+ε log ε + r ε log εi . = Z (W(i)). 0 q 0 i q qi−1(q 1) v N Xi=1 (cid:16) − (cid:17) v∈XX(ak1) By Theorem 1 q-ary polar codes, q = 2r can be used to Thus the decoding error is bounded by transmit at rates close to capacity on this channel; moreover, the domination pattern that emerges, exactly matches the P( ) Z (W(i)). E ≤ v N fading nature of the bundle of r parallel channels, achieving i∈A0,N∪X···∪Ar−1,Nv∈XX(ak1) the capacity of the system discussed above. (cid:2) Let z = (z ,...,z ) be an k-tuple of symbols from . 1 k (cid:7)(cid:9)(cid:4) DefinetheprobabilitydistributionP(y z)= 1 k W(y zX). (cid:7)(cid:9)(cid:6) Define a B-DMC W(k) : k | wkith iin=p1uts z(|i)i (cid:7)(cid:9)(cid:3) {z(1),z(2)} X → Y P ∈ (cid:7)(cid:9)(cid:2) k, where the transition z(i) y is given by P(y z(i)), i= (cid:21)(cid:22)(cid:23)(cid:24) (cid:7) 1X,2. → | (cid:12) (cid:12)(cid:20) (cid:1)(cid:9)(cid:4) Lemma 9: The Bhattacharyya parameter of the chan- (cid:10) (cid:1)(cid:9)(cid:6) nel W(k) , where z(1) = (x ,...,x ),z(2) = {z(1),z(2)} 1 k (cid:1)(cid:9)(cid:3) (x ,...,x ), can be lower bounded by k+1 2k (cid:1)(cid:9)(cid:2) k (cid:1) 1 (cid:1) (cid:2)(cid:1)(cid:3)(cid:4) (cid:3)(cid:1)(cid:5)(cid:6) (cid:6)(cid:7)(cid:3)(cid:3) (cid:4)(cid:7)(cid:5)(cid:2) (cid:7)(cid:1)(cid:2)(cid:3)(cid:1) (cid:7)(cid:2)(cid:2)(cid:4)(cid:4) (cid:7)(cid:3)(cid:8)(cid:8)(cid:6) (cid:7)(cid:6)(cid:8)(cid:4)(cid:3) Z(W{(zk()1),z(2)})≥ k Z(W{xj,xf(j)}) (27) (cid:10)(cid:11)(cid:12)(cid:13)(cid:13)(cid:14)(cid:15)(cid:16)(cid:17)(cid:13)(cid:18)(cid:14)(cid:19) Xj=1 for any f which is a one-to-one mapping from the set Fig. 1. 3-level polarization on the ordered erasure channel W : X → 1,2,...,k to k+1,...,2k . Y,X = {00,01,10,11} with transition probabilities ε0 := W(00|00) = { } { } 0.5,ε1 := W(?x2|x1x2) = 0.4,ε2 := W(??|x1,x2) = 0.1, for all Proof: It suffices to prove the above inequality for some x1,x2∈{0,1}.InthisexampleitiseasytoseethatP(I∞=i)=εi,i= one-to-one mapping. Let f(i) = k + i. For brevity denote 0,1,2. w =W(y x ). We have i,y i | (cid:3) k 2k 1 (cid:5) Z(W{(zk()1),z(2)})= k v wi,y wi,y , (cid:10) Xy uu(cid:18)Xi=1 (cid:19)(cid:18)i=Xk+1 (cid:19) t (cid:4) while the right hand side of (27) is (cid:16)(cid:17)(cid:18)(cid:17)(cid:19)(cid:20)(cid:21)(cid:22) (cid:2)(cid:9) k1 k Z(W{xj,xf(j)})= k1 k √wi,ywk+i,y. j=1 y i=1 (cid:8) X XX The Cauchy-Schwartz inequality gives us (cid:7) (cid:6) k 2k k 2 (cid:1) wi,y wi,y ≥ √wi,ywk+i,y (cid:1) (cid:2)(cid:1)(cid:3)(cid:4) (cid:5)(cid:6)(cid:3)(cid:7) (cid:6)(cid:7)(cid:7)(cid:5)(cid:5) (cid:6)(cid:4)(cid:8)(cid:5)(cid:2) (cid:7)(cid:1)(cid:2)(cid:5)(cid:1) (cid:7)(cid:2)(cid:9)(cid:10)(cid:4) (cid:7)(cid:5)(cid:4)(cid:10)(cid:7) (cid:8)(cid:7)(cid:10)(cid:4)(cid:5) (cid:16)Xi=1 (cid:17)(cid:16)i=Xk+1 (cid:17) (cid:16)Xi=1 (cid:17) (cid:11)(cid:12)(cid:13)(cid:14)(cid:15) hence the lemma. Fig.2. 10-levelpolarizationontheorderederasurechannelW :{0,1}9→ Let us introduce some notation. Given z = (z1,...,zk) Y withtransition probabilities εi=0.1,i=0,1,...,9. 
IV. CONCLUSION

The result of this paper offers more detailed information about polarization on q-ary channels, q = 2^r. The multilevel polarization adds flexibility to the design of the transmission scheme in that we can adjust the number of symbols that carry a given number of bits to a specified proportion of the overall transmission, as long as the total number of bits is fixed. This could be useful in the design of signal constellations for coded modulation, including BICM [17], [18], as well as in other communication problems that can benefit from nonuniform symbol sets.

The authors are grateful to Emmanuel Abbe, Eren Şaşoğlu, and Emre Telatar (EPFL), and Leonid Koralov, Armand Makowski, and Himanshu Tyagi (UMD) for useful discussions of this work. This research was partially supported by NSF grants CCF0916919, CCF0830699, and DMS1117852.

APPENDIX

The proof of (10): We shall break the expression for I(W) into a sum of symmetric capacities of B-DMCs.

Let z = (z_1,...,z_k) be a k-tuple of symbols from X. Define the probability distribution P(y|z) = (1/k) Σ_{i=1}^k W(y|z_i). Define a B-DMC W^{(k)}_{z^{(1)},z^{(2)}} : {z^{(1)}, z^{(2)}} → Y with inputs z^{(i)} ∈ X^k, where the transition z^{(i)} → y is given by P(y|z^{(i)}), i = 1,2.

Lemma 9: The Bhattacharyya parameter of the channel W^{(k)}_{z^{(1)},z^{(2)}}, where z^{(1)} = (x_1,...,x_k), z^{(2)} = (x_{k+1},...,x_{2k}), can be lower bounded by

    Z( W^{(k)}_{z^{(1)},z^{(2)}} ) ≥ (1/k) Σ_{j=1}^k Z( W_{x_j, x_{f(j)}} )   (27)

for any f which is a one-to-one mapping from the set {1,2,...,k} to {k+1,...,2k}.

Proof: It suffices to prove the above inequality for some one-to-one mapping. Let f(i) = k + i. For brevity denote w_{i,y} = W(y|x_i). We have

    Z( W^{(k)}_{z^{(1)},z^{(2)}} ) = Σ_y √( (1/k) Σ_{i=1}^k w_{i,y} · (1/k) Σ_{i=k+1}^{2k} w_{i,y} ),

while the right-hand side of (27) is

    (1/k) Σ_{j=1}^k Z( W_{x_j, x_{f(j)}} ) = (1/k) Σ_y Σ_{i=1}^k √( w_{i,y} w_{k+i,y} ).

The Cauchy-Schwarz inequality gives us

    ( Σ_{i=1}^k w_{i,y} ) ( Σ_{i=k+1}^{2k} w_{i,y} ) ≥ ( Σ_{i=1}^k √( w_{i,y} w_{k+i,y} ) )²,

hence the lemma. ∎
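As a quick numerical check (added illustration), the sketch below draws a random q-ary channel, forms the two mixture inputs z^{(1)}, z^{(2)} from disjoint symbol sets, and verifies inequality (27) for the pairing f(j) = k + j.

```python
import numpy as np

rng = np.random.default_rng(0)
q, ny, k = 8, 5, 4
W = rng.random((q, ny)); W /= W.sum(axis=1, keepdims=True)      # random q-ary DMC

x = rng.permutation(q)[:2 * k]            # x_1..x_k | x_{k+1}..x_{2k}
P1 = W[x[:k]].mean(axis=0)                # P(y|z^(1)) = (1/k) sum_i W(y|x_i)
P2 = W[x[k:]].mean(axis=0)                # P(y|z^(2))

lhs = np.sum(np.sqrt(P1 * P2))                                   # Z of the mixed B-DMC
rhs = np.mean([np.sum(np.sqrt(W[x[j]] * W[x[k + j]])) for j in range(k)])
print(float(lhs), float(rhs), bool(lhs >= rhs - 1e-12))
```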
Let us introduce some notation. Given z = (z_1,...,z_k) ∈ X^k, let z ⊕ x = (z_1 ⊕ x, ..., z_k ⊕ x), where ⊕ is a bitwise modulo-2 summation. In the next lemma we consider B-DMCs W^{(k)}_{z_m^{(1)}, z_m^{(2)}}, k = 2^{m−1}, m = 1,...,r, with inputs of special form. Namely, z_1^{(1)} = x_1; z_2^{(1)} = (x_1, x_1 ⊕ x_2); z_3^{(1)} = (x_1, x_1 ⊕ x_2, x_1 ⊕ x_3, x_1 ⊕ x_2 ⊕ x_3); and generally, z_m^{(1)} is formed of x_1 plus all the possible sums of the vectors x_2,...,x_m with 0-1 coefficients, including the empty one. Finally, z_m^{(2)} = z_m^{(1)} ⊕ x_{m+1}.

For m = 0,1,...,r−1 introduce the set A = A(x_1,...,x_{m+1}) ⊂ X^{m+1} as follows:

    A = { (x_1,...,x_{m+1}) ∈ X^{m+1} : x_1 ∈ X; x_2 ∈ X\{0}; x_j ≠ Σ_{i=2}^{j−1} a_i x_i for all choices of a_i ∈ {0,1}, j = 3,...,m+1 }.

We need the following technical lemma.

Lemma 10:

    I(W) = Σ_{m=1}^r ( (1/2^r) Π_{j=1}^m 1/(2^r − 2^{j−1}) ) Σ_{A(x_1,...,x_{m+1})} I( W^{(k)}_{z_m^{(1)}, z_m^{(2)}} ),   (28)

where the number k, the vectors z_m^{(1)}, z_m^{(2)}, and the set A(x_1,...,x_{m+1}) are defined before the lemma.

Proof: First we express the capacity of W as the sum of symmetric capacities of B-DMCs. Write P(y) = (1/2^r) Σ_x W(y|x). Then

    I(W) = (1/2^r) Σ_x Σ_y W(y|x) log( W(y|x) / P(y) )

     = (1/(2^r · 2(2^r−1))) Σ_y Σ_{x_1} Σ_{x_2≠0} [ W(y|x_1) log( W(y|x_1)/P(y) ) + W(y|x_1⊕x_2) log( W(y|x_1⊕x_2)/P(y) ) ]

     = (1/(2^r(2^r−1))) Σ_y Σ_{x_1, x_2≠0} [ (1/2) W(y|x_1) log( W(y|x_1) / ((1/2)(W(y|x_1)+W(y|x_1⊕x_2))) )
        + (1/2) W(y|x_1⊕x_2) log( W(y|x_1⊕x_2) / ((1/2)(W(y|x_1)+W(y|x_1⊕x_2))) )
        + (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) log( (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) / P(y) ) ]

     = (1/(2^r(2^r−1))) { Σ_{x_1, x_2≠0} I( W_{x_1, x_1⊕x_2} ) + T_2 },

where

    T_2 = Σ_{x_1, x_2≠0} Σ_y (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) log( (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) / P(y) ).

Observe that the condition x_2 ≠ 0 is needed in order to obtain the expression for I(W_{x_1, x_1⊕x_2}).

We will apply the same technique repeatedly. In the next step we add another sum, this time on x_3, which has to satisfy the conditions x_3 ≠ 0, x_3 ≠ x_2. We have

    T_2 = (1/(2(2^r−2))) Σ_y Σ_{A(x_1,x_2,x_3)} [ (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) log( (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) / P(y) )
        + (1/2)(W(y|x_1⊕x_3)+W(y|x_1⊕x_2⊕x_3)) log( (1/2)(W(y|x_1⊕x_3)+W(y|x_1⊕x_2⊕x_3)) / P(y) ) ]

     = (1/(2^r−2)) Σ_y Σ_{A(x_1,x_2,x_3)} [ (1/2) · (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) log( (1/2)(W(y|x_1)+W(y|x_1⊕x_2)) / B )
        + (1/2) · (1/2)(W(y|x_1⊕x_3)+W(y|x_1⊕x_2⊕x_3)) log( (1/2)(W(y|x_1⊕x_3)+W(y|x_1⊕x_2⊕x_3)) / B )
        + B log( B / P(y) ) ],

where B = (1/4)( W(y|x_1) + W(y|x_1⊕x_2) + W(y|x_1⊕x_3) + W(y|x_1⊕x_2⊕x_3) ).

By now it is clear what we want to accomplish. Let us again take the sum on y inside. Recalling the definition of the channel W^{(k)} before Lemma 9, we obtain

    T_2 = (1/(2^r−2)) { Σ_{A(x_1,x_2,x_3)} I( W^{(2)}_{z_2^{(1)}, z_2^{(2)}} ) + T_3 },

where I(W^{(2)}_{z_2^{(1)}, z_2^{(2)}}) is the symmetric capacity of the B-DMC W^{(2)}_{z_2^{(1)}, z_2^{(2)}} with z_2^{(1)} = {x_1, x_1⊕x_2} and z_2^{(2)} = {x_1⊕x_3, x_1⊕x_2⊕x_3}, and T_3 is the term remaining in the expression for T_2 upon isolating this capacity:

    T_3 = Σ_y Σ_{A(x_1,x_2,x_3)} B log( B / P(y) ).

Now repeat the above trick for T_3, namely, average over all the linear combinations that this time include the vector x_4, and isolate the symmetric capacity of the channel W^{(k)} that arises. Proceeding in this manner, we obtain

    I(W) = (1/(2^r(2^r−1))) Σ_{x_1, x_2≠0} I( W_{x_1, x_1⊕x_2} )
     + (1/(2^r(2^r−1)(2^r−2))) Σ_{A(x_1,x_2,x_3)} I( W^{(2)}_{z_2^{(1)}, z_2^{(2)}} )
     + (1/(2^r(2^r−1)(2^r−2))) Σ_y Σ_{A(x_1,x_2,x_3)} B log( B / P(y) )
     = ...
     = Σ_{m=1}^r ( (1/2^r) Π_{j=1}^m 1/(2^r − 2^{j−1}) ) Σ_{A(x_1,...,x_{m+1})} I( W^{(k)}_{z_m^{(1)}, z_m^{(2)}} ),

where the notation z_m^{(1)}, z_m^{(2)}, A(x_1,...,x_{m+1}) is introduced before the statement of the lemma. ∎

We continue with the proof of inequality (10). The term with m = 1 in (28) equals

    (1/(2^r(2^r−1))) Σ_{x_1, x_2≠0} I( W_{x_1, x_1⊕x_2} )
     ≤ (1/(2^r(2^r−1))) Σ_{x_1, x_2≠0} √( 1 − Z(W_{x_1, x_1⊕x_2})² )
     = (1/(2^r(2^r−1))) Σ_{d=1}^r Σ_{x_1} Σ_{x_2 : wt_r(x_2)=d} √( 1 − Z(W_{x_1, x_1⊕x_2})² )
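Finally, the sketch below (an added illustration; the bit convention wt_r(v) = v.bit_length() is an assumption) checks inequality (10) numerically on a random channel with q = 2^r.

```python
import numpy as np

rng = np.random.default_rng(1)
r = 3; q = 1 << r; ny = 6
W = rng.random((q, ny)); W /= W.sum(axis=1, keepdims=True)

Py = W.mean(axis=0)
I_W = float(np.sum(W * np.log2(W / Py)) / q)                     # symmetric capacity

def Z_v(v):
    return float(sum(np.sum(np.sqrt(W[x] * W[(x + v) % q])) for x in range(q)) / q)

Z = [sum(Z_v(v) for v in range(1, q) if v.bit_length() == i) / (1 << (i - 1))
     for i in range(1, r + 1)]                                   # Eq. (6)

bound = sum(np.sqrt(1 - z * z) for z in Z)
print(I_W, float(bound), I_W <= bound + 1e-12)                   # inequality (10)
```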