Nested Polar Codes Achieve the Shannon Rate-Distortion Function and the Shannon Capacity Aria G. Sahebi and S. Sandeep Pradhan Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA. Email: [email protected], [email protected] Abstract—It is shown that nested polar codes achieve the denote the source by (X,U,p ,d). With a slight abuse of X 4 Shannon capacity of arbitrary discrete memoryless sources and notation, for xn ∈Xn and un ∈Un, we define theShannoncapacityofarbitrarydiscretememorylesschannels. 1 1 (cid:88)n 0 d(xn,un)= d(x ,u ) i i n 2 i=1 n I. INTRODUCTION For the channel coding problem, we consider discrete a memoryless and stationary channels used without feedback. Polar codes were originally proposed by Arikan in [1] J We associate two finite sets X and Y with the channel as to achieve the symmetric capacity of binary-input discrete 5 the channel input and output alphabets. These channels can memoryless channels. Polar codes for lossy source coding 2 were investigated in [2] where it is shown that polar codes be characterized by a conditional probability law W(y|x) for ] achievethesymmetricrate-distortionfunctionforsourceswith x ∈ X and y ∈ Y. The channel is specified by (X,Y,W). T The source of information generates messages over the set binaryreconstructionalphabets.Forthelosslesssourcecoding .I problem, the source polarization phenomenon is introduced in {1,2,...,M} uniformly for some positive integer M. s c [3] to compress a source down to its entropy. 2) Achievability and the Rate-Distortion Function for the [ It is well known that linear codes can at most achieve the symmetric capacity of discrete memoryless channels and SourceCodingProblem: Atransmissionsystemwithparame- 1 ters(n,Θ,∆,τ)forcompressingagivensource(X,U,p ,d) v thesymmetricrate-distortionfunctionfordiscretememoryless X consists of an encoding mapping and a decoding mapping 2 sources. This indicates that polar codes are optimal linear 8 codes in terms of the achievable rate. It is also known that Enc:Xn →{1,2,··· ,Θ}, 4 6 nested linear codes achieve the Shannon capacity of arbitrary Dec:{1,2,··· ,Θ}→Un discrete memoryless channels and the Shannon rate-distortion . 1 such that the following condition is met: function for arbitrary discrete memoryless sources. In this 0 4 paper, we investigate the performance of nested polar codes P (d(Xn,Dec(Enc(Xn)))>∆)≤τ 1 for the point-to-point channel and source coding problems where Xn is the random vector of length n generated by : and show that these codes achieve the Shannon capacity of v the source. In this transmission system, n denotes the block arbitrary (binary or non-binary) DMCs and the Shannon rate- i length, logΘ denotes the number of channel uses, ∆ denotes X distortion function for arbitrary DMSs. thedistortionlevelandτ denotestheprobabilityofexceeding r The results of this paper are general regarding the size of a the distortion level ∆. the channel and source alphabets. To generalize the results Given a source, a pair of non-negative real numbers (R,D) to non-binary cases, we use the approach of [4] in which is said to be achievable if there exists for every (cid:15) > 0, and it is shown that polar codes with their original (u,u + v) forallsufficientlylargenumbersnatransmissionsystemwith kernel, achieve the symmetric capacity of arbitrary discrete parameters (n,Θ,∆,τ) for compressing the source such that memoryless channels where + is the addition operation over 1 any finite Abelian group. logΘ≤R+(cid:15), ∆≤D+(cid:15), τ ≤(cid:15) n The optimal rate distortion function R (D) of the source is II. PRELIMINARIES ∗ given by the infimum of the rates R such that (R,D) is 1) Source and Channel Models: For the source coding achievable. problem, the source is modeled as a discrete-time random It is known that the optimal rate-distortion function is given process with each sample taking values in a fixed finite set X by: with probability distribution p . The reconstruction alphabet X R(D)= inf I(X;U) (1) is denoted by U and the quality of reconstruction is measured pU|X by a single-letter distortion function d : X ×U → R+. We EpXpU|X{d(X,U)}≤D where p is the conditional probability of U given X. where q = |X|. We use the following two quantities in the U X 3) Achi|evability and Capacity for the Channel Coding paper extensively: Problem: A transmission system with parameters (n,M,τ) 1 (cid:88) (cid:88) for reliable communication over a given channel (X,Y,W) Dd(W)= 2q |W(x|u)−W(x|u+d)| consists of an encoding mapping Enc : {1,2,...,M} → Xn u x ∈U ∈X and a decoding mapping Dec:Yn →{1,2,...,M} such that D˜ (W)= 1 (cid:88) (cid:88)(W(x|u)−W(x|u+d))2 d 2q u x 1 (cid:88)M ∈U ∈X Wn(Dec(Yn)(cid:54)=m|Xn =Enc(m))≤τ where d is some element of G and + is the group operation. M m=1 6) Binary Polar Codes: For any N = 2n, a polar code Givenachannel(X,Y,W),therateRissaidtobeachievable of length N designed for the channel (Z ,Y,W) is a linear 2 if for all (cid:15) > 0 and for all sufficiently large n, there (coset)codecharacterizedbyageneratormatrixG andaset N exists a transmission system for reliable communication with of indices A ⊆ {1,··· ,N} of almost perfect channels. The parameters (n,M,τ) such that generator matrix for polar codes is defined as G =B F n (cid:20) N (cid:21) N ⊗ 1 0 1 logM ≥R−(cid:15), τ ≤(cid:15) where BN is a permutation of rows, F = 1 1 and ⊗ n denotes the Kronecker product. The set A is a function of the channel. The decoding algorithm for polar codes is a specific Thechannelcapacityisthesupremumofthesetofachievable form of successive cancellation [1]. rates. It is known that the channel capacity is given by: 7) Polar Codes Over Abelian Groups: For any discrete C =supI(X;Y) (2) memoryless channel, there always exists an Abelian group of pX thesamesizeasthatofthechannelinputalphabet.Ingeneral, where p is the channel input distribution. for an Abelian group, there may not exist a multiplication X operation. Since polar encoders are characterized by a matrix 4) Groups, Rings and Fields: All groups referred to in this multiplication, before using these codes for channels of paper are Abelian groups. Given a group (G,+), a subset H arbitrary input alphabet sizes, a generator matrix for codes of G is called a subgroup of G if it is closed under the group over Abelian groups needs to be properly defined. Polar operation. In this case, (H,+) is a group in its own right. codes over Abelian groups are introduced in [4]. This is denoted by H ≤G. A coset C of a subgroup H is a shift of H by an arbitrary element a∈G (i.e. C =a+H for 8) Notation: We denote by O((cid:15)) any function of (cid:15) which some a∈G). For any subgroup H of G, its cosets partition is right-continuous around 0 and that O((cid:15))→0 as (cid:15)↓0. the group G. A transversal T of a subgroup H of G is a For positive integers N and r, let {A0,A1,··· ,Ar} be a subset of G containing one and only one element from each partition of the index set {1,2,··· ,N}. Given sets Tt for coset (shift) of H. Given an element d of G, (cid:104)d(cid:105) denotes the t = 0,··· ,r, the direct sum (cid:76)r TAt is defined as the set t=0 t subgroup of G generated by d. i.e. the smallest subgroup of of all tuples uN1 =(u1,··· ,uN) such that ui ∈Tt whenever G containing d. A subgroup M of G is called maximal if it i∈At. is a proper subgroup and there does not exist another proper subgroup of G containing M. III. THELOSSYSOURCECODINGPROBLEM 5) Channel Parameters: For a channel (X,Y,W), assume In this section, we prove the following theorem: X is equipped with the structure of a group (G,+). The Theorem III.1. For an arbitrary discrete memoryless source symmetric capacity is defined as I¯(W)=I(X;Y) where the (X,U,p ,d), nested polar codes achieve the Shannon rate- channel input X is uniformly distributed over X and Y is the X distortion function (2). output of the channel. The Bhattacharyya distance between two distinct input symbols x and x˜ is defined as For the source (X,U,p ,d), let U = G where G is X an arbitrary Abelian group and let q = |G| be the size of Z(W )= (cid:88)(cid:112)W(y|x)W(y|x˜) the group. For a pair (R,D) ∈ R2, let X be distributed x,x˜ { } according to p and let U be a random variable such that y X ∈Y E{d(X,U)} ≤ D. We prove that there exists a pair of polar and the average Bhattacharyya distance is defined as codesC ⊆C suchthatC inducesapartitionofC through i o i o its shifts, C is a good source code for X and each shift of o Z(W)= (cid:88) q(q1−1)Z(W{x,x˜}) wCiillisbeamgaodoedcclheaarnnlaetlercoindethfeorfotlhloewtiensgt.channel pX|U. This x,x˜ x=∈x˜X (cid:54) Given the test channel p , define the artificial channels where (a) follows since X and Z are independent and there XU (G,G,W ) and (G,X ×G,|W ) such that for s,z ∈G and is a one-to-one correspondence between (S,Z) and (S,U). c s x∈X, Equality(b)followssinceS isindependentofX,U andhence H(SXU) = H(S)+H(XU). Equality (c) follows since Z W (z|s)=p (z−s) c U is uniform. Ws(x,z|s)=pXU(x,z−s) We employ a nested polar code in which the inner code is a good channel code for the channel W and the outer code is These channels have been depicted in Figures 1 and 2. c a good source code for W . The rate of this code is equal to s U R=I¯(W )−I¯(W ) s c =logq−H(U|X)−(logq−H(U))=I(X;U) S Z Note that the channels W and W are chosen so that the c s W differenceoftheirsymmetriccapacitiesisequaltotheShannon c mutual information between U and X. This enables us to use Fig. 1: Test channel for the inner code (the channel coding compo- channel coding polar codes to achieve the symmetric capacity nent) of W (as the inner code) and source coding polar codes to c achieve the symmetric capacity of the test channel W (as s the outer code). The exact proof is postponed to Section III-B U pXU X wheretheresultisprovedforthebinarycaseandSectionIII-C | in which the general proof (for arbitrary Abelian groups) is S Z presented. The next section is devoted to some general definitions and W s useful lemmas which are used in the proofs. Fig.2:Testchannelfortheoutercode(thesourcecodingcomponent) A. Definitions and Lemmas Let S be a random variable uniformly distributed over G Forachannel(X,Y,W),thebasicchanneltransformations which is independent from X and U. It is straightforward to associated with polar codes are given by: showthatinthiscase,Z isalsouniformlydistributedoverG. (cid:88) 1 The symmetric capacity of the channel Wc is equal to W−(y1,y2|u1)= qW(y1|u1+u(cid:48)2)W(y2|u(cid:48)2) (3) I¯(W )=I(S;Z)=H(Z)−H(Z|S) u(cid:48)2∈G c 1 (a) W+(y1,y2,u1|u2)= W(y1|u1+u2)W(y2|u2) (4) = logq−H(U|S)=logq−H(U) q where (a) follows since Z is uniformly distributed and U is for y1,y2 ∈ Y and u1,u2 ∈ G. We apply these transfor- independent of S. For the channel W , first we show that X mations to both channels (G,G,Wc) and (G,X ×G,Ws). s and Z are independent. For z ∈G and x∈X, Repeating these operations n times recursively for Wc and W , we obtain N = 2n channels W(1),··· ,W(N) and (cid:88) s c,N c,N pXZ(x|z)= pU Z(u|z)pXZU(x|z,u) W(1),··· ,W(N) respectively. For i=1,··· ,N, these chan- | | | s,N s,N u∈G nels are given by: =u(cid:88)GpUSp(uZ,(zz)−u)pX|ZU(x|z,u) Wc(,iN) (z1n,v1i−1|vi)= (cid:88) qN1 1WcN(z1N|v1NG) (=a) (cid:88)∈ p (u)p (x|u) viN+1∈GN−i − u∈G U X|U = (cid:88) qN1 1pNU(z1N −v1NG) =pX(x) viN+1∈GN−i − awrheeurenif(oar)mfloylldoiwstsrisbiuntceedSandantdheUMaarrekoinvdcehpaeinndZent↔, SUa↔ndXZ Ws(,iN) (xN1 ,z1n,v1i−1|vi)=vN (cid:88)GN−i qN1−1WsN(xN1 ,z1N|v1NG) holds. The symmetric capacity of the channel W is equal to i+1∈ s (cid:88) 1 = pN (xN,zN −vNG) I¯(W )=I(S;XZ)=H(S)+H(XZ)−H(SXZ) qN 1 XU 1 1 1 s vN GN−i − (a) i+1∈ = H(S)+H(X)+H(Z)−H(SXU) (b) = H(X)+H(Z)−H(XU) forzN,vN ∈GN,xN ∈XN whereGisthegeneratormatrix 1 1 1 (c) of dimensions N ×N for polar codes. For the case of binary = logq−H(U|X) input channels, it has been shown in [1] that as N → ∞, these channels polarize in the sense that their Bhattacharyya parameter gets either close to zero (perfect channels) or close be such that for z,z ∈G and x∈X, W(z|x,z )=1 . (cid:48) (cid:48) z=z(cid:48) to one (useless channels). For arbitrary channels, it is shown Then for s,z ∈G, { } in [4] that polarization happens in multiple levels so that as (cid:88) (cid:88) W (x,z |s)1 = P (x,z −s)·1 N →∞ channels get useless, perfect or “partially perfect”. s (cid:48) z=z(cid:48) XU (cid:48) z=z(cid:48) { } { } For an integer n, let Jn be a uniform random variable over zx(cid:48)∈G zx(cid:48)∈G the set {1,2,··· ,N = 2n} and define the random variable ∈X (cid:88)∈X = P (x,z −s) In(W) as XU (cid:48) x ∈X In(W)=I(X;Y) (5) =PU(x,z(cid:48)−s)=Wc(z|s) whereX andY aretheinputandoutputofWN(Jn) respectively LettherandomvectorsX1N,U1N bedistributedaccordingto and X is uniformly distributed. It has been shown in [5] that PN and let ZN be a random variable uniformly distributed the process I0,I1,I2,··· is a martingale; hence E{In} = ovXeUr GN which1is independent of XN,UN. Let SN =ZN− 1 1 1 1 IZ0d.(WFoN(rJna)n)iwntheegreerfno,r daecfihnaenntheel (rGan,dYo,mWv)a,riable Zdn(W) = Utw1No-oannde Vm1aNpp=inSg1NGG:−G1 N(He→re,GGN−)1. Iisntohteheinrvwerosredos,ftthheejoonine-t distribution of these random vectors is given by 1 (cid:88) (cid:88)(cid:112) Z (W)= W(y|x)W(y|x+d) (6) d q pVNSNUNXNZN(v1N,sN1 ,uN1 ,xN1 ,z1N) x Gy 1 1 1 1 1 ∈ ∈Y 1 = pN (xN,uN)1 Other than the processes In(W) and Zdn(W), in the proof of qN XU 1 1 {sN1 =v1NG,un1=z1N−v1NG} polarization, we need another set of processes ZH(W) and This implies In(W) for H ≤G which we define in the following. Define H 1 ZH(W)= (cid:88)Z (W) pV1NX1NZ1N(v1N,xN1 ,z1N)= qNpNXU(xN1 ,z1N −v1NG), d 1 d∈/H pV1NZ1N(v1N,z1N)= qNpNU(z1N −v1NG) Note that any uniform random variable defined over G can Inthenextsection,weprovidetheproofforthebinarycase. be decomposed into two uniform and independent random variables [X]H and [X]TH where [X]H takes values from H B. Source Coding: Sketch of the Proof for the Binary Case and [X] takes values from the transversal T of H such TH The standard result of channel polarization for the binary that X =[X] +[X] . For an integer n, define the random variable In(WH) as TH input channel Wc implies [1] that for any (cid:15) > 0 and H 0<β < 1, there exist a large N =2n and a partition A ,A 2 (cid:12) 0(cid:12) 1 IHn(W)=I(X;Y|[X]TH)=I([X]H;Y|[X]TH) (7) of[1,N]suchthatfort=0,1andi∈At,(cid:12)(cid:12)I¯(Wc(,iN) )−t(cid:12)(cid:12)<(cid:15) and such that for i ∈ A Z(W(i) ) < 2 Nβ. Moreover, as Definition III.1. The channel (G,Y ,W ) is degraded with 1 c,N − respect to the channel (G,Y2,W2) i1f the1re exists a channel (cid:15)to→on0e w(ainthdpN =→I¯∞(W),)|.ANt| → pt for some p0,p1 adding up (Y ,Y ,W) such that for x∈G and y ∈Y , 1 c 2 1 1 1 W1(y1|x)= (cid:88) W2(y2|x)W(y1|y2) Similarly, for the channel Ws we have the following: For any (cid:15) > 0 and 0 < β < 1, there exist a large N = 2n and a y2∈Y∈ partition B ,B of [1,N]2such that for τ = 0,1 and i ∈ B , (cid:12) 0 (cid:12)1 τ Lemma III.1. If the channel (G,Y1,W1) is degraded with (cid:12)(cid:12)I¯(Ws(,iN) )−τ(cid:12)(cid:12) < (cid:15) and such that for i ∈ B1, Z(Ws(,iN) ) < respect to the channel (G,Y ,W ) then Z(W )≥Z(W ). 2 2 1 2 2−Nβ. Moreover, as (cid:15) → 0 (and N → ∞), |BNτ| → qτ for Proof: Follows from [???]. some q ,q adding up to one with q =I¯(W ). 0 1 1 s Lemma III.2. If the channel (G,Y1,W1) is degraded with Lemma III.4. For i=1,··· ,N, Z(Wc(,iN) )≥Z(Ws(,iN) ). respecttothechannel(G,Y ,W )then(G,Y ×Y ×G,W+) isdegradedwithrespecttot2hech2annel(G,Y1×Y1×G,W1+) Proof: Follows from Lemma III.3, Lemma III.1 and 2 2 2 Lemma III.2. and(G,Y1×Y1,W1−)isdegradedwithrespecttothechannel To introduce the encoding and decoding rules, we need to (G,Y2×Y2,W2−) make the following definitions: Proof: Follows from [???]. (cid:110) (cid:12) (cid:111) A = i∈[1,N](cid:12)Z(W(i) )>2 Nβ 0 (cid:12) c,N − Lemma III.3. The channel Wc is degraded with respect to (cid:110) (cid:12) (cid:111) the channel Ws in the sense of Definition III.1. B0 = i∈[1,N](cid:12)(cid:12)Z(Ws(,iN) )>1−2−Nβ Proof:Intuitively,itisclearthatW isadegradedversion and A = [1,N]\A and B = [1,N]\B . For t = 0,1 c 1 0 1 0 ofW .Theproofisasfollows:Letthechannel(X×G,G,W) and τ = 0,1, define A = A ∩ B . Note that for large s t,τ t τ N, 2 Nβ < 1−2 Nβ and therefore, Lemma III.4 implies where in the last inequality, we dropped the subscripts of the − − A = ∅. Note that the above polarization results imply that probability distributions for simplicity of notation. Therefore, 1,0 as N increases, |AN1| →I¯(Wc) and |BNτ| →I¯(Ws). D ≤D +D +D (9) avg 1 2 3 1) Encoding and Decoding: Let zN ∈GN be an outcome 1 where oftherandomvariableZN knowntoboththeencoderandthe 1 decoder.GivenasourcesequencexN ∈XN,theencodingrule (cid:88) 1 D = p(vN,xN,zN)d ·1 (10) is as follows: For i ∈ [1,N], if i ∈ B0, then vi is uniformly 1 1 1 1 max {vˆ(cid:54)=v} distributed over G and is known to both the encoder and the v1Nx,Nz1N∈GNN decoder (and is independent from other random variables). If 1 ∈X (cid:88) i∈B1, vi =g for some g ∈G with probability D2 = p(v1N,xN1 ,z1N)d(xN1 ,z1N −v1NG) (11) P(vi =g)=pVi|X1NZ1NV1i−1(g|xN1 ,z1N,v1i−1) v1Nx,N1z(cid:88)1N∈X∈GNN(cid:12) (cid:12) Note that [1,N] can be partitioned into A0,0,A0,1 and A1,1 D3 = (cid:12)q(v1N,xN1 ,z1N)−p(v1N,xN1 ,z1N)(cid:12) (since A1,0 is empty) and B0 = A0,0, B1 = A0,1 ∪ A1,1. vN,zN GN sTveAhn1ed,r1sefivonAre0w,,1hvi−tco1hNthveAc0ad,n0ecbiosedekdrnecoaownmdnpttohoseetdhaeescodvde1Nceor=duesvre.As0T,t0hhe+e ecvnhAca0on,1dn+eerl 1xN11∈X∈N(cid:16)dmax·1{vˆ(cid:54)=v}+d(xN1 ,z1N −v1NG)(cid:17) (12) code to recover v . The decoding rule is as follows: Given A1,1 Here, we only give a sketch for the rest of the proof. The zN, v and v , let vˆ =v and vˆ =v . For 1 A0,0 A0,1 A0,0 A0,0 A0,1 A0,1 proof for the general case is completely presented in Section i∈A , let 1,1 III-C. The proof proceeds as follows: It is straightforward to vˆi =argmaxWc(,iN) (z1N,vˆ1i−1|g) show that D1 → D as N increases. It can also be shown g∈G that D2 → 0 as N increases since the inner code is a good Finally, the decoder outputs zN −vˆNG. channel code. Finally, it can be shown that D3 → 0 as N 1 1 2) Error Analysis: The analysis is a combination of the- increases since the total variation distance between the P and point-to point channel coding and source coding results for theQmeasuresissmall(inturnsincetheoutercodeisagood polarcodes.Theaveragedistortionbetweentheencoderinput source code). and the decoder output is upper bounded by (cid:32) (cid:33) C. Source Coding: Proof for the General Case (cid:88) 1 (cid:88) (cid:88) 1 (cid:89) Davg≤z1N∈GNqNxN1 ∈XpNNX(cid:16)(xN1v1N)∈GNq|B0| i∈B1p(vi|xN1,z1N,v1i−1(cid:17)) Ch1a)nnReelvs:iewThoefrePsoullatrofCocdheasnnefolrpoAlrabriiztraatiroyn Sfoourrcaerbsitraanryd d ·1 +d(xN,zN −vNG) discrete memoryless channels applied to Wc implies [] max {vˆ(cid:54)=v} 1 1 1 that for any (cid:15) > 0 and 0 < β < 1, there exist a large 2 wp(hveir|xeN1w,ezh1Na,vve1ir−e1p)lafcoerdspimVip|Xlic1NiZty1NoVf1i−n1o(tvaiti|oxnN1 a,nzd1N,dvm1i−ax1)iswtihthe Nfor=H2n≤anGd aanpdarititi∈onA{HA,H(cid:12)(cid:12)(cid:12)|IH¯(W≤c(,iN)G)}−olfog[1|,HGN|(cid:12)(cid:12)(cid:12)] s<uch(cid:15) tahnadt maximum value of the d(·,·) function. Let ZH(W(i) ) < 2 Nβ. Moreover, as (cid:15) → 0 (a|nd| N → ∞), c,N − qVi|X1NZ1NV1i−=1((cid:26)vip|21xVN1i|zX1N1NvZ1i−1N1V)1i−1(vi|xN1z1Nv1i−1) IIff ii∈∈BB01 o|AnNHeSi|wm→iitlhapr(cid:80)lHy,Hffoo≤rrGsthopemHeclohpgarno||nHGbea||lb=WilitIs¯i(eWwsepc)Hh.a,vHe ≤theGfoalldodwininggu:pFotor and any (cid:15) > 0 and 0 < β < 1, there exist a large N = 2n and 2 qXNZN(xN1 ,z1N)=pXNZN(xN1 ,z1N) a partition {BH(cid:12)|H ≤ G} of [1,N(cid:12) ] such that for H ≤ G We have 1 1 1 1 and i ∈ BH, (cid:12)(cid:12)I¯(Ws(,iN) )−log|HG|(cid:12)(cid:12) < (cid:15) and ZH(Ws(,iN) ) < D ≤ (cid:88) q (vN,xN,zN) 2−Nβ. Moreover, as (cid:15) → 0 (a|nd| N → ∞), |BNH| → qH avg VNXNZN 1 1 1 for some probabilities q ,H ≤ G adding up to one with 1 1 1 H v1Nx,N1z1N∈X∈GNN (cid:16)d ·1 +d(xN,zN −vNG)(cid:17) (cid:80)LeHm≤mGaqIHIIl.o5g. ||IHGf||th=eI¯c(hWansn).el (G,Y1,W1) is degraded with ≤(cid:88)(cid:16)p(vN,xNm,zaNx)+{(cid:12)(cid:12)qvˆ((cid:54)=vvN},xN,zN1)−p1(vN,x1N,zN)(cid:12)(cid:12)(cid:17) rIIeIs.p1e,ctthetontfhoer cahnayndne∈l (GG,,Y2,W2) in the sense of Definition 1 1 1 1 1 1 1 1 1 vN,zN GN 1xN1 ∈N Zd(W1)≥Zd(W2) 1 ∈X (cid:16) (cid:17) dmax·1{vˆ(cid:54)=v}+d(xN1 ,z1N −v1NG) (8) Proof:Let(Y2,Y1,W)beachannelsothatthecondition of Definition III.1 is satisfied. We have so that for g ∈[v ] +T , i K K Zd(W1)= 11q x(cid:88)(cid:88)∈Gy1(cid:88)(cid:88)∈Y1(cid:112)W1(y1|x)W1(y1|x+d) P(vi =g)= pVi|Xp1NVZi|1NXV1N1iZ−11N(V[v1i−i]1K(g+|xTN1K,z|x1NN1,,vz1i−1N1,)v1i−1) = q Note that vN can be decomposed as vN = [vN] + (cid:115)(cid:88)x∈Gy1∈Y1 (cid:88) [v1N]TK≤H +1[v1N]TH (with a slight abuse 1of notatio1n sKince W (y |x)W(y |y ) W (y |x+d)W(y |y ) K and H depend on the index i) in which [vN] is known 2 2 1 2 2 2(cid:48) 1 2(cid:48) 1 K y2∈Y2 y2(cid:48)∈Y2 to the decoder. The encoder sends [v1N]TK≤H to the decoder ≥1(cid:88) (cid:88) (cid:88)(cid:112)W2(y2|x)W(y1|y2)2W2(y2|x+d) adnedcotdhiengdercuoledeirsuassefsotlhloewcsh:aGnniveelncozdNe,to[vrNec]ovearnd[v[1NvN]T]H.The, q 1 1 K 1 TK≤H x∈Gy1∈Y1y2∈Y2 and for i∈AH,K, let 1 (cid:88) (cid:88) (cid:112) = W (y |x)W (y |x+d) =Zq x(∈WGy)2∈Y2 2 2 2 2 vˆi =g∈[vi]Ka+r[gvim]TaKx≤H+THWc(,iN) (z1N,vˆ1i−1|g) d 2 Finally, the decoder outputs zN −vˆNG. Note that the rate of 1 1 this code is equal to ZLem(Wm(ai)I)II≥.6.ZF(oWr i(i=) )1,a·n·d·Z,NH(aWnd(if)o)r≥d∈ZHG(Wan(di)H).≤G, R= (cid:88) |AH,K|log |H| d c,N d s,N c,N s,N N |K| K H G Proof: Follows from Lemma III.3, Lemma III.2 and ≤ ≤ Lemma III.6. = (cid:88) |AH,K|log |G| − (cid:88) |AH,K|log |G| N |K| N |H| Wedefinesomequantitiesbeforeweintroducetheencoding K H G K H G ≤ ≤ ≤ ≤ and decoding rules. For H ≤G, define = (cid:88) |BK|log |G| − (cid:88) |AH|log |G| (cid:110) (cid:12) N |K| N |H| AH = i∈[1,N](cid:12)(cid:12)ZH(Wc(,iN) )<2−Nβ, →KI¯≤(WG )−I¯(W )=I(XH≤;UG) (cid:111) s c (cid:64)K ≤Hsuch that ZK(W(i) )<2 Nβ c,N − D. Error Analysis (cid:110) (cid:12) B = i∈[1,N](cid:12)ZH(W(i) )<1−2 Nβ, H (cid:12) s,N − The average distortion between the encoder input and the (cid:111) (cid:64)K ≤H such that ZK(W(i) )<1−2 Nβ decoder output is upper bounded by s,N − (cid:88) 1 (cid:88) (cid:88) 1 NFoorteHtha≤t foGr laanrgdeKN,≤2 GNβ, d<efin1e−A2H,NKβ =andAHthe∩refBorKe,. Davg ≤zN GN qN xN NpNX(xN1 )vN GN q|B0| − − 1 ∈ 1 ∈X 1 ∈ if for some i ∈ [1,N], i ∈ A , Lemma III.6 implies ZfoHr (KWs(cid:2)(,iN)H)<, A1H−,K2−N=β∅a.nTdhheernecfoereiH∈{A∪HK,≤KH|KBK≤. THher≤efoGre}, K(cid:89)≤Gi∈(cid:89)BKp([vi]Kp+(gT|KxN1|x,N1z1N,z,1Nv1i,−v11i−)1)·|K| is a partition of [1,N]. Note that the channel polarization (cid:0)d ·1 +d(xN,zN −vNG)(cid:1) max vˆ=v 1 1 1 resultsimplythatasN increases, |ANH| →pH and |BNH| →qH. { (cid:54) } 2) Encoding and Decoding: Let z1N ∈GN be an outcome wp(h·|exreN1 ,zp1NVi,|vX1i1N−1Z)1NfVo1ir−s1i(m·|xplN1ic,izty1No,fv1in−o1t)atioins. Froerpila∈ceBdK,wlietth oftherandomvariableZN knowntoboththeencoderandthe 1 AfigdnoelrocsGoofudnGnaeionrq.tduceGealtenihv[tagebTt]neKKTKr≤e∈Hp≤rKebse+He,na[Ttg≤et]rdTaKGnibs≤sy,vHaelgetrrs∈ta=anTlTsHo[vKgfe]≤brKKseHa+lianaT[tnHgrda]Tn.o[KsAgfv≤]nKeTHryHs+aienll∈e[oGmgfT]eTHHsnHot. qVi|X1N=Z1NpVVi1i|−X11N(Zvpi1NV|xVi|N11Xi−z1N11N(Z[v1vN1iiV−]K1i1−)+1(gT|KxN1|x,N1z1N,z,1Nv1i,−v11i−)1)·|K| K H H K and that g can be uniq≤uely represented by g = [g] +[g] for K TK some [g]TK ∈ TK and [g]TK can be uniquely represented by qXNZN(xN1 ,z1N)=pXNZN(xN1 ,z1N) [g] =[g] +[g] . 1 1 1 1 TK TK≤H TH Given asource sequencexN ∈XN, theencoding ruleis as Note that Equations (8) through (12) are valid for the general 1 follows: For i ∈ [1,N], if i ∈ A for some K ≤ H ≤ G, case (with the new Q measure). H,K [v ] is uniformly distributed over K and is known to both It follows from the analysis of [6, Section III.F and Section i K the encoder and the decoder (and is independent from other IV] that D =D. The following lemma is also proved in [6, 2 random variables). The component [v ] is chosen randomly Section III.F and Section IV]. i TK Lemma III.7. With the above definitions, C is a good channel code and each shift of C is a good o i (cid:88) (cid:12) (cid:12) source code. This will be made clear later in the following. (cid:107)P −Q(cid:107)t.v. = (cid:12)q(v1N,xN1 ,z1N)−p(v1N,xN1 ,z1N)(cid:12) v1Nx,N1z1N∈X∈GNN disLtreitbuXtionbeanadrlaentdUombevuanriifaobrlmelwyidthisttrhibeutceadpaocvietyrGac.hDieevfiinnge ≤K2−Nβ the artificial channels (G,G,Ws) and (G,Y ×G,Wc) such that for u,z ∈G and y ∈Y, for some constant K depending only on q. It remains to show that D1 vanishes as N approaches Ws(z|u)=pX(z−u) infinity. We have W (y,z|u)=p (z−u,y) c XY (cid:88) (cid:88) (cid:88) pN (xN,zN−vNG) 1(cid:110) These channels have been depicted in Figures ?? and ??. XU 1 1 1 v1Nx,Nz1N∈GNN K≤H≤Gi∈AH,K Wc(,iN)(z1N,v1i−1|vi) 1 ∈X X 1 (cid:111) ≤W(cid:88)c(,iN)(z1N,v1i−1|v˜i) for some v˜i∈[(cid:88)vi]K+[vi(cid:88)]TK≤H+TH(cid:88),v˜i(cid:54)=vi U Z ≤ pN (xN,zN −vNG) XU 1 1 1 v1Nx,N1z1N∈X∈GNN K≤H≤Gi∈AH,Kv˜i∈[v˜vii(cid:54)=]Hv+i TH Fig.3:TestchannelfortheinneWrcsode(thesourcecodingcomponent) (cid:118) (cid:117)(cid:117)Wc(,iN) (z1N,v1i−1|v˜i) (cid:116) Wc(,iN) (z1N,v1i−1|vi) X W Y (cid:88) (cid:88) (cid:88) (cid:88) 1 (cid:88) 1 =K≤H≤Gi∈AH,Kv˜i∈[v˜vviii(cid:54)=∈]HGv+i THv1i−1,z1Nq(cid:118)vi+1NqN−1pNU(z1N−v1NG) U Wc Z (cid:117)(cid:117)Wc(,iN) (z1N,v1i−1|v˜i) Fig. 4: Test channel for the outer code (the channel coding compo- (cid:116) nent) Wc(,iN) (z1N,v1i−1|vi) =K≤(cid:88)H≤Gi∈(cid:88)AH,Kv˜i∈[v˜vv(cid:88)iii(cid:54)=∈]HGv+i THZ{vi,v˜i}(Wc(,iN) ) pccahUsaN(enu,on)teepolXnsteh(aaxrtce)aWfenoqr(usyuah|,lxoxtw)o,1z{tzh∈=autG+txha}ne.dSsyiymm∈ilmaYrelt,yripctUoXcthaYepZas(couiut,irexcse,yco,ofzd)itnh=ge Notethatv˜ ∈[v ] +T andv˜ (cid:54)=v implyd=v˜ −v ∈/ H. i i H H i i i i We have I¯(W )=logq−H(X) s Z (W(i) )≤qZ (W(i) )≤qZH(W(i) ) I¯(Wc)=logq−H(X|Y) {vi,v˜i} c,N d c,N c,N Therefore, Weemployanestedpolarcodeinwhichtheinnercodeisa D ≤ (cid:88) (cid:88) (cid:88) qZH(W(i) ) good source code for the test channel Ws and the outer code 1 c,N is a good channel code for Wc. The rate of this code is equal ≤K4q≤qHN≤2GNi∈βAH,Kv˜i∈[v˜vviii(cid:54)=∈]HGv+i TH to R=I¯(Wc)−I¯(Wx) − =logq−H(X|Y)−(logq−H(X))=I(X;Y) Therefore, D →0 as N increases. 1 Note that the channels W and W are chosen so that the c s IV. POLARCODESACHIEVETHESHANNONCAPACITYOF differenceoftheirsymmetriccapacitiesisequaltotheShannon ARBITRARYDMCS capacity of the original channel. We postpone the proof to In this section, we prove the following theorem: Section V where the result is proved for the binary case and Section VI in which the general proof (for arbitrary Abelian Theorem IV.1. For an arbitrary discrete memoryless channel groups) is presented. The rest of this section is devoted to (X,Y,W), nested polar codes achieve the Shannon capacity. some general definitions and lemmas which are used in the For the channel let X = G for some Abelian group G proofs. and let |G| = q. Similarly to the source coding problem, we Let n be a positive integer and let N = 2n. Similar to the show that there exists nested polar code C ⊆ C such that source coding case, For both channels W and W and for i o s c i=1···N, define the synthesized channels as and A = [1,N]\A and B = [1,N]\B . For t = 0,1 and 1 0 1 0 Wc(,iN) (y1N,z1N,v1i−1|vi)= (cid:88) 2N1 1WcN(y1N,z1N|v1NG) iτmp=lie0s,1, define At,τ = At ∩ Bτ. Note that Lemma III.4 =viN+1(cid:88)∈GN−i 1− pN (zN −vNG,yN)A1,0 =(cid:110)i∈[1,N](cid:12)(cid:12)(cid:12)<2−Nβ <Z(Wc(,iN) )<Z(Ws(,iN) )<1−2−Nβ(cid:111) 2N 1 XY 1 1 1 vN GN−i − Since Z(W(i) ) and Z(W(i) ) both polarize to 0,1, as N i+1∈ c,N s,N and increases A1,0 → 0. Note that Theorems ?? and ?? imply N Ws(,iN) (z1N,v1i−1|vi)= (cid:88) 2N1 1WsN(z1N|v1NG) that as N increases, |AN1| →I¯(Wc) and |BN1| →I¯(Ws). vN GN−i − A. Encoding and Decoding i+1∈ = (cid:88) 1 pN(zN −vNG) Let z1N be a realization of the random vector Z1N avail- 2N 1 X 1 1 able to both the encoder and the decoder. Given a partition viN+1∈GN−i − A0,0,A0,1,A1,0,A1,1 of [1,N], a vector v1N ∈ GN can Let the random vector UN be distributed according to pN be decomposed as vN = v + v + v + v 1 U 1 A0,0 A0,1 A1,0 A1,1 (uniform)andletVN =UNG 1 whereGisthepolarcoding and similarly, the set GN can be partitioned into the union 1 1 − matrixofdimensionN×N.NotethatsinceGisaone-to-one GA0,0 ∪ GA0,1 ∪ GA1,0 ∪ GA1,1. Let vA0,0 ∈ GA0,0 be a mapping, VN is also uniformly distributed. Let YN and ZN uniformly distributed random variable available to both the 1 1 1 be the outputs of the channel W when the input is UN. Note encoder and the decoder which is independent from all other c 1 that for vN,un,xN,zN ∈GN and yN ∈YN, random variables and let v be the message vector. The 1 1 1 1 1 A0,1 p (vN,un,xN,yN,zN) encoding is as follows: For i∈A1,1∪A1,0, VNUNXNYNZN 1 1 1 1 1 1 1 1 1 1 (cid:40) ==121{Nv1NpNX=u(zN11NG−−1}vp1NNUG(u)N1W)NpNX(y(1NxN1|z)1NW−Nv(y1N1NG|x)1N1{)u1N1{=z1Nv1N=GuN1,x+N1x=N1z}1N−v1NGT}hevrie=ceiver01hawws iiatthhccpperrsoosbb..toppVVyiiN||ZZ,11NNzVVN11ii−−11a((n01d||zz11vNN,,vv11ii−−.11N))ote that and 1 1 A0,0 pV1NY1NZ1N(v1N,y1N,z1N)= 21NpNX(z1N −v1NG)WN(y1N|z1N −v1Nr|eAGcN1e),0iv|e→r h0asasacNcesisnctroeavsAes1.,0A. sTshuemneiftocratnheusmeotmheenftotlhloawt itnhge 1 decoding rule: For i∈A0,1∪A1,1, p (vN,zN)= pN(zN −vNG) V. CHAV1NNNZE1NLC1ODI1NG:S2KNETCXHO1FTHE1PROOFFORTHE vˆi =arggmGaxWc(,iN) (y1N,z1N,vˆ1i−1|g) ∈ BINARYCASE It is shown in the next section that with this encoding and The following theorems state the standard channel coding decodingrules,theprobabilityoferrorgoestozero.Itremains and source coding polarization phenomenons for the binary to send the component v to the decoder which can be A1,0 case. doneusingaregularpolarcode(whichachievesthesymmetric Theorem V.1. For any (cid:15) > 0 and 0 < β < 1, there exist capacity of the channel). Note that since the fraction |AN1,0| a large N = 2n and a partition A ,A of [1,2N] such that vanishes as N increases, the rate loss due to the transmission for t = 0,1 and i ∈ A , (cid:12)(cid:12)I¯(W(i) )−0t(cid:12)(cid:12) 1< (cid:15) and Z(W(i) ) < of vA1,0 can be made arbitrarily small. t (cid:12) s,N (cid:12) s,N 2−Nβ. Moreover, as (cid:15) → 0 (and N → ∞), |ANt| → pt for B. Error Analysis some p ,p adding up to one with p =I¯(W ). The receiver has access to zN,v ,v and yN. A 0 1 1 s 1 A0,0 A1,0 1 communication error occurs if vˆ (cid:54)= v . The error event Theorem V.2. For any (cid:15) > 0 and 0 < β < 1, there exist a 0,1 0,1 2 is contained in the event: large N = 2n and a partition B ,B of [1,N] such that for τ = 0,1 and i ∈ Bτ, (cid:12)(cid:12)(cid:12)I¯(Wc(,iN) )0−τ1(cid:12)(cid:12)(cid:12) < (cid:15) and Z(Wc(,iN) ) < (cid:91) {Wc(,iN) (y1N,z1N,vˆ1i−1|vi)≤Wc(,iN) (y1N,z1N,vˆ1i−1|vi+1)} 2−Nβ. Moreover, as (cid:15) → 0 (and N → ∞), |BNτ| → qτ for i∈A0,1∪A1,1 some q ,q adding up to one with q =I¯(W ). Therefore, we have the following upper bound on the 0 1 1 c average probability of error: Lemma V.1. For i=1,··· ,N, Z(W(i) )≥Z(W(i) ). s,N c,N (cid:88) 1 (cid:88) 1 toDWePcfirnaoenodf:uFsoinllgowLesmfrmoma[?s?i?n]ceanWdsLeismdmeag[r?a?d?e]d. with respect E{Perr}≤z1N∈GN qN v1N∈GN q|A0,0|+|A0,1| A0 =(cid:110)i∈[1,N](cid:12)(cid:12)(cid:12)Z(Ws(,iN) )>1−2−Nβ(cid:111) (cid:89) pVi|Z1NV1i−1(vi|z1N,v1i−1)(cid:88)WN(y1N|z1N −v1NG) B0 =(cid:110)i∈[1,N](cid:12)(cid:12)(cid:12)Z(Wc(,iN) )>2−Nβ(cid:111) 1{∃i∈i∈A[01,,1N∪]A:W1,c0(,iN)(y1N,z1N,v1i−1|vi)≤Wc(,iN)(y1N,z1Ny,1Nv1i−1|vi+1)} Define We have (cid:88)(cid:88) (cid:88) (cid:88) p(v1N,z1N)=pVNZN(v1N,z1N) P1 ≤ p(v1N,z1N) WN(y1N|z1N −v1NG) 1 1 = q1N ·i∈(cid:89)[1,N]pVi|Z1NV1i−1(vi|z1N,v1i−1) 1≤{W(cid:88)vc(1N,iN)(cid:88)(zy1N1N,pz1N(v,vN1i−,1z|Nvi))≤(cid:88)yW1Nc(,iWN)(yN1N(,yz1NN,|vz1iN−1|−vi+vN1)}G)i∈(cid:88)A0,1 1 1 1 1 1 and v1N z1N y1N i∈A0,1 (cid:118) q(v1N,z1N)= q1N · q|A0,0|1+|A0,1| i∈A0(cid:89),1∪A1,0pVi|Z1NV1i−1(vi|z1N,v1i−(cid:117)(cid:117)(cid:116)1)W(cid:88)Wc(,iN)(cid:88)c(,i(N)y(1N(cid:88)y,1Nz,1Nz1,1Nv,1i−v1i1−|v1i|v+i)1) = pN(zN −vNG)WN(yN|zN −vNG) Note that 2N X 1 1 1 1 1 vN zN yN 1 1 1 (cid:88)(cid:88) (cid:88) (cid:118) E{Perr}≤ q(v1N,z1N) WN(y1N|z1N −v1NG) (cid:88) (cid:117)(cid:117)Wc(,iN) (y1N,z1N,v1i−1|vi+1) (cid:116) 1{∃iv∈1N[1,zN1N]:Wc(,iN)(y1N,z1N,yv11Ni−1|vi)≤Wc(,iN)(y1N,z1N,v1i−1|vi+1)} i∈A0,1 Wc(,iN) (y1N,z1N,v1i−1|vi) ≤P +P 1 2 where the last equality follows since p(vN,zN) = 1 1 where p (vN,zN)= 1 pN(zN −vNG). Therefore, V1NZ1N 1 1 2N X 1 1 P11={∃(cid:88)iv∈1N[1,(cid:88)zN1N]:Wp(c(,viN)1N(y,1Nz1,Nz1N),(cid:88)yv11iN−1W|viN)≤(Wy1Nc(,iN)|z(1Ny1N−,z1Nv,1Nv1iG−1)|vi+1)} P1 ≤i∈(cid:88)A0,1 21(cid:88)vi v(cid:88)1i−1y1N(cid:88),z1N(cid:118)(cid:117)(cid:117)(cid:116)(cid:118)WWc(,iN)c(,i(N)y(1Ny,1Nz,1Nz,1Nv,1i−v1i1−|v1i|v+i)1)v(cid:88)iN+1 2N1−1pNX(z1N −v1NG)WN(y1N|z1N −v1NG) aPnd=(cid:88)(cid:88)|q(vN,zN)−p(vN,zN)|(cid:88)WN(yN|zN −vNG) =i∈(cid:88)A0,1 21(cid:88)vi v(cid:88)1i−1y1N(cid:88),z1N(cid:117)(cid:117)(cid:116)WWc(,iN)c(,i(N)y(1Ny,1Nz,1Nz,1Nv,1i−v1i1−|v1i|v+i)1)Wc(,iN) (y1N,z1N,v1i−1|vi) 2 vN zN 1 1 1 1 yN 1 1 1 = (cid:88) 1(cid:88)Z(W(i) ) 1 1 1 2 c,N 1≤{∃(cid:88)i∈[1,(cid:88)N]:W|qc((,iNv)N(y,1Nz,zN1N),v−1i−p1|(vviN)≤,WzNc(,iN))|(y(cid:88)1N,zW1N,vN1i−(1y|Nvi+|z1N)}−vNG) ≤iN∈AδN20,1 vi 1 1 1 1 1 1 1 vN zN yN VI. CHANNELCODING:PROOFFORTHEGENERALCASE 1 1 1 (cid:88)(cid:88) ≤ |q(v1N,z1N)−p(v1N,z1N)| For H ≤G, define v1N z1N (cid:110) (cid:12) A = i∈[1,N](cid:12)ZH(W(i) )<1−2 Nβ, We use the following two lemmas from [???]. H (cid:12) s,N − (cid:111) (cid:64)K ≤Hsuch that ZK(W(i) )<1−2 Nβ Lemma V.2. For p(·,·) and q(·,·) defined as above, s,N − (cid:110) (cid:12) (cid:88)(cid:88)(cid:12)(cid:12)p(vN,zN)−q(vN,zN)(cid:12)(cid:12)≤2 (cid:88) BH = i∈[1,N](cid:12)(cid:12)ZH(Wc(,iN) )<2−Nβ, 1 1 1 1 (cid:111) v1N z1N i∈A0,0∪A0,1 (cid:64)K ≤H such that ZK(Wc(,iN) )<2−Nβ (cid:26)(cid:12) (cid:12)(cid:27) E (cid:12)(cid:12)(cid:12)21 −pVi|V1i−1Z1N(0|V1N,Z1N)(cid:12)(cid:12)(cid:12) FthoartHfor≤KG≤anHd ≤K G≤,GZ,Hd(eWfin)e≤AHZ,KK(W=)A.HAl∩soBnKot.eNthoatet Lemma V.3. For i∈[1,N], if Z(W(i) )≥1−δ2 then ZH(Wc(,iN) )≤ZH(Ws(,iN) ) Therefore, if K (cid:2)H then s,N N (cid:110) (cid:12) (cid:111) E(cid:26)(cid:12)(cid:12)(cid:12)(cid:12)12 −pVi|V1i−1Z1N(0|V1N,Z1N)(cid:12)(cid:12)(cid:12)(cid:12)(cid:27)≤√2δN AH,K ⊆ i∈[1,N](cid:12)(cid:12)2−Nβ <ZH(Wc(,iN) )≤ZH(Ws(,iN) )<1−2−Nβ Since ZH(W(i) ) and ZH(W(i) ) both polarize to 0,1, as c,N s,N Notethatfori∈A ∪A ,Z(W(i) )≥1−δ2 .Therefore, N increases AH,K → 0 if K (cid:2) H. Note that the channel 0,0 0,1 s,N N N Lemmas [] and [] imply polarization results imply that as N increases, |ANH| → pH P ≤2√2Nδ and |BNH| →qH. 2 N 1) Encoding and Decoding: Let zN ∈GN be an outcome [5] E.Sasoglu,E.Telatar,andE.Arikan,“Polarizationforarbitrarydiscrete 1 oftherandomvariableZN knowntoboththeencoderandthe memoryless channels,” IEEE Information Theory Worshop, Dec. 2009, 1 Lausanne,Switzerland. decoder. Given K ≤ H ≤ G, let T be a transversal of H H [6] A.SahebiandS.Pradhan,“Polarcodesforsourceswithfinitereconstruc- in G and let TK H be a transversal of K in H. Any element tion alphabets,” in Communication, Control, and Computing (Allerton), g of G can be r≤epresented by g = [g] +[g] +[g] 201250thAnnualAllertonConferenceon,2012,pp.580–586. K TK≤H TH for unique [g] ∈ K, [g] ∈ T and [g] ∈ T . Also note thatKT +T TKis≤Ha transvKe≤rsHal T of KTHin GHso K H H K that g can be uniq≤uely represented by g = [g] +[g] for K TK some [g] ∈ T and [g] can be uniquely represented by TK K TK [g] =[g] +[g] . TK TK≤H TH Given asource sequencexN ∈XN, theencoding ruleis as 1 follows: For i ∈ [1,N], if i ∈ A for some K ≤ H ≤ G, H,K [v ] is uniformly distributed over K and is known to both i K the encoder and the decoder (and is independent from other random variables). The component [v ] is the message i TK≤H and is uniformly distributed but is only known to the encoder. The component [v ] is chosen randomly so that for g ∈ i TK [v ] +[v ] +T , i K i TK≤H H P(vi =g)= pVi|X1NZ1NVp1i−Vi1|(X[v1NiZ]K1NV+1i−[v1i(]gT|Kx≤N1H,z+1NT,vH1i|−x1N1),z1N,v1i−1) For i ∈ [1,N], if i ∈ A for some K (cid:2) H, [v ] is H,K i H uniformlydistributedoverH andisknowntoboththeencoder andthedecoderandthecomponent[v ] ischosenrandomly i TH so that for g ∈[v ] +T , i H H P(vi =g)= pVi|Xp1NVZi|1NXV1N1iZ−1N1(V[v1i−i]1H(g+|xTN1H,z|x1NN1,,vz1i−1N1,)v1i−1) For the moment assume that in this case v is known at the i receiver. Note that for i∈[1,N], if i∈A for some K ≤ H,K H ≤ G, v can be decomposed as v = [v ] +[v ] + i i i K i TK≤H [v ] in which [v ] is known to the decoder. The decoding i TH i K rule is as follows: Given zN and for i ∈ A for some 1 H,K K ≤H ≤G, let vˆi = argmax Wc(,iN) (z1N,vˆ1i−1|g) g∈[vi]K+[vi]TK≤H+TH It is shown in the next section that with this encoding and decodingrules,theprobabilityoferrorgoestozero.Itremains to send the v i ∈ A with K (cid:2) H to the decoder i H,K whichcanbedoneusingaregularpolarcode(whichachieves the symmetric capacity of the channel). Note that since the fraction |AH,K| vanishes as N increases if K (cid:2) H, the rate N loss due to this transmission can be made arbitrarily small. REFERENCES [1] E.Arikan,“ChannelPolarization:AMethodforConstructingCapacity- Achieving Codes for Symmetric Binary-Input Memoryless Channels,” IEEETransactionsonInformationTheory,vol.55,no.7,pp.3051–3073, 2009. [2] S. Korada and R. Urbanke, “Polar codes are optimal for lossy source coding,”inInformationTheoryWorkshop,2009.ITW2009.IEEE,2009, pp.149–153. [3] E. Arikan, “Source polarization,” in Information Theory Proceedings (ISIT),2010IEEEInternationalSymposiumon,2010,pp.899–903. [4] A.SahebiandS.Pradhan,“MultilevelChannelPolarizationforArbitrary DiscreteMemorylessChannels,”InformationTheory,IEEETransactions on,vol.59,no.12,pp.7839–7857,2013.