A Lower Bound on the Probability of Error of Polar Codes over BMS Channels

Boaz Shuval, Ido Tal
Department of Electrical Engineering, Technion, Haifa 32000, Israel.
Email: {bshuval@campus, idotal@ee}.technion.ac.il

Abstract—Polar codes are a family of capacity-achieving codes that have explicit and low-complexity construction, encoding, and decoding algorithms. Decoding of polar codes is based on the successive-cancellation decoder, which decodes in a bit-wise manner. A decoding error occurs when at least one bit is erroneously decoded. The various codeword bits are correlated, yet performance analysis of polar codes ignores this dependence: the upper bound is based on the union bound, and the lower bound is based on the worst-performing bit.

Improvement of the lower bound is afforded by considering error probabilities of two bits simultaneously. These are difficult to compute explicitly due to the large alphabet size inherent to polar codes. In this research we propose a method to lower-bound the error probabilities of bit pairs. We develop several transformations on pairs of synthetic channels that make the resultant synthetic channels amenable to alphabet reduction. Our method improves upon currently known lower bounds for polar codes under successive-cancellation decoding.

Index Terms—Channel polarization, channel upgrading, lower bounds, polar codes, probability of error.

(An abbreviated version of this article, with the proofs omitted, was submitted to ISIT 2017.)

I. INTRODUCTION

Polar codes [1] are a family of codes that achieve capacity on binary, memoryless, symmetric (BMS) channels and have low-complexity construction, encoding, and decoding algorithms. This is the setting we consider. Polar codes have since been extended to a variety of settings including source coding [2], [3], non-binary channels [4], asymmetric channels [5], channels with memory [6], [7], and more. The probability of error of polar codes is given by a union of correlated error events. The union bound, which ignores this correlation, is used to upper-bound the error probability. In this work, we exploit the correlation between error events to develop a general method for lower-bounding the probability of error of polar codes.

Polar codes are based on an iterative construction that transforms N = 2^n identical and independent channel uses into "good" and "bad" channels. The "good" channels are almost noiseless, whereas the "bad" channels are almost pure noise. Arıkan showed [1] that for every ε > 0, as N → ∞ the proportion of channels with capacity greater than 1 − ε tends to the channel capacity C and the proportion of channels with capacity less than ε tends to 1 − C.

The polar construction begins with two identical and independent copies of a BMS W and transforms them into two new channels,

    W^-(y_1, y_2 | u_1) = (1/2) Σ_{u_2} W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2),
    W^+(y_1, y_2, u_1 | u_2) = (1/2) W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2).                     (1)

Channel W^+ is a better channel than W whereas channel W^- is worse than W.¹ This construction can be repeated multiple times; each time we take two identical copies of a channel, say W^+ and W^+, and polarize them, e.g., to W^{+-} and W^{++}. We call the operation W ↦ W^- a '−'-transform, and the operation W ↦ W^+ a '+'-transform.

¹By this we mean that channel W^+ can be stochastically degraded to channel W, which in turn can be stochastically degraded to W^-.

There are N = 2^n possible combinations of n '−'- and '+'-transforms; we define channel W_i as follows. Let b_1 b_2 ··· b_n be the binary expansion of i − 1, where b_1 is the most significant bit (MSB). Then, channel W_i is obtained by n transforms of W according to the sequence b_1, b_2, ..., b_n, starting with the MSB: if b_j = 0 we do a '−'-transform and if b_j = 1 we do a '+'-transform. For example, if n = 3, channel W_5 is W^{+--}, i.e., it first undergoes a '+'-transform and then two '−'-transforms.
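To make the construction concrete, here is a small Python sketch (ours, not part of the paper) that applies the two transforms in (1) to a finite-output BMS channel stored as a table and composes them according to the binary expansion of i − 1. The table representation and the function names are our own choices. Note how the output alphabet squares with every transform; this is the alphabet growth discussed later in the introduction.

```python
from itertools import product

def minus_transform(W):
    """'-'-transform of (1). W maps output y -> (W(y|0), W(y|1));
    the result maps (y1, y2) -> (W^-(y1,y2|0), W^-(y1,y2|1))."""
    Wm = {}
    for (y1, p1), (y2, p2) in product(W.items(), W.items()):
        Wm[(y1, y2)] = tuple(
            0.5 * sum(p1[u1 ^ u2] * p2[u2] for u2 in (0, 1)) for u1 in (0, 1))
    return Wm

def plus_transform(W):
    """'+'-transform of (1); the result maps (y1, y2, u1) -> probabilities."""
    Wp = {}
    for (y1, p1), (y2, p2) in product(W.items(), W.items()):
        for u1 in (0, 1):
            Wp[(y1, y2, u1)] = tuple(
                0.5 * p1[u1 ^ u2] * p2[u2] for u2 in (0, 1))
    return Wp

def synthesize(W, i, n):
    """Synthetic channel W_i after n polarization steps: apply transforms
    according to the binary expansion of i-1, MSB first (0 -> '-', 1 -> '+')."""
    for j in range(n):
        bit = (i - 1) >> (n - 1 - j) & 1
        W = plus_transform(W) if bit else minus_transform(W)
    return W

# Example: W_5 for n = 3 over a BSC(0.1) is W^{+--}, as in the text.
bsc = {0: (0.9, 0.1), 1: (0.1, 0.9)}
W5 = synthesize(bsc, 5, 3)          # the output alphabet squares at every step
```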
Overall, we obtain N channels W_1, ..., W_N; channel W_i has input u_i and output y_1, ..., y_N, u_1, ..., u_{i−1}. I.e., channel W_i has binary input u_i, output that consists of the output and input of channel W_{i−1}, and assumes that the input bits of future channels u_{i+1}, ..., u_N are uniform. We call these synthetic channels. One then determines which synthetic channels are "good" and which are "bad", and transmits information over the "good" synthetic channels and predetermined values over the "bad" synthetic channels. Since the values transmitted over the latter are predetermined, we call the "bad" synthetic channels frozen.

Decoding is accomplished via the successive-cancellation (SC) decoder. It decodes the synthetic channels in succession, using previous bit decisions as part of the output. The bit decision for a synthetic channel is either based on its likelihood or, if it is frozen, on its predetermined value. I.e., denoting the set of non-frozen synthetic channels by A,

    Û_i(y_1^N, û_1^{i−1}) = argmax_{u_i} W_i(y_1^N, û_1^{i−1} | u_i)   if i ∈ A,
    Û_i(y_1^N, û_1^{i−1}) = u_i                                        if i ∈ A^c,

where we denoted y_1^N = y_1, ..., y_N and similarly for the previous bit decisions û_1^{i−1}. As non-frozen synthetic channels are almost noiseless, previous bit decisions are assumed to be correct. Thus, when N is sufficiently large, this scheme can be shown to achieve capacity [1], as the proportion of almost noiseless channels is C.

To analyze the performance of polar codes, let B_i denote the event that channel W_i errs under SC decoding while channels 1, 2, ..., i−1 do not. I.e.,

    B_i = { u_1^N, y_1^N | û_1^{i−1} = u_1^{i−1}, Û_i(y_1^N, û_1^{i−1}) ≠ u_i }.

The probability of error of polar codes under SC decoding is given by P{∪_{i∈A} B_i}. Let E_i denote the event that channel W_i errs given that a genie had revealed to it the true previous bits, i.e.,

    E_i = { u_1^N, y_1^N | Û_i(y_1^N, u_1^{i−1}) ≠ u_i }.

We call an SC decoder with access to genie-provided previous bits a genie-aided decoder.
Some thought reveals that ∪_{i∈A} B_i = ∪_{i∈A} E_i (see [4, Proposition 2.1] or [8, Lemma 1]). Thus, the probability of error of polar codes under SC decoding is equivalently given by P_e^SC(W) = P{∪_{i∈A} E_i}. In the sequel we assume a genie-aided decoder.

The events {B_i} are disjoint but difficult to analyze. The events E_i are easier to analyze, but are no longer disjoint. A straightforward upper bound for P{∪_{i∈A} E_i} is the union bound:

    P{∪_{i∈A} E_i} ≤ Σ_{i∈A} P{E_i}.

This bound facilitated the analysis of [1]. An important question is how tight this upper bound is. To this end, one approach is to develop a lower bound to P{∪ E_i}, which is what we pursue in this work.

A trivial lower bound on a union is

    P{∪_{i∈A} E_i} ≥ max_{i∈A} P{E_i}.                                                   (2)

Better lower bounds may be obtained by considering pairs of error events:

    P{∪_{i∈A} E_i} ≥ max_{i,j∈A} P{E_i ∪ E_j}.

Via the inclusion-exclusion principle, one can combine lower bounds on multiple pairs of error events to obtain a better lower bound [9]:

    P{∪_{i∈A} E_i} ≥ Σ_{i∈A} P{E_i} − Σ_{i,j∈A, i<j} P{E_i ∩ E_j}.                        (3)

This can also be cast in terms of unions of error events using P{E_i ∩ E_j} = P{E_i} + P{E_j} − P{E_i ∪ E_j}.
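As a small illustration (ours), the following helper combines per-bit error probabilities P{E_i} and pairwise union probabilities P{E_i ∪ E_j} into the bounds (2) and (3). It assumes the supplied values are exactly the quantities appearing in those formulas; how to bound the pairwise terms for polar codes is the subject of this paper.

```python
def union_lower_bounds(p_single, p_union):
    """p_single[i] = P{E_i}; p_union[(i, j)] = P{E_i ∪ E_j} for i < j.
    Returns the trivial bound (2) and the pairwise bound (3)."""
    trivial = max(p_single.values())                              # eq. (2)
    pairwise = sum(p_single.values())
    for (i, j), pu in p_union.items():
        p_intersect = p_single[i] + p_single[j] - pu              # P{E_i ∩ E_j}
        pairwise -= p_intersect                                   # eq. (3)
    return trivial, max(trivial, pairwise)

# Toy usage with made-up numbers:
print(union_lower_bounds({1: 0.02, 2: 0.03}, {(1, 2): 0.045}))    # (0.03, 0.045)
```

Since both (2) and (3) are valid lower bounds, taking their maximum is also a valid lower bound; this is why the helper returns the larger of the two.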
To our knowledge, to date there have been two attempts to compute a lower bound on the performance of the SC decoder, both based on (3). The first attempt was in [8], using a density evolution approach, and the second attempt, in [10], applies only to the BEC. We briefly introduce these below, but first we explain where the difficulty lies.

The probability P{E_i} is given by an appropriate functional of the probability distribution of synthetic channel W_i. However, the output alphabet of W_i is very large. If the output alphabet of W is Y then the output alphabet of W_i has size |Y|^N 2^{i−1}. This quickly grows unwieldy, recalling that N = 2^n. It is infeasible to store this probability distribution and it must be approximated. Such approximations are the subject of [11]; they enable one to compute upper and lower bounds on various functionals of the synthetic channel W_i.

To compute probabilities of unions of events, one must know the joint distribution of two synthetic channels. The size of the joint distribution's output alphabet is the product of each synthetic channel's alphabet size, rendering the joint distribution infeasible to store.

The authors of [8] suggested to approximate the joint distribution of pairs of synthetic channels using a density evolution approach. This provides an iterative method to compute the joint distribution, but does not address the problem of the amount of memory required to store it. Practical implementation of density evolution must involve quantization [12, Appendix B]. The probability of error derived from quantized joint distributions approximates, but does not generally bound, the real probability of error. For the special case of the BEC, as noted and analyzed in [8], no quantization is needed, as the polar transform of a BEC is a BEC. Thus, they were able to precisely compute the probabilities of unions of error events of descendants of a BEC using density evolution.

The same bounds for the BEC were developed in [10] using a different approach, again relying on the property that the polar transform of a BEC is a BEC. The authors were able to track the joint probability of erasure during the polarization process. Furthermore, they were able to show that the union bound is asymptotically tight for the BEC.

In this work, we develop an algorithm to compute lower bounds on the joint probability of error of two synthetic channels, P{E_i ∪ E_j}. Our technique is general, and applies to synthetic channels that are polar descendants of any BMS channel. We use these bounds in (3) to lower-bound the probability of error of polar codes. For the special case of the BEC, we recover the results of [8] and [10] using our bounds.

Our method is based on approximating the joint distribution with a stochastically upgraded joint distribution that has a smaller output alphabet. The difficulty is that key ideas that are true for single channels no longer apply to joint distributions. For example, a degrading operation on a joint distribution may improve the performance of an SC decoder. As another example, a sufficient statistic for a single synthetic channel is not a sufficient statistic for the joint distribution: two symbols that are indistinguishable for one synthetic channel may have very different meanings for future synthetic channels. Therefore, we develop methods that in one sense decouple the two synthetic channels yet in another sense couple them even further.

II. OVERVIEW OF OUR METHOD

In this section we provide a brief overview of our method, and lay out the groundwork for the sections that follow. We aim to produce a lower bound on the probability of error of two synthetic channels. Since we cannot know the precise joint distribution, we must approximate it. The approximation is rooted in stochastic degradation.

Degradation is a partial ordering of channels. Let W(y|u) and Q(z|u) be two channels. We say that W is (stochastically) degraded with respect to Q, denoted W ≼ Q, when there exists some channel P(y|z) such that

    W(y|u) = Σ_z P(y|z) Q(z|u).                                                          (4)

If W is degraded with respect to Q then Q is upgraded with respect to W. Degradation implies an ordering on the probability of error of the channels [12, Chapter 4]: if W ≼ Q then P_e(W) ≥ P_e(Q). This is true only when the decoder used is the optimal decoder.

The notion of degradation readily extends to joint channels. We say that joint channel Q_{a,b}(z_a, z_b | u_a, u_b) ≽ W_{a,b}(y_a, y_b | u_a, u_b) via some channel P(y_a, y_b | z_a, z_b) if

    W_{a,b}(y_a, y_b | u_a, u_b) = Σ_{z_a, z_b} P(y_a, y_b | z_a, z_b) Q_{a,b}(z_a, z_b | u_a, u_b).    (5)

As for the single channel case, if Q_{a,b} ≽ W_{a,b} then P_e(W_{a,b}) ≥ P_e(Q_{a,b}), where P_e is the probability of error of the optimal decoder for the joint channel. Indeed, our approach will be to approximate the joint synthetic channel with an upgraded joint channel with smaller output alphabet. There is a snag, however: this ordering of error probabilities does not hold, in general, for suboptimal decoders.

The SC decoder, used for polar codes, is suboptimal. In the genie-aided case, which we consider here, it is equivalent to performing a maximum-likelihood decision on each marginal separately. We shall demonstrate the suboptimality of the SC decoder in Section III. Then, we will develop a different decoding criterion whose performance lower-bounds the SC decoder performance and is ordered by degradation. While in general this decoder requires an exhaustive search, for the special case of polar codes this decoder is easily found. It does, however, imply a special structure for the degrading channel, which we use to our advantage.

We investigate the joint distribution of two synthetic channels in Section IV. We first bring it to a more convenient form that will be used in the sequel. Then, we explain how to polarize a joint synthetic channel distribution and explore some consequences of symmetry. Further consequences of symmetry are the subject of Section V, in which we transform the channel to another form that greatly simplifies the steps that follow. This form exposes the inherent structure of the joint distribution. How to actually upgrade joint distributions is the subject of Section VI. We upgrade the joint distribution in two ways; each upgrades one marginal without changing the other. We cannot simply upgrade the marginals, as we must consider the joint distribution as a whole. This is where the above-mentioned methods of coupling-decoupling come into play. We present our algorithm for lower-bounding the probability of error of polar codes in Section VII. This algorithm is based on the building blocks presented in the previous sections. We demonstrate our algorithm with some numerical results in Section VIII.

A. Notation

We denote y_j^k = y_j, y_{j+1}, ..., y_k for j < k. We use an Iverson-style notation (see [13]) for indicator (characteristic) functions. I.e., for a logical expression expr, [expr] is 0 whenever expr is not true and is 1 otherwise. We assume that the indicator function takes precedence whenever it appears, e.g., (n−1)[n>0] is 0 for n = 0.
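As a toy illustration (ours) of (4) and of the error-probability ordering it implies under optimal decoding, the following sketch degrades a BSC through an arbitrary symbol-merging channel P and checks that the ML error probability does not decrease. The matrices, their shapes, and the particular merging channel are our own conventions.

```python
import numpy as np

def degrade(Q, P):
    """Apply (4): W(y|u) = sum_z P(y|z) Q(z|u).
    Q has shape (|Z|, 2) with Q[z, u] = Q(z|u); P has shape (|Y|, |Z|)
    with P[y, z] = P(y|z), so each column of P sums to 1."""
    return P @ Q

def ml_error(W):
    """Error probability of the optimal (ML) decoder for a binary-input
    channel with equiprobable inputs: P_e = (1/2) * sum_y min(W(y|0), W(y|1))."""
    return 0.5 * np.minimum(W[:, 0], W[:, 1]).sum()

Q = np.array([[0.9, 0.1],      # BSC(0.1)
              [0.1, 0.9]])
P = np.array([[0.5, 0.0],      # keep z = 0 with probability 1/2
              [0.5, 0.5],      # otherwise lump both z into one blurred symbol
              [0.0, 0.5]])     # keep z = 1 with probability 1/2
W = degrade(Q, P)
assert ml_error(W) >= ml_error(Q)   # degradation can only hurt the ML decoder
print(ml_error(Q), ml_error(W))     # 0.1 0.3
```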
III. DECODING OF TWO DEPENDENT CHANNELS

In this section, we tackle decoding of two dependent channels. We explain how this differs from the case of decoding a single channel, and dispel some misconceptions that may arise. We then specialize the discussion to polar codes. We explain the difficulty with combining the SC decoder with degradation procedures, and develop a different decoding criterion instead. Finally, we develop a special structure for the degrading channel that, combined with the decoding criterion, implies ordering of probability of error by degradation.

A. General Case

A decoder for channel W: U → Y is a mapping φ that maps every output sequence y ∈ Y to some u ∈ U. The average probability of error of the decoder for equiprobable inputs is given by

    P_e = (1/|U|) Σ_u Σ_y W(y|u) P{φ(y) ≠ u}.

The decoder is deterministic for symbols y where P{φ(y) ≠ u} assumes only the values 0 and 1. For some symbols, however, we allow the decoder to make a random decision. If W(y|u) = W(y|u') for some u, u' ∈ U, then P_e is the same whether φ(y) = u or φ(y) = u'. Thus, the probability of error is insensitive to the resolution of ties. We denote the error event of a decoder by E = {(u, y): φ(y) ≠ u}. It is dependent on the decoder, i.e., E = E(φ); we suppress this to avoid cumbersome notation. Clearly, P_e = P{E}.

The maximum-likelihood (ML) decoder, well known to minimize P_e when the input bits are equiprobable, is defined by

    W(y|u) > W(y|u')  ∀u' ≠ u  ⇒  φ(y) = u.                                              (6)

The ML decoder is not unique, as it does not define how ties are resolved.

We now consider two dependent binary-input channels, W_a: U → Y_a and W_b: U → Y_b, with joint distribution W_{a,b}: U × U → Y_a × Y_b. The optimal decoder for the joint channel considers both outputs together and makes a decision for both inputs jointly; its probability of error is P_e(W_{a,b}). Rather than jointly decoding the input bits based on the joint output, we may opt to decode each channel separately. I.e., the decoder of channel W_a bases its decision solely on y_a and completely ignores y_b, and vice versa.
On the other hand, finding an IMJP decoder a b for the joint channel, even when the individual decoders turn requires an exhaustive search, which may be costly. In the out to be ML decoders. This is because we decode each input polar coding setting, as we now show, the special structure bit separately using only a portion of the joint output. Clearly, of joint synthetic channels permits finding the IMJP decoder without resorting to a search procedure. P (W )≤ min P{E ∪E }≤PIML(W ). (7) e a,b φa,φb a b e a,b 1) Joint Distribution of Two Synthetic Channels: Let W be some BMS channel that undergoes n polarization steps. Let The three decoders in (7) successively use less information for a and b be two indices of synthetic channels, where b > a. their decisions. The optimal decoder uses both outputs jointly The synthetic channels are W (y |u ) and W (y |u ), where as well as knowledge of the joint probability distribution; the a a a b b b y = (yN,ua−1), y = (yN,ub−1), and N = 2n. We call IMJPdecoderretainstheknowledgeofthejointdistribution,but a 1 1 b 1 1 them the a-channel and the b-channel, respectively. Their joint uses each output separately; finally, the IML decoder dispenses distribution is W (y ,y |u ,u ). I.e., this is the probability with the joint distribution and operates as if the marginals are a,b a b a b that the output of the a-channel is y and the output of the independent channels. a b-channel is y , given that the inputs to the channels are u b a Example 1. The conditional distribution Wa,b(ya,yb|ua,ub) and ub, respectively. of some joint channel is given in Table I.2 The marginals With probability 1, the prefix of y is (y ,u ). Namely, y b a a b are channels W (y |u ) and W (y |u ). The optimal decoder a a a b b b has the form for the joint channel chooses, for each output pair, the input pair with the highest probability; it achieves Pe(Wa,b)=0.52. yb =((y1N,ua1−1),ua,uab−+11)≡(ya,ua,yr), It is easily verified that the ML decoders of the marginals where y denotes the remainder of y after removing y and decide that the input is 0 when 1 is received and vice versa; r b a u . Thus, thus, PIML(W ) = 0.7075. If we change the decoder of a e a,b channel Wb to always declare 0, regardless of the output, Wa,b(ya,yb|ua,ub)=2Wb(yb|ub) yb =(ya,ua,yr) , (8) then P{E ∪E } = 0.6575. By checking all combinations (cid:74) (cid:75) a b for some arbitrary y . The factor 2 stems from the uniform of decoders φ ,φ , it can be verified that this is indeed the r a b minimum value of P{E ∪E }. As expected, (7) holds. distribution of ua. With some abuse of notation, we will write a b We now demonstrate that the probability of error of subop- W (y ,y |u ,u )=W (y |u ,u ) a,b a b a b a,b b a b timal decoders is not ordered by degradation. To this end, =W (y ,u ,y |u ,u ). we degrade the joint channel in Table I by merging the a,b a a r a b output symbols (0,0),(1,1) into a new symbol, (0(cid:48),0(cid:48)) and The right-most expression makes it clear that the portion of (0,1),(1,0) into a new symbol, (1(cid:48),1(cid:48)). Denote the new joint y in which the input of the a-channel appears must equal the b channel Wa(cid:48),b. For each of the marginals, the ML decoder actual input of the a-channel. declares 0 upon receipt of 0(cid:48), and 1 otherwise. Hence, for Observe from (8) that we can think of W (y ,u ,y |u ) as b a a r b the degraded channel, PeIML(Wa(cid:48),b) = 0.555, which is lower the joint distribution Wa,b up to a constant factor. Indeed, we than PeIML(Wa,b). 
We now demonstrate that the probability of error of suboptimal decoders is not ordered by degradation. To this end, we degrade the joint channel in Table I by merging the output symbols (0,0), (1,1) into a new symbol, (0', 0'), and (0,1), (1,0) into a new symbol, (1', 1'). Denote the new joint channel W'_{a,b}. For each of the marginals, the ML decoder declares 0 upon receipt of 0', and 1 otherwise. Hence, for the degraded channel, P_e^IML(W'_{a,b}) = 0.555, which is lower than P_e^IML(W_{a,b}). For the degraded channel, the IML decoder is also the optimal decoder. As this is a degraded channel, however, P_e^IML(W'_{a,b}) = P_e(W'_{a,b}) ≥ P_e(W_{a,b}) = 0.52.

B. Polar Coding Setting

Given a joint channel distribution, finding an optimal or IML decoder is an easy task. In both cases we use maximum-likelihood decoders; in the first case based on the joint distribution, whereas in the second case based on the marginal distributions. On the other hand, finding an IMJP decoder requires an exhaustive search, which may be costly. In the polar coding setting, as we now show, the special structure of joint synthetic channels permits finding the IMJP decoder without resorting to a search procedure.

1) Joint Distribution of Two Synthetic Channels: Let W be some BMS channel that undergoes n polarization steps. Let a and b be two indices of synthetic channels, where b > a. The synthetic channels are W_a(y_a|u_a) and W_b(y_b|u_b), where y_a = (y_1^N, u_1^{a−1}), y_b = (y_1^N, u_1^{b−1}), and N = 2^n. We call them the a-channel and the b-channel, respectively. Their joint distribution is W_{a,b}(y_a, y_b | u_a, u_b). I.e., this is the probability that the output of the a-channel is y_a and the output of the b-channel is y_b, given that the inputs to the channels are u_a and u_b, respectively.

With probability 1, the prefix of y_b is (y_a, u_a). Namely, y_b has the form

    y_b = ((y_1^N, u_1^{a−1}), u_a, u_{a+1}^{b−1}) ≡ (y_a, u_a, y_r),

where y_r denotes the remainder of y_b after removing y_a and u_a. Thus,

    W_{a,b}(y_a, y_b | u_a, u_b) = 2 W_b(y_b | u_b) [y_b = (y_a, u_a, y_r)],              (8)

for some arbitrary y_r. The factor 2 stems from the uniform distribution of u_a. With some abuse of notation, we will write

    W_{a,b}(y_a, y_b | u_a, u_b) = W_{a,b}(y_b | u_a, u_b) = W_{a,b}(y_a, u_a, y_r | u_a, u_b).

The right-most expression makes it clear that the portion of y_b in which the input of the a-channel appears must equal the actual input of the a-channel.

Observe from (8) that we can think of W_b(y_a, u_a, y_r | u_b) as the joint distribution W_{a,b} up to a constant factor. Indeed, we will use W_b(y_a, u_a, y_r | u_b) to denote the joint channel where convenient.

2) Decoders for Joint Synthetic Channels: Which decoders can we consider for joint synthetic channels? The optimal decoder extracts u_a from the output of the b-channel and proceeds to decode u_b. This outperforms the SC decoder but is also impractical and does not lend itself to computing the probability that is of interest to us, the probability that either of the synthetic channels errs. A natural suggestion is to mimic the SC decoder, i.e., to use an IML decoder. The joint probability of error of this decoder may decrease after stochastic degradation, so we discard this option.

Consider two decoders φ_a and φ_b for channels W_a and W_b, respectively. As above, E_i is the error event of channel W_i using decoder φ_i. We seek a lower bound on P{E_a ∪ E_b}. Therefore, we choose decoders φ_a and φ_b that minimize P{E_a ∪ E_b}; this is none other than the IMJP decoder. Its performance lower-bounds that of the IML decoder (see (7)). As we shall later see, combined with a suitable degrading channel structure, the probability of error of the IMJP decoder increases after stochastic degradation. Conversely, it decreases under stochastic upgradation; thus, combining the IMJP decoder with a suitable upgrading procedure produces the desired lower bound.

Multiple decoders may achieve min_{φ_a, φ_b} P{E_a ∪ E_b}. One decoder can be found in a straightforward manner; we call it the IMJP decoder. The following theorem shows how to find it. Its proof is a direct consequence of Lemmas 3 and 4 that follow.

Theorem 1. Let W_a(y_a|u_a) and W_b(y_b|u_b) be two channels with joint distribution W_{a,b} that satisfies (8). Then, min_{φ_a, φ_b} P{E_a ∪ E_b} is achieved by setting φ_b as an ML decoder for W_b and φ_a according to

    φ_a(y_a) = argmax_{u_a} T(y_a | u_a),                                                (9)

where

    T(y_a | u_a) = (1/2) Σ_{y_b, u_b} W_{a,b}(y_a, y_b | u_a, u_b) P{φ_b(y_b) = u_b}.    (10)

Note that T(y_a | u_a) is not a conditional distribution; it is non-negative, but its sum over y_a does not necessarily equal 1.

Corollary 2. Theorem 1 holds for any two synthetic channels W_a(y_a|u_a) and W_b(y_b|u_b) that result from the same number of polarization steps of a BMS, where index b is greater than a.

Proof: In the polar code case, the joint channel satisfies (8), so Theorem 1 applies.

In what follows, denote

    ϕ_i(y_i, u_i) ≜ P{φ_i(y_i) = u_i},   i = a, b.

Lemma 3. Let W_a(y_a|u_a) and W_b(y_b|u_b) be two dependent binary-input channels with equiprobable inputs and joint distribution W_{a,b} that satisfies (8). Let φ_a: Y_a → U be some decoder for channel W_a with error event E_a. Then, setting φ_b as an ML decoder for W_b achieves min_{φ_b} P{E_a ∪ E_b}.

Proof: Recall that y_b = (y_a, u_a, y_r). Using (8),

    1 − P{E_a ∪ E_b} = (1/4) Σ_{u_a, u_b} Σ_{y_a, y_b} W_{a,b}(y_a, y_b | u_a, u_b) ϕ_a(y_a, u_a) ϕ_b(y_b, u_b)
                     = (1/2) Σ_{u_a, y_a, y_b} ϕ_a(y_a, u_a) [y_b = (y_a, u_a, y_r)] g(y_b),

where

    g(y_b) = Σ_{u_b} ϕ_b(y_b, u_b) W_b(y_b | u_b).

The problem of finding the decoder φ_b that minimizes P{E_a ∪ E_b} is separable over u_a, y_a, y_b; the terms ϕ_a(y_a, u_a), [y_b = (y_a, u_a, y_r)] are non-negative and independent of u_b. Therefore, the optimal decoder φ_b is given by φ_b(y_b) = argmax_{u'_b} W_b(y_b | u'_b).

Lemma 4. Let W_a(y_a|u_a) and W_b(y_b|u_b) be two binary-input channels with joint distribution W_{a,b}(y_a, y_b | u_a, u_b) and equiprobable inputs. Let φ_b: Y_b → U be some decoder for channel W_b. Then, the decoder φ_a for channel W_a given by (9) minimizes P{E_a ∪ E_b}.

Proof: Since the input is equiprobable,

    1 − P{E_a ∪ E_b} = (1/4) Σ_{u_a, u_b} Σ_{y_a, y_b} W_{a,b}(y_a, y_b | u_a, u_b) ϕ_a(y_a, u_a) ϕ_b(y_b, u_b)
                     = (1/2) Σ_{u_a, y_a} ϕ_a(y_a, u_a) · (1/2) Σ_{u_b, y_b} W_{a,b}(y_a, y_b | u_a, u_b) ϕ_b(y_b, u_b)
                     = (1/2) Σ_{u_a, y_a} T(y_a | u_a) ϕ_a(y_a, u_a),

where the last equality is by (10). The problem of finding the decoder φ_a that minimizes P{E_a ∪ E_b} is separable over y_a; clearly the optimal decoder is the one that sets φ_a(y_a) = argmax_{u'_a} T(y_a | u'_a).
Using (8), if φ_b is chosen as an ML decoder, as per Lemma 3, we have the following expression for T(y_a | u_a):

    T(y_a | u_a) = Σ_{y_r} Σ_{u_b} W_b(y_a, u_a, y_r | u_b) ϕ_b(y_b, u_b)
                 = Σ_{y_r} max_{u_b} W_b(y_a, u_a, y_r | u_b).                           (11)

The IMJP and IML decoders do not coincide in general, although in some cases they may indeed coincide. We demonstrate this in the following example.

Example 2. Let W be a BSC with crossover probability p. We perform n = 2 polarization steps and consider the joint channel W_{1,4}, i.e., W_a = W^{−−} and W_b = W^{++}. When p = 0.4, we have 0.6544 = P_e^IMJP(W_{1,4}) < P_e^IML(W_{1,4}) = 0.6976. On the other hand, when p = 0.2, P_e^IMJP(W_{1,4}) = P_e^IML(W_{1,4}) = 0.5136. In either case, (7) holds.

Remark 1. In the special case where W is a BEC and W_a and W_b are two of its polar descendants, the IMJP and IML (SC) decoders coincide. This is thanks to a special property of the BEC: erasures for a synthetic channel are determined by the outputs of the N = 2^n copies of the BEC, regardless of the inputs of previous synthetic channels. We show this in Appendix A.
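Theorem 1 and (11) translate directly into a table-based computation. The sketch below (ours) takes the joint channel in the successive form W_b(y_a, u_a, y_r | u_b) of (8), computes T via (11), and evaluates P_e^IMJP and P_e^IML; the dictionary format and function names are our own. The final lines build the single-step pair (W^-, W^+) of a BSC as a toy input; Example 2 in the text uses the two-step pair W_{1,4} instead.

```python
from collections import defaultdict

def imjp_error(Wb):
    """Wb maps the b-channel output (ya, ua, yr) -> (W_b(.|ub=0), W_b(.|ub=1));
    the joint channel is W_{a,b} = 2*W_b by (8).
    Returns P_e^IMJP = min over individual decoders of P{E_a ∪ E_b} (Theorem 1)."""
    T = defaultdict(float)
    for (ya, ua, yr), p in Wb.items():
        T[(ya, ua)] += max(p)                       # T(ya|ua), eq. (11)
    ya_set = {ya for (ya, ua) in T}
    # 1 - P{E_a ∪ E_b} = (1/2) * sum_ya max_ua T(ya|ua)   (proof of Lemma 4)
    return 1.0 - 0.5 * sum(max(T[(ya, 0)], T[(ya, 1)]) for ya in ya_set)

def iml_error(Wb):
    """P{E_a ∪ E_b} when each marginal uses its own ML decoder (ties -> 0)."""
    Wa = defaultdict(lambda: [0.0, 0.0])            # marginal W_a(ya|ua)
    for (ya, ua, yr), p in Wb.items():
        Wa[ya][ua] += p[0] + p[1]
    phi_a = {ya: int(w[1] > w[0]) for ya, w in Wa.items()}
    correct = 0.0
    for (ya, ua, yr), p in Wb.items():
        if phi_a[ya] == ua:
            correct += 0.5 * max(p)                 # phi_b is ML on W_b
    return 1.0 - correct

# Toy input: one polarization step of a BSC(p); then W_a = W^-, W_b = W^+,
# and y_b = (y_a, u_a) with an empty remainder y_r, per (1) and (8).
p = 0.4
W = {0: (1 - p, p), 1: (p, 1 - p)}
Wb = {((y1, y2), u1, ()): tuple(0.5 * W[y1][u1 ^ u2] * W[y2][u2] for u2 in (0, 1))
      for y1 in W for y2 in W for u1 in (0, 1)}
print(imjp_error(Wb), iml_error(Wb))
```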
Denote P(ya,yb|za,zb)=P1(ya|za,zb)·P2(yb|za,zb,ya), by argPa(ya|za) the result of drawing ya with probability P (·|z ). Then, ψ (z ) = φ (argP (y |z )); i.e., this is where P and P are probability distributions. This form a a a a a a a a 1 2 the decoder that results from first degrading the a-channel does not preserve the successive structure of joint synthetic output and only then decoding. Next, consider the decoder channels (8). Even if Q satisfies (8), the resulting W may a,b a,b for the b-channel. Denote by argP (y |z ,u ,z ,y ) the not. To this end, we turn to a subset of degrading channels. b r a a r a result of drawing y with probability P (·|z ,u ,z ,y ). Recalling that y = (y ,u ,y ), we consider degrading r b a a r a b a a r Then, similar to the a-channel case, ψ (z ,u ,z ) = channels of the form b a a r φ (argP (y |z ),u ,argP (y |z ,u ,z ,y )). Hence, the P(y ,u ,y |z ,u ,z ) b a a a a b r a a r a a a r a a r (12) bestdecoderpairψa,ψb cannotdoworsethanthebestdecoder =P (y |z )·P (y |z ,u ,z ,y ). a a a b r a a r a pair φ ,φ . a b I.e., these degrading channels degrade z , the output of Q , to LetW beaBMSchannelthatundergoesnpolarizationsteps. a a y , pass u unchanged, and degrade z , the remainder of Q ’s The probability of error of a polar code with non-frozen set A a a r b output, to y . For this to be a valid channel, P and P must under SC decoding is given by PSC(W) = P(cid:8)(cid:83) EML(cid:9), r a b e a∈A a be probability distributions. This degrading channel structure where EML is the error probability of synthetic channel W a a is illustrated in Figure 1. By construction, degrading channels under ML decoding. Obviously, for any A(cid:48) ⊆A, of the form (12) preserve the form (8) that is required for (cid:40) (cid:41) efficiently computing the IMJP decoder as in Theorem 1. PSC(W)≥P (cid:91) EML . (13) e a Definition1(Properdegradingchannels). Adegradingchannel a∈A(cid:48) p of the form (12) is called proper. We write Q(cid:60)W to denote We have already mentioned the simplest such lower bound, that channel Q is upgraded from W with a proper degrading PeSC(W) ≥ maxa∈AP(cid:8)EMaL(cid:9). We now show that the IMJP channel. We say that an upgrading (degrading) procedure is decoder provides a tighter lower bound. To this end, denote proper if its degrading channel is proper. PIMJP(W )=min P{E ∪E }, where E is the proba- e a,b φa,φb a b i bility of error of channel i under decoder φ . By marginalizing the joint distribution it is straight-forward i todeducethefollowingforjointsyntheticchanneldistributions. Lemma 7. Let W be a BMS channel that undergoes n polarization steps, and let A be the set of non-frozen bits. Lemma 5. If joint channel Q (z ,u ,z |u ,u ) (cid:60)p Then, a,b a a r a b Wa,b(ya,ua,yr|ua,ub), then Qa(za|ua) (cid:60) Wa(ya|ua) and PSC(W)≥ max PIMJP(W )≥maxP(cid:8)EML(cid:9). (14) Qb(za,ua,zr|ub)(cid:60)Wb(ya,ua,yr|ub). e a,b∈A e a,b a∈A a ItiTsheiasslyemtomtaakiesdeengcroaudrianggincgh,anbnuetlisntshuaftfiacrieenutsfeodrfoourrdpeugrrpaodsinegs. Proof:Using(13),PeSC(W)≥maxa,b∈AP(cid:8)EMaL∪EMbL(cid:9). By definition, the IMJP decoder seeks decoders φ and φ that a b asingle(notjoint)syntheticchannelandcastthemintoaproper minimize the joint probability of error of synthetic channels degrading channel for joint distributions. This, however, is not with indices a and b. Therefore, for any two indices a and ourgoal.Instead,westartwithWa,bandseekanupgradedQa,b b we have P(cid:8)EML∪EML(cid:9)≥PIMJP(W ). 
In particular, this withsmalleroutputalphabetthatcanbedegradedtoW using a b e a,b a,b holds for the indices a,b that maximize the right-hand-side. a proper degrading channel. This is a very different problem This establishes the leftmost inequality of (14). than the degrading one, and its solution is not immediately To establish the rightmost inequality of (14), we first show apparent. Plain-vanilla attempts to use upgrading procedures that for any a,b, for single channels fail to produce the desired results. Later, we develop proper upgrading procedures that upgrade one of PIMJP(W )≥max{P(cid:8)EML(cid:9),P(cid:8)EML(cid:9)}. (15) e a,b a b the marginals without changing the other. We now show that the probability of error of the IMJP de- To see this, first recall that the IMJP decoder performs ML coder does not decrease after degradation by proper degrading decoding on the b-channel, yielding PIMJP(W )≥P(cid:8)EML(cid:9). e a,b b 7 p Next, we construct W(cid:48) (cid:60) W in which the b-channel is b-channel’s ML decision is immediately apparent. Moreover, a,b a,b noiseless, by augmenting the y portion of its with u , i.e., this description greatly simplifies the expressions that follow. r b W(cid:48) (y ,u ,(y ,v )|u ,u ) Lemma 8. Channels Wa,b(ya,ua,yr|ua,ub) and a,b a a r b a b W (y ,u ,d |u ,u ) are equivalent and the degrading a,b a a b a b =W (y ,u ,y |u ,u ) v =u . a,b a a r a b b b channels from one to the other are proper. (cid:74) (cid:75) Channel W(cid:48) can be degraded to W using a proper a,b a,b Proof:Toestablishequivalenceweshowthateachchannel degrading channel by omitting v from the y portion of b r is degraded from the other using proper degrading channels. the output and leaving y unchanged. Thus, PIMJP(W ) ≥ PIMJP(W(cid:48) )=P(cid:8)EML(cid:9).a e a,b The only portion of interest in (12) is Pb, as in either direction aPmneeIayFMxiJnPc(aWl{>lyPda,,,IaabMd00eJ)Pn(>o≥WtedPaw(cid:8)0a)Ee,=MaP0hLIaaM(cid:9)vr.JegPS(mWPinaeIcMxeJaP∈m()AW}aPxwa0a(cid:8)e,,cEbo)∈MabAL≥ta(cid:9)Pin.eIPMBth(cid:8)JPyeE(WMap(01rLoa5(cid:9),o)b,f).afno≥dr ybDay-vaDanWlydduabea,uu,boaa(fyat(ahry,eeau,usaeun,tacdh,obya|furnaa)gl,eliusdsbyd)bmby,bftoholersfidyxeregdsrauydcaihn,guthaac.thTathnhenenebl,.-cDheannonteel c,d e a0,c e d,a0 (cid:88) Lemmas6and7areinstrumentalforourlowerbound,which = W (y ,u ,y |u ,u ) a,b a a r a b combines upgrading operations and the IMJP decoder. Dydab,ua (cid:88) = W (y ,u ,y |u ,u )·P (d |y ,y ,u ), IV. PROPERTIESOFJOINTSYNTHETICCHANNELS a,b a a r a b b b r a a In this section, we study the properties of joint synthetic yr where channels. We begin by bringing the joint synthetic channel into an equivalent form where the b-channel’s ML decision is Pb(db|yr,ya,ua)=(cid:113)yr ∈Dydab,ua(cid:121). immediately apparent. We then explain how to jointly polarize Clearly, the b-channel D-value of (y ,u ,d ) is d . a a b b synthetic channels. Finally, we describe some consequences of On the other hand, by (8) and since all symbols in Ddb symmetry on joint distributions and on the IMJP decoder. share the same b-channel D-value, ya,ua W (y ,u ,y |u ,u ) A. 
Lemma 8. Channels W_{a,b}(y_a, u_a, y_r | u_a, u_b) and W_{a,b}(y_a, u_a, d_b | u_a, u_b) are equivalent, and the degrading channels from one to the other are proper.

Proof: To establish equivalence we show that each channel is degraded from the other using proper degrading channels. The only portion of interest in (12) is P_b, as in either direction y_a and u_a remain unchanged. Denote by D_{y_a,u_a}^{d_b} the set of all y_r such that, for fixed y_a, u_a, the b-channel symbols (y_a, u_a, y_r) share the same b-channel D-value d_b. Then, in one direction,

    W_{a,b}(y_a, u_a, d_b | u_a, u_b) = Σ_{y_r ∈ D_{y_a,u_a}^{d_b}} W_{a,b}(y_a, u_a, y_r | u_a, u_b)
                                      = Σ_{y_r} W_{a,b}(y_a, u_a, y_r | u_a, u_b) · P_b(d_b | y_r, y_a, u_a),

where

    P_b(d_b | y_r, y_a, u_a) = [y_r ∈ D_{y_a,u_a}^{d_b}].

Clearly, the b-channel D-value of (y_a, u_a, d_b) is d_b. On the other hand, by (8) and since all symbols in D_{y_a,u_a}^{d_b} share the same b-channel D-value,

    W_{a,b}(y_a, u_a, y_r | u_a, u_b) = Σ_{d_b} W_{a,b}(y_a, u_a, d_b | u_a, u_b) · P'_b(y_r | d_b, y_a, u_a),

where

    P'_b(y_r | d_b, y_a, u_a) = [W_b(y_a, u_a, y_r) / Σ_{y'_r ∈ D_{y_a,u_a}^{d_b}} W_b(y_a, u_a, y'_r)] · [y_r ∈ D_{y_a,u_a}^{d_b}],

and W_b(y_a, u_a, y_r) = (1/2) Σ_{u_b} W_b(y_a, u_a, y_r | u_b).

Remark 2. In the sequel, we will use this lemma to convert to D-value representation the result of polarizing a joint distribution in D-value representation (see Section IV-B). This is possible because Lemma 8 holds for any representation of W_{a,b}(y_a, u_a, y_r | u_a, u_b) in which u_a, y_a are the input and output, respectively, of the a-channel, u_b is the input of the b-channel, and (y_a, u_a, y_r) is the output of the b-channel. In particular, y_r need not consist of inputs to channels W_{a+1}, ..., W_{b−1}.

Remark 3. At this point the reader may wonder why we have stopped here and not converted the a-channel output to its D-value. The reason is that this constitutes a degrading operation, which is the opposite of what we need. Two a-channel symbols with the same a-channel D-value may have very different meanings for the IMJP decoder. Thus, we cannot combine them into a single symbol without incurring loss.

When the joint distribution is in D-value representation, proper degrading channels admit the form

    P(y_a, u_a, d_b | z_a, u_a, z_b) = P_a(y_a | z_a) P_b(d_b | z_a, y_a, u_a, z_b).     (16)

It is obvious that all properties obtained from degrading channels of the form (12) are retained for degrading channels of the form (16). By Lemma 8, we may assume that the degraded channel is also in D-value representation.
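Lemma 8 suggests a simple table operation: for each fixed (y_a, u_a), merge all remainder symbols y_r that share the same b-channel D-value. A sketch (ours), in the same dictionary format used earlier:

```python
from collections import defaultdict

def to_dvalue_representation(Wb):
    """Merge b-channel symbols per Lemma 8: (ya, ua, yr) -> (ya, ua, db), where
    db is the b-channel D-value of the symbol. Input and output are dicts
    mapping the b-channel output symbol to (W_b(.|ub=0), W_b(.|ub=1))."""
    merged = defaultdict(lambda: [0.0, 0.0])
    for (ya, ua, yr), (p0, p1) in Wb.items():
        db = 0.0 if p0 + p1 == 0 else (p0 - p1) / (p0 + p1)
        db = round(db, 12)              # guard against floating-point near-ties
        merged[(ya, ua, db)][0] += p0
        merged[(ya, ua, db)][1] += p1
    return {k: tuple(v) for k, v in merged.items()}
```

By Lemma 8 this conversion is an equivalence via proper degrading channels, so quantities such as P_e^IMJP computed from the merged table are unchanged; only the alphabet shrinks.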
B. Polarization for Joint Synthetic Channels

Let W_{a,b}(y_a, u_a, d_b | u_a, u_b) be some joint synthetic channel distribution in D-value representation. We wish to find the distribution of W_{aα,bβ}, where α, β ∈ {−, +}. Even though W_{a,b} is in D-value representation, after a polarization transform this is no longer the case. Of course, one can always bring the polarized joint channel to an equivalent D-value representation as in Lemma 8.

The polar construction is shown in Figure 2, where we explicitly state the different outputs of the polarized channels. We note that the top copy of W_{a,b} outputs, jointly, (y_a, u_a ⊕ ν_a, d_b), as its a-input is u_a ⊕ ν_a.

[Figure 2: Polar construction applied jointly to W_a and W_b with joint distribution W_{a,b}. The two joint channels are independent duplicates; their inputs are combined using a (u ⊕ v, v) construction. The top copy has inputs (u_a ⊕ ν_a, u_b ⊕ ν_b) and outputs (y_a, d_b); the bottom copy has inputs (ν_a, ν_b) and outputs (η_a, δ_b).]

The input u_{aα} and output y_{aα} of W_{aα} are given by

    u_{aα} = u_a if α = −, and u_{aα} = ν_a if α = +;
    y_{aα} = (y_a, η_a) if α = −, and y_{aα} = (y_a, η_a, u_a) if α = +.

The input u_{bβ} and output y_{bβ} of W_{bβ} are given by

    u_{bβ} = u_b if β = −, and u_{bβ} = ν_b if β = +;
    y_{bβ} = (y_a, η_a, u_a, ν_a, d_b, δ_b) if β = −, and y_{bβ} = (y_a, η_a, u_a, ν_a, u_b, d_b, δ_b) if β = +.

Note that y_{aα} and u_{aα} are contained in y_{bβ}. Thus, the joint output of both channels is y_{bβ}.

The distribution of the jointly polarized channel is given by

    W_{aα,bβ}(y_{aα}, y_{bβ} | u_{aα}, u_{bβ}) = 2 W_{bβ}(y_{bβ} | u_{bβ})
        = Σ_{B_β} W_b(y_a, u_a ⊕ ν_a, d_b | u_b ⊕ ν_b) W_b(η_a, ν_a, δ_b | ν_b),         (17)

where Σ_{B_β} denotes a sum over ν_b when β = −, and no sum when β = +.

We have shown how to generate W_{aα,bβ} from W_{a,b}. Another case of interest is generating W_{a−,a+} from W_a. Denote the output of W_{a−} by y_{a−}. The output of W_{a+} is (y_{a−}, u_{a−}). From (8), we need only compute W_{a+} to find W_{a−,a+}. This is accomplished by (1).

If two channels are ordered by degradation, so are their polar transforms [3, Lemma 4.7]. I.e., if Q ≽ W then Q^− ≽ W^− and Q^+ ≽ W^+. This is readily extended to joint channels.

Lemma 9. Let BMS channel Q ≽ W. Then Q_{−,+} ≽^p W_{−,+}.

Proof: Using (4) and the definition of W_{−,+} we have

    W_{−,+}((y_1, y_2), u_1 | u_1, u_2) = W(y_1 | u_1 ⊕ u_2) W(y_2 | u_2)
        = Σ_{z_1, z_2} Q(z_1 | u_1 ⊕ u_2) P(y_1 | z_1) Q(z_2 | u_2) P(y_2 | z_2)
        = Σ_{z_1, z_2} Q_{−,+}((z_1, z_2), u_1 | u_1, u_2) P_a(y_1, y_2 | z_1, z_2),

where P_a(y_1, y_2 | z_1, z_2) = P(y_1 | z_1) P(y_2 | z_2) is a proper degrading channel.

Lemma 10. If Q_{a,b}(z_a, z_b | u_a, u_b) ≽^p W_{a,b}(y_a, y_b | u_a, u_b), then, for α, β ∈ {−, +}, Q_{aα,bβ} ≽^p W_{aα,bβ}.

Proof: The proof follows similar lines to the proof of Lemma 9. Expand W_{aα,bβ} using (17) and expand again using the definition of joint degradation with a proper degrading channel. Using the one-to-one mappings between the outputs of the polarized channels and the inputs and outputs of the non-polarized channels, the desired results are obtained. The details are mostly technical, and are omitted.

The operational meaning of Lemma 10 is that to compute an upgraded approximation of W_{aα,bβ} we may start with Q_{a,b}, an upgraded approximation of W_{a,b}, and polarize it. The result Q_{aα,bβ} is an upgraded approximation of W_{aα,bβ}. This enables us to iteratively compute upgraded approximations of joint synthetic channels. Whenever the joint synthetic channel exceeds an allotted size, we upgrade it to a joint channel with a smaller alphabet size and continue from there. We make sure to use proper upgrading procedures; this preserves the special structure of the joint channel and enables us to compute a lower bound on the probability of error. In Section VI we derive such upgrading procedures.

Since a sequence of polarization and upgrading steps is equivalent to upgrading the overall polarized joint distribution, using Lemmas 6 and 7 we obtain that the IMJP decoding error of a joint distribution that has undergone multiple polarization and proper upgrading steps lower-bounds the SC decoding error of the joint distribution that has undergone only the same polarization steps (without upgrading steps).
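The recursion (17) can be carried out directly on the table representation. The sketch below (ours) takes a joint channel in the successive form W_b(y_a, u_a, y_r | u_b) of (8) and returns the jointly polarized channel in the same form; the symbol names follow Figure 2, the dictionary format is our own, and no alphabet reduction is attempted here (in the paper's algorithm, Lemma 8 and the upgrading procedures of Section VI keep the table size in check).

```python
from collections import defaultdict

def polarize_joint(Wb, alpha, beta):
    """One joint polarization step, per (17). Wb maps the b-channel output
    (ya, ua, yr) -> (W_b(.|ub=0), W_b(.|ub=1)), i.e. the successive form (8).
    Returns the polarized pair's table in the same form; alpha, beta in {'-','+'}."""
    out = defaultdict(lambda: [0.0, 0.0])
    for (ya, ua_x, db), p_top in Wb.items():          # top copy, a-input ua ^ nu_a
        for (eta, nu_a, delta), p_bot in Wb.items():  # bottom copy, a-input nu_a
            ua = ua_x ^ nu_a                          # recover the information bit ua
            for ub in (0, 1):
                for nu_b in (0, 1):
                    # top copy's b-input is ub ^ nu_b, bottom copy's is nu_b
                    w = 0.5 * p_top[ub ^ nu_b] * p_bot[nu_b]
                    if alpha == '-':
                        ua_new, ya_new, tail = ua, (ya, eta), (nu_a,)
                    else:
                        ua_new, ya_new, tail = nu_a, (ya, eta, ua), ()
                    if beta == '-':
                        ub_new, rest = ub, tail + (db, delta)
                    else:
                        ub_new, rest = nu_b, tail + (ub, db, delta)
                    out[(ya_new, ua_new, rest)][ub_new] += w
    return {k: tuple(v) for k, v in out.items()}
```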
C. Double Symmetry for Joint Distributions

A binary-input channel W(y|u) is called symmetric if for every output y there exists a conjugate output ȳ such that W(y|0) = W(ȳ|1). We now extend this to joint synthetic channels.

Definition 3 (Double symmetry). Joint channel W_b(y_a, u_a, d_b | u_b) exhibits double symmetry if for every y_a, d_b there exist y_a^(a), y_a^(b), y_a^(ab) such that

    W_b(y_a, u_a, d_b | u_b) = W_b(y_a^(a), ū_a, d_b | u_b)
                             = W_b(y_a^(b), u_a, −d_b | ū_b)
                             = W_b(y_a^(ab), ū_a, −d_b | ū_b).                           (18)

We call (·)^(a) the a-conjugate, (·)^(b) the b-conjugate, and (·)^(ab) the ab-conjugate. We can also cast this definition using the regular (non-D-value) representation of joint channels in a straightforward manner, which we omit here.

Example 3. Let W be a BMS channel, and consider the joint channel formed by its '−'- and '+'-transforms, W_{−,+}. What are the a-, b-, and ab-conjugates of the a-channel output y_a? Recall that the output of the a-channel W^− consists of the outputs of two copies of W. Denote y_a = (y_1, y_2), where y_1 and y_2 are two possible outputs of W with conjugates ȳ_1, ȳ_2, respectively. We then have

    W_{−,+}(y_a, u_a | u_a, u_b) = 2 W^+(y_a, u_a | u_b) = W(y_1 | u_a ⊕ u_b) W(y_2 | u_b).

By symmetry of W we obtain y_a^(a) = (ȳ_1, y_2), y_a^(b) = (ȳ_1, ȳ_2), and y_a^(ab) = (y_1, ȳ_2). Indeed,

    W^+(y_a, u_a | u_b) = W^+(y_a^(a), ū_a | u_b) = W^+(y_a^(b), u_a | ū_b) = W^+(y_a^(ab), ū_a | ū_b).

We leave it to the reader to show that (18) holds for the D-value representation of the joint channel.

Pairs of polar synthetic channels exhibit double symmetry. One can see this directly from symmetry properties of polar synthetic channels, see [1, Proposition 13]. Alternatively, one can use induction to show directly that the polar construction preserves double symmetry; we omit the details. This implies the following proposition.

Proposition 11. Let W_{a,b} be the joint distribution of two synthetic channels W_a and W_b that result from n polarization steps of BMS channel W. Then, W_{a,b} exhibits double symmetry.

The following is a direct consequence of double symmetry.

Lemma 12. Let W_{a,b}(y_a, u_a, d_b | u_a, u_b) be a joint distribution in D-value representation that exhibits double symmetry. Then
1) For the b-channel, (y_a, u_a, d_b) and (y_a^(a), ū_a, d_b) have the same b-channel D-value d_b.
2) For the a-channel, y_a and y_a^(b) have the same a-channel D-value d_a, and y_a^(a) and y_a^(ab) have the same a-channel D-value −d_a.

Proof: The first item is obvious from (18). For the second item, note that

    W_a(y_a | u_a) = Σ_{d_b} Σ_{u_b} W_b(y_a, u_a, d_b | u_b)
                   =(a) Σ_{d_b} Σ_{u_b} W_b(y_a^(b), u_a, −d_b | ū_b)
                   = Σ_{−d_b} Σ_{ū_b} W_b(y_a^(b), u_a, −d_b | ū_b)
                   = W_a(y_a^(b) | u_a),

where (a) is by (18). In the same manner, y_a^(a) and y_a^(ab) have the same a-channel D-value, −d_a.

Lemma 12 implies that an SC decoder does not distinguish between y_a and y_a^(b) when making its decision for the a-channel. We now show that a similar conclusion holds for the IMJP decoder.

Lemma 13. Let y_a be some output of W_a. Then

    T(y_a | u_a) = T(y_a^(b) | u_a) = T(y_a^(a) | ū_a) = T(y_a^(ab) | ū_a).

Proof: Theorem 1 holds for joint channels given in D-value representation, W_{a,b}(y_a, u_a, d_b | u_a, u_b). This is easily seen by following the proof with minor changes. Under the D-value representation, (11) becomes

    T(y_a | u_a) = (1/2) Σ_{d_b} max_{u_b} W_{a,b}(y_a, u_a, d_b | u_a, u_b)
                 = Σ_{d_b} max_{u_b} W_b(y_a, u_a, d_b | u_b).                           (19)

The remainder of the proof hinges on double symmetry and follows along similar lines to the proof of Lemma 12, with W replaced by T and, accordingly, the sum over u_b replaced with a maximum operation over u_b.

Lemma 13 implies that the IMJP decoder does not distinguish between y_a and y_a^(b).

Corollary 14. Let φ_a be the IMJP decoder for the a-channel. Then φ_a(y_a) = φ_a(y_a^(b)) = 1 − φ_a(y_a^(a)) = 1 − φ_a(y_a^(ab)).

V. SYMMETRIZED JOINT SYNTHETIC CHANNELS

In this section we introduce the symmetrizing transform. The resultant channel is degraded from the original joint distribution yet has the same probability of error. Its main merit is to decouple the a-channel from the b-channel. This simpler structure is the key to upgrading the a-channel, as we shall see in Section VI.

A. Symmetrized Joint Distribution

The SC decoder observes marginal distributions and makes a decision based on the D-value of each synthetic channel's output. In particular, by Lemma 12, the SC decoder makes the same decision for the a-channel whether its output was y_a or y_a^(b), and the b-channel decision is based on d_b without regard to y_a. By Corollary 14, the IMJP decoder acts similarly. That is, the IMJP decoder makes the same decision for the a-channel whether its output is y_a or y_a^(b), and the decision for the b-channel is based solely on d_b.
a a a a a a a a One can see this directly from symmetry properties of polar synthetic channels, see [1, Proposition 13]. Alternatively, one can use induction to show directly that the polar construction V. SYMMETRIZEDJOINTSYNTHETICCHANNELS preserves double symmetry; we omit the details. This implies In this section we introduce the symmetrizing transform. the following Proposition. The resultant channel is degraded from the original joint distribution yet has the same probability of error. Its main Proposition 11. Let W be the joint distribution of two a,b merit is to decouple the a-channel from the b-channel. This synthetic channels W and W that result from n polarization a b simpler structure is the key to upgrading the a-channel, as we stepsofBMSchannelW.Then,W exhibitsdoublesymmetry. a,b shall see in Section VI. The following is a direct consequence of double symmetry. A. Symmetrized Joint Distribution Lemma 12. LetW (y ,u ,d |u ,u )beajointdistribution a,b a a b a b in D-value representation that exhibits double symmetry. Then The SC decoder observes marginal distributions and makes a decision based on the D-value of each synthetic channel’s 1) For the b-channel, (ya,ua,db) and (ya(a),u¯a,db) have output. In particular, by Lemma 12, the SC decoder makes the same b-channel D-value db. the same decision for the a-channel whether its output was 2) For the a-channel, ya and ya(b) have the same a-channel ya or ya(b) and the b-channel decision is based on db without D-value da, and ya(a) and ya(ab) have the same a-channel regard to ya. By Corollary 14, the IMJP decoder acts similarly. D-value −da. That is, the IMJP decoder makes the same decision for the 10 a-channel whether its output is y or y(b), and the decision for The name ‘symmetrized’ stems from comparison of (21) a a ◦ the b-channel is based solely on d . and (18). We note that Theorem 1 holds for W . b a,b We conclude that if the a-channel were told only whether its A symmetrized joint distribution remains symmetrized upon output was one of {ya,ya(b)}, it would make the same decision polarization.Thatis,ifW◦a,b isasymmetrizedjointdistribution haditbeentolditsoutputwas,say,ya.Thisistrueforeitherthe and W◦ , α,β ∈{−,+} is the result of jointly polarizing aα,bβ SCorIMJPdecoder.Consequently,eitherdecoder’sprobability ◦ ◦ it, then the marginals W and W satisfy (21). This is easily aα bβ of error is unaffected by obscuring the a-channel output in this seen from (17) and (21). manner. ◦ Clearly, W is degraded with respect to W , exactly This leads us to define a symmetrized version of the joint a,b a,b synthetic channel distribution, W◦ , as follows. Let4 the opposite of our main thrust. Nevertheless, as established a,b in Lemma 15, both channels have the same probability of y◦ (cid:44){y ,y(b)}, error under SC (IMJP) decoding. Moreover, if we upgrade the a a a y¯◦ (cid:44){y(a),y(ab)} symmetrized version of the channel, its probability of error a a a under IMJP decoding lower-bounds the probability of error and define of the non-symmetrized channel under either SC or IMJP W◦ (y◦ ,u ,d |u ,u )=W (y ,u ,d |u ,u ) decoding. a,b a a b a b a,b a a b a b What isn’t immediately obvious, however, is what happens +W (y(b),u ,d |u ,u ), a,b a a b a b after polarization. I.e., if we take a joint channel, symmetrize W◦ (y¯◦ ,u ,d |u ,u )=W (y(a),u ,d |u ,u ) it, and then polarize it, how does its probability of error a,b a a b a b a,b a a b a b +W (y(ab),u ,d |u ,u ). 
Lemma 15. Let W_{a,b} be a joint synthetic channel distribution, and let W°_{a,b} be its symmetrized version. Then, the probability of error under SC (IMJP) decoding of either channel is identical.

Proof: By Lemma 12 for the SC decoder, or Corollary 14 for the IMJP decoder, if the decoder for the symmetrized channel makes an error for some symbol y°_a then the decoder for the non-symmetrized channel makes an error for both y_a and y_a^(b), and vice versa. Therefore, denoting by E the error indicator of the decoder,

    P_e(W°_{a,b}) = (1/4) Σ_{u_a, u_b} Σ_{y°_a, d_b} W°_{a,b}(y°_a, u_a, d_b | u_a, u_b) E
                  =(a) (1/4) Σ_{u_a, u_b} Σ_{y_a, d_b} W_{a,b}(y_a, u_a, d_b | u_a, u_b) E
                  = P_e(W_{a,b}),

where (a) is by (20).

The marginal synthetic channels W°_a and W°_b are given by

    W°_a(y°_a | u_a) = Σ_{u_b, d_b} W°_{a,b}(y°_a, u_a, d_b | u_a, u_b),
    W°_b(y°_a, u_a, d_b | u_b) = (1/2) W°_{a,b}(y°_a, u_a, d_b | u_a, u_b).

Note that by double symmetry

    W°_a(y°_a | u_a) = W°_a(ȳ°_a | ū_a),
    W°_b(y°_a, u_a, d_b | u_b) = W°_b(ȳ°_a, ū_a, d_b | u_b)
                               = W°_b(y°_a, u_a, −d_b | ū_b)
                               = W°_b(ȳ°_a, ū_a, −d_b | ū_b).                            (21)

Definition 4 (Symmetrized distribution). A joint distribution whose marginals satisfy (21) is called symmetrized.

The name 'symmetrized' stems from comparison of (21) and (18). We note that Theorem 1 holds for W°_{a,b}.

A symmetrized joint distribution remains symmetrized upon polarization. That is, if W°_{a,b} is a symmetrized joint distribution and W°_{aα,bβ}, α, β ∈ {−, +}, is the result of jointly polarizing it, then the marginals W°_{aα} and W°_{bβ} satisfy (21). This is easily seen from (17) and (21).

Clearly, W°_{a,b} is degraded with respect to W_{a,b}, exactly the opposite of our main thrust. Nevertheless, as established in Lemma 15, both channels have the same probability of error under SC (IMJP) decoding. Moreover, if we upgrade the symmetrized version of the channel, its probability of error under IMJP decoding lower-bounds the probability of error of the non-symmetrized channel under either SC or IMJP decoding.

What isn't immediately obvious, however, is what happens after polarization. I.e., if we take a joint channel, symmetrize it, and then polarize it, how does its probability of error compare to the original joint channel that has just undergone polarization? Furthermore, what happens if the symmetrized version undergoes an upgrading transform?

Proposition 16. Let W_{a,b} be a joint distribution of two synthetic channels and let W^t_{a,b} denote this joint distribution after a sequence t of joint polarization steps. Then P_e^IMJP(W^t_{a,b}) ≥ P_e^IMJP(Q°^t_{a,b}), where Q°^t_{a,b} is the distribution of W°_{a,b} after the same sequence of polarization steps and any number of proper upgrading transforms along the way.

Proof: Let W_{aα,bβ} and W°_{aα,bβ} be the polarized versions of W_{a,b} and W°_{a,b}, respectively. For the bβ-channel, the decoder makes the same decision for either W_{aα,bβ} or W°_{aα,bβ}. This is because the decision is based on the b-channel D-value, which is unaffected by symmetrization (see (20)). Next, for the aα-channel, using on (17) a derivation similar to the proof of Lemma 13, T(y_{aα} | u_{aα}) = T(y'_{aα} | u_{aα}), where y'_{aα} is any combination of an element of y°_a and an element of η°_a. I.e., y'_{aα} is any one of {y_a, η_a}, {y_a^(b), η_a}, {y_a, η_a^(b)}, {y_a^(b), η_a^(b)}. Thus, the IMJP decoder makes the same decision for the aα-channel for either W_{aα,bβ} or W°_{aα,bβ}.

We compare the channels obtained by the following two procedures.
• Procedure 1: Joint channel W_{a,b} goes through sequence t of polarization steps.
• Procedure 2: Joint channel W_{a,b} is symmetrized to form W°_{a,b}. It goes through sequence t of polarization steps (without any further symmetrization operations).

We iteratively apply the above reasoning and conclude, in a similar manner to Lemma 15, that both channels have the same performance under IMJP decoding. Next, we modify Procedure 2.
• Procedure 2a: Joint channel W_{a,b} is symmetrized to form W°_{a,b}. It goes through sequence t of polarization steps (without any further symmetrization operations), but at some point mid-sequence, it undergoes a proper upgrading procedure.
