High-Girth Matrices and Polarization

Emmanuel Abbe
Prog. in Applied and Computational Math. and EE Dept., Princeton University
Email: [email protected]

Yuval Wigderson
Department of Mathematics, Princeton University
Email: [email protected]

Abstract—The girth of a matrix is the least number of linearly dependent columns, in contrast to the rank, which is the largest number of linearly independent columns. This paper considers the construction of high-girth matrices, whose probabilistic girth is close to their rank. Random matrices can be used to show the existence of high-girth matrices with constant relative rank, but the construction is non-explicit. This paper uses a polar-like construction to obtain a deterministic and efficient construction of high-girth matrices for arbitrary fields and relative ranks. Applications to coding and sparse recovery are discussed.

I. INTRODUCTION

Let A be a matrix over a field F. Assume that A is flat, i.e., it has more columns than rows. The rank of A, denoted by rank(A), is the maximal number of linearly independent columns. The girth of A, denoted by girth(A), is the least number of linearly dependent columns. What are the possible tradeoffs between rank(A) and girth(A)? This depends on the cardinality of the field. It is clear that

girth(A) ≤ rank(A) + 1. (1)

Is it possible to have a perfect-girth matrix that achieves this upper-bound? If F = R, drawing the matrix with i.i.d. standard Gaussian entries gives such an example with probability 1. However, if F = F_q, where q is finite, the problem is different. For F = F_q, note that

girth(A) = dist(C_A), (2)

where dist(C_A) is the distance of the q-ary linear code C_A whose parity-check matrix is A. In fact, the least number of columns that are linearly dependent in A is equal to the least number of columns whose linear combination can be made 0, which is equal to the least weight of a vector that is mapped to 0 by A, which is the least weight of a codeword, i.e., the code distance since the code is linear.

Hence, over finite fields, the girth is a key parameter for error-correcting codes, and studying the girth/rank tradeoffs for matrices is equivalent to studying the distance/dimension tradeoffs for linear codes. Clearly it is not possible to obtain perfect-girth matrices over F = F_2, even if we relax the perfect-girth requirement to be asymptotic, requiring rank(A) ∼ girth(A) when the number of columns in A tends to infinity.¹

¹We use the notation a_n ∼ b_n for lim_{n→∞} a_n/b_n = 1.

If F = F_2, the Gilbert-Varshamov bound provides a lower-bound on the maximal girth (conjectured to be tight by some). Namely, for a uniformly drawn matrix A with n columns, with high probability,

rank(A) = nH(girth(A)/n) + o(n), (3)

where H is the binary entropy function.

For F = F_q, the bound in (1) is a restatement of the Singleton bound for linear codes, expressed in terms of the co-dimension of the code. Asking for a perfect-girth matrix is hence equivalent to asking for an MDS linear code. Such constructions are known when q = n, with Reed-Solomon codes. Note that the interest in MDS codes has recently resurged with the applications in distributed data storage; see [5] for a survey.

One may consider instead the case of non-finite fields, typically not covered in coding theory. As shown in Section IV-B, this is relevant for the recovery of sparse signals [6] via compressed measurements. The girth is then sometimes called differently, such as the Kruskal-rank or spark [6]. As stated above, for F = R, a random Gaussian matrix is perfect-girth with probability one. However, computing the girth of an arbitrary matrix is NP-hard [10] (like computing the distance of a code [11]), making the latter construction non-explicit.
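As an editorial illustration (not part of the original paper), the following Python sketch computes the girth of a small binary matrix by exhaustive search over column subsets; the exponential cost of the enumeration is consistent with the NP-hardness result cited above [10]. The helper names rank_gf2 and girth_gf2 are ours, and columns are encoded as integer bitmasks for simplicity.

```python
import itertools
import numpy as np

def rank_gf2(vectors):
    """Rank over F_2 of 0/1 vectors encoded as Python ints (bitmasks)."""
    basis = {}                            # leading-bit position -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]          # eliminate the leading bit
            else:
                basis[lead] = v
                break
    return len(basis)

def girth_gf2(A):
    """Least number of linearly dependent columns of A over F_2; brute
    force over column subsets, exponential in n (cf. NP-hardness [10])."""
    m, n = A.shape
    cols = [sum(int(A[i, j]) << i for i in range(m)) for j in range(n)]
    for g in range(1, n + 1):
        for S in itertools.combinations(range(n), g):
            if rank_gf2([cols[j] for j in S]) < g:
                return g
    return n + 1                          # no dependent subset exists

# The parity-check matrix of the [7,4] Hamming code: girth 3 equals the
# code distance 3, matching girth(A) = dist(C_A) in (2).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
print(girth_gf2(H))   # -> 3
```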
In this paper, we are mainly interested in the following notion of probabilistic girth, defined to be the least number of columns that are linearly dependent with high probability, when drawing the columns uniformly at random. Formal definitions are given in the next section. Going from a worst-case to a probabilistic model naturally allows for much better bounds. In particular, defining high-girth matrices as matrices whose probabilistic girth and rank are of the same order (up to o(n)), a random uniform matrix proves the existence of high-girth matrices even for F = F_2. However, obtaining an explicit construction is again non-trivial.

In this paper, we obtain explicit and fully deterministic constructions of high-girth matrices over any field and for any relative rank. We rely on a polar-code-like construction. Starting with the same square matrix as for polar or Reed-Muller codes, i.e., the tensor-product/Sierpinski matrix, we then select rows with a different measure based on ranks. For finite fields, we show that high-girth matrices are equivalent to capacity-achieving linear codes for erasure channels, while for errors the speed of convergence of the probabilistic girth requirement matters. In particular, we achieve the Bhattacharyya bound with our explicit construction. For the real field, this allows us to construct explicit binary measurement matrices with optimal probabilistic girth.

These results have various other implications. First, our construction gives an operational interpretation to the upper-bound on the Bhattacharyya process in polar codes. When the channel is not the BEC, the upper-bound on this process used in the polar code literature is in fact the conditional rank process studied in this paper. Second, this paper gives a high-school-level proof (not necessarily trivial, but relying only on basic linear algebra concepts) of a fully deterministic, efficient, and capacity-achieving code for erasure channels. While capacity-achieving codes for the BEC are well-known by now, most constructions still rely on rather sophisticated tools (expander codes, polar codes, LDPC codes, spatially-coupled codes), and we felt that an explicit construction relying only on the notions of rank and girth is rather interesting. On the other hand, for F = F_2, our construction turns out to be equivalent to the polar code for the BEC, so the difference is mainly in the approach. It allows, however, to simplify the concepts, not requiring even the notion of mutual information. Finally, we expect the result to generalize to non-binary alphabets, given that our construction does not depend on the underlying field.

II. HIGH-GIRTH MATRICES

A. Notation

Let A be an m×n matrix over a field F. For any set S ⊆ [n], let A[S] be the submatrix of A obtained by selecting the columns of A indexed by S. For s ∈ [0,1], let A[s] be a random submatrix of A obtained by sampling each column independently with probability s. Thus, A[s] = A[S̃], where S̃ is an i.i.d. Ber(s) random subset of [n]. In expectation, A[s] has sn columns. Throughout the paper, an event E_n takes place with high probability if P{E_n} → 1 when n → ∞, where n should be clear from the context.

B. Probabilistic Girth

Definition 1. Let {A_n} be a sequence of matrices over a field F, where A_n has n columns. The probabilistic girth of A_n is the supremum of all s ∈ [0,1] such that A_n[s] has linearly independent columns with high probability, i.e.,

girth_*({A_n}) := sup{s ∈ [0,1] : P{A_n[s] has lin. indep. cols.} = 1 − o(1)}. (4)

Note that a better name would have been the probabilistic relative girth, since it is a counterpart of the usual notion of girth in the probabilistic setting, with in addition a normalization factor of n. We often write girth_*(A_n) instead of girth_*({A_n}). We will sometimes care about how fast the above probability tends to 1. We then say that A_n has a probabilistic girth with rate τ(n) if the above definition holds when

P{A_n[s] has lin. indep. columns} = 1 − τ(n). (5)

Definition 2. We say that A_n is high-girth if

girth_*(A_n) = limsup_{n→∞} rank(A_n)/n. (6)

For µ ∈ [0,1], we say that A_n is µ-high-girth if it is high-girth and girth_*(A_n) = µ.

Example 1. Consider the following construction, corresponding to Reed-Solomon codes. Let x_1, ..., x_n be distinct elements of a field F, and consider the m×n matrix

V = [ 1          1          1          ···   1
      x_1        x_2        x_3        ···   x_n
      x_1^2      x_2^2      x_3^2      ···   x_n^2
      ⋮          ⋮          ⋮          ⋱     ⋮
      x_1^{m−1}  x_2^{m−1}  x_3^{m−1}  ···   x_n^{m−1} ]. (7)

Then V satisfies a stronger property than being high-girth, as its actual girth is m+1: every m×m submatrix is invertible, since every m×m submatrix is a Vandermonde matrix whose determinant must be nonzero. However, this example cannot be used to construct high-girth families over a fixed finite field F, for as soon as n exceeds the size of F, it will be impossible to pick distinct x_i's, and we will no longer have high girth.

Example 2. A µn×n uniform random matrix with entries in F_2 is µ-high-girth with high probability.
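To make Definition 1 concrete, here is a hypothetical Monte Carlo estimate of the probability in (4) for the random matrix of Example 2 with µ = 1/2; the function names, trial counts, and grid of s values are ours. The estimated probability should drop from near 1 to near 0 around s = µ.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_gf2(vectors):
    """Rank over F_2 of 0/1 vectors encoded as Python ints (bitmasks)."""
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                break
    return len(basis)

def indep_prob(A, s, trials=300):
    """Estimate P{A[s] has linearly independent columns}: keep each column
    independently with probability s, test for full column rank (Def. 1)."""
    m, n = A.shape
    cols = [sum(int(A[i, j]) << i for i in range(m)) for j in range(n)]
    hits = 0
    for _ in range(trials):
        S = [j for j in range(n) if rng.random() < s]
        hits += rank_gf2([cols[j] for j in S]) == len(S)
    return hits / trials

# Example 2: a uniform (n/2) x n matrix over F_2 should be 1/2-high-girth,
# so the estimate should transition from ~1 to ~0 around s = 1/2.
n = 128
A = rng.integers(0, 2, size=(n // 2, n))
for s in (0.30, 0.45, 0.50, 0.55, 0.70):
    print(f"s = {s:.2f}: {indep_prob(A, s):.2f}")
```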
III. EXPLICIT CONSTRUCTION OF HIGH-GIRTH MATRICES

A. Sierpinski matrices

Let F be any field, let n be a power of 2, and let G_n be the matrix over F defined by

G_n = [ 1 1
        0 1 ]^{⊗ log n}. (8)

Note that the entries of this matrix are only 0's and 1's, hence it can be viewed as a matrix over any field.

Many important codes can be derived from G_n, and they are all based on a simple idea. Namely, we first pick some measure of "goodness" on the rows of G_n. Then, we take the submatrix of G_n obtained by keeping only those rows which are the "best" under this measure, and we finally define a code whose parity-check matrix (PCM) is this submatrix. The first important examples are Reed-Muller (RM) codes [8], [9], where goodness is measured by the weight of the rows, and more recently polar codes [2], [3], where goodness is measured by the entropy (or mutual information). In the next section, we define a measure of goodness based on ranks and use it to construct high-girth matrices. A similar construction was proposed in [7] for Hadamard matrices to polarize the Rényi information dimension. We discuss applications to coding and sparse recovery in the next sections.

B. Conditional-rank matrices

With s ∈ [0,1] fixed, let G_n^{(i)} denote the submatrix of G_n obtained by taking the first i rows, and let G_n^{(i)}[s] be the random submatrix obtained by sampling each column independently with probability s, as above.

Definition 3. The conditional rank (COR) of row i in G_n is defined by

ρ(n,i,s) = E(rank_F G_n^{(i)}[s]) − E(rank_F G_n^{(i−1)}[s]), (9)

where rank_F denotes the rank computed over the field F. When i = 1, define

ρ(n,1,s) = E(rank_F G_n^{(1)}[s]). (10)

Now, by adding the ith row, we will either keep the rank constant or increase it by 1, and the latter happens if and only if the ith row is independent of the previous rows. So we get that

ρ(n,i,s) = P(the ith row of G_n[s] is independent of the previous i−1 rows), (11)

where linear independence is also considered over F. The key property of the conditional ranks is expressed in the following lemma.

Lemma 1. Define the functions

ℓ(x) = 2x − x², (12)
r(x) = x², (13)

and define a branching process of depth log n and offspring 2 (i.e., each node has exactly two descendants) as follows: the base node has value s, and for a node with value x, its left-hand child has value ℓ(x) and its right-hand child has value r(x). Then the n leaf-nodes of this branching process are, in order, the values ρ(n,i,s) for 1 ≤ i ≤ n.

An important point about this lemma is that the functions ℓ and r do not depend on F, while the ρ(n,i,s) values a priori do. Thus, one way to interpret this lemma is that the expected conditional ranks of G_n do not depend on the field F, even though their definition does. The proof of Lemma 1 is given in Section V.

A key property of the branching process in Lemma 1 is that it is a balanced process, meaning that the average value of the two children of a node with value x is x again:

(ℓ(x) + r(x))/2 = ((2x − x²) + x²)/2 = x. (14)

This means that this branching process defines a martingale, by letting a random walk go left or right with probability half. Moreover, since ρ(n,i,s) is a probability, this martingale stays in [0,1]. So by Doob's martingale convergence theorem, this martingale converges almost surely to its fixed points. In fact, Doob's theorem is not needed here, as one may conclude using only the fact that the increments are orthogonal.² Its fixed points are those x's for which ℓ(x) = r(x) = x. The only points satisfying this are 0 and 1, so this martingale polarizes. In fact, much can be said about the speed of polarization of this process, as it is equivalent to the polarization process for BEC channels studied in [4].

²Private discussion with E. Telatar. See also [1].
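The branching process of Lemma 1 is immediate to implement. The following sketch (ours, purely illustrative) computes the leaf values ρ(n,i,s) and checks the martingale property (14) numerically; the function name cor_values is an assumption of this sketch.

```python
def cor_values(n, s):
    """Leaf values rho(n, i, s), i = 1..n, of the branching process of
    Lemma 1: depth log2(n), left child l(x) = 2x - x**2, right child
    r(x) = x**2. Note that l and r do not depend on the field F."""
    values = [s]
    while len(values) < n:
        values = [child
                  for x in values
                  for child in (2 * x - x * x, x * x)]   # (l(x), r(x))
    return values

rhos = cor_values(8, 0.5)
print([round(v, 4) for v in rhos])
# Balanced process (14): the mean of the leaves equals s.
print(sum(rhos) / len(rhos))   # -> 0.5 (up to float rounding)
```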
Theorem 1 (Application of [4]). For any n,

|{i ∈ [n] : ρ(n,i,s) > 1 − 2^{−n^{0.49}}}| / n = s + o(1), (15)
|{i ∈ [n] : ρ(n,i,s) < 2^{−n^{0.49}}}| / n = (1 − s) + o(1). (16)

Hence the theorem tells us that the above martingale polarizes very quickly: apart from a vanishing fraction, all ρ(n,i,s)'s are exponentially close to 0 or 1 as n → ∞. With this in mind, we define the following.

Definition 4. Let n be a fixed power of 2, and let s ∈ [0,1] be fixed. Let H ⊂ [n] be the set of indices i for which ρ(n,i,s) > 1 − 2^{−n^{0.49}}, and let m = |H|. By Theorem 1, we know that m = sn + o(n). Let R_n denote the m×n submatrix of G_n gotten by selecting all the columns of G_n, but only taking those rows indexed by H. We call R_n the COR matrix of size n with parameter s.

Note that the construction of COR matrices is trivial, as opposed to the construction of polar codes based on the entropy or mutual information of general sources or channels.

We will index the rows of R_n by i ∈ H, rather than j ∈ [m]. We sometimes denote R_n by R. The most important property of R_n is expressed in the following theorem.

Theorem 2. For any s ∈ [0,1], R_n[s] has full rank (i.e., rank m) with high probability, as n → ∞. In fact, R_n[s] has full rank with probability 1 − o(2^{−n^{0.48}}).

The proof is a simple consequence of Lemma 1 and Theorem 1, and can be found in the Appendix.

Theorem 2 implies the following.

Theorem 3. For any s ∈ [0,1], R_n is s-high-girth.

Since the proof of Theorem 2 works independently of the base field F, the same is true of Theorem 3. Thus, the COR construction is fully deterministic and works over any field. In fact, it requires only two values (0 and 1) for the matrix entries, even when F = R.
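Definition 4 translates directly into code. The sketch below (ours; rows are 0-indexed, unlike the paper's 1-indexing, and the threshold handling is illustrative) computes the branching-process leaves, keeps the rows above the polarization threshold, and reads R_n off G_n. At moderate n, the number of selected rows is below the asymptotic value m = sn + o(n) of Theorem 1, since polarization is still incomplete.

```python
import numpy as np

def sierpinski(n):
    """G_n = [[1,1],[0,1]]^{tensor log n}, entries in {0,1} (eq. (8))."""
    G2 = np.array([[1, 1], [0, 1]], dtype=np.int64)
    G = np.array([[1]], dtype=np.int64)
    while G.shape[0] < n:
        G = np.kron(G, G2)
    return G

def cor_matrix(n, s):
    """COR matrix R_n of Definition 4: keep the rows i of G_n whose
    conditional rank rho(n, i, s) exceeds 1 - 2^(-n^0.49)."""
    thresh = 1 - 2 ** (-n ** 0.49)
    rho = [s]
    while len(rho) < n:
        rho = [c for x in rho for c in (2 * x - x * x, x * x)]
    H = [i for i in range(n) if rho[i] > thresh]   # 0-indexed rows
    return sierpinski(n)[H, :], H

R, H = cor_matrix(256, 0.5)
print(R.shape)   # (m, n) with m approaching s*n as n grows (Theorem 1)
```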
IV. APPLICATIONS OF HIGH-GIRTH MATRICES

A. Coding for erasures

Let F be a field and p ∈ [0,1]. The memoryless erasure channel on F with erasure probability p, denoted by MEC(p), erases each component of a codeword on F independently with probability p. Denoting by ε the erasure symbol, the output alphabet is hence F_* = F ∪ {ε}, and the transition probability of receiving y ∈ F_* when x ∈ F is transmitted is

W(y|x) = { p      if y = ε,
           1 − p  if y = x. (17)

The memoryless extension is defined by W^n(y^n|x^n) = ∏_{i=1}^n W(y_i|x_i) for x^n ∈ F^n, y^n ∈ F_*^n.

Recall that a code of block length n and dimension k over the alphabet F is a subset of F^n of cardinality |F|^k. The code is linear if the subset is a subspace of dimension k. In particular, a linear code can be expressed as the image of a generator matrix G ∈ F^{n×k} or as the null space of a parity-check matrix H ∈ F^{(n−k)×n}. The rate of a code is defined by k/n. A rate R is achievable over the MEC(p) if the code can correct the erasures with high probability. More specifically, R is achievable if there exists a sequence of codes C_n of block length n and dimension k having rate R, and decoders D_n : F_*^n → F^n, such that P_e(C_n) → 0, where for x^n drawn uniformly at random in C_n and y^n the output of x^n over the MEC(p),

P_e(C_n) := P{D(y^n) ≠ x^n}. (18)

The dependency on D is not explicitly stated in P_e, as there is no degree of freedom in decoding over the MEC (besides guessing the erasure symbol), as shown in the proof of the next lemma.

The supremum of the achievable rates is the capacity, given by 1 − p. We now relate capacity-achieving codes on the MEC and high-girth matrices.

Lemma 2. A linear code C_n achieves a rate R on the MEC(p) if and only if its parity-check matrix has probabilistic girth at least 1 − R. In particular, a code achieves capacity on the MEC(p) if and only if its parity-check matrix is p-high-girth. Consequently, the linear code whose parity-check matrix is a COR matrix of parameter p achieves capacity on the MEC(p).

Remark 1. In the binary case, COR codes give a new interpretation to BEC polar codes: instead of computing the mutual information of the polarized channels via the generator matrix, we can interpret BEC polar codes from the girth of the parity-check matrix. Note that this simplifies the proof that BEC polar codes achieve capacity to high-school linear algebra: mutual information need not even be introduced. The only part which may not be of a high-school level is the martingale argument, which is in fact not necessary, as already known in the polar code literature (see for example [1, Homework 4], which uses basic algebra).

Remark 2. As shown in [2], the action of the matrix G_n = [1 1; 0 1]^{⊗ log n} on a vector can be computed in O(n log n) time, which means that the encoding of the COR code can be done in O(n log n) time, as can the code construction (which is not the case for general polar codes). Decoding the COR code can be done by inverting the submatrix of A corresponding to the indices that do not have erasure symbols, which can be done by Gaussian elimination in O(n³) time. Alternatively, COR codes can be decoded as polar codes, i.e., successively, in O(n log n) time. Hence, like polar codes for the BEC, COR codes are deterministic, capacity-achieving, and efficiently encodable and decodable for the MEC.
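An equivalent way to realize the Gaussian-elimination decoder of Remark 2, sketched below (ours, illustrative, over F_2), is to solve for the erased coordinates from the parity checks: H x = 0 gives H[:,E] x_E = H[:,K] x_K, where E is the erasure set and K its complement. This has a unique solution exactly when the columns of H indexed by E are linearly independent, as in the proof of Lemma 2.

```python
import numpy as np

def solve_gf2(M, b):
    """Solve M x = b over F_2 by Gaussian elimination; returns x, or None
    if the columns of M are linearly dependent (ambiguous erasures)."""
    M, b = M.copy() % 2, b.copy() % 2
    m, k = M.shape
    row = 0
    for c in range(k):
        pivot = next((r for r in range(row, m) if M[r, c]), None)
        if pivot is None:
            return None                       # dependent column
        M[[row, pivot]] = M[[pivot, row]]
        b[[row, pivot]] = b[[pivot, row]]
        for r in range(m):
            if r != row and M[r, c]:
                M[r] ^= M[row]
                b[r] ^= b[row]
        row += 1
    return b[:k]                              # M reduced to identity on top

def decode_erasures(Hmat, y, erased):
    """MEC decoder for the code ker(H): fill in the erased coordinates by
    solving H[:,E] x_E = H[:,K] y_K over F_2 (we assume y is consistent,
    i.e., produced by an erasure channel with no bit errors)."""
    n = Hmat.shape[1]
    K = [j for j in range(n) if j not in erased]
    rhs = (Hmat[:, K] @ y[K]) % 2
    xE = solve_gf2(Hmat[:, list(erased)], rhs)
    if xE is None:
        return None
    x = y.copy()
    x[list(erased)] = xE
    return x

# Hamming [7,4] example: erase two positions of a codeword and recover it.
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
x = np.array([1, 1, 1, 0, 0, 0, 0])       # a codeword: H @ x = 0 (mod 2)
y = x.copy(); y[[0, 3]] = 0                # placeholder values at E = {0, 3}
print(decode_erasures(H, y, [0, 3]))       # -> [1 1 1 0 0 0 0]
```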
B. Sparse recovery

In the setting of sparse recovery, one wishes to recover a real-valued sparse signal from a lower-dimensional projection [6]. In the worst-case model, a k-sparse signal is a vector with at most k non-zero components, and to recover x that is k-sparse from Ax, it must be that Ax ≠ Ax′ for all x, x′ that are k-sparse (and different). Hence A needs³ to have girth 2k+1.

³Note that for noise stability or to obtain a convex relaxation of the decoder, one needs the columns to have in addition singular values close to 1, i.e., the restricted isometry property (RIP).

One may instead consider a probabilistic model where a k-sparse signal has a random support, drawn uniformly at random or from an i.i.d. model where each component in [n] belongs to the support with probability p = k/n. The goal is then to construct a flat matrix A which allows one to recover k-sparse signals with high probability over the drawing of the support. Note that a bad support S is one which is k-sparse and can be paired with another k-sparse support S′ such that there exist real-valued vectors x, x′, supported respectively on S, S′, which have the same image through A, i.e.,

Ax = Ax′ ⟺ A(x − x′) = 0. (19)

Note now that this is equivalent to saying that the columns of A indexed by S ∪ S′ are linearly dependent, since x − x′ is supported on S ∪ S′, which is 2k-sparse. Hence, the probability of error for sparse recovery is given by

P_S{∃S′ : A[S ∪ S′] has lin. dep. columns}. (20)

This error probability can be upper-bounded as for errors (see the next section and Section VII), by estimating the probability that A has a subset of up to 2k linearly dependent columns, which relies on the high-girth property of A.
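As an illustration of the worst-case claim (girth 2k+1 suffices for unique k-sparse recovery), the following hypothetical sketch recovers a k-sparse vector from a 2k×n Gaussian matrix by brute-force support enumeration; the function name recover_sparse is ours, and practical decoders use ℓ1 minimization [6] rather than this exponential search.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def recover_sparse(A, y, k):
    """Brute-force k-sparse recovery: find x with at most k nonzeros and
    A x = y by least squares over all size-k supports (illustrative only)."""
    m, n = A.shape
    for S in itertools.combinations(range(n), k):
        sub = A[:, list(S)]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        if np.allclose(sub @ coef, y):        # exact fit found
            x = np.zeros(n)
            x[list(S)] = coef
            return x
    return None

# A 2k x n Gaussian matrix has girth 2k+1 with probability 1, so every
# k-sparse signal has a unique preimage (cf. (19)): recovery succeeds.
n, k = 12, 2
A = rng.standard_normal((2 * k, n))
x = np.zeros(n); x[[3, 7]] = [1.5, -2.0]
print(np.allclose(recover_sparse(A, A @ x, k), x))   # -> True
```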
C. Coding for errors

In this section, we work over the binary field F_2. The binary symmetric channel with error probability p, denoted by BSC(p), flips each bit independently with probability p. More formally, the transition probability of receiving y ∈ F_2 when x ∈ F_2 is transmitted is given by

W(y|x) = { p      if y ≠ x,
           1 − p  if y = x. (21)

The memoryless extension is then defined by W^n(y^n|x^n) = ∏_{i=1}^n W(y_i|x_i), for x^n, y^n ∈ F_2^n.

Theorem 4. Let p ∈ [0,1/2] and s = s(p) = 2√(p(1−p)) be the Bhattacharyya parameter of the BSC(p). Let {C_n} be the COR code with parameter s (the code whose PCM is R_n). Then C_n can reliably communicate over the BSC(p) with high probability, as n → ∞.

Note that unlike in the erasure scenario, Theorem 4 does not allow us to achieve capacity over the BSC(p). For the capacity of the BSC(p) is 1 − H(p), and

1 − H(p) ≥ 1 − 2√(p(1−p)), (22)

with equality holding only for p ∈ {0, 1/2, 1}.

This statement, unlike the ones for erasure correction and sparse recovery, was stated only for the COR code, and not for general high-girth codes. Our proof of this statement, which can be found in the Appendix, relies on the actual construction of COR codes and on the upper-bound on the successive probability of error in terms of the COR known from polar codes. One may also attempt to obtain this result solely from the high-girth property, but this requires further control on the rate of convergence of the high-girth property (see Section VII). It is an interesting problem to obtain achievable rates for the BSC that depend solely on the probabilistic girth.

V. SOME PROOFS

Proof of Lemma 1: We induct on log n. The base case is log n = 1, where the calculation is straightforward. The rank of G_2^{(1)}[s] will be 0 if no columns are chosen, and will be 1 if at least one column is chosen. Therefore,

ρ(2,1,s) = E(rank_F(G_2^{(1)}[s])) (23)
= 0·(1−s)² + 1·2s(1−s) + 1·s² (24)
= 2s − s² = ℓ(s). (25)

Similarly, E(rank_F(G_2^{(2)}[s])) = 2s, and thus

ρ(2,2,s) = (2s) − (2s − s²) = s² = r(s). (26)

Note that all these calculations do not actually depend on F.

For the inductive step, assume that ρ(n/2, i, s) is the corresponding leaf value of the branching process for all 1 ≤ i ≤ n/2. To prove the same for ρ(n,i,s), write

G_n = G_{n/2} ⊗ G_2.

In other words, we think of G_n as being an (n/2)×(n/2) matrix whose entries are 2×2 matrices.

We begin with the case when i is odd. By the inductive hypothesis, we wish to prove that

ρ(n,i,s) = ℓ(ρ(n/2, (i+1)/2, s)). (27)

We partition the columns of G_n^{(i)}[s] into two sets: O, which consists of those columns which have an odd index in G_n, and E, which consists of those with an even index in G_n. Since G_n = G_{n/2} ⊗ G_2, we see that for i odd, the ith row of G_n is the ((i+1)/2)th row of G_{n/2}, except that each entry is repeated twice. From this, and from inclusion-exclusion, we see that

ρ(n,i,s) = P(row i of G_n is independent of the previous rows) (28)
= P(row i of G_n[O] is independent of the previous rows of G_n[O]) + P(row i of G_n[E] is independent of the previous rows of G_n[E]) − P(both of the above) (29)
= 2ρ(n/2, (i+1)/2, s) − ρ(n/2, (i+1)/2, s)² (30)
= ℓ(ρ(n/2, (i+1)/2, s)). (31)

Next, we consider the case when i is even. In this case, we wish to prove that

ρ(n,i,s) = r(ρ(n/2, i/2, s)). (32)

We proceed analogously. From the equation G_n = G_{n/2} ⊗ G_2, we see that the ith row of G_n is the (i/2)th row of G_{n/2}, except with a 0 interspersed between every two entries. Thus, the ith row will be dependent if either its restriction to O or its restriction to E is dependent; in other words, it will be independent if and only if both the restriction to O and the restriction to E are independent. Therefore,

ρ(n,i,s) = P(the restriction to O and the restriction to E are independent) (33)
= P(the restriction to O is independent) · P(the restriction to E is independent) (34)
= ρ(n/2, i/2, s)² (35)
= r(ρ(n/2, i/2, s)). (36)

Note that in all our calculations, we used probability arguments that are valid over any field. Broadly speaking, this works because the above arguments show that the only sorts of linear dependence that can be found in G_n^{(i)}[s] involve coefficients in {−1, 0, 1}. Since these elements are found in any field, we have that this lemma is true for all fields F.
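The field-independence claim in this proof can be checked numerically. The sketch below (ours, illustrative) estimates ρ(n,i,s) of (9) by Monte Carlo, both over F_2 and over the reals, and compares both against the branching-process leaves of Lemma 1; the three printed rows should agree up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def sierpinski(n):
    """G_n as in (8)."""
    G2 = np.array([[1, 1], [0, 1]])
    G = np.array([[1]])
    while G.shape[0] < n:
        G = np.kron(G, G2)
    return G

def cor_values(n, s):
    """Branching-process leaves of Lemma 1."""
    vals = [s]
    while len(vals) < n:
        vals = [c for x in vals for c in (2 * x - x * x, x * x)]
    return vals

def rank_gf2(rows):
    """Rank over F_2 of 0/1 row vectors encoded as Python ints."""
    basis = {}
    for v in rows:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]
            else:
                basis[lead] = v
                break
    return len(basis)

def rho_empirical(n, s, field="F2", trials=4000):
    """Monte Carlo estimate of rho(n, i, s) of (9) for i = 1..n."""
    G = sierpinski(n)
    acc = np.zeros(n + 1)
    for _ in range(trials):
        sub = G[:, rng.random(n) < s]
        if sub.shape[1] == 0:
            continue                          # empty submatrix: all ranks 0
        bits = [sum(int(b) << j for j, b in enumerate(row)) for row in sub]
        for i in range(1, n + 1):
            acc[i] += (rank_gf2(bits[:i]) if field == "F2"
                       else np.linalg.matrix_rank(sub[:i]))
    acc /= trials
    return acc[1:] - acc[:-1]                 # successive rank increments

n, s = 8, 0.5
print(np.round(cor_values(n, s), 3))          # Lemma 1 leaves
print(np.round(rho_empirical(n, s, "F2"), 3)) # empirical, over F_2
print(np.round(rho_empirical(n, s, "R"), 3))  # empirical, over the reals
```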
Proof of Lemma 2: Note that a decoder over the MEC needs to correct the erasures, but there is no bias towards which symbol may have been erased. Hence, a decoder on the MEC is wrong with probability at least half if there are multiple codewords that match the corrupted word. In other words, the probability of error is given by⁴

P_e(C_n) = P_E{∃x, y ∈ C_n, x ≠ y, x[E^c] = y[E^c]} (37)
= P_E{∃z ∈ C_n, z ≠ 0, z[E^c] = 0}, (38)

where E is the erasure pattern of the MEC(p), i.e., a random subset of [n] obtained by picking each element with probability p.

⁴If ties are broken at random, an additional factor of 1 − 1/|F| should appear on the right-hand side.

Let H_n be the parity-check matrix of C_n, i.e., C_n = ker(H_n). Note that E has the property that there exists a codeword z ∈ C_n such that z[E^c] = 0 if and only if the columns indexed by E in H_n are linearly dependent. Indeed, assume first that there exists such a codeword z, where the support of z is contained in E. Since z is in the kernel of H_n, the columns of H_n indexed by the support of z must add up to 0, hence any set of columns that contains the support of z must be linearly dependent. Conversely, if the columns of H_n indexed by E are linearly dependent, then there exists a subset of these columns and a collection of coefficients in F such that the corresponding linear combination is 0, which defines the support of a codeword z. Hence,

P_e(C_n) = P_E{H_n[E] has lin. dependent columns}. (39)

Recalling that the code rate is given by 1 − r, where r is the relative rank of the parity-check matrix, the conclusions follow.

ACKNOWLEDGEMENTS

This work was partially supported by NSF grant CIF-1706648 and the Princeton PACM Director's Fellowship.

REFERENCES

[1] Emmanuel Abbe. Lecture notes for the course APC529: Coding Theory and Random Graphs. Available on request, 2013.
[2] Erdal Arıkan. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Transactions on Information Theory, 55(7):3051–3073, 2009.
[3] Erdal Arıkan. Source polarization. arXiv preprint arXiv:1001.3087, 2010.
[4] Erdal Arıkan and Emre Telatar. On the rate of channel polarization. In IEEE International Symposium on Information Theory (ISIT 2009), pages 1493–1495. IEEE, 2009.
[5] Alexandros G. Dimakis, Kannan Ramchandran, Yunnan Wu, and Changho Suh. A survey on network codes for distributed storage. Proceedings of the IEEE, 99(3):476–489, 2011.
[6] David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.
[7] S. Haghighatshoar and E. Abbe. Polarization of the Rényi information dimension for single and multiterminal analog compression. In IEEE International Symposium on Information Theory (ISIT 2013), pages 779–783, July 2013.
[8] D. E. Muller. Application of Boolean algebra to switching circuit design and to error detection. Transactions of the I.R.E. Professional Group on Electronic Computers, EC-3(3):6–12, September 1954.
[9] I. S. Reed. A class of multiple-error-correcting codes and the decoding scheme. Transactions of the IRE Professional Group on Information Theory, 4(4):38–49, 1954.
[10] A. M. Tillmann and M. E. Pfetsch. The computational complexity of the restricted isometry property, the nullspace property, and related concepts in compressed sensing. IEEE Transactions on Information Theory, 60(2):1248–1259, February 2014.
[11] A. Vardy. The intractability of computing the minimum distance of a code. IEEE Transactions on Information Theory, 43(6):1757–1766, November 1997.

VI. APPENDIX

A. More Proofs

Proof of Theorem 2: For i ∈ H, let B_i be the event that the ith row of R[s] is linearly dependent on the previous rows. Note that if R[s] has full rank, then no B_i is satisfied, while if R[s] has non-full rank, then there must be some linear dependence among the rows, so at least one B_i is satisfied. In other words, the event whose probability we want to calculate is simply the event ∩_{i∈H} B_i^c.

Note that in our notation, the ith row of R is also the ith row of G_n. Therefore, for any S ⊆ [n], the ith row of R[S] is the ith row of G_n[S]. This means that any linear dependence between the ith row of R[S] and the previous rows automatically induces a linear dependence between the ith row of G_n[S] and the previous i−1 rows, since the previous rows in G_n[S] are a superset of the previous rows in R[S]. Since this is true for any set S ⊆ [n], we see that

P(B_i) = P(the ith row of R[s] is dependent on the previous rows of R[s]) (40)
≤ P(the ith row of G_n[s] is dependent on the previous rows of G_n[s]) (41)
= 1 − ρ(n,i,s) (42)
< 2^{−n^{0.49}}. (43)

Therefore,

P(∩_{i∈H} B_i^c) = 1 − P(∪_{i∈H} B_i) (44)
≥ 1 − Σ_{i∈H} P(B_i) (45)
> 1 − Σ_{i=1}^m 2^{−n^{0.49}} (46)
= 1 − [sn + o(n)] 2^{−n^{0.49}} (47)
= 1 − o(2^{−n^{0.48}}) (48)
→ 1 as n → ∞. (49)
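For a sense of scale (an editorial aside, not from the paper), the union bound (46)-(49) can be evaluated numerically; it becomes meaningful once n^{0.49} outgrows log_2 n, after which it decays super-polynomially fast. The choice s = 1/2 below is an arbitrary example.

```python
# Numeric look at the union bound (46)-(49): m * 2^(-n^0.49) with m ~ s*n.
s = 0.5
for logn in (8, 12, 16, 20):
    n = 2 ** logn
    bound = s * n * 2 ** (-(n ** 0.49))
    print(f"n = 2^{logn}: m * 2^(-n^0.49) ~ {bound:.3e}")
```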
Proof of Theorem 4: We recall from [2] that in any code generated by taking some of the rows of G_n, the probability of error on the BSC(p) is upper-bounded as

P_e ≤ Σ_{i∈H^c} Z_n^i, (50)

where H denotes the set of rows of G_n that we keep and Z_n^i denotes the Bhattacharyya parameter of the ith row. Proposition 5 in [2] tells us that

Z_n^i { = (Z_{n/2}^{i/2})²                                   when i is even,
        ≤ 2 Z_{n/2}^{(i+1)/2} − (Z_{n/2}^{(i+1)/2})²          when i is odd. (51)

We recognize these functions as r and ℓ. Thus, we see that the branching process of Lemma 1, when initialized at s = Z(BSC(p)), provides an upper bound for the Bhattacharyya parameters. Now, recall that the row selection criterion for COR matrices only keeps the rows with high ρ values, and thus high Z values. Thus, we see that (50) ensures that COR codes with parameter s = 2√(p(1−p)) can successfully transmit over the BSC(p).

VII. ERRORS FROM HIGH-GIRTH MATRICES

It is an interesting problem to study what rate of the high-girth property allows one to achieve positive rates on the BSC. A classic union bound requires at least an exponential rate, as explained next. This underlines that COR matrices achieve rates higher than what arbitrary high-girth matrices may reach.

Let H = H_n be a matrix with probabilistic girth µ. Consider C to be the code whose parity-check matrix is H. The probability of error of this code on the BSC(p) is the probability that an error vector Z has in its coset (i.e., among the other error vectors that lead to the same syndrome) a more likely vector, i.e.,

P_e = P_Z{∃z′ ∈ F_2^n : HZ = Hz′, w(z′) ≤ w(Z)}, (52)

where Z is i.i.d. Bernoulli(p). This is equivalent to

P_e = P_Z{∃x ∈ F_2^n : Hx = 0, w(x + Z) ≤ w(Z)}. (53)

Note that w(x + Z) ≤ w(Z) means that Z takes the value 1 on at least half of the support of x, which for x of weight w has probability

Σ_{k=w/2}^{w} (w choose k) p^k (1−p)^{w−k}, (54)

where w/2 is rounded up if w is odd. Note that

Σ_{k=w/2}^{w} (w choose k) p^k (1−p)^{w−k} (55)
≤ p^{w/2} (1−p)^{w/2} 2^w = z(p)^w, (56)

where

z(p) = 2 p^{1/2} (1−p)^{1/2} (57)

is the Bhattacharyya parameter of the channel. Hence,

P_e ≤ Σ_{w≥1} N(w) z(p)^w, (58)

where N(w) is the number of codewords of weight w, i.e.,

N(w) = |{x : Hx = 0, w(x) = w}|. (59)

Note that

N(w) ≤ |{x : H[supp(x)] has lin. dep. col., w(x) = w}| (60)
= (n choose w) P_S{H[S] has lin. dep. col.}, (61)

where S is uniformly drawn with a support of size w. By standard concentration arguments, this is upper-bounded by P{H[s] has lin. dep. col.} where s = w/n + o(√(w/n)), and by assumption,

P{H[s] has lin. dep. col.} = τ(n) (62)

when s < µ. Thus,

P_e ≤ Σ_{t<µ} τ(n) 2^{n(H(t) + t log z(p))} + Σ_{t≥µ} N(tn) z(p)^{tn}, (63)

where t takes values such that tn is an integer. Hence τ(n) must be exponentially small to drive the first term to 0.
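To see the last claim quantitatively, one can evaluate the exponent H(t) + t log_2 z(p) appearing in the first term of (63). The sketch below (ours; p = 0.05 is an arbitrary example) shows the exponent is positive at all sampled t, since the entropy term dominates for small t; the first term therefore blows up unless τ(n) decays exponentially in n.

```python
import numpy as np

def H2(t):
    """Binary entropy in bits."""
    t = np.clip(t, 1e-12, 1 - 1e-12)
    return -t * np.log2(t) - (1 - t) * np.log2(1 - t)

p = 0.05
z = 2 * np.sqrt(p * (1 - p))            # Bhattacharyya parameter, eq. (57)
print(f"z({p}) = {z:.4f}")

# Exponent of the first term of (63): positive here at every sampled t,
# so tau(n) must be exponentially small to compensate.
for t in (0.01, 0.05, 0.10, 0.20, 0.30):
    print(f"t = {t:.2f}: H(t) + t*log2(z) = {H2(t) + t * np.log2(z):.4f}")
```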