Table Of Content

On the Threshold of Maximum-Distance Separable Codes Bruno Kindarji∗† Ge´rard Cohen† Herve´ Chabanne∗† ∗ Sagem Sećurite´ †Institut Te´lećom Osny, France Te´lećom ParisTech Paris, France Abstract—StartingfromapracticaluseofReed-Solomoncodes the literature concerning list-decoding, which usually looks 0 inacryptographicschemepublishedinIndocrypt’09,thispaper for radii for which the size is always upper-bounded by a 1 deals with the threshold of linear q-ary error-correcting codes. maximum list size, or tries to exhibit a counter-example. 0 The security of this scheme is based on the intractability of 2 polynomial reconstruction when there is too much noise in The “large enough” list size can be obtained easily by n the vector. Our approach switches from this paradigm to an imposingthatMaximum-LikelihoodDecodingto be mostim- a InformationTheoreticalpointofview:isthereaclassofelements probable.Forthat,wefocusontheall-or-nothingbehaviourof J that are so far away from the code that the list size is always theML decoder.Inspiredbypercolationtheory[4], andcode- 4 superpolynomial? Or, dually speaking, is Maximum-Likelihood applied graph theory [5], we will show how it is possible to decoding almost surely impossible? 1 conservatively estimate, before, after, and around a threshold, We relate this issue to the decoding threshold of a code, and showthatwhentheminimaldistanceofthecodeishighenough, the all-or-nothing probability of ML decoding. ] T the threshold effect is very sharp. In a second part, we explicit II. THETHRESHOLD OF ACODE I lower-bounds on the threshold of Maximum-Distance Separable . codes such as Reed-Solomon codes, and compute the threshold The existence of a threshold is motivated by the classical s c for the toy example that motivates this study. question of percolation : given a graph, with a source, and [ a sink, and given the probability p for a “wet” node of the I. INTRODUCTION graph to “wet” an adjacent node, what is the probability for 1 v In [1], Bringer et al. proposed a low-cost mutual authen- thesourcetowetthesink?Itappearsthatthisprobabilityhasa 3 thresholdeffect;inotherwords,thereexistsalimitprobability tication protocol, that uses a Reed-Solomon code structure. 6 p suchthat,ifp>p , thenthesinkisalmostsurelywet,and This protocol is pretty simple: Bob owns two secret polyno- c c 4 if p < p , then the sink is almost never wet. The threshold 2 mials Pb,Pb′ of degree less than k known only by Alice; to effect is icllustrated in Fig 1. . authenticate herself to Bob, Alice proves the knowledge of 1 This question can be transposed into the probability of 0 Pb by sending i,Pb(αi) where αi is the i-th element of a 0 Fq. Bob proveshhis identiity by replying with hPb′(αi)i. This aerrdoerc-coodrirnegctianlggoaritchomde,.wGhivaetnisatphreopporrotbioanbioliftyerorofrscopr,rewcittlhy 1 protocolis made such that if Alice speaks to a lot of persons, v: it is hard to trace Bob out of all the conversations, and it recovering the sent codeword? It was shown in [6] that for every binary code, and every decoding algorithm, this i is hard to impersonate Alice (or Bob). The security of the X probability also follows a threshold. protocolisbasedonanalgorithmicassumption,sayingthatthe r polynomial reconstruction problem is hard for the vectors of In this paper, we show that this property also applies to q- a ary codes. In the following part, we show that the threshold Fn that are far enoughfromthe code. Indeed,the best known q behaviourthatwasseenonbinarycodescanbeobtainedagain. algorithms solving polynomial reconstruction are those of Guruswami-Sudan[2] and,on arelatedproblem,Guruswami- A. The Margulis-Russo Identity Rudra[3],whichcanbasicallyreconstructapolynomialgiven The technique used to derive threshold effects in discrete √kn correct values. spaces is to integrate an isoperimetric inequality; for that, the This algorithmic security result is somehow unsatisfying, Margulis-Russo identity is required. for it is possible to exhibit better decoding algorithm. We Let H = 0,1 n be the Hamming space; the Hamming { } therefore take interest in the information-theoretic aspect of distance d(x,y) provides the number of different coordinates such a problem. between vectors x and y. Consider the measure µ : H p → The solution of the problem raised by [1] is to look at the [0,1] defined by µ (x) =pw(x)(1 p)n w(x) where w(x) is p − − output of a list-decoder centered around the received values, the Hamming weight of x. The number of limit-vectors of a andtooutputthepossiblepolynomialsascandidatevaluesfor subsetA H is a functiondefinedas h (x)= B(x,1) A A ⊂ | ∩ | P orP .Ourapproachconsistsinlookingatausuallyignored for x A. b b′ ∈ side of list-decoding, which is to look at the radii r such that For A H such that A is increasing (i.e. if x A, and ⊂ ∈ list-decodingawordwithradiusrprovidesalistthatisalways y x,theny Awith definedcomponent-wise),Margulis ≥ ∈ ≥ lower-bounded by a large enough number. This differs from and Russo showed : n = (k A (n k+1)(q 1)A ) k k 1 dµp(A) = 1 h (x)dµ (x) kX=1 | |− − − | − | dp pZ A p p k A (1 p)n k − ·(cid:18)q 1(cid:19) − alsLoettruqe∈inNH,qq =>{20.,T..h.qis−se1c}tnio.n shows that this equality is o=nPthnke=o0th|Aerk|h(akn−d,pn1−−pk)(cid:16)q−p1(cid:17)−k(1−p)n−k For a vector x H , the support of x is the set of all its ∈ q k 0pn})on.n−-Dnwue(lfixl)nceowotihrtdheinmwate(exass),uir.=ee.fsu|usnpuctppi(opxn()xµ)=|p(t{xhi)e∈=w{e1i(cid:16)g,qh.−p.t.1,o(cid:17)nfw}(xx:.)x(T1ih6=−is =dµdpPp(Ank)==0|APknk|=(cid:16)0q−|pA1k(cid:17)|kddp(1(cid:18)−(cid:16)qp−p)n1−(cid:17)k(cid:16)(1kp−+p−)n(1n−−−kpk(cid:19))(cid:17) definition is consistent with a measure, as µp(Hn) = Hence the identity. PNx∈oHteqtµhpe(xin)c=lus1io.n to be the relation between a set and a truTehoisnlem0.m..(aqsho1w)snt;haitt twhaesMthaergkueliyss-tRounsesoofidtehnetirteyasisonailnsog ⊂ { − } (general)subset(i.e.forallX,X X).Thesupportinclusion donein[5]toshowanexplicitformofthethresholdbehaviour ⊂ generalisesthecomponent-wise thatwas usedin thebinary of Maximum-LikelihoodError Correction. ≤ case. B. A Threshold for Error-Decoding q-ary codes Lemma 1 (Margulis-Russo Identity over q-ary alphabets): fLoertaAllbxe anHinqcrseuacshintghastubssueptpo(fy)Hq, is.eu.pspu(cxh),ththaetnifxy ∈AA., butIinonth,eΦf(oxll)ow=ingx,weϕu(ts)edϕt(tth)e=ac√cu12πmeu−lat2t2etnhoernmoarlmfaulndcitsiotrni-, ∈ ⊂ ∈ Then and Ψ(x)=ϕ(ΦR−1∞(x)) (so that x,Ψ(x) Φ 1(x)=1). − ′− ∀ · A monotone property is a set A H such that A is q ⊂ dµ (A) 1 increasing, or A is increasing. dpp = pZ hA(x)dµp(x) Theorem 1: LetAbea monotonepropertyofHq. Suppose A that x A,h (x)=0 or h (x) ∆. A A ∀ ∈ ≥ Letθ [0,1]be(theuniquereal)suchthatµ (A)= 1.Let Proof: The proof of this lemma is an adaptation of ∈ θ 2 Margulis’ proof in [7]. For this, we use the notation: gθ(p)=Φ √2∆(√ lnθ √ lnp) . (cid:16) − − − (cid:17) Then the measure of A, µ (A) is bounded by : [A,B] = x,y A B : d(x,y) = 1 where A,B p • |{ } ∈ × | ⊂ Hq, is the number of links from A to B µp(A) gθ(p) for p ]0;θ] • ffoorr Ak ∈{H0,.,.A.,n=},AZk =Z{x(A∈isHtqhe:wre(uxn)io=nko}f,the A ); µp(A) ≤≥ gθ(p) for p∈∈[θ;1[ q k k k • ⊂ ∩ Sketch of Proof D = h (x) is the number of limit-vectors next • tokelemPenxt∈sAokf wAeight k. The proofis exactly the same as the one from [5]. The whole idea is to derive the upper-range: Trivially,D =[A ,Z A ]+[A ,Z A ]+ k k k+1 k+1 k k 1 k 1 [A ,Z A ]. We now note−that : − − − 1 k k k h dµ 2ln Ψ(µ (A)) − Z A p ≥r p p [A ,Z ]= A k, as to go fromA to Z , the only Ap k k 1 k k k 1 • way (in−one m|ove|) is to put one coordinate−to 0; The integration of this equation, together with the Margulis- [A ,Z ]= A (n k)(q 1)withthesamereasoning; Russo lemma, gives the result. k k+1 k • [A ,Z A |] =| [−A ,Z− A ] = 0 as A is To conclude this part, we remark that the non-decoding k k k k k+1 k+1 • increasin−g. − region of a given point, for a q-ary code, is an increasing Combining these equalities, we get [Ak,Ak+1] = region of Fnq. For linear codes, this non-decoding region can • A (n k)(q 1); alwaysbe translatedtothatof0withoutlossofgenerality;let k • |[cAookr|,dZink−a]te=to 00−aansd aitniusllnoenceestsoary1,t.o..qput1a. non-null pAr0ob=ab{ilxity∈oFfnqerrso.tr. d∃ecco∈diCn,gco6=f C0i:sdth(xen,cµ)p≤(Ad0()x.,0)}. The { − } For x µ (A ), we show that either h (x) = 0, or 1)(Fqinal1ly)DAk =[fAokr,kZ>k−01]a−nd[ADk,A=k−01(]o=r Ak|A=kH|−)(.n−k+ hA0(x) ≥∈d2. pInde0ed, if hA0(x) > 0, then xA0is nearer to a − | k−1| 0 q non-null codeword c than to 0. Then all the vectors obtained Back to the identity desired, we observe that byreplacingoneofthecoordinatesofxby0areoutofA ;in 0 particular, h (x) d(x,0). Let d = d(c,0) be the weight Z hA(x)dµp(x) = n hA(x)(q p 1)k(1−p)n−k porfevc;ioaussxassiseAr0tnieoanr.e≥r to c than to 0c, d(x,0) ≥ d2c. Thus the A Xk=0xX∈Ak − Combiningthepreviousresults,wejustshowedthatforany n p k = D (1 p)n k q-ary code, the probability of error is, as for binary codes, k − Xk=0 (cid:18)q−1(cid:19) − bounded by a threshold function. This can be expressed by 1 x: s.t. c C,c=0:d(x,c)=w(x) np g(p)= |{ ∃ ∈ 6 6 ≤ }|. x:w(x) np 0.8 |{ ≤ }| Let vol(q,n,t)= 1 log (B(t)), where B(t) is the Ham- mingballofradiust,n(forqex|ample|,centeredon0)inFn.Itis 0.6 q wellknownthatwhent≤ q−q1,vol(q,n,t)=Hq(nt)+on(1), whereH (x)= xlog x (1 x)log (1 x)+xlog (q 1) 0.4 q − q − − q − q − is the q-ary entropy of x [0,1]. ∈ To compute the numerator, we suggest, for each codeword 0.2 c C that has a weight between d and 2pn, to compute the ∈ numberofvectorsxthatarenearertocthanto0.Thisnumber 0 actually only depends on the weight of c, and will be noted 0 0.2 0.4 0.6 0.8 1 ν (w(c)). As there are A codewords of weight w(c) in pn w(c) g(p) All Slope in sqrt(d) Nothing the code (with the standard notation), the function g(p) can be approximated by: Figure1. Illustration ofthethreshold effect, d=400,pc=0.7 2pnA ν (l) g(p) l=d l pn (1) ≤ Pqnvol(q,n,pn) the following theorem, which has the same form as the one The differentquantities used in this equation are illustrated showed in [5]: in Fig 2. Theorem 2: LetC bea codeofanylength,andofminimal distance d. Over the q-ary symmetric channel, with transition probability p, the probability of decoding error P (p) associ- e ated with C is such that there exists a unique p ]0;1 such c ∈ | that P (p )= 1, and P is bounded by: e c 2 e Pe(p)S1 Φ(√d( ln(1 pc) ln(1 p))) − − − − − − p p Theupper-bound( )istruewhenp ]0;p ];thelower-bound c ≤ ∈ ( ) is true when p [p ;1[. c ≥ ∈ Even though linearity was used not to lose any generality Figure2. Different quantities usedinEq1 previously,itisnotarequirementforthistheorem.Indeed,the boundingequationsaretrueforeverycodewordcbyreplacing ν (w)isexplicitedhereafter.Letcbeacodewordofweight t d by minc′ C,c‘=cd(c,c′). Assuming that the codewords sent w. Let x Fn be a vector with the following constraints: are distribu∈ted i6n a uniform way over C, we thus obtain this ∈ q d(x,0) t, i.e. x is the result of the transmission of 0 result. • ≤ with at most t errors. ThebehaviourofthisfunctionisillustratedinFig1.Around d(x,0) d(x,c), i.e. x is wrongly decoded. p 0 (actually, for all p < pc ǫ...), Pe is extremely flat • ≥ ≈ − Wenoteαthenumberofcoordinatesiinxsuchthatx =c around0;aroundp≈1(and,symmetrically,forallp>pc+ǫ, andx =0;β isthenumberofcoordinatesisuchthatxi 6=ci Pe is extremely flat around 1. Finally, around the threshold andxi =0;γ isthenumberofcoordinatesisuchthatxi =6 ci pmci,ntihmeaslldoipsetainsce√2dπ√i(s1d−laprcg)e,.which is almost vertical when the and cii=6 0. i 6 i The previous constraints on x can be rewritten into the system (S): III. EXPLICIT COMPUTATION OF THETHRESHOLDFOR MAXIMUM-DISTANCE SEPARABLE CODES 1) 0 α,β w ≤ ≤  2) 0 γ n w FnqI.n this section, we only take interest in linear codes over (S): 43)) βγ≤+≤γt+≤αt−−w ≤ A.BAynlointheearriEtys,twimeactiaonnaogfaitnhewDithecooudtilnogssTohfregsehnoelrdalityassume We then obtain  5) 2α+β ≤w thatthesentcodewordwas theallnullvector.Itispossible to w α+β n w ν (w)= (q 2)β − (q 1)γ. havearoughestimationoftheprobabilityofwronglydecoding t (cid:18)α+β(cid:19)(cid:18) β (cid:19) − (cid:18) γ (cid:19) − αX,β,γ with crossover probability p correctly a vector by computing the proportion of vectors x Fn of weight less or equal to Remark 1: Itiseasytoseethatνt(w)isatmostthevolume np that are closer to a non-n∈ull cqodeword than to 0. Let g(p) of a ball of radius w d; this estimation will be used in the − 2 be this proportion. next part. B. Application to MDS codes (Here, the term o(q) is a bounded by a polynomialin n.) We know that Maximum-DistanceSeparable(MDS)Codesarecodessuch d that their dimension k and minimal distance d fulfil the 0 nµ(l,t) l (3) ≤ ≤ − 2 Singleton bound, so that: Combining these elements with equation (1), we obtain k+d=n 1. g(p) 2pno(q)q1+l d+µ(l,pn) nvol(q,n,pn). − ≤ l=d − − A wellknownfamilyofMDScodesarethe Reed-Solomon As vPol(q,n,t) = Hq(nt) + on(1) = nt + oq(1), the first order of g(p) is bounded by: log g(p) codes, for which a codeword is made of the evaluation of a q ≤ max (1+l d pn+nµ(l,pn))+o (1). Rdeegerde-eSoklo−mo1npcoolydnesomoviaelroFvercannfihealvdeealelemnegnthtsuαp1,to..q.,αn1., Thl∈e[db,opnu]nding (−3) o−f µ shows that the rigqht-hand side of q − thisinequalityis between1+pn dand1+3pn 3d, which but shorter such codes are also MDS. − − 2 shows that the threshold g 1(1) is asymptotically between δ For MDS codes, the number A of codewords of given − 2 2 l and δ. weight is known. This number is: Unfortunately, a more precise evaluation of µ strongly n 1 depends on the context. Indeed, according to Section III-A, − n j An−i = Xj=1(−1)j−i(cid:18)j(cid:19)(cid:18)i(cid:19)(qk−j −1) ν(l,t)=oq(q) max qβ+γ n−l l α+β . ·α,β,γ:(S) (cid:18) γ (cid:19)(cid:18)α+β(cid:19)(cid:18) β (cid:19) From this identity, it is easy to derive the more usable This maximum can be obtained by evaluating the term to formula: be maximized on all vertices of the polytope defined by the l d n − l system (S) ((S) is made of 9 inequalities of 3 unknown, the A = ( 1)j (q1+l d j 1) (2) l (cid:18)l(cid:19) − (cid:18)j(cid:19) − − − verticesareobtainedby selecting3of theseequations,thusat Xj=0 most 9 =84 vertices); however,it is not possible to exhibit 3 It is now possible to approximate quite nicely the error here a(cid:0)g(cid:1)eneralanswer as the solution dependson the minimal probability while under the threshold - indeed, the numerator distance of the code, i.e. on the rate of the Reed-Solomon and denominatorare correctas longas a vector x is not close code. to 2 different codewords with a weight in the range [d;pn], i.e. as long as the list of codewordsat a distance less than pn D. Numerical Application to a (2048,256,1793)264 MDS from x is reduced to a single element. Code In the case of a code over a finite field of reasonable C. Short MDS Codes over Large Fields dimension, it is possible to exactly compute the ratio that We now focus on the specific problem presented in the approximates the Maximum Likelihood threshold. However, Introduction, and motivated by the beckoning and authenti- the exact threshold cannot be easily computed yet; it is still cation protocol from [1]. This setting is characterized by the anopenproblemrelatedtothelist-decodingcapacityofReed- following: Solomon codes. The underlyingcode is a Reed-Solomon over a field F ; We therefore used the NTL open-source library [8] to q • The field size q is very large for cryptographic reasons; compute the values A , ν (l) and B(t) in order to have l t • | | The code lengthn is veryshort(with respect to q) as nq an accurate enough approximation of the the function g(p) • is the size of embedded low-cost devices’ memory. describedearlier.Theparametersarethosethatwereproposed This application fits into the framework depicted in the in [1], and show that the decoding threshold of such a code previoussections.Moreover,theinformation“nmuchsmaller is between 0.8 and 0.875. than q” (n = o(q)) enables to compute an asymptotic first The slope around the threshold is around 115, so for p order estimation of the threshold in such codes. “small” (in fact, a bit smaller than pc) g(p) is very near to 0, Indeed, if g(p) f(p), then g 1(1) f 1(1). We now while as p goesto 1, g(p) is much greater than the maximum compute an upper-≤bound on g(p),−to d2eri≥ve a−n es2timation on probabilityof 1. This was predicted earlier, and expresses the the threshold θ. More precisely, we aim at computing ι(p) fact that the list-size of radius pn is always greater than 1. the first-order value of logq(g(p)); then, ι−1(0) is a lower- The threshold value g−1(21) ≈ ι−1(0) is a lower-bound for approximation of the threshold. the threshold of the code, though the intuition says that this To estimate the weight enumerator A , we use formula (2) lower-bound is pretty near to the real threshold. l to derive IV. CONCLUSION n Al (l d) 2lq1+l−d n2n+lq1+l−d. As a conclusion, let us look back to the starting point of ≤ − (cid:18)l(cid:19) ≤ our reasoning.The initial goal was to revise the conditionsof The number of targetted vectors for each codeword ν (l) security of the construction depicted in [1]: from a received t is not easy to evaluate; we note its first order development vector x of Fn, for what parameters is the size of the q log ν (l) = nµ(l,t)+o (1), so that ν (l) qnµ(l,t) o (q). list of radius pn exponentially large? This problem can be q t q t ≤ · q reduced to that of the threshold probability of a linear error- correctingcode.Indeed,belowthethresholdofthecode,when the minimal distance of the code is large enough, the error decodingprobabilityofthe codeisexponentiallysmall,andit isexponentiallycloseto1abovethethreshold.Forourclassof parameters, ensuring that the error rate is above the threshold is enough to show the security of the scheme. We then showed that the threshold behaviour can be explicited for q-ary codes as well as for binary codes; we then explicited a lower-bound on the threshold of MDS codes. Applying these results to the initial problem, we show that thethresholdfora(highly)truncatedReed-Solomoncodeover a finite field F264 is very near to normalized the minimal distance d = n k + 1 of this code. As a conclusion, to − switch from an algorithmic assumption (the hardness of the Polynomial Reconstruction Problem, see [9]) to Information- Theoretical security, we recommend to raise the dimension k of the underlyingcode. Thislowersthe decodingthresholdof the code; the downside is that storage of a codeword is more costly. V. ACKNOWLEDGEMENTS WethankGillesZe´morfortheusefulcommentsandfruitful discussions. REFERENCES [1] J.Bringer,H.Chabanne,G.D.Cohen,andB.Kindarji,“Privateinterro- gationofdevices viaidentification codes,”inINDOCRYPT,ser.Lecture NotesinComputerScience,B.K.RoyandN.Sendrier,Eds.,vol.5922. Springer, 2009,pp.272–289. [2] V. Guruswami and M. Sudan, “Reflections on ”improved decoding of reed-solomonandalgebraic-geometric codes”,”2002. [3] V. Guruswami and A. Rudra, “Better binary list decodable codes via multilevel concatenation,” Information Theory, IEEE Transactions on, vol.55,no.1,pp.19–26,Jan.2009. [4] G.R.Grimmett,“Percolation,” 1997. [5] J.-P. Tillich and G. Ze´mor, “Discrete isoperimetric inequalities and the probability ofa decoding error,” Comb. Probab. Comput., vol. 9, no. 5, pp.465–479, 2000. [6] G.Ze´mor,“Thresholdeffectsincodes,”inAlgebraicCoding,ser.Lecture Notes in Computer Science, G. D. Cohen, S. Litsyn, A. Lobstein, and G.Ze´mor,Eds.,vol.781. Springer, 1993,pp.278–286. [7] G. A. Margulis, “Probabilistic characteristics of graphs with large con- nectivity,” Problemy Peredacˇi Informacii, vol. 10, no. 2, pp. 101–108, 1974. [8] V.Shoup,“Ntl:Alibraryfordoingnumbertheory.”[Online].Available: http://www.shoup.net/ntl [9] A.KiayiasandM.Yung,“Cryptographichardnessbasedonthedecoding ofreed-solomoncodes,” inICALP,ser.LectureNotes inComputerSci- ence,P.Widmayer,F.T.Ruiz,R.M.Bueno,M.Hennessy,S.Eidenbenz, andR.Conejo, Eds.,vol.2380. Springer, 2002,pp.232–243.