Beating the Gilbert-Varshamov Bound for Online Channels

Ishay Haviv*        Michael Langberg†

Abstract

In the online channel coding model, a sender wishes to communicate a message to a receiver by transmitting a codeword x = (x_1,...,x_n) ∈ {0,1}^n bit by bit via a channel limited to at most pn corruptions. The channel is online in the sense that at the ith step the channel decides whether to flip the ith bit or not, and its decision is based only on the bits transmitted so far, i.e., (x_1,...,x_i). This is in contrast to the classical adversarial channel, in which the corruption is chosen by a channel that has full knowledge of the sent codeword x. The best known lower bound on the capacity of both the online channel and the classical adversarial channel is the well-known Gilbert-Varshamov bound. In this paper we prove a lower bound on the capacity of the online channel which beats the Gilbert-Varshamov bound for any positive p such that H(2p) < 1/2 (where H is the binary entropy function). To do so, we prove that for any such p, a code chosen at random combined with the nearest neighbor decoder achieves with high probability a rate strictly higher than the Gilbert-Varshamov bound (for the online channel).

1 Introduction

The classical scenario in coding theory is that of a sender Alice who wants to transmit a message u to a receiver Bob via a binary communication channel. To do so, Alice encodes her message u into a codeword x = (x_1,...,x_n) ∈ {0,1}^n and sends it to Bob, who is expected to recover the message u. However, the channel is allowed to corrupt (possibly probabilistically) at most a p-fraction of the codeword, i.e., to flip at most pn bits in x, for some p ∈ [0,1]. The goal is to find a coding scheme by which Alice can send as many distinct messages as possible while ensuring correct decoding by Bob with high probability (over the encoding, decoding and the channel). Roughly speaking, we say that a code achieves rate R if 2^{Rn} distinct messages can be sent using codewords of length n.
Viewing the channel as a malicious jammer, it is important to specify what information the channel has while deciding on which bits to flip. Such a specification defines the model of communication and strongly affects the obtainable rate of communication.

In one extreme, there is the classical adversarial model, in which the channel has full knowledge of the entire transmitted codeword x. Given x and the coding scheme of Alice and Bob, the channel chooses an error for x. Calculating the maximum achievable rate for such a channel is a fundamental open problem in coding theory. The best known lower bound on the rate is due to Gilbert [8] and Varshamov [20] and equals 1 − H(2p), where H stands for the binary entropy function. Namely, Gilbert and Varshamov show that there exists a subset of {0,1}^n of size roughly 2^{(1−H(2p))n} in which every two distinct vectors have Hamming distance at least 2pn + 1. This implies that if we take the vectors in this set as codewords then a nearest neighbor decoder always recovers the correct sent codeword. On the other hand, the best known upper bound is due to McEliece et al. [15] and is strictly higher than the Gilbert-Varshamov bound for any p ∈ (0, 1/4).

In the second extreme, there are channel models in which the error imposed on the codeword x is completely independent of x. An example of such a channel is the well-known binary symmetric channel studied (among other channels) by Shannon [19]. In this channel every transmitted bit is flipped independently with probability p, no matter what the sent codeword is.

* The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Supported by the Adams Fellowship Program of the Israel Academy of Sciences and Humanities. Work done in part while at the Open University of Israel.
† The Computer Science Division, Open University of Israel, Raanana 43107, Israel. Email: mikel@openu.ac.il. Supported in part by ISF grant 480/08 and by the Open University of Israel's research fund (grant no. 46114).
As opposed to the classical adversarial model, the picture here is completely clear, since Shannon proved that 1 − H(p) is a tight lower and upper bound on the maximum achievable rate.

In this work we continue the line of research in [12, 6, 7] which studies the online channel model — a channel model whose strength lies somewhere between the above two extremes. In the online channel model, Alice sends a codeword x bit by bit over a binary communication channel. For each 1 ≤ i ≤ n the channel decides whether to flip the ith bit or not immediately after x_i arrives. This means that the channel's decision depends only on (x_1,...,x_i). As in the adversarial model, the channel is limited to corrupt at most pn of the bits. Roughly speaking, the online channel is stronger than the binary symmetric channel, as an online channel can mimic the random behavior of a binary symmetric channel. On the other hand, the online channel is weaker than the classical adversarial channel, as an online channel is limited to make its decisions in a causal manner. The main theme of this work is to better understand the strength of the online channel model — in particular, does the maximum achievable rate when communicating over online channels resemble that of the classical adversarial channel, that of the binary symmetric channel, or maybe neither?

Studying online adversarial channels is naturally motivated by practical settings in which the channel does not know the sent message in advance but rather learns it as it is transmitted. For example, the online channel model simulates a transmission of a codeword x via n uses of a channel over time, where at time i the ith bit of x is transmitted. At each step the channel decides whether to flip x_i, whereas the receiver waits until the end of the transmission before decoding. As in the classical adversarial channel model, the channel is limited to at most pn corruptions, which is usually interpreted as limited processing power or transmit energy.
From a theoretical point of view, understanding the online channel model and comparing it to the classical adversarial channel model might shed some light on the capacity of the classical adversarial channel, a long-standing open problem in coding theory.

1.1 Related Work

Let C_online(p) denote the capacity of the online channel, defined as the maximum achievable rate when communicating over an online channel allowed to corrupt at most a p-fraction of the transmitted codeword. We give a rigorous definition of the capacity C_online(p) in Section 2. The known bounds on the capacities of the classical adversarial channel and the binary symmetric channel immediately imply some bounds on the capacity of the online channel. It is clear that any coding scheme that works for the classical adversarial channel works also for the online channel, and hence C_online(p) ≥ 1 − H(2p). On the other hand, the online channel can flip every bit independently with probability p (up to pn of them) ignoring the transmitted codeword x. It is not hard to verify that this implies that Shannon's upper bound (for the binary symmetric channel) holds for the online channel model as well, that is, C_online(p) ≤ 1 − H(p). Recently, this upper bound was improved in [12] for any p ≥ 0.15642. More precisely, it was shown in [12] that for any p ≥ 1/4 no communication with positive rate is possible via the online channel and that for p < 1/4, C_online(p) ≤ 1 − 4p. This implies that the online channel model is strictly stronger than the binary symmetric channel, in the sense that there exist values of p (e.g., p = 1/4) for which no communication is possible over the online channel whereas a positive rate is possible for the binary symmetric channel. In [12] no non-trivial lower bounds on C_online(p) were presented. The state of the art on the online channel model is given below (see Figure 1).

Theorem 1.1 ([12]). For any p ∈ [0, 1/2], it holds that 1 − H(2p) ≤ C_online(p) ≤ min(1 − H(p), (1 − 4p)^+), where (1 − 4p)^+ is defined to be 1 − 4p for p < 1/4 and 0 otherwise.

The problem of coding against online channels over large alphabets was studied in [6], where a full characterization of the capacity is presented.
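The bounds of Theorem 1.1 are straightforward to evaluate numerically. The sketch below is our own illustration (the function names are ours, not the paper's): it computes the Gilbert-Varshamov lower bound 1 − H(2p) and the upper bound min(1 − H(p), (1 − 4p)^+), and checks that the two expressions inside the min cross near p ≈ 0.15642, the point from which the bound of [12] improves on Shannon's.

```python
import math

def H(x):
    # binary entropy function with base-2 logarithms, H(0) = H(1) = 0
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def gv_lower(p):
    # Gilbert-Varshamov lower bound 1 - H(2p), clipped at 0
    return max(0.0, 1 - H(2 * p))

def online_upper(p):
    # upper bound of Theorem 1.1: min(1 - H(p), (1 - 4p)^+)
    plus = 1 - 4 * p if p < 0.25 else 0.0
    return min(1 - H(p), plus)

# the lower bound never exceeds the upper bound
for p in (0.05, 0.10, 0.15642, 0.25):
    assert gv_lower(p) <= online_upper(p) + 1e-12

# near p = 0.15642 the two expressions inside the min coincide
assert abs((1 - H(0.15642)) - (1 - 4 * 0.15642)) < 1e-3
```

For p below the crossover the Shannon term 1 − H(p) is the binding upper bound; above it, the 1 − 4p term of [12] takes over, vanishing at p = 1/4.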
Namely, it is shown in [6] that when communicating over large alphabets, the online channel is no weaker than the classical adversarial channel and has capacity 1 − 2p for p < 1/2 and 0 otherwise. The proofs of the tight upper and lower bounds in [6] use the geometry that fields of large size enjoy, and it is not clear if these ideas can be extended to the binary case considered in our work.

[Figure 1: The bounds on the capacities of the classical adversarial channel and the online channel, plotted as a function of p: the curves 1 − 4p, the MRRW bound, 1 − H(p), and the Gilbert-Varshamov bound. The bold line (in purple) is the upper bound on the capacity of the online channel from [12].]

To the best of our knowledge, other than the works mentioned above, communication in the presence of an online channel has not been explicitly addressed in the literature. Nevertheless, we note that the model of online channels, being a natural one, has been "on the table" for several decades, and the analysis of the online channel model appears as an open question in the book of Csiszár and Körner [4] in the section addressing Arbitrarily Varying Channels (AVC) [2]. (The AVC model is a broad framework for modeling channels, which encapsulates our online model. For a nice survey on AVCs see [13].) In addition, various variants of online channels have been addressed in the past, for instance [2, 11, 17, 18, 16, 9]; however, the models considered therein differ significantly from ours.

1.2 Our Result

The Gilbert-Varshamov rate of 1 − H(2p) is the state of the art when communicating over classical adversarial channels. Whether one can improve upon this rate when communicating over online channels is an intriguing question. An affirmative answer would not only make progress in our understanding of the online channel model but also may hint at a possible separation between the online and classical adversarial channels. In our work we address this question and present a lower bound on the capacity of the online channel that beats the Gilbert-Varshamov bound.
More precisely, we prove that for any small enough p, the Gilbert-Varshamov lower bound is not tight for the online channel. This means that for any such p, there exists a coding scheme for the online channel with rate strictly higher than 1 − H(2p). This is the first lower bound for the online channel which is not known to hold for the classical adversarial model. Our result is stated below.

Theorem 1.2. For any p such that H(2p) ∈ (0, 1/2) there exists a δ_p > 0 such that

    C_online(p) ≥ 1 − H(2p) + δ_p.

Note that H(2p) ∈ (0, 1/2) for any p ∈ (0, (1/2)·H^{-1}(1/2)) ≈ (0, 0.055). We also note that our result holds with respect to the average error criterion (see Section 2.2 for a discussion on the error type). Finally, we remark that in order to prove Theorem 1.2 we show a lower bound for a much stronger channel model, which we refer to as the two-step model (defined below).

1.3 Techniques and Proof Overview

Our goal in this paper is to show the existence of an encoder and a decoder for the online channel by which Alice and Bob achieve some rate R strictly higher than 1 − H(2p), the rate achieved by the Gilbert-Varshamov bound. Instead of dealing directly with the online channel model we consider a stronger channel model, the two-step model, defined as follows. Denote α = R − ε for some small ε > 0. In the first step Alice sends the first αn bits of her encoded message and the channel (after viewing this transmitted information) decides which bits to flip out of these αn bits. In the second step Alice sends the rest of the codeword and the channel (now with full knowledge of the sent codeword) decides which bits to flip out of the remaining transmission. The number of bits corrupted in the two steps together is limited to be at most pn. Notice that this model is stronger than the online channel model in the sense that any code allowing communication over the two-step model will also allow communication over our model of online channels. Indeed, any adversarial strategy of the online channel model implies a valid strategy for the two-step model achieving the exact same parameters.
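The threshold p ≈ 0.055 quoted after Theorem 1.2 can be recovered numerically by inverting the entropy function. The bisection helper H_inv below is our own, not part of the paper; it exploits the fact that H is strictly increasing on [0, 1/2].

```python
import math

def H(x):
    # binary entropy function (base-2 logarithms)
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def H_inv(y, tol=1e-12):
    # inverse of H restricted to [0, 1/2], computed by bisection;
    # valid since H is strictly increasing on this interval
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# largest p with H(2p) < 1/2, i.e. (1/2) * H^{-1}(1/2)
threshold = 0.5 * H_inv(0.5)
print(round(threshold, 4))  # prints 0.055
```

So Theorem 1.2 beats the Gilbert-Varshamov bound on roughly the interval (0, 0.055), a strict sub-interval of the range (0, 1/4) on which the classical question is open.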
Therefore, in order to prove our lower bound on the capacity in Theorem 1.2 it suffices to consider the two-step model.

We turn to describe our construction of codes that allow communication over the two-step model with rate R greater than 1 − H(2p). We first note that no linear code will suffice. Roughly speaking, this follows from the fact that each codeword x in a linear code has exactly the same "neighborhood structure". Thus, when a linear code is used, the problems of communicating over channels with limited information regarding the codeword x and those with full information are equivalent.¹ We thus turn to study codes which are not linear. A natural candidate is a code in which the codewords are chosen completely at random and the decoder is the nearest neighbor decoder. More precisely, we pick a code C : [2^{Rn}] → {0,1}^n such that for every u ∈ [2^{Rn}] the codeword C(u) is independently and uniformly chosen from {0,1}^n. Given such a code, Bob outputs a message u' ∈ [2^{Rn}] that minimizes the Hamming distance between C(u') and the received corrupted vector.

In order to prove our theorem, we show that the decoding succeeds with high probability no matter how the adversarial online channel behaves. The intuitive idea is the following. In the first step Alice sends a prefix m ∈ {0,1}^{αn} of a codeword, where α = R − ε. Since the code C was constructed randomly, for a typical prefix m there are exponentially many (about 2^{εn}) codewords in C that share m as a prefix. This means that the channel is not able to recognize the sent codeword at this point, and therefore it has no good way to decide which bits of m to flip. Roughly speaking, we show that no matter which bits the adversary decides to flip in this first step, for most of the codewords that share m as a prefix the error imposed by the adversary is in a wrong direction and thus will not enable the adversary to cause a decoding error (after the additional corruption of the second step). In fact, as our analysis shows, for our codes C the best strategy for the adversary is actually to save its flipping power and to corrupt only in the second step of communication.
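The random-code construction and the nearest neighbor decoder described above can be sketched directly. This is a toy illustration with parameters of our choosing (n = 30, eight messages, three bit flips), not the asymptotic construction analyzed in the proof:

```python
import random

def random_code(M, n, rng):
    # M codewords chosen independently and uniformly from {0,1}^n;
    # M plays the role of 2^{Rn}
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(M)]

def dist_h(x, y):
    # Hamming distance between two binary tuples
    return sum(a != b for a, b in zip(x, y))

def nn_decode(code, w):
    # nearest neighbor decoder: the message u minimizing dist_H(C(u), w)
    return min(range(len(code)), key=lambda u: dist_h(code[u], w))

rng = random.Random(2024)
n, M, pn = 30, 8, 3
code = random_code(M, n, rng)

# transmit message u, let the channel flip pn arbitrary bits, then decode
u = 5
w = list(code[u])
for i in rng.sample(range(n), pn):
    w[i] ^= 1
decoded = nn_decode(code, tuple(w))
```

With these parameters the random codewords are pairwise far apart with overwhelming probability, so decoded typically recovers u; the paper's contribution is precisely to quantify when this succeeds against an adaptive, rather than oblivious, channel.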
This implies that in our setting the two-step channel will concentrate all its error on the second portion of the codeword! Comparing this state of affairs to the classical channel model, in which the error is spread out over the entire codeword, sheds light on the reason we are able to improve upon the Gilbert-Varshamov rate of 1 − H(2p). Very loosely speaking, to prove our improved rate, we first show that a code C constructed at random is expected to allow successful communication. However, as the events corresponding to correct decoding are not independent of each other, our proof for the existence of the desired code follows a rather delicate analysis.

Our analysis holds for the two-step model and thus suffices to prove Theorem 1.2. To improve upon the results of Theorem 1.2, it is natural to try to generalize our analysis to a channel model that includes more than two steps. At its extreme (the n-step model) we obtain our original online channel. Such a generalized analysis is left open in this work and seemingly cannot be addressed by the current proof techniques.

¹ In detail, for any linear code of (minimum) distance at most 2pn there exists an online channel that causes any decoder to err with probability at least 1/2 for every sent message. To see this, assume that x and y are two codewords of distance at most 2pn, and let z be a vector of distance at most pn from both x and y. Now, consider a channel that maps any codeword w to w + (z − y) or to w + (z − x) with probability 1/2 each. Observe that this is an online channel that causes any decoder to err with probability at least 1/2.

In the following Section 2 we set the notation and definitions used throughout our work. We then turn to prove Theorem 1.2 in Section 3.

2 Preliminaries

2.1 Notations and Standard Definitions

For k ∈ N we denote [k] = {i ∈ N | 1 ≤ i ≤ k}. For a vector x = (x_1,...,x_n) ∈ {0,1}^n and a number 1 ≤ k ≤ n we denote by x|_{[k]} the projection of x on its first k entries, i.e., x|_{[k]} = (x_1,...,x_k).
The Hamming weight of a binary vector is the number of its 1-entries, and the Hamming distance between x ∈ {0,1}^n and y ∈ {0,1}^n, denoted by dist_H(x,y), is the Hamming weight of x + y, where the addition is modulo 2 and coordinate-wise.

For two functions f, g : N → R, we say that f and g are polynomially equivalent and write f ∼ g if there are constants c_1, c_2 such that n^{−c_1}·f(n) ≤ g(n) ≤ n^{c_2}·f(n) for all large enough n ∈ N. Similarly, we write f ≲ g if there is a constant c such that f(n) ≤ n^c·g(n) for all large enough n ∈ N.

The binary entropy function H : [0,1] → [0,1] is defined by H(0) = H(1) = 0 and H(p) = −p log p − (1−p) log(1−p) for p ∈ (0,1), where the logarithms, here and everywhere in this paper, are of base 2. It is well-known and easy to verify that for any c ∈ (0, 1/2], binom(n, cn) ∼ 2^{H(c)n}. We need the following two simple facts regarding H. Notice that the first fact implies the second (by setting the parameters of Fact 2.1 to be x = 0, y = 1/2, and θ = 1 − 4p).

Fact 2.1. The entropy function H is strictly concave, that is, for any θ ∈ (0,1) and x, y ∈ [0,1] it holds that θ·H(x) + (1−θ)·H(y) ≤ H(θ·x + (1−θ)·y), and equality holds if and only if x = y.

Fact 2.2. For any p ∈ (0, 1/4), 4p < H(2p).

We need the following version of the Chernoff-Hoeffding bound [10, 14] (addressing random variables which are not necessarily indicator variables).

Theorem 2.3 (Chernoff-Hoeffding). Let X_1, X_2, ..., X_N be independent and identically distributed random variables taking values in the unit interval [0,1] with expectation at most µ. Then,

    Pr[ Σ_{i=1}^N X_i ≥ 2µN ] ≤ e^{−Θ(µN)}.

2.2 The Online Channel Model and the Two-step Model

For R > 0, an (n, Rn)-code C is a mapping C : [2^{Rn}] → {0,1}^n. The elements of the image of C are called codewords. Define α = R − ε for some ε > 0 and let m ∈ {0,1}^{αn} be some prefix. Here and throughout our work we ignore rounding issues and assume that αn, Rn and other such expressions are integers.
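The entropy estimates of Section 2.1 (Fact 2.2 and the approximation binom(n, cn) ∼ 2^{H(c)n}) can be checked numerically. This is an illustrative sketch of ours, not part of the paper's argument; the tolerance log2(n)/n reflects the polynomial slack hidden in the ∼ notation.

```python
import math

def H(x):
    # binary entropy function (base-2 logarithms), H(0) = H(1) = 0
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

# binom(n, cn) ~ 2^{H(c)n}: the per-symbol exponents agree up to O(log n / n)
n, c = 400, 0.25
exact_exponent = math.log2(math.comb(n, int(c * n))) / n
assert abs(exact_exponent - H(c)) < math.log2(n) / n

# Fact 2.2: 4p < H(2p) on a grid of p in (0, 1/4)
for k in range(1, 250):
    p = k / 1000
    assert 4 * p < H(2 * p)
```

The strict inequality of Fact 2.2 degenerates only at the endpoints: at p = 0 both sides vanish, and as p → 1/4 both sides tend to 1.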
We denote by C^m the set of all messages whose codewords have m as a prefix, i.e., C^m = {u ∈ [2^{Rn}] | C(u)|_{[αn]} = m}, and by C̄^m the set of all messages whose codewords do not have m as a prefix, i.e., C̄^m = [2^{Rn}] \ C^m. A random code is a mapping C : [2^{Rn}] → {0,1}^n such that for every u ∈ [2^{Rn}] the codeword C(u) is independently and uniformly chosen from {0,1}^n. Notice that we use C to denote a fixed code and C to denote a code which forms a random variable.

Consider a code C. Throughout this work, we consider the average error criterion while communicating over the online channel model. Namely, Alice's message u is considered as uniformly distributed over [2^{Rn}]. Given the message u, Alice deterministically maps u to the codeword C(u) = (x_1,...,x_n) ∈ {0,1}^n and transmits it over the communication channel. For every i ∈ [n] the decision of the channel whether to flip x_i or not depends only on (x_1,...,x_i). In addition, the channel is limited to at most pn corruptions. Bob's goal is to recover u from his received vector.

The probability of error of C is defined as the average over all u ∈ [2^{Rn}] of the probability of error for the message u, i.e., the probability that the message that Bob decodes differs from the message u encoded by Alice. Here, the probability is taken over the random variables of the channel and of Bob. We say that the rate R is achievable if for every ε > 0, δ > 0 and every sufficiently large n there exists an (n, (R−δ)n)-code that allows communication with (average) probability of error at most ε. The supremum over n of the achievable rates is called the capacity of the online channel and is denoted by C_online(p). We note that the discussion in the introduction regarding the known bounds on the capacity of both the binary symmetric channel and the classical adversarial channel holds for average error (see, e.g., [3]).

One may also consider a definition for capacity which takes into account the maximum error over messages u and not the average error.
In this maximum error (or worst case) setting, if the encoding function of Alice is considered to be deterministic, it is straightforward to verify that online channels have no advantage over the classical adversarial channel. This is no longer the case when one allows randomization in Alice's encoding process (referred to as stochastic encoders). As is common in the study of Arbitrarily Varying Channels (e.g., [5]), there is an equivalence between the capacity when considering the models of (a) deterministic encoders and the average error criterion and (b) stochastic encoders and the maximum error criterion. This equivalence holds also for the online channel model studied in this work.

As mentioned before, for our lower bound we consider a two-step model defined for a parameter α = R − ε where ε > 0 is some small constant. In the first step, Alice sends the first αn bits of the encoded message and the channel decides which bits to flip out of these αn bits. In the second step, Alice sends the rest of the codeword and the channel decides which bits to flip out of the remaining (1−α)n bits. The number of bits corrupted in the two steps together is limited to be at most pn. In each step, the decisions made by the channel are based on the information transmitted in and before the step at hand. The notion of (average error) capacity is defined as done above. As explained in the introduction, any lower bound on the capacity of the two-step model holds also for the online channel model.

3 Proof of Theorem 1.2

Before presenting the proof of our lower bound for the online channel model, let us start with a short comparison to the Gilbert-Varshamov lower bound that holds for the classical adversarial model. One way to prove the Gilbert-Varshamov bound is to show that a code C : [2^{Rn}] → {0,1}^n chosen at random combined with the nearest neighbor decoder implies a coding scheme of rate almost 1 − H(2p) with high probability. Roughly speaking, the achievable rate in this argument is affected by the number of codewords x that are far away from any other codeword in C.
Namely, one is interested in proving that there are many codewords x for which the ball of radius 2pn centered at x includes no codewords except x. Indeed, such a transmitted codeword x will be decoded correctly by a nearest neighbor decoder no matter which error is imposed by the channel. As the volume of this ball is Σ_{i=0}^{2pn} binom(n, i) ∼ 2^{H(2p)n}, the rate essentially follows.

Recall that for our lower bound on the capacity of the online channel we consider the two-step model. In the first step Alice sends a prefix m of length αn and the channel chooses which bits to flip out of these αn bits, and in the second step Alice sends the remaining (1−α)n bits and the channel again chooses which bits to flip out of the remaining part of the codeword. Let us now study the required "forbidden ball" corresponding to a codeword x in the two-step model. To take advantage of the two-step model, consider fixing an error pattern e imposed on the first portion of x. Let B(x,e) be the subset of {0,1}^n that satisfies the following property: if the codeword x was transmitted, the error pattern e was imposed on the first portion of x in the first step, and there are no codewords other than x in B(x,e), then no matter what the channel does in the second step the decoding of Bob will succeed. We define B(x,e) (denoted as B(z) for z = x + e) rigorously and analyze its size in the upcoming section. Specifically, we show that the size of B(z) is exponentially smaller than 2^{H(2p)n}. This fact is a core ingredient in our proof. Combining it with several additional ideas leads to our improved lower bound.

We now turn to present the proof of Theorem 1.2. In Section 3.1 we formally define the "forbidden ball" B(z) described above and analyze its size. In Sections 3.2 and 3.3 we prove our theorem by showing that with high probability over the codeword x chosen by Alice, the decoding is successful. Namely, that Bob decodes a codeword x' which is equal to the transmitted codeword x.
In Section 3.2 we analyze the probability (over x) that Bob decodes an incorrect codeword x' such that x and x' differ in their first αn bits. In Section 3.3 we address x and x' which agree on their first αn bits. Finally, in Section 3.4 we prove Theorem 1.2.

3.1 The "Forbidden Ball" B_α^{(p,q)}(z)

Consider a situation in which Alice transmits a codeword x. Namely, in the first step, Alice sends the first αn bits of x and the channel flips qn of them for some q ∈ [0, min(p, α)]. Let e_1 ∈ {0,1}^{αn} × {0}^{(1−α)n} be the vector of Hamming weight qn that represents the channel's corruptions in the first step, and let z = x + e_1 be the (partially) corrupted codeword after the first step. In the second step Alice sends the remaining (1−α)n bits of x. Since the channel is limited to a total number of pn corruptions, at most (p−q)n of the bits can be flipped in this step. Let e_2 ∈ {0}^{αn} × {0,1}^{(1−α)n} be the vector of Hamming weight at most (p−q)n that represents the channel's corruptions in the second step, and let w = z + e_2 = x + e_1 + e_2 be the corrupted codeword received by Bob.

Conditioning on the first step, namely on the value of z, we are interested in counting the vectors that the channel (in its second step) may force Bob to consider in his nearest neighbor decoding. These are all the vectors y ∈ {0,1}^n for which there exists a vector w ∈ {0,1}^n such that

• w is of distance at most pn from y, and

• w and z agree on the first αn bits and the distance between them is at most (p−q)n.

Notice that the second item follows from the fact that our channel can only corrupt bits in the (1−α)n suffix of z in the second step. We define

    B_α^{(p,q)}(z) = {y ∈ {0,1}^n | ∃ w ∈ {0,1}^n s.t. dist_H(w,y) ≤ pn, z|_{[αn]} = w|_{[αn]}, dist_H(z,w) ≤ (p−q)n}.

It is not hard to verify that (a) the original transmitted codeword is in B_α^{(p,q)}(z), and (b) if this is the only codeword in B_α^{(p,q)}(z) then Bob will decode successfully. It is also not hard to verify that the size of B_α^{(p,q)}(z) does not depend on z and therefore we can denote B_α^{(p,q)} = |B_α^{(p,q)}(z)| for any z ∈ {0,1}^n. The following claim bounds B_α^{(p,q)} and is proven in the appendix.
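For intuition, the definition of the forbidden ball can be checked by brute force at a toy block length. The parameter values n = 8, αn = 4, pn = 2, qn = 1 below are ours, chosen only to keep the enumeration small:

```python
from itertools import product

def forbidden_ball(z, alpha_n, pn, qn):
    # brute-force B_alpha^{(p,q)}(z): all y for which some w agrees with z
    # on the first alpha_n bits, differs from z in at most (p-q)n of the
    # remaining bits, and satisfies dist_H(w, y) <= pn
    n = len(z)
    d = lambda a, b: sum(x != y for x, y in zip(a, b))
    ws = [w for w in product((0, 1), repeat=n)
          if w[:alpha_n] == z[:alpha_n] and d(w, z) <= pn - qn]
    return {y for y in product((0, 1), repeat=n)
            if any(d(w, y) <= pn for w in ws)}

z0 = (0,) * 8
z1 = (1, 0, 1, 1, 0, 0, 1, 0)
B0 = forbidden_ball(z0, 4, 2, 1)
B1 = forbidden_ball(z1, 4, 2, 1)
assert len(B0) == len(B1)  # |B| does not depend on z
assert z0 in B0            # the transmitted word lies in its own ball
```

Translation invariance is the reason the size is independent of z: XOR-ing every vector by a fixed shift preserves all three defining conditions, mapping B(z) bijectively onto B(z').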
Claim 3.1. For any 0 < p < (1/2)·H^{-1}(1/2) there exists an η > 0 such that for any 1 − H(2p) ≤ α ≤ 1 − 2p and q ∈ [0, p] it holds that B_α^{(p,q)} ≤ 2^{(H(2p)−η)n}.

3.2 Errors Caused by Codewords with Distinct Prefixes

Let C : [2^{Rn}] → {0,1}^n be a code chosen at random and let x ∈ {0,1}^n be a codeword sent by Alice. Consider the setting in which Alice, in the first step, sends the prefix m = x|_{[αn]} and the channel corrupts qn of its bits for some q ∈ [0, min(p, α)]. Let e ∈ {0,1}^{αn} × {0}^{(1−α)n} be the vector of Hamming weight qn that represents the channel's corruptions in the first step. In the second step Alice sends the last (1−α)n bits of x and the channel is allowed to flip at most (p−q)n of these bits. After the first step, the set of vectors that are of Hamming distance at most pn from a vector that the channel can cause Bob to receive is exactly B_α^{(p,q)}(x + e). Therefore, if a nearest neighbor decoder fails then there must be another codeword of C (in addition to x) in B_α^{(p,q)}(x + e). In this section we study the probability that B_α^{(p,q)}(x + e) contains a codeword with a prefix that differs from m, and show that it is small no matter what m or e are. Here, the probability is taken over the random construction of C.

In general, it is not hard to verify that in expectation, indeed a random code C will ensure an exponentially decaying decoding error in the case under study (here, the expectation is over the code construction and the error is over the messages of Alice). However, as the events corresponding to correct decoding are not independent of each other, our proof includes a rather delicate analysis. Our proof in this section consists of two parts. In the first part, we identify a certain property of codes C, and prove that it holds with very high probability. This property is then used in the second part of our proof, and enables us to cope with the dependencies mentioned above. We start by defining our needed property of C.
A code is considered good with respect to the pair (m, e) if it has the following two properties: (a) the number of codewords with prefix m is close to its expectation and, in addition, (b) the number of codewords that do not start with m but alternatively may cause a decoding error on the transmission of a word that does start with m is not much larger than the expectation. This notion is formally defined below. We then show that for every m and e a code C chosen at random is good with respect to (m, e) with high probability. Recall the definitions of C^m and C̄^m from Section 2.2.

Definition 3.2. For a natural number n, p > 0, R > 0, ε > 0, α = R − ε, m ∈ {0,1}^{αn} and e ∈ {0,1}^{αn} × {0}^{(1−α)n} of Hamming weight qn for q ∈ [0, min(p, α)], we say that a code C : [2^{Rn}] → {0,1}^n is good with respect to the pair (m, e) if

1. 2^{εn−1} ≤ |C^m| ≤ 2^{εn+1}, and

2. Σ_{x ∈ Z_m} |{u ∈ C̄^m | C(u) ∈ B_α^{(p,q)}(x + e)}| ≤ B_α^{(p,q)} · 2^{εn+2},

where Z_m is the set of all vectors in {0,1}^n with m as a prefix, i.e., Z_m = {z ∈ {0,1}^n | z|_{[αn]} = m}.

A remark regarding Item (2) of Definition 3.2 is in place. In general, Item (2) estimates the number of codewords in C̄^m that happen to be included in "forbidden balls" of type B_α^{(p,q)}(x + e) for vectors x ∈ Z_m (namely, x|_{[αn]} = m). Later in our proof, we will think of x as a randomly chosen codeword with prefix m, and the l.h.s. of Item (2) will correspond to the expected number of codewords in its "forbidden ball".

Lemma 3.3. For every large enough n, p > 0, R > 0, ε > 0, α = R − ε, a prefix m ∈ {0,1}^{αn} and e ∈ {0,1}^{αn} × {0}^{(1−α)n} of Hamming weight at most pn, the probability that a code C : [2^{Rn}] → {0,1}^n chosen at random is good with respect to (m, e) is at least 1 − e^{−2^{Ω(n)}}.

Proof: Fix a pair (m, e) and assume that the Hamming weight of e is qn for q ∈ [0, p]. For every u ∈ [2^{Rn}] denote by X_u the indicator random variable defined to be 1 if u ∈ C^m and 0 otherwise. Notice that the X_u's are independent and identically distributed and that |C^m| = Σ_{u ∈ [2^{Rn}]} X_u. Also, E[X_u] = Pr[X_u = 1] = 1/2^{αn}, and linearity of expectation implies that E[|C^m|] = 2^{Rn} · (1/2^{αn}) = 2^{εn}.
Applying the standard Chernoff bound (see, e.g., [1], Appendix A) we get that Item (1) of Definition 3.2 holds with probability at least 1 − e^{−2^{Ω(n)}}.

Now, given that (1) holds, we will show that the probability that (2) holds is 1 − e^{−2^{Ω(n)}}. This will imply that with such probability both (1) and (2) hold, as follows from Pr[(1) ∧ (2)] = Pr[(1)] · Pr[(2) | (1)].

Since the summands in Item (2) of Definition 3.2 are not independent, we cannot directly apply the Chernoff-Hoeffding bound. To overcome this issue, we express the summation in (2) as another summation of independent random variables. Details follow. Recall that Z_m stands for the set of all vectors in {0,1}^n with m as a prefix. Define for every u ∈ C̄^m the random variable

    Y_u = |{x ∈ Z_m | C(u) ∈ B_α^{(p,q)}(x + e)}|.

Namely, Y_u counts the number of balls B_α^{(p,q)}(x + e) (with x ∈ Z_m) which include C(u). Denote Y = Σ_{u ∈ C̄^m} Y_u and observe that Y equals the sum from (2). Observe that the Y_u's are independent and, moreover, they are independent even when conditioning on the size of the set C^m. Given that u ∈ C̄^m, for every x ∈ Z_m the probability that C(u) ∈ B_α^{(p,q)}(x + e) is at most B_α^{(p,q)} / (2^n − 2^{(1−α)n}). Hence,

    E[Y_u] ≤ |Z_m| · B_α^{(p,q)} / (2^n − 2^{(1−α)n}) ≤ 2 · 2^{(1−α)n} · B_α^{(p,q)} / 2^n = B_α^{(p,q)} / 2^{αn−1}.

Notice that for every u ∈ C̄^m we have Y_u ≤ B_α^{(p,q)}, and define Y'_u = Y_u / B_α^{(p,q)} ∈ [0,1] and Y' = Y / B_α^{(p,q)}. For any k ∈ [2^{εn−1}, 2^{εn+1}] use the Chernoff-Hoeffding bound (Theorem 2.3) to obtain

    Pr[ Y ≥ B_α^{(p,q)} · 2^{εn+2} | |C^m| = k ] ≤ Pr[ Y ≥ 4B_α^{(p,q)}(2^{Rn} − k) / 2^{αn} | |C^m| = k ]
                                               = Pr[ Y' ≥ 4(2^{Rn} − k) / 2^{αn} | |C^m| = k ] = e^{−Ω(2^{εn})}.

Finally, for a large enough n we obtain

    Pr[(2) | (1)] = 1 − Σ_{k ∈ [2^{εn−1}, 2^{εn+1}]} Pr[ Y ≥ B_α^{(p,q)} · 2^{εn+2} | |C^m| = k ∧ (1) ] · Pr[ |C^m| = k | (1) ]
                  ≥ 1 − e^{−Ω(2^{εn})} · Σ_{k ∈ [2^{εn−1}, 2^{εn+1}]} Pr[ |C^m| = k | (1) ] = 1 − e^{−Ω(2^{εn})}.
We now turn to the second part of our proof. Let m be a prefix of a codeword sent by Alice and let e be the vector that represents the corruptions made by the channel in the first step. Consider a fixed choice of the codewords in C which do not have m as a prefix (i.e., C|_{C̄^m}). The following lemma shows that the number of messages in C^m for which the channel may cause a decoding error due to messages in C̄^m is small with high probability. The probability here is over the choice of the codewords that start with m (since C|_{C̄^m} is fixed).

For any u ∈ C^m define T_u to be the number of codewords of messages from C̄^m in the "forbidden ball" corresponding to u. Namely, T_u = |{u' ∈ C̄^m | C(u') ∈ B_α^{(p,q)}(C(u) + e)}|. Let P_u be an indicator random variable defined to be 1 if T_u ≥ 1 and 0 otherwise. Finally, we let P^{(m,e)} denote the number of codewords with prefix m whose corresponding "forbidden balls" contain codewords associated with elements from C̄^m. Formally, P^{(m,e)} = Σ_{u ∈ C^m} P_u. We stress that messages u with P_u = 1 are considered as messages for which the channel may cause a decoding error. Thus our objective is to show that P^{(m,e)} is small.

Lemma 3.4. For every 0 < p < (1/2)·H^{-1}(1/2) there exists a δ_p > 0 such that for ε ≤ δ ≤ δ_p, R = 1 − H(2p) + δ and α = R − ε the following holds for any sufficiently large n. For every prefix m ∈ {0,1}^{αn}, e ∈ {0,1}^{αn} × {0}^{(1−α)n} of Hamming weight at most pn, a fixed set of messages C^m and a fixed restriction C̃ of C to C̄^m,

    Pr[ P^{(m,e)} < 2^{εn/2} | C|_{C̄^m} = C̃ ∧ C is good with respect to (m,e) ] ≥ 1 − e^{−2^{Ω(n)}}.

Here, the probability is taken over the random construction of C.

Proof: For 0 < p < (1/2)·H^{-1}(1/2) take δ_p = min((4/7)·η, H(2p) − 2p), where η is the constant whose existence is guaranteed in Claim 3.1. Notice that δ_p > 0 since H(2p) > 2p, as follows from Fact 2.2.

Fix a pair (m, e) and assume that the Hamming weight of e is qn for q ∈ [0, p]. Denote by G^{(m,e)} the event that C is good with respect to (m, e).
Conditioning on C|_{C̄^m} = C̃ and on G^{(m,e)}, every C(u) for u ∈ C^m is independently and uniformly distributed over the vectors in {0,1}^n that start with m, and in particular the P_u's are independent. Since C satisfies Item (2) of Definition 3.2 we get that for every u ∈ C^m,

    E[ P_u | C|_{C̄^m} = C̃ ∧ G^{(m,e)} ] ≤ E[ T_u | C|_{C̄^m} = C̃ ∧ G^{(m,e)} ] ≤ B_α^{(p,q)} · 2^{εn+2} / 2^{(1−α)n} = 4B_α^{(p,q)} / 2^{(1−R)n}.

Notice that our choice of δ_p implies that 1 − H(2p) ≤ R − ε = α ≤ R ≤ 1 − 2p, and hence B_α^{(p,q)} ≤ 2^{(H(2p)−η)n} by Claim 3.1. Since C satisfies Item (1) of Definition 3.2 we obtain that

    E[ P^{(m,e)} | C|_{C̄^m} = C̃ ∧ G^{(m,e)} ] ≤ |C^m| · 4B_α^{(p,q)} / 2^{(1−R)n} ≤ 8 · 2^{(ε+H(2p)−η)n} / 2^{(1−R)n}
                                              = 8 · 2^{(δ+ε−η)n} ≤ 8 · 2^{(ε/4 + (7/4)δ_p − η)n} ≤ 8 · 2^{εn/4}.

For a sufficiently large n, applying the Chernoff-Hoeffding bound (Theorem 2.3) yields

    Pr[ P^{(m,e)} ≥ 2^{εn/2} | C|_{C̄^m} = C̃ ∧ G^{(m,e)} ] ≤ Pr[ P^{(m,e)} ≥ 16 · 2^{εn/4} | C|_{C̄^m} = C̃ ∧ G^{(m,e)} ] ≤ e^{−2^{Ω(n)}},

as desired.

Combining Lemmas 3.3 and 3.4 we get the following corollary.

Corollary 3.5. For every 0 < p < (1/2)·H^{-1}(1/2) there exists a δ_p > 0 such that for ε ≤ δ ≤ δ_p, R = 1 − H(2p) + δ and α = R − ε the following holds for any sufficiently large n. The probability that a code C : [2^{Rn}] → {0,1}^n chosen at random satisfies that for every prefix m ∈ {0,1}^{αn} and e ∈ {0,1}^{αn} × {0}^{(1−α)n} of Hamming weight at most pn, C is good with respect to (m, e) and P^{(m,e)} < 2^{εn/2}, is at least 1 − e^{−2^{Ω(n)}}.

Proof: Let (m, e) be a fixed pair and denote by G^{(m,e)} the event that C is good with respect to (m, e). In the following C̃ denotes a restriction of C to C̄^m.
We have

    Pr[ P^{(m,e)} < 2^{εn/2} ∧ G^{(m,e)} ] = Σ_{C̃} Pr[ P^{(m,e)} < 2^{εn/2} | C|_{C̄^m} = C̃ ∧ G^{(m,e)} ] · Pr[ C|_{C̄^m} = C̃ ∧ G^{(m,e)} ]
                                          ≥ (1 − e^{−2^{Ω(n)}}) · Σ_{C̃} Pr[ C|_{C̄^m} = C̃ ∧ G^{(m,e)} ]
                                          = (1 − e^{−2^{Ω(n)}}) · Pr[ G^{(m,e)} ] ≥ (1 − e^{−2^{Ω(n)}}) · (1 − e^{−2^{Ω(n)}}) ≥ 1 − e^{−2^{Ω(n)}},

where the first and the second inequalities follow, respectively, from Lemmas 3.4 and 3.3. Taking the union bound over all the possible pairs (m, e) completes the proof.

3.3 Errors Caused by Codewords with the Same Prefix

In this section we consider decoding errors caused by codewords in C that have a prefix (of length αn) identical to the prefix of the transmitted codeword. Namely, we consider the scenario in which Alice sends a codeword x, Bob gets the corrupted vector y, and the message that Bob outputs corresponds to a codeword that differs from x but shares the prefix x|_{[αn]}. A way to handle such errors is to verify that for every prefix m, our code C does not include (many) pairs of codewords that share m as a prefix and are close together, namely of Hamming distance at most 2pn. This is the type of analysis that actually corresponds to the classical adversarial channel, and it can be used here as we are considering a special case of decoding errors.

The following lemma says that a code C : [2^{Rn}] → {0,1}^n chosen at random with R < 1 − 4p has only a few pairs of codewords that share a prefix and have Hamming distance at most 2pn.

Lemma 3.6. For every p ∈ [0, 1/4), R < 1 − 4p, a sufficiently small ε > 0 and α = R − ε there exists a γ > 0 for which the following holds for any sufficiently large n. The probability that a code C : [2^{Rn}] → {0,1}^n chosen at random satisfies that

1. for every m ∈ {0,1}^{αn}, 2^{εn−1} ≤ |C^m| ≤ 2^{εn+1}, and

2. for every m ∈ {0,1}^{αn}, besides at most 2^{(α−γ)n} of them, there exists a set X_m ⊆ C^m of size |X_m| < 2^{(ε−γ)n} such that every distinct u_1, u_2 ∈ C^m \ X_m satisfy dist_H(C(u_1), C(u_2)) > 2pn,

is at least 1 − e^{−2^{Ω(n)}}.

In order to prove Lemma 3.6 we need the following (known) claim which shows that with high probability a random code almost achieves the Gilbert-Varshamov bound.
We include its proof for completeness.