Table Of Content

Channels with Synchronization/Substitution Errors and Computation of Error Control Codes Stavros Konstantinidis Nelma Moreira and Roge´rio Reis Department of Mathematics and Computing Science CMUP & DCC Saint Mary’s University Faculdade de Cieˆncias da Universidade do Porto Halifax, Nova Scotia, Canada Rua do Campo Alegre, 4169007 Porto Portugal Email: [email protected] Email: {nam,rvr}@dcc.fc.up.pt 6 (research supported by NSERC) (research supported by FCT project UID/MAT/00144/2013) 1 0 2 Abstract—We introduce the concept of an f-maximal error- inputtoalgorithms.InSectionIII,wepresenttworandomized l detecting block code, for some parameter f between 0 and 1, in algorithms: the first one decides (up to a certain degree of u order to formalize the situation where a block code is close to confidence) whether a given block code C is maximal er- J maximal with respect to being error-detecting. Our motivation detectingforagivenchanneler.Thesecondalgorithmisgiven 9 for this is that constructing a maximal error-detecting code achanneler,aner-detectingblockcodeC ⊆Aℓ (whichcould 2 is a computationally hard problem. We present a randomized be empty), and integer N > 0, and attempts to add to C N algorithm that takes as input two positive integers N,ℓ, a new words of length ℓ resulting into a new er-detecting code. ] probability value f, and a specification of the errors permitted T If less than N words get added then either the new code is in some application, and generates an error-detecting, or error- I correcting, block code having up to N codewords of length ℓ. If 95%-maximalor the chance thata randomlychosenword can s. thealgorithmfindslessthanN codewords,thenthosecodewords be added is less than 5%. Our motivation for considering a c constitute a code that is f-maximal with high probability. The randomized algorithm is that embedding a given er-detecting [ error specification (also called channel) is modelled as a trans- block code C into a maximal er-detecting block code is a 2 ducer, which allows one to model any rational combination of computationally hard problem—this is shown in Section IV. v substitution and synchronization errors. We also present some In Section V, we discuss briefly some capabilities of the 2 elements of our implementation of various error-detecting prop- new module codes.py in the open source software package erties and their associated methods. Then, we show several tests 1 FAdo [1], [4] and we discuss some tests of the randomized of the implemented randomized algorithm on various channels. 3 algorithms on various channels. In Section VI, we discuss Amethodologicalcontributionisthepresentationofhowvarious 6 a few more points on channel modelling and conclude with desirable error combinations can be expressed formally and 0 processed algorithmically. directions for future research. . 1 We note that, while there are various algorithms for com- 0 I. INTRODUCTION puting error-control codes, to our knowledge these work for 6 1 We consider block codes C, that is, sets of words of the specific channels and implementations are generally not open : same length ℓ, for some integer ℓ>0. The elements of C are source. v i calledcodewordsorC-words.WeuseAtodenotethealphabet X used for making words and II. CHANNELS AND ERROR CONTROL CODES r a Aℓ =the set of all words of length ℓ. We need a mathematical model for channels that is useful for answering algorithmic questions pertaining to error Our typical alphabet will be the binary one {0,1}. We shall control codes. While many models of channels and codes use the variables u,v,w,x,y,z to denote words over A (not for substitution-type errors use a rich set of mathematical necessarily in C). The empty word is denoted by ε. structures, this is not the case for channels involving syn- We also consider error specifications er, which we call chronizationerrors[13]. We believethe appropriatemodelfor combinatorial channels, or simply channels. A channel er our purposes is that of a transducer. We note that transducers specifies, for each allowed input word x, the set er(x) of all have been defined as early as in [18], and are a powerful computational tool for processing sets of words—see [2] and possible outputwords. We assume that error-freecommunica- tion is always possible, so x ∈ er(x). On the other hand, if pg 41–110 of [17]. y ∈ er(x) and y 6= x then the channel introduces errors into Definition 1. A transducer is a 5-tuple1 t = (S,A,I,T,F) x. Informally, a block code C is er-detecting if the channel such that A is the alphabet, S is the finite set of states, I ⊆S cannot turn a given C-word into a different C-word. It is er- is the set of initial states, F ⊆S is the set of final states, and correcting if the channel cannot turn two different C-words T is the finite set of transitions. Each transition is a 4-tuple into the same word. (s ,x /y ,t ), where s ,t ∈ S and x ,y are words over A. i i i i i i i i In Section II, we make the aboveconceptsmathematically 1The general definition of transducer allows two alphabets: the input and precise, and show how known examples of combinatorial the output alphabet. Here, however, we assume that both alphabets are the channels can be defined formally so that they can be used as same. The word xi is the input label and the word yi is the output 0/0,1/1 0/0,1/1 0/0,1/1 label of the transition. For two words x,y we write y ∈t(x) to mean that y is a possible output of t when x is used as 0/1 0/1 input. More precisely, there is a sequence sub2= s t1 t2 1/0 1/0 (s0,x1/y1,s1), (s1,x2/y2,s2),..., (sn−1,xn/yn,sn) a/a a/a a/a of transitions such that s0 ∈ I, sn ∈ F, x = x1···xn and y = y1···yn. The relation R(t) realized by t is the set of word pairs (x,y) such that y ∈t(x). A relation ρ⊆A∗×A∗ a/ε a/ε iscalledrationalifitisrealizedbyatransducer.Ifeveryinput id2= s t1 t2 ε/a ε/a and every output label of t is in A∪{ε}, then we say that t is in standard form. The domain of the transducer t is the a/a a/a set of words x such that t(x) 6= ∅. The transducer is called input-preserving if x ∈ t(x), for all words x in the domain of t. The inverse of t, denoted by t−1, is the transducer that a/ε ε/a is simply obtained by making a copy of t and changing each del1= s r t transition (s,x/y,t) to (s,y/x,t). Then x∈t−1(y) if and only if y ∈t(x). a/a a/a We note that every transducer can be converted (in linear ε/a a/ε ins1= s r t time) to one in standard form realizing the same relation. In our objective to model channels er as transducers, we require that a transducerer is a channelif it allows error-free Figure 1. Examples of (combinatorial) channels: sub2,id2,del1,ins1. communication, that is, er is input-preserving. Notation: A short arrow with no label points to an initial state (e.g., state s), and a double line indicates a final state (e.g., state t). An arrow with Definition 2. An error specification er is an input-preserving label a/a represents multiple transitions, each with label a/a, for a ∈ A; transducer. The (combinatorial) channel specified by er is and similarly for an arrow with label a/ε—recall, ε = empty word. Two or R(er), thatis, the relation realizedby er. For the purposesof morelabelsononearrowfromsomestateptosomestateqrepresentmultiple transitionsbetweenpandqhavingtheselabels.Channel sub2:usesthebinary thispaper,however,wesimplyidentifytheconceptofchannel alphabet {0,1}. On input x, sub2 outputs x, or any word that results by with that of error specification. performingoneortwosubstitutions inx.Thelattercaseiswhensub2 takes the transition (s,0/1,t1) or (s,1/0,t1), corresponding to one error, and A piece of notation that is useful in this work is the then possibly (t1,0/1,t2)or(t1,1/0,t2), corresponding toasecond error. A block code C is sub2-detecting iff the min. Hamming distance of C is following, where W is any set of words, >2.Channelid2:alphabetAnotspecified.Oninputx,id2outputsaword that results by inserting and/or deleting at most 2 symbols in x. A block er(W) = [ er(w) (1) code C is id2-detecting iff the min. Levenshtein distance of C is > 2 [8]. w∈W Channels del1,ins1:considered in[15], here alphabet Anot specified. On input x, del1 outputs either x, or any word that results by deleting exactly Thus, er(W) is the set of all possible outputs of er when the onesymbolinxandtheninserting asymbolattheendofx. input is any word from W. For example, if er = id1 = the channel that allows up to 1 symbol to be deleted or inserted in the input word, then id1({00,11}) = An er-detecting block code C is called maximal er-detecting if C ∪{w} is not er-detecting for any word of length ℓ that {00,0,000,100,010,001,11,1,011,101,110,111}. is not in C. The concept of a maximal er-correcting code is Fig. 4 considers examples of channels that have been defined similar. in past research when designing error control codes. Here these channels are shown as transducers, which can be used From a logical point of view (see Lemma 4 below) error- as inputs to algorithmsfor computingerrorcontrolcodes. For detection subsumes the concept of error-correction. This con- the channel sub2, we have 00101∈sub2(00000) because on nection is stated already in [7] but without making use of input00000,the channelsub2 can read the first two input0’s it there. Here we add the fact that maximal error-detection at state s and output 0, 0; then, still at state s, read the 3rd 0 subsumes maximal error-correction. Due to this observation, and output 1 and go to state t1; etc. in this paper we focus only on error-detecting codes. The concepts of error-detectionand -correction mentioned Note:Theoperation‘◦’betweentwotransducerstandsis in the introduction are phrased below more rigorously. calledcompositionandreturnsanewtransducers◦tsuchthat z ∈(s◦t)(x) if and only if y ∈t(x)andz ∈s(y),for some y. Definition 3. Let C be a block code and let er be a channel. We say that C is er-detecting if Lemma 4. Let C ⊆Aℓ be a block code ander be a channel. Then C is er-correcting if and only if it is (er−1 ◦ er)- v ∈C, w ∈C and w ∈er(v) imply v =w. detecting. Moreover, C is maximal er-correcting if and only We say that C is er-correcting if if it is maximal (er−1◦er)-detecting. v ∈C, w ∈C and x∈er(v)∩er(w) imply v =w. Proof:Thefirststatementisalreadyin[7].Forthesecond 2 statement, first assume that C is maximal er-correcting and as in the case of a channel, but each transition has only an consider any word w ∈Aℓ\C. If C∪{w} were (er◦er−1)- input label, that is, it is of the form (s,x,t) with x being one detectingthenC∪{w}wouldalsobeer-correctingand,hence, alphabetsymbolor the emptyword ε. The languageaccepted C would be non-maximal; a contradiction. Thus, C must be by a is denoted by L(a) and consists of all words formed by maximal (er◦er−1)-detecting. The converse can be shown concatenating the labels in any path from an initial to a final analogously. state. The automatonis called deterministic, or DFA for short, ifI consistsofasinglestate,therearenotransitionswithlabel Theoperation‘∨’betweenanytwotransdcucerstandsis ε, and there are no two distinct transitions with same labels obtainedbysimplytakingtheunionoftheirfivecorresponding going out of the same state. Special cases of automata are components (states, alphabet, initial states, transitions, final constraint systems in which normally all states are final (pg states) after a renaming, if necessary, of the states such that 1635–1764 of [16]), and trellises. A trellis is an automaton the two transdcucers have no states in common. Then accepting a block code, and has one initial and one final state (t∨s)(x) = t(x) ∪ s(x). (pg1989–2117of[16]).Inthecaseofatrellisawetalkabout the code represented by a, and we denote it as C(a), which Let er be a channel, let C ⊆ Aℓ be an er-detecting block is equal to L(a). code, and let w∈Aℓ\C. In [3], the authors show that For computationalcomplexity considerations, the size |m| C∪{w} is er-detecting iff w∈/ (er∨er−1)(C). (2) of a finite state machine (automaton or transducer) m is the Definition 5. Let C ⊆Aℓ be an er-detecting block code. We number of states plus the sum of the sizes of the transitions. saythatawordwcanbeaddedintoC ifw ∈/ (er∨er−1)(C). The size of a transition is 1 plus the length of the label(s) on the transition. We assume that the alphabet A is small so we do not include its size in our estimates. Statement (2) above implies that C is maximal er-detecting iff Aℓ \ (er∨er−1)(C) = ∅. An important operation between an automaton a and a (3) transducert,heredenotedby‘✄’,returnsanautomaton(a✄t) that acceptsthe set of all possible outputsof t when the input Definition 6. The maximality index of a block code C ⊆ Aℓ is any word from L(a), that is, w.r.t. a channel er is the quantity |Aℓ∩(er∨er−1)(C)| L(a✄t)=t(L(a)). maxind(C,er)= . |Aℓ| Remark 8. We recall here the construction of (a✄t) from Let f be a real number in [0,1]. An er-detecting block code given a = (S1,A,I1,T1,F1) and t = (S2,A,I2,T2,F2), C is called f-maximal er-detecting if maxind(C,er)≥f. where we assume that a contains no transition with label ε. First, if necessary, we convert t to standard form. Second, if t contains any transition whose input label is ε, then we The maximality index of C is the proportion of the ‘used up’ words of length ℓ over all words of length ℓ. One can add into T1 transitions (q,ε,q), for all states q ∈ S1. Let T1 denote now the updated set of transitions. Then, we construct verify the following useful lemma. the automaton Lemma 7. Let er be a channel and let C ⊆ Aℓ be an er- detecting block code. b = (S1×S2, A, I1×I2, T, F1×F2) 1) maxind(C,er) = 1 if and only if C is maximal er- such that ((p1,p2),y,(q1,q2)) ∈ T, exactly when there are detecting. transitions (p1,x,q1)∈T1 and (p2,x/y,q2)∈T2. The above constructioncanbedoneintimeO(|a||t|) andthesizeofbis 2) Assuming that words are chosen uniformly at random from Aℓ, the maximality index is the probability that a O(|a||t|).Therequiredautomaton(a✄t)isthetrimversionof b,whichcanbecomputedintimeO(|b|).(Thetrimversionof randomly chosen word w of length ℓ cannot be added into C preserving its being er-detecting, that is, an automaton m is the automaton resulting when we remove any states of m thatdo notoccurin some path froman initial maxind(C,er) = Pr w cannot be added into C . to a final state of m.) h i Proof: The first statement follows from Definition 6 and nonMax (er,a,f,ε) condition(3).Thesecondstatementfollowswhenwenotethat b := (a✄(er∨er−1)); the event that a randomly chosen word w from Aℓ cannot be n := 1 + 1/ 4ε(1−f)2 ; added into C is the same as the event that w ∈ Aℓ ∩(er∨ j (cid:16) (cid:17)k er−1)(C). ℓ := the length of the words in C(a); tr := 1; while (tr ≤ n): III. GENERATING ERROR CONTROL CODES w := pickFrom(A,ℓ); We turn now our attention to algorithms processing chan- if (w not in L(b)) return w; nels and sets of words. A set of words is called a language, tr := tr+1; with a block code being a particular example of language. return None; A powerfulmethod of representing languagesis via finite au- Figure2. Algorithm nonMax—seeTheorem9. tomata[17].A(finite)automatonaisa5-tuple(S,A,I,T,F) 3 Next we present our randomized algorithms—we use [14] as Remark10. Wementiontheimportantobservationthatonecan referenceforbasicconcepts.Weassumethatwehaveavailable modify the algorithm nonMax by removing the construction touseinouralgorithmsanidealmethodpickFrom(A,ℓ)that of b and replacing the ‘if’ line in the loop with chooses uniformly at random a word in Aℓ. A randomized if (C(a)∪{w} is er-detecting) return w; algorithm R(···) with specific values for its parameters can beviewedasarandomvariablewhosevalueiswhatevervalue While with this change the output would still be correct, is returned by executing R on the specific values. the time complexity of the algorithm would increase to Theorem9. ConsiderthealgorithmnonMax inFig.2,which O(cid:16)|a|2|er|.(cid:0)ε(1 − f)2(cid:1)(cid:17). This is because testing whether takes as input a channel er, a trellis a accepting an er- L(v) is er-detecting, for any given automaton v and channel detecting code, and two numbers f,ε∈[0,1]. er, can be done in time O(|v|2|er|), and in practice |v| is much larger than ℓ. 1) The algorithm either returns a word w ∈Aℓ\C(a) such that the code C(a)∪{w} is er-detecting, or it returns In Fig. 3, we present the main algorithm for adding new None. words into a given deterministic trellis a. 2) If C(a) is not f-maximal er-detecting, then makeCode (er,a,N) Pr nonMax returns None <ε. W := empty list; c:= a h i cnt := 0; more := True; while (cnt < N and more) 3) The time complexity of nonMax is w := nonMax(er,c,0.95,0.05); if (w is None) more := False; O(cid:16)ℓ|a||er|.(ε(1−f)2)(cid:17). else {add w to c and to W; cnt := cnt+1;} return c, W; Proof: The first statement follows from statement (2) in Figure 3. Algorithm makeCode—see Theorem 12. The trellis a can be the previous section, as any w returned by the algorithm is omittedsothatthealgorithm wouldstartwithanemptysetofcodewords.In not in (er∨er−1)(C(a)). For the second statement, suppose this case, however, the algorithm would require as extra input the codeword thatthe codeC(a) isnotf-maximaler-detecting.LetCnt be lengthℓandthedesiredalphabetA.Weusedthefixedvalues0.95and0.05, astheyseemtoworkwellinpractical testing. the random variable whose value is the value of tr − 1 at the endofexecutionoftherandomizedalgorithmnonMax.Then, Cnt counts the number of words that are in C(a) out of n Remark 11. Insomesense,algorithmmakeCode generalizes randomlychosenwordsw. ThusCnt is binomial:the number to arbitrary channels the idea used in the proof of the well- of successes (words w in L(b)) in n trials. So E(Cnt)=np, known Gilbert-Varshamov bound [12] for the largest possible where p=Prhw∈L(b)i. By the definitionof n in nonMax, blockcodeM ⊆Aℓ thatis subk-correcting,forsome number we get 1/(4n(1 − f)2) < ε. Now consider the Chebyshev kofsubstitutionerrors.Inthatproof,awordcanbeaddedinto the code M if the word is outside of the union of the “balls” inequality, Prh|X −E(X)|≥ai ≤ σ2/a2, where a > 0 is sub2k(u), for all u∈M. In that case, we have that sub−k1 = arbitrary and σ2 is the variance of some random variable X. subk and (sub−k1◦subk)=sub2k(u). The present algorithm For X =Cnt the variance is np(1−p), and we get adds new words w to the constructed trellis c such that each new wordw isoutsideof the “union-ball”(er∨er−1)(C(c)). Pr |Cnt/n−p|≥1−f <ε, h i Theorem 12. Algorithm makeCode in Fig. 3 takes as input a channel er, a deterministic trellis a of some length ℓ, and wherewe useda=n(1−f)andthefactthatp(1−p)≤1/4. anintegerN >0suchthatthecodeC(a)iser-detecting,and Using Lemma 7 and the assumption that C(a) is not f- returns a deterministic trellis c and a list W of words such maximal, we have that maxind(C(a),er)<f, which implies that the following statements hold true: Prhw ∈L(b)i<f; hence, p<f. Then 1) C(c)=C(a)∪W and C(c) is er-detecting, 2) If W has less than N words, then either Prh nonMax returns Nonei=PrhCnt=ni= maxind(C(c),er) ≥ 0.95 or the probability that a Pr Cnt/n=1 =Pr Cnt/n−p=1−p ≤ randomly chosen word from Aℓ can be added in C(c) is h i h i <0.05. Prh|Cnt/n−p|≥1−pi≤Prh|Cnt/n−p|≥1−fi <ε, 3) The algorithm runs in time O ℓN|er||a|+ℓ2N2|er| . (cid:16) (cid:17) as required. Proof: Let c be the value of the trellis c at the end of i For the third statement, we use standard results from the i-th iterationofthe while loop.The firststatementfollows automaton theory, [17], and Remark 8. In particular, com- fromTheorem9:anywordwreturnedbynonMax issuchthat puting b can be done in time O(|a|·|er|) such that |b| = C(c )∪{w}iser-detecting.Forthesecondstatement,assume i O(|a| · |er|). Testing whether w ∈ L(b) can be done in that, at the end of execution, W has <N words and C(c) is time O(|w||b|)=O(ℓ|b|). Thus, the algorithm works in time not 95%-maximal. By the previous theorem, this means that O(ℓ|a||er|/(ε(1−f)2)). the random process nonMax(er,c,0.95,0.05) returns None 4 withprobability<0.05,asrequired.Forthethirdstatement,as such that a accepts Aℓ if and only if C(d) is a maximal er- 2 the loop in the algorithm nonMax performs a fixed number detectingblockcodeoflengthℓ.Therestoftheproofconsists of iterations (=2000), we have that the cost of nonMax is of 5 parts: construction of deterministic trellis d accepting O(ℓ|c ||er|). The cost of adding a new word w of length ℓ words of length ℓ, construction of er, facts about d and er, i to ci−1 is O(ℓ) and increases its size by O(ℓ), so each ci is proving that C(d) is er-detecting, proving that a accepts Aℓ2 of size O(|a|+iℓ). Thus, the cost of the i-th iteration of the if and only if C(d) is maximal er-detecting. while loop in makeCode is O(ℓ|er|(|a|+iℓ)). As there are Construction of d: Let A be the alphabetA2∪˙T, where T up to N iterations the total cost is is the set of transitions of a. The required deterministic trellis N d is any deterministic trellis accepting Aℓ\Aℓ2, that is, XO(cid:16)ℓ|er|·(|a|+iℓ)(cid:17) = O(cid:16)ℓN|er||a|+ℓ2N2|er|(cid:17). C(d) = Aℓ\Aℓ2. i=1 Thiscanbe constructed,forinstance,bymakingdeterministic trellises d1 and d2 accepting, respectively, Aℓ and Aℓ2, and Remark 13. In the algorithm makeCode, attempting to add then intersecting d1 with the complement of d2. Note that only one word into C(a) (case of N = 1), requires time any word in C(d) contains at least one symbol in T. O(ℓ|er||a|+ℓ2|er|), which is of polynomial magnitude. This case is equivalent to testing whether C(a) is maximal er- Construction of er: This is of the form (er1 ∨ er2) as detecting, which is shown to be a hard decision problem in follows.Thetransducerer2hasonlyonestatesandtransitions (s,α/α,s), for all α ∈ A, and realizes the identity relation Theorem 15. {(x,x) | x ∈ A∗}. Thus, we have that er2(x) = {x}, for all wRehmeraerkthe14i.nitIinal tthreellivserasioisnomofitttehde, tahlegotriimthemcommapkleexCitoydies twhoartdTs′′xco∈nsAis.tsTohfeetxraancstldyutcheerterarn1s=itio(nSs,(Ap,,s(p,T,a′′,,qF)/)ai,sq)sufcohr O(ℓ2N2|er|). We also note that the algorithm would work which (p,a,q) is a transition of a. with the same time complexity if the given trellis a is not deterministic.In this case, however,the resulting trellis would Factsaboutdander:Thefollowingfactsarehelpfulinthe not be (in general) deterministic either. restoftheproof.Someofthesefactsrefertothedeterministic trellis d1 = (S,T,s,T1,F) resulting by omitting the output partsofthetransitionlabelsofer1,thatis,(p,(p,a,q),q)∈T1 IV. WHY NOT USE ADETERMINISTIC ALGORITHM exactly when (p,(p,a,q)/a,q) ∈ T′′. Then, C(d1) ⊆ (A\ Our motivation for considering randomized algorithms is A2)ℓ ⊆C(d). that the embedding problem is computationally hard: given F0: L(d1✄er1)=L(a). a deterministic trellis d and a channel er, compute (using a F1: The domain of er1 is C(d1), a subset of (A\A2)ℓ. ddeetteercmtiningisctoicdealcgoonrittahimni)ngatCre(ldli)s.tBhaytcroepmrpesuetanttisoanamllayxhimaradl,ewre- FF23:: Iefr1v(∈C(edr)1)(u=)Lth(ean).v ∈Aℓ2 and v 6=u. mean that a decision version of the embedding problem is F4: er−1(C(d))=∅. 1 coNP-hard. This is shown next. For fact F0, note that the product construction described Theorem 15. The following decision problem is coNP-hard. in Remark 8 produces in (d1 ✄er1) exactly the transitions ((p,p),a,(q,q)), where(p,a,q)isatransitionina,bymatch- IAnnsstawnecre::dwehteetrhmeirniCst(idc)triesllmisadximanadl ecrh-adnenteeclteinrg.. (inpg,(apn,ya,trqa)n/sait,iqo)no(pf,e(rp1,.aF,qa)c,tqF)1offodll1oownslybywitthhethceontrsatrnuscittiioonn of er1 and the definition of d1: in any accepting computation Proof: Let us call the decision problem in question of er1, the input labels appear in an accepting computation MAXED, and let FULLBLOCK be the problem of deciding of d1 that uses the same sequence of states. F3 is shown as whether a given trellis over the alphabet A2 ={0,1} with no follows: As the domain of er1 is C(d1) and C(d1) ⊆ C(d), ε-labeled transitions accepts Aℓ2, for some ℓ. The statement is we have that er1(C(d))=er1(C(d1)), which is L(a) by F0. a logical consequence of the following claims. Fact F4 followsby notingthat the domainof er−1 is a subset 1 of Aℓ but C(d) contains no words in Aℓ. Claim 1: FULLBLOCK is coNP-complete. 2 2 Claim 2: FULLBLOCK is polynomially reducible to C(d) is er-detecting: Let u,v ∈ C(d) such that v ∈ MAXED. er(u) = er1(u)∪{u}. We need to show that v = u, that is, The first claim follows from the proof of the following to show that v ∈/ er1(u). Indeed, if v ∈ er1(u) then v ∈ Aℓ2, which contradicts v ∈C(d)=Aℓ\Aℓ. factonpage329of[10]:Decidingwhethertwogivenstar-free 2 regularexpressionsoverA2areinequivalentisanNP-complete a accepts Aℓ2 if and only if C(d) is maximaler-detecting: problem. Indeed, in that proof the first regular expression can By statement (3) we have that C(d) is maximal er-detecting, be arbitrary, but the second regular expression represents the if and only if (er∨er−1)(C(d))=Aℓ. We have: alanstgaur-afgreeeArℓe,gfuolrarsoemxperpesossiiotinvetoinatnegaecryℓc.liMcoaruetoovmear,tocnonwvietrhtinnog (er∨er−1)(C(d)) = C(d)∪er1(C(d))∪er−11(C(d)) ε-labeled transitions is a polynomial time problem. = (Aℓ\Aℓ2)∪L(a)∪∅ For the second claim, consider any trellis a = = (Aℓ\Aℓ2)∪˙ L(a). (S,A2,s,T,F)withnoε-labeledtransitionsin T.We needto Thus,C(d)ismaximaler-detecting,ifandonlyifL(a)=Aℓ2, construct in polynomial time an instance (d,er) of MAXED as required. 5 V. IMPLEMENTATIONAND USE Definition 3 and using standard logical arguments, it follows that Allmainalgorithmictoolshavebeenimplementedoverthe years in the Python package FAdo [1], [4], [6]. Many aspects 1) L is er1-detecting and er2-detecting, if and only if L is of the new module FAdo.codes are presented in [6]. Here we (er1∨er2)-detecting; presentmethodsofthatmodulepertainingtogeneratingcodes. 2) L is er−1-detecting, if and only if it is er-detecting, if Assume that the string d1 contains a description of the and only if it is (er−1∨er)-detecting. transducerdel1 in FAdo format.In particular,d1 beginswith the type of FAdo object being described, the final states, and The inverseof del1 is ins1 andis shownin Fig. 4, where the initial states (after the character *). Then, d1 containsthe recall it results by simply exchanging the order of the two listoftransitions,witheachoneoftheform“sxyt\n”,where words in all the labels in del1. By statement 2 of the above ‘\n’ is the new-line character. This shown in the following remark, the del1-detecting codes are the same as the ins1- Python script. detecting ones, and the same as the (del1∨ins1)-detecting ones—this is shown in [15] as well. The method of using import FAdo.codes as codes transducers to model channels is quite general and one can d1 = ’@Transducer 0 2 * 0\n’ give many more examples of past channels as transducers, as ’0 0 0 0\n0 1 1 0\n0 0 @epsilon 1\n’ well as channels not studied before. Some further examples ’0 1 @epsilon 1\n1 0 0 1\n1 1 1 1\n’ are shown in the next figures, Fig. 4-6. ’1 @epsilon 0 2\n1 @epsilon 1 2\n’ One can go beyond the classical error control properties pd1 = codes.buildErrorDetectPropS(d1) and define certain synchronization properties via transducers. a = pd1.makeCode(100, 8, 2) Let OF be the set of all overlap-freewords, that is, all words print pd1.notSatisfiesW(a) w such that a proper and nonempty prefix of w cannot be print pd1.nonMaximalW(a, m) a suffix of w. A block code C ⊆ OF is a solid code if s2 = ...string for transducer sub_2 any proper and nonempty prefix of a C-word cannot be a ps2 = codes.buildErrorDetectPropS(s2) suffix of a C-word. For example, {0100, 1001} is not a pd1s2 = pd1 & ps2 block solid code, as 01 is a prefix and a suffix of some b = pd1s2.makeCode(100, 8, 2) codewords and 01 is nonempty and a proper prefix (shorter than the codewords).Solid codescan also be non-blockcodes The above script uses the string d1 to create the object pd1 by extending appropriatelythe above definition [19] (they are representing the del1-detection property over the alphabet also called codes without overlaps in [9]). The transducer ov {0,1}.Then,itconstructsanautomatonarepresentingadel1- in Fig. 6 is such that any block code C ⊆OF is a solid code, detectingblockcodeoflength8withupto100wordsoverthe if and only if C is an ‘ov-detecting’block code. We note that 2-symbolalphabet{0,1}.ThemethodnotSatisfiesW(a) solid codes have instantaneous synchronization capability (in tests whether the code C(a) is del1-detecting and returns particular all solid codes are comma-free codes) as well as a witness of non-error-detection (= pair of codewords u,v synchronization in the presence of noise [5]. with v ∈ del1(u)), or (None, None)—of course, in the above example it would return (None, None). The method nonMaximalW(a, m) tests whether the code C(a) is max- 0b imaldel1-detectingandreturnseithera wordv ∈L(m)\C(a) such that C(a)∪{v} is del1-detecting, or None if C(a) is 0/1 a/a already maximal. The object m is any automaton—here it is the trellis representingAℓ. This methodis used only forsmall 1/0 a/a 1/0 codes,asingeneralthemaximalityproblemisalgorithmically 0a 1 hard (recall Theorem 15), which motivated us to consider the 0/1 randomized version nonMax in this paper. For any channel er and trellis a, the method notSatisfiesW(a) can be bsid2= 0 ε/a, a/ε made to work in time O(|er||a|2), which is of polynomial ε/a, a/ε complexity. The operation ‘&’ combines error-detection prop- 0/1 1/0 erties. Thus, the second call to makeCode constructs a code that is del1-detecting and sub2-detecting (=sub1-correcting). 2 a/a 1a 1b 0/1 VI. MORE ON CHANNEL MODELLING,TESTING 1/0 In this section, we consider further examples of channels and show how operationson channels can result in new ones. We also show the results of testing our codes generation Figure 4. The channel specified by bsid2 allows up to two errors in the inputword.Eachoftheseerrorscanbeadeletion,aninsertion,orabitshift: algorithm for several different channels. a10becomes01,ora01becomes10.Thealphabet is{0,1}. Remark 16. We note that the definition of error-detecting (or error-correcting) block code C is trivially extended to any language L, that is, one replaces in Definition 3 ‘block For ε = 0.05 and f = 0.95, the value of n in nonMax code C’ with ‘language L’. Let er,er1,er2 be channels. By is 2000. We performed several executions of the algorithm 6 N=,ℓ=,end= id2 del1 sub2 bsid2 ov 100, 8, 18,20,23 37,42,51 15,16,18 17,19,21 01,07,08 100, 7, 10,12,13 20,23,28 09,10,13 11,11,13 03,04,05 100, 8, 1 11,13,14 39,50,64 09,10,11 09,12,13 01,05,06 100, 8, 01 06,07,08 64,64,64 04,06,08 06,07,09 01,04,05 500, 12, 177,182,188 500,500,500 148,157,162 169,173,178 51,59,63 500, 13, 318,327,334 272,273,278 302,303,309 43,111,120 a/ε maximum, that is, having the largest possible number of tostatet1 f0 codewords, for given er and ℓ. It seems maximum codes are a/a a/a,a/ε rare, but there are many random maximal ones having lower rates. The del1-detectingcodeof [15] has higherrate thanall the random ones generated here. segd4= s1 s2 s3 a/a a/a For the case of block solid codes (last column of the s0 a/a table), we note that the function pickFrom in the algorithm a/ε a/ε nonMax has to be modified as the randomly chosen word w a/ε t1 t2 t3 should be in OF. a/a a/a tostates1 a/ε a/a VII. CONCLUSIONS f1 a/a We have presented a unified method for generating error control codes, for any rational combination of errors. The method cannot of course replace innovative code design, but Figure 5. Transducer for the segmented deletion channel of [11] with should be helpful in computing various examples of codes. parameter b= 4. In each of the length b consecutive segments of the input word, at most one deletion error occurs. The length of the input word is a The implementation codes.py is available to anyone for multiple of b. By Lemma 4, segd4-correction is equivalent to (segd−41 ◦ download and use [4]. In the implementation for generating segd4)-detection. codes,weallowonetospecifythatgeneratedwordsonlycome froma certaindesirablesubsetM of Aℓ, which is represented by a deterministic trellis. This requires changing the function a/ε a/a ε/a pickFrom in nonMax so that it chooses randomly words from M. There are a few directions for future research. One a/a ε/a is to work on the efficiency of the implementations, possibly ov= 0 1 2 allowing parallel processing, so as to allow generation of block codes having longer block length. Another direction is to somehow find a way to specify that the set of generated Figure6. Thisinput-preservingtransducerdeletesaprefixoftheinputword codewords is a ‘systematic’ code so as to allow efficient (apossibly emptyprefix)andtheninserts apossiblyemptysuffixattheend encodingofinformation.Athirddirectionistodoasystematic oftheinputword. study on how one can map a stochastic channel sc, like the binary symmetric channel or one with memory, to a channel er (representing a combinatorial channel), so as the available makeCode on various channels using n = 2000, no initial algorithms on er have a useful meaning on sc as well. trellis, and alphabet A = {0,1}. In the above table, the first column gives the values of N and ℓ, and if present and nonempty, the pattern that all codewords should end with (1 REFERENCES or 01). For each entry in an ‘N = 100’ row, we executed [1] Andre´ Almeida, Marco Almeida, Jose´ Alves, Nelma Moreira, and makeCode 21 times and reported smallest, median, and Roge´rioReis. FAdoandGUItar:Toolsforautomatamanipulation and largest sizes of the 21 generated codes. For N = 500, we visualization. InProceedingsofCIAA2009,Sydney,Australia,volume 5642ofLectureNotesinComputer Science, pages65–74,2009. reported the same figures by executing the algorithm 5 times. [2] JeanBerstel.TransductionsandContext-FreeLanguages.B.G.Teubner, For example, the entry 37,42,51 corresponds to executing Stuttgart, 1979. makeCode 21 times for er = del1, ℓ = 8, end = ε. The [3] KrystianDudzinskiandStavrosKonstantinidis. Formaldescriptions of entry 64,64,64 corresponds to the systematic code of [15] codeproperties:decidability,complexity,implementation. International whose codewords end with 01, and any of the 64 6-bit words Journal ofFoundations ofComputerScience, 23:1:67–85, 2012. can be used in positions 1–6. The entry for ‘ℓ = 7, end = [4] FAdo.Toolsforformallanguagesmanipulation. AccessedinJan.2016. ε, er = sub2’ corresponds to 2-substitution error-detection URL:http://fado.dcc.fc.up.pt/. which is equivalent to 1-substitution error-correction. Here [5] Helmut Ju¨rgenesen andS. S. Yu. Solid codes. Elektron. Information- the Hamming code of length 7 with 16 codewords has a sverarbeit. Kybernetik., 26:563–574, 1990. maximumnumber of codewordsfor this length. Similarly, the [6] Stavros Konstantinidis, Casey Meijer, Nelma Moreira, and Roge´rio entry for ‘ℓ =7, er= id2’ corresponds to 2-synchronization Reis.Implementationofcodepropertiesviatransducers.InYo-SubHan andKaiSalomaa,editors,ProceedingsofCIAA2016,number9705in error-detectionwhich is equivalentto 1-synchronizationerror- LectureNotesinComputerScience,pages189–201,2016. ArXivver- correction. Here the Levenshtein code [8] of length 8 has 30 sion: Symbolic manipulation of code properties. arXiv:1504.04715v1, codewords. We recall that a maximal code is not necessarily 2015. 7 [7] Stavros Konstantinidis and Pedro V. Silva. Maximal error-detecting capabilities offormallanguages. J. Automata,Languages andCombi- natorics, 13(1):55–71, 2008. [8] VladimirI.Levenshtein. Binarycodescapable ofcorrecting deletions, insertions, andreversals. Soviet PhysicsDokl.,10:707–710, 1966. [9] VladimirI.Levenshtein. Maximumnumberofwordsincodeswithout overlaps. Probl.Inform.Transmission,6(4):355–357, 1973. [10] H.Lewis and C.H.Papadimitriou. Elements ofthe Theory ofCompu- tation, 2nded. Prentice Hall,1998. [11] Zhenming Liu and Michael Mitzenmacher. Codes for deletion and insertionchannelswithsegmentederrors. InProceedingsofISIT,Nice, France,2007,pages 846–849,2007. [12] F.J.MacWilliamsandN.J.A.Sloane. TheTheoryofError-Correcting Codes. Amsterdam,1977. [13] HuguesMercier,VijayBhargava,andVahidTarokh. Asurveyoferror- correctingcodesforchannelswithsymbolsynchronizationerrors.IEEE Communic. Surveys &Tutorials, 12(1):87–96, 2010. [14] Michael Mitzenmacher and Eli Upfal. Probability and Computing. CambridgeUniv.Press,2005. [15] Filip Paluncic, Khaled Abdel-Ghaffar, and Hendrik Ferreira. Inser- tion/deletion detecting codes and the boundary problem. IEEETrans. Information Theory,59(9):5935–5943, 2013. [16] V.S.PlessandW.C.Huffman, editors. HandbookofCodingTheory. Elsevier, 1998. [17] Grzegorz Rozenberg and Arto Salomaa, editors. Handbook of Formal Languages, Vol.I. Springer-Verlag, Berlin, 1997. [18] C.E.Shannon and W.Weaver. TheMathematical Theory ofCommu- nication. University ofIllinois Press,Urbana,1949. [19] H. J. Shyr. Free Monoids and Languages. Hon Min Book Company, Taichung,secondedition, 1991. 8

Channels with Synchronization/Substitution Errors and Computation of Error Control Codes PDF

0.16 MB·

by Stavros Konstantinidis

#journals #arxiv

Checking for file health...

Save to my drive

Quick download

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Channels with Synchronization/Substitution Errors and Computation of Error Control Codes

See more

The list of books you might like

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.