ebook img

Weakly Mutually Uncorrelated Codes PDF

0.14 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Weakly Mutually Uncorrelated Codes

1 Weakly Mutually Uncorrelated Codes S. M. Hossein Tabatabaei Yazdi , Han Mao Kiah† and Olgica Milenkovic ∗ ∗ ECE Department, University of Illinois, Urbana-Champaign †SPMS, Nanyang Technological University, Singapore ∗ Abstract—We introduce the notion of weakly mutually uncorre- codesandbalanced,error-correctingWMUcodes.Abinarystring lated (WMU) sequences, motivated by applications in DNA-based is called balanced if half of its symbols are zero. On the other storagesystemsandsynchronizationprotocols.WMUsequencesare hand,aDNAstringistermedbalanceifithasa50%GCcontent, characterized by the property that no sufficiently long suffix of representing the percentage of symbols that are either G or C. one sequence is the prefix of the same or another sequence. In addition, WMU sequences used in DNA-based storage systems are Balanced DNA strands are more stable than DNA strands with requiredtohavebalancedcompositionsofsymbolsandtobeatlarge lowerorhigherGCcontentandtheyhavelowersequencingerror- 6mutual Hamming distance from each other. We present a number rates. Atthe same time,WMU codesatlargeHammingdistance 1of constructions for balanced, error-correcting WMU codes using limit the probability of erroneous codeword selection. 0Dyck paths, Knuth’s balancing principle, prefix synchronized and 2cyclic codes. Thepaperisorganizedasfollows.InSection2wereviewMU and introduce WMU codes, and derive boundson the maximum n 1. INTRODUCTION size of the latter family of codes. In addition, we outline a a J Mutually uncorrelated (MU) codes are a class of block codes constructionthatmeetstheupperbound.InSection3wedescribe 9 in which no proper prefix of one codeword is a proper suffix constructionsforerror-correctingWMUcodes,whileinSection4 2of the same or another codeword. MU codes were extensively wediscussbalancedWMUcodes.Ourmainresultsarepresented studied in the coding theory and combinatorics literature under in Section 5, where we first propose to use cyclic codes to ] Ta variety of names. Levenshtein introduced the codes in 1964 devise an efficient construction of WMU codes that are both Iunder the name ‘strongly regular codes’ [1], and suggested that balancedandhave errorcorrectingcapabilities. We then proceed s.thecodesbeusedforsynchronization.Inspiredbyapplicationsof to improve the cyclic code construction in terms of coding cdistributed sequences in frame synchronization as described by rate through decoupled constrained and error-correcting coding [ van Wijngaarden and Willink in [2], Bajic´ and Stojanovic´ [3] for binary strings. In this setting, we use Knuth’s balancing 1rediscovered mutually uncorrelated codes, and studied them technique [11] and DC-balanced codes [12]. vunder the name of ’cross-bifix-free’ codes. Constructions and 6 2. MU AND WMU CODES:DEFINITIONS, BOUNDS AND boundson the size of MU codes were also reportedin a number 7 CONSTRUCTIONS of recent contributions [4], [5]. In particular, Blackburn [5] 1 Throughout the paper we use the following notation: F 8analyzed these sequences under the name of ‘non-overlapping q denotes a finite field of order q 2. If not stated otherwise, we 0codes’, and provided a simple construction for a class of MU ≥ tacitlyassumethatq =2,andthatthecorrespondingfieldequals .codes with optimal cardinality. MU codes have also found 1 F = 0,1 . We let a= (a ,...,a ) Fn stand for a word of 0applications in DNA storage [6], [7]: In this setting, Yazdi et 2 { } 1 n ∈ q length n over F , and aj = (a ,...,a ), 1 i j n, stand 6al. [8] developed a new, random-access and rewritable DNA- q i i j ≤ ≤ ≤ 1based storage architecture based on DNA sequences endowed for a substring of a starting at position i and ending at position : j. Moreover, for two arbitrary words a Fn,b Fm we use vwith mutually uncorrelated address strings that allow selective ∈ q ∈ q iaccess to encoded DNA blocks. The addressing scheme based ab to denote a word of length n+m generated by appendingb X to the right-hand side of a. on MU codes was augmented by specialized DNA codes in [9]. r a Here, we generalize the family of MU codes by introducing A. MU Codes weakly mutually uncorrelated (WMU) codes. WMU codes are We say that a = (a ,...,a ) Fn is self uncorrelated if block codes in which no “long” prefixes of one codeword are no proper prefix of a 1matchesnits∈sufqfix, i.e., (a ,...,a ) = 1 i suffixesofthesameorothercodewords.WMUcodesdifferfrom (a ,...,a ), for all 1 i < n. One can extend th6is n i+1 n MU codes in so far that they allow short prefixes of codewords defi−nitiontomutuallyuncorrel≤atedsequencesasfollows:twonot to also appear as suffixes of codewords. This relaxation of necessarily distinct words a,b Fn are mutually uncorrelated prefix-suffix constraints was motivated in [8] for the purpose if no proper prefix of a appears∈as aqsuffix of b and vice versa. of improving code rates while allowing for increased precision Furthermore, we say that Fn is a mutually uncorrelated DNA fragment assembly and selective addressing. For more (MU) code if any two not nCec⊆essarqily distinct elements in are details regardingthe utility of WMU codes in DNA storage, the mutually uncorrelated. C interested readers are referred to the overview paper [10]. The maximumcardinality of MU codes was determinedup to We are concerned with determining bounds on the size of aconstantfactorbyBlackburn[5,Theorem8].Forcompleteness, WMUcodesandefficientWMUcodeconstructions.Weconsider we state this result below. both binary and quaternary WMU codes, the later class adapted for encoding over the four letters DNA alphabet A,T,C,G . Theorem 1. Let AMU(n,q) denote the maximum size of MU { } codes over Fn, for n 1 and q 2. Then there exist constants Our contributions include bounds on the largest size of WMU q ≥ ≥ 0<C <C such that codes,constructionofWMUcodesthatachievethederivedupper 1 2 bound as well as results on three important constrained versions qn qn C A (n,q) C . of WMU codes: balanced WMU codes, error-correcting WMU 1 n ≤ MU ≤ 2 n 2 To motivate our WMU code design methods, we next briefly d( )=d( ).Asn waschosensothat 0,1 n. parse H H parse C C C ⊆{ } outline two known and one new construction of MU codes. In addition, the parsing code is an MU code, since parse C it satisfies all the constraints required by Construction 2. To Construction 1. (Prefix-Balanced MU Codes) Bilotta et al. [4] determinethelargestasymptoticsizeofaparsingcode,webriefly described a simple construction for MU codes based on well recall the Gilbery-Varshamovbound. known combinatorial objects termed Dyck words. A Dyck word is a binary string composed of n zeros and n ones such that no Theorem 2. (Asymptotic Gilbert-Varshamov bound [15], [16]) prefix of the word has more zeros than ones. By definition, a For any two positive integers n and d n, there exists a Dyck word necessarily starts with a one and ends with a zero. block code 0,1 n of minimum Ham≤min2g distance d with C ⊆ { } Consider a set of Dyck words of length 2n and define the normalized rate D following set of words of length 2n+1, d R( ) 1 h o(1), , 1a:a . C ≥ − n − CD { ∈D} (cid:18) (cid:19) Bilotta et al. proved that is a MU code. An important where h() is an entropy function, i.e., h(x) = xlog 1 +(1 CD · 2 x − observation is that MU codes constructed using Dyck words x)log 1 , for 0 x 1. 2 1 x ≤ ≤ are inherently balanced or near-balanced. To more rigorously − Corollary 1. For a fixed value of n, n is maximized in the describe this property of Dyck words, recall that a Dyck word H aforementioned construction by choosing ℓ = √n 2; in this hasheightatmostD ifforanyprefixoftheword,thedifference ∗ − case, n =(√n 2 1)2 =n 2√n 2 1. By applyingthe between the number of ones and the number of zeros is at most ∗H − − − − − GV result from Theorem 2 and choosing to be an [n ,s,d] D. Hence, the disbalance of any prefix of a Dyck word is at block code, with d n∗H and s = n (1CHh( d )), we∗Hobtain most D, and the disbalance of an MU codeword in D is one. ≤ 2 ∗H − n∗H Let Dyck(n,D) denote the number of Dyck words ofClength 2n an error-correcting MU code parse with parameters [n,s,d]. C and height at most D. For fixed values of D, de Bruijn et al. [13] proved that B. WMU Codes: Definitions, Bounds and Constructions 4n π π The notion of mutual uncorrelatedness may be relaxed by Dyck(n,D) tan2 cos2n . (1) requiring that only sufficiently long prefixes of one sequence do ∼ D+1 D+1 D+1 (cid:18) (cid:19) (cid:18) (cid:19) not match sufficiently long suffixes of other sequences. We next Here, f(n) g(n) denotes lim f(n)/g(n) = 1. Hence, ∼ m→∞ formally introduce codes with such defining properties. Billota’s construction produces balanced MU codes. In addition, the construction ensures that every prefix of a codeword is Definition 1. Let Fn and 1 k n. We say that is C ⊆ q ≤ ≤ C balanced as well. By mapping 0 and 1 to A,T and C,G , a k-weakly mutually uncorrelated (k-WMU) code if no proper { } { } respectively, we obtain a DNA MU code. prefix of length ℓ, for all ℓ k, of a codeword in appears as ≥ C a suffix of another codeword, including itself. Construction 2. (General MU Codes, Levenshtein [1] and Gilbert [14]). Let ℓ,n, 1 ℓ n 1, be two integers and Theorem 3. Let AWMU (n,q,k) denote the maximumsize of a let Fn be the set of all≤word≤s a=−(a ,...,a ) such that k-WMU code over Fn, for n 1 and q 2. Then, there exist C ⊆ q 1 n q ≥ ≥ (i) (a1,...,aℓ)=(0,...,0) constants 0<C3 <C4 such that (ii) aℓ+1,an =0 qn qn 6 C A (n,q,k) C . (iii) The sequence (a ,...,a ) does not contain ℓ consec- 3 WMU 4 ℓ+2 n 1 n k+1 ≤ ≤ n k+1 utive zeros as a subword. − − − Proof: To prove the upper bound, we use an approach first Then, is an MU code. Blackburn[5, Lemma3] showed that suggested by Blackburnin [5, Theorem 1]. Assume that Fn C C ⊆ q for ℓ = log 2n this construction is optimal. His proof relied isak-WMUcode.LetL=(n+1)(n k+1) 1,andconsider q othnatthdeoonbosetrcvoantitoaninthℓatcothnesencuumtivbeerzeorfosstrainsgas s(uabℓ+w2o,r.d..e,xacne−ed1)s tthheesceytcXlicosfupbawirosrd(ao,fi)awohfelreenagth∈nFLqst,a−irt∈ing{1a,t.p.−o.s,iLti}o,naindbewlohnegres (q−14)n2(q24q−1)qn,therebyestablishingthelowerboundofTheorem to C. Note thatour choiceof the parameterL is governedbythe 1. It is straightforward to modify the second proposed code overlap length k. construction so as to incorporate error-correcting properties in Note that X = L qL n, since there are L possibilities − | | |C| the underlyingMU code. We outline our new code modification for the index i, possibilities for the word starting at position |C| below. i of a, and qL n choices for the remaining L n 0 − − ≥ symbols in a. Moreover, if (a,i) X, then (a,j) / X for Construction 3. (Error-CorrectingMU Codes) Fix t andℓ to be ∈ ∈ j i 1,...,i n k due to the weak mutual uncor- positiveintegersandconsiderabinary[nH,s,d]codeC oflength rel∈ate{dn±essproper±ty.H−enc}em,ofdorLafixedworda FL,thereareat nH =t(ℓ 1), dimension s and Hamming distance d. For each ∈ q cgoivdeenwboyrd−b∈C, wemap b toa wordoflengthn=(t+1)ℓ+1 Tmhoisstimjnp−liLke+s1thkatdiXfferent paiLrs (a,qiL1.)C,.o.m.,b(cid:16)inain,ig⌊tnh−eLkt+w1o⌋(cid:17)de∈rivXed. | |≤ n k+1 a=0ℓ1b1ℓ−11bℓ2(ℓ−1)1···bt((tℓ−11)()ℓ 1)+11. constraints on the size ojf X−, wekobtain − − Furthermore, we define Cparse ,{a:b∈C}. |X|=L|C|qL−n ≤ n Lk+1 qL. It is easy to verify that = , and that the code (cid:22) − (cid:23) parse H has the same minim|uCm Ha|mmi|nCg d|istance as , i.e., Therefore, qn . Cparse CH |C|≤ n k+1 − 3 To prove the lower bound, we introduce a simple WMU code whereaa ,a a .Thisimpliesthatthestringaoflength 0 ′0 ∈C1 construction, outlined in Construction 4. at least k appears both as a proper prefix and suffix of two not necessarily distinct elements of . This contradicts the Construction4. Letk,nbetwointegerssuchthat1 k n.A C1 k-WMUcode Fn maybegeneratedthroughaco≤ncate≤nation assumption that C1 is a k-WMU code. It is easy to verify aCn=d {ab|Fan∈CkC+∈′1,bisq∈anC′M′},Uwchoedree.CIt′ i⊆s eFaqks−y1toisvuenricfoyntshtraatinedis, tahakt-WtheMsUamceodaer.gumentmay be used for the case that C2 is an kC-W′′M⊆Uqc−ode with |C′| |C′′| codewords. C (iii) F1o,rba,nby′ two2dsisutcinhctthwatocrd=s cΨ,c(′a,∈b)C,3c′th=ereΨe(xai′s,tba′),.aT′h∈e Let =Fk 1andlet Fn k+1bethelargestMUcodeof CHamming∈dCistance between c,c equals C′ q− C′′ ⊆ q− ′ size A (n k+1,q).Then, =qk 1A (n k+1,q). The clMaimUed l−ower bound now f|oCll|ows fr−om tMheUlow−er bound of 1(ci 6=c′i)= 1(ai 6=a′i∨bi 6=b′i) Theorem 1, establishing that |C|≥C1n−qkn+1 1≤Xi≤s 1≤Xdi≤s if a=a 1 ′ 3. ERROR-CORRECTING WMU CODES 6 min(d1,d2). ≥(d2 if b=b′ ≥ We now turn our attention to WMU code design problems of 6 interest in DNA-based storage. The collection of results in this This proves the claimed result. sectionpertainstoWMUcodeconstructionswitherror-correcting Construction 5. (Decoupled Binary Code Construction) For functionalities. givenintegersnandk n,letm=n k+1.Asbefore,leta,b Let us start by introducinga mapping Ψ that allows the DNA and c denote the binary≤componentwo−rds used in the encoding. codedesignproblemtobereducedtoabinarycodeconstruction. We construct A,T,C,G n according to the following steps: F0o,r1ansy, tΨwo(ab,ibna)ry:stri0n,g1s as =(a01,,1..s.,as),bA=,T(,bC1,,G..s.,ibss)a∈n (i) Encode aC ∈usi{ng a bina}ry block code C1 ⊆ {0,1}k−1 of e{ncod}ing function th{at m}aps×th{e pa}ir a→,b{to a DN}A string length k−1, and minimum Hamming distance d. Let Φ1 denote the encoding function, so that Φ (a) . c=(c ,...,c ) A,T,C,G s,accordingtothefollowingrules: 1 ∈C1 1 s ∈{ } (ii) InvokeConstruction3withn=mtoarriveatabinaryMU A if (ai,bi)=(0,0) code C2 ⊆ {0,1}m of length m, and minimum Hamming C if (a ,b )=(0,1) distance d. Encodeb using 2. Let Φ2 denotethe encoding for 1≤i≤s, ci =GT iiff ((aaiii,,bbiii))==((11,,10)) (2) (iii) nfEunnaccnotdidoenmc,isnuoismitnhugamtaΦHb2ian(mabrm)yi∈nbglCoC2cd.kisctaondceeCd3.⊆Le{t0,Φ1}ndeonfolteengththe 3 Clearly, Ψ is a bijection and Ψ(a,b)Ψ(c,d)=Ψ(ac,bd). The encoding function, so that Φ3(c)∈C3. next lemma lists a number of useful properties of Ψ. The output of the encoder performing the three outlined steps equals Ψ(Φ (a)Φ (b),Φ (c)). Lemma 1. Suppose that , 0,1 s are two binary block 1 2 3 1 2 C C ⊆ { } code of length s. Encode each pair (a,b) using the Next, we argue that is a WMU code with guaranteed ∈ C1×C2 C DNA block code = Ψ(a,b) a ,b . Then: minimum Hamming distance properties. 3 1 2 C { | ∈C ∈C } (i) C3 is balanced if C2 is balanced. Lemma 2. Let A,T,C,G n denote the code generated by (ii) is a k-WMU code if either or is a k-WMU code. C ∈ { } 3 1 2 Construction 5. Then: C C C (iii) If d and d are the minimum Hamming distances of 1 2 C1 (i) is k-WMU code. and C2, respectively, then the minimum Hamming distance (ii) CThe minimum Hamming distance of is at least d. of 3 is at least min(d1,d2). C C Example 1. In Construction 5, let and be [k 1,s ,d] Proof: C1 C3 − 1 and [n,s ,d] block codes, respectively, where s =(k 1)(1 3 1 (i) An,ybc ∈ C.3Amccaoyrdbiengwtroitt(e2n),athsecn=umΨbe(rao,fb)G,,wChseyrembao∈ls h(kd1)),s3 = n(1−h(nd)) and d ≤ k−21 satisfy the−Gilber−t- iCn1c e∈quCa2ls the number of ones in b. Since b is balanced, Vars−hamov bound of Theorem 2. Construct an [m,s2,d] block code by using Corollary 1, with m = n k + 1,m = exactlyhalfofthesymbolsincareGsandCs.Thisimplies m 2C√2m 2 1,s =m (1 h( d )) andd− m∗H. For∗Hthis that C3 has a 50% GC content. cho−ice of c−omp−onen2t codes∗H, the−cardmin∗Hality of ≤equa2ls (ii) We provetheresultbycontradiction.Supposethat 3 isnot C a k-WMU code while C1 is a k-WMU code. TheCn, there =2s1+s2+s3 =2(k−1)(1−h(k−d1))+m∗H(1−h(md∗H))+n(1−h(nd)) exist c,c such that a proper prefix of length at least |C| k of c ap′p∈eaCr3s as a suffix of c′. Alternatively, there exist = 4n−√n−k−1−12 nonemptystringsp,c0,c′0 such thatc=pc0,c′ =c′0p and 2(k−1)h(k−d1)+m∗Hh(md∗H)+nh(nd) the length of p is at least k. Next, we use the fact Ψ is a bijection and find binary strings a,b,a ,b such that 4. BALANCED WMU CODES 0 0 We begin this section by reviewing a simple method for p=Ψ(a,b),c =Ψ(a ,b ),c =Ψ(a ,b ). 0 0 0 ′0 ′0 ′0 constructing balanced binary words, introduced by Knuth [11] Therefore, in 1986. In this scheme, an n-bit binary string (a ,...,a ) is 1 n sent to an encoder that inverts the first b bits of the data word c=pc =Ψ(a,b)Ψ(a ,b )=Ψ(aa ,bb ), 0 0 0 0 0 ((a ,...,a )+ 1b0n b). The value of b is chosen so that the 1 n − c =c p=Ψ(a ,b )Ψ(a,b)=Ψ(a a,b b), ′ ′0 ′0 ′0 ′0 ′0 encoded word has an equal number of zeros and ones. Knuth 4 provedthatit is alwayspossibleto find an indexb thatensuresa code with parameter D and cardinality balancedoutput.Theindexbisrepresentedbya balancedbinary k 1 n k word (b1,...,bp) of length p. To create the final codeword, the |C|=|C1′||C1′′||C2|=A(k−1,2, −2 )Dyck( −2 ,D)2n encoder prepends (b ,...,b ) to (a ,...,a ) + 1b0n b. The 1 p 1 n − 4n tan2 π cosn k π receiver can easily decode the message by first extracting the D+1 − D+1 . index b from the first p bits and then inverting the first b bits of ∼ √2π(cid:16)(D+(cid:17)1)(k 1(cid:16))12 (cid:17) the length-n sequence. − Let A(n,d,w) denote the maximum cardinality of a binary 5. BALANCED AND ERROR-CORRECTING WMU CODES constantweight-wcodeoflengthnandevenminimumHamming In what follows, we describe the main results of this paper, distance d. Knuth [11] proved that pertaining to constructions of balanced, error-correctingWMUs. n n 2n+1 The first construction is conceptually simple and it lends itself A n,2, = to efficient encoding and decoding procedures. The second con- (cid:16) 2(cid:17) (cid:18)n2(cid:19)≈ √2πn21 structionoutperformsthe first constructionin termsof codebook which is a simple consequence of Stirling’s approximation for- size, and it utilizes the binary encoding functions described in mulan! √2πnnne n.Furthermore,Grahametal.[17]derived the previous sections. − ≈ several bounds for the more general function A(n,d,w). An A. A Construction Based on Cyclic Codes updated list on the exact values and bounds on A(n,d,w) The next construction uses ideas similar to Tavares’ synchro- maybefoundathttp://codes.se/bounds/.Inourfuture nizationtechnique[19].Westartwithasimplelemmaandashort analysis, we use the well known Johnson [18] bound. justification for that. Theorem 4. (Johnson Bound) For n , one has →∞ Lemma 4. Let be a cyclic code of dimensionk. Then the run C 2n+1 n 2n+21 en2 of zeros in any nonzero codeword is at most k−1. A n,d, . √2πnd−21 ≤ 2 ≤ √2πnd−21 Proof: Assume that there exists a non-zero codeword c(x), (cid:16) (cid:17) representedinpolynomialform,witharunofzeroesoflengthk. Construction 6. (Balanced WMU Codes) For given integers n Since the code is cyclic, one may write c(x)=a(x)g(x), where and k n, let m=n k+1. As before,leta and b denotethe a(x)istheinformationsequencecorrespondingtoc(x)andg(x) ≤ − binary words used in the quaternary mapping described before. is the generatorpolynomial.Without loss of generality,one may Construct a code A,T,C,G n as follows: assume that the zeros run appears in positions 0,...,k 1, so C ∈{ } − (i) Encode a using a k-WMU code C1 ⊆ {0,1}n of length n. that i+j=s aigj = 0, for s ∈ {0,...,k−1}. The solution of LFoert Φexamdepnloet,eotnheemenacyoduisnegCfuonncsttirounc,tisoont4hattoΦge(nae)rate C1.. cthoentprParedviciotiunsgstyhseteamssuomf peqtiuoantitohnastics(xa)0i=s nao1n=-ze.r.o..= ak−1 = 0, 1 1 1 (ii) Encode b using a balanced code 0,1 n of len∈gtCh n 2 Construction 7. Let be an [n,k 1,d] cyclic code and let and size A n,2,n . Let Φ denoCte t⊆he{enco}ding function, C − 2 2 e=(1,0,...,0). Then +e is a k-WMU codewith distance d. so that Φ (c) . C 2 2 (cid:0) ∈C (cid:1) Proof: Suppose that on the contrary the code is is not The output of the encoder is Ψ(Φ1(a),Φ2(b)). WMU. Then there exists a proper prefix p of lengthCat least Lemma 3. Let A,T,C,G n denote the code generated by k such that both pa and bp belong to +e. In other words, C ∈ { } C Construction 6. Then, (pa) e and (bp) e belong to . Consequently, (pb) e ′ − − C − belongs to , where e is a cyclic shift of e. Hence, by linearity ((iii)) C iiss abakla-nWcMedU. code. of C, z,0C(a−b)+e′′−e belongsto C. Now, observe that the C firstcoordinateofzisone,andhencenonzero.Butzhasarunof We discuss next the cardinality of the code generated zerosoflengthatleastk 1,whichisacontradiction.Therefore, C − by Construction 6. According to Theorem 3, one has = +e is indeed a k-weakly mutually uncorrelated code. Since C 2n for some constant C >0. The result is constr|uCc1t|ive. C+e is a coset of , the minimum Hamming distance property 3 n k+1 3 C C Inad−dition, 2n+1 .Hence,thesizeof isboundedfrom follows immediately. |C2|≈ √2πn21 C To use the above construction to obtain balanced DNA code- below by: words, we map the elements in F to A,T,C,G via 4 { } C 4n+1 . 0 A, 1 C, ω T, ω+1 G. 3 √2π(n k+1)n21 7→ 7→ 7→ 7→ − Let a be a word of length n. Then it is straightforward to see Next, we slightly modify the aforementioned construction and that the word (a,a+1) has balanced GC content. This leads to combine it with the Prefix-Balanced Construction 1 to obtain a the simple construction described next. near-balanced k-WMU code A,T,C,G n with parameter D. For this purpose, we geneCrat∈e {accordin}g to the Balanced Corollary 2. Let C be an [n,k−1,d] cyclic code over F4 that WMU Construction 6. We set =C 0,1 n and construct by contains the all ones vector 1. Then 2 1 concatenating C1′ ⊆ {0,1}k−1 Cand C{1′′ ⊆}{0,1}n−k+1. HerCe, C1′ {(c+e,c+1+e):c∈C} isbalancedand isanear-balancedWMUcodewithparameter C1′′ is a GC balanced, k-WMU code with minimum Hamming D. It is easy to verify that is a near-balanced k-WMU DNA C distance 2d. 5 TableI SUMMARYOFTHEPROPOSEDCONSTRUCTIONSFORq=4. Code k-WMU k-WMU+Error-Correcting k-WMU+Balanced k-WMU+Error-Correcting +Balanced Rate C1 n−4kn+1 2(k−1)h(k4−dn1−)√+nm−∗Hk−h(1−md∗H21 )+nh(nd) C3 √2π(n4n−+k1+1)n21 √2π2(k−1)h(4nk−−d1√)n+−mk∗H−1h(md∗H)nd−21 Construction Construction 4 Construction 5 Construction 6 Construction 8 Note C1= 236 m∗H =n−k−2√n−k−1 C3= 236 m∗H =n−k−2√n−k−1 B. The Decoupled Binary Code Construction Let = a ...a a for 1 i m . We claim that 1 m i i is a Cbalan{ced error-co|rrec∈tinCg k-WM≤U c≤ode}over Fn. The next construction is a combination of the binary code C q Toclarifytheresult,noticethateachelementin iscreatedby Constructions in 5 and 6. concatenating m strings, where each string belongCs to Fs. Construction 8. For given integers n and k n, let m = n In addition, the words in inherit the distance and Cb0al⊆anceqd k+1 and let a, b and c be the binary compo≤nentwords. Nex−t, properties of . ThereforeC, is balanced and has minimum construct A,T,C,G n by applying the following steps: Hamming distaCn0ce at least d. C C ∈{ } (i) lEenncgothdeka u1s,inagndambiinniamryumbloHcakmcmodinegCd1ist⊆anc{e0,d1.}Lk−et1Φof k Nexlt<, fonr,awnyepsahiorwofthnaottanlecaenssdarbilny distinccatnnao,tbb∈eCidaenndticfaolr. denote th−e encoding function, so that Φ (a) . 1 Th≤isestablishesthattheconst1ructedcno−ncl+at1enatedcodeisWMU. 1 1 (ii) InvokeConstruction3 with n=m to generate∈aCn MU code Let l=is+j, where i= l and 0 j <s. We consider three 0,1 m oflengthmandminimumHammingdistance different scenarios for the insdex j: ≤ C2 ⊆{ } (cid:4) (cid:5) (iii) dsGo.eEnthnearcatotΦdee2ab(cbou)ds∈ienwgCo2Cr.d2.cLefrtoΦm2adebnaoltaenctheedecnocdoedin3gofufnlcetnigotnh, • ∅0j)=<or0j;<.I.n.kt;ohrAis(gCcaaiins∩e,,oC1n1e≤=cai∅n<)vimmerpi.flyTiehtshetarhetaf1otrael1,i(6=C<1b∩mnn−C.mlI+t−1is.i+e1as=y n, minimum Hamming distance d and of sizeCA n,d,n2 . • to show that all−sj+1 is a suffix of length≤s−j of a word in The oΦLue3ttp(ucΦt)3o∈fdCeth3ne.oteencthoedeurnisdeΨrly(Φin1g(aen)cΦo2d(inbg),fΦun3c(tcio)n)(cid:0)., so th(cid:1)at CCak00l1.6=aSnjidnb<cnnbe−snnkl;−−+Isj<1n+.1st−h−iissjac<apsrese,,fiaoxlnoefhlaesnigsatllah−−sjps+r−o1p6=jerobpfnnrae−−nfisjx+el1oe.fmHleeennntgcitenh, The following result is a consequence of Lemmas 3, 2. • j o≤f a word in , and bnl−j+1 is a proper suffix of length Lemma 5. Let A,T,C,G n denote the code generated by j of an elementCi0n . Sinn−cej+k1 j <s, one has al = Construction 8. TCh∈en,{ } bn and al =bCn0 . ≤ l−j+1 6 (i) is a k-WMU code. We sumn−mj+ar1ize the1re6sultns−olf+1our constructions of WMU codes C (ii) is balanced. in Table I. C (iii) The minimum Hamming distance of C is at least d. REFERENCES Example 2. Construct 1 and 2 according to Example 1. The [1] V. Levenshtein, “Decoding automata, invariant with respect to the initial C C size of the code equals state,”ProblemyKibernet, vol.12,pp.125–136,1964. C [2] A. J. De Lind Van Wijngaarden and T. J. Willink, “Frame synchroniza- = =2s1+s2A(n,d, n) tionusingdistributedsequences,” Communications, IEEETransactions on, |C| |C1||C2||C3| 2 vol.48,no.12,pp.2127–2138, 2000. =2(k−1)(1−h(k−d1))+m∗H(1−h(md∗H))A(n,d, n) [3] CDo.mBmajuic´niacnadtioJn.sS,t2o0ja0n4ovIEic´E,E“DIinstterribnuatteiodnsaelqCueonncfeersenancedosena,rvcohl.pr1o.cesIsE,”EEin, 2 2004,pp.514–518. 4n √n k 1 [4] S.Bilotta, E.Pergola,andR.Pinzani, “Anewapproachtocross-bifix-free − − − . sets,”IEEETransactions onInformationTheory,vol.6,no.58,pp.4058– ≥√2π2(k−1)h(k−d1)+m∗Hh(md∗H)nd−21 4063,2012. [5] S.R.Blackburn,“Non-overlappingcodes,”arXivpreprintarXiv:1303.1026, The last inequality follows from the lower bound of Theorem 4. 2013. [6] G.M.Church,Y.Gao,andS.Kosuri,“Next-generation digitalinformation C. Concatenated Construction storageindna,”Science, vol.337,no.6102,pp.1628–1628, 2012. [7] N.Goldman,P.Bertone,S.Chen,C.Dessimoz,E.M.LeProust,B.Sipos, For a given integer s 1, suppose that 0 is a balanced andE.Birney,“Towardspractical,high-capacity,low-maintenanceinforma- error correctingk-WMU c≥ode overFs with miCnimumHamming tionstorageinsynthesized dna,”Nature,2013. q [8] S. Yazdi, Y. Yuan, J. Ma, H. Zhao, and O. Milenkovic, “A rewritable, distanced.Thecode maybeobtainedbyusingoneofthetwo 0 random-access dna-based storage system,” Scientific Reports, vol. 5, no. C methods described in this section. Our goal is to obtain a larger 14138,2015. family of balanced error-correcting k-WMU codes Fn by [9] H. M. Kiah, G. J. Puleo, and O. Milenkovic, “Codes for dna sequence C ⊆ q profiles,”arXivpreprintarXiv:1502.00517,2015. concatenating words in , where n=sm, m 1. 0 [10] S. Yazdi, H. M. Kiah, E. R. Garcia, J. Ma, H. Zhao, and O. Milenkovic, C ≥ “Dna-based storage: Trends and methods,” Molecular, Biological, and Construction 9. Select subsets ,..., such that 1 m 0 Multi-Scale Communications, IEEETransactions on,toappear. C C ⊆C [11] D.E.Knuth,“Efficient balanced codes,”Information Theory,IEEETrans- 1 m = actions on,vol.32,no.1,pp.51–53,1986. C ∩C ∅ [12] K. A. S. Immink, Codes for mass data storage systems. Shannon and ( = ) or ( = ) C1∩Cm−1 ∅ C2∩Cm ∅ Foundation Publisher, 2004. .. [13] N.deBruijn,D.Knuth,andS.Rice,“Theaverageheightofplantedplane . trees,”GraphTheoryandComputing/Ed. RCRead,p.15,1972. [14] E.Gilbert,“Synchronization ofbinarymessages,”InformationTheory,IRE and ( = ) or ... or ( = ) C1∩C2 ∅ Cm−1∩Cm ∅ Transactions on,vol.6,no.4,pp.470–477,1960. 6 [15] E.N.Gilbert,“Acomparisonofsignallingalphabets,”BellSystemTechnical Journal, vol.31,no.3,pp.504–522, 1952. [16] R.Varshamov,“Estimateofthenumberofsignalsinerrorcorrectingcodes,” inDokl.Akad.NaukSSSR,vol.117,no.5,1957,pp.739–741. [17] R. L.Graham and N.Sloane, “Lower bounds for constant weight codes,” InformationTheory,IEEETransactionson,vol.26,no.1,pp.37–43,1980. [18] S.M.Johnson,“Anewupperboundforerror-correctingcodes,”Information Theory, IRETransactions on,vol.8,no.3,pp.203–207,1962. [19] S.Tavares,“Astudyofsynchronizationtechniquesforbinarycycliccodes,” Ph.D.dissertation, Thesis(Ph.D.)–McGillUniversity, 1968.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.