IEEE/OSA JOURNAL OF LIGHTWAVE TECHNOLOGY
arXiv:1201.4106v1 [cs.IT] 19 Jan 2012

Staircase Codes: FEC for 100 Gb/s OTN

Benjamin P. Smith, Arash Farhood, Andrew Hunt, Frank R. Kschischang, Fellow, IEEE, and John Lodge, Fellow, IEEE

Abstract: Staircase codes, a new class of forward-error-correction (FEC) codes suitable for high-speed optical communications, are introduced. An ITU-T G.709-compatible staircase code with rate $R = 239/255$ is proposed, and FPGA-based simulation results are presented, exhibiting a net coding gain (NCG) of 9.41 dB at an output error rate of $10^{-15}$, an improvement of 0.42 dB relative to the best code from the ITU-T G.975.1 recommendation. An error floor analysis technique is presented, and the proposed code is shown to have an error floor at $4.0 \times 10^{-21}$.

Index Terms: Staircase codes, fiber-optic communications, forward error correction, product codes, low-density parity-check codes.

B. P. Smith and F. R. Kschischang are with the Electrical and Computer Engineering Department, University of Toronto, 10 King’s College Road, Toronto, Ontario M5S 3G4, Canada (e-mail: {ben, frank}@comm.utoronto.ca). A. Farhood is with Cortina Systems Inc., 535 Legget Drive, Suite 1000, Kanata, Ontario K2K 3B8, Canada. A. Hunt and J. Lodge are with the Communications Research Centre Canada, 3701 Carling Ave., Ottawa, Ontario K2H 8S2, Canada.

© 2011 IEEE

I. INTRODUCTION

Advances in physics (the invention of the laser, low-loss optical fiber, and the optical amplifier) have driven the exponential growth in worldwide data communications. However, as these technologies mature, system designers have increasingly focused on techniques from communication theory, including forward error correction, to simultaneously increase transmission capacity and decrease transmission costs.

One of the first proposals for FEC in an optical system appeared in [1], which demonstrated a shortened (224,216) Hamming code implementation at 565 Mbit/s. Since then, ITU-T Recommendations G.975 and G.975.1 have standardized more powerful codes for optical transport networks (OTNs). More recently, low-density parity-check (LDPC) codes [2], [3], which offer the potential for capacity-approaching performance, have been investigated, as aptly summarized in [4], [5]. While implementations exist at 10 Gb/s (for 10GBASE-T Ethernet networks), the blocklengths of such implementations (∼500–2000) are too short to provide performance close to capacity; the (2048,1723) RS-LDPC code is approximately 3 dB from the Shannon Limit at $10^{-15}$ [6], see also [7]. Another significant roadblock is that fiber-optic communication systems are typically required to provide bit-error-rates below $10^{-15}$. It is well known that capacity-approaching LDPC codes exhibit error floors [8], and achieving the targeted error rate would likely require concatenation with an outer code (e.g., as in [9]). In this work, we focus on product-like codes (by product-like codes, we mean any generalized LDPC code with algebraic component codes), since they possess properties that make them particularly suited to providing error correction in fiber-optic communication systems. In particular, for 100 Gb/s implementations, we argue that syndrome-based decoding of product-like codes is significantly more efficient than message-passing decoding of LDPC codes.

This paper presents a new class of high-rate binary error-correcting codes, staircase codes, whose construction combines ideas from convolutional and block coding. Indeed, staircase codes can be interpreted as having a ‘continuous’ product-like construction. In the context of wireless communications, related code constructions include braided block codes [10], braided convolutional codes [11], diamond codes [12] and cross parity check convolutional codes [13], each of which is related to the recurrent codes of Wyner-Ash [14]. However, these proposals considered soft decoding of the component codes, which is unsuitable for high-speed fiber-optic communications. Herein, we describe a syndrome-based decoder for staircase codes that provides excellent performance with an efficient decoder implementation.

In Section II, we review the specifications and performance of FEC codes defined in ITU-T Recommendations G.975 and G.975.1. In Section III, we describe the syndrome-based decoder for product-like codes, and argue that it results in a decoder data-flow that is more than two orders of magnitude smaller than that of the message-passing decoder of an LDPC code. Staircase codes are presented in Section IV, and a G.709-compatible staircase code is proposed. In Section V, we present an analytical method for determining the error floor of iteratively decoded staircase codes, and show that the proposed staircase code has an error floor at $4.0 \times 10^{-21}$. Finally, in Section VI, we present FPGA-based simulation results, illustrating that the proposed code provides a 9.41 dB NCG at an output error rate of $10^{-15}$, an improvement of 0.42 dB relative to the best code from the ITU-T G.975.1 recommendation, and only 0.56 dB from the Shannon Limit.

II. EXISTING PROPOSALS

A. ITU-T Recommendation G.975

The first error-correction code standardized for optical communications was the (255,239) Reed-Solomon code, with symbols in $\mathbb{F}_{2^8}$, capable of correcting up to 8 symbol errors in any codeword. For an output-error-rate of $10^{-15}$, the NCG of the RS code is 6.2 dB, which is 3.77 dB from capacity. In order to provide improved burst-error correction, 16 codewords are block-interleaved, providing correction for bursts of as many as 1024 transmitted bits.
A framing row consists of $16 \cdot 255 \cdot 8$ bits, 30592 of which are information bits, and the remaining 2048 of which are parity. The resulting framing structure (a frame consists of four rows) is standardized in ITU-T Recommendation G.709, and remains the required framing structure for OTNs; as a direct result, the coding rate of any candidate code must be $R = 239/255$.

B. ITU-T Recommendation G.975.1

As per-channel data rates increased to 10 Gb/s, and the capabilities of high-speed electronics improved, the (255,239) RS code was replaced with stronger error-correcting codes. In ITU-T Recommendation G.975.1, several ‘next-generation’ coding schemes were proposed; among the many proposals, the common mechanism for increased coding gain was the use of concatenated coding schemes with iterative hard-decision decoding. We now describe four of the best proposals, which will motivate our approach in Section IV.

In Appendix I.3 of G.975.1, a serially concatenated coding scheme is described, with outer (3860,3824) binary BCH code and inner (2040,1930) binary BCH code, which are obtained by shortening their respective mother codes. First, $30592 = 8 \cdot 3824$ information bits are divided into 8 units, each of which is encoded by the outer code; we will refer to the resulting unit of 30880 bits as a ‘block’. Prior to encoding by the inner code, the contents of consecutive blocks are interleaved (in a ‘continuous’ fashion, similar to convolutional interleavers [15]). Specifically, each inner codeword in a given block involves ‘information’ bits from each of the eight preceding ‘outer’ blocks. Note that the interleaving step increases the effective block length of the overall code, but it necessitates a sliding-window style decoding algorithm, due to the continuous nature of the interleaver. Furthermore, unlike in a product code, the parity bits of the inner code are protected by a single component codeword, which reduces their level of protection. For an output-error-rate of $10^{-15}$, the NCG of the I.3 code is 8.99 dB, which is 0.98 dB from capacity.

In Appendix I.4 of G.975.1, a serially concatenated scheme with (shortened versions of) an outer (1023,1007) RS code and (shortened versions of) an inner (2047,1952) binary BCH code is proposed. After encoding 122368 bits with the outer code, the coded bits are block interleaved and encoded by the inner BCH code, resulting in a block length of 130560 bits, i.e., exactly one G.709 frame. As in the previous case, the parity bits of the inner code are singly protected. For an output-error-rate of $10^{-15}$, the NCG of the I.4 code is 8.67 dB, which is 1.3 dB from capacity.

In Appendix I.5 of G.975.1, a serially concatenated scheme with an outer (1901,1855) RS code and an inner $(512,502) \times (510,500)$ extended-Hamming product code is described. Iterative decoding is applied to the inner product code, after which the outer code is decoded; the purpose of the outer code is to eliminate the error floor of the inner code, since the inner code has small stall patterns (see Section V). For an output-error-rate of $10^{-15}$, the NCG of the I.5 code is 8.5 dB, which is 1.47 dB from capacity.

Finally, in Appendix I.9 of G.975.1, a product-like code with (1020,988) doubly-extended binary BCH component codes is proposed. The overall code is described in terms of a $512 \times 1020$ matrix of bits, in which the bits along both the rows of the matrix and a particular choice of ‘diagonals’ must form valid codewords in the component code. Since the diagonals are chosen to include 2 bits in every row, any diagonal codeword has two bits in common with any row codeword; in contrast, for a product code, any row and column have exactly one bit in common. Note that the I.9 construction achieves a product-like structure (the choice of diagonals ensures that each bit is protected by two component codewords) with essentially half the overall block length of the related product code (even so, the I.9 code has the longest block length among all G.975.1 proposals). However, the choice of diagonals decreases the size of the smallest stall patterns, introducing an error floor above $10^{-14}$. For an output-error-rate of $2 \cdot 10^{-14}$, the NCG of the I.9 code is 8.67 dB, which is 1.3 dB from capacity.

III. LDPC VS. PRODUCT CODES

In this section, we present a high-level view of iterative decoders for LDPC and product codes. Due to the differences in their implementations, a precise comparison of their implementation complexities is difficult. Nevertheless, since the communication complexity of message passing is a significant challenge in LDPC decoder design, we consider the decoder data-flow, i.e., the rate of routing/storing messages, as a surrogate for the implementation complexity.

A. Decoder Data-Flow Comparison

We consider a system that transmits information at $D$ bits/s using a binary error-correcting code of rate $R$ (so that hard decisions at $D/R$ bits/s are input to the decoder), and a decoder that operates at a clock frequency of $f_c$ Hz.

1) LDPC Code: We consider an LDPC decoder that implements sum-product decoding (or some quantized approximation) with a parallel-flooding schedule. We assume $q$-bit messages internal to the decoder, an average variable node degree $d_{av}$, and $N$ decoder iterations; typically, $q$ is 4 or 5 bits, $d_{av} \approx 3$, and $N \sim 15$–$25$. Initially, hard decisions are input to the decoder at a rate of $D/R$ bits/s and stored in flip-flop registers. At each iteration, variable nodes compute and broadcast $q$-bit messages over every edge, and similarly for the check nodes, i.e., $2qd_{av}$ bits are broadcast per iteration per variable node. Since bits arrive from the channel at $D/R$ bits/s, the corresponding internal data-flow per iteration is then $2Dqd_{av}/R$, and the total data-flow, including initial loading of 1-bit channel messages, is

$$F_{\mathrm{LDPC}} = \frac{D}{R} + \frac{2NDqd_{av}}{R} \approx \frac{2NDqd_{av}}{R}.$$

For $N = 20$, $q = 4$, $d_{av} = 3$, $F_{\mathrm{LDPC}} \approx 480D/R$, which corresponds to a data-flow of more than 48 Tb/s for 100 Gb/s systems.
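As a quick check of the arithmetic, here is a minimal Python sketch that evaluates $F_{\mathrm{LDPC}}$ with the parameters quoted above, assuming $D = 100$ Gb/s and $R = 239/255$; the function name `ldpc_dataflow` is illustrative, not from the paper.

```python
def ldpc_dataflow(D, R, N, q, d_av):
    """Total LDPC decoder data-flow (bits/s): 1-bit channel loads at D/R bits/s plus
    2*q*d_av message bits per variable node per iteration, over N iterations."""
    return D / R + 2 * N * D * q * d_av / R


D = 100e9              # information rate (bits/s)
R = 239 / 255          # G.709 code rate
N, q, d_av = 20, 4, 3  # iterations, message width (bits), average variable-node degree

total = ldpc_dataflow(D, R, N, q, d_av)
dominant = 2 * N * q * d_av * D / R      # the approximation 480 D/R
print(f"multiplier on D/R: {2 * N * q * d_av}")   # 480
print(f"F_LDPC = {total / 1e12:.1f} Tb/s (dominant term {dominant / 1e12:.1f} Tb/s)")
```

The channel-loading term changes the result by only a factor of 481/480, which is why the text drops it.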
Fig. 1. Data-flow in an LDPC decoder (hard decisions loaded into variable-node flip-flops at $D/R$ bits/s, then $N$ iterations of variable-to-check and check-to-variable broadcasts at $2Dqd_{av}/R$ bits/s per iteration).

2) Product Code: When the component codes of a product code can be efficiently decoded via syndromes (e.g., BCH codes), there exists an especially efficient decoder for the product code. Briefly, by operating exclusively in the ‘syndrome domain’, which compresses the received signal, and passing only $\le t$ messages per (component) decoding (for $t$-error-correcting component codes), the implementation complexity of decoding is significantly reduced.

The following is a step-by-step description of the decoding algorithm:
1) From the received data, compute and store the syndrome for each row and column codeword. Store a copy of the received data in memory R.
2) Decode those non-zero syndromes corresponding to row codewords¹. In the event of a successful decoding, set the syndrome to zero, flip the corresponding $t$ or fewer positions in memory R, and update the $t$ or fewer affected column syndromes by a masking operation.
3) Repeat Step 2, reversing the roles of rows and columns.
4) If any syndromes are non-zero, and fewer than the maximum number of iterations have been performed, go to Step 2. Otherwise, output the contents of memory R.

¹ In practice, the syndrome corresponding to a fixed row is decoded only if its value has changed since its last decoding.

We quantify the complexity of decoding a product code by its decoder data-flow. At first glance, it may seem that this approach ignores the complexity of decoding the (component) $t$-error-correcting BCH codewords. However, for relatively small $t$, the decoding of a component codeword can be efficiently decomposed into a series of look-up table operations, for which the data-flow interpretation is well justified. In this section, we ignore the data-flow contribution of the BCH decoding algorithm, but we return to this point in the Appendix, where it is shown that the corresponding data-flow is negligible.

We assume that rows are encoded by a $t_1$-error-correcting $(n_1, k_1 = n_1 - r_1)$ BCH code, and the columns are encoded by a $t_2$-error-correcting $(n_2, k_2 = n_2 - r_2)$ BCH code, for an overall rate $R = R_1 R_2$. We assume each row/column codeword is decoded $v$ times (on average, over the course of decoding the overall product code), where typically $v$ ranges from 3 to 4.

The hard decisions from the channel, at $D/R$ bits/s, are written to a data RAM, in addition to being processed by a syndrome computation/storage device. In contrast to the LDPC decoder, the clock frequency $f_c$ plays a central role here, namely in the data-flow of the initial syndrome calculation.

Fig. 2. Data-flow in the initial syndrome computation (an input bus of $D/(Rf_c)$ bits per clock cycle feeds a masking tree that updates a row syndrome at $r_1 f_c$ bits/s, and a look-up table that updates column syndromes at $r_2 f_c$ bits/s).

Referring to Fig. 2, and assuming that the bits in a product code are transmitted row by row, the input bus-width (i.e., the number of input bits per decoder clock cycle) is $D/(Rf_c)$ bits. Now, assuming these bits correspond to a single row of the product code, each non-zero bit corresponds to some $r_1$-bit mask (i.e., the corresponding column of the parity-check matrix of the row code); the modulo-2 sum of these masks is computed by a masking tree, and the $r_1$-bit output is masked (XORed) with the current contents of the corresponding (syndrome) flip-flop register. That is, each clock cycle causes an $r_1$-bit mask to be added to the contents of the corresponding row in the syndrome bank. Of course, each received bit also affects a distinct column syndrome; however, the same $r_2$-bit mask is applied (when the corresponding received bit is non-zero) to each of the involved column syndromes, so the corresponding data-flow is $r_2$ bits per clock cycle.

Once the syndromes are computed from the received data, iterative decoding commences. To perform a row decoding, an $r_1$-bit syndrome is read from the syndrome bank. Since there are $n_2$ row codewords, and each row is decoded on average $v$ times, the corresponding data-flow from the syndrome bank to the row decoder is

$$\frac{r_1 n_2 v D}{R n_1 n_2} = \frac{r_1 v D}{R n_1} \ \text{bits/s}.$$

For each row decoding, at most $t_1$ positions are corrected, each of which is specified by $\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil$ bits. Therefore, the data-flow from the row decoder to the data RAM is

$$\frac{t_1 n_2 v D\left(\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil\right)}{R n_1 n_2} = \frac{t_1 v D\left(\lceil \log_2 n_1 \rceil + \lceil \log_2 n_2 \rceil\right)}{R n_1} \ \text{bits/s}.$$

Furthermore, for each corrected bit, an $r_2$-bit mask must be applied to the corresponding column syndrome, which yields a data-flow from the row decoder to the syndrome bank of

$$\frac{t_1 n_2 r_2 v D}{R n_1 n_2} = \frac{t_1 r_2 v D}{R n_1} \ \text{bits/s}.$$

A similar analysis can be applied to column decodings. In total, the decoder data-flow is

$$F_P = \frac{D}{R} + (r_1 + r_2) f_c + \frac{Dv}{R n_1}\left(t_1 \lceil \log_2 n_1 \rceil + t_1 \lceil \log_2 n_2 \rceil + r_1 + t_1 r_2\right) + \frac{Dv}{R n_2}\left(t_2 \lceil \log_2 n_1 \rceil + t_2 \lceil \log_2 n_2 \rceil + r_2 + t_2 r_1\right).$$

In this work, we focus on codes for which $n_1 = n_2 \approx 1000$, $r_1 = r_2 = 32$, $t_1 = t_2 = 3$, and the decoder is assumed to operate at $f_c \approx 400$ MHz. For $v = 4$, we then have a data-flow of approximately 293 Gb/s. Note that this is more than two orders of magnitude smaller than the corresponding data-flow for LDPC decoding. Intuitively, the advantage arises from two facts. First, when $R_1 > 1/2$ and $R_2 > 1/2$, syndromes provide a compressed representation of the received signal. Second, the algebraic component codes admit an economical message-passing scheme, in the sense that message updates are only required for the small fraction of bits that are corrected by a particular (component) decoding.
Fig. 3. Data-flow in a product-code decoder (syndrome calculation and loading at $(r_1+r_2)f_c$ bits/s, data RAM written at $D/R$ bits/s and read at $D$ bits/s, and row and column decoders exchanging syndromes and corrections at the rates derived in Section III-A2).

IV. STAIRCASE CODES

The staircase code construction combines ideas from recursive convolutional coding and block coding. Staircase codes are completely characterized by the relationship between successive matrices of symbols. Specifically, consider the (infinite) sequence $B_0, B_1, B_2, \ldots$ of $m$-by-$m$ matrices $B_i$, $i \in \mathbb{Z}^+$. Herein, we restrict our attention to $B_i$ with elements in $\mathbb{F}_2$, but an analogous construction applies in the non-binary case.

Block $B_0$ is initialized to a reference state known to the encoder-decoder pair, e.g., block $B_0$ could be initialized to the all-zeros state, i.e., an $m$-by-$m$ array of zero symbols. Furthermore, we select a conventional FEC code (e.g., Hamming, BCH, Reed-Solomon, etc.) in systematic form to serve as the component code; this code, which we henceforth refer to as $\mathcal{C}$, is selected to have block length $2m$ symbols, $r$ of which are parity symbols.

Encoding proceeds recursively on the $B_i$. For each $i$, $m(m-r)$ information symbols (from the streaming source) are arranged into the $m-r$ leftmost columns of $B_i$; we denote this sub-matrix by $B_{i,L}$. Then, the entries of the rightmost $r$ columns (this sub-matrix is denoted by $B_{i,R}$) are specified as follows:
1) Form the $m \times (2m-r)$ matrix $A = [B_{i-1}^T \; B_{i,L}]$, where $B_{i-1}^T$ is the matrix transpose of $B_{i-1}$.
2) Compute the entries of $B_{i,R}$ such that each of the rows of the matrix $[B_{i-1}^T \; B_{i,L} \; B_{i,R}]$ is a valid codeword of $\mathcal{C}$. That is, the elements in the $j$th row of $B_{i,R}$ are exactly the $r$ parity symbols that result from encoding the $2m-r$ ‘information’ symbols in the $j$th row of $A$.

Generally, the relationship between successive blocks in a staircase code satisfies the following relation: for any $i \ge 1$, each of the rows of the matrix $[B_{i-1}^T \; B_i]$ is a valid codeword in $\mathcal{C}$. An equivalent description, from which the term ‘staircase codes’ originates, is suggested by Fig. 4, in which (the concatenation of the symbols in) every row (and every column) in the ‘staircase’ is a valid codeword of $\mathcal{C}$; this representation suggests their connection to product codes. However, staircase codes are naturally unterminated (i.e., their block length is indeterminate), and thus admit a range of decoding strategies with varying latencies. Most importantly, we will see that they outperform product codes.

Fig. 4. The ‘staircase’ visualization of staircase codes (blocks $B_0^T, B_1, B_2^T, B_3, B_4^T$, each $m \times m$).

The rate of a staircase code is

$$R_s = 1 - \frac{r}{m},$$

since encoding produces $r$ parity symbols for each set of $m-r$ ‘new’ information symbols. However, note that the related product code has rate

$$R_p = \left(\frac{2m-r}{2m}\right)^2 = 1 - \frac{r}{m} + \frac{r^2}{4m^2},$$

which is greater than the rate of the staircase code. However, for sufficiently high rates, the difference is small, and staircase codes outperform product codes of the same rate.

From the perspective of transmitter latency, which includes encoding latency and frame-mapping latency, staircase codes have the advantage (relative to product codes) that the effective rate of a component codeword (i.e., the ratio of ‘new’ information symbols, $m-r$, to the total number of ‘new’ symbols, $m$) is exactly the rate of the overall code. Therefore, the encoder produces parity at a ‘regular’ rate, which enables the design of a frame-mapper that minimizes the transmitter latency.

We note that staircase codes can be interpreted as generalized LDPC codes with a systematic encoder and an indeterminate block length, which admits decoding algorithms with a range of latencies.

Using arguments analogous to those used for product codes, a staircase code whose component code $\mathcal{C}$ is $t$-error-correcting with minimum distance $d_{\min}$ has a Hamming distance of at least $d_{\min}^2$ between any two distinct staircase codewords.

A. Decoding Algorithm

Because staircase codes are unterminated, decoding can be accomplished in a sliding-window fashion, in which the decoder operates on the received bits corresponding to $L$ consecutively received blocks $B_i, B_{i+1}, \ldots, B_{i+L-1}$. For a fixed $i$, the decoder iteratively decodes as follows. First, those component codewords that ‘terminate’ in block $B_{i+L-1}$ (i.e., whose parity bits are in $B_{i+L-1}$) are decoded; since every symbol is involved in two component codewords, the corresponding syndrome updates are performed, as in Section III-A2. Next, those codewords that terminate in block $B_{i+L-2}$ are decoded. This process continues until those codewords that terminate in block $B_i$ are decoded. Now, since decoding those codewords terminating in some block $B_j$ affects those codewords that terminate in block $B_{j+1}$, it is beneficial to return to $B_{i+L-1}$ and to repeat the process. This iterative process continues until some maximum number of iterations is performed, at which time the decoder outputs its estimate for the contents of $B_i$, accepts a new block $B_{i+L}$, and the entire process repeats (i.e., the decoding window slides one block to the ‘right’).
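To make the encoding rule and the sliding-window schedule concrete, here is a minimal, self-contained sketch. It is a toy under stated assumptions, not the paper's decoder: $m = 15$, $r = 5$, and the component code is a length-30 shortened Hamming code with $t = 1$ (chosen so that the parity bits are simply the syndrome of the information part) instead of a BCH code. The decoder recomputes each component syndrome from the bits rather than maintaining a syndrome bank, and it treats blocks that have left the window as final. All names (`encode`, `decode`, `codeword_bits`) are hypothetical.

```python
import random

# Toy parameters (assumptions, not the G.709 code): m = 15, r = 5, and a component code
# of length 2m = 30 that is a shortened Hamming code correcting t = 1 error.
M, R = 15, 5
K_INFO = 2 * M - R
# Parity-check column of each position: distinct nonzero 5-bit values; the last r
# positions get the unit vectors, so the r parity bits equal the syndrome of the rest.
H = [h for h in range(1, 2 ** R) if h & (h - 1)][:K_INFO] + [1 << q for q in range(R)]
POS = {h: p for p, h in enumerate(H)}        # nonzero syndrome -> single-error position


def syndrome(bits):
    s = 0
    for p, b in enumerate(bits):
        if b:
            s ^= H[p]
    return s


def encode(info_blocks):
    """info_blocks: list of m x (m - r) matrices B_{i,L}; returns [B_0, B_1, ...]."""
    blocks = [[[0] * M for _ in range(M)]]   # B_0: all-zero reference state
    for info in info_blocks:
        prev, new = blocks[-1], []
        for j in range(M):
            row_a = [prev[p][j] for p in range(M)] + list(info[j])  # row j of A = [B_{i-1}^T B_{i,L}]
            s = syndrome(row_a)
            new.append(list(info[j]) + [(s >> q) & 1 for q in range(R)])  # row j of B_{i,R}
        blocks.append(new)
    return blocks


def codeword_bits(blocks, k, j):
    """Row j of [B_{k-1}^T B_k], i.e. the component codeword that terminates in B_k."""
    return [blocks[k - 1][p][j] for p in range(M)] + blocks[k][j]


def decode(received, L=3, iters=4):
    """Sliding-window decoding; blocks that have left the window are treated as final."""
    rx = [[row[:] for row in blk] for blk in received]
    n = len(rx) - 1
    for i in range(1, n + 1):                        # window B_i, ..., B_{i+L-1}
        window = range(i, min(i + L - 1, n) + 1)
        for _ in range(iters):
            for k in reversed(window):               # codewords ending in the newest block first
                for j in range(M):
                    s = syndrome(codeword_bits(rx, k, j))
                    if s == 0 or s not in POS:
                        continue                     # valid codeword, or uncorrectable syndrome
                    p = POS[s]
                    if p >= M:
                        rx[k][j][p - M] ^= 1         # flip a bit of B_k
                    elif k - 1 >= i:
                        rx[k - 1][p][j] ^= 1         # flip a bit of B_{k-1} (still in window)
    return rx                                        # rx[i] is the estimate of B_i


random.seed(1)
info = [[[random.randint(0, 1) for _ in range(M - R)] for _ in range(M)] for _ in range(5)]
tx = encode(info)
rx = [[row[:] for row in blk] for blk in tx]
for k, row, col in [(1, 0, 0), (2, 3, 7), (2, 3, 12), (4, 10, 2)]:
    rx[k][row][col] ^= 1                             # two errors share row 3 of B_2
print(decode(rx) == tx)                              # True
```

The injected pattern places two errors in the same row of $B_2$, which a $t = 1$ decoder of that row alone would miscorrect; because the codewords terminating in $B_3$ (which see each of those errors alone) are decoded first, both errors are removed before that row is visited.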
Fig. 5. A multi-edge-type graphical representation of staircase codes, with edge types ranging from most reliable to least reliable. Π is a standard block interleaver, i.e., it represents the transpose operation on an $m$-by-$m$ matrix.

B. Multi-Edge-Type Interpretation

Staircase codes have a simple graphical representation, which provides a multi-edge-type [3] interpretation of their construction. The term ‘multi-edge-type’ was originally applied to describe a refined class of irregular LDPC codes, in which variable nodes (and check nodes) are classified by their degrees with respect to a set of edge types. Intuitively, the introduction of multiple edge types allows degree-one variable nodes, punctured variable nodes, and other beneficial features that are not admitted by the conventional irregular ensemble. In turn, better performance for finite blocklengths and fixed decoding complexities is possible.

In Fig. 5, we present the factor graph representation of a decoder that operates on a window of $L = 4$ blocks; the graph for general $L$ follows in an obvious way. Dotted variable nodes indicate symbols whose value was decoded in the previous stage of decoding. The key observation is that when these symbols are correctly decoded (which is essentially always the case, since the output BER is required to be less than $10^{-15}$), the component codewords in which they are involved are effectively shortened by $m$ symbols. Therefore, the most reliable messages are passed over those edges connecting variable nodes to the shortened (component) codewords, as indicated in Fig. 5. On the other hand, the rightmost collection of variable nodes are (with respect to the current decoding window) only involved in a single component codeword, and thus the edges to which they are connected carry the least reliable messages. Due to the nature of iterative decoding, the intermediate edges carry messages whose reliability lies between these two extremes.

C. A G.709-Compatible Staircase Code

ITU-T Recommendation G.709 defines the framing structure and error-correcting coding rate for OTNs. For our purposes, it suffices to know that an optical frame consists of 130560 bits, 122368 of which are information bits, and the remaining 8192 are parity bits, which corresponds to error-correcting codes of rate $R = 239/255$. Since $(510-32)/510 = 239/255$, we will consider a component code with $m = 510$ and $r = 32$. Specifically, the binary $(n = 1023, k = 993, t = 3)$ BCH code with generator polynomial $(x^{10}+x^3+1)(x^{10}+x^3+x^2+x+1)(x^{10}+x^8+x^3+x^2+1)$ is adapted to provide an additional 2-bit error-detecting mechanism, resulting in the generator polynomial²

$$g(x) = (x^{10}+x^3+1)(x^{10}+x^3+x^2+x+1)(x^{10}+x^8+x^3+x^2+1)(x^2+1).$$

² This is the code applied to the rows (but not the slopes) of the I.9 code in G.975.1.

In order to provide a simple mapping to the G.709 frame, we first note that $2 \cdot 130560 = 510 \cdot 512$. This leads us to define a slight generalization of staircase codes, in which the blocks $B_i$ consist of 512 rows of 510 bits. The encoding rule is modified as follows:
1) Form the $512 \times (512+478)$ matrix $A = [\hat{B}_{i-1}^T \; B_{i,L}]$, where $\hat{B}_{i-1}^T$ is obtained by appending two all-zero rows to the top of the matrix transpose of $B_{i-1}$.
2) Compute the entries of $B_{i,R}$ such that each of the rows of the matrix $[\hat{B}_{i-1}^T \; B_{i,L} \; B_{i,R}]$ is a valid codeword of $\mathcal{C}$. That is, the elements in the $j$th row of $B_{i,R}$ are exactly the 32 parity symbols that result from encoding the 990 ‘information’ symbols in the $j$th row of $A$.

Here, $\mathcal{C}$ is the code obtained by shortening the code generated by $g(x)$ by one bit, since our overall codeword length is $510 + 512 = 1022$.
V. ERROR FLOOR ANALYSIS

For iteratively decoded codes, an error floor (in the output bit-error-rate) can often be attributed to error patterns that ‘confuse’ the decoder, even though such error patterns could easily be corrected by a maximum-likelihood decoder. In the context of LDPC codes, these error patterns are often referred to as trapping sets [8]. In the case of product-like codes with an iterative hard-decision decoding algorithm, we will refer to them as stall patterns, due to the fact that the decoder gets locked in a state in which no updates are performed, i.e., the decoder stalls, as in Fig. 6.

Fig. 6. A stall pattern for a staircase code with a triple-error-correcting component code (spanning blocks $B_i^T, B_{i+1}, B_{i+2}^T, B_{i+3}$). Since every involved component codeword has 4 errors, decoding stalls.

Definition 1: A stall pattern is a set $s$ of codeword positions for which every row and column involving positions in $s$ has at least $t+1$ positions in $s$.

We note that this definition includes stall patterns that are correctable, since an incorrect decoding may fortuitously cause one or more bits in $s$ to be corrected, which could then lead to all bits in $s$ eventually being corrected. In this section, we obtain an estimate for the error floor by overbounding the probabilities of these events, and pessimistically assuming that every stall pattern is uncorrectable (i.e., if any stall pattern appears during the course of decoding, it will appear in the final output). The methods presented for the error floor analysis apply to a general staircase code, but for simplicity of presentation, we will focus on a staircase code with $m = 510$ and doubly-extended triple-error-correcting component codes.

A. A Union Bound Technique

Due to the streaming nature of staircase codes, it is necessary to account for stall patterns that span (possibly multiple) consecutive blocks. In order to determine the bit-error-rate due to stall patterns, we consider a fixed block $B_i$, and the set of stall patterns that include positions in $B_i$. Specifically, we ‘assign’ to $B_i$ those stall patterns that include symbols in $B_i$ (and possibly additional positions in $B_{i+1}$) but no symbols in $B_{i-1}$. Let $S_i$ represent the set of stall patterns assigned to $B_i$. By the union bound, we then have

$$\mathrm{BER}_{\mathrm{floor}} \le \sum_{s \in S_i} \Pr[\text{bits in } s \text{ in error}] \cdot \frac{|s|}{510^2}.$$

Therefore, bounding the error floor amounts to enumerating the set $S_i$, and evaluating the probabilities of its elements being in error.

B. Bounding the Contribution Due to Minimal Stalls

Definition 2: A minimal stall pattern has the property that there are only $t+1$ rows with positions in $s$, and only $t+1$ columns with positions in $s$.

The minimal stall patterns of a staircase code can be counted in a straightforward manner; the multiplicity of minimal stall patterns that are assigned to $B_i$ is

$$M_{\min} = \binom{510}{4} \sum_{j=1}^{4} \binom{510}{j} \binom{510}{4-j},$$

and we refer to the set of minimal stall patterns by $S_{\min}$. The probability that the positions in some minimal stall pattern $s$ are received in error is $p^{16}$.

Next, we consider the case in which not all positions in some minimal stall pattern are received in error, but, due to incorrect decoding(s), all positions in $s$ are (at some point during decoding) simultaneously in error. For some fixed $s$ and $l$, $1 \le l \le 16$, there are $\binom{16}{l}$ ways in which $16-l$ positions in $s$ can be received in error. For the moment, let’s assume that erroneous bit flips occur independently with some probability $\zeta$, and that $\zeta$ does not depend on $l$. Then we can overbound the probability that a particular minimal stall $s$ occurs by

$$\sum_{l=0}^{16} \binom{16}{l} p^{16-l} \zeta^{l} = (p+\zeta)^{16}.$$

In order to provide evidence in favor of these assumptions, Table I presents empirical estimates, for $l = 0$, $l = 1$ and $l = 2$, of the probability that a minimal stall pattern $s$ occurs during iterative decoding, given that $16-l$ positions in $s$ are (intentionally) received in error. Note that even if a minimal stall is received, there exists a non-zero probability that it will be corrected as a result of erroneous decodings; we will ignore this effect in our estimation, i.e., we make the worst-case assumption that any minimal stall persists. Furthermore, from the results for $l = 1$ and $l = 2$, it appears that our stated assumptions regarding $\zeta$ hold true, and $\zeta \approx 5.8 \times 10^{-4}$. For $l > 2$, we did not have access to sufficient computational resources for estimating the corresponding probabilities. Nevertheless, based on the evidence presented in Table I, the error floor contribution due to minimal stall patterns is estimated as

$$\frac{16}{510^2} \cdot M_{\min} \cdot (p+\zeta)^{16},$$

where $\zeta = 5.8 \times 10^{-4}$ when $p = 4.8 \times 10^{-3}$.

TABLE I
ESTIMATED PROBABILITY OF A MINIMAL STALL $s$, GIVEN THAT $16-l$ POSITIONS ARE RECEIVED IN ERROR
l    Estimated probability
0    149/150
1    1/1772
2    $(1/1725)^2$
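The minimal-stall estimate is straightforward to evaluate numerically. A short sketch, taking $p = 4.8 \times 10^{-3}$ and $\zeta = 5.8 \times 10^{-4}$ from the text (the summation index is written `j` to avoid a clash with the block size $m$):

```python
from math import comb

p, zeta = 4.8e-3, 5.8e-4      # channel BER and the empirically estimated flip probability

# Number of minimal stall patterns assigned to block B_i.
M_min = comb(510, 4) * sum(comb(510, j) * comb(510, 4 - j) for j in range(1, 5))
floor_min = 16 / 510**2 * M_min * (p + zeta) ** 16
print(f"M_min = {M_min:.3e}, floor contribution = {floor_min:.2e}")
```

By Vandermonde's identity, the inner sum equals $\binom{1020}{4} - \binom{510}{4}$, which gives a closed form for $M_{\min}$. The result is about $3.55 \times 10^{-21}$, matching the $(4,4)$ entry of Table II.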
C. Bounding the Contribution Due to Non-minimal Stalls

We now wish to account for the error floor contribution of non-minimal stalls, e.g., the stall pattern illustrated in Fig. 7. In the general case, a stall pattern $s$ includes codeword positions in $K$ rows and $L$ columns, $K \ge 4$, $L \ge 4$; we refer to these as $(K,L)$-stalls. Furthermore, each $(K,L)$-stall includes $l$ positions, $4 \cdot \max(K,L) \le l \le K \cdot L$, where the lower bound follows from the fact that every row and column (in the stall) includes at least 4 positions. Note that there are

$$A_{K,L} = \binom{510}{L} \sum_{j=1}^{K} \binom{510}{j} \binom{510}{K-j}$$

ways to select the involved rows and columns.

Fig. 7. A non-minimal stall pattern for a staircase code with a triple-error-correcting component code.

For a fixed $(K,L) \ne (4,4)$ and a fixed choice of rows and columns, we now proceed to overbound the contributions of candidate stall patterns. Without loss of generality, we assume that $K \ge L$, and note that there are $\binom{L}{4}^{K}$ ways of choosing $l = 4K$ elements (in the $L \cdot K$ ‘grid’ induced by the choice of rows and columns) such that each row includes exactly four elements, and that every stall pattern ‘contains’ at least one of these. Now, since a stall pattern includes $l$ elements, $4K \le l \le K \cdot L$, the number of stall patterns with $l$ elements is overbounded as

$$\binom{L}{4}^{K} \binom{K \cdot L - 4K}{l - 4K}.$$

For a general $(K,L) \ne (4,4)$, it follows that the number of stall patterns with $l$ elements, $4 \cdot \max(K,L) \le l \le K \cdot L$, is overbounded as

$$\binom{\min(K,L)}{4}^{\max(K,L)} \binom{K \cdot L - 4\max(K,L)}{l - 4\max(K,L)}.$$

Finally, over the choice of the $K$ rows and $L$ columns, there are

$$M^{l}_{K,L} = A_{K,L} \binom{\min(K,L)}{4}^{\max(K,L)} \binom{K \cdot L - 4\max(K,L)}{l - 4\max(K,L)}$$

$(K,L)$-stalls with $l$ elements. For a fixed $K$ and $L$, the contribution to the error floor can be estimated as

$$\sum_{l = 4\max(K,L)}^{K \cdot L} \frac{l}{510^2} \cdot M^{l}_{K,L} \cdot (p+\zeta)^{l},$$

and in Table II, we provide values for various $K$ and $L$, when $\zeta = 5.8 \times 10^{-4}$ and $p = 4.8 \times 10^{-3}$.
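The $(K,L)$ sum can be evaluated the same way. This sketch restricts itself to square stalls $K = L$, where the grid factor is unambiguous; it uses $\binom{\min(K,L)}{4}^{\max(K,L)}$, the form consistent with the $K \ge L$ count $\binom{L}{4}^{K}$. The helper names are hypothetical, and $p$, $\zeta$ are the values quoted above.

```python
from math import comb

P, ZETA, SIDE = 4.8e-3, 5.8e-4, 510      # p, zeta and the block dimension m = 510


def rows_and_columns(K, L):
    """A_{K,L}: ways to select the K rows and L columns of a (K,L)-stall."""
    return comb(SIDE, L) * sum(comb(SIDE, j) * comb(SIDE, K - j) for j in range(1, K + 1))


def stall_contribution(K, L, p=P, zeta=ZETA):
    """Error-floor contribution of (K,L)-stalls, summed over the stall size l."""
    big, small = max(K, L), min(K, L)
    grid = comb(small, 4) ** big               # four chosen positions on each of the `big` lines
    a = rows_and_columns(K, L)
    total = 0.0
    for l in range(4 * big, K * L + 1):
        count = a * grid * comb(K * L - 4 * big, l - 4 * big)   # M^l_{K,L}
        total += l / SIDE**2 * count * (p + zeta) ** l
    return total


square = {K: stall_contribution(K, K) for K in range(4, 8)}
for K, c in square.items():
    print(f"({K},{K}) stalls: {c:.2e}")
print(f"sum over square stalls: {sum(square.values()):.2e}")
```

Non-square pairs are left out of this sketch because $A_{K,L}$ is not symmetric in $K$ and $L$, and the sketch does not try to reproduce the orientation used for those rows; Table II lists all of them below $10^{-27}$, so the square terms essentially determine the $3.8 \times 10^{-21}$ total.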
Their construction admits low-latency encoding and Notethatthedominantcontributionto theerrorfloorisdue variable-latency decoding, and a decoding algorithm with tominimalstallpatterns(i.e.,K =L=4),andthattheoverall an efficient hardware implementation. For R = 239/255, a estimatefortheerrorfloorofthecodeis3.8×10−21. Finally, G.709-compatible staircase code was presented, and perfor- wenotethatbyasimilar(butmorecumbersome)analysis,the mance within 0.56 dB of the Shannon Limit at 10−15 was error floor of the G.709-compliantstaircase code is estimated provided via an FPGA-based simulation. to occur at 4.0×10−21. VI. SIMULATION RESULTS APPENDIX In Fig. 8, simulation results—generated in hardware on This section briefly describes known techniques for effi- an FPGA implementation—are provided for the G.709- cientlydecodingtriple-error-correctingbinaryBCHcodes,and compatible staircase code, for L = 7. We also present the discusses the data-flow associated with a lookup-table-based decoder architecture. TABLEII For a syndrome S = (S1,S3,S5), Si ∈ F2m, we first CONTRIBUTIONTOERRORFLOORESTIMATEOF(K,L)-STALLPATTERNS compute D3 = S13 +S3 and D5 = S15 +S5. A triple-error correcting decoder distinguishes the cases K L Contribution 4 4 3.55 10−21 v =0: S =S =S =0 4 5 7.81×10−28 v =1: S1 6=0,2D =3D =0 5 5 2.54×10−22 1 3 5 5 6 2.21×10−28 v =2: S1 6=0,D3 6=0,S1D5 =S3D3 6 6 1.40×10−23 v =3: D3 6=0,v 6=2, 6 7 1.49×10−29 7 7 8.53×10−25 wherev isthe numberofpositionsto invertinordertoobtain 7 8 1.83×10−32 a valid codeword. × IEEE/OSAJOURNALOFLIGHTWAVETECHNOLOGY 8 In order to determine the corresponding positions, a recip- The roots of the suppressed cubic f (z) can be found by Z rocal error-locator polynomial σ˜(x) is defined, the roots of lookupusingatablewith2mentries,eachofwhichisapairof which identify the positions. From [16], we have: elements in F2m. Therefore, in either case, decoding requires 2m bits to be read from a lookup-table memory. 
v =1: σ˜(x)=x+S 1 Finally, for n=n =n , the data-flow contribution of the v =2: σ˜(x)=x2+S x+D /S 1 2 1 3 1 lookup-table-based decoding architecture is 4mvD. For n = v =3: σ˜(x)=x3+S x2+bx+S b+D nR 1 1 3 1000, m = 10, v = 4, R = 239/255 and D = 100 Gb/s, the where corresponding data-flow is 17.1 Gb/s, which is small relative b=(S12S3+S5)/D3. to the data-flow that arises due to those effects considered in Section III-A2. When t = 2, note that all of the coefficients of σ˜(x) are nonzero. It remains to determine the roots of σ˜(x). For v = 1, REFERENCES it is trivial to determine the error location. For v = 2 or [1] W.D.Grover,“Forwarderrorcorrectionindispersion-limitedlightwave v = 3, lookup-based methods for solving the corresponding systems,”J.Lightw.Technol., vol.6,no.5,pp.643–645,May1988. [2] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: quadraticandcubicequationsaredescribedin[17],[18].Inthe MITPress,1963. remainder of this section, we briefly describe these methods, [3] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge, and discuss their data-flow. UK:CambridgeUniversity Press,2008. [4] I. B. Djordjevic, M. Arabaci, and L. L. Minkov, “Next generation For a quadratic equation f (x)=x2+ax+b with a6=0, X FEC for high-capacity communication in optical transport networks,” substitute x=ay to obtain J.Lightw.Technol.,vol.27,no.16,pp.3518–3530, Aug.2009. [5] T. Mizuochi, “Recent progress in forward error correction and its f (y)=a2(y2+y+b/a2). interplaywithtransmissionimpairments,”IEEEJ.Sel.TopicsQuantum Y Electron.,vol.12,no.4,pp.544–554, Jul.2006. If f (r) = 0 then f (ar) = 0. Thus the problem of finding [6] Z. Zhang, V. Anantharam, M. J. Wainwright, and B. Nikolic, “An Y X efficient 10GBASE-T ethernet LDPC decoder design with low error roots of f (x) reduces to the problem of finding roots of the X floors,” IEEEJ. Solid-State Circuits, vol. 45, no. 4, pp. 843–855, Apr. suppressed quadratic fY(y), which can be solved by lookup 2010. 
using a table with 2m entries, each of which is a pair of [7] A. Darabiha, A. Chan Carusone, and F. R. Kschischang, “Power re- elements in F2m. Therefore, when v = 2, decoding requires vdoulc.ti4o3n,tneoc.h8n,iqpupe.s1f8o3r5L–1D8P4C5,dAeucogd.e2r0s0,”8.IEEE J. Solid-State Circuits, 2m bits to be read from a lookup-table memory. [8] T. Richardson, “Error floors of LDPC codes,” in Proc. 41st Allerton Similarly, for a cubic equation f (x)=x3+ax2+bx+c, Conf.Comm.,Control,andComput.,Monticello, IL,2003. X [9] T.Mizuochietal.,“ExperimentaldemonstrationofconcatenatedLDPC substitute x=y+a to obtain and RS codes by FPGAs emulation,” IEEE Photon. Technol. Lett., f (y)=y3+(a2+b)y+ab+c. vol.21,no.18,pp.1302–1304, Sep.2009. Y [10] A. J. Feltstro¨m, D. Truhachev, M. Lentmaier, and K. S. Zigangirov, “Braided block codes,” IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. Note that yf (y) is a linearized polynomial with respect Y 2640–2658, Jun.2009. to F and hence the set of zeros of yf (y) is a vector space [11] W. Zhang, M. Lentmaier, K. S. Zigangirov, and D. J. Costello Jr., 2 Y over F . In particular, the roots of yf (y), if distinct, are of “Braided convolutional codes: A new class of turbo-like codes,” IEEE 2 Y Trans.Inf.Theory,vol.56,no.1,pp.316–331,Jan.2010. the form {0,r ,r ,r +r }. Thus only r and r need to be 1 2 1 2 1 2 [12] C. P.M. J. Baggen and L.M. G.M. Tolhuizen, “On diamond codes,” stored in the lookup table. IEEETrans.Inf.Theory,vol.43,no.5,pp.1400–1411, Sep.1997. Twocasesarise,dependingonthevalueofa2+b=D /D . [13] T.Fuja,C.Heegard, andM.Blaum, “Crossparitycheck convolutional 5 3 codes,” IEEE Trans. Inf. Theory, vol. 35, no. 6, pp. 1265–1276, Nov. IfD =0,sothata2+b=0,thenf (y)=y3+ab+c,andthe 5 Y 1989. roots can be found by finding the cube roots of ab+c=D , [14] A.D.WynerandR.B.Ash,“Analysisofrecurrentcodes,”IEEETrans. 3 which requires lookup using a table with 2m entries, each Inf.Theory,vol.9,no.3,pp.143–156,1963. 
[15] G.D.ForneyJr.,“Burst-correctingcodesfortheclassicburstychannel,” of which is a pair of elements in F2m. If D5 6= 0, so that IEEETrans.Commun.,vol.19,no.5,pp.772–781,Oct.1971. a2+b6=0, substitute y =(a2+b)1/2z to obtain [16] I. S. Reed and X. Chen, Error-Control Coding for Data Networks. Boston,MA:KluwerAcademic Publishers,1999. f (z)=(a2+b)3/2(z3+z+(ab+c)/(a2+b)3/2), [17] R. T. Chien, B. D. Cunningham, and I. B. Oldham, “Hybrid methods Z for finding roots of a polynomial with application to BCH decoding,” where IEEETrans.Inf.Theory,vol.15,pp.329–335,Mar.1969. ab+c D5 1/2 [18] E. R. Berlekamp, H. Rumsey, and G. Solomon, “On the solution of = 3 . algebraic equations overfinitefields,”Inform.Contr.,vol.10,pp.553– (a2+b)3/2 (cid:18)D3(cid:19) 564,1967. 5
