Construction of Block Orthogonal STBCs and Reducing Their Sphere Decoding Complexity

G. R. Jithamithra and B. Sundar Rajan,
Dept. of ECE, Indian Institute of Science, Bangalore 560012, India
Email: {jithamithra,bsrajan}@ece.iisc.ernet.in

arXiv:1210.3449v1 [cs.IT] 12 Oct 2012

Abstract—Construction of high rate Space Time Block Codes (STBCs) with low decoding complexity has been studied widely using techniques such as sphere decoding and non Maximum-Likelihood (ML) decoders such as the QR decomposition decoder with M paths (QRDM decoder). Recently Ren et al. presented a new class of STBCs known as the block orthogonal STBCs (BOSTBCs), which could be exploited by the QRDM decoders to achieve significant decoding complexity reduction without performance loss. The block orthogonal property of the codes constructed was however only shown via simulations. In this paper, we give analytical proofs for the block orthogonal structure of various existing codes in literature including the codes constructed in the paper by Ren et al. We show that codes formed as the sum of Clifford Unitary Weight Designs (CUWDs) or Coordinate Interleaved Orthogonal Designs (CIODs) exhibit block orthogonal structure. We also provide new constructions of block orthogonal codes from Cyclic Division Algebras (CDAs) and Crossed-Product Algebras (CPAs). In addition, we show how the block orthogonal property of the STBCs can be exploited to reduce the decoding complexity of a sphere decoder using a depth first search approach. Simulation results of the decoding complexity show a 30% reduction in the number of floating point operations (FLOPS) of BOSTBCs as compared to STBCs without the block orthogonal structure.

I. INTRODUCTION & PRELIMINARIES

Consider a minimal-delay space-time coded Rayleigh quasi-static flat fading MIMO channel with full channel state information at the receiver (CSIR). The input output relation for such a system is given by

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{N}, \tag{1}$$

where $\mathbf{H} \in \mathbb{C}^{n_r \times n_t}$ is the channel matrix and $\mathbf{N} \in \mathbb{C}^{n_r \times n_t}$ is the additive noise. Both $\mathbf{H}$ and $\mathbf{N}$ have entries that are i.i.d. complex-Gaussian with zero mean and variance 1 and $N_0$ respectively. The transmitted codeword is $\mathbf{X} \in \mathbb{C}^{n_t \times n_t}$ and $\mathbf{Y} \in \mathbb{C}^{n_r \times n_t}$ is the received matrix. The ML decoding metric to minimize over all possible values of the codeword $\mathbf{X}$ is

$$M(\mathbf{X}) = \|\mathbf{Y} - \mathbf{H}\mathbf{X}\|^2. \tag{2}$$

Definition 1: [1]: A linear STBC $\mathcal{C}$ over a real (1-dimensional) signal set $\mathcal{S}$ is a finite set of $n_t \times n_t$ matrices, where any codeword matrix belonging to the code $\mathcal{C}$ is obtained from

$$\mathbf{X}(x_1, x_2, \ldots, x_K) = \sum_{i=1}^{K} x_i \mathbf{A}_i, \tag{3}$$

by letting the real variables $x_1, x_2, \ldots, x_K$ take values from a real signal set $\mathcal{S}$, where $\mathbf{A}_i$ are fixed $n_t \times n_t$ complex matrices defining the code, known as the weight matrices. The rate of this code is $\frac{K}{2 n_t}$ complex symbols per channel use.

We are interested in linear STBCs, since they admit sphere decoding (SD) [2] and other QR decomposition based decoding techniques such as the QRDM decoder [3], which are fast ways of decoding for the variables.

Designing STBCs with low decoding complexity has been studied widely in the literature. Orthogonal designs with single symbol decodability were proposed in [4], [5], [6]. For STBCs with more than two transmit antennas, these came at a cost of reduced transmission rates. To increase the rate at the cost of higher decoding complexity, multi-group decodable STBCs were introduced in [7], [8], [9]. Another set of low decoding complexity codes known as the fast decodable codes were studied in [10]. Fast decodable codes have reduced SD complexity owing to the fact that a few of the variables can be decoded as single symbols or in groups if we condition them with respect to the other variables. Fast decodable codes for asymmetric systems using division algebras have been reported in [11]. The properties of fast decodable codes and multi-group decodable codes were combined and a new class of codes called fast group decodable codes were studied in [12].

A new code property called the block-orthogonal property was studied in [3], which can be exploited by the QR-decomposition based decoders to achieve significant decoding complexity reduction without performance loss. This property was exploited in [13] to reduce the average ML decoding complexity of the Golden code [14] and also in [15] to reduce the worst-case complexity of the Golden code with a small performance loss. While the other low decoding complexity STBCs use the zero entries in the upper left portion of the upper triangular matrix after the QR decomposition, these decoders utilize the zeros in the lower right portion to reduce the complexity further.

The contributions of this paper are as follows:
• We generalize the set of sufficient conditions for an STBC to be block orthogonal provided in [3] for sub-block sizes greater than 1.
• We provide analytical proofs that the codes obtained from the sum of Clifford Unitary Weight Designs (CUWDs) [16] exhibit the block orthogonal property when we choose the right ordering and the right number of matrices.
• We provide new methods of construction of BOSTBCs using Coordinate Interleaved Orthogonal Designs (CIODs) [17], Cyclic Division Algebras (CDAs) [18] and Crossed Product Algebras (CPAs) [19] along with the analytical proofs of their block orthogonality.
• We show that the ordering of variables of the STBC used for the QR decomposition dictates the block orthogonal structure and its parameters.
• We show how the block orthogonal property of the STBCs can be exploited to reduce the decoding complexity of a sphere decoder which uses a depth first search approach.
• We provide bounds on the maximum possible reduction in the Euclidean metrics (EM) calculation during sphere decoding of BOSTBCs.
• Simulation results show that we can reduce the decoding complexity of existing STBCs by up to 30% by utilizing the block orthogonal property.

The remaining part of the paper is organized as follows: In Section II the system model and some known classes of low decoding complexity codes are reviewed. In Section III, we derive a set of sufficient conditions for an STBC to be block orthogonal and also the effect of ordering of matrices on it. In Section IV, we present proofs of block orthogonal structure of various existing codes and also discuss some new methods of constructions of the same. In Section V, we discuss a method to reduce the number of EM calculations while decoding a BOSTBC using a depth first search based sphere decoder and also derive bounds for the same. Simulation results for the decoding complexity of various BOSTBCs are presented in Section VI. Concluding remarks constitute Section VII.

Notations: Throughout the paper, bold lower-case letters are used to denote vectors and bold upper-case letters to denote matrices. For a complex variable $x$, denote the real and imaginary part of $x$ by $x_I$ and $x_Q$ respectively. The sets of all integers, all real and complex numbers are denoted by $\mathbb{Z}$, $\mathbb{R}$ and $\mathbb{C}$, respectively. The operation of stacking the columns of $\mathbf{X}$ one below the other is denoted by $\mathrm{vec}(\mathbf{X})$. The Kronecker product is denoted by $\otimes$; $\mathbf{I}_T$ and $\mathbf{O}_T$ denote the $T \times T$ identity matrix and the null matrix, respectively. For a complex variable $x$, the $(\check{\cdot})$ operator acting on $x$ is defined as follows

$$\check{x} \triangleq \begin{bmatrix} x_I & -x_Q \\ x_Q & x_I \end{bmatrix}.$$

The $(\check{\cdot})$ operator can similarly be applied to any matrix $\mathbf{X} \in \mathbb{C}^{n \times m}$ by replacing each entry $x_{ij}$ by $\check{x}_{ij}$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, m$, resulting in a matrix denoted by $\check{\mathbf{X}} \in \mathbb{R}^{2n \times 2m}$. Given a complex vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$, $\tilde{\mathbf{x}}$ is defined as $\tilde{\mathbf{x}} \triangleq [x_{1I}, x_{1Q}, \ldots, x_{nI}, x_{nQ}]^T$.

II. SYSTEM MODEL

For any linear STBC with variables $x_1, x_2, \ldots, x_K$ given by (3), the generator matrix $\mathbf{G}$ [10] is defined by

$$\widetilde{\mathrm{vec}}(\mathbf{X}) = \mathbf{G}\tilde{\mathbf{x}},$$

where $\tilde{\mathbf{x}} = [x_1, x_2, \ldots, x_K]^T$. In terms of the weight matrices, the generator matrix can be written as

$$\mathbf{G} = \left[\widetilde{\mathrm{vec}}(\mathbf{A}_1)\ \ \widetilde{\mathrm{vec}}(\mathbf{A}_2)\ \cdots\ \widetilde{\mathrm{vec}}(\mathbf{A}_K)\right].$$

Hence, for any STBC, (1) can be written as

$$\widetilde{\mathrm{vec}}(\mathbf{Y}) = \mathbf{H}_{eq}\tilde{\mathbf{x}} + \widetilde{\mathrm{vec}}(\mathbf{N}),$$

where $\mathbf{H}_{eq} \in \mathbb{R}^{2 n_r n_t \times K}$ is given by $\mathbf{H}_{eq} = \left(\mathbf{I}_{n_t} \otimes \check{\mathbf{H}}\right)\mathbf{G}$, and $\tilde{\mathbf{x}} = [x_1, x_2, \ldots, x_K]$, with each $x_i$ drawn from a 1-dimensional (PAM) constellation. Using the above equivalent system model, the ML decoding metric (2) can be written as

$$M(\tilde{\mathbf{x}}) = \left\|\widetilde{\mathrm{vec}}(\mathbf{Y}) - \mathbf{H}_{eq}\tilde{\mathbf{x}}\right\|^2.$$

Using QR decomposition of $\mathbf{H}_{eq}$, we get $\mathbf{H}_{eq} = \mathbf{Q}\mathbf{R}$ where $\mathbf{Q} \in \mathbb{R}^{2 n_r n_t \times K}$ is an orthonormal matrix and $\mathbf{R} \in \mathbb{R}^{K \times K}$ is an upper triangular matrix. Using this, the ML decoding metric now changes to

$$M(\tilde{\mathbf{x}}) = \left\|\mathbf{Q}^T\widetilde{\mathrm{vec}}(\mathbf{Y}) - \mathbf{R}\tilde{\mathbf{x}}\right\|^2 = \left\|\mathbf{y}' - \mathbf{R}\tilde{\mathbf{x}}\right\|^2. \tag{4}$$

If we have $\mathbf{H}_{eq} = [\mathbf{h}_1\ \mathbf{h}_2\ \ldots\ \mathbf{h}_K]$, where $\mathbf{h}_i$, $i \in 1, 2, \ldots, K$ are column vectors, then the $\mathbf{Q}$ and $\mathbf{R}$ matrices have the following form obtained by the Gram-Schmidt orthogonalization:

$$\mathbf{Q} = [\mathbf{q}_1\ \mathbf{q}_2\ \ldots\ \mathbf{q}_K], \tag{5}$$

where $\mathbf{q}_i$, $i \in 1, 2, \ldots, K$ are column vectors, and

$$\mathbf{R} = \begin{bmatrix}
\|\mathbf{r}_1\| & \langle \mathbf{q}_1, \mathbf{h}_2\rangle & \langle \mathbf{q}_1, \mathbf{h}_3\rangle & \cdots & \langle \mathbf{q}_1, \mathbf{h}_K\rangle \\
0 & \|\mathbf{r}_2\| & \langle \mathbf{q}_2, \mathbf{h}_3\rangle & \cdots & \langle \mathbf{q}_2, \mathbf{h}_K\rangle \\
0 & 0 & \|\mathbf{r}_3\| & \cdots & \langle \mathbf{q}_3, \mathbf{h}_K\rangle \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \|\mathbf{r}_K\|
\end{bmatrix}, \tag{6}$$

where $\mathbf{r}_1 = \mathbf{h}_1$, $\mathbf{q}_1 = \frac{\mathbf{r}_1}{\|\mathbf{r}_1\|}$ and for $i = 2, \ldots, K$,

$$\mathbf{r}_i = \mathbf{h}_i - \sum_{j=1}^{i-1} \langle \mathbf{q}_j, \mathbf{h}_i\rangle\, \mathbf{q}_j, \qquad \mathbf{q}_i = \frac{\mathbf{r}_i}{\|\mathbf{r}_i\|}.$$

A. Low decoding complexity codes

A short overview of the known low decoding complexity codes is given in this section. The codes that will be described are multi-group decodable codes, fast decodable codes and fast group decodable codes.

In case of a multi-group decodable STBC, the variables can be partitioned into groups such that the ML decoding metric is decoupled into submetrics such that only the members of the same group need to be decoded jointly. It can be formally defined as [8], [17], [16]:

Definition 2: An STBC is said to be $g$-group decodable if there exists a partition of $\{1, 2, \ldots, K\}$ into $g$ non-empty subsets $\Gamma_1, \Gamma_2, \ldots, \Gamma_g$ such that the following condition is satisfied:

$$\mathbf{A}_l \mathbf{A}_m^H + \mathbf{A}_m \mathbf{A}_l^H = \mathbf{0},$$

whenever $l \in \Gamma_i$ and $m \in \Gamma_j$ and $i \neq j$.

If we group all the variables of the same group together in (4), then the $\mathbf{R}$ matrix for the SD [2], [20] in case of multi-group decodable codes will be of the following form:

$$\mathbf{R} = \begin{bmatrix}
\Delta_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \Delta_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \Delta_g
\end{bmatrix}, \tag{7}$$

where $\Delta_i$, $i = 1, 2, \ldots, g$ is a square upper triangular matrix.

Now, consider the standard SD of an STBC. Suppose the $\mathbf{R}$ matrix as defined in (6) turns out to be such that when we fix values for a set of symbols, the rest of the symbols become group decodable, then the code is said to be fast decodable. Formally, it is defined as follows:

Definition 3: An STBC is said to be fast SD if there exists a partition of $\{1, 2, \ldots, L\}$ where $L \leq K$ into $g$ non-empty subsets $\Gamma_1, \Gamma_2, \ldots, \Gamma_g$ such that the following condition is satisfied for all $i < j$

$$\langle \mathbf{q}_i, \mathbf{h}_j \rangle = 0, \tag{8}$$

whenever $i \in \Gamma_p$ and $j \in \Gamma_q$ and $p \neq q$, where $\mathbf{q}_i$ and $\mathbf{h}_j$ are obtained from the QR decomposition of the equivalent channel matrix $\mathbf{H}_{eq} = [\mathbf{h}_1\ \mathbf{h}_2\ \ldots\ \mathbf{h}_K] = \mathbf{Q}\mathbf{R}$ with $\mathbf{h}_i$, $i \in 1, 2, \ldots, K$ as column vectors and $\mathbf{Q} = [\mathbf{q}_1\ \mathbf{q}_2\ \ldots\ \mathbf{q}_K]$ with $\mathbf{q}_i$, $i \in 1, 2, \ldots, K$ as column vectors as defined in (5).

Hence, by conditioning $K - L$ variables, the code becomes $g$-group decodable. As a special case, when no conditioning is needed, i.e., $L = K$, then the code is $g$-group decodable. The $\mathbf{R}$ matrix for fast decodable codes will have the following form:

$$\mathbf{R} = \begin{bmatrix} \Delta & \mathbf{B}_1 \\ \mathbf{0} & \mathbf{B}_2 \end{bmatrix}, \tag{9}$$

where $\Delta$ is an $L \times L$ block diagonal, upper triangular matrix, $\mathbf{B}_2$ is a square upper triangular matrix and $\mathbf{B}_1$ is a rectangular matrix.

Fast group decodable codes were introduced in [12]. These codes combine the properties of multi-group decodable codes and the fast decodable codes. These codes allow each of the groups in the multi-group decodable codes to be fast decoded. The $\mathbf{R}$ matrix for a fast group decodable code has the block diagonal form given in (10).

III. BLOCK ORTHOGONAL STBCS

Block orthogonal codes introduced in [3] are a sub-class of fast decodable / fast group decodable codes. They impose an additional structure on the variables conditioned in these codes. An STBC is said to be block orthogonal if the $\mathbf{R}$ matrix of the code has the following structure:

$$\mathbf{R} = \begin{bmatrix}
\mathbf{R}_1 & \mathbf{B}_{12} & \cdots & \mathbf{B}_{1\Gamma} \\
\mathbf{0} & \mathbf{R}_2 & \cdots & \mathbf{B}_{2\Gamma} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_\Gamma
\end{bmatrix}, \tag{12}$$

where each $\mathbf{R}_i$, $i = 1, 2, \ldots, \Gamma$ is a block diagonal, upper triangular matrix with $k$ blocks $\mathbf{U}_{i1}, \mathbf{U}_{i2}, \ldots, \mathbf{U}_{ik}$, each of size $\gamma \times \gamma$, and $\mathbf{B}_{ij}$, $i = 1, 2, \ldots, \Gamma$, $j = i+1, \ldots, \Gamma$ are non-zero matrices.

The low decoding complexity codes described in Section II utilize the zero entries in the upper triangular matrix $\mathbf{R}$, in the breadth first or depth first search decoders such as the sphere decoder or the QRDM decoder, to achieve decoding complexity reduction. The fast sphere decoding complexity [21] of an STBC is governed by the zeros in the upper left block of the $\mathbf{R}$ matrix and does not exploit the zeros in the lower right blocks. The zeros in the lower right block can be used to reduce the average decoding complexity of the code, where the average decoding complexity refers to the average number of floating point operations performed by the decoder. The zeros in the lower right block are also utilized in some non ML decoders such as the QRDM decoder [3] or the modified sphere decoder [15] to reduce the decoding complexity of the code.

A. Design criteria for Block Orthogonal STBCs

The structure of a block orthogonal matrix was defined in (12). In general, the size of the block diagonal matrices $\mathbf{R}_i$, and of the upper triangular blocks in these matrices, can be arbitrary. Similar to [3], we consider only the case that the $\mathbf{R}_i$ have the same size, $k\gamma \times k\gamma$, and the upper triangular blocks in the $\mathbf{R}_i$ each have the same size $\gamma \times \gamma$. Hence, a block orthogonal code can be represented by the parameters $(\Gamma, k, \gamma)$:
• $\Gamma$: The number of matrices $\mathbf{R}_i$ in $\mathbf{R}$;
• $k$: The number of blocks in the block diagonal matrix $\mathbf{R}_i$, denoted by $\mathbf{U}_{ij}$, $1 \leq j \leq k$;
• $\gamma$: The number of diagonal entries in the matrices $\mathbf{U}_{ij}$.
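The parameter triple $(\Gamma, k, \gamma)$ fixes which entries of the upper triangular $\mathbf{R}$ in (12) are forced to zero. The following Python sketch is only an illustration of that bookkeeping and is not taken from [3]; the function names `block_orthogonal_pattern`, `fits_structure` and `show` are hypothetical. It assumes that each $\mathbf{R}_i$ consists of $k$ diagonal blocks $\mathbf{U}_{ij}$ of size $\gamma \times \gamma$ and that the blocks $\mathbf{B}_{ij}$ are unconstrained.

```python
def block_orthogonal_pattern(Gamma, k, gamma):
    """Return an n x n boolean matrix, n = Gamma * k * gamma.

    True marks an entry of R that the (Gamma, k, gamma) structure does not
    force to zero; False marks an entry that must be zero.
    """
    block = k * gamma                  # size of each diagonal block R_i
    n = Gamma * block
    pattern = [[False] * n for _ in range(n)]
    for r in range(n):
        for c in range(r, n):          # R is upper triangular
            if r // block != c // block:
                pattern[r][c] = True   # inside some B_ij: unconstrained
            else:
                # inside R_i: only entries within the same U_ij may be non-zero
                pattern[r][c] = (r % block) // gamma == (c % block) // gamma
    return pattern


def fits_structure(nonzero, Gamma, k, gamma):
    """Check that every non-zero entry of R lies in an allowed position."""
    allowed = block_orthogonal_pattern(Gamma, k, gamma)
    n = len(allowed)
    if len(nonzero) != n or any(len(row) != n for row in nonzero):
        return False
    return all(allowed[r][c] or not nonzero[r][c]
               for r in range(n) for c in range(n))


def show(pattern):
    """Render a pattern with 't' for possibly non-zero and '0' for zero."""
    return "\n".join(" ".join("t" if x else "0" for x in row) for row in pattern)


# Hypothetical 4 x 4 example with parameters (2, 2, 1).
example = block_orthogonal_pattern(2, 2, 1)
print(show(example))
```

For the parameters $(2, 2, 1)$ the printed pattern has zeros only at positions $(1,2)$ and $(3,4)$ above the diagonal. A fully populated upper triangular matrix of the same size fails `fits_structure` for $(2, 2, 1)$ but passes for $(2, 1, 2)$, which imposes no zeros inside the diagonal blocks.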
The $\mathbf{R}$ matrix for a fast group decodable code is

$$\mathbf{R} = \begin{bmatrix}
\mathbf{R}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{R}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{R}_g
\end{bmatrix}, \tag{10}$$

where each $\mathbf{R}_i$, $i = 1, 2, \ldots, g$ will have the following form:

$$\mathbf{R}_i = \begin{bmatrix} \Delta_i & \mathbf{B}_{i1} \\ \mathbf{0} & \mathbf{B}_{i2} \end{bmatrix}, \tag{11}$$

where $\Delta_i$ is an $L_i \times L_i$ block diagonal, upper triangular matrix, $\mathbf{B}_{i2}$ is a square upper triangular matrix and $\mathbf{B}_{i1}$ is a rectangular matrix.

A set of sufficient conditions for an STBC to be a BOSTBC with the parameters $(\Gamma, k, 1)$ is described below.

1) 2-Block BOSTBC: First a condition for the STBC to be block orthogonal with parameters $(2, k, 1)$ is given. The case for $\Gamma > 2$ will be given subsequently.

Lemma 1: [3] Consider an STBC of size $T \times N_t$ with weight matrices $\mathbf{A}_1, \ldots, \mathbf{A}_k$, $\mathbf{B}_1, \ldots, \mathbf{B}_k$. Let

$$\mathcal{A}_i = \begin{bmatrix} \mathbf{A}_i^R & \mathbf{A}_i^I \\ \mathbf{A}_i^I & -\mathbf{A}_i^R \end{bmatrix}, \qquad
\mathcal{B}_i = \begin{bmatrix} \mathbf{B}_i^R & \mathbf{B}_i^I \\ \mathbf{B}_i^I & -\mathbf{B}_i^R \end{bmatrix}$$

and $\mathcal{A}_i \triangleq [a_{iup}]_{2T \times 2N_t}$, $\mathcal{B}_i \triangleq [b_{iup}]_{2T \times 2N_t}$, $i = 1, \ldots, k$, $u = 1, \ldots, 2T$ and $p = 1, \ldots, 2N_t$. This STBC has block orthogonal structure $(2, k, 1)$ if the following conditions are satisfied:
• $\{\mathcal{A}_1, \ldots, \mathcal{A}_k, \mathcal{B}_1, \ldots, \mathcal{B}_k\}$ is of dimension $2k$.
• $\mathcal{A}_i^T \mathcal{A}_i = \mathbf{I}$ and $\mathcal{B}_i^T \mathcal{B}_i = \mathbf{I}$ for $i = 1, \ldots, k$.
• $\mathcal{A}_i^T \mathcal{A}_j = -\mathcal{A}_j^T \mathcal{A}_i$ and $\mathcal{B}_i^T \mathcal{B}_j = -\mathcal{B}_j^T \mathcal{B}_i$ for $i, j = 1, \ldots, k$ and $i \neq j$.
• $\sum_{(p,q,s,t) \in S} d_{pqst} = 0$ for $i, j = 1, \ldots, k$ and $i \neq j$, where

$$d_{pqst} = \sum_{l=1}^{k} \left( \sum_{u=1}^{2T} b_{iup}\, a_{lus} \cdot \sum_{v=1}^{2T} b_{jvq}\, a_{lvt} \right)$$

and each element (tuple) of $S$ includes four uniquely permuted scalars drawn from $\{1, \ldots, 2N_t\}$.

2) $\Gamma$-block BOSTBC, $\Gamma > 2$: The set of conditions for an STBC to have a block orthogonal structure with parameters $(\Gamma, k, 1)$ is now given.

Lemma 2: [3] Let the $\mathbf{R}$ matrix of an STBC with weight matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_L\}$, $\{\mathbf{B}_1, \ldots, \mathbf{B}_k\}$ be

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{E} \\ \mathbf{0} & \mathbf{R}_2 \end{bmatrix},$$

where $\mathbf{R}_1$ is an $L \times L$ block-orthogonal matrix with parameters $(\Gamma - 1, k, 1)$, $\mathbf{E}$ is an $L \times k$ matrix and $\mathbf{R}_2$ is a $k \times k$ upper triangular matrix. The STBC will be a block orthogonal STBC with parameters $(\Gamma, k, 1)$ if the following conditions are satisfied:
• The matrices $\{\mathbf{B}_1, \ldots, \mathbf{B}_k\}$ are Hurwitz-Radon orthogonal.
• The matrix $\mathbf{E}$ is para-unitary, i.e., $\mathbf{E}^H \mathbf{E} = \mathbf{I}$.

The authors in [3] only discuss the conditions for the block orthogonal codes with parameters $(\Gamma, k, 1)$. These conditions can be easily derived for BOSTBCs with parameters $(\Gamma, k, \gamma)$ as well. We first derive the conditions for $\Gamma = 2$.

Lemma 3: Consider an STBC of size $n_t \times T$ with weight matrices $\{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_l\}$, $\{\mathbf{B}_1, \mathbf{B}_2, \ldots, \mathbf{B}_l\}$. Let the $\mathbf{R}$ matrix for this STBC be of the form

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{E} \\ \mathbf{0} & \mathbf{R}_2 \end{bmatrix},$$

where $\mathbf{R}_1$ and $\mathbf{R}_2$ are $l \times l$ upper triangular matrices and $\mathbf{E}$ is an $l \times l$ matrix. The STBC will have a block orthogonal structure with parameters $(2, k, \gamma)$ if the following conditions are satisfied:
• The matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_l\}$ are $k$-group decodable with $\gamma$ variables in each group, i.e., $\{\mathbf{A}_1, \ldots, \mathbf{A}_l\}$ can be partitioned into $k$ sets $\{S_1, \ldots, S_k\}$, each of cardinality $\gamma$, such that $\mathbf{A}_i \mathbf{A}_j^H + \mathbf{A}_j \mathbf{A}_i^H = \mathbf{0}$ for all $\mathbf{A}_i \in S_m$, $\mathbf{A}_j \in S_n$, $m \neq n$.
• The matrices $\{\mathbf{B}_1, \ldots, \mathbf{B}_l\}$ are $k$-group decodable with $\gamma$ variables in each group, i.e., $\{\mathbf{B}_1, \ldots, \mathbf{B}_l\}$ can be partitioned into $k$ sets $\{S_1, \ldots, S_k\}$, each of cardinality $\gamma$, such that $\mathbf{B}_i \mathbf{B}_j^H + \mathbf{B}_j \mathbf{B}_i^H = \mathbf{0}$ for all $\mathbf{B}_i \in S_m$, $\mathbf{B}_j \in S_n$, $m \neq n$.
• The set of matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_l, \mathbf{B}_1, \ldots, \mathbf{B}_l\}$ are linearly independent over $\mathbb{R}$.
• The matrix $\mathbf{E}^H \mathbf{E}$ is a block diagonal matrix with $k$ blocks of size $\gamma \times \gamma$.

Proof: Proof is given in Appendix A.

Lemma 4: Let the $\mathbf{R}$ matrix of an STBC with weight matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_L\}$, $\{\mathbf{B}_1, \ldots, \mathbf{B}_l\}$ be

$$\mathbf{R} = \begin{bmatrix} \mathbf{R}_1 & \mathbf{E} \\ \mathbf{0} & \mathbf{R}_2 \end{bmatrix},$$

where $\mathbf{R}_1$ is an $L \times L$ block-orthogonal matrix with parameters $(\Gamma - 1, k, \gamma)$, $\mathbf{E}$ is an $L \times l$ matrix and $\mathbf{R}_2$ is an $l \times l$ upper triangular matrix. The STBC will be a block orthogonal STBC with parameters $(\Gamma, k, \gamma)$ if the following conditions are satisfied:
• The matrices $\{\mathbf{B}_1, \ldots, \mathbf{B}_l\}$ are $k$-group decodable with $\gamma$ variables in each group, i.e., $\{\mathbf{B}_1, \ldots, \mathbf{B}_l\}$ can be partitioned into $k$ sets $\{S_1, \ldots, S_k\}$, each of cardinality $\gamma$, such that $\mathbf{B}_i \mathbf{B}_j^H + \mathbf{B}_j \mathbf{B}_i^H = \mathbf{0}$ for all $\mathbf{B}_i \in S_m$, $\mathbf{B}_j \in S_n$, $m \neq n$.
• The set of matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_L, \mathbf{B}_1, \ldots, \mathbf{B}_l\}$ are linearly independent over $\mathbb{R}$.
• The matrix $\mathbf{E}^H \mathbf{E}$ is a block diagonal matrix with $k$ blocks of size $\gamma \times \gamma$.

Proof: Proof is given in Appendix B.

B. Effect of ordering on block orthogonality

We now show that the block orthogonality property depends on the ordering of the weight matrices or equivalently the ordering of the variables. If we do not choose the right ordering, we will be unable to get the desired structure.

Example 1: Let us consider the Golden code [14] given by:

$$\mathbf{X} = \frac{1}{\sqrt{5}} \begin{bmatrix}
\alpha (s_1 + s_2 \theta) & j\bar{\alpha}\left(s_3 + s_4 \bar{\theta}\right) \\
\alpha (s_3 + s_4 \theta) & \bar{\alpha}\left(s_1 + s_2 \bar{\theta}\right)
\end{bmatrix}, \tag{13}$$

where $\theta = \left(1 + \sqrt{5}\right)/2$, $\bar{\theta} = \left(1 - \sqrt{5}\right)/2$, $\alpha = 1 + j(1 - \theta)$, $\bar{\alpha} = 1 + j\left(1 - \bar{\theta}\right)$ and $s_i = s_{iI} + j s_{iQ}$ for $i = 1, \ldots, 4$.

If we order the variables (and hence the weight matrices) as $[s_{1I}, s_{1Q}, s_{2I}, s_{2Q}, s_{3I}, s_{3Q}, s_{4I}, s_{4Q}]$, then the $\mathbf{R}$ matrix for SD has the following structure

$$\mathbf{R} = \begin{bmatrix}
t & 0 & 0 & t & t & t & t & t \\
0 & t & t & 0 & t & t & t & t \\
0 & 0 & t & 0 & t & t & t & t \\
0 & 0 & 0 & t & t & t & t & t \\
0 & 0 & 0 & 0 & t & 0 & 0 & t \\
0 & 0 & 0 & 0 & 0 & t & t & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & t & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & t
\end{bmatrix},$$

where $t$ denotes non zero entries. This ordering of variables has presented a $(4, 2, 1)$ block orthogonal structure to the $\mathbf{R}$ matrix. Now, if we change the ordering to $[s_{1I}, s_{2I}, s_{1Q}, s_{2Q}, s_{3I}, s_{4I}, s_{3Q}, s_{4Q}]$, then the $\mathbf{R}$ matrix for SD has the following structure

$$\mathbf{R} = \begin{bmatrix}
t & t & 0 & 0 & t & t & t & t \\
0 & t & 0 & 0 & t & t & t & t \\
0 & 0 & t & t & t & t & t & t \\
0 & 0 & 0 & t & t & t & t & t \\
0 & 0 & 0 & 0 & t & t & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & t & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & t & t \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & t
\end{bmatrix},$$

where $t$ denotes non zero entries. This ordering of variables has presented a $(2, 2, 2)$ block orthogonal structure to the $\mathbf{R}$ matrix. We can also have an ordering which can leave the $\mathbf{R}$ matrix bereft of any block orthogonal structure, such as $[s_{1I}, s_{1Q}, s_{4I}, s_{2Q}, s_{3I}, s_{3Q}, s_{2I}, s_{4Q}]$. The structure of the $\mathbf{R}$ matrix in this case will be

$$\mathbf{R} = \begin{bmatrix}
t & 0 & t & 0 & t & t & t & t \\
0 & t & t & t & t & t & 0 & t \\
0 & 0 & t & t & t & t & t & 0 \\
0 & 0 & 0 & t & t & t & t & t \\
0 & 0 & 0 & 0 & t & t & t & t \\
0 & 0 & 0 & 0 & 0 & t & t & t \\
0 & 0 & 0 & 0 & 0 & 0 & t & t \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & t
\end{bmatrix}.$$

Also note that we have many entries $r_{ij} \neq 0$ even when the $i$-th and the $j$-th weight matrices are HR orthogonal, such as for cases $i = 6, j = 8$ and $i = 5, j = 8$ etc.

IV. CONSTRUCTION OF BLOCK ORTHOGONAL STBCS

Code constructions for block orthogonal STBCs with various parameters were presented in [3]. It was shown via simulations that these constructions were indeed block orthogonal with the aforementioned parameters. We provide analytical proofs for the block orthogonal structure of some of these constructions, which also include other well known codes such as the BHV code [10], the Silver code [22] and the Srinath-Rajan code [23]. We first study some basics of CUWDs and CIODs.

A. CUWDs and CIODs

1) CUWDs: [16] Linear STBCs can be broadly classified as unitary weight designs (UWDs) and non unitary weight designs (NUWDs). A UWD is one for which all the weight matrices are unitary and NUWDs are defined as those which are not UWDs. Clifford unitary weight designs (CUWDs) are a proper subclass of UWDs whose weight matrices satisfy certain sufficient conditions for $g$-group ML decodability. To state those sufficient conditions, let us list down the weight matrices of a CUWD in the form of an array as shown in Table I.

TABLE I
STRUCTURE OF CUWDS

$$\begin{array}{|c|c|c|c|}
\hline
\mathbf{A}_1 & \mathbf{A}_{\lambda+1} & \cdots & \mathbf{A}_{(g-1)\lambda+1} \\
\mathbf{A}_2 & \mathbf{A}_{\lambda+2} & \cdots & \mathbf{A}_{(g-1)\lambda+2} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{A}_\lambda & \mathbf{A}_{2\lambda} & \cdots & \mathbf{A}_K \\
\hline
\end{array}$$

All the weight matrices in one column belong to one group. The weight matrices of CUWDs satisfy the following sufficient conditions for $g$-group ML decodability.
• $\mathbf{A}_1 = \mathbf{I}$.
• All the matrices in the first row except $\mathbf{A}_1$ should square to $-\mathbf{I}$ and should pair-wise anti-commute among themselves.
• The unitary matrix in the $i$-th row and the $j$-th column is equal to $\mathbf{A}_i \mathbf{A}_{(j-1)\lambda+1}$.

The CUWD matrix representation for these matrices for a system with $2^a$ transmit antennas are given below [9]. Let

$$\sigma_1 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad
\sigma_2 = \begin{bmatrix} 0 & j \\ j & 0 \end{bmatrix}, \quad
\sigma_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

The representations of the Clifford generators are given by:

$$\begin{aligned}
R(\gamma_1) &= \pm j \sigma_3^{\otimes a}, \\
R(\gamma_{2k}) &= \mathbf{I}_2^{\otimes (a-k)} \otimes \sigma_1 \otimes \sigma_3^{\otimes (k-1)}, \\
R(\gamma_{2k+1}) &= \mathbf{I}_2^{\otimes (a-k)} \otimes \sigma_2 \otimes \sigma_3^{\otimes (k-1)}, \\
R(\gamma_0) &= \mathbf{I}_{2^a},
\end{aligned}$$

where $k = 1, \ldots, a$. The weight matrices of the CUWD for a rate-1, four group decodable STBC can be derived as follows. Let $\alpha_i = j R(\gamma_{2i}) R(\gamma_{2i+1})$ for $i = 1, 2, \ldots, a-1$. Let $\lambda = 2^{a-1}$. The weight matrices are now given by

$$\begin{aligned}
\mathbf{A}_{\lambda+1} &= R(\gamma_1), \\
\mathbf{A}_{2\lambda+1} &= R(\gamma_{2a+1}), \\
\mathbf{A}_{3\lambda+1} &= R(\gamma_{2a}), \\
\mathbf{A}_{j\lambda+k} &= \mathbf{A}_k \mathbf{A}_{j\lambda+1}, \\
\mathbf{A}_k &= \prod_{i=1}^{a-1} \alpha_i^{k_i}
\end{aligned} \tag{14}$$

for $j = 1, 2, 3$, $k = 1, \ldots, \lambda$ and where $(k_1, k_2, \ldots, k_{a-1})$ is the binary representation of $k - 1$.

2) CIODs: Coordinate interleaved orthogonal designs (CIODs) were introduced in [17].

Definition 4: A CIOD for a system with $2^a$ transmit antennas in variables $x_i$, $i = 0, \ldots, K-1$, $K$ even, is a $2^a \times 2^a$ matrix $\mathbf{S}(x_0, \ldots, x_{K-1})$, such that

$$\mathbf{S} = \begin{bmatrix}
\Theta_1\left(\tilde{x}_0, \ldots, \tilde{x}_{\frac{K}{2}-1}\right) & \mathbf{0} \\
\mathbf{0} & \Theta_2\left(\tilde{x}_{\frac{K}{2}}, \ldots, \tilde{x}_{K-1}\right)
\end{bmatrix}, \tag{15}$$

where $\Theta_1\left(\tilde{x}_0, \ldots, \tilde{x}_{\frac{K}{2}-1}\right)$ and $\Theta_2\left(\tilde{x}_{\frac{K}{2}}, \ldots, \tilde{x}_{K-1}\right)$ are complex orthogonal designs of size $2^{a-1} \times 2^{a-1}$ and $\tilde{x}_i = x_{iI} + j x_{((i+K/2) \bmod K)Q}$.

B. BOSTBCs from CUWDs

We now show that STBCs obtained as a sum of rate-1, four group decodable CUWDs exhibit the block orthogonal structure with parameters $(2, 4, \lambda)$.

Lemma 5: Construction I: Let $\mathbf{X}_1(s_1, s_2, \ldots, s_{4\lambda})$ be a rate-1, four group decodable STBC obtained from CUWD [16] with weight matrices $\{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_{4\lambda}\}$. Let $\mathbf{M}$ be an $n_t \times n_t$ matrix such that the set of matrices $\{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_{4\lambda}, \mathbf{M}\mathbf{A}_1, \mathbf{M}\mathbf{A}_2, \ldots, \mathbf{M}\mathbf{A}_{4\lambda}\}$ are linearly independent over $\mathbb{R}$. Then the STBC given by

$$\mathbf{X}(s_1, s_2, \ldots, s_{8\lambda}) = \mathbf{X}_1(s_1, s_2, \ldots, s_{4\lambda}) + \mathbf{M}\,\mathbf{X}_1(s_{4\lambda+1}, s_{4\lambda+2}, \ldots, s_{8\lambda}),$$

will exhibit a block orthogonal structure with parameters $(2, 4, \lambda)$.

Proof: Proof is given in Appendix C.

Example 2: Let us consider the BHV code given by:

$$\mathbf{X} = \mathbf{X}_1(s_1, s_2) + \mathbf{T}\mathbf{X}_1(z_1, z_2),$$

where $\mathbf{X}_1(s_1, s_2)$ and $\mathbf{X}_1(z_1, z_2)$ take the Alamouti structure,

$$\mathbf{X}_1(s_1, s_2) = \begin{bmatrix} s_1 & -s_2^* \\ s_2 & s_1^* \end{bmatrix}, \qquad
\mathbf{T} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

and $[z_1, z_2]^T = \mathbf{U}[s_3, s_4]^T$, where $\mathbf{U}$ is a unitary matrix chosen to maximize the minimum determinant. In this case, as per the above construction, $\mathbf{M} = \mathbf{T}\mathbf{U}$. Hence, the BHV code is a BOSTBC with parameters $(2, 4, 1)$.

C. BOSTBCs from Cyclic Division / Crossed Product Algebras

In this section, we show the block orthogonality property of two constructions from either cyclic division algebras or crossed product algebras over the field $\mathbb{Q}(i)$.

Lemma 6: Construction II: Let $\mathbf{X}$ be an STBC with weight matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_K\}$ and $\{\mathbf{B}_1, \ldots, \mathbf{B}_K\}$ for the variables $[x_{1I} \ldots x_{KI}]$ and $[x_{1Q} \ldots x_{KQ}]$ respectively such that $\mathbf{B}_i = j\mathbf{A}_i$ for $1 \leq i \leq K$. Then the code $\mathbf{X}$ exhibits the block orthogonal property with parameters $(K, 2, 1)$ if we take the ordering of weight matrices as $\{\mathbf{A}_1, \mathbf{B}_1, \ldots, \mathbf{A}_K, \mathbf{B}_K\}$.

Proof: Proof is given in Appendix D.

Example 3: Consider any STBC obtained from the Cyclic Division Algebra (CDA) [18] over the base field $\mathbb{Q}(i)$. The structure of such an STBC will be

$$\mathbf{X} = \begin{bmatrix}
x_0 & \gamma\sigma(x_{n-1}) & \cdots & \gamma\sigma^{n-1}(x_1) \\
x_1 & \sigma(x_0) & \cdots & \gamma\sigma^{n-1}(x_2) \\
\vdots & \vdots & \ddots & \vdots \\
x_{n-1} & \sigma(x_{n-2}) & \cdots & \sigma^{n-1}(x_0)
\end{bmatrix},$$

where $x_k = x_{kI} + j x_{kQ}$. The weight matrices of this STBC satisfy the properties of the construction above. Hence, this is a BOSTBC with parameters $(n, 2, 1)$.

The next construction is a special case of the previous construction.

Lemma 7: Construction III: Let $\mathbf{X}_1$ be a two group decodable STBC with weight matrices $\{\mathbf{A}_1, \ldots, \mathbf{A}_K\}$ and $\{\mathbf{B}_1, \ldots, \mathbf{B}_K\}$ for the variables $[x_1 \ldots x_K]$ and $[x_{K+1} \ldots x_{2K}]$ respectively such that $\mathbf{B}_i = j\mathbf{A}_i$ for $i = 1, \ldots, K$. Let $\mathbf{M}$ be a matrix such that the matrices in the set $\{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_K, \mathbf{M}\mathbf{A}_1, \mathbf{M}\mathbf{A}_2, \ldots, \mathbf{M}\mathbf{A}_K\}$ are linearly independent over $\mathbb{R}$. Then the STBC given by

$$\mathbf{X}(x_1, x_2, \ldots, x_{4K}) = \mathbf{X}_1(x_1, x_2, \ldots, x_{2K}) + \mathbf{M}\,\mathbf{X}_1(x_{2K+1}, x_{2K+2}, \ldots, x_{4K}),$$

will exhibit a block orthogonal structure with parameters $(2, 2, K)$.

Proof: Proof is given in Appendix E.

Example 4: Consider the Golden code as given in Example 1. If we consider

$$\mathbf{X}_1 = \frac{1}{\sqrt{5}} \begin{bmatrix} \alpha(s_1 + s_2\theta) & 0 \\ 0 & \bar{\alpha}\left(s_1 + s_2\bar{\theta}\right) \end{bmatrix},$$

and $\mathbf{M}$ as

$$\mathbf{M} = \begin{bmatrix} 0 & j \\ 1 & 0 \end{bmatrix},$$

we can see that the Golden code is a BOSTBC with parameters $(2, 2, 2)$.

D. BOSTBCs from CIODs

In this section we show that BOSTBCs can be obtained from CIODs [17].

Lemma 8: Construction IV: Let $\mathbf{X}_1(s_1, s_2, \ldots, s_K)$ be a rate-1 CIOD with weight matrices $\{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_K\}$. Let $\mathbf{M}$ be a matrix such that the set of matrices $\{\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_K, \mathbf{M}\mathbf{A}_1, \mathbf{M}\mathbf{A}_2, \ldots, \mathbf{M}\mathbf{A}_K\}$ are linearly independent over $\mathbb{R}$. Then the STBC given by

$$\mathbf{X}(s_1, s_2, \ldots, s_{2K}) = \mathbf{X}_1(s_1, s_2, \ldots, s_K) + \mathbf{M}\mathbf{X}_1(s_{K+1}, s_{K+2}, \ldots, s_{2K}),$$

will exhibit a block orthogonal structure with parameters $(2, K/2, 2)$.

Proof: Proof is given in Appendix F.

Example 5: Consider the $2 \times 2$ code constructed by Srinath et al. in [23] given by

$$\mathbf{X} = \begin{bmatrix}
x_{1I} + j x_{2Q} & e^{j\pi/4}(x_{3I} + j x_{4Q}) \\
e^{j\pi/4}(x_{4I} + j x_{3Q}) & x_{2I} + j x_{1Q}
\end{bmatrix}.$$

If we consider

$$\mathbf{X}_1 = \begin{bmatrix} x_{1I} + j x_{2Q} & 0 \\ 0 & x_{2I} + j x_{1Q} \end{bmatrix},$$

and $\mathbf{M}$ as

$$\mathbf{M} = \begin{bmatrix} 0 & e^{j\pi/4} \\ e^{j\pi/4} & 0 \end{bmatrix},$$

we see that the code is a BOSTBC with parameters $(2, 2, 2)$.

V. REDUCTION OF DECODING COMPLEXITY FOR BLOCK ORTHOGONAL CODES

In this section we describe how we can achieve decoding complexity reduction for BOSTBCs. Also we show how the block orthogonal structure helps in the reduction of the Euclidean Metric (EM) calculations and the sorting operations for a sphere decoder using a depth first search algorithm. We also briefly present the implications of the block orthogonal structure for QRDM decoders as discussed in [3].

A. ML decoding complexity reduction

The sphere decoder under consideration in this section will be the depth first search algorithm based decoder with Schnorr-Euchner enumeration and pruning as discussed in [13]. We first consider the case of a $\Gamma = 2$ block orthogonal code.

1) $\Gamma = 2$: Consider a BOSTBC with parameters $(2, k, \gamma)$. The structure of the $\mathbf{R}$ matrix for this code is as mentioned in (12) with two blocks $\mathbf{R}_1$ and $\mathbf{R}_2$. This code is fast sphere decodable, i.e., for a given set of values of variables in sub-blocks $\mathbf{U}_{2,j}$, $j = 1, \ldots, k$, we can decode the variables in $\mathbf{U}_{1,j}$ and $\mathbf{U}_{1,l}$, $1 \leq j < l \leq k$, independently. The ML decoding complexity of this code will be $O\left(M^{k\gamma+\gamma}\right)$. Due to the structure of the block orthogonal code, we can see that the variables in the blocks $\mathbf{U}_{2,j}$ and $\mathbf{U}_{2,l}$, $1 \leq j < l \leq k$, are also independent in the sense that the EM calculations and the Schnorr-Euchner enumeration based sorting operations for the variables in $\mathbf{U}_{2,j}$ are independent of the values taken by the variables in $\mathbf{U}_{2,l}$. We illustrate this point with an example.

Example 6: Consider a hypothetical BOSTBC having the parameters $(2, 2, 1)$ with variables $\{x_1, x_2, x_3, x_4\}$. The $\mathbf{R}$ matrix for this BOSTBC will be of the form

$$\mathbf{R} = \begin{bmatrix}
t & 0 & t & t \\
0 & t & t & t \\
0 & 0 & t & 0 \\
0 & 0 & 0 & t
\end{bmatrix}.$$

The first two levels of the search tree for the sphere decoder are shown in Figure 1 with the variables assumed to be taking values from a 2-PAM constellation $A$. As can be seen from the figure, irrespective of the value taken by $x_4$, the edge weights (Euclidean metrics) for the variable $x_3$ remain the same.

[Fig. 1. First two levels of the sphere decoder tree for the code in Example 6. Edge weights: $|y_4 - r_{4,4}A(1)|$ and $|y_4 - r_{4,4}A(2)|$ at the $x_4$ level; $|y_3 - r_{3,3}A(1)|$ and $|y_3 - r_{3,3}A(2)|$ at the $x_3$ level.]

From Example 6 we can see that instead of calculating the EM repeatedly, we can store these values in a look up table when they are calculated for the first time and retrieve them whenever needed. This technique of avoiding repeated calculations by storing the previously calculated values is known as Memoization [24]. This approach reduces the number of floating point operations (FLOPS) significantly.

Consider the block $\mathbf{R}_i$, $1 < i \leq \Gamma$, of the $\mathbf{R}$ matrix in (12). For a given set of values for the variables in the blocks $\mathbf{R}_m$, $m > i$, we can see that the variables in the blocks $\mathbf{U}_{i,j}$ and $\mathbf{U}_{i,l}$, $1 \leq j < l \leq k$, are independent as seen in the case of $\Gamma = 2$. Hence, we can use memoization here as well in order to reduce the number of EM calculations and sorting operations.

B. Complexity reduction bound and Memory requirements for depth first sphere decoder

We calculate the maximum possible reduction in the number of EM values calculated and the memory requirements for the look up tables in this section. First we consider the case of $\Gamma = 2$.

1) $\Gamma = 2$: Considering a $(2, k, \gamma)$ BOSTBC, we first calculate the memory requirements for storing the EM values. Let each of the variables of the STBC take values from a constellation of size $M$. The number of EM values that need to be stored for a single sub-block $\mathbf{U}_{2,j}$, $1 \leq j < k$, is

$$\mathrm{Mem}(\mathbf{U}_{2,j}) = M + M^2 + \ldots + M^\gamma = \frac{M^{\gamma+1} - M}{M - 1} = \frac{M(M^\gamma - 1)}{M - 1}.$$

These values will need to be stored for $(k - 1)$ such sub-blocks. The total memory requirement for the block $\mathbf{R}_2$ is

$$\mathrm{Mem}(\mathbf{R}_2) = (k - 1)\frac{M(M^\gamma - 1)}{M - 1}.$$

We now find the maximum number of reductions possible for the EM calculations for this BOSTBC. This will occur when all the nodes are visited in the depth first search. For the block $\mathbf{R}_2$, the number of EM calculations for a code without the block orthogonal structure would be

$$O_{STBC} = M + M^2 + \ldots + M^{k\gamma} = \frac{M\left(M^{k\gamma} - 1\right)}{M - 1}.$$

For a BOSTBC, if we use the look up table, we would be performing the EM calculations only once for each sub-block. For $k$ sub-blocks, the number of EM calculations will be

$$O_{BOSTBC} = k\left(M + M^2 + \ldots + M^\gamma\right) = k\,\frac{M(M^\gamma - 1)}{M - 1}.$$

We therefore perform only a small percentage of EM calculations if the code exhibits a block orthogonal structure. We call the ratio of the number of EM calculated for a BOSTBC to the number of EM calculated if the STBC did not possess a block orthogonal structure the Euclidean Metric Reduction Ratio (EMRR), given by

$$\frac{O_{BOSTBC}}{O_{STBC}} = \frac{k\,\frac{M(M^\gamma - 1)}{M - 1}}{\frac{M\left(M^{k\gamma} - 1\right)}{M - 1}} = \frac{k(M^\gamma - 1)}{M^{k\gamma} - 1} \approx \frac{k}{M^{(k-1)\gamma}}.$$
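The counts for the $\Gamma = 2$ case are plain geometric sums, so the bound can be tabulated directly. The following Python sketch is an illustration under the same worst-case assumption that every node of the search tree is visited; the helper names (`level_sum`, `em_counts`, `emrr`, `lookup_table_size`) are hypothetical and do not come from the paper.

```python
from fractions import Fraction


def level_sum(M, depth):
    """M + M^2 + ... + M^depth: nodes in `depth` levels of a full M-ary tree."""
    return sum(M ** d for d in range(1, depth + 1))


def em_counts(M, k, gamma):
    """Worst-case EM calculations for block R_2 of a (2, k, gamma) code.

    Returns (O_STBC, O_BOSTBC): without and with the look-up table.
    """
    o_stbc = level_sum(M, k * gamma)
    o_bostbc = k * level_sum(M, gamma)
    return o_stbc, o_bostbc


def emrr(M, k, gamma):
    """Exact Euclidean Metric Reduction Ratio O_BOSTBC / O_STBC."""
    o_stbc, o_bostbc = em_counts(M, k, gamma)
    return Fraction(o_bostbc, o_stbc)


def lookup_table_size(M, k, gamma):
    """Mem(R_2) = (k - 1) * (M + ... + M^gamma) stored EM values."""
    return (k - 1) * level_sum(M, gamma)


# Tabulate the exact ratio next to the approximation k / M^((k-1)*gamma)
# for a (2, 4, 1) code and a few constellation sizes M.
for M in (2, 4, 8):
    exact = emrr(M, k=4, gamma=1)
    approx = Fraction(4, M ** 3)
    print(M, em_counts(M, 4, 1), float(exact), float(approx),
          lookup_table_size(M, 4, 1))
```

Using `Fraction` keeps the ratio exact, which makes it easy to compare against the closed form $k(M^\gamma - 1)/(M^{k\gamma} - 1)$ for small parameters.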
The EMRR, approximately $k / M^{(k-1)\gamma}$, is a decreasing function of $k$, $M$ and $\gamma$.

2) $\Gamma > 2$: Consider a BOSTBC with parameters $(\Gamma, k, \gamma)$. The structure of the $\mathbf{R}$ matrix for this code is as mentioned in (12). We first calculate the memory requirements for storing the EM values. The memory requirement per sub-block $\mathbf{U}_{i,j}$, $1 \leq j < k$, of any block $\mathbf{R}_i$, $1 < i \leq \Gamma$, under consideration is the same as that of the sub-block $\mathbf{U}_{2,j}$ in the $\Gamma = 2$ case. This is so because, for a given set of values for the variables in the blocks $\mathbf{R}_m$, $i < m \leq \Gamma$, the memory requirement for the sub-block $\mathbf{U}_{i,j}$ can be calculated in the same way as it was calculated for $\mathbf{U}_{2,j}$ in the $\Gamma = 2$ case. Hence, the memory requirement for a block $\mathbf{R}_i$, for a given set of values for the variables in the blocks $\mathbf{R}_m$, is the same as that of $\mathbf{R}_2$ in the $\Gamma = 2$ case:

$$\mathrm{Mem}(\mathbf{R}_i)_{\text{conditional}} = (k - 1)\frac{M(M^\gamma - 1)}{M - 1}.$$

We can reuse the same memory for another set of given values of the variables of $\mathbf{R}_m$, as the previous EM values will not be retrieved again: the depth first search algorithm does not revisit any of the previously visited nodes (i.e., any previously given set of values for the variables in the tree). Hence, we can write

$$\mathrm{Mem}(\mathbf{R}_i) = (k - 1)\frac{M(M^\gamma - 1)}{M - 1},$$

for $1 < i \leq \Gamma$. Since there are $\Gamma - 1$ such blocks, the total memory requirement for storing the EM values will be

$$\mathrm{Mem}(\mathbf{R}) = (\Gamma - 1)(k - 1)\frac{M(M^\gamma - 1)}{M - 1}.$$

We now find the maximum number of reductions possible for the EM calculations for this BOSTBC. This will occur when all the nodes are visited in the depth first search. For blocks other than $\mathbf{R}_1$, the number of EM calculations for a code without the block orthogonal structure would be

$$O_{STBC} = M + M^2 + \ldots + M^{(\Gamma-1)k\gamma} = \frac{M\left(M^{(\Gamma-1)k\gamma} - 1\right)}{M - 1}.$$

For a BOSTBC, if we consider the block $\mathbf{R}_i$ and a given set of values for the variables in $\mathbf{R}_m$, $i < m \leq \Gamma$, and if we use the look up table, we would be performing the EM calculations only once for each sub-block. For $k$ sub-blocks, the number of EM calculations will be

$$O_{BOSTBC}(\mathbf{R}_i)_{\text{conditional}} = k\left(M + M^2 + \ldots + M^\gamma\right) = k\,\frac{M(M^\gamma - 1)}{M - 1}.$$

These calculations need to be repeated for all the $M^{(\Gamma-i)k\gamma}$ values of the variables in $\mathbf{R}_m$:

$$O_{BOSTBC}(\mathbf{R}_i) = k M^{(\Gamma-i)k\gamma}\left(M + M^2 + \ldots + M^\gamma\right) = k M^{(\Gamma-i)k\gamma}\,\frac{M(M^\gamma - 1)}{M - 1}.$$

The EM calculations for all the blocks is given by

$$O_{BOSTBC} = \sum_{i=2}^{\Gamma} k M^{(\Gamma-i)k\gamma}\,\frac{M(M^\gamma - 1)}{M - 1} = \frac{kM(M^\gamma - 1)}{M - 1}\cdot\frac{M^{(\Gamma-1)k\gamma} - 1}{M^{k\gamma} - 1}.$$

The EMRR in this case will be

$$\frac{O_{BOSTBC}}{O_{STBC}} = \frac{\frac{kM(M^\gamma - 1)}{M - 1}\cdot\frac{M^{(\Gamma-1)k\gamma} - 1}{M^{k\gamma} - 1}}{\frac{M\left(M^{(\Gamma-1)k\gamma} - 1\right)}{M - 1}} = \frac{k(M^\gamma - 1)}{M^{k\gamma} - 1} \approx \frac{k}{M^{(k-1)\gamma}}.$$

We can see that the ratio of the reduction of operations is independent of $\Gamma$ and dependent only on $k$ and $\gamma$.

C. QRDM decoding complexity reduction [3]

In this section we review the simplified QRDM decoding method which exploits the block orthogonal structure of a code as presented in [3]. The traditional QRDM decoder is a breadth first search decoder in which $M_c$ surviving paths with the smallest Euclidean metrics are picked at each stage and the rest of the paths are discarded. If $M_c = M^{(\Gamma-1)k\gamma}$ for a block orthogonal code with parameters $(\Gamma, k, \gamma)$, then the QRDM decoder gives ML performance. The simplified QRDM decoder utilizes the block orthogonal structure of the code to find virtual paths between nodes, which reduces the number of surviving paths to effectively $M_{c_{eq}}$, to reduce the number of Euclidean metric calculations. For details of how this is achieved, refer to [3]. The maximum reduction in decoding complexity bound for a QRDM decoder is given by

$$\frac{O_{BOSTBC}}{O_{STBC}} = \frac{M^\gamma}{k(M^\gamma - 1)}.$$

VI. SIMULATION RESULTS AND DISCUSSION

In all the simulation scenarios in this section, we consider quasi-static Rayleigh flat fading channels and the channel state information (CSI) is known at the receiver perfectly. Any STBC which does not have a block orthogonal property is assumed to be a fast decodable STBC which is conditionally $k$ group decodable with $\gamma$ symbols per group, but not possessing the block diagonal structure for the blocks $\mathbf{R}_2, \ldots, \mathbf{R}_\Gamma$.

A. Sphere decoding using depth first search

We first plot the EMRR for BOSTBCs with different parameters against the SNR. Figures 2 and 3 show the plot of $O_{BOSTBC}/O_{STBC}$ vs SNR for a $(2, 4, 1)$ BOSTBC (examples: Silver code, BHV code) with the symbols being drawn from 4-QAM, 16-QAM and 64-QAM. We can clearly see the reduction in the EMRR with the increasing size of signal constellation as explained in Section V-B. It can also be seen that a larger value of $k$ gives a lower EMRR if we keep the product $k\gamma$ constant. Figure 4 shows the plot of $O_{BOSTBC}/O_{STBC}$ vs SNR for a $(2, 4, 2)$ BOSTBC (example: $4 \times 2$ code from Pavan et al. [23]) with the symbols being drawn from 4-QAM and 16-QAM. Notice that the $(2, 4, 2)$ BOSTBC offers a lower EMRR as compared to the $(2, 4, 1)$ BOSTBC due to the higher value of $\gamma$, as explained in Section V-B.

[Fig. 2. EMRR vs SNR for a BOSTBC with parameters (2,4,1). Axes: SNR (dB), 0 to 30, against $O_{BOSTBC}/O_{STBC}$; curves for 4-QAM, 16-QAM and 64-QAM.]

[Fig. 3. EMRR vs SNR for a BOSTBC with parameters (2,2,2). Axes: SNR (dB), 0 to 30, against $O_{BOSTBC}/O_{STBC}$; curves for 4-QAM, 16-QAM and 64-QAM.]

[Fig. 4. EMRR vs SNR for a BOSTBC with parameters (2,4,2). Axes: SNR (dB), 0 to 30, against $O_{BOSTBC}/O_{STBC}$; curves for 4-QAM and 16-QAM.]

[Fig. 5. The number of FLOPS required for decoding for a BOSTBC with parameters (2,4,1). Axes: SNR (dB), 0 to 30, against #FLOPS (logarithmic scale); curves for 4-QAM, 16-QAM and 64-QAM, each for the code with and without the BO property.]

B. Comparison with the QRDM decoder approach

The primary difference between the depth first and the breadth first (QRDM) approach is the variation of the EMRR with respect to SNR. As seen in the figures from Section VI-A,
theeffectoftheblockorthogonalpropertyreducesastheSNR We now compare the total number of FLOPS performed increases in the depth first sphere decoder. This is owing to by the sphere decoder for a BOSTBC against that of an theSchnorr-Euchnerenumerationandpruningofbranches.As STBC withouta blockorthogonalstructurefor variousSNRs. theSNRincreases,thedecoderneedstovisitfewernumberof Figures 5, 6, 7 show the plot of number of FLOPS vs SNR nodes in order to find the ML solution and hence the EMRR for a (2,4,1) BOSTBC, a (2,2,2) BOSTBC and a (2,4,2) also tends to 1. However, in the case of a breadth first search BOSTBC respectively with the symbols being drawn from 4- algorithm, all the nodes need to be visited in order to arrive QAM, 16-QAM and 64-QAM for the first two figures and to a solution. Hence the EMRR is independentof the SNR in from 4-QAM and 16-QAM for the last one. We can see that the breadth first search case. To reduce the number of nodes the BOSTBCs offer around 30% reduction in the number of visited, only M paths are selected in the QRDM algorithm c FLOPS for the (2,4,1) and (2,4,2) BOSTBCs and around to reduce complexity. The value of M chosen needs to be c 15% for the (2,2,2) BOSTBC at low SNRs. varied with SNR in order to get near ML performance. 105 [3] T. P. Ren, Y. L. Guan, C. Yuen and E. Y. Zhang, “Block-Orthogonal # FLOPS for 4−QAM without BO structure SpaceTime Code Structure and Its Impact on QRDM Decoding Com- # FLOPS for 4−QAM with BO structure plexityReduction,“IEEEjournalofSelectedTopicsinSignalProcessing, vol.5,issue8,pp.1438-1450,Nov.2011. # FLOPS for 16−QAM without BO structure [4] V.Tarokh,H.JafarkhaniandA.R.Calderbank,“Space-TimeBlockCodes # FLOPS for 16−QAM with BO structure from Orthogonal Designs,“ IEEETrans. Inf. Theory, vol. 45, no. 5, pp. 104 # FLOPS for 64−QAM without BO structure 1456-1467,July1999. # FLOPS for 64−QAM with BO structure [5] X.B.Liang,“OrthogonalDesignswithMaximalRates,“IEEETrans.Inf. 
S Theory,vol.49,no.10,pp.2468-2503, Oct.2003. P O [6] O.TirkkonenandA.Hottinen,“Square-Matrix EmbeddableSpace-Time L F Block Codes for Complex Signal Constellations,“ IEEE Trans. Inf. # Theory,vol.48,no.2,pp.384-395, Feb.2002. 103 [7] D. N. Dao, C. Yuen, C. Tellambura, Y. L. Guan and T. T. Tjhung, “Four-Group Decodable Space-Time Block Codes,“ IEEETrans. Signal Processing,vol.56,no.1,pp.424-430,Jan.2008. [8] S. Karmakar and B. S. Rajan, “Multigroup Decodable STBCs From CliffordAlgebras,“IEEETrans.Inf.Theory,vol.55,no.1,pp.223-231, Jan.2009. [9] S. Karmakar and B. S. Rajan, “High-rate, Multi-Symbol-Decodable 102 0 5 10 15 20 25 30 STBCsfromClifford Algebras,“IEEETransactions onInf.Theory,vol. SNR (dB) 55,no.06,pp.2682-2695,Jun.2009. [10] E. Biglieri, Y. Hong and E. Viterbo, “On Fast-Decodable Space-Time BlockCodes,“IEEETrans.Inf.Theory,vol.55,no.2,pp.524-530,Feb. Fig. 6. The numberof FLOPSrequired fordecoding fora BOSTBCwith parameters (2,2,2) 2009. [11] R.Vehkalahti, C.Hollanti andF.Oggier,“Fast-Decodable Asymmetric Space-Time Codes from Division Algebras,“ available online at arXiv, 106 arXiv:1010.5644v1[cs.IT]. [12] T.P.Ren,Y.L.Guan,C.YuenandR.J.Shen,“Fast-Group-Decodable # FLOPS for 4−QAM without BO structure Space-TimeBlockCode,“Proceedings IEEEInformation TheoryWork- # FLOPS for 4−QAM with BO structure shop, (ITW 2010), Cairo, Egypt, Jan. 6-8, 2010, available online at http://www1.i2r.a-star.edu.sg/cyuen/publications.html. 105 # FLOPS for 16−QAM without BO structure [13] M.O.SinnokrotandJ.Barry,“FastMaximum-LikelihoodDecodingof # FLOPS for 16−QAM with BO structure theGoldenCode,“IEEETransactionsonWirelessCommun.,vol.9,no. 1,pp.26-31,Jan.2010. [14] J. C. Belfiore, G. Rekaya and E. Viterbo, “The golden code: A 2x2 S P full-rate spacetime code with non-vanishing determinants,“ IEEETrans. LO104 Inf.Theory,vol.51,no.4,pp.1432-1436,Apr.2005. F # [15] S. Kahraman and M. E. 
Celebi, “Dimensionality Reduction for the GoldenCodewithWorst-caseComplexityofO(cid:0)m2(cid:1),“available online athttp://istanbultek.academia.edu/SinanKahraman . [16] G. S. Rajan and B. S. Rajan, “Multi-group ML Decodable Collocated 103 andDistributedSpaceTimeBlockCodes,“IEEETrans.Inf.Theory,vol. 56,no.7,pp.3221-3247,July2010. [17] Z.AliKhanMd.,andB.S.Rajan,“SingleSymbolMaximumLikelihood Decodable Linear STBCs,“IEEETrans.Inf.Theory, vol. 52, no.5, pp. 102 2062-2091,May2006. 0 5 10 15 20 25 30 [18] B.A.Sethuraman,B.S.RajanandV.Shashidhar,“Full-diversity, high- SNR (dB) ratespace-timeblockcodesfromdivisionalgebras,“IEEETrans.Inform. Theory,vol.49,pp.2596-2616, Oct2003. Fig. 7. The numberof FLOPSrequired fordecoding fora BOSTBCwith [19] V.Shashidhar,B.S.RajanandB.A.Sethuraman,“Information-lossless parameters (2,4,2) space-timeblockcodesfromcrossed-productalgebras,“IEEETrans.Inf. Theory,vol.52,no.9,pp.39133935, Sep2006. [20] O. Damen, A. Chkeif, and J.C. Belfiore, “Lattice Code Decoder for Space-TimeCodes,“IEEECommunicationLetters,vol.4,no.5,pp.161- VII. CONCLUSION 163,May2000. [21] G. R. Jithamithra and B. S. Rajan, “Minimizing the Complex- Inthispaperwehavestudiedtheblockorthogonalproperty ity of Fast Sphere Decoding of STBCs,“ available online at arXiv, ofSTBCs.Wehaveshownthatthispropertydependsuponthe arXiv:1004.2844v2[cs.IT],22May2011. orderingof weight matrices. We have also providedproofsof [22] C.Hollanti, J.Lahtonen, K.Ranto, R.Vehkalahti andE.Viterbo, “On thealgebraicstructureoftheSilvercode:A2 2Perfectspace-timecode various existing codes exhibiting the block orthogonal prop- withnon-vanishingdeterminant,“inProc.ofIEEEInf.TheoryWorkshop, erty.A methodofexploitingthe blockorthogonalstructureof Porto,Portugal,May2008. theSTBCstoreducethespheredecodingcomplexitywasalso [23] K.P.Srinath andB. S.Rajan, “Low ML-Decoding Complexity, Large Coding Gain, Full-Rate, Full-Diversity STBCs for 2x2 and 4x2 MIMO given with bounds on the maximum possible reduction. 
Systems,“IEEEJournalofSelectedTopicsinSignalProcessing:Special issueonManagingComplexityinMultiuserMIMOSystems,vol.3,no. REFERENCES 6,pp.916-927,Dec.2009. [24] T.H.Cormen,C.E.Leiserson,R.L.Rivest, C.Stein,“Introduction to [1] B. Hassibi and B. Hochwald, “High-rate codes that are linear in space algorithms,“Thirdedition, MITPress,Sep2009. andtime,“ IEEETrans.Inf.Theory,vol. 48,no.7,pp. 1804-1824, July 2002. [2] E.ViterboandJ.Boutros,“AUniversalLatticeCodeDecoderforFading Channels,“ IEEETrans.Inf.Theory,vol.45,no.5,pp.1639-1642,July 1999.
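As an illustration of the look up table argument of Section V-B, the following is a minimal Python sketch and not the decoder used for the simulations. It evaluates the worst-case EM counts $O_{STBC}$ and $O_{BOSTBC}$ by direct summation, and it counts EM computations over the block $R_2$ of the $(2,2,1)$ code of Example 6 with and without a look up table. The function names `em_counts` and `count_em_evaluations`, the numeric values chosen for the non-zero entries $t$ of $R$, and the vector `y` are assumptions made only for this illustration; visiting every node stands in for the worst case in which no branch is pruned.

```python
from itertools import product


def em_counts(M, Gamma, k, gamma):
    """Worst-case EM counts for blocks R_2..R_Gamma when every node is visited."""
    depth = (Gamma - 1) * k * gamma
    # No block orthogonal structure: one EM per node on each of the `depth` levels.
    o_stbc = sum(M**l for l in range(1, depth + 1))
    # With the look up table: k sub-blocks of gamma levels per block R_i,
    # recomputed for each of the M^((Gamma-i)*k*gamma) assignments of R_m, m > i.
    per_block = k * sum(M**l for l in range(1, gamma + 1))
    o_bostbc = sum(M**((Gamma - i) * k * gamma) * per_block
                   for i in range(2, Gamma + 1))
    return o_stbc, o_bostbc


def count_em_evaluations(R, y, symbols, levels, use_table):
    """Visit every node over `levels` (root first) and count EM computations."""
    table, evaluations = {}, 0
    for d in range(1, len(levels) + 1):
        j = levels[d - 1]                      # variable decided at this depth
        for values in product(symbols, repeat=d):
            x = dict(zip(levels[:d], values))  # partial assignment on this path
            # The edge metric depends only on variables with a nonzero R[j][l].
            key = (j, tuple((l, v) for l, v in x.items() if R[j][l] != 0))
            if use_table and key in table:
                continue                       # retrieved, not recomputed
            table[key] = (y[j] - sum(R[j][l] * v for l, v in x.items())) ** 2
            evaluations += 1
    return evaluations


# Zero pattern of Example 6; the nonzero values and y are arbitrary placeholders.
R = [[1.0, 0.0, 0.3, 0.7],
     [0.0, 1.0, 0.5, 0.2],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
y = [0.1, -0.4, 0.8, -0.6]
pam2 = (-1, 1)

# Block R_2 holds x4 and x3 (0-based indices 3 and 2), searched in that order.
plain = count_em_evaluations(R, y, pam2, levels=[3, 2], use_table=False)
memo = count_em_evaluations(R, y, pam2, levels=[3, 2], use_table=True)
print(plain, memo, em_counts(2, 2, 2, 1))
```

For this toy case the enumeration computes 6 metrics without the table and 4 with it, which agrees with $O_{STBC} = M + M^2$ and $O_{BOSTBC} = kM$ for $M = 2$, $k = 2$, $\gamma = 1$, i.e. an EMRR of $2/3$.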