Table Of Content

Low-Complexity Near-ML Decoding of Large Non-Orthogonal STBCs Using PDA Saif K. Mohammed,A. Chockalingam,and B. SundarRajan DepartmentofECE,IndianInstituteofScience,Bangalore560012,INDIA Abstract—Non-orthogonalspace-timeblockcodes(STBC)from that802.11smartWiFi productswith 12 transmitantennas1 cyclicdivisionalgebras (CDA) havinglarge dimensionsareat- at2.5GHzarenowcommerciallyavailable[4](whichestab- tractivebecausetheycansimultaneouslyachievebothhighspec- lishesthatissuesrelatedtoplacementofmanyantennasand tral efficiencies (same spectral efficiency as in V-BLAST for a 9 RF/IFchainscanbesolvedinlargeaperturecommunication givennumberoftransmitantennas)aswellasfulltransmitdi- 0 versity.Decodingofnon-orthogonalSTBCswithhundredsofdi- terminals like set-top boxes/laptops), large non-orthogonal 0 mensionshasbeenachallenge.Inthispaper,wepresentaprob- STBCs(e.g.,16 16STBCfromCDA)incombinationwith 2 × abilistic data association (PDA) based algorithm for decoding large dimension near-ML decoding using PDA can enable n non-orthogonalSTBCswithlargedimensions. Oursimulation communications at increased spectral efficiencies of the or- a results show that the proposed PDA-based algorithm achieves deroftensofbps/Hz(notethatcurrentstandardsachieveonly J nearSISOAWGNuncodedBERaswellasnear-capacitycoded BER (within about 5 dB of the theoretical capacity) for large <10bps/Hzusingonlyupto4transmitantennas). 3 non-orthogonal STBCsfromCDA. Westudytheeffect of spa- 1 PDA,originallydevelopedfortargettracking,iswidelyused tialcorrelationontheBER,andshowthattheperformanceloss due to spatial correlation can be alleviated by providing more in digital communications[5]-[12]. Particularly, PDA algo- ] T receive spatial dimensions. Wereport good BER performance rithm is a reduced complexity alternative to the a posteriori I whena training-basediterative decoding/channelestimation is probability (APP) decoder/detector/equalizer. Near-optimal s. used(insteadofassumingperfectchannelknowledge)inchan- performancehasbeendemonstratedforPDA-basedmultiuser c nels with large coherence times. A comparison of the perfor- detection in CDMA systems [5]-[8]. PDA has been used in [ mancesof thePDAalgorithmandthelikelihoodascent search (LAS)algorithm(reportedinourrecentwork)isalsopresented. the detection of V-BLAST signals with small numberof di- 1 mensions [10]-[12]. To our knowledge, PDA has not been v Keywords–Non-orthogonalSTBCs,largedimensions,highspec- reportedfordecodingnon-orthogonalSTBCs with hundreds 9 tralefficiency,low-complexitynear-MLdecoding,probabilisticdata ofdimensionssofar. Ourresultsinthispapercanbesumma- 6 association. rizedasfollows: 8 1 . I. INTRODUCTION • WeadaptthePDAalgorithmfordecodingnon-orthogo- 1 nalSTBCswithlargedimensions. Withi.i.dfadingand 0 Multiple-inputmultiple-output(MIMO)systemsthatemploy perfectCSIR,thealgorithmachievesnear-SISOAWGN 9 non-orthogonalspace-time blockcodes (STBC) from cyclic uncodedBERandnear-capacitycodedBER(withinabout 0 divisionalgebras(CDA)forarbitrarynumberoftransmitan- 5dBofthetheoreticalcapacity)for12 12STBCfrom : v tennas, Nt, are quite attractive because they can simultane- CDA,4-QAM,rate-3/4turbocode,and×18bps/Hz. i ously provide both full-rate (i.e., N complex symbols per X t RelaxingtheperfectCSIRassumption,wereportresults channel use, which is same as in V-BLAST) as well as full • withatrainingbasediterativePDAdecoding/channeles- r a transmitdiversity[1].The2 2Goldencodeisawellknown timation scheme. The iterative scheme is shown to be × non-orthogonalSTBCfromCDAfor2transmitantennas[2]. effectivewithlargecoherencetimes. High spectralefficienciesof the orderof tensof bps/Hz can Relaxingthei.i.dfadingassumptionbyadoptingaspa- be achieved using large non-orthogonalSTBCs. For exam- • tiallycorrelatedMIMOchannelmodel(proposedbyGes- ple, a 16 16 STBC from CDA has 256 complex symbols bertetalin[18]),weshowthattheperformancelossdue × in it with 512 real dimensions; with 16-QAM and rate-3/4 tospatialcorrelationisalleviatedbyusingmorereceive turbocode,thissystemoffersahighspectralefficiencyof48 spatialdimensionsforafixedreceiveraperture. bps/Hz. Decodingofnon-orthogonalSTBCswithsuchlarge Finally, the performanceofthe PDA algorithmiscom- dimensions,however,hasbeena challenge. Spheredecoder • pared with that of the likelihood ascent search (LAS) anditslow-complexityvariantsareprohibitivelycomplexfor algorithmwerecentlypresentedin[13]-[15]. ThePDA decodingsuchSTBCswithhundredsofdimensions. algorithm is shown to perform better than the LAS al- Inthispaper,wepresentaprobabilisticdataassociation(PDA) gorithm at low SNRs for higher-order QAM (e.g., 16- based algorithm for decoding large non-orthogonal STBCs QAM),andinthepresenceofspatialcorrelation. from CDA. Key attractive features of this algorithm are its low-complexity and near-ML performance in systems with II. SYSTEMMODEL largedimensions(e.g.,hundredsofdimensions). Whilecre- Considera STBC MIMOsystemwithmultipletransmitand atinghundredsofdimensionsinspacealone(e.g.,V-BLAST) receiveantennas. An(n,p,k)STBCisrepresentedbyama- requireshundredsofantennas,useofnon-orthogonalSTBCs fromCDA can createhundredsof dimensionswith justtens 112 antennas in these products are now used only for beamforming. of antennas (space) and tens of channel uses (time). Given Single-beam multi-antenna approaches can offer range increase andinter- ferenceavoidance,butnotspectralefficiencyincrease. trixXc Cn×p,wherenandpdenotethenumberoftransmit Now,(3)canbewrittenas ∈ antennas and number of time slots, respectively, and k de- notesthenumberofcomplexdatasymbolssentinoneSTBC y = H x +n . (7) r r r r matrix.The(i,j)thentryinX representsthecomplexnum- c bertransmittedfromtheithtransmitantennainthejthtime Henceforth,weworkwiththereal-valuedsystemin(7). For slot.TherateofanSTBCis k.LetN andN =ndenotethe notationalsimplicity,wedropsubscriptsrin(7)andwrite p r t number of receive and transmit antennas, respectively. Let Hc CNr×Nt denote the channel gain matrix, where the y = H′x+n, (8) ∈ (i,j)thentryinH isthecomplexchannelgainfromthejth transmitantennatoc the ith receiveantenna. We assume that whereH′ = Hr ∈ R2Nrp×2k,y = yr ∈ R2Nrp×1,x = xr ∈ thechannelgainsremainconstantoveroneSTBCmatrixand R2k×1, andn = nr ∈ R2Nrp×1. We assumethatthe channel coefficientsareknownatthereceiverbutnotatthetransmit- vary (i.i.d) from one STBC matrix to the other. Assuming ter. LetA denotetheM-PAMsignalsetfromwhichx (ith richscattering,wemodeltheentriesofH asi.i.d (0,1). i i c CN entry of x) takes values, i = 0, ,2k 1. Now, define a The receivedspace-timesignalmatrix, Yc ∈ CNr×p, can be 2k-dimensionalsignalspaceSto·b·e· theC−artesianproductof writtenas A toA . TheMLsolutionisthengivenby Yc =HcXc+Nc, (1) 0 2k−1 iwtsheenretriNescar∈emCoNdre×lpedisasthi.ei.dnoCiNse m0,aσtr2ix=atNthtγEesre,cweihveerreaEnds dML = adrg∈mSindT(H′)TH′d−2yTH′d, (9) istheaverageenergyofthetransm(cid:0)ittedsymbols,(cid:1)andγisthe whosecomplexityisexponentialink. averagereceivedSNRperreceiveantenna[3],andthe(i,j)th entryinY isthereceivedsignalattheithreceiveantennain c A. Full-rateNon-orthogonalSTBCsfromCDA the jth time-slot. Consider linear dispersion STBCs, where X canbewrittenintheform[3] We focus on the detection of square (i.e., n=p=Nt), full- c rate(i.e.,k=pn=N2),circulant(wheretheweightmatrices t k A(i)’s are permutation type), non-orthogonal STBCs from X = x(i)A(i), (2) c c c c CDA [1], whose construction for arbitrary number of trans- Xi=1 mitantennasnisgivenbythematrixinEqn.(9.a)givenatthe wherexc(i)istheithcomplexdatasymbol,andA(ci) CNt×p bottomofthispage. In(9.a),ωn = ej2nπ,j = √ 1,anddu,v, ∈ − isitscorrespondingweightmatrix.Thereceivedsignalmodel 0 u,v n 1arethen2datasymbolsfromaQAMalpha- ≤ ≤ − in(1)canbewritteninanequivalentV-BLASTformas bet.Whenδ =t=1,thecodein(9.a)isinformationlossless (ILL),andwhenδ = e√5j andt = ej, itis offull-diversity k y = x(i)(H a(i))+n = H x +n , (3) andinformationlossless(FD-ILL)[1].Highspectralefficien- c c c c c c c c cieswithlargencanbeachievedusingthiscodeconstruction. Xi=1 b e However,sincetheseSTBCsarenon-orthogonal,MLdetec- whereyc CNrp×1 = vec(Yc), Hc CNrp×Ntp = (I tiongetsincreasinglyimpracticalforlargen.Consequently,a ∈ ∈ ⊗ Hc),ac(i) ∈CNtp×1 =vec(A(ci)),nbc ∈CNrp×1 =vec(Nc), keychallengeinrealizingthebenefitsoftheselargeSTBCsin x Ck 1 whose ith entry is the data symbol x(i), and practiceisthatofachievingnear-MLperformanceforlargen c × c Hc ∈CNrp×k whoseithcolumnisHca(ci),i =1,2, ,k. atlowdecodingcomplexities. TheBERperformanceresults ∈ ··· we reportin Sec. IV show thatthe PDA-baseddecodingal- Eachelementofx isanM-PAM/M-QAMsymbol. Lety , c c e b gorithmweproposeinthefollowingsectionessentiallymeets H ,x ,n bedecomposedintorealandimaginarypartsas: c c c thischallenge. e y =y +jy , x =x +jx , c I Q c I Q III. PROPOSED PDA-BASED DECODING n =n +jn , H =H +jH . (4) c I Q c I Q In this section, we present the proposed PDA-based decod- Further, we define Hr R2Nerp×2k, yr R2Nrp×1, xr ing algorithm for square QAM. The applicability of the al- R2k×1,andnr R2Nrp∈×1 as ∈ ∈ gorithm to any rectangular QAM is straightforward. In the ∈ H H real-valuedsystemmodelin(8),eachentryofxbelongstoa Hr =(cid:18) HIQ −HQI (cid:19), yr =[yIT yQT]T, (5) √M-PAMconstellation,whereM isthesizeoftheoriginal xr =[xTI xTQ]T, nr =[nTI nTQ]T. (6) sthqeuaqre=QloAgM(√coMns)teclloantisotintu.eLnettbbit(is0)o,fbt(ih1)e,i·t·h·e,nbt(irqy−x1) odfenxo.te 2 i 2666 PPPninini===−−−000111.ddd210,,,iiitttiii δPPPninini=−==−−010011dddn.10−,,ii1ωω,inniiωttniiiti δδPPPninini==−−=−001101dddnn.0−−,i12ω,,iin2ωωinn22tiiittii ······.··· δδδPPPninini===−−−000111ddd321,,,.iiiωωωnnn(((nnn−−−111)))iiitttiii 3777. (9.a) 6 . . . . . 7 6 . . . . . 7 6664 PPnini==−−0011ddnn−−21,,iittii PPnini==−−0011ddnn−−32,,iiωωnnii ttii PPnini==−−0011ddnn−−43,,iiωωnn22iittii ······ δPPnini==−−0011ddn0−,i1ω,in(ωn−n(n1−)i1t)iiti 7775 Wecanwritethevalueofeachentryofxasalinearcombi- Gaussian,andhenceyisGaussianconditionedonb(j). Since i nationofitsconstituentbitsas thereare2qk 1termsinthedoublesummationin(15),this − Gaussian approximationgets increasinglyaccurate for large q 1 x = − 2jb(j), i=0,1, ,2k 1. (10) Nt(notethatk =Nt2). SinceaGaussiandistributionisfully i i ··· − characterized by its mean and covariance, we evaluate the Xj=0 mean and covarianceof y givenb(j) = +1 and b(j) = 1. i i − Letb∈{+1,−1}2qk×1,definedas Fornotationalsimplicity,letusdefinepji+ =△ P(b(ij) = +1) dwebenco=△atentwhhebr(0itt0re)an·x·sma·sbi0(tqt−ed1)bbi(1t0v)e·c·t·obr(1.qD−e1)fi·n·in·gb(2c0k)−=△1[·2·0·2b1(2qk·−·−·112)iq(−T11,1]), wwLaneehdtecrµpaejinji−E+w(=△.r=△)itPdeEeµ(n(bjioy(i+jt|e)bsa(i=jst)h−e=e1x)+.peI1tc)itsaatcnioldenaµorpjit−hearat=△tpojir+E.N(+yo|wpb(iij,j−f)ro==m1−(.115)),, x = (I⊗c)b, (12) µj+ = h +2k−1 q−1h (2pm+ 1). (16) i qi+j ql+m l − where I is the 2k 2k identity matrix. Using (12), we can Xl=0 mX=0 × m=q(i l)+j rewrite(8)as 6 − y = H(I c) b+n, (13) Similarly,wecanwriteµji− as ′ ⊗ 2k 1 q 1 | △={zH } µji− = −hqi+j + − − hql+m(2plm+−1) whereH ∈ R2Nrp×2qk is the effectivechannelmatrix. Our Xl=0m=qmX(i=0l)+j goalistoobtainb, anestimateofthebvector. Forthis,we 6 − = µj+ 2h . (17) iterativelyupdatethestatisticsofeachbitofb, asdescribed i − qi+j b ainndthheafrodlldoewciinsigosnusbasreectmioand,efoornathceerfitanianlnsutamtibsteircsoftoitegreattibo.ns, Nisegxitv,etnheby2Nrp×2NrpcovariancematrixCji ofygivenb(ij) A. IterativeProcedure b 2k 1 q 1 Cj = E n+ − − h (b(m) 2pm++1) dTahteesa,lgoonreitfhomr eiascihteoraftitvheeicnonnsattiuturee,ntwbhietsr,ea2rqekpestrafotirsmticedupin- i (cid:26)h Xl=0 mX=0 ql+m l − l i m=q(i l)+j eachiteration.Westartthealgorithmbyinitializingtheapri- 6 − 2k 1 q 1 0o,ri·p··ro,b2akb−ili1tieasnadsjP=(b0(ij,)·=··+,q1)−=1.PI(nba(ijn)i=ter−at1i)on=,t0h.5e,s∀taiti=s- hn+Xl=−0 mX−=0 hql+m(b(lm)−2pml ++1)iT(cid:27). (18) tics of the bits are updatedsequentially, i.e., the orderedse- m=q(i l)+j 6 − quenceofupdatesinaniterationis b(00),··· ,b(0q−1),······ , Assuming independenceamong the constituentbits, we can b(0) , b(q−1) . Thestepsinvol(cid:8)vedineachiterationofthe simplifyCj in(18)as 2k 1 ··· 2k 1 i algo−rithmare−der(cid:9)ivedasfollows. 2k 1 q 1 iTshgeivlieknelbihyoodratioofbitb(ij)inaniteration,denotedbyΛ(ij), Cji = σ2I + − − hql+mhTql+m4pml +(1−pml +). (19) Xl=0 mX=0 m=q(i l)+j 6 − P b(j) =+1y Λ(j) =△ i | Using the above mean and covariance expressions, we can i PP(cid:0)(cid:0)by(ijb)(j=) =−1+|y1(cid:1)(cid:1) P b(j) =+1 writethedistributionofygivenb(ij) =±1as = P(cid:0)y|b(ij) = 1(cid:1) P(cid:0)b(ij) = 1(cid:1). (14) P(yb(j) = 1) = e−(y−µji±)T(Cji)−1(y−µji±). (20) (cid:0) | i − (cid:1) (cid:0) i − (cid:1) | i ± (2π)Nrp|Cji|21 △=β(j) △=α(j) | {zi } | {zi } Similarly,P(yb(j) = 1)isgivenby DenotingthetthcolumnofHbyh ,wecanwrite(13)as | i − t 2k 1 q 1 P(yb(j) = 1) = e−(y−µji−)T(Cji)−1(y−µji−). (21) y = hqi+jb(ij)+ − − hql+mb(lm)+n, (15) | i − (2π)Nrp|Cji|12 Xl=0 m=qmX(i=0l)+j Using(20)and(21),βj canbewrittenas 6 − i =△ne P(yb(j) =+1) | {z } βj = | i where n R2Nrp×1 is the interference plus noise vector. i P(y|b(ij) =−1) Tocalceula∈teβi(j),weapproximatethedistributionofntobe =e−((y−µji+)T(Cji)−1(y−µji+)−(y−µji−)T(Cji)−1(y−µji−)). (22) e Usingα(j)andβ(j),Λ(j)iscomputedusing(14).Now,using of the old D matrix. Therefore, using the matrix inversion i i i the valueof Λi(j), the statistics ofb(ij) isupdatedas follows. lemma,thenewD−1 canbeobtainedfromtheoldD−1as From(14),andusingP(b(j) = +1y)+P(b(j) = 1y) = 1, i | i − | D 1h hT D 1 wehave D 1 D 1 − ni+j ni+j − , (28) − ← − − hT D 1h + 1 Λ(j) ni+j − ni+j η P(b(j) =+1y) = i , (23) i | 1+Λ(j) where i and η = 4pj+ 1 pj+ 4pj+ 1 pj+ , (29) i − i − i,old − i,old (cid:0) (cid:1) (cid:0) (cid:1) P(b(j) = 1y) = 1 . (24) where pj+ and pj+ are the new i.e., after the update in i − | 1+Λ(ij) (25) anid (26) ai,nodldold (beforethe(cid:0)update) values, respec- tivel(cid:1)y. Itcanbe(cid:1)seenthatboththenumeratoranddenomina- Asanapproximation,droppingtheconditioningony, tor in the 2nd term on the RHS of (28) can be computedin Λ(j) O(Nr2p2)complexity.Therefore,thecomputationofthenew P(b(j) =+1) i , (25) D 1usingtheoldD 1canbedoneinO(N2p2)complexity. i ≈ 1+Λ(j) − − r i Computationof (Cj) 1: Using (27) and (19), we can write and i − Cj intermsofDas i 1 P(b(j) = 1) . (26) i − ≈ 1+Λ(ij) Cji = D−4pji+(1−pji+)hqi+jhTqi+j. (30) Using the above procedure, we update P(b(j) = +1) and Wecancompute(Cj) 1 fromD 1 atareducedcomplexity i i − − P(b(j) = 1)foralli=0, ,2k 1andj =0, ,q 1 usingthematrixinversionlemma,whichstatesthat i − ··· − ··· − sequentially. This completes one iteration of the algorithm; i.e.,eachiterationinvolvesthecomputationofα(j)andequa- (P+QRS)−1 = P−1 P−1Q(R−1+SP−1Q)−1SP−1. (31) i − tions (16), (17), (19), (22), (14), (25), and (26) for all i,j. SubstitutingP =D,Q =h ,R = The updated valuesof P(b(ij) = +1) and P(b(ij) = −1) in 4pj+(1 pj+2N),rpa×nd2NSrp =2hNTrp×1in(31q)i,+wjege1t×1 (25) and (26) for all i,j are fed back to the next iteration2. − i − i 1×2Nrp qi+j Thealgorithmterminatesafteracertainnumberofsuchiter- D 1h hT D 1 ations. Attheendofthelastiteration,harddecisionismade (Cj) 1 = D 1 − qi+j qi+j − , (32) on the final statistics to obtain the bit estimate b(ij) as +1 if i − − − hTqi+jD−1hqi+j − 4pj+(11 pj+) Λ(j) 1,and 1otherwise.Incodedsystems,Λ(j)’sarefed i − i asisof≥tinputst−othedecoder. bi whichcanbecomputedinO(Nr2p2)complexity. Computationofµji+ andµji−: Computationofβi(j) involves B. ComplexityReduction thecomputationofµji+ andµji− also. From(17),itisclear Themostcomputationallyexpensiveoperationincomputing that µji− can be computed from µji+ with a computational β(j) istheevaluationoftheinverseofthecovariancematrix, overheadofonlyO(Nrp).From(16),itcanbeseenthatcom- Ciji,ofsize2Nrp×2NrpwhichrequiresO(Nr3p3)complex- pthuitsincgomµpjil+exwitoyucladnrbeqeurierdeuOce(dqNasrfpokll)ocwosm.Dpleefixnitey.veHcotowreuvears, ity,whichcanbereducedasfollows.DefinematrixDas 2k 1 q 1 2k−1 q−1 D =△ σ2I+Xl=−0 mX−=0hql+mhTql+m4pml +(1−pml +). (27) u =△ Xl=0 mX=0hql+m(cid:0)2pml +−1(cid:1). (33) Using(16)and(33),wecanwrite Atthestartofthealgorithm,withpj+ andp(j) initializedto i i 0.5foralli,j,Dbecomesσ2I+HHT. µj+ = u+2 1 pj+ h . (34) i − i qi+j Computation of D 1: We note that when the statistics of (cid:0) (cid:1) − ucan becomputediterativelyatO(N p)complexityasfol- b(j) is updated using (25) and (26), the D matrix in (27) r i lows.Whenthestatisticsofb(j)isupdated,wecanobtainthe alsochanges. AstraightforwardinversionofthisupdatedD i newufromtheolduas matrix would require O(N3p3) complexity. However, we r can obtain the D−1 from the previously available D−1 in u u+2 pj+ pj+ h , (35) O(N2p2)complexityasfollows. Sincethestatisticsofonly ← i − i,old ni+j r (cid:0) (cid:1) b(j) is updated, the new D matrix is just a rank one update whose complexity is O(N p). Hence, the computation of i r µji+ in(33)andµji− in(34)needsO(Nrp)complexity. The 2Thecomputation ofthestatistics ofacurrent bitinaniteration makes listing of the proposedPDA algorithmis summarizedin the useofthenewlycomputedstatisticsofitspreviousbits(aspertheordered sequence ofstatistic updates)inthesameiteration andthestatistics ofits Table-Iinthenextpage. nextbitsavailablefromthepreviousiteration. Table-I:ProposedPDA-basedAlgorithmListing 100 LAS, 4x4 STBC Initialization PDA, 4x4 STBC Nt x Nt non−orthogonal ILL STBCs LAS, 8x8 STBC 1.pji+ =pji− =0.5, Λ(ij) =1, 10−1 Nt = Nr, 4−QAM PDA, 8x8 STBC i=0,1, 2k 1,j =0,1, ,q 1. LAS, 16x16 STBC 2.u=0, D−1∀= HHT·+··σ2I−−1. ··· − PSDISAO, 1A6Wx1G6N STBC ` ´ 3.num iter:numberofiterations 10−2 4.κ=1; κistheiterationnumber Rate Statisticsupdateintheκthiteration Error NMuMmSbEe rin oitfi aitle vreactitoonrs f omr L=A 1S0 for PDA BER improves 5.fori=0to2k 1 Bit 10−3 wNitt h= iNncrreasing − 6.forj=0toq 1 − Updateofstatisticsofbitb(j) i 10−4 7.µji+ = u+2`1−pji+´hqi+j 8.µji− = µji+−2hqi+j 9.(Cji)−1 = D−1− hTqi+jDD−−11hhqiq+i+jjh−Tqi4+pjji+D(−11−1pji+) 10−50 2 4 Av6eragFe iRge.ce18ived SNR (1d0B) 12 14 16 10.βij = e−((y−µji+)T(Cji)−1(y−µji+)−(y−µji−)T(Cji)−1(y−µji−)) COMPARISONOFUNCODEDBEROFPDAANDLASALGORITHMSIN 11.pji,+old = pji+, pji,−old = pji− DECODING4×4,8×8,16×16ILLSTBCS.Nt=Nr,4-QAM.# 12.α(j) = pji,+old ITERATIONSm=10FORPDA.MMSEINITIALVECTORFORLAS. i pj− BERimprovesforincreasingSTBCsizes.With4-QAM,PDAandLAS i,old 13.Λ(j) =β(j)α(j) algorithmsachievealmostthesameperformance. i i i 14.pji+ = 1+ΛΛi(ji()j), pji− = 1+Λ1i(j) UpdateofuandD−1 PDAversusLASperformancewith4-QAM:InFig.1,weplot 15.u ← u+2`pji+−pji,+old´hqi+j theuncodedBERofthePDAalgorithmasafunctionofaver- 16.η = 4pj+ 1 pj+ 4pj+ 1 pj+ agereceivedSNRperrxantenna,γ,indecoding4 4,8 8, 17.D−1 ←i D`−1−−iDh−Tqí1+−hjqDi+−ji1,hhoTqlqidi+`+jjD+−−η11i,old´ c1h6a×nn1e6lSstTaBteCisnfforormmaCtioDnAawttihtherNecte=ivNerr(aCnSdIR4-)QaAnMd×i..iP.derf×faedc-t 18.end; Endofforloopstartingatline5 ing are assumed. For the same settings, the performanceof 19.if(κ=num iter)gotoline21 theLASalgorithmin[13]-[15]withMMSEinitialvectorare 20.κ=κ+1,gotoline5 alsoplottedforcomparison.FromFig. 1,itisseenthat 21.b(j) =sgn log(Λ(j)) i ` i ´ the BER performance of PDA algorithm improvesand b i=0,1, ,2k 1, j =0,1, ,q 1 • 22.xi=Pjq∀=−012jˆb(ij)·,··∀i=−0,1,··· ,2k−··1· − ianpcprreoaascehde;se.SgI.S,OperAfoWrmGaNncpeercfloorsmeatoncweitahsinNatbo=utN1rdiBs 23.Tberminate fromSISOAWGNperformanceisachievedat10 3un- − codedBERindecoding16 16STBCfromCDAhaving C. OverallComplexity × 512realdimensions,andthisillustratestheabilityofthe WeneedtocomputeHHT atthestartofthealgorithm. This PDAalgorithmtoachieveexcellentperformanceatlow requiresO(qkN2p2)complexity. So thecomputationofthe complexitiesinlargenon-orthogonalSTBCMIMO. r initialD 1inline2requiresO(qkN2p2)+O(N3p3). Based with4-QAM,PDAandLASalgorithmsachievealmost − r r • onthecomplexityreductioninSec. III-B, thecomplexityin thesameperformance. updatingthestatisticsofoneconstituentbit(lines7to17)is PDAversusLASperformancewith16-QAM:Figure2presents O(N2p2). So, the complexity for the update of all the 2qk r an uncoded BER comparison between PDA and LAS algo- constituentbitsinaniterationisO(qkN2p2). Sincethenum- r rithms for 16 16 STBC from CDA with N = Nr = 16 t ber ofiterationsis fixed, the overallcomplexityof the algo- × and16-QAMunderperfectCSIR and i.i.d fading. It can be rithmisO(qkN2p2)+O(N3p3). ForN = N ,sincethere r r t r seenthatthePDAalgorithmperformsbetteratlowSNRsthan are k symbols per STBC and q bits per symbol, the overall the LAS algorithm. For example, with 8 8 and 16 16 complexityperbitisO(p2N2). × × t STBCs,atlowSNRs(e.g.,<25dBfor16 16STBC),PDA × algorithm performs better by about 1 dB compared to LAS IV. RESULTS ANDDISCUSSIONS algorithmat10 2uncodedBER. − Inthissection,wepresentthesimulateduncoded/codedBER Turbo coded BER performance of PDA: Figure 3 shows the of the PDA algorithm in decoding non-orthogonal STBCs rate-3/4 turbo coded BER of the PDA algorithm under per- from CDA3. Number of iterations in the PDA algorithm is fectCSIRandi.i.dfadingfor12 12ILLSTBCwithN = settom=10inallthesimulations. × t N =12 and 4-QAM, which correspondsto a spectral effi- r 3Our simulation results showed that the performance of FD-ILL (δ = ciencyof18bps/Hz.ThetheoreticalminimumSNRrequired e√5j,t = e−j)andILL(δ = t = 1)STBCswithPDAdecoding were toachieve18bps/Hzspectralefficiencyona N =N =12 almostthesame.Here,wepresenttheperformanceofILLSTBCs. t r 100 NtxNt non−orthogonal ILL STBCs LAS, 4x4 STBC Nt = Nr, 16−QAM PDA, 4x4 STBC Number of iterations m=10 for PDA LAS, 8x8 STBC Perfect CSIR, 18 bps/Hz 10−1 MMSE init. vec. for LAS PLADSA,, 186xx81 S6T SBTCBC 100 Estimated CSIR, Nd = 1, 9 bps/Hz PDA, 16x16 STBC Estimated CSIR, Nd = 8, 16 bps/Hz SISO AWGN Min. SNR reqd. to achieve 18 bps/Hz 10−1 10−2 Rate Bit Error 10−3 Error Rate10−2 142−xQ1A2M IL, Lra SteT−B3C/4, Nturr b=o N Ct o=d 1e2d Bit 10−3 BER improves 10−4 with increasing Nt=Nr 4.3 dB 10−4 . 10−5 5 10 15 20 25 30 35 40 45 Average Received SNR (dB) 10−5 2 4 6 8 10 12 14 Fig.2 Average Received SNR (dB) COMPARISONOFUNCODEDBEROFPDAANDLASALGORITHMSIN Fig.3 DECODING4×4,8×8,16×16ILLSTBCS.Nt=Nr,16-QAM.# TURBOCODEDBEROFTHEPDAALGORITHMINDECODING12×12 ITERATIONSm=10FORPDA.MMSEINITIALVECTORFORLAS. ILLSTBCWITHNt=Nr,4-QAM,RATE-3/4TURBOCODE,18 With16-QAM,PDAperformsbetterthanLASatlowSNRs. BPS/HZANDm=10FORi)PERFECTCSIR,ANDii)ESTIMATEDCSIR USING2ITERATIONSBETWEENPDADECODING/CHANNEL ESTIMATION.WithperfectCSIR,PDAperformsclosetowithinabout5dB MIMOchannelwith perfectCSIRand i.i.dfadingis4.3dB fromcapacity.WithestimatedCSIR,performanceapproachestothatwith (obtainedthroughsimulationoftheergodiccapacityformula perfectCSIRwithincreasingcoherencetimes. [3]). From Fig. 3, it is seen that the PDA algorithmis able toachieveverticalfallincodedBERwithinabout5dBfrom the theoretical minimum SNR, which is a good nearness to (cid:0)(cid:1)1M Pa(cid:0)(cid:1)itlroix(cid:0)(cid:1)t (cid:0)(cid:1)(cid:0)(cid:1) Pilot capacityperformance. (cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1) Data STBCs (cid:0)(cid:1)(cid:0)(cid:1)M(cid:0)(cid:1)atr(cid:0)(cid:1)ix(cid:0)(cid:1) Data STBCs (cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1) Nd (cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1) Space (cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1) (cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1) IpteerrfaetcivteCPSIDRAaDsseucmopdtiinogn/CbhyacnonneslidEesrtiinmgataiotnra:inWinegrbealasexdthite- (cid:0)(cid:0)(cid:1)(cid:1) Nt(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1) (cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1)(cid:0)(cid:0)(cid:0)(cid:1)(cid:1)(cid:1)(cid:0)(cid:1) erativePDAdecoding/channelestimationscheme. Transmis- Nt (cid:0)(cid:1) Nd(cid:2)Nt (cid:0)(cid:1)(cid:0)(cid:1)(cid:0)(cid:1)time sion is carried out in frames, where one N N pilot ma- 1 Frame t t × trix (for training purposes) followed by N data STBC ma- d Fig.4 tricesaresentineachframeasshowninFig. 4. One frame TRANSMISSIONSCHEMEWITHONEPILOTMATRIXFOLLOWEDBYNd length, T, (taken to be the channel coherence time) is T = DATASTBCMATRICESINEACHFRAME. (N +1)N channeluses.Theproposedschemeworksasfol- d t lows[16]:i)obtainanMMSEestimateofthechannelmatrix during the pilot phase, ii) use the estimated channel matrix todecodethedataSTBCmatricesusingPDAalgorithm,and Fig. 5, we plot the BER of the PDA algorithm in decoding iii) iterate between channel estimation and PDA decoding 12 12STBC fromCDA withperfectCSIR ini)i.i.d. fad- × foracertainnumberoftimes. For12 12STBCfromCDA, ing,andii)correlatedMIMOfadingmodelin[18]. Itisseen in addition to perfectCSIR performa×nce,Fig. 3 also shows that, comparedto i.i.d fading, there is a loss in diversity or- theperformancewithCSIRestimatedusingtheproposedit- derinspatialcorrelationforNt = Nr = 12; further,use of erativedecoding/channelestimationschemeforNd = 1and morerxantennas(Nr = 18,Nt = 12)alleviatesthislossin N = 8. 2 iterations between decoding and channel esti- performance.Wecandecodeperfectcodes[19],[20]oflarge d mationare used. With N = 8 (which correspondsto large dimensionsalsousingtheproposedPDAalgorithm. d coherencetimes,i.e.,slowfading)theBERandbps/Hzwith REFERENCES estimatedCSIRgetclosertothosewithperfectCSIR. [1] B.A.Sethuraman,B.S.Rajan,andV.Shashidhar,“Full-diversityhigh- EffectofSpatialMIMO Correlation: InFigs. 1to 3, weas- ratespace-timeblockcodesfromdivisionalgebras,” IEEETrans.In- sumedi.i.dfading. Butspatialcorrelationattransmit/receive form.Theory,vol.49,no.10,pp.2596-2616,October2003. [2] J.-C.Belfiore, G.Rekaya, andE.Viterbo,“Thegoldencode: A2× antennasand the structure of scattering and propagationen- 2 full-rate space-time code with non-vanishing determinants,” IEEE vironmentcanaffecttherankstructureoftheMIMOchannel Trans.Inform.Theory,vol.51,no.4,pp.1432-1436,April2005. resultingindegradedperformance[17]. Werelaxedthei.i.d. [3] H.Jafarkhani, Space-TimeCoding: TheoryandPractice, Cambridge UniversityPress,2005. fadingassumptionbyconsideringthecorrelatedMIMOchan- [4] http://www.ruckuswireless.com/technology/beamflex.php nelmodelin[18],whichtakesintoaccountcarrierfrequency [5] J.Luo,K.Pattipati, P.Willett, andF.Hasegawa, “Near-optimalmul- tiuserdetectioninsynchronousCDMAusingprobabilisticdataassoci- (fc),spacingbetweenantennaelements(dt,dr),distancebe- ation,”IEEECommun.Letters,vol.5,no.9,pp.361-363,2001. tweentxandrxantennas(R),andscatteringenvironment.In [6] Y. Huang and J. Zhang, “Generalized probabilistic data association multiuserdetector,”Proc.ISIT’2004,June-July2004. Uncoded, Nr = Nt = 12 (LAS) [20] P.Elia,B.A.Sethuraman,andP.V.Kumar,“Perfectspace-timecodes Uncoded, Nr = 18, Nt = 12 (LAS) foranynumberofantennas,”IEEETrans.Inform.Theory,vol.53,no. Uncoded, Nr = Nt = 12 (PDA) 11,pp.3853-3868,November2007. Uncoded, Nr = 18, Nt = 12 (PDA) 101 Turbo coded, Rate−3/4, Nr = Nt = 12 (LAS) Turbo coded, Rate−3/4, Nr = 18, Nt = 12 (LAS) Turbo coded, Rate−3/4, Nr = Nt = 12 (PDA) 100 Turbo coded, Rate−3/4, Nr = 18, Nt = 12 (PDA) Min. SNR for capacity for 36 bps/Hz (Nr = Nt = 12) 10−1 Min. SNR for capacity for 36 bps/Hz (Nr = 18, Nt = 12) Rate Uncoded, Nr = Nt = 12 (PDA), i.i.d. (i.e., no correlation) Error 10−2 12x12 ILL STBC Bit 16−QAM 10−3 Nd = 72 cm, d = d, rr t r 10−4 R = 9.4 dB R = 12.6 dB DRθt r === θ5Dr0 t=0 = m9 20, 0 Sd m=e g.3 . 0 , f = 5 G H z. N N 10−5 S S 10 15 20 25 30 35 40 45 50 Average Received SNR (dB) Fig.5 EFFECTOFSPATIALCORRELATIONONTHEPERFORMANCEOFPDAIN DECODING12×12STBCFROMCDA.Nt=12,Nr =12,18, 16-QAM,RATE-3/4TURBOCODE,36BPS/HZ.CORRELATEDCHANNEL PARAMETERS:fc=5GHZ,R=500M,S=30,Dt=Dr =20M, θt=θr =90◦,Nrdr =72CM,dt=dr.Correlationdegrades performance;usingNr >Ntalleviatesthethisperformanceloss. [7] X.Wang,H.V.Poor,“Iterative(Turbo)softinterferencecancellation anddecodingforcodedCDMA,”IEEETrans.onCommun.,vol.47, no.7,pp.1046-1061,July1999. [8] P. H. Tan and L. K. Rasmussen, “Asymptotically optimal nonlinear MMSEmultiuserdetectionbasedonmultivariateGaussianapproxima- tion,”IEEETrans.onCommun.,vol.54,no.8,pp.1427-1438,August 2006. [9] Y.Yin,Y.Huang,andJ.Zhang,“Turboequalizationusingprobabilistic data association,” Proc. IEEE GLOBECOM’2004, vol. 4, pp. 2535- 2539,November-December2004. [10] D.Pham,K.R.Pattipati,P.K.Willet,andJ.Luo,“Ageneralizedproba- bilisticdataassociationdetectorformultiantennasystems,”IEEECom- mun.Letters,vol.8,no.4,pp.205-207,2004. [11] G. Latsoudas and N. D. Sidiropoulos, “A hybrid probabilistic data association-sphere decoding for multiple-input-multiple-output systems,”IEEESig.Proc.Letters,vol.12,no.4,pp.309-312,April2005. [12] Y.Jiaetal,C.M.Vithanage,C.Andrieu,andR.J.Piechocki,“Proba- bilisticdataassociationforsymboldetectioninMIMOSystems,”Elec- tronicLetters,vol.42,no.1,5Jan.2006. [13] K.VishnuVardhan,SaifK.Mohammed,A.Chockalingam,B.Sundar Rajan,“Alow-complexitydetectorforlargeMIMOsystemsandmul- ticarrierCDMAsystems,”IEEEJSACSpl.Iss.onMultiuserDetection forAdv.Commun.SystemsandNetworks,vol.26,no.3,pp.473-485, April2008. [14] SaifK.Mohammed,A.Chockalingam,andB.SundarRajan“Alow- complexitynear-MLperformanceachievingalgorithmforlargeMIMO detection,”Proc.IEEEISIT’2008,Toronto,Canada,July2008. [15] SaifK.Mohammed,A.Chockalingam, andB.SundarRajan,“High- ratespace-time codedlargeMIMOsystems: Low-complexity detec- tionandperformance,”Proc.IEEEGLOBECOM’2008,NewOrleans, USA,Dec.2008. [16] Ahmed Zaki, Saif K. Mohammed, A. Chockalingam, and B. Sun- dar Rajan, “A training-based iterative detection/channel estimation scheme for large non-orthogonal STBC MIMO systems,” ac- cepted inIEEEICC’2009, Dresden, Germany, April 2009. Alsosee arXiv:0809.2446v2[cs.IT]4Oct2008. [17] D.Shiu,G.J.Foschini,M.J.Gans,andJ.M.Khan,“Fadingcorrelation anditseffectonthecapacityofmulti-antennasystems,”IEEETrans. onCommun.,vol.48,pp.502-513,March2000. [18] D.Gesbert, H.Bo¨lcskei, D.A.Gore,A.J.Paulraj, “OutdoorMIMO wirelesschannels: Modelsandperformanceprediction,”IEEETrans. onCommun.,vol.50,pp.1926-1934,December2002. [19] F.E.Oggier,G.Rekaya,J.-C.Belfiore,andE.Viterbo,“Perfectspace- time block codes,” IEEE Trans. on Inform. Theory, vol. 52, no. 9, September2006.