On Wiener Phase Noise Channels at High Signal-to-Noise Ratio

Hassan Ghozlan
Department of Electrical Engineering
University of Southern California
Los Angeles, CA 90089 USA
[email protected]

Gerhard Kramer
Institute for Communications Engineering
Technische Universität München
80333 Munich, Germany
[email protected]

arXiv:1301.6923v1 [cs.IT] 29 Jan 2013

Abstract: Consider a waveform channel where the transmitted signal is corrupted by Wiener phase noise and additive white Gaussian noise (AWGN). A discrete-time channel model that takes into account the effect of filtering on the phase noise is developed. The model is based on a multi-sample receiver which, at high Signal-to-Noise Ratio (SNR), achieves a rate that grows logarithmically with the SNR if the number of samples per symbol grows with the square-root of the SNR. Moreover, the pre-log factor is at least 1/2 in this case.

I. INTRODUCTION

Phase noise is an impairment that often arises in coherent communication systems. Different models are adopted for the phase noise process depending on the application. In [1], Katz and Shamai studied a discrete-time model of a phase noise channel (partially coherent channel) in which the phase noise is independent and identically distributed (i.i.d.) with a Tikhonov distribution. This model is reasonable for the residual phase error of a phase-tracking scheme, such as a Phase-Locked Loop (PLL). In [2], the authors investigate white (Gaussian) phase noise for which they observed a "spectral loss" phenomenon. The white phase noise approximates the nonlinear effect of cross-phase modulation (XPM) in a Wavelength-Division Multiplexing (WDM) optical communication system. Lapidoth studied in [3] a discrete-time phase noise channel

$$Y_k = X_k e^{j\Theta_k} + N_k \tag{1}$$

at high SNR, where $\{Y_k\}$ is the output, $\{X_k\}$ is the input, $\{\Theta_k\}$ is the phase noise process and $\{N_k\}$ is the additive noise. He considered both memoryless phase noise and phase noise with memory. He showed that the capacity grows logarithmically with the SNR with a pre-log factor 1/2, where the pre-log is due to amplitude modulation only. The phase modulation contributes a bounded number of bits only.

In this paper, we study a communication system in which the transmitted waveform is corrupted by Wiener phase noise and AWGN. The model is

$$r(t) = x(t)\exp(j\theta(t)) + n(t), \quad \text{for } t \in \mathbb{R} \tag{2}$$

where $x(t)$ and $r(t)$ are the transmitted and received signals, respectively, while $n(t)$ and $\theta(t)$ are the additive and phase noise, respectively. A detailed description of the model is given in Sec. II. One application for such a channel model is optical communication under linear propagation, in which the laser phase noise is a continuous-time Wiener process (see [4] and references therein). Since the sampling of a continuous-time Wiener process yields a discrete-time Wiener process (Gaussian random walk), it is tempting to use the model (1) with $\{\Theta_k\}$ as a discrete-time Wiener process, but this ignores the effect of filtering prior to sampling. It was pointed out in [4] that "even coherent systems relying on amplitude modulation (phase noise is obviously a problem in systems employing phase modulation) will suffer some degradation due to the presence of phase noise". This is because the filtering converts phase fluctuations to amplitude variations. It is worth mentioning that filtering is necessary before sampling to limit the variance of the noise samples.

The model (1) thus does not fit the channel (2) and it is not obvious whether a pre-log 1/2 is achievable. The model that takes the effect of (matched) filtering into account is

$$Y_k = X_k H_k + N_k \tag{3}$$

where $\{H_k\}$ is a fading process. The model (3) falls in the class of non-coherent fading channels, i.e., the transmitter and receiver have knowledge of the distribution of the fading process $\{H_k\}$, but have no knowledge of its realization. For such channels, Lapidoth and Moser showed in [5] that, at high SNR, the capacity grows double-logarithmically with the SNR, when the process $\{H_k\}$ is stationary, ergodic, and regular. Rather than using a matched filter and sampling its output at the symbol rate, we use a multi-sample receiver, i.e., a filter whose output is sampled many times per symbol. We show that this receiver achieves a rate that grows logarithmically with the SNR if the number of samples per symbol grows with the square-root of the SNR. Furthermore, we show that a pre-log of 1/2 is achievable through amplitude modulation. In this paper, we study only rectangular pulses but we believe that the results hold qualitatively for other pulses.

The paper is organized as follows. The continuous-time model is described in Sec. II and the discretization is described in Sec. III. We derive a lower bound on the capacity in Sec. IV and discuss our result in Sec. V. Finally, we conclude the paper with Sec. VI.

II. CONTINUOUS-TIME MODEL

We use the following notation: $j = \sqrt{-1}$, ${}^*$ denotes the complex conjugate, $\delta_D$ is the Dirac delta function, $\lceil\cdot\rceil$ is the ceiling operator, $\Re[\cdot]$ is the real part of a complex number, $\log(\cdot)$ is the natural logarithm and we use $X_1^k$ to denote the $k$-tuple $(X_1, X_2, \ldots, X_k)$. Suppose the transmit-waveform is $x(t)$ and the receiver observes

$$r(t) = x(t)\exp(j\theta(t)) + n(t) \tag{4}$$

where $n(t)$ is a realization of a white circularly-symmetric complex Gaussian process $N(t)$ with

$$E[N(t)] = 0, \qquad E[N(t_1)N^*(t_2)] = \sigma_N^2\,\delta_D(t_2 - t_1). \tag{5}$$

The phase $\theta(t)$ is a realization of a Wiener process $\Theta(t)$:

$$\Theta(t) = \Theta(0) + \int_0^t W(\tau)\,d\tau \tag{6}$$

where $\Theta(0)$ is uniform on $[-\pi, \pi)$ and $W(t)$ is a real Gaussian process with

$$E[W(t)] = 0 \tag{7}$$

$$E[W(t_1)W(t_2)] = 2\pi\beta\,\delta_D(t_2 - t_1). \tag{8}$$

The processes $N(t)$ and $\Theta(t)$ are independent of each other and independent of the input as well. $N_0 = 2\sigma_N^2$ is the single-sided power spectral density of the additive noise. The parameter $\beta$ is called the full-width at half-maximum (FWHM), because the power spectral density of $e^{j\Theta(t)}$ has a Lorentzian shape, for which $\beta$ is the full-width at half the maximum. The transmitted waveforms must satisfy the power constraint

$$E\left[\frac{1}{T}\int_0^T |X(t)|^2\,dt\right] \le \mathcal{P} \tag{9}$$

where $T$ is the transmission interval.

III. DISCRETE-TIME MODEL

Let $(x_1, x_2, \ldots, x_n)$ be the codeword sent by the transmitter. Suppose the transmitter uses a unit-energy rectangular pulse, i.e., the waveform sent by the transmitter is

$$x(t) = \sum_{m=1}^{n} x_m\, g(t - (m-1)T_{\text{symbol}}) \tag{10}$$

where $T_{\text{symbol}}$ is the symbol interval and

$$g(t) \equiv \begin{cases} \sqrt{1/T_{\text{symbol}}}, & 0 \le t < T_{\text{symbol}}, \\ 0, & \text{otherwise.} \end{cases} \tag{11}$$

Let $L$ be the number of samples per symbol ($L \ge 1$) and define the sample interval $\Delta$ as

$$\Delta = \frac{T_{\text{symbol}}}{L}. \tag{12}$$

The received waveform $r(t)$ is filtered using an integrator over a sample interval to give the output signal

$$y(t) = \int_{t-\Delta}^{t} r(\tau)\,d\tau \tag{13}$$

where $y(t)$ is a realization of $Y(t)$. The output $Y(t)$ is sampled every $\Delta$ seconds which yields the discrete-time model:

$$Y_k = X_{\lceil k/L\rceil}\,\Delta\, e^{j\Theta_k} F_k + N_k \tag{14}$$

for $k = 1, \ldots, nL$, where $Y_k \equiv Y(k\Delta)$, $\Theta_k \equiv \Theta((k-1)\Delta)$,

$$F_k \equiv \frac{1}{\Delta}\int_{(k-1)\Delta}^{k\Delta} e^{j(\Theta(\tau) - \Theta_k)}\,d\tau \tag{15}$$

and

$$N_k \equiv \int_{(k-1)\Delta}^{k\Delta} N(\tau)\,d\tau. \tag{16}$$

The process $\{N_k\}$ is an i.i.d. circularly-symmetric complex Gaussian process with mean 0 and $E[|N_k|^2] = \sigma_N^2\Delta$ while the process $\{\Theta_k\}$ is the discrete-time Wiener process:

$$\Theta_k = \Theta_{k-1} + W_k \tag{17}$$

where $\Theta_1$ is uniform on $[-\pi, \pi)$ and $\{W_k\}$ is an i.i.d. real Gaussian process with mean 0 and $E[|W_k|^2] = 2\pi\beta\Delta$. The process $\{F_k\}$ is an i.i.d. process. Moreover, $\{F_k\}$ and $\{W_k\}$ are independent of $\{N_k\}$ but not independent of each other. Equations (9)-(11) imply the power constraint

$$\frac{1}{n}\sum_{m=1}^{n} E[|X_m|^2] \le P = \mathcal{P}\,T_{\text{symbol}}. \tag{18}$$

IV. LOWER BOUND

For the $k$th input symbol $X_k$ we have $L$ outputs, so it is convenient to group the $L$ samples per symbol in one vector and define $\mathbf{Y}_k \equiv (Y_{(k-1)L+1}, Y_{(k-1)L+2}, \ldots, Y_{(k-1)L+L})$. We further define $X_A \equiv |X|$ and $X_\Phi \equiv \angle X$. We decompose the mutual information using the chain rule into two parts:

$$I(X_1^n; \mathbf{Y}_1^n) = I(X_{A,1}^n; \mathbf{Y}_1^n) + I(X_{\Phi,1}^n; \mathbf{Y}_1^n \mid X_{A,1}^n). \tag{19}$$

The first term represents the contribution of the amplitude modulation while the second term represents the contribution of the phase modulation. We focus on the amplitude contribution and use $I(X_{\Phi,1}^n; \mathbf{Y}_1^n \mid X_{A,1}^n) \ge 0$ to obtain the lower bound

$$I(X_1^n; \mathbf{Y}_1^n) \ge I(X_{A,1}^n; \mathbf{Y}_1^n). \tag{20}$$

Suppose that $X_{A,1}^n$ is i.i.d. Hence, we have

$$\begin{aligned}
I(X_{A,1}^n; \mathbf{Y}_1^n) &\overset{(a)}{=} \sum_{k=1}^{n} I(X_{A,k}; \mathbf{Y}_1^n \mid X_{A,1}^{k-1}) \\
&\overset{(b)}{=} \sum_{k=1}^{n} H(X_{A,k}) - H(X_{A,k} \mid \mathbf{Y}_1^n\, X_{A,1}^{k-1}) \\
&\overset{(c)}{\ge} \sum_{k=1}^{n} I(X_{A,k}; \mathbf{Y}_k) \\
&\overset{(d)}{\ge} \sum_{k=1}^{n} I(X_{A,k}; V_k)
\end{aligned} \tag{21}$$

where

$$V_k = \sum_{\ell=1}^{L} |Y_{(k-1)L+\ell}|^2. \tag{22}$$

Step (a) follows from the chain rule of mutual information, (b) follows from the independence of $X_{A,1}, X_{A,2}, \ldots, X_{A,n}$, (c) holds because conditioning does not increase entropy, and (d) follows from the data processing inequality. Since $X_{A,1}^n$ is identically distributed, then $V_1^n$ is also identically distributed and we have, for $k \ge 2$,

$$I(X_{A,k}; V_k) = I(X_{A,1}; V_1). \tag{23}$$

In the rest of this section, we consider only one symbol ($k = 1$) and drop the time index. Moreover, we assume that $T_{\text{symbol}} = 1$ for simplicity. By combining (22) and (14), we have

$$\begin{aligned}
V &= \sum_{\ell=1}^{L} \left( X_A^2\Delta^2 |F_\ell|^2 + 2X_A\Delta\,\Re[e^{j\Phi_X} e^{j\Theta_\ell} F_\ell N_\ell^*] + |N_\ell|^2 \right) \\
&= X_A^2\Delta\, G + 2X_A\Delta\, Z_1 + Z_0
\end{aligned} \tag{24}$$

where $G$, $Z_1$ and $Z_0$ are defined as

$$G \equiv \frac{1}{L}\sum_{\ell=1}^{L} |F_\ell|^2 \tag{25}$$

$$Z_1 \equiv \sum_{\ell=1}^{L} \Re[e^{j\Phi_X} e^{j\Theta_\ell} F_\ell N_\ell^*] \tag{26}$$

$$Z_0 \equiv \sum_{\ell=1}^{L} |N_\ell|^2. \tag{27}$$

The second-order statistics of $Z_1$ and $Z_0$ are

$$\begin{aligned}
E[Z_1] &= 0, & \mathrm{Var}[Z_1] &= E[G]\,\sigma_N^2/2, \\
E[Z_0] &= \sigma_N^2, & \mathrm{Var}[Z_0] &= \sigma_N^4\Delta, \\
E[Z_1(Z_0 - E[Z_0])] &= 0.
\end{aligned} \tag{28}$$

By using the Auxiliary-Channel Lower Bound Theorem in [6, Sec. VI], we have

$$I(X_A; V) \ge E[-\log Q_V(V)] + E[\log Q_{V|X_A}(V|X_A)] \tag{29}$$

where $Q_{V|X_A}(v|x_A)$ is an arbitrary auxiliary channel and

$$Q_V(v) = \int P_{X_A}(x_A)\, Q_{V|X_A}(v|x_A)\,dx_A \tag{30}$$

where $P_{X_A}(\cdot)$ is the true input distribution, i.e., $Q_V(\cdot)$ is the output distribution obtained by connecting the true input source to the auxiliary channel. $E[\cdot]$ is the expectation according to the true distribution. We choose the auxiliary channel

$$Q_{V|X_A}(v|x_A) = \frac{1}{\sqrt{4\pi x_A^2\Delta^2\sigma_N^2}} \exp\left(-\frac{(v - x_A^2\Delta - \sigma_N^2)^2}{4x_A^2\Delta^2\sigma_N^2}\right). \tag{31}$$

It follows that

$$E[-\log(Q_{V|X_A}(V|X_A))] = E\left[\frac{(V - X_A^2\Delta - \sigma_N^2)^2}{4X_A^2\Delta^2\sigma_N^2}\right] + \log\Delta + \frac{1}{2}\log(4\pi\sigma_N^2) + \frac{1}{2}E[\log(X_A^2)]. \tag{32}$$

By using (24), we have

$$\begin{aligned}
\left(V - X_A^2\Delta - \sigma_N^2\right)^2
&= \left(X_A^2\Delta(G-1) + 2X_A\Delta Z_1 + (Z_0 - \sigma_N^2)\right)^2 \\
&= X_A^4\Delta^2(G-1)^2 + 4X_A^2\Delta^2 Z_1^2 + (Z_0 - \sigma_N^2)^2 \\
&\quad + 4X_A^3\Delta^2(G-1)Z_1 + 2X_A^2\Delta(G-1)(Z_0 - \sigma_N^2) \\
&\quad + 4X_A\Delta Z_1(Z_0 - \sigma_N^2)
\end{aligned} \tag{33}$$

and hence, using the second-order statistics (28), we have

$$E\left[\frac{(V - X_A^2\Delta - \sigma_N^2)^2}{4X_A^2\Delta^2\sigma_N^2}\right] = \frac{P}{4\sigma_N^2}E\left[(G-1)^2\right] + \frac{1}{2}E[G] + \frac{\sigma_N^2}{4\Delta}E\left[\frac{1}{X_A^2}\right] \tag{34}$$

where we also used

$$E[(G-1)Z_1] = 0. \tag{35}$$

Substituting (34) into (32) and using $E[G] \le 1$ yield

$$\begin{aligned}
E[-\log(Q_{V|X_A}(V|X_A))] &\le \log\Delta + \frac{1}{2}\log(4\pi\sigma_N^2) + \frac{1}{2}E[\log(X_A^2)] \\
&\quad + \frac{P}{4\sigma_N^2}E\left[(G-1)^2\right] + \frac{1}{2} + \frac{\sigma_N^2}{4\Delta}E\left[\frac{1}{X_A^2}\right].
\end{aligned} \tag{36}$$

It is convenient to define $X_P \equiv X_A^2$. We choose the input distribution

$$P_{X_P}(x_P) = \begin{cases} \dfrac{1}{\lambda}\exp\left(-\dfrac{x_P - P_{\min}}{\lambda}\right), & x_P \ge P_{\min}, \\ 0, & \text{otherwise} \end{cases} \tag{37}$$

where $0 < P_{\min} < P$ and $\lambda = P - P_{\min}$, so that

$$E[X_P] = E[X_A^2] = P. \tag{38}$$

It follows from (30) and (37) that

$$\begin{aligned}
Q_V(v) &= \int_{P_{\min}}^{\infty} \frac{1}{\lambda}\exp\left(-\frac{x_P - P_{\min}}{\lambda}\right) Q_{V|X_P}(v|x_P)\,dx_P \\
&\le \exp(P_{\min}/\lambda)\,F_V(v)
\end{aligned} \tag{39}$$

where

$$Q_{V|X_P}(v|x_P) = Q_{V|X_A}(v|\sqrt{x_P}) \tag{40}$$

and

$$F_V(v) \equiv \int_0^{\infty} \frac{1}{\lambda}\exp\left(-\frac{x_P}{\lambda}\right) Q_{V|X_P}(v|x_P)\,dx_P. \tag{41}$$

The inequality (39) follows from the non-negativity of the integrand. By combining (31), (40), (41) and making the change of variables $x = x_P\Delta$, we have

$$\begin{aligned}
F_V(v) &= \int_0^{\infty} \frac{e^{-x/(\lambda\Delta)}}{\lambda\Delta}\,\frac{1}{\sqrt{4\pi x\Delta\sigma_N^2}} \exp\left(-\frac{(v - x - \sigma_N^2)^2}{4x\Delta\sigma_N^2}\right) dx \\
&= \frac{1}{\sqrt{\lambda\Delta(\lambda\Delta + 4\Delta\sigma_N^2)}} \exp\left(\frac{2}{4\Delta\sigma_N^2}\left[v - \sigma_N^2 - |v - \sigma_N^2|\sqrt{1 + \frac{4\Delta\sigma_N^2}{\lambda\Delta}}\,\right]\right)
\end{aligned} \tag{42}$$

where we used equation (140) in Appendix A of [7]:

$$\int_0^{\infty} \frac{1}{a}\exp\left(-\frac{x}{a}\right)\frac{1}{\sqrt{\pi b x}}\exp\left(-\frac{(u - x)^2}{bx}\right) dx = \frac{1}{\sqrt{a(a+b)}}\exp\left(\frac{2}{b}\left[u - |u|\sqrt{1 + \frac{b}{a}}\,\right]\right). \tag{43}$$

Therefore, we have

$$\begin{aligned}
E[-\log(F_V(V))] &= \frac{1}{2}\log\left(\Delta^2(\lambda^2 + 4\lambda\sigma_N^2)\right) - \frac{1}{2\Delta\sigma_N^2}\left[E[V - \sigma_N^2] - E[|V - \sigma_N^2|]\sqrt{1 + \frac{4\sigma_N^2}{\lambda}}\,\right] \\
&\overset{(a)}{\ge} \log(\Delta\lambda) + \frac{1}{2\sigma_N^2\Delta}E[V - \sigma_N^2]\left[\sqrt{1 + \frac{4\sigma_N^2}{\lambda}} - 1\right] \\
&\overset{(b)}{\ge} \log(\Delta\lambda)
\end{aligned} \tag{44}$$

where (a) holds because the logarithmic function is monotonic and $E[|\cdot|] \ge E[\cdot]$, and (b) holds because

$$\begin{aligned}
E[V - \sigma_N^2] &= E[X_A^2]\,\Delta\,E[G] + 2E[X_A]\,\Delta\,E[Z_1] + E[Z_0] - \sigma_N^2 \\
&= P\Delta E[G] \ge 0.
\end{aligned} \tag{45}$$

The monotonicity of the logarithmic function and (39) yield

$$\begin{aligned}
E[-\log(Q_V(V))] &\ge E\left[-\log\left(e^{P_{\min}/\lambda} F_V(V)\right)\right] \\
&\ge \log\Delta + \log\lambda - \frac{P_{\min}}{\lambda}
\end{aligned} \tag{46}$$

where the last inequality follows from (44). It follows from (29), (36) and (46) that

$$\begin{aligned}
I(X_A; V) &\ge \log\lambda - \frac{P_{\min}}{\lambda} - \frac{1}{2}\log(4\pi\sigma_N^2) - \frac{1}{2}E[\log(X_A^2)] \\
&\quad - \frac{P}{4\sigma_N^2}E\left[(G-1)^2\right] - \frac{1}{2} - \frac{\sigma_N^2}{4\Delta}E\left[\frac{1}{X_A^2}\right].
\end{aligned} \tag{47}$$

If $P_{\min} = P/2$, then $\lambda = P - P_{\min} = P/2$ and we have

$$E\left[\frac{1}{X_P}\right] \le \frac{1}{P_{\min}} = \frac{2}{P} \tag{48}$$

and

$$\begin{aligned}
E[\log(X_P)] &= \int_{\lambda}^{\infty} \frac{1}{\lambda}e^{-(x-\lambda)/\lambda}\log(x)\,dx \\
&\overset{(a)}{=} \log\lambda + \int_1^{\infty} e^{-(u-1)}\log(u)\,du \\
&\overset{(b)}{\le} \log\lambda + 1
\end{aligned} \tag{49}$$

where (a) follows by the change of variables $u = x/\lambda$, and (b) holds because $\log(u) \le u - 1$ for all $u > 0$. Substituting into (47), we obtain

$$I(X_A; V) - \frac{1}{2}\log\mathrm{SNR} \ge -2 - \frac{1}{2}\log(8\pi) - \frac{1}{2\,\mathrm{SNR}\,\Delta} - \frac{1}{4}\mathrm{SNR}\,E\left[(G-1)^2\right] \tag{50}$$

where $\mathrm{SNR} = P/\sigma_N^2$. Suppose $L$ grows with SNR such that

$$L = \left\lceil \beta\sqrt{\mathrm{SNR}} \right\rceil. \tag{51}$$

Since $\Delta = 1/L$, then we have

$$\lim_{\mathrm{SNR}\to\infty} \mathrm{SNR}\,\Delta = \infty \quad \text{and} \quad \lim_{\mathrm{SNR}\to\infty} \mathrm{SNR}\,\Delta^2 = \frac{1}{\beta^2} \tag{52}$$

which implies

$$\lim_{\mathrm{SNR}\to\infty} I(X_A; V) - \frac{1}{2}\log\mathrm{SNR} \ge -2 - \frac{1}{2}\log(8\pi) - \frac{\pi^2}{36} \tag{53}$$

because (see Appendix)

$$\lim_{\Delta\to 0} \frac{E[(G-1)^2]}{\Delta^2} = \frac{(\pi\beta)^2}{9}. \tag{54}$$

By combining (20), (21), (23) and (53), we have

$$\lim_{\mathrm{SNR}\to\infty} \frac{1}{n}I(X_1^n; \mathbf{Y}_1^n) - \frac{1}{2}\log\mathrm{SNR} \ge -2 - \frac{1}{2}\log(8\pi) - \frac{\pi^2}{36}. \tag{55}$$

This shows that the information rate grows logarithmically at high SNR with a pre-log factor of 1/2.

V. DISCUSSION

There is a wide literature on the design of receivers for the channel model (1) with a discrete-time Wiener phase noise, e.g., see [8], [9], [10] and references therein. One may want to make use of these designs, which raises the following question: "when is it justified to approximate the non-coherent fading model (3) with the discrete-time phase noise model (1)?" Our result suggests that this approximation may be justified when the phase variation is small over one symbol interval (i.e., when the phase noise linewidth is small compared to the symbol rate) and also the SNR is low to moderate. It must be noted that the SNR at which the high-SNR asymptotics start to manifest themselves depends on the application.

We remark that the authors of [11] treated on-off keying transmission in the presence of Wiener phase noise by using a double-filtering receiver, which is composed of an intermediate frequency (IF) filter, followed by an envelope detector (square-law device) and then a post-detection filter. They showed that by optimizing the IF receiver bandwidth the double-filtering receiver outperforms the single-filtering (matched filter) receiver. Furthermore, they showed via computer simulation that the optimum IF bandwidth increases with the SNR. This is similar to our result in the sense that we require the number of samples per symbol to increase with the SNR in order to achieve a rate that grows logarithmically with the SNR.

Finally, we remark that we have not computed the contribution of phase modulation to the information rate. We believe that using the multi-sample receiver it is possible to achieve an overall pre-log that is larger than 1/2. This matter is currently under investigation.

VI. CONCLUSION

We studied a communication system impaired by Wiener phase noise and AWGN. A discrete-time channel model based on filtering and oversampling is considered. The model accounts for the filtering effects on the phase noise. It is shown that at high SNR the multi-sample receiver achieves rates that grow logarithmically with at least a 1/2 pre-log factor if the number of samples per symbol grows with the square-root of the SNR.

ACKNOWLEDGMENT

H. Ghozlan was supported by a USC Annenberg Fellowship and NSF Grant CCF-09-05235. G. Kramer was supported by an Alexander von Humboldt Professorship endowed by the German Federal Ministry of Education and Research.

APPENDIX

We discuss the limit in (54). We express $E[(G-1)^2]$ as

$$\begin{aligned}
E[(G-1)^2] &= \mathrm{Var}(G) + (E[G] - 1)^2 \\
&= \frac{1}{L}\mathrm{Var}(|F_1|^2) + \left(E[|F_1|^2] - 1\right)^2
\end{aligned} \tag{56}$$

where the last equality follows from the definition of $G$ in (25) and because $\{F_k\}$ is i.i.d.

Next, we outline the steps for computing $E[|F_1|^4]$ and $E[|F_1|^2]$. Let $M$ be a positive integer, $\mathbf{c} = (c_1, \ldots, c_M)^T$ be a constant vector, $\mathbf{t} = (t_1, \ldots, t_M)^T$ be a non-negative real vector and $\mathbf{\Theta}(\mathbf{t}) = (\Theta(t_1) - \Theta(0), \ldots, \Theta(t_M) - \Theta(0))^T$ where $\Theta(t)$ is defined in (6). We have

$$\begin{aligned}
E\left[\frac{1}{\Delta^M}\int\cdots\int_0^{\Delta} \exp(j\mathbf{c}^T\mathbf{\Theta}(\mathbf{t}))\,d\mathbf{t}\right]
&\overset{(a)}{=} \frac{1}{\Delta^M}\int\cdots\int_0^{\Delta} E\left[\exp(j\mathbf{c}^T\mathbf{\Theta}(\mathbf{t}))\right] d\mathbf{t} \\
&\overset{(b)}{=} \frac{1}{\Delta^M}\int\cdots\int_0^{\Delta} \exp\left(-\frac{1}{2}\mathbf{c}^T\Sigma(\mathbf{t})\mathbf{c}\right) d\mathbf{t} \\
&\overset{(c)}{=} \int\cdots\int_0^{1} \exp\left(-\frac{\Delta}{2}\mathbf{c}^T\Sigma(\mathbf{u})\mathbf{c}\right) d\mathbf{u}
\end{aligned} \tag{57}$$

where $d\mathbf{t} = dt_M \ldots dt_1$ and $\Sigma(\mathbf{t})$ is the covariance matrix of $\mathbf{\Theta}(\mathbf{t})$ whose entries are given by

$$\Sigma_{ij}(\mathbf{t}) = 2\pi\beta\min\{t_i, t_j\}, \quad \text{for } i, j = 1, \ldots, M. \tag{58}$$

Step (a) follows from the linearity of expectation, (b) follows by using the characteristic function of a Gaussian random vector, and (c) follows from the transformation of variables $\mathbf{t} = \mathbf{u}\Delta$. We define

$$a = e^{-\pi\beta\Delta} \tag{59}$$

and use $M = 2$ and $\mathbf{c} = (-1, 1)^T$ in (57) to compute

$$E[|F_1|^2] = 2\,\frac{a - 1 - \log a}{(\log a)^2}. \tag{60}$$

We also have, using $M = 4$ and $\mathbf{c} = (-1, 1, -1, 1)^T$ in (57),

$$E[|F_1|^4] = \frac{783 - 784a + a^4 + 540\log a + 240\,a\log a + 144(\log a)^2}{18(\log a)^4}. \tag{61}$$

Computing the integrals is tedious but straightforward. Finally, it follows from (56), and (59)-(61) that

$$\lim_{\Delta\to 0}\frac{E[(G-1)^2]}{\Delta^2} = (\pi\beta)^2\lim_{a\to 1}\frac{E[(G-1)^2]}{(\log a)^2} = \frac{(\pi\beta)^2}{9}. \tag{62}$$

REFERENCES

[1] M. Katz and S. Shamai. On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels. IEEE Trans. Inf. Theory, 50(10):2257-2270, Oct. 2004.
[2] B. Goebel, R. Essiambre, G. Kramer, P. J. Winzer, and N. Hanik. Calculation of mutual information for partially coherent Gaussian channels with applications to fiber optics. IEEE Trans. Inf. Theory, 57(9):5720-5736, Sep. 2011.
[3] A. Lapidoth. Capacity bounds via duality: A phase noise example. In Proc. 2nd Asian-Euro. Workshop on Inf. Theory, pages 58-61, 2002.
[4] G. J. Foschini and G. Vannucci. Characterizing filtered light waves corrupted by phase noise. IEEE Trans. Inf. Theory, 34(6):1437-1448, Nov. 1988.
[5] A. Lapidoth and S. M. Moser. Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels. IEEE Trans. Inf. Theory, 49(10):2426-2467, Oct. 2003.
[6] D. M. Arnold, H.-A. Loeliger, P. O. Vontobel, A. Kavcic, and Wei Zeng. Simulation-based computation of information rates for channels with memory. IEEE Trans. Inf. Theory, 52(8):3498-3508, Aug. 2006.
[7] S. M. Moser. Capacity results of an optical intensity channel with input-dependent Gaussian noise. IEEE Trans. Inf. Theory, 58(1):207-223, Jan. 2012.
[8] A. Barbieri, G. Colavolpe, and G. Caire. Joint iterative detection and decoding in the presence of phase noise and frequency offset. IEEE Trans. Commun., 55(1):171-179, Jan. 2007.
[9] A. Spalvieri and L. Barletta. Pilot-aided carrier recovery in the presence of phase noise. IEEE Trans. Commun., 59(7):1966-1974, July 2011.
[10] A. Barbieri and G. Colavolpe. On the information rate and repeat-accumulate code design for phase noise channels. IEEE Trans. Commun., 59(12):3223-3228, Dec. 2011.
[11] G. J. Foschini, L. J. Greenstein, and G. Vannucci. Noncoherent detection of coherent lightwave signals corrupted by phase noise. IEEE Trans. Commun., 36(3):306-314, Mar. 1988.
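
Supplementary numerical check (not part of the original derivation). The closed forms (60) and (61) and the limit (54) can be sanity-checked numerically. The Python sketch below evaluates (60) and (61) with the substitution s = πβΔ = -log a, compares (60) against a midpoint-rule evaluation of the double integral obtained from (57) with M = 2 and c = (-1, 1)^T, and then evaluates E[(G-1)^2]/Δ^2 from (56) for a few values of L with T_symbol = 1. The function names, the value β = 1, the grid size and the chosen values of L are illustrative assumptions. Because the numerator of (61) suffers from cancellation when a is close to 1, only moderate values of L are used.

```python
import math

def mean_abs_f_sq(s):
    """E[|F_1|^2] from (60), written with s = pi*beta*Delta = -log(a)."""
    a = math.exp(-s)
    return 2.0 * (a - 1.0 + s) / s**2

def mean_abs_f_4(s):
    """E[|F_1|^4] from (61), with log(a) = -s (taken as given, not re-derived)."""
    a = math.exp(-s)
    la = -s
    num = 783 - 784 * a + a**4 + 540 * la + 240 * a * la + 144 * la**2
    return num / (18 * la**4)

def mean_abs_f_sq_numeric(s, n=200):
    """Midpoint-rule estimate of the integral of exp(-s*|u1 - u2|) over the
    unit square, which is (57) with M = 2 and c = (-1, 1)^T."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u1 = (i + 0.5) * h
        for k in range(n):
            u2 = (k + 0.5) * h
            total += math.exp(-s * abs(u1 - u2))
    return total * h * h

def normalized_mse(beta, L):
    """E[(G-1)^2] / Delta^2 via (56), assuming T_symbol = 1 so Delta = 1/L."""
    delta = 1.0 / L
    s = math.pi * beta * delta
    m2 = mean_abs_f_sq(s)
    var = mean_abs_f_4(s) - m2**2          # Var(|F_1|^2)
    return (var / L + (m2 - 1.0)**2) / delta**2

if __name__ == "__main__":
    beta = 1.0                              # illustrative linewidth value
    limit = (math.pi * beta)**2 / 9         # right-hand side of (54)
    for L in (10, 30, 100):
        print(L, normalized_mse(beta, L), limit)
```

Under these assumptions the printed ratio should approach (πβ)^2/9 as L grows, which is the behaviour that (54) and (62) describe.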