arXiv:cs/0701099v1 [cs.IT] 16 Jan 2007 IE E 1 E 2 3 T R A N S A C T IO N S O N IN F O R M A T IO N T H E O R Y 1 IEEETRANSACTIONSONINFORMATIONTHEORY 2 On the Feedback Capacity of Power Constrained Gaussian Noise Channels with Memory Shaohua Yang, Member, IEEE, Aleksandar Kavcˇic´, Senior Member, IEEE, and Sekhar Tatikonda, Member, IEEE Abstract—For a stationary additive Gaussian-noise channel to explicitly characterize the optimal (or feedback-capacity- with a rational noise power spectrum of a finite-order L, achieving) signal and thus compute the feedback channel we derive two new results for the feedback capacity under capacity. Cover and Pombra [7] defined the n-block feedback an average channel input power constraint. First, we show capacity and formulated the n-block feedback capacity com- that a very simple feedback-dependent Gauss-Markov source achieves the feedback capacity, and that Kalman-Bucy filtering putationasan optimizationproblem.Upperandlowerbounds is optimal for processing the feedback. Based on these results, on the feedback capacity were established. For example, it we develop a new method for optimizing the channel inputs for is known that the feedback capacity can never exceed the achieving the Cover-Pombra block-length-n feedback capacity capacity without feedback (feed-forward capacity) by more byusingadynamicprogrammingapproachthatdecomposesthe computation into n sequentially identical optimization problems than0.5 bit/channel-use[7], or thatthe feedbackcapacity can where each stage involves optimizing O(L2) variables. Second, never be more than double the feed-forwardcapacity [8], [9]. we derive the explicit maximal information rate for stationary Somewhattighterupperboundscanbecomputedbynumerical feedback-dependentsources. In general, evaluating the maximal techniques as in [10]. Butman [11] devised a feedback code information rate for stationary sources requires solving only a that recursively transmits the message through the channel, fewequationsbysimplenon-linearprogramming.Forfirst-order which achieves a higher rate than the feed-forward channel autoregressive and/or moving average (ARMA) noise channels, thisoptimizationadmitsaclosedformmaximalinformationrate capacity for certain linear Gaussian channels. For first-order formula.Themaximalinformationrateforstationarysourcesisa autoregressive (AR) noise channels, the closed-form infor- lowerboundonthefeedbackcapacity,anditequalsthefeedback mation rate obtained by Butman [11] is very close to the capacity if the long-standing conjecture, that stationary sources tightest upper bound and thus has been conjectured to be achieve the feedback capacity, holds. the real feedback capacity, but a rigorous proof has been IndexTerms—channelcapacity,directedinformation,dynamic missing. Ordentlich [12] characterized an optimal feedback programming, feedback capacity, Gauss-Markov source, infor- coding scheme for moving-average Gaussian noise channels. mationrate,intersymbolinterference,Kalman-Bucyfilter,linear Tatikonda [13] formulated the feedback channel capacity in Gaussian noise channel, noise whitening filter terms of the directed information rate. Ihara [14] studied the continuous-time Gaussian noise channel with feedback. I. INTRODUCTION There are two main contributions in this work: We consider discrete-time power-constrained linear Gaus- 1) We characterize the optimal signaling and feedback sian noise channels, where the signal is corrupted by an strategyforachievingthen-blockfeedbackcapacityofa additiveGaussianrandomprocess.Whenthechannelismem- power constrainedlinear Gaussian channelwith a ratio- oryless, i.e., when the channel is corrupted by additive white nalpowerspectrum.We showthattheoptimalsourceis Gaussian noise (AWGN), Shannon [1] provided a simple a simple feedback-dependentGauss-Markovsource and formulaforcomputingthefeed-forwardchannelcapacity,and that a Kalman-Bucy filter is optimal for processing the he also proved that feedback does not increase the channel feedback.Thisleadstoareformulationoftheproblemas capacity [2]. a stochastic controloptimizationproblem.Asa result,a For Gaussian noise channels with memory, i.e., when the new method based on dynamic programmingis derived channelnoise is correlated,the feed-forwardchannelcapacity to optimize the source and thus compute the n-block can be determined by the power-spectral-density water-filling feedback capacity. For computingthe n-block feedback method [3], [4], [5], [6]. However, with noiseless feedback, capacity, the new method decomposes the computation i.e., when the transmitter knows without error all previous into n identical sequential optimization problems with channel outputs, it has been a long-standing open problem each stage involving only O(L2) variables, where L is the order of the rational power spectral density of the Manuscript submitted on October 22, 2003; first revision December 10, noise (or channel). 2004;secondrevisionDecember23,2005;thirdrevisionJune10,2006;fourth We prove that a Gauss Markov source (channel input revision September25,2006. ThisworkwassupportedbytheNationalScienceFoundationunderGrant process) Xt of the following form (also depicted in No.CCR-9904458 andbytheNational StorageIndustryConsortium. Fig. 1) is optimal Shaohua Yang is with the Marvell Semiconductor Inc, Santa Clara, CA 95054.Email:[email protected]ˇic´iswiththeUniversity of Hawaii, Honolulu, HI 96822. Sekhar Tatikonda is with the Depart- X =dT(S −m )+e Z , t t t−1 t−1 t t ment of Electrical Engineering, Yale University, New Haven, CT 06520. Email:[email protected]. Kalman innovation | {z } IEEETRANSACTIONSONINFORMATIONTHEORY 3 transmitter Nt X channel output t + channel Zt transmitter Xt + Rt receiver e t d Kalman−Bucy t Filter Fig.2. Adiscrete-time linear Gaussiannoisechannel Fig.1. Optimalsourceforachieving thefeedback capacity an explicit formula for the maximal feedback information rate achieved by stationary sources, which represents a lower where d (a vector of dimension L) and e are prede- t t bound on the feedback capacity. This maximal feedback termined coefficients, vectors S and m are the t−1 t−1 information rate can be evaluated by nonlinear programming channel state (of dimension L) and the channel state techniques. We solve the nonlinear programming problem estimate (computed by the Kalman-Bucy filter in the in closed form for first-order autoregressive/moving-average feedbackloop),respectively,andZ isazero-meanunit- t (ARMA) channels. Section VII concludes the paper. variance Gaussian random variable that is independent of prior channel inputs, outputs and noise. We note that the Kalman-Bucy filter developed in this II. POWER-CONSTRAINED LINEARGAUSSIAN NOISE paper estimates the channel intersymbol-interference CHANNEL MODEL state S and takes a different form from the filters used t Lett∈Zdenotethediscretetimeindex,andlettherandom inSchalkwijk[15]orButman[11]whichrecursivelyes- variableX denotethe channelinputattime t. As depictedin timate the transmitted message, but it is closely related. t Figure 2, additive stationary Gaussian noise N corrupts the Theoptimalchannelinputderivedinthispaperincludes t channel input X to form the channel outputrandom variable a recursive estimate term (the Kalman innovation) and t a noveltyterm (e Z ), thus it could be equivalentto the t t R =X +N . (1) t t t recursive message estimating and transmitting scheme in [15], [11] only if the optimal value of the novelty It is assumed that the power spectral density function of the term e Z is zero, which is an open problem that we noise process N is known and is denoted as S (ω). In its t t t N leave unresolved. most generalformulation, the power spectral density function 2) We derive an explicit formula for the maximal in- of the Gaussian noise processcan be an arbitrarynonnegative formation rate achieved by (asymptotically) stationary function defined on the interval ω ∈ (−π,π], such that it is feedback-dependent sources. This represents a lower even, i.e., S (ω)=S (−ω), and its power is finite N N boundonthefeedbackcapacity.Wenotethatitisalong- π standingconjecturethata stationarysourceachievesthe 1 σ2 = S (ω)dω <∞. (2) feedback capacity, and this conjecture is not proved in N 2π N Z this paper and is still open. If the optimal Kalman- −π Bucy filter for processing the feedback (optimized over Further,itisrequiredthattheaveragesignalpowerbebounded both stationary and non-stationary feedback-dependent by a known value P from above, i.e., we have an average sources) has a steady state, i.e., if the optimal Kalman- channel-inputpower constraint Bucy filter becomesstationary as n→∞, the feedback n channel capacity exists and equals the maximal infor- 1 lim E (X )2 ≤P. (3) mation rate for stationary sources. n→∞ "n t # For the case of the first order autoregressive(AR) noise Xt=1 channels,ouroptimalsignalingschemeforachievingthe Ifafiniteblocklengthnisconsideredasin[7],thenthepower maximal stationary-source information rate turns out to constraint becomes be the same as Butman’s code [11]. n 1 Paper organization: We introduce the Guassian noise E (Xt)2 ≤P. (4) n " # channel model in Section II. For convenience the problem is t=1 X reformulated in the state-space (or state-machine) realization The Gaussian noise channel has memory when the noise is context. In Section III, the n-block feedback capacity is correlated. A correlated noise process N can be obtained by t expressed in a form that is suitable for solving the opti- passing a white Gaussian noise process W through a linear t mization problemusing dynamic programmingtechniques. In filter defined by a constant-coefficient difference equation of Section IV, we show that Gauss-Markov sources achieve the the following form n-block feedback capacity and that a Kalman-Bucy filter is optimal for processing the feedback. Section V is devoted L L N =W − a W − c N , (5) to solving the feedback capacity computation problem. A t t l t−l l t−l simple feedback-capacity-achieving signaling scheme is ex- Xl=1 Xl=1 plicitly characterized and a dynamic programming algorithm where W is a white Gaussian random process with vari- t to optimize the source is presented. In Section VI, we derive ance σ2 . It is easy to verify that the z-domain transfer W IEEETRANSACTIONSONINFORMATIONTHEORY 4 Wt~N (0 , σ W2 ) Wt~N (0 , σ W2 ) H( z ) H( z ) N N t transmitter Xt + t Rt receiver M transmitter Xt + Rt receiver Rt−1 D Fig.3. Alinear Gaussiannoisechannel witharational noisefilter Fig.4. LinearGaussiannoisechannelwithfeedback. function of the linear noise filter in (5) is L WedenotethemessageasMandencodeitintonfeedback- 1− alz−l dependent channel input signal variables Xn by using a 1 H(z)= l=1 , (6) feedback encoder, which in its most general form is PL 1+ c z−l l X =X M,Rt−1,n0 . (8) l=1 t t 1 −∞ and that the power spectral Pdensity function of the noise The receiver tries to de(cid:0)code the messa(cid:1)ge M based on the process Nt is rational, i.e., realization of the channel output variables Rn =rn. 1 1 S (ω) = σ2 H(ejω) 2 We next summarize some of the relevant known results. N W L L (cid:12)(cid:12) 1− (cid:12)(cid:12)ale−jlω 1− alejlω A. Butman’s Recursive Feedback Coding Scheme = σ2 (cid:18) l=1 (cid:19)(cid:18) l=1 (cid:19). (7) W PL PL Butman [11] considered a very general communication 1+ cle−jlω 1+ clejlω system for colored Gaussian noise channels with or without (cid:18) l=1 (cid:19)(cid:18) l=1 (cid:19) feedback. The feedback considered can be noiseless or noisy. SuchalinearGaussiannoPisechannelwithmePmoryisdepicted If we omit the noisy feedbackscenario, the transmitted signal in Figure 3. X is chosen as a linear combination of the feedback and the t It is known that any power spectral density can be ap- Gaussian message M=θ as proximated arbitrarily closely by the rational function (7) if t−1 the model size (memory length) parameter L is chosen to be X =δ θ+ a R . (9) large enough [17]. So, restriction (5) is not strong, but as we t t ti i will see, it is very much needed in the subsequent analysis. Xi=1 Further, without loss of generality, we may assume that the Withthistransmissionmodel,Butmanoptimizedthesignaling functionH(z)hasallitspolesandzerosinside theunitcircle for certain Gaussian noise channels with feedback. For first- and that no poles or zeros occur at the origin z = 0. Thus, order autoregressive noise channels, the following feedback H(z) is a causal and stable minimum phase linear filter [17], schemeachievesthemaximalinformationrateunderthemodel and its inverse is also a causal and stable minimum phase in (9). At each time instant t > 0, the transmitter computes linear filter. the receiver-side minimum-variance estimate of the message For the linear Gaussian noise channel, the channel capac- θ˜=E θ Rt−1 =rt−1 (10) ity [1] is defined as the maximal amount of information that 1 1 could be transmitted per channel use with an arbitrarily small and transmits (cid:2) (cid:12) (cid:3) (cid:12) errorprobability.Itiswellknownthatnoiselesschanneloutput X =δ θ−θ˜ . (11) feedback, i.e., error-free observation of the channel output by t t the transmitter, may increase the channel capacity [11]. Thus, (cid:16) (cid:17) By optimizingthe coefficientsδ to maximizethe information t if we denote the feed-forward channel capacity as C and the rate subject to the power constraint, a tight lower bound on feedback channel capacity as Cfb, then C ≤Cfb. the feedback capacity was obtained, which was shown to be As illustrated in Figure 4, the transmitter knows at time t, greater than the feed-forward channel capacity. This result without error, all previous realizations of the channel outputs was generalized to higher order channels [18]. Proving that before it forms and sends out the next channel input signal. Butman’s coding scheme (9) achieves the capacity is still an The transmission starts at time t = 1. We also assume open problem. that prior to time t = 1, a known signal is transmitted, e.g., X = 0 for t ≤ 0, and thus both the transmitter and t receiver know the prior channel outputs R0 = N0 = B. n-Block Feedback Capacity −∞ −∞ n0 . This is equivalent to assuming that both the receiver CoverandPombra[7]introducedthe(block-length-n)feed- −∞ and the transmitter know N0 = n0 and W0 = w0 . back capacity as the maximal achievable information rate −∞ −∞ −∞ −∞ This assumption on the channel inputs and outputs prior to (i.e., maximal information per channel use) for finite block transmissionismainlyforsimplifyingtheanalysisinthestate- length (or finite horizon) n, and showed that a sequence space setting, which we formally introduce in Section II-B. of codes exists that can achieve the capacity as n → ∞. IEEETRANSACTIONSONINFORMATIONTHEORY 5 Since we are assuming that the noise realization n0−∞ is Wt~N (0 , σ W2 ) known both to the receiver and the transmitter prior to the M transmitter Xt H− 1(z ) + Yt receiver start of transmission, we can express the Cover-Pombra n- blockfeedbackcapacityasCfb(n) =△ max 1I M;Rn n0 , where the maximizationis taken under a finnite-horiz1onp−o∞wer Yt−1 D (cid:0) (cid:12) (cid:1) constraint (cid:12) n Fig.5. Anequivalent linear Gaussiannoisechannel model 1 E (X )2 N0 =n0 ≤P. (12) "n t (cid:12) −∞ −∞# Xt=1 (cid:12)(cid:12) (noiseless feedback) z−1 Define the following cova(cid:12)riance matrices (cid:12) M K(n) =△ E Nn·(Nn)T N0 =n0 transmitter xt + + + yt receiver N 1 1 −∞ −∞ z−1 wt K(Xn) =△ EhX1n·(X1n)T(cid:12)(cid:12)X−0∞ =n0−∞i + a1 st(1) c1 + K(Xn+)N =△ Eh(X1n+N1n)·(cid:12)(cid:12)(X1n+N1n)TiN−0∞ =n0−∞ . + a st(2)z−1 c + 2 2 As shown in [7]h, the n-block feedback capac(cid:12)ity (or maximail information rate) equals (cid:12) aL st(L ) cL (noisy forward channel) K(n) Cfb(n) = max 1 log X+N . (13) Fig.6. AnLth-orderLTIGaussiannoisechannel withnoiselessfeedback 1tr K(n) ≤P 2n (cid:12)(cid:12) K(n) (cid:12)(cid:12) n X (cid:12) N (cid:12) “ ” (cid:12) (cid:12) Here, the maximization is taken over a(cid:12)ll cha(cid:12)nnel input vari- ables Xn, which take the following line(cid:12)ar for(cid:12)m or equivalently 1 t−1 L L L Xt = btiNi+Vt, for t=1,2,...,n, (14) Yt− alYt−l =Xt+ clXt−l+Wt− alWt−l, (16) Xi=1 Xl=1 Xl=1 Xl=1 where bti are the coefficients that need to be optimized, and where Wt is an independent and identically distributed the random variables V have a Gaussian distribution whose (i.i.d.) Gaussian random process with variance σ2 . t W covariance matrix needs to be optimized. The channel model (16), depicted in Figure 5, is a channel Then-blockfeedbackcapacitycomputationproblemin(13) modelwith intersymbolinterferencedue to the filter H−1(z). is formulated for an arbitrary noise covariance matrix K(n). Note that the transmitter obtains Yt by filtering the original N This optimization problem can be solved by methods given channel output Rt using the filter H−1(z). The filters H(z) in [19]. The number of unknown variables in (14) is O(n2). and H−1(z) are causal, minimum phase and invertible, and For the feedback code in (14) the transmitter would need thus do not cause any delay, see [17]. Thus, the two channel to remember and utilize all previous channel output realiza- models depicted in Figure 4 and Figure 5 could be converted tions Rt−1 = rt−1 (or the noise realizations Nt−1 = nt−1) into each other’s form by filtering their channel outputsusing 1 1 1 1 so as to form the next input signal X = x by using the rational causal delay-free filters H(z) and H−1(z), respec- t t linear equation (14). Thus, the encoding complexity grows tively. Hence, the two channel models are mathematically with time t in general. equivalent and have the same feedback (or feed-forward) In the following, we reformulate the linear Gaussian noise channel capacities. channelinFigure4asastate-spacechannelmodel.Thestate- The rational filter H−1(z) can be realized by shift regis- space formalism permits a very simple feedback-capacity- ters [17], and the channel model depicted in Figure 5 can achieving signaling scheme whose encoding complexity is thusberepresentedasastate-space(orstate-machine)channel constantfor any time instantt≥1. For this optimalsignaling model with noiseless feedback. As depicted in Figure 6, we scheme we show in this paper,the numberof variablesgrows only need to consider a linear time-invariant (LTI) filter [17] as O(n), and the encoding complexity is fixed for all t>1. observed through an additive white Gaussian noise (AWGN) channel, see equation (15). The LTI filter in the channel is C. AnEquivalentState-SpaceGaussianNoiseChannelModel completely characterized by its order L and the two vectors of tap coefficients The filter H(z) in (6) is modeled as a rational filter with all poles and zeros inside the unit circle and no poles or a=△ [a ,a ,···,a ]T, (17) zeros in the origin. The inverse of such a filter exists and 1 2 L is also invertible, so we may filter (without causing latency) c=△[c1,c2,···,cL]T, (18) the channel output by a noise whitening filter H−1(z) to get which are also coefficients of the rational filter H(z) in an equivalent state-space (intersymbol interference) channel equation (6). model with white noise (depicted in Figure 5) Let X be the channel input at time t whose realization t Y(z)=H−1(z)R(z)=H−1(z)X(z)+W(z), (15) is denoted by x . Let Y be the noisy channel output at t t IEEETRANSACTIONSONINFORMATIONTHEORY 6 timetwhoserealizationisdenotedbyy (whichisthefiltered Since the variance of the white Gaussian noise W t t channel output R of the linear Gaussian noise channel in is σ2 , from equation (20) we have the following con- t W Figure 4). Let the vectorof values stored in the shift registers ditional differential entropy of the channel output of the LTI filter, i.e., S =△ [S (1),S (2),...,S (L)]T, be the t t t t h Y St,Yt−1 = h Y St △ t 0 1 t t−1 channel state vector, and denote the state realization by s = t = h(W ) [s (1),s (2),...,s (L)]T.Sinceitisassumedthatthechannel (cid:0) (cid:12) (cid:1) (cid:0) t(cid:12) (cid:1) t t t (cid:12) 1 (cid:12) inputs X0 are known prior to the start of transmission at = log(2πeσ2 ). (24) −∞ 2 W time t=1, the state realizationS =s is also known.Then, 0 0 the channel can be described by the following assumptions: III) SincethesequencesXt andSt uniquelydetermineeach other for any given value of s , we can characterizethe 1) The forward channel satisfies a state-space model, i.e., 0 source distribution either in terms of channel inputs X t S = AS +bX (19) as t t−1 t Yt = (a+c)TSt−1+Xt+Wt, (20) Pt xts0t−1,y1t−1 =△ PXt|S0t−1,Y1t−1 xtst0−1,y1t−1 , (25) where Wt is white Gaussian noise with variance σW2 . or(cid:0)in(cid:12)(cid:12)terms of ch(cid:1)annel states St as(cid:0) (cid:12)(cid:12) (cid:1) ThesquarematrixAandthe vectorb aretime-invariant as follows Pt st s0t−1,y1t−1 =△ PS |St−1,Yt−1 st st0−1,y1t−1 . (26) t 0 1 a1 a2 ... aL−1 aL 1 For th(cid:0)e G(cid:12)(cid:12) aussian no(cid:1)ise channel form(cid:0)ula(cid:12)(cid:12)ted in the(cid:1)state- 1 0 ... 0 0 space framework, the feedback capacity for a finite horizon 0 A=△ 0 1 ... 0 0 , b=△ . . (21) n equals [7] ... ... ... ... ... .. 1 0 Cfb(n) = max I(M;Yn|S =s ) 0 0 ... 1 0 n 1 0 0 TheLTIfilterdescribedbythecoefficientvectorsaandc = maxn1 [h(Y1n|S0 =s0)−h(W1n)], (27) is invertible and stable. The initial state S = s is 0 0 where the maximization is over the channel input distribu- known to both the encoder and the decoder. tion (26) and is subject to the average input power con- 2) The feedback link is noiseless and causal, i.e., the straint (22). In the subsequent sections, we focus on char- encoder, before sending out symbol X , knows without t acterizingthe optimalsignaling and the feedbackcapacity for error all previous channel outputs Yt−1 =yt−1. 1 1 the state-space channel model depicted in Figure 6. 3) The average channel input power is constrained by n 1 E (X )2|S =s ≤P, (22) III. n-BLOCK FEEDBACK CAPACITY n t 0 0 In this section, we manipulate the mutual informa- t=1 X (cid:2) (cid:3) tion I(M;Yn|S =s ) into a form that is suitable for where n is the total number of input symbols Xn that 1 0 0 1 evaluation and analysis. Let the source distribution induced are used to encode the message M. by the encoder X = X M,Yt−1 , i.e., the set of all t t 1 From assumptions 1)-3), we have the following: valid conditional probability density functions defined in (25) (cid:0) (cid:1) I) For any given initial channel state S = s , the or (26), be denoted by 0 0 sequences St and Xt determine each other uniquely because of th1e linear 1equation (19). If a state sequence P =△ {Pt st st0−1,y1t−1 ,t=1,2,...}. (28) does not conform to the channel law (19), it is called The n-block fee(cid:0)dba(cid:12)ck capacity(cid:1) computation problem (13) invalid. We only need to consider valid state sequences (cid:12) becomes (see [7]) throughout. From the definition of the state S in (19), t we see thatwhentheinitialstate s isknown,the chan- 0 1 nelinputsequenceXt andthechannelstatesequenceSt Cfb(n) = max I(M;Yn|S =s ) are in a 1-to-1 relationship. So, the mutual information P n 1 0 0 1 rate between the channel input sequence (source) and = max [h(Yn|S =s )−h(Wn)], (29) the channel output sequence is the same as the mutual P n 1 0 0 1 informationratebetweenthechannelstatesequenceand where the maximization is over the source distribution P. the channeloutputsequence.Thus,it is validto refer to Note that in (28) we temporarily ignore the linear structure the state sequence as the source sequence. of the capacity-achievingsignaling(14) derivedby Coverand II) Given the channel state pair St = S ,S , the Pombrain[7],andconsiderthedistributionP insteadbecause t−1 t−1 t channeloutputY isstatisticallyindependentofprevious we first want to show that the a Markov structure of the t channel states St−2 and outputs Yt−1, t(cid:0)hat is (cid:1) distributionP is sufficient.Consequently,we will show thata 0 1 linearsignalingscheme,takingaverysimpleform,issufficient PYt|St0,Y1t−1 yt st0,y1t−1 =PYt|Stt−1 yt stt−1 . (23) for achieving the n-block capacity. (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) (cid:12) (cid:12) IEEETRANSACTIONSONINFORMATIONTHEORY 7 The differentialentropyof the channelnoise in (29) can be and thereby compute the n-block feedback capacity Cfb(n) alternatively expressed as for large block length n. However, by working on a state- space channel model as defined in Section II-C, we are able n n h(Wn)= h(W )= h Y Yt−1,St,S =s . (30) to significantly simplify the problem and derive a simple dy- 1 t t 1 1 0 0 namicprogrammingmethodtocomputethen-blockfeedback t=1 t=1 Here, the eXqualities folloXw fro(cid:0)m t(cid:12)(cid:12)he fact that the w(cid:1)hitened capacity. noise sequence W is independent and identically dis- t tributed (i.i.d.) and from the channel assumptions in Sec- IV. n-BLOCK FEEDBACK-CAPACITY-ACHIEVING tionII-C,see(24).Bysubstituting(30)intotheexpressionfor STRATEGY I(M;Yn|S =s ) in (29), we arrive at several equivalent 1 0 0 A. Gauss-Markov Sources Achieve the Feedback Capacity expressions for I(M;Yn|S =s ) 1 0 0 In the following analysis, we note that since the initial I(M;Yn|S =s ) 1 0 0 channelstates isknownaccordingtothechannelassumption n 0 1 inSectionII-C,fornotationalsimplicity,wewillnotexplicitly = h Y Yt−1,s − log 2πeσ2 (31) t 1 0 2 W write the dependence on s when obvious. t=1(cid:20) (cid:21) 0 Xn (cid:0) (cid:12)(cid:12) (cid:1) (cid:0) (cid:1) Theorem 1: [Feedback-dependent Gauss-Markov sources = h Y Yt−1,s −h Y Yt−1,St,s (32) achieve the feedback capacity] For the power constrained t 1 0 t 1 1 0 Gaussianchannel,afeedback-dependentGauss-Markovsource t=1 Xn (cid:2) (cid:0) (cid:12)(cid:12) (cid:1) (cid:0) (cid:12)(cid:12) (cid:1)(cid:3) distribution (not necessarily stationary) of the following form = h Y Yt−1,s −h Y Yt−1,St ,s (33) t 1 0 t 1 t−1 0 t=1 PGM =△ P s s ,yt−1 ,t=1,2,... (36) Xn (cid:2) (cid:0) (cid:12)(cid:12) (cid:1) (cid:0) (cid:12)(cid:12) (cid:1)(cid:3) t t t−1 1 = I Stt−1;Yt Y1t−1,s0 , (34) achieves the n-blo(cid:8)ck f(cid:0)eed(cid:12)(cid:12)back capac(cid:1)ity Cfb(n), for(cid:9)any block Xt=1 (cid:0) (cid:12) (cid:1) length n. where equalities in (31)(cid:12)and (32)1 follow from the definition Proof: We adoptthe followingproofstrategy.As shown of mutual information; the equality in (33) comes from the in [7], we only need to consider feedback-dependent Gaus- chainruleformutualinformationandthechannelassumptions, sian source distributions. We take an arbitrary feedback- see (24); and equality (34) follows from the definition of dependent Gaussian (not necessarily Markov) source distri- mutual information. The terms in the sums (31) through (34) bution P of the form in (28). From the source P , we 1 1 represent the amount of information that every channel use obtain the marginal state transition probabilities, from which contributes to the total transmitted information. we construct a feedback-dependent Gauss-Markov source P 2 Now, the feedback capacity can be expressed as of the form in (36). We then show that sources P and 1 P , though different in general, induce the same information 1 2 Cfb(n) = max I(M;Yn|S =s ) I(M;Yn|S =s ). Using this argument, we will show that P n 1 0 0 1 0 0 foranyoptimalfeedback-dependentGaussian sourcedistribu- n 1 = max I St ;Y Yt−1,s , (35) tion that achieves the n-block feedback capacity, there exists P n t−1 t 1 0 a feedback-dependent Gauss-Markov source distribution (not t=1 X (cid:0) (cid:12) (cid:1) necessarilystationary)thatalsoachievesthen-blockfeedback where the maximization is taken over (cid:12)the set P of valid capacity. feedback-dependentsource distributionfunctions(28), see [7] Let P be any valid feedback-dependent Gaussian source or [13] for the proof. 1 distribution defined as A source distribution P is called optimal if it maxi- fmeeizdebsacIk(Mca;pYa1cnit|yS0i=n s(03)5)a.ndSinthcues tahcehieivnefsormthaetionn-blroactke P1 =△ Pt st st0−1,y1t−1 ,t=1,2,··· . (37) 1I(M;Yn|S =s ) is linearly proportional to the entropy From the sou(cid:8)rce(cid:0)P ,(cid:12)we define(cid:1)the sequence(cid:9)of conditional n 1 0 0 1(cid:12) rate of the channel output, see (31), a feedback-dependent marginalpdf’sQ s s ,yt−1 ,fort=1,2,···,asfollows t t t−1 1 source is optimal if and only if it maximizes the entropy sraiaten onfoistheechchanannnele,ltohuetpfeuetdpbraocckescsapYatc.itFyoirsaaclhinieevaerdGbayusa- Qt st st−1,y1t−1(cid:0) =△(cid:12)(cid:12)PS(Pt|1S)t−1,Y(cid:1)1t−1 st st−1,y1t−1 (38) Gseaquusesli,awnisthoouurctelodsisstorfibouptitoimn,alsietye,[w7e],o[n1l3y].coTnhseidreefrofreee,dibnacthke- =(cid:0) Z(cid:12)(cid:12)"τtQ−=11Pτ(sτ|sτ0(cid:1)−1,y1τ−1)fYτ|Sττ−(1yτ|sττ−1(cid:0))#Pt(cid:12)(cid:12)(st|st0−1,y1t−1)d(cid:1)st1−2.(39) depSeinncdeenttheGanuusmsibaenr soofuarrcgeudmisetnritbsuftoiorntsh.e probability density "τt−=11Pτ(sτ|sτ0−1,y1τ−1)fYτ|Sττ−(1yτ|sττ−1)#ds1t−2 function (pdf) P s st−1,yt−1 increases linearly with time Z Q t t 0 1 Using the functions Q s s ,yt−1 , we construct a t, it is hard to directly find the optimal distribution P in (28) t t t−1 1 (cid:0) (cid:12) (cid:1) feedback-dependent Markov (not necessarily stationary) (cid:12) (cid:0) (cid:12) (cid:1) 1The right-hand side of equation (32) is defined by Marko [20] and source distribution P2 as (cid:12) Massey[21]asthedirectedinformationfromthechannelstatetothechannel output. P2 = Qt st st−1,y1t−1 ,t=1,2,··· . (40) (cid:8) (cid:0) (cid:12) (cid:1) (cid:9) (cid:12) IEEETRANSACTIONSONINFORMATIONTHEORY 8 The source P induces the following joint pdf of the channel = P(P1) st ,yt|s , (56) states St an2d outputs Yt Stt−1,Y1t|S0 t−1 1 0 t−1 1 (cid:0) (cid:1) where (a) is the result of substituting the definition (39) for PS(Ptt−21),Y1t|S0 stt−1,y1t|s0 sourceP2 andtheinductionassumption(52)into(53),and(b) is obtained by simplifying the expression in (54). = PS(Pt,2Y)t(cid:0)|S st1,y1t|s0(cid:1) dst1−2 (41) Thus, we have shown that the channel states Stt−1 and Z 1 1 0 outputs Yt induced by sources P and P have the same t (cid:0) (cid:1) 1 1 2 = Qτ sτ sτ−1,y1τ−1 PYτ|Sττ−1 yτ sττ−1 dst1−2.(42) idnidsturicbeuttihoen.saItmies itnhfeorremfoarteiocnleIar(Mtha;tYthne|Ssou=rcses)Pa1ccaonrddiPng2 Zτ=1 1 0 0 Y (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) to (34). We next show(cid:12)by induction that the join(cid:12)t distribution (42) of St and Yt induced by the source P is the same as the Note that the set of channel state vector entries t−1 1 2 {S (i):1≤i≤M,1≤τ ≤t−1} and the input sequence one induced by the source P , i.e., τ 1 Xt−1 linearly determine each other according to the channel P(P2) st ,yt|s =P(P1) st ,yt|s (43) law1 in (19). Thus, induced by the Gaussian source P , the Stt−1,Y1t|S0 t−1 1 0 Stt−1,Y1t|S0 t−1 1 0 channel state entries S (i) (for 1 ≤ i ≤ M and 1 ≤ τ1≤ t) = P(P1)(cid:0) st,yt|s(cid:1) dst−2 (cid:0) (cid:1) (44) and the symbols Yt arτe jointly Gaussian. We conclude that St,Yt|S 1 1 0 1 1 Z t1 1 0(cid:0) (cid:1) the conditional pdf’s Qt st st−1,y1t−1 constructed in (39) = Pτ sτ s0τ−1,y1τ−1 PYτ|Sττ−1 yτ sττ−1 dst1−2.(45) adreepeanldseontGGaauusssisa-nMafruknocvt(cid:0)io(nnos(cid:12)(cid:12)tannedcestshaursi(cid:1)lyPs2tatiisonaaryfe)esdobuarccke-. Z τ=1 Y (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) The Gauss-Markov source P also satisfies the input power For t=1, by the(cid:12)definition (39) of source P(cid:12) we have 2 2 constraint (22), which is obvious by the equality in (56). Q (s |s )=P (s |s ). (46) Theorem 1 reveals that, for any given prior channel output 1 1 0 1 1 0 yt−1, it is sufficient to utilize Markov sources to maximize We verify the to-be-proved equality (43) for t = 1 by noting 1 the entropy of the channel output sequence. By Theorem 1, that without loss of optimality, in the sequel we only consider P(P2) (s ,y |s ) = Q (s |s )P y s1 (47) feedback-dependent Gauss-Markov sources of the following S1,Y1|S0 1 1 0 1 1 0 Y1|S10 1 0 form = P (s |s )P (cid:0)y (cid:12)s1 (cid:1) (48) 1 1 0 Y1|S10 1 (cid:12) 0 PGM =△ P s s ,yt−1 ,t=1,2,··· . (57) = P(P1) (s ,y |(cid:0)s )(cid:12), (cid:1) (49) t t t−1 1 S1,Y1|S0 1 1 0 (cid:12) (cid:8) (cid:0) (cid:12) (cid:1) (cid:9) (cid:12) which directly implies B. The Kalman-Bucy Filter is Optimal for Processing the Feedback P(P2) s1,y |s =P(P1) s1,y |s . (50) S10,Y1|S0 0 1 0 S10,Y1|S0 0 1 0 Definition 1: We use αt(·) as shorthand notation for the Now, assume th(cid:0)at the equ(cid:1)ality (43) hold(cid:0)sfor up to(cid:1)time t−1, posterior pdf of the channel state S given the prior channel t where t>1, particularly, outputs Yt =yt, that is 1 1 PS(Ptt−−212),Y1t−1|S0 stt−−12,y1t−1|s0 αt(µ)=△ PSt|S0,Y1t µ s0,y1t . (58) =P(P1) (cid:0) st−1,yt−1(cid:1)|s (51) (cid:0) (cid:12) (cid:1) (cid:3) Stt−−12,Y1t−1|S0 t−2 1 0 For a feedback-dependentGauss-M(cid:12)arkov source PGM, the t−1 (cid:0) (cid:1) = P s sτ−1,yτ−1 P y sτ dst−3.(52) functionsαt(·)canberecursivelycomputedbytheBayesrule τ τ 0 1 Yτ|Sττ−1 τ τ−1 1 as follows Z τ=1 Y (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) The induction ste(cid:12)p for time t is simply show(cid:12) n as follows α µ = αt−1(v)Pt(µ|v,y1t−1)PYt|St−1,St(yt|v,µ)dv . (59) PS=(Ptt−2Q1),Y1ts|S0s(cid:0)stt−,1y,yt−1t1|s0P(cid:1) y st Sincte(cid:0)th(cid:1)e fuRnRRcαtti−o1n(vα)Pt(t(·)u|vis,y1ta−1G)PauYts|sSita−n1,Sptd(yf,t|vit,ui)sducdovmpletely t t t−1 1 Yt|Stt−1 t t−1 characterized by the conditional mean mt and conditional (cid:0)Z(cid:12)(cid:12)PS(Ptt−−212),Y1t−1(cid:1)|S0 stt−−12,(cid:0)y1t−(cid:12)(cid:12)1|s0 (cid:1)dst−2 (53) covariance mmatri=x KEtSof tshe,ycthan,nel state (60) (=a) Z"τtQ−=11Pτ(sτ|sτ0−1,y1τ−1)fYτ(cid:0)|Sττ−(1yτ|sττ−1)#Pt((cid:1)st|st0−1,y1t−1)dst1−2 × Ktt =E(cid:2)(Stt(cid:12)(cid:12)−0 m1t)(cid:3)(St−mt)T s0,y1t . (61) "τt−=11Pτ(sτ|sτ0−1,y1τ−1)fYτ|Sττ−(1yτ|sττ−1)#dst1−2 Wenotethattherecuhrsion(59),i.e.,therecurs(cid:12)(cid:12)ivecomiputation PYt|Stt−1yZtsttQ−1 "τt−=11Pτ(sτ|sτ0−1,y1τ−1)fYτ|Sττ−(1yτ|sττ−1)#dst1−2(54) fiofltemr [t22a]n.d Kt, can be implemented by a Kalman-Bucy t (cid:0) (cid:12) (cid:1)Z Q Theorem 2: [Kalman-Bucy filter is optimal for processing (=b) P (cid:12)s sτ−1,yτ−1 P y sτ dst−2 (55) thefeedback]Forthepower-constrainedlinearGaussianchan- τ τ 0 1 Yτ|Sττ−1 τ τ−1 1 nel,letthefeedback-dependentGauss-Markov(notnecessarily Zτ=1 Y (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) (cid:12) (cid:12) IEEETRANSACTIONSONINFORMATIONTHEORY 9 Wt~N (0 , σ W2 ) we then have M transmitter Xt H− 1(z ) + Yt receiver PYtn,Snt−1|S0,Y1t−1 ytn,snt−1 s0,y˜1t−1 (cid:0) n (cid:12) (cid:1) αt−1(. ) Kalman Filter D =αt−1(st−1) Pτ sτ(cid:12)sτ−1,y1τ−1PYτ|Sττ−1 yτsττ−1 τ=t Y (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) Fig.7. TheoptimalfeedbackstrategyfortheLinearGaussiannoisechannel. =PYtn,Snt−1|S0,Y1t−1 (cid:12)ytn,snt−1 s0,y1t−1 . (cid:12) (67) The equality in (67) direct(cid:0)ly implies(cid:12) that the(cid:1)entropies are (cid:12) equal stationary) source PGM be defined as α h Yn s ,y˜t−1 =h Yn s ,yt−1 , (68) t 0 1 t 0 1 PGM =△ P s s ,α (·) ,t=1,2,... , (62) and that for(cid:0)any(cid:12)τ ≥t th(cid:1)e trans(cid:0)miss(cid:12)ion powe(cid:1)rs are equal α t t t−1 t−1 (cid:12) (cid:12) where the Ma(cid:8)rkov(cid:0)tra(cid:12)(cid:12)nsition probab(cid:1)ility depends(cid:9)only on the E (Xτ)2 s0,y˜1t−1 =E (Xτ)2 s0,y1t−1 . (69) opfostaelrlioprrdioisrtrcibhuatnionnelfuonucttpiountsofYt1ht−e1c.haTnhneelnst-abtleocαkt(·f)eiendsbteaacdk dTihsetrriebfuotri(cid:2)eo,nfoPrτan(cid:12)(cid:12)syτts>τ−01(cid:3),ayn1td−1a,n(cid:2)yytτ−co1nst(cid:12)(cid:12)faonrt Πtim≥e(cid:3)0t, ≤theτso≤urcne capacity Cfb(n) then equals that maximizes the channel output entropy h Yn s ,yt−1 , (cid:0) (cid:12) (cid:1) t 0 1 subject to power co(cid:12)nstraint 1 (cid:0) (cid:12) (cid:1) Cfb(n) = max I(M;Yn|S =s ) n (cid:12) PαGM n 1 0 0 E (Xτ)2 s0,y1t−1 ≤Π, (70) n 1 τ=t = max I St ;Y Yt−1,s , (63) X (cid:2) (cid:12) (cid:3) PαGM nXt=1 (cid:0) t−1 t(cid:12) 1 0(cid:1) wenhteronpyy1th−1Yisnthse,yf˜ete−d1bacukn(cid:12)dveercctoorn,smtrauisnttalso maximize the (cid:12) t 0 1 where the maximization is taken subject to an average input n (cid:0) (cid:12) (cid:1) power constraint (cid:12) E (X )2 s ,y˜t−1 ≤Π, (71) τ 0 1 1 n Xτ=t (cid:2) (cid:12) (cid:3) E (X )2|S =s ≤P. (64) when y˜t−1 is the feedbac(cid:12)k vector. We note that Π is an n t 0 0 1 t=1 arbitrary non-negative constant (independent of the channel X (cid:2) (cid:3) output history, and independent of the initial state). This capacity-achieving strategy is depicted in Figure 7. (cid:3) According to the above analysis, for any given power Proof: Wefirstbrieflyoutlinetheproofstrategy.Wewill budget Π ≥ 0 for transmissions at times τ ≥ t, the optimal consider two different feedback vector realizations (channel source distribution P s s ,yt−1 at time t that maxi- output histories) yt−1 and y˜t−1 that both induce the same t t t−1 1 1 1 mizestheentropyofsubsequentchanneloutputsdependsonly posterior state pdf α (·) at time t − 1. We will then (cid:0) (cid:12) (cid:1) t−1 on αt−1(·). We may summ(cid:12)arize the conclusion as follows: apply equivalent Gauss-Markov source distributions for the Conclusion 1: At any time instant t, for any con- subsequentsourcesymbols(attimest,t+1,...),irrespectiveof stant non-negative power constraint Π, the optimal the feedback realization (yt−1 or y˜t−1), and we get the same 1 1 distribution of the source Xn that maximizes the distributions for the subsequent channel inputs and outputs, t channel output entropy h Yn|s ,yt−1 under the andthusthe sametransmissionpowerandoutputentropy(for power constraint n E (Xt )20 s1,yt−1 ≤ Π, the subsequenttransmissions), irrespectiveof the two channel τ=t (cid:0) τ 0 1(cid:1) depends only on the posterior channel state distri- histories. Using this result, we complete the proof using an P (cid:2) (cid:12) (cid:3) inductive argument. butionαt−1(·), regardlessofthech(cid:12)anneloutputhis- toryyt−1 thatleadtothatposteriorstatedistribution. Suppose that two differentfeedback vectors y˜t−1 and yt−1 1 1 1 We now show that we can drop the conditioning on the (y˜t−1 6= yt−1) induce the same posterior channel state pdf 1 1 channeloutputhistoryyt−1 when formulatingthe powercon- α (·), that is, for any possible state value s = µ we 1 t−1 t−1 straint in Conclusion 1. According to (29), the maximization have problemin(63)isequivalenttomaximizingthechanneloutput α˜t−1 µ =△ PSt−1|S0,Y1t−1 µ s0,y˜1t−1 entropy, i.e., n (cid:0) (cid:1) = PSt−1|S0,Y1t−1(cid:0)µ(cid:12)(cid:12)s0,y1t−1(cid:1)=△αt−1 µ .(65) argmPaxt=1I Stt−1;Yt Y1t−1,s0 Now consider two distributions(cid:0)of(cid:12)(cid:12)the sour(cid:1)ce Sτ, fo(cid:0)r(cid:1)τ ≥ t, X (cid:0) (cid:12)(cid:12) =arg(cid:1)mPaxh(Y1n|s0). (72) the firstdistributionconditionedonyt−1, andthe secondcon- 1 Forany1≤t≤n,we can rewritethechanneloutputentropy ditionedony˜t−1.Ifweletthesetwodistributions,irrespective 1 as of the feedback realization (yt−1 or y˜t−1), be equal to each other for τ ≥t, that is, for τ ≥1 t if 1 h(Y1n|s0)=h Y1t−1|s0 + Pτ sτ sτ−1,y˜1t−1,ytτ−1 =Pτ sτ sτ−1,y1t−1,ytτ−1 , (66) h(cid:0)Ytn s0,y(cid:1)1t−1 PY1t−1 y1t−1 dy1t−1. (73) Z (cid:0) (cid:12) (cid:1) (cid:0) (cid:1) (cid:0) (cid:12) (cid:1) (cid:0) (cid:12) (cid:1) (cid:12) (cid:12) (cid:12) IEEETRANSACTIONSONINFORMATIONTHEORY 10 Similarly, we can rewrite the input power as constraint Π depends only on α (·).2 The optimal source 0 0 now generates the channel input X at time t = 1, whose n t−1 1 E (Xτ)2|s0 = E (Xτ)2|s0 + power is π1 =E (X1)2 s0 . The channel input X1 induces τ=1 τ=1 thechanneloutpuhtY1,wh(cid:12)ichiinturnproducesanewposterior X (cid:2) n (cid:3) X (cid:2) (cid:3) state distribution α1(·). T(cid:12)(cid:12)he leftover power is Π1 =Π0−π1. E (Xτ)2 s0,y1t−1 PYt−1 y1t−1 dy1t−1.(74) Now, the optimal source at time instant t = 2 is one that 1 Z τ=t maximizes h(Yn|s ,y ) subject to the power constraint Π X (cid:2) (cid:12) (cid:3) (cid:0) (cid:1) 2 0 1 1 Fortheoptimalsourcedist(cid:12)ributionthatmaximizes(72)subject and,accordingtoConclusion2,thesourcedistributionattime tothe powerconstraint(64), andforanygivenchanneloutput t=2 depends only on α1(·). This source generates X2. Ex- realization yt−1, we define the remaining power at time t as tending the inductive argument further in similar fashion, we 1 conclude that at each time step t, according to Conclusion 2, n Π yt−1 =△ E (X )2 s ,yt−1 . (75) the sourcedistributionattime t dependsonlyon the posterior t−1 1 τ 0 1 state distribution α (·). t−1 τ=t (cid:0) (cid:1) X (cid:2) (cid:12) (cid:3) An alternative proof based on dynamic programming [23] (cid:12) Clearly Π = nP. By definition, Π yt−1 is a function and Lagrange multipliers is given in Appendix 1. 0 t−1 1 of prior channel outputs and a function of time t. We note Theorem 2 suggests that, for the task of constructing the (cid:0) (cid:1) that the channel inputs and outputs are jointly Gaussian [7]. next signal to be transmitted, all the “knowledge” contained Thus for any τ ≥ t, it is sufficient to only consider optimal in the vector of prior channel outputs yt−1 is captured by the 1 Gaussian sources that satisfy posterior pdf of the channel state α (·). t−1 So far, we have simplified the capacity-achieving feedback E X s ,yt−1 =0, (76) τ 0 1 strategy such that the transmitter does not need to memorize because otherwise(cid:2)one(cid:12)(cid:12) could (cid:3)easily verify that another the entire channel dynamics y1t−1. Instead, for forming the Gaussian source defined as Xˆ =△ X − E X s ,yt−1 optimal signal Xt (or equivalently state St) to be transmitted τ τ τ 0 1 at time t, we only need to know the immediately preceding for τ ≥t would induce the same channel output entropy (73) (cid:2) (cid:12) (cid:3) channel state S = s (determined by the prior chan- while consuming strictly less power than Xτ. N(cid:12)ow, since t−1 t−1 nel inputs) and the Kalman-Bucy filter output, which is a E X s ,yt−1 =0, we have τ 0 1 Gaussian probability density function α (·) characterized t−1 (cid:2) n (cid:12) (cid:3) n by the conditionalmean m and the conditionalcovariance (cid:12)E (Xτ)2 s0,y1t−1 = Var Xτ s0,y1t−1 . (77) matrix Kt−1. t−1 τ=t τ=t X (cid:2) (cid:12) (cid:3) X (cid:0) (cid:12) (cid:1) (cid:12) (cid:12) But since X and Yt−1 are jointly Gaussian, the variance τ 1 Var X s ,yt−1 does not depend on the realization yt−1, C. Properties of Capacity-Achieving Channel Dynamics τ 0 1 1 and(cid:0)we c(cid:12)an write(cid:1) For any feedback-dependent Gauss-Markov source PαGM, n (cid:12) n wehavethefollowingpropertiesforthechannelstates,outputs E (Xτ)2 s0,y1t−1 = E (Xτ)2|s0 , (78) and posterior state distributions. Xτ=t (cid:2) (cid:12) (cid:3) Xτ=t (cid:2) (cid:3) Theorem 3: For the linear Gaussian noise channel, if the or equivalently,Πt−(cid:12)1 =Πt−1 y1t−1 . Hence, conditioningon feedback-dependent Gauss-Markov source distribution PαGM thechanneloutputhistoryyt−1,isirrelevantwhenformulating is used, we have 1 (cid:0) (cid:1) the power constraint. Directly utilizing (78), we can modify 1) The sequence of posterior channel state distributions Conclusion1intoanew(slightlyrelaxed)conclusionthatdoes α (·), or the pairs of variables m and K , is a Markov t t t not require conditioning on the channel output history when random process. formulating the power constraint. 2) The sequence of the pairs (α (·),S ) is a Markov t t Conclusion 2: At any time instant t, for any con- random process. stant non-negative power constraint Π, the opti- 3) The sequence of the pairs (αt(·),Yt) is a Markov mal distribution of the source Xn that maximizes random process. t tthhee cphoawnenrelcoonustptruatinetntronpy hE Y(Xtn|s)20,sy1t−,y1t−1und=er 4) rTahnedosmequpernocceesos.f the triples (αt(·),St,Yt) is a Marko(cid:3)v n E (X )2|s ≤ Πτ,=dtep(cid:0)endsτonly0on 1t(cid:1)he pos- Proof: FortheGauss-MarkovsourcePGM,therecursive τ=t τ 0 P (cid:2) (cid:12) (cid:3) α terior channel state distribution αt−1((cid:12)·), regardless sum-product rule (59) for updating the posterior pdf of the oPf the ch(cid:2)annel outp(cid:3)ut history yt−1 that lead to that channel state becomes 1 posterior state distribution. We now utilize Conclusion 2 to finalize the proof using an α µ = αt−1(v)Pt(µ|v,αt−1(·))PYt|St−1,St(yt|v,µ)dv ,(79) t inductive argument. Let Π0 = nP be the power constraint at Rαt−1(v)Pt(u|v,αt−1(·))PYt|St−1,St(yt|v,u)dudv (cid:0) (cid:1) the beginning of transmissions. We have already established RR in Conclusion 2 that given the power constraint Π0, the 2Note that here α0“µ” = δ“µ−s0” because s0 is the known initial optimal source that maximizes h(Yn|s ) under the power state. 1 0