On the Capacity of Multilevel NAND Flash Memory Channels

Yonglong Li (University of Hong Kong, email: [email protected]), Aleksandar Kavčić (University of Hawaii, email: [email protected]), Guangyue Han (University of Hong Kong, email: [email protected])

May 10, 2016

arXiv:1601.05677v2 [cs.IT] 7 May 2016

Abstract

In this paper, we initiate a first information-theoretic study on multilevel NAND flash memory channels [2] with inter-cell interference. More specifically, for a multilevel NAND flash memory channel under mild assumptions, we first prove that such a channel is indecomposable and features the asymptotic equipartition property; we then further prove that stationary processes achieve its information capacity, and consequently, as the order tends to infinity, its Markov capacity converges to its information capacity; eventually, we establish that its operational capacity is equal to its information capacity. Our results suggest that it is highly plausible to apply the ideas and techniques used in the computation of the capacity of finite-state channels, which are relatively better explored, to the computation of the capacity of multilevel NAND flash memory channels.

Index Terms: mutual information, capacity, flash memory channels, finite-state channels.

1 Introduction

As our world is entering a mobile digital era at a lightning pace, NAND flash memories have been seen in a great variety of real-life applications, ranging from portable consumer electronics to personal and even enterprise computing. The insatiable consumer demand for greater affordability has been driving industry and academia to relentlessly exploit aggressive technology scaling and multi-level-per-cell techniques in the bit-cost reduction process. On the other hand, as their costs continually drop, flash memories have become more vulnerable to various device- and circuit-level noises, such as energy consumption, inter-cell interference and program/erase cycling effects, due to the rapidly growing bit density, and maintaining overall system reliability and performance has become a major concern.

To combat this increasingly imminent issue, various fault-tolerance techniques, such as error correction codes, have been employed. Representative works in this direction include BCH codes [26], LDPC codes [29, 8], rank modulation [17], constrained codes [23] and so on. The use of such techniques certainly boosts overall system performance, however at the expense of reduced memory storage efficiency. As the level of sophistication of such performance-boosting techniques drastically escalates, it is of central importance to know their theoretical limit in terms of achieving the maximal cell storage efficiency.

Recently, there have been a number of attempts in response to such a request; see, e.g., [8, 7, 5, 20, 27] and the references therein. In particular, in [8], the authors modelled NAND flash memories as communication channels that capture, in information-theoretic terms, the major data distortion noise sources, including program/erase cycling effects and inter-cell interference. In this direction, slight yet important modifications to enhance the mathematical tractability of the channel model in [8] were made in [2], where multiple communication channels with input inter-symbol interference, which are expected to be more amenable to theoretical analysis, were explicitly spelled out.
On the other hand, with [2] primarily focusing on the optimal detector design, an information-theoretic analysis of the communication channel capacity, which translates to the theoretical limit of memory cell storage efficiency, is still lacking. Our primary concern in this paper is essentially the one-dimensional causal channel model proposed in [2], which, mathematically, can be characterized by the following system of equations (for justification of such a mathematical formulation of the channel, see [2]):
$$Y_0 = X_0 + W_0 + U_0,$$
$$Y_n = X_n + A_n X_{n-1} + B_n (Y_{n-1} - E_{n-1}) + W_n + U_n, \quad n \ge 1, \qquad (1)$$
where

(i) $\{X_i\}$ is the channel input process, taking values from a finite alphabet $\mathcal{X} \triangleq \{v_0, v_1, \cdots, v_{M-1}\}$, and $\{Y_i\}$ is the channel output process, taking values from $\mathbb{R}$;

(ii) $\{A_i\}$, $\{B_i\}$, $\{E_i\}$ and $\{W_i\}$ are i.i.d. Gaussian random processes with mean $0$ and variances $\sigma_A^2$, $\sigma_B^2$ ($0 < \sigma_B^2 < 1$), $\sigma_E^2$ and $1$, respectively;

(iii) $\{U_i\}$ is an i.i.d. random process with the uniform distribution over $(\alpha_1, \alpha_2)$, $\alpha_1, \alpha_2 > 0$;

(iv) $\{A_i\}$, $\{B_i\}$, $\{E_i\}$, $\{W_i\}$, $\{U_i\}$ and $\{X_i\}$ are mutually independent.

The major differences between our model and that in [2] are as follows:

• As in most practical scenarios, our channel model has a "starting" time $0$, at which the channel is not affected by inter-cell interference;

• An extra assumption in our channel model is that $\sigma_B^2$ is upper bounded by $1$. As established in Lemma 2.1, this extra assumption guarantees the boundedness of the channel output power, and thereby the "stability" of the channel.

Our ultimate goal is to compute the operational capacity $C$ of the channel (1), which, roughly speaking, is defined as the highest rate at which information can be sent with arbitrarily low probability of error. The presence of input and output memory in the channel, however, makes the problem extremely difficult: computing the capacity of channels with memory is a long-standing open problem in information theory. One of the most effective strategies for attacking such a difficult problem is the so-called Markov approximation scheme, which has been extensively exploited in past decades for computing the capacity of families of finite-state channels (see [1, 28, 14] and the references therein). Roughly speaking, the Markov approximation scheme says that, instead of maximizing the mutual information over general input processes, one can do so over Markovian input processes of order $m$ to obtain the so-called $m$-th order Markov capacity. The effectiveness of this approach was justified in [6], where, for a class of finite-state channels, the authors showed that as the order $m$ tends to infinity, the sequence of Markov capacities converges to the real capacity of the memory channel. It is plausible that the Markov approximation scheme can be applied to other memory channels as well; as a matter of fact, the main result of the present paper confirms this for our channel model. Recently, much progress has been made in computing the Markov capacity of finite-state channels; in particular, a generalized Blahut-Arimoto algorithm and a randomized algorithm have been proposed in [28] and [14], respectively, which, under certain conditions, promise convergence to the Markov capacity. Though there are numerous issues that need to be addressed to justify the application of the above-mentioned algorithms to our model, the first and foremost question is whether the Markov capacity converges to the real capacity at all.
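Before turning to that question, it may help to make the channel (1) concrete. The following minimal Python sketch simulates one run of the model; all numerical values (the noise variances, the interval $(\alpha_1, \alpha_2) = (0.1, 0.2)$ and the 4-level alphabet) are illustrative assumptions, not parameters taken from [2]:

```python
import numpy as np

def simulate_channel(x, var_A=0.1, var_B=0.5, var_E=0.1,
                     alpha1=0.1, alpha2=0.2, rng=None):
    """Draw one output sequence y_0^n of channel (1) for an input sequence x_0^n."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    A = rng.normal(0.0, np.sqrt(var_A), n)
    B = rng.normal(0.0, np.sqrt(var_B), n)   # the model requires var_B < 1
    E = rng.normal(0.0, np.sqrt(var_E), n)
    W = rng.normal(0.0, 1.0, n)              # {W_i} has unit variance
    U = rng.uniform(alpha1, alpha2, n)       # {U_i} is uniform on (alpha1, alpha2)
    y = np.empty(n)
    y[0] = x[0] + W[0] + U[0]                # time 0: no inter-cell interference
    for i in range(1, n):
        y[i] = x[i] + A[i]*x[i-1] + B[i]*(y[i-1] - E[i-1]) + W[i] + U[i]
    return y

rng = np.random.default_rng(0)
levels = np.array([0.0, 1.0, 2.0, 3.0])      # hypothetical alphabet for M = 4 levels
x = rng.choice(levels, size=10_000)          # i.i.d. uniform inputs, for illustration
y = simulate_channel(x, rng=rng)
print(y[:5])
```

The feedback term $B_i (y_{i-1} - E_{i-1})$ is what couples consecutive outputs and gives the channel its memory.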
The affirmative answer given in this work, together with other similarities between the channel models, suggests that such a framework "transplantation" is indeed plausible.

The recursive nature of our channel permits a reformulation into a channel with "state": given the channel input and output $(x_i, y_i)$ at time $i$, the behavior of our channel in the future does not depend on the channel inputs and outputs before time $i$; put differently, $(x_i, y_i)$ can be regarded as the state of the channel at time $i+1$. Despite the similarities, such a reformulated channel poses new challenges compared with the well-known finite-state channels. The most serious one is that our channel output alphabet is infinite; as a consequence, the "indecomposability" property of our channel, albeit very similar to that of a finite-state channel, is not uniform over all possible channel states. Ripple effects of this issue include a number of technical problems, such as the asymptotic equipartition property and even the existence of such fundamental quantities as the mutual information rate and the capacity. This is the reason that, in our treatment, some non-trivial technical issues have to be circumvented: we will prove that our channel is "indecomposable" in the sense that the behavior of our channel in the distant future is little affected by the channel state in the earlier stages, and a much finer analysis is needed to deal with the above-mentioned non-uniformity issue. The second issue is that the lack of stationarity of the output process makes it difficult to establish the asymptotic equipartition property for the output process. For this, we observe that the asymptotic mean stationarity [13] of the output process makes it possible to apply tools from ergodic theory to establish the existence of the mutual information rate of our channel, and further the asymptotic equipartition property of the output process. Another issue is to mix the "blocked" processes to obtain a stationary process achieving the information capacity, for which we find an adaptation of Feinstein's method [10] to be a solution.

The remainder of this paper is organized as follows. In Section 2, we show that the channel (1) is indecomposable, which, among many other applications, ensures the existence of the information capacity of the channel. In Section 4, we show that, when the input process $\{X_n\}$ is stationary and ergodic, $\{Y_n\}$ and $\{X_n, Y_n\}$ possess the asymptotic equipartition property. In Section 5, the information capacity is shown to be equal to the stationary capacity, and the Markov capacity is shown to approach the information capacity as the Markov order goes to infinity. Eventually, the operational capacity is shown to be equal to the information capacity.

2 Indecomposability

In this section, we will prove that our channel (1) is "indecomposable" in the sense that, in the distant future, it is little affected by the channel state in the earlier stages. Taking the form of several inequalities in Lemma 2.4, the indecomposability property, among many other applications, will ensure that the information capacity of our channel is well-defined.

To avoid notational cumbersomeness in the computations, we write
$$\hat{W}_i = X_i + A_i X_{i-1} + W_i - B_i E_{i-1} + U_i.$$
It then follows from a recursive application of (1) that
$$Y_n = \hat{W}_n + B_n Y_{n-1} = \sum_{i=k+2}^{n} \hat{W}_i \prod_{j=i+1}^{n} B_j + Y_{k+1} \prod_{i=k+2}^{n} B_i. \qquad (2)$$
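As a quick sanity check on (2), the following sketch (again with arbitrary illustrative parameters) evaluates the channel step by step via (1) and compares the output at time $n$ with the unrolled closed form; the two agree up to floating-point precision:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 12, 3                                     # arbitrary horizon and cut point
A, B = rng.normal(0, 0.3, n + 1), rng.normal(0, 0.6, n + 1)
E, W = rng.normal(0, 0.3, n + 1), rng.normal(0, 1.0, n + 1)
U = rng.uniform(0.1, 0.2, n + 1)
x = rng.choice([0.0, 1.0, 2.0, 3.0], size=n + 1)

# Step-by-step evaluation of (1).
y = np.empty(n + 1)
y[0] = x[0] + W[0] + U[0]
for i in range(1, n + 1):
    y[i] = x[i] + A[i]*x[i-1] + B[i]*(y[i-1] - E[i-1]) + W[i] + U[i]

# Unrolled form (2), with hat{W}_i = X_i + A_i X_{i-1} + W_i - B_i E_{i-1} + U_i.
w_hat = lambda i: x[i] + A[i]*x[i-1] + W[i] - B[i]*E[i-1] + U[i]
total = y[k + 1] * np.prod(B[k + 2 : n + 1])
for i in range(k + 2, n + 1):
    total += w_hat(i) * np.prod(B[i + 1 : n + 1])
print(np.isclose(total, y[n]))                   # True
```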
The following lemma gives an upper bound on the moments of the output of the channel (1).

Lemma 2.1. There exist $M_2 > 0$ and $\beta > 2$ such that for any $n$ and $x_0^n$,
$$E[|Y_n|^\beta \mid X_0^n = x_0^n] \le M_2,$$
and consequently, $E[|Y_n|^\beta] \le M_2$.

Proof. In this proof, we will simply replace "$X_0^n = x_0^n$" in the conditional part of an expectation by "$x_0^n$". It follows from Minkowski's inequality that for any $p \ge 1$,
$$(E[|Y_n|^p \mid x_0^n])^{1/p} \le (E[|\hat{W}_n|^p \mid x_0^n])^{1/p} + (E[|B_n|^p \mid x_0^n])^{1/p} (E[|Y_{n-1}|^p \mid x_0^n])^{1/p} \le (E[|\hat{W}_n|^p \mid x_0^n])^{1/p} + (E[|B_n|^p])^{1/p} (E[|Y_{n-1}|^p \mid x_0^n])^{1/p},$$
where we have used the independence between $B_n$ and $Y_{n-1}$, and the independence between $B_n$ and $X_0^n$. Since $\sigma_B^2 < 1$, there exists $\beta \in (2, 3)$ such that $E[|B_n|^\beta] < 1$. Let
$$\rho = (E[|B_n|^\beta])^{1/\beta}, \qquad M_0 = \max\{|\alpha_1|, |\alpha_2|, |v_i|, \ i = 0, \cdots, M-1\}. \qquad (3)$$
Then, from Minkowski's inequality and Assumptions (i)-(iv), it follows that
$$(E[|\hat{W}_n|^\beta \mid x_0^n])^{1/\beta} \le (|x_n|^\beta)^{1/\beta} + (E[|A_n x_{n-1}|^\beta])^{1/\beta} + (E[|W_n|^\beta])^{1/\beta} + (E[|B_n E_n|^\beta])^{1/\beta} + (E[|U_n|^\beta])^{1/\beta}$$
$$\le 2M_0 + M_0 (E[|A_n|^\beta])^{1/\beta} + (E[|W_n|^\beta])^{1/\beta} + (E[|B_n|^\beta])^{1/\beta} (E[|E_n|^\beta])^{1/\beta}$$
$$\overset{(a)}{\le} 2M_0 + M_0 (E[|A_n|^4])^{1/4} + (E[|W_n|^4])^{1/4} + (E[|B_n|^4])^{1/4} (E[|E_n|^4])^{1/4},$$
where (a) follows from the inequality $(E[|X|^p])^{1/p} \le (E[|X|^q])^{1/q}$ for $0 < p < q$. Letting $M_1 = 2M_0 + M_0 (3\sigma_A^4)^{1/4} + 3^{1/4} + (3\sigma_B^4)^{1/4} (3\sigma_E^4)^{1/4}$, we then have
$$(E[|\hat{W}_n|^\beta \mid x_0^n])^{1/\beta} \le M_1,$$
where we have used the fact that the 4-th moment of a Gaussian random variable with mean $0$ and variance $\sigma^2$ is $3\sigma^4$. Therefore,
$$(E[|Y_n|^\beta \mid x_0^n])^{1/\beta} \le M_1 + \rho (E[|Y_{n-1}|^\beta \mid x_0^{n-1}])^{1/\beta}, \qquad (4)$$
which implies that
$$(E[|Y_n|^\beta \mid x_0^n])^{1/\beta} \le M_1 \sum_{i=0}^{n-1} \rho^i + \rho^n (E[|Y_0|^\beta \mid x_0])^{1/\beta} \le M_1/(1-\rho) + \rho^n (E[|Y_0|^\beta \mid x_0])^{1/\beta}.$$
It then follows from
$$(E[|Y_0|^\beta \mid x_0])^{1/\beta} \le (|x_0|^\beta)^{1/\beta} + (E[|W_0|^\beta])^{1/\beta} + (E[|U_0|^\beta])^{1/\beta} \le 2M_0 + (E[|W_0|^4])^{1/4}$$
that there exists $M_2 > 0$ such that for all $x_0^n$,
$$E[|Y_n|^\beta \mid x_0^n] \le M_2,$$
which immediately implies that $E[|Y_n|^\beta] \le M_2$.

Lemma 2.1 immediately implies the following corollary.

Corollary 2.2. $\{Y_n^2\}$ is uniformly integrable, and there exists a constant $M_3 > 0$ such that
$$E[Y_n^2 \mid X_0^n = x_0^n] \le M_3, \qquad (5)$$
and consequently,
$$E[Y_n^2] \le M_3. \qquad (6)$$

Proof. The desired uniform integrability immediately follows from Theorem 1.8 in [21] and Lemma 2.1, and the inequality (5) follows from the well-known fact that for any $\beta > 2$,
$$(E[Y_n^2 \mid X_0^n = x_0^n])^{1/2} \le (E[|Y_n|^\beta \mid X_0^n = x_0^n])^{1/\beta},$$
which immediately implies (6).

One consequence of Corollary 2.2 is the following bound on the entropy of the channel output.

Corollary 2.3. For all $0 \le m \le n$,
$$0 < H(Y_m^n) \le \frac{(n-m+1) \log 2\pi e M_3}{2},$$
where $M_3$ is as in Corollary 2.2.

Proof. For the upper bound, we have
$$H(Y_m^n) \le \sum_{i=m}^{n} H(Y_i) \le \frac{(n-m+1) \log 2\pi e M_3}{2}, \qquad (7)$$
where (7) follows from the fact that the Gaussian distribution maximizes entropy for a given variance. For the lower bound, using the chain rule for entropy and the fact that conditioning reduces entropy, we have
$$H(Y_m^n) \ge H(Y_m^n \mid X_m^n) \ge \sum_{i=m}^{n} H(Y_i \mid X_m^n, Y_m^{i-1}) \ge \sum_{i=m}^{n} H(Y_i \mid X_{i-1}^i, Y_{i-1}, E_{i-1}, B_i, U_i) \overset{(a)}{=} \sum_{i=m}^{n} H(W_i \mid X_{i-1}^i, Y_{i-1}, E_{i-1}, B_i, U_i) \overset{(b)}{=} \sum_{i=m}^{n} H(W_i) = \frac{(n-m+1) \log 2\pi e}{2} > 0,$$
where we have used (1) and Assumption (iv) in deriving (a) and (b).

Fix $k \ge 0$, and for any $x_k \in \mathcal{X}$ and $\tilde{y}_k \in \mathbb{R}$, define
$$\tilde{Y}_{k+1} = X_{k+1} + A_{k+1} x_k + B_{k+1} (\tilde{y}_k - E_k) + W_{k+1} + U_{k+1}, \qquad (8)$$
$$\tilde{Y}_n = X_n + A_n X_{n-1} + B_n (\tilde{Y}_{n-1} - E_{n-1}) + W_n + U_n, \quad n \ge k+2. \qquad (9)$$
Roughly speaking, $\{\tilde{Y}_n\}$ "evolves" in the same way as $\{Y_n\}$, however with different "conditions" at time $k$. Similarly to (2), we have
$$\tilde{Y}_n = \sum_{i=k+2}^{n} \hat{W}_i \prod_{j=i+1}^{n} B_j + \tilde{Y}_{k+1} \prod_{i=k+2}^{n} B_i. \qquad (10)$$
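Comparing (2) with (10) shows why the restarted process $\{\tilde{Y}_n\}$ is useful: the two expressions share the $\hat{W}$ sum and differ only through the terms $Y_{k+1} \prod_i B_i$ and $\tilde{Y}_{k+1} \prod_i B_i$, and the product $\prod_i B_i$ shrinks geometrically since $E[B_i^2] = \sigma_B^2 < 1$. A small coupled simulation (illustrative parameters again) makes this concrete: two copies of the channel driven by identical noise but started from very different conditions at time $k$ become numerically indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 0
A, B = rng.normal(0, 0.3, n + 1), rng.normal(0, 0.6, n + 1)
E, W = rng.normal(0, 0.3, n + 1), rng.normal(0, 1.0, n + 1)
U = rng.uniform(0.1, 0.2, n + 1)
x = rng.choice([0.0, 1.0, 2.0, 3.0], size=n + 1)

def run_from(y_cond):
    """Evolve (8)-(9) forward from the condition y_cond at time k, reusing the noise."""
    y = y_cond
    for i in range(k + 1, n + 1):
        y = x[i] + A[i]*x[i-1] + B[i]*(y - E[i-1]) + W[i] + U[i]
    return y

# The gap at time n equals (0.0 - 50.0) * prod(B_i), which is essentially zero:
print(abs(run_from(0.0) - run_from(50.0)))
```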
Below, we will use $f$ (or $p$) with subscripted random variables to denote the corresponding (conditional) probability density function (or mass function). For instance, $f_{Y_n|X_k^n, Y_k}(y_n|x_k^n, y_k)$ denotes the conditional density of $Y_n$ given $X_k^n = x_k^n$ and $Y_k = y_k$. We may, however, drop the subscripts when there is no confusion, and similar notational conventions will be followed throughout the remainder of the paper.

We are now ready for the following lemma, which establishes the "indecomposability" of our channel. Roughly speaking, the following lemma states that our channel is indecomposable in the sense that the output of the channel in the "distant future" is little affected by the "initial" inputs and outputs. Compared with the indecomposability property of finite-state channels [11], our indecomposability does depend on the initial channel inputs and outputs; as a result, a much finer analysis is needed to deal with this non-uniformity issue when one applies Lemma 2.4.

Lemma 2.4. a) For any $k \le n$, $x_k^n$, $y_k$ and $\tilde{y}_k$, we have
$$\int_{-\infty}^{\infty} \bigl| f_{Y_n|X_k^n, Y_k}(y_n|x_k^n, y_k) - f_{\tilde{Y}_n|X_k^n, \tilde{Y}_k}(y_n|x_k^n, \tilde{y}_k) \bigr| \, dy_n \le \sigma_B^{2(n-k)} (y_k^2 + \tilde{y}_k^2).$$

b) For any $k \le n$, $x_k^n$, $y_k$ and $\tilde{y}_k$, we have
$$\int_{-\infty}^{\infty} y_n^2 \bigl| f_{Y_n|X_k^n, Y_k}(y_n|x_k^n, y_k) - f_{\tilde{Y}_n|X_k^n, \tilde{Y}_k}(y_n|x_k^n, \tilde{y}_k) \bigr| \, dy_n \le 3 \sigma_B^{2(n-k)} (y_k^2 + \tilde{y}_k^2).$$

c) For any $k$, $n$, $x_n$, $y_n$ and $\hat{x}_0^n$, we have
$$\int_{-\infty}^{\infty} \bigl| f_{Y_n|X_0^n}(\hat{y}|\hat{x}_0^n) - f_{Y_{n+k+1}|X_{n+1}^{n+k+1}, X_n, Y_n}(\hat{y}|\hat{x}_0^n, x_n, y_n) \bigr| \, d\hat{y} \le \sigma_B^{2n} \bigl( \sigma_A^2 x_n^2 + 2 \sigma_B^2 (y_n^2 + \sigma_E^2) \bigr).$$

d) For any $k \le n$ and any $x_0^n$ with $p_{X_0^n}(x_0^n) > 0$, we have
$$\int_{-\infty}^{\infty} \bigl| f_{Y_n|X_0^n}(y_n|x_0^n) - f_{Y_n|X_{n-k}^n}(y_n|x_{n-k}^n) \bigr| \, dy_n \le \sigma_B^{2k} \bigl( 2 \sigma_A^2 x_{n-k}^2 + 2 \sigma_B^2 (2 M_3 + 2 \sigma_E^2) \bigr),$$
where $M_3$ is as in Corollary 2.2.

Proof. a) Conditioned on $X_k^n = x_k^n$, $B_{k+2}^n = b_{k+2}^n$, $U_{k+1}^n = u_{k+1}^n$, $E_k = e_k$, $Y_k = y_k$ and $\tilde{Y}_k = \tilde{y}_k$, $Y_n$ and $\tilde{Y}_n$ are Gaussian random variables with mean $\sum_{i=k+1}^{n} (x_i + u_i) \prod_{j=i+1}^{n} b_j$ and respective variances
$$\sigma^2(b_{k+2}^n, u_{k+1}^n) = \mathrm{Var}(Y_n \mid x_k^n, y_k, e_k, b_{k+2}^n, u_{k+1}^n), \qquad \tilde{\sigma}^2(b_{k+2}^n, u_{k+1}^n) = \mathrm{Var}(\tilde{Y}_n \mid x_k^n, \tilde{y}_k, e_k, b_{k+2}^n, u_{k+1}^n).$$
Note that conditioned on $x_k^n$, $b_{k+2}^n$, $u_k^n$, $e_k$, $y_k$ and $\tilde{y}_k$, the families $\{\hat{W}_i : i = k+2, \cdots, n\}$ and $\{Y_{k+1}, \tilde{Y}_{k+1}\}$ are independent, which implies that
$$\sigma^2(b_{k+2}^n, u_{k+1}^n) = \mathrm{Var}\Bigl( \sum_{i=k+2}^{n} \hat{W}_i \prod_{j=i+1}^{n} b_j \Bigm| x_k^n, b_{k+2}^n, u_{k+1}^n \Bigr) + \mathrm{Var}\Bigl( Y_{k+1} \prod_{j=k+2}^{n} b_j \Bigm| x_k^n, y_k, b_{k+2}^n, u_{k+1}^n, e_k \Bigr)$$
and
$$\tilde{\sigma}^2(b_{k+2}^n, u_{k+1}^n) = \mathrm{Var}\Bigl( \sum_{i=k+2}^{n} \hat{W}_i \prod_{j=i+1}^{n} b_j \Bigm| x_k^n, b_{k+2}^n, u_{k+1}^n \Bigr) + \mathrm{Var}\Bigl( \tilde{Y}_{k+1} \prod_{j=k+2}^{n} b_j \Bigm| x_k^n, \tilde{y}_k, b_{k+2}^n, u_{k+1}^n, e_k \Bigr).$$
So, we have
$$\bigl| \sigma^2(b_{k+2}^n, u_{k+1}^n) - \tilde{\sigma}^2(b_{k+2}^n, u_{k+1}^n) \bigr| = \Bigl| \mathrm{Var}\Bigl( Y_{k+1} \prod_{j=k+2}^{n} b_j \Bigm| x_k^n, y_k, b_{k+2}^n, u_{k+1}^n, e_k \Bigr) - \mathrm{Var}\Bigl( \tilde{Y}_{k+1} \prod_{j=k+2}^{n} b_j \Bigm| x_k^n, \tilde{y}_k, b_{k+2}^n, u_{k+1}^n, e_k \Bigr) \Bigr| = \bigl| y_k^2 - \tilde{y}_k^2 \bigr| \, \sigma_B^2 \prod_{j=k+2}^{n} b_j^2 \le (y_k^2 + \tilde{y}_k^2) \, \sigma_B^2 \prod_{j=k+2}^{n} b_j^2. \qquad (11)$$
Now, with the following easily verifiable fact,
$$\sigma^2(b_{k+2}^n, u_{k+1}^n) \ge \mathrm{Var}(W_n) = 1 \quad \text{and} \quad \tilde{\sigma}^2(b_{k+2}^n, u_{k+1}^n) \ge \mathrm{Var}(W_n) = 1, \qquad (12)$$
we conclude that
$$\int_{-\infty}^{\infty} \bigl| f_{Y_n|X_k^n, Y_k}(y_n|x_k^n, y_k) - f_{\tilde{Y}_n|X_k^n, \tilde{Y}_k}(y_n|x_k^n, \tilde{y}_k) \bigr| \, dy_n \le E\Bigl\{ \int_{-\infty}^{\infty} \bigl| f(y_n|x_k^n, y_k, E_k, B_{k+2}^n, U_{k+1}^n) - f(y_n|x_k^n, \tilde{y}_k, B_{k+2}^n, U_{k+1}^n, E_k) \bigr| \, dy_n \Bigr\}$$
$$\overset{(a)}{\le} E\biggl\{ \frac{ \bigl| \sigma^2(B_{k+2}^n, U_{k+1}^n) - \tilde{\sigma}^2(B_{k+2}^n, U_{k+1}^n) \bigr| \min\bigl( \sigma^2(B_{k+2}^n, U_{k+1}^n), \tilde{\sigma}^2(B_{k+2}^n, U_{k+1}^n) \bigr) }{ \sigma^2(B_{k+2}^n, U_{k+1}^n) \, \tilde{\sigma}^2(B_{k+2}^n, U_{k+1}^n) } \biggr\}$$
$$\overset{(b)}{\le} E\Bigl\{ (y_k^2 + \tilde{y}_k^2) \, \sigma_B^2 \prod_{j=k+2}^{n} B_j^2 \Bigr\} = (y_k^2 + \tilde{y}_k^2) \, \sigma_B^{2(n-k)}, \qquad (13)$$
where (a) follows from the well-known fact [22]
$$\int_{-\infty}^{\infty} \biggl| \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x-\mu)^2}{2\sigma_1^2}} - \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x-\mu)^2}{2\sigma_2^2}} \biggr| \, dx \le \frac{ |\sigma_1^2 - \sigma_2^2| \min\{\sigma_1^2, \sigma_2^2\} }{ \sigma_1^2 \sigma_2^2 },$$
and (b) follows from (11) and (12).

b) The proof of b) is similar to that of a); the only difference lies in the derivation of (13), which is given as follows:
$$\int_{-\infty}^{\infty} y_n^2 \bigl| f_{Y_n|X_k^n, Y_k}(y_n|x_k^n, y_k) - f_{\tilde{Y}_n|X_k^n, \tilde{Y}_k}(y_n|x_k^n, \tilde{y}_k) \bigr| \, dy_n \le E\Bigl\{ \int_{-\infty}^{\infty} y_n^2 \bigl| f(y_n|x_k^n, y_k, E_k, B_{k+2}^n, U_{k+1}^n) - f(y_n|x_k^n, \tilde{y}_k, B_{k+2}^n, U_{k+1}^n, E_k) \bigr| \, dy_n \Bigr\}$$
$$\overset{(a)}{\le} 3 E\bigl\{ \bigl| \sigma^2(B_{k+2}^n, U_{k+1}^n) - \tilde{\sigma}^2(B_{k+2}^n, U_{k+1}^n) \bigr| \bigr\} \le 3 E\Bigl\{ (y_k^2 + \tilde{y}_k^2) \, \sigma_B^2 \prod_{j=k+2}^{n} B_j^2 \Bigr\} = 3 (y_k^2 + \tilde{y}_k^2) \, \sigma_B^{2(n-k)},$$
where (a) follows from the fact that (see Appendix A for the proof)
$$\int_{-\infty}^{\infty} x^2 \biggl| \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-\frac{(x-\mu)^2}{2\sigma_1^2}} - \frac{1}{\sqrt{2\pi\sigma_2^2}} e^{-\frac{(x-\mu)^2}{2\sigma_2^2}} \biggr| \, dx \le 3 |\sigma_1^2 - \sigma_2^2|. \qquad (14)$$

c) This follows from a completely parallel argument to that of a).

d) From the assumptions on the channel (1) and Corollary 2.2, it follows that
$$\int y_{n-k}^2 \, f_{Y_{n-k}|X_{n-k}^n}(y_{n-k}|x_{n-k}^n) \, dy_{n-k} = \sum_{\tilde{x}_0^{n-k-1}} \frac{ P(X_0^{n-k-1} = \tilde{x}_0^{n-k-1}, X_{n-k}^n = x_{n-k}^n) \int y_{n-k}^2 \, f_{Y_{n-k}|X_0^n}(y|\tilde{x}_0^{n-k-1}, x_{n-k}^n) \, dy }{ P(X_{n-k}^n = x_{n-k}^n) }$$
$$= \sum_{\tilde{x}_0^{n-k-1}} \frac{ P(X_0^{n-k-1} = \tilde{x}_0^{n-k-1}, X_{n-k}^n = x_{n-k}^n) \int y_{n-k}^2 \, f_{Y_{n-k}|X_0^{n-k}}(y|\tilde{x}_0^{n-k-1}, x_{n-k}) \, dy }{ P(X_{n-k}^n = x_{n-k}^n) } = \sum_{\tilde{x}_0^{n-k-1}} \frac{ P(X_0^{n-k-1} = \tilde{x}_0^{n-k-1}, X_{n-k}^n = x_{n-k}^n) \, E[Y_{n-k}^2 \mid \tilde{x}_0^{n-k-1}, x_{n-k}] }{ P(X_{n-k}^n = x_{n-k}^n) } \le M_3. \qquad (15)$$
We then have
$$\int_{-\infty}^{\infty} \bigl| f_{Y_n|X_0^n}(y_n|x_0^n) - f_{Y_n|X_{n-k}^n}(y_n|x_{n-k}^n) \bigr| \, dy_n = \int_{-\infty}^{\infty} \Bigl| \int f_{Y_{n-k}|X_0^{n-k}}(\hat{y}_{n-k}|x_0^{n-k}) \, f_{Y_{n-k}|X_{n-k}^n}(\tilde{y}_{n-k}|x_{n-k}^n) \bigl( f_{Y_n|X_{n-k}^n, Y_{n-k}}(y_n|x_{n-k}^n, \hat{y}_{n-k}) - f_{Y_n|X_{n-k}^n, Y_{n-k}}(y_n|x_{n-k}^n, \tilde{y}_{n-k}) \bigr) \, d\hat{y}_{n-k} \, d\tilde{y}_{n-k} \Bigr| \, dy_n$$
$$\le \int f_{Y_{n-k}|X_0^{n-k}}(\hat{y}_{n-k}|x_0^{n-k}) \, f_{Y_{n-k}|X_{n-k}^n}(\tilde{y}_{n-k}|x_{n-k}^n) \int_{-\infty}^{\infty} \bigl| f_{Y_n|X_{n-k}^n, Y_{n-k}}(y_n|x_{n-k}^n, \hat{y}_{n-k}) - f_{Y_n|X_{n-k}^n, Y_{n-k}}(y_n|x_{n-k}^n, \tilde{y}_{n-k}) \bigr| \, dy_n \, d\hat{y}_{n-k} \, d\tilde{y}_{n-k}$$
$$\overset{(a)}{\le} \int f_{Y_{n-k}|X_0^{n-k}}(\hat{y}_{n-k}|x_0^{n-k}) \, f_{Y_{n-k}|X_{n-k}^n}(\tilde{y}_{n-k}|x_{n-k}^n) \, \sigma_B^{2k} (\hat{y}_{n-k}^2 + \tilde{y}_{n-k}^2) \, d\hat{y}_{n-k} \, d\tilde{y}_{n-k} \le 2 \sigma_B^{2k} M_3, \qquad (16)$$
where (a) follows from Statement a) of Lemma 2.4.
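The steps labelled (a) in the proofs of parts a) and b) both reduce to $L^1$ bounds between two Gaussian densities that share a mean but differ in variance. A quick numerical quadrature, with arbitrary test values $\mu = 0.7$, $\sigma_1^2 = 1$ and $\sigma_2^2 = 1.3$, illustrates the fact from [22] and the fact (14):

```python
import numpy as np

def gauss(t, mu, var):
    """Density of N(mu, var) evaluated at t."""
    return np.exp(-(t - mu)**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

mu, v1, v2 = 0.7, 1.0, 1.3                    # arbitrary test values
t = np.linspace(-20.0, 20.0, 400_001)         # fine grid; the tails are negligible here
dt = t[1] - t[0]
gap = np.abs(gauss(t, mu, v1) - gauss(t, mu, v2))

# Fact from [22]: L1 distance vs. |v1 - v2| * min(v1, v2) / (v1 * v2).
print(np.sum(gap) * dt, abs(v1 - v2) * min(v1, v2) / (v1 * v2))

# Fact (14): second-moment-weighted L1 distance vs. 3 * |v1 - v2|.
print(np.sum(t**2 * gap) * dt, 3 * abs(v1 - v2))
```

In both printed pairs the first number stays below the second, consistent with the inequalities invoked in step (a).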
One of the consequences of Lemma 2.4 is the following proposition.

Proposition 2.5. a) Let $X_{n+1}^{2n+1}$ be an independent copy of $X_0^n$. Then for any $k \le n$, any $x \in \mathcal{X}$ and $y \in \mathbb{R}$, we have
$$\bigl| I(X_0^n; Y_0^n) - I(X_{n+1}^{2n+1}; Y_{n+1}^{2n+1} \mid X_n = x, Y_n = y) \bigr| \le 2(k+1) \log M + (n-k) \bigl( \sigma_A^2 x^2 + 2 \sigma_B^2 (y^2 + \sigma_E^2) \bigr) \sigma_B^{2k} \log M.$$

b) Let $\{X_n\}$ be a stationary process. Then there exist positive constants $M_4, M_5, M_6, M_7, M_8$ and $M_9$ such that for any $m \le k \le n$,
$$\bigl| I(X_0^n; Y_0^n) - I(X_m^{m+n}; Y_m^{m+n}) \bigr| \le \frac{3(k+1) \log 2\pi e M_3}{n+1} + 2 M_3 \pi e (M_8 + 3 M_9) \sigma_B^{2k} + \frac{1}{n+1} (M_4 + M_5 M_3) + \Bigl( M_6 + \frac{4 M_1 M_3 M_7}{(1-\sigma_B)^2} + \frac{12 M_3 M_7}{(n+1)(1-\sigma_B^2)} \Bigr) \sigma_B^{2k}. \qquad (17)$$

Proof. a) To prove a), we adapt the classical argument in the proof of Theorem 4.6.4 in [11] as follows. Using the chain rule for mutual information, we have
$$I(X_0^n; Y_0^n) = I(X_0^k; Y_0^n) + I(X_{k+1}^n; Y_{k+1}^n \mid X_0^k, Y_0^k) + I(X_{k+1}^n; Y_0^k \mid X_0^k).$$
It can be verified that, given $X_0^k$, $X_{k+1}^n$ and $Y_0^k$ are independent, which implies that
$$I(X_{k+1}^n; Y_0^k \mid X_0^k) = 0.$$
Since each $X_i$ takes at most $M$ values, we deduce that
$$|I(X_0^k; Y_0^n)| \le (k+1) \log M,$$
which further implies that
$$I(X_0^n; Y_0^n) \le (k+1) \log M + I(X_{k+1}^n; Y_{k+1}^n \mid X_0^k, Y_0^k). \qquad (18)$$
Similarly, we have, for any $x$, $y$,
$$I(X_{n+1}^{2n+1}; Y_{n+1}^{2n+1} \mid X_n = x, Y_n = y) \qquad (19)$$
$$\ge -(k+1) \log M + I(X_{n+k+2}^{2n+1}; Y_{n+k+2}^{2n+1} \mid X_{n+1}^{n+k+1}, Y_{n+1}^{n+k+1}, X_n = x, Y_n = y). \qquad (20)$$
It follows from the definition of conditional mutual information that
$$I(X_{k+1}^n; Y_{k+1}^n \mid X_0^k, Y_0^k) = \sum_{x_0^k} p(x_0^k) \int f_{Y_0^k|X_0^k}(y_0^k|x_0^k) \, I(X_{k+1}^n; Y_{k+1}^n \mid x_0^k, y_0^k) \, dy_0^k = \sum_{x_0^k} p(x_0^k) \int f_{Y_k|X_0^k}(y_k|x_0^k) \, I(X_{k+1}^n; Y_{k+1}^n \mid x_0^k, y_k) \, dy_k \qquad (21)$$
and
$$I(X_{n+k+2}^{2n+1}; \tilde{Y}_{n+k+2}^{2n+1} \mid X_{n+1}^{n+k+1}, Y_{n+1}^{n+k+1}, X_n = x, Y_n = y) = \sum_{x_0^k} p_{X_{n+1}^{n+k+1}|X_n, Y_n}(x_0^k|x, y) \int f_{Y_{n+1}^{n+k+1}|X_{n+1}^{n+k+1}, X_n, Y_n}(y_0^k|x_0^k, x, y) \bigl\{ I(X_{n+k+2}^{2n+1}; Y_{n+k+2}^{2n+1} \mid X_{n+1}^{n+k+1} = x_0^k, Y_{n+1}^{n+k+1} = y_0^k, X_n = x, Y_n = y) \bigr\} \, dy_0^k$$
$$= \sum_{x_0^k} p_{X_{n+1}^{n+k+1}}(x_0^k) \int f_{Y_{n+k+1}|X_{n+1}^{n+k+1}, X_n, Y_n}(y_k|x_0^k, x, y) \, I(X_{k+1}^n; Y_{k+1}^n \mid x_0^k, y_k) \, dy_k, \qquad (22)$$
