Novel Lower Bounds on the Entropy Rate of Binary Hidden Markov Processes

Or Ordentlich, MIT, [email protected]
arXiv:1601.06453v2 [cs.IT], 9 May 2016
(The work of O. Ordentlich was supported by the MIT-Technion Postdoctoral Fellowship.)

Abstract—Recently, Samorodnitsky proved a strengthened version of Mrs. Gerber's Lemma, where the output entropy of a binary symmetric channel is bounded in terms of the average entropy of the input projected on a random subset of coordinates. Here, this result is applied for deriving novel lower bounds on the entropy rate of binary hidden Markov processes. For symmetric underlying Markov processes, our bound improves upon the best known bound in the very noisy regime. The nonsymmetric case is also considered, and explicit bounds are derived for Markov processes that satisfy the $(1,\infty)$-RLL constraint.

I. INTRODUCTION

Let $\{X_n\}$, $n = 1,2,\ldots$, be a symmetric stationary binary Markov process with transition probability $0 < q < \tfrac{1}{2}$, such that $X_1 \sim \mathrm{Bernoulli}(\tfrac{1}{2})$ and for any $n > 1$

$$X_n = X_{n-1} \oplus W_n,$$

where $\{W_n\}$, $n = 2,3,\ldots$, is a sequence of i.i.d. $\mathrm{Bernoulli}(q)$ random variables, statistically independent of $X_1$. We consider the hidden Markov process $\{Y_n\}$, $n = 1,2,\ldots$, obtained at the output of a binary symmetric channel (BSC) with crossover probability $0 < \alpha < \tfrac{1}{2}$, whose input is the process $\{X_n\}$. Namely,

$$Y_n = X_n \oplus Z_n,$$

where $\{Z_n\}$, $n = 1,2,\ldots$, is a sequence of i.i.d. $\mathrm{Bernoulli}(\alpha)$ random variables, statistically independent of $\{X_n\}$. The task of finding an explicit form for the entropy rate

$$\bar{H}(Y) \triangleq \lim_{n\to\infty} \frac{H(Y_1,\ldots,Y_n)}{n}$$

of the process $\{Y_n\}$ is a long-standing open problem, and the main contribution of this paper is in providing novel lower bounds for this quantity.

A simple lower bound on $\bar{H}(Y)$ can be obtained by invoking Mrs. Gerber's Lemma (MGL) [1], which states that if $\{X_n\}$ is the input to a BSC with crossover probability $\alpha$, and $\{Y_n\}$ is the output, then

$$H(Y_1,\ldots,Y_n) \ge n\, h\left(\alpha * h^{-1}\left(\frac{H(X_1,\ldots,X_n)}{n}\right)\right), \qquad (1)$$

where $h(p) \triangleq -p\log(p) - (1-p)\log(1-p)$ is the binary entropy function, $h^{-1}(\cdot)$ is its inverse restricted to the interval $[0,\tfrac{1}{2}]$, and $a * b \triangleq a(1-b) + b(1-a)$. Here, as well as throughout the rest of the paper, logarithms are taken to base 2. Since the entropy rate of the symmetric Markov process $\{X_n\}$ is $\bar{H}(X) = h(q)$, for symmetric hidden Markov processes the bound (1) takes the simple form

$$\bar{H}(Y) \ge h(\alpha * q). \qquad (2)$$

Unfortunately, this bound is quite loose for many regimes of the process parameters $\alpha$ and $q$.

Recently, Samorodnitsky [2] proved a strengthened version of MGL, where the normalized input entropy $H(X_1,\ldots,X_n)/n$ on the right-hand side of (1) is replaced by the average normalized entropy of the random vector $(X_1,\ldots,X_n)$ projected on a random subset of coordinates. In this paper we apply the results of [2] to derive a novel lower bound on $\bar{H}(Y)$. Despite its simplicity, we show that this bound is stronger than the best known lower bounds in the very noisy regime ($\alpha \to \tfrac{1}{2}$), and recovers the strongest bound in the fast transitions regime ($q \to \tfrac{1}{2}$). For finite values of $(\alpha,q)$ it is numerically demonstrated that the bound is reasonably close to the true value of $\bar{H}(Y)$, which can be estimated to an arbitrary precision by various known approximation algorithms.

We also derive a lower bound on $\bar{H}(Y)$ for the case where the process $\{X_n\}$ is a nonsymmetric binary Markov process. For the special case of Markov processes that satisfy the so-called $(1,\infty)$-RLL constraint, our bound is shown to be tight in the very noisy regime.
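As a point of reference for what follows, the baseline MGL bound (2) is simple to evaluate. The Python sketch below is ours, not part of the paper; the helper names and the parameter values are only illustrative.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard the log at the endpoints
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def star(a, b):
    """Binary convolution a * b = a(1 - b) + b(1 - a)."""
    return a * (1 - b) + b * (1 - a)

# MGL lower bound (2) for the symmetric case: Hbar(Y) >= h(alpha * q).
alpha, q = 0.11, 0.11                   # illustrative values, also used in Figure 2
print(binary_entropy(star(alpha, q)))   # approximately 0.71
```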
II. PRELIMINARIES

Let $\mathbf{X} = (X_1,\ldots,X_n)$ be a binary $n$-dimensional random vector, $[n] \triangleq \{1,\ldots,n\}$, and $S \subseteq [n]$ some subset of coordinates. The projection of $\mathbf{X}$ onto $S$ is defined as

$$X_S \triangleq \{X_i : i \in S\}.$$

As before, we assume that $\mathbf{Y}$ is the output of a BSC with crossover probability $\alpha$, whose input is the vector $\mathbf{X}$. Samorodnitsky has proved the following result.

Theorem 1 ([2, Theorem 1.11]): Let $\lambda = (1-2\alpha)^2$ and let $B$ be a random subset uniformly distributed over all subsets of $[n]$ with cardinality $\lceil \lambda n \rceil$. Then

$$H(\mathbf{Y}) \ge n\, h\left(\alpha * h^{-1}\left(\frac{H(X_B|B)}{\lambda n}\right)\right) - E, \qquad (3)$$

where $E = O\left(\sqrt{\tfrac{\log n}{n}}\right) \cdot (n - H(\mathbf{X}))$.

By Han's inequality [3], the quantity $H(X_B|B)/\lambda n$ is monotonically nonincreasing in $\lambda$, and therefore, ignoring the error term $E$, it can be seen that the bound (3) is stronger than (1).

For our purposes, it will be convenient to replace $H(X_B|B)$ by $H(X_S|S)$, where $S$ is a random subset of $[n]$ generated by independently sampling each element $i$ with probability $\lambda$. It is easy to verify that for any distribution $P_{\mathbf{X}}$ on $\{0,1\}^n$ it holds that

$$\lim_{n\to\infty} \frac{H(X_B|B) - H(X_S|S)}{\lambda n} = 0,$$

and we can therefore indeed replace $B$ with $S$ in Theorem 1, perhaps with a different convergence rate for $E$. In fact, Polyanskiy and Wu [4] distilled from [2] the inequality

$$I(U; \mathbf{Y}) \le I(U; X_S|S), \qquad (4)$$

which holds for any random variable $U$ satisfying the Markov relation $U \to \mathbf{X} \to \mathbf{Y}$. Using (4), the chain rule of entropy, and the convexity of the MGL function $\varphi(t) \triangleq h(\alpha * h^{-1}(t))$, it is a simple exercise to prove the following form of Theorem 1.

Proposition 1: Let $\lambda = (1-2\alpha)^2$ and let $S$ be a random subset of $[n]$ generated by independently sampling each element $i$ with probability $\lambda$. Then

$$H(\mathbf{Y}) \ge n\, h\left(\alpha * h^{-1}\left(\frac{H(X_S|S)}{\lambda n}\right)\right). \qquad (5)$$

III. MAIN RESULT

In order to apply Proposition 1 for lower bounding $\bar{H}(Y)$, we need to evaluate the quantity $H(X_S|S)/\lambda n$ for symmetric Markov processes $\{X_n\}$. We will use the notation

$$q^{*k} \triangleq \underbrace{q * q * \cdots * q}_{k\ \text{times}},$$

and note that

$$q^{*k} = \Pr(W_1 \oplus \cdots \oplus W_k = 1) = \frac{1 - (1-2q)^k}{2}. \qquad (6)$$

Proposition 2: Let $0 < \lambda < 1$, and let $S$ be a random subset of $[n]$ generated by independently sampling each element $i$ with probability $\lambda$. Then

$$\lim_{n\to\infty} \frac{H(X_S|S)}{\lambda n} = \mathbb{E}\, h\left(q^{*G}\right),$$

where $G$ is a geometric random variable with parameter $\lambda$, i.e., $\Pr(G = g) = (1-\lambda)^{g-1}\lambda$ for $g = 1,2,\ldots$.

Proof: Let $G_i$, $i = 1,2,\ldots$, be a sequence of i.i.d. geometric random variables with parameter $\lambda$. Define the autoregressive process

$$A_k = \sum_{i=1}^{k} G_i, \quad k = 1,2,\ldots,$$

and define the random variable $K$ as the largest $k$ for which $A_k \le n$. Clearly, the subset $S$ and the subset $\{A_1,\ldots,A_K\}$ have the same distribution, and therefore

$$\frac{H(X_S|S)}{n} = \frac{1}{n}\, \mathbb{E}\left(\sum_{i=1}^{K} H(X_{A_i}|X_{A_{i-1}},\ldots,X_{A_1})\right)$$
$$= \frac{1}{n}\, \mathbb{E}\left(\sum_{i=1}^{n} H(X_{A_i}|X_{A_{i-1}},\ldots,X_{A_1})\,\mathbb{1}(i \le K)\right)$$
$$= \frac{1}{n}\, \mathbb{E}\left(\sum_{i=1}^{n} H(X_{A_i}|X_{A_{i-1}})\,\mathbb{1}(i \le K)\right) \qquad (7)$$
$$= \frac{1}{n}\, \mathbb{E}\left(\mathbb{1}(1 \le K) + \sum_{i=2}^{n} H(X_{A_i - A_{i-1} + 1}|X_1)\,\mathbb{1}(i \le K)\right) \qquad (8)$$
$$= \frac{1}{n} \left(\Pr(K \ge 1) + \sum_{i=2}^{n} \mathbb{E}\big(H(X_{G_i+1}|X_1)\,\mathbb{1}(i \le K)\big)\right), \qquad (9)$$

where $\mathbb{1}(T)$ is an indicator on the event $T$, (7) follows since $\{X_n\}$ is a first-order Markov process, and (8) follows from the stationarity of $\{X_n\}$. For any $2 \le i \le n$ we have

$$\mathbb{E}\big(H(X_{G_i+1}|X_1)\,\mathbb{1}(i \le K)\big) = \mathbb{E}_{G_i}\big(H(X_{G_i+1}|X_1)\Pr(K \ge i \,|\, G_i)\big) = \mathbb{E}_{G_i}\big(H(X_{G_i+1}|X_1)\Pr(\mathrm{Binomial}(n - G_i, \lambda) \ge i - 1)\big).$$

By the law of large numbers, for any $\epsilon > 0$ and fixed $g_i$, there exists some $N_0$ such that for all $n > N_0$

$$\Pr(\mathrm{Binomial}(n - g_i, \lambda) \ge i - 1) \in \begin{cases} [1-\epsilon, 1) & i \le (\lambda - \epsilon)n \\ (0, \epsilon] & i \ge (\lambda + \epsilon)n. \end{cases}$$

Combining this with (9) gives

$$\lim_{n\to\infty} \frac{1}{n} H(X_S|S) = \lambda\, \mathbb{E}\, H(X_{G+1}|X_1) = \lambda\, H(X_{G+1}|X_1, G), \qquad (10)$$

and our claim follows since $H(X_{G+1}|X_1, G) = \mathbb{E}\, h(q^{*G})$ for stationary symmetric Markov processes.

Remark 1: An expression similar to (10) can also be recovered from [5, Corollary II.2].

Our main result now follows directly from combining Propositions 1 and 2 and using the continuity of the MGL function $\varphi(t)$.
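Numerically, the quantity $\mathbb{E}\, h(q^{*G})$ appearing in Proposition 2 can be approximated by truncating the geometric sum. The following minimal sketch is ours (it continues the helpers `binary_entropy` and `star` defined above); the neglected tail is at most $(1-\lambda)^{k_{\max}}$, so the truncation depth can be chosen to any desired accuracy.

```python
import numpy as np

def q_star_k(q, k):
    """k-fold binary convolution of q with itself, via (6)."""
    return (1 - (1 - 2 * q) ** k) / 2

def expected_entropy(q, lam, kmax=5000):
    """Truncated evaluation of E h(q^{*G}) with G ~ Geometric(lam)."""
    ks = np.arange(1, kmax + 1)
    pmf = (1 - lam) ** (ks - 1) * lam          # Pr(G = k)
    return float(np.sum(pmf * binary_entropy(q_star_k(q, ks))))
```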
Theorem 2: The entropy rate of the process obtained by passing a symmetric binary Markov process with transition probability $q$ through a BSC with crossover probability $\alpha$ satisfies

$$\bar{H}(Y) \ge h\left(\alpha * h^{-1}\left(\mathbb{E}\, h\left(q^{*G}\right)\right)\right), \qquad (11)$$

where $G$ is a geometric random variable with parameter $\lambda = (1-2\alpha)^2$.

IV. ASYMPTOTIC ANALYSIS AND NUMERICAL EXAMPLES

In this section we evaluate the bound from Theorem 2 in the limit of $\alpha \to \tfrac{1}{2}$ with $q$ fixed (very noisy regime), and in the limit of $q \to \tfrac{1}{2}$ with $\alpha$ fixed (fast transitions regime).

Theorem 3: Let $q$ be fixed and $\alpha = \tfrac{1}{2} - \epsilon$. Then

$$\bar{H}(Y) \ge 1 - 16\epsilon^4 \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)} \cdot \frac{(1-2q)^{2k}}{1-(1-2q)^{2k}} + o(\epsilon^4).$$

Proof: By, e.g., [6, Lemma 1], for any $0 \le \alpha, \gamma \le 1$ it holds that

$$h(\alpha * \gamma) \ge h(\gamma) + (1-h(\gamma)) \cdot 4\alpha(1-\alpha) = 1 - (1-2\alpha)^2(1-h(\gamma)). \qquad (12)$$

Setting $\beta \triangleq \mathbb{E}\, h(q^{*G})$, the RHS of (11) reads $h(\alpha * h^{-1}(\beta))$, which, by (12) and the parametrization $\alpha = \tfrac{1}{2} - \epsilon$, can be bounded as

$$h\left(\alpha * h^{-1}(\beta)\right) \ge 1 - 4\epsilon^2(1-\beta).$$

It therefore remains to approximate $\beta$. Recall the Taylor expansion of the binary entropy function

$$h\left(\tfrac{1}{2} - p\right) = 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)} (2p)^{2k}. \qquad (13)$$

Using (6), we have

$$\beta = \mathbb{E}\, h\left(\frac{1 - (1-2q)^G}{2}\right) = 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)}\, \mathbb{E}(1-2q)^{2kG} \qquad (14)$$
$$= 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)} \cdot \frac{\lambda(1-2q)^{2k}}{1-(1-\lambda)(1-2q)^{2k}}, \qquad (15)$$

where (14) is justified since the sum $\sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)}\, \mathbb{E}(1-2q)^{2kG}$ converges, and in (15) we have used the fact that $\mathbb{E}\, t^G = \frac{\lambda t}{1-(1-\lambda)t}$. To further approximate (15), we write

$$\frac{\lambda(1-2q)^{2k}}{1-(1-\lambda)(1-2q)^{2k}} = \frac{\lambda(1-2q)^{2k}}{1-(1-2q)^{2k}} \cdot \frac{1}{1 + \frac{\lambda(1-2q)^{2k}}{1-(1-2q)^{2k}}} = \frac{\lambda(1-2q)^{2k}}{1-(1-2q)^{2k}} \cdot (1 + O(\lambda)) = 4\epsilon^2\, \frac{(1-2q)^{2k}}{1-(1-2q)^{2k}} + O(\epsilon^4).$$

Consequently,

$$\beta = 1 - 4\epsilon^2 \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)} \cdot \frac{(1-2q)^{2k}}{1-(1-2q)^{2k}} + O(\epsilon^4),$$

which yields the desired result.

To date, the best known upper and lower bounds on $\bar{H}(Y)$ in the very noisy regime were the ones found in [7, Theorem 4.13]. In particular, the ratio $(1-\bar{H}(Y))/\epsilon^4$ was bounded from above and below. The upper and lower bounds from [7, Theorem 4.13] on $(1-\bar{H}(Y))/\epsilon^4$ are plotted in Figure 1, along with the upper bound from Theorem 3. It is seen that Theorem 3 improves upon the best known lower bounds on $\bar{H}(Y)$ in the limit of $\alpha \to \tfrac{1}{2}$. Furthermore, unlike [7, Theorem 4.13], which only holds for $q \ge \tfrac{1}{4}$, our result holds for all $q$.

[Fig. 1. Comparison between the bounds from [7, Theorem 4.13] on $\bar{H}(Y)$ in the very noisy regime and the new bound from Theorem 3. The curves show $(1-\bar{H}(Y))/\epsilon^4$ as a function of $q$; legend: OW upper bound, OW lower bound, Theorem 3.]

Next, we move to show that the lower bound from Theorem 2 is tight in the extreme regime of fast transitions, i.e., $q \to \tfrac{1}{2}$ and $\alpha$ fixed. Let $q = \tfrac{1}{2} - \epsilon$. With this parametrization, (15) reads

$$\beta = 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)} \cdot \frac{\lambda(2\epsilon)^{2k}}{1-(1-\lambda)(2\epsilon)^{2k}} \qquad (16)$$
$$= 1 - 2\log(e)\lambda\epsilon^2 + O(\epsilon^4).$$

Now, using (12) and Theorem 2, we have the following proposition.

Proposition 3: Let $\alpha$ be fixed and $q = \tfrac{1}{2} - \epsilon$. Then

$$1 - \bar{H}(Y) \le \lambda(1-\beta) = 2\log(e)\lambda^2\epsilon^2 + O(\epsilon^4) = 2\log(e)(1-2\alpha)^4\epsilon^2 + O(\epsilon^4).$$

In [7, Theorem 4.12] it was proved that for $0 \le \alpha \le \tfrac{1}{2}$ and $q = \tfrac{1}{2} - \epsilon$, as $\epsilon \downarrow 0$,

$$1 - \bar{H}(Y) = 2\log(e)(1-2\alpha)^4\epsilon^2 + o(\epsilon^3).$$

It therefore follows that the bound from Theorem 2 becomes tight as $q \to \tfrac{1}{2}$.

It can be shown that in the regimes of very high SNR ($\alpha \to 0$ and $q$ fixed) and rare transitions ($q \to 0$ and $\alpha$ fixed), the bound from Theorem 2 is looser than the bounds found in [7, Theorem 4.11] and in [8], respectively.
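The finite-$(\alpha,q)$ comparisons reported next (Figure 2) only require evaluating the bound (11). A sketch of this evaluation, assuming the helpers defined in the earlier snippets, is given below; $h^{-1}$ on $[0,\tfrac{1}{2}]$ is computed by bisection, since $h$ is strictly increasing there.

```python
def h_inv(y, tol=1e-12):
    """Inverse of the binary entropy restricted to [0, 1/2], by bisection."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def theorem2_bound(alpha, q):
    """Lower bound (11): h(alpha * h^{-1}(E h(q^{*G}))), lam = (1-2 alpha)^2."""
    lam = (1 - 2 * alpha) ** 2
    beta = expected_entropy(q, lam)    # quantity from Proposition 2
    return binary_entropy(star(alpha, h_inv(beta)))

# One illustrative point of the comparison in Figure 2:
print(theorem2_bound(alpha=0.11, q=0.11))
```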
[Fig. 2. Comparison between the lower bound (11), the MGL based bound (2), and the approximate value of $\bar{H}(Y)$ computed using [7, Algorithm 4.25]. Panel (a): $\alpha = 0.11$ fixed, $\bar{H}(Y)$ versus $q$; panel (b): $q = 0.11$ fixed, $\bar{H}(Y)$ versus $\alpha$.]

For any pair of finite values of $(\alpha,q)$, the entropy rate $\bar{H}(Y)$ can be approximated to an arbitrary precision. For example, [3, Theorem 4.5.1] shows that

$$H(Y_n|Y_{n-1},\ldots,Y_1,X_1) \le \bar{H}(Y) \le H(Y_n|Y_{n-1},\ldots,Y_1),$$

and the two bounds converge to the same limit as $n \to \infty$. Unfortunately, the computational complexity of the lower bound (as well as the upper bound) above is exponential in $n$ (although, as shown by Birch [9], the gap between the two bounds also decreases exponentially, but possibly with a small exponent, in $n$). To that end, various works introduced different algorithms for approximating $\bar{H}(Y)$ [7], [10]–[12], each algorithm exhibiting a different trade-off between approximation accuracy and complexity.

To obtain a better appreciation of the tightness of the bound from Theorem 2 for finite values of $(\alpha,q)$, we numerically compare it to the output of one such approximation algorithm. In particular, we use [7, Algorithm 4.25] to approximate $\bar{H}(Y)$, where the algorithm parameters are chosen to ensure high enough accuracy, and plot the results alongside the lower bound of Theorem 2. We also plot the lower bound (2) obtained by simply applying Mrs. Gerber's Lemma. The results for fixed $\alpha = 0.11$ and varying $q$ are shown in Figure 2a, and those for fixed $q = 0.11$ and varying $\alpha$, in Figure 2b.

V. NONSYMMETRIC MARKOV CHAINS

In this section we extend our lower bound from Theorem 2 to the case where the input to the BSC is a nonsymmetric Markov process. Let

$$P = \begin{bmatrix} 1-q_{01} & q_{01} \\ q_{10} & 1-q_{10} \end{bmatrix}$$

be a transition probability matrix, and $\pi = [\pi_0\ \pi_1]$ be a stationary distribution for $P$, such that $\pi P = \pi$. Let $\{X_n\}$ be a stationary first-order Markov process with transition probability matrix $P$, such that $X_1 \sim \mathrm{Bernoulli}(\pi_1)$ and for $n = 2,3,\ldots$ it holds that $\Pr(X_n = j|X_{n-1} = i) = P_{ij}$. For $k = 1,2,\ldots$, we define the quantities

$$q_{ij}^{\#k} \triangleq \left(P^k\right)_{ij} = \Pr(X_n = j|X_{n-k} = i).$$

The following is an extension of Proposition 2 for nonsymmetric hidden Markov processes.

Proposition 4: Let $\{X_n\}$ be a stationary first-order Markov process with transition probability matrix $P$ and stationary distribution $\pi$. Let $0 < \lambda < 1$, and let $S$ be a random subset of $[n]$ generated by independently sampling each element $i$ with probability $\lambda$. Then

$$\lim_{n\to\infty} \frac{H(X_S|S)}{\lambda n} = \pi_0\, \mathbb{E}\, h\left(q_{01}^{\#G}\right) + \pi_1\, \mathbb{E}\, h\left(q_{10}^{\#G}\right), \qquad (17)$$

where $G$ is a geometric random variable with parameter $\lambda$, i.e., $\Pr(G = g) = (1-\lambda)^{g-1}\lambda$ for $g = 1,2,\ldots$.

Proof: The proof is similar to that of Proposition 2 up to equation (10), where now $H(X_{G+1}|X_1, G) = \pi_0\, \mathbb{E}\, h\left(q_{01}^{\#G}\right) + \pi_1\, \mathbb{E}\, h\left(q_{10}^{\#G}\right)$.

Combining this with Proposition 1 and the continuity of the MGL function $\varphi(t)$ gives the following.

Theorem 4: The entropy rate of the process obtained by passing a stationary first-order Markov process with transition probability matrix $P$ and stationary distribution $\pi$ through a BSC with crossover probability $\alpha$ satisfies

$$\bar{H}(Y) \ge h\left(\alpha * h^{-1}\left(\pi_0\, \mathbb{E}\, h\left(q_{01}^{\#G}\right) + \pi_1\, \mathbb{E}\, h\left(q_{10}^{\#G}\right)\right)\right),$$

where $G$ is a geometric random variable with parameter $\lambda = (1-2\alpha)^2$.
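Theorem 4 is likewise easy to evaluate numerically via matrix powers, using $q_{ij}^{\#k} = (P^k)_{ij}$. The sketch below is ours (it reuses `binary_entropy`, `star`, and `h_inv` from the earlier snippets; the truncation depth is illustrative).

```python
import numpy as np

def theorem4_bound(q01, q10, alpha, gmax=5000):
    """Numerical sketch of the Theorem 4 bound for a general two-state chain."""
    P = np.array([[1 - q01, q01], [q10, 1 - q10]])
    pi0, pi1 = q10 / (q01 + q10), q01 / (q01 + q10)   # stationary: pi P = pi
    lam = (1 - 2 * alpha) ** 2
    beta, Pg = 0.0, np.eye(2)
    for g in range(1, gmax + 1):
        Pg = Pg @ P                                    # Pg = P^g
        w = (1 - lam) ** (g - 1) * lam                 # Pr(G = g)
        beta += w * (pi0 * binary_entropy(Pg[0, 1]) + pi1 * binary_entropy(Pg[1, 0]))
    return binary_entropy(star(alpha, h_inv(beta)))
```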
A. Example: Processes Satisfying the $(1,\infty)$-RLL Constraint

In this subsection we lower bound the entropy rate of a nonsymmetric first-order Markov process, with $q_{01} = q$ and $q_{10} = 1$, passed through a BSC with crossover probability $\alpha$. This underlying Markov process satisfies the so-called $(1,\infty)$-RLL constraint, where no consecutive ones are allowed to appear in a sequence. It is not difficult to verify that for this choice of $q_{01}$ and $q_{10}$ we have

$$\pi_0 = \frac{1}{1+q}, \quad \pi_1 = \frac{q}{1+q}, \quad q_{01}^{\#k} = \frac{q + (-q)^{k+1}}{1+q}, \quad q_{10}^{\#k} = \frac{1 - (-q)^k}{1+q}.$$

In this case we have $\bar{H}(Y) \ge h(\alpha * h^{-1}(\beta))$, where

$$\beta \triangleq \mathbb{E}\left(\frac{1}{1+q}\, h\left(\frac{1 - (-q)^{G+1}}{1+q}\right) + \frac{q}{1+q}\, h\left(\frac{1 - (-q)^{G}}{1+q}\right)\right). \qquad (18)$$

By the concavity of $h(\cdot)$, for any natural number $g$ the following hold:

$$h\left(\frac{1 - (-q)^{g+1}}{1+q}\right) = h\left(\frac{(1-q^g) + q^g\left(1 - q(-1)^{g+1}\right)}{1+q}\right) \ge (1-q^g)\, h\left(\frac{1}{1+q}\right) + q^g\, h\left(\frac{1 - q(-1)^{g+1}}{1+q}\right) \qquad (19)$$

and

$$h\left(\frac{1 - (-q)^{g}}{1+q}\right) = h\left(\frac{(1-q^{g-1}) + q^{g-1}\left(1 - q(-1)^{g}\right)}{1+q}\right) \ge (1-q^{g-1})\, h\left(\frac{1}{1+q}\right) + q^{g-1}\, h\left(\frac{1 - q(-1)^{g}}{1+q}\right). \qquad (20)$$

Substituting (19) and (20) into (18), and noting that for every $g$ exactly one of the arguments $\frac{1-q(-1)^{g+1}}{1+q}$ and $\frac{1-q(-1)^{g}}{1+q}$ equals $1$ (where $h(1) = 0$) while the other equals $\frac{1-q}{1+q}$, we obtain

$$\beta \ge \left(1 - \frac{2\,\mathbb{E}(q^G)}{1+q}\right) h\left(\frac{1}{1+q}\right) + \frac{\mathbb{E}(q^G)}{1+q}\, h\left(\frac{1-q}{1+q}\right). \qquad (21)$$

Now, using again the fact that $\mathbb{E}(q^G) = \frac{\lambda q}{1-(1-\lambda)q}$, and invoking Theorem 4, gives the following result.

Theorem 5: The entropy rate of a nonsymmetric stationary binary first-order Markov process with transition probabilities $q_{01} = q$ and $q_{10} = 1$, passed through a BSC with crossover probability $\alpha$, is lower bounded as $\bar{H}(Y) \ge h\left(\alpha * h^{-1}(\gamma)\right)$, where

$$\gamma \triangleq h\left(\frac{1}{1+q}\right) - \frac{(1-2\alpha)^2 q}{(1+q)(1-4\alpha(1-\alpha)q)}\left(2h\left(\frac{1}{1+q}\right) - h\left(\frac{1-q}{1+q}\right)\right).$$

The following corollary of Theorem 5 shows that our bound becomes tight as $\alpha \to \tfrac{1}{2}$, and partially recovers the results of [13, Section 4.2] and [14, Appendix E].

Corollary 1: For the very noisy regime, where $\alpha = \tfrac{1}{2} - \epsilon$ and $0 \le q < 1$, we have

$$\bar{H}(Y) = 1 - 2\log(e)\left(\frac{1-q}{1+q}\right)^2 \epsilon^2 + O(\epsilon^4).$$

Proof: Clearly, $\bar{H}(Y) \le H(Y_n) = h(\alpha * \pi_1)$. From [6, equation (11)] combined with (13) we have

$$h(\alpha * \pi_1) = 1 - \sum_{k=1}^{\infty} \frac{\log(e)}{2k(2k-1)}\left(2\epsilon(1-2\pi_1)\right)^{2k} \qquad (22)$$
$$= 1 - 2\log(e)(1-2\pi_1)^2\epsilon^2 + O(\epsilon^4),$$

which establishes our upper bound, since $1 - 2\pi_1 = \frac{1-q}{1+q}$. For the lower bound, note that $\gamma \ge h(\pi_1) - c\epsilon^2$ for some universal constant $c > 0$. From the concavity of $h(\cdot)$ we have that for all $0 \le x \le 1$ it holds that $h(x) \le h(\pi_1) + h'(\pi_1)(x - \pi_1)$. Thus, using the monotonicity of $h^{-1}(\cdot)$ and the fact that $h'(\pi_1) > 0$ for all $0 \le q < 1$, we have

$$h^{-1}(\gamma) \ge h^{-1}\left(h(\pi_1) - c\epsilon^2\right) \ge \pi_1 - \frac{c}{h'(\pi_1)}\epsilon^2. \qquad (23)$$

Now, by Theorem 5, (23) and (22), we have

$$\bar{H}(Y) \ge h\left(\alpha * h^{-1}(\gamma)\right) \ge h\left(\alpha * \left(\pi_1 - \frac{c\epsilon^2}{h'(\pi_1)}\right)\right) = 1 - 2\log(e)(1-2\pi_1)^2\epsilon^2 + O(\epsilon^4).$$
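Since $\gamma$ is given in closed form, the Theorem 5 bound is immediate to evaluate. The sketch below is ours (it reuses `binary_entropy`, `star`, and `h_inv` from the earlier snippets); the numerical check of Corollary 1 is only illustrative, and the two printed values should agree to leading order in $\epsilon^2$.

```python
import numpy as np

def theorem5_bound(q, alpha):
    """Closed-form bound of Theorem 5 for the (1,inf)-RLL chain (q01=q, q10=1)."""
    h = binary_entropy
    lam = (1 - 2 * alpha) ** 2
    coef = lam * q / ((1 + q) * (1 - 4 * alpha * (1 - alpha) * q))
    gamma = h(1 / (1 + q)) - coef * (2 * h(1 / (1 + q)) - h((1 - q) / (1 + q)))
    return h(star(alpha, h_inv(gamma)))

# Illustrative check of Corollary 1: for alpha = 1/2 - eps with small eps,
# 1 - theorem5_bound(q, alpha) ~ 2 log2(e) ((1-q)/(1+q))^2 eps^2.
eps, q = 1e-3, 0.3
print(1 - theorem5_bound(q, 0.5 - eps))
print(2 * np.log2(np.e) * ((1 - q) / (1 + q)) ** 2 * eps ** 2)
```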
ACKNOWLEDGMENT

The author is grateful to Alex Samorodnitsky for many valuable discussions and observations, and to Yury Polyanskiy and Yihong Wu for sharing an early draft of [4].

REFERENCES

[1] A. Wyner and J. Ziv, "A theorem on the entropy of certain binary sequences and applications–I," IEEE Trans. Inf. Theory, vol. 19, no. 6, pp. 769–772, Nov. 1973.
[2] A. Samorodnitsky, "On the entropy of a noisy function," 2015, arXiv preprint, available online: http://arxiv.org/abs/1508.01464v3.pdf
[3] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006.
[4] Y. Polyanskiy and Y. Wu, "Strong data-processing inequalities for channels and Bayesian networks," 2016, arXiv preprint, available online: http://arxiv.org/abs/1508.06025
[5] Y. Li and G. Han, "Input-constrained erasure channels: Mutual information and capacity," in ISIT, 2014, pp. 3072–3076.
[6] O. Ordentlich and O. Shayevitz, "Minimum MS. E. Gerber's lemma," IEEE Trans. Inf. Theory, vol. 61, no. 11, pp. 5883–5891, 2015.
[7] E. Ordentlich and T. Weissman, "Bounds on the entropy rate of binary hidden Markov processes," in Entropy of Hidden Markov Processes and Connections to Dynamical Systems, pp. 117–171, 2011.
[8] Y. Peres and A. Quas, "Entropy rate for hidden Markov chains with rare transitions," in Entropy of Hidden Markov Processes and Connections to Dynamical Systems, pp. 172–178, 2011.
[9] J. J. Birch, "Approximations for the entropy for functions of Markov chains," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 930–938, 1962.
[10] H. D. Pfister, "On the capacity of finite state channels and the analysis of convolutional accumulate-m codes," Ph.D. dissertation, University of California, San Diego, 2003.
[11] P. Jacquet, G. Seroussi, and W. Szpankowski, "On the entropy of a hidden Markov process," in DCC, 2004, pp. 362–371.
[12] J. Luo and D. Guo, "On the entropy rate of hidden Markov processes observed through arbitrary memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1460–1467, 2009.
[13] H. D. Pfister, "The capacity of finite-state channels in the high-noise regime," in Entropy of Hidden Markov Processes and Connections to Dynamical Systems, pp. 179–222, 2011.
[14] G. Han and B. Marcus, "Derivatives of entropy rate in special families of hidden Markov chains," IEEE Trans. Inf. Theory, vol. 53, no. 7, pp. 2642–2652, July 2007.
