Novel Lower Bounds on the Entropy Rate of Binary Hidden Markov Processes

Or Ordentlich
MIT
[email protected]

(The work of O. Ordentlich was supported by the MIT-Technion Postdoctoral Fellowship.)

Abstract—Recently, Samorodnitsky proved a strengthened version of Mrs. Gerber's Lemma, where the output entropy of a binary symmetric channel is bounded in terms of the average entropy of the input projected on a random subset of coordinates. Here, this result is applied for deriving novel lower bounds on the entropy rate of binary hidden Markov processes. For symmetric underlying Markov processes, our bound improves upon the best known bound in the very noisy regime. The nonsymmetric case is also considered, and explicit bounds are derived for Markov processes that satisfy the (1,∞)-RLL constraint.

I. INTRODUCTION

Let {X_n}, n = 1,2,..., be a symmetric stationary binary Markov process with transition probability 0 < q < 1/2, such that X_1 ~ Bernoulli(1/2) and for any n > 1

    X_n = X_{n−1} ⊕ W_n,

where {W_n}, n = 2,3,..., is a sequence of i.i.d. Bernoulli(q) random variables, statistically independent of X_1. We consider the hidden Markov process {Y_n}, n = 1,2,..., obtained at the output of a binary symmetric channel (BSC) with crossover probability 0 < α < 1/2, whose input is the process {X_n}. Namely,

    Y_n = X_n ⊕ Z_n,

where {Z_n}, n = 1,2,..., is a sequence of i.i.d. Bernoulli(α) random variables, statistically independent of {X_n}. The task of finding an explicit form for the entropy rate

    H̄(Y) ≜ lim_{n→∞} H(Y_1,...,Y_n)/n

of the process {Y_n} is a long-standing open problem, and the main contribution of this paper is in providing novel lower bounds for this quantity.

A simple lower bound on H̄(Y) can be obtained by invoking Mrs. Gerber's Lemma (MGL) [1], which states that if {X_n} is the input to a BSC with crossover probability α, and {Y_n} is the output, then

    H(Y_1,...,Y_n) ≥ n h(α ∗ h^{−1}(H(X_1,...,X_n)/n)),    (1)

where h(p) ≜ −p log(p) − (1−p) log(1−p) is the binary entropy function, h^{−1}(·) is its inverse restricted to the interval [0, 1/2], and a ∗ b ≜ a(1−b) + b(1−a). Here, as well as throughout the rest of the paper, logarithms are taken to base 2. Since the entropy rate of the symmetric Markov process {X_n} is H̄(X) = h(q), for symmetric hidden Markov processes the bound (1) takes the simple form

    H̄(Y) ≥ h(α ∗ q).    (2)

Unfortunately, this bound is quite loose for many regimes of the process parameters α and q.

Recently, Samorodnitsky [2] proved a strengthened version of MGL, where the normalized input entropy H(X_1,...,X_n)/n in the right hand side of (1) is replaced by the average normalized entropy of the random vector (X_1,...,X_n) projected on a random subset of coordinates. In this paper we apply the results of [2] to derive a novel lower bound on H̄(Y). Despite its simplicity, we show that this bound is stronger than the best known lower bounds in the very noisy regime (α → 1/2), and recovers the strongest known bound in the fast transitions regime (q → 1/2). For finite values of (α,q) it is numerically demonstrated that the bound is reasonably close to the true value of H̄(Y), which can be estimated to an arbitrary precision by various known approximation algorithms.

We also derive a lower bound on H̄(Y) for the case where the process {X_n} is a nonsymmetric binary Markov process. For the special case of Markov processes that satisfy the so-called (1,∞)-RLL constraint, our bound is shown to be tight in the very noisy regime.
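All of the bounds derived in this paper are elementary to evaluate numerically. As a running illustration, the following Python sketch computes the baseline MGL bound (2); the code and its names (h, conv, mgl_bound) are our own illustrative choices and are not part of the paper.

    import numpy as np

    def h(p):
        """Binary entropy in bits; h(0) = h(1) = 0 by convention."""
        p = np.clip(np.asarray(p, dtype=float), 1e-15, 1 - 1e-15)
        return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

    def conv(a, b):
        """Binary convolution: a * b = a(1 - b) + b(1 - a)."""
        return a * (1 - b) + b * (1 - a)

    def mgl_bound(alpha, q):
        """Baseline MGL lower bound (2): H_bar(Y) >= h(alpha * q)."""
        return h(conv(alpha, q))

    # Example with the parameter pair used later in Figure 2:
    print(mgl_bound(0.11, 0.11))  # ~ 0.71 bits

These helpers (h and conv) are reused in the sketches accompanying the later sections.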
II. PRELIMINARIES

Let X = (X_1,...,X_n) be a binary n-dimensional random vector, [n] ≜ {1,...,n}, and S ⊆ [n] some subset of coordinates. The projection of X onto S is defined as

    X_S ≜ {X_i : i ∈ S}.

As before, we assume that Y is the output of a BSC with crossover probability α, whose input is the vector X. Samorodnitsky has proved the following result.

Theorem 1 ([2, Theorem 1.11]): Let λ = (1−2α)^2 and let B be a random subset uniformly distributed over all subsets of [n] with cardinality ⌈λn⌉. Then

    H(Y) ≥ n h(α ∗ h^{−1}(H(X_B|B)/λn)) − E,    (3)

where E = O(√(log n / n))·(n − H(X)).

By Han's inequality [3], the quantity H(X_B|B)/λn is monotonically nonincreasing in λ, and therefore, ignoring the error term E, it can be seen that the bound (3) is stronger than (1).

For our purposes, it will be convenient to replace H(X_B|B) with H(X_S|S), where S is a random subset of [n] generated by independently sampling each element i with probability λ. It is easy to verify that for any distribution P_X on {0,1}^n,

    lim_{n→∞} (H(X_B|B) − H(X_S|S))/λn = 0,

and we can therefore indeed replace B with S in Theorem 1, perhaps with a different convergence rate for E. In fact, Polyanskiy and Wu [4] distilled from [2] the inequality

    I(U;Y) ≤ I(U;X_S|S),    (4)

which holds for any random variable U satisfying the Markov relation U → X → Y. Using (4), the chain rule of entropy, and the convexity of the MGL function φ(t) ≜ h(α ∗ h^{−1}(t)), it is a simple exercise to prove the following form of Theorem 1.

Proposition 1: Let λ = (1−2α)^2 and let S be a random subset of [n] generated by independently sampling each element i with probability λ. Then

    H(Y) ≥ n h(α ∗ h^{−1}(H(X_S|S)/λn)).    (5)

III. MAIN RESULT

In order to apply Proposition 1 for lower bounding H̄(Y), we need to evaluate the quantity H(X_S|S)/λn for symmetric Markov processes {X_n}. We will use the notation

    q^{∗k} ≜ q ∗ q ∗ ··· ∗ q  (k times),

and note that

    q^{∗k} = Pr(W_1 ⊕ ··· ⊕ W_k = 1) = (1 − (1−2q)^k)/2.    (6)

Proposition 2: Let 0 < λ < 1, and let S be a random subset of [n] generated by independently sampling each element i with probability λ. Then

    lim_{n→∞} H(X_S|S)/λn = E[h(q^{∗G})],

where G is a geometric random variable with parameter λ, i.e., Pr(G = g) = (1−λ)^{g−1}λ for g = 1,2,....

Proof: Let G_i, i = 1,2,..., be a sequence of i.i.d. geometric random variables with parameter λ. Define the process

    A_k = Σ_{i=1}^{k} G_i,  k = 1,2,...,

and define the random variable K as the largest k for which A_k ≤ n. Clearly, the subset S and the subset {A_1,...,A_K} have the same distribution, and therefore

    H(X_S|S)/n = (1/n) E[ Σ_{i=1}^{K} H(X_{A_i}|X_{A_{i−1}},...,X_{A_1}) ]
      = (1/n) E[ Σ_{i=1}^{n} H(X_{A_i}|X_{A_{i−1}},...,X_{A_1}) 1(i ≤ K) ]
      = (1/n) E[ Σ_{i=1}^{n} H(X_{A_i}|X_{A_{i−1}}) 1(i ≤ K) ]    (7)
      = (1/n) E[ 1(1 ≤ K) + Σ_{i=2}^{n} H(X_{A_i − A_{i−1} + 1}|X_1) 1(i ≤ K) ]    (8)
      = (1/n) ( Pr(K ≥ 1) + Σ_{i=2}^{n} E[H(X_{G_i+1}|X_1) 1(i ≤ K)] ),    (9)

where 1(T) is an indicator on the event T, (7) follows since {X_n} is a first-order Markov process, and (8) follows from the stationarity of {X_n}, together with the fact that H(X_{A_1}) = H(X_1) = 1, as the stationary distribution of the symmetric chain is uniform. For any 2 ≤ i ≤ n we have

    E[H(X_{G_i+1}|X_1) 1(i ≤ K)]
      = E_{G_i}[ H(X_{G_i+1}|X_1) Pr(K ≥ i | G_i) ]
      = E_{G_i}[ H(X_{G_i+1}|X_1) Pr(Binomial(n − G_i, λ) ≥ i − 1) ].

By the law of large numbers, for any ε > 0 and fixed g_i, there exists some N_0 such that for all n > N_0 holds

    Pr(Binomial(n − g_i, λ) ≥ i − 1) ∈ [1−ε, 1)  if i ≤ (λ−ε)n,
    Pr(Binomial(n − g_i, λ) ≥ i − 1) ∈ (0, ε]    if i ≥ (λ+ε)n.

Combining this with (9) gives

    lim_{n→∞} (1/n) H(X_S|S) = λ E[H(X_{G+1}|X_1)] = λ H(X_{G+1}|X_1, G),    (10)

and our claim follows since H(X_{G+1}|X_1, G) = E[h(q^{∗G})] for stationary symmetric Markov processes.    □

Remark 1: An expression similar to (10) can also be recovered from [5, Corollary II.2].

Our main result now follows directly from combining Propositions 1 and 2 and using the continuity of the MGL function φ(t).

Theorem 2: The entropy rate of the process obtained by passing a symmetric binary Markov process with transition probability q through a BSC with crossover probability α satisfies

    H̄(Y) ≥ h(α ∗ h^{−1}(E[h(q^{∗G})])),    (11)

where G is a geometric random variable with parameter λ = (1−2α)^2.
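The bound (11) is straightforward to evaluate: the geometric expectation E[h(q^{∗G})] can be truncated, and since every term is nonnegative and the map t ↦ h(α ∗ h^{−1}(t)) is nondecreasing, the truncated value is still a valid lower bound on H̄(Y). A minimal sketch, reusing h and conv from the earlier snippet; the helper h_inv and the truncation depth gmax are our own choices.

    def h_inv(y, tol=1e-12):
        """Inverse of the binary entropy restricted to [0, 1/2], via bisection."""
        lo, hi = 0.0, 0.5
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if h(mid) < y:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    def theorem2_bound(alpha, q, gmax=100_000):
        """Lower bound (11): h(alpha * h^{-1}(E[h(q^{*G})])), G ~ Geometric(lambda).

        Truncating the expectation at gmax can only decrease it, so the
        result remains a valid lower bound on H_bar(Y).
        """
        lam = (1 - 2 * alpha) ** 2
        g = np.arange(1, gmax + 1)
        q_star = (1 - (1 - 2 * q) ** g) / 2      # q^{*g}, eq. (6)
        pmf = (1 - lam) ** (g - 1) * lam         # Pr(G = g)
        beta = np.sum(pmf * h(q_star))           # E[h(q^{*G})], truncated
        return h(conv(alpha, h_inv(beta)))

    print(theorem2_bound(0.11, 0.11))

Note that gmax should grow like 1/λ for the neglected geometric tail to be negligible; for α bounded away from 1/2 a few hundred terms already suffice.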
IV. ASYMPTOTIC ANALYSIS AND NUMERICAL EXAMPLES

In this section we evaluate the bound from Theorem 2 in the limit of α → 1/2 with q fixed (very noisy regime), and in the limit of q → 1/2 with α fixed (fast transitions regime).

Theorem 3: Let q be fixed and α = 1/2 − ε. Then

    H̄(Y) ≥ 1 − 16 ε^4 Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] (1−2q)^{2k}/(1 − (1−2q)^{2k}) + o(ε^4).

Proof: By, e.g., [6, Lemma 1], we have that for any 0 ≤ α, γ ≤ 1 holds

    h(α ∗ γ) ≥ h(γ) + (1 − h(γ))·4α(1−α) = 1 − (1−2α)^2 (1 − h(γ)).    (12)

Setting β ≜ E[h(q^{∗G})], the RHS of (11) reads h(α ∗ h^{−1}(β)), which, by (12) and the parametrization α = 1/2 − ε, can be bounded as

    h(α ∗ h^{−1}(β)) ≥ 1 − 4ε^2 (1 − β).

It therefore remains to approximate β. Recall the Taylor expansion of the binary entropy function

    h(1/2 − p) = 1 − Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] (2p)^{2k}.    (13)

Using (6), we have

    β = E[h((1 − (1−2q)^G)/2)]
      = 1 − Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] E[(1−2q)^{2kG}]    (14)
      = 1 − Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] λ(1−2q)^{2k}/(1 − (1−λ)(1−2q)^{2k}),    (15)

where (14) is justified since the sum Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] E[(1−2q)^{2kG}] converges, and in (15) we have used the fact that E[t^G] = λt/(1 − (1−λ)t). To further approximate (15), we write

    λ(1−2q)^{2k}/(1 − (1−λ)(1−2q)^{2k})
      = [λ(1−2q)^{2k}/(1 − (1−2q)^{2k})] · 1/(1 + λ(1−2q)^{2k}/(1 − (1−2q)^{2k}))
      = [λ(1−2q)^{2k}/(1 − (1−2q)^{2k})]·(1 + O(λ))
      = 4ε^2 (1−2q)^{2k}/(1 − (1−2q)^{2k}) + O(ε^4).

Consequently,

    β = 1 − 4ε^2 Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] (1−2q)^{2k}/(1 − (1−2q)^{2k}) + O(ε^4),

which yields the desired result.    □

To date, the best known upper and lower bounds on H̄(Y) in the very noisy regime were the ones found in [7, Theorem 4.13]. In particular, the ratio (1 − H̄(Y))/ε^4 was bounded from above and below. The upper and lower bounds from [7, Theorem 4.13] on (1 − H̄(Y))/ε^4 are plotted in Figure 1, along with the upper bound from Theorem 3. It is seen that Theorem 3 improves upon the best known lower bounds on H̄(Y) in the limit of α → 1/2. Furthermore, unlike [7, Theorem 4.13], which only holds for q ≥ 1/4, our result holds for all q.

[Figure 1: (1 − H̄(Y))/ε^4 versus q ∈ [0.25, 0.5]; curves: OW upper bound, OW lower bound, and Theorem 3.]
Fig. 1. Comparison between the bounds from [7, Theorem 4.13] on H̄(Y) in the very noisy regime, and the new bound from Theorem 3.

Next, we move to show that the lower bound from Theorem 2 is tight in the extreme regime of fast transitions, i.e., q → 1/2 and α fixed. Let q = 1/2 − ε. With this parametrization, (15) reads

    β = 1 − Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] λ(2ε)^{2k}/(1 − (1−λ)(2ε)^{2k})    (16)
      = 1 − 2 log(e) λ ε^2 + O(ε^4).

Now, using (12) and Theorem 2, we have the following proposition.

Proposition 3: Let α be fixed and q = 1/2 − ε. Then

    1 − H̄(Y) ≤ λ(1 − β) = 2 log(e) λ^2 ε^2 + O(ε^4) = 2 log(e) (1−2α)^4 ε^2 + O(ε^4).

In [7, Theorem 4.12] it was proved that for 0 ≤ α ≤ 1/2 and q = 1/2 − ε, as ε ↓ 0,

    1 − H̄(Y) = 2 log(e) (1−2α)^4 ε^2 + o(ε^3).

It therefore follows that the bound from Theorem 2 becomes tight as q → 1/2.

It can be shown that in the regimes of very high SNR (α → 0 and q fixed) and rare transitions (q → 0 and α fixed), the bound from Theorem 2 is looser than the bounds found in [7, Theorem 4.11] and in [8], respectively.

[Figure 2: H̄(Y) versus q for fixed α = 0.11 (panel (a)), and versus α for fixed q = 0.11 (panel (b)); curves: MGL, Lower Bound from Theorem 2, Approximated Value.]
Fig. 2. Comparison between the lower bound (11), the MGL-based bound (2), and the approximate value of H̄(Y) computed using [7, Algorithm 4.25].
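The series appearing in Theorem 3 converges geometrically, since the summand decays like (1−2q)^{2k}, so the coefficient of ε^4 (the curve labeled "Theorem 3" in Figure 1) is simple to evaluate. A short sketch, with the truncation depth kmax our own choice; it assumes 0 < q ≤ 1/2 so that the denominators are nonzero.

    def theorem3_coefficient(q, kmax=500):
        """Upper bound on (1 - H_bar(Y))/eps^4 from Theorem 3:
        16 * sum_k log2(e)/(2k(2k-1)) * r^k/(1 - r^k), with r = (1-2q)^2.
        Assumes 0 < q <= 1/2.
        """
        k = np.arange(1, kmax + 1, dtype=float)
        rk = (1 - 2 * q) ** (2 * k)
        return 16 * np.sum(np.log2(np.e) / (2 * k * (2 * k - 1)) * rk / (1 - rk))

    print(theorem3_coefficient(0.3))  # ~ 2.25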
For any pair of finite values of (α, q), the entropy rate H̄(Y) can be approximated to an arbitrary precision. For example, [3, Theorem 4.5.1] shows that

    H(Y_n|Y_{n−1},...,Y_1,X_1) ≤ H̄(Y) ≤ H(Y_n|Y_{n−1},...,Y_1),

and the two bounds converge to the same limit as n → ∞. Unfortunately, the computational complexity of the lower bound (as well as the upper bound) above is exponential in n (although, as shown by Birch [9], the gap between the two bounds also decreases exponentially, possibly with a small exponent, in n). To that end, various works introduced different algorithms for approximating H̄(Y) [7],[10]–[12], each algorithm exhibiting a different trade-off between approximation accuracy and complexity.

To obtain a better appreciation of the tightness of the bound from Theorem 2 for finite values of (α, q), we numerically compare it to the output of one such approximation algorithm. In particular, we use [7, Algorithm 4.25] to approximate H̄(Y), where the algorithm parameters are chosen to ensure high enough accuracy, and plot the results alongside the lower bound of Theorem 2. We also plot the lower bound (2) obtained by simply applying Mrs. Gerber's Lemma. The results for fixed α = 0.11 and varying q are shown in Figure 2a, and those for fixed q = 0.11 and varying α, in Figure 2b.

V. NONSYMMETRIC MARKOV CHAINS

In this section we extend our lower bound from Theorem 2 to the case where the input to the BSC is a nonsymmetric Markov process. Let

    P = [ 1−q_{01}    q_{01}   ]
        [ q_{10}      1−q_{10} ]

be a transition probability matrix, and π = [π_0 π_1] be a stationary distribution for P, such that πP = π. Let {X_n} be a stationary first-order Markov process with transition probability matrix P, such that X_1 ~ Bernoulli(π_1) and for n = 2,3,... holds Pr(X_n = j | X_{n−1} = i) = P_{ij}. For k = 1,2,..., we define the quantities

    q_{ij}^{#k} ≜ (P^k)_{ij} = Pr(X_n = j | X_{n−k} = i).

The following is an extension of Proposition 2 for nonsymmetric hidden Markov processes.

Proposition 4: Let {X_n} be a stationary first-order Markov process with transition probability matrix P and stationary distribution π. Let 0 < λ < 1, and let S be a random subset of [n] generated by independently sampling each element i with probability λ. Then

    lim_{n→∞} H(X_S|S)/λn = π_0 E[h(q_{01}^{#G})] + π_1 E[h(q_{10}^{#G})],    (17)

where G is a geometric random variable with parameter λ, i.e., Pr(G = g) = (1−λ)^{g−1}λ for g = 1,2,....

Proof: The proof is similar to that of Proposition 2 up to equation (10), where now H(X_{G+1}|X_1, G) = π_0 E[h(q_{01}^{#G})] + π_1 E[h(q_{10}^{#G})].    □

Combining this with Proposition 1 and the continuity of the MGL function φ(t) gives the following.

Theorem 4: The entropy rate of the process obtained by passing a stationary first-order Markov process with transition probability matrix P and stationary distribution π through a BSC with crossover probability α satisfies

    H̄(Y) ≥ h(α ∗ h^{−1}(π_0 E[h(q_{01}^{#G})] + π_1 E[h(q_{10}^{#G})])),

where G is a geometric random variable with parameter λ = (1−2α)^2.
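The bound of Theorem 4 can be evaluated in the same way as (11), with the powers of P accumulated along the geometric weights. A sketch, again reusing h, conv, and h_inv from the earlier snippets; the truncation depth and stopping rule are our own choices.

    def theorem4_bound(alpha, q01, q10, gmax=100_000):
        """Lower bound of Theorem 4 for a general binary Markov chain."""
        lam = (1 - 2 * alpha) ** 2
        P = np.array([[1 - q01, q01], [q10, 1 - q10]])
        pi0, pi1 = q10 / (q01 + q10), q01 / (q01 + q10)  # solves pi P = pi
        Pg = np.eye(2)                                   # will hold P^g
        beta = 0.0
        for g in range(1, gmax + 1):
            Pg = Pg @ P                                  # q_ij^{#g} = Pg[i, j]
            w = (1 - lam) ** (g - 1) * lam               # Pr(G = g)
            beta += w * (pi0 * h(Pg[0, 1]) + pi1 * h(Pg[1, 0]))
            if (1 - lam) ** g < 1e-15:                   # remaining tail mass is negligible
                break
        return h(conv(alpha, h_inv(beta)))

    # Sanity check: the symmetric chain q01 = q10 = q recovers Theorem 2.
    print(theorem4_bound(0.11, 0.11, 0.11))  # equals theorem2_bound(0.11, 0.11)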
A. Example: Processes Satisfying the (1,∞)-RLL Constraint

In this subsection we lower bound the entropy rate of a nonsymmetric first-order Markov process, with q_{01} = q and q_{10} = 1, passed through a BSC with crossover probability α. This underlying Markov process satisfies the so-called (1,∞)-RLL constraint, where no consecutive ones are allowed to appear in a sequence. It is not difficult to verify that for this choice of q_{01} and q_{10} we have

    π_0 = 1/(1+q),    π_1 = q/(1+q),

    q_{01}^{#k} = (q + (−q)^{k+1})/(1+q),    q_{10}^{#k} = (1 − (−q)^k)/(1+q).

In this case, using the symmetry h(x) = h(1−x) to write h(q_{01}^{#k}) = h((1 − (−q)^{k+1})/(1+q)), Theorem 4 gives H̄(Y) ≥ h(α ∗ h^{−1}(β)), where

    β ≜ E[ (1/(1+q)) h((1 − (−q)^{G+1})/(1+q)) + (q/(1+q)) h((1 − (−q)^G)/(1+q)) ].    (18)

By the concavity of h(·), for any natural number g holds

    h((1 − (−q)^{g+1})/(1+q)) = h( ((1 − q^g)·1 + q^g (1 − q(−1)^{g+1})) / (1+q) )
      ≥ (1 − q^g) h(1/(1+q)) + q^g h((1 − q(−1)^{g+1})/(1+q)),    (19)

and

    h((1 − (−q)^g)/(1+q)) = h( ((1 − q^{g−1})·1 + q^{g−1} (1 − q(−1)^g)) / (1+q) )
      ≥ (1 − q^{g−1}) h(1/(1+q)) + q^{g−1} h((1 − q(−1)^g)/(1+q)).    (20)

Substituting (19) and (20) into (18), and noting that h((1 − q(−1)^g)/(1+q)) equals h((1−q)/(1+q)) for even g and h(1) = 0 for odd g, so that the two expectations combine into a single E[q^G] term, we obtain

    β ≥ (1 − 2E[q^G]/(1+q)) h(1/(1+q)) + (E[q^G]/(1+q)) h((1−q)/(1+q)).    (21)

Now, using again the fact that E[q^G] = λq/(1 − (1−λ)q), and invoking Theorem 4, gives the following result.

Theorem 5: The entropy rate of a nonsymmetric stationary binary first-order Markov process with transition probabilities q_{01} = q and q_{10} = 1, passed through a BSC with crossover probability α, is lower bounded as H̄(Y) ≥ h(α ∗ h^{−1}(γ)), where

    γ ≜ h(1/(1+q)) − [(1−2α)^2 q / ((1+q)(1 − 4α(1−α)q))] (2h(1/(1+q)) − h((1−q)/(1+q))).

The following corollary of Theorem 5 shows that our bound becomes tight as α → 1/2, and partially recovers the results of [13, Section 4.2] and [14, Appendix E].

Corollary 1: For the very noisy regime, where α = 1/2 − ε and 0 ≤ q < 1, we have

    H̄(Y) = 1 − 2 log(e) ((1−q)/(1+q))^2 ε^2 + O(ε^4).

Proof: Clearly, H̄(Y) ≤ H(Y_n) = h(α ∗ π_1). From [6, equation (11)] combined with (13) we have

    h(α ∗ π_1) = 1 − Σ_{k=1}^{∞} [log(e)/(2k(2k−1))] (2ε(1 − 2π_1))^{2k}    (22)
      = 1 − 2 log(e) (1 − 2π_1)^2 ε^2 + O(ε^4),

which establishes our upper bound, since 1 − 2π_1 = (1−q)/(1+q). For the lower bound, note that γ ≥ h(π_1) − cε^2 for some universal constant c > 0 (indeed, λ = (1−2α)^2 = 4ε^2 and h(1/(1+q)) = h(π_1)). From the concavity of h(·) we have that for all 0 ≤ x ≤ 1 holds h(x) ≤ h(π_1) + h′(π_1)(x − π_1). Thus, using the monotonicity of h^{−1}(·) and the fact that h′(π_1) > 0 for all 0 ≤ q < 1, we have

    h^{−1}(γ) ≥ h^{−1}(h(π_1) − cε^2) ≥ π_1 − (c/h′(π_1)) ε^2.    (23)

Now, by Theorem 5, (23) and (22), we have

    H̄(Y) ≥ h(α ∗ h^{−1}(γ)) ≥ h(α ∗ (π_1 − (c/h′(π_1)) ε^2)) = 1 − 2 log(e) (1 − 2π_1)^2 ε^2 + O(ε^4).    □
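Theorem 5 is fully closed-form, which also makes the asymptotics of Corollary 1 easy to check numerically. A sketch, once more reusing h, conv, and h_inv from the earlier snippets; the test parameters are our own choices.

    def theorem5_bound(alpha, q):
        """Closed-form bound for the (1,inf)-RLL chain (q01 = q, q10 = 1):
        H_bar(Y) >= h(alpha * h^{-1}(gamma))."""
        hp = h(1 / (1 + q))                                # h(pi_0) = h(pi_1)
        coef = (1 - 2 * alpha) ** 2 * q / ((1 + q) * (1 - 4 * alpha * (1 - alpha) * q))
        gamma = hp - coef * (2 * hp - h((1 - q) / (1 + q)))
        return h(conv(alpha, h_inv(gamma)))

    # Numerical check of Corollary 1: for alpha = 1/2 - eps, the deficit
    # (1 - bound)/eps^2 should approach 2*log2(e)*((1-q)/(1+q))**2.
    q, eps = 0.3, 1e-3
    print((1 - theorem5_bound(0.5 - eps, q)) / eps ** 2)   # ~ 0.837
    print(2 * np.log2(np.e) * ((1 - q) / (1 + q)) ** 2)    # ~ 0.837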
ACKNOWLEDGMENT

The author is grateful to Alex Samorodnitsky for many valuable discussions and observations, and to Yury Polyanskiy and Yihong Wu for sharing an early draft of [4].

REFERENCES

[1] A. Wyner and J. Ziv, "A theorem on the entropy of certain binary sequences and applications–I," IEEE Trans. Inf. Theory, vol. 19, no. 6, pp. 769–772, Nov. 1973.
[2] A. Samorodnitsky, "On the entropy of a noisy function," 2015, arXiv preprint, available online: http://arxiv.org/abs/1508.01464v3.pdf.
[3] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006.
[4] Y. Polyanskiy and Y. Wu, "Strong data-processing inequalities for channels and Bayesian networks," 2016, arXiv preprint, available online: http://arxiv.org/abs/1508.06025.
[5] Y. Li and G. Han, "Input-constrained erasure channels: Mutual information and capacity," in ISIT, 2014, pp. 3072–3076.
[6] O. Ordentlich and O. Shayevitz, "Minimum MS. E. Gerber's lemma," IEEE Trans. Inf. Theory, vol. 61, no. 11, pp. 5883–5891, 2015.
[7] E. Ordentlich and T. Weissman, "Bounds on the entropy rate of binary hidden Markov processes," Entropy of Hidden Markov Processes and Connections to Dynamical Systems, pp. 117–171, 2011.
[8] Y. Peres and A. Quas, "Entropy rate for hidden Markov chains with rare transitions," Entropy of Hidden Markov Processes and Connections to Dynamical Systems, pp. 172–178, 2011.
[9] J. J. Birch, "Approximations for the entropy for functions of Markov chains," The Annals of Mathematical Statistics, vol. 33, no. 3, pp. 930–938, 1962.
[10] H. D. Pfister, "On the capacity of finite state channels and the analysis of convolutional accumulate-m codes," Ph.D. dissertation, University of California, San Diego, 2003.
[11] P. Jacquet, G. Seroussi, and W. Szpankowski, "On the entropy of a hidden Markov process," in DCC, 2004, pp. 362–371.
[12] J. Luo and D. Guo, "On the entropy rate of hidden Markov processes observed through arbitrary memoryless channels," IEEE Trans. Inf. Theory, vol. 55, no. 4, pp. 1460–1467, 2009.
[13] H. D. Pfister, "The capacity of finite-state channels in the high-noise regime," Entropy of Hidden Markov Processes and Connections to Dynamical Systems, pp. 179–222, 2011.
[14] G. Han and B. Marcus, "Derivatives of entropy rate in special families of hidden Markov chains," IEEE Trans. Inf. Theory, vol. 53, no. 7, pp. 2642–2652, July 2007.