ebook img

On entropy for mixtures of discrete and continuous variables PDF

0.21 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview On entropy for mixtures of discrete and continuous variables

ENTROPY FOR MIXTURES OF DISCRETE AND CONTINUOUS VARIABLES CHANDRANAIR,BALAJIPRABHAKAR,ANDDEVAVRATSHAH Abstract. In this paper, we extend the notion of entropy in a natural manner for a mixed-pair random variable,apairofrandomvariableswithonediscreteandtheothercontinuous. Ourextensionsareconsistent 7 inthatthereexistnatural injectionsfromdiscreteorcontinuous randomvariablesintomixed-pairrandom 0 variablessuchthattheirentropyremainsthesame. Thisextensionofentropyallowsustoobtainsufficient 0 conditionsfortheentropypreservationunderbijectionsbetween mixed-pairrandomvariables. 2 The extended definition of entropy leads to an entropy rate for continuous time Markov chains. As n applications of ourresults,weprovidesimplerproofsofsomeknown probabilisticresults. Theframe-work a developedinthispaperisbestsuitedforestablishingprobabilisticpropertiesofcomplexprocesses,suchas J loadbalancingsystems,queuingnetworks,cachingalgorithms,thathaveinherentdiscretevariables(choices 7 made)andcontinuous variables(occurrence times). 1 ] 1. Introduction T I Thenotionofentropyfordiscreterandomvariablesaswellascontinuousrandomvariablesiswelldefined. s. Entropy preservation of discrete random variable under bijection map is an extremely useful property. For c example, Prabhakar and Gallager [PG03] used this entropy preservation property to obtain an alternate [ proof of the known result that Geometric processes are fixed points under certain queuing disciplines. 2 In many interesting situations, including Example 1.1 given below, the underlying random variables are v mixtures of discrete and continuous random variables. Such systems exhibit natural bijective properties 5 which allow one to obtain non-trivial properties of the system via non-rigorous “ information preservation” 7 arguments. In this paper we develop sufficient conditions to make such arguments rigorous. 0 7 We will extend the definition of entropy to random variables that form a mixed pair of discrete and 0 continuousvariables as wellas obtainsufficientconditions for preservationof entropy. Subsequently, we will 6 provide a rigorous justification of mathematical identities that follow in the example below. 0 / s Example 1.1. Poisson Splitting: Consider a Poisson Process, P, of rate λ. Split the Poisson process into c : two baby-processes P1 and P2 as follows: for each point of P, toss an independent coin of bias p; if coin v turnsupheadsthenthe pointisassignedto P ,elsetoP . Itiswell-knownthatP andP areindependent i 1 2 1 2 X Poisson processes with rates λp and λ(1−p) respectively. r EntropyrateofaPoissonprocesswithrateµisknowntobeµ(1−logµ)natspersecond. Thatis,entropy a rates of P, P , and P are given by λ(1−logλ), λp(1−logλp) and λ(1−p)(1−logλ(1−p)) respectively. 1 2 Further observe that the coin of bias p is tossed at a rate λ and each coin-toss has an entropy equal to −plogp−(1−p)log(1−p) nats. Itisclearthatthereisabijectionbetweenthetuple(P,coin-tossprocess)andthetuple(P ,P ). Observe 1 2 that the joint entropy rate of the two independent baby-processes are given by their sum. This leads to the following “obvious” set of equalities. H (P ,P )=H (P )+H (P ) ER 1 2 ER 1 ER 2 =λp(1−logλp)+λ(1−p)(1−logλ(1−p)) (1.1) =λ(1−logλ)+λ(−plogp−(1−p)log(1−p)) =H (P)+λ(−plogp−(1−p)log(1−p)). ER The last sum can be identified as sum of the entropy rate of the originalPoissonprocess and the entropy rate of the coin tosses. However the presence of differential entropy as well as discrete entropy prevents 1 this interpretation from being rigorous. In this paper, we shall provide rigorous justification to the above equalities. 2. Definitions and Setup This section provides technical definitions and sets up the frame-work for this paper. First, we present some preliminaries. 2.1. Preliminaries. Consider a measure space (Ω,F,P), with P being a probability measure. Let (R,BR) denote the measurable space onR with the Borelσ-algebra. A randomvariableX is a measurablemapping from Ω to R. Let µX denote the induced probability measure on (R,BR) by X. We call X as discrete random variable ifthereis acountablesubset{x ,x ,...}ofRthatformsasupportforthe measureµ . Let 1 2 X p =P(X =x ) and note that p =1. i i i i The entropy of a discrete random variable is defined by the sum P H(X)=− p logp . i i i X Note that this entropy is non-negative and has several well known properties. One natural interpretation of this number is in terms of the maximum compressibility (in bits per symbol) of an i.i.d. sequence of the random variables, X (cf. Shannon’s data compression theorem [Sha48]). A random variable Y, defined on (Ω,F,P), is said to be a continuous random variable if the probability measure, µY, induced on (R,BR) is absolutely continuous with respect to the Lebesgue measure. These probabilitymeasurescanbe characterizedbyanon-negativedensityfunctionf(x)thatsatisfies f(x)dx= R 1. The entropy (differential entropy) of a continuous random variable is defined by the integral R h(Y)=− f(y)logf(y)dy. R Z The entropy of a continuous random variable is not non-negative, though it satisfies several of the other properties of the discrete entropy function. Due to negativity, differential entropy clearly does not have interpretation of maximal compressibility. However, it does have the interpretation of being the limiting difference between the maximally compressed quantization of the random variable and an identical quanti- zationofanindependent U[0,1]∗randomvariable[CT91] as the quantizationresolutiongoesto zero. Hence the term differential entropy is usually preferred to entropy when describing this number. 2.2. Our Setup. In this paper, we are interested in a set of random variables that incorporate the aspects ofboth discrete andcontinuousrandomvariables. LetZ =(X,Y) be ameasurablemapping fromthe space (Ω,F,P) to the space (R×R,BR×BR). Observe that this mapping induces a probability measure µZ on the space (R×R,BR ×BR) as well as two probability measures µX and µY on (R,BR) obtained via the projection of the measure µ . Z Definition 2.1 (Mixed-Pair). Consider a random variables Z = (X,Y). We call Z† a mixed-pair if X is a discrete random variable while Y is a continuous random variable. That is, the support of µ is on the Z product space S×R, with S = {x ,x ,...} is a countable subset of R. That is S forms a support for µ 1 2 X while µ is absolutely continuous with respect to the Lebesgue measure. Y Observe that Z = (X,Y) induces measures {µ ,µ ,....} that are absolutely continuous with respect to 1 2 the Lebesguemeasure,where µi(A)=P(X =xi,Y ∈A), for everyA∈BR. Associatedwith these measures µ , there are non-negative density functions g (y) that satisfy i i g (y)dy =1. i R i Z X ∗U[0,1]represents arandomvariablethatisuniformlydistributedontheinterval[0,1] †For the rest of the paper we shall adopt the notation that random variables Xi represent discrete random variables, Yi representcontinuous randomvariablesandZi representmixed-pairofrandomvariables. 2 Let us define p = g (y)dy. Observe that p ’s are non-negative numbers that satisfy p = 1 and i R i i i i correspondsto theprobabilitymeasureµ . Furtherg(y)= g (y)correspondstothe probabilitymeasure R X i i P µ . Let Y △ 1 P g˜(y)= g (y) i i p i be the probability density function of Y conditioned on X =x . i The following non-negative sequence is well defined for every y ∈R for which g(y)>0, g (y) i p (y)= , i≥1. i g(y) Now g(y)is finite exceptpossibly ona set,A, ofmeasurezero. For y ∈Ac, we havethat p (y)=1;p (y) i i i correspondsto the probabilitythat X =x conditioned onY =y. It follows fromdefinitions of p andp (y) i i i P that p = p (y)g(y)dy. i i R Z Definition2.2(GoodMixed-Pair). Amixed-pairrandomvariableZ =(X,Y)iscalledgoodifthefollowing condition is satisfied: (2.1) |g (y)logg (y)|dy <∞. i i R i Z X Essentially,the goodmixed-pairrandomvariablespossessthe propertythatwhenrestrictedtoanyofthe X values, the conditional differential entropy of Y is well-defined. The following lemma provides a simple sufficient conditions for ensuring that a mixed-pair variable is good. Lemma 2.1. The following conditions are sufficient for a mixed-pair random variable to be a good pair: (a) Random variable Y possess a finite ǫth moment for some ǫ>0, i.e. M = |y|ǫg(y)dy <∞. ǫ R Z (b) There exists δ >0 such that g(y) satisfies g(y)1+δdy <∞. R Z (c) The discrete random variable X has finite entropy, i.e. − p logp <∞. i i i P Proof. The proof is presented in the appendix. (cid:3) Definition 2.3 (Entropyofa mixed-pair). The entropyof agoodmixed-pairrandomvariable is definedby (2.2) H(Z)=− g (y)logg (y)dy. i i R i Z X Definition 2.4 (Vector of Mixed-Pairs). Consider a random vector (Z ,...,Z ) = {(X ,Y ),...,(X ,Y )}. 1 d 1 1 d d We call (Z ,...,Z ) a vector of mixed-pairs if the support of µ is on the product space Sd ×Rd, 1 d (Z1,...,Zd) whereSd ⊂Rd isacountableset. Thatis,Sd formsthe supportfortheprobabilitymeasureµ while (X1,..,Xd) the measure µ is absolutely continuous with respect to the Lebesgue measure on Rd. (Y1,..,Yd) Definition 2.5 (Good Mixed-Pair Vector ). A vector of mixed-pair random variables (Z ,...,Z ) is called 1 d good if the following condition is satisfied: (2.3) |g (y)logg (y)|dy<∞, x x xX∈SdZy∈Rd where g (y) is the density of the continuous random vector Yd conditioned on the event that Xd =x. x 3 AnalogoustoLemma2.1,thefollowingconditionsguaranteethatavectorofmixed-pairrandomvariables is good. Lemma 2.2. The following conditions are sufficient for a mixed-pair random variable to be a good pair: (a) Random variable Yd possess a finite ǫth moment for some ǫ>0, i.e. M = kykǫg(y)dy<∞. ǫ Rd Z (b) There exists δ >0 such that g(y) satisfies g(y)1+δdy<∞. Rd Z (c) The discrete random variable Xd has finite entropy, i.e. − p logp <∞. x∈Sd x x Proof. The proof is similar to that of Lemma 2.1 and is omitted. P (cid:3) In rest of the paper, all mixed-pair variables and vectors are assumed to be good, i.e. assumed to satisfy the condition (2.1). Definition 2.6 (Entropy of a mixed-pair vector). The entropy of a good mixed-pair vector of random variables is defined by (2.4) H(Z)=− g (y)logg (y)dy. x x Rd xX∈SdZ Definition 2.7 (Conditional entropy). Given a pair of random variables (Z ,Z ), the conditional entropy 1 2 is defined as follows H(Z |Z )=H(Z ,Z )−H(Z ). 1 2 1 2 2 It is not hard to see that H(Z |Z ) evaluates to 1 2 g (y ,y ) − g (y ,y )log x1,x2 1 2 dy dy . xX1,x2ZR2 x1,x2 1 2 gx2(y2) 1 2 Definition 2.8 (Mutual Information). Given a pair of random variables (Z ,Z ), the mutual information 1 2 is defined as follows I(Z ;Z )=H(Z )+H(Z )−H(Z ,Z ). 1 2 1 2 1 2 The mutual information evaluates to g (y ,y ) g (y ,y )log x1,x2 1 2 dy dy . xX1,x2ZR2 x1,x2 1 2 gx1(y1)gx2(y2) 1 2 Using the fact that 1+logx<x for x>0 it can be shown that I(Z ;Z ) is non-negative. 1 2 2.3. OldDefinitionsStillWork. Wewillnowpresentinjectionsfromthespaceofdiscrete(orcontinuous) randomvariablesintothespaceofmixed-pairrandomvariablesothattheentropyofthemixed-pairrandom variable is the same as the discrete (or continuous) entropy. Injection: DiscreteintoMixed-Pair: LetX beadiscreterandomvariablewithfiniteentropy. Let{p ,p ,...} 1 2 denote the probability measure associated with X. Consider the mapping σ : X → Z ≡ (X,U) where U d is an independent continuous random variable distributed uniformly on the interval [0,1]. For Z, we have g (y)=p for y ∈[0,1]. Therefore i i 1 H(Z)=− g (y)logg (y)dy = −p logp dy i i i i i ZR i Z0 X X =− p logp =H(X)<∞. i i i X 4 Therefore we see that H(Z)=H(X). Injection: Continuous into Mixed-Pair: LetY be acontinuousrandomvariablewithadensity functiong(y) that satisfies g(y)|logg(y)|dy <∞. R Z Consider the mapping σ :Y →Z ≡(X ,Y) where X is the constantrandomvariable,say P(X =1)=1. c 0 0 0 Observe that g(y)=g (y) and that the pair Z ≡(X ,Y) is a good mixed-pair that satisfies H(Z)=h(Y). 1 0 Thus σ and σ are injections from the space of continuous and discrete random variables into the space d c of good mixed-pairs that preserve the entropy function. 2.4. Discrete-Continuous Variable as Mixed-Pair. Consider a random variable‡ V whose support is combination of both discrete and continuous. That is, it satisfies the following properties: (i) There is a countableset(possiblyfinite)S ={x ,x ,...}suchthatµ (x )=p >0;(ii)measureµ˜ withanassociated 1 2 V i i V non-negativefunctiong˜(y)(absolutelycontinuousw.r.t. theLebesguemeasure),and(iii)thefollowingholds: g˜(y)dy+ p =1. i R Z i X Thus, the random variable V either takes discrete values x ,x ,... with probabilities p ,p ,... or else it 1 2 1 2 is distributed according to the density function 1 g˜(y); where p = p . Observe that V has neither a 1−p i i countable support nor is its measure absolutely continuous with respect to Lebesgue measure. Therefore, P though such random variables are encountered neither the discrete entropy nor the continuous entropy is appropriate. To overcome this difficulty, we will treat such variables as mixed-pair variables by appropriate injection of suchvariablesinto mixed-pair variables. Subsequently, we will be able to use the definition ofentropy for mixed-pair variables. Injection: Discrete-ContinuousintoMixed-Pair: LetV beadiscrete-continuousvariableasconsideredabove. Let the following two conditions be satisfied: − p logp <∞ and g˜(y)|logg˜(y)|dy <∞. i i R i Z X Consider the mapping σ : V → Z ≡ (X,Y) described as follows: When V takes a discrete value x , it is m i mapped onto the pair (x ,u ) where u is chosenindependently and uniformly at randomin [0,1]. When V i i i doesnottakeadiscretevalueandsaytakesvaluey,itgetsmappedtothepair(x ,y)wherex 6=x ,∀i.One 0 0 i canthink of x as anindicator value that V takes when itis notdiscrete. The mixed-pairvariableZ has its 0 associated functions {g (y),g (y),...} where g (y)=p , y ∈[0,1],i≥1 and g (y)=g˜(y). The entropy of Z 0 1 i i 0 as defined earlier is H(Z)=− g (y)logg (y)dy i i R i Z X =− p logp − g˜(y)logg˜(y)dy. i i R i Z X Remark 2.1. Intherestofthepaperwewilltreateveryrandomvariablethatisencounteredasamixed-pair random variable. That is, a discrete variable or a continuous variable would be assumed to be injected into the space of mixed-pairs using the map σ or σ , respectively. d c 3. Bijections and Entropy Preservation In this section we will consider bijections between mixed-pair random variables and establish sufficient conditions under which the entropy is preserved. We first consider the case of mixed-pair random variables and then extend this to vectors of mixed-pair random variables. ‡Normallysuchrandomvariablesarereferredtoasmixedrandomvariables. 5 3.1. Bijections between Mixed-Pairs. Consider mixed-pair random variables Z ≡ (X ,Y ) and Z ≡ 1 1 1 2 (X ,Y ). Specifically, let S = {x } and S = {x } be the countable (possibly finite) supports of the 2 2 1 1i 2 2j discretemeasuresµ andµ suchthatµ (x )>0andµ (x )>0foralli∈S andj ∈S . Therefore X1 X2 X1 1i X2 2j 1 2 a bijection between mixed-pair variables Z and Z can be viewed as bijections between S ×R and S ×R. 1 2 1 2 LetF :S ×R→S ×Rbeabijection. GivenZ ,thisbijectioninducesamixed-pairrandomvariableZ . 1 2 1 2 WerestrictourattentiontothecasewhenF iscontinuousanddifferentiable§. Lettheinducedprojectionsbe F :S ×R→S andF :S ×R→R. LettheassociatedprojectionsoftheinversemapF−1 :S ×R→S ×R d 1 2 c 1 2 1 be F−1 :S ×R→S and F−1 :S ×R→R respectively. d 2 1 c 2 As before, let {g (y )}, {h (y )} denote the non-negative density functions associated with the mixed- i 1 j 2 pair random variables Z and Z respectively. Let (x ,y ) = F(x ,y ), i.e. x = F (x ,y ) and y = 1 2 2j 2 1i 1 2j d 1i 1 2 F (x ,y ). Now, consider a small neighborhood x ×[y ,y +dy ) of (x ,y ). From the continuity of c 1i 1 1i 1 1 1 1i 1 F, for small enough dy , the neighborhood x ×[y ,y +dy ) is mapped to some small neighborhood of 1 1i 1 1 1 (x ,y ), say x ×[y ,y +dy ). The measure of x ×[y ,y +dy ) is ≈ g (y )|dy |, while measure of 2j 2 2j 2 2 2 1i 1 1 1 i 1 1 x ×[y ,y +dy ) is ≈h (y )|dy |. Since distribution of Z is induced by the bijection from Z , we obtain 2j 2 2 2 j 2 2 2 1 dy 1 (3.1) g (y ) =h (y ). i 1 j 2 dy (cid:12) 2(cid:12) (cid:12) (cid:12) Further from y2 =Fc(x1i,y1) we also have, (cid:12) (cid:12) (cid:12) (cid:12) dy dF (x ,y ) (3.2) 2 = c 1i 1 . dy dy 1 1 These immediately imply asufficientconditionunder whichbijections betweenmixed-pairrandomvariables imply that their entropies are preserved. Lemma 3.1. If dFc(x1i,y1) =1 for all points (x ,y )∈S ×R, then H(Z )=H(Z ). dy1 1i 1 1 1 2 (cid:12) (cid:12) Proof. This essen(cid:12)tially follo(cid:12)ws from the change of variables and repeated use of Fubini’s theorem (to in- (cid:12) (cid:12) terchange the sums and the integral). To apply Fubini’s theorem, we use the assumption that mixed-pair random variables are good. Observe that, H(Z )=− g (y )logg (y )dy 1 i 1 i 1 1 R i Z X (a) dFc(x1i,y1) = − h (y )log h (y ) dy j 2 j 2 2 (3.3) j ZR (cid:18) (cid:12) dy1 (cid:12)(cid:19) X (cid:12) (cid:12) (b) (cid:12) (cid:12) = − h (y )logh (y )dy(cid:12) (cid:12) j 2 j 2 2 R j Z X =H(Z ). 2 Here(a)isobtainedbyrepeateduseofFubini’stheoremalongwith(3.1)and(b)followsfromtheassumption of the Lemma that |dFc(x1i,y1]|=1. (cid:3) dy1 3.2. Some Examples. In this section, we present some examples to illustrate our definitions, setup and the entropy preservation Lemma. Example 3.1. Let Y be a continuous random variable that is uniformly distributed in the interval [0,2]. 1 LetX bethediscreterandomvariablethattakesvalue0whenY ∈[0,1]and1otherwise. LetY =Y −X . 2 1 2 1 2 Clearly Y ∈[0,1], is uniformly distributed and independent of X . 2 2 Let Z ≡ (X ,Y ) be the natural injection, σ of Y (i.e. X is just the constant random variable.). 1 1 1 c 1 1 Observe that the bijection between Z to the pair Z ≡(X ,Y ) that satisfies conditions of Lemma 3.1 and 1 2 2 2 implies log2=H(Z )=H(Z ). 1 2 §ThecontinuityofmappingbetweentwocopiesofproductspaceS×Ressentiallymeansthatthemappingiscontinuouswith respecttoright(orY)co-ordinateforfixedxi∈S. Similarly,differentiabilityessentiallymeansdifferentiabilitywithrespectto Y co-ordinate. 6 However, also observe that by plugging in the various definitions of entropy in the appropriate spaces, H(Z )=H(X ,Y )=H(X )+h(Y )=log2+0,where the firsttermis the discreteentropyandthe second 2 2 2 2 2 term is the continuous entropy. In general it is not difficult to see that the two definitions of entropy (for discreteandcontinuousrandomvariables)arecompatiblewitheachotheriftherandomvariablesthemselves are thought of as a mixed-pair. Example 3.2. This example demonstrates that some care must be taken when considering discrete and continuous variables as mixed-pair randomvariables. Consider the following continuous randomvariableY 1 that is uniformly distributed in the interval [0,2]. Now, consider the mixed random variable V that takes 2 the value 2 with probability 1 and takes a value uniformly distributed in the interval [0,1] with probability 2 1. 2 Clearly, there is a mapping that allows us to create V from Y by just mapping Y ∈ (1,2] to the value 2 1 1 V = 2 and by setting Y = V when Y ∈ [0,1]. However, given V = 2 we are not able to reconstruct Y 2 1 2 1 2 1 exactly. Therefore, intuitively one expects that H(Y )>H(V ). 1 2 However, if you use the respective injections, say Y → Z and V → Z , to the space of mixed-pairs of 1 1 2 2 random variables, we can see that H(Y )=H(Z )=log2=H(Z ). 1 1 2 This shows that if we think of H(Z ) as the entropy of the mixed random variable V we get an intuitively 2 2 paradoxicalresult where H(Y )=H(V ) where in reality one would expect H(Y )>H(V ). 1 2 1 2 Thecarefulreaderwillbequicktopointoutthatthe injectionfromV toZ introducesanewcontinuous 2 2 variable, Y , associated with the discrete value of 2, as well as a discrete value x associated with the 22 0 continuous part of V . Indeed the ”new” random variable Y allows us to precisely reconstruct Y from Z 2 22 1 2 and thus complete the inverse mapping of the bijection. Remark 3.1. The examples show that when one has mappings involving various types of random variables and one wishes to use bijections to compare their entropies; one can perform this comparisonas long as the random variables are thought of as mixed-pairs. 3.3. Vector of Mixed-Pair Random Variables. Now, we derive sufficientconditions for entropy preser- vation under bijection between vectors of mixed-pair variables. To this end, let Z = (Z1,...,Z1) and 1 1 d Z =(Z2,...,Z2)be twovectorsofmixed-pairrandomvariableswiththeir supportonS ×Rd andS ×Rd 2 1 d 1 2 respectively. (Here S ,S are countable subsets of Rd.) Let F : S ×Rd → S ×Rd be a continuous and 1 2 1 2 differentiable bijection that induces Z by its application on Z . 2 1 As before, let the projections of F be F : S ×Rd → S and F : S ×Rd → Rd. We consider situation d 1 2 c 1 where F is differentiable. Let g (y),y∈Rd for x ∈S and h (y),y∈Rd for w ∈S be density functions c i i 1 j j 2 as defined before. Let (x ,y1)∈S ×Rd be mapped to (w ,y2)∈S ×Rd. Then, consider d×d Jacobian i 1 j 2 ∂y2 J(x ,y1)≡ k , i ∂y1 (cid:20) l (cid:21)1≤k,l≤d where we have used notation y1 =(y1,...,y1) and y2 =(y2,...,y2). Now, similar to Lemma 3.1 we obtain 1 d 1 d the following entropy preservation for bijection between vector of mixed-pair random variables. Lemma 3.2. If for all (x ,y1)∈S ×Rd, i 1 det(J(x ,y1)) =1, i then H(Z1)=H(Z2). Here det(J) denotes t(cid:12)he determinan(cid:12)t of matrix J. (cid:12) (cid:12) Proof. ThemainingredientsfortheproofofLemma3.1forthescalarcaseweretheequalities(3.1)and(3.2). For a vector of mixed-pair variable we will obtain the following equivalent equalities: For change of dy1 at (x ,y1), let dy2 be induced change at (x ,y2). Let vol(dy) denote the volume of d dimensional rectangular i j region with sides given by components of dy in Rd. Then, (3.4) g (y1)vol(dy1)=h (y2)vol(dy2). i j Further, at (x ,y1), i (3.5) vol(dy2)= det(J(x ,y1)) vol(dy1). i (cid:12) 7 (cid:12) (cid:12) (cid:12) Using exactly the same argument that is used in (3.3) (replacing dy by vol(dyk), k = 1,2), we obtain the k desired result. This completes the proof of Lemma 3.2. (cid:3) Remark 3.2. Essentially,foreverydiscretechoiceX,ifthemappingbetweenthecontinuousvectorshasone, then the bijection preserves entropy. 4. Entropy Rate of Continuous Time Markov Chains AcontinuoustimeMarkovchainiscomposedofthepointprocessthatcharacterizesthetimeoftransitions of the states as well as the discrete states between which the transition happens. Specifically, let x ∈ R i denote the time of ith transition or jump with i∈Z. Let V ∈S denote the state of the Markov chain after i thejumpattimex ,whereSbesomecountablestatespace. Forsimplicity,weassumeS=N. Lettransition i probabilities be p =P(V =ℓ|V =k),k,ℓ∈N for all i. kℓ i i−1 We recall that the entropy rate of a point process P was defined in section 13.5 of [DVJ88] according to the following: “Observation of process conveys information of two kinds: the actual number of points observed and the location of these points given their number.” This led them to define the entropy of a realization {x ,...,x } as 1 N H(N)+H(x ,...,x |N) 1 N The entropy rate of the point process P is defined as follows: let N(T) be the number of points arrived in time interval (0,T] and the instances be x(T)=(x ,...,x ). Then, the entropy rate of the process is 1 N(T) 1 H (P)= lim [H(N(T))+H(x(T)|N(T))], ER T→∞T if the above limit exists. We extend the above definition to the case of Markov chain in a natural fashion. Observation of a continuous time Markov chain over a time interval (0,T] conveys information of three types: the number of points/jumps of the chain in the interval, the location of the points given the number as well as the value of the chain after each jump. Treating each random variable as a mixed-pair allows us to consider all the random variables in a single vector. As before,letN(T)denote the number ofpoints inaninterval(0,T]. Letx(T)=(x ,...,x ), V(T)= 1 N(T) (V ,V ,...,V ) denote the locations of the jumps as well as the values of the chain after the jumps. This 0 1 N(T) leads us to define the entropy of the process during the interval (0,T] as (4.1) H =H(N(T),V(T),x(T)). (0,T] Observe that the (N(T),V(T),x(T)) is a random vector of mixed-pair variables. For a single state Markovchain the above entropy is the same as that of the point process determine the jump/transition times. Similar to the development for point processes, we define the entropy rate of the Markov chain as H H = lim (0,T], if it exists. ER T→∞ T Proposition 4.1. Consider a Markov chain with underlying Point process being Poisson of rate λ, its stationary distribution being π = (π(i)) with transition probability matrix P = [p ]. Then, its entropy rate ij is well-defined and HER =λ(1−logλ)+λHMC, where HMC =− iπ(i) jpijlogpij. Proof. For MarkPov ChaiPn as described in the statement of proposition, we wish to establish that H lim (0,T] =H , ER T→∞ T as defined above. Now H =H(x(T),N(T),V(T)) (0,T] =H(x(T),N(T))+H(V(T)|N(T),x(T)). 8 Considerthetermontherighthandsideoftheaboveequality. ThiscorrespondstothepointsofaPoisson process of rate λ. It is well-known (cf. equation (13.5.10), pg. 565 [DVJ88]) that 1 (4.2) lim H(x(T),N(T))=λ(1−logλ). T→∞T Now consider the term H(V(T)|x(T),N(T)). Since V(T) is independent of x(T), we get from the defini- tion of conditional entropy that (4.3) H(V(T)|x(T),N(T))=H(V(T)|N(T)). One can evaluate H(V(T)|N(T)) as follows, H(V(T)|N(T))= p H(V ,...,V ), k 0 k k X where p is the probability that N(T)=k. The sequence of states V ,...,V can be thought of as sequence k 0 k of states of a discrete time Markov chain with transition matrix P. For a Markov chain, with stationary distribution π (i.e. Pπ =π), it is well-known that 1 lim H(V ,...,V )=− π(i) p logp 0 k ij ij k→∞k i j X X =HMC. Thus, for any ǫ>0, there exists k(ǫ) large enough such that for k >k(ǫ) 1 H(V0,...,Vk)−HMC <ǫ. k (cid:12) (cid:12) (cid:12) (cid:12) For T large enough, using tail-probabili(cid:12)ty estimates of Poisson(cid:12)variable it can be shown that (cid:12) (cid:12) λT P(N(T)≤k(ǫ))≤exp − . 8 (cid:18) (cid:19) Putting these together, we obtain that for given ǫ there exists T(ǫ) large enough such that for T ≥T(ǫ) H(V(T)|N(T)) 1 H(V ,...,V ) 0 k = kp k T T k ! k X k≥k(ǫ)kpk(HMC±ǫ)+O(k(ǫ)) = T P λT +O(k(ǫ)) = (HMC±ǫ) T = λHMC±2ǫ. That is H(V(T)|N(T)) lim =λHMC. T→∞ T Combining (4.2), (4.3) and the above equation we complete the proof of the Proposition 4.1. (cid:3) Fact 4.1 (cf. Ch. 13.5 [DVJ88]). Consider the set of stationary ergodic point processes with mean rate λ. Then the entropy of this collection is maximized by a Poisson Process with rate λ. That is, if P is a stationary ergodic point process with rate λ then H (P)≤λ(1−logλ). ER 9 5. Application 5.1. Computation of continuous entropies. In this sectionwe show how our previous results aidin the computation of traditional continuous time entropies. Let (X ,X ) be two i.i.d. random variables whose 1 2 distributions satisfy the conditions required of it to be a good-mixed pair. Let (Y ,Y ), Y < Y be the 1 2 1 2 ordering of (X ,X ). Then the following holds: 1 2 Lemma 5.1. h(Y ,Y )=h(X ,X )−log2. 1 2 1 2 Proof. Let I represent the indicator function such that I = 0 implies X = Y , and I = 1 implies X = Y . 1 1 1 2 Clearly I is independent of Y ,Y and probability of I = 1 is 1. It is further easy to see that (I,Y ,Y ) ↔ 1 2 2 1 2 (X ,X ) with the corresponding Jacobians evaluating to 1. Thus, viewed as mixed-pairs we can equate the 1 2 entropies, yielding log2+h(Y ,Y )=h(X ,X ). 1 2 1 2 This can be of course be shown using traditional methods but the proof here is an illustration of how our results can be used to obtain such results in an easier fashion. (cid:3) In a similar fashion if (Y ,...,Y ) is an ordering, in increasing order, of the i.i.d. random variables 1 n (X ,...,X ), then 1 n h(Y ,..,Y )=h(X ,....,X )−log(n!) 1 n 1 n 5.2. Poisson Splitting via Entropy Preservation. In this section, we use the sufficient conditions de- veloped in Lemma 3.2 to obtain proof of the following property. Lemma 5.2. Consider a Poisson process, P, of rate λ. Split the process P into two baby-processes P and 1 P as follows: for each point of P, toss an independent coin of bias p. Assign the point to P if coin turns 2 1 up head, else assign it to P . Then, the baby-processes P and P have the same entropy rate as Poisson 2 1 2 processes of rates λp and λ(1−p) respectively. Proof. Consider a Poisson Process, P, of rate λ in the interval [0,T]. Let N(T) be the number of points in this interval and let a(T) = {a ,...,a } be their locations. Further, let C(T) = {C ,...,C } 1 N(T) 1 N(T) be the outcomes of the coin-tosses and M(T) denote the number of heads among them. Denote r(T) = {R ,....,R }, b(T)={B ,....,B } as the locations of the baby-processes P ,P respectively. 1 M(T) 1 N(T)−M(T) 1 2 It is easy to see that the following bijection holds: {a(T),C(T),N(T),M(T)}⇋ (5.1) {r(T),b(T),N(T)−M(T),M(T).} Given the outcomes of the coin-tosses C(T), {r(T),b(T)} is a permutation of a(T). Hence, the Jacobian corresponding to any realization of {C(T),N(T),M(T)} that maps a(T) to {r(T),b(T)} is a permutation matrix, i.e determinant is ±1. Therefore, Lemma 3.2 implies that H(a(T),C(T),N(T),M(T)) (5.2) =H(r(T),b(T),N(T)−M(T),M(T)) ≤H(b(T),N(T)−M(T))+H(r(T),M(T)). M(T) is completely determined by C(T) and it is easy to deduce from the definitions that H(M(T)|a(T),C(T),N(T))=0. Hence H(a(T),C(T),N(T),M(T)) (5.3) =H(a(T),C(T),N(T))+H(M(T)|a(T),C(T),N(T)) =H(a(T),C(T),N(T)). 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.