
Thinning and Information Projections

Peter Harremoës, Member, IEEE, Oliver Johnson, and Ioannis Kontoyiannis, Member, IEEE

arXiv:1601.04255v1 [math.PR] 17 Jan 2016. Manuscript received xxx, 2015; revised xxx. This work was supported by the European Pascal Network of Excellence. P. Harremoës is with Copenhagen Business College, Denmark. O. Johnson is with Bristol University, United Kingdom. I. Kontoyiannis is with Athens University of Econ. & Business, Greece.

Abstract—In this paper we establish lower bounds on the information divergence of a distribution on the integers from a Poisson distribution. These lower bounds are tight, and in the cases where a rate of convergence in the Law of Thin Numbers can be computed, the rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed. The results about lower bounds in the Law of Thin Numbers are used to derive similar results for the Central Limit Theorem.

Index Terms—Information divergence, Poisson-Charlier polynomials, Poisson distribution, Thinning.

I. INTRODUCTION AND PRELIMINARIES

Approximation by a Poisson distribution is a well studied subject and a careful presentation can be found in [1]. Connections to information theory have also been established [2], [3]. For most values of the parameters, the best bounds on the total variation between a binomial distribution and a Poisson distribution with the same mean have been proved by ideas from information theory via Pinsker's inequality [4]–[7].

The idea of thinning a random variable was studied in [8] and used to formulate and prove a Law of Thin Numbers, which is a way of formulating the Law of Small Numbers (Poisson's Law) so that it resembles the formulation of the Central Limit Theorem for a sequence of independent identically distributed random variables. Here these ideas will be developed further. There are three main reasons for developing these results. The first is to get a lower bound on the rate of convergence in the Law of Thin Numbers. The second is to use these to get new inequalities and asymptotic results for the Central Limit Theorem. The last reason is to develop the general understanding and techniques related to information divergence and information projection. We hope eventually to be able to tell which aspects of important theorems for continuous variables, like the Entropy Power Inequality, can be derived from results for discrete variables and which aspects are essentially related to continuous variables. The relevance for communication will not be discussed here; see [8] for some related results.

Definition 1. Let $P$ denote a distribution on $\mathbb{N}_0$. For $\alpha\in[0,1]$ the $\alpha$-thinning of $P$ is the distribution $\alpha\circ P$ given by
$$(\alpha\circ P)(k)=\sum_{\ell=k}^{\infty}P(\ell)\binom{\ell}{k}\alpha^{k}(1-\alpha)^{\ell-k}.$$

If $X_1,X_2,X_3,\ldots$ are independent identically distributed Bernoulli random variables with success probability $\alpha$, and $Y$ has distribution $P$ independent of $X_1,X_2,\ldots$, then $\sum_{n=1}^{Y}X_n$ has distribution $\alpha\circ P$. Obviously the thinning of an independent sum of random variables is the convolution of the thinnings.

Thinning transforms any natural exponential family on $\mathbb{N}_0$ into a natural exponential family on $\mathbb{N}_0$. In particular the following classes of distributions are conserved under thinning: binomial distributions, Poisson distributions, geometric distributions, negative binomial distributions, inverse binomial distributions, and generalized Poisson distributions. This can be verified by direct calculations [9], but it can also be proved using the variance function. A distribution on $\mathbb{N}_0$ is said to be ultra log-concave if its density with respect to a Poisson distribution is discrete log-concave. Thinning also conserves the class of ultra log-concave distributions [10].

The thinning operation allows us to state and prove the Law of Thin Numbers in various versions.

Theorem 2 ([8], [9], [11]). Let $P$ be a distribution on $\mathbb{N}_0$ with mean $\lambda$. If $P^{*n}$ denotes the $n$-fold convolution of $P$, then
1) $\frac{1}{n}\circ P^{*n}$ converges point-wise to $\mathrm{Po}(\lambda)$ as $n\to\infty$.
2) If the divergence $D\left(\frac{1}{n}\circ P^{*n}\,\middle\|\,\mathrm{Po}(\lambda)\right)$ is finite eventually, then
$$D\left(\frac{1}{n}\circ P^{*n}\,\middle\|\,\mathrm{Po}(\lambda)\right)\to 0,\quad\text{as }n\to\infty,$$
and the sequence $D\left(\frac{1}{n}\circ P^{*n}\,\middle\|\,\mathrm{Po}(\lambda)\right)$ is monotonically decreasing.
3) If $P$ is an ultra log-concave distribution on $\mathbb{N}_0$, then
$$H\left(\frac{1}{n}\circ P^{*n}\right)\to H(\mathrm{Po}(\lambda)),\quad\text{as }n\to\infty.$$
Furthermore the sequence $H\left(\frac{1}{n}\circ P^{*n}\right)$ is monotonically increasing.
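As a concrete illustration of Definition 1 and of part 1 of Theorem 2, the following minimal Python sketch (ours, not the authors'; all function names are our own) represents distributions as lists of point masses on a truncated support and thins the $n$-fold convolution of a small distribution:

```python
import math

def convolve(p, q):
    """Convolution of two distributions given as lists of point masses on 0, 1, 2, ..."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def thin(p, alpha):
    """The alpha-thinning of Definition 1."""
    q = [0.0] * len(p)
    for l, pl in enumerate(p):
        for k in range(l + 1):
            q[k] += pl * math.comb(l, k) * alpha ** k * (1 - alpha) ** (l - k)
    return q

def poisson(lam, n_points):
    p, out = math.exp(-lam), []
    for x in range(n_points):
        out.append(p)
        p *= lam / (x + 1)
    return out

# Law of Thin Numbers, part 1: (1/n) o P*n -> Po(lambda) pointwise.
P = [0.5, 0.25, 0.25]                        # a distribution on {0, 1, 2}
lam = sum(x * px for x, px in enumerate(P))  # its mean, here 0.75
for n in (1, 4, 16, 64):
    conv = P
    for _ in range(n - 1):
        conv = convolve(conv, P)
    thinned = thin(conv, 1.0 / n)
    gap = max(abs(t - q) for t, q in zip(thinned, poisson(lam, len(thinned))))
    print(n, gap)   # the largest pointwise gap shrinks steadily as n grows
```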
The focus of this paper is to develop techniques that allow us to give lower bounds on the rate of convergence in the Law of Thin Numbers. One of our main results, Theorem 19, has the following weaker result as corollary.

Theorem 3. Let $X$ denote a discrete random variable with $\mathrm{E}[X]=\lambda$. Then
$$2\left(D(X\|\mathrm{Po}(\lambda))\right)^{1/2}\ge 1-\frac{\mathrm{Var}(X)}{\lambda}.$$

If $X\sim\mathrm{Bi}(n,\lambda/n)$, then $\mathrm{Var}(X)=np(1-p)=\lambda(1-\lambda/n)$. Hence
$$D(X\|\mathrm{Po}(\lambda))\ge\frac{\lambda^2}{4n^2}.$$
For the sequence of binomial distributions $\mathrm{Bi}(n,\lambda/n)$ the rate of convergence in information divergence is given by
$$n^2 D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}(\lambda))\to\frac{\lambda^2}{4}\quad\text{for }n\to\infty,\qquad(1)$$
which was proved in [6] and [3].

Corollary 4. Let $\mathrm{Po}_\beta(\lambda)$ denote the minimum information distribution from $\mathrm{Po}(\lambda)$ with the same mean and variance as $\mathrm{Bi}(n,\lambda/n)$. Then
$$n^2 D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}_\beta(\lambda))\to 0\quad\text{for }n\to\infty.$$

Remark 5. The distribution $\mathrm{Po}_\beta(\lambda)$ can be identified with an element in an exponential family that will be studied in more detail in Section V.

Proof: According to the Pythagorean inequality for information divergence [12] we have
$$D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}(\lambda))\ge D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}_\beta(\lambda))+D(\mathrm{Po}_\beta(\lambda)\|\mathrm{Po}(\lambda))\ge D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}_\beta(\lambda))+\frac{\lambda^2}{4n^2},$$
where the last step applies Theorem 3 to $\mathrm{Po}_\beta(\lambda)$, which has the same mean and variance as $\mathrm{Bi}(n,\lambda/n)$. Multiplication by $n^2$ leads to
$$n^2 D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}(\lambda))\ge n^2 D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}_\beta(\lambda))+\frac{\lambda^2}{4}$$
and
$$0\le n^2 D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}_\beta(\lambda))\le n^2 D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}(\lambda))-\frac{\lambda^2}{4}.$$
The result follows by the use of (1).

Since the second moment is the sufficient statistic in the exponential family $\beta\to\mathrm{Po}_\beta(\lambda)$, and $\mathrm{Bi}(n,\lambda/n)$ is asymptotically very close to $\mathrm{Po}_\beta(\lambda)$, we essentially prove that calculation of the second moment is asymptotically sufficient for testing the binomial distribution versus the Poisson distribution.
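Equation (1) is easy to check numerically. The sketch below (ours, for illustration only) computes $D(\mathrm{Bi}(n,\lambda/n)\|\mathrm{Po}(\lambda))$ directly from the point masses:

```python
import math

def D_bi_po(n, lam):
    """D(Bi(n, lam/n) || Po(lam)) computed directly from the point masses."""
    p, d, po = lam / n, 0.0, math.exp(-lam)
    for k in range(n + 1):
        bi = math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        if bi > 0.0:
            d += bi * math.log(bi / po)
        po *= lam / (k + 1)   # running Poisson point probability Po(lam, k)
    return d

lam = 2.0
for n in (4, 16, 64, 256):
    print(n, n ** 2 * D_bi_po(n, lam))   # converges to lam**2 / 4 = 1.0
```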
Pinsker's inequality can be used to give an upper bound on the total variation when the divergence is given. With a bound on total variation we get a bound on any bounded function because
$$\left|\int f\,\mathrm{d}P-\int f\,\mathrm{d}Q\right|\le\sup|f|\cdot\|P-Q\|_1.$$
If $\int f\,\mathrm{d}Q=0$ we get
$$-\sup|f|\cdot\|P-Q\|_1\le\int f\,\mathrm{d}P\le\sup|f|\cdot\|P-Q\|_1.$$
If $f$ is unbounded such bounds cannot be derived, so we shall turn our attention to functions that are lower bounded. For such functions we have
$$\inf f\cdot\|P-Q\|_1\le\int f\,\mathrm{d}P.$$
When $Q$ is a Poisson distribution and $f$ is a Poisson-Charlier polynomial a much tighter bound can be derived, and in a number of cases it will give the correct rate of convergence in the Law of Small Numbers and related convergence theorems.

The lower bounds that we derive can also be used to qualify the statement "two moments suffice for Poisson approximation", which has been the title of an article [13]. In [13] the Chen-Stein method was used to get bounds on the rate of convergence for Poisson convergence in the total variation metric. As pointed out by Kullback and Leibler [14], the notion of sufficiency is closely related to the notion of information divergence (or Kullback-Leibler divergence). As we shall see, knowledge of the second moment is asymptotically sufficient in the Law of Thin Numbers. Jaynes developed the Maximum Entropy Principle, where entropy is maximized under some constraints [15]. An obvious problem with the Maximum Entropy Principle is how to determine which constraints are relevant for a specific problem. In thermodynamics the experience of generations of physicists has shown that for an isolated gas in thermodynamic equilibrium the pressure and temperature are sufficient, in the sense that knowing only these two quantities allows one to determine any other physical property via the maximum entropy principle. The problem of determining which statistic is relevant persists when the Maximum Entropy Principle is replaced by a Minimum Information Principle relative to some prior distribution. The results of this paper may be viewed as a systematic approach to the problem of finding which quantities are sufficient for Poisson approximation.

The rest of the paper is organized as follows. In Section III Poisson-Charlier polynomials are introduced to simplify moment calculations for thinned sums of independent random variables. Since one of our main techniques is based on information projections under moment constraints, we have to address the problem of the existence of information projections in Section IV. These results may be of independent interest. In Section V we state our main results on sharp lower bounds on information divergence. Many of our calculations involve Poisson-Charlier polynomials and are quite lengthy; proofs are postponed to the appendix. In some cases we are not able to get sharp lower bounds, but under weak conditions the lower bounds are still asymptotically correct, as stated in Section VI. Our results are related to the Central Limit Theorem and the Gaussian distribution in Section VII. We end with the conclusion followed by the appendix containing several of the proofs.

II. INEQUALITIES IN EXPONENTIAL FAMILIES

Let $\beta\to Q_\beta,\ \beta\in\Gamma$ denote an exponential family such that the Radon-Nikodym derivative is
$$\frac{\mathrm{d}Q_\beta}{\mathrm{d}Q_0}=\frac{\exp(\beta\cdot x)}{Z(\beta)},$$
where $\Gamma$ is the set of $\beta$ such that the partition function $Z$ is finite, i.e.
$$Z(\beta)=\int\exp(\beta\cdot x)\,\mathrm{d}Q_0(x)<\infty.$$
The partition function is also called the moment generating function. The parametrization $\beta\to Q_\beta$ is called the natural parametrization. The mean value of the distribution $Q_\beta$ will be denoted $\mu_\beta$. The distribution with mean value $\mu$ is denoted $Q^\mu$, so that $Q^{\mu_\beta}=Q_\beta$. The inverse of the function $\beta\to\mu_\beta$ is denoted $\hat\beta(\cdot)$ and equals the maximum likelihood estimate of the natural parameter. The variance of $Q^\mu$ is denoted $V(\mu)$, so that $\mu\to V(\mu)$ is the variance function of the exponential family. This variance function uniquely characterizes the exponential family.
We note that $\beta\to\ln Z(\beta)$ is the cumulant generating function, so that
$$\frac{\mathrm{d}}{\mathrm{d}\beta}\ln Z(\beta)\Big|_{\beta=0}=\mathrm{E}[X],\qquad\frac{\mathrm{d}^2}{\mathrm{d}\beta^2}\ln Z(\beta)\Big|_{\beta=0}=\mathrm{Var}(X),\qquad\frac{\mathrm{d}^3}{\mathrm{d}\beta^3}\ln Z(\beta)\Big|_{\beta=0}=\mathrm{E}\left[(X-\mathrm{E}[X])^3\right].$$

Lemma 6. Let $\beta\to Q_\beta,\ \beta\in\Gamma$ denote an exponential family with
$$\frac{\mathrm{d}Q_\beta}{\mathrm{d}Q_0}=\frac{\exp(\beta\cdot x)}{Z(\beta)}.$$
Then
1) for some $\gamma$ between $\alpha$ and $\beta$,
$$D(Q_\alpha\|Q_\beta)=\frac{V(\mu_\gamma)}{2}(\alpha-\beta)^2;$$
2) for some $\eta$ between $\mu$ and $\nu$,
$$D(Q^\mu\|Q^\nu)=\frac{(\mu-\nu)^2}{2V(\eta)}.$$

Proof: The two parts of the lemma are proved separately.
1. We consider the function
$$f(s)=D(Q_\alpha\|Q_s)=\int\ln\frac{\mathrm{d}Q_\alpha}{\mathrm{d}Q_s}\,\mathrm{d}Q_\alpha=\int\left((\alpha-s)\cdot x+\ln Z(s)-\ln Z(\alpha)\right)\mathrm{d}Q_\alpha=(\alpha-s)\cdot\mu_\alpha+\ln Z(s)-\ln Z(\alpha).$$
The derivatives of this function are
$$f'(s)=-\mu_\alpha+\frac{Z'(s)}{Z(s)}=\mu_s-\mu_\alpha,\qquad f''(s)=\frac{Z(s)Z''(s)-(Z'(s))^2}{Z(s)^2}=V(\mu_s).$$
According to Taylor's formula there exists $\gamma$ between $\alpha$ and $\beta$ such that
$$D(Q_\alpha\|Q_\beta)=f(\alpha)+(\beta-\alpha)f'(\alpha)+\frac{1}{2}(\beta-\alpha)^2 f''(\gamma)=\frac{V(\mu_\gamma)}{2}(\beta-\alpha)^2.$$
2. We consider the function
$$g(t)=D(Q^t\|Q^\nu)=\left(\hat\beta(t)-\hat\beta(\nu)\right)\cdot t+\ln Z(\hat\beta(\nu))-\ln Z(\hat\beta(t)).$$
The derivatives of this function are
$$g'(t)=\frac{\mathrm{d}\hat\beta(t)}{\mathrm{d}t}\,t+\hat\beta(t)-\hat\beta(\nu)-\frac{Z'(\hat\beta(t))}{Z(\hat\beta(t))}\frac{\mathrm{d}\hat\beta(t)}{\mathrm{d}t}=\hat\beta(t)-\hat\beta(\nu),\qquad g''(t)=\frac{1}{\mathrm{d}t/\mathrm{d}\hat\beta(t)}=\frac{1}{V(t)}.$$
According to Taylor's formula there exists $\eta$ between $\mu$ and $\nu$ such that
$$D(Q^\mu\|Q^\nu)=g(\nu)+(\mu-\nu)g'(\nu)+\frac{1}{2}(\mu-\nu)^2 g''(\eta)=\frac{(\mu-\nu)^2}{2V(\eta)}.$$

Definition 7. The signed log-likelihood is defined by
$$G(\mu)=\begin{cases}-\left(2D(Q^\mu\|Q_0)\right)^{1/2}, & \mu\le\mu_0,\\ \left(2D(Q^\mu\|Q_0)\right)^{1/2}, & \mu>\mu_0,\end{cases}$$
where $\mu_0$ denotes the mean value of $Q_0$.

Proposition 8. Let $\mu_0$ denote the mean value of $Q_0$ and let $\mu$ denote some other possible mean value. Then for some $\eta$ between $\mu$ and $\mu_0$,
$$G(\mu)=\frac{\mu-\mu_0}{V(\eta)^{1/2}}.$$

Lemma 9. Let $\beta\to Q_\beta,\ \beta\in\Gamma$ denote an exponential family with
$$\frac{\mathrm{d}Q_\beta}{\mathrm{d}Q_0}=\frac{\exp(\beta\cdot x)}{Z(\beta)}.$$
If $\mu_0=0$ and $V(0)=1$ and
$$\mathrm{E}_{Q_0}\left[X^3\right]>0,$$
then $G(\mu)\le\mu$ holds for $\mu$ in a neighborhood of $0$.

Proof: From Proposition 8 we know that there exists $\eta$ between $\mu$ and $0$ such that
$$G(\mu)=\frac{\mu-0}{V(\eta)^{1/2}}.$$
Therefore it is sufficient to prove that $V(\eta)$ is increasing in a neighborhood of $0$. This follows because
$$\frac{\mathrm{d}V(\eta)}{\mathrm{d}\eta}=\frac{\mathrm{d}V(\eta)/\mathrm{d}\beta}{\mathrm{d}\eta/\mathrm{d}\beta}=\frac{\frac{\mathrm{d}^3}{\mathrm{d}\beta^3}\ln Z(\beta)}{\frac{\mathrm{d}^2}{\mathrm{d}\beta^2}\ln Z(\beta)}=\frac{\mathrm{E}\left[(X-\eta)^3\right]}{\mathrm{Var}(X)},$$
where the mean and variance are taken with respect to the element in the exponential family with mean $\eta$. Since $\mathrm{E}[X^3]/\mathrm{Var}(X)>0$ for $\beta=0$, we have $\mathrm{E}[(X-\eta)^3]/\mathrm{Var}(X)>0$ for $\beta$ in a neighborhood of $0$, so $V$ is increasing.
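For the Poisson family the divergence has the closed form $D(\mathrm{Po}(\mu)\|\mathrm{Po}(\nu))=\mu\ln(\mu/\nu)-\mu+\nu$ and $V(\mu)=\mu$, so part 2 of Lemma 6 can be checked directly. The following sketch is ours, not part of the paper:

```python
import math

def D_po(mu, nu):
    """D(Po(mu) || Po(nu)) in closed form."""
    return mu * math.log(mu / nu) - mu + nu

mu, nu = 2.0, 5.0
D = D_po(mu, nu)
eta = (mu - nu) ** 2 / (2 * D)   # solve (mu-nu)^2 / (2 V(eta)) = D with V(eta) = eta
print(D, eta, mu < eta < nu)     # eta does lie between mu and nu
# Since V(t) = t is increasing and eta <= nu, Lemma 6 also gives
# D >= (mu - nu)^2 / (2 nu) for mu <= nu, which is inequality (8) of Section V:
print(D >= (mu - nu) ** 2 / (2 * nu))
```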
III. MOMENT CALCULATIONS

We shall use the notation $x^{\underline{k}}=x(x-1)\cdots(x-k+1)$ for the falling factorial. The factorial moments of an $\alpha$-thinning are easy to calculate:
$$\mathrm{E}\left[(\alpha\circ Y)^{\underline{k}}\right]=\mathrm{E}\left[\left(\sum_{n=1}^{Y}X_n\right)^{\underline{k}}\right]\qquad(2)$$
$$=\mathrm{E}\left[\mathrm{E}\left[\left(\sum_{n=1}^{Y}X_n\right)^{\underline{k}}\,\middle|\,Y\right]\right]\qquad(3)$$
$$=\mathrm{E}\left[\alpha^k Y^{\underline{k}}\right]=\alpha^k\,\mathrm{E}\left[Y^{\underline{k}}\right].$$
Thus thinning scales the factorial moments in the same way as ordinary multiplication scales the ordinary moments.

The binomial distributions, Poisson distributions, geometric distributions, negative binomial distributions, inverse binomial distributions, and generalized Poisson distributions are exponential families with at most cubic variance functions [16], [17]. The thinned family is also exponential.

Theorem 10. Let $V$ be the variance function of an exponential family with $X\in\mathbb{N}_0$ as sufficient statistic, and let $V_\alpha$ denote the variance function of the $\alpha$-thinned family. Then the variance functions $V$ and $V_\alpha$ are related by the equation
$$V_\alpha(x)=\alpha^2 V\!\left(\frac{x}{\alpha}\right)-\alpha x+x.$$

Proof: The variance of a thinned variable can be calculated as
$$\mathrm{Var}(\alpha\circ X)=\mathrm{E}\left[(\alpha\circ X)^{\underline{2}}\right]+\mathrm{E}[\alpha\circ X]-(\mathrm{E}[\alpha\circ X])^2=\alpha^2\mathrm{E}\left[X^{\underline{2}}\right]+\mathrm{E}[\alpha\circ X]-(\mathrm{E}[\alpha\circ X])^2$$
$$=\alpha^2\left(\mathrm{Var}(X)-\mathrm{E}[X]+\mathrm{E}[X]^2\right)+\mathrm{E}[\alpha\circ X]-(\mathrm{E}[\alpha\circ X])^2=\alpha^2 V\!\left(\frac{\mathrm{E}[\alpha\circ X]}{\alpha}\right)-\alpha\,\mathrm{E}[\alpha\circ X]+\mathrm{E}[\alpha\circ X].$$

For instance the variance function of the Poisson distributions is $V(x)=x$, and therefore $V_\alpha(x)=\alpha^2\frac{x}{\alpha}-\alpha x+x=x$, so the thinned family is also Poisson. In general the thinned family has a variance function that is a polynomial of the same order and structure. Therefore not only the Poisson family but all the above mentioned families are conserved under thinning. This kind of variance function calculation can also be used to verify that if $V$ is the variance function of the exponential family based on $P$, then the variance function of the exponential family based on $\frac{1}{n}\circ P^{*n}$ is
$$x\to\frac{V(x)-x}{n}+x.$$
In particular the variance function corresponding to a thinned sum converges to the variance function of the Poisson distributions. This observation can be used to give an alternative proof of the Law of Thin Numbers, but we shall not develop this idea any further in the present paper.

For moment calculations involving sums of thinned variables we shall also use the Poisson-Charlier polynomials [18], which are given by
$$C_k^\lambda(x)=\left(\lambda^k k!\right)^{-1/2}\sum_{\ell=0}^{k}\binom{k}{\ell}(-\lambda)^{k-\ell}x^{\underline{\ell}},$$
where $k\in\mathbb{N}_0$. The Poisson-Charlier polynomials are characterized as normalized orthogonal polynomials with respect to the Poisson distribution $\mathrm{Po}(\lambda)$. The first three Poisson-Charlier polynomials are
$$C_0^\lambda(x)=1,\qquad C_1^\lambda(x)=\frac{x-\lambda}{\lambda^{1/2}},\qquad C_2^\lambda(x)=\frac{x^2-(2\lambda+1)x+\lambda^2}{2^{1/2}\lambda}.$$
A mean value of a Poisson-Charlier polynomial will be called a Poisson-Charlier moment. First we note that if $\mathrm{E}[X]=\lambda$ the second Poisson-Charlier moment is given by
$$\mathrm{E}\left[C_2^\lambda(X)\right]=\frac{\mathrm{Var}(X)-\lambda}{2^{1/2}\lambda}.$$

Let $\kappa$ denote the first value of $k$ such that $\mathrm{E}\left[C_k^\lambda(X)\right]\ne 0$ or, equivalently, $\mathrm{E}\left[X^{\underline{k}}\right]\ne\lambda^k$. Lower bounds on the rate of convergence in the Law of Thin Numbers are essentially given in terms of $\kappa$ and $c=\mathrm{E}\left[C_\kappa^\lambda(X)\right]$.

Proposition 11. The Poisson-Charlier moments satisfy
$$\mathrm{E}\left[C_k^\lambda\!\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\right)\right]=\frac{\mathrm{E}\left[C_k^\lambda(X)\right]}{n^{k-1}}$$
for $k=0,1,\ldots,\kappa+1$.

Proof: This follows by straightforward calculations based on Lemma 25, which can be found in the appendix.

For some moment calculations the following result of Khoklov is useful.

Lemma 12 (Khoklov [19]).¹ 
$$C_k^\lambda(x)\,C_\ell^\lambda(x)=(-1)^{k+\ell}\left(\lambda^{k+\ell}k!\,\ell!\right)^{1/2}\sum_{m=0}^{k+\ell}c_m C_m^\lambda(x),$$
where $c_m$ as a function of $k$, $\ell$ and $\lambda$ is given by
$$c_m=\sum_{n=0}^{m}\frac{\sum_{\mu=0}^{k}\sum_{\nu=0}^{\ell}\binom{k}{\mu}\binom{\ell}{\nu}\mu^{\underline{n}}\nu^{\underline{n}}(\mu+\nu-n)^{\underline{m}}(-1)^{\mu+\nu}}{(m!\,\lambda^m)^{1/2}\,n!\,\lambda^n}.$$

¹The original formula in [19] contains a typo in that the factor $(-1)^{k+\ell}$ is missing, but the proof is correct.

In the appendix we use Lemma 12 to prove the following result.

Lemma 13.
$$\left(C_k^\lambda(x)\right)^2=\lambda^k k!\sum_{m=0}^{2k}c_m C_m^\lambda(x),$$
where $c_m$ as a function of $k$ and $\lambda$ is given by
$$c_m=(m!\,\lambda^m)^{-1/2}\sum_{n=0}^{m}\frac{\left(k^{\underline{n}}\right)^3 n^{\underline{k-n}}}{n!\,\lambda^n}.$$
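The orthonormality that characterizes the Poisson-Charlier polynomials is easy to confirm numerically on a truncated support. The following sketch is ours, not part of the paper:

```python
import math

def falling(x, k):
    out = 1.0
    for i in range(k):
        out *= x - i
    return out

def charlier(k, lam, x):
    """Poisson-Charlier polynomial C_k^lam(x) as defined above."""
    s = sum(math.comb(k, l) * (-lam) ** (k - l) * falling(x, l) for l in range(k + 1))
    return s / math.sqrt(lam ** k * math.factorial(k))

lam, N = 2.0, 100
po, p = [], math.exp(-lam)
for x in range(N):
    po.append(p)
    p *= lam / (x + 1)

# Orthonormality with respect to Po(lam): sum_x C_j(x) C_k(x) Po(lam, x) = delta_jk
for j in range(4):
    row = [sum(charlier(j, lam, x) * charlier(k, lam, x) * po[x] for x in range(N))
           for k in range(4)]
    print([round(v, 8) for v in row])   # prints (numerically) the identity matrix
```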
Lemma 14. For a Poisson random variable $X$ with mean value $\lambda$ we have
$$\mathrm{E}\left[\left(C_k^\lambda(X)\right)^3\right]>0$$
for any $k\in\mathbb{N}_0$.

Proof: According to Lemma 13 we have
$$\mathrm{E}\left[\left(C_k^\lambda(X)\right)^3\right]=\mathrm{E}\left[\left(\lambda^k k!\sum_{m=0}^{2k}c_m C_m^\lambda(X)\right)C_k^\lambda(X)\right]=\lambda^k k!\,c_k,$$
where $c_m$ is defined in Lemma 13. Therefore it is sufficient to prove that
$$\sum_{n=0}^{k}\frac{\left(k^{\underline{n}}\right)^3 n^{\underline{k-n}}}{n!\,\lambda^n}>0.$$
This follows because $\left(k^{\underline{n}}\right)^3 n^{\underline{k-n}}$ is always non-negative, and it is positive when $k/2\le n\le k$.

IV. EXISTENCE OF MINIMUM INFORMATION DISTRIBUTIONS

Let $X$ be a random variable for which the moments of order $1,2,\ldots,\ell$ exist. We shall assume that $\mathrm{E}[X]=\lambda$. We are interested in minimizing the information divergence $D(X\|\mathrm{Po}(\lambda))$ under linear conditions on the moments of $X$, and we derive conditions for a minimum information distribution to exist. We shall use $D(\mathcal{C}\|\mathrm{Po}(\lambda))$ as notation for $\inf_{P\in\mathcal{C}}D(P\|\mathrm{Po}(\lambda))$.

Lemma 15. For some fixed set $(h_1,\cdots,h_\ell)\in\mathbb{R}^\ell$, let $K$ be the convex set of distributions on $\mathbb{N}_0$ for which the first $\ell$ moments are defined and which satisfy the following conditions:
$$\mathrm{E}_P\left[C_k^\lambda(X)\right]=h_k,\quad\text{for }k=1,2,\cdots,\ell-1;\qquad(4)$$
$$\mathrm{E}_P\left[C_\ell^\lambda(X)\right]\le h_\ell.\qquad(5)$$
If $D(K\|\mathrm{Po}(\lambda))<\infty$ then the minimum information projection of $\mathrm{Po}(\lambda)$ on $K$ exists.

Proof: Let $\vec G\in\mathbb{R}^{\ell-1}$ be a vector and let $\mathcal{C}_{\vec G}$ be the set of distributions satisfying the following inequalities:
$$\mathrm{E}\left[C_\ell^\lambda(X)-h_\ell-\sum_{k<\ell}G_k\left(C_k^\lambda(X)-h_k\right)\right]\le 0.$$
We see that the set $\mathcal{C}_{\vec G}$ is tight because $C_\ell^\lambda(x)-h_\ell-\sum_{k<\ell}G_k\left(C_k^\lambda(x)-h_k\right)\to\infty$ for $x\to\infty$. Therefore the intersection $K=\bigcap_{\vec G\in\mathbb{R}^{\ell-1}}\mathcal{C}_{\vec G}$ is compact. There exists a distribution $P^*\in K$ such that the information divergence $D(P\|\mathrm{Po}(\lambda))$ is minimal because $K$ is compact.

Theorem 16. Let $\mathcal{C}$ be the set of distributions on $\mathbb{N}_0$ for which the first $\ell$ moments are defined and satisfy the following equations:
$$\mathrm{E}\left[C_k^\lambda(X)\right]=h_k\quad\text{for }k=1,2,\cdots,\ell.\qquad(6)$$
Assume that $D(\mathcal{C}\|\mathrm{Po}(\lambda))<\infty$ and $\ell\ge 2$. Consider the following three cases:
1) $h_k=0$ for $k<\ell$ and $h_\ell>0$.
2) $h_k=0$ for $k<\ell$ and $h_\ell<0$.
3) $h_k=0$ for $k<\ell-1$ and $h_{\ell-1}>0$.
In case 1 no minimizer exists and $D(\mathcal{C}\|\mathrm{Po}(\lambda))=0$. In cases 2 and 3 there exists a distribution $P^*\in\mathcal{C}$ that minimizes $D(P\|\mathrm{Po}(\lambda))$.

Proof: Case 1. If a minimizer existed it would be an element of the corresponding exponential family, but the partition function cannot be finite because $h_\ell>0$ and $\ell\ge 2$.

For cases 2 and 3 let $P=P^*$ be the minimum information distribution satisfying the conditions (4) and (5), which exists by Lemma 15.

Case 2. Assume that $h_k=0$ for $k<\ell$ and $h_\ell<0$. Assume also that $\mathrm{E}_{P^*}\left[C_\ell^\lambda(X)\right]<h_\ell$. Define $P_\theta=\theta P^*+(1-\theta)\mathrm{Po}(\lambda)$. Then the conditions (4) hold for $P=P_\theta$ and
$$\mathrm{E}_{P_\theta}\left[C_\ell^\lambda(X)\right]=\theta\,\mathrm{E}_{P^*}\left[C_\ell^\lambda(X)\right]+(1-\theta)\,\mathrm{E}_{\mathrm{Po}(\lambda)}\left[C_\ell^\lambda(X)\right]=\theta\,\mathrm{E}_{P^*}\left[C_\ell^\lambda(X)\right].$$
Thus $\mathrm{E}_{P_\theta}\left[C_\ell^\lambda(X)\right]=h_\ell$ if
$$\theta=\frac{h_\ell}{\mathrm{E}_{P^*}\left[C_\ell^\lambda(X)\right]}\in\left]0,1\right[.$$
Therefore $P_\theta$ satisfies (6), but
$$D(P_\theta\|\mathrm{Po}(\lambda))\le\theta D(P^*\|\mathrm{Po}(\lambda))<D(P^*\|\mathrm{Po}(\lambda))$$
and we have a contradiction.

Case 3. Now assume that $h_k=0$ for $k<\ell-1$ and $h_{\ell-1}>0$. Moreover, assume that $\mathrm{E}_{P^*}\left[C_\ell^\lambda(X)\right]<h_\ell$. Using the result of case 1 we see that there exists a distribution $\tilde P$ for which the first $\ell$ moments exist and the first $\ell-1$ moments satisfy (4), but with $D(\tilde P\|\mathrm{Po}(\lambda))<D(P^*\|\mathrm{Po}(\lambda))$. Define $P_\theta=\theta P^*+(1-\theta)\tilde P$. Then the conditions (4) hold for $P=P_\theta$ and
$$\mathrm{E}_{P_\theta}\left[C_\ell^\lambda(X)\right]=\theta\,\mathrm{E}_{P^*}\left[C_\ell^\lambda(X)\right]+(1-\theta)\,\mathrm{E}_{\tilde P}\left[C_\ell^\lambda(X)\right].\qquad(7)$$
Therefore $\mathrm{E}_{P_\theta}\left[C_\ell^\lambda(X)\right]\le h_\ell$ for $\theta$ sufficiently close to $1$, but $D(P_\theta\|\mathrm{Po}(\lambda))\le\theta D(P^*\|\mathrm{Po}(\lambda))+(1-\theta)D(\tilde P\|\mathrm{Po}(\lambda))<D(P^*\|\mathrm{Po}(\lambda))$ and we have a contradiction. Therefore $P^*$ satisfies the equation $\mathrm{E}\left[C_\ell^\lambda(X)\right]=h_\ell$.

For the applications we have in mind it will be easy to check the condition
$$D(\mathcal{C}\|\mathrm{Po}(\lambda))<\infty,$$
but in general it may be difficult even to determine simple necessary and sufficient conditions for $\mathcal{C}\ne\emptyset$ in terms of a set of specified moments.
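To make the construction concrete, here is a small sketch (ours, not from the paper) for case 2 of Theorem 16 with $\ell=2$ and $h_2<0$: the minimizer lies in the exponential family obtained by tilting $\mathrm{Po}(\lambda)$ with $C_2^\lambda$ as sufficient statistic, and since the mean of $C_2^\lambda$ is increasing in the tilting parameter, that parameter can be found by bisection over $\beta\le 0$:

```python
import math

lam, N = 2.0, 120
po, p = [], math.exp(-lam)
for x in range(N):
    po.append(p)
    p *= lam / (x + 1)

def c2(x):
    return (x * x - (2 * lam + 1) * x + lam ** 2) / (2 ** 0.5 * lam)

def tilt(beta):
    """Element of the exponential family with C_2^lam as sufficient statistic."""
    w = [px * math.exp(beta * c2(x)) for x, px in enumerate(po)]
    Z = sum(w)
    return [wi / Z for wi in w], Z

def c2_mean(q):
    return sum(qx * c2(x) for x, qx in enumerate(q))

# Bisect on beta <= 0 until E_beta[C_2] = h_2 < 0.
h2, lo, hi = -0.2, -5.0, 0.0
for _ in range(80):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if c2_mean(tilt(mid)[0]) < h2 else (lo, mid)
beta = (lo + hi) / 2
q, Z = tilt(beta)
D = beta * h2 - math.log(Z)    # D(P* || Po(lam)) = beta * E[C_2] - ln Z(beta)
print(beta, D)
```

The printed divergence can be compared with the lower bound $h_2^2/2$ that Theorem 19 below provides.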
V. LOWER BOUNDS

First we consider the exponential family based on the distribution $\mathrm{Po}(\lambda)$. The variance function of this exponential family is $V(\mu)=\mu$, which is an increasing function. Hence
$$G(\mu)\le\frac{\mu-\lambda}{\lambda^{1/2}}.$$
Squaring this inequality for $\mu\le\lambda$ gives
$$D(\mathrm{Po}(\mu)\|\mathrm{Po}(\lambda))\ge\frac{(\mu-\lambda)^2}{2\lambda}.\qquad(8)$$
Let $X$ be a random variable with values in $\mathbb{N}_0$ and with mean $\mu$. Then the divergence $D(X\|\mathrm{Po}(\lambda))$ is minimal if the distribution of $X$ is an element of the associated exponential family, i.e.
$$D(X\|\mathrm{Po}(\lambda))\ge D(\mathrm{Po}(\mu)\|\mathrm{Po}(\lambda))\ge\frac{\left(\mathrm{E}\left[C_1^\lambda(X)\right]\right)^2}{2}.$$
We conjecture that a result similar to (8) holds for any order of the Poisson-Charlier polynomial.

Conjecture 17. For any random variable $X$ with values in $\mathbb{N}_0$ and for any $k\in\mathbb{N}$ we have
$$D(X\|\mathrm{Po}(\lambda))\ge\frac{\left(\mathrm{E}\left[C_k^\lambda(X)\right]\right)^2}{2}\qquad(9)$$
if $\mathrm{E}\left[C_k^\lambda(X)\right]\le 0$.

We have not been able to prove this conjecture, but we can prove some partial results.

Theorem 18. For any random variable $X$ with values in $\mathbb{N}_0$ and any $k\in\mathbb{N}$ there exists $\varepsilon>0$ such that for $\mathrm{E}\left[C_k^\lambda(X)\right]\in[-\varepsilon,0]$ inequality (9) holds.

Proof: Let $Z(\beta)$ denote the partition function of the exponential family based on $\mathrm{Po}(\lambda)$ with $C_k^\lambda(X)$ as sufficient statistic. The function $x\to C_k^\lambda(x)$ is lower bounded, so for any $\beta\le 0$ the partition function $Z(\beta)$ is finite. Therefore for any $c\in\left]\min_x C_k^\lambda(x),0\right]$ there exists an element in the exponential family with $c$ as mean value of $C_k^\lambda$. This distribution minimizes information divergence under the constraint $\mathrm{E}\left[C_k^\lambda(X)\right]=c$. Therefore we can use Lemma 9 with $X$ replaced by $C_k^\lambda(X)$, and we just need to prove that $\mathrm{E}\left[\left(C_k^\lambda(X)\right)^3\right]>0$, which is done in Lemma 14 in Section III.

Conjecture 17 can be proved for $k=2$. The proof is quite long in order to cover all cases, so the main part of the proof is postponed to the appendix.

Theorem 19. For any random variable $X$ with values in $\mathbb{N}_0$ the inequality (9) holds for $k=2$ if $\mathrm{E}\left[C_2^\lambda(X)\right]\le 0$.

Remark 20. Note that there is no assumption on the mean of $X$ in this theorem. Theorem 3 is a reformulation of Theorem 19 under the extra assumption that $\mathrm{E}[X]=\lambda$.

Proof: Define $\beta_0$ as the negative solution to
$$\beta^2\exp\left(\beta^2\right)=1.$$
Numerical calculation gives $\beta_0=-0.75309$. We observe that $\beta_0$ is slightly less than $-2^{-1/2}$, and we divide the proof into two cases depending on the value of $\mathrm{E}\left[C_2^\lambda(X)\right]$ compared with $\beta_0$. The minimum of the function $C_2^\lambda(x)$ with $\mathbb{R}$ as domain is $-2^{-1/2}-1/(4\cdot 2^{1/2}\lambda)$, attained at $x=\lambda+1/2$. Hence $C_2^\lambda(x)\ge\beta_0$ for all $x$ if $\lambda\ge 3.8444$, and then $\mathrm{E}\left[C_2^\lambda(X)\right]\ge\beta_0$. Note also that $\mathrm{E}\left[C_2^\lambda(X)\right]\ge C_2^\lambda(\mathrm{E}[X])$ according to Jensen's inequality. The Poisson-Charlier polynomial of order 2 satisfies
$$C_2^\lambda(\lambda)=C_2^\lambda(\lambda+1)=-2^{-1/2},$$
so $C_2^\lambda(x)\ge-2^{-1/2}$ for $x\le\lambda$ and for $x\ge\lambda+1$. Hence $\mathrm{E}\left[C_2^\lambda(X)\right]\ge-2^{-1/2}\ge\beta_0$ if $\mathrm{E}[X]\le\lambda$ or if $\mathrm{E}[X]\ge\lambda+1$, so $\mathrm{E}\left[C_2^\lambda(X)\right]\ge\beta_0$ except if $X$ is very concentrated near $\lambda+1/2$.

Case 1, where $\mathrm{E}\left[C_2^\lambda(X)\right]\ge\beta_0$. Let $\mathrm{Po}(\lambda)_\beta$ denote the exponential family based on $\mathrm{Po}(\lambda)$ with $C_2^\lambda(x)$ as sufficient statistic and partition function
$$Z(\beta)=\sum_{x=0}^{\infty}\exp\left(\beta C_2^\lambda(x)\right)\mathrm{Po}(\lambda,x).$$
Let $\mathrm{Po}(\lambda)^t$ denote the element in this exponential family with $\mathrm{E}\left[C_2^\lambda\right]=t$. We want to prove that
$$D\left(\mathrm{Po}(\lambda)^t\,\middle\|\,\mathrm{Po}(\lambda)\right)\ge\frac{t^2}{2}$$
for $t\in[\beta_0,0]$. The inequality is obvious for $t=0$, so it is sufficient to prove that
$$\frac{\mathrm{d}D}{\mathrm{d}t}\le t$$
for $t\in[\beta_0,0]$. As in the proof of Lemma 6 we have $\frac{\mathrm{d}D}{\mathrm{d}t}=\hat\beta(t)$, so it is sufficient to prove that $\hat\beta(t)\le t$ for $t\in[\beta_0,0]$. If $\hat\beta(t)\le\beta_0$ this is obvious, so we just have to prove that
$$\beta\le t_\beta=\frac{Z'(\beta)}{Z(\beta)}$$
for $\beta\in[\beta_0,0]$. This inequality is fulfilled for $\beta=0$, so we differentiate once more, and we just have to prove that
$$1\ge\frac{Z(\beta)Z''(\beta)-(Z'(\beta))^2}{(Z(\beta))^2}.$$
As in Lemma 9, we want to demonstrate that $\mathrm{Var}(X_\beta)\le 1$. Since the variance is
$$\frac{\mathrm{d}^2}{\mathrm{d}\beta^2}\ln Z(\beta)=\frac{Z''(\beta)Z(\beta)-(Z'(\beta))^2}{(Z(\beta))^2}<\frac{Z''(\beta)}{Z(\beta)},$$
we will prove the theorem by giving bounds on $Z''(\beta)$. We observe that $Z(0)=1$ and that $Z(\beta)>0$. Since $Z'(0)=0$ and $Z''(\beta)>0$, we have $Z(\beta)\ge 1$ for all values of $\beta$. Therefore it is sufficient to prove that $Z''(\beta)\le 1$ for $\beta\in[\beta_0,0]$. The function
$$Z''(\beta)=\sum_{x=0}^{\infty}C_2^\lambda(x)^2\exp\left(\beta C_2^\lambda(x)\right)\mathrm{Po}(\lambda,x)$$
is convex, so it is sufficient to prove the inequality $Z''(\beta)\le 1$ for $\beta=0$ and $\beta=\beta_0$. At $\beta=0$ we have $Z''(0)=\mathrm{E}\left[\left(C_2^\lambda(X)\right)^2\right]=1$ by orthonormality, so only $\beta=\beta_0$ remains.

Consider the function $f(x)=x^2\exp(\beta_0 x)$ with $f'(x)=(2+\beta_0 x)x\exp(\beta_0 x)$. The function $f$ is decreasing for $x\le 0$, has a minimum at $x=0$, is increasing for $0\le x\le-2/\beta_0$, has a local maximum $4\exp(-2)/\beta_0^2$ at $x=-2/\beta_0$, and is decreasing for $x\ge-2/\beta_0$. The local maximum at $x=-2/\beta_0$ has value $0.95451<1$, and $f(\beta_0)=\beta_0^2\exp(\beta_0^2)=1$ by the definition of $\beta_0$. Hence $f(x)\le 1$ for $x\ge\beta_0$. The minimum of the function $C_2^\lambda(x)$ with $\mathbb{R}$ as domain is $-2^{-1/2}-1/(4\cdot 2^{1/2}\lambda)$. Hence $C_2^\lambda(x)\ge\beta_0$ if $\lambda\ge 3.8444$, so for such $\lambda$ we have $f\left(C_2^\lambda(x)\right)\le 1$ for all $x$, and $Z''(\beta_0)\le 1$.

[Figure 1: Plot of the function $f(x)=x^2\exp(\beta_0 x)$.]

The graph of $x\to C_2^\lambda(x)$ is a parabola, and we have $C_2^\lambda(\lambda)=C_2^\lambda(\lambda+1)=-2^{-1/2}$. For all values $x\notin\left]\lambda,\lambda+1\right[$ we have $f\left(C_2^\lambda(x)\right)\le 1$. Since $x$ can only assume integer values, only $x=\lceil\lambda\rceil$ can contribute to the mean value with a value greater than $1$. A careful inspection of the different cases will show that although $x=\lceil\lambda\rceil$ contributes to the mean with a value $f(x)>1$, this contribution is averaged out by some other value of $x$. The details can be found in the appendix.

Case 2, where $\mathrm{E}\left[C_2^\lambda(X)\right]<\beta_0$. In this case the distribution of $X$ is strongly concentrated near $\lambda+1/2$, where the minimum of $C_2^\lambda(x)$ is attained. According to Pinsker's inequality,
$$D(P\|\mathrm{Po}(\lambda))\ge\frac{1}{2}\|P-\mathrm{Po}(\lambda)\|^2,$$
so it is sufficient to prove that
$$\|P-\mathrm{Po}(\lambda)\|\ge\left|\mathrm{E}\left[C_2^\lambda(X)\right]\right|.\qquad(10)$$
Again, we have to divide into a number of cases. This is done in the appendix.
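Theorem 19 can be probed numerically. The sketch below (ours, not part of the paper) tests the bound on binomial distributions $\mathrm{Bi}(m,\lambda/m)$, for which $\mathrm{E}\left[C_2^\lambda(X)\right]=-\lambda/(2^{1/2}m)<0$:

```python
import math

def D_vs_poisson(p, lam):
    """D(P || Po(lam)) for P given as point masses on 0, 1, 2, ..."""
    d, po = 0.0, math.exp(-lam)
    for x, px in enumerate(p):
        if px > 0.0:
            d += px * math.log(px / po)
        po *= lam / (x + 1)
    return d

def c2_mean(p, lam):
    """E[C_2^lam(X)] computed from the first two moments."""
    m1 = sum(x * px for x, px in enumerate(p))
    m2 = sum(x * x * px for x, px in enumerate(p))
    return (m2 - (2 * lam + 1) * m1 + lam ** 2) / (2 ** 0.5 * lam)

for lam in (0.5, 1.0, 3.0):
    for m in (4, 8, 20):
        p = [math.comb(m, k) * (lam / m) ** k * (1 - lam / m) ** (m - k)
             for k in range(m + 1)]
        c = c2_mean(p, lam)   # negative, since Var = lam(1 - lam/m) < lam
        print(lam, m, D_vs_poisson(p, lam) >= c * c / 2)   # True in every case
```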
VI. ASYMPTOTIC LOWER BOUNDS

This section combines results from Sections III, IV, and V. We now present lower bounds on the rate of convergence in the Law of Thin Numbers in the sense of information divergence. The key idea is that we bound $D(P\|\mathrm{Po}(\lambda))\ge D(P^*\|\mathrm{Po}(\lambda))$, where $P^*$ is the minimum information distribution in a class containing $P$. Using the construction of $P^*$ found in Section IV, we can find an explicit expression for the right hand side.

Theorem 21. Let $X$ be a random variable with values in $\mathbb{N}_0$. If $\mathrm{E}\left[C_\kappa^\lambda(X)\right]\le 0$ then there exists $n_0$ such that
$$n^{2\kappa-2}D\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\,\middle\|\,\mathrm{Po}(\lambda)\right)\ge\frac{\left(\mathrm{E}\left[C_\kappa^\lambda(X)\right]\right)^2}{2}\qquad(11)$$
for $n\ge n_0$.

Proof: For $k=\kappa$ there exists $\varepsilon>0$ such that inequality (9) holds when the condition in Theorem 18 is fulfilled. Now, by Proposition 11,
$$\mathrm{E}\left[C_\kappa^\lambda\!\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\right)\right]=\frac{\mathrm{E}\left[C_\kappa^\lambda(X)\right]}{n^{\kappa-1}}\in[-\varepsilon,0]$$
for sufficiently large values of $n$, implying that
$$D\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\,\middle\|\,\mathrm{Po}(\lambda)\right)\ge\frac{1}{2}\left(\frac{\mathrm{E}\left[C_\kappa^\lambda(X)\right]}{n^{\kappa-1}}\right)^2,$$
and the lower bound (11) follows.

If the distribution of $X$ is ultra log-concave we automatically have $\mathrm{E}\left[C_\kappa^\lambda(X)\right]\le 0$, and we conjecture that the asymptotic lower bound is tight for ultra log-concave distributions.
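As an illustration of Theorem 21 (our sketch, not part of the paper), take $X\sim\mathrm{Bi}(2,1/2)$, which is ultra log-concave with mean $\lambda=1$; here $\kappa=2$, $\mathrm{E}\left[C_2^\lambda(X)\right]=-1/(2\cdot 2^{1/2})$, and the bound (11) is $\lambda^2/16$:

```python
import math

def convolve(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def thin(p, alpha):
    q = [0.0] * len(p)
    for l, pl in enumerate(p):
        for k in range(l + 1):
            q[k] += pl * math.comb(l, k) * alpha ** k * (1 - alpha) ** (l - k)
    return q

def D_vs_poisson(p, lam):
    d, po = 0.0, math.exp(-lam)
    for x, px in enumerate(p):
        if px > 0.0:
            d += px * math.log(px / po)
        po *= lam / (x + 1)
    return d

lam = 1.0
P = [0.25, 0.5, 0.25]        # Bi(2, 1/2): ultra log-concave with mean lam = 1
bound = lam ** 2 / 16        # (E[C_2])^2 / 2 = (lam / (2*sqrt(2)))^2 / 2
for n in (2, 4, 8, 16, 32):
    conv = P
    for _ in range(n - 1):
        conv = convolve(conv, P)
    d = D_vs_poisson(thin(conv, 1.0 / n), lam)
    print(n, n ** 2 * d, bound)   # compare with the asymptotic lower bound
```

In this example the thinned convolution is again binomial, $\frac1n\circ\mathrm{Bi}(2n,1/2)=\mathrm{Bi}(2n,1/(2n))$, so by (1) the quantity $n^2 D$ converges to exactly $\lambda^2/16$, in line with the conjectured tightness for ultra log-concave distributions.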
A similar lower bound on the rate of convergence can be achieved even if $\mathrm{E}\left[C_\kappa^\lambda(X)\right]>0$, but then it requires the existence of a moment of higher order to "stabilize" the moment of order $\kappa$. Thus we shall assume the existence of moments of all orders less than or equal to $\kappa+1$.

Theorem 22. If $X_1,X_2,\ldots$ is a sequence of independent identically distributed discrete random variables for which all Poisson-Charlier moments of order less than $\kappa$ are zero, and for which moments of order up to $\kappa+1$ exist, then
$$\liminf_{n\to\infty}n^{2\kappa-2}D\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\,\middle\|\,\mathrm{Po}(\lambda)\right)\ge\frac{\left(\mathrm{E}\left[C_\kappa^\lambda(X)\right]\right)^2}{2}.$$

Proof: Define $t=n^{1-\kappa}$. Then
$$\mathrm{E}\left[C_\kappa^\lambda\!\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\right)\right]=t\cdot\mathrm{E}\left[C_\kappa^\lambda(X)\right]$$
and
$$\mathrm{E}\left[C_{\kappa+1}^\lambda\!\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\right)\right]=t^{\frac{\kappa}{\kappa-1}}\cdot\mathrm{E}\left[C_{\kappa+1}^\lambda(X)\right].$$
Let $P_t^*$ denote the minimum information distribution satisfying $\mathrm{E}_{P_t^*}\left[C_\kappa^\lambda(X)\right]=a$ and $\mathrm{E}_{P_t^*}\left[C_{\kappa+1}^\lambda(X)\right]=b$, where
$$a=t\cdot\mathrm{E}\left[C_\kappa^\lambda(X)\right]\quad\text{and}\quad b=t^{\frac{\kappa}{\kappa-1}}\cdot\mathrm{E}\left[C_{\kappa+1}^\lambda(X)\right].$$
Then $P_t^*$ is an element in the exponential family and
$$P_t^*(j)=\frac{\exp\left(\beta_1 C_\kappa^\lambda(j)+\beta_2 C_{\kappa+1}^\lambda(j)\right)}{Z(\beta_1,\beta_2)}\,\mathrm{Po}(\lambda,j),$$
where $Z(\beta_1,\beta_2)$ is the partition function and $\beta_1$ and $\beta_2$ are determined by the conditions. Thus
$$D(P_t^*\|\mathrm{Po}(\lambda))=\sum_{x=0}^{\infty}P_t^*(x)\ln\frac{\exp\left(\beta_1 C_\kappa^\lambda(x)+\beta_2 C_{\kappa+1}^\lambda(x)\right)}{Z(\beta_1,\beta_2)}=\sum_{x=0}^{\infty}\left(\beta_1 C_\kappa^\lambda(x)+\beta_2 C_{\kappa+1}^\lambda(x)\right)P_t^*(x)-\ln Z(\beta_1,\beta_2)$$
$$=\beta_1\,t\,\mathrm{E}\left[C_\kappa^\lambda(X)\right]+\beta_2\,t^{\frac{\kappa}{\kappa-1}}\,\mathrm{E}\left[C_{\kappa+1}^\lambda(X)\right]-\ln Z(\beta_1,\beta_2).$$
Therefore
$$\frac{\mathrm{d}}{\mathrm{d}t}D(P_t^*\|\mathrm{Po}(\lambda))=\left(\left.\begin{pmatrix}\mathrm{d}a/\mathrm{d}t\\ \mathrm{d}b/\mathrm{d}t\end{pmatrix}\right|\begin{pmatrix}\partial D/\partial a\\ \partial D/\partial b\end{pmatrix}\right),$$
where $(\cdot\,|\,\cdot)$ denotes the inner product. Thus
$$\frac{\mathrm{d}^2}{\mathrm{d}t^2}D(P_t^*\|\mathrm{Po}(\lambda))=\left(\left.\begin{pmatrix}\mathrm{d}^2a/\mathrm{d}t^2\\ \mathrm{d}^2b/\mathrm{d}t^2\end{pmatrix}\right|\begin{pmatrix}\partial D/\partial a\\ \partial D/\partial b\end{pmatrix}\right)+\left(\begin{pmatrix}\mathrm{d}a/\mathrm{d}t\\ \mathrm{d}b/\mathrm{d}t\end{pmatrix}\left|\begin{pmatrix}\partial^2 D/\partial a^2 & \partial^2 D/\partial a\partial b\\ \partial^2 D/\partial a\partial b & \partial^2 D/\partial b^2\end{pmatrix}\right|\begin{pmatrix}\mathrm{d}a/\mathrm{d}t\\ \mathrm{d}b/\mathrm{d}t\end{pmatrix}\right)\to\mathrm{E}\left[C_\kappa^\lambda(X)\right]\frac{\partial^2 D}{\partial a^2}\,\mathrm{E}\left[C_\kappa^\lambda(X)\right]=\left(\mathrm{E}\left[C_\kappa^\lambda(X)\right]\right)^2$$
as $t\to 0$, where the physics notation $(\vec u\,|\,A\,|\,\vec v)$ for $(\vec u\,|\,A\vec v)$ is used when $A$ is a matrix. (In the limit the first term and the mixed terms vanish, since $\mathrm{d}b/\mathrm{d}t\to 0$ and the gradient of $D$ vanishes at the minimum $t=0$, while $\partial^2 D/\partial a^2\to 1$ because $C_\kappa^\lambda$ is normalized under $\mathrm{Po}(\lambda)$.) Since the thinned sum satisfies the constraints defining $P_t^*$, we get
$$\liminf_{n\to\infty}n^{2\kappa-2}D\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\,\middle\|\,\mathrm{Po}(\lambda)\right)\ge\liminf_{t\to 0}t^{-2}D(P_t^*\|\mathrm{Po}(\lambda))\ge\frac{\left(\mathrm{E}\left[C_\kappa^\lambda(X)\right]\right)^2}{2},$$
which proves the theorem.
beasequence 1 2 of iid random variables and let U denote their normalized n D aX +bkΦ aµ+b,a2σ2 sum. In [22] it was provedthat the rate of convergencein the ≈D ⌊aX +b⌋kPo a2σ2 . (12) information theoretic version of the Central Limit Theorem (cid:0) (cid:0) (cid:1)(cid:1) satisfies Next we use Theorem 19 to g(cid:0)et (cid:0) (cid:1)(cid:1) (E[H (X )])2 liminfn2ℓ−2D(U kΦ(0,1))≥ 2ℓ 1 E Ca2σ2(⌊aX+b⌋) 2 n→∞ n 2 2 D ⌊aX+b⌋kPo a2σ2 ≥ = Theorem 23 implies the stronger result that h 2 i 1 (cid:0)Var⌊aX +b⌋(cid:0)−1 2(cid:1)(cid:1)= 1 Var ⌊aXa+b⌋ −1 2. n2ℓ−2D(UnkΦ(0,1))≥ (E[H2ℓ2(X1)])2 (13) 4 a2σ2 4 (cid:16)σ2 (cid:17)  (cid:18) (cid:19) 2Notethat[22]usedadifferentnormalizationoftheHermitepolynomials.   9 holds eventually. We conjecture that the Inequality (13) holds Thus, this sum is only non-zero if a = k−n. Similarly the foralln,andthatthislowerboundwillgivethecorrectrateof sum k k νn(-1)ν(ν−n)b isonlynon-zeroifb=k−n. ν=0 ν convergenceintheinformationtheoreticversionoftheCentral Thus c = k−2(k−n) = 2n−k and the condition c ≥ 0 P (cid:0) (cid:1) Limit Theorem under weak regularity conditions . implies that n≥k/2. Hence VIII. ACKNOWLEDGMENT k k k k µnνn(µ+ν−n)k(−1)µ+ν Lemma 6 was developed in close collaboration with Peter µ ν Grünwald. µX=0νX=0(cid:18) (cid:19)(cid:18) (cid:19) k = n2n−kk!(−1)kk!(-1)k IX. APPENDIX (cid:18) k−n k−n 2n−k (cid:19) The following result is a multinomial version of Vander- =(kn)3nk−n, monde’s identity and is easily proved by induction. whichisalwaysnon-negativeanditispositiveifk/2≤n≤k. Lemma 24. The falling factorial satisfies the multinomial expansion, i.e. Proof of Theorem 19 Case 1: The theorem holds if λ∈ Y k Y N,andfromnowonwewillassumethatλ6∈N.Intheinterval k k Xx = k k ··· k Xxx . ]λ,λ+1] the ceiling ⌈λ⌉ is the only integer and C2λ(x) is xX=1 ! PXkx=k(cid:18) 1 2 Y (cid:19)xY=1 minimal for x=⌈λ⌉. Therefore We apply this result on the thinning operation. d2Z Lemma25. LetX1,X2,...beasequenceofindependentiden- dβ2 = Po(λ,x)f C2λ(x) ticallydistributeddiscreterandomvariablesalldistributedlike x=⌊Xλ⌋,⌈λ⌉ (cid:0) (cid:1) X. Then + Po(λ,x)f Cλ(x) 2 k 1 n x6=⌊Xλ⌋,⌈λ⌉ (cid:0) (cid:1) En ◦ Xj = ≤ Po(λ,x)f C2λ(x) + Po(λ,x)  Xj=1  x=⌊Xλ⌋,⌈λ⌉ (cid:0) (cid:1) x6=⌊Xλ⌋,⌈λ⌉ λk ,   for k ≤κ−1; so we just have to show that λk+ E[Xκ]−λk , for k =κ ;  nκ−1  λκ+1+ (n−1)(κ+12)λn(κE[Xκ]−λκ) for k =κ+1. x=⌊Xλ⌋,⌈λ⌉Po(λ,x)f(cid:0)C2λ(x)(cid:1)≤x=⌊Xλ⌋,⌈λ⌉Po(λ,x). E[Xκ+1]−λκ+1 + , Proof: These equantκions follow from the Vandermonde AsufftfiecriednivtitsoiopnrobvyePthoa(tλ,⌊λ⌋) and 1+ ⌈λλ⌉ we see that it is Identity for factorials combined with Equation (2). Proofof Lemma 13: For any n satisfying 0≤n≤k we ⌈λ⌉f Cλ(⌊λ⌋) +λf Cλ(⌈λ⌉) can use the Vandermonde Identity for factorials to get 2 2 ≤1. ⌈λ⌉+λ (cid:0) (cid:1) (cid:0) (cid:1) k k k k µnνn(µ+ν−n)k(-1)µ+ν = At this point the proof splits into the three sub-cases: λ ∈ µ ν µ=0ν=0(cid:18) (cid:19)(cid:18) (cid:19) [0,1], λ∈]1,2], and λ∈]2,∞[. XX k k µn(µ−n)a(-1)µ Sub-case λ < 1 For λ ∈ ]0,1[ the ceiling of λ is 1 and akbc nc Pµ=0(cid:0)µ(cid:1) × . C2λ(0)= 21λ/2 and C2λ(1)= λ21−/22. We have a+b (cid:18) (cid:19) k k νn(ν−n)b(-1)ν +Xc=k  ν=0 ν  f λ +λf λ−2 Now  P (cid:0) (cid:1)  21/2 21/2 (cid:16) (cid:17)1+λ (cid:16) (cid:17) k µk µn(µ−n)a(-1)µ = λ2exp β021λ/2 +(λ−2)2λexp β02λ1−/22 µX=0(cid:18) (cid:19) (cid:16) (cid:17)2(1+λ) (cid:0) (cid:1) = k kn+a(k−n−a)! (-1)µ = λ2+(λ−2)2λexp -β021/2 . (µ−n−a)!(k−µ)! 
Note that the Hermite polynomials of even order are essentially generalized Laguerre polynomials, so the theorem can be translated into a result on these polynomials.

If Conjecture 17 holds then the condition $\mathrm{E}\left[H_{2\ell}(X)\right]\in[-\varepsilon,0]$ in Theorem 23 can be replaced by the much weaker condition $\mathrm{E}\left[H_{2\ell}(X)\right]\le 0$. The case $\ell=1$ has been discussed above and the case $\ell=2$ has also been proved [22].² Assume that $\mathrm{E}\left[H_{2\ell}(X_1)\right]$ is the first Hermite moment that is different from zero and that it is negative. Let $X_1,X_2,\ldots$ be a sequence of iid random variables and let $U_n$ denote their normalized sum. In [22] it was proved that the rate of convergence in the information theoretic version of the Central Limit Theorem satisfies
$$\liminf_{n\to\infty}n^{2\ell-2}D(U_n\|\Phi(0,1))\ge\frac{\left(\mathrm{E}\left[H_{2\ell}(X_1)\right]\right)^2}{2}.$$
Theorem 23 implies the stronger result that
$$n^{2\ell-2}D(U_n\|\Phi(0,1))\ge\frac{\left(\mathrm{E}\left[H_{2\ell}(X_1)\right]\right)^2}{2}\qquad(13)$$
holds eventually. We conjecture that Inequality (13) holds for all $n$, and that this lower bound will give the correct rate of convergence in the information theoretic version of the Central Limit Theorem under weak regularity conditions.

²Note that [22] used a different normalization of the Hermite polynomials.

VIII. ACKNOWLEDGMENT

Lemma 6 was developed in close collaboration with Peter Grünwald.

IX. APPENDIX

The following result is a multinomial version of Vandermonde's identity and is easily proved by induction.

Lemma 24. The falling factorial satisfies the multinomial expansion, i.e.
$$\left(\sum_{x=1}^{Y}X_x\right)^{\underline{k}}=\sum_{\sum_x k_x=k}\binom{k}{k_1,k_2,\cdots,k_Y}\prod_{x=1}^{Y}X_x^{\underline{k_x}}.$$

We apply this result to the thinning operation.

Lemma 25. Let $X_1,X_2,\ldots$ be a sequence of independent identically distributed discrete random variables, all distributed like $X$. Then
$$\mathrm{E}\left[\left(\frac{1}{n}\circ\sum_{j=1}^{n}X_j\right)^{\underline{k}}\right]=\begin{cases}\lambda^k, & \text{for }k\le\kappa-1;\\[4pt] \lambda^\kappa+\dfrac{\mathrm{E}\left[X^{\underline{\kappa}}\right]-\lambda^\kappa}{n^{\kappa-1}}, & \text{for }k=\kappa;\\[8pt] \lambda^{\kappa+1}+\dfrac{(n-1)(\kappa+1)\lambda\left(\mathrm{E}\left[X^{\underline{\kappa}}\right]-\lambda^\kappa\right)+\mathrm{E}\left[X^{\underline{\kappa+1}}\right]-\lambda^{\kappa+1}}{n^{\kappa}}, & \text{for }k=\kappa+1.\end{cases}$$

Proof: These equations follow from the Vandermonde identity for factorials combined with Equation (2).

Proof of Lemma 13: For any $n$ satisfying $0\le n\le k$, writing $\mu+\nu-n=(\mu-n)+(\nu-n)+n$ and applying Lemma 24 gives
$$\sum_{\mu=0}^{k}\sum_{\nu=0}^{k}\binom{k}{\mu}\binom{k}{\nu}\mu^{\underline{n}}\nu^{\underline{n}}(\mu+\nu-n)^{\underline{k}}(-1)^{\mu+\nu}=\sum_{a+b+c=k}\binom{k}{a,b,c}\,n^{\underline{c}}\left(\sum_{\mu=0}^{k}\binom{k}{\mu}\mu^{\underline{n}}(\mu-n)^{\underline{a}}(-1)^{\mu}\right)\left(\sum_{\nu=0}^{k}\binom{k}{\nu}\nu^{\underline{n}}(\nu-n)^{\underline{b}}(-1)^{\nu}\right).$$
Now
$$\sum_{\mu=0}^{k}\binom{k}{\mu}\mu^{\underline{n}}(\mu-n)^{\underline{a}}(-1)^{\mu}=\sum_{\mu=n+a}^{k}\frac{k!}{(\mu-n-a)!\,(k-\mu)!}(-1)^{\mu}=k^{\underline{n+a}}(-1)^{n+a}\sum_{\mu=n+a}^{k}\binom{k-n-a}{\mu-n-a}(-1)^{\mu-n-a}=\begin{cases}0, & \text{for }a\ne k-n;\\ k!\,(-1)^k, & \text{for }a=k-n.\end{cases}$$
Thus this sum is only non-zero if $a=k-n$. Similarly the sum $\sum_{\nu=0}^{k}\binom{k}{\nu}\nu^{\underline{n}}(\nu-n)^{\underline{b}}(-1)^{\nu}$ is only non-zero if $b=k-n$. Thus $c=k-2(k-n)=2n-k$, and the condition $c\ge 0$ implies that $n\ge k/2$. Hence
$$\sum_{\mu=0}^{k}\sum_{\nu=0}^{k}\binom{k}{\mu}\binom{k}{\nu}\mu^{\underline{n}}\nu^{\underline{n}}(\mu+\nu-n)^{\underline{k}}(-1)^{\mu+\nu}=\binom{k}{k-n,\,k-n,\,2n-k}n^{\underline{2n-k}}\,k!(-1)^k\,k!(-1)^k=\left(k^{\underline{n}}\right)^3 n^{\underline{k-n}},$$
which is always non-negative, and it is positive if $k/2\le n\le k$.
Proof of Theorem 19, Case 1: The theorem holds if $\lambda\in\mathbb{N}$, and from now on we will assume that $\lambda\notin\mathbb{N}$. In the interval $\left]\lambda,\lambda+1\right]$ the ceiling $\lceil\lambda\rceil$ is the only integer, and $C_2^\lambda(x)$ is minimal for $x=\lceil\lambda\rceil$. Therefore
$$\frac{\mathrm{d}^2Z}{\mathrm{d}\beta^2}=\sum_{x=\lfloor\lambda\rfloor,\lceil\lambda\rceil}\mathrm{Po}(\lambda,x)f\left(C_2^\lambda(x)\right)+\sum_{x\ne\lfloor\lambda\rfloor,\lceil\lambda\rceil}\mathrm{Po}(\lambda,x)f\left(C_2^\lambda(x)\right)\le\sum_{x=\lfloor\lambda\rfloor,\lceil\lambda\rceil}\mathrm{Po}(\lambda,x)f\left(C_2^\lambda(x)\right)+\sum_{x\ne\lfloor\lambda\rfloor,\lceil\lambda\rceil}\mathrm{Po}(\lambda,x),$$
so we just have to show that
$$\sum_{x=\lfloor\lambda\rfloor,\lceil\lambda\rceil}\mathrm{Po}(\lambda,x)f\left(C_2^\lambda(x)\right)\le\sum_{x=\lfloor\lambda\rfloor,\lceil\lambda\rceil}\mathrm{Po}(\lambda,x).$$
After division by $\mathrm{Po}(\lambda,\lfloor\lambda\rfloor)$ and $1+\frac{\lambda}{\lceil\lambda\rceil}$ we see that it is sufficient to prove that
$$\frac{\lceil\lambda\rceil f\left(C_2^\lambda(\lfloor\lambda\rfloor)\right)+\lambda f\left(C_2^\lambda(\lceil\lambda\rceil)\right)}{\lceil\lambda\rceil+\lambda}\le 1.$$
At this point the proof splits into three sub-cases: $\lambda\in\left]0,1\right]$, $\lambda\in\left]1,2\right]$, and $\lambda\in\left]2,\infty\right[$.

Sub-case $\lambda<1$: For $\lambda\in\left]0,1\right[$ the ceiling of $\lambda$ is $1$, and $C_2^\lambda(0)=\frac{\lambda}{2^{1/2}}$ and $C_2^\lambda(1)=\frac{\lambda-2}{2^{1/2}}$. We have
$$\frac{f\left(\frac{\lambda}{2^{1/2}}\right)+\lambda f\left(\frac{\lambda-2}{2^{1/2}}\right)}{1+\lambda}=\frac{\lambda^2\exp\left(\beta_0\frac{\lambda}{2^{1/2}}\right)+(\lambda-2)^2\lambda\exp\left(\beta_0\frac{\lambda-2}{2^{1/2}}\right)}{2(1+\lambda)}=\frac{\lambda^2+(\lambda-2)^2\lambda\exp\left(-\beta_0 2^{1/2}\right)}{2(1+\lambda)}\exp\left(\beta_0\frac{\lambda}{2^{1/2}}\right).$$
As a function of $\lambda\in[0,1]$ this expression is increasing for $\lambda$ below approximately $0.458$ and decreasing above, and the maximal value is $0.9288$, which is less than $1$.

Sub-case $1<\lambda<2$: For $\lambda\in\left]1,2\right[$ the ceiling is $2$, and we have
$$\frac{\lceil\lambda\rceil f\left(C_2^\lambda(\lfloor\lambda\rfloor)\right)+\lambda f\left(C_2^\lambda(\lceil\lambda\rceil)\right)}{\lceil\lambda\rceil+\lambda}=\frac{2f\left(C_2^\lambda(1)\right)+\lambda f\left(C_2^\lambda(2)\right)}{2+\lambda}=\frac{(\lambda-2)^2\exp\left(\beta_0\frac{\lambda-2}{2^{1/2}}\right)+\frac{(\lambda^2-4\lambda+2)^2}{2\lambda}\exp\left(\beta_0\frac{\lambda^2-4\lambda+2}{2^{1/2}\lambda}\right)}{2+\lambda}$$
$$\le\frac{(\lambda-2)^2\exp\left(-\beta_0 2^{1/2}\right)+\frac{(\lambda^2-4\lambda+2)^2}{2\lambda}\exp\left(\beta_0\left(2-2^{3/2}\right)\right)}{2+\lambda}.$$
For $\lambda\in[1,2]$ this function is decreasing in $\lambda$, with maximum for $\lambda=1$ attaining the value $0.8784$, which is less than $1$.

Sub-case $\lambda>2$: For $\lambda>2$ we use that the Poisson distribution has mode $\lfloor\lambda\rfloor$. The minimum of $C_2^\lambda(X)$ over the integers is $C_2^\lambda(\lceil\lambda\rceil)$, and the minimum of the function $\lambda\to C_2^\lambda(\lceil\lambda\rceil)$ for $\lambda>2$ is attained when $\lceil\lambda\rceil=3$; we find this minimum to be $-0.7785$. Since the weight $\lambda/(\lceil\lambda\rceil+\lambda)$ is at most $1/2$ and $f$ is decreasing on the negative axis, we get
$$\frac{\lceil\lambda\rceil f\left(C_2^\lambda(\lfloor\lambda\rfloor)\right)+\lambda f\left(C_2^\lambda(\lceil\lambda\rceil)\right)}{\lceil\lambda\rceil+\lambda}\le\frac{1}{2}f\left(-2^{-1/2}\right)+\frac{1}{2}f(-0.7785)=0.9704<1,\qquad(14)$$
which proves the theorem in this case.

Proof of Theorem 19, Case 2: This case deals with the situation where $\mathrm{E}\left[C_2^\lambda(X)\right]<\beta_0$, which is only possible if $\lambda\le 4$.

The function $C_2^\lambda$ satisfies $C_2^\lambda(\lambda)=C_2^\lambda(\lambda+1)=-2^{-1/2}$. Convexity of the function $C_2^\lambda$ and the fact that $X$ can only assume integer values imply that if $\mathrm{E}\left[C_2^\lambda(X)\right]\le-2^{-1/2}$ then
$$\Pr(X=\lceil\lambda\rceil)\ge 1/2.$$

Sub-case $1\le\lambda\le 4$: For $\lambda\in[1,4]$ we can get an even better bound on $\Pr(X=\lceil\lambda\rceil)$ as follows:
$$\mathrm{E}\left[C_2^\lambda(X)\right]\ge\Pr(X=\lceil\lambda\rceil)\,C_2^\lambda(\lceil\lambda\rceil)+(1-\Pr(X=\lceil\lambda\rceil))\min\left\{C_2^\lambda(\lceil\lambda\rceil-1),C_2^\lambda(\lceil\lambda\rceil+1)\right\}.$$
Hence $\mathrm{E}\left[C_2^\lambda(X)\right]$ can be lower bounded by an expression of the form
$$\left(\tfrac12+t\right)C_2^\lambda(x)+\left(\tfrac12-t\right)C_2^\lambda(x+1),\qquad(15)$$
where $\tfrac12+t$ equals $\Pr(X=\lceil\lambda\rceil)$ or $1-\Pr(X=\lceil\lambda\rceil)$, and $x=\lceil\lambda\rceil$ or $x=\lceil\lambda\rceil-1$. If $x$ is allowed to assume real values in (15), then it is a polynomial in $x$ of order $2$ with minimum
$$-\frac{1}{2^{1/2}}-\frac{t^2}{2^{1/2}\lambda}.$$
Hence the condition $\mathrm{E}\left[C_2^\lambda(X)\right]<\beta_0$ implies that
$$-\frac{1}{2^{1/2}}-\frac{t^2}{2^{1/2}\lambda}<\beta_0$$
and
$$t>\left(\lambda\left(-\beta_0 2^{1/2}-1\right)\right)^{1/2}.$$
Hence
$$\Pr(X=\lceil\lambda\rceil)\ge\frac12+t>\frac12+\left(\lambda\left(-\beta_0 2^{1/2}-1\right)\right)^{1/2}.$$
The maximal point probability of a Poisson random variable with $\lambda\ge 1$ is $\exp(-1)$, implying that
$$\|P-\mathrm{Po}(\lambda)\|\ge 2\left(\Pr(X=\lceil\lambda\rceil)-\mathrm{Po}(\lambda,\lceil\lambda\rceil)\right)\ge 2\left(\frac12+\left(\lambda\left(-\beta_0 2^{1/2}-1\right)\right)^{1/2}-\exp(-1)\right).$$
Therefore by (10) it is sufficient to prove that
$$2\left(\frac12+\left(\lambda\left(-\beta_0 2^{1/2}-1\right)\right)^{1/2}-\exp(-1)\right)\ge\left|C_2^\lambda(\lceil\lambda\rceil)\right|.\qquad(16)$$
This inequality is checked numerically and illustrated in Figure 2.

[Figure 2: The left and the right hand side of Inequality (16), plotted for $\lambda\in[0,4]$.]

Sub-case $0\le\lambda<1/2$: For $\lambda\in[0,1/2]$ we have
$$\mathrm{E}\left[C_2^\lambda(X)\right]\ge\Pr(1)\,C_2^\lambda(1)+(1-\Pr(1))\,C_2^\lambda(0)=\Pr(1)\frac{\lambda-2}{2^{1/2}}+(1-\Pr(1))\frac{\lambda}{2^{1/2}}=\frac{\lambda-2\Pr(1)}{2^{1/2}}.$$
Therefore by (10) it is sufficient to prove that
$$\frac{2\Pr(1)-\lambda}{2^{1/2}}\le 2\left(\Pr(1)-\lambda\exp(-\lambda)\right),$$
which is equivalent to
$$2^{1/2}\lambda\exp(-\lambda)-\frac{\lambda}{2}\le\left(2^{1/2}-1\right)\Pr(1).$$
We know that $\Pr(1)\ge 1/2$, so it is sufficient to prove that
$$2^{1/2}\lambda\exp(-\lambda)-\frac{\lambda}{2}\le\frac{2^{1/2}-1}{2}.\qquad(17)$$
This inequality is checked numerically and illustrated in Figure 3.

Sub-case $1/2\le\lambda<1$: For $\lambda\in[1/2,1]$ we have
$$\mathrm{E}\left[C_2^\lambda(X)\right]\ge\Pr(1)\,C_2^\lambda(1)+(1-\Pr(1))\,C_2^\lambda(2)=\Pr(1)\frac{\lambda-2}{2^{1/2}}+(1-\Pr(1))\frac{\lambda^2-4\lambda+2}{2^{1/2}\lambda}=\frac{\lambda^2-4\lambda+2}{2^{1/2}\lambda}+\Pr(1)\frac{2\lambda-2}{2^{1/2}\lambda}.$$
The inequality $\mathrm{E}\left[C_2^\lambda(X)\right]\le\beta_0$ implies that
$$\frac{\lambda^2-4\lambda+2}{2^{1/2}\lambda}+\Pr(1)\frac{2\lambda-2}{2^{1/2}\lambda}\le\beta_0$$
