How many Laplace transforms of probability measures are there?

Fuchang Gao∗   Wenbo V. Li†   Jon A. Wellner‡

January 1, 2010

Abstract

A bracketing metric entropy bound for the class of Laplace transforms of probability measures on $[0,\infty)$ is obtained through its connection with the small deviation probability of a smooth Gaussian process. Our results for the particular smooth Gaussian process seem to be of independent interest.

Keywords: Laplace transform; bracketing metric entropy; completely monotone functions; smooth Gaussian process; small deviation probability

∗Department of Mathematics, University of Idaho, [email protected].
†Department of Mathematical Sciences, University of Delaware, [email protected]. Supported in part by NSF grant DMS-0805929.
‡Department of Statistics, University of Washington, [email protected]. Supported in part by NSF Grant DMS-0804587.

1 Introduction

Let $\mu$ be a finite measure on $[0,\infty)$. The Laplace transform of $\mu$ is the function on $(0,\infty)$ defined by
\[
f(t) = \int_0^\infty e^{-ty}\,\mu(dy). \tag{1}
\]
It is easy to check that such a function has the property that $(-1)^n f^{(n)}(t) \ge 0$ for all non-negative integers $n$ and all $t > 0$. A function on $(0,\infty)$ with this property is called a completely monotone function on $(0,\infty)$. A characterization due to Bernstein (cf. Williamson (1956)) says that $f$ is completely monotone on $(0,\infty)$ if and only if there is a non-negative measure $\mu$ (not necessarily finite) on $[0,\infty)$ such that (1) holds. Therefore, due to monotonicity, the class of Laplace transforms of finite measures on $[0,\infty)$ is the same as the class of bounded completely monotone functions on $(0,\infty)$. These functions can be extended to continuous functions on $[0,\infty)$, and we will call them completely monotone on $[0,\infty)$.

Completely monotone functions have remarkable applications in various fields, such as probability and statistics, physics, and potential theory. The main properties of these functions are given in Widder (1941), Chapter IV. For example, the class of completely monotone functions is closed under sums, products and pointwise convergence. We refer to Alzer and Berg (2002) for a detailed list of references on completely monotonic functions. Closely related to the class of completely monotone functions are the so-called $k$-monotone functions, for which the non-negativity of $(-1)^n f^{(n)}$ is required only for integers $n \le k$. In fact, completely monotone functions can be viewed as the limiting case of $k$-monotone functions as $k \to \infty$. In this sense, the present work is a partial extension of Gao (2008) and Gao and Wellner (2009).

Let $\mathcal{M}_\infty$ be the class of completely monotone functions on $[0,\infty)$ that are bounded by 1. Then
\[
\mathcal{M}_\infty = \Big\{ f : [0,\infty) \to [0,\infty) \ \Big|\ f(t) = \int_0^\infty e^{-tx}\,\mu(dx),\ \|\mu\| \le 1 \Big\}.
\]
It is well known (see e.g. Feller (1971), Theorem 1, page 439) that the sub-class of $\mathcal{M}_\infty$ with $f(0) = 1$ corresponds exactly to the Laplace transforms of the class of probability measures $\mu$ on $[0,\infty)$. For a random variable $X$ with distribution function $F(t) = \mathbb{P}(X \le t)$, the survival function is $S(t) = 1 - F(t) = \mathbb{P}(X > t)$. Thus the class
\[
\mathcal{S}_\infty = \Big\{ S : [0,\infty) \to [0,\infty) \ \Big|\ S(t) = \int_0^\infty e^{-tx}\,\mu(dx),\ \|\mu\| = 1 \Big\}
\]
is exactly the class of survival functions of all scale mixtures of the standard exponential distribution (with survival function $e^{-t}$), with corresponding densities
\[
p(t) = -S'(t) = \int_0^\infty x e^{-xt}\,\mu(dx), \qquad t \ge 0.
\]
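To make the members of $\mathcal{S}_\infty$ and the associated densities concrete, the following small Python sketch (an illustration added here, not part of the paper) takes a discrete mixing measure $\mu$ with arbitrarily chosen atoms and weights, evaluates the mixed survival function $S$ and density $p$, and checks numerically that $(-1)^n S^{(n)}(t) \ge 0$ for the first few orders, as complete monotonicity requires.

import numpy as np

# A discrete mixing measure mu on [0, infinity): arbitrary illustrative choice.
atoms = np.array([0.5, 1.0, 3.0])     # support points x_k
weights = np.array([0.2, 0.5, 0.3])   # mu({x_k}); the weights sum to 1

def S(t):
    """Survival function S(t) = int e^{-t x} mu(dx) of the scale mixture."""
    t = np.asarray(t, dtype=float)
    return np.exp(-np.outer(t, atoms)) @ weights

def p(t):
    """Mixture density p(t) = -S'(t) = int x e^{-t x} mu(dx)."""
    t = np.asarray(t, dtype=float)
    return (np.exp(-np.outer(t, atoms)) * atoms) @ weights

# For a mixture, S^{(n)}(t) = int (-x)^n e^{-t x} mu(dx), so
# (-1)^n S^{(n)}(t) = int x^n e^{-t x} mu(dx) >= 0 for every n.
t_grid = np.linspace(0.01, 5.0, 50)
for n in range(5):
    vals = (np.exp(-np.outer(t_grid, atoms)) * atoms**n) @ weights
    assert np.all(vals >= 0)

print(S([0.0, 1.0, 2.0]))  # S(0) = 1 since mu is a probability measure
print(p([0.0, 1.0, 2.0]))  # p(0) = 1.5, the (finite) first moment of mu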
It is easily seen that the class $\mathcal{P}_\infty$ of such densities with $p(0) < \infty$ is also a class of completely monotone functions, corresponding to probability measures $\mu$ on $[0,\infty)$ with finite first moment. These classes have many applications in statistics; see e.g. Jewell (1982) for a brief survey. Jewell (1982) considered nonparametric estimation of a completely monotone density and showed that the nonparametric maximum likelihood estimator (MLE) for this class is almost surely consistent. The bracketing entropy bounds derived below can be considered a first step toward global rates of convergence of the MLE.

In probability and statistical applications, one way to understand the complexity of a function class is via its metric entropy under certain common distances. Recall that the metric entropy of a function class $\mathcal{F}$ under a distance $\rho$ is defined to be $\log N(\varepsilon,\mathcal{F},\rho)$, where $N(\varepsilon,\mathcal{F},\rho)$ is the minimum number of open balls of radius $\varepsilon$ needed to cover $\mathcal{F}$. In statistical applications we sometimes also need the bracketing metric entropy, defined as $\log N_{[\,]}(\varepsilon,\mathcal{F},\rho)$, where
\[
N_{[\,]}(\varepsilon,\mathcal{F},\rho) := \min\Big\{ n : \exists\, \underline{f}_1,\overline{f}_1,\ldots,\underline{f}_n,\overline{f}_n \ \text{s.t.}\ \rho(\underline{f}_k,\overline{f}_k) \le \varepsilon,\ \mathcal{F} \subset \bigcup_{k=1}^n [\underline{f}_k,\overline{f}_k] \Big\},
\]
and $[\underline{f}_k,\overline{f}_k] = \{ g \in \mathcal{F} : \underline{f}_k \le g \le \overline{f}_k \}$. Clearly $N(\varepsilon,\mathcal{F},\rho) \le N_{[\,]}(\varepsilon,\mathcal{F},\rho)$, and the two are closely related in our setting below.

In this paper, we study the metric entropy of $\mathcal{M}_\infty$ under the $L_p(\nu)$ norm given by
\[
\|f\|_{L_p(\nu)}^p = \int_0^\infty |f(x)|^p\,\nu(dx), \qquad 1 \le p \le \infty,
\]
where $\nu$ is a probability measure on $[0,\infty)$. Our main result is the following.

Theorem 1.1. (i) Let $\nu$ be a probability measure on $[0,\infty)$. There exists a constant $C$ depending only on $p \ge 1$ such that for any $0 < \varepsilon < 1/4$,
\[
\log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_p(\nu)}) \le C \log(\Gamma/\gamma) \cdot |\log\varepsilon|^2
\]
for any $0 < \gamma < \Gamma < \infty$ such that $\nu([\gamma,\Gamma]) \ge 1 - 4^{-p}\varepsilon^p$. In particular, if there exists a constant $K > 1$ such that $\nu([\varepsilon^K,\varepsilon^{-K}]) \ge 1 - 4^{-p}\varepsilon^p$, then
\[
\log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_p(\nu)}) \le C K |\log\varepsilon|^3.
\]
(ii) If $\nu$ is the Lebesgue measure on $[0,1]$, then
\[
\log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_2(\nu)}) \asymp \log N(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_2(\nu)}) \asymp |\log\varepsilon|^3,
\]
where $A \asymp B$ means there exist universal constants $C_1, C_2 > 0$ such that $C_1 A \le B \le C_2 A$.

As an equivalent result to part (ii) of the above theorem, we have the following important small deviation probability estimate for an associated smooth Gaussian process. In particular, it may be of interest to find a direct probabilistic proof of the lower bound.

Theorem 1.2. Let $Y(t)$, $t > 0$, be a Gaussian process with covariance $\mathbb{E}Y(t)Y(s) = (1-e^{-t-s})/(t+s)$. Then for $0 < \varepsilon < 1$,
\[
\log \mathbb{P}\Big( \sup_{t>0} |Y(t)| < \varepsilon \Big) \asymp -|\log\varepsilon|^3.
\]
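The probability in Theorem 1.2 is exponentially small in $|\log\varepsilon|^3$ and is therefore far beyond direct simulation for small $\varepsilon$; still, a crude Monte Carlo sketch at a moderate $\varepsilon$ can illustrate the quantity involved. The Python sketch below (added here for illustration, not from the paper) uses the representation $Y(t) = \int_0^1 e^{-tu}\,dB(u)$, whose covariance $\int_0^1 e^{-(t+s)u}\,du = (1-e^{-t-s})/(t+s)$ matches the theorem, discretizes the white noise, and approximates the supremum over a finite grid of $t$ values; all numerical choices are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

# Grid in t (fine near 0, reaching moderately large t) and in u for the white noise.
t_grid = np.concatenate([np.linspace(0.01, 5.0, 150), np.linspace(5.0, 50.0, 50)])
n_u = 1000
du = 1.0 / n_u
u = (np.arange(n_u) + 0.5) * du
kernel = np.exp(-np.outer(t_grid, u))             # e^{-t u} for every (t, u) pair

def sample_sup_Y():
    """One sample of max_t |Y(t)|, with Y(t) = int_0^1 e^{-t u} dB(u) approximated
    by a Riemann sum against discretized white noise."""
    dB = rng.standard_normal(n_u) * np.sqrt(du)   # independent Gaussian increments
    return np.max(np.abs(kernel @ dB))

# Var Y(t) = (1 - e^{-2t})/(2t) decays like 1/(2t), so the supremum is essentially
# attained at moderate t and the finite grid is a reasonable approximation.
eps = 0.3      # a moderate epsilon; the exponentially small regime is not MC-accessible
n_rep = 2000
hits = sum(sample_sup_Y() < eps for _ in range(n_rep))
print("estimated P(sup |Y| < eps):", hits / n_rep)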
The rest of the paper is organized as follows. In Section 2, we provide the upper bound estimate of the main result by an explicit construction. In Section 3, we summarize various connections between entropy numbers of a set (and of its convex hull) and the small ball probability of the associated Gaussian process. Some of our observations in a general setting are stated explicitly for the first time. Finally, we identify the particular Gaussian process suitable for our entropy estimates. Then, in Section 4, we obtain the required upper bound for the small ball probability (which implies the lower bound for the entropy, as discussed in Section 3) by a simple determinant estimate. This method of small ball estimates is made explicit here for the first time and can be used in many more problems. The technical determinant estimates are also of independent interest.

2 Upper Bound Estimate

In this section, we provide an upper bound for $N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_p(\nu)})$, where $\nu$ is a probability measure on $[0,\infty)$ and $1 \le p \le \infty$. This is accomplished by an explicit construction of $\varepsilon$-brackets under the $L_p(\nu)$ distance.

For each $0 < \varepsilon < 1/4$, we choose $\gamma > 0$ and $\Gamma = 2^m\gamma$, where $m$ is a positive integer, such that $\nu([\gamma,\Gamma]) \ge 1 - 4^{-p}\varepsilon^p$. We use the notation $I(a \le t < b)$ for the indicator function of the interval $[a,b)$. Now for each $f \in \mathcal{M}_\infty$, we first write $f$ in block form:
\[
f(t) = I(0 \le t < \gamma)f(t) + I(t \ge \Gamma)f(t) + \sum_{i=1}^m I(2^{i-1}\gamma \le t < 2^i\gamma)f(t).
\]
Then for each block $2^{i-1}\gamma \le t < 2^i\gamma$, we split the range of integration at the level $2^{2-i}|\log\varepsilon|/\gamma$ and use the first $N$ terms of the Taylor series expansion of $e^{-u}$, with error term associated with $\xi = \xi_{u,N}$, $0 \le \xi \le 1$, to rewrite
\[
f(t) = I(0 \le t < \gamma)f(t) + I(t \ge \Gamma)f(t) + \sum_{i=1}^m \big( p_i(t) + q_i(t) + r_i(t) \big),
\]
where
\[
p_i(t) := I(2^{i-1}\gamma \le t < 2^i\gamma) \sum_{n=0}^N \frac{(-1)^n t^n}{n!} \int_0^{2^{2-i}|\log\varepsilon|/\gamma} x^n\,\mu(dx),
\]
\[
q_i(t) := I(2^{i-1}\gamma \le t < 2^i\gamma) \int_0^{2^{2-i}|\log\varepsilon|/\gamma} \frac{(-\xi t x)^{N+1}}{(N+1)!}\,\mu(dx),
\]
\[
r_i(t) := I(2^{i-1}\gamma \le t < 2^i\gamma) \int_{2^{2-i}|\log\varepsilon|/\gamma}^{\infty} e^{-tx}\,\mu(dx).
\]
We choose the integer $N$ so that
\[
4e^2|\log\varepsilon| - 1 \le N < 4e^2|\log\varepsilon|. \tag{2}
\]
Then, by using the inequality $k! \ge (k/e)^k$ and the fact that $0 < \xi < 1$, we have, within the block $2^{i-1}\gamma \le t < 2^i\gamma$,
\[
|q_i(t)| \le \int_0^{2^{2-i}|\log\varepsilon|/\gamma} \frac{(tx)^{N+1}}{(N+1)!}\,\mu(dx) \le \frac{|4\log\varepsilon|^{N+1}}{(N+1)!} \le \Big( \frac{4e|\log\varepsilon|}{N+1} \Big)^{N+1} \le e^{-(N+1)} \le \varepsilon^{4e^2},
\]
where we used $tx \le 2^i\gamma \cdot 2^{2-i}|\log\varepsilon|/\gamma = 4|\log\varepsilon|$ in the second inequality above. This implies, due to the disjoint supports of the $q_i(t)$,
\[
\Big| \sum_{i=1}^m q_i(t) \Big| \le \varepsilon^{4e^2}. \tag{3}
\]
Next, we notice that for $t \ge 2^{i-1}\gamma$ and $x \ge 2^{2-i}\gamma^{-1}|\log\varepsilon|$, we have $e^{-tx} \le \varepsilon^2$. Thus
\[
\Big| \sum_{i=1}^m r_i(t) \Big| \le \sum_{i=1}^m I(2^{i-1}\gamma \le t < 2^i\gamma) \int_{2^{2-i}\gamma^{-1}|\log\varepsilon|}^{\infty} \varepsilon^2\,\mu(dx) \le \varepsilon^2. \tag{4}
\]
Finally, because $|f| \le 1$ and $\nu([0,\gamma)) + \nu([\Gamma,\infty)) \le 4^{-p}\varepsilon^p$, we have
\[
\| I(0 \le t < \gamma)f(t) + I(t \ge \Gamma)f(t) \|_{L_p(\nu)} \le \varepsilon/4.
\]
Together with (3) and (4), we see that the set
\[
R := \Big\{ \sum_{i=1}^m q_i(t) + \sum_{i=1}^m r_i(t) + I(t < \gamma)f(t) + I(t \ge \Gamma)f(t) : f \in \mathcal{M}_\infty \Big\}
\]
has diameter in the $L_p(\nu)$-distance at most $\varepsilon^2 + \varepsilon^{4e^2} + \varepsilon/4 < \varepsilon/2$. Therefore, if we denote $P_i = \{ p_i(t) : f \in \mathcal{M}_\infty \}$, then the expansion of $f$ above implies that $\mathcal{M}_\infty \subset \sum_{i=1}^m P_i + R$, and consequently
\[
N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_p(\nu)}) \le N_{[\,]}\Big( \varepsilon/2, \sum_{i=1}^m P_i, \|\cdot\|_{L_p(\nu)} \Big).
\]
For any $1 \le i \le m$ and any $p_i \in P_i$, we can write
\[
p_i(t) = I(2^{i-1}\gamma \le t < 2^i\gamma) \sum_{n=0}^N (-1)^n a_{ni} (2^{-i}\gamma^{-1}t)^n, \tag{5}
\]
where $0 \le a_{ni} \le |4\log\varepsilon|^n/n!$. Now we construct
\[
\overline{p}_i(t) = I(2^{i-1}\gamma \le t < 2^i\gamma) \sum_{n=0}^N (-1)^n b_{ni} (2^{-i}\gamma^{-1}t)^n,
\]
\[
\underline{p}_i(t) = I(2^{i-1}\gamma \le t < 2^i\gamma) \sum_{n=0}^N (-1)^n c_{ni} (2^{-i}\gamma^{-1}t)^n,
\]
where
\[
b_{ni} = \begin{cases} \dfrac{\varepsilon}{2^{n+2}} \Big\lceil \dfrac{2^{n+2}}{\varepsilon} a_{ni} \Big\rceil & n \text{ even}, \\[2ex] \dfrac{\varepsilon}{2^{n+2}} \Big\lfloor \dfrac{2^{n+2}}{\varepsilon} a_{ni} \Big\rfloor & n \text{ odd}, \end{cases}
\qquad
c_{ni} = \begin{cases} \dfrac{\varepsilon}{2^{n+2}} \Big\lfloor \dfrac{2^{n+2}}{\varepsilon} a_{ni} \Big\rfloor & n \text{ even}, \\[2ex] \dfrac{\varepsilon}{2^{n+2}} \Big\lceil \dfrac{2^{n+2}}{\varepsilon} a_{ni} \Big\rceil & n \text{ odd}. \end{cases}
\]
Clearly, $\underline{p}_i(t) \le p_i(t) \le \overline{p}_i(t)$, and
\[
|\overline{p}_i - \underline{p}_i| \le I(2^{i-1}\gamma \le t < 2^i\gamma) \sum_{n=0}^N |c_{ni} - b_{ni}| (2^{-i}\gamma^{-1}t)^n \le I(2^{i-1}\gamma \le t < 2^i\gamma) \sum_{n=0}^N \frac{\varepsilon}{2^{n+2}} (2^{-i}\gamma^{-1}t)^n \le \frac{\varepsilon}{2}\, I(2^{i-1}\gamma \le t < 2^i\gamma).
\]
Hence
\[
\sum_{i=1}^m \underline{p}_i \le \sum_{i=1}^m p_i \le \sum_{i=1}^m \overline{p}_i \le \sum_{i=1}^m \underline{p}_i + \varepsilon/2.
\]
That is, the sets
\[
\underline{P} := \Big\{ \sum_{i=1}^m \underline{p}_i : p_i \in P_i,\ 1 \le i \le m \Big\} \quad \text{and} \quad \overline{P} := \Big\{ \sum_{i=1}^m \overline{p}_i : p_i \in P_i,\ 1 \le i \le m \Big\}
\]
form $\varepsilon/2$-brackets of $\sum_{i=1}^m P_i$ in the $L^\infty$-norm, and thus in the $L_p(\nu)$-norm for all $1 \le p < \infty$.
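The key step in the construction is the rounding of the coefficients $a_{ni}$ to the grid $(\varepsilon/2^{n+2})\mathbb{Z}$, with the rounding direction alternating with the parity of $n$ so that the resulting polynomials really do bracket $p_i$. The following Python sketch (illustrative values of $\varepsilon$ and of the coefficients; not the paper's construction verbatim) performs this rounding and confirms numerically that the two bracket polynomials sandwich the original one and differ by at most $\varepsilon/2$ on the block, where the normalized variable $2^{-i}\gamma^{-1}t$ lies in $[1/2,1)$.

import numpy as np

eps = 0.05
N = int(4 * np.e**2 * abs(np.log(eps)))   # truncation level, as in (2)
n = np.arange(N + 1)

# Coefficients with 0 <= a_n <= |4 log eps|^n / n!  (random illustrative values).
rng = np.random.default_rng(1)
log_upper = n * np.log(4 * abs(np.log(eps))) - np.cumsum(np.log(np.maximum(n, 1)))
a = rng.uniform(0.0, 1.0, N + 1) * np.exp(log_upper)

grid = eps / 2.0 ** (n + 2)               # rounding grid eps / 2^{n+2}
even = (n % 2 == 0)
b = np.where(even, np.ceil(a / grid), np.floor(a / grid)) * grid   # upper-bracket coefficients
c = np.where(even, np.floor(a / grid), np.ceil(a / grid)) * grid   # lower-bracket coefficients

# On the i-th block, s = 2^{-i} gamma^{-1} t ranges over [1/2, 1).
s = np.linspace(0.5, 1.0, 200, endpoint=False)
signs = (-1.0) ** n
p_mid = np.polynomial.polynomial.polyval(s, signs * a)
p_up = np.polynomial.polynomial.polyval(s, signs * b)
p_low = np.polynomial.polynomial.polyval(s, signs * c)

assert np.all(p_low <= p_mid + 1e-9) and np.all(p_mid <= p_up + 1e-9)
print("max bracket width:", np.max(p_up - p_low), "<= eps/2 =", eps / 2)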
Now we count the number of different realizations of $\underline{P}$ and $\overline{P}$. Note that, due to the uniform bound on $a_{ni}$ in (5), there are no more than
\[
\frac{2^{n+1}}{\varepsilon} \cdot \frac{|4\log\varepsilon|^n}{n!} + 1
\]
realizations of $b_{ni}$. So the number of realizations of $\overline{p}_i$ is bounded by
\[
\prod_{n=0}^N \Big( \frac{2^{n+1}}{\varepsilon} \cdot \frac{|4\log\varepsilon|^n}{n!} + 1 \Big).
\]
Because $n! > (n/e)^n$ for all $1 \le n \le N$, we have
\[
\frac{2^{n+1}}{\varepsilon} \cdot \frac{|4\log\varepsilon|^n}{n!} + 1 \le \frac{3}{\varepsilon} \Big( \frac{8e|\log\varepsilon|}{n} \Big)^n.
\]
Thus, the number of realizations of $\overline{p}_i$ is bounded by
\[
\Big( \frac{3}{\varepsilon} \Big)^{N+1} \exp\Big( \sum_{n=1}^N \big( n\log|8e\log\varepsilon| - n\log n \big) \Big)
\le \Big( \frac{3}{\varepsilon} \Big)^{N+1} \exp\Big( \frac{N(N+1)}{2}\log|8e\log\varepsilon| - \int_1^N x\log x\,dx \Big)
\le \Big( \frac{3}{\varepsilon} \Big)^{N+1} \exp\Big( \frac{N(N+1)}{2}\log|8e\log\varepsilon| - \frac{N^2}{2}\log N + \frac{N^2}{4} \Big)
\le \exp\big( C|\log\varepsilon|^2 \big)
\]
for some absolute constant $C$, where in the last inequality we used the bounds on $N$ given in (2). Hence the total number of realizations of $\overline{P}$ is bounded by $\exp(Cm|\log\varepsilon|^2)$. A similar estimate holds for the total number of realizations of $\underline{P}$, and we finally obtain
\[
\log N_{[\,]}(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_{L_p(\nu)}) \le C' m |\log\varepsilon|^2
\]
for some different constant $C'$. This finishes the proof, since $m = \log_2(\Gamma/\gamma)$.

3 Entropy of Convex Hulls

A lower bound estimate for metric entropy is typically difficult, because it often involves the construction of a well-separated set of maximal cardinality. Thus, in this section we introduce some soft analytic arguments to avoid this difficulty and turn the problem into a more familiar one. The hard estimates are given in the next section.

First note that $\mathcal{M}_\infty$ is just the convex hull of the functions $k_s(\cdot)$, $0 < s < \infty$, where $k_s(t) = e^{-ts}$. We recall a general method concerning the entropy of convex hulls that was introduced in Gao (2004). Let $T$ be a set in $\mathbb{R}^n$ or in a Hilbert space. The convex hull of $T$ can be expressed as
\[
\mathrm{conv}(T) = \Big\{ \sum_{n=1}^\infty a_n t_n : t_n \in T,\ a_n \ge 0,\ n \in \mathbb{N},\ \sum_{n=1}^\infty a_n = 1 \Big\},
\]
while the absolute convex hull of $T$ is defined by
\[
\mathrm{abconv}(T) = \Big\{ \sum_{n=1}^\infty a_n t_n : t_n \in T,\ n \in \mathbb{N},\ \sum_{n=1}^\infty |a_n| \le 1 \Big\}.
\]
Clearly, by using probability measures and signed measures, we can express
\[
\mathrm{conv}(T) = \Big\{ \int_T t\,\mu(dt) : \mu \text{ is a probability measure on } T \Big\},
\]
\[
\mathrm{abconv}(T) = \Big\{ \int_T t\,\mu(dt) : \mu \text{ is a signed measure on } T,\ \|\mu\|_{TV} \le 1 \Big\}.
\]
For any norm $\|\cdot\|$ the following is clear:
\[
\mathrm{conv}(T) \subset \mathrm{abconv}(T) \subset \mathrm{conv}(T) - \mathrm{conv}(T).
\]
Therefore,
\[
N(\varepsilon,\mathrm{conv}(T),\|\cdot\|) \le N(\varepsilon,\mathrm{abconv}(T),\|\cdot\|) \le [N(\varepsilon/2,\mathrm{conv}(T),\|\cdot\|)]^2.
\]
In particular, at the logarithmic level the two entropy numbers are comparable, modulo constant factors on $\varepsilon$. The benefit of using the absolute convex hull is that it is symmetric and can be viewed as the unit ball of a Banach space, which allows us to use the following duality lemma of metric entropy:
\[
\log N(\varepsilon,\mathrm{abconv}(T),\|\cdot\|) \asymp \log N(c_2\varepsilon, B, \|\cdot\|_T),
\]
where $B$ is the dual ball of the norm $\|\cdot\|$, and $\|\cdot\|_T$ is the norm induced by $T$, that is,
\[
\|x\|_T := \sup_{t \in T} |\langle t, x \rangle| = \sup_{t \in \mathrm{abconv}(T)} |\langle t, x \rangle|.
\]
Strictly speaking, the duality lemma remains a conjecture in the general case. However, when the norm $\|\cdot\|$ is the Hilbert space norm, it has been proved; see Tomczak-Jaegermann (1987), Bourgain et al. (1989), and Artstein et al. (2004). A striking relation discovered by Kuelbs and Li (1993) says that the entropy number $\log N(\varepsilon, B, \|\cdot\|_T)$ is determined by the Gaussian measure of the set
\[
D_\varepsilon := \{ x \in H : \|x\|_T \le \varepsilon \}
\]
under some very weak regularity assumptions. For details, see Kuelbs and Li (1993), Li and Linde (1999), and also Corollary 2.2 of Aurzada et al. (2008). Using this relation, we can now summarize the connection between the metric entropy of convex hulls and the Gaussian measure of $D_\varepsilon$ in the following proposition.

Proposition 3.1. Let $T$ be a precompact set in a Hilbert space. For $\alpha > 0$ and $\beta \in \mathbb{R}$,
\[
\log \mathbb{P}(D_\varepsilon) \le -C_1 \varepsilon^{-\alpha}|\log\varepsilon|^\beta
\]
if and only if
\[
\log N(\varepsilon,\mathrm{conv}(T),\|\cdot\|) \ge C_2\, \varepsilon^{-\frac{2\alpha}{2+\alpha}} |\log\varepsilon|^{\frac{2\beta}{2+\alpha}};
\]
and for $\beta > 0$ and $\gamma \in \mathbb{R}$,
\[
\log \mathbb{P}(D_\varepsilon) \le -C_1 |\log\varepsilon|^\beta (\log|\log\varepsilon|)^\gamma
\]
if and only if
\[
\log N(\varepsilon,\mathrm{conv}(T),\|\cdot\|_2) \ge C_2 |\log\varepsilon|^\beta (\log|\log\varepsilon|)^\gamma.
\]
Furthermore, the results also hold if the directions of the inequalities are switched.
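Proposition 3.1 trades a small-ball exponent pair $(\alpha,\beta)$ for an entropy exponent pair. The tiny Python helper below (purely illustrative bookkeeping, not from the paper) records the map and the case used later: a purely logarithmic small-ball rate passes through with the same exponents, which is how the $-|\log\varepsilon|^3$ rate of Theorem 1.2 corresponds to the $|\log\varepsilon|^3$ entropy rate of Theorem 1.1(ii).

def entropy_exponents(alpha, beta):
    """Map a small-ball rate -eps^{-alpha} |log eps|^beta (alpha > 0) to the
    exponents (a, b) of the convex-hull entropy rate eps^{-a} |log eps|^b."""
    return 2 * alpha / (2 + alpha), 2 * beta / (2 + alpha)

# Polynomial small-ball rate: alpha = 2, beta = 0 gives the entropy exponent eps^{-1}.
print(entropy_exponents(2, 0))      # (1.0, 0.0)

# As alpha -> 0 the power of eps disappears and the power of |log eps| is preserved,
# consistent with the purely logarithmic case of Proposition 3.1 (beta = 3 for Theorem 1.2).
print(entropy_exponents(1e-9, 3))   # approximately (0.0, 3.0)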
The result of this proposition can be seen implicitly in Gao (2004), where an explanation of the relation between $N(\varepsilon, B, \|\cdot\|_T)$ and the Gaussian measure of $D_\varepsilon$ is also given.

Perhaps the most useful case of Proposition 3.1 is when $T$ is a set of functions $K(t,\cdot)$, $t \in T$, where for each fixed $t \in T$, $K(t,\cdot)$ is a function in $L^2(\Omega)$, and where $\Omega$ is a bounded set in $\mathbb{R}^d$, $d \ge 1$. For this special case, we have

Corollary 3.2. Let $X(t) = \int_\Omega K(t,x)\,dB(x)$, $t \in T$, where the $K(t,\cdot)$ are square-integrable functions on a bounded set $\Omega$ in $\mathbb{R}^d$, $d \ge 1$, and $B(x)$ is the $d$-dimensional Brownian sheet on $\Omega$. If $F$ is the convex hull of the functions $K(\cdot,\omega)$, $\omega \in \Omega$, then
\[
\log \mathbb{P}\Big( \sup_{t\in T} |X(t)| < \varepsilon \Big) \asymp -\varepsilon^{-\alpha}|\log\varepsilon|^\beta
\]
for $\alpha > 0$ and $\beta \in \mathbb{R}$ if and only if
\[
\log N(\varepsilon, F, \|\cdot\|) \asymp \varepsilon^{-\frac{2\alpha}{2+\alpha}}|\log\varepsilon|^{\frac{2\beta}{2+\alpha}};
\]
and for $\beta > 0$ and $\gamma \in \mathbb{R}$,
\[
\log \mathbb{P}\Big( \sup_{t\in T} |X(t)| < \varepsilon \Big) \asymp -|\log\varepsilon|^\beta(\log|\log\varepsilon|)^\gamma
\]
if and only if
\[
\log N(\varepsilon, F, \|\cdot\|_2) \asymp |\log\varepsilon|^\beta(\log|\log\varepsilon|)^\gamma.
\]

The authors have found this corollary especially useful. For example, it was used in Blei et al. (2007) and Gao (2008) to convert a metric entropy problem into a small deviation probability problem for a Gaussian process, which is relatively easier. The proof is given in Gao (2008) for the case $\Omega = [0,1]$, and in Blei et al. (2007) for the case $[0,1]^d$. The general case can be proved just as easily. Indeed, the only thing we need to prove is that $\mathbb{P}(D_\varepsilon)$ can be expressed as the probability of the event $\sup_{t\in T}|X(t)| < \varepsilon$. We outline a proof below. Let $(\varphi_n)$ be an orthonormal basis of $L^2(\Omega)$; then
\[
X(t) = \int_\Omega K(t,s)\,dB(s) = \sum_{n=1}^\infty \xi_n \int_\Omega K(t,s)\varphi_n(s)\,ds,
\]
where the $\xi_n$ are i.i.d. standard normal random variables. Thus,
\[
\begin{aligned}
\mathbb{P}(D_\varepsilon) &= \mathbb{P}\Big( g \in L^2(\Omega) : \Big| \int_\Omega f(s)g(s)\,ds \Big| < \varepsilon,\ f \in F \Big) \\
&= \mathbb{P}\Big( g \in L^2(\Omega) : \Big| \int_T\!\!\int_\Omega K(t,s)g(s)\,ds\,\mu(dt) \Big| < \varepsilon,\ \|\mu\|_{TV} \le 1 \Big) \\
&= \mathbb{P}\Big( \sum_{n=1}^\infty a_n\varphi_n(s) : \sum_{n=1}^\infty a_n^2 < \infty,\ \Big| \sum_{n=1}^\infty a_n \int_T\!\!\int_\Omega K(t,s)\varphi_n(s)\,ds\,\mu(dt) \Big| < \varepsilon,\ \|\mu\|_{TV} \le 1 \Big) \\
&= \mathbb{P}\Big( \sum_{n=1}^\infty a_n\varphi_n(s) : \sum_{n=1}^\infty a_n^2 < \infty,\ \sup_{t\in T}\Big| \sum_{n=1}^\infty a_n \int_\Omega K(t,s)\varphi_n(s)\,ds \Big| < \varepsilon \Big) \\
&= \mathbb{P}\Big( \sup_{t\in T} |X(t)| < \varepsilon \Big).
\end{aligned}
\]

Now back to our problem of estimating $\log N(\varepsilon,\mathcal{M}_\infty,\|\cdot\|_2)$ in the statement of part (ii) of the theorem, where $\|\cdot\|_2$ is the $L^2$ norm under the Lebesgue measure on $[0,1]$. We notice that $\mathcal{M}_\infty$ is the convex hull of the functions $C(\cdot,s)$, $s \in [0,\infty)$, on $[0,1]$, with $C(t,s) = e^{-ts}$. However, $[0,\infty)$ is not bounded. In order to use Corollary 3.2, we need to make a change of variables. Notice that by letting $y = e^{-s}$, we can view $\mathcal{M}_\infty$ as the convex hull of $K(\cdot,y)$, $y \in (0,1]$, where $K(t,y) = y^t$. Clearly, the $K(t,\cdot)$ are square-integrable functions on the bounded set $(0,1]$. Now, for this $K$, the corresponding $X(t)$ is a Gaussian process on $[0,1]$ with covariance
\[
\mathbb{E}X(t)X(s) = \frac{ts-1}{\log(ts)}, \qquad s,t \in (0,1],\ (s,t) \ne (1,1),
\]
and $\mathbb{E}X(1)^2 = 1$. Thus, the problem becomes how to estimate
\[
\mathbb{P}\Big( \sup_{t\in(0,1]} |X(t)| < \varepsilon \Big), \quad \text{or equivalently} \quad \mathbb{P}\Big( \sup_{t\ge 0} |Y(t)| < \varepsilon \Big),
\]
where $Y(t) = X(e^{-t})$, which has covariance structure
\[
\mathbb{E}Y(t)Y(s) = \frac{1-e^{-t-s}}{t+s}, \qquad s,t \ge 0. \tag{6}
\]
We now turn to the lower estimates of this probability.
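For completeness, (6) is a one-line computation from the stated covariance of $X$ and the substitution $Y(t) = X(e^{-t})$ (a verification added here, not spelled out in the text): for $s,t \ge 0$ with $(s,t) \ne (0,0)$,
\[
\mathbb{E}Y(t)Y(s) = \mathbb{E}X(e^{-t})X(e^{-s}) = \frac{e^{-t}e^{-s}-1}{\log(e^{-t}e^{-s})} = \frac{e^{-(t+s)}-1}{-(t+s)} = \frac{1-e^{-t-s}}{t+s},
\]
while $\mathbb{E}Y(0)^2 = \mathbb{E}X(1)^2 = 1$.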
4 Lower Bound Estimate

Let $Y(t)$, $t \ge 0$, be the centered Gaussian process defined in (6). Our goal in this section is to prove that
\[
\log \mathbb{P}\Big( \sup_{t\ge 0} |Y(t)| < \varepsilon \Big) \le -C|\log\varepsilon|^3
\]
for some constant $C > 0$. Note that for any sequence of positive numbers $\{\delta_i\}_{i=1}^n$,
\[
\begin{aligned}
\mathbb{P}\Big( \sup_{t\ge 0} |Y(t)| < \varepsilon \Big) &\le \mathbb{P}\Big( \max_{1\le i\le n} |Y(\delta_i)| < \varepsilon \Big) \\
&= (2\pi)^{-n/2}(\det\Sigma)^{-1/2} \int_{\max_{1\le i\le n}|y_i|\le\varepsilon} \exp\Big( -\tfrac12\langle y, \Sigma^{-1}y \rangle \Big)\,dy_1\cdots dy_n \\
&\le (2\pi)^{-n/2}(\det\Sigma)^{-1/2}(2\varepsilon)^n \le \varepsilon^n (\det\Sigma)^{-1/2},
\end{aligned} \tag{7}
\]
where the covariance matrix is
\[
\Sigma = \big( \mathbb{E}Y(\delta_i)Y(\delta_j) \big)_{1\le i,j\le n} = \Big( \frac{1-e^{-\delta_i-\delta_j}}{\delta_i+\delta_j} \Big)_{1\le i,j\le n}.
\]
To find a lower bound for $\det(\Sigma)$, we need the following lemma:
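How useful (7) is depends on the choice of the grid $\{\delta_i\}$ and on how fast $\det\Sigma$ degenerates as $n$ grows: each additional point contributes a factor $\varepsilon$ but also inflates $(\det\Sigma)^{-1/2}$. The following Python sketch (a geometric grid chosen purely for illustration, not the grid used in the actual proof) builds $\Sigma$ and evaluates the logarithm of the right-hand side of (7).

import numpy as np

def log_bound(eps, deltas):
    """log of the right-hand side of (7): n*log(eps) - 0.5*log det Sigma,
    where Sigma_ij = (1 - exp(-d_i - d_j)) / (d_i + d_j)."""
    d = np.asarray(deltas, dtype=float)
    s = d[:, None] + d[None, :]
    sigma = (1.0 - np.exp(-s)) / s
    sign, logdet = np.linalg.slogdet(sigma)
    assert sign > 0, "Sigma should be positive definite"
    return len(d) * np.log(eps) - 0.5 * logdet

eps = 1e-3
for n in (5, 10, 15, 20):
    deltas = 2.0 ** np.arange(n)       # illustrative geometric grid 1, 2, 4, ...
    print(n, log_bound(eps, deltas))   # an upper bound for log P(sup |Y| < eps)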
