The Kendall’s Theorem and its Application to the Geometric Ergodicity of Markov Chains

Witold Bednorz

arXiv:1301.1481v1 [math.PR] 8 Jan 2013

Abstract

In this paper we prove a sharp quantitative version of Kendall's Theorem. Kendall's Theorem states that under some mild conditions imposed on a probability distribution on the positive integers (i.e. a probabilistic sequence) one can prove convergence of its renewal sequence. Due to a well-known property, the first entrance last exit decomposition, such results are of interest in the stability theory of time-homogeneous Markov chains. In particular the approach may be used to measure rates of convergence of geometrically ergodic Markov chains and consequently implies estimates on the convergence of MCMC estimators.

Department of Mathematics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
Research partially supported by MNiSW Grant N N201 387234.
The paper was prepared during a stay in IM PAN.
2010 Mathematics Subject Classification: Primary 60J20; Secondary 60K05; 65C05.
Keywords: Geometric Ergodicity; Renewal Theory; Markov Chain Monte Carlo.

1 Introduction

Let $(X_n)_{n \ge 0}$ be a time-homogeneous Markov chain on a measurable space $(\mathcal{S}, \mathcal{B})$, with transition probabilities $P^n(x, \cdot)$, $n \ge 0$, and a unique stationary measure $\pi$. Let $P$ be the transition operator given on the Banach space of bounded measurable functions on $(\mathcal{S}, \mathcal{B})$ by $Pf(x) = \int f(y) P(x, dy)$. Under mild conditions imposed on $(X_n)_{n \ge 0}$ the chain is ergodic, i.e.

$$\| P^n(x, \cdot) - \pi(\cdot) \|_{TV} \to 0, \quad \text{as } n \to \infty, \tag{1.1}$$

for all starting points $x \in \mathcal{S}$, in the usual total variation norm

$$\| \mu \|_{TV} = \sup_{|f| \le 1} \Big| \int f \, d\mu \Big|,$$

where $\mu$ is a real measure on $(\mathcal{S}, \mathcal{B})$. It is known that aperiodicity, the Harris recurrence property and the finiteness of $\pi$ are equivalent to (1.1) (see Theorem 13.0.1 in [12]).
Consequently the recurrence property is necessary to prove the convergence of the distributions of $X_n$ to the invariant measure in the total variation norm regardless of the starting point $X_0 = x$. Applications (see [11]) require a stronger form of the result: we expect an exponential rate of convergence and a reasonable method to estimate this rate.

One possible generalization of total variation convergence is to consider functions controlled from above by $V : \mathcal{S} \to \mathbb{R}$, $V \ge 1$, $\pi(V) < \infty$. We therefore refer to $B_V$ as the Banach space of measurable functions on $(\mathcal{S}, \mathcal{B})$ such that $\sup_{x \in \mathcal{S}} |f(x)|/V(x) < \infty$, with the norm

$$\| f \|_V := \sup_{x \in \mathcal{S}} \frac{|f(x)|}{V(x)}.$$

Then instead of the total variation distance one can use

$$\| \mu \|_V := \sup_{|f| \le V} \Big| \int f \, d\mu \Big|.$$

The geometric convergence of $P^n(x, \cdot)$ to a unique stationary measure $\pi$ means there exists $\rho_V < r \le 1$ such that

$$\Big\| (P^n g)(x) - \int g \, d\pi \Big\|_V \le M_V(r) \, r^n \, \| g \|_V, \quad g \in B_V, \tag{1.2}$$

where $\rho_V$ is the spectral radius of $(P - 1 \otimes \pi)$ acting on $(B_V, \| \cdot \|_V)$ and $M_V(r)$ is the optimal constant. In applications we often work with test functions $g$ from a smaller space $B_W$, where $W : \mathcal{S} \to \mathbb{R}$ and $1 \le W \le V$. In this case we expect

$$\Big\| (P^n g)(x) - \int g \, d\pi \Big\|_V \le M_W(r) \, r^n \, \| g \|_W, \quad g \in B_W,$$

which is valid at least on $\rho_V \le r \le 1$, and $M_W(r)$ is the optimal constant. The most important case is $W \equiv 1$, i.e. we consider not necessarily uniform geometric convergence in the total variation norm.

Whenever it exists we call $\rho_V$ the convergence rate of geometric ergodicity for the chain $(X_n)_{n \ge 0}$. For a class of examples one can prove geometric convergence (see Chapter 15 in [12]), and it is closely related to the existence of an exponential moment of the return time to a set $C \in \mathcal{B}$ of positive $\pi$-measure.

The main tool to measure the convergence rate of geometric ergodicity is the drift condition, i.e. the existence of a Lyapunov function $V : \mathcal{S} \to \mathbb{R}$, $V \ge 1$, which is contracted outside a small set $C$.
The standard formulation of the required properties is the following:

1. Minorization condition. There exist $C \in \mathcal{B}$, $b, \bar{b} > 0$ and a probability measure $\nu$ on $(\mathcal{S}, \mathcal{B})$ such that
$$P(x, A) \ge \bar{b} \, \nu(A) \quad \text{for all } x \in C \text{ and } A \in \mathcal{B}.$$

2. Drift condition. There exist a measurable function $V : \mathcal{S} \to [1, \infty)$ and constants $\lambda < 1$ and $K < \infty$ satisfying
$$PV(x) \le \begin{cases} \lambda V(x) & \text{if } x \notin C, \\ K & \text{if } x \in C. \end{cases}$$

3. Strong aperiodicity. $\bar{b} \, \nu(C) \ge b > 0$.

The first property means there exists a small set $C$ on which the regeneration of $(X_n)_{n \ge 0}$ takes place (see Chapter 5 in [12]). The assumption is relatively weak, since each Harris recurrent chain admits a small set at least for some of its $m$-skeletons (i.e. processes $(X_{nm})_{n \ge 0}$, $m \ge 1$), see Theorem 5.3.2 in [12]. The existence of a small set is used in the split chain construction (see Section 6 and cf. [10] for details) to extend $(X_n)_{n \ge 0}$ to a new Markov chain on a larger probability space $\mathcal{S} \times \{0, 1\}$, so that $(C, 1)$ is a true atom of the new chain and its marginal distribution on $(\mathcal{S}, 0)$ equals the distribution of $(X_n)_{n \ge 0}$. The second condition states the existence of a Lyapunov function $V$ which is contracted by the semigroup-related operator $P$ with rate $\lambda < 1$ at all points outside the small set. Finally, strong aperiodicity means that the regeneration set $C$ has positive measure for the basic transition probability, so the regeneration can take place in one step.

Our main result concerns ergodic Markov chains. Since the approach is based on a reduction to the study of a renewal sequence, we first consider the atomic case and show how the idea works in this special case. Then we turn to the general case, where the split chain construction is required. Results of this type are used whenever exact estimates on ergodicity are required, cf. [3], [8] and [9].
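To make conditions 1-3 concrete, the following minimal sketch checks them numerically for a toy chain that is an assumption of this illustration and is not taken from the paper: a reflected random walk on $\{0, 1, 2, \dots\}$ that steps up with probability $p < 1/2$, steps down with probability $q = 1 - p$, and stays at $0$ with probability $q$. With the hypothetical choices $C = \{0\}$ and $V(x) = s^x$, $s = \sqrt{q/p}$, the set $C$ is an atom ($\bar{b} = 1$, $\nu = P(0, \cdot)$), the drift condition holds with $\lambda = 2\sqrt{pq}$ and $K = q + ps$, and strong aperiodicity holds with $b = q$.

```python
import math


def drift_constants(p):
    """Constants for the toy reflected walk (hypothetical example).

    Returns (s, lam, K, b) where V(x) = s**x, lam is the contraction
    rate outside C = {0}, K = PV(0), and b = P(0, {0}) = nu(C).
    Requires 0 < p < 1/2 so that s > 1 and lam < 1.
    """
    q = 1.0 - p
    s = math.sqrt(q / p)
    lam = 2.0 * math.sqrt(p * q)
    K = q + p * s
    b = q
    return s, lam, K, b


def PV(x, p, s):
    """One-step expectation PV(x) = E[V(X_1) | X_0 = x] with V(y) = s**y."""
    q = 1.0 - p
    if x == 0:
        return q * 1.0 + p * s          # stay at 0 or move to 1
    return p * s ** (x + 1) + q * s ** (x - 1)


def check_drift(p, x_max=50, tol=1e-9):
    """Check conditions 2 and 3 on the states 0..x_max (relative tolerance tol)."""
    s, lam, K, b = drift_constants(p)
    outside = all(PV(x, p, s) <= lam * s ** x * (1.0 + tol)
                  for x in range(1, x_max + 1))
    inside = PV(0, p, s) <= K * (1.0 + tol)
    return outside and inside and lam < 1.0 and b > 0.0
```

For instance, `drift_constants(0.3)` gives a contraction rate `lam` strictly below 1, and `check_drift(0.3)` confirms that $PV(x) \le \lambda V(x)$ for every tested state outside $C$.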
The organization of the paper is as follows. In Section 2 we discuss the atomic case and describe our main results in this setting; in Section 3 we explain what can be done in the general case with the method of chain splitting. In Section 4 we give the proof of our main result, Theorem 2.8; in Section 5 we discuss how the result improves the previously known estimation methods. In Section 6 we show how Kendall type results can be used in Markov chain theory to estimate rates of convergence. We give a short argument for both the atomic and the non-atomic case, leaving the tedious computation of some estimates of constants (which improve what has been known) to Appendix A. Then in Appendix B we analyze the result for typical toy examples.

2 The atomic case

For this section we assume that $\bar{b} = 1$. Note that in this setting one can rewrite the minorization condition 1 as

$$P(x, A) = \nu(A), \quad \text{for all } x \in C,$$

which implies that $C$ is an atom and $\nu = P(a, \cdot)$ for any $a \in C$. It remains to translate conditions 2-3 into a simpler form which can be used later to prove geometric ergodicity. Let $\tau = \tau(C) = \inf\{ n \ge 1 : X_n \in C \}$ and $u_n = P_a(X_n \in C)$ for $n \ge 0$. Then $u_n$ is the renewal sequence that corresponds to the increment sequence $b_n = P_a(\tau = n)$ for $n \ge 1$. Note in particular that whenever we expect ergodicity, $\lim_{n \to \infty} u_n$ exists and is equal to $u_\infty = \pi(C)$. Following [1] we define the function $G(r, x) = E_x r^\tau$ for all $x \in \mathcal{S}$ and $0 < r \le \lambda^{-1}$. The main property of $G(r, x)$ is that it is a lower bound for $V(x)$ on the set $\mathcal{S} \setminus C$, namely we have (cf. Proposition 4.1 in [1]):

Proposition 2.1 Assume only drift condition (2).

1. For all $x \in \mathcal{S}$, $P_x(\tau < \infty) = 1$.

2. For $1 \le r \le \lambda^{-1}$,
$$G(r, x) \le \begin{cases} V(x) & \text{if } x \notin C, \\ rK & \text{if } x \in C. \end{cases}$$

The renewal approach is based on the first entrance last exit property.
To state it we need the additional notation $H_W(r, x) = E_x\big( \sum_{n=1}^{\tau} r^n W(X_n) \big)$, for those $r > 0$ for which the definition makes sense. We have (cf. Proposition 4.2 in [1]):

Proposition 2.2 Assume only that the Markov chain is geometrically ergodic with (unique) invariant probability measure $\pi$, that $C$ is an atom and that $W : \mathcal{S} \to \mathbb{R}$ is such that $W \ge 1$. Suppose $g : \mathcal{S} \to \mathbb{R}$ satisfies $\| g \|_W \le 1$. Then for all $r > 1$ for which the right-hand sides are finite:

$$\sup_{|z| = r} \Big| \sum_{n=1}^{\infty} \Big( P^n g(a) - \int g \, d\pi \Big) z^n \Big| \le H_W(r, a) \sup_{|z| \le r} \Big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \Big| + \pi(C) \, \frac{H_W(r, a) - r H_W(1, a)}{r - 1},$$

for all $a \in C$, and

$$\sup_{|z| = r} \Big| \sum_{n=1}^{\infty} \Big( P^n g(x) - \int g \, d\pi \Big) z^n \Big| \le H_W(r, x) + G(r, x) H_W(r, a) \sup_{|z| \le r} \Big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \Big| + \pi(C) \, \frac{H_W(r, a) - r H_W(1, a)}{r - 1} \, G(r, x) + \pi(C) H_W(1, a) \, \frac{r (G(r, x) - 1)}{r - 1},$$

for all $x \notin C$.

Now the problem splits into two parts. In the first we have to provide an estimate on $H_W(r, x)$, $x \in \mathcal{S}$, on the interval $1 \le r \le \lambda^{-1}$; this matters when we want to obtain a reasonable bound on $M_W(r)$. In the second we search for a lower bound $r_0$ on the radius of convergence of $\sum_{n=0}^{\infty} (u_n - u_\infty) z^n$, and then for an upper bound $K_0(r)$ on $\sup_{|z| = r} \big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \big|$, for $r < r_0$.

As for the first issue we distinguish two cases. The simplest situation is when $W \equiv 1$ and therefore $H_1(r, x) = r (G(r, x) - 1)/(r - 1)$, $H_1(1, a) = E_a \tau = \pi(C)^{-1}$, which allows us to slightly improve the estimates on $H_1(r, x)$ (cf. Proposition 4.2 in [1]).

Proposition 2.3 Assume only drift condition (2).

1. For $1 \le r \le \lambda^{-1}$,
$$H_1(r, x) \le \begin{cases} \dfrac{r \lambda (V(x) - 1)}{1 - \lambda} & \text{if } x \notin C, \\[2mm] \dfrac{r (K - \lambda)}{1 - \lambda} & \text{if } x \in C. \end{cases}$$

2. For $1 \le r \le \lambda^{-1}$,
$$\frac{H_1(r, a) - r H_1(1, a)}{r - 1} \le \frac{r \lambda (K - 1)}{(1 - \lambda)^2}.$$

Combining the estimates from Propositions 2.1 and 2.3 with Proposition 2.2 we obtain:

Theorem 2.4 Suppose $(X_n)_{n \ge 0}$ satisfies conditions 1-3 with $\bar{b} = 1$.
Then $(X_n)_{n \ge 0}$ is geometrically ergodic - it verifies (1.2) - and we have the following bounds on $\rho_V$, $M_1$:

$$\rho_V \le r_0^{-1}, \qquad M_1(r) \le \frac{2 r \lambda}{1 - \lambda} + \frac{r \lambda (K - 1)}{(1 - \lambda)^2} + \frac{r (K - \lambda)}{1 - \lambda} \, K_0(r),$$

where $r_0 = r_0(b, \lambda^{-1}, \lambda^{-1} K)$ and $K_0(r) = K_0(r, b, \lambda^{-1}, \lambda^{-1} K)$ are defined in Corollaries 2.9 and 2.11.

On the other hand, when $W \equiv V$ we have weaker bounds on $H_V(r)$, which are given in Proposition 4.2 in [1].

Proposition 2.5 Assume only drift condition (2).

1. For $1 \le r \le \lambda^{-1}$,
$$H_V(r, x) \le \begin{cases} \dfrac{r \lambda (V(x) - 1)}{1 - r \lambda} & \text{if } x \notin C, \\[2mm] \dfrac{r (K - r \lambda)}{1 - r \lambda} & \text{if } x \in C, \end{cases}$$
in particular $H_V(1, x) \le \dfrac{K - \lambda}{1 - \lambda}$ for all $x \in C$.

2. For $1 \le r \le \lambda^{-1}$,
$$\frac{H_V(r, a) - r H_V(1, a)}{r - 1} \le \frac{r \lambda (K - 1)}{(1 - \lambda)(1 - r \lambda)}.$$

Applying Proposition 2.5 instead of 2.3 we obtain a result similar to Theorem 2.4, yet with worse control on $M_W(r)$ (which necessarily goes to infinity near $r = \lambda^{-1}$).

Theorem 2.6 Suppose that $(X_n)_{n \ge 0}$ satisfies conditions 1-3 with $\bar{b} = 1$. Then $(X_n)_{n \ge 0}$ is geometrically ergodic - it verifies (1.2) - and we have the following bounds on $\rho_V$, $M_V$:

$$\rho_V \le r_0^{-1}, \qquad M_V(r) \le \frac{r \lambda}{1 - r \lambda} + \frac{r \lambda (K - \lambda)}{(1 - \lambda)^2} + \frac{r \lambda (K - 1)}{(1 - \lambda)(1 - r \lambda)} + \frac{r (K - r \lambda)}{1 - r \lambda} \, K_0(r),$$

where $r_0 = r_0(b, \lambda^{-1}, \lambda^{-1} K)$ and $K_0(r) = K_0(r, b, \lambda^{-1}, \lambda^{-1} K)$ are defined in Corollaries 2.9 and 2.11.

The second part concerns the study of a renewal process. As we have noted, whenever condition (2) holds we can handle all quantities in Proposition 2.2 except $\sup_{|z| \le r} \big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \big|$. Let $(\tau_k)_{k \ge 0}$ denote the subsequent visits of the Markov chain to $C$, where we assume that $\tau_0 = 0$ (so the chain starts from $C$). The renewal process is defined by $V_m = \inf\{ \tau_n - m : \tau_n \ge m \}$, $m \ge 0$. Clearly $\tau_k - \tau_{k-1}$, $k \ge 1$, form a family of independent random variables with the same distribution $P_a(\tau_k - \tau_{k-1} = n) = P_a(\tau = n) = b_n$, $n \ge 1$. By definition we have the equality $P(V_n = 0) = u_n$, i.e. the probability that the process renews at time $n$ equals $u_n$.
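The renewal probabilities $u_n$ can be computed from the increment distribution $(b_n)_{n \ge 1}$ by the standard convolution recursion $u_0 = 1$, $u_n = \sum_{k=1}^{n} u_{n-k} b_k$. The sketch below does this for a hypothetical two-point distribution $b_1 = b_2 = 1/2$, chosen here only because its renewal sequence has the closed form $u_n = \tfrac{2}{3} + \tfrac{1}{3} (-\tfrac{1}{2})^n$, so the geometric convergence of $u_n$ to $u_\infty = (\sum_n n b_n)^{-1} = 2/3$ is visible directly. The function names are illustrative.

```python
def renewal_sequence(b, n_max):
    """Return [u_0, ..., u_{n_max}] for increments b, where b[k-1] = b_k.

    Uses the renewal recursion u_0 = 1, u_n = sum_{k=1}^{n} u_{n-k} * b_k;
    b_k is treated as 0 for k > len(b).
    """
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(sum(u[n - k] * b[k - 1]
                     for k in range(1, min(n, len(b)) + 1)))
    return u


def renewal_limit(b):
    """Limit u_infinity = 1 / E[tau] = 1 / sum_k k * b_k."""
    return 1.0 / sum(k * bk for k, bk in enumerate(b, start=1))


# Hypothetical example: b_1 = b_2 = 1/2.
u_example = renewal_sequence([0.5, 0.5], 30)
u_inf_example = renewal_limit([0.5, 0.5])
```

In this example $|u_n - u_\infty|$ decays like $2^{-n}$, which corresponds to the pole of $1/(1 - b(z))$ at $z = -2$.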
Observe that $u_0 = 1$ and $u_n = \sum_{k=1}^{n} u_{n-k} b_k$, hence denoting $b(z) = \sum_{n=1}^{\infty} b_n z^n$, $u(z) = \sum_{n=0}^{\infty} u_n z^n$, for $z \in \mathbb{C}$, we can state the renewal equation in the following form:

$$u(z) = \frac{1}{1 - b(z)}, \quad \text{for } |z| < 1. \tag{2.1}$$

The equation means we can study the convergence of $u_n$ to $u_\infty$ in terms of properties of $(b_n)_{n \ge 1}$. Note that $b_1 = P_a(\tau = 1) \ge b$ and $b(\lambda^{-1}) \le \lambda^{-1} K$ by conditions 1 and 2 respectively. Historically, the first result that matches these properties with geometric ergodicity is due to Kendall [6], who proved that:

Theorem 2.7 Assume that $b_1 > 0$ and $\sum_{n=1}^{\infty} b_n r^n < \infty$ for some $r > 1$. Then the limit $u_\infty = \lim_{n \to \infty} u_n$ exists and is equal to $u_\infty = \big( \sum_{n=1}^{\infty} n b_n \big)^{-1}$; moreover the radius of convergence of $\sum_{n=0}^{\infty} (u_n - u_\infty) z^n$ is strictly greater than 1.

Although the result shows that the drift condition implies the geometric ergodicity of an aperiodic Markov chain, it is not satisfactory, in the sense that it provides neither an estimate on the radius of convergence nor a deviation inequality which one could use on the disc where the convergence holds. Kendall's theorem was improved first in [13] and then in [1] (see Theorem 3.2). There are also several results where some additional assumptions on the distribution of $\tau$ are studied. For example, [2] describes how to provide an optimal bound on the rate of convergence, which gives a computable bound on the value, yet under additional conditions on the distribution of $\tau$. Whenever the general Kendall problem is considered, the bounds obtained in the mentioned papers are still far from being optimal or easy to use.

The goal of the paper is to give a sharp estimate on the rate of convergence which significantly improves on the previous results. Our approach is based on introducing $\pi(C)$ as a parameter, namely we prove that the following result holds:

Theorem 2.8 Suppose that $(b_n)_{n \ge 1}$ verifies $b_1 \ge b > 0$, $b(r) = \sum_{n=1}^{\infty} b_n r^n < \infty$, for some $r > 1$.
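Then $u_\infty = \big( \sum_{n=1}^{\infty} n b_n \big)^{-1}$ and

$$\sup_{|z| = r} \Big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \Big| \le \frac{c(r) - c(1)}{c(1) (r - 1) \big( [(1 - b) D(\alpha) - c(r) + c(1)]_+ \big)},$$

where $c(r) = \dfrac{b(r) - 1}{r - 1}$, $c(1) = u_\infty^{-1} = \pi(C)^{-1}$ and

$$D(\alpha) = \frac{\big| 1 + \frac{b}{1 - b} \big( 1 - e^{\frac{i \pi}{1 + \alpha}} \big) \big| - 1}{\big| 1 - e^{\frac{i \pi}{1 + \alpha}} \big|}, \quad \text{where } \alpha = \frac{c(1) - 1}{1 - b}.$$

Consequently, whenever one can control $c(r) = (b(r) - 1)/(r - 1)$ from above, there is a bound on the rate of convergence for the renewal process. The simplest setting is when $c(1) = \pi(C)^{-1}$ is known and we can control $c(r)$ at a certain point, i.e. $c(R) \le N < \infty$ for some $R > 1$. Observe that if $b(R) \le L$, then due to $c(R) = \frac{b(R) - 1}{R - 1}$ we deduce that $c(R) \le N = \frac{L - 1}{R - 1}$, which is our basic setting. Note that by the Hölder inequality, for all $1 \le r \le R$,

$$c(r) - c(1) \le (c(1) - 1) \Big( \frac{c(r) - 1}{c(1) - 1} - 1 \Big) \le (1 - b) \alpha \big( r^{\kappa(\alpha)} - 1 \big),$$

where $\kappa(\alpha) = \log\big( \frac{N - 1}{c(1) - 1} \big) / \log R = \log\big( \frac{N - 1}{(1 - b) \alpha} \big) / \log R$ and $\alpha = (c(1) - 1)/(1 - b)$.

Corollary 2.9 Suppose that $c(1) = \pi(C)^{-1}$ is given, $b_1 \ge b$ and $b(R) \le L$. Then

$$r_0 = \min\Big\{ R, \Big( 1 + \frac{D(\alpha)}{\alpha} \Big)^{\frac{1}{\kappa(\alpha)}} \Big\}. \tag{2.2}$$

Moreover for $r < r_0$,

$$\sup_{|z| = r} \Big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \Big| \le K_0(r) = \frac{\pi(C) \big( r^{\kappa(\alpha)} - 1 \big)}{(r - 1) \big( \alpha^{-1} D(\alpha) - r^{\kappa(\alpha)} + 1 \big)}.$$

Remark 2.10 Observe that the bound $\big( 1 + \frac{D(\alpha)}{\alpha} \big)^{\frac{1}{\kappa(\alpha)}}$ is increasing in $b$, assuming that $L$, $R$, $c(1)$ are fixed.

In applications we have to treat $c(1) = \pi(C)^{-1}$ as a parameter. The advantage of the approach is that there is a sharp upper bound on $c(1)$, or rather on $\alpha = (c(1) - 1)/(1 - b)$. Using the inequality

$$R^{\alpha} = R^{\left( \sum_{n=1}^{\infty} (n - 1) b_n \right) / (1 - b)} \le \frac{\sum_{n=2}^{\infty} b_n R^{n-1}}{1 - b} \le \frac{b(R) - bR}{(1 - b) R} \le \frac{L - bR}{(1 - b) R}, \tag{2.3}$$

we obtain that $\alpha \le \alpha_0$, where $\alpha_0 = \log\big( \frac{L - bR}{(1 - b) R} \big) / \log R$. On the other hand, if $b = b_1$ then $c(1) - 1 \ge 1 - b$, and therefore due to Remark 2.10 we can always require that $c(1) - 1 \ge 1 - b$. Consequently, to find an estimate on the rate of convergence we search for the minimum of $\big( 1 + \frac{D(\alpha)}{\alpha} \big)^{\frac{1}{\kappa(\alpha)}}$ over $\alpha \in [1, \alpha_0]$.

Corollary 2.11 Suppose that $b_1 \ge b$ and $b(R) \le L$. Then

$$r_0 = \min\Big\{ R, \min_{1 \le \alpha \le \alpha_0} \Big( 1 + \frac{D(\alpha)}{\alpha} \Big)^{\frac{1}{\kappa(\alpha)}} \Big\}.$$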
(2.4)

Moreover for $r < r_0$,

$$\sup_{|z| = r} \Big| \sum_{n=0}^{\infty} (u_n - u_\infty) z^n \Big| \le K_0(r) = \max_{1 \le \alpha \le \alpha_0} \frac{r^{\kappa(\alpha)} - 1}{(r - 1) \big( \alpha^{-1} D(\alpha) - r^{\kappa(\alpha)} + 1 \big)}. \tag{2.5}$$

The above Corollary should be compared with Theorem 3.2 in [1]. In Section 5 we prove that our result is always better than the previous one; moreover we discuss the true benefit of the approach by studying the limiting case $R \to 1$.

3 Non atomic case

In the general case we assume $\bar{b} \le 1$, which means that a true atom may not exist. Therefore we have to use the split chain construction to create an atom on the extended probability space. In Section 6 we prove Theorems A.6 and A.8, which are the equivalents of Theorems 2.4 and 2.6 for a general ergodic Markov chain. Consequently, applying the first entrance last exit decomposition to the split chain, we reduce the question of the rate of convergence to the study of the renewal sequence for $(\bar{b}_n)_{n \ge 1}$, the probability distribution of the return time to the artificial atom.

Let $(\bar{u}_n)_{n \ge 0}$ be the corresponding renewal sequence for $(\bar{b}_n)_{n \ge 1}$. In the same way as in the atomic case, let $\bar{b}(z)$, $\bar{u}(z)$, $z \in \mathbb{C}$, be the corresponding generating functions and $\bar{c}(z) = (\bar{b}(z) - 1)/(z - 1)$. Clearly $\bar{b}_1 = \bar{b} \, \nu(C) \ge b$ and $\bar{c}(1) = \bar{b}^{-1} \pi(C)^{-1}$, so as in the atomic case we have control on the limiting behavior of $\bar{c}(z) - \bar{c}(1)$; namely, applying Theorem 2.8 we obtain that whenever $\bar{c}(r) < \infty$, then

$$\sup_{|z| = r} \Big| \sum_{n=0}^{\infty} (\bar{u}_n - \bar{u}_\infty) z^n \Big| \le \frac{\bar{c}(r) - \bar{c}(1)}{\bar{c}(1) (r - 1) \big( [(1 - b) D(\bar{\alpha}) - \bar{c}(r) + \bar{c}(1)]_+ \big)}, \tag{3.1}$$

where $\bar{c}(r) = \dfrac{\bar{b}(r) - 1}{r - 1}$, $\bar{c}(1) = \bar{u}_\infty^{-1} = \bar{b}^{-1} \pi(C)^{-1}$ and

$$D(\bar{\alpha}) = \frac{\big| 1 + \frac{b}{1 - b} \big( 1 - e^{\frac{i \pi}{1 + \bar{\alpha}}} \big) \big| - 1}{\big| 1 - e^{\frac{i \pi}{1 + \bar{\alpha}}} \big|}, \quad \text{where } \bar{\alpha} = \frac{\bar{c}(1) - 1}{1 - b}.$$

In this way the problem reduces to an estimate on $\bar{b}(r)$. The main difficulty is that in the non atomic case Theorem 2 provides only that for $R = \lambda^{-1} > 1$

$$b_x(R) = E_x R^{\tau} \le L = KR, \quad \text{for all } x \in C, \tag{3.2}$$

whereas we need a bound on the generating function of $(\bar{b}_n)_{n \ge 1}$.
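The right-hand side of Theorem 2.8, which (3.1) reuses with barred quantities, is straightforward to evaluate numerically. The sketch below assumes the reading $D(\alpha) = \big( |1 + \tfrac{b}{1-b}(1 - e^{i\pi/(1+\alpha)})| - 1 \big) / |1 - e^{i\pi/(1+\alpha)}|$ of the definition and compares the resulting bound with the exact value for a hypothetical renewal distribution $b_1 = b_2 = 1/2$ (so $b = 1/2$, $c(r) = (r + 2)/2$, $c(1) = 3/2$, $\alpha = 1$), for which $\sum_n (u_n - u_\infty) z^n = \tfrac{1}{3} (1 + z/2)^{-1}$. The function names are illustrative and not taken from the paper.

```python
import cmath
import math


def D(alpha, b):
    """D(alpha) under the assumed reading of the definition in Theorem 2.8."""
    w = 1.0 - cmath.exp(1j * math.pi / (1.0 + alpha))
    return (abs(1.0 + b / (1.0 - b) * w) - 1.0) / abs(w)


def kendall_bound(r, b, c_r, c_1):
    """Right-hand side of Theorem 2.8 for r > 1, given c(r) and c(1).

    Returns math.inf when the positive part [...]_+ in the denominator
    vanishes, i.e. when the theorem gives no information at this r.
    """
    alpha = (c_1 - 1.0) / (1.0 - b)
    gap = (1.0 - b) * D(alpha, b) - (c_r - c_1)
    if gap <= 0.0:
        return math.inf
    return (c_r - c_1) / (c_1 * (r - 1.0) * gap)


def exact_sup_two_point(r):
    """Exact sup over |z| = r for the example b_1 = b_2 = 1/2 (needs r < 2)."""
    return 1.0 / (3.0 * (1.0 - r / 2.0))


# Hypothetical example at r = 1.2: c(1.2) = 1.6, c(1) = 1.5.
bound_example = kendall_bound(1.2, 0.5, 1.6, 1.5)
exact_example = exact_sup_two_point(1.2)
```

In this example the bound stays finite for $r - 1 < 2 (1 - b) D(1) \approx 0.874$, that is for $r$ up to about $1.87$, while the true radius of convergence is $2$.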
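We discuss the question of bounding $\bar{b}(r)$ in Section 6, showing in Proposition A.2 that for all $1 \le r \le \min\{ R, (1 - \bar{b})^{-\frac{1}{1 + \alpha_1}} \}$ the following inequality holds:

$$\bar{b}(r) \le L(r) = \max\Big\{ \frac{\bar{b} r}{1 - (1 - \bar{b}) r^{1 + \alpha_1}}, \; \frac{b r + (\bar{b} - b) r^{1 + \alpha_2}}{1 - (1 - \bar{b}) r} \Big\}, \tag{3.3}$$

where $\alpha_1 = \log\big( \frac{L - \bar{b} R}{(1 - \bar{b}) R} \big) / \log R$ and $\alpha_2 = \log\big( \frac{L - (1 - \bar{b} + b) R}{(\bar{b} - b) R} \big) / \log R$. Moreover, if $1 + b \ge 2 \bar{b}$ then simply

$$L(r) = \frac{\bar{b} r}{1 - (1 - \bar{b}) r^{1 + \alpha_1}}.$$

Using (3.3) is the best the renewal approach can offer to bound $\bar{b}(r)$. The meaning of the result is that there are only two generating functions that are important for bounding $\bar{b}(r)$. If $\bar{b}$ is close to 1 then we are in a setting similar to the atomic case and one can expect a bound on $\bar{b}(r)$ of the form $\frac{b r + (\bar{b} - b) r^{1 + \alpha_2}}{1 - (1 - \bar{b}) r}$, whereas if $\bar{b}$ is far from 1 only the split chain construction matters and the bound on $\bar{b}(r)$ should be like $\frac{\bar{b} r}{1 - (1 - \bar{b}) r^{1 + \alpha_1}}$.

As in the atomic case we will need a bound on $\bar{\alpha} = \frac{\bar{c}(1) - 1}{1 - b}$. We show in Corollary A.3 that

$$\bar{\alpha} \le \bar{b}^{-1} \max\Big\{ \frac{1 - \bar{b}}{1 - b} (1 + \alpha_1), \; \frac{1 - \bar{b}}{1 - b} + \frac{\bar{b} - b}{1 - b} \alpha_2 \Big\}. \tag{3.4}$$

In fact the maximum equals $\bar{b}^{-1} \frac{1 - \bar{b}}{1 - b} (1 + \alpha_1)$ if $1 + b \ge 2 \bar{b}$ and $\bar{b}^{-1} \big( \frac{1 - \bar{b}}{1 - b} + \frac{\bar{b} - b}{1 - b} \alpha_2 \big)$ otherwise.

Now we turn to the basic idea behind the whole approach presented in the paper. Observe that $\frac{\bar{c}(r) - 1}{\bar{c}(1) - 1}$ satisfies the Hölder inequality, i.e. for $p + q = 1$, $p, q > 0$,

$$\Big( \frac{\bar{c}(r_1) - 1}{\bar{c}(1) - 1} \Big)^p \Big( \frac{\bar{c}(r_2) - 1}{\bar{c}(1) - 1} \Big)^q \ge \frac{\bar{c}(r_1^p r_2^q) - 1}{\bar{c}(1) - 1},$$

which means that $F_0(x) = \log\big( \frac{\bar{c}(e^x) - 1}{\bar{c}(1) - 1} \big)$ is convex and $F_0(0) = 0$. By (3.3) we have that $\bar{b}(e^x) \le L(e^x)$, and since $\bar{c}(e^x) - 1 = (\bar{b}(e^x) - e^x)/(e^x - 1)$ and $\bar{c}(1) - 1 = (1 - b) \bar{\alpha}$, it follows that

$$F_0(x) \le F_1(x) = \log\Big( \frac{L(e^x) - e^x}{(1 - b) \bar{\alpha} (e^x - 1)} \Big). \tag{3.5}$$

Therefore we can easily compute the largest possible function $\bar{F}(x)$ that satisfies the conditions:

1. $\bar{F}(x) \le F_1(x)$ for $0 \le x \le \min\{ \log R, -\frac{1}{1 + \alpha_1} \log(1 - \bar{b}) \}$;

2. $\bar{F}(0) = 0$ and $\bar{F}$ is convex;

3. $\bar{F}$ is maximal over the functions with properties 1-2, namely if $F$ satisfies the above conditions then $F(x) \le \bar{F}(x)$ for all $0 \le x \le \min\{ \log R, -\frac{1}{1 + \alpha_1} \log(1 - \bar{b}) \}$.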
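Let $x_0$ be the unique solution of the equation

$$F_1'(x) \, x = F_1(x). \tag{3.6}$$

Note that $x_0 \le -\frac{1}{1 + \alpha_1} \log(1 - \bar{b})$. If additionally $x_0 \le \log R$ then the optimal $\bar{F}(x)$ is of the form

$$\bar{F}(x) = \begin{cases} F_1'(x_0) \, x & \text{for all } 0 \le x \le x_0, \\ F_1(x) & \text{for all } x_0 \le x \le \min\{ \log R, -\frac{1}{1 + \alpha_1} \log(1 - \bar{b}) \}; \end{cases} \tag{3.7}$$

otherwise, if $x_0 > \log R$, then

$$\bar{F}(x) = \frac{F_1(\log R)}{\log R} \, x \quad \text{for all } 0 \le x \le \log R. \tag{3.8}$$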
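The construction in (3.6)-(3.8) is a tangent-from-the-origin argument: $\bar{F}$ follows the line through the origin that touches $F_1$ at $x_0$, and then follows $F_1$ itself. The sketch below illustrates it for a hypothetical stand-in $F_1(x) = 1/2 + x^2$ (not the paper's $F_1$), for which the tangency equation gives $x_0 = 1/\sqrt{2}$. It assumes that $F_1$ is differentiable and convex with $F_1(0) > 0$, so that $F_1'(x) x - F_1(x)$ is nondecreasing and bisection applies; the parameter `x_max` plays the role of $\log R$, and all names are illustrative.

```python
def tangent_point(F1, dF1, lo, hi, iters=200):
    """Solve dF1(x) * x = F1(x) on [lo, hi] by bisection (equation (3.6)).

    Assumes g(x) = dF1(x) * x - F1(x) is nondecreasing with g(lo) < 0 < g(hi).
    """
    def g(x):
        return dF1(x) * x - F1(x)

    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("tangency equation is not bracketed on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def convex_minorant(F1, dF1, x_max, hi):
    """Return F_bar on [0, x_max], following (3.7) or (3.8)."""
    x0 = tangent_point(F1, dF1, 0.0, hi)
    if x0 <= x_max:
        slope = dF1(x0)               # tangent line through the origin
        return lambda x: slope * x if x <= x0 else F1(x)
    slope = F1(x_max) / x_max         # chord from the origin to (x_max, F1(x_max))
    return lambda x: slope * x


# Hypothetical stand-in for F_1 and its derivative.
def F1_example(x):
    return 0.5 + x * x


def dF1_example(x):
    return 2.0 * x
```

With `x_max = 2.0` the tangent case (3.7) applies, and with `x_max = 0.5` the chord case (3.8) applies; in both cases the returned function is convex, vanishes at $0$ and stays below the stand-in $F_1$ on $[0, x_{\max}]$.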
