EXPLICIT ERROR BOUNDS FOR LAZY REVERSIBLE
MARKOV CHAIN MONTE CARLO

DANIEL RUDOLF

Abstract. We prove explicit, i.e., non-asymptotic, error bounds for Markov
chain Monte Carlo methods, such as the Metropolis algorithm. The problem is to
compute the expectation (or integral) of f with respect to a measure π which can
be given by a density ̺ with respect to another measure. A direct simulation of
the desired distribution by a random number generator is in general not possible.
Thus it is reasonable to use Markov chain sampling with a burn-in. We study such
an algorithm and extend the analysis of Lovász and Simonovits (1993) to obtain
an explicit error bound.
1. Problem description, Introduction

The paper deals with numerical integration based on Markov chains. The main goal
is to approximate an integral of the following form

(1)    S(f) := ∫_Ω f(x) π(dx),
where Ω is a given set and π a probability measure. In addition we assume that an
oracle which computes function values of f is provided. We generate a Markov chain
X_1, X_2, ... with transition kernel K, having π as its stationary distribution.
After a certain burn-in time there is an average computation over the generated
sample (Markov chain steps). For a given function f and burn-in time, say n_0, we
get as approximation

    S_{n,n_0}(f) := (1/n) Σ_{j=1}^n f(X_{j+n_0}).
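The two phases above — discard n_0 burn-in steps, then average f over the next n states — can be sketched as follows. This is a minimal illustration, not part of the paper; the `step` function and the reflecting-walk demo are assumptions of this sketch.

```python
import random

def mcmc_average(step, f, x1, n, n0):
    """Compute S_{n,n0}(f): run the chain from X_1 = x1, discard the
    first n0 transitions (burn-in), then average f over the next n states
    X_{n0+1}, ..., X_{n0+n}."""
    x = x1
    for _ in range(n0):              # burn-in phase
        x = step(x)
    total = 0.0
    for _ in range(n):               # averaging phase
        total += f(x)
        x = step(x)
    return total / n

# Illustrative chain (not from the paper): a lazy Metropolis walk on
# {0,...,9} with uniform stationary distribution, so S(f) = 4.5 for f(x) = x.
def lazy_walk_step(x):
    if random.random() < 0.5:        # laziness: hold with probability 1/2
        return x
    y = x + random.choice([-1, 1])
    return y if 0 <= y <= 9 else x   # reject moves leaving the state space
```

With a long run, e.g. `mcmc_average(lazy_walk_step, lambda x: x, 0, 200000, 1000)`, the output is close to the true mean 4.5.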
This Markov chain Monte Carlo (MCMC) method for approximating the expectation
plays a crucial role in numerous applications, especially in statistical physics,
in statistics, and in financial mathematics. Certain asymptotic error bounds are
known, which can be proved via isoperimetric inequalities, the Cheeger inequality
and estimates of eigenvalues, see [LS88, Mat99, Mat04]. Here, in contrast, we
determine an explicit error bound for S_{n,n_0}. The individual error of such a
method S_{n,n_0} for a function f is measured in the mean square sense, i.e.,

    e(S_{n,n_0}, f) := ( E |S_{n,n_0}(f) − S(f)|² )^{1/2}.
Now an outline of the structure of the paper and the main results is given.
Section 2 contains the notation used and recalls some relevant statements. The
idea of laziness is introduced in Section 3, where the conductance concept and
Date: Version: May 23, 2008.
Key words and phrases. Markov chain Monte Carlo, Metropolis algorithm, conductance,
explicit error bounds, burn-in, ball walk, reversible, lazy.
a convergence property of the chain are presented. To obtain results it is useful
to restrict ourselves to Markov chains which have a positive conductance and whose
initial distribution ν, used for the first time step, has a bounded density with
respect to π. Section 4 contains the new results. Let ϕ be the conductance of the
underlying chain. After a burn-in

    n_0 ≥ log(‖dν/dπ‖_∞) / ϕ²

the error obeys

    e(S_{n,n_0}, f) ≤ (10 / (ϕ·√n)) · ‖f‖_∞.
This implies immediately that the number n + n_0 of time steps which are needed
for an error ε can be bounded by

    ⌈ log(‖dν/dπ‖_∞) / ϕ² ⌉ + ⌈ 100·‖f‖_∞² / (ϕ²·ε²) ⌉.
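Evaluating this step-count bound numerically is straightforward; the helper below is an illustrative sketch (the function name and signature are my own, not from the paper), assuming the conductance ϕ, the density bound ‖dν/dπ‖_∞, ‖f‖_∞ and the target error ε are known.

```python
import math

def steps_needed(phi, density_bound, f_sup, eps):
    """Sufficient total number of time steps n0 + n from the bound above:
    burn-in ceil(log(density_bound)/phi^2) plus ceil(100*f_sup^2/(phi^2*eps^2))
    averaging steps."""
    n0 = math.ceil(math.log(density_bound) / phi ** 2)
    n = math.ceil(100.0 * f_sup ** 2 / (phi ** 2 * eps ** 2))
    return n0 + n
```

For instance, with ϕ = 0.5, ‖dν/dπ‖_∞ = e, ‖f‖_∞ = 1 and ε = 0.1 the bound gives 4 burn-in steps plus 40000 averaging steps.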
All results are stated in a general framework, such that after an adaptation it is
possible to apply the theory in different settings, e.g. a discrete state space or
a continuous one. In Section 5 we take up a problem considered in [MN07]. There
the authors use the Metropolis algorithm for approximating an integral over the
d-dimensional unit ball B^d ⊂ R^d with respect to an unnormalized density. The
strictly positive density is denoted by ̺; moreover we assume that it is
log-concave and that α is the Lipschitz constant of log ̺. Let δ > 0 and let
B(x,δ) be the ball with radius δ around x. Then we suggest the method described in
Algorithm 1 (see page 3) for the approximation of

    S(f) = S(f,̺) = ∫_{B^d} f(x)̺(x) dx / ∫_{B^d} ̺(x) dx.
It is shown that for δ = min{1/√(d+1), 1/α} the error obeys

    e(S^δ_{n,n_0}, f) ≤ 8000 · (√(d+1) · max{√(d+1), α} / √n) · ‖f‖_∞,

where the burn-in time n_0 is chosen larger than 1280000 · α(d+1) max{d+1, α²}.
It is worth pointing out that the number of time steps which we use for sampling
behaves polynomially in the dimension and also polynomially in the Lipschitz
constant α of the densities. As already mentioned, the same integration problem
was studied in [MN07]. The authors asked whether the problem is tractable, that
is, whether the number of function evaluations needed to obtain an error smaller
than ε can be polynomially bounded in the dimension and the Lipschitz constant.
We give a positive answer: the problem is tractable, at least if we consider
bounded integrands f.
2. Notation and basics

In this section we explain the most important facts and definitions which we are
going to use in the analysis. For introductory literature on general Markov chains
we refer the reader to [MT93], [Num84] or [Rev84]. Throughout this study we assume
that (Ω, 𝒜) is a countably generated measurable space. Then we call
K : Ω × 𝒜 → [0,1] a Markov kernel or transition kernel if

(i) for each x ∈ Ω the mapping A ∈ 𝒜 ↦ K(x,A) induces a probability measure
    on Ω,
Algorithm: S^δ_{n,n_0}(f, ̺)

(1) choose X_1 randomly on B^d;
(2) for i = 1, ..., n + n_0 do
    • if rand() < 1/2 then X_{i+1} := X_i;
    • else
      - choose Y ∈ B(X_i, δ) uniformly;
      - if Y ∉ B^d then X_{i+1} := X_i;
      - if Y ∈ B^d and ̺(Y) ≥ ̺(X_i) then X_{i+1} := Y;
      - if Y ∈ B^d and ̺(Y) < ̺(X_i) then
        - X_{i+1} := Y with probability ̺(Y)/̺(X_i) and
        - X_{i+1} := X_i with probability 1 − ̺(Y)/̺(X_i).
(3) Return:

    S^δ_{n,n_0}(f, ̺) := (1/n) Σ_{j=1}^n f(X_{j+n_0}).

Algorithm 1: Metropolis algorithm for S(f, ̺)
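Algorithm 1 can be implemented in a few lines. The sketch below is an illustration only: the paper does not prescribe an implementation, the helper `sample_ball` is my own, and the algorithm works with log ̺ for numerical stability (equivalent to the acceptance rule above since ̺ is strictly positive).

```python
import math
import random

def sample_ball(d, r=1.0):
    """Uniform point in the d-dimensional ball of radius r:
    Gaussian direction, radius r * U^(1/d)."""
    g = [random.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    u = random.random() ** (1.0 / d)
    return [r * u * v / norm for v in g]

def metropolis_ball_walk(f, log_rho, d, delta, n, n0):
    """Sketch of Algorithm 1: lazy Metropolis ball walk on B^d targeting
    the unnormalised density rho = exp(log_rho)."""
    x = sample_ball(d)                        # X_1 chosen randomly on B^d
    def step(x):
        if random.random() < 0.5:             # laziness: hold with prob. 1/2
            return x
        y = [a + b for a, b in zip(x, sample_ball(d, delta))]
        if sum(v * v for v in y) > 1.0:       # proposal left B^d: stay
            return x
        # accept with probability min(1, rho(Y)/rho(X_i))
        if math.log(random.random()) < log_rho(y) - log_rho(x):
            return y
        return x
    for _ in range(n0):                       # burn-in
        x = step(x)
    total = 0.0
    for _ in range(n):                        # average f over X_{n0+1..n0+n}
        total += f(x)
        x = step(x)
    return total / n
```

For the uniform density (log ̺ ≡ 0) on B² and f(x) = x_1, the true value S(f) is 0, and a long run returns a value close to it.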
(ii) for each A ∈ 𝒜 the mapping x ∈ Ω ↦ K(x,A) is an 𝒜-measurable real
     function.
In addition ℳ = (Ω, 𝒜, {K(x,·) : x ∈ Ω}) is the associated Markov scheme. This
notation is taken from [LS93]. A Markov chain X_1, X_2, ... is given through a
Markov scheme ℳ and a start distribution ν on Ω. The transition kernel K(x,A) of
the Markov chain describes the probability of getting from x ∈ Ω to A ∈ 𝒜 in one
step. Another important assumption is that the given distribution π is stationary
with respect to the considered Markov chain, i.e. for all A ∈ 𝒜

    π(A) = ∫_Ω K(x,A) π(dx).
Roughly speaking this means: choosing the starting point with distribution π,
after one step we have the same distribution as before. Another similar but
stronger restriction on the chain is reversibility. A Markov scheme is reversible
with respect to π if for all A, B ∈ 𝒜

    ∫_B K(x,A) π(dx) = ∫_A K(x,B) π(dx).
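On a finite state space the reversibility condition reduces to detailed balance, π(x)K(x,y) = π(y)K(y,x), and both stationarity and reversibility can be checked numerically. The following toy check is illustrative and not from the paper.

```python
def is_stationary(K, pi, tol=1e-12):
    """Stationarity pi(A) = sum_x K(x,A) pi(x); on a finite space it
    suffices to check the singletons A = {y}."""
    n = len(pi)
    return all(abs(sum(pi[x] * K[x][y] for x in range(n)) - pi[y]) <= tol
               for y in range(n))

def is_reversible(K, pi, tol=1e-12):
    """Detailed balance: pi(x) K(x,y) = pi(y) K(y,x) for all states x, y."""
    n = len(pi)
    return all(abs(pi[x] * K[x][y] - pi[y] * K[y][x]) <= tol
               for x in range(n) for y in range(n))

# Lazy random walk on a triangle: reversible w.r.t. the uniform distribution.
K_lazy = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
# Deterministic rotation: uniform pi is stationary, but the chain is not reversible.
K_cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
pi_uniform = [1 / 3, 1 / 3, 1 / 3]
```

The rotation example shows that stationarity is indeed strictly weaker than reversibility.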
The next result is taken from [LS93]. Since it is not proven there, we give an
idea of the proof.

Lemma 1. Let ℳ be a reversible Markov scheme and let F : Ω × Ω → R be
integrable. Then

(2)    ∫_Ω ∫_Ω F(x,y) K(x,dy) π(dx) = ∫_Ω ∫_Ω F(y,x) K(x,dy) π(dx).
Proof. The result is shown using a standard technique of integration theory.
Since the Markov scheme is reversible we have

    ∫_Ω ∫_Ω I_{A×B}(x,y) K(x,dy) π(dx) = ∫_Ω ∫_Ω I_{A×B}(y,x) K(x,dy) π(dx)

for A, B ∈ 𝒜. From this we obtain the equality of the integrals for an arbitrary
set C ∈ 𝒜 ⊗ 𝒜, where 𝒜 ⊗ 𝒜 is the product σ-algebra of 𝒜 with itself. This is an
application of the Dynkin system theorem. Then we consider the case where F is a
simple function, which is straightforward. The next step is to obtain the equality
for nonnegative functions and after that to extend the result to general
integrable ones. □
Remark 1. If we have a Markov scheme which is not necessarily reversible but has
a stationary distribution, the following holds true:

    S(f) = ∫_Ω f(x) π(dx) = ∫_Ω ∫_Ω f(y) K(x,dy) π(dx),

where f : Ω → R is integrable. This can be seen easily by using the same steps as
in the proof of Lemma 1.
By K^n(x,·) we denote the n-step transition probabilities, and we have for x ∈ Ω,
A ∈ 𝒜 that

    K^n(x,A) := ∫_Ω K^{n−1}(y,A) K(x,dy) = ∫_Ω K(y,A) K^{n−1}(x,dy).

This again constitutes a transition kernel of a Markov chain sharing the invariant
distribution and reversibility with the original one. Thus the conclusions of
Lemma 1 and Remark 1 also hold for the n-step transition probabilities, i.e.

(3)    ∫_Ω ∫_Ω F(x,y) K^n(x,dy) π(dx) = ∫_Ω ∫_Ω F(y,x) K^n(x,dy) π(dx).
Now we define for a Markov scheme ℳ a nonnegative operator
P : L_∞(Ω,π) → L_∞(Ω,π) by

    (Pf)(x) = ∫_Ω f(y) K(x,dy).

(Nonnegative means: if f ≥ 0 then Pf ≥ 0.) This operator is called the Markov or
transition operator of the Markov scheme ℳ and describes the expected value of f
after one step of the Markov chain from x ∈ Ω. The expected value of f from x ∈ Ω
after n steps of the Markov chain is given as

    (P^n f)(x) = ∫_Ω f(y) K^n(x,dy).
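On a finite state space, where K is a row-stochastic matrix, the transition operator acts by matrix–vector multiplication. A small illustrative sketch (the paper itself works in the general measurable setting):

```python
def apply_P(K, f):
    """Finite-state transition operator: (Pf)(x) = sum_y K[x][y] * f[y],
    the expected value of f after one step from state x."""
    n = len(f)
    return [sum(K[x][y] * f[y] for y in range(n)) for x in range(n)]

def apply_Pn(K, f, steps):
    """(P^n f)(x): expected value of f after n steps from x."""
    for _ in range(steps):
        f = apply_P(K, f)
    return f

# Lazy random walk on a triangle (row-stochastic, hence P1 = 1).
K = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
```

Since the rows of K sum to one, constants are fixed points of P, and iterating P never increases the sup-norm of f.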
Let us now consider P on the Hilbert space L_2(Ω,π), where
⟨f,g⟩ = ∫_Ω f(x)g(x) π(dx) denotes the canonical scalar product. Notice that the
considered function space is chosen according to the invariant measure. Then we
have with Lemma 1

(4)    ⟨f,f⟩ ± ⟨f,Pf⟩ = (1/2) ∫_Ω ∫_Ω (f(x) ± f(y))² K(x,dy) π(dx) ≥ 0.

From a functional analysis point of view this means ‖P‖_{L_2→L_2} ≤ 1. It is
straightforward to show that ‖P^n‖_{L_p→L_p} ≤ 1 for p = 1, 2 or ∞ and n ∈ N.
Let X_1, X_2, ... be the result of a reversible Markov chain. The expectation of
the chain with starting distribution ν = π and Markov kernel K from scheme ℳ is
denoted by E_{π,K}. Then we get for f ∈ L_2(Ω,π)

       E_{π,K}(f(X_i)) = E_{π,K}(f(X_0)) = ⟨1,f⟩ = S(f),
       E_{π,K}(f(X_i)²) = E_{π,K}(f(X_0)²) = ⟨f,f⟩ = S(f²),
(5)    E_{π,K}(f(X_i) f(X_j)) = E_{π,K}(f(X_0) f(X_{|i−j|})) = ⟨f, P^{|i−j|} f⟩.

The assumption that the initial distribution is the stationary one makes the
calculation easy. In the general case, where the starting point is chosen by a
given probability distribution ν, we obtain for i ≤ j and functions f ∈ L_2(Ω,π)

    E_{ν,K}(f(X_i)) = ∫_Ω P^i f(x) ν(dx),
    E_{ν,K}(f(X_i) f(X_j)) = ∫_Ω P^i(f(x) P^{j−i} f(x)) ν(dx).
It is easy to verify with (2) that P is self-adjoint as an operator on L_2(Ω,π).
In the next part we are going to derive one more convenient property of P under
some additional restrictions.
3. Laziness and Conductance

An introduction to laziness and a more detailed view on the conductance is given
in [LS93]. Most results which we are going to mention here are taken from this
reference. A Markov scheme ℳ = (Ω, 𝒜, {K(x,·) : x ∈ Ω}) is called lazy if
K(x,{x}) ≥ 1/2 for all x ∈ Ω. This means the chain stays in the current state
with probability at least 1/2. Notice that the chain resulting from Algorithm 1
(see page 3) is lazy because of line three. The crucial gain of this slowing down
is that the associated Markov operator P is positive semidefinite. Therefore we
study only lazy chains. This is formalized in the next lemma.

Lemma 2. Let ℳ be a lazy, reversible Markov scheme. Then we have for f ∈ L_2(Ω,π)

(6)    ⟨Pf, f⟩ ≥ 0.
Proof. We consider another Markov scheme ℳ̃ := (Ω, 𝒜, {K̃(x,·) : x ∈ Ω}), where
K̃(x,A) = 2K(x,A) − I(x,A) with

    I(x,A) = 1 if x ∈ A,  I(x,A) = 0 if x ∈ A^c,

for all A ∈ 𝒜. To verify that K̃ is again a transition kernel we need
K(x,{x}) ≥ 1/2. The reversibility condition for ℳ̃ holds, since the scheme ℳ is
reversible. The Markov operator of ℳ̃ is given by P̃ = 2P − I, where I is the
identity. Since we established reversibility of the new scheme we obtain, by
applying Lemma 1, equality (4) for P̃. So it is true that

    −⟨f,f⟩ ≤ ⟨(2P − I)f, f⟩ ≤ ⟨f,f⟩.

Now let us consider

    ⟨Pf,f⟩ = (1/2)⟨f,f⟩ + (1/2)⟨(2P − I)f, f⟩ ≥ 0,

such that the claim is proven. □
Having finished this, we can turn to the conductance of the Markov chain. For a
Markov scheme ℳ = (Ω, 𝒜, {K(x,·) : x ∈ Ω}), which is not necessarily lazy, it is
defined by

    ϕ(K,π) = inf_{0 < π(A) ≤ 1/2} [ ∫_A K(x,A^c) π(dx) / π(A) ],

where π is a stationary distribution. The numerator of the conductance describes
the probability of leaving A in one step, where the starting point is chosen by π.
An important requirement for the following is that the scheme has a positive
conductance, since the next result is not useful otherwise.
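For a tiny finite state space the infimum in the definition can be evaluated by brute force over all subsets A; the sketch below is an illustration (my own helper, not from the paper) and is of course infeasible beyond a handful of states.

```python
from itertools import combinations

def conductance(K, pi):
    """Brute-force phi(K, pi): minimise the outflow sum_{x in A} pi(x) K(x, A^c)
    divided by pi(A), over all subsets A with 0 < pi(A) <= 1/2."""
    states = range(len(pi))
    best = float("inf")
    for r in range(1, len(pi)):
        for A in combinations(states, r):
            p_A = sum(pi[x] for x in A)
            if 0.0 < p_A <= 0.5:
                outflow = sum(pi[x] * K[x][y]
                              for x in A for y in states if y not in A)
                best = min(best, outflow / p_A)
    return best

# Lazy random walk on a triangle with uniform pi: each singleton A has
# outflow 1/6 and pi(A) = 1/3, so phi(K, pi) = 1/2.
K_lazy = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
pi_uniform = [1 / 3, 1 / 3, 1 / 3]
```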
Lemma 3. Let ℳ be a lazy, reversible Markov scheme and let ν be the initial
distribution. Furthermore we assume that the probability distribution ν has a
bounded density dν/dπ with respect to π. Then for A ∈ 𝒜 we obtain

(7)    | ∫_Ω K^j(x,A) ν(dx) − π(A) | ≤ √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^j.

Proof. Look at the result of [LS93, Corollary 1.5, p. 372] and translate it into
our notation. □
Remark 2. The left hand side of (7) can be transformed as follows:

    ∫_Ω K^j(x,A) ν(dx) − π(A)
        = ∫_Ω ∫_A K^j(x,dy) (dν/dπ)(x) π(dx) − π(A)
        = ∫_A ∫_Ω (dν/dπ)(y) K^j(x,dy) π(dx)
          − ∫_A ∫_Ω (dν/dπ)(y) π(dy) π(dx)                              [by (3)]
        = ∫_A ∫_Ω (dν/dπ)(y) (K^j(x,dy) − π(dy)) π(dx).

Now it is clear with Lemma 3 that for A ∈ 𝒜

(8)    | ∫_A ∫_Ω (dν/dπ)(y)(K^j(x,dy) − π(dy)) π(dx) |
           ≤ √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^j.
Remark 3. Observe that we obtained a bound for the speed of convergence to
stationarity of the considered Markov chain. Once more it is possible to estimate
the right hand side of (8); in detail, since 1 − x ≤ exp(−x),

(9)    √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^j ≤ √‖dν/dπ‖_∞ · exp(−j·ϕ(K,π)²/2)

holds true.
To use the conductance we need a connection to the operator P. This is given in
form of the so-called Cheeger inequality. Before we are going to state this
conclusion in a slightly different formulation we define a subset of L_2(Ω,π) as
follows:

    L_2^0 = L_2^0(Ω,π) := {f ∈ L_2(Ω,π) : S(f) = 0}.
Lemma 4 (Cheeger's inequality). Let ℳ be a reversible Markov scheme with
conductance ϕ(K,π). Then for g ∈ L_2^0

(10)    ⟨P^j g, g⟩ ≤ (1 − ϕ(K,π)²/2)^j ‖g‖_2².

Proof. See [LS93, Corollary 1.8, p. 375]. □
Remark 4. There are many other references where the convergence rate of Markov
chains to stationarity is studied, see e.g. [JS89, DS91, Ros95, RR04]. One
approach is to bound the second eigenvalue of the operator P. The relation between
the eigenvalue and the conductance of a Markov chain is given by Cheeger's
inequality (see Lemma 4). In this context the laziness condition shifts the
spectrum of the Markov operator P restricted to L_2^0 from (−1,1), by the
transformation described in Lemma 2, to (0,1), i.e. the second eigenvalue is
always positive.
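The spectrum shift is transparent in the two-state case, where the eigenvalues of the kernel [[1−p, p], [q, 1−q]] are known in closed form. The tiny sketch below is an illustration of this remark, not part of the paper.

```python
def two_state_eigenvalues(p, q):
    """Eigenvalues of the two-state kernel [[1-p, p], [q, 1-q]]:
    always 1 and 1 - p - q (the latter may be negative)."""
    return 1.0, 1.0 - p - q

def lazy_eigenvalue(lam):
    """Laziness replaces P by (I + P)/2, mapping an eigenvalue lam of P
    to (1 + lam)/2, hence shifting (-1, 1) into (0, 1)."""
    return (1.0 + lam) / 2.0
```

For the periodic flip chain (p = q = 1) the eigenvalues are 1 and −1; its lazy version has eigenvalues 1 and 0, so the second eigenvalue is no longer negative.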
4. Error bounds

This section contains the main result and its proof. At first we are going to
repeat an already known finding, which is used to show an explicit error bound for
a general Markov scheme with initial probability distribution ν. Most arguments to
obtain that result are from [LS93] and [Mat99].

The next conclusion considers an algorithm under the assumption that the starting
point is chosen according to the stationary distribution. So a preliminary burn-in
period is not necessary, since we start already in the invariant distribution.
Theorem 5. Let ℳ be a lazy, reversible Markov scheme with stationary distribution
π, and let X_1, X_2, ... be a Markov chain generated by ℳ with initial
distribution π. Let f ∈ L_2(Ω,π), S(f) = ∫_Ω f(x) π(dx) and
S_n(f) := S_{n,0}(f) = (1/n) Σ_{j=1}^n f(X_j). Then we obtain

    e(S_n, f)² = E_{π,K} |S(f) − S_n(f)|² ≤ 4 / (ϕ(K,π)²·n) · ‖f‖_2².

Remark 5. This proof is again taken from [LS93, Theorem 1.9, p. 375]. Since it is
very important in our analysis and because of the slightly different notation we
will repeat it.
Proof. Let g := f − S(f), such that g ∈ L_2^0. Then we have with Lemma 2, Lemma 4
and ‖g‖_2 ≤ ‖f‖_2 that

    E_{π,K} |S(f) − S_n(f)|² = E_{π,K} | (1/n) Σ_{j=1}^n g(X_j) |²
        = (1/n²) Σ_{j=1}^n Σ_{i=1}^n E_{π,K}(g(X_j) g(X_i))
        = (1/n²) Σ_{j=1}^n Σ_{i=1}^n E_{π,K}(g(X_0) g(X_{|i−j|}))        [by (5)]
        = (1/n²) ( n ⟨g,g⟩ + Σ_{k=1}^{n−1} 2(n−k) ⟨P^k g, g⟩ )
        ≤ (1/n²) Σ_{k=0}^{n−1} 2(n−k) ⟨P^k g, g⟩
        ≤ (2/n) Σ_{k=0}^∞ ⟨P^k g, g⟩                                     [by (6)]
        ≤ (2/n) Σ_{k=0}^∞ (1 − ϕ(K,π)²/2)^k ‖g‖_2²                       [by (10)]
        = 4 / (ϕ(K,π)²·n) · ‖g‖_2² ≤ 4 / (ϕ(K,π)²·n) · ‖f‖_2².

Notice that laziness is essentially used by applying ⟨P^k g, g⟩ ≥ 0 in the second
inequality. □
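The bound of Theorem 5 can be sanity-checked by simulation on a small finite chain. This is an illustration only, not part of the paper: the lazy triangle walk below is reversible with uniform π and conductance ϕ = 1/2, so the bound reads 4‖f‖_2²/(ϕ²·n).

```python
import random

# Lazy random walk on a triangle: reversible, uniform pi, conductance 1/2.
K = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
pi = [1 / 3, 1 / 3, 1 / 3]
f = [1.0, 0.0, -1.0]             # S(f) = 0 and ||f||_2^2 = 2/3

def sample_mean(n, rng):
    """One realisation of S_n(f) with X_1 drawn from pi (no burn-in needed)."""
    x = rng.choices(range(3), weights=pi)[0]
    total = 0.0
    for _ in range(n):
        total += f[x]
        x = rng.choices(range(3), weights=K[x])[0]
    return total / n

def empirical_mse(n, runs, seed=0):
    """Estimate e(S_n, f)^2 = E|S(f) - S_n(f)|^2 by repeated simulation."""
    rng = random.Random(seed)
    return sum(sample_mean(n, rng) ** 2 for _ in range(runs)) / runs

phi, f_norm_sq, n = 0.5, 2 / 3, 50
theorem5_bound = 4 * f_norm_sq / (phi ** 2 * n)
```

For this fast-mixing chain the empirical mean-square error is well below the bound, which is what Theorem 5 guarantees (the bound is not claimed to be sharp).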
Let us consider the more general case, where the initial distribution is not the
stationary one. In the next statement a relation between the error when starting
with π and the error when starting with another distribution is established.
Lemma 6. Let ℳ be a reversible Markov scheme with stationary distribution π, and
let X_1, X_2, ... be a Markov chain generated by ℳ with initial distribution ν.
Let dν/dπ be a bounded density of ν with respect to π. Then we get for
g := f − S(f) ∈ L_2^0

(11)  E_{ν,K} |S(f) − S_{n,n_0}(f)|² = E_{π,K} |S(f) − S_n(f)|²
        + (1/n²) Σ_{j=1}^n ∫_Ω ∫_Ω (dν/dπ)(y) (K^{n_0+j}(x,dy) − π(dy)) g(x)² π(dx)
        + (2/n²) Σ_{j=1}^{n−1} Σ_{k=j+1}^n ∫_Ω ∫_Ω (dν/dπ)(y) (K^{n_0+j}(x,dy) − π(dy)) g(x) P^{k−j}g(x) π(dx).
Proof. It is easy to see that

    E_{ν,K} |S(f) − S_{n,n_0}(f)|²
        = (1/n²) Σ_{j=1}^n Σ_{i=1}^n E_{ν,K}(g(X_{n_0+j}) g(X_{n_0+i}))
        = (1/n²) Σ_{j=1}^n ∫_Ω P^{n_0+j}(g²)(x) ν(dx)
          + (2/n²) Σ_{j=1}^{n−1} Σ_{k=j+1}^n ∫_Ω P^{n_0+j}(g · P^{k−j}g)(x) ν(dx).

For every function h ∈ L_2(Ω,π) and i ∈ N, applying (3), the following
transformation holds true:

    ∫_Ω P^i h(x) ν(dx) = ∫_Ω ∫_Ω h(y) K^i(x,dy) (dν/dπ)(x) π(dx)
        = ∫_Ω ∫_Ω (dν/dπ)(y) K^i(x,dy) h(x) π(dx)                        [by (3)]
        = ∫_Ω h(x) π(dx) + ∫_Ω ∫_Ω (dν/dπ)(y) (K^i(x,dy) − π(dy)) h(x) π(dx)
        = ∫_Ω P^i h(x) π(dx)
          + ∫_Ω ∫_Ω (dν/dπ)(y) (K^i(x,dy) − π(dy)) h(x) π(dx)           [by stationarity]

Using this in the above setting, formula (11) is shown. □
The next finding is also a helpful tool to prove the main result of this paper. It
modifies the convergence property, which is described in Lemma 3, such that we are
able to use it in the considered context.
Lemma 7. Let ℳ be a lazy, reversible Markov scheme with stationary distribution
π, and let ν be the initial distribution of the related Markov chain, with bounded
density dν/dπ. Then we obtain for h ∈ L_∞(Ω,π) and j ∈ N

    | ∫_Ω ∫_Ω (dν/dπ)(y) (K^j(x,dy) − π(dy)) h(x) π(dx) |
        ≤ 4 ‖h‖_∞ √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^j.
Proof. At first we define p_j(x) := ∫_Ω (dν/dπ)(y) (K^j(x,dy) − π(dy)). With the
standard proof technique of integration theory it is easy to see that the
measurability of the density and the kernel carries over to p_j. Now we consider
the positive and negative parts of the functions h and p_j. To formalize this we
use

    Ω^+_+ := {x ∈ Ω : p_j(x) ≥ 0, h(x) ≥ 0},
    Ω^+_− := {x ∈ Ω : p_j(x) ≥ 0, h(x) < 0},
    Ω^−_+ := {x ∈ Ω : p_j(x) < 0, h(x) ≥ 0},
    Ω^−_− := {x ∈ Ω : p_j(x) < 0, h(x) < 0}.
These subsets of Ω are all included in the σ-algebra 𝒜, since p_j and h are
measurable functions. So applying (8) leads to the following upper bound:

    | ∫_Ω p_j(x) h(x) π(dx) |
        ≤ | ∫_{Ω^+_+} p_j(x) h(x) π(dx) | + | ∫_{Ω^+_−} p_j(x) h(x) π(dx) |
          + | ∫_{Ω^−_+} p_j(x) h(x) π(dx) | + | ∫_{Ω^−_−} p_j(x) h(x) π(dx) |
        ≤ ‖h‖_∞ ( | ∫_{Ω^+_+} p_j(x) π(dx) | + | ∫_{Ω^+_−} p_j(x) π(dx) |
          + | ∫_{Ω^−_+} p_j(x) π(dx) | + | ∫_{Ω^−_−} p_j(x) π(dx) | )
        ≤ 4 ‖h‖_∞ √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^j.                        [by (8)]

□
Now all results are available to obtain our main error bound for the MCMC method
S_{n,n_0}.

Theorem 8. Let X_1, X_2, ... be a lazy, reversible Markov chain, defined by the
scheme ℳ and the initial distribution ν. Let the initial distribution have a
bounded density dν/dπ with respect to π. Let S_{n,n_0}(f) = (1/n) Σ_{j=1}^n f(X_{n_0+j})
be the approximation of S(f) = ∫_Ω f(x) π(dx), where f ∈ L_∞(Ω,π). Then

    e(S_{n,n_0}, f) ≤ √( 2 [ 1 + 24 √‖dν/dπ‖_∞ · exp(−n_0 ϕ(K,π)²/2) ] )
                       / (ϕ(K,π)·√n) · ‖f‖_∞.
Proof. By Lemma 6 and Lemma 7, where g := f − S(f), we have

    E_{ν,K} |S(f) − S_{n,n_0}(f)|² ≤ E_{π,K} |S(f) − S_n(f)|²
        + (4 ‖g‖_∞² / n²) Σ_{j=1}^n √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^{j+n_0}
        + (8 ‖g‖_∞² / n²) Σ_{j=1}^{n−1} Σ_{k=j+1}^n ‖P^{k−j}‖_{L_∞→L_∞}
          √‖dν/dπ‖_∞ · (1 − ϕ(K,π)²/2)^{j+n_0}.
For an easier notation we define

(12)    ε_0 := √‖dν/dπ‖_∞ · exp(−n_0 ϕ(K,π)²/2).