ebook img

Bayesian Parameter Inference for Partially Observed Stopped Processes PDF

0.75 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Bayesian Parameter Inference for Partially Observed Stopped Processes

Bayesian Parameter Inference for Partially Observed Stopped Processes Ajay Jasra1 and Nikolas Kantas2 1Department ofStatistics &AppliedProbability,NationalUniversityofSingapore, Singapore, 117546,Sg. 2 E-Mail:[email protected] 1 2Department ofElectrical Engineering, ImperialCollegeLondon,London,SW72AZ,UK. 0 2 E-Mail:[email protected] n Abstract a J InthisarticleweconsiderBayesian parameterinferenceassociated topartially-observedstochasticprocesses 8 that start from a set B0 and are stopped or killed at the first hitting time of a known set A. Such processes 1 occur naturally within the context of a wide variety of applications. The associated posterior distributions are highly complex and posterior parameter inference requires the use of advanced Markov chain Monte Carlo ] (MCMC) techniques. Our approach uses a recently introduced simulation methodology, particle Markov chain O MonteCarlo (PMCMC) [1],wheresequentialMonteCarlo(SMC)[18,27]approximationsareembeddedwithin C MCMC.However,whentheparameterofinterestisfixed,standardSMCalgorithmsarenotalwaysappropriate . formanystoppedprocesses. In[11,15]theauthorsintroduceSMCapproximationsofmulti-levelFeynman-Kac t a formulae, which can lead to more efficient algorithms. This is achieved by devising a sequence of nested sets st from B0 to A and then perform the resampling step only when the samples of the process reach intermediate [ level sets in the sequence. Naturally, the choice of the intermediate level sets is critical to the performance of such a scheme. In this paper, we demonstrate that multi-level SMC algorithms can be used as a proposal in 1 PMCMC.Inaddition,weproposeaflexiblestrategythatadaptsthelevelsetsfordifferentparameterproposals. v Ourmethodology is illustrated on thecoalescent model with migration. 7 Key-Words: Stopped Processes, SequentialMonte Carlo, Markov chain Monte Carlo 6 7 3 1 Introduction . 1 0 In this article we consider Markov processes that are stopped when reaching the boundary of a given set A. These 2 1 processes appear in a wide range of applications, such as population genetics [24, 14], finance [7], neuroscience [5], : physics [17, 22] and engineering [6, 26]. The vast majority of the papers in the literature deal with fully observed v stoppedprocessesandassumetheparametersofthemodelareknown. Inthispaperweaddressproblemswhenthis i X is not the case. In particular,Bayesianinference for the model parametersis considered,when the stopped process r is observed indirectly via data. We will propose a generic simulation method that can cope with many types of a partial observations. To the best of our knowledge,there is no previous work in this direction. An exception is [5], where maximum likelihood inference for the model parameters is investigated for the fully observedcase. Inthefullyobservedcase,stoppedprocesseshavebeenstudiedpredominantlyintheareaofrareeventsimulation. In order to estimate the probability of rare events related to stopped processes, one needs to efficiently sample realisationsofa processthat starts ina set B andterminates inthe givenraretargetset Abefore returning to B 0 0 or getting trapped in some absorbing set. This is usually achieved using Importance Sampling (IS) or multi-level splitting; see[19, 30] and the references in those articles for an overview. Recently, sequential Monte Carlo (SMC) methods based on both these techniques have been used in [6, 17, 22]. In [10] the authors also prove under mild conditions that SMC can achieve same performance as popular competing methods based on traditional splitting. Sequential Monte Carlo methods canbe described as a collection of techniques used to approximatea sequence ofdistributionswhosedensitiesareknownpoint-wiseuptoanormalizingconstantandareofincreasingdimension. SMCmethodscombineimportancesamplingandresamplingtoapproximatedistributions. Theideaistointroduce a sequence of proposal densities and to sequentially simulate a collection of N ≫ 1 samples, termed particles, in parallel from these proposals. The success of SMC lies in incorporating a resampling operation to control the variance of the importance weights, whose value would otherwise increase exponentially as the target sequence progresses e.g. [18, 27]. 1 Applying SMC in the context of fully observed stopped processes requires using resampling while taking into accounthowcloseasampleistothetargetset. Thatis,itispossiblethatparticlesclosetoAarelikelytohavevery small weights, whereas particles closer to the starting set B can have very high weights. As a result, the diversity 0 of particles approximating longer paths before reaching A would be depleted by successive resampling steps. In population genetics, for the coalescent model [24], this has been noted as early as in the discussion of [31] by the authors of [11]. Later, in [11] the authors used ideas from splitting and proposed to perform the resampling step onlywheneachsampleoftheprocessreachesintermediatelevelsets,whichdefineasequenceofnestedsetsfromB 0 to A. The same idea appeared in parallel in [15, Section 12.2], where it was formally interpreted as an interacting particle approximationofappropriatemulti-levelFeynman-Kacformulae. Naturally,the choice ofthe intermediate level sets is critical to the performance of such a scheme. That is, the levels should be set in a “direction” towards the set A and so that each level can be reached from the previous one with some reasonable probability [19]. This is usually achievedheuristically using trialsimulation runs. Also more systematic techniques exist: for cases where large deviations can be applied in [13] the authors use optimal control and in [8, 9] the level sets are computed adaptively on the fly using the simulated paths of the process. ThecontributionofthispaperistoaddresstheissueofinferringtheparametersofthelawofthestoppedMarkov process, when the process itself is a latent process and is only partially observed via some given dataset. In the context of Bayesianinference one often needs to sample from the posterior density of the model parameters,which can be very complex. Employing standard Markov chain Monte Carlo (MCMC) methods is not feasible, given the difficulty one faces to sample trajectories of the stopped process. In addition, using SMC for sequential parameter inferencehasbeennotoriouslydifficult;see[2,23]. Inparticular,duetothesuccessiveresamplingstepsthesimulated past of the path of each particle will be very similar to each other. This has been a long standing bottleneck when static parameters θ are estimated online using SMC methods by augmenting them with the latent state. These issues have motivated the recently introduced particle Markov chain Monte Carlo (PMCMC) [1]. Essentially, the method constructs a Markov chain on an extended state-space in the spirit of [3], such that one may apply SMC updates for a latentprocess,i.e. use SMC approximationswithin MCMC.In the contextofparameterinference for stopped process this brings up the possibility of using the multi-level SMC methodology as a proposal in MCMC. This idea to the best ofour knowledgehasnotappearedpreviouslyinthe literature. The main contributionsmade in this article are as follows: • When the sequence of level sets is fixed a priori, the validity of using multi-level SMC within PMCMC is verified. • Toenhanceperformanceweproposeaflexibleschemewherethelevelsetsareadaptedtothecurrentparameter sample. The method is shownto produce unbiasedsamples from the targetposteriordensity. Inaddition, we show both theoretically and via numerical examples how the mixing of the PMCMC algorithm is improved when this adaptive strategy is adopted. Thisarticleisstructuredasfollows: inSection2weformulatetheproblemandpresentthecoalescentasamotivating example. InSection3wepresentmulti-levelSMCforstoppedprocesses. InSection4wedetailaPMCMCalgorithm whichusesmulti-levelSMCapproximationswithinMCMC.Inaddition,specificadaptivestrategiesforthelevelsare proposed,whicharemotivatedbysometheoreticalresultsthatlink the convergencerateofthe PMCMC algorithm tothepropertiesofmulti-levelSMCapproximations. InSection5somenumericalexperimentsforthethecoalescent are given. The paper is concluded in Section 6. The proofs of our theoreticalresults canbe found in the appendix. 1.1 Notations Thefollowingnotationswillbeused. Ameasurablespaceiswrittenas(E,E),withtheclassofprobabilitymeasures on E written P(E). For Rn, n∈N the Borelsets are B(Rn). For a probability measure γ ∈P(E) we will denote the density with respect to an appropriate σ-finite measure dx as γ(x). The total variation distance between two probabilitymeasuresγ ,γ ∈P(E)is writtenaskγ −γ k=sup |γ (A)−γ (A)|. Foravector(x ,...,x ),the 1 2 1 2 A∈E 1 2 i j compactnotationx isused;ifi>j x isanullvector. Foravectorx ,|x | istheL −norm. Theconvention i:j i:j 1:j 1:j 1 1 =1 is adopted. Also, min{a,b} is denoted as a∧b and I (x) is the indicator of a set A. Let E be a countable ∅ A (possibly infinite dimensional) state-space, then Q S(E)= R=(r ) : r ≥0, r =1 and νR=ν for some ν =(ν ) with ν ≥0, ν =1 ij i,j∈E ij il i i∈E i l ( ) l∈E l∈E X X 2 denotes the class of stochastic matrices which possess a stationary distribution. In addition, we will denote as e =(0,...,0,1,0,...,0) the d-dimensional vector whose ith element is 1 and is 0 everywhere else. Finally, for the i discrete collection of integers we will use the notation T ={1,...,d}. d 2 Problem Formulation 2.1 Preliminaries Let θ be a parameter vector on (Θ,B(Θ)), Θ ⊆ Rdθ with an associated prior πθ ∈ P(Θ). The stopped process {X } is a (E,E)−valued discrete-time Markov process defined on a probability space (Ω,F,P ), where P is a t t≥0 θ θ probabilitymeasuredefinedforeveryθ ∈ΘsuchthatforeveryA∈F, P (A) isB(Θ)−measurable. For simplicity θ willwewillassumethroughoutthepaperthattheMarkovprocessishomogeneous. Thestateoftheprocess{X } t t≥0 beginsitsevolutioninanonemptysetB obeyinganinitialdistributionν : B →P(B )andaMarkovtransition 0 θ 0 0 kernel P : E×Θ → P(E). The process is killed once it reaches a non-empty target set A ⊂ B ∈ F such that θ 0 P (X ∈B \A)=1. The associated stopping time is defined as θ 0 0 T =inf{t≥0:X ∈A}, t where it is assumed that P (T < ∞) = 1 and T ∈ I, where I is a collection of positive integer values related to θ possible stopping times. In this paper we assume that we have no direct access to the state of the process. Instead the evolution of the state of the process generates a random observations’ vector, which we will denote as Y. The realisation of this observations’ vector is denoted as y and assume that it takes value in some non empty set F. We will also assume that there is no restriction on A depending on the observed data y, but to simplify exposition this will be omitted from the notation. In the context of Bayesian inference we are interested in the posterior distribution: π(dθ,dx ,τ|y)∝γ (dx ,y,τ)π(dθ), (1) 0:τ θ 0:τ where τ ∈I is the stopping time, π is the prior distribution and γ is the un-normalised complete-data likelihood θ θ with the normalising constant of this quantity being: Z = γ (dx ,y,τ)dx . θ θ 0:τ 0:τ τ∈IZEτ+1 X The subscript on θ will be used throughout to explicitly denote the conditional dependance on the parameter θ. Given the specific structure of the stopped processes one may write γ as θ τ γ (dx ,y,τ)=ξ (y|x )I (x )ν (dx ) P (dx |x ), (2) θ 0:τ θ 0:τ (Ac)τ×A 0:τ θ 0 θ t t−1 t=1 Y where ξ : Θ×F × {τ}×Eτ+1 → (0,1) is the likelihood of the data given the trajectory of the process. θ τ∈I Throughout, it will be assumed that for any θ ∈ Θ, y ∈ F, γ admits a density γ (x ,y,τ) with respect to a σ−finite measure dx(cid:0)Son E = {(cid:1)τ}×Eτ+1 and the posθterior and prior distrθibu0t:τions π, p admit densities 0:τ τ∈I π, p respectively both defined with respect to appropriate σ−finite dominating measures. (cid:0)S (cid:1) Note that (1) is expressed as an inference problem for (θ,x ,τ) and not only θ. The overall motivation 0:τ originates from being able to design a MCMC that can sample from π, which requires one to write the target (or an unbiased estimate of it) up-to a normalizing constant [3]. Still, our primary interest lies in Bayesian inference for the parameter and this can be recovered by the marginals of π with respect to θ. As it will become clear in Section 4 the numerical overheadwhen augmenting the argument of the posterior is necessary and we believe that the marginals with respect to x ,τ might also be useful by-products. 0:τ 2.2 Motivating example: the coalescent The framework presented so far is rather abstract, so we introduce the coalescent model as a motivating example. In Figure 1 we presenta particularrealisationof the coalescentfor two genetic types {A,C}. The process starts at epoch t=0 when the most recentcommonancestor(MRCA) splits into two versionsof itself. In this example A is 3 A t=0 split A→C t=1 mutation t=2 split A→C t=3 mutation τ =4 stop y ={1,2} C A C Figure 1: Coalescent model example: each of {A,C} denote the possible genetic type of observed chromosomes. In this example we have d = 2 and m = 3. The tree propagates forward in time form the MRCA downwards by a sequence of split and mutation moves. Arrows denote a mutation of one type of a chromosome to another. The name of the process originates from viewing the tree backwards in time (from bottom to top) where the points where the graph join are coalescent events. chosen to be the MCRA and the process continues to evolve by split and mutation moves. At the stopping point (here t=4) we observe some data y, which corresponds to the number of genes for each genetic type. In general we will assume there are d different genetic types. The latent state of the process xi is composed of t the number of genes of each type i at epoch t of the process and let also x = (x1,...,xd). The process begins by t t t default when the first split occurs, so the Markov chain {X } is initialised by the density t t≥0 ν if x =2e ν (x )= i 0 i θ 0 0 otherwise. (cid:26) and is propagated using the following transition density: xit−1 µ r if x =x −e +e (mutation) |xt−1|1|xt−1|1−1+µ il t t−1 i l Pθ(xt|xt−1)= xit−1 |xt−1|1−1 if x =x +e (split)  0|xt−1|1|xt−1|1−1+µ othertwise,t−1 i where e is defined in Section 1.1.Here the first transition type corresponds to individuals changing type and is i calledmutation,e.g. A→C att∈{1,3}inFigure1. Thesecondtransitionis calledasplitevent,e.g. t∈{0,2}in the exampleofFigure1. ToavoidanyconfusionwestressthatinFigure1wepresentaparticularrealisationofthe process that is composedby a sequence of alternate split and mutations, but this is not the only possible sequence. For example, the bottom of the tree could have be obtained with C being the possible MCRA and a sequence of two consecutive splits and a mutation. The processis stoppedatepoch τ whenthe number ofindividuals inthe populationreachesm. So forthe state space we define: E = {t}×Et+1 t∈I(cid:18) (cid:19) [ E = {x: x∈(Z+)d and 2≤|x| ≤m} 1 I = {m,m+1,...}, 4 and for the initial and terminal sets we have: B = {x: x∈{0,2}d and |x| =2} 0 1 A = {x:x∈(Z+)d and |x| =m}. 1 The data is generated by setting y := y1:d = x (∈A), which corresponds to the counts of genes that have been τ observed. In the example of Figure 1 this corresponds to m=3. Hence for the complete likelihood we have: d yi! τ γ (x ,y,τ)=I (x ) i=1 ν (x ) P (x |x ) (3) θ 0:τ A∩{x:x=y} τ m! θ 0 θ t t−1 Q (cid:20) t=1 (cid:21) Y As expected, the density is only non-zero if at time τ x matches the data y exactly. τ Our objective is to infer the genetic parameters θ = (µ,R), where µ ∈ R+ and R ∈ S(T ) and hence the d parameter space can be written as Θ = R+ ×S(T ). To facilitate Monte Carlo inference, one can reverse the d time parameter and simulate backward from the data. This is now detailed in the context of importance sampling following the approachin [21]. 2.2.1 Importance sampling for the coalescent model To sample realisations of the process for a given θ ∈ Θ, importance sampling is adopted but with time reversed. First we introduce a time reversed Markov kernel M with density M (x |x ). This is used as an importance θ θ t−1 t sampling proposalwheresampling isperformedbackwardsin time andthe weightingforwardintime. We initialise using the data and simulate the coalescent tree backward in time until two individuals remain of the same type. This procedure ensures that the data is hit when the tree is considered forward in time. The process defined backward in time can be interpreted as a stopped Markov process with the definitions of the initial and terminal sets appropriately modified. For convenience we will consider the reverse event sequence of the previous section, i.e we posed the problem backwards in time with the reverse index being j. The proposal density for the full path starting from the bottom of the tree and stopping at its root can be written as τ q (x )=I (x ) M (x |x ) I (x ). θ 0:τ B0∩{x:x=y} 0 θ j j−1 B0 τ (cid:26)j=1 (cid:27) Y With reference to (3) we have m−1 d yi! τ P (x |x ) γ (x ,y,τ)= i=1 ν (x ) θ j−1 j q (x ). θ 0:τ m−1+µ m! θ τ M (x |x ) θ 0:τ Q (cid:26)j=1 θ j j−1 (cid:27) Y Then the marginal likelihood can be obtained m−1 d yi! τ P (x |x ) Z = i=1 ν (x ) θ j−1 j q (x )dx . θ m−1+µQm! τ∈IZEτ+1 θ τ (cid:26)j=1Mθ(xj|xj−1)(cid:27) θ 0:τ 0:τ X Y In[31]theauthorsderiveanoptimalproposalM withrespecttothevarianceofthemarginallikelihoodestimator. θ For the sake of brevity we omit any further details. In the current setup where there is only mutation and coalescences, the stopped-process can be integrated out [20], but this is not typically possible in more complex scenarios. A more complicated problem, including migration, is presented in Section 5.2. Finally, we remark that the relevance ofthe marginallikelihoodabovewill become clearlater in Section4 as a crucialelement in numerical algorithms for inferring θ. 3 Multi-Level Sequential Monte Carlo Methods In this section we shall briefly introduce generic SMC without extensive details. We refer the reader for a more detailed description to [15, 18]. To ease exposition, when presenting generic SMC, we shall drop the dependence upon parameter θ. 5 Algorithm 1 Generic SMC Algorithm Initialisation, n=1: For i=1,...,N 1. Sample u(i) ∼M . 1 1 2. Compute weights γ (u(i)) W(i) W(i) = 1 1 , W¯(i) = 1 . 1 M (u(i)) 1 N W(j) 1 1 j=1 1 For n=2,...,p, P For i=1,...,N, 1. Resampling: sample index ai ∼f(·|W¯ ), where W¯ =(W¯(1) ,...,W¯(N)). n−1 n−1 n−1 n−1 n−1 2. Sample u(i) ∼M (·|u(ain−1)) and set u(i) =(u(ain−1),u(i)). n n 1:n−1 1:n 1:n−1 n 3. Compute weights γ (u(i)) W(i) W(i) =w (u(i))= n 1:n , W¯(i) = n . n n 1:n γ (u(i) )M (u(i)|u(i) ) n N W(j) n−1 1:n−1 n n 1:n−1 j=1 n P SMC algorithms are designed to simulate from a sequence of probability distributions π ,π ,...,π defined on 1 2 p state space of increasing dimension, namely (G ,G ),(G ×G ,G ⊗G ),...,(G ×···×G ,G ⊗···⊗G ). Each 1 1 1 2 1 2 1 p 1 p distribution in the sequence is assumed to possess densities with respect to a common dominating measure: γ (u ) π (u )= n 1:n n 1:n Z n with each un-normalised density being γ : G ×···×G →R and the normalizing constant being Z . We will n 1 n + n assume throughoutthe article that there are naturalchoices for {γ¯ } andthat we can evaluate each γ point-wise. n n In addition, we do not require knowledge of Z . n 3.1 Generic SMC algorithm SMC algorithms approximate{π }p recursivelyby propagatinga collectionof properlyweighted samples,called n n=1 particles, using a combination of importance sampling and resampling steps. For the importance sampling part of the algorithm, at each step n of the algorithm, we will use general proposal kernels M with densities M , which n n possess normalizing constants that do not depend on the simulated paths. A typical SMC algorithm is given in Algorithm 1 and we obtain the following SMC approximations for π , n N πN(du )= W¯(j)δ (du ) n 1:n n u(i) 1:n 1:n j=1 X and for the normalizing constant Z : n n N 1 Z = W(j) . (4) n N k k=1(cid:26) j=1 (cid:27) Y X In this paper we will use f to be the mbultinomial distribution. Then the resampled index of the ancestor of particle i at time n, namely ai ∈ {1,...,N}, is also a random variable with value chosen with probability n−1 W¯(ain−1). Foreachtime n, we willdenote the complete collectionofancestorsobtainedfromthe resamplingstepas n−1 a¯ = (a1,...,aN) and the randomly simulated values of the state obtained during sampling (step 2 for n ≥ 2) as n n n u¯ = (u(1),...,u(N)). We will also denote a¯ ,u¯ the concatenated vector of all these variables obtained during n n n 1:p 1:p the simulations from time n=1,...,p. Note u¯ the is a vector containing all N ×p simulated states and should 1:p (1) (N) not be confused with the particle sample of the path (u ,...,u ). 1:p 1:p 6 Furthermore, the joint density of all the sampled particles and the resampled indices is N p N ψ(u¯ ,a¯ )= M (u(i)) W¯(ain−1)M (u(i)|u(ain−1),...,u(ai1)) , (5) 1:p 1:p−1 1 1 n−1 n n n−1 1 (cid:18)i=1 (cid:19)n=2(cid:18)i=1 (cid:19) Y Y Y The complete ancestral genealogy at each time can always traced back by defining an ancestry sequence bi for 1:n every i ∈ T and n ≥ 2. In particular, we set the elements of bi using the backward recursion bi = abin+1 N 1:n n n where bi = i. In this context one can view SMC approximations as random probability measures induced by the p imputedrandomgenealogya¯ andallthepossiblesimulatedstatesequencesthatcanbeobtainedusingu¯ . This 1:n 1:n interpretationof SMC approximationswas introducedin [1] and will be later used together with ψ(u¯ ,a¯ ) for 1:p 1:p−1 establishing the complex extended target distribution of PMCMC. 3.2 Multi-Level SMC implementation For different classes of problems one can find a variety of enhanced SMC algorithms; see e.g. [18]. In the context of stopped processes, a multi-level SMC implementation was proposed in [11] and the approach was illustrated for the coalescent model of Section 2.2. We consider a modified approach along the lines of Section 12.2 of [15] which seems better suitedfor generalstoppedprocessesandcanprovablyyieldestimatorsofmuchlowervariancerelative to vanilla SMC. Introduce an arbitrary sequence of F−nested sets B ⊃B ···⊃B =A, p≥2 0 1 p with the corresponding stopping times denoted as T =inf{t≥0:X ∈B }, 1≤l≤p, l t l Note that the Markov property of X implies 0≤T ≤T ≤···≤T =T. t 1 2 p The implementation of multi-level SMC differs from the generic algorithm of Section 3.1 in that between suc- cessive resampling steps one proceeds by propagating in parallel trajectories of X(j) until the set B is reachedfor 0:t n each j ∈T . For a given j ∈T the path X(j) is “frozen” once X(j) ∈B , until the remaining particles reach B N N 0:t 0:t n n and then a resampling step is performed. More formally denote for n=1 X =(x ,τ ) ∈{x ,τ : x ∈B \B , x ∈B } 1 0:τ1 1 0:τ1 1 0:τ1−1 0 1 τ1 1 where τ is a realisation for the stopping time T and similarly for 2≤n≤p we have 1 1 X = x ,τ ∈{x ,τ : x ∈B \B , x ∈B }. n τn−1+1:τn n τn−1+1:τn n τn−1+1:τn−1 n−1 n τn n (cid:0) (cid:1) Multi-level SMC is a SMC algorithm which ultimately targets a sequence of distributions {π } each defined on n a space E = {τ }×Eτn+1 (6) n n τn[∈In where n ∈ T , p ≥ 2 and I ,...,I are finite collections of positive integer values related to the stopping times p 1 p T ,...,T respectively. In the spirit of generic SMC define intermediate target densities π w.r.t to an appropriate 1 p n σ-finite dominating measure dX . We will assume there exists a natural sequence of densities {π = γn} n n Zn 1≤n≤p obeying the restriction γ ≡ γ so that the last target density γ coincides with γ in (2). Note that we define a p θ p θ sequence ofptargetdensities, butthis time the dimensionofγ comparedto γ growswith arandomincrement n n−1 of τ −τ . In addition, γ should clearly depend on the value of θ, but this suppressed in the notation. The n n−1 p following proposition is a direct consequence of the Markov property: Proposition 3.1. Assume P (T < ∞) = 1. Then the stochastic sequence defined (X ) forms a Markov θ n 1≤n≤p chain taking values in E defined (6). In addition, for any bounded measurable function h : E → R, then n n h(X )γ (dX )= h(x ,τ)γ (dx ,y,τ). En p p p τ∈I Eτ+1 0:τ θ 0:τ R The proof can bePfoundRin [15, Proposition 12.2.2, page 438], [15, Proposition 12.2.4, page 444] and the second part is due to γ =γ . p θ 7 We will present multi-level SMC based as a particular implementation of the generic SMC algorithm. Firstly we replace u ,u with X ,X respectively. Contrary to the presentation of Algorithm 1 for multi-level SMC n 1:n n 1:n we will use a homogeneous Markov importance sampling kernel M (dx |x ), where M : Θ × E → P(E), θ t t−1 θ M (dx |x )≡M (dx )byconventionandM isthecorrespondingdensityw.r.t. dx. Tocomputetheimportance θ 0 −1 θ 0 θ sampling weights of step 3 for n≥2 in Algorithm 1 we use instead: γ (X ,...,X ) w (X ,...,X )= n 1 n . n 1 n γ (X ,...,X ) τn M (x |x ) n−1 1 n−1 l=τn−1+1 θ l l−1 Q and for step 2 at n=1: γ (X ) w (X )= 1 1 . 1 1 τ1 M (x |x ) l=0 θ l l−1 To simplify notation from herein we write Q τ1 M (X )= M (x |x ) 1 1 θ l l−1 l=0 Y and given p, for any 2≤n≤p we have τn M (X |X )= M (x |x ), n n n−1 θ l l−1 l=τYn−1+1 whereagainwehavesuppressedtheθ-dependanceofM inthenotation. Wepresentthemulti-levelSMCalgorithm n in Algorithm 2. Note here we include a procedure whereby at each stage n, particles that do not reach B before n time t are rejected by assigning them a zero weight, whereas before it was hinted that resampling is performed n when all particles reach B . Similar to (5), it is clear that the joint probability density of all the random variables n used to implement a multi-level SMC algorithm with multinomial resampling is given by: N p N ψ (X¯ ,a¯ )= M (X(i)) W¯(ain−1)M (X(i)|X(ain−1)) , (7) θ 1:p 1:p−1 1 1 n−1 n n n−1 (cid:18)i=1 (cid:19)n=2(cid:18)i=1 (cid:19) Y Y Y where X¯ is defined similarly to u¯ . Finally, recall by construction Z = Z so the approximation of the 1:p 1:p p θ normalizing constant of γ for a fixed θ is θ p N 1 Z =Z = w(j) X(j) . (8) θ p N n 1:n nY=1(cid:26) Xj=1 (cid:16) (cid:17)(cid:27) b b 3.2.1 Setting the levels We will begin by showing how the levels can be set for the coalescent example of Section 2.2. We will proceed in the spirit of Section 2.2.1 and consider the backward process so that the “time” indexing is set to start from the bottom of the tree towards the root. We introduce a a collection of integers m>l >l >···>l =2 and define 1 2 p B ={x∈(Z+∪{0})d : x=y}, n=0, 0 B ={x∈(Z+∪{0})d : |x| =l }, 1≤n≤p. n 1 n Clearly we have B ⊃ B , 1 ≤ n ≤ p and B = A. One can also write the sequence of target densities for the n−1 n p multi-level setting as: m−1 d (yi)! τ1 γ (x ,τ )= i=1 I (x ) P (x |x )I (x ), 1 0:τ1 1 m−1+µ m! {y} 0 θ l−1 l {xtn∈Bn} tn Q l=1 Y τn γ (x ,τ )=γ (x ,τ ) P (x |x )I (x ), n=1,...,p. n 0:τn n n−1 0:τn−1 n−1 θ l−1 l {xtn∈Bn} tn l=τYn−1+1 8 Algorithm 2 Multi-level SMC Algorithm Initialisation, n=1: For i=1,...,N 1. For t=1,...,t : 1 (a) Sample x(i) ∼M (·|x(i) ). t θ t−1 (b) If x(i) ∈B set τ(i) =t , X(i) = x(i) ,τ(i) and go to step 2. t 1 1 1 0:τ(i) 1 (cid:18) 1 (cid:19) 2. Compute weights W(i) = γ1(X1(i))Iτ1(i)≤t1, W¯(i) = W1(i) . 1 M (X(i)) 1 N W(j) 1 1 j=1 1 For n=2,...,p, P For i=1,...,N, 1. Resampling: sample index ai ∼f(·|W¯ ), where W¯ =(W¯(1) ,...,W¯(N)). n−1 n−1 n−1 n−1 n−1 2. For t=τ(i) +1,...,t : n−1 n (a) Sample x(i) ∼M (·|x(i) ). t θ t−1 (b) If x(i) ∈B set τ(i) =t , X(i) = x(i) ,τ(i) and go to step 3. t n n n (cid:18) τn(i−)1+1:τn(i) n (cid:19) 3. Set X(i) =(X(ain−1),X(i)). 1:n 1:n−1 n 4. Compute weights W(i) =w (X(i))= γn(X1(:in))Iτn(i)≤tn , W¯(i) = Wn(i) . n n 1:n γ (X(i) )M (X(i)|X(i) ) n N W(j) n−1 1:n−1 n n 1:n−1 j=1 n P 9 Algorithm 3 Particle independent Metropolis-algorithm(PIMH) 1. SampleX¯ ,a¯ from(7)usingthemulti-levelimplementationofAlgorithm1detailedinSection(3.2)and 1:p 1:p−1 compute Z . Sample k ∼f(·|W¯ ) . p p 2. Set ξ(0)=b k(0),X¯1:p(0),a¯1:p−1(0) = k,X¯1:p,a¯1:p−1 and Zp(0)=Zp. 3. For i=1,.(cid:0)..,K: (cid:1) (cid:0) (cid:1) b b (a) Propose a new X¯′ ,a¯′ and k′ as in step 1 and compute Z′, 1:p 1:p p ′ (b) Accept this as the new state of the chain with probabiblity 1 ∧ Zcp . If we accept, set ξ(i) = Zbp(i−1) k(i),X¯ (i),a¯ (i) = k′,X¯′ ,a¯′ and Z (i) = Z′. Otherwise reject, ξ(i) = ξ(i − 1) and 1:p 1:p−1 1:p 1:p−1 p p Z (i)=Z (i−1). (cid:0)p p (cid:1) (cid:0) (cid:1) b b b b Themajordesignproblemthatremainsingeneralisthatgivenany candidatesfor{M },howtosetthespacing n,θ (in some sense) of the {B } and how many levels are needed so that good SMC algorithms can be constructed. n Thatis,ifthe{B }arefarapart,thenonecanexpectthatweightswilldegenerateveryquicklyandifthe{B }are n n too close that the algorithm will resample too often and hence lead to poor estimates. For instance, in the context of the coalescent example of Section 2.2, if one uses the above construction for {B } the importance weight at the n n-th resampling time is τn P (x |x ) w (x )= θ l−1 l I (x ), n 0:τn M (x |x ) {xτn∈Bn} τn l=τYn−1+1 θ,n l l−1 Now, in generalfor any {l }p andp it is hardto know beforehandhow muchbetter (or not) the resulting multi- n n=1 level algorithm will perform relative to a vanilla SMC algorithm. Whilst [11] show empirically that in most cases one should expect a considerable improvement, there θ is considered to be fixed. In this case one could design the levels sensibly using offline heuristics or more advanced systematic methods using optimal control [13] or adaptive simulation[8,9],e.g. bysettingthenextlevelusingthemedianofapre-specifiedrankoftheparticlesample. What weaimtoestablishinthenextsectionisthatwhenθ isvariesasinthe contextofMCMC algorithms,onecanboth construct PMCMC algorithms based on multi-level SMC and more importantly easily design for each θ different sequences for {B } based on similar ideas. n 4 Multi-Level Particle Markov Chain Monte Carlo ParticleMarkovChainMonteCarlo(PMCMC)methodsareMCMCalgorithms,whichusealltherandomvariables generatedbySMCapproximationsasproposals. As instandardMCMCthe ideaisto runanergodicMarkovchain to obtain samples from the distribution of interest. The difference lies that in order use the simulated variables from SMC, one defines a complex invariantdistribution for the MCMC on an extended state space. This extended target is such that a marginal of this invariant distribution is the one of interest. This section aims on providing insight to the following questions: 1. Is it valid in general to use multi-level SMC within PMCMC? 2. Given that it is, how can we use the levels to improve the mixing of PMCMC? Theanswertothefirstquestionseemsratherobvious,sowewillprovidesomestandardbutratherstrongconditions forwhichmulti-levelPMCMCisvalid. ForthesecondquestionwewillproposeanextensiontoPMMHthatadapts thelevelsetsusedtoθ ateveryiterationofPMCMC.[1]introducesthreedifferentandgenericPMCMCalgorithms: particle independent Metropolis Hastings algorithm (PIMH), particle marginal Metropolis Hastings (PMMH) and particle Gibbs samplers. In the remainder of the paper we will only focus on the first two of these. 4.1 Particle independent Metropolis Hastings (PIMH) Wewillbeginbypresentingthesimplestgenericalgorithmfoundin[1],namelytheparticleindependentMetropolis Hastings algorithm (PIMH). In this case θ and p are fixed and PIMH is designed to sample from the pre-specified 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.