Tutorial on ABC rejection and ABC SMC for parameter estimation and model selection Tina Toni, Michael P. H. Stumpf [email protected], [email protected] 0 1 In this tutorial we schematically illustrate four algorithms: 0 2 1. ABC rejection for parameter estimation [1, 2], n a 2. ABC SMC for parameter estimation [3, 4, 5], J 9 3. ABC rejection for model selection on the joint space [6], 1 ] 4. ABC SMC for model selection on the joint space [7]. O C We suggest to read this tutorial from the beginning. We start with a detailed t. explanation of the ABC rejection algorithm, which later helps to understand ABC SMC a as it is based on the same concepts. Also, both model selection algorithms are closely t s [ related to parameter estimation algorithms and it is therefore helpful to understand those first. 2 v 2 This tutorial forms a part of the supplementary material of the paper ”Simulation- 7 based model selection for dynamical systems in systems and population biology, 4 4 Bioinformatics, 26 (1), 104-110, 2010” (T. Toni, M. P. H. Stumpf). . 0 1 9 0 : v i X r a 1 o o o o o oo oo o o o o o o o o o o o o o o o o o o o o oo oo o o o ABCfordynamicalsystems ABCframeworkfordynamicalsystems ABC rejection (a) We define a prior distribution P(θ) x andwewouldliketoapproximatetheposterior x x distribution P(θ D ). We start by sampling a x 0 parameter θ∗ fro|m the prior distribution. We time call this sampled parameter a particle. ABCfordynamicalsystems ABC framework for dynamical systems (b) We simulate a data set D∗ according Prior Posterior to some simulation framework f(D θ∗). In P(θ) P(θ|D0) | (a) our examples we use different simulation frameworks. If we simulate a deterministic ox dynamical model, we add some noise at the ox time points of interest. If we simulate a ox ox stochastic dynamical model, we do not add ABCfortidmynaemicalsystems any additional noise to the trajectories. We ABCframeworkfordynamicalsystems (b) compare the simulated data set D∗ (circles) to the experimental data D (crosses) using 0 a distance function, d, and toPlreiorrance (cid:15); iPfosterior ox d(D0,D∗) ≤ (cid:15), we accept θ∗. TP(hθe) toleranceP(θ|D) ox ox ox (cid:15) 0 is the desired level of agreement between time ≥ D and D∗. 0 (c) The particle θ∗ is accepted because Prior Posterior D∗ and D0 are sufficiently close. APBC(fθor)dynamicalsystems P(θ|D0) ABCframeworkfordynamicalsystems (c) (d) We sample another parameter θ∗ from the priordistributionandsimulateacorresponding o dataset D∗. In this case D∗ and D0 are very ox ox different and we reject the particle (we ”throw x x o it away”). ABCfordynamicalsystems time ABC framework for dynamical systems (e) We repeat the whole procedure until Prior Posterior N particles have been accepted. They repre- sent a sample from P(θ d(D ,D∗) (cid:15)), which P(θ) P(θ|D0) 0 ABCframeworkfordynamicalsystems | ≤ (d) approximates the posterior distribution. If x (cid:15) is sufficiently small then the distribution x TinaToni ABCSMC x26/06/2009 3/3 P(θ d(D ,D∗) (cid:15)) will be a good approxi- x 0 mat|ion for the≤“true” posterior distribution, time P(θ D0). x | x x x (f) Many particles were rejected in the Prior Posterior time procedure, for which we have spent a lot Pof(θ) Prior P(θD)Posterior computational effort for simulation. ABC P(θ) | P(θ|D0) rejection is therefore computationally ineffi- (e) (f) cient. We can use ABC SMC to reduce the computational cost. Figure1: SchematicrepresentationofABCre- jection. 2 TinaToni ABCSMC 26/06/2009 3/3 TinaToni ABCSMC 26/06/2009 3/3 August2,2009 1/1 TinaToni ABCSMC 26/06/2009 4/24 TinaToni ABCSMC 26/06/2009 4/24 Population1 Population2 PopulationT Population2 PopulationT ABCSMC(SequentialMonteCarlo) ABC SMC (Toni et al., 2009) (cid:15)1 (cid:15)2 ... (cid:15)T 1 (cid:15)T − (a) As in ABC rejection, we define a prior distributionP(θ)andwewouldliketoapproxi- mateaposteriordistributionP(θ D ). InABC 0 | SMC we do this sequentially by constructing intermediate distributions, which converge to the posterior distribution. We define a Prior IntermediateDistributions Posterior tolerance schedule (cid:15) >(cid:15) >...(cid:15) 0. 1 2 T ≥ (a) ABCSMC(SequentialMonteCarlo) TinaToni,MichaelStumpf ABCdynamicalsystems 03/07/08 1/1 Population1 (b) We sample particles from a prior distribu- (cid:15)1 (cid:15)2 ... (cid:15)T 1 (cid:15)T − tionuntilN particleshavebeenaccepted(have reached the distance smaller than (cid:15) ). For all 1 accepted particles we calculate weights (see [4] for formulas and derivation). We call the sample of all accepted particles ”Population 1”. Prior IntermediateDistributions Posterior (b) ABCSMC(SequentialMonteCarlo) TinaToni,MichaelStumpf ABCdynamicalsystems 03/07/08 1/1 (c) We then sample a particle θ∗ from popu- Population1 (cid:15)1 (cid:15)2 ... (cid:15)T 1 (cid:15)T lation 1 and perturb it to obtain a perturbed − particle θ∗∗ K(θ θ∗), where K is a per- ∼ | turbation kernel (for example a Gaussian random walk). We then simulate a dataset D∗ f(D θ∗∗) and accept the particle θ∗∗ ∼ | if d(D ,D∗∗) (cid:15) . We repeat this until we 0 2 ≤ have accepted N particles in population 2. We calculate weights for all accepted particles. Prior IntermediateDistributions Posterior (c) ABCSMC(SequentialMonteCarlo) TinaToni,MichaelStumpf ABCdynamicalsystems 03/07/08 1/1 (d) We repeat the same procedure for the Population1 Population2 PopulationT (cid:15)1 (cid:15)2 ... (cid:15)T 1 (cid:15)T following populations, until we have accepted − N particles of the last population T and calculated their weights. Population T is a sample of particles that approximates the posterior distribution. ABC SMC is computationally much more Prior IntermediateDistributions Posterior efficient than ABC rejection (see [4] for comparison). (d) FigureTinaT2on:i,MichSaelcSthumpefmaticABCrdeynpamricaelssysteemnstation of 0A3/07B/08C1/1 SMC. 3 o o o o o o o o ABC framework for dynamical systems ABC rejection for model selection A sample from the prior distribu2on (white) and accepted par2cles (blue) x m m m x1 2 3 θ θ θ θx θ θ θ θ θ θ θ θ θ θ θ x 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 time Prior Posterior P(m,θ) P(m,θD ) 0 | (a) (b) 0 1. Approximation of P(m|D0) 8 0. 6 0. 4 0. 2 0. 0 0. 1 2 3 Model (Mc)d l Figure 3: (a) Prior and posterior distributions, P(m,θ) and P(m,θ D ), are now defined on a 0 | jointmodelandparameterspace. (b)Particles(m,θ)aresampledfromthepriordistributionand accepted/rejected according to the distance between the simulated and experimental datasets. The accepted particles are shown in dark blue. (c) Six particles have been accepted: one from model 1 and five from model 2. The approximated marginal posterior probability of the model can be calculated as P(m = 1D ) = 1, P(m = 2D ) = 5, P(m = 3D ) = 0 . For illustrative | 0 6 | 0 6 | 0 6 purposes we have chosen a small number of particles. In principle this algorithm will yield consistent marginal posterior model distributions for N . →∞ 4 August2,2009 1/1 ABC SMC for model selection 0 8 1. Population 1 0. 6 A sample from the prior distribu2on and accepted par2cles of Pop1 0. 4 m m m 0. θ θ θ1 θ θ θ θ θ2 θ θ θ θ θ 3 θ θ 2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0. 0 0. 1 2 3 Mdl (a) (b) Popula’on t‐1 m m m 1 2 3 θ1 θ2 θ3 θ4 θ5 θ6 m* = m 2 m** ~ KM(m|m*) t m** =mm m13 3 m2 m3 0.0 0.2 0.4 0.6 0.8 1.0 Prior Population 1 1 2 3 1 2 3 θ θ 5 6 SθAIfa* camc*ce cp~pe ltKep/P tr(eetm,mdje**, c**c t(,a θ (θlm|c*uθ)*l* a*f)rt, oe θm t*h *{e)(. mw *e*ig, hθt)s}.t ‐1. 0.0 0.2 0.4 0.6 0.8 1.0 1 P o p u 2la t i o n 2 3 1 P o p u l2a t i o n 3 3 Model Model (c) (d) 5 Figure 4: (a) Particles are sampled from the prior P(m,θ) until N particles have been accepted (in this example N = 6 for illustration purposes, in practice N should be larger), that is the distance is smaller than (cid:15) . The weights w are calculated for all accepted particles and 1 t=1 normalized. (b) To obtain the marginal probability of m, we sum over all the weights corresponding to the model of interest: P (m(cid:48)) = (cid:80) w (m,θ). The histogram of the marginal t=1 m=m(cid:48) t=1 population 1 is presented. (c) This figure shows how we propose and accept a particle of population t, t = 2,...,T. We sample a model m∗ from population t 1 with probability P (m∗). For example, we might t−1 − have sampled m∗ = m . We then perturb the model using a model perturbation kernel KM 2 t to obtain m∗∗ KM (mm∗), for example m∗∗ = m . After we have obtained the model t 3 ∼ | m∗∗, we sample a parameter θ∗ belonging to model m∗∗ from population t 1 and perturb − it to obtain θ∗∗ KPt,m∗∗(θ θ∗). We simulate a dataset D∗ for a particle (m∗∗,θ∗∗) and ∼ | accept (d(D ,D∗) (cid:15) ) or reject the particle. If a particle is accepted, we calculate its weight 0 t ≤ w (m∗∗,θ∗∗). t (d) When N particles of population t have been accepted, we normalize the weights and marginalize them in order to obtain the marginal intermediate population of the model P (m). t We continue until population T, which is the approximation of the joint posterior distribution P(m,θ D ). The quantities of interest are the intermediate and the last marginal population of 0 | the model. In this example there are three populations, T =3. 6 References [1] BeaumontMA,ZhangWandBaldingDJ. ApproximateBayesiancomputationinpopulationgenet- ics. Genetics, 162(4):2025–2035, 2002. [2] Marjoram P, Molitor J, Plagnol V and Tavare S. Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci USA, 100(26):15324–8, 2003. [3] SissonSA,FanYandTanakaMM. SequentialMonteCarlowithoutlikelihoods. Proc Natl Acad Sci USA, 104(6):1760–5, 2007. [4] Toni T, Welch D, Strelkowa N, Ipsen A and Stumpf MPH. Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface, 6:187–202, 2009. [5] Beaumont MA, Cornuet JM, Marin JM and Robert CP. Adaptive approximate Bayesian computa- tion. Biometrika, 96(4):983–990, 2009. [6] GrelaudA,RobertCP,MarinJM,RodolpheFandTalyJF. ABClikelihood-freemethodsformodel choice in Gibbs random fields. Bayesian Analysis, 4(2):317–336, 2009. [7] Toni T and Stumpf MPH. Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics, 26(1):104–10, 2010. 7