Sampling-Based Likelihood Approximations for Infectious Disease Models and Other Related Topics

by
Rajat Malik

A Thesis
Presented to
The University of Guelph

In partial fulfilment of requirements
for the degree of
Doctor of Philosophy
in
Mathematics & Statistics (Applied Statistics)

Guelph, Ontario, Canada
© Rajat Malik, March 2015

ABSTRACT

Sampling-Based Likelihood Approximations for Infectious Disease Models and Other Related Topics

Rajat Malik
University of Guelph, 2015
Advisor: Dr. Rob Deardon

Deardon et al. (2010) describe a class of individual-level models (ILMs), fitted in a Bayesian framework using Markov chain Monte Carlo (MCMC) techniques. These models are used to describe the spread of infectious diseases in discrete time. A key feature of these ILMs is that they take into account covariate information on susceptible and infectious individuals as well as shared covariate information such as geography or contact measures. These models quantify probabilistic outcomes regarding the risk of infection.

ILMs are developed and fitted to data sets from two studies on influenza transmission within households in Hong Kong during 2008–2009 and 2009–2010. The goal is to estimate the effect of vaccination on infection risk and choose a model that best fits the infection data.

The infectious pressure exerted on susceptible individuals defines the hazard rate (in survival analysis terminology) for those individuals. Unfortunately, quantifying this infectious pressure for each individual over time can be computationally burdensome, leading to a time-consuming likelihood calculation and, thus, MCMC-based analysis. Therefore, we introduce sampling methods to speed up the calculation of the likelihood function. We compare a simple random sampling scheme with a number of spatially-stratified sampling approaches.
The performance of the sampling-based likelihood approximations is tested and compared via simulation studies, and using data from the 2001 foot-and-mouth disease (FMD) epidemic in the U.K.

Data augmentation is a technique used in Bayesian inference that allows the parameter set to be augmented by parameters representing missing or censored data. Here, infection times are treated as missing information. The problem of computation time worsens when using data augmentation to allow for uncertainty in infection times, due to a significant increase in the number of times the likelihood function is calculated at each MCMC step. Therefore, we extend the sampling-based likelihood approximation algorithms and develop sampling methods that allow for data-augmented infection time parameters. Once again, a simple random sampling approach is initially considered, followed by various spatially-stratified schemes. We test and compare the performance of our methods using simulated data and data from the 2001 FMD epidemic in the U.K.

Dedicated to Mom, Dad, Rahul, and my loving wife, Shikha. Love you all!

ACKNOWLEDGEMENTS

I wish to thank and extend my sincerest gratitude to my advisor, Dr. Rob Deardon, for giving me the opportunity to work with him over the last few years. Your tireless patience, encouragement, support, and guidance through my Ph.D. have been truly appreciated and I am forever grateful.

I would also like to thank Dr. Grace P. S. Kwong for her guidance with writing each of my papers. Thank you to Dr. Peter Macdonald and Dr. Zeny Feng for being members of my advisory committee. Your support and advice have helped me immensely.

To my Mom, Dad, and brother: thank you for always encouraging me to go the extra mile and never give up. Finally, to Shikha: words cannot describe how supportive, encouraging, and loving you have been to me since we have known each other.
This research was funded by the Ontario Ministry of Agriculture, Food and Rural Affairs (OMAFRA) and by the Natural Sciences and Engineering Research Council of Canada (NSERC). Computer equipment was provided by the Canada Foundation for Innovation (Centre for Public Health and Zoonoses – CPHAZ).

Contents

1 Introduction
  1.1 Overview
  1.2 Infectious Disease Modeling
    1.2.1 Individual-Level Modeling of Infectious Disease Spread
  1.3 Bayesian Framework
    1.3.1 Bayesian Inference
  1.4 Markov Chain Monte Carlo Methods
    1.4.1 Metropolis-Hastings Algorithm
    1.4.2 Random Walk Metropolis-Hastings Algorithm
    1.4.3 Independence Sampler
  1.5 Data Augmentation
  1.6 Thesis Outline

2 Individual-Level Modeling of the Spread of Influenza within Households
  2.1 Introduction
  2.2 Infectious Disease Modeling
    2.2.1 Individual-Level Models
    2.2.2 Bayesian Methodology
  2.3 Models
    2.3.1 Homogeneous Mixing Model
    2.3.2 Household Infections
    2.3.3 Age-dependent Susceptibility
    2.3.4 Vaccination-dependent Susceptibility and Transmissibility
    2.3.5 Household Vaccination-dependent Susceptibility
  2.4 Study Data
    2.4.1 Data Collection
    2.4.2 Data Summary
  2.5 Pilot Study Analysis
    2.5.1 Priors
    2.5.2 Results
  2.6 Full Study Analysis
    2.6.1 Priors
    2.6.2 Model Fitting Results using Vague Priors
    2.6.3 Model Fitting Results using Informative Priors
  2.7 Discussion

3 Parameterizing Spatial Models of Infectious Disease Spread Using Sampling-Based Likelihood Approximations
  3.1 Introduction
  3.2 Methodology
    3.2.1 General ILM
    3.2.2 Spatial ILM
    3.2.3 FMD-ILM
    3.2.4 Bayesian MCMC Framework
  3.3 Sampling-Based Likelihood Approximations
    3.3.1 Simple Random Sampling
    3.3.2 Spatially-Stratified Sampling
  3.4 Infectious Disease Data
    3.4.1 Epidemic Simulation
    3.4.2 FMD Data
  3.5 Results
    3.5.1 Simulation Studies
    3.5.2 FMD Model Fitting Results
  3.6 Discussion

4 Parameterizing Spatial Models of Infectious Disease Transmission that Incorporate Uncertainty in Infection Times Using Sampling-Based Likelihood Approximations
  4.1 Introduction
  4.2 Methodology
    4.2.1 General Model
    4.2.2 Spatial ILM
    4.2.3 FMD-ILM
    4.2.4 Bayesian Computation
    4.2.5 MCMC Algorithm
  4.3 Data Augmentation and Sampling-Based Likelihood Approximations
    4.3.1 Data Augmentation
    4.3.2 Simple Random Sampling Algorithm
    4.3.3 Spatially-Stratified Sampling Algorithm
  4.4 Epidemic Data
    4.4.1 Simulated Data
    4.4.2 FMD Data
    4.4.3 Priors
  4.5 Results
    4.5.1 Simulation Study
    4.5.2 FMD Model Fitting Results
  4.6 Discussion

5 Conclusions
  5.1 Individual-Level Modeling of the Spread of Influenza within Households
    5.1.1 Summary
    5.1.2 Future Work
  5.2 Parameterizing Spatial Models of Infectious Disease Spread Using Sampling-Based Likelihood Approximations
    5.2.1 Summary
    5.2.2 Future Work
  5.3 Parameterizing Spatial Models of Infectious Disease Transmission that Incorporate Uncertainty in Infection Times Using Sampling-Based Likelihood Approximations
    5.3.1 Summary
    5.3.2 Future Work

Bibliography

Appendix

List of Tables

2.1 Approximate marginal distribution of each ILM parameter after model fitting. For the Gamma distribution, $a$ is a shape parameter and $b$ is a rate parameter. For the Exponential distribution, $\lambda$ is a rate parameter.
2.2 Summary of fitting our ILMs using vague priors to the full study conducted during the 2009–2010 influenza season in Hong Kong.
2.3 Summary of fitting our ILMs to the full study conducted during the 2009–2010 influenza season in Hong Kong. Prior choices were based on the posterior results from the pilot study analysis.
3.1 Summary statistics comparing model parameter estimation.
The results are averaged over 10 different epidemics simulated from the spatial ILM with parameter values $\alpha = 1.0$, $\beta = 2.0$, $\gamma_I = 1$, and $n = 625$. Here, CIs are the mean credible interval limits.
3.2 Average computation run times (in seconds) of fitting the spatial ILM, SRS-ILM, and the SSS-ILM to 10 different simulated epidemics. These epidemics were simulated using ILM parameters $\alpha = 1.0$, $\beta = 2.0$, $\gamma_I = 1$, and $n = 625$.
3.3 Summary of results from fitting the FMD-ILM to the FMD data. We compare the results across our different sampling methods. Note that for spatial stratification, we sample using $\rho = 0.25$ from each stratum.
3.4 Computation run times (in seconds) of fitting the FMD-ILM to the FMD data using various sampling techniques.
4.1 Average computation run times (in hours) of fitting the data augmented spatial ILM, SRS-ILM, and the SSS-ILM to 10 different simulated epidemics. These epidemics were simulated using ILM parameters $\alpha = 1.4$, $\beta = 2.3$, $\lambda_z = 1/3$, and $n = 625$.
4.2 Computation run times (in hours) of fitting the data augmented FMD-ILM to the FMD data using various sampling techniques. Models produced 20,000 realizations from the posterior.
5.1 Summary statistics from the simulation studies comparing model parameter estimation across our different sampling schemes. The results are averaged over 10 different epidemics simulated from the data augmented spatial ILM with parameter values $\alpha = 1.4$, $\beta = 2.3$, $\lambda_z = 1/3$, and $n = 625$. Here, CIs are the mean credible interval limits.
5.2 Summary of results from fitting the data augmented FMD-ILM to the FMD data. We compare the results across our different sampling methods. Note that for spatial stratification, we sample $\rho = 0.50$ from each stratum.