MONTE CARLO METHODS FOR STRUCTURED DATA

A DISSERTATION SUBMITTED TO THE INSTITUTE FOR COMPUTATIONAL AND MATHEMATICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Adam Guetz
January 2012

© 2012 by Adam Nathan Guetz. All Rights Reserved.
Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/rg833nw3954

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Susan Holmes, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Amin Saberi, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Peter Glynn

Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Recent years have seen an increased need for modeling rich data across many engineering and scientific disciplines. Much of this data contains structure, or non-trivial relationships between elements, that should be exploited when performing statistical inference. Sampling from and fitting complicated models present challenging computational issues, and available deterministic heuristics may be ineffective.
Monte Carlo methods present an attractive framework for finding approximate solutions to these problems. This thesis covers two closely related techniques: adaptive importance sampling and sequential Monte Carlo. Both methods use sampling importance resampling to generate approximate samples from distributions of interest.

Sequential importance sampling is well known to have difficulties in high-dimensional settings. I present a technique called conditional sampling importance resampling, an extension of sampling importance resampling to conditional distributions that improves performance, particularly when independence structure is present. The primary application is multi-object tracking for a colony of harvester ants in a laboratory setting. Previous approaches tend to make simplifying parametric assumptions on the model in order to make computations more tractable, while the approach presented here finds approximate solutions to more complicated and realistic models.

To analyze structural properties of networks, I extend adaptive importance sampling techniques to network growth models such as preferential attachment, using the Plackett-Luce family of distributions on permutations, and I present an application of sequential Monte Carlo to a special form of network growth model called vertex censored stochastic Kronecker product graphs.

Acknowledgements

I'd like to thank my wife Heidi Lubin, my son Levi, my principal advisor Susan Holmes, my co-advisor Amin Saberi, my parents, and all of my friends and extended family.

Contents

Abstract
Acknowledgements
1 Introduction
  1.1 Monte Carlo Integration
  1.2 Applications
2 Approximate Sampling
  2.1 Importance Sampling
    2.1.1 Effective Sample Size
    2.1.2 Sampling Importance Resampling
  2.2 Markov Chain Monte Carlo
    2.2.1 Markov Chains
    2.2.2 Metropolis-Hastings
    2.2.3 Gibbs Sampler
    2.2.4 Data Augmentation
    2.2.5 Hit-and-Run
3 Sequential Monte Carlo
  3.1 Sequential Models
  3.2 Sequential Importance Sampling
  3.3 Particle Filter
4 Adaptive Importance Sampling
  4.1 Background
    4.1.1 Variance Minimization
    4.1.2 Cross-Entropy Method
  4.2 Avoiding Degeneracy
  4.3 Related Methods
    4.3.1 Annealed Importance Sampling
    4.3.2 Population Monte Carlo
5 Conditional Sampling Importance Resampling
  5.1 Motivation
  5.2 Conditional Resampling
    5.2.1 Estimating Marginal Importance Weights
    5.2.2 Conditional Effective Sample Size
    5.2.3 Importance Weight Accounting
  5.3 Example: Multivariate Normal
6 Multi-Object Particle Tracking
  6.1 Background
    6.1.1 Single Object Tracking
    6.1.2 Multi-Object Tracking
    6.1.3 Tracking Notation
  6.2 Conditional SIR Particle Tracking
    6.2.1 Grouping Subsets for Multi-Object Tracking
  6.3 Application: Tracking Harvester Ants
    6.3.1 Object Detection
    6.3.2 Observation Model
    6.3.3 State-Space Model
    6.3.4 Importance Distribution
    6.3.5 Computing Relative and Marginal Importance Weights
  6.4 Empirical Results
    6.4.1 Simulated Data
    6.4.2 Short Harvester Ant Video
7 Network Growth Models
  7.1 Background
    7.1.1 Erdős-Rényi
    7.1.2 Preferential Attachment
    7.1.3 Duplication/Divergence
  7.2 Computing Likelihoods with Adaptive Importance Sampling
    7.2.1 Marginalizing Vertex Ordering
    7.2.2 Plackett-Luce Model as an Importance Distribution
    7.2.3 Choice of Description Length Function
  7.3 Examples
    7.3.1 Modified Preferential Attachment Model
    7.3.2 Adaptive Importance Sampling
    7.3.3 Annealed Importance Sampling
    7.3.4 Computational Effort
    7.3.5 Numerical Results
8 Kronecker Product Graphs
  8.1 Motivation
  8.2 Stochastic Kronecker Product Graph Model
    8.2.1 Likelihood under Stochastic Kronecker Product Graph Model
    8.2.2 Sampling Permutations
    8.2.3 Computing Gradients
  8.3 Vertex Censored Stochastic Kronecker Product Graphs
    8.3.1 Importance Sampling for Likelihoods
    8.3.2 Choosing Censored Vertices
    8.3.3 Sampling Permutations
    8.3.4 Multiplicative Attribute Graphs
  8.4 Empirical Results
    8.4.1 Implementation

List of Tables

6.1 Observation event types.
7.1 Comparison of estimators for sparse 500-node preferential attachment dataset from Figure 7.1
7.2 Comparison of estimators for dataset: 5 networks, 30 nodes each, average degree 2, 20 samples each method
7.3 Comparison of estimators for dataset: 2 networks, 100 nodes each, average degree 2
7.4 Estimated log-likelihoods for Mus musculus protein-protein interaction networks

List of Figures

3.1 Dependence structure of hidden Markov models
5.1 CSIR Normal example: eigenvalues of covariance matrices
5.2 CSIR Normal example: estimated KL divergences
5.3 Same experiments as in Figure 5.2, plotted by method.
6.1 Example grouping subset functions
6.2 Blob bisection via spectral partitioning
6.3 Association of objects with observations. 'Events' correspond to connected components in this bipartite graph, including normal observations, splitting, merging, false positives, false negatives, and joint events.
6.4 "True" distribution of path lengths and trajectories per frame, simulated example.
6.5 Centroid observations per frame, simulated example.
6.6 Distribution of path lengths and trajectories per frame using a sample from the importance distribution, simulated example.
6.7 Distribution of path lengths and trajectories per frame using CSIR, simulated example.
6.8 GemVident screenshot, showing centroids.
6.9 Centroid observations per frame from harvester ant example.
6.10 Distribution of path lengths and trajectories per frame using a sample from the importance distribution, harvester ant example.
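The abstract states that both adaptive importance sampling and sequential Monte Carlo rely on sampling importance resampling (SIR). As a minimal, illustrative sketch only (not code from this dissertation), the Python below implements plain SIR for a one-dimensional target. The function name `sir`, the choice of multinomial resampling (systematic or residual resampling are common alternatives), and the toy Gaussian target and proposal are all assumptions made for illustration. It also reports 1/Σ w_i² of the normalized weights, a common effective-sample-size estimate; the dissertation's own definition may differ.

```python
import math
import random
from typing import Callable, List, Optional, Tuple


def sir(
    target_logpdf: Callable[[float], float],
    proposal_sample: Callable[[random.Random], float],
    proposal_logpdf: Callable[[float], float],
    n: int,
    m: int,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], float]:
    """Hypothetical SIR helper using multinomial resampling.

    Draws n candidates from the proposal q, weights each by pi(x)/q(x),
    and resamples m of them with replacement in proportion to the
    normalized weights. Returns the resampled points and the ESS estimate.
    """
    if rng is None:
        rng = random.Random()
    xs = [proposal_sample(rng) for _ in range(n)]
    # Log importance weights. Normalizing constants of pi and q are shared
    # by every candidate, so they cancel after self-normalization.
    logw = [target_logpdf(x) - proposal_logpdf(x) for x in xs]
    # Shift by the maximum before exponentiating to avoid overflow/underflow.
    shift = max(logw)
    w = [math.exp(lw - shift) for lw in logw]
    total = sum(w)
    w = [wi / total for wi in w]
    # ESS = 1 / sum(w_i^2): n for uniform weights, near 1 if one weight dominates.
    ess = 1.0 / sum(wi * wi for wi in w)
    # Multinomial resampling: each draw picks candidate i with probability w_i.
    resampled = rng.choices(xs, weights=w, k=m)
    return resampled, ess


if __name__ == "__main__":
    # Toy problem (assumed): target N(2, 1), proposal N(0, 3^2), constants omitted.
    rng = random.Random(0)
    draws, ess = sir(
        target_logpdf=lambda x: -0.5 * (x - 2.0) ** 2,
        proposal_sample=lambda r: r.gauss(0.0, 3.0),
        proposal_logpdf=lambda x: -0.5 * (x / 3.0) ** 2,
        n=20_000,
        m=5_000,
        rng=rng,
    )
    print(f"resampled mean = {sum(draws) / len(draws):.3f}, ESS = {ess:.0f}")
```

Run as a script, it prints the mean of the resampled points, which should lie close to the target mean of 2, together with the ESS estimate. Multinomial resampling is used here only because it maps directly onto `random.Random.choices`; a lower-variance resampling scheme could be substituted without changing the weighting step.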