
TWO MODELS INVOLVING BAYESIAN NONPARAMETRIC TECHNIQUES By SUBHAJIT ... PDF

175 Pages·2013·1.8 MB·English

TWO MODELS INVOLVING BAYESIAN NONPARAMETRIC TECHNIQUES

By SUBHAJIT SENGUPTA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2013

© 2013 Subhajit Sengupta

This dissertation is dedicated to my Maa and Baba for their endless support and love.

ACKNOWLEDGMENTS

After completing a wonderful voyage of six and a half years, I would like to take this opportunity to thank all the people who accompanied me on this memorable long journey. Without their support and encouragement, it would have been impossible for me to reach this point.

First of all, I would like to thank my advisors and mentors, Dr. Arunava Banerjee and Dr. Jeffrey Ho, without whom I would not be here to write my dissertation in the first place. This is probably the first time I have been so formal in thanking them. I owe them a debt of gratitude for their patience, immense support, kindness, and belief in me. Arunava is the person who instilled in me the desire to do meaningful research. He has amazing motivational prowess; there is no way I can finish talking about him within a paragraph. I am especially thankful to Jeff for being so generous with his time over the last two and a half years. We had countless interesting and exciting discussions, which I will miss the most once I am no longer in Gainesville. His appetite for knowledge should be an inspiration for every young researcher. I feel truly blessed to have Arunava and Jeff as my teachers, from whom I will never finish learning in my entire life.

I would like to thank the other members of my committee, Dr. Paul Gader, Dr. Alireza Entezari, and Dr. Malay Ghosh, for spending their invaluable time on numerous helpful discussions. I would also like to thank Dr. Arunava Banerjee, Dr. Jeffrey Ho, Dr. Anand Rangarajan, Dr. Meera Sitharam, and Dr. Paul Gader for their excellent courses in the Computer Science department. Outside our department, I would like to thank Dr. Rosalsky, Dr. Robinson, Dr. Ghosh, Dr. Doss, Dr. Presnell, and Dr. Hobert for their wonderful mathematics and statistics courses. I am also very thankful to Dr. Donald Richards of Penn State University for insightful email communications related to the first problem that we will discuss in this dissertation.

I was partially supported by a grant (IIS-0902230) from the National Science Foundation to Arunava in 2009-2010, and by a research assistantship under Jeff in 2011, which I gratefully acknowledge. I am also very thankful to the CISE Department for supporting me through teaching assistantships (TA) over the years. It has been a great privilege to spend several years in the CISE department at the University of Florida; these moments will always remain dear to me. I am thankful to John Bowers and Joan Crisman for all their help with administrative issues. I am very grateful to the Wikimedia Foundation for their wonderful effort in creating a worldwide knowledge resource for every single human being.

I would like to thank all of my housemates over the years and all my friends in the US and other parts of the world. They were always a source of joy, laughter, and support. I'd like to give special thanks to my friends Kiranmoy Das and Subhadip Pal, with whom I spent a lot of time discussing many interesting statistical problems. I would like to thank all of my lab-mates: Karthik Gurumoothy, Venkatakrishnan Ramaswami, Ajit Rajwade, John Corring, Mohsen Ali, Jason Chi, Manu Sethi, Shahed Nejhum, Nathan Vanderkraats, and Neko Fisher. I had a wonderful time in my lab enjoying countless interesting conversations with all of you. I know I have left out the names of many really good friends, but you know who you are!
Finally, I am thankful to my loving and caring maa (mother) and baba (father), who have always believed in me, supported me every minute, taught me the values of being a disciplined and responsible person, and helped me finish one of the exciting and enjoyable battles of my life.

TABLE OF CONTENTS

ACKNOWLEDGMENTS ... 4
LIST OF TABLES ... 10
LIST OF FIGURES ... 11
ABSTRACT ... 13

CHAPTER

1 INTRODUCTION ... 15
  1.1 Problem Statements ... 15
  1.2 Previous Related Work ... 16
  1.3 Organization of the Dissertation ... 17

2 BAYESIAN INFERENCE AND NONPARAMETRIC BAYESIAN FRAMEWORK ... 19
  2.1 Bayesian Theory ... 19
    2.1.1 MAP Estimate ... 20
    2.1.2 Conjugate Prior ... 20
  2.2 Nonparametric Bayesian ... 20
    2.2.1 Motivation and Theoretical Background ... 21
    2.2.2 Exchangeability ... 22
    2.2.3 De Finetti Theorem ... 22
    2.2.4 Dirichlet Distribution and Dirichlet Process ... 23
      2.2.4.1 Posterior Distribution for DP ... 25
      2.2.4.2 Polya's Urn Scheme or CRP ... 26
      2.2.4.3 Stick-Breaking Representation of DP ... 28
      2.2.4.4 Dirichlet Process Mixture Model (DPMM) ... 29
    2.2.5 Beta Process (BP) ... 32
      2.2.5.1 Completely Random Measure (CRM) ... 36
      2.2.5.2 Another Viewpoint of CRM ... 36
      2.2.5.3 BP ... 37
      2.2.5.4 Bernoulli Process (BeP) and Indian Buffet Process (IBP) ... 38
      2.2.5.5 Connection to IBP ... 39
    2.2.6 BP in a Nutshell ... 39
  2.3 Markov Chain Monte Carlo Sampling ... 39
    2.3.1 Metropolis-Hastings (MH) and Gibbs Sampling ... 41
    2.3.2 Rejection Sampling Method ... 41
    2.3.3 Adaptive Rejection Sampling (ARS) ... 42
    2.3.4 MCMC Sampling Techniques for DPMM ... 42
      2.3.4.1 Slice Sampling ... 44
      2.3.4.2 Efficient Slice Sampling ... 45
  2.4 Variational Bayes (VB) Inference ... 46
    2.4.1 Approximate Inference Procedure ... 46
    2.4.2 KL-Divergence ... 47

3 GEOMETRIC AND STATISTICAL PROPERTIES OF STIEFEL MANIFOLD ... 49
  3.1 Geometric Properties of Stiefel Manifold ... 49
    3.1.1 Analytic Manifold ... 49
    3.1.2 Stiefel Manifold ... 49
    3.1.3 Group Action ... 51
    3.1.4 Tangent and Normal Space of Stiefel Manifold ... 56
  3.2 Statistical Properties of Stiefel Manifold ... 58
    3.2.1 Probability Distribution on Stiefel Manifold ... 58
    3.2.2 Properties of Matrix Langevin Distribution ... 59
    3.2.3 Computation of the Hypergeometric Function of a Matrix Argument ... 60
    3.2.4 Sampling Random Matrices from Matrix Langevin Distribution on V_{n,p} ... 61
    3.2.5 The Rejection Sampling Method ... 62
    3.2.6 Gibbs Sampling Method ... 63

4 BAYESIAN ANALYSIS OF MATRIX-LANGEVIN ON THE STIEFEL MANIFOLD ... 65
  4.1 Preliminaries ... 65
  4.2 Motivating Example: Dictionary Learning ... 66
  4.3 The Stiefel Manifold and ML Distribution ... 67
  4.4 Parametric Bayesian Inference for the ML Distribution ... 68
    4.4.1 Likelihood for the ML Distribution ... 69
    4.4.2 Prior for the Polar Part M ... 69
    4.4.3 Posterior for the Polar Part M ... 70
    4.4.4 Prior for the Elliptical or Concentration Part K ... 70
    4.4.5 Upper and Lower Bounds for the 0F1(·) Function ... 70
      4.4.5.1 A Lower Bound ... 71
      4.4.5.2 Lower Bounds for I_0(x) ... 74
      4.4.5.3 Remarks ... 76
      4.4.5.4 Lower Bound for 0F1(·) Using Lower Bound for I_0(x) ... 76
    4.4.6 Posterior for the Elliptical or Concentration Part D ... 77
      4.4.6.1 Rejection Sampling ... 78
      4.4.6.2 Metropolis-Hastings (MH) Sampling Scheme for D ... 80
      4.4.6.3 Hybrid Gibbs Sampling ... 80
    4.4.7 Experiments on Simulated Data ... 81
    4.4.8 Extension of the Model to a More General K ... 81
    4.4.9 Log-convexity of the Hypergeometric Function ... 83
      4.4.9.1 A Solution ... 83
      4.4.9.2 Possible ARS Sampling ... 85
  4.5 Finite Mixture Modeling ... 85
  4.6 Infinite Mixture Modeling ... 86
    4.6.1 DPM Modeling on the Stiefel Manifold ... 87
    4.6.2 MCMC Inference Scheme ... 88
    4.6.3 Variational Bayes Inference (VB) on Stiefel Manifold ... 88
      4.6.3.1 Matrix-Langevin Distributions ... 90
      4.6.3.2 Update Equation for γ_t ... 90
      4.6.3.3 Update Equation for τ_t ... 91
      4.6.3.4 CG for Minimizing F(τ) on the Stiefel Manifold ... 92
      4.6.3.5 Update Equation for φ_{n,t} ... 93
      4.6.3.6 Calculated KL-Divergence ... 94
  4.7 Experiments ... 95
    4.7.1 Experiments on Synthetic Data ... 95
    4.7.2 Categorization of Objects ... 96
    4.7.3 Classification of Outdoor Scenes ... 98

5 BETA-DIRICHLET PROCESS AND CATEGORICAL INDIAN BUFFET PROCESS ... 101
  5.1 Multivariate Liouville Distributions ... 101
  5.2 Beta-Dirichlet (BD) Distribution ... 103
  5.3 Normalization Constant by Liouville Extension of Dirichlet Integral ... 103
  5.4 BD Distribution Conjugacy ... 104
    5.4.1 With Multinomial Likelihood ... 104
    5.4.2 With Negative Multinomial Likelihood ... 105
  5.5 Completely Random Measure (CRM) Representation ... 106
    5.5.1 Another Viewpoint for CRM ... 108
    5.5.2 Campbell's Theorem ... 109
  5.6 Beta Dirichlet Process ... 109
    5.6.1 BP Construction by Taking Limit from Discrete Case ... 110
    5.6.2 Construction of BDP ... 110
  5.7 Multivariate CRM (MCRM) Representation of BDP ... 112
  5.8 Beta-Dirichlet Process as a Poisson Process ... 114
  5.9 A Size-biased Construction for Levy Representation of BDP ... 115
  5.10 BD-Categorical Process Conjugacy ... 118
  5.11 Categorical Process (CaP) ... 119
    5.11.1 Conjugacy for CaP and BDP – CRM Formulation ... 119
    5.11.2 BDCaP Conjugacy With Standard Parametrization ... 122
    5.11.3 BDCaP Conjugacy Using Alternative Parametrization for BDP in the Base Measure (G) ... 123
  5.12 BDCaP Conjugacy – Proof Statement ... 124
  5.13 Extension of Indian Buffet Process ... 126
  5.14 Extension of Finite Feature Model and the Limiting Case ... 127
  5.15 BD Process (BDP) and Categorical Indian Buffet Process (cIBP) ... 132
    5.15.1 Symmetric Dirichlet ... 133
    5.15.2 Asymmetric Dirichlet ... 136
    5.15.3 Connection ... 137
  5.16 BD-NM Conjugacy ... 138
    5.16.1 Negative Multinomial Process (NMP) ... 138
    5.16.2 Conjugacy for NMP and BDP – CRM Formulation ... 139
    5.16.3 Formal Proof of Conjugacy of NMP for BDP ... 141
      5.16.3.1 Prior Part ... 146
      5.16.3.2 Induced Measure ... 148
      5.16.3.3 Marginal Distribution of Ȳ Via Marked Poisson Process ... 151
      5.16.3.4 Checking the Integration ... 153
    5.16.4 The Case When ν = 1 ... 154
  5.17 Beta-Dirichlet-Negative Multinomial Process as a Marked Poisson Process ... 155
  5.18 Experiment with Simulated Data and Results ... 157
    5.18.1 Synthetic Data ... 159
  5.19 Inference for BDNM Model ... 160
    5.19.1 Negative Multinomial Likelihood Is Conjugate to Beta Dirichlet Prior ... 160
    5.19.2 Negative Multinomial as a Mixture of Gamma and Multivariate Independent Poisson (MIP) ... 160
    5.19.3 Posterior Inference with Finite Approximate Gibbs Sampler ... 162
      5.19.3.1 BD Draws ... 163
      5.19.3.2 Negative Multinomial Draws ... 163
      5.19.3.3 Gamma-Poisson Conjugacy ... 164
      5.19.3.4 Inference Steps ... 164
      5.19.3.5 Sampling z_{d,n}, y_{d,n}, and c_{d,n} ... 165
      5.19.3.6 Sampling Ā_{d,k} ... 165
      5.19.3.7 Sampling b_{0,k} ... 165
      5.19.3.8 Sampling ω_k ... 166

6 DISCUSSION AND FUTURE WORK ... 167
  6.1 Future Work Related to the First Problem ... 167
  6.2 Future Work Related to the Second Problem ... 168

REFERENCES ... 170
BIOGRAPHICAL SKETCH ... 175

LIST OF TABLES

Table 4-1 Results for synthetic data set ... 96
Table 4-2 Actual and estimated number of clusters, and accuracy with real data having different numbers of clusters ... 98
