Probabilistic Curve-Aligned Clustering and Prediction with Regression Mixture Models

299 Pages·2004·3.02 MB·English


UNIVERSITY OF CALIFORNIA, IRVINE

Probabilistic Curve-Aligned Clustering and Prediction with Regression Mixture Models

DISSERTATION

submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in Information and Computer Science

by

Scott John Gaffney

Dissertation Committee:
Professor Padhraic Smyth, Chair
Professor Michael J. Pazzani
Professor Pierre Baldi

2004

© 2004 Scott John Gaffney

The dissertation of Scott John Gaffney is approved and is acceptable in quality and form for publication on microfilm:

Committee Chair

University of California, Irvine
2004

To Andy Selden the beauty of mathematics is the reflection upon which we perceive

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGMENTS
CURRICULUM VITAE
ABSTRACT OF THE DISSERTATION

1 Introduction
  1.1 Motivation for curve clustering
  1.2 Outline of dissertation
  1.3 Notation

2 Overview of Clustering
  2.1 Introduction
  2.2 Standard clustering techniques
    2.2.1 Vector-based methods
    2.2.2 Pairwise distance methods
  2.3 Summary

3 Curve Clustering with Regression Mixtures
  3.1 Introduction
  3.2 Clustering by density estimation
    3.2.1 Finite mixture models
    3.2.2 Model-based clustering
    3.2.3 Model-based curve clustering
  3.3 Polynomial regression mixtures
    3.3.1 Prior work
    3.3.2 Model definition
    3.3.3 EM algorithm for PRMs
  3.4 Spline regression mixtures
    3.4.1 Related work
    3.4.2 Definition of splines
    3.4.3 EM algorithm for SRMs
    3.4.4 Discussion
  3.5 Kernel regression mixtures
  3.6 Experimental results
  3.7 Summary

4 Random effects regression mixtures
  4.1 Introduction
  4.2 Prior work
  4.3 Hierarchical model structure
    4.3.1 MAP estimation
  4.4 MAP-based EM algorithm
  4.5 Experimental results
  4.6 Summary

5 Curve Alignment in Measurement Space
  5.1 Introduction
  5.2 Problem definition and prior work
    5.2.1 Curve preprocessing
  5.3 Translations in space
    5.3.1 Model definition
    5.3.2 EM translation algorithm
  5.4 Affine transformations in space
    5.4.1 Model definition
    5.4.2 EM affine algorithm
  5.5 Experimental results
    5.5.1 Experiments with cyclone data
    5.5.2 Experiments with simulated data
  5.6 Summary

6 Curve Alignment in Time
  6.1 Introduction
  6.2 Problem definition and prior work
  6.3 Translations in time
    6.3.1 Model definition
    6.3.2 EM time-translation algorithm
  6.4 Affine transformations in time
    6.4.1 Model definition
    6.4.2 EM affine algorithm
  6.5 Experimental results
    6.5.1 Experiments with gene expression data
    6.5.2 Comparisons with simulated data
  6.6 Summary

7 Joint Space- and Time-Alignment Models
  7.1 Introduction
  7.2 Joint space- and time-alignment
    7.2.1 Model definition
    7.2.2 Joint EM alignment algorithm
    7.2.3 Discussion
  7.3 Multidimensional curves
    7.3.1 Multidimensional space-alignment regression models
    7.3.2 Multidimensional time-alignment regression models
    7.3.3 Discussion
  7.4 Summary

8 Curve-Aligned Clustering
  8.1 Introduction
  8.2 Prior work
  8.3 Adding cluster dependence
    8.3.1 Joint, marginals and log-likelihood
    8.3.2 Joint EM clustering-alignment algorithm
  8.4 Extrapolation to other models
    8.4.1 General derivation of joint clustering algorithms
  8.5 Testing methodology
    8.5.1 Cross-validation
    8.5.2 Test log-likelihood
    8.5.3 Prediction squared error
  8.6 Simulation results
    8.6.1 Identification tests
    8.6.2 Comparisons with non-alignment methods
    8.6.3 Comparisons on joint methodology
  8.7 Summary

9 Identification, Tracking, and Clustering of ETC Cyclones
  9.1 Introduction
  9.2 Motivation
  9.3 Problem definition and prior work
  9.4 GCM model and raw dataset
  9.5 Identification and tracking of cyclones
    9.5.1 Cyclone identification
    9.5.2 Tracking of cyclones
  9.6 Regression models for cyclone trajectories
  9.7 Model selection
    9.7.1 Choosing the order of regression model
    9.7.2 Preprocessing techniques
    9.7.3 Choosing an alignment model
    9.7.4 Choosing K
  9.8 Clustering analysis
    9.8.1 Cluster descriptions
    9.8.2 Temporal analysis of cyclone clusters
  9.9 Summary

10 Clustering Observed Tropical Cyclones
  10.1 Introduction
  10.2 Problem definition and prior work
  10.3 Best Track dataset
  10.4 Model selection
    10.4.1 Choosing the order of regression model
    10.4.2 Choosing the alignment model
    10.4.3 Choosing K
  10.5 Clustering analysis
    10.5.1 Cluster descriptions
    10.5.2 Temporal analysis of cyclone clusters
  10.6 Summary

11 Conclusion

References

Appendices
  A EM algorithm
  B Monte Carlo cross-validation
  C Matrix multivariate normal density

