UNIVERSITY OF CALIFORNIA, IRVINE

Modeling and Alignment of Biological Networks

DISSERTATION

submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY in Information and Computer Science

by

Oleksii Kuchaiev

Dissertation Committee:
Professor Nataša Pržulj, Chair
Professor Rina Dechter
Professor Wayne Hayes
Professor Zoran Nenadić

2010

© 2010 Oleksii Kuchaiev

Dedication

To my high school math teacher and a very good friend of mine, Nikolaj Pihtar, who fostered my interest in science.

Contents

List of figures
List of tables
Acknowledgments
Curriculum vitae
Abstract of the dissertation

1 Introduction
  1.1 Motivation
  1.2 Types of biological networks
  1.3 Datasets
  1.4 Current challenges in biological network research
  1.5 Dissertation contributions and outline

2 Background
  2.1 Graph properties
  2.2 Network Modeling
  2.3 Network alignment

3 Modeling Biological Networks
  3.1 Geometric evolutionary dynamics of PPI networks
  3.2 Modeling brain functional networks
  3.3 Author’s contributions

4 Global Network Alignment
  4.1 GRAph ALigner (GRAAL)
  4.2 Matching-based GRAph ALigner (M-GRAAL)
  4.3 Evaluation of GRAAL and MGRAAL algorithms
  4.4 Comparison with other methods
  4.5 Biological applications
  4.6 Author’s contributions

5 GraphCrunch 2: Software tool for network modeling, alignment and clustering
  5.1 Main Features
  5.2 Case studies
  5.3 Implementation
  5.4 Author’s contributions

6 Conclusions
  6.1 Modeling
  6.2 Network Alignment

A Appendix
  A.1 Statistical tests
  A.2 Supplementary Figures
  A.3 GraphCrunch 2 screenshots

Bibliography

List of Figures

1.1 Example of the PPI network.
2.1 Watts-Strogatz “Small-world” model
2.2 Graphs with different local structure
2.3 Graphlets and automorphism orbits
2.4 Erdos-Renyi random graph
2.5 B-A preferential attachment scale-free network
2.6 Duplication and mutation scale-free network
2.7 Degree distribution of the Human PPI network.
2.8 Geometric random graph example.
2.9 Probability density functions p(dist|edge) and p(dist|nonedge).
2.10 Global Alignment Graph
2.11 Multiple Alignment
3.1 Degree distribution of geometric gene duplication model.
3.2 Example distributions of points in the 2-dimensional Euclidean space, generated by GEO-GD models
3.3 GDD-agreement between yeast PPI and model networks
3.4 GDD-agreement between human PPI and model networks
3.5 GDD-agreement between fruitfly PPI and model networks
3.6 GDD-agreement between worm PPI and model networks
3.7 Experimental protocol (BFNs).
3.8 Correlation Vs Mutual Information
3.9 Number of edges in BFN depending on the cognitive task.
3.10 GDD-agreement between BFNs and model networks
4.1 GDD vector illustration.
4.2 Largest and second largest CCS of yeast-human alignment.
4.3 Protist phylogenetic trees.
4.4 The LCCS in yeast-human alignment uncovered by MGRAAL.
4.5 LCCS in MGRAAL’s Campylobacter jejuni and Escherichia coli alignment.
4.6 LCCS in MGRAAL’s Mesorhizobium loti and Synechocystis sp. PCC6803 alignment.
4.7 Viral phylogenetic tree.
5.1 GraphCrunch 2: GDD-agreement between viral PPI and model networks.
5.2 GraphCrunch 2: RGF distance between model and viral networks.
5.3 GraphCrunch 2: User Interface
A.1 The common subgraph in yeast2 and human1 PPI networks that is identified by GRAAL.
A.2 The common subgraph in yeast2 and human1 PPI networks that is identified by MGRAAL.
A.3 GraphCrunch 2: Data Vs Model
A.4 GraphCrunch 2: Pairwise Data Analysis
A.5 GraphCrunch 2: GRAAL
A.6 GraphCrunch 2: Clustering
A.7 GraphCrunch 2: Results visualization

List of Tables

3.1 PPI Networks that we compare with GEO-GD models
3.2 Network models that we compare with our new GEO-GD models
3.3 Types of ECoG data that we analyzed.
3.4 Target and delay stages are topologically the most dissimilar.
4.1 Random Node Deletions Tests.
4.2 Random Edge Deletions Tests.
4.3 Random Edge Addition Tests.
4.4 Addition of edges from the set of lower-confidence interactions. GRAAL and MGRAAL
4.5 Addition of edges from the set of lower-confidence interactions. HGRAAL and IsoRank
4.6 Fraction of protein pairs in the MGRAAL’s alignment of yeast and human that share GO terms.
4.7 Viral PPI networks.
5.1 GraphCrunch 2: K-medoids clusters statistics.
5.2 GraphCrunch 2: K-medoids clusters statistics. In these experiments we consider only GO terms corresponding to “biological process.”
5.3 GraphCrunch 2: K-medoids clusters statistics. In these experiments we analyze the human PPI network from BioGRID
Acknowledgements

First and foremost I am very grateful to my scientific advisor and committee chair, Professor Nataša Pržulj, without whom this research would not have been possible. Prof. Pržulj’s professionalism and research insight make her a perfect scientific advisor. She has provided me with all the necessary guidance, encouragement, and advice and made my research experience not only rewarding but also very enjoyable.

I would like to thank my committee members: Prof. Wayne Hayes, Prof. Zoran Nenadić and Prof. Rina Dechter for their cooperation. Additionally, I thank Prof. Hayes for his valuable and interesting discussions on most of the results presented in this dissertation.

I am very thankful to Prof. Pržulj’s lab members: Tijana Milenković, Vesna Memisević and Aleksandar Stevanović for being great friends and collaborators over the years. Also, I am very grateful to all my family and friends for always providing all the necessary support for me.

Finally, I would like to thank the Donald Bren School of Information and Computer Sciences at the University of California, Irvine for being an ideal place for graduate students and for the financial support they provided for me. Additionally, my research has partially been supported by Prof. Pržulj’s NSF CAREER IIS-0644424 grant, an Opportunity Award from the Center for Complex Biological Systems at UC Irvine, and travel awards from the National Library of Medicine/National Institutes of Health.

Curriculum vitae

Oleksii Kuchaiev

Research Interests: Application of graph theory and statistics in large network analyses.

Education:
• Ph.D., Information and Computer Science, University of California, Irvine, 2010. Thesis Title: Modeling and Alignment of Biological Networks. Advisor: Prof. Nataša Pržulj. GPA: 3.96 out of 4.
• M.Sc., Information and Computer Science, University of California, Irvine, 2009. Thesis Title: Geometric Graphs in Biological Networks. Advisor: Prof. Nataša Pržulj. GPA: 3.95 out of 4.
• M.Sc., Applied Mathematics, National T. Shevchenko University of Kiev, Ukraine, 2007. GPA: 5.0 out of 5.
• B.Sc., Applied Mathematics, National T. Shevchenko University of Kiev, Ukraine, 2005. GPA: 5.0 out of 5.

Professional Positions: